Re: [PATCH v32 0/4] scsi: ufs: Add Host Performance Booster Support

2021-04-06 Thread Javier Gonzalez

On 31.03.2021 21:40, Bean Huo wrote:

Hi Martin

I don't know when or how you plan to accept this patch. Mobile vendors
and chipset vendors are all looking forward to having this UFS HPB
feature mainlined in upstream Linux. Since the first version of the HPB
driver was submitted to the community, it has reached v32, and we have
been working on this feature for two years. Would you please take a look
at it? Thanks.


Hi Bean,

I believe it would help if you and others are willing to review this
patchset thoroughly and give a Reviewed-by / Tested-by tag. This will
give Martin and others more confidence that there is vendor alignment
and that there is a group of people willing to maintain HPB going
forward.

What do you think?

Javier


Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-20 Thread Javier Gonzalez

On 20.08.2020 13:07, Kanchan Joshi wrote:

On Thu, Aug 20, 2020 at 3:22 AM Keith Busch  wrote:


On Wed, Aug 19, 2020 at 01:11:58PM -0600, David Fugate wrote:
> Intel does not support making *optional* NVMe spec features *required*
> by the NVMe driver.

This is inaccurate. As another example, the spec optionally allows a
zone size to be a power of 2, but Linux requires a power of 2 if you
want to use it with this driver.

> Provided there's no glaring technical issues

There are many. Some may be improved through a serious review process,
but the mess it makes out of the fast path is measurably harmful to
devices that don't subscribe to this. That issue is not so easily
remedied.

Since this patch is a copy of the SCSI implementation, the reasonable
thing is to take this fight to the generic block layer for a common
solution. We're not taking this in the nvme driver.


I sincerely want to minimize any adverse impact to the fast-path of
non-zoned devices.
My understanding of that aspect is (I feel it is good to confirm,
irrespective of the future of this patch):

1. Submission path:
There is no extra code for non-zoned device IO. For append, there is
this "ns->append_emulate = 1" check.
Snippet -
   case REQ_OP_ZONE_APPEND:
-   ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
+   if (!nvme_is_append_emulated(ns))
+   ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
+   else {
+   /* prepare append like write, and adjust lba afterwards */

2. Completion:
Not as clean as submission for sure.
The extra check is "ns && ns->append_emulate == 1" for completions if
CONFIG_BLK_DEV_ZONED is enabled.
A slightly better completion-handling version (than the submitted patch) is -

-   } else if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
-  req_op(req) == REQ_OP_ZONE_APPEND) {
-   req->__sector = nvme_lba_to_sect(req->q->queuedata,
-   le64_to_cpu(nvme_req(req)->result.u64));
+   } else if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+   struct nvme_ns *ns = req->q->queuedata;
+   /* append-emulation requires wp update for some cmds */
+   if (ns && nvme_is_append_emulated(ns)) {
+   if (nvme_need_zone_wp_update(req))
+   nvme_zone_wp_update(ns, req, status);
+   }
+   else if (req_op(req) == REQ_OP_ZONE_APPEND)
+   req->__sector = nvme_lba_to_sect(ns,
+   le64_to_cpu(nvme_req(req)->result.u64));

Am I missing any other fast-path pain-points?

A quick 1-minute 4K randwrite run (QD 64, 4 jobs, libaio) shows:
before: IOPS=270k, BW=1056MiB/s (1107MB/s)(61.9GiB/60002msec)
after:  IOPS=270k, BW=1055MiB/s (1106MB/s)(61.8GiB/60005msec)


It is good to use the QEMU "simulation" path that we implemented to test
performance with different delays, etc., but for these numbers to make
sense we need to put them in contrast to the simulated NAND speed, etc.



This may not be the best test to measure the cost, and I am happy to
conduct more tests and refine them.

As for the volume of the code - it is the same as the SCSI emulation. And I
can make an effort to reduce it by moving common code to blk-zone and
reusing it in the SCSI/NVMe emulation.
In the patch I tried to isolate append-emulation by keeping everything
in "nvme_za_emul". It contains nothing nvme'ish except nvme_ns,
which can be turned into "driver_data".

+#ifdef CONFIG_BLK_DEV_ZONED
+struct nvme_za_emul {
+   unsigned int nr_zones;
+   spinlock_t zones_wp_offset_lock;
+   u32 *zones_wp_offset;
+   u32 *rev_wp_offset;
+   struct work_struct zone_wp_offset_work;
+   char *zone_wp_update_buf;
+   struct mutex rev_mutex;
+   struct nvme_ns *ns;
+};
+#endif

Will that be an acceptable line of further work/discussions?


I know we spent time enabling this path, but I don't think that moving
the discussion to the block layer will have much more benefit.

Let's keep the support for these non-append devices in xNVMe and focus
on the support for the append-enabled ones in Linux. We have a lot of
good stuff in the backlog that we can start pushing.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-20 Thread Javier Gonzalez

On 20.08.2020 08:52, Christoph Hellwig wrote:

On Thu, Aug 20, 2020 at 08:37:19AM +0200, Javier Gonzalez wrote:

We will stop pushing for this emulation. We have a couple of SSDs where
we disabled Append, we implemented support for them, and we wanted to
push the changes upstream. That's it. This is not politics nor a
conspiracy against the current ZNS spec. We spent a lot of time working
on this spec and are actually doing a fair amount of work to support
Append in other places in the stack. In any case, the fuzz stops here.


FYI, from knowing you personally I'm pretty confident you are not
part of a conspiracy and are just doing your best given the context,
and I appreciate all your work!


Thanks Christoph.



I'm a lot less happy about things that happen in other groups not
involving you directly, and I'm still pretty mad at how the games were
played there, and especially at other actors that seem to be reading the
list here and who, instead of taking part in the discussion, are causing
fuzz in completely unrelated venues.


Yes. Hopefully, we will start keeping things separate and using this
list for Linux technical conversations only.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-20 Thread Javier Gonzalez

On 19.08.2020 12:43, Christoph Hellwig wrote:

On Wed, Aug 19, 2020 at 09:14:13AM +, Damien Le Moal wrote:

While defining a zone append command for SCSI/ZBC is possible (using sense data
for returning the written offset), there is no way to define zone append for
SATA/ZAC without entirely breaking the ATA command model. This is why we went
after an emulation implementation instead of trying to standardize native
commands. That implementation does not have any performance impact over regular
writes *and* zone write locking does not in general degrade HDD write
performance (only a few corner cases suffer from it). Comparing things equally,
the same could be said of NVMe drives that do not have zone append native
support: performance will be essentially the same using regular writes and
emulated zone append. But mq-deadline and zone write locking will significantly
lower performance for emulated zone append compared to a native zone append
support by the drive.


And to summarize the most important point - Zone Append doesn't exist
in ZAC/ZBC.  For people that spent the last years trying to make zoned
storage work, the lack of such a primitive has been the major pain point.
That's why I came up with the Zone Append design in response to a
request for such an operation from another company that is now heavily
involved in both Linux development and hosting Linux VMs.  For ZAC and
ZBC the best we can do is to emulate the approach in the driver, but
for NVMe we can do it natively.  ZNS until just before the release had
Zone Append mandatory, and it did so for a very good reason.  While
making it optional allows OEMs to request drives without it, I
fundamentally think we should not support that in Linux and should
request that vendors implement writes to zones the right way.


Ok. We will just pursue Linux support for ZNS following the append
model.



And just as some OEMs can request certain TPs or optional features to
be implemented, so can Linux.  Just to give an example from the zone
world - Linux requires uniform and power of two zone sizes, which in
ZAC and ZBC are not required.




Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-20 Thread Javier Gonzalez

On 19.08.2020 13:25, Jens Axboe wrote:

On 8/19/20 12:11 PM, David Fugate wrote:

On Tue, 2020-08-18 at 07:12 +, Christoph Hellwig wrote:

On Tue, Aug 18, 2020 at 10:59:36AM +0530, Kanchan Joshi wrote:

If the drive does not support zone-append natively, enable emulation using
regular writes.
Make the emulated zone-append command write-lock the zone, preventing
concurrent append/write on the same zone.


I really don't think we should add this.  ZNS and the Linux support
were all designed with Zone Append in mind, and then your company did
the nastiest possible move violating the normal NVMe procedures to make
it optional.  But that doesn't change the fact that Linux should keep
requiring it, especially with the amount of code added here and how it
hooks in the fast path.


Intel does not support making *optional* NVMe spec features *required*
by the NVMe driver.


It's not required; the driver will function quite fine without it. If you
want to use ZNS, it's required. The Linux driver thankfully doesn't need
any vendor to sign off on what it can or cannot do, or what features
are acceptable.


It's forgivable WDC's accepted contribution didn't work with other
vendors' devices choosing not to implement the optional Zone Append,
but it's not OK to reject contributions remedying this.  Provided
there's no glaring technical issues, Samsung's contribution should be
accepted to maintain both spec compliance as well as vendor neutrality.


It's *always* ok to reject contributions, if those contributions cause
maintainability issues, unacceptable slowdowns, or whatever other issue
that the maintainers of said driver don't want to deal with. Any
contribution should be judged on merit, not based on political decisions
or opinions. Obviously this thread reeks of it.



I'll reply here, where the discussion diverges from this particular
patch.

We will stop pushing for this emulation. We have a couple of SSDs where
we disabled Append, we implemented support for them, and we wanted to
push the changes upstream. That's it. This is not politics nor a
conspiracy against the current ZNS spec. We spent a lot of time working
on this spec and are actually doing a fair amount of work to support
Append in other places in the stack. In any case, the fuzz stops here.

Javier


Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-19 Thread Javier Gonzalez

On 19.08.2020 09:40, Christoph Hellwig wrote:

On Tue, Aug 18, 2020 at 08:04:28PM +0200, Javier Gonzalez wrote:

I understand that you want vendor alignment in the NVMe driver and I
agree. We are not pushing for a non-append model - you can see that we
are investing effort in implementing the append path in the block layer
and io_uring, and we will continue doing so as patches get merged.

This said, we do have some OEM models that do not implement append and I
would like them to be supported in Linux. As you know, new TPs are being
standardized now and the append emulation is the basis for adding
support for them. I do not believe it is unreasonable to find a way to
add support for these SSDs.


I do not think we should support anything but Zone Append, especially not
the new TP, which is going to add even more horrible code for absolutely
no good reason.


I must admit that this is a bit frustrating. The new TP adds
functionality beyond operating as an Append alternative that I would
very much like to see upstream (do want to discuss details here).

I understand the concerns about deviating from the Append model, but I
believe we should find a way to add these new features. We are hiding
all the logic in the NVMe driver and not touching the interface with the
block layer, so the overall model is really not changed.




If you completely close the door on this approach, the alternative is
carrying off-tree patches to the several OEMs that use these devices.
This is not good for the zoned ecosystem nor for the future of Zone
Append.


I really don't have a problem with that.  If these OEMs want to use
an inferior access model only, they have to pay the price for it.
I also don't think that proxy arguments are very useful.  If your OEMs
are troubled by carrying patches because they decided to buy inferior
drives, they are perfectly welcome to argue their cause here on the list.


I am not arguing as a proxy, I am stating the trouble we see from our
perspective in having to diverge from mainline when our approach is
being upstream first.

Whether the I/O mode is inferior or superior, they can answer that
themselves if they read this list.



Are you open to us doing some characterization and if the impact
to the fast path is not significant, moving ahead to a Zone Append
emulation like in SCSI? I will promise that we will remove this path if
requests for these devices terminate.


As said I do not think implementing zone append emulation or the TP that
shall not be named are a good idea for Linux.


I would ask you to reconsider this position. I have a hard time
understanding how zone append emulation is a good idea in SCSI and not
in NVMe, when there is no performance penalty.


Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-18 Thread Javier Gonzalez

On 18.08.2020 10:39, Keith Busch wrote:

On Tue, Aug 18, 2020 at 07:29:12PM +0200, Javier Gonzalez wrote:

On 18.08.2020 09:58, Keith Busch wrote:
> On Tue, Aug 18, 2020 at 11:50:33AM +0200, Javier Gonzalez wrote:
> > a number of customers are requiring the use of normal writes, which we
> > want to support.
>
> A device that supports append is completely usable for those customers,
> too. There's no need to create divergence in this driver.

Not really. You know as well as I do that some features are disabled for
a particular SSD model on customer requirements. Generic models
implementing append can submit both I/Os, but those that remove append
are left out.


You are only supporting my point: if your device supports append, you
get to work in every ZNS use case, otherwise you only get to work in a
subset.


There are several devices. Some of them enable append and some do not. I
would like to support the latter too.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-18 Thread Javier Gonzalez

On 18.08.2020 12:51, Matias Bjørling wrote:

On 18/08/2020 11.50, Javier Gonzalez wrote:

On 18.08.2020 09:12, Christoph Hellwig wrote:

On Tue, Aug 18, 2020 at 10:59:36AM +0530, Kanchan Joshi wrote:

If the drive does not support zone-append natively, enable emulation using
regular writes.
Make the emulated zone-append command write-lock the zone, preventing
concurrent append/write on the same zone.


I really don't think we should add this.  ZNS and the Linux support
were all designed with Zone Append in mind, and then your company did
the nastiest possible move violating the normal NVMe procedures to make
it optional.  But that doesn't change the fact that Linux should keep
requiring it, especially with the amount of code added here and how it
hooks in the fast path.


I understand that the NVMe process was agitated and that the current ZNS
implementation in Linux relies on append support from the device
perspective. However, the current TP does allow for not implementing
append, and a number of customers are requiring the use of normal
writes, which we want to support.


There are a lot of things that are specified in NVMe but not
implemented in the Linux kernel. That your company is not able to
efficiently implement the Zone Append command (this is the only reason
I can think of that would make you and your company cause such a fuss),


This comment is out of place and I will choose to ignore it.


shouldn't mean that everyone else has to suffer.


This is not a quirk, nor a software work-around. This is a design
decision affecting the storage stack of several OEMs. As such, I would
like to find a way to implement this functionality.

Last time we discussed this on the mailing list, you among others
pointed to append emulation as the best way to enable this path, and here
we are. Can you explain what changed?



In any case, SPDK offers adequate support and can be used today.


We can take the SPDK discussion to the appropriate mailing lists and
Slack channels.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-18 Thread Javier Gonzalez

On 18.08.2020 17:50, Christoph Hellwig wrote:

On Tue, Aug 18, 2020 at 11:50:33AM +0200, Javier Gonzalez wrote:

I understand that the NVMe process was agitated and that the current ZNS
implementation in Linux relies on append support from the device
perspective. However, the current TP does allow for not implementing
append, and a number of customers are requiring the use of normal
writes, which we want to support.


The NVMe TPs allow for lots of things, but that doesn't mean we have
to support it.


Agree. As I replied to Keith, I am just interested in enabling the ZNS
models that come with append disabled.




Do you have any early suggestion on how this patch should look to be
upstreamable?


My position is that at this point in time we should not consider it.
Zone Append is the major feature in ZNS that solves the issue in ZAC/ZBC.
I want to see broad industry support for it instead of having to add more
code just for zone append emulation than actual current ZNS support.  If
in a few years the marketplace has decided and there are lots of drives
available in the consumer market or OEM channels, we'll have to reconsider
and potentially merge Zone Append emulation.  But my deep hope is that
this does not happen, as it sets us back 10 years in the standards of
zoned storage support again.


I understand that you want vendor alignment in the NVMe driver and I
agree. We are not pushing for a non-append model - you can see that we
are investing effort in implementing the append path in the block layer
and io_uring, and we will continue doing so as patches get merged.

This said, we do have some OEM models that do not implement append and I
would like them to be supported in Linux. As you know, new TPs are being
standardized now and the append emulation is the basis for adding
support for them. I do not believe it is unreasonable to find a way to
add support for these SSDs.

If you completely close the door on this approach, the alternative is
carrying off-tree patches to the several OEMs that use these devices.
This is not good for the zoned ecosystem nor for the future of Zone
Append.

Are you open to us doing some characterization and if the impact
to the fast path is not significant, moving ahead to a Zone Append
emulation like in SCSI? I will promise that we will remove this path if
requests for these devices terminate.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-18 Thread Javier Gonzalez

On 18.08.2020 09:58, Keith Busch wrote:

On Tue, Aug 18, 2020 at 11:50:33AM +0200, Javier Gonzalez wrote:

a number of customers are requiring the use of normal writes, which we
want to support.


A device that supports append is completely usable for those customers,
too. There's no need to create divergence in this driver.


Not really. You know as well as I do that some features are disabled for
a particular SSD model on customer requirements. Generic models
implementing append can submit both I/Os, but those that remove append
are left out.

I would like to understand how we can enable these NVMe-compatible
models in Linux. If it is a performance concern, we will address it.



Re: [PATCH 2/2] nvme: add emulation for zone-append

2020-08-18 Thread Javier Gonzalez

On 18.08.2020 09:12, Christoph Hellwig wrote:

On Tue, Aug 18, 2020 at 10:59:36AM +0530, Kanchan Joshi wrote:

If the drive does not support zone-append natively, enable emulation using
regular writes.
Make the emulated zone-append command write-lock the zone, preventing
concurrent append/write on the same zone.


I really don't think we should add this.  ZNS and the Linux support
were all designed with Zone Append in mind, and then your company did
the nastiest possible move violating the normal NVMe procedures to make
it optional.  But that doesn't change the fact that Linux should keep
requiring it, especially with the amount of code added here and how it
hooks in the fast path.


I understand that the NVMe process was agitated and that the current ZNS
implementation in Linux relies on append support from the device
perspective. However, the current TP does allow for not implementing
append, and a number of customers are requiring the use of normal
writes, which we want to support.

During the initial patch review we discussed this and we agreed that the
block layer is designed for append on zoned devices, and that for the
foreseeable future this was not going to change. We therefore took the
feedback and followed a similar approach as in the SCSI driver for
implementing append emulation.

We are happy to do more characterization on the impact of these hooks in
the non-zoned fast path and eventually change the approach if this
proves to be a problem. Our thought is to isolate any potential
performance degradation to the zoned path using the emulation (we do not
see any ATM).

Do you have any early suggestion on how this patch should look to be
upstreamable?

Javier


Re: [PATCH 1/3] lightnvm: pblk: refactor metadata paths

2018-08-29 Thread Javier Gonzalez

> On 29 Aug 2018, at 15.02, Matias Bjørling  wrote:
> 
> On 08/29/2018 10:56 AM, Javier González wrote:
>> pblk maintains two different metadata paths for smeta and emeta, which
>> store metadata at the start of the line and at the end of the line,
>> respectively. Until now, these paths have been common for writing and
>> retrieving metadata; however, as these paths diverge, the common code
>> becomes less clear and unnecessarily complicated.
>> In preparation for further changes to the metadata write path, this
>> patch separates the write and read paths for smeta and emeta and
>> removes the synchronous emeta path as it is not used anymore (emeta is
>> scheduled asynchronously to prevent jittering due to internal I/Os).
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/pblk-core.c | 338 
>> ++-
>>  drivers/lightnvm/pblk-gc.c   |   2 +-
>>  drivers/lightnvm/pblk-recovery.c |   4 +-
>>  drivers/lightnvm/pblk.h  |   4 +-
>>  4 files changed, 163 insertions(+), 185 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index dbf037b2b32f..09160ec02c5f 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -661,12 +661,137 @@ u64 pblk_lookup_page(struct pblk *pblk, struct 
>> pblk_line *line)
>>  return paddr;
>>  }
>>  -/*
>> - * Submit emeta to one LUN in the raid line at the time to avoid a deadlock 
>> when
>> - * taking the per LUN semaphore.
>> - */
>> -static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line 
>> *line,
>> - void *emeta_buf, u64 paddr, int dir)
>> +u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
>> +{
>> +struct nvm_tgt_dev *dev = pblk->dev;
>> +struct nvm_geo *geo = &dev->geo;
>> +struct pblk_line_meta *lm = &pblk->lm;
>> +int bit;
>> +
>> +/* This usually only happens on bad lines */
>> +bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
>> +if (bit >= lm->blk_per_line)
>> +return -1;
>> +
>> +return bit * geo->ws_opt;
>> +}
>> +
>> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> +{
>> +struct nvm_tgt_dev *dev = pblk->dev;
>> +struct pblk_line_meta *lm = &pblk->lm;
>> +struct bio *bio;
>> +struct nvm_rq rqd;
>> +u64 paddr = pblk_line_smeta_start(pblk, line);
>> +int i, ret;
>> +
>> +memset(&rqd, 0, sizeof(struct nvm_rq));
>> +
>> +rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>> +&rqd.dma_meta_list);
>> +if (!rqd.meta_list)
>> +return -ENOMEM;
>> +
>> +rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
>> +rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> 
> If patch 2 is put first, then this is not needed.
> 

True... I will reorder with the changes on 2/3.







Re: [PATCH] lightnvm: pblk: fix rqd.error return value in pblk_blk_erase_sync

2018-08-03 Thread Javier Gonzalez
> On 2 Aug 2018, at 22.50, Matias Bjørling  wrote:
> 
> rqd.error is masked by the return value of pblk_submit_io_sync.
> The rqd structure is then passed on to the end_io function, which
> assumes that any error should lead to a chunk being marked
> offline/bad. Since pblk_submit_io_sync can fail before the
> command is issued to the device, the error value may not correspond
> to a media failure, leading to chunks being prematurely retired.
> 
> Also, the pblk_blk_erase_sync function prints an error message in case
> the erase fails. Since the caller prints an error message by itself,
> remove the error message in this function.
> 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/lightnvm/pblk-core.c | 19 ++-
> 1 file changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 72acf2f6dbd6..814204d22a2e 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -886,10 +886,8 @@ static void pblk_setup_e_rq(struct pblk *pblk, struct 
> nvm_rq *rqd,
> 
> static int pblk_blk_erase_sync(struct pblk *pblk, struct ppa_addr ppa)
> {
> - struct nvm_rq rqd;
> - int ret = 0;
> -
> - memset(&rqd, 0, sizeof(struct nvm_rq));
> + struct nvm_rq rqd = {0};

This is a matter of taste, but if you want to squeeze it in here, it is
fine by me. There are other places with the same pattern; if you feel
strongly about this then please send a patch changing it in all the
places.

> + int ret;
> 
>   pblk_setup_e_rq(pblk, &rqd, ppa);
> 
> @@ -897,19 +895,6 @@ static int pblk_blk_erase_sync(struct pblk *pblk, struct 
> ppa_addr ppa)
>* with writes. Thus, there is no need to take the LUN semaphore.
>*/
>   ret = pblk_submit_io_sync(pblk, &rqd);
> - if (ret) {
> - struct nvm_tgt_dev *dev = pblk->dev;
> - struct nvm_geo *geo = &dev->geo;
> -
> - pblk_err(pblk, "could not sync erase line:%d,blk:%d\n",
> - pblk_ppa_to_line(ppa),
> - pblk_ppa_to_pos(geo, ppa));
> -
> - rqd.error = ret;
> - goto out;
> - }
> -
> -out:
>   rqd.private = pblk;
>   __pblk_end_io_erase(pblk, &rqd);
> 
> --
> 2.11.0

Otherwise, it looks like a good cleanup. Thanks.

Reviewed-by: Javier González 







Re: [PATCH v2] lightnvm: pblk: expose generic disk name on pr_* msgs

2018-06-29 Thread Javier Gonzalez
> On 29 Jun 2018, at 13.22, Matias Bjørling  wrote:
> 
> On 06/29/2018 01:07 PM, Javier Gonzalez wrote:
>>> On 29 Jun 2018, at 12.59, Matias Bjørling  wrote:
>>> 
>>> On 06/29/2018 11:36 AM, Javier Gonzalez wrote:
>>>>> On 28 Jun 2018, at 15.43, Matias Bjørling  wrote:
>>>>> 
>>>>> The error messages in pblk does not say which pblk instance that
>>>>> a message occurred from. Update each error message to reflect the
>>>>> instance it belongs to, and also prefix it with pblk, so we know
>>>>> the message comes from the pblk module.
>>>>> 
>>>>> Signed-off-by: Matias Bjørling 
>>>> This could be a good moment to make error reporting more consistent.
>>>> Some times we used "could not ..." and others "failed to ...".
>>> 
>>> Agree. This should probably be another patch, such that it does not
>>> pollute the raw conversion.
>> Cool.
>>>> There is also an unnecessary error for a memory allocation (see
>>>> checkpatch).
>>> 
>>> That is intentional since I wanted to keep the existing wording.
>>> Although, I also looked at it, and kind of came to the conclusion that
>>> it was put there for a reason (since it reports more than a plain
>>> no-memory message).
>> Ok. We can have a separate patch to make this better, as mentioned
>> below.
>>>> See below.
>>>>> ---
>>>>> 
>>>>> Forgot to test with NVM_PBLK_DEBUG on. Fixed up the broken code.
>>>>> ---
>>>>> drivers/lightnvm/pblk-core.c | 51 +-
>>>>> drivers/lightnvm/pblk-gc.c   | 32 -
>>>>> drivers/lightnvm/pblk-init.c | 78 
>>>>> 
>>>>> drivers/lightnvm/pblk-rb.c   |  8 ++---
>>>>> drivers/lightnvm/pblk-read.c | 25 ++---
>>>>> drivers/lightnvm/pblk-recovery.c | 44 +++
>>>>> drivers/lightnvm/pblk-sysfs.c|  5 ++-
>>>>> drivers/lightnvm/pblk-write.c| 21 +--
>>>>> drivers/lightnvm/pblk.h  | 29 ++-
>>>>> 9 files changed, 153 insertions(+), 140 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>> index 66ab1036f2fb..b829460fe827 100644
>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>> @@ -35,7 +35,7 @@ static void pblk_line_mark_bb(struct work_struct *work)
>>>>>   line = &pblk->lines[pblk_ppa_to_line(*ppa)];
>>>>>   pos = pblk_ppa_to_pos(&dev->geo, *ppa);
>>>>> 
>>>>> - pr_err("pblk: failed to mark bb, line:%d, pos:%d\n",
>>>>> + pblk_err(pblk, "failed to mark bb, line:%d, pos:%d\n",
>>>>>   line->id, pos);
>>>>>   }
>>>>> 
>>>>> @@ -51,12 +51,12 @@ static void pblk_mark_bb(struct pblk *pblk, struct 
>>>>> pblk_line *line,
>>>>>   struct ppa_addr *ppa;
>>>>>   int pos = pblk_ppa_to_pos(geo, ppa_addr);
>>>>> 
>>>>> - pr_debug("pblk: erase failed: line:%d, pos:%d\n", line->id, pos);
>>>>> + pblk_debug(pblk, "erase failed: line:%d, pos:%d\n", line->id, pos);
>>>>>   atomic_long_inc(&pblk->erase_failed);
>>>>> 
>>>>>   atomic_dec(&line->blk_in_line);
>>>>>   if (test_and_set_bit(pos, line->blk_bitmap))
>>>>> - pr_err("pblk: attempted to erase bb: line:%d, pos:%d\n",
>>>>> + pblk_err(pblk, "attempted to erase bb: line:%d, pos:%d\n",
>>>>>   line->id, pos);
>>>>> 
>>>>>   /* Not necessary to mark bad blocks on 2.0 spec. */
>>>>> @@ -274,7 +274,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq 
>>>>> *rqd, int type)
>>>>>   pool = &pblk->e_rq_pool;
>>>>>   break;
>>>>>   default:
>>>>> - pr_err("pblk: trying to free unknown rqd type\n");
>>>>> + pblk_err(pblk, "trying to free unknown rqd type\n");
>>>>>   return;
>>>>>   }
>>>>> 
>>>>> @@ -310,7 +

Re: [PATCH] lightnvm: pblk: limit get chk meta request size

2018-06-12 Thread Javier Gonzalez
> On 12 Jun 2018, at 03.30, Matias Bjørling  wrote:
> 
> For devices that do not specify a limit on their transfer size, the
> get_chk_meta command may send down a single I/O retrieving the full
> chunk metadata table, resulting in large 2-4 MB I/O requests. Instead,
> split up the I/Os to a maximum of 256KB and issue them separately to
> improve I/O latency.
> 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/nvme/host/lightnvm.c | 10 --
> 1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index b9989717418d..3b644b0e9713 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -583,7 +583,13 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>   struct ppa_addr ppa;
>   size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
>   size_t log_pos, offset, len;
> - int ret, i;
> + int ret, i, max_len;
> +
> + /*
> +  * limit requests to maximum 256K to avoid issuing arbitrary large
> +  * requests when the device does not specify a maximum transfer size.
> +  */
> + max_len = min_t(unsigned int, ctrl->max_hw_sectors << 9, 256 * 1024);
> 
>   /* Normalize lba address space to obtain log offset */
>   ppa.ppa = slba;
> @@ -596,7 +602,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>   offset = log_pos * sizeof(struct nvme_nvm_chk_meta);
> 
>   while (left) {
> - len = min_t(unsigned int, left, ctrl->max_hw_sectors << 9);
> + len = min_t(unsigned int, left, max_len);
> 
>   ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
>   dev_meta, len, offset);
> --
> 2.11.0

Looks good to me.

Reviewed-by: Javier González 



signature.asc
Description: Message signed with OpenPGP


Re: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC

2018-05-28 Thread Javier Gonzalez
Javier

I somehow missed these patches in the mailing list. Sorry for coming
with feedback this late. I'll look at my filters - in any case, would
you mind Cc'ing me in the future?

> On 28 May 2018, at 10.58, Matias Bjørling  wrote:
> 
> From: Igor Konopko 
> 
> During sequential workloads we can meet the case when almost all the
> lines are fully written with data. In that case the rate limiter will
> significantly reduce the max number of requests for user IOs.

Do you mean random writes? On fully sequential, a line will either be
fully written, fully invalidated or on its way to be written. When
invalidating the line, then the whole line will be invalidated and GC
will free it without having to move valid data.

> 
> Unfortunately, in the case when the round buffer is flushed to the
> drive and the entries are not yet removed (which is ok, since there are
> still enough free entries in the round buffer for user IO), we hang on
> user IO because there are not enough entries in the rate limiter. The
> reason is that rate limiter user entries are decreased after freeing
> the round buffer entries, which does not happen if there is still
> plenty of space in the round buffer.
> 
> This patch forces freeing the round buffer by calling
> pblk_rb_sync_l2p and thus making new free entries in the rate limiter
> when there are not enough of them for user IO.

I can see why you might have problems with very low OP due to the rate
limiter, but unfortunately this is not a good way of solving the
problem. When you do this, you basically make the L2P to point to the
device instead of pointing to the write cache, which in essence bypasses
mw_cuints. As a result, if a read comes in to one of the synced entries,
it will violate the device-host contract and most probably fail (for
sure fail on > SLC).

I think that the right way of solving this problem is separating the
write and GC buffers and then assigning tokens to them. The write thread
will then consume both buffers based on these tokens. In this case, user
I/O will have a buffer for itself, which will be guaranteed to advance
at the rate the rate-limiter is allowing it to. Note that the 2 buffers
can be a single buffer with a new set of pointers so that the lookup can
be done with a single bit.

I have been looking for time to implement this for a while. If you want
to give it a go, we can talk and I can give you some pointers on
potential issues I have thought about.

Javier




Re: [GIT PULL 16/20] lightnvm: error handling when whole line is bad

2018-05-28 Thread Javier Gonzalez
> On 28 May 2018, at 10.58, Matias Bjørling  wrote:
> 
> From: Igor Konopko 
> 
> When all the blocks (chunks) in a line are marked as bad (offline)
> we shouldn't try to read smeta during the init process.
> 
> Currently we are trying to do so by passing -1 as the PPA address,
> which causes multiple warnings that we are issuing IOs to out-of-bounds
> PPAs.
> 
> Signed-off-by: Igor Konopko 
> Signed-off-by: Marcin Dziegielewski 
> Updated title.
> Signed-off-by: Matias Bjørling 
> ---
> drivers/lightnvm/pblk-core.c | 5 +
> 1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a20b41c355c5..e3e883547198 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -868,6 +868,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct 
> pblk_line *line)
> {
>   u64 bpaddr = pblk_line_smeta_start(pblk, line);
> 
> + if (bpaddr == -1) {
> + /* Whole line is bad - do not try to read smeta. */
> + return 1;
> + }

This case cannot occur in the only user of the function
(pblk_recov_l2p()). In the previous check (pblk_line_was_written()), we
verify the state of the line and the position of the first good chunk. In
the case of a bad line (fewer chunks than a given threshold to allow
emeta), recovery will not be carried out on the line.
Javier

Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0

2018-05-28 Thread Javier Gonzalez
> On 28 May 2018, at 10.58, Matias Bjørling  wrote:
> 
> From: Marcin Dziegielewski 
> 
> Some devices can expose mw_cunits equal to 0, which can cause creation
> of a too small write buffer and cause performance to drop on write
> workloads.
> 
> To handle that, we use the default value for MLC, and because it
> covers both the 1.2 and 2.0 OC specifications, setting up mw_cunits
> in the nvme_nvm_setup_12 function is no longer necessary.
> 
> Signed-off-by: Marcin Dziegielewski 
> Signed-off-by: Igor Konopko 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/lightnvm/pblk-init.c | 10 +-
> drivers/nvme/host/lightnvm.c |  1 -
> 2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index d65d2f972ccf..0f277744266b 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>   atomic64_set(&pblk->nr_flush, 0);
>   pblk->nr_flush_rst = 0;
> 
> - pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> + if (geo->mw_cunits) {
> + pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> + } else {
> + pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
> + /*
> +  * Some devices can expose mw_cunits equal to 0, so let's use
> +  * here default safe value for MLC.
> +  */
> + }
> 
>   pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>   max_write_ppas = pblk->min_write_pgs * geo->all_luns;
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 41279da799ed..c747792da915 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
> 
>   geo->ws_min = sec_per_pg;
>   geo->ws_opt = sec_per_pg;
> - geo->mw_cunits = geo->ws_opt << 3;  /* default to MLC safe values */
> 
>   /* Do not impose values for maximum number of open blocks as it is
>* unspecified in 1.2. Users of 1.2 must be aware of this and eventually
> --
> 2.11.0

By doing this, future 1.2 users (beyond pblk) will fail to have a valid
mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe
that we should have the default value for 1.2 either way.

A more generic way of doing this would be to have a default value for
2.0 too, in case mw_cunits is reported as 0.

Javier





Re: [PATCH 04/12] lightnvm: convert to bioset_init()/mempool_init()

2018-05-22 Thread Javier Gonzalez
> On 21 May 2018, at 00.25, Kent Overstreet  wrote:
> 
> Signed-off-by: Kent Overstreet 
> ---
> drivers/lightnvm/pblk-core.c | 30 ++---
> drivers/lightnvm/pblk-init.c | 72 
> drivers/lightnvm/pblk-read.c |  4 +-
> drivers/lightnvm/pblk-recovery.c |  2 +-
> drivers/lightnvm/pblk-write.c|  8 ++--
> drivers/lightnvm/pblk.h  | 14 +++
> 6 files changed, 65 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 94d5d97c9d..934341b104 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -40,7 +40,7 @@ static void pblk_line_mark_bb(struct work_struct *work)
>   }
> 
>   kfree(ppa);
> - mempool_free(line_ws, pblk->gen_ws_pool);
> + mempool_free(line_ws, &pblk->gen_ws_pool);
> }
> 
> static void pblk_mark_bb(struct pblk *pblk, struct pblk_line *line,
> @@ -102,7 +102,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
>   struct pblk *pblk = rqd->private;
> 
>   __pblk_end_io_erase(pblk, rqd);
> - mempool_free(rqd, pblk->e_rq_pool);
> + mempool_free(rqd, &pblk->e_rq_pool);
> }
> 
> /*
> @@ -237,15 +237,15 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int 
> type)
>   switch (type) {
>   case PBLK_WRITE:
>   case PBLK_WRITE_INT:
> - pool = pblk->w_rq_pool;
> + pool = &pblk->w_rq_pool;
>   rq_size = pblk_w_rq_size;
>   break;
>   case PBLK_READ:
> - pool = pblk->r_rq_pool;
> + pool = &pblk->r_rq_pool;
>   rq_size = pblk_g_rq_size;
>   break;
>   default:
> - pool = pblk->e_rq_pool;
> + pool = &pblk->e_rq_pool;
>   rq_size = pblk_g_rq_size;
>   }
> 
> @@ -265,13 +265,13 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq 
> *rqd, int type)
>   case PBLK_WRITE:
>   kfree(((struct pblk_c_ctx *)nvm_rq_to_pdu(rqd))->lun_bitmap);
>   case PBLK_WRITE_INT:
> - pool = pblk->w_rq_pool;
> + pool = &pblk->w_rq_pool;
>   break;
>   case PBLK_READ:
> - pool = pblk->r_rq_pool;
> + pool = &pblk->r_rq_pool;
>   break;
>   case PBLK_ERASE:
> - pool = pblk->e_rq_pool;
> + pool = &pblk->e_rq_pool;
>   break;
>   default:
>   pr_err("pblk: trying to free unknown rqd type\n");
> @@ -292,7 +292,7 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio 
> *bio, int off,
> 
>   for (i = off; i < nr_pages + off; i++) {
>   bv = bio->bi_io_vec[i];
> - mempool_free(bv.bv_page, pblk->page_bio_pool);
> + mempool_free(bv.bv_page, &pblk->page_bio_pool);
>   }
> }
> 
> @@ -304,12 +304,12 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio 
> *bio, gfp_t flags,
>   int i, ret;
> 
>   for (i = 0; i < nr_pages; i++) {
> - page = mempool_alloc(pblk->page_bio_pool, flags);
> + page = mempool_alloc(&pblk->page_bio_pool, flags);
> 
>   ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
>   if (ret != PBLK_EXPOSED_PAGE_SIZE) {
>   pr_err("pblk: could not add page to bio\n");
> - mempool_free(page, pblk->page_bio_pool);
> + mempool_free(page, &pblk->page_bio_pool);
>   goto err;
>   }
>   }
> @@ -1593,7 +1593,7 @@ static void pblk_line_put_ws(struct work_struct *work)
>   struct pblk_line *line = line_put_ws->line;
> 
>   __pblk_line_put(pblk, line);
> - mempool_free(line_put_ws, pblk->gen_ws_pool);
> + mempool_free(line_put_ws, &pblk->gen_ws_pool);
> }
> 
> void pblk_line_put(struct kref *ref)
> @@ -1610,7 +1610,7 @@ void pblk_line_put_wq(struct kref *ref)
>   struct pblk *pblk = line->pblk;
>   struct pblk_line_ws *line_put_ws;
> 
> - line_put_ws = mempool_alloc(pblk->gen_ws_pool, GFP_ATOMIC);
> + line_put_ws = mempool_alloc(&pblk->gen_ws_pool, GFP_ATOMIC);
>   if (!line_put_ws)
>   return;
> 
> @@ -1752,7 +1752,7 @@ void pblk_line_close_ws(struct work_struct *work)
>   struct pblk_line *line = line_ws->line;
> 
>   pblk_line_close(pblk, line);
> - mempool_free(line_ws, pblk->gen_ws_pool);
> + mempool_free(line_ws, &pblk->gen_ws_pool);
> }
> 
> void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
> @@ -1761,7 +1761,7 @@ void pblk_gen_run_ws(struct pblk *pblk, struct 
> pblk_line *line, void *priv,
> {
>   struct pblk_line_ws *line_ws;
> 
> - line_ws = mempool_alloc(pblk->gen_ws_pool, gfp_mask);
> + line_ws = mempool_alloc(&pblk->gen_ws_pool, gfp_mask);
> 
>   line_ws->pblk = pblk;
>   line_ws->line = line;
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 91a5bc2556..9a984abd3d 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ 

Re: [PATCH 04/12] lightnvm: convert to bioset_init()/mempool_init()

2018-05-22 Thread Javier Gonzalez
> On 21 May 2018, at 00.25, Kent Overstreet  wrote:
> 
> Signed-off-by: Kent Overstreet 
> ---
> drivers/lightnvm/pblk-core.c | 30 ++---
> drivers/lightnvm/pblk-init.c | 72 
> drivers/lightnvm/pblk-read.c |  4 +-
> drivers/lightnvm/pblk-recovery.c |  2 +-
> drivers/lightnvm/pblk-write.c|  8 ++--
> drivers/lightnvm/pblk.h  | 14 +++
> 6 files changed, 65 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 94d5d97c9d..934341b104 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -40,7 +40,7 @@ static void pblk_line_mark_bb(struct work_struct *work)
>   }
> 
>   kfree(ppa);
> - mempool_free(line_ws, pblk->gen_ws_pool);
> + mempool_free(line_ws, >gen_ws_pool);
> }
> 
> static void pblk_mark_bb(struct pblk *pblk, struct pblk_line *line,
> @@ -102,7 +102,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
>   struct pblk *pblk = rqd->private;
> 
>   __pblk_end_io_erase(pblk, rqd);
> - mempool_free(rqd, pblk->e_rq_pool);
> + mempool_free(rqd, >e_rq_pool);
> }
> 
> /*
> @@ -237,15 +237,15 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int 
> type)
>   switch (type) {
>   case PBLK_WRITE:
>   case PBLK_WRITE_INT:
> - pool = pblk->w_rq_pool;
> + pool = >w_rq_pool;
>   rq_size = pblk_w_rq_size;
>   break;
>   case PBLK_READ:
> - pool = pblk->r_rq_pool;
> + pool = >r_rq_pool;
>   rq_size = pblk_g_rq_size;
>   break;
>   default:
> - pool = pblk->e_rq_pool;
> + pool = >e_rq_pool;
>   rq_size = pblk_g_rq_size;
>   }
> 
> @@ -265,13 +265,13 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq 
> *rqd, int type)
>   case PBLK_WRITE:
>   kfree(((struct pblk_c_ctx *)nvm_rq_to_pdu(rqd))->lun_bitmap);
>   case PBLK_WRITE_INT:
> - pool = pblk->w_rq_pool;
> + pool = >w_rq_pool;
>   break;
>   case PBLK_READ:
> - pool = pblk->r_rq_pool;
> + pool = >r_rq_pool;
>   break;
>   case PBLK_ERASE:
> - pool = pblk->e_rq_pool;
> + pool = >e_rq_pool;
>   break;
>   default:
>   pr_err("pblk: trying to free unknown rqd type\n");
> @@ -292,7 +292,7 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio 
> *bio, int off,
> 
>   for (i = off; i < nr_pages + off; i++) {
>   bv = bio->bi_io_vec[i];
> - mempool_free(bv.bv_page, pblk->page_bio_pool);
> + mempool_free(bv.bv_page, >page_bio_pool);
>   }
> }
> 
> @@ -304,12 +304,12 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio 
> *bio, gfp_t flags,
>   int i, ret;
> 
>   for (i = 0; i < nr_pages; i++) {
> - page = mempool_alloc(pblk->page_bio_pool, flags);
> + page = mempool_alloc(>page_bio_pool, flags);
> 
>   ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
>   if (ret != PBLK_EXPOSED_PAGE_SIZE) {
>   pr_err("pblk: could not add page to bio\n");
> - mempool_free(page, pblk->page_bio_pool);
> + mempool_free(page, >page_bio_pool);
>   goto err;
>   }
>   }
> @@ -1593,7 +1593,7 @@ static void pblk_line_put_ws(struct work_struct *work)
>   struct pblk_line *line = line_put_ws->line;
> 
>   __pblk_line_put(pblk, line);
> - mempool_free(line_put_ws, pblk->gen_ws_pool);
> + mempool_free(line_put_ws, >gen_ws_pool);
> }
> 
> void pblk_line_put(struct kref *ref)
> @@ -1610,7 +1610,7 @@ void pblk_line_put_wq(struct kref *ref)
>   struct pblk *pblk = line->pblk;
>   struct pblk_line_ws *line_put_ws;
> 
> - line_put_ws = mempool_alloc(pblk->gen_ws_pool, GFP_ATOMIC);
> + line_put_ws = mempool_alloc(>gen_ws_pool, GFP_ATOMIC);
>   if (!line_put_ws)
>   return;
> 
> @@ -1752,7 +1752,7 @@ void pblk_line_close_ws(struct work_struct *work)
>   struct pblk_line *line = line_ws->line;
> 
>   pblk_line_close(pblk, line);
> - mempool_free(line_ws, pblk->gen_ws_pool);
> + mempool_free(line_ws, >gen_ws_pool);
> }
> 
> void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
> @@ -1761,7 +1761,7 @@ void pblk_gen_run_ws(struct pblk *pblk, struct 
> pblk_line *line, void *priv,
> {
>   struct pblk_line_ws *line_ws;
> 
> - line_ws = mempool_alloc(pblk->gen_ws_pool, gfp_mask);
> + line_ws = mempool_alloc(&pblk->gen_ws_pool, gfp_mask);
> 
>   line_ws->pblk = pblk;
>   line_ws->line = line;
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 91a5bc2556..9a984abd3d 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -23,7 +23,7 @@
> static 

Re: [PATCH v2 3/3] lightnvm: pblk: fix smeta write error path

2018-04-30 Thread Javier Gonzalez
> On 24 Apr 2018, at 07.45, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Smeta write errors were previously ignored. Skip these
> lines instead and throw them back on the free
> list, so the chunks will go through a reset cycle
> before we attempt to use the line again.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c | 7 ---
> 1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 413cf3b..dec1bb4 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -849,9 +849,10 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, 
> struct pblk_line *line,
>   atomic_dec(&pblk->inflight_io);
> 
>   if (rqd.error) {
> - if (dir == PBLK_WRITE)
> + if (dir == PBLK_WRITE) {
>   pblk_log_write_err(pblk, &rqd);
> - else if (dir == PBLK_READ)
> + ret = 1;
> + } else if (dir == PBLK_READ)
>   pblk_log_read_err(pblk, &rqd);
>   }
> 
> @@ -1120,7 +1121,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct 
> pblk_line *line,
> 
>   if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
>   pr_debug("pblk: line smeta I/O failed. Retry\n");
> - return 1;
> + return 0;
>   }
> 
>   bitmap_copy(line->invalid_bitmap, line->map_bitmap, lm->sec_per_line);
> --
> 2.7.4

LGTM

Reviewed-by: Javier González 



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 2/3] lightnvm: pblk: garbage collect lines with failed writes

2018-04-30 Thread Javier Gonzalez

> On 30 Apr 2018, at 11.14, Javier Gonzalez <jav...@cnexlabs.com> wrote:
> 
>> On 24 Apr 2018, at 07.45, Hans Holmberg <hans.ml.holmb...@owltronix.com> 
>> wrote:
>> 
>> From: Hans Holmberg <hans.holmb...@cnexlabs.com>
>> 
>> Write failures should not happen under normal circumstances,
>> so in order to bring the chunk back into a known state as soon
>> as possible, evacuate all the valid data out of the line and let the
>> fw judge if the block can be written to in the next reset cycle.
>> 
>> Do this by introducing a new gc list for lines with failed writes,
>> and ensure that the rate limiter allocates a small portion of
>> the write bandwidth to get the job done.
>> 
>> The lba list is saved in memory for use during gc as we
>> cannot guarantee that the emeta data is readable if a write
>> error occurred.
>> 
>> Signed-off-by: Hans Holmberg <hans.holmb...@cnexlabs.com>
>> ---
>> drivers/lightnvm/pblk-core.c  |  45 ++-
>> drivers/lightnvm/pblk-gc.c| 102 
>> +++---
>> drivers/lightnvm/pblk-init.c  |  45 ---
>> drivers/lightnvm/pblk-rl.c|  29 ++--
>> drivers/lightnvm/pblk-sysfs.c |  15 ++-
>> drivers/lightnvm/pblk-write.c |   2 +
>> drivers/lightnvm/pblk.h   |  25 +--
>> 7 files changed, 199 insertions(+), 64 deletions(-)
>> 
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 7762e89..413cf3b 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
>> struct pblk_line *line)
>> 
>>  lockdep_assert_held(&line->lock);
>> 
>> -if (!vsc) {
>> +if (line->w_err_gc->has_write_err) {
>> +if (line->gc_group != PBLK_LINEGC_WERR) {
>> +line->gc_group = PBLK_LINEGC_WERR;
>> +move_list = &l_mg->gc_werr_list;
>> +pblk_rl_werr_line_in(&pblk->rl);
>> +}
>> +} else if (!vsc) {
>>  if (line->gc_group != PBLK_LINEGC_FULL) {
>>  line->gc_group = PBLK_LINEGC_FULL;
>>  move_list = &l_mg->gc_full_list;
>> @@ -1603,8 +1609,13 @@ static void __pblk_line_put(struct pblk *pblk, struct 
>> pblk_line *line)
>>  line->state = PBLK_LINESTATE_FREE;
>>  line->gc_group = PBLK_LINEGC_NONE;
>>  pblk_line_free(line);
>> -spin_unlock(&line->lock);
>> 
>> +if (line->w_err_gc->has_write_err) {
>> +pblk_rl_werr_line_out(&pblk->rl);
>> +line->w_err_gc->has_write_err = 0;
>> +}
>> +
>> +spin_unlock(&line->lock);
>>  atomic_dec(&pblk->pipeline_gc);
>> 
>>  spin_lock(&l_mg->free_lock);
>> @@ -1767,11 +1778,34 @@ void pblk_line_close_meta(struct pblk *pblk, struct 
>> pblk_line *line)
>> 
>>  spin_lock(&l_mg->close_lock);
>>  spin_lock(&line->lock);
>> +
>> +/* Update the in-memory start address for emeta, in case it has
>> + * shifted due to write errors
>> + */
>> +if (line->emeta_ssec != line->cur_sec)
>> +line->emeta_ssec = line->cur_sec;
>> +
>>  list_add_tail(&line->list, &l_mg->emeta_list);
>>  spin_unlock(&line->lock);
>>  spin_unlock(&l_mg->close_lock);
>> 
>>  pblk_line_should_sync_meta(pblk);
>> +
>> +
>> +}
>> +
>> +static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
>> +{
>> +struct pblk_line_meta *lm = &pblk->lm;
>> +struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> +unsigned int lba_list_size = lm->emeta_len[2];
>> +struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
>> +struct pblk_emeta *emeta = line->emeta;
>> +
>> +w_err_gc->lba_list = pblk_malloc(lba_list_size,
>> + l_mg->emeta_alloc_type, GFP_KERNEL);
>> +memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
>> +lba_list_size);
>> }
>> 
>> void pblk_line_close_ws(struct work_struct *work)
>> @@ -1780,6 +1814,13 @@ void pblk_line_close_ws(struct work_struct *work)
>>  ws);
>>  struct pblk *pblk = line_ws->pblk;
>>  struct pblk_line *line = line_ws->line;
>> +struct pblk_w_err_gc *w_e

Re: [PATCH v2 2/3] lightnvm: pblk: garbage collect lines with failed writes

2018-04-30 Thread Javier Gonzalez
> On 24 Apr 2018, at 07.45, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Write failures should not happen under normal circumstances,
> so in order to bring the chunk back into a known state as soon
> as possible, evacuate all the valid data out of the line and let the
> fw judge if the block can be written to in the next reset cycle.
> 
> Do this by introducing a new gc list for lines with failed writes,
> and ensure that the rate limiter allocates a small portion of
> the write bandwidth to get the job done.
> 
> The lba list is saved in memory for use during gc as we
> cannot guarantee that the emeta data is readable if a write
> error occurred.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c  |  45 ++-
> drivers/lightnvm/pblk-gc.c| 102 +++---
> drivers/lightnvm/pblk-init.c  |  45 ---
> drivers/lightnvm/pblk-rl.c|  29 ++--
> drivers/lightnvm/pblk-sysfs.c |  15 ++-
> drivers/lightnvm/pblk-write.c |   2 +
> drivers/lightnvm/pblk.h   |  25 +--
> 7 files changed, 199 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 7762e89..413cf3b 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
> struct pblk_line *line)
> 
>   lockdep_assert_held(&line->lock);
> 
> - if (!vsc) {
> + if (line->w_err_gc->has_write_err) {
> + if (line->gc_group != PBLK_LINEGC_WERR) {
> + line->gc_group = PBLK_LINEGC_WERR;
> + move_list = &l_mg->gc_werr_list;
> + pblk_rl_werr_line_in(&pblk->rl);
> + }
> + } else if (!vsc) {
>   if (line->gc_group != PBLK_LINEGC_FULL) {
>   line->gc_group = PBLK_LINEGC_FULL;
>   move_list = &l_mg->gc_full_list;
> @@ -1603,8 +1609,13 @@ static void __pblk_line_put(struct pblk *pblk, struct 
> pblk_line *line)
>   line->state = PBLK_LINESTATE_FREE;
>   line->gc_group = PBLK_LINEGC_NONE;
>   pblk_line_free(line);
> - spin_unlock(&line->lock);
> 
> + if (line->w_err_gc->has_write_err) {
> + pblk_rl_werr_line_out(&pblk->rl);
> + line->w_err_gc->has_write_err = 0;
> + }
> +
> + spin_unlock(&line->lock);
>   atomic_dec(&pblk->pipeline_gc);
> 
>   spin_lock(&l_mg->free_lock);
> @@ -1767,11 +1778,34 @@ void pblk_line_close_meta(struct pblk *pblk, struct 
> pblk_line *line)
> 
>   spin_lock(&l_mg->close_lock);
>   spin_lock(&line->lock);
> +
> + /* Update the in-memory start address for emeta, in case it has
> +  * shifted due to write errors
> +  */
> + if (line->emeta_ssec != line->cur_sec)
> + line->emeta_ssec = line->cur_sec;
> +
>   list_add_tail(&line->list, &l_mg->emeta_list);
>   spin_unlock(&line->lock);
>   spin_unlock(&l_mg->close_lock);
> 
>   pblk_line_should_sync_meta(pblk);
> +
> +
> +}
> +
> +static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
> +{
> + struct pblk_line_meta *lm = &pblk->lm;
> + struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> + unsigned int lba_list_size = lm->emeta_len[2];
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> + struct pblk_emeta *emeta = line->emeta;
> +
> + w_err_gc->lba_list = pblk_malloc(lba_list_size,
> +  l_mg->emeta_alloc_type, GFP_KERNEL);
> + memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
> + lba_list_size);
> }
> 
> void pblk_line_close_ws(struct work_struct *work)
> @@ -1780,6 +1814,13 @@ void pblk_line_close_ws(struct work_struct *work)
>   ws);
>   struct pblk *pblk = line_ws->pblk;
>   struct pblk_line *line = line_ws->line;
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> +
> + /* Write errors makes the emeta start address stored in smeta invalid,
> +  * so keep a copy of the lba list until we've gc'd the line
> +  */
> + if (w_err_gc->has_write_err)
> + pblk_save_lba_list(pblk, line);
> 
>   pblk_line_close(pblk, line);
>   mempool_free(line_ws, pblk->gen_ws_pool);
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index b0cc277..df88f1b 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -129,6 +129,53 @@ static void pblk_gc_line_ws(struct work_struct *work)
>   kfree(gc_rq_ws);
> }
> 
> +static __le64 *get_lba_list_from_emeta(struct pblk *pblk,
> +struct pblk_line *line)
> +{
> + struct line_emeta *emeta_buf;
> + struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> + struct pblk_line_meta *lm = &pblk->lm;
> + unsigned int lba_list_size = 

Re: [PATCH v2 1/3] lightnvm: pblk: rework write error recovery path

2018-04-30 Thread Javier Gonzalez
> On 24 Apr 2018, at 07.45, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> The write error recovery path is incomplete, so rework
> the write error recovery handling to do resubmits directly
> from the write buffer.
> 
> When a write error occurs, the remaining sectors in the chunk are
> mapped out and invalidated and the request inserted in a resubmit list.
> 
> The writer thread checks if there are any requests to resubmit,
> scans and invalidates any lbas that have been overwritten by later
> writes and resubmits the failed entries.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-init.c |   2 +
> drivers/lightnvm/pblk-rb.c   |  39 --
> drivers/lightnvm/pblk-recovery.c |  91 -
> drivers/lightnvm/pblk-write.c| 267 ++-
> drivers/lightnvm/pblk.h  |  11 +-
> 5 files changed, 181 insertions(+), 229 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index bfc488d..6f06727 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -426,6 +426,7 @@ static int pblk_core_init(struct pblk *pblk)
>   goto free_r_end_wq;
> 
>   INIT_LIST_HEAD(&pblk->compl_list);
> + INIT_LIST_HEAD(&pblk->resubmit_list);
> 
>   return 0;
> 
> @@ -1185,6 +1186,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct 
> gendisk *tdisk,
>   pblk->state = PBLK_STATE_RUNNING;
>   pblk->gc.gc_enabled = 0;
> 
> + spin_lock_init(&pblk->resubmit_lock);
>   spin_lock_init(&pblk->trans_lock);
>   spin_lock_init(&pblk->lock);
> 
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 024a366..00cd1f2 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -503,45 +503,6 @@ int pblk_rb_may_write_gc(struct pblk_rb *rb, unsigned 
> int nr_entries,
> }
> 
> /*
> - * The caller of this function must ensure that the backpointer will not
> - * overwrite the entries passed on the list.
> - */
> -unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
> -   struct list_head *list,
> -   unsigned int max)
> -{
> - struct pblk_rb_entry *entry, *tentry;
> - struct page *page;
> - unsigned int read = 0;
> - int ret;
> -
> - list_for_each_entry_safe(entry, tentry, list, index) {
> - if (read > max) {
> - pr_err("pblk: too many entries on list\n");
> - goto out;
> - }
> -
> - page = virt_to_page(entry->data);
> - if (!page) {
> - pr_err("pblk: could not allocate write bio page\n");
> - goto out;
> - }
> -
> - ret = bio_add_page(bio, page, rb->seg_size, 0);
> - if (ret != rb->seg_size) {
> - pr_err("pblk: could not add page to write bio\n");
> - goto out;
> - }
> -
> - list_del(&entry->index);
> - read++;
> - }
> -
> -out:
> - return read;
> -}
> -
> -/*
>  * Read available entries on rb and add them to the given bio. To avoid a 
> memory
>  * copy, a page reference to the write buffer is used to be added to the bio.
>  *
> diff --git a/drivers/lightnvm/pblk-recovery.c 
> b/drivers/lightnvm/pblk-recovery.c
> index 9cb6d5d..5983428 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -16,97 +16,6 @@
> 
> #include "pblk.h"
> 
> -void pblk_submit_rec(struct work_struct *work)
> -{
> - struct pblk_rec_ctx *recovery =
> - container_of(work, struct pblk_rec_ctx, ws_rec);
> - struct pblk *pblk = recovery->pblk;
> - struct nvm_rq *rqd = recovery->rqd;
> - struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
> - struct bio *bio;
> - unsigned int nr_rec_secs;
> - unsigned int pgs_read;
> - int ret;
> -
> - nr_rec_secs = bitmap_weight((unsigned long int *)&rqd->ppa_status,
> - NVM_MAX_VLBA);
> -
> - bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
> -
> - bio->bi_iter.bi_sector = 0;
> - bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> - rqd->bio = bio;
> - rqd->nr_ppas = nr_rec_secs;
> -
> - pgs_read = pblk_rb_read_to_bio_list(&pblk->rwb, bio, &recovery->failed,
> - nr_rec_secs);
> - if (pgs_read != nr_rec_secs) {
> - pr_err("pblk: could not read recovery entries\n");
> - goto err;
> - }
> -
> - if (pblk_setup_w_rec_rq(pblk, rqd, c_ctx)) {
> - pr_err("pblk: could not setup recovery request\n");
> - goto err;
> - }
> -
> -#ifdef CONFIG_NVM_DEBUG
> - atomic_long_add(nr_rec_secs, &pblk->recov_writes);
> -#endif
> -
> - ret = pblk_submit_io(pblk, rqd);
> - if 

Re: [PATCH v2 0/3] Rework write error handling in pblk

2018-04-30 Thread Javier Gonzalez
> On 28 Apr 2018, at 21.31, Matias Bjørling  wrote:
> 
> On 4/23/18 10:45 PM, Hans Holmberg wrote:
>> From: Hans Holmberg 
>> This patch series fixes the (currently incomplete) write error handling
>> in pblk by:
>>  * queuing and re-submitting failed writes in the write buffer
>>  * evacuating valid data in lines with write failures, so the
>>chunk(s) with write failures can be reset to a known state by the fw
>> Lines with failures in smeta are put back on the free list.
>> Failed chunks will be reset on the next use.
>> If a write fails in emeta, the lba list is cached so the line can be
>> garbage collected without scanning the out-of-band area.
>> Changes in V2:
>> - Added the recov_writes counter increase to the new path
>> - Moved lba list emeta reading during gc to a separate function
>> - Allocating the saved lba list with pblk_malloc instead of kmalloc
>> - Fixed formatting issues
>> - Removed dead code
>> Hans Holmberg (3):
>>   lightnvm: pblk: rework write error recovery path
>>   lightnvm: pblk: garbage collect lines with failed writes
>>   lightnvm: pblk: fix smeta write error path
>>  drivers/lightnvm/pblk-core.c |  52 +++-
>>  drivers/lightnvm/pblk-gc.c   | 102 +--
>>  drivers/lightnvm/pblk-init.c |  47 ---
>>  drivers/lightnvm/pblk-rb.c   |  39 --
>>  drivers/lightnvm/pblk-recovery.c |  91 -
>>  drivers/lightnvm/pblk-rl.c   |  29 -
>>  drivers/lightnvm/pblk-sysfs.c|  15 ++-
>>  drivers/lightnvm/pblk-write.c| 269 
>> ++-
>>  drivers/lightnvm/pblk.h  |  36 --
>>  9 files changed, 384 insertions(+), 296 deletions(-)
> 
> Thanks Hans. I've applied 1 & 3. The second did not apply cleanly to 
> for-4.18/core. Could you please resend a rebased version?

Hans' patches apply on top of the fixes I sent this week. I have just
sent the V2 and the patches still apply. You can find them at:
  https://github.com/OpenChannelSSD/linux/tree/for-4.18/pblk

Javier


signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 3/3] lightnvm: pblk: fix smeta write error path

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Smeta write errors were previously ignored. Skip these
> lines instead and throw them back on the free
> list, so the chunks will go through a reset cycle
> before we attempt to use the line again.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c | 7 ---
> 1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index f6135e4..485fe8c 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -849,9 +849,10 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, 
> struct pblk_line *line,
>   atomic_dec(&pblk->inflight_io);
> 
>   if (rqd.error) {
> - if (dir == PBLK_WRITE)
> + if (dir == PBLK_WRITE) {
>   pblk_log_write_err(pblk, &rqd);
> - else if (dir == PBLK_READ)
> + ret = 1;
> + } else if (dir == PBLK_READ)
>   pblk_log_read_err(pblk, &rqd);
>   }
> 
> @@ -1120,7 +1121,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct 
> pblk_line *line,
> 
>   if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
>   pr_debug("pblk: line smeta I/O failed. Retry\n");
> - return 1;
> + return 0;
>   }
> 
>   bitmap_copy(line->invalid_bitmap, line->map_bitmap, lm->sec_per_line);
> --
> 2.7.4

Looks good to me.

Reviewed-by: Javier González 




signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 2/3] lightnvm: pblk: garbage collect lines with failed writes

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Write failures should not happen under normal circumstances,
> so in order to bring the chunk back into a known state as soon
> as possible, evacuate all the valid data out of the line and let the
> fw judge if the block can be written to in the next reset cycle.
> 
> Do this by introducing a new gc list for lines with failed writes,
> and ensure that the rate limiter allocates a small portion of
> the write bandwidth to get the job done.
> 
> The lba list is saved in memory for use during gc as we
> cannot guarantee that the emeta data is readable if a write
> error occurred.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c  | 43 +--
> drivers/lightnvm/pblk-gc.c| 79 +++
> drivers/lightnvm/pblk-init.c  | 39 ++---
> drivers/lightnvm/pblk-rl.c| 29 +---
> drivers/lightnvm/pblk-sysfs.c | 15 ++--
> drivers/lightnvm/pblk-write.c |  2 ++
> drivers/lightnvm/pblk.h   | 25 +++---
> 7 files changed, 178 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 7762e89..f6135e4 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
> struct pblk_line *line)
> 
>   lockdep_assert_held(&line->lock);
> 
> - if (!vsc) {
> + if (line->w_err_gc->has_write_err) {
> + if (line->gc_group != PBLK_LINEGC_WERR) {
> + line->gc_group = PBLK_LINEGC_WERR;
> + move_list = &l_mg->gc_werr_list;
> + pblk_rl_werr_line_in(&pblk->rl);
> + }
> + } else if (!vsc) {
>   if (line->gc_group != PBLK_LINEGC_FULL) {
>   line->gc_group = PBLK_LINEGC_FULL;
>   move_list = &l_mg->gc_full_list;
> @@ -1603,8 +1609,13 @@ static void __pblk_line_put(struct pblk *pblk, struct 
> pblk_line *line)
>   line->state = PBLK_LINESTATE_FREE;
>   line->gc_group = PBLK_LINEGC_NONE;
>   pblk_line_free(line);
> - spin_unlock(&line->lock);
> 
> + if (line->w_err_gc->has_write_err) {
> + pblk_rl_werr_line_out(&pblk->rl);
> + line->w_err_gc->has_write_err = 0;
> + }
> +
> + spin_unlock(&line->lock);
>   atomic_dec(&gc->pipeline_gc);
> 
>   spin_lock(&l_mg->free_lock);
> @@ -1767,11 +1778,32 @@ void pblk_line_close_meta(struct pblk *pblk, struct 
> pblk_line *line)
> 
>   spin_lock(&l_mg->close_lock);
>   spin_lock(&line->lock);
> +
> + /* Update the in-memory start address for emeta, in case it has
> +  * shifted due to write errors
> +  */
> + if (line->emeta_ssec != line->cur_sec)
> + line->emeta_ssec = line->cur_sec;
> +
>   list_add_tail(&line->list, &l_mg->emeta_list);
>   spin_unlock(&line->lock);
>   spin_unlock(&l_mg->close_lock);
> 
>   pblk_line_should_sync_meta(pblk);
> +
> +
> +}
> +
> +static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
> +{
> + struct pblk_line_meta *lm = &pblk->lm;
> + unsigned int lba_list_size = lm->emeta_len[2];
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> + struct pblk_emeta *emeta = line->emeta;
> +
> + w_err_gc->lba_list = kmalloc(lba_list_size, GFP_KERNEL);
> + memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
> + lba_list_size);
> }
> 
> void pblk_line_close_ws(struct work_struct *work)
> @@ -1780,6 +1812,13 @@ void pblk_line_close_ws(struct work_struct *work)
>   ws);
>   struct pblk *pblk = line_ws->pblk;
>   struct pblk_line *line = line_ws->line;
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> +
> + /* Write errors makes the emeta start address stored in smeta invalid,
> +  * so keep a copy of the lba list until we've gc'd the line
> +  */
> + if (w_err_gc->has_write_err)
> + pblk_save_lba_list(pblk, line);
> 
>   pblk_line_close(pblk, line);
>   mempool_free(line_ws, pblk->gen_ws_pool);
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index b0cc277..62f0548 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -138,10 +138,10 @@ static void pblk_gc_line_prepare_ws(struct work_struct 
> *work)
>   struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>   struct pblk_line_meta *lm = &pblk->lm;
>   struct pblk_gc *gc = &pblk->gc;
> - struct line_emeta *emeta_buf;
> + struct line_emeta *emeta_buf = NULL;
>   struct pblk_line_ws *gc_rq_ws;
>   struct pblk_gc_rq *gc_rq;
> - __le64 *lba_list;
> + __le64 *lba_list = NULL;
>   unsigned long *invalid_bitmap;
>   int sec_left, nr_secs, bit;
>   int ret;

Re: [PATCH 1/3] lightnvm: pblk: rework write error recovery path

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> The write error recovery path is incomplete, so rework
> the write error recovery handling to do resubmits directly
> from the write buffer.
> 
> When a write error occurs, the remaining sectors in the chunk are
> mapped out and invalidated and the request inserted in a resubmit list.
> 
> The writer thread checks if there are any requests to resubmit,
> scans and invalidates any lbas that have been overwritten by later
> writes and resubmits the failed entries.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-init.c |   2 +
> drivers/lightnvm/pblk-recovery.c |  91 ---
> drivers/lightnvm/pblk-write.c| 241 ---
> drivers/lightnvm/pblk.h  |   8 +-
> 4 files changed, 180 insertions(+), 162 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index bfc488d..6f06727 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -426,6 +426,7 @@ static int pblk_core_init(struct pblk *pblk)
>   goto free_r_end_wq;
> 
>   INIT_LIST_HEAD(>compl_list);
> + INIT_LIST_HEAD(>resubmit_list);
> 
>   return 0;
> 
> @@ -1185,6 +1186,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct 
> gendisk *tdisk,
>   pblk->state = PBLK_STATE_RUNNING;
>   pblk->gc.gc_enabled = 0;
> 
> + spin_lock_init(&pblk->resubmit_lock);
>   spin_lock_init(&pblk->trans_lock);
>   spin_lock_init(&pblk->lock);
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c 
> b/drivers/lightnvm/pblk-recovery.c
> index 9cb6d5d..5983428 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -16,97 +16,6 @@
> 
> #include "pblk.h"
> 
> -void pblk_submit_rec(struct work_struct *work)
> -{
> - struct pblk_rec_ctx *recovery =
> - container_of(work, struct pblk_rec_ctx, ws_rec);
> - struct pblk *pblk = recovery->pblk;
> - struct nvm_rq *rqd = recovery->rqd;
> - struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
> - struct bio *bio;
> - unsigned int nr_rec_secs;
> - unsigned int pgs_read;
> - int ret;
> -
> - nr_rec_secs = bitmap_weight((unsigned long int *)&rqd->ppa_status,
> - NVM_MAX_VLBA);
> -
> - bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
> -
> - bio->bi_iter.bi_sector = 0;
> - bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> - rqd->bio = bio;
> - rqd->nr_ppas = nr_rec_secs;
> -
> - pgs_read = pblk_rb_read_to_bio_list(&pblk->rwb, bio, &recovery->failed,
> - nr_rec_secs);

Please, remove functions that are no longer used. Doing a pass on the
rest of the removed functions would be a good idea.

> - if (pgs_read != nr_rec_secs) {
> - pr_err("pblk: could not read recovery entries\n");
> - goto err;
> - }
> -
> - if (pblk_setup_w_rec_rq(pblk, rqd, c_ctx)) {

Same here

> - pr_err("pblk: could not setup recovery request\n");
> - goto err;
> - }
> -
> -#ifdef CONFIG_NVM_DEBUG
> - atomic_long_add(nr_rec_secs, &pblk->recov_writes);
> -#endif

Can you add this debug counter to the new path? I see you added other
counters, if it is a rename, can you put it on a separate patch?

> -
> - ret = pblk_submit_io(pblk, rqd);
> - if (ret) {
> - pr_err("pblk: I/O submission failed: %d\n", ret);
> - goto err;
> - }
> -
> - mempool_free(recovery, pblk->rec_pool);
> - return;
> -
> -err:
> - bio_put(bio);
> - pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> -}
> -
> -int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
> - struct pblk_rec_ctx *recovery, u64 *comp_bits,
> - unsigned int comp)
> -{
> - struct nvm_rq *rec_rqd;
> - struct pblk_c_ctx *rec_ctx;
> - int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded;
> -
> - rec_rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
> - rec_ctx = nvm_rq_to_pdu(rec_rqd);
> -
> - /* Copy completion bitmap, but exclude the first X completed entries */
> - bitmap_shift_right((unsigned long int *)&rec_rqd->ppa_status,
> - (unsigned long int *)comp_bits,
> - comp, NVM_MAX_VLBA);
> -
> - /* Save the context for the entries that need to be re-written and
> -  * update current context with the completed entries.
> -  */
> - rec_ctx->sentry = pblk_rb_wrap_pos(&pblk->rwb, c_ctx->sentry + comp);
> - if (comp >= c_ctx->nr_valid) {
> - rec_ctx->nr_valid = 0;
> - rec_ctx->nr_padded = nr_entries - comp;
> -
> - c_ctx->nr_padded = comp - c_ctx->nr_valid;
> - } else {
> - rec_ctx->nr_valid = c_ctx->nr_valid - comp;
> - 

Re: [PATCH 07/11] lightnvm: pblk: remove unnecessary indirection

2018-04-18 Thread Javier Gonzalez
> On 17 Apr 2018, at 05.11, Matias Bjørling  wrote:
> 
> On 4/16/18 12:25 PM, Javier González wrote:
>> Remove unnecessary indirection on the read path.
> 
> Title and description are the same. Can you elaborate what changed
> since pblk_submit_io now directly can be returned, and doesn't have
> its return value rewritten to NVM_IO_ERR?
> 

My bad - I assumed submit_io was giving NVM_ errors. I'll resend
returning NVM_IO_ERR - the indirection is still unnecessary (and it's
not used anywhere else in the code).

Javier


signature.asc
Description: Message signed with OpenPGP


Re: [PATCH] nvme: lightnvm: add granby support

2018-04-17 Thread Javier Gonzalez
> On 17 Apr 2018, at 03.55, Wei Xu  wrote:
> 
> Add a new lightnvm quirk to identify CNEX’s Granby controller.
> 
> Signed-off-by: Wei Xu 
> ---
> drivers/nvme/host/pci.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cb73bc8..9419e88 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2529,6 +2529,8 @@ static const struct pci_device_id nvme_id_table[] = {
>   .driver_data = NVME_QUIRK_LIGHTNVM, },
>   { PCI_DEVICE(0x1d1d, 0x2807),   /* CNEX WL */
>   .driver_data = NVME_QUIRK_LIGHTNVM, },
> + { PCI_DEVICE(0x1d1d, 0x2601),   /* CNEX Granby */
> + .driver_data = NVME_QUIRK_LIGHTNVM, },
>   { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xff) },
>   { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) },
>   { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
> --
> 2.7.4


Reviewed-by: Javier González 



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 08/12] lightnvm: implement get log report chunk helpers

2018-03-21 Thread Javier Gonzalez

> On 21 Mar 2018, at 20.27, Matias Bjørling  wrote:
> 
>> On 03/21/2018 03:36 PM, Keith Busch wrote:
>> On Wed, Mar 21, 2018 at 03:06:05AM -0700, Matias Bjørling wrote:
 outside of nvme core so that we can use it from lightnvm.
 
 Signed-off-by: Javier González 
 ---
   drivers/lightnvm/core.c  | 11 +++
   drivers/nvme/host/core.c |  6 ++--
   drivers/nvme/host/lightnvm.c | 74 
 
   drivers/nvme/host/nvme.h |  3 ++
   include/linux/lightnvm.h | 24 ++
   5 files changed, 115 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
 index 2e9e9f973a75..af642ce6ba69 100644
 --- a/drivers/nvme/host/core.c
 +++ b/drivers/nvme/host/core.c
 @@ -2127,9 +2127,9 @@ static int nvme_init_subsystem(struct nvme_ctrl 
 *ctrl, struct nvme_id_ctrl *id)
   return ret;
   }
   -static int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 -u8 log_page, void *log,
 -size_t size, size_t offset)
 +int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 + u8 log_page, void *log,
 + size_t size, size_t offset)
   {
   struct nvme_command c = { };
   unsigned long dwlen = size / 4 - 1;
 diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
 index 08f0f6b5bc06..ffd64a83c8c3 100644
 --- a/drivers/nvme/host/lightnvm.c
 +++ b/drivers/nvme/host/lightnvm.c
 @@ -35,6 +35,10 @@ enum nvme_nvm_admin_opcode {
   nvme_nvm_admin_set_bb_tbl= 0xf1,
   };
   
>>> 
>>> 
>>> 
   diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
 index 1ca08f4993ba..505f797f8c6c 100644
 --- a/drivers/nvme/host/nvme.h
 +++ b/drivers/nvme/host/nvme.h
 @@ -396,6 +396,9 @@ int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
   int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
   int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
   +int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 + u8 log_page, void *log, size_t size, size_t offset);
 +
   extern const struct attribute_group nvme_ns_id_attr_group;
   extern const struct block_device_operations nvme_ns_head_ops;
   
>>> 
>>> 
>>> Keith, Christoph, Sagi, Is it okay that these two changes that exposes
>>> the nvme_get_log_ext fn are carried through Jens' tree after the nvme
>>> tree for 4.17 has been pulled?
>> That's okay with me. Alteratively, if you want to split the generic nvme
>> part out, I can apply that immediately and the API will be in the first
>> nvme-4.17 pull request.
> 
> Will do. I've sent the patch in another mail. Thanks! :)

It’s fine with me.

Matias: do you take that part of the patch out directly on our tree?

Javier. 

Re: [PATCH 09/15] lightnvm: implement get log report chunk helpers

2018-03-01 Thread Javier Gonzalez

> On 1 Mar 2018, at 12.51, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 03/01/2018 12:02 PM, Javier Gonzalez wrote:
>>> On 1 Mar 2018, at 11.40, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/28/2018 04:49 PM, Javier González wrote:
>>>> The 2.0 spec provides a report chunk log page that can be retrieved
>>>> using the standard nvme get log page. This replaces the dedicated
>>>> get/put bad block table in 1.2.
>>>> This patch implements the helper functions to allow targets retrieve the
>>>> chunk metadata using get log page. It makes nvme_get_log_ext available
>>>> outside of nvme core so that we can use it from lightnvm.
>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>> ---
>>>>  drivers/lightnvm/core.c  | 11 +++
>>>>  drivers/nvme/host/core.c |  6 ++--
>>>>  drivers/nvme/host/lightnvm.c | 74 
>>>> 
>>>>  drivers/nvme/host/nvme.h |  3 ++
>>>>  include/linux/lightnvm.h | 24 ++
>>>>  5 files changed, 115 insertions(+), 3 deletions(-)
>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>> index ed33e0b11788..4141871f460d 100644
>>>> --- a/drivers/lightnvm/core.c
>>>> +++ b/drivers/lightnvm/core.c
>>>> @@ -712,6 +712,17 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev 
>>>> *tgt_dev,
>>>>nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
>>>>  }
>>>>  +int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta 
>>>> *meta,
>>>> +  struct ppa_addr ppa, int nchks)
>>>> +{
>>>> +  struct nvm_dev *dev = tgt_dev->parent;
>>>> +
>>>> +  nvm_ppa_tgt_to_dev(tgt_dev, &ppa, 1);
>>>> +
>>>> +  return dev->ops->get_chk_meta(tgt_dev->parent, meta,
>>>> +  (sector_t)ppa.ppa, nchks);
>>>> +}
>>>> +EXPORT_SYMBOL(nvm_get_chunk_meta);
>>>>int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr 
>>>> *ppas,
>>>>   int nr_ppas, int type)
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 2e9e9f973a75..af642ce6ba69 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -2127,9 +2127,9 @@ static int nvme_init_subsystem(struct nvme_ctrl 
>>>> *ctrl, struct nvme_id_ctrl *id)
>>>>return ret;
>>>>  }
>>>>  -static int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>>>> -  u8 log_page, void *log,
>>>> -  size_t size, size_t offset)
>>>> +int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>>>> +   u8 log_page, void *log,
>>>> +   size_t size, size_t offset)
>>>>  {
>>>>struct nvme_command c = { };
>>>>unsigned long dwlen = size / 4 - 1;
>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>> index f7135659f918..a1796241040f 100644
>>>> --- a/drivers/nvme/host/lightnvm.c
>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>> @@ -35,6 +35,10 @@ enum nvme_nvm_admin_opcode {
>>>>nvme_nvm_admin_set_bb_tbl   = 0xf1,
>>>>  };
>>>>  +enum nvme_nvm_log_page {
>>>> +  NVME_NVM_LOG_REPORT_CHUNK   = 0xca,
>>>> +};
>>>> +
>>>>  struct nvme_nvm_ph_rw {
>>>>__u8opcode;
>>>>__u8flags;
>>>> @@ -236,6 +240,16 @@ struct nvme_nvm_id20 {
>>>>__u8vs[1024];
>>>>  };
>>>>  +struct nvme_nvm_chk_meta {
>>>> +	__u8	state;
>>>> +	__u8	type;
>>>> +	__u8	wi;
>>>> +	__u8	rsvd[5];
>>>> +	__le64	slba;
>>>> +	__le64	cnlb;
>>>> +	__le64	wp;
>>>> +};
>>>> +
>>>>  /*
>>>>   * Check we didn't inadvertently grow the command struct
>>>>   */
>>>> @@ -252,6 +266,9 @@ static inline void _nvme_nvm_check_size(void)
>>>>BUILD_BUG_ON(sizeof(struct nvme_nvm_bb_tbl) != 64);
>>>>BUILD_BUG_ON(sizeof(struct nvme_nvm_id20_addrf) != 8);
>>&

Re: [PATCH 09/15] lightnvm: implement get log report chunk helpers

2018-03-01 Thread Javier Gonzalez
> On 1 Mar 2018, at 11.40, Matias Bjørling  wrote:
> 
> On 02/28/2018 04:49 PM, Javier González wrote:
>> The 2.0 spec provides a report chunk log page that can be retrieved
>> using the standard nvme get log page. This replaces the dedicated
>> get/put bad block table in 1.2.
>> This patch implements the helper functions to allow targets to retrieve
>> the chunk metadata using get log page. It makes nvme_get_log_ext
>> available outside of nvme core so that we can use it from lightnvm.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/core.c  | 11 +++
>>  drivers/nvme/host/core.c |  6 ++--
>>  drivers/nvme/host/lightnvm.c | 74 
>> 
>>  drivers/nvme/host/nvme.h |  3 ++
>>  include/linux/lightnvm.h | 24 ++
>>  5 files changed, 115 insertions(+), 3 deletions(-)
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index ed33e0b11788..4141871f460d 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -712,6 +712,17 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev 
>> *tgt_dev,
>>  nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
>>  }
>>  +int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta 
>> *meta,
>> +struct ppa_addr ppa, int nchks)
>> +{
>> +struct nvm_dev *dev = tgt_dev->parent;
>> +
>> +nvm_ppa_tgt_to_dev(tgt_dev, &ppa, 1);
>> +
>> +return dev->ops->get_chk_meta(tgt_dev->parent, meta,
>> +(sector_t)ppa.ppa, nchks);
>> +}
>> +EXPORT_SYMBOL(nvm_get_chunk_meta);
>>int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
>> int nr_ppas, int type)
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 2e9e9f973a75..af642ce6ba69 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -2127,9 +2127,9 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, 
>> struct nvme_id_ctrl *id)
>>  return ret;
>>  }
>>  -static int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>> -u8 log_page, void *log,
>> -size_t size, size_t offset)
>> +int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>> + u8 log_page, void *log,
>> + size_t size, size_t offset)
>>  {
>>  struct nvme_command c = { };
>>  unsigned long dwlen = size / 4 - 1;
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index f7135659f918..a1796241040f 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -35,6 +35,10 @@ enum nvme_nvm_admin_opcode {
>>  nvme_nvm_admin_set_bb_tbl   = 0xf1,
>>  };
>>  +enum nvme_nvm_log_page {
>> +NVME_NVM_LOG_REPORT_CHUNK   = 0xca,
>> +};
>> +
>>  struct nvme_nvm_ph_rw {
>>  __u8opcode;
>>  __u8flags;
>> @@ -236,6 +240,16 @@ struct nvme_nvm_id20 {
>>  __u8vs[1024];
>>  };
>>  +struct nvme_nvm_chk_meta {
>> +	__u8	state;
>> +	__u8	type;
>> +	__u8	wi;
>> +	__u8	rsvd[5];
>> +	__le64	slba;
>> +	__le64	cnlb;
>> +	__le64	wp;
>> +};
>> +
>>  /*
>>   * Check we didn't inadvertently grow the command struct
>>   */
>> @@ -252,6 +266,9 @@ static inline void _nvme_nvm_check_size(void)
>>  BUILD_BUG_ON(sizeof(struct nvme_nvm_bb_tbl) != 64);
>>  BUILD_BUG_ON(sizeof(struct nvme_nvm_id20_addrf) != 8);
>>  BUILD_BUG_ON(sizeof(struct nvme_nvm_id20) != NVME_IDENTIFY_DATA_SIZE);
>> +BUILD_BUG_ON(sizeof(struct nvme_nvm_chk_meta) != 32);
>> +BUILD_BUG_ON(sizeof(struct nvme_nvm_chk_meta) !=
>> +sizeof(struct nvm_chk_meta));
>>  }
>>static void nvme_nvm_set_addr_12(struct nvm_addr_format_12 *dst,
>> @@ -555,6 +572,61 @@ static int nvme_nvm_set_bb_tbl(struct nvm_dev *nvmdev, 
>> struct ppa_addr *ppas,
>>  return ret;
>>  }
>>  +/*
>> + * Expect the lba in device format
>> + */
>> +static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>> + struct nvm_chk_meta *meta,
>> + sector_t slba, int nchks)
>> +{
>> +struct nvm_geo *geo = &ndev->geo;
>> +struct nvme_ns *ns = ndev->q->queuedata;
>> +struct nvme_ctrl *ctrl = ns->ctrl;
>> +struct nvme_nvm_chk_meta *dev_meta = (struct nvme_nvm_chk_meta *)meta;
>> +struct ppa_addr ppa;
>> +size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
>> +size_t log_pos, offset, len;
>> +int ret, i;
>> +
>> +/* Normalize lba address space to obtain log offset */
>> +ppa.ppa = slba;
>> +ppa = dev_to_generic_addr(ndev, ppa);
>> +
>> +log_pos = ppa.m.chk;
>> +log_pos += ppa.m.pu * geo->num_chk;
>> +log_pos += ppa.m.grp * geo->num_lun * geo->num_chk;
> 
> Why is this done?

The log page does not map to the lba space. You need to 

Re: [PATCH] lightnvm: simplify geometry structure.

2018-02-27 Thread Javier Gonzalez

> On 27 Feb 2018, at 19.23, Matias Bjørling  wrote:
> 
> On 02/27/2018 04:57 PM, Javier González wrote:
>> Currently, the device geometry is stored redundantly in the nvm_id and
>> nvm_geo structures at a device level. Moreover, when instantiating
>> targets on a specific number of LUNs, these structures are replicated
>> and manually modified to fit the instance channel and LUN partitioning.
>> Instead, create a generic geometry around nvm_geo, which can be used by
>> (i) the underlying device to describe the geometry of the whole device,
>> and (ii) instances to describe their geometry independently.
>> Since these share a big part of the geometry, create a nvm_common_geo
>> structure that keeps the static geometry values that are shared across
>> instances.
>> As we introduce support for 2.0, these structures allow us to abstract
>> spec-specific values and present a common geometry to targets.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/core.c  | 114 +
>>  drivers/lightnvm/pblk-core.c |  16 +-
>>  drivers/lightnvm/pblk-gc.c   |   2 +-
>>  drivers/lightnvm/pblk-init.c | 123 +++---
>>  drivers/lightnvm/pblk-read.c |   2 +-
>>  drivers/lightnvm/pblk-recovery.c |  14 +-
>>  drivers/lightnvm/pblk-rl.c   |   2 +-
>>  drivers/lightnvm/pblk-sysfs.c|  39 +++--
>>  drivers/lightnvm/pblk-write.c|   2 +-
>>  drivers/lightnvm/pblk.h  |  93 +--
>>  drivers/nvme/host/lightnvm.c | 344 
>> +++
>>  include/linux/lightnvm.h | 202 ---
>>  12 files changed, 501 insertions(+), 452 deletions(-)
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index 689c97b97775..3cd3027f9701 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -111,6 +111,7 @@ static void nvm_release_luns_err(struct nvm_dev *dev, 
>> int lun_begin,
>>  static void nvm_remove_tgt_dev(struct nvm_tgt_dev *tgt_dev, int clear)
>>  {
>>  struct nvm_dev *dev = tgt_dev->parent;
>> +struct nvm_geo *geo = &dev->geo;
>>  struct nvm_dev_map *dev_map = tgt_dev->map;
>>  int i, j;
> 
> Now we are getting somewhere. Let's make a minimal patch, now that all the 
> rewriting isn't necessary. Therefore, use the original dev->geo. statements, 
> and don't rewrite them where before it made sense to shorthand it. Then in 
> the end, there should be a clean patch, that shows the identity structure 
> being removed.
> 
>>  @@ -122,7 +123,7 @@ static void nvm_remove_tgt_dev(struct nvm_tgt_dev 
>> *tgt_dev, int clear)
>>  if (clear) {
>>  for (j = 0; j < ch_map->nr_luns; j++) {
>>  int lun = j + lun_offs[j];
>> -int lunid = (ch * dev->geo.nr_luns) + lun;
>> +int lunid = (ch * geo->nr_luns) + lun;
>>  WARN_ON(!test_and_clear_bit(lunid,
>>  dev->lun_map));
>> @@ -143,19 +144,20 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>u16 lun_begin, u16 lun_end,
>>u16 op)
>>  {
>> +struct nvm_geo *geo = &dev->geo;
> 
> Kill
> 
>>  struct nvm_tgt_dev *tgt_dev = NULL;
>>  struct nvm_dev_map *dev_rmap = dev->rmap;
>>  struct nvm_dev_map *dev_map;
>>  struct ppa_addr *luns;
>>  int nr_luns = lun_end - lun_begin + 1;
>>  int luns_left = nr_luns;
>> -int nr_chnls = nr_luns / dev->geo.nr_luns;
>> -int nr_chnls_mod = nr_luns % dev->geo.nr_luns;
>> -int bch = lun_begin / dev->geo.nr_luns;
>> -int blun = lun_begin % dev->geo.nr_luns;
>> +int nr_chnls = nr_luns / geo->nr_luns;
>> +int nr_chnls_mod = nr_luns % geo->nr_luns;
>> +int bch = lun_begin / geo->nr_luns;
>> +int blun = lun_begin % geo->nr_luns;
>>  int lunid = 0;
>>  int lun_balanced = 1;
>> -int prev_nr_luns;
>> +int sec_per_lun, prev_nr_luns;
>>  int i, j;
>>  nr_chnls = (nr_chnls_mod == 0) ? nr_chnls : nr_chnls + 1;
>> @@ -173,15 +175,15 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>  if (!luns)
>>  goto err_luns;
>>  -   prev_nr_luns = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +prev_nr_luns = (luns_left > geo->nr_luns) ?
>> +geo->nr_luns : luns_left;
>>  for (i = 0; i < nr_chnls; i++) {
>>  struct nvm_ch_map *ch_rmap = &dev_rmap->chnls[i + bch];
>>  int *lun_roffs = ch_rmap->lun_offs;
>>  struct nvm_ch_map *ch_map = &dev_map->chnls[i];
>>  int *lun_offs;
>> -int luns_in_chnl = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +int 

Re: [PATCH] lightnvm: simplify geometry structure.

2018-02-27 Thread Javier Gonzalez

> On 27 Feb 2018, at 16.57, Javier González  wrote:
> 
> Currently, the device geometry is stored redundantly in the nvm_id and
> nvm_geo structures at a device level. Moreover, when instantiating
> targets on a specific number of LUNs, these structures are replicated
> and manually modified to fit the instance channel and LUN partitioning.
> 
> Instead, create a generic geometry around nvm_geo, which can be used by
> (i) the underlying device to describe the geometry of the whole device,
> and (ii) instances to describe their geometry independently.
> 
> Since these share a big part of the geometry, create a nvm_common_geo
> structure that keeps the static geometry values that are shared across
> instances.
> 
> As we introduce support for 2.0, these structures allow us to abstract
> spec-specific values and present a common geometry to targets.
> 
> Signed-off-by: Javier González 
> ---
> 

Please ignore this commit message for now. It’s not updated...

Javier

Re: [PATCH V3 00/19] lightnvm: pblk: implement 2.0 support

2018-02-26 Thread Javier Gonzalez
> On 26 Feb 2018, at 19.24, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/26/2018 07:21 PM, Javier Gonzalez wrote:
>>> On 26 Feb 2018, at 19.19, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/26/2018 02:16 PM, Javier González wrote:
>>>> # Changes since V2:
>>>> Apply Matias' feedback:
>>>>  - Remove generic nvm_id identify structure.
>>>>  - Do not remap capabilities (cap) to media and controlled capabilities
>>>>(mccap). Instead, add a comment to prevent confusion when
>>>>crosschecking with 2.0 spec.
>>>>  - Change maxoc and maxocpu defaults from 1 block to the max number of
>>>>blocks.
>>>>  - Re-implement the generic geometry to use nvm_geo on both device and
>>>>targets. Maintain nvm_common_geo to make it easier to copy the common
>>>>part of the geometry (without having to overwrite target-specific
>>>>fields, which is ugly and error prone). Matias, if you still want to
>>>>get rid of this, we can do it.
>>> 
>>> I do, the variables should go directly in nvm_geo. Thanks.
>> Ok. Is the rest ok with you?
> I'll go through it when the rebase is posted. Most of the patches are
> dependent on the first patch.

As it is now, it will be basically %s/geo->c.X/geo->X/g. If you can look
at the first patch now, I can address all comments in a single new version
instead of going through one extra cycle...






Re: [PATCH V3 00/19] lightnvm: pblk: implement 2.0 support

2018-02-26 Thread Javier Gonzalez
> On 26 Feb 2018, at 19.19, Matias Bjørling  wrote:
> 
> On 02/26/2018 02:16 PM, Javier González wrote:
>> # Changes since V2:
>> Apply Matias' feedback:
>>  - Remove generic nvm_id identify structure.
>>  - Do not remap capabilities (cap) to media and controlled capabilities
>>(mccap). Instead, add a comment to prevent confusion when
>>crosschecking with 2.0 spec.
>>  - Change maxoc and maxocpu defaults from 1 block to the max number of
>>blocks.
>>  - Re-implement the generic geometry to use nvm_geo on both device and
>>targets. Maintain nvm_common_geo to make it easier to copy the common
>>part of the geometry (without having to overwrite target-specific
>>fields, which is ugly and error prone). Matias, if you still want to
>>get rid of this, we can do it.
> 
> I do, the variables should go directly in nvm_geo. Thanks.

Ok. Is the rest ok with you?

Javier




Re: [PATCH] lightnvm: pblk: remove unused variable

2018-02-26 Thread Javier Gonzalez

> On 26 Feb 2018, at 19.20, Matias Bjørling  wrote:
> 
> On 02/26/2018 02:18 PM, Javier González wrote:
>> Remove unused variable after a previous cleanup (a8112b631adb)
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/pblk-core.c | 3 ---
>>  1 file changed, 3 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index ce6a7cfdba66..e6cb4317bb50 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -1071,7 +1071,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct 
>> pblk_line *line,
>>  struct nvm_geo *geo = &dev->geo;
>>  struct pblk_line_meta *lm = &pblk->lm;
>>  struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> -int nr_bb = 0;
>>  u64 off;
>>  int bit = -1;
>>  int emeta_secs;
>> @@ -1087,8 +1086,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct 
>> pblk_line *line,
>>  bitmap_or(line->map_bitmap, line->map_bitmap, l_mg->bb_aux,
>>  lm->sec_per_line);
>>  line->sec_in_line -= geo->c.clba;
>> -if (bit >= lm->emeta_bb)
>> -nr_bb++;
>>  }
>>  /* Mark smeta metadata sectors as bad sectors */
> 
> Should Fixes be added?

It's not a bug as such, which is why I didn't add it. But feel free to do
so if you think it's better.




Re: [PATCH V3 00/19] lightnvm: pblk: implement 2.0 support

2018-02-26 Thread Javier Gonzalez
> On 26 Feb 2018, at 19.19, Matias Bjørling  wrote:
> 
> On 02/26/2018 02:16 PM, Javier González wrote:
>> # Changes since V2:
>> Apply Matias' feedback:
>>  - Remove generic nvm_id identify structure.
>>  - Do not remap capabilities (cap) to media and controlled capabilities
>>(mccap). Instead, add a comment to prevent confusion when
>>crosschecking with 2.0 spec.
>>  - Change maxoc and maxocpu defaults from 1 block to the max number of
>>blocks.
>>  - Re-implement the generic geometry to use nvm_geo on both device and
>>targets. Maintain nvm_common_geo to make it easier to copy the common
>>part of the geometry (without having to overwrite target-specific
>>fields, which is ugly and error prone). Matias, if you still want to
>>get rid of this, we can do it.
> 
> I do, the variables should go directly in nvm_geo. Thanks.

Ok. Is the rest ok with you?

Javier


signature.asc
Description: Message signed with OpenPGP


Re: [PATCH] lightnvm: pblk: remove unused variable

2018-02-26 Thread Javier Gonzalez

> On 26 Feb 2018, at 19.20, Matias Bjørling  wrote:
> 
> On 02/26/2018 02:18 PM, Javier González wrote:
>> Remove unused variable after a previous cleanup (a8112b631adb)
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/pblk-core.c | 3 ---
>>  1 file changed, 3 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index ce6a7cfdba66..e6cb4317bb50 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -1071,7 +1071,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
>>  struct nvm_geo *geo = &dev->geo;
>>  struct pblk_line_meta *lm = &pblk->lm;
>>  struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> -int nr_bb = 0;
>>  u64 off;
>>  int bit = -1;
>>  int emeta_secs;
>> @@ -1087,8 +1086,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
>>  bitmap_or(line->map_bitmap, line->map_bitmap, l_mg->bb_aux,
>>  lm->sec_per_line);
>>  line->sec_in_line -= geo->c.clba;
>> -if (bit >= lm->emeta_bb)
>> -nr_bb++;
>>  }
>>  /* Mark smeta metadata sectors as bad sectors */
> 
> Should Fixes be added?

It's not a bug as such, that's why I didn't add it. But be welcome to do
so if you think it's better.




Re: [PATCH 1/1] nvme: implement log page low/high offset and dwords

2018-02-26 Thread Javier Gonzalez
> On 13 Feb 2018, at 13.49, Matias Bjørling  wrote:
> 
> NVMe 1.2.1 extends the get log page interface to include 64 bit
> offset and increases the number of dwords to 32 bits. Implement
> for future use.
> 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/nvme/host/core.c | 36 
> 1 file changed, 24 insertions(+), 12 deletions(-)
> 
> 

Looks good to me.

Reviewed-by: Javier González 






Re: [PATCH 01/20] lightnvm: simplify geometry structure.

2018-02-22 Thread Javier Gonzalez
> On 22 Feb 2018, at 13.22, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/22/2018 08:44 AM, Javier Gonzalez wrote:
>>> On 22 Feb 2018, at 08.25, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/21/2018 10:26 AM, Javier González wrote:
>>>> Currently, the device geometry is stored redundantly in the nvm_id and
>>>> nvm_geo structures at a device level. Moreover, when instantiating
>>>> targets on a specific number of LUNs, these structures are replicated
>>>> and manually modified to fit the instance channel and LUN partitioning.
>>>> Instead, create a generic geometry around two base structures:
>>>> nvm_dev_geo, which describes the geometry of the whole device and
>>>> nvm_geo, which describes the geometry of the instance. Since these share
>>>> a big part of the geometry, create a nvm_common_geo structure that keeps
>>>> the static geometry values that are shared across instances.
>>>> As we introduce support for 2.0, these structures allow to abstract
>>>> spec. specific values and present a common geometry to targets.
>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>> ---
>>>>  drivers/lightnvm/core.c  | 137 +++-
>>>>  drivers/lightnvm/pblk-core.c |  16 +-
>>>>  drivers/lightnvm/pblk-gc.c   |   2 +-
>>>>  drivers/lightnvm/pblk-init.c | 123 +++---
>>>>  drivers/lightnvm/pblk-read.c |   2 +-
>>>>  drivers/lightnvm/pblk-recovery.c |  14 +-
>>>>  drivers/lightnvm/pblk-rl.c   |   2 +-
>>>>  drivers/lightnvm/pblk-sysfs.c|  39 +++--
>>>>  drivers/lightnvm/pblk-write.c|   2 +-
>>>>  drivers/lightnvm/pblk.h  |  93 +--
>>>>  drivers/nvme/host/lightnvm.c | 339 
>>>> +++
>>>>  include/linux/lightnvm.h | 204 ---
>>>>  12 files changed, 514 insertions(+), 459 deletions(-)
>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>> index 689c97b97775..42596afdf64c 100644
>>>> --- a/drivers/lightnvm/core.c
>>>> +++ b/drivers/lightnvm/core.c
>>>> @@ -111,6 +111,7 @@ static void nvm_release_luns_err(struct nvm_dev *dev, int lun_begin,
>>>>  static void nvm_remove_tgt_dev(struct nvm_tgt_dev *tgt_dev, int clear)
>>>>  {
>>>>struct nvm_dev *dev = tgt_dev->parent;
>>>> +  struct nvm_dev_geo *dev_geo = &dev->dev_geo;
>>>>struct nvm_dev_map *dev_map = tgt_dev->map;
>>>>int i, j;
>>>> @@ -122,7 +123,7 @@ static void nvm_remove_tgt_dev(struct nvm_tgt_dev *tgt_dev, int clear)
>>>>if (clear) {
>>>>for (j = 0; j < ch_map->nr_luns; j++) {
>>>>int lun = j + lun_offs[j];
>>>> -  int lunid = (ch * dev->geo.nr_luns) + lun;
>>>> +  int lunid = (ch * dev_geo->nr_luns) + lun;
>>>>WARN_ON(!test_and_clear_bit(lunid,
>>>>dev->lun_map));
>>>> @@ -143,19 +144,20 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct nvm_dev *dev,
>>>>  u16 lun_begin, u16 lun_end,
>>>>  u16 op)
>>>>  {
>>>> +  struct nvm_dev_geo *dev_geo = &dev->dev_geo;
>>>>struct nvm_tgt_dev *tgt_dev = NULL;
>>>>struct nvm_dev_map *dev_rmap = dev->rmap;
>>>>struct nvm_dev_map *dev_map;
>>>>struct ppa_addr *luns;
>>>>int nr_luns = lun_end - lun_begin + 1;
>>>>int luns_left = nr_luns;
>>>> -  int nr_chnls = nr_luns / dev->geo.nr_luns;
>>>> -  int nr_chnls_mod = nr_luns % dev->geo.nr_luns;
>>>> -  int bch = lun_begin / dev->geo.nr_luns;
>>>> -  int blun = lun_begin % dev->geo.nr_luns;
>>>> +  int nr_chnls = nr_luns / dev_geo->nr_luns;
>>>> +  int nr_chnls_mod = nr_luns % dev_geo->nr_luns;
>>>> +  int bch = lun_begin / dev_geo->nr_luns;
>>>> +  int blun = lun_begin % dev_geo->nr_luns;
>>>>int lunid = 0;
>>>>int lun_balanced = 1;
>>>> -  int prev_nr_luns;
>>>> +  int sec_per_lun, prev_nr_luns;
>>>>int i, j;

Re: [PATCH 03/20] lightnvm: fix capabilities for 2.0 sysfs

2018-02-22 Thread Javier Gonzalez
> On 22 Feb 2018, at 12.10, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/22/2018 11:25 AM, Javier Gonzalez wrote:
>>> On 22 Feb 2018, at 10.39, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/22/2018 08:47 AM, Javier Gonzalez wrote:
>>>>> On 22 Feb 2018, at 08.28, Matias Bjørling <m...@lightnvm.io> wrote:
>>>>> 
>>>>>> On 02/21/2018 10:26 AM, Javier González wrote:
>>>>>> Both 1.2 and 2.0 specs define a field for media and controller
>>>>>> capabilities. Also, 1.2 defines a separate field dedicated to device
>>>>>> capabilities.
>>>>>> In 2.0 sysfs, these values have been mixed. Revert them to the
>>>>>> right values.
>>>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>>>> ---
>>>>>>  drivers/nvme/host/lightnvm.c | 18 +-
>>>>>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>> index 969bb874850c..598abba66f52 100644
>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>> @@ -914,8 +914,8 @@ static ssize_t nvm_dev_attr_show(struct device *dev,
>>>>>>if (strcmp(attr->name, "version") == 0) {
>>>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->ver_id);
>>>>>> -} else if (strcmp(attr->name, "capabilities") == 0) {
>>>>>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.cap);
>>>>>> +} else if (strcmp(attr->name, "media_capabilities") == 0) {
>>>>>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mccap);
>>>>>>  } else if (strcmp(attr->name, "read_typ") == 0) {
>>>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.trdt);
>>>>>>  } else if (strcmp(attr->name, "read_max") == 0) {
>>>>>> @@ -993,8 +993,8 @@ static ssize_t nvm_dev_attr_show_12(struct device *dev,
>>>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tbem);
>>>>>>  } else if (strcmp(attr->name, "multiplane_modes") == 0) {
>>>>>>  return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mpos);
>>>>>> -} else if (strcmp(attr->name, "media_capabilities") == 0) {
>>>>>> -return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mccap);
>>>>>> +} else if (strcmp(attr->name, "capabilities") == 0) {
>>>>>> +return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.cap);
>>>>>>  } else if (strcmp(attr->name, "max_phys_secs") == 0) {
>>>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", NVM_MAX_VLBA);
>>>>>>  } else {
>>>>>> @@ -1055,7 +1055,7 @@ static ssize_t nvm_dev_attr_show_20(struct device *dev,
>>>>>>/* general attributes */
>>>>>>  static NVM_DEV_ATTR_RO(version);
>>>>>> -static NVM_DEV_ATTR_RO(capabilities);
>>>>>> +static NVM_DEV_ATTR_RO(media_capabilities);
>>>>>>static NVM_DEV_ATTR_RO(read_typ);
>>>>>>  static NVM_DEV_ATTR_RO(read_max);
>>>>>> @@ -1080,12 +1080,12 @@ static NVM_DEV_ATTR_12_RO(prog_max);
>>>>>>  static NVM_DEV_ATTR_12_RO(erase_typ);
>>>>>>  static NVM_DEV_ATTR_12_RO(erase_max);
>>>>>>  static NVM_DEV_ATTR_12_RO(multiplane_modes);
>>>>>> -static NVM_DEV_ATTR_12_RO(media_capabilities);
>>>>>> +static NVM_DEV_ATTR_12_RO(capabilities);
>>>>>>  static NVM_DEV_ATTR_12_RO(max_phys_secs);
>>>>>>static struct attribute *nvm_dev_attrs_12[] = {
>>>>>>  &dev_attr_version.attr,
>>>>>> -&dev_attr_capabilities.attr,
>>>>>> +&dev_attr_media_capabilities.attr,
>>>>>>  &dev_attr_vendor_opcode.attr,
>>>>>>  &dev_attr_device_mode.attr,
>>>>>> @@ -1108,7 +1108,7 @@ static struct attribute *nvm_dev_attrs_12[] = {
>>>>>>  &dev_attr_erase_typ.attr,
>>>>>>  &dev_attr_erase_max.attr,
>>>>>>  &dev_attr_multiplane_modes.attr,

Re: [PATCH 03/20] lightnvm: fix capabilities for 2.0 sysfs

2018-02-22 Thread Javier Gonzalez


> On 22 Feb 2018, at 10.39, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/22/2018 08:47 AM, Javier Gonzalez wrote:
>>> On 22 Feb 2018, at 08.28, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>>> On 02/21/2018 10:26 AM, Javier González wrote:
>>>> Both 1.2 and 2.0 specs define a field for media and controller
>>>> capabilities. Also, 1.2 defines a separate field dedicated to device
>>>> capabilities.
>>>> In 2.0 sysfs, these values have been mixed. Revert them to the
>>>> right values.
>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>> ---
>>>>  drivers/nvme/host/lightnvm.c | 18 +-
>>>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>> index 969bb874850c..598abba66f52 100644
>>>> --- a/drivers/nvme/host/lightnvm.c
>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>> @@ -914,8 +914,8 @@ static ssize_t nvm_dev_attr_show(struct device *dev,
>>>>if (strcmp(attr->name, "version") == 0) {
>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->ver_id);
>>>> -} else if (strcmp(attr->name, "capabilities") == 0) {
>>>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.cap);
>>>> +} else if (strcmp(attr->name, "media_capabilities") == 0) {
>>>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mccap);
>>>>  } else if (strcmp(attr->name, "read_typ") == 0) {
>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.trdt);
>>>>  } else if (strcmp(attr->name, "read_max") == 0) {
>>>> @@ -993,8 +993,8 @@ static ssize_t nvm_dev_attr_show_12(struct device *dev,
>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tbem);
>>>>  } else if (strcmp(attr->name, "multiplane_modes") == 0) {
>>>>  return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mpos);
>>>> -} else if (strcmp(attr->name, "media_capabilities") == 0) {
>>>> -return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mccap);
>>>> +} else if (strcmp(attr->name, "capabilities") == 0) {
>>>> +return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.cap);
>>>>  } else if (strcmp(attr->name, "max_phys_secs") == 0) {
>>>>  return scnprintf(page, PAGE_SIZE, "%u\n", NVM_MAX_VLBA);
>>>>  } else {
>>>> @@ -1055,7 +1055,7 @@ static ssize_t nvm_dev_attr_show_20(struct device *dev,
>>>>/* general attributes */
>>>>  static NVM_DEV_ATTR_RO(version);
>>>> -static NVM_DEV_ATTR_RO(capabilities);
>>>> +static NVM_DEV_ATTR_RO(media_capabilities);
>>>>static NVM_DEV_ATTR_RO(read_typ);
>>>>  static NVM_DEV_ATTR_RO(read_max);
>>>> @@ -1080,12 +1080,12 @@ static NVM_DEV_ATTR_12_RO(prog_max);
>>>>  static NVM_DEV_ATTR_12_RO(erase_typ);
>>>>  static NVM_DEV_ATTR_12_RO(erase_max);
>>>>  static NVM_DEV_ATTR_12_RO(multiplane_modes);
>>>> -static NVM_DEV_ATTR_12_RO(media_capabilities);
>>>> +static NVM_DEV_ATTR_12_RO(capabilities);
>>>>  static NVM_DEV_ATTR_12_RO(max_phys_secs);
>>>>static struct attribute *nvm_dev_attrs_12[] = {
>>>>  &dev_attr_version.attr,
>>>> -&dev_attr_capabilities.attr,
>>>> +&dev_attr_media_capabilities.attr,
>>>>  &dev_attr_vendor_opcode.attr,
>>>>  &dev_attr_device_mode.attr,
>>>> @@ -1108,7 +1108,7 @@ static struct attribute *nvm_dev_attrs_12[] = {
>>>>  &dev_attr_erase_typ.attr,
>>>>  &dev_attr_erase_max.attr,
>>>>  &dev_attr_multiplane_modes.attr,
>>>> -&dev_attr_media_capabilities.attr,
>>>> +&dev_attr_capabilities.attr,
>>>>  &dev_attr_max_phys_secs.attr,
>>>>NULL,
>>>> @@ -1134,7 +1134,7 @@ static NVM_DEV_ATTR_20_RO(reset_max);
>>>>static struct attribute *nvm_dev_attrs_20[] = {
>>>>  _attr_version.attr,
>>>> -_attr_capabilities.attr,
>>>> +_attr_media_capabilities.attr,
>>>>_attr_groups.attr,
>>>>  _attr_punits.attr,
>>> 
>>> With the mccap changes, it should make sense to keep the capabilities
>>> as is.
>> The change adds mccap, but sysfs points to cap, which is wrong. This
>> patch is needed. Otherwise, we change the name of mccap to cap, which
>> is _very_ confusing to people familiar to both specs. We can change
>> the name of mccap to cap in a future spec revision.
>> Javier
> 
> Think of the sysfs capabilities as an abstract value that defines generic 
> capabilities. It is not directly tied to either 1.2 or 2.0.

I’m thinking about the user looking at sysfs and at the spec at the same time - 
I myself get confused when names don’t match. 

Anyway, I’ll keep it the way it was and add a comment for clarification. Would 
that work for you?

Javier 

Re: [PATCH 08/20] lightnvm: complete geo structure with maxoc*

2018-02-22 Thread Javier Gonzalez

Javier

> On 22 Feb 2018, at 11.00, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/22/2018 10:52 AM, Javier Gonzalez wrote:
>>> On 22 Feb 2018, at 10.45, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/22/2018 08:55 AM, Javier Gonzalez wrote:
>>>>> On 22 Feb 2018, at 08.45, Matias Bjørling <m...@lightnvm.io> wrote:
>>>>> 
>>>>> On 02/21/2018 10:26 AM, Javier González wrote:
>>>>>> Complete the generic geometry structure with the maxoc and maxocpu
>>>>>> fields, present in the 2.0 spec.
>>>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>>>> ---
>>>>>>  drivers/nvme/host/lightnvm.c | 4 
>>>>>>  include/linux/lightnvm.h | 2 ++
>>>>>>  2 files changed, 6 insertions(+)
>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>>> index cca32da05316..9c1f8225c4e1 100644
>>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>>> @@ -318,6 +318,8 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
>>>>>>  dev_geo->c.ws_min = sec_per_pg;
>>>>>>  dev_geo->c.ws_opt = sec_per_pg;
>>>>>>  dev_geo->c.mw_cunits = 8;   /* default to MLC safe values */
>>>>>> +dev_geo->c.maxoc = dev_geo->all_luns;   /* default to 1 chunk per LUN */
>>>>>> +dev_geo->c.maxocpu = 1; /* default to 1 chunk per LUN */
>>>>> 
>>>>> One can't assume that it is 1 open chunk per lun. If you need this for 
>>>>> specific hardware, make a quirk for it.
>>>> Which default you want for 1.2 if not specified then? I use 1 because it
>>>> has been the implicit default until now.
>>> 
>>> INT_MAX, since it then allows the maximum of open chunks. It cannot be 
>>> assumed that other 1.2 devices is limited to a single open chunk.
>> So you want the default to be that all blocks on the device can be
>> opened at the same time. Interesting... I guess that such a SSD will
>> have a AA battery attached to it, but fine by me if that's how you want
>> it.
> 
> I feel you're a bit sarcastic here. One may think of SLC and other memories 
> that do one-shot programming. In that case no caching is needed, and 
> therefore power-caps can be limited on the hardware.

Sure. Hope people move to 2.0 then for all >SLC memories out there,
otherwise we'll see a lot of quirks coming in. Thought we wanted to be
generic.

> 
>> Assuming this, can we instead set it to the reported number of chunks,
>> since this is the hard limit anyway.
> 
> Works for me.

Cool. I'll add this to the next version.

Javier




Re: [PATCH 08/20] lightnvm: complete geo structure with maxoc*

2018-02-22 Thread Javier Gonzalez
> On 22 Feb 2018, at 10.45, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/22/2018 08:55 AM, Javier Gonzalez wrote:
>>> On 22 Feb 2018, at 08.45, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 02/21/2018 10:26 AM, Javier González wrote:
>>>> Complete the generic geometry structure with the maxoc and maxocpu
>>>> felds, present in the 2.0 spec.
>>>> Signed-off-by: Javier González <jav...@cnexlabs.com>
>>>> ---
>>>>  drivers/nvme/host/lightnvm.c | 4 
>>>>  include/linux/lightnvm.h | 2 ++
>>>>  2 files changed, 6 insertions(+)
>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>> index cca32da05316..9c1f8225c4e1 100644
>>>> --- a/drivers/nvme/host/lightnvm.c
>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>> @@ -318,6 +318,8 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
>>>>dev_geo->c.ws_min = sec_per_pg;
>>>>dev_geo->c.ws_opt = sec_per_pg;
>>>>dev_geo->c.mw_cunits = 8;   /* default to MLC safe values */
>>>> +  dev_geo->c.maxoc = dev_geo->all_luns;   /* default to 1 chunk per LUN */
>>>> +  dev_geo->c.maxocpu = 1; /* default to 1 chunk per LUN */
>>> 
>>> One can't assume that it is 1 open chunk per lun. If you need this for 
>>> specific hardware, make a quirk for it.
>> Which default you want for 1.2 if not specified then? I use 1 because it
>> has been the implicit default until now.
> 
> INT_MAX, since it then allows the maximum of open chunks. It cannot be 
> assumed that other 1.2 devices is limited to a single open chunk.

So you want the default to be that all blocks on the device can be
opened at the same time. Interesting... I guess that such an SSD will
have an AA battery attached to it, but fine by me if that's how you want
it.

Assuming this, can we instead set it to the reported number of chunks,
since this is the hard limit anyway?

Javier




Re: [PATCH 08/20] lightnvm: complete geo structure with maxoc*

2018-02-21 Thread Javier Gonzalez
> On 22 Feb 2018, at 08.45, Matias Bjørling  wrote:
> 
> On 02/21/2018 10:26 AM, Javier González wrote:
>> Complete the generic geometry structure with the maxoc and maxocpu
>> fields, present in the 2.0 spec.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/nvme/host/lightnvm.c | 4 
>>  include/linux/lightnvm.h | 2 ++
>>  2 files changed, 6 insertions(+)
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index cca32da05316..9c1f8225c4e1 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -318,6 +318,8 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
>>  dev_geo->c.ws_min = sec_per_pg;
>>  dev_geo->c.ws_opt = sec_per_pg;
>>  dev_geo->c.mw_cunits = 8;   /* default to MLC safe values */
>> +dev_geo->c.maxoc = dev_geo->all_luns;   /* default to 1 chunk per LUN */
>> +dev_geo->c.maxocpu = 1; /* default to 1 chunk per LUN */
> 
> One can't assume that it is 1 open chunk per lun. If you need this for 
> specific hardware, make a quirk for it.
> 

Which default do you want for 1.2 if it is not specified, then? I use 1 because it
has been the implicit default until now.

Javier




Re: [PATCH 03/20] lightnvm: fix capabilities for 2.0 sysfs

2018-02-21 Thread Javier Gonzalez
> On 22 Feb 2018, at 08.28, Matias Bjørling  wrote:
> 
> On 02/21/2018 10:26 AM, Javier González wrote:
>> Both 1.2 and 2.0 specs define a field for media and controller
>> capabilities. Also, 1.2 defines a separate field dedicated to device
>> capabilities.
>> In 2.0 sysfs, these values have been mixed. Revert them to the right
>> values.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/nvme/host/lightnvm.c | 18 +-
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index 969bb874850c..598abba66f52 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -914,8 +914,8 @@ static ssize_t nvm_dev_attr_show(struct device *dev,
>>  if (strcmp(attr->name, "version") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->ver_id);
>> -} else if (strcmp(attr->name, "capabilities") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.cap);
>> +} else if (strcmp(attr->name, "media_capabilities") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mccap);
>>  } else if (strcmp(attr->name, "read_typ") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.trdt);
>>  } else if (strcmp(attr->name, "read_max") == 0) {
>> @@ -993,8 +993,8 @@ static ssize_t nvm_dev_attr_show_12(struct device *dev,
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tbem);
>>  } else if (strcmp(attr->name, "multiplane_modes") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mpos);
>> -} else if (strcmp(attr->name, "media_capabilities") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.mccap);
>> +} else if (strcmp(attr->name, "capabilities") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "0x%08x\n", dev_geo->c.cap);
>>  } else if (strcmp(attr->name, "max_phys_secs") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", NVM_MAX_VLBA);
>>  } else {
>> @@ -1055,7 +1055,7 @@ static ssize_t nvm_dev_attr_show_20(struct device *dev,
>>/* general attributes */
>>  static NVM_DEV_ATTR_RO(version);
>> -static NVM_DEV_ATTR_RO(capabilities);
>> +static NVM_DEV_ATTR_RO(media_capabilities);
>>static NVM_DEV_ATTR_RO(read_typ);
>>  static NVM_DEV_ATTR_RO(read_max);
>> @@ -1080,12 +1080,12 @@ static NVM_DEV_ATTR_12_RO(prog_max);
>>  static NVM_DEV_ATTR_12_RO(erase_typ);
>>  static NVM_DEV_ATTR_12_RO(erase_max);
>>  static NVM_DEV_ATTR_12_RO(multiplane_modes);
>> -static NVM_DEV_ATTR_12_RO(media_capabilities);
>> +static NVM_DEV_ATTR_12_RO(capabilities);
>>  static NVM_DEV_ATTR_12_RO(max_phys_secs);
>>static struct attribute *nvm_dev_attrs_12[] = {
>>  &dev_attr_version.attr,
>> -&dev_attr_capabilities.attr,
>> +&dev_attr_media_capabilities.attr,
>>  &dev_attr_vendor_opcode.attr,
>>  &dev_attr_device_mode.attr,
>> @@ -1108,7 +1108,7 @@ static struct attribute *nvm_dev_attrs_12[] = {
>>  &dev_attr_erase_typ.attr,
>>  &dev_attr_erase_max.attr,
>>  &dev_attr_multiplane_modes.attr,
>> -&dev_attr_media_capabilities.attr,
>> +&dev_attr_capabilities.attr,
>>  &dev_attr_max_phys_secs.attr,
>>  NULL,
>> @@ -1134,7 +1134,7 @@ static NVM_DEV_ATTR_20_RO(reset_max);
>>static struct attribute *nvm_dev_attrs_20[] = {
>>  &dev_attr_version.attr,
>> -&dev_attr_capabilities.attr,
>> +&dev_attr_media_capabilities.attr,
>>  &dev_attr_groups.attr,
>>  &dev_attr_punits.attr,
> 
> With the mccap changes, it should make sense to keep the capabilities
> as is.

The change adds mccap, but sysfs points to cap, which is wrong. This
patch is needed. Otherwise, we change the name of mccap to cap, which
is _very_ confusing to people familiar to both specs. We can change
the name of mccap to cap in a future spec revision.

Javier




Re: [PATCH 01/20] lightnvm: simplify geometry structure.

2018-02-21 Thread Javier Gonzalez

> On 22 Feb 2018, at 08.25, Matias Bjørling  wrote:
> 
> On 02/21/2018 10:26 AM, Javier González wrote:
>> Currently, the device geometry is stored redundantly in the nvm_id and
>> nvm_geo structures at a device level. Moreover, when instantiating
>> targets on a specific number of LUNs, these structures are replicated
>> and manually modified to fit the instance channel and LUN partitioning.
>> Instead, create a generic geometry around two base structures:
>> nvm_dev_geo, which describes the geometry of the whole device and
>> nvm_geo, which describes the geometry of the instance. Since these share
>> a big part of the geometry, create an nvm_common_geo structure that keeps
>> the static geometry values that are shared across instances.
>> As we introduce support for 2.0, these structures allow to abstract
>> spec. specific values and present a common geometry to targets.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/core.c  | 137 +++-
>>  drivers/lightnvm/pblk-core.c |  16 +-
>>  drivers/lightnvm/pblk-gc.c   |   2 +-
>>  drivers/lightnvm/pblk-init.c | 123 +++---
>>  drivers/lightnvm/pblk-read.c |   2 +-
>>  drivers/lightnvm/pblk-recovery.c |  14 +-
>>  drivers/lightnvm/pblk-rl.c   |   2 +-
>>  drivers/lightnvm/pblk-sysfs.c|  39 +++--
>>  drivers/lightnvm/pblk-write.c|   2 +-
>>  drivers/lightnvm/pblk.h  |  93 +--
>>  drivers/nvme/host/lightnvm.c | 339 
>> +++
>>  include/linux/lightnvm.h | 204 ---
>>  12 files changed, 514 insertions(+), 459 deletions(-)
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index 689c97b97775..42596afdf64c 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -111,6 +111,7 @@ static void nvm_release_luns_err(struct nvm_dev *dev, 
>> int lun_begin,
>>  static void nvm_remove_tgt_dev(struct nvm_tgt_dev *tgt_dev, int clear)
>>  {
>>  struct nvm_dev *dev = tgt_dev->parent;
>> +struct nvm_dev_geo *dev_geo = &dev->dev_geo;
>>  struct nvm_dev_map *dev_map = tgt_dev->map;
>>  int i, j;
>>  @@ -122,7 +123,7 @@ static void nvm_remove_tgt_dev(struct nvm_tgt_dev 
>> *tgt_dev, int clear)
>>  if (clear) {
>>  for (j = 0; j < ch_map->nr_luns; j++) {
>>  int lun = j + lun_offs[j];
>> -int lunid = (ch * dev->geo.nr_luns) + lun;
>> +int lunid = (ch * dev_geo->nr_luns) + lun;
>>  WARN_ON(!test_and_clear_bit(lunid,
>>  dev->lun_map));
>> @@ -143,19 +144,20 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>u16 lun_begin, u16 lun_end,
>>u16 op)
>>  {
>> +struct nvm_dev_geo *dev_geo = &dev->dev_geo;
>>  struct nvm_tgt_dev *tgt_dev = NULL;
>>  struct nvm_dev_map *dev_rmap = dev->rmap;
>>  struct nvm_dev_map *dev_map;
>>  struct ppa_addr *luns;
>>  int nr_luns = lun_end - lun_begin + 1;
>>  int luns_left = nr_luns;
>> -int nr_chnls = nr_luns / dev->geo.nr_luns;
>> -int nr_chnls_mod = nr_luns % dev->geo.nr_luns;
>> -int bch = lun_begin / dev->geo.nr_luns;
>> -int blun = lun_begin % dev->geo.nr_luns;
>> +int nr_chnls = nr_luns / dev_geo->nr_luns;
>> +int nr_chnls_mod = nr_luns % dev_geo->nr_luns;
>> +int bch = lun_begin / dev_geo->nr_luns;
>> +int blun = lun_begin % dev_geo->nr_luns;
>>  int lunid = 0;
>>  int lun_balanced = 1;
>> -int prev_nr_luns;
>> +int sec_per_lun, prev_nr_luns;
>>  int i, j;
>>  nr_chnls = (nr_chnls_mod == 0) ? nr_chnls : nr_chnls + 1;
>> @@ -173,15 +175,15 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>  if (!luns)
>>  goto err_luns;
>>  -   prev_nr_luns = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +prev_nr_luns = (luns_left > dev_geo->nr_luns) ?
>> +dev_geo->nr_luns : luns_left;
>>  for (i = 0; i < nr_chnls; i++) {
>>  struct nvm_ch_map *ch_rmap = &dev_rmap->chnls[i + bch];
>>  int *lun_roffs = ch_rmap->lun_offs;
>>  struct nvm_ch_map *ch_map = &dev_map->chnls[i];
>>  int *lun_offs;
>> -int luns_in_chnl = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +int luns_in_chnl = (luns_left > dev_geo->nr_luns) ?
>> +dev_geo->nr_luns : luns_left;
>>  if (lun_balanced && prev_nr_luns != luns_in_chnl)
>>  lun_balanced = 0;
>> @@ -215,18 +217,22 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>

Re: [PATCH v2 0/6] lightnvm: base 2.0 implementation

2018-02-21 Thread Javier Gonzalez
> On 15 Feb 2018, at 14.11, Matias Bjørling  wrote:
> 
> A couple of patches for 2.0 support for the lightnvm subsystem. They
> form the foundation for the integration.
> 
> The first two patches are preparation for the 2.0 work. The third patch
> implements the 2.0 data structures, the geometry command, and exposes
> the sysfs attributes that comes with the 2.0 specification. Note that
> the attributes between 1.2 and 2.0 are different, and it is expected
> that user-space shall use the version sysfs attribute to know which
> attributes will be available.
> 
> The next two patches remove max_phys_sect and max_rq_size, as they
> are not used.
> 
> The last patch implements support for using the nvme namespace logical
> block and metadata fields and sync it with the internal lightnvm
> identify structures.
> 
> Changes since v2:
> 
> - Removed blk_queue_block_size() setup in nvm_init and made sure
>   to only update csecs and sos on the late setup path. No reason
>   to set it twice. From discussion with Javier.
> - Added two extra patches, that removes max_phys_sect and
>   max_rq_size.
> 
> Changes since v1:
> 
> - pr_err fix from Randy.
> - Address type fix from Javier.
> - Also CC the nvme mailing list.
> 
> Matias Bjørling (6):
>  lightnvm: make 1.2 data structures explicit
>  lightnvm: flatten nvm_id_group into nvm_id
>  lightnvm: add 2.0 geometry identification
>  lightnvm: remove max_rq_size
>  lightnvm: remove nvm_dev_ops->max_phys_sect
>  nvme: lightnvm: add late setup of block size and metadata
> 
> drivers/lightnvm/core.c  |  61 ++---
> drivers/lightnvm/pblk-init.c |   9 +-
> drivers/lightnvm/pblk-recovery.c |   8 +-
> drivers/nvme/host/core.c |   2 +
> drivers/nvme/host/lightnvm.c | 513 ---
> drivers/nvme/host/nvme.h |   2 +
> include/linux/lightnvm.h |  71 +++---
> 7 files changed, 442 insertions(+), 224 deletions(-)
> 
> --
> 2.11.0
> 

The patches look good. I tested them together with pblk's 2.0 support
and all works as it should.

Reviewed-by: Javier González 





Re: [PATCH v2 5/6] lightnvm: remove nvm_dev_ops->max_phys_sect

2018-02-19 Thread Javier Gonzalez
> On 19 Feb 2018, at 08.31, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/16/2018 07:48 AM, Javier Gonzalez wrote:
>>> On 15 Feb 2018, at 05.11, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> The value of max_phys_sect is always static. Instead of
>>> defining it in the nvm_dev_ops structure, declare it as a global
>>> value.
>>> 
>>> Signed-off-by: Matias Bjørling <m...@lightnvm.io>
>>> ---
>>> drivers/lightnvm/core.c  | 28 +++-
>>> drivers/lightnvm/pblk-init.c |  9 -
>>> drivers/lightnvm/pblk-recovery.c |  8 ++--
>>> drivers/nvme/host/lightnvm.c |  5 +
>>> include/linux/lightnvm.h |  5 ++---
>>> 5 files changed, 16 insertions(+), 39 deletions(-)
>> The patch looks good, but I have a question. If a target implements the
>> scalar interface, then it will not be limited to 64 lbas/ppas and it
>> will not make sense to split the bio based on this value. In fact, it
>> looks like in time, we will move to a scalar interface in the 2.0 path
>> to align with the zoned interface, so this value will be dependent on
>> whether the target is using the scalar or vector interface.
> 
> Both read/write and vector interface will coexist. I am only removing
> what is hardwired into the specification.
> 
> The read/write interface has always been able issue more than 64 LBAs,
> it is instead limited by what the hardware reports its max transfer
> size to be.
> 

Exactly. I was thinking of a similar mechanism for the vector interface
to simplify integration with the scalar interface and avoid having an
if/else for what we now call max_phys_sect.

I guess we can wait and see what the code looks like when we adapt pblk.

Reviewed-by: Javier González <jav...@cnexlabs.com>

Javier




Re: [PATCH v2 5/6] lightnvm: remove nvm_dev_ops->max_phys_sect

2018-02-15 Thread Javier Gonzalez

> On 15 Feb 2018, at 05.11, Matias Bjørling  wrote:
> 
> The value of max_phys_sect is always static. Instead of
> defining it in the nvm_dev_ops structure, declare it as a global
> value.
> 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/lightnvm/core.c  | 28 +++-
> drivers/lightnvm/pblk-init.c |  9 -
> drivers/lightnvm/pblk-recovery.c |  8 ++--
> drivers/nvme/host/lightnvm.c |  5 +
> include/linux/lightnvm.h |  5 ++---
> 5 files changed, 16 insertions(+), 39 deletions(-)
> 

The patch looks good, but I have a question. If a target implements the
scalar interface, then it will not be limited to 64 lbas/ppas and it
will not make sense to split the bio based on this value. In fact, it
looks like in time, we will move to a scalar interface in the 2.0 path
to align with the zoned interface, so this value will be dependent on
whether the target is using the scalar or vector interface.

Javier




Re: [PATCH 2/8] lightnvm: show generic geometry in sysfs

2018-02-15 Thread Javier Gonzalez

> On 15 Feb 2018, at 02.20, Matias Bjørling  wrote:
> 
> On 02/13/2018 03:06 PM, Javier González wrote:
>> From: Javier González 
>> Apart from showing the geometry returned by the different identify
>> commands, provide the generic geometry too, as this is the geometry that
>> targets will use to describe the device.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/nvme/host/lightnvm.c | 146 
>> ---
>>  1 file changed, 97 insertions(+), 49 deletions(-)
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index 97739e668602..7bc75182c723 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -944,8 +944,27 @@ static ssize_t nvm_dev_attr_show(struct device *dev,
>>  return scnprintf(page, PAGE_SIZE, "%u.%u\n",
>>  dev_geo->major_ver_id,
>>  dev_geo->minor_ver_id);
>> -} else if (strcmp(attr->name, "capabilities") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.cap);
>> +} else if (strcmp(attr->name, "clba") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.clba);
>> +} else if (strcmp(attr->name, "csecs") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.csecs);
>> +} else if (strcmp(attr->name, "sos") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.sos);
>> +} else if (strcmp(attr->name, "ws_min") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.ws_min);
>> +} else if (strcmp(attr->name, "ws_opt") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.ws_opt);
>> +} else if (strcmp(attr->name, "maxoc") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.maxoc);
>> +} else if (strcmp(attr->name, "maxocpu") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.maxocpu);
>> +} else if (strcmp(attr->name, "mw_cunits") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mw_cunits);
>> +} else if (strcmp(attr->name, "media_capabilities") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mccap);
>> +} else if (strcmp(attr->name, "max_phys_secs") == 0) {
>> +return scnprintf(page, PAGE_SIZE, "%u\n",
>> +ndev->ops->max_phys_sect);
>>  } else if (strcmp(attr->name, "read_typ") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.trdt);
>>  } else if (strcmp(attr->name, "read_max") == 0) {
>> @@ -984,19 +1003,8 @@ static ssize_t nvm_dev_attr_show_12(struct device *dev,
>>  attr = >attr;
>>  -   if (strcmp(attr->name, "vendor_opcode") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.vmnt);
>> -} else if (strcmp(attr->name, "device_mode") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.dom);
>> -/* kept for compatibility */
>> -} else if (strcmp(attr->name, "media_manager") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%s\n", "gennvm");
>> -} else if (strcmp(attr->name, "ppa_format") == 0) {
>> +if (strcmp(attr->name, "ppa_format") == 0) {
>>  return nvm_dev_attr_show_ppaf((void *)_geo->c.addrf, page);
>> -} else if (strcmp(attr->name, "media_type") == 0) { /* u8 */
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.mtype);
>> -} else if (strcmp(attr->name, "flash_media_type") == 0) {
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.fmtype);
>>  } else if (strcmp(attr->name, "num_channels") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->num_ch);
>>  } else if (strcmp(attr->name, "num_luns") == 0) {
>> @@ -1011,8 +1019,6 @@ static ssize_t nvm_dev_attr_show_12(struct device *dev,
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.fpg_sz);
>>  } else if (strcmp(attr->name, "hw_sector_size") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.csecs);
>> -} else if (strcmp(attr->name, "oob_sector_size") == 0) {/* u32 */
>> -return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.sos);
>>  } else if (strcmp(attr->name, "prog_typ") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tprt);
>>  } else if (strcmp(attr->name, "prog_max") == 0) {
>> @@ -1021,13 +1027,21 @@ static ssize_t nvm_dev_attr_show_12(struct device 
>> *dev,
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tbet);
>>  } else if (strcmp(attr->name, "erase_max") == 0) {
>>  return scnprintf(page, PAGE_SIZE, "%u\n", dev_geo->c.tbem);
>> +} else if (strcmp(attr->name, "vendor_opcode") == 0) {
>> +
Re: [PATCH 1/8] lightnvm: exposed generic geometry to targets

2018-02-15 Thread Javier Gonzalez

> On 15 Feb 2018, at 02.13, Matias Bjørling  wrote:
> 
>> On 02/13/2018 03:06 PM, Javier González wrote:
>> With the inclusion of 2.0 support, we need a generic geometry that
>> describes the OCSSD independently of the specification that it
>> implements. Otherwise, geometry specific code is required, which
>> complicates targets and makes maintenance much more difficult.
>> This patch refactors the identify path and populates a generic geometry
>> that is then given to the targets on creation. Since the 2.0 geometry is
>> much more abstract than 1.2, the generic geometry resembles 2.0, but it
>> is not identical, as it needs to understand 1.2 abstractions too.
>> Signed-off-by: Javier González 
>> ---
>>  drivers/lightnvm/core.c  | 143 ++-
>>  drivers/lightnvm/pblk-core.c |  16 +-
>>  drivers/lightnvm/pblk-gc.c   |   2 +-
>>  drivers/lightnvm/pblk-init.c | 149 ---
>>  drivers/lightnvm/pblk-read.c |   2 +-
>>  drivers/lightnvm/pblk-recovery.c |  14 +-
>>  drivers/lightnvm/pblk-rl.c   |   2 +-
>>  drivers/lightnvm/pblk-sysfs.c|  39 ++--
>>  drivers/lightnvm/pblk-write.c|   2 +-
>>  drivers/lightnvm/pblk.h  | 105 +--
>>  drivers/nvme/host/lightnvm.c | 379 
>> ---
>>  include/linux/lightnvm.h | 220 +--
>>  12 files changed, 586 insertions(+), 487 deletions(-)
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index 9b1255b3e05e..80492fa6ee76 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -111,6 +111,7 @@ static void nvm_release_luns_err(struct nvm_dev *dev, 
>> int lun_begin,
>>  static void nvm_remove_tgt_dev(struct nvm_tgt_dev *tgt_dev, int clear)
>>  {
>>  struct nvm_dev *dev = tgt_dev->parent;
>> +struct nvm_dev_geo *dev_geo = >dev_geo;
>>  struct nvm_dev_map *dev_map = tgt_dev->map;
>>  int i, j;
>>  @@ -122,7 +123,7 @@ static void nvm_remove_tgt_dev(struct nvm_tgt_dev 
>> *tgt_dev, int clear)
>>  if (clear) {
>>  for (j = 0; j < ch_map->nr_luns; j++) {
>>  int lun = j + lun_offs[j];
>> -int lunid = (ch * dev->geo.nr_luns) + lun;
>> +int lunid = (ch * dev_geo->num_lun) + lun;
>>WARN_ON(!test_and_clear_bit(lunid,
>>  dev->lun_map));
>> @@ -143,19 +144,20 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>u16 lun_begin, u16 lun_end,
>>u16 op)
>>  {
>> +struct nvm_dev_geo *dev_geo = >dev_geo;
>>  struct nvm_tgt_dev *tgt_dev = NULL;
>>  struct nvm_dev_map *dev_rmap = dev->rmap;
>>  struct nvm_dev_map *dev_map;
>>  struct ppa_addr *luns;
>>  int nr_luns = lun_end - lun_begin + 1;
>>  int luns_left = nr_luns;
>> -int nr_chnls = nr_luns / dev->geo.nr_luns;
>> -int nr_chnls_mod = nr_luns % dev->geo.nr_luns;
>> -int bch = lun_begin / dev->geo.nr_luns;
>> -int blun = lun_begin % dev->geo.nr_luns;
>> +int nr_chnls = nr_luns / dev_geo->num_lun;
>> +int nr_chnls_mod = nr_luns % dev_geo->num_lun;
>> +int bch = lun_begin / dev_geo->num_lun;
>> +int blun = lun_begin % dev_geo->num_lun;
>>  int lunid = 0;
>>  int lun_balanced = 1;
>> -int prev_nr_luns;
>> +int sec_per_lun, prev_nr_luns;
>>  int i, j;
>>nr_chnls = (nr_chnls_mod == 0) ? nr_chnls : nr_chnls + 1;
>> @@ -173,15 +175,15 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>  if (!luns)
>>  goto err_luns;
>>  -prev_nr_luns = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +prev_nr_luns = (luns_left > dev_geo->num_lun) ?
>> +dev_geo->num_lun : luns_left;
>>  for (i = 0; i < nr_chnls; i++) {
>>  struct nvm_ch_map *ch_rmap = _rmap->chnls[i + bch];
>>  int *lun_roffs = ch_rmap->lun_offs;
>>  struct nvm_ch_map *ch_map = _map->chnls[i];
>>  int *lun_offs;
>> -int luns_in_chnl = (luns_left > dev->geo.nr_luns) ?
>> -dev->geo.nr_luns : luns_left;
>> +int luns_in_chnl = (luns_left > dev_geo->num_lun) ?
>> +dev_geo->num_lun : luns_left;
>>if (lun_balanced && prev_nr_luns != luns_in_chnl)
>>  lun_balanced = 0;
>> @@ -215,18 +217,23 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct 
>> nvm_dev *dev,
>>  if (!tgt_dev)
>>  goto err_ch;
>>  -memcpy(_dev->geo, >geo, sizeof(struct nvm_geo));
>>  /* Target device only owns a portion of the physical device */
>> -tgt_dev->geo.nr_chnls = nr_chnls;
>> +tgt_dev->geo.num_ch = nr_chnls;
>> +tgt_dev->geo.num_lun = (lun_balanced) ? prev_nr_luns : -1;
>>  tgt_dev->geo.all_luns = nr_luns;
>> -tgt_dev->geo.nr_luns = (lun_balanced) ? prev_nr_luns : -1;
>> 
Re: [PATCH V2 4/4] nvme: lightnvm: add late setup of block size and metadata

2018-02-12 Thread Javier Gonzalez

> On 9 Feb 2018, at 01.27, Matias Bjørling  wrote:
> 
> The nvme driver sets up the size of the nvme namespace in two steps.
> First it initializes the device with standard logical block and
> metadata sizes, and then sets the correct logical block and metadata
> size. Because the OCSSD 2.0 specification relies on the namespace to
> expose these sizes for correct initialization, let it be updated
> appropriately on the LightNVM side as well.
> 
> Signed-off-by: Matias Bjørling 
> ---
> 

This late initialization breaks lightnvm's core init since the sector
size (csecs) is used on the first init part to set the logical block size.

nvm_core_init -> blk_queue_logical_block_size(dev->q, dev_geo->c.csecs);

We can do a nvme_nvm_revalidate and set this on the revalidation path
instead of simply updating the info as in nvme_nvm_update_nvm_info().

Javier


signature.asc
Description: Message signed with OpenPGP

Re: [PATCH 3/4] lightnvm: add 2.0 geometry identification

2018-02-08 Thread Javier Gonzalez
> On 5 Feb 2018, at 13.15, Matias Bjørling  wrote:
> 
> Implement the geometry data structures for 2.0 and enable a drive
> to be identified as one, including exposing the appropriate 2.0
> sysfs entries.
> 
> Signed-off-by: Matias Bjørling 
> ---
> drivers/lightnvm/core.c  |   2 +-
> drivers/nvme/host/lightnvm.c | 334 +--
> include/linux/lightnvm.h |  11 +-
> 3 files changed, 295 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index c72863b36439..250e74dfa120 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -934,7 +934,7 @@ static int nvm_init(struct nvm_dev *dev)
>   pr_debug("nvm: ver:%x nvm_vendor:%x\n",
>   dev->identity.ver_id, dev->identity.vmnt);
> 
> - if (dev->identity.ver_id != 1) {
> + if (dev->identity.ver_id != 1 && dev->identity.ver_id != 2) {
>   pr_err("nvm: device not supported by kernel.");
>   goto err;
>   }
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 6412551ecc65..a9c010655ccc 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -184,6 +184,58 @@ struct nvme_nvm_bb_tbl {
>   __u8blk[0];
> };
> 
> +struct nvme_nvm_id20_addrf {
> + __u8grp_len;
> + __u8pu_len;
> + __u8chk_len;
> + __u8lba_len;
> + __u8resv[4];
> +};
> +
> +struct nvme_nvm_id20 {
> + __u8mjr;
> + __u8mnr;
> + __u8resv[6];
> +
> + struct nvme_nvm_id20_addrf lbaf;
> +
> + __u32   mccap;
> + __u8resv2[12];
> +
> + __u8wit;
> + __u8resv3[31];
> +
> + /* Geometry */
> + __u16   num_grp;
> + __u16   num_pu;
> + __u32   num_chk;
> + __u32   clba;
> + __u8resv4[52];
> +
> + /* Write data requirements */
> + __u32   ws_min;
> + __u32   ws_opt;
> + __u32   mw_cunits;
> + __u32   maxoc;
> + __u32   maxocpu;
> + __u8resv5[44];
> +
> + /* Performance related metrics */
> + __u32   trdt;
> + __u32   trdm;
> + __u32   twrt;
> + __u32   twrm;
> + __u32   tcrst;
> + __u32   tcrsm;
> + __u8resv6[40];
> +
> + /* Reserved area */
> + __u8resv7[2816];
> +
> + /* Vendor specific */
> + __u8vs[1024];
> +};
> 

All __u16, __u32 should be __le16, __le32

Javier


signature.asc
Description: Message signed with OpenPGP

Re: [PATCH 0/4] lightnvm: base 2.0 implementation

2018-02-08 Thread Javier Gonzalez
> On 5 Feb 2018, at 13.15, Matias Bjørling  wrote:
> 
> Hi,
> 
> A couple of patches for 2.0 support for the lightnvm subsystem. They
> form the basis for integrating 2.0 support.
> 
> For the rest of the support, Javier has code that implements report
> chunk and sets up the LBA format data structure. He also has a bunch
> of patches that brings pblk up to speed.
> 
> The first two patches are preparation for the 2.0 work. The third patch
> implements the 2.0 data structures, the geometry command, and exposes
> the sysfs attributes that comes with the 2.0 specification. Note that
> the attributes between 1.2 and 2.0 are different, and it is expected
> that user-space shall use the version sysfs attribute to know which
> attributes will be available.
> 
> The last patch implements support for using the nvme namespace logical
> block and metadata fields and sync it with the internal lightnvm
> identify structures.
> 
> -Matias
> 
> Matias Bjørling (4):
>  lightnvm: make 1.2 data structures explicit
>  lightnvm: flatten nvm_id_group into nvm_id
>  lightnvm: add 2.0 geometry identification
>  nvme: lightnvm: add late setup of block size and metadata
> 
> drivers/lightnvm/core.c  |  27 ++-
> drivers/nvme/host/core.c |   2 +
> drivers/nvme/host/lightnvm.c | 508 ---
> drivers/nvme/host/nvme.h |   2 +
> include/linux/lightnvm.h |  64 +++---
> 5 files changed, 426 insertions(+), 177 deletions(-)
> 
> --
> 2.11.0

Thanks for posting these. I have started rebasing my patches on top of
the new geometry - it is a bit different of how I implemented it, but
I'll take care of it.

I'll review as I go - some of the changes I have might make sense to
squash in your patches to keep a clean history...

I'll add a couple of patches abstracting the geometry so that at core.c
level we only work with a single geometry structure. This is the way it
is done in the early patches I pointed you to before. Then come patches
building bottom-up support for the new features in 2.0.

Javier


signature.asc
Description: Message signed with OpenPGP

Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

2018-02-04 Thread Javier Gonzalez

> On 4 Feb 2018, at 13.55, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 02/04/2018 11:37 AM, Javier Gonzalez wrote:
>>> On 31 Jan 2018, at 19.24, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>>> On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>>>>> On 31 Jan 2018, at 16.51, Matias Bjørling <m...@lightnvm.io> wrote:
>>>>> 
>>>> I have a patches abstracting this, which I think it makes it cleaner. I 
>>>> can push it next week for review. I’m traveling this week. (If you want to 
>>>> get a glimpse I can point you to the code).
>>> 
>>> Yes, please do. Thanks
>> This is the release candidate for 2.0 support based on 4.17. I'll rebase
>> on top of your 2.0 support. We'll see if all changes make it to 4.17
>> then.
>> https://github.com/OpenChannelSSD/linux/tree/for-4.17/spec20
>> Javier
> 
> Great. I look forward to the patches being cleaned up and posted. I do see 
> some nitpicks here and there, which we can probably take a couple of stabs at.

Sure. This is still in development; just wanted to point to the abstractions 
I’m thinking of so that we don’t do the same work twice. 

I’ll wait for posting until you do the 2.0 identify, since the old version is 
implemented on the first patch of this series. 

> One thing that generally stands out to me is the "if 1.2 support", else, ... 
> statements. These could be structured better by having dedicated setup 
> functions for 1.2 and 2.0.

We have this construction both in pblk and in core for address translation. 
Note that we need to have them separated to support multi instance and keep 
channels decoupled from each instance. 

I assume 2 if...then is cheaper than doing 2 de-references to function 
pointers. This is the way it is done on legacy paths in other places (e.g., non 
mq scsi), but I can look into how pointer functions would look like and measure 
the performance impact. 

Javier

Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

2018-02-04 Thread Javier Gonzalez

> On 31 Jan 2018, at 19.24, Matias Bjørling <m...@lightnvm.io> wrote:
> 
> On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>>> On 31 Jan 2018, at 16.51, Matias Bjørling <m...@lightnvm.io> wrote:
>>> 
>> I have a patches abstracting this, which I think it makes it cleaner. I can 
>> push it next week for review. I’m traveling this week. (If you want to get a 
>> glimpse I can point you to the code).
> 
> Yes, please do. Thanks

This is the release candidate for 2.0 support based on 4.17. I'll rebase
on top of your 2.0 support. We'll see if all changes make it to 4.17
then.

https://github.com/OpenChannelSSD/linux/tree/for-4.17/spec20

Javier


signature.asc
Description: Message signed with OpenPGP

