[PATCH] MIPS: wrong usage of l_exc_copy in octeon-memcpy.S

2017-02-19 Thread jianchao.wang
src could be greater than bad addr. We will fix it in this patch. We add the max offset of LOAD, 15, here to fix the issue without adding new instructions. However, the side effect is that, when LOAD fails in a few cases, l_exc_copy has to copy more. Signed-off-by: jianchao.wang <jianchao.w...@gmail.

Re: [PATCH] block: move sanity checking ahead of bi_front/back_seg_size updating

2017-09-19 Thread jianchao.wang
On 09/19/2017 10:36 PM, Christoph Hellwig wrote: > On Tue, Sep 19, 2017 at 08:55:59AM +0800, jianchao.wang wrote: >>> But can you elaborate a little more on how this was found and if there >>> is a way to easily reproduce it, say for a blktests test case? >>>

Re: [PATCH] block: consider merge of segments when merge bio into rq

2017-09-20 Thread jianchao.wang
On 09/21/2017 09:29 AM, Christoph Hellwig wrote: > So the check change here looks good to me. > > I don't like the duplicate code, can you look into sharing > the new segment checks between the two functions and the existing > instance in ll_merge_requests_fn by passing say two struct bio

Re: [PATCH] block: move sanity checking ahead of bi_front/back_seg_size updating

2017-09-18 Thread jianchao.wang
On 09/19/2017 07:51 AM, Christoph Hellwig wrote: > On Sat, Sep 16, 2017 at 07:10:30AM +0800, Jianchao Wang wrote: >> If the bio_integrity_merge_rq() return false or nr_phys_segments exceeds >> the max_segments, the merging fails, but the bi_front/back_seg_size may >> have been modified. To avoid

Re: [PATCH] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-09-12 Thread jianchao.wang
On 09/13/2017 09:24 AM, Ming Lei wrote: > On Wed, Sep 13, 2017 at 09:01:25AM +0800, jianchao.wang wrote: >> Hi ming >> >> On 09/12/2017 06:23 PM, Ming Lei wrote: >>>> @@ -1029,14 +1029,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue

Re: [PATCH] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-09-12 Thread jianchao.wang
On 09/13/2017 11:54 AM, Jens Axboe wrote: > On 09/12/2017 09:39 PM, jianchao.wang wrote: >>> Exactly, and especially the readability is the key element here. It's >>> just not worth it to try and be too clever, especially not for >>> something like this. When you r

Re: [PATCH] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-09-12 Thread jianchao.wang
On 09/13/2017 10:23 AM, Jens Axboe wrote: > On 09/12/2017 07:39 PM, jianchao.wang wrote: >> >> >> On 09/13/2017 09:24 AM, Ming Lei wrote: >>> On Wed, Sep 13, 2017 at 09:01:25AM +0800, jianchao.wang wrote: >>>> Hi ming >>>> >>>> O

Re: [PATCH] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-09-12 Thread jianchao.wang
On 09/13/2017 10:45 AM, Jens Axboe wrote: @@ -1029,14 +1029,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) if (list_empty(list)) bd.last = true; else {

Re: [PATCH] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-09-12 Thread jianchao.wang
Hi ming On 09/12/2017 06:23 PM, Ming Lei wrote: >> @@ -1029,14 +1029,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue >> *q, struct list_head *list) >> if (list_empty(list)) >> bd.last = true; >> else { >> -struct request

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread jianchao.wang
Hi tejun On 12/13/2017 01:26 AM, Tejun Heo wrote: > Hello, again. > > Sorry, I missed part of your comment in the previous reply. > > On Tue, Dec 12, 2017 at 06:09:32PM +0800, jianchao.wang wrote: >>> static void __blk_mq_requeue_request(struct request *rq) >

Re: [PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-12 Thread jianchao.wang
Hi tejun On 12/10/2017 03:25 AM, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout by later patches, > which will also add the

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-14 Thread jianchao.wang
On 12/15/2017 05:54 AM, Peter Zijlstra wrote: > On Thu, Dec 14, 2017 at 09:42:48PM +, Bart Van Assche wrote: >> On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote: >>> On Thu, Dec 14, 2017 at 06:51:11PM +, Bart Van Assche wrote: On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-15 Thread jianchao.wang
On 12/15/2017 03:31 PM, Peter Zijlstra wrote: > On Fri, Dec 15, 2017 at 10:12:50AM +0800, jianchao.wang wrote: >>> That only makes it a little better: >>> >>> Task-A Worker >>> >>> write_seqcount_begi

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-13 Thread jianchao.wang
On 12/14/2017 12:09 AM, Tejun Heo wrote: > Hello, > > On Wed, Dec 13, 2017 at 11:05:11AM +0800, jianchao.wang wrote: >> Just don't quite understand the strict definition of "generation" you said. >> A allocation->free cycle is a generation ? or a idle-&

Re: [PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-13 Thread jianchao.wang
On 12/14/2017 12:13 AM, Tejun Heo wrote: > Hello, > > On Wed, Dec 13, 2017 at 11:30:48AM +0800, jianchao.wang wrote: >>> + } else { >>> + srcu_idx = srcu_read_lock(hctx->queue_rq_srcu); >>> + if (!blk_mark_rq_complete(rq))

Re: [PATCH 1/6] blk-mq: protect completion path with RCU

2017-12-12 Thread jianchao.wang
Hello tejun Sorry for missing the V2, same comment again. On 12/13/2017 03:01 AM, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-12 Thread jianchao.wang
ut by Jianchao. > - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter. > - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter. > > Signed-off-by: Tejun Heo <t...@kernel.org> > Cc: "jianchao.wang" <jianchao.w.w...@oracle.

BUG: scsi/qla2xxx: scsi_done from qla2x00_sp_compl race with scsi_queue_insert from abort handler

2017-11-07 Thread jianchao.wang
Hi [1.] One line summary of the problem: scsi_done from qla2x00_sp_compl races with scsi_queue_insert from scmd_eh_abort_handler() and then causes the BUG_ON(blk_queued_rq(req)) to trigger. [2.] Full description of the problem/report: The detailed scene is as follows: cpu A

Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED

2017-12-12 Thread jianchao.wang
Hi Tejun On 12/10/2017 03:25 AM, Tejun Heo wrote: > After the recent updates to use generation number and state based > synchronization, we can easily replace REQ_ATOM_STARTED usages by > adding an extra state to distinguish completed but not yet freed > state. > > Add MQ_RQ_COMPLETE and replace

Re: [PATCH] nvme-rdma: fix double free in nvme_rdma_free_queue

2018-05-08 Thread jianchao.wang
Hi Christoph On 05/07/2018 08:27 PM, Christoph Hellwig wrote: > On Fri, May 04, 2018 at 04:02:18PM +0800, Jianchao Wang wrote: >> BUG: KASAN: double-free or invalid-free in nvme_rdma_free_queue+0xf6/0x110 >> [nvme_rdma] >> Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma] >> Call

Re: [PATCH] nvme: unquiesce the queue before cleaup it

2018-04-27 Thread jianchao.wang
I'll add IsraelR proposed fix to nvme-rdma that is currently on hold and see > what happens. > Nontheless, I don't like the situation that the reset and delete flows can > run concurrently. > > -Max. > > On 4/26/2018 11:27 AM, jianchao.wang wrote: >> Hi Max >> >

Re: [PATCH] nvme: unquiesce the queue before cleaup it

2018-04-28 Thread jianchao.wang
Hi Max On 04/27/2018 04:51 PM, jianchao.wang wrote: > Hi Max > > On 04/26/2018 06:23 PM, Max Gurtovoy wrote: >> Hi Jianchao, >> I actually tried this scenario with real HW and was able to repro the hang. >> Unfortunatly, after applying your patch I got NULL deref: >

Re: [PATCH] nvme-rdma: clear NVME_RDMA_Q_LIVE before free the queue

2018-05-16 Thread jianchao.wang
Hi Sagi On 05/09/2018 11:06 PM, Sagi Grimberg wrote: > The correct fix would be to add a tag for stop_queue and call > nvme_rdma_stop_queue() in all the failure cases after > nvme_rdma_start_queue. Would you please look at the V2 in following link ?

Re: [PATCH V2] nvme-rdma: fix double free in nvme_rdma_free_queue

2018-05-17 Thread jianchao.wang
Hi Max Thanks for kindly review and suggestion for this. On 05/16/2018 08:18 PM, Max Gurtovoy wrote: > I don't know exactly what Christoph meant but IMO the best place to allocate > it is in nvme_rdma_alloc_queue just before calling > > "set_bit(NVME_RDMA_Q_ALLOCATED, >flags);" > > then you

Re: [PATCH] block: kyber: make kyber more friendly with merging

2018-05-22 Thread jianchao.wang
Hi Omar Thanks for your kindly response. On 05/23/2018 04:02 AM, Omar Sandoval wrote: > On Tue, May 22, 2018 at 10:48:29PM +0800, Jianchao Wang wrote: >> Currently, kyber is very unfriendly with merging. kyber depends >> on ctx rq_list to do merging, however, most of time, it will not >> leave

Re: [PATCH] block: kyber: make kyber more friendly with merging

2018-05-22 Thread jianchao.wang
Hi Jens and Holger Thank for your kindly response. That's really appreciated. I will post next version based on Jens' patch. Thanks Jianchao On 05/23/2018 02:32 AM, Holger Hoffstätte wrote: This looks great but prevents kyber from being built as module, which is AFAIK supposed to

BUG: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-22 Thread jianchao.wang
Hi all Our customer met a panic triggered by BUG_ON in blk_finish_request. From the dmesg log, the BUG_ON was triggered after command abort occurred many times. There is a race condition in the following scenario. cpu A cpu B kworker

Re: BUG: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-23 Thread jianchao.wang
Would anyone please take a look at this ? Thanks in advance Jianchao On 05/23/2018 11:55 AM, jianchao.wang wrote: > > > Hi all > > Our customer met a panic triggered by BUG_ON in blk_finish_request. > From the dmesg log, the BUG_ON was triggered after command abort o

Re: BUG: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-24 Thread jianchao.wang
his issue. > > Thanks, > Himanshu > >> -Original Message- >> From: jianchao.wang [mailto:jianchao.w.w...@oracle.com] >> Sent: Wednesday, May 23, 2018 6:51 PM >> To: Dept-Eng QLA2xxx Upstream <qla2xxx-upstr...@cavium.com>; Madhani, >> Himansh

Re: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-25 Thread jianchao.wang
d for us to look at this in details. > > Can you provide me crash/vmlinux/modules for details analysis. > > Thanks, > himanshu > > On 5/24/18, 6:49 AM, "Madhani, Himanshu" <himanshu.madh...@cavium.com> wrote: > > > > On May 24, 2018, at 2:09

Re: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-28 Thread jianchao.wang
Hi Himanshu do you need any other information ? Thanks Jianchao On 05/25/2018 02:48 PM, jianchao.wang wrote: > Hi Himanshu > > I'm afraid I cannot provide you the vmcore file, it is from our customer. > If any information needed in the vmcore, I could provide with you. >

Re: [PATCH V2 2/2] block: kyber: make kyber more friendly with merging

2018-05-29 Thread jianchao.wang
Hi Omar Thanks for your kindly and detailed comment. That's really appreciated. :) On 05/30/2018 02:55 AM, Omar Sandoval wrote: > On Wed, May 23, 2018 at 02:33:22PM +0800, Jianchao Wang wrote: >> Currently, kyber is very unfriendly with merging. kyber depends >> on ctx rq_list to do merging,

Re: scsi/qla2xxx: BUG_ON(blk_queued_rq(req)) is triggered in blk_finish_request

2018-05-29 Thread jianchao.wang
_request. however, the scsi recovery context could clear the ATOM_COMPLETE and requeue the request before irq context get it. Thanks Jianchao > > On 5/28/18, 6:11 PM, "jianchao.wang" wrote: > > Hi Himanshu > > do you need any other information ? >

Re: [PATCH] blk-mq: use blk_mq_timeout_work to limit the max timeout

2018-06-19 Thread jianchao.wang
On 06/20/2018 09:35 AM, Bart Van Assche wrote: > On Wed, 2018-06-20 at 09:28 +0800, jianchao.wang wrote: >> Hi Bart >> >> Thanks for your kindly response. >> >> On 06/19/2018 11:18 PM, Bart Van Assche wrote: >>> On Tue, 2018-06-19 at 15:00 +0800

Re: [PATCH] nvme-pci: not invoke nvme_remove_dead_ctrl when change state fails

2018-06-19 Thread jianchao.wang
Hi Keith On 06/20/2018 12:39 AM, Keith Busch wrote: > On Tue, Jun 19, 2018 at 04:30:50PM +0800, Jianchao Wang wrote: >> There is race between nvme_remove and nvme_reset_work that can >> lead to io hang. >> >> nvme_removenvme_reset_work >> -> change state to DELETING >>

Re: [PATCH] nvme: unquiesce the queue before cleaup it

2018-04-26 Thread jianchao.wang
-> blk_freeze_queue This patch could also fix this issue. Thanks Jianchao On 04/22/2018 11:00 PM, jianchao.wang wrote: > Hi Max > > That's really appreciated! > Here is my test script. > > loop_reset_controller.sh > #!/bin/bash > while true > do >

Re: testing io.low limit for blk-throttle

2018-04-26 Thread jianchao.wang
Hi Tejun and Joseph On 04/27/2018 02:32 AM, Tejun Heo wrote: > Hello, > > On Tue, Apr 24, 2018 at 02:12:51PM +0200, Paolo Valente wrote: >> +Tejun (I guess he might be interested in the results below) > > Our experiments didn't work out too well either. At this point, it > isn't clear whether

Re: [PATCH] nvme-pci: fix the timeout case when reset is ongoing

2018-01-04 Thread jianchao.wang
Hi Christoph Many thanks for your kindly response. On 01/04/2018 06:35 PM, Christoph Hellwig wrote: > On Wed, Jan 03, 2018 at 06:31:44AM +0800, Jianchao Wang wrote: >> NVME_CTRL_RESETTING used to indicate the range of nvme initializing >> strictly in fd634f41(nvme: merge probe_work and

Re: [PATCH V2] nvme-pci: fix NULL pointer reference in nvme_alloc_ns

2018-01-04 Thread jianchao.wang
Hi Christoph Many thanks for your kindly response. On 01/04/2018 06:20 PM, Christoph Hellwig wrote: > This looks generally fine to me, ut a few nitpicks below: > >> - Based on Sagi's suggestion, add new state NVME_CTRL_ADMIN_LIVE. > > Maybe call this NVME_CTRL_ADMIN_ONLY ? Sound more in line

Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2017-12-21 Thread jianchao.wang
Sorry for my non-detailed description. On 12/21/2017 09:50 PM, Tejun Heo wrote: > Hello, > > On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote: >> It's worrying that even though the blk_mark_rq_complete() here is intended >> to synchronize with >> timeo

Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread jianchao.wang
Hi tejun Many thanks for your kindly response. On 01/09/2018 11:37 AM, Tejun Heo wrote: > Hello, > > On Tue, Jan 09, 2018 at 11:08:04AM +0800, jianchao.wang wrote: >>> But what'd prevent the completion reinitializing the request and then >>> the actual completion pa

Re: [PATCH V2 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-08 Thread jianchao.wang
Hi Keith On 01/08/2018 11:26 PM, Keith Busch wrote: > On Tue, Jan 09, 2018 at 10:03:11AM +0800, Jianchao Wang wrote: >> Hello > > Sorry for the distraction, but could you possibly fix the date on your > machine? For some reason, lists.infradead.org sorts threads by the time > you claim to have

Re: [PATCH 5/7] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread jianchao.wang
Hi tejun Many thanks for you kindly response. On 01/09/2018 01:27 AM, Tejun Heo wrote: > Hello, Jianchao. > > On Fri, Dec 22, 2017 at 12:02:20PM +0800, jianchao.wang wrote: >>> On Thu, Dec 21, 2017 at 11:56:49AM +0800, jianchao.wang wrote: >>>>

Re: [Suspected-Phishing]Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-15 Thread jianchao.wang
On 01/16/2018 01:57 PM, jianchao.wang wrote: > Hi Max > > Thanks for your kindly comment. > > On 01/15/2018 09:36 PM, Max Gurtovoy wrote: >>>>>   case NVME_CTRL_RECONNECTING: >>>>>   switch (old_state) { >>>>>   c

Re: [Suspected-Phishing]Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-15 Thread jianchao.wang
Hi Max Thanks for your kindly comment. On 01/15/2018 09:36 PM, Max Gurtovoy wrote:   case NVME_CTRL_RECONNECTING:   switch (old_state) {   case NVME_CTRL_LIVE:   case NVME_CTRL_RESETTING: +    case NVME_CTRL_RESET_PREPARE: > > I forget to

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-16 Thread jianchao.wang
Hi Ming On 01/12/2018 10:53 AM, Ming Lei wrote: > From: Christoph Hellwig > > The previous patch assigns interrupt vectors to all possible CPUs, so > now hctx can be mapped to possible CPUs, this patch applies this fact > to simplify queue mapping & schedule so that we don't need

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-14 Thread jianchao.wang
On 01/13/2018 05:19 AM, Bart Van Assche wrote: > Sorry but I only retrieved the blk-mq debugfs several minutes after the hang > started so I'm not sure the state information is relevant. Anyway, I have > attached > it to this e-mail. The most remarkable part is the following: > >

Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-15 Thread jianchao.wang
Hi max Thanks for your kindly response and comment. On 01/15/2018 09:28 PM, Max Gurtovoy wrote: >>> >> >> setting RESET_PREPARE here?? >> >> Also, the error recovery code is mutually excluded from reset_work >> by trying to set the same state which is protected by the ctrl state >> machine, so a

[BUG] do_IRQ: 7.33 No irq handler for vector

2018-01-19 Thread jianchao.wang
Hi Thomas When I did cpu hotplug stress test, I found this log on my machine. [ 267.161043] do_IRQ: 7.33 No irq handler for vector I add a dump_stack below the bug and get following log: [ 267.161043] do_IRQ: 7.33 No irq handler for vector [ 267.161045] CPU: 7 PID: 52 Comm: migration/7 Not

Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-19 Thread jianchao.wang
Hi Keith Thanks for your kindly response. On 01/19/2018 07:52 PM, Keith Busch wrote: > On Fri, Jan 19, 2018 at 05:02:06PM +0800, jianchao.wang wrote: >> We should not use blk_sync_queue here, the requeue_work and run_work will be >> canceled. >> Just flush_work(>ti

Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-20 Thread jianchao.wang
Hi Keith Thanks for you kindly response. On 01/20/2018 10:11 AM, Keith Busch wrote: > On Fri, Jan 19, 2018 at 09:56:48PM +0800, jianchao.wang wrote: >> In nvme_dev_disable, the outstanding requests will be requeued finally. >> I'm afraid the requests requeued on the

Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-20 Thread jianchao.wang
On 01/20/2018 10:07 PM, jianchao.wang wrote: > Hi Keith > > Thanks for you kindly response. > > On 01/20/2018 10:11 AM, Keith Busch wrote: >> On Fri, Jan 19, 2018 at 09:56:48PM +0800, jianchao.wang wrote: >>> In nvme_dev_disable, the outstanding requests wil

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-19 Thread jianchao.wang
mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) { + dma_wmb(); *ring->wqres.db.db = cpu_to_be32(ring->prod & 0x); } I analyzed the kdump, it should be a memory corruption. Thanks Jianchao On 01/15/2018 01:50 PM, jianchao.wang wrote: > Hi Tariq > > Tha

Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread jianchao.wang
Hi Keith Thanks for your kindly response and directive. On 01/19/2018 12:59 PM, Keith Busch wrote: > On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote: >> + * - When the ctrl.state is NVME_CTRL_RESETTING, the expired >> + * request should come from the previous work and we

Re: [PATCH V5 2/2] nvme-pci: fixup the timeout case when reset is ongoing

2018-01-18 Thread jianchao.wang
Hi Keith Thanks for your kindly reminding. On 01/19/2018 02:05 PM, Keith Busch wrote: >>> The driver may be giving up on the command here, but that doesn't mean >>> the controller has. We can't just end the request like this because that >>> will release the memory the controller still owns. We

Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-19 Thread jianchao.wang
Hi Keith Thanks for your time to look into this. On 01/19/2018 04:01 PM, Keith Busch wrote: > On Thu, Jan 18, 2018 at 06:10:00PM +0800, Jianchao Wang wrote: >> Hello >> >> Please consider the following scenario. >> nvme_reset_ctrl >> -> set state to RESETTING >> -> queue reset_work >>

Re: [PATCH V5 1/2] nvme-pci: introduce RECONNECTING state to mark initializing procedure

2018-01-19 Thread jianchao.wang
Hi Max Thanks for your kindly comment and response. On 01/18/2018 06:17 PM, Max Gurtovoy wrote: > > On 1/18/2018 12:10 PM, Jianchao Wang wrote: >> After Sagi's commit (nvme-rdma: fix concurrent reset and reconnect), >> both nvme-fc/rdma have following pattern: >> RESETTING    - quiesce blk-mq

Re: [PATCH V5 0/2] nvme-pci: fix the timeout case when reset is ongoing

2018-01-19 Thread jianchao.wang
Hi Keith Thanks for your kindly and detailed response and patch. On 01/19/2018 04:42 PM, Keith Busch wrote: > On Fri, Jan 19, 2018 at 04:14:02PM +0800, jianchao.wang wrote: >> On 01/19/2018 04:01 PM, Keith Busch wrote: >>> The nvme_dev_disable routine makes forward progress

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread jianchao.wang
Hi Eric On 01/22/2018 12:43 AM, Eric Dumazet wrote: > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: >> >> On 21/01/2018 11:31 AM, Tariq Toukan wrote: >>> >>> >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>> On Fri, 2018-01-

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-21 Thread jianchao.wang
Hi Tariq and all Many thanks for your kindly and detailed response and comment. On 01/22/2018 12:24 AM, Tariq Toukan wrote: > > > On 21/01/2018 11:31 AM, Tariq Toukan wrote: >> >> >> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>> On Fri, 2018-01-19 at 23:16

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-16 Thread jianchao.wang
Hi ming Thanks for your kindly response. On 01/17/2018 11:52 AM, Ming Lei wrote: >> It is here. >> __blk_mq_run_hw_queue() >> >> WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) && >> cpu_online(hctx->next_cpu)); > I think this warning is triggered after the CPU

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-16 Thread jianchao.wang
Hi ming Thanks for your patch and kindly response. On 01/16/2018 11:32 PM, Ming Lei wrote: > OK, I got it, and it should have been the only corner case in which > all CPUs mapped to this hctx become offline, and I believe the following > patch should address this case, could you give a test? >

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi Max Thanks for your kindly response. I have merged the response to you together below. On 01/17/2018 05:06 PM, Max Gurtovoy wrote: >>   case NVME_CTRL_RECONNECTING: >>   switch (old_state) { >>   case NVME_CTRL_LIVE: >> -    case NVME_CTRL_RESETTING: >> +    case

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-17 Thread jianchao.wang
Hi ming Thanks for your kindly response. On 01/17/2018 02:22 PM, Ming Lei wrote: > This warning can't be removed completely, for example, the CPU figured > in blk_mq_hctx_next_cpu(hctx) can be put on again just after the > following call returns and before __blk_mq_run_hw_queue() is scheduled >

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
Hi James Thanks for you detailed, kindly response and directive. That's really appreciated. On 01/18/2018 02:24 PM, James Smart wrote: >> So in the patch, RESETTING in nvme-fc/rdma is changed to RESET_PREPARE. Then >> we get: >> nvme-fc/rdma RESET_PREPARE -> RECONNECTING -> LIVE >> nvme-pci

Re: [Suspected-Phishing]Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-17 Thread jianchao.wang
Hi James and Sagi Thanks for your kindly response and directive. On 01/18/2018 05:08 AM, James Smart wrote: > On 1/17/2018 2:37 AM, Sagi Grimberg wrote: >> >>> After Sagi's nvme-rdma: fix concurrent reset and reconnect, the rdma ctrl >>> state is changed to RECONNECTING state >>> after some

Re: [PATCH V4 1/2] nvme: add NVME_CTRL_RESET_PREPARE state

2018-01-17 Thread jianchao.wang
the transport will now attempt to reconnect (perhaps several > attempts) to create a new link-side association. Stays in this state until > the controller is fully reconnected and it transitions to NVME_LIVE.   Until > the link side association is active, queues do what they do (as left by

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-16 Thread jianchao.wang
Hi minglei On 01/16/2018 08:10 PM, Ming Lei wrote: >>> - next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask); >>> + next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask, >>> + cpu_online_mask); >>> if (next_cpu >= nr_cpu_ids) >>> -

Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-14 Thread jianchao.wang
Hi keith Thanks for your kindly review and response. On 01/14/2018 05:48 PM, Sagi Grimberg wrote: > >> Currently, the ctrl->state will be changed to NVME_CTRL_RESETTING >> before queue the reset work. This is not so strict. There could be >> a big gap before the reset_work callback is invoked.

Re: [PATCH V3 1/2] nvme: split resetting state into reset_prepate and resetting

2018-01-14 Thread jianchao.wang
On 01/15/2018 10:11 AM, Keith Busch wrote: > On Mon, Jan 15, 2018 at 10:02:04AM +0800, jianchao.wang wrote: >> Hi keith >> >> Thanks for your kindly review and response. > > I agree with Sagi's feedback, but I can't take credit for it. :) > ahh...but still thank

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-14 Thread jianchao.wang
Hi Tariq Thanks for your kindly response. On 01/14/2018 05:47 PM, Tariq Toukan wrote: > Thanks Jianchao for your patch. > > And Thank you guys for your reviews, much appreciated. > I was off-work on Friday and Saturday. > > On 14/01/2018 4:40 AM, jianchao.wang wrote: >&g

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-24 Thread jianchao.wang
Hi Eric Thanks for you kindly response and suggestion. That's really appreciated. Jianchao On 01/25/2018 11:55 AM, Eric Dumazet wrote: > On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: >> Hi Tariq >> >> On 01/22/2018 10:12 AM, jianchao.wang wrote: >>>

Re: [PATCH RESENT] nvme-pci: introduce RECONNECTING state to mark initializing procedure

2018-01-24 Thread jianchao.wang
Hi Keith If you have time, can have a look at this. That's really appreciated and thanks in advance. :) Jianchao On 01/22/2018 10:03 PM, Jianchao Wang wrote: > After Sagi's commit (nvme-rdma: fix concurrent reset and reconnect), > both nvme-fc/rdma have following pattern: > RESETTING-

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-24 Thread jianchao.wang
Hi Tariq On 01/22/2018 10:12 AM, jianchao.wang wrote: >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>> Hi Tariq >>>>> >>>>> Very sad that the crash was reproduced ag

Re: [PATCH] nvme-pci: calculate iod and avg_seg_size just before use them

2018-01-11 Thread jianchao.wang
Hi Keith Thanks for your kindly response. On 01/11/2018 11:48 PM, Keith Busch wrote: > On Thu, Jan 11, 2018 at 01:09:39PM +0800, Jianchao Wang wrote: >> The calculation of iod and avg_seg_size maybe meaningless if >> nvme_pci_use_sgls returns before uses them. So calculate >> just before use

Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-27 Thread jianchao.wang
Hi Tariq Thanks for your kindly response. That's really appreciated. On 01/25/2018 05:54 PM, Tariq Toukan wrote: > > > On 25/01/2018 8:25 AM, jianchao.wang wrote: >> Hi Eric >> >> Thanks for you kindly response and suggestion. >> That's really appreciated. >

Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem

2018-01-29 Thread jianchao.wang
On 01/29/2018 11:07 AM, Jianchao Wang wrote: > nvme_set_host_mem will invoke nvme_alloc_request without NOWAIT > flag, it is unsafe for nvme_dev_disable. The adminq driver tags > may have been used up when the previous outstanding adminq requests > cannot be completed due to some hardware error.

Re: [PATCH] nvme-pci: use NOWAIT flag for nvme_set_host_mem

2018-01-29 Thread jianchao.wang
Hi Keith and Sagi Thanks for your kindly response. :) On 01/30/2018 04:17 AM, Keith Busch wrote: > On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote: >>> Thanks for the fix. It looks like we still have a problem, though. >>> Commands submitted with the "shutdown_lock" held need to be

Re: [PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-04 Thread jianchao.wang
Hi Keith Thanks for you kindly response and comment. That's really appreciated. On 02/03/2018 02:31 AM, Keith Busch wrote: > On Fri, Feb 02, 2018 at 03:00:47PM +0800, Jianchao Wang wrote: >> Currently, the complicated relationship between nvme_dev_disable >> and nvme_timeout has become a devil

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-04 Thread jianchao.wang
Hi Keith Thanks for your kindly response. On 02/03/2018 02:24 AM, Keith Busch wrote: > On Fri, Feb 02, 2018 at 03:00:45PM +0800, Jianchao Wang wrote: >> Currently, request queue will be frozen and quiesced for both reset >> and shutdown case. This will trigger ioq requests in RECONNECTING >>

Re: [PATCH 1/6] nvme-pci: move clearing host mem behind stopping queues

2018-02-04 Thread jianchao.wang
Hi Keith Thanks for your kindly response and directive. On 02/03/2018 02:46 AM, Keith Busch wrote: > This one makes sense, though I would alter the change log to something > like: > > This patch quiecses new IO prior to disabling device HMB access. > A controller using HMB may be relying on

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-05 Thread jianchao.wang
Hi Keith Thanks for your kindly response. On 02/05/2018 11:13 PM, Keith Busch wrote: > but how many requests are you letting enter to their demise by > freezing on the wrong side of the reset? There are only two difference with this patch from the original one. 1. Don't freeze the queue for

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-07 Thread jianchao.wang
Hi Keith Really thanks for your your precious time and kindly directive. That's really appreciated. :) On 02/08/2018 12:13 AM, Keith Busch wrote: > On Wed, Feb 07, 2018 at 10:13:51AM +0800, jianchao.wang wrote: >> What's the difference ? Can you please point out. >> I

Re: [PATCH V2 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-08 Thread jianchao.wang
Hi Keith and Sagi Many thanks for your kindly response. That's really appreciated. On 02/09/2018 01:56 AM, Keith Busch wrote: > On Thu, Feb 08, 2018 at 05:56:49PM +0200, Sagi Grimberg wrote: >> Given the discussion on this set, you plan to respin again >> for 4.16? > > With the exception of

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-08 Thread jianchao.wang
Hi Keith, Thanks for your precious time and kind response. On 02/08/2018 11:15 PM, Keith Busch wrote: > On Thu, Feb 08, 2018 at 10:17:00PM +0800, jianchao.wang wrote: >> There is a dangerous scenario which caused by nvme_wait_freeze in >> nvme_reset_work. >

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-06 Thread jianchao.wang
Hi Keith, Sorry for bothering you again. On 02/07/2018 10:03 AM, jianchao.wang wrote: > Hi Keith > > Thanks for your time and kindly response on this. > > On 02/06/2018 11:13 PM, Keith Busch wrote: >> On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote: >>

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-06 Thread jianchao.wang
Hi Keith, Thanks for your time and kind response on this. On 02/06/2018 11:13 PM, Keith Busch wrote: > On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote: >> Hi Keith >> >> Thanks for your kindly response. >> >> On 02/05/2018 11:13 PM, Keith Busch

Re: [PATCH RESENT] nvme-pci: suspend queues based on online_queues

2018-02-12 Thread jianchao.wang
Hi Sagi, Thanks for your kind response. On 02/13/2018 02:37 AM, Sagi Grimberg wrote: > >> nvme cq irq is freed based on queue_count. When the sq/cq creation >> fails, irq will not be setup. free_irq will warn 'Try to free >> already-free irq'. >> >> To fix it, we only increase online_queues

Re: [PATCH] nvme-pci: drain the entered requests after ctrl is shutdown

2018-02-12 Thread jianchao.wang
Hi Keith and Sagi, Thanks for your kind response and comment on this. On 02/13/2018 03:15 AM, Keith Busch wrote: > On Mon, Feb 12, 2018 at 08:43:58PM +0200, Sagi Grimberg wrote: >> >>> Currently, we will unquiesce the queues after the controller is >>> shutdown to avoid residual requests to be

Re: [PATCH V2 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-09 Thread jianchao.wang
Hi Keith, Thanks for your kind response here. That's really appreciated. On 02/10/2018 01:12 AM, Keith Busch wrote: > On Fri, Feb 09, 2018 at 09:50:58AM +0800, jianchao.wang wrote: >> >> if we set NVME_REQ_CANCELLED and return BLK_EH_HANDLED as the RESETTING case, >> nvme

Re: [PATCH V2 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-09 Thread jianchao.wang
Hi Keith, On 02/10/2018 10:32 AM, jianchao.wang wrote: > Hi Keith > > Thanks for your kindly response here. > That's really appreciated. > > On 02/10/2018 01:12 AM, Keith Busch wrote: >> On Fri, Feb 09, 2018 at 09:50:58AM +0800, jianchao.wang wrote: >>>

Re: [PATCH V2 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-10 Thread jianchao.wang
On 02/10/2018 10:32 AM, jianchao.wang wrote: > Hi Keith > > Thanks for your kindly response here. > That's really appreciated. > > On 02/10/2018 01:12 AM, Keith Busch wrote: >> On Fri, Feb 09, 2018 at 09:50:58AM +0800, jianchao.wang wrote: >>> >>>

Re: [PATCH 2/9] nvme: fix the deadlock in nvme_update_formats

2018-02-11 Thread jianchao.wang
Hi Sagi, Thanks for your kind response and guidance. That's really appreciated. On 02/11/2018 07:16 PM, Sagi Grimberg wrote: >>   mutex_lock(&ctrl->namespaces_mutex); >>   list_for_each_entry(ns, &ctrl->namespaces, list) { >> -    if (ns->disk && nvme_revalidate_disk(ns->disk)) >> -

Re: [PATCH 8/9] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-11 Thread jianchao.wang
Hi Sagi, Just a small supplement here. On 02/12/2018 10:16 AM, jianchao.wang wrote: >> I think this is going in the wrong direction. Every state that is needed >> to handle serialization should be done in core ctrl state. Moreover, >> please try to avoid handling this locally

Re: [PATCH 8/9] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-11 Thread jianchao.wang
Hi Sagi, Thanks for your kind reminder and guidance. That's really appreciated. On 02/11/2018 07:36 PM, Sagi Grimberg wrote: > Jianchao, >> Currently, the complicated relationship between nvme_dev_disable >> and nvme_timeout has become a devil that will introduce many >> circular pattern

Re: [PATCH 4/9] nvme-pci: quiesce IO queues prior to disabling device HMB accesses

2018-02-11 Thread jianchao.wang
Hi Sagi, Thanks for your kind reminder. On 02/11/2018 07:19 PM, Sagi Grimberg wrote: > >> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c >> index 6fe7af0..00cffed 100644 >> --- a/drivers/nvme/host/pci.c >> +++ b/drivers/nvme/host/pci.c >> @@ -2186,7 +2186,10 @@ static void

Re: [PATCH 3/9] nvme: change namespaces_mutext to namespaces_rwsem

2018-02-11 Thread jianchao.wang
Hi Sagi, Thanks for your kind response, and sorry for my unclear description. On 02/11/2018 07:17 PM, Sagi Grimberg wrote: >> namespaces_mutex is used to synchronize the operations on the ctrl >> namespaces list. Most of the time, it is a read operation. It is >> better to change it from mutex to

Re: [PATCH RESENT] nvme-pci: suspend queues based on online_queues

2018-02-14 Thread jianchao.wang
Hi Keith, Thanks for your kind response and guidance. And happy Chinese New Year: wishing you prosperity and good fortune!! On 02/14/2018 05:52 AM, Keith Busch wrote: > On Mon, Feb 12, 2018 at 09:05:13PM +0800, Jianchao Wang wrote: >> @@ -1315,9 +1315,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq) >> nvmeq->cq_vector = -1; >>

Re: [PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-08 Thread jianchao.wang
being submitted. Looking forward to your precious advice. Sincerely, Jianchao On 02/08/2018 09:40 AM, jianchao.wang wrote: > Hi Keith > > > Really thanks for your your precious time and kindly directive. > That's really appreciated. :) > > On 02/08/2018 12:13 AM, Keith Busch wrote:
