Re: [PATCH 2/2] target: Fix target_wait_for_sess_cmds breakage with active signals
Hello MNC & Co, On Wed, 2018-10-10 at 11:58 -0500, Mike Christie wrote: > On 10/09/2018 10:23 PM, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger > > > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes > > no > > signals will be pending for task_struct executing the normal session > > shutdown > > and I/O quiesce code-path. > > > > diff --git a/drivers/target/target_core_transport.c > > b/drivers/target/target_core_transport.c > > index 86c0156..fc3093d2 100644 > > --- a/drivers/target/target_core_transport.c > > +++ b/drivers/target/target_core_transport.c > > @@ -2754,7 +2754,7 @@ static void target_release_cmd_kref(struct kref *kref) > > if (se_sess) { > > spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > > list_del_init(&se_cmd->se_cmd_list); > > - if (list_empty(&se_sess->sess_cmd_list)) > > + if (se_sess->sess_tearing_down && > > list_empty(&se_sess->sess_cmd_list)) > > I think there is another issue with 00d909a107 and ibmvscsi_tgt. > > The problem is that ibmvscsi_tgt never called > target_sess_cmd_list_set_waiting. It only called > target_wait_for_sess_cmds. So before 00d909a107 there was a bug in that > driver and target_wait_for_sess_cmds never did what was intended because > sess_wait_list would always be empty. > > With 00d909a107, we no longer need to call > target_sess_cmd_list_set_waiting to wait for outstanding commands, so > for ibmvscsi_tgt will now wait for commands like we wanted. However, the > commit added a WARN_ON that is hit if target_sess_cmd_list_set_waiting > is not called, so we could hit that. > > So I think we need to add a target_sess_cmd_list_set_waiting call in > ibmvscsi_tgt to go along with your patch chunk above and make sure we do > not trigger the WARN_ON. > Nice catch. :) With target_wait_for_sess_cmd() usage pre 00d909a107 doing a list-splice in target_sess_cmd_list_set_waiting(), this particular usage in ibmvscsi_tgt has always been list_empty(&sess->sess_wait_list) = true (eg: no se_cmd I/O is quiesced, because no se_cmd in sess_wait_list) since commit 712db3eb in 4.9.y code. That said, ibmvscsi_tgt usage is very similar to vhost/scsi in the respect individual /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/ endpoints used by VMs do not remove their I_T nexus while the VM is active. So AFAICT, ibmvscsi_tgt doesn't strictly need target_sess_wait_for_cmd() at all if this is true.
Re: [PATCH 2/2] target: Fix target_wait_for_sess_cmds breakage with active signals
Hey Peter & Co, On Wed, 2018-10-10 at 10:43 +0200, Peter Zijlstra wrote: > On Wed, Oct 10, 2018 at 03:23:10AM +, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger > > > > With the addition of commit 00d909a107 in v4.19-rc, it incorrectly assumes > > no > > signals will be pending for task_struct executing the normal session > > shutdown > > and I/O quiesce code-path. > > > > For example, iscsi-target and iser-target issue SIGINT to all kthreads as > > part of session shutdown. This has been the behaviour since day one. > > Not knowing much context here; but does it make sense for those > kthreads to handle signals, ever? Most kthreads should be fine with > ignore_signals(). > iscsi-target + ib-isert uses SIGINT amongst dedicated rx/tx connection kthreads to signal connection shutdown, requiring in-flight se_cmd I/O descriptors to be quiesced before making forward progress to release se_session. By the point wait_event_lock_irq_timeout() is called in the example here, one of the two rx/tx connection kthreads has been stopped, and the other kthread is still processing shutdown. So while historically the pending SIGINTs where not cleared (or ignored) during shutdown at this point, there is no reason why they could not be ignored for iscsi-target + ib-isert. That said, pre commit 00d909a107 code always used wait_for_completion() and ignored pending signals. As-is target_wait_for_sess_cmds() is called directly from fabric driver code and in one case also from user-space via configfs_write_file(), so AFAICT it does need TASK_UNINTERRUPTIBLE.
[GIT PULL] target updates for v4.16-rc1
Hi Linus, Here are the target-pending updates for v4.16-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next The highlights include: - Numerous target-core-user improvements related to queue full and timeout handling. (MNC) - Prevent target-core-user corruption when invalid data page is requested. (MNC) - Add target-core device action configfs attributes to allow user-space to trigger events separate from existing attributes exposed to end-users. (MNC) - Fix iscsi-target NULL pointer dereference 4.6+ regression in CHAP error path. (David Disseldorp) - Avoid target-core backend UNMAP callbacks if range is zero. (Andrei Vagin) - Fix a iscsi-target 4.14+ regression related multiple PDU logins, that was exposed due to removal of TCP prequeue support. (Florian Westphal + MNC) Also, there is a iser-target bug still being worked on for post -rc1 code to address a long standing issue resulting in persistent ib_post_send() failures, for RNICs with small max_send_sge. Thank you, --nab Andrei Vagin (1): target: don't call an unmap callback if a range length is zero David Disseldorp (1): target/iscsi: avoid NULL dereference in CHAP auth error path Florian Westphal (1): iscsi-target: make sure to wake up sleeping login worker Luis de Bethencourt (1): tcmu: Fix trailing semicolon Markus Elfring (7): sbp-target: Delete an error message for a failed memory allocation in three functions target: tcm_loop: Delete an error message for a failed memory allocation in four functions target: tcm_loop: Improve a size determination in two functions target: tcm_loop: Combine substrings for 26 messages target: tcm_loop: Delete two unnecessary variable initialisations in tcm_loop_issue_tmr() target: tcm_loop: Delete an unnecessary return statement in tcm_loop_submission_work() target: tcm_loop: Use seq_puts() in tcm_loop_show_info() Mike Christie (19): tcmu: merge common block release code tcmu: split unmap_thread_fn tcmu: fix unmap thread race tcmu: move expired command completion to unmap thread tcmu: remove commands_lock tcmu: release blocks for partially setup cmds tcmu: simplify scatter_data_area error handling tcmu: fix free block calculation tcmu: prep queue_cmd_ring to be used by unmap wq tcmu: simplify dbi thresh handling tcmu: don't block submitting context for block waits tcmu: make ring buffer timer configurable tcmu: allow max block and global max blocks to be settable tcmu: prevent corruption when invalid data page requested target: add SAM_STAT_BUSY sense reason target_core_user: add cmd id to broken ring message target core: add device action configfs files tcmu: allow userspace to reset ring tcmu: fix cmd user after free Rasmus Villemoes (1): target-core: don't use "const char*" for a buffer that is written to Varun Prakash (1): cxgbit: call neigh_event_send() to update MAC address Wei Yongjun (1): tcmu: fix error return code in tcmu_configure_device() Xiubo Li (1): tcmu: clean up the scatter helper tangwenji (2): tcmu: fix page addr in tcmu_flush_dcache_range target: fix destroy device in target_configure_device drivers/target/iscsi/cxgbit/cxgbit_cm.c | 3 + drivers/target/iscsi/iscsi_target_auth.c | 3 +- drivers/target/iscsi/iscsi_target_nego.c | 3 + drivers/target/loopback/tcm_loop.c | 145 ++--- drivers/target/sbp/sbp_target.c | 13 +- drivers/target/target_core_configfs.c| 6 + drivers/target/target_core_device.c | 4 +- drivers/target/target_core_fabric_lib.c | 6 +- drivers/target/target_core_internal.h| 3 +- drivers/target/target_core_pr.c | 4 +- drivers/target/target_core_sbc.c | 8 +- drivers/target/target_core_transport.c | 3 + drivers/target/target_core_user.c| 983 ++- include/target/target_core_backend.h | 1 + include/target/target_core_base.h| 2 + 15 files changed, 798 insertions(+), 389 deletions(-)
Re: [PATCH] tcmu: Fix trailing semicolon
On Tue, 2018-01-16 at 10:25 -0600, Michael Christie wrote: > On 01/16/2018 09:34 AM, Luis de Bethencourt wrote: > > The trailing semicolon is an empty statement that does no operation. > > It is completely stripped out by the compiler. Removing it since it doesn't > > do > > anything. > > > > Signed-off-by: Luis de Bethencourt > > --- > > > > Hi, > > > > After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches > > suggested I fix it treewide [0]. > > > > Best regards > > Luis > > > > > > [0] > > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115410.html > > [1] > > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115390.html > > > > drivers/target/target_core_user.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > Applied. Thanks Luis + MNC.
Re: [PATCH] target-core: don't use "const char*" for a buffer that is written to
Hi Rasmus, Apologies for the delayed follow-up on this. On Tue, 2017-11-21 at 01:12 +0100, Rasmus Villemoes wrote: > From: Rasmus Villemoes > > iscsi_parse_pr_out_transport_id launders the const away via a call to > strstr(), and then modifies the buffer (writing a nul byte) through > the return value. It's cleaner to be honest and simply declare the > parameter as "char*", fixing up the call chain, and allowing us to > drop the cast in the return statement. > > Amusingly, the two current callers found it necessary to cast a > non-const pointer to a const. > > Signed-off-by: Rasmus Villemoes > --- > drivers/target/target_core_fabric_lib.c | 6 +++--- > drivers/target/target_core_internal.h | 2 +- > drivers/target/target_core_pr.c | 4 ++-- > 3 files changed, 6 insertions(+), 6 deletions(-) > Looks fine. Applied. Thank you.
Re: [PATCH] target: don't call an unmap callback if a range length is zero
Hi Andrei, Apologies for the delayed follow up. On Wed, 2017-12-13 at 13:55 -0800, Andrei Vagin wrote: > If a length of a range is zero, it means there is nothing to unmap > and we can skip this range. > > Here is one more reason, why we have to skip such ranges. An unmap > callback calls file_operations->fallocate(), but the man page for the > fallocate syscall says that fallocate(fd, mode, offset, let) returns > EINVAL, if len is zero. It means that file_operations->fallocate() isn't > obligated to handle zero ranges too. > > Signed-off-by: Andrei Vagin > --- > drivers/target/target_core_sbc.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > Applied. Thank you.
[GIT PULL] target updates for v4.15-rc1
Hello Linus, Here are the target-pending updates for v4.15-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next This series is predominantly bug-fixes, with a few small improvements that have been outstanding over the last release cycle. As usual, the associated bug-fixes have CC' tags for stable. Also, things have been particularly quiet wrt new developments the last months, with most folks continuing to focus on stability atop 4.x stable kernels for their respective production configurations. Also at this point, the stable trees have been synced up with mainline. This will continue to be a priority, as production users tend to run exclusively atop stable kernels, a few releases behind mainline. The highlights include: - target: Fix PR PREEMPT_AND_ABORT null pointer dereference regression in v4.11+ (tangwenji) - target/user: Fix OOPs during removing TCMU device (Xiubo Li + Zhang Zhuoyu) - target/user: Add netlink command reply supported option for each device (Kenjiro Nakayama) - cxgbit: Abort the TCP connection in case of data out timeout (Varun Prakash) - target: Fix PR/ALUA file path truncation (David Disseldorp) - target/user: Fix double se_cmd completion during ->cmd_time_out (Mike Christie) - target: Fix QUEUE_FULL + SCSI task attribute handling in 4.1+ (Bryant Ly + nab) - target: Fix quiese during transport_write_pending_qf endless loop (nab) - target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK in 3.14+ (Don White + nab) Thank you, --nab Bart Van Assche (9): target: Move a declaration of a global variable into a header file target: Suppress gcc 7 fallthrough warnings target: Inline transport_put_cmd() target/iscsi: Define OFFLOAD_BUF_SIZE once target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it target/iscsi: Fix endianness in an error message target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd() target/iscsi: Detect conn_cmd_list corruption early Dan Carpenter (2): tcmu: Fix some memory corruption tcmu: Add a missing unlock on an error path David Disseldorp (2): target: fix PR state file path truncation target: fix ALUA state file path truncation Jiang Yi (1): target/file: Do not return error for UNMAP if length is zero Kenjiro Nakayama (2): target/tcmu: Use macro to call container_of in tcmu_cmd_time_out_show target: Add netlink command reply supported option for each device Markus Elfring (1): iSCSI-target: Use common error handling code in iscsi_decode_text_input() Mike Christie (2): target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES tcmu: fix double se_cmd completion Nicholas Bellinger (6): target: Fix QUEUE_FULL + SCSI task attribute handling target: Fix caw_sem leak in transport_generic_request_failure target: Fix quiese during transport_write_pending_qf endless loop target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref iscsi-target: Fix non-immediate TMR reference leak Varun Prakash (1): cxgbit: Abort the TCP connection in case of data out timeout Xiubo Li (1): tcmu: fix crash when removing the tcmu device tangwenji (8): target: fix null pointer regression in core_tmr_drain_tmr_list target: fix buffer offset in core_scsi3_pri_read_full_status target: fix double unmap data sg in core_scsi3_emulate_pro_register_and_move() target: add sense code INSUFFICIENT REGISTRATION RESOURCES target: fix match_token option in target_core_configfs.c target:fix condition return in core_pr_dump_initiator_port() iscsi-target: fix memory leak in lio_target_tiqn_addtpg() iscsi-target: fix memory leak in iscsit_release_discovery_tpg() drivers/target/iscsi/cxgbit/cxgbit.h | 2 + drivers/target/iscsi/cxgbit/cxgbit_cm.c | 45 + drivers/target/iscsi/cxgbit/cxgbit_ddp.c | 8 + drivers/target/iscsi/cxgbit/cxgbit_main.c| 1 + drivers/target/iscsi/iscsi_target.c | 80 - drivers/target/iscsi/iscsi_target_configfs.c | 3 +- drivers/target/iscsi/iscsi_target_erl1.c | 7 +- drivers/target/iscsi/iscsi_target_parameters.c | 39 ++--- drivers/target/iscsi/iscsi_target_seq_pdu_list.c | 2 - drivers/target/iscsi/iscsi_target_tpg.c | 7 +- drivers/target/iscsi/iscsi_target_util.c | 4 + drivers/target/target_core_alua.c| 51 +++--- drivers/target/target_core_alua.h| 9 - drivers/target/target_core_configfs.c| 14 +- drivers/target/target_core_fabric_configfs.c | 2 - drivers/target/target_core_file.c| 4 + drivers/target/target_core_internal.h| 1 + drivers/target/target_core_pr.c | 41 ++--- drivers/target/tar
[PATCH 0/6] target fixes for v4.15-rc1
From: Nicholas Bellinger Hi all, Here are the outstanding target bugfixes in queue for v4.15-rc1 code. Patch #1 addresses a long standing bug wrt to QUEUE_FULL and SCSI task attribute handling, that results in SCSI task related counters getting updated multiple times during QUEUE_FULL. It primarly effects hosts using ORDERED tasks, which depend upon these counters to know when to delay incoming tasks. Patch #2 is for a recent v4.11+ regression, which during ABORT of COMPARE_AND_WRITE can result in se_device->cam_sem getting leaked due to se_cmd->transport_complete_callback() being skipped. Patch #3 addresses a possible end-less loop during QUEUE_FULL + TFO->write_pending() failure, allowing se_cmd quiese to properly complete the outstanding descriptor when requested. Patch #4 addresses a use-after-tree that was hit in the field, involving pre-backend execution se_cmd exceptions + subsequent ABORT_TASK for a matching tag. Patch #5 + #6 address a iscsi-target TMR related memory and se_cmd->cmd_kref reference leaks respectively. We've been testing #4, #5, and #6 internally on v4.1.y stable code, and have not run into additional regressions. The rest are straight-forward. Please review. --nab Nicholas Bellinger (6): target: Fix QUEUE_FULL + SCSI task attribute handling target: Fix caw_sem leak in transport_generic_request_failure target: Fix quiese during transport_write_pending_qf endless loop target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref iscsi-target: Fix non-immediate TMR reference leak drivers/target/iscsi/iscsi_target.c| 30 ++ drivers/target/target_core_tmr.c | 9 + drivers/target/target_core_transport.c | 26 +++--- include/target/target_core_base.h | 1 + 4 files changed, 47 insertions(+), 19 deletions(-) -- 1.9.1
[PATCH 6/6] iscsi-target: Fix non-immediate TMR reference leak
From: Nicholas Bellinger This patch fixes a se_cmd->cmd_kref reference leak that can occur when a non immediate TMR is proceeded our of command sequence number order, and CMDSN_LOWER_THAN_EXP is returned by iscsit_sequence_cmd(). To address this bug, call target_put_sess_cmd() during this special case following what iscsit_process_scsi_cmd() does upon CMDSN_LOWER_THAN_EXP. Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 048d422..3b7bb58 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -2094,12 +2094,14 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 iscsi_tmf) if (!(hdr->opcode & ISCSI_OP_IMMEDIATE)) { int cmdsn_ret = iscsit_sequence_cmd(conn, cmd, buf, hdr->cmdsn); - if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP) + if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP) { out_of_order_cmdsn = 1; - else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP) + } else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP) { + target_put_sess_cmd(&cmd->se_cmd); return 0; - else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) + } else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) { return -1; + } } iscsit_ack_from_expstatsn(conn, be32_to_cpu(hdr->exp_statsn)); -- 1.9.1
[PATCH 1/6] target: Fix QUEUE_FULL + SCSI task attribute handling
From: Nicholas Bellinger This patch fixes a bug during QUEUE_FULL where transport_complete_qf() calls transport_complete_task_attr() after it's already been invoked by target_complete_ok_work() or transport_generic_request_failure() during initial completion, preceeding QUEUE_FULL. This will result in se_device->simple_cmds, se_device->dev_cur_ordered_id and/or se_device->dev_ordered_sync being updated multiple times for a single se_cmd. To address this bug, clear SCF_TASK_ATTR_SET after the first call to transport_complete_task_attr(), and avoid updating SCSI task attribute related counters for any subsequent calls. Also, when a se_cmd is deferred due to ordered tags and executed via target_restart_delayed_cmds(), set CMD_T_SENT before execution matching what target_execute_cmd() does. Cc: Michael Cyr Cc: Bryant G. Ly Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 473d652..c33d1e9 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2011,6 +2011,8 @@ static void target_restart_delayed_cmds(struct se_device *dev) list_del(&cmd->se_delayed_node); spin_unlock(&dev->delayed_cmd_lock); + cmd->transport_state |= CMD_T_SENT; + __target_execute_cmd(cmd, true); if (cmd->sam_task_attr == TCM_ORDERED_TAG) @@ -2046,6 +2048,8 @@ static void transport_complete_task_attr(struct se_cmd *cmd) pr_debug("Incremented dev_cur_ordered_id: %u for ORDERED\n", dev->dev_cur_ordered_id); } + cmd->se_cmd_flags &= ~SCF_TASK_ATTR_SET; + restart: target_restart_delayed_cmds(dev); } -- 1.9.1
[PATCH 4/6] target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
From: Nicholas Bellinger This patch fixes bug where early se_cmd exceptions that occur before backend execution can result in use-after-free if/when a subsequent ABORT_TASK occurs for the same tag. Since an early se_cmd exception will have had se_cmd added to se_session->sess_cmd_list via target_get_sess_cmd(), it will not have CMD_T_COMPLETE set by the usual target_complete_cmd() backend completion path. This causes a subsequent ABORT_TASK + __target_check_io_state() to signal ABORT_TASK should proceed. As core_tmr_abort_task() executes, it will bring the outstanding se_cmd->cmd_kref count down to zero releasing se_cmd, after se_cmd has already been queued with error status into fabric driver response path code. To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is set at target_get_sess_cmd() time, and cleared immediately before backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE is set. Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to determine when an early exception has occured, and avoid aborting this se_cmd since it will have already been queued into fabric driver response path code. Reported-by: Donald White Cc: Donald White Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_tmr.c | 9 + drivers/target/target_core_transport.c | 2 ++ include/target/target_core_base.h | 1 + 3 files changed, 12 insertions(+) diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index 61909b2..9c7bc1c 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -133,6 +133,15 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, spin_unlock(&se_cmd->t_state_lock); return false; } + if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) { + if (se_cmd->scsi_status) { + pr_debug("Attempted to abort io tag: %llu early failure" +" status: 0x%02x\n", se_cmd->tag, +se_cmd->scsi_status); + spin_unlock(&se_cmd->t_state_lock); + return false; + } + } if (sess->sess_tearing_down || se_cmd->cmd_wait_set) { pr_debug("Attempted to abort io tag: %llu already shutdown," " skipping\n", se_cmd->tag); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 0e89db8..58caacd 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1975,6 +1975,7 @@ void target_execute_cmd(struct se_cmd *cmd) } cmd->t_state = TRANSPORT_PROCESSING; + cmd->transport_state &= ~CMD_T_PRE_EXECUTE; cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT; spin_unlock_irq(&cmd->t_state_lock); @@ -2667,6 +2668,7 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref) ret = -ESHUTDOWN; goto out; } + se_cmd->transport_state |= CMD_T_PRE_EXECUTE; list_add_tail(&se_cmd->se_cmd_list, &se_sess->sess_cmd_list); out: spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index d3139a9..ccf501b 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -490,6 +490,7 @@ struct se_cmd { #define CMD_T_STOP (1 << 5) #define CMD_T_TAS (1 << 10) #define CMD_T_FABRIC_STOP (1 << 11) +#define CMD_T_PRE_EXECUTE (1 << 12) spinlock_t t_state_lock; struct kref cmd_kref; struct completion t_transport_stop_comp; -- 1.9.1
[PATCH 3/6] target: Fix quiese during transport_write_pending_qf endless loop
From: Nicholas Bellinger This patch fixes a potential end-less loop during QUEUE_FULL, where cmd->se_tfo->write_pending() callback fails repeatedly but __transport_wait_for_tasks() has already been invoked to quiese the outstanding se_cmd descriptor. To address this bug, this patch adds a CMD_T_STOP|CMD_T_ABORTED check within transport_write_pending_qf() and invokes the existing se_cmd->t_transport_stop_comp to signal quiese completion back to __transport_wait_for_tasks(). Cc: Mike Christie Cc: Hannes Reinecke Cc: Bryant G. Ly Cc: Michael Cyr Cc: Potnuri Bharat Teja Cc: Sagi Grimberg Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index d02218c..0e89db8 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2560,7 +2560,20 @@ void transport_kunmap_data_sg(struct se_cmd *cmd) static void transport_write_pending_qf(struct se_cmd *cmd) { + unsigned long flags; int ret; + bool stop; + + spin_lock_irqsave(&cmd->t_state_lock, flags); + stop = (cmd->transport_state & (CMD_T_STOP | CMD_T_ABORTED)); + spin_unlock_irqrestore(&cmd->t_state_lock, flags); + + if (stop) { + pr_debug("%s:%d CMD_T_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n", + __func__, __LINE__, cmd->tag); + complete_all(&cmd->t_transport_stop_comp); + return; + } ret = cmd->se_tfo->write_pending(cmd); if (ret) { -- 1.9.1
[PATCH 2/6] target: Fix caw_sem leak in transport_generic_request_failure
From: Nicholas Bellinger With the recent addition of transport_check_aborted_status() within transport_generic_request_failure() to avoid sending a SCSI status exception after CMD_T_ABORTED w/ TAS=1 has occured, it introduced a COMPARE_AND_WRITE early failure regression. Namely when COMPARE_AND_WRITE fails and se_device->caw_sem has been taken by sbc_compare_and_write(), if the new check for transport_check_aborted_status() returns true and exits, cmd->transport_complete_callback() -> compare_and_write_post() is skipped never releasing se_device->caw_sem. This regression was originally introduced by: commit e3b88ee95b4e4bf3e9729a4695d695b9c7c296c8 Author: Bart Van Assche Date: Tue Feb 14 16:25:45 2017 -0800 target: Fix handling of aborted failed commands To address this bug, move the transport_check_aborted_status() call after transport_complete_task_attr() and cmd->transport_complete_callback(). Cc: Mike Christie Cc: Hannes Reinecke Cc: Bart Van Assche Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index c33d1e9..d02218c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1729,9 +1729,6 @@ void transport_generic_request_failure(struct se_cmd *cmd, { int ret = 0, post_ret = 0; - if (transport_check_aborted_status(cmd, 1)) - return; - pr_debug("-[ Storage Engine Exception; sense_reason %d\n", sense_reason); target_show_cmd("-[ ", cmd); @@ -1740,6 +1737,7 @@ void transport_generic_request_failure(struct se_cmd *cmd, * For SAM Task Attribute emulation for failed struct se_cmd */ transport_complete_task_attr(cmd); + /* * Handle special case for COMPARE_AND_WRITE failure, where the * callback is expected to drop the per device ->caw_sem. @@ -1748,6 +1746,9 @@ void transport_generic_request_failure(struct se_cmd *cmd, cmd->transport_complete_callback) cmd->transport_complete_callback(cmd, false, &post_ret); + if (transport_check_aborted_status(cmd, 1)) + return; + switch (sense_reason) { case TCM_NON_EXISTENT_LUN: case TCM_UNSUPPORTED_SCSI_OPCODE: -- 1.9.1
[PATCH 5/6] iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
From: Nicholas Bellinger Since commit 59b6986dbf fixed a potential NULL pointer dereference by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the se_tmr_req is currently leaked by iscsit_free_cmd() because no iscsi_cmd->se_cmd.se_tfo was associated. To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other TMR and call transport_init_se_cmd() + target_get_sess_cmd() to setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2. This will ensure normal release operation once se_cmd->cmd_kref reaches zero and target_release_cmd_kref() is invoked, se_tmr_req will be released via existing target_free_cmd_mem() and core_tmr_release_req() code. Reported-by: Donald White Cc: Donald White Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 541f66a..048d422 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -1955,7 +1955,6 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 iscsi_tmf) struct iscsi_tmr_req *tmr_req; struct iscsi_tm *hdr; int out_of_order_cmdsn = 0, ret; - bool sess_ref = false; u8 function, tcm_function = TMR_UNKNOWN; hdr = (struct iscsi_tm *) buf; @@ -1988,22 +1987,23 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 iscsi_tmf) cmd->data_direction = DMA_NONE; cmd->tmr_req = kzalloc(sizeof(*cmd->tmr_req), GFP_KERNEL); - if (!cmd->tmr_req) + if (!cmd->tmr_req) { return iscsit_add_reject_cmd(cmd, ISCSI_REASON_BOOKMARK_NO_RESOURCES, buf); + } + + transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops, + conn->sess->se_sess, 0, DMA_NONE, + TCM_SIMPLE_TAG, cmd->sense_buffer + 2); + + target_get_sess_cmd(&cmd->se_cmd, true); /* * TASK_REASSIGN for ERL=2 / connection stays inside of * LIO-Target $FABRIC_MOD */ if (function != ISCSI_TM_FUNC_TASK_REASSIGN) { - transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops, - conn->sess->se_sess, 0, DMA_NONE, - TCM_SIMPLE_TAG, cmd->sense_buffer + 2); - - target_get_sess_cmd(&cmd->se_cmd, true); - sess_ref = true; tcm_function = iscsit_convert_tmf(function); if (tcm_function == TMR_UNKNOWN) { pr_err("Unknown iSCSI TMR Function:" @@ -2119,12 +2119,8 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 iscsi_tmf) * For connection recovery, this is also the default action for * TMR TASK_REASSIGN. */ - if (sess_ref) { - pr_debug("Handle TMR, using sess_ref=true check\n"); - target_put_sess_cmd(&cmd->se_cmd); - } - iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state); + target_put_sess_cmd(&cmd->se_cmd); return 0; } EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd); -- 1.9.1
[GIT PULL] target fixes for v4.13-rc5
Hi Linus, Here are the target-pending fixes for v4.13-rc5. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master The highlights include: - Fix iscsi-target payload memory leak during ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash) - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT (Pascal de Bruijn + Himanshu Madhani + nab) - Fix iscsi-target long-standing issue with parallel delete of a single network portal across multiple target instances (Gary Guo + nab) - Fix target dynamic se_node GPF during uncached shutdown regression (Justin Maggard + nab) Thank you, --nab Bryant G. Ly (1): tcmu: free old string on reconfig Nicholas Bellinger (3): qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2) iscsi-target: Fix iscsi_np reset hung task during parallel delete target: Fix node_acl demo-mode + uncached dynamic shutdown regression Varun Prakash (4): cxgbit: add missing __kfree_skb() iscsi-target: fix memory leak in iscsit_setup_text_cmd() iscsi-target: fix invalid flags in text response cxgbit: fix sg_nents calculation Xiubo Li (1): tcmu: Fix possible to/from address overflow when doing the memcpy drivers/scsi/qla2xxx/tcm_qla2xxx.c | 30 - drivers/target/iscsi/cxgbit/cxgbit_cm.c | 16 +++ drivers/target/iscsi/cxgbit/cxgbit_target.c | 12 +++- drivers/target/iscsi/iscsi_target.c | 6 -- drivers/target/iscsi/iscsi_target_login.c | 7 +-- drivers/target/target_core_tpg.c| 4 ++-- drivers/target/target_core_transport.c | 4 ++-- drivers/target/target_core_user.c | 13 +++-- include/target/iscsi/iscsi_target_core.h| 1 + 9 files changed, 40 insertions(+), 53 deletions(-)
[PATCH] iscsi-target: Fix iscsi_np reset hung task during parallel delete
From: Nicholas Bellinger This patch fixes a bug associated with iscsit_reset_np_thread() that can occur during parallel configfs rmdir of a single iscsi_np used across multiple iscsi-target instances, that would result in hung task(s) similar to below where configfs rmdir process context was blocked indefinately waiting for iscsi_np->np_restart_comp to finish: [ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds. [ 6726.119440] Tainted: GW O 4.1.26-3321 #2 [ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 6726.132927] dcp_proxy_node_ D 8803f202bc88 0 15550 1 0x [ 6726.140058] 8803f202bc88 88085c64d960 88083b3b1ad0 88087fffeb08 [ 6726.147593] 8803f202c000 7fff 88083f459c28 88083b3b1ad0 [ 6726.155132] 88035373c100 8803f202bca8 8168ced2 8803f202bcb8 [ 6726.162667] Call Trace: [ 6726.165150] [] schedule+0x32/0x80 [ 6726.170156] [] schedule_timeout+0x214/0x290 [ 6726.176030] [] ? __send_signal+0x52/0x4a0 [ 6726.181728] [] wait_for_completion+0x96/0x100 [ 6726.187774] [] ? wake_up_state+0x10/0x10 [ 6726.193395] [] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod] [ 6726.201278] [] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod] [ 6726.210033] [] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod] [ 6726.218351] [] configfs_write_file+0xaa/0x110 [ 6726.224392] [] vfs_write+0xa4/0x1b0 [ 6726.229576] [] SyS_write+0x41/0xb0 [ 6726.234659] [] system_call_fastpath+0x12/0x71 It would happen because each iscsit_reset_np_thread() sets state to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting for completion on iscsi_np->np_restart_comp. However, if iscsi_np was active processing a login request and more than a single iscsit_reset_np_thread() caller to the same iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np kthread process context in __iscsi_target_login_thread() would flush pending signals and only perform a single completion of np->np_restart_comp before going back to sleep within transport specific iscsit_transport->iscsi_accept_np code. To address this bug, add a iscsi_np->np_reset_count and update __iscsi_target_login_thread() to keep completing np->np_restart_comp until ->np_reset_count has reached zero. Reported-by: Gary Guo Tested-by: Gary Guo Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 1 + drivers/target/iscsi/iscsi_target_login.c | 7 +-- include/target/iscsi/iscsi_target_core.h | 1 + 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 12803de..5001261 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -418,6 +418,7 @@ int iscsit_reset_np_thread( return 0; } np->np_thread_state = ISCSI_NP_THREAD_RESET; + atomic_inc(&np->np_reset_count); if (np->np_thread) { spin_unlock_bh(&np->np_thread_lock); diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index e9bdc8b..dc13afb 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -1243,9 +1243,11 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) flush_signals(current); spin_lock_bh(&np->np_thread_lock); - if (np->np_thread_state == ISCSI_NP_THREAD_RESET) { + if (atomic_dec_if_positive(&np->np_reset_count) >= 0) { np->np_thread_state = ISCSI_NP_THREAD_ACTIVE; + spin_unlock_bh(&np->np_thread_lock); complete(&np->np_restart_comp); + return 1; } else if (np->np_thread_state == ISCSI_NP_THREAD_SHUTDOWN) { spin_unlock_bh(&np->np_thread_lock); goto exit; @@ -1278,7 +1280,8 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) goto exit; } else if (rc < 0) { spin_lock_bh(&np->np_thread_lock); - if (np->np_thread_state == ISCSI_NP_THREAD_RESET) { + if (atomic_dec_if_positive(&np->np_reset_count) >= 0) { + np->np_thread_state = ISCSI_NP_THREAD_ACTIVE; spin_unlock_bh(&np->np_thread_lock); complete(&np->np_restart_comp); iscsit_put_transport(conn->conn_transport); diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h index 0ca1fb0..fb87d32 100644 --- a/include/target/iscsi/iscsi_target_core.h +++ b/include/target/iscsi/iscsi_target_core.h @@ -786,6 +786,7 @@ struct iscsi_np { int np_sock_type; enum np_thread_state_table np_thread_state; boolenable
[PATCH] qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2)
From: Nicholas Bellinger This patch drops two incorrect usages of tcm_qla2xxx_free_cmd() during TMR ABORT within tcm_qla2xxx_handle_data_work() and tcm_qla2xxx_aborted_task(), which where attempting to dispatch into workqueue context to do tcm_qla2xxx_complete_free() and subsequently invoke transport_generic_free_cmd(). This is incorrect because during TMR ABORT target-core will drop the outstanding se_cmd->cmd_kref references once it has quiesced the se_cmd via transport_wait_for_tasks(), and in the case of qla2xxx it should not attempt to do it's own transport_generic_free_cmd() once the abort has occured. As reported by Pascal, this was originally manifesting as a BUG_ON(cmd->cmd_in_wq) in qlt_free_cmd() during TMR ABORT, with a LIO backend that had sufficently high enough WRITE latency to trigger a host side TMR ABORT_TASK. (v2: Drop the qla_tgt_cmd->write_pending_abort_comp changes, as they will be addressed in a seperate series) Reported-by: Pascal de Bruijn Tested-by: Pascal de Bruijn Cc: Pascal de Bruijn Reported-by: Lukasz Engel Cc: Lukasz Engel Acked-by: Himanshu Madhani Cc: Quinn Tran Signed-off-by: Nicholas Bellinger --- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 30 -- 1 file changed, 30 deletions(-) diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index b20da0d..3f82ea1 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -500,7 +500,6 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t *vha, struct qla_tgt_cmd *cmd, static void tcm_qla2xxx_handle_data_work(struct work_struct *work) { struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd, work); - unsigned long flags; /* * Ensure that the complete FCP WRITE payload has been received. @@ -508,17 +507,6 @@ static void tcm_qla2xxx_handle_data_work(struct work_struct *work) */ cmd->cmd_in_wq = 0; - spin_lock_irqsave(&cmd->cmd_lock, flags); - cmd->data_work = 1; - if (cmd->aborted) { - cmd->data_work_free = 1; - spin_unlock_irqrestore(&cmd->cmd_lock, flags); - - tcm_qla2xxx_free_cmd(cmd); - return; - } - spin_unlock_irqrestore(&cmd->cmd_lock, flags); - cmd->qpair->tgt_counters.qla_core_ret_ctio++; if (!cmd->write_data_transferred) { /* @@ -765,31 +753,13 @@ static void tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd) qlt_xmit_tm_rsp(mcmd); } -#define DATA_WORK_NOT_FREE(_cmd) (_cmd->data_work && !_cmd->data_work_free) static void tcm_qla2xxx_aborted_task(struct se_cmd *se_cmd) { struct qla_tgt_cmd *cmd = container_of(se_cmd, struct qla_tgt_cmd, se_cmd); - unsigned long flags; if (qlt_abort_cmd(cmd)) return; - - spin_lock_irqsave(&cmd->cmd_lock, flags); - if ((cmd->state == QLA_TGT_STATE_NEW)|| - ((cmd->state == QLA_TGT_STATE_DATA_IN) && - DATA_WORK_NOT_FREE(cmd))) { - cmd->data_work_free = 1; - spin_unlock_irqrestore(&cmd->cmd_lock, flags); - /* -* cmd has not reached fw, Use this trigger to free it. -*/ - tcm_qla2xxx_free_cmd(cmd); - return; - } - spin_unlock_irqrestore(&cmd->cmd_lock, flags); - return; - } static void tcm_qla2xxx_clear_sess_lookup(struct tcm_qla2xxx_lport *, -- 1.9.1
[GIT PULL] target updates for v4.13-rc1
Hi Linus, Here are the target-pending updates for v4.13-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next Note there is a qla2xxx conflict with scsi.git as reported by SFR, which should be straight-forward to resolve: https://lkml.org/lkml/2017/6/28/45 Note there was two additional qla2xxx conflicts with scsi.git: https://lkml.org/lkml/2017/7/3/22 https://lkml.org/lkml/2017/7/3/23 However, the patch that caused these has since been reverted: 55dd8cf Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT" as it caused other problems, so these conflicts can be ignored. Attached is v4.13-rc1-resolution.diff if there are any questions wrt to how the resolution should look. Adding Himanshu + Quinn CC'. That said, it's been usually busy for summer, with most of the efforts centered around TCMU developments and various target-core + fabric driver bug fixing activities. Not particularly large in terms of LoC, but lots of smaller patches from many different folks. The highlights include: - ibmvscsis logical partition manager support (Michael Cyr + Bryant Ly) - Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch + nab) - Add support for TMR percpu LUN reference counting (nab) - Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown (Bart) - Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi) - Fix TMCU module removal (Xiubo Li) - Fix iser-target OOPs during login failure (Andrea Righi + Sagi) - Breakup target-core free_device backend driver callback (mnc) - Perform TCMU add/delete/reconfig synchronously (mnc) - Fix TCMU multiple UIO open/close sequences (mnc) - Fix TCMU CHECK_CONDITION sense handling (mnc) - Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab) - Introduce TYPE_ZBC support in PSCSI (Damien Le Moal) - Fix possible TCMU memory leak + OOPs when recalculating cmd base size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc) - Add login_keys_workaround attribute for non RFC initiators (Robert LeBlanc + Arun Easi + nab) Thank you, --nab Bart Van Assche (17): target: Use symbolic value for WRITE_VERIFY_16 target: Remove se_device.dev_list target: Fix transport_init_se_cmd() target: Use {get,put}_unaligned_be*() instead of open coding these functions target: Fix a deadlock between the XCOPY code and iSCSI session shutdown IB/srpt: Make a debug statement in srpt_abort_cmd() more informative xen/scsiback: Fix a TMR related use-after-free xen/scsiback: Replace a waitqueue and a counter by a completion xen/scsiback: Make TMF processing slightly faster target: Introduce a function that shows the command state target/tcm_loop: Merge struct tcm_loop_cmd and struct tcm_loop_tmr target/tcm_loop: Replace a waitqueue and a counter by a completion target/tcm_loop: Use target_submit_tmr() instead of open-coding this function target/tcm_loop: Make TMF processing slightly faster target/iscsi: Remove second argument of __iscsit_free_cmd() target/iscsi: Simplify iscsit_free_cmd() target/iscsi: Remove dead code from iscsit_process_scsi_cmd() Bryant G. Ly (7): ibmvscsis: Use tpgt passed in by user tcmu: Support emulate_write_cache tcmu: Add netlink for device reconfiguration tcmu: Make dev_size configurable via userspace tcmu: Make dev_config configurable tcmu: Add Type of reconfig into netlink tcmu: Fix dev_config_store Byungchul Park (1): vhost/scsi: Don't reinvent the wheel but use existing llist API Colin Ian King (2): tcmu: make array tcmu_attrib_attrs static const target: make device_mutex and device_list static Damien Le Moal (2): target: Use macro for WRITE_VERIFY_32 operation codes target: pscsi: Introduce TYPE_ZBC support Gustavo A. R. Silva (1): target: remove dead code Jiang Yi (2): target: reject COMPARE_AND_WRITE if emulate_caw is not set target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce Michael Cyr (1): ibmvscsis: Enable Logical Partition Migration Support Mike Christie (17): tcmu: reconfigure netlink attr changes target: break up free_device callback target: use idr for se_device dev index target: add helper to find se_device by dev_index tcmu: perfom device add, del and reconfig synchronously target: add helper to iterate over devices xcopy: loop over devices using idr helper target: remove g_device_list tcmu: drop configured check in destroy tcmu: fix multiple uio open/close sequences target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE target: add helper to copy sense to se_cmd buffer tcmu: fix sense handling during completion pscsi: finish cmd processing from pscsi_req_done target: remove transport_complete target: fix SAM_STAT_BUSY/TASK_SET_FULL handling target: export lio pgr/alua support as device attr Nicholas Bellinger (12): target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout target
Re: [PATCH] iscsi-target: Reject immediate data underflow larger than SCSI transfer length
On Tue, 2017-07-11 at 16:17 +, Bart Van Assche wrote: > On Tue, 2017-07-11 at 00:22 -0700, Nicholas A. Bellinger wrote: > > So rejecting this case as already done in commit abb85a9b51 is the > > correct approach for >= v4.3.y. > > Hello Nic, > > I hope that you agree that the current target_cmd_size_check() implementation > is complicated and ugly. Patch 30/33 of the patch series I referred to in my > e-mail removes a significant number of lines of code from that function. So > my patch series not only makes target_cmd_size_check() easier to maintain and > to verify but it makes that function also faster. Hence please reconsider the > approach from my patch series. For patch 30/33, see also > https://www.spinics.net/lists/target-devel/msg15384.html. For reference, here is the patch your advocating: diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index c28e3b58150b..6cd49fe578a7 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1164,23 +1164,6 @@ target_cmd_size_check(struct se_cmd *cmd, unsigned int size) " %u does not match SCSI CDB Length: %u for SAM Opcode:" " 0x%02x\n", cmd->se_tfo->get_fabric_name(), cmd->data_length, size, cmd->t_task_cdb[0]); - - if (cmd->data_direction == DMA_TO_DEVICE && - cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { - pr_err("Rejecting underflow/overflow WRITE data\n"); - return TCM_INVALID_CDB_FIELD; - } - /* -* Reject READ_* or WRITE_* with overflow/underflow for -* type SCF_SCSI_DATA_CDB. -*/ - if (dev->dev_attrib.block_size != 512) { - pr_err("Failing OVERFLOW/UNDERFLOW for LBA op" - " CDB on non 512-byte sector setup subsystem" - " plugin: %s\n", dev->transport->name); - /* Returns CHECK_CONDITION + INVALID_CDB_FIELD */ - return TCM_INVALID_CDB_FIELD; - } /* * For the overflow case keep the existing fabric provided * ->data_length. Otherwise for the underflow case, reset The original code is not 'complicated' or 'ugly', and the proposed change doesn't make anything 'faster' nor removes a 'significant' number of LoC. So no, I don't buy any of that line of reasoning. ;-) It comes down to two items. First, if SCF_SCSI_DATA_CDBs with underflow/overflow will be allowed, and second the block_size != 512 check. For the former, I've still never seen a host environment in the wild over the last 15 years that generates underflow/overflow for DATA CDBs with an LBA. So I'm reluctant to randomly allow this for all cases and fabrics, considering no host environment actually requires it. That said, I do understand libiscsi generates this for WRITE_VERIFY, but I'm still undecided if that's a good enough a reason to enable it for all cases in upstream. For the latter item, it's fine to drop the legacy block_size != 512 check, and I'll take a patch for that separate from the other bit.
Re: [PATCH] iscsi-target: Add login_keys_workaround attribute for non RFC initiators
On Tue, 2017-07-11 at 23:38 +, Bart Van Assche wrote: > On Fri, 2017-07-07 at 22:24 +0000, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger > > > > This patch re-introduces part of a long standing login workaround that > > was recently dropped by: > > > > commit 1c99de981f30b3e7868b8d20ce5479fa1c0fea46 > > Author: Nicholas Bellinger > > Date: Sun Apr 2 13:36:44 2017 -0700 > > > > iscsi-target: Drop work-around for legacy GlobalSAN initiator > > > > Namely, the workaround for FirstBurstLength ended up being required by > > Mellanox Flexboot PXE boot ROMs as reported by Robert. > > > > So this patch re-adds the work-around for FirstBurstLength within > > iscsi_check_proposer_for_optional_reply(), and makes the key optional > > to respond when the initiator does not propose, nor respond to it. > > > > Also as requested by Arun, this patch introduces a new TPG attribute > > named 'login_keys_workaround' that controls the use of both the > > FirstBurstLength workaround, as well as the two other existing > > workarounds for gPXE iSCSI boot client. > > > > By default, the workaround is enabled with login_keys_workaround=1, > > since Mellanox FlexBoot requires it, and Arun has verified the Qlogic > > MSFT initiator already proposes FirstBurstLength, so it's uneffected > > by this re-adding this part of the original work-around. > > Hello Nick, > > The new configfs attribute ("login_keys_workaround") may confuse users - for > someone who has not followed this e-mail thread it can take a long time > before they figure out that they need to set this configfs attribute. It's enabled by default, so there is nothing a user has to explicitly change in order all hosts to 'just work'. The only reason the attribute was added was by request of Arun, so if some future initiator doesn't proposed the keys controlled by the work-around, and still attempts to respond they can at least get something working w/o code change. > Have > you considered to let the iSCSI target driver figure out whether or not that > variable has to be set, e.g. by looking up the initiator IQN in a list? > Given InitiatorName is end user configurable, trying to do a workaround based on IQN regexs is error prone, at best. Also, since the FirstBurstLength work-around this patch re-adds had already been in place for the better part of 8 years, the risk of interopt issues is almost non existent.
Re: [PATCH] iscsi-target: Add login_keys_workaround attribute for non RFC initiators
Hey Robert, Any chance to test this with your Flexboot PXE setup..? Please give this a spin ASAP to verify it addresses the regression you reported earlier, wrt FirstBurstLength not being proposed nor responded to using Flexboot PXE. Thank you. On Fri, 2017-07-07 at 22:24 +, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger > > This patch re-introduces part of a long standing login workaround that > was recently dropped by: > > commit 1c99de981f30b3e7868b8d20ce5479fa1c0fea46 > Author: Nicholas Bellinger > Date: Sun Apr 2 13:36:44 2017 -0700 > > iscsi-target: Drop work-around for legacy GlobalSAN initiator > > Namely, the workaround for FirstBurstLength ended up being required by > Mellanox Flexboot PXE boot ROMs as reported by Robert. > > So this patch re-adds the work-around for FirstBurstLength within > iscsi_check_proposer_for_optional_reply(), and makes the key optional > to respond when the initiator does not propose, nor respond to it. > > Also as requested by Arun, this patch introduces a new TPG attribute > named 'login_keys_workaround' that controls the use of both the > FirstBurstLength workaround, as well as the two other existing > workarounds for gPXE iSCSI boot client. > > By default, the workaround is enabled with login_keys_workaround=1, > since Mellanox FlexBoot requires it, and Arun has verified the Qlogic > MSFT initiator already proposes FirstBurstLength, so it's uneffected > by this re-adding this part of the original work-around. > > Reported-by: Robert LeBlanc > Cc: Robert LeBlanc > Cc: Arun Easi > Signed-off-by: Nicholas Bellinger > --- > drivers/target/iscsi/iscsi_target_configfs.c | 2 ++ > drivers/target/iscsi/iscsi_target_nego.c | 6 ++-- > drivers/target/iscsi/iscsi_target_parameters.c | 41 > ++ > drivers/target/iscsi/iscsi_target_parameters.h | 2 +- > drivers/target/iscsi/iscsi_target_tpg.c| 19 > drivers/target/iscsi/iscsi_target_tpg.h| 1 + > include/target/iscsi/iscsi_target_core.h | 9 ++ > 7 files changed, 64 insertions(+), 16 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c > b/drivers/target/iscsi/iscsi_target_configfs.c > index 535a8e0..0dd4c45 100644 > --- a/drivers/target/iscsi/iscsi_target_configfs.c > +++ b/drivers/target/iscsi/iscsi_target_configfs.c > @@ -781,6 +781,7 @@ static int lio_target_init_nodeacl(struct se_node_acl > *se_nacl, > DEF_TPG_ATTRIB(t10_pi); > DEF_TPG_ATTRIB(fabric_prot_type); > DEF_TPG_ATTRIB(tpg_enabled_sendtargets); > +DEF_TPG_ATTRIB(login_keys_workaround); > > static struct configfs_attribute *lio_target_tpg_attrib_attrs[] = { > &iscsi_tpg_attrib_attr_authentication, > @@ -796,6 +797,7 @@ static int lio_target_init_nodeacl(struct se_node_acl > *se_nacl, > &iscsi_tpg_attrib_attr_t10_pi, > &iscsi_tpg_attrib_attr_fabric_prot_type, > &iscsi_tpg_attrib_attr_tpg_enabled_sendtargets, > + &iscsi_tpg_attrib_attr_login_keys_workaround, > NULL, > }; > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c > b/drivers/target/iscsi/iscsi_target_nego.c > index 96df63f..7a6751f 100644 > --- a/drivers/target/iscsi/iscsi_target_nego.c > +++ b/drivers/target/iscsi/iscsi_target_nego.c > @@ -864,7 +864,8 @@ static int iscsi_target_handle_csg_zero( > SENDER_TARGET, > login->rsp_buf, > &login->rsp_length, > - conn->param_list); > + conn->param_list, > + conn->tpg->tpg_attrib.login_keys_workaround); > if (ret < 0) > return -1; > > @@ -934,7 +935,8 @@ static int iscsi_target_handle_csg_one(struct iscsi_conn > *conn, struct iscsi_log > SENDER_TARGET, > login->rsp_buf, > &login->rsp_length, > - conn->param_list); > + conn->param_list, > + conn->tpg->tpg_attrib.login_keys_workaround); > if (ret < 0) { > iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_INITIATOR_ERR, > ISCSI_LOGIN_STATUS_INIT_ERR); > diff --git a/drivers/target/iscsi/iscsi_target_parameters.c > b/drivers/target/iscsi/iscsi_target_parameters.c > index fce6276..caab104 100644 > --- a/drivers/target/iscsi/iscsi_target_parameters.c > +++ b/drivers/target/iscsi/iscsi_target_parameters.c > @@ -765,7 +765,8 @@ static int iscsi_check_for_auth_key(char *key) > return 0; > } > > -static void iscsi_check_pr
Re: [PATCH] iscsi-target: Reject immediate data underflow larger than SCSI transfer length
Hi Bart, On Thu, 2017-06-08 at 23:55 -0700, Nicholas A. Bellinger wrote: > On Thu, 2017-06-08 at 15:37 +, Bart Van Assche wrote: > > On Thu, 2017-06-08 at 04:21 +, Nicholas A. Bellinger wrote: > > > + /* > > > + * Check for underflow case where both EDTL and immediate data payload > > > + * exceeds what is presented by CDB's TRANSFER LENGTH, and what has > > > + * already been set in target_cmd_size_check() as se_cmd->data_length. > > > + * > > > + * For this special case, fail the command and dump the immediate data > > > + * payload. > > > + */ > > > + if (cmd->first_burst_len > cmd->se_cmd.data_length) { > > > + cmd->sense_reason = TCM_INVALID_CDB_FIELD; > > > + goto after_immediate_data; > > > + } > > > > A quote from the iSCSI RFC (https://tools.ietf.org/html/rfc5048): > > > >If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in the > >SCSI Response PDU as specified in [RFC3720]. The Residual Count MUST > >be set to the numerical value of (EDTL - SPDTL). > > > > Sorry but I don't think that sending TCM_INVALID_CDB_FIELD back to the > > initiator is compliant with the iSCSI RFC. > > Alas, the nuance of what this patch actually does was missed when you > cut the context. > > First, a bit of history. LIO has rejected underflow for all WRITEs for > the first ~12.5 years of RFC-3720, and in the context of iscsi-target > mode there has never been a single host environment that ever once > cared. > > Since Roland's patch to allow underflow for control CDBs in v4.3+ opened > this discussion for control CDBs with a WRITE payload in order to make > MSFT/FCP cert for PERSISTENT_RESERVE_OUT happy, the question has become > what control CDB WRITE underflow cases should we allow..? > > The point with this patch is when a host is sending a underflow with a > iscsi immediate data payload that exceeds SCSI transfer length, it's a > bogus request with a garbage payload. It's a garbage payload because > the SCSI CDB itself obviously doesn't want anything to do it. > > I'm very dubious of any host environment who's trying to do this for any > CDB, and expects achieve expected results. > > Of course, since v4.3+ normal overflow where SCSI transfer length > matches the iscsi immediate data payload just works with or without this > patch. > > So to that extent, I'm going to push this patch as a defensive fix for > v4.3+, to let those imaginary iscsi host environments know they being > very, very naughty. > > > Please note that a fix that is > > compliant with the iSCSI RFC is present in the following patch series: > > [PATCH > > 00/33] SCSI target driver patches for kernel v4.13, 23 May 2017 > > (https://www.spinics.net/lists/target-devel/msg15370.html). > > So I might still consider this as a v4.13-rc item for control CDB > underflow, but no way as stable material. > > Also, there is certainly no way I'm going to allow a patch to randomly > enable underflow/overflow for all WRITE non control CDBs tree-wide > across all fabric drivers, because 1) no host environments actually care > about this, and 2) it's still dangerous to do for all fabrics without > some serious auditing. After further consideration, I've decided against allowing iscsi-target underflow with a immediate data payload larger than SCSI transfer length. Any host environment that attempts to send an underflow with a immediate data payload larger than SCSI transfer length, expects the target to automatically truncate immediate data, and still return GOOD status is completely bogus. Any host that attempts this is buggy, and needs to be fixed. This is because for the last ~12 years of RFC-3720: - There has never been a host environment in the wild that exhibits this behavior. - There has never been a conformance suite which expects this behavior. So rejecting this case as already done in commit abb85a9b51 is the correct approach for >= v4.3.y. Of course, the typical underflow scenario which Roland's v4.3.y commit enabled, underflow where immediate data matches the SCSI transfer length is supported for control CDBs. That said, thanks for high-lighting this particular corner case, so it could be fixed in >= v4.3.y.
[PATCH] iscsi-target: Add login_keys_workaround attribute for non RFC initiators
From: Nicholas Bellinger This patch re-introduces part of a long standing login workaround that was recently dropped by: commit 1c99de981f30b3e7868b8d20ce5479fa1c0fea46 Author: Nicholas Bellinger Date: Sun Apr 2 13:36:44 2017 -0700 iscsi-target: Drop work-around for legacy GlobalSAN initiator Namely, the workaround for FirstBurstLength ended up being required by Mellanox Flexboot PXE boot ROMs as reported by Robert. So this patch re-adds the work-around for FirstBurstLength within iscsi_check_proposer_for_optional_reply(), and makes the key optional to respond when the initiator does not propose, nor respond to it. Also as requested by Arun, this patch introduces a new TPG attribute named 'login_keys_workaround' that controls the use of both the FirstBurstLength workaround, as well as the two other existing workarounds for gPXE iSCSI boot client. By default, the workaround is enabled with login_keys_workaround=1, since Mellanox FlexBoot requires it, and Arun has verified the Qlogic MSFT initiator already proposes FirstBurstLength, so it's uneffected by this re-adding this part of the original work-around. Reported-by: Robert LeBlanc Cc: Robert LeBlanc Cc: Arun Easi Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target_configfs.c | 2 ++ drivers/target/iscsi/iscsi_target_nego.c | 6 ++-- drivers/target/iscsi/iscsi_target_parameters.c | 41 ++ drivers/target/iscsi/iscsi_target_parameters.h | 2 +- drivers/target/iscsi/iscsi_target_tpg.c| 19 drivers/target/iscsi/iscsi_target_tpg.h| 1 + include/target/iscsi/iscsi_target_core.h | 9 ++ 7 files changed, 64 insertions(+), 16 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index 535a8e0..0dd4c45 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -781,6 +781,7 @@ static int lio_target_init_nodeacl(struct se_node_acl *se_nacl, DEF_TPG_ATTRIB(t10_pi); DEF_TPG_ATTRIB(fabric_prot_type); DEF_TPG_ATTRIB(tpg_enabled_sendtargets); +DEF_TPG_ATTRIB(login_keys_workaround); static struct configfs_attribute *lio_target_tpg_attrib_attrs[] = { &iscsi_tpg_attrib_attr_authentication, @@ -796,6 +797,7 @@ static int lio_target_init_nodeacl(struct se_node_acl *se_nacl, &iscsi_tpg_attrib_attr_t10_pi, &iscsi_tpg_attrib_attr_fabric_prot_type, &iscsi_tpg_attrib_attr_tpg_enabled_sendtargets, + &iscsi_tpg_attrib_attr_login_keys_workaround, NULL, }; diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index 96df63f..7a6751f 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -864,7 +864,8 @@ static int iscsi_target_handle_csg_zero( SENDER_TARGET, login->rsp_buf, &login->rsp_length, - conn->param_list); + conn->param_list, + conn->tpg->tpg_attrib.login_keys_workaround); if (ret < 0) return -1; @@ -934,7 +935,8 @@ static int iscsi_target_handle_csg_one(struct iscsi_conn *conn, struct iscsi_log SENDER_TARGET, login->rsp_buf, &login->rsp_length, - conn->param_list); + conn->param_list, + conn->tpg->tpg_attrib.login_keys_workaround); if (ret < 0) { iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_INITIATOR_ERR, ISCSI_LOGIN_STATUS_INIT_ERR); diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c index fce6276..caab104 100644 --- a/drivers/target/iscsi/iscsi_target_parameters.c +++ b/drivers/target/iscsi/iscsi_target_parameters.c @@ -765,7 +765,8 @@ static int iscsi_check_for_auth_key(char *key) return 0; } -static void iscsi_check_proposer_for_optional_reply(struct iscsi_param *param) +static void iscsi_check_proposer_for_optional_reply(struct iscsi_param *param, + bool keys_workaround) { if (IS_TYPE_BOOL_AND(param)) { if (!strcmp(param->value, NO)) @@ -773,19 +774,31 @@ static void iscsi_check_proposer_for_optional_reply(struct iscsi_param *param) } else if (IS_TYPE_BOOL_OR(param)) { if (!strcmp(param->value, YES)) SET_PSTATE_REPLY_OPTIONAL(param); -/* - * Required for gPXE iSCSI boot client - */ - if (!strcmp(param->name, IMMEDIATEDATA)) - SET_PSTATE_REPLY_OPTIONAL(param); + + if (keys_workaround) { + /* +* Required f
Re: [PATCH] target: make device_mutex and device_list static
On Wed, 2017-07-05 at 13:15 -0500, Mike Christie wrote: > On 07/04/2017 03:44 AM, Colin King wrote: > > From: Colin Ian King > > > > Variables device_mutex and device_list static are local to the source, > > so make them static. > > > > Cleans up sparse warnings: > > "symbol 'device_list' was not declared. Should it be static?" > > "symbol 'device_mutex' was not declared. Should it be static?" > > > > Signed-off-by: Colin Ian King > > --- > > drivers/target/target_core_device.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/target/target_core_device.c > > b/drivers/target/target_core_device.c > > index 3ae8fbf01bdf..bbcef3bc66c8 100644 > > --- a/drivers/target/target_core_device.c > > +++ b/drivers/target/target_core_device.c > > @@ -49,8 +49,8 @@ > > #include "target_core_pr.h" > > #include "target_core_ua.h" > > > > -DEFINE_MUTEX(device_mutex); > > -LIST_HEAD(device_list); > > +static DEFINE_MUTEX(device_mutex); > > +static LIST_HEAD(device_list); > > static DEFINE_IDR(devices_idr); > > > > static struct se_hba *lun0_hba; > > > > My fault. Thanks. > > Reviewed-by: Mike Christie Applied. Thanks Colin + MNC.
Re: [kernel-hardening] Re: [PATCH v4 06/13] iscsi: ensure RNG is seeded before use
On Mon, 2017-06-26 at 19:38 +0200, Stephan Müller wrote: > Am Montag, 26. Juni 2017, 03:23:09 CEST schrieb Nicholas A. Bellinger: > > Hi Nicholas, > > > Hi Stephan, Lee & Jason, > > > > (Adding target-devel CC') > > > > Apologies for coming late to the discussion. Comments below. > > > > On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote: > > > Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan: > > > > > > Hi Lee, > > > > > > > In your testing, how long might a process have to wait? Are we talking > > > > seconds? Longer? What about timeouts? > > > > > > In current kernels (starting with 4.8) this timeout should clear within a > > > few seconds after boot. > > > > > > In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that > > > seeding point. I have heard that on IBM System Z this trigger point > > > requires minutes to be reached. > > > > I share the same concern as Lee wrt to introducing latency into the > > existing iscsi-target login sequence. > > > > Namely in the use-cases where a single node is supporting ~1K unique > > iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of > > login attempts are expected to occur in parallel. > > > > If environments exist that require non trivial amounts of time for RNG > > seeding to be ready for iscsi-target usage, then enforcing this > > requirement at iscsi login time can open up problems, especially when > > iscsi host environments may be sensitive to login timeouts, I/O > > timeouts, et al. > > > > That said, I'd prefer to simply wait for RNG to be seeded at modprobe > > iscsi_target_mod time, instead of trying to enforce randomness during > > login. > > > > This way, those of use who have distributed storage platforms can know > > to avoid putting a node into a state to accept iscsi-target IQN export > > migration, before modprobe iscsi_target_mod has successfully loaded and > > RNG seeding has been confirmed. > > > > WDYT..? > > We may have a chicken and egg problem when the wait is at the modprobe time. > Assume the modprobe happens during initramfs time to get access to the root > file system. In this case, you entire boot process will lock up for an > indefinite amount of time. The reason is that in order to obtain events > detected by the RNG, devices need to be initialized and working. Such devices > commonly start working after the the root partition is mounted as it contains > all drivers, all configuration information etc. > > Note, during the development of my /dev/random implementation, I added the > getrandom-like blocking behavior to /dev/urandom (which is the equivalent to > Jason's patch except that it applies to user space). The boot process locked > up since systemd wanted data from /dev/urandom while it processed the > initramfs. As it did not get any, the boot process did not commence that > could > deliver new events to be picked up by the RNG. > > As I do not have such an iscsi system, I cannot test Jason's patch. But maybe > the mentioned chicken-and-egg problem I mentioned above is already visible > with the current patch as it will lead to a blocking of the mounting of the > root partition in case the root partition is on an iscsi target. AFAIK, there are no distro initramfs dependencies for iscsi_target_mod, and every environment I've ever seen loads iscsi_target_mod after switching to the real rootfs. For an iscsi initiator that might not been the case, especially if the rootfs is running atop a iscsi LUN. But at least for iscsi-target mode, any blocking during modprobe waiting for RNG seeding would happen outside of initramfs.
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done()
On Thu, 2017-06-29 at 11:28 +0300, Sagi Grimberg wrote: > >> Can you test just the one liner fix below? > >> > @@ -1452,7 +1452,7 @@ > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > -struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev = isert_conn->device->ib_device; > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); > > > > I'll test also this one-liner fix as soon as I can. > > > > But I can say that I'm pretty sure it will work as well, because all the > > previous NULL pointer dereferences that we've got in the past happened > > all 100% in isert_login_recv_done(). The other cases are probably a safe > > precaution, but they can't really happen. > > Agree, I'd prefer to start with a surgical fix before moving forward. Sounds fine. The single line fix above in isert_login_recv_done() has been queued up in target-pending/for-next with Andrea's Tested-by and your Reviewed-by: https://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=702c0f17403765cc5aa1c18f6ea6eb549c1bac9b Please let me know if there are any other issues wrt the surgical fix.
Re: [kernel-hardening] Re: [PATCH v4 06/13] iscsi: ensure RNG is seeded before use
Hi Stephan, Lee & Jason, (Adding target-devel CC') Apologies for coming late to the discussion. Comments below. On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote: > Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan: > > Hi Lee, > > > In your testing, how long might a process have to wait? Are we talking > > seconds? Longer? What about timeouts? > > > > In current kernels (starting with 4.8) this timeout should clear within a few > seconds after boot. > > In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that > seeding point. I have heard that on IBM System Z this trigger point requires > minutes to be reached. > I share the same concern as Lee wrt to introducing latency into the existing iscsi-target login sequence. Namely in the use-cases where a single node is supporting ~1K unique iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of login attempts are expected to occur in parallel. If environments exist that require non trivial amounts of time for RNG seeding to be ready for iscsi-target usage, then enforcing this requirement at iscsi login time can open up problems, especially when iscsi host environments may be sensitive to login timeouts, I/O timeouts, et al. That said, I'd prefer to simply wait for RNG to be seeded at modprobe iscsi_target_mod time, instead of trying to enforce randomness during login. This way, those of use who have distributed storage platforms can know to avoid putting a node into a state to accept iscsi-target IQN export migration, before modprobe iscsi_target_mod has successfully loaded and RNG seeding has been confirmed. WDYT..?
Re: [PATCH] ib_isert: prevent NULL pointer dereference in isert_login_recv_done() (was: Re: NULL pointer dereference in isert_login_recv_done in 4.9.32)
Hi Andrea & Robert, (Adding HCH CC') On Fri, 2017-06-23 at 00:37 +0200, Andrea Righi wrote: > On Wed, Jun 21, 2017 at 10:33:45AM -0600, Robert LeBlanc wrote: > > On Wed, Jun 21, 2017 at 9:17 AM, Robert LeBlanc > > wrote: > > > On Tue, Jun 20, 2017 at 12:54 PM, Robert LeBlanc > > > wrote: > > >> We have hit this four times today. Any ideas? > > >> > > >> [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > > >> (null) > > >> [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > > >> [ib_isert] > > So, we spent more time to track down this bug. > > It seems that at login time an error is happening, not sure exactly what > kind of error, but isert_connect_error() is called and isert_conn->cm_id > is set to NULL. > > Later, isert_login_recv_done() is trying to access > isert_conn->cm_id->device and we get the NULL pointer dereference. > > Following there's the patch that we have applied to track down this > problem. > > And this is what we see in dmesg with this patch applied: > > [ 658.633188] isert: isert_connect_error: conn 887f2209c000 error > [ 658.633226] isert: isert_login_recv_done: login with broken rdma_cm_id > > As we can see isert_connect_error() is called before isert_login_recv_done > and at that point isert_conn->cm_id is NULL. > > Obviously simply checking if the pointer is NULL, returning and ignoring > the error in isert_login_recv_done() is not the best fix ever and I'm > not sure if I'm breaking something else doing so (even if with this > patch the kernel doesn't crash and I've not seen any problem so far). > > Maybe a better way is to tear down the whole connection when this > particular case is happening? Suggestions? > > Thanks, > -Andrea > > --- > ib_isert: prevent NULL pointer dereference in isert_login_recv_done() > > During a login if an error is happening isert_connect_error() is called > and isert_conn->cm_id is set to NULL. > > Later, isert_login_recv_done() is executed, trying to access > isert_conn->cm_id->device, causing the following BUG: > > [ 169.382113] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 169.382152] IP: [] isert_login_recv_done+0x28/0x170 > [ib_isert] > > Check if isert_con->cm_id is set to NULL in isert_login_recv_done() to > avoid this problem. > > Signed-off-by: Andrea Righi > Signed-off-by: Robert LeBlanc > --- > drivers/infiniband/ulp/isert/ib_isert.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c > b/drivers/infiniband/ulp/isert/ib_isert.c > index fcbed35..a8c1143 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -741,6 +741,7 @@ isert_connect_error(struct rdma_cm_id *cma_id) > { > struct isert_conn *isert_conn = cma_id->qp->qp_context; > > + isert_warn("conn %p error\n", isert_conn); > list_del_init(&isert_conn->node); > isert_conn->cm_id = NULL; > isert_put_conn(isert_conn); > @@ -1452,7 +1453,13 @@ static void > isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > struct isert_conn *isert_conn = wc->qp->qp_context; > - struct ib_device *ib_dev = isert_conn->cm_id->device; > +struct ib_device *ib_dev; > + > + if (unlikely(isert_conn->cm_id == NULL)) { > + isert_warn("login with broken rdma_cm_id"); > + return; > + } > + ib_dev = isert_conn->cm_id->device; > > if (unlikely(wc->status != IB_WC_SUCCESS)) { > isert_print_wc(wc, "login recv"); So I assume isert_cma_handler() -> isert_connect_error() getting called to clear isert_conn->cm_id before connection established, and subsequently isert_conn->login_req_buf->rx_cqe.done() -> isert_login_recv_done() still getting invoked after connection failure is new RDMA API behavior.. That said, following Sagi's original patch that added the clearing of isert_conn->cm_id to isert_connect_error(), it probably makes sense to use isert_conn->device->ib_device, and avoid dereferencing isert_conn->cm_id before connection is established. Here's a quick (untested) patch to do this for recv/send done callbacks, and avoid using isert_conn->cm_id during isert_rdma_accept(). Sagi + Co, WDYT..? diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index fcbed35..f7f97f3 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -52,7 +52,7 @@ static int isert_login_post_recv(struct isert_conn *isert_conn); static int -isert_rdma_accept(struct isert_conn *isert_conn); +isert_rdma_accept(struct isert_conn *isert_conn, struct rdma_cm_id *cm_id); struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np); static void isert_release_work(struct work_struct *work); @@ -543,7 +543,7 @@ if (ret) goto out_conn_dev; - ret = isert_rdma_accept(isert_conn); + ret = iser
Re: [PATCH][V2][target-devel-next] tcmu: make array tcmu_attrib_attrs static const
On Tue, 2017-06-13 at 14:29 +0100, Colin King wrote: > From: Colin Ian King > > The array tcmu_attrib_attrs does not need to be in global scope, so make > it static. > > Cleans up sparse warning: > "symbol 'tcmu_attrib_attrs' was not declared. Should it be static?" > > Signed-off-by: Colin Ian King > --- > drivers/target/target_core_user.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/target/target_core_user.c > b/drivers/target/target_core_user.c > index afc1fd6bacaf..04fb3f720895 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -1672,7 +1672,7 @@ static ssize_t tcmu_emulate_write_cache_store(struct > config_item *item, > } > CONFIGFS_ATTR(tcmu_, emulate_write_cache); > > -struct configfs_attribute *tcmu_attrib_attrs[] = { > +static struct configfs_attribute *tcmu_attrib_attrs[] = { > &tcmu_attr_cmd_time_out, > &tcmu_attr_dev_path, > &tcmu_attr_dev_size, Applied. Thanks Colin.
[GIT PULL] target fixes for v4.12-rc7
Hi Linus, Here are the target-pending fixes for v4.12-rc7 that have been queued up for the last 2 weeks. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master This includes: - Fix a TMR related kref underflow detected by the recent refcount_t conversion in upstream. - Fix a iscsi-target corner case during explicit connection logout timeout failure. - Address last fallout in iscsi-target immediate data handling from v4.4 target-core now allowing control CDB payload underflow. Thank you, --nab Nicholas Bellinger (3): target: Fix kref->refcount underflow in transport_cmd_finish_abort iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP iscsi-target: Reject immediate data underflow larger than SCSI transfer length drivers/target/iscsi/iscsi_target.c| 22 -- drivers/target/target_core_internal.h | 2 +- drivers/target/target_core_tmr.c | 16 drivers/target/target_core_transport.c | 9 ++--- 4 files changed, 35 insertions(+), 14 deletions(-)
Re: [PATCH] configfs: Fix race between create_link and configfs_rmdir
On Thu, 2017-06-08 at 07:34 -0500, Bryant G. Ly wrote: > > Thanks Nic, > > > > applied to the configfs-for-next tree. I'm not entirely sure if we > > should bother adding this to 4.12 or if it hits rarely enough? > > > It hits for us pretty often when we have a GPFS setup with 10 hosts and 1k+ > vms. > > That is how we discovered the bug in the first place. > Using a DATERA workload with 1K unique multi-tenant backend devices and 1K unique iscsi-target IQNs per node, I've never tripped across this particular bug.. However, our userspace built atop rtslib is enforcing tenant shutdown of individual rmdir(2) of /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/, before rmdir(2) of /sys/kernel/config/target/core/$HBA/$DEV/ occurs. Based on Bryant's original backtrace with targetcli, it looks like the Novalink user-space is not enforcing this requirement across user-space processes doing fabric port symlink and backend device shutdown. That said it probably doesn't need a special v4.12-rc PULL, but based on Bryant's feedback it certainly does deserve a stable CC'.
Re: [PATCH] iscsi-target: Reject immediate data underflow larger than SCSI transfer length
On Thu, 2017-06-08 at 15:37 +, Bart Van Assche wrote: > On Thu, 2017-06-08 at 04:21 +0000, Nicholas A. Bellinger wrote: > > + /* > > +* Check for underflow case where both EDTL and immediate data payload > > +* exceeds what is presented by CDB's TRANSFER LENGTH, and what has > > +* already been set in target_cmd_size_check() as se_cmd->data_length. > > +* > > +* For this special case, fail the command and dump the immediate data > > +* payload. > > +*/ > > + if (cmd->first_burst_len > cmd->se_cmd.data_length) { > > + cmd->sense_reason = TCM_INVALID_CDB_FIELD; > > + goto after_immediate_data; > > + } > > A quote from the iSCSI RFC (https://tools.ietf.org/html/rfc5048): > >If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in the >SCSI Response PDU as specified in [RFC3720]. The Residual Count MUST >be set to the numerical value of (EDTL - SPDTL). > > Sorry but I don't think that sending TCM_INVALID_CDB_FIELD back to the > initiator is compliant with the iSCSI RFC. Alas, the nuance of what this patch actually does was missed when you cut the context. First, a bit of history. LIO has rejected underflow for all WRITEs for the first ~12.5 years of RFC-3720, and in the context of iscsi-target mode there has never been a single host environment that ever once cared. Since Roland's patch to allow underflow for control CDBs in v4.3+ opened this discussion for control CDBs with a WRITE payload in order to make MSFT/FCP cert for PERSISTENT_RESERVE_OUT happy, the question has become what control CDB WRITE underflow cases should we allow..? The point with this patch is when a host is sending a underflow with a iscsi immediate data payload that exceeds SCSI transfer length, it's a bogus request with a garbage payload. It's a garbage payload because the SCSI CDB itself obviously doesn't want anything to do it. I'm very dubious of any host environment who's trying to do this for any CDB, and expects achieve expected results. Of course, since v4.3+ normal overflow where SCSI transfer length matches the iscsi immediate data payload just works with or without this patch. So to that extent, I'm going to push this patch as a defensive fix for v4.3+, to let those imaginary iscsi host environments know they being very, very naughty. > Please note that a fix that is > compliant with the iSCSI RFC is present in the following patch series: [PATCH > 00/33] SCSI target driver patches for kernel v4.13, 23 May 2017 > (https://www.spinics.net/lists/target-devel/msg15370.html). So I might still consider this as a v4.13-rc item for control CDB underflow, but no way as stable material. Also, there is certainly no way I'm going to allow a patch to randomly enable underflow/overflow for all WRITE non control CDBs tree-wide across all fabric drivers, because 1) no host environments actually care about this, and 2) it's still dangerous to do for all fabrics without some serious auditing.
Re: [PATCH 2/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK
On Mon, 2017-06-05 at 15:57 +, Bart Van Assche wrote: > On Sat, 2017-06-03 at 22:10 +0000, Nicholas A. Bellinger wrote: > > +static bool target_lookup_lun_from_tag(struct se_session *se_sess, u64 tag, > > + u64 *unpacked_lun) > > +{ > > + struct se_cmd *se_cmd; > > + unsigned long flags; > > + bool ret = false; > > + > > + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); > > + list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list) { > > + if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) > > + continue; > > + > > + if (se_cmd->tag == tag) { > > + *unpacked_lun = se_cmd->orig_fe_lun; > > + ret = true; > > + break; > > + } > > + } > > + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); > > + > > + return ret; > > +} > > + > > /** > > * target_submit_tmr - lookup unpacked lun and submit uninitialized se_cmd > > * for TMR CDBs > > @@ -1639,19 +1662,31 @@ int target_submit_tmr(struct se_cmd *se_cmd, struct > > se_session *se_sess, > > core_tmr_release_req(se_cmd->se_tmr_req); > > return ret; > > } > > + /* > > +* If this is ABORT_TASK with no explicit fabric provided LUN, > > +* go ahead and search active session tags for a match to figure > > +* out unpacked_lun for the original se_cmd. > > +*/ > > + if (tm_type == TMR_ABORT_TASK && (flags & > > TARGET_SCF_LOOKUP_LUN_FROM_TAG)) { > > + if (!target_lookup_lun_from_tag(se_sess, tag, &unpacked_lun)) > > + goto failure; > > + } > > > > ret = transport_lookup_tmr_lun(se_cmd, unpacked_lun); > > - if (ret) { > > - /* > > -* For callback during failure handling, push this work off > > -* to process context with TMR_LUN_DOES_NOT_EXIST status. > > -*/ > > - INIT_WORK(&se_cmd->work, target_complete_tmr_failure); > > - schedule_work(&se_cmd->work); > > - return 0; > > - } > > + if (ret) > > + goto failure; > > + > > transport_generic_handle_tmr(se_cmd); > > return 0; > > Hello Nic, > > So after LUN lookup has finished sess_cmd_lock lock is dropped, a context > switch occurs due to the queue_work() call in transport_generic_handle_tmr() > and next core_tmr_abort_task() reacquires that lock? Sorry but I'm afraid > that if after that lock is dropped and before it is reacquired a command > finishes and the initiator reuses the same tag for another command for the > same LUN that this may result in the wrong command being aborted. This is only getting a unpacked_lun to determine where the incoming ABORT_TASK should perform a se_lun lookup and percpu ref upon. The scenario you are talking about would require a tag be reused on the same session for the same device between when the ABORT_TASK is dispatched here, and actually processed. Because qla2xxx doesn't reuse tags, it's not a problem since it's the only consumer of TARGET_SCF_LOOKUP_LUN_FROM_TAG. Since Himanshu and Quinn are OK it with, I'm OK with it. If you have another driver that needs to use this logic where an ABORT_TASK doesn't know the unpacked_lun to reference, and does reuse tags then I'd consider a patch for that.
Re: [PATCH 0/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support
Hi Himanshu & Quinn, On Wed, 2017-06-07 at 05:02 +, Madhani, Himanshu wrote: > > On Jun 3, 2017, at 3:10 PM, Nicholas A. Bellinger > > wrote: > > > > From: Nicholas Bellinger > > > > Hi Himanshu + Quinn, > > > > Here is a small series to introduce proper percpu se_lun->lun_ref > > counting for TMR, and add common code in target_submit_tmr() to > > do tag lookup for unpacked_lun in order to drop the original > > driver specific lookup within __qlt_24xx_handle_abts(). > > > > It's rather straight-forward, so review and test as a v4.13 item. > > > > Thanks! > > > > Nicholas Bellinger (3): > > target: Add support for TMR percpu reference counting > > target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK > > qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG > > > > drivers/scsi/qla2xxx/qla_target.c | 39 ++- > > drivers/scsi/qla2xxx/tcm_qla2xxx.c | 4 ++- > > drivers/target/target_core_device.c| 14 ++--- > > drivers/target/target_core_transport.c | 56 > > -- > > include/target/target_core_base.h | 3 +- > > 5 files changed, 71 insertions(+), 45 deletions(-) > > > > -- > > 1.9.1 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe target-devel" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Series looks good. > > Acked-by: Himanshu Madhani > Thanks for the review!
Re: [PATCH] iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
Reviews pretty please..? On Sat, 2017-06-03 at 21:32 +, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger > > This patch fixes a BUG() in iscsit_close_session() that could be > triggered when iscsit_logout_post_handler() execution from within > tx thread context was not run for more than SECONDS_FOR_LOGOUT_COMP > (15 seconds), and the TCP connection didn't already close before > then forcing tx thread context to automatically exit. > > This would manifest itself during explicit logout as: > > [33206.974254] 1 connection(s) still exist for iSCSI session to > iqn.1993-08.org.debian:01:3f5523242179 > [33206.980184] INFO: NMI handler (kgdb_nmi_handler) took too long to run: > 2100.772 msecs > [33209.078643] [ cut here ] > [33209.078646] kernel BUG at drivers/target/iscsi/iscsi_target.c:4346! > > Normally when explicit logout attempt fails, the tx thread context > exits and iscsit_close_connection() from rx thread context does the > extra cleanup once it detects conn->conn_logout_remove has not been > cleared by the logout type specific post handlers. > > To address this special case, if the logout post handler in tx thread > context detects conn->tx_thread_active has already been cleared, simply > return and exit in order for existing iscsit_close_connection() > logic from rx thread context do failed logout cleanup. > > Reported-by: Bart Van Assche > Cc: Mike Christie > Cc: Hannes Reinecke > Cc: Sagi Grimberg > Cc: sta...@vger.kernel.org # 3.14+ > Signed-off-by: Nicholas Bellinger > --- > drivers/target/iscsi/iscsi_target.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c > b/drivers/target/iscsi/iscsi_target.c > index 0d8f815..c025451 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -4423,8 +4423,11 @@ static void iscsit_logout_post_handler_closesession( >* always sleep waiting for RX/TX thread shutdown to complete >* within iscsit_close_connection(). >*/ > - if (!conn->conn_transport->rdma_shutdown) > + if (!conn->conn_transport->rdma_shutdown) { > sleep = cmpxchg(&conn->tx_thread_active, true, false); > + if (!sleep) > + return; > + } > > atomic_set(&conn->conn_logout_remove, 0); > complete(&conn->conn_logout_comp); > @@ -4440,8 +4443,11 @@ static void iscsit_logout_post_handler_samecid( > { > int sleep = 1; > > - if (!conn->conn_transport->rdma_shutdown) > + if (!conn->conn_transport->rdma_shutdown) { > sleep = cmpxchg(&conn->tx_thread_active, true, false); > + if (!sleep) > + return; > + } > > atomic_set(&conn->conn_logout_remove, 0); > complete(&conn->conn_logout_comp);
Re: [PATCH] target: Fix kref->refcount underflow in transport_cmd_finish_abort
(Adding Quinn CC') Reviews please..? On Sat, 2017-06-03 at 21:09 +0000, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger > > This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED > when a fabric driver drops it's second reference from below the > target_core_tmr.c based callers of transport_cmd_finish_abort(). > > Recently with the conversion of kref to refcount_t, this bug was > manifesting itself as: > > [705519.601034] refcount_t: underflow; use-after-free. > [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: > 20116.512 msecs > [705539.719111] [ cut here ] > [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 > refcount_sub_and_test+0x33/0x51 > > Since the original kref atomic_t based kref_put() didn't check for > underflow and only invoked the final callback when zero was reached, > this bug did not manifest in practice since all se_cmd memory is > using preallocated tags. > > To address this, go ahead and propigate the existing return from > transport_put_cmd() up via transport_cmd_finish_abort(), and > change transport_cmd_finish_abort() + core_tmr_handle_tas_abort() > callers to only do their local target_put_sess_cmd() if necessary. > > Reported-by: Bart Van Assche > Cc: Mike Christie > Cc: Hannes Reinecke > Cc: Christoph Hellwig > Cc: Himanshu Madhani > Cc: Sagi Grimberg > Cc: sta...@vger.kernel.org # 3.14+ > Signed-off-by: Nicholas Bellinger > --- > drivers/target/target_core_internal.h | 2 +- > drivers/target/target_core_tmr.c | 16 > drivers/target/target_core_transport.c | 9 ++--- > 3 files changed, 15 insertions(+), 12 deletions(-) > > diff --git a/drivers/target/target_core_internal.h > b/drivers/target/target_core_internal.h > index 9ab7090..0912de7 100644 > --- a/drivers/target/target_core_internal.h > +++ b/drivers/target/target_core_internal.h > @@ -136,7 +136,7 @@ struct se_node_acl > *core_tpg_add_initiator_node_acl(struct se_portal_group *tpg, > void release_se_kmem_caches(void); > u32 scsi_get_new_index(scsi_index_t); > void transport_subsystem_check_init(void); > -void transport_cmd_finish_abort(struct se_cmd *, int); > +int transport_cmd_finish_abort(struct se_cmd *, int); > unsigned char *transport_dump_cmd_direction(struct se_cmd *); > void transport_dump_dev_state(struct se_device *, char *, int *); > void transport_dump_dev_info(struct se_device *, struct se_lun *, > diff --git a/drivers/target/target_core_tmr.c > b/drivers/target/target_core_tmr.c > index dce1e1b..13f47bf 100644 > --- a/drivers/target/target_core_tmr.c > +++ b/drivers/target/target_core_tmr.c > @@ -75,7 +75,7 @@ void core_tmr_release_req(struct se_tmr_req *tmr) > kfree(tmr); > } > > -static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) > +static int core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) > { > unsigned long flags; > bool remove = true, send_tas; > @@ -91,7 +91,7 @@ static void core_tmr_handle_tas_abort(struct se_cmd *cmd, > int tas) > transport_send_task_abort(cmd); > } > > - transport_cmd_finish_abort(cmd, remove); > + return transport_cmd_finish_abort(cmd, remove); > } > > static int target_check_cdb_and_preempt(struct list_head *list, > @@ -184,8 +184,8 @@ void core_tmr_abort_task( > cancel_work_sync(&se_cmd->work); > transport_wait_for_tasks(se_cmd); > > - transport_cmd_finish_abort(se_cmd, true); > - target_put_sess_cmd(se_cmd); > + if (!transport_cmd_finish_abort(se_cmd, true)) > + target_put_sess_cmd(se_cmd); > > printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for" > " ref_tag: %llu\n", ref_tag); > @@ -281,8 +281,8 @@ static void core_tmr_drain_tmr_list( > cancel_work_sync(&cmd->work); > transport_wait_for_tasks(cmd); > > - transport_cmd_finish_abort(cmd, 1); > - target_put_sess_cmd(cmd); > + if (!transport_cmd_finish_abort(cmd, 1)) > + target_put_sess_cmd(cmd); > } > } > > @@ -380,8 +380,8 @@ static void core_tmr_drain_state_list( > cancel_work_sync(&cmd->work); > transport_wait_for_tasks(cmd); > > - core_tmr_handle_tas_abort(cmd, tas); > - target_put_sess_cmd(cmd); > + if (!core_tmr_handle_tas_abort(cmd, tas)) > + target_put_sess_cmd(cmd); > } > } > > diff --git a/
[PATCH] configfs: Fix race between create_link and configfs_rmdir
From: Nicholas Bellinger This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir(). This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.. This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API: [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added [ 7879.893760] [ cut here ] [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs] [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 #12 [ 7879.893812] task: c0018a0d3400 task.stack: c001f3b4 [ 7879.893813] NIP: d2c664ec LR: d2c60980 CTR: c0b70870 [ 7879.893814] REGS: c001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22) [ 7879.893815] MSR: 80029033 CR: 2842 XER: [ 7879.893820] CFAR: d2c664bc SOFTE: 1 GPR00: d2c60980 c001f3b43a90 d2c70908 c000fbc06820 GPR04: c001ef1bd900 0004 0001 GPR08: 0001 d2c69560 d2c66d80 GPR12: c0b70870 ce798700 c001f3b43ca0 c001d4949d40 GPR16: c0014637e1c0 c000f2392940 GPR20: c001f3b43b98 0041 0060 GPR24: f000 d2c60be0 c001f1dac490 GPR28: 0004 c001ef1bd900 c000f2392940 [ 7879.893839] NIP [d2c664ec] config_item_get+0x7c/0x90 [configfs] [ 7879.893841] LR [d2c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893842] Call Trace: [ 7879.893844] [c001f3b43ac0] [d2c60980] check_perm+0x80/0x2e0 [configfs] [ 7879.893847] [c001f3b43b10] [c0329770] do_dentry_open+0x2c0/0x460 [ 7879.893849] [c001f3b43b70] [c0344480] path_openat+0x210/0x1490 [ 7879.893851] [c001f3b43c80] [c034708c] do_filp_open+0xfc/0x170 [ 7879.893853] [c001f3b43db0] [c032b5bc] do_sys_open+0x1cc/0x390 [ 7879.893856] [c001f3b43e30] [c0009584] system_call+0x38/0xec [ 7879.893856] Instruction dump: [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d22 e94981e0 892a [ 7879.893861] 2f89 409effe0 3921 992a <0fe0> 4bd0 6000 6000 [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]--- To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds. This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY. Reported-by: Bryant G. Ly Tested-by: Bryant G. Ly Cc: Bryant G. Ly Cc: Christoph Hellwig Cc: Joel Becker Signed-off-by: Nicholas Bellinger --- fs/configfs/symlink.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c index a6ab012..c8aabba 100644 --- a/fs/configfs/symlink.c +++ b/fs/configfs/symlink.c @@ -83,14 +83,13 @@ static int create_link(struct config_item *parent_item, ret = -ENOMEM; sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL); if (sl) { - sl->sl_target = config_item_get(item); spin_lock(&configfs_dirent_lock); if (target_sd->s_type & CONFIGFS_USET_DROPPING) { spin_unlock(&configfs_dirent_lock); - config_item_put(item); kfree(sl); return -ENOENT; } + sl->sl_target = config_item_get(item); list_add(&sl->sl_list, &target_sd->s_links); spin_unlock(&configfs_dirent_lock); ret = configfs_create_link(sl, parent_item->ci_dentry, -- 1.9.1
[PATCH] iscsi-target: Reject immediate data underflow larger than SCSI transfer length
From: Nicholas Bellinger When iscsi WRITE underflow occurs there are two different scenarios that can happen. Normally in practice, when an EDTL vs. SCSI CDB TRANSFER LENGTH underflow is detected, the iscsi immediate data payload is the smaller SCSI CDB TRANSFER LENGTH. That is, when a host fabric LLD is using a fixed size EDTL for a specific control CDB, the SCSI CDB TRANSFER LENGTH and actual SCSI payload ends up being smaller than EDTL. In iscsi, this means the received iscsi immediate data payload matches the smaller SCSI CDB TRANSFER LENGTH, because there is no more SCSI payload to accept beyond SCSI CDB TRANSFER LENGTH. However, it's possible for a malicous host to send a WRITE underflow where EDTL is larger than SCSI CDB TRANSFER LENGTH, but incoming iscsi immediate data actually matches EDTL. In the wild, we've never had a iscsi host environment actually try to do this. For this special case, it's wrong to truncate part of the control CDB payload and continue to process the command during underflow when immediate data payload received was larger than SCSI CDB TRANSFER LENGTH, so go ahead and reject and drop the bogus payload as a defensive action. Note this potential bug was originally relaxed by the following for allowing WRITE underflow in MSFT FCP host environments: commit c72c5250224d475614a00c1d7e54a67f77cd3410 Author: Roland Dreier Date: Wed Jul 22 15:08:18 2015 -0700 target: allow underflow/overflow for PR OUT etc. commands Cc: Roland Dreier Cc: Mike Christie Cc: Hannes Reinecke Cc: Martin K. Petersen Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index c025451..3fdca2c 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -1279,6 +1279,18 @@ int iscsit_process_scsi_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, */ if (dump_payload) goto after_immediate_data; + /* +* Check for underflow case where both EDTL and immediate data payload +* exceeds what is presented by CDB's TRANSFER LENGTH, and what has +* already been set in target_cmd_size_check() as se_cmd->data_length. +* +* For this special case, fail the command and dump the immediate data +* payload. +*/ + if (cmd->first_burst_len > cmd->se_cmd.data_length) { + cmd->sense_reason = TCM_INVALID_CDB_FIELD; + goto after_immediate_data; + } immed_ret = iscsit_handle_immediate_data(cmd, hdr, cmd->first_burst_len); -- 1.9.1
[PATCH 0/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support
From: Nicholas Bellinger Hi Himanshu + Quinn, Here is a small series to introduce proper percpu se_lun->lun_ref counting for TMR, and add common code in target_submit_tmr() to do tag lookup for unpacked_lun in order to drop the original driver specific lookup within __qlt_24xx_handle_abts(). It's rather straight-forward, so review and test as a v4.13 item. Thanks! Nicholas Bellinger (3): target: Add support for TMR percpu reference counting target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG drivers/scsi/qla2xxx/qla_target.c | 39 ++- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 4 ++- drivers/target/target_core_device.c| 14 ++--- drivers/target/target_core_transport.c | 56 -- include/target/target_core_base.h | 3 +- 5 files changed, 71 insertions(+), 45 deletions(-) -- 1.9.1
[PATCH 1/3] target: Add support for TMR percpu reference counting
From: Nicholas Bellinger This patch introduces TMR percpu reference counting using se_lun->lun_ref in transport_lookup_tmr_lun(), following how existing non TMR per se_lun reference counting works within transport_lookup_cmd_lun(). It also adds explicit transport_lun_remove_cmd() calls to drop the reference in the three tmr related locations that invoke transport_cmd_check_stop_to_fabric(); - target_tmr_work() during normal ->queue_tm_rsp() - target_complete_tmr_failure() during error ->queue_tm_rsp() - transport_generic_handle_tmr() during early failure Also, note the exception paths in transport_generic_free_cmd() and transport_cmd_finish_abort() already check SCF_SE_LUN_CMD, and will invoke transport_lun_remove_cmd() when necessary. Cc: Himanshu Madhani Cc: Quinn Tran Cc: Mike Christie Cc: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_device.c| 14 ++ drivers/target/target_core_transport.c | 3 +++ 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 8f0e0e3..11c80c4 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -168,11 +168,20 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd, u64 unpacked_lun) rcu_read_lock(); deve = target_nacl_find_deve(nacl, unpacked_lun); if (deve) { - se_cmd->se_lun = rcu_dereference(deve->se_lun); se_lun = rcu_dereference(deve->se_lun); + + if (!percpu_ref_tryget_live(&se_lun->lun_ref)) { + se_lun = NULL; + goto out_unlock; + } + + se_cmd->se_lun = rcu_dereference(deve->se_lun); se_cmd->pr_res_key = deve->pr_res_key; se_cmd->orig_fe_lun = unpacked_lun; + se_cmd->se_cmd_flags |= SCF_SE_LUN_CMD; + se_cmd->lun_ref_active = true; } +out_unlock: rcu_read_unlock(); if (!se_lun) { @@ -182,9 +191,6 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd, u64 unpacked_lun) unpacked_lun); return -ENODEV; } - /* -* XXX: Add percpu se_lun->lun_ref reference count for TMR -*/ se_cmd->se_dev = rcu_dereference_raw(se_lun->lun_se_dev); se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index f16a789..83bfc97 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1588,6 +1588,7 @@ static void target_complete_tmr_failure(struct work_struct *work) se_cmd->se_tmr_req->response = TMR_LUN_DOES_NOT_EXIST; se_cmd->se_tfo->queue_tm_rsp(se_cmd); + transport_lun_remove_cmd(se_cmd); transport_cmd_check_stop_to_fabric(se_cmd); } @@ -3199,6 +3200,7 @@ static void target_tmr_work(struct work_struct *work) cmd->se_tfo->queue_tm_rsp(cmd); check_stop: + transport_lun_remove_cmd(cmd); transport_cmd_check_stop_to_fabric(cmd); } @@ -3221,6 +3223,7 @@ int transport_generic_handle_tmr( pr_warn_ratelimited("handle_tmr caught CMD_T_ABORTED TMR %d" "ref_tag: %llu tag: %llu\n", cmd->se_tmr_req->function, cmd->se_tmr_req->ref_task_tag, cmd->tag); + transport_lun_remove_cmd(cmd); transport_cmd_check_stop_to_fabric(cmd); return 0; } -- 1.9.1
[PATCH 3/3] qla2xxx: Convert QLA_TGT_ABTS to TARGET_SCF_LOOKUP_LUN_FROM_TAG
From: Nicholas Bellinger Following Himanshu's earlier patch to drop the redundant tag lookup within __qlt_24xx_handle_abts(), go ahead and drop this now QLA_TGT_ABTS can use TARGET_SCF_LOOKUP_LUN_FROM_TAG and have target_submit_tmr() do this from common code. Cc: Himanshu Madhani Cc: Quinn Tran Cc: Mike Christie Cc: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Nicholas Bellinger --- drivers/scsi/qla2xxx/qla_target.c | 39 +- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 4 +++- 2 files changed, 12 insertions(+), 31 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 0e03ca2..401e245 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -1847,38 +1847,13 @@ static int __qlt_24xx_handle_abts(struct scsi_qla_host *vha, struct abts_recv_from_24xx *abts, struct fc_port *sess) { struct qla_hw_data *ha = vha->hw; - struct se_session *se_sess = sess->se_sess; struct qla_tgt_mgmt_cmd *mcmd; - struct se_cmd *se_cmd; - u32 lun = 0; int rc; - bool found_lun = false; - unsigned long flags; - - spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); - list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list) { - struct qla_tgt_cmd *cmd = - container_of(se_cmd, struct qla_tgt_cmd, se_cmd); - if (se_cmd->tag == abts->exchange_addr_to_abort) { - lun = cmd->unpacked_lun; - found_lun = true; - break; - } - } - spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); - /* cmd not in LIO lists, look in qla list */ - if (!found_lun) { - if (abort_cmd_for_tag(vha, abts->exchange_addr_to_abort)) { - /* send TASK_ABORT response immediately */ - qlt_24xx_send_abts_resp(vha, abts, FCP_TMF_CMPL, false); - return 0; - } else { - ql_dbg(ql_dbg_tgt_mgt, vha, 0xf081, - "unable to find cmd in driver or LIO for tag 0x%x\n", - abts->exchange_addr_to_abort); - return -ENOENT; - } + if (abort_cmd_for_tag(vha, abts->exchange_addr_to_abort)) { + /* send TASK_ABORT response immediately */ + qlt_24xx_send_abts_resp(vha, abts, FCP_TMF_CMPL, false); + return 0; } ql_dbg(ql_dbg_tgt_mgt, vha, 0xf00f, @@ -1899,7 +1874,11 @@ static int __qlt_24xx_handle_abts(struct scsi_qla_host *vha, mcmd->reset_count = vha->hw->chip_reset; mcmd->tmr_func = QLA_TGT_ABTS; - rc = ha->tgt.tgt_ops->handle_tmr(mcmd, lun, mcmd->tmr_func, + /* +* LUN is looked up by target-core internally based on the passed +* abts->exchange_addr_to_abort tag. +*/ + rc = ha->tgt.tgt_ops->handle_tmr(mcmd, 0, mcmd->tmr_func, abts->exchange_addr_to_abort); if (rc != 0) { ql_dbg(ql_dbg_tgt_mgt, vha, 0xf052, diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index 7443e4e..75aeb9f 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -601,11 +601,13 @@ static int tcm_qla2xxx_handle_tmr(struct qla_tgt_mgmt_cmd *mcmd, uint32_t lun, struct fc_port *sess = mcmd->sess; struct se_cmd *se_cmd = &mcmd->se_cmd; int transl_tmr_func = 0; + int flags = TARGET_SCF_ACK_KREF; switch (tmr_func) { case QLA_TGT_ABTS: pr_debug("%ld: ABTS received\n", sess->vha->host_no); transl_tmr_func = TMR_ABORT_TASK; + flags |= TARGET_SCF_LOOKUP_LUN_FROM_TAG; break; case QLA_TGT_2G_ABORT_TASK: pr_debug("%ld: 2G Abort Task received\n", sess->vha->host_no); @@ -638,7 +640,7 @@ static int tcm_qla2xxx_handle_tmr(struct qla_tgt_mgmt_cmd *mcmd, uint32_t lun, } return target_submit_tmr(se_cmd, sess->se_sess, NULL, lun, mcmd, - transl_tmr_func, GFP_ATOMIC, tag, TARGET_SCF_ACK_KREF); + transl_tmr_func, GFP_ATOMIC, tag, flags); } static int tcm_qla2xxx_queue_data_in(struct se_cmd *se_cmd) -- 1.9.1
[PATCH 2/3] target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK
From: Nicholas Bellinger This patch introduces support in target_submit_tmr() for locating a unpacked_lun from an existing se_cmd->tag during ABORT_TASK. When TARGET_SCF_LOOKUP_LUN_FROM_TAG is set, target_submit_tmr() will do the extra lookup via target_lookup_lun_from_tag() and subsequently invoke transport_lookup_tmr_lun() so a proper percpu se_lun->lun_ref is taken before workqueue dispatch into se_device->tmr_wq happens. Aside from the extra target_lookup_lun_from_tag(), the existing code-path remains unchanged. Cc: Himanshu Madhani Cc: Quinn Tran Cc: Mike Christie Cc: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 53 -- include/target/target_core_base.h | 3 +- 2 files changed, 46 insertions(+), 10 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 83bfc97..dbb8101 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1592,6 +1592,29 @@ static void target_complete_tmr_failure(struct work_struct *work) transport_cmd_check_stop_to_fabric(se_cmd); } +static bool target_lookup_lun_from_tag(struct se_session *se_sess, u64 tag, + u64 *unpacked_lun) +{ + struct se_cmd *se_cmd; + unsigned long flags; + bool ret = false; + + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags); + list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list) { + if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) + continue; + + if (se_cmd->tag == tag) { + *unpacked_lun = se_cmd->orig_fe_lun; + ret = true; + break; + } + } + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); + + return ret; +} + /** * target_submit_tmr - lookup unpacked lun and submit uninitialized se_cmd * for TMR CDBs @@ -1639,19 +1662,31 @@ int target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess, core_tmr_release_req(se_cmd->se_tmr_req); return ret; } + /* +* If this is ABORT_TASK with no explicit fabric provided LUN, +* go ahead and search active session tags for a match to figure +* out unpacked_lun for the original se_cmd. +*/ + if (tm_type == TMR_ABORT_TASK && (flags & TARGET_SCF_LOOKUP_LUN_FROM_TAG)) { + if (!target_lookup_lun_from_tag(se_sess, tag, &unpacked_lun)) + goto failure; + } ret = transport_lookup_tmr_lun(se_cmd, unpacked_lun); - if (ret) { - /* -* For callback during failure handling, push this work off -* to process context with TMR_LUN_DOES_NOT_EXIST status. -*/ - INIT_WORK(&se_cmd->work, target_complete_tmr_failure); - schedule_work(&se_cmd->work); - return 0; - } + if (ret) + goto failure; + transport_generic_handle_tmr(se_cmd); return 0; + + /* +* For callback during failure handling, push this work off +* to process context with TMR_LUN_DOES_NOT_EXIST status. +*/ +failure: + INIT_WORK(&se_cmd->work, target_complete_tmr_failure); + schedule_work(&se_cmd->work); + return 0; } EXPORT_SYMBOL(target_submit_tmr); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index db2c7b3..a3af69f 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -188,7 +188,8 @@ enum target_sc_flags_table { TARGET_SCF_BIDI_OP = 0x01, TARGET_SCF_ACK_KREF = 0x02, TARGET_SCF_UNKNOWN_SIZE = 0x04, - TARGET_SCF_USE_CPUID= 0x08, + TARGET_SCF_USE_CPUID= 0x08, + TARGET_SCF_LOOKUP_LUN_FROM_TAG = 0x10, }; /* fabric independent task management function values */ -- 1.9.1
[PATCH] iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
From: Nicholas Bellinger This patch fixes a BUG() in iscsit_close_session() that could be triggered when iscsit_logout_post_handler() execution from within tx thread context was not run for more than SECONDS_FOR_LOGOUT_COMP (15 seconds), and the TCP connection didn't already close before then forcing tx thread context to automatically exit. This would manifest itself during explicit logout as: [33206.974254] 1 connection(s) still exist for iSCSI session to iqn.1993-08.org.debian:01:3f5523242179 [33206.980184] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 2100.772 msecs [33209.078643] [ cut here ] [33209.078646] kernel BUG at drivers/target/iscsi/iscsi_target.c:4346! Normally when explicit logout attempt fails, the tx thread context exits and iscsit_close_connection() from rx thread context does the extra cleanup once it detects conn->conn_logout_remove has not been cleared by the logout type specific post handlers. To address this special case, if the logout post handler in tx thread context detects conn->tx_thread_active has already been cleared, simply return and exit in order for existing iscsit_close_connection() logic from rx thread context do failed logout cleanup. Reported-by: Bart Van Assche Cc: Mike Christie Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: sta...@vger.kernel.org # 3.14+ Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 0d8f815..c025451 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -4423,8 +4423,11 @@ static void iscsit_logout_post_handler_closesession( * always sleep waiting for RX/TX thread shutdown to complete * within iscsit_close_connection(). */ - if (!conn->conn_transport->rdma_shutdown) + if (!conn->conn_transport->rdma_shutdown) { sleep = cmpxchg(&conn->tx_thread_active, true, false); + if (!sleep) + return; + } atomic_set(&conn->conn_logout_remove, 0); complete(&conn->conn_logout_comp); @@ -4440,8 +4443,11 @@ static void iscsit_logout_post_handler_samecid( { int sleep = 1; - if (!conn->conn_transport->rdma_shutdown) + if (!conn->conn_transport->rdma_shutdown) { sleep = cmpxchg(&conn->tx_thread_active, true, false); + if (!sleep) + return; + } atomic_set(&conn->conn_logout_remove, 0); complete(&conn->conn_logout_comp); -- 1.9.1
[PATCH] target: Fix kref->refcount underflow in transport_cmd_finish_abort
From: Nicholas Bellinger This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED when a fabric driver drops it's second reference from below the target_core_tmr.c based callers of transport_cmd_finish_abort(). Recently with the conversion of kref to refcount_t, this bug was manifesting itself as: [705519.601034] refcount_t: underflow; use-after-free. [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs [705539.719111] [ cut here ] [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51 Since the original kref atomic_t based kref_put() didn't check for underflow and only invoked the final callback when zero was reached, this bug did not manifest in practice since all se_cmd memory is using preallocated tags. To address this, go ahead and propigate the existing return from transport_put_cmd() up via transport_cmd_finish_abort(), and change transport_cmd_finish_abort() + core_tmr_handle_tas_abort() callers to only do their local target_put_sess_cmd() if necessary. Reported-by: Bart Van Assche Cc: Mike Christie Cc: Hannes Reinecke Cc: Christoph Hellwig Cc: Himanshu Madhani Cc: Sagi Grimberg Cc: sta...@vger.kernel.org # 3.14+ Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_internal.h | 2 +- drivers/target/target_core_tmr.c | 16 drivers/target/target_core_transport.c | 9 ++--- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h index 9ab7090..0912de7 100644 --- a/drivers/target/target_core_internal.h +++ b/drivers/target/target_core_internal.h @@ -136,7 +136,7 @@ struct se_node_acl *core_tpg_add_initiator_node_acl(struct se_portal_group *tpg, void release_se_kmem_caches(void); u32scsi_get_new_index(scsi_index_t); void transport_subsystem_check_init(void); -void transport_cmd_finish_abort(struct se_cmd *, int); +inttransport_cmd_finish_abort(struct se_cmd *, int); unsigned char *transport_dump_cmd_direction(struct se_cmd *); void transport_dump_dev_state(struct se_device *, char *, int *); void transport_dump_dev_info(struct se_device *, struct se_lun *, diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index dce1e1b..13f47bf 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -75,7 +75,7 @@ void core_tmr_release_req(struct se_tmr_req *tmr) kfree(tmr); } -static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) +static int core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) { unsigned long flags; bool remove = true, send_tas; @@ -91,7 +91,7 @@ static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas) transport_send_task_abort(cmd); } - transport_cmd_finish_abort(cmd, remove); + return transport_cmd_finish_abort(cmd, remove); } static int target_check_cdb_and_preempt(struct list_head *list, @@ -184,8 +184,8 @@ void core_tmr_abort_task( cancel_work_sync(&se_cmd->work); transport_wait_for_tasks(se_cmd); - transport_cmd_finish_abort(se_cmd, true); - target_put_sess_cmd(se_cmd); + if (!transport_cmd_finish_abort(se_cmd, true)) + target_put_sess_cmd(se_cmd); printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for" " ref_tag: %llu\n", ref_tag); @@ -281,8 +281,8 @@ static void core_tmr_drain_tmr_list( cancel_work_sync(&cmd->work); transport_wait_for_tasks(cmd); - transport_cmd_finish_abort(cmd, 1); - target_put_sess_cmd(cmd); + if (!transport_cmd_finish_abort(cmd, 1)) + target_put_sess_cmd(cmd); } } @@ -380,8 +380,8 @@ static void core_tmr_drain_state_list( cancel_work_sync(&cmd->work); transport_wait_for_tasks(cmd); - core_tmr_handle_tas_abort(cmd, tas); - target_put_sess_cmd(cmd); + if (!core_tmr_handle_tas_abort(cmd, tas)) + target_put_sess_cmd(cmd); } } diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 6025935..f1b3a46 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -651,9 +651,10 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd) percpu_ref_put(&lun->lun_ref); } -void transport_cmd_finish_abort(struct se_cmd *cmd, int remove) +int transport_cmd_finish_abort(struct se_cmd *cmd, int remove) { bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF); + int ret = 0; if (cmd->se_cmd_flags & SCF_SE_LUN_CMD) transport_lun_remove_cmd(cmd); @@ -665,9 +666,1
Re: linux-next: upcoming conflict between the target-updates and target-bva trees
On Fri, 2017-06-02 at 13:39 +1000, Stephen Rothwell wrote: > Hi all, > > Last night the tagret-bva tree was rebased on top of the target-updates > tree. Just now, part of the target-updates tree has been rewritten. > So now I expect to get conflict(s) when I merge these trees since the > commits in the target-updates tree are no longer the same as those that > the target-bva tree was rebased on top of. > > Please take a little time to sort out your development process. > Unfortunately, this is going to continue to be a problem once these patches are list reviewed, and included in target-updates. Bart, can you please drop the patches until they are list reviewed so there isn't a constant stream of merge conflicts in linux-next for your un-reviewed code..?
Re: linux-next: manual merge of the target-bva tree with the target-updates tree
On Thu, 2017-06-01 at 14:14 -0700, Bart Van Assche wrote: > On 05/31/17 22:04, Nicholas A. Bellinger wrote: > > Go ahead and get list review on drivers/target/ changes before pushing > > them into linux-next, please. > > > > Btw, I don't care if you queue up one's that do have at least two > > Reviewed-bys into your tree, but everything that doesn't have > > Reviewed-bys or Acked-by should not be going into linux-next. > > It is not your job to rewrite the rules for linux-next. I'm following > the guidelines I received from Stephen in December 2016. You were copied > on the e-mail with guidelines Stephen sent to me. See also > https://www.spinics.net/lists/linux-next/msg38488.html. > > Stephen, if anything would have changed in the meantime that I'm not > aware of please let me know. > The point is you're not sending PULL requests. But like I said earlier, I really don't care if you put patches that have been reviewed into your tree for linux-next before I get a chance to review and pick them up for target-pending. However, you putting random un-reviewed changes is where I have to draw the line, especially considering what happened earlier in year where what you had in linux-next close to the merge window was completely and utterly broken. Would you put un-reviewed block and scsi changes into linux-next..? What would those subsystem maintainers say about that..? Why is drivers/target any different..?
Re: [PATCH] iscsi: Fix a sleep-in-atomic bug
On Fri, 2017-06-02 at 09:13 +0800, Jia-Ju Bai wrote: > On 06/01/2017 02:21 PM, Nicholas A. Bellinger wrote: > > Hi Jia-Ju, > > > > On Wed, 2017-05-31 at 11:26 +0800, Jia-Ju Bai wrote: > >> The driver may sleep under a spin lock, and the function call path is: > >> iscsit_tpg_enable_portal_group (acquire the lock by spin_lock) > >>iscsi_update_param_value > >> kstrdup(GFP_KERNEL) --> may sleep > >> > >> To fix it, the "GFP_KERNEL" is replaced with "GFP_ATOMIC". > >> > >> Signed-off-by: Jia-Ju Bai > >> --- > >> drivers/target/iscsi/iscsi_target_parameters.c |2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > Btw, the use of tpg->tpg_state_lock in iscsit_tpg_enable_portal_group() > > while checking existing state and calling iscsi_update_param_value() is > > not necessary, since lio_target_tpg_enable_store() is already holding > > iscsit_get_tpg() -> tpg->tpg_access_lock. > > > > How about the following instead to only take tpg->tpg_state_lock when > > updating tpg->tpg_state instead..? > > > > diff --git a/drivers/target/iscsi/iscsi_target_tpg.c > > b/drivers/target/iscsi/iscsi_target_tpg.c > > index 2e7e08d..abaabba 100644 > > --- a/drivers/target/iscsi/iscsi_target_tpg.c > > +++ b/drivers/target/iscsi/iscsi_target_tpg.c > > @@ -311,11 +311,9 @@ int iscsit_tpg_enable_portal_group(struct > > iscsi_portal_group *tpg) > > struct iscsi_tiqn *tiqn = tpg->tpg_tiqn; > > int ret; > > > > - spin_lock(&tpg->tpg_state_lock); > > if (tpg->tpg_state == TPG_STATE_ACTIVE) { > > pr_err("iSCSI target portal group: %hu is already" > > " active, ignoring request.\n", tpg->tpgt); > > - spin_unlock(&tpg->tpg_state_lock); > > return -EINVAL; > > } > > /* > > @@ -324,10 +322,8 @@ int iscsit_tpg_enable_portal_group(struct > > iscsi_portal_group *tpg) > > * is enforced (as per default), and remove the NONE option. > > */ > > param = iscsi_find_param_from_key(AUTHMETHOD, tpg->param_list); > > - if (!param) { > > - spin_unlock(&tpg->tpg_state_lock); > > + if (!param) > > return -EINVAL; > > - } > > > > if (tpg->tpg_attrib.authentication) { > > if (!strcmp(param->value, NONE)) { > > @@ -341,6 +337,7 @@ int iscsit_tpg_enable_portal_group(struct > > iscsi_portal_group *tpg) > > goto err; > > } > > > > + spin_lock(&tpg->tpg_state_lock); > > tpg->tpg_state = TPG_STATE_ACTIVE; > > spin_unlock(&tpg->tpg_state_lock); > > > > @@ -353,7 +350,6 @@ int iscsit_tpg_enable_portal_group(struct > > iscsi_portal_group *tpg) > > return 0; > > > > err: > > - spin_unlock(&tpg->tpg_state_lock); > > return ret; > > } > > > I think it is fine to me. > > Thanks, > Jia-Ju Bai Applied with your Reported-by and Reviewed-by. Thanks!
[PATCH-v2] target: Avoid target_shutdown_sessions loop during queue_depth change
From: Nicholas Bellinger When target_shutdown_sessions() is invoked to shutdown all active sessions associated with a se_node_acl when se_node_acl->queue_depth is changed via core_tpg_set_initiator_node_queue_depth(), it's possible that new connections reconnect immediately after explicit shutdown occurs via target_shutdown_sessions(). Which means it's possible for the newly reconnected session with the proper queue_depth can be shutdown multiple times when target_shutdown_sessions() loops to drain all active sessions for all cases. This was regression was introduced by: commit bc6e6bb470eda42f44bcac96c261cff1216577b3 Author: Christoph Hellwig Date: Mon May 2 15:45:19 2016 +0200 target: consolidate and fix session shutdown To avoid this case, instead change target_shutdown_sessions() to pass 'do_restart' and avoid the looping drain of sessions when invoked via core_tpg_set_initiator_node_queue_depth(), but still loop during normal se_node_acl delete until all associated sessions have been shutdown. (v2 - go back to the original version instead of a local list, in order to protect list_del_init(&sess->sess_acl_list) from transport_deregister_session_configfs. Also use safe list walking in target_shutdown_sessions - nab) Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_tpg.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index 3691373..1b2b60e 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -336,14 +336,14 @@ struct se_node_acl *core_tpg_add_initiator_node_acl( return acl; } -static void target_shutdown_sessions(struct se_node_acl *acl) +static void target_shutdown_sessions(struct se_node_acl *acl, bool do_restart) { - struct se_session *sess; + struct se_session *sess, *sess_tmp; unsigned long flags; restart: spin_lock_irqsave(&acl->nacl_sess_lock, flags); - list_for_each_entry(sess, &acl->acl_sess_list, sess_acl_list) { + list_for_each_entry_safe(sess, sess_tmp, &acl->acl_sess_list, sess_acl_list) { if (sess->sess_tearing_down) continue; @@ -352,7 +352,11 @@ static void target_shutdown_sessions(struct se_node_acl *acl) if (acl->se_tpg->se_tpg_tfo->close_session) acl->se_tpg->se_tpg_tfo->close_session(sess); - goto restart; + + if (do_restart) + goto restart; + + spin_lock_irqsave(&acl->nacl_sess_lock, flags); } spin_unlock_irqrestore(&acl->nacl_sess_lock, flags); } @@ -367,7 +371,7 @@ void core_tpg_del_initiator_node_acl(struct se_node_acl *acl) list_del(&acl->acl_list); mutex_unlock(&tpg->acl_node_mutex); - target_shutdown_sessions(acl); + target_shutdown_sessions(acl, true); target_put_nacl(acl); /* @@ -414,7 +418,7 @@ int core_tpg_set_initiator_node_queue_depth( /* * Shutdown all pending sessions to force session reinstatement. */ - target_shutdown_sessions(acl); + target_shutdown_sessions(acl, false); pr_debug("Successfully changed queue depth to: %d for Initiator" " Node: %s on %s Target Portal Group: %u\n", acl->queue_depth, -- 1.9.1
[PATCH 2/2] target/configfs: Kill se_lun->lun_link_magic
From: Nicholas Bellinger Instead of using a hardcoded magic value in se_lun when verifying a target config_item symlink source during target_fabric_mappedlun_link(), go ahead and use target_fabric_port_item_ops directly instead. Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_fabric_configfs.c | 13 - drivers/target/target_core_tpg.c | 1 - include/target/target_core_base.h| 2 -- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c index 2cbaecd..e9e917c 100644 --- a/drivers/target/target_core_fabric_configfs.c +++ b/drivers/target/target_core_fabric_configfs.c @@ -65,6 +65,8 @@ pr_debug("Setup generic %s\n", __stringify(_name)); \ } +static struct configfs_item_operations target_fabric_port_item_ops; + /* Start of tfc_tpg_mappedlun_cit */ static int target_fabric_mappedlun_link( @@ -72,19 +74,20 @@ static int target_fabric_mappedlun_link( struct config_item *lun_ci) { struct se_dev_entry *deve; - struct se_lun *lun = container_of(to_config_group(lun_ci), - struct se_lun, lun_group); + struct se_lun *lun; struct se_lun_acl *lacl = container_of(to_config_group(lun_acl_ci), struct se_lun_acl, se_lun_group); struct se_portal_group *se_tpg; struct config_item *nacl_ci, *tpg_ci, *tpg_ci_s, *wwn_ci, *wwn_ci_s; bool lun_access_ro; - if (lun->lun_link_magic != SE_LUN_LINK_MAGIC) { - pr_err("Bad lun->lun_link_magic, not a valid lun_ci pointer:" - " %p to struct lun: %p\n", lun_ci, lun); + if (!lun_ci->ci_type || + lun_ci->ci_type->ct_item_ops != &target_fabric_port_item_ops) { + pr_err("Bad lun_ci, not a valid lun_ci pointer: %p\n", lun_ci); return -EFAULT; } + lun = container_of(to_config_group(lun_ci), struct se_lun, lun_group); + /* * Ensure that the source port exists */ diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index 74893ac..6b20b9f 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -584,7 +584,6 @@ struct se_lun *core_tpg_alloc_lun( return ERR_PTR(-ENOMEM); } lun->unpacked_lun = unpacked_lun; - lun->lun_link_magic = SE_LUN_LINK_MAGIC; atomic_set(&lun->lun_acl_count, 0); init_completion(&lun->lun_ref_comp); init_completion(&lun->lun_shutdown_comp); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index c3c14d0..47d9f38 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -701,8 +701,6 @@ struct scsi_port_stats { struct se_lun { u64 unpacked_lun; -#define SE_LUN_LINK_MAGIC 0x7771 - u32 lun_link_magic; boollun_shutdown; boollun_access_ro; u32 lun_index; -- 1.9.1
[PATCH 1/2] target/configfs: Kill se_device->dev_link_magic
From: Nicholas Bellinger Instead of using a hardcoded magic value in se_device when verifying a target config_item symlink source during target_fabric_port_link(), go ahead and use target_core_dev_item_ops directly instead. Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_configfs.c| 6 +- drivers/target/target_core_device.c | 1 - drivers/target/target_core_fabric_configfs.c | 12 +++- include/target/target_core_base.h| 2 -- 4 files changed, 12 insertions(+), 9 deletions(-) diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c index 0326607..9b8abd5 100644 --- a/drivers/target/target_core_configfs.c +++ b/drivers/target/target_core_configfs.c @@ -2236,7 +2236,11 @@ static void target_core_dev_release(struct config_item *item) target_free_device(dev); } -static struct configfs_item_operations target_core_dev_item_ops = { +/* + * Used in target_core_fabric_configfs.c to verify valid se_device symlink + * within target_fabric_port_link() + */ +struct configfs_item_operations target_core_dev_item_ops = { .release= target_core_dev_release, }; diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index a5762e6..e4771ce 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -756,7 +756,6 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name) if (!dev) return NULL; - dev->dev_link_magic = SE_DEV_LINK_MAGIC; dev->se_hba = hba; dev->transport = hba->backend->ops; dev->prot_length = sizeof(struct t10_pi_tuple); diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c index d1e6cab..2cbaecd 100644 --- a/drivers/target/target_core_fabric_configfs.c +++ b/drivers/target/target_core_fabric_configfs.c @@ -620,6 +620,8 @@ static ssize_t target_fabric_port_alua_tg_pt_write_md_store( NULL, }; +extern struct configfs_item_operations target_core_dev_item_ops; + static int target_fabric_port_link( struct config_item *lun_ci, struct config_item *se_dev_ci) @@ -628,16 +630,16 @@ static int target_fabric_port_link( struct se_lun *lun = container_of(to_config_group(lun_ci), struct se_lun, lun_group); struct se_portal_group *se_tpg; - struct se_device *dev = - container_of(to_config_group(se_dev_ci), struct se_device, dev_group); + struct se_device *dev; struct target_fabric_configfs *tf; int ret; - if (dev->dev_link_magic != SE_DEV_LINK_MAGIC) { - pr_err("Bad dev->dev_link_magic, not a valid se_dev_ci pointer:" - " %p to struct se_device: %p\n", se_dev_ci, dev); + if (!se_dev_ci->ci_type || + se_dev_ci->ci_type->ct_item_ops != &target_core_dev_item_ops) { + pr_err("Bad se_dev_ci, not a valid se_dev_ci pointer: %p\n", se_dev_ci); return -EFAULT; } + dev = container_of(to_config_group(se_dev_ci), struct se_device, dev_group); if (!(dev->dev_flags & DF_CONFIGURED)) { pr_err("se_device not configured yet, cannot port link\n"); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 0c1dce2..c3c14d0 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -746,8 +746,6 @@ struct se_dev_stat_grps { }; struct se_device { -#define SE_DEV_LINK_MAGIC 0xfeeddeef - u32 dev_link_magic; /* RELATIVE TARGET PORT IDENTIFER Counter */ u16 dev_rpti_counter; /* Used for SAM Task Attribute ordering */ -- 1.9.1
Re: [PATCH] target: Avoid target_shutdown_sessions loop during queue_depth change
On Thu, 2017-06-01 at 08:57 +0200, Christoph Hellwig wrote: > How about this slightly easier to read version? Fine by me. Applied.
[PATCH-v2] target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout
From: Nicholas Bellinger The people who are actively using iblock_execute_write_same_direct() are doing so in the context of ESX VAAI BlockZero, together with EXTENDED_COPY and COMPARE_AND_WRITE primitives. In practice though I've not seen any users of IBLOCK WRITE_SAME for anything other than VAAI BlockZero, so just using blkdev_issue_zeroout() when available, and falling back to iblock_execute_write_same() if the WRITE_SAME buffer contains anything other than zeros should be OK. (Hook up max_write_zeroes_sectors to signal LBPRZ feature bit in target_configure_unmap_from_queue - nab) Signed-off-by: Christoph Hellwig Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Cc: Martin K. Petersen Cc: Jens Axboe Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_device.c | 2 +- drivers/target/target_core_iblock.c | 44 +++-- 2 files changed, 28 insertions(+), 18 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 8add07f38..a5762e6 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -851,7 +851,7 @@ bool target_configure_unmap_from_queue(struct se_dev_attrib *attrib, attrib->unmap_granularity = q->limits.discard_granularity / block_size; attrib->unmap_granularity_alignment = q->limits.discard_alignment / block_size; - attrib->unmap_zeroes_data = 0; + attrib->unmap_zeroes_data = (q->limits.max_write_zeroes_sectors); return true; } EXPORT_SYMBOL(target_configure_unmap_from_queue); diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c index bb069eb..b204413 100644 --- a/drivers/target/target_core_iblock.c +++ b/drivers/target/target_core_iblock.c @@ -86,6 +86,7 @@ static int iblock_configure_device(struct se_device *dev) struct block_device *bd = NULL; struct blk_integrity *bi; fmode_t mode; + unsigned int max_write_zeroes_sectors; int ret = -ENOMEM; if (!(ib_dev->ibd_flags & IBDF_HAS_UDEV_PATH)) { @@ -129,7 +130,11 @@ static int iblock_configure_device(struct se_device *dev) * Enable write same emulation for IBLOCK and use 0x as * the smaller WRITE_SAME(10) only has a two-byte block count. */ - dev->dev_attrib.max_write_same_len = 0x; + max_write_zeroes_sectors = bdev_write_zeroes_sectors(bd); + if (max_write_zeroes_sectors) + dev->dev_attrib.max_write_same_len = max_write_zeroes_sectors; + else + dev->dev_attrib.max_write_same_len = 0x; if (blk_queue_nonrot(q)) dev->dev_attrib.is_nonrot = 1; @@ -415,28 +420,31 @@ static void iblock_end_io_flush(struct bio *bio) } static sense_reason_t -iblock_execute_write_same_direct(struct block_device *bdev, struct se_cmd *cmd) +iblock_execute_zero_out(struct block_device *bdev, struct se_cmd *cmd) { struct se_device *dev = cmd->se_dev; struct scatterlist *sg = &cmd->t_data_sg[0]; - struct page *page = NULL; - int ret; + unsigned char *buf, zero = 0x00, *p = &zero; + int rc, ret; - if (sg->offset) { - page = alloc_page(GFP_KERNEL); - if (!page) - return TCM_OUT_OF_RESOURCES; - sg_copy_to_buffer(sg, cmd->t_data_nents, page_address(page), - dev->dev_attrib.block_size); - } + buf = kmap(sg_page(sg)) + sg->offset; + if (!buf) + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; + /* +* Fall back to block_execute_write_same() slow-path if +* incoming WRITE_SAME payload does not contain zeros. +*/ + rc = memcmp(buf, p, cmd->data_length); + kunmap(sg_page(sg)); + + if (rc) + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; - ret = blkdev_issue_write_same(bdev, + ret = blkdev_issue_zeroout(bdev, target_to_linux_sector(dev, cmd->t_task_lba), target_to_linux_sector(dev, sbc_get_write_same_sectors(cmd)), - GFP_KERNEL, page ? page : sg_page(sg)); - if (page) - __free_page(page); + GFP_KERNEL, false); if (ret) return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; @@ -472,8 +480,10 @@ static void iblock_end_io_flush(struct bio *bio) return TCM_INVALID_CDB_FIELD; } - if (bdev_write_same(bdev)) - return iblock_execute_write_same_direct(bdev, cmd); + if (bdev_write_zeroes_sectors(bdev)) { + if (!iblock_execute_zero_out(bdev, cmd)) + return 0; + } ibr = kzalloc(sizeof(struct iblock_req),
[PATCH] target: Avoid target_shutdown_sessions loop during queue_depth change
From: Nicholas Bellinger When target_shutdown_sessions() is invoked to shutdown all active sessions associated with a se_node_acl when se_node_acl->queue_depth is changed via core_tpg_set_initiator_node_queue_depth(), it's possible that new connections reconnect immediately after explicit shutdown occurs via target_shutdown_sessions(). Which means it's possible for the newly reconnected session with the proper queue_depth can be shutdown multiple times when target_shutdown_sessions() loops to drain all active sessions for all cases. This was regression was introduced by: commit bc6e6bb470eda42f44bcac96c261cff1216577b3 Author: Christoph Hellwig Date: Mon May 2 15:45:19 2016 +0200 target: consolidate and fix session shutdown To avoid this case, instead move sessions into a local list and avoid draining the same session multiple times when invoked via core_tpg_set_initiator_node_queue_depth(), but still loop during normal se_node_acl delete until all associated sessions have been shutdown. Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_tpg.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index 310d9e5..7a7c2b5 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -336,10 +336,11 @@ struct se_node_acl *core_tpg_add_initiator_node_acl( return acl; } -static void target_shutdown_sessions(struct se_node_acl *acl) +static void target_shutdown_sessions(struct se_node_acl *acl, bool do_restart) { struct se_session *sess; unsigned long flags; + LIST_HEAD(tmp_list); restart: spin_lock_irqsave(&acl->nacl_sess_lock, flags); @@ -347,14 +348,22 @@ static void target_shutdown_sessions(struct se_node_acl *acl) if (sess->sess_tearing_down) continue; + list_move_tail(&sess->sess_acl_list, &tmp_list); + } + spin_unlock_irqrestore(&acl->nacl_sess_lock, flags); + + if (list_empty(&tmp_list)) + return; + + list_for_each_entry(sess, &tmp_list, sess_acl_list) { list_del_init(&sess->sess_acl_list); - spin_unlock_irqrestore(&acl->nacl_sess_lock, flags); if (acl->se_tpg->se_tpg_tfo->close_session) acl->se_tpg->se_tpg_tfo->close_session(sess); - goto restart; } - spin_unlock_irqrestore(&acl->nacl_sess_lock, flags); + + if (do_restart) + goto restart; } void core_tpg_del_initiator_node_acl(struct se_node_acl *acl) @@ -367,7 +376,7 @@ void core_tpg_del_initiator_node_acl(struct se_node_acl *acl) list_del(&acl->acl_list); mutex_unlock(&tpg->acl_node_mutex); - target_shutdown_sessions(acl); + target_shutdown_sessions(acl, true); target_put_nacl(acl); /* @@ -414,7 +423,7 @@ int core_tpg_set_initiator_node_queue_depth( /* * Shutdown all pending sessions to force session reinstatement. */ - target_shutdown_sessions(acl); + target_shutdown_sessions(acl, false); pr_debug("Successfully changed queue depth to: %d for Initiator" " Node: %s on %s Target Portal Group: %u\n", acl->queue_depth, -- 1.9.1
Re: [PATCH] iscsi: Fix a sleep-in-atomic bug
Hi Jia-Ju, On Wed, 2017-05-31 at 11:26 +0800, Jia-Ju Bai wrote: > The driver may sleep under a spin lock, and the function call path is: > iscsit_tpg_enable_portal_group (acquire the lock by spin_lock) > iscsi_update_param_value > kstrdup(GFP_KERNEL) --> may sleep > > To fix it, the "GFP_KERNEL" is replaced with "GFP_ATOMIC". > > Signed-off-by: Jia-Ju Bai > --- > drivers/target/iscsi/iscsi_target_parameters.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Btw, the use of tpg->tpg_state_lock in iscsit_tpg_enable_portal_group() while checking existing state and calling iscsi_update_param_value() is not necessary, since lio_target_tpg_enable_store() is already holding iscsit_get_tpg() -> tpg->tpg_access_lock. How about the following instead to only take tpg->tpg_state_lock when updating tpg->tpg_state instead..? diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c index 2e7e08d..abaabba 100644 --- a/drivers/target/iscsi/iscsi_target_tpg.c +++ b/drivers/target/iscsi/iscsi_target_tpg.c @@ -311,11 +311,9 @@ int iscsit_tpg_enable_portal_group(struct iscsi_portal_group *tpg) struct iscsi_tiqn *tiqn = tpg->tpg_tiqn; int ret; - spin_lock(&tpg->tpg_state_lock); if (tpg->tpg_state == TPG_STATE_ACTIVE) { pr_err("iSCSI target portal group: %hu is already" " active, ignoring request.\n", tpg->tpgt); - spin_unlock(&tpg->tpg_state_lock); return -EINVAL; } /* @@ -324,10 +322,8 @@ int iscsit_tpg_enable_portal_group(struct iscsi_portal_group *tpg) * is enforced (as per default), and remove the NONE option. */ param = iscsi_find_param_from_key(AUTHMETHOD, tpg->param_list); - if (!param) { - spin_unlock(&tpg->tpg_state_lock); + if (!param) return -EINVAL; - } if (tpg->tpg_attrib.authentication) { if (!strcmp(param->value, NONE)) { @@ -341,6 +337,7 @@ int iscsit_tpg_enable_portal_group(struct iscsi_portal_group *tpg) goto err; } + spin_lock(&tpg->tpg_state_lock); tpg->tpg_state = TPG_STATE_ACTIVE; spin_unlock(&tpg->tpg_state_lock); @@ -353,7 +350,6 @@ int iscsit_tpg_enable_portal_group(struct iscsi_portal_group *tpg) return 0; err: - spin_unlock(&tpg->tpg_state_lock); return ret; }
Re: [PATCH] tcmu: Add fifo type waiter list support to avoid starvation
Hey MNC, Any comments on this..? It's been sitting on the list for a while now.. ;) On Fri, 2017-05-05 at 10:51 +0800, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li > > The fifo type waiter list will hold the udevs who are waiting for the > blocks from the data global pool. The unmap thread will try to feed the > first udevs in waiter list, if the global free blocks available are > not enough, it will stop traversing the list and abort waking up the > others. > > Signed-off-by: Xiubo Li > --- > drivers/target/target_core_user.c | 82 > +++ > 1 file changed, 66 insertions(+), 16 deletions(-) > > diff --git a/drivers/target/target_core_user.c > b/drivers/target/target_core_user.c > index 9045837..10c99e7 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -97,6 +97,7 @@ struct tcmu_hba { > > struct tcmu_dev { > struct list_head node; > + struct list_head waiter; > > struct se_device se_dev; > > @@ -123,7 +124,7 @@ struct tcmu_dev { > wait_queue_head_t wait_cmdr; > struct mutex cmdr_lock; > > - bool waiting_global; > + uint32_t waiting_blocks; > uint32_t dbi_max; > uint32_t dbi_thresh; > DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS); > @@ -165,6 +166,10 @@ struct tcmu_cmd { > static DEFINE_MUTEX(root_udev_mutex); > static LIST_HEAD(root_udev); > > +/* The data blocks global pool waiter list */ > +static DEFINE_MUTEX(root_udev_waiter_mutex); > +static LIST_HEAD(root_udev_waiter); > + > static atomic_t global_db_count = ATOMIC_INIT(0); > > static struct kmem_cache *tcmu_cmd_cache; > @@ -195,6 +200,11 @@ enum tcmu_multicast_groups { > #define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index)) > #define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++]) > > +static inline bool is_in_waiter_list(struct tcmu_dev *udev) > +{ > + return !!udev->waiting_blocks; > +} > + > static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len) > { > struct tcmu_dev *udev = tcmu_cmd->tcmu_dev; > @@ -250,8 +260,6 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev, > { > int i; > > - udev->waiting_global = false; > - > for (i = tcmu_cmd->dbi_cur; i < tcmu_cmd->dbi_cnt; i++) { > if (!tcmu_get_empty_block(udev, tcmu_cmd)) > goto err; > @@ -259,9 +267,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev, > return true; > > err: > - udev->waiting_global = true; > - /* Try to wake up the unmap thread */ > - wake_up(&unmap_wait); > + udev->waiting_blocks += tcmu_cmd->dbi_cnt - i; > return false; > } > > @@ -671,6 +677,7 @@ static inline size_t tcmu_cmd_get_cmd_size(struct > tcmu_cmd *tcmu_cmd, > > while (!is_ring_space_avail(udev, tcmu_cmd, command_size, data_length)) > { > int ret; > + bool is_waiting_blocks = !!udev->waiting_blocks; > DEFINE_WAIT(__wait); > > prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE); > @@ -688,6 +695,18 @@ static inline size_t tcmu_cmd_get_cmd_size(struct > tcmu_cmd *tcmu_cmd, > return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; > } > > + /* > + * Try to insert the current udev to waiter list and > + * then wake up the unmap thread > + */ > + if (is_waiting_blocks) { > + mutex_lock(&root_udev_waiter_mutex); > + list_add_tail(&udev->waiter, &root_udev_waiter); > + mutex_unlock(&root_udev_waiter_mutex); > + > + wake_up(&unmap_wait); > + } > + > mutex_lock(&udev->cmdr_lock); > > /* We dropped cmdr_lock, cmd_head is stale */ > @@ -902,8 +921,6 @@ static unsigned int tcmu_handle_completions(struct > tcmu_dev *udev) > if (mb->cmd_tail == mb->cmd_head) > del_timer(&udev->timeout); /* no more pending cmds */ > > - wake_up(&udev->wait_cmdr); > - > return handled; > } > > @@ -996,7 +1013,17 @@ static int tcmu_irqcontrol(struct uio_info *info, s32 > irq_on) > struct tcmu_dev *tcmu_dev = container_of(info, struct tcmu_dev, > uio_info); > > mutex_lock(&tcmu_dev->cmdr_lock); > - tcmu_handle_completions(tcmu_dev); > + /* > + * If the current udev is also in waiter list, this will > + * make sure that the other waiters in list be feeded ahead > + * of it. > + */ > + if (is_in_waiter_list(tcmu_dev)) { > + wake_up(&unmap_wait); > + } else { > + tcmu_handle_completions(tcmu_dev); > + wake_up(&tcmu_dev->wait_cmdr); > + } > mutex_unlock(&tcmu_dev->cmdr_lock); > > return 0; > @@ -1231,7 +1258,7 @@ static int tcmu_configure_device(struct se_device *dev) > udev->data_off = CMDR_SI
Re: linux-next: manual merge of the target-bva tree with the target-updates tree
On Thu, 2017-06-01 at 05:05 +, Bart Van Assche wrote: > On Wed, 2017-05-31 at 21:27 -0700, Nicholas A. Bellinger wrote: > > but a weeks worth of list silence for your series doesn't mean > > you're free to push un-reviewed stuff for drivers/target/ into > > linux-next. > > I think this is an example of the pot calling the kettle black. > Your patch "target: Re-add check to reject control WRITEs with > overflow data" has not been reviewed by anyone but was pushed > into linux-next and sent to Linus anyway. Heh, it fixed a regression you yourself pointed out. :) If your going to report a bug and not review the patch to address the regression, I'm not going to let that regression slide to restore existing behavior, just because you didn't bother to review the patch in three plus weeks for the bug you reported. Anyways, I'll get to your patches, but please get reviews on the list by sending series that people want to review, instead of large unwieldy series that intermix new features and random bug-fixes without any context. No wonder why people don't send time reviewing them!
Re: linux-next: manual merge of the target-bva tree with the target-updates tree
On Thu, 2017-06-01 at 04:59 +, Bart Van Assche wrote: > On Thu, 2017-06-01 at 14:10 +1000, Stephen Rothwell wrote: > > Hi Bart, > > > > Today's linux-next merge of the target-bva tree got a conflict in: > > > > drivers/target/target_core_transport.c > > > > between commit: > > > > 4ff83daa0200 ("target: Re-add check to reject control WRITEs with > > overflow data") > > > > from the target-updates tree and commit: > > > > 2c0df665 ("target: Fix overflow/underflow handling of commands with a > > Data-Out buffer") > > > > from the target-bva tree. > > > > I fixed it up (I think (guidance appreciated), see below) and can > > carry the fix as necessary. This is now fixed as far as linux-next is > > concerned, but any non trivial conflicts should be mentioned to your > > upstream maintainer when your tree is submitted for merging. You may > > also want to consider cooperating with the maintainer of the conflicting > > tree to minimise any particularly complex conflicts. > > Hello Stephen, > > Thanks for having fixed this up. I hadn't noticed that Nic had queued up > patches > that conflict with my patches. I will rebase my tree. > Go ahead and get list review on drivers/target/ changes before pushing them into linux-next, please. Btw, I don't care if you queue up one's that do have at least two Reviewed-bys into your tree, but everything that doesn't have Reviewed-bys or Acked-by should not be going into linux-next.
Re: linux-next: manual merge of the target-bva tree with the target-updates tree
On Thu, 2017-06-01 at 14:10 +1000, Stephen Rothwell wrote: > Hi Bart, > > Today's linux-next merge of the target-bva tree got a conflict in: > > drivers/target/target_core_transport.c > > between commit: > > 4ff83daa0200 ("target: Re-add check to reject control WRITEs with overflow > data") > > from the target-updates tree and commit: > > 2c0df665 ("target: Fix overflow/underflow handling of commands with a > Data-Out buffer") > > from the target-bva tree. > > I fixed it up (I think (guidance appreciated), see below) and can > carry the fix as necessary. This is now fixed as far as linux-next is > concerned, but any non trivial conflicts should be mentioned to your > upstream maintainer when your tree is submitted for merging. You may > also want to consider cooperating with the maintainer of the conflicting > tree to minimise any particularly complex conflicts. > Hi Bart, The majority of this series hasn't been list reviewed, and shouldn't be getting pushed in linux-next without some form of reviews. I'll be getting back to v4.13 items now v4.12-rc fixes is out of the way, but a weeks worth of list silence for your series doesn't mean you're free to push un-reviewed stuff for drivers/target/ into linux-next.
[GIT PULL] target fixes for v4.12-rc4
Hello Linus, Here are the target-pending fixes for v4.12-rc4. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master This includes: - ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge window. (Bryant Ly and Michael Cyr) - Re-add a target-core check enforcing WRITE overflow reject that was relaxed in v4.3, to avoid unsupported iscsi-target immediate data overflow. (nab) - Fix a target-core-user OOPs during device removal. (MNC + Bryant Ly) - Fix a long standing iscsi-target potential issue where kthread exit did not wait for kthread_should_stop(). (Jiang Yi) - Fix a iscsi-target v3.12.y regression OOPs involving initial login PDU processing during asynchronous TCP connection close. (MNC + nab) This is a little larger than usual for an -rc4, primarily due to the iscsi-target v3.12.y regression OOPs bug-fix. However, it's an important patch as MNC + Hannes where both able to trigger it using a reduced iscsi initiator login timeout combined with a backend taking a long time to complete I/Os during iscsi login driven session reinstatement. Note the trailing two patches have a recent CommitDate, as the second to last was updated today to include MNC's tested-by and reviewed-by tags. The patches themselves has not changed though, and both have been run through manual and automated regression testing from multiple parties over the last 7 days. Thank you, --nab Bryant G. Ly (2): ibmvscsis: Clear left-over abort_cmd pointers ibmvscsis: Fix the incorrect req_lim_delta Jiang Yi (1): iscsi-target: Always wait for kthread_should_stop() before kthread exit Mike Christie (1): tcmu: fix crash during device removal Nicholas Bellinger (2): target: Re-add check to reject control WRITEs with overflow data iscsi-target: Fix initial login PDU asynchronous socket close OOPs drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 27 - drivers/target/iscsi/iscsi_target.c | 30 - drivers/target/iscsi/iscsi_target_erl0.c | 6 +- drivers/target/iscsi/iscsi_target_erl0.h | 2 +- drivers/target/iscsi/iscsi_target_login.c | 4 + drivers/target/iscsi/iscsi_target_nego.c | 194 -- drivers/target/target_core_transport.c| 23 +++- drivers/target/target_core_user.c | 46 +-- include/target/iscsi/iscsi_target_core.h | 1 + 9 files changed, 241 insertions(+), 92 deletions(-)
Re: [PATCH] iscsi-target: Fix initial login PDU asynchronous socket close OOPs
On Wed, 2017-05-31 at 15:28 -0500, Mike Christie wrote: > On 05/30/2017 11:58 PM, Nicholas A. Bellinger wrote: > > Hey MNC, > > > > On Fri, 2017-05-26 at 22:14 -0500, Mike Christie wrote: > >> Thanks for the patch. > >> The patch fixes the crash for me. However, is there a possible > >> regression where if the initiator attempts new relogins we could run out > >> of memory? With the old code, we would free the login attempts resources > >> at this time, but with the new code the initiator will send more login > >> attempts and so we just keep allocating more memory for each attempt > >> until we run out or the login is finally able to complete. > > > > AFAICT, no. For the two cases in question: > > > > - Initial login request PDU processing done within iscsi_np kthread > > context in iscsi_target_start_negotiation(), and > > - subsequent login request PDU processing done by delayed work-queue > > kthread context in iscsi_target_do_login_rx() > > > > this patch doesn't change how aggressively connection cleanup happens > > for failed login attempts in the face of new connection login attempts > > for either case. > > > > For the first case when iscsi_np process context invokes > > iscsi_target_start_negotiation() -> iscsi_target_do_login() -> > > iscsi_check_for_session_reinstatement() to wait for backend I/O to > > complete, it still blocks other new connections from being accepted on > > the specific iscsi_np process context. > > > > This patch doesn't change this behavior. > > > > What it does change is when the host closes the connection and > > iscsi_target_sk_state_change() gets invoked, iscsi_np process context > > waits for iscsi_check_for_session_reinstatement() to complete before > > releasing the connection resources. > > > > However since iscsi_np process context is blocked, new connections won't > > be accepted until the new connection forcing session reinstatement > > finishes waiting for outstanding backend I/O to complete. > > I was seeing this. My original mail asked about iscsi login resources > incorrectly, but like you said we do not get that far. I get a giant > backlog (1 connection request per 5 seconds that we waited) of tcp level > connection requests and drops. When the wait is done I get a flood of > "iSCSI Login negotiation failed" due to the target handling all those > now stale requests/drops. Ah, I see what you mean. The TCP backlog = 256 default can fill up when a small host side login timeout is used while iscsi_np is blocked waiting for session reinstatement to complete. > > If we do not care about the memory use at the network level for this > case (it seems like a little and reconnects are not aggressive), then > patch works ok for me. I am guessing it gets nasty to handle, so maybe > not worth it to handle right now? Yeah, since it's a issue separate from root cause here, getting this merged first makes sense. > I tried to do it in my patch which is why it got all crazy with the > waits/wakeups :) > One option to consider is to immediately queue into delayed work-queue context from iscsi_target_start_negotiation() instead of doing the iscsi_target_do_login() and session reinstatement from iscsi_np context. Just taking a quick look, this seems like it would be a pretty straight-forward change.. > Thanks, and you can add a tested-by or reviewed-by from me. Great, thanks MNC. Will send out a PULL request for -rc4 shortly.
Re: [PATCH] iscsi-target: Fix initial login PDU asynchronous socket close OOPs
Hey MNC, On Fri, 2017-05-26 at 22:14 -0500, Mike Christie wrote: > Thanks for the patch. > Btw, after running DATERA's internal longevity and scale tests across ~20 racks on v4.1.y with this patch over the long weekend, there haven't been any additional regressions. > On 05/26/2017 12:32 AM, Nicholas A. Bellinger wrote: > > > > - state = iscsi_target_sk_state_check(sk); > > - write_unlock_bh(&sk->sk_callback_lock); > > - > > - pr_debug("iscsi_target_sk_state_change: state: %d\n", state); > > + orig_state_change(sk); > > > > - if (!state) { > > - pr_debug("iscsi_target_sk_state_change got failed state\n"); > > - schedule_delayed_work(&conn->login_cleanup_work, 0); > > I think login_cleanup_work is no longer used so you can also remove it > and its code. Yep, since this needs to goto stable, I left that part out for now.. Will take care of that post -rc4. > > The patch fixes the crash for me. However, is there a possible > regression where if the initiator attempts new relogins we could run out > of memory? With the old code, we would free the login attempts resources > at this time, but with the new code the initiator will send more login > attempts and so we just keep allocating more memory for each attempt > until we run out or the login is finally able to complete. AFAICT, no. For the two cases in question: - Initial login request PDU processing done within iscsi_np kthread context in iscsi_target_start_negotiation(), and - subsequent login request PDU processing done by delayed work-queue kthread context in iscsi_target_do_login_rx() this patch doesn't change how aggressively connection cleanup happens for failed login attempts in the face of new connection login attempts for either case. For the first case when iscsi_np process context invokes iscsi_target_start_negotiation() -> iscsi_target_do_login() -> iscsi_check_for_session_reinstatement() to wait for backend I/O to complete, it still blocks other new connections from being accepted on the specific iscsi_np process context. This patch doesn't change this behavior. What it does change is when the host closes the connection and iscsi_target_sk_state_change() gets invoked, iscsi_np process context waits for iscsi_check_for_session_reinstatement() to complete before releasing the connection resources. However since iscsi_np process context is blocked, new connections won't be accepted until the new connection forcing session reinstatement finishes waiting for outstanding backend I/O to complete. For the second case of subsequent non initial login request PDUs handled within delayed work-queue process context, AFAICT this patch doesn't change the original behavior either.. Namely when iscsi_target_do_login_rx() is active and host closes the connection causing iscsi_target_sk_state_change() to be invoked, it still checks for LOGIN_FLAGS_READ_ACTIVE and doesn't queue shutdown to occur. As per the original logic preceding this change, it continues to wait for iscsi_target_do_login_rx() to complete in delayed work-queue context, unless sock_recvmsg() returns a failure (which it should once TCP_CLOSE occurs) or times out via iscsi_target_login_timeout(). Once the failure is detected from iscsi_target_do_login_rx(), the remaining connection resources are related from there. That said, was there another case you had in mind..?
[PATCH] iscsi-target: Fix initial login PDU asynchronous socket close OOPs
From: Nicholas Bellinger This patch fixes a OOPs originally introduced by: commit bb048357dad6d604520c91586334c9c230366a14 Author: Nicholas Bellinger Date: Thu Sep 5 14:54:04 2013 -0700 iscsi-target: Add sk->sk_state_change to cleanup after TCP failure which would trigger a NULL pointer dereference when a TCP connection was closed asynchronously via iscsi_target_sk_state_change(), but only when the initial PDU processing in iscsi_target_do_login() from iscsi_np process context was blocked waiting for backend I/O to complete. To address this issue, this patch makes the following changes. First, it introduces some common helper functions used for checking socket closing state, checking login_flags, and atomically checking socket closing state + setting login_flags. Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP connection has dropped via iscsi_target_sk_state_change(), but the initial PDU processing within iscsi_target_do_login() in iscsi_np context is still running. For this case, it sets LOGIN_FLAGS_CLOSED, but doesn't invoke schedule_delayed_work(). The original NULL pointer dereference case reported by MNC is now handled by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before transitioning to FFP to determine when the socket has already closed, or iscsi_target_start_negotiation() if the login needs to exchange more PDUs (eg: iscsi_target_do_login returned 0) but the socket has closed. For both of these cases, the cleanup up of remaining connection resources will occur in iscsi_target_start_negotiation() from iscsi_np process context once the failure is detected. Finally, to handle to case where iscsi_target_sk_state_change() is called after the initial PDU procesing is complete, it now invokes conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once existing iscsi_target_sk_check_close() checks detect connection failure. For this case, the cleanup of remaining connection resources will occur in iscsi_target_do_login_rx() from delayed workqueue process context once the failure is detected. Reported-by: Mike Christie Cc: Mike Christie Reported-by: Hannes Reinecke Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Varun Prakash Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target_nego.c | 194 +-- include/target/iscsi/iscsi_target_core.h | 1 + 2 files changed, 133 insertions(+), 62 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index 7ccc9c1..6f88b31 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -493,14 +493,60 @@ static void iscsi_target_restore_sock_callbacks(struct iscsi_conn *conn) static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *); -static bool iscsi_target_sk_state_check(struct sock *sk) +static bool __iscsi_target_sk_check_close(struct sock *sk) { if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) { - pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE," + pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE," "returning FALSE\n"); - return false; + return true; } - return true; + return false; +} + +static bool iscsi_target_sk_check_close(struct iscsi_conn *conn) +{ + bool state = false; + + if (conn->sock) { + struct sock *sk = conn->sock->sk; + + read_lock_bh(&sk->sk_callback_lock); + state = (__iscsi_target_sk_check_close(sk) || +test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)); + read_unlock_bh(&sk->sk_callback_lock); + } + return state; +} + +static bool iscsi_target_sk_check_flag(struct iscsi_conn *conn, unsigned int flag) +{ + bool state = false; + + if (conn->sock) { + struct sock *sk = conn->sock->sk; + + read_lock_bh(&sk->sk_callback_lock); + state = test_bit(flag, &conn->login_flags); + read_unlock_bh(&sk->sk_callback_lock); + } + return state; +} + +static bool iscsi_target_sk_check_and_clear(struct iscsi_conn *conn, unsigned int flag) +{ + bool state = false; + + if (conn->sock) { + struct sock *sk = conn->sock->sk; + + write_lock_bh(&sk->sk_callback_lock); + state = (__iscsi_target_sk_check_close(sk) || +test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)); + if (!state) + clear_bit(flag, &conn->login_flags); + write_unlock_bh(&sk->sk_callback_lock); + } + return state; } static void iscsi_target_login_drop(struct iscsi_conn *conn, struct iscsi_login *login) @@ -540,6 +586,20 @@ static void iscsi_target_do_login_rx(struct work_struct *
Re: [RESEND PATCH] vhost/scsi: Don't reinvent the wheel but use existing llist API
On Fri, 2017-05-12 at 09:42 +0900, Byungchul Park wrote: > Although llist provides proper APIs, they are not used. Make them used. > > Signed-off-by: Byungchul Park > Acked-by: Nicholas Bellinger > --- > drivers/vhost/scsi.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > Applied to target-pending/for-next. Thanks Byungchul.
Re: [PATCH] target: remove dead code
On Thu, 2017-05-11 at 13:39 -0700, Tyrel Datwyler wrote: > On 05/09/2017 02:46 PM, Gustavo A. R. Silva wrote: > > Local variable _ret_ is assigned to a constant value and it is never > > updated again. Remove this variable and the dead code it guards. > > > > Addresses-Coverity-ID: 140761 > > Signed-off-by: Gustavo A. R. Silva > > --- > > Reviewed-by: Tyrel Datwyler > Applied to target-pending/for-next. Thanks Gustavo + Tyrel.
[GIT PULL] target updates for v4.12-rc1
Hi Linus, Here are the target-pending updates for v4.12-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next Things where alot more calm than previously expected. It's primarily fixes in various areas, with most of the new functionality centering around TCMU backend driver work that Xiubo Li has been driving. Here's the summary on the feature side: - Make T10-PI verify configurable for emulated (FILEIO + RD) backends (Dmitry Monakhov) - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic (Bryant Ly + MNC) - Add TCMU support for growing ring buffer size (Xiubo Li + MNC) - Add TCMU support for global block data pool (Xiubo Li + MNC) and on the bug-fix side: - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase failures (Gary Guo + nab) - Fix iscsi-target hang with explicitly changing per NodeACL CmdSN number depth with concurrent login driven session reinstatement. (Gary Guo + nab) - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly) - Fix target-core/FILEIO zero length handling (Bart Van Assche) Also, there was an OOPs introduced with the WRITE_VERIFY changes that I ended up reverting at the last minute, because as not unusual Bart and I could not agree on the fix in time for -rc1. Since it's specific to a conformance test, it's been reverted for now. There is a separate patch in the queue to address the underlying control CDB write overflow regression in >= v4.3 separate from the WRITE_VERIFY revert here, that will be pushed post -rc1. Thank you, --nab Bart Van Assche (4): target: Fix VERIFY and WRITE VERIFY command parsing target/fileio: Fix zero-length READ and WRITE handling IB/srpt: Fix abort handling IB/srpt: Avoid that aborting a command triggers a kernel warning Bryant G. Ly (3): target: Add WRITE_VERIFY_16 target/user: PGR Support ibmvscsis: Do not send aborted task response Christophe Vu-Brugier (1): Documentation/target: add an example script to configure an iSCSI target Dmitry Monakhov (2): tcm_fileio: Prevent information leak for short reads tcm: make pi data verification configurable Elena Reshetova (1): target/iblock: convert iblock_req.pending from atomic_t to refcount_t Hannes Reinecke (2): target: fixup error message in target_tg_pt_gp_alua_access_type_store() target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store() Markus Elfring (8): iscsi-target: Use kcalloc() in iscsit_allocate_iovecs() iscsi-target: Delete error messages for failed memory allocations iscsi-target: Improve size determinations in four functions target: Use kcalloc() in two functions target: Delete error messages for failed memory allocations target: Improve size determinations in two functions target: Use kmalloc_array() in compare_and_write_callback() target: Use kmalloc_array() in transport_kmap_data_sg() Mike Christie (1): tcmu: fix module removal due to stuck thread Nicholas Bellinger (4): target: Fix compare_and_write_callback handling for non GOOD status iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement target: Don't force session reset if queue_depth does not change Revert "target: Fix VERIFY and WRITE VERIFY command parsing" Xiubo Li (3): tcmu: Add dynamic growing data area feature support tcmu: Add global data block pool support tcmu: Recalculate the tcmu_cmd size to save cmd area memories Zhu Lingshan (1): target/pr: update PR out action code table Documentation/target/target-export-device| 80 drivers/infiniband/ulp/srpt/ib_srpt.c| 9 +- drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 114 +++-- drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.h | 2 + drivers/target/iscsi/iscsi_target.c | 51 +-- drivers/target/iscsi/iscsi_target_configfs.c | 1 + drivers/target/iscsi/iscsi_target_login.c| 1 + drivers/target/target_core_configfs.c| 54 ++- drivers/target/target_core_device.c | 38 ++ drivers/target/target_core_file.c| 32 +- drivers/target/target_core_iblock.c | 12 +- drivers/target/target_core_iblock.h | 3 +- drivers/target/target_core_pr.c | 2 +- drivers/target/target_core_pr.h | 9 +- drivers/target/target_core_pscsi.c | 3 +- drivers/target/target_core_rd.c | 50 +-- drivers/target/target_core_sbc.c | 10 +- drivers/target/target_core_tpg.c | 7 + drivers/target/target_core_transport.c | 2 +- drivers/target/target_core_user.c| 645 +-- include/scsi/scsi_proto.h| 1 + include/target/target_core_backend.h | 1 + include/target/target_core_base.h| 1 + 23 files changed, 859 insertions(+), 269 deletions(-) create mode 100755 Documentation/target/target-export-device
[PATCH] target: Re-add check to reject control WRITEs with overflow data
From: Nicholas Bellinger During v4.3 when the overflow/underflow check was relaxed by commit c72c525022: commit c72c5250224d475614a00c1d7e54a67f77cd3410 Author: Roland Dreier Date: Wed Jul 22 15:08:18 2015 -0700 target: allow underflow/overflow for PR OUT etc. commands to allow underflow/overflow for Windows compliance + FCP, a consequence was to allow control CDBs to process overflow data for iscsi-target with immediate data as well. As per Roland's original change, continue to allow underflow cases for control CDBs to make Windows compliance + FCP happy, but until then explicitly reject all control WRITEs with overflow following pre v4.3.y logic. Reported-by: Bart Van Assche Cc: Roland Dreier Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 37f5735..6025935 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1160,15 +1160,28 @@ int transport_dump_vpd_ident( if (cmd->unknown_data_length) { cmd->data_length = size; } else if (size != cmd->data_length) { - pr_warn("TARGET_CORE[%s]: Expected Transfer Length:" + pr_warn_ratelimited("TARGET_CORE[%s]: Expected Transfer Length:" " %u does not match SCSI CDB Length: %u for SAM Opcode:" " 0x%02x\n", cmd->se_tfo->get_fabric_name(), cmd->data_length, size, cmd->t_task_cdb[0]); - if (cmd->data_direction == DMA_TO_DEVICE && - cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { - pr_err("Rejecting underflow/overflow WRITE data\n"); - return TCM_INVALID_CDB_FIELD; + if (cmd->data_direction == DMA_TO_DEVICE) { + if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) { + pr_err_ratelimited("Rejecting underflow/overflow" + " for WRITE data CDB\n"); + return TCM_INVALID_CDB_FIELD; + } + /* +* Some fabric drivers like iscsi-target still expect to +* always reject overflow writes. Reject this case until +* full fabric driver level support for overflow writes +* is introduced tree-wide. +*/ + if (size > cmd->data_length) { + pr_err_ratelimited("Rejecting overflow for" + " WRITE control CDB\n"); + return TCM_INVALID_CDB_FIELD; + } } /* * Reject READ_* or WRITE_* with overflow/underflow for -- 1.9.1
[PATCH 3/3] target: Don't force session reset if queue_depth does not change
From: Nicholas Bellinger Keeping in the idempotent nature of target_core_fabric_configfs.c, if a queue_depth value is set and it's the same as the existing value, don't attempt to force session reinstatement. Reported-by: Raghu Krishnamurthy Cc: Raghu Krishnamurthy Tested-by: Gary Guo Cc: Gary Guo Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_tpg.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index dfaef4d..310d9e5 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -398,6 +398,13 @@ int core_tpg_set_initiator_node_queue_depth( struct se_portal_group *tpg = acl->se_tpg; /* +* Allow the setting of se_node_acl queue_depth to be idempotent, +* and not force a session shutdown event if the value is not +* changing. +*/ + if (acl->queue_depth == queue_depth) + return 0; + /* * User has requested to change the queue depth for a Initiator Node. * Change the value in the Node's struct se_node_acl, and call * target_set_nacl_queue_depth() to set the new queue depth. -- 1.9.1
[PATCH 0/3] target: Fixes for v4.12-rc1
From: Nicholas Bellinger Hi all, Here are a couple of fixes from the last weeks testing while continuing longevity and scale out workloads on v4.x target code. This series contains three patches. The first is to address a COMPARE_AND_WRITE se_cmd reference leak where the READ phase hits a non GOOD status, observed with ESX VAAI hosts when outstanding READ I/O reaches a point where non SAM_STAT_GOOD status completions start to happen. The second addresses a hung task bug observed with iscsi-target ports while explicitly changing the active per se_node_acl queue_depth via the existing configfs attribute, if a new iscsi login was already forcing session reinstatement. And the third to is avoid forcing an session reinstatement if queue_depth is changed via configfs, but the value itself has not changed. The series has been verified on v4.1.y by DATERA Q/A and automation teams. Thank you, --nab Nicholas Bellinger (3): target: Fix compare_and_write_callback handling for non GOOD status iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement target: Don't force session reset if queue_depth does not change drivers/target/iscsi/iscsi_target.c | 1 + drivers/target/iscsi/iscsi_target_configfs.c | 1 + drivers/target/iscsi/iscsi_target_login.c| 1 + drivers/target/target_core_sbc.c | 5 - drivers/target/target_core_tpg.c | 7 +++ 5 files changed, 14 insertions(+), 1 deletion(-) -- 1.9.1
[PATCH 1/3] target: Fix compare_and_write_callback handling for non GOOD status
From: Nicholas Bellinger Following the bugfix for handling non SAM_STAT_GOOD COMPARE_AND_WRITE status during COMMIT phase in commit 9b2792c3da1, the same bug exists for the READ phase as well. This would manifest first as a lost SCSI response, and eventual hung task during fabric driver logout or re-login, as existing shutdown logic waited for the COMPARE_AND_WRITE se_cmd->cmd_kref to reach zero. To address this bug, compare_and_write_callback() has been changed to set post_ret = 1 and return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE as necessary to signal failure status. Reported-by: Bill Borsari Cc: Bill Borsari Tested-by: Gary Guo Cc: Gary Guo Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_sbc.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index f9250b3..a0ad618 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -507,8 +507,11 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes * been failed with a non-zero SCSI status. */ if (cmd->scsi_status) { - pr_err("compare_and_write_callback: non zero scsi_status:" + pr_debug("compare_and_write_callback: non zero scsi_status:" " 0x%02x\n", cmd->scsi_status); + *post_ret = 1; + if (cmd->scsi_status == SAM_STAT_CHECK_CONDITION) + ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; goto out; } -- 1.9.1
[PATCH 2/3] iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
From: Nicholas Bellinger While testing modification of per se_node_acl queue_depth forcing session reinstatement via lio_target_nacl_cmdsn_depth_store() -> core_tpg_set_initiator_node_queue_depth(), a hung task bug triggered when changing cmdsn_depth invoked session reinstatement while an iscsi login was already waiting for session reinstatement to complete. This can happen when an outstanding se_cmd descriptor is taking a long time to complete, and session reinstatement from iscsi login or cmdsn_depth change occurs concurrently. To address this bug, explicitly set session_fall_back_to_erl0 = 1 when forcing session reinstatement, so session reinstatement is not attempted if an active session is already being shutdown. This patch has been tested with two scenarios. The first when iscsi login is blocked waiting for iscsi session reinstatement to complete followed by queue_depth change via configfs, and second when queue_depth change via configfs us blocked followed by a iscsi login driven session reinstatement. Note this patch depends on commit d36ad77f702 to handle multiple sessions per se_node_acl when changing cmdsn_depth, and for pre v4.5 kernels will need to be included for stable as well. Reported-by: Gary Guo Tested-by: Gary Guo Cc: Gary Guo Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target.c | 1 + drivers/target/iscsi/iscsi_target_configfs.c | 1 + drivers/target/iscsi/iscsi_target_login.c| 1 + 3 files changed, 3 insertions(+) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 0f7ade0..26a9bcd 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -4663,6 +4663,7 @@ int iscsit_release_sessions_for_tpg(struct iscsi_portal_group *tpg, int force) continue; } atomic_set(&sess->session_reinstatement, 1); + atomic_set(&sess->session_fall_back_to_erl0, 1); spin_unlock(&sess->conn_lock); list_move_tail(&se_sess->sess_list, &free_list); diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index 344e844..96d9c73 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -1528,6 +1528,7 @@ static void lio_tpg_close_session(struct se_session *se_sess) return; } atomic_set(&sess->session_reinstatement, 1); + atomic_set(&sess->session_fall_back_to_erl0, 1); spin_unlock(&sess->conn_lock); iscsit_stop_time2retain_timer(sess); diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index ad8f301..6623847 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -208,6 +208,7 @@ int iscsi_check_for_session_reinstatement(struct iscsi_conn *conn) initiatorname_param->value) && (sess_p->sess_ops->SessionType == sessiontype))) { atomic_set(&sess_p->session_reinstatement, 1); + atomic_set(&sess_p->session_fall_back_to_erl0, 1); spin_unlock(&sess_p->conn_lock); iscsit_inc_session_usage_count(sess_p); iscsit_stop_time2retain_timer(sess_p); -- 1.9.1
Re: [PATCH] tcmu: Recalculate the tcmu_cmd size to save cmd area memories
On Tue, 2017-05-02 at 21:06 -0500, Mike Christie wrote: > On 05/02/2017 02:54 AM, lixi...@cmss.chinamobile.com wrote: > > From: Xiubo Li > > > > For the "struct tcmu_cmd_entry" in cmd area, the minimum size > > will be sizeof(struct tcmu_cmd_entry) == 112 Bytes. And it could > > fill about (sizeof(struct rsp) - sizeof(struct req)) / > > sizeof(struct iovec) == 68 / 16 ~= 4 data regions(iov[4]) by > > default. > > > > For most tcmu_cmds, the data block indexes allocated from the > > data area will be continuous. And for the continuous blocks they > > will be merged into the same region using only one iovec. For > > the current code, it will always allocates the same number of > > iovecs with blocks for each tcmu_cmd, and it will wastes much > > memories. > > > > For example, when the block size is 4K and the DATA_OUT buffer > > size is 64K, and the regions needed is less than 5(on my > > environment is almost 99.7%). The current code will allocate > > about 16 iovecs, and there will be (16 - 4) * sizeof(struct > > iovec) = 192 Bytes cmd area memories wasted. > > > > Here adds two helpers to calculate the base size and full size > > of the tcmu_cmd. And will recalculate them again when it make sure > > how many iovs is needed before insert it to cmd area. > > > > Signed-off-by: Xiubo Li > > Looks ok to me. Thanks. > > Acked-by: Mike Christie Applied. Thanks Xiubo + MNC.
Re: [PATCH v7 0/2] tcmu: Dynamic growing data area support
On Tue, 2017-05-02 at 11:38 +0800, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li > > Changed for V7: > - #1 fix two issues. > - #2 fix kbuild warning and some issues. > > Changed for V6: > - Remove the tcmu_vma_close(). Since the unmap thread will do the same for it > - The unmap thread will skip the busy devices. > - Using and testing the V5 version 3 weeks and the V6 for about 1 week, all > in a higher IOPS environment: > * using fio and dd commands > * using about 4 targets based user:rbd/user:file backend > * set the global pool size to 512 * 1024 blocks * block_size, for 4K page > size, the size is 2G. > * each target here needs more than 1100 blocks. > * fio: -iodepth 16 -thread -rw=[read write] -bs=[1M] -size=20G -numjobs=10 > -runtime=1000 ... > * restart the tcmu-runner at any time. > > Changed for V5: > - Rebase to the newest target-pending repository. > - Add as many comments as possbile to make the patch more readable. > - Move tcmu_handle_completions() in timeout handler to unmap thread > and then replace the spin lock with mutex lock(because the unmap_* > or zap_* may goto sleep) to simplify the patch and the code. > - Thanks very much for Mike's tips and suggestions. > - Tested this for more than 3 days by: > * using fio and dd commands > * using about 1~5 targets > * set the global pool size to [512 1024 2048 512 * 1024] blocks * block_size > * each target here needs more than 450 blocks when running in my > environments. > * fio: -iodepth [1 2 4 8 16] -thread -rw=[read write] -bs=[1K 2K 3K 5K 7K > 16K 64K 1M] -size=20G -numjobs=10 -runtime=1000 ... > * in the tcmu-runner, try to touch blocks out of tcmu_cmds' iov[] manually > * restart the tcmu-runner at any time. > * in my environment for the low IOPS case: the read throughput goes from > about 5200KB/s to 6700KB/s; the write throughput goes from about 3000KB/s to > 3700KB/s. > > Xiubo Li (2): > tcmu: Add dynamic growing data area feature support > tcmu: Add global data block pool support > > drivers/target/target_core_user.c | 602 > ++ > 1 file changed, 473 insertions(+), 129 deletions(-) > Ah, just missed the v7. ;) Applied. Thanks
Re: [PATCH v6 2/2] tcmu: Add global data block pool support
On Mon, 2017-05-01 at 13:40 -0500, Mike Christie wrote: > On 04/30/2017 06:29 AM, Xiubo Li wrote: > > [...] > >> To avoid starvation, I think you want a second list/fifo that holds the > >> watiers. In tcmu_get_empty_block if the list is not empty, record how > >> many pages we needed and then add the device to the list and wait in > >> tcmu_queue_cmd_ring. > >> > >> Above if we freed enough pages for the device at head then wake up the > >> device. > >> > >> I think you also need a wake_up call in the completion path in case the > >> initial call could not free enough pages. It could probably check if the > >> completion was going to free enough pages for a waiter and then call > >> wake. > >> > > Yes, I meant to introduce this later after this series to not let the > > patches too > > complex to review. > > > > If you agree I will do this later, or in V7 series ? > > > Yeah, I am ok with adding it after the initial patches go in. Btw, adding your Acked-by for the initial merge of these. If that's a problem, please make some noise. ;)
Re: [PATCH v6 0/2] tcmu: Dynamic growing data area support
Hi Xiubo & Co, On Wed, 2017-04-26 at 14:25 +0800, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li > > Changed for V6: > - Remove the tcmu_vma_close(). Since the unmap thread will do the same for it > - The unmap thread will skip the busy devices. > - Using and testing the V5 version 3 weeks and the V6 for about 1 week, all > in a higher IOPS environment: > * using fio and dd commands > * using about 4 targets based user:rbd/user:file backend > * set the global pool size to 512 * 1024 blocks * block_size, for 4K page > size, the size is 2G. > * each target here needs more than 1100 blocks. > * fio: -iodepth 16 -thread -rw=[read write] -bs=[1M] -size=20G -numjobs=10 > -runtime=1000 ... > * restart the tcmu-runner at any time. > > Changed for V5: > - Rebase to the newest target-pending repository. > - Add as many comments as possbile to make the patch more readable. > - Move tcmu_handle_completions() in timeout handler to unmap thread > and then replace the spin lock with mutex lock(because the unmap_* > or zap_* may goto sleep) to simplify the patch and the code. > - Thanks very much for Mike's tips and suggestions. > - Tested this for more than 3 days by: > * using fio and dd commands > * using about 1~5 targets > * set the global pool size to [512 1024 2048 512 * 1024] blocks * block_size > * each target here needs more than 450 blocks when running in my > environments. > * fio: -iodepth [1 2 4 8 16] -thread -rw=[read write] -bs=[1K 2K 3K 5K 7K > 16K 64K 1M] -size=20G -numjobs=10 -runtime=1000 ... > * in the tcmu-runner, try to touch blocks out of tcmu_cmds' iov[] manually > * restart the tcmu-runner at any time. > * in my environment for the low IOPS case: the read throughput goes from > about 5200KB/s to 6700KB/s; the write throughput goes from about 3000KB/s to > 3700KB/s. > > Xiubo Li (2): > tcmu: Add dynamic growing data area feature support > tcmu: Add global data block pool support > > drivers/target/target_core_user.c | 598 > ++ > 1 file changed, 469 insertions(+), 129 deletions(-) > So based upon the feedback from MNC, this looks OK to merge. It looks like there is one bug as reported by MNC introduced by patch #1. Please go ahead and post an incremental patch to address this at your earliest convenience. Thank you.
Re: [PATCH 0/5] target: Fine-tuning for some function implementations
Hi Markus, Apologies for the delayed follow-up here as well. On Sun, 2017-04-09 at 21:43 +0200, SF Markus Elfring wrote: > From: Markus Elfring > Date: Sun, 9 Apr 2017 21:33:21 +0200 > > A few update suggestions were taken into account > from static source code analysis. > > Markus Elfring (5): > rd: Use kcalloc() in two functions > rd: Delete error messages for failed memory allocations > rd: Improve size determinations in two functions > sbc: Use kmalloc_array() in compare_and_write_callback() > transport: Use kmalloc_array() in transport_kmap_data_sg() > > drivers/target/target_core_rd.c| 33 + > drivers/target/target_core_sbc.c | 4 ++-- > drivers/target/target_core_transport.c | 2 +- > 3 files changed, 12 insertions(+), 27 deletions(-) > Applied. Thank you.
[GIT PULL] target fixes for v4.11-rc7
Hi Linus, Here are the outstanding target-pending bug-fixes for v4.11-rc7. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master There has been work in a number of different areas over the last weeks, including: - Fix target-core-user (TCMU) back-end bi-directional handling (Xiubo Li + Mike Christie + Ilias Tsitsimpis) - Fix iscsi-target TMR reference leak during session shutdown (Rob Millner + Chu Yuan Lin) - Fix target_core_fabric_configfs.c race between LUN shutdown + mapped LUN creation (James Shen) - Fix target-core unknown fabric callback queue-full errors (Potnuri Bharat Teja) - Fix iscsi-target + iser-target queue-full handling in order to support iw_cxgb4 RNICs. (Potnuri Bharat Teja + Sagi Grimberg) - Fix ALUA transition state race between multiple initiator (Mike Christie) - Drop work-around for legacy GlobalSAN initiator, to allow QLogic 57840S + 579xx offload HBAs to work out-of-the-box in MSFT environments. (Martin Svec + Arun Easi) Note that a number are CC'ed for stable, and although the queue-full bug-fixes required for iser-target to work with iw_cxgb4 aren't CC'ed here, they'll be posted to Greg-KH separately. Thank you, --nab Manish Narani (1): usb: gadget: Correct usb EP argument for BOT status request Mike Christie (1): target: Fix ALUA transition state race between multiple initiators Nicholas Bellinger (7): tcmu: Allow cmd_time_out to be set to zero (disabled) iscsi-target: Fix TMR reference leak during session shutdown target: Avoid mappedlun symlink creation during lun shutdown target: Fix unknown fabric callback queue-full errors iscsi-target: Propigate queue_data_in + queue_status errors iser-target: Fix queue-full response handling iscsi-target: Drop work-around for legacy GlobalSAN initiator Sagi Grimberg (1): iser-target: avoid posting a recv buffer twice Xiubo Li (3): tcmu: Fix possible overwrite of t_data_sg's last iov[] tcmu: Fix wrongly calculating of the base_command_size tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case drivers/infiniband/ulp/isert/ib_isert.c| 65 drivers/infiniband/ulp/isert/ib_isert.h| 3 +- drivers/target/iscsi/iscsi_target.c| 3 +- drivers/target/iscsi/iscsi_target_configfs.c | 13 +-- drivers/target/iscsi/iscsi_target_parameters.c | 16 --- drivers/target/iscsi/iscsi_target_util.c | 17 ++-- drivers/target/iscsi/iscsi_target_util.h | 2 +- drivers/target/target_core_alua.c | 136 + drivers/target/target_core_configfs.c | 2 +- drivers/target/target_core_fabric_configfs.c | 5 + drivers/target/target_core_tpg.c | 4 + drivers/target/target_core_transport.c | 102 --- drivers/target/target_core_user.c | 97 -- drivers/usb/gadget/function/f_tcm.c| 2 +- include/target/target_core_base.h | 10 +- 15 files changed, 261 insertions(+), 216 deletions(-)
Re: [PATCH] tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
On Fri, 2017-03-31 at 10:35 +0800, lixi...@cmss.chinamobile.com wrote: > From: Xiubo Li > > For the bidirectional case, the Data-Out buffer blocks will always at > the head of the tcmu_cmd's bitmap, and before gathering the Data-In > buffer, first of all it should skip the Data-Out ones, or the device > supporting BIDI commands won't work. > > Fixed: 26418649eead ("target/user: Introduce data_bitmap, replace > data_length/data_head/data_tail") > Reported-by: Ilias Tsitsimpis > Signed-off-by: Xiubo Li > --- > drivers/target/target_core_user.c | 48 > +++ > 1 file changed, 33 insertions(+), 15 deletions(-) Applied to target-pending/master, with a CC' to linux-4.6.y stable. Thanks Xiubo + Ilias.
Re: [PATCH 1/2] iscsi-target: Fix TMR reference leak during session shutdown
On Thu, 2017-03-30 at 17:08 +, Bart Van Assche wrote: > On Thu, 2017-03-30 at 08:29 +0000, Nicholas A. Bellinger wrote: > > diff --git a/drivers/target/iscsi/iscsi_target_util.c > > b/drivers/target/iscsi/iscsi_target_util.c > > index 5041a9c..b464033 100644 > > --- a/drivers/target/iscsi/iscsi_target_util.c > > +++ b/drivers/target/iscsi/iscsi_target_util.c > > @@ -737,21 +737,23 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool > > shutdown) > > { > > struct se_cmd *se_cmd = NULL; > > int rc; > > + bool op_scsi = false; > > /* > > * Determine if a struct se_cmd is associated with > > * this struct iscsi_cmd. > > */ > > switch (cmd->iscsi_opcode) { > > case ISCSI_OP_SCSI_CMD: > > - se_cmd = &cmd->se_cmd; > > - __iscsit_free_cmd(cmd, true, shutdown); > > + op_scsi = true; > > /* > > * Fallthrough > > */ > > case ISCSI_OP_SCSI_TMFUNC: > > - rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown); > > - if (!rc && shutdown && se_cmd && se_cmd->se_sess) { > > - __iscsit_free_cmd(cmd, true, shutdown); > > + se_cmd = &cmd->se_cmd; > > + __iscsit_free_cmd(cmd, op_scsi, shutdown); > > + rc = transport_generic_free_cmd(se_cmd, shutdown); > > + if (!rc && shutdown && se_cmd->se_sess) { > > + __iscsit_free_cmd(cmd, op_scsi, shutdown); > > target_put_sess_cmd(se_cmd); > > } > > break; > > Hello Nic, > > I agree that this patch fixes a leak. However, an existing bug in > iscsit_free_cmd() is not addressed by this patch. Before the TMF code started > using kref_get() / kref_put() it was possible for transport_generic_free_cmd() > to determine whether or not iscsit_free_cmd() should call __iscsit_free_cmd() > by checking the command reference count. I think that since the TMF code > manipulates the command reference count it is no longer possible for > transport_generic_free_cmd() to determine this. If iscsit_free_cmd() is called > while a LUN RESET is in progress then the return value of > transport_generic_free_cmd() will be wrong. No. Your assumption is incorrect wrt transport_generic_free_cmd() having a wrong return value during LUN_RESET. It's incorrect because when iscsit_free_cmd() is called with shutdown=true resulting in transport_generic_free_cmd() with wait_for_tasks=true, target_wait_free_cmd() checks for CMD_T_ABORTED to determine if se_cmd has been aborted by target_core_tmr.c logic, and returns aborted=true back up to transport_generic_free_cmd(). When transport_generic_free_cmd() gets aborted=true, it waits for se_cmd->cmd_wait_comp to finish and calls cmd->se_tfo->release_cmd() to release se_cmd back to the pre-allocated session pool. At this point, transport_generic_free_cmd() with aborted=true must return '1' it's caller, signaling iscsit_free_cmd() to not attempt to perform the extra __iscsi_free_cmd() or target_put_sess_cmd(), because se_cmd has already been released back to the session pool.
[PATCH] iscsi-target: Drop work-around for legacy GlobalSAN initiator
From: Nicholas Bellinger Once upon a time back in 2009, a work-around was added to support the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login did not propose nor respond to MaxBurstLength, FirstBurstLength, DefaultTime2Wait and DefaultTime2Retain keys. The work-around in iscsi_check_proposer_for_optional_reply() allowed the missing keys to be proposed, but did not require waiting for a response before moving to full feature phase operation. This allowed GlobalSAN v3.3 to work out-of-the box, and for many years we didn't run into login interopt issues with any other initiators.. Until recently, when Martin tried a QLogic 57840S iSCSI Offload HBA on Windows 2016 which completed login, but subsequently failed with: Got unknown iSCSI OpCode: 0x43 The issue was QLogic MSFT side did not propose DefaultTime2Wait + DefaultTime2Retain, so LIO proposes them itself, and immediately transitions to full feature phase because of the GlobalSAN hack. However, the QLogic MSFT side still attempts to respond to DefaultTime2Retain + DefaultTime2Wait, even though LIO has set ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT in last login response. So while the QLogic MSFT side should have been proposing these two keys to start, it was doing the correct thing per RFC-3720 attempting to respond to proposed keys before transitioning to full feature phase. All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541) does correctly propose the four keys during login, making the original work-around moot. So in order to allow QLogic MSFT to run unmodified as-is, go ahead and drop this long standing work-around. Reported-by: Martin Svec Cc: Martin Svec Cc: Himanshu Madhani Cc: Arun Easi Cc: sta...@vger.kernel.org # 3.1+ Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target_parameters.c | 16 1 file changed, 16 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c index e65bf78..fce6276 100644 --- a/drivers/target/iscsi/iscsi_target_parameters.c +++ b/drivers/target/iscsi/iscsi_target_parameters.c @@ -782,22 +782,6 @@ static void iscsi_check_proposer_for_optional_reply(struct iscsi_param *param) if (!strcmp(param->name, MAXRECVDATASEGMENTLENGTH)) SET_PSTATE_REPLY_OPTIONAL(param); /* -* The GlobalSAN iSCSI Initiator for MacOSX does -* not respond to MaxBurstLength, FirstBurstLength, -* DefaultTime2Wait or DefaultTime2Retain parameter keys. -* So, we set them to 'reply optional' here, and assume the -* the defaults from iscsi_parameters.h if the initiator -* is not RFC compliant and the keys are not negotiated. -*/ - if (!strcmp(param->name, MAXBURSTLENGTH)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, FIRSTBURSTLENGTH)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, DEFAULTTIME2WAIT)) - SET_PSTATE_REPLY_OPTIONAL(param); - if (!strcmp(param->name, DEFAULTTIME2RETAIN)) - SET_PSTATE_REPLY_OPTIONAL(param); - /* * Required for gPXE iSCSI boot client */ if (!strcmp(param->name, MAXCONNECTIONS)) -- 1.9.1
Re: [PATCH 4.10 15/17] qla2xxx: Allow vref count to timeout on vport delete.
Hi Joe + Himanshu, Thanks for backporting this patch to apply to v4.10.y. I haven't seen it queued to v4.9.y yet though, would you be so kind to send out a v4.9.y backport as well to Greg-KH + stable team so it can be picked up for earlier stable versions too..? Thank you, On Thu, 2017-03-30 at 12:00 +0200, Greg Kroah-Hartman wrote: > 4.10-stable review patch. If anyone has any objections, please let me know. > > -- > > From: Joe Carnuccio > > commit c4a9b538ab2a109c5f9798bea1f8f4bf93aadfb9 upstream. > > Signed-off-by: Joe Carnuccio > Signed-off-by: Himanshu Madhani > Signed-off-by: Nicholas Bellinger > Signed-off-by: Greg Kroah-Hartman > > --- > drivers/scsi/qla2xxx/qla_attr.c |4 +--- > drivers/scsi/qla2xxx/qla_def.h |6 +- > drivers/scsi/qla2xxx/qla_init.c |1 + > drivers/scsi/qla2xxx/qla_mid.c | 14 -- > drivers/scsi/qla2xxx/qla_os.c |1 + > 5 files changed, 16 insertions(+), 10 deletions(-) > > --- a/drivers/scsi/qla2xxx/qla_attr.c > +++ b/drivers/scsi/qla2xxx/qla_attr.c > @@ -2154,8 +2154,6 @@ qla24xx_vport_delete(struct fc_vport *fc > "Timer for the VP[%d] has stopped\n", vha->vp_idx); > } > > - BUG_ON(atomic_read(&vha->vref_count)); > - > qla2x00_free_fcports(vha); > > mutex_lock(&ha->vport_lock); > @@ -2163,7 +2161,7 @@ qla24xx_vport_delete(struct fc_vport *fc > clear_bit(vha->vp_idx, ha->vp_idx_map); > mutex_unlock(&ha->vport_lock); > > - if (vha->qpair->vp_idx == vha->vp_idx) { > + if (vha->qpair && vha->qpair->vp_idx == vha->vp_idx) { > if (qla2xxx_delete_qpair(vha, vha->qpair) != QLA_SUCCESS) > ql_log(ql_log_warn, vha, 0x7087, > "Queue Pair delete failed.\n"); > --- a/drivers/scsi/qla2xxx/qla_def.h > +++ b/drivers/scsi/qla2xxx/qla_def.h > @@ -3788,6 +3788,7 @@ typedef struct scsi_qla_host { > struct qla8044_reset_template reset_tmplt; > struct qla_tgt_counters tgt_counters; > uint16_tbbcr; > + wait_queue_head_t vref_waitq; > } scsi_qla_host_t; > > struct qla27xx_image_status { > @@ -3843,14 +3844,17 @@ struct qla2_sgx { > mb(); \ > if (__vha->flags.delete_progress) { \ > atomic_dec(&__vha->vref_count); \ > + wake_up(&__vha->vref_waitq);\ > __bail = 1; \ > } else {\ > __bail = 0; \ > } \ > } while (0) > > -#define QLA_VHA_MARK_NOT_BUSY(__vha) \ > +#define QLA_VHA_MARK_NOT_BUSY(__vha) do {\ > atomic_dec(&__vha->vref_count); \ > + wake_up(&__vha->vref_waitq);\ > +} while (0) \ > > #define QLA_QPAIR_MARK_BUSY(__qpair, __bail) do {\ > atomic_inc(&__qpair->ref_count);\ > --- a/drivers/scsi/qla2xxx/qla_init.c > +++ b/drivers/scsi/qla2xxx/qla_init.c > @@ -4352,6 +4352,7 @@ qla2x00_update_fcports(scsi_qla_host_t * > } > } > atomic_dec(&vha->vref_count); > + wake_up(&vha->vref_waitq); > } > spin_unlock_irqrestore(&ha->vport_slock, flags); > } > --- a/drivers/scsi/qla2xxx/qla_mid.c > +++ b/drivers/scsi/qla2xxx/qla_mid.c > @@ -74,13 +74,14 @@ qla24xx_deallocate_vp_id(scsi_qla_host_t >* ensures no active vp_list traversal while the vport is removed >* from the queue) >*/ > - spin_lock_irqsave(&ha->vport_slock, flags); > - while (atomic_read(&vha->vref_count)) { > - spin_unlock_irqrestore(&ha->vport_slock, flags); > - > - msleep(500); > + wait_event_timeout(vha->vref_waitq, atomic_read(&vha->vref_count), > + 10*HZ); > > - spin_lock_irqsave(&ha->vport_slock, flags); > + spin_lock_irqsave(&ha->vport_slock, flags); > + if (atomic_read(&vha->vref_count)) { > + ql_dbg(ql_dbg_vport, vha, 0xfffa, > + "vha->vref_count=%u timeout\n", vha->vref_count.counter); > + vha->vref_count = (atomic_t)ATOMIC_INIT(0); > } > list_del(&vha->list); > qlt_update_vp_map(vha, RESET_VP_IDX); > @@ -269,6 +270,7 @@ qla2x00_alert_all_vps(struct rsp_que *rs > > spin_lock_irqsave(&ha->vport_slock, flags); > atomic_dec(&vha->vref_count); > + wake_up(&vha->vref_waitq); > } > i++; > } > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -4215,6 +4215,7 @@ struct scsi_qla_host *qla2x00_create_hos > > spin_lock_init(&vha->work_lock); > spin_lock_init(&vha->cmd_list_lock); > + init_waitqueue_head(&vha->vref_waitq); > >
[PATCH 2/2] target: Avoid mappedlun symlink creation during lun shutdown
From: Nicholas Bellinger This patch closes a race between se_lun deletion during configfs unlink in target_fabric_port_unlink() -> core_dev_del_lun() -> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks waiting for percpu_ref RCU grace period to finish, but a new NodeACL mappedlun is added before the RCU grace period has completed. This can happen in target_fabric_mappedlun_link() because it only checks for se_lun->lun_se_dev, which is not cleared until after transport_clear_lun_ref() percpu_ref RCU grace period finishes. This bug originally manifested as NULL pointer dereference OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on v4.1.y code, because it dereferences lun->lun_se_dev without a explicit NULL pointer check. In post v4.1 code with target-core RCU conversion, the code in target_stat_scsi_att_intr_port_show_attr_dev() no longer uses se_lun->lun_se_dev, but the same race still exists. To address the bug, go ahead and set se_lun>lun_shutdown as early as possible in core_tpg_remove_lun(), and ensure new NodeACL mappedlun creation in target_fabric_mappedlun_link() fails during se_lun shutdown. Reported-by: James Shen Cc: James Shen Tested-by: James Shen Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_fabric_configfs.c | 5 + drivers/target/target_core_tpg.c | 4 include/target/target_core_base.h| 1 + 3 files changed, 10 insertions(+) diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c index d8a16ca..d1e6cab 100644 --- a/drivers/target/target_core_fabric_configfs.c +++ b/drivers/target/target_core_fabric_configfs.c @@ -92,6 +92,11 @@ static int target_fabric_mappedlun_link( pr_err("Source se_lun->lun_se_dev does not exist\n"); return -EINVAL; } + if (lun->lun_shutdown) { + pr_err("Unable to create mappedlun symlink because" + " lun->lun_shutdown=true\n"); + return -EINVAL; + } se_tpg = lun->lun_tpg; nacl_ci = &lun_acl_ci->ci_parent->ci_group->cg_item; diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index 6fb1919..dfaef4d 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -642,6 +642,8 @@ void core_tpg_remove_lun( */ struct se_device *dev = rcu_dereference_raw(lun->lun_se_dev); + lun->lun_shutdown = true; + core_clear_lun_from_tpg(lun, tpg); /* * Wait for any active I/O references to percpu se_lun->lun_ref to @@ -663,6 +665,8 @@ void core_tpg_remove_lun( } if (!(dev->se_hba->hba_flags & HBA_FLAGS_INTERNAL_USE)) hlist_del_rcu(&lun->link); + + lun->lun_shutdown = false; mutex_unlock(&tpg->tpg_lun_mutex); percpu_ref_exit(&lun->lun_ref); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 4b784b6..2e28246 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -705,6 +705,7 @@ struct se_lun { u64 unpacked_lun; #define SE_LUN_LINK_MAGIC 0x7771 u32 lun_link_magic; + boollun_shutdown; boollun_access_ro; u32 lun_index; -- 1.9.1
[PATCH 1/2] iscsi-target: Fix TMR reference leak during session shutdown
From: Nicholas Bellinger This patch fixes a iscsi-target specific TMR reference leak during session shutdown, that could occur when a TMR was quiesced before the hand-off back to iscsi-target code via transport_cmd_check_stop_to_fabric(). The reference leak happens because iscsit_free_cmd() was incorrectly skipping the final target_put_sess_cmd() for TMRs when transport_generic_free_cmd() returned zero because the se_cmd->cmd_kref did not reach zero, due to the missing se_cmd assignment in original code. The result was iscsi_cmd and it's associated se_cmd memory would be freed once se_sess->sess_cmd_map where released, but the associated se_tmr_req was leaked and remained part of se_device->dev_tmr_list. This bug would manfiest itself as kernel paging request OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req attempted to dereference it's se_cmd pointer that had already been released during normal session shutdown. To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD and ISCSI_OP_SCSI_TMFUNC the same when there is an extra se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use op_scsi to signal __iscsit_free_cmd() when the former needs to clear any further iscsi related I/O state. Reported-by: Rob Millner Cc: Rob Millner Reported-by: Chu Yuan Lin Cc: Chu Yuan Lin Tested-by: Chu Yuan Lin Signed-off-by: Nicholas Bellinger --- drivers/target/iscsi/iscsi_target_util.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 5041a9c..b464033 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -737,21 +737,23 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown) { struct se_cmd *se_cmd = NULL; int rc; + bool op_scsi = false; /* * Determine if a struct se_cmd is associated with * this struct iscsi_cmd. */ switch (cmd->iscsi_opcode) { case ISCSI_OP_SCSI_CMD: - se_cmd = &cmd->se_cmd; - __iscsit_free_cmd(cmd, true, shutdown); + op_scsi = true; /* * Fallthrough */ case ISCSI_OP_SCSI_TMFUNC: - rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown); - if (!rc && shutdown && se_cmd && se_cmd->se_sess) { - __iscsit_free_cmd(cmd, true, shutdown); + se_cmd = &cmd->se_cmd; + __iscsit_free_cmd(cmd, op_scsi, shutdown); + rc = transport_generic_free_cmd(se_cmd, shutdown); + if (!rc && shutdown && se_cmd->se_sess) { + __iscsit_free_cmd(cmd, op_scsi, shutdown); target_put_sess_cmd(se_cmd); } break; -- 1.9.1
[PATCH 0/2] target: Bug-fixes for v4.11-rc
From: Nicholas Bellinger Hi all, Here are two additional target bug-fixes that have been found by the DATERA Q/A + automation team during extended longevity and stress testing atop v4.1.y stable code. The first is a iscsi-target specific TMR reference leak during session shutdown when se_cmd quiesce occured before hand-off back to iscsi-target, that results in se_tmr_req descriptors being left after se_cmd was freed. This would manifest as kernel paging OOPsen during LUN_RESET after the leak occured, because the associated se_tmr_req had not been released even though the pre-allocated parent se_cmd descriptor had already been freed during session shutdown. The second is a race between se_lun configfs unlink shutdown when se_lun->lun_ref percpu_ref RCU grace period is happening, and user-space attempts to add a new NodeACL mappedlun configfs symlink while se_lun shutdown is occuring. This would manifest as NULL pointer dereference OOPsen within target_stat_scsi_att_intr_port_show_attr_dev() after a new NodeACL mappedlun was added to a se_lun that had already been shutdown. Both have been verified on v4.1.y stable code, and forward ported to v4.11-rc. Please review. --nab Nicholas Bellinger (2): iscsi-target: Fix TMR reference leak during session shutdown target: Avoid mappedlun symlink creation during lun shutdown drivers/target/iscsi/iscsi_target_util.c | 12 +++- drivers/target/target_core_fabric_configfs.c | 5 + drivers/target/target_core_tpg.c | 4 include/target/target_core_base.h| 1 + 4 files changed, 17 insertions(+), 5 deletions(-) -- 1.9.1
Re: [PATCH 24/29] drivers: convert iblock_req.pending from atomic_t to refcount_t
Hi Elena, On Mon, 2017-03-06 at 16:21 +0200, Elena Reshetova wrote: > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead to use-after-free > situations. > > Signed-off-by: Elena Reshetova > Signed-off-by: Hans Liljestrand > Signed-off-by: Kees Cook > Signed-off-by: David Windsor > --- > drivers/target/target_core_iblock.c | 12 ++-- > drivers/target/target_core_iblock.h | 3 ++- > 2 files changed, 8 insertions(+), 7 deletions(-) After reading up on this thread, it looks like various subsystem maintainers are now picking these atomic_t -> refcount_t conversions.. That said, applied to target-pending/for-next and will plan to include for v4.12-rc1 merge window. Thanks!
Re: [PATCH] usb: gadget: Correct usb EP argument for BOT status request
Hi Manish, (Added target-devel CC') On Mon, 2017-03-20 at 15:05 +0530, Manish Narani wrote: > This patch corrects the argument in usb_ep_free_request as it is > mistakenly set to ep_out. It should be ep_in for status request. > > Signed-off-by: Manish Narani > --- > drivers/usb/gadget/function/f_tcm.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/usb/gadget/function/f_tcm.c > b/drivers/usb/gadget/function/f_tcm.c > index d235113..a82e2bd 100644 > --- a/drivers/usb/gadget/function/f_tcm.c > +++ b/drivers/usb/gadget/function/f_tcm.c > @@ -373,7 +373,7 @@ static void bot_cleanup_old_alt(struct f_uas *fu) > usb_ep_free_request(fu->ep_in, fu->bot_req_in); > usb_ep_free_request(fu->ep_out, fu->bot_req_out); > usb_ep_free_request(fu->ep_out, fu->cmd.req); > - usb_ep_free_request(fu->ep_out, fu->bot_status.req); > + usb_ep_free_request(fu->ep_in, fu->bot_status.req); > > kfree(fu->cmd.buf); > Applied to target-pending/master. Thanks!
Re: [GIT PULL] target fixes for v4.11-rc3
On Sat, 2017-03-18 at 19:08 -0700, Nicholas A. Bellinger wrote: > Hello Linus, > > Here are the target-pending fixes for v4.11-rc3 code. Please go ahead > and pull from: > > git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master > > The bulk of the changes are in qla2xxx target driver code to address > various issues found during Cavium/QLogic's internal testing (stable > CC's included), along with a few other stability and smaller > miscellaneous improvements. Wrt the other qla2xxx improvements in this PULL request.. They where originally intended for v4.11-rc1, but missed the merge window due to a regression reported on the list. Since then, they have gone through 4 revisions over the last 7 weeks, and Cavium/QLogic folks have addressed the outstanding regression + feedback, and this series has passed their internal Q/A testing. So while I acknowledge they aren't all strictly bug-fixes (5 of them are CC'ed for stable), all are self contained within qla2xxx and I do trust the Cavium/QLogic team's sign-off for the complete series. Please consider pulling the series.
[GIT PULL] target fixes for v4.11-rc3
Hello Linus, Here are the target-pending fixes for v4.11-rc3 code. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master The bulk of the changes are in qla2xxx target driver code to address various issues found during Cavium/QLogic's internal testing (stable CC's included), along with a few other stability and smaller miscellaneous improvements. There are also a couple of different patch sets from Mike Christie, which have been a result of his work to use target-core ALUA logic together with tcm-user backend driver. Finally, a patch to address some long standing issues with pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices, which will make folks using physical (or virtual) magnetic tape happy. Thank you, --nab Anil Gurumurthy (1): qla2xxx: Export DIF stats via debugfs Himanshu Madhani (2): qla2xxx: Add DebugFS node to display Port Database qla2xxx: Update driver version to 9.00.00.00-k Joe Carnuccio (1): qla2xxx: Allow vref count to timeout on vport delete. Max Lohrmann (1): target: Fix VERIFY_16 handling in sbc_parse_cdb Mike Christie (10): tcmu: allow hw_max_sectors greater than 128 tcmu: return on first Opt parse failure target: allow ALUA setup for some passthrough backends target: fail ALUA transitions for pscsi target: Use system workqueue for ALUA transitions target: fix ALUA transition timeout handling target: allow userspace to set state to transitioning target: fix race during implicit transition work flushes tcmu: add helper to check if dev was configured tcmu: make cmd timeout configurable Nicholas Bellinger (3): target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export target: Drop pointless tfo->check_stop_free check tcmu: Convert cmd_time_out into backend device attribute Quinn Tran (10): qla2xxx: Fix memory leak for abts processing qla2xxx: Fix request queue corruption. qla2xxx: Fix inadequate lock protection for ABTS. qla2xxx: Fix sess_lock & hardware_lock lock order problem. qla2xxx: Allow relogin to proceed if remote login did not finish qla2xxx: Improve T10-DIF/PI handling in driver. qla2xxx: Add async new target notification qla2xxx: Use IOCB interface to submit non-critical MBX. qla2xxx: Change scsi host lookup method. qla2xxx: Fix delayed response to command for loop mode/direct connect. drivers/scsi/qla2xxx/Kconfig | 1 + drivers/scsi/qla2xxx/qla_attr.c| 4 +- drivers/scsi/qla2xxx/qla_dbg.h | 1 + drivers/scsi/qla2xxx/qla_def.h | 56 ++- drivers/scsi/qla2xxx/qla_dfs.c | 107 - drivers/scsi/qla2xxx/qla_gbl.h | 18 +- drivers/scsi/qla2xxx/qla_init.c| 85 ++-- drivers/scsi/qla2xxx/qla_iocb.c| 13 +- drivers/scsi/qla2xxx/qla_isr.c | 41 +- drivers/scsi/qla2xxx/qla_mbx.c | 304 -- drivers/scsi/qla2xxx/qla_mid.c | 14 +- drivers/scsi/qla2xxx/qla_os.c | 23 +- drivers/scsi/qla2xxx/qla_target.c | 748 + drivers/scsi/qla2xxx/qla_target.h | 39 +- drivers/scsi/qla2xxx/qla_version.h | 6 +- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 49 ++- drivers/target/target_core_alua.c | 82 ++-- drivers/target/target_core_configfs.c | 4 + drivers/target/target_core_pscsi.c | 50 +-- drivers/target/target_core_sbc.c | 10 +- drivers/target/target_core_tpg.c | 3 +- drivers/target/target_core_transport.c | 3 +- drivers/target/target_core_user.c | 152 +-- include/target/target_core_backend.h | 7 +- include/target/target_core_base.h | 2 +- 25 files changed, 1274 insertions(+), 548 deletions(-)
[PATCH] target; Drop pointless tfo->check_stop_free check
From: Nicholas Bellinger All in-tree fabric drivers provide a tfo->check_stop_free(), so there is no need to do the extra check within existing transport_cmd_check_stop_to_fabric() code. Just to be sure, add a check in target_fabric_tf_ops_check() to notify any out-of-tree drivers that might be missing it. Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_configfs.c | 4 drivers/target/target_core_transport.c | 3 +-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c index 54b36c9..38b5025 100644 --- a/drivers/target/target_core_configfs.c +++ b/drivers/target/target_core_configfs.c @@ -421,6 +421,10 @@ static int target_fabric_tf_ops_check(const struct target_core_fabric_ops *tfo) pr_err("Missing tfo->aborted_task()\n"); return -EINVAL; } + if (!tfo->check_stop_free) { + pr_err("Missing tfo->check_stop_free()\n"); + return -EINVAL; + } /* * We at least require tfo->fabric_make_wwn(), tfo->fabric_drop_wwn() * tfo->fabric_make_tpg() and tfo->fabric_drop_tpg() in diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 434d9d6..b1a3cdb 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -636,8 +636,7 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd) * Fabric modules are expected to return '1' here if the se_cmd being * passed is released at this point, or zero if not being released. */ - return cmd->se_tfo->check_stop_free ? cmd->se_tfo->check_stop_free(cmd) - : 0; + return cmd->se_tfo->check_stop_free(cmd); } static void transport_lun_remove_cmd(struct se_cmd *cmd) -- 1.9.1
Re: [PATCH 24/29] drivers: convert iblock_req.pending from atomic_t to refcount_t
Hi Elena, On Mon, 2017-03-06 at 16:21 +0200, Elena Reshetova wrote: > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead to use-after-free > situations. > > Signed-off-by: Elena Reshetova > Signed-off-by: Hans Liljestrand > Signed-off-by: Kees Cook > Signed-off-by: David Windsor > --- > drivers/target/target_core_iblock.c | 12 ++-- > drivers/target/target_core_iblock.h | 3 ++- > 2 files changed, 8 insertions(+), 7 deletions(-) For the target_core_iblock part: Acked-by: Nicholas Bellinger
[GIT PULL] target updates for v4.11-rc1
Hello Linus, Here are the outstanding target pending changes for v4.11-rc1. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next The highlights this round include: - Enable dual mode (initiator + target) qla2xxx operation. (Quinn + Himanshu) - Add a framework for qla2xxx async fabric discovery. (Quinn + Himanshu) - Enable iscsi PDU DDP completion offload in cxgbit/T6 NICs. (Varun) - Fix target-core handling of aborted failed commands. (Bart) - Fix a long standing target-core issue NULL pointer dereference with active I/O LUN shutdown. (Rob Millner + Bryant + nab) Things are shaping up to be a busy cycle for v4.12 with a new fabric driver (efct) in flight, and a number of other patches on the list being discussed. For post v4.11-rc1 fixes, there is a qla2xxx series that missed -rc1 that will be sent your way soon, along with a issue reported by Bart related to ib_srpt + LUN_RESET that we are still going back and forth on how the bug-fix should work. Beyond that, the number of folks with dedicated Q/A resources for upstream target code is steadily increasing, along with the number of production customers using the mainstream in-tree fabric drivers. Thank you, --nab Bart Van Assche (19): qla2xxx: Avoid using variable-length arrays target/cxgbit: Fix endianness annotations target/tcm_fc: Remove a set-but-not-used variable target/iscsi: Fix indentation in iscsi_target_start_negotiation() target/iscsi: Fix spelling of "perform" target/iscsi: Fix spelling of "reallegiance" target/iscsi: Introduce a helper function for TMF translation target/iscsi: Fix iSCSI task reassignment handling target: Remove se_tmr_req.tmr_lun target: Make core_tmr_abort_task() consider all commands target: Correct transport_wait_for_tasks() documentation target: Stop execution if CMD_T_STOP has been set target: Remove an overly chatty debug message target: Inline transport_cmd_check_stop() target: Move session check from target_put_sess_cmd() into target_release_cmd_kref() target: Remove command flag CMD_T_BUSY target: Remove command flag CMD_T_DEV_ACTIVE target: Fix handling of aborted failed commands target: Delete tmr from list before processing Dmitry V. Levin (1): uapi: fix linux/target_core_user.h userspace compilation errors Himanshu Madhani (2): qla2xxx: Remove SRR code qla2xxx: Remove unused reverse_ini_mode Joe Carnuccio (1): qla2xxx: Simplify usage of SRB structure in driver Mike Christie (1): target: export protocol identifier Nicholas Bellinger (3): target: Fix NULL dereference during LUN lookup + active I/O shutdown iscsi-target: Fix early login failure statistics misses target: Add counters for ABORT_TASK success + failure Quinn Tran (10): qla2xxx: Remove direct access of scsi_status field in se_cmd qla2xxx: Cleanup TMF code translation from qla_target qla2xxx: Make trace flags more readable qla2xxx: Fix wrong argument in sp done callback qla2xxx: Use d_id instead of s_id for more clarity qla2xxx: Track I-T nexus as single fc_port struct qla2xxx: Add framework for async fabric discovery qla2xxx: Add Dual mode support in the driver qla2xxx: Improve RSCN handling in driver qla2xxx: Fix a warning reported by the "smatch" static checker Varun Prakash (7): target/cxgbit: Use T6 specific macro to set the force bit target/iscsi: split iscsit_check_dataout_hdr() target/cxgbit: use cxgb4_tp_smt_idx() to get smt idx target/cxgbit: Use T6 specific macros to get ETH/IP hdr len target/cxgbit: Enable DDP for T6 only if data sequence and pdu are in order target/cxgbit: add T6 iSCSI DDP completion feature target/iscsi: Fix unsolicited data seq_end_offset calculation drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|4 + drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.h |2 +- drivers/scsi/qla2xxx/qla_attr.c|3 + drivers/scsi/qla2xxx/qla_bsg.c | 23 +- drivers/scsi/qla2xxx/qla_def.h | 306 ++- drivers/scsi/qla2xxx/qla_dfs.c | 11 +- drivers/scsi/qla2xxx/qla_fw.h | 106 +- drivers/scsi/qla2xxx/qla_gbl.h | 72 +- drivers/scsi/qla2xxx/qla_gs.c | 726 +- drivers/scsi/qla2xxx/qla_init.c| 1612 + drivers/scsi/qla2xxx/qla_inline.h | 18 +- drivers/scsi/qla2xxx/qla_iocb.c| 167 +- drivers/scsi/qla2xxx/qla_isr.c | 317 ++- drivers/scsi/qla2xxx/qla_mbx.c | 232 +- drivers/scsi/qla2xxx/qla_mr.c | 48 +- drivers/scsi/qla2xxx/qla_os.c | 330 ++- drivers/scsi/qla2xxx/qla_target.c | 2392 +--- drivers/scsi/qla2xxx/qla_target.h | 252 +-- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 256 ++- driver
Re: [PATCH] target: Fix NULL dereference during LUN lookup + active I/O shutdown
On Thu, 2017-02-23 at 11:46 -0600, Bryant G. Ly wrote: > > From: Nicholas Bellinger > > > > When transport_clear_lun_ref() is shutting down a se_lun via > > configfs with new I/O in-flight, it's possible to trigger a > > NULL pointer dereference in transport_lookup_cmd_lun() due > > to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD > > checking before incrementing lun->lun_ref.count after > > lun->lun_ref has switched to atomic_t mode. > > > > This results in a NULL pointer dereference as LUN shutdown > > code in core_tpg_remove_lun() continues running after the > > existing ->release() -> core_tpg_lun_ref_release() callback > > completes, and clears the RCU protected se_lun->lun_se_dev > > pointer. > > > > During the OOPs, the state of lun->lun_ref in the process > > which triggered the NULL pointer dereference looks like > > the following on v4.1.y stable code: > > > > struct se_lun { > >lun_link_magic = 4294932337, > >lun_status = TRANSPORT_LUN_STATUS_FREE, > > > >. > > > >lun_se_dev = 0x0, > >lun_sep = 0x0, > > > >. > > > >lun_ref = { > > count = { > >counter = 1 > > }, > > percpu_count_ptr = 3, > > release = 0xa02fa1e0 , > > confirm_switch = 0x0, > > force_atomic = false, > > rcu = { > >next = 0x88154fa1a5d0, > >func = 0x8137c4c0 > > } > >} > > } > > > > To address this bug, use percpu_ref_tryget_live() to ensure > > once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref > > has switched to atomic_t, all new I/Os will fail to obtain > > a new lun->lun_ref reference. > > > > Also use an explicit percpu_ref_kill_and_confirm() callback > > to block on ->lun_ref_comp to allow the first stage and > > associated RCU grace period to complete, and then block on > > ->lun_ref_shutdown waiting for the final percpu_ref_put() > > to drop the last reference via transport_lun_remove_cmd() > > before continuing with core_tpg_remove_lun() shutdown. > > > > Reported-by: Rob Millner > > Tested-by: Rob Millner > > Cc: Rob Millner > > Tested-by: Vaibhav Tandon > > Cc: Vaibhav Tandon > > Signed-off-by: Nicholas Bellinger > > --- > > drivers/target/target_core_device.c| 10 -- > > drivers/target/target_core_tpg.c | 3 ++- > > drivers/target/target_core_transport.c | 31 > > ++- > > include/target/target_core_base.h | 1 + > > 4 files changed, 41 insertions(+), 4 deletions(-) > > > I have seen this and have tested this with our custom kernel. > > So this looks good from me! > Added your Tested-by to the patch. Thanks Bryant.
[PATCH] target: Fix NULL dereference during LUN lookup + active I/O shutdown
From: Nicholas Bellinger When transport_clear_lun_ref() is shutting down a se_lun via configfs with new I/O in-flight, it's possible to trigger a NULL pointer dereference in transport_lookup_cmd_lun() due to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD checking before incrementing lun->lun_ref.count after lun->lun_ref has switched to atomic_t mode. This results in a NULL pointer dereference as LUN shutdown code in core_tpg_remove_lun() continues running after the existing ->release() -> core_tpg_lun_ref_release() callback completes, and clears the RCU protected se_lun->lun_se_dev pointer. During the OOPs, the state of lun->lun_ref in the process which triggered the NULL pointer dereference looks like the following on v4.1.y stable code: struct se_lun { lun_link_magic = 4294932337, lun_status = TRANSPORT_LUN_STATUS_FREE, . lun_se_dev = 0x0, lun_sep = 0x0, . lun_ref = { count = { counter = 1 }, percpu_count_ptr = 3, release = 0xa02fa1e0 , confirm_switch = 0x0, force_atomic = false, rcu = { next = 0x88154fa1a5d0, func = 0x8137c4c0 } } } To address this bug, use percpu_ref_tryget_live() to ensure once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref has switched to atomic_t, all new I/Os will fail to obtain a new lun->lun_ref reference. Also use an explicit percpu_ref_kill_and_confirm() callback to block on ->lun_ref_comp to allow the first stage and associated RCU grace period to complete, and then block on ->lun_ref_shutdown waiting for the final percpu_ref_put() to drop the last reference via transport_lun_remove_cmd() before continuing with core_tpg_remove_lun() shutdown. Reported-by: Rob Millner Tested-by: Rob Millner Cc: Rob Millner Tested-by: Vaibhav Tandon Cc: Vaibhav Tandon Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_device.c| 10 -- drivers/target/target_core_tpg.c | 3 ++- drivers/target/target_core_transport.c | 31 ++- include/target/target_core_base.h | 1 + 4 files changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index cb7047d..c754ae3 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -78,12 +78,16 @@ &deve->read_bytes); se_lun = rcu_dereference(deve->se_lun); + + if (!percpu_ref_tryget_live(&se_lun->lun_ref)) { + se_lun = NULL; + goto out_unlock; + } + se_cmd->se_lun = rcu_dereference(deve->se_lun); se_cmd->pr_res_key = deve->pr_res_key; se_cmd->orig_fe_lun = unpacked_lun; se_cmd->se_cmd_flags |= SCF_SE_LUN_CMD; - - percpu_ref_get(&se_lun->lun_ref); se_cmd->lun_ref_active = true; if ((se_cmd->data_direction == DMA_TO_DEVICE) && @@ -97,6 +101,7 @@ goto ref_dev; } } +out_unlock: rcu_read_unlock(); if (!se_lun) { @@ -815,6 +820,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name) xcopy_lun = &dev->xcopy_lun; rcu_assign_pointer(xcopy_lun->lun_se_dev, dev); init_completion(&xcopy_lun->lun_ref_comp); + init_completion(&xcopy_lun->lun_shutdown_comp); INIT_LIST_HEAD(&xcopy_lun->lun_deve_list); INIT_LIST_HEAD(&xcopy_lun->lun_dev_link); mutex_init(&xcopy_lun->lun_tg_pt_md_mutex); diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index d99752c..2744251 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -445,7 +445,7 @@ static void core_tpg_lun_ref_release(struct percpu_ref *ref) { struct se_lun *lun = container_of(ref, struct se_lun, lun_ref); - complete(&lun->lun_ref_comp); + complete(&lun->lun_shutdown_comp); } int core_tpg_register( @@ -571,6 +571,7 @@ struct se_lun *core_tpg_alloc_lun( lun->lun_link_magic = SE_LUN_LINK_MAGIC; atomic_set(&lun->lun_acl_count, 0); init_completion(&lun->lun_ref_comp); + init_completion(&lun->lun_shutdown_comp); INIT_LIST_HEAD(&lun->lun_deve_list); INIT_LIST_HEAD(&lun->lun_dev_link); atomic_set(&lun->lun_tg_pt_secondary_offline, 0); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index efb9e6f..434d9d6 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2700,10 +2700,39 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) } EXPORT_SYMBOL(target_wait_for_sess_cmds); +static void target_lun_confirm(struct percpu_ref *ref) +{ + struct se_lun *lun = container_of(ref, struct se_lun, lun_ref); + + compl
Re: [PATCH v3 4/9] vhost/scsi: Don't reinvent the wheel but use existing llist API
Hi Byungchul, On Tue, 2017-02-14 at 16:26 +0900, Byungchul Park wrote: > Although llist provides proper APIs, they are not used. Make them used. > > Signed-off-by: Byungchul Park > --- > drivers/vhost/scsi.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) Acked-by: Nicholas Bellinger
Re: [PATCH] uapi: fix linux/target_core_user.h userspace compilation errors
Hi Dmitry, On Wed, 2017-02-15 at 23:04 +0300, Dmitry V. Levin wrote: > Consistently use types from linux/types.h to fix the following > linux/target_core_user.h userspace compilation errors: > > /usr/include/linux/target_core_user.h:108:4: error: unknown type name > 'uint32_t' > uint32_t iov_cnt; > /usr/include/linux/target_core_user.h:109:4: error: unknown type name > 'uint32_t' > uint32_t iov_bidi_cnt; > /usr/include/linux/target_core_user.h:110:4: error: unknown type name > 'uint32_t' > uint32_t iov_dif_cnt; > /usr/include/linux/target_core_user.h:111:4: error: unknown type name > 'uint64_t' > uint64_t cdb_off; > /usr/include/linux/target_core_user.h:112:4: error: unknown type name > 'uint64_t' > uint64_t __pad1; > /usr/include/linux/target_core_user.h:113:4: error: unknown type name > 'uint64_t' > uint64_t __pad2; > /usr/include/linux/target_core_user.h:117:4: error: unknown type name > 'uint8_t' > uint8_t scsi_status; > /usr/include/linux/target_core_user.h:118:4: error: unknown type name > 'uint8_t' > uint8_t __pad1; > /usr/include/linux/target_core_user.h:119:4: error: unknown type name > 'uint16_t' > uint16_t __pad2; > /usr/include/linux/target_core_user.h:120:4: error: unknown type name > 'uint32_t' > uint32_t __pad3; > > Signed-off-by: Dmitry V. Levin > --- > include/uapi/linux/target_core_user.h | 22 +++--- > 1 file changed, 11 insertions(+), 11 deletions(-) Applied to target-pending/for-next. Thank you.
[GIT PULL] target fixes for v4.10
Hello Linus, The following target series for v4.10 contains fixes which address a few long-standing bugs that DATERA's QA + automation teams have uncovered while putting v4.1.y target code into production usage. We've been running the top three in our nightly automated regression runs for the last two months, and the COMPARE_AND_WRITE fix Mr. Gary Guo has been manually verifying against a four node ESX cluster this past week. Note all of them have CC' stable tags. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master This includes: - Fix a bug with ESX EXTENDED_COPY + SAM_STAT_RESERVATION_CONFLICT status, where target_core_xcopy.c logic was incorrectly returning SAM_STAT_CHECK_CONDITION for all non SAM_STAT_GOOD cases. (Nixon Vincent) - Fix a TMR LUN_RESET hung task bug while other in-flight TMRs are being aborted, before the new one had been dispatched into tmr_wq. (Rob Millner) - Fix a long standing double free OOPs, where a dynamically generated 'demo-mode' NodeACL has multiple sessions associated with it, and the /sys/kernel/config/target/$FABRIC/$WWN/ subsequently disables demo-mode, but never converts the dynamic ACL into a explicit ACL. (Rob Millner) - Fix a long standing reference leak with ESX VAAI COMPARE_AND_WRITE when the second phase WRITE COMMIT command fails, resulting in CHECK_CONDITION response never being sent and se_cmd->cmd_kref never reaching zero. (Gary Guo) Beyond these items on v4.1.y we've reproduced, fixed, and run through our regression test suite using iscsi-target exports, there are two additional outstanding list items: - Remove a >= v4.2 RCU conversion BUG_ON that would trigger when dynamic node NodeACLs where being converted to explicit NodeACLs. The patch drops the BUG_ON to follow how pre RCU conversion worked for this special case. (Benjamin Estrabaud) - Add ibmvscsis target_core_fabric_ops->max_data_sg_nent assignment to match what IBM's Virtual SCSI hypervisor is already enforcing at transport layer. (Bryant Ly + Steven Royer). Thank you, --nab Bryant G. Ly (1): ibmvscsis: Add SGL limit Nicholas Bellinger (5): target: Don't BUG_ON during NodeACL dynamic -> explicit conversion target: Use correct SCSI status during EXTENDED_COPY exception target: Fix early transport_generic_handle_tmr abort scenario target: Fix multi-session dynamic se_node_acl double free OOPs target: Fix COMPARE_AND_WRITE ref leak for non GOOD status drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 1 + drivers/target/target_core_device.c | 10 +++- drivers/target/target_core_sbc.c | 8 ++- drivers/target/target_core_transport.c | 86 +--- drivers/target/target_core_xcopy.c | 2 +- include/target/target_core_base.h| 1 + 6 files changed, 76 insertions(+), 32 deletions(-)
Re: [PATCH 1/5] target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
On Tue, 2017-02-07 at 14:44 -0800, Christoph Hellwig wrote: > On Tue, Feb 07, 2017 at 01:17:46PM +0000, Nicholas A. Bellinger wrote: > > + if (orig->se_lun_acl != NULL) { > > + pr_warn_ratelimited("Detected existing explicit" > > + " se_lun_acl->se_lun_group reference for %s" > > + " mapped_lun: %llu, ignoring\n", > > +nacl->initiatorname, mapped_lun); > > The ignoring in the message confused the heck out of me first. But it > seems that's just an incorrect leftover from the original message, as the > changelog also says fail instead. With that fixed up (and maybe the > whole message in a single string literal on a single line): > Fixed up the message to use 'failed'.
Re: [PATCH 4/5] target: Fix multi-session dynamic se_node_acl double free OOPs
On Tue, 2017-02-07 at 19:46 -0800, Nicholas A. Bellinger wrote: > On Tue, 2017-02-07 at 15:12 -0800, Christoph Hellwig wrote: > > And the real patch after compile fixing it is here of course: > > > > Getting rid of the extra se_node_acl->acl_free_comp seems to make sense > here.. > > The only potential issue is if returning from configfs syscall > rmdir /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/acls/$INITIATOR/ > before se_node_acl memory is released has implications when the > underlying ../$WWN/$TPGT/ is removed immediately after. > > In any event, I'll verify this patch with the original test case and use > it instead if everything looks OK. Ok, I remember why se_node_acl->acl_free_comp is needed now.. The issue is when se_node_acl is being explicitly removed from configfs while se_sessions associated with se_node_acl are still active. The se_node_acl->acl_group configfs_group_operations->drop_item() callback must block until all associated se_sessions have done their target_put_nacl() and completed se_node_acl->acl_free_comp, otherwise there is nothing preventing parent config_groups of se_node_acl from being removed while the se_sessions are still active. That said, dropping the unnecessary list_del_init() usage, but otherwise keeping the patch as-is.
Re: [PATCH 4/5] target: Fix multi-session dynamic se_node_acl double free OOPs
On Tue, 2017-02-07 at 15:12 -0800, Christoph Hellwig wrote: > And the real patch after compile fixing it is here of course: > Getting rid of the extra se_node_acl->acl_free_comp seems to make sense here.. The only potential issue is if returning from configfs syscall rmdir /sys/kernel/config/target/$FABRIC/$WWN/$TPGT/acls/$INITIATOR/ before se_node_acl memory is released has implications when the underlying ../$WWN/$TPGT/ is removed immediately after. In any event, I'll verify this patch with the original test case and use it instead if everything looks OK. > diff --git a/drivers/target/target_core_internal.h > b/drivers/target/target_core_internal.h > index 9ab7090f7c83..96c38f30069d 100644 > --- a/drivers/target/target_core_internal.h > +++ b/drivers/target/target_core_internal.h > @@ -152,6 +152,7 @@ void target_qf_do_work(struct work_struct *work); > bool target_check_wce(struct se_device *dev); > bool target_check_fua(struct se_device *dev); > void __target_execute_cmd(struct se_cmd *, bool); > +void target_put_nacl(struct se_node_acl *); > > /* target_core_stat.c */ > void target_stat_setup_dev_default_groups(struct se_device *); > diff --git a/drivers/target/target_core_tpg.c > b/drivers/target/target_core_tpg.c > index d99752c6cd60..08738c87e5b0 100644 > --- a/drivers/target/target_core_tpg.c > +++ b/drivers/target/target_core_tpg.c > @@ -197,7 +197,6 @@ static struct se_node_acl *target_alloc_node_acl(struct > se_portal_group *tpg, > INIT_LIST_HEAD(&acl->acl_sess_list); > INIT_HLIST_HEAD(&acl->lun_entry_hlist); > kref_init(&acl->acl_kref); > - init_completion(&acl->acl_free_comp); > spin_lock_init(&acl->nacl_sess_lock); > mutex_init(&acl->lun_entry_mutex); > atomic_set(&acl->acl_pr_ref_count, 0); > @@ -370,21 +369,6 @@ void core_tpg_del_initiator_node_acl(struct se_node_acl > *acl) > target_shutdown_sessions(acl); > > target_put_nacl(acl); > - /* > - * Wait for last target_put_nacl() to complete in target_complete_nacl() > - * for active fabric session transport_deregister_session() callbacks. > - */ > - wait_for_completion(&acl->acl_free_comp); > - > - core_tpg_wait_for_nacl_pr_ref(acl); > - core_free_device_list_for_node(acl, tpg); > - > - pr_debug("%s_TPG[%hu] - Deleted ACL with TCQ Depth: %d for %s" > - " Initiator Node: %s\n", tpg->se_tpg_tfo->get_fabric_name(), > - tpg->se_tpg_tfo->tpg_get_tag(tpg), acl->queue_depth, > - tpg->se_tpg_tfo->get_fabric_name(), acl->initiatorname); > - > - kfree(acl); > } > > /* core_tpg_set_initiator_node_queue_depth(): > diff --git a/drivers/target/target_core_transport.c > b/drivers/target/target_core_transport.c > index 1cadc9eefa21..9ebd86a8ef41 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -453,19 +453,25 @@ ssize_t target_show_dynamic_sessions(struct > se_portal_group *se_tpg, char *page) > } > EXPORT_SYMBOL(target_show_dynamic_sessions); > > -static void target_complete_nacl(struct kref *kref) > +static void target_destroy_nacl(struct kref *kref) > { > struct se_node_acl *nacl = container_of(kref, > struct se_node_acl, acl_kref); > + struct se_portal_group *se_tpg = nacl->se_tpg; > > - complete(&nacl->acl_free_comp); > + mutex_lock(&se_tpg->acl_node_mutex); > + list_del(&nacl->acl_list); > + mutex_unlock(&se_tpg->acl_node_mutex); > + > + core_tpg_wait_for_nacl_pr_ref(nacl); > + core_free_device_list_for_node(nacl, se_tpg); > + kfree(nacl); > } > > void target_put_nacl(struct se_node_acl *nacl) > { > - kref_put(&nacl->acl_kref, target_complete_nacl); > + kref_put(&nacl->acl_kref, target_destroy_nacl); > } > -EXPORT_SYMBOL(target_put_nacl); > > void transport_deregister_session_configfs(struct se_session *se_sess) > { > @@ -499,12 +505,40 @@ EXPORT_SYMBOL(transport_deregister_session_configfs); > void transport_free_session(struct se_session *se_sess) > { > struct se_node_acl *se_nacl = se_sess->se_node_acl; > + > /* >* Drop the se_node_acl->nacl_kref obtained from within >* core_tpg_get_initiator_node_acl(). >*/ > if (se_nacl) { > + struct se_portal_group *se_tpg = se_nacl->se_tpg; > + const struct target_core_fabric_ops *se_tfo = > se_tpg->se_tpg_tfo; > + unsigned long flags; > + bool stop = false; > + > se_sess->se_node_acl = NULL; > + > + /* > + * Also determine if we need to drop the extra ->cmd_kref if > + * it had been previously dynamically generated, and > + * the endpoint is not caching dynamic ACLs. > + */ > + mutex_lock(&se_tpg->acl_node_mutex); > + if (se_nacl->dynamic_node_acl && > + !se_tfo->tpg_check_demo_mode_cache(se_tpg)) { >
[PATCH 1/5] target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
From: Nicholas Bellinger After the v4.2+ RCU conversion to se_node_acl->lun_entry_hlist, a BUG_ON() was added in core_enable_device_list_for_node() to detect when the passed *lun does not match the existing orig->se_lun pointer reference. However, this scenario can occur happen when a dynamically generated NodeACL is being converted to an explicit NodeACL, when the explicit NodeACL contains a different LUN mapping than the default provided by the WWN endpoint. So instead of triggering BUG_ON(), go ahead and fail instead following the original pre RCU conversion logic. Reported-by: Benjamin ESTRABAUD Cc: Benjamin ESTRABAUD Cc: sta...@vger.kernel.org # 4.2+ Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_device.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 1ebd13e..23e89af 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -345,14 +345,22 @@ int core_enable_device_list_for_node( lockdep_is_held(&nacl->lun_entry_mutex)); if (orig_lun != lun) { - pr_err("Existing orig->se_lun doesn't match new lun" - " for dynamic -> explicit NodeACL conversion:" - " %s\n", nacl->initiatorname); + pr_warn_ratelimited("Existing orig->se_lun doesn't match" + " new lun for dynamic -> explicit NodeACL" + " conversion: %s\n", nacl->initiatorname); + mutex_unlock(&nacl->lun_entry_mutex); + kfree(new); + return -EINVAL; + } + if (orig->se_lun_acl != NULL) { + pr_warn_ratelimited("Detected existing explicit" + " se_lun_acl->se_lun_group reference for %s" + " mapped_lun: %llu, ignoring\n", +nacl->initiatorname, mapped_lun); mutex_unlock(&nacl->lun_entry_mutex); kfree(new); return -EINVAL; } - BUG_ON(orig->se_lun_acl != NULL); rcu_assign_pointer(new->se_lun, lun); rcu_assign_pointer(new->se_lun_acl, lun_acl); -- 1.9.1
[PATCH 2/5] target: Use correct SCSI status during EXTENDED_COPY exception
From: Nicholas Bellinger This patch adds the missing target_complete_cmd() SCSI status parameter change in target_xcopy_do_work(), that was originally missing in commit 926317de33. It correctly propigates up the correct SCSI status during EXTENDED_COPY exception cases, instead of always using the hardcoded SAM_STAT_CHECK_CONDITION from original code. This is required for ESX host environments that expect to hit SAM_STAT_RESERVATION_CONFLICT for certain scenarios, and SAM_STAT_CHECK_CONDITION results in non-retriable status for these cases. Reported-by: Nixon Vincent Tested-by: Nixon Vincent Cc: Nixon Vincent Cc: sta...@vger.kernel.org # 3.14+ Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_xcopy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c index d828b3b..cac5a20 100644 --- a/drivers/target/target_core_xcopy.c +++ b/drivers/target/target_core_xcopy.c @@ -864,7 +864,7 @@ static void target_xcopy_do_work(struct work_struct *work) " CHECK_CONDITION -> sending response\n", rc); ec_cmd->scsi_status = SAM_STAT_CHECK_CONDITION; } - target_complete_cmd(ec_cmd, SAM_STAT_CHECK_CONDITION); + target_complete_cmd(ec_cmd, ec_cmd->scsi_status); } sense_reason_t target_do_xcopy(struct se_cmd *se_cmd) -- 1.9.1