Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
Hi! On 01/17/2017 03:39 PM, Joseph Qi wrote: On 17/1/17 14:30, Eric Ren wrote: We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); +INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ +INIT_LIST_HEAD(>oh_list); +oh->oh_owner_pid = get_pid(task_pid(current)); Trim the redundant space here. You mean the blank line here? If so, I am OK to make the change. But, I'm little worried about people may feel annoyed if I send the v4 patch set, hah. Others look good to me. Reviewed-by: Joseph Qi Thanks for your guys' review! Eric + +spin_lock(>l_lock); +list_add_tail(>oh_list, >l_holders); +spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ +spin_lock(>l_lock); +list_del(>oh_list); +spin_unlock(>l_lock); + +put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct
Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
Hi! On 01/17/2017 03:39 PM, Joseph Qi wrote: On 17/1/17 14:30, Eric Ren wrote: We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); +INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ +INIT_LIST_HEAD(>oh_list); +oh->oh_owner_pid = get_pid(task_pid(current)); Trim the redundant space here. You mean the blank line here? If so, I am OK to make the change. But, I'm little worried about people may feel annoyed if I send the v4 patch set, hah. Others look good to me. Reviewed-by: Joseph Qi Thanks for your guys' review! Eric + +spin_lock(>l_lock); +list_add_tail(>oh_list, >l_holders); +spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ +spin_lock(>l_lock); +list_del(>oh_list); +spin_unlock(>l_lock); + +put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ +struct ocfs2_lock_holder *oh; +struct pid *pid; + +/* look in the list of
Re: [PATCH 7/8] Revert "ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp"
On Mon 16-01-17 22:01:18, Theodore Ts'o wrote: > On Fri, Jan 06, 2017 at 03:11:06PM +0100, Michal Hocko wrote: > > From: Michal Hocko> > > > This reverts commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 because > > sb_getblk_gfp is not really needed as > > sb_getblk > > __getblk_gfp > > __getblk_slow > > grow_buffers > > grow_dev_page > > gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp > > > > so __GFP_FS is cleared unconditionally and therefore the above commit > > didn't have any real effect in fact. > > > > This patch should not introduce any functional change. The main point > > of this change is to reduce explicit GFP_NOFS usage inside ext4 code to > > make the review of the remaining usage easier. > > > > Signed-off-by: Michal Hocko > > Reviewed-by: Jan Kara > > If I'm not mistaken, this patch is not dependent on any of the other > patches in this series (and the other patches are not dependent on > this one). Hence, I could take this patch via the ext4 tree, correct? Yes, that is correct -- Michal Hocko SUSE Labs
Re: [PATCH v3 2/2] tpm: enhance TPM 2.0 PCR extend to support multiple banks
On 01/12/2017 11:50 PM, Jarkko Sakkinen wrote: On Thu, Jan 12, 2017 at 11:58:10AM -0500, Nayna Jain wrote: The current TPM 2.0 device driver extends only the SHA1 PCR bank but the TCG Specification[1] recommends extending all active PCR banks, to prevent malicious users from setting unused PCR banks with fake measurements and quoting them. The existing in-kernel interface(tpm_pcr_extend()) expects only a SHA1 digest. To extend all active PCR banks with differing digest sizes, the SHA1 digest is padded with trailing 0's as needed. [1] TPM 2.0 Specification referred here is "TCG PC Client Specific Platform Firmware Profile for TPM 2.0" Signed-off-by: Nayna Jain--- drivers/char/tpm/Kconfig | 1 + drivers/char/tpm/tpm-interface.c | 16 +- drivers/char/tpm/tpm.h | 3 +- drivers/char/tpm/tpm2-cmd.c | 68 +++- drivers/char/tpm/tpm_eventlog.h | 18 +++ 5 files changed, 75 insertions(+), 31 deletions(-) diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 277186d..af985cc 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -6,6 +6,7 @@ menuconfig TCG_TPM tristate "TPM Hardware Support" depends on HAS_IOMEM select SECURITYFS + select CRYPTO_HASH_INFO In the commit message you did not mention this. ---help--- If you have a TPM security chip in your system, which implements the Trusted Computing Group's specification, diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index fecdd3f..e037dd2 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -7,6 +7,7 @@ * Dave Safford * Reiner Sailer * Kylene Hall + * Nayna Jain Remove. * * Maintained by: * @@ -759,6 +760,7 @@ static const struct tpm_input_header pcrextend_header = { int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash) { struct tpm_cmd_t cmd; + int i; int rc; struct tpm_chip *chip; @@ -767,7 +769,19 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash) return -ENODEV; if (chip->flags & TPM_CHIP_FLAG_TPM2) { - rc = tpm2_pcr_extend(chip, pcr_idx, hash); + struct tpml_digest_values d_values; + + memset(_values, 0, sizeof(d_values)); + + for (i = 0; (chip->active_banks[i] != 0) && +(i < ARRAY_SIZE(chip->active_banks)); i++) { + d_values.digests[i].alg_id = chip->active_banks[i]; + memcpy(d_values.digests[i].digest, hash, + TPM_DIGEST_SIZE); + d_values.count++; + } + + rc = tpm2_pcr_extend(chip, pcr_idx, _values); tpm_put_ops(chip); return rc; } diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 573..dd82d58 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -533,7 +533,8 @@ static inline void tpm_add_ppi(struct tpm_chip *chip) #endif int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf); -int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash); +int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, + struct tpml_digest_values *digests); int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max); int tpm2_seal_trusted(struct tpm_chip *chip, struct trusted_key_payload *payload, diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 87388921..5027a54 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -64,9 +64,7 @@ struct tpm2_pcr_extend_in { __be32 pcr_idx; __be32 auth_area_size; struct tpm2_null_auth_area auth_area; - __be32 digest_cnt; - __be16 hash_alg; - u8 digest[TPM_DIGEST_SIZE]; + struct tpml_digest_values digests; } __packed; struct tpm2_get_tpm_pt_in { @@ -296,46 +294,58 @@ int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf) return rc; } -#define TPM2_GET_PCREXTEND_IN_SIZE \ - (sizeof(struct tpm_input_header) + \ -sizeof(struct tpm2_pcr_extend_in)) - -static const struct tpm_input_header tpm2_pcrextend_header = { - .tag = cpu_to_be16(TPM2_ST_SESSIONS), - .length = cpu_to_be32(TPM2_GET_PCREXTEND_IN_SIZE), - .ordinal = cpu_to_be32(TPM2_CC_PCR_EXTEND) -}; - /** * tpm2_pcr_extend() - extend a PCR value * * @chip: TPM chip to use. * @pcr_idx: index of the PCR. - * @hash: hash value
Re: [PATCH 7/8] Revert "ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp"
On Mon 16-01-17 22:01:18, Theodore Ts'o wrote: > On Fri, Jan 06, 2017 at 03:11:06PM +0100, Michal Hocko wrote: > > From: Michal Hocko > > > > This reverts commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 because > > sb_getblk_gfp is not really needed as > > sb_getblk > > __getblk_gfp > > __getblk_slow > > grow_buffers > > grow_dev_page > > gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp > > > > so __GFP_FS is cleared unconditionally and therefore the above commit > > didn't have any real effect in fact. > > > > This patch should not introduce any functional change. The main point > > of this change is to reduce explicit GFP_NOFS usage inside ext4 code to > > make the review of the remaining usage easier. > > > > Signed-off-by: Michal Hocko > > Reviewed-by: Jan Kara > > If I'm not mistaken, this patch is not dependent on any of the other > patches in this series (and the other patches are not dependent on > this one). Hence, I could take this patch via the ext4 tree, correct? Yes, that is correct -- Michal Hocko SUSE Labs
Re: [PATCH v3 2/2] tpm: enhance TPM 2.0 PCR extend to support multiple banks
On 01/12/2017 11:50 PM, Jarkko Sakkinen wrote: On Thu, Jan 12, 2017 at 11:58:10AM -0500, Nayna Jain wrote: The current TPM 2.0 device driver extends only the SHA1 PCR bank but the TCG Specification[1] recommends extending all active PCR banks, to prevent malicious users from setting unused PCR banks with fake measurements and quoting them. The existing in-kernel interface(tpm_pcr_extend()) expects only a SHA1 digest. To extend all active PCR banks with differing digest sizes, the SHA1 digest is padded with trailing 0's as needed. [1] TPM 2.0 Specification referred here is "TCG PC Client Specific Platform Firmware Profile for TPM 2.0" Signed-off-by: Nayna Jain --- drivers/char/tpm/Kconfig | 1 + drivers/char/tpm/tpm-interface.c | 16 +- drivers/char/tpm/tpm.h | 3 +- drivers/char/tpm/tpm2-cmd.c | 68 +++- drivers/char/tpm/tpm_eventlog.h | 18 +++ 5 files changed, 75 insertions(+), 31 deletions(-) diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 277186d..af985cc 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -6,6 +6,7 @@ menuconfig TCG_TPM tristate "TPM Hardware Support" depends on HAS_IOMEM select SECURITYFS + select CRYPTO_HASH_INFO In the commit message you did not mention this. ---help--- If you have a TPM security chip in your system, which implements the Trusted Computing Group's specification, diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index fecdd3f..e037dd2 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -7,6 +7,7 @@ * Dave Safford * Reiner Sailer * Kylene Hall + * Nayna Jain Remove. * * Maintained by: * @@ -759,6 +760,7 @@ static const struct tpm_input_header pcrextend_header = { int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash) { struct tpm_cmd_t cmd; + int i; int rc; struct tpm_chip *chip; @@ -767,7 +769,19 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash) return -ENODEV; if (chip->flags & TPM_CHIP_FLAG_TPM2) { - rc = tpm2_pcr_extend(chip, pcr_idx, hash); + struct tpml_digest_values d_values; + + memset(_values, 0, sizeof(d_values)); + + for (i = 0; (chip->active_banks[i] != 0) && +(i < ARRAY_SIZE(chip->active_banks)); i++) { + d_values.digests[i].alg_id = chip->active_banks[i]; + memcpy(d_values.digests[i].digest, hash, + TPM_DIGEST_SIZE); + d_values.count++; + } + + rc = tpm2_pcr_extend(chip, pcr_idx, _values); tpm_put_ops(chip); return rc; } diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 573..dd82d58 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -533,7 +533,8 @@ static inline void tpm_add_ppi(struct tpm_chip *chip) #endif int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf); -int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash); +int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, + struct tpml_digest_values *digests); int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max); int tpm2_seal_trusted(struct tpm_chip *chip, struct trusted_key_payload *payload, diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 87388921..5027a54 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -64,9 +64,7 @@ struct tpm2_pcr_extend_in { __be32 pcr_idx; __be32 auth_area_size; struct tpm2_null_auth_area auth_area; - __be32 digest_cnt; - __be16 hash_alg; - u8 digest[TPM_DIGEST_SIZE]; + struct tpml_digest_values digests; } __packed; struct tpm2_get_tpm_pt_in { @@ -296,46 +294,58 @@ int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf) return rc; } -#define TPM2_GET_PCREXTEND_IN_SIZE \ - (sizeof(struct tpm_input_header) + \ -sizeof(struct tpm2_pcr_extend_in)) - -static const struct tpm_input_header tpm2_pcrextend_header = { - .tag = cpu_to_be16(TPM2_ST_SESSIONS), - .length = cpu_to_be32(TPM2_GET_PCREXTEND_IN_SIZE), - .ordinal = cpu_to_be32(TPM2_CC_PCR_EXTEND) -}; - /** * tpm2_pcr_extend() - extend a PCR value * * @chip: TPM chip to use. * @pcr_idx: index of the PCR. - * @hash: hash value to use for the extend operation. + * @digests: list of pcr banks and corresponding hash values to be extended. * * Return: Same as with
Re: [PATCH RESEND] coredump: Ensure proper size of sparse core files
On Mon, Jan 16, 2017 at 10:42:43PM -0500, David Miller wrote: > The lseek() done by dump_skip() should extend the file properly. lseek never extends a file. It just moves the current file position. So if you do not have a write after the lseek it does nothing.
Re: [PATCH RESEND] coredump: Ensure proper size of sparse core files
On Mon, Jan 16, 2017 at 10:42:43PM -0500, David Miller wrote: > The lseek() done by dump_skip() should extend the file properly. lseek never extends a file. It just moves the current file position. So if you do not have a write after the lseek it does nothing.
Re: [PATCH v2 2/3] spi: pxa2xx: Prepare for edge-triggered interrupts
Jan Kiszkawrites: > When using the a device with edge-triggered interrupts, such as MSIs, > the interrupt handler has to ensure that there is a point in time during > its execution where all interrupts sources are silent so that a new > event can trigger a new interrupt again. > > This is achieved here by looping over SSSR evaluation. We need to take > into account that SSCR1 may be changed by the transfer handler, thus we > need to redo the mask calculation, at least regarding the volatile > interrupt enable bit (TIE). > > Signed-off-by: Jan Kiszka Hi Jan, > + while (1) { This bit worries me a bit, as this can be either : - hogging the SoC's CPU, endlessly running - or even worse, blocking the CPU for ever The question behind is, should this be done in a top-half, or moved to a irq thread ? > + /* Ignore possible writes if we don't need to write */ > + if (!(sccr1_reg & SSCR1_TIE)) > + mask &= ~SSSR_TFS; > > - if (!drv_data->master->cur_msg) { > - handle_bad_msg(drv_data); > - /* Never fail */ > - return IRQ_HANDLED; > - } > + if (!(status & mask)) > + return ret; > + > + if (!drv_data->master->cur_msg) { > + handle_bad_msg(drv_data); > + /* Never fail */ > + return IRQ_HANDLED; > + } > + > + ret |= drv_data->transfer_handler(drv_data); Mmm that looks weird to me, oring a irqreturn. Imagine that on first iteration the handler returns IRQ_NONE, and on second IRQ_HANDLED. This makes ret IRQ_HANDLED. Yet after the first iteration the handler should have exited, especially if the interrupt is shared with another driver. Another thing which is along what Andy already said : it would be better practice to have this loop in the form : do { ... } while (exit_condition_not_met); Just for maintainability, it's better, and it concentrates the test on the "exit_condition_not_met" in one place, which will enable us to review better the algorithm. Cheers. -- Robert
Re: [PATCH v2 2/3] spi: pxa2xx: Prepare for edge-triggered interrupts
Jan Kiszka writes: > When using the a device with edge-triggered interrupts, such as MSIs, > the interrupt handler has to ensure that there is a point in time during > its execution where all interrupts sources are silent so that a new > event can trigger a new interrupt again. > > This is achieved here by looping over SSSR evaluation. We need to take > into account that SSCR1 may be changed by the transfer handler, thus we > need to redo the mask calculation, at least regarding the volatile > interrupt enable bit (TIE). > > Signed-off-by: Jan Kiszka Hi Jan, > + while (1) { This bit worries me a bit, as this can be either : - hogging the SoC's CPU, endlessly running - or even worse, blocking the CPU for ever The question behind is, should this be done in a top-half, or moved to a irq thread ? > + /* Ignore possible writes if we don't need to write */ > + if (!(sccr1_reg & SSCR1_TIE)) > + mask &= ~SSSR_TFS; > > - if (!drv_data->master->cur_msg) { > - handle_bad_msg(drv_data); > - /* Never fail */ > - return IRQ_HANDLED; > - } > + if (!(status & mask)) > + return ret; > + > + if (!drv_data->master->cur_msg) { > + handle_bad_msg(drv_data); > + /* Never fail */ > + return IRQ_HANDLED; > + } > + > + ret |= drv_data->transfer_handler(drv_data); Mmm that looks weird to me, oring a irqreturn. Imagine that on first iteration the handler returns IRQ_NONE, and on second IRQ_HANDLED. This makes ret IRQ_HANDLED. Yet after the first iteration the handler should have exited, especially if the interrupt is shared with another driver. Another thing which is along what Andy already said : it would be better practice to have this loop in the form : do { ... } while (exit_condition_not_met); Just for maintainability, it's better, and it concentrates the test on the "exit_condition_not_met" in one place, which will enable us to review better the algorithm. Cheers. -- Robert
Re: [PATCH v1 00/54] block: support multipage bvec
On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote: > IMO, the only one left is raid(1/5/10) which can be dealt with by the > "NO_MP" flag. No, they can't in the long run. They need to handle the bvecs just like everyone else.
Re: [PATCH v1 00/54] block: support multipage bvec
On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote: > IMO, the only one left is raid(1/5/10) which can be dealt with by the > "NO_MP" flag. No, they can't in the long run. They need to handle the bvecs just like everyone else.
Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
On Mon 16-01-17 13:57:43, John Hubbard wrote: > > > On 01/16/2017 01:48 PM, Michal Hocko wrote: > > On Mon 16-01-17 13:15:08, John Hubbard wrote: > > > > > > > > > On 01/16/2017 11:40 AM, Michal Hocko wrote: > > > > On Mon 16-01-17 11:09:37, John Hubbard wrote: > > > > > > > > > > > > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote: > > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote: > > > > [...] > > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover > > > > > > > from an > > > > > > > earlier approach? I am having trouble reconciling it with rest of > > > > > > > the > > > > > > > patchset, because: > > > > > > > > > > > > > > a) the flags argument below is effectively passed on to either > > > > > > > kmalloc_node > > > > > > > (possibly adding, but not removing flags), or to > > > > > > > __vmalloc_node_flags. > > > > > > > > > > > > The above only says thos are _unsupported_ - in other words the > > > > > > behavior > > > > > > is not defined. Even if flags are passed down to kmalloc resp. > > > > > > vmalloc > > > > > > it doesn't mean they are used that way. Remember that vmalloc uses > > > > > > some hardcoded GFP_KERNEL allocations. So while I could be really > > > > > > strict about this and mask away these flags I doubt this is worth > > > > > > the > > > > > > additional code. > > > > > > > > > > I do wonder about passing those flags through to kmalloc. Maybe it is > > > > > worth > > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides > > > > > some > > > > > insulation from any future changes to the implementation of kmalloc, > > > > > and it > > > > > also makes the documentation more believable. > > > > > > > > I am not really convinced that we should take an extra steps for these > > > > flags. There are no existing users for those flags and new users should > > > > follow the documentation. > > > > > > OK, let's just fortify the documentation ever so slightly, then, so that > > > users are more likely to do the right thing. How's this sound: > > > > > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. > > > (Even > > > * though the current implementation passes the flags on through to > > > kmalloc and > > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The > > > caller > > > * should not pass in these flags.) > > > * > > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations. > > > > > > > > > ? Or is that documentation overkill? > > > > Dunno, it sounds like an overkill to me. It is telling more than > > necessary. If we want to be so vocal about gfp flags then we would have > > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is > > supported for vmalloc while unsupported for kmalloc. I am pretty sure > > there would be other gfp flags to consider and then this would grow > > borringly large and uninteresting to the point when people simply stop > > reading it. Let's just be as simple as possible. > > Agreed, on the simplicity point: simple and clear is ideal. But here, it's > merely short, and not quite simple. :) People will look at that short bit > of documentation, and then notice that the flags are, in fact, all passed > right on through down to both kmalloc_node and __vmalloc_node_flags. > > If you don't want too much documentation, then I'd be inclined to say > something higher-level, about the intent, rather than mentioning those two > flags directly. Because as it stands, the documentation contradicts what the > code does. Feel free to suggest a better wording. I am, of course, open to any changes. -- Michal Hocko SUSE Labs
Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers
On Mon 16-01-17 13:57:43, John Hubbard wrote: > > > On 01/16/2017 01:48 PM, Michal Hocko wrote: > > On Mon 16-01-17 13:15:08, John Hubbard wrote: > > > > > > > > > On 01/16/2017 11:40 AM, Michal Hocko wrote: > > > > On Mon 16-01-17 11:09:37, John Hubbard wrote: > > > > > > > > > > > > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote: > > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote: > > > > [...] > > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover > > > > > > > from an > > > > > > > earlier approach? I am having trouble reconciling it with rest of > > > > > > > the > > > > > > > patchset, because: > > > > > > > > > > > > > > a) the flags argument below is effectively passed on to either > > > > > > > kmalloc_node > > > > > > > (possibly adding, but not removing flags), or to > > > > > > > __vmalloc_node_flags. > > > > > > > > > > > > The above only says thos are _unsupported_ - in other words the > > > > > > behavior > > > > > > is not defined. Even if flags are passed down to kmalloc resp. > > > > > > vmalloc > > > > > > it doesn't mean they are used that way. Remember that vmalloc uses > > > > > > some hardcoded GFP_KERNEL allocations. So while I could be really > > > > > > strict about this and mask away these flags I doubt this is worth > > > > > > the > > > > > > additional code. > > > > > > > > > > I do wonder about passing those flags through to kmalloc. Maybe it is > > > > > worth > > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides > > > > > some > > > > > insulation from any future changes to the implementation of kmalloc, > > > > > and it > > > > > also makes the documentation more believable. > > > > > > > > I am not really convinced that we should take an extra steps for these > > > > flags. There are no existing users for those flags and new users should > > > > follow the documentation. > > > > > > OK, let's just fortify the documentation ever so slightly, then, so that > > > users are more likely to do the right thing. How's this sound: > > > > > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. > > > (Even > > > * though the current implementation passes the flags on through to > > > kmalloc and > > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The > > > caller > > > * should not pass in these flags.) > > > * > > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations. > > > > > > > > > ? Or is that documentation overkill? > > > > Dunno, it sounds like an overkill to me. It is telling more than > > necessary. If we want to be so vocal about gfp flags then we would have > > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is > > supported for vmalloc while unsupported for kmalloc. I am pretty sure > > there would be other gfp flags to consider and then this would grow > > borringly large and uninteresting to the point when people simply stop > > reading it. Let's just be as simple as possible. > > Agreed, on the simplicity point: simple and clear is ideal. But here, it's > merely short, and not quite simple. :) People will look at that short bit > of documentation, and then notice that the flags are, in fact, all passed > right on through down to both kmalloc_node and __vmalloc_node_flags. > > If you don't want too much documentation, then I'd be inclined to say > something higher-level, about the intent, rather than mentioning those two > flags directly. Because as it stands, the documentation contradicts what the > code does. Feel free to suggest a better wording. I am, of course, open to any changes. -- Michal Hocko SUSE Labs
Re: [PATCH v3 2/4] dt-bindings: Add TI SCI PM Domains
On 17/01/17 00:12, Dave Gerlach wrote: On 01/13/2017 08:40 PM, Rob Herring wrote: On Fri, Jan 13, 2017 at 2:28 PM, Dave Gerlachwrote: On 01/13/2017 01:25 PM, Rob Herring wrote: On Thu, Jan 12, 2017 at 9:27 AM, Dave Gerlach wrote: Rob, On 01/11/2017 03:34 PM, Rob Herring wrote: On Mon, Jan 9, 2017 at 11:57 AM, Dave Gerlach wrote: Rob, On 01/09/2017 11:50 AM, Rob Herring wrote: On Wed, Jan 04, 2017 at 02:55:34PM -0600, Dave Gerlach wrote: Add a generic power domain implementation, TI SCI PM Domains, that will hook into the genpd framework and allow the TI SCI protocol to control device power states. Also, provide macros representing each device index as understood by TI SCI to be used in the device node power-domain references. These are identifiers for the K2G devices managed by the PMMC. Signed-off-by: Nishanth Menon Signed-off-by: Dave Gerlach --- v2->v3: Update k2g_pds node docs to show it should be a child of pmmc node. In early versions a phandle was used to point to pmmc and docs still incorrectly showed this. .../devicetree/bindings/soc/ti/sci-pm-domain.txt | 59 ++ MAINTAINERS| 2 + include/dt-bindings/genpd/k2g.h| 90 ++ 3 files changed, 151 insertions(+) create mode 100644 Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt create mode 100644 include/dt-bindings/genpd/k2g.h diff --git a/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt new file mode 100644 index ..4c9064e512cb --- /dev/null +++ b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt @@ -0,0 +1,59 @@ +Texas Instruments TI-SCI Generic Power Domain +- + +Some TI SoCs contain a system controller (like the PMMC, etc...) that is +responsible for controlling the state of the IPs that are present. +Communication between the host processor running an OS and the system +controller happens through a protocol known as TI-SCI [1]. This pm domain +implementation plugs into the generic pm domain framework and makes use of +the TI SCI protocol power on and off each device when needed. + +[1] Documentation/devicetree/bindings/arm/keystone/ti,sci.txt + +PM Domain Node +== +The PM domain node represents the global PM domain managed by the PMMC, +which in this case is the single implementation as documented by the generic +PM domain bindings in Documentation/devicetree/bindings/power/power_domain.txt. +Because this relies on the TI SCI protocol to communicate with the PMMC it +must be a child of the pmmc node. + +Required Properties: + +- compatible: should be "ti,sci-pm-domain" +- #power-domain-cells: Must be 0. + +Example (K2G): +- + pmmc: pmmc { + compatible = "ti,k2g-sci"; + ... + + k2g_pds: k2g_pds { + compatible = "ti,sci-pm-domain"; + #power-domain-cells = <0>; + }; + }; + +PM Domain Consumers +=== +Hardware blocks that require SCI control over their state must provide +a reference to the sci-pm-domain they are part of and a unique device +specific ID that identifies the device. + +Required Properties: + +- power-domains: phandle pointing to the corresponding PM domain node. +- ti,sci-id: index representing the device id to be passed oevr SCI to +be used for device control. As I've already stated before, this goes in power-domain cells. When you have a single thing (i.e. node) that controls multiple things, then you you need to specify the ID for each of them in phandle args. This is how irqs, gpio, clocks, *everything* in DT works. You think the reasoning for doing it this way provided by both Ulf and myself on v2 [1] is not valid then? From Ulf: To me, the TI SCI ID, is similar to a "conid" for any another "device resource" (like clock, pinctrl, regulator etc) which we can describe in DT and assign to a device node. The only difference here, is that we don't have common API to fetch the resource (like clk_get(), regulator_get()), but instead we fetches the device's resource from SoC specific code, via genpd's device ->attach() callback. Sorry, but that sounds like a kernel problem to me and has nothing to do with DT bindings. From me: Yes, you've pretty much hit it on the head. It is not an index into a list of genpds but rather identifies the device *within* a single genpd. It is a property specific to each device that resides in a ti-sci-genpd, not a mapping describing which genpd the device belongs to. The generic power domain binding is concerned with mapping the device to a specific genpd, which is does fine for us, but we have a sub mapping for
Re: [PATCH v3 2/4] dt-bindings: Add TI SCI PM Domains
On 17/01/17 00:12, Dave Gerlach wrote: On 01/13/2017 08:40 PM, Rob Herring wrote: On Fri, Jan 13, 2017 at 2:28 PM, Dave Gerlach wrote: On 01/13/2017 01:25 PM, Rob Herring wrote: On Thu, Jan 12, 2017 at 9:27 AM, Dave Gerlach wrote: Rob, On 01/11/2017 03:34 PM, Rob Herring wrote: On Mon, Jan 9, 2017 at 11:57 AM, Dave Gerlach wrote: Rob, On 01/09/2017 11:50 AM, Rob Herring wrote: On Wed, Jan 04, 2017 at 02:55:34PM -0600, Dave Gerlach wrote: Add a generic power domain implementation, TI SCI PM Domains, that will hook into the genpd framework and allow the TI SCI protocol to control device power states. Also, provide macros representing each device index as understood by TI SCI to be used in the device node power-domain references. These are identifiers for the K2G devices managed by the PMMC. Signed-off-by: Nishanth Menon Signed-off-by: Dave Gerlach --- v2->v3: Update k2g_pds node docs to show it should be a child of pmmc node. In early versions a phandle was used to point to pmmc and docs still incorrectly showed this. .../devicetree/bindings/soc/ti/sci-pm-domain.txt | 59 ++ MAINTAINERS| 2 + include/dt-bindings/genpd/k2g.h| 90 ++ 3 files changed, 151 insertions(+) create mode 100644 Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt create mode 100644 include/dt-bindings/genpd/k2g.h diff --git a/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt new file mode 100644 index ..4c9064e512cb --- /dev/null +++ b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt @@ -0,0 +1,59 @@ +Texas Instruments TI-SCI Generic Power Domain +- + +Some TI SoCs contain a system controller (like the PMMC, etc...) that is +responsible for controlling the state of the IPs that are present. +Communication between the host processor running an OS and the system +controller happens through a protocol known as TI-SCI [1]. This pm domain +implementation plugs into the generic pm domain framework and makes use of +the TI SCI protocol power on and off each device when needed. + +[1] Documentation/devicetree/bindings/arm/keystone/ti,sci.txt + +PM Domain Node +== +The PM domain node represents the global PM domain managed by the PMMC, +which in this case is the single implementation as documented by the generic +PM domain bindings in Documentation/devicetree/bindings/power/power_domain.txt. +Because this relies on the TI SCI protocol to communicate with the PMMC it +must be a child of the pmmc node. + +Required Properties: + +- compatible: should be "ti,sci-pm-domain" +- #power-domain-cells: Must be 0. + +Example (K2G): +- + pmmc: pmmc { + compatible = "ti,k2g-sci"; + ... + + k2g_pds: k2g_pds { + compatible = "ti,sci-pm-domain"; + #power-domain-cells = <0>; + }; + }; + +PM Domain Consumers +=== +Hardware blocks that require SCI control over their state must provide +a reference to the sci-pm-domain they are part of and a unique device +specific ID that identifies the device. + +Required Properties: + +- power-domains: phandle pointing to the corresponding PM domain node. +- ti,sci-id: index representing the device id to be passed oevr SCI to +be used for device control. As I've already stated before, this goes in power-domain cells. When you have a single thing (i.e. node) that controls multiple things, then you you need to specify the ID for each of them in phandle args. This is how irqs, gpio, clocks, *everything* in DT works. You think the reasoning for doing it this way provided by both Ulf and myself on v2 [1] is not valid then? From Ulf: To me, the TI SCI ID, is similar to a "conid" for any another "device resource" (like clock, pinctrl, regulator etc) which we can describe in DT and assign to a device node. The only difference here, is that we don't have common API to fetch the resource (like clk_get(), regulator_get()), but instead we fetches the device's resource from SoC specific code, via genpd's device ->attach() callback. Sorry, but that sounds like a kernel problem to me and has nothing to do with DT bindings. From me: Yes, you've pretty much hit it on the head. It is not an index into a list of genpds but rather identifies the device *within* a single genpd. It is a property specific to each device that resides in a ti-sci-genpd, not a mapping describing which genpd the device belongs to. The generic power domain binding is concerned with mapping the device to a specific genpd, which is does fine for us, but we have a sub mapping for devices that exist inside a genpd which, we must describe as well, hence the
Re: [PATCH v3 3/3] reset: zx2967: add reset controller driver for ZTE's zx2967 family
On Tue, Jan 17, 2017 at 11:22:57AM +0800, Baoyou Xie wrote: > This patch adds reset controller driver for ZTE's zx2967 family. > > Signed-off-by: Baoyou XieReviewed-by: Shawn Guo
Re: [PATCH v3 3/3] reset: zx2967: add reset controller driver for ZTE's zx2967 family
On Tue, Jan 17, 2017 at 11:22:57AM +0800, Baoyou Xie wrote: > This patch adds reset controller driver for ZTE's zx2967 family. > > Signed-off-by: Baoyou Xie Reviewed-by: Shawn Guo
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 08:14:56AM +0100, Peter Zijlstra wrote: > On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > > > @@ -155,6 +164,9 @@ struct lockdep_map { > > > > int cpu; > > > > unsigned long ip; > > > > #endif > > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE > > > > + struct cross_lock *xlock; > > > > +#endif > > > > > > The use of this escapes me; why does the lockdep_map need a pointer to > > > this? > > > > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit() > > use lockdep_map as an arg, but crossrelease need to extract cross_lock > > instances from that. > > > > Why not do something like: > > > > > > struct lockdep_map_cross { > > > struct lockdep_map map; > > > struct held_lockhlock; > > > } > > Using a structure like that, you can pass lockdep_map_cross around just > fine, since the lockdep_map is the first member, so the pointers are > interchangeable. At worst we might need to munge a few typecasts. > > But then the cross release code can simply cast to the bigger type and > have access to the extra data it knows to be there. Right. I will apply it.
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 08:14:56AM +0100, Peter Zijlstra wrote: > On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > > > @@ -155,6 +164,9 @@ struct lockdep_map { > > > > int cpu; > > > > unsigned long ip; > > > > #endif > > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE > > > > + struct cross_lock *xlock; > > > > +#endif > > > > > > The use of this escapes me; why does the lockdep_map need a pointer to > > > this? > > > > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit() > > use lockdep_map as an arg, but crossrelease need to extract cross_lock > > instances from that. > > > > Why not do something like: > > > > > > struct lockdep_map_cross { > > > struct lockdep_map map; > > > struct held_lockhlock; > > > } > > Using a structure like that, you can pass lockdep_map_cross around just > fine, since the lockdep_map is the first member, so the pointers are > interchangeable. At worst we might need to munge a few typecasts. > > But then the cross release code can simply cast to the bigger type and > have access to the extra data it knows to be there. Right. I will apply it.
Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
On 17/1/17 14:30, Eric Ren wrote: Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in a deadlock, as the author "Tariq Saeed" realized shortly after the patch was merged. The discussion happened here (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). The reason why taking cluster inode lock at vfs entry points opens up a self deadlock window, is explained in the previous patch of this series. So far, we have seen two different code paths that have this issue. 1. do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== take PR generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== take PR 2. fchmod|fchmodat chmod_common notify_change ocfs2_setattr <=== take EX posix_acl_chmod get_acl ocfs2_iop_get_acl <=== take PR ocfs2_iop_set_acl <=== take EX Fixes them by adding the tracking logic (in the previous patch) for these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), ocfs2_setattr(). Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Signed-off-by: Eric Ren Looks good. Reviewed-by: Joseph Qi --- fs/ocfs2/acl.c | 29 + fs/ocfs2/file.c | 58 - 2 files changed, 58 insertions(+), 29 deletions(-) diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..dc22ba8 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) { struct buffer_head *bh = NULL; - int status = 0; + int status, had_lock; + struct ocfs2_lock_holder oh; - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); - return status; - } + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) + return had_lock; status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); - ocfs2_inode_unlock(inode, 1); + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); brelse(bh); return status; } @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) struct ocfs2_super *osb; struct buffer_head *di_bh = NULL; struct posix_acl *acl; - int ret; + int had_lock; + struct ocfs2_lock_holder oh; osb = OCFS2_SB(inode->i_sb); if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) return NULL; - ret = ocfs2_inode_lock(inode, _bh, 0); - if (ret < 0) { - if (ret != -ENOENT) - mlog_errno(ret); - return ERR_PTR(ret); - } + + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); + if (had_lock < 0) + return ERR_PTR(had_lock); acl = ocfs2_get_acl_nolock(inode, type, di_bh); - ocfs2_inode_unlock(inode, 0); + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); brelse(di_bh); return acl; } diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index c488965..7b6a146 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) handle_t *handle = NULL; struct dquot *transfer_to[MAXQUOTAS] = { }; int qtype; + int had_lock; + struct ocfs2_lock_holder oh; trace_ocfs2_setattr(inode, dentry, (unsigned long long)OCFS2_I(inode)->ip_blkno, @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } } - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) { + status = had_lock; goto bail_unlock_rw; + } else if (had_lock) { + /* +* As far as we know, ocfs2_setattr() could only be the first +* VFS entry point in the call chain of recursive cluster +* locking issue. +* +* For instance: +*
Re: [PATCH] fbdev: ssd1307fb: allow reset-gpios is missing
On Mon, Jan 16, 2017 at 05:50:21PM +0800, Icenowy Zheng wrote: > > 2017年1月16日 16:02于 Maxime Ripard写道: > > > > On Sun, Jan 15, 2017 at 07:21:46PM +0800, Icenowy Zheng wrote: > > > Currently some SSD1306 OLED modules are sold without a reset pin (only > > > VCC, GND, SCK, SDA four pins). > > > > > > Add support for missing reset-gpios property. > > > > > > Signed-off-by: Icenowy Zheng > > > > Unfortunately, a similar patch has been sent a couple of times > > already: > > https://www.spinics.net/lists/devicetree/msg158330.html > > Why is it never merged? It was sent 4 days ago... And since you didn't have the right maintainers in the cc list, yours didn't have a chance to be merged either. > There are really boards that needs this function. Then you can accelerate its inclusion by reviewing it. Maxime -- Maxime Ripard, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com signature.asc Description: PGP signature
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 02:24:08PM +0800, Boqun Feng wrote: > On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote: > > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > > + /* > > > > +* We assign class_idx here redundantly even though following > > > > +* memcpy will cover it, in order to ensure a rcu reader can > > > > +* access the class_idx atomically without lock. > > > > +* > > > > +* Here we assume setting a word-sized variable is atomic. > > > > > > which one, where? > > > > I meant xlock_class(xlock) in check_add_plock(). > > > > I was not sure about the following two. > > > > 1. Is it ordered between following a and b? > >a. memcpy -> list_add_tail_rcu > >b. list_for_each_entry_rcu -> load class_idx (xlock_class) > >I assumed that it's not ordered. > > 2. Does memcpy guarantee atomic store for each word? > >I assumed that it doesn't. > > > > But I think I was wrong.. The first might be ordered. I will remove > > the following redundant statement. It'd be orderd, right? > > > > Yes, a and b are ordered, IOW, they could be paired, meaning when we > got the item in a list_for_each_entry_rcu() loop, all memory operations > before the corresponding list_add_tail_rcu() should be observed by us. Thank you for confirming it. > > Regards, > Boqun > > > > > > > > +*/ > > > > + xlock->hlock.class_idx = hlock->class_idx; > > > > + gen_id = (unsigned int)atomic_inc_return(_gen_id); > > > > + WRITE_ONCE(xlock->gen_id, gen_id); > > > > + memcpy(>hlock, hlock, sizeof(struct held_lock)); > > > > + INIT_LIST_HEAD(>xlock_entry); > > > > + list_add_tail_rcu(>xlock_entry, _head); > > >
Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
On 17/1/17 14:30, Eric Ren wrote: Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in a deadlock, as the author "Tariq Saeed" realized shortly after the patch was merged. The discussion happened here (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). The reason why taking cluster inode lock at vfs entry points opens up a self deadlock window, is explained in the previous patch of this series. So far, we have seen two different code paths that have this issue. 1. do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== take PR generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== take PR 2. fchmod|fchmodat chmod_common notify_change ocfs2_setattr <=== take EX posix_acl_chmod get_acl ocfs2_iop_get_acl <=== take PR ocfs2_iop_set_acl <=== take EX Fixes them by adding the tracking logic (in the previous patch) for these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), ocfs2_setattr(). Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Signed-off-by: Eric Ren Looks good. Reviewed-by: Joseph Qi --- fs/ocfs2/acl.c | 29 + fs/ocfs2/file.c | 58 - 2 files changed, 58 insertions(+), 29 deletions(-) diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..dc22ba8 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) { struct buffer_head *bh = NULL; - int status = 0; + int status, had_lock; + struct ocfs2_lock_holder oh; - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); - return status; - } + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) + return had_lock; status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); - ocfs2_inode_unlock(inode, 1); + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); brelse(bh); return status; } @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) struct ocfs2_super *osb; struct buffer_head *di_bh = NULL; struct posix_acl *acl; - int ret; + int had_lock; + struct ocfs2_lock_holder oh; osb = OCFS2_SB(inode->i_sb); if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) return NULL; - ret = ocfs2_inode_lock(inode, _bh, 0); - if (ret < 0) { - if (ret != -ENOENT) - mlog_errno(ret); - return ERR_PTR(ret); - } + + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); + if (had_lock < 0) + return ERR_PTR(had_lock); acl = ocfs2_get_acl_nolock(inode, type, di_bh); - ocfs2_inode_unlock(inode, 0); + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); brelse(di_bh); return acl; } diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index c488965..7b6a146 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) handle_t *handle = NULL; struct dquot *transfer_to[MAXQUOTAS] = { }; int qtype; + int had_lock; + struct ocfs2_lock_holder oh; trace_ocfs2_setattr(inode, dentry, (unsigned long long)OCFS2_I(inode)->ip_blkno, @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } } - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) { + status = had_lock; goto bail_unlock_rw; + } else if (had_lock) { + /* +* As far as we know, ocfs2_setattr() could only be the first +* VFS entry point in the call chain of recursive cluster +* locking issue. +* +* For instance: +* chmod_common() +* notify_change() +*
Re: [PATCH] fbdev: ssd1307fb: allow reset-gpios is missing
On Mon, Jan 16, 2017 at 05:50:21PM +0800, Icenowy Zheng wrote: > > 2017年1月16日 16:02于 Maxime Ripard 写道: > > > > On Sun, Jan 15, 2017 at 07:21:46PM +0800, Icenowy Zheng wrote: > > > Currently some SSD1306 OLED modules are sold without a reset pin (only > > > VCC, GND, SCK, SDA four pins). > > > > > > Add support for missing reset-gpios property. > > > > > > Signed-off-by: Icenowy Zheng > > > > Unfortunately, a similar patch has been sent a couple of times > > already: > > https://www.spinics.net/lists/devicetree/msg158330.html > > Why is it never merged? It was sent 4 days ago... And since you didn't have the right maintainers in the cc list, yours didn't have a chance to be merged either. > There are really boards that needs this function. Then you can accelerate its inclusion by reviewing it. Maxime -- Maxime Ripard, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com signature.asc Description: PGP signature
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 02:24:08PM +0800, Boqun Feng wrote: > On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote: > > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote: > > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > > + /* > > > > +* We assign class_idx here redundantly even though following > > > > +* memcpy will cover it, in order to ensure a rcu reader can > > > > +* access the class_idx atomically without lock. > > > > +* > > > > +* Here we assume setting a word-sized variable is atomic. > > > > > > which one, where? > > > > I meant xlock_class(xlock) in check_add_plock(). > > > > I was not sure about the following two. > > > > 1. Is it ordered between following a and b? > >a. memcpy -> list_add_tail_rcu > >b. list_for_each_entry_rcu -> load class_idx (xlock_class) > >I assumed that it's not ordered. > > 2. Does memcpy guarantee atomic store for each word? > >I assumed that it doesn't. > > > > But I think I was wrong.. The first might be ordered. I will remove > > the following redundant statement. It'd be orderd, right? > > > > Yes, a and b are ordered, IOW, they could be paired, meaning when we > got the item in a list_for_each_entry_rcu() loop, all memory operations > before the corresponding list_add_tail_rcu() should be observed by us. Thank you for confirming it. > > Regards, > Boqun > > > > > > > > +*/ > > > > + xlock->hlock.class_idx = hlock->class_idx; > > > > + gen_id = (unsigned int)atomic_inc_return(_gen_id); > > > > + WRITE_ONCE(xlock->gen_id, gen_id); > > > > + memcpy(>hlock, hlock, sizeof(struct held_lock)); > > > > + INIT_LIST_HEAD(>xlock_entry); > > > > + list_add_tail_rcu(>xlock_entry, _head); > > >
Re: [PATCH] staging: lustre: headers: potential UAPI headers
On Mon, Jan 16, 2017 at 09:28:30PM +, James Simmons wrote: > > > On Mon, Dec 19, 2016 at 12:06:47PM -0500, James Simmons wrote: > > > Not for landing. This is the purposed UAPI headers > > > with the removal of unlikely and debugging macros. > > > This is just for feedback to see if this is acceptable > > > for the upstream client. > > > > > > Signed-off-by: James Simmons> > > --- > > > .../lustre/lustre/include/lustre/lustre_fid.h | 353 > > > + > > > .../lustre/lustre/include/lustre/lustre_ostid.h| 233 ++ > > > > Can you make a lustre "uapi" directory so we can see which files you > > really want to be UAPI and which you don't as time goes on? > > Where do you want them placed? In uapi/linux/lustre or uapi/lustre. Does > it matter to you? The below was to forth coming UAPI headers which from > your response you seem okay with in general. How many .h files are there going to be? It's just a single filesystem, shouldn't you just need a single file? If so, how about drivers/staging/lustre/include/uapi/lustre.h ? If you really need multiple .h files, put them all in the same uapi/ directory with a lustre_ prefix, you don't need a whole subdir just for yourself, right? thanks, greg k-h
Re: [PATCH] staging: lustre: headers: potential UAPI headers
On Mon, Jan 16, 2017 at 09:28:30PM +, James Simmons wrote: > > > On Mon, Dec 19, 2016 at 12:06:47PM -0500, James Simmons wrote: > > > Not for landing. This is the purposed UAPI headers > > > with the removal of unlikely and debugging macros. > > > This is just for feedback to see if this is acceptable > > > for the upstream client. > > > > > > Signed-off-by: James Simmons > > > --- > > > .../lustre/lustre/include/lustre/lustre_fid.h | 353 > > > + > > > .../lustre/lustre/include/lustre/lustre_ostid.h| 233 ++ > > > > Can you make a lustre "uapi" directory so we can see which files you > > really want to be UAPI and which you don't as time goes on? > > Where do you want them placed? In uapi/linux/lustre or uapi/lustre. Does > it matter to you? The below was to forth coming UAPI headers which from > your response you seem okay with in general. How many .h files are there going to be? It's just a single filesystem, shouldn't you just need a single file? If so, how about drivers/staging/lustre/include/uapi/lustre.h ? If you really need multiple .h files, put them all in the same uapi/ directory with a lustre_ prefix, you don't need a whole subdir just for yourself, right? thanks, greg k-h
Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
On 17/1/17 14:30, Eric Ren wrote: We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); + INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + INIT_LIST_HEAD(>oh_list); + oh->oh_owner_pid = get_pid(task_pid(current)); Trim the redundant space here. Others look good to me. Reviewed-by: Joseph Qi + + spin_lock(>l_lock); + list_add_tail(>oh_list, >l_holders); + spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + spin_lock(>l_lock); + list_del(>oh_list); + spin_unlock(>l_lock); + + put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_lock_holder *oh; + struct pid *pid; + + /* look in the list of holders for one with the current task as owner */
Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
On 17/1/17 14:30, Eric Ren wrote: We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); + INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + INIT_LIST_HEAD(>oh_list); + oh->oh_owner_pid = get_pid(task_pid(current)); Trim the redundant space here. Others look good to me. Reviewed-by: Joseph Qi + + spin_lock(>l_lock); + list_add_tail(>oh_list, >l_holders); + spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + spin_lock(>l_lock); + list_del(>oh_list); + spin_unlock(>l_lock); + + put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_lock_holder *oh; + struct pid *pid; + + /* look in the list of holders for one with the current task as owner */ + spin_lock(>l_lock); + pid = task_pid(current); + list_for_each_entry(oh, >l_holders,
[PATCH v2] usb: host: xhci: plat: check hcc_params after add hcd
From: William wuThe commit 4ac53087d6d4 ("usb: xhci: plat: Create both HCDs before adding them") move add hcd to the end of probe, this cause hcc_params uninitiated, because xHCI driver sets hcc_params in xhci_gen_setup() called from usb_add_hcd(). This patch checks the Maximum Primary Stream Array Size in the hcc_params register after add primary hcd. Signed-off-by: William wu --- drivers/usb/host/xhci-plat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index ddfab30..f96caeb 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -232,9 +232,6 @@ static int xhci_plat_probe(struct platform_device *pdev) if (device_property_read_bool(>dev, "usb3-lpm-capable")) xhci->quirks |= XHCI_LPM_SUPPORT; - if (HCC_MAX_PSA(xhci->hcc_params) >= 4) - xhci->shared_hcd->can_do_streams = 1; - hcd->usb_phy = devm_usb_get_phy_by_phandle(>dev, "usb-phy", 0); if (IS_ERR(hcd->usb_phy)) { ret = PTR_ERR(hcd->usb_phy); @@ -251,6 +248,9 @@ static int xhci_plat_probe(struct platform_device *pdev) if (ret) goto disable_usb_phy; + if (HCC_MAX_PSA(xhci->hcc_params) >= 4) + xhci->shared_hcd->can_do_streams = 1; + ret = usb_add_hcd(xhci->shared_hcd, irq, IRQF_SHARED); if (ret) goto dealloc_usb2_hcd; -- 2.7.4
[PATCH v2] usb: host: xhci: plat: check hcc_params after add hcd
From: William wu The commit 4ac53087d6d4 ("usb: xhci: plat: Create both HCDs before adding them") move add hcd to the end of probe, this cause hcc_params uninitiated, because xHCI driver sets hcc_params in xhci_gen_setup() called from usb_add_hcd(). This patch checks the Maximum Primary Stream Array Size in the hcc_params register after add primary hcd. Signed-off-by: William wu --- drivers/usb/host/xhci-plat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index ddfab30..f96caeb 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -232,9 +232,6 @@ static int xhci_plat_probe(struct platform_device *pdev) if (device_property_read_bool(>dev, "usb3-lpm-capable")) xhci->quirks |= XHCI_LPM_SUPPORT; - if (HCC_MAX_PSA(xhci->hcc_params) >= 4) - xhci->shared_hcd->can_do_streams = 1; - hcd->usb_phy = devm_usb_get_phy_by_phandle(>dev, "usb-phy", 0); if (IS_ERR(hcd->usb_phy)) { ret = PTR_ERR(hcd->usb_phy); @@ -251,6 +248,9 @@ static int xhci_plat_probe(struct platform_device *pdev) if (ret) goto disable_usb_phy; + if (HCC_MAX_PSA(xhci->hcc_params) >= 4) + xhci->shared_hcd->can_do_streams = 1; + ret = usb_add_hcd(xhci->shared_hcd, irq, IRQF_SHARED); if (ret) goto dealloc_usb2_hcd; -- 2.7.4
Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
On 01/17/2017 02:30 PM, Eric Ren wrote: > We are in the situation that we have to avoid recursive cluster locking, > but there is no way to check if a cluster lock has been taken by a > precess already. > > Mostly, we can avoid recursive locking by writing code carefully. > However, we found that it's very hard to handle the routines that > are invoked directly by vfs code. For instance: > > const struct inode_operations ocfs2_file_iops = { > .permission = ocfs2_permission, > .get_acl= ocfs2_iop_get_acl, > .set_acl= ocfs2_iop_set_acl, > }; > > Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): > do_sys_open > may_open > inode_permission >ocfs2_permission > ocfs2_inode_lock() <=== first time > generic_permission > get_acl >ocfs2_iop_get_acl > ocfs2_inode_lock() <=== recursive one > > A deadlock will occur if a remote EX request comes in between two > of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: > > On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in > BAST(ocfs2_generic_handle_bast) when downconvert is started > on behalf of the remote EX lock request. Another hand, the recursive > cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() > because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? > because there is no chance for the first cluster lock on this node to be > unlocked - we block ourselves in the code path. > > The idea to fix this issue is mostly taken from gfs2 code. > 1. introduce a new field: struct ocfs2_lock_res.l_holders, to > keep track of the processes' pid who has taken the cluster lock > of this lock resource; > 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; > it means just getting back disk inode bh for us if we've got cluster lock. > 3. export a helper: ocfs2_is_locked_by_me() is used to check if we > have got the cluster lock in the upper code path. > > The tracking logic should be used by some of the ocfs2 vfs's callbacks, > to solve the recursive locking issue cuased by the fact that vfs routines > can call into each other. > > The performance penalty of processing the holder list should only be seen > at a few cases where the tracking logic is used, such as get/set acl. > > You may ask what if the first time we got a PR lock, and the second time > we want a EX lock? fortunately, this case never happens in the real world, > as far as I can see, including permission check, (get|set)_(acl|attr), and > the gfs2 code also do so. > > Changes since v1: > - Let ocfs2_is_locked_by_me() just return true/false to indicate if the > process gets the cluster lock - suggested by: Joseph Qi> and Junxiao Bi . > > - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", > suggested by: Junxiao Bi. > > - Do not inline functions whose bodies are not in scope, changed by: > Stephen Rothwell . > > Changes since v2: > - Wrap the tracking logic code of recursive locking into functions, > ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), > suggested by: Junxiao Bi. > > [s...@canb.auug.org.au remove some inlines] > Signed-off-by: Eric Ren Reviewed-by: Junxiao Bi > --- > fs/ocfs2/dlmglue.c | 105 > +++-- > fs/ocfs2/dlmglue.h | 18 + > fs/ocfs2/ocfs2.h | 1 + > 3 files changed, 121 insertions(+), 3 deletions(-) > > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c > index 77d1632..c75b9e9 100644 > --- a/fs/ocfs2/dlmglue.c > +++ b/fs/ocfs2/dlmglue.c > @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) > init_waitqueue_head(>l_event); > INIT_LIST_HEAD(>l_blocked_list); > INIT_LIST_HEAD(>l_mask_waiters); > + INIT_LIST_HEAD(>l_holders); > } > > void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, > @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) > res->l_flags = 0UL; > } > > +/* > + * Keep a list of processes who have interest in a lockres. > + * Note: this is now only uesed for check recursive cluster locking. > + */ > +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, > +struct ocfs2_lock_holder *oh) > +{ > + INIT_LIST_HEAD(>oh_list); > + oh->oh_owner_pid = get_pid(task_pid(current)); > + > + spin_lock(>l_lock); > + list_add_tail(>oh_list, >l_holders); > + spin_unlock(>l_lock); > +} > + > +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, > +struct ocfs2_lock_holder *oh) > +{ > + spin_lock(>l_lock); > + list_del(>oh_list); > + spin_unlock(>l_lock); > + > + put_pid(oh->oh_owner_pid); > +} > + > +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) > +{
Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
On 01/17/2017 02:30 PM, Eric Ren wrote: > We are in the situation that we have to avoid recursive cluster locking, > but there is no way to check if a cluster lock has been taken by a > precess already. > > Mostly, we can avoid recursive locking by writing code carefully. > However, we found that it's very hard to handle the routines that > are invoked directly by vfs code. For instance: > > const struct inode_operations ocfs2_file_iops = { > .permission = ocfs2_permission, > .get_acl= ocfs2_iop_get_acl, > .set_acl= ocfs2_iop_set_acl, > }; > > Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): > do_sys_open > may_open > inode_permission >ocfs2_permission > ocfs2_inode_lock() <=== first time > generic_permission > get_acl >ocfs2_iop_get_acl > ocfs2_inode_lock() <=== recursive one > > A deadlock will occur if a remote EX request comes in between two > of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: > > On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in > BAST(ocfs2_generic_handle_bast) when downconvert is started > on behalf of the remote EX lock request. Another hand, the recursive > cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() > because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? > because there is no chance for the first cluster lock on this node to be > unlocked - we block ourselves in the code path. > > The idea to fix this issue is mostly taken from gfs2 code. > 1. introduce a new field: struct ocfs2_lock_res.l_holders, to > keep track of the processes' pid who has taken the cluster lock > of this lock resource; > 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; > it means just getting back disk inode bh for us if we've got cluster lock. > 3. export a helper: ocfs2_is_locked_by_me() is used to check if we > have got the cluster lock in the upper code path. > > The tracking logic should be used by some of the ocfs2 vfs's callbacks, > to solve the recursive locking issue cuased by the fact that vfs routines > can call into each other. > > The performance penalty of processing the holder list should only be seen > at a few cases where the tracking logic is used, such as get/set acl. > > You may ask what if the first time we got a PR lock, and the second time > we want a EX lock? fortunately, this case never happens in the real world, > as far as I can see, including permission check, (get|set)_(acl|attr), and > the gfs2 code also do so. > > Changes since v1: > - Let ocfs2_is_locked_by_me() just return true/false to indicate if the > process gets the cluster lock - suggested by: Joseph Qi > and Junxiao Bi . > > - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", > suggested by: Junxiao Bi. > > - Do not inline functions whose bodies are not in scope, changed by: > Stephen Rothwell . > > Changes since v2: > - Wrap the tracking logic code of recursive locking into functions, > ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), > suggested by: Junxiao Bi. > > [s...@canb.auug.org.au remove some inlines] > Signed-off-by: Eric Ren Reviewed-by: Junxiao Bi > --- > fs/ocfs2/dlmglue.c | 105 > +++-- > fs/ocfs2/dlmglue.h | 18 + > fs/ocfs2/ocfs2.h | 1 + > 3 files changed, 121 insertions(+), 3 deletions(-) > > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c > index 77d1632..c75b9e9 100644 > --- a/fs/ocfs2/dlmglue.c > +++ b/fs/ocfs2/dlmglue.c > @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) > init_waitqueue_head(>l_event); > INIT_LIST_HEAD(>l_blocked_list); > INIT_LIST_HEAD(>l_mask_waiters); > + INIT_LIST_HEAD(>l_holders); > } > > void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, > @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) > res->l_flags = 0UL; > } > > +/* > + * Keep a list of processes who have interest in a lockres. > + * Note: this is now only uesed for check recursive cluster locking. > + */ > +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, > +struct ocfs2_lock_holder *oh) > +{ > + INIT_LIST_HEAD(>oh_list); > + oh->oh_owner_pid = get_pid(task_pid(current)); > + > + spin_lock(>l_lock); > + list_add_tail(>oh_list, >l_holders); > + spin_unlock(>l_lock); > +} > + > +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, > +struct ocfs2_lock_holder *oh) > +{ > + spin_lock(>l_lock); > + list_del(>oh_list); > + spin_unlock(>l_lock); > + > + put_pid(oh->oh_owner_pid); > +} > + > +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) > +{ > + struct ocfs2_lock_holder *oh; > + struct pid *pid; > + > + /* look in the list of holders
Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
On 01/17/2017 02:30 PM, Eric Ren wrote: > Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") > results in a deadlock, as the author "Tariq Saeed" realized shortly > after the patch was merged. The discussion happened here > (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). > > The reason why taking cluster inode lock at vfs entry points opens up > a self deadlock window, is explained in the previous patch of this > series. > > So far, we have seen two different code paths that have this issue. > 1. do_sys_open > may_open > inode_permission >ocfs2_permission > ocfs2_inode_lock() <=== take PR > generic_permission > get_acl >ocfs2_iop_get_acl > ocfs2_inode_lock() <=== take PR > 2. fchmod|fchmodat > chmod_common > notify_change > ocfs2_setattr <=== take EX >posix_acl_chmod > get_acl > ocfs2_iop_get_acl <=== take PR > ocfs2_iop_set_acl <=== take EX > > Fixes them by adding the tracking logic (in the previous patch) for > these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), > ocfs2_setattr(). > > Changes since v1: > - Let ocfs2_is_locked_by_me() just return true/false to indicate if the > process gets the cluster lock - suggested by: Joseph Qi> and Junxiao Bi . > > - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", > suggested by: Junxiao Bi. > > - Add debugging output at ocfs2_setattr() and ocfs2_permission() to > catch exceptional cases, suggested by: Junxiao Bi. > > Changes since v2: > - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. > > Signed-off-by: Eric Ren Reviewed-by: Junxiao Bi > --- > fs/ocfs2/acl.c | 29 + > fs/ocfs2/file.c | 58 > - > 2 files changed, 58 insertions(+), 29 deletions(-) > > diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c > index bed1fcb..dc22ba8 100644 > --- a/fs/ocfs2/acl.c > +++ b/fs/ocfs2/acl.c > @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, > int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) > { > struct buffer_head *bh = NULL; > - int status = 0; > + int status, had_lock; > + struct ocfs2_lock_holder oh; > > - status = ocfs2_inode_lock(inode, , 1); > - if (status < 0) { > - if (status != -ENOENT) > - mlog_errno(status); > - return status; > - } > + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); > + if (had_lock < 0) > + return had_lock; > status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); > - ocfs2_inode_unlock(inode, 1); > + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); > brelse(bh); > return status; > } > @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode > *inode, int type) > struct ocfs2_super *osb; > struct buffer_head *di_bh = NULL; > struct posix_acl *acl; > - int ret; > + int had_lock; > + struct ocfs2_lock_holder oh; > > osb = OCFS2_SB(inode->i_sb); > if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) > return NULL; > - ret = ocfs2_inode_lock(inode, _bh, 0); > - if (ret < 0) { > - if (ret != -ENOENT) > - mlog_errno(ret); > - return ERR_PTR(ret); > - } > + > + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); > + if (had_lock < 0) > + return ERR_PTR(had_lock); > > acl = ocfs2_get_acl_nolock(inode, type, di_bh); > > - ocfs2_inode_unlock(inode, 0); > + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); > brelse(di_bh); > return acl; > } > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > index c488965..7b6a146 100644 > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr > *attr) > handle_t *handle = NULL; > struct dquot *transfer_to[MAXQUOTAS] = { }; > int qtype; > + int had_lock; > + struct ocfs2_lock_holder oh; > > trace_ocfs2_setattr(inode, dentry, > (unsigned long long)OCFS2_I(inode)->ip_blkno, > @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr > *attr) > } > } > > - status = ocfs2_inode_lock(inode, , 1); > - if (status < 0) { > - if (status != -ENOENT) > - mlog_errno(status); > + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); > + if (had_lock < 0) { > + status = had_lock; > goto bail_unlock_rw; > + } else if (had_lock) { > + /* > + * As far as we know, ocfs2_setattr() could only be the first > + * VFS entry
Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
On 01/17/2017 02:30 PM, Eric Ren wrote: > Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") > results in a deadlock, as the author "Tariq Saeed" realized shortly > after the patch was merged. The discussion happened here > (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). > > The reason why taking cluster inode lock at vfs entry points opens up > a self deadlock window, is explained in the previous patch of this > series. > > So far, we have seen two different code paths that have this issue. > 1. do_sys_open > may_open > inode_permission >ocfs2_permission > ocfs2_inode_lock() <=== take PR > generic_permission > get_acl >ocfs2_iop_get_acl > ocfs2_inode_lock() <=== take PR > 2. fchmod|fchmodat > chmod_common > notify_change > ocfs2_setattr <=== take EX >posix_acl_chmod > get_acl > ocfs2_iop_get_acl <=== take PR > ocfs2_iop_set_acl <=== take EX > > Fixes them by adding the tracking logic (in the previous patch) for > these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), > ocfs2_setattr(). > > Changes since v1: > - Let ocfs2_is_locked_by_me() just return true/false to indicate if the > process gets the cluster lock - suggested by: Joseph Qi > and Junxiao Bi . > > - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", > suggested by: Junxiao Bi. > > - Add debugging output at ocfs2_setattr() and ocfs2_permission() to > catch exceptional cases, suggested by: Junxiao Bi. > > Changes since v2: > - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. > > Signed-off-by: Eric Ren Reviewed-by: Junxiao Bi > --- > fs/ocfs2/acl.c | 29 + > fs/ocfs2/file.c | 58 > - > 2 files changed, 58 insertions(+), 29 deletions(-) > > diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c > index bed1fcb..dc22ba8 100644 > --- a/fs/ocfs2/acl.c > +++ b/fs/ocfs2/acl.c > @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, > int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) > { > struct buffer_head *bh = NULL; > - int status = 0; > + int status, had_lock; > + struct ocfs2_lock_holder oh; > > - status = ocfs2_inode_lock(inode, , 1); > - if (status < 0) { > - if (status != -ENOENT) > - mlog_errno(status); > - return status; > - } > + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); > + if (had_lock < 0) > + return had_lock; > status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); > - ocfs2_inode_unlock(inode, 1); > + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); > brelse(bh); > return status; > } > @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode > *inode, int type) > struct ocfs2_super *osb; > struct buffer_head *di_bh = NULL; > struct posix_acl *acl; > - int ret; > + int had_lock; > + struct ocfs2_lock_holder oh; > > osb = OCFS2_SB(inode->i_sb); > if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) > return NULL; > - ret = ocfs2_inode_lock(inode, _bh, 0); > - if (ret < 0) { > - if (ret != -ENOENT) > - mlog_errno(ret); > - return ERR_PTR(ret); > - } > + > + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); > + if (had_lock < 0) > + return ERR_PTR(had_lock); > > acl = ocfs2_get_acl_nolock(inode, type, di_bh); > > - ocfs2_inode_unlock(inode, 0); > + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); > brelse(di_bh); > return acl; > } > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > index c488965..7b6a146 100644 > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr > *attr) > handle_t *handle = NULL; > struct dquot *transfer_to[MAXQUOTAS] = { }; > int qtype; > + int had_lock; > + struct ocfs2_lock_holder oh; > > trace_ocfs2_setattr(inode, dentry, > (unsigned long long)OCFS2_I(inode)->ip_blkno, > @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr > *attr) > } > } > > - status = ocfs2_inode_lock(inode, , 1); > - if (status < 0) { > - if (status != -ENOENT) > - mlog_errno(status); > + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); > + if (had_lock < 0) { > + status = had_lock; > goto bail_unlock_rw; > + } else if (had_lock) { > + /* > + * As far as we know, ocfs2_setattr() could only be the first > + * VFS entry point in the call chain of recursive cluster > + * locking issue. > +
Re: [PATCH] tpm: add session handles to the save and restore of the tpm2 space manager
On Mon, Jan 16, 2017 at 03:18:45PM -0800, James Bottomley wrote: > On Mon, 2017-01-16 at 12:04 +0200, Jarkko Sakkinen wrote: > > On Fri, Jan 13, 2017 at 11:24:13AM -0800, James Bottomley wrote: > > > Session handles are slightly more difficult to manage because any > > > TPM > > > only has a finite number of allowed handles, even if the session > > > has > > > been saved; so when you context save a session, you must not flush > > > it > > > because that would destroy the ability to context load it (you only > > > flush sessions when you're done with them) and the tpm won't re-use > > > the handle. Additionally, sessions can be flushed as part of the > > > successful execution of a command if the continueSession attribute > > > is > > > clear, so we have to mark any session we find in the command (using > > > TPM2_HT_TAG_FOR_FLUSH) so it can be cleared from the space if the > > > command successfully executes. > > > > > > Finally, a session may also be cleared by flushing it, so we have > > > to > > > emulate the TPM2_FlushContext command to see if a session is being > > > flushed and manually clear it from the space. > > > > > > We also fully flush all sessions on device close. > > > > Some big overview comments without going deeply into details. I will > > use more time for this once the > > > > Please do not use handle allocation code for sessions. This commit > > makes the implementation a mess. Just use the phandle directly and > > have array of session phandles for each space. > > > > I would also almost require to have at minimum two patches: one that > > implements purely isolation and another that implements swapping. > > > > It might be for example that I want to land TPM spaces with session > > isolation to one release and swapping to n+1 because my hunch tells > > me that it is better to bake the swapping part for a while. > > > > One more thing. Maybe commit messages should speak uniformally about > > TPM spaces? They are a tool to implement resource manager, not a > > resource manager. > > Yes, so Ken also had a reply to this which the Mailing List seems to > have eaten: > > > This looks like session handles are virtualized. I believe that this > > will break the HMAC for commands (e.g. TPM2_PolicySecret) that have > > a session handle in the handle area. The session's handle is its > > "Name" and is included in the cpHash (command parameter hash) data > > that is HMACed. > > Basically this means that the advice to virtualize session handles in > the TCG RM document is wrong and we have to use physical handles. I'll > redo the implementation for this ... and now, since we'll have nothing > to use as an index, it probably does make sense to have sessions in a > separate array. I can also separate isolation from context switching > ... although I really think this is less optimal: my TPM only allows > three active context handles, so if we don't context switch them > identially to transient object (which it also only allows three of) I'm > going to run out. I actually redid my openssl_tpm_engine patches so > they use session handles for parameter encryption and HMAC based > authority, so this may end up biting me soon ... > > James This might be obvious to you but saying this just in case: everytime a session handle is created, you must traverse through struct tpm_space instances and check if they have that physical handle and remove it so that they don't leak. I would probably just create a linked list of struct tpm_space instances. Radix tree does not make much sense with the amount of sessions you might except to have on a system simultaneously. Anyway, this kind of lazy approach where you clean up as new stuff gets created is probably also the most straight forward... /Jarkko
Re: [PATCH] tpm: add session handles to the save and restore of the tpm2 space manager
On Mon, Jan 16, 2017 at 03:18:45PM -0800, James Bottomley wrote: > On Mon, 2017-01-16 at 12:04 +0200, Jarkko Sakkinen wrote: > > On Fri, Jan 13, 2017 at 11:24:13AM -0800, James Bottomley wrote: > > > Session handles are slightly more difficult to manage because any > > > TPM > > > only has a finite number of allowed handles, even if the session > > > has > > > been saved; so when you context save a session, you must not flush > > > it > > > because that would destroy the ability to context load it (you only > > > flush sessions when you're done with them) and the tpm won't re-use > > > the handle. Additionally, sessions can be flushed as part of the > > > successful execution of a command if the continueSession attribute > > > is > > > clear, so we have to mark any session we find in the command (using > > > TPM2_HT_TAG_FOR_FLUSH) so it can be cleared from the space if the > > > command successfully executes. > > > > > > Finally, a session may also be cleared by flushing it, so we have > > > to > > > emulate the TPM2_FlushContext command to see if a session is being > > > flushed and manually clear it from the space. > > > > > > We also fully flush all sessions on device close. > > > > Some big overview comments without going deeply into details. I will > > use more time for this once the > > > > Please do not use handle allocation code for sessions. This commit > > makes the implementation a mess. Just use the phandle directly and > > have array of session phandles for each space. > > > > I would also almost require to have at minimum two patches: one that > > implements purely isolation and another that implements swapping. > > > > It might be for example that I want to land TPM spaces with session > > isolation to one release and swapping to n+1 because my hunch tells > > me that it is better to bake the swapping part for a while. > > > > One more thing. Maybe commit messages should speak uniformally about > > TPM spaces? They are a tool to implement resource manager, not a > > resource manager. > > Yes, so Ken also had a reply to this which the Mailing List seems to > have eaten: > > > This looks like session handles are virtualized. I believe that this > > will break the HMAC for commands (e.g. TPM2_PolicySecret) that have > > a session handle in the handle area. The session's handle is its > > "Name" and is included in the cpHash (command parameter hash) data > > that is HMACed. > > Basically this means that the advice to virtualize session handles in > the TCG RM document is wrong and we have to use physical handles. I'll > redo the implementation for this ... and now, since we'll have nothing > to use as an index, it probably does make sense to have sessions in a > separate array. I can also separate isolation from context switching > ... although I really think this is less optimal: my TPM only allows > three active context handles, so if we don't context switch them > identially to transient object (which it also only allows three of) I'm > going to run out. I actually redid my openssl_tpm_engine patches so > they use session handles for parameter encryption and HMAC based > authority, so this may end up biting me soon ... > > James This might be obvious to you but saying this just in case: everytime a session handle is created, you must traverse through struct tpm_space instances and check if they have that physical handle and remove it so that they don't leak. I would probably just create a linked list of struct tpm_space instances. Radix tree does not make much sense with the amount of sessions you might except to have on a system simultaneously. Anyway, this kind of lazy approach where you clean up as new stuff gets created is probably also the most straight forward... /Jarkko
Re: [PATCH 1/1] x86: sanitize argument of clearcpuid command-line option
* Odzioba, Lukaszwrote: > > pr_warn("x86/cpu: Ignoring invalid "clearcpuid=%s' option!\n", arg) > > > > Which would save quite a bit of head scratching and frustration when > > someone has a > > bad enough day to add silly typos to the kernel cmdline. > > Is there any particular reason why we have such warnings only for early > params? > early_param handlers return non-zero values on success: > linux/init.h: " * Emits warning if fn returns non-zero." > __setup handlers in most cases seem to return 1 on success, is the expected > behaviour documented somewhere? > > After looking at some of the ~500 usages of __setup macro it seems that > handler's ret > code doesn't matter so much, because it is treated differently in various > parts > of the kernel. If we make it consistent possibly it could be solved similarly > to > early params by something like this: > > diff --git a/init/main.c b/init/main.c > index b0c9d6f..261178e 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -182,8 +182,12 @@ static bool __init obsolete_checksetup(char *line) > pr_warn("Parameter %s is obsolete, ignored\n", > p->str); > return true; > - } else if (p->setup_func(line + n)) > - return true; > + } else { > + if (p->setup_func(line + n)) > + return true; > + else > + pr_warn("Malformed option '%s'\n", > line); > + } That looks sensible to me! I'd tweak the message slightly: pr_warn("error: Ignoring invalid boot parameter '%s'\n", line); to make it more clear that it's a boot option that has a problem (there are many other types of options), and to make sure the user knows that we ignored that option. Mind sending this as a proper patch, with akpm Cc:-ed? Thanks, Ingo
[PATCH v2 6+/6] platform/x86: dell-wmi-led: fix coding style issues
Fix coding style issues in dell-wmi-led which checkpatch complains about to make sure the module gets a clean start in the x86 platform driver subsystem. Signed-off-by: Michał Kępień--- This is an extra patch that Jacek asked for [1]. [1] https://lkml.org/lkml/2017/1/16/631 drivers/platform/x86/dell-wmi-led.c | 41 +++-- 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/drivers/platform/x86/dell-wmi-led.c b/drivers/platform/x86/dell-wmi-led.c index d0232c7f1909..8753c4fc36b8 100644 --- a/drivers/platform/x86/dell-wmi-led.c +++ b/drivers/platform/x86/dell-wmi-led.c @@ -46,21 +46,16 @@ struct bios_args { unsigned char off_time; }; -static int dell_led_perform_fn(u8 length, - u8 result_code, - u8 device_id, - u8 command, - u8 on_time, - u8 off_time) +static int dell_led_perform_fn(u8 length, u8 result_code, u8 device_id, + u8 command, u8 on_time, u8 off_time) { - struct bios_args *bios_return; - u8 return_code; - union acpi_object *obj; struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL }; + struct bios_args *bios_return, args; struct acpi_buffer input; + union acpi_object *obj; acpi_status status; + u8 return_code; - struct bios_args args; args.length = length; args.result_code = result_code; args.device_id = device_id; @@ -71,11 +66,7 @@ static int dell_led_perform_fn(u8 length, input.length = sizeof(struct bios_args); input.pointer = - status = wmi_evaluate_method(DELL_LED_BIOS_GUID, - 1, - 1, - , - ); + status = wmi_evaluate_method(DELL_LED_BIOS_GUID, 1, 1, , ); if (ACPI_FAILURE(status)) return status; @@ -84,7 +75,7 @@ static int dell_led_perform_fn(u8 length, if (!obj) return -EINVAL; - else if (obj->type != ACPI_TYPE_BUFFER) { + if (obj->type != ACPI_TYPE_BUFFER) { kfree(obj); return -EINVAL; } @@ -117,8 +108,7 @@ static int led_off(void) 0); /* not used */ } -static int led_blink(unsigned char on_eighths, - unsigned char off_eighths) +static int led_blink(unsigned char on_eighths, unsigned char off_eighths) { return dell_led_perform_fn(5, /* Length of command */ INTERFACE_ERROR,/* Init to INTERFACE_ERROR */ @@ -129,7 +119,7 @@ static int led_blink(unsigned char on_eighths, } static void dell_led_set(struct led_classdev *led_cdev, - enum led_brightness value) +enum led_brightness value) { if (value == LED_OFF) led_off(); @@ -138,24 +128,25 @@ static void dell_led_set(struct led_classdev *led_cdev, } static int dell_led_blink(struct led_classdev *led_cdev, - unsigned long *delay_on, - unsigned long *delay_off) + unsigned long *delay_on, unsigned long *delay_off) { unsigned long on_eighths; unsigned long off_eighths; - /* The Dell LED delay is based on 125ms intervals. - Need to round up to next interval. */ + /* +* The Dell LED delay is based on 125ms intervals. +* Need to round up to next interval. +*/ on_eighths = (*delay_on + 124) / 125; - if (0 == on_eighths) + if (on_eighths == 0) on_eighths = 1; if (on_eighths > 255) on_eighths = 255; *delay_on = on_eighths * 125; off_eighths = (*delay_off + 124) / 125; - if (0 == off_eighths) + if (off_eighths == 0) off_eighths = 1; if (off_eighths > 255) off_eighths = 255; -- 2.11.0
Re: [PATCH 1/1] x86: sanitize argument of clearcpuid command-line option
* Odzioba, Lukasz wrote: > > pr_warn("x86/cpu: Ignoring invalid "clearcpuid=%s' option!\n", arg) > > > > Which would save quite a bit of head scratching and frustration when > > someone has a > > bad enough day to add silly typos to the kernel cmdline. > > Is there any particular reason why we have such warnings only for early > params? > early_param handlers return non-zero values on success: > linux/init.h: " * Emits warning if fn returns non-zero." > __setup handlers in most cases seem to return 1 on success, is the expected > behaviour documented somewhere? > > After looking at some of the ~500 usages of __setup macro it seems that > handler's ret > code doesn't matter so much, because it is treated differently in various > parts > of the kernel. If we make it consistent possibly it could be solved similarly > to > early params by something like this: > > diff --git a/init/main.c b/init/main.c > index b0c9d6f..261178e 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -182,8 +182,12 @@ static bool __init obsolete_checksetup(char *line) > pr_warn("Parameter %s is obsolete, ignored\n", > p->str); > return true; > - } else if (p->setup_func(line + n)) > - return true; > + } else { > + if (p->setup_func(line + n)) > + return true; > + else > + pr_warn("Malformed option '%s'\n", > line); > + } That looks sensible to me! I'd tweak the message slightly: pr_warn("error: Ignoring invalid boot parameter '%s'\n", line); to make it more clear that it's a boot option that has a problem (there are many other types of options), and to make sure the user knows that we ignored that option. Mind sending this as a proper patch, with akpm Cc:-ed? Thanks, Ingo
[PATCH v2 6+/6] platform/x86: dell-wmi-led: fix coding style issues
Fix coding style issues in dell-wmi-led which checkpatch complains about to make sure the module gets a clean start in the x86 platform driver subsystem. Signed-off-by: Michał Kępień --- This is an extra patch that Jacek asked for [1]. [1] https://lkml.org/lkml/2017/1/16/631 drivers/platform/x86/dell-wmi-led.c | 41 +++-- 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/drivers/platform/x86/dell-wmi-led.c b/drivers/platform/x86/dell-wmi-led.c index d0232c7f1909..8753c4fc36b8 100644 --- a/drivers/platform/x86/dell-wmi-led.c +++ b/drivers/platform/x86/dell-wmi-led.c @@ -46,21 +46,16 @@ struct bios_args { unsigned char off_time; }; -static int dell_led_perform_fn(u8 length, - u8 result_code, - u8 device_id, - u8 command, - u8 on_time, - u8 off_time) +static int dell_led_perform_fn(u8 length, u8 result_code, u8 device_id, + u8 command, u8 on_time, u8 off_time) { - struct bios_args *bios_return; - u8 return_code; - union acpi_object *obj; struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL }; + struct bios_args *bios_return, args; struct acpi_buffer input; + union acpi_object *obj; acpi_status status; + u8 return_code; - struct bios_args args; args.length = length; args.result_code = result_code; args.device_id = device_id; @@ -71,11 +66,7 @@ static int dell_led_perform_fn(u8 length, input.length = sizeof(struct bios_args); input.pointer = - status = wmi_evaluate_method(DELL_LED_BIOS_GUID, - 1, - 1, - , - ); + status = wmi_evaluate_method(DELL_LED_BIOS_GUID, 1, 1, , ); if (ACPI_FAILURE(status)) return status; @@ -84,7 +75,7 @@ static int dell_led_perform_fn(u8 length, if (!obj) return -EINVAL; - else if (obj->type != ACPI_TYPE_BUFFER) { + if (obj->type != ACPI_TYPE_BUFFER) { kfree(obj); return -EINVAL; } @@ -117,8 +108,7 @@ static int led_off(void) 0); /* not used */ } -static int led_blink(unsigned char on_eighths, - unsigned char off_eighths) +static int led_blink(unsigned char on_eighths, unsigned char off_eighths) { return dell_led_perform_fn(5, /* Length of command */ INTERFACE_ERROR,/* Init to INTERFACE_ERROR */ @@ -129,7 +119,7 @@ static int led_blink(unsigned char on_eighths, } static void dell_led_set(struct led_classdev *led_cdev, - enum led_brightness value) +enum led_brightness value) { if (value == LED_OFF) led_off(); @@ -138,24 +128,25 @@ static void dell_led_set(struct led_classdev *led_cdev, } static int dell_led_blink(struct led_classdev *led_cdev, - unsigned long *delay_on, - unsigned long *delay_off) + unsigned long *delay_on, unsigned long *delay_off) { unsigned long on_eighths; unsigned long off_eighths; - /* The Dell LED delay is based on 125ms intervals. - Need to round up to next interval. */ + /* +* The Dell LED delay is based on 125ms intervals. +* Need to round up to next interval. +*/ on_eighths = (*delay_on + 124) / 125; - if (0 == on_eighths) + if (on_eighths == 0) on_eighths = 1; if (on_eighths > 255) on_eighths = 255; *delay_on = on_eighths * 125; off_eighths = (*delay_off + 124) / 125; - if (0 == off_eighths) + if (off_eighths == 0) off_eighths = 1; if (off_eighths > 255) off_eighths = 255; -- 2.11.0
Re: [tpmdd-devel] [PATCH RFC v3 5/5] tpm2: expose resource manager via a device link /dev/tpms
On Mon, Jan 16, 2017 at 09:28:17AM -0800, James Bottomley wrote: > On Mon, 2017-01-16 at 19:24 +0200, Jarkko Sakkinen wrote: > > On Mon, Jan 16, 2017 at 09:14:13AM -0700, Jason Gunthorpe wrote: > > > On Mon, Jan 16, 2017 at 03:12:11PM +0200, Jarkko Sakkinen wrote: > > > > > > > @@ -199,7 +227,9 @@ struct tpm_chip *tpm_chip_alloc(struct device > > > > *pdev, > > > > return chip; > > > > > > > > out: > > > > + put_device(>devrm); > > > > put_device(>dev); > > > > + put_device(>devrm); > > > > return ERR_PTR(rc); > > > > } > > > > > > Something has gone wrong here.. > > > > Not a big surprise. There were a bunch of these small patches > > that I had to squash. Thanks for spotting this. > > > > Funny that I didn't experiece any issues when I run my smoke > > tests. > > It's the error path in tpm_chip_alloc: it's incredibly hard to trigger, > which is probably why your tests didn't see it. You basically either > have to be out of kernel memory or out of TPM device numbers. Right, of course. I have branch for each new patch set version. For v3 the branch is named as tabrm3. I'll apply all the fixes as separate commits on top of that branch. Once I'm ready to send a new version, I'll create tabrm4 branch and squash the fixes. /Jarkko
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > @@ -155,6 +164,9 @@ struct lockdep_map { > > > int cpu; > > > unsigned long ip; > > > #endif > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE > > > + struct cross_lock *xlock; > > > +#endif > > > > The use of this escapes me; why does the lockdep_map need a pointer to > > this? > > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit() > use lockdep_map as an arg, but crossrelease need to extract cross_lock > instances from that. > > Why not do something like: > > > > struct lockdep_map_cross { > > struct lockdep_map map; > > struct held_lockhlock; > > } Using a structure like that, you can pass lockdep_map_cross around just fine, since the lockdep_map is the first member, so the pointers are interchangeable. At worst we might need to munge a few typecasts. But then the cross release code can simply cast to the bigger type and have access to the extra data it knows to be there.
Re: [tpmdd-devel] [PATCH RFC v3 5/5] tpm2: expose resource manager via a device link /dev/tpms
On Mon, Jan 16, 2017 at 09:28:17AM -0800, James Bottomley wrote: > On Mon, 2017-01-16 at 19:24 +0200, Jarkko Sakkinen wrote: > > On Mon, Jan 16, 2017 at 09:14:13AM -0700, Jason Gunthorpe wrote: > > > On Mon, Jan 16, 2017 at 03:12:11PM +0200, Jarkko Sakkinen wrote: > > > > > > > @@ -199,7 +227,9 @@ struct tpm_chip *tpm_chip_alloc(struct device > > > > *pdev, > > > > return chip; > > > > > > > > out: > > > > + put_device(>devrm); > > > > put_device(>dev); > > > > + put_device(>devrm); > > > > return ERR_PTR(rc); > > > > } > > > > > > Something has gone wrong here.. > > > > Not a big surprise. There were a bunch of these small patches > > that I had to squash. Thanks for spotting this. > > > > Funny that I didn't experiece any issues when I run my smoke > > tests. > > It's the error path in tpm_chip_alloc: it's incredibly hard to trigger, > which is probably why your tests didn't see it. You basically either > have to be out of kernel memory or out of TPM device numbers. Right, of course. I have branch for each new patch set version. For v3 the branch is named as tabrm3. I'll apply all the fixes as separate commits on top of that branch. Once I'm ready to send a new version, I'll create tabrm4 branch and squash the fixes. /Jarkko
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > @@ -155,6 +164,9 @@ struct lockdep_map { > > > int cpu; > > > unsigned long ip; > > > #endif > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE > > > + struct cross_lock *xlock; > > > +#endif > > > > The use of this escapes me; why does the lockdep_map need a pointer to > > this? > > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit() > use lockdep_map as an arg, but crossrelease need to extract cross_lock > instances from that. > > Why not do something like: > > > > struct lockdep_map_cross { > > struct lockdep_map map; > > struct held_lockhlock; > > } Using a structure like that, you can pass lockdep_map_cross around just fine, since the lockdep_map is the first member, so the pointers are interchangeable. At worst we might need to munge a few typecasts. But then the cross release code can simply cast to the bigger type and have access to the extra data it knows to be there.
Re: [GIT PULL 4/4] ARM: defconfig: exynos: Defconfig for v4.11
On Fri, Jan 13, 2017 at 05:20:30PM +0200, Krzysztof Kozlowski wrote: > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-defconfig-4.11 > > for you to fetch changes up to 74b0ee7579f279dae053053008d29f10f3890c07: > > ARM: exynos_defconfig: Enable IP multicast (2016-12-29 16:04:57 +0200) > > > Samsung defconfig update for v4.11: > 1. Cleanup from old MACHs in s5pv210. > 2. Enable IP_MULTICAST for libnss-mdns. > > > Krzysztof Kozlowski (1): > ARM: s5pv210_defconfig: Remove old MACHs > > Robie Basak (1): > ARM: exynos_defconfig: Enable IP multicast Merged, thanks. -Olof
Re: [GIT PULL 4/4] ARM: defconfig: exynos: Defconfig for v4.11
On Fri, Jan 13, 2017 at 05:20:30PM +0200, Krzysztof Kozlowski wrote: > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-defconfig-4.11 > > for you to fetch changes up to 74b0ee7579f279dae053053008d29f10f3890c07: > > ARM: exynos_defconfig: Enable IP multicast (2016-12-29 16:04:57 +0200) > > > Samsung defconfig update for v4.11: > 1. Cleanup from old MACHs in s5pv210. > 2. Enable IP_MULTICAST for libnss-mdns. > > > Krzysztof Kozlowski (1): > ARM: s5pv210_defconfig: Remove old MACHs > > Robie Basak (1): > ARM: exynos_defconfig: Enable IP multicast Merged, thanks. -Olof
Re: [GIT PULL 2/4] ARM: dts: exynos: DT for v4.11
On Fri, Jan 13, 2017 at 05:20:28PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Few improvements and Exynos4412 DTSI removal. > > This pulls external dependecy from clock tree (one clock commit). > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-dt-4.11 > > for you to fetch changes up to bca9085e0ae93253bc93ce218c85ac7d7e7f1831: > > ARM: dts: exynos: remove Exynos4212 support (dead code) (2017-01-11 > 18:28:43 +0200) > > > Samsung DeviceTree update for v4.11: > 1. Fixes for initial audio clocks configuration. > 2. Enable sound on Odroid-X board. > 3. Enable DMA for UART modules on Exynos5 SoCs. > 4. Add CPU OPPs for Exynos4412 Prime (newer version of Exynos4412). This pulls >necessary change in the clocks. > 5. Remove Exynos4212. We do not have any mainline boards with it. This will >simplify few bits later. Merged, thanks. -Olof
Re: [GIT PULL 2/4] ARM: dts: exynos: DT for v4.11
On Fri, Jan 13, 2017 at 05:20:28PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Few improvements and Exynos4412 DTSI removal. > > This pulls external dependecy from clock tree (one clock commit). > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-dt-4.11 > > for you to fetch changes up to bca9085e0ae93253bc93ce218c85ac7d7e7f1831: > > ARM: dts: exynos: remove Exynos4212 support (dead code) (2017-01-11 > 18:28:43 +0200) > > > Samsung DeviceTree update for v4.11: > 1. Fixes for initial audio clocks configuration. > 2. Enable sound on Odroid-X board. > 3. Enable DMA for UART modules on Exynos5 SoCs. > 4. Add CPU OPPs for Exynos4412 Prime (newer version of Exynos4412). This pulls >necessary change in the clocks. > 5. Remove Exynos4212. We do not have any mainline boards with it. This will >simplify few bits later. Merged, thanks. -Olof
Re: [PATCH 2/3] Make static usermode helper binaries constant
On Mon, Jan 16, 2017 at 04:25:55PM -0500, J. Bruce Fields wrote: > On Mon, Jan 16, 2017 at 05:50:31PM +0100, Greg KH wrote: > > From: Greg Kroah-Hartman> > > > There are a number of usermode helper binaries that are "hard coded" in > > the kernel today, so mark them as "const" to make it harder for someone > > to change where the variables point to. > > > ... > > --- a/drivers/pnp/pnpbios/core.c > > +++ b/drivers/pnp/pnpbios/core.c > > @@ -98,6 +98,7 @@ static struct completion unload_sem; > > */ > > static int pnp_dock_event(int dock, struct pnp_docking_station_info *info) > > { > > + static char const sbin_pnpbios[] = "/sbin/pnpbios"; > > char *argv[3], **envp, *buf, *scratch; > > int i = 0, value; > > > > @@ -112,7 +113,7 @@ static int pnp_dock_event(int dock, struct > > pnp_docking_station_info *info) > > * integrated into the driver core and use the usual infrastructure > > * like sysfs and uevents > > */ > > - argv[0] = "/sbin/pnpbios"; > > + argv[0] = (char *)sbin_pnpbios; > > So here and elsewhere, can attackers write to argv[0] instead of to the > memory where the string lives? Yes, they could, it would be a very "tight" race to do that (have to write after the assignment and before the call_usermodehelper_exec() runs). However, the kernel does not run argv[0], it just passes it to the binary you specify in path, so for this example, the correct program would still be run by the kernel. But, if you do worry about this type of attack, then enable the option I created in patch 3/3 here, which will funnel all calls into a single userspace binary where you can then filter on argv[0] to see if you want to run the binary or not to prevent this type of attack. > Apologies if I'm rehashing earlier discussion, I did a quick search of > archives but could easily have missed something. No problem at all, hopefully I've explained it better now. thanks, greg k-h
Re: [PATCH 2/3] Make static usermode helper binaries constant
On Mon, Jan 16, 2017 at 04:25:55PM -0500, J. Bruce Fields wrote: > On Mon, Jan 16, 2017 at 05:50:31PM +0100, Greg KH wrote: > > From: Greg Kroah-Hartman > > > > There are a number of usermode helper binaries that are "hard coded" in > > the kernel today, so mark them as "const" to make it harder for someone > > to change where the variables point to. > > > ... > > --- a/drivers/pnp/pnpbios/core.c > > +++ b/drivers/pnp/pnpbios/core.c > > @@ -98,6 +98,7 @@ static struct completion unload_sem; > > */ > > static int pnp_dock_event(int dock, struct pnp_docking_station_info *info) > > { > > + static char const sbin_pnpbios[] = "/sbin/pnpbios"; > > char *argv[3], **envp, *buf, *scratch; > > int i = 0, value; > > > > @@ -112,7 +113,7 @@ static int pnp_dock_event(int dock, struct > > pnp_docking_station_info *info) > > * integrated into the driver core and use the usual infrastructure > > * like sysfs and uevents > > */ > > - argv[0] = "/sbin/pnpbios"; > > + argv[0] = (char *)sbin_pnpbios; > > So here and elsewhere, can attackers write to argv[0] instead of to the > memory where the string lives? Yes, they could, it would be a very "tight" race to do that (have to write after the assignment and before the call_usermodehelper_exec() runs). However, the kernel does not run argv[0], it just passes it to the binary you specify in path, so for this example, the correct program would still be run by the kernel. But, if you do worry about this type of attack, then enable the option I created in patch 3/3 here, which will funnel all calls into a single userspace binary where you can then filter on argv[0] to see if you want to run the binary or not to prevent this type of attack. > Apologies if I'm rehashing earlier discussion, I did a quick search of > archives but could easily have missed something. No problem at all, hopefully I've explained it better now. thanks, greg k-h
Re: [GIT PULL] STi DT update for v4.11 round 1
On Thu, Jan 12, 2017 at 05:59:43PM +0100, Patrice Chotard wrote: > Hi Arnd, Kevin, Olof > > PLease consider this first round of STi dts update for v4.11 : > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti.git > tags/sti-dt-for-v4.11 > > for you to fetch changes up to 2016ead446b98c42dffd9b6c03ce813e5cb3b810: > > ARM: dts: STiH407-family: Supply Mailbox properties to delta RProc > (2017-01-12 17:23:39 +0100) > > > STi dts update: > > Enable High Quality Video Data Plane (HQVDP) DT entry > Add DELTA V4L2 video decoder DT entry > Disable unused fdma instances > Fix sti-display-subsystem wrong clock parent's value > Cleanup and update DT entries related to remoteproc Merged, thanks! -Olof
Re: [GIT PULL 1/4] ARM: exynos: Mach/soc for v4.11
On Fri, Jan 13, 2017 at 05:20:27PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Mostly cleanups for v4.11. > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-soc-4.11 > > for you to fetch changes up to cda1a52dab50340728e46601e6c9da9fc4beaf1f: > > ARM: s3c64xx: Constify wake_irqs (2016-12-29 15:41:44 +0200) > > > Samsung mach/soc update for v4.11. Mostly cleanups: > 1. Removal of unused platform data in S3C24XX and S3C64xx as follow up of >conversion to new DMA channel request API. > 2. Adding const and __ro_after_init to various data in Samsung platforms. > > > Krzysztof Kozlowski (7): > ARM: EXYNOS: Constify list of retention registers > ARM: EXYNOS: Annotate iomem and pm_data pointers __ro_after_init > ARM: s3c24xx: Constify few integer tables > ARM: s3c64xx: Annotate external clock frequencies __ro_after_init > ARM: SAMSUNG: Constify array of wake irqs passed to > samsung_sync_wakemask > ARM: s3c24xx: Constify wake_irqs > ARM: s3c64xx: Constify wake_irqs > > Sylwester Nawrocki (2): > ARM: s3c64xx: Drop initialization of unused struct s3c_audio_pdata > fields > ARM: s3c24xx: Drop unused struct s3c_audio_pdata entries Merged, thanks. -Olof
Re: [GIT PULL] STi DT update for v4.11 round 1
On Thu, Jan 12, 2017 at 05:59:43PM +0100, Patrice Chotard wrote: > Hi Arnd, Kevin, Olof > > PLease consider this first round of STi dts update for v4.11 : > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti.git > tags/sti-dt-for-v4.11 > > for you to fetch changes up to 2016ead446b98c42dffd9b6c03ce813e5cb3b810: > > ARM: dts: STiH407-family: Supply Mailbox properties to delta RProc > (2017-01-12 17:23:39 +0100) > > > STi dts update: > > Enable High Quality Video Data Plane (HQVDP) DT entry > Add DELTA V4L2 video decoder DT entry > Disable unused fdma instances > Fix sti-display-subsystem wrong clock parent's value > Cleanup and update DT entries related to remoteproc Merged, thanks! -Olof
Re: [GIT PULL 1/4] ARM: exynos: Mach/soc for v4.11
On Fri, Jan 13, 2017 at 05:20:27PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Mostly cleanups for v4.11. > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-soc-4.11 > > for you to fetch changes up to cda1a52dab50340728e46601e6c9da9fc4beaf1f: > > ARM: s3c64xx: Constify wake_irqs (2016-12-29 15:41:44 +0200) > > > Samsung mach/soc update for v4.11. Mostly cleanups: > 1. Removal of unused platform data in S3C24XX and S3C64xx as follow up of >conversion to new DMA channel request API. > 2. Adding const and __ro_after_init to various data in Samsung platforms. > > > Krzysztof Kozlowski (7): > ARM: EXYNOS: Constify list of retention registers > ARM: EXYNOS: Annotate iomem and pm_data pointers __ro_after_init > ARM: s3c24xx: Constify few integer tables > ARM: s3c64xx: Annotate external clock frequencies __ro_after_init > ARM: SAMSUNG: Constify array of wake irqs passed to > samsung_sync_wakemask > ARM: s3c24xx: Constify wake_irqs > ARM: s3c64xx: Constify wake_irqs > > Sylwester Nawrocki (2): > ARM: s3c64xx: Drop initialization of unused struct s3c_audio_pdata > fields > ARM: s3c24xx: Drop unused struct s3c_audio_pdata entries Merged, thanks. -Olof
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > + > > > + /* > > > + * Whenever irq happens, these are updated so that we can > > > + * distinguish each irq context uniquely. > > > + */ > > > + unsigned inthardirq_id; > > > + unsigned intsoftirq_id; > > > > An alternative approach would be to 'unwind' or discard all historical > > events from a nested context once we exit it. > > That's one of what I considered. However, it would make code complex to > detect if pend_lock ring buffer was wrapped. I'm not sure I see the need for detecting that... > > > > After all, all we care about is the history of the release context, once > > the context is gone, we don't care. > > We must care it and decide if the next plock in the ring buffer might be > valid one or not. So I was thinking this was an overwriting ring buffer; something like so: struct pend_lock plocks[64]; unsigned int plocks_idx; static void plocks_add(..) { unsigned int idx = (plocks_idx++) % 64; plocks[idx] = ...; } static void plocks_close_context(int ctx) { for (i = 0; i < 64; i++) { int idx = (plocks_idx - 1) % 64; if (plocks[idx].ctx != ctx) break; plocks_idx--; } } Similarly for the release, it need only look at 64 entries and terminate early if the generation number is too old. static void plocks_release(unsigned int gen) { for (i = 0; i < 64; i++) { int idx = (plocks_idx - 1 - i) % 64; if ((int)(plocks[idx].gen_id - gen) < 0) break; /* do release muck */ } }
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > + > > > + /* > > > + * Whenever irq happens, these are updated so that we can > > > + * distinguish each irq context uniquely. > > > + */ > > > + unsigned inthardirq_id; > > > + unsigned intsoftirq_id; > > > > An alternative approach would be to 'unwind' or discard all historical > > events from a nested context once we exit it. > > That's one of what I considered. However, it would make code complex to > detect if pend_lock ring buffer was wrapped. I'm not sure I see the need for detecting that... > > > > After all, all we care about is the history of the release context, once > > the context is gone, we don't care. > > We must care it and decide if the next plock in the ring buffer might be > valid one or not. So I was thinking this was an overwriting ring buffer; something like so: struct pend_lock plocks[64]; unsigned int plocks_idx; static void plocks_add(..) { unsigned int idx = (plocks_idx++) % 64; plocks[idx] = ...; } static void plocks_close_context(int ctx) { for (i = 0; i < 64; i++) { int idx = (plocks_idx - 1) % 64; if (plocks[idx].ctx != ctx) break; plocks_idx--; } } Similarly for the release, it need only look at 64 entries and terminate early if the generation number is too old. static void plocks_release(unsigned int gen) { for (i = 0; i < 64; i++) { int idx = (plocks_idx - 1 - i) % 64; if ((int)(plocks[idx].gen_id - gen) < 0) break; /* do release muck */ } }
Re: [GIT PULL 3/4] arm64: dts: exynos: DT for v4.11
On Fri, Jan 13, 2017 at 05:20:29PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Important: > 1. Unlike others, this is rebased on v4.10-rc2 because v4.10-rc1 was not >building on arm64. > 2. This contains pinctrl change (in header), acked by Linus Walleij. >Although I kept it on separate branch and merged here, but I did not >prepare a separate pull request for this one pinctrl change - it is >just one patch. Anyway it sits on separate branch so this is still >possible if you need it. > > Best regards, > Krzysztof > > Cc: Linus Walleij> > The following changes since commit 0c744ea4f77d72b3dcebb7a8f2684633ec79be88: > > Linux 4.10-rc2 (2017-01-01 14:31:53 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-dt64-4.11 > > for you to fetch changes up to e4e381133241a27d732e78be09973b89a193eaf7: > > arm64: dts: exynos: Enable HDMI/TV path on Exynos5433-TM2 (2017-01-11 > 18:20:28 +0200) > > > Samsung DeviceTree ARM64 update for v4.11: > 1. Add bus frequency and voltage scalling on Exynos5433 TM2 device (along with >necessary bus nodes and Platform Performance Monitoring Unit on > Exynos5433). > 2. Use macros for pinctrl settings on Exynos5433. >This contains necessary header with bindings. > 3. Minor cleanups in Exynos5433 DTSI and boards using it. > 4. Create common DTSI betweem Exynos5433 TM2E and TM2E. > 5. Add HDMI/TV to Exynos5433 TM2. Merged, thanks. -Olof
Re: [GIT PULL 3/4] arm64: dts: exynos: DT for v4.11
On Fri, Jan 13, 2017 at 05:20:29PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Important: > 1. Unlike others, this is rebased on v4.10-rc2 because v4.10-rc1 was not >building on arm64. > 2. This contains pinctrl change (in header), acked by Linus Walleij. >Although I kept it on separate branch and merged here, but I did not >prepare a separate pull request for this one pinctrl change - it is >just one patch. Anyway it sits on separate branch so this is still >possible if you need it. > > Best regards, > Krzysztof > > Cc: Linus Walleij > > The following changes since commit 0c744ea4f77d72b3dcebb7a8f2684633ec79be88: > > Linux 4.10-rc2 (2017-01-01 14:31:53 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-dt64-4.11 > > for you to fetch changes up to e4e381133241a27d732e78be09973b89a193eaf7: > > arm64: dts: exynos: Enable HDMI/TV path on Exynos5433-TM2 (2017-01-11 > 18:20:28 +0200) > > > Samsung DeviceTree ARM64 update for v4.11: > 1. Add bus frequency and voltage scalling on Exynos5433 TM2 device (along with >necessary bus nodes and Platform Performance Monitoring Unit on > Exynos5433). > 2. Use macros for pinctrl settings on Exynos5433. >This contains necessary header with bindings. > 3. Minor cleanups in Exynos5433 DTSI and boards using it. > 4. Create common DTSI betweem Exynos5433 TM2E and TM2E. > 5. Add HDMI/TV to Exynos5433 TM2. Merged, thanks. -Olof
Re: [PATCH] usb: dwc3: ep0: Fix the possible missed request for handling delay STATUS phase
Hi, On 16 January 2017 at 20:06, Felipe Balbiwrote: > > Hi, > > Baolin Wang writes: >> Hi, >> >> On 16 January 2017 at 19:29, Felipe Balbi wrote: >>> >>> Hi, >>> >>> Baolin Wang writes: Hi, On 16 January 2017 at 18:56, Felipe Balbi wrote: > > Hi, > > Baolin Wang writes: >> When handing the SETUP packet by composite_setup(), we will release the >> dwc->lock. If we get the 'USB_GADGET_DELAYED_STATUS' result from setup >> function, which means we need to delay handling the STATUS phase. > > this sentence needs a little work. Seems like it's missing some > information. > > anyway, I get that we release the lock but... > >> But during the lock release period, maybe the request for handling delay > > execution of ->setup() itself should be locked. I can see that it's only > locked for set_config() which is rather peculiar. > > What exact request you had when you triggered this? (Hint: dwc3 > tracepoints print out ctrl request bytes). IIRC, only set_config() or > f->set_alt() can actually return USB_GADGET_DELAYED_STATUS. Yes, when host set configuration for mass storage driver, it can produce this issue. > > Which gadget driver were you using when you triggered this? mass storage driver. When host issues the setting config request, we will get USB_GADGET_DELAYED_STATUS result from set_config()--->fsg_set_alt(). Then the mass storage driver will issue one thread to complete the status stage by ep0_queue() (this thread may be running on another core), then if the thread issues ep0_queue() too fast before we get the dwc->lock in dwc3_ep0_delegate_req() or before we get into the STATUS stage, then we can not handle this request for the delay STATUS stage in dwc3_gadget_ep0_queue(). > > Another point here is that the really robust way of fixing this is to > get rid of USB_GADGET_DELAYED_STATUS altogether and just make sure > gadget drivers know how to queue requests for all three phases of a > Control Transfer. > > A lot of code will be removed from all gadget drivers and UDC drivers > while combining all of it in a single place in composite.c. > > The reason I'm saying this is that other UDC drivers might have similar > races already but they just haven't triggered. Yes, maybe. > >> STATUS phase has been queued into list before we set >> 'dwc->delayed_status' >> flag or entering 'EP0_STATUS_PHASE' phase, then we will miss the chance >> to handle the STATUS phase. Thus we should check if the request for delay >> STATUS phase has been enqueued when entering 'EP0_STATUS_PHASE' phase in >> dwc3_ep0_xfernotready(), if so, we should handle it. >> >> Signed-off-by: Baolin Wang >> --- >> drivers/usb/dwc3/ep0.c | 14 ++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c >> index 9bb1f85..e689ced 100644 >> --- a/drivers/usb/dwc3/ep0.c >> +++ b/drivers/usb/dwc3/ep0.c >> @@ -1123,7 +1123,21 @@ static void dwc3_ep0_xfernotready(struct dwc3 >> *dwc, >> dwc->ep0state = EP0_STATUS_PHASE; >> >> if (dwc->delayed_status) { >> + struct dwc3_ep *dep = dwc->eps[0]; >> + >> WARN_ON_ONCE(event->endpoint_number != 1); >> + /* >> + * We should handle the delay STATUS phase here if >> the >> + * request for handling delay STATUS has been >> queued >> + * into the list. >> + */ >> + if (!list_empty(>pending_list)) { >> + dwc->delayed_status = false; >> + usb_gadget_set_state(>gadget, >> + USB_STATE_CONFIGURED); > > Isn't this patch also changing the normal case when usb_ep_queue() comes > later? I guess list_empty() protects against that... I think it will not change other cases, we only handle the delayed status and I've tested it for a while and I did not find any problem. >>> >>> Alright, it's important enough to fix this bug. Can you also have a look >>> into dropping USB_GADGET_DELAYED_STATUS altogether? If you're too busy, >>> no issues. It'll stay in my queue. >> >> Okay, I will have a look at f_mass_storage driver to see if we can >> drop USB_GADGET_DELAYED_STATUS. Thanks. > > not only mass storage. It needs to be done for all drivers. The way to > do that is to teach functions that control transfers are composed of
Re: [PATCH] usb: dwc3: ep0: Fix the possible missed request for handling delay STATUS phase
Hi, On 16 January 2017 at 20:06, Felipe Balbi wrote: > > Hi, > > Baolin Wang writes: >> Hi, >> >> On 16 January 2017 at 19:29, Felipe Balbi wrote: >>> >>> Hi, >>> >>> Baolin Wang writes: Hi, On 16 January 2017 at 18:56, Felipe Balbi wrote: > > Hi, > > Baolin Wang writes: >> When handing the SETUP packet by composite_setup(), we will release the >> dwc->lock. If we get the 'USB_GADGET_DELAYED_STATUS' result from setup >> function, which means we need to delay handling the STATUS phase. > > this sentence needs a little work. Seems like it's missing some > information. > > anyway, I get that we release the lock but... > >> But during the lock release period, maybe the request for handling delay > > execution of ->setup() itself should be locked. I can see that it's only > locked for set_config() which is rather peculiar. > > What exact request you had when you triggered this? (Hint: dwc3 > tracepoints print out ctrl request bytes). IIRC, only set_config() or > f->set_alt() can actually return USB_GADGET_DELAYED_STATUS. Yes, when host set configuration for mass storage driver, it can produce this issue. > > Which gadget driver were you using when you triggered this? mass storage driver. When host issues the setting config request, we will get USB_GADGET_DELAYED_STATUS result from set_config()--->fsg_set_alt(). Then the mass storage driver will issue one thread to complete the status stage by ep0_queue() (this thread may be running on another core), then if the thread issues ep0_queue() too fast before we get the dwc->lock in dwc3_ep0_delegate_req() or before we get into the STATUS stage, then we can not handle this request for the delay STATUS stage in dwc3_gadget_ep0_queue(). > > Another point here is that the really robust way of fixing this is to > get rid of USB_GADGET_DELAYED_STATUS altogether and just make sure > gadget drivers know how to queue requests for all three phases of a > Control Transfer. > > A lot of code will be removed from all gadget drivers and UDC drivers > while combining all of it in a single place in composite.c. > > The reason I'm saying this is that other UDC drivers might have similar > races already but they just haven't triggered. Yes, maybe. > >> STATUS phase has been queued into list before we set >> 'dwc->delayed_status' >> flag or entering 'EP0_STATUS_PHASE' phase, then we will miss the chance >> to handle the STATUS phase. Thus we should check if the request for delay >> STATUS phase has been enqueued when entering 'EP0_STATUS_PHASE' phase in >> dwc3_ep0_xfernotready(), if so, we should handle it. >> >> Signed-off-by: Baolin Wang >> --- >> drivers/usb/dwc3/ep0.c | 14 ++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c >> index 9bb1f85..e689ced 100644 >> --- a/drivers/usb/dwc3/ep0.c >> +++ b/drivers/usb/dwc3/ep0.c >> @@ -1123,7 +1123,21 @@ static void dwc3_ep0_xfernotready(struct dwc3 >> *dwc, >> dwc->ep0state = EP0_STATUS_PHASE; >> >> if (dwc->delayed_status) { >> + struct dwc3_ep *dep = dwc->eps[0]; >> + >> WARN_ON_ONCE(event->endpoint_number != 1); >> + /* >> + * We should handle the delay STATUS phase here if >> the >> + * request for handling delay STATUS has been >> queued >> + * into the list. >> + */ >> + if (!list_empty(>pending_list)) { >> + dwc->delayed_status = false; >> + usb_gadget_set_state(>gadget, >> + USB_STATE_CONFIGURED); > > Isn't this patch also changing the normal case when usb_ep_queue() comes > later? I guess list_empty() protects against that... I think it will not change other cases, we only handle the delayed status and I've tested it for a while and I did not find any problem. >>> >>> Alright, it's important enough to fix this bug. Can you also have a look >>> into dropping USB_GADGET_DELAYED_STATUS altogether? If you're too busy, >>> no issues. It'll stay in my queue. >> >> Okay, I will have a look at f_mass_storage driver to see if we can >> drop USB_GADGET_DELAYED_STATUS. Thanks. > > not only mass storage. It needs to be done for all drivers. The way to > do that is to teach functions that control transfers are composed of two > or three phases. If you look at UDC drivers today, they all have > peculiarities about control transfers to handle stuff that *maybe* > gadget
Re: [PATCH 3/3] Reverted "mm: bail out in shrink_inactive_list()"
On Mon, Jan 16, 2017 at 08:33:17PM +0100, Michal Hocko wrote: > From: Michal Hocko> > This reverts 91dcade47a3d0e7c31464ef05f56c08e92a0e9c2. > inactive_reclaimable_pages shouldn't be needed anymore since that > get_scan_count is aware of the eligble zones ("mm, vmscan: consider > eligible zones in get_scan_count"). > > Signed-off-by: Michal Hocko Acked-by: Minchan Kim
Re: [PATCH 3/3] Reverted "mm: bail out in shrink_inactive_list()"
On Mon, Jan 16, 2017 at 08:33:17PM +0100, Michal Hocko wrote: > From: Michal Hocko > > This reverts 91dcade47a3d0e7c31464ef05f56c08e92a0e9c2. > inactive_reclaimable_pages shouldn't be needed anymore since that > get_scan_count is aware of the eligble zones ("mm, vmscan: consider > eligible zones in get_scan_count"). > > Signed-off-by: Michal Hocko Acked-by: Minchan Kim
Re: [PATCH 1/3] mm, vmscan: cleanup lru size claculations
On Mon, Jan 16, 2017 at 08:33:15PM +0100, Michal Hocko wrote: > From: Michal Hocko> > lruvec_lru_size returns the full size of the LRU list while we sometimes > need a value reduced only to eligible zones (e.g. for lowmem requests). > inactive_list_is_low is one such user. Later patches will add more of > them. Add a new parameter to lruvec_lru_size and allow it filter out > zones which are not eligible for the given context. > > Acked-by: Johannes Weiner > Signed-off-by: Michal Hocko > --- > include/linux/mmzone.h | 2 +- > mm/vmscan.c| 88 > -- > mm/workingset.c| 2 +- > 3 files changed, 45 insertions(+), 47 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index d1d440cff60e..91f69aa0d581 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -780,7 +780,7 @@ static inline struct pglist_data *lruvec_pgdat(struct > lruvec *lruvec) > #endif > } > > -extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list > lru); > +extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list > lru, int zone_idx); > > #ifdef CONFIG_HAVE_MEMORY_PRESENT > void memory_present(int nid, unsigned long start, unsigned long end); > diff --git a/mm/vmscan.c b/mm/vmscan.c > index cf940af609fd..1cb0ebdef305 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -234,22 +234,38 @@ bool pgdat_reclaimable(struct pglist_data *pgdat) > pgdat_reclaimable_pages(pgdat) * 6; > } > > -unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru) > +/** lruvec_lru_size - Returns the number of pages on the given LRU list. minor: /* * lruvec_lru_size I don't have any preferance but just found. Otherwise, Acked-by: Minchan Kim
Re: [PATCH 1/3] mm, vmscan: cleanup lru size claculations
On Mon, Jan 16, 2017 at 08:33:15PM +0100, Michal Hocko wrote: > From: Michal Hocko > > lruvec_lru_size returns the full size of the LRU list while we sometimes > need a value reduced only to eligible zones (e.g. for lowmem requests). > inactive_list_is_low is one such user. Later patches will add more of > them. Add a new parameter to lruvec_lru_size and allow it filter out > zones which are not eligible for the given context. > > Acked-by: Johannes Weiner > Signed-off-by: Michal Hocko > --- > include/linux/mmzone.h | 2 +- > mm/vmscan.c| 88 > -- > mm/workingset.c| 2 +- > 3 files changed, 45 insertions(+), 47 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index d1d440cff60e..91f69aa0d581 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -780,7 +780,7 @@ static inline struct pglist_data *lruvec_pgdat(struct > lruvec *lruvec) > #endif > } > > -extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list > lru); > +extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list > lru, int zone_idx); > > #ifdef CONFIG_HAVE_MEMORY_PRESENT > void memory_present(int nid, unsigned long start, unsigned long end); > diff --git a/mm/vmscan.c b/mm/vmscan.c > index cf940af609fd..1cb0ebdef305 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -234,22 +234,38 @@ bool pgdat_reclaimable(struct pglist_data *pgdat) > pgdat_reclaimable_pages(pgdat) * 6; > } > > -unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru) > +/** lruvec_lru_size - Returns the number of pages on the given LRU list. minor: /* * lruvec_lru_size I don't have any preferance but just found. Otherwise, Acked-by: Minchan Kim
[PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in a deadlock, as the author "Tariq Saeed" realized shortly after the patch was merged. The discussion happened here (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). The reason why taking cluster inode lock at vfs entry points opens up a self deadlock window, is explained in the previous patch of this series. So far, we have seen two different code paths that have this issue. 1. do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== take PR generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== take PR 2. fchmod|fchmodat chmod_common notify_change ocfs2_setattr <=== take EX posix_acl_chmod get_acl ocfs2_iop_get_acl <=== take PR ocfs2_iop_set_acl <=== take EX Fixes them by adding the tracking logic (in the previous patch) for these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), ocfs2_setattr(). Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Signed-off-by: Eric Ren --- fs/ocfs2/acl.c | 29 + fs/ocfs2/file.c | 58 - 2 files changed, 58 insertions(+), 29 deletions(-) diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..dc22ba8 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) { struct buffer_head *bh = NULL; - int status = 0; + int status, had_lock; + struct ocfs2_lock_holder oh; - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); - return status; - } + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) + return had_lock; status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); - ocfs2_inode_unlock(inode, 1); + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); brelse(bh); return status; } @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) struct ocfs2_super *osb; struct buffer_head *di_bh = NULL; struct posix_acl *acl; - int ret; + int had_lock; + struct ocfs2_lock_holder oh; osb = OCFS2_SB(inode->i_sb); if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) return NULL; - ret = ocfs2_inode_lock(inode, _bh, 0); - if (ret < 0) { - if (ret != -ENOENT) - mlog_errno(ret); - return ERR_PTR(ret); - } + + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); + if (had_lock < 0) + return ERR_PTR(had_lock); acl = ocfs2_get_acl_nolock(inode, type, di_bh); - ocfs2_inode_unlock(inode, 0); + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); brelse(di_bh); return acl; } diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index c488965..7b6a146 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) handle_t *handle = NULL; struct dquot *transfer_to[MAXQUOTAS] = { }; int qtype; + int had_lock; + struct ocfs2_lock_holder oh; trace_ocfs2_setattr(inode, dentry, (unsigned long long)OCFS2_I(inode)->ip_blkno, @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } } - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) { + status = had_lock; goto bail_unlock_rw; + } else if (had_lock) { + /* +* As far as we know, ocfs2_setattr() could only be the first +* VFS entry point in the call chain of recursive cluster +* locking issue. +* +* For instance: +* chmod_common() +* notify_change() +* ocfs2_setattr() +
[PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); + INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + INIT_LIST_HEAD(>oh_list); + oh->oh_owner_pid = get_pid(task_pid(current)); + + spin_lock(>l_lock); + list_add_tail(>oh_list, >l_holders); + spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + spin_lock(>l_lock); + list_del(>oh_list); + spin_unlock(>l_lock); + + put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_lock_holder *oh; + struct pid *pid; + + /* look in the list of holders for one with the current task as owner */ + spin_lock(>l_lock); + pid = task_pid(current); + list_for_each_entry(oh, >l_holders, oh_list) { + if (oh->oh_owner_pid == pid)
[PATCH v3 0/2] fix deadlock caused by recursive cluster locking
This is a formal patch set v2 to solve the deadlock issue on which I previously started a RFC (draft patch), and the discussion happened here: [https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html] Compared to the previous draft patch, this one is much simple and neat. It neither messes up the dlmglue core, nor has a performance penalty on the whole cluster locking system. Instead, it is only used in places where such recursive cluster locking may happen. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qiand Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Your comments and feedbacks are always welcomed. Eric Ren (2): ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock ocfs2: fix deadlock issue when taking inode lock at vfs entry points fs/ocfs2/acl.c | 29 +++ fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/file.c| 58 ++--- fs/ocfs2/ocfs2.h | 1 + 5 files changed, 179 insertions(+), 32 deletions(-) -- 2.10.2
[PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in a deadlock, as the author "Tariq Saeed" realized shortly after the patch was merged. The discussion happened here (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html). The reason why taking cluster inode lock at vfs entry points opens up a self deadlock window, is explained in the previous patch of this series. So far, we have seen two different code paths that have this issue. 1. do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== take PR generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== take PR 2. fchmod|fchmodat chmod_common notify_change ocfs2_setattr <=== take EX posix_acl_chmod get_acl ocfs2_iop_get_acl <=== take PR ocfs2_iop_set_acl <=== take EX Fixes them by adding the tracking logic (in the previous patch) for these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(), ocfs2_setattr(). Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Signed-off-by: Eric Ren --- fs/ocfs2/acl.c | 29 + fs/ocfs2/file.c | 58 - 2 files changed, 58 insertions(+), 29 deletions(-) diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..dc22ba8 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) { struct buffer_head *bh = NULL; - int status = 0; + int status, had_lock; + struct ocfs2_lock_holder oh; - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); - return status; - } + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) + return had_lock; status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); - ocfs2_inode_unlock(inode, 1); + ocfs2_inode_unlock_tracker(inode, 1, , had_lock); brelse(bh); return status; } @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) struct ocfs2_super *osb; struct buffer_head *di_bh = NULL; struct posix_acl *acl; - int ret; + int had_lock; + struct ocfs2_lock_holder oh; osb = OCFS2_SB(inode->i_sb); if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) return NULL; - ret = ocfs2_inode_lock(inode, _bh, 0); - if (ret < 0) { - if (ret != -ENOENT) - mlog_errno(ret); - return ERR_PTR(ret); - } + + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, ); + if (had_lock < 0) + return ERR_PTR(had_lock); acl = ocfs2_get_acl_nolock(inode, type, di_bh); - ocfs2_inode_unlock(inode, 0); + ocfs2_inode_unlock_tracker(inode, 0, , had_lock); brelse(di_bh); return acl; } diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index c488965..7b6a146 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) handle_t *handle = NULL; struct dquot *transfer_to[MAXQUOTAS] = { }; int qtype; + int had_lock; + struct ocfs2_lock_holder oh; trace_ocfs2_setattr(inode, dentry, (unsigned long long)OCFS2_I(inode)->ip_blkno, @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } } - status = ocfs2_inode_lock(inode, , 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); + had_lock = ocfs2_inode_lock_tracker(inode, , 1, ); + if (had_lock < 0) { + status = had_lock; goto bail_unlock_rw; + } else if (had_lock) { + /* +* As far as we know, ocfs2_setattr() could only be the first +* VFS entry point in the call chain of recursive cluster +* locking issue. +* +* For instance: +* chmod_common() +* notify_change() +* ocfs2_setattr() +*posix_acl_chmod() +*
[PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl= ocfs2_iop_get_acl, .set_acl= ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Wrap the tracking logic code of recursive locking into functions, ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(), suggested by: Junxiao Bi. [s...@canb.auug.org.au remove some inlines] Signed-off-by: Eric Ren --- fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/ocfs2.h | 1 + 3 files changed, 121 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 77d1632..c75b9e9 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(>l_event); INIT_LIST_HEAD(>l_blocked_list); INIT_LIST_HEAD(>l_mask_waiters); + INIT_LIST_HEAD(>l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +/* + * Keep a list of processes who have interest in a lockres. + * Note: this is now only uesed for check recursive cluster locking. + */ +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + INIT_LIST_HEAD(>oh_list); + oh->oh_owner_pid = get_pid(task_pid(current)); + + spin_lock(>l_lock); + list_add_tail(>oh_list, >l_holders); + spin_unlock(>l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_lock_holder *oh) +{ + spin_lock(>l_lock); + list_del(>oh_list); + spin_unlock(>l_lock); + + put_pid(oh->oh_owner_pid); +} + +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_lock_holder *oh; + struct pid *pid; + + /* look in the list of holders for one with the current task as owner */ + spin_lock(>l_lock); + pid = task_pid(current); + list_for_each_entry(oh, >l_holders, oh_list) { + if (oh->oh_owner_pid == pid) { + spin_unlock(>l_lock); + return 1;
[PATCH v3 0/2] fix deadlock caused by recursive cluster locking
This is a formal patch set v2 to solve the deadlock issue on which I previously started a RFC (draft patch), and the discussion happened here: [https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html] Compared to the previous draft patch, this one is much simple and neat. It neither messes up the dlmglue core, nor has a performance penalty on the whole cluster locking system. Instead, it is only used in places where such recursive cluster locking may happen. Changes since v1: - Let ocfs2_is_locked_by_me() just return true/false to indicate if the process gets the cluster lock - suggested by: Joseph Qi and Junxiao Bi . - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder", suggested by: Junxiao Bi. - Add debugging output at ocfs2_setattr() and ocfs2_permission() to catch exceptional cases, suggested by: Junxiao Bi. - Do not inline functions whose bodies are not in scope, changed by: Stephen Rothwell . Changes since v2: - Use new wrappers of tracking logic code, suggested by: Junxiao Bi. Your comments and feedbacks are always welcomed. Eric Ren (2): ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock ocfs2: fix deadlock issue when taking inode lock at vfs entry points fs/ocfs2/acl.c | 29 +++ fs/ocfs2/dlmglue.c | 105 +++-- fs/ocfs2/dlmglue.h | 18 + fs/ocfs2/file.c| 58 ++--- fs/ocfs2/ocfs2.h | 1 + 5 files changed, 179 insertions(+), 32 deletions(-) -- 2.10.2
Re: + mm-vmscan-add-mm_vmscan_inactive_list_is_low-tracepoint.patch added to -mm tree
Hello, On Fri, Jan 13, 2017 at 10:10:09AM +0100, Michal Hocko wrote: > On Fri 13-01-17 17:57:34, Minchan Kim wrote: > > On Fri, Jan 13, 2017 at 08:47:07AM +0100, Michal Hocko wrote: > > > On Fri 13-01-17 10:37:24, Minchan Kim wrote: > > > > Hello, > > > > > > > > On Thu, Jan 12, 2017 at 10:10:17AM +0100, Michal Hocko wrote: > > > > > On Thu 12-01-17 17:48:13, Minchan Kim wrote: > > > > > > On Thu, Jan 12, 2017 at 09:15:54AM +0100, Michal Hocko wrote: > > > > > > > On Thu 12-01-17 14:12:47, Minchan Kim wrote: > > > > > > > > Hello, > > > > > > > > > > > > > > > > On Wed, Jan 11, 2017 at 04:52:39PM +0100, Michal Hocko wrote: > > > > > > > > > On Wed 11-01-17 08:52:50, Minchan Kim wrote: > > > > > > > > > [...] > > > > > > > > > > > @@ -2055,8 +2055,8 @@ static bool > > > > > > > > > > > inactive_list_is_low(struct > > > > > > > > > > > if (!file && !total_swap_pages) > > > > > > > > > > > return false; > > > > > > > > > > > > > > > > > > > > > > - inactive = lruvec_lru_size(lruvec, file * LRU_FILE); > > > > > > > > > > > - active = lruvec_lru_size(lruvec, file * LRU_FILE + > > > > > > > > > > > LRU_ACTIVE); > > > > > > > > > > > + total_inactive = inactive = lruvec_lru_size(lruvec, > > > > > > > > > > > file * LRU_FILE); > > > > > > > > > > > + total_active = active = lruvec_lru_size(lruvec, file * > > > > > > > > > > > LRU_FILE + LRU_ACTIVE); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the decision of deactivating is based on eligible zone's > > > > > > > > > > LRU size, > > > > > > > > > > not whole zone so why should we need to get a trace of all > > > > > > > > > > zones's LRU? > > > > > > > > > > > > > > > > > > Strictly speaking, the total_ counters are not necessary for > > > > > > > > > making the > > > > > > > > > decision. I found reporting those numbers useful regardless > > > > > > > > > because this > > > > > > > > > will give us also an information how large is the eligible > > > > > > > > > portion of > > > > > > > > > the LRU list. We do not have any other tracepoint which would > > > > > > > > > report > > > > > > > > > that. > > > > > > > > > > > > > > > > The patch doesn't say anything why it's useful. Could you tell > > > > > > > > why it's > > > > > > > > useful and inactive_list_is_low should be right place? > > > > > > > > > > > > > > > > Don't get me wrong, please. I don't want to bother you. > > > > > > > > I really don't want to add random stuff although it's > > > > > > > > tracepoint for > > > > > > > > debugging. > > > > > > > > > > > > > > This doesn't sounds random to me. We simply do not have a full > > > > > > > picture > > > > > > > on 32b systems without this information. Especially when memcgs > > > > > > > are > > > > > > > involved and global numbers spread over different LRUs. > > > > > > > > > > > > Could you elaborate it? > > > > > > > > > > The problem with 32b systems is that you only can consider a part of > > > > > the > > > > > LRU for the lowmem requests. While we have global counters to see how > > > > > much lowmem inactive/active pages we have, those get distributed to > > > > > memcg LRUs. And that distribution is impossible to guess. So my > > > > > thinking > > > > > is that it can become a real head scratcher to realize why certain > > > > > active LRUs are aged while others are not. This was the case when I > > > > > was > > > > > debugging the last issue which triggered all this. All of the sudden I > > > > > have seen many invocations when inactive and active were zero which > > > > > sounded weird, until I realized that those are memcg's lruvec which is > > > > > what total numbers told me... > > > > > > > > Hmm, it seems I miss something. AFAIU, what you need is just memcg > > > > identifier, not all lru size. If it isn't, please tell more detail > > > > usecase of all lru size in that particular tracepoint. > > > > > > Having memcg id would be definitely helpful but that alone wouldn't tell > > > us how is the lowmem distributed. To be honest I really fail to see why > > > this bothers you all that much. > > > > Because I fail to understand why you want to need additional all zone's > > LRU stat in inactive_list_is_low. With clear understanding, we can think > > over that it's really needed and right place to achieve the goal. > > > > Could you say with a example you can think? It's really helpful to > > understand why it's needed. > > OK, I feel I am repeating myself but let me try again. Without the > total_ numbers we really do not know how is the lowmem distributed over > lruvecs. There is simply no way to get this information from existing > counters if memcg is enabled. I can't understand clearly why you need to know distribution. Anyway, if we need it, why do you think such particular inactive_list_is_low is right place? Actually, IMO, there is no need to insert any tracepoint in inactive_list_is_low. Instead, if we add tracepint in get_scan_count to record each LRU list
Re: + mm-vmscan-add-mm_vmscan_inactive_list_is_low-tracepoint.patch added to -mm tree
Hello, On Fri, Jan 13, 2017 at 10:10:09AM +0100, Michal Hocko wrote: > On Fri 13-01-17 17:57:34, Minchan Kim wrote: > > On Fri, Jan 13, 2017 at 08:47:07AM +0100, Michal Hocko wrote: > > > On Fri 13-01-17 10:37:24, Minchan Kim wrote: > > > > Hello, > > > > > > > > On Thu, Jan 12, 2017 at 10:10:17AM +0100, Michal Hocko wrote: > > > > > On Thu 12-01-17 17:48:13, Minchan Kim wrote: > > > > > > On Thu, Jan 12, 2017 at 09:15:54AM +0100, Michal Hocko wrote: > > > > > > > On Thu 12-01-17 14:12:47, Minchan Kim wrote: > > > > > > > > Hello, > > > > > > > > > > > > > > > > On Wed, Jan 11, 2017 at 04:52:39PM +0100, Michal Hocko wrote: > > > > > > > > > On Wed 11-01-17 08:52:50, Minchan Kim wrote: > > > > > > > > > [...] > > > > > > > > > > > @@ -2055,8 +2055,8 @@ static bool > > > > > > > > > > > inactive_list_is_low(struct > > > > > > > > > > > if (!file && !total_swap_pages) > > > > > > > > > > > return false; > > > > > > > > > > > > > > > > > > > > > > - inactive = lruvec_lru_size(lruvec, file * LRU_FILE); > > > > > > > > > > > - active = lruvec_lru_size(lruvec, file * LRU_FILE + > > > > > > > > > > > LRU_ACTIVE); > > > > > > > > > > > + total_inactive = inactive = lruvec_lru_size(lruvec, > > > > > > > > > > > file * LRU_FILE); > > > > > > > > > > > + total_active = active = lruvec_lru_size(lruvec, file * > > > > > > > > > > > LRU_FILE + LRU_ACTIVE); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > the decision of deactivating is based on eligible zone's > > > > > > > > > > LRU size, > > > > > > > > > > not whole zone so why should we need to get a trace of all > > > > > > > > > > zones's LRU? > > > > > > > > > > > > > > > > > > Strictly speaking, the total_ counters are not necessary for > > > > > > > > > making the > > > > > > > > > decision. I found reporting those numbers useful regardless > > > > > > > > > because this > > > > > > > > > will give us also an information how large is the eligible > > > > > > > > > portion of > > > > > > > > > the LRU list. We do not have any other tracepoint which would > > > > > > > > > report > > > > > > > > > that. > > > > > > > > > > > > > > > > The patch doesn't say anything why it's useful. Could you tell > > > > > > > > why it's > > > > > > > > useful and inactive_list_is_low should be right place? > > > > > > > > > > > > > > > > Don't get me wrong, please. I don't want to bother you. > > > > > > > > I really don't want to add random stuff although it's > > > > > > > > tracepoint for > > > > > > > > debugging. > > > > > > > > > > > > > > This doesn't sounds random to me. We simply do not have a full > > > > > > > picture > > > > > > > on 32b systems without this information. Especially when memcgs > > > > > > > are > > > > > > > involved and global numbers spread over different LRUs. > > > > > > > > > > > > Could you elaborate it? > > > > > > > > > > The problem with 32b systems is that you only can consider a part of > > > > > the > > > > > LRU for the lowmem requests. While we have global counters to see how > > > > > much lowmem inactive/active pages we have, those get distributed to > > > > > memcg LRUs. And that distribution is impossible to guess. So my > > > > > thinking > > > > > is that it can become a real head scratcher to realize why certain > > > > > active LRUs are aged while others are not. This was the case when I > > > > > was > > > > > debugging the last issue which triggered all this. All of the sudden I > > > > > have seen many invocations when inactive and active were zero which > > > > > sounded weird, until I realized that those are memcg's lruvec which is > > > > > what total numbers told me... > > > > > > > > Hmm, it seems I miss something. AFAIU, what you need is just memcg > > > > identifier, not all lru size. If it isn't, please tell more detail > > > > usecase of all lru size in that particular tracepoint. > > > > > > Having memcg id would be definitely helpful but that alone wouldn't tell > > > us how is the lowmem distributed. To be honest I really fail to see why > > > this bothers you all that much. > > > > Because I fail to understand why you want to need additional all zone's > > LRU stat in inactive_list_is_low. With clear understanding, we can think > > over that it's really needed and right place to achieve the goal. > > > > Could you say with a example you can think? It's really helpful to > > understand why it's needed. > > OK, I feel I am repeating myself but let me try again. Without the > total_ numbers we really do not know how is the lowmem distributed over > lruvecs. There is simply no way to get this information from existing > counters if memcg is enabled. I can't understand clearly why you need to know distribution. Anyway, if we need it, why do you think such particular inactive_list_is_low is right place? Actually, IMO, there is no need to insert any tracepoint in inactive_list_is_low. Instead, if we add tracepint in get_scan_count to record each LRU list
Re: [PATCH v4 1/3] dt: bindings: add documentation for zx2967 family thermal sensor
On Tue, Jan 17, 2017 at 01:38:48PM +0800, Baoyou Xie wrote: > On 17 January 2017 at 13:27, Eduardo Valentinwrote: > > It's a good idea:) In fact, we defined some thermal zones and used IPA > on products. > Cool! Any feedback on using IPA?
Re: [PATCH v4 1/3] dt: bindings: add documentation for zx2967 family thermal sensor
On Tue, Jan 17, 2017 at 01:38:48PM +0800, Baoyou Xie wrote: > On 17 January 2017 at 13:27, Eduardo Valentin wrote: > > It's a good idea:) In fact, we defined some thermal zones and used IPA > on products. > Cool! Any feedback on using IPA?
[PATCH] ahci: qoriq: added ls2088a platforms support
From: Tang YuantianLs2088a is new introduced arm-based soc with sata support with following features: 1. Complies with the serial ATA 3.0 specification and the AHCI 1.3.1 specification 2. Contains a high-speed descriptor-based DMA controller 3. Supports the following: a. Speeds of 1.5 Gb/s (first-generation SATA), 3 Gb/s (second-generation SATA), and 6 Gb/s (third-generation SATA) b. FIS-based switching c. Native command queuing (NCQ) commands d. Port multiplier operation e. Asynchronous notification f. SATA BIST mode Signed-off-by: Tang Yuantian --- Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt | 2 +- drivers/ata/ahci_qoriq.c | 9 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt index fc33ca0..ed87c6f 100644 --- a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt +++ b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt @@ -3,7 +3,7 @@ Binding for Freescale QorIQ AHCI SATA Controller Required properties: - reg: Physical base address and size of the controller's register area. - compatible: Compatibility string. Must be 'fsl,-ahci', where -chip could be ls1021a, ls1043a, ls1046a, ls2080a etc. +chip could be ls1021a, ls1043a, ls1046a, ls2080a, ls2088a etc. - clocks: Input clock specifier. Refer to common clock bindings. - interrupts: Interrupt specifier. Refer to interrupt binding. diff --git a/drivers/ata/ahci_qoriq.c b/drivers/ata/ahci_qoriq.c index 66eb4b5..912fe32 100644 --- a/drivers/ata/ahci_qoriq.c +++ b/drivers/ata/ahci_qoriq.c @@ -53,6 +53,7 @@ enum ahci_qoriq_type { AHCI_LS1043A, AHCI_LS2080A, AHCI_LS1046A, + AHCI_LS2088A, }; struct ahci_qoriq_priv { @@ -67,6 +68,7 @@ static const struct of_device_id ahci_qoriq_of_match[] = { { .compatible = "fsl,ls1043a-ahci", .data = (void *)AHCI_LS1043A}, { .compatible = "fsl,ls2080a-ahci", .data = (void *)AHCI_LS2080A}, { .compatible = "fsl,ls1046a-ahci", .data = (void *)AHCI_LS1046A}, + { .compatible = "fsl,ls2088a-ahci", .data = (void *)AHCI_LS2088A}, {}, }; MODULE_DEVICE_TABLE(of, ahci_qoriq_of_match); @@ -193,6 +195,13 @@ static int ahci_qoriq_phy_init(struct ahci_host_priv *hpriv) if (qpriv->is_dmacoherent) writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC); break; + + case AHCI_LS2088A: + writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1); + writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS); + if (qpriv->is_dmacoherent) + writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC); + break; } return 0; -- 2.1.0.27.g96db324
[PATCH] ahci: qoriq: added ls2088a platforms support
From: Tang Yuantian Ls2088a is new introduced arm-based soc with sata support with following features: 1. Complies with the serial ATA 3.0 specification and the AHCI 1.3.1 specification 2. Contains a high-speed descriptor-based DMA controller 3. Supports the following: a. Speeds of 1.5 Gb/s (first-generation SATA), 3 Gb/s (second-generation SATA), and 6 Gb/s (third-generation SATA) b. FIS-based switching c. Native command queuing (NCQ) commands d. Port multiplier operation e. Asynchronous notification f. SATA BIST mode Signed-off-by: Tang Yuantian --- Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt | 2 +- drivers/ata/ahci_qoriq.c | 9 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt index fc33ca0..ed87c6f 100644 --- a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt +++ b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt @@ -3,7 +3,7 @@ Binding for Freescale QorIQ AHCI SATA Controller Required properties: - reg: Physical base address and size of the controller's register area. - compatible: Compatibility string. Must be 'fsl,-ahci', where -chip could be ls1021a, ls1043a, ls1046a, ls2080a etc. +chip could be ls1021a, ls1043a, ls1046a, ls2080a, ls2088a etc. - clocks: Input clock specifier. Refer to common clock bindings. - interrupts: Interrupt specifier. Refer to interrupt binding. diff --git a/drivers/ata/ahci_qoriq.c b/drivers/ata/ahci_qoriq.c index 66eb4b5..912fe32 100644 --- a/drivers/ata/ahci_qoriq.c +++ b/drivers/ata/ahci_qoriq.c @@ -53,6 +53,7 @@ enum ahci_qoriq_type { AHCI_LS1043A, AHCI_LS2080A, AHCI_LS1046A, + AHCI_LS2088A, }; struct ahci_qoriq_priv { @@ -67,6 +68,7 @@ static const struct of_device_id ahci_qoriq_of_match[] = { { .compatible = "fsl,ls1043a-ahci", .data = (void *)AHCI_LS1043A}, { .compatible = "fsl,ls2080a-ahci", .data = (void *)AHCI_LS2080A}, { .compatible = "fsl,ls1046a-ahci", .data = (void *)AHCI_LS1046A}, + { .compatible = "fsl,ls2088a-ahci", .data = (void *)AHCI_LS2088A}, {}, }; MODULE_DEVICE_TABLE(of, ahci_qoriq_of_match); @@ -193,6 +195,13 @@ static int ahci_qoriq_phy_init(struct ahci_host_priv *hpriv) if (qpriv->is_dmacoherent) writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC); break; + + case AHCI_LS2088A: + writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1); + writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS); + if (qpriv->is_dmacoherent) + writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC); + break; } return 0; -- 2.1.0.27.g96db324
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > + /* > > > + * We assign class_idx here redundantly even though following > > > + * memcpy will cover it, in order to ensure a rcu reader can > > > + * access the class_idx atomically without lock. > > > + * > > > + * Here we assume setting a word-sized variable is atomic. > > > > which one, where? > > I meant xlock_class(xlock) in check_add_plock(). > > I was not sure about the following two. > > 1. Is it ordered between following a and b? >a. memcpy -> list_add_tail_rcu >b. list_for_each_entry_rcu -> load class_idx (xlock_class) >I assumed that it's not ordered. > 2. Does memcpy guarantee atomic store for each word? >I assumed that it doesn't. > > But I think I was wrong.. The first might be ordered. I will remove > the following redundant statement. It'd be orderd, right? > Yes, a and b are ordered, IOW, they could be paired, meaning when we got the item in a list_for_each_entry_rcu() loop, all memory operations before the corresponding list_add_tail_rcu() should be observed by us. Regards, Boqun > > > > > + */ > > > + xlock->hlock.class_idx = hlock->class_idx; > > > + gen_id = (unsigned int)atomic_inc_return(_gen_id); > > > + WRITE_ONCE(xlock->gen_id, gen_id); > > > + memcpy(>hlock, hlock, sizeof(struct held_lock)); > > > + INIT_LIST_HEAD(>xlock_entry); > > > + list_add_tail_rcu(>xlock_entry, _head); > > signature.asc Description: PGP signature
Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature
On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote: > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote: > > > + /* > > > + * We assign class_idx here redundantly even though following > > > + * memcpy will cover it, in order to ensure a rcu reader can > > > + * access the class_idx atomically without lock. > > > + * > > > + * Here we assume setting a word-sized variable is atomic. > > > > which one, where? > > I meant xlock_class(xlock) in check_add_plock(). > > I was not sure about the following two. > > 1. Is it ordered between following a and b? >a. memcpy -> list_add_tail_rcu >b. list_for_each_entry_rcu -> load class_idx (xlock_class) >I assumed that it's not ordered. > 2. Does memcpy guarantee atomic store for each word? >I assumed that it doesn't. > > But I think I was wrong.. The first might be ordered. I will remove > the following redundant statement. It'd be orderd, right? > Yes, a and b are ordered, IOW, they could be paired, meaning when we got the item in a list_for_each_entry_rcu() loop, all memory operations before the corresponding list_add_tail_rcu() should be observed by us. Regards, Boqun > > > > > + */ > > > + xlock->hlock.class_idx = hlock->class_idx; > > > + gen_id = (unsigned int)atomic_inc_return(_gen_id); > > > + WRITE_ONCE(xlock->gen_id, gen_id); > > > + memcpy(>hlock, hlock, sizeof(struct held_lock)); > > > + INIT_LIST_HEAD(>xlock_entry); > > > + list_add_tail_rcu(>xlock_entry, _head); > > signature.asc Description: PGP signature
Re: [GIT PULL] ARM: exynos: Fixes for v4.10
On Fri, Jan 13, 2017 at 04:32:25PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Just a couple of minor fixes for this release cycle. > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-fixes-4.10 > > for you to fetch changes up to 3ef01c968fbfb21c2f16281445d30a865ee4412c: > > ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* (2017-01-02 > 21:05:34 +0200) > > > Samsung fixes for v4.10: > 1. Update maintainers entry with Patchwork address. > 2. Fix invalid values for NF_CT_PROTO_* in s3c2410 defconfig (these options >cannot be modules anymore). > > > Krzysztof Kozlowski (2): > MAINTAINERS: Add Patchwork URL to Samsung Exynos entry > ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* Merged, thanks. -Olof
Re: [GIT PULL] ARM: exynos: Fixes for v4.10
On Fri, Jan 13, 2017 at 04:32:25PM +0200, Krzysztof Kozlowski wrote: > Hi, > > Just a couple of minor fixes for this release cycle. > > Best regards, > Krzysztof > > > The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77: > > Linux 4.10-rc1 (2016-12-25 16:13:08 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > tags/samsung-fixes-4.10 > > for you to fetch changes up to 3ef01c968fbfb21c2f16281445d30a865ee4412c: > > ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* (2017-01-02 > 21:05:34 +0200) > > > Samsung fixes for v4.10: > 1. Update maintainers entry with Patchwork address. > 2. Fix invalid values for NF_CT_PROTO_* in s3c2410 defconfig (these options >cannot be modules anymore). > > > Krzysztof Kozlowski (2): > MAINTAINERS: Add Patchwork URL to Samsung Exynos entry > ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* Merged, thanks. -Olof
[PATCH v2] thermal/intel_powerclamp: Fix sched_setscheduler fail
From: "Liu, Xinwu"The schedule policy of thread "kidle_inject" is SCHED_NORMAL: [ 772.796284] intel_powerclamp: Start idle injection to reduce power [ 772.825757] [ cut here ] [ 772.825877] WARNING: CPU: 0 PID: 2140 at ../../../../../../kernel/glv/kernel/sched/idle.c:298 play_idle+0x16f/0x230 [ 772.826096] Modules linked in: ip6table_raw iptable_raw hci_uart bluetooth rfkill_gpio cfg80211 imc_ipc(O) [ 772.826194] CPU: 0 PID: 2140 Comm: kidle_inject/0 Tainted: G IO 4.9.0-ge701680f3a17 #1 [ 772.826373] c90001103d58 813ba085 [ 772.826552] c90001103d98 810953e1 012a691441c0 0006 [ 772.826731] 0002 88006a39d708 8800691441c0 e8c0dab8 [ 772.826794] Call Trace: [ 772.826903] [] dump_stack+0x67/0x92 [ 772.827007] [] __warn+0xd1/0xf0 [ 772.827124] [] warn_slowpath_null+0x1d/0x20 [ 772.827234] [] play_idle+0x16f/0x230 [ 772.827365] [] ? __schedule+0x217/0x6f0 [ 772.827485] [] clamp_idle_injection_func+0x6a/0x1f0 [ 772.827612] [] kthread_worker_fn+0xdd/0x200 [ 772.827741] [] ? __kthread_init_worker+0x50/0x50 [ 772.827860] [] kthread+0x102/0x120 [ 772.828004] [] ? kthread_park+0x60/0x60 [ 772.828118] [] ret_from_fork+0x27/0x40 [ 772.828228] ---[ end trace bb738d5d79381554 ]--- Normaly, the user thread have not the capability "CAP_SYS_NICE" (and they shouldn't), it will fail to change the schedule policy. So, fix it by changing by themselves. Signed-off-by: Liu, Xinwu Signed-off-by: Zhang, Di --- drivers/thermal/intel_powerclamp.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 83e6971..ea2d9cf 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -377,6 +377,7 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio, return set_target_ratio + guard <= current_ratio; } +static DEFINE_PER_CPU(int, setscheduler_done); static void clamp_balancing_func(struct kthread_work *work) { struct powerclamp_worker_data *w_data; @@ -388,6 +389,10 @@ static void clamp_balancing_func(struct kthread_work *work) w_data = container_of(work, struct powerclamp_worker_data, balancing_work); + if (unlikely(this_cpu_read(setscheduler_done) == 0)) { + sched_setscheduler(current, SCHED_FIFO, ); + this_cpu_inc(setscheduler_done); + } /* * make sure user selected ratio does not take effect until * the next round. adjust target_ratio if user has changed @@ -507,7 +512,6 @@ static void start_power_clamp_worker(unsigned long cpu) w_data->cpu = cpu; w_data->clamping = true; set_bit(cpu, cpu_clamping_mask); - sched_setscheduler(worker->task, SCHED_FIFO, ); kthread_init_work(_data->balancing_work, clamp_balancing_func); kthread_init_delayed_work(_data->idle_injection_work, clamp_idle_injection_func); -- 1.9.1
[PATCH v2] thermal/intel_powerclamp: Fix sched_setscheduler fail
From: "Liu, Xinwu" The schedule policy of thread "kidle_inject" is SCHED_NORMAL: [ 772.796284] intel_powerclamp: Start idle injection to reduce power [ 772.825757] [ cut here ] [ 772.825877] WARNING: CPU: 0 PID: 2140 at ../../../../../../kernel/glv/kernel/sched/idle.c:298 play_idle+0x16f/0x230 [ 772.826096] Modules linked in: ip6table_raw iptable_raw hci_uart bluetooth rfkill_gpio cfg80211 imc_ipc(O) [ 772.826194] CPU: 0 PID: 2140 Comm: kidle_inject/0 Tainted: G IO 4.9.0-ge701680f3a17 #1 [ 772.826373] c90001103d58 813ba085 [ 772.826552] c90001103d98 810953e1 012a691441c0 0006 [ 772.826731] 0002 88006a39d708 8800691441c0 e8c0dab8 [ 772.826794] Call Trace: [ 772.826903] [] dump_stack+0x67/0x92 [ 772.827007] [] __warn+0xd1/0xf0 [ 772.827124] [] warn_slowpath_null+0x1d/0x20 [ 772.827234] [] play_idle+0x16f/0x230 [ 772.827365] [] ? __schedule+0x217/0x6f0 [ 772.827485] [] clamp_idle_injection_func+0x6a/0x1f0 [ 772.827612] [] kthread_worker_fn+0xdd/0x200 [ 772.827741] [] ? __kthread_init_worker+0x50/0x50 [ 772.827860] [] kthread+0x102/0x120 [ 772.828004] [] ? kthread_park+0x60/0x60 [ 772.828118] [] ret_from_fork+0x27/0x40 [ 772.828228] ---[ end trace bb738d5d79381554 ]--- Normaly, the user thread have not the capability "CAP_SYS_NICE" (and they shouldn't), it will fail to change the schedule policy. So, fix it by changing by themselves. Signed-off-by: Liu, Xinwu Signed-off-by: Zhang, Di --- drivers/thermal/intel_powerclamp.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 83e6971..ea2d9cf 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -377,6 +377,7 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio, return set_target_ratio + guard <= current_ratio; } +static DEFINE_PER_CPU(int, setscheduler_done); static void clamp_balancing_func(struct kthread_work *work) { struct powerclamp_worker_data *w_data; @@ -388,6 +389,10 @@ static void clamp_balancing_func(struct kthread_work *work) w_data = container_of(work, struct powerclamp_worker_data, balancing_work); + if (unlikely(this_cpu_read(setscheduler_done) == 0)) { + sched_setscheduler(current, SCHED_FIFO, ); + this_cpu_inc(setscheduler_done); + } /* * make sure user selected ratio does not take effect until * the next round. adjust target_ratio if user has changed @@ -507,7 +512,6 @@ static void start_power_clamp_worker(unsigned long cpu) w_data->cpu = cpu; w_data->clamping = true; set_bit(cpu, cpu_clamping_mask); - sched_setscheduler(worker->task, SCHED_FIFO, ); kthread_init_work(_data->balancing_work, clamp_balancing_func); kthread_init_delayed_work(_data->idle_injection_work, clamp_idle_injection_func); -- 1.9.1
Re: [PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node
Hi, I tested this patch on v6[1] and replied it. But, this version is missing the my tested-by and reviewed-by tag. [1] https://patchwork.kernel.org/patch/9504139/ So, I add my reviewed-by and tested-by tag again. Reviewed-by: Chanwoo ChoiTested-by: Chanwoo Choi On 2017년 01월 17일 14:54, Jaechul Lee wrote: > Add DT node support for TM2 touchkey device. > > Signed-off-by: Beomho Seo > Signed-off-by: Jaechul Lee > Signed-off-by: Andi Shyti > Reviewed-by: Javier Martinez Canillas > --- > arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > index 2449266..ddba2f8 100644 > --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > @@ -18,6 +18,19 @@ > compatible = "samsung,tm2", "samsung,exynos5433"; > }; > > +_9 { > + status = "okay"; > + > + touchkey@20 { > + compatible = "cypress,tm2-touchkey"; > + reg = <0x20>; > + interrupt-parent = <>; > + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; > + vcc-supply = <_reg>; > + vdd-supply = <_reg>; > + }; > +}; > + > _reg { > regulator-name = "TSP_VDD_1.85V_AP"; > regulator-min-microvolt = <185>; > -- Best Regards, Chanwoo Choi S/W Center, Samsung Electronics
Re: [PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node
Hi, I tested this patch on v6[1] and replied it. But, this version is missing the my tested-by and reviewed-by tag. [1] https://patchwork.kernel.org/patch/9504139/ So, I add my reviewed-by and tested-by tag again. Reviewed-by: Chanwoo Choi Tested-by: Chanwoo Choi On 2017년 01월 17일 14:54, Jaechul Lee wrote: > Add DT node support for TM2 touchkey device. > > Signed-off-by: Beomho Seo > Signed-off-by: Jaechul Lee > Signed-off-by: Andi Shyti > Reviewed-by: Javier Martinez Canillas > --- > arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > index 2449266..ddba2f8 100644 > --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts > @@ -18,6 +18,19 @@ > compatible = "samsung,tm2", "samsung,exynos5433"; > }; > > +_9 { > + status = "okay"; > + > + touchkey@20 { > + compatible = "cypress,tm2-touchkey"; > + reg = <0x20>; > + interrupt-parent = <>; > + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; > + vcc-supply = <_reg>; > + vdd-supply = <_reg>; > + }; > +}; > + > _reg { > regulator-name = "TSP_VDD_1.85V_AP"; > regulator-min-microvolt = <185>; > -- Best Regards, Chanwoo Choi S/W Center, Samsung Electronics
Re: [PATCH v4] ARM64: dts: meson-gx: Add reserved memory zone and usable memory range
On Mon, Jan 16, 2017 at 2:39 AM, Neil Armstrongwrote: > On 01/15/2017 03:43 PM, Andreas Färber wrote: >> Am 13.01.2017 um 21:03 schrieb Kevin Hilman: >>> Neil Armstrong writes: >>> The Amlogic Meson GXBB/GXL/GXM secure monitor uses part of the memory space, this patch adds this reserved zone and redefines the usable memory range. The memory node is also moved from the dtsi files into the proper dts files to handle variants memory sizes. This patch also fixes the memory sizes for the following platforms : - gxl-s905x-p212 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxm-s912-q201 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxl-s905d-p231 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxl-nexbox-a95x : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed Signed-off-by: Neil Armstrong >>> >>> Queued for v4.10-rc. >> >> What is the motivation for this change? I have a local U-Boot patch to >> detect the amount of memory available as done downstream, but U-Boot >> only updates the reg property that you seem to be abandoning here... >> >> So for devices that come in multiple RAM configurations - like R-Box Pro >> - this would require separate .dts files now! This looks very wrong to >> me, especially since I am not aware of other platforms doing the same. >> Instead, there's memory reservations for top and bottom done in U-Boot >> for reg, plus reserved-memory nodes for anything in the middle. >> >> Another thing to consider is that uEFI boot (bootefi) handles memory >> reservation differently yet again, on the bootloader level. I have had >> that working fine on Odroid-C2 and Vega S95. >> >> So if there's no bug this is fixing (none mentioned in commit message) I >> strongly object to this patch. >> >> Regards, >> Andreas >> > > Hi Andreas, > > Like I replied of my RFT patch : > I really disagree about relying on any work or properties added by any > bootloader here, Amlogic SoCs has > a lot of u-boot versions in the field, and the Odroid-C2 is part of this. > > Even if Odroid-c2 is in mainline U-Boot or not, the mainline Linux kernel > should work using > any U-boot version even with the one provided by Amlogic on their openlinux > distribution channel. > > Handling multiple RAM configuration is another story, and the Arm-Soc and DT > maintainers should give us > their advices. Is there a way to detect what firmware is running and marking off memory from early kernel init instead? That'll take care of the concerns about memory size variance as well. > Actually there is a severe bug fixed here that cause a huge crash if such > memory is not reserved while > running stock u-boot version on various shipped products and Amlogic's own > development boards. > > The bug is easily triggered by running : > # stress --vm 4 --vm-bytes 128M --timeout 10s & > [ 46.937975] Bad mode in Error handler detected on CPU1, code 0xbf00 -- > SError > ... > [ 47.058536] Internal error: Attempting to execute userspace memory: > 860f [#3] PREEMPT SMP > ... > > Note this is a fix targeted for 4.10 to make the system stable and various > users reported some severe > crash now the system has more drivers and read-world use-cases are running on > Amlogic SoCs. > > Please feel free to push whatever changes that makes this memory reservation > more coherent for 4.11, > and respect the behavior of already shipped u-boot version and mainline > U-Boot, UEFI, whatever... Technically we're not in regression territory here, since the platform is obviously still in bringup and these aren't bugs that have been introduced in this release. So I think we can take a little while to sort out if there's a solution that, even if not ideal, at least is on the path towards the proper fix and not away from it -- which this seems to be. -Olof
Re: [PATCH v4] ARM64: dts: meson-gx: Add reserved memory zone and usable memory range
On Mon, Jan 16, 2017 at 2:39 AM, Neil Armstrong wrote: > On 01/15/2017 03:43 PM, Andreas Färber wrote: >> Am 13.01.2017 um 21:03 schrieb Kevin Hilman: >>> Neil Armstrong writes: >>> The Amlogic Meson GXBB/GXL/GXM secure monitor uses part of the memory space, this patch adds this reserved zone and redefines the usable memory range. The memory node is also moved from the dtsi files into the proper dts files to handle variants memory sizes. This patch also fixes the memory sizes for the following platforms : - gxl-s905x-p212 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxm-s912-q201 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxl-s905d-p231 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed - gxl-nexbox-a95x : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed Signed-off-by: Neil Armstrong >>> >>> Queued for v4.10-rc. >> >> What is the motivation for this change? I have a local U-Boot patch to >> detect the amount of memory available as done downstream, but U-Boot >> only updates the reg property that you seem to be abandoning here... >> >> So for devices that come in multiple RAM configurations - like R-Box Pro >> - this would require separate .dts files now! This looks very wrong to >> me, especially since I am not aware of other platforms doing the same. >> Instead, there's memory reservations for top and bottom done in U-Boot >> for reg, plus reserved-memory nodes for anything in the middle. >> >> Another thing to consider is that uEFI boot (bootefi) handles memory >> reservation differently yet again, on the bootloader level. I have had >> that working fine on Odroid-C2 and Vega S95. >> >> So if there's no bug this is fixing (none mentioned in commit message) I >> strongly object to this patch. >> >> Regards, >> Andreas >> > > Hi Andreas, > > Like I replied of my RFT patch : > I really disagree about relying on any work or properties added by any > bootloader here, Amlogic SoCs has > a lot of u-boot versions in the field, and the Odroid-C2 is part of this. > > Even if Odroid-c2 is in mainline U-Boot or not, the mainline Linux kernel > should work using > any U-boot version even with the one provided by Amlogic on their openlinux > distribution channel. > > Handling multiple RAM configuration is another story, and the Arm-Soc and DT > maintainers should give us > their advices. Is there a way to detect what firmware is running and marking off memory from early kernel init instead? That'll take care of the concerns about memory size variance as well. > Actually there is a severe bug fixed here that cause a huge crash if such > memory is not reserved while > running stock u-boot version on various shipped products and Amlogic's own > development boards. > > The bug is easily triggered by running : > # stress --vm 4 --vm-bytes 128M --timeout 10s & > [ 46.937975] Bad mode in Error handler detected on CPU1, code 0xbf00 -- > SError > ... > [ 47.058536] Internal error: Attempting to execute userspace memory: > 860f [#3] PREEMPT SMP > ... > > Note this is a fix targeted for 4.10 to make the system stable and various > users reported some severe > crash now the system has more drivers and read-world use-cases are running on > Amlogic SoCs. > > Please feel free to push whatever changes that makes this memory reservation > more coherent for 4.11, > and respect the behavior of already shipped u-boot version and mainline > U-Boot, UEFI, whatever... Technically we're not in regression territory here, since the platform is obviously still in bringup and these aren't bugs that have been introduced in this release. So I think we can take a little while to sort out if there's a solution that, even if not ideal, at least is on the path towards the proper fix and not away from it -- which this seems to be. -Olof
Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support
On Friday 13 January 2017 11:50 PM, Tony Lindgren wrote: > * Vignesh R[170113 00:03]: >> This patch series re enables DMA support for UART 8250_omap driver. >> >> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with >> SDMA. > > Is 8250_omap serial console working for you on omap5 in general? > > I've noticed that it's really unresponsive for me as if the FIFO > interrupt was not working. For example logging in might take several > attempts and a long time with each character showing up much later > after some timeout. > Yes, I did face the same issue on omap5 using 8250_omap driver for console. Looks like this bug has existed all along. > TX on the uart on omap5 seems to work OK, I do see all the console > messages fine. And these patches do not make the issue better or > worse. > True. -- Regards Vignesh
Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support
On Friday 13 January 2017 11:50 PM, Tony Lindgren wrote: > * Vignesh R [170113 00:03]: >> This patch series re enables DMA support for UART 8250_omap driver. >> >> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with >> SDMA. > > Is 8250_omap serial console working for you on omap5 in general? > > I've noticed that it's really unresponsive for me as if the FIFO > interrupt was not working. For example logging in might take several > attempts and a long time with each character showing up much later > after some timeout. > Yes, I did face the same issue on omap5 using 8250_omap driver for console. Looks like this bug has existed all along. > TX on the uart on omap5 seems to work OK, I do see all the console > messages fine. And these patches do not make the issue better or > worse. > True. -- Regards Vignesh
[PATCH v7 0/3] Add touch key driver support for TM2
Hi, This patch is last three patch from https://lkml.org/lkml/2017/1/6/277. because 1 and 2 patches have already been merged by Krzysztof. This patchset adds support for the tm2 touchkey device. The driver has been ported from Tizen Kernel, originally written by Beomho. I ported it to the latest mainline Kernel. Best Regard, Jaechul Changes in v7 - added Chanwoo's reviewed and tested. - fixed reviews from Dmitry. Changes in v6: - changed compatible name from samsaung to cypress. - updated commit tags. - removed first two patches from the original patchset. Changes in v5: - patch 1: removed a spurious regulator-always-off inherited from a different patch. Thanks Krzysztof. - patch 2: fixed a slip on the model, thanks Javier (this patch confuses me quite a lot, this was all right some patches ago and re appeared on this one). - patch 2: removed 'regulator' label and used the original ldo3x labels. Krzysztof: it looks better indeed. - added Javier's reviews and Krzysztof's acks on the related patches. Changes in v4: - patch 1 has been rebased on top of 7c294e002641 (arm64: dts: exynos: Remove unsupported regulator-always-off property from TM2E) - patch 2 has been generated with -B50% diff option using git 2.11 Changes in v3: - Changed the commit ordering, the tm2-touchkey related patches are the last 3. - Added Chanwoo's patch which fixes the wrong voltage of ldo23 and ldo25. - Andi (patch 3) moves the ldo31 and ldo38 in the tm2 and tm2e files as they have different values. Changes in v2: - fixed reviews from Javier, Dmitry - refactored power enable/disable functions. - reordered signed-offs in patch 2, while patch 4 is left as it was as Andi copy pasted the node to the new tm2.dts file - added Jarvier's (patch 1,2,4) and Krzysztof's (patch 4) reviews and Rob's Ack - patch 3 diff has been generated with -B50% Jaechul Lee (3): input: Add support for the tm2 touchkey device driver input: tm2-touchkey: Add touchkey driver support for TM2 arm64: dts: exynos: Add tm2 touchkey node .../bindings/input/cypress,tm2-touchkey.txt| 27 ++ arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + drivers/input/keyboard/Kconfig | 11 + drivers/input/keyboard/Makefile| 1 + drivers/input/keyboard/tm2-touchkey.c | 287 + 5 files changed, 339 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt create mode 100644 drivers/input/keyboard/tm2-touchkey.c -- 2.7.4
[PATCH v7 2/3] input: tm2-touchkey: Add touchkey driver support for TM2
This patch adds support for the TM2 touch key and led functionality. The driver interfaces with userspace through an input device and reports KEY_PHONE and KEY_BACK event types. LED brightness can be controlled by "/sys/class/leds/tm2-touchkey/brightness". Signed-off-by: Beomho SeoSigned-off-by: Jaechul Lee Reviewed-by: Javier Martinez Canillas Reviewed-by: Andi Shyti Reviewed-by: Chanwoo Choi Tested-by: Chanwoo Choi Acked-by: Krzysztof Kozlowski Signed-off-by: Dmitry Torokhov --- drivers/input/keyboard/Kconfig| 11 ++ drivers/input/keyboard/Makefile | 1 + drivers/input/keyboard/tm2-touchkey.c | 286 ++ 3 files changed, 298 insertions(+) create mode 100644 drivers/input/keyboard/tm2-touchkey.c diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig index cbd75cf..97acd65 100644 --- a/drivers/input/keyboard/Kconfig +++ b/drivers/input/keyboard/Kconfig @@ -666,6 +666,17 @@ config KEYBOARD_TC3589X To compile this driver as a module, choose M here: the module will be called tc3589x-keypad. +config KEYBOARD_TM2_TOUCHKEY + tristate "TM2 touchkey support" + depends on I2C + depends on LEDS_CLASS + help + Say Y here to enable device driver for tm2-touchkey with + LED control for the Exynos5433 TM2 board. + + To compile this driver as a module, choose M here. + module will be called tm2-touchkey. + config KEYBOARD_TWL4030 tristate "TI TWL4030/TWL5030/TPS659x0 keypad support" depends on TWL4030_CORE diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile index d9f4cfc..7d9acff 100644 --- a/drivers/input/keyboard/Makefile +++ b/drivers/input/keyboard/Makefile @@ -61,6 +61,7 @@ obj-$(CONFIG_KEYBOARD_SUN4I_LRADC)+= sun4i-lradc-keys.o obj-$(CONFIG_KEYBOARD_SUNKBD) += sunkbd.o obj-$(CONFIG_KEYBOARD_TC3589X) += tc3589x-keypad.o obj-$(CONFIG_KEYBOARD_TEGRA) += tegra-kbc.o +obj-$(CONFIG_KEYBOARD_TM2_TOUCHKEY)+= tm2-touchkey.o obj-$(CONFIG_KEYBOARD_TWL4030) += twl4030_keypad.o obj-$(CONFIG_KEYBOARD_XTKBD) += xtkbd.o obj-$(CONFIG_KEYBOARD_W90P910) += w90p910_keypad.o diff --git a/drivers/input/keyboard/tm2-touchkey.c b/drivers/input/keyboard/tm2-touchkey.c new file mode 100644 index 000..916e2f3 --- /dev/null +++ b/drivers/input/keyboard/tm2-touchkey.c @@ -0,0 +1,286 @@ +/* + * TM2 touchkey device driver + * + * Copyright 2005 Phil Blundell + * Copyright 2016 Samsung Electronics Co., Ltd. + * + * Author: Beomho Seo + * Author: Jaechul Lee + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TM2_TOUCHKEY_DEV_NAME "tm2-touchkey" +#define TM2_TOUCHKEY_KEYCODE_REG 0x03 +#define TM2_TOUCHKEY_BASE_REG 0x00 +#define TM2_TOUCHKEY_CMD_LED_ON0x10 +#define TM2_TOUCHKEY_CMD_LED_OFF 0x20 +#define TM2_TOUCHKEY_BIT_PRESS_EV BIT(3) +#define TM2_TOUCHKEY_BIT_KEYCODE GENMASK(2, 0) +#define TM2_TOUCHKEY_LED_VOLTAGE_MIN 250 +#define TM2_TOUCHKEY_LED_VOLTAGE_MAX 330 + +enum { + TM2_TOUCHKEY_KEY_MENU = 0x1, + TM2_TOUCHKEY_KEY_BACK, +}; + +struct tm2_touchkey_data { + struct i2c_client *client; + struct input_dev *input_dev; + struct led_classdev led_dev; + struct regulator *vdd; + struct regulator_bulk_data regulators[2]; +}; + +static void tm2_touchkey_led_brightness_set(struct led_classdev *led_dev, + enum led_brightness brightness) +{ + struct tm2_touchkey_data *touchkey = + container_of(led_dev, struct tm2_touchkey_data, led_dev); + u32 volt; + u8 data; + + if (brightness == LED_OFF) { + volt = TM2_TOUCHKEY_LED_VOLTAGE_MIN; + data = TM2_TOUCHKEY_CMD_LED_OFF; + } else { + volt = TM2_TOUCHKEY_LED_VOLTAGE_MAX; + data = TM2_TOUCHKEY_CMD_LED_ON; + } + + regulator_set_voltage(touchkey->vdd, volt, volt); + i2c_smbus_write_byte_data(touchkey->client, + TM2_TOUCHKEY_BASE_REG, data); +} + +static int tm2_touchkey_power_enable(struct tm2_touchkey_data *touchkey) +{ + int error; + + error = regulator_bulk_enable(ARRAY_SIZE(touchkey->regulators), + touchkey->regulators); + if (error) + return error; + +
[PATCH v7 0/3] Add touch key driver support for TM2
Hi, This patch is last three patch from https://lkml.org/lkml/2017/1/6/277. because 1 and 2 patches have already been merged by Krzysztof. This patchset adds support for the tm2 touchkey device. The driver has been ported from Tizen Kernel, originally written by Beomho. I ported it to the latest mainline Kernel. Best Regard, Jaechul Changes in v7 - added Chanwoo's reviewed and tested. - fixed reviews from Dmitry. Changes in v6: - changed compatible name from samsaung to cypress. - updated commit tags. - removed first two patches from the original patchset. Changes in v5: - patch 1: removed a spurious regulator-always-off inherited from a different patch. Thanks Krzysztof. - patch 2: fixed a slip on the model, thanks Javier (this patch confuses me quite a lot, this was all right some patches ago and re appeared on this one). - patch 2: removed 'regulator' label and used the original ldo3x labels. Krzysztof: it looks better indeed. - added Javier's reviews and Krzysztof's acks on the related patches. Changes in v4: - patch 1 has been rebased on top of 7c294e002641 (arm64: dts: exynos: Remove unsupported regulator-always-off property from TM2E) - patch 2 has been generated with -B50% diff option using git 2.11 Changes in v3: - Changed the commit ordering, the tm2-touchkey related patches are the last 3. - Added Chanwoo's patch which fixes the wrong voltage of ldo23 and ldo25. - Andi (patch 3) moves the ldo31 and ldo38 in the tm2 and tm2e files as they have different values. Changes in v2: - fixed reviews from Javier, Dmitry - refactored power enable/disable functions. - reordered signed-offs in patch 2, while patch 4 is left as it was as Andi copy pasted the node to the new tm2.dts file - added Jarvier's (patch 1,2,4) and Krzysztof's (patch 4) reviews and Rob's Ack - patch 3 diff has been generated with -B50% Jaechul Lee (3): input: Add support for the tm2 touchkey device driver input: tm2-touchkey: Add touchkey driver support for TM2 arm64: dts: exynos: Add tm2 touchkey node .../bindings/input/cypress,tm2-touchkey.txt| 27 ++ arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + drivers/input/keyboard/Kconfig | 11 + drivers/input/keyboard/Makefile| 1 + drivers/input/keyboard/tm2-touchkey.c | 287 + 5 files changed, 339 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt create mode 100644 drivers/input/keyboard/tm2-touchkey.c -- 2.7.4
[PATCH v7 2/3] input: tm2-touchkey: Add touchkey driver support for TM2
This patch adds support for the TM2 touch key and led functionality. The driver interfaces with userspace through an input device and reports KEY_PHONE and KEY_BACK event types. LED brightness can be controlled by "/sys/class/leds/tm2-touchkey/brightness". Signed-off-by: Beomho Seo Signed-off-by: Jaechul Lee Reviewed-by: Javier Martinez Canillas Reviewed-by: Andi Shyti Reviewed-by: Chanwoo Choi Tested-by: Chanwoo Choi Acked-by: Krzysztof Kozlowski Signed-off-by: Dmitry Torokhov --- drivers/input/keyboard/Kconfig| 11 ++ drivers/input/keyboard/Makefile | 1 + drivers/input/keyboard/tm2-touchkey.c | 286 ++ 3 files changed, 298 insertions(+) create mode 100644 drivers/input/keyboard/tm2-touchkey.c diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig index cbd75cf..97acd65 100644 --- a/drivers/input/keyboard/Kconfig +++ b/drivers/input/keyboard/Kconfig @@ -666,6 +666,17 @@ config KEYBOARD_TC3589X To compile this driver as a module, choose M here: the module will be called tc3589x-keypad. +config KEYBOARD_TM2_TOUCHKEY + tristate "TM2 touchkey support" + depends on I2C + depends on LEDS_CLASS + help + Say Y here to enable device driver for tm2-touchkey with + LED control for the Exynos5433 TM2 board. + + To compile this driver as a module, choose M here. + module will be called tm2-touchkey. + config KEYBOARD_TWL4030 tristate "TI TWL4030/TWL5030/TPS659x0 keypad support" depends on TWL4030_CORE diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile index d9f4cfc..7d9acff 100644 --- a/drivers/input/keyboard/Makefile +++ b/drivers/input/keyboard/Makefile @@ -61,6 +61,7 @@ obj-$(CONFIG_KEYBOARD_SUN4I_LRADC)+= sun4i-lradc-keys.o obj-$(CONFIG_KEYBOARD_SUNKBD) += sunkbd.o obj-$(CONFIG_KEYBOARD_TC3589X) += tc3589x-keypad.o obj-$(CONFIG_KEYBOARD_TEGRA) += tegra-kbc.o +obj-$(CONFIG_KEYBOARD_TM2_TOUCHKEY)+= tm2-touchkey.o obj-$(CONFIG_KEYBOARD_TWL4030) += twl4030_keypad.o obj-$(CONFIG_KEYBOARD_XTKBD) += xtkbd.o obj-$(CONFIG_KEYBOARD_W90P910) += w90p910_keypad.o diff --git a/drivers/input/keyboard/tm2-touchkey.c b/drivers/input/keyboard/tm2-touchkey.c new file mode 100644 index 000..916e2f3 --- /dev/null +++ b/drivers/input/keyboard/tm2-touchkey.c @@ -0,0 +1,286 @@ +/* + * TM2 touchkey device driver + * + * Copyright 2005 Phil Blundell + * Copyright 2016 Samsung Electronics Co., Ltd. + * + * Author: Beomho Seo + * Author: Jaechul Lee + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TM2_TOUCHKEY_DEV_NAME "tm2-touchkey" +#define TM2_TOUCHKEY_KEYCODE_REG 0x03 +#define TM2_TOUCHKEY_BASE_REG 0x00 +#define TM2_TOUCHKEY_CMD_LED_ON0x10 +#define TM2_TOUCHKEY_CMD_LED_OFF 0x20 +#define TM2_TOUCHKEY_BIT_PRESS_EV BIT(3) +#define TM2_TOUCHKEY_BIT_KEYCODE GENMASK(2, 0) +#define TM2_TOUCHKEY_LED_VOLTAGE_MIN 250 +#define TM2_TOUCHKEY_LED_VOLTAGE_MAX 330 + +enum { + TM2_TOUCHKEY_KEY_MENU = 0x1, + TM2_TOUCHKEY_KEY_BACK, +}; + +struct tm2_touchkey_data { + struct i2c_client *client; + struct input_dev *input_dev; + struct led_classdev led_dev; + struct regulator *vdd; + struct regulator_bulk_data regulators[2]; +}; + +static void tm2_touchkey_led_brightness_set(struct led_classdev *led_dev, + enum led_brightness brightness) +{ + struct tm2_touchkey_data *touchkey = + container_of(led_dev, struct tm2_touchkey_data, led_dev); + u32 volt; + u8 data; + + if (brightness == LED_OFF) { + volt = TM2_TOUCHKEY_LED_VOLTAGE_MIN; + data = TM2_TOUCHKEY_CMD_LED_OFF; + } else { + volt = TM2_TOUCHKEY_LED_VOLTAGE_MAX; + data = TM2_TOUCHKEY_CMD_LED_ON; + } + + regulator_set_voltage(touchkey->vdd, volt, volt); + i2c_smbus_write_byte_data(touchkey->client, + TM2_TOUCHKEY_BASE_REG, data); +} + +static int tm2_touchkey_power_enable(struct tm2_touchkey_data *touchkey) +{ + int error; + + error = regulator_bulk_enable(ARRAY_SIZE(touchkey->regulators), + touchkey->regulators); + if (error) + return error; + + /* waiting for device initialization, at least 150ms */ + msleep(150); + + return 0; +} + +static void tm2_touchkey_power_disable(void *data) +{ + struct tm2_touchkey_data *touchkey = data; + +
[PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node
Add DT node support for TM2 touchkey device. Signed-off-by: Beomho SeoSigned-off-by: Jaechul Lee Signed-off-by: Andi Shyti Reviewed-by: Javier Martinez Canillas --- arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts index 2449266..ddba2f8 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts @@ -18,6 +18,19 @@ compatible = "samsung,tm2", "samsung,exynos5433"; }; +_9 { + status = "okay"; + + touchkey@20 { + compatible = "cypress,tm2-touchkey"; + reg = <0x20>; + interrupt-parent = <>; + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; + vcc-supply = <_reg>; + vdd-supply = <_reg>; + }; +}; + _reg { regulator-name = "TSP_VDD_1.85V_AP"; regulator-min-microvolt = <185>; -- 2.7.4
[PATCH v7 1/3] input: Add support for the tm2 touchkey device driver
This patch adds the binding description of the tm2 touchkey device driver. Signed-off-by: Jaechul LeeReviewed-by: Javier Martinez Canillas Reviewed-by: Andi Shyti Reviewed-by: Chanwoo Choi Acked-by: Rob Herring Acked-by: Krzysztof Kozlowski --- .../bindings/input/cypress,tm2-touchkey.txt| 27 ++ 1 file changed, 27 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt diff --git a/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt new file mode 100644 index 000..635f62c --- /dev/null +++ b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt @@ -0,0 +1,27 @@ +Samsung tm2-touchkey + +Required properties: +- compatible: must be "cypress,tm2-touchkey" +- reg: I2C address of the chip. +- interrupt-parent: a phandle for the interrupt controller (see interrupt + binding[0]). +- interrupts: interrupt to which the chip is connected (see interrupt + binding[0]). +- vcc-supply : internal regulator output. 1.8V +- vdd-supply : power supply for IC 3.3V + +[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt + +Example: +{ + /* ... */ + + touchkey@20 { + compatible = "cypress,tm2-touchkey"; + reg = <0x20>; + interrupt-parent = <>; + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; + vcc-supply=<_reg>; + vdd-supply=<_reg>; + }; + }; -- 2.7.4
[PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node
Add DT node support for TM2 touchkey device. Signed-off-by: Beomho Seo Signed-off-by: Jaechul Lee Signed-off-by: Andi Shyti Reviewed-by: Javier Martinez Canillas --- arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts index 2449266..ddba2f8 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts @@ -18,6 +18,19 @@ compatible = "samsung,tm2", "samsung,exynos5433"; }; +_9 { + status = "okay"; + + touchkey@20 { + compatible = "cypress,tm2-touchkey"; + reg = <0x20>; + interrupt-parent = <>; + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; + vcc-supply = <_reg>; + vdd-supply = <_reg>; + }; +}; + _reg { regulator-name = "TSP_VDD_1.85V_AP"; regulator-min-microvolt = <185>; -- 2.7.4
[PATCH v7 1/3] input: Add support for the tm2 touchkey device driver
This patch adds the binding description of the tm2 touchkey device driver. Signed-off-by: Jaechul Lee Reviewed-by: Javier Martinez Canillas Reviewed-by: Andi Shyti Reviewed-by: Chanwoo Choi Acked-by: Rob Herring Acked-by: Krzysztof Kozlowski --- .../bindings/input/cypress,tm2-touchkey.txt| 27 ++ 1 file changed, 27 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt diff --git a/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt new file mode 100644 index 000..635f62c --- /dev/null +++ b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt @@ -0,0 +1,27 @@ +Samsung tm2-touchkey + +Required properties: +- compatible: must be "cypress,tm2-touchkey" +- reg: I2C address of the chip. +- interrupt-parent: a phandle for the interrupt controller (see interrupt + binding[0]). +- interrupts: interrupt to which the chip is connected (see interrupt + binding[0]). +- vcc-supply : internal regulator output. 1.8V +- vdd-supply : power supply for IC 3.3V + +[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt + +Example: +{ + /* ... */ + + touchkey@20 { + compatible = "cypress,tm2-touchkey"; + reg = <0x20>; + interrupt-parent = <>; + interrupts = <2 IRQ_TYPE_EDGE_FALLING>; + vcc-supply=<_reg>; + vdd-supply=<_reg>; + }; + }; -- 2.7.4
Re: [PATCH] pcie: ti: Provide patch to force GEN1 PCIe operation
Hi Joao, On Monday 16 January 2017 10:31 PM, Joao Pinto wrote: > > Hi, > > Às 10:13 AM de 1/16/2017, Kishon Vijay Abraham I escreveu: >> + Joao, Jingoo >> >> Hi, >> >> On Monday 16 January 2017 03:01 PM, Lukasz Majewski wrote: >>> Hi Kishon, >>> Hi Łukasz, On Monday 16 January 2017 12:19 PM, Lukasz Majewski wrote: > Hi Kishon, > >> Hi, >> >> On Sunday 15 January 2017 06:49 PM, Lukasz Majewski wrote: >>> Some devices (due to e.g. bad PCIe signal integrity) require to >>> run with forced GEN1 speed on PCIe bus. >>> >>> This patch changes the speed explicitly on dra7 based devices when >>> proper device tree attribute is defined for the PCIe controller. >>> >>> Signed-off-by: Lukasz Majewski>> >> Bjorn has already queued a patch to do the same thing >> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_cgit_linux_kernel_git_helgaas_pci.git_log_-3Fh-3Dpci_host-2Ddra7xx=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=E8zk1CbKxGH-f3fw_WpXxFU-A8BLkgA8NusCaxk1SvA= >> > > It seems like Bjorn only modifies CAP registers. The patch also modifies the LNKCTL2 register. > > He also needs to change register with 0x080C offset to actually > ( PCIECTRL_PL_WIDTH_SPEED_CTL ) This bit is used to initiate speed change (after the link is initialized in GEN1). Resetting the bit (like what you have done here) prevents speed change. >>> >>> This is strange, but e2e advised me to do things as I did in the patch >>> to _force_ GEN1 operation on PCIe2 port [1] (AM5728) >>> >>> Link: >>> [1] >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__e2e.ti.com_support_arm_sitara-5Farm_f_791_t_566421=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=uXLwglyRYqKpwp1JSxkOWmKpQ2wjfhgofpm8DCfquNw= >>> >>> >>> Both patches modify 0x5180 007C register to set GEN1 capability >>> (PCI_EXP_LNKCAP_SLS_2_5GB) >>> >>> The problem is with second register (in your patch): >>> >>> From SPRUHZ6G TRM: >>> >>> PCIECTRL_EP_DBICS_LNK_CAS_2 (0x5180 00A0) >>> - TRGT_LINK_SPEED (Reset 0x1) - "Target Link Speed" - no more >>> description in TRM >>> >>> It is set to PCI_EXP_LNKCAP_SLS_2_5GB = 0x1, which is the same as >>> default /reset value. >> >> The default value is 0x2 (or else none of the cards would have enumerated in >> GEN2) >>> >>> >>> Could you clarify which way to _force_ PCIe GEN1 operation is correct? >>> Mine shows differences in lspci output (as posted in [1]). >> >> You'll see the difference even with the patch in Bjorn's tree ;-) >> >> I think these are 2 different approaches to keep the link at GEN1. Joao or >> Jingoo, do you have any suggestion here? > > I studied the Databook, and both approaches seem to be right, dependently of > the > Core configuration and setup. > > The standard manual speed change sequence is: > a) Write to PCIE_CAP_TARGET_LINK_SPEED (indicating desired speed) > b) Clear "Directed Speed Change" > c) Set "Directed Speed Change" > > If "Directed Speed Change" is set (DEFAULT_GEN2_SPEED_CHANGE is the default > value), it will execute LTSSM to initiate speed change to Gen2 or Gen3, after > link is started in Gen1, and then the bit is automatically cleared. > > Lukasz is reseting this bit, in order to avoid the LTSSM to be executed, which > is correct. There is another way to prevent this automatic speed change, which > is to set GEN1 speed before link up which might be difficult in some setups, > so > Kishon's also right. Just for my understanding, why do you think this will be difficult in some setups? > > In my opinion Lukasz approach would be the one that might be more universal > and > more "secure". IMHO setting link control in the standard PCIe header space should be more universal. I'm not sure about the secure part though. Thanks Kishon > > Joao > > >> >>> IMO the better way is to set the LNKCTL2 to GEN1 instead of hacking the IP register. >>> >>> From the original patch description: >>> >>> "Add support to force Root Complex to work in GEN1 mode if so desired, >>> but don't force GEN1 mode on any board just yet." >>> >>> Are there any (floating around) patches allowing forcing GEN1 operation >>> on any board (I would like to reuse/port them to my current solution)? >> >> For setting to GEN1 mode, "max-link-speed" should be set to 1 in dt with the >> patch in Bjorn's tree. >> >> Thanks >> Kishon >> >
Re: [PATCH] pcie: ti: Provide patch to force GEN1 PCIe operation
Hi Joao, On Monday 16 January 2017 10:31 PM, Joao Pinto wrote: > > Hi, > > Às 10:13 AM de 1/16/2017, Kishon Vijay Abraham I escreveu: >> + Joao, Jingoo >> >> Hi, >> >> On Monday 16 January 2017 03:01 PM, Lukasz Majewski wrote: >>> Hi Kishon, >>> Hi Łukasz, On Monday 16 January 2017 12:19 PM, Lukasz Majewski wrote: > Hi Kishon, > >> Hi, >> >> On Sunday 15 January 2017 06:49 PM, Lukasz Majewski wrote: >>> Some devices (due to e.g. bad PCIe signal integrity) require to >>> run with forced GEN1 speed on PCIe bus. >>> >>> This patch changes the speed explicitly on dra7 based devices when >>> proper device tree attribute is defined for the PCIe controller. >>> >>> Signed-off-by: Lukasz Majewski >> >> Bjorn has already queued a patch to do the same thing >> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_cgit_linux_kernel_git_helgaas_pci.git_log_-3Fh-3Dpci_host-2Ddra7xx=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=E8zk1CbKxGH-f3fw_WpXxFU-A8BLkgA8NusCaxk1SvA= >> > > It seems like Bjorn only modifies CAP registers. The patch also modifies the LNKCTL2 register. > > He also needs to change register with 0x080C offset to actually > ( PCIECTRL_PL_WIDTH_SPEED_CTL ) This bit is used to initiate speed change (after the link is initialized in GEN1). Resetting the bit (like what you have done here) prevents speed change. >>> >>> This is strange, but e2e advised me to do things as I did in the patch >>> to _force_ GEN1 operation on PCIe2 port [1] (AM5728) >>> >>> Link: >>> [1] >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__e2e.ti.com_support_arm_sitara-5Farm_f_791_t_566421=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=uXLwglyRYqKpwp1JSxkOWmKpQ2wjfhgofpm8DCfquNw= >>> >>> >>> Both patches modify 0x5180 007C register to set GEN1 capability >>> (PCI_EXP_LNKCAP_SLS_2_5GB) >>> >>> The problem is with second register (in your patch): >>> >>> From SPRUHZ6G TRM: >>> >>> PCIECTRL_EP_DBICS_LNK_CAS_2 (0x5180 00A0) >>> - TRGT_LINK_SPEED (Reset 0x1) - "Target Link Speed" - no more >>> description in TRM >>> >>> It is set to PCI_EXP_LNKCAP_SLS_2_5GB = 0x1, which is the same as >>> default /reset value. >> >> The default value is 0x2 (or else none of the cards would have enumerated in >> GEN2) >>> >>> >>> Could you clarify which way to _force_ PCIe GEN1 operation is correct? >>> Mine shows differences in lspci output (as posted in [1]). >> >> You'll see the difference even with the patch in Bjorn's tree ;-) >> >> I think these are 2 different approaches to keep the link at GEN1. Joao or >> Jingoo, do you have any suggestion here? > > I studied the Databook, and both approaches seem to be right, dependently of > the > Core configuration and setup. > > The standard manual speed change sequence is: > a) Write to PCIE_CAP_TARGET_LINK_SPEED (indicating desired speed) > b) Clear "Directed Speed Change" > c) Set "Directed Speed Change" > > If "Directed Speed Change" is set (DEFAULT_GEN2_SPEED_CHANGE is the default > value), it will execute LTSSM to initiate speed change to Gen2 or Gen3, after > link is started in Gen1, and then the bit is automatically cleared. > > Lukasz is reseting this bit, in order to avoid the LTSSM to be executed, which > is correct. There is another way to prevent this automatic speed change, which > is to set GEN1 speed before link up which might be difficult in some setups, > so > Kishon's also right. Just for my understanding, why do you think this will be difficult in some setups? > > In my opinion Lukasz approach would be the one that might be more universal > and > more "secure". IMHO setting link control in the standard PCIe header space should be more universal. I'm not sure about the secure part though. Thanks Kishon > > Joao > > >> >>> IMO the better way is to set the LNKCTL2 to GEN1 instead of hacking the IP register. >>> >>> From the original patch description: >>> >>> "Add support to force Root Complex to work in GEN1 mode if so desired, >>> but don't force GEN1 mode on any board just yet." >>> >>> Are there any (floating around) patches allowing forcing GEN1 operation >>> on any board (I would like to reuse/port them to my current solution)? >> >> For setting to GEN1 mode, "max-link-speed" should be set to 1 in dt with the >> patch in Bjorn's tree. >> >> Thanks >> Kishon >> >