Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren

Hi!

On 01/17/2017 03:39 PM, Joseph Qi wrote:


On 17/1/17 14:30, Eric Ren wrote:

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
 .permission = ocfs2_permission,
 .get_acl= ocfs2_iop_get_acl,
 .set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== first time
  generic_permission
   get_acl
ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
  fs/ocfs2/dlmglue.c | 105 +++--
  fs/ocfs2/dlmglue.h |  18 +
  fs/ocfs2/ocfs2.h   |   1 +
  3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
  init_waitqueue_head(>l_event);
  INIT_LIST_HEAD(>l_blocked_list);
  INIT_LIST_HEAD(>l_mask_waiters);
+INIT_LIST_HEAD(>l_holders);
  }
void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
  res->l_flags = 0UL;
  }
  +/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+   struct ocfs2_lock_holder *oh)
+{
+INIT_LIST_HEAD(>oh_list);
+oh->oh_owner_pid =  get_pid(task_pid(current));

Trim the redundant space here.


You mean the blank line here? If so, I am OK to make the change. But,
I'm little worried about people may feel annoyed if I send the v4 patch set,
hah.


Others look good to me.
Reviewed-by: Joseph Qi 


Thanks for your guys' review!
Eric

+
+spin_lock(>l_lock);
+list_add_tail(>oh_list, >l_holders);
+spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+   struct ocfs2_lock_holder *oh)
+{
+spin_lock(>l_lock);
+list_del(>oh_list);
+spin_unlock(>l_lock);
+
+put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct 

Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren

Hi!

On 01/17/2017 03:39 PM, Joseph Qi wrote:


On 17/1/17 14:30, Eric Ren wrote:

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
 .permission = ocfs2_permission,
 .get_acl= ocfs2_iop_get_acl,
 .set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== first time
  generic_permission
   get_acl
ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
  fs/ocfs2/dlmglue.c | 105 +++--
  fs/ocfs2/dlmglue.h |  18 +
  fs/ocfs2/ocfs2.h   |   1 +
  3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
  init_waitqueue_head(>l_event);
  INIT_LIST_HEAD(>l_blocked_list);
  INIT_LIST_HEAD(>l_mask_waiters);
+INIT_LIST_HEAD(>l_holders);
  }
void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
  res->l_flags = 0UL;
  }
  +/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+   struct ocfs2_lock_holder *oh)
+{
+INIT_LIST_HEAD(>oh_list);
+oh->oh_owner_pid =  get_pid(task_pid(current));

Trim the redundant space here.


You mean the blank line here? If so, I am OK to make the change. But,
I'm little worried about people may feel annoyed if I send the v4 patch set,
hah.


Others look good to me.
Reviewed-by: Joseph Qi 


Thanks for your guys' review!
Eric

+
+spin_lock(>l_lock);
+list_add_tail(>oh_list, >l_holders);
+spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+   struct ocfs2_lock_holder *oh)
+{
+spin_lock(>l_lock);
+list_del(>oh_list);
+spin_unlock(>l_lock);
+
+put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+struct ocfs2_lock_holder *oh;
+struct pid *pid;
+
+/* look in the list of 

Re: [PATCH 7/8] Revert "ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp"

2017-01-16 Thread Michal Hocko
On Mon 16-01-17 22:01:18, Theodore Ts'o wrote:
> On Fri, Jan 06, 2017 at 03:11:06PM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > This reverts commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 because
> > sb_getblk_gfp is not really needed as
> > sb_getblk
> >   __getblk_gfp
> > __getblk_slow
> >   grow_buffers
> > grow_dev_page
> >   gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp
> > 
> > so __GFP_FS is cleared unconditionally and therefore the above commit
> > didn't have any real effect in fact.
> > 
> > This patch should not introduce any functional change. The main point
> > of this change is to reduce explicit GFP_NOFS usage inside ext4 code to
> > make the review of the remaining usage easier.
> > 
> > Signed-off-by: Michal Hocko 
> > Reviewed-by: Jan Kara 
> 
> If I'm not mistaken, this patch is not dependent on any of the other
> patches in this series (and the other patches are not dependent on
> this one).  Hence, I could take this patch via the ext4 tree, correct?

Yes, that is correct

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/2] tpm: enhance TPM 2.0 PCR extend to support multiple banks

2017-01-16 Thread Nayna



On 01/12/2017 11:50 PM, Jarkko Sakkinen wrote:

On Thu, Jan 12, 2017 at 11:58:10AM -0500, Nayna Jain wrote:

The current TPM 2.0 device driver extends only the SHA1 PCR bank
but the TCG Specification[1] recommends extending all active PCR
banks, to prevent malicious users from setting unused PCR banks with
fake measurements and quoting them.

The existing in-kernel interface(tpm_pcr_extend()) expects only a
SHA1 digest.  To extend all active PCR banks with differing
digest sizes, the SHA1 digest is padded with trailing 0's as needed.

[1] TPM 2.0 Specification referred here is "TCG PC Client Specific
Platform Firmware Profile for TPM 2.0"

Signed-off-by: Nayna Jain 
---
  drivers/char/tpm/Kconfig |  1 +
  drivers/char/tpm/tpm-interface.c | 16 +-
  drivers/char/tpm/tpm.h   |  3 +-
  drivers/char/tpm/tpm2-cmd.c  | 68 +++-
  drivers/char/tpm/tpm_eventlog.h  | 18 +++
  5 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index 277186d..af985cc 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -6,6 +6,7 @@ menuconfig TCG_TPM
tristate "TPM Hardware Support"
depends on HAS_IOMEM
select SECURITYFS
+   select CRYPTO_HASH_INFO


In the commit message you did not mention this.


---help---
  If you have a TPM security chip in your system, which
  implements the Trusted Computing Group's specification,
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index fecdd3f..e037dd2 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -7,6 +7,7 @@
   * Dave Safford 
   * Reiner Sailer 
   * Kylene Hall 
+ * Nayna Jain 


Remove.


   *
   * Maintained by: 
   *
@@ -759,6 +760,7 @@ static const struct tpm_input_header pcrextend_header = {
  int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash)
  {
struct tpm_cmd_t cmd;
+   int i;
int rc;
struct tpm_chip *chip;

@@ -767,7 +769,19 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 
*hash)
return -ENODEV;

if (chip->flags & TPM_CHIP_FLAG_TPM2) {
-   rc = tpm2_pcr_extend(chip, pcr_idx, hash);
+   struct tpml_digest_values d_values;
+
+   memset(_values, 0, sizeof(d_values));
+
+   for (i = 0; (chip->active_banks[i] != 0) &&
+(i < ARRAY_SIZE(chip->active_banks)); i++) {
+   d_values.digests[i].alg_id = chip->active_banks[i];
+   memcpy(d_values.digests[i].digest, hash,
+  TPM_DIGEST_SIZE);
+   d_values.count++;
+   }
+
+   rc = tpm2_pcr_extend(chip, pcr_idx, _values);
tpm_put_ops(chip);
return rc;
}
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 573..dd82d58 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -533,7 +533,8 @@ static inline void tpm_add_ppi(struct tpm_chip *chip)
  #endif

  int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf);
-int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash);
+int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx,
+   struct tpml_digest_values *digests);
  int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max);
  int tpm2_seal_trusted(struct tpm_chip *chip,
  struct trusted_key_payload *payload,
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 87388921..5027a54 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -64,9 +64,7 @@ struct tpm2_pcr_extend_in {
__be32  pcr_idx;
__be32  auth_area_size;
struct tpm2_null_auth_area  auth_area;
-   __be32  digest_cnt;
-   __be16  hash_alg;
-   u8  digest[TPM_DIGEST_SIZE];
+   struct tpml_digest_values   digests;
  } __packed;

  struct tpm2_get_tpm_pt_in {
@@ -296,46 +294,58 @@ int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 
*res_buf)
return rc;
  }

-#define TPM2_GET_PCREXTEND_IN_SIZE \
-   (sizeof(struct tpm_input_header) + \
-sizeof(struct tpm2_pcr_extend_in))
-
-static const struct tpm_input_header tpm2_pcrextend_header = {
-   .tag = cpu_to_be16(TPM2_ST_SESSIONS),
-   .length = cpu_to_be32(TPM2_GET_PCREXTEND_IN_SIZE),
-   .ordinal = cpu_to_be32(TPM2_CC_PCR_EXTEND)
-};
-
  /**
   * tpm2_pcr_extend() - extend a PCR value
   *
   * @chip: TPM chip to use.
   * @pcr_idx:  index of the PCR.
- * @hash:  hash value 

Re: [PATCH 7/8] Revert "ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp"

2017-01-16 Thread Michal Hocko
On Mon 16-01-17 22:01:18, Theodore Ts'o wrote:
> On Fri, Jan 06, 2017 at 03:11:06PM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > This reverts commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 because
> > sb_getblk_gfp is not really needed as
> > sb_getblk
> >   __getblk_gfp
> > __getblk_slow
> >   grow_buffers
> > grow_dev_page
> >   gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp
> > 
> > so __GFP_FS is cleared unconditionally and therefore the above commit
> > didn't have any real effect in fact.
> > 
> > This patch should not introduce any functional change. The main point
> > of this change is to reduce explicit GFP_NOFS usage inside ext4 code to
> > make the review of the remaining usage easier.
> > 
> > Signed-off-by: Michal Hocko 
> > Reviewed-by: Jan Kara 
> 
> If I'm not mistaken, this patch is not dependent on any of the other
> patches in this series (and the other patches are not dependent on
> this one).  Hence, I could take this patch via the ext4 tree, correct?

Yes, that is correct

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/2] tpm: enhance TPM 2.0 PCR extend to support multiple banks

2017-01-16 Thread Nayna



On 01/12/2017 11:50 PM, Jarkko Sakkinen wrote:

On Thu, Jan 12, 2017 at 11:58:10AM -0500, Nayna Jain wrote:

The current TPM 2.0 device driver extends only the SHA1 PCR bank
but the TCG Specification[1] recommends extending all active PCR
banks, to prevent malicious users from setting unused PCR banks with
fake measurements and quoting them.

The existing in-kernel interface(tpm_pcr_extend()) expects only a
SHA1 digest.  To extend all active PCR banks with differing
digest sizes, the SHA1 digest is padded with trailing 0's as needed.

[1] TPM 2.0 Specification referred here is "TCG PC Client Specific
Platform Firmware Profile for TPM 2.0"

Signed-off-by: Nayna Jain 
---
  drivers/char/tpm/Kconfig |  1 +
  drivers/char/tpm/tpm-interface.c | 16 +-
  drivers/char/tpm/tpm.h   |  3 +-
  drivers/char/tpm/tpm2-cmd.c  | 68 +++-
  drivers/char/tpm/tpm_eventlog.h  | 18 +++
  5 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index 277186d..af985cc 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -6,6 +6,7 @@ menuconfig TCG_TPM
tristate "TPM Hardware Support"
depends on HAS_IOMEM
select SECURITYFS
+   select CRYPTO_HASH_INFO


In the commit message you did not mention this.


---help---
  If you have a TPM security chip in your system, which
  implements the Trusted Computing Group's specification,
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index fecdd3f..e037dd2 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -7,6 +7,7 @@
   * Dave Safford 
   * Reiner Sailer 
   * Kylene Hall 
+ * Nayna Jain 


Remove.


   *
   * Maintained by: 
   *
@@ -759,6 +760,7 @@ static const struct tpm_input_header pcrextend_header = {
  int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 *hash)
  {
struct tpm_cmd_t cmd;
+   int i;
int rc;
struct tpm_chip *chip;

@@ -767,7 +769,19 @@ int tpm_pcr_extend(u32 chip_num, int pcr_idx, const u8 
*hash)
return -ENODEV;

if (chip->flags & TPM_CHIP_FLAG_TPM2) {
-   rc = tpm2_pcr_extend(chip, pcr_idx, hash);
+   struct tpml_digest_values d_values;
+
+   memset(_values, 0, sizeof(d_values));
+
+   for (i = 0; (chip->active_banks[i] != 0) &&
+(i < ARRAY_SIZE(chip->active_banks)); i++) {
+   d_values.digests[i].alg_id = chip->active_banks[i];
+   memcpy(d_values.digests[i].digest, hash,
+  TPM_DIGEST_SIZE);
+   d_values.count++;
+   }
+
+   rc = tpm2_pcr_extend(chip, pcr_idx, _values);
tpm_put_ops(chip);
return rc;
}
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 573..dd82d58 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -533,7 +533,8 @@ static inline void tpm_add_ppi(struct tpm_chip *chip)
  #endif

  int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 *res_buf);
-int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx, const u8 *hash);
+int tpm2_pcr_extend(struct tpm_chip *chip, int pcr_idx,
+   struct tpml_digest_values *digests);
  int tpm2_get_random(struct tpm_chip *chip, u8 *out, size_t max);
  int tpm2_seal_trusted(struct tpm_chip *chip,
  struct trusted_key_payload *payload,
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 87388921..5027a54 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -64,9 +64,7 @@ struct tpm2_pcr_extend_in {
__be32  pcr_idx;
__be32  auth_area_size;
struct tpm2_null_auth_area  auth_area;
-   __be32  digest_cnt;
-   __be16  hash_alg;
-   u8  digest[TPM_DIGEST_SIZE];
+   struct tpml_digest_values   digests;
  } __packed;

  struct tpm2_get_tpm_pt_in {
@@ -296,46 +294,58 @@ int tpm2_pcr_read(struct tpm_chip *chip, int pcr_idx, u8 
*res_buf)
return rc;
  }

-#define TPM2_GET_PCREXTEND_IN_SIZE \
-   (sizeof(struct tpm_input_header) + \
-sizeof(struct tpm2_pcr_extend_in))
-
-static const struct tpm_input_header tpm2_pcrextend_header = {
-   .tag = cpu_to_be16(TPM2_ST_SESSIONS),
-   .length = cpu_to_be32(TPM2_GET_PCREXTEND_IN_SIZE),
-   .ordinal = cpu_to_be32(TPM2_CC_PCR_EXTEND)
-};
-
  /**
   * tpm2_pcr_extend() - extend a PCR value
   *
   * @chip: TPM chip to use.
   * @pcr_idx:  index of the PCR.
- * @hash:  hash value to use for the extend operation.
+ * @digests:   list of pcr banks and corresponding hash values to be extended.
   *
   * Return: Same as with 

Re: [PATCH RESEND] coredump: Ensure proper size of sparse core files

2017-01-16 Thread Christoph Hellwig
On Mon, Jan 16, 2017 at 10:42:43PM -0500, David Miller wrote:
> The lseek() done by dump_skip() should extend the file properly.

lseek never extends a file.  It just moves the current file position.
So if you do not have a write after the lseek it does nothing.


Re: [PATCH RESEND] coredump: Ensure proper size of sparse core files

2017-01-16 Thread Christoph Hellwig
On Mon, Jan 16, 2017 at 10:42:43PM -0500, David Miller wrote:
> The lseek() done by dump_skip() should extend the file properly.

lseek never extends a file.  It just moves the current file position.
So if you do not have a write after the lseek it does nothing.


Re: [PATCH v2 2/3] spi: pxa2xx: Prepare for edge-triggered interrupts

2017-01-16 Thread Robert Jarzmik
Jan Kiszka  writes:

> When using the a device with edge-triggered interrupts, such as MSIs,
> the interrupt handler has to ensure that there is a point in time during
> its execution where all interrupts sources are silent so that a new
> event can trigger a new interrupt again.
>
> This is achieved here by looping over SSSR evaluation. We need to take
> into account that SSCR1 may be changed by the transfer handler, thus we
> need to redo the mask calculation, at least regarding the volatile
> interrupt enable bit (TIE).
>
> Signed-off-by: Jan Kiszka 
Hi Jan,

> + while (1) {
This bit worries me a bit, as this can be either :
 - hogging the SoC's CPU, endlessly running
 - or even worse, blocking the CPU for ever

The question behind is, should this be done in a top-half, or moved to a irq
thread ?

> + /* Ignore possible writes if we don't need to write */
> + if (!(sccr1_reg & SSCR1_TIE))
> + mask &= ~SSSR_TFS;
>  
> - if (!drv_data->master->cur_msg) {
> - handle_bad_msg(drv_data);
> - /* Never fail */
> - return IRQ_HANDLED;
> - }
> + if (!(status & mask))
> + return ret;
> +
> + if (!drv_data->master->cur_msg) {
> + handle_bad_msg(drv_data);
> + /* Never fail */
> + return IRQ_HANDLED;
> + }
> +
> + ret |= drv_data->transfer_handler(drv_data);
Mmm that looks weird to me, oring a irqreturn.

Imagine that on first iteration the handler returns IRQ_NONE, and on second
IRQ_HANDLED. This makes ret IRQ_HANDLED. Yet after the first iteration the
handler should have exited, especially if the interrupt is shared with another
driver.

Another thing which is along what Andy already said : it would be better
practice to have this loop in the form :
do {
...
} while (exit_condition_not_met);

Just for maintainability, it's better, and it concentrates the test on the
"exit_condition_not_met" in one place, which will enable us to review better the
algorithm.

Cheers.

-- 
Robert


Re: [PATCH v2 2/3] spi: pxa2xx: Prepare for edge-triggered interrupts

2017-01-16 Thread Robert Jarzmik
Jan Kiszka  writes:

> When using the a device with edge-triggered interrupts, such as MSIs,
> the interrupt handler has to ensure that there is a point in time during
> its execution where all interrupts sources are silent so that a new
> event can trigger a new interrupt again.
>
> This is achieved here by looping over SSSR evaluation. We need to take
> into account that SSCR1 may be changed by the transfer handler, thus we
> need to redo the mask calculation, at least regarding the volatile
> interrupt enable bit (TIE).
>
> Signed-off-by: Jan Kiszka 
Hi Jan,

> + while (1) {
This bit worries me a bit, as this can be either :
 - hogging the SoC's CPU, endlessly running
 - or even worse, blocking the CPU for ever

The question behind is, should this be done in a top-half, or moved to a irq
thread ?

> + /* Ignore possible writes if we don't need to write */
> + if (!(sccr1_reg & SSCR1_TIE))
> + mask &= ~SSSR_TFS;
>  
> - if (!drv_data->master->cur_msg) {
> - handle_bad_msg(drv_data);
> - /* Never fail */
> - return IRQ_HANDLED;
> - }
> + if (!(status & mask))
> + return ret;
> +
> + if (!drv_data->master->cur_msg) {
> + handle_bad_msg(drv_data);
> + /* Never fail */
> + return IRQ_HANDLED;
> + }
> +
> + ret |= drv_data->transfer_handler(drv_data);
Mmm that looks weird to me, oring a irqreturn.

Imagine that on first iteration the handler returns IRQ_NONE, and on second
IRQ_HANDLED. This makes ret IRQ_HANDLED. Yet after the first iteration the
handler should have exited, especially if the interrupt is shared with another
driver.

Another thing which is along what Andy already said : it would be better
practice to have this loop in the form :
do {
...
} while (exit_condition_not_met);

Just for maintainability, it's better, and it concentrates the test on the
"exit_condition_not_met" in one place, which will enable us to review better the
algorithm.

Cheers.

-- 
Robert


Re: [PATCH v1 00/54] block: support multipage bvec

2017-01-16 Thread Christoph Hellwig
On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote:
> IMO, the only one left is raid(1/5/10) which can be dealt with by the
> "NO_MP" flag.

No, they can't in the long run.  They need to handle the bvecs just
like everyone else.


Re: [PATCH v1 00/54] block: support multipage bvec

2017-01-16 Thread Christoph Hellwig
On Tue, Jan 17, 2017 at 10:40:36AM +0800, Ming Lei wrote:
> IMO, the only one left is raid(1/5/10) which can be dealt with by the
> "NO_MP" flag.

No, they can't in the long run.  They need to handle the bvecs just
like everyone else.


Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers

2017-01-16 Thread Michal Hocko
On Mon 16-01-17 13:57:43, John Hubbard wrote:
> 
> 
> On 01/16/2017 01:48 PM, Michal Hocko wrote:
> > On Mon 16-01-17 13:15:08, John Hubbard wrote:
> > > 
> > > 
> > > On 01/16/2017 11:40 AM, Michal Hocko wrote:
> > > > On Mon 16-01-17 11:09:37, John Hubbard wrote:
> > > > > 
> > > > > 
> > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote:
> > > > [...]
> > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover 
> > > > > > > from an
> > > > > > > earlier approach? I am having trouble reconciling it with rest of 
> > > > > > > the
> > > > > > > patchset, because:
> > > > > > > 
> > > > > > > a) the flags argument below is effectively passed on to either 
> > > > > > > kmalloc_node
> > > > > > > (possibly adding, but not removing flags), or to 
> > > > > > > __vmalloc_node_flags.
> > > > > > 
> > > > > > The above only says thos are _unsupported_ - in other words the 
> > > > > > behavior
> > > > > > is not defined. Even if flags are passed down to kmalloc resp. 
> > > > > > vmalloc
> > > > > > it doesn't mean they are used that way.  Remember that vmalloc uses
> > > > > > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > > > > > strict about this and mask away these flags I doubt this is worth 
> > > > > > the
> > > > > > additional code.
> > > > > 
> > > > > I do wonder about passing those flags through to kmalloc. Maybe it is 
> > > > > worth
> > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides 
> > > > > some
> > > > > insulation from any future changes to the implementation of kmalloc, 
> > > > > and it
> > > > > also makes the documentation more believable.
> > > > 
> > > > I am not really convinced that we should take an extra steps for these
> > > > flags. There are no existing users for those flags and new users should
> > > > follow the documentation.
> > > 
> > > OK, let's just fortify the documentation ever so slightly, then, so that
> > > users are more likely to do the right thing. How's this sound:
> > > 
> > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. 
> > > (Even
> > > * though the current implementation passes the flags on through to 
> > > kmalloc and
> > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The 
> > > caller
> > > * should not pass in these flags.)
> > > *
> > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
> > > 
> > > 
> > > ? Or is that documentation overkill?
> > 
> > Dunno, it sounds like an overkill to me. It is telling more than
> > necessary. If we want to be so vocal about gfp flags then we would have
> > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
> > supported for vmalloc while unsupported for kmalloc. I am pretty sure
> > there would be other gfp flags to consider and then this would grow
> > borringly large and uninteresting to the point when people simply stop
> > reading it. Let's just be as simple as possible.
> 
> Agreed, on the simplicity point: simple and clear is ideal. But here, it's
> merely short, and not quite simple. :)  People will look at that short bit
> of documentation, and then notice that the flags are, in fact, all passed
> right on through down to both kmalloc_node and __vmalloc_node_flags.
> 
> If you don't want too much documentation, then I'd be inclined to say
> something higher-level, about the intent, rather than mentioning those two
> flags directly. Because as it stands, the documentation contradicts what the
> code does.

Feel free to suggest a better wording. I am, of course, open to any
changes.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/6] mm: introduce kv[mz]alloc helpers

2017-01-16 Thread Michal Hocko
On Mon 16-01-17 13:57:43, John Hubbard wrote:
> 
> 
> On 01/16/2017 01:48 PM, Michal Hocko wrote:
> > On Mon 16-01-17 13:15:08, John Hubbard wrote:
> > > 
> > > 
> > > On 01/16/2017 11:40 AM, Michal Hocko wrote:
> > > > On Mon 16-01-17 11:09:37, John Hubbard wrote:
> > > > > 
> > > > > 
> > > > > On 01/16/2017 12:47 AM, Michal Hocko wrote:
> > > > > > On Sun 15-01-17 20:34:13, John Hubbard wrote:
> > > > [...]
> > > > > > > Is that "Reclaim modifiers" line still true, or is it a leftover 
> > > > > > > from an
> > > > > > > earlier approach? I am having trouble reconciling it with rest of 
> > > > > > > the
> > > > > > > patchset, because:
> > > > > > > 
> > > > > > > a) the flags argument below is effectively passed on to either 
> > > > > > > kmalloc_node
> > > > > > > (possibly adding, but not removing flags), or to 
> > > > > > > __vmalloc_node_flags.
> > > > > > 
> > > > > > The above only says thos are _unsupported_ - in other words the 
> > > > > > behavior
> > > > > > is not defined. Even if flags are passed down to kmalloc resp. 
> > > > > > vmalloc
> > > > > > it doesn't mean they are used that way.  Remember that vmalloc uses
> > > > > > some hardcoded GFP_KERNEL allocations.  So while I could be really
> > > > > > strict about this and mask away these flags I doubt this is worth 
> > > > > > the
> > > > > > additional code.
> > > > > 
> > > > > I do wonder about passing those flags through to kmalloc. Maybe it is 
> > > > > worth
> > > > > stripping out __GFP_NORETRY and __GFP_NOFAIL, after all. It provides 
> > > > > some
> > > > > insulation from any future changes to the implementation of kmalloc, 
> > > > > and it
> > > > > also makes the documentation more believable.
> > > > 
> > > > I am not really convinced that we should take an extra steps for these
> > > > flags. There are no existing users for those flags and new users should
> > > > follow the documentation.
> > > 
> > > OK, let's just fortify the documentation ever so slightly, then, so that
> > > users are more likely to do the right thing. How's this sound:
> > > 
> > > * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported. 
> > > (Even
> > > * though the current implementation passes the flags on through to 
> > > kmalloc and
> > > * vmalloc, that is done for efficiency and to avoid unnecessary code. The 
> > > caller
> > > * should not pass in these flags.)
> > > *
> > > * __GFP_REPEAT is supported, but only for large (>64kB) allocations.
> > > 
> > > 
> > > ? Or is that documentation overkill?
> > 
> > Dunno, it sounds like an overkill to me. It is telling more than
> > necessary. If we want to be so vocal about gfp flags then we would have
> > to say much more I suspect. E.g. what about __GFP_HIGHMEM? This flag is
> > supported for vmalloc while unsupported for kmalloc. I am pretty sure
> > there would be other gfp flags to consider and then this would grow
> > borringly large and uninteresting to the point when people simply stop
> > reading it. Let's just be as simple as possible.
> 
> Agreed, on the simplicity point: simple and clear is ideal. But here, it's
> merely short, and not quite simple. :)  People will look at that short bit
> of documentation, and then notice that the flags are, in fact, all passed
> right on through down to both kmalloc_node and __vmalloc_node_flags.
> 
> If you don't want too much documentation, then I'd be inclined to say
> something higher-level, about the intent, rather than mentioning those two
> flags directly. Because as it stands, the documentation contradicts what the
> code does.

Feel free to suggest a better wording. I am, of course, open to any
changes.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 2/4] dt-bindings: Add TI SCI PM Domains

2017-01-16 Thread Tero Kristo

On 17/01/17 00:12, Dave Gerlach wrote:

On 01/13/2017 08:40 PM, Rob Herring wrote:

On Fri, Jan 13, 2017 at 2:28 PM, Dave Gerlach  wrote:

On 01/13/2017 01:25 PM, Rob Herring wrote:


On Thu, Jan 12, 2017 at 9:27 AM, Dave Gerlach  wrote:


Rob,

On 01/11/2017 03:34 PM, Rob Herring wrote:



On Mon, Jan 9, 2017 at 11:57 AM, Dave Gerlach 
wrote:



Rob,

On 01/09/2017 11:50 AM, Rob Herring wrote:




On Wed, Jan 04, 2017 at 02:55:34PM -0600, Dave Gerlach wrote:




Add a generic power domain implementation, TI SCI PM Domains, that
will hook into the genpd framework and allow the TI SCI
protocol to
control device power states.

Also, provide macros representing each device index as understood
by TI SCI to be used in the device node power-domain references.
These are identifiers for the K2G devices managed by the PMMC.

Signed-off-by: Nishanth Menon 
Signed-off-by: Dave Gerlach 
---
v2->v3:
Update k2g_pds node docs to show it should be a child
of pmmc
node.
In early versions a phandle was used to point to pmmc and
docs
still
incorrectly showed this.

 .../devicetree/bindings/soc/ti/sci-pm-domain.txt   | 59
++
 MAINTAINERS|  2 +
 include/dt-bindings/genpd/k2g.h| 90
++
 3 files changed, 151 insertions(+)
 create mode 100644
Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
 create mode 100644 include/dt-bindings/genpd/k2g.h

diff --git
a/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
new file mode 100644
index ..4c9064e512cb
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
@@ -0,0 +1,59 @@
+Texas Instruments TI-SCI Generic Power Domain
+-
+
+Some TI SoCs contain a system controller (like the PMMC, etc...)
that
is
+responsible for controlling the state of the IPs that are
present.
+Communication between the host processor running an OS and the
system
+controller happens through a protocol known as TI-SCI [1].
This pm
domain
+implementation plugs into the generic pm domain framework and
makes
use
of
+the TI SCI protocol power on and off each device when needed.
+
+[1] Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+
+PM Domain Node
+==
+The PM domain node represents the global PM domain managed by the
PMMC,
+which in this case is the single implementation as documented
by the
generic
+PM domain bindings in
Documentation/devicetree/bindings/power/power_domain.txt.
+Because this relies on the TI SCI protocol to communicate with
the
PMMC
it
+must be a child of the pmmc node.
+
+Required Properties:
+
+- compatible: should be "ti,sci-pm-domain"
+- #power-domain-cells: Must be 0.
+
+Example (K2G):
+-
+   pmmc: pmmc {
+   compatible = "ti,k2g-sci";
+   ...
+
+   k2g_pds: k2g_pds {
+   compatible = "ti,sci-pm-domain";
+   #power-domain-cells = <0>;
+   };
+   };
+
+PM Domain Consumers
+===
+Hardware blocks that require SCI control over their state must
provide
+a reference to the sci-pm-domain they are part of and a unique
device
+specific ID that identifies the device.
+
+Required Properties:
+
+- power-domains: phandle pointing to the corresponding PM domain
node.
+- ti,sci-id: index representing the device id to be passed
oevr SCI
to
+be used for device control.





As I've already stated before, this goes in power-domain cells.
When
you
have a single thing (i.e. node) that controls multiple things, then
you
you need to specify the ID for each of them in phandle args.
This is
how
irqs, gpio, clocks, *everything* in DT works.





You think the reasoning for doing it this way provided by both
Ulf and
myself on v2 [1] is not valid then?

From Ulf:

To me, the TI SCI ID, is similar to a "conid" for any another
"device
resource" (like clock, pinctrl, regulator etc) which we can describe
in DT and assign to a device node. The only difference here, is that
we don't have common API to fetch the resource (like clk_get(),
regulator_get()), but instead we fetches the device's resource from
SoC specific code, via genpd's device ->attach() callback.




Sorry, but that sounds like a kernel problem to me and has nothing to
do with DT bindings.


From me:

Yes, you've pretty much hit it on the head. It is not an index
into a
list
of genpds but rather identifies the device *within* a single
genpd. It
is
a
property specific to each device that resides in a ti-sci-genpd,
not a
mapping describing which genpd the device belongs to. The generic
power
domain binding is concerned with mapping the device to a specific
genpd,
which is does fine for us, but we have a sub mapping for 

Re: [PATCH v3 2/4] dt-bindings: Add TI SCI PM Domains

2017-01-16 Thread Tero Kristo

On 17/01/17 00:12, Dave Gerlach wrote:

On 01/13/2017 08:40 PM, Rob Herring wrote:

On Fri, Jan 13, 2017 at 2:28 PM, Dave Gerlach  wrote:

On 01/13/2017 01:25 PM, Rob Herring wrote:


On Thu, Jan 12, 2017 at 9:27 AM, Dave Gerlach  wrote:


Rob,

On 01/11/2017 03:34 PM, Rob Herring wrote:



On Mon, Jan 9, 2017 at 11:57 AM, Dave Gerlach 
wrote:



Rob,

On 01/09/2017 11:50 AM, Rob Herring wrote:




On Wed, Jan 04, 2017 at 02:55:34PM -0600, Dave Gerlach wrote:




Add a generic power domain implementation, TI SCI PM Domains, that
will hook into the genpd framework and allow the TI SCI
protocol to
control device power states.

Also, provide macros representing each device index as understood
by TI SCI to be used in the device node power-domain references.
These are identifiers for the K2G devices managed by the PMMC.

Signed-off-by: Nishanth Menon 
Signed-off-by: Dave Gerlach 
---
v2->v3:
Update k2g_pds node docs to show it should be a child
of pmmc
node.
In early versions a phandle was used to point to pmmc and
docs
still
incorrectly showed this.

 .../devicetree/bindings/soc/ti/sci-pm-domain.txt   | 59
++
 MAINTAINERS|  2 +
 include/dt-bindings/genpd/k2g.h| 90
++
 3 files changed, 151 insertions(+)
 create mode 100644
Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
 create mode 100644 include/dt-bindings/genpd/k2g.h

diff --git
a/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
new file mode 100644
index ..4c9064e512cb
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/ti/sci-pm-domain.txt
@@ -0,0 +1,59 @@
+Texas Instruments TI-SCI Generic Power Domain
+-
+
+Some TI SoCs contain a system controller (like the PMMC, etc...)
that
is
+responsible for controlling the state of the IPs that are
present.
+Communication between the host processor running an OS and the
system
+controller happens through a protocol known as TI-SCI [1].
This pm
domain
+implementation plugs into the generic pm domain framework and
makes
use
of
+the TI SCI protocol power on and off each device when needed.
+
+[1] Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+
+PM Domain Node
+==
+The PM domain node represents the global PM domain managed by the
PMMC,
+which in this case is the single implementation as documented
by the
generic
+PM domain bindings in
Documentation/devicetree/bindings/power/power_domain.txt.
+Because this relies on the TI SCI protocol to communicate with
the
PMMC
it
+must be a child of the pmmc node.
+
+Required Properties:
+
+- compatible: should be "ti,sci-pm-domain"
+- #power-domain-cells: Must be 0.
+
+Example (K2G):
+-
+   pmmc: pmmc {
+   compatible = "ti,k2g-sci";
+   ...
+
+   k2g_pds: k2g_pds {
+   compatible = "ti,sci-pm-domain";
+   #power-domain-cells = <0>;
+   };
+   };
+
+PM Domain Consumers
+===
+Hardware blocks that require SCI control over their state must
provide
+a reference to the sci-pm-domain they are part of and a unique
device
+specific ID that identifies the device.
+
+Required Properties:
+
+- power-domains: phandle pointing to the corresponding PM domain
node.
+- ti,sci-id: index representing the device id to be passed
oevr SCI
to
+be used for device control.





As I've already stated before, this goes in power-domain cells.
When
you
have a single thing (i.e. node) that controls multiple things, then
you
you need to specify the ID for each of them in phandle args.
This is
how
irqs, gpio, clocks, *everything* in DT works.





You think the reasoning for doing it this way provided by both
Ulf and
myself on v2 [1] is not valid then?

From Ulf:

To me, the TI SCI ID, is similar to a "conid" for any another
"device
resource" (like clock, pinctrl, regulator etc) which we can describe
in DT and assign to a device node. The only difference here, is that
we don't have common API to fetch the resource (like clk_get(),
regulator_get()), but instead we fetches the device's resource from
SoC specific code, via genpd's device ->attach() callback.




Sorry, but that sounds like a kernel problem to me and has nothing to
do with DT bindings.


From me:

Yes, you've pretty much hit it on the head. It is not an index
into a
list
of genpds but rather identifies the device *within* a single
genpd. It
is
a
property specific to each device that resides in a ti-sci-genpd,
not a
mapping describing which genpd the device belongs to. The generic
power
domain binding is concerned with mapping the device to a specific
genpd,
which is does fine for us, but we have a sub mapping for devices
that
exist
inside a genpd which, we must describe as well, hence the 

Re: [PATCH v3 3/3] reset: zx2967: add reset controller driver for ZTE's zx2967 family

2017-01-16 Thread Shawn Guo
On Tue, Jan 17, 2017 at 11:22:57AM +0800, Baoyou Xie wrote:
> This patch adds reset controller driver for ZTE's zx2967 family.
> 
> Signed-off-by: Baoyou Xie 

Reviewed-by: Shawn Guo 


Re: [PATCH v3 3/3] reset: zx2967: add reset controller driver for ZTE's zx2967 family

2017-01-16 Thread Shawn Guo
On Tue, Jan 17, 2017 at 11:22:57AM +0800, Baoyou Xie wrote:
> This patch adds reset controller driver for ZTE's zx2967 family.
> 
> Signed-off-by: Baoyou Xie 

Reviewed-by: Shawn Guo 


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Byungchul Park
On Tue, Jan 17, 2017 at 08:14:56AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> 
> > > > @@ -155,6 +164,9 @@ struct lockdep_map {
> > > > int cpu;
> > > > unsigned long   ip;
> > > >  #endif
> > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > > +   struct cross_lock   *xlock;
> > > > +#endif
> > > 
> > > The use of this escapes me; why does the lockdep_map need a pointer to
> > > this?
> > 
> > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit()
> > use lockdep_map as an arg, but crossrelease need to extract cross_lock
> > instances from that.
> 
> > > Why not do something like:
> > > 
> > > struct lockdep_map_cross {
> > >   struct lockdep_map  map;
> > >   struct held_lockhlock;
> > > }
> 
> Using a structure like that, you can pass lockdep_map_cross around just
> fine, since the lockdep_map is the first member, so the pointers are
> interchangeable. At worst we might need to munge a few typecasts.
> 
> But then the cross release code can simply cast to the bigger type and
> have access to the extra data it knows to be there.

Right. I will apply it.


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Byungchul Park
On Tue, Jan 17, 2017 at 08:14:56AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> > On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> 
> > > > @@ -155,6 +164,9 @@ struct lockdep_map {
> > > > int cpu;
> > > > unsigned long   ip;
> > > >  #endif
> > > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > > +   struct cross_lock   *xlock;
> > > > +#endif
> > > 
> > > The use of this escapes me; why does the lockdep_map need a pointer to
> > > this?
> > 
> > Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit()
> > use lockdep_map as an arg, but crossrelease need to extract cross_lock
> > instances from that.
> 
> > > Why not do something like:
> > > 
> > > struct lockdep_map_cross {
> > >   struct lockdep_map  map;
> > >   struct held_lockhlock;
> > > }
> 
> Using a structure like that, you can pass lockdep_map_cross around just
> fine, since the lockdep_map is the first member, so the pointers are
> interchangeable. At worst we might need to munge a few typecasts.
> 
> But then the cross release code can simply cast to the bigger type and
> have access to the extra data it knows to be there.

Right. I will apply it.


Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Joseph Qi


On 17/1/17 14:30, Eric Ren wrote:

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== take PR
  generic_permission
   get_acl
ocfs2_iop_get_acl
 ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
 chmod_common
  notify_change
   ocfs2_setattr <=== take EX
posix_acl_chmod
 get_acl
  ocfs2_iop_get_acl <=== take PR
 ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for
these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 

Looks good.
Reviewed-by: Joseph Qi 

---
  fs/ocfs2/acl.c  | 29 +
  fs/ocfs2/file.c | 58 -
  2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
  
-	status = ocfs2_inode_lock(inode, , 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
brelse(bh);
return status;
  }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
  
  	osb = OCFS2_SB(inode->i_sb);

if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, _bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
  
  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
  
-	ocfs2_inode_unlock(inode, 0);

+   ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
brelse(di_bh);
return acl;
  }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
  
  	trace_ocfs2_setattr(inode, dentry,

(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
  
-	status = ocfs2_inode_lock(inode, , 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+* For instance:
+* 

Re: [PATCH] fbdev: ssd1307fb: allow reset-gpios is missing

2017-01-16 Thread Maxime Ripard
On Mon, Jan 16, 2017 at 05:50:21PM +0800, Icenowy Zheng wrote:
> 
> 2017年1月16日 16:02于 Maxime Ripard 写道:
> >
> > On Sun, Jan 15, 2017 at 07:21:46PM +0800, Icenowy Zheng wrote: 
> > > Currently some SSD1306 OLED modules are sold without a reset pin (only 
> > > VCC, GND, SCK, SDA four pins). 
> > > 
> > > Add support for missing reset-gpios property. 
> > > 
> > > Signed-off-by: Icenowy Zheng  
> >
> > Unfortunately, a similar patch has been sent a couple of times 
> > already: 
> > https://www.spinics.net/lists/devicetree/msg158330.html 
> 
> Why is it never merged?

It was sent 4 days ago...

And since you didn't have the right maintainers in the cc list, yours
didn't have a chance to be merged either.
 
> There are really boards that needs this function.

Then  you can accelerate its inclusion by reviewing it.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Byungchul Park
On Tue, Jan 17, 2017 at 02:24:08PM +0800, Boqun Feng wrote:
> On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote:
> > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote:
> > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> > > > +   /*
> > > > +* We assign class_idx here redundantly even though following
> > > > +* memcpy will cover it, in order to ensure a rcu reader can
> > > > +* access the class_idx atomically without lock.
> > > > +*
> > > > +* Here we assume setting a word-sized variable is atomic.
> > > 
> > > which one, where?
> > 
> > I meant xlock_class(xlock) in check_add_plock().
> > 
> > I was not sure about the following two.
> > 
> > 1. Is it ordered between following a and b?
> >a. memcpy -> list_add_tail_rcu
> >b. list_for_each_entry_rcu -> load class_idx (xlock_class)
> >I assumed that it's not ordered.
> > 2. Does memcpy guarantee atomic store for each word?
> >I assumed that it doesn't.
> > 
> > But I think I was wrong.. The first might be ordered. I will remove
> > the following redundant statement. It'd be orderd, right?
> > 
> 
> Yes, a and b are ordered, IOW, they could be paired, meaning when we
> got the item in a list_for_each_entry_rcu() loop, all memory operations
> before the corresponding list_add_tail_rcu() should be observed by us.

Thank you for confirming it.

> 
> Regards,
> Boqun
> 
> > > 
> > > > +*/
> > > > +   xlock->hlock.class_idx = hlock->class_idx;
> > > > +   gen_id = (unsigned int)atomic_inc_return(_gen_id);
> > > > +   WRITE_ONCE(xlock->gen_id, gen_id);
> > > > +   memcpy(>hlock, hlock, sizeof(struct held_lock));
> > > > +   INIT_LIST_HEAD(>xlock_entry);
> > > > +   list_add_tail_rcu(>xlock_entry, _head);
> > > 




Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Joseph Qi


On 17/1/17 14:30, Eric Ren wrote:

Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== take PR
  generic_permission
   get_acl
ocfs2_iop_get_acl
 ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
 chmod_common
  notify_change
   ocfs2_setattr <=== take EX
posix_acl_chmod
 get_acl
  ocfs2_iop_get_acl <=== take PR
 ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for
these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 

Looks good.
Reviewed-by: Joseph Qi 

---
  fs/ocfs2/acl.c  | 29 +
  fs/ocfs2/file.c | 58 -
  2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
  
-	status = ocfs2_inode_lock(inode, , 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
brelse(bh);
return status;
  }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
  
  	osb = OCFS2_SB(inode->i_sb);

if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, _bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
  
  	acl = ocfs2_get_acl_nolock(inode, type, di_bh);
  
-	ocfs2_inode_unlock(inode, 0);

+   ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
brelse(di_bh);
return acl;
  }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
  
  	trace_ocfs2_setattr(inode, dentry,

(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
  
-	status = ocfs2_inode_lock(inode, , 1);

-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+* For instance:
+* chmod_common()
+*  notify_change()
+*   

Re: [PATCH] fbdev: ssd1307fb: allow reset-gpios is missing

2017-01-16 Thread Maxime Ripard
On Mon, Jan 16, 2017 at 05:50:21PM +0800, Icenowy Zheng wrote:
> 
> 2017年1月16日 16:02于 Maxime Ripard 写道:
> >
> > On Sun, Jan 15, 2017 at 07:21:46PM +0800, Icenowy Zheng wrote: 
> > > Currently some SSD1306 OLED modules are sold without a reset pin (only 
> > > VCC, GND, SCK, SDA four pins). 
> > > 
> > > Add support for missing reset-gpios property. 
> > > 
> > > Signed-off-by: Icenowy Zheng  
> >
> > Unfortunately, a similar patch has been sent a couple of times 
> > already: 
> > https://www.spinics.net/lists/devicetree/msg158330.html 
> 
> Why is it never merged?

It was sent 4 days ago...

And since you didn't have the right maintainers in the cc list, yours
didn't have a chance to be merged either.
 
> There are really boards that needs this function.

Then  you can accelerate its inclusion by reviewing it.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Byungchul Park
On Tue, Jan 17, 2017 at 02:24:08PM +0800, Boqun Feng wrote:
> On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote:
> > On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote:
> > > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> > > > +   /*
> > > > +* We assign class_idx here redundantly even though following
> > > > +* memcpy will cover it, in order to ensure a rcu reader can
> > > > +* access the class_idx atomically without lock.
> > > > +*
> > > > +* Here we assume setting a word-sized variable is atomic.
> > > 
> > > which one, where?
> > 
> > I meant xlock_class(xlock) in check_add_plock().
> > 
> > I was not sure about the following two.
> > 
> > 1. Is it ordered between following a and b?
> >a. memcpy -> list_add_tail_rcu
> >b. list_for_each_entry_rcu -> load class_idx (xlock_class)
> >I assumed that it's not ordered.
> > 2. Does memcpy guarantee atomic store for each word?
> >I assumed that it doesn't.
> > 
> > But I think I was wrong.. The first might be ordered. I will remove
> > the following redundant statement. It'd be orderd, right?
> > 
> 
> Yes, a and b are ordered, IOW, they could be paired, meaning when we
> got the item in a list_for_each_entry_rcu() loop, all memory operations
> before the corresponding list_add_tail_rcu() should be observed by us.

Thank you for confirming it.

> 
> Regards,
> Boqun
> 
> > > 
> > > > +*/
> > > > +   xlock->hlock.class_idx = hlock->class_idx;
> > > > +   gen_id = (unsigned int)atomic_inc_return(_gen_id);
> > > > +   WRITE_ONCE(xlock->gen_id, gen_id);
> > > > +   memcpy(>hlock, hlock, sizeof(struct held_lock));
> > > > +   INIT_LIST_HEAD(>xlock_entry);
> > > > +   list_add_tail_rcu(>xlock_entry, _head);
> > > 




Re: [PATCH] staging: lustre: headers: potential UAPI headers

2017-01-16 Thread Greg Kroah-Hartman
On Mon, Jan 16, 2017 at 09:28:30PM +, James Simmons wrote:
> 
> > On Mon, Dec 19, 2016 at 12:06:47PM -0500, James Simmons wrote:
> > > Not for landing. This is the purposed UAPI headers
> > > with the removal of unlikely and debugging macros.
> > > This is just for feedback to see if this is acceptable
> > > for the upstream client.
> > > 
> > > Signed-off-by: James Simmons 
> > > ---
> > >  .../lustre/lustre/include/lustre/lustre_fid.h  | 353 
> > > +
> > >  .../lustre/lustre/include/lustre/lustre_ostid.h| 233 ++
> > 
> > Can you make a lustre "uapi" directory so we can see which files you
> > really want to be UAPI and which you don't as time goes on?
> 
> Where do you want them placed? In uapi/linux/lustre or uapi/lustre. Does
> it matter to you? The below was to forth coming UAPI headers which from
> your response you seem okay with in general.

How many .h files are there going to be?  It's just a single filesystem,
shouldn't you just need a single file?  If so, how about
drivers/staging/lustre/include/uapi/lustre.h
?

If you really need multiple .h files, put them all in the same uapi/
directory with a lustre_ prefix, you don't need a whole subdir just for
yourself, right?

thanks,

greg k-h


Re: [PATCH] staging: lustre: headers: potential UAPI headers

2017-01-16 Thread Greg Kroah-Hartman
On Mon, Jan 16, 2017 at 09:28:30PM +, James Simmons wrote:
> 
> > On Mon, Dec 19, 2016 at 12:06:47PM -0500, James Simmons wrote:
> > > Not for landing. This is the purposed UAPI headers
> > > with the removal of unlikely and debugging macros.
> > > This is just for feedback to see if this is acceptable
> > > for the upstream client.
> > > 
> > > Signed-off-by: James Simmons 
> > > ---
> > >  .../lustre/lustre/include/lustre/lustre_fid.h  | 353 
> > > +
> > >  .../lustre/lustre/include/lustre/lustre_ostid.h| 233 ++
> > 
> > Can you make a lustre "uapi" directory so we can see which files you
> > really want to be UAPI and which you don't as time goes on?
> 
> Where do you want them placed? In uapi/linux/lustre or uapi/lustre. Does
> it matter to you? The below was to forth coming UAPI headers which from
> your response you seem okay with in general.

How many .h files are there going to be?  It's just a single filesystem,
shouldn't you just need a single file?  If so, how about
drivers/staging/lustre/include/uapi/lustre.h
?

If you really need multiple .h files, put them all in the same uapi/
directory with a lustre_ prefix, you don't need a whole subdir just for
yourself, right?

thanks,

greg k-h


Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Joseph Qi


On 17/1/17 14:30, Eric Ren wrote:

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
 .permission = ocfs2_permission,
 .get_acl= ocfs2_iop_get_acl,
 .set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== first time
  generic_permission
   get_acl
ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
  fs/ocfs2/dlmglue.c | 105 +++--
  fs/ocfs2/dlmglue.h |  18 +
  fs/ocfs2/ocfs2.h   |   1 +
  3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(>l_event);
INIT_LIST_HEAD(>l_blocked_list);
INIT_LIST_HEAD(>l_mask_waiters);
+   INIT_LIST_HEAD(>l_holders);
  }
  
  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,

@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
  }
  
+/*

+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(>oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));

Trim the redundant space here.
Others look good to me.
Reviewed-by: Joseph Qi 

+
+   spin_lock(>l_lock);
+   list_add_tail(>oh_list, >l_holders);
+   spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(>l_lock);
+   list_del(>oh_list);
+   spin_unlock(>l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */

Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Joseph Qi


On 17/1/17 14:30, Eric Ren wrote:

We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
 .permission = ocfs2_permission,
 .get_acl= ocfs2_iop_get_acl,
 .set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
  may_open
   inode_permission
ocfs2_permission
 ocfs2_inode_lock() <=== first time
  generic_permission
   get_acl
ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
  fs/ocfs2/dlmglue.c | 105 +++--
  fs/ocfs2/dlmglue.h |  18 +
  fs/ocfs2/ocfs2.h   |   1 +
  3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(>l_event);
INIT_LIST_HEAD(>l_blocked_list);
INIT_LIST_HEAD(>l_mask_waiters);
+   INIT_LIST_HEAD(>l_holders);
  }
  
  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,

@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
  }
  
+/*

+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(>oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));

Trim the redundant space here.
Others look good to me.
Reviewed-by: Joseph Qi 

+
+   spin_lock(>l_lock);
+   list_add_tail(>oh_list, >l_holders);
+   spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(>l_lock);
+   list_del(>oh_list);
+   spin_unlock(>l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(>l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, >l_holders, 

[PATCH v2] usb: host: xhci: plat: check hcc_params after add hcd

2017-01-16 Thread William Wu
From: William wu 

The commit 4ac53087d6d4 ("usb: xhci: plat: Create both
HCDs before adding them") move add hcd to the end of
probe, this cause hcc_params uninitiated, because xHCI
driver sets hcc_params in xhci_gen_setup() called from
usb_add_hcd().

This patch checks the Maximum Primary Stream Array Size
in the hcc_params register after add primary hcd.

Signed-off-by: William wu 
---
 drivers/usb/host/xhci-plat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index ddfab30..f96caeb 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -232,9 +232,6 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (device_property_read_bool(>dev, "usb3-lpm-capable"))
xhci->quirks |= XHCI_LPM_SUPPORT;
 
-   if (HCC_MAX_PSA(xhci->hcc_params) >= 4)
-   xhci->shared_hcd->can_do_streams = 1;
-
hcd->usb_phy = devm_usb_get_phy_by_phandle(>dev, "usb-phy", 0);
if (IS_ERR(hcd->usb_phy)) {
ret = PTR_ERR(hcd->usb_phy);
@@ -251,6 +248,9 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (ret)
goto disable_usb_phy;
 
+   if (HCC_MAX_PSA(xhci->hcc_params) >= 4)
+   xhci->shared_hcd->can_do_streams = 1;
+
ret = usb_add_hcd(xhci->shared_hcd, irq, IRQF_SHARED);
if (ret)
goto dealloc_usb2_hcd;
-- 
2.7.4




[PATCH v2] usb: host: xhci: plat: check hcc_params after add hcd

2017-01-16 Thread William Wu
From: William wu 

The commit 4ac53087d6d4 ("usb: xhci: plat: Create both
HCDs before adding them") move add hcd to the end of
probe, this cause hcc_params uninitiated, because xHCI
driver sets hcc_params in xhci_gen_setup() called from
usb_add_hcd().

This patch checks the Maximum Primary Stream Array Size
in the hcc_params register after add primary hcd.

Signed-off-by: William wu 
---
 drivers/usb/host/xhci-plat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index ddfab30..f96caeb 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -232,9 +232,6 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (device_property_read_bool(>dev, "usb3-lpm-capable"))
xhci->quirks |= XHCI_LPM_SUPPORT;
 
-   if (HCC_MAX_PSA(xhci->hcc_params) >= 4)
-   xhci->shared_hcd->can_do_streams = 1;
-
hcd->usb_phy = devm_usb_get_phy_by_phandle(>dev, "usb-phy", 0);
if (IS_ERR(hcd->usb_phy)) {
ret = PTR_ERR(hcd->usb_phy);
@@ -251,6 +248,9 @@ static int xhci_plat_probe(struct platform_device *pdev)
if (ret)
goto disable_usb_phy;
 
+   if (HCC_MAX_PSA(xhci->hcc_params) >= 4)
+   xhci->shared_hcd->can_do_streams = 1;
+
ret = usb_add_hcd(xhci->shared_hcd, irq, IRQF_SHARED);
if (ret)
goto dealloc_usb2_hcd;
-- 
2.7.4




Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Junxiao Bi
On 01/17/2017 02:30 PM, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> precess already.
> 
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that it's very hard to handle the routines that
> are invoked directly by vfs code. For instance:
> 
> const struct inode_operations ocfs2_file_iops = {
> .permission = ocfs2_permission,
> .get_acl= ocfs2_iop_get_acl,
> .set_acl= ocfs2_iop_set_acl,
> };
> 
> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
> do_sys_open
>  may_open
>   inode_permission
>ocfs2_permission
> ocfs2_inode_lock() <=== first time
>  generic_permission
>   get_acl
>ocfs2_iop_get_acl
>   ocfs2_inode_lock() <=== recursive one
> 
> A deadlock will occur if a remote EX request comes in between two
> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
> 
> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
> BAST(ocfs2_generic_handle_bast) when downconvert is started
> on behalf of the remote EX lock request. Another hand, the recursive
> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
> because there is no chance for the first cluster lock on this node to be
> unlocked - we block ourselves in the code path.
> 
> The idea to fix this issue is mostly taken from gfs2 code.
> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
> keep track of the processes' pid  who has taken the cluster lock
> of this lock resource;
> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
> it means just getting back disk inode bh for us if we've got cluster lock.
> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
> have got the cluster lock in the upper code path.
> 
> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
> to solve the recursive locking issue cuased by the fact that vfs routines
> can call into each other.
> 
> The performance penalty of processing the holder list should only be seen
> at a few cases where the tracking logic is used, such as get/set acl.
> 
> You may ask what if the first time we got a PR lock, and the second time
> we want a EX lock? fortunately, this case never happens in the real world,
> as far as I can see, including permission check, (get|set)_(acl|attr), and
> the gfs2 code also do so.
> 
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi 
> and Junxiao Bi .
> 
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
> 
> - Do not inline functions whose bodies are not in scope, changed by:
> Stephen Rothwell .
> 
> Changes since v2:
> - Wrap the tracking logic code of recursive locking into functions,
> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
> suggested by: Junxiao Bi.
> 
> [s...@canb.auug.org.au remove some inlines]
> Signed-off-by: Eric Ren 

Reviewed-by: Junxiao Bi 

> ---
>  fs/ocfs2/dlmglue.c | 105 
> +++--
>  fs/ocfs2/dlmglue.h |  18 +
>  fs/ocfs2/ocfs2.h   |   1 +
>  3 files changed, 121 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 77d1632..c75b9e9 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(>l_event);
>   INIT_LIST_HEAD(>l_blocked_list);
>   INIT_LIST_HEAD(>l_mask_waiters);
> + INIT_LIST_HEAD(>l_holders);
>  }
>  
>  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
> @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>   res->l_flags = 0UL;
>  }
>  
> +/*
> + * Keep a list of processes who have interest in a lockres.
> + * Note: this is now only uesed for check recursive cluster locking.
> + */
> +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_lock_holder *oh)
> +{
> + INIT_LIST_HEAD(>oh_list);
> + oh->oh_owner_pid =  get_pid(task_pid(current));
> +
> + spin_lock(>l_lock);
> + list_add_tail(>oh_list, >l_holders);
> + spin_unlock(>l_lock);
> +}
> +
> +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_lock_holder *oh)
> +{
> + spin_lock(>l_lock);
> + list_del(>oh_list);
> + spin_unlock(>l_lock);
> +
> + put_pid(oh->oh_owner_pid);
> +}
> +
> +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
> +{

Re: [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Junxiao Bi
On 01/17/2017 02:30 PM, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> precess already.
> 
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that it's very hard to handle the routines that
> are invoked directly by vfs code. For instance:
> 
> const struct inode_operations ocfs2_file_iops = {
> .permission = ocfs2_permission,
> .get_acl= ocfs2_iop_get_acl,
> .set_acl= ocfs2_iop_set_acl,
> };
> 
> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
> do_sys_open
>  may_open
>   inode_permission
>ocfs2_permission
> ocfs2_inode_lock() <=== first time
>  generic_permission
>   get_acl
>ocfs2_iop_get_acl
>   ocfs2_inode_lock() <=== recursive one
> 
> A deadlock will occur if a remote EX request comes in between two
> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
> 
> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
> BAST(ocfs2_generic_handle_bast) when downconvert is started
> on behalf of the remote EX lock request. Another hand, the recursive
> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
> because there is no chance for the first cluster lock on this node to be
> unlocked - we block ourselves in the code path.
> 
> The idea to fix this issue is mostly taken from gfs2 code.
> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
> keep track of the processes' pid  who has taken the cluster lock
> of this lock resource;
> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
> it means just getting back disk inode bh for us if we've got cluster lock.
> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
> have got the cluster lock in the upper code path.
> 
> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
> to solve the recursive locking issue cuased by the fact that vfs routines
> can call into each other.
> 
> The performance penalty of processing the holder list should only be seen
> at a few cases where the tracking logic is used, such as get/set acl.
> 
> You may ask what if the first time we got a PR lock, and the second time
> we want a EX lock? fortunately, this case never happens in the real world,
> as far as I can see, including permission check, (get|set)_(acl|attr), and
> the gfs2 code also do so.
> 
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi 
> and Junxiao Bi .
> 
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
> 
> - Do not inline functions whose bodies are not in scope, changed by:
> Stephen Rothwell .
> 
> Changes since v2:
> - Wrap the tracking logic code of recursive locking into functions,
> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
> suggested by: Junxiao Bi.
> 
> [s...@canb.auug.org.au remove some inlines]
> Signed-off-by: Eric Ren 

Reviewed-by: Junxiao Bi 

> ---
>  fs/ocfs2/dlmglue.c | 105 
> +++--
>  fs/ocfs2/dlmglue.h |  18 +
>  fs/ocfs2/ocfs2.h   |   1 +
>  3 files changed, 121 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 77d1632..c75b9e9 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(>l_event);
>   INIT_LIST_HEAD(>l_blocked_list);
>   INIT_LIST_HEAD(>l_mask_waiters);
> + INIT_LIST_HEAD(>l_holders);
>  }
>  
>  void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
> @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>   res->l_flags = 0UL;
>  }
>  
> +/*
> + * Keep a list of processes who have interest in a lockres.
> + * Note: this is now only uesed for check recursive cluster locking.
> + */
> +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_lock_holder *oh)
> +{
> + INIT_LIST_HEAD(>oh_list);
> + oh->oh_owner_pid =  get_pid(task_pid(current));
> +
> + spin_lock(>l_lock);
> + list_add_tail(>oh_list, >l_holders);
> + spin_unlock(>l_lock);
> +}
> +
> +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_lock_holder *oh)
> +{
> + spin_lock(>l_lock);
> + list_del(>oh_list);
> + spin_unlock(>l_lock);
> +
> + put_pid(oh->oh_owner_pid);
> +}
> +
> +static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
> +{
> + struct ocfs2_lock_holder *oh;
> + struct pid *pid;
> +
> + /* look in the list of holders 

Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Junxiao Bi
On 01/17/2017 02:30 PM, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
> 
> The reason why taking cluster inode lock at vfs entry points opens up
> a self deadlock window, is explained in the previous patch of this
> series.
> 
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>  may_open
>   inode_permission
>ocfs2_permission
> ocfs2_inode_lock() <=== take PR
>  generic_permission
>   get_acl
>ocfs2_iop_get_acl
> ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
> chmod_common
>  notify_change
>   ocfs2_setattr <=== take EX
>posix_acl_chmod
> get_acl
>  ocfs2_iop_get_acl <=== take PR
> ocfs2_iop_set_acl <=== take EX
> 
> Fixes them by adding the tracking logic (in the previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
> 
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi 
> and Junxiao Bi .
> 
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
> 
> - Add debugging output at ocfs2_setattr() and ocfs2_permission() to
> catch exceptional cases, suggested by: Junxiao Bi.
> 
> Changes since v2:
> - Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
> 
> Signed-off-by: Eric Ren 
Reviewed-by: Junxiao Bi 

> ---
>  fs/ocfs2/acl.c  | 29 +
>  fs/ocfs2/file.c | 58 
> -
>  2 files changed, 58 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..dc22ba8 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
>  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>   struct buffer_head *bh = NULL;
> - int status = 0;
> + int status, had_lock;
> + struct ocfs2_lock_holder oh;
>  
> - status = ocfs2_inode_lock(inode, , 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> - return status;
> - }
> + had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
> + if (had_lock < 0)
> + return had_lock;
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> + ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
>   brelse(bh);
>   return status;
>  }
> @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
> *inode, int type)
>   struct ocfs2_super *osb;
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
> - int ret;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>  
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, _bh, 0);
> - if (ret < 0) {
> - if (ret != -ENOENT)
> - mlog_errno(ret);
> - return ERR_PTR(ret);
> - }
> +
> + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
> + if (had_lock < 0)
> + return ERR_PTR(had_lock);
>  
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>  
> - ocfs2_inode_unlock(inode, 0);
> + ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
>   brelse(di_bh);
>   return acl;
>  }
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..7b6a146 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>  
>   trace_ocfs2_setattr(inode, dentry,
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   }
>   }
>  
> - status = ocfs2_inode_lock(inode, , 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> + had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
> + if (had_lock < 0) {
> + status = had_lock;
>   goto bail_unlock_rw;
> + } else if (had_lock) {
> + /*
> +  * As far as we know, ocfs2_setattr() could only be the first
> +  * VFS entry 

Re: [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Junxiao Bi
On 01/17/2017 02:30 PM, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
> 
> The reason why taking cluster inode lock at vfs entry points opens up
> a self deadlock window, is explained in the previous patch of this
> series.
> 
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>  may_open
>   inode_permission
>ocfs2_permission
> ocfs2_inode_lock() <=== take PR
>  generic_permission
>   get_acl
>ocfs2_iop_get_acl
> ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
> chmod_common
>  notify_change
>   ocfs2_setattr <=== take EX
>posix_acl_chmod
> get_acl
>  ocfs2_iop_get_acl <=== take PR
> ocfs2_iop_set_acl <=== take EX
> 
> Fixes them by adding the tracking logic (in the previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
> 
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi 
> and Junxiao Bi .
> 
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
> 
> - Add debugging output at ocfs2_setattr() and ocfs2_permission() to
> catch exceptional cases, suggested by: Junxiao Bi.
> 
> Changes since v2:
> - Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
> 
> Signed-off-by: Eric Ren 
Reviewed-by: Junxiao Bi 

> ---
>  fs/ocfs2/acl.c  | 29 +
>  fs/ocfs2/file.c | 58 
> -
>  2 files changed, 58 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..dc22ba8 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
>  int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
>   struct buffer_head *bh = NULL;
> - int status = 0;
> + int status, had_lock;
> + struct ocfs2_lock_holder oh;
>  
> - status = ocfs2_inode_lock(inode, , 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> - return status;
> - }
> + had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
> + if (had_lock < 0)
> + return had_lock;
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> + ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
>   brelse(bh);
>   return status;
>  }
> @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
> *inode, int type)
>   struct ocfs2_super *osb;
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
> - int ret;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>  
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, _bh, 0);
> - if (ret < 0) {
> - if (ret != -ENOENT)
> - mlog_errno(ret);
> - return ERR_PTR(ret);
> - }
> +
> + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
> + if (had_lock < 0)
> + return ERR_PTR(had_lock);
>  
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>  
> - ocfs2_inode_unlock(inode, 0);
> + ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
>   brelse(di_bh);
>   return acl;
>  }
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..7b6a146 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>  
>   trace_ocfs2_setattr(inode, dentry,
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   }
>   }
>  
> - status = ocfs2_inode_lock(inode, , 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> + had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
> + if (had_lock < 0) {
> + status = had_lock;
>   goto bail_unlock_rw;
> + } else if (had_lock) {
> + /*
> +  * As far as we know, ocfs2_setattr() could only be the first
> +  * VFS entry point in the call chain of recursive cluster
> +  * locking issue.
> + 

Re: [PATCH] tpm: add session handles to the save and restore of the tpm2 space manager

2017-01-16 Thread Jarkko Sakkinen
On Mon, Jan 16, 2017 at 03:18:45PM -0800, James Bottomley wrote:
> On Mon, 2017-01-16 at 12:04 +0200, Jarkko Sakkinen wrote:
> > On Fri, Jan 13, 2017 at 11:24:13AM -0800, James Bottomley wrote:
> > > Session handles are slightly more difficult to manage because any
> > > TPM
> > > only has a finite number of allowed handles, even if the session
> > > has
> > > been saved; so when you context save a session, you must not flush
> > > it
> > > because that would destroy the ability to context load it (you only
> > > flush sessions when you're done with them) and the tpm won't re-use
> > > the handle.  Additionally, sessions can be flushed as part of the
> > > successful execution of a command if the continueSession attribute
> > > is
> > > clear, so we have to mark any session we find in the command (using
> > > TPM2_HT_TAG_FOR_FLUSH) so it can be cleared from the space if the
> > > command successfully executes.
> > > 
> > > Finally, a session may also be cleared by flushing it, so we have
> > > to
> > > emulate the TPM2_FlushContext command to see if a session is being
> > > flushed and manually clear it from the space.
> > > 
> > > We also fully flush all sessions on device close.
> > 
> > Some big overview comments without going deeply into details. I will
> > use more time for this once the 
> > 
> > Please do not use handle allocation code for sessions. This commit
> > makes the implementation a mess. Just use the phandle directly and
> > have array of session phandles for each space.
> > 
> > I would also almost require to have at minimum two patches: one that
> > implements purely isolation and another that implements swapping.
> > 
> > It might be for example that I want to land TPM spaces with session
> > isolation to one release and swapping to n+1 because my hunch tells
> > me that it is better to bake the swapping part for a while.
> > 
> > One more thing. Maybe commit messages should speak uniformally about
> > TPM spaces? They are a tool to implement resource manager, not a
> > resource manager.
> 
> Yes, so Ken also had a reply to this which the Mailing List seems to
> have eaten:
> 
> > This looks like session handles are virtualized.  I believe that this 
> > will break the HMAC for commands (e.g. TPM2_PolicySecret) that have
> > a session handle in the handle area.  The session's handle is its 
> > "Name" and is included in the cpHash (command parameter hash) data 
> > that is HMACed. 
> 
> Basically this means that the advice to virtualize session handles in
> the TCG RM document is wrong and we have to use physical handles. I'll
> redo the implementation for this ... and now, since we'll have nothing
> to use as an index, it probably does make sense to have sessions in a
> separate array.  I can also separate isolation from context switching
> ... although I really think this is less optimal: my TPM only allows
> three active context handles, so if we don't context switch them
> identially to transient object (which it also only allows three of) I'm
> going to run out.  I actually redid my openssl_tpm_engine patches so
> they use session handles for parameter encryption and HMAC based
> authority, so this may end up biting me soon ...
> 
> James

This might be obvious to you but saying this just in case: everytime a
session handle is created, you must traverse through struct tpm_space
instances and check if they have that physical handle and remove it so
that they don't leak.

I would probably just create a linked list of struct tpm_space
instances. Radix tree does not make much sense with the amount of
sessions you might except to have on a system simultaneously.

Anyway, this kind of lazy approach where you clean up as new stuff
gets created is probably also the most straight forward...

/Jarkko


Re: [PATCH] tpm: add session handles to the save and restore of the tpm2 space manager

2017-01-16 Thread Jarkko Sakkinen
On Mon, Jan 16, 2017 at 03:18:45PM -0800, James Bottomley wrote:
> On Mon, 2017-01-16 at 12:04 +0200, Jarkko Sakkinen wrote:
> > On Fri, Jan 13, 2017 at 11:24:13AM -0800, James Bottomley wrote:
> > > Session handles are slightly more difficult to manage because any
> > > TPM
> > > only has a finite number of allowed handles, even if the session
> > > has
> > > been saved; so when you context save a session, you must not flush
> > > it
> > > because that would destroy the ability to context load it (you only
> > > flush sessions when you're done with them) and the tpm won't re-use
> > > the handle.  Additionally, sessions can be flushed as part of the
> > > successful execution of a command if the continueSession attribute
> > > is
> > > clear, so we have to mark any session we find in the command (using
> > > TPM2_HT_TAG_FOR_FLUSH) so it can be cleared from the space if the
> > > command successfully executes.
> > > 
> > > Finally, a session may also be cleared by flushing it, so we have
> > > to
> > > emulate the TPM2_FlushContext command to see if a session is being
> > > flushed and manually clear it from the space.
> > > 
> > > We also fully flush all sessions on device close.
> > 
> > Some big overview comments without going deeply into details. I will
> > use more time for this once the 
> > 
> > Please do not use handle allocation code for sessions. This commit
> > makes the implementation a mess. Just use the phandle directly and
> > have array of session phandles for each space.
> > 
> > I would also almost require to have at minimum two patches: one that
> > implements purely isolation and another that implements swapping.
> > 
> > It might be for example that I want to land TPM spaces with session
> > isolation to one release and swapping to n+1 because my hunch tells
> > me that it is better to bake the swapping part for a while.
> > 
> > One more thing. Maybe commit messages should speak uniformally about
> > TPM spaces? They are a tool to implement resource manager, not a
> > resource manager.
> 
> Yes, so Ken also had a reply to this which the Mailing List seems to
> have eaten:
> 
> > This looks like session handles are virtualized.  I believe that this 
> > will break the HMAC for commands (e.g. TPM2_PolicySecret) that have
> > a session handle in the handle area.  The session's handle is its 
> > "Name" and is included in the cpHash (command parameter hash) data 
> > that is HMACed. 
> 
> Basically this means that the advice to virtualize session handles in
> the TCG RM document is wrong and we have to use physical handles. I'll
> redo the implementation for this ... and now, since we'll have nothing
> to use as an index, it probably does make sense to have sessions in a
> separate array.  I can also separate isolation from context switching
> ... although I really think this is less optimal: my TPM only allows
> three active context handles, so if we don't context switch them
> identially to transient object (which it also only allows three of) I'm
> going to run out.  I actually redid my openssl_tpm_engine patches so
> they use session handles for parameter encryption and HMAC based
> authority, so this may end up biting me soon ...
> 
> James

This might be obvious to you but saying this just in case: everytime a
session handle is created, you must traverse through struct tpm_space
instances and check if they have that physical handle and remove it so
that they don't leak.

I would probably just create a linked list of struct tpm_space
instances. Radix tree does not make much sense with the amount of
sessions you might except to have on a system simultaneously.

Anyway, this kind of lazy approach where you clean up as new stuff
gets created is probably also the most straight forward...

/Jarkko


Re: [PATCH 1/1] x86: sanitize argument of clearcpuid command-line option

2017-01-16 Thread Ingo Molnar

* Odzioba, Lukasz  wrote:

> > pr_warn("x86/cpu: Ignoring invalid "clearcpuid=%s' option!\n", arg)
> >
> > Which would save quite a bit of head scratching and frustration when 
> > someone has a 
> > bad enough day to add silly typos to the kernel cmdline.
> 
> Is there any particular reason why we have such warnings only for early 
> params?
> early_param handlers return non-zero values on success:
>   linux/init.h: " * Emits warning if fn returns non-zero."
> __setup handlers in most cases seem to return 1 on success, is the expected
> behaviour documented somewhere?
> 
> After looking at some of the ~500 usages of __setup macro it seems that 
> handler's ret
> code doesn't matter so much, because it is treated differently in various 
> parts
> of the kernel. If we make it consistent possibly it could be solved similarly 
> to 
> early params by something like this: 
> 
> diff --git a/init/main.c b/init/main.c
> index b0c9d6f..261178e 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -182,8 +182,12 @@ static bool __init obsolete_checksetup(char *line)
> pr_warn("Parameter %s is obsolete, ignored\n",
> p->str);
> return true;
> -   } else if (p->setup_func(line + n))
> -   return true;
> +   } else {
> +   if (p->setup_func(line + n))
> +   return true;
> +   else
> +   pr_warn("Malformed option '%s'\n", 
> line);
> +   }

That looks sensible to me! I'd tweak the message slightly:

pr_warn("error: Ignoring invalid boot parameter '%s'\n", line);

to make it more clear that it's a boot option that has a problem (there are 
many 
other types of options), and to make sure the user knows that we ignored that 
option.

Mind sending this as a proper patch, with akpm Cc:-ed?

Thanks,

Ingo


[PATCH v2 6+/6] platform/x86: dell-wmi-led: fix coding style issues

2017-01-16 Thread Michał Kępień
Fix coding style issues in dell-wmi-led which checkpatch complains about
to make sure the module gets a clean start in the x86 platform driver
subsystem.

Signed-off-by: Michał Kępień 
---
This is an extra patch that Jacek asked for [1].

[1] https://lkml.org/lkml/2017/1/16/631

 drivers/platform/x86/dell-wmi-led.c | 41 +++--
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi-led.c 
b/drivers/platform/x86/dell-wmi-led.c
index d0232c7f1909..8753c4fc36b8 100644
--- a/drivers/platform/x86/dell-wmi-led.c
+++ b/drivers/platform/x86/dell-wmi-led.c
@@ -46,21 +46,16 @@ struct bios_args {
unsigned char off_time;
 };
 
-static int dell_led_perform_fn(u8 length,
-   u8 result_code,
-   u8 device_id,
-   u8 command,
-   u8 on_time,
-   u8 off_time)
+static int dell_led_perform_fn(u8 length, u8 result_code, u8 device_id,
+  u8 command, u8 on_time, u8 off_time)
 {
-   struct bios_args *bios_return;
-   u8 return_code;
-   union acpi_object *obj;
struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct bios_args *bios_return, args;
struct acpi_buffer input;
+   union acpi_object *obj;
acpi_status status;
+   u8 return_code;
 
-   struct bios_args args;
args.length = length;
args.result_code = result_code;
args.device_id = device_id;
@@ -71,11 +66,7 @@ static int dell_led_perform_fn(u8 length,
input.length = sizeof(struct bios_args);
input.pointer = 
 
-   status = wmi_evaluate_method(DELL_LED_BIOS_GUID,
-   1,
-   1,
-   ,
-   );
+   status = wmi_evaluate_method(DELL_LED_BIOS_GUID, 1, 1, , );
 
if (ACPI_FAILURE(status))
return status;
@@ -84,7 +75,7 @@ static int dell_led_perform_fn(u8 length,
 
if (!obj)
return -EINVAL;
-   else if (obj->type != ACPI_TYPE_BUFFER) {
+   if (obj->type != ACPI_TYPE_BUFFER) {
kfree(obj);
return -EINVAL;
}
@@ -117,8 +108,7 @@ static int led_off(void)
0); /* not used */
 }
 
-static int led_blink(unsigned char on_eighths,
-   unsigned char off_eighths)
+static int led_blink(unsigned char on_eighths, unsigned char off_eighths)
 {
return dell_led_perform_fn(5,   /* Length of command */
INTERFACE_ERROR,/* Init to  INTERFACE_ERROR */
@@ -129,7 +119,7 @@ static int led_blink(unsigned char on_eighths,
 }
 
 static void dell_led_set(struct led_classdev *led_cdev,
-   enum led_brightness value)
+enum led_brightness value)
 {
if (value == LED_OFF)
led_off();
@@ -138,24 +128,25 @@ static void dell_led_set(struct led_classdev *led_cdev,
 }
 
 static int dell_led_blink(struct led_classdev *led_cdev,
-   unsigned long *delay_on,
-   unsigned long *delay_off)
+ unsigned long *delay_on, unsigned long *delay_off)
 {
unsigned long on_eighths;
unsigned long off_eighths;
 
-   /* The Dell LED delay is based on 125ms intervals.
-  Need to round up to next interval. */
+   /*
+* The Dell LED delay is based on 125ms intervals.
+* Need to round up to next interval.
+*/
 
on_eighths = (*delay_on + 124) / 125;
-   if (0 == on_eighths)
+   if (on_eighths == 0)
on_eighths = 1;
if (on_eighths > 255)
on_eighths = 255;
*delay_on = on_eighths * 125;
 
off_eighths = (*delay_off + 124) / 125;
-   if (0 == off_eighths)
+   if (off_eighths == 0)
off_eighths = 1;
if (off_eighths > 255)
off_eighths = 255;
-- 
2.11.0



Re: [PATCH 1/1] x86: sanitize argument of clearcpuid command-line option

2017-01-16 Thread Ingo Molnar

* Odzioba, Lukasz  wrote:

> > pr_warn("x86/cpu: Ignoring invalid "clearcpuid=%s' option!\n", arg)
> >
> > Which would save quite a bit of head scratching and frustration when 
> > someone has a 
> > bad enough day to add silly typos to the kernel cmdline.
> 
> Is there any particular reason why we have such warnings only for early 
> params?
> early_param handlers return non-zero values on success:
>   linux/init.h: " * Emits warning if fn returns non-zero."
> __setup handlers in most cases seem to return 1 on success, is the expected
> behaviour documented somewhere?
> 
> After looking at some of the ~500 usages of __setup macro it seems that 
> handler's ret
> code doesn't matter so much, because it is treated differently in various 
> parts
> of the kernel. If we make it consistent possibly it could be solved similarly 
> to 
> early params by something like this: 
> 
> diff --git a/init/main.c b/init/main.c
> index b0c9d6f..261178e 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -182,8 +182,12 @@ static bool __init obsolete_checksetup(char *line)
> pr_warn("Parameter %s is obsolete, ignored\n",
> p->str);
> return true;
> -   } else if (p->setup_func(line + n))
> -   return true;
> +   } else {
> +   if (p->setup_func(line + n))
> +   return true;
> +   else
> +   pr_warn("Malformed option '%s'\n", 
> line);
> +   }

That looks sensible to me! I'd tweak the message slightly:

pr_warn("error: Ignoring invalid boot parameter '%s'\n", line);

to make it more clear that it's a boot option that has a problem (there are 
many 
other types of options), and to make sure the user knows that we ignored that 
option.

Mind sending this as a proper patch, with akpm Cc:-ed?

Thanks,

Ingo


[PATCH v2 6+/6] platform/x86: dell-wmi-led: fix coding style issues

2017-01-16 Thread Michał Kępień
Fix coding style issues in dell-wmi-led which checkpatch complains about
to make sure the module gets a clean start in the x86 platform driver
subsystem.

Signed-off-by: Michał Kępień 
---
This is an extra patch that Jacek asked for [1].

[1] https://lkml.org/lkml/2017/1/16/631

 drivers/platform/x86/dell-wmi-led.c | 41 +++--
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi-led.c 
b/drivers/platform/x86/dell-wmi-led.c
index d0232c7f1909..8753c4fc36b8 100644
--- a/drivers/platform/x86/dell-wmi-led.c
+++ b/drivers/platform/x86/dell-wmi-led.c
@@ -46,21 +46,16 @@ struct bios_args {
unsigned char off_time;
 };
 
-static int dell_led_perform_fn(u8 length,
-   u8 result_code,
-   u8 device_id,
-   u8 command,
-   u8 on_time,
-   u8 off_time)
+static int dell_led_perform_fn(u8 length, u8 result_code, u8 device_id,
+  u8 command, u8 on_time, u8 off_time)
 {
-   struct bios_args *bios_return;
-   u8 return_code;
-   union acpi_object *obj;
struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct bios_args *bios_return, args;
struct acpi_buffer input;
+   union acpi_object *obj;
acpi_status status;
+   u8 return_code;
 
-   struct bios_args args;
args.length = length;
args.result_code = result_code;
args.device_id = device_id;
@@ -71,11 +66,7 @@ static int dell_led_perform_fn(u8 length,
input.length = sizeof(struct bios_args);
input.pointer = 
 
-   status = wmi_evaluate_method(DELL_LED_BIOS_GUID,
-   1,
-   1,
-   ,
-   );
+   status = wmi_evaluate_method(DELL_LED_BIOS_GUID, 1, 1, , );
 
if (ACPI_FAILURE(status))
return status;
@@ -84,7 +75,7 @@ static int dell_led_perform_fn(u8 length,
 
if (!obj)
return -EINVAL;
-   else if (obj->type != ACPI_TYPE_BUFFER) {
+   if (obj->type != ACPI_TYPE_BUFFER) {
kfree(obj);
return -EINVAL;
}
@@ -117,8 +108,7 @@ static int led_off(void)
0); /* not used */
 }
 
-static int led_blink(unsigned char on_eighths,
-   unsigned char off_eighths)
+static int led_blink(unsigned char on_eighths, unsigned char off_eighths)
 {
return dell_led_perform_fn(5,   /* Length of command */
INTERFACE_ERROR,/* Init to  INTERFACE_ERROR */
@@ -129,7 +119,7 @@ static int led_blink(unsigned char on_eighths,
 }
 
 static void dell_led_set(struct led_classdev *led_cdev,
-   enum led_brightness value)
+enum led_brightness value)
 {
if (value == LED_OFF)
led_off();
@@ -138,24 +128,25 @@ static void dell_led_set(struct led_classdev *led_cdev,
 }
 
 static int dell_led_blink(struct led_classdev *led_cdev,
-   unsigned long *delay_on,
-   unsigned long *delay_off)
+ unsigned long *delay_on, unsigned long *delay_off)
 {
unsigned long on_eighths;
unsigned long off_eighths;
 
-   /* The Dell LED delay is based on 125ms intervals.
-  Need to round up to next interval. */
+   /*
+* The Dell LED delay is based on 125ms intervals.
+* Need to round up to next interval.
+*/
 
on_eighths = (*delay_on + 124) / 125;
-   if (0 == on_eighths)
+   if (on_eighths == 0)
on_eighths = 1;
if (on_eighths > 255)
on_eighths = 255;
*delay_on = on_eighths * 125;
 
off_eighths = (*delay_off + 124) / 125;
-   if (0 == off_eighths)
+   if (off_eighths == 0)
off_eighths = 1;
if (off_eighths > 255)
off_eighths = 255;
-- 
2.11.0



Re: [tpmdd-devel] [PATCH RFC v3 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-16 Thread Jarkko Sakkinen
On Mon, Jan 16, 2017 at 09:28:17AM -0800, James Bottomley wrote:
> On Mon, 2017-01-16 at 19:24 +0200, Jarkko Sakkinen wrote:
> > On Mon, Jan 16, 2017 at 09:14:13AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Jan 16, 2017 at 03:12:11PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > > @@ -199,7 +227,9 @@ struct tpm_chip *tpm_chip_alloc(struct device
> > > > *pdev,
> > > > return chip;
> > > >  
> > > >  out:
> > > > +   put_device(>devrm);
> > > > put_device(>dev);
> > > > +   put_device(>devrm);
> > > > return ERR_PTR(rc);
> > > >  }
> > > 
> > > Something has gone wrong here..
> > 
> > Not a big surprise. There were a bunch of these small patches
> > that I had to squash. Thanks for spotting this.
> > 
> > Funny that I didn't experiece any issues when I run my smoke
> > tests.
> 
> It's the error path in tpm_chip_alloc: it's incredibly hard to trigger,
> which is probably why your tests didn't see it.  You basically either
> have to be out of kernel memory or out of TPM device numbers.

Right, of course.

I have branch for each new patch set version. For v3 the branch is named
as tabrm3. I'll apply all the fixes as separate commits on top of that
branch. Once I'm ready to send a new version, I'll create tabrm4 branch
and squash the fixes.

/Jarkko


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Peter Zijlstra
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:

> > > @@ -155,6 +164,9 @@ struct lockdep_map {
> > >   int cpu;
> > >   unsigned long   ip;
> > >  #endif
> > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > + struct cross_lock   *xlock;
> > > +#endif
> > 
> > The use of this escapes me; why does the lockdep_map need a pointer to
> > this?
> 
> Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit()
> use lockdep_map as an arg, but crossrelease need to extract cross_lock
> instances from that.

> > Why not do something like:
> > 
> > struct lockdep_map_cross {
> > struct lockdep_map  map;
> > struct held_lockhlock;
> > }

Using a structure like that, you can pass lockdep_map_cross around just
fine, since the lockdep_map is the first member, so the pointers are
interchangeable. At worst we might need to munge a few typecasts.

But then the cross release code can simply cast to the bigger type and
have access to the extra data it knows to be there.


Re: [tpmdd-devel] [PATCH RFC v3 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-16 Thread Jarkko Sakkinen
On Mon, Jan 16, 2017 at 09:28:17AM -0800, James Bottomley wrote:
> On Mon, 2017-01-16 at 19:24 +0200, Jarkko Sakkinen wrote:
> > On Mon, Jan 16, 2017 at 09:14:13AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Jan 16, 2017 at 03:12:11PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > > @@ -199,7 +227,9 @@ struct tpm_chip *tpm_chip_alloc(struct device
> > > > *pdev,
> > > > return chip;
> > > >  
> > > >  out:
> > > > +   put_device(>devrm);
> > > > put_device(>dev);
> > > > +   put_device(>devrm);
> > > > return ERR_PTR(rc);
> > > >  }
> > > 
> > > Something has gone wrong here..
> > 
> > Not a big surprise. There were a bunch of these small patches
> > that I had to squash. Thanks for spotting this.
> > 
> > Funny that I didn't experiece any issues when I run my smoke
> > tests.
> 
> It's the error path in tpm_chip_alloc: it's incredibly hard to trigger,
> which is probably why your tests didn't see it.  You basically either
> have to be out of kernel memory or out of TPM device numbers.

Right, of course.

I have branch for each new patch set version. For v3 the branch is named
as tabrm3. I'll apply all the fixes as separate commits on top of that
branch. Once I'm ready to send a new version, I'll create tabrm4 branch
and squash the fixes.

/Jarkko


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Peter Zijlstra
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:

> > > @@ -155,6 +164,9 @@ struct lockdep_map {
> > >   int cpu;
> > >   unsigned long   ip;
> > >  #endif
> > > +#ifdef CONFIG_LOCKDEP_CROSSRELEASE
> > > + struct cross_lock   *xlock;
> > > +#endif
> > 
> > The use of this escapes me; why does the lockdep_map need a pointer to
> > this?
> 
> Lockdep interfaces e.g. lock_acquire(), lock_release() and lock_commit()
> use lockdep_map as an arg, but crossrelease need to extract cross_lock
> instances from that.

> > Why not do something like:
> > 
> > struct lockdep_map_cross {
> > struct lockdep_map  map;
> > struct held_lockhlock;
> > }

Using a structure like that, you can pass lockdep_map_cross around just
fine, since the lockdep_map is the first member, so the pointers are
interchangeable. At worst we might need to munge a few typecasts.

But then the cross release code can simply cast to the bigger type and
have access to the extra data it knows to be there.


Re: [GIT PULL 4/4] ARM: defconfig: exynos: Defconfig for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:30PM +0200, Krzysztof Kozlowski wrote:
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-defconfig-4.11
> 
> for you to fetch changes up to 74b0ee7579f279dae053053008d29f10f3890c07:
> 
>   ARM: exynos_defconfig: Enable IP multicast (2016-12-29 16:04:57 +0200)
> 
> 
> Samsung defconfig update for v4.11:
> 1. Cleanup from old MACHs in s5pv210.
> 2. Enable IP_MULTICAST for libnss-mdns.
> 
> 
> Krzysztof Kozlowski (1):
>   ARM: s5pv210_defconfig: Remove old MACHs
> 
> Robie Basak (1):
>   ARM: exynos_defconfig: Enable IP multicast

Merged, thanks.


-Olof



Re: [GIT PULL 4/4] ARM: defconfig: exynos: Defconfig for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:30PM +0200, Krzysztof Kozlowski wrote:
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-defconfig-4.11
> 
> for you to fetch changes up to 74b0ee7579f279dae053053008d29f10f3890c07:
> 
>   ARM: exynos_defconfig: Enable IP multicast (2016-12-29 16:04:57 +0200)
> 
> 
> Samsung defconfig update for v4.11:
> 1. Cleanup from old MACHs in s5pv210.
> 2. Enable IP_MULTICAST for libnss-mdns.
> 
> 
> Krzysztof Kozlowski (1):
>   ARM: s5pv210_defconfig: Remove old MACHs
> 
> Robie Basak (1):
>   ARM: exynos_defconfig: Enable IP multicast

Merged, thanks.


-Olof



Re: [GIT PULL 2/4] ARM: dts: exynos: DT for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:28PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Few improvements and Exynos4412 DTSI removal.
> 
> This pulls external dependecy from clock tree (one clock commit).
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-dt-4.11
> 
> for you to fetch changes up to bca9085e0ae93253bc93ce218c85ac7d7e7f1831:
> 
>   ARM: dts: exynos: remove Exynos4212 support (dead code) (2017-01-11 
> 18:28:43 +0200)
> 
> 
> Samsung DeviceTree update for v4.11:
> 1. Fixes for initial audio clocks configuration.
> 2. Enable sound on Odroid-X board.
> 3. Enable DMA for UART modules on Exynos5 SoCs.
> 4. Add CPU OPPs for Exynos4412 Prime (newer version of Exynos4412). This pulls
>necessary change in the clocks.
> 5. Remove Exynos4212. We do not have any mainline boards with it. This will
>simplify few bits later.

Merged, thanks.


-Olof


Re: [GIT PULL 2/4] ARM: dts: exynos: DT for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:28PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Few improvements and Exynos4412 DTSI removal.
> 
> This pulls external dependecy from clock tree (one clock commit).
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-dt-4.11
> 
> for you to fetch changes up to bca9085e0ae93253bc93ce218c85ac7d7e7f1831:
> 
>   ARM: dts: exynos: remove Exynos4212 support (dead code) (2017-01-11 
> 18:28:43 +0200)
> 
> 
> Samsung DeviceTree update for v4.11:
> 1. Fixes for initial audio clocks configuration.
> 2. Enable sound on Odroid-X board.
> 3. Enable DMA for UART modules on Exynos5 SoCs.
> 4. Add CPU OPPs for Exynos4412 Prime (newer version of Exynos4412). This pulls
>necessary change in the clocks.
> 5. Remove Exynos4212. We do not have any mainline boards with it. This will
>simplify few bits later.

Merged, thanks.


-Olof


Re: [PATCH 2/3] Make static usermode helper binaries constant

2017-01-16 Thread Greg KH
On Mon, Jan 16, 2017 at 04:25:55PM -0500, J. Bruce Fields wrote:
> On Mon, Jan 16, 2017 at 05:50:31PM +0100, Greg KH wrote:
> > From: Greg Kroah-Hartman 
> > 
> > There are a number of usermode helper binaries that are "hard coded" in
> > the kernel today, so mark them as "const" to make it harder for someone
> > to change where the variables point to.
> > 
> ...
> > --- a/drivers/pnp/pnpbios/core.c
> > +++ b/drivers/pnp/pnpbios/core.c
> > @@ -98,6 +98,7 @@ static struct completion unload_sem;
> >   */
> >  static int pnp_dock_event(int dock, struct pnp_docking_station_info *info)
> >  {
> > +   static char const sbin_pnpbios[] = "/sbin/pnpbios";
> > char *argv[3], **envp, *buf, *scratch;
> > int i = 0, value;
> >  
> > @@ -112,7 +113,7 @@ static int pnp_dock_event(int dock, struct 
> > pnp_docking_station_info *info)
> >  * integrated into the driver core and use the usual infrastructure
> >  * like sysfs and uevents
> >  */
> > -   argv[0] = "/sbin/pnpbios";
> > +   argv[0] = (char *)sbin_pnpbios;
> 
> So here and elsewhere, can attackers write to argv[0] instead of to the
> memory where the string lives?

Yes, they could, it would be a very "tight" race to do that (have to
write after the assignment and before the call_usermodehelper_exec()
runs).  However, the kernel does not run argv[0], it just passes it to
the binary you specify in path, so for this example, the correct program
would still be run by the kernel.

But, if you do worry about this type of attack, then enable the option I
created in patch 3/3 here, which will funnel all calls into a single
userspace binary where you can then filter on argv[0] to see if you want
to run the binary or not to prevent this type of attack.

> Apologies if I'm rehashing earlier discussion, I did a quick search of
> archives but could easily have missed something.

No problem at all, hopefully I've explained it better now.

thanks,

greg k-h


Re: [PATCH 2/3] Make static usermode helper binaries constant

2017-01-16 Thread Greg KH
On Mon, Jan 16, 2017 at 04:25:55PM -0500, J. Bruce Fields wrote:
> On Mon, Jan 16, 2017 at 05:50:31PM +0100, Greg KH wrote:
> > From: Greg Kroah-Hartman 
> > 
> > There are a number of usermode helper binaries that are "hard coded" in
> > the kernel today, so mark them as "const" to make it harder for someone
> > to change where the variables point to.
> > 
> ...
> > --- a/drivers/pnp/pnpbios/core.c
> > +++ b/drivers/pnp/pnpbios/core.c
> > @@ -98,6 +98,7 @@ static struct completion unload_sem;
> >   */
> >  static int pnp_dock_event(int dock, struct pnp_docking_station_info *info)
> >  {
> > +   static char const sbin_pnpbios[] = "/sbin/pnpbios";
> > char *argv[3], **envp, *buf, *scratch;
> > int i = 0, value;
> >  
> > @@ -112,7 +113,7 @@ static int pnp_dock_event(int dock, struct 
> > pnp_docking_station_info *info)
> >  * integrated into the driver core and use the usual infrastructure
> >  * like sysfs and uevents
> >  */
> > -   argv[0] = "/sbin/pnpbios";
> > +   argv[0] = (char *)sbin_pnpbios;
> 
> So here and elsewhere, can attackers write to argv[0] instead of to the
> memory where the string lives?

Yes, they could, it would be a very "tight" race to do that (have to
write after the assignment and before the call_usermodehelper_exec()
runs).  However, the kernel does not run argv[0], it just passes it to
the binary you specify in path, so for this example, the correct program
would still be run by the kernel.

But, if you do worry about this type of attack, then enable the option I
created in patch 3/3 here, which will funnel all calls into a single
userspace binary where you can then filter on argv[0] to see if you want
to run the binary or not to prevent this type of attack.

> Apologies if I'm rehashing earlier discussion, I did a quick search of
> archives but could easily have missed something.

No problem at all, hopefully I've explained it better now.

thanks,

greg k-h


Re: [GIT PULL] STi DT update for v4.11 round 1

2017-01-16 Thread Olof Johansson
On Thu, Jan 12, 2017 at 05:59:43PM +0100, Patrice Chotard wrote:
> Hi Arnd, Kevin, Olof
> 
> PLease consider this first round of STi dts update for v4.11 :
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti.git 
> tags/sti-dt-for-v4.11
> 
> for you to fetch changes up to 2016ead446b98c42dffd9b6c03ce813e5cb3b810:
> 
>   ARM: dts: STiH407-family: Supply Mailbox properties to delta RProc 
> (2017-01-12 17:23:39 +0100)
> 
> 
> STi dts update:
> 
> Enable High Quality Video Data Plane (HQVDP) DT entry
> Add DELTA V4L2 video decoder DT entry
> Disable unused fdma instances
> Fix sti-display-subsystem wrong clock parent's value
> Cleanup and update DT entries related to remoteproc

Merged, thanks!


-Olof



Re: [GIT PULL 1/4] ARM: exynos: Mach/soc for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:27PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Mostly cleanups for v4.11.
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-soc-4.11
> 
> for you to fetch changes up to cda1a52dab50340728e46601e6c9da9fc4beaf1f:
> 
>   ARM: s3c64xx: Constify wake_irqs (2016-12-29 15:41:44 +0200)
> 
> 
> Samsung mach/soc update for v4.11. Mostly cleanups:
> 1. Removal of unused platform data in S3C24XX and S3C64xx as follow up of
>conversion to new DMA channel request API.
> 2. Adding const and __ro_after_init to various data in Samsung platforms.
> 
> 
> Krzysztof Kozlowski (7):
>   ARM: EXYNOS: Constify list of retention registers
>   ARM: EXYNOS: Annotate iomem and pm_data pointers __ro_after_init
>   ARM: s3c24xx: Constify few integer tables
>   ARM: s3c64xx: Annotate external clock frequencies __ro_after_init
>   ARM: SAMSUNG: Constify array of wake irqs passed to 
> samsung_sync_wakemask
>   ARM: s3c24xx: Constify wake_irqs
>   ARM: s3c64xx: Constify wake_irqs
> 
> Sylwester Nawrocki (2):
>   ARM: s3c64xx: Drop initialization of unused struct s3c_audio_pdata 
> fields
>   ARM: s3c24xx: Drop unused struct s3c_audio_pdata entries

Merged, thanks.


-Olof


Re: [GIT PULL] STi DT update for v4.11 round 1

2017-01-16 Thread Olof Johansson
On Thu, Jan 12, 2017 at 05:59:43PM +0100, Patrice Chotard wrote:
> Hi Arnd, Kevin, Olof
> 
> PLease consider this first round of STi dts update for v4.11 :
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti.git 
> tags/sti-dt-for-v4.11
> 
> for you to fetch changes up to 2016ead446b98c42dffd9b6c03ce813e5cb3b810:
> 
>   ARM: dts: STiH407-family: Supply Mailbox properties to delta RProc 
> (2017-01-12 17:23:39 +0100)
> 
> 
> STi dts update:
> 
> Enable High Quality Video Data Plane (HQVDP) DT entry
> Add DELTA V4L2 video decoder DT entry
> Disable unused fdma instances
> Fix sti-display-subsystem wrong clock parent's value
> Cleanup and update DT entries related to remoteproc

Merged, thanks!


-Olof



Re: [GIT PULL 1/4] ARM: exynos: Mach/soc for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:27PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Mostly cleanups for v4.11.
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-soc-4.11
> 
> for you to fetch changes up to cda1a52dab50340728e46601e6c9da9fc4beaf1f:
> 
>   ARM: s3c64xx: Constify wake_irqs (2016-12-29 15:41:44 +0200)
> 
> 
> Samsung mach/soc update for v4.11. Mostly cleanups:
> 1. Removal of unused platform data in S3C24XX and S3C64xx as follow up of
>conversion to new DMA channel request API.
> 2. Adding const and __ro_after_init to various data in Samsung platforms.
> 
> 
> Krzysztof Kozlowski (7):
>   ARM: EXYNOS: Constify list of retention registers
>   ARM: EXYNOS: Annotate iomem and pm_data pointers __ro_after_init
>   ARM: s3c24xx: Constify few integer tables
>   ARM: s3c64xx: Annotate external clock frequencies __ro_after_init
>   ARM: SAMSUNG: Constify array of wake irqs passed to 
> samsung_sync_wakemask
>   ARM: s3c24xx: Constify wake_irqs
>   ARM: s3c64xx: Constify wake_irqs
> 
> Sylwester Nawrocki (2):
>   ARM: s3c64xx: Drop initialization of unused struct s3c_audio_pdata 
> fields
>   ARM: s3c24xx: Drop unused struct s3c_audio_pdata entries

Merged, thanks.


-Olof


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Peter Zijlstra
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:

> > > +
> > > + /*
> > > +  * Whenever irq happens, these are updated so that we can
> > > +  * distinguish each irq context uniquely.
> > > +  */
> > > + unsigned inthardirq_id;
> > > + unsigned intsoftirq_id;
> > 
> > An alternative approach would be to 'unwind' or discard all historical
> > events from a nested context once we exit it.
> 
> That's one of what I considered. However, it would make code complex to
> detect if pend_lock ring buffer was wrapped.

I'm not sure I see the need for detecting that...

> > 
> > After all, all we care about is the history of the release context, once
> > the context is gone, we don't care.
> 
> We must care it and decide if the next plock in the ring buffer might be
> valid one or not.

So I was thinking this was an overwriting ring buffer; something like
so:

struct pend_lock plocks[64];
unsigned int plocks_idx;

static void plocks_add(..)
{
unsigned int idx = (plocks_idx++) % 64;

plocks[idx] = ...;
}

static void plocks_close_context(int ctx)
{
for (i = 0; i < 64; i++) {
int idx = (plocks_idx - 1) % 64;
if (plocks[idx].ctx != ctx)
break;

plocks_idx--;
}
}

Similarly for the release, it need only look at 64 entries and terminate
early if the generation number is too old.

static void plocks_release(unsigned int gen)
{
for (i = 0; i < 64; i++) {
int idx = (plocks_idx - 1 - i) % 64;
if ((int)(plocks[idx].gen_id - gen) < 0)
break;

/* do release muck */
}
}


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Peter Zijlstra
On Tue, Jan 17, 2017 at 11:05:42AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:10:01PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:

> > > +
> > > + /*
> > > +  * Whenever irq happens, these are updated so that we can
> > > +  * distinguish each irq context uniquely.
> > > +  */
> > > + unsigned inthardirq_id;
> > > + unsigned intsoftirq_id;
> > 
> > An alternative approach would be to 'unwind' or discard all historical
> > events from a nested context once we exit it.
> 
> That's one of what I considered. However, it would make code complex to
> detect if pend_lock ring buffer was wrapped.

I'm not sure I see the need for detecting that...

> > 
> > After all, all we care about is the history of the release context, once
> > the context is gone, we don't care.
> 
> We must care it and decide if the next plock in the ring buffer might be
> valid one or not.

So I was thinking this was an overwriting ring buffer; something like
so:

struct pend_lock plocks[64];
unsigned int plocks_idx;

static void plocks_add(..)
{
unsigned int idx = (plocks_idx++) % 64;

plocks[idx] = ...;
}

static void plocks_close_context(int ctx)
{
for (i = 0; i < 64; i++) {
int idx = (plocks_idx - 1) % 64;
if (plocks[idx].ctx != ctx)
break;

plocks_idx--;
}
}

Similarly for the release, it need only look at 64 entries and terminate
early if the generation number is too old.

static void plocks_release(unsigned int gen)
{
for (i = 0; i < 64; i++) {
int idx = (plocks_idx - 1 - i) % 64;
if ((int)(plocks[idx].gen_id - gen) < 0)
break;

/* do release muck */
}
}


Re: [GIT PULL 3/4] arm64: dts: exynos: DT for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:29PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Important:
> 1. Unlike others, this is rebased on v4.10-rc2 because v4.10-rc1 was not
>building on arm64.
> 2. This contains pinctrl change (in header), acked by Linus Walleij.
>Although I kept it on separate branch and merged here, but I did not
>prepare a separate pull request for this one pinctrl change - it is
>just one patch. Anyway it sits on separate branch so this is still
>possible if you need it.
> 
> Best regards,
> Krzysztof
> 
> Cc: Linus Walleij 
> 
> The following changes since commit 0c744ea4f77d72b3dcebb7a8f2684633ec79be88:
> 
>   Linux 4.10-rc2 (2017-01-01 14:31:53 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-dt64-4.11
> 
> for you to fetch changes up to e4e381133241a27d732e78be09973b89a193eaf7:
> 
>   arm64: dts: exynos: Enable HDMI/TV path on Exynos5433-TM2 (2017-01-11 
> 18:20:28 +0200)
> 
> 
> Samsung DeviceTree ARM64 update for v4.11:
> 1. Add bus frequency and voltage scalling on Exynos5433 TM2 device (along with
>necessary bus nodes and Platform Performance Monitoring Unit on 
> Exynos5433).
> 2. Use macros for pinctrl settings on Exynos5433.
>This contains necessary header with bindings.
> 3. Minor cleanups in Exynos5433 DTSI and boards using it.
> 4. Create common DTSI betweem Exynos5433 TM2E and TM2E.
> 5. Add HDMI/TV to Exynos5433 TM2.

Merged, thanks.


-Olof


Re: [GIT PULL 3/4] arm64: dts: exynos: DT for v4.11

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 05:20:29PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Important:
> 1. Unlike others, this is rebased on v4.10-rc2 because v4.10-rc1 was not
>building on arm64.
> 2. This contains pinctrl change (in header), acked by Linus Walleij.
>Although I kept it on separate branch and merged here, but I did not
>prepare a separate pull request for this one pinctrl change - it is
>just one patch. Anyway it sits on separate branch so this is still
>possible if you need it.
> 
> Best regards,
> Krzysztof
> 
> Cc: Linus Walleij 
> 
> The following changes since commit 0c744ea4f77d72b3dcebb7a8f2684633ec79be88:
> 
>   Linux 4.10-rc2 (2017-01-01 14:31:53 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-dt64-4.11
> 
> for you to fetch changes up to e4e381133241a27d732e78be09973b89a193eaf7:
> 
>   arm64: dts: exynos: Enable HDMI/TV path on Exynos5433-TM2 (2017-01-11 
> 18:20:28 +0200)
> 
> 
> Samsung DeviceTree ARM64 update for v4.11:
> 1. Add bus frequency and voltage scalling on Exynos5433 TM2 device (along with
>necessary bus nodes and Platform Performance Monitoring Unit on 
> Exynos5433).
> 2. Use macros for pinctrl settings on Exynos5433.
>This contains necessary header with bindings.
> 3. Minor cleanups in Exynos5433 DTSI and boards using it.
> 4. Create common DTSI betweem Exynos5433 TM2E and TM2E.
> 5. Add HDMI/TV to Exynos5433 TM2.

Merged, thanks.


-Olof


Re: [PATCH] usb: dwc3: ep0: Fix the possible missed request for handling delay STATUS phase

2017-01-16 Thread Baolin Wang
Hi,

On 16 January 2017 at 20:06, Felipe Balbi  wrote:
>
> Hi,
>
> Baolin Wang  writes:
>> Hi,
>>
>> On 16 January 2017 at 19:29, Felipe Balbi  wrote:
>>>
>>> Hi,
>>>
>>> Baolin Wang  writes:
 Hi,

 On 16 January 2017 at 18:56, Felipe Balbi  wrote:
>
> Hi,
>
> Baolin Wang  writes:
>> When handing the SETUP packet by composite_setup(), we will release the
>> dwc->lock. If we get the 'USB_GADGET_DELAYED_STATUS' result from setup
>> function, which means we need to delay handling the STATUS phase.
>
> this sentence needs a little work. Seems like it's missing some
> information.
>
> anyway, I get that we release the lock but...
>
>> But during the lock release period, maybe the request for handling delay
>
> execution of ->setup() itself should be locked. I can see that it's only
> locked for set_config() which is rather peculiar.
>
> What exact request you had when you triggered this? (Hint: dwc3
> tracepoints print out ctrl request bytes). IIRC, only set_config() or
> f->set_alt() can actually return USB_GADGET_DELAYED_STATUS.

 Yes, when host set configuration for mass storage driver, it can
 produce this issue.

>
> Which gadget driver were you using when you triggered this?

 mass storage driver. When host issues the setting config request, we
 will get USB_GADGET_DELAYED_STATUS result from
 set_config()--->fsg_set_alt(). Then the mass storage driver will issue
 one thread to complete the status stage by ep0_queue() (this thread
 may be running on another core), then if the thread issues ep0_queue()
 too fast before we get the dwc->lock in dwc3_ep0_delegate_req() or
 before we get into the STATUS stage, then we can not handle this
 request for the delay STATUS stage in dwc3_gadget_ep0_queue().

>
> Another point here is that the really robust way of fixing this is to
> get rid of USB_GADGET_DELAYED_STATUS altogether and just make sure
> gadget drivers know how to queue requests for all three phases of a
> Control Transfer.
>
> A lot of code will be removed from all gadget drivers and UDC drivers
> while combining all of it in a single place in composite.c.
>
> The reason I'm saying this is that other UDC drivers might have similar
> races already but they just haven't triggered.

 Yes, maybe.

>
>> STATUS phase has been queued into list before we set 
>> 'dwc->delayed_status'
>> flag or entering 'EP0_STATUS_PHASE' phase, then we will miss the chance
>> to handle the STATUS phase. Thus we should check if the request for delay
>> STATUS phase has been enqueued when entering 'EP0_STATUS_PHASE' phase in
>> dwc3_ep0_xfernotready(), if so, we should handle it.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>>  drivers/usb/dwc3/ep0.c |   14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index 9bb1f85..e689ced 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -1123,7 +1123,21 @@ static void dwc3_ep0_xfernotready(struct dwc3 
>> *dwc,
>>   dwc->ep0state = EP0_STATUS_PHASE;
>>
>>   if (dwc->delayed_status) {
>> + struct dwc3_ep *dep = dwc->eps[0];
>> +
>>   WARN_ON_ONCE(event->endpoint_number != 1);
>> + /*
>> +  * We should handle the delay STATUS phase here if 
>> the
>> +  * request for handling delay STATUS has been 
>> queued
>> +  * into the list.
>> +  */
>> + if (!list_empty(>pending_list)) {
>> + dwc->delayed_status = false;
>> + usb_gadget_set_state(>gadget,
>> +  USB_STATE_CONFIGURED);
>
> Isn't this patch also changing the normal case when usb_ep_queue() comes
> later? I guess list_empty() protects against that...

 I think it will not change other cases, we only handle the delayed
 status and I've tested it for a while and I did not find any problem.
>>>
>>> Alright, it's important enough to fix this bug. Can you also have a look
>>> into dropping USB_GADGET_DELAYED_STATUS altogether? If you're too busy,
>>> no issues. It'll stay in my queue.
>>
>> Okay, I will have a look at f_mass_storage driver to see if we can
>> drop USB_GADGET_DELAYED_STATUS. Thanks.
>
> not only mass storage. It needs to be done for all drivers. The way to
> do that is to teach functions that control transfers are composed of 

Re: [PATCH] usb: dwc3: ep0: Fix the possible missed request for handling delay STATUS phase

2017-01-16 Thread Baolin Wang
Hi,

On 16 January 2017 at 20:06, Felipe Balbi  wrote:
>
> Hi,
>
> Baolin Wang  writes:
>> Hi,
>>
>> On 16 January 2017 at 19:29, Felipe Balbi  wrote:
>>>
>>> Hi,
>>>
>>> Baolin Wang  writes:
 Hi,

 On 16 January 2017 at 18:56, Felipe Balbi  wrote:
>
> Hi,
>
> Baolin Wang  writes:
>> When handing the SETUP packet by composite_setup(), we will release the
>> dwc->lock. If we get the 'USB_GADGET_DELAYED_STATUS' result from setup
>> function, which means we need to delay handling the STATUS phase.
>
> this sentence needs a little work. Seems like it's missing some
> information.
>
> anyway, I get that we release the lock but...
>
>> But during the lock release period, maybe the request for handling delay
>
> execution of ->setup() itself should be locked. I can see that it's only
> locked for set_config() which is rather peculiar.
>
> What exact request you had when you triggered this? (Hint: dwc3
> tracepoints print out ctrl request bytes). IIRC, only set_config() or
> f->set_alt() can actually return USB_GADGET_DELAYED_STATUS.

 Yes, when host set configuration for mass storage driver, it can
 produce this issue.

>
> Which gadget driver were you using when you triggered this?

 mass storage driver. When host issues the setting config request, we
 will get USB_GADGET_DELAYED_STATUS result from
 set_config()--->fsg_set_alt(). Then the mass storage driver will issue
 one thread to complete the status stage by ep0_queue() (this thread
 may be running on another core), then if the thread issues ep0_queue()
 too fast before we get the dwc->lock in dwc3_ep0_delegate_req() or
 before we get into the STATUS stage, then we can not handle this
 request for the delay STATUS stage in dwc3_gadget_ep0_queue().

>
> Another point here is that the really robust way of fixing this is to
> get rid of USB_GADGET_DELAYED_STATUS altogether and just make sure
> gadget drivers know how to queue requests for all three phases of a
> Control Transfer.
>
> A lot of code will be removed from all gadget drivers and UDC drivers
> while combining all of it in a single place in composite.c.
>
> The reason I'm saying this is that other UDC drivers might have similar
> races already but they just haven't triggered.

 Yes, maybe.

>
>> STATUS phase has been queued into list before we set 
>> 'dwc->delayed_status'
>> flag or entering 'EP0_STATUS_PHASE' phase, then we will miss the chance
>> to handle the STATUS phase. Thus we should check if the request for delay
>> STATUS phase has been enqueued when entering 'EP0_STATUS_PHASE' phase in
>> dwc3_ep0_xfernotready(), if so, we should handle it.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>>  drivers/usb/dwc3/ep0.c |   14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index 9bb1f85..e689ced 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -1123,7 +1123,21 @@ static void dwc3_ep0_xfernotready(struct dwc3 
>> *dwc,
>>   dwc->ep0state = EP0_STATUS_PHASE;
>>
>>   if (dwc->delayed_status) {
>> + struct dwc3_ep *dep = dwc->eps[0];
>> +
>>   WARN_ON_ONCE(event->endpoint_number != 1);
>> + /*
>> +  * We should handle the delay STATUS phase here if 
>> the
>> +  * request for handling delay STATUS has been 
>> queued
>> +  * into the list.
>> +  */
>> + if (!list_empty(>pending_list)) {
>> + dwc->delayed_status = false;
>> + usb_gadget_set_state(>gadget,
>> +  USB_STATE_CONFIGURED);
>
> Isn't this patch also changing the normal case when usb_ep_queue() comes
> later? I guess list_empty() protects against that...

 I think it will not change other cases, we only handle the delayed
 status and I've tested it for a while and I did not find any problem.
>>>
>>> Alright, it's important enough to fix this bug. Can you also have a look
>>> into dropping USB_GADGET_DELAYED_STATUS altogether? If you're too busy,
>>> no issues. It'll stay in my queue.
>>
>> Okay, I will have a look at f_mass_storage driver to see if we can
>> drop USB_GADGET_DELAYED_STATUS. Thanks.
>
> not only mass storage. It needs to be done for all drivers. The way to
> do that is to teach functions that control transfers are composed of two
> or three phases. If you look at UDC drivers today, they all have
> peculiarities about control transfers to handle stuff that *maybe*
> gadget 

Re: [PATCH 3/3] Reverted "mm: bail out in shrink_inactive_list()"

2017-01-16 Thread Minchan Kim
On Mon, Jan 16, 2017 at 08:33:17PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> This reverts 91dcade47a3d0e7c31464ef05f56c08e92a0e9c2.
> inactive_reclaimable_pages shouldn't be needed anymore since that
> get_scan_count is aware of the eligble zones ("mm, vmscan: consider
> eligible zones in get_scan_count").
> 
> Signed-off-by: Michal Hocko 
Acked-by: Minchan Kim 


Re: [PATCH 3/3] Reverted "mm: bail out in shrink_inactive_list()"

2017-01-16 Thread Minchan Kim
On Mon, Jan 16, 2017 at 08:33:17PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> This reverts 91dcade47a3d0e7c31464ef05f56c08e92a0e9c2.
> inactive_reclaimable_pages shouldn't be needed anymore since that
> get_scan_count is aware of the eligble zones ("mm, vmscan: consider
> eligible zones in get_scan_count").
> 
> Signed-off-by: Michal Hocko 
Acked-by: Minchan Kim 


Re: [PATCH 1/3] mm, vmscan: cleanup lru size claculations

2017-01-16 Thread Minchan Kim
On Mon, Jan 16, 2017 at 08:33:15PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> lruvec_lru_size returns the full size of the LRU list while we sometimes
> need a value reduced only to eligible zones (e.g. for lowmem requests).
> inactive_list_is_low is one such user. Later patches will add more of
> them. Add a new parameter to lruvec_lru_size and allow it filter out
> zones which are not eligible for the given context.
> 
> Acked-by: Johannes Weiner 
> Signed-off-by: Michal Hocko 
> ---
>  include/linux/mmzone.h |  2 +-
>  mm/vmscan.c| 88 
> --
>  mm/workingset.c|  2 +-
>  3 files changed, 45 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d1d440cff60e..91f69aa0d581 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -780,7 +780,7 @@ static inline struct pglist_data *lruvec_pgdat(struct 
> lruvec *lruvec)
>  #endif
>  }
>  
> -extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list 
> lru);
> +extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list 
> lru, int zone_idx);
>  
>  #ifdef CONFIG_HAVE_MEMORY_PRESENT
>  void memory_present(int nid, unsigned long start, unsigned long end);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf940af609fd..1cb0ebdef305 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -234,22 +234,38 @@ bool pgdat_reclaimable(struct pglist_data *pgdat)
>   pgdat_reclaimable_pages(pgdat) * 6;
>  }
>  
> -unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
> +/** lruvec_lru_size -  Returns the number of pages on the given LRU list.

minor:

/*
 * lruvec_lru_size

I don't have any preferance but just found.

Otherwise,
Acked-by: Minchan Kim 




Re: [PATCH 1/3] mm, vmscan: cleanup lru size claculations

2017-01-16 Thread Minchan Kim
On Mon, Jan 16, 2017 at 08:33:15PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> lruvec_lru_size returns the full size of the LRU list while we sometimes
> need a value reduced only to eligible zones (e.g. for lowmem requests).
> inactive_list_is_low is one such user. Later patches will add more of
> them. Add a new parameter to lruvec_lru_size and allow it filter out
> zones which are not eligible for the given context.
> 
> Acked-by: Johannes Weiner 
> Signed-off-by: Michal Hocko 
> ---
>  include/linux/mmzone.h |  2 +-
>  mm/vmscan.c| 88 
> --
>  mm/workingset.c|  2 +-
>  3 files changed, 45 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d1d440cff60e..91f69aa0d581 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -780,7 +780,7 @@ static inline struct pglist_data *lruvec_pgdat(struct 
> lruvec *lruvec)
>  #endif
>  }
>  
> -extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list 
> lru);
> +extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list 
> lru, int zone_idx);
>  
>  #ifdef CONFIG_HAVE_MEMORY_PRESENT
>  void memory_present(int nid, unsigned long start, unsigned long end);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf940af609fd..1cb0ebdef305 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -234,22 +234,38 @@ bool pgdat_reclaimable(struct pglist_data *pgdat)
>   pgdat_reclaimable_pages(pgdat) * 6;
>  }
>  
> -unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru)
> +/** lruvec_lru_size -  Returns the number of pages on the given LRU list.

minor:

/*
 * lruvec_lru_size

I don't have any preferance but just found.

Otherwise,
Acked-by: Minchan Kim 




[PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for
these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, , 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, _bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, , 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+* For instance:
+* chmod_common()
+*  notify_change()
+*   ocfs2_setattr()
+ 

[PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(>l_event);
INIT_LIST_HEAD(>l_blocked_list);
INIT_LIST_HEAD(>l_mask_waiters);
+   INIT_LIST_HEAD(>l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(>oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(>l_lock);
+   list_add_tail(>oh_list, >l_holders);
+   spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(>l_lock);
+   list_del(>oh_list);
+   spin_unlock(>l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(>l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, >l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid) 

[PATCH v3 0/2] fix deadlock caused by recursive cluster locking

2017-01-16 Thread Eric Ren
This is a formal patch set v2 to solve the deadlock issue on which I
previously started a RFC (draft patch), and the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simple and neat.
It neither messes up the dlmglue core, nor has a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
 
Your comments and feedbacks are always welcomed.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2



[PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Eric Ren
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
results in a deadlock, as the author "Tariq Saeed" realized shortly
after the patch was merged. The discussion happened here
(https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).

The reason why taking cluster inode lock at vfs entry points opens up
a self deadlock window, is explained in the previous patch of this
series.

So far, we have seen two different code paths that have this issue.
1. do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== take PR
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== take PR
2. fchmod|fchmodat
chmod_common
 notify_change
  ocfs2_setattr <=== take EX
   posix_acl_chmod
get_acl
 ocfs2_iop_get_acl <=== take PR
ocfs2_iop_set_acl <=== take EX

Fixes them by adding the tracking logic (in the previous patch) for
these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
ocfs2_setattr().

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi.

Changes since v2:
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.

Signed-off-by: Eric Ren 
---
 fs/ocfs2/acl.c  | 29 +
 fs/ocfs2/file.c | 58 -
 2 files changed, 58 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index bed1fcb..dc22ba8 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
struct buffer_head *bh = NULL;
-   int status = 0;
+   int status, had_lock;
+   struct ocfs2_lock_holder oh;
 
-   status = ocfs2_inode_lock(inode, , 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
-   return status;
-   }
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0)
+   return had_lock;
status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
-   ocfs2_inode_unlock(inode, 1);
+   ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
brelse(bh);
return status;
 }
@@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, 
int type)
struct ocfs2_super *osb;
struct buffer_head *di_bh = NULL;
struct posix_acl *acl;
-   int ret;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
osb = OCFS2_SB(inode->i_sb);
if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
return NULL;
-   ret = ocfs2_inode_lock(inode, _bh, 0);
-   if (ret < 0) {
-   if (ret != -ENOENT)
-   mlog_errno(ret);
-   return ERR_PTR(ret);
-   }
+
+   had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
+   if (had_lock < 0)
+   return ERR_PTR(had_lock);
 
acl = ocfs2_get_acl_nolock(inode, type, di_bh);
 
-   ocfs2_inode_unlock(inode, 0);
+   ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
brelse(di_bh);
return acl;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index c488965..7b6a146 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
handle_t *handle = NULL;
struct dquot *transfer_to[MAXQUOTAS] = { };
int qtype;
+   int had_lock;
+   struct ocfs2_lock_holder oh;
 
trace_ocfs2_setattr(inode, dentry,
(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
*attr)
}
}
 
-   status = ocfs2_inode_lock(inode, , 1);
-   if (status < 0) {
-   if (status != -ENOENT)
-   mlog_errno(status);
+   had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
+   if (had_lock < 0) {
+   status = had_lock;
goto bail_unlock_rw;
+   } else if (had_lock) {
+   /*
+* As far as we know, ocfs2_setattr() could only be the first
+* VFS entry point in the call chain of recursive cluster
+* locking issue.
+*
+* For instance:
+* chmod_common()
+*  notify_change()
+*   ocfs2_setattr()
+*posix_acl_chmod()
+* 

[PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Eric Ren
We are in the situation that we have to avoid recursive cluster locking,
but there is no way to check if a cluster lock has been taken by a
precess already.

Mostly, we can avoid recursive locking by writing code carefully.
However, we found that it's very hard to handle the routines that
are invoked directly by vfs code. For instance:

const struct inode_operations ocfs2_file_iops = {
.permission = ocfs2_permission,
.get_acl= ocfs2_iop_get_acl,
.set_acl= ocfs2_iop_set_acl,
};

Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
do_sys_open
 may_open
  inode_permission
   ocfs2_permission
ocfs2_inode_lock() <=== first time
 generic_permission
  get_acl
   ocfs2_iop_get_acl
ocfs2_inode_lock() <=== recursive one

A deadlock will occur if a remote EX request comes in between two
of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:

On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
BAST(ocfs2_generic_handle_bast) when downconvert is started
on behalf of the remote EX lock request. Another hand, the recursive
cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
because there is no chance for the first cluster lock on this node to be
unlocked - we block ourselves in the code path.

The idea to fix this issue is mostly taken from gfs2 code.
1. introduce a new field: struct ocfs2_lock_res.l_holders, to
keep track of the processes' pid  who has taken the cluster lock
of this lock resource;
2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
it means just getting back disk inode bh for us if we've got cluster lock.
3. export a helper: ocfs2_is_locked_by_me() is used to check if we
have got the cluster lock in the upper code path.

The tracking logic should be used by some of the ocfs2 vfs's callbacks,
to solve the recursive locking issue cuased by the fact that vfs routines
can call into each other.

The performance penalty of processing the holder list should only be seen
at a few cases where the tracking logic is used, such as get/set acl.

You may ask what if the first time we got a PR lock, and the second time
we want a EX lock? fortunately, this case never happens in the real world,
as far as I can see, including permission check, (get|set)_(acl|attr), and
the gfs2 code also do so.

Changes since v1:
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .

- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi.

- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2:
- Wrap the tracking logic code of recursive locking into functions,
ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
suggested by: Junxiao Bi.

[s...@canb.auug.org.au remove some inlines]
Signed-off-by: Eric Ren 
---
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/ocfs2.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 77d1632..c75b9e9 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
init_waitqueue_head(>l_event);
INIT_LIST_HEAD(>l_blocked_list);
INIT_LIST_HEAD(>l_mask_waiters);
+   INIT_LIST_HEAD(>l_holders);
 }
 
 void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
@@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
res->l_flags = 0UL;
 }
 
+/*
+ * Keep a list of processes who have interest in a lockres.
+ * Note: this is now only uesed for check recursive cluster locking.
+ */
+static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   INIT_LIST_HEAD(>oh_list);
+   oh->oh_owner_pid =  get_pid(task_pid(current));
+
+   spin_lock(>l_lock);
+   list_add_tail(>oh_list, >l_holders);
+   spin_unlock(>l_lock);
+}
+
+static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
+  struct ocfs2_lock_holder *oh)
+{
+   spin_lock(>l_lock);
+   list_del(>oh_list);
+   spin_unlock(>l_lock);
+
+   put_pid(oh->oh_owner_pid);
+}
+
+static inline int ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres)
+{
+   struct ocfs2_lock_holder *oh;
+   struct pid *pid;
+
+   /* look in the list of holders for one with the current task as owner */
+   spin_lock(>l_lock);
+   pid = task_pid(current);
+   list_for_each_entry(oh, >l_holders, oh_list) {
+   if (oh->oh_owner_pid == pid) {
+   spin_unlock(>l_lock);
+   return 1;

[PATCH v3 0/2] fix deadlock caused by recursive cluster locking

2017-01-16 Thread Eric Ren
This is a formal patch set v2 to solve the deadlock issue on which I
previously started a RFC (draft patch), and the discussion happened here:
[https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012455.html]

Compared to the previous draft patch, this one is much simple and neat.
It neither messes up the dlmglue core, nor has a performance penalty on
the whole cluster locking system. Instead, it is only used in places where
such recursive cluster locking may happen.
 
Changes since v1: 
- Let ocfs2_is_locked_by_me() just return true/false to indicate if the
process gets the cluster lock - suggested by: Joseph Qi 
and Junxiao Bi .
 
- Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
suggested by: Junxiao Bi. 
 
- Add debugging output at ocfs2_setattr() and ocfs2_permission() to
catch exceptional cases, suggested by: Junxiao Bi. 
 
- Do not inline functions whose bodies are not in scope, changed by:
Stephen Rothwell .

Changes since v2: 
- Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
 
Your comments and feedbacks are always welcomed.

Eric Ren (2):
  ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
  ocfs2: fix deadlock issue when taking inode lock at vfs entry points

 fs/ocfs2/acl.c |  29 +++
 fs/ocfs2/dlmglue.c | 105 +++--
 fs/ocfs2/dlmglue.h |  18 +
 fs/ocfs2/file.c|  58 ++---
 fs/ocfs2/ocfs2.h   |   1 +
 5 files changed, 179 insertions(+), 32 deletions(-)

-- 
2.10.2



Re: + mm-vmscan-add-mm_vmscan_inactive_list_is_low-tracepoint.patch added to -mm tree

2017-01-16 Thread Minchan Kim
Hello,

On Fri, Jan 13, 2017 at 10:10:09AM +0100, Michal Hocko wrote:
> On Fri 13-01-17 17:57:34, Minchan Kim wrote:
> > On Fri, Jan 13, 2017 at 08:47:07AM +0100, Michal Hocko wrote:
> > > On Fri 13-01-17 10:37:24, Minchan Kim wrote:
> > > > Hello,
> > > > 
> > > > On Thu, Jan 12, 2017 at 10:10:17AM +0100, Michal Hocko wrote:
> > > > > On Thu 12-01-17 17:48:13, Minchan Kim wrote:
> > > > > > On Thu, Jan 12, 2017 at 09:15:54AM +0100, Michal Hocko wrote:
> > > > > > > On Thu 12-01-17 14:12:47, Minchan Kim wrote:
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > On Wed, Jan 11, 2017 at 04:52:39PM +0100, Michal Hocko wrote:
> > > > > > > > > On Wed 11-01-17 08:52:50, Minchan Kim wrote:
> > > > > > > > > [...]
> > > > > > > > > > > @@ -2055,8 +2055,8 @@ static bool 
> > > > > > > > > > > inactive_list_is_low(struct
> > > > > > > > > > >   if (!file && !total_swap_pages)
> > > > > > > > > > >   return false;
> > > > > > > > > > >  
> > > > > > > > > > > - inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
> > > > > > > > > > > - active = lruvec_lru_size(lruvec, file * LRU_FILE + 
> > > > > > > > > > > LRU_ACTIVE);
> > > > > > > > > > > + total_inactive = inactive = lruvec_lru_size(lruvec, 
> > > > > > > > > > > file * LRU_FILE);
> > > > > > > > > > > + total_active = active = lruvec_lru_size(lruvec, file * 
> > > > > > > > > > > LRU_FILE + LRU_ACTIVE);
> > > > > > > > > > >  
> > > > > > > > > > 
> > > > > > > > > > the decision of deactivating is based on eligible zone's 
> > > > > > > > > > LRU size,
> > > > > > > > > > not whole zone so why should we need to get a trace of all 
> > > > > > > > > > zones's LRU?
> > > > > > > > > 
> > > > > > > > > Strictly speaking, the total_ counters are not necessary for 
> > > > > > > > > making the
> > > > > > > > > decision. I found reporting those numbers useful regardless 
> > > > > > > > > because this
> > > > > > > > > will give us also an information how large is the eligible 
> > > > > > > > > portion of
> > > > > > > > > the LRU list. We do not have any other tracepoint which would 
> > > > > > > > > report
> > > > > > > > > that.
> > > > > > > > 
> > > > > > > > The patch doesn't say anything why it's useful. Could you tell 
> > > > > > > > why it's
> > > > > > > > useful and inactive_list_is_low should be right place?
> > > > > > > > 
> > > > > > > > Don't get me wrong, please. I don't want to bother you.
> > > > > > > > I really don't want to add random stuff although it's 
> > > > > > > > tracepoint for
> > > > > > > > debugging.
> > > > > > > 
> > > > > > > This doesn't sounds random to me. We simply do not have a full 
> > > > > > > picture
> > > > > > > on 32b systems without this information. Especially when memcgs 
> > > > > > > are
> > > > > > > involved and global numbers spread over different LRUs.
> > > > > > 
> > > > > > Could you elaborate it?
> > > > > 
> > > > > The problem with 32b systems is that you only can consider a part of 
> > > > > the
> > > > > LRU for the lowmem requests. While we have global counters to see how
> > > > > much lowmem inactive/active pages we have, those get distributed to
> > > > > memcg LRUs. And that distribution is impossible to guess. So my 
> > > > > thinking
> > > > > is that it can become a real head scratcher to realize why certain
> > > > > active LRUs are aged while others are not. This was the case when I 
> > > > > was
> > > > > debugging the last issue which triggered all this. All of the sudden I
> > > > > have seen many invocations when inactive and active were zero which
> > > > > sounded weird, until I realized that those are memcg's lruvec which is
> > > > > what total numbers told me...
> > > > 
> > > > Hmm, it seems I miss something. AFAIU, what you need is just memcg
> > > > identifier, not all lru size. If it isn't, please tell more detail
> > > > usecase of all lru size in that particular tracepoint.
> > > 
> > > Having memcg id would be definitely helpful but that alone wouldn't tell
> > > us how is the lowmem distributed. To be honest I really fail to see why
> > > this bothers you all that much.
> > 
> > Because I fail to understand why you want to need additional all zone's
> > LRU stat in inactive_list_is_low. With clear understanding, we can think
> > over that it's really needed and right place to achieve the goal.
> > 
> > Could you say with a example you can think? It's really helpful to
> > understand why it's needed.
> 
> OK, I feel I am repeating myself but let me try again. Without the
> total_ numbers we really do not know how is the lowmem distributed over
> lruvecs. There is simply no way to get this information from existing
> counters if memcg is enabled.

I can't understand clearly why you need to know distribution.
Anyway, if we need it, why do you think such particular inactive_list_is_low
is right place?

Actually, IMO, there is no need to insert any tracepoint in 
inactive_list_is_low.
Instead, if we add tracepint in get_scan_count to record each LRU list 

Re: + mm-vmscan-add-mm_vmscan_inactive_list_is_low-tracepoint.patch added to -mm tree

2017-01-16 Thread Minchan Kim
Hello,

On Fri, Jan 13, 2017 at 10:10:09AM +0100, Michal Hocko wrote:
> On Fri 13-01-17 17:57:34, Minchan Kim wrote:
> > On Fri, Jan 13, 2017 at 08:47:07AM +0100, Michal Hocko wrote:
> > > On Fri 13-01-17 10:37:24, Minchan Kim wrote:
> > > > Hello,
> > > > 
> > > > On Thu, Jan 12, 2017 at 10:10:17AM +0100, Michal Hocko wrote:
> > > > > On Thu 12-01-17 17:48:13, Minchan Kim wrote:
> > > > > > On Thu, Jan 12, 2017 at 09:15:54AM +0100, Michal Hocko wrote:
> > > > > > > On Thu 12-01-17 14:12:47, Minchan Kim wrote:
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > On Wed, Jan 11, 2017 at 04:52:39PM +0100, Michal Hocko wrote:
> > > > > > > > > On Wed 11-01-17 08:52:50, Minchan Kim wrote:
> > > > > > > > > [...]
> > > > > > > > > > > @@ -2055,8 +2055,8 @@ static bool 
> > > > > > > > > > > inactive_list_is_low(struct
> > > > > > > > > > >   if (!file && !total_swap_pages)
> > > > > > > > > > >   return false;
> > > > > > > > > > >  
> > > > > > > > > > > - inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
> > > > > > > > > > > - active = lruvec_lru_size(lruvec, file * LRU_FILE + 
> > > > > > > > > > > LRU_ACTIVE);
> > > > > > > > > > > + total_inactive = inactive = lruvec_lru_size(lruvec, 
> > > > > > > > > > > file * LRU_FILE);
> > > > > > > > > > > + total_active = active = lruvec_lru_size(lruvec, file * 
> > > > > > > > > > > LRU_FILE + LRU_ACTIVE);
> > > > > > > > > > >  
> > > > > > > > > > 
> > > > > > > > > > the decision of deactivating is based on eligible zone's 
> > > > > > > > > > LRU size,
> > > > > > > > > > not whole zone so why should we need to get a trace of all 
> > > > > > > > > > zones's LRU?
> > > > > > > > > 
> > > > > > > > > Strictly speaking, the total_ counters are not necessary for 
> > > > > > > > > making the
> > > > > > > > > decision. I found reporting those numbers useful regardless 
> > > > > > > > > because this
> > > > > > > > > will give us also an information how large is the eligible 
> > > > > > > > > portion of
> > > > > > > > > the LRU list. We do not have any other tracepoint which would 
> > > > > > > > > report
> > > > > > > > > that.
> > > > > > > > 
> > > > > > > > The patch doesn't say anything why it's useful. Could you tell 
> > > > > > > > why it's
> > > > > > > > useful and inactive_list_is_low should be right place?
> > > > > > > > 
> > > > > > > > Don't get me wrong, please. I don't want to bother you.
> > > > > > > > I really don't want to add random stuff although it's 
> > > > > > > > tracepoint for
> > > > > > > > debugging.
> > > > > > > 
> > > > > > > This doesn't sounds random to me. We simply do not have a full 
> > > > > > > picture
> > > > > > > on 32b systems without this information. Especially when memcgs 
> > > > > > > are
> > > > > > > involved and global numbers spread over different LRUs.
> > > > > > 
> > > > > > Could you elaborate it?
> > > > > 
> > > > > The problem with 32b systems is that you only can consider a part of 
> > > > > the
> > > > > LRU for the lowmem requests. While we have global counters to see how
> > > > > much lowmem inactive/active pages we have, those get distributed to
> > > > > memcg LRUs. And that distribution is impossible to guess. So my 
> > > > > thinking
> > > > > is that it can become a real head scratcher to realize why certain
> > > > > active LRUs are aged while others are not. This was the case when I 
> > > > > was
> > > > > debugging the last issue which triggered all this. All of the sudden I
> > > > > have seen many invocations when inactive and active were zero which
> > > > > sounded weird, until I realized that those are memcg's lruvec which is
> > > > > what total numbers told me...
> > > > 
> > > > Hmm, it seems I miss something. AFAIU, what you need is just memcg
> > > > identifier, not all lru size. If it isn't, please tell more detail
> > > > usecase of all lru size in that particular tracepoint.
> > > 
> > > Having memcg id would be definitely helpful but that alone wouldn't tell
> > > us how is the lowmem distributed. To be honest I really fail to see why
> > > this bothers you all that much.
> > 
> > Because I fail to understand why you want to need additional all zone's
> > LRU stat in inactive_list_is_low. With clear understanding, we can think
> > over that it's really needed and right place to achieve the goal.
> > 
> > Could you say with a example you can think? It's really helpful to
> > understand why it's needed.
> 
> OK, I feel I am repeating myself but let me try again. Without the
> total_ numbers we really do not know how is the lowmem distributed over
> lruvecs. There is simply no way to get this information from existing
> counters if memcg is enabled.

I can't understand clearly why you need to know distribution.
Anyway, if we need it, why do you think such particular inactive_list_is_low
is right place?

Actually, IMO, there is no need to insert any tracepoint in 
inactive_list_is_low.
Instead, if we add tracepint in get_scan_count to record each LRU list 

Re: [PATCH v4 1/3] dt: bindings: add documentation for zx2967 family thermal sensor

2017-01-16 Thread Eduardo Valentin
On Tue, Jan 17, 2017 at 01:38:48PM +0800, Baoyou Xie wrote:
> On 17 January 2017 at 13:27, Eduardo Valentin  wrote:

> > It's a good idea:) In fact, we defined some thermal zones and used IPA
> on products.
> 

Cool! Any feedback on using IPA?


Re: [PATCH v4 1/3] dt: bindings: add documentation for zx2967 family thermal sensor

2017-01-16 Thread Eduardo Valentin
On Tue, Jan 17, 2017 at 01:38:48PM +0800, Baoyou Xie wrote:
> On 17 January 2017 at 13:27, Eduardo Valentin  wrote:

> > It's a good idea:) In fact, we defined some thermal zones and used IPA
> on products.
> 

Cool! Any feedback on using IPA?


[PATCH] ahci: qoriq: added ls2088a platforms support

2017-01-16 Thread yuantian.tang
From: Tang Yuantian 

Ls2088a is new introduced arm-based soc with sata support with
following features:
1. Complies with the serial ATA 3.0 specification and the AHCI 1.3.1
   specification
2. Contains a high-speed descriptor-based DMA controller
3. Supports the following:
   a. Speeds of 1.5 Gb/s (first-generation SATA), 3 Gb/s
  (second-generation SATA), and 6 Gb/s (third-generation SATA)
   b. FIS-based switching
   c. Native command queuing (NCQ) commands
   d. Port multiplier operation
   e. Asynchronous notification
   f. SATA BIST mode

Signed-off-by: Tang Yuantian 
---
 Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt | 2 +-
 drivers/ata/ahci_qoriq.c | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt 
b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
index fc33ca0..ed87c6f 100644
--- a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
+++ b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
@@ -3,7 +3,7 @@ Binding for Freescale QorIQ AHCI SATA Controller
 Required properties:
   - reg: Physical base address and size of the controller's register area.
   - compatible: Compatibility string. Must be 'fsl,-ahci', where
-chip could be ls1021a, ls1043a, ls1046a, ls2080a etc.
+chip could be ls1021a, ls1043a, ls1046a, ls2080a, ls2088a etc.
   - clocks: Input clock specifier. Refer to common clock bindings.
   - interrupts: Interrupt specifier. Refer to interrupt binding.
 
diff --git a/drivers/ata/ahci_qoriq.c b/drivers/ata/ahci_qoriq.c
index 66eb4b5..912fe32 100644
--- a/drivers/ata/ahci_qoriq.c
+++ b/drivers/ata/ahci_qoriq.c
@@ -53,6 +53,7 @@ enum ahci_qoriq_type {
AHCI_LS1043A,
AHCI_LS2080A,
AHCI_LS1046A,
+   AHCI_LS2088A,
 };
 
 struct ahci_qoriq_priv {
@@ -67,6 +68,7 @@ static const struct of_device_id ahci_qoriq_of_match[] = {
{ .compatible = "fsl,ls1043a-ahci", .data = (void *)AHCI_LS1043A},
{ .compatible = "fsl,ls2080a-ahci", .data = (void *)AHCI_LS2080A},
{ .compatible = "fsl,ls1046a-ahci", .data = (void *)AHCI_LS1046A},
+   { .compatible = "fsl,ls2088a-ahci", .data = (void *)AHCI_LS2088A},
{},
 };
 MODULE_DEVICE_TABLE(of, ahci_qoriq_of_match);
@@ -193,6 +195,13 @@ static int ahci_qoriq_phy_init(struct ahci_host_priv 
*hpriv)
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
break;
+
+   case AHCI_LS2088A:
+   writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+   writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
+   if (qpriv->is_dmacoherent)
+   writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
+   break;
}
 
return 0;
-- 
2.1.0.27.g96db324



[PATCH] ahci: qoriq: added ls2088a platforms support

2017-01-16 Thread yuantian.tang
From: Tang Yuantian 

Ls2088a is new introduced arm-based soc with sata support with
following features:
1. Complies with the serial ATA 3.0 specification and the AHCI 1.3.1
   specification
2. Contains a high-speed descriptor-based DMA controller
3. Supports the following:
   a. Speeds of 1.5 Gb/s (first-generation SATA), 3 Gb/s
  (second-generation SATA), and 6 Gb/s (third-generation SATA)
   b. FIS-based switching
   c. Native command queuing (NCQ) commands
   d. Port multiplier operation
   e. Asynchronous notification
   f. SATA BIST mode

Signed-off-by: Tang Yuantian 
---
 Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt | 2 +-
 drivers/ata/ahci_qoriq.c | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt 
b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
index fc33ca0..ed87c6f 100644
--- a/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
+++ b/Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
@@ -3,7 +3,7 @@ Binding for Freescale QorIQ AHCI SATA Controller
 Required properties:
   - reg: Physical base address and size of the controller's register area.
   - compatible: Compatibility string. Must be 'fsl,-ahci', where
-chip could be ls1021a, ls1043a, ls1046a, ls2080a etc.
+chip could be ls1021a, ls1043a, ls1046a, ls2080a, ls2088a etc.
   - clocks: Input clock specifier. Refer to common clock bindings.
   - interrupts: Interrupt specifier. Refer to interrupt binding.
 
diff --git a/drivers/ata/ahci_qoriq.c b/drivers/ata/ahci_qoriq.c
index 66eb4b5..912fe32 100644
--- a/drivers/ata/ahci_qoriq.c
+++ b/drivers/ata/ahci_qoriq.c
@@ -53,6 +53,7 @@ enum ahci_qoriq_type {
AHCI_LS1043A,
AHCI_LS2080A,
AHCI_LS1046A,
+   AHCI_LS2088A,
 };
 
 struct ahci_qoriq_priv {
@@ -67,6 +68,7 @@ static const struct of_device_id ahci_qoriq_of_match[] = {
{ .compatible = "fsl,ls1043a-ahci", .data = (void *)AHCI_LS1043A},
{ .compatible = "fsl,ls2080a-ahci", .data = (void *)AHCI_LS2080A},
{ .compatible = "fsl,ls1046a-ahci", .data = (void *)AHCI_LS1046A},
+   { .compatible = "fsl,ls2088a-ahci", .data = (void *)AHCI_LS2088A},
{},
 };
 MODULE_DEVICE_TABLE(of, ahci_qoriq_of_match);
@@ -193,6 +195,13 @@ static int ahci_qoriq_phy_init(struct ahci_host_priv 
*hpriv)
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
break;
+
+   case AHCI_LS2088A:
+   writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+   writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
+   if (qpriv->is_dmacoherent)
+   writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
+   break;
}
 
return 0;
-- 
2.1.0.27.g96db324



Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Boqun Feng
On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> > > + /*
> > > +  * We assign class_idx here redundantly even though following
> > > +  * memcpy will cover it, in order to ensure a rcu reader can
> > > +  * access the class_idx atomically without lock.
> > > +  *
> > > +  * Here we assume setting a word-sized variable is atomic.
> > 
> > which one, where?
> 
> I meant xlock_class(xlock) in check_add_plock().
> 
> I was not sure about the following two.
> 
> 1. Is it ordered between following a and b?
>a. memcpy -> list_add_tail_rcu
>b. list_for_each_entry_rcu -> load class_idx (xlock_class)
>I assumed that it's not ordered.
> 2. Does memcpy guarantee atomic store for each word?
>I assumed that it doesn't.
> 
> But I think I was wrong.. The first might be ordered. I will remove
> the following redundant statement. It'd be orderd, right?
> 

Yes, a and b are ordered, IOW, they could be paired, meaning when we
got the item in a list_for_each_entry_rcu() loop, all memory operations
before the corresponding list_add_tail_rcu() should be observed by us.

Regards,
Boqun

> > 
> > > +  */
> > > + xlock->hlock.class_idx = hlock->class_idx;
> > > + gen_id = (unsigned int)atomic_inc_return(_gen_id);
> > > + WRITE_ONCE(xlock->gen_id, gen_id);
> > > + memcpy(>hlock, hlock, sizeof(struct held_lock));
> > > + INIT_LIST_HEAD(>xlock_entry);
> > > + list_add_tail_rcu(>xlock_entry, _head);
> > 


signature.asc
Description: PGP signature


Re: [PATCH v4 07/15] lockdep: Implement crossrelease feature

2017-01-16 Thread Boqun Feng
On Tue, Jan 17, 2017 at 11:33:41AM +0900, Byungchul Park wrote:
> On Mon, Jan 16, 2017 at 04:13:19PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 09, 2016 at 02:12:03PM +0900, Byungchul Park wrote:
> > > + /*
> > > +  * We assign class_idx here redundantly even though following
> > > +  * memcpy will cover it, in order to ensure a rcu reader can
> > > +  * access the class_idx atomically without lock.
> > > +  *
> > > +  * Here we assume setting a word-sized variable is atomic.
> > 
> > which one, where?
> 
> I meant xlock_class(xlock) in check_add_plock().
> 
> I was not sure about the following two.
> 
> 1. Is it ordered between following a and b?
>a. memcpy -> list_add_tail_rcu
>b. list_for_each_entry_rcu -> load class_idx (xlock_class)
>I assumed that it's not ordered.
> 2. Does memcpy guarantee atomic store for each word?
>I assumed that it doesn't.
> 
> But I think I was wrong.. The first might be ordered. I will remove
> the following redundant statement. It'd be orderd, right?
> 

Yes, a and b are ordered, IOW, they could be paired, meaning when we
got the item in a list_for_each_entry_rcu() loop, all memory operations
before the corresponding list_add_tail_rcu() should be observed by us.

Regards,
Boqun

> > 
> > > +  */
> > > + xlock->hlock.class_idx = hlock->class_idx;
> > > + gen_id = (unsigned int)atomic_inc_return(_gen_id);
> > > + WRITE_ONCE(xlock->gen_id, gen_id);
> > > + memcpy(>hlock, hlock, sizeof(struct held_lock));
> > > + INIT_LIST_HEAD(>xlock_entry);
> > > + list_add_tail_rcu(>xlock_entry, _head);
> > 


signature.asc
Description: PGP signature


Re: [GIT PULL] ARM: exynos: Fixes for v4.10

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 04:32:25PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Just a couple of minor fixes for this release cycle.
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-fixes-4.10
> 
> for you to fetch changes up to 3ef01c968fbfb21c2f16281445d30a865ee4412c:
> 
>   ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* (2017-01-02 
> 21:05:34 +0200)
> 
> 
> Samsung fixes for v4.10:
> 1. Update maintainers entry with Patchwork address.
> 2. Fix invalid values for NF_CT_PROTO_* in s3c2410 defconfig (these options
>cannot be modules anymore).
> 
> 
> Krzysztof Kozlowski (2):
>   MAINTAINERS: Add Patchwork URL to Samsung Exynos entry
>   ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*

Merged, thanks.


-Olof



Re: [GIT PULL] ARM: exynos: Fixes for v4.10

2017-01-16 Thread Olof Johansson
On Fri, Jan 13, 2017 at 04:32:25PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> Just a couple of minor fixes for this release cycle.
> 
> Best regards,
> Krzysztof
> 
> 
> The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:
> 
>   Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
> tags/samsung-fixes-4.10
> 
> for you to fetch changes up to 3ef01c968fbfb21c2f16281445d30a865ee4412c:
> 
>   ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_* (2017-01-02 
> 21:05:34 +0200)
> 
> 
> Samsung fixes for v4.10:
> 1. Update maintainers entry with Patchwork address.
> 2. Fix invalid values for NF_CT_PROTO_* in s3c2410 defconfig (these options
>cannot be modules anymore).
> 
> 
> Krzysztof Kozlowski (2):
>   MAINTAINERS: Add Patchwork URL to Samsung Exynos entry
>   ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*

Merged, thanks.


-Olof



[PATCH v2] thermal/intel_powerclamp: Fix sched_setscheduler fail

2017-01-16 Thread xinwu . liu
From: "Liu, Xinwu" 

The schedule policy of thread "kidle_inject" is SCHED_NORMAL:
[  772.796284] intel_powerclamp: Start idle injection to reduce power
[  772.825757] [ cut here ]
[  772.825877] WARNING: CPU: 0 PID: 2140 at 
../../../../../../kernel/glv/kernel/sched/idle.c:298 play_idle+0x16f/0x230
[  772.826096] Modules linked in: ip6table_raw iptable_raw hci_uart bluetooth 
rfkill_gpio cfg80211 imc_ipc(O)
[  772.826194] CPU: 0 PID: 2140 Comm: kidle_inject/0 Tainted: G  IO
4.9.0-ge701680f3a17 #1
[  772.826373]  c90001103d58 813ba085  

[  772.826552]  c90001103d98 810953e1 012a691441c0 
0006
[  772.826731]  0002 88006a39d708 8800691441c0 
e8c0dab8
[  772.826794] Call Trace:
[  772.826903]  [] dump_stack+0x67/0x92
[  772.827007]  [] __warn+0xd1/0xf0
[  772.827124]  [] warn_slowpath_null+0x1d/0x20
[  772.827234]  [] play_idle+0x16f/0x230
[  772.827365]  [] ? __schedule+0x217/0x6f0
[  772.827485]  [] clamp_idle_injection_func+0x6a/0x1f0
[  772.827612]  [] kthread_worker_fn+0xdd/0x200
[  772.827741]  [] ? __kthread_init_worker+0x50/0x50
[  772.827860]  [] kthread+0x102/0x120
[  772.828004]  [] ? kthread_park+0x60/0x60
[  772.828118]  [] ret_from_fork+0x27/0x40
[  772.828228] ---[ end trace bb738d5d79381554 ]---

Normaly, the user thread have not the capability "CAP_SYS_NICE"
(and they shouldn't), it will fail to change the schedule policy.
So, fix it by changing by themselves.

Signed-off-by: Liu, Xinwu 
Signed-off-by: Zhang, Di 
---
 drivers/thermal/intel_powerclamp.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 83e6971..ea2d9cf 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -377,6 +377,7 @@ static bool powerclamp_adjust_controls(unsigned int 
target_ratio,
return set_target_ratio + guard <= current_ratio;
 }
 
+static DEFINE_PER_CPU(int, setscheduler_done);
 static void clamp_balancing_func(struct kthread_work *work)
 {
struct powerclamp_worker_data *w_data;
@@ -388,6 +389,10 @@ static void clamp_balancing_func(struct kthread_work *work)
w_data = container_of(work, struct powerclamp_worker_data,
  balancing_work);
 
+   if (unlikely(this_cpu_read(setscheduler_done) == 0)) {
+   sched_setscheduler(current, SCHED_FIFO, );
+   this_cpu_inc(setscheduler_done);
+   }
/*
 * make sure user selected ratio does not take effect until
 * the next round. adjust target_ratio if user has changed
@@ -507,7 +512,6 @@ static void start_power_clamp_worker(unsigned long cpu)
w_data->cpu = cpu;
w_data->clamping = true;
set_bit(cpu, cpu_clamping_mask);
-   sched_setscheduler(worker->task, SCHED_FIFO, );
kthread_init_work(_data->balancing_work, clamp_balancing_func);
kthread_init_delayed_work(_data->idle_injection_work,
  clamp_idle_injection_func);
-- 
1.9.1



[PATCH v2] thermal/intel_powerclamp: Fix sched_setscheduler fail

2017-01-16 Thread xinwu . liu
From: "Liu, Xinwu" 

The schedule policy of thread "kidle_inject" is SCHED_NORMAL:
[  772.796284] intel_powerclamp: Start idle injection to reduce power
[  772.825757] [ cut here ]
[  772.825877] WARNING: CPU: 0 PID: 2140 at 
../../../../../../kernel/glv/kernel/sched/idle.c:298 play_idle+0x16f/0x230
[  772.826096] Modules linked in: ip6table_raw iptable_raw hci_uart bluetooth 
rfkill_gpio cfg80211 imc_ipc(O)
[  772.826194] CPU: 0 PID: 2140 Comm: kidle_inject/0 Tainted: G  IO
4.9.0-ge701680f3a17 #1
[  772.826373]  c90001103d58 813ba085  

[  772.826552]  c90001103d98 810953e1 012a691441c0 
0006
[  772.826731]  0002 88006a39d708 8800691441c0 
e8c0dab8
[  772.826794] Call Trace:
[  772.826903]  [] dump_stack+0x67/0x92
[  772.827007]  [] __warn+0xd1/0xf0
[  772.827124]  [] warn_slowpath_null+0x1d/0x20
[  772.827234]  [] play_idle+0x16f/0x230
[  772.827365]  [] ? __schedule+0x217/0x6f0
[  772.827485]  [] clamp_idle_injection_func+0x6a/0x1f0
[  772.827612]  [] kthread_worker_fn+0xdd/0x200
[  772.827741]  [] ? __kthread_init_worker+0x50/0x50
[  772.827860]  [] kthread+0x102/0x120
[  772.828004]  [] ? kthread_park+0x60/0x60
[  772.828118]  [] ret_from_fork+0x27/0x40
[  772.828228] ---[ end trace bb738d5d79381554 ]---

Normaly, the user thread have not the capability "CAP_SYS_NICE"
(and they shouldn't), it will fail to change the schedule policy.
So, fix it by changing by themselves.

Signed-off-by: Liu, Xinwu 
Signed-off-by: Zhang, Di 
---
 drivers/thermal/intel_powerclamp.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel_powerclamp.c 
b/drivers/thermal/intel_powerclamp.c
index 83e6971..ea2d9cf 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -377,6 +377,7 @@ static bool powerclamp_adjust_controls(unsigned int 
target_ratio,
return set_target_ratio + guard <= current_ratio;
 }
 
+static DEFINE_PER_CPU(int, setscheduler_done);
 static void clamp_balancing_func(struct kthread_work *work)
 {
struct powerclamp_worker_data *w_data;
@@ -388,6 +389,10 @@ static void clamp_balancing_func(struct kthread_work *work)
w_data = container_of(work, struct powerclamp_worker_data,
  balancing_work);
 
+   if (unlikely(this_cpu_read(setscheduler_done) == 0)) {
+   sched_setscheduler(current, SCHED_FIFO, );
+   this_cpu_inc(setscheduler_done);
+   }
/*
 * make sure user selected ratio does not take effect until
 * the next round. adjust target_ratio if user has changed
@@ -507,7 +512,6 @@ static void start_power_clamp_worker(unsigned long cpu)
w_data->cpu = cpu;
w_data->clamping = true;
set_bit(cpu, cpu_clamping_mask);
-   sched_setscheduler(worker->task, SCHED_FIFO, );
kthread_init_work(_data->balancing_work, clamp_balancing_func);
kthread_init_delayed_work(_data->idle_injection_work,
  clamp_idle_injection_func);
-- 
1.9.1



Re: [PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node

2017-01-16 Thread Chanwoo Choi
Hi,

I tested this patch on v6[1] and replied it.
But, this version is missing the my tested-by and reviewed-by tag.
[1] https://patchwork.kernel.org/patch/9504139/

So, I add my reviewed-by and tested-by tag again.

Reviewed-by: Chanwoo Choi 
Tested-by: Chanwoo Choi 


On 2017년 01월 17일 14:54, Jaechul Lee wrote:
> Add DT node support for TM2 touchkey device.
> 
> Signed-off-by: Beomho Seo 
> Signed-off-by: Jaechul Lee 
> Signed-off-by: Andi Shyti 
> Reviewed-by: Javier Martinez Canillas 
> ---
>  arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts 
> b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> index 2449266..ddba2f8 100644
> --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> @@ -18,6 +18,19 @@
>   compatible = "samsung,tm2", "samsung,exynos5433";
>  };
>  
> +_9 {
> + status = "okay";
> +
> + touchkey@20 {
> + compatible = "cypress,tm2-touchkey";
> + reg = <0x20>;
> + interrupt-parent = <>;
> + interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
> + vcc-supply = <_reg>;
> + vdd-supply = <_reg>;
> + };
> +};
> +
>  _reg {
>   regulator-name = "TSP_VDD_1.85V_AP";
>   regulator-min-microvolt = <185>;
> 


-- 
Best Regards,
Chanwoo Choi
S/W Center, Samsung Electronics


Re: [PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node

2017-01-16 Thread Chanwoo Choi
Hi,

I tested this patch on v6[1] and replied it.
But, this version is missing the my tested-by and reviewed-by tag.
[1] https://patchwork.kernel.org/patch/9504139/

So, I add my reviewed-by and tested-by tag again.

Reviewed-by: Chanwoo Choi 
Tested-by: Chanwoo Choi 


On 2017년 01월 17일 14:54, Jaechul Lee wrote:
> Add DT node support for TM2 touchkey device.
> 
> Signed-off-by: Beomho Seo 
> Signed-off-by: Jaechul Lee 
> Signed-off-by: Andi Shyti 
> Reviewed-by: Javier Martinez Canillas 
> ---
>  arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts 
> b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> index 2449266..ddba2f8 100644
> --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
> @@ -18,6 +18,19 @@
>   compatible = "samsung,tm2", "samsung,exynos5433";
>  };
>  
> +_9 {
> + status = "okay";
> +
> + touchkey@20 {
> + compatible = "cypress,tm2-touchkey";
> + reg = <0x20>;
> + interrupt-parent = <>;
> + interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
> + vcc-supply = <_reg>;
> + vdd-supply = <_reg>;
> + };
> +};
> +
>  _reg {
>   regulator-name = "TSP_VDD_1.85V_AP";
>   regulator-min-microvolt = <185>;
> 


-- 
Best Regards,
Chanwoo Choi
S/W Center, Samsung Electronics


Re: [PATCH v4] ARM64: dts: meson-gx: Add reserved memory zone and usable memory range

2017-01-16 Thread Olof Johansson
On Mon, Jan 16, 2017 at 2:39 AM, Neil Armstrong  wrote:
> On 01/15/2017 03:43 PM, Andreas Färber wrote:
>> Am 13.01.2017 um 21:03 schrieb Kevin Hilman:
>>> Neil Armstrong  writes:
>>>
 The Amlogic Meson GXBB/GXL/GXM secure monitor uses part of the memory 
 space,
 this patch adds this reserved zone and redefines the usable memory range.

 The memory node is also moved from the dtsi files into the proper dts files
 to handle variants memory sizes.

 This patch also fixes the memory sizes for the following platforms :
 - gxl-s905x-p212 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxm-s912-q201 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxl-s905d-p231 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxl-nexbox-a95x : 1GiB instead of 2GiB, a proper 2GiB dts should be 
 pushed

 Signed-off-by: Neil Armstrong 
>>>
>>> Queued for v4.10-rc.
>>
>> What is the motivation for this change? I have a local U-Boot patch to
>> detect the amount of memory available as done downstream, but U-Boot
>> only updates the reg property that you seem to be abandoning here...
>>
>> So for devices that come in multiple RAM configurations - like R-Box Pro
>> - this would require separate .dts files now! This looks very wrong to
>> me, especially since I am not aware of other platforms doing the same.
>> Instead, there's memory reservations for top and bottom done in U-Boot
>> for reg, plus reserved-memory nodes for anything in the middle.
>>
>> Another thing to consider is that uEFI boot (bootefi) handles memory
>> reservation differently yet again, on the bootloader level. I have had
>> that working fine on Odroid-C2 and Vega S95.
>>
>> So if there's no bug this is fixing (none mentioned in commit message) I
>> strongly object to this patch.
>>
>> Regards,
>> Andreas
>>
>
> Hi Andreas,
>
> Like I replied of my RFT patch :
> I really disagree about relying on any work or properties added by any 
> bootloader here, Amlogic SoCs has
> a lot of u-boot versions in the field, and the Odroid-C2 is part of this.
>
> Even if Odroid-c2 is in mainline U-Boot or not, the mainline Linux kernel 
> should work using
> any U-boot version even with the one provided by Amlogic on their openlinux 
> distribution channel.
>
> Handling multiple RAM configuration is another story, and the Arm-Soc and DT 
> maintainers should give us
> their advices.

Is there a way to detect what firmware is running and marking off
memory from early kernel init instead? That'll take care of the
concerns about memory size variance as well.

> Actually there is a severe bug fixed here that cause a huge crash if such 
> memory is not reserved while
> running stock u-boot version on various shipped products and Amlogic's own 
> development boards.
>
> The bug is easily triggered by running :
> # stress --vm 4 --vm-bytes 128M --timeout 10s &
> [   46.937975] Bad mode in Error handler detected on CPU1, code 0xbf00 -- 
> SError
> ...
> [   47.058536] Internal error: Attempting to execute userspace memory: 
> 860f [#3] PREEMPT SMP
> ...
>
> Note this is a fix targeted for 4.10 to make the system stable and various 
> users reported some severe
> crash now the system has more drivers and read-world use-cases are running on 
> Amlogic SoCs.
>
> Please feel free to push whatever changes that makes this memory reservation 
> more coherent for 4.11,
> and respect the behavior of already shipped u-boot version and mainline 
> U-Boot, UEFI, whatever...

Technically we're not in regression territory here, since the platform
is obviously still in bringup and these aren't bugs that have been
introduced in this release. So I think we can take a little while to
sort out if there's a solution that, even if not ideal, at least is on
the path towards the proper fix and not away from it -- which this
seems to be.


-Olof


Re: [PATCH v4] ARM64: dts: meson-gx: Add reserved memory zone and usable memory range

2017-01-16 Thread Olof Johansson
On Mon, Jan 16, 2017 at 2:39 AM, Neil Armstrong  wrote:
> On 01/15/2017 03:43 PM, Andreas Färber wrote:
>> Am 13.01.2017 um 21:03 schrieb Kevin Hilman:
>>> Neil Armstrong  writes:
>>>
 The Amlogic Meson GXBB/GXL/GXM secure monitor uses part of the memory 
 space,
 this patch adds this reserved zone and redefines the usable memory range.

 The memory node is also moved from the dtsi files into the proper dts files
 to handle variants memory sizes.

 This patch also fixes the memory sizes for the following platforms :
 - gxl-s905x-p212 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxm-s912-q201 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxl-s905d-p231 : 1GiB instead of 2GiB, a proper 2GiB dts should be pushed
 - gxl-nexbox-a95x : 1GiB instead of 2GiB, a proper 2GiB dts should be 
 pushed

 Signed-off-by: Neil Armstrong 
>>>
>>> Queued for v4.10-rc.
>>
>> What is the motivation for this change? I have a local U-Boot patch to
>> detect the amount of memory available as done downstream, but U-Boot
>> only updates the reg property that you seem to be abandoning here...
>>
>> So for devices that come in multiple RAM configurations - like R-Box Pro
>> - this would require separate .dts files now! This looks very wrong to
>> me, especially since I am not aware of other platforms doing the same.
>> Instead, there's memory reservations for top and bottom done in U-Boot
>> for reg, plus reserved-memory nodes for anything in the middle.
>>
>> Another thing to consider is that uEFI boot (bootefi) handles memory
>> reservation differently yet again, on the bootloader level. I have had
>> that working fine on Odroid-C2 and Vega S95.
>>
>> So if there's no bug this is fixing (none mentioned in commit message) I
>> strongly object to this patch.
>>
>> Regards,
>> Andreas
>>
>
> Hi Andreas,
>
> Like I replied of my RFT patch :
> I really disagree about relying on any work or properties added by any 
> bootloader here, Amlogic SoCs has
> a lot of u-boot versions in the field, and the Odroid-C2 is part of this.
>
> Even if Odroid-c2 is in mainline U-Boot or not, the mainline Linux kernel 
> should work using
> any U-boot version even with the one provided by Amlogic on their openlinux 
> distribution channel.
>
> Handling multiple RAM configuration is another story, and the Arm-Soc and DT 
> maintainers should give us
> their advices.

Is there a way to detect what firmware is running and marking off
memory from early kernel init instead? That'll take care of the
concerns about memory size variance as well.

> Actually there is a severe bug fixed here that cause a huge crash if such 
> memory is not reserved while
> running stock u-boot version on various shipped products and Amlogic's own 
> development boards.
>
> The bug is easily triggered by running :
> # stress --vm 4 --vm-bytes 128M --timeout 10s &
> [   46.937975] Bad mode in Error handler detected on CPU1, code 0xbf00 -- 
> SError
> ...
> [   47.058536] Internal error: Attempting to execute userspace memory: 
> 860f [#3] PREEMPT SMP
> ...
>
> Note this is a fix targeted for 4.10 to make the system stable and various 
> users reported some severe
> crash now the system has more drivers and read-world use-cases are running on 
> Amlogic SoCs.
>
> Please feel free to push whatever changes that makes this memory reservation 
> more coherent for 4.11,
> and respect the behavior of already shipped u-boot version and mainline 
> U-Boot, UEFI, whatever...

Technically we're not in regression territory here, since the platform
is obviously still in bringup and these aren't bugs that have been
introduced in this release. So I think we can take a little while to
sort out if there's a solution that, even if not ideal, at least is on
the path towards the proper fix and not away from it -- which this
seems to be.


-Olof


Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support

2017-01-16 Thread Vignesh R


On Friday 13 January 2017 11:50 PM, Tony Lindgren wrote:
> * Vignesh R  [170113 00:03]:
>> This patch series re enables DMA support for UART 8250_omap driver.
>>
>> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with
>> SDMA.
> 
> Is 8250_omap serial console working for you on omap5 in general?
> 
> I've noticed that it's really unresponsive for me as if the FIFO
> interrupt was not working. For example logging in might take several
> attempts and a long time with each character showing up much later
> after some timeout.
> 

Yes, I did face the same issue on omap5 using 8250_omap driver for
console. Looks like this bug has existed all along.

> TX on the uart on omap5 seems to work OK, I do see all the console
> messages fine. And these patches do not make the issue better or
> worse.
> 

True.

-- 
Regards
Vignesh


Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support

2017-01-16 Thread Vignesh R


On Friday 13 January 2017 11:50 PM, Tony Lindgren wrote:
> * Vignesh R  [170113 00:03]:
>> This patch series re enables DMA support for UART 8250_omap driver.
>>
>> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with
>> SDMA.
> 
> Is 8250_omap serial console working for you on omap5 in general?
> 
> I've noticed that it's really unresponsive for me as if the FIFO
> interrupt was not working. For example logging in might take several
> attempts and a long time with each character showing up much later
> after some timeout.
> 

Yes, I did face the same issue on omap5 using 8250_omap driver for
console. Looks like this bug has existed all along.

> TX on the uart on omap5 seems to work OK, I do see all the console
> messages fine. And these patches do not make the issue better or
> worse.
> 

True.

-- 
Regards
Vignesh


[PATCH v7 0/3] Add touch key driver support for TM2

2017-01-16 Thread Jaechul Lee
Hi,

This patch is last three patch from https://lkml.org/lkml/2017/1/6/277.
because 1 and 2 patches have already been merged by Krzysztof.

This patchset adds support for the tm2 touchkey device.

The driver has been ported from Tizen Kernel, originally written
by Beomho. I ported it to the latest mainline Kernel.

Best Regard,
Jaechul

Changes in v7
 - added Chanwoo's reviewed and tested.
 - fixed reviews from Dmitry.

Changes in v6:
 - changed compatible name from samsaung to cypress.
 - updated commit tags.
 - removed first two patches from the original patchset.

Changes in v5:
 - patch 1: removed a spurious regulator-always-off inherited from
   a different patch. Thanks Krzysztof.
 - patch 2: fixed a slip on  the model, thanks Javier (this patch
   confuses me quite a lot, this was all right some patches ago
   and re appeared on this one).
 - patch 2: removed 'regulator' label and used the original ldo3x
   labels. Krzysztof: it looks better indeed.
 - added Javier's reviews and Krzysztof's acks on the related
   patches.

Changes in v4:
 - patch 1 has been rebased on top of 7c294e002641 (arm64: dts:
   exynos: Remove unsupported regulator-always-off property from
   TM2E)
 - patch 2 has been generated with -B50% diff option using git
   2.11

Changes in v3:
 - Changed the commit ordering, the tm2-touchkey related patches
   are the last 3.
 - Added Chanwoo's patch which fixes the wrong voltage of ldo23
   and ldo25.
 - Andi (patch 3) moves the ldo31 and ldo38 in the tm2 and tm2e
   files as they have different values.

Changes in v2:
 - fixed reviews from Javier, Dmitry
 - refactored power enable/disable functions.
 - reordered signed-offs in patch 2, while patch 4 is left as it
   was as Andi copy pasted the node to the new tm2.dts file
 - added Jarvier's (patch 1,2,4) and Krzysztof's (patch 4)
   reviews
   and Rob's Ack
 - patch 3 diff has been generated with -B50%

Jaechul Lee (3):
  input: Add support for the tm2 touchkey device driver
  input: tm2-touchkey: Add touchkey driver support for TM2
  arm64: dts: exynos: Add tm2 touchkey node

 .../bindings/input/cypress,tm2-touchkey.txt|  27 ++
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts  |  13 +
 drivers/input/keyboard/Kconfig |  11 +
 drivers/input/keyboard/Makefile|   1 +
 drivers/input/keyboard/tm2-touchkey.c  | 287 +
 5 files changed, 339 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
 create mode 100644 drivers/input/keyboard/tm2-touchkey.c

-- 
2.7.4



[PATCH v7 2/3] input: tm2-touchkey: Add touchkey driver support for TM2

2017-01-16 Thread Jaechul Lee
This patch adds support for the TM2 touch key and led
functionality.

The driver interfaces with userspace through an input device and
reports KEY_PHONE and KEY_BACK event types. LED brightness can be
controlled by "/sys/class/leds/tm2-touchkey/brightness".

Signed-off-by: Beomho Seo 
Signed-off-by: Jaechul Lee 
Reviewed-by: Javier Martinez Canillas 
Reviewed-by: Andi Shyti 
Reviewed-by: Chanwoo Choi 
Tested-by: Chanwoo Choi 
Acked-by: Krzysztof Kozlowski 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/keyboard/Kconfig|  11 ++
 drivers/input/keyboard/Makefile   |   1 +
 drivers/input/keyboard/tm2-touchkey.c | 286 ++
 3 files changed, 298 insertions(+)
 create mode 100644 drivers/input/keyboard/tm2-touchkey.c

diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
index cbd75cf..97acd65 100644
--- a/drivers/input/keyboard/Kconfig
+++ b/drivers/input/keyboard/Kconfig
@@ -666,6 +666,17 @@ config KEYBOARD_TC3589X
  To compile this driver as a module, choose M here: the
  module will be called tc3589x-keypad.
 
+config KEYBOARD_TM2_TOUCHKEY
+   tristate "TM2 touchkey support"
+   depends on I2C
+   depends on LEDS_CLASS
+   help
+ Say Y here to enable device driver for tm2-touchkey with
+ LED control for the Exynos5433 TM2 board.
+
+ To compile this driver as a module, choose M here.
+ module will be called tm2-touchkey.
+
 config KEYBOARD_TWL4030
tristate "TI TWL4030/TWL5030/TPS659x0 keypad support"
depends on TWL4030_CORE
diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile
index d9f4cfc..7d9acff 100644
--- a/drivers/input/keyboard/Makefile
+++ b/drivers/input/keyboard/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_KEYBOARD_SUN4I_LRADC)+= sun4i-lradc-keys.o
 obj-$(CONFIG_KEYBOARD_SUNKBD)  += sunkbd.o
 obj-$(CONFIG_KEYBOARD_TC3589X) += tc3589x-keypad.o
 obj-$(CONFIG_KEYBOARD_TEGRA)   += tegra-kbc.o
+obj-$(CONFIG_KEYBOARD_TM2_TOUCHKEY)+= tm2-touchkey.o
 obj-$(CONFIG_KEYBOARD_TWL4030) += twl4030_keypad.o
 obj-$(CONFIG_KEYBOARD_XTKBD)   += xtkbd.o
 obj-$(CONFIG_KEYBOARD_W90P910) += w90p910_keypad.o
diff --git a/drivers/input/keyboard/tm2-touchkey.c 
b/drivers/input/keyboard/tm2-touchkey.c
new file mode 100644
index 000..916e2f3
--- /dev/null
+++ b/drivers/input/keyboard/tm2-touchkey.c
@@ -0,0 +1,286 @@
+/*
+ * TM2 touchkey device driver
+ *
+ * Copyright 2005 Phil Blundell
+ * Copyright 2016 Samsung Electronics Co., Ltd.
+ *
+ * Author: Beomho Seo 
+ * Author: Jaechul Lee 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TM2_TOUCHKEY_DEV_NAME  "tm2-touchkey"
+#define TM2_TOUCHKEY_KEYCODE_REG   0x03
+#define TM2_TOUCHKEY_BASE_REG  0x00
+#define TM2_TOUCHKEY_CMD_LED_ON0x10
+#define TM2_TOUCHKEY_CMD_LED_OFF   0x20
+#define TM2_TOUCHKEY_BIT_PRESS_EV  BIT(3)
+#define TM2_TOUCHKEY_BIT_KEYCODE   GENMASK(2, 0)
+#define TM2_TOUCHKEY_LED_VOLTAGE_MIN   250
+#define TM2_TOUCHKEY_LED_VOLTAGE_MAX   330
+
+enum {
+   TM2_TOUCHKEY_KEY_MENU = 0x1,
+   TM2_TOUCHKEY_KEY_BACK,
+};
+
+struct tm2_touchkey_data {
+   struct i2c_client *client;
+   struct input_dev *input_dev;
+   struct led_classdev led_dev;
+   struct regulator *vdd;
+   struct regulator_bulk_data regulators[2];
+};
+
+static void tm2_touchkey_led_brightness_set(struct led_classdev *led_dev,
+   enum led_brightness brightness)
+{
+   struct tm2_touchkey_data *touchkey =
+   container_of(led_dev, struct tm2_touchkey_data, led_dev);
+   u32 volt;
+   u8 data;
+
+   if (brightness == LED_OFF) {
+   volt = TM2_TOUCHKEY_LED_VOLTAGE_MIN;
+   data = TM2_TOUCHKEY_CMD_LED_OFF;
+   } else {
+   volt = TM2_TOUCHKEY_LED_VOLTAGE_MAX;
+   data = TM2_TOUCHKEY_CMD_LED_ON;
+   }
+
+   regulator_set_voltage(touchkey->vdd, volt, volt);
+   i2c_smbus_write_byte_data(touchkey->client,
+ TM2_TOUCHKEY_BASE_REG, data);
+}
+
+static int tm2_touchkey_power_enable(struct tm2_touchkey_data *touchkey)
+{
+   int error;
+
+   error = regulator_bulk_enable(ARRAY_SIZE(touchkey->regulators),
+ touchkey->regulators);
+   if (error)
+   return error;
+
+   

[PATCH v7 0/3] Add touch key driver support for TM2

2017-01-16 Thread Jaechul Lee
Hi,

This patch is last three patch from https://lkml.org/lkml/2017/1/6/277.
because 1 and 2 patches have already been merged by Krzysztof.

This patchset adds support for the tm2 touchkey device.

The driver has been ported from Tizen Kernel, originally written
by Beomho. I ported it to the latest mainline Kernel.

Best Regard,
Jaechul

Changes in v7
 - added Chanwoo's reviewed and tested.
 - fixed reviews from Dmitry.

Changes in v6:
 - changed compatible name from samsaung to cypress.
 - updated commit tags.
 - removed first two patches from the original patchset.

Changes in v5:
 - patch 1: removed a spurious regulator-always-off inherited from
   a different patch. Thanks Krzysztof.
 - patch 2: fixed a slip on  the model, thanks Javier (this patch
   confuses me quite a lot, this was all right some patches ago
   and re appeared on this one).
 - patch 2: removed 'regulator' label and used the original ldo3x
   labels. Krzysztof: it looks better indeed.
 - added Javier's reviews and Krzysztof's acks on the related
   patches.

Changes in v4:
 - patch 1 has been rebased on top of 7c294e002641 (arm64: dts:
   exynos: Remove unsupported regulator-always-off property from
   TM2E)
 - patch 2 has been generated with -B50% diff option using git
   2.11

Changes in v3:
 - Changed the commit ordering, the tm2-touchkey related patches
   are the last 3.
 - Added Chanwoo's patch which fixes the wrong voltage of ldo23
   and ldo25.
 - Andi (patch 3) moves the ldo31 and ldo38 in the tm2 and tm2e
   files as they have different values.

Changes in v2:
 - fixed reviews from Javier, Dmitry
 - refactored power enable/disable functions.
 - reordered signed-offs in patch 2, while patch 4 is left as it
   was as Andi copy pasted the node to the new tm2.dts file
 - added Jarvier's (patch 1,2,4) and Krzysztof's (patch 4)
   reviews
   and Rob's Ack
 - patch 3 diff has been generated with -B50%

Jaechul Lee (3):
  input: Add support for the tm2 touchkey device driver
  input: tm2-touchkey: Add touchkey driver support for TM2
  arm64: dts: exynos: Add tm2 touchkey node

 .../bindings/input/cypress,tm2-touchkey.txt|  27 ++
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts  |  13 +
 drivers/input/keyboard/Kconfig |  11 +
 drivers/input/keyboard/Makefile|   1 +
 drivers/input/keyboard/tm2-touchkey.c  | 287 +
 5 files changed, 339 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
 create mode 100644 drivers/input/keyboard/tm2-touchkey.c

-- 
2.7.4



[PATCH v7 2/3] input: tm2-touchkey: Add touchkey driver support for TM2

2017-01-16 Thread Jaechul Lee
This patch adds support for the TM2 touch key and led
functionality.

The driver interfaces with userspace through an input device and
reports KEY_PHONE and KEY_BACK event types. LED brightness can be
controlled by "/sys/class/leds/tm2-touchkey/brightness".

Signed-off-by: Beomho Seo 
Signed-off-by: Jaechul Lee 
Reviewed-by: Javier Martinez Canillas 
Reviewed-by: Andi Shyti 
Reviewed-by: Chanwoo Choi 
Tested-by: Chanwoo Choi 
Acked-by: Krzysztof Kozlowski 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/keyboard/Kconfig|  11 ++
 drivers/input/keyboard/Makefile   |   1 +
 drivers/input/keyboard/tm2-touchkey.c | 286 ++
 3 files changed, 298 insertions(+)
 create mode 100644 drivers/input/keyboard/tm2-touchkey.c

diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
index cbd75cf..97acd65 100644
--- a/drivers/input/keyboard/Kconfig
+++ b/drivers/input/keyboard/Kconfig
@@ -666,6 +666,17 @@ config KEYBOARD_TC3589X
  To compile this driver as a module, choose M here: the
  module will be called tc3589x-keypad.
 
+config KEYBOARD_TM2_TOUCHKEY
+   tristate "TM2 touchkey support"
+   depends on I2C
+   depends on LEDS_CLASS
+   help
+ Say Y here to enable device driver for tm2-touchkey with
+ LED control for the Exynos5433 TM2 board.
+
+ To compile this driver as a module, choose M here.
+ module will be called tm2-touchkey.
+
 config KEYBOARD_TWL4030
tristate "TI TWL4030/TWL5030/TPS659x0 keypad support"
depends on TWL4030_CORE
diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile
index d9f4cfc..7d9acff 100644
--- a/drivers/input/keyboard/Makefile
+++ b/drivers/input/keyboard/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_KEYBOARD_SUN4I_LRADC)+= sun4i-lradc-keys.o
 obj-$(CONFIG_KEYBOARD_SUNKBD)  += sunkbd.o
 obj-$(CONFIG_KEYBOARD_TC3589X) += tc3589x-keypad.o
 obj-$(CONFIG_KEYBOARD_TEGRA)   += tegra-kbc.o
+obj-$(CONFIG_KEYBOARD_TM2_TOUCHKEY)+= tm2-touchkey.o
 obj-$(CONFIG_KEYBOARD_TWL4030) += twl4030_keypad.o
 obj-$(CONFIG_KEYBOARD_XTKBD)   += xtkbd.o
 obj-$(CONFIG_KEYBOARD_W90P910) += w90p910_keypad.o
diff --git a/drivers/input/keyboard/tm2-touchkey.c 
b/drivers/input/keyboard/tm2-touchkey.c
new file mode 100644
index 000..916e2f3
--- /dev/null
+++ b/drivers/input/keyboard/tm2-touchkey.c
@@ -0,0 +1,286 @@
+/*
+ * TM2 touchkey device driver
+ *
+ * Copyright 2005 Phil Blundell
+ * Copyright 2016 Samsung Electronics Co., Ltd.
+ *
+ * Author: Beomho Seo 
+ * Author: Jaechul Lee 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TM2_TOUCHKEY_DEV_NAME  "tm2-touchkey"
+#define TM2_TOUCHKEY_KEYCODE_REG   0x03
+#define TM2_TOUCHKEY_BASE_REG  0x00
+#define TM2_TOUCHKEY_CMD_LED_ON0x10
+#define TM2_TOUCHKEY_CMD_LED_OFF   0x20
+#define TM2_TOUCHKEY_BIT_PRESS_EV  BIT(3)
+#define TM2_TOUCHKEY_BIT_KEYCODE   GENMASK(2, 0)
+#define TM2_TOUCHKEY_LED_VOLTAGE_MIN   250
+#define TM2_TOUCHKEY_LED_VOLTAGE_MAX   330
+
+enum {
+   TM2_TOUCHKEY_KEY_MENU = 0x1,
+   TM2_TOUCHKEY_KEY_BACK,
+};
+
+struct tm2_touchkey_data {
+   struct i2c_client *client;
+   struct input_dev *input_dev;
+   struct led_classdev led_dev;
+   struct regulator *vdd;
+   struct regulator_bulk_data regulators[2];
+};
+
+static void tm2_touchkey_led_brightness_set(struct led_classdev *led_dev,
+   enum led_brightness brightness)
+{
+   struct tm2_touchkey_data *touchkey =
+   container_of(led_dev, struct tm2_touchkey_data, led_dev);
+   u32 volt;
+   u8 data;
+
+   if (brightness == LED_OFF) {
+   volt = TM2_TOUCHKEY_LED_VOLTAGE_MIN;
+   data = TM2_TOUCHKEY_CMD_LED_OFF;
+   } else {
+   volt = TM2_TOUCHKEY_LED_VOLTAGE_MAX;
+   data = TM2_TOUCHKEY_CMD_LED_ON;
+   }
+
+   regulator_set_voltage(touchkey->vdd, volt, volt);
+   i2c_smbus_write_byte_data(touchkey->client,
+ TM2_TOUCHKEY_BASE_REG, data);
+}
+
+static int tm2_touchkey_power_enable(struct tm2_touchkey_data *touchkey)
+{
+   int error;
+
+   error = regulator_bulk_enable(ARRAY_SIZE(touchkey->regulators),
+ touchkey->regulators);
+   if (error)
+   return error;
+
+   /* waiting for device initialization, at least 150ms */
+   msleep(150);
+
+   return 0;
+}
+
+static void tm2_touchkey_power_disable(void *data)
+{
+   struct tm2_touchkey_data *touchkey = data;
+
+   

[PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node

2017-01-16 Thread Jaechul Lee
Add DT node support for TM2 touchkey device.

Signed-off-by: Beomho Seo 
Signed-off-by: Jaechul Lee 
Signed-off-by: Andi Shyti 
Reviewed-by: Javier Martinez Canillas 
---
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts 
b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
index 2449266..ddba2f8 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
@@ -18,6 +18,19 @@
compatible = "samsung,tm2", "samsung,exynos5433";
 };
 
+_9 {
+   status = "okay";
+
+   touchkey@20 {
+   compatible = "cypress,tm2-touchkey";
+   reg = <0x20>;
+   interrupt-parent = <>;
+   interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
+   vcc-supply = <_reg>;
+   vdd-supply = <_reg>;
+   };
+};
+
 _reg {
regulator-name = "TSP_VDD_1.85V_AP";
regulator-min-microvolt = <185>;
-- 
2.7.4



[PATCH v7 1/3] input: Add support for the tm2 touchkey device driver

2017-01-16 Thread Jaechul Lee
This patch adds the binding description of the tm2 touchkey
device driver.

Signed-off-by: Jaechul Lee 
Reviewed-by: Javier Martinez Canillas 
Reviewed-by: Andi Shyti 
Reviewed-by: Chanwoo Choi 
Acked-by: Rob Herring 
Acked-by: Krzysztof Kozlowski 
---
 .../bindings/input/cypress,tm2-touchkey.txt| 27 ++
 1 file changed, 27 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt

diff --git a/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt 
b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
new file mode 100644
index 000..635f62c
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
@@ -0,0 +1,27 @@
+Samsung tm2-touchkey
+
+Required properties:
+- compatible: must be "cypress,tm2-touchkey"
+- reg: I2C address of the chip.
+- interrupt-parent: a phandle for the interrupt controller (see interrupt
+   binding[0]).
+- interrupts: interrupt to which the chip is connected (see interrupt
+   binding[0]).
+- vcc-supply : internal regulator output. 1.8V
+- vdd-supply : power supply for IC 3.3V
+
+[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
+
+Example:
+{
+   /* ... */
+
+   touchkey@20 {
+   compatible = "cypress,tm2-touchkey";
+   reg = <0x20>;
+   interrupt-parent = <>;
+   interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
+   vcc-supply=<_reg>;
+   vdd-supply=<_reg>;
+   };
+   };
-- 
2.7.4



[PATCH v7 3/3] arm64: dts: exynos: Add tm2 touchkey node

2017-01-16 Thread Jaechul Lee
Add DT node support for TM2 touchkey device.

Signed-off-by: Beomho Seo 
Signed-off-by: Jaechul Lee 
Signed-off-by: Andi Shyti 
Reviewed-by: Javier Martinez Canillas 
---
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts 
b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
index 2449266..ddba2f8 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
@@ -18,6 +18,19 @@
compatible = "samsung,tm2", "samsung,exynos5433";
 };
 
+_9 {
+   status = "okay";
+
+   touchkey@20 {
+   compatible = "cypress,tm2-touchkey";
+   reg = <0x20>;
+   interrupt-parent = <>;
+   interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
+   vcc-supply = <_reg>;
+   vdd-supply = <_reg>;
+   };
+};
+
 _reg {
regulator-name = "TSP_VDD_1.85V_AP";
regulator-min-microvolt = <185>;
-- 
2.7.4



[PATCH v7 1/3] input: Add support for the tm2 touchkey device driver

2017-01-16 Thread Jaechul Lee
This patch adds the binding description of the tm2 touchkey
device driver.

Signed-off-by: Jaechul Lee 
Reviewed-by: Javier Martinez Canillas 
Reviewed-by: Andi Shyti 
Reviewed-by: Chanwoo Choi 
Acked-by: Rob Herring 
Acked-by: Krzysztof Kozlowski 
---
 .../bindings/input/cypress,tm2-touchkey.txt| 27 ++
 1 file changed, 27 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt

diff --git a/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt 
b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
new file mode 100644
index 000..635f62c
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/cypress,tm2-touchkey.txt
@@ -0,0 +1,27 @@
+Samsung tm2-touchkey
+
+Required properties:
+- compatible: must be "cypress,tm2-touchkey"
+- reg: I2C address of the chip.
+- interrupt-parent: a phandle for the interrupt controller (see interrupt
+   binding[0]).
+- interrupts: interrupt to which the chip is connected (see interrupt
+   binding[0]).
+- vcc-supply : internal regulator output. 1.8V
+- vdd-supply : power supply for IC 3.3V
+
+[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
+
+Example:
+{
+   /* ... */
+
+   touchkey@20 {
+   compatible = "cypress,tm2-touchkey";
+   reg = <0x20>;
+   interrupt-parent = <>;
+   interrupts = <2 IRQ_TYPE_EDGE_FALLING>;
+   vcc-supply=<_reg>;
+   vdd-supply=<_reg>;
+   };
+   };
-- 
2.7.4



Re: [PATCH] pcie: ti: Provide patch to force GEN1 PCIe operation

2017-01-16 Thread Kishon Vijay Abraham I
Hi Joao,

On Monday 16 January 2017 10:31 PM, Joao Pinto wrote:
> 
> Hi,
> 
> Às 10:13 AM de 1/16/2017, Kishon Vijay Abraham I escreveu:
>> + Joao, Jingoo
>>
>> Hi,
>>
>> On Monday 16 January 2017 03:01 PM, Lukasz Majewski wrote:
>>> Hi Kishon,
>>>
 Hi Łukasz,

 On Monday 16 January 2017 12:19 PM, Lukasz Majewski wrote:
> Hi Kishon,
>
>> Hi,
>>
>> On Sunday 15 January 2017 06:49 PM, Lukasz Majewski wrote:
>>> Some devices (due to e.g. bad PCIe signal integrity) require to
>>> run with forced GEN1 speed on PCIe bus.
>>>
>>> This patch changes the speed explicitly on dra7 based devices when
>>> proper device tree attribute is defined for the PCIe controller.
>>>
>>> Signed-off-by: Lukasz Majewski 
>>
>> Bjorn has already queued a patch to do the same thing
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_cgit_linux_kernel_git_helgaas_pci.git_log_-3Fh-3Dpci_host-2Ddra7xx=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=E8zk1CbKxGH-f3fw_WpXxFU-A8BLkgA8NusCaxk1SvA=
>>  
>
> It seems like Bjorn only modifies CAP registers.

 The patch also modifies the LNKCTL2 register.
>
> He also needs to change register with 0x080C offset to actually
> ( PCIECTRL_PL_WIDTH_SPEED_CTL )

 This bit is used to initiate speed change (after the link is
 initialized in GEN1). Resetting the bit (like what you have done
 here) prevents speed change.
>>>
>>> This is strange, but e2e advised me to do things as I did in the patch
>>> to _force_ GEN1 operation on PCIe2 port [1] (AM5728)
>>>
>>> Link:
>>> [1] 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__e2e.ti.com_support_arm_sitara-5Farm_f_791_t_566421=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=uXLwglyRYqKpwp1JSxkOWmKpQ2wjfhgofpm8DCfquNw=
>>>  
>>>
>>> Both patches modify 0x5180 007C register to set GEN1 capability
>>> (PCI_EXP_LNKCAP_SLS_2_5GB)
>>>
>>> The problem is with second register (in your patch):
>>>
>>> From SPRUHZ6G TRM:
>>>
>>> PCIECTRL_EP_DBICS_LNK_CAS_2 (0x5180 00A0)
>>> - TRGT_LINK_SPEED (Reset 0x1) - "Target Link Speed" - no more
>>>   description in TRM
>>>
>>> It is set to PCI_EXP_LNKCAP_SLS_2_5GB = 0x1, which is the same as
>>> default /reset value.
>>
>> The default value is 0x2 (or else none of the cards would have enumerated in 
>> GEN2)
>>>
>>>
>>> Could you clarify which way to _force_ PCIe GEN1 operation is correct?
>>> Mine shows differences in lspci output (as posted in [1]).
>>
>> You'll see the difference even with the patch in Bjorn's tree ;-)
>>
>> I think these are 2 different approaches to keep the link at GEN1. Joao or
>> Jingoo, do you have any suggestion here?
> 
> I studied the Databook, and both approaches seem to be right, dependently of 
> the
> Core configuration and setup.
> 
> The standard manual speed change sequence is:
> a) Write to PCIE_CAP_TARGET_LINK_SPEED (indicating desired speed)
> b) Clear "Directed Speed Change"
> c) Set "Directed Speed Change"
> 
> If "Directed Speed Change" is set (DEFAULT_GEN2_SPEED_CHANGE is the default
> value), it will execute LTSSM to initiate speed change to Gen2 or Gen3, after
> link is started in Gen1, and then the bit is automatically cleared.
> 
> Lukasz is reseting this bit, in order to avoid the LTSSM to be executed, which
> is correct. There is another way to prevent this automatic speed change, which
> is to set GEN1 speed before link up which might be difficult in some setups, 
> so
> Kishon's also right.

Just for my understanding, why do you think this will be difficult in some 
setups?
> 
> In my opinion Lukasz approach would be the one that might be more universal 
> and
> more "secure".

IMHO setting link control in the standard PCIe header space should be more
universal. I'm not sure about the secure part though.

Thanks
Kishon
> 
> Joao
> 
> 
>>
>>>

 IMO the better way is to set the LNKCTL2 to GEN1 instead of hacking
 the IP register.
>>>
>>> From the original patch description:
>>>
>>> "Add support to force Root Complex to work in GEN1 mode if so desired,
>>> but don't force GEN1 mode on any board just yet."
>>>
>>> Are there any (floating around) patches allowing forcing GEN1 operation
>>> on any board (I would like to reuse/port them to my current solution)?
>>
>> For setting to GEN1 mode, "max-link-speed" should be set to 1 in dt with the
>> patch in Bjorn's tree.
>>
>> Thanks
>> Kishon
>>
> 


Re: [PATCH] pcie: ti: Provide patch to force GEN1 PCIe operation

2017-01-16 Thread Kishon Vijay Abraham I
Hi Joao,

On Monday 16 January 2017 10:31 PM, Joao Pinto wrote:
> 
> Hi,
> 
> Às 10:13 AM de 1/16/2017, Kishon Vijay Abraham I escreveu:
>> + Joao, Jingoo
>>
>> Hi,
>>
>> On Monday 16 January 2017 03:01 PM, Lukasz Majewski wrote:
>>> Hi Kishon,
>>>
 Hi Łukasz,

 On Monday 16 January 2017 12:19 PM, Lukasz Majewski wrote:
> Hi Kishon,
>
>> Hi,
>>
>> On Sunday 15 January 2017 06:49 PM, Lukasz Majewski wrote:
>>> Some devices (due to e.g. bad PCIe signal integrity) require to
>>> run with forced GEN1 speed on PCIe bus.
>>>
>>> This patch changes the speed explicitly on dra7 based devices when
>>> proper device tree attribute is defined for the PCIe controller.
>>>
>>> Signed-off-by: Lukasz Majewski 
>>
>> Bjorn has already queued a patch to do the same thing
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_cgit_linux_kernel_git_helgaas_pci.git_log_-3Fh-3Dpci_host-2Ddra7xx=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=E8zk1CbKxGH-f3fw_WpXxFU-A8BLkgA8NusCaxk1SvA=
>>  
>
> It seems like Bjorn only modifies CAP registers.

 The patch also modifies the LNKCTL2 register.
>
> He also needs to change register with 0x080C offset to actually
> ( PCIECTRL_PL_WIDTH_SPEED_CTL )

 This bit is used to initiate speed change (after the link is
 initialized in GEN1). Resetting the bit (like what you have done
 here) prevents speed change.
>>>
>>> This is strange, but e2e advised me to do things as I did in the patch
>>> to _force_ GEN1 operation on PCIe2 port [1] (AM5728)
>>>
>>> Link:
>>> [1] 
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__e2e.ti.com_support_arm_sitara-5Farm_f_791_t_566421=DwIDaQ=DPL6_X_6JkXFx7AXWqB0tg=s2fO0hii0OGNOv9qQy_HRXy-xAJUD1NNoEcc3io_kx0=zD82T5n4WcL7Ga-NSY2NI7KE75xQ99hN-mW2yX46wQk=uXLwglyRYqKpwp1JSxkOWmKpQ2wjfhgofpm8DCfquNw=
>>>  
>>>
>>> Both patches modify 0x5180 007C register to set GEN1 capability
>>> (PCI_EXP_LNKCAP_SLS_2_5GB)
>>>
>>> The problem is with second register (in your patch):
>>>
>>> From SPRUHZ6G TRM:
>>>
>>> PCIECTRL_EP_DBICS_LNK_CAS_2 (0x5180 00A0)
>>> - TRGT_LINK_SPEED (Reset 0x1) - "Target Link Speed" - no more
>>>   description in TRM
>>>
>>> It is set to PCI_EXP_LNKCAP_SLS_2_5GB = 0x1, which is the same as
>>> default /reset value.
>>
>> The default value is 0x2 (or else none of the cards would have enumerated in 
>> GEN2)
>>>
>>>
>>> Could you clarify which way to _force_ PCIe GEN1 operation is correct?
>>> Mine shows differences in lspci output (as posted in [1]).
>>
>> You'll see the difference even with the patch in Bjorn's tree ;-)
>>
>> I think these are 2 different approaches to keep the link at GEN1. Joao or
>> Jingoo, do you have any suggestion here?
> 
> I studied the Databook, and both approaches seem to be right, dependently of 
> the
> Core configuration and setup.
> 
> The standard manual speed change sequence is:
> a) Write to PCIE_CAP_TARGET_LINK_SPEED (indicating desired speed)
> b) Clear "Directed Speed Change"
> c) Set "Directed Speed Change"
> 
> If "Directed Speed Change" is set (DEFAULT_GEN2_SPEED_CHANGE is the default
> value), it will execute LTSSM to initiate speed change to Gen2 or Gen3, after
> link is started in Gen1, and then the bit is automatically cleared.
> 
> Lukasz is reseting this bit, in order to avoid the LTSSM to be executed, which
> is correct. There is another way to prevent this automatic speed change, which
> is to set GEN1 speed before link up which might be difficult in some setups, 
> so
> Kishon's also right.

Just for my understanding, why do you think this will be difficult in some 
setups?
> 
> In my opinion Lukasz approach would be the one that might be more universal 
> and
> more "secure".

IMHO setting link control in the standard PCIe header space should be more
universal. I'm not sure about the secure part though.

Thanks
Kishon
> 
> Joao
> 
> 
>>
>>>

 IMO the better way is to set the LNKCTL2 to GEN1 instead of hacking
 the IP register.
>>>
>>> From the original patch description:
>>>
>>> "Add support to force Root Complex to work in GEN1 mode if so desired,
>>> but don't force GEN1 mode on any board just yet."
>>>
>>> Are there any (floating around) patches allowing forcing GEN1 operation
>>> on any board (I would like to reuse/port them to my current solution)?
>>
>> For setting to GEN1 mode, "max-link-speed" should be set to 1 in dt with the
>> patch in Bjorn's tree.
>>
>> Thanks
>> Kishon
>>
> 


  1   2   3   4   5   6   7   8   9   10   >