Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully

2015-05-08 Thread Preeti U Murthy
Hi Rafael,

On 05/08/2015 07:48 PM, Rafael J. Wysocki wrote:
>> +/*
>> + * find_tick_valid_state - select a state where tick does not stop
>> + * @dev: cpuidle device for this cpu
>> + * @drv: cpuidle driver for this cpu
>> + */
>> +static int find_tick_valid_state(struct cpuidle_device *dev,
>> +struct cpuidle_driver *drv)
>> +{
>> +int i, ret = -1;
>> +
>> +for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>> +struct cpuidle_state *s = &drv->states[i];
>> +struct cpuidle_state_usage *su = &dev->states_usage[i];
>> +
>> +/*
>> + * We do not explicitly check for latency requirement
>> + * since it is safe to assume that only shallower idle
>> + * states will have the CPUIDLE_FLAG_TIMER_STOP bit
>> + * cleared and they will invariably meet the latency
>> + * requirement.
>> + */
>> +if (s->disabled || su->disable ||
>> +(s->flags & CPUIDLE_FLAG_TIMER_STOP))
>> +continue;
>> +
>> +ret = i;
>> +}
>> +return ret;
>> +}
>> +
>>  /**
>>   * cpuidle_enter_state - enter the state and update stats
>>   * @dev: cpuidle device for this cpu
>> @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>   * CPU as a broadcast timer, this call may fail if it is not available.
>>   */
>>  if (broadcast && tick_broadcast_enter()) {
>> -default_idle_call();
>> -return -EBUSY;
>> +index = find_tick_valid_state(dev, drv);
> 
> Well, the new state needs to be deeper than the old one or you may violate the
> governor's choice and this doesn't guarantee that.

The comment above in find_tick_valid_state() explains why we are bound
to choose a shallow idle state. I think it's safe to assume that any
state deeper than this one would have the CPUIDLE_FLAG_TIMER_STOP flag
set and hence would be skipped.

Your patch relies on the assumption that the idle states are arranged in
increasing order of exit_latency, i.e. from shallow to deep.
This is not guaranteed, is it?
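
To illustrate the ordering concern, here is a minimal userspace sketch (not
kernel code) of a "deepest allowed state" search that compares exit latencies
instead of trusting the table order. The demo_* names, the struct layout and
the flag value are hypothetical stand-ins for the cpuidle definitions; only
the idea of a limit and a flags_to_avoid mask comes from the patch quoted
further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions, for illustration only. */
#define DEMO_FLAG_TIMER_STOP 0x1u

struct demo_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int flags;
	bool disabled;
};

/*
 * Return the index of the enabled state with the largest exit latency among
 * states[0..limit-1] whose flags do not intersect flags_to_avoid, or -1 if
 * no state qualifies. Because latencies are compared explicitly, the result
 * does not depend on the table being sorted from shallow to deep.
 */
int demo_find_deepest_state(const struct demo_state *states, int limit,
			    unsigned int flags_to_avoid)
{
	unsigned int latency_req = 0;
	int i, ret = -1;

	assert(states != NULL && limit >= 0);

	for (i = 0; i < limit; i++) {
		const struct demo_state *s = &states[i];

		if (s->disabled || (s->flags & flags_to_avoid))
			continue;
		/* Keep the current pick unless this state is strictly deeper. */
		if (ret >= 0 && s->exit_latency <= latency_req)
			continue;

		latency_req = s->exit_latency;
		ret = i;
	}
	return ret;
}
```

Calling it with limit set to the governor's chosen index only considers
states in front of that index in the table; whether those are all shallower
is exactly the ordering question raised above.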

> 
> Also I don't quite see a reason to duplicate the find_deepest_state()
> functionality here.

Agreed. We could club them like in your patch.

> 
>> +if (index < 0) {
>> +default_idle_call();
>> +return -EBUSY;
>> +}
>> +target_state = &drv->states[index];
>>  }
>>  
>> +/* Take note of the planned idle state. */
>> +idle_set_state(smp_processor_id(), target_state);
> 
> And I wouldn't do this either.
> 
> The behavior here is pretty much as though the driver demoted the state chosen
> by the governor and we don't call idle_set_state() again in those cases.

Why is this wrong? The idea here is to set the idle state of the
runqueue to the one that it is more likely to enter. It is true
that the state has been demoted, but I don't see any code that requires
rq->idle_state to be only a governor-chosen state or nothing at all.

This is a more important chunk of this patch because it allows us to
track the idle states of the broadcast CPU. Else the system idle time is
bound to be higher than the residency time in different idle states of
all the CPUs. This shows up starkly as an anomaly if we are profiling
cpuidle state entry/exit.

> 
>> +
>>  trace_cpu_idle_rcuidle(index, dev->cpu);
>>  time_start = ktime_get();
> 
> Overall, something like the patch below (untested) should work I suppose?

With the exception of the above two points, yes, this should work.
> 
> ---
>  drivers/cpuidle/cpuidle.c |   21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/cpuidle.c
> ===
> --- linux-pm.orig/drivers/cpuidle/cpuidle.c
> +++ linux-pm/drivers/cpuidle/cpuidle.c
> @@ -73,17 +73,19 @@ int cpuidle_play_dead(void)
>  }
> 
>  static int find_deepest_state(struct cpuidle_driver *drv,
> -   struct cpuidle_device *dev, bool freeze)
> +   struct cpuidle_device *dev, bool freeze,
> +   int limit, unsigned int flags_to_avoid)
>  {
>   unsigned int latency_req = 0;
>   int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
> 
> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> + for (i = CPUIDLE_DRIVER_STATE_START; i < limit; i++) {
>   struct cpuidle_state *s = &drv->states[i];
>   struct cpuidle_state_usage *su = &dev->states_usage[i];
> 
>   if (s->disabled || su->disable || s->exit_latency <= latency_req
> - || (freeze && !s->enter_freeze))
> + || (freeze && !s->enter_freeze)
> + || (s->flags & flags_to_avoid))
>   continue;
> 
>   

Re: [PATCH v3] tags: much faster, parallel "make tags"

2015-05-08 Thread Pádraig Brady
On 08/05/15 14:26, Alexey Dobriyan wrote:
> ctags is single-threaded program. Split list of files to be tagged into
> equal parts, 1 part for each CPU and then merge the results.
> 
> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> On another 4-way box: ~120 s => ~65 s (-46%!).
> 
> Resulting "tags" files aren't byte-for-byte identical because ctags
> program numbers anon struct and enum declarations with "__anonNNN"
> symbols. If those lines are removed, "tags" file becomes byte-for-byte
> identical with those generated with current code.
> 
> Signed-off-by: Alexey Dobriyan 
> ---
> 
>  scripts/tags.sh |   36 +++-
>  1 file changed, 31 insertions(+), 5 deletions(-)
> 
> --- a/scripts/tags.sh
> +++ b/scripts/tags.sh
> @@ -152,7 +152,19 @@ dogtags()
>  
>  exuberant()
>  {
> - all_target_sources | xargs $1 -a\
> + rm -f .make-tags.*
> +
> + all_target_sources >.make-tags.src
> + NR_CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

`nproc` is simpler and available since coreutils 8.1 (2009-11-18)

> + NR_LINES=$(wc -l <.make-tags.src)
> + NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
> +
> + split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.

`split -d -nl/$(nproc)` is simpler and available since coreutils 8.8 (2010-12-22)
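
For reference, the rounding-up division in the quoted NR_LINES line can be
written out as a small C helper. This is only a sketch of the arithmetic;
demo_lines_per_chunk is a hypothetical name and nothing like it exists in
scripts/tags.sh.

```c
#include <assert.h>

/*
 * Lines per chunk when nr_lines input lines are split across nr_cpus
 * workers: the ceiling of nr_lines / nr_cpus, matching the quoted
 * NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS)).
 */
long demo_lines_per_chunk(long nr_lines, long nr_cpus)
{
	assert(nr_lines >= 0 && nr_cpus > 0);
	return (nr_lines + nr_cpus - 1) / nr_cpus;
}
```

With 10 lines and 4 CPUs this gives 3 lines per chunk, so the last chunk is
shorter than the others rather than one line being dropped.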

> +
> + for i in .make-tags.src.*; do
> + N=$(echo $i | sed -e 's/.*\.//')
> + # -u: don't sort now, sort later
> + xargs <$i $1 -a -f .make-tags.$N -u \
>   -I __initdata,__exitdata,__initconst,   \
>   -I __cpuinitdata,__initdata_memblock\
>   -I __refdata,__attribute,__maybe_unused,__always_unused \
> @@ -211,7 +223,21 @@ exuberant()
>   --regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/'   \
>   --regex-c='/(^\s)OFFSET\((\w*)/\2/v/'   \
>   --regex-c='/(^\s)DEFINE\((\w*)/\2/v/'   \
> - --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'
> + --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'  \
> + &
> + done
> + wait
> + rm -f .make-tags.src .make-tags.src.*
> +
> + # write header
> + $1 -f $2 /dev/null
> + # remove headers
> + for i in .make-tags.*; do
> + sed -i -e '/^!/d' $i &
> + done
> + wait
> + sort .make-tags.* >>$2
> + rm -f .make-tags.*

Using sort --merge would speed up significantly?

Even faster would be to get sort to skip the header lines, avoiding the need
for sed. It's a bit awkward and was discussed at:
http://lists.gnu.org/archive/html/coreutils/2013-01/msg00027.html
To summarise: if not using merge, you can:

  tlines=$(($(wc -l < "$2") + 1))
  tail -q -n+$tlines .make-tags.* | LC_ALL=C sort >>$2

Or if merge is appropriate then:

  tlines=$(($(wc -l < "$2") + 1))
  eval "eval LC_ALL=C sort -m '<(tail -n+$tlines .make-tags.'{1..$(nproc)}')'" >>$2

Note eval is fine here as inputs are controlled within the script

cheers,
Pádraig.

p.s. To avoid temp files altogether you could wire everything up through fifos,
though that's probably overkill here TBH

p.p.s. You may want to `trap cleanup EXIT` to rm -f .make-tags.*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/18] f2fs crypto: add symlink encryption

2015-05-08 Thread Al Viro
On Fri, May 08, 2015 at 09:20:51PM -0700, Jaegeuk Kim wrote:
> This patch implements encryption support for symlink.
> 
> The code refers to the ext4 symlink path.

ext4 symlink patches are seriously misguided - don't mix encrypted and
unencrypted cases in the same inode_operations.

NAK.


[PATCH 02/18] f2fs: report unwritten area in f2fs_fiemap

2015-05-08 Thread Jaegeuk Kim
This patch slightly changes f2fs_fiemap function to report unwritten area.
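
As a rough illustration of the block arithmetic behind the logical_to_blk()
and blk_to_logical() helpers in the diff below, here is a userspace sketch.
The demo_* functions are hypothetical and take the block-size shift directly
instead of reading inode->i_blkbits.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset -> index of the block containing it; block size is 1 << blkbits. */
uint64_t demo_logical_to_blk(unsigned int blkbits, uint64_t offset)
{
	assert(blkbits < 64);
	return offset >> blkbits;
}

/* Block index -> byte offset of the first byte of that block. */
uint64_t demo_blk_to_logical(unsigned int blkbits, uint64_t blk)
{
	assert(blkbits < 64);
	return blk << blkbits;
}
```

With 4 KiB blocks (blkbits = 12), byte offsets 0..4095 map to block 0 and
offset 4096 maps to block 1, which is why a request shorter than one block
is widened to a full block before the mapping loop starts.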

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 117 +++--
 fs/f2fs/f2fs.h |   4 +-
 2 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 3b76261..842fcdd 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1241,6 +1241,8 @@ static int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
if (dn.data_blkaddr != NULL_ADDR) {
map->m_flags = F2FS_MAP_MAPPED;
map->m_pblk = dn.data_blkaddr;
+   if (dn.data_blkaddr == NEW_ADDR)
+   map->m_flags |= F2FS_MAP_UNWRITTEN;
} else if (create) {
err = __allocate_data_block(&dn);
if (err)
@@ -1288,7 +1290,10 @@ get_next:
blkaddr = dn.data_blkaddr;
}
/* Give more consecutive addresses for the readahead */
-   if (map->m_pblk != NEW_ADDR && blkaddr == (map->m_pblk + ofs)) {
+   if ((map->m_pblk != NEW_ADDR &&
+   blkaddr == (map->m_pblk + ofs)) ||
+   (map->m_pblk == NEW_ADDR &&
+   blkaddr == NEW_ADDR)) {
ofs++;
dn.ofs_in_node++;
pgofs++;
@@ -1339,11 +1344,117 @@ static int get_data_block_fiemap(struct inode *inode, sector_t iblock,
return __get_data_block(inode, iblock, bh_result, create, true);
 }
 
+static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
+{
+   return (offset >> inode->i_blkbits);
+}
+
+static inline loff_t blk_to_logical(struct inode *inode, sector_t blk)
+{
+   return (blk << inode->i_blkbits);
+}
+
 int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
 {
-   return generic_block_fiemap(inode, fieinfo,
-   start, len, get_data_block_fiemap);
+   struct buffer_head map_bh;
+   sector_t start_blk, last_blk;
+   loff_t isize = i_size_read(inode);
+   u64 logical = 0, phys = 0, size = 0;
+   u32 flags = 0;
+   bool past_eof = false, whole_file = false;
+   int ret = 0;
+
+   ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC);
+   if (ret)
+   return ret;
+
+   mutex_lock(&inode->i_mutex);
+
+   if (len >= isize) {
+   whole_file = true;
+   len = isize;
+   }
+
+   if (logical_to_blk(inode, len) == 0)
+   len = blk_to_logical(inode, 1);
+
+   start_blk = logical_to_blk(inode, start);
+   last_blk = logical_to_blk(inode, start + len - 1);
+next:
+   memset(&map_bh, 0, sizeof(struct buffer_head));
+   map_bh.b_size = len;
+
+   ret = get_data_block_fiemap(inode, start_blk, &map_bh, 0);
+   if (ret)
+   goto out;
+
+   /* HOLE */
+   if (!buffer_mapped(&map_bh)) {
+   start_blk++;
+
+   if (!past_eof && blk_to_logical(inode, start_blk) >= isize)
+   past_eof = 1;
+
+   if (past_eof && size) {
+   flags |= FIEMAP_EXTENT_LAST;
+   ret = fiemap_fill_next_extent(fieinfo, logical,
+   phys, size, flags);
+   } else if (size) {
+   ret = fiemap_fill_next_extent(fieinfo, logical,
+   phys, size, flags);
+   size = 0;
+   }
+
+   /* if we have holes up to/past EOF then we're done */
+   if (start_blk > last_blk || past_eof || ret)
+   goto out;
+   } else {
+   if (start_blk > last_blk && !whole_file) {
+   ret = fiemap_fill_next_extent(fieinfo, logical,
+   phys, size, flags);
+   goto out;
+   }
+
+   /*
+* if size != 0 then we know we already have an extent
+* to add, so add it.
+*/
+   if (size) {
+   ret = fiemap_fill_next_extent(fieinfo, logical,
+   phys, size, flags);
+   if (ret)
+   goto out;
+   }
+
+   logical = blk_to_logical(inode, start_blk);
+   phys = blk_to_logical(inode, map_bh.b_blocknr);
+   size = map_bh.b_size;
+   flags = 0;
+   if (buffer_unwritten(&map_bh))
+   flags = FIEMAP_EXTENT_UNWRITTEN;
+
+   start_blk += logical_to_blk(inode, size);
+
+   /*
+* If we are past the EOF, then we need to make sure as
+* soon as we find a hole that the last extent we found
+* is 

[PATCH 06/18] f2fs crypto: add encryption policy and password salt support

2015-05-08 Thread Jaegeuk Kim
This patch adds encryption policy and password salt support through ioctl
implementation.

It adds three ioctls:
 F2FS_IOC_SET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_POLICY,
 F2FS_IOC_GET_ENCRYPTION_PWSALT, which use xattr operations.

Note that these definitions and code are taken from ext4 crypto support.
For f2fs, the xattr operations and the on-disk flags for superblock and inode
were changed.
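
As an illustration of the flag check a policy has to pass before it is stored,
here is a minimal userspace sketch. The mask values mirror the
F2FS_POLICY_FLAGS_* definitions in patch 03/18; the demo_* names are
hypothetical, and reading the PAD_n values as "pad to n bytes" is an
assumption based on their names.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror F2FS_POLICY_FLAGS_PAD_MASK and F2FS_POLICY_FLAGS_VALID. */
#define DEMO_POLICY_FLAGS_PAD_MASK	0x03
#define DEMO_POLICY_FLAGS_VALID		0x03

/* True when no bit outside the valid mask is set. */
bool demo_policy_flags_valid(unsigned char flags)
{
	return (flags & ~DEMO_POLICY_FLAGS_VALID) == 0;
}

/*
 * Padding granularity in bytes selected by the low two flag bits
 * (assumed mapping: 0 -> 4, 1 -> 8, 2 -> 16, 3 -> 32).
 */
unsigned int demo_policy_padding(unsigned char flags)
{
	assert(demo_policy_flags_valid(flags));
	return 4u << (flags & DEMO_POLICY_FLAGS_PAD_MASK);
}
```

A policy carrying any bit outside the mask would be rejected, which matches
the -EINVAL path taken for policy->flags & ~F2FS_POLICY_FLAGS_VALID in the
diff below.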

Signed-off-by: Michael Halcrow 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Ildar Muslukhov 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Makefile|   1 +
 fs/f2fs/crypto_policy.c | 206 
 fs/f2fs/f2fs.h  |  16 
 fs/f2fs/file.c  |  91 +
 fs/f2fs/xattr.c |   3 +
 5 files changed, 317 insertions(+)
 create mode 100644 fs/f2fs/crypto_policy.c

diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index d923977..7864f4f 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -6,3 +6,4 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
 f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
+f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o
diff --git a/fs/f2fs/crypto_policy.c b/fs/f2fs/crypto_policy.c
new file mode 100644
index 000..bef254b
--- /dev/null
+++ b/fs/f2fs/crypto_policy.c
@@ -0,0 +1,206 @@
+/*
+ * copied from linux/fs/ext4/crypto_policy.c
+ *
+ * Copyright (C) 2015, Google, Inc.
+ * Copyright (C) 2015, Motorola Mobility.
+ *
+ * This contains encryption policy functions for f2fs with some modifications
+ * to support f2fs-specific xattr APIs.
+ *
+ * Written by Michael Halcrow, 2015.
+ * Modified by Jaegeuk Kim, 2015.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "f2fs.h"
+#include "xattr.h"
+
+static int f2fs_inode_has_encryption_context(struct inode *inode)
+{
+   int res = f2fs_getxattr(inode, F2FS_XATTR_INDEX_ENCRYPTION,
+   F2FS_XATTR_NAME_ENCRYPTION_CONTEXT, NULL, 0, NULL);
+   return (res > 0);
+}
+
+/*
+ * check whether the policy is consistent with the encryption context
+ * for the inode
+ */
+static int f2fs_is_encryption_context_consistent_with_policy(
+   struct inode *inode, const struct f2fs_encryption_policy *policy)
+{
+   struct f2fs_encryption_context ctx;
+   int res = f2fs_getxattr(inode, F2FS_XATTR_INDEX_ENCRYPTION,
+   F2FS_XATTR_NAME_ENCRYPTION_CONTEXT, &ctx,
+   sizeof(ctx), NULL);
+
+   if (res != sizeof(ctx))
+   return 0;
+
+   return (memcmp(ctx.master_key_descriptor, policy->master_key_descriptor,
+   F2FS_KEY_DESCRIPTOR_SIZE) == 0 &&
+   (ctx.flags == policy->flags) &&
+   (ctx.contents_encryption_mode ==
+policy->contents_encryption_mode) &&
+   (ctx.filenames_encryption_mode ==
+policy->filenames_encryption_mode));
+}
+
+static int f2fs_create_encryption_context_from_policy(
+   struct inode *inode, const struct f2fs_encryption_policy *policy)
+{
+   struct f2fs_encryption_context ctx;
+
+   ctx.format = F2FS_ENCRYPTION_CONTEXT_FORMAT_V1;
+   memcpy(ctx.master_key_descriptor, policy->master_key_descriptor,
+   F2FS_KEY_DESCRIPTOR_SIZE);
+
+   if (!f2fs_valid_contents_enc_mode(policy->contents_encryption_mode)) {
+   printk(KERN_WARNING
+  "%s: Invalid contents encryption mode %d\n", __func__,
+   policy->contents_encryption_mode);
+   return -EINVAL;
+   }
+
+   if (!f2fs_valid_filenames_enc_mode(policy->filenames_encryption_mode)) {
+   printk(KERN_WARNING
+  "%s: Invalid filenames encryption mode %d\n", __func__,
+   policy->filenames_encryption_mode);
+   return -EINVAL;
+   }
+
+   if (policy->flags & ~F2FS_POLICY_FLAGS_VALID)
+   return -EINVAL;
+
+   ctx.contents_encryption_mode = policy->contents_encryption_mode;
+   ctx.filenames_encryption_mode = policy->filenames_encryption_mode;
+   ctx.flags = policy->flags;
+   BUILD_BUG_ON(sizeof(ctx.nonce) != F2FS_KEY_DERIVATION_NONCE_SIZE);
+   get_random_bytes(ctx.nonce, F2FS_KEY_DERIVATION_NONCE_SIZE);
+
+   return f2fs_setxattr(inode, F2FS_XATTR_INDEX_ENCRYPTION,
+   F2FS_XATTR_NAME_ENCRYPTION_CONTEXT, &ctx,
+   sizeof(ctx), NULL, 0);
+}
+
+int f2fs_process_policy(const struct f2fs_encryption_policy *policy,
+   struct inode *inode)
+{
+   if (policy->version != 0)
+   return -EINVAL;
+
+   if (!f2fs_inode_has_encryption_context(inode)) {
+   if (!f2fs_empty_dir(inode))
+   return -ENOTEMPTY;
+   return f2fs_create_encryption_context_from_policy(inode,
+   

[PATCH 03/18] f2fs crypto: declare some definitions for f2fs encryption feature

2015-05-08 Thread Jaegeuk Kim
These definitions will be used by the inode and superblock for encryption.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h  |  54 ++
 fs/f2fs/f2fs_crypto.h   | 149 
 include/linux/f2fs_fs.h |   4 +-
 3 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 fs/f2fs/f2fs_crypto.h

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 477e65f..c3c4deb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -70,6 +70,8 @@ struct f2fs_mount_info {
unsigned int opt;
 };
 
+#define F2FS_FEATURE_ENCRYPT   0x0001
+
 #define F2FS_HAS_FEATURE(sb, mask) \
((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
 #define F2FS_SET_FEATURE(sb, mask) \
@@ -346,6 +348,7 @@ struct f2fs_map_blocks {
  */
 #define FADVISE_COLD_BIT   0x01
 #define FADVISE_LOST_PINO_BIT  0x02
+#define FADVISE_ENCRYPT_BIT    0x04
 
 #define file_is_cold(inode)    is_file(inode, FADVISE_COLD_BIT)
 #define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
@@ -353,6 +356,16 @@ struct f2fs_map_blocks {
 #define file_lost_pino(inode)  set_file(inode, FADVISE_LOST_PINO_BIT)
 #define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT)
 #define file_got_pino(inode)   clear_file(inode, FADVISE_LOST_PINO_BIT)
+#define file_is_encrypt(inode) is_file(inode, FADVISE_ENCRYPT_BIT)
+#define file_set_encrypt(inode) set_file(inode, FADVISE_ENCRYPT_BIT)
+#define file_clear_encrypt(inode) clear_file(inode, FADVISE_ENCRYPT_BIT)
+
+/* Encryption algorithms */
+#define F2FS_ENCRYPTION_MODE_INVALID   0
+#define F2FS_ENCRYPTION_MODE_AES_256_XTS   1
+#define F2FS_ENCRYPTION_MODE_AES_256_GCM   2
+#define F2FS_ENCRYPTION_MODE_AES_256_CBC   3
+#define F2FS_ENCRYPTION_MODE_AES_256_CTS   4
 
 #define DEF_DIR_LEVEL  0
 
@@ -380,6 +393,11 @@ struct f2fs_inode_info {
struct radix_tree_root inmem_root;  /* radix tree for inmem pages */
struct list_head inmem_pages;   /* inmemory pages managed by f2fs */
struct mutex inmem_lock;        /* lock for inmemory pages */
+
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   /* Encryption params */
+   struct f2fs_crypt_info *i_crypt_info;
+#endif
 };
 
 static inline void get_extent_info(struct extent_info *ext,
@@ -1891,4 +1909,40 @@ void f2fs_delete_inline_entry(struct f2fs_dir_entry *, struct page *,
struct inode *, struct inode *);
 bool f2fs_empty_inline_dir(struct inode *);
 int f2fs_read_inline_dir(struct file *, struct dir_context *);
+
+/*
+ * crypto support
+ */
+static inline int f2fs_encrypted_inode(struct inode *inode)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   return file_is_encrypt(inode);
+#else
+   return 0;
+#endif
+}
+
+static inline void f2fs_set_encrypted_inode(struct inode *inode)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   file_set_encrypt(inode);
+#endif
+}
+
+static inline bool f2fs_bio_encrypted(struct bio *bio)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   return unlikely(bio->bi_private != NULL);
+#else
+   return false;
+#endif
+}
+
+static inline int f2fs_sb_has_crypto(struct super_block *sb)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_ENCRYPT);
+#else
+   return 0;
+#endif
 #endif
diff --git a/fs/f2fs/f2fs_crypto.h b/fs/f2fs/f2fs_crypto.h
new file mode 100644
index 000..cfc37c1
--- /dev/null
+++ b/fs/f2fs/f2fs_crypto.h
@@ -0,0 +1,149 @@
+/*
+ * linux/fs/f2fs/f2fs_crypto.h
+ *
+ * Copied from linux/fs/ext4/ext4_crypto.h
+ *
+ * Copyright (C) 2015, Google, Inc.
+ *
+ * This contains encryption header content for f2fs
+ *
+ * Written by Michael Halcrow, 2015.
+ * Modified by Jaegeuk Kim, 2015.
+ */
+#ifndef _F2FS_CRYPTO_H
+#define _F2FS_CRYPTO_H
+
+#include 
+
+#define F2FS_KEY_DESCRIPTOR_SIZE   8
+
+/* Policy provided via an ioctl on the topmost directory */
+struct f2fs_encryption_policy {
+   char version;
+   char contents_encryption_mode;
+   char filenames_encryption_mode;
+   char flags;
+   char master_key_descriptor[F2FS_KEY_DESCRIPTOR_SIZE];
+} __attribute__((__packed__));
+
+#define F2FS_ENCRYPTION_CONTEXT_FORMAT_V1  1
+#define F2FS_KEY_DERIVATION_NONCE_SIZE 16
+
+#define F2FS_POLICY_FLAGS_PAD_4    0x00
+#define F2FS_POLICY_FLAGS_PAD_8    0x01
+#define F2FS_POLICY_FLAGS_PAD_16   0x02
+#define F2FS_POLICY_FLAGS_PAD_32   0x03
+#define F2FS_POLICY_FLAGS_PAD_MASK 0x03
+#define F2FS_POLICY_FLAGS_VALID    0x03
+
+/**
+ * Encryption context for inode
+ *
+ * Protector format:
+ *  1 byte: Protector format (1 = this version)
+ *  1 byte: File contents encryption mode
+ *  1 byte: File names encryption mode
+ *  1 byte: Flags
+ *  8 bytes: Master Key descriptor
+ *  16 bytes: Encryption Key derivation nonce
+ */
+struct f2fs_encryption_context {
+   char format;
+

[PATCH 10/18] f2fs crypto: activate encryption support for fs APIs

2015-05-08 Thread Jaegeuk Kim
This patch activates the following APIs for encryption support.

The rules quoted by ext4 are:
 - An unencrypted directory may contain encrypted or unencrypted files
   or directories.
 - All files or directories in a directory must be protected using the
   same key as their containing directory.
 - Encrypted inode for regular file should not have inline_data.
 - Encrypted symlink and directory may have inline_data and inline_dentry.
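
The first two rules can be sketched as a single predicate. This is a
userspace illustration only: struct demo_context is a hypothetical, reduced
stand-in for the on-disk encryption context, and the set of fields compared
is an assumption modelled on the context-vs-policy check in patch 06/18
(key descriptor, flags and both encryption modes).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_KEY_DESCRIPTOR_SIZE 8

/* Hypothetical, reduced view of a per-inode encryption context. */
struct demo_context {
	char contents_mode;
	char filenames_mode;
	char flags;
	char master_key_descriptor[DEMO_KEY_DESCRIPTOR_SIZE];
};

/* All members are char, so the struct is expected to have no padding. */
static_assert(sizeof(struct demo_context) == 3 + DEMO_KEY_DESCRIPTOR_SIZE,
	      "unexpected padding in demo_context");

/*
 * A NULL pointer stands for "not encrypted".
 *  - An unencrypted parent accepts any child.
 *  - An encrypted parent requires an encrypted child with the same key
 *    descriptor, flags and modes.
 */
bool demo_child_consistent_with_parent(const struct demo_context *parent,
				       const struct demo_context *child)
{
	if (parent == NULL)
		return true;
	if (child == NULL)
		return false;

	return memcmp(parent->master_key_descriptor,
		      child->master_key_descriptor,
		      DEMO_KEY_DESCRIPTOR_SIZE) == 0 &&
	       parent->flags == child->flags &&
	       parent->contents_mode == child->contents_mode &&
	       parent->filenames_mode == child->filenames_mode;
}
```

A check of this shape is what "validate context" refers to for link, lookup
and rename in the list that follows.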

This patch activates the following APIs.
1. f2fs_link  : validate context
2. f2fs_lookup:  ''
3. f2fs_rename:  ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO : do buffered io for regular files
6. f2fs_open  : check encryption info
7. f2fs_file_mmap :  ''
8. f2fs_setattr   :  ''
9. f2fs_file_write_iter   :  ''   (Called by sys_io_submit)

Signed-off-by: Michael Halcrow 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c   |  3 +++
 fs/f2fs/dir.c|  6 ++
 fs/f2fs/f2fs.h   | 11 +++
 fs/f2fs/file.c   | 34 --
 fs/f2fs/inline.c |  3 +++
 fs/f2fs/namei.c  | 38 --
 6 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 842fcdd..473b4d4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1982,6 +1982,9 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
return err;
}
 
+   if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
+   return 0;
+
if (check_direct_IO(inode, iter, offset))
return 0;
 
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 9d558d2..f7293a2 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -390,6 +390,12 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
err = f2fs_init_security(inode, dir, name, page);
if (err)
goto put_error;
+
+   if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode)) {
+   err = f2fs_inherit_context(dir, inode, page);
+   if (err)
+   goto put_error;
+   }
} else {
page = get_node_page(F2FS_I_SB(dir), inode->i_ino);
if (IS_ERR(page))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e99205b..544766e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1975,6 +1975,17 @@ static inline int f2fs_sb_has_crypto(struct super_block *sb)
 #endif
 }
 
+static inline bool f2fs_may_encrypt(struct inode *inode)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+   mode_t mode = inode->i_mode;
+
+   return (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode));
+#else
+   return 0;
+#endif
+}
+
 /* crypto_policy.c */
 int f2fs_is_child_context_consistent_with_parent(struct inode *,
struct inode *);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7236be4..9f4b34c 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -408,6 +408,12 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file_inode(file);
 
+   if (f2fs_encrypted_inode(inode)) {
+   int err = f2fs_get_encryption_info(inode);
+   if (err)
+   return 0;
+   }
+
/* we don't need to use inline_data strictly */
if (f2fs_has_inline_data(inode)) {
int err = f2fs_convert_inline_inode(inode);
@@ -420,6 +426,14 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
 }
 
+static int f2fs_file_open(struct inode *inode, struct file *filp)
+{
+   if (f2fs_encrypted_inode(inode) && f2fs_get_encryption_info(inode))
+   return -EACCES;
+
+   return generic_file_open(inode, filp);
+}
+
 int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
 {
int nr_free = 0, ofs = dn->ofs_in_node;
@@ -627,6 +641,10 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
return err;
 
if (attr->ia_valid & ATTR_SIZE) {
+   if (f2fs_encrypted_inode(inode) &&
+   f2fs_get_encryption_info(inode))
+   return -EACCES;
+
if (attr->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attr->ia_size);
f2fs_truncate(inode);
@@ -1466,6 +1484,18 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
 }
 
+static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct inode *inode = file_inode(iocb->ki_filp);
+
+   if (f2fs_encrypted_inode(inode) &&
+   !f2fs_has_encryption_key(inode) &&
+   

[PATCH 07/18] f2fs crypto: add f2fs encryption facilities

2015-05-08 Thread Jaegeuk Kim
Most parts were copied from ext4, except:

 - add f2fs_restore_and_release_control_page which returns control page and
   restore control page
 - remove ext4_encrypted_zeroout()
 - remove sbi->s_file_encryption_mode & sbi->s_dir_encryption_mode
 - add f2fs_end_io_crypto_work for mpage_end_io
 - set num_prealloc_crypto_pages to 128, max size for one bio
 - call f2fs_exit_crypto() in put_super
 - call f2fs_init_crypto() in fill_super to avoid runtime GFP_KERNEL allocation
   in writepage path

Signed-off-by: Michael Halcrow 
Signed-off-by: Ildar Muslukhov 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Makefile |   2 +-
 fs/f2fs/crypto.c | 561 +++
 fs/f2fs/f2fs.h   |  25 +++
 fs/f2fs/super.c  |  19 +-
 4 files changed, 603 insertions(+), 4 deletions(-)
 create mode 100644 fs/f2fs/crypto.c

diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index 7864f4f..a79907b 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -6,4 +6,4 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
 f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
-f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o
+f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o crypto.o
diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
new file mode 100644
index 000..38c005c
--- /dev/null
+++ b/fs/f2fs/crypto.c
@@ -0,0 +1,561 @@
+/*
+ * linux/fs/f2fs/crypto.c
+ *
+ * Copied from linux/fs/ext4/crypto.c
+ *
+ * Copyright (C) 2015, Google, Inc.
+ * Copyright (C) 2015, Motorola Mobility
+ *
+ * This contains encryption functions for f2fs
+ *
+ * Written by Michael Halcrow, 2014.
+ *
+ * Filename encryption additions
+ * Uday Savagaonkar, 2014
+ * Encryption policy handling additions
+ * Ildar Muslukhov, 2014
+ * Remove ext4_encrypted_zeroout(),
+ *   add f2fs_restore_and_release_control_page()
+ * Jaegeuk Kim, 2015.
+ *
+ * This has not yet undergone a rigorous security audit.
+ *
+ * The usage of AES-XTS should conform to recommendations in NIST
+ * Special Publication 800-38E and IEEE P1619/D16.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "f2fs.h"
+#include "xattr.h"
+
+/* Encryption added and removed here! (L: */
+
+static unsigned int num_prealloc_crypto_pages = 128;
+static unsigned int num_prealloc_crypto_ctxs = 128;
+
+module_param(num_prealloc_crypto_pages, uint, 0444);
+MODULE_PARM_DESC(num_prealloc_crypto_pages,
+   "Number of crypto pages to preallocate");
+module_param(num_prealloc_crypto_ctxs, uint, 0444);
+MODULE_PARM_DESC(num_prealloc_crypto_ctxs,
+   "Number of crypto contexts to preallocate");
+
+static mempool_t *f2fs_bounce_page_pool;
+
+static LIST_HEAD(f2fs_free_crypto_ctxs);
+static DEFINE_SPINLOCK(f2fs_crypto_ctx_lock);
+
+static struct workqueue_struct *f2fs_read_workqueue;
+static DEFINE_MUTEX(crypto_init);
+
+/**
+ * f2fs_release_crypto_ctx() - Releases an encryption context
+ * @ctx: The encryption context to release.
+ *
+ * If the encryption context was allocated from the pre-allocated pool, returns
+ * it to that pool. Else, frees it.
+ *
+ * If there's a bounce page in the context, this frees that.
+ */
+void f2fs_release_crypto_ctx(struct f2fs_crypto_ctx *ctx)
+{
+   unsigned long flags;
+
+   if (ctx->bounce_page) {
+   if (ctx->flags & F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL)
+   __free_page(ctx->bounce_page);
+   else
+   mempool_free(ctx->bounce_page, f2fs_bounce_page_pool);
+   ctx->bounce_page = NULL;
+   }
+   ctx->control_page = NULL;
+   if (ctx->flags & F2FS_CTX_REQUIRES_FREE_ENCRYPT_FL) {
+   if (ctx->tfm)
+   crypto_free_tfm(ctx->tfm);
+   kfree(ctx);
+   } else {
+   spin_lock_irqsave(&f2fs_crypto_ctx_lock, flags);
+   list_add(&ctx->free_list, &f2fs_free_crypto_ctxs);
+   spin_unlock_irqrestore(&f2fs_crypto_ctx_lock, flags);
+   }
+}
+
+/**
+ * f2fs_alloc_and_init_crypto_ctx() - Allocates and inits an encryption context
+ * @mask: The allocation mask.
+ *
+ * Return: An allocated and initialized encryption context on success. An error
+ * value or NULL otherwise.
+ */
+static struct f2fs_crypto_ctx *f2fs_alloc_and_init_crypto_ctx(gfp_t mask)
+{
+   struct f2fs_crypto_ctx *ctx = kzalloc(sizeof(struct f2fs_crypto_ctx),
+   mask);
+
+   if (!ctx)
+   return ERR_PTR(-ENOMEM);
+   return ctx;
+}
+
+/**
+ * f2fs_get_crypto_ctx() - Gets an encryption context
+ * @inode:   The inode for which we are doing the crypto
+ *
+ * Allocates and initializes an encryption context.
+ *
+ * Return: An allocated and initialized encryption 

[PATCH 08/18] f2fs crypto: add encryption key management facilities

2015-05-08 Thread Jaegeuk Kim
This patch copies encrypt_key.c from ext4 and modifies it for f2fs.

Use GFP_NOFS, since _f2fs_get_encryption_info is called under f2fs_lock_op.

Signed-off-by: Michael Halcrow 
Signed-off-by: Ildar Muslukhov 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Makefile  |   2 +-
 fs/f2fs/crypto_key.c  | 200 ++
 fs/f2fs/f2fs.h|  22 ++
 fs/f2fs/f2fs_crypto.h |   3 +
 4 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 fs/f2fs/crypto_key.c

diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index a79907b..b08925d 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -6,4 +6,4 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
 f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
-f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o crypto.o
+f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o crypto.o crypto_key.o
diff --git a/fs/f2fs/crypto_key.c b/fs/f2fs/crypto_key.c
new file mode 100644
index 000..aec7e17
--- /dev/null
+++ b/fs/f2fs/crypto_key.c
@@ -0,0 +1,200 @@
+/*
+ * linux/fs/f2fs/crypto_key.c
+ *
+ * Copied from linux/fs/f2fs/crypto_key.c
+ *
+ * Copyright (C) 2015, Google, Inc.
+ *
+ * This contains encryption key functions for f2fs
+ *
+ * Written by Michael Halcrow, Ildar Muslukhov, and Uday Savagaonkar, 2015.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "f2fs.h"
+#include "xattr.h"
+
+static void derive_crypt_complete(struct crypto_async_request *req, int rc)
+{
+   struct f2fs_completion_result *ecr = req->data;
+
+   if (rc == -EINPROGRESS)
+   return;
+
+   ecr->res = rc;
+   complete(&ecr->completion);
+}
+
+/**
+ * f2fs_derive_key_aes() - Derive a key using AES-128-ECB
+ * @deriving_key: Encryption key used for derivation.
+ * @source_key:   Source key to which to apply derivation.
+ * @derived_key:  Derived key.
+ *
+ * Return: Zero on success; non-zero otherwise.
+ */
+static int f2fs_derive_key_aes(char deriving_key[F2FS_AES_128_ECB_KEY_SIZE],
+   char source_key[F2FS_AES_256_XTS_KEY_SIZE],
+   char derived_key[F2FS_AES_256_XTS_KEY_SIZE])
+{
+   int res = 0;
+   struct ablkcipher_request *req = NULL;
+   DECLARE_F2FS_COMPLETION_RESULT(ecr);
+   struct scatterlist src_sg, dst_sg;
+   struct crypto_ablkcipher *tfm = crypto_alloc_ablkcipher("ecb(aes)", 0,
+   0);
+
+   if (IS_ERR(tfm)) {
+   res = PTR_ERR(tfm);
+   tfm = NULL;
+   goto out;
+   }
+   crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_REQ_WEAK_KEY);
+   req = ablkcipher_request_alloc(tfm, GFP_NOFS);
+   if (!req) {
+   res = -ENOMEM;
+   goto out;
+   }
+   ablkcipher_request_set_callback(req,
+   CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+   derive_crypt_complete, &ecr);
+   res = crypto_ablkcipher_setkey(tfm, deriving_key,
+   F2FS_AES_128_ECB_KEY_SIZE);
+   if (res < 0)
+   goto out;
+
+   sg_init_one(&src_sg, source_key, F2FS_AES_256_XTS_KEY_SIZE);
+   sg_init_one(&dst_sg, derived_key, F2FS_AES_256_XTS_KEY_SIZE);
+   ablkcipher_request_set_crypt(req, &src_sg, &dst_sg,
+   F2FS_AES_256_XTS_KEY_SIZE, NULL);
+   res = crypto_ablkcipher_encrypt(req);
+   if (res == -EINPROGRESS || res == -EBUSY) {
+   BUG_ON(req->base.data != &ecr);
+   wait_for_completion(&ecr.completion);
+   res = ecr.res;
+   }
+out:
+   if (req)
+   ablkcipher_request_free(req);
+   if (tfm)
+   crypto_free_ablkcipher(tfm);
+   return res;
+}
+
+void f2fs_free_encryption_info(struct inode *inode)
+{
+   struct f2fs_inode_info *fi = F2FS_I(inode);
+   struct f2fs_crypt_info *ci = fi->i_crypt_info;
+
+   if (!ci)
+   return;
+
+   if (ci->ci_keyring_key)
+   key_put(ci->ci_keyring_key);
+   crypto_free_ablkcipher(ci->ci_ctfm);
+   memzero_explicit(&ci->ci_raw, sizeof(ci->ci_raw));
+   kfree(ci);
+   fi->i_crypt_info = NULL;
+}
+
+int _f2fs_get_encryption_info(struct inode *inode)
+{
+   struct f2fs_inode_info *fi = F2FS_I(inode);
+   struct f2fs_crypt_info *crypt_info;
+   char full_key_descriptor[F2FS_KEY_DESC_PREFIX_SIZE +
+   (F2FS_KEY_DESCRIPTOR_SIZE * 2) + 1];
+   struct key *keyring_key = NULL;
+   struct f2fs_encryption_key *master_key;
+   struct f2fs_encryption_context ctx;
+   struct user_key_payload *ukp;
+   int res;
+
+   if (fi->i_crypt_info) {
+   if (!fi->i_crypt_info->ci_keyring_key ||
+   key_validate(fi->i_crypt_info->ci_keyring_key) == 0)
+
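
For readers following f2fs_derive_key_aes() in the patch above: the encrypt request may finish asynchronously, so the callback ignores the intermediate -EINPROGRESS notification and records only the final status. Below is a minimal user-space sketch of that control flow, not kernel code. `struct completion_result`, its `done` flag, and `derive_complete()` are hypothetical stand-ins for f2fs_completion_result, the kernel's struct completion, and derive_crypt_complete().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct f2fs_completion_result. */
struct completion_result {
	int done;	/* models complete(): set once the final status arrives */
	int res;	/* final status of the request */
};

/* Mirrors the shape of derive_crypt_complete(): skip the "still queued" notice. */
static void derive_complete(struct completion_result *ecr, int rc)
{
	assert(ecr != NULL);
	if (rc == -EINPROGRESS)
		return;		/* request was only backlogged, keep waiting */
	ecr->res = rc;		/* record the final status for the waiter */
	ecr->done = 1;
}
```

In the patch itself the waiter blocks in wait_for_completion() and then reads ecr.res; the sketch only models which callback invocations wake it.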

[PATCH 04/18] f2fs crypto: add f2fs encryption Kconfig

2015-05-08 Thread Jaegeuk Kim
This patch adds f2fs encryption config.

Signed-off-by: Michael Halcrow 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
index 05f0f66..28f21fe 100644
--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -72,6 +72,24 @@ config F2FS_CHECK_FS
 
  If you want to improve the performance, say N.
 
+config F2FS_FS_ENCRYPTION
+   bool "F2FS Encryption"
+   depends on F2FS_FS
+   depends on F2FS_FS_XATTR
+   select CRYPTO_AES
+   select CRYPTO_CBC
+   select CRYPTO_ECB
+   select CRYPTO_XTS
+   select CRYPTO_CTS
+   select CRYPTO_SHA256
+   select KEYS
+   select ENCRYPTED_KEYS
+   help
+ Enable encryption of f2fs files and directories.  This
+ feature is similar to ecryptfs, but it is more memory
+ efficient since it avoids caching the encrypted and
+ decrypted pages in the page cache.
+
 config F2FS_IO_TRACE
bool "F2FS IO tracer"
depends on F2FS_FS
-- 
2.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[git pull] vfs fixes for -rc3

2015-05-08 Thread Al Viro
A couple of fixes for bugs caught while digging in fs/namei.c.
The first one is a regression from this cycle; the second affects 3.11 and later.
Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (2):
  namei: d_is_negative() should be checked before ->d_seq validation
  path_openat(): fix double fput()

Diffstat:
 fs/namei.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)


[PATCH 12/18] f2fs crypto: add filename encryption for f2fs_add_link

2015-05-08 Thread Jaegeuk Kim
This patch adds filename encryption support for f2fs_add_link.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index f7293a2..750a688 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -507,24 +507,33 @@ int __f2fs_add_link(struct inode *dir, const struct qstr 
*name,
unsigned long bidx, block;
f2fs_hash_t dentry_hash;
unsigned int nbucket, nblock;
-   size_t namelen = name->len;
struct page *dentry_page = NULL;
struct f2fs_dentry_block *dentry_blk = NULL;
struct f2fs_dentry_ptr d;
-   int slots = GET_DENTRY_SLOTS(namelen);
struct page *page = NULL;
-   int err = 0;
+   struct f2fs_filename fname;
+   struct qstr new_name;
+   int slots, err;
+
+   err = f2fs_fname_setup_filename(dir, name, 0, &fname);
+   if (err)
+   return err;
+
+   new_name.name = fname_name(&fname);
+   new_name.len = fname_len(&fname);
 
if (f2fs_has_inline_dentry(dir)) {
-   err = f2fs_add_inline_entry(dir, name, inode, ino, mode);
+   err = f2fs_add_inline_entry(dir, &new_name, inode, ino, mode);
if (!err || err != -EAGAIN)
-   return err;
+   goto out;
else
err = 0;
}
 
-   dentry_hash = f2fs_dentry_hash(name);
level = 0;
+   slots = GET_DENTRY_SLOTS(new_name.len);
+   dentry_hash = f2fs_dentry_hash(&new_name);
+
current_depth = F2FS_I(dir)->i_current_depth;
if (F2FS_I(dir)->chash == dentry_hash) {
level = F2FS_I(dir)->clevel;
@@ -532,8 +541,10 @@ int __f2fs_add_link(struct inode *dir, const struct qstr 
*name,
}
 
 start:
-   if (unlikely(current_depth == MAX_DIR_HASH_DEPTH))
-   return -ENOSPC;
+   if (unlikely(current_depth == MAX_DIR_HASH_DEPTH)) {
+   err = -ENOSPC;
+   goto out;
+   }
 
/* Increase the depth, if required */
if (level == current_depth)
@@ -547,8 +558,10 @@ start:
 
for (block = bidx; block <= (bidx + nblock - 1); block++) {
dentry_page = get_new_data_page(dir, NULL, block, true);
-   if (IS_ERR(dentry_page))
-   return PTR_ERR(dentry_page);
+   if (IS_ERR(dentry_page)) {
+   err = PTR_ERR(dentry_page);
+   goto out;
+   }
 
dentry_blk = kmap(dentry_page);
bit_pos = room_for_filename(&dentry_blk->dentry_bitmap,
@@ -568,7 +581,7 @@ add_dentry:
 
if (inode) {
down_write(&F2FS_I(inode)->i_sem);
-   page = init_inode_metadata(inode, dir, name, NULL);
+   page = init_inode_metadata(inode, dir, &new_name, NULL);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
@@ -576,7 +589,7 @@ add_dentry:
}
 
make_dentry_ptr(&d, (void *)dentry_blk, 1);
-   f2fs_update_dentry(ino, mode, &d, name, dentry_hash, bit_pos);
+   f2fs_update_dentry(ino, mode, &d, &new_name, dentry_hash, bit_pos);
 
set_page_dirty(dentry_page);
 
@@ -598,6 +611,8 @@ fail:
}
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
+out:
+   f2fs_fname_free_filename(&fname);
return err;
 }
 
-- 
2.1.1
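
The structural change in this patch is that every early `return` after f2fs_fname_setup_filename() becomes `goto out`, so the filename is released on all paths. A minimal user-space sketch of that single-exit pattern follows. `setup_name()`, `free_name()`, `add_link()`, the `fail_step` parameter and the `live_buffers` counter are all hypothetical and exist only to make the cleanup observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int live_buffers;	/* counts name buffers not yet freed */

/* Hypothetical helper: stands in for f2fs_fname_setup_filename(). */
static char *setup_name(const char *name)
{
	size_t len = strlen(name) + 1;
	char *buf = malloc(len);

	if (buf) {
		memcpy(buf, name, len);
		live_buffers++;
	}
	return buf;
}

/* Hypothetical helper: stands in for f2fs_fname_free_filename(). */
static void free_name(char *buf)
{
	free(buf);
	live_buffers--;
}

/* fail_step selects which later step fails; 0 means no failure. */
static int add_link(const char *name, int fail_step)
{
	char *disk_name;
	int err = 0;

	assert(name != NULL);
	disk_name = setup_name(name);
	if (!disk_name)
		return -ENOMEM;	/* nothing to release yet */

	if (fail_step == 1) {
		err = -ENOSPC;	/* e.g. maximum hash depth reached */
		goto out;
	}
	if (fail_step == 2) {
		err = -EIO;	/* e.g. failure to get a dentry page */
		goto out;
	}
out:
	free_name(disk_name);	/* every path after setup releases the name */
	return err;
}
```

The only direct `return` left is the one before any resource is held, which matches the shape of the hunk above.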



[PATCH 14/18] f2fs crypto: add filename encryption for f2fs_lookup

2015-05-08 Thread Jaegeuk Kim
This patch implements filename encryption support for f2fs_lookup.

Note that f2fs_find_entry should be called outside of f2fs_(un)lock_op().

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c| 79 
 fs/f2fs/f2fs.h   |  9 ---
 fs/f2fs/inline.c |  9 ---
 3 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index ab6455d..5e10d9d 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -76,20 +76,10 @@ static unsigned long dir_block_index(unsigned int level,
return bidx;
 }
 
-static bool early_match_name(size_t namelen, f2fs_hash_t namehash,
-   struct f2fs_dir_entry *de)
-{
-   if (le16_to_cpu(de->name_len) != namelen)
-   return false;
-
-   if (de->hash_code != namehash)
-   return false;
-
-   return true;
-}
-
 static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
-   struct qstr *name, int *max_slots,
+   struct f2fs_filename *fname,
+   f2fs_hash_t namehash,
+   int *max_slots,
struct page **res_page)
 {
struct f2fs_dentry_block *dentry_blk;
@@ -99,8 +89,7 @@ static struct f2fs_dir_entry *find_in_block(struct page 
*dentry_page,
dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page);
 
make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1);
-   de = find_target_dentry(name, max_slots, &d);
-
+   de = find_target_dentry(fname, namehash, max_slots, &d);
if (de)
*res_page = dentry_page;
else
@@ -114,13 +103,15 @@ static struct f2fs_dir_entry *find_in_block(struct page 
*dentry_page,
return de;
 }
 
-struct f2fs_dir_entry *find_target_dentry(struct qstr *name, int *max_slots,
-   struct f2fs_dentry_ptr *d)
+struct f2fs_dir_entry *find_target_dentry(struct f2fs_filename *fname,
+   f2fs_hash_t namehash, int *max_slots,
+   struct f2fs_dentry_ptr *d)
 {
struct f2fs_dir_entry *de;
unsigned long bit_pos = 0;
-   f2fs_hash_t namehash = f2fs_dentry_hash(name);
int max_len = 0;
+   struct f2fs_str de_name = FSTR_INIT(NULL, 0);
+   struct f2fs_str *name = &fname->disk_name;
 
if (max_slots)
*max_slots = 0;
@@ -132,8 +123,18 @@ struct f2fs_dir_entry *find_target_dentry(struct qstr 
*name, int *max_slots,
}
 
de = &d->dentry[bit_pos];
-   if (early_match_name(name->len, namehash, de) &&
-   !memcmp(d->filename[bit_pos], name->name, name->len))
+
+   /* encrypted case */
+   de_name.name = d->filename[bit_pos];
+   de_name.len = le16_to_cpu(de->name_len);
+
+   /* show encrypted name */
+   if (fname->hash) {
+   if (de->hash_code == fname->hash)
+   goto found;
+   } else if (de_name.len == name->len &&
+   de->hash_code == namehash &&
+   !memcmp(de_name.name, name->name, name->len))
goto found;
 
if (max_slots && max_len > *max_slots)
@@ -155,16 +156,21 @@ found:
 }
 
 static struct f2fs_dir_entry *find_in_level(struct inode *dir,
-   unsigned int level, struct qstr *name,
-   f2fs_hash_t namehash, struct page **res_page)
+   unsigned int level,
+   struct f2fs_filename *fname,
+   struct page **res_page)
 {
-   int s = GET_DENTRY_SLOTS(name->len);
+   struct qstr name = FSTR_TO_QSTR(&fname->disk_name);
+   int s = GET_DENTRY_SLOTS(name.len);
unsigned int nbucket, nblock;
unsigned int bidx, end_block;
struct page *dentry_page;
struct f2fs_dir_entry *de = NULL;
bool room = false;
int max_slots;
+   f2fs_hash_t namehash;
+
+   namehash = f2fs_dentry_hash(&name);
 
f2fs_bug_on(F2FS_I_SB(dir), level > MAX_DIR_HASH_DEPTH);
 
@@ -183,7 +189,8 @@ static struct f2fs_dir_entry *find_in_level(struct inode 
*dir,
continue;
}
 
-   de = find_in_block(dentry_page, name, &max_slots, res_page);
+   de = find_in_block(dentry_page, fname, namehash, &max_slots,
+   res_page);
if (de)
break;
 
@@ -211,30 +218,34 @@ struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
 {
unsigned long npages = dir_blocks(dir);
struct f2fs_dir_entry *de = NULL;
-   f2fs_hash_t name_hash;
unsigned int max_depth;
unsigned int level;
+   struct f2fs_filename fname;
+   
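
The new comparison in find_target_dentry() has two branches: when fname->hash is set, an entry matches on hash code alone; otherwise length, hash and bytes must all agree. A minimal user-space sketch of that rule is below. The struct layouts are simplified assumptions, not the real f2fs_filename or f2fs_dir_entry, and `entry_matches()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct f2fs_filename. */
struct name_query {
	uint32_t hash;		/* non-zero: match by hash code only */
	const char *disk_name;	/* on-disk form of the name, used when hash == 0 */
	size_t len;
};

/* Hypothetical, simplified stand-in for a directory entry plus its name. */
struct dir_entry {
	uint32_t hash_code;
	const char *name;
	size_t name_len;
};

/* Returns 1 when the entry matches, following the two branches in the patch. */
static int entry_matches(const struct name_query *q, uint32_t namehash,
			 const struct dir_entry *de)
{
	assert(q != NULL && de != NULL);
	if (q->hash)
		return de->hash_code == q->hash;
	return de->name_len == q->len &&
	       de->hash_code == namehash &&
	       memcmp(de->name, q->disk_name, q->len) == 0;
}
```

Note that the hash-only branch never looks at the stored name bytes, so in this sketch two entries sharing a hash code would both match.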

[PATCH 11/18] f2fs crypto: add encryption support in read/write paths

2015-05-08 Thread Jaegeuk Kim
This patch adds encryption support in read and write paths.

Note that, in f2fs, we need to consider the cleaning operation.
In the cleaning procedure, we must avoid encrypting and decrypting written blocks.
So, this patch implements move_encrypted_block().

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  4 ++-
 fs/f2fs/data.c   | 96 
 fs/f2fs/f2fs.h   |  1 +
 fs/f2fs/file.c   |  2 +-
 fs/f2fs/gc.c | 79 +-
 fs/f2fs/inline.c |  1 +
 fs/f2fs/node.c   |  2 ++
 fs/f2fs/segment.c| 24 ++---
 8 files changed, 180 insertions(+), 29 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 1da20a6..98c31db 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -56,6 +56,7 @@ struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t 
index)
.type = META,
.rw = READ_SYNC | REQ_META | REQ_PRIO,
.blk_addr = index,
+   .encrypted_page = NULL,
};
 repeat:
page = grab_cache_page(mapping, index);
@@ -122,7 +123,8 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, 
int nrpages, int type
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
-   .rw = READ_SYNC | REQ_META | REQ_PRIO
+   .rw = READ_SYNC | REQ_META | REQ_PRIO,
+   .encrypted_page = NULL,
};
 
for (; nrpages-- > 0; blkno++) {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 473b4d4..deb6b69 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -57,6 +57,15 @@ static void mpage_end_io(struct bio *bio, int err)
struct bio_vec *bv;
int i;
 
+   if (f2fs_bio_encrypted(bio)) {
+   if (err) {
+   f2fs_release_crypto_ctx(bio->bi_private);
+   } else {
+   f2fs_end_io_crypto_work(bio->bi_private, bio);
+   return;
+   }
+   }
+
bio_for_each_segment_all(bv, bio, i) {
struct page *page = bv->bv_page;
 
@@ -81,6 +90,8 @@ static void f2fs_write_end_io(struct bio *bio, int err)
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
 
+   f2fs_restore_and_release_control_page(&page);
+
if (unlikely(err)) {
set_page_dirty(page);
set_bit(AS_EIO, &page->mapping->flags);
@@ -161,7 +172,7 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
 int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 {
struct bio *bio;
-   struct page *page = fio->page;
+   struct page *page = fio->encrypted_page ? fio->encrypted_page : fio->page;
 
trace_f2fs_submit_page_bio(page, fio);
f2fs_trace_ios(fio, 0);
@@ -185,6 +196,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io;
bool is_read = is_read_io(fio->rw);
+   struct page *bio_page;
 
io = is_read ? &sbi->read_io : &sbi->write_io[btype];
 
@@ -206,7 +218,9 @@ alloc_new:
io->fio = *fio;
}
 
-   if (bio_add_page(io->bio, fio->page, PAGE_CACHE_SIZE, 0) <
+   bio_page = fio->encrypted_page ? fio->encrypted_page : fio->page;
+
+   if (bio_add_page(io->bio, bio_page, PAGE_CACHE_SIZE, 0) <
PAGE_CACHE_SIZE) {
__submit_merged_bio(io);
goto alloc_new;
@@ -928,8 +942,12 @@ struct page *get_read_data_page(struct inode *inode, 
pgoff_t index, int rw)
.sbi = F2FS_I_SB(inode),
.type = DATA,
.rw = rw,
+   .encrypted_page = NULL,
};
 
+   if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
+   return read_mapping_page(mapping, index, NULL);
+
page = grab_cache_page(mapping, index);
if (!page)
return ERR_PTR(-ENOMEM);
@@ -1066,26 +1084,14 @@ repeat:
zero_user_segment(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page);
} else {
-   struct f2fs_io_info fio = {
-   .sbi = F2FS_I_SB(inode),
-   .type = DATA,
-   .rw = READ_SYNC,
-   .blk_addr = dn.data_blkaddr,
-   .page = page,
-   };
-   err = f2fs_submit_page_bio(&fio);
-   if (err)
-   return ERR_PTR(err);
+   f2fs_put_page(page, 1);
 
-   lock_page(page);
-   if (unlikely(!PageUptodate(page))) {
-   f2fs_put_page(page, 1);
-   return ERR_PTR(-EIO);
-   }
-   if (unlikely(page->mapping != mapping)) {
-   f2fs_put_page(page, 1);
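
Both f2fs_submit_page_bio() and f2fs_submit_page_mbio() in the hunks above pick the page to submit the same way: use fio->encrypted_page when it is set, otherwise fall back to fio->page. A minimal user-space sketch of that selection follows. `struct page` and `struct io_info` here are hypothetical stand-ins, and `bio_page()` is an illustrative name rather than a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct page { int id; };	/* hypothetical stand-in for the kernel's struct page */

/* Hypothetical, reduced view of struct f2fs_io_info. */
struct io_info {
	struct page *page;		/* page handed in by the caller */
	struct page *encrypted_page;	/* separately prepared page, or NULL */
};

/* Prefer the encrypted page when one was attached to the request. */
static struct page *bio_page(const struct io_info *fio)
{
	assert(fio != NULL);
	return fio->encrypted_page ? fio->encrypted_page : fio->page;
}
```

This is also why the meta-page initializers in checkpoint.c set `.encrypted_page = NULL` explicitly: with no encrypted page attached, the original page is the one submitted.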

[PATCH 15/18] f2fs crypto: add filename encryption for roll-forward recovery

2015-05-08 Thread Jaegeuk Kim
This patch adds a bit flag to indicate whether or not i_name in the inode
is encrypted.

If the name is encrypted, we can't do recover_dentry during roll-forward.
So, f2fs_sync_file() needs to do a checkpoint if recover_dentry would be needed later.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c  |  8 +++-
 fs/f2fs/f2fs.h |  5 -
 fs/f2fs/file.c |  4 +++-
 fs/f2fs/namei.c| 20 +++-
 fs/f2fs/recovery.c | 13 -
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 5e10d9d..12f6869 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -314,10 +314,14 @@ static void init_dent_inode(const struct qstr *name, 
struct page *ipage)
set_page_dirty(ipage);
 }
 
-int update_dent_inode(struct inode *inode, const struct qstr *name)
+int update_dent_inode(struct inode *inode, struct inode *to,
+   const struct qstr *name)
 {
struct page *page;
 
+   if (file_enc_name(to))
+   return 0;
+
page = get_node_page(F2FS_I_SB(inode), inode->i_ino);
if (IS_ERR(page))
return PTR_ERR(page);
@@ -597,6 +601,8 @@ add_dentry:
err = PTR_ERR(page);
goto fail;
}
+   if (f2fs_encrypted_inode(dir))
+   file_set_enc_name(inode);
}
 
make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6898331..fda040b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -378,6 +378,7 @@ struct f2fs_map_blocks {
 #define FADVISE_COLD_BIT   0x01
 #define FADVISE_LOST_PINO_BIT  0x02
 #define FADVISE_ENCRYPT_BIT0x04
+#define FADVISE_ENC_NAME_BIT   0x08
 
 #define file_is_cold(inode)is_file(inode, FADVISE_COLD_BIT)
 #define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
@@ -388,6 +389,8 @@ struct f2fs_map_blocks {
 #define file_is_encrypt(inode) is_file(inode, FADVISE_ENCRYPT_BIT)
 #define file_set_encrypt(inode)set_file(inode, FADVISE_ENCRYPT_BIT)
 #define file_clear_encrypt(inode) clear_file(inode, FADVISE_ENCRYPT_BIT)
+#define file_enc_name(inode)   is_file(inode, FADVISE_ENC_NAME_BIT)
+#define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT)
 
 /* Encryption algorithms */
 #define F2FS_ENCRYPTION_MODE_INVALID   0
@@ -1602,7 +1605,7 @@ struct f2fs_dir_entry *f2fs_parent_dir(struct inode *, 
struct page **);
 ino_t f2fs_inode_by_name(struct inode *, struct qstr *);
 void f2fs_set_link(struct inode *, struct f2fs_dir_entry *,
struct page *, struct inode *);
-int update_dent_inode(struct inode *, const struct qstr *);
+int update_dent_inode(struct inode *, struct inode *, const struct qstr *);
 void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *,
const struct qstr *, f2fs_hash_t , unsigned int);
 int __f2fs_add_link(struct inode *, const struct qstr *, struct inode *, nid_t,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index d7daff8..14eb4f7 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -106,7 +106,7 @@ static int get_parent_ino(struct inode *inode, nid_t *pino)
if (!dentry)
return 0;
 
-   if (update_dent_inode(inode, &dentry->d_name)) {
+   if (update_dent_inode(inode, inode, &dentry->d_name)) {
dput(dentry);
return 0;
}
@@ -123,6 +123,8 @@ static inline bool need_do_checkpoint(struct inode *inode)
 
if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
need_cp = true;
+   else if (file_enc_name(inode) && need_dentry_mark(sbi, inode->i_ino))
+   need_cp = true;
else if (file_wrong_pino(inode))
need_cp = true;
else if (!space_for_roll_forward(sbi))
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index bc8992e..c857f82 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -517,7 +517,8 @@ static int f2fs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
if (err)
goto put_out_dir;
 
-   if (update_dent_inode(old_inode, &new_dentry->d_name)) {
+   if (update_dent_inode(old_inode, new_inode,
+   &new_dentry->d_name)) {
release_orphan_inode(sbi);
goto put_out_dir;
}
@@ -557,6 +558,8 @@ static int f2fs_rename(struct inode *old_dir, struct dentry 
*old_dentry,
 
down_write(&F2FS_I(old_inode)->i_sem);
file_lost_pino(old_inode);
+   if (new_inode && file_enc_name(new_inode))
+   file_set_enc_name(old_inode);
up_write(&F2FS_I(old_inode)->i_sem);
 
old_inode->i_ctime = CURRENT_TIME;
@@ -659,13 +662,17 @@ static int f2fs_cross_rename(struct inode *old_dir, 
struct dentry *old_dentry,
 
f2fs_lock_op(sbi);
 
-   err = update_dent_inode(old_inode, 
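
The new FADVISE_ENC_NAME_BIT is one more bit in the per-inode advise byte, tested and set through the same is_file()/set_file() helpers as the existing bits. A minimal user-space sketch of that bit handling is below. The bit values are taken from the hunk above, while `struct inode_model` and the bodies of `is_file()` and `set_file()` are assumptions, since the real helpers are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define FADVISE_COLD_BIT	0x01
#define FADVISE_LOST_PINO_BIT	0x02
#define FADVISE_ENCRYPT_BIT	0x04
#define FADVISE_ENC_NAME_BIT	0x08

/* Hypothetical stand-in for the advise byte kept in the f2fs inode. */
struct inode_model {
	uint8_t i_advise;
};

/* Assumed behaviour: test one advise bit. */
static int is_file(const struct inode_model *inode, uint8_t type)
{
	return (inode->i_advise & type) != 0;
}

/* Assumed behaviour: set one advise bit without touching the others. */
static void set_file(struct inode_model *inode, uint8_t type)
{
	inode->i_advise |= type;
}

#define file_is_encrypt(inode)		is_file(inode, FADVISE_ENCRYPT_BIT)
#define file_enc_name(inode)		is_file(inode, FADVISE_ENC_NAME_BIT)
#define file_set_enc_name(inode)	set_file(inode, FADVISE_ENC_NAME_BIT)
```

Because each flag has its own bit, marking the name as encrypted leaves the cold, lost-pino and encrypt bits untouched.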

[PATCH 16/18] f2fs crypto: add symlink encryption

2015-05-08 Thread Jaegeuk Kim
This patch implements encryption support for symlink.

The code refers to the ext4 symlink path.

Signed-off-by: Uday Savagaonkar 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs_crypto.h |   2 -
 fs/f2fs/namei.c   | 138 --
 2 files changed, 135 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/f2fs_crypto.h b/fs/f2fs/f2fs_crypto.h
index bad32e6..6e41394 100644
--- a/fs/f2fs/f2fs_crypto.h
+++ b/fs/f2fs/f2fs_crypto.h
@@ -145,8 +145,6 @@ struct f2fs_encrypted_symlink_data {
  */
 static inline u32 encrypted_symlink_data_len(u32 l)
 {
-   if (l < F2FS_CRYPTO_BLOCK_SIZE)
-   l = F2FS_CRYPTO_BLOCK_SIZE;
return (l + sizeof(struct f2fs_encrypted_symlink_data) - 1);
 }
 #endif /* _F2FS_CRYPTO_H */
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index c857f82..e6a6310 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -333,12 +333,102 @@ static void *f2fs_follow_link(struct dentry *dentry, 
struct nameidata *nd)
return page;
 }
 
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+static void *f2fs_encrypted_follow_link(struct dentry *dentry,
+   struct nameidata *nd)
+{
+   struct page *cpage = NULL;
+   char *caddr, *paddr = NULL;
+   struct f2fs_str cstr, pstr;
+   struct inode *inode = d_inode(dentry);
+   struct f2fs_encrypted_symlink_data *sd;
+   loff_t size = min_t(loff_t, i_size_read(inode), PAGE_SIZE - 1);
+   int res;
+   u32 max_size = inode->i_sb->s_blocksize;
+
+   if (!f2fs_encrypted_inode(inode))
+   return f2fs_follow_link(dentry, nd);
+
+   res = f2fs_setup_fname_crypto(inode);
+   if (res)
+   return ERR_PTR(res);
+
+   cpage = read_mapping_page(inode->i_mapping, 0, NULL);
+   if (IS_ERR(cpage))
+   return cpage;
+   caddr = kmap(cpage);
+   caddr[size] = 0;
+
+   /* Symlink is encrypted */
+   sd = (struct f2fs_encrypted_symlink_data *)caddr;
+   cstr.name = sd->encrypted_path;
+   cstr.len = le16_to_cpu(sd->len);
+
+   /* this is broken symlink case */
+   if (cstr.name[0] == 0 && cstr.len == 0) {
+   res = -ENOENT;
+   goto errout;
+   }
+
+   if ((cstr.len + sizeof(struct f2fs_encrypted_symlink_data) - 1) >
+   max_size) {
+   /* Symlink data on the disk is corrupted */
+   res = -EIO;
+   goto errout;
+   }
+   paddr = kmalloc(cstr.len + 1, GFP_NOFS);
+   if (!paddr) {
+   res = -ENOMEM;
+   goto errout;
+   }
+   pstr.name = paddr;
+   pstr.len = cstr.len;
+   res = f2fs_fname_disk_to_usr(inode, NULL, &cstr, &pstr);
+   if (res < 0)
+   goto errout;
+
+   /* Null-terminate the name */
+   if (res <= cstr.len)
+   paddr[res] = '\0';
+   nd_set_link(nd, paddr);
+   if (cpage) {
+   kunmap(cpage);
+   page_cache_release(cpage);
+   }
+   return NULL;
+errout:
+   if (cpage) {
+   kunmap(cpage);
+   page_cache_release(cpage);
+   }
+   kfree(paddr);
+   return ERR_PTR(res);
+}
+
+static void f2fs_encrypted_put_link(struct dentry *dentry, struct nameidata *nd,
+ void *cookie)
+{
+   struct page *page = cookie;
+
+   if (!page) {
+   kfree(nd_get_link(nd));
+   } else {
+   kunmap(page);
+   page_cache_release(page);
+   }
+}
+#endif
+
 static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
const char *symname)
 {
struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
struct inode *inode;
-   size_t symlen = strlen(symname) + 1;
+   size_t len = strlen(symname);
+   size_t p_len;
+   char *p_str;
+   struct f2fs_str disk_link = FSTR_INIT(NULL, 0);
+   struct f2fs_encrypted_symlink_data *sd = NULL;
int err;
 
f2fs_balance_fs(sbi);
@@ -356,7 +446,40 @@ static int f2fs_symlink(struct inode *dir, struct dentry 
*dentry,
goto out;
f2fs_unlock_op(sbi);
 
-   err = page_symlink(inode, symname, symlen);
+   if (f2fs_encrypted_inode(dir)) {
+   struct qstr istr = QSTR_INIT(symname, len);
+
+   err = f2fs_inherit_context(dir, inode, NULL);
+   if (err)
+   goto out;
+
+   err = f2fs_setup_fname_crypto(inode);
+   if (err)
+   goto out;
+
+   err = f2fs_fname_crypto_alloc_buffer(inode, len, &disk_link);
+   if (err)
+   goto out;
+
+   err = f2fs_fname_usr_to_disk(inode, &istr, &disk_link);
+   if (err < 0)
+   goto out;
+
+   p_len = encrypted_symlink_data_len(disk_link.len) + 1;
+  
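
For reference, the size arithmetic used by encrypted_symlink_data_len() and by the corruption check in f2fs_encrypted_follow_link() can be sketched in user space as follows. The struct layout is an assumption (a 16-bit length followed by a one-byte array placeholder, packed); the real definition lives in f2fs_crypto.h and is not shown in this thread. `symlink_len_ok()` is a hypothetical helper that restates the `> max_size` test from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the on-disk symlink payload header. */
struct symlink_data {
	uint16_t len;			/* length of the encrypted path */
	char encrypted_path[1];		/* first byte of the encrypted path */
} __attribute__((__packed__));

/* Same arithmetic as encrypted_symlink_data_len() after this patch. */
static uint32_t symlink_data_len(uint32_t l)
{
	return l + sizeof(struct symlink_data) - 1;
}

/* Returns 1 if the stored length still fits within max_size bytes. */
static int symlink_len_ok(uint32_t stored_len, uint32_t max_size)
{
	assert(max_size > 0);
	return symlink_data_len(stored_len) <= max_size;
}
```

The `- 1` accounts for the placeholder byte already counted in sizeof(), so a path of `l` bytes occupies the header plus `l` bytes and nothing more.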

[PATCH 13/18] f2fs crypto: add filename encryption for f2fs_readdir

2015-05-08 Thread Jaegeuk Kim
This patch implements filename encryption support for f2fs_readdir.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c| 57 
 fs/f2fs/f2fs.h   | 12 
 fs/f2fs/inline.c | 13 +++--
 3 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 750a688..ab6455d 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -98,7 +98,7 @@ static struct f2fs_dir_entry *find_in_block(struct page 
*dentry_page,
 
dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page);
 
-   make_dentry_ptr(&d, (void *)dentry_blk, 1);
+   make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1);
de = find_target_dentry(name, max_slots, &d);
 
if (de)
@@ -356,7 +356,7 @@ static int make_empty_dir(struct inode *inode,
 
dentry_blk = kmap_atomic(dentry_page);
 
-   make_dentry_ptr(&d, (void *)dentry_blk, 1);
+   make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1);
do_make_empty_dir(inode, parent, &d);
 
kunmap_atomic(dentry_blk);
@@ -588,7 +588,7 @@ add_dentry:
}
}
 
-   make_dentry_ptr(&d, (void *)dentry_blk, 1);
+   make_dentry_ptr(NULL, &d, (void *)dentry_blk, 1);
f2fs_update_dentry(ino, mode, &d, &new_name, dentry_hash, bit_pos);
 
set_page_dirty(dentry_page);
@@ -750,11 +750,12 @@ bool f2fs_empty_dir(struct inode *dir)
 }
 
 bool f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
-   unsigned int start_pos)
+   unsigned int start_pos, struct f2fs_str *fstr)
 {
unsigned char d_type = DT_UNKNOWN;
unsigned int bit_pos;
struct f2fs_dir_entry *de = NULL;
+   struct f2fs_str de_name = FSTR_INIT(NULL, 0);
 
bit_pos = ((unsigned long)ctx->pos % d->max);
 
@@ -768,8 +769,24 @@ bool f2fs_fill_dentries(struct dir_context *ctx, struct 
f2fs_dentry_ptr *d,
d_type = f2fs_filetype_table[de->file_type];
else
d_type = DT_UNKNOWN;
-   if (!dir_emit(ctx, d->filename[bit_pos],
-   le16_to_cpu(de->name_len),
+
+   /* encrypted case */
+   de_name.name = d->filename[bit_pos];
+   de_name.len = le16_to_cpu(de->name_len);
+
+   if (f2fs_encrypted_inode(d->inode)) {
+   int save_len = fstr->len;
+   int ret;
+
+   ret = f2fs_fname_disk_to_usr(d->inode, &de->hash_code,
+   &de_name, fstr);
+   de_name = *fstr;
+   fstr->len = save_len;
+   if (ret < 0)
+   return true;
+   }
+
+   if (!dir_emit(ctx, de_name.name, de_name.len,
le32_to_cpu(de->ino), d_type))
return true;
 
@@ -788,9 +805,24 @@ static int f2fs_readdir(struct file *file, struct 
dir_context *ctx)
struct file_ra_state *ra = &file->f_ra;
unsigned int n = ((unsigned long)ctx->pos / NR_DENTRY_IN_BLOCK);
struct f2fs_dentry_ptr d;
+   struct f2fs_str fstr = FSTR_INIT(NULL, 0);
+   int err = 0;
 
-   if (f2fs_has_inline_dentry(inode))
-   return f2fs_read_inline_dir(file, ctx);
+   err = f2fs_setup_fname_crypto(inode);
+   if (err)
+   return err;
+
+   if (f2fs_encrypted_inode(inode)) {
+   err = f2fs_fname_crypto_alloc_buffer(inode, F2FS_NAME_LEN,
+   &fstr);
+   if (err < 0)
+   return err;
+   }
+
+   if (f2fs_has_inline_dentry(inode)) {
+   err = f2fs_read_inline_dir(file, ctx, &fstr);
+   goto out;
+   }
 
/* readahead for multi pages of dir */
if (npages - n > 1 && !ra_has_index(ra, n))
@@ -804,9 +836,9 @@ static int f2fs_readdir(struct file *file, struct 
dir_context *ctx)
 
dentry_blk = kmap(dentry_page);
 
-   make_dentry_ptr(&d, (void *)dentry_blk, 1);
+   make_dentry_ptr(inode, &d, (void *)dentry_blk, 1);
 
-   if (f2fs_fill_dentries(ctx, &d, n * NR_DENTRY_IN_BLOCK))
+   if (f2fs_fill_dentries(ctx, &d, n * NR_DENTRY_IN_BLOCK, &fstr))
goto stop;
 
ctx->pos = (n + 1) * NR_DENTRY_IN_BLOCK;
@@ -819,8 +851,9 @@ stop:
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
}
-
-   return 0;
+out:
+   f2fs_fname_crypto_free_buffer(&fstr);
+   return err;
 }
 
 const struct file_operations f2fs_dir_operations = {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1632151..963616f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -277,15 +277,18 @@ struct f2fs_filename {
 #define fname_len(p)   

[PATCH 17/18] f2fs crypto: fix missing key when reading a page

2015-05-08 Thread Jaegeuk Kim
1. mount $mnt
2. cp data $mnt/
3. umount $mnt
4. log out
5. log in
6. cat $mnt/data

-> panic, due to no i_crypt_info.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/crypto.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
index 38c005c..509d19a 100644
--- a/fs/f2fs/crypto.c
+++ b/fs/f2fs/crypto.c
@@ -131,7 +131,9 @@ struct f2fs_crypto_ctx *f2fs_get_crypto_ctx(struct inode 
*inode)
unsigned long flags;
struct f2fs_crypt_info *ci = F2FS_I(inode)->i_crypt_info;
 
-   BUG_ON(ci == NULL);
+   if (ci == NULL)
+   return ERR_PTR(-EACCES);
+
/*
 * We first try getting the ctx from a free list because in
 * the common case the ctx will have an allocated and
-- 
2.1.1
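
The fix replaces a BUG_ON() with an error pointer, so callers see -EACCES instead of a panic when no crypt info is attached. A minimal user-space sketch of the error-pointer convention is below. The ERR_PTR/PTR_ERR/IS_ERR definitions are simplified re-implementations for illustration, and `struct crypt_info`, `struct crypto_ctx`, `the_ctx` and `get_crypto_ctx()` are hypothetical stand-ins for the f2fs types and f2fs_get_crypto_ctx().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct crypt_info { int unused; };	/* hypothetical */
struct crypto_ctx { int unused; };	/* hypothetical */

static struct crypto_ctx the_ctx;	/* stands in for a ctx from the free list */

/* Mirrors the patched check: a missing key yields an error, not a crash. */
static struct crypto_ctx *get_crypto_ctx(const struct crypt_info *ci)
{
	if (ci == NULL)
		return ERR_PTR(-EACCES);
	return &the_ctx;
}
```

Callers then have to test the result with IS_ERR() before using it, which is the price of removing the assertion.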



what's cooking in zram for 4.1

2015-05-08 Thread Sergey Senozhatsky
Hello Karel,

There will be some user-space visible changes in zram 4.1 we'd love to let you
know about.


1) new sysfs node -- /sys/block/zramX/compact

triggers zram memory compaction.


2) zram has deprecated some of the existing stat sysfs attributes. We will
consolidate a zramX device's stats in 3 files, rather than having N files
(per-stat).

The idea is:
-- the existing RW sysfs device nodes will be downgraded to WO nodes (in linux 4.11)
-- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)


User-space is advised to use the following files:

-- /sys/block/zram<id>/stat

Represents block layer statistics (read Documentation/block/stat.txt for
details).

-- /sys/block/zram<id>/io_stat

The stat file represents device's I/O statistics not accounted by block
layer and, thus, not available in zram<id>/stat file. It consists of a
single line of text and contains the following stats separated by
whitespace:
failed_reads
failed_writes
invalid_io
notify_free

-- /sys/block/zram<id>/mm_stat

The stat file represents device's mm statistics. It consists of a single
line of text and contains the following stats separated by whitespace:
orig_data_size
compr_data_size
mem_used_total
mem_limit
mem_used_max
zero_pages
num_migrated

Deprecated nodes will be around until linux 4.11 (approx 2 years from now).
In the meantime, zram will warn (once) should any user-space app access any of
the deprecated attrs:
"zram: 30788 (cat) Attribute num_reads (and others) will be removed. See zram documentation."


-ss
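
For user-space tools moving to the consolidated files, a minimal sketch of parsing one mm_stat line is below, assuming the seven whitespace-separated fields appear in the order listed above. `struct zram_mm_stat` and `parse_mm_stat()` are hypothetical names, and reading the line from sysfs is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Field order follows the mm_stat description in this mail. */
struct zram_mm_stat {
	unsigned long long orig_data_size;
	unsigned long long compr_data_size;
	unsigned long long mem_used_total;
	unsigned long long mem_limit;
	unsigned long long mem_used_max;
	unsigned long long zero_pages;
	unsigned long long num_migrated;
};

/* Returns 0 on success, -1 if the line does not hold all seven fields. */
static int parse_mm_stat(const char *line, struct zram_mm_stat *st)
{
	int n;

	assert(line != NULL && st != NULL);
	n = sscanf(line, "%llu %llu %llu %llu %llu %llu %llu",
		   &st->orig_data_size, &st->compr_data_size,
		   &st->mem_used_total, &st->mem_limit,
		   &st->mem_used_max, &st->zero_pages,
		   &st->num_migrated);
	return n == 7 ? 0 : -1;
}
```

Requiring exactly seven fields is a deliberately strict choice for the sketch. A tool meant to survive later kernels might instead accept extra trailing fields.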


[PATCH 18/18] f2fs crypto: remove checking key context during lookup

2015-05-08 Thread Jaegeuk Kim
Whether or not the key is valid, readdir shows the dir entries correctly.
So, lookup should not fail.
However, we expect further accesses to be denied by open, rename, link, and so
on.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/namei.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e6a6310..cbedf56 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -260,16 +260,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
if (IS_ERR(inode))
return ERR_CAST(inode);
 
-   if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode) &&
-   !f2fs_is_child_context_consistent_with_parent(dir, inode)) {
-   iput(inode);
-   f2fs_msg(inode->i_sb, KERN_WARNING,
-   "Inconsistent encryption contexts: %lu/%lu\n",
-   (unsigned long)dir->i_ino,
-   (unsigned long)inode->i_ino);
-   return ERR_PTR(-EPERM);
-   }
-
if (f2fs_has_inline_dots(inode)) {
err = __recover_dot_dentries(inode, dir->i_ino);
if (err)
-- 
2.1.1



[PATCH 09/18] f2fs crypto: filename encryption facilities

2015-05-08 Thread Jaegeuk Kim
This patch adds the filename encryption infrastructure.
Most of the code is copied from the ext4 implementation, but changed to fit
the f2fs directory structure.
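
The ciphertext length computation in f2fs_fname_encrypt() in this patch (raise to the cipher block size, round up to the policy padding, clamp to the name limit) can be sketched in isolation as follows. This is only an illustration under stated assumptions: the block size of 16, the pad mask of 0x3, and the "next multiple" behaviour of f2fs_fname_crypto_round_up() are not shown in the patch excerpt, and all demo_ names are hypothetical.

```c
#include <assert.h>

/* Assumed values; the patch excerpt does not show these definitions. */
#define DEMO_CRYPTO_BLOCK_SIZE	16u
#define DEMO_PAD_MASK		0x3u

/* Assumed semantics of the round-up helper: next multiple of blksize. */
static unsigned int demo_round_up(unsigned int size, unsigned int blksize)
{
	return ((size + blksize - 1) / blksize) * blksize;
}

/* Mirror the three length steps: minimum size, padding, then clamp to lim. */
static unsigned int demo_ciphertext_len(unsigned int name_len,
					unsigned int policy_flags,
					unsigned int lim)
{
	/* Padding is 4, 8, 16 or 32 bytes depending on the policy flags. */
	unsigned int padding = 4u << (policy_flags & DEMO_PAD_MASK);
	unsigned int len;

	assert(name_len > 0 && name_len <= lim);
	len = name_len < DEMO_CRYPTO_BLOCK_SIZE ? DEMO_CRYPTO_BLOCK_SIZE
						 : name_len;
	len = demo_round_up(len, padding);
	return len > lim ? lim : len;
}
```

Under these assumptions a 5-byte name with 4-byte padding encrypts to 16 bytes, and a 250-byte name with 32-byte padding is clamped to a 255-byte limit.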

Signed-off-by: Uday Savagaonkar 
Signed-off-by: Ildar Muslukhov 
Signed-off-by: Michael Halcrow 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/Makefile   |   3 +-
 fs/f2fs/crypto_fname.c | 490 +
 fs/f2fs/f2fs.h |  52 ++
 3 files changed, 544 insertions(+), 1 deletion(-)
 create mode 100644 fs/f2fs/crypto_fname.c

diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
index b08925d..396be1a 100644
--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -6,4 +6,5 @@ f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
 f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_F2FS_IO_TRACE) += trace.o
-f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o crypto.o crypto_key.o
+f2fs-$(CONFIG_F2FS_FS_ENCRYPTION) += crypto_policy.o crypto.o \
+   crypto_key.o crypto_fname.o
diff --git a/fs/f2fs/crypto_fname.c b/fs/f2fs/crypto_fname.c
new file mode 100644
index 000..8f3ff9b
--- /dev/null
+++ b/fs/f2fs/crypto_fname.c
@@ -0,0 +1,490 @@
+/*
+ * linux/fs/f2fs/crypto_fname.c
+ *
+ * Copied from linux/fs/ext4/crypto.c
+ *
+ * Copyright (C) 2015, Google, Inc.
+ * Copyright (C) 2015, Motorola Mobility
+ *
+ * This contains functions for filename crypto management in f2fs
+ *
+ * Written by Uday Savagaonkar, 2014.
+ *
+ * Adjust f2fs dentry structure
+ * Jaegeuk Kim, 2015.
+ *
+ * This has not yet undergone a rigorous security audit.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "f2fs.h"
+#include "f2fs_crypto.h"
+#include "xattr.h"
+
+/**
+ * f2fs_dir_crypt_complete() -
+ */
+static void f2fs_dir_crypt_complete(struct crypto_async_request *req, int res)
+{
+   struct f2fs_completion_result *ecr = req->data;
+
+   if (res == -EINPROGRESS)
+   return;
+   ecr->res = res;
+   complete(&ecr->completion);
+}
+
+bool f2fs_valid_filenames_enc_mode(uint32_t mode)
+{
+   return (mode == F2FS_ENCRYPTION_MODE_AES_256_CTS);
+}
+
+static unsigned max_name_len(struct inode *inode)
+{
+   return S_ISLNK(inode->i_mode) ? inode->i_sb->s_blocksize :
+   F2FS_NAME_LEN;
+}
+
+/**
+ * f2fs_fname_encrypt() -
+ *
+ * This function encrypts the input filename, and returns the length of the
+ * ciphertext. Errors are returned as negative numbers.  We trust the caller to
+ * allocate sufficient memory to oname string.
+ */
+static int f2fs_fname_encrypt(struct inode *inode,
+   const struct qstr *iname, struct f2fs_str *oname)
+{
+   u32 ciphertext_len;
+   struct ablkcipher_request *req = NULL;
+   DECLARE_F2FS_COMPLETION_RESULT(ecr);
+   struct f2fs_crypt_info *ci = F2FS_I(inode)->i_crypt_info;
+   struct crypto_ablkcipher *tfm = ci->ci_ctfm;
+   int res = 0;
+   char iv[F2FS_CRYPTO_BLOCK_SIZE];
+   struct scatterlist src_sg, dst_sg;
+   int padding = 4 << (ci->ci_flags & F2FS_POLICY_FLAGS_PAD_MASK);
+   char *workbuf, buf[32], *alloc_buf = NULL;
+   unsigned lim = max_name_len(inode);
+
+   if (iname->len <= 0 || iname->len > lim)
+   return -EIO;
+
+   ciphertext_len = (iname->len < F2FS_CRYPTO_BLOCK_SIZE) ?
+   F2FS_CRYPTO_BLOCK_SIZE : iname->len;
+   ciphertext_len = f2fs_fname_crypto_round_up(ciphertext_len, padding);
+   ciphertext_len = (ciphertext_len > lim) ? lim : ciphertext_len;
+
+   if (ciphertext_len <= sizeof(buf)) {
+   workbuf = buf;
+   } else {
+   alloc_buf = kmalloc(ciphertext_len, GFP_NOFS);
+   if (!alloc_buf)
+   return -ENOMEM;
+   workbuf = alloc_buf;
+   }
+
+   /* Allocate request */
+   req = ablkcipher_request_alloc(tfm, GFP_NOFS);
+   if (!req) {
+   printk_ratelimited(KERN_ERR
+   "%s: crypto_request_alloc() failed\n", __func__);
+   kfree(alloc_buf);
+   return -ENOMEM;
+   }
+   ablkcipher_request_set_callback(req,
+   CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+   f2fs_dir_crypt_complete, &ecr);
+
+   /* Copy the input */
+   memcpy(workbuf, iname->name, iname->len);
+   if (iname->len < ciphertext_len)
+   memset(workbuf + iname->len, 0, ciphertext_len - iname->len);
+
+   /* Initialize IV */
+   memset(iv, 0, F2FS_CRYPTO_BLOCK_SIZE);
+
+   /* Create encryption request */
+   sg_init_one(&src_sg, workbuf, ciphertext_len);
+   sg_init_one(&dst_sg, oname->name, ciphertext_len);
+   ablkcipher_request_set_crypt(req, &src_sg, &dst_sg, ciphertext_len, iv);
+   res = crypto_ablkcipher_encrypt(req);
+   if (res == 

[PATCH 01/18] f2fs: avoid value overflow in showing current status

2015-05-08 Thread Jaegeuk Kim
This patch fixes an overflow when running cat /sys/kernel/debug/f2fs/status.
If a section is relatively large, the dist value can overflow.
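
To see why widening the type helps, consider a minimal sketch that squares a block-count distance in 32-bit and in 64-bit arithmetic. This assumes the statistic involves a product of such values; the exact expression in debug.c is not shown in this patch, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit square: silently wraps modulo 2^32 once dist exceeds 65535. */
static uint32_t demo_square_u32(uint32_t dist)
{
	return dist * dist;
}

/* 64-bit square: widen before multiplying so the product cannot wrap. */
static uint64_t demo_square_u64(uint32_t dist)
{
	return (uint64_t)dist * dist;
}

/* Small and large inputs: only the large one shows the discrepancy. */
static void demo_check_overflow(void)
{
	assert(demo_square_u32(1000u) == demo_square_u64(1000u));
	assert(demo_square_u32(100000u) != demo_square_u64(100000u));
}
```

With dist = 100000 the 32-bit result wraps to 1410065408, while the 64-bit result is the expected 10000000000.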

Reported-by: Yossi Goldfill 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/debug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index f50acbc..efbc83f 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -94,7 +94,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 static void update_sit_info(struct f2fs_sb_info *sbi)
 {
struct f2fs_stat_info *si = F2FS_STAT(sbi);
-   unsigned int blks_per_sec, hblks_per_sec, total_vblocks, bimodal, dist;
+   unsigned long long blks_per_sec, hblks_per_sec, total_vblocks;
+   unsigned long long bimodal, dist;
unsigned int segno, vblocks;
int ndirty = 0;
 
-- 
2.1.1



[PATCH 05/18] f2fs crypto: add encryption xattr support

2015-05-08 Thread Jaegeuk Kim
This patch adds some definitions for the encryption xattr.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/xattr.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h
index 969d792..71a7100 100644
--- a/fs/f2fs/xattr.h
+++ b/fs/f2fs/xattr.h
@@ -35,6 +35,10 @@
 #define F2FS_XATTR_INDEX_LUSTRE        5
 #define F2FS_XATTR_INDEX_SECURITY      6
 #define F2FS_XATTR_INDEX_ADVISE        7
+/* Should be same as EXT4_XATTR_INDEX_ENCRYPTION */
+#define F2FS_XATTR_INDEX_ENCRYPTION    9
+
+#define F2FS_XATTR_NAME_ENCRYPTION_CONTEXT "c"
 
 struct f2fs_xattr_header {
__le32  h_magic;/* magic number for identification */
-- 
2.1.1



Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Linus Torvalds
On Fri, May 8, 2015 at 8:02 PM, Rik van Riel  wrote:
>
> The TLB performance bonus of accessing the large files with
> large pages may make it worthwhile to solve that hard problem.

Very few people can actually measure that TLB advantage on systems
with good TLB's.

It's largely a myth, fed by some truly crappy TLB fill systems
(particularly sw-filled TLB's on some early RISC CPU's, but even
"modern" CPU's sometimes have glass jaws here because they can't
prefetch TLB entries or do concurrent page table walks etc).

There are *very* few loads that actually have the kinds of access
patterns where TLB accesses dominate - or are even noticeable -
compared to the normal memory access costs.

That is doubly true with file-backed storage. The main reason you get
TLB costs to be noticeable is with very sparse access patterns, where
you hit as many TLB entries as you hit pages. That simply doesn't
happen with file mappings.

Really. The whole thing about TLB advantages of hugepages is this
almost entirely made-up stupid myth. You almost have to make up the
benchmark for it (_that_ part is easy) to even see it.

 Linus


Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Rik van Riel
On 05/08/2015 09:14 PM, Linus Torvalds wrote:
> On Fri, May 8, 2015 at 9:59 AM, Rik van Riel  wrote:
>>
>> However, for persistent memory, all of the files will be "in memory".
> 
> Yes. However, I doubt you will find a very sane rw filesystem that
> then also makes them contiguous and aligns them at 2MB boundaries.
> 
> Anything is possible, I guess, but things like that are *hard*. The
> fragmentation issues etc cause it to a really challenging thing.

The TLB performance bonus of accessing the large files with
large pages may make it worthwhile to solve that hard problem.

> And if they aren't aligned big contiguous allocations, then they
> aren't relevant from any largepage cases. You'll still have to map
> them 4k at a time etc.

Absolutely, but we only need the 4k struct pages when the
files are mapped. I suspect a lot of the files will just
sit around idle, without being used.

I am not convinced that the idea I wrote down earlier in
this thread is worthwhile now, but it may turn out to be
at some point in the future. It all depends on how much
data people store on DAX filesystems, and how many files
they have open at once.

-- 
All rights reversed


[PULL] LSM: Basic module stacking infrastructure for security-next - Acked

2015-05-08 Thread Casey Schaufler
James, here's an updated pull request for LSM stacking.
Acks have been applied.

The following changes since commit b787f68c36d49bb1d9236f403813641efa74a031:

  Linux 4.1-rc1 (2015-04-26 17:59:10 -0700)

are available in the git repository at:

  g...@github.com:cschaufler/smack-next.git stacking-v22-acked

for you to fetch changes up to f17cd945a8761544ac9bfdaf55e952e558dbee3e:

  LSM: Remove unused capability.c (2015-05-08 11:37:27 -0700)

Signed-off-by: Casey Schaufler 
Acked-by: John Johansen 
Acked-by: Kees Cook 
Acked-by: Paul Moore 
Acked-by:  Stephen Smalley 
Acked-by: Tetsuo Handa 


Casey Schaufler (7):
  LSM: Split security.h
  LSM: Add the comment to lsm_hooks.h
  LSM: Remove a comment from security.h
  LSM: Introduce security hook calling Macros
  LSM: Add security module hook list heads
  LSM: Switch to lists of hooks
  LSM: Remove unused capability.c

 include/linux/lsm_hooks.h  | 1886
 include/linux/security.h   | 1621 +
 security/Makefile  |2 +-
 security/apparmor/domain.c |   12 +-
 security/apparmor/lsm.c|  131 ++-
 security/capability.c  | 1158 ---
 security/commoncap.c   |   41 +-
 security/security.c|  955 +++---
 security/selinux/hooks.c   |  490 +---
 security/smack/smack.h |4 +-
 security/smack/smack_lsm.c |  307 ---
 security/smack/smackfs.c   |2 +-
 security/tomoyo/tomoyo.c   |   72 +-
 security/yama/yama_lsm.c   |   60 +-
 14 files changed, 3064 insertions(+), 3677 deletions(-)
 create mode 100644 include/linux/lsm_hooks.h
 delete mode 100644 security/capability.c



Re: [PATCH 0/7 v22] LSM: Multiple concurrent LSMs

2015-05-08 Thread Tetsuo Handa
>  On 5/7/2015 4:37 AM, James Morris wrote:
> > On Sat, 2 May 2015, Casey Schaufler wrote:
> >
> >> Subject: [PATCH 0/7 v22] LSM: Multiple concurrent LSMs
> > Please add all of the Acked-by etc. from the patch review process.

Acked-by: Tetsuo Handa 


Re: [RESEND PATCH 1/8] mfd: cros ec: Remove parent field

2015-05-08 Thread Javier Martinez Canillas
Hello Lee,

On 05/05/2015 12:54 PM, Lee Jones wrote:
> On Tue, 05 May 2015, Javier Martinez Canillas wrote:
>> On 04/29/2015 12:37 PM, Lee Jones wrote:
>> > On Thu, 23 Apr 2015, Gwendal Grignou wrote:
>> > 
>> >> Be consistent, use cros_ec instead of "cros ec" or "cros-ec".
>> > 
>> > What is this in reference to?
>> >
>> 
>> I think Gwendal meant that I should be consistent in general and was not
>> referring to this patch particular.
> 
> Better to quote one of the offending occurrences than to randomly
> top-post helpful comments.
> 

Just for clarification, on a second read I noticed that Gwendal was talking
about the subject line since $subject uses "mfd: cros ec" as a prefix while
most of the commits in the subsystem (and in the series) use "mfd: cros_ec".
Also, there is another patch (5/8) that used "mfd: cros-ec" so he is right
that I should have used cros_ec consistently.

Sorry for missing that; I will fix it when posting v2.

Best regards,
Javier


[PATCH v5 2/3] usb: xhci: implement device_suspend/device_resume entries

2015-05-08 Thread Lu Baolu
This patch implements the device_suspend/device_resume entries for the xHC
driver. device_suspend is called when a USB device is about to suspend. It
issues a stop endpoint command for each endpoint of the device, with the
Suspend (SP) bit set in the command TRB to give the xHC a hint about the
suspend. device_resume is called when a USB device has just resumed. It
rings the doorbells of all endpoints unconditionally. The xHC may use these
suspend/resume hints to optimize its operation.

Signed-off-by: Lu Baolu 
---
 drivers/usb/host/xhci-hub.c |  2 +-
 drivers/usb/host/xhci.c | 38 ++
 drivers/usb/host/xhci.h |  9 +
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 0827d7c..a83e82e 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -266,7 +266,7 @@ int xhci_find_slot_id_by_port(struct usb_hcd *hcd, struct xhci_hcd *xhci,
  * to complete.
  * suspend will set to 1, if suspend bit need to set in command.
  */
-static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
+int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
 {
struct xhci_virt_device *virt_dev;
struct xhci_command *cmd;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index ec8ac16..330961d 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4680,6 +4680,38 @@ int xhci_disable_usb3_lpm_timeout(struct usb_hcd *hcd,
return ret;
return 0;
 }
+
+/*
+ * xHCI compatible host controller driver expects to be notified prior to
+ * selectively suspending a device. xHCI hcd could optimize the endpoint
+ * cache for power saving purpose. Refer to 4.15.1.1 of xHCI 1.1.
+ */
+void xhci_device_suspend(struct usb_hcd *hcd, struct usb_device *udev)
+{
+   struct xhci_hcd *xhci;
+
+   xhci = hcd_to_xhci(hcd);
+   if (!xhci || !xhci->devs[udev->slot_id])
+   return;
+
+   xhci_stop_device(xhci, udev->slot_id, 1);
+}
+
+/*
+ * xHCI compatible host controller driver expects to be notified after a
+ * USB device is resumed. xHCI hcd could optimize the endpoint cache
+ * to reduce the latency. Refer to 4.15.1.1 of xHCI 1.1.
+ */
+void xhci_device_resume(struct usb_hcd *hcd, struct usb_device *udev)
+{
+   struct xhci_hcd *xhci;
+
+   xhci = hcd_to_xhci(hcd);
+   if (!xhci || !xhci->devs[udev->slot_id])
+   return;
+
+   xhci_ring_device(xhci, udev->slot_id);
+}
 #else /* CONFIG_PM */
 
 int xhci_set_usb2_hardware_lpm(struct usb_hcd *hcd,
@@ -4976,6 +5008,12 @@ static const struct hc_driver xhci_hc_driver = {
.enable_usb3_lpm_timeout =  xhci_enable_usb3_lpm_timeout,
.disable_usb3_lpm_timeout = xhci_disable_usb3_lpm_timeout,
.find_raw_port_number = xhci_find_raw_port_number,
+
+   /*
+* call back when devices suspend or resume
+*/
+   .device_suspend =   xhci_device_suspend,
+   .device_resume =xhci_device_resume,
 };
 
 void xhci_init_driver(struct hc_driver *drv, int (*setup_fn)(struct usb_hcd *))
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8e421b8..67c13c5 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1867,10 +1867,19 @@ u32 xhci_port_state_to_neutral(u32 state);
 int xhci_find_slot_id_by_port(struct usb_hcd *hcd, struct xhci_hcd *xhci,
u16 port);
 void xhci_ring_device(struct xhci_hcd *xhci, int slot_id);
+int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend);
 
 /* xHCI contexts */
 struct xhci_input_control_ctx *xhci_get_input_control_ctx(struct xhci_container_ctx *ctx);
 struct xhci_slot_ctx *xhci_get_slot_ctx(struct xhci_hcd *xhci, struct xhci_container_ctx *ctx);
 struct xhci_ep_ctx *xhci_get_ep_ctx(struct xhci_hcd *xhci, struct xhci_container_ctx *ctx, unsigned int ep_index);
 
+#ifdef CONFIG_PM
+void xhci_device_suspend(struct usb_hcd *hcd, struct usb_device *udev);
+void xhci_device_resume(struct usb_hcd *hcd, struct usb_device *udev);
+#else
+#define xhci_device_suspend    NULL
+#define xhci_device_resume     NULL
+#endif /* CONFIG_PM */
+
 #endif /* __LINUX_XHCI_HCD_H */
-- 
2.1.0



[PATCH v5 0/3] usb: notify hcd when USB device suspend or resume

2015-05-08 Thread Lu Baolu
This patch series tries to meet a design requirement in the xHCI spec.

The xHCI spec is designed to allow an xHC implementation to cache the
endpoint state. Caching endpoint state allows an xHC to reduce latency
when handling ERDYs and other USB asynchronous events. However holding
this state in xHC consumes resources and power. The xHCI spec designs
some methods through which host controller driver can hint xHC about
how to optimize its operation, e.g. to determine when it holds state
internally or pushes it out to memory, when to power down logic, etc.

When a USB device is going to suspend, states of all endpoints cached
in the xHC should be pushed out to memory to save power and resources.
Vice versa, when a USB device resumes, those states should be brought
back to the cache.

It is harmless if a USB device under a USB 3.0 host controller suspends
or resumes without a notification to the hcd driver. However, there may be
fewer opportunities for power savings and there may be increased latency
for restarting an endpoint. The precise impact will be different for
each xHC implementation. It all depends on what an implementation does
with the hints.

Change log:
v4->v5:
 - add Alan's ACK for 1/3

v3->v4:
 - remove unused 'msg' parameter in the callbacks

v2->v3:
 - move two xhci specific comments from hub to xhci
 - define xhci_device_suspend(resume) as NULL when CONFIG_PM is not set

v1->v2:
 - make the callback name specific to the activity in question
 - no need to export hcd_notify
 - put the callback in the right place

Lu Baolu (3):
  usb: notify hcd when USB device suspend or resume
  usb: xhci: implement device_suspend/device_resume entries
  usb: xhci: remove stop device and ring doorbell in hub control and bus
suspend

 drivers/usb/core/hcd.c  | 27 +
 drivers/usb/core/hub.c  |  5 +
 drivers/usb/host/xhci-hub.c | 49 +
 drivers/usb/host/xhci.c | 38 +++
 drivers/usb/host/xhci.h |  9 +
 include/linux/usb/hcd.h |  8 +++-
 6 files changed, 87 insertions(+), 49 deletions(-)

-- 
2.1.0



[PATCH v5 3/3] usb: xhci: remove stop device and ring doorbell in hub control and bus suspend

2015-05-08 Thread Lu Baolu
There is no need to call xhci_stop_device() and xhci_ring_device() in the
hub control and bus suspend functions, since every device suspend and
resume is now reported through the device_suspend/device_resume interfaces.

Signed-off-by: Lu Baolu 
---
 drivers/usb/host/xhci-hub.c | 47 -
 1 file changed, 47 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index a83e82e..f12e1b7 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -704,7 +704,6 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
u32 temp, status;
int retval = 0;
__le32 __iomem **port_array;
-   int slot_id;
struct xhci_bus_state *bus_state;
u16 link_state = 0;
u16 wake_mask = 0;
@@ -818,17 +817,6 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
goto error;
}
 
-   slot_id = xhci_find_slot_id_by_port(hcd, xhci,
-   wIndex + 1);
-   if (!slot_id) {
-   xhci_warn(xhci, "slot_id is zero\n");
-   goto error;
-   }
-   /* unlock to execute stop endpoint commands */
-   spin_unlock_irqrestore(&xhci->lock, flags);
-   xhci_stop_device(xhci, slot_id, 1);
-   spin_lock_irqsave(&xhci->lock, flags);
-
xhci_set_link_state(xhci, port_array, wIndex, XDEV_U3);
 
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -876,19 +864,6 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
goto error;
}
 
-   if (link_state == USB_SS_PORT_LS_U3) {
-   slot_id = xhci_find_slot_id_by_port(hcd, xhci,
-   wIndex + 1);
-   if (slot_id) {
-   /* unlock to execute stop endpoint
-* commands */
-   spin_unlock_irqrestore(&xhci->lock,
-   flags);
-   xhci_stop_device(xhci, slot_id, 1);
-   spin_lock_irqsave(&xhci->lock, flags);
-   }
-   }
-
xhci_set_link_state(xhci, port_array, wIndex,
link_state);
 
@@ -994,14 +969,6 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
XDEV_U0);
}
bus_state->port_c_suspend |= 1 << wIndex;
-
-   slot_id = xhci_find_slot_id_by_port(hcd, xhci,
-   wIndex + 1);
-   if (!slot_id) {
-   xhci_dbg(xhci, "slot_id is zero\n");
-   goto error;
-   }
-   xhci_ring_device(xhci, slot_id);
break;
case USB_PORT_FEAT_C_SUSPEND:
bus_state->port_c_suspend &= ~(1 << wIndex);
@@ -1133,20 +1100,12 @@ int xhci_bus_suspend(struct usb_hcd *hcd)
while (port_index--) {
/* suspend the port if the port is not suspended */
u32 t1, t2;
-   int slot_id;
 
t1 = readl(port_array[port_index]);
t2 = xhci_port_state_to_neutral(t1);
 
if ((t1 & PORT_PE) && !(t1 & PORT_PLS_MASK)) {
xhci_dbg(xhci, "port %d not suspended\n", port_index);
-   slot_id = xhci_find_slot_id_by_port(hcd, xhci,
-   port_index + 1);
-   if (slot_id) {
-   spin_unlock_irqrestore(&xhci->lock, flags);
-   xhci_stop_device(xhci, slot_id, 1);
-   spin_lock_irqsave(&xhci->lock, flags);
-   }
t2 &= ~PORT_PLS_MASK;
t2 |= PORT_LINK_STROBE | XDEV_U3;
set_bit(port_index, &bus_state->bus_suspended);
@@ -1207,7 +1166,6 @@ int xhci_bus_resume(struct usb_hcd *hcd)
/* Check whether need resume ports. If needed
   resume port and disable remote wakeup */
u32 temp;
-   int slot_id;
 
temp = readl(port_array[port_index]);
if (DEV_SUPERSPEED(temp))
@@ -1240,11 +1198,6 @@ int xhci_bus_resume(struct usb_hcd *hcd)
/* Clear PLC */
xhci_test_and_clear_bit(xhci, 

[PATCH v5 1/3] usb: notify hcd when USB device suspend or resume

2015-05-08 Thread Lu Baolu
This patch adds two new entries to hc_driver. With these new entries, the
USB core can notify the host controller driver when a USB device is about to
suspend or has just resumed.
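
The optional-callback pattern described here can be modelled outside the kernel roughly as follows. This is a simplified user-space sketch, not the kernel code: demo_hc_driver, demo_device and the counters are hypothetical stand-ins, and the only behaviour carried over from the patch is that the core calls a hook only when the driver provides one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a USB device. */
struct demo_device {
	int slot_id;
};

/* Hypothetical stand-in for hc_driver: both hooks are optional. */
struct demo_hc_driver {
	void (*device_suspend)(struct demo_device *dev);
	void (*device_resume)(struct demo_device *dev);
};

/* Counters so the effect of each hook can be observed. */
static int demo_suspend_calls;
static int demo_resume_calls;

static void demo_on_suspend(struct demo_device *dev)
{
	(void)dev;
	demo_suspend_calls++;
}

static void demo_on_resume(struct demo_device *dev)
{
	(void)dev;
	demo_resume_calls++;
}

/* Core-side helper: call the suspend hook only if the driver set one. */
static void demo_suspend_notify(const struct demo_hc_driver *drv,
				struct demo_device *dev)
{
	assert(drv && dev);
	if (drv->device_suspend)
		drv->device_suspend(dev);
}

/* Core-side helper: call the resume hook only if the driver set one. */
static void demo_resume_notify(const struct demo_hc_driver *drv,
			       struct demo_device *dev)
{
	assert(drv && dev);
	if (drv->device_resume)
		drv->device_resume(dev);
}
```

Leaving the hooks NULL costs a driver nothing, so host controller drivers that have no use for the hints need no changes.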

The xHCI spec is designed to allow an xHC implementation to cache the
endpoint state. Caching endpoint state allows an xHC to reduce latency
when handling ERDYs and other USB asynchronous events. However holding
this state in xHC consumes resources and power. The xHCI spec designs
some methods through which host controller driver can hint xHC about
how to optimize its operation, e.g. to determine when it holds state
internally or pushes it out to memory, when to power down logic, etc.

When a USB device is going to suspend, states of all endpoints cached
in the xHC should be pushed out to memory to save power and resources.
Vice versa, when a USB device resumes, those states should be brought
back to the cache. USB core suspends or resumes a USB device by sending
set/clear port feature requests to the parent hub where the USB device
is connected. Unfortunately, these operations are transparent to the xHCI
driver unless the USB device is plugged into a root port. This patch
utilizes the callback entries to notify xHCI driver whenever a USB
device suspends or resumes.

It is harmless if a USB device under a USB 3.0 host controller suspends
or resumes without a notification to the hcd driver. However, there may be
fewer opportunities for power savings and there may be increased latency
for restarting an endpoint. The precise impact will be different for
each xHC implementation. It all depends on what an implementation does
with the hints.

Signed-off-by: Lu Baolu 
Acked-by: Alan Stern 
---
 drivers/usb/core/hcd.c  | 27 +++
 drivers/usb/core/hub.c  |  5 +
 include/linux/usb/hcd.h |  8 +++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 45a915c..007450d 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2289,6 +2289,33 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd)
 }
 EXPORT_SYMBOL_GPL(usb_hcd_resume_root_hub);
 
+/**
+ * hcd_suspend_notify - notify hcd driver when a device is going to suspend
+ * @udev: USB device to be suspended
+ *
+ * Call back to hcd driver to notify that a USB device is going to suspend.
+ */
+void hcd_suspend_notify(struct usb_device *udev)
+{
+   struct usb_hcd *hcd = bus_to_hcd(udev->bus);
+
+   if (hcd->driver->device_suspend)
+   hcd->driver->device_suspend(hcd, udev);
+}
+
+/**
+ * hcd_resume_notify - notify hcd driver when a device is just resumed
+ * @udev: USB device just resumed
+ *
+ * Call back to hcd driver to notify that a USB device is just resumed.
+ */
+void hcd_resume_notify(struct usb_device *udev)
+{
+   struct usb_hcd *hcd = bus_to_hcd(udev->bus);
+
+   if (hcd->driver->device_resume)
+   hcd->driver->device_resume(hcd, udev);
+}
 #endif /* CONFIG_PM */
 
 /*-*/
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 52178bc..28d76b7 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3149,6 +3149,8 @@ int usb_port_suspend(struct usb_device *udev, pm_message_t msg)
goto err_lpm3;
}
 
+   hcd_suspend_notify(udev);
+
/* see 7.1.7.6 */
if (hub_is_superspeed(hub->hdev))
status = hub_set_port_link_state(hub, port1, USB_SS_PORT_LS_U3);
@@ -3174,6 +3176,8 @@ int usb_port_suspend(struct usb_device *udev, pm_message_t msg)
if (status) {
dev_dbg(_dev->dev, "can't suspend, status %d\n", status);
 
+   hcd_resume_notify(udev);
+
/* Try to enable USB3 LPM and LTM again */
usb_unlocked_enable_lpm(udev);
  err_lpm3:
@@ -3427,6 +3431,7 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
}
 
  SuspendCleared:
+   hcd_resume_notify(udev);
if (status == 0) {
udev->port_is_suspended = 0;
if (hub_is_superspeed(hub->hdev)) {
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index 68b1e83..bc2eb1c 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -383,7 +383,11 @@ struct hc_driver {
int (*find_raw_port_number)(struct usb_hcd *, int);
/* Call for power on/off the port if necessary */
int (*port_power)(struct usb_hcd *hcd, int portnum, bool enable);
-
+   /* Call back to hcd when a USB device is going to suspend or just
+* resumed.
+*/
+   void(*device_suspend)(struct usb_hcd *, struct usb_device *udev);
+   void(*device_resume)(struct usb_hcd *, struct usb_device *udev);
 };
 
 static inline int hcd_giveback_urb_in_bh(struct usb_hcd *hcd)
@@ -632,6 +636,8 @@ extern void usb_root_hub_lost_power(struct usb_device *rhdev);
 extern int hcd_bus_suspend(struct usb_device *rhdev, 

Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

2015-05-08 Thread Linus Torvalds
On Fri, May 8, 2015 at 9:59 AM, Rik van Riel  wrote:
>
> However, for persistent memory, all of the files will be "in memory".

Yes. However, I doubt you will find a very sane rw filesystem that
then also makes them contiguous and aligns them at 2MB boundaries.

Anything is possible, I guess, but things like that are *hard*. The
fragmentation issues etc cause it to be a really challenging thing.

And if they aren't aligned big contiguous allocations, then they
aren't relevant from any largepage cases. You'll still have to map
them 4k at a time etc.

  Linus


Re: ARM: EXYNOS: dts: Improvements for 4.2

2015-05-08 Thread Krzysztof Kozlowski
2015-05-06 9:59 GMT+09:00 Krzysztof Kozlowski :
> Dear Kukjin,
>
> I gathered various improvements for upcoming 4.2 merge window.
> Description along with a tag.
>
> Best regards,
> Krzysztof
>
> 
> The following changes since commit 5ebe6afaf0057ac3eaeb98defd5456894b446d22:
>
>   Linux 4.1-rc2 (2015-05-03 19:22:23 -0700)
>
> are available in the git repository at:
>
>   https://github.com/krzk/linux.git tags/samsung-dt-for-next-4.2
>
> for you to fetch changes up to 98155ec454a40212434c83c1388c420a61d62854:
>
>   ARM: dts: exynos4: add nodes for jpeg codec (2015-05-05 19:19:18 +0900)
>
> 
> Device Tree improvements for Exynos based boards:
> 1. Fix PMIC's RTC alarm on Arndale Octa (S2MPS11 PMIC) and SMDK5250 
> (MAXIM77686
>PMIC).
> 2. Fix suspend on Peach Pit Chromebooks prevented by Marvell mwifiex
>driver.
> 3. Fix hang on Odroid XU3 on MMC card detect.
> 4. Enable the S3C RTC (the clock present on SoC) on various Exynos boards.
> 5. Minor improvements to S3C RTC driver.
> 6. Add nodes for JPEG codec on Exynos4.
>
> 
> Andrzej Hajda (1):
>   ARM: dts: exynos5422-odroidxu3: add mmc detect gpio
>
> Javier Martinez Canillas (2):
>   ARM: dts: Make DP a consumer of DISP1 power domain on Exynos5420
>   ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for Peach Boards
>
> Krzysztof Kozlowski (5):
>   ARM: dts: Fix pinctrl settings for S2MPS11 RTC alarm IRQ on Arndale Octa
>   ARM: dts: s3c-rtc: Use s3c6410-rtc instead of exynos3250-rtc
>   ARM: dts: Use define for s3c-rtc clock id
>   ARM: dts: Use define for s3c-rtc clock id
>   ARM: dts: Enable S3C RTC on Trats2 and Arndale Octa
>
> Marek Szyprowski (1):
>   ARM: dts: exynos4: add nodes for jpeg codec
>
> Markus Reichl (2):
>   ARM: dts: Add bindings for 32kHz clocks from s2mps11
>   ARM: dts: exynos5422-odroidxu3: add 'rtc_src' clock to rtc node
>
> Yadwinder Singh Brar (1):
>   ARM: dts: Add missing irq pinctrl for max77686 on smdk5250


Dear Kukjin,

Gentle reminder, it seems this one was missed.
Could you apply these patches as well? Above you will find description
of the pull request.

Best regards,
Krzysztof


[PATCH] w1_therm reference count family data

2015-05-08 Thread David Fries
A temperature conversion can take 750 ms and when possible the
w1_therm slave driver drops the bus_mutex to allow other bus
operations, but that includes operations such as a periodic slave
search, which can remove this slave when it is no longer detected.
If that happens the sl->family_data will be freed and set to NULL
causing w1_slave_show to crash when it wakes up.

Signed-off-by: David Fries 
Reported-By: Thorsten Bschorr 
Tested-by: Thorsten Bschorr 
Acked-by: Evgeniy Polyakov 
---
This should be applied to the stable series as well.  In the name of
full disclosure, this just narrows the race window: the reporter's
system used to crash in normal operation, and it no longer crashes even
with multiple readers and another process hammering on
inserting/removing the slave device.

This patch has been tested for some weeks by those who were affected.
Evgeniy Polyakov (maintainer) has approved this fix with the intention
of switching to sysfs device reference counting (as w1_slave_show is
called through sysfs).

 drivers/w1/slaves/w1_therm.c |   62 --
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..55eb86c 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+   (&((struct w1_therm_family_data*)family_data)->refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl->family_data = kzalloc(9, GFP_KERNEL);
+   sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
if (!sl->family_data)
return -ENOMEM;
+   atomic_set(THERM_REFCNT(sl->family_data), 1);
return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
+   while(refcnt) {
+   msleep(1000);
+   refcnt = atomic_read(THERM_REFCNT(sl->family_data));
+   }
kfree(sl->family_data);
sl->family_data = NULL;
 }
@@ -194,13 +210,22 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl->master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
+   u8 *family_data = sl->family_data;
+
+   ret = mutex_lock_interruptible(&dev->bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
-   i = mutex_lock_interruptible(&dev->bus_mutex);
-   if (i != 0)
-   return i;
+   if(!sl->family_data)
+   {
+   ret = -ENODEV;
+   goto pre_unlock;
+   }
 
+   /* prevent the slave from going away in sleep */
+   atomic_inc(THERM_REFCNT(family_data));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -230,17 +255,19 @@ static ssize_t w1_slave_show(struct device *device,
mutex_unlock(&dev->bus_mutex);
 
sleep_rem = msleep_interruptible(tm);
-   if (sleep_rem != 0)
-   return -EINTR;
+   if (sleep_rem != 0) {
+   ret = -EINTR;
+   goto post_unlock;
+   }
 
-   i = mutex_lock_interruptible(&dev->bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(&dev->bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
-   mutex_unlock(&dev->bus_mutex);
-   return -EINTR;
+   ret = -EINTR;
+   goto pre_unlock;
}
}
 
@@ -269,19 +296,24 @@ static ssize_t w1_slave_show(struct device *device,
c -= snprintf(buf + PAGE_SIZE - c, c, ": crc=%02x %s\n",
   crc, (verdict) ? "YES" : "NO");
if (verdict)
-   memcpy(sl->family_data, rom, sizeof(rom));
+   memcpy(family_data, rom, sizeof(rom));
else
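
The reference-counting pattern in this patch can be sketched in user space. This is a minimal sketch, not the driver code: it uses C11 atomics in place of the kernel's atomic_t, the names family_data_alloc(), family_data_get(), family_data_put() and family_data_free_if_idle() are hypothetical, and the reader-side decrement is an assumption because the diff above is cut off before that part of w1_slave_show().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct w1_therm_family_data from the patch. */
struct family_data {
	uint8_t rom[9];
	atomic_int refcnt;
};

/* Like w1_therm_add_slave(): allocate zeroed data holding one owner reference. */
static struct family_data *family_data_alloc(void)
{
	struct family_data *fd = calloc(1, sizeof(*fd));

	if (!fd)
		return NULL;
	atomic_init(&fd->refcnt, 1);
	return fd;
}

/* Reader: pin the data before dropping the bus lock to sleep. */
static void family_data_get(struct family_data *fd)
{
	atomic_fetch_add(&fd->refcnt, 1);
}

/* Drop one reference (reader or owner) and return how many remain. */
static int family_data_put(struct family_data *fd)
{
	int old = atomic_fetch_sub(&fd->refcnt, 1);

	assert(old > 0);
	return old - 1;
}

/*
 * Owner: free the data only once no reference is left. Returns 1 if the
 * data was freed (and *fdp set to NULL), 0 if a reader still holds it.
 */
static int family_data_free_if_idle(struct family_data **fdp)
{
	if (atomic_load(&(*fdp)->refcnt) != 0)
		return 0;
	free(*fdp);
	*fdp = NULL;
	return 1;
}
```

In the patch itself, w1_therm_remove_slave() drops the owner's reference and then polls with msleep(1000) until the count reaches zero; the sketch leaves that polling to the caller.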

Re: [GIT PULL] ARM: EXYNOS: Improvements for 4.2, try 2

2015-05-08 Thread Krzysztof Kozlowski
2015-05-09 3:54 GMT+09:00 Kukjin Kim :
> On 05/06/15 11:19, Krzysztof Kozlowski wrote:
>> Dear Kukjin,
>>
>> Updated pull request. The first one contained an older version
>> of my patch.
>>
>> This adds coupled cpuidle for Exynos3250 and improves the Exynos
>> code in few places. Everything for upcoming 4.2 merge window.
>> Description along with a tag.
>>
>> Best regards,
>> Krzysztof
>>
>> 
>> The following changes since commit 5ebe6afaf0057ac3eaeb98defd5456894b446d22:
>>
>>   Linux 4.1-rc2 (2015-05-03 19:22:23 -0700)
>>
>> are available in the git repository at:
>>
>>   https://github.com/krzk/linux.git tags/samsung-for-next-4.2
>>
>> for you to fetch changes up to 182be665abb95fcddd656355ae6208d02533b6b9:
>>
>>   ARM: plat-samsung: Constify platform_device_id (2015-05-06 11:05:43 +0900)
>>
>> 
>> Extending cpuidle driver and improvements for Exynos based boards:
>> 1. Add missing return-value checks and of_node_put() for power domain
>>driver.
>> 2. Fix missing clk_prepare in S3C24XX ADC driver.
>> 3. Rework clock handling when switching power domains on/off. Instead
>>of setting fixed parent in DTS we grab the parent clock before
>>turning the domain off.
>> 4. Add coupled cpuidle support for Exynos3250 to an existing
>>cpuidle-exynos driver. As a result it enables AFTR mode
>>(ARM-Off Top-Running) to be used by default on Exynos3250
>>without the need to hot unplug CPU1 first.
>>
>> 
>> Bartlomiej Zolnierkiewicz (5):
>>   ARM: EXYNOS: fix exynos_boot_secondary() return value on timeout
>>   ARM: EXYNOS: make exynos_core_restart() less verbose
>>   ARM: EXYNOS: add exynos_set_boot_addr() helper
>>   ARM: EXYNOS: add exynos_get_boot_addr() helper
>>   cpuidle: exynos: add coupled cpuidle support for Exynos3250
>>
>> Krzysztof Kozlowski (8):
>>   ARM: EXYNOS: Fix failed second suspend on Exynos4
>>   ARM: EXYNOS: Handle of of_iomap() failure
>>   ARM: EXYNOS: Handle of_find_device_by_node and kstrdup failures
>>   ARM: EXYNOS: Add missing of_node_put() when parsing power domains
>>   ARM: EXYNOS: Get current parent clock for power domain on/off
>>   ARM: dts: Use last parent for clocks during power domain on/off
>>   ARM: EXYNOS: Constify irq_domain_ops
>>   ARM: plat-samsung: Constify platform_device_id
>>
>> Sergiy Kibrik (1):
>>   ARM: SAMSUNG: fix clk_enable() WARNing in S3C24XX ADC
>>
>>  .../bindings/arm/exynos/power_domain.txt   |   7 +-
>>  arch/arm/boot/dts/exynos5420.dtsi  |  13 +--
>>  arch/arm/include/asm/firmware.h|   4 +
>>  arch/arm/mach-exynos/common.h  |   6 +-
>>  arch/arm/mach-exynos/exynos.c  |  30 -
>>  arch/arm/mach-exynos/firmware.c|  18 +++
>>  arch/arm/mach-exynos/platsmp.c | 123 
>> ++---
>>  arch/arm/mach-exynos/pm.c  |  51 +++--
>>  arch/arm/mach-exynos/pm_domains.c  |  44 ++--
>>  arch/arm/mach-exynos/suspend.c |   5 +-
>>  arch/arm/plat-samsung/adc.c|   6 +-
>>  11 files changed, 208 insertions(+), 99 deletions(-)
>> --
>
> Hi Krzysztof,
>
> Firstly, thanks for your gentle reminder and effort.
>
> By the way, this cannot be pulled because some of them are old patches
> even though I've missed them at that time. Sorry about that and if it is
> required, it should be re-submitted based on latest mainline after
> testing then reviewed in ml again...

They were rebased on mainline. However some of them (cpuidle) depend
on "ARM: EXYNOS: Fix failed second suspend on Exynos4".

> And please do use 'git am' or some relevant git command when applying
> patches in your tree because I found there are some comments below '---'
> in git log such as 'changes since' or some 'note' but that is not
> required. Just it is for information in ml not git log basically.

Right, this happened only for my patches which I cherry-picked thus
leaving the separator. Next time I will be more careful.

>
> Anyway I'm going to sort them out including your patches in my tree
> directly for 4.2.

Thanks!
Best regards,
Krzysztof


Re: ARM: EXYNOS: Fixes for 4.1

2015-05-08 Thread Krzysztof Kozlowski
2015-05-09 3:26 GMT+09:00 Kukjin Kim :
> On 05/06/15 09:55, Krzysztof Kozlowski wrote:
>> Krzysztof Kozlowski (1):
>>   ARM: EXYNOS: Fix failed second suspend on Exynos4
>>
>>  arch/arm/mach-exynos/common.h  |  2 ++
>>  arch/arm/mach-exynos/exynos.c  | 27 +++
>>  arch/arm/mach-exynos/platsmp.c | 39 ++-
>>  arch/arm/mach-exynos/suspend.c |  3 +++
>>  4 files changed, 34 insertions(+), 37 deletions(-)
>> --
>
> Hi Krzysztof,
>
> Yeah, the patch is old thing but if I remember correctly, I couldn't
> apply it because of soc_is_xxx() macro. Yes, I agree with the way,
> firstly fix something and if required, then update it later... But!
>
> Please address comments about that, if not, it will not be accepted by
> arm-soc tree.

Hi,

Actually during previous discussion I addressed all the issues:
https://lkml.org/lkml/2015/3/18/108
but there was no response from your side.

In previous thread you also asked about proper way of using the
DELAYED_RESET_ASSERTION. I tried to obtain additional docs from LSI
but couldn't. In current documentation it was described vaguely so I
looked at vendor source code. The patch IMHO is the best solution.
Additionally it was tested (Chanwoo and Bartlomiej) and fixes the
issue.

As for the soc_is_exynos -> of_machine_is_compatible I will prepare a
follow up to fix this.

Let me know if you have any other comments about it.

Best regards,
Krzysztof


Re: [PATCH v2 1/3] usb: notify hcd when USB device suspend or resume

2015-05-08 Thread Lu, Baolu



On 05/08/2015 10:21 PM, Alan Stern wrote:

On Fri, 8 May 2015, Lu, Baolu wrote:


On 05/07/2015 10:34 PM, Alan Stern wrote:

On Thu, 7 May 2015, Lu, Baolu wrote:


+   void(*device_suspend)(struct usb_hcd *, struct usb_device *udev,
+   pm_message_t msg);
+   void(*device_resume)(struct usb_hcd *, struct usb_device *udev,
+   pm_message_t msg);
};

Your callbacks don't use the msg argument.  What makes you think it is
needed?

This msg argument is valuable. The xHCI spec defines a capability named FSC
(Force Save context Capability). When this capability is implemented, the
Save State operation (done during host suspend) shall save any cached Slot,
Endpoint, Stream or other Context information to memory. The xHCI hcd could
use this "msg" to determine whether it needs to issue stop endpoint with
the SP (suspend) bit set.

I don't understand.  What is the advantage of using FSC?

I'm sorry, I didn't make it clear.

As part of host suspend, controller save state will be issued to save
host internal state in xhci_suspend():

...


If FSC is supported, the cached Slot, Endpoint, Stream, or other
Context information is also saved.

Hence, when FSC is supported, software does not have to issue a Stop
Endpoint Command to push public and private endpoint state into
memory as part of the system suspend process.

Why do you have to push this state into memory at all?  Does the
controller hardware lose the cached state information when it is in low
power?


I don't think controller hardware will lose the cached state information
when it is in low power. But since the cache in the controller consumes
power and resources, pushing state into memory lets hardware power
off the cache logic during suspend, which saves more power.




The logic in xhci_device_suspend() will look like:

If the xhci_device_suspend() callback was called due to system suspend
(mesg.event & PM_EVENT_SUSPEND is true) and FSC is supported by
the xHC implementation, xhci_device_suspend() could skip the stop
endpoint command, since CSS will be done later in xhci_suspend(),
where all the endpoint caches will be pushed to memory.

I still don't understand this.  You said earlier that according
to section 4.15.1.1 of the xHCI spec, the endpoint rings should
_always_ be stopped with SP set when a device is suspended.  Now you're


The intention of stop endpoint with SP set is to tell hardware that
"a device is going to suspend, hardware doesn't need to keep the
endpoint state in internal cache anymore". Hardware _could_ use
this hint to push endpoint state into memory to reduce power
consumption.


saying that they don't need to be stopped during a system suspend if
the controller supports FSC.  (Or maybe you're saying they need to be
stopped but SP doesn't need to be set -- it's hard to tell.)


Even if FSC is supported, controller hardware still needs to push cached
endpoint state into memory when a USB device is suspended. The
difference is that when FSC is enforced, the CSS command will push any
cached endpoint state into memory unconditionally.

So, when xhci_device_suspend() knows that CSS command will be
executed later and CSS command will push cached endpoint state
into memory (a.k.a. FSC is enforced), it could skip issuing stop
endpoint command with SP bit set to avoid duplication and reduce
the suspend time.

This is the case for system suspend since the CSS command is part of
xhci_suspend() and xhci_suspend() will be executed after all USB
devices have been suspended. But it's not the case for run-time suspend
(auto-pm) since USB device suspend and host controller suspend
are independent in the run-time case.

That's the reason why I wanted to keep 'msg' parameter. But just as
Greg said, we don't need to keep a parameter when it's not used
and can add it later when it is required.
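
The decision described above can be written as a small predicate. This is only a sketch of the reasoning in this thread, not code from the xHCI driver: the function name and both boolean inputs are hypothetical, and how a driver would derive "system suspend" from the msg argument is deliberately left out.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should device suspend issue Stop Endpoint with the
 * SP bit set?
 *
 * system_suspend: the device is suspended as part of a system suspend, so
 *                 the host's save-state (CSS) step runs afterwards.
 * fsc_supported:  the controller implements FSC, so that save-state step
 *                 also writes cached endpoint state to memory.
 */
static bool need_stop_endpoint_sp(bool system_suspend, bool fsc_supported)
{
	/* The later CSS already pushes the endpoint caches out; skip the duplicate. */
	if (system_suspend && fsc_supported)
		return false;

	/* Run-time suspend, or no FSC: the device suspend path gives the hint itself. */
	return true;
}
```

Only the combination of system suspend and FSC lets the command be skipped; every other combination still issues it.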



So which is it?  Do you need to stop the endpoint rings?  Is it okay
not to set SP?


"stop endpoint" and "stop endpoint with SP set" serve different purposes
in the Linux xhci driver, as I understand it. The "stop endpoint" command is
used to stop an active ring when an upper layer wants to cancel a URB.
"stop endpoint with SP set" is used to hint to hardware that a USB device
is going to suspend. Hence "stop endpoint with SP set" must be executed in
the case that the transfer ring is empty.



Alan Stern


Thank you,
Baolu








Re: [PATCH v1 01/12] kernel/params.c: export param_ops_bool_enable_only

2015-05-08 Thread Rusty Russell
"Luis R. Rodriguez"  writes:
> From: "Luis R. Rodriguez" 
>
> This will grant access to this helper to code built as modules.
>
> Cc: Rusty Russell 
> Cc: David Howells 
> Cc: Ming Lei 
> Cc: Seth Forshee 
> Cc: Kyle McMartin 
> Signed-off-by: Luis R. Rodriguez 

Applied,
Rusty.


Re: [PATCH] module: Call module notifier on failure after complete_formation()

2015-05-08 Thread Rusty Russell
Steven Rostedt  writes:
> On Fri, 8 May 2015 11:17:36 -0400
> Steven Rostedt  wrote:
>
>> 
>> The module notifier call chain for MODULE_STATE_COMING was moved up before
>> the parsing of args, into the complete_formation() call. But if the module 
>> failed
>> to load after that, the notifier call chain for MODULE_STATE_GOING was
>> never called and that prevented the users of those call chains from
>> cleaning up anything that was allocated.
>> 
>> Link: http://lkml.kernel.org/r/554c52b9.9060...@gmail.com
>
> You can nuke the "Link". I didn't realize Pontus didn't Cc any mailing
> lists, and I manually just added it. Usually my scripts will check if
> lkml was Cc'd and only add the "Link" tag if it was. Just shows you
> that my scripts are smarter than I am.

Thanks for this.

Applied,
Rusty.


Re: [PATCH v6 0/9] latched RB-trees and __module_address()

2015-05-08 Thread Rusty Russell
Ingo Molnar  writes:
> * Rusty Russell  wrote:
>
>> Peter Zijlstra  writes:
>> > This series is aimed at making __module_address() go fast(er).
>> 
>> Acked-by: Rusty Russell  (module parts)
>> 
>> Since all the interesting stuff is not module-specific, assume this 
>> is via Ingo?  Otherwise, I'll take it...
>
> I can certainly take them, but since I think that the _breakages_ are 
> going to be in module land foremost, it should be rather under your 
> watchful eyes? :-)

Ingo, I feel like you just gave me a free puppy...

Applied,
Rusty.


[PATCH] m32r: make flush_cpumask non-volatile.

2015-05-08 Thread Rusty Russell
We cast away the volatile, but really, why make it volatile at all?
We already do a mb() inside the cpumask_empty() loop.

Signed-off-by: Rusty Russell 

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index ce7aea34fdf4..c18ddc74ef9a 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -45,7 +45,7 @@ static volatile unsigned long flushcache_cpumask = 0;
 /*
  * For flush_tlb_others()
  */
-static volatile cpumask_t flush_cpumask;
+static cpumask_t flush_cpumask;
 static struct mm_struct *flush_mm;
 static struct vm_area_struct *flush_vma;
 static volatile unsigned long flush_va;
@@ -415,7 +415,7 @@ static void flush_tlb_others(cpumask_t cpumask, struct 
mm_struct *mm,
 */
send_IPI_mask(&cpumask, INVALIDATE_TLB_IPI, 0);
 
-   while (!cpumask_empty((cpumask_t*)&flush_cpumask)) {
+   while (!cpumask_empty(&flush_cpumask)) {
/* nothing. lockup detection does not belong here */
mb();
}
@@ -468,7 +468,7 @@ void smp_invalidate_interrupt(void)
__flush_tlb_page(va);
}
}
-   cpumask_clear_cpu(cpu_id, (cpumask_t*)&flush_cpumask);
+   cpumask_clear_cpu(cpu_id, &flush_cpumask);
 }
 
 /*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*/


[PATCH] iommu: exynos: tell kmemleak to ignore 2nd level page tables

2015-05-08 Thread Dmitry Torokhov
From: Colin Cross 

The pointers to the 2nd level page tables are converted to 1st level
page table entries, which means kmemleak can't find them and assumes
they have been leaked.  Call kmemleak_ignore on the 2nd level page
tables to prevent them from showing up in kmemleak reports.

Signed-off-by: Colin Cross 
Signed-off-by: Dmitry Torokhov 
---
 drivers/iommu/exynos-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index dc14fec4..16920b2 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -855,6 +855,7 @@ static sysmmu_pte_t *alloc_lv2entry(struct 
exynos_iommu_domain *priv,
return ERR_PTR(-ENOMEM);
 
*sent = mk_lv1ent_page(virt_to_phys(pent));
+   kmemleak_ignore(pent);
*pgcounter = NUM_LV2ENTRIES;
pgtable_flush(pent, pent + NUM_LV2ENTRIES);
pgtable_flush(sent, sent + 1);
-- 
2.2.0.rc0.207.ga3a616c


-- 
Dmitry
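
Why kmemleak loses track of these tables can be shown with a toy model. kmemleak scans memory for values that look like pointers into tracked allocations; once only a physical address is stored, no such value remains. The sketch below is a simplification under stated assumptions: toy_virt_to_phys() and its fixed offset are hypothetical, and the toy scanner matches only exact block addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed virtual-to-physical offset for the toy model. */
#define TOY_VIRT_OFFSET ((uintptr_t)0x40000000u)

/* Toy translation: a physical address is the virtual one minus the offset. */
static uintptr_t toy_virt_to_phys(uintptr_t virt)
{
	return virt - TOY_VIRT_OFFSET;
}

/* Scan a memory area for a word equal to the tracked block's address. */
static bool toy_scan_finds(const uintptr_t *words, size_t n, uintptr_t tracked)
{
	for (size_t i = 0; i < n; i++) {
		if (words[i] == tracked)
			return true;
	}
	return false;
}
```

With a first-level entry holding toy_virt_to_phys(pent), the scan for pent finds nothing, which is the false leak report that the kmemleak_ignore(pent) call in the patch suppresses.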


Re: perf: another perf_fuzzer generated lockup

2015-05-08 Thread Stephane Eranian
Vince,

On Thu, May 7, 2015 at 9:40 PM, Vince Weaver  wrote:
>
>
> This is a new one I think, I hit it on the haswell machine running
> 4.1-rc2.
>
> The backtrace is complex enough I'm not really sure what's going on here.
>
> The fuzzer has been having weird issues where it's been getting
> overflow signals from invalid fds.  This seems to happen
> when an overflow signal interrupts the fuzzer mid-fork?
> And the fuzzer code doesn't handle this well and attempts to call exit()
> and/or kill the child from the signal handler that interrupted the
> fork() and that doesn't always go well.  I'm not sure if this is related,
> just that some of those actions seem to appear in the backtrace.
>
>
Is there a way to figure out how the fuzzer had programmed the PMU
to get there? (besides adding PMU state dump in the kernel crashdump)?


>
> [33864.529861] [ cut here ]
> [33864.534824] WARNING: CPU: 1 PID: 9852 at kernel/watchdog.c:302 
> watchdog_overflow_callback+0x92/0xc0()
> [33864.544682] Watchdog detected hard LOCKUP on cpu 1
> [33864.549635] Modules linked in:
> [33864.552943]  fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl 
> iosf_mbi coretemp snd_hda_codec_realtek snd_hda_codec_hdmi 
> snd_hda_codec_generic kvm snd_hda_intel snd_hda_controller snd_hda_codec 
> snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul ghash_clmulni_intel 
> snd_pcm aesni_intel aes_x86_64 lrw gf128mul evdev i915 iTCO_wdt 
> iTCO_vendor_support glue_helper snd_timer ppdev ablk_helper psmouse 
> drm_kms_helper cryptd snd drm pcspkr serio_raw lpc_ich soundcore parport_pc 
> xhci_pci battery video processor i2c_i801 mei_me mei wmi i2c_algo_bit tpm_tis 
> mfd_core xhci_hcd tpm parport button sg sr_mod sd_mod cdrom ehci_pci ahci 
> ehci_hcd libahci libata e1000e ptp usbcore crc32c_intel scsi_mod fan 
> usb_common pps_core thermal thermal_sys
> [33864.622413] CPU: 1 PID: 9852 Comm: perf_fuzzer Not tainted 4.1.0-rc2+ #142
> [33864.629776] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 
> 01/26/2014
> [33864.637685]  81a209b5 88011ea45aa0 816d51d3 
> 
> [33864.645709]  88011ea45af0 88011ea45ae0 81072dfa 
> 88011ea45ac0
> [33864.653731]  880119b8f800  88011ea45c40 
> 88011ea45ef8
> [33864.661783] Call Trace:
> [33864.664409][] dump_stack+0x45/0x57
> [33864.670618]  [] warn_slowpath_common+0x8a/0xc0
> [33864.677071]  [] warn_slowpath_fmt+0x46/0x50
> [33864.683202]  [] ? native_apic_wait_icr_idle+0x24/0x30
> [33864.690280]  [] watchdog_overflow_callback+0x92/0xc0
> [33864.697294]  [] __perf_event_overflow+0x91/0x270
> [33864.703916]  [] ? __perf_event_overflow+0xd9/0x270
> [33864.710696]  [] ? x86_perf_event_set_period+0xda/0x180
> [33864.717842]  [] perf_event_overflow+0x19/0x20
> [33864.724195]  [] intel_pmu_handle_irq+0x1e2/0x450
> [33864.730840]  [] perf_event_nmi_handler+0x2b/0x50
> [33864.737436]  [] nmi_handle+0xa0/0x150
> [33864.743025]  [] ? nmi_handle+0x5/0x150
> [33864.748733]  [] default_do_nmi+0x4a/0x140
> [33864.754705]  [] do_nmi+0x98/0xe0
> [33864.759858]  [] end_repeat_nmi+0x1e/0x2e
> [33864.765746]  [] ? check_chain_key+0xdb/0x1e0
> [33864.772004]  [] ? check_chain_key+0xdb/0x1e0
> [33864.778253]  [] ? check_chain_key+0xdb/0x1e0
> [33864.784498]  <>[] 
> __lock_acquire.isra.31+0x3b9/0x1000
> [33864.792950]  [] ? __lock_acquire.isra.31+0x3b9/0x1000
> [33864.800045]  [] lock_acquire+0xa5/0x130
> [33864.805817]  [] ? __lock_task_sighand+0x6e/0x110
> [33864.812468]  [] ? __lock_task_sighand+0x1a/0x110
> [33864.819084]  [] _raw_spin_lock+0x31/0x40
> [33864.824979]  [] ? __lock_task_sighand+0x6e/0x110
> [33864.831623]  [] __lock_task_sighand+0x6e/0x110
> [33864.838096]  [] ? __lock_task_sighand+0x1a/0x110
> [33864.845314]  [] do_send_sig_info+0x2c/0x80
> [33864.851949]  [] ? perf_swevent_event+0x67/0x90
> [33864.858980]  [] send_sigio_to_task+0x12f/0x1a0
> [33864.866005]  [] ? send_sigio_to_task+0x5/0x1a0
> [33864.873047]  [] ? send_sigio+0x56/0x100
> [33864.879411]  [] send_sigio+0xae/0x100
> [33864.885564]  [] kill_fasync+0x97/0xf0
> [33864.891713]  [] ? kill_fasync+0xf/0xf0
> [33864.897983]  [] perf_event_wakeup+0xd4/0xf0
> [33864.904662]  [] ? perf_event_wakeup+0x5/0xf0
> [33864.911490]  [] ? perf_pending_event+0xe0/0x110
> [33864.918580]  [] perf_pending_event+0xe0/0x110
> [33864.925494]  [] irq_work_run_list+0x4c/0x80
> [33864.932197]  [] irq_work_run+0x18/0x40
> [33864.938469]  [] smp_trace_irq_work_interrupt+0x3f/0xc0
> [33864.946263]  [] trace_irq_work_interrupt+0x6e/0x80
> [33864.953646][] ? copy_page_range+0x527/0x9a0
> [33864.961287]  [] ? copy_page_range+0x502/0x9a0
> [33864.968265]  [] copy_process.part.23+0xc92/0x1b80
> [33864.975589]  [] ? SYSC_kill+0x8e/0x230
> [33864.981879]  [] do_fork+0xd8/0x420
> [33864.987807]  [] ? f_setown+0x83/0xa0
> [33864.993953]  [] ? SyS_fcntl+0x310/0x650
> [33865.000348]  [] ? lockdep_sys_exit_thunk+0x12/0x14
> [33865.007781]  [] SyS_clone+0x16/0x20

[PATCH] Input: sx8654 - fix memory allocation check

2015-05-08 Thread Dmitry Torokhov
We have been testing wrong variable when trying to make sure that input
allocation succeeded.

Reported by Coverity (CID 1295918).

Signed-off-by: Dmitry Torokhov 
---
 drivers/input/touchscreen/sx8654.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/input/touchscreen/sx8654.c 
b/drivers/input/touchscreen/sx8654.c
index aecb9ad..642f4a5 100644
--- a/drivers/input/touchscreen/sx8654.c
+++ b/drivers/input/touchscreen/sx8654.c
@@ -187,7 +187,7 @@ static int sx8654_probe(struct i2c_client *client,
return -ENOMEM;
 
input = devm_input_allocate_device(&client->dev);
-   if (!sx8654)
+   if (!input)
return -ENOMEM;
 
input->name = "SX8654 I2C Touchscreen";
-- 
2.2.0.rc0.207.ga3a616c


-- 
Dmitry


Re: [alsa-devel] [PATCH] ASoC: codecs-ac97: Remove rate constraints

2015-05-08 Thread Fabio Estevam
Hi Maciej,

On Fri, May 8, 2015 at 8:14 PM, Maciej S. Szmigiero
 wrote:

> I currently have audio running on this board at kernel based on
> vanilla 3.19 and porting required changes part-by-part to linux-next.
>
> Changes required on vanilla 3.19 to have it working are:
> * AC'97 audio support needs to be added to fsl-asoc-card,
>
> * AC'97 CODEC platform device needs to be instantiated in fsl_ssi,
>
> * IPG clock needs to be enabled in fsl_ssi AC'97 mode,
> so AC'97 regs can be accessed,
>
> * Few small fixes for AC'97 mode in fsl_ssi (missing switch label
> for format, missing  fsl_ssi_dai_probe entry in fsl_ssi_ac97_dai,
> etc.).
>
> There also is a problem with this CODEC that it seems to pull samples
> for S/PDIF output from time to time even if S/PDIF output is disabled.
>
> By default this requests samples in AC'97 slots 10/11 via SLOTREQ,
> which in turn causes SSI to enable these slots in SACCST register
> and start sending half of the sound samples there.
>
> The end result is that audio suddenly starts to play two times
> too fast.
>
> Currently, I have a workaround of setting S/PDIF slot assignment
> in CODEC to first front pair so at least it doesn't affect
> playback rate.
>
> If you like to have these changes or DT file diff then naturally
> I can share them, just they aren't production-quality as of now.

Good job!

Please keep me on Cc when you submit further ac97 patches / udoo dts,
so that I can help testing them.

Thanks,

Fabio Estevam


Re: [3.13.y-ckt stable] Linux 3.13.11-ckt20

2015-05-08 Thread Kamal Mostafa
diff --git a/Makefile b/Makefile
index d917ed5..35d9566 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 3
 PATCHLEVEL = 13
 SUBLEVEL = 11
-EXTRAVERSION = -ckt19
+EXTRAVERSION = -ckt20
 NAME = King of Alienated Frog Porn
 
 # *DOCUMENTATION*
diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
index 7e95e1a..d68b410 100644
--- a/arch/arc/kernel/signal.c
+++ b/arch/arc/kernel/signal.c
@@ -67,7 +67,7 @@ stash_usr_regs(struct rt_sigframe __user *sf, struct pt_regs 
*regs,
   sigset_t *set)
 {
int err;
-   err = __copy_to_user(&(sf->uc.uc_mcontext.regs), regs,
+   err = __copy_to_user(&(sf->uc.uc_mcontext.regs.scratch), regs,
 sizeof(sf->uc.uc_mcontext.regs.scratch));
err |= __copy_to_user(&sf->uc.uc_sigmask, set, sizeof(sigset_t));
 
@@ -83,7 +83,7 @@ static int restore_usr_regs(struct pt_regs *regs, struct 
rt_sigframe __user *sf)
if (!err)
set_current_blocked(&set);
 
-   err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs),
+   err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs.scratch),
sizeof(sf->uc.uc_mcontext.regs.scratch));
 
return err;
diff --git a/arch/arm64/include/asm/mmu_context.h 
b/arch/arm64/include/asm/mmu_context.h
index a9eee33..101a42b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -151,6 +151,15 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 {
unsigned int cpu = smp_processor_id();
 
+   /*
+* init_mm.pgd does not contain any user mappings and it is always
+* active for kernel addresses in TTBR1. Just set the reserved TTBR0.
+*/
+   if (next == &init_mm) {
+   cpu_set_reserved_ttbr0();
+   return;
+   }
+
if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next)
check_and_switch_context(next, tsk);
 }
diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index cde4e0a..bf38292 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -24,10 +24,10 @@
 static struct kobject *mobility_kobj;
 
 struct update_props_workarea {
-   u32 phandle;
-   u32 state;
-   u64 reserved;
-   u32 nprops;
+   __be32 phandle;
+   __be32 state;
+   __be64 reserved;
+   __be32 nprops;
 } __packed;
 
 #define NODE_ACTION_MASK   0xff00
@@ -53,11 +53,11 @@ static int mobility_rtas_call(int token, char *buf, s32 
scope)
return rc;
 }
 
-static int delete_dt_node(u32 phandle)
+static int delete_dt_node(__be32 phandle)
 {
struct device_node *dn;
 
-   dn = of_find_node_by_phandle(phandle);
+   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
if (!dn)
return -ENOENT;
 
@@ -126,7 +126,7 @@ static int update_dt_property(struct device_node *dn, 
struct property **prop,
return 0;
 }
 
-static int update_dt_node(u32 phandle, s32 scope)
+static int update_dt_node(__be32 phandle, s32 scope)
 {
struct update_props_workarea *upwa;
struct device_node *dn;
@@ -135,6 +135,7 @@ static int update_dt_node(u32 phandle, s32 scope)
char *prop_data;
char *rtas_buf;
int update_properties_token;
+   u32 nprops;
u32 vd;
 
update_properties_token = rtas_token("ibm,update-properties");
@@ -145,7 +146,7 @@ static int update_dt_node(u32 phandle, s32 scope)
if (!rtas_buf)
return -ENOMEM;
 
-   dn = of_find_node_by_phandle(phandle);
+   dn = of_find_node_by_phandle(be32_to_cpu(phandle));
if (!dn) {
kfree(rtas_buf);
return -ENOENT;
@@ -161,6 +162,7 @@ static int update_dt_node(u32 phandle, s32 scope)
break;
 
prop_data = rtas_buf + sizeof(*upwa);
+   nprops = be32_to_cpu(upwa->nprops);
 
/* On the first call to ibm,update-properties for a node the
 * the first property value descriptor contains an empty
@@ -169,17 +171,17 @@ static int update_dt_node(u32 phandle, s32 scope)
 */
if (*prop_data == 0) {
prop_data++;
-   vd = *(u32 *)prop_data;
+   vd = be32_to_cpu(*(__be32 *)prop_data);
prop_data += vd + sizeof(vd);
-   upwa->nprops--;
+   nprops--;
}
 
-   for (i = 0; i < upwa->nprops; i++) {
+   for (i = 0; i < nprops; i++) {
char *prop_name;
 
prop_name = prop_data;
prop_data += strlen(prop_name) + 1;
-   vd = *(u32 *)prop_data;
+   vd = be32_to_cpu(*(__be32 *)prop_data);
prop_data += 

[3.13.y-ckt stable] Linux 3.13.11-ckt20

2015-05-08 Thread Kamal Mostafa
I am announcing the release of the Linux 3.13.11-ckt20 kernel.

The updated 3.13.y-ckt tree can be found at: 
  git://kernel.ubuntu.com/ubuntu/linux.git linux-3.13.y
and can be browsed at:
http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.13.y

The diff from v3.13.11-ckt19 is posted as a follow-up to this email.

The 3.13.y-ckt extended stable tree is maintained by the Canonical Kernel Team.
For more info, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

 -Kamal

-- 
 Makefile                                        |   2 +-
 arch/arc/kernel/signal.c                        |   4 +-
 arch/arm64/include/asm/mmu_context.h            |   9 +
 arch/powerpc/platforms/pseries/mobility.c       |  44 ++---
 arch/x86/kernel/reboot.c                        |  10 +
 arch/x86/kvm/lapic.c                            |  18 +-
 arch/x86/kvm/vmx.c                              |   8 +-
 drivers/acpi/processor_idle.c                   |   2 +-
 drivers/block/nbd.c                             |   8 +-
 drivers/dma/edma.c                              |   6 +
 drivers/dma/omap-dma.c                          |   1 +
 drivers/gpu/drm/radeon/radeon_bios.c            |  10 +-
 drivers/iio/accel/bma180.c                      |   2 +-
 drivers/iio/adc/at91_adc.c                      |   5 +-
 drivers/iio/adc/ti_am335x_adc.c                 |   3 +-
 drivers/iio/imu/adis_trigger.c                  |   2 +-
 drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c      |  25 +--
 drivers/iio/industrialio-core.c                 |   5 +-
 drivers/iio/industrialio-event.c                |   1 +
 drivers/infiniband/core/umem.c                  |   8 +
 drivers/input/mouse/psmouse-base.c              |  14 ++
 drivers/input/mouse/psmouse.h                   |   1 +
 drivers/input/mouse/synaptics.c                 | 241 +---
 drivers/input/serio/i8042-x86ia64io.h           |  15 ++
 drivers/input/serio/i8042.c                     |   6 +
 drivers/input/serio/serio.c                     |  14 ++
 drivers/media/platform/s5p-mfc/s5p_mfc_common.h |   2 +-
 drivers/media/platform/sh_veu.c                 |   1 +
 drivers/mfd/kempld-core.c                       |   2 +-
 drivers/net/bonding/bond_3ad.c                  |   2 +-
 drivers/net/bonding/bond_alb.c                  |   2 +-
 drivers/net/bonding/bond_main.c                 |  10 +-
 drivers/net/can/flexcan.c                       |  11 +-
 drivers/net/ethernet/amd/pcnet32.c              |  31 ++-
 drivers/net/ethernet/broadcom/bnx2.c            |   6 +-
 drivers/net/ethernet/broadcom/tg3.c             |  14 +-
 drivers/net/ethernet/emulex/benet/be_main.c     |   2 +-
 drivers/net/ethernet/freescale/gianfar.c        |   4 +-
 drivers/net/ethernet/intel/ixgb/ixgb_main.c     |   6 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  15 +-
 drivers/net/ethernet/realtek/8139cp.c           |   2 +-
 drivers/net/ethernet/realtek/8139too.c          |   4 +-
 drivers/net/ethernet/realtek/r8169.c            |   6 +-
 drivers/net/wireless/iwlwifi/dvm/dev.h          |   1 -
 drivers/net/wireless/iwlwifi/dvm/ucode.c        |   5 -
 drivers/net/xen-netfront.c                      |   5 +-
 drivers/pci/hotplug/cpci_hotplug_pci.c          |   3 +-
 drivers/scsi/be2iscsi/be_main.c                 |   2 +-
 drivers/scsi/scsi_lib.c                         |   4 +-
 drivers/spi/spi.c                               |   5 +-
 drivers/target/iscsi/iscsi_target.c             |   2 +-
 drivers/tty/n_tty.c                             | 143 +-
 drivers/tty/serial/fsl_lpuart.c                 |   3 +
 drivers/usb/host/xhci-hub.c                     |   9 +-
 drivers/usb/host/xhci-pci.c                     |   2 +-
 drivers/usb/serial/ftdi_sio.c                   |   9 +-
 drivers/usb/serial/ftdi_sio_ids.h               |   6 +
 drivers/usb/serial/keyspan_pda.c                |   3 +
 fs/aio.c                                        |   3 +
 fs/cifs/file.c                                  |   1 +
 fs/cifs/smb2ops.c                               |   3 +-
 fs/exec.c                                       |  76 +---
 fs/hfsplus/brec.c                               |  20 +-
 fs/ocfs2/file.c                                 |  14 +-
 include/linux/blk_types.h                       |   4 +-
 include/linux/netdevice.h                       |   6 +
 include/linux/serio.h                           |   1 +
 include/linux/skbuff.h                          |   1 +
 include/net/ip.h                                |  16 --
 include/net/ip6_route.h                         |   3 +-
 include/net/sock.h                              |   2 +
 include/uapi/linux/input.h                      |   1 +
 kernel/events/core.c                            |  10 +
 mm/memory_hotplug.c                             |  13 +-
 mm/mmap.c                                       |   4 +-
 mm/page-writeback.c                             |   7 +-
 mm/rmap.c                                       |   7 +
 net/core/dev.c   

Re: [PATCHv2 0/3] Find mirrored memory, use for boot time allocations

2015-05-08 Thread Tony Luck
On Fri, May 8, 2015 at 1:49 PM, Andrew Morton  wrote:
> What I mean is: allow userspace to consume ZONE_MIRROR memory because
> we can snatch it back if it is needed for kernel memory.

For suitable interpretations of "snatch it back" ... if there is none
free in a GFP_NOWAIT request, then we are doomed.  But we
could maintain some high/low watermarks to arrange the snatching
when mirrored memory is getting low, rather than all the way out.

It's worth a look - but perhaps at phase three. It would make life
a bit easier for people to get the right amount of mirror. If they
guess too high they are still wasting some memory because
every mirrored page has two pages in DIMM. But without this
sort of trick all the extra mirrored memory would be totally wasted.

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 3/3] usb: xhci: remove stop device and ring doorbell in hub control and bus suspend

2015-05-08 Thread Lu, Baolu



On 05/08/2015 07:01 PM, Greg Kroah-Hartman wrote:
> On Fri, May 08, 2015 at 06:26:28PM +0800, Lu Baolu wrote:
>> There is no need to call xhci_stop_device() and xhci_ring_device() in
>> hub control and bus suspend functions since all device suspend and
>> resume have been notified through device_suspend/device_resume interfaces.
>
> Does this mean that after patch 2, things are broken and require this
> patch to prevent problems?

No, things work well without patch 3. The "stop device" and "ring doorbell"
operations in hub control and bus suspend are harmless, but duplicated
and unnecessary, so I remove them.

> I don't want to have any patch to make the system unstable.
>
> thanks,
>
> greg k-h

Thank you,
Baolu


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html






[PATCH] staging: gdm724: adding kernel endianness header

2015-05-08 Thread Jaime Arrocha
From TODO list: remove test for host endian
Included header to gather information about host endianness.
Please let me know if the code addition requires corrections
to meet standards.

Signed-off-by: Jaime Arrocha 
---
 drivers/staging/gdm724x/gdm_endian.c |   22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/gdm724x/gdm_endian.c b/drivers/staging/gdm724x/gdm_endian.c
index f6cc90a..5dfd9d3 100644
--- a/drivers/staging/gdm724x/gdm_endian.c
+++ b/drivers/staging/gdm724x/gdm_endian.c
@@ -11,27 +11,25 @@
  * GNU General Public License for more details.
  */
 
-#include 
+#include
+#ifdef __LITTLE_ENDIAN
+#include
+#define H_ENDIAN ENDIANNESS_LITTLE
+#else
+#include
+#define H_ENDIAN ENDIANNESS_BIG
+#endif
+
 #include "gdm_endian.h"
 
 void gdm_set_endian(struct gdm_endian *ed, u8 dev_endian)
 {
-   u8 a[2] = {0x12, 0x34};
-   u8 b[2] = {0, };
-   u16 c = 0x1234;
-
if (dev_endian == ENDIANNESS_BIG)
ed->dev_ed = ENDIANNESS_BIG;
else
ed->dev_ed = ENDIANNESS_LITTLE;
 
-   memcpy(b, &c, 2);
-
-   if (a[0] != b[0])
-   ed->host_ed = ENDIANNESS_LITTLE;
-   else
-   ed->host_ed = ENDIANNESS_BIG;
-
+   ed->host_ed = H_ENDIAN;
 }
 
 u16 gdm_cpu_to_dev16(struct gdm_endian *ed, u16 x)
-- 
1.7.10.4
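
For readers comparing the two approaches in this patch, here is a minimal userspace sketch, not the driver code. It puts the removed run-time byte probe next to a compile-time check. It assumes a GCC- or Clang-style compiler that predefines __BYTE_ORDER__; the DEMO_* constants and both helper names are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's ENDIANNESS_* constants. */
enum { DEMO_ENDIAN_LITTLE = 0, DEMO_ENDIAN_BIG = 1 };

/* Run-time probe: store a known 16-bit value and look at its first byte. */
static int demo_host_endian_runtime(void)
{
	const uint16_t probe = 0x1234;
	uint8_t bytes[2];

	memcpy(bytes, &probe, sizeof(bytes));
	return bytes[0] == 0x12 ? DEMO_ENDIAN_BIG : DEMO_ENDIAN_LITTLE;
}

/* Compile-time decision: no code runs, the preprocessor picks a constant.
 * Assumes the compiler predefines __BYTE_ORDER__ (GCC and Clang do);
 * otherwise this sketch falls back to little endian. */
static int demo_host_endian_compile_time(void)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return DEMO_ENDIAN_BIG;
#else
	return DEMO_ENDIAN_LITTLE;
#endif
}
```

Both helpers should agree on any given host; the compile-time form simply moves the decision out of the run-time path, which is what the TODO entry asks for.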




Re: [PATCH 0/6] support "dataplane" mode for nohz_full

2015-05-08 Thread Andrew Morton
On Fri, 8 May 2015 19:11:10 -0400 Chris Metcalf  wrote:

> On 5/8/2015 5:22 PM, Steven Rostedt wrote:
> > On Fri, 8 May 2015 14:18:24 -0700
> > Andrew Morton  wrote:
> >
> >> On Fri, 8 May 2015 13:58:41 -0400 Chris Metcalf  wrote:
> >>
> >>> A prctl() option (PR_SET_DATAPLANE) is added
> >> Dumb question: what does the term "dataplane" mean in this context?  I
> >> can't see the relationship between those words and what this patch
> >> does.
> > I was thinking the same thing. I haven't gotten around to searching
> > DATAPLANE yet.
> >
> > I would assume we want a name that is more meaningful for what is
> > happening.
> 
> The text in the commit message and the 0/6 cover letter do try to explain
> the concept.  The terminology comes, I think, from networking line cards,
> where the "dataplane" is the part of the application that handles all the
> fast path processing of network packets, and the "control plane" is the part
> that handles routing updates, etc., generally slow-path stuff.  I've probably
> just been using the terms so long they seem normal to me.
> 
> That said, what would be clearer?  NO_HZ_STRICT as a superset of
> NO_HZ_FULL?  Or move away from the NO_HZ terminology a bit; after all,
> we're talking about no interrupts of any kind, and maybe NO_HZ is too
> limited in scope?  So, NO_INTERRUPTS?  USERSPACE_ONLY?  Or look
> to vendors who ship bare-metal runtimes and call it BARE_METAL?
> Borrow the Tilera marketing name and call it ZERO_OVERHEAD?
> 
> Maybe BARE_METAL seems most plausible -- after DATAPLANE, to me,
> of course :-)

I like NO_INTERRUPTS.  Simple, direct.


Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-08 Thread Nishanth Aravamudan
On 08.05.2015 [15:47:26 -0700], Andrew Morton wrote:
> On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka  wrote:
> 
> > On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote:
> > > On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote:
> > >>> What I find somewhat worrying though is that we could potentially
> > >>> break the pfmemalloc_watermark_ok() test in situations where
> > >>> zone_reclaimable_pages(zone) == 0 is a transient situation (and not
> > >>> a permanently allocated hugepage). In that case, the throttling is
> > >>> supposed to help system recover, and we might be breaking that
> > >>> ability with this patch, no?
> > >>
> > >> Well, if it's transient, we'll skip it this time through, and once there
> > >> are reclaimable pages, we should notice it again.
> > >>
> > >> I'm not familiar enough with this logic, so I'll read through the code
> > >> again soon to see if your concern is valid, as best I can.
> > >
> > > In reviewing the code, I think that transiently unreclaimable zones will
> > > lead to some higher direct reclaim rates and possible contention, but
> > > shouldn't cause any major harm. The likelihood of that situation, as
> > > well, in a non-reserved memory setup like the one I described, seems
> > > exceedingly low.
> > 
> > OK, I guess when a reasonably configured system has nothing to reclaim, 
> > it's already busted and throttling won't change much.
> > 
> > Consider the patch Acked-by: Vlastimil Babka 
> 
> OK, thanks, I'll move this patch into the queue for 4.2-rc1.

Thank you!

> Or is it important enough to merge into 4.1?

I think 4.2 is sufficient, but I wonder now if I should have included a
stable tag? The issue has been around for a while and there's a
relatively easy workaround (use the per-node sysfs files to manually
round-robin around the exhausted node) in older kernels, so I had
decided against it before.

Thanks,
Nish



Re: [alsa-devel] [PATCH] ASoC: codecs-ac97: Remove rate constraints

2015-05-08 Thread Maciej S. Szmigiero
On 08.05.2015 23:32, Fabio Estevam wrote:
> On Fri, May 8, 2015 at 6:16 PM, Maciej S. Szmigiero
>  wrote:
>> Remove rate constraints from generic ASoC AC'97 CODEC and make
>> it selectable in config.
> 
> Shouldn't this be split in two patches?

I've submitted it as one patch because they are two trivial changes
to make the generic ASoC AC'97 CODEC usable for me outside existing
platform files.

But naturally, I can split them and submit them separately
if that would be better.

>> Supported rates should be detected and constrained anyway by
>> AC'97 generic code - was tested with VT1613 CODEC and iMX6 SSI
>> controller.
> 
> Nice, I would like to test this on a Udoo board. Care to share the dts
> changes? (I know this is off topic for this list ;-) Apart from the
> dts changes: are there still missing patches in linux-next to make
> audio work in Udoo?

I currently have audio running on this board with a kernel based on
vanilla 3.19 and am porting the required changes piece by piece to linux-next.

Changes required on vanilla 3.19 to have it working are:
* AC'97 audio support needs to be added to fsl-asoc-card,

* AC'97 CODEC platform device needs to be instantiated in fsl_ssi,

* IPG clock needs to be enabled in fsl_ssi AC'97 mode,
so AC'97 regs can be accessed,

* A few small fixes for AC'97 mode in fsl_ssi (missing switch label
for format, missing fsl_ssi_dai_probe entry in fsl_ssi_ac97_dai,
etc.).

There is also a problem with this CODEC: it seems to pull samples
for S/PDIF output from time to time even if S/PDIF output is disabled.

By default this requests samples in AC'97 slots 10/11 via SLOTREQ,
which in turn causes SSI to enable these slots in SACCST register
and start sending half of the sound samples there.

The end result is that audio suddenly starts to play two times
too fast.

Currently, I have a workaround of setting S/PDIF slot assignment
in CODEC to first front pair so at least it doesn't affect
playback rate.

If you would like to have these changes or the DT file diff, I can
naturally share them; they just aren't production-quality as of now.

>> This way this driver can be used for platforms which don't need
>> specialized AC'97 CODEC drivers while at the same time avoiding
>> code duplication from implementing equivalent functionality in
>> a controller driver.
>>
>> Resending due to no response received.
> 
> No need to put this in the commit log.
>
> Thanks

Thanks for looking into the patch and best regards,
Maciej Szmigiero



Re: [PATCH 0/6] support "dataplane" mode for nohz_full

2015-05-08 Thread Chris Metcalf

On 5/8/2015 5:22 PM, Steven Rostedt wrote:
> On Fri, 8 May 2015 14:18:24 -0700
> Andrew Morton  wrote:
>
>> On Fri, 8 May 2015 13:58:41 -0400 Chris Metcalf  wrote:
>>
>>> A prctl() option (PR_SET_DATAPLANE) is added
>> Dumb question: what does the term "dataplane" mean in this context?  I
>> can't see the relationship between those words and what this patch
>> does.
> I was thinking the same thing. I haven't gotten around to searching
> DATAPLANE yet.
>
> I would assume we want a name that is more meaningful for what is
> happening.


The text in the commit message and the 0/6 cover letter do try to explain
the concept.  The terminology comes, I think, from networking line cards,
where the "dataplane" is the part of the application that handles all the
fast path processing of network packets, and the "control plane" is the part
that handles routing updates, etc., generally slow-path stuff.  I've probably
just been using the terms so long they seem normal to me.

That said, what would be clearer?  NO_HZ_STRICT as a superset of
NO_HZ_FULL?  Or move away from the NO_HZ terminology a bit; after all,
we're talking about no interrupts of any kind, and maybe NO_HZ is too
limited in scope?  So, NO_INTERRUPTS?  USERSPACE_ONLY?  Or look
to vendors who ship bare-metal runtimes and call it BARE_METAL?
Borrow the Tilera marketing name and call it ZERO_OVERHEAD?

Maybe BARE_METAL seems most plausible -- after DATAPLANE, to me,
of course :-)

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com



Re: [PATCH 3/3] vTPM: support little endian guests

2015-05-08 Thread Hon Ching (Vicky) Lo
Thanks Ashley!

> > The event log in ppc64 arch is always in big endian format. PowerPC
> > supports both little endian and big endian guests. This patch converts
> > the event log entries to guest format.
> 
> I'm a little confused here.  If this patch is to convert the event log 
> entries, why are we converting in the conditional statements?  One example 
> below:
> 
> +   if (((convert_to_host_format(event->event_type) == 0) &&
> +(convert_to_host_format(event->event_size) == 0))
> +   ||
> +   ((v + sizeof(struct tcpa_event) +
> + convert_to_host_format(event->event_size)) > limit))
> 
> >
> > We defined a macro to convert to guest format. In addition,
> > tpm_binary_bios_measurements_show() is modified to parse the event
> > and print each field individually.
> 

We do not convert the whole event entry.  Instead, we're converting only
what's necessary such as pcr_index, event_type and event_size. pcr_value
and event_data are of type u8.  They do not need LE conversion.
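
To illustrate the field-by-field approach described above, here is a small userspace sketch. It is not the patch itself: the struct layout, the 20-byte digest size, and the names demo_be32_to_host and demo_event_header are assumptions made for the example. Only the 32-bit fields pass through a conversion; the byte arrays are read as-is.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a log entry header stored in big-endian order. */
struct demo_event_header {
	uint32_t pcr_index;     /* big endian in the log */
	uint32_t event_type;    /* big endian in the log */
	uint8_t  pcr_value[20]; /* byte array, no conversion needed */
	uint32_t event_size;    /* big endian in the log */
};

/* Portable stand-in for be32_to_cpu(): reassemble the value from its
 * bytes, so the result is the same on little- and big-endian hosts. */
static uint32_t demo_be32_to_host(uint32_t raw)
{
	uint8_t b[4];

	memcpy(b, &raw, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Convert only what is compared or used as a length. */
static uint32_t demo_event_size(const struct demo_event_header *ev)
{
	return demo_be32_to_host(ev->event_size);
}
```

This is also why the conversion shows up inside the conditional: the raw field is never rewritten in place, so every read of event_type or event_size has to go through the conversion.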


> It's nice to have human readable format but it may break existing tools 
> that parse or understand the machine readable format.  Any comments on 
> this anyone?

I got comments on the format, so I tried to put that conditional
statement all on one line, but the 'Lindent' tool puts the lines back into
the above format.


Regards,
Vicky



[resend/refresh PATCH] infiniband: Remove duplicated KERN_ from pr_ uses

2015-05-08 Thread Joe Perches
These KERN_ uses are unnecessary with pr_ and cause
bad logging output so remove them.

Signed-off-by: Joe Perches 
Acked-by: Steve Wise 
---

Originally sent Jan 5.  This is refreshed against -next.

Now sent to the new maintainer too.

Rasmus Villemoes sent a similar patchset a month later.
https://lkml.org/lkml/2015/2/5/724

Rasmus' had an additional change that's useful.
I don't care which is applied.

 drivers/infiniband/hw/cxgb4/device.c | 4 ++--
 drivers/infiniband/hw/mlx4/main.c| 3 +--
 drivers/infiniband/hw/mlx5/qp.c  | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index cf54d69..7e895d7 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -1386,7 +1386,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
  t4_sq_host_wq_pidx(&qp->wq),
  t4_sq_wq_size(&qp->wq));
if (ret) {
-   pr_err(KERN_ERR MOD "%s: Fatal error - "
+   pr_err(MOD "%s: Fatal error - "
   "DB overflow recovery failed - "
   "error syncing SQ qid %u\n",
   pci_name(ctx->lldi.pdev), qp->wq.sq.qid);
@@ -1402,7 +1402,7 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
  t4_rq_wq_size(&qp->wq));
 
if (ret) {
-   pr_err(KERN_ERR MOD "%s: Fatal error - "
+   pr_err(MOD "%s: Fatal error - "
   "DB overflow recovery failed - "
   "error syncing RQ qid %u\n",
   pci_name(ctx->lldi.pdev), qp->wq.rq.qid);
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 57070c5..cc64400 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1569,8 +1569,7 @@ static void reset_gids_task(struct work_struct *work)
   MLX4_CMD_TIME_CLASS_B,
   MLX4_CMD_WRAPPED);
if (err)
-   pr_warn(KERN_WARNING
-   "set port %d command failed\n", gw->port);
+   pr_warn("set port %d command failed\n", gw->port);
}
 
mlx4_free_cmd_mailbox(dev, mailbox);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 4d7024b..d35f62d 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1392,7 +1392,7 @@ static int mlx5_set_path(struct mlx5_ib_dev *dev, const struct ib_ah_attr *ah,
 
if (ah->ah_flags & IB_AH_GRH) {
if (ah->grh.sgid_index >= gen->port[port - 1].gid_table_len) {
-   pr_err(KERN_ERR "sgid_index (%u) too large. max is %d\n",
+   pr_err("sgid_index (%u) too large. max is %d\n",
   ah->grh.sgid_index, gen->port[port - 1].gid_table_len);
return -EINVAL;
}
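
As a rough illustration of why the extra KERN_ prefix is harmful, the sketch below models the way a pr_err()-style macro prepends a log-level marker to the format string. It is a userspace model under stated assumptions: the DEMO_* macros and demo_count_level_prefixes() are hypothetical, and the marker encoding (an SOH byte followed by a level digit) mirrors how recent kernels define KERN_ERR.

```c
#include <stddef.h>

/* Hypothetical stand-ins for KERN_SOH / KERN_ERR. */
#define DEMO_KERN_SOH "\001"
#define DEMO_KERN_ERR DEMO_KERN_SOH "3"

/* Model of what a pr_err()-style wrapper does to its format string:
 * it prepends the level marker by string-literal concatenation. */
#define DEMO_PR_ERR_FMT(fmt) DEMO_KERN_ERR fmt

/* Count the level markers at the start of a format string. */
static size_t demo_count_level_prefixes(const char *s)
{
	size_t n = 0;

	while (s[0] == '\001' && s[1] != '\0') {
		n++;
		s += 2; /* skip the SOH byte and the level character */
	}
	return n;
}
```

With this model, DEMO_PR_ERR_FMT(DEMO_KERN_ERR "x\n") carries two markers while DEMO_PR_ERR_FMT("x\n") carries one. Only the first marker is meant to be consumed as the level, so a second one would end up in the message text, which matches the bad logging output the patch description mentions.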




Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-08 Thread Rik van Riel
On 05/08/2015 04:03 PM, Rik van Riel wrote:

> If the normal scheduler load balancer is moving tasks the
> other way the NUMA balancer is moving them, things will
> not converge, and tasks will have worse memory locality
> than not doing NUMA balancing at all.
> 
> Currently the load balancer has a preference for moving
> tasks to their preferred nodes (NUMA_FAVOUR_HIGHER, true),
> but there is no resistance to moving tasks away from their
> preferred nodes (NUMA_RESIST_LOWER, false).  That setting
> was arrived at after a fair amount of experimenting, and
> is probably correct.

Never mind that. After reading the code several times after
that earlier post, it looks like having NUMA_FAVOUR_HIGHER
enabled does absolutely nothing without also having
NUMA_RESIST_LOWER enabled, at least not for idle balancing.

At first glance, this code looks correct, and even useful:

/*
 * Aggressive migration if:
 * 1) destination numa is preferred
 * 2) task is cache cold, or
 * 3) too many balance attempts have failed.
 */
tsk_cache_hot = task_hot(p, env);
if (!tsk_cache_hot)
tsk_cache_hot = migrate_degrades_locality(p, env);

if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
if (tsk_cache_hot) {
schedstat_inc(env->sd, lb_hot_gained[env->idle]);
schedstat_inc(p,
se.statistics.nr_forced_migrations);
}
return 1;
}

However, with NUMA_RESIST_LOWER disabled (default),
migrate_degrades_locality always returns 0.

Furthermore, sched_migration_cost_ns, which influences task_hot,
is set so small (0.5 ms) that task_hot is likely to always return
false for workloads with frequent sleeps and network latencies,
like a web workload...

In other words, the idle balancing code will treat tasks moving
towards their preferred NUMA node the same as tasks moving away
from their preferred NUMA node. It will move tasks regardless of
NUMA affinity, and can end up in a big fight with the NUMA
balancing code, as you have observed.
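
The decision described above can be modelled in a few lines. This is a simplified sketch of the quoted logic, not the scheduler code: struct demo_migrate_case and demo_can_migrate() are hypothetical names, and the resist_lower argument stands in for the NUMA_RESIST_LOWER feature bit.

```c
#include <stdbool.h>

/* Hypothetical inputs for one candidate task migration. */
struct demo_migrate_case {
	bool cache_hot;          /* what task_hot() would report */
	bool improves_locality;  /* move is towards the preferred node */
	bool degrades_locality;  /* move is away from the preferred node */
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Model of the quoted check. With resist_lower disabled, the
 * "degrades locality" test always reports false, as in the default
 * configuration described in the mail. */
static bool demo_can_migrate(const struct demo_migrate_case *c,
			     bool resist_lower)
{
	bool hot = c->cache_hot;

	if (!hot)
		hot = resist_lower && c->degrades_locality;

	return c->improves_locality || !hot ||
	       c->nr_balance_failed > c->cache_nice_tries;
}
```

In this model, a cache-cold task that would move away from its preferred node is still migrated when resist_lower is false, and is held back only when resist_lower is true.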

I am not sure what to do about this.

Peter?

-- 
All rights reversed


Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-08 Thread Andrew Morton
On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka  wrote:

> On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote:
> > On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote:
> >>> What I find somewhat worrying though is that we could potentially
> >>> break the pfmemalloc_watermark_ok() test in situations where
> >>> zone_reclaimable_pages(zone) == 0 is a transient situation (and not
> >>> a permanently allocated hugepage). In that case, the throttling is
> >>> supposed to help system recover, and we might be breaking that
> >>> ability with this patch, no?
> >>
> >> Well, if it's transient, we'll skip it this time through, and once there
> >> are reclaimable pages, we should notice it again.
> >>
> >> I'm not familiar enough with this logic, so I'll read through the code
> >> again soon to see if your concern is valid, as best I can.
> >
> > In reviewing the code, I think that transiently unreclaimable zones will
> > lead to some higher direct reclaim rates and possible contention, but
> > shouldn't cause any major harm. The likelihood of that situation, as
> > well, in a non-reserved memory setup like the one I described, seems
> > exceedingly low.
> 
> OK, I guess when a reasonably configured system has nothing to reclaim, 
> it's already busted and throttling won't change much.
> 
> Consider the patch Acked-by: Vlastimil Babka 

OK, thanks, I'll move this patch into the queue for 4.2-rc1.

Or is it important enough to merge into 4.1?



From: Nishanth Aravamudan 
Subject: mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

Based upon 675becce15 ("mm: vmscan: do not throttle based on pfmemalloc
reserves if node has no ZONE_NORMAL") from Mel.

We have a system with the following topology:

# numactl -H
available: 3 nodes (0,2-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31
node 0 size: 28273 MB
node 0 free: 27323 MB
node 2 cpus:
node 2 size: 16384 MB
node 2 free: 0 MB
node 3 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 3 size: 30533 MB
node 3 free: 13273 MB
node distances:
node   0   2   3
  0:  10  20  20
  2:  20  10  20
  3:  20  20  10

Node 2 has no free memory, because:
# cat /sys/devices/system/node/node2/hugepages/hugepages-16777216kB/nr_hugepages
1

This leads to the following zoneinfo:

Node 2, zone  DMA
  pages free 0
min  1840
low  2300
high 2760
scanned  0
spanned  262144
present  262144
managed  262144
...
  all_unreclaimable: 1

If one then attempts to allocate some normal 16M hugepages via

echo 37 > /proc/sys/vm/nr_hugepages

The echo never returns and kswapd2 consumes CPU cycles.

This is because throttle_direct_reclaim ends up calling
wait_event(pfmemalloc_wait, pfmemalloc_watermark_ok...). 
pfmemalloc_watermark_ok() in turn checks all zones on the node if there
are any reserves, and if so, then indicates the watermarks are ok, by
seeing if there are sufficient free pages.

675becce15 added a condition already for memoryless nodes.  In this case,
though, the node has memory, it is just all consumed (and not
reclaimable).  Effectively, though, the result is the same on this call to
pfmemalloc_watermark_ok() and thus seems like a reasonable additional
condition.

With this change, the aforementioned 16M hugepage allocation attempt
succeeds and correctly round-robins between Nodes 0 and 3.

Signed-off-by: Nishanth Aravamudan 
Reviewed-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Anton Blanchard 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: Dan Streetman 
Signed-off-by: Andrew Morton 
---

 mm/vmscan.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN mm/vmscan.c~mm-vmscan-do-not-throttle-based-on-pfmemalloc-reserves-if-node-has-no-reclaimable-pages mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-do-not-throttle-based-on-pfmemalloc-reserves-if-node-has-no-reclaimable-pages
+++ a/mm/vmscan.c
@@ -2646,7 +2646,8 @@ static bool pfmemalloc_watermark_ok(pg_d
 
for (i = 0; i <= ZONE_NORMAL; i++) {
zone = &pgdat->node_zones[i];
-   if (!populated_zone(zone))
+   if (!populated_zone(zone) ||
+   zone_reclaimable_pages(zone) == 0)
continue;
 
pfmemalloc_reserve += min_wmark_pages(zone);
_
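
The effect of the added condition can be seen in a small model of the watermark check. This is a simplified userspace sketch, not mm/vmscan.c: struct demo_zone and demo_watermark_ok() are hypothetical names, and the "free pages must exceed half the reserve" rule is taken as an assumption about how the check finishes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone numbers, in pages. */
struct demo_zone {
	bool populated;
	unsigned long reclaimable_pages;
	unsigned long free_pages;
	unsigned long min_wmark_pages;
};

/* Model of the check: sum reserves and free pages over eligible zones.
 * skip_unreclaimable selects the behaviour with the patch applied. */
static bool demo_watermark_ok(const struct demo_zone *zones, size_t nr_zones,
			      bool skip_unreclaimable)
{
	unsigned long reserve = 0;
	unsigned long free_pages = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		const struct demo_zone *z = &zones[i];

		if (!z->populated)
			continue;
		if (skip_unreclaimable && z->reclaimable_pages == 0)
			continue;

		reserve += z->min_wmark_pages;
		free_pages += z->free_pages;
	}

	/* No reserves to protect: do not throttle. */
	if (reserve == 0)
		return true;

	return free_pages > reserve / 2;
}
```

Fed with the Node 2 numbers from the zoneinfo above (0 free pages, min 1840, nothing reclaimable), the model reports "not ok" without the extra condition, so a waiter would stay throttled, and "ok" with it.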



[RFC] init.h: mark init functions hot instead of cold

2015-05-08 Thread Rasmus Villemoes
attribute((cold)) causes gcc to optimize the function for size rather
than speed. But since __init functions will be discarded anyway, I
don't see why memory should be a concern. On the contrary, everybody
wants their box to boot faster. Using the opposite attribute, hot,
causes gcc to optimize the functions more aggressively, possibly at
the expense of larger (.init).text. A completely unscientific test
showed about 2% faster boot time: I booted a kernel in qemu with and
without this patch five times each; the boot times were very stable in
each case, so I think the 2% is ok, but of course only applies to that
specific .config running in a virtual machine on my hardware.

__cold also means any path to a call of such a function is treated as
unlikely, while __hot means the opposite. For intra-__init calls, I
don't think that matters. That leaves calls from __ref functions to
__init functions. While I also think it doesn't matter in that case,
I'm sure someone can tell me why I'm wrong.

Signed-off-by: Rasmus Villemoes 
---
 include/linux/compiler-gcc4.h | 6 ++
 include/linux/compiler.h  | 8 
 include/linux/init.h  | 2 +-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 769e19864632..b5e5c96d538e 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -27,11 +27,9 @@
Early snapshots of gcc 4.3 don't support this and we can't detect this
in the preprocessor, but we can live with this because they're unreleased.
Maketime probing would be overkill here.
-
-   gcc also has a __attribute__((__hot__)) to move hot functions into
-   a special section, but I don't see any sense in this right now in
-   the kernel context */
+ */
 #define __cold __attribute__((__cold__))
+#define __hot  __attribute__((__hot__))
 
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0e41ca0e5927..50dbdc8d570d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -357,6 +357,14 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 #define __cold
 #endif
 
+/*
+ * The opposite of __cold: Tell the compiler to optimize the function
+ * more aggressively, and treat paths leading to a call as likely.
+ */
+#ifndef __hot
+#define __hot
+#endif
+
 /* Simple shorthand for a section definition */
 #ifndef __section
 # define __section(S) __attribute__ ((__section__(#S)))
diff --git a/include/linux/init.h b/include/linux/init.h
index 21b6d768edd7..b6153a612ea0 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -39,7 +39,7 @@
 
 /* These are for everybody (although not all archs will actually
discard it in modules) */
-#define __init __section(.init.text) __cold notrace
+#define __init __section(.init.text) __hot notrace
 #define __initdata __section(.init.data)
 #define __initconst __constsection(.init.rodata)
 #define __exitdata __section(.exit.data)
-- 
2.1.3
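
For anyone who wants to experiment with the two attributes outside the kernel, here is a minimal sketch. It assumes a GCC- or Clang-compatible compiler; the DEMO_HOT and DEMO_COLD macros and both functions are hypothetical and only mirror the fallback pattern the patch adds to compiler.h.

```c
#include <stddef.h>

/* Fall back to empty macros on compilers without these attributes. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_HOT  __attribute__((__hot__))
#define DEMO_COLD __attribute__((__cold__))
#else
#define DEMO_HOT
#define DEMO_COLD
#endif

static int demo_error_count;

/* Cold: optimized for size; paths leading to a call are treated as unlikely. */
DEMO_COLD static void demo_report_error(void)
{
	demo_error_count++;
}

/* Hot: the compiler is asked to optimize this function more aggressively. */
DEMO_HOT static long demo_sum(const int *values, size_t n)
{
	long total = 0;

	if (values == NULL) {
		demo_report_error();
		return 0;
	}
	for (size_t i = 0; i < n; i++)
		total += values[i];
	return total;
}
```

Building this with and without the attributes and comparing the generated assembly is one way to see what changes; the result depends on compiler version and flags, so no particular outcome is claimed here.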



[PATCH] Documentation, ABI: Update contact for L3 cache index disable

2015-05-08 Thread Aravind Gopalakrishnan
The mailing list disc...@x86-64.org is now defunct.
Using x...@kernel.org in its place.

Signed-off-by: Aravind Gopalakrishnan 
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 99983e6..f1c46d0 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -162,7 +162,7 @@ Description: Discover CPUs in the same CPU frequency coordination domain
 What:  /sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1}
 Date:  August 2008
 KernelVersion: 2.6.27
-Contact:   disc...@x86-64.org
+Contact:   x...@kernel.org
 Description:   Disable L3 cache indices
 
These files exist in every CPU's cache/index3 directory. Each
-- 
2.4.0



Re: [PATCH 3/3] vTPM: support little endian guests

2015-05-08 Thread Ashley Lai



The event log in ppc64 arch is always in big endian format. PowerPC
supports both little endian and big endian guests. This patch converts
the event log entries to guest format.


I'm a little confused here.  If this patch is to convert the event log
entries, why are we converting in the conditional statements?  One example
below:


+   if (((convert_to_host_format(event->event_type) == 0) &&
+(convert_to_host_format(event->event_size) == 0))
+   ||
+   ((v + sizeof(struct tcpa_event) +
+ convert_to_host_format(event->event_size)) > limit))



We defined a macro to convert to guest format. In addition,
tpm_binary_bios_measurements_show() is modified to parse the event
and print each field individually.


It's nice to have human readable format but it may break existing tools 
that parse or understand the machine readable format.  Any comments on 
this anyone?





diff --git a/drivers/char/tpm/tpm_eventlog.c b/drivers/char/tpm/tpm_eventlog.c
index e77d8c1..1b62c52 100644
--- a/drivers/char/tpm/tpm_eventlog.c
+++ b/drivers/char/tpm/tpm_eventlog.c
@@ -28,6 +28,11 @@
#include "tpm.h"
#include "tpm_eventlog.h"

+#ifdef CONFIG_PPC64
+#define convert_to_host_format(x) be32_to_cpu(x)
+#else
+#define convert_to_host_format(x) x
+#endif


This can go in the header file tpm_eventlog.h
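
As background for the question about conversions inside conditionals: a
comparison against zero gives the same answer in either byte order, but a size
used in a bounds check does not. A minimal userspace sketch, assuming a
simplified two-field event header; demo_be32_to_host(), struct demo_event and
demo_event_fits() are hypothetical stand-ins, not the driver's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for be32_to_cpu(): interpret the four bytes stored
 * in 'raw' as a big endian value, whatever the host byte order is.
 */
uint32_t demo_be32_to_host(uint32_t raw)
{
	uint8_t b[4];

	memcpy(b, &raw, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Simplified event header; both fields are stored big endian in the log. */
struct demo_event {
	uint32_t event_type;
	uint32_t event_size;
};

/* Returns 1 if the header plus its payload ends at or before 'limit'. */
int demo_event_fits(size_t offset, const struct demo_event *ev, size_t limit)
{
	return offset + sizeof(*ev) + demo_be32_to_host(ev->event_size) <= limit;
}
```

On a little endian host, skipping the conversion in demo_event_fits() would
read a 16-byte payload as 0x10000000 bytes, so the bounds check would reject
a valid entry.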





Re: [PATCH V2 1/2] mm/thp: Split out pmd collapse flush into a separate function

2015-05-08 Thread Andrew Morton
On Thu,  7 May 2015 12:53:27 +0530 "Aneesh Kumar K.V" wrote:

> After this patch pmdp_* functions operate only on hugepage pte,
> and not on regular pmd_t values pointing to page table.
> 

The patch looks like a pretty safe no-op for non-powerpc?

> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -576,6 +576,10 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr,
>  extern void pmdp_splitting_flush(struct vm_area_struct *vma,
>unsigned long address, pmd_t *pmdp);
>  
> +#define __HAVE_ARCH_PMDP_COLLAPSE_FLUSH
> +extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
> +  unsigned long address, pmd_t *pmdp);
> +

The fashionable way of doing this is

extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp);
#define pmdp_collapse_flush pmdp_collapse_flush

then, elsewhere,

#ifndef pmdp_collapse_flush
static inline pmd_t pmdp_collapse_flush(...) {}
#define pmdp_collapse_flush pmdp_collapse_flush
#endif

It avoids introducing a second (ugly) symbol into the kernel.
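
The pattern above can be tried outside the kernel. A minimal sketch, assuming
an int in place of pmd_t; demo_collapse_flush and DEMO_ARCH_HAS_COLLAPSE_FLUSH
are hypothetical names used only to show how the override is selected:

```c
#include <assert.h>

/* "Arch" header (hypothetical): supplies its own implementation. */
#ifdef DEMO_ARCH_HAS_COLLAPSE_FLUSH
static inline int demo_collapse_flush(int pmd)
{
	return pmd | 0x100;	/* stand-in for arch-specific work */
}
/* Defining the name as itself lets generic code test for it with #ifndef. */
#define demo_collapse_flush demo_collapse_flush
#endif

/* Generic header: fallback, used only if no arch version was defined. */
#ifndef demo_collapse_flush
static inline int demo_collapse_flush(int pmd)
{
	return pmd;		/* stand-in for the generic behaviour */
}
#define demo_collapse_flush demo_collapse_flush
#endif
```

Built as is, the generic version is used; building with
-DDEMO_ARCH_HAS_COLLAPSE_FLUSH selects the "arch" one. The function name
doubles as the feature test, so no separate __HAVE_ARCH_* symbol is needed.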



Re: [PATCH RFC] vfs: add a O_NOMTIME flag

2015-05-08 Thread Sage Weil
On Sat, 9 May 2015, Dave Chinner wrote:
> On Thu, May 07, 2015 at 09:23:24PM -0400, Trond Myklebust wrote:
> > On Thu, May 7, 2015 at 9:01 PM, Sage Weil  wrote:
> > > On Thu, 7 May 2015, Zach Brown wrote:
> > >> On Thu, May 07, 2015 at 10:26:17AM +1000, Dave Chinner wrote:
> > >> > On Wed, May 06, 2015 at 03:00:12PM -0700, Zach Brown wrote:
> > >> > > The criteria for using O_NOMTIME is the same as for using O_NOATIME:
> > >> > > owning the file or having the CAP_FOWNER capability.  If we're not
> > >> > > comfortable allowing owners to prevent mtime/ctime updates then we
> > >> > > should add a tunable to allow O_NOMTIME.  Maybe a mount option?
> > >> >
> > >> > I dislike "turn off safety for performance" options because Joe
> > >> > SpeedRacer will always select performance over safety.
> > >>
> > >> Well, for ceph there's no safety concern.  They never use cmtime in
> > >> these files.
> > >>
> > >> So are you suggesting not implementing this and making them rework their
> > >> IO paths to avoid the fs maintaining mtime so that we don't give Joe
> > >> Speedracer more rope?  Or are we talking about adding some speed bumps
> > >> that ceph can flip on that might give Joe Speedracer pause?
> > >
> > > I think this is the fundamental question: who do we give the ammunition
> > > to, the user or app writer, or the sysadmin?
> > >
> > > One might argue that we gave the user a similar power with O_NOATIME (the
> > > power to break applications that assume atime is accurate).  Here we give
> > > developers/users the power to not update mtime and suffer the consequences
> > > (like, obviously, breaking mtime-based backups).  It should be pretty
> > > obvious to anyone using the flag what the consequences are.
> > >
> > > Note that we can suffer similar lapses in mtime with fdatasync followed by
> > > a system crash.  And as Andy points out it's semi-broken for writable
> > > mmap.  The crash case is obviously a slightly different thing, but the
> > > idea that mtime can't always be trusted certainly isn't crazy talk.
> > >
> > > Or, we can be conservative and require a mount option so that the admin
> > > has to explicitly allow behavior that might break some existing
> > > assumptions about mtime/ctime ('-o user_noatime' I guess?).
> > >
> > > I'm happy either way, so long as in the end an unprivileged ceph daemon
> > > avoids the useless work.  In our case we always own the entire mount/disk,
> > > so a mount option is just fine.
> > >
> > 
> > So, what is the expectation here for filesystems that cannot support
> > this flag? NFSv3 in particular would break pretty catastrophically if
> > someone decided on a whim to turn off mtime: they will have turned off
> > the client's ability to detect cache incoherencies.
> 
> It's worse than that, now that I think about it. I think nomtime
> will break nfsv4 as the I_VERSION check is done *after* the
> NO[C]MTIME checks. e.g. the atomic change count used to detect file
> changes is only updated during the mtime update on write() calls in
> XFS. i.e. when the timestamp is changed, a transaction to change
> mtime is run, and that transaction commit bumps the change count.
> 
> So cutting out mtime updates at the VFS will prevent XFS and other
> I_VERSION aware filesystems from updating the change count that
> NFSv4 clients rely on to detect foreign data changes in a file.
> 
> Not sure what to do here, because the current NOCMTIME
> implementation intentionally cuts out the timestamp update because
> its usage is fully invisible IO. i.e. it is used by utilities like
> xfs_fsr and HSMs to move data into and out of files without the
> application being able to detect the data movement in any way. These
> are not data modification operations, though - the file contents as
> read by the application do not change despite the fact we are moving
> data in and out of the file. In this case we don't want timestamps
> or change counters to change on the data movement, so I think we've
> actually got a difference in behaviour here between O_NOMTIME and
> O_NOCMTIME, right?
> 
> i.e. for nfsv4 sanity O_NOMTIME still needs to bump I_VERSION on
> write, just not modify the timestamp? In which case, not modifying
> the timestamps gains us nothing, because the inode is still dirtied?

Right: if we dirty the inode we've defeated the purpose of the patch.

> The list of caveats on O_NOMTIME seems to be growing...

...and remain consistent with our goals.  We couldn't care less if NFS or 
backup software or anything else doesn't notice these changes.  This is 
private data that is wholly managed by the ceph daemon.  The goal is to 
derive *some* value from the file system and avoid reimplementing it in 
userspace (without the bits we don't need).

I'm sure you realize what we're trying to achieve is the same "invisible IO"
that the XFS open by handle ioctls do by default.  Would you be more
comfortable if this option were only available to the generic
open_by_handle syscall, and not 

Re: [PATCH V2 2/2] powerpc/thp: Serialize pmd clear against a linux page table walk.

2015-05-08 Thread Andrew Morton
On Thu,  7 May 2015 12:53:28 +0530 "Aneesh Kumar K.V" wrote:

> Serialize against find_linux_pte_or_hugepte which does lock-less
> lookup in page tables with local interrupts disabled. For huge pages
> it casts pmd_t to pte_t. Since format of pte_t is different from
> pmd_t we want to prevent transit from pmd pointing to page table
> to pmd pointing to huge page (and back) while interrupts are disabled.
> We clear pmd to possibly replace it with page table pointer in
> different code paths. So make sure we wait for the parallel
> find_linux_pte_or_hugepte to finish.

I'm not seeing here any description of the problem which is being
fixed.  Does the patch make the machine faster?  Does the machine
crash?


Re: [PATCH RFC] vfs: add a O_NOMTIME flag

2015-05-08 Thread Dave Chinner
On Thu, May 07, 2015 at 09:23:24PM -0400, Trond Myklebust wrote:
> On Thu, May 7, 2015 at 9:01 PM, Sage Weil  wrote:
> > On Thu, 7 May 2015, Zach Brown wrote:
> >> On Thu, May 07, 2015 at 10:26:17AM +1000, Dave Chinner wrote:
> >> > On Wed, May 06, 2015 at 03:00:12PM -0700, Zach Brown wrote:
> >> > > The criteria for using O_NOMTIME is the same as for using O_NOATIME:
> >> > > owning the file or having the CAP_FOWNER capability.  If we're not
> >> > > comfortable allowing owners to prevent mtime/ctime updates then we
> >> > > should add a tunable to allow O_NOMTIME.  Maybe a mount option?
> >> >
> >> > I dislike "turn off safety for performance" options because Joe
> >> > SpeedRacer will always select performance over safety.
> >>
> >> Well, for ceph there's no safety concern.  They never use cmtime in
> >> these files.
> >>
> >> So are you suggesting not implementing this and making them rework their
> >> IO paths to avoid the fs maintaining mtime so that we don't give Joe
> >> Speedracer more rope?  Or are we talking about adding some speed bumps
> >> that ceph can flip on that might give Joe Speedracer pause?
> >
> > I think this is the fundamental question: who do we give the ammunition
> > to, the user or app writer, or the sysadmin?
> >
> > One might argue that we gave the user a similar power with O_NOATIME (the
> > power to break applications that assume atime is accurate).  Here we give
> > developers/users the power to not update mtime and suffer the consequences
> > (like, obviously, breaking mtime-based backups).  It should be pretty
> > obvious to anyone using the flag what the consequences are.
> >
> > Note that we can suffer similar lapses in mtime with fdatasync followed by
> > a system crash.  And as Andy points out it's semi-broken for writable
> > mmap.  The crash case is obviously a slightly different thing, but the
> > idea that mtime can't always be trusted certainly isn't crazy talk.
> >
> > Or, we can be conservative and require a mount option so that the admin
> > has to explicitly allow behavior that might break some existing
> > assumptions about mtime/ctime ('-o user_noatime' I guess?).
> >
> > I'm happy either way, so long as in the end an unprivileged ceph daemon
> > avoids the useless work.  In our case we always own the entire mount/disk,
> > so a mount option is just fine.
> >
> 
> So, what is the expectation here for filesystems that cannot support
> this flag? NFSv3 in particular would break pretty catastrophically if
> someone decided on a whim to turn off mtime: they will have turned off
> the client's ability to detect cache incoherencies.

It's worse than that, now that I think about it. I think nomtime
will break nfsv4 as the I_VERSION check is done *after* the
NO[C]MTIME checks. e.g. the atomic change count used to detect file
changes is only updated during the mtime update on write() calls in
XFS. i.e. when the timestamp is changed, a transaction to change
mtime is run, and that transaction commit bumps the change count.

So cutting out mtime updates at the VFS will prevent XFS and other
I_VERSION aware filesystems from updating the change count that
NFSv4 clients rely on to detect foreign data changes in a file.

Not sure what to do here, because the current NOCMTIME
implementation intentionally cuts out the timestamp update because
its usage is fully invisible IO. i.e. it is used by utilities like
xfs_fsr and HSMs to move data into and out of files without the
application being able to detect the data movement in any way. These
are not data modification operations, though - the file contents as
read by the application do not change despite the fact we are moving
data in and out of the file. In this case we don't want timestamps
or change counters to change on the data movement, so I think we've
actually got a difference in behaviour here between O_NOMTIME and
O_NOCMTIME, right?

i.e. for nfsv4 sanity O_NOMTIME still needs to bump I_VERSION on
write, just not modify the timestamp? In which case, not modifying
the timestamps gains us nothing, because the inode is still dirtied?

The list of caveats on O_NOMTIME seems to be growing...
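
The trade-off described above can be modelled in a few lines. This is a toy
sketch, not VFS or XFS code: struct toy_inode and both write helpers are
hypothetical, and change_count merely stands in for I_VERSION.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	long mtime;
	unsigned long change_count;	/* stand-in for I_VERSION */
	bool dirty;
};

/*
 * Variant A: the change count is bumped only as part of the timestamp
 * update, so skipping mtime also hides the write from change-count users.
 */
void toy_write_coupled(struct toy_inode *ino, long now, bool nomtime)
{
	if (nomtime)
		return;
	ino->mtime = now;
	ino->change_count++;
	ino->dirty = true;
}

/*
 * Variant B: the change count is always bumped. Change-count users stay
 * coherent, but the inode is dirtied on every write regardless of nomtime.
 */
void toy_write_decoupled(struct toy_inode *ino, long now, bool nomtime)
{
	if (!nomtime)
		ino->mtime = now;
	ino->change_count++;
	ino->dirty = true;
}
```

Variant A leaves the inode clean but breaks anything that watches the change
count; variant B keeps the count correct but gives up the saving that
skipping the timestamp was meant to provide.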

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH 3/5] selftest/futex: Increment ksft pass and fail counters

2015-05-08 Thread Darren Hart
Add kselftest.h to logging.h and increment the pass and fail counters as
part of the print_result routine which is called by all futex tests.

Cc: Shuah Khan 
Cc: linux-...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Davidlohr Bueso 
Cc: KOSAKI Motohiro 
Signed-off-by: Darren Hart 
---
 tools/testing/selftests/futex/functional/Makefile | 2 +-
 tools/testing/selftests/futex/include/logging.h   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index fb96927..e64d43b 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -1,4 +1,4 @@
-INCLUDES := -I../include
+INCLUDES := -I../include -I../../
 CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE $(INCLUDES)
 LDFLAGS := $(LDFLAGS) -lpthread -lrt
 
diff --git a/tools/testing/selftests/futex/include/logging.h b/tools/testing/selftests/futex/include/logging.h
index f6ed5c2..014aa01 100644
--- a/tools/testing/selftests/futex/include/logging.h
+++ b/tools/testing/selftests/futex/include/logging.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include "kselftest.h"
 
 /*
  * Define PASS, ERROR, and FAIL strings with and without color escape
@@ -111,12 +112,14 @@ void print_result(int ret)
 
switch (ret) {
case RET_PASS:
+   ksft_inc_pass_cnt();
result = PASS;
break;
case RET_ERROR:
result = ERROR;
break;
case RET_FAIL:
+   ksft_inc_fail_cnt();
result = FAIL;
break;
}
-- 
2.1.4
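
The effect of the hunk above can be sketched in userspace. The result codes,
struct demo_count and demo_print_result() below are hypothetical stand-ins
(the numeric values in particular are assumptions), mirroring only the shape
of the switch statement in the patch:

```c
#include <assert.h>

/* Hypothetical result codes; the numeric values are assumptions. */
enum { DEMO_RET_PASS = 0, DEMO_RET_ERROR = -1, DEMO_RET_FAIL = -2 };

/* Stand-in for the kselftest pass/fail counters. */
struct demo_count {
	unsigned int pass;
	unsigned int fail;
};

struct demo_count demo_cnt;

/* Maps a result code to its label and bumps the matching counter. */
const char *demo_print_result(int ret)
{
	const char *result = "UNKNOWN";

	switch (ret) {
	case DEMO_RET_PASS:
		demo_cnt.pass++;
		result = "PASS";
		break;
	case DEMO_RET_ERROR:
		/* As in the patch, errors touch neither counter. */
		result = "ERROR";
		break;
	case DEMO_RET_FAIL:
		demo_cnt.fail++;
		result = "FAIL";
		break;
	}
	return result;
}
```

Note that an ERROR result is reported but not counted, so the pass and fail
totals can add up to fewer than the number of tests run.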



[PATCH 2/5] selftest/futex: Update Makefile to use lib.mk

2015-05-08 Thread Darren Hart
Adapt the futextest Makefiles to use lib.mk macros for RUN_TESTS and
EMIT_TESTS. For now, we reuse the run.sh mechanism provided by
futextest. This doesn't provide the standard selftests: [PASS|FAIL]
format, but the tests provide very similar output already.

This results in the run_kselftest.sh script for futexes including a
single line: ./run.sh

Cc: Shuah Khan 
Cc: linux-...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Davidlohr Bueso 
Cc: KOSAKI Motohiro 
Signed-off-by: Darren Hart 
---
 tools/testing/selftests/futex/Makefile| 20 +++-
 tools/testing/selftests/futex/functional/Makefile |  5 +++--
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/futex/Makefile b/tools/testing/selftests/futex/Makefile
index 2c26d59..6a17529 100644
--- a/tools/testing/selftests/futex/Makefile
+++ b/tools/testing/selftests/futex/Makefile
@@ -1,11 +1,29 @@
 SUBDIRS := functional
 
+TEST_PROGS := run.sh
+
 .PHONY: all clean
 all:
for DIR in $(SUBDIRS); do $(MAKE) -C $$DIR $@ ; done
 
-run_tests: all
+include ../lib.mk
+
+override define RUN_TESTS
./run.sh
+endef
+
+override define INSTALL_RULE
+   mkdir -p $(INSTALL_PATH)
+   install -t $(INSTALL_PATH) $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES)
+
+   @for SUBDIR in $(SUBDIRS); do \
+   $(MAKE) -C $$SUBDIR INSTALL_PATH=$(INSTALL_PATH)/$$SUBDIR install; \
+   done;
+endef
+
+override define EMIT_TESTS
+   echo "./run.sh"
+endef
 
 clean:
for DIR in $(SUBDIRS); do $(MAKE) -C $$DIR $@ ; done
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index 4098340..fb96927 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -12,13 +12,14 @@ TARGETS := \
futex_wait_uninitialized_heap \
futex_wait_private_mapped_file
 
+TEST_PROGS := $(TARGETS) run.sh
+
 .PHONY: all clean
 all: $(TARGETS)
 
 $(TARGETS): $(HEADERS)
 
-run_tests: all
-   ./run.sh
+include ../../lib.mk
 
 clean:
rm -f $(TARGETS)
-- 
2.1.4



[PATCH 1/5] selftests: Add futex functional tests

2015-05-08 Thread Darren Hart
The futextest testsuite [1] provides functional, stress, and
performance tests for the various futex op codes. Those tests will be of
more use to futex developers if they are included with the kernel
source.

Copy the core infrastructure and the functional tests into selftests,
but adapt them for inclusion in the kernel:

- Update the Makefile to include the run_tests target, remove reference
  to the performance and stress tests from the contributed sources.
- Replace my dead IBM email address with my current Intel email address.
- Remove the warranty and write-to paragraphs from the license blurbs.
- Remove the NAME section as the filename is easily determined. ;-)
- Make the whitespace usage consistent in a couple of places.
- Cleanup various CodingStyle violations.

A future effort will explore moving the performance and stress tests
into the kernel.

1. http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git

Cc: Shuah Khan 
Cc: linux-...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Davidlohr Bueso 
Cc: KOSAKI Motohiro 
Signed-off-by: Darren Hart 
---
 tools/testing/selftests/futex/Makefile |  11 +
 tools/testing/selftests/futex/README   |  62 
 tools/testing/selftests/futex/functional/Makefile  |  24 ++
 .../selftests/futex/functional/futex_requeue_pi.c  | 409 +
 .../functional/futex_requeue_pi_mismatched_ops.c   | 135 +++
 .../functional/futex_requeue_pi_signal_restart.c   | 223 +++
 .../functional/futex_wait_private_mapped_file.c| 125 +++
 .../futex/functional/futex_wait_timeout.c  |  86 +
 .../functional/futex_wait_uninitialized_heap.c | 124 +++
 .../futex/functional/futex_wait_wouldblock.c   |  79 
 tools/testing/selftests/futex/functional/run.sh|  80 
 tools/testing/selftests/futex/include/atomic.h |  83 +
 tools/testing/selftests/futex/include/futextest.h  | 266 ++
 tools/testing/selftests/futex/include/logging.h| 150 
 tools/testing/selftests/futex/run.sh   |  33 ++
 15 files changed, 1890 insertions(+)
 create mode 100644 tools/testing/selftests/futex/Makefile
 create mode 100644 tools/testing/selftests/futex/README
 create mode 100644 tools/testing/selftests/futex/functional/Makefile
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi_mismatched_ops.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi_signal_restart.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_private_mapped_file.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_timeout.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_uninitialized_heap.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
 create mode 100755 tools/testing/selftests/futex/functional/run.sh
 create mode 100644 tools/testing/selftests/futex/include/atomic.h
 create mode 100644 tools/testing/selftests/futex/include/futextest.h
 create mode 100644 tools/testing/selftests/futex/include/logging.h
 create mode 100755 tools/testing/selftests/futex/run.sh

diff --git a/tools/testing/selftests/futex/Makefile b/tools/testing/selftests/futex/Makefile
new file mode 100644
index 000..2c26d59
--- /dev/null
+++ b/tools/testing/selftests/futex/Makefile
@@ -0,0 +1,11 @@
+SUBDIRS := functional
+
+.PHONY: all clean
+all:
+   for DIR in $(SUBDIRS); do $(MAKE) -C $$DIR $@ ; done
+
+run_tests: all
+   ./run.sh
+
+clean:
+   for DIR in $(SUBDIRS); do $(MAKE) -C $$DIR $@ ; done
diff --git a/tools/testing/selftests/futex/README b/tools/testing/selftests/futex/README
new file mode 100644
index 000..3224a04
--- /dev/null
+++ b/tools/testing/selftests/futex/README
@@ -0,0 +1,62 @@
+Futex Test
+==========
+Futex Test is intended to thoroughly test the Linux kernel futex system call
+API.
+
+Functional tests shall test the documented behavior of the futex operation
+code under test. This includes checking for proper behavior under normal use,
+odd corner cases, regression tests, and abject abuse and misuse.
+
+Futextest will also provide example implementation of mutual exclusion
+primitives. These can be used as is in user applications or can serve as
+examples for system libraries. These will likely be added to either a new lib/
+directory or purely as header files under include/, I'm leaning toward the
+latter.
+
+Quick Start
+-----------
+# make
+# ./run.sh
+
+Design and Implementation Goals
+-------------------------------
+o Tests should be as self contained as is practical so as to facilitate sharing
+  the individual tests on mailing list discussions and bug reports.
+o The build system shall remain as simple as possible, avoiding any archive or
+  shared object building and linking.
+o Where possible, any helper functions or 

[PATCH 4/5] selftest: Add futex tests to the top-level Makefile

2015-05-08 Thread Darren Hart
Enable futex tests to be built and run with the make kselftest and
associated targets.

Most of the tests require escalated privileges. These return ERROR, and
run.sh continues.

Cc: Shuah Khan 
Cc: linux-...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Davidlohr Bueso 
Cc: KOSAKI Motohiro 
Signed-off-by: Darren Hart 
---
 tools/testing/selftests/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 95abddc..ebac6b8 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -4,6 +4,7 @@ TARGETS += efivarfs
 TARGETS += exec
 TARGETS += firmware
 TARGETS += ftrace
+TARGETS += futex
 TARGETS += kcmp
 TARGETS += memfd
 TARGETS += memory-hotplug
-- 
2.1.4



[PATCH 5/5] kselftest: Add exit code defines

2015-05-08 Thread Darren Hart
Define the exit codes with KSFT_PASS and similar so tests can use these
directly if they choose. Also enable harnesses and other tooling to use
the defines instead of hardcoding the return codes.

Cc: Shuah Khan 
Cc: linux-...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Davidlohr Bueso 
Cc: KOSAKI Motohiro 
Signed-off-by: Darren Hart 
---
 tools/testing/selftests/kselftest.h | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 572c888..ef1c80d 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -13,6 +13,13 @@
 #include 
 #include 
 
+/* define kselftest exit codes */
+#define KSFT_PASS  0
+#define KSFT_FAIL  1
+#define KSFT_XFAIL 2
+#define KSFT_XPASS 3
+#define KSFT_SKIP  4
+
 /* counters */
 struct ksft_count {
unsigned int ksft_pass;
@@ -40,23 +47,23 @@ static inline void ksft_print_cnts(void)
 
 static inline int ksft_exit_pass(void)
 {
-   exit(0);
+   exit(KSFT_PASS);
 }
 static inline int ksft_exit_fail(void)
 {
-   exit(1);
+   exit(KSFT_FAIL);
 }
 static inline int ksft_exit_xfail(void)
 {
-   exit(2);
+   exit(KSFT_XFAIL);
 }
 static inline int ksft_exit_xpass(void)
 {
-   exit(3);
+   exit(KSFT_XPASS);
 }
 static inline int ksft_exit_skip(void)
 {
-   exit(4);
+   exit(KSFT_SKIP);
 }
 
 #endif /* __KSELFTEST_H */
-- 
2.1.4
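
As an illustration of how a harness might consume these defines instead of
hardcoding numbers, here is a minimal sketch. The KSFT_* values are copied
from the patch above; demo_ksft_name() and its labels are hypothetical and
not part of kselftest.h:

```c
#include <assert.h>
#include <string.h>

/* Exit codes as defined in the patch above. */
#define KSFT_PASS  0
#define KSFT_FAIL  1
#define KSFT_XFAIL 2
#define KSFT_XPASS 3
#define KSFT_SKIP  4

/* Hypothetical helper: map a test's exit code to a printable label. */
const char *demo_ksft_name(int code)
{
	switch (code) {
	case KSFT_PASS:
		return "PASS";
	case KSFT_FAIL:
		return "FAIL";
	case KSFT_XFAIL:
		return "XFAIL";
	case KSFT_XPASS:
		return "XPASS";
	case KSFT_SKIP:
		return "SKIP";
	default:
		return "UNKNOWN";
	}
}
```

A harness would typically pass the exit status of a finished child process to
such a helper; any value outside the five defined codes falls through to
"UNKNOWN".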



[GIT PULL v2] selftest: Add futex functional tests

2015-05-08 Thread Darren Hart
Hi Shuah,

This series begins the process of migrating my futextest tests into kselftest.
I've started with only the functional tests, as the performance and stress may
not be appropriate for kselftest as they stand.

I cleaned up various complaints from checkpatch, but I ignored others that would
require significant rework of the testcases, such as not using volatile and not
creating new typedefs.

Since v1:
Avoid checkpatch errors on 1/5 by:
 - combining a later patch which did substantial cleanup.
 - removing file-local typedefs and replacing with structs
 - correcting all >80 char lines, except for quoted strings and header boiler
   plate due to long email addresses

I did *not* make changes for the following:
 - Use of new typedefs for types futex_t and atomic_t as they are used throughout
   the test suite and I consider them to be worthwhile.
 - Use of volatile as the warning is about use of volatile in kernel code. The
   usage in futextest is correct, as an indicator that other threads may modify
   the value.
 - Adding parentheses around complex defines as it would break one use case and
   change the behavior of another.

The patches will follow, but I'm providing a pull request for your convenience
as well.

The following changes since commit b787f68c36d49bb1d9236f403813641efa74a031:

  Linux 4.1-rc1 (2015-04-26 17:59:10 -0700)

are available in the git repository at:

  git://git.infradead.org/users/dvhart/linux.git futextest-v2

Darren Hart (5):
  selftests: Add futex functional tests
  selftest/futex: Update Makefile to use lib.mk
  selftest/futex: Increment ksft pass and fail counters
  selftest: Add futex tests to the top-level Makefile
  kselftest: Add exit code defines

 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/futex/Makefile |  29 ++
 tools/testing/selftests/futex/README   |  62 
 tools/testing/selftests/futex/functional/Makefile  |  25 ++
 .../selftests/futex/functional/futex_requeue_pi.c  | 409 +
 .../functional/futex_requeue_pi_mismatched_ops.c   | 135 +++
 .../functional/futex_requeue_pi_signal_restart.c   | 223 +++
 .../functional/futex_wait_private_mapped_file.c| 125 +++
 .../futex/functional/futex_wait_timeout.c  |  86 +
 .../functional/futex_wait_uninitialized_heap.c | 124 +++
 .../futex/functional/futex_wait_wouldblock.c   |  79 
 tools/testing/selftests/futex/functional/run.sh|  80 
 tools/testing/selftests/futex/include/atomic.h |  83 +
 tools/testing/selftests/futex/include/futextest.h  | 266 ++
 tools/testing/selftests/futex/include/logging.h| 153 
 tools/testing/selftests/futex/run.sh   |  33 ++
 tools/testing/selftests/kselftest.h|  17 +-
 17 files changed, 1925 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/futex/Makefile
 create mode 100644 tools/testing/selftests/futex/README
 create mode 100644 tools/testing/selftests/futex/functional/Makefile
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi_mismatched_ops.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_requeue_pi_signal_restart.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_private_mapped_file.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_timeout.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_uninitialized_heap.c
 create mode 100644 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
 create mode 100755 tools/testing/selftests/futex/functional/run.sh
 create mode 100644 tools/testing/selftests/futex/include/atomic.h
 create mode 100644 tools/testing/selftests/futex/include/futextest.h
 create mode 100644 tools/testing/selftests/futex/include/logging.h
 create mode 100755 tools/testing/selftests/futex/run.sh

-- 
2.1.4



Re: [PATCH v3 06/10] mtd: brcmstb_nand: add SoC-specific support

2015-05-08 Thread Ray Jui


On 5/8/2015 1:47 PM, Brian Norris wrote:
> On Fri, May 08, 2015 at 09:49:02PM +0200, Arnd Bergmann wrote:
>> On Friday 08 May 2015 12:38:50 Brian Norris wrote:
>>> On Fri, May 08, 2015 at 03:41:10PM +0200, Arnd Bergmann wrote:
> [...]
> 
>>> To be clear, since I'm not sure if you're confused below:
>>>
>>>  * Cygnus is a family of chips using the IPROC architecture, coming from
>>>the Infrastructure/Networking Group; there are BCM numbers noted
>>>in arch/arm/mach-bcm/Kconfig for them, but I usually just refer to
>>>the Cygnus family or the IPROC architecture.
>>>
>>>  * BCM63xxx is a class of DSL chips from the Broadband/Connectivity
>>>Group.
>>
>> Thanks for the clarification, I think that is roughly what I thought it was,
>> but I'm still not sure about brcmstb. Is that related to bcm63xxx or separate?
> 
> I think arch/arm/mach-bcm/Kconfig has the best summary. brcmstb is
> separate; BCM7xxx is generally (always?) Set-Top Box.
> 
> Another potentially confusing point: the main driver is named
> 'brcmstb_nand' since the NAND core (and driver) originated from STB
> chips. But that core was applied to other non-STB chips, and so the
> driver has been extended.
> 
>>>> bcm63138_nand_driver with its own probe() function that calls the
>>>> common probe function. That would make the soc specific parts
>>>> better contained and match how we normally do abstractions of
>>>> similar drivers.
>>>
>>> OK, so I can imagine this might require changing the DT binding a bit [1]
>>> (is that your goal?). But what's the intended software difference? [2]
>>> I'll still be passing around the same sorts of callbacks from the
>>> 'iproc_nand' probe to the common probe function.
> 
> ^^ before getting bogged down on the DT details (which can be changed
> independently), I'd like to address this point.
> 
>>> Brian
>>>
>>> [1] e.g.:
>>>
>>>   nand: nand@18046000 {
>>>           compatible = "brcm,iproc-nand", "brcm,brcmnand-v6.1", "brcm,brcmnand";
>>>           reg = <0x18046000 0x600>, <0xf8105408 0x600>, <0x18046f00 0x20>;
>>>           reg-names = "nand", "iproc-idm", "iproc-ext";
>>>           interrupts = ;
>>>
>>>           #address-cells = <1>;
>>>           #size-cells = <0>;
>>>
>>>           brcm,nand-has-wp;
>>>   };
>>>
>>> This captures the extra "iproc-*" register ranges. Then we could have
>>> the iproc_nand driver bind against "brcm,iproc-nand", then call into the
>>> common probe, which would then accept/reject based on
>>> "brcm,brcmnand-vX.Y".
>>>
>>> [2] The DT structure from [1] could actually accommodate either driver
>>> structure just fine. So maybe that means it's a better hardware
>>> description?
>>
>> Yes, I think this makes sense overall. Regarding the specific example, can you
>> clarify how the register areas in iproc are structured?
>>
>> The 0xf8105408 and 0x18046f00 start addresses are not aligned to large powers
>> of two, which often indicates that they are part of some other, larger,
>> unit that might need to have a driver of its own, so before we specify
>> a binding like the one you proposed above I'd like to make sure we're not
>> getting ourselves into trouble later.
> 
> I may want the Cygnus guys to speak up here, partly for technical
> expertise and partly to know how much they care to share...
> 
> <0xf8105408 0x600>: covers a series of NAND_IDM registers. NAND has a
> few bits we don't care about (for debugging, logging, and resetting), as
> well as its interrupt enable bits. The adjacent blocks cover similar IDM
> blocks for other cores (SPI, PNOR, DDR), and they are similarly
> unaligned. Not sure why, exactly; probably just a compact layout.

Yes, starting from 0xf8105408, within the range of 0x600, there are
various NAND_IDM registers scattered, which is indeed a very weird
register layout.
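
For reference, the alignment question raised in the quoted text can be checked by isolating the lowest set bit of each start address. This is only a small sketch; natural_alignment() is a hypothetical helper name, not something from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest power of two that divides addr, i.e. the natural alignment
 * of a region starting at addr. addr must be non-zero.
 */
static uint32_t natural_alignment(uint32_t addr)
{
	assert(addr != 0);
	/* Two's-complement trick: addr & -addr isolates the lowest set bit. */
	return addr & (~addr + 1u);
}
```

Applied to the "reg" entries from the example binding, this gives 0x2000 for 0x18046000, 0x8 for 0xf8105408 and 0x100 for 0x18046f00, which is the "not aligned to large powers of two" observation in numbers.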

Like Brian said, most of those NAND_IDM registers are for debugging,
logging, or status reporting. As of today, we only care about the first
register, that contains a bunch of bits that allow you to configure the
endianness of the AXI/APB bus, enabling NAND interrupts and clocks.

> 
> <0x18046f00 0x20>: a series of 8 NAND interrupt registers, each word
> containing a single bit representing status/clear. There is nothing
> between the "nand" range and this range, and the SPI core register range
> follows.

Correct.

> 
> So I think these are pretty clearly-delineated register ranges for NAND,
> and the alignment is not really missing anything. Adjacent hardware
> (e.g., SPI) is independent, though pieces look similar. For one, it has
> similar:
> 
>  * interrupt enable bits in the IDM range (0xf8106408 to 0xf8106a00);
>    and
>  * interrupt status/clear following the SPI block (0x180473a0 to
>    0x180473b8)
> 
> Brian
> 

Re: [PATCH v1 03/12] crypto: qat - address recursive dependency when fw signing is enabled

2015-05-08 Thread Paul Bolle
On Thu, 2015-05-07 at 22:14 +0200, Paul Bolle wrote:
> Tomorrow, after a (western European) night of sleep, I hope to explain
> why the error in dad's file makes sense. I'm not much of a teacher so I
> need a clear head to do that.

Let's start with mom's Kconfig file. It triggers
error: recursive dependency detected!
symbol GYM depends on ROCK_CLIMBING
symbol ROCK_CLIMBING depends on LOCKER
symbol LOCKER depends on GYM

Now you should realize that the kconfig tools have to answer questions
like these, for each (tristate) symbol:
- must it be 'n'; or
- can it be 'm'; or
- can it be 'y'.

Take, for example: can GYM be 'y'? Since GYM depends on ROCK_CLIMBING,
it can only be 'y' if ROCK_CLIMBING is 'y' (both being tristate). And
ROCK_CLIMBING depends on LOCKER, so ROCK_CLIMBING can only be 'y' if
LOCKER is 'y' (ditto). And LOCKER, in its turn, depends on GYM, so it
can only be 'y', if GYM is 'y'.

But we can't say whether GYM is 'y' yet, as it can still be 'n', 'm', or
'y' for all we know. So we can't answer that question. Hence the
recursive dependency error. (There must be a term for this obvious
problem in formal logic, but I'm not trained in formal logic.)

On to dad's Kconfig file (which is your example, but simplified). That
triggers:
error: recursive dependency detected!
symbol GYM is selected by ROCK_CLIMBING
symbol ROCK_CLIMBING depends on LOCKER
symbol LOCKER depends on GYM

Let's try to determine whether GYM should be 'n'. Well, GYM is selected
by ROCK_CLIMBING so it cannot be 'n' if ROCK_CLIMBING is 'm' or 'y'. (If
ROCK_CLIMBING is 'm' it can be 'm' or 'y', but not 'n', and if
ROCK_CLIMBING is 'y' it must be 'y'.) Do we know whether ROCK_CLIMBING
should be 'n'? It should be 'n' only if LOCKER is 'n'. And LOCKER should
in its turn be 'n' if GYM is 'n'. But we don't know yet what GYM will
be. So, again, we can't answer this question. Recursive dependency
error!
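
The circular reasoning in both examples can be sketched as a walk over a chain of "cannot be settled before" links. This is only an illustration in C, not how the kconfig tools are implemented: the symbol indices, the needs[] tables and has_cycle() are all made-up names, and "depends on" and "is selected by" are deliberately modelled as the same kind of link, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical symbol indices for mom's and dad's Kconfig files. */
enum { GYM, ROCK_CLIMBING, LOCKER, NSYMS };

/*
 * needs[a] == b means: the value of a cannot be settled before b's.
 * -1 means the symbol has no such constraint.
 */
static bool has_cycle(const int needs[NSYMS], int start)
{
	bool seen[NSYMS] = { false };
	int cur = start;

	assert(start >= 0 && start < NSYMS);
	while (cur != -1) {
		if (seen[cur])
			return true;	/* we walked back onto our own path */
		seen[cur] = true;
		cur = needs[cur];
	}
	return false;			/* the chain ended: answerable */
}

/* GYM -> ROCK_CLIMBING -> LOCKER -> GYM, as in both error messages. */
static const int recursive_deps[NSYMS] = {
	[GYM] = ROCK_CLIMBING,
	[ROCK_CLIMBING] = LOCKER,
	[LOCKER] = GYM,
};

/* Same chain, but LOCKER no longer refers back to GYM. */
static const int fixed_deps[NSYMS] = {
	[GYM] = ROCK_CLIMBING,
	[ROCK_CLIMBING] = LOCKER,
	[LOCKER] = -1,
};
```

With these tables, has_cycle(recursive_deps, GYM) is true and has_cycle(fixed_deps, GYM) is false: once one link in the loop is dropped, the question about GYM can be answered.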

The complicated error you ran into was
error: recursive dependency detected!
symbol CRYPTO is selected by SYSDATA_SIG
symbol SYSDATA_SIG is selected by FIRMWARE_SIG
symbol FIRMWARE_SIG depends on FW_LOADER
symbol FW_LOADER is selected by CRYPTO_DEV_QAT
symbol CRYPTO_DEV_QAT is selected by CRYPTO_DEV_QAT_DH895xCC
symbol CRYPTO_DEV_QAT_DH895xCC depends on CRYPTO

I'm lazy, so I haven't gone through this error step by step. But I'm
sure it's just a complicated version of what I tried to explain in the
above two examples. But if you're unconvinced I'll try to go through
this error too.

Now I'm sure the point I'm trying to make can be made more convincingly
and more elegantly. But the thing is, I think, that given how "select"
works and how "depends on" works, some setups will trigger these errors.
One might wish that "select" or "depends on" behaved differently, but
with the thousands of Kconfig symbols now in use, that really looks
unfeasible.

(Now let's see how all the, mostly German, people trained in formal
logic that appear to care about the kconfig tools shoot holes in my
reasoning.)

Hope this helps,


Paul Bolle



Re: [PATCH 02/12] Add parse_integer() (replacement for simple_strto*())

2015-05-08 Thread Rasmus Villemoes
On Fri, May 08 2015, Andrew Morton  wrote:

> My overall reaction to this is "oh god, not again".  Is it really worth
> it?

I think it is, if it's done right. The problem is to get consensus on
what right means, but I think Alexey's approach is ok. The huge macro
may be ugly, but it puts the ugliness in one place.

>> +/* internal, do not use */
>> +int _parse_integer_sc(const char *s, unsigned int base, signed char *val);
>> +int _parse_integer_uc(const char *s, unsigned int base, unsigned char *val);
>> +int _parse_integer_s(const char *s, unsigned int base, short *val);
>> +int _parse_integer_us(const char *s, unsigned int base, unsigned short *val);
>> +int _parse_integer_i(const char *s, unsigned int base, int *val);
>> +int _parse_integer_u(const char *s, unsigned int base, unsigned int *val);
>> +int _parse_integer_ll(const char *s, unsigned int base, long long *val);
>> +int _parse_integer_ull(const char *s, unsigned int base, unsigned long long *val);
>
> These all have fairly lengthy implementations.  Could it all be done
> with a single function?
>
> int __parse_integer(const char *s, unsigned int base, unsigned int size, void *val);
>
> Where "size" is 1,2,4,8 with the top bit set if signed?

I suggested something like that in private. These two patches roughly
correspond to 02/12 and 04/12 (they are just proof-of-concept).

Subject: [PATCH 1/2] lib: introduce parse_integer

This is an alternative implementation of parse_integer. It has a
slightly smaller code footprint (both in #LOC and in .text). Another
motivation was to expand on the idea of passing flags to the
underlying function, easing implementation of other interfaces in
terms of parse_integer.

In the other proposal, PARSE_INTEGER_NEWLINE means three things, which
I split into separate flags:

* Accept (but don't require) a single trailing newline.

* Require that the entire string is consumed (possibly after eating
  the trailing newline), otherwise return -EINVAL.

* Change return semantics: 0 for ok instead of #characters consumed.

Besides the three flags doing the above, I also added flags allowing
consuming leading and/or trailing whitespace. This may be used to
remove some boilerplate from code elsewhere. But this may be
over-engineering at this point, and it's easy enough to rip out (and
maybe add later). Still, I think it's nice to keep the
behaviour-changing flags separate.

Implementation-wise (and what saves ~500 bytes of .text), the main
difference is that there's only a single underlying function, and the
type of the destination is communicated to it with another few bits in
the base parameter. In the vast majority of cases, the combined
base_type_flags parameter will be a compile-time constant.
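
To make the "single underlying function" idea concrete, here is a rough userspace sketch. It is not the code from this patch: BASE_TYPE(), the T_* names, store() and parse_integer_sketch() are made up for illustration, the behaviour flags are left out, and the digit parsing is delegated to strtoll()/strtoull() instead of the kernel helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encoding: bits 0-7 hold the base, bits 8-10 the type. */
#define TYPE_SHIFT 8
/* Order matters: (type & 3) is the width index, type >= T_S8 means signed. */
enum { T_U8, T_U16, T_U32, T_U64, T_S8, T_S16, T_S32, T_S64 };
#define BASE_TYPE(base, type) \
	((unsigned int)(base) | ((unsigned int)(type) << TYPE_SHIFT))

/* Store the low bits of 'bits' through a pointer of the requested width. */
static void store(void *dest, unsigned int width, unsigned long long bits)
{
	switch (width) {
	case 0: *(uint8_t *)dest = (uint8_t)bits; break;
	case 1: *(uint16_t *)dest = (uint16_t)bits; break;
	case 2: *(uint32_t *)dest = (uint32_t)bits; break;
	default: *(uint64_t *)dest = bits; break;
	}
}

/* Returns the number of characters consumed, or -EINVAL / -ERANGE. */
static int parse_integer_sketch(const char *s, unsigned int btf, void *dest)
{
	static const unsigned long long umax[] = {
		UINT8_MAX, UINT16_MAX, UINT32_MAX, UINT64_MAX
	};
	unsigned int base = btf & 0xff;
	unsigned int type = (btf >> TYPE_SHIFT) & 0x7;
	unsigned int width = type & 0x3;	/* 0..3 -> 8..64 bits */
	char *end = (char *)s;

	/* strtoll()/strtoull() would skip whitespace; this sketch refuses it. */
	if (isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	if (type >= T_S8) {
		long long smax = (long long)(umax[width] >> 1);
		long long v = strtoll(s, &end, (int)base);

		if (end == s)
			return -EINVAL;
		if (errno == ERANGE || v > smax || v < -smax - 1)
			return -ERANGE;
		store(dest, width, (unsigned long long)v);
	} else {
		unsigned long long v;

		if (*s == '-')	/* strtoull() would silently negate */
			return -EINVAL;
		v = strtoull(s, &end, (int)base);
		if (end == s)
			return -EINVAL;
		if (errno == ERANGE || v > umax[width])
			return -ERANGE;
		store(dest, width, v);
	}
	return (int)(end - s);
}
```

A caller would pass something like BASE_TYPE(10, T_U8) together with a pointer to a u8-sized object; since that argument is normally a compile-time constant, a wrapper macro can pick the type bits from the destination pointer type.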

---
 include/linux/parse-integer.h |  78 
 lib/Makefile  |   1 +
 lib/kstrtox.c |  16 
 lib/kstrtox.h |   3 +-
 lib/parse-integer.c   | 166 ++
 5 files changed, 247 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/parse-integer.h
 create mode 100644 lib/parse-integer.c

diff --git a/include/linux/parse-integer.h b/include/linux/parse-integer.h
new file mode 100644
index ..bdeb6c06059a
--- /dev/null
+++ b/include/linux/parse-integer.h
@@ -0,0 +1,78 @@
+#ifndef LINUX_PARSE_INTEGER_H
+#define LINUX_PARSE_INTEGER_H
+
+#include 
+#include 
+
+#define _PARSE_INTEGER_TYPE_SHIFT 8
+enum {
+   _PARSE_INTEGER_U8,
+   _PARSE_INTEGER_U16,
+   _PARSE_INTEGER_U32,
+   _PARSE_INTEGER_U64,
+   _PARSE_INTEGER_S8,
+   _PARSE_INTEGER_S16,
+   _PARSE_INTEGER_S32,
+   _PARSE_INTEGER_S64,
+};
+
+/*
+ * Various flags that can be ORed with the base to request slightly
+ * different semantics, to facilitate implementing various old
+ * interfaces in terms of parse_integer. Not to be used directly.
+ */
+
+/* allow (and skip) leading whitespace */
+#define _PARSE_INTEGER_LEAD_WS0x0800
+/* consume trailing whitespace */
+#define _PARSE_INTEGER_TRAIL_WS   0x1000
+/* consume a single trailing newline */
+#define _PARSE_INTEGER_NEWLINE0x2000
+/* return 0 for success instead of #characters consumed */
+#define _PARSE_INTEGER_ZERO_ON_OK 0x4000
+/* return -EINVAL unless the entire string was consumed */
+#define _PARSE_INTEGER_ALL0x8000
+
+/*
+ * This should just be BUILD_BUG(), but due to header dependency hell
+ * we can't use that.
+ */
+void __parse_integer_build_bug(void);
+
+#define parse_integer(s, base_flags, dest) ({  \
+   const char *_s = (s);   \
+   unsigned _btf = (base_flags);   \
+   typeof([0]) _dest = (dest);\
+   unsigned _t;\
+   \
+   if (__builtin_types_compatible_p(typeof(_dest), u8*))   \

Re: [PATCH v3 06/10] mtd: brcmstb_nand: add SoC-specific support

2015-05-08 Thread Brian Norris
On Fri, May 08, 2015 at 11:38:11PM +0200, Arnd Bergmann wrote:
> On Friday 08 May 2015 13:47:25 Brian Norris wrote:
> > On Fri, May 08, 2015 at 09:49:02PM +0200, Arnd Bergmann wrote:
> > > On Friday 08 May 2015 12:38:50 Brian Norris wrote:
> > > > On Fri, May 08, 2015 at 03:41:10PM +0200, Arnd Bergmann wrote:
> > > > > bcm63138_nand_driver with its own probe() function that calls the
> > > > > common probe function. That would make the soc specific parts
> > > > > better contained and match how we normally do abstractions of
> > > > > similar drivers.
> > > > 
> > > > OK, so I can imagine this might require changing the DT binding a bit [1]
> > > > (is that your goal?). But what's the intended software difference? [2]
> > > > I'll still be passing around the same sorts of callbacks from the
> > > > 'iproc_nand' probe to the common probe function.
> > 
> > ^^ before getting bogged down on the DT details (which can be changed
> > independently), I'd like to address this point.
> 
> The intended change is to make it work according to
> Documentation/driver-model/design-patterns.txt

Huh? There are two bullet points in that file, and neither is
particularly enlightening for this case. Maybe you're referring to your
mental design patterns documentation? :)

> basically, by having all the shared code be a "library" module that gets
> called by the actual hardware specific drivers, rather than having the
> shared code be the central driver that fans out into all possible subdrivers.

OK, I'll see what I can do. It will be a fairly opaque "library" though,
consisting largely of a single monolithic core driver. Might just move
to a whole drivers/mtd/nand/brcmnand/ subdirectory at the same time...

> > > Yes, I think this makes sense overall. Regarding the specific example, can you
> > > clarify how the register areas in iproc are structured?
> > > 
> > > The 0xf8105408 and 0x18046f00 start addresses are not aligned to large powers
> > > of two, which often indicates that they are part of some other, larger,
> > > unit that might need to have a driver of its own, so before we specify
> > > a binding like the one you proposed above I'd like to make sure we're not
> > > getting ourselves into trouble later.
> > 
> > I may want the Cygnus guys to speak up here, partly for technical
> > expertise and partly to know how much they care to share...
> > 
> > <0xf8105408 0x600>: covers a series of NAND_IDM registers. NAND has a
> > few bits we don't care about (for debugging, logging, and resetting), as
> > well as its interrupt enable bits. The adjacent blocks cover similar IDM
> > blocks for other cores (SPI, PNOR, DDR), and they are similarly
> > unaligned. Not sure why, exactly; probably just a compact layout.
> > 
> > <0x18046f00 0x20>: a series of 8 NAND interrupt registers, each word
> > containing a single bit representing status/clear. There is nothing
> > between the "nand" range and this range, and the SPI core register range
> > follows.
> > 
> > So I think these are pretty clearly-delineated register ranges for NAND,
> > and the alignment is not really missing anything. Adjacent hardware
> > (e.g., SPI) is independent, though pieces look similar. For one, it has
> > similar:
> > 
> >  * interrupt enable bits in the IDM range (0xf8106408 to 0xf8106a00);
> >    and
> >  * interrupt status/clear following the SPI block (0x180473a0 to
> >    0x180473b8)
> 
> This would in turn indicate that we should treat these ranges as
> an irqchip that handles all sorts of devices, but it really depends
> on the particular register layout.

OK, sure. But this has nothing to do with NAND (which we established
cannot be an irqchip on Cygnus). I think SPI is coming through the
pipeline soon, though, and that's a good point.

Brian


[PATCH 2/2] staging: lustre: cl_page: delete empty macros

2015-05-08 Thread Julia Lawall
CS_PAGE_INC etc. do nothing, so remove them.

The semantic patch that performs this transformation is as follows:
(http://coccinelle.lip6.fr/)

// 
@@ expression o,item,state; @@
(
- CS_PAGE_INC(o, item);
|
- CS_PAGE_DEC(o, item);
|
- CS_PAGESTATE_INC(o, state);
|
- CS_PAGESTATE_DEC(o, state);
)
// 

Signed-off-by: Julia Lawall 

---
 drivers/staging/lustre/lustre/obdclass/cl_page.c |   17 -
 1 file changed, 17 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 59d338a..a7f3032 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -62,12 +62,6 @@ static void cl_page_delete0(const struct lu_env *env, struct cl_page *pg,
 # define PINVRNT(env, page, exp) \
((void)sizeof(env), (void)sizeof(page), (void)sizeof !!(exp))
 
-/* Disable page statistic by default due to huge performance penalty. */
-#define CS_PAGE_INC(o, item)
-#define CS_PAGE_DEC(o, item)
-#define CS_PAGESTATE_INC(o, state)
-#define CS_PAGESTATE_DEC(o, state)
-
 /**
  * Internal version of cl_page_top, it should be called if the page is
  * known to be not freed, says with page referenced, or radix tree lock held,
@@ -264,8 +258,6 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *page)
list_del_init(page->cp_layers.next);
slice->cpl_ops->cpo_fini(env, slice);
}
-   CS_PAGE_DEC(obj, total);
-   CS_PAGESTATE_DEC(obj, page->cp_state);
lu_object_ref_del_at(>co_lu, >cp_obj_ref, "cl_page", page);
cl_object_put(env, obj);
lu_ref_fini(>cp_reference);
@@ -323,11 +315,6 @@ static struct cl_page *cl_page_alloc(const struct lu_env *env,
}
}
}
-   if (result == 0) {
-   CS_PAGE_INC(o, total);
-   CS_PAGE_INC(o, create);
-   CS_PAGESTATE_DEC(o, CPS_CACHED);
-   }
} else {
page = ERR_PTR(-ENOMEM);
}
@@ -360,7 +347,6 @@ static struct cl_page *cl_page_find0(const struct lu_env *env,
might_sleep();
 
hdr = cl_object_header(o);
-   CS_PAGE_INC(o, lookup);
 
CDEBUG(D_PAGE, "%lu@"DFID" %p %lx %d\n",
   idx, PFID(>coh_lu.loh_fid), vmpage, vmpage->private, type);
@@ -387,7 +373,6 @@ static struct cl_page *cl_page_find0(const struct lu_env *env,
}
 
if (page != NULL) {
-   CS_PAGE_INC(o, hit);
return page;
}
 
@@ -554,8 +539,6 @@ static void cl_page_state_set0(const struct lu_env *env,
PASSERT(env, page,
equi(state == CPS_OWNED, page->cp_owner != NULL));
 
-   CS_PAGESTATE_DEC(page->cp_obj, page->cp_state);
-   CS_PAGESTATE_INC(page->cp_obj, state);
cl_page_state_set_trust(page, state);
}
 }



[PATCH 0/2] staging: lustre: cl_page: drop unneeded variable and macros

2015-05-08 Thread Julia Lawall
Drop an unneeded variable in
drivers/staging/lustre/lustre/obdclass/cl_page.c and then drop a set of
empty macros.



[PATCH 1/2] staging: lustre: cl_page: drop unneeded variable

2015-05-08 Thread Julia Lawall
Drop variable made unnecessary by conversion of obd free functions
to kfree.

Signed-off-by: Julia Lawall 

---
 drivers/staging/lustre/lustre/obdclass/cl_page.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/cl_page.c b/drivers/staging/lustre/lustre/obdclass/cl_page.c
index 8873553..59d338a 100644
--- a/drivers/staging/lustre/lustre/obdclass/cl_page.c
+++ b/drivers/staging/lustre/lustre/obdclass/cl_page.c
@@ -248,7 +248,6 @@ EXPORT_SYMBOL(cl_page_gang_lookup);
 static void cl_page_free(const struct lu_env *env, struct cl_page *page)
 {
struct cl_object *obj  = page->cp_obj;
-   int pagesize = cl_object_header(obj)->coh_page_bufsize;
 
PASSERT(env, page, list_empty(>cp_batch));
PASSERT(env, page, page->cp_owner == NULL);



Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

2015-05-08 Thread Richard Weinberger
Am 08.05.2015 um 23:39 schrieb Brian Norris:
> On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
>> On 04/27, Brian Norris wrote:
>>> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>>>> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
>>>>  wrote:
>>>>> On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>>>>>> Oh, I thought every driver has to implement that function. ;-\
>>>>>
>>>>> Nope.
>>>>>
>>>>>> But you're right there is a corner case.
>>>>>
>>>>> And it's not the only one! Right now, there's no guarantee even that
>>>>> read_buf() returns raw data, unmodified by the SoC's controller. Plenty
>>>>> of drivers actually have HW-enabled ECC turned on by default, and so
>>>>> they override the chip->ecc.read_page() (and sometimes
>>>>> chip->ecc.read_page_raw() functions, if we're lucky) with something
>>>>> that pokes the appropriate hardware instead. I expect anything
>>>>> comprehensive here is probably going to have to utilize
>>>>> chip->ecc.read_page_raw(), at least if it's provided by the hardware
>>>>> driver.
>>>>
>>>> Yes, overriding the chip->ecc.read_page_raw would solve this.
>>>
>>> I'm actually suggesting that (in this patch set, for on-die ECC
>>> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
>>> leave that to be defined by the driver, and then on-die ECC support
>>> should be added in a way that just calls chip->ecc.read_page_raw(). This
>>> should work for any driver that already properly supports the raw
>>> callbacks.
>>>
>>
>> Hi Richard et al,
>>
>> I'm guessing it's probably too late for the on-die ECC stuff to land in
>> 4.2 at this point.
> 
> Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
> released. 4.2 material should be getting finalized by a week or so
> before the merge window (i.e., 4 to 5 weeks from now).
> 
>> Is there anything I can do to help this along
>> (testing, etc.)?
> 
> This is going to need to get rewritten. I'm not sure if Richard is going
> to tackle this again, as he hasn't responded to the points I brought up.
> (Note that Richard is not the first to have tried to implement this,
> without initial success.)

I'm definitely willing to take the challenge.
But as I'm currently very busy with non-MTD stuff I had no time
to address your comments.

Thanks,
//richard


Re: [RFC] i2c-tools: i2ctransfer: add new tool

2015-05-08 Thread Jean Delvare
Hi Wolfram,

On Fri, 8 May 2015 16:38:26 +0200, Wolfram Sang wrote:
> > Having slept over it, I came up with a 3rd proposal:
> > 
> > # i2ctransfer 0 w0x11@0x50 0xc0 0xbd= r1@0x51
> > 
> > That is, combining the slave address, direction and length into a
> > single parameter. The advantage is that this is all more explicit and
> > the risk of mixing up values is close to zero. Whether it is more or
> > less readable than the previous proposals is probably a matter of
> > taste. Also I suspect it would make the parsing and state machine more
> > simple, but that's only a nice side effect.
> > 
> > Wolfram (and others), please tell me what you think. I am not trying to
> > force my views here, just suggesting alternatives for your
> > consideration.
> 
> I liked your proposal, so thanks for this input. I agree that the risk
> of mixing something up is high, I was okay with the printout of the
> messages to be sent, but a better syntax is very welcome, too. I need to
> think about the flags a little bit, though. Although this isn't
> implemented yet, PEC and 10-bit flags might be added in the future?

This is a good point, we need to think about it. Maybe not PEC, as
normally any PEC-enabled transaction would be handled by the other
tools already. And I don't think the kernel can handle PEC over ioctl
I2C_RDWR anyway. But 10-bit addresses, we already had a request to
support that, and your new tool would be perfect for it.

One easy way would be to assume that the transaction either targets one
or more 10-bit addressed chips, or one or more 7-bit addressed chips,
but doesn't mix. In that case a simple flag (say -t) in front of the
transaction will do the trick. I'd think it is sufficient, and I even
suspect that some controllers may only support that, but OTOH I never
worked with 10-bit addressed chips so I can't really tell.

If you think it's not enough, then the address modifier could go
separately before or after the address byte, i.e. either r1@0x123t or
r1@t0x123. I suspect that the latter should be easier to implement.
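
To illustrate why I expect the second form to be easy to handle, here is a rough parsing sketch for a single descriptor of the form <r|w><length>@[t]<address>. It is not taken from the tool: struct msg_desc and parse_desc() are made-up names, and the "t" modifier is only the idea floated above, nothing that exists today.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result of parsing one "r1@0x51"-style descriptor. */
struct msg_desc {
	bool read;		/* 'r' or 'w' */
	bool ten_bit;		/* proposed 't' address modifier */
	unsigned long len;	/* number of bytes to transfer */
	unsigned long addr;	/* slave address */
};

/* Returns 0 on success, -1 on a malformed descriptor. */
static int parse_desc(const char *arg, struct msg_desc *d)
{
	char *end;

	if (*arg != 'r' && *arg != 'w')
		return -1;
	d->read = (*arg == 'r');
	arg++;

	/* Length: must start with a digit, so signs and spaces are refused. */
	if (!isdigit((unsigned char)*arg))
		return -1;
	d->len = strtoul(arg, &end, 0);
	if (*end != '@')
		return -1;
	arg = end + 1;

	/* Optional modifier between '@' and the address. */
	d->ten_bit = (*arg == 't');
	if (d->ten_bit)
		arg++;

	if (!isdigit((unsigned char)*arg))
		return -1;
	d->addr = strtoul(arg, &end, 0);
	if (*end != '\0')
		return -1;

	/* 7-bit addresses by default, 10-bit only when asked for. */
	if (d->addr > (d->ten_bit ? 0x3ffUL : 0x7fUL))
		return -1;
	return 0;
}
```

With this, "w0x11@0x50" and "r1@0x51" parse as 7-bit messages, "r1@t0x123" as a 10-bit read, and "r1@0x123" is rejected because the address does not fit in 7 bits without the modifier.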

> Handling R/W as "just another" flag made this option extremely simple.
> But we probably can work something out.

I think the proposal above makes more sense than grouping it with the
direction letter (r or w) even though it's also a letter, as it's
really an address modifier, which affects neither the direction nor the
length. But again it's really only a suggestion, if you can come up
with something clearer and/or easier to implement, please do.

> So much for the quick response, I'll have a closer look later.

I wouldn't call it "quick" ;-) but you're welcome.

-- 
Jean Delvare
SUSE L3 Support


Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

2015-05-08 Thread Brian Norris
On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
> On 04/27, Brian Norris wrote:
> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> > >  wrote:
> > > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > > >> Oh, I thought every driver has to implement that function. ;-\
> > > >
> > > > Nope.
> > > >
> > > >> But you're right there is a corner case.
> > > >
> > > > And it's not the only one! Right now, there's no guarantee even that
> > > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > > they override the chip->ecc.read_page() (and sometimes
> > > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > > that pokes the appropriate hardware instead. I expect anything
> > > > comprehensive here is probably going to have to utilize
> > > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > > driver.
> > > 
> > > Yes, overriding the chip->ecc.read_page_raw would solve this.
> > 
> > I'm actually suggesting that (in this patch set, for on-die ECC
> > support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> > leave that to be defined by the driver, and then on-die ECC support
> > should be added in a way that just calls chip->ecc.read_page_raw(). This
> > should work for any driver that already properly supports the raw
> > callbacks.
> > 
> 
> Hi Richard et al,
> 
> I'm guessing it's probably too late for the on-die ECC stuff to land in
> 4.2 at this point.

Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
released. 4.2 material should be getting finalized by a week or so
before the merge window (i.e., 4 to 5 weeks from now).

> Is there anything I can do to help this along
> (testing, etc.)?

This is going to need to get rewritten. I'm not sure if Richard is going
to tackle this again, as he hasn't responded to the points I brought up.
(Note that Richard is not the first to have tried to implement this,
without initial success.)

Brian


Re: [PATCH v3 06/10] mtd: brcmstb_nand: add SoC-specific support

2015-05-08 Thread Arnd Bergmann
On Friday 08 May 2015 13:47:25 Brian Norris wrote:
> On Fri, May 08, 2015 at 09:49:02PM +0200, Arnd Bergmann wrote:
> > On Friday 08 May 2015 12:38:50 Brian Norris wrote:
> > > On Fri, May 08, 2015 at 03:41:10PM +0200, Arnd Bergmann wrote:
> [...]
> 
> > > To be clear, since I'm not sure if you're confused below:
> > > 
> > >  * Cygnus is a family of chips using the IPROC architecture, coming from
> > >    the Infrastructure/Networking Group; there are BCM numbers noted
> > >    in arch/arm/mach-bcm/Kconfig for them, but I usually just refer to
> > >    the Cygnus family or the IPROC architecture.
> > > 
> > >  * BCM63xxx is a class of DSL chips from the Broadband/Connectivity
> > >    Group.
> > 
> > Thanks for the clarification, I think that is roughly what I thought it was,
> > but I'm still not sure about brcmstb. Is that related to bcm63xxx or separate?
> 
> I think arch/arm/mach-bcm/Kconfig has the best summary. brcmstb is
> separate; BCM7xxx is generally (always?) Set-Top Box.
> 
> Another potentially confusing point: the main driver is named
> 'brcmstb_nand' since the NAND core (and driver) originated from STB
> chips. But that core was applied to other non-STB chips, and so the
> driver has been extended.

Ok, I see.

> > > > bcm63138_nand_driver with its own probe() function that calls the
> > > > common probe function. That would make the soc specific parts
> > > > better contained and match how we normally do abstractions of
> > > > similar drivers.
> > > 
> > > OK, so I can imagine this might require changing the DT binding a bit [1]
> > > (is that your goal?). But what's the intended software difference? [2]
> > > I'll still be passing around the same sorts of callbacks from the
> > > 'iproc_nand' probe to the common probe function.
> 
> ^^ before getting bogged down on the DT details (which can be changed
> independently), I'd like to address this point.

The intended change is to make it work according to
Documentation/driver-model/design-patterns.txt

basically, by having all the shared code be a "library" module that gets
called by the actual hardware specific drivers, rather than having the
shared code be the central driver that fans out into all possible subdrivers.
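
As a rough sketch of that structure (plain userspace C, with made-up names such as struct nand_soc, common_nand_probe() and iproc_irq_enable(); the real driver would of course use platform devices and its own callbacks):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SoC hooks that the shared code may call back into. */
struct nand_soc_ops {
	void (*irq_enable)(void *priv, int on);
};

struct nand_soc {
	const struct nand_soc_ops *ops;
	void *priv;
};

/* "Library" entry point: shared by every hardware-specific driver. */
static int common_nand_probe(struct nand_soc *soc)
{
	if (soc == NULL || soc->ops == NULL)
		return -1;
	/* ... common controller setup would go here ... */
	if (soc->ops->irq_enable)
		soc->ops->irq_enable(soc->priv, 1);
	return 0;
}

/* SoC-specific state and callbacks live with the SoC driver. */
struct iproc_priv {
	int irq_enabled;
};

static void iproc_irq_enable(void *priv, int on)
{
	((struct iproc_priv *)priv)->irq_enabled = on;
}

static const struct nand_soc_ops iproc_ops = {
	.irq_enable = iproc_irq_enable,
};

/* The hardware-specific driver owns probe() and calls into the library. */
static int iproc_nand_probe(struct iproc_priv *priv)
{
	struct nand_soc soc = { .ops = &iproc_ops, .priv = priv };

	return common_nand_probe(&soc);
}
```

The point is only the direction of the calls: each SoC driver registers itself and pulls in the common code, instead of one central driver knowing about every SoC variant.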

> > 
> > Yes, I think this makes sense overall. Regarding the specific example, can you
> > clarify how the register areas in iproc are structured?
> > 
> > The 0xf8105408 and 0x18046f00 start addresses are not aligned to large powers
> > of two, which often indicates that they are part of some other, larger,
> > unit that might need to have a driver of its own, so before we specify
> > a binding like the one you proposed above I'd like to make sure we're not
> > getting ourselves into trouble later.
> 
> I may want the Cygnus guys to speak up here, partly for technical
> expertise and partly to know how much they care to share...
> 
> <0xf8105408 0x600>: covers a series of NAND_IDM registers. NAND has a
> few bits we don't care about (for debugging, logging, and resetting), as
> well as its interrupt enable bits. The adjacent blocks cover similar IDM
> blocks for other cores (SPI, PNOR, DDR), and they are similarly
> unaligned. Not sure why, exactly; probably just a compact layout.
> 
> <0x18046f00 0x20>: a series of 8 NAND interrupt registers, each word
> containing a single bit representing status/clear. There is nothing
> between the "nand" range and this range, and the SPI core register range
> follows.
> 
> So I think these are pretty clearly-delineated register ranges for NAND,
> and the alignment is not really missing anything. Adjacent hardware
> (e.g., SPI) is independent, though pieces look similar. For one, it has
> similar:
> 
>  * interrupt enable bits in the IDM range (0xf8106408 to 0xf8106a00);
>    and
>  * interrupt status/clear following the SPI block (0x180473a0 to
>    0x180473b8)

This would in turn indicate that we should treat these ranges as
an irqchip that handles all sorts of devices, but it really depends
on the particular register layout.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v9] iio: acpi: Add support for ACPI0008 Ambient Light Sensor

2015-05-08 Thread Marek Vasut
On Friday, May 08, 2015 at 11:34:35 PM, Jonathan Cameron wrote:
> On 8 May 2015 20:20:33 BST, Gabriele Mazzotta  wrote:
> >On Friday 08 May 2015 10:58:29 Jonathan Cameron wrote:
> >> On 02/05/15 08:30, Gabriele Mazzotta wrote:
> >> > This driver adds the initial support for the ACPI Ambient Light Sensor
> >> > as defined in Section 9.2 of the ACPI specification (Revision 5.0) [1].
> >> > Sensors complying with the standard are exposed as ACPI devices with
> >> > ACPI0008 as hardware ID and provide standard methods by which the OS
> >> > can query properties of the ambient light environment the system is
> >> > currently operating in.
> >> > 
> >> > This driver currently allows only to get the current ambient light
> >> > illuminance reading through the _ALI method, but is ready to be
> >> > extended to handle _ALC, _ALT and _ALP as well.
> >> > 
> >> > [1] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf
> >> > 
> >> > Signed-off-by: Martin Liska 
> >> > Signed-off-by: Marek Vasut 
> >> > Signed-off-by: Gabriele Mazzotta 
> >> > Cc: Zhang Rui 

Thank you guys for finally getting this mainline :)

Best regards,
Marek Vasut


[PATCH v3 Bugfix 3/6] x86/xsaves: Rename xstate_size to kernel_xstate_size to explicitly distinguish xstate size in kernel from user space

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

User space uses the standard format xsave area, so the fpstate in the
signal frame should have the standard format size.

To explicitly distinguish the xstate size in kernel space from the one
in user space, we rename xstate_size to kernel_xstate_size. This patch
does not fix a bug; it just makes the kernel code clearer.

So we define the xsave area sizes in two global variables:

kernel_xstate_size (previously xstate_size): the size of the xsave area
allocated in the kernel
user_xstate_size: the size of the xsave area used by user space

Without "xsaves", the xsave areas in both user space and kernel space are
in standard format. Therefore, kernel_xstate_size and user_xstate_size are
equal.

With "xsaves", the xsave area in user space is in standard format while
the xsave area in kernel space is in compact format. Therefore, the
kernel's xstate size is less than the user's xstate size.
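
The size relationship described above can be sketched in user-space C.
This is only a model, not kernel code: the per-feature offsets and sizes
in the table are illustrative values (a real kernel reads them from CPUID),
the helper names are made up for this sketch, and the optional 64-byte
alignment of compacted components is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NR_XFEATURES   8
#define LEGACY_AND_HDR 576u /* 512-byte FXSAVE region + 64-byte XSAVE header */

struct xfeature {
	unsigned int offset; /* fixed offset in the standard format */
	unsigned int size;   /* size in bytes */
};

/* Illustrative layout for XCR0 bits 2..7 (assumed values, not read from CPUID). */
static const struct xfeature xfeatures[NR_XFEATURES] = {
	[2] = {  576,  256 }, /* AVX */
	[3] = {  960,   64 }, /* MPX BNDREGS */
	[4] = { 1024,   64 }, /* MPX BNDCSR */
	[5] = { 1088,   64 }, /* AVX-512 opmask */
	[6] = { 1152,  512 }, /* AVX-512 ZMM_Hi256 */
	[7] = { 1664, 1024 }, /* AVX-512 Hi16_ZMM */
};

/* Standard format: the area must reach the end of the highest enabled feature. */
static unsigned int standard_xstate_size(uint64_t mask)
{
	unsigned int end = LEGACY_AND_HDR;
	int i;

	assert((mask >> NR_XFEATURES) == 0); /* the model covers bits 0..7 only */
	for (i = 2; i < NR_XFEATURES; i++) {
		unsigned int feature_end = xfeatures[i].offset + xfeatures[i].size;

		if ((mask & (1ULL << i)) && feature_end > end)
			end = feature_end;
	}
	return end;
}

/* Compact format: enabled features are packed back to back, holes vanish. */
static unsigned int compact_xstate_size(uint64_t mask)
{
	unsigned int size = LEGACY_AND_HDR;
	int i;

	assert((mask >> NR_XFEATURES) == 0);
	for (i = 2; i < NR_XFEATURES; i++)
		if (mask & (1ULL << i))
			size += xfeatures[i].size;
	return size;
}
```

With the assumed table and a mask of 0xe7 (MPX disabled), the standard
size comes out as 2688 bytes and the compact size as 2432 bytes. With no
holes (for example mask 0x07) the two sizes coincide in this model.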

Signed-off-by: Fenghua Yu 
Reviewed-by: Dave Hansen 
---
 arch/x86/include/asm/fpu-internal.h |  4 ++--
 arch/x86/include/asm/processor.h    |  2 +-
 arch/x86/kernel/i387.c              | 22 +++---
 arch/x86/kernel/process.c           |  2 +-
 arch/x86/kernel/xsave.c             | 14 +++---
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index c00c769..5d9ba0c 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -597,14 +597,14 @@ static inline void fpu_free(struct fpu *fpu)
 static inline void fpu_copy(struct task_struct *dst, struct task_struct *src)
 {
if (use_eager_fpu()) {
-   memset(&dst->thread.fpu.state->xsave, 0, xstate_size);
+   memset(&dst->thread.fpu.state->xsave, 0, kernel_xstate_size);
__save_fpu(dst);
} else {
struct fpu *dfpu = &dst->thread.fpu;
struct fpu *sfpu = &src->thread.fpu;
 
unlazy_fpu(src);
-   memcpy(dfpu->state, sfpu->state, xstate_size);
+   memcpy(dfpu->state, sfpu->state, kernel_xstate_size);
}
 }
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 576ff8c..f26051b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -482,7 +482,7 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack);
 DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
-extern unsigned int xstate_size;
+extern unsigned int kernel_xstate_size;
 extern unsigned int user_xstate_size;
 extern void free_thread_xstate(struct task_struct *);
 extern struct kmem_cache *task_xstate_cachep;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 8a7b96b..1eba4f2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -133,8 +133,8 @@ void unlazy_fpu(struct task_struct *tsk)
 EXPORT_SYMBOL(unlazy_fpu);
 
 unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
-unsigned int xstate_size;
-EXPORT_SYMBOL_GPL(xstate_size);
+unsigned int kernel_xstate_size;
+EXPORT_SYMBOL_GPL(kernel_xstate_size);
 static struct i387_fxsave_struct fx_scratch;
 
 static void mxcsr_feature_mask_init(void)
@@ -154,7 +154,7 @@ static void mxcsr_feature_mask_init(void)
 static void init_thread_xstate(void)
 {
/*
-* Note that xstate_size might be overwriten later during
+* Note that kernel_xstate_size might be overwriten later during
 * xsave_init().
 */
 
@@ -165,17 +165,17 @@ static void init_thread_xstate(void)
 */
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
-   xstate_size = sizeof(struct i387_soft_struct);
-   user_xstate_size = xstate_size;
+   kernel_xstate_size = sizeof(struct i387_soft_struct);
+   user_xstate_size = kernel_xstate_size;
return;
}
 
if (cpu_has_fxsr)
-   xstate_size = sizeof(struct i387_fxsave_struct);
+   kernel_xstate_size = sizeof(struct i387_fxsave_struct);
else
-   xstate_size = sizeof(struct i387_fsave_struct);
+   kernel_xstate_size = sizeof(struct i387_fsave_struct);
 
-   user_xstate_size = xstate_size;
+   user_xstate_size = kernel_xstate_size;
 }
 
 /*
@@ -211,9 +211,9 @@ void fpu_init(void)
 
/*
 * init_thread_xstate is only called once to avoid overriding
-* xstate_size during boot time or during CPU hotplug.
+* kernel_xstate_size during boot time or during CPU hotplug.
 */
-   if (xstate_size == 0)
+   if (kernel_xstate_size == 0)
init_thread_xstate();
 
mxcsr_feature_mask_init();
@@ -228,7 +228,7 @@ void fpu_finit(struct fpu *fpu)
return;
}
 
-   memset(fpu->state, 0, xstate_size);
+   memset(fpu->state, 0, kernel_xstate_size);
 
if (cpu_has_fxsr) {

[PATCH v3 Bugfix 2/6] x86/xsaves: Define and use user_xstate_size for xstate size in signal context

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

If "xsaves" is enabled, kernel always uses compact format of xsave area.
But user space still uses standard format of xsave area. Thus, xstate size
in kernel's xsave area is smaller than xstate size in user's xsave area.
xstate in signal frame should be in standard format for user's signal
handler to access.

In no "xsaves" case, xsave area in both user space and kernel space are in
standard format. Therefore, user's and kernel's xstate sizes are equal.

In "xsaves" case, xsave area in user space is in standard format while
xsave area in kernel space is in compact format. Therefore, kernel's
xstate size is less than user's xstate size.

So here is the problem: currently the kernel uses its own xstate size
for the xstate size in the signal frame. This is not a problem without
"xsaves". But it is an issue with "xsaves", because the kernel's xstate
size is smaller than the user's xstate size. When setting up the signal
math frame in alloc_mathframe(), the fpstate is in standard format, but
the fpstate buffer allocated in the signal frame is too small for the
standard format xstate. The kernel then saves only part of the xstate
registers into this smaller user fpstate buffer, and the user sees only
part of the xstate registers in the signal context. A similar issue
occurs after returning from the signal handler: the kernel restores only
part of the xstate registers from the user's fpstate buffer in the
signal frame.

This patch defines and uses user_xstate_size for xstate size in signal
frame. It's read from returned value in ebx from CPUID leaf 0x0D subleaf
0x0. This is maximum size required by enabled states in XCR0 and may be
different from ecx when states at the end of the xsave area are not
enabled. This value indicates the size required for XSAVE to save all
supported user states in legacy/standard format.
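
As a rough illustration of the sizing problem, the following user-space
sketch mirrors the shape of xstate_sigframe_size() from the diff below and
adds a hypothetical helper, truncated_bytes(), that reports how much of a
standard format image does not fit into a buffer sized with the compact
size. The value 4 for FP_XSTATE_MAGIC2_SIZE and the example sizes are
assumptions made for the sketch.

```c
#include <assert.h>

/* Assumed: the trailing magic word is a 32-bit value. */
#define FP_XSTATE_MAGIC2_SIZE 4u

/* Same shape as xstate_sigframe_size(), with the size passed in explicitly. */
static unsigned int sigframe_size(int use_xsave, unsigned int xstate_size)
{
	assert(xstate_size != 0); /* sizes are expected to be set up at boot */
	return use_xsave ? xstate_size + FP_XSTATE_MAGIC2_SIZE : xstate_size;
}

/* Hypothetical helper: bytes of a standard format image that do not fit. */
static unsigned int truncated_bytes(unsigned int standard_size,
				    unsigned int buf_size)
{
	return standard_size > buf_size ? standard_size - buf_size : 0;
}
```

Assuming a standard format size of 2688 bytes and a compact size of 2432
bytes, a frame sized from the compact value would be 256 bytes short of
the state that user space expects to find.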

Signed-off-by: Fenghua Yu 
Reviewed-by: Dave Hansen 
---
 arch/x86/include/asm/fpu-internal.h |  3 ++-
 arch/x86/include/asm/processor.h    |  1 +
 arch/x86/include/asm/xsave.h        |  1 -
 arch/x86/kernel/i387.c              |  3 +++
 arch/x86/kernel/xsave.c             | 29 +
 5 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index da5e967..c00c769 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -496,7 +496,8 @@ extern int __restore_xstate_sig(void __user *buf, void __user *fx, int size);
 
 static inline int xstate_sigframe_size(void)
 {
-   return use_xsave() ? xstate_size + FP_XSTATE_MAGIC2_SIZE : xstate_size;
+   return use_xsave() ? user_xstate_size + FP_XSTATE_MAGIC2_SIZE :
+   user_xstate_size;
 }
 
 static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 23ba676..576ff8c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -483,6 +483,7 @@ DECLARE_PER_CPU(struct irq_stack *, softirq_stack);
 #endif /* X86_64 */
 
 extern unsigned int xstate_size;
+extern unsigned int user_xstate_size;
 extern void free_thread_xstate(struct task_struct *);
 extern struct kmem_cache *task_xstate_cachep;
 
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c9a6d68..7799d18 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -44,7 +44,6 @@
 #define REX_PREFIX
 #endif
 
-extern unsigned int xstate_size;
 extern u64 pcntxt_mask;
 extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 extern struct xsave_struct *init_xstate_buf;
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 00918327..8a7b96b 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -166,6 +166,7 @@ static void init_thread_xstate(void)
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
xstate_size = sizeof(struct i387_soft_struct);
+   user_xstate_size = xstate_size;
return;
}
 
@@ -173,6 +174,8 @@ static void init_thread_xstate(void)
xstate_size = sizeof(struct i387_fxsave_struct);
else
xstate_size = sizeof(struct i387_fsave_struct);
+
+   user_xstate_size = xstate_size;
 }
 
 /*
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 3c0a9d1..f99a6b7 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -29,6 +29,7 @@ static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
 static unsigned int *xstate_offsets, *xstate_sizes;
 static unsigned int xstate_comp_offsets[sizeof(pcntxt_mask)*8];
 static unsigned int xstate_features;
+unsigned int user_xstate_size;
 
 /*
  * If a processor implementation discern that a processor state component is
@@ -85,7 +86,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
 */
while (xstate_bv) {
if (xstate_bv 

[PATCH v3 Bugfix 5/6] x86/xsaves: Keep xstate_bv in init_xstate_buf header as zero for init optimization

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

Keep xstate_bv in the init_xstate_buf header as zero so that the init
optimization implemented in the processor can work. If the bit
corresponding to an xstate in xstate_bv is 0, the xstate is in init
status and will not be read from memory into the processor by the
xrstor* instructions. This has a large impact on context switch
performance.
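
The effect can be modeled with a small user-space sketch. It assumes the
restore instruction reads a component from memory only when that
component is both requested and flagged in xstate_bv, and initializes the
rest; the function names are hypothetical and exist only in this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Count the state components a restore would read from memory. */
static unsigned int components_loaded(uint64_t requested, uint64_t xstate_bv)
{
	uint64_t from_memory = requested & xstate_bv;
	unsigned int n = 0;

	while (from_memory) {
		from_memory &= from_memory - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Count the requested components that are simply put into init state. */
static unsigned int components_initialized(uint64_t requested,
					   uint64_t xstate_bv)
{
	unsigned int n = components_loaded(requested, ~xstate_bv);

	assert(n <= 64); /* there are at most 64 mask bits */
	return n;
}
```

With a requested mask of 0xe7, an xstate_bv of 0xe7 gives six components
read from memory, while an xstate_bv of 0 gives none, which is the case
the patch restores for init_xstate_buf.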

Signed-off-by: Fenghua Yu 
Reviewed-by: Dave Hansen 
---
 arch/x86/kernel/xsave.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 4217bec..547f293 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -545,6 +545,12 @@ static void __init setup_init_fpu_buf(void)
 */
init_xstate_buf = alloc_bootmem_align(kernel_xstate_size,
  __alignof__(struct xsave_struct));
+
+   /*
+* Make sure xstate_bv is zero to allow init optimization work.
+*/
+   init_xstate_buf->xsave_hdr.xstate_bv = 0;
+
fx_finit(&init_xstate_buf->i387);
 
if (!cpu_has_xsave)
@@ -552,11 +558,9 @@ static void __init setup_init_fpu_buf(void)
 
setup_xstate_features();
 
-   if (cpu_has_xsaves) {
+   if (cpu_has_xsaves)
init_xstate_buf->xsave_hdr.xcomp_bv =
(u64)1 << 63 | pcntxt_mask;
-   init_xstate_buf->xsave_hdr.xstate_bv = pcntxt_mask;
-   }
 
/*
 * Init all the features state with header_bv being 0x0
-- 
1.8.1.2



[PATCH v3 Bugfix 6/6] x86/xsave.c: Introduce a new check that allows correct xstates copy from kernel to user directly

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

There are two formats of XSAVE buffers: compact and standard.
We avoid copying the compact format out to userspace since it
might break existing userspace which is not aware of the new
compacted format.

This means that save_xstate_sig() can not simply copy_to_user()
the kernel buffer if it is in compact format, ever.  Add a
heavily-commented function explaining this.

Note that all the paths to save_user_xstate() do currently
check for used_math(), but add a WARN_ONCE() in there for any
future possible use.

Dave Hansen proposed this method to simplify copying xstate directly
to user space.

Signed-off-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/xsave.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 547f293..69a1847 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -217,6 +217,45 @@ static inline int save_user_xstate(struct xsave_struct __user *buf)
return err;
 }
 
+static int should_save_registers_directly(void)
+{
+   /*
+* This should only ever be called when we are actually
+* using the FPU.  Otherwise, we run the risk of using
+* some FPU instructions for saving the registers, and
+* inflating thread.fpu_counter, making us think that
+* the _task_ is using the FPU when in fact it was the
+* kernel.
+*/
+   WARN_ONCE(!used_math(), "direct FPU save with no math use\n");
+
+   /*
+* In the case that we are using a compacted kernel
+* xsave area, we can not copy the thread.fpu.state
+* directly to userspace and *must* save it from the
+* registers directly.
+*/
+   if (cpu_has_xsaves)
+   return 1;
+
+   /*
+* user_has_fpu() means "Can I use the FPU hardware
+* without taking a device-not-available exception?" This
+* means that saving the registers directly will be
+* cheaper than copying their contents out of
+* thread.fpu.state.
+*
+* Note that user_has_fpu() is inherently racy and may
+* become false at any time.  If this race happens, we
+* will take a harmless device-not-available exception
+* when we attempt the FPU save instruction.
+*/
+   if (user_has_fpu())
+   return 1;
+
+   return 0;
+}
+
 /*
  * Save the fpu, extended register state to the user signal frame.
  *
@@ -254,7 +293,7 @@ int save_xstate_sig(void __user *buf, void __user *buf_fx, int size)
sizeof(struct user_i387_ia32_struct), NULL,
(struct _fpstate_ia32 __user *) buf) ? -1 : 1;
 
-   if (user_has_fpu()) {
+   if (should_save_registers_directly()) {
/* Save the live register state to the user directly. */
if (save_user_xstate(buf_fx))
return -1;
-- 
1.8.1.2



[PATCH v3 Bugfix 4/6] x86/xsave: Don't add new states in xsave_struct

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

The structure of xsave_struct is non-architectural. Some xstates could be
disabled and leave holes in the xsave area. In compact format, the
offsets of xstates in the xsave area are decided at boot time.

So the fields in xsave_struct are not fixed at compile time. The offsets
and sizes of the fields in the structure should be detected from cpuid
at runtime.

Therefore, we don't add new states to xsave_struct except the legacy
fpu/sse states and the header fields, whose offsets and sizes are defined
architecturally.
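
A short user-space sketch shows why a fixed struct layout cannot describe
the compact format: the offset of a component depends on which lower
components are enabled. The sizes below are illustrative, compact_offset()
is a hypothetical helper for this sketch, and the optional 64-byte
alignment of compacted components is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NR_XFEATURES   8
#define LEGACY_AND_HDR 576u /* legacy FXSAVE region plus XSAVE header */

/* Assumed component sizes for XCR0 bits 2..7. */
static const unsigned int xfeature_size[NR_XFEATURES] = {
	[2] = 256, [3] = 64, [4] = 64, [5] = 64, [6] = 512, [7] = 1024,
};

/* Offset of `feature` in a compacted area for `mask`, or -1 if disabled. */
static int compact_offset(uint64_t mask, int feature)
{
	unsigned int offset = LEGACY_AND_HDR;
	int i;

	assert(feature >= 2 && feature < NR_XFEATURES);
	if (!(mask & (1ULL << feature)))
		return -1;
	/* Only enabled components below `feature` take up space. */
	for (i = 2; i < feature; i++)
		if (mask & (1ULL << i))
			offset += xfeature_size[i];
	return (int)offset;
}
```

In this model component 5 sits at offset 960 when all of bits 2..7 are
enabled, but at offset 832 when bits 3 and 4 are disabled, so no single
compile-time field offset is right for both configurations.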

Signed-off-by: Fenghua Yu 
Reviewed-by: Dave Hansen 
---
 arch/x86/include/asm/processor.h | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f26051b..163defc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -386,16 +386,6 @@ struct i387_soft_struct {
u32 entry_eip;
 };
 
-struct ymmh_struct {
-   /* 16 * 16 bytes for each YMMH-reg = 256 bytes */
-   u32 ymmh_space[64];
-};
-
-/* We don't support LWP yet: */
-struct lwp_struct {
-   u8 reserved[128];
-};
-
 struct bndreg {
u64 lower_bound;
u64 upper_bound;
@@ -415,11 +405,11 @@ struct xsave_hdr_struct {
 struct xsave_struct {
struct i387_fxsave_struct i387;
struct xsave_hdr_struct xsave_hdr;
-   struct ymmh_struct ymmh;
-   struct lwp_struct lwp;
-   struct bndreg bndreg[4];
-   struct bndcsr bndcsr;
-   /* new processor state extensions will go here */
+   /*
+* Please don't add more states here. They are non-architectural.
+* Offset and size of each state should be calculated during boot time.
+* So adding states here is meaningless.
+*/
 } __attribute__ ((packed, aligned (64)));
 
 union thread_xstate {
-- 
1.8.1.2



[PATCH v3 Bugfix 1/6] x86/xsave.c: Fix xstate offsets and sizes enumeration

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

When enumerating xstate offsets and sizes from cpuid (eax=0x0d, ecx>=2),
it's possible that state m is not implemented while state n (n>m)
is implemented. So enumeration shouldn't stop at state m.

No platform is configured like this yet, but it could become a problem
in the future. For example, suppose XCR0=0xe7, which means the FP, SSE,
AVX, and AVX-512 states are enabled and the MPX states (bits 3 and 4) are
not. Then in setup_xstate_features(), after finding that the BNDREGS size
is 0 (i.e. eax from CPUID xstate subleaf 3), the code breaks out of the
loop. That stops it from finding xstate_offsets and xstate_sizes for
AVX-512. Later on, incorrect xstate_offsets and xstate_sizes for AVX-512
will be used in a few places and will cause issues.

This patch enumerates xstate offsets and sizes for all kernel supported
xstates. If a state is not implemented in hardware or not enabled in XCR0,
its size is set as zero and its offset is read from cpuid.
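
The difference between the old and the new enumeration can be sketched in
user-space C against a made-up CPUID table. The table values and the
function names are assumptions for illustration; the old loop is written
with an explicit bound here only to keep the model's array accesses safe.

```c
#include <assert.h>
#include <string.h>

#define NR_XFEATURES 8

struct subleaf {
	unsigned int eax; /* size of the component */
	unsigned int ebx; /* offset in the standard format */
};

/* Hypothetical CPUID xstate subleaves: 3 and 4 (MPX) report size 0. */
static const struct subleaf fake_cpuid[NR_XFEATURES] = {
	[2] = {  256,  576 },
	[5] = {   64, 1088 },
	[6] = {  512, 1152 },
	[7] = { 1024, 1664 },
};

/* Old behavior: stop at the first subleaf that reports size 0. */
static void enumerate_old(unsigned int *sizes, unsigned int *offsets)
{
	int leaf;

	assert(sizes && offsets);
	memset(sizes, 0, NR_XFEATURES * sizeof(*sizes));
	memset(offsets, 0, NR_XFEATURES * sizeof(*offsets));
	for (leaf = 2; leaf < NR_XFEATURES; leaf++) {
		if (fake_cpuid[leaf].eax == 0)
			break; /* the bug: a hole ends the walk */
		offsets[leaf] = fake_cpuid[leaf].ebx;
		sizes[leaf] = fake_cpuid[leaf].eax;
	}
}

/* New behavior: visit every subleaf up to the last supported feature. */
static void enumerate_new(unsigned int *sizes, unsigned int *offsets)
{
	int leaf;

	assert(sizes && offsets);
	for (leaf = 2; leaf < NR_XFEATURES; leaf++) {
		offsets[leaf] = fake_cpuid[leaf].ebx;
		sizes[leaf] = fake_cpuid[leaf].eax;
	}
}
```

With this table the old walk records only component 2 and leaves the
AVX-512 entries at zero, while the new walk records components 5 to 7 and
keeps a size of zero for the disabled ones.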

Ingo is rewriting fpu/xstate and his big patchset includes this patch at:
https://lkml.org/lkml/2015/5/5/892

I still send this patch in the patchset because we need to fix this bug
in upstream ASAP.

Signed-off-by: Fenghua Yu 
Reviewed-by: Dave Hansen 
---
 arch/x86/kernel/xsave.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 87a815b..3c0a9d1 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -465,23 +465,18 @@ static inline void xstate_enable(void)
  */
 static void __init setup_xstate_features(void)
 {
-   int eax, ebx, ecx, edx, leaf = 0x2;
+   int eax, ebx, ecx, edx, leaf;
 
xstate_features = fls64(pcntxt_mask);
xstate_offsets = alloc_bootmem(xstate_features * sizeof(int));
xstate_sizes = alloc_bootmem(xstate_features * sizeof(int));
 
-   do {
+   for (leaf = 2; leaf < xstate_features; leaf++) {
cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
 
-   if (eax == 0)
-   break;
-
xstate_offsets[leaf] = ebx;
xstate_sizes[leaf] = eax;
-
-   leaf++;
-   } while (1);
+   }
 }
 
 /*
-- 
1.8.1.2



Re: [PATCH v9] iio: acpi: Add support for ACPI0008 Ambient Light Sensor

2015-05-08 Thread Jonathan Cameron


On 8 May 2015 20:20:33 BST, Gabriele Mazzotta  wrote:
>On Friday 08 May 2015 10:58:29 Jonathan Cameron wrote:
>> On 02/05/15 08:30, Gabriele Mazzotta wrote:
>> > This driver adds the initial support for the ACPI Ambient Light Sensor
>> > as defined in Section 9.2 of the ACPI specification (Revision 5.0) [1].
>> > 
>> > Sensors complying with the standard are exposed as ACPI devices with
>> > ACPI0008 as hardware ID and provide standard methods by which the OS
>> > can query properties of the ambient light environment the system is
>> > currently operating in.
>> > 
>> > This driver currently allows only to get the current ambient light
>> > illuminance reading through the _ALI method, but is ready to be
>> > extended to handle _ALC, _ALT and _ALP as well.
>> > 
>> > [1] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf
>> > 
>> > Signed-off-by: Martin Liska 
>> > Signed-off-by: Marek Vasut 
>> > Signed-off-by: Gabriele Mazzotta 
>> > Cc: Zhang Rui 
>> Sorry, one last point inline that I missed before!
>> 
>> (just noticed it when taking a last glance before applying the patch).
>> 
>> Jonathan
>> > ---
>> > Changes since v8:
>> >  - Set realbits to 32
>> >  - Fix license mismatch (using GPL v2 or later)
>> >  - Drop iio_device_unregister() in favor of devm_iio_device_register()
>> > 
>> >  drivers/iio/light/Kconfig    |  12 +++
>> >  drivers/iio/light/Makefile   |   1 +
>> >  drivers/iio/light/acpi-als.c | 232 +++
>> >  3 files changed, 245 insertions(+)
>> >  create mode 100644 drivers/iio/light/acpi-als.c
>> > 
>> > diff --git a/drivers/iio/light/Kconfig b/drivers/iio/light/Kconfig
>> > index 01a1a16..898b2b5 100644
>> > --- a/drivers/iio/light/Kconfig
>> > +++ b/drivers/iio/light/Kconfig
>> > @@ -5,6 +5,18 @@
>> >  
>> >  menu "Light sensors"
>> >  
>> > +config ACPI_ALS
>> > +  tristate "ACPI Ambient Light Sensor"
>> > +  depends on ACPI
>> > +  select IIO_TRIGGERED_BUFFER
>> > +  select IIO_KFIFO_BUF
>> > +  help
>> > +   Say Y here if you want to build a driver for the ACPI0008
>> > +   Ambient Light Sensor.
>> > +
>> > +   To compile this driver as a module, choose M here: the module will
>> > +   be called acpi-als.
>> > +
>> >  config ADJD_S311
>> >tristate "ADJD-S311-CR999 digital color sensor"
>> >select IIO_BUFFER
>> > diff --git a/drivers/iio/light/Makefile b/drivers/iio/light/Makefile
>> > index ad7c30f..d9aad52a 100644
>> > --- a/drivers/iio/light/Makefile
>> > +++ b/drivers/iio/light/Makefile
>> > @@ -3,6 +3,7 @@
>> >  #
>> >  
>> >  # When adding new entries keep the list in alphabetical order
>> > +obj-$(CONFIG_ACPI_ALS)+= acpi-als.o
>> >  obj-$(CONFIG_ADJD_S311)   += adjd_s311.o
>> >  obj-$(CONFIG_AL3320A) += al3320a.o
>> >  obj-$(CONFIG_APDS9300)+= apds9300.o
>> > diff --git a/drivers/iio/light/acpi-als.c b/drivers/iio/light/acpi-als.c
>> > new file mode 100644
>> > index 000..9839c9a
>> > --- /dev/null
>> > +++ b/drivers/iio/light/acpi-als.c
>> > @@ -0,0 +1,232 @@
>> > +/*
>> > + * ACPI Ambient Light Sensor Driver
>> > + *
>> > + * Based on ALS driver:
>> > + * Copyright (C) 2009 Zhang Rui 
>> > + *
>> > + * Rework for IIO subsystem:
>> > + * Copyright (C) 2012-2013 Martin Liska 
>> > + *
>> > + * Final cleanup and debugging:
>> > + * Copyright (C) 2013-2014 Marek Vasut 
>> > + * Copyright (C) 2015 Gabriele Mazzotta 
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify it
>> > + * under the terms of the GNU General Public License as published by the
>> > + * Free Software Foundation; either version 2 of the License, or (at your
>> > + * option) any later version.
>> > + *
>> > + * This program is distributed in the hope that it will be useful, but
>> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > + * General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License along
>> > + * with this program; if not, write to the Free Software Foundation, Inc.,
>> > + * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
>> > + */
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#define ACPI_ALS_CLASS"als"
>> > +#define ACPI_ALS_DEVICE_NAME  "acpi-als"
>> > +#define ACPI_ALS_NOTIFY_ILLUMINANCE   0x80
>> > +
>> > +ACPI_MODULE_NAME("acpi-als");
>> > +
>> > +/*
>> > + * So far, there's only one channel in here, but the specification for
>> > + * ACPI0008 says there can be more to what the block can report. Like
>> > + * chromaticity and such. We are ready for incoming additions!
>> > + */
>> > +static const struct iio_chan_spec acpi_als_channels[] = {
>> > +  {
>> > +  .type   = IIO_LIGHT,
>> > +  .scan_type  = {
>> > +

[PATCH v3 Bugfix 0/6] xstate/fpu bug fixes

2015-05-08 Thread Fenghua Yu
From: Fenghua Yu 

This patchset fixes some xsave/xsaves/fpu related issues.

We may hit these issues on either existing or upcoming platforms.
The patches should go upstream and be backported to stable kernels
and distros.

Patch 1/6 fixes an xstate offsets and sizes enumeration issue. While
enumerating offsets and sizes from xstate 2 to the last enabled feature,
if one xstate's size is 0, the current code assumes there is no other
xstate after it and stops enumerating. This is wrong, because
architecturally it's possible to have a few xstates disabled between
xstate 2 and the last enabled xstate. The offsets and sizes of the
xstates that are not enumerated after the disabled xstate will still be
consumed and cause issues at runtime.

Patch 2/6 introduces a new global variable "user_xstate_size". This
variable holds the size of the standard format xsave area in the signal
frame. The current code incorrectly uses the smaller compact format
xsave area size for the signal frame, which causes issues when xstate is
accessed in the signal frame.

Patch 3/6 does not fix a bug. It renames "xstate_size" to
"kernel_xstate_size" to explicitly distinguish the xstate size in
kernel space from the one in user space. It just makes the kernel code
clearer.

Patch 4/6 documents that the structure of xsave_struct is
non-architectural and that the fields/xstates in the structure are not
defined at compile time. No new states should be added to xsave_struct.
The xsave area should be constructed at kernel boot time.

Patch 5/6 clears xstate_bv so that the init optimization in hardware
can take effect. Without the patch, some xstates are never in init
status, which hurts context switch performance.

Patch 6/6 replaces the bare user_has_fpu() check with a new check that
allows xstates to be copied correctly to user space directly.

Changes in v3:

1/6: In description, add that Ingo has the same patch in his xstate/fpu
     overall cleanup patchset.
2/6: Remove copy_to_user_xstate(). Now copy compact format xsave
     area directly from processor to user buffer in 6/6.
     Initialize user_xstate_size in init_thread_xstate().
3/6: Add Dave Hansen's credit in description.
5/6: Add this new patch for performance issue.
6/6: Add this new patch for a new user_has_fpu check to allow copying
     compact format xsave area directly from processor to user buffer.

Fenghua Yu (6):
  x86/xsave.c: Fix xstate offsets and sizes enumeration
  x86/xsaves: Define and use user_xstate_size for xstate size in signal
context
  x86/xsaves: Rename xstate_size to kernel_xstate_size to explicitly
distinguish xstate size in kernel from user space
  x86/xsave: Don't add new states in xsave_struct
  x86/xsaves: Keep xstate_bv in init_xstate_buf header as zero for init
optimization
  x86/xsave.c: Introduce a new check that allows correct xstates copy
from kernel to user directly

 arch/x86/include/asm/fpu-internal.h |   7 +--
 arch/x86/include/asm/processor.h    |  23 +++-
 arch/x86/include/asm/xsave.h        |   1 -
 arch/x86/kernel/i387.c              |  21 
 arch/x86/kernel/process.c           |   2 +-
 arch/x86/kernel/xsave.c             | 105 ++--
 6 files changed, 102 insertions(+), 57 deletions(-)

-- 
1.8.1.2



Re: [PATCH try #4] proc: fix PAGE_SIZE limit of /proc/$PID/cmdline

2015-05-08 Thread Andrew Morton
On Fri, 8 May 2015 15:28:05 +0300 Alexey Dobriyan  wrote:

> /proc/$PID/cmdline truncates output at PAGE_SIZE. It is easy to see with
> 
>   $ cat /proc/self/cmdline $(seq 1037) 2>/dev/null
> 
> However, command line size was never limited to PAGE_SIZE but to 128 KB and
> relatively recently limitation was removed altogether.
> 
> People noticed and ask questions:
> http://stackoverflow.com/questions/199130/how-do-i-increase-the-proc-pid-cmdline-4096-byte-limit
> 
> seq file interface is not OK, because it kmalloc's for whole output and
> open + read(, 1) + sleep will pin arbitrary amounts of kernel memory.
> To not do that, limit must be imposed which is incompatible with
> arbitrary sized command lines.
> 
> I apologize for hairy code, but this is a direct consequence of command line
> layout in memory and hacks to support things like "init [3]".
> 
> The loops are "unrolled" otherwise it is either macros which hide
> control flow or functions with 7-8 arguments with equal line count.
> 
> There should be real setproctitle(2) or something.
> 
> ...
>
>  fs/proc/base.c |  203 ++---
>  1 file changed, 194 insertions(+), 9 deletions(-)

I still hate your patch!


Also, dude.  i386:

In file included from include/asm-generic/bug.h:13:0,
 from ./arch/x86/include/asm/bug.h:35,
 from include/linux/bug.h:4,
 from include/linux/thread_info.h:11,
 from ./arch/x86/include/asm/uaccess.h:8,
 from fs/proc/base.c:50:
fs/proc/base.c: In function 'proc_pid_cmdline_read':
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:600:9: note: in definition of macro 'min'
  typeof(x) _min1 = (x);   \
 ^
include/linux/kernel.h:611:38: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
  ^
fs/proc/base.c:265:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:600:21: note: in definition of macro 'min'
  typeof(x) _min1 = (x);   \
 ^
include/linux/kernel.h:611:38: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
  ^
fs/proc/base.c:265:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:611:23: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
   ^
fs/proc/base.c:265:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:600:9: note: in definition of macro 'min'
  typeof(x) _min1 = (x);   \
 ^
include/linux/kernel.h:611:38: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
  ^
fs/proc/base.c:300:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types 
lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:600:21: note: in definition of macro 'min'
  typeof(x) _min1 = (x);   \
 ^
include/linux/kernel.h:611:38: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
  ^
fs/proc/base.c:300:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types 
lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:611:23: note: in expansion of macro 'min'
 #define min3(x, y, z) min((typeof(x))min(x, y), z)
   ^
fs/proc/base.c:300:13: note: in expansion of macro 'min3'
_count = min3(count, len, PAGE_SIZE);
 ^
include/linux/kernel.h:602:17: warning: comparison of distinct pointer types 
lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
 ^
include/linux/kernel.h:600:9: note: in definition of macro 'min'
  typeof(x) _min1 = (x);   \
 ^
include/linux/kernel.h:611:38: note: in expansion of macro 'min'
 #define min3(x, y, 
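
For readers unfamiliar with these warnings: the kernel's min() compares the addresses of two temporaries so that the compiler complains when the operands have different types. On i386, size_t is unsigned int while PAGE_SIZE is unsigned long, which is presumably what trips min3(count, len, PAGE_SIZE) here (the declared types in the patch itself are not shown above, so that part is an assumption). A minimal userspace sketch of the check and of the usual min_t()-style fix follows; the macros are simplified stand-ins, and EXAMPLE_PAGE_SIZE and clamp_count() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's min(). The pointer comparison
 * produces "comparison of distinct pointer types lacks a cast" when
 * x and y differ in type. Needs GCC or Clang (statement expressions).
 */
#define min(x, y) ({                            \
	__typeof__(x) _min1 = (x);              \
	__typeof__(y) _min2 = (y);              \
	(void) (&_min1 == &_min2);              \
	_min1 < _min2 ? _min1 : _min2; })

/* Simplified stand-in for min_t(): both operands are converted first. */
#define min_t(type, x, y) ({                    \
	type _a = (x);                          \
	type _b = (y);                          \
	_a < _b ? _a : _b; })

/* Hypothetical constant; unsigned long, like the kernel's PAGE_SIZE. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Clamp a read size to the remaining length and to one page. */
static size_t clamp_count(size_t count, size_t len)
{
	size_t n = min(count, len);     /* same type on both sides: no warning */

	/* Mixed types: name the type explicitly instead of using min(). */
	return min_t(size_t, n, EXAMPLE_PAGE_SIZE);
}
```

Calling clamp_count(10000, 20000) yields 4096 with this sketch, and no type-check warning is emitted on either 32-bit or 64-bit targets.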

Re: [alsa-devel] [PATCH] ASoC: codecs-ac97: Remove rate constraints

2015-05-08 Thread Fabio Estevam
On Fri, May 8, 2015 at 6:16 PM, Maciej S. Szmigiero
 wrote:
> Remove rate constraints from generic ASoC AC'97 CODEC and make
> it selectable in config.

Shouldn't this be split in two patches?

>
> Supported rates should be detected and constrained anyway by
> AC'97 generic code - was tested with VT1613 CODEC and iMX6 SSI
> controller.

Nice, I would like to test this on a Udoo board. Care to share the dts
changes? (I know this is off topic for this list ;-) Apart from the
dts changes: are there still missing patches in linux-next to make
audio work in Udoo?

> This way this driver can be used for platforms which don't need
> specialized AC'97 CODEC drivers while at the same time avoiding
> code duplication from implementing equivalent functionality in
> a controller driver.
>
> Resending due to no response received.

No need to put this in the commit log.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sched, timer: Fix documentation for 'struct thread_group_cputimer'

2015-05-08 Thread Jason Low
On Fri, 2015-05-08 at 06:22 -0700, tip-bot for Jason Low wrote:
> Commit-ID:  1018016c706f7ff9f56fde3a649789c47085a293
> Gitweb: http://git.kernel.org/tip/1018016c706f7ff9f56fde3a649789c47085a293
> Author: Jason Low 
> AuthorDate: Tue, 28 Apr 2015 13:00:22 -0700
> Committer:  Ingo Molnar 
> CommitDate: Fri, 8 May 2015 12:15:31 +0200
> 
> sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to 
> improve scalability

The following patch addresses the issue reported by Fengguang Wu
regarding this tip commit 1018016c706f.

---
The description for struct thread_group_cputimer contains the 'cputime'
and 'lock' members, which are not valid anymore since

  tip commit 1018016c706f ("sched, timer: Replace spinlocks with atomics
  in thread_group_cputimer(), to improve scalability")

modified/removed those fields. This patch updates the description
to reflect those changes.

Reported-by: Fengguang Wu 
Signed-off-by: Jason Low 
---
 include/linux/sched.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6cc4f7e..cb73486 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -606,10 +606,9 @@ struct task_cputime_atomic {
 
 /**
  * struct thread_group_cputimer - thread group interval timer counts
- * @cputime:   thread group interval timers.
+ * @cputime_atomic:atomic thread group interval timers.
  * @running:   non-zero when there are timers running and
  * @cputime receives updates.
- * @lock:  lock for fields in this struct.
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
-- 
1.7.2.5
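
As background for the kernel-doc change above, here is a rough userspace analogue of a lock-free timer structure in the spirit of the referenced tip commit. It is a sketch only: the struct and function names are hypothetical, the member names inside the atomic struct are assumed rather than taken from the patch, and C11 atomics stand in for the kernel's own atomic types.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of struct task_cputime_atomic. */
struct cputime_atomic_sketch {
	_Atomic uint64_t utime;
	_Atomic uint64_t stime;
	_Atomic uint64_t sum_exec_runtime;
};

/* Hypothetical analogue of struct thread_group_cputimer: no lock member. */
struct group_cputimer_sketch {
	struct cputime_atomic_sketch cputime_atomic;
	int running;    /* non-zero while timers are running */
};

static void group_cputimer_init(struct group_cputimer_sketch *t)
{
	atomic_init(&t->cputime_atomic.utime, 0);
	atomic_init(&t->cputime_atomic.stime, 0);
	atomic_init(&t->cputime_atomic.sum_exec_runtime, 0);
	t->running = 0;
}

/* Add user time without taking a lock; ignored when no timer is running. */
static void account_user_time(struct group_cputimer_sketch *t, uint64_t delta)
{
	if (!t->running)
		return;
	atomic_fetch_add_explicit(&t->cputime_atomic.utime, delta,
				  memory_order_relaxed);
}

/* Read the accumulated user time. */
static uint64_t sample_user_time(struct group_cputimer_sketch *t)
{
	return atomic_load_explicit(&t->cputime_atomic.utime,
				    memory_order_relaxed);
}
```

With no lock left in the structure, a "@lock" entry in the description has nothing to document, which is the point of the patch.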





[GIT PULL] Btrfs fix

2015-05-08 Thread Chris Mason
Hi Linus,

When an arm user reported crashes near page_address(page) in my new code,
it became clear that I can't be trusted with GFP masks.  Filipe beat me
to the patch, and I'll just be in the corner with my dunce cap on.

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.1

Filipe Manana (1) commits (+1/-1):
Btrfs: fix wrong mapping flags for free space inode

Total: (1) commits (+1/-1)

 fs/btrfs/free-space-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

2015-05-08 Thread Ben Shelton
On 04/27, Brian Norris wrote:
> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> >  wrote:
> > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > >> Oh, I thought every driver has to implement that function. ;-\
> > >
> > > Nope.
> > >
> > >> But you're right there is a corner case.
> > >
> > > And it's not the only one! Right now, there's no guarantee even that
> > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > they override the chip->ecc.read_page() (and sometimes
> > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > that pokes the appropriate hardware instead. I expect anything
> > > comprehensive here is probably going to have to utilize
> > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > driver.
> > 
> > Yes, overriding the chip->ecc.read_page_raw would solve this.
> 
> I'm actually suggesting that (in this patch set, for on-die ECC
> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> leave that to be defined by the driver, and then on-die ECC support
> should be added in a way that just calls chip->ecc.read_page_raw(). This
> should work for any driver that already properly supports the raw
> callbacks.
> 

Hi Richard et al,

I'm guessing it's probably too late for the on-die ECC stuff to land in
4.2 at this point.  Is there anything I can do to help this along
(testing, etc.)?

Thanks,
Ben



Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully

2015-05-08 Thread Rafael J. Wysocki
On Friday, May 08, 2015 04:18:02 PM Rafael J. Wysocki wrote:
> On Friday, May 08, 2015 01:05:32 PM Preeti U Murthy wrote:
> > When a CPU has to enter an idle state where tick stops, it makes a call
> > to tick_broadcast_enter(). The call will fail if this CPU is the
> > broadcast CPU. Today, under such a circumstance, the arch cpuidle code
> > handles this CPU.  This is not convincing because not only do we not
> > know what the arch cpuidle code does, but we also do not account for the
> > idle state residency time and usage of such a CPU.
> > 
> > This scenario can be handled better by simply choosing an idle state
> > in which the tick does not stop. To accommodate this change, move the
> > setting of the runqueue idle state from the core to the cpuidle driver;
> > otherwise rq->idle_state will be set wrong.
> > 
> > Signed-off-by: Preeti U Murthy 
> > ---
> > Changes from V2: https://lkml.org/lkml/2015/5/7/78
> > Introduce a function in cpuidle core to select an idle state where ticks do not
> > stop rather than going through the governors.
> > 
> > Changes from V1: https://lkml.org/lkml/2015/5/7/24
> > Rebased on the latest linux-pm/bleeding-edge branch
> > 
> >  drivers/cpuidle/cpuidle.c |   45 +++--
> >  include/linux/sched.h |   16 
> >  kernel/sched/core.c   |   17 +
> >  kernel/sched/fair.c   |2 +-
> >  kernel/sched/idle.c   |6 --
> >  kernel/sched/sched.h  |   24 
> >  6 files changed, 77 insertions(+), 33 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index 8c24f95..d1af760 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -21,6 +21,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include "cpuidle.h"
> > @@ -146,6 +147,36 @@ int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> > return index;
> >  }
> >  
> > +/*
> > + * find_tick_valid_state - select a state where tick does not stop
> > + * @dev: cpuidle device for this cpu
> > + * @drv: cpuidle driver for this cpu
> > + */
> > +static int find_tick_valid_state(struct cpuidle_device *dev,
> > +   struct cpuidle_driver *drv)
> > +{
> > +   int i, ret = -1;
> > +
> > +   for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> > +   struct cpuidle_state *s = &drv->states[i];
> > +   struct cpuidle_state_usage *su = &dev->states_usage[i];
> > +
> > +   /*
> > +* We do not explicitly check for latency requirement
> > +* since it is safe to assume that only shallower idle
> > +* states will have the CPUIDLE_FLAG_TIMER_STOP bit
> > +* cleared and they will invariably meet the latency
> > +* requirement.
> > +*/
> > +   if (s->disabled || su->disable ||
> > +   (s->flags & CPUIDLE_FLAG_TIMER_STOP))
> > +   continue;
> > +
> > +   ret = i;
> > +   }
> > +   return ret;
> > +}
> > +
> >  /**
> >   * cpuidle_enter_state - enter the state and update stats
> >   * @dev: cpuidle device for this cpu
> > @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> >  * CPU as a broadcast timer, this call may fail if it is not available.
> >  */
> > if (broadcast && tick_broadcast_enter()) {
> > -   default_idle_call();
> > -   return -EBUSY;
> > +   index = find_tick_valid_state(dev, drv);
> 
> Well, the new state needs to be deeper

I should have said "shallower", sorry about that.

The state chosen by the governor satisfies certain latency requirements and we
can't violate those by choosing a deeper state here.

But the patch I sent actually did the right thing. :-)

> than the old one or you may violate the governor's choice and this doesn't
> guarantee that.
> 
> Also I don't quite see a reason to duplicate the find_deepest_state() functionality
> here.
> 
> > +   if (index < 0) {
> > +   default_idle_call();
> > +   return -EBUSY;
> > +   }
> > +   target_state = &drv->states[index];
> > }
> >  
> > +   /* Take note of the planned idle state. */
> > +   idle_set_state(smp_processor_id(), target_state);
> 
> And I wouldn't do this either.
> 
> The behavior here is pretty much as though the driver demoted the state chosen
> by the governor and we don't call idle_set_state() again in those cases.
> 
> > +
> > trace_cpu_idle_rcuidle(index, dev->cpu);
> > time_start = ktime_get();
> 
> Overall, something like the patch below (untested) should work I suppose?
> 
> ---
>  drivers/cpuidle/cpuidle.c |   21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/cpuidle.c
> 
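
To illustrate the constraint discussed in this message (this is not the elided patch), here is a minimal standalone sketch of a fallback search that never goes deeper than the governor's choice. It assumes states are ordered from shallow to deep; struct sketch_state, SKETCH_FLAG_TIMER_STOP and find_tick_valid_state_capped() are hypothetical stand-ins, not cpuidle API.

```c
#include <stdbool.h>

/* Stand-in for CPUIDLE_FLAG_TIMER_STOP. */
#define SKETCH_FLAG_TIMER_STOP 0x1u

/* Hypothetical, reduced view of one idle state. */
struct sketch_state {
	unsigned int flags;
	bool disabled;
};

/*
 * Return the deepest enabled state in [first, max_idx] whose tick keeps
 * running, or -1 if there is none. Passing the governor's choice as
 * max_idx keeps the result from being deeper than that choice, so its
 * latency constraints still hold (assuming shallow-to-deep ordering).
 */
static int find_tick_valid_state_capped(const struct sketch_state *states,
					int first, int max_idx)
{
	int i, ret = -1;

	for (i = first; i <= max_idx; i++) {
		if (states[i].disabled ||
		    (states[i].flags & SKETCH_FLAG_TIMER_STOP))
			continue;
		ret = i;
	}
	return ret;
}
```

With four states where only index 2 stops the tick and the governor picked index 2, a search capped at 2 returns 1, whereas an uncapped search over all four returns 3, which is deeper than what the governor allowed.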

Re: [PATCH 0/6] support "dataplane" mode for nohz_full

2015-05-08 Thread Steven Rostedt
On Fri, 8 May 2015 14:18:24 -0700
Andrew Morton  wrote:

> On Fri, 8 May 2015 13:58:41 -0400 Chris Metcalf  wrote:
> 
> > A prctl() option (PR_SET_DATAPLANE) is added
> 
> Dumb question: what does the term "dataplane" mean in this context?  I
> can't see the relationship between those words and what this patch
> does.

I was thinking the same thing. I haven't gotten around to searching
DATAPLANE yet.

I would assume we want a name that is more meaningful for what is
happening.

-- Steve


Re: [PATCH 0/6] support "dataplane" mode for nohz_full

2015-05-08 Thread Andrew Morton
On Fri, 8 May 2015 13:58:41 -0400 Chris Metcalf  wrote:

> A prctl() option (PR_SET_DATAPLANE) is added

Dumb question: what does the term "dataplane" mean in this context?  I
can't see the relationship between those words and what this patch
does.


