Re: [RFC PATCH v1 1/2] perf/sdt : Listing of SDT markers by perf

2014-02-25 Thread Namhyung Kim
Hi Hemant,

On Tue, 25 Feb 2014 14:33:37 +0530, Hemant Kumar wrote:
> On 02/25/2014 12:26 PM, Namhyung Kim wrote:
>>> +   /* Translation from file representation to memory representation */
>>> +   if (gelf_xlatetom(*elf, , ,
>>> + elf_getident(*elf, NULL)[EI_DATA]) == NULL)
>> Do we really need this xlate function?  It seems elf_getdata() already
>> did necessary conversions so only thing we need to do is checking its
>> class and read out the addresses in a proper length, no?
>>
>
> Hmm, alright. I thought the conversion was necessary for cross
> developed binaries.
> But I guess elf_getdata() should do all these conversions. Will remove that.

Looking at the source, it seems we still need to xlate() anyway.  The
conversion function (elf_cvt_note) only handles the header part since it
cannot know what the content is.

But obviously we cannot enable/disable the marker, as we cannot run a
cross-built binary - it can only be used to show the list of SDT markers
in the binary.  Or else, we can simply refuse to do it.
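
For illustration only - this is not code from either patch - the reason the
conversion matters for cross-built binaries is that elf_getdata() converts
only the note headers, so the addresses in the note descriptor still follow
the file's byte order and would have to be converted by hand, roughly like:

#include <endian.h>
#include <string.h>
#include <byteswap.h>
#include <gelf.h>

/* Sketch: pull one address out of an SDT note descriptor, honouring the
 * file's EI_DATA encoding.  Assumes a 64-bit (ELFCLASS64) binary. */
static unsigned long long sdt_note_addr(Elf *elf, const unsigned char *desc)
{
        int file_enc = elf_getident(elf, NULL)[EI_DATA];
        int host_enc = (__BYTE_ORDER == __LITTLE_ENDIAN) ?
                        ELFDATA2LSB : ELFDATA2MSB;
        unsigned long long addr;

        memcpy(&addr, desc, sizeof(addr));
        if (file_enc != host_enc)
                addr = bswap_64(addr);          /* cross-endian binary */
        return addr;
}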

Thanks,
Namhyung


[PATCH] tools/vm/page-types.c: page-cache sniffing feature

2014-02-25 Thread Konstantin Khlebnikov
After this patch 'page-types' can walk over filesystem mappings and analyze
populated page cache pages, mostly without disturbing their state.

It maps a chunk of the file, marks the VMA with MADV_RANDOM to turn off
readahead, pokes the VMA via mincore() to determine which pages are cached,
triggers a page fault only for those, and finally gathers information via
pagemap/kpageflags.  Before unmapping it marks the VMA with MADV_SEQUENTIAL
so that reference bits are ignored.
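
For readers unfamiliar with the trick, the sequence above boils down to
something like the following sketch (error handling omitted; this is not the
patch's code):

#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Sketch of the cache-sniffing sequence described above. */
static void sniff_cache(int fd, off_t off, size_t len)
{
        size_t page = sysconf(_SC_PAGESIZE);
        char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
        unsigned char *vec = malloc((len + page - 1) / page);

        madvise(map, len, MADV_RANDOM);      /* turn off readahead */
        mincore(map, len, vec);              /* which pages are already cached? */

        for (size_t i = 0; i * page < len; i++)
                if (vec[i] & 1)              /* fault in cached pages only */
                        (void)*(volatile char *)(map + i * page);

        /* ...read /proc/pid/pagemap and /proc/kpageflags for these pages... */

        madvise(map, len, MADV_SEQUENTIAL);  /* let unmap ignore reference bits */
        munmap(map, len);
        free(vec);
}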

usage: page-types -f 

If the given path is a directory, it will analyse all files in all subdirectories.

Symlinks and mount points are not followed.  Hardlinks aren't handled: they
will be dumped as many times as they are found.  The recursive walk brings all
dentries into the dcache and populates the page cache of block devices, aka
'Buffers'.

It's probably worth adding an ioctl for dumping a file's page cache as an
array of PFNs, as a replacement for this hackish juggling with
mmap/madvise/mincore/pagemap.  The recursive walk could also be replaced with
dumping cached inodes via some ioctl or debugfs interface and then opening
them via open_by_handle_at(); this would fix hardlink handling and avoid the
unneeded population of the dcache and buffers.  Such an interface might be
used as a data source for constructing readahead plans and for background
optimization of actively used files.

collateral changes:
+ fix 64-bit LFS: define _FILE_OFFSET_BITS instead of _LARGEFILE64_SOURCE
+ replace lseek + read with single pread
+ make show_page_range() reusable after flush

usage example:

~/src/linux/tools/vm$ sudo ./page-types -L -f page-types
foffset offset  flags
page-types  Inode: 2229277  Size: 89065 (22 pages)
Modify: Tue Feb 25 12:00:59 2014 (162 seconds ago)
Access: Tue Feb 25 12:01:00 2014 (161 seconds ago)
0   3cbf3b  __RU_lAM
1   38946a  __RU_lAM
2   1a3cec  __RU_lAM
3   1a8321  __RU_lAM
4   3af7cc  __RU_lAM
5   1ed532  __RU_lA_
6   2e436a  __RU_lA_
7   29a35e  ___U_lA_
8   2de86e  ___U_lA_
9   3bdfb4  ___U_lA_
10  3cd8a3  ___U_lA_
11  2afa50  ___U_lA_
12  2534c2  ___U_lA_
13  1b7a40  ___U_lA_
14  17b0be  ___U_lA_
15  392b0c  ___U_lA_
16  3ba46a  __RU_lA_
17  397dc8  ___U_lA_
18  1f2a36  ___U_lA_
19  21fd30  __RU_lA_
20  2c35ba  __RU_l__
21  20f181  __RU_l__


             flags  page-count       MB  symbolic-flags      long-symbolic-flags
            0x002c           2        0  __RU_l__            referenced,uptodate,lru
            0x0068          11        0  ___U_lA_            uptodate,lru,active
            0x006c           4        0  __RU_lA_            referenced,uptodate,lru,active
            0x086c           5        0  __RU_lAM            referenced,uptodate,lru,active,mmap
             total          22        0



~/src/linux/tools/vm$ sudo ./page-types -f /
             flags  page-count       MB  symbolic-flags      long-symbolic-flags
            0x0028       21761       85  ___U_l__            uptodate,lru
            0x002c      127279      497  __RU_l__            referenced,uptodate,lru
            0x0068       74160      289  ___U_lA_            uptodate,lru,active
            0x006c       84469      329  __RU_lA_            referenced,uptodate,lru,active
            0x007c           1        0  __RUDlA_            referenced,uptodate,dirty,lru,active
            0x0228         370        1  ___U_l___I__        uptodate,lru,reclaim
            0x0828          49        0  ___U_l_M            uptodate,lru,mmap
            0x082c         126        0  __RU_l_M            referenced,uptodate,lru,mmap
            0x0868         137        0  ___U_lAM            uptodate,lru,active,mmap
            0x086c       12890       50  __RU_lAM            referenced,uptodate,lru,active,mmap
             total      321242     1254

Signed-off-by: Konstantin Khlebnikov 
---
 tools/vm/page-types.c |  170 -
 1 file changed, 152 insertions(+), 18 deletions(-)


Re: [PATCH v2] mm: per-thread vma caching

2014-02-25 Thread Michel Lespinasse
On Tue, Feb 25, 2014 at 8:04 PM, Davidlohr Bueso  wrote:
> On Tue, 2014-02-25 at 18:04 -0800, Michel Lespinasse wrote:
>> On Tue, Feb 25, 2014 at 10:16 AM, Davidlohr Bueso  wrote:
>> > This patch is a continuation of efforts trying to optimize find_vma(),
>> > avoiding potentially expensive rbtree walks to locate a vma upon faults.
>> > The original approach (https://lkml.org/lkml/2013/11/1/410), where the
>> > largest vma was also cached, ended up being too specific and random, thus
>> > further comparison with other approaches were needed. There are two things
>> > to consider when dealing with this, the cache hit rate and the latency of
>> > find_vma(). Improving the hit-rate does not necessarily translate in 
>> > finding
>> > the vma any faster, as the overhead of any fancy caching schemes can be too
>> > high to consider.
>>
>> Actually there is also the cost of keeping the cache up to date. I'm
>> not saying that it's an issue in your proposal - I like the proposal,
>> especially now that you are replacing the per-mm cache rather than
>> adding something on top - but it is a factor to consider.
>
> True, although numbers show that the cost of maintaining the cache is
> quite minimal. Invalidations are a free lunch (except in the rare event
> of a seqnum overflow), so the updating part would consume the most
> cycles, but then again, the hit rate is quite good so I'm not worried
> about that either.

Yes. I like your patch precisely because it keeps maintenance costs low.

>> > +void vmacache_invalidate_all(void)
>> > +{
>> > +   struct task_struct *g, *p;
>> > +
>> > +   rcu_read_lock();
>> > +   for_each_process_thread(g, p) {
>> > +   /*
>> > +* Only flush the vmacache pointers as the
>> > +* mm seqnum is already set and curr's will
>> > +* be set upon invalidation when the next
>> > +* lookup is done.
>> > +*/
>> > +   memset(p->vmacache, 0, sizeof(p->vmacache));
>> > +   }
>> > +   rcu_read_unlock();
>> > +}
>>
>> Two things:
>>
>> - I believe we only need to invalidate vma caches for threads that
>> share a given mm ? we should probably pass in that mm in order to
>> avoid over-invalidation
>
> I think you're right, since the overflows will always occur on
> mm->seqnum, tasks that do not share the mm shouldn't be affected.
>
> So the danger here is that when a lookup occurs, vmacache_valid() will
> return true, having:
>
> mm == curr->mm && mm->vmacache_seqnum == curr->vmacache_seqnum (both 0).
>
> Then we just iterate the cache and potentially return some bogus vma.
>
> However, since we're now going to reset the seqnum on every fork/clone
> (before it was just the oldmm->seqnum + 1 thing), I doubt we'll ever
> overflow.

I'm concerned about it *precisely* because it won't happen often, and
so it'd be hard to debug if we had a problem there. 64 bits would be
safe, but a 32-bit counter doesn't take that long to overflow, and I'm
sure it will happen once in a while in production.

Actually, I think there is a case for masking the seqnum with a
constant (all ones in production builds, but something shorter when
CONFIG_DEBUG_VM is enabled) so that this code is easier to exercise.
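
Something like the following (an illustrative sketch on top of the quoted
patch, not actual code from it; the helper name is made up) would make the
wraparound path easy to hit under CONFIG_DEBUG_VM while keeping the full 32
bits in production:

/* Sketch: shorten the seqnum space in debug builds so the overflow
 * (invalidate-all) path gets exercised regularly. */
#ifdef CONFIG_DEBUG_VM
#define VMACACHE_SEQNUM_MASK	0xffU		/* wrap every 256 updates */
#else
#define VMACACHE_SEQNUM_MASK	0xffffffffU
#endif

static inline void vmacache_bump_seqnum(struct mm_struct *mm)
{
	mm->vmacache_seqnum = (mm->vmacache_seqnum + 1) & VMACACHE_SEQNUM_MASK;
	if (unlikely(mm->vmacache_seqnum == 0))
		vmacache_invalidate_all();	/* handle the wraparound */
}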

>> - My understanding is that the operation is safe because the caller
>> has the mm's mmap_sem held for write, and other threads accessing the
>> vma cache will have mmap_sem held at least for read, so we don't need
>> extra locking to maintain the vma cache.
>
> Yes, that's how I see things as well.
>
>> Please 1- confirm this is the
>> intention, 2- document this, and 3- only invalidate vma caches for
>> threads that match the caller's mm so that mmap_sem locking can
>> actually apply.
>
> Will do.

Thanks :)
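
For illustration, the mm-scoped invalidation being asked for could look
roughly like this sketch against the quoted code (not the final patch):

/* Sketch: only flush threads sharing the mm whose seqnum overflowed;
 * the caller is expected to hold that mm's mmap_sem for writing. */
void vmacache_invalidate_all(struct mm_struct *mm)
{
	struct task_struct *g, *p;

	rcu_read_lock();
	for_each_process_thread(g, p) {
		if (p->mm == mm)	/* skip unrelated address spaces */
			memset(p->vmacache, 0, sizeof(p->vmacache));
	}
	rcu_read_unlock();
}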

>> > +struct vm_area_struct *vmacache_find(struct mm_struct *mm,
>> > +unsigned long addr)
>> > +
>> > +{
>> > +   int i;
>> > +
>> > +   if (!vmacache_valid(mm))
>> > +   return NULL;
>> > +
>> > +   for (i = 0; i < VMACACHE_SIZE; i++) {
>> > +   struct vm_area_struct *vma = current->vmacache[i];
>> > +
>> > +   if (vma && vma->vm_start <= addr && vma->vm_end > addr)
>> > +   return vma;
>> > +   }
>> > +
>> > +   return NULL;
>> > +}
>> > +
>> > +void vmacache_update(struct mm_struct *mm, unsigned long addr,
>> > +struct vm_area_struct *newvma)
>> > +{
>> > +   /*
>> > +* Hash based on the page number. Provides a good
>> > +* hit rate for workloads with good locality and
>> > +* those with random accesses as well.
>> > +*/
>> > +   int idx = (addr >> PAGE_SHIFT) & 3;
>> > +   current->vmacache[idx] = newvma;
>> > +}
>>
>> I did read the previous discussion about how to compute idx here. I
>> did not at the time realize that you are searching all 4 vmacache
>> entries on lookup - that is, we are only talking about 

linux-next: manual merge of the akpm-current tree with the cgroup tree

2014-02-25 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in
mm/memcontrol.c between commits e61734c55c24 ("cgroup: remove
cgroup->name") from the cgroup tree and commits a89db06ab1b4 ("memcg:
change oom_info_lock to mutex") and c78e84121972 ("memcg, slab: cleanup
memcg cache creation") from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc mm/memcontrol.c
index d9c6ac1532e6,452f45087566..
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@@ -1683,25 -1683,54 +1683,25 @@@ static void move_unlock_mem_cgroup(stru
   */
  void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct 
*p)
  {
 -  /*
 -   * protects memcg_name and makes sure that parallel ooms do not
 -   * interleave
 -   */
 +  /* oom_info_lock ensures that parallel ooms do not interleave */
-   static DEFINE_SPINLOCK(oom_info_lock);
+   static DEFINE_MUTEX(oom_info_lock);
 -  struct cgroup *task_cgrp;
 -  struct cgroup *mem_cgrp;
 -  static char memcg_name[PATH_MAX];
 -  int ret;
struct mem_cgroup *iter;
unsigned int i;
  
if (!p)
return;
  
-   spin_lock(&oom_info_lock);
+   mutex_lock(&oom_info_lock);
rcu_read_lock();
  
 -  mem_cgrp = memcg->css.cgroup;
 -  task_cgrp = task_cgroup(p, mem_cgroup_subsys_id);
 +  pr_info("Task in ");
 +  pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
 +  pr_info(" killed as a result of limit of ");
 +  pr_cont_cgroup_path(memcg->css.cgroup);
 +  pr_info("\n");
  
 -  ret = cgroup_path(task_cgrp, memcg_name, PATH_MAX);
 -  if (ret < 0) {
 -  /*
 -   * Unfortunately, we are unable to convert to a useful name
 -   * But we'll still print out the usage information
 -   */
 -  rcu_read_unlock();
 -  goto done;
 -  }
rcu_read_unlock();
  
 -  pr_info("Task in %s killed", memcg_name);
 -
 -  rcu_read_lock();
 -  ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
 -  if (ret < 0) {
 -  rcu_read_unlock();
 -  goto done;
 -  }
 -  rcu_read_unlock();
 -
 -  /*
 -   * Continues from above, so we don't need an KERN_ level
 -   */
 -  pr_cont(" as a result of limit of %s\n", memcg_name);
 -done:
 -
pr_info("memory: usage %llukB, limit %llukB, failcnt %llu\n",
res_counter_read_u64(>res, RES_USAGE) >> 10,
res_counter_read_u64(>res, RES_LIMIT) >> 10,




Re: [PATCHv1 0/2] Convert rx51-battery to IIO API and add DT support

2014-02-25 Thread Pali Rohár
Hi!

2014-02-26 1:46 GMT+01:00 Sebastian Reichel :
> Hi,
>
> This is PATCHv1 for converting rx51-battery to the IIO API
> and adding DT support. The patchset compiles and has been
> tested on my Nokia N900. It depends on another patchset
> converting twl4030-madc to the IIO API:
>
> https://lkml.org/lkml/2014/2/25/627
>
> -- Sebastian
>
> Sebastian Reichel (2):
>   rx51_battery: convert to iio consumer
>   Documentation: DT: Document rx51-battery binding
>
>  .../devicetree/bindings/power/rx51-battery.txt | 25 
>  drivers/power/rx51_battery.c   | 68 
> ++
>  2 files changed, 70 insertions(+), 23 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/power/rx51-battery.txt
>
> --
> 1.8.5.3
>

Thanks for patch!

I would like to ask other kernel developers what they think about moving the
ADC channel numbers from the rx51_battery.ko driver code to DT. The
rx51_battery.ko driver is platform specific to the Nokia RX-51 (N900), so it
is useful only for this one device.

Before this patch all driver data (look-up tables, ADC channel numbers,
etc.) was in the driver code. After this patch the ADC channel numbers are
moved to DT. What do you think? Is it better to have all data in one place
(driver code), or some in DT and some in driver code?

To me it does not make sense to move these numbers to DT, because the driver
is RX-51 specific and changing them in DT does not make sense. I also think
it is better to have all driver data in one place and not in two...

Sebastian already wrote to me that it is normal to have the numbers in DT
and the rest of the code in the driver. But I think a driver which can be
used on only one device (and so is specified in only one DT file) does not
need such configuration (via DT or board files).

Or do you think that a driver written for a single device needs to have its
ADC numbers configurable via DT?
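
For context on what "convert to iio consumer" means here: instead of calling
into twl4030-madc with hard-coded channel numbers, the driver asks for a
named channel, and the mapping of that name to an ADC channel lives either in
DT or in board code. A rough sketch (the channel name and function are
illustrative, not the actual rx51_battery code):

#include <linux/err.h>
#include <linux/iio/consumer.h>

/* Sketch: read one ADC value through the IIO consumer API. */
static int rx51_read_adc(struct device *dev, const char *name, int *value)
{
	struct iio_channel *chan;
	int ret;

	chan = iio_channel_get(dev, name);	/* e.g. "temp", mapped in DT */
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	ret = iio_read_channel_processed(chan, value);
	iio_channel_release(chan);
	return ret;
}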

-- 
Pali Rohár
pali.ro...@gmail.com


Re: [PATCH 0/7] DMA: Freescale: driver cleanups and enhancements

2014-02-25 Thread Hongbo Zhang

Hi Vinod,
How about these patches?
Thanks.


On 01/16/2014 01:47 PM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

Hi Vinod Koul and Dan Williams,
Please have a look at these patches.

Note that patches 2~6 had been sent out for upstream before, but they were
bundled with other storage patches at that time, which made them hard to
review and merge, so I am sending them separately this time.

Thanks.

Hongbo Zhang (7):
   DMA: Freescale: unify register access methods
   DMA: Freescale: remove attribute DMA_INTERRUPT of dmaengine
   DMA: Freescale: add fsl_dma_free_descriptor() to reduce code
 duplication
   DMA: Freescale: move functions to avoid forward declarations
   DMA: Freescale: change descriptor release process for supporting
 async_tx
   DMA: Freescale: use spin_lock_bh instead of spin_lock_irqsave
   DMA: Freescale: add suspend resume functions for DMA driver

  drivers/dma/fsldma.c |  592 --
  drivers/dma/fsldma.h |   33 ++-
  2 files changed, 412 insertions(+), 213 deletions(-)







linux-next: manual merge of the akpm-current tree with the ext3 tree

2014-02-25 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in
fs/notify/fanotify/fanotify_user.c between commit ff57cd5863cf
("fsnotify: Allocate overflow events with proper type") from the ext3
tree and commit c40e3490382b ("fanotify: convert access_mutex to
spinlock") from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc fs/notify/fanotify/fanotify_user.c
index 287a22c04149,c3406d633925..
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@@ -731,21 -690,9 +691,21 @@@ SYSCALL_DEFINE2(fanotify_init, unsigne
group->fanotify_data.user = user;
atomic_inc(>fanotify_listeners);
  
 +  oevent = kmem_cache_alloc(fanotify_event_cachep, GFP_KERNEL);
 +  if (unlikely(!oevent)) {
 +  fd = -ENOMEM;
 +  goto out_destroy_group;
 +  }
 +  group->overflow_event = >fse;
 +  fsnotify_init_event(group->overflow_event, NULL, FS_Q_OVERFLOW);
 +  oevent->tgid = get_pid(task_tgid(current));
 +  oevent->path.mnt = NULL;
 +  oevent->path.dentry = NULL;
 +
group->fanotify_data.f_flags = event_f_flags;
  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 +  oevent->response = 0;
-   mutex_init(&group->fanotify_data.access_mutex);
+   spin_lock_init(&group->fanotify_data.access_lock);
init_waitqueue_head(>fanotify_data.access_waitq);
INIT_LIST_HEAD(>fanotify_data.access_list);
atomic_set(>fanotify_data.bypass_perm, 0);




Re: [PATCH 2/8] security: apparmor: Use a more current logging style

2014-02-25 Thread John Johansen
On 02/24/2014 01:59 PM, Joe Perches wrote:
> Convert printks to pr_.
> Add pr_fmt.
> Coalesce formats.
> Remove embedded prefixes from logging.
> 

you missed one place,

--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -50,7 +50,7 @@ extern unsigned int aa_g_path_max;
 #define AA_DEBUG(fmt, args...) \
do {\
if (aa_g_debug && printk_ratelimit())   \
-   printk(KERN_DEBUG "AppArmor: " fmt, ##args);\
+   pr_debug(fmt, ##args);  \
} while (0)
 
 #define AA_ERROR(fmt, args...) \

other than that looks good.
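
For anyone not familiar with the mechanism: pr_err()/pr_debug() expand
through pr_fmt(), so once the file defines the prefix the embedded
"AppArmor: " strings become redundant. Roughly (illustrative, not part of
the patch):

/* With this at the top of the file (before any pr_* use)... */
#define pr_fmt(fmt) "AppArmor: " fmt

/* ...a call like pr_err("DFA next/check upper bounds error\n") expands to
 * printk(KERN_ERR "AppArmor: DFA next/check upper bounds error\n"),
 * and pr_debug() picks up the same prefix. */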

> Signed-off-by: Joe Perches 
Acked-by: John Johansen 

> ---
>  security/apparmor/apparmorfs.c   | 2 ++
>  security/apparmor/crypto.c   | 2 ++
>  security/apparmor/include/apparmor.h | 2 +-
>  security/apparmor/lib.c  | 4 +++-
>  security/apparmor/lsm.c  | 2 ++
>  security/apparmor/match.c| 5 +++--
>  security/apparmor/policy.c   | 2 ++
>  security/apparmor/procattr.c | 2 ++
>  8 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 7db9954..d4b65cc 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -12,6 +12,8 @@
>   * License.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  #include 
>  #include 
> diff --git a/security/apparmor/crypto.c b/security/apparmor/crypto.c
> index 532471d..9506544 100644
> --- a/security/apparmor/crypto.c
> +++ b/security/apparmor/crypto.c
> @@ -15,6 +15,8 @@
>   * it should be.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  
>  #include "include/apparmor.h"
> diff --git a/security/apparmor/include/apparmor.h 
> b/security/apparmor/include/apparmor.h
> index 8fb1488..3065025 100644
> --- a/security/apparmor/include/apparmor.h
> +++ b/security/apparmor/include/apparmor.h
> @@ -56,7 +56,7 @@ extern unsigned int aa_g_path_max;
>  #define AA_ERROR(fmt, args...)   
> \
>   do {\
>   if (printk_ratelimit()) \
> - printk(KERN_ERR "AppArmor: " fmt, ##args);  \
> + pr_err(fmt, ##args);\
>   } while (0)
>  
>  /* Flag indicating whether initialization completed */
> diff --git a/security/apparmor/lib.c b/security/apparmor/lib.c
> index 6968992..432b1b6 100644
> --- a/security/apparmor/lib.c
> +++ b/security/apparmor/lib.c
> @@ -12,6 +12,8 @@
>   * License.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  #include 
>  #include 
> @@ -73,7 +75,7 @@ void aa_info_message(const char *str)
>   aad.info = str;
>   aa_audit_msg(AUDIT_APPARMOR_STATUS, , NULL);
>   }
> - printk(KERN_INFO "AppArmor: %s\n", str);
> + pr_info("%s\n", str);
>  }
>  
>  /**
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index 9981000..49f0180 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -12,6 +12,8 @@
>   * License.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  #include 
>  #include 
> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> index 727eb42..688482a 100644
> --- a/security/apparmor/match.c
> +++ b/security/apparmor/match.c
> @@ -12,6 +12,8 @@
>   * License.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  #include 
>  #include 
> @@ -140,8 +142,7 @@ static int verify_dfa(struct aa_dfa *dfa, int flags)
>   if (DEFAULT_TABLE(dfa)[i] >= state_count)
>   goto out;
>   if (base_idx(BASE_TABLE(dfa)[i]) + 255 >= trans_count) {
> - printk(KERN_ERR "AppArmor DFA next/check upper "
> -"bounds error\n");
> + pr_err("DFA next/check upper bounds error\n");
>   goto out;
>   }
>   }
> diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
> index 705c287..4e20c1f 100644
> --- a/security/apparmor/policy.c
> +++ b/security/apparmor/policy.c
> @@ -73,6 +73,8 @@
>   * FIXME: move profile lists to using rcu_lists
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 
>  #include 
>  #include 
> diff --git a/security/apparmor/procattr.c b/security/apparmor/procattr.c
> index b125acc..c105fc5 100644
> --- a/security/apparmor/procattr.c
> +++ b/security/apparmor/procattr.c
> @@ -12,6 +12,8 @@
>   * License.
>   */
>  
> +#define pr_fmt(fmt) "AppArmor: " fmt
> +
>  #include 

Re: linux-next: manual merge of the clk tree with the keystone tree

2014-02-25 Thread Mike Turquette
On Tue, Feb 25, 2014 at 11:15 PM, Stephen Rothwell  
wrote:
> Hi Mike,
>
> Today's linux-next merge of the clk tree got a conflict in
> arch/arm/boot/dts/keystone-clocks.dtsi between commit 0cfc9ccec2a8 ("ARM:
> dts: keystone: preparatory patch to support K2L and K2E SOCs") from the
> keystone tree and commit 565bbdcd3b91 ("ARM: keystone: dts: fix clkvcp3
> control register address") from the clk tree.
>
> I fixed it up (by adding the following merge fix patch) and can carry the
> fix as necessary (no action is required).
>
> From: Stephen Rothwell 
> Date: Wed, 26 Feb 2014 18:12:55 +1100
> Subject: [PATCH] ARM: keystone: dts: fix for code movement
>
> Signed-off-by: Stephen Rothwell 

Looks good to me.

Thanks!
Mike

> ---
>  arch/arm/boot/dts/k2hk-clocks.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/k2hk-clocks.dtsi 
> b/arch/arm/boot/dts/k2hk-clocks.dtsi
> index 4eed84feb761..a71aa2996321 100644
> --- a/arch/arm/boot/dts/k2hk-clocks.dtsi
> +++ b/arch/arm/boot/dts/k2hk-clocks.dtsi
> @@ -339,7 +339,7 @@ clocks {
> compatible = "ti,keystone,psc-clock";
> clocks = <>;
> clock-output-names = "vcp-3";
> -   reg = <0x0235000a8 0xb00>, <0x02350060 0x400>;
> +   reg = <0x023500a8 0xb00>, <0x02350060 0x400>;
> reg-names = "control", "domain";
> domain-id = <24>;
> };
> --
> 1.9.0
>
> --
> Cheers,
> Stephen Rothwells...@canb.auug.org.au


Re: Quality plastic raw materials

2014-02-25 Thread Iris
Dear Sir,

Glad to hear that you're on the market for HDPE/LDPE/LLDPE/PET. We specialize 
in this field for several years, with the Blow moulding ,injection moulding 
,film industry, with good quality and pretty competitive price.

Should you have any questions, pls do not hesitate to contact me. FREE SAMPLES 
will be sent for your evaluation!

Tks & br,

Iris
Company: TIANJIN DEKAY  INDUSTRIAL CO., LIMITED 
Skype: iris.song12
MSN: iris20111...@hotmail.com

Re: mm: NULL ptr deref in balance_dirty_pages_ratelimited

2014-02-25 Thread Bob Liu
On Wed, Feb 26, 2014 at 3:32 AM, Sasha Levin  wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools running latest -next kernel
> I've stumbled on the following spew:
>
> [  232.869443] BUG: unable to handle kernel NULL pointer dereference at
> 0020
> [  232.870230] IP: []
> balance_dirty_pages_ratelimited+0x1e/0x150
> [  232.870230] PGD 586e1d067 PUD 586e1e067 PMD 0
> [  232.870230] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  232.870230] Dumping ftrace buffer:
> [  232.870230](ftrace buffer empty)
> [  232.870230] Modules linked in:
> [  232.870230] CPU: 36 PID: 9707 Comm: trinity-c36 Tainted: GW
> 3.14.0-rc4-next-20140225-sasha-00010-ga117461 #42
> [  232.870230] task: 880586dfb000 ti: 880586e34000 task.ti:
> 880586e34000
> [  232.870230] RIP: 0010:[]
> [] balance_dirty_pages_ratelimited+0x1e/0x150
> [  232.870230] RSP: :880586e35c58  EFLAGS: 00010282
> [  232.870230] RAX:  RBX: 880582831361 RCX:
> 0007
> [  232.870230] RDX: 0007 RSI: 880586dfbcc0 RDI:
> 880582831361
> [  232.870230] RBP: 880586e35c78 R08:  R09:
> 
> [  232.870230] R10: 0001 R11: 0001 R12:
> 7f58007ee000
> [  232.870230] R13: 880c8d6d4f70 R14: 0200 R15:
> 880c8dcce710
> [  232.870230] FS:  7f58018bb700() GS:880c8e80()
> knlGS:
> [  232.870230] CS:  0010 DS:  ES:  CR0: 8005003b
> [  232.870230] CR2: 0020 CR3: 000586e1c000 CR4:
> 06e0
> [  232.870230] Stack:
> [  232.870230]  880586e35c78 880586e33400 7f58007ee000
> 880c8d6d4f70
> [  232.870230]  880586e35cd8 8127d241 0001
> 0001
> [  232.870230]   ea0032337080 8000
> 880586e33400
> [  232.870230] Call Trace:
> [  232.870230]  [] do_shared_fault+0x1a1/0x1f0
> [  232.870230]  [] handle_pte_fault+0xc8/0x230
> [  232.870230]  [] ? delay_tsc+0xea/0x110
> [  232.870230]  [] __handle_mm_fault+0x36e/0x3a0
> [  232.870230]  [] ? rcu_read_unlock+0x5d/0x60
> [  232.870230]  []
> handle_mm_fault+0x10b/0x1b0
> [  232.870230]  [] ? __do_page_fault+0x2e2/0x590
> [  232.870230]  [] __do_page_fault+0x551/0x590
> [  232.870230]  [] ?
> vtime_account_user+0x91/0xa0
> [  232.870230]  [] ?
> context_tracking_user_exit+0xa8/0x1c0
> [  232.870230]  [] ?
> _raw_spin_unlock+0x30/0x50
> [  232.870230]  [] ?
> vtime_account_user+0x91/0xa0
> [  232.870230]  [] ?
> context_tracking_user_exit+0xa8/0x1c0
> [  232.870230]  [] do_page_fault+0x3d/0x70
> [  232.870230]  [] do_async_page_fault+0x35/0x100
> [  232.870230]  []
> async_page_fault+0x28/0x30
> [  232.870230] Code: 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48
> 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 48 89 fb 48 8b 87 50 01 00 00
>  40 20 01 0f 85 18 01 00 00 65 48 8b 14 25 40 da 00 00 44 8b
> [  232.870230] RIP  []
> balance_dirty_pages_ratelimited+0x1e/0x150
> [  232.870230]  RSP 
> [  232.870230] CR2: 0020
>
>

Could you please test the patch below? I think it may fix this issue: after
unlock_page() the page can be truncated and fault_page->mapping may become
NULL, so grab the mapping before unlocking and use the cached pointer.

diff --git a/mm/memory.c b/mm/memory.c
index 548d97e..90cea22 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3419,6 +3419,7 @@ static int do_shared_fault(struct mm_struct *mm,
struct vm_area_struct *vma,
  pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
 {
  struct page *fault_page;
+ struct address_space *mapping;
  spinlock_t *ptl;
  pte_t *pte;
  int dirtied = 0;
@@ -3454,13 +3455,14 @@ static int do_shared_fault(struct mm_struct
*mm, struct vm_area_struct *vma,

  if (set_page_dirty(fault_page))
  dirtied = 1;
+ mapping = fault_page->mapping;
  unlock_page(fault_page);
- if ((dirtied || vma->vm_ops->page_mkwrite) && fault_page->mapping) {
+ if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
  /*
  * Some device drivers do not set page.mapping but still
  * dirty their pages
  */
- balance_dirty_pages_ratelimited(fault_page->mapping);
+ balance_dirty_pages_ratelimited(mapping);
  }

  /* file_update_time outside page_lock */


linux-next: manual merge of the clk tree with the keystone tree

2014-02-25 Thread Stephen Rothwell
Hi Mike,

Today's linux-next merge of the clk tree got a conflict in
arch/arm/boot/dts/keystone-clocks.dtsi between commit 0cfc9ccec2a8 ("ARM:
dts: keystone: preparatory patch to support K2L and K2E SOCs") from the
keystone tree and commit 565bbdcd3b91 ("ARM: keystone: dts: fix clkvcp3
control register address") from the clk tree.

I fixed it up (by adding the following merge fix patch) and can carry the
fix as necessary (no action is required).

From: Stephen Rothwell 
Date: Wed, 26 Feb 2014 18:12:55 +1100
Subject: [PATCH] ARM: keystone: dts: fix for code movement

Signed-off-by: Stephen Rothwell 
---
 arch/arm/boot/dts/k2hk-clocks.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/k2hk-clocks.dtsi 
b/arch/arm/boot/dts/k2hk-clocks.dtsi
index 4eed84feb761..a71aa2996321 100644
--- a/arch/arm/boot/dts/k2hk-clocks.dtsi
+++ b/arch/arm/boot/dts/k2hk-clocks.dtsi
@@ -339,7 +339,7 @@ clocks {
compatible = "ti,keystone,psc-clock";
clocks = <>;
clock-output-names = "vcp-3";
-   reg = <0x0235000a8 0xb00>, <0x02350060 0x400>;
+   reg = <0x023500a8 0xb00>, <0x02350060 0x400>;
reg-names = "control", "domain";
domain-id = <24>;
};
-- 
1.9.0

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH net] vhost: net: switch to use data copy if pending DMAs exceed the limit

2014-02-25 Thread Jason Wang

On 02/26/2014 02:32 PM, Qin Chuanyu wrote:

On 2014/2/26 13:53, Jason Wang wrote:

On 02/25/2014 09:57 PM, Michael S. Tsirkin wrote:

On Tue, Feb 25, 2014 at 02:53:58PM +0800, Jason Wang wrote:

We used to stop the handling of tx when the number of pending DMAs
exceeds VHOST_MAX_PEND. This is used to reduce the memory occupation
of both host and guest. But it was too aggressive in some cases, since
any delay or blocking of a single packet may delay or block the guest
transmission. Consider the following setup:

 +-----+        +-----+
 | VM1 |        | VM2 |
 +--+--+        +--+--+
    |              |
 +--+--+        +--+--+
 | tap0|        | tap1|
 +--+--+        +--+--+
    |              |
 pfifo_fast     htb(10Mbit/s)
    |              |
 +--+--------------+--+
 |       bridge       |
 +--+-----------------+
    |
 pfifo_fast
    |
 +-----+
 | eth0|(100Mbit/s)
 +-----+

- start two VMs and connect them to a bridge
- add a physical card (100Mbit/s) to that bridge
- setup htb on tap1 and limit its throughput to 10Mbit/s
- run two netperfs at the same time, one from VM1 to VM2, another from
  VM1 to an external host through eth0.
- the result shows that not only was the VM1 to VM2 traffic throttled, but
  the VM1 to external host traffic through eth0 was also throttled somehow.


This is because the delay added by htb may delay the completion of DMAs
and cause the pending DMAs for tap0 to exceed the limit (VHOST_MAX_PEND).
In this case vhost stops handling tx requests until htb sends some packets.
The problem here is that all packet transmission is blocked, even traffic
that does not go to VM2.

We can solve this issue by relaxing it a little bit: switching to use
data copy instead of stopping tx when the number of pending DMAs
exceed the VHOST_MAX_PEND. This is safe because:

- The number of pending DMAs were still limited by VHOST_MAX_PEND
- The out of order completion during mode switch can make sure that
   most of the tx buffers were freed in time in guest.

So even if about 50% packets were delayed in zero-copy case, vhost
could continue to do the transmission through data copy in this case.

Test result:

Before this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 40Mbit/s

After this patch:
VM1 to VM2 throughput is 9.3Mbit/s
Vm1 to External throughput is 93Mbit/s

Would like to see CPU utilization #s as well.



Will measure this.

Simple performance test on 40gbe shows no obvious changes in
throughput after this patch.

The patch only solves this issue for unlimited sndbuf. We still need a
solution for limited sndbuf.

Cc: Michael S. Tsirkin
Cc: Qin Chuanyu
Signed-off-by: Jason Wang

I think this needs some thought.

In particular I think this works because VHOST_MAX_PEND
is much smaller than the ring size.
Shouldn't max_pend then be tied to the ring size if it's small?



Yes it should. I just reuse the VHOST_MAX_PEND since it was there for a
long time.

Another question is about stopping vhost:
ATM it's waiting for skbs to complete.
Should we maybe hunt down skbs queued and destroy them
instead?
I think this happens when a device is removed.

Thoughts?



Agree, vhost net removal should not be blocked by an skb. But since the
skbs could be queued in many places, just destroying them may need extra locks.

Haven't thought about this deeply, but another possible solution is to rcuify
destructor_arg and assign it to NULL during vhost_net removal.


Xen handles it with a timer: for those skbs which have been pending for a
while, netback exchanges the pages of the zero-copy skb with dom0's pages.

But there is still a race between another host process handling the skb
and netback exchanging its page. (This problem has been proved by testing.)

And Xen hasn't solved this problem yet, because solving it completely would
require a page lock, which would be complex and expensive.

Rcuifying destructor_arg and assigning it to NULL couldn't solve the problem
of releasing a page that is still held by another host process.



There're two issues:

1) if a zerocopy skb is not freed or its frags orphaned in time, vhost_net
removal will be blocked, since it waits for the refcnt of the ubuf to reach zero.

2) whether or not we should free all pending skbs during vhost_net removal.

My proposal addresses issue 1. Another idea is to not wait for the refcnt to
reach zero, and instead defer the freeing of vhost_net to the release method
invoked by kref_put().
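
Roughly what that deferral could look like - a sketch of the idea only, with
made-up names, not a real patch against drivers/vhost/net.c:

/* Sketch: let the last zerocopy-ubuf reference free the device state
 * instead of making release() wait for the refcount to reach zero. */
struct tx_ubuf_ref {
	struct kref kref;
	void *net;			/* points at the vhost_net instance */
};

static void tx_ubuf_release(struct kref *kref)
{
	struct tx_ubuf_ref *ubufs = container_of(kref, struct tx_ubuf_ref, kref);

	kfree(ubufs->net);		/* deferred free of the device state */
	kfree(ubufs);
}

/* Called both from the zerocopy completion callback and from the file
 * release path: teardown no longer blocks, the last put frees things. */
static void tx_ubuf_put(struct tx_ubuf_ref *ubufs)
{
	kref_put(&ubufs->kref, tx_ubuf_release);
}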


For issue 2, I'm still not sure whether we should do this or not. It looks
like there is a similar issue when packets sent by tcp_sendpage() are
blocked or delayed.

The key problem is how to release the memory of a zero-copy skb while it is
still held elsewhere.



[PATCH] drivercore: deferral race condition fix

2014-02-25 Thread Peter Ujfalusi
When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
where all modules are loaded but some drivers are still stuck in the deferred
list, and an external event is needed to kick the deferred queue so that
these drivers are probed.

The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
audio support built as modules and using nfsroot for root filesystem.

The following log fragment shows such a sequence: all audio modules were
loaded but the sound card is not present, since the machine driver failed to
probe due to a missing dependency during its probe.
The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with the davinci-evm
machine driver:

...
[   12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
[   12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
[   12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
[   12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: 
snd_soc_register_component
[   12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: 
snd_soc_register_component DONE
[   12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
[   12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card 
DONE (-517)
[   13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
[   13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
[   13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
[   13.346755] davinci_mcasp_driver_init: LEAVE
[   13.377446] platform sound.3: Driver davinci_evm requests probe deferral
[   13.592527] platform sound.3: really_probe: probe_count = 0

In the log the machine driver enters its probe at 12.719969 (at this point it
has been removed from the deferred list). The McASP driver is already
executing its probe (12.615118) and also finishes first.
The machine driver tries to construct the sound card (12.950839) but does not
find one of the components, so it fails. After this the McASP driver
registers all the ASoC components and the deferred work is prepared at
13.099026 (note that at this time the machine driver is not in the lists, so
it is not going to be handled when the work executes).
Lastly the machine driver exits its probe and the core places it on the
deferred list, but no other driver is going to load and the deferred queue is
not going to be kicked again - until we get an external event like connecting
a USB stick, etc.

The proposed solution is to try the deferred queue once more when the last
driver asks to be deferred and other drivers were loaded while this last
driver was probing.

This way we can avoid drivers stuck in the deferred queue.

Signed-off-by: Peter Ujfalusi 
---
Hi Greg, Grant,

I have fixed up the commit message and rebased the patch on top of 3.14-rc4
since the RFC version [1].

[1] https://lkml.org/lkml/2013/12/23/142

Regards,
Peter

 drivers/base/dd.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06051767393f..80703de6e6ad 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,6 +53,10 @@ static LIST_HEAD(deferred_probe_pending_list);
 static LIST_HEAD(deferred_probe_active_list);
 static struct workqueue_struct *deferred_wq;
 
+static atomic_t probe_count = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
+static bool deferral_retry;
+
 /**
  * deferred_probe_work_func() - Retry probing devices in the active list.
  */
@@ -141,6 +145,11 @@ static void driver_deferred_probe_trigger(void)
if (!driver_deferred_probe_enable)
return;
 
+   if (atomic_read(_count) > 1)
+   deferral_retry = true;
+   else
+   deferral_retry = false;
+
/*
 * A successful probe means that all the devices in the pending list
 * should be triggered to be reprobed.  Move all the deferred devices
@@ -259,9 +268,6 @@ int device_bind_driver(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(device_bind_driver);
 
-static atomic_t probe_count = ATOMIC_INIT(0);
-static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
-
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
int ret = 0;
@@ -310,6 +316,16 @@ probe_failed:
/* Driver requested deferred probing */
dev_info(dev, "Driver %s requests probe deferral\n", drv->name);
driver_deferred_probe_add(dev);
+   /*
+* This is the last driver to load and asking to be deferred.
+* If other driver(s) loaded while this driver was loading, we
+* should try the deferred modules again to avoid missing
+* dependency for this driver.
+*/
+   if (atomic_read(_count) == 1 && deferral_retry) {
+   deferral_retry = false;
+   driver_deferred_probe_trigger();
+   }
  

Re: [PATCH v3 00/14] perf, x86: Haswell LBR call stack support

2014-02-25 Thread Stephane Eranian
On Wed, Feb 26, 2014 at 3:39 AM, Andy Lutomirski  wrote:
> On 02/17/2014 10:07 PM, Yan, Zheng wrote:
>>
>> This patch series adds LBR call stack support. User can enabled/disable
>> this through an sysfs attribute file in the CPU PMU directory:
>>  echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack
>
> This seems like an unpleasant way to control this.  It would be handy to
> be able to control this as an option to perf record.
>
That would mean you'd need to be root to use perf.
Or are you suggesting a perf event option? But then you'd expose an
arch-specific feature at the API level.

> --Andy


Re: Info: mapping multiple BARs. Your kernel is fine.

2014-02-25 Thread Stephane Eranian
Hi,

On Tue, Feb 25, 2014 at 11:10 PM, Borislav Petkov  wrote:
> On Tue, Feb 25, 2014 at 07:54:53PM +0100, Stephane Eranian wrote:
>
>> I am on tip.git at cfbf8d4 Linux 3.14-rc4.
>> and I don't see the problem (using Ubuntu Saucy).
>
> Also IVB, model 58?
>
Yes.

>> Given what you commented out, it seems like you're saying
>> something goes wrong with pci_get_device().
>
> Probably. I'll add some debug printk's tomorrow to shed some more light
> on the matter.
>
>> Am I missing some pm callbacks?
>
> Dunno. What do you mean by "pm callbacks" exactly? I don't know that
> code so I have to ask.
>
power management callbacks.

>> The uncore IMC is not used internally.
>
> By IMC I'm assuming this PCI dev:
>
> #define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
>
> ?
>
Yes. Needs to point to the DRAM controller.

> And "internally" means by BIOS or something behind the curtains like
> SMM...?
>
I meant by the kernel.


Re: [PATCH] Staging: comedi: addi-data: clean-up variable use in hwdrv_apci035.c

2014-02-25 Thread Chase Southwood
>On Tuesday, February 25, 2014 3:56 AM, Ian Abbott  wrote:

>>On 2014-02-25 08:15, Chase Southwood wrote:
>> The static variable ui_Command is as of right now being cleared to a
>> value of zero between everytime that it writes to a port and then takes a
>> new value from a port.  Seems like this zeroing is unnecessary, so we can
>> just remove these lines.
>
>The description is slightly wrong as the variable isn't static storage 
>class.
>

Oh, shoot, you're exactly right.  I was looking at some other variables which
are static when I was making this up and I got storage classes all mixed up.
I'll fix up the description and send it off to Greg again as it hasn't been
applied yet.

>>
>> Signed-off-by: Chase Southwood 
>> ---
>> This sort of thing seems like a copy/paste sort of error to me, but there
>> could as easily be some reason here that I'm missing that this is needed
>> here.  My first impression, however, was that this extra clearing is
>> useless.
>
>Yes, the extra clearing was useless.  There are also some useless 
>initializers for this variable and others.
>
>Fine, apart from the description.
>
>Reviewed-by: Ian Abbott 
>
>-- 
>-=( Ian Abbott @ MEV Ltd.    E-mail:         )=-
>-=( Tel: +44 (0)161 477 1898  FAX: +44 (0)161 718 3587        )=-


Re: linux-next: build failure after merge of the char-misc tree

2014-02-25 Thread Stephen Rothwell
Hi Greg,

On Fri, 21 Feb 2014 16:47:11 +1100 Stephen Rothwell  
wrote:
>
> After merging the char-misc tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> In file included from drivers/misc/mei/hw-txe.c:25:0:
> drivers/misc/mei/hw-txe.h:63:1: error: unknown type name 'irqreturn_t'
>  irqreturn_t mei_txe_irq_quick_handler(int irq, void *dev_id);
>  ^
> 
> Caused by commit 266f6178d1f1 ("mei: txe: add hw-txe.h header file") but
> probably exposed by commit 46cb7b1bd86f ("PCI: Remove unused SR-IOV VF
> Migration support") from the pci tree which removed the include of
> irqreturn.h from pci.h ...
> 
> See Rule 1 from Documentation/SubmitChecklist ...
> 
> I added the following merge fix patch (this should be applied to the
> char-misc tree):

Ping?

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




linux-next: build failure after merge of the cgroup tree

2014-02-25 Thread Stephen Rothwell
Hi Tejun,

After merging the cgroup tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

kernel/cgroup.c: In function 'cgroup_mount':
kernel/cgroup.c:1572:2: error: too few arguments to function 'kernfs_mount'
  dentry = kernfs_mount(fs_type, flags, root->kf_root);
  ^

Caused by commit 2bd59d48ebfb ("cgroup: convert to kernfs") interacting
with commit fed95bab8d29 ("sysfs: fix namespace refcnt leak") from the
driver-core.current tree.

I added the following merge fix patch, but it may not be completely
correct, please check.

From: Stephen Rothwell 
Date: Wed, 26 Feb 2014 17:41:51 +1100
Subject: [PATCH] cgroup: fix up for kernfs_mount API change

Signed-off-by: Stephen Rothwell 
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 306ad0ed19ef..8f4ddbe23d58 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1569,7 +1569,7 @@ out_unlock:
if (ret)
return ERR_PTR(ret);
 
-   dentry = kernfs_mount(fs_type, flags, root->kf_root);
+   dentry = kernfs_mount(fs_type, flags, root->kf_root, NULL);
if (IS_ERR(dentry))
cgroup_put(&root->top_cgroup);
return dentry;
-- 
1.9.0

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

2014-02-25 Thread Dave Chinner
On Tue, Feb 25, 2014 at 08:45:15PM -0800, Hugh Dickins wrote:
> On Wed, 26 Feb 2014, Dave Chinner wrote:
> > On Tue, Feb 25, 2014 at 03:23:35PM -0800, Hugh Dickins wrote:
> > > Of course I'm interested in the possibility of extending it to tmpfs;
> > > which may not be a worthwhile exercise in itself, except that it would
> > > force us to face and solve any pagecache/radixtree issues, if possible,
> > > thereby enhancing the support for disk-based filesystems.
> > > 
> > > I doubt we should look into that before Jan Kara's range locking mods
> > > arrive, or are rejected.  As I understand it, you're going ahead with
> > > this, knowing that there can be awkward races with concurrent faults -
> > > more likely to cause trinity fuzzer reports, than interfere with daily
> > > usage (trinity seems to be good at faulting into holes being punched).
> > 
> > Yes, the caveat is that the applications that use it (TVs, DVRs, NLE
> > applications, etc) typically don't use mmap for accessing the data
> > stream being modified. Further, it's much less generally useful than
> > holepunching, so when these two are combined, the likely exposure to
> > issues resulting from mmap deficiencies are pretty damn low.
> 
> Agreed, but we do want to define how the interaction behaves, we want
> it to be the same across all filesystems supporting COLLAPSE_RANGE,
> and we don't want it to lead to system crashes or corruptions.

We have defined it. It's just a data manipulation operation that
moves file data from one offset to another. How a filesystem
implements that is a filesystem's problem, not a problem with the
API.
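
For anyone following along, from userspace the operation under discussion is
driven roughly like this (a sketch; the flag value comes from the proposed
uapi header, and both offset and length must be multiples of the filesystem
block size):

#define _GNU_SOURCE
#include <fcntl.h>

#ifndef FALLOC_FL_COLLAPSE_RANGE
#define FALLOC_FL_COLLAPSE_RANGE 0x08	/* from the proposed uapi header */
#endif

/* Sketch: remove 'len' bytes starting at 'offset' from an open file and
 * shift everything after it down, shrinking the file by 'len'. */
static int collapse_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len);
}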

ext4 and XFS implement it by removing all cached data and mappings
over the range from memory and then manipulating the extent status.
Other filesystems might be able to do similar things, or we might
just do an internal kernel read/write loop. And there's nothing
ruling out a hardware or network protocol copy offload being used to
implement an optimised data copy, either.

tmpfs is different in that its only data store is the page cache,
so it needs to operate within the constraints of the page cache.
Moving pages inside the page cache might be a new operation for
tmpfs, but it's not an operation that is needed by filesystems
to implement this file data manipulation.

> > > That's probably the right pragmatic decision; but I'm a little worried
> > > that it's justfied by saying we already have such races in hole-punch.
> > > Collapse is significantly more challenging than either hole-punch or
> > > truncation: the shifting of pages from one offset to another is new,
> > > and might present nastier bugs.
> > 
> > Symptoms might be different, but it's exactly the same problem. i.e.
> > mmap_sem locking inversions preventing the filesystem from
> > serialising IO path operations like hole punch, truncate and other
> > extent manipulation operations against concurrent page faults
> > that enter the IO path.
> 
> That may (may) be true of the current kick-everything-out-of-pagecache
> approach.  But in general I stand by "Collapse is significantly more
> challenging".  Forgive me if that amounts to saying "Hey, here's a
> more complicated way to do it.  Ooh, this way is more complicated."
> The concept of moving a page from one file offset to another is new,
> and can be expected to pose new difficulties.

Collapse might be challenging for tmpfs, but it's relatively trivial
for block based filesystems because we have an independent backing
store and so we don't need to move cached data around.

> > > Emphasis on "might": I expect it's impossible, given your current
> > > approach, but something to be on guard against is unmap_mapping_range()
> > > failing to find and unmap a pte, because the page is mapped at the
> > > "wrong" place in the vma, resulting in BUG_ON(page_mapped(page))
> > > in __delete_from_page_cache().
> > 
> > Unmapping occurs before anything is shifted. And even if a fault
> > does occur before the file size changes at the end of a collapse
> > range operation (via the truncate path), the page in the page cache
> > won't be moved about so I don't see how the above problem could
> > occur. All that will happen is that you get the wrong data in the
> > mmap()d page, just like you will with hole_punch issues.
> 
> I think you're probably right.  I expect that attempting to fault
> a page back from disk while collapse is shifting down, will hit a
> mutex and wait.  But that's liable to differ from filesystem to
> filesystem, so I'm not certain.

Well, no, we can't do that entirely atomically because of mmap_sem
inversions. Individual extent shifts, yes, but not against the
operation as a whole. i.e. fallocate needs to serialise against IO
operations, but we can't serialise IO operations against page faults
because of mmap_sem inversions.

> > > But your case is different: collapse is much closer to truncation,
> > > and if you do not unmap the private COW'ed pages, then pages left
> > > behind beyond 

Re: [PATCH net] vhost: net: switch to use data copy if pending DMAs exceed the limit

2014-02-25 Thread Qin Chuanyu

On 2014/2/26 13:53, Jason Wang wrote:

On 02/25/2014 09:57 PM, Michael S. Tsirkin wrote:

On Tue, Feb 25, 2014 at 02:53:58PM +0800, Jason Wang wrote:

We used to stop the handling of tx when the number of pending DMAs
exceeds VHOST_MAX_PEND. This is used to reduce the memory occupation
of both host and guest. But it was too aggressive in some cases, since
any delay or blocking of a single packet may delay or block the guest
transmission. Consider the following setup:

 +-----+        +-----+
 | VM1 |        | VM2 |
 +--+--+        +--+--+
    |              |
 +--+--+        +--+--+
 | tap0|        | tap1|
 +--+--+        +--+--+
    |              |
 pfifo_fast     htb(10Mbit/s)
    |              |
 +--+--------------+--+
 |       bridge       |
 +--+-----------------+
    |
 pfifo_fast
    |
 +-----+
 | eth0|(100Mbit/s)
 +-----+

- start two VMs and connect them to a bridge
- add a physical card (100Mbit/s) to that bridge
- setup htb on tap1 and limit its throughput to 10Mbit/s
- run two netperfs at the same time, one from VM1 to VM2, another from
  VM1 to an external host through eth0.
- the result shows that not only was the VM1 to VM2 traffic throttled, but
  the VM1 to external host traffic through eth0 was also throttled somehow.

This is because the delay added by htb may delay the completion of DMAs
and cause the pending DMAs for tap0 to exceed the limit (VHOST_MAX_PEND).
In this case vhost stops handling tx requests until htb sends some packets.
The problem here is that all packet transmission is blocked, even traffic
that does not go to VM2.

We can solve this issue by relaxing it a little bit: switching to use
data copy instead of stopping tx when the number of pending DMAs
exceed the VHOST_MAX_PEND. This is safe because:

- The number of pending DMAs were still limited by VHOST_MAX_PEND
- The out of order completion during mode switch can make sure that
   most of the tx buffers were freed in time in guest.

So even if about 50% packets were delayed in zero-copy case, vhost
could continue to do the transmission through data copy in this case.

Test result:

Before this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 40Mbit/s

After this patch:
VM1 to VM2 throughput is 9.3Mbit/s
Vm1 to External throughput is 93Mbit/s

Would like to see CPU utilization #s as well.



Will measure this.

Simple performance test on 40gbe shows no obvious changes in
throughput after this patch.

The patch only solves this issue for unlimited sndbuf. We still need a
solution for limited sndbuf.

Cc: Michael S. Tsirkin
Cc: Qin Chuanyu
Signed-off-by: Jason Wang

I think this needs some thought.

In particular I think this works because VHOST_MAX_PEND
is much smaller than the ring size.
Shouldn't max_pend then be tied to the ring size if it's small?



Yes it should. I just reuse the VHOST_MAX_PEND since it was there for a
long time.

Another question is about stopping vhost:
ATM it's waiting for skbs to complete.
Should we maybe hunt down skbs queued and destroy them
instead?
I think this happens when a device is removed.

Thoughts?



Agree, vhost net removal should not be blocked by an skb. But since the
skbs could be queued in many places, just destroying them may need extra locks.

Haven't thought about this deeply, but another possible solution is to rcuify
destructor_arg and assign it to NULL during vhost_net removal.


Xen handles it with a timer: for those skbs which have been pending for a
while, netback exchanges the pages of the zero-copy skb with dom0's pages.

But there is still a race between another host process handling the skb
and netback exchanging its page. (This problem has been proved by testing.)

And Xen hasn't solved this problem yet, because solving it completely would
require a page lock, which would be complex and expensive.

Rcuifying destructor_arg and assigning it to NULL couldn't solve the problem
of releasing a page that is still held by another host process.

The key problem is how to release the memory of a zero-copy skb while it is
still held elsewhere.

---
  drivers/vhost/net.c | 17 +++--
  1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index a0fa5de..3e96e47 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -345,7 +345,7 @@ static void handle_tx(struct vhost_net *net)
  .msg_flags = MSG_DONTWAIT,
  };
  size_t len, total_len = 0;
-int err;
+int err, num_pends;
  size_t hdr_size;
  struct socket *sock;
  struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -366,13 +366,6 @@ static void handle_tx(struct vhost_net *net)
  if (zcopy)
  vhost_zerocopy_signal_used(net, vq);

-/* If more outstanding DMAs, queue the work.
- * Handle upend_idx wrap around
- */
-if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
-  

Re: [PATCH] kallsyms: fix absolute addresses for kASLR

2014-02-25 Thread Kees Cook
On Mon, Feb 24, 2014 at 5:29 PM, Rusty Russell  wrote:
> Kees Cook  writes:
>> From: Andy Honig 
>>
>> Currently symbols that are absolute addresses are incorrectly
>> displayed in /proc/kallsyms if the kernel is loaded with kASLR.
>>
>> The problem was that the scripts/kallsyms.c file which generates
>> the array of symbol names and addresses uses an relocatable value
>> for all symbols, even absolute symbols.  This patch fixes that.
>
> Hi Andy, Kees,
>
> This is not a good patch.  See the commit where this was
> introduced:
>
> [PATCH] relocatable kernel: Fix kallsyms on avr32 after relocatable kernel 
> changes
>
> o On some platforms like avr32, section init comes before .text and
>   not necessarily a symbol's relative position w.r.t _text is positive.
>   In such cases assembler detects the overflow and emits warning. This
>   patch fixes it.
>
> Did you just break avr32?
>
> And absolute symbols are supposed to be handled in the other branch:
>
> for (i = 0; i < table_cnt; i++) {
> if (toupper(table[i].sym[0]) != 'A') {
> if (_text <= table[i].addr)
> printf("\tPTR\t_text + %#llx\n",
> table[i].addr - _text);
> else
> printf("\tPTR\t_text - %#llx\n",
> _text - table[i].addr);
> } else {
> printf("\tPTR\t%#llx\n", table[i].addr);
> }
> }
>
> __per_cpu_start is not an absolute symbol anyway.
>
> You need to fix this properly.
> Rusty.

Hm, yeah, it seems we need another class of variable. The per_cpu
stuff is technically relative, but it's not relocated, since it's not
relative to the text location. We'll see how to do this more sanely.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done

2014-02-25 Thread Viresh Kumar
On 26 February 2014 09:08, Saravana Kannan  wrote:
> The existing code sets the per CPU policy to a non-NULL value before all
> the steps performed during the hotplug online path are done. Specifically,
> this is done before the policy min/max, governors, etc are initialized for
> the policy.  This in turn means that calls to cpufreq_cpu_get() return a
> non-NULL policy before the policy/CPU is ready to be used.

First two patches are fine, but I would still say take the three patches
I posted instead of this. Reasoning already given in other mails.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

2014-02-25 Thread Viresh Kumar
On 26 February 2014 07:18, Saravana Kannan  wrote:
> On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote:

>> And is "fully initialized" actually well defined?
>
> The point in the add-dev/hotplug path after which we will no longer change
> policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU /
> CPUFREQ_NOTIFY notifiers.

Okay..

> Pretty much the end of __cpufreq_add_dev() so that it's after:
> - cpufreq_init_policy()
> - And the update of user_policy fields that happens after this init call

No. In that case it can be considered initialized before cpufreq_init_policy().
As we do send CPUFREQ_NOTIFY after that from cpufreq_init_policy()->
cpufreq_set_policy().

There are two types of fields within policy, some are very basic: cpu/min/max/
affected_cpus/related_cpus

some are advanced: sysfs/governors/..

And as a rule you have to get policy->rwsem lock before accessing policy
members. We might not have followed it very well for small things like cpu.

So if you are doing anything beyond that, please take the lock; it is
already taken in cpufreq_update_policy().
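
In other words, anything touching those members should follow the usual
pattern (just an illustration of the rule, assuming the current policy->rwsem):

	/* readers of policy fields */
	down_read(&policy->rwsem);
	cur = policy->cur;
	up_read(&policy->rwsem);

	/* writers of min/max/governor/... */
	down_write(&policy->rwsem);
	policy->min = new_min;
	up_write(&policy->rwsem);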

With my latest patchset that I sent yesterday, locking is improved and now
a policy will be usable only after the rwsem is released. And that should be
fine. So making it available in the per-cpu variable after all the necessary
fields are filled looks fine to me, and I don't think we need to move it
after the call to cpufreq_init_policy() (maybe a better name for this
function is required).

> Ok, here's some pseudo code to explain it better:
>
> Something like, replace all calls to cpufreq_driver->get with
> __cpufreq_driver_get() with the fn being something like:
>
> unsigned int __cpufreq_driver_get(cpufreq_policy *policy)
> {
>   if (policy->clk)
>  return clk_get_rate(policy->clk) / 1000;
>   else
>  return cpufreq_driver->get(policy->cpu);

This part may still use cpufreq_cpu_get().

> }

Drivers are free to have their implementation of ->get() even
if they have a valid policy->clk field..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_pmu_start WARN_ON.

2014-02-25 Thread Vince Weaver
On Mon, 24 Feb 2014, Peter Zijlstra wrote:

> On Fri, Feb 21, 2014 at 03:18:38PM -0500, Vince Weaver wrote:
> > I've applied the patch and have been unable to trigger the warning with 
> > either my testcase or a few hours of fuzzing.
> 
> Yay.
> 
> > My only comment on the patch is it could always use some comments.
> > 
> > The perf_event code is really hard to follow as is, without adding
> > more uncommented special cases.
> 
> Does the below help a bit? Or is there anywhere in particular you want
> more comments?

yes, every little bit helps.

While chasing these fuzzer-related bugs I end up deep in the perf_event
code and many of the routines have no comments at all.  Eventually I have 
to dig out the K&R book to figure out the order of precedence of ++ prefix 
operators, have at least 2-3 different files open in editors, plus a bunch 
of firefox tabs open to http://lxr.free-electrons.com, and even then I 
misunderstand the code a lot.

Vince

> 
> ---
> Subject: perf, x86: Add a few more comments
> From: Peter Zijlstra 
> Date: Mon Feb 24 12:26:21 CET 2014
> 
> Add a few comments on the ->add(), ->del() and ->*_txn()
> implementation.
> 
> Requested-by: Vince Weaver 
> Signed-off-by: Peter Zijlstra 
> ---
>  arch/x86/kernel/cpu/perf_event.c |   49 
> +++
>  arch/x86/kernel/cpu/perf_event.h |8 +++---
>  2 files changed, 39 insertions(+), 18 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -892,7 +892,6 @@ static void x86_pmu_enable(struct pmu *p
>* hw_perf_group_sched_in() or x86_pmu_enable()
>*
>* step1: save events moving to new counters
> -  * step2: reprogram moved events into new counters
>*/
>   for (i = 0; i < n_running; i++) {
>   event = cpuc->event_list[i];
> @@ -918,6 +917,9 @@ static void x86_pmu_enable(struct pmu *p
>   x86_pmu_stop(event, PERF_EF_UPDATE);
>   }
>  
> + /*
> +  * step2: reprogram moved events into new counters
> +  */
>   for (i = 0; i < cpuc->n_events; i++) {
>   event = cpuc->event_list[i];
>   hwc = >hw;
> @@ -1043,7 +1045,7 @@ static int x86_pmu_add(struct perf_event
>   /*
>* If group events scheduling transaction was started,
>* skip the schedulability test here, it will be performed
> -  * at commit time (->commit_txn) as a whole
> +  * at commit time (->commit_txn) as a whole.
>*/
>   if (cpuc->group_flag & PERF_EVENT_TXN)
>   goto done_collect;
> @@ -1058,6 +1060,10 @@ static int x86_pmu_add(struct perf_event
>   memcpy(cpuc->assign, assign, n*sizeof(int));
>  
>  done_collect:
> + /*
> +  * Commit the collect_events() state. See x86_pmu_del() and
> +  * x86_pmu_*_txn().
> +  */
>   cpuc->n_events = n;
>   cpuc->n_added += n - n0;
>   cpuc->n_txn += n - n0;
> @@ -1183,28 +1189,38 @@ static void x86_pmu_del(struct perf_even
>* If we're called during a txn, we don't need to do anything.
>* The events never got scheduled and ->cancel_txn will truncate
>* the event_list.
> +  *
> +  * XXX assumes any ->del() called during a TXN will only be on
> +  * an event added during that same TXN.
>*/
>   if (cpuc->group_flag & PERF_EVENT_TXN)
>   return;
>  
> + /*
> +  * Not a TXN, therefore cleanup properly.
> +  */
>   x86_pmu_stop(event, PERF_EF_UPDATE);
>  
>   for (i = 0; i < cpuc->n_events; i++) {
> - if (event == cpuc->event_list[i]) {
> -
> - if (i >= cpuc->n_events - cpuc->n_added)
> - --cpuc->n_added;
> + if (event == cpuc->event_list[i])
> + break;
> + }
>  
> - if (x86_pmu.put_event_constraints)
> - x86_pmu.put_event_constraints(cpuc, event);
> + if (WARN_ON_ONCE(i == cpuc->n_events)) /* called ->del() without 
> ->add() ? */
> + return;
>  
> - while (++i < cpuc->n_events)
> - cpuc->event_list[i-1] = cpuc->event_list[i];
> + /* If we have a newly added event; make sure to decrease n_added. */
> + if (i >= cpuc->n_events - cpuc->n_added)
> + --cpuc->n_added;
> +
> + if (x86_pmu.put_event_constraints)
> + x86_pmu.put_event_constraints(cpuc, event);
> +
> + /* Delete the array entry. */
> + while (++i < cpuc->n_events)
> + cpuc->event_list[i-1] = cpuc->event_list[i];
> + --cpuc->n_events;
>  
> - --cpuc->n_events;
> - break;
> - }
> - }
>   perf_event_update_userpage(event);
>  }
>  
> @@ -1598,7 +1614,8 @@ static void x86_pmu_cancel_txn(struct pm
>  {
>   

Re: [PATCH net] vhost: net: switch to use data copy if pending DMAs exceed the limit

2014-02-25 Thread Jason Wang

On 02/25/2014 09:57 PM, Michael S. Tsirkin wrote:

On Tue, Feb 25, 2014 at 02:53:58PM +0800, Jason Wang wrote:

We used to stop the handling of tx when the number of pending DMAs
exceeds VHOST_MAX_PEND. This is used to reduce the memory occupation
of both host and guest. But it was too aggressive in some cases, since
any delay or blocking of a single packet may delay or block the guest
transmission. Consider the following setup:

          +-----+            +-----+
          | VM1 |            | VM2 |
          +--+--+            +--+--+
             |                  |
          +--+--+            +--+--+
          | tap0|            | tap1|
          +--+--+            +--+--+
             |                  |
         pfifo_fast         htb(10Mbit/s)
             |                  |
          +--+------------------+--+
          |         bridge         |
          +-----------+------------+
                      |
                 pfifo_fast
                      |
                  +---+----+
                  |  eth0  |(100Mbit/s)
                  +--------+

- start two VMs and connect them to a bridge
- add an physical card (100Mbit/s) to that bridge
- setup htb on tap1 and limit its throughput to 10Mbit/s
- run two netperfs at the same time, one from VM1 to VM2, another from
   VM1 to an external host through eth0.
- the result shows that not only was the VM1 to VM2 traffic throttled,
   but the VM1 to external host traffic through eth0 was also throttled somehow.

This is because the delay added by htb may delay the completion of the
DMAs and cause the pending DMAs for tap0 to exceed the limit
(VHOST_MAX_PEND). In this case vhost stops handling tx requests until
htb sends some packets. The problem here is that all packet
transmission is blocked, even for packets that do not go to VM2.

We can solve this issue by relaxing it a little bit: switch to data
copy instead of stopping tx when the number of pending DMAs exceeds
VHOST_MAX_PEND. This is safe because:

- The number of pending DMAs is still limited by VHOST_MAX_PEND
- Out of order completion during the mode switch makes sure that
   most of the tx buffers are freed in time in the guest.

So even if about 50% of packets are delayed in the zero-copy case,
vhost can continue the transmission through data copy.
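
Roughly, the zerocopy decision then becomes (spelled out here only for
illustration; the exact expression is in the hunk below):

	num_pends = (nvq->upend_idx >= nvq->done_idx) ?
		    (nvq->upend_idx - nvq->done_idx) :
		    (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx);

	zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN &&
		     num_pends < VHOST_MAX_PEND &&
		     vhost_net_tx_select_zcopy(net);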

Test result:

Before this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 40Mbit/s

After this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 93Mbit/s

Would like to see CPU utilization #s as well.



Will measure this.

Simple performance test on 40gbe shows no obvious changes in
throughput after this patch.

The patch only solve this issue when unlimited sndbuf. We still need a
solution for limited sndbuf.

Cc: Michael S. Tsirkin
Cc: Qin Chuanyu
Signed-off-by: Jason Wang

I think this needs some thought.

In particular I think this works because VHOST_MAX_PEND
is much smaller than the ring size.
Shouldn't max_pend then be tied to the ring size if it's small?



Yes, it should. I just reused VHOST_MAX_PEND since it has been there for a
long time.
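
If it were tied to the ring size, even something as simple as this might do
(illustrative only, the "/ 2" is an arbitrary choice):

	/* cap in-flight zerocopy DMAs at a fraction of the ring instead
	 * of a fixed constant
	 */
	max_pend = min_t(unsigned int, vq->num / 2, VHOST_MAX_PEND);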

Another question is about stopping vhost:
ATM it's waiting for skbs to complete.
Should we maybe hunt down skbs queued and destroy them
instead?
I think this happens when a device is removed.

Thoughts?



Agree, vhost net removal should not be blocked by a skb. But since the
skbs could be queued in many places, just destroying them may need extra locks.


Haven't thought about this deeply, but another possible solution is to rcuify
destructor_arg and assign it to NULL during vhost_net removal.

---
  drivers/vhost/net.c | 17 +++--
  1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index a0fa5de..3e96e47 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -345,7 +345,7 @@ static void handle_tx(struct vhost_net *net)
.msg_flags = MSG_DONTWAIT,
};
size_t len, total_len = 0;
-   int err;
+   int err, num_pends;
size_t hdr_size;
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -366,13 +366,6 @@ static void handle_tx(struct vhost_net *net)
if (zcopy)
vhost_zerocopy_signal_used(net, vq);

-   /* If more outstanding DMAs, queue the work.
-* Handle upend_idx wrap around
-*/
-   if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
- % UIO_MAXIOV == nvq->done_idx))
-   break;
-
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 ARRAY_SIZE(vq->iov),
&out, &in,
@@ -405,9 +398,13 @@ static void handle_tx(struct vhost_net *net)
break;
}

+   num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
+   (nvq->upend_idx - nvq->done_idx) :
+   (nvq->upend_idx + UIO_MAXIOV -
+nvq->done_idx);
+
zcopy_used = zcopy && len >= 

Re: [PATCH v4 4/6] arm: add early_ioremap support

2014-02-25 Thread Rob Herring
On Wed, Feb 12, 2014 at 2:56 PM, Mark Salter  wrote:
> This patch uses the generic early_ioremap code to implement
> early_ioremap for ARM. The ARM-specific bits come mostly from
> an earlier patch from Leif Lindholm 
> here:

This doesn't actually work for me. The PTE flags are not correct and
accesses to a device fault. See below.

>
>   https://lkml.org/lkml/2013/10/3/279
>
> Signed-off-by: Mark Salter 
> Tested-by: Leif Lindholm 
> Acked-by: Catalin Marinas 
> ---
>  arch/arm/Kconfig  | 10 +
>  arch/arm/include/asm/Kbuild   |  1 +
>  arch/arm/include/asm/fixmap.h | 20 ++
>  arch/arm/include/asm/io.h |  1 +
>  arch/arm/kernel/setup.c   |  2 +
>  arch/arm/mm/Makefile  |  4 ++
>  arch/arm/mm/early_ioremap.c   | 93 
> +++
>  arch/arm/mm/mmu.c |  2 +
>  8 files changed, 133 insertions(+)
>  create mode 100644 arch/arm/mm/early_ioremap.c
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index e254198..7a35ef6 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1874,6 +1874,16 @@ config UACCESS_WITH_MEMCPY
>   However, if the CPU data cache is using a write-allocate mode,
>   this option is unlikely to provide any performance gain.
>
> +config EARLY_IOREMAP
> +   bool "Provide early_ioremap() support for kernel initialization"
> +   select GENERIC_EARLY_IOREMAP
> +   help
> + Provide a mechanism for kernel initialisation code to temporarily
> + map, in a highmem-agnostic way, memory pages in before ioremap()
> + and friends are available (before paging_init() has run). It uses
> + the same virtual memory range as kmap so all early mappings must
> + be unmapped before paging_init() is called.
> +
>  config SECCOMP
> bool
> prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index 3278afe..6fcfd7d 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -4,6 +4,7 @@ generic-y += auxvec.h
>  generic-y += bitsperlong.h
>  generic-y += cputime.h
>  generic-y += current.h
> +generic-y += early_ioremap.h
>  generic-y += emergency-restart.h
>  generic-y += errno.h
>  generic-y += exec.h
> diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
> index 68ea615..ff8fa3e 100644
> --- a/arch/arm/include/asm/fixmap.h
> +++ b/arch/arm/include/asm/fixmap.h
> @@ -21,8 +21,28 @@ enum fixed_addresses {
> FIX_KMAP_BEGIN,
> FIX_KMAP_END = (FIXADDR_TOP - FIXADDR_START) >> PAGE_SHIFT,
> __end_of_fixed_addresses
> +/*
> + * 224 temporary boot-time mappings, used by early_ioremap(),
> + * before ioremap() is functional.
> + *
> + * (P)re-using the FIXADDR region, which is used for highmem
> + * later on, and statically aligned to 1MB.
> + */
> +#define NR_FIX_BTMAPS  32
> +#define FIX_BTMAPS_SLOTS   7
> +#define TOTAL_FIX_BTMAPS   (NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS)
> +#define FIX_BTMAP_END  FIX_KMAP_BEGIN
> +#define FIX_BTMAP_BEGIN(FIX_BTMAP_END + TOTAL_FIX_BTMAPS - 1)

Why the different logic from arm64? Specifically, it doesn't make
adding a permanent mapping simple.

>  };
>
> +#define FIXMAP_PAGE_COMMON (L_PTE_YOUNG | L_PTE_PRESENT | L_PTE_XN)
> +
> +#define FIXMAP_PAGE_NORMAL (FIXMAP_PAGE_COMMON | L_PTE_MT_WRITEBACK)
> +#define FIXMAP_PAGE_IO(FIXMAP_PAGE_COMMON | L_PTE_MT_DEV_NONSHARED)

This should be L_PTE_MT_DEV_SHARED and also needs L_PTE_DIRTY and
L_PTE_SHARED as we want this to match MT_DEVICE.

These should also be wrapped with __pgprot().
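
i.e. something along these lines (untested, just spelling out the flags
named above):

#define FIXMAP_PAGE_IO	__pgprot(L_PTE_YOUNG | L_PTE_PRESENT | L_PTE_XN | \
				 L_PTE_DIRTY | L_PTE_SHARED | \
				 L_PTE_MT_DEV_SHARED)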

> +
> +extern void __early_set_fixmap(enum fixed_addresses idx,
> +  phys_addr_t phys, pgprot_t flags);
> +
>  #include 
>
>  #endif
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index 8aa4cca..637e0cd 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /*
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index b0df976..9c8b751 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -880,6 +881,7 @@ void __init setup_arch(char **cmdline_p)
> const struct machine_desc *mdesc;
>
> setup_processor();
> +   early_ioremap_init();
> mdesc = setup_machine_fdt(__atags_pointer);
> if (!mdesc)
> mdesc = setup_machine_tags(__atags_pointer, 
> __machine_arch_type);
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index 7f39ce2..501af98 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -12,6 +12,10 @@ ifneq ($(CONFIG_MMU),y)
>  obj-y  += nommu.o
>  endif
>
> +ifeq ($(CONFIG_MMU),y)
> 

Re: [RFC PATCH] Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

2014-02-25 Thread Rusty Russell
Steven Rostedt  writes:
> On Mon, 24 Feb 2014 16:55:36 + (UTC)
> Mathieu Desnoyers  wrote:
>
>> - Original Message -
>> > From: "Steven Rostedt" 
>> > To: "Mathieu Desnoyers" 
>> > Cc: "Ingo Molnar" , linux-kernel@vger.kernel.org, "Ingo 
>> > Molnar" , "Thomas
>> > Gleixner" , "Rusty Russell" , 
>> > "David Howells" ,
>> > "Greg Kroah-Hartman" 
>> > Sent: Monday, February 24, 2014 10:54:54 AM
>> > Subject: Re: [RFC PATCH] Fix: module signature vs tracepoints: add new 
>> > TAINT_UNSIGNED_MODULE
>> > 
>> [...]
>> 
>> (keeping discussion for later, as I'm busy at a client site)
>>  
>> > For now, I'm going to push this in, and also mark it for stable.
>> 
>> Which patch or patches do you plan to pull, and which is marked for stable ?
>
> The one that I replied to. I can't pull the module patch unless I get
> an ack from Rusty.

Ah, I applied it in my tree.  Happy for you to take it though; here's
the variant with an Acked-by from me instead of Signed-off-by if you
want it:

===
From: Mathieu Desnoyers 
Subject: Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. This flag is assigned to the letter 'N', since all
the more obvious ones (e.g. 'S') were already taken.

Signed-off-by: Mathieu Desnoyers 
Nacked-by: Ingo Molnar 
CC: Steven Rostedt 
CC: Thomas Gleixner 
CC: David Howells 
CC: Greg Kroah-Hartman 
Acked-by: Rusty Russell 
---
 include/linux/kernel.h| 1 +
 include/trace/events/module.h | 3 ++-
 kernel/module.c   | 4 +++-
 kernel/panic.c| 1 +
 kernel/tracepoint.c   | 5 +++--
 5 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 196d1ea86df0..471090093c67 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -469,6 +469,7 @@ extern enum system_states {
 #define TAINT_CRAP 10
 #define TAINT_FIRMWARE_WORKAROUND  11
 #define TAINT_OOT_MODULE   12
+#define TAINT_UNSIGNED_MODULE  13
 
 extern const char hex_asc[];
 #define hex_asc_lo(x)  hex_asc[((x) & 0x0f)]
diff --git a/include/trace/events/module.h b/include/trace/events/module.h
index 161932737416..1788a02557f4 100644
--- a/include/trace/events/module.h
+++ b/include/trace/events/module.h
@@ -23,7 +23,8 @@ struct module;
 #define show_module_flags(flags) __print_flags(flags, "",  \
{ (1UL << TAINT_PROPRIETARY_MODULE),"P" },  \
{ (1UL << TAINT_FORCED_MODULE), "F" },  \
-   { (1UL << TAINT_CRAP),  "C" })
+   { (1UL << TAINT_CRAP),  "C" },  \
+   { (1UL << TAINT_UNSIGNED_MODULE),   "N" })
 
 TRACE_EVENT(module_load,
 
diff --git a/kernel/module.c b/kernel/module.c
index efa1e6031950..eadf1f1651fb 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1013,6 +1013,8 @@ static size_t module_flags_taint(struct module *mod, char 
*buf)
buf[l++] = 'F';
if (mod->taints & (1 << TAINT_CRAP))
buf[l++] = 'C';
+   if (mod->taints & (1 << TAINT_UNSIGNED_MODULE))
+   buf[l++] = 'N';
/*
 * TAINT_FORCED_RMMOD: could be added.
 * TAINT_UNSAFE_SMP, TAINT_MACHINE_CHECK, TAINT_BAD_PAGE don't
@@ -3214,7 +3216,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
pr_notice_once("%s: module verification failed: signature "
   "and/or  required key missing - tainting "
   "kernel\n", mod->name);
-   add_taint_module(mod, TAINT_FORCED_MODULE, LOCKDEP_STILL_OK);
+   add_taint_module(mod, TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK);
}
 #endif
 
diff --git 

Re: [RFC PATCH] Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

2014-02-25 Thread Rusty Russell
Johannes Berg  writes:
> Going on a tangent here - our use case is using backported upstream
> kernel modules (https://backports.wiki.kernel.org/) for delivering a
> driver to people who decided that they absolutely need to run with some
> random kernel (e.g. 3.10) but we don't yet support all the driver
> features they want/need in the kernel they picked.

Ah, a user!  See, that's not the "I forgot to sign my modules" case the
others were complaining about.

> We push our code upstream as soon as we can and typically only diverge
> from upstream by a few patches, so saying things like "crap" or "felony
> law breaker" about out-of-tree modules in general makes me furious.

Appreciated and understood.

I have applied Mathieu's patch to my pending tree, with Ingo's Nack
recorded.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/10] fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

2014-02-25 Thread Hugh Dickins
On Wed, 26 Feb 2014, Dave Chinner wrote:
> On Tue, Feb 25, 2014 at 03:41:20PM -0800, Hugh Dickins wrote:
> > On Mon, 24 Feb 2014, Dave Chinner wrote:
> > > On Sat, Feb 22, 2014 at 09:06:25AM -0500, Theodore Ts'o wrote:
> > > > On Wed, Feb 19, 2014 at 01:37:43AM +0900, Namjae Jeon wrote:
> > > > > + /*
> > > > > +  * There is no need to overlap collapse range with EOF, in 
> > > > > which case
> > > > > +  * it is effectively a truncate operation
> > > > > +  */
> > > > > + if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
> > > > > + (offset + len >= i_size_read(inode)))
> > > > > + return -EINVAL;
> > > > > +
> > > > 
> > > > I wonder if we should just translate a collapse range that is
> > > > equivalent to a truncate operation to, in fact, be a truncate
> > > > operation?
> > > 
> > > Trying to collapse a range that extends beyond EOF, IMO, is likely
> > > to only happen if the DVR/NLE application is buggy. Hence I think
> > > that telling the application it is doing something that is likely to
> > > be wrong is better than silently truncating the file
> > 
> > I do agree with Ted on this point.  This is not an xfs ioctl added
> > for one DVR/NLE application, it's a mode of a Linux system call.
> > 
> > We do not usually reject with an error when one system call happens
> > to ask for something which can already be accomplished another way;
> > nor nanny our callers.
> > 
> > It seems natural to me that COLLAPSE_RANGE should support beyond EOF;
> > unless that adds significantly to implementation difficulties?
> 
> Yes, it does add to the implementation complexity significantly - it
> adds data security issues that don't exist with the current API.
> 
> That is, Filesystems can have uninitialised blocks beyond EOF so
> if we allow COLLAPSE_RANGE to shift them down within EOF, we now
> have to ensure they are properly zeroed or marked as unwritten.
> 
> It also makes implementations more difficult. For example, XFS can
> also have in-memory delayed allocation extents beyond EOF, and they
> can't be brought into the range < EOF without either:
> 
>   a) inserting zeroed pages with appropriately set up
>   and mapped bufferheads into the page cache for the range
>   that sits within EOF; or
>   b) truncating the delalloc extents beyond EOF before the
>   move
> 
> So, really, the moment you go beyond EOF filesystems have to do
> quite a bit more validation and IO in the context of the system
> call. It no longer becomes a pure extent manipulation offload - it
> becomes a data security problem.

Those sound like problems you would already have solved for a
simple extending truncate.

But I wasn't really thinking of the offset > i_size case, just the
offset + len >= i_size case: which would end with i_size at offset,
and the areas you're worried about still beyond EOF - or am I confused?

> 
> And, indeed, the specification that we are working to is that the
> applications that want to collapse the range of a file are using
> this function instead of read/memcpy/write/truncate, which by
> definition means they cannot shift ranges of the file beyond EOF
> into the new file.
> 
> So IMO the API defines the functionality as required by the
> applications that require it and *no more*. If you need some
> different behaviour - we can add it via additional flags in future
> when you have an application that requires it. 

You still seem to be thinking in terms of xfs ioctl hacks,
rather than fully scoped Linux system calls.

But it probably doesn't matter too much, if we start with an error,
and later correct that to a full implementation - an xfstest or LTP
test which expected failure will fail once it sees success, but no
users harmed in the making of this change.

> 
> > Actually, is it even correct to fail at EOF?  What if fallocation
> > with FALLOC_FL_KEEP_SIZE was used earlier, to allocate beyond EOF:
> > shouldn't it be possible to shift that allocation down, along with
> > the EOF, rather than leave it behind as a stranded island?
> 
> It does get shifted down - it just remains beyond EOF, just like it
> was before the operation. And that is part of the specification of
> COLLAPSE_RANGE - it was done so that preallocation (physical or
> speculative delayed allocation) beyond EOF to avoid fragmentation as
> the DVR continues to write is not screwed up by chopping out earlier
> parts of the file.

Yes, I was confused when I pictured a stranded island there.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-25 Thread Paul E. McKenney
On Tue, Feb 25, 2014 at 08:32:38PM -0700, Jeff Law wrote:
> On 02/25/14 17:15, Paul E. McKenney wrote:
> >>I have for the last several years been 100% convinced that the Intel
> >>memory ordering is the right thing, and that people who like weak
> >>memory ordering are wrong and should try to avoid reproducing if at
> >>all possible. But given that we have memory orderings like power and
> >>ARM, I don't actually see a sane way to get a good strong ordering.
> >>You can teach compilers about cases like the above when they actually
> >>see all the code and they could poison the value chain etc. But it
> >>would be fairly painful, and once you cross object files (or even just
> >>functions in the same compilation unit, for that matter), it goes from
> >>painful to just "ridiculously not worth it".
> >
> >And I have indeed seen a post or two from you favoring stronger memory
> >ordering over the past few years.  ;-)
> I couldn't agree more.
> 
> >
> >Are ARM and Power really the bad boys here?  Or are they instead playing
> >the role of the canary in the coal mine?
> That's a question I've been struggling with recently as well.  I
> suspect they (arm, power) are going to be the outliers rather than
> the canary. While the weaker model may give them some advantages WRT
> scalability, I don't think it'll ultimately be enough to overcome
> the difficulty in writing correct low level code for them.
> 
> Regardless, they're here and we have to deal with them.

Agreed...

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] zram: support REQ_DISCARD

2014-02-25 Thread Joonsoo Kim
zram is a ram-based block device and can be used as the backing device of a
filesystem. When a filesystem deletes a file, it normally doesn't touch the
data blocks of that file; it just updates the file's metadata. This behavior
is fine on a disk-based block device, but is a problem on a ram-based block
device, since we can't free the memory used for the data blocks. To overcome
this disadvantage, there is the REQ_DISCARD functionality. If the block
device supports REQ_DISCARD and the filesystem is mounted with the discard
option, the filesystem sends REQ_DISCARD to the block device whenever data
blocks are discarded. All we have to do is handle this request.

This patch sets QUEUE_FLAG_DISCARD on the queue and handles REQ_DISCARD
requests. With it, we can free the memory used by zram when the
corresponding blocks are no longer used.

v2: handle unaligned case commented by Jerome

Signed-off-by: Joonsoo Kim 

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5ec61be..5364c1e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -501,6 +501,36 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec 
*bvec, u32 index,
return ret;
 }
 
+static void zram_bio_discard(struct zram *zram, struct bio *bio)
+{
+   u32 index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+   size_t n = bio->bi_iter.bi_size;
+   size_t misalign;
+
+   /*
+* On some arch, logical block (4096) aligned request couldn't be
+* aligned to PAGE_SIZE, since their PAGE_SIZE aren't 4096.
+* Therefore we should handle this misaligned case here.
+*/
+   misalign = (bio->bi_iter.bi_sector &
+   (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
+   if (misalign) {
+   if (n < misalign)
+   return;
+
+   n -= misalign;
+   index++;
+   }
+
+   while (n >= PAGE_SIZE) {
+   write_lock(&zram->meta->tb_lock);
+   zram_free_page(zram, index);
+   write_unlock(&zram->meta->tb_lock);
+   index++;
+   n -= PAGE_SIZE;
+   }
+}
+
 static void zram_reset_device(struct zram *zram, bool reset_capacity)
 {
size_t index;
@@ -618,6 +648,12 @@ static void __zram_make_request(struct zram *zram, struct 
bio *bio)
struct bio_vec bvec;
struct bvec_iter iter;
 
+   if (unlikely(bio->bi_rw & REQ_DISCARD)) {
+   zram_bio_discard(zram, bio);
+   bio_endio(bio, 0);
+   return;
+   }
+
index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
offset = (bio->bi_iter.bi_sector &
  (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
@@ -784,6 +820,10 @@ static int create_device(struct zram *zram, int device_id)
ZRAM_LOGICAL_BLOCK_SIZE);
blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
+   zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
+   zram->disk->queue->limits.max_discard_sectors = UINT_MAX;
+   zram->disk->queue->limits.discard_zeroes_data = 1;
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->disk->queue);
 
add_disk(zram->disk);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-25 Thread Paul E. McKenney
On Tue, Feb 25, 2014 at 10:06:53PM -0500, George Spelvin wrote:
>  wrote:
> >  wrote:
> >> I have for the last several years been 100% convinced that the Intel
> >> memory ordering is the right thing, and that people who like weak
> >> memory ordering are wrong and should try to avoid reproducing if at
> >> all possible.
> >
> > Are ARM and Power really the bad boys here?  Or are they instead playing
> > the role of the canary in the coal mine?
> 
> To paraphrase some older threads, I think Linus's argument is that
> weak memory ordering is like branch delay slots: a way to make a simple
> implementation simpler, but ends up being no help to a more aggressive
> implementation.
> 
> Branch delay slots give a one-cycle bonus to in-order cores, but
> once you go superscalar and add branch prediction, they stop helping,
> and once you go full out of order, they're just an annoyance.
> 
> Likewise, I can see the point that weak ordering can help make a simple
> cache interface simpler, but once you start doing speculative loads,
> you've already bought and paid for all the hardware you need to do
> stronger coherency.
> 
> Another thing that requires all the strong-coherency machinery is
> a high-performance implementation of the various memory barrier and
> synchronization operations.  Yes, a low-performance (drain the pipeline)
> implementation is tolerable if the instructions aren't used frequently,
> but once you're really trying, it doesn't save complexity.
> 
> Once you're there, strong coherency always doesn't actually cost you any
> time outside of critical synchronization code, and it both simplifies
> and speeds up the tricky synchronization software.
> 
> 
> So PPC and ARM's weak ordering are not the direction the future is going.
> Rather, weak ordering is something that's only useful in a limited
> technology window, which is rapidly passing.

That does indeed appear to be Intel's story.  Might well be correct.
Time will tell.

> If you can find someone in IBM who's worked on the Z series cache
> coherency (extremely strong ordering), they probably have some useful
> insights.  The big question is if strong ordering, once you've accepted
> the implementation complexity and area, actually costs anything in
> execution time.  If there's an unavoidable cost which weak ordering saves,
> that's significant.

There has been a lot of ink spilled on this argument.  ;-)

PPC has much larger CPU counts than does the mainframe.  On the other
hand, there are large x86 systems.  Some claim that there are differences
in latency due to the different approaches, and there could be a long
argument about whether all this is inherent in the memory ordering or
whether it is due to implementation issues.

I don't claim to know the answer.  I do know that ARM and PPC are
here now, and that I need to deal with them.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

2014-02-25 Thread Viresh Kumar
On 26 February 2014 02:41, Saravana Kannan  wrote:
> On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:

> I think there's been a misunderstanding of what I'm proposing. The reference
> to policy->clk is only to get rid of the dependency that
> cpufreq_generic_get() has on the per cpu policy variable being filled. You
> can do that by just removing calls to get _IF_ clk is filled in.

cpufreq_cpu_get() can be called by other drivers as well, which may not have
a clock interface implemented. We can't stop them from calling it.

> I'll look at the patches later today and respond. Although, it would have
> been nice if you had helped me fix any issues with my patch rather than
> coming up with new ones. Kinda removes the motivation for submitting patches upstream.

Sorry if I demotivated you at all :)

I just wanted to get to a quick fix and wasn't interested in getting
my patch count up. Though that isn't always bad :)

I sent my patches as they were very different then your approach. Obviously, I
wouldn't have done this otherwise :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] cpufreq: Return error if ->get() failed in cpufreq_update_policy()

2014-02-25 Thread Viresh Kumar
On 26 February 2014 03:59, Rafael J. Wysocki  wrote:
> Yes, what exactly do we need it for in the core?

It's probably there to make things faster. We cache the value so that we
don't go to the hardware to read/calculate it again. Isn't it?

And we need to know the current freq on many occasions. One of those is that
many drivers need to know the relation between the current and new freq before
they can make the change, as they might need to adjust voltage regulators
before or after the freq change. It is also used in our loops_per_jiffy
calculations.
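
For example, a driver's frequency-change path typically looks roughly like
this (purely illustrative; "reg", "new_uV" and the use of policy->clk are
placeholders, not any specific driver's code):

	/* scale the regulator up before raising the frequency ... */
	if (new_freq > old_freq)
		regulator_set_voltage(reg, new_uV, new_uV);

	clk_set_rate(policy->clk, new_freq * 1000);

	/* ... and scale it down only after lowering the frequency */
	if (new_freq < old_freq)
		regulator_set_voltage(reg, new_uV, new_uV);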
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 07/10] ASoC: blackfin: bf5xx-ad1836: Use snd_soc_dai_set_tdm_slot_xlate()

2014-02-25 Thread Xiubo Li
Use snd_soc_dai_set_tdm_slot_xlate instead of snd_soc_dai_set_tdm_slot.
This will use the default snd_soc_of_xlate_tdm_slot_mask to generate
the TDM slot TX/RX mask using the slot parameter.
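
For reference, the default mask generation amounts to enabling the first
"slots" slots, roughly like this (a sketch of the behaviour, not the exact
helper body):

	static void default_of_xlate_tdm_slot_mask(unsigned int slots,
						   unsigned int *tx_mask,
						   unsigned int *rx_mask)
	{
		/* e.g. slots = 8 gives 0xFF, matching the old explicit masks */
		*tx_mask = (1 << slots) - 1;
		*rx_mask = (1 << slots) - 1;
	}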

Signed-off-by: Xiubo Li 
---
 sound/soc/blackfin/bf5xx-ad1836.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/blackfin/bf5xx-ad1836.c 
b/sound/soc/blackfin/bf5xx-ad1836.c
index 8fcfc4e..b1908ce 100644
--- a/sound/soc/blackfin/bf5xx-ad1836.c
+++ b/sound/soc/blackfin/bf5xx-ad1836.c
@@ -44,7 +44,7 @@ static int bf5xx_ad1836_init(struct snd_soc_pcm_runtime *rtd)
if (ret < 0)
return ret;
 
-   ret = snd_soc_dai_set_tdm_slot(cpu_dai, 0xFF, 0xFF, 8, 32);
+   ret = snd_soc_dai_set_tdm_slot_xlate(cpu_dai, 8, 32);
if (ret < 0)
return ret;
 
-- 
1.8.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-25 Thread Paul E. McKenney
On Tue, Feb 25, 2014 at 05:47:03PM -0800, Linus Torvalds wrote:
> On Mon, Feb 24, 2014 at 10:00 PM, Paul E. McKenney
>  wrote:
> >
> > So let me see if I understand your reasoning.  My best guess is that it
> > goes something like this:
> >
> > 1.  The Linux kernel contains code that passes pointers from
> > rcu_dereference() through external functions.
> 
> No, actually, it's not so much Linux-specific at all.
> 
> I'm actually thinking about what I'd do as a compiler writer, and as a
> defender the "C is a high-level assembler" concept.
> 
> I love C. I'm a huge fan. I think it's a great language, and I think
> it's a great language not because of some theoretical issues, but
> because it is the only language around that actually maps fairly well
> to what machines really do.
> 
> And it's a *simple* language. Sure, it's not quite as simple as it
> used to be, but look at how thin the "K book" is. Which pretty much
> describes it - still.
> 
> That's the real strength of C, and why it's the only language serious
> people use for system programming.  Ignore C++ for a while (Jesus
> Xavier Christ, I've had to do C++ programming for subsurface), and
> just think about what makes _C_ a good language.

The last time I used C++ for a project was in 1990.  It was a lot smaller
then.

> I can look at C code, and I can understand what the code generation
> is, and what it will really *do*. And I think that's important.
> Abstractions that hide what the compiler will actually generate are
> bad abstractions.
> 
> And ok, so this is obviously Linux-specific in that it's generally
> only Linux where I really care about the code generation, but I do
> think it's a bigger issue too.
> 
> So I want C features to *map* to the hardware features they implement.
> The abstractions should match each other, not fight each other.

OK...

> > Actually, the fact that there are more potential optimizations than I can
> > think of is a big reason for my insistence on the carries-a-dependency
> > crap.  My lack of optimization omniscience makes me very nervous about
> > relying on there never ever being a reasonable way of computing a given
> > result without preserving the ordering.
> 
> But if I can give two clear examples that are basically identical from
> a syntactic standpoint, and one clearly can be trivially optimized to
> the point where the ordering guarantee goes away, and the other
> cannot, and you cannot describe the difference, then I think your
> description is seriously lacking.

In my defense, my plan was to constrain the compiler to retain the
ordering guarantee in either case.  Yes, I did notice that you find
that unacceptable.

> And I do *not* think the C language should be defined by how it can be
> described. Leave that to things like Haskell or LISP, where the goal
> is some kind of completeness of the language that is about the
> language, not about the machines it will run on.

I am with you up to the point that the fancy optimizers start kicking
in.  I don't know how to describe what the optimizers are and are not
permitted to do strictly in terms of the underlying hardware.

> >> So the code sequence I already mentioned is *not* ordered:
> >>
> >> Litmus test 1:
> >>
> >> p = atomic_read(pp, consume);
> >> if (p == &variable)
> >> return p->val;
> >>
> >>is *NOT* ordered, because the compiler can trivially turn this into
> >> "return variable.val", and break the data dependency.
> >
> > Right, given your model, the compiler is free to produce code that
> > doesn't order the load from pp against the load from p->val.
> 
> Yes. Note also that that is what existing compilers would actually do.
> 
> And they'd do it "by mistake": they'd load the address of the variable
> into a register, and then compare the two registers, and then end up
> using _one_ of the registers as the base pointer for the "p->val"
> access, but I can almost *guarantee* that there are going to be
> sequences where some compiler will choose one register over the other
> based on some random detail.
> 
> So my model isn't just a "model", it also happens to describe reality.

Sounds to me like your model -is- reality.  I believe that it is useful
to constrain reality from time to time, but understand that you vehemently
disagree.

> > Indeed, it won't work across different compilation units unless
> > the compiler is told about it, which is of course the whole point of
> > [[carries_dependency]].  Understood, though, the Linux kernel currently
> > does not have anything that could reasonably automatically generate those
> > [[carries_dependency]] attributes.  (Or are there other reasons why you
> > believe [[carries_dependency]] is problematic?)
> 
> So I think carries_dependency is problematic because:
> 
>  - it's not actually in C11 afaik

Indeed it is not, but I bet that gcc will implement it like it does the
other attributes that are not part of C11.

>  - it requires the programmer to solve the problem of 

Re: [PATCH] ARM: tegra: add device tree for SHIELD

2014-02-25 Thread Alexandre Courbot

On 02/26/2014 02:02 PM, Stephen Warren wrote:

On 02/25/2014 09:58 PM, Alexandre Courbot wrote:

On 02/26/2014 07:38 AM, Stephen Warren wrote:

On 02/24/2014 07:13 PM, Alexandre Courbot wrote:

On 02/25/2014 03:53 AM, Stephen Warren wrote:

On 02/24/2014 03:26 AM, Alexandre Courbot wrote:

Add a device tree for NVIDIA SHIELD. The set of enabled features is
still minimal with no display option (although HDMI should be easy
to get to work) and USB requiring external power.

...

For the Wifi chip, non-removable would be the correct setting
hardware-wise, but there is a trap: the chip has its reset line asserted
at boot-time, and you need to set GPIO 229 to de-assert it. Only after
that will the device be detected on the SDIO bus. Since it lacks a CD
line, it must be polled, hence the broken-cd property.


How does that GPIO get manipulated right now? I assume you must be
manually configuring it via sysfs after boot or something? If so,
perhaps it's best to just leave out the WiFi node until it works
automatically.


The GPIO needs to be set from user-space, yes. But if we leave the Wifi
node out, I'm concerned that wireless will not be usable at all, will it?


True, but if we have no representation of the device in DT that works
without manually enabling clocks and/or GPIOs, it's not a
complete/accurate representation of the HW, so it doesn't make sense to
add it to DT. Yes, I admit that sucks.


Well, I can always enable it in my out-of-tree branch until we can push 
the complete binding in mainline, so I'm ok with taking it out of this 
patch for now.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/10] ASoC: fsl: eukrea-tlv320: Use snd_soc_dai_set_tdm_slot_xlate()

2014-02-25 Thread Xiubo Li
Use snd_soc_dai_set_tdm_slot_xlate instead of snd_soc_dai_set_tdm_slot.
This will use the default snd_soc_of_xlate_tdm_slot_mask to generate
the TDM slot TX/RX mask using the slot parameter.

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/eukrea-tlv320.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/eukrea-tlv320.c b/sound/soc/fsl/eukrea-tlv320.c
index 5983740..3965023 100644
--- a/sound/soc/fsl/eukrea-tlv320.c
+++ b/sound/soc/fsl/eukrea-tlv320.c
@@ -63,7 +63,7 @@ static int eukrea_tlv320_hw_params(struct snd_pcm_substream 
*substream,
"Failed to set the codec sysclk.\n");
return ret;
}
-   snd_soc_dai_set_tdm_slot(cpu_dai, 0xffc, 0xffc, 2, 0);
+   snd_soc_dai_set_tdm_slot_xlate(cpu_dai, 2, 0);
 
ret = snd_soc_dai_set_sysclk(cpu_dai, IMX_SSP_SYS_CLK, 0,
SND_SOC_CLOCK_IN);
-- 
1.8.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: [PATCH] PM / devfreq: Fix out of bounds access of transition table array

2014-02-25 Thread 함명주
> On 02/23/2014 11:15 PM, Saravana Kannan wrote:
> > The previous_freq value for a device could be an invalid frequency that
> > results in a error value being returned from devfreq_get_freq_level().
> > Check for an error value before using that to index into the transition
> > table.
> >
> > Not doing this check will result in memory corruption when previous_freq is
> > not a valid frequency.
> >
> > Signed-off-by: Saravana Kannan 
> 
> MyungJoo/Kyungmin,
> 
> Would either of you have some time to respond to this?
> 
> Thanks,
> Saravana

Dear Saravana,

> > +   prev_lev = devfreq_get_freq_level(devfreq, devfreq->previous_freq);
> > +   if (prev_lev < 0)
> > +   return 0;

If devfreq_get_freq_level() returned an error, please return that error
to the caller. You are returning 0 in that case.
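
I.e. something like (a sketch of the change being asked for):

	prev_lev = devfreq_get_freq_level(devfreq, devfreq->previous_freq);
	if (prev_lev < 0)
		return prev_lev;	/* propagate the error */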

Plus, do you think we are going to change profile->freq_table at run-time
(accidentally or intentionally)?

Cheers,
MyungJoo.

> 
> 
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
> 
> 
> 

Re: [PATCH] tty: Fix low_latency BUG

2014-02-25 Thread Feng Tang
Hi Peter,

2014-02-22 20:31 GMT+08:00 Peter Hurley :
> The user-settable knob, low_latency, has been the source of
> several BUG reports which stem from flush_to_ldisc() running
> in interrupt context. Since 3.12, which added several sleeping
> locks (termios_rwsem and buf->lock) to the input processing path,
> the frequency of these BUG reports has increased.
>
> Note that changes in 3.12 did not introduce this regression;
> sleeping locks were first added to the input processing path
> with the removal of the BKL from N_TTY in commit
> a88a69c91256418c5907c2f1f8a0ec0a36f9e6cc,
> 'n_tty: Fix loss of echoed characters and remove bkl from n_tty'
> and later in commit 38db89799bdf11625a831c5af33938dcb11908b6,
> 'tty: throttling race fix'. Since those changes, executing
> flush_to_ldisc() in interrupt_context (ie, low_latency set), is unsafe.
>
> However, since most devices do not validate if the low_latency
> setting is appropriate for the context (process or interrupt) in
> which they receive data, some reports are due to misconfiguration.
> Further, serial dma devices for which dma fails, resort to
> interrupt receiving as a backup without resetting low_latency.
>
> Historically, low_latency was used to force wake-up the reading
> process rather than wait for the next scheduler tick. The
> effect was to trim multiple milliseconds of latency from
> when the process would receive new data.
>
> Recent tests [1] have shown that the reading process now receives
> data with only 10's of microseconds latency without low_latency set.

The 10's of microseconds are fine for a 115200 bps class device, but they
may hurt high speed devices like Bluetooth, which run at 3M/4M bps or
higher.

More and more smartphones are using a UART as the Bluetooth data
interface due to its low pin count and low power, and many of them
run an HZ=100 kernel, so I'm afraid this added delay may cause
problems.

Thanks,
Feng
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table()

2014-02-25 Thread Viresh Kumar
On 26 February 2014 09:08, Saravana Kannan  wrote:
> Remove sysfs group if __cpufreq_stats_create_table() fails after creating
> one.
>
> Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c
> Signed-off-by: Saravana Kannan 
> ---
>  drivers/cpufreq/cpufreq_stats.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index e4bd27f..c52b440 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct 
> cpufreq_policy *policy,
> stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL);
> if (!stat->time_in_state) {
> ret = -ENOMEM;
> -   goto error_out;
> +   goto error_alloc;
> }
> stat->freq_table = (unsigned int *)(stat->time_in_state + count);
>
> @@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct 
> cpufreq_policy *policy,
> stat->last_index = freq_table_get_index(stat, policy->cur);
> spin_unlock(_stats_lock);
> return 0;
> +error_alloc:
> +   sysfs_remove_group(>kobj, _attr_group);
>  error_out:
> kfree(stat);
> per_cpu(cpufreq_stats_table, cpu) = NULL;

Acked-by: Viresh Kumar 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/10] ASoC: fsl: wm1133-ev1: Use snd_soc_dai_set_tdm_slot_xlate()

2014-02-25 Thread Xiubo Li
Use snd_soc_dai_set_tdm_slot_xlate instead of snd_soc_dai_set_tdm_slot.
This will use the default snd_soc_of_xlate_tdm_slot_mask to generate
the TDM slot TX/RX mask using the slot parameter.

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/wm1133-ev1.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/sound/soc/fsl/wm1133-ev1.c b/sound/soc/fsl/wm1133-ev1.c
index fce6325..4519b08 100644
--- a/sound/soc/fsl/wm1133-ev1.c
+++ b/sound/soc/fsl/wm1133-ev1.c
@@ -114,16 +114,7 @@ static int wm1133_ev1_hw_params(struct snd_pcm_substream 
*substream,
snd_soc_dai_set_fmt(cpu_dai, dai_format);
 
/* TODO: The SSI driver should figure this out for us */
-   switch (channels) {
-   case 2:
-   snd_soc_dai_set_tdm_slot(cpu_dai, 0xffc, 0xffc, 2, 0);
-   break;
-   case 1:
-   snd_soc_dai_set_tdm_slot(cpu_dai, 0xffe, 0xffe, 1, 0);
-   break;
-   default:
-   return -EINVAL;
-   }
+   snd_soc_dai_set_tdm_slot_xlate(cpu_dai, channels, 0);
 
/* set MCLK as the codec system clock for DAC and ADC */
snd_soc_dai_set_sysclk(codec_dai, WM8350_MCLK_SEL_PLL_MCLK,
-- 
1.8.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 05/10] ASoC: imx-ssi: Add .of_xlate_tdm_slot_mask() support.

2014-02-25 Thread Xiubo Li
This patch add .of_xlate_tdm_slot_mask support for IMX SSI, and this
will generate the TDM slot TX and RX masks.

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/Kconfig   | 1 +
 sound/soc/fsl/imx-ssi.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index 7472308..67833dd 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -122,6 +122,7 @@ if SND_IMX_SOC
 
 config SND_SOC_IMX_SSI
tristate
+   select SND_SOC_FSL_UTILS
 
 config SND_SOC_IMX_PCM_FIQ
tristate
diff --git a/sound/soc/fsl/imx-ssi.c b/sound/soc/fsl/imx-ssi.c
index df552fa..52d4b7a 100644
--- a/sound/soc/fsl/imx-ssi.c
+++ b/sound/soc/fsl/imx-ssi.c
@@ -50,6 +50,7 @@
 #include 
 
 #include "imx-ssi.h"
+#include "fsl_utils.h"
 
 #define SSI_SACNT_DEFAULT (SSI_SACNT_AC97EN | SSI_SACNT_FV)
 
@@ -339,6 +340,7 @@ static const struct snd_soc_dai_ops imx_ssi_pcm_dai_ops = {
.set_fmt= imx_ssi_set_dai_fmt,
.set_clkdiv = imx_ssi_set_dai_clkdiv,
.set_sysclk = imx_ssi_set_dai_sysclk,
+   .of_xlate_tdm_slot_mask = fsl_asoc_of_xlate_tdm_slot_mask,
.set_tdm_slot   = imx_ssi_set_dai_tdm_slot,
.trigger= imx_ssi_trigger,
 };
-- 
1.8.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 06/10] ASoC: simple-card: Use snd_soc_dai_set_tdm_slot_xlate()

2014-02-25 Thread Xiubo Li
Use snd_soc_dai_set_tdm_slot_xlate instead of snd_soc_dai_set_tdm_slot.

This will use the DAI driver specified .of_xlate_tdm_slot_mask to generate
the TDM slot TX/RX mask, or the default snd_soc_of_xlate_tdm_slot_mask will
be used instead if it's absent.

Signed-off-by: Xiubo Li 
---
 sound/soc/generic/simple-card.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/sound/soc/generic/simple-card.c b/sound/soc/generic/simple-card.c
index bdd176d..22efa83 100644
--- a/sound/soc/generic/simple-card.c
+++ b/sound/soc/generic/simple-card.c
@@ -48,11 +48,10 @@ static int __asoc_simple_card_dai_init(struct snd_soc_dai 
*dai,
}
 
if (set->slots) {
-   ret = snd_soc_dai_set_tdm_slot(dai, 0, 0,
-   set->slots,
-   set->slot_width);
+   ret = snd_soc_dai_set_tdm_slot_xlate(dai, set->slots,
+set->slot_width);
if (ret && ret != -ENOTSUPP) {
-   dev_err(dai->dev, "simple-card: set_tdm_slot error\n");
+   dev_err(dai->dev, "simple-card: set_tdm_slot_xlate 
error\n");
goto err;
}
}
-- 
1.8.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 04/10] ASoC: fsl-ssi: Add .of_xlate_tdm_slot_mask() support.

2014-02-25 Thread Xiubo Li
This patch add .of_xlate_tdm_slot_mask support for SSI, and this will
generate the TDM slot TX and RX masks.

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/Kconfig   | 1 +
 sound/soc/fsl/fsl_ssi.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index 6abb68e..7472308 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -5,6 +5,7 @@ config SND_SOC_FSL_SAI
 
 config SND_SOC_FSL_SSI
tristate
+   select SND_SOC_FSL_UTILS
 
 config SND_SOC_FSL_SPDIF
tristate "ALSA SoC support for the Freescale SPDIF device"
diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 5428a1f..80ca76c 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -53,6 +53,7 @@
 
 #include "fsl_ssi.h"
 #include "imx-pcm.h"
+#include "fsl_utils.h"
 
 #ifdef PPC
 #define read_ssi(addr)  in_be32(addr)
@@ -1136,6 +1137,7 @@ static const struct snd_soc_dai_ops fsl_ssi_dai_ops = {
.hw_params  = fsl_ssi_hw_params,
.set_fmt= fsl_ssi_set_dai_fmt,
.set_sysclk = fsl_ssi_set_dai_sysclk,
+   .of_xlate_tdm_slot_mask = fsl_asoc_of_xlate_tdm_slot_mask,
.set_tdm_slot   = fsl_ssi_set_dai_tdm_slot,
.trigger= fsl_ssi_trigger,
 };
-- 
1.8.4




[PATCH 08/10] ASoC: blackfin: bf5xx-ad193x: Use snd_soc_dai_set_tdm_slot_xlate()

2014-02-25 Thread Xiubo Li
Use snd_soc_dai_set_tdm_slot_xlate instead of snd_soc_dai_set_tdm_slot.
This will use the default snd_soc_of_xlate_tdm_slot_mask() to generate
the TDM slot TX/RX masks from the slots parameter.

Signed-off-by: Xiubo Li 
---
 sound/soc/blackfin/bf5xx-ad193x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/blackfin/bf5xx-ad193x.c 
b/sound/soc/blackfin/bf5xx-ad193x.c
index 603ad1f..f622faa 100644
--- a/sound/soc/blackfin/bf5xx-ad193x.c
+++ b/sound/soc/blackfin/bf5xx-ad193x.c
@@ -53,11 +53,11 @@ static int bf5xx_ad193x_link_init(struct 
snd_soc_pcm_runtime *rtd)
return ret;
 
/* set codec DAI slots, 8 channels, all channels are enabled */
-   ret = snd_soc_dai_set_tdm_slot(codec_dai, 0xFF, 0xFF, 8, 32);
+   ret = snd_soc_dai_set_tdm_slot_xlate(codec_dai, 8, 32);
if (ret < 0)
return ret;
 
-   ret = snd_soc_dai_set_tdm_slot(cpu_dai, 0xFF, 0xFF, 8, 32);
+   ret = snd_soc_dai_set_tdm_slot_xlate(cpu_dai, 8, 32);
if (ret < 0)
return ret;
 
-- 
1.8.4




[PATCH 1/3] drivers: clk: add samsung common clock config option

2014-02-25 Thread Pankaj Dubey
Add a Samsung common clock config option and let ARCH_EXYNOS or ARCH_S3C
select it if they want to use the Samsung common clock infrastructure.

CC: Mike Turquette 
Signed-off-by: Pankaj Dubey 
---
 drivers/clk/Kconfig  |   10 ++
 drivers/clk/Makefile |2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 7641965..d93a325 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -23,6 +23,16 @@ config COMMON_CLK
 menu "Common Clock Framework"
depends on COMMON_CLK
 
+config COMMON_CLK_SAMSUNG
+   bool "Clock driver for Samsung SoCs"
+   depends on ARCH_S3C64XX || ARCH_S3C24XX || ARCH_EXYNOS || ARM64
+   ---help---
+  Supports clocking on Exynos SoCs:
+ - Exynos5250, Exynos5420 board.
+ - Exynos4 boards.
+ - S3C2412, S3C2416, S3C2466 boards.
+ - S3C64XX boards.
+
 config COMMON_CLK_WM831X
tristate "Clock driver for WM831x/2x PMICs"
depends on MFD_WM831X
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index a367a98..f1da6ee 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -38,7 +38,7 @@ obj-$(CONFIG_PLAT_ORION)  += mvebu/
 obj-$(CONFIG_ARCH_MXS) += mxs/
 obj-$(CONFIG_COMMON_CLK_QCOM)  += qcom/
 obj-$(CONFIG_ARCH_ROCKCHIP)+= rockchip/
-obj-$(CONFIG_PLAT_SAMSUNG) += samsung/
+obj-$(CONFIG_COMMON_CLK_SAMSUNG)   += samsung/
 obj-$(CONFIG_ARCH_SHMOBILE_MULTI)  += shmobile/
 obj-$(CONFIG_ARCH_SIRF)+= sirf/
 obj-$(CONFIG_ARCH_SOCFPGA) += socfpga/
-- 
1.7.9.5



[PATCH 03/10] ASoC: fsl-esai: Add .of_xlate_tdm_slot_mask() support.

2014-02-25 Thread Xiubo Li
This patch adds .of_xlate_tdm_slot_mask() support for ESAI, which will
generate the TDM slot TX and RX masks.

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/Kconfig| 1 +
 sound/soc/fsl/fsl_esai.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index f397144..6abb68e 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -13,6 +13,7 @@ config SND_SOC_FSL_SPDIF
 config SND_SOC_FSL_ESAI
tristate "ALSA SoC support for the Freescale ESAI device"
select REGMAP_MMIO
+   select SND_SOC_FSL_UTILS
 
 config SND_SOC_FSL_UTILS
tristate
diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 0ba3700..912104f 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -18,6 +18,7 @@
 
 #include "fsl_esai.h"
 #include "imx-pcm.h"
+#include "fsl_utils.h"
 
 #define FSL_ESAI_RATES SNDRV_PCM_RATE_8000_192000
 #define FSL_ESAI_FORMATS   (SNDRV_PCM_FMTBIT_S8 | \
@@ -581,6 +582,7 @@ static struct snd_soc_dai_ops fsl_esai_dai_ops = {
.hw_params = fsl_esai_hw_params,
.set_sysclk = fsl_esai_set_dai_sysclk,
.set_fmt = fsl_esai_set_dai_fmt,
+   .of_xlate_tdm_slot_mask = fsl_asoc_of_xlate_tdm_slot_mask,
.set_tdm_slot = fsl_esai_set_dai_tdm_slot,
 };
 
-- 
1.8.4




[PATCH 02/10] ASoC: fsl-utils: Add fsl_asoc_of_xlate_tdm_slot_mask() support.

2014-02-25 Thread Xiubo Li
This patch adds fsl_asoc_of_xlate_tdm_slot_mask() to the shared utils code.
For some specific DAI drivers it will be used to generate the TDM slot
TX/RX masks. By default the TX/RX masks use a 0 bit for an active slot,
and the active bits sit at the LSB of the masks.
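
For illustration (a sketch, not part of the patch), a call for a four-slot
configuration would produce the following masks:

        unsigned int tx_mask, rx_mask;
        int ret;

        /* for slots == 4 both masks become ~0xF == 0xfffffff0: the four
         * LSB (active) slots are 0 bits, every other bit is 1
         */
        ret = fsl_asoc_of_xlate_tdm_slot_mask(4, &tx_mask, &rx_mask);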

Signed-off-by: Xiubo Li 
---
 sound/soc/fsl/fsl_utils.c | 27 +++
 sound/soc/fsl/fsl_utils.h |  4 +++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_utils.c b/sound/soc/fsl/fsl_utils.c
index b9e42b5..b536eb1 100644
--- a/sound/soc/fsl/fsl_utils.c
+++ b/sound/soc/fsl/fsl_utils.c
@@ -86,6 +86,33 @@ int fsl_asoc_get_dma_channel(struct device_node *ssi_np,
 }
 EXPORT_SYMBOL(fsl_asoc_get_dma_channel);
 
+/**
+ * fsl_asoc_of_xlate_tdm_slot_mask - generate TDM slot TX/RX mask.
+ *
+ * @slots: Number of slots in use.
+ * @tx_mask: bitmask representing active TX slots.
+ * @rx_mask: bitmask representing active RX slots.
+ *
+ * This function used to generate the TDM slot TX/RX mask. And the TX/RX
+ * mask will use a 0 bit for an active slot as default, and the default
+ * active bits are at the LSB of the mask value.
+ */
+int fsl_asoc_of_xlate_tdm_slot_mask(unsigned int slots,
+   unsigned int *tx_mask,
+   unsigned int *rx_mask)
+{
+   if (!slots)
+   return -EINVAL;
+
+   if (tx_mask)
+   *tx_mask = ~((1 << slots) - 1);
+   if (rx_mask)
+   *rx_mask = ~((1 << slots) - 1);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(fsl_asoc_of_xlate_tdm_slot_mask);
+
 MODULE_AUTHOR("Timur Tabi ");
 MODULE_DESCRIPTION("Freescale ASoC utility code");
 MODULE_LICENSE("GPL v2");
diff --git a/sound/soc/fsl/fsl_utils.h b/sound/soc/fsl/fsl_utils.h
index b295112..01b01f9 100644
--- a/sound/soc/fsl/fsl_utils.h
+++ b/sound/soc/fsl/fsl_utils.h
@@ -22,5 +22,7 @@ int fsl_asoc_get_dma_channel(struct device_node *ssi_np, 
const char *name,
 struct snd_soc_dai_link *dai,
 unsigned int *dma_channel_id,
 unsigned int *dma_id);
-
+int fsl_asoc_of_xlate_tdm_slot_mask(unsigned int slots,
+   unsigned int *tx_mask,
+   unsigned int *rx_mask);
 #endif /* _FSL_UTILS_H */
-- 
1.8.4




[PATCH 01/10] ASoC: core: Add snd_soc_dai_set_tdm_slot_xlate().

2014-02-25 Thread Xiubo Li
In most cases the rx_mask and tx_mask parameters of snd_soc_dai_set_tdm_slot()
are not needed, because they can be generated by a DAI driver's
.of_xlate_tdm_slot_mask() callback.

This patch adds snd_soc_dai_set_tdm_slot_xlate(), which can replace
snd_soc_dai_set_tdm_slot() in some use cases to simplify the code. Some
CODEC or CPU DAI devices will still need more work to support the
.of_xlate_tdm_slot_mask feature.

This change can be applied to most use cases of the current DAI drivers.
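
As an example, a machine driver that sets up an 8-slot, 32-bit wide TDM link
would change roughly like this (sketch only; codec_dai and ret are taken from
the surrounding driver code):

        /* before: the caller computes the masks itself */
        ret = snd_soc_dai_set_tdm_slot(codec_dai, 0xff, 0xff, 8, 32);

        /* after: the masks come from .of_xlate_tdm_slot_mask(), or from the
         * default helper when the DAI driver does not provide one
         */
        ret = snd_soc_dai_set_tdm_slot_xlate(codec_dai, 8, 32);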

Signed-off-by: Xiubo Li 
---
 include/sound/soc-dai.h |  3 +++
 sound/soc/soc-core.c| 33 -
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/sound/soc-dai.h b/include/sound/soc-dai.h
index d86e0fc..68569ee 100644
--- a/include/sound/soc-dai.h
+++ b/include/sound/soc-dai.h
@@ -110,6 +110,9 @@ int snd_soc_dai_set_bclk_ratio(struct snd_soc_dai *dai, 
unsigned int ratio);
 /* Digital Audio interface formatting */
 int snd_soc_dai_set_fmt(struct snd_soc_dai *dai, unsigned int fmt);
 
+int snd_soc_dai_set_tdm_slot_xlate(struct snd_soc_dai *dai,
+  unsigned int slots,
+  unsigned int slot_width);
 int snd_soc_dai_set_tdm_slot(struct snd_soc_dai *dai,
unsigned int tx_mask, unsigned int rx_mask, int slots, int slot_width);
 
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 0911856..e5a535b 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -3687,19 +3687,20 @@ static int snd_soc_of_xlate_tdm_slot_mask(unsigned int 
slots,
 }
 
 /**
- * snd_soc_dai_set_tdm_slot - configure DAI TDM.
+ * snd_soc_dai_set_tdm_slot_xlate - configure DAI TDM with of xlate.
  * @dai: DAI
- * @tx_mask: bitmask representing active TX slots.
- * @rx_mask: bitmask representing active RX slots.
  * @slots: Number of slots in use.
  * @slot_width: Width in bits for each slot.
  *
  * Configures a DAI for TDM operation. Both mask and slots are codec and DAI
  * specific.
  */
-int snd_soc_dai_set_tdm_slot(struct snd_soc_dai *dai,
-   unsigned int tx_mask, unsigned int rx_mask, int slots, int slot_width)
+int snd_soc_dai_set_tdm_slot_xlate(struct snd_soc_dai *dai,
+  unsigned int slots,
+  unsigned int slot_width)
 {
+   unsigned int tx_mask, rx_mask;
+
if (dai->driver && dai->driver->ops->of_xlate_tdm_slot_mask)
dai->driver->ops->of_xlate_tdm_slot_mask(slots,
&tx_mask, &rx_mask);
@@ -3712,6 +3713,28 @@ int snd_soc_dai_set_tdm_slot(struct snd_soc_dai *dai,
else
return -ENOTSUPP;
 }
+EXPORT_SYMBOL_GPL(snd_soc_dai_set_tdm_slot_xlate);
+
+/**
+ * snd_soc_dai_set_tdm_slot - configure DAI TDM.
+ * @dai: DAI
+ * @tx_mask: bitmask representing active TX slots.
+ * @rx_mask: bitmask representing active RX slots.
+ * @slots: Number of slots in use.
+ * @slot_width: Width in bits for each slot.
+ *
+ * Configures a DAI for TDM operation. Both mask and slots are codec and DAI
+ * specific.
+ */
+int snd_soc_dai_set_tdm_slot(struct snd_soc_dai *dai,
+   unsigned int tx_mask, unsigned int rx_mask, int slots, int slot_width)
+{
+   if (dai->driver && dai->driver->ops->set_tdm_slot)
+   return dai->driver->ops->set_tdm_slot(dai, tx_mask, rx_mask,
+   slots, slot_width);
+   else
+   return -ENOTSUPP;
+}
 EXPORT_SYMBOL_GPL(snd_soc_dai_set_tdm_slot);
 
 /**
-- 
1.8.4




[PATCH 2/3] ARM: select COMMON_CLK_SAMSUNG for ARCH_EXYNOS and ARCH_S3C64XX

2014-02-25 Thread Pankaj Dubey
CC: Russell King 
Signed-off-by: Pankaj Dubey 
---
 arch/arm/Kconfig |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e254198..cc8868d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -756,6 +756,7 @@ config ARCH_S3C64XX
select CLKDEV_LOOKUP
select CLKSRC_SAMSUNG_PWM
select COMMON_CLK
+   select COMMON_CLK_SAMSUNG
select CPU_V6K
select GENERIC_CLOCKEVENTS
select GPIO_SAMSUNG
@@ -835,6 +836,7 @@ config ARCH_EXYNOS
select ARCH_SPARSEMEM_ENABLE
select ARM_GIC
select COMMON_CLK
+   select COMMON_CLK_SAMSUNG
select CPU_V7
select GENERIC_CLOCKEVENTS
select HAVE_S3C2410_I2C if I2C
-- 
1.7.9.5



[PATCH 3/3] ARM: S3C24XX: select COMMON_CLK_SAMSUNG for S3C24XX

2014-02-25 Thread Pankaj Dubey
CC: Ben Dooks 
CC: Kukjin Kim 
CC: Russell King 
Signed-off-by: Pankaj Dubey 
---
 arch/arm/mach-s3c24xx/Kconfig |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/mach-s3c24xx/Kconfig b/arch/arm/mach-s3c24xx/Kconfig
index 80373da..5cf82a1 100644
--- a/arch/arm/mach-s3c24xx/Kconfig
+++ b/arch/arm/mach-s3c24xx/Kconfig
@@ -40,6 +40,7 @@ config CPU_S3C2410
 config CPU_S3C2412
bool "SAMSUNG S3C2412"
select COMMON_CLK
+   select COMMON_CLK_SAMSUNG
select CPU_ARM926T
select CPU_LLSERIAL_S3C2440
select S3C2412_COMMON_CLK
@@ -51,6 +52,7 @@ config CPU_S3C2412
 config CPU_S3C2416
bool "SAMSUNG S3C2416/S3C2450"
select COMMON_CLK
+   select COMMON_CLK_SAMSUNG
select CPU_ARM926T
select CPU_LLSERIAL_S3C2440
select S3C2416_PM if PM
@@ -89,6 +91,7 @@ config CPU_S3C244X
 config CPU_S3C2443
bool "SAMSUNG S3C2443"
select COMMON_CLK
+   select COMMON_CLK_SAMSUNG
select CPU_ARM920T
select CPU_LLSERIAL_S3C2440
select S3C2443_COMMON_CLK
-- 
1.7.9.5



[PATCH 0/3] introduce new config option for samsung common clock

2014-02-25 Thread Pankaj Dubey
This patchset introduces a new config option, COMMON_CLK_SAMSUNG, for the
Samsung common clock infrastructure. Currently the Samsung common clock code
is compiled based on PLAT_SAMSUNG, but moving ahead with ARM64 we cannot
have any more such config options, so it is better to introduce the new
COMMON_CLK_SAMSUNG option and make it selected by the existing ARCH_*
options which need the Samsung common clock infrastructure, as well as on
ARM64.

Pankaj Dubey (3):
  drivers: clk: add samsung common clock config option
  ARM: select COMMON_CLK_SAMSUNG for ARCH_EXYNOS and ARCH_S3C64XX
  ARM: S3C24XX: select COMMON_CLK_SAMSUNG for S3C24XX

 arch/arm/Kconfig  |2 ++
 arch/arm/mach-s3c24xx/Kconfig |3 +++
 drivers/clk/Kconfig   |   10 ++
 drivers/clk/Makefile  |2 +-
 4 files changed, 16 insertions(+), 1 deletion(-)

-- 
1.7.9.5



[PATCH 00/10] Simplify the code of TDM slot setting

2014-02-25 Thread Xiubo Li

Xiubo Li (10):
  ASoC: core: Add snd_soc_dai_set_tdm_slot_xlate().
  ASoC: fsl-utils: Add fsl_asoc_of_xlate_tdm_slot_mask() support.
  ASoC: fsl-esai: Add .of_xlate_tdm_slot_mask() support.
  ASoC: fsl-ssi: Add .of_xlate_tdm_slot_mask() support.
  ASoC: imx-ssi: Add .of_xlate_tdm_slot_mask() support.
  ASoC: simple-card: Use snd_soc_dai_set_tdm_slot_xlate()
  ASoC: blackfin: bf5xx-ad1836: Use snd_soc_dai_set_tdm_slot_xlate()
  ASoC: blackfin: bf5xx-ad193x: Use snd_soc_dai_set_tdm_slot_xlate()
  ASoC: fsl: eukrea-tlv320: Use snd_soc_dai_set_tdm_slot_xlate()
  ASoC: fsl: wm1133-ev1: Use snd_soc_dai_set_tdm_slot_xlate()

 include/sound/soc-dai.h   |  3 +++
 sound/soc/blackfin/bf5xx-ad1836.c |  2 +-
 sound/soc/blackfin/bf5xx-ad193x.c |  4 ++--
 sound/soc/fsl/Kconfig |  3 +++
 sound/soc/fsl/eukrea-tlv320.c |  2 +-
 sound/soc/fsl/fsl_esai.c  |  2 ++
 sound/soc/fsl/fsl_ssi.c   |  2 ++
 sound/soc/fsl/fsl_utils.c | 27 +++
 sound/soc/fsl/fsl_utils.h |  4 +++-
 sound/soc/fsl/imx-ssi.c   |  2 ++
 sound/soc/fsl/wm1133-ev1.c| 11 +--
 sound/soc/generic/simple-card.c   |  7 +++
 sound/soc/soc-core.c  | 33 -
 13 files changed, 78 insertions(+), 24 deletions(-)

-- 
1.8.4




Re: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call

2014-02-25 Thread Viresh Kumar
On 26 February 2014 09:08, Saravana Kannan  wrote:
> __cpufreq_stats_create_table always gets pass the valid and real policy
> struct. So, there's no need to call cpufreq_cpu_get() to get the policy
> again.
>
> Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71

??

> Signed-off-by: Saravana Kannan 
> ---
>
> Viresh/Rafael,
>
> These 3 patches is the approximate code I have in mind.
>
> Approximate because:
> * I inserted one question as a comment into the code.
> * If the patch doesn't have any bugs, the plan is to remove
>   cpufreq_generic_get() and references to it.
>
> This takes care of the "don't advertise before it's ready for use" rule.
>
> Viresh,
>
> I think the locking updates needs to be done in addition to this.
>
> Regards,
> Saravana
>
>  drivers/cpufreq/cpufreq_stats.c | 12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index 5793e14..e4bd27f 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct 
> cpufreq_policy *policy,
>  {
> unsigned int i, j, count = 0, ret = 0;
> struct cpufreq_stats *stat;
> -   struct cpufreq_policy *current_policy;
> unsigned int alloc_size;
> unsigned int cpu = policy->cpu;
> if (per_cpu(cpufreq_stats_table, cpu))
> @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct 
> cpufreq_policy *policy,
> if ((stat) == NULL)
> return -ENOMEM;
>
> -   current_policy = cpufreq_cpu_get(cpu);
> -   if (current_policy == NULL) {
> -   ret = -EINVAL;
> -   goto error_get_fail;
> -   }
> -
> -   ret = sysfs_create_group(&current_policy->kobj, &stats_attr_group);
> +   ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
> if (ret)
> goto error_out;
>
> @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct 
> cpufreq_policy *policy,
> stat->last_time = get_jiffies_64();
> stat->last_index = freq_table_get_index(stat, policy->cur);
> spin_unlock(&cpufreq_stats_lock);
> -   cpufreq_cpu_put(current_policy);
> return 0;
>  error_out:
> -   cpufreq_cpu_put(current_policy);
> -error_get_fail:
> kfree(stat);
> per_cpu(cpufreq_stats_table, cpu) = NULL;
> return ret;

I was damn sure that this wasn't a waste of time. This was some meaningful
code when I visited it earlier. And we absolutely required a new
cpufreq_cpu_get()..

Reason: Earlier these tables were created from the CPUFREQ_NOTIFY notifier,
which used to come with another, modified copy of 'policy', so we were
required to get the real copy of the policy to reach the right kobj.

But recently I have simplified stuff there and these tables are now added with
CPUFREQ_CREATE_POLICY and so this replication isn't required anymore.

So, Acked-by: Viresh Kumar 

While you are at it please get this part into __cpufreq_stats_create_table()
routine:

table = cpufreq_frequency_get_table(cpu);
if (!table)
return 0;

As it is replicated at two places currently.
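
A minimal sketch of what that would look like at the top of
__cpufreq_stats_create_table() (illustration only, using the existing helper):

        struct cpufreq_frequency_table *table;

        table = cpufreq_frequency_get_table(policy->cpu);
        if (!table)
                return 0;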


Re: [PATCH] ARM: tegra: add device tree for SHIELD

2014-02-25 Thread Stephen Warren
On 02/25/2014 09:58 PM, Alexandre Courbot wrote:
> On 02/26/2014 07:38 AM, Stephen Warren wrote:
>> On 02/24/2014 07:13 PM, Alexandre Courbot wrote:
>>> On 02/25/2014 03:53 AM, Stephen Warren wrote:
 On 02/24/2014 03:26 AM, Alexandre Courbot wrote:
> Add a device tree for NVIDIA SHIELD. The set of enabled features is
> still minimal with no display option (although HDMI should be easy
> to get to work) and USB requiring external power.
...
>>> For the Wifi chip, non-removable would be the correct setting
>>> hardware-wise, but there is a trap: the chip has its reset line asserted
>>> at boot-time, and you need to set GPIO 229 to de-assert it. Only after
>>> that will the device be detected on the SDIO bus. Since it lacks a CD
>>> line, it must be polled, hence the broken-cd property.
>>
>> How does that GPIO get manipulated right now? I assume you must be
>> manually configuring it via sysfs after boot or something? If so,
>> perhaps it's best to just leave out the WiFi node until it works
>> automatically.
> 
> The GPIO needs to be set from user-space, yes. But if we leave the Wifi
> node out, I'm concerned that wireless will not be usable at all,
> wouldn't it?

True, but if we have no representation of the device in DT that works
without manually enabling clocks and/or GPIOs, it's not a
complete/accurate representation of the HW, so it doesn't make sense to
add it to DT. Yes, I admit that sucks.



Re: [PATCH] ARM: tegra: add device tree for SHIELD

2014-02-25 Thread Alexandre Courbot

On 02/26/2014 07:38 AM, Stephen Warren wrote:

On 02/24/2014 07:13 PM, Alexandre Courbot wrote:

On 02/25/2014 03:53 AM, Stephen Warren wrote:

On 02/24/2014 03:26 AM, Alexandre Courbot wrote:

Add a device tree for NVIDIA SHIELD. The set of enabled features is
still minimal with no display option (although HDMI should be easy
to get to work) and USB requiring external power.



diff --git a/arch/arm/boot/dts/tegra114-roth.dts
b/arch/arm/boot/dts/tegra114-roth.dts



+memory {
+reg = <0x8000 0x7960>;


It might be worth a comment here pointing out that the rest of RAM is
reserved for some carveouts/..., or at least that these values are set
this way in order to match what the bootloader usually passes to
downstream kernels in the command-line?


Yes, absolutely right. On a more general note I feel like DTs could gain
clarity if they had more comments (e.g. for pinmuxing which are a quite
heavy block otherwise), do you have any objection to this? (I guess not,
but so far the rule seems to be "no comment in DT" :P )


I have no objection in particular. Specifically for pinmux, the values
seem pretty obvious, so I'm not sure what extra the comment could
convey, but I'll take a look at any proposed patch:-)


It would just make the grouping of related pins more visible than having to
look at the "nvidia,function" property currently does - just a little added
comfort.



+/* Wifi */
+sdhci@7800 {
+status = "okay";
+bus-width = <4>;
+broken-cd;
+keep-power-in-suspend;
+cap-sdio-irq;


Is non-removable better than broken-cd, or are they entirely unrelated?


They are unrelated actually. With non-removable the driver expects the
device to always be there since boot, and does not check for the card to
be removed/added after boot. broken-cd indicates there is no CD line and
the device should be polled regularly.


It doesn't sound like that's what we want either; we should know exactly
when the device is added/removed, based on when the relevant
clocks/supplies/... are turned on/off.


Yes, I guess this will require a proper DT binding like what Arend proposed.


For the Wifi chip, non-removable would be the correct setting
hardware-wise, but there is a trap: the chip has its reset line asserted
at boot-time, and you need to set GPIO 229 to de-assert it. Only after
that will the device be detected on the SDIO bus. Since it lacks a CD
line, it must be polled, hence the broken-cd property.


How does that GPIO get manipulated right now? I assume you must be
manually configuring it via sysfs after boot or something? If so,
perhaps it's best to just leave out the WiFi node until it works
automatically.


The GPIO needs to be set from user-space, yes. But if we leave the Wifi 
node out, I'm concerned that wireless will not be usable at all, 
wouldn't it?



This also raises another, redundant problem with DT bindings: AFAIK we
currently have no way to let the system know the device will only appear
after a given GPIO is set. It would also be nice to be able to give some
parameters to the Wifi driver through the DT (like the OOB interrupt).
Right now the Wifi chip is brought up by exporting the GPIO and writing
to it from user-space, and the OOB interrupt is not used.


There was a thread on this topic on LAKML recently. I didn't really
follow it, so I don't know if there was a useful resolution. I think it
was "mmc: add support for power-on sequencing through DT", although
there may have been other related threads. It was possibly tangentially
related to power-sequences-in-DT...

...

I'm not sure about cap-sdio-irq, it doesn't seem to make a difference
for SHIELD Wifi.


I'd tend to leave it out then.


I will check whether it helps with the latency issues I have noticed and 
remove it if it doesn't.


Thanks,
Alex.



Re: [PATCH] clk: samsung: remove parentheses from return statements

2014-02-25 Thread Pankaj Dubey

Hi Sachin,

On 02/26/2014 01:12 PM, Sachin Kamat wrote:

Hi Pankaj,

On 26 February 2014 08:14, Pankaj Dubey  wrote:

fixed following checkpatch warning message
"return is not a function, parentheses are not required"

Signed-off-by: Pankaj Dubey 

Similar patch has already been submitted:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/294530



Sorry, I missed that.
Since the same change has already been submitted, we can ignore my patch.

--
Best Regards,
Pankaj Dubey



Re: [PATCH V5 2/4] DRIVERS: IRQCHIP: CROSSBAR: Add support for Crossbar IP

2014-02-25 Thread Sricharan R
Hi Tony,

On Wednesday 05 February 2014 07:41 PM, Sricharan R wrote:
> Tony,
> 
> On Wednesday 05 February 2014 06:41 PM, Sricharan R wrote:
>> On Tuesday 04 February 2014 09:44 PM, Thomas Gleixner wrote:
>>> On Mon, 3 Feb 2014, Sricharan R wrote:
> I already have your reviewed-by tag for the first patch in this series.
>
> Kevin was pointing out that irqchip maintainer tag is needed for this 
> patch as well
> to be merged. We are planning to take this series through arm-soc tree.
>
> Can I please have your tag for this patch as well?
>>>
>>> Acked-by-me
>>
>> Thanks Thomas.
>>
>> Kevin,
>> I will re-send a branch based on rc1 for this.
>>
> 
> I have pushed a branch based on mainline,
>git://github.com/Sricharanti/sricharan.git
>branch: crossbar_3.14_rc1
> 
 Ping on this..

Regards,
 Sricharan


Re: [PATCH] ARM: tegra: add device tree for SHIELD

2014-02-25 Thread Alexandre Courbot

On 02/25/2014 06:52 PM, Arend van Spriel wrote:

On 02/25/2014 03:13 AM, Alexandre Courbot wrote:



+/* Wifi */
+sdhci@7800 {
+status = "okay";
+bus-width = <4>;
+broken-cd;
+keep-power-in-suspend;
+cap-sdio-irq;


Is non-removable better than broken-cd, or are they entirely unrelated?


They are unrelated actually. With non-removable the driver expects the
device to always be there since boot, and does not check for the card to
be removed/added after boot. broken-cd indicates there is no CD line and
the device should be polled regularly.

For the Wifi chip, non-removable would be the correct setting
hardware-wise, but there is a trap: the chip has its reset line asserted
at boot-time, and you need to set GPIO 229 to de-assert it. Only after
that will the device be detected on the SDIO bus. Since it lacks a CD
line, it must be polled, hence the broken-cd property.

This also raises another, redundant problem with DT bindings: AFAIK we
currently have no way to let the system know the device will only appear
after a given GPIO is set. It would also be nice to be able to give some
parameters to the Wifi driver through the DT (like the OOB interrupt).
Right now the Wifi chip is brought up by exporting the GPIO and writing
to it from user-space, and the OOB interrupt is not used.


Hi Alexandre,

I recently posted a proposal for brcmfmac DT binding [1]. I did receive
some comments, but it would be great if you (and/or others involved) had
a look at it as well and give me some feedback. DT work still needs to
grow on me.


Hi Arend, (and thanks again for all the help with getting the chip to work!)

Great, I'm not subscribed to the devicetree list and so have missed this 
thread, but I'm glad to see it.


I don't think I have much to add to the comments you already received 
there. I'd need it to reference the 32K clock (which I currently 
force-enable manually), the OOB interrupt, and the reset pin as a GPIO 
(as for SHIELD the device needs to be put out of reset using an 
active-low GPIO before anything can happen). That last property could be 
optional as I suspect most designs won't use it.


Getting the device out of reset should be done before the bus probes the 
non-removable device, so I wonder how this would fit wrt. the DT 
power-on sequencing series by Olof. Something tells me this could rather 
be a property of the bus, but physically speaking the pin is connected 
to the wifi chip, so... Maybe we could get the platform driver to ask 
the bus to probe again after enabling power/getting the device out of reset?



Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

2014-02-25 Thread Hugh Dickins
On Wed, 26 Feb 2014, Dave Chinner wrote:
> On Tue, Feb 25, 2014 at 03:23:35PM -0800, Hugh Dickins wrote:
> > On Tue, 25 Feb 2014, Dave Chinner wrote:
> > > On Tue, Feb 25, 2014 at 02:16:01PM +1100, Stephen Rothwell wrote:
> > > > On Mon, 24 Feb 2014 11:57:10 +1100 Dave Chinner  
> > > > wrote:
> > > > >
> > > > > > Namjae Jeon (10):
> > > > > >   fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
> > > > > >   xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
> > > > > 
> > > > > I've pushed these to the following branch:
> > > > > 
> > > > >   git://oss.sgi.com/xfs/xfs.git xfs-collapse-range
> > > > > 
> > > > > And so they'll be in tomorrow's linux-next tree.
> > > > > 
> > > > > >   ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
> > > > > 
> > > > > I've left this one alone for the ext4 guys to sort out.
> > > > 
> > > > So presumably that xfs tree branch is now completely stable and so Ted
> > > > could just merge that branch into the ext4 tree as well and put the ext4
> > > > part on top of that in his tree.
> > > 
> > > Well, for some definition of stable. Right now it's just a topic
> > > branch that is merged into the for-next branch, so in theory it is
> > > still just a set of pending changes in a branch in a repo that has
> > > been pushed to linux-next for testing.
> > > 
> > > That said, I don't see that branch changing unless we find bugs in
> > > the code or a problem with the API needs fixing, at which point I
> > > would add more commits to it and rebase the for-next branch that you
> > > are pulling into the linux-next tree.
> > > 
> > > Realistically, I'm waiting for Lukas to repost his other pending
> > > fallocate changes (the zero range changes) so I can pull the VFS and
> > > XFS bits of that into the XFS tree and I can test them together
> > > before I'll call the xfs-collapse-range stable and ready to be
> > > merged into some other tree...
> > 
> > Thank you, Namjae and Dave, for driving this; and thank you, Ted and
> > Matthew, for raising appropriate mmap concerns (2013-7-31 and 2014-2-2).
> > I was aware of this work in progress, but only now found time to look.
> > 
> > I've not studied the implementation, knowing too little of ext4 and
> > xfs; but it sounds like the approach you've taken, writing out dirties
> > and truncating all pagecache from the critical offset onwards, is the
> > sensible approach for now - lame, and leaves me wondering whether an
> > offline tool wouldn't be more appropriate; but a safe place to start
> > if we suppose it will be extended to handle pagecache better in future.
> 
> Offline? You mean with the filesystem unmounted, or something else?

"Something else": which I don't expect you to leap into implementing.

> 
> > Of course I'm interested in the possibility of extending it to tmpfs;
> > which may not be a worthwhile exercise in itself, except that it would
> > force us to face and solve any pagecache/radixtree issues, if possible,
> > thereby enhancing the support for disk-based filesystems.
> > 
> > I doubt we should look into that before Jan Kara's range locking mods
> > arrive, or are rejected.  As I understand it, you're going ahead with
> > this, knowing that there can be awkward races with concurrent faults -
> > more likely to cause trinity fuzzer reports, than interfere with daily
> > usage (trinity seems to be good at faulting into holes being punched).
> 
> Yes, the caveat is that the applications that use it (TVs, DVRs, NLE
> applications, etc) typically don't use mmap for accessing the data
> stream being modified. Further, it's much less generally useful than
> holepunching, so when these two are combined, the likely exposure to
> issues resulting from mmap deficiencies are pretty damn low.

Agreed, but we do want to define how the interaction behaves, we want
it to be the same across all filesystems supporting COLLAPSE_RANGE,
and we don't want it to lead to system crashes or corruptions.

> 
> > That's probably the right pragmatic decision; but I'm a little worried
> > that it's justfied by saying we already have such races in hole-punch.
> > Collapse is significantly more challenging than either hole-punch or
> > truncation: the shifting of pages from one offset to another is new,
> > and might present nastier bugs.
> 
> Symptoms might be different, but it's exactly the same problem. i.e.
> mmap_sem locking inversions preventing the filesystem from
> serialising IO path operations like hole punch, truncate and other
> extent manipulation operations against concurrent page faults
> that enter the IO path.

That may (may) be true of the current kick-everything-out-of-pagecache
approach.  But in general I stand by "Collapse is significantly more
challenging".  Forgive me if that amounts to saying "Hey, here's a
more complicated way to do it.  Ooh, this way is more complicated."
The concept of moving a page from one file offset to another is new,
and can be expected to pose new difficulties.

> 
> > 

Re: [PATCHv7 6/7] zram: add set_max_streams knob

2014-02-25 Thread Minchan Kim
On Tue, Feb 25, 2014 at 02:34:32PM +0300, Sergey Senozhatsky wrote:
> This patch allows to change max_comp_streams on initialised zcomp.
> 
> Introduce zcomp set_max_streams() knob, zcomp_strm_multi_set_max_streams()
> and zcomp_strm_single_set_max_streams() callbacks to change streams limit
> for zcomp_strm_multi and zcomp_strm_single, accordingly. set_max_streams
> for single steam zcomp does nothing.
> 
> If user has lowered the limit, then zcomp_strm_multi_set_max_streams()
> attempts to immediately free extra streams (as much as it can, depending
> on idle streams availability).
> 
> Note, this patch does not allow to change stream 'policy' from single to
> multi stream (or vice versa) on already initialised compression backend,
> this will be introduced later.
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  drivers/block/zram/zcomp.c| 39 +++
>  drivers/block/zram/zcomp.h|  3 +++
>  drivers/block/zram/zram_drv.c |  7 ++-
>  3 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 1bcb70e..adfbfee 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -135,6 +135,32 @@ static void zcomp_strm_multi_put(struct zcomp *comp, 
> struct zcomp_strm *zstrm)
>   wake_up(&zs->strm_wait);
>  }
>  
> +/* change max_strm limit */
> +static void zcomp_strm_multi_set_max_streams(struct zcomp *comp, int 
> num_strm)
> +{
> + struct zcomp_strm_multi *zs = comp->stream;
> +
> + /* single stream handled as a special case by zcomp_strm_single */
> + if (num_strm < 2)
> + return;

I doubt this is a good idea. Just let the user set whatever they want, even
though it may hurt performance, because the user wants to get more free
memory and it doesn't cause any regression for the single stream side.

Additionally, if we enhance it with an adaptive lock scheme, the
performance problem would be gone. :)

But if you really want to limit it, we at least need to notify the
user. Maybe pr_info?
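
Something along these lines, perhaps (a sketch only; the message text is just
an example):

        /* single stream handled as a special case by zcomp_strm_single */
        if (num_strm < 2) {
                pr_info("zcomp: can't reduce an initialised multi-stream "
                        "backend below 2 streams\n");
                return;
        }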

> +
> + spin_lock(&zs->strm_lock);
> + zs->max_strm = num_strm;
> + /*
> +  * if user has lowered the limit and there are idle streams,
> +  * immediately free as much streams (and memory) as we can.
> +  */
> + while (atomic_read(&zs->avail_strm) > num_strm &&
> + !list_empty(&zs->idle_strm)) {
> + struct zcomp_strm *zstrm = list_entry(zs->idle_strm.next,
> + struct zcomp_strm, list);
> + list_del(&zstrm->list);
> + zcomp_strm_free(comp, zstrm);
> + atomic_dec(&zs->avail_strm);
> + }
> + spin_unlock(&zs->strm_lock);
> +}
> +
>  static void zcomp_strm_multi_destroy(struct zcomp *comp)
>  {
>   struct zcomp_strm_multi *zs = comp->stream;
> @@ -158,6 +184,7 @@ static int zcomp_strm_multi_create(struct zcomp *comp, 
> int num_strm)
>   comp->destroy = zcomp_strm_multi_destroy;
>   comp->strm_get = zcomp_strm_multi_get;
>   comp->strm_put = zcomp_strm_multi_put;
> + comp->set_max_streams = zcomp_strm_multi_set_max_streams;
>   zs = kmalloc(sizeof(struct zcomp_strm_multi), GFP_KERNEL);
>   comp->stream = zs;
>   if (!zs)
> @@ -193,6 +220,12 @@ static void zcomp_strm_single_put(struct zcomp *comp, 
> struct zcomp_strm *zstrm)
>   mutex_unlock(>strm_lock);
>  }
>  
> +static void zcomp_strm_single_set_max_streams(struct zcomp *comp, int 
> num_strm)
> +{
> + /* zcomp_strm_single support only max_comp_streams == 1 */

I think we should notify the user.
Maybe this function could return true or false and the caller could emit
some message with pr_info?


> + return;
> +}
> +
>  static void zcomp_strm_single_destroy(struct zcomp *comp)
>  {
>   struct zcomp_strm_single *zs = comp->stream;
> @@ -207,6 +240,7 @@ static int zcomp_strm_single_create(struct zcomp *comp)
>   comp->destroy = zcomp_strm_single_destroy;
>   comp->strm_get = zcomp_strm_single_get;
>   comp->strm_put = zcomp_strm_single_put;
> + comp->set_max_streams = zcomp_strm_single_set_max_streams;
>   zs = kmalloc(sizeof(struct zcomp_strm_single), GFP_KERNEL);
>   comp->stream = zs;
>   if (!zs)
> @@ -221,6 +255,11 @@ static int zcomp_strm_single_create(struct zcomp *comp)
>   return 0;
>  }
>  
> +void zcomp_set_max_streams(struct zcomp *comp, int num_strm)
> +{
> + comp->set_max_streams(comp, num_strm);
> +}
> +
>  struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
>  {
>   return comp->strm_get(comp);
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> index 5514509..9645e16 100644
> --- a/drivers/block/zram/zcomp.h
> +++ b/drivers/block/zram/zcomp.h
> @@ -39,6 +39,7 @@ struct zcomp {
>  
>   struct zcomp_strm *(*strm_get)(struct zcomp *comp);
>   void (*strm_put)(struct zcomp *comp, struct zcomp_strm *zstrm);
> + void (*set_max_streams)(struct zcomp *comp, int num_strm);
>   void (*destroy)(struct zcomp *comp);
>  };
>  
> @@ -53,4 +54,6 

Re: [PATCHv7 3/7] zram: factor out single stream compression

2014-02-25 Thread Minchan Kim
Hello Sergey,

On Tue, Feb 25, 2014 at 02:34:29PM +0300, Sergey Senozhatsky wrote:
> This is preparation patch to add multi stream support to zcomp.
> 
> Introduce struct zcomp_strm_single and a set of functions to manage zcomp_strm
> stream access. zcomp_strm_single implements single compession stream, same way
> as current zcomp implementation. This moves zcomp_strm stream control and
> locking from zcomp, so compressing backend zcomp is not aware of required
> locking (single and multi streams require different locking schemes).

Please add why we need a different locking scheme here, so that people
can understand why we need this via the git log in the future.

> 
> The following set of functions added:
> - zcomp_strm_single_get()/zcomp_strm_single_put()
>   get and put compression stream, implement required locking
> - zcomp_strm_single_create()/zcomp_strm_single_destroy()
>   create and destroy zcomp_strm_single
> 
> New ->strm_get() and ->strm_put() callbacks added to zcomp, which are set to
> zcomp_strm_single_get() and zcomp_strm_single_put() during initialisation.
> Instead of direct locking and zcomp_strm access from zcomp_strm_get() and
> zcomp_strm_put(), zcomp now calls ->strm_get() and ->strm_put()
> correspondingly.
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  drivers/block/zram/zcomp.c | 65 
> +++---
>  drivers/block/zram/zcomp.h |  7 +++--
>  2 files changed, 61 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 947efe3..e20054b 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -15,6 +15,14 @@
>  
>  #include "zcomp.h"
>  
> +/*
> + * single zcomp_strm backend
> + */
> +struct zcomp_strm_single {
> + struct mutex strm_lock;
> + struct zcomp_strm *zstrm;
> +};
> +
>  extern struct zcomp_backend zcomp_lzo;
>  
>  static struct zcomp_backend *find_backend(const char *compress)
> @@ -55,17 +63,58 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp 
> *comp)
>   return zstrm;
>  }
>  
> +static struct zcomp_strm *zcomp_strm_single_get(struct zcomp *comp)
> +{
> + struct zcomp_strm_single *zs = comp->stream;
> + mutex_lock(&zs->strm_lock);
> + return zs->zstrm;
> +}
> +
> +static void zcomp_strm_single_put(struct zcomp *comp, struct zcomp_strm 
> *zstrm)
> +{
> + struct zcomp_strm_single *zs = comp->stream;
> + mutex_unlock(&zs->strm_lock);
> +}
> +
> +static void zcomp_strm_single_destroy(struct zcomp *comp)
> +{
> + struct zcomp_strm_single *zs = comp->stream;
> + zcomp_strm_free(comp, zs->zstrm);
> + kfree(zs);
> +}
> +
> +static int zcomp_strm_single_create(struct zcomp *comp)
> +{
> + struct zcomp_strm_single *zs;
> +
> + comp->destroy = zcomp_strm_single_destroy;
> + comp->strm_get = zcomp_strm_single_get;
> + comp->strm_put = zcomp_strm_single_put;
> + zs = kmalloc(sizeof(struct zcomp_strm_single), GFP_KERNEL);
> + comp->stream = zs;
> + if (!zs)
> + return -ENOMEM;

First check zs for NULL and then assign zs to comp->stream.
Yes, your code doesn't have any problem, but let's follow the normal
convention.

> +
> + mutex_init(&zs->strm_lock);
> + zs->zstrm = zcomp_strm_alloc(comp);
> + if (!zs->zstrm) {
> + zcomp_strm_single_destroy(comp);

Let's just call kfree() here instead of xxx_destroy().
Such a paired function call is clearer to me than using a wrapper
function.


> + return -ENOMEM;
> + }
> + return 0;
> +}
> +
>  struct zcomp_strm *zcomp_strm_get(struct zcomp *comp)
>  {
> - mutex_lock(&comp->strm_lock);
> - return comp->zstrm;
> + return comp->strm_get(comp);
>  }
>  
>  void zcomp_strm_put(struct zcomp *comp, struct zcomp_strm *zstrm)
>  {
> - mutex_unlock(&comp->strm_lock);
> + comp->strm_put(comp, zstrm);
>  }
>  
> +/* compress page */

The function name is clear, so I think we don't need a comment.
If we do need such a comment, it should be introduced in the previous
patches, not this one.

>  int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
>   const unsigned char *src, size_t *dst_len)
>  {
> @@ -73,6 +122,7 @@ int zcomp_compress(struct zcomp *comp, struct zcomp_strm 
> *zstrm,
>   zstrm->private);
>  }
>  
> +/* decompress page */
>  int zcomp_decompress(struct zcomp *comp, const unsigned char *src,
>   size_t src_len, unsigned char *dst)
>  {
> @@ -81,7 +131,7 @@ int zcomp_decompress(struct zcomp *comp, const unsigned 
> char *src,
>  
>  void zcomp_destroy(struct zcomp *comp)
>  {
> - zcomp_strm_free(comp, comp->zstrm);
> + comp->destroy(comp);
>   kfree(comp);
>  }
>  
> @@ -105,11 +155,8 @@ struct zcomp *zcomp_create(const char *compress)
>   return NULL;
>  
>   comp->backend = backend;
> - mutex_init(&comp->strm_lock);
> -
> - comp->zstrm = zcomp_strm_alloc(comp);
> - if (!comp->zstrm) {
> - kfree(comp);
> + if 

Re: [PATCHv7 4/7] zram: add multi stream functionality

2014-02-25 Thread Minchan Kim
On Tue, Feb 25, 2014 at 02:34:30PM +0300, Sergey Senozhatsky wrote:
> This patch implements multi stream compression support.
> 
> Introduce struct zcomp_strm_multi and a set of functions to manage
> zcomp_strm stream access. zcomp_strm_multi has a list of idle zcomp_strm
> structs, spinlock to protect idle list and wait queue, making it possible
> to perform parallel compressions.
> 
> The following set of functions added:
> - zcomp_strm_multi_get()/zcomp_strm_multi_put()
>   get and put compression stream, implement required locking
> - zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
>   create and destroy zcomp_strm_multi
> 
> zcomp ->strm_get() and ->strm_put() callbacks are set during initialisation
> to zcomp_strm_multi_get()/zcomp_strm_multi_put() correspondingly.
> 
> Each time zcomp issues a zcomp_strm_multi_get() call, the following set of
> operations performed:
> - spin lock strm_lock
> - if idle list is not empty, remove zcomp_strm from idle list, spin
>   unlock and return zcomp stream pointer to caller
> - if idle list is empty, current adds itself to wait queue. it will be
>   awaken by zcomp_strm_multi_put() caller.
> 
> zcomp_strm_multi_put():
> - spin lock strm_lock
> - add zcomp stream to idle list
> - spin unlock, wake up sleeper
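
For reference, a rough sketch of the get path described above (illustration
only; on-demand stream allocation and error handling are omitted):

        static struct zcomp_strm *zcomp_strm_multi_get(struct zcomp *comp)
        {
                struct zcomp_strm_multi *zs = comp->stream;
                struct zcomp_strm *zstrm;

                while (1) {
                        spin_lock(&zs->strm_lock);
                        if (!list_empty(&zs->idle_strm)) {
                                zstrm = list_entry(zs->idle_strm.next,
                                                struct zcomp_strm, list);
                                list_del(&zstrm->list);
                                spin_unlock(&zs->strm_lock);
                                return zstrm;
                        }
                        spin_unlock(&zs->strm_lock);
                        /* woken up by zcomp_strm_multi_put() returning a stream */
                        wait_event(zs->strm_wait, !list_empty(&zs->idle_strm));
                }
        }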
> 
> Minchan Kim reported that spinlock-based locking scheme has demonstrated a
> severe perfomance regression for single compression stream case, comparing
> to mutex-based (https://lkml.org/lkml/2014/2/18/16)
> 
> base  spinlockmutex
> 
> ==Initial write   ==Initial write ==Initial  write
> records:  5   records:  5 records:   5
> avg:  1642424.35  avg:  699610.40 avg:   1655583.71
> std:  39890.95(2.43%) std:  232014.19(33.16%) std:   52293.96
> max:  1690170.94  max:  1163473.45max:   1697164.75
> min:  1568669.52  min:  573429.88 min:   1553410.23
> ==Rewrite ==Rewrite   ==Rewrite
> records:  5   records:  5 records:   5
> avg:  1611775.39  avg:  501406.64 avg:   1684419.11
> std:  17144.58(1.06%) std:  15354.41(3.06%)   std:   18367.42
> max:  1641800.95  max:  531356.78 max:   1706445.84
> min:  1593515.27  min:  488817.78 min:   1655335.73
> ==Random  write   ==Random  write ==Random   write
> records:  5   records:  5 records:   5
> avg:  1626318.29  avg:  497250.78 avg:   1695582.06
> std:  38550.23(2.37%) std:  1405.42(0.28%)std:   9211.98
> max:  1665839.62  max:  498585.88 max:   1703808.22
> min:  1562141.21  min:  494526.45 min:   1677664.94
> ==Pwrite  ==Pwrite==Pwrite
> records:  5   records:  5 records:   5
> avg:  1654641.25  avg:  581709.22 avg:   1641452.34
> std:  47202.59(2.85%) std:  9670.46(1.66%)std:   38963.62
> max:  1740682.36  max:  591300.09 max:   1687387.69
> min:  1611436.34  min:  564324.38 min:   1570496.11
> 
> When only one compression stream available, mutex with spin on owner tends
> to perform much better than frequent wait_event()/wake_up(). This is why
> single stream implemented as a special case with mutex locking.
> 
> This is preparation patch, later patches will use this code.

I'm not sure it's good to introduce a preparation patch without the code
that uses it, since that could cause a build warning; they would need to be
built together. It surely makes code review easier, though, so I'm not sure
what the best way to handle this case is.

Looking at the next patch, it's not complex, so I think we could merge this
patch and the next one.

> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  drivers/block/zram/zcomp.c | 119 
> -
>  1 file changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index e20054b..6f238f5 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -15,6 +15,8 @@
>  
>  #include "zcomp.h"
>  
> +extern struct zcomp_backend zcomp_lzo;
> +
>  /*
>   * single zcomp_strm backend
>   */
> @@ -23,7 +25,20 @@ struct zcomp_strm_single {
>   struct zcomp_strm *zstrm;
>  };
>  
> -extern struct zcomp_backend zcomp_lzo;
> +/*
> + * multi zcomp_strm backend
> + */
> +struct zcomp_strm_multi {
> + /* protect strm list */
> + spinlock_t strm_lock;
> + /* max possible number of zstrm streams */
> + int max_strm;
> + /* number of available zstrm streams */
> + atomic_t avail_strm;
> + /* list of available strms */
> + struct list_head idle_strm;
> + wait_queue_head_t strm_wait;
> +};
>  

Re: [PATCHv7 5/7] zram: enable multi stream compression support in zram

2014-02-25 Thread Minchan Kim
On Tue, Feb 25, 2014 at 02:34:31PM +0300, Sergey Senozhatsky wrote:
> 1) Introduce zram device attribute max_comp_streams to show and store
> current zcomp's max number of zcomp streams (num_strm).
> 
> 2) Extend zcomp zcomp_create() with `num_strm' parameter. `num_strm'
> limits the number of zcomp_strm structs in compression backend's idle
> list (max_comp_streams).
> 
> max_comp_streams used during initialisation as follows:
> -- passing to zcomp_create() num_strm equals to 1 will initialise zcomp
> using single compression stream zcomp_strm_single (mutex-based locking).
> -- passing to zcomp_create() num_strm greater than 1 will initialise zcomp
> using multi compression stream zcomp_strm_multi (spinlock-based locking).
> 
> default max_comp_streams value is 1, meaning that zram with single stream
> will be initialised.
> 
> Later patch will introduce configuration knob to change max_comp_streams
> on already initialised and used zcomp.
> 
> iozone -t 3 -R -r 16K -s 60M -I +Z
> 
>test   base   1 strm (mutex) 3 strm (spinlock)
> ---
>  Initial write  589286.78   583518.39  718011.05
>Rewrite  604837.97   596776.38 1515125.72
>   Random write  584120.11   595714.58 1388850.25
> Pwrite  535731.17   541117.38  739295.27
> Fwrite 1418083.88  1478612.72 1484927.06
> 
> Usage example:
> set max_comp_streams to 4
> echo 4 > /sys/block/zram0/max_comp_streams
> 
> show current max_comp_streams (default value is 1).
> cat /sys/block/zram0/max_comp_streams
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  drivers/block/zram/zcomp.c|  9 +++--
>  drivers/block/zram/zcomp.h|  2 +-
>  drivers/block/zram/zram_drv.c | 42 +-
>  drivers/block/zram/zram_drv.h |  2 +-
>  4 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 6f238f5..1bcb70e 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -258,8 +258,9 @@ void zcomp_destroy(struct zcomp *comp)
>   * if requested algorithm is not supported or in case
>   * of init error
>   */
> -struct zcomp *zcomp_create(const char *compress)
> +struct zcomp *zcomp_create(const char *compress, int num_strm)

Let's use max_strm.

>  {
> + int ret;

No need.

>   struct zcomp *comp;
>   struct zcomp_backend *backend;
>  
> @@ -272,7 +273,11 @@ struct zcomp *zcomp_create(const char *compress)
>   return NULL;
>  
>   comp->backend = backend;
> - if (zcomp_strm_single_create(comp) != 0) {
> + if (num_strm > 1)
> + ret = zcomp_strm_multi_create(comp, num_strm);
> + else
> + ret = zcomp_strm_single_create(comp);
> + if (ret != 0) {
>   zcomp_destroy(comp);
>   return NULL;
>   }
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> index 861e04d..5514509 100644
> --- a/drivers/block/zram/zcomp.h
> +++ b/drivers/block/zram/zcomp.h
> @@ -42,7 +42,7 @@ struct zcomp {
>   void (*destroy)(struct zcomp *comp);
>  };
>  
> -struct zcomp *zcomp_create(const char *comp);
> +struct zcomp *zcomp_create(const char *comp, int num_strm);
>  void zcomp_destroy(struct zcomp *comp);
>  
>  struct zcomp_strm *zcomp_strm_get(struct zcomp *comp);
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 569ff58..42b9c7f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -108,6 +108,40 @@ static ssize_t mem_used_total_show(struct device *dev,
>   return sprintf(buf, "%llu\n", val);
>  }
>  
> +static ssize_t max_comp_streams_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + int val;
> + struct zram *zram = dev_to_zram(dev);
> +
> + down_read(&zram->init_lock);
> + val = zram->max_comp_streams;
> + up_read(&zram->init_lock);
> +
> + return sprintf(buf, "%d\n", val);
> +}
> +
> +static ssize_t max_comp_streams_store(struct device *dev,
> + struct device_attribute *attr, const char *buf, size_t len)
> +{
> + int num;
> + struct zram *zram = dev_to_zram(dev);
> +
> + if (kstrtoint(buf, 0, &num))
> + return -EINVAL;
> + if (num < 1)
> + return -EINVAL;
> + down_write(&zram->init_lock);
> + if (init_done(zram)) {
> + up_write(&zram->init_lock);
> + pr_info("Can't set max_comp_streams for initialized device\n");
> + return -EBUSY;
> + }
> + zram->max_comp_streams = num;
> + up_write(&zram->init_lock);
> + return len;
> +}
> +
>  /* flag operations needs meta->tb_lock */
>  static int zram_test_flag(struct zram_meta *meta, u32 index,
>   enum zram_pageflags flag)
> @@ -502,6 +536,8 @@ static void 

Re: [PATCH] clk: samsung: remove parentheses from return statements

2014-02-25 Thread Sachin Kamat
Hi Pankaj,

On 26 February 2014 08:14, Pankaj Dubey  wrote:
> fixed following checkpatch warning message
> "return is not a function, parentheses are not required"
>
> Signed-off-by: Pankaj Dubey 

Similar patch has already been submitted:
http://comments.gmane.org/gmane.linux.ports.arm.kernel/294530


-- 
With warm regards,
Sachin


Re: [PATCH v2] mm: per-thread vma caching

2014-02-25 Thread Davidlohr Bueso
On Tue, 2014-02-25 at 18:04 -0800, Michel Lespinasse wrote:
> On Tue, Feb 25, 2014 at 10:16 AM, Davidlohr Bueso  wrote:
> > This patch is a continuation of efforts trying to optimize find_vma(),
> > avoiding potentially expensive rbtree walks to locate a vma upon faults.
> > The original approach (https://lkml.org/lkml/2013/11/1/410), where the
> > largest vma was also cached, ended up being too specific and random, thus
> > further comparison with other approaches were needed. There are two things
> > to consider when dealing with this, the cache hit rate and the latency of
> > find_vma(). Improving the hit-rate does not necessarily translate in finding
> > the vma any faster, as the overhead of any fancy caching schemes can be too
> > high to consider.
> 
> Actually there is also the cost of keeping the cache up to date. I'm
> not saying that it's an issue in your proposal - I like the proposal,
> especially now that you are replacing the per-mm cache rather than
> adding something on top - but it is a factor to consider.

True, although numbers show that the cost of maintaining the cache is
quite minimal. Invalidations are a free lunch (except in the rare event
of a seqnum overflow), so the updating part would consume the most
cycles, but then again, the hit rate is quite good so I'm not worried
about that either.

> 
> > +static inline void __vmacache_invalidate(struct mm_struct *mm)
> > +{
> > +#ifdef CONFIG_MMU
> > +   vmacache_invalidate(mm);
> > +#else
> > +   mm->vmacache = NULL;
> > +#endif
> > +}
> 
> Is there any reason why we can't use your proposal for !CONFIG_MMU as well ?
> (I'm assuming that we could reduce preprocessor checks by doing so)

Based on Linus' feedback today, I'm getting rid of this ugliness and
trying to have per-thread caches for both configs.

> > +void vmacache_invalidate_all(void)
> > +{
> > +   struct task_struct *g, *p;
> > +
> > +   rcu_read_lock();
> > +   for_each_process_thread(g, p) {
> > +   /*
> > +* Only flush the vmacache pointers as the
> > +* mm seqnum is already set and curr's will
> > +* be set upon invalidation when the next
> > +* lookup is done.
> > +*/
> > +   memset(p->vmacache, 0, sizeof(p->vmacache));
> > +   }
> > +   rcu_read_unlock();
> > +}
> 
> Two things:
> 
> - I believe we only need to invalidate vma caches for threads that
> share a given mm ? we should probably pass in that mm in order to
> avoid over-invalidation

I think you're right; since the overflows will always occur on
mm->seqnum, tasks that do not share the mm shouldn't be affected.

So the danger here is that when a lookup occurs, vmacache_valid() will
return true, having:

mm == curr->mm && mm->vmacache_seqnum == curr->vmacache_seqnum (both 0).

Then we just iterate the cache and potentially return some bogus vma.

However, since we're now going to reset the seqnum on every fork/clone
(before it was just the oldmm->seqnum + 1 thing), I doubt we'll ever
overflow.
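
(For reference, the check I have in mind is roughly the following -- just a
sketch, the exact field names can still change in v3:)

static bool vmacache_valid(struct mm_struct *mm)
{
	struct task_struct *curr = current;

	if (mm != curr->mm)
		return false;

	if (mm->vmacache_seqnum != curr->vmacache_seqnum) {
		/*
		 * First lookup after an invalidation: resync the
		 * seqnum and flush this thread's cache slots.
		 */
		curr->vmacache_seqnum = mm->vmacache_seqnum;
		memset(curr->vmacache, 0, sizeof(curr->vmacache));
		return false;
	}

	return true;
}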

> - My understanding is that the operation is safe because the caller
> has the mm's mmap_sem held for write, and other threads accessing the
> vma cache will have mmap_sem held at least for read, so we don't need
> extra locking to maintain the vma cache. 

Yes, that's how I see things as well.

> Please 1- confirm this is the
> intention, 2- document this, and 3- only invalidate vma caches for
> threads that match the caller's mm so that mmap_sem locking can
> actually apply.

Will do.

> > +struct vm_area_struct *vmacache_find(struct mm_struct *mm,
> > +unsigned long addr)
> > +
> > +{
> > +   int i;
> > +
> > +   if (!vmacache_valid(mm))
> > +   return NULL;
> > +
> > +   for (i = 0; i < VMACACHE_SIZE; i++) {
> > +   struct vm_area_struct *vma = current->vmacache[i];
> > +
> > +   if (vma && vma->vm_start <= addr && vma->vm_end > addr)
> > +   return vma;
> > +   }
> > +
> > +   return NULL;
> > +}
> > +
> > +void vmacache_update(struct mm_struct *mm, unsigned long addr,
> > +struct vm_area_struct *newvma)
> > +{
> > +   /*
> > +* Hash based on the page number. Provides a good
> > +* hit rate for workloads with good locality and
> > +* those with random accesses as well.
> > +*/
> > +   int idx = (addr >> PAGE_SHIFT) & 3;
> > +   current->vmacache[idx] = newvma;
> > +}
> 
> I did read the previous discussion about how to compute idx here. I
> did not at the time realize that you are searching all 4 vmacache
> entries on lookup - that is, we are only talking about eviction policy
> here, not a lookup hash policy.

Right.

> My understanding is that the reason both your current and your
> previous idx computations work, is that a random eviction policy would
> work too. Basically, what you do is pick some address 

linux-next: manual merge of the crypto tree with the arm-soc tree

2014-02-25 Thread Stephen Rothwell
Hi Herbert,

Today's linux-next merge of the crypto tree got a conflict in
drivers/char/hw_random/Kconfig between commit 2257ffbca73c ("hwrng: msm:
switch Kconfig to ARCH_QCOM depends") from the arm-soc tree and commit
f9bee046c915 ("hwrng: msm: switch Kconfig to ARCH_QCOM depends") from the
crypto tree.

Two versions of the same patch ... I just kept the arm-soc version.

-- 
Cheers,
Stephen Rothwell 




[PATCH v2 3/5] bug: When !CONFIG_BUG, make WARN call no_printk to check format and args

2014-02-25 Thread Josh Triplett
The stub version of WARN for !CONFIG_BUG completely ignored its format
string and subsequent arguments; make it check them instead, using
no_printk.

Reported-by: Arnd Bergmann 
Signed-off-by: Josh Triplett 
---
 include/asm-generic/bug.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 2d54d8d..a97fa11 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -155,6 +155,7 @@ extern void warn_slowpath_null(const char *file, const int 
line);
 #ifndef WARN
 #define WARN(condition, format...) ({  \
int __ret_warn_on = !!(condition);  \
+   no_printk(format);  \
unlikely(__ret_warn_on);\
 })
 #endif
-- 
1.9.0



[PATCH v2 4/5] bug: Use a common definition of BUG_ON regardless of CONFIG_BUG

2014-02-25 Thread Josh Triplett
include/asm-generic/bug.h defines BUG_ON to call BUG() if CONFIG_BUG=y,
or as a no-op if !CONFIG_BUG.  However, BUG() is already a no-op if
!CONFIG_BUG, making this pointless.  Use a common definition that always
calls BUG().

This does not change the compiled code at all.

Signed-off-by: Josh Triplett 
---
 include/asm-generic/bug.h | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index a97fa11..653c44a 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -51,10 +51,6 @@ struct bug_entry {
 } while (0)
 #endif
 
-#ifndef HAVE_ARCH_BUG_ON
-#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
-#endif
-
 /*
  * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
  * significant issues that need prompt attention if they should ever
@@ -141,10 +137,6 @@ extern void warn_slowpath_null(const char *file, const int 
line);
 #define BUG() do {} while (0)
 #endif
 
-#ifndef HAVE_ARCH_BUG_ON
-#define BUG_ON(condition) do { if (condition) ; } while (0)
-#endif
-
 #ifndef HAVE_ARCH_WARN_ON
 #define WARN_ON(condition) ({  \
int __ret_warn_on = !!(condition);  \
@@ -167,6 +159,10 @@ extern void warn_slowpath_null(const char *file, const int 
line);
 
 #endif
 
+#ifndef HAVE_ARCH_BUG_ON
+#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
+#endif
+
 /*
  * WARN_ON_SMP() is for cases that the warning is either
  * meaningless for !SMP or may even cause failures.
-- 
1.9.0



[PATCH v2 2/5] include/asm-generic/bug.h: Style fix: s/while(0)/while (0)/

2014-02-25 Thread Josh Triplett
Reported-by: Randy Dunlap 
Signed-off-by: Josh Triplett 
---
 include/asm-generic/bug.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 7ecd398..2d54d8d 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -52,7 +52,7 @@ struct bug_entry {
 #endif
 
 #ifndef HAVE_ARCH_BUG_ON
-#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while(0)
+#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
 #endif
 
 /*
@@ -138,11 +138,11 @@ extern void warn_slowpath_null(const char *file, const 
int line);
 
 #else /* !CONFIG_BUG */
 #ifndef HAVE_ARCH_BUG
-#define BUG() do {} while(0)
+#define BUG() do {} while (0)
 #endif
 
 #ifndef HAVE_ARCH_BUG_ON
-#define BUG_ON(condition) do { if (condition) ; } while(0)
+#define BUG_ON(condition) do { if (condition) ; } while (0)
 #endif
 
 #ifndef HAVE_ARCH_WARN_ON
-- 
1.9.0



[PATCH v2 5/5] bug: Make BUG() call unreachable()

2014-02-25 Thread Josh Triplett
This effectively causes BUG() to act like a function with the noreturn
attribute, which prevents GCC from warning about the code that follows
BUG() (for instance, warning about not returning a value in a non-void
function after calling BUG()).
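
A contrived example of the kind of warning this silences (not from the
kernel tree), built with CONFIG_BUG=n:

static int pick(int x)
{
	switch (x) {
	case 0:
		return 10;
	case 1:
		return 20;
	}
	/*
	 * With BUG() expanding to an empty statement gcc warns that
	 * control reaches the end of a non-void function; with
	 * unreachable() in the expansion it does not.
	 */
	BUG();
}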

This actually makes the kernel smaller; bloat-o-meter summary:
add/remove: 2/7 grow/shrink: 34/57 up/down: 475/-1233 (-758)

Signed-off-by: Josh Triplett 
---
 include/asm-generic/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 653c44a..5f69248 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -134,7 +134,7 @@ extern void warn_slowpath_null(const char *file, const int 
line);
 
 #else /* !CONFIG_BUG */
 #ifndef HAVE_ARCH_BUG
-#define BUG() do {} while (0)
+#define BUG() do { unreachable(); } while (0)
 #endif
 
 #ifndef HAVE_ARCH_WARN_ON
-- 
1.9.0



Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters

2014-02-25 Thread Michael Ellerman
On Tue, 2014-02-25 at 13:31 -0800, Cody P Schafer wrote:
> On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> > On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
> >> Signed-off-by: Cody P Schafer 
> >> ---
> >>   arch/powerpc/perf/Makefile | 2 ++
> >>   arch/powerpc/platforms/Kconfig.cputype | 6 ++
> >>   2 files changed, 8 insertions(+)
> >>
> >> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> >> index 60d71ee..f9c083a 100644
> >> --- a/arch/powerpc/perf/Makefile
> >> +++ b/arch/powerpc/perf/Makefile
> >> @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)+= mpc7450-pmu.o
> >>   obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
> >>   obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
> >>
> >> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
> >> +
> >>   obj-$(CONFIG_PPC64)  += $(obj64-y)
> >>   obj-$(CONFIG_PPC32)  += $(obj32-y)
> >> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> >> b/arch/powerpc/platforms/Kconfig.cputype
> >> index 434fda3..dcc67cd 100644
> >> --- a/arch/powerpc/platforms/Kconfig.cputype
> >> +++ b/arch/powerpc/platforms/Kconfig.cputype
> >> @@ -364,6 +364,12 @@ config PPC_PERF_CTRS
> >>  help
> >>This enables the powerpc-specific perf_event back-end.
> >>
> >> +config HV_PERF_CTRS
> >> +   def_bool y
> >
> > This was bool, why did you change it?
> 
> No, it wasn't. v1 also had def_bool. https://lkml.org/lkml/2014/1/16/518
> Maybe you're confusing v2.1 and v2 of this patch?

Er yes. Point releases of a patch series confuse me :)

> >> +   depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
> >
> > Should be:
> >
> > depends on PERF_EVENTS && PPC_PSERIES
> >
> >> +   help
> >> + Enable access to perf counters provided by the hypervisor
> >> +
> 
> Yep, the v2.1 patch (which I bungled and labeled as 9/11) already 
> changes both of these.
> It'll end up rolled into v3.

Yes please.

cheers




[PATCH v2 1/5] bug: When !CONFIG_BUG, simplify WARN_ON_ONCE and family

2014-02-25 Thread Josh Triplett
When !CONFIG_BUG, WARN_ON and family become simple passthroughs of their
condition argument; however, WARN_ON_ONCE and family still have
conditions and a boolean to detect one-time invocation, even though the
warning they'd emit doesn't exist.  Make the existing definitions
conditional on CONFIG_BUG, and add definitions for !CONFIG_BUG that map
to the passthrough versions of WARN and WARN_ON.

This saves 4.4k on a minimized configuration (smaller than
allnoconfig), and 20.6k with defconfig plus CONFIG_BUG=n.

Signed-off-by: Josh Triplett 
---
v2: Incorporate feedback from Arnd Bergmann: make the WARN_* variants with
format strings and arguments call WARN and pass along those arguments,
rather than calling WARN_ON.  Used by the new patch 3, which makes the stub
version of WARN call no_printk.

 include/asm-generic/bug.h | 57 +--
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 7d10f96..7ecd398 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -106,33 +106,6 @@ extern void warn_slowpath_null(const char *file, const int 
line);
unlikely(__ret_warn_on);\
 })
 
-#else /* !CONFIG_BUG */
-#ifndef HAVE_ARCH_BUG
-#define BUG() do {} while(0)
-#endif
-
-#ifndef HAVE_ARCH_BUG_ON
-#define BUG_ON(condition) do { if (condition) ; } while(0)
-#endif
-
-#ifndef HAVE_ARCH_WARN_ON
-#define WARN_ON(condition) ({  \
-   int __ret_warn_on = !!(condition);  \
-   unlikely(__ret_warn_on);\
-})
-#endif
-
-#ifndef WARN
-#define WARN(condition, format...) ({  \
-   int __ret_warn_on = !!(condition);  \
-   unlikely(__ret_warn_on);\
-})
-#endif
-
-#define WARN_TAINT(condition, taint, format...) WARN_ON(condition)
-
-#endif
-
 #define WARN_ON_ONCE(condition)({  \
static bool __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition);\
@@ -163,6 +136,36 @@ extern void warn_slowpath_null(const char *file, const int 
line);
unlikely(__ret_warn_once);  \
 })
 
+#else /* !CONFIG_BUG */
+#ifndef HAVE_ARCH_BUG
+#define BUG() do {} while(0)
+#endif
+
+#ifndef HAVE_ARCH_BUG_ON
+#define BUG_ON(condition) do { if (condition) ; } while(0)
+#endif
+
+#ifndef HAVE_ARCH_WARN_ON
+#define WARN_ON(condition) ({  \
+   int __ret_warn_on = !!(condition);  \
+   unlikely(__ret_warn_on);\
+})
+#endif
+
+#ifndef WARN
+#define WARN(condition, format...) ({  \
+   int __ret_warn_on = !!(condition);  \
+   unlikely(__ret_warn_on);\
+})
+#endif
+
+#define WARN_ON_ONCE(condition) WARN_ON(condition)
+#define WARN_ONCE(condition, format...) WARN(condition, format)
+#define WARN_TAINT(condition, taint, format...) WARN(condition, format)
+#define WARN_TAINT_ONCE(condition, taint, format...) WARN(condition, format)
+
+#endif
+
 /*
  * WARN_ON_SMP() is for cases that the warning is either
  * meaningless for !SMP or may even cause failures.
-- 
1.9.0



Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore

2014-02-25 Thread Michael Ellerman
On Wed, 2014-02-26 at 09:40 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-02-25 at 13:29 +0530, Deepthi Dharwar wrote:
> > We currently do not use smt-snooze-delay in the kernel.
> > The sysfs entries need to be retained until we clean up the ppc64_cpu
> > util that uses these entries to determine SMT; a clean-up patch for
> > this has already been posted by Prerna.
> > Once we have the ppc64_cpu changes in, we can look at cleaning up these
> > parts from the kernel.
> 
> We generally shouldn't change user visible interfaces.
> 
> People still have old versions of ppc64_cpu, we must not break them

Yeah we can't remove the file entirely, at least for a few more years.

ppc64_cpu should never have used that file to determine if a cpu existed, but
it did, so we're stuck with it.

What we can do is remove the unused percpu, and just leave the file in sysfs,
and have it print a warning when anyone touches it.
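
ie. keep the file but turn it into a nop that whinges, roughly like the
below (untested sketch, the show side would get the same treatment):

static ssize_t store_smt_snooze_delay(struct device *dev,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
{
	pr_warn_once("smt-snooze-delay is deprecated and has no effect\n");
	return count;
}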

cheers




[PATCH 1/3] perf, machine: Use map as success in ip__resolve_ams

2014-02-25 Thread Don Zickus
When trying to map a bunch of instruction addresses to their respective
threads, I kept getting a lot of bogus entries [I forget the exact reason
as I patched my code months ago].

Looking through ip__resolve_ams, I noticed the check for

if (al.sym)

and realized that most times I have al.map defined but sometimes
al.sym is undefined.  In the cases where al.sym is undefined, the loop
keeps going even though a valid al.map exists.

Modify this check to use the more reliable al.map.  This fixed my bogus
entries.

Signed-off-by: Don Zickus 
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ac37d78..813e94e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1213,7 +1213,7 @@ static void ip__resolve_ams(struct machine *machine, 
struct thread *thread,
 */
thread__find_addr_location(thread, machine, m, MAP__FUNCTION,
ip, );
-   if (al.sym)
+   if (al.map)
goto found;
}
 found:
-- 
1.7.11.7



[PATCH 3/3] perf: fix synthesizing mmaps for threads

2014-02-25 Thread Don Zickus
Currently if a process creates a bunch of threads using pthread_create
and then perf is run in system_wide mode, the mmaps for those threads
are not captured with a synthesized mmap event.

The reason is that those threads are not visible when walking the /proc/
directory looking for /proc/<pid>/maps files.  Instead they are discovered
using the /proc/<pid>/tasks file (which the synthesized comm event uses).

This causes problems when a program is trying to map a data address to a
tid.  Because the tid has no maps, the event is dropped.  Changing the program
to look up using the pid instead of the tid finds the correct maps, but creates
ugly hacks in the program to carry the correct tid around.
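
A rough reproducer (not part of this patch): each thread mmaps and touches a
private region, then everything sleeps so "perf record -a" can be run
system-wide against it:

/* build with: cc -pthread repro.c -o repro */
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *worker(void *arg)
{
	size_t len = 1 << 20;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p != MAP_FAILED)
		memset(p, 1, len);	/* fault the pages in */
	pause();
	return arg;
}

int main(void)
{
	pthread_t tid[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	pause();	/* now run: perf record -a sleep 10 */
	return 0;
}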

Fix this by synthesizing mmap events for each tid found in the
/proc/<pid>/tasks file.

This may not be entirely clean but it seems to work.

Signed-off-by: Don Zickus 
---
 tools/perf/util/event.c   | 15 +++
 tools/perf/util/machine.c |  4 ++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 086c7c8..09c53bb 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -93,10 +93,13 @@ static pid_t perf_event__get_comm_tgid(pid_t pid, char 
*comm, size_t len)
 }
 
 static pid_t perf_event__synthesize_comm(struct perf_tool *tool,
-union perf_event *event, pid_t pid,
+union perf_event *event,
+union perf_event *mmap_event,
+pid_t pid,
 int full,
 perf_event__handler_t process,
-struct machine *machine)
+struct machine *machine,
+bool mmap_data)
 {
char filename[PATH_MAX];
size_t size;
@@ -168,6 +171,10 @@ static pid_t perf_event__synthesize_comm(struct perf_tool 
*tool,
tgid = -1;
break;
}
+
+   /* process the thread's maps too */
+   perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
+ process, machine, mmap_data);
}
 
closedir(tasks);
@@ -331,8 +338,8 @@ static int __event__synthesize_thread(union perf_event 
*comm_event,
  struct perf_tool *tool,
  struct machine *machine, bool mmap_data)
 {
-   pid_t tgid = perf_event__synthesize_comm(tool, comm_event, pid, full,
-process, machine);
+   pid_t tgid = perf_event__synthesize_comm(tool, comm_event, mmap_event, 
pid,
+full, process, machine, 
mmap_data);
if (tgid == -1)
return -1;
return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 813e94e..eb26544 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1026,7 +1026,7 @@ int machine__process_mmap2_event(struct machine *machine,
}
 
thread = machine__findnew_thread(machine, event->mmap2.pid,
-   event->mmap2.pid);
+   event->mmap2.tid);
if (thread == NULL)
goto out_problem;
 
@@ -1074,7 +1074,7 @@ int machine__process_mmap_event(struct machine *machine, 
union perf_event *event
}
 
thread = machine__findnew_thread(machine, event->mmap.pid,
-event->mmap.pid);
+event->mmap.tid);
if (thread == NULL)
goto out_problem;
 
-- 
1.7.11.7



[PATCH 2/3] perf, session: Change header.misc dump from decimal to hex

2014-02-25 Thread Don Zickus
When printing the raw dump of a data file, the header.misc is
printed as a decimal.  Unfortunately, that field is a bit mask, so
it is hard to interpret as a decimal.

Print in hex, so the user can easily see what bits are set and more
importantly what type of info it is conveying.
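
For example, header.misc is built from bits like these (quoting
include/uapi/linux/perf_event.h from memory, so double-check the header):

#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_MMAP_DATA	(1 << 13)
#define PERF_RECORD_MISC_EXACT_IP	(1 << 14)

so a user-space sample with a precise IP now dumps as 0x4002 instead of 16386.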

V2: add 0x in front per Jiri Olsa

Signed-off-by: Don Zickus 
---
 tools/perf/util/session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1d555d6..55960f2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -794,7 +794,7 @@ static void dump_sample(struct perf_evsel *evsel, union 
perf_event *event,
if (!dump_trace)
return;
 
-   printf("(IP, %d): %d/%d: %#" PRIx64 " period: %" PRIu64 " addr: %#" 
PRIx64 "\n",
+   printf("(IP, 0x%x): %d/%d: %#" PRIx64 " period: %" PRIu64 " addr: %#" 
PRIx64 "\n",
   event->header.misc, sample->pid, sample->tid, sample->ip,
   sample->period, sample->addr);
 
-- 
1.7.11.7



[PATCH 0/3] perf: misc fixes

2014-02-25 Thread Don Zickus
Just a small collection of fixes noticed while hacking up the c2c tool

Don Zickus (3):
  perf, machine: Use map as success in ip__resolve_ams
  perf, session: Change header.misc dump from decimal to hex
  perf: fix synthesizing mmaps for threads

 tools/perf/util/event.c   | 15 +++
 tools/perf/util/machine.c |  6 +++---
 tools/perf/util/session.c |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

-- 
1.7.11.7



Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

2014-02-25 Thread Dave Chinner
On Tue, Feb 25, 2014 at 05:52:16PM -0800, Andrew Morton wrote:
> On Wed, 26 Feb 2014 12:34:26 +1100 Dave Chinner  wrote:
> 
> > On Tue, Feb 25, 2014 at 03:41:28PM -0800, Andrew Morton wrote:
> > > On Tue, 25 Feb 2014 15:23:35 -0800 (PST) Hugh Dickins  
> > > wrote:
> > > > On Tue, 25 Feb 2014, Dave Chinner wrote:
> > > > > On Tue, Feb 25, 2014 at 02:16:01PM +1100, Stephen Rothwell wrote:
> > > > > > On Mon, 24 Feb 2014 11:57:10 +1100 Dave Chinner 
> > > > > >  wrote:
> > > > FALLOC_FL_COLLAPSE_RANGE: I'm a little sad at the name COLLAPSE,
> > > > but probably seven months too late to object.  It surprises me that
> > > > you're doing all this work to deflate a part of the file, without
> > > > the obvious complementary work to inflate it - presumably all those
> > > > advertisers whose ads you're cutting out, will come back to us soon
> > > > to ask for inflation, so that they have somewhere to reinsert them ;)
> > > 
> > > Yes, I was wondering that.  Why not simply "move these blocks from here
> > > to there".
> > 
> > And open a completely unnecessary can of worms to do with
> > behavioural and implementation corner cases?
> 
> But it's general.

Exactly. And because it's general, you can't make arbitrary
decisions about the behaviour.

> > Do you allow it to destroy data by default? Or only allow moves into
> > holes?
> 
> Overwrite.

Application dev says: "I don't want it to overwrite data - I only
want it to succeed if it's moving into a hole that I've already
prepared".

> > What do you do with range the data is moved out of? Does it just
> > become a hole? What happens if the range overlaps EOF - does that
> > change the file size?
> 
> Truncate.

A.D. says: "But I need FALLOC_FL_KEEP_SIZE semantics"

> > What if you want to move the range beyond EOF?
> 
> Extend.

Filesystem developer says: "Ok, so what happens to the range between
the old EOF and the destination offset? What do you do with blocks
beyond EOF that fall within that range? punch, zero, preallocate the
entire range? Do users need to be able to specify this behaviour?
Hell, do we even know of an application that requires this
behaviour?"

> > What if the source and destination ranges overlap?
> 
> Don't screw it up.

Exactly my point - it's a complex behaviour that is difficult to
verify that it is correct.

> > What happens when you move the block at EOF into the middle of a
> > file - do you end up with zeros padding the block and the file size
> > having to be adjusted accordingly? Or do we have to *copy* all the
> > data in high blocks down to fill the hole in the block?
> 
> I don't understand that.  Move the block(s) and truncate to the new
> length.

So, you are saying this (move from s to d):

 +---------------------------------------------------+
                                      +---- s ----+
                 +---- d ----+

should result in:

 +---------------+---- d ----+


A.D. says: "That's not what I asked for! What happened to all the
rest of my data in the file between d and s? I didn't ask for them
to be removed. And I want a hole where the source was!"

> > What behaviour should we expect if the filesystem can't implement
> > the entire move atomically and we crash in the middle of the move?
> 
> What does collapse_range do now?
> 
> If it's a journaled filesystem, it shouldn't screw up.  If it isn't, fsck.

Define "screw up". For journalled filesystems "don't screw up" means
the filesystem will be consistent after a crash, not that a change
made in a syscall is completed atomically.

Indeed, collapse range isn't implemented atomically in XFS, and I
doubt it is in ext4. Why? Because the extent tree being manipulated
can be *much* larger than the journal and so the changes can't
easily be done atomically from a crash recovery perspective. The
result is that collapse range will end up with a hole somewhere in
the file the size of the range being collapsed. This was pointed out
during review some time in the past 6 months and, IIRC, the response
was "that's fine, just so long as the filesystem is not corrupted".
I have plans to fix this issue in XFS, but it isn't critical to the
correct functioning of devices using collapse range.

This just illustrates my point that the behaviour needs to be
specified so that we can get the same minimum crash guarantees
from all filesystems.

> > I can keep going, but I'll stop here - you get the idea.
> 
> None of this seems like rocket science.

It's not rocket science, but the devil is in the details. There's no
requirements or specification to work from, let alone an application
that needs such generic functionality. Until these exist and there's
someone willing to put the effort into specifying, implementing and
testing such an interface, it's just not going to happen.

> > In comparison, collapse range as a file data manipulation has very
> > specific requirements and from that we can define a simple, specific
> > API that allows filesystems to accelerate that 

[PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done

2014-02-25 Thread Saravana Kannan
The existing code sets the per-CPU policy to a non-NULL value before all
the steps performed during the hotplug online path are done. Specifically,
this is done before the policy min/max, governors, etc are initialized for
the policy.  This in turn means that calls to cpufreq_cpu_get() return a
non-NULL policy before the policy/CPU is ready to be used.

To fix this, move the update of per CPU policy to a valid value after all
the initialization steps for the policy are completed.

Example kernel panic without this fix:
[  512.146185] Unable to handle kernel NULL pointer dereference at virtual 
address 0020
[  512.146195] pgd = c0003000
[  512.146213] [0020] *pgd=804003, *pmd=
[  512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM

[  512.146297] PC is at __cpufreq_governor+0x10/0x1ac
[  512.146312] LR is at cpufreq_update_policy+0x114/0x150

[  512.149740] ---[ end trace f23a8defea6cd706 ]---
[  512.149761] Kernel panic - not syncing: Fatal exception
[  513.152016] CPU0: stopping
[  513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G  D W
3.10.0-gd727407-00074-g979ede8 #396

[  513.317224] [] (notifier_call_chain+0x40/0x68) from [] 
(__blocking_notifier_call_chain+0x40/0x58)
[  513.327809] [] (__blocking_notifier_call_chain+0x40/0x58) from 
[] (blocking_notifier_call_chain+0x14/0x1c)
[  513.339182] [] (blocking_notifier_call_chain+0x14/0x1c) from 
[] (cpufreq_set_policy+0xd4/0x2b8)
[  513.349594] [] (cpufreq_set_policy+0xd4/0x2b8) from [] 
(cpufreq_init_policy+0x30/0x98)
[  513.359231] [] (cpufreq_init_policy+0x30/0x98) from [] 
(__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[  513.369560] [] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from 
[] (cpufreq_cpu_callback+0x58/0x84)
[  513.379978] [] (cpufreq_cpu_callback+0x58/0x84) from [] 
(notifier_call_chain+0x40/0x68)
[  513.389704] [] (notifier_call_chain+0x40/0x68) from [] 
(__cpu_notify+0x28/0x44)
[  513.398728] [] (__cpu_notify+0x28/0x44) from [] 
(_cpu_up+0xf4/0x1dc)
[  513.406797] [] (_cpu_up+0xf4/0x1dc) from [] 
(cpu_up+0x5c/0x78)
[  513.414357] [] (cpu_up+0x5c/0x78) from [] 
(store_online+0x44/0x74)
[  513.422253] [] (store_online+0x44/0x74) from [] 
(sysfs_write_file+0x108/0x14c)
[  513.431195] [] (sysfs_write_file+0x108/0x14c) from [] 
(vfs_write+0xd0/0x180)
[  513.439958] [] (vfs_write+0xd0/0x180) from [] 
(SyS_write+0x38/0x68)
[  513.447947] [] (SyS_write+0x38/0x68) from [] 
(ret_fast_syscall+0x0/0x30)

In this specific case, thread A sets CPU1's policy->governor in
cpufreq_init_policy() to NULL while thread B is using the policy->governor in
__cpufreq_governor().

Change-Id: I0f6f4e51ac3b7127a1ea56a1cb8e7ae1bcf8d6b6
Signed-off-by: Saravana Kannan 
---
 drivers/cpufreq/cpufreq.c | 52 ---
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cb003a6..5caefa9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -849,7 +849,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy 
*policy,
goto err_out_kobj_put;
drv_attr++;
}
-   if (cpufreq_driver->get) {
+   if (cpufreq_driver->get || policy->clk) {
ret = sysfs_create_file(>kobj, _cur_freq.attr);
if (ret)
goto err_out_kobj_put;
@@ -877,6 +877,22 @@ err_out_kobj_put:
return ret;
 }
 
+static unsigned int __cpufreq_get_freq(struct cpufreq_policy *policy)
+{
+   unsigned long freq;
+
+   if (policy->clk) {
+   freq = clk_get_rate(policy->clk);
+   if(!IS_ERR_VALUE(freq))
+   return freq / 1000;
+   }
+
+   if (cpufreq_driver->get)
+   return cpufreq_driver->get(policy->cpu);
+
+   return 0;
+}
+
 static void cpufreq_init_policy(struct cpufreq_policy *policy)
 {
struct cpufreq_policy new_policy;
@@ -1109,17 +1125,10 @@ static int __cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif,
goto err_set_policy_cpu;
}
 
-   write_lock_irqsave(_driver_lock, flags);
-   for_each_cpu(j, policy->cpus)
-   per_cpu(cpufreq_cpu_data, j) = policy;
-   write_unlock_irqrestore(_driver_lock, flags);
-
-   if (cpufreq_driver->get) {
-   policy->cur = cpufreq_driver->get(policy->cpu);
-   if (!policy->cur) {
-   pr_err("%s: ->get() failed\n", __func__);
-   goto err_get_freq;
-   }
+   policy->cur = __cpufreq_get_freq(policy);
+   if (!policy->cur) {
+   pr_err("%s: get freq failed\n", __func__);
+   goto err_get_freq;
}
 
/*
@@ -1207,6 +1216,11 @@ static int __cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif,
policy->user_policy.governor = policy->governor;
}
 
+   write_lock_irqsave(_driver_lock, flags);
+   

[PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table()

2014-02-25 Thread Saravana Kannan
Remove sysfs group if __cpufreq_stats_create_table() fails after creating
one.

Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c
Signed-off-by: Saravana Kannan 
---
 drivers/cpufreq/cpufreq_stats.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index e4bd27f..c52b440 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy,
stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL);
if (!stat->time_in_state) {
ret = -ENOMEM;
-   goto error_out;
+   goto error_alloc;
}
stat->freq_table = (unsigned int *)(stat->time_in_state + count);
 
@@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy,
stat->last_index = freq_table_get_index(stat, policy->cur);
spin_unlock(_stats_lock);
return 0;
+error_alloc:
+   sysfs_remove_group(>kobj, _attr_group);
 error_out:
kfree(stat);
per_cpu(cpufreq_stats_table, cpu) = NULL;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation



[PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call

2014-02-25 Thread Saravana Kannan
__cpufreq_stats_create_table() always gets passed the valid and real policy
struct. So, there's no need to call cpufreq_cpu_get() to get the policy
again.

Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71
Signed-off-by: Saravana Kannan 
---

Viresh/Rafael,

These 3 patches are the approximate code I have in mind.

Approximate because:
* I inserted one question as a comment into the code.
* If the patch doesn't have any bugs, the plan is to remove
  cpufreq_generic_get() and references to it.

This takes care of the "don't advertise before it's ready for use" rule.

Viresh,

I think the locking updates need to be done in addition to this.

Regards,
Saravana

 drivers/cpufreq/cpufreq_stats.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 5793e14..e4bd27f 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy,
 {
unsigned int i, j, count = 0, ret = 0;
struct cpufreq_stats *stat;
-   struct cpufreq_policy *current_policy;
unsigned int alloc_size;
unsigned int cpu = policy->cpu;
if (per_cpu(cpufreq_stats_table, cpu))
@@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy,
if ((stat) == NULL)
return -ENOMEM;
 
-   current_policy = cpufreq_cpu_get(cpu);
-   if (current_policy == NULL) {
-   ret = -EINVAL;
-   goto error_get_fail;
-   }
-
-   ret = sysfs_create_group(_policy->kobj, _attr_group);
+   ret = sysfs_create_group(>kobj, _attr_group);
if (ret)
goto error_out;
 
@@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy,
stat->last_time = get_jiffies_64();
stat->last_index = freq_table_get_index(stat, policy->cur);
spin_unlock(_stats_lock);
-   cpufreq_cpu_put(current_policy);
return 0;
 error_out:
-   cpufreq_cpu_put(current_policy);
-error_get_fail:
kfree(stat);
per_cpu(cpufreq_stats_table, cpu) = NULL;
return ret;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-25 Thread Jeff Law

On 02/25/14 17:15, Paul E. McKenney wrote:

I have for the last several years been 100% convinced that the Intel
memory ordering is the right thing, and that people who like weak
memory ordering are wrong and should try to avoid reproducing if at
all possible. But given that we have memory orderings like power and
ARM, I don't actually see a sane way to get a good strong ordering.
You can teach compilers about cases like the above when they actually
see all the code and they could poison the value chain etc. But it
would be fairly painful, and once you cross object files (or even just
functions in the same compilation unit, for that matter), it goes from
painful to just "ridiculously not worth it".


And I have indeed seen a post or two from you favoring stronger memory
ordering over the past few years.  ;-)

I couldn't agree more.



Are ARM and Power really the bad boys here?  Or are they instead playing
the role of the canary in the coal mine?
That's a question I've been struggling with recently as well.  I suspect 
they (arm, power) are going to be the outliers rather than the canary. 
While the weaker model may give them some advantages WRT scalability, I 
don't think it'll ultimately be enough to overcome the difficulty in 
writing correct low level code for them.


Regardless, they're here and we have to deal with them.


Jeff


linux-next: manual merge of the net-next tree with the wireless tree

2014-02-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in
drivers/net/wireless/ath/ath9k/recv.c between commit b7b146c9c9a0
("ath9k: fix invalid descriptor discarding") from the wireless tree and
commits 1274603646a8 ("ath9k: move ath9k_process_rate to common.c") and
6438696efa81 ("ath9k: move ath9k_rx_accept to common.c") from the
net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc drivers/net/wireless/ath/ath9k/recv.c
index 82e340d3ec60,076dae1e5ab7..
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@@ -1055,8 -853,10 +857,8 @@@ static int ath9k_rx_skb_preprocess(stru
 * everything but the rate is checked here, the rate check is done
 * separately to avoid doing two lookups for a rate for each frame.
 */
-   if (!ath9k_rx_accept(common, hdr, rx_status, rx_stats, decrypt_error))
 -  if (!ath9k_cmn_rx_accept(common, hdr, rx_status, rx_stats, 
decrypt_error, sc->rx.rxfilter)) {
 -  ret = -EINVAL;
 -  goto exit;
 -  }
++  if (!ath9k_cmn_rx_accept(common, hdr, rx_status, rx_stats, 
decrypt_error, sc->rx.rxfilter))
 +  return -EINVAL;
  
if (ath_is_mybeacon(common, hdr)) {
RX_STAT_INC(rx_beacons);
@@@ -1066,13 -866,24 +868,21 @@@
/*
 * This shouldn't happen, but have a safety check anyway.
 */
 -  if (WARN_ON(!ah->curchan)) {
 -  ret = -EINVAL;
 -  goto exit;
 -  }
 +  if (WARN_ON(!ah->curchan))
 +  return -EINVAL;
  
-   if (ath9k_process_rate(common, hw, rx_stats, rx_status))
+   if (ath9k_cmn_process_rate(common, hw, rx_stats, rx_status)) {
+   /*
+* No valid hardware bitrate found -- we should not get here
+* because hardware has already validated this frame as OK.
+*/
+   ath_dbg(common, ANY, "unsupported hw bitrate detected 0x%02x 
using 1 Mbit\n",
+   rx_stats->rs_rate);
+   RX_STAT_INC(rx_rate_err);
 -  ret =-EINVAL;
 -  goto exit;
 +  return -EINVAL;
+   }
  
-   ath9k_process_rssi(common, hw, rx_stats, rx_status);
+   ath9k_cmn_process_rssi(common, hw, rx_stats, rx_status);
  
rx_status->band = ah->curchan->chan->band;
rx_status->freq = ah->curchan->chan->center_freq;
@@@ -1085,64 -896,11 +895,13 @@@
sc->rx.num_pkts++;
  #endif
  
 -exit:
 -  sc->rx.discard_next = false;
 -  return ret;
 +  return 0;
 +
 +corrupt:
 +  sc->rx.discard_next = rx_stats->rs_more;
 +  return -EINVAL;
  }
  
- static void ath9k_rx_skb_postprocess(struct ath_common *common,
-struct sk_buff *skb,
-struct ath_rx_status *rx_stats,
-struct ieee80211_rx_status *rxs,
-bool decrypt_error)
- {
-   struct ath_hw *ah = common->ah;
-   struct ieee80211_hdr *hdr;
-   int hdrlen, padpos, padsize;
-   u8 keyix;
-   __le16 fc;
- 
-   /* see if any padding is done by the hw and remove it */
-   hdr = (struct ieee80211_hdr *) skb->data;
-   hdrlen = ieee80211_get_hdrlen_from_skb(skb);
-   fc = hdr->frame_control;
-   padpos = ieee80211_hdrlen(fc);
- 
-   /* The MAC header is padded to have 32-bit boundary if the
-* packet payload is non-zero. The general calculation for
-* padsize would take into account odd header lengths:
-* padsize = (4 - padpos % 4) % 4; However, since only
-* even-length headers are used, padding can only be 0 or 2
-* bytes and we can optimize this a bit. In addition, we must
-* not try to remove padding from short control frames that do
-* not have payload. */
-   padsize = padpos & 3;
-   if (padsize && skb->len>=padpos+padsize+FCS_LEN) {
-   memmove(skb->data + padsize, skb->data, padpos);
-   skb_pull(skb, padsize);
-   }
- 
-   keyix = rx_stats->rs_keyix;
- 
-   if (!(keyix == ATH9K_RXKEYIX_INVALID) && !decrypt_error &&
-   ieee80211_has_protected(fc)) {
-   rxs->flag |= RX_FLAG_DECRYPTED;
-   } else if (ieee80211_has_protected(fc)
-  && !decrypt_error && skb->len >= hdrlen + 4) {
-   keyix = skb->data[hdrlen + 3] >> 6;
- 
-   if (test_bit(keyix, common->keymap))
-   rxs->flag |= RX_FLAG_DECRYPTED;
-   }
-   if (ah->sw_mgmt_crypto &&
-   (rxs->flag & RX_FLAG_DECRYPTED) &&
-   ieee80211_is_mgmt(fc))
-   /* Use software decrypt for management frames. */
-   rxs->flag &= ~RX_FLAG_DECRYPTED;
- }
- 
  /*
   * Run the LNA combining 

Re: [PATCH] x86: LLVMLinux: Reimplement current_stack_pointer without register usage.

2014-02-25 Thread H. Peter Anvin
On 02/25/2014 07:00 PM, Andy Lutomirski wrote:
>>
>> How much does this actually affect the output?  I only see three uses of
>> current_stack_pointer:
>>
>> /* how to get the thread information struct from C */
>> static inline struct thread_info *current_thread_info(void)
>> {
>> return (struct thread_info *)
>> (current_stack_pointer & ~(THREAD_SIZE - 1));
>> }
>>
>> ... here we need the mov anyway, because we have to then AND it with a
>> mask, which we obviously can't do inside the stack pointer.
> 
> No clue what code is actually generated, but the new code could generate:
> 
> mov $MASK, %rax;
> and %esp, %rax;
> 
> Admittedly, I can't see any reason why this would be an improvement.
> 

You have to generate one of the code sequences:

mov $MASK, %eax
and %esp, %eax

... or ...

mov %esp, %eax
and $MASK, %eax

No real difference either way.

-hpa




Re: [PATCH 09/21] perf, c2c: Add rbtree sorted on mmap2 data

2014-02-25 Thread Don Zickus
On Fri, Feb 21, 2014 at 05:59:28PM +0100, Jiri Olsa wrote:
> On Thu, Feb 20, 2014 at 09:45:53PM -0500, Don Zickus wrote:
> > On Tue, Feb 18, 2014 at 02:04:05PM +0100, Jiri Olsa wrote:
> > > On Mon, Feb 10, 2014 at 12:29:04PM -0500, Don Zickus wrote:
> 
> SNIP
> 
> > > > +
> > > > +   if (l > r) return 1;
> > > > +   if (l < r) return -1;
> > > > +
> > > > +   /* sorting by iaddr makes calculations easier later */
> > > > +   if (left->mi->iaddr.al_addr > right->mi->iaddr.al_addr) 
> > > > return 1;
> > > > +   if (left->mi->iaddr.al_addr < right->mi->iaddr.al_addr) 
> > > > return -1;
> > > > +   }
> > > > +
> > > > +   return 0;
> > > > +}
> > > 
> > > there's a sort object doing exactly this over hist_entry's
> > > 
> > > Is there any reason not to use hist_entries?
> > 
> > So looking over hist_entry, I realize, what do I gain?  I implemented it
> > and realized I had to add, 'cpumode', 'tid' and a 'private' field to
> > struct hist_entry.  Then because I have my own report implementation, I
> > still have to copy and paste a ton of stuff from builtin-report over to
> > here (including callchain support).
> 
> you mean new sort_entry objects?
> 
> > 
> > Not unless you are expecting me to add giant chunks of code to
> > builtin-report.c?
> 
> it can be separated object, implementing new report iterator
> 
> I think that we should go on with existing sort code we have..
> but I understand you might need some special usage.. i'll dive
> in and try to find some answer ;-)

Do things fall apart if I do not use evsel->hists to store the hist_entry
tree?  I need to combine two events (store and load) on to the same tree.

Cheers,
Don


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-25 Thread George Spelvin
 wrote:
>  wrote:
>> I have for the last several years been 100% convinced that the Intel
>> memory ordering is the right thing, and that people who like weak
>> memory ordering are wrong and should try to avoid reproducing if at
>> all possible.
>
> Are ARM and Power really the bad boys here?  Or are they instead playing
> the role of the canary in the coal mine?

To paraphrase some older threads, I think Linus's argument is that
weak memory ordering is like branch delay slots: a way to make a simple
implementation simpler, but ends up being no help to a more aggressive
implementation.

Branch delay slots give a one-cycle bonus to in-order cores, but
once you go superscalar and add branch prediction, they stop helping,
and once you go full out of order, they're just an annoyance.

Likewise, I can see the point that weak ordering can help make a simple
cache interface simpler, but once you start doing speculative loads,
you've already bought and paid for all the hardware you need to do
stronger coherency.

Another thing that requires all the strong-coherency machinery is
a high-performance implementation of the various memory barrier and
synchronization operations.  Yes, a low-performance (drain the pipeline)
implementation is tolerable if the instructions aren't used frequently,
but once you're really trying, it doesn't save complexity.

Once you're there, strong coherency doesn't actually cost you any
time outside of critical synchronization code, and it both simplifies
and speeds up the tricky synchronization software.


So PPC and ARM's weak ordering are not the direction the future is going.
Rather, weak ordering is something that's only useful in a limited
technology window, which is rapidly passing.

If you can find someone in IBM who's worked on the Z series cache
coherency (extremely strong ordering), they probably have some useful
insights.  The big question is if strong ordering, once you've accepted
the implementation complexity and area, actually costs anything in
execution time.  If there's an unavoidable cost which weak ordering saves,
that's significant.


Re: [RFC patch 0/5] hrtimers: Add deferrable mode

2014-02-25 Thread Andy Lutomirski
On 02/21/2014 09:56 AM, Thomas Gleixner wrote:
> Deferrable timers are beneficial for power saving. They behave like
> standard timers except that their expiry can be delayed up to the
> expiry of the next non deferred timer. That prevents them from waking
> up cpus from deep idle periods.

What does this accomplish that can't be done with hrtimers with enormous
slack?
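
i.e. something along these lines (sketch, in-kernel user, slack value picked
arbitrarily -- note hrtimer_start_range_ns() takes the slack in nanoseconds):

static struct hrtimer lazy_timer;

static enum hrtimer_restart lazy_fn(struct hrtimer *t)
{
	/* background work here */
	return HRTIMER_NORESTART;
}

static void arm_lazy_timer(void)
{
	hrtimer_init(&lazy_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	lazy_timer.function = lazy_fn;
	/* fire after ~1s, but allow it to slip by up to another 2s */
	hrtimer_start_range_ns(&lazy_timer, ms_to_ktime(1000),
			       2ULL * NSEC_PER_SEC, HRTIMER_MODE_REL);
}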

--Andy


Re: [PATCH] x86: LLVMLinux: Reimplement current_stack_pointer without register usage.

2014-02-25 Thread Andy Lutomirski
On 02/20/2014 08:55 PM, H. Peter Anvin wrote:
> This seems like really deep magic when looking at it... at the very
> least, this needs to be very carefully commented, including why it works
> on the various platforms.
> 
> How much does this actually affect the output?  I only see three uses of
> current_stack_pointer:
> 
> /* how to get the thread information struct from C */
> static inline struct thread_info *current_thread_info(void)
> {
> return (struct thread_info *)
> (current_stack_pointer & ~(THREAD_SIZE - 1));
> }
> 
> ... here we need the mov anyway, because we have to then AND it with a
> mask, which we obviously can't do inside the stack pointer.

No clue what code is actually generated, but the new code could generate:

mov $MASK, %rax;
and %esp, %rax;

Admittedly, I can't see any reason why this would be an improvement.

--Andy


linux-next: build failure after merge of the libata tree

2014-02-25 Thread Stephen Rothwell
Hi Tejun,

After merging the libata tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/ata/ahci_st.c: In function 'st_ahci_probe':
drivers/ata/ahci_st.c:159:8: error: implicit declaration of function 
'ahci_platform_put_resources' [-Werror=implicit-function-declaration]
ahci_platform_put_resources(>dev, hpriv);
^

Caused by commit 0d8d213703ff ("ahci: st: Add support for ST's SATA
IP").  That function is defined as "static" in another file ...

I have used the version of the libata tree from next-20140225 for today.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH 3/4] power_supply: Introduce PSE compliant algorithm

2014-02-25 Thread Jenny Tc
On Fri, Feb 21, 2014 at 03:45:29PM +0100, Pavel Machek wrote:
> On Thu 2014-02-20 10:46:55, Jenny Tc wrote:
> > On Tue, Feb 04, 2014 at 12:36:40PM +0100, Pavel Machek wrote:
> > > > --- a/drivers/power/Kconfig
> > > > +++ b/drivers/power/Kconfig
> > > > @@ -22,6 +22,19 @@ config POWER_SUPPLY_CHARGER
> > > >   drivers to keep the charging logic outside and the charger 
> > > > driver
> > > >   just need to abstract the charger hardware.
> > > >  
> > > > +config POWER_SUPPLY_CHARGING_ALGO_PSE
> > > > +   bool "PSE compliant charging algorithm"
> > > > +   help
> > > > + Say Y here to select Product Safety Engineering (PSE) 
> > > > compliant
> > > > + charging algorithm. As per PSE standard the battery 
> > > > characteristics
> > > > + and thereby the charging rates can vary on different 
> > > > temperature
> > > > + zones. This config will enable PSE compliant charging 
> > > > algorithm with
> > > > + maintenance charging support. At runtime the algorithm will be
> > > > + selected by the psy charger driver based on the type of the 
> > > > battery
> > > > + charging profile.
> > > 
> > > Information where to expect PSE compliant chargers would be nice.
> > 
> > This algorithm can be used with non PSE compliant chargers also. This is a 
> > SW
> > based charging algorithm.
> 
> Ok, but you need to explain for the users when it might be good idea
> to select this option...
> 
> Or maybe this should not be user configurable and drivers should just
> select it?

The idea is to allow pluggable charging algorithms. Currently we have only one
charging algorithm proposed, but we can have others (like pulse charging,
rule-based charging, etc.). The algorithm can be selected based on the
platform's needs, so this should be a user-configurable option. I can add more
explanation on when to select this option.


Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers

2014-02-25 Thread Andy Lutomirski
On 02/20/2014 08:23 AM, Alexey Perevalov wrote:
> From: Anton Vorontsov 
> 
> This patch implements a userland-side API for generic deferrable timers,
> per linux/timer.h:
> 
>  * A deferrable timer will work normally when the system is busy, but
>  * will not cause a CPU to come out of idle just to service it; instead,
>  * the timer will be serviced when the CPU eventually wakes up with a
>  * subsequent non-deferrable timer.
> 
> These timers are crucial for power saving, i.e. periodic tasks that want
> to work in background when the system is under use, but don't want to
> cause wakeups themselves.

Please don't.  This API sucks for all kinds of reasons:

 - Why is it a new kind of clock?
 - How deferrable is deferrable?
 - It adds new core code, which serves no purpose (the problem is
already solved).

On the other hand, if you added a fancier version of timerfd_settime
that could explicitly set the slack value (or, equivalently, the
earliest and latest allowable times), that could be quite useful.

It's often bugged me that timer slack is per-process.
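
(Today the closest userspace knob is the per-process one, which hits every
timer in the process -- sketch:)

#include <sys/prctl.h>

int main(void)
{
	/* per-process timer slack, in nanoseconds: allow up to 50ms of delay */
	return prctl(PR_SET_TIMERSLACK, 50UL * 1000 * 1000, 0, 0, 0);
}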

--Andy



Re: [PATCH 1/2] jffs2: Fix segmentation fault found in stress test

2014-02-25 Thread Brian Norris
On Mon, Jan 06, 2014 at 07:06:54PM +0530, Kamlakant Patel wrote:
> Creating a large file on a JFFS2 partition sometimes crashes with this call
> trace:
[...]
> 
> This crash is caused because 'positions' is declared as an array of signed
> short. The value of a position is in the range 0..65535, and will be converted
> to a negative number when the position is greater than 32767, causing
> corruption and a crash. Changing the definition to 'unsigned short' fixes this
> issue.
> 
> Signed-off-by: Jayachandran C 
> Signed-off-by: Kamlakant Patel 
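
(i.e. the classic signed-truncation problem -- illustration only, not the
jffs2 code itself:

	short pos = 40000;                  /* 16-bit signed: becomes -25536 */
	unsigned int ofs = 0x100000 + pos;  /* ~64KiB lower than intended */

so any position above 32767 goes negative and the computed offset lands in
the wrong place.)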

Pushed both patches to l2-mtd.git, and tagged them for -stable. Let me
know if you object.

Thanks,
Brian


Re: intel_pstate: Haswell i7-4600M refuses to enter lower Package States

2014-02-25 Thread Andy Lutomirski
On 02/20/2014 04:29 AM, Dieter Mummenschanz wrote:
> Hello,
> 
> on my Lenovo T440p laptop the Haswell i7-4600M CPU refuses to enter lower PC 
> states, resulting in 14-15 watts of continuous power drain even if the system is 
> idle and all tunables in powertop 2.5 are enabled. The issue is 
> reproducible with kernel versions 3.13 up to 3.14-rc3.
> 
> Did anyone else experience a similar issue?

Something's confused here.  Do you mean P-states (i.e. performance
states or "frequency" settings) or PC-states (i.e. "package C-states")?
 I suspect that you mean the latter, in which case intel_pstate has
nothing to do with the problem.

The relevant diagnostic is the output from turbostat.

--Andy


Re: [BUG kretprobes] kretprobe triggers General Protection Faults

2014-02-25 Thread Masami Hiramatsu
Hi Mathieu,

(2014/02/26 4:46), Mathieu Desnoyers wrote:
> Hi,
> 
> I had a bug report[1] from a user trying to add a kretprobe on the system
> call entry code path:
> 
> arch/x86/kernel/entry_64.S:
> 
> 813dffe2 :
> cmpl $__NR_syscall_max,%eax
> #endif
> ja badsys
> movq %r10,%rcx
> call *sys_call_table(,%rax,8)  # XXX:rip relative
> movq %rax,RAX-ARGOFFSET(%rsp)   <--- return address pointing here

Hm, I guess you put kretprobes on the functions in the sys_call_table, right?
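
(For reference, a registration of that kind typically looks like the sketch
below; the target symbol is only an illustration, not necessarily what the
reporter used.)

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* Runs when the probed function returns; regs_return_value() is %rax on x86-64 */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	pr_info("probed function returned %lu\n",
		(unsigned long)regs_return_value(regs));
	return 0;
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "sys_sync",	/* illustrative target only */
	.maxactive	= 20,
};

static int __init kret_init(void)
{
	return register_kretprobe(&my_kretprobe);
}

static void __exit kret_exit(void)
{
	unregister_kretprobe(&my_kretprobe);
}

module_init(kret_init);
module_exit(kret_exit);
MODULE_LICENSE("GPL");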

> And all hell breaks loose (various types of faults, machine reboots,
> applications exit randomly, etc.). I understand that this code path
> is not marked as unsafe against kprobes, and I tested that a kprobe
> indeed works fine there. However, kretprobes probably presumes a function
> stack layout that is just not valid for the syscall entry routine.

Did all the syscall entry functions cause this issue, or only some
specific function(s)?
And could you tell me which kernel version you used?

> Any thoughts on how kretprobes should handle this ?

I'll try to reproduce it in kvm environment.

Thank you!

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




[PATCH] ACPI/Sleep: pm_power_off need more sanity check to be installed

2014-02-25 Thread Li, Aubrey
The sleep control and status registers need a sanity check before ACPI
installs acpi_power_off into the pm_power_off hook. The check in
acpi_enter_sleep_state() happens too late; we should not allow a
non-working pm_power_off function to be hooked in the first place.

Signed-off-by: Aubrey Li 
---
 drivers/acpi/sleep.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index b718806..0284d22 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -809,8 +809,11 @@ int __init acpi_sleep_init(void)
status = acpi_get_sleep_type_data(ACPI_STATE_S5, &type_a, &type_b);
if (ACPI_SUCCESS(status)) {
sleep_states[ACPI_STATE_S5] = 1;
-   pm_power_off_prepare = acpi_power_off_prepare;
-   pm_power_off = acpi_power_off;
+   if (acpi_gbl_FADT.sleep_control.address &&
+   acpi_gbl_FADT.sleep_status.address) {
+   pm_power_off_prepare = acpi_power_off_prepare;
+   pm_power_off = acpi_power_off;
+   }
}

supported[0] = 0;
-- 
1.7.10.4
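
For context on why the early check matters: the arch power-off path simply
calls whatever is hooked into pm_power_off, so a handler installed without
working sleep control/status registers means poweroff silently fails instead
of falling back. Roughly (a simplified sketch, not the exact kernel code):

/* Simplified sketch of the consumer side of the hook */
void (*pm_power_off)(void);

static void machine_power_off_sketch(void)
{
	/*
	 * No sanity check happens at this point: if acpi_power_off was
	 * installed even though it cannot work, the call does nothing
	 * useful and the machine stays powered on.
	 */
	if (pm_power_off)
		pm_power_off();
}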


Re: [PATCH] sched/rt: fix rt timer activation/deactivation

2014-02-25 Thread Mike Galbraith
On Wed, 2014-02-26 at 00:56 +0400, Kirill Tkhai wrote: 
> On Вт, 2014-02-25 at 17:05 +0100, Juri Lelli wrote:
> > Destroy rt bandwidth timer when rq has no more RT tasks, even when
> > CONFIG_RT_GROUP_SCHED is not set.
> > 
> > Signed-off-by: Juri Lelli 
> > ---
> >  kernel/sched/rt.c |   10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index a2740b7..7dba25a 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -86,12 +86,12 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
> > raw_spin_lock_init(&rt_rq->rt_runtime_lock);
> >  }
> >  
> > -#ifdef CONFIG_RT_GROUP_SCHED
> >  static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
> >  {
> > hrtimer_cancel(&rt_b->rt_period_timer);
> >  }
> >  
> > +#ifdef CONFIG_RT_GROUP_SCHED
> >  #define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
> >  
> >  static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
> > @@ -1011,8 +1011,12 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
> > start_rt_bandwidth(&def_rt_bandwidth);
> >  }
> >  
> > -static inline
> > -void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
> > +static void
> > +dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
> > +{
> > +   if (!rt_rq->rt_nr_running)
> > +   destroy_rt_bandwidth(&def_rt_bandwidth);
> > +}
> 
> The problem is that the bandwidth timer is not per-CPU; it covers all
> processors in the span (sched_rt_period_mask()). Other CPUs may still
> have RT tasks enqueued, so it's not possible to do this.

BTW, I noticed you can no longer turn the noisy thing off since we grew
DL.  I added an old SGI boot parameter to tell it to go away.

-Mike



Re: [PATCH v3 00/14] perf, x86: Haswell LBR call stack support

2014-02-25 Thread Andy Lutomirski
On 02/17/2014 10:07 PM, Yan, Zheng wrote:
> 
> This patch series adds LBR call stack support. Users can enable/disable
> this through a sysfs attribute file in the CPU PMU directory:
>  echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack

This seems like an unpleasant way to control this.  It would be handy to
be able to control this as an option to perf record.

--Andy


  1   2   3   4   5   6   7   8   9   10   >