Re: [PATCH 1/2 v2] ftrace: Be first to run code modification on modules

2013-01-07 Thread Masami Hiramatsu
(2013/01/07 23:02), Steven Rostedt wrote:
> From: Steven Rostedt 
> 
> If some other kernel subsystem has a module notifier, and adds a kprobe
> to a ftrace mcount point (now that kprobes work on ftrace points),
> when the ftrace notifier runs it will fail and disable ftrace, as well
> as kprobes that are attached to ftrace points.
> 
> Here's the error:
> 
>  WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
>  Hardware name: Bochs
>  Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) 
> nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat 
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc 
> ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm 
> drm i2c_core [last unloaded: bid_shared]
>  Pid: 8068, comm: modprobe Tainted: GF
> 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
>  Call Trace:
>   [] warn_slowpath_common+0x7f/0xc0
>   [] ? __probe_kernel_read+0x46/0x70
>   [] ? 0xa017
>   [] ? 0xa017
>   [] warn_slowpath_null+0x1a/0x20
>   [] ftrace_bug+0x239/0x280
>   [] ftrace_process_locs+0x376/0x520
>   [] ftrace_module_notify+0x47/0x50
>   [] notifier_call_chain+0x4d/0x70
>   [] __blocking_notifier_call_chain+0x58/0x80
>   [] blocking_notifier_call_chain+0x16/0x20
>   [] sys_init_module+0x73/0x220
>   [] system_call_fastpath+0x16/0x1b
>  ---[ end trace 9ef46351e53bbf80 ]---
>  ftrace failed to modify [] init_once+0x0/0x20 [fat]
>   actual: cc:bb:d2:4b:e1
> 
> A kprobe was added to the init_once() function in the fat module on load.
> But this happened before ftrace could have touched the code. As ftrace
> didn't run yet, the kprobe system had no idea it was a ftrace point and
> simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).
> 
> Then when ftrace went to modify the location from a call to mcount/fentry
> into a nop, it didn't see a call op, but instead it saw the breakpoint op
> and not knowing what to do with it, ftrace shut itself down.
> 
> The solution is to simply give the ftrace module notifier the max priority.
> This should have been done regardless, as the core code ftrace modification
> also happens very early on in boot up. This makes the module modification
> closer to core modification.
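For readers less familiar with notifier ordering: blocking notifier chains are
kept sorted by .priority, highest value first, so an INT_MAX entry is called
before any other subsystem's module notifier can touch the new module's text.
A minimal, hypothetical illustration (invented names, not part of the patch):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/notifier.h>

static int example_module_notify(struct notifier_block *self,
                                 unsigned long val, void *data)
{
        struct module *mod = data;

        /* runs before lower-priority notifiers see the new module */
        if (val == MODULE_STATE_COMING)
                pr_info("module %s is coming\n", mod->name);

        return NOTIFY_DONE;
}

static struct notifier_block example_module_nb = {
        .notifier_call  = example_module_notify,
        .priority       = INT_MAX,      /* run ahead of any kprobe user */
};

static int __init example_init(void)
{
        return register_module_notifier(&example_module_nb);
}
module_init(example_init);
MODULE_LICENSE("GPL");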

Correct! Thank you for fixing that.

Acked-by: Masami Hiramatsu 

> Reported-by: Frank Ch. Eigler 
> Signed-off-by: Steven Rostedt 
> ---
>  kernel/trace/ftrace.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 51b7159..356bc2f 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -3998,7 +3998,7 @@ static int ftrace_module_notify(struct notifier_block *self,
>  
>  struct notifier_block ftrace_module_nb = {
>   .notifier_call = ftrace_module_notify,
> - .priority = 0,
> + .priority = INT_MAX,  /* Run before anything that can use kprobes */
>  };
>  
>  extern unsigned long __start_mcount_loc[];
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH RESEND v1 00/16] vfs: hot data tracking

2013-01-07 Thread Zhi Yong Wu
On Thu, Dec 20, 2012 at 10:43 PM,   wrote:
> From: Zhi Yong Wu 
>
> Hi, guys,
>
>   This patchset has been through scalability and performance tests
> with fs_mark, ffsb and compilebench.
>   I ran the perf testing on Linux 3.7.0-rc8+ on an Intel(R) Core(TM)
> i7-3770 CPU @ 3.40GHz with 8 CPUs, 16G RAM and a 260G disk.
>
>   Any comments or ideas are appreciated, thanks.
>
> NOTE:
>
>   The patchset can be obtained via my kernel dev git on github:
> git://github.com/wuzhy/kernel.git hot_tracking
>   If you're interested, you can also review them via
> https://github.com/wuzhy/kernel/commits/hot_tracking
>
>   For more info, please check hot_tracking.txt in Documentation
>
> Below is the perf testing report:
>
> 1. fs_mark test
>
> w/o: without hot tracking
> w/ : with hot tracking
>
> Count  Size  FSUse%        Files/sec           App Overhead
>              w/o   w/     w/o       w/        w/o         w/
>
>    80    1     2    3   13756.4  32144.9   53506275436291
>   160    1     4    5    1163.4   1799.3   20848119  21708216
>   240    1     6    6    1360.8   1252.5   67987058715322
>   320    1     8    8    1600.1   1196.3   57511296013792
>   400    1     9    9    1071.4   1191.2   17204725  26786369
>   480    1    10   10    1483.5   1447.9   19418383046
>   560    1    11   11    1457.9   1699.5    5783588  10074681
>   640    1    12   13    1658.8   1628.5   69926976185551
>   720    1    14   14    1662.4   1857.1    5796793  13772592
>   800    1    15   15    2930.0   2653.8   124316826152573
>   880    1    16   17    1630.8   1665.0    7666719  13682765
>   960    1    18   18    1530.3   1583.9    5823644  10171644
>  1040    1    19   19    1437.9   1798.6   209352246048083
>  1120    1    20   20    1529.0   1550.6   66474506003151
>  1200    1    21   22    1558.6   1501.8   12539509  18144939
>  1280    1    23   23    1644.2   1432.1    7074419  28101975
>  1360    1    24   24    1753.6   1650.2    7164297  20888972
>  1440    1    25   25    2750.0   1483.9   127566927441225
>  1520    1    27   27    1551.1   1514.3   57410668250443
>  1600    1    28   28    1610.8   1635.9   721938608545285
>  1680    1    29   29    1646.7   1907.7    8945856  11703513
>  1760    1    30   31    1496.6   2722.3   58589618989393
>  1840    1    32   32    1457.7   1565.7   10914475  26504660
>  1920    1    33   33    1437.6   1518.7    6708975  213303618
>  2000    1    34   34    1825.4   1521.1    5722086  12490907
>  2080    1    36   35    1718.4   1611.5    5873290  17942534
>  2160    1    37   37    2152.6   1536.9   1130506278717940
>  2240    1    38   38    2443.7   1788.2    7398122  19834765
>  2320    1    39   39    1518.5   1587.6    5770959  10134882
>  2400    1    41   41    1536.8   2164.0   57512487214626
>  2480    1    42   42    1576.6   2939.4   73903146070271
>  2560    1    43   43    1707.4   1535.9   110759396052896
>  2640    1    44   44    1522.5   1563.1   10142987  22549898
>  2720    1    46   46    1827.4   1608.5   11613016  24828125
>  2800    1    47   47    3420.5   1741.9    8059985  16599156
>  2880    1    48   48    1815.5   1944.4   78479319043277
>  2960    1    50   49    1650.0   1596.6   56363237929164
>  3040    1    51   51    1683.7   1573.3    5766323  19369146
>  3120    1    52   52    1610.1   1669.8   92561119899107
>  3200    1    53   53    1645.2   3081.0   78550106057257
>  3280    1    54   55    1835.3   3122.0   68991416143875
>  3360    1    56   56    1916.8   1734.8

[PATCH] cgroup: use new hashtable implementation

2013-01-07 Thread Li Zefan
Switch cgroup to use the new hashtable implementation. No functional changes.

Signed-off-by: Li Zefan 
---
 kernel/cgroup.c | 92 -
 1 file changed, 39 insertions(+), 53 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4855892..d8df073 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -52,7 +52,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -376,22 +376,18 @@ static int css_set_count;
  * account cgroups in empty hierarchies.
  */
 #define CSS_SET_HASH_BITS  7
-#define CSS_SET_TABLE_SIZE (1 << CSS_SET_HASH_BITS)
-static struct hlist_head css_set_table[CSS_SET_TABLE_SIZE];
+static DEFINE_HASHTABLE(css_set_table, CSS_SET_HASH_BITS);
 
-static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
+static unsigned long css_set_hash(struct cgroup_subsys_state *css[])
 {
int i;
-   int index;
-   unsigned long tmp = 0UL;
+   unsigned long key = 0UL;
 
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
-   tmp += (unsigned long)css[i];
-   tmp = (tmp >> 16) ^ tmp;
+   key += (unsigned long)css[i];
+   key = (key >> 16) ^ key;
 
-   index = hash_long(tmp, CSS_SET_HASH_BITS);
-
-   return _set_table[index];
+   return key;
 }
 
 /* We don't maintain the lists running through each css_set to its
@@ -418,7 +414,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
}
 
/* This css_set is dead. unlink it and release cgroup refcounts */
-   hlist_del(>hlist);
+   hash_del(>hlist);
css_set_count--;
 
list_for_each_entry_safe(link, saved_link, >cg_links,
@@ -550,9 +546,9 @@ static struct css_set *find_existing_css_set(
 {
int i;
struct cgroupfs_root *root = cgrp->root;
-   struct hlist_head *hhead;
struct hlist_node *node;
struct css_set *cg;
+   unsigned long key;
 
/*
 * Build the set of subsystem state objects that we want to see in the
@@ -572,8 +568,8 @@ static struct css_set *find_existing_css_set(
}
}
 
-   hhead = css_set_hash(template);
-   hlist_for_each_entry(cg, node, hhead, hlist) {
+   key = css_set_hash(template);
+   hash_for_each_possible(css_set_table, cg, node, hlist, key) {
if (!compare_css_sets(cg, oldcg, cgrp, template))
continue;
 
@@ -657,8 +653,8 @@ static struct css_set *find_css_set(
 
struct list_head tmp_cg_links;
 
-   struct hlist_head *hhead;
struct cg_cgroup_link *link;
+   unsigned long key;
 
/* First see if we already have a cgroup group that matches
 * the desired set */
@@ -704,8 +700,8 @@ static struct css_set *find_css_set(
css_set_count++;
 
/* Add this cgroup group to the hash table */
-   hhead = css_set_hash(res->subsys);
-   hlist_add_head(>hlist, hhead);
+   key = css_set_hash(res->subsys);
+   hash_add(css_set_table, >hlist, key);
 
write_unlock(_set_lock);
 
@@ -1597,6 +1593,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
struct cgroupfs_root *existing_root;
const struct cred *cred;
int i;
+   struct hlist_node *node;
+   struct css_set *cg;
 
BUG_ON(sb->s_root != NULL);
 
@@ -1650,14 +1648,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
/* Link the top cgroup in this hierarchy into all
 * the css_set objects */
write_lock(_set_lock);
-   for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-   struct hlist_head *hhead = _set_table[i];
-   struct hlist_node *node;
-   struct css_set *cg;
-
-   hlist_for_each_entry(cg, node, hhead, hlist)
-   link_css_set(_cg_links, cg, root_cgrp);
-   }
+   hash_for_each(css_set_table, i, node, cg, hlist)
+   link_css_set(_cg_links, cg, root_cgrp);
write_unlock(_set_lock);
 
free_cg_links(_cg_links);
@@ -4438,6 +4430,9 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 {
struct cgroup_subsys_state *css;
int i, ret;
+   struct hlist_node *node, *tmp;
+   struct css_set *cg;
+   unsigned long key;
 
/* check name and function validity */
if (ss->name == NULL || strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN ||
@@ -4503,23 +4498,17 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 * this is all done under the css_set_lock.
 */
write_lock(_set_lock);
-   for (i = 0; i < CSS_SET_TABLE_SIZE; i++) {
-   struct css_set *cg;
-   struct hlist_node *node, *tmp;
-   struct 
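For reference, a minimal sketch of the <linux/hashtable.h> calls this patch
converts to, using the same call forms that appear in the hunks above; the
structure and field names here are illustrative only:

#include <linux/hashtable.h>

/* a fixed-size table with 2^7 buckets, defined at compile time */
static DEFINE_HASHTABLE(example_table, 7);

struct example_item {
        unsigned long key;
        struct hlist_node hlist;        /* hook used by the helpers */
};

static void example_usage(struct example_item *item, unsigned long key)
{
        struct example_item *cur;
        struct hlist_node *node;
        int bkt;

        /* insert: the caller computes and supplies the hash key */
        hash_add(example_table, &item->hlist, key);

        /* lookup: walk only the bucket that 'key' hashes to */
        hash_for_each_possible(example_table, cur, node, hlist, key)
                if (cur->key == key)
                        break;

        /* full walk over every bucket */
        hash_for_each(example_table, bkt, node, cur, hlist)
                pr_debug("visiting item with key %lu\n", cur->key);

        /* removal */
        hash_del(&item->hlist);
}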

[PATCH repost] blkcg: fix "scheduling while atomic" in blk_queue_bypass_start

2013-01-07 Thread Jun'ichi Nomura
With 749fefe677 in v3.7 ("block: lift the initial queue bypass mode
on blk_register_queue() instead of blk_init_allocated_queue()"),
the following warning appears when multipath is used with CONFIG_PREEMPT=y.

This patch moves blk_queue_bypass_start() before radix_tree_preload()
to avoid the sleeping call while preemption is disabled.
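On success, radix_tree_preload() returns with preemption disabled until
radix_tree_preload_end(), so nothing that can sleep (such as the
synchronize_rcu() inside blk_queue_bypass_start()) may run in between. A
rough sketch of the ordering after this patch (illustration only, simplified
from the real function):

static int example_activate(struct request_queue *q)
{
        bool preloaded;

        /* sleeping calls first, while preemption is still enabled */
        blk_queue_bypass_start(q);              /* may call synchronize_rcu() */

        /* on success this returns with preemption disabled */
        preloaded = !radix_tree_preload(GFP_KERNEL);

        spin_lock_irq(q->queue_lock);
        /* ... create and insert the root blkg, count existing blkgs ... */
        spin_unlock_irq(q->queue_lock);

        if (preloaded)
                radix_tree_preload_end();       /* re-enables preemption */

        blk_queue_bypass_end(q);
        return 0;
}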

  BUG: scheduling while atomic: multipath/2460/0x0002
  1 lock held by multipath/2460:
   #0:  (>type_lock){..}, at: [] 
dm_lock_md_type+0x17/0x19 [dm_mod]
  Modules linked in: ...
  Pid: 2460, comm: multipath Tainted: GW3.7.0-rc2 #1
  Call Trace:
   [] __schedule_bug+0x6a/0x78
   [] __schedule+0xb4/0x5e0
   [] schedule+0x64/0x66
   [] schedule_timeout+0x39/0xf8
   [] ? put_lock_stats+0xe/0x29
   [] ? lock_release_holdtime+0xb6/0xbb
   [] wait_for_common+0x9d/0xee
   [] ? try_to_wake_up+0x206/0x206
   [] ? kfree_call_rcu+0x1c/0x1c
   [] wait_for_completion+0x1d/0x1f
   [] wait_rcu_gp+0x5d/0x7a
   [] ? wait_rcu_gp+0x7a/0x7a
   [] ? complete+0x21/0x53
   [] synchronize_rcu+0x1e/0x20
   [] blk_queue_bypass_start+0x5d/0x62
   [] blkcg_activate_policy+0x73/0x270
   [] ? kmem_cache_alloc_node_trace+0xc7/0x108
   [] cfq_init_queue+0x80/0x28e
   [] ? dm_blk_ioctl+0xa7/0xa7 [dm_mod]
   [] elevator_init+0xe1/0x115
   [] ? blk_queue_make_request+0x54/0x59
   [] blk_init_allocated_queue+0x8c/0x9e
   [] dm_setup_md_queue+0x36/0xaa [dm_mod]
   [] table_load+0x1bd/0x2c8 [dm_mod]
   [] ctl_ioctl+0x1d6/0x236 [dm_mod]
   [] ? table_clear+0xaa/0xaa [dm_mod]
   [] dm_ctl_ioctl+0x13/0x17 [dm_mod]
   [] do_vfs_ioctl+0x3fb/0x441
   [] ? file_has_perm+0x8a/0x99
   [] sys_ioctl+0x5e/0x82
   [] ? trace_hardirqs_on_thunk+0x3a/0x3f
   [] system_call_fastpath+0x16/0x1b

Signed-off-by: Jun'ichi Nomura 
Acked-by: Vivek Goyal 
Cc: Tejun Heo 
Cc: Jens Axboe 
Cc: Alasdair G Kergon 
---
 block/blk-cgroup.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b8858fb..53628e4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -790,10 +790,10 @@ int blkcg_activate_policy(struct request_queue *q,
if (!blkg)
return -ENOMEM;
 
-   preloaded = !radix_tree_preload(GFP_KERNEL);
-
blk_queue_bypass_start(q);
 
+   preloaded = !radix_tree_preload(GFP_KERNEL);
+
/* make sure the root blkg exists and count the existing blkgs */
spin_lock_irq(q->queue_lock);
 


Re: [PATCH 1/2] Add mempressure cgroup

2013-01-07 Thread Anton Vorontsov
On Mon, Jan 07, 2013 at 05:51:46PM +0900, Kamezawa Hiroyuki wrote:
[...]
> I'm just curious..

Thanks for taking a look! :)

[...]
> > +/*
> > + * The window size is the number of scanned pages before we try to analyze
> > + * the scanned/reclaimed ratio (or difference).
> > + *
> > + * It is used as a rate-limit tunable for the "low" level notification,
> > + * and for averaging medium/oom levels. Using small window sizes can cause
> > + * lot of false positives, but too big window size will delay the
> > + * notifications.
> > + */
> > +static const uint vmpressure_win = SWAP_CLUSTER_MAX * 16;
> > +static const uint vmpressure_level_med = 60;
> > +static const uint vmpressure_level_oom = 99;
> > +static const uint vmpressure_level_oom_prio = 4;
> > +
> 
> Hmm... isn't this window size too small ?
> If vmscan cannot find a reclaimable page while scanning 2M of pages in a zone,
> oom notify will be returned. Right ?

Yup, you are right, if we were not able to find anything within the window
size (which is 2M, but see below), then it is effectively the "OOM level".
The thing is, the vmpressure reports... the pressure. :) Or, the
allocation cost, and if the cost becomes high, it is no good.
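For the record, the 2M figure falls straight out of the defaults quoted
above. A quick userspace check, assuming 4 KiB pages and SWAP_CLUSTER_MAX
== 32 (both values are assumptions here, but they are the common ones):

#include <stdio.h>

int main(void)
{
        const unsigned int swap_cluster_max = 32;       /* assumed */
        const unsigned int page_size = 4096;            /* assumed */
        const unsigned int win = swap_cluster_max * 16; /* vmpressure_win */

        /* prints: window = 512 pages = 2048 KiB */
        printf("window = %u pages = %u KiB\n", win, win * page_size / 1024);
        return 0;
}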

The 2M is, of course, not ideal. And the "ideal" depends on many factors,
alike to vmstat. And, actually I dream about deriving the window size from
zone->stat_threshold, which would make the window automatically adjustable
for different "machine sizes" (as we do in calculate_normal_threshold(),
in vmstat.c).

But again, this is all "implementation details"; tunable stuff that we can
either adjust ourselves as needed, or try to be smart, i.e. apply some
heuristics, again, as in vmstat.

Thanks,
Anton


Re: [PATCH v6 3/4] block: implement runtime pm strategy

2013-01-07 Thread Aaron Lu
On 01/08/2013 01:21 AM, Alan Stern wrote:
> On Sun, 6 Jan 2013, Aaron Lu wrote:
> 
>> From: Lin Ming 
>>
>> When a request is added:
>> If device is suspended or is suspending and the request is not a
>> PM request, resume the device.
>>
>> When the last request finishes:
>> Call pm_runtime_mark_last_busy() and pm_runtime_autosuspend().
>>
>> When pick a request:
>> If device is resuming/suspending, then only PM request is allowed to go.
>> Return NULL for other cases.
>>
>> [aaron...@intel.com: PM request does not involve nr_pending counting]
>> [aaron...@intel.com: No need to check q->dev]
>> [aaron...@intel.com: Autosuspend when the last request finished]
>> Signed-off-by: Lin Ming 
>> Signed-off-by: Aaron Lu 
> 
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -974,6 +974,40 @@ extern int blk_pre_runtime_suspend(struct request_queue *q);
>>  extern void blk_post_runtime_suspend(struct request_queue *q, int err);
>>  extern void blk_pre_runtime_resume(struct request_queue *q);
>>  extern void blk_post_runtime_resume(struct request_queue *q, int err);
>> +
>> +static inline void blk_pm_put_request(struct request *rq)
>> +{
>> +if (!(rq->cmd_flags & REQ_PM) && !--rq->q->nr_pending) {
>> +pm_runtime_mark_last_busy(rq->q->dev);
>> +pm_runtime_autosuspend(rq->q->dev);
>> +}
>> +}
>> +
>> +static inline struct request *blk_pm_peek_request(
>> +struct request_queue *q, struct request *rq)
>> +{
>> +if (q->rpm_status == RPM_SUSPENDED ||
>> +  (q->rpm_status != RPM_ACTIVE && !(rq->cmd_flags & REQ_PM)))
>> +return NULL;
>> +else
>> +return rq;
>> +}
>> +
>> +static inline void blk_pm_requeue_request(struct request *rq)
>> +{
>> +if (!(rq->cmd_flags & REQ_PM))
>> +rq->q->nr_pending--;
>> +}
>> +
>> +static inline void blk_pm_add_request(struct request_queue *q,
>> +struct request *rq)
>> +{
>> +if (!(rq->cmd_flags & REQ_PM) &&
>> +q->nr_pending++ == 0 &&
>> +(q->rpm_status == RPM_SUSPENDED ||
>> + q->rpm_status == RPM_SUSPENDING))
>> +pm_request_resume(q->dev);
>> +}
> 
> These routines also don't belong in include/linux.  And they don't need 
> to be marked inline.

OK, will move them.

What about creating a new file, blk-pm.c, for all this block PM related
code?

Thanks,
Aaron



[PATCH V2 1/2] pinctrl: tegra: add support for rcv-sel and drive type

2013-01-07 Thread Laxman Dewangan
From: Pritesh Raithatha 

NVIDIA's Tegra114 adds two more configuration parameters in pinmux, i.e.
rcv-sel and drive type.

rcv-sel: Select between High and Normal VIL/VIH receivers.
RCVR_SEL=1: High VIL/VIH
RCVR_SEL=0: Normal VIL/VIH

drv_type: Output drive type:
33-50 ohm driver: 0x1
66-100ohm driver: 0x0

Add support for configuring these parameters from the DTS file.

Tegra20 and Tegra30 do not support this configuration and hence initialize
their pinmux structures with reg = -1.

Originally written by Pritesh Raithatha.
Changes by ldewangan:
- remove drvtype_width as it is always 2.
- Better describe the change.

Signed-off-by: Pritesh Raithatha 
Signed-off-by: Laxman Dewangan 
Reviewed-by: Stephen Warren 
---
Changes from V1:
- none in code; added Stephen's Reviewed-by.

 drivers/pinctrl/pinctrl-tegra.c   |   14 ++
 drivers/pinctrl/pinctrl-tegra.h   |   16 
 drivers/pinctrl/pinctrl-tegra20.c |6 ++
 drivers/pinctrl/pinctrl-tegra30.c |4 
 4 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-tegra.c b/drivers/pinctrl/pinctrl-tegra.c
index ae1e4bb..f195d77 100644
--- a/drivers/pinctrl/pinctrl-tegra.c
+++ b/drivers/pinctrl/pinctrl-tegra.c
@@ -201,6 +201,7 @@ static const struct cfg_param {
{"nvidia,open-drain",   TEGRA_PINCONF_PARAM_OPEN_DRAIN},
{"nvidia,lock", TEGRA_PINCONF_PARAM_LOCK},
{"nvidia,io-reset", TEGRA_PINCONF_PARAM_IORESET},
+   {"nvidia,rcv-sel",  TEGRA_PINCONF_PARAM_RCV_SEL},
{"nvidia,high-speed-mode",  TEGRA_PINCONF_PARAM_HIGH_SPEED_MODE},
{"nvidia,schmitt",  TEGRA_PINCONF_PARAM_SCHMITT},
{"nvidia,low-power-mode",   TEGRA_PINCONF_PARAM_LOW_POWER_MODE},
@@ -208,6 +209,7 @@ static const struct cfg_param {
{"nvidia,pull-up-strength", TEGRA_PINCONF_PARAM_DRIVE_UP_STRENGTH},
{"nvidia,slew-rate-falling",TEGRA_PINCONF_PARAM_SLEW_RATE_FALLING},
{"nvidia,slew-rate-rising", TEGRA_PINCONF_PARAM_SLEW_RATE_RISING},
+   {"nvidia,drive-type",   TEGRA_PINCONF_PARAM_DRIVE_TYPE},
 };
 
 static int tegra_pinctrl_dt_subnode_to_map(struct device *dev,
@@ -450,6 +452,12 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx,
*bit = g->ioreset_bit;
*width = 1;
break;
+   case TEGRA_PINCONF_PARAM_RCV_SEL:
+   *bank = g->rcv_sel_bank;
+   *reg = g->rcv_sel_reg;
+   *bit = g->rcv_sel_bit;
+   *width = 1;
+   break;
case TEGRA_PINCONF_PARAM_HIGH_SPEED_MODE:
*bank = g->drv_bank;
*reg = g->drv_reg;
@@ -492,6 +500,12 @@ static int tegra_pinconf_reg(struct tegra_pmx *pmx,
*bit = g->slwr_bit;
*width = g->slwr_width;
break;
+   case TEGRA_PINCONF_PARAM_DRIVE_TYPE:
+   *bank = g->drvtype_bank;
+   *reg = g->drvtype_reg;
+   *bit = g->drvtype_bit;
+   *width = 2;
+   break;
default:
dev_err(pmx->dev, "Invalid config param %04x\n", param);
return -ENOTSUPP;
diff --git a/drivers/pinctrl/pinctrl-tegra.h b/drivers/pinctrl/pinctrl-tegra.h
index 62e3809..817f706 100644
--- a/drivers/pinctrl/pinctrl-tegra.h
+++ b/drivers/pinctrl/pinctrl-tegra.h
@@ -30,6 +30,8 @@ enum tegra_pinconf_param {
/* argument: Boolean */
TEGRA_PINCONF_PARAM_IORESET,
/* argument: Boolean */
+   TEGRA_PINCONF_PARAM_RCV_SEL,
+   /* argument: Boolean */
TEGRA_PINCONF_PARAM_HIGH_SPEED_MODE,
/* argument: Boolean */
TEGRA_PINCONF_PARAM_SCHMITT,
@@ -43,6 +45,8 @@ enum tegra_pinconf_param {
TEGRA_PINCONF_PARAM_SLEW_RATE_FALLING,
/* argument: Integer, range is HW-dependant */
TEGRA_PINCONF_PARAM_SLEW_RATE_RISING,
+   /* argument: Integer, range is HW-dependant */
+   TEGRA_PINCONF_PARAM_DRIVE_TYPE,
 };
 
 enum tegra_pinconf_pull {
@@ -95,6 +99,9 @@ struct tegra_function {
  * @ioreset_reg:   IO reset register offset. -1 if unsupported.
  * @ioreset_bank:  IO reset register bank. 0 if unsupported.
  * @ioreset_bit:   IO reset register bit. 0 if unsupported.
+ * @rcv_sel_reg:   Receiver select offset. -1 if unsupported.
+ * @rcv_sel_bank:  Receiver select bank. 0 if unsupported.
+ * @rcv_sel_bit:   Receiver select bit. 0 if unsupported.
  * @drv_reg:   Drive fields register offset. -1 if unsupported.
  * This register contains the hsm, schmitt, lpmd, drvdn,
  * drvup, slwr, and slwf parameters.
@@ -110,6 +117,9 @@ struct tegra_function {
  * @slwr_width:Slew Rising field width. 0 if unsupported.
  * @slwf_bit:  Slew Falling register bit. 0 if unsupported.
  * @slwf_width:Slew Falling 

Re: [PATCH v6 0/4] block layer runtime pm

2013-01-07 Thread Aaron Lu
On 01/08/2013 01:11 AM, Alan Stern wrote:
> On Sun, 6 Jan 2013, Aaron Lu wrote:
> 
>> In August 2010, Jens and Alan discussed about "Runtime PM and the block
>> layer". http://marc.info/?t=12825910841=1=2
>> And then Alan has given a detailed implementation guide:
>> http://marc.info/?l=linux-scsi=133727953625963=2
>>
>> To test:
>> # ls -l /sys/block/sda
>> /sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
>>
>> # echo 1 > /sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/autosuspend_delay_ms
>> # echo auto > /sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/control
>> Then you'll see sda is suspended after 10secs idle.
>>
>> [ 1767.680192] sd 2:0:0:0: [sda] Synchronizing SCSI cache
>> [ 1767.680317] sd 2:0:0:0: [sda] Stopping disk
>>
>> And if you do some IO, it will resume immediately.
>> [ 1791.052438] sd 2:0:0:0: [sda] Starting disk
>>
>> For test, I often set the autosuspend time to 1 second. If you are using
>> a GUI, the 10 seconds delay may be too long that the disk can not enter
>> runtime suspended state.
>>
>> Note that sd's runtime suspend callback will dump some kernel messages
>> and the syslog daemon will write kernel message to /var/log/messages,
>> making the disk instantly resume after suspended. So for test, the
>> syslog daemon should better be temporarily stopped.
>>
>> v6:
>> Take over from Lin Ming.
>>
>> - Instead of put the device into autosuspend state in
>>   blk_post_runtime_suspend, do it when the last request is finished.
>>   This can also solve the problem illustrated below:
>>
>>   thread A thread B
>> |suspend timer expired   |
>> |  ... ...   |a new request comes in,
>> |  ... ...   |blk_pm_add_request
>> |  ... ...   |skip request_resume due to
>> |  ... ...   |q->status is still RPM_ACTIVE
>> |  rpm_suspend   |  ... ...
>> |scsi_runtime_suspend|  ... ...
>> |  blk_pre_runtime_suspend   |  ... ...
>> |  return -EBUSY due to nr_pending   |  ... ...
>> |  rpm_suspend done  |  ... ...
>> ||blk_pm_put_request, mark last busy
>>
>> But no more trigger point, and the device will stay at RPM_ACTIVE state.
>> Run pm_runtime_autosuspend after the last request is finished solved
>> this problem.
> 
> This doesn't look like the best solution, because it involves adding a 
> nontrivial routine (pm_runtime_autosuspend) to a hot path.

Oh right, I didn't realize this. Thanks for pointing this out.

> 
> How about this instead?  When blk_pre_runtime_suspend returns -EBUSY,
> have it do a mark-last-busy.  Then rpm_suspend will automatically
> reschedule the autosuspend for later.

Yes, this is better.
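A rough sketch of what that could look like, reusing the request_queue
fields from the helpers quoted in the 3/4 review (an illustration of the
suggestion, not the actual code from this series):

int blk_pre_runtime_suspend(struct request_queue *q)
{
        int ret = 0;

        spin_lock_irq(q->queue_lock);
        if (q->nr_pending) {
                ret = -EBUSY;
                /* let the runtime PM core reschedule the autosuspend */
                pm_runtime_mark_last_busy(q->dev);
        } else {
                q->rpm_status = RPM_SUSPENDING;
        }
        spin_unlock_irq(q->queue_lock);

        return ret;
}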

> 
>> - Requests which have the REQ_PM flag should not involve nr_pending
>>   counting, or we may lose the condition to resume the device:
>>   Suppose queue is active and nr_pending is 0. Then a REQ_PM request
>>   comes and nr_pending will be increased to 1, but since the request has
>>   REQ_PM flag, it will not cause resume. Before it is finished, a normal
>>   request comes in, and since nr_pending is 1 now, it will not trigger
>>   the resume of the device either. Bug.
>>
>> - Do not quiesce the device in scsi bus level runtime suspend callback.
>>   Since the only reason the device is to be runtime suspended is due to
>>   no more requests pending for it, quiesce it is pointless.
>>
>> - Remove scsi_autopm_* from sd_check_events as we are request driven.
>>
>> - Call blk_pm_runtime_init in scsi_sysfs_initialize_dev, so that we do
>>   not need to check queue's device in blk_pm_add/put_request.
> 
> I think you still need to have that check.  After all, the block layer 
> has other users besides the SCSI stack, and those users don't call 
> blk_pm_runtime_init.

Right...

So this also reminds me that as long as CONFIG_PM_RUNTIME is selected,
the blk_pm_add/put/peek_request functions will be in the block IO path.
Shall we introduce a new config option to selectively build block
runtime PM functionality? something like CONFIG_BLK_PM_RUNTIME perhaps?

Just some condition checks in those functions, not sure if it is worth a
new config though. Please suggest, thanks.

> 
>> - Do not mark last busy and initiate an autosuspend for the device in
>>   blk_pm_runtime_init function.
>>
>> - Do not mark last busy and initiate an autosuspend for the device in
>>   block_post_runtime_resume, as when the request that triggered the
>>   resume finished, the blk_pm_put_request will mark last busy and
>>   initiate an autosuspend.
> 
> If you make the change that I recommended above then this is still 
> necessary.

Yes, they are needed. Thanks!

-Aaron


[PATCH 4/4] gpiolib: add documentation for new gpiod_ API

2013-01-07 Thread Alexandre Courbot
Signed-off-by: Alexandre Courbot 
---
 Documentation/gpio.txt | 94 --
 1 file changed, 92 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt
index 77a1d11..871ccda 100644
--- a/Documentation/gpio.txt
+++ b/Documentation/gpio.txt
@@ -120,7 +120,8 @@ example, a number might be valid but temporarily unused on a given board.
 
 Whether a platform supports multiple GPIO controllers is a platform-specific
 implementation issue, as are whether that support can leave "holes" in the 
space
-of GPIO numbers, and whether new controllers can be added at runtime.  Such 
issues
+of GPIO numbers, and whether new controllers can be added at runtime.  Such
+issues
 can affect things including whether adjacent GPIO numbers are both valid.
 
 Using GPIOs
@@ -302,7 +303,8 @@ are claimed, three additional calls are defined:
 * 'flags', identical to gpio_request() wrt other arguments and
 * return value
 */
-   int gpio_request_one(unsigned gpio, unsigned long flags, const char 
*label);
+   int gpio_request_one(unsigned gpio, unsigned long flags, const char
+*label);
 
/* request multiple GPIOs in a single call
 */
@@ -773,3 +775,91 @@ differences between boards from user space.  This only affects the
 sysfs interface.  Polarity change can be done both before and after
 gpio_export(), and previously enabled poll(2) support for either
 rising or falling edge will be reconfigured to follow this setting.
+
+GPIO descriptor interface
+=
+A more secure, alternative GPIO interface is available through GPIOlib. Instead
+of relying on integers (which can easily be forged and used without being
+properly requested) to reference GPIOs, it uses a system of opaque descriptors
+that must be properly obtained and disposed through the common get/put set of
+functions. This ensures that all GPIO descriptors are valid at any time and
+makes it unnecessary to check the validity of a GPIO. Apart from this
+difference, the interface is similar to the integer-based one, excepted that 
the
+gpio_ prefix is changed to gpiod_.
+
+This interface can be used in conjonction with the integer-based API, however
+new drivers should really try to use the safer descriptor-based interface.
+Drivers using this interface should depend on CONFIG_GPIOLIB being set, as it 
is
+only available when GPIOlib is compiled in.
+
+Using GPIOs
+---
+GPIO consumers should include  which declares the
+consumer-side GPIO functions. GPIOs are obtained through gpiod_get:
+
+   struct gpio_desc *gpiod_get(struct device *dev,
+   const char *con_id);
+
+This will return the GPIO descriptor corresponding to the con_id function of
+dev, or an error code in case of error. A devm variant is also available:
+
+   struct gpio_desc *devm_gpiod_get(struct device *dev,
+const char *con_id);
+
+GPIO descriptors are disposed using the corresponding put functions:
+
+   void gpiod_put(struct gpio_desc *desc);
+   void devm_gpiod_put(struct device *dev, struct gpio_desc *desc);
+
+A valid descriptor can then be used with one of the gpiod_ functions. Their
+interface is identical to the integer-based API, excepted that they take a GPIO
+descriptor instead of an integer:
+
+   int gpiod_direction_input(struct gpio_desc *desc);
+   int gpiod_direction_output(struct gpio_desc *desc, int value);
+   int gpiod_get_value(struct gpio_desc *desc);
+   void gpiod_set_value(struct gpio_desc *desc, int value);
+   ...
+
+If you need to convert a descriptor to an integer or vice-versa, you can use
+gpio_to_desc or desc_to_gpio:
+
+   struct gpio_desc *gpio_to_desc(unsigned gpio);
+   int desc_to_gpio(const struct gpio_desc *desc);
+
+The same GPIO can be used by both interfaces as long as it has properly been
+acquired by one of them (i.e. using either gpio_request() or gpiod_get()).
+
+Declaring GPIOs
+---
+GPIOs can be made available for devices either through board-specific lookup
+tables, or using the device tree.
+
+Board-specific lookup tables match a device name and consumer ID to a GPIO chip
+and GPIO number relative to that chip. They are declared as follows:
+
+   static struct gpio_lookup board_gpio_lookup[] = {
+   GPIO_LOOKUP("tegra-gpio", 28, "backlight.1", "power"),
+   };
+
+   static void __init board_init(void)
+   {
+   ...
+   gpiod_add_table(board_gpio_lookup,
+   ARRAY_SIZE(board_gpio_lookup));
+   ...
+   }
+
+In the driver side, the following code:
+
+   gpiod_get(dev, "power");
+
+will return the descriptor for GPIO 28 of the "tegra-gpio" chip provided
+strcmp(dev_name(dev), "backlight.1") == 0.
+
+If the device tree is used, then the same "power" GPIO can be declared into the
+device's 

[PATCH 1/4] gpiolib: introduce descriptor-based GPIO interface

2013-01-07 Thread Alexandre Courbot
With the current API, GPIOs are represented by a unique integer. This
causes problems in terms of security (GPIO numbers can be arbitrarily
forged and used without proper allocation) and flexibility (the maximum
number of GPIOs that the system can handle is fixed at compilation time
and a static array of that size is allocated to store the GPIO
descriptors).

This patch introduces an alternative GPIO API that uses opaque handlers
and refactors GPIOlib's internals to work with these handlers instead of
GPIO numbers. The former integer-based API is still available as a light
wrapper around this new API.

Signed-off-by: Alexandre Courbot 
---
 drivers/gpio/devres.c |  59 +++--
 drivers/gpio/gpiolib.c| 486 +++---
 include/asm-generic/gpio.h| 176 ++-
 include/linux/gpio/consumer.h |  64 ++
 include/linux/gpio/driver.h   | 112 ++
 5 files changed, 552 insertions(+), 345 deletions(-)
 create mode 100644 include/linux/gpio/consumer.h
 create mode 100644 include/linux/gpio/driver.h

diff --git a/drivers/gpio/devres.c b/drivers/gpio/devres.c
index 1077754..b7a9ad6 100644
--- a/drivers/gpio/devres.c
+++ b/drivers/gpio/devres.c
@@ -15,22 +15,24 @@
  */
 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 
 static void devm_gpio_release(struct device *dev, void *res)
 {
-   unsigned *gpio = res;
+   struct gpio_desc **desc = res;
 
-   gpio_free(*gpio);
+   gpiod_put(*desc);
 }
 
 static int devm_gpio_match(struct device *dev, void *res, void *data)
 {
-   unsigned *this = res, *gpio = data;
+   struct gpio_desc **this = res, **desc = data;
 
-   return *this == *gpio;
+   return *this == *desc;
 }
 
 /**
@@ -50,10 +52,11 @@ static int devm_gpio_match(struct device *dev, void *res, void *data)
 
 int devm_gpio_request(struct device *dev, unsigned gpio, const char *label)
 {
-   unsigned *dr;
+   struct gpio_desc **dr;
int rc;
 
-   dr = devres_alloc(devm_gpio_release, sizeof(unsigned), GFP_KERNEL);
+   dr = devres_alloc(devm_gpio_release, sizeof(struct gpio_desc *),
+ GFP_KERNEL);
if (!dr)
return -ENOMEM;
 
@@ -63,7 +66,7 @@ int devm_gpio_request(struct device *dev, unsigned gpio, const char *label)
return rc;
}
 
-   *dr = gpio;
+   *dr = gpio_to_desc(gpio);
devres_add(dev, dr);
 
return 0;
@@ -80,10 +83,11 @@ EXPORT_SYMBOL(devm_gpio_request);
 int devm_gpio_request_one(struct device *dev, unsigned gpio,
  unsigned long flags, const char *label)
 {
-   unsigned *dr;
+   struct gpio_desc **dr;
int rc;
 
-   dr = devres_alloc(devm_gpio_release, sizeof(unsigned), GFP_KERNEL);
+   dr = devres_alloc(devm_gpio_release, sizeof(struct gpio_desc *),
+ GFP_KERNEL);
if (!dr)
return -ENOMEM;
 
@@ -93,7 +97,7 @@ int devm_gpio_request_one(struct device *dev, unsigned gpio,
return rc;
}
 
-   *dr = gpio;
+   *dr = gpio_to_desc(gpio);
devres_add(dev, dr);
 
return 0;
@@ -112,8 +116,39 @@ EXPORT_SYMBOL(devm_gpio_request_one);
  */
 void devm_gpio_free(struct device *dev, unsigned int gpio)
 {
+   struct gpio_desc *desc = gpio_to_desc(gpio);
 
-   WARN_ON(devres_release(dev, devm_gpio_release, devm_gpio_match,
-   ));
+   devm_gpiod_put(dev, desc);
 }
 EXPORT_SYMBOL(devm_gpio_free);
+
+struct gpio_desc *__must_check devm_gpiod_get(struct device *dev,
+const char *con_id)
+{
+   struct gpio_desc **dr;
+   struct gpio_desc *desc;
+
+   dr = devres_alloc(devm_gpio_release, sizeof(struct gpio_desc *),
+ GFP_KERNEL);
+   if (!dr)
+   return ERR_PTR(-ENOMEM);
+
+   desc = gpiod_get(dev, con_id);
+   if (IS_ERR_OR_NULL(desc)) {
+   devres_free(dr);
+   return desc;
+   }
+
+   *dr = desc;
+   devres_add(dev, dr);
+
+   return 0;
+}
+EXPORT_SYMBOL(devm_gpiod_get);
+
+void devm_gpiod_put(struct device *dev, struct gpio_desc *desc)
+{
+   WARN_ON(devres_release(dev, devm_gpio_release, devm_gpio_match,
+   ));
+}
+EXPORT_SYMBOL(devm_gpiod_put);
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 199fca1..d04b90b 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -76,6 +78,9 @@ static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];
 static DEFINE_IDR(dirent_idr);
 #endif
 
+static int gpiod_request(struct gpio_desc *desc, const char *label);
+static void gpiod_free(struct gpio_desc *desc);
+
 static inline void desc_set_label(struct gpio_desc *d, const char *label)
 {
 #ifdef CONFIG_DEBUG_FS
@@ -83,6 +88,38 @@ static inline void 

[PATCH 2/4] gpiolib: add gpiod_get and gpiod_put functions

2013-01-07 Thread Alexandre Courbot
Adds new GPIO allocation functions that work with the opaque descriptor
interface.

Signed-off-by: Alexandre Courbot 
---
 drivers/gpio/gpiolib.c| 164 ++
 include/linux/gpio/consumer.h |   8 +++
 include/linux/gpio/driver.h   |  21 ++
 3 files changed, 193 insertions(+)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d04b90b..06ffadb 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -74,6 +75,11 @@ struct gpio_desc {
 };
 static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];
 
+/* protects both gpio_lookup_list and gpio_chips */
+static DEFINE_MUTEX(gpio_lookup_lock);
+static LIST_HEAD(gpio_lookup_list);
+static LIST_HEAD(gpio_chips);
+
 #ifdef CONFIG_GPIO_SYSFS
 static DEFINE_IDR(dirent_idr);
 #endif
@@ -1162,6 +1168,11 @@ int gpiochip_add(struct gpio_chip *chip)
 
of_gpiochip_add(chip);
 
+   INIT_LIST_HEAD(>list);
+   mutex_lock(_lookup_lock);
+   list_add(>list, _chips);
+   mutex_unlock(_lookup_lock);
+
 unlock:
spin_unlock_irqrestore(_lock, flags);
 
@@ -1200,6 +1211,10 @@ int gpiochip_remove(struct gpio_chip *chip)
 
spin_lock_irqsave(_lock, flags);
 
+   mutex_lock(_lookup_lock);
+   list_del_init(>list);
+   mutex_unlock(_lookup_lock);
+
gpiochip_remove_pin_ranges(chip);
of_gpiochip_remove(chip);
 
@@ -1924,6 +1939,155 @@ void gpiod_set_value_cansleep(struct gpio_desc *desc, int value)
 }
 EXPORT_SYMBOL_GPL(gpiod_set_value_cansleep);
 
+
+/**
+ * gpioh_add_table() - register GPIO device consumers
+ * @table: array of consumers to register
+ * @num: number of consumers in table
+ */
+void gpiod_add_table(struct gpiod_lookup *table, size_t size)
+{
+   mutex_lock(_lookup_lock);
+
+   while (size--) {
+   list_add_tail(>list, _lookup_list);
+   table++;
+   }
+
+   mutex_unlock(_lookup_lock);
+}
+
+/*
+ * Caller must have a acquired gpio_lookup_lock
+ */
+static struct gpio_chip *find_chip_by_name(const char *name)
+{
+   struct gpio_chip *chip = NULL;
+
+   list_for_each_entry(chip, _lookup_list, list) {
+   if (chip->label == NULL)
+   continue;
+   if (!strcmp(chip->label, name))
+   break;
+   }
+
+   return chip;
+}
+
+#ifdef CONFIG_OF
+static struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id)
+{
+   char prop_name[32]; /* 32 is max size of property name */
+
+   if (con_id)
+   snprintf(prop_name, 32, "%s-gpios", con_id);
+   else
+   snprintf(prop_name, 32, "gpios");
+
+   return of_get_named_gpiod_flags(dev->of_node, prop_name, 0, NULL);
+}
+#else
+static struct device_node *of_find_gpio(struct device *dev, const char *id)
+{
+   return NULL;
+}
+#endif
+
+static struct gpio_desc *gpiod_find(struct device *dev, const char *con_id)
+{
+   const char *dev_id = dev ? dev_name(dev) : NULL;
+   struct gpio_desc *desc = ERR_PTR(-ENODEV);
+   unsigned int match, best = 0;
+   struct gpiod_lookup *p;
+
+   mutex_lock(_lookup_lock);
+
+   list_for_each_entry(p, _lookup_list, list) {
+   match = 0;
+
+   if (p->dev_id) {
+   if (!dev_id || strcmp(p->dev_id, dev_id))
+   continue;
+
+   match += 2;
+   }
+
+   if (p->con_id) {
+   if (!con_id || strcmp(p->con_id, con_id))
+   continue;
+
+   match += 1;
+   }
+
+   if (match > best) {
+   struct gpio_chip *chip;
+
+   chip = find_chip_by_name(p->chip_label);
+   if (!chip) {
+   dev_warn(dev, "cannot find GPIO chip %s\n",
+p->chip_label);
+   continue;
+   }
+
+   if (chip->ngpio >= p->chip_hwnum) {
+   dev_warn(dev, "GPIO chip %s has %d GPIOs\n",
+chip->label, chip->ngpio);
+   continue;
+   }
+
+   desc = gpio_to_desc(chip->base + p->chip_hwnum);
+
+   if (match != 3)
+   best = match;
+   else
+   break;
+   }
+   }
+
+   mutex_unlock(_lookup_lock);
+
+   return desc;
+}
+
+/**
+ *
+ */
+struct gpio_desc *__must_check gpiod_get(struct device *dev, const char *con_id)
+{
+   struct gpio_desc *desc;
+   int status;
+
+   dev_dbg(dev, "GPIO lookup for consumer %s\n", con_id);
+
+   /* Using device tree? */
+   if (IS_ENABLED(CONFIG_OF) 

[PATCH 3/4] gpiolib: of: convert OF helpers to descriptor API

2013-01-07 Thread Alexandre Courbot
Convert gpiolib-of.c's internals to rely on descriptors instead of
integers and add the gpiod_ counterparts of existing OF functions.

Signed-off-by: Alexandre Courbot 
---
 drivers/gpio/gpiolib-of.c | 26 +--
 include/linux/gpio/consumer.h | 56 +
 include/linux/gpio/driver.h   | 38 ++
 include/linux/of_gpio.h   | 73 ++-
 4 files changed, 120 insertions(+), 73 deletions(-)

diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index d542a14..8c9f8c5 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -22,12 +23,14 @@
 #include 
 #include 
 
+struct gpio_desc;
+
 /* Private data structure for of_gpiochip_find_and_xlate */
 struct gg_data {
enum of_gpio_flags *flags;
struct of_phandle_args gpiospec;
 
-   int out_gpio;
+   struct gpio_desc *out_gpio;
 };
 
 /* Private function for resolving node pointer to gpio_chip */
@@ -45,28 +48,31 @@ static int of_gpiochip_find_and_xlate(struct gpio_chip *gc, void *data)
if (ret < 0)
return false;
 
-   gg_data->out_gpio = ret + gc->base;
+   gg_data->out_gpio = gpio_to_desc(ret + gc->base);
return true;
 }
 
 /**
- * of_get_named_gpio_flags() - Get a GPIO number and flags to use with GPIO API
+ * of_get_named_gpiod_flags() - Get a GPIO descriptor and flags for GPIO API
  * @np:device node to get GPIO from
  * @propname:  property name containing gpio specifier(s)
  * @index: index of the GPIO
  * @flags: a flags pointer to fill in
  *
- * Returns GPIO number to use with Linux generic GPIO API, or one of the errno
+ * Returns GPIO descriptor to use with Linux GPIO API, or one of the errno
  * value on the error condition. If @flags is not NULL the function also fills
  * in flags for the GPIO.
  */
-int of_get_named_gpio_flags(struct device_node *np, const char *propname,
-   int index, enum of_gpio_flags *flags)
+struct gpio_desc *of_get_named_gpiod_flags(struct device_node *np,
+   const char *propname, int index, enum of_gpio_flags *flags)
 {
/* Return -EPROBE_DEFER to support probe() functions to be called
 * later when the GPIO actually becomes available
 */
-   struct gg_data gg_data = { .flags = flags, .out_gpio = -EPROBE_DEFER };
+   struct gg_data gg_data = {
+   .flags = flags,
+   .out_gpio = ERR_PTR(-EPROBE_DEFER),
+   };
int ret;
 
/* .of_xlate might decide to not fill in the flags, so clear it. */
@@ -77,13 +83,15 @@ int of_get_named_gpio_flags(struct device_node *np, const char *propname,
 _data.gpiospec);
if (ret) {
pr_debug("%s: can't parse gpios property\n", __func__);
-   return ret;
+   return ERR_PTR(ret);
}
 
gpiochip_find(_data, of_gpiochip_find_and_xlate);
 
of_node_put(gg_data.gpiospec.np);
-   pr_debug("%s exited with status %d\n", __func__, gg_data.out_gpio);
+
+   ret = PTR_RET(gg_data.out_gpio);
+   pr_debug("%s exited with status %d\n", __func__, ret);
return gg_data.out_gpio;
 }
 EXPORT_SYMBOL(of_get_named_gpio_flags);
diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h
index 2f30761..b6a3bc8 100644
--- a/include/linux/gpio/consumer.h
+++ b/include/linux/gpio/consumer.h
@@ -3,6 +3,8 @@
 
 #ifdef CONFIG_GPIOLIB
 
+#include 
+
 struct device;
 struct gpio_chip;
 
@@ -67,6 +69,60 @@ static inline void gpiod_unexport(struct gpio_desc *desc)
 
 #endif /* CONFIG_GPIO_SYSFS */
 
+
+struct device_node;
+
+/*
+ * This is Linux-specific flags. By default controllers' and Linux' mapping
+ * match, but GPIO controllers are free to translate their own flags to
+ * Linux-specific in their .xlate callback. Though, 1:1 mapping is recommended.
+ */
+enum of_gpio_flags {
+   OF_GPIO_ACTIVE_LOW = 0x1,
+};
+
+#ifdef CONFIG_OF_GPIO
+
+extern unsigned int of_gpio_named_count(struct device_node *np,
+   const char *propname);
+
+extern struct gpio_desc *of_get_named_gpiod_flags(struct device_node *np,
+   const char *list_name, int index, enum of_gpio_flags *flags);
+
+#else
+
+/* Drivers may not strictly depend on the GPIO support, so let them link. */
+static inline unsigned int of_gpio_named_count(struct device_node *np,
+  const char *propname)
+{
+   return 0;
+}
+
+static inline struct gpio_desc *of_get_named_gpiod_flags(struct device_node *np,
+   const char *list_name, int index, enum of_gpio_flags *flags)
+{
+   return ERR_PTR(-ENOSYS);
+}
+
+#endif /* CONFIG_OF_GPIO */
+
+static inline struct gpio_desc *of_get_gpiod_flags(struct device_node *np,
+ 

[PATCH 0/4] gpio: introduce descriptor-based interface

2013-01-07 Thread Alexandre Courbot
This series introduce a first take at implementing the RFC for the new GPIO API
that I submitted last month. It proposes a new, opaque descriptor-based GPIO API
that becomes available when GPIOlib is compiled, and provides a safer, more
abstract alternative to the current integer-based interface. GPIOlib internals
are also switched to use the descriptor logic, and the former integer API
becomes a lightweight wrapper around the new descriptor-based API.

Functionally speaking the new API is identical to the integer-based API, with
only the prefix changing from gpio_ to gpiod_. However, the second patch
introduces new functions for obtaining GPIOs from a device and a consumer name,
in a fashion similar to what is done with e.g. regulators and PWMs.
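A consumer-side sketch of that flow, using the functions documented in patch
4/4 of this series (error handling trimmed):

#include <linux/gpio/consumer.h>

static int example_probe(struct device *dev)
{
        struct gpio_desc *power;

        /* matched against a board lookup table or the device tree */
        power = gpiod_get(dev, "power");
        if (IS_ERR(power))
                return PTR_ERR(power);

        gpiod_direction_output(power, 1);
        /* ... use the GPIO ... */
        gpiod_set_value(power, 0);
        gpiod_put(power);

        return 0;
}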

GPIOs can then be provided either by board-specific lookup tables, or through
the device tree. Device tree lookup might require some attention as it does not
handle properties with multiple descriptors yet. Also, there is currently no
equivalent of gpio_request_array() and GPIOs can only be allocated one-by-one.
Feedback about the relevancy of batch requesting GPIOs is welcome.

This patch series also prepares GPIOlib for the next step, which is getting rid
of ARCH_NR_GPIOS and of the static array in GPIOlib and replace the latter with
per-chip arrays that are allocated when the chip is added. Some challenge may
arise from the fact that gpiochip_add is potentially called before kmalloc is
available.

Anyway, I expect this patchset to go through several iterations in order to
address the points mentioned above (and of course the ones I missed). As usual,
your valuable feedback is most welcome.

Alexandre Courbot (4):
  gpiolib: introduce descriptor-based GPIO interface
  gpiolib: add gpiod_get and gpiod_put functions
  gpiolib: of: convert OF helpers to descriptor API
  gpiolib: add documentation for new gpiod_ API

 Documentation/gpio.txt|  94 +-
 drivers/gpio/devres.c |  59 +++-
 drivers/gpio/gpiolib-of.c |  26 +-
 drivers/gpio/gpiolib.c| 648 --
 include/asm-generic/gpio.h| 176 
 include/linux/gpio/consumer.h | 128 +
 include/linux/gpio/driver.h   | 171 +++
 include/linux/of_gpio.h   |  73 +
 8 files changed, 956 insertions(+), 419 deletions(-)
 create mode 100644 include/linux/gpio/consumer.h
 create mode 100644 include/linux/gpio/driver.h

-- 
1.8.1



Re: [PATCH v4 RESEND] pwm: atmel: add Timer Counter Block PWM driver

2013-01-07 Thread Thierry Reding
On Thu, Dec 20, 2012 at 10:12:56AM +0100, Boris BREZILLON wrote:
> Hi,
> 
> Sorry for resend. The previous version still has alignment issues on 
> atmel_tcb_pwm_set_polarity, atmel_tcb_pwm_request and
> atmel_tcb_pwm_config function parameters.
> 
> This patch adds a PWM driver based on Atmel Timer Counter Block.
> Timer Counter Block is used in Waveform generator mode.
> 
> A Timer Counter Block provides up to 6 PWM devices grouped by 2:
> * group 0 = PWM 0 and 1
> * group 1 = PWM 1 and 2
> * group 2 = PMW 3 and 4

Should this be "PWM 2 and 3" and "PWM 4 and 5"? Or is PWM 1 shared
between groups 0 and 1?

> +static int atmel_tcb_pwm_request(struct pwm_chip *chip,
> +  struct pwm_device *pwm)
> +{
[...]
> + clk_enable(tc->clk[group]);

You need to check the return value of clk_enable(). There's always a
small possibility that it may fail.

> +static void atmel_tcb_pwm_disable(struct pwm_chip *chip, struct pwm_device 
> *pwm)
> +{
[...]
> + /* If duty is 0 reverse polarity */
> + if (tcbpwm->duty == 0)
> + polarity = !polarity;

Rather than commenting on what the code does, this should say why it
does so.

> +static int atmel_tcb_pwm_enable(struct pwm_chip *chip, struct pwm_device 
> *pwm)
> +{
[...]
> + /* If duty is 0 reverse polarity */
> + if (tcbpwm->duty == 0)
> + polarity = !polarity;

Same here.

> +static int atmel_tcb_pwm_probe(struct platform_device *pdev)
> +{
[...]
> + struct atmel_tcb_pwm_chip *tcbpwm;
> + struct device_node *np = pdev->dev.of_node;
> + struct atmel_tc *tc;
> + int err;
> + int tcblock;
> +
> + err = of_property_read_u32(np, "tc-block", );
> + if (err < 0) {
> + dev_err(>dev,
> + "failed to get tc block number from device tree (error: 
> %d)\n",

Maybe: "failed to get Timer Counter Block number..." to make it
consistent with the error message below:

> + tc = atmel_tc_alloc(tcblock, "tcb-pwm");
> + if (tc == NULL) {
> + dev_err(>dev, "failed to allocate Timer Counter Block\n");
> + return -ENOMEM;
> + }
[...]
> +static const struct of_device_id atmel_tcb_pwm_dt_ids[] = {
> + { .compatible = "atmel,tcb-pwm", },
> + { /* sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, mxs_pwm_dt_ids);

This is still wrong.

> +static struct platform_driver atmel_tcb_pwm_driver = {
> + .driver = {
> + .name = "atmel-tcb-pwm",
> + .of_match_table = atmel_tcb_pwm_dt_ids,
> + },
> + .probe = atmel_tcb_pwm_probe,
> + .remove = atmel_tcb_pwm_remove,
> +};
> +module_platform_driver(atmel_tcb_pwm_driver);
> +
> +MODULE_AUTHOR("Boris BREZILLON ");
> +MODULE_DESCRIPTION("Atmel Timer Counter Pulse Width Modulation Driver");
> +MODULE_ALIAS("platform:atmel-tcb-pwm");

I don't think you need MODULE_ALIAS() if the alias is the same as the
driver name.

Thierry




Re: [PATCH] spi: tegra: sequence compatible strings as per preference

2013-01-07 Thread Thierry Reding
On Wed, Dec 19, 2012 at 05:00:09PM +, Grant Likely wrote:
> On Sat, 10 Nov 2012 18:07:42 +0100, Thierry Reding 
>  wrote:
> > On Fri, Nov 09, 2012 at 10:28:38AM -0700, Stephen Warren wrote:
> > > On 11/09/2012 10:10 AM, Mark Brown wrote:
> > > > On Fri, Nov 09, 2012 at 10:04:56AM -0700, Stephen Warren wrote:
> > > > 
> > > >> However just FYI, it should not be necessary for correctness; The
> > > >> DT matching order is supposed to be driven purely by the order of
> > > >> the compatible values in the DT now, and not affected by the
> > > >> order of values in the table. (This wasn't always the case, but
> > > >> was a bug that was fixed IIRC by Thierry Reding).
> > > > 
> > > > I guess the driver is being used backported in older kernels which
> > > > don't have that fix?
> > > 
> > > That sounds likely. Laxman, it'd be a good idea to track down the fix
> > > to the DT matching code and backport it, so that hard-to debug issues
> > > aren't caused by the lack of that patch!
> > 
> > Unfortunately the patch that was supposed to fixed this caused a
> > regression and was therefore reverted. Rob (Cc'ed) said there was a
> > patch to fix it properly and was supposed to go into 3.6 but it seems
> > that never happened. Rob, what's the status on this?
> > 
> > The revert is here: bc51b0c22cebf5c311a6f1895fcca9f78efd0478
> 
> Rob, ping on this. I think we talked about it on IRC, but I cannot
> remember what was said; I must be getting old.

Any news on this one?

Thierry




Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread Rik van Riel

On 01/08/2013 12:09 AM, H. Peter Anvin wrote:

On 01/07/2013 09:08 PM, Rik van Riel wrote:

On 01/08/2013 12:03 AM, H. Peter Anvin wrote:

On 01/07/2013 08:55 PM, Shaohua Li wrote:


I searched a little bit, the change (doing TLB flush to clear access
bit) is
made between 2.6.7 - 2.6.8, I can't find the changelog, but I found a
patch:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7-rc2/2.6.7-rc2-mm2/broken-out/mm-flush-tlb-when-clearing-young.patch


The changelog claims this is for arm/ppc/ppc64.



Not really.  It says that those have stumbled over it already.  It is
true in general that this change will make very frequently used pages
(which stick in the TLB) candidates for eviction.


That is only true if the pages were to stay in the TLB for a
very very long time.  Probably multiple seconds.


x86 would seem to be just as affected, although possibly with a
different frequency.

Do we have any actual metrics on anything here?


I suspect that if we do need to force a TLB flush for page
reclaim purposes, it may make sense to do that TLB flush
asynchronously. For example, kswapd could kick off a TLB
flush of every CPU in the system once a second, when the
system is under pageout pressure.

We would have to do this in a smart way, so the kswapds
from multiple nodes do not duplicate the work.

If people want that kind of functionality, I would be
happy to cook up an RFC patch.



So it sounds like you're saying that this patch should never have been
applied in the first place?


It made sense at the time.

However, with larger SMP systems, we may need a different
mechanism to get the TLB flushes done after we clear a bunch
of accessed bits.

One thing we could do is mark bits in a bitmap, keeping track
of which CPUs should have their TLB flushed due to accessed bit
scanning.

Then we could set a timer for eg. a 1 second timeout, after
which the TLB flush IPIs get sent. If the timer is already
pending, we do not start it, but piggyback on the invocation
that is already scheduled to happen.
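A rough sketch of that idea (all names hypothetical, locking and races
glossed over): accessed-bit clearing would mark the CPUs whose TLBs may
still cache the old PTE, and a timer would batch one round of flush IPIs
roughly once a second:

static struct cpumask pending_tlb_flush_mask;

static void do_deferred_flush(void *unused)
{
        local_flush_tlb();      /* flush this CPU's TLB */
}

static void deferred_tlb_flush(unsigned long unused)
{
        on_each_cpu_mask(&pending_tlb_flush_mask, do_deferred_flush,
                         NULL, false);
        cpumask_clear(&pending_tlb_flush_mask);
}

static DEFINE_TIMER(deferred_tlb_timer, deferred_tlb_flush, 0, 0);

/* called where the accessed bit is cleared, instead of flushing right away */
static void note_deferred_tlb_flush(struct mm_struct *mm)
{
        cpumask_or(&pending_tlb_flush_mask, &pending_tlb_flush_mask,
                   mm_cpumask(mm));
        if (!timer_pending(&deferred_tlb_timer))
                mod_timer(&deferred_tlb_timer, jiffies + HZ);   /* ~1 second */
}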

Does something like that make sense?

--
All rights reversed


Re: [PATCH 1/2] cpuhotplug/nohz: Remove offline cpus from nohz-idle state

2013-01-07 Thread Srivatsa S. Bhat
On 01/05/2013 04:06 PM, Russell King - ARM Linux wrote:
> On Thu, Jan 03, 2013 at 06:58:38PM -0800, Srivatsa Vaddagiri wrote:
>> I also think that the
>> wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
>> busy-loop based one, as the wait there in general should be terminated within
>> few cycles.
> 
> Why open-code this stuff when we have infrastructure already in the kernel
> for waiting for stuff to happen?  I chose to use the standard infrastructure
> because its better tested, and avoids having to think about whether we need
> CPU barriers and such like to ensure that updates are seen in a timely
> manner.
> 
> My stance on a lot of this idle/cpu dying code is that much of it can
> probably be cleaned up and merged into a single common implementation -
> in which case the use of standard infrastructure for things like waiting
> for other CPUs do stuff is even more justified.

On similar lines, Nikunj (in CC) and I had posted a patchset sometime ago to
consolidate some of the CPU hotplug related code in the various architectures
into a common standard implementation [1].

However, we ended up hitting a problem with Xen, because its existing code
was unlike the other arch/ pieces [2]. At that time, we decided that we will
first make the CPU online and offline paths symmetric in the generic code and
then provide a common implementation of the duplicated bits in arch/, for the
new CPU hotplug model [3].

I guess we should probably revisit it sometime, consolidating the code in
incremental steps if not all at a time...

--
[1]. http://lwn.net/Articles/500185/
[2]. http://thread.gmane.org/gmane.linux.kernel.cross-arch/14342/focus=14430
[3]. http://thread.gmane.org/gmane.linux.kernel.cross-arch/14342/focus=15567

Regards,
Srivatsa S. Bhat



Re: [PATCH v4 1/3] videobuf2-dma-contig: user can specify GFP flags

2013-01-07 Thread Marek Szyprowski

Hello,

On 1/6/2013 6:29 PM, Federico Vaga wrote:

This is useful when you need to specify specific GFP flags during memory
allocation (e.g. GFP_DMA).

Signed-off-by: Federico Vaga 
---
  drivers/media/v4l2-core/videobuf2-dma-contig.c | 7 ++-
  include/media/videobuf2-dma-contig.h   | 5 +
  2 file modificati, 7 inserzioni(+), 5 rimozioni(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c 
b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 10beaee..bb411c0 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -21,10 +21,6 @@
  #include 
  #include 
  
-struct vb2_dc_conf {

-   struct device   *dev;
-};
-
  struct vb2_dc_buf {
struct device   *dev;
void*vaddr;
@@ -165,7 +161,8 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long 
size)
/* align image size to PAGE_SIZE */
size = PAGE_ALIGN(size);
  
-	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL);

+	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
+					GFP_KERNEL | conf->mem_flags);


I think we can add the GFP_DMA flag unconditionally to the vb2_dc_contig
allocator. It won't hurt existing clients, as most of today's platforms
don't have a DMA zone (GFP_DMA is ignored in that case), but it should fix
the issues with some older and non-standard systems.


if (!buf->vaddr) {
dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
kfree(buf);
diff --git a/include/media/videobuf2-dma-contig.h 
b/include/media/videobuf2-dma-contig.h
index 8197f87..22733f4 100644
--- a/include/media/videobuf2-dma-contig.h
+++ b/include/media/videobuf2-dma-contig.h
@@ -16,6 +16,11 @@
  #include 
  #include 
  
+struct vb2_dc_conf {

+   struct device   *dev;
+   gfp_t   mem_flags;
+};
+
  static inline dma_addr_t
  vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
  {
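
A minimal sketch of that unconditional variant, for illustration only
(it reuses the vb2_dc_alloc() context quoted above and is not a posted
patch):

	/* GFP_DMA is ignored on platforms without a DMA zone, so existing
	 * clients would be unaffected. */
	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
					GFP_KERNEL | GFP_DMA);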


Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: Tree for Jan 7 (rcutorture)

2013-01-07 Thread Randy Dunlap
On 01/07/13 19:53, Paul E. McKenney wrote:
> On Mon, Jan 07, 2013 at 07:36:19PM -0500, Steven Rostedt wrote:
>> On Mon, 2013-01-07 at 18:12 -0500, Steven Rostedt wrote:
>>> On Tue, 2013-01-08 at 09:59 +1100, Stephen Rothwell wrote:
>>>> Hi Paul,
>>>>
>>>> On Mon, 7 Jan 2013 14:16:27 -0800 "Paul E. McKenney" 
>>>>  wrote:
>>>>>
>>>>> On Mon, Jan 07, 2013 at 11:42:36AM -0800, Randy Dunlap wrote:
>>>>>>
>>>>>> on i386 or x86_64:
>>>>>>
>>>>>> ERROR: "trace_clock_local" [kernel/rcutorture.ko] undefined!
>>>>>
>>>>> Hello, Randy,
>>>>>
>>>>> Did your build include the following, also pushed to -next in that same
>>>>> batch from -rcu?  Including Steven Rostedt on CC for his take.
>>>>
>>>> That commit was certainly in next-20130107.
>>>
>>> Could be bad config dependencies.
>>
>> Paul,
>>
>> You need to also select TRACE_CLOCK if you are going to use it.
> 
> Thank you, Steve!
> 
> Randy, does the following patch help?

Yes, that's good.

Acked-by: Randy Dunlap 

Thanks.


>   Thanx, Paul
> 
> 
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index ce75d3b..b0fe7bd 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1010,6 +1010,7 @@ config RCU_CPU_STALL_INFO
>  config RCU_TRACE
>   bool "Enable tracing for RCU"
>   depends on DEBUG_KERNEL
> + select TRACE_CLOCK
>   help
> This option provides tracing in RCU which presents stats
> in debugfs for debugging RCU implementation.
> 
> --



-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 2/4] videobuf2-dma-streaming: new videobuf2 memory allocator

2013-01-07 Thread Marek Szyprowski


On 1/7/2013 9:15 PM, Mauro Carvalho Chehab wrote:

Em Mon, 7 Jan 2013 12:40:50 -0700
Jonathan Corbet  escreveu:

> On Mon, 7 Jan 2013 00:09:47 +0100
> Alessandro Rubini  wrote:
>
> > I don't expect you'll see serious performance differences on the PC. I
> > think ARM users will have better benefits, due to the different cache
> > architecture.  You told me Jon measured meaningful figures on a Marvel
> > CPU.
>
> It made the difference between 10 frames per second with the CPU running
> flat out and 30fps mostly idle.  I think that probably counts as
> meaningful, yeah...:)

Couldn't this performance difference be due to the usage of GFP_DMA inside
the VB2 code, like Federico's new patch series is proposing?

If not, why are there a so large performance penalty?


Nope, this was caused rather by a very poor CPU access to non-cached (aka
'coherent') memory and the way the video data has been accessed/read 
with CPU.


Best regards
--
Marek Szyprowski
Samsung Poland R&D Center


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Vasily Kulikov
On Mon, Jan 07, 2013 at 20:02 -0800, Casey Schaufler wrote:
> On 1/7/2013 7:01 PM, Stephen Rothwell wrote:
> > Let me ask Andrew's question:  Why do you want to do this (what is the
> > use case)?  What does this gain us?
> 
> There has been an amazing amount of development in system security
> over the past three years. Almost none of it has been in the kernel.
> One important reason that it is not getting done in the kernel is
> that the current single LSM restriction requires an all or nothing
> approach to security. Either you address all your needs with a single
> LSM or you have to go with a user space solution, in which case you
> may as well do everything in user space.
[...]

You should also update Documentation/security/LSM.txt with new "security="
rules and rules of LSM stacking limitations.  Motivation of stacking is
probably worth noting in Documentation/ too.

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] PM / devfreq: Add runtime-pm support

2013-01-07 Thread Rajagopal Venkat
Instead of the devfreq device driver explicitly calling the devfreq
suspend and resume APIs, typically from its runtime-pm suspend and
resume callbacks, let the devfreq core handle it automatically.

Attach the devfreq core to the runtime-pm framework so that a devfreq
device driver's pm_runtime_suspend() will automatically suspend the
devfreq and pm_runtime_resume() will resume it.

Signed-off-by: Rajagopal Venkat 
---
 drivers/devfreq/devfreq.c |   89 ++---
 include/linux/devfreq.h   |   12 --
 2 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 4c50235..781ea47 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -25,10 +25,9 @@
 #include 
 #include 
 #include 
+#include 
 #include "governor.h"
 
-static struct class *devfreq_class;
-
 /*
  * devfreq core provides delayed work based load monitoring helper
  * functions. Governors can use these or can implement their own
@@ -42,6 +41,9 @@ static LIST_HEAD(devfreq_governor_list);
 static LIST_HEAD(devfreq_list);
 static DEFINE_MUTEX(devfreq_list_lock);
 
+static int devfreq_suspend_device(struct devfreq *devfreq);
+static int devfreq_resume_device(struct devfreq *devfreq);
+
 /**
  * find_device_devfreq() - find devfreq struct using device pointer
  * @dev:   device pointer used to lookup device devfreq.
@@ -453,6 +455,61 @@ static void devfreq_dev_release(struct device *dev)
_remove_devfreq(devfreq, true);
 }
 
+static int devfreq_runtime_suspend(struct device *dev)
+{
+   int ret;
+   struct devfreq *devfreq;
+
+   mutex_lock(&devfreq_list_lock);
+   devfreq = find_device_devfreq(dev);
+   mutex_unlock(&devfreq_list_lock);
+
+   ret = devfreq_suspend_device(devfreq);
+   if (ret < 0)
+   goto out;
+
+   ret = pm_generic_runtime_suspend(dev);
+out:
+   return ret;
+}
+
+static int devfreq_runtime_resume(struct device *dev)
+{
+   int ret;
+   struct devfreq *devfreq;
+
+   mutex_lock(&devfreq_list_lock);
+   devfreq = find_device_devfreq(dev);
+   mutex_unlock(&devfreq_list_lock);
+
+   ret = devfreq_resume_device(devfreq);
+   if (ret < 0)
+   goto out;
+
+   ret = pm_generic_runtime_resume(dev);
+out:
+   return ret;
+}
+
+static int devfreq_runtime_idle(struct device *dev)
+{
+   return pm_generic_runtime_idle(dev);
+}
+
+static const struct dev_pm_ops devfreq_pm_ops = {
+   SET_RUNTIME_PM_OPS(
+   devfreq_runtime_suspend,
+   devfreq_runtime_resume,
+   devfreq_runtime_idle
+   )
+};
+
+static struct class devfreq_class = {
+   .name = "devfreq",
+   .owner = THIS_MODULE,
+   .pm = &devfreq_pm_ops,
+};
+
 /**
  * devfreq_add_device() - Add devfreq feature to the device
  * @dev:   the device to add devfreq feature.
@@ -494,8 +551,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
 
mutex_init(&devfreq->lock);
mutex_lock(&devfreq->lock);
+   dev->class = &devfreq_class;
devfreq->dev.parent = dev;
-   devfreq->dev.class = devfreq_class;
+   devfreq->dev.class = &devfreq_class;
devfreq->dev.release = devfreq_dev_release;
devfreq->profile = profile;
strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
@@ -538,6 +596,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
goto err_init;
}
 
+   pm_runtime_get_noresume(dev);
+   pm_runtime_set_active(dev);
+
return devfreq;
 
 err_init:
@@ -569,7 +630,7 @@ EXPORT_SYMBOL(devfreq_remove_device);
  * devfreq_suspend_device() - Suspend devfreq of a device.
  * @devfreq: the devfreq instance to be suspended
  */
-int devfreq_suspend_device(struct devfreq *devfreq)
+static int devfreq_suspend_device(struct devfreq *devfreq)
 {
if (!devfreq)
return -EINVAL;
@@ -580,13 +641,12 @@ int devfreq_suspend_device(struct devfreq *devfreq)
return devfreq->governor->event_handler(devfreq,
DEVFREQ_GOV_SUSPEND, NULL);
 }
-EXPORT_SYMBOL(devfreq_suspend_device);
 
 /**
  * devfreq_resume_device() - Resume devfreq of a device.
  * @devfreq: the devfreq instance to be resumed
  */
-int devfreq_resume_device(struct devfreq *devfreq)
+static int devfreq_resume_device(struct devfreq *devfreq)
 {
if (!devfreq)
return -EINVAL;
@@ -597,12 +657,12 @@ int devfreq_resume_device(struct devfreq *devfreq)
return devfreq->governor->event_handler(devfreq,
DEVFREQ_GOV_RESUME, NULL);
 }
-EXPORT_SYMBOL(devfreq_resume_device);
 
 /**
  * devfreq_add_governor() - Add devfreq governor
  * @governor:  the devfreq governor to be added
  */
+
 int devfreq_add_governor(struct devfreq_governor *governor)
 {
struct devfreq_governor *g;
@@ -770,6 +830,7 @@ out:
ret = count;
return ret;
 }
+
 static ssize_t show_available_governors(struct device *d,

Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Vasily Kulikov
On Mon, Jan 07, 2013 at 20:11 -0800, Casey Schaufler wrote:
> On 1/7/2013 7:59 PM, Stephen Rothwell wrote:
> > You probably also want to think a bit harder about the order of the
> > patches - you should introduce new APIs before you use them and remove
> > calls to functions before you remove the functions.
> >
> The unfortunate reality is that I couldn't find a good way to stage the
> changes. It's a wonking big set of infrastructure change. I could introduce
> the security blob abstraction separately but that is a fraction of the
> change. If it would have gone through mail filters as a single patch I'd
> have sent it that way.
> 
> I can spend time on patch presentation, and will if necessary.

I guess it can be divided this way:

1) Introduce lsm_get_cred(), etc. which unconditionally return
->security, ->i_security, etc.

2) Move all LSMs, procfs, etc. to the new API, a patch per LSM/subsystem.

3) Change structures along with new API.


The pro of the division is that if you have a bug in the series (and you
surely have! ;)) it is MUCH more simple to locate this bug (bisect, etc.).
Also it is more descriptive as you divide LSM changes and the core security
subsystem itself targeted on multiple LSMs which divides LSM
implementations (which might be not very important for someone) and the core
architecture (which is important for everybody).
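
As a rough illustration of step 1 (hypothetical helper names beyond
lsm_get_cred(), not code from the posted series), the wrappers could start
out as trivial accessors that just return the existing single blob, so each
LSM and subsystem can be converted separately before the structures change
in step 3:

#include <linux/cred.h>
#include <linux/fs.h>

static inline void *lsm_get_cred(const struct cred *cred)
{
	return cred->security;		/* today's single blob */
}

static inline void *lsm_get_inode(const struct inode *inode)
{
	return inode->i_security;	/* today's single blob */
}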

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 5/5] KVM: x86: improve reexecute_instruction

2013-01-07 Thread Xiao Guangrong
The current reexecute_instruction() cannot reliably detect failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, some cases are nested write-protection: the page we want to
write is used as a PDE but it chains to itself. In this case we should
stop the emulation and report the case to userspace.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |7 +++
 arch/x86/kvm/paging_tmpl.h  |   27 ---
 arch/x86/kvm/x86.c  |8 +++-
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c431b33..d6ab8d2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -502,6 +502,13 @@ struct kvm_vcpu_arch {
u64 msr_val;
struct gfn_to_hva_cache data;
} pv_eoi;
+
+   /*
+* Indicate whether the access faults on its page table in guest
+* which is set when fix page fault and used to detect unhandeable
+* instruction.
+*/
+   bool write_fault_to_shadow_pgtable;
 };

 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 67b390d..df50560 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -497,26 +497,34 @@ out_gpte_changed:
  * created when kvm establishes shadow page table that stop kvm using large
  * page size. Do it early can avoid unnecessary #PF and emulation.
  *
+ * @write_fault_to_shadow_pgtable will return true if the fault gfn is
+ * currently used as its page table.
+ *
  * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
  * since the PDPT is always shadowed, that means, we can not use large page
  * size to map the gfn which is used as PDPT.
  */
 static bool
 FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
- struct guest_walker *walker, int user_fault)
+ struct guest_walker *walker, int user_fault,
+ bool *write_fault_to_shadow_pgtable)
 {
int level;
gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+   bool self_changed = false;

if (!(walker->pte_access & ACC_WRITE_MASK ||
  (!is_write_protection(vcpu) && !user_fault)))
return false;

-   for (level = walker->level; level <= walker->max_level; level++)
-   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
-   return true;
+   for (level = walker->level; level <= walker->max_level; level++) {
+   gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
+
+   self_changed |= !(gfn & mask);
+   *write_fault_to_shadow_pgtable |= !gfn;
+   }

-   return false;
+   return self_changed;
 }

 /*
@@ -544,7 +552,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
int level = PT_PAGE_TABLE_LEVEL;
int force_pt_level;
unsigned long mmu_seq;
-   bool map_writable;
+   bool map_writable, is_self_change_mapping;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

@@ -572,9 +580,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
return 0;
}

+   vcpu->arch.write_fault_to_shadow_pgtable = false;
+
+   is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
+ &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
+
if (walker.level >= PT_DIRECTORY_LEVEL)
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-  || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
+  || is_self_change_mapping;
else
force_pt_level = 1;
if (!force_pt_level) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f13e03..2957012 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4810,7 +4810,13 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, 
gva_t cr2)
 * guest to let CPU execute the instruction.
 */
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
-   return true;
+
+   /*
+* If the access faults on its page table, it can not
+* be fixed by unprotecting shadow page and it should
+* be reported to userspace.
+*/
+   return !vcpu->arch.write_fault_to_shadow_pgtable;
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 4/5] KVM: x86: let reexecute_instruction work for tdp

2013-01-07 Thread Xiao Guangrong
Currently, reexecute_instruction() refuses to retry any instruction if
tdp is enabled. But if nested NPT is used, the emulation may be caused by
a shadow page, and it can be fixed by dropping that shadow page. The only
condition under which tdp cannot retry the instruction is an access fault
on an error pfn.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   61 ---
 1 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 08cacd9..6f13e03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4753,25 +4753,25 @@ static int handle_emulation_failure(struct kvm_vcpu 
*vcpu)
return r;
 }

-static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
+static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2)
 {
-   gpa_t gpa;
+   gpa_t gpa = cr2;
pfn_t pfn;

-   if (tdp_enabled)
-   return false;
-
-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
+   if (!vcpu->arch.mmu.direct_map) {
+   /*
+* Write permission should be allowed since only
+* write access need to be emulated.
+*/
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-   /*
-* if emulation was due to access to shadowed page table
-* and it failed try to unshadow page and re-enter the
-* guest to let CPU execute the instruction.
-*/
-   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
-   return true;
+   /*
+* If the mapping is invalid in guest, let cpu retry
+* it to generate fault.
+*/
+   if (gpa == UNMAPPED_GVA)
+   return true;
+   }

/*
 * Do not retry the unhandleable instruction if it faults on the
@@ -4780,12 +4780,37 @@ static bool reexecute_instruction(struct kvm_vcpu 
*vcpu, gva_t gva)
 * instruction -> ...
 */
pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-   if (!is_error_noslot_pfn(pfn)) {
-   kvm_release_pfn_clean(pfn);
+
+   /*
+* If the instruction failed on the error pfn, it can not be fixed,
+* report the error to userspace.
+*/
+   if (is_error_noslot_pfn(pfn))
+   return false;
+
+   kvm_release_pfn_clean(pfn);
+
+   /* The instructions are well-emulated on direct mmu. */
+   if (vcpu->arch.mmu.direct_map) {
+   unsigned int indirect_shadow_pages;
+
+   spin_lock(&vcpu->kvm->mmu_lock);
+   indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
+   spin_unlock(&vcpu->kvm->mmu_lock);
+
+   if (indirect_shadow_pages)
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+
return true;
}

-   return false;
+   /*
+* if emulation was due to access to shadowed page table
+* and it failed try to unshadow page and re-enter the
+* guest to let CPU execute the instruction.
+*/
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+   return true;
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 3/5] KVM: x86: clean up reexecute_instruction

2013-01-07 Thread Xiao Guangrong
Little cleanup for reexecute_instruction, also use gpa_to_gfn in
retry_instruction

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1c9c834..08cacd9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4761,19 +4761,18 @@ static bool reexecute_instruction(struct kvm_vcpu 
*vcpu, gva_t gva)
if (tdp_enabled)
return false;

+   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
+   if (gpa == UNMAPPED_GVA)
+   return true; /* let cpu generate fault */
+
/*
 * if emulation was due to access to shadowed page table
 * and it failed try to unshadow page and re-enter the
 * guest to let CPU execute the instruction.
 */
-   if (kvm_mmu_unprotect_page_virt(vcpu, gva))
+   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
return true;

-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
-
/*
 * Do not retry the unhandleable instruction if it faults on the
 * readonly host memory, otherwise it will goto a infinite loop:
@@ -4828,7 +4827,7 @@ static bool retry_instruction(struct x86_emulate_ctxt 
*ctxt,
if (!vcpu->arch.mmu.direct_map)
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-   kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));

return true;
 }
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 2/5] KVM: MMU: fix infinite fault access retry

2013-01-07 Thread Xiao Guangrong
We have two issues in the current code:
- if the target gfn is used as its own page table, the guest will refault and
  kvm will then use a small page size to map it. We need two #PFs to fix its
  shadow page table

- sometimes, say when an exception is triggered during a vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before going into the page fault path, which causes an
  infinite loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access -> ...

To fix these, we can adjust page size early if the target gfn is used as page
table

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   13 -
 arch/x86/kvm/paging_tmpl.h |   35 ++-
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a3c890..54fc61e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2380,15 +2380,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (pte_access & ACC_WRITE_MASK) {

/*
-* There are two cases:
-* - the one is other vcpu creates new sp in the window
-*   between mapping_level() and acquiring mmu-lock.
-* - the another case is the new sp is created by itself
-*   (page-fault path) when guest uses the target gfn as
-*   its page table.
-* Both of these cases can be fixed by allowing guest to
-* retry the access, it will refault, then we can establish
-* the mapping by using small page.
+* Other vcpu creates new sp in the window between
+* mapping_level() and acquiring mmu-lock. We can
+* allow guest to retry the access, the mapping can
+* be fixed if guest refault.
 */
if (level > PT_PAGE_TABLE_LEVEL &&
has_wrprotected_page(vcpu->kvm, gfn, level))
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7c575e7..67b390d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -487,6 +487,38 @@ out_gpte_changed:
return 0;
 }

+ /*
+ * To see whether the mapped gfn can write its page table in the current
+ * mapping.
+ *
+ * It is the helper function of FNAME(page_fault). When guest uses large page
+ * size to map the writable gfn which is used as current page table, we should
+ * force kvm to use small page size to map it because new shadow page will be
+ * created when kvm establishes shadow page table that stop kvm using large
+ * page size. Do it early can avoid unnecessary #PF and emulation.
+ *
+ * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
+ * since the PDPT is always shadowed, that means, we can not use large page
+ * size to map the gfn which is used as PDPT.
+ */
+static bool
+FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
+ struct guest_walker *walker, int user_fault)
+{
+   int level;
+   gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+
+   if (!(walker->pte_access & ACC_WRITE_MASK ||
+ (!is_write_protection(vcpu) && !user_fault)))
+   return false;
+
+   for (level = walker->level; level <= walker->max_level; level++)
+   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
+   return true;
+
+   return false;
+}
+
 /*
  * Page fault handler.  There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -541,7 +573,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
}

if (walker.level >= PT_DIRECTORY_LEVEL)
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
+   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
+  || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault);
else
force_pt_level = 1;
if (!force_pt_level) {
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 1/5] KVM: MMU: fix Dirty bit missed if CR0.WP = 0

2013-01-07 Thread Xiao Guangrong
If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte
and clears U bit. This is the chance that kvm can change pte access from
readonly to writable

Unfortunately, the pte access is the access of 'direct' shadow page table,
means direct sp.role.access = pte_access, then we will create a writable
spte entry on the readonly shadow page table. It will cause Dirty bit is
not tracked when two guest ptes point to the same large page. Note, it
does not have other impact except Dirty bit since cr0.wp is encoded into
sp.role

It can be fixed by adjusting pte access before establishing shadow page
table. Also, after that, no mmu specified code exists in the common function
and drop two parameters in set_spte

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   47 ---
 arch/x86/kvm/paging_tmpl.h |   30 +++
 2 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 01d7c2a..2a3c890 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2342,8 +2342,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, 
gfn_t gfn,
 }

 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-   unsigned pte_access, int user_fault,
-   int write_fault, int level,
+   unsigned pte_access, int level,
gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync, bool host_writable)
 {
@@ -2378,9 +2377,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= (u64)pfn << PAGE_SHIFT;

-   if ((pte_access & ACC_WRITE_MASK)
-   || (!vcpu->arch.mmu.direct_map && write_fault
-   && !is_write_protection(vcpu) && !user_fault)) {
+   if (pte_access & ACC_WRITE_MASK) {

/*
 * There are two cases:
@@ -2399,19 +2396,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

-   if (!vcpu->arch.mmu.direct_map
-   && !(pte_access & ACC_WRITE_MASK)) {
-   spte &= ~PT_USER_MASK;
-   /*
-* If we converted a user page to a kernel page,
-* so that the kernel can write to it when cr0.wp=0,
-* then we should prevent the kernel from executing it
-* if SMEP is enabled.
-*/
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-   spte |= PT64_NX_MASK;
-   }
-
/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection
@@ -2442,18 +2426,15 @@ done:

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 unsigned pt_access, unsigned pte_access,
-int user_fault, int write_fault,
-int *emulate, int level, gfn_t gfn,
-pfn_t pfn, bool speculative,
-bool host_writable)
+int write_fault, int *emulate, int level, gfn_t gfn,
+pfn_t pfn, bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;

-   pgprintk("%s: spte %llx access %x write_fault %d"
-" user_fault %d gfn %llx\n",
+   pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
 __func__, *sptep, pt_access,
-write_fault, user_fault, gfn);
+write_fault, gfn);

if (is_rmap_spte(*sptep)) {
/*
@@ -2477,9 +2458,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
was_rmapped = 1;
}

-   if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
- level, gfn, pfn, speculative, true,
- host_writable)) {
+   if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
+ true, host_writable)) {
if (write_fault)
*emulate = 1;
kvm_mmu_flush_tlb(vcpu);
@@ -2571,10 +2551,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu 
*vcpu,
return -1;

for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, ACC_ALL,
-access, 0, 0, NULL,
-sp->role.level, gfn,
-page_to_pfn(pages[i]), true, true);
+   mmu_set_spte(vcpu, start, ACC_ALL, access, 0, NULL,
+sp->role.level, gfn, page_to_pfn(pages[i]),
+true, true);

return 0;
 }
@@ -2636,8 +2615,8 @@ 

[PATCH] leds-lm355x: support LED trigger functionality

2013-01-07 Thread Kim, Milo
 LM355x family devices provide flash, torch and indicator functions.
 This patch adds LED trigger support.
 Using the LED trigger APIs, other drivers can simply turn the flash, torch
 and indicator on and off.

 Platform data
  the name of LED trigger is configurable.

 Documentation
  example and detailed description added.

Signed-off-by: Milo(Woogyom) Kim 
---
 Documentation/leds/leds-lm3556.txt|   62 +
 drivers/leds/leds-lm355x.c|3 ++
 include/linux/platform_data/leds-lm355x.h |8 
 3 files changed, 73 insertions(+)

diff --git a/Documentation/leds/leds-lm3556.txt 
b/Documentation/leds/leds-lm3556.txt
index d9eb91b..73244cd 100644
--- a/Documentation/leds/leds-lm3556.txt
+++ b/Documentation/leds/leds-lm3556.txt
@@ -83,3 +83,65 @@ and register it in the platform init function
 Example:
board_register_i2c_bus(4, 400,
board_i2c_ch4, ARRAY_SIZE(board_i2c_ch4));
+
+Support LED Triggers
+
+Flash, torch and indicator can be controlled not only by an user-space but also
+by other drivers, kernel space.
+For example, flash turns on by camera driver internally.
+To support this functionality, LED trigger is registered.
+The name of LED trigger is configurable in the platform data.
+
+Example: LED trigger name for flash
+#include <linux/platform_data/leds-lm355x.h>
+
+struct lm355x_trigger_name lm3556_trigger_name = {
+   .flash = "flash",
+};
+
+struct lm355x_platform_data lm3556_pdata = {
+   ...
+   .trigger = &lm3556_trigger_name,
+};
+
+Example: Flash control in simple camera driver
+#include <linux/leds.h>
+
+#ifdef CONFIG_LEDS_TRIGGERS
+DEFINE_LED_TRIGGER(flash_led_trigger);
+#endif
+
+static int foo_camera_init()
+{
+   ...
+
+#ifdef CONFIG_LEDS_TRIGGERS
+   /* should be same name as in lm355x_platform_data */
+   led_trigger_register_simple("flash", &flash_led_trigger);
+#endif
+
+   ...
+}
+
+static void foo_camera_exit()
+{
+   ...
+
+#ifdef CONFIG_LEDS_TRIGGERS
+   led_trigger_unregister_simple(flash_led_trigger);
+#endif
+
+   ...
+}
+
+#ifdef CONFIG_LEDS_TRIGGERS
+static void foo_camera_flash_ctrl(bool on)
+{
+   if (on)
+   led_trigger_event(flash_led_trigger, LED_FULL);
+   else
+   led_trigger_event(flash_led_trigger, LED_OFF);
+}
+#else
+#define foo_camera_flash_ctrl  NULL
+#endif
diff --git a/drivers/leds/leds-lm355x.c b/drivers/leds/leds-lm355x.c
index 65d7928..29df4c0 100644
--- a/drivers/leds/leds-lm355x.c
+++ b/drivers/leds/leds-lm355x.c
@@ -477,6 +477,7 @@ static int lm355x_probe(struct i2c_client *client,
chip->cdev_flash.name = "flash";
chip->cdev_flash.max_brightness = 16;
chip->cdev_flash.brightness_set = lm355x_strobe_brightness_set;
+   chip->cdev_flash.default_trigger = pdata->trigger->flash;
err = led_classdev_register((struct device *)
&client->dev, &chip->cdev_flash);
if (err < 0)
@@ -486,6 +487,7 @@ static int lm355x_probe(struct i2c_client *client,
chip->cdev_torch.name = "torch";
chip->cdev_torch.max_brightness = 8;
chip->cdev_torch.brightness_set = lm355x_torch_brightness_set;
+   chip->cdev_torch.default_trigger = pdata->trigger->torch;
err = led_classdev_register((struct device *)
&client->dev, &chip->cdev_torch);
if (err < 0)
@@ -499,6 +501,7 @@ static int lm355x_probe(struct i2c_client *client,
else
chip->cdev_indicator.max_brightness = 8;
chip->cdev_indicator.brightness_set = lm355x_indicator_brightness_set;
+   chip->cdev_indicator.default_trigger = pdata->trigger->indicator;
err = led_classdev_register((struct device *)
&client->dev, &chip->cdev_indicator);
if (err < 0)
diff --git a/include/linux/platform_data/leds-lm355x.h 
b/include/linux/platform_data/leds-lm355x.h
index b88724b..b64d312 100644
--- a/include/linux/platform_data/leds-lm355x.h
+++ b/include/linux/platform_data/leds-lm355x.h
@@ -42,6 +42,12 @@ enum lm355x_pmode {
LM355x_PMODE_ENABLE = 0x04,
 };
 
+struct lm355x_trigger_name {
+   const char *flash;
+   const char *torch;
+   const char *indicator;
+};
+
 /*
  * struct lm3554_platform_data
  * @pin_strobe: strobe input
@@ -55,6 +61,7 @@ enum lm355x_pmode {
  *  lm3554-ledi/ntc
  *  lm3556-temp pin
  * @pass_mode : pass mode
+ * @trigger: led triggers for flash, torch and indicator
  */
 struct lm355x_platform_data {
enum lm355x_strobe pin_strobe;
@@ -63,4 +70,5 @@ struct lm355x_platform_data {
enum lm355x_ntc ntc_pin;
 
enum lm355x_pmode pass_mode;
+   struct lm355x_trigger_name *trigger;
 };
-- 
1.7.9.5

Best Regards,
Milo


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [PATCH 2/2] cgroups: fix cgroup_event_listener error handling

2013-01-07 Thread Li Zefan
> cgroups: fix cgroup_event_listener error handling
> 
> The error handling in cgroup_event_listener.c did not correctly deal
> with either an error opening either  or
> cgroup.event_control.  Due to an uninitialized variable the program
> exit code was undefined if either of these opens failed.
> 
> This patch simplifies and corrects cgroup_event_listener.c error
> handling by:
> 1. using err*() rather than printf(),exit()
> 2. depending on process exit to close open files
> 
> With this patch failures always return non-zero error.
> 
> Signed-off-by: Greg Thelen 

Acked-by: Li Zefan 

> ---
>  tools/cgroup/cgroup_event_listener.c |   72 ++---
>  1 files changed, 22 insertions(+), 50 deletions(-)
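
A minimal illustration of the err(3) pattern the changelog describes
(hypothetical code, not the actual tools/cgroup change; the file name and
arguments are made up):

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int cfd;

	if (argc != 2)
		errx(1, "Usage: %s <control-file>", argv[0]);

	cfd = open(argv[1], O_RDONLY);
	if (cfd == -1)
		err(1, "Cannot open %s", argv[1]);  /* always exits non-zero */

	/* ... event setup would go here; open fds are closed on exit ... */
	return 0;
}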

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] mmc: correct the EXCEPTION_EVENTS_STATUS value comment

2013-01-07 Thread Zhang, YiX X
From aaea3405944d844f53679b295d4082584f33d9a3 Mon Sep 17 00:00:00 2001
From: ZhangYi 
Date: Tue, 8 Jan 2013 13:50:09 +0800
Subject: [PATCH] mmc: correct the EXCEPTION_EVENTS_STATUS value comment

The right value is 54 according to eMMC 4.5 specification.

Signed-off-by: ZhangYi 
---
 include/linux/mmc/card.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index 5c69315..ffde1d3 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -83,7 +83,7 @@ struct mmc_ext_csd {
unsigned intdata_tag_unit_size; /* DATA TAG UNIT size */
unsigned intboot_ro_lock;   /* ro lock support */
boolboot_ro_lockable;
-   u8  raw_exception_status;   /* 53 */
+   u8  raw_exception_status;   /* 54 */
u8  raw_partition_support;  /* 160 */
u8  raw_rpmb_size_mult; /* 168 */
u8  raw_erased_mem_count;   /* 181 */
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sched: Consequences of integrating the Per Entity Load Tracking Metric into the Load Balancer

2013-01-07 Thread Preeti U Murthy
On 01/07/2013 09:18 PM, Vincent Guittot wrote:
> On 2 January 2013 05:22, Preeti U Murthy  wrote:
>> Hi everyone,
>> I have been looking at how different workloads react when the per entity
>> load tracking metric is integrated into the load balancer and what are
>> the possible reasons for it.
>>
>> I had posted the integration patch earlier:
>> https://lkml.org/lkml/2012/11/15/391
>>
>> Essentially what I am doing is:
>> 1.I have disabled CONFIG_FAIR_GROUP_SCHED to make the analysis simple
>> 2.I have replaced cfs_rq->load.weight in weighted_cpuload() with
>> cfs.runnable_load_avg,the active load tracking metric.
>> 3.I have replaced se.load.weight in task_h_load() with
>> se.load.avg.contrib,the per entity load tracking metric.
>> 4.The load balancer will end up using these metrics.
>>
>> After conducting experiments on several workloads I found out that the
>> performance of the workloads with the above integration would neither
>> improve nor deteriorate.And this observation was consistent.
>>
>> Ideally the performance should have improved considering,that the metric
>> does better tracking of load.
>>
>> Let me explain with a simple example as to why we should see a
>> performance improvement ideally:Consider 2 80% tasks and 1 40% task.
>>
>> With integration:
>> 
>>
>>40%
>> 80%40%
>> cpu1  cpu2
>>
>> The above will be the scenario when the tasks fork initially.And this is
>> a perfectly balanced system,hence no more load balancing.And proper
>> distribution of loads on the cpu.
>>
>> Without integration
>> ---
>>
>> 40%   40%
>> 80%40% 80%40%
>> cpu1   cpu2OR cpu1   cpu2
>>
>> Because the  view is that all the tasks as having the same load.The load
>> balancer could ping pong tasks between these two situations.
>>
>> When I performed this experiment,I did not see an improvement in the
>> performance though in the former case.On further observation I found
>> that the following was actually happening.
>>
>> With integration
>> 
>>
>> Initially 40% task sleeps  40% task wakes up
>>and select_idle_sibling()
>>decides to wake it up on cpu1
>>
>>40%   ->   ->   40%
>> 80%40%80%40%   80%  40%
>> cpu1  cpu2cpu1   cpu2  cpu1 cpu2
>>
>>
>> This makes load balance trigger movement of 40% from cpu1 back to
>> cpu2.Hence the stability that the load balancer was trying to achieve is
>> gone.Hence the culprit boils down to select_idle_sibling.How is it the
>> culprit and how is it hindering performance of the workloads?
>>
>> *What is the way ahead with the per entity load tracking metric in the
>> load balancer then?*
>>
>> In replies to a post by Paul in https://lkml.org/lkml/2012/12/6/105,
>> he mentions the following:
>>
>> "It is my intuition that the greatest carnage here is actually caused
>> by wake-up load-balancing getting in the way of periodic in
>> establishing a steady state. I suspect more mileage would result from
>> reducing the interference wake-up load-balancing has with steady
>> state."
>>
>> "The whole point of using blocked load is so that you can converge on a
>> steady state where you don't NEED to move tasks.  What disrupts this is
>> we naturally prefer idle cpus on wake-up balance to reduce wake-up
>> latency. I think the better answer is making these two processes load
>> balancing() and select_idle_sibling() more co-operative."
>>
>> I had not realised how this would happen until I saw it happening in the
>> above experiment.
>>
>> Based on what Paul explained above let us use the runnable load + the
>> blocked load for calculating the load on a cfs runqueue rather than just
>> the runnable load(which is what i am doing now) and see its consequence.
>>
>> Initially:   40% task sleeps
>>
>>40%
>> 80%40%   ->  80%  40%
>> cpu1   cpu2 cpu1  cpu2
>>
>> So initially the load on cpu1 is say 80 and on cpu2 also it is
>> 80.Balanced.Now when 40% task sleeps,the total load on cpu2=runnable
>> load+blocked load.which is still 80.
>>
>> As a consequence,firstly,during periodic load balancing the load is not
>> moved from cpu1 to cpu2 when the 40% task sleeps.(It sees the load on
>> cpu2 as 80 and not as 40).
>> Hence the above scenario remains the same.On wake up,what happens?
>>
>> Here comes the point of making both load balancing and wake up
>> balance(select_idle_sibling) co operative. How about we always schedule
>> the woken up task on the prev_cpu? This seems more sensible considering
>> load balancing considers blocked load as being a part of the load of cpu2.
> 
> Hi Preeti,
> 
> I'm not sure that we want such steady state at cores level because we
> take advantage of migrating wake up tasks between cores that share
> their cache as Matthew demonstrated. But I agree that reaching 
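
For reference, a minimal sketch of the substitution steps 2 and 3 above
describe (illustrative only; the field names follow the 3.8 per-entity
load-tracking code, not the posted patch):

static unsigned long weighted_cpuload(const int cpu)
{
	return cpu_rq(cpu)->cfs.runnable_load_avg;  /* was cpu_rq(cpu)->load.weight */
}

static unsigned long task_h_load(struct task_struct *p)
{
	return p->se.avg.load_avg_contrib;          /* was p->se.load.weight */
}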

[PATCH] i.MX: DMA: add firmware for i.mx6 series

2013-01-07 Thread Gary Zhang
SDMA uses firmware to provide additional features for modules.
Add SDMA firmware for the i.MX6 series.

Signed-off-by: Gary Zhang 
---
 firmware/Makefile |1 +
 firmware/imx/sdma/sdma-imx6q-to1.bin.ihex |  116 +
 2 files changed, 117 insertions(+), 0 deletions(-)
 create mode 100755 firmware/imx/sdma/sdma-imx6q-to1.bin.ihex

diff --git a/firmware/Makefile b/firmware/Makefile
index cbb09ce..b6170f3 100644
--- a/firmware/Makefile
+++ b/firmware/Makefile
@@ -61,6 +61,7 @@ fw-shipped-$(CONFIG_DRM_RADEON) += radeon/R100_cp.bin 
radeon/R200_cp.bin \
   radeon/RV770_pfp.bin radeon/RV770_me.bin \
   radeon/RV730_pfp.bin radeon/RV730_me.bin \
   radeon/RV710_pfp.bin radeon/RV710_me.bin
+fw-shipped-$(CONFIG_SOC_IMX6Q) += imx/sdma/sdma-imx6q-to1.bin
 fw-shipped-$(CONFIG_DVB_AV7110) += av7110/bootcode.bin
 fw-shipped-$(CONFIG_DVB_TTUSB_BUDGET) += ttusb-budget/dspbootcode.bin
 fw-shipped-$(CONFIG_E100) += e100/d101m_ucode.bin e100/d101s_ucode.bin \
diff --git a/firmware/imx/sdma/sdma-imx6q-to1.bin.ihex 
b/firmware/imx/sdma/sdma-imx6q-to1.bin.ihex
new file mode 100755
index 000..2e561f0
--- /dev/null
+++ b/firmware/imx/sdma/sdma-imx6q-to1.bin.ihex
@@ -0,0 +1,116 @@
+:100053444D41010001001C00AD
+:100010002600B4007A06820202
+:10002000DC
+:10003000D0
+:100040006A1A38
+:10005000EB02BB180804D8
+:10006000C003D9
+:10007000AB027B035D
+:100080004C046E04B6
+:10009000001854
+:1000A0186218161A8E
+:1000B61BE3C1DB57E35FE357F352016A1D
+:1000C0008F00D500017D8D00A005EB5D7804037DD8
+:1000D00079042C7D367C79041F7CEE56000F600677
+:1000E57D0965437E0A62417E20980A623E7E54
+:1000F9653C7E12051205AD026007037DFB55C4
+:1001D36D2B98FB55041DD36DC86A2F7F011F3B
+:100113200048E47C5398FB55D76D1500057803
+:10012962C86A0962C86AD76D5298FB55D76DD3
+:100130001500150005780A62C86A0A62C86AD76D98
+:100140005298FB55D76D15001500150005780B6208
+:10015000C86A0B62C86AD76D097CDF6D077F33
+:10016000EB55004D077DFAC1E35706980700CC68B0
+:10017C6813C20AC20398D9C1E3C1DB57E35F1D
+:10018000E357F352216A8F00D500017D8D00A00551
+:10019000EB5DFB567804037D79042A7D317C79047C
+:1001A000207C700B1103EB53000F6003057D096584
+:1001B000377E0A62357E86980A62327E0965307E15
+:1001C00012051205AD026007027C065A8E98265A67
+:1001D000277F011F03200048E87C700B1103135395
+:1001E000AF98150004780962065A0962265AAE983B
+:1001F0001500150004780A62065A0A62265AAE985B
+:100215001500150004780B62065A0B62265A79
+:1002177CEB55004D067DFAC1E357699855
+:100227000C6813C20AC26698700B11031353BF
+:100230006C07017CD9C1FB5E8A066B07017CD9C1C2
+:10024000F35EDB59D3588F0110010F398B003CC18D
+:100250002B7DC05AC85B4EC1277C88038906E35CAE
+:10026000FF0D1105FF1DBC053E07004D187D7008F0
+:1002700011007E07097D7D07027D2852E698F8521D
+:10028000DB54BC02CC02097C7C07027D2852EF982B
+:10029000F852D354BC02CC02097D0004DD988B00D7
+:1002A000C052C85359C1D67D0002CD98FF08BF0087
+:1002B0007F07157D8804D500017D8D00A005EB5DCD
+:1002C0008F0212021202FF3ADA05027C3E071899E9
+:1002D000A402DD02027D3E0718995E071899EB55CE
+:1002E0009805EB5DF352FB546A07267D6C07017D90
+:1002F00055996B07577C6907047D6807027D010EDD
+:10032F999358D600017D8E009355A005935DDB
+:10031000A00602780255045D1D7C004E087C69072A
+:1003237D0255177E3C99045D147F8906935026
+:10033048017D2799A099150006780255045DB3
+:100340004F070255245D2F07017CA09917006F0706
+:1003517C012093559D000700A7D9F598D36C27
+:100360006907047D6807027D010E64999358D600E1
+:1003717D8E009355A005935DA006027802557D
+:10038000C86D0F7C004E087C6907037D0255097E0D
+:100390007199C86D067F890693500048017D5C996C
+:1003A000A0999A99C36A6907047D6807027D010EC6
+:1003B00087999358D600017D8E009355A005935DD3
+:1003C000A0060278C865045D0F7C004E087C6907B2
+:1003D37DC865097E9499045D067F8906935064
+:1003E048017D7F99A09993559D000700FF6CFF
+:1003F000A7D9F598E354EB55004D017CF59822
+:1004DD98E354EB55FF0A1102FF1A7F07027CC7
+:10041000A005B4999D008C05BA05A0051002BA0488
+:10042000AD0454040600E3C1DB57FB52C36AF35228
+:1004356A8F00D500017D8D00A005EB5D780475
+:1004437D79042B7D1E7C7904337CEE56000FEE
+:10045000FB556007027DC36DD599041DC36DC8624D
+:100460003B7E6006027D10021202096A357F12028D
+:1004796A327F1202096A2F7F011F0320004898
+:10048000E77C099AFB55C76D150015001500057826
+:10049000C8620B6AC8620B6AC76D089AFB55C76DC4
+:1004A000150015000578C8620A6AC8620A6AC76D35
+:1004B89AFB55C76D15000578C862096AC862BD
+:1004C96AC76D097C286A077FEB55004D5B
+:1004D57DFAC1DB57BF9977C254040AC2BA99A5
+:1004E000D9C1E3C1DB57F352056A8F00D500017D06
+:1004F0008D00A005FB567804037D7904297D1F7CBF
+:100579042E7CE35D700D1105ED55000F600739

[PATCH 3/3] PM / devfreq: account suspend/resume for stats

2013-01-07 Thread Rajagopal Venkat
devfreq stats is not taking device suspend and resume into
account. Fix it.

Signed-off-by: Rajagopal Venkat 
---
 drivers/devfreq/devfreq.c |   15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 2843a22..4c50235 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -297,6 +297,7 @@ void devfreq_monitor_suspend(struct devfreq *devfreq)
return;
}
 
+   devfreq_update_status(devfreq, devfreq->previous_freq);
devfreq->stop_polling = true;
mutex_unlock(&devfreq->lock);
cancel_delayed_work_sync(&devfreq->work);
@@ -313,6 +314,8 @@ EXPORT_SYMBOL(devfreq_monitor_suspend);
  */
 void devfreq_monitor_resume(struct devfreq *devfreq)
 {
+   unsigned long freq;
+
mutex_lock(&devfreq->lock);
if (!devfreq->stop_polling)
goto out;
@@ -321,8 +324,14 @@ void devfreq_monitor_resume(struct devfreq *devfreq)
devfreq->profile->polling_ms)
queue_delayed_work(devfreq_wq, &devfreq->work,
msecs_to_jiffies(devfreq->profile->polling_ms));
+
+   devfreq->last_stat_updated = jiffies;
devfreq->stop_polling = false;
 
+   if (devfreq->profile->get_cur_freq &&
+   !devfreq->profile->get_cur_freq(devfreq->dev.parent, ))
+   devfreq->previous_freq = freq;
+
 out:
mutex_unlock(&devfreq->lock);
 }
@@ -931,11 +940,11 @@ static ssize_t show_trans_table(struct device *dev, 
struct device_attribute *att
 {
struct devfreq *devfreq = to_devfreq(dev);
ssize_t len;
-   int i, j, err;
+   int i, j;
unsigned int max_state = devfreq->profile->max_state;
 
-   err = devfreq_update_status(devfreq, devfreq->previous_freq);
-   if (err)
+   if (!devfreq->stop_polling &&
+   devfreq_update_status(devfreq, devfreq->previous_freq))
return 0;
 
len = sprintf(buf, "   From  :   To\n");
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] PM / devfreq: fix stats start time stamp

2013-01-07 Thread Rajagopal Venkat
Mark the stats start time stamp when actual load monitoring is
started for accuracy.

Signed-off-by: Rajagopal Venkat 
---
 drivers/devfreq/devfreq.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 5782c9b..2843a22 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -254,9 +254,12 @@ static void devfreq_monitor(struct work_struct *work)
 void devfreq_monitor_start(struct devfreq *devfreq)
 {
INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
-   if (devfreq->profile->polling_ms)
+   if (devfreq->profile->polling_ms) {
queue_delayed_work(devfreq_wq, &devfreq->work,
msecs_to_jiffies(devfreq->profile->polling_ms));
+
+   devfreq->last_stat_updated = jiffies;
+   }
 }
 EXPORT_SYMBOL(devfreq_monitor_start);
 
@@ -498,7 +501,6 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->time_in_state = devm_kzalloc(dev, sizeof(unsigned int) *
devfreq->profile->max_state,
GFP_KERNEL);
-   devfreq->last_stat_updated = jiffies;
devfreq_set_freq_limits(devfreq);
 
dev_set_name(&devfreq->dev, dev_name(dev));
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] PM / devfreq: set min/max freq limit from freq table

2013-01-07 Thread Rajagopal Venkat
Set devfreq device min and max frequency limits when device
is added to devfreq, provided frequency table is supplied.
This helps governors suggest a target frequency within
limits.

Signed-off-by: Rajagopal Venkat 
---
 drivers/devfreq/devfreq.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index a8f0173..5782c9b 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -69,6 +69,29 @@ static struct devfreq *find_device_devfreq(struct device 
*dev)
 }
 
 /**
+ * devfreq_set_freq_limits() - Set min and max frequency from freq_table
+ * @devfreq:   the devfreq instance
+ */
+static void devfreq_set_freq_limits(struct devfreq *devfreq)
+{
+   int idx;
+   unsigned long min = ~0, max = 0;
+
+   if (!devfreq->profile->freq_table)
+   return;
+
+   for (idx = 0; idx < devfreq->profile->max_state; idx++) {
+   if (min > devfreq->profile->freq_table[idx])
+   min = devfreq->profile->freq_table[idx];
+   if (max < devfreq->profile->freq_table[idx])
+   max = devfreq->profile->freq_table[idx];
+   }
+
+   devfreq->min_freq = min;
+   devfreq->max_freq = max;
+}
+
+/**
  * devfreq_get_freq_level() - Lookup freq_table for the frequency
  * @devfreq:   the devfreq instance
  * @freq:  the target frequency
@@ -476,6 +499,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->profile->max_state,
GFP_KERNEL);
devfreq->last_stat_updated = jiffies;
+   devfreq_set_freq_limits(devfreq);
 
dev_set_name(&devfreq->dev, dev_name(dev));
err = device_register(&devfreq->dev);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] staging: usbip: refine the lock

2013-01-07 Thread Greg Kroah-Hartman
On Tue, Jan 08, 2013 at 01:49:00PM +0800, Harvey Yang wrote:
> This patchset refines some spinlocks which may not be used properly.
> 
> [PATCH 1/2]: The function 'usbip_event_add()' may be called in interrupt 
> context on the stub side: 
> 'stub_complete'->'stub_enqueue_ret_unlink'->'usbip_event_add'.
> In this function it tries to get the lock 'ud->lock', so we should disable 
> irq when we get this lock in process context.
> 
> [PATCH 2/2]: On the client side, we have a virtual hcd driver, there actually 
> no hardware interrupts, so we do not need worry about race conditions caused 
> by irq. To achieve a good performance there is no need to use the interrupt 
> safe spinlock. Just replace them with a non interrupt safe version.

This information needs to be in the patches itself, otherwise it will
never be seen.

Also, why do you think this is going to make anything faster?  Have you
measured it?  What are the results?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [TRIVIAL PATCH 00/26] treewide: Add and use vsprintf extension %pSR

2013-01-07 Thread Joe Perches
On Wed, 2012-12-12 at 10:18 -0800, Joe Perches wrote:
> Remove the somewhat awkward uses of print_symbol and convert all the
> existing uses to a new vsprintf pointer type of %pSR.

Jiri?  Are you going to do anything with this?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] staging: usbip: use interrupt safe spinlock to avoid potential deadlock.

2013-01-07 Thread Greg Kroah-Hartman
On Tue, Jan 08, 2013 at 01:49:01PM +0800, Harvey Yang wrote:
> 
> Signed-off-by: Harvey Yang 

You need to describe _why_ you did this, not just what you did.  Why is
this needed?  What are you fixing by doing this?  Is this something that
older kernels need?  Is it something that others are seeing?

Same goes for patch 2/2 here, you have to provide a good changelog entry
in order for this to be accepted.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] staging: usbip: replace the interrupt safe spinlocks with common ones.

2013-01-07 Thread Harvey Yang

Signed-off-by: Harvey Yang 
---
 drivers/staging/usbip/vhci_hcd.c |   76 --
 drivers/staging/usbip/vhci_rx.c  |   10 ++---
 drivers/staging/usbip/vhci_tx.c  |   14 +++
 3 files changed, 42 insertions(+), 58 deletions(-)

diff --git a/drivers/staging/usbip/vhci_hcd.c b/drivers/staging/usbip/vhci_hcd.c
index c3aa219..216648d 100644
--- a/drivers/staging/usbip/vhci_hcd.c
+++ b/drivers/staging/usbip/vhci_hcd.c
@@ -121,11 +121,9 @@ static void dump_port_status_diff(u32 prev_status, u32 
new_status)
 
 void rh_port_connect(int rhport, enum usb_device_speed speed)
 {
-   unsigned long   flags;
-
usbip_dbg_vhci_rh("rh_port_connect %d\n", rhport);
 
-   spin_lock_irqsave(&the_controller->lock, flags);
+   spin_lock(&the_controller->lock);
 
the_controller->port_status[rhport] |= USB_PORT_STAT_CONNECTION
| (1 << USB_PORT_FEAT_C_CONNECTION);
@@ -141,24 +139,22 @@ void rh_port_connect(int rhport, enum usb_device_speed 
speed)
break;
}
 
-   spin_unlock_irqrestore(&the_controller->lock, flags);
+   spin_unlock(&the_controller->lock);
 
usb_hcd_poll_rh_status(vhci_to_hcd(the_controller));
 }
 
 static void rh_port_disconnect(int rhport)
 {
-   unsigned long flags;
-
usbip_dbg_vhci_rh("rh_port_disconnect %d\n", rhport);
 
-   spin_lock_irqsave(&the_controller->lock, flags);
+   spin_lock(&the_controller->lock);
 
the_controller->port_status[rhport] &= ~USB_PORT_STAT_CONNECTION;
the_controller->port_status[rhport] |=
(1 << USB_PORT_FEAT_C_CONNECTION);
 
-   spin_unlock_irqrestore(&the_controller->lock, flags);
+   spin_unlock(&the_controller->lock);
usb_hcd_poll_rh_status(vhci_to_hcd(the_controller));
 }
 
@@ -183,7 +179,6 @@ static void rh_port_disconnect(int rhport)
 static int vhci_hub_status(struct usb_hcd *hcd, char *buf)
 {
struct vhci_hcd *vhci;
-   unsigned long   flags;
int retval;
int rhport;
int changed = 0;
@@ -193,7 +188,7 @@ static int vhci_hub_status(struct usb_hcd *hcd, char *buf)
 
vhci = hcd_to_vhci(hcd);
 
-   spin_lock_irqsave(&vhci->lock, flags);
+   spin_lock(&vhci->lock);
if (!HCD_HW_ACCESSIBLE(hcd)) {
usbip_dbg_vhci_rh("hw accessible flag not on?\n");
goto done;
@@ -216,7 +211,7 @@ static int vhci_hub_status(struct usb_hcd *hcd, char *buf)
usb_hcd_resume_root_hub(hcd);
 
 done:
-   spin_unlock_irqrestore(&vhci->lock, flags);
+   spin_unlock(&vhci->lock);
return changed ? retval : 0;
 }
 
@@ -237,7 +232,6 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 
typeReq, u16 wValue,
 {
struct vhci_hcd *dum;
int retval = 0;
-   unsigned long   flags;
int rhport;
 
u32 prev_port_status[VHCI_NPORTS];
@@ -257,7 +251,7 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 
typeReq, u16 wValue,
 
dum = hcd_to_vhci(hcd);
 
-   spin_lock_irqsave(&dum->lock, flags);
+   spin_lock(&dum->lock);
 
/* store old status and compare now and old later */
if (usbip_dbg_flag_vhci_rh) {
@@ -410,7 +404,7 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 
typeReq, u16 wValue,
}
usbip_dbg_vhci_rh(" bye\n");
 
-   spin_unlock_irqrestore(&dum->lock, flags);
+   spin_unlock(&dum->lock);
 
return retval;
 }
@@ -433,7 +427,6 @@ static void vhci_tx_urb(struct urb *urb)
 {
struct vhci_device *vdev = get_vdev(urb->dev);
struct vhci_priv *priv;
-   unsigned long flag;
 
if (!vdev) {
pr_err("could not get virtual device");
@@ -442,11 +435,11 @@ static void vhci_tx_urb(struct urb *urb)
 
priv = kzalloc(sizeof(struct vhci_priv), GFP_ATOMIC);
 
-   spin_lock_irqsave(&vdev->priv_lock, flag);
+   spin_lock(&vdev->priv_lock);
 
if (!priv) {
dev_err(&urb->dev->dev, "malloc vhci_priv\n");
-   spin_unlock_irqrestore(&vdev->priv_lock, flag);
+   spin_unlock(&vdev->priv_lock);
usbip_event_add(&vdev->ud, VDEV_EVENT_ERROR_MALLOC);
return;
}
@@ -463,7 +456,7 @@ static void vhci_tx_urb(struct urb *urb)
list_add_tail(&priv->list, &vdev->priv_tx);
 
wake_up(&vdev->waitq_tx);
-   spin_unlock_irqrestore(&vdev->priv_lock, flag);
+   spin_unlock(&vdev->priv_lock);
 }
 
 static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
@@ -471,7 +464,6 @@ static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb 
*urb,
 {
struct device *dev = &urb->dev->dev;
int ret = 0;
-   unsigned long flags;
struct vhci_device *vdev;
 
usbip_dbg_vhci_hc("enter, usb_hcd %p urb %p mem_flags %d\n",
@@ -480,11 +472,11 @@ static int vhci_urb_enqueue(struct usb_hcd *hcd, struct 
urb *urb,
/* patch to usb_sg_init() is in 2.5.60 */
BUG_ON(!urb->transfer_buffer && urb->transfer_buffer_length);
 
- 

[PATCH 1/2] staging: usbip: use interrupt safe spinlock to avoid potential deadlock.

2013-01-07 Thread Harvey Yang

Signed-off-by: Harvey Yang 
---
 drivers/staging/usbip/stub_dev.c|   34 +-
 drivers/staging/usbip/stub_rx.c |4 ++--
 drivers/staging/usbip/usbip_event.c |6 --
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/usbip/stub_dev.c b/drivers/staging/usbip/stub_dev.c
index ee36415..d36c69e 100644
--- a/drivers/staging/usbip/stub_dev.c
+++ b/drivers/staging/usbip/stub_dev.c
@@ -67,9 +67,9 @@ static ssize_t show_status(struct device *dev, struct 
device_attribute *attr,
return -ENODEV;
}
 
-   spin_lock(&sdev->ud.lock);
+   spin_lock_irq(&sdev->ud.lock);
status = sdev->ud.status;
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
 
return snprintf(buf, PAGE_SIZE, "%d\n", status);
 }
@@ -97,39 +97,39 @@ static ssize_t store_sockfd(struct device *dev, struct 
device_attribute *attr,
if (sockfd != -1) {
dev_info(dev, "stub up\n");
 
-   spin_lock(&sdev->ud.lock);
+   spin_lock_irq(&sdev->ud.lock);
 
if (sdev->ud.status != SDEV_ST_AVAILABLE) {
dev_err(dev, "not ready\n");
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
return -EINVAL;
}
 
socket = sockfd_to_socket(sockfd);
if (!socket) {
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
return -EINVAL;
}
sdev->ud.tcp_socket = socket;
 
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
 
sdev->ud.tcp_rx = kthread_get_run(stub_rx_loop, &sdev->ud, 
"stub_rx");
sdev->ud.tcp_tx = kthread_get_run(stub_tx_loop, &sdev->ud, 
"stub_tx");
 
-   spin_lock(&sdev->ud.lock);
+   spin_lock_irq(&sdev->ud.lock);
sdev->ud.status = SDEV_ST_USED;
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
 
} else {
dev_info(dev, "stub down\n");
 
-   spin_lock(&sdev->ud.lock);
+   spin_lock_irq(&sdev->ud.lock);
if (sdev->ud.status != SDEV_ST_USED) {
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
return -EINVAL;
}
-   spin_unlock(&sdev->ud.lock);
+   spin_unlock_irq(&sdev->ud.lock);
 
usbip_event_add(&sdev->ud, SDEV_EVENT_DOWN);
}
@@ -241,9 +241,9 @@ static void stub_device_reset(struct usbip_device *ud)
ret = usb_lock_device_for_reset(udev, sdev->interface);
if (ret < 0) {
dev_err(&udev->dev, "lock for reset\n");
-   spin_lock(&ud->lock);
+   spin_lock_irq(&ud->lock);
ud->status = SDEV_ST_ERROR;
-   spin_unlock(&ud->lock);
+   spin_unlock_irq(&ud->lock);
return;
}
 
@@ -251,7 +251,7 @@ static void stub_device_reset(struct usbip_device *ud)
ret = usb_reset_device(udev);
usb_unlock_device(udev);
 
-   spin_lock(&ud->lock);
+   spin_lock_irq(&ud->lock);
if (ret) {
dev_err(&udev->dev, "device reset\n");
ud->status = SDEV_ST_ERROR;
@@ -259,14 +259,14 @@ static void stub_device_reset(struct usbip_device *ud)
dev_info(&udev->dev, "device reset\n");
ud->status = SDEV_ST_AVAILABLE;
}
-   spin_unlock(&ud->lock);
+   spin_unlock_irq(&ud->lock);
 }
 
 static void stub_device_unusable(struct usbip_device *ud)
 {
-   spin_lock(&ud->lock);
+   spin_lock_irq(&ud->lock);
ud->status = SDEV_ST_ERROR;
-   spin_unlock(&ud->lock);
+   spin_unlock_irq(&ud->lock);
 }
 
 /**
diff --git a/drivers/staging/usbip/stub_rx.c b/drivers/staging/usbip/stub_rx.c
index 0572a15..e7458e1 100644
--- a/drivers/staging/usbip/stub_rx.c
+++ b/drivers/staging/usbip/stub_rx.c
@@ -307,12 +307,12 @@ static int valid_request(struct stub_device *sdev, struct 
usbip_header *pdu)
int valid = 0;
 
if (pdu->base.devid == sdev->devid) {
-   spin_lock(&ud->lock);
+   spin_lock_irq(&ud->lock);
if (ud->status == SDEV_ST_USED) {
/* A request is valid. */
valid = 1;
}
-   spin_unlock(&ud->lock);
+   spin_unlock_irq(&ud->lock);
}
 
return valid;
diff --git a/drivers/staging/usbip/usbip_event.c 
b/drivers/staging/usbip/usbip_event.c
index d332a34..82123be 100644
--- a/drivers/staging/usbip/usbip_event.c
+++ b/drivers/staging/usbip/usbip_event.c
@@ -105,10 +105,12 @@ EXPORT_SYMBOL_GPL(usbip_stop_eh);
 
 void usbip_event_add(struct usbip_device *ud, unsigned long event)
 {
-   spin_lock(&ud->lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(&ud->lock, flags);
ud->event |= event;

[PATCH 0/2] staging: usbip: refine the lock

2013-01-07 Thread Harvey Yang
This patchset refines some spinlocks which may not be used properly.

[PATCH 1/2]: The function 'usbip_event_add()' may be called in interrupt
context on the stub side:
'stub_complete'->'stub_enqueue_ret_unlink'->'usbip_event_add'.
In this function it tries to take the lock 'ud->lock', so we should disable
irqs when we take this lock in process context.

[PATCH 2/2]: On the client side we have a virtual hcd driver; there are
actually no hardware interrupts, so we do not need to worry about race
conditions caused by irqs. To achieve good performance there is no need to
use the interrupt-safe spinlock variants. Just replace them with the plain ones.
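To make the scenario described for [PATCH 1/2] concrete, here is a minimal
illustrative sketch (not code from the driver; all names are made up) of why
a lock shared with interrupt context must be taken with interrupts disabled
in process context:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(ud_lock);
static unsigned long ud_event;

/* runs in interrupt context, e.g. from an URB completion handler */
static void event_add_from_irq(unsigned long event)
{
	unsigned long flags;

	spin_lock_irqsave(&ud_lock, flags);	/* irq-safe variant */
	ud_event |= event;
	spin_unlock_irqrestore(&ud_lock, flags);
}

/* runs in process context, e.g. from a sysfs store handler */
static void event_clear_from_process(void)
{
	/*
	 * Interrupts must be disabled here too: with a plain spin_lock(),
	 * the completion handler above could interrupt us on this CPU
	 * while we hold the lock and then spin on it forever.
	 */
	spin_lock_irq(&ud_lock);
	ud_event = 0;
	spin_unlock_irq(&ud_lock);
}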


Harvey Yang (2):
  staging: usbip: use interrupt safe spinlock to avoid potential
deadlock.
  staging: usbip: replace the interrupt safe spinlocks with common
ones.

 drivers/staging/usbip/stub_dev.c|   34 
 drivers/staging/usbip/stub_rx.c |4 +-
 drivers/staging/usbip/usbip_event.c |6 ++-
 drivers/staging/usbip/vhci_hcd.c|   76 +++
 drivers/staging/usbip/vhci_rx.c |   10 ++---
 drivers/staging/usbip/vhci_tx.c |   14 +++
 6 files changed, 65 insertions(+), 79 deletions(-)



Re: [PATCH 2/2] Staging: comedi: addi_common.h: checkpatch.pl fixes

2013-01-07 Thread Lijo Antony

On 01/08/2013 02:50 AM, H Hartley Sweeten wrote:

On Monday, January 07, 2013 3:40 PM, Greg KH wrote:

On Sat, Jan 05, 2013 at 06:12:55PM +0400, Lijo Antony wrote:

Reduced line lengths to 80 chars by removing extra spaces.

Signed-off-by: Lijo Antony 
---
  .../staging/comedi/drivers/addi-data/addi_common.h |   20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)





As you didn't do this for all fields in the structure, it's not really
worth doing it for just these, right?

The proper thing to do is use kerneldoc format and document it all at
the top of the structure, care to do that instead?


Hopefully I will soon get the rest of the addi-data drivers split off of the
addi_common stuff. This will effectively remove this header file so I'm
not sure it's worth making patches against it.


Ok, I will wait!
If the file is still around after your cleanup, I will make changes as 
per Greg's suggestion.


Thanks,
-lijo




Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread H. Peter Anvin
On 01/07/2013 09:08 PM, Rik van Riel wrote:
> On 01/08/2013 12:03 AM, H. Peter Anvin wrote:
>> On 01/07/2013 08:55 PM, Shaohua Li wrote:
>>>
>>> I searched a little bit, the change (doing TLB flush to clear access
>>> bit) is
>>> made between 2.6.7 - 2.6.8, I can't find the changelog, but I found a
>>> patch:
>>> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7-rc2/2.6.7-rc2-mm2/broken-out/mm-flush-tlb-when-clearing-young.patch
>>>
>>>
>>> The changelog claims this is for arm/ppc/ppc64.
>>>
>>
>> Not really.  It says that those have stumbled over it already.  It is
>> true in general that this change will make very frequently used pages
>> (which stick in the TLB) candidates for eviction.
> 
> That is only true if the pages were to stay in the TLB for a
> very very long time.  Probably multiple seconds.
> 
>> x86 would seem to be just as affected, although possibly with a
>> different frequency.
>>
>> Do we have any actual metrics on anything here?
> 
> I suspect that if we do need to force a TLB flush for page
> reclaim purposes, it may make sense to do that TLB flush
> asynchronously. For example, kswapd could kick off a TLB
> flush of every CPU in the system once a second, when the
> system is under pageout pressure.
> 
> We would have to do this in a smart way, so the kswapds
> from multiple nodes do not duplicate the work.
> 
> If people want that kind of functionality, I would be
> happy to cook up an RFC patch.
> 

So it sounds like you're saying that this patch should never have been
applied in the first place?

-hpa



Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread Rik van Riel

On 01/08/2013 12:03 AM, H. Peter Anvin wrote:

On 01/07/2013 08:55 PM, Shaohua Li wrote:


I searched a little bit, the change (doing TLB flush to clear access bit) is
made between 2.6.7 - 2.6.8, I can't find the changelog, but I found a patch:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7-rc2/2.6.7-rc2-mm2/broken-out/mm-flush-tlb-when-clearing-young.patch

The changelog claims this is for arm/ppc/ppc64.



Not really.  It says that those have stumbled over it already.  It is
true in general that this change will make very frequently used pages
(which stick in the TLB) candidates for eviction.


That is only true if the pages were to stay in the TLB for a
very very long time.  Probably multiple seconds.


x86 would seem to be just as affected, although possibly with a
different frequency.

Do we have any actual metrics on anything here?


I suspect that if we do need to force a TLB flush for page
reclaim purposes, it may make sense to do that TLB flush
asynchronously. For example, kswapd could kick off a TLB
flush of every CPU in the system once a second, when the
system is under pageout pressure.

We would have to do this in a smart way, so the kswapds
from multiple nodes do not duplicate the work.

If people want that kind of functionality, I would be
happy to cook up an RFC patch.
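A rough sketch of the kind of deferred flush being described (purely
illustrative; tlb_flush_deadline and reclaim_periodic_tlb_flush() are
made-up names, and the once-per-second policy is just the example above):

#include <linux/jiffies.h>
#include <linux/atomic.h>
#include <asm/tlbflush.h>

static unsigned long tlb_flush_deadline;	/* hypothetical global */

/* called from kswapd while the system is under pageout pressure */
static void reclaim_periodic_tlb_flush(void)
{
	unsigned long now = jiffies;
	unsigned long old = tlb_flush_deadline;

	if (time_before(now, old))
		return;
	/* only one kswapd wins the right to flush during this interval */
	if (cmpxchg(&tlb_flush_deadline, old, now + HZ) == old)
		flush_tlb_all();
}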

--
All rights reversed


Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread H. Peter Anvin
On 01/07/2013 08:55 PM, Shaohua Li wrote:
> 
> I searched a little bit, the change (doing TLB flush to clear access bit) is
> made between 2.6.7 - 2.6.8, I can't find the changelog, but I found a patch:
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7-rc2/2.6.7-rc2-mm2/broken-out/mm-flush-tlb-when-clearing-young.patch
> 
> The changelog claims this is for arm/ppc/ppc64.
> 

Not really.  It says that those have stumbled over it already.  It is
true in general that this change will make very frequently used pages
(which stick in the TLB) candidates for eviction.

x86 would seem to be just as affected, although possibly with a
different frequency.

Do we have any actual metrics on anything here?

-hpa



Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread Shaohua Li
On Mon, Jan 07, 2013 at 02:31:21PM -0800, H. Peter Anvin wrote:
> On 01/07/2013 07:14 AM, Rik van Riel wrote:
> > On 01/07/2013 03:12 AM, Shaohua Li wrote:
> >>
> >> We use access bit to age a page at page reclaim. When clearing pte
> >> access bit,
> >> we could skip tlb flush for the virtual address. The side effect is if
> >> the pte
> >> is in tlb and pte access bit is unset, when cpu access the page again,
> >> cpu will
> >> not set pte's access bit. So next time page reclaim can reclaim hot pages
> >> wrongly, but this doesn't corrupt anything. And according to intel
> >> manual, tlb
>>> has less than 1k entries, which covers < 4M memory. In today's system,
> >> several giga byte memory is normal. After page reclaim clears pte
> >> access bit
> >> and before cpu access the page again, it's quite unlikely this page's
> >> pte is
>>> still in TLB. Skipping the tlb flush for this case sounds ok to me.
> > 
> > Agreed. In current systems, it can take a minute to write
> > all of memory to disk, while context switch (natural TLB
> > flush) times are in the dozens-of-millisecond timeframes.
> > 
> 
> I'm confused.  We used to do this since time immemorial, so if we aren't
> doing that now, that meant something changed somewhere along the line.
> It would be good to figure out if that was an intentional change or
> accidental.

I searched a little bit, the change (doing TLB flush to clear access bit) is
made between 2.6.7 - 2.6.8, I can't find the changelog, but I found a patch:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7-rc2/2.6.7-rc2-mm2/broken-out/mm-flush-tlb-when-clearing-young.patch

The changelog claims this is for arm/ppc/ppc64.

Thanks,
Shaohua



[PATCH] sched: fix the broken sched_rr_get_interval()

2013-01-07 Thread Zhu Yanhai
The caller of sched_slice() should pass se.cfs_rq and se as the arguments;
however, in sched_rr_get_interval() we gave it rq.cfs_rq and se, which made
the following computation obviously wrong.

The change was introduced by commit 77034937; it had correctly used
'cfs_rq_of'
before that commit. Besides, the change seems to be irrelevant to the commit
msg, which was about returning a 0 timeslice for tasks on an idle runqueue.
So I believe it was just a plain typo.
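For context, a small sketch of the distinction (as it would be written inside
kernel/sched/fair.c, where sched_slice() and cfs_rq_of() are static helpers;
task_slice() is a made-up name):

/*
 * sched_slice(cfs_rq, se) sizes the timeslice from the load of the cfs_rq
 * the entity is queued on.  With CONFIG_FAIR_GROUP_SCHED that can be a
 * child (group) cfs_rq, which is what cfs_rq_of(se) returns; &rq->cfs is
 * only the root cfs_rq, so passing it skews the computation.
 */
static u64 task_slice(struct task_struct *p)
{
	struct sched_entity *se = &p->se;

	return sched_slice(cfs_rq_of(se), se);
}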

Signed-off-by: Zhu Yanhai 
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eea870..a7a19ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6101,7 +6101,7 @@ static unsigned int get_rr_interval_fair(struct rq *rq, 
struct task_struct *task
 * idle runqueue:
 */
if (rq->cfs.load.weight)
-   rr_interval = NS_TO_JIFFIES(sched_slice(>cfs, se));
+   rr_interval = NS_TO_JIFFIES(sched_slice(cfs_rq_of(se), se));
 
return rr_interval;
 }
-- 
1.8.0.1



linux-next: Tree for Jan 8

2013-01-07 Thread Stephen Rothwell
Hi all,

Changes since 20130107:

The slave-dma tree lost its build failure.

The staging tree gained a build failure for which I applied a merge fix.

The drop-experimental tree gained conflicts against the net-next and Linus'
trees.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 214 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (d287b87 Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging fixes/master (d287b87 Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by 
list_head-style lists.)
Merging arm-current/fixes (d106de3 ARM: 7614/1: mm: fix wrong branch from 
Cortex-A9 to PJ4b)
Merging m68k-current/for-linus (e7e29b4 m68k: Wire up finit_module)
Merging powerpc-merge/merge (e6449c9 powerpc: Add missing NULL terminator to 
avoid boot panic on PPC40x)
Merging sparc/master (4e4d78f sparc: Hook up finit_module syscall.)
Merging net/master (c7e2e1d ipv4: fix NULL checking in devinet_ioctl())
Merging sound-current/for-linus (6d3cd5d ALSA: hda - add mute LED for HP 
Pavilion 17 (Realtek codec))
Merging pci-current/for-linus (812089e PCI: Reduce Ricoh 0xe822 SD card reader 
base clock frequency to 50MHz)
Merging wireless/master (5e20a4b b43: Fix firmware loading when driver is built 
into the kernel)
Merging driver-core.current/driver-core-linus (4956964 Merge tag 
'driver-core-3.8-rc2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core)
Merging tty.current/tty-linus (d1c3ed6 Linux 3.8-rc2)
Merging usb.current/usb-linus (75e1a2a USB: ehci: make debug port in-use 
detection functional again)
Merging staging.current/staging-linus (e16a922 staging: tidspbridge: use 
prepare/unprepare on dsp clocks)
Merging char-misc.current/char-misc-linus (e6028db mei: fix mismatch in mutex 
unlock-lock in mei_amthif_read())
Merging input-current/for-linus (bec7a4b Input: lm8323 - fix checking PWM 
interrupt status)
Merging md-current/for-linus (a9add5d md/raid5: add blktrace calls)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (a2c0911 crypto: caam - Updated SEC-4.0 device 
tree binding for ERA information.)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (084a0ec x86: add CONFIG_X86_MOVBE option)
CONFLICT (content): Merge conflict in arch/x86/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline functions)
Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6)
Merging devicetree-current/devicetree/merge (ab28698 of: define struct device 
in of_platform.h if !OF_DEVICE and !OF_ADDRESS)
Merging spi-current/spi/merge (d3601e5 spi/sh-hspi: fix return value check in 
hspi_probe().)
Merging gpio-current/gpio/merge (bc1008c gpio/mvebu-gpio: Make mvebu-gpio 
depend on OF_CONFIG)
Merging rr-fixes/fixes (52441fa module: prevent warning when finit_module a 0 
sized file)
Merging asm-generic/master (fb9de7e xtensa: Use generic asm/mmu.h for nommu)
Merging arm/for

Re: linux-next: comment in commit in the slave-dma tree (dmaengine: dw_dmac: Enhance device tree support)

2013-01-07 Thread Viresh Kumar
On Tue, Jan 8, 2013 at 4:05 AM, Stephen Rothwell  wrote:
> In commit 548860697046 ("dmaengine: dw_dmac: Enhance device tree
> support") (which was changed since yesterday :-(), an instance of
> __devinit has been added.  We are in the process of making CONFIG_HOTPLUG
> always true, and since commit 78d86c213f28 ("init.h: Remove __dev*
> sections from the kernel") (included in v3.7), __devinit does nothing.
> Please do not add any more.

When this patch was added, the __devinit discussion had just started. Because
Vinod missed the merge window, we still have this patch in linux-next.

@Vinod: I hope you can edit this patch yourself?

--
viresh


Re: linux-next: build failure after merge of the staging tree

2013-01-07 Thread Greg KH
On Tue, Jan 08, 2013 at 01:23:52PM +1100, Stephen Rothwell wrote:
> Hi Greg,
> 
> After merging the staging tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/staging/comedi/comedi_fops.c: In function 'comedi_unlocked_ioctl':
> drivers/staging/comedi/comedi_fops.c:1665:4: error: 'dev_file_info' 
> undeclared (first use in this function)
> 
> Caused by commit 4da5fa9a439f ("staging: comedi: use comedi_dev_from_minor
> ()") interacting with commit 7d3135af399e ("staging: comedi: prevent
> auto-unconfig of manually configured devices") from the staging.current
> tree.
> 
> I just reverted the latter commit in the hope that the bug is fixed in
> some other way in the staging tree.

Yes, I had to do some manual fixup when merging the branches together to
get it to work properly, which is what I did in my staging-next tree.
As the fixup was done in the merge commit, maybe you didn't get it when
you did the pull?

thanks,

greg k-h


Re: linux-next: build failure after merge of the staging tree

2013-01-07 Thread Stephen Rothwell
Hi Greg,

On Mon, 7 Jan 2013 20:27:30 -0800 Greg KH  wrote:
>
> On Tue, Jan 08, 2013 at 01:23:52PM +1100, Stephen Rothwell wrote:
> > Hi Greg,
> > 
> > After merging the staging tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> > 
> > drivers/staging/comedi/comedi_fops.c: In function 'comedi_unlocked_ioctl':
> > drivers/staging/comedi/comedi_fops.c:1665:4: error: 'dev_file_info' 
> > undeclared (first use in this function)
> > 
> > Caused by commit 4da5fa9a439f ("staging: comedi: use comedi_dev_from_minor
> > ()") interacting with commit 7d3135af399e ("staging: comedi: prevent
> > auto-unconfig of manually configured devices") from the staging.current
> > tree.
> > 
> > I just reverted the latter commit in the hope that the bug is fixed in
> > some other way in the staging tree.
> 
> Yes, I had to do some manual fixup when merging the branches together to
> get it to work properly, which is what I did in my staging-next tree.
> As the fixup was done in the merge commit, maybe you didn't get it when
> you did the pull?

Presumably not (my top of staging-next is d7f9729f6e06) - it was changing
pretty rapidly this morning as I was fetching trees ...

At least it will be OK tomorrow.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH] backlight: check null deference of name when device is registered

2013-01-07 Thread devendra.aaru
On Mon, Jan 7, 2013 at 8:35 PM, Andrew Morton  wrote:
> On Tue, 08 Jan 2013 10:25:35 +0900 Jingoo Han  wrote:
>
>> On Tuesday, January 08, 2013 9:02 AM, Andrew Morton wrote
>> > On Fri, 04 Jan 2013 17:29:11 +0900
>> > Jingoo Han  wrote:
>> >
>> > > NULL dereference of name is checked when the device is registered.
>> > > If the name is null, it will cause a kernel oops in dev_set_name().
>> > >
>> > > ...
>> > >
>> > > --- a/drivers/video/backlight/backlight.c
>> > > +++ b/drivers/video/backlight/backlight.c
>> > > @@ -292,6 +292,11 @@ struct backlight_device 
>> > > *backlight_device_register(const char *name,
>> > >   struct backlight_device *new_bd;
>> > >   int rc;
>> > >
>> > > + if (name == NULL) {
>> > > + pr_err("backlight name is null\n");
>> > > + return ERR_PTR(-EINVAL);
>> > > + }
>> > > +
>> > >   pr_debug("backlight_device_register: name=%s\n", name);
>> >
>> > I don't understand this.
>> >
>> > Is there some driver which is calling these functions with name=NULL?
>> > If so, which one(s)?
>>
>> No, there is no one.
>>
>> >
>> > If "no" then why don't we declare that "passing name=NULL is a bug" and
>> > leave the code as-is?
>>
>> Do you mean following?
>>
>> + if (name == NULL)
>> + pr_err("passing name=NULL is a bug");
>> +
>>   pr_debug("backlight_device_register: name=%s\n", name);
>
> Nope; I'm suggesting we leave the code alone.  If someone passes in
> NULL they will get a nice oops and their bug will then get fixed.
>

We still fail the probe in the patch Jingoo Han sent;
anyway, if I catch your point correctly, are these the lines you wanted?

+ /**
+  * BUG if the name of the backlight device
+  * is a NULL
+  */
+ BUG_ON(!name);


Re: [PATCH 1/2] cpuhotplug/nohz: Remove offline cpus from nohz-idle state

2013-01-07 Thread Srivatsa Vaddagiri
* Russell King - ARM Linux  [2013-01-05 10:36:27]:

> On Thu, Jan 03, 2013 at 06:58:38PM -0800, Srivatsa Vaddagiri wrote:
> > I also think that the
> > wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
> > busy-loop based one, as the wait there in general should be terminated 
> > within
> > few cycles.
> 
> Why open-code this stuff when we have infrastructure already in the kernel
> for waiting for stuff to happen?  I chose to use the standard infrastructure
> because its better tested, and avoids having to think about whether we need
> CPU barriers and such like to ensure that updates are seen in a timely
> manner.

I was primarily thinking of calling as few generic functions as possible on
a dead cpu. I recall several "am I running on a dead cpu?" checks
(cpu_is_offline(this_cpu)) that were put in generic routines during early
versions of cpu hotplug [1] to educate code running on a dead cpu, the need
for which went away with the introduction of the atomic/stop-machine variant.
The need to add an RCU_NONIDLE() wrapper around ARM's cpu_die() [2] is perhaps
a more recent example of educating code running on a dead cpu. The more quickly
we die after the idle thread of the dying cpu gains control, the better!

1. http://lwn.net/Articles/69040/
2. http://lists.infradead.org/pipermail/linux-arm-kernel/2012-July/107971.html

- vatsa
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH tip/core/rcu 1/1] Tiny RCU changes for 3.9

2013-01-07 Thread Josh Triplett
On Mon, Jan 07, 2013 at 02:19:15PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 07, 2013 at 09:56:06AM -0800, Josh Triplett wrote:
> > On Mon, Jan 07, 2013 at 08:57:48AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 07, 2013 at 07:58:10AM -0800, Josh Triplett wrote:
> > > > This patch seems reasonable to me, but the repeated use of #if
> > > > defined(CONFIG_SMP) || defined(CONFIG_RCU_TRACE) seems somewhat
> > > > annoying, and fragile if you ever decide to change the conditions.  How
> > > > about defining an appropriate symbol in Kconfig for stall warnings, and
> > > > using that?
> > > 
> > > But I only just removed the config option for SMP RCU stall warnings.  ;-)
> > > 
> > > But I must agree that "defined(CONFIG_SMP) || defined(CONFIG_RCU_TRACE)"
> > > is a bit obscure.  The rationale is that RCU stall warnings are
> > > unconditionally enabled in SMP kernels, but don't want to be in
> > > TINY_RCU kernels due to size constraints.  I therefore put it under
> > > CONFIG_RCU_TRACE, which also contains other TINY_RCU debugging-style
> > > options.  Would adding a comment to this effect help?
> > 
> > I understand the rationale; I just think it would become clearer if you
> > added an internal-only Kconfig symbol selected in both cases and change
> > the conditionals to use that.
> 
> My concern was that this would confuse people into thinking that the
> code under those #ifdefs was all the stall-warning code that there was.
> 
> I suppose this could be forestalled with a suitably clever name...
> CONFIG_RCU_CPU_STALL_TINY_TOO?  Better names?

How about CONFIG_RCU_STALL_COMMON, with associated help text saying
"include the stall-detection code common to both rcutree and rcutiny"?

- Josh Triplett


Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
On 1/7/2013 7:59 PM, Stephen Rothwell wrote:
> Hi Casey,
>
> On Tue, 8 Jan 2013 14:01:59 +1100 Stephen Rothwell  
> wrote:
>> Let me ask Andrew's question:  Why do you want to do this (what is the
>> use case)?  What does this gain us?
>>
>> Also, you should use unique subjects for each of the patches in the
>> series.
> You probably also want to think a bit harder about the order of the
> patches - you should introduce new APIs before you use them and remove
> calls to functions before you remove the functions.
>
The unfortunate reality is that I couldn't find a good way to stage the
changes. It's a whopping big set of infrastructure changes. I could introduce
the security blob abstraction separately, but that is only a fraction of the
change. If it would have made it through mail filters as a single patch I'd
have sent it that way.

I can spend time on patch presentation, and will if necessary. As it is,
I can start getting substantive commentary from beyond the LSM crowd, who
have already been extremely cooperative and often critical.



Re: [PATCH 26/33] rcu: Don't keep the tick for RCU while in userspace

2013-01-07 Thread Paul E. McKenney
On Tue, Jan 08, 2013 at 03:08:26AM +0100, Frederic Weisbecker wrote:
> If we are interrupting userspace, we don't need to keep
> the tick for RCU: quiescent states don't need to be reported
> because we soon run in userspace and local callbacks are handled
> by the nocb threads.
> 
> CHECKME: Do the nocb threads actually handle the global
> grace period completion for local callbacks?

First answering this for the nocb stuff in mainline:  In this case,
the grace-period startup is handled by a CPU that is not a nocb
CPU, and there has to be at least one.  The grace-period completion
is handled by the grace-period kthreads.  The nocb CPUs need do
nothing, at least assuming that they get back into dyntick-idle
(or adaptive tickless) state reasonably quickly.

Second for the version in -rcu: In this case, the nocb kthreads
register the need for a grace period using a new mechanism that
pushes the need up the rcu_node tree.  The grace-period completion
is again handled by the grace-period kthreads.  This allows all
CPUs to be nocbs CPUs.

So, in either case, yes, the below code should be safe as long as
the CPU gets into an RCU-idle state quickly (as in within a few
milliseconds or so).

Thanx, Paul

> Signed-off-by: Frederic Weisbecker 
> Cc: Alessio Igor Bogani 
> Cc: Andrew Morton 
> Cc: Chris Metcalf 
> Cc: Christoph Lameter 
> Cc: Geoff Levand 
> Cc: Gilad Ben Yossef 
> Cc: Hakan Akkan 
> Cc: Ingo Molnar 
> Cc: Li Zhong 
> Cc: Namhyung Kim 
> Cc: Paul E. McKenney 
> Cc: Paul Gortmaker 
> Cc: Peter Zijlstra 
> Cc: Steven Rostedt 
> Cc: Thomas Gleixner 
> ---
>  kernel/time/tick-sched.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 1cd93a9..ecba8b7 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/context_tracking.h>
> 
>  #include 
> 
> @@ -604,10 +605,9 @@ static bool can_stop_full_tick(int cpu)
> 
>   /*
>* Keep the tick if we are asked to report a quiescent state.
> -  * This must be further optimized (avoid checks for local callbacks,
> -  * ignore RCU in userspace, etc...
> +  * This must be further optimized (avoid checks for local callbacks)
>*/
> - if (rcu_pending(cpu)) {
> + if (!context_tracking_in_user() && rcu_pending(cpu)) {
>   trace_printk("Can't stop: RCU pending\n");
>   return false;
>   }
> -- 
> 1.7.5.4
> 



Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one

2013-01-07 Thread Viresh Kumar
Hi Steven,

On 8 January 2013 03:59, Steven Rostedt  wrote:
> On Mon, 2013-01-07 at 23:29 +0530, Viresh Kumar wrote:
>
>> > By default, I would suggest for cache locality,
>> > that we try to keep it on the same CPU. But if there's a better CPU to
>> > run on, it runs there.
>>
>> That would break our intention behind this routine. We should run
>> it on a cpu which we think is the best one for it power/performance wise.
>
> If you are running on a big.Little box sure. But for normal (read x86 ;)
> systems, it should probably stay on the current CPU.

But that's not the purpose of this new call. If the caller wants it to be on
the local cpu, he must not use this call. It is up to the core routine
(sched_select_cpu() in our case) to decide where to launch it. If we need
something special for x86, we can hack this routine.

Even for non-big.LITTLE systems, we may want to run a work item on a non-idle
cpu, so we can't guarantee the local cpu here.

>> So, i would like to call the sched_select_cpu() routine from this routine to
>> get the suggested cpu.
>
> Does the caller need to know? Or if you have a big.LITTLE setup, it
> should just work automagically?

The caller wouldn't know; he just needs to trust sched_select_cpu() to pick
the optimal cpu.

Again, it is not only for big.LITTLE, but for any SMP system where we don't
want an idle cpu to do this work.
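A rough sketch of the intended usage (queue_work_on_any_cpu() is a made-up
wrapper and the sched_select_cpu() signature is assumed from this patch set;
queue_work_on() is the existing workqueue interface):

#include <linux/workqueue.h>
#include <linux/sched.h>

/*
 * The caller does not ask for a particular CPU; the scheduler core picks
 * a suitable (e.g. non-idle) one.
 */
static bool queue_work_on_any_cpu(struct workqueue_struct *wq,
				  struct work_struct *work)
{
	int cpu = sched_select_cpu(0);	/* 0: no special sched-domain flags */

	return queue_work_on(cpu, wq, work);
}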

>> I don't think we need it :(
>> It would be a new API, with zero existing users. And so, whomsoever uses
>> it, should know what exactly he is doing :)
>
> Heh, you don't know kernel developers well do you? ;-)

I agree with you, but we don't need to care for foolish new code here.

> Once a new API is added to the kernel tree, it quickly becomes
> (mis)used.

It's true of all pieces of code in the kernel, and we really don't need to
take care of such users here :)


Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
On 1/7/2013 7:01 PM, Stephen Rothwell wrote:
> Hi Casey,
>
> On Mon, 07 Jan 2013 17:54:24 -0800 Casey Schaufler  
> wrote:
>> Subject: [PATCH v12 0/9] LSM: Multiple concurrent LSMs
>>
>> Change the infrastructure for Linux Security Modules (LSM)s
>> from a single vector of hook handlers to a list based method
>> for handling multiple concurrent modules. 
>>
>> A level of indirection has been introduced in the handling of
>> security blobs. LSMs no longer access ->security fields directly,
>> instead they use an abstraction provided by lsm_[gs]et field
>> functions. 
>>
>> The XFRM hooks are only used by SELinux and it is not clear
>> that they can be shared. The First LSM that registers using
>> those hooks gets to use them. Any subsequent LSM that uses
>> those hooks is denied registration. 
>>
>> Secids have not been made shareable. Only one LSM that uses
>> secids (SELinux and Smack) can be used at a time. The first
>> to register wins.
>>
>> The "security=" boot option takes a comma separated list of
>> LSMs, registering them in the order presented. The LSM hooks
>> will be executed in the order registered. Hooks that return
>> errors are not short circuited. All hooks are called even
>> if one of the LSM hooks fails. The result returned will be
>> that of the last LSM hook that failed.
>>
>> Some hooks don't fit that model. setprocattr, getprocattr,
>> and a few others are special cased. All behavior from
>> security/capability.c has been moved into the hook handling.
>> The security/commoncap functions used to get called from
>> the LSM specific code. The handling of the capability
>> functions has been moved out of the LSMs and into the
>> hook handling.
>>
>> The /proc/*/attr interfaces are given to one LSM. This
>> can be done by setting CONFIG_SECURITY_PRESENT. Additional
>> interfaces have been created in /proc/*/attr so that
>> each LSM has its own named interfaces.
>>
>> Signed-off-by: Casey Schaufler 
> Let me ask Andrew's question:  Why do you want to do this (what is the
> use case)?  What does this gain us?

There has been an amazing amount of development in system security
over the past three years. Almost none of it has been in the kernel.
One important reason that it is not getting done in the kernel is
that the current single LSM restriction requires an all or nothing
approach to security. Either you address all your needs with a single
LSM or you have to go with a user space solution, in which case you
may as well do everything in user space.

Multiple concurrent LSMs allows a system to be developed incrementally
and to combine a variety of approaches that meet new and interesting
needs. It allows for systems that are based on an LSM that does not
meet all of the requirements but that can be supplemented by another
LSM that fills the gaps. It allows an LSM like Smack that implements
label based access controls to remain true to its purpose even in the
face of pressure to add controls based on other mechanisms.

I have had requests for running Smack and AppArmor together. Tetsuo has
long had a need to put SELinux and TOMOYO on the same box. Yama was
recently special-cased for stackability.

We are looking at security from different directions than ever before.
What good is a UID on a cell phone? I hear complaints about Android's
"abuse" of the UID. With the option of independent groups creating
smallish LSMs and integrating them by stacking we have the ability to
make the security systems modern devices require using a architecturally
clean model rather than hijacking existing mechanisms that work a
little bit like what you want to do.

I used to believe in a single, integrated security module that addressed
all the issues. Now that Linux is supporting everything from real-time
tire pressure gauges in tricycles to the global no-fly list, that just
doesn't seem reasonable. We need better turnaround on supplemental
mechanisms. That means collections of smaller, simpler LSMs instead of
monoliths that only a few select individuals or organizations have any
hope of configuring properly.

The topic has been discussed at the past couple of Linux Security Summits
and the only real issue has been who would grind out the code. All of
the existing LSM maintainers are open to stacking. 

> Also, you should use unique subjects for each of the patches in the
> series.
>
Yes, I noticed I'd mucked that up as they flew into the ether.



Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Stephen Rothwell
Hi Casey,

On Tue, 8 Jan 2013 14:01:59 +1100 Stephen Rothwell  
wrote:
>
> Let me ask Andrew's question:  Why do you want to do this (what is the
> use case)?  What does this gain us?
> 
> Also, you should use unique subjects for each of the patches in the
> series.

You probably also want to think a bit harder about the order of the
patches - you should introduce new APIs before you use them and remove
calls to functions before you remove the functions.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: linux-next: Tree for Jan 7 (rcutorture)

2013-01-07 Thread Paul E. McKenney
On Mon, Jan 07, 2013 at 07:36:19PM -0500, Steven Rostedt wrote:
> On Mon, 2013-01-07 at 18:12 -0500, Steven Rostedt wrote:
> > On Tue, 2013-01-08 at 09:59 +1100, Stephen Rothwell wrote:
> > > Hi Paul,
> > > 
> > > On Mon, 7 Jan 2013 14:16:27 -0800 "Paul E. McKenney" 
> > >  wrote:
> > > >
> > > > On Mon, Jan 07, 2013 at 11:42:36AM -0800, Randy Dunlap wrote:
> > > > > 
> > > > > on i386 or x86_64:
> > > > > 
> > > > > ERROR: "trace_clock_local" [kernel/rcutorture.ko] undefined!
> > > > 
> > > > Hello, Randy,
> > > > 
> > > > Did your build include the following, also pushed to -next in that same
> > > > batch from -rcu?  Including Steven Rostedt on CC for his take.
> > > 
> > > That commit was certainly in next-20130107.
> > 
> > Could be bad config dependencies.
> 
> Paul,
> 
> You need to also select TRACE_CLOCK if you are going to use it.

Thank you, Steve!

Randy, does the following patch help?

Thanx, Paul



diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ce75d3b..b0fe7bd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1010,6 +1010,7 @@ config RCU_CPU_STALL_INFO
 config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
+   select TRACE_CLOCK
help
  This option provides tracing in RCU which presents stats
  in debugfs for debugging RCU implementation.



Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 7:13 PM, Eric W. Biederman  wrote:
> I meant we should detect failure to allocate bounce buffers in in
> swiotlb_init() instead of panicing.
>
> I meant swiotlb_map_single() should either panic or simply fail.
>
> If I have read lib/swiotlb.c correctly the only place we allocate a
> bounce buffer is in swiotlb_map_single.  If there are more places we can
> allocate bounce buffers those need to be handled as well.

ok, will give it a try.

Thanks

Yinghai


Re: [PATCH] hwmon: (sht15) avoid CamelCase

2013-01-07 Thread Guenter Roeck
On Mon, Jan 07, 2013 at 02:18:38PM -0500, Vivien Didelot wrote:
> This patch renames the supply_uV* variables to supply_uv* to avoid
> CamelCase as warned by the checkpatch.pl script.
> 
> Signed-off-by: Vivien Didelot 

Applied to -next.

Thanks,
Guenter


Re: [PATCH] chelsio: Use netdev_ and pr_

2013-01-07 Thread David Miller
From: Joe Perches 
Date: Sun, 06 Jan 2013 15:34:49 -0800

> Use more current logging styles.
> 
> Convert printks to pr_ and
> printks with ("%s: ...", dev->name to netdev_(dev, "...
> 
> Add pr_fmt #defines where appropriate.
> Coalesce formats.
> Use pr__once where appropriate.
> 
> Signed-off-by: Joe Perches 

Applied.


[PATCH] drivers: xhci: fix incorrect bit test

2013-01-07 Thread Nickolai Zeldovich
Fix incorrect bit test that originally showed up in
4ee823b83bc9851743fab756c76b27d6a1e2472b: use '&' instead of '&&'.

Signed-off-by: Nickolai Zeldovich 
---
 drivers/usb/host/xhci-ring.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cbb44b7..06921b6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1698,7 +1698,7 @@ static void handle_port_status(struct xhci_hcd *xhci,
faked_port_index + 1);
if (slot_id && xhci->devs[slot_id])
xhci_ring_device(xhci, slot_id);
-   if (bus_state->port_remote_wakeup && (1 << faked_port_index)) {
+   if (bus_state->port_remote_wakeup & (1 << faked_port_index)) {
bus_state->port_remote_wakeup &=
~(1 << faked_port_index);
xhci_test_and_clear_bit(xhci, port_array,
-- 
1.7.10.4



Re: [PATCH 01/11] spi/pxa2xx: allow building on a 64-bit kernel

2013-01-07 Thread Eric Miao
On Mon, Jan 7, 2013 at 6:44 PM, Mika Westerberg
 wrote:
> In addition fix following warnings seen when compiling 64-bit:
>
> drivers/spi/spi-pxa2xx.c: In function ‘map_dma_buffers’: 
> drivers/spi/spi-pxa2xx.c:384:7: warning: cast from pointer to integer of 
> different size [-Wpointer-to-int-cast]
> drivers/spi/spi-pxa2xx.c:384:40: warning: cast from pointer to integer of 
> different size [-Wpointer-to-int-cast]
> drivers/spi/spi-pxa2xx.c: In function ‘pxa2xx_spi_probe’:
> drivers/spi/spi-pxa2xx.c:1572:34: warning: cast from pointer to integer of 
> different size [-Wpointer-to-int-cast]
> drivers/spi/spi-pxa2xx.c:1572:34: warning: cast from pointer to integer of 
> different size [-Wpointer-to-int-cast]
> drivers/spi/spi-pxa2xx.c:1572:34: warning: cast from pointer to integer of 
> different size [-Wpointer-to-int-cast]
> drivers/spi/spi-pxa2xx.c:1572:27: warning: cast to pointer from integer of 
> different size [-Wint-to-pointer-cast]
>
> Signed-off-by: Mika Westerberg 
> ---
>  drivers/spi/Kconfig  |4 ++--
>  drivers/spi/spi-pxa2xx.c |5 ++---
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
> index 2e188e1..a90393d 100644
> --- a/drivers/spi/Kconfig
> +++ b/drivers/spi/Kconfig
> @@ -299,7 +299,7 @@ config SPI_PPC4xx
>
>  config SPI_PXA2XX
> tristate "PXA2xx SSP SPI master"
> -   depends on (ARCH_PXA || (X86_32 && PCI)) && EXPERIMENTAL
> +   depends on ARCH_PXA || PCI
> select PXA_SSP if ARCH_PXA
> help
>   This enables using a PXA2xx or Sodaville SSP port as a SPI master
> @@ -307,7 +307,7 @@ config SPI_PXA2XX
>   additional documentation can be found a Documentation/spi/pxa2xx.
>
>  config SPI_PXA2XX_PCI
> -   def_bool SPI_PXA2XX && X86_32 && PCI
> +   def_tristate SPI_PXA2XX && PCI
>

Generally looks good to me. I think we could split the changes into:

* Kconfig (adding 64-bit support / removing the X86_32 restriction
for the driver)
* spi-pxa2xx.c (mostly to handle the alignment warnings)

>  config SPI_RSPI
> tristate "Renesas RSPI controller"
> diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
> index 5c8c4f5..7fac65d 100644
> --- a/drivers/spi/spi-pxa2xx.c
> +++ b/drivers/spi/spi-pxa2xx.c
> @@ -47,7 +47,7 @@ MODULE_ALIAS("platform:pxa2xx-spi");
>
>  #define DMA_INT_MASK   (DCSR_ENDINTR | DCSR_STARTINTR | DCSR_BUSERR)
>  #define RESET_DMA_CHANNEL  (DCSR_NODESC | DMA_INT_MASK)
> -#define IS_DMA_ALIGNED(x)  ((((u32)(x)) & 0x07) == 0)
> +#define IS_DMA_ALIGNED(x)  IS_ALIGNED((unsigned long)x, DMA_ALIGNMENT)

OK.

>  #define MAX_DMA_LEN8191
>  #define DMA_ALIGNMENT  8
>
> @@ -1569,8 +1569,7 @@ static int pxa2xx_spi_probe(struct platform_device 
> *pdev)
> master->transfer = transfer;
>
> drv_data->ssp_type = ssp->type;
> -   drv_data->null_dma_buf = (u32 *)ALIGN((u32)(drv_data +
> -   sizeof(struct driver_data)), 
> 8);
> +   drv_data->null_dma_buf = (u32 *)PTR_ALIGN(drv_data + 1, 8);

Hmm... the original code seems to have a big problem, and interestingly no
one has reported the issue, possibly due to null_dma_buf being seldom
used.

However, it's still a bit obscure to have 'drv_data + 1'; '&drv_data[1]' might
be a bit better for readability.

And it would be better to have DMA_ALIGNMENT here instead of a hard-coded
constant '8'.

>
> drv_data->ioaddr = ssp->mmio_base;
> drv_data->ssdr_physical = ssp->phys_base + SSDR;
> --
> 1.7.10.4
>


linux-next: manual merge of the drop-experimental tree with Linus' and the net-next trees

2013-01-07 Thread Stephen Rothwell
Hi Kees,

Today's linux-next merge of the drop-experimental tree got a conflict in
drivers/net/ethernet/intel/Kconfig between commit c56283034ce2 ("pps,
ptp: Remove dependencies on EXPERIMENTAL") from Linus' tree, commit
483f777266f5 ("drivers/net: remove orphaned references to micro channel")
from the net-next tree and commit a9270b9a8436
("drivers/net/ethernet/intel: remove depends on CONFIG_EXPERIMENTAL")
from the drop-experimental tree.

I fixed it up and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH] mm: compaction: fix echo 1 > compact_memory return error issue

2013-01-07 Thread David Rientjes
On Mon, 7 Jan 2013, Andrew Morton wrote:

> --- 
> a/mm/compaction.c~mm-compaction-fix-echo-1-compact_memory-return-error-issue-fix
> +++ a/mm/compaction.c
> @@ -1150,7 +1150,7 @@ unsigned long try_to_compact_pages(struc
>  
>  
>  /* Compact all zones within a node */
> -static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> +static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
>  {
>   int zoneid;
>   struct zone *zone;
> @@ -1183,11 +1183,9 @@ static int __compact_pgdat(pg_data_t *pg
>   VM_BUG_ON(!list_empty(&cc->freepages));
>   VM_BUG_ON(!list_empty(&cc->migratepages));
>   }
> -
> - return 0;
>  }
>  
> -int compact_pgdat(pg_data_t *pgdat, int order)
> +void compact_pgdat(pg_data_t *pgdat, int order)
>  {
>   struct compact_control cc = {
>   .order = order,
> @@ -1195,10 +1193,10 @@ int compact_pgdat(pg_data_t *pgdat, int 
>   .page = NULL,
>   };
>  
> - return __compact_pgdat(pgdat, &cc);
> + __compact_pgdat(pgdat, &cc);
>  }
>  
> -static int compact_node(int nid)
> +static void compact_node(int nid)
>  {
>   struct compact_control cc = {
>   .order = -1,
> @@ -1206,7 +1204,7 @@ static int compact_node(int nid)
>   .page = NULL,
>   };
>  
> - return __compact_pgdat(NODE_DATA(nid), &cc);
> + __compact_pgdat(NODE_DATA(nid), &cc);
>  }
>  
>  /* Compact all nodes in the system */
> diff -puN 
> include/linux/compaction.h~mm-compaction-fix-echo-1-compact_memory-return-error-issue-fix
>  include/linux/compaction.h
> --- 
> a/include/linux/compaction.h~mm-compaction-fix-echo-1-compact_memory-return-error-issue-fix
> +++ a/include/linux/compaction.h
> @@ -23,7 +23,7 @@ extern int fragmentation_index(struct zo
>  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
>   int order, gfp_t gfp_mask, nodemask_t *mask,
>   bool sync, bool *contended, struct page **page);
> -extern int compact_pgdat(pg_data_t *pgdat, int order);
> +extern void compact_pgdat(pg_data_t *pgdat, int order);
>  extern void reset_isolation_suitable(pg_data_t *pgdat);
>  extern unsigned long compaction_suitable(struct zone *zone, int order);
>  
> @@ -80,9 +80,8 @@ static inline unsigned long try_to_compa
>   return COMPACT_CONTINUE;
>  }
>  
> -static inline int compact_pgdat(pg_data_t *pgdat, int order)
> +static inline void compact_pgdat(pg_data_t *pgdat, int order)
>  {
> - return COMPACT_CONTINUE;
>  }
>  
>  static inline void reset_isolation_suitable(pg_data_t *pgdat)

Acked-by: David Rientjes 


linux-next: manual merge of the drop-experimental tree with the net-next tree

2013-01-07 Thread Stephen Rothwell
Hi Kees,

Today's linux-next merge of the drop-experimental tree got conflicts in
drivers/net/ethernet/8390/Kconfig and drivers/net/ethernet/i825xx/Kconfig
between commits 483f777266f5 ("drivers/net: remove orphaned references to
micro channel") from the net-next tree and commits 16f5471ff2da
("drivers/net/ethernet/8390: remove depends on CONFIG_EXPERIMENTAL") and
65af6f091e6d ("drivers/net/ethernet/i825xx: remove depends on
CONFIG_EXPERIMENTAL") from the drop-experimental tree.

I fixed it up (I used the latter versions) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH] mm: compaction: fix echo 1 > compact_memory return error issue

2013-01-07 Thread David Rientjes
On Sun, 6 Jan 2013, Jason Liu wrote:

> when running the following command under a shell, it will return an error
> sh/$ echo 1 > /proc/sys/vm/compact_memory
> sh/$ sh: write error: Bad address
> 
> After strace, I found the following log:
> ...
> write(1, "1\n", 2)   = 3
> write(1, "", 4294967295) = -1 EFAULT (Bad address)
> write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
> ) = 31
> 
> This tells us the system returned 3 (COMPACT_COMPLETE) after writing data to
> compact_memory.
> 
> The fix is to make the system just return 0 instead of 3 (COMPACT_COMPLETE) from
> sysctl_compaction_handler after compaction_nodes has finished.
> 
> Suggested-by:David Rientjes 
> Cc:Mel Gorman 
> Cc:Andrew Morton 
> Cc:Rik van Riel 
> Cc:Minchan Kim 
> Cc:KAMEZAWA Hiroyuki 
> Signed-off-by: Jason Liu 

Acked-by: David Rientjes 
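
For reference, a minimal sketch of the resulting proc handler after this fix
(together with the related change making the compaction helpers return void);
this is only the shape of the result, not the verbatim upstream hunk:

	int sysctl_compaction_handler(struct ctl_table *table, int write,
				void __user *buffer, size_t *length, loff_t *ppos)
	{
		if (write)
			compact_nodes();	/* status codes no longer propagate */

		/* 0, not COMPACT_COMPLETE, so "echo 1 > compact_memory" succeeds */
		return 0;
	}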


Re: [RFC]x86: clearing access bit don't flush tlb

2013-01-07 Thread Simon Jeons
Hi Shaohua,

On Mon, 2013-01-07 at 16:12 +0800, Shaohua Li wrote:
> We use access bit to age a page at page reclaim. When clearing pte access bit,

Who sets this flag in the pte? The MMU or the TLB?

> we could skip tlb flush for the virtual address. The side effect is if the pte
> is in tlb and pte access bit is unset, when cpu access the page again, cpu 
> will
> not set pte's access bit. So next time page reclaim can reclaim hot pages
> wrongly, but this doesn't corrupt anything. And according to intel manual, tlb
> has less than 1k entries, which coverers < 4M memory. In today's system,
> several giga byte memory is normal. After page reclaim clears pte access bit
> and before cpu access the page again, it's quite unlikely this page's pte is
> still in TLB. Skiping the tlb flush for this case sounds ok to me.
> 

If one page is accessed more frequently than another before page reclaim
runs, reclaim will treat them as equally hot according to the access flag,
since the flag is only used to age pages at reclaim time. How is this issue
handled?

> And in some workloads, TLB flush overhead is very heavy. In my simple
> multithread app with a lot of swap to several pcie SSD, removing the tlb flush
> gives about 20% ~ 30% swapout speedup.
> 
> Signed-off-by: Shaohua Li 
> ---
>  arch/x86/mm/pgtable.c |7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> Index: linux/arch/x86/mm/pgtable.c
> ===
> --- linux.orig/arch/x86/mm/pgtable.c  2012-12-17 16:54:37.847770807 +0800
> +++ linux/arch/x86/mm/pgtable.c   2013-01-07 14:59:40.898066357 +0800
> @@ -376,13 +376,8 @@ int pmdp_test_and_clear_young(struct vm_
>  int ptep_clear_flush_young(struct vm_area_struct *vma,
>  unsigned long address, pte_t *ptep)
>  {
> - int young;
>  
> - young = ptep_test_and_clear_young(vma, address, ptep);
> - if (young)
> - flush_tlb_page(vma, address);
> -
> - return young;
> + return ptep_test_and_clear_young(vma, address, ptep);
>  }
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> 


Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Eric W. Biederman
Yinghai Lu  writes:

> On Mon, Jan 7, 2013 at 6:22 PM, Eric W. Biederman  
> wrote:
>> Yinghai I sat down and read your patch and the approach you are taking
>> is totally wrong.
>
> Thanks for checking the patch; did you check v3?

I looked at the version of the patch you had as an attachment.  I don't
know the version number but it was the latest version of the patch I saw
in this thread.

After looking at things, having a function enough_mem_for_swiotlb()
in pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() is a brittle
hack.

>> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
>> fail without panic'ing the system.
>
> I did not put panic in swiotlb, now I put panic in amd_iommu ops init
> when it need extra
> swiotlb for unhandled devices by AMD IOMMU.

But the only reason you need to touch this code at all is that
swiotlb_init() calls panic() if you don't have 64MB of memory below 4G.

>> Which leaves two valid approaches.
>> - Create a variant of swiotlb_init that can fail for use on x86 and
>>   handle the failure.
>> - Delay initializing the swiotlb until someone actually needs a mapping
>>   from it.
>>
>> Delaying the initialization of the swiotlb is out because the code
>> needs an early memory allocation to get a large chunk of contiguous
>> memory for the bounce buffers.
>
> ok.
>
>>
>> Which means the panics that occurr in swiotlb_init() need to be delayed
>> until someone something actually needs bounce buffers from the swiotlb.
>>
>> Although arguably what should actually happen instead of panic() is that
>> swiotlb_map_single should simply fail early when it was not possible to
>> preallocate bounce buffers.
>
> do you mean: actually needed dma buffer is much less than swiotlb
> buffer aka 64M?


I meant we should detect failure to allocate bounce buffers in
swiotlb_init() instead of panicking.

I meant swiotlb_map_single() should either panic or simply fail.

If I have read lib/swiotlb.c correctly the only place we allocate a
bounce buffer is in swiotlb_map_single.  If there are more places we can
allocate bounce buffers those need to be handled as well.
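
A rough sketch of that direction, using the variable names from lib/swiotlb.c
of this era (this is only the shape of "fail the mapping instead of panicking
at init time", not a tested patch):

	/* sketch: report "no bounce pool" from the mapping path */
	static bool swiotlb_usable(void)
	{
		return io_tlb_nslabs != 0;	/* pool was successfully preallocated */
	}

	/* ...and early in swiotlb_map_page()/swiotlb_map_single(): */
	if (!dma_capable(dev, dev_addr, size) && !swiotlb_usable())
		return DMA_ERROR_CODE;		/* fail this mapping, don't panic */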

Eric


Re: [PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump with iommu

2013-01-07 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 4:42 PM, Thomas Renninger  wrote:
> memmap=256M$3584M

may need to change to:

memmap=256M\$\$3584M

Thanks

Yinghai


Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers

2013-01-07 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 07/01/2013 01:02, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>>> Il 02/01/2013 06:03, Rusty Russell ha scritto:
 Paolo Bonzini  writes:
> The virtqueue_add_buf function has two limitations:
>
> 1) it requires the caller to provide all the buffers in a single call;
>
> 2) it does not support chained scatterlists: the buffers must be
> provided as an array of struct scatterlist;

 Chained scatterlists are a horrible interface, but that doesn't mean we
 shouldn't support them if there's a need.

 I think I once even had a patch which passed two chained sgs, rather
 than a combo sg and two length numbers.  It's very old, but I've pasted
 it below.

 Duplicating the implementation by having another interface is pretty
 nasty; I think I'd prefer the chained scatterlists, if that's optimal
 for you.
>>>
>>> Unfortunately, that cannot work because not all architectures support
>>> chained scatterlists.
>> 
>> WHAT?  I can't figure out what an arch needs to do to support this?
>
> It needs to use the iterator functions in its DMA driver.

But we don't care for virtio.

>> All archs we care about support them, though, so I think we can ignore
>> this issue for now.
>
> Kind of... In principle all QEMU-supported arches can use virtio, and
> the speedup can be quite useful.  And there is no Kconfig symbol for SG
> chains that I can use to disable virtio-scsi on unsupported arches. :/

Well, we #error if it's not supported.  Then the lazy architectures can
fix it.
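
For archive readers: the chaining being discussed is the sg_chain() mechanism,
which links one scatterlist array into another so a single logical list can
span several allocations. A minimal sketch, assuming the architecture supports
scatterlist chaining, with most of the page setup and error handling omitted:

	#include <linux/scatterlist.h>

	static void chain_example(struct page *a, struct page *b, struct page *c)
	{
		struct scatterlist head[3];	/* last slot becomes the chain link */
		struct scatterlist tail[1];

		sg_init_table(head, 3);
		sg_init_table(tail, 1);

		sg_set_page(&head[0], a, PAGE_SIZE, 0);
		sg_set_page(&head[1], b, PAGE_SIZE, 0);
		sg_set_page(&tail[0], c, PAGE_SIZE, 0);

		/* link head[] into tail[]; sg_next()/for_each_sg() then walk
		 * head[0], head[1], tail[0] transparently */
		sg_chain(head, 3, tail);
	}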

Cheers,
Rusty.


Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Eric W. Biederman
Konrad Rzeszutek Wilk  writes:

> On Mon, Jan 07, 2013 at 06:22:51PM -0800, Eric W. Biederman wrote:
>> Shuah Khan  writes:
>> 
>> > On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
>> >  wrote:
>> >> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>> >>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan  wrote:
>> >>> > Pani'cing the system doesn't sound like a good option to me in this
>> >>> > case. This change to disable swiotlb is made for kdump. However, with
>> >>> > this change several system fail to boot, unless crashkernel_low=72M is
>> >>> > specified.
>> >>>
>> >>> this patchset is new feature to put second kdump kernel above 4G.
>> >>>
>> >>> >
>> >>> > I would the say the right approach to solve this would be to not
>> >>> > change the current pci_swiotlb_detect_override() behavior and treat
>> >>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
>> >>>
>> >>> that will make intel system have to take crashkernel_low=72M too.
>> >>> otherwise intel system will get panic during swiotlb allocation.
>> >>
>> >> Two things:
>> >>
>> >>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>> >> that the function needs to go in a header file.
>> >>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>> >>  b/c of crashkernel_low=72M (which I am not seeing in v3.8 
>> >> kernel-parameters.txt?
>> >>  Is that part of your mega-patchset?). Anyhow, there seems to be a 
>> >> disconnect -
>> >>  what if the user supplied crashkernel_low=27M? Perhaps the 
>> >> 'is_enough'
>> >>  should also parse the bootparams to double-check that there is enough
>> >>  low-mem space? But then if the kernel grows then 72M might not be 
>> >> enough -
>> >>  you might need 82M with 3.9.
>> >>
>> >>  Perhaps a better way for this is to do:
>> >> 1). Change 'is_enough' to check only for 4MB.
>> >> 2). When booting as kexec, the SWIOTLB would only use 4MB instead 
>> >> of 64MB?
>> >>
>> >>  Or, we could also use the post-late SWIOTLB initialization similiary 
>> >> to how it was
>> >>  done on ia64. This would mean that the AMD VI code would just call 
>> >> the
>> >>  .. something like this - NOT tested or even compile tested:
>> >>
>> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> >> index c1c74e0..e7fa8f7 100644
>> >> --- a/drivers/iommu/amd_iommu.c
>> >> +++ b/drivers/iommu/amd_iommu.c
>> >> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>> >> if (unhandled && max_pfn > MAX_DMA32_PFN) {
>> >> /* There are unhandled devices - initialize swiotlb for 
>> >> them */
>> >> swiotlb = 1;
>> >> +   /* Late (so no bootmem allocator) usage and only if the 
>> >> early SWIOTLB
>> >> +* hadn't been allocated (which can happen on kexec 
>> >> kernels booted
>> >> +* above 4GB). */
>> >> +   if (!swiotlb_nr_tbl()) {
>> >> +   int retry = 3;
>> >> +   int mb_size = 64;
>> >> +   int rc = 0;
>> >> +retry_me:
>> >> +   if (retry < 0)
>> >> +   panic("We tried setting %dMB for SWIOTLB 
>> >> but got -ENOMEM", mb_size << 1);
>> >> +   rc = swiotlb_late_init_with_default_size(mb_size 
>> >> * (1<<20));
>> >> +   if (rc) {
>> >> +   retry --;
>> >> +   mb_size >> 1;
>> >> +   goto retry_me;
>> >> +   }
>> >> +   dma_ops = &swiotlb_dma_ops;
>> >> +   }
>> >> }
>> >>
>> >> amd_iommu_stats_init();
>> >>
>> >> And then the early SWIOTLB initialization for 64MB can fail and we are 
>> >> still OK.
>> >>>
>> >
>> > Yinghai/Konrad,
>> >
>> > Did more testing. btw this patch depends on your [v7u1,25/31]
>> > memblock: add memblock_mem_size(). Here are the test results:
>> >
>> > 1. When there is not enough memory: (enough_mem_for_swiotlb() returns 
>> > false)
>> > system will panic in amd_iommu_init_dma_ops().
>> >
>> > 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
>> > swiotlb is reserved
>> > pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
>> > with that getting changed in amd_iommu_init_dma_ops().
>> >
>> > I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.
>> 
>> If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
>> code is architected wrong.  None of this logic has anything to do with
>> kexec except that the kexec path is one way to get this condition to
>> happen.  Especially since the kexec'd kernel where this condition occurs
>> does not need kexec support built in.
>
> Fair enough - with the 'memmap' command line options one can trigger
> this.
>> 
>> Yinghai I sat down and read your patch and the approach you are taking

Re: [PATCH v12 0/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Stephen Rothwell
Hi Casey,

On Mon, 07 Jan 2013 17:54:24 -0800 Casey Schaufler  
wrote:
>
> Subject: [PATCH v12 0/9] LSM: Multiple concurrent LSMs
> 
> Change the infrastructure for Linux Security Modules (LSM)s
> from a single vector of hook handlers to a list based method
> for handling multiple concurrent modules. 
> 
> A level of indirection has been introduced in the handling of
> security blobs. LSMs no longer access ->security fields directly,
> instead they use an abstraction provided by lsm_[gs]et field
> functions. 
> 
> The XFRM hooks are only used by SELinux and it is not clear
> that they can be shared. The First LSM that registers using
> those hooks gets to use them. Any subsequent LSM that uses
> those hooks is denied registration. 
> 
> Secids have not been made shareable. Only one LSM that uses
> secids (SELinux and Smack) can be used at a time. The first
> to register wins.
> 
> The "security=" boot option takes a comma separated list of
> LSMs, registering them in the order presented. The LSM hooks
> will be executed in the order registered. Hooks that return
> errors are not short circuited. All hooks are called even
> if one of the LSM hooks fails. The result returned will be
> that of the last LSM hook that failed.
> 
> Some hooks don't fit that model. setprocattr, getprocattr,
> and a few others are special cased. All behavior from
> security/capability.c has been moved into the hook handling.
> The security/commoncap functions used to get called from
> the LSM specific code. The handling of the capability
> functions has been moved out of the LSMs and into the
> hook handling.
> 
> The /proc/*/attr interfaces are given to one LSM. This
> can be done by setting CONFIG_SECURITY_PRESENT. Additional
> interfaces have been created in /proc/*/attr so that
> each LSM has its own named interfaces.
> 
> Signed-off-by: Casey Schaufler 

Let me ask Andrew's question:  Why do you want to do this (what is the
use case)?  What does this gain us?

Also, you should use unique subjects for each of the patches in the
series.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Yinghai Lu
On Mon, Jan 7, 2013 at 6:22 PM, Eric W. Biederman  wrote:
> Yinghai I sat down and read your patch and the approach you are taking
> is totally wrong.

Thanks for checking the patch; did you check v3?

>
> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
> fail without panic'ing the system.

I did not put panic in swiotlb, now I put panic in amd_iommu ops init
when it need extra
swiotlb for unhandled devices by AMD IOMMU.

>
> Which leaves two valid approaches.
> - Create a variant of swiotlb_init that can fail for use on x86 and
>   handle the failure.
> - Delay initializing the swiotlb until someone actually needs a mapping
>   from it.
>
> Delaying the initialization of the swiotlb is out because the code
> needs an early memory allocation to get a large chunk of contiguous
> memory for the bounce buffers.

ok.

>
> Which means the panics that occurr in swiotlb_init() need to be delayed
> until someone something actually needs bounce buffers from the swiotlb.
>
> Although arguably what should actually happen instead of panic() is that
> swiotlb_map_single should simply fail early when it was not possible to
> preallocate bounce buffers.

do you mean: actually needed dma buffer is much less than swiotlb
buffer aka 64M?

Yinghai


Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Konrad Rzeszutek Wilk
On Mon, Jan 07, 2013 at 06:22:51PM -0800, Eric W. Biederman wrote:
> Shuah Khan  writes:
> 
> > On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
> >  wrote:
> >> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
> >>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan  wrote:
> >>> > Pani'cing the system doesn't sound like a good option to me in this
> >>> > case. This change to disable swiotlb is made for kdump. However, with
> >>> > this change several system fail to boot, unless crashkernel_low=72M is
> >>> > specified.
> >>>
> >>> this patchset is new feature to put second kdump kernel above 4G.
> >>>
> >>> >
> >>> > I would the say the right approach to solve this would be to not
> >>> > change the current pci_swiotlb_detect_override() behavior and treat
> >>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
> >>>
> >>> that will make intel system have to take crashkernel_low=72M too.
> >>> otherwise intel system will get panic during swiotlb allocation.
> >>
> >> Two things:
> >>
> >>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
> >> that the function needs to go in a header file.
> >>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
> >>  b/c of crashkernel_low=72M (which I am not seeing in v3.8 
> >> kernel-parameters.txt?
> >>  Is that part of your mega-patchset?). Anyhow, there seems to be a 
> >> disconnect -
> >>  what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
> >>  should also parse the bootparams to double-check that there is enough
> >>  low-mem space? But then if the kernel grows then 72M might not be 
> >> enough -
> >>  you might need 82M with 3.9.
> >>
> >>  Perhaps a better way for this is to do:
> >> 1). Change 'is_enough' to check only for 4MB.
> >> 2). When booting as kexec, the SWIOTLB would only use 4MB instead 
> >> of 64MB?
> >>
> >>  Or, we could also use the post-late SWIOTLB initialization similiary 
> >> to how it was
> >>  done on ia64. This would mean that the AMD VI code would just call the
> >>  .. something like this - NOT tested or even compile tested:
> >>
> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> >> index c1c74e0..e7fa8f7 100644
> >> --- a/drivers/iommu/amd_iommu.c
> >> +++ b/drivers/iommu/amd_iommu.c
> >> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
> >> if (unhandled && max_pfn > MAX_DMA32_PFN) {
> >> /* There are unhandled devices - initialize swiotlb for 
> >> them */
> >> swiotlb = 1;
> >> +   /* Late (so no bootmem allocator) usage and only if the 
> >> early SWIOTLB
> >> +* hadn't been allocated (which can happen on kexec 
> >> kernels booted
> >> +* above 4GB). */
> >> +   if (!swiotlb_nr_tbl()) {
> >> +   int retry = 3;
> >> +   int mb_size = 64;
> >> +   int rc = 0;
> >> +retry_me:
> >> +   if (retry < 0)
> >> +   panic("We tried setting %dMB for SWIOTLB 
> >> but got -ENOMEM", mb_size << 1);
> >> +   rc = swiotlb_late_init_with_default_size(mb_size * 
> >> (1<<20));
> >> +   if (rc) {
> >> +   retry --;
> >> +   mb_size >> 1;
> >> +   goto retry_me;
> >> +   }
> >> +   dma_ops = &swiotlb_dma_ops;
> >> +   }
> >> }
> >>
> >> amd_iommu_stats_init();
> >>
> >> And then the early SWIOTLB initialization for 64MB can fail and we are 
> >> still OK.
> >>>
> >
> > Yinghai/Konrad,
> >
> > Did more testing. btw this patch depends on your [v7u1,25/31]
> > memblock: add memblock_mem_size(). Here are the test results:
> >
> > 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
> > system will panic in amd_iommu_init_dma_ops().
> >
> > 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
> > swiotlb is reserved
> > pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
> > with that getting changed in amd_iommu_init_dma_ops().
> >
> > I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.
> 
> If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
> code is architected wrong.  None of this logic has anything to do with
> kexec except that the kexec path is one way to get this condition to
> happen.  Especially since the kexec'd kernel where this condition occurs
> does not need kexec support built in.

Fair enough - with the 'memmap' command line options one can trigger
this.
> 
> Yinghai I sat down and read your patch and the approach you are taking
> is totally wrong.
> 
> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
> fail without panic'ing the system.
> 
> Which leaves two valid 

Re: [PATCH v7 1/2] KSM: numa awareness sysfs knob

2013-01-07 Thread Hugh Dickins
On Mon, 7 Jan 2013, Simon Jeons wrote:
> On Thu, 2013-01-03 at 13:24 +0100, Petr Holasek wrote:
> > Hi Simon,
> > 
> > On Mon, 31 Dec 2012, Simon Jeons wrote:
> > > On Fri, 2012-12-28 at 02:32 +0100, Petr Holasek wrote:
> > > > 
> > > > v7: - added sysfs ABI documentation for KSM
> > > 
> > > Hi Petr,
> > > 
> > > How you handle "memory corruption because the ksm page still points to
> > > the stable_node that has been freed" mentioned by Andrea this time?
> > > 
> > 
> 
> Hi Petr,
> 
> You still didn't answer my question mentioned above. :)

Yes, I noticed that too :)  I think Petr probably hopes that I'll
answer; and yes, I do hold myself responsible for solving this.

The honest answer is that I forgot all about it for a while.  I
had to go back to read the various threads to remind myself of what
Andrea said back then, and the ideas I had in replying.  Thank you
for reminding us.

I do intend to fix it along the lines I suggested then, if that works
out; but that is a danger in memory hotremove only, so at present I'm
still wrestling with the more immediate problem of stale stable_nodes
when switching merge_across_nodes between 1 and 0 and 1.

Many of the problems there come from reclaim under memory pressure:
stable pages being written out to swap, and faulted back in at "the
wrong time".  Essentially, existing bugs in KSM_RUN_UNMERGE, that
were not visible until merge_across_nodes brought us to rely upon it.

I have "advanced" from kernel oopses to userspace corruption: that's
no advance at all, no doubt I'm doing something stupid, but I haven't
spotted it yet; and once I've fixed that up, shall probably want to
look back at the little heap of fixups (a remove_all_stable_nodes()
function) and go about it quite differently - but for now I'm still
learning from the bugs I give myself.

> 
> > 
> > 
> > > >  
> > > > +   /*
> > > > +* If tree_page has been migrated to another NUMA node, 
> > > > it
> > > > +* will be flushed out and put into the right unstable 
> > > > tree
> > > > +* next time: only merge with it if merge_across_nodes.
> > > 
> > > Why? Do you mean swap based migration? Or where I miss ?
> > > 
> > 
> > It can be physical page migration triggered by page compaction, memory 
> > hotplug
> > or some NUMA sched/memory balancing algorithm developed recently.
> > 
> > > > +* Just notice, we don't have similar problem for 
> > > > PageKsm
> > > > +* because their migration is disabled now. (62b61f611e)
> > > > +*/
> > 
> > Migration of KSM pages is disabled now, you can look into ^^^ commit and
> > changes introduced to migrate.c.

Migration of KSM pages is still enabled in the memory hotremove case.

I don't remember how I tested that back then, so I want to enable KSM
page migration generally, just to be able to test it more thoroughly.
That would then benefit compaction, no longer frustrated by a KSM
page in the way.

Hugh


[PATCH] sisusbvga: use proper device for dev_err() during probe

2013-01-07 Thread Nickolai Zeldovich
If kzalloc returns NULL, do not dereference the said NULL pointer as the
first argument to dev_err(); use &dev->dev instead.  Similarly, before
sisusb->sisusb_dev has been initialized to dev, use dev_err(&dev->dev)
instead.

Signed-off-by: Nickolai Zeldovich 
---
 drivers/usb/misc/sisusbvga/sisusb.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/misc/sisusbvga/sisusb.c 
b/drivers/usb/misc/sisusbvga/sisusb.c
index dd573ab..bf5f12c 100644
--- a/drivers/usb/misc/sisusbvga/sisusb.c
+++ b/drivers/usb/misc/sisusbvga/sisusb.c
@@ -3084,7 +3084,7 @@ static int sisusb_probe(struct usb_interface *intf,
 
/* Allocate memory for our private */
if (!(sisusb = kzalloc(sizeof(*sisusb), GFP_KERNEL))) {
-		dev_err(&sisusb->sisusb_dev->dev, "Failed to allocate memory for private data\n");
+		dev_err(&dev->dev, "Failed to allocate memory for private data\n");
		return -ENOMEM;
	}
	kref_init(&sisusb->kref);
@@ -3093,7 +3093,7 @@ static int sisusb_probe(struct usb_interface *intf,
 
	/* Register device */
	if ((retval = usb_register_dev(intf, &usb_sisusb_class))) {
-		dev_err(&sisusb->sisusb_dev->dev, "Failed to get a minor for device %d\n",
+		dev_err(&dev->dev, "Failed to get a minor for device %d\n",
dev->devnum);
retval = -ENODEV;
goto error_1;
-- 
1.7.10.4



Re: [PATCH] staging/sb105x: remove asm/segment.h dependency

2013-01-07 Thread Jeff Mahoney
On 1/7/13 5:36 PM, Steven Rostedt wrote:
> This patch is obsoleted by:
> 
> https://lkml.org/lkml/2012/12/13/710
> 
> Which I just got the automated reply, and it's in Greg's staging tree
> now.
> 
> When I get time, I do want to try to get this driver (and device)
> working on my ppc64 box.

Ok, this blob (predictably) works to build on my ppc64 box.

--- a/drivers/staging/sb105x/sb_pci_mp.c
+++ b/drivers/staging/sb105x/sb_pci_mp.c
@@ -2836,7 +2836,8 @@ static void __init multi_init_ports(void
osc = 0;
for(j=0;jport.uartclk *= 2;
-   mtpt->port.flags|= STD_COM_FLAGS | UPF_SHARE_IRQ ;
+   mtpt->port.flags|= UPF_BOOT_AUTOCONF|UPF_SKIP_TEST;
+   mtpt->port.flags|= UPF_SHARE_IRQ;
mtpt->port.iotype   = UPIO_PORT;
mtpt->port.ops  = _pops;

-Jeff

> Thanks,
> 
> -- Steve
> 
> 
> On Mon, 2013-01-07 at 17:20 -0500, Jeff Mahoney wrote:
>> sb105x doesn't seem to actually need  (builds on x86
>> without it) and ppc/ppc64 doesn't provide it so it fails to build there.
>>
>> This patch removes the dependency. Unfortunately, it now fails to build
>> because STD_COM_FLAGS isn't defined on most architectures. I'm not familiar
>> enough with the tty/serial system to patch that aspect of it.
>>
>> CC: Steven Rostedt 
>> Signed-off-by: Jeff Mahoney 
>> ---
>>
>>  drivers/staging/sb105x/sb_pci_mp.h |1 -
>>  1 file changed, 1 deletion(-)
>>
>> --- a/drivers/staging/sb105x/sb_pci_mp.h
>> +++ b/drivers/staging/sb105x/sb_pci_mp.h
>> @@ -19,7 +19,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  
>>
>>
> 
> 


-- 
Jeff Mahoney
SUSE Labs


[PATCH] drivers/media/pci: use memmove for overlapping regions

2013-01-07 Thread Nickolai Zeldovich
Change several memcpy() to memmove() in cases when the regions are
definitely overlapping; memcpy() of overlapping regions is undefined
behavior in C and can produce different results depending on the compiler,
the memcpy implementation, etc.
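
A tiny userspace illustration of why the distinction matters when the two
regions overlap (illustrative only, not part of the driver changes below):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char buf[] = "XXXXABCDEF";

		/* dst [0,6) and src [4,10) overlap: memcpy() here would be
		 * undefined behavior, memmove() is specified to handle it */
		memmove(buf, buf + 4, 6);
		buf[6] = '\0';

		printf("%s\n", buf);	/* prints "ABCDEF" */
		return 0;
	}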

Signed-off-by: Nickolai Zeldovich 
---
 drivers/media/pci/bt8xx/dst_ca.c  |4 ++--
 drivers/media/pci/cx18/cx18-vbi.c |2 +-
 drivers/media/pci/ivtv/ivtv-vbi.c |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/media/pci/bt8xx/dst_ca.c b/drivers/media/pci/bt8xx/dst_ca.c
index 7d96fab..0e788fc 100644
--- a/drivers/media/pci/bt8xx/dst_ca.c
+++ b/drivers/media/pci/bt8xx/dst_ca.c
@@ -180,11 +180,11 @@ static int ca_get_app_info(struct dst_state *state)
	put_command_and_length(&state->messages[0], CA_APP_INFO, length);
 
	// Copy application_type, application_manufacturer and manufacturer_code
-	memcpy(&state->messages[4], &state->messages[7], 5);
+	memmove(&state->messages[4], &state->messages[7], 5);
 
	// Set string length and copy string
	state->messages[9] = str_length;
-	memcpy(&state->messages[10], &state->messages[12], str_length);
+	memmove(&state->messages[10], &state->messages[12], str_length);
 
return 0;
 }
diff --git a/drivers/media/pci/cx18/cx18-vbi.c 
b/drivers/media/pci/cx18/cx18-vbi.c
index 6d3121f..add9964 100644
--- a/drivers/media/pci/cx18/cx18-vbi.c
+++ b/drivers/media/pci/cx18/cx18-vbi.c
@@ -84,7 +84,7 @@ static void copy_vbi_data(struct cx18 *cx, int lines, u32 
pts_stamp)
   (the max size of the VBI data is 36 * 43 + 4 bytes).
   So in this case we use the magic number 'ITV0'. */
memcpy(dst + sd, "ITV0", 4);
-   memcpy(dst + sd + 4, dst + sd + 12, line * 43);
+   memmove(dst + sd + 4, dst + sd + 12, line * 43);
size = 4 + ((43 * line + 3) & ~3);
} else {
memcpy(dst + sd, "itv0", 4);
diff --git a/drivers/media/pci/ivtv/ivtv-vbi.c 
b/drivers/media/pci/ivtv/ivtv-vbi.c
index 293db80..3c156bc 100644
--- a/drivers/media/pci/ivtv/ivtv-vbi.c
+++ b/drivers/media/pci/ivtv/ivtv-vbi.c
@@ -224,7 +224,7 @@ static void copy_vbi_data(struct ivtv *itv, int lines, u32 
pts_stamp)
   (the max size of the VBI data is 36 * 43 + 4 bytes).
   So in this case we use the magic number 'ITV0'. */
memcpy(dst + sd, "ITV0", 4);
-   memcpy(dst + sd + 4, dst + sd + 12, line * 43);
+   memmove(dst + sd + 4, dst + sd + 12, line * 43);
size = 4 + ((43 * line + 3) & ~3);
} else {
memcpy(dst + sd, "itv0", 4);
@@ -532,7 +532,7 @@ void ivtv_vbi_work_handler(struct ivtv *itv)
while (vi->cc_payload_idx) {
cc = vi->cc_payload[0];
 
-   memcpy(vi->cc_payload, vi->cc_payload + 1,
+   memmove(vi->cc_payload, vi->cc_payload + 1,
sizeof(vi->cc_payload) - 
sizeof(vi->cc_payload[0]));
vi->cc_payload_idx--;
if (vi->cc_payload_idx && cc.odd[0] == 0x80 && 
cc.odd[1] == 0x80)
-- 
1.7.10.4



[avr32] [next:akpm 212/227] lib/percpu-refcount.c:57: warning: value computed is not used

2013-01-07 Thread Fengguang Wu
[CC avr32 maintainers]

On Mon, Jan 07, 2013 at 12:20:52PM -0800, Andrew Morton wrote:
> On Mon, 07 Jan 2013 12:45:36 +0800
> kbuild test robot  wrote:
> 
> > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> > akpm
> > head:   e862d51dae9808e394a118ca1692f09bf0751aba
> > commit: 6311ac08e93b12f1367da092b413dd2434be45f5 [212/227] generic dynamic 
> > per cpu refcounting
> > config: make ARCH=avr32 atngw100_defconfig
> > 
> > All warnings:
> > 
> > lib/percpu-refcount.c: In function 'percpu_ref_init':
> > lib/percpu-refcount.c:22: error: 'jiffies' undeclared (first use in this 
> > function)
> > lib/percpu-refcount.c:22: error: (Each undeclared identifier is reported 
> > only once
> > lib/percpu-refcount.c:22: error: for each function it appears in.)
> > lib/percpu-refcount.c: In function 'percpu_ref_alloc':
> > lib/percpu-refcount.c:36: error: 'jiffies' undeclared (first use in this 
> > function)
> > lib/percpu-refcount.c:41: error: 'HZ' undeclared (first use in this 
> > function)
> > lib/percpu-refcount.c:57: warning: value computed is not used
> > 
> 
> Confused. 
> http://ozlabs.org/~akpm/mmotm/broken-out/generic-dynamic-per-cpu-refcounting-fix.patch
> added the jiffies.h include.

Oh that patch's linux-next commit is 18bc58821de88e5621cfdaac78c814ea479accab
and goes _after_ the current commit 6311ac08e93b12f1367da092b413dd2434be45f5.

So linux-next HEAD no longer has the build errors, however I still got these
warnings:

  CC  lib/percpu-refcount.o
In file included from /c/wfg/linux/include/uapi/linux/param.h:4,
 from /c/wfg/linux/include/linux/timex.h:63,
 from /c/wfg/linux/include/linux/jiffies.h:8,
 from /c/wfg/linux/lib/percpu-refcount.c:4:
==> /c/wfg/linux/arch/avr32/include/asm/param.h:6:1: warning: "HZ" redefined

That's a trivial avr32 problem.

In file included from /c/wfg/linux/arch/avr32/include/asm/param.h:4,
 from /c/wfg/linux/include/uapi/linux/param.h:4,
 from /c/wfg/linux/include/linux/timex.h:63,
 from /c/wfg/linux/include/linux/jiffies.h:8,
 from /c/wfg/linux/lib/percpu-refcount.c:4:
/c/wfg/linux/arch/avr32/include/uapi/asm/param.h:6:1: warning: this is 
the location of the previous definition
/c/wfg/linux/lib/percpu-refcount.c: In function 'percpu_ref_alloc':
==> /c/wfg/linux/lib/percpu-refcount.c:58: warning: value computed is not 
used

That's perhaps related to how cmpxchg() is expanded in avr32.
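
For what it's worth, a standalone illustration of that warning class (whether a
particular cmpxchg() expansion triggers it at a call site that ignores the
result depends on the arch's macro and the gcc version):

	/* gcc -Wall: the first statement gives "value computed is not used",
	 * because the expression has a side effect (the increment) but its
	 * computed value (the dereference) is discarded. */
	int main(void)
	{
		int a[2] = { 1, 2 };
		int *p = a;

		*p++;			/* warning: value computed is not used */
		(void)*p++;		/* explicit discard, no warning */
		return a[0];
	}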

/c/wfg/linux/lib/percpu-refcount.c: In function 'percpu_ref_kill':
==> /c/wfg/linux/lib/percpu-refcount.c:114: warning: 'new' may be used 
uninitialized in this function

Should be a false warning. The code looks fine.

Thanks,
Fengguang


linux-next: build failure after merge of the staging tree

2013-01-07 Thread Stephen Rothwell
Hi Greg,

After merging the staging tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/staging/comedi/comedi_fops.c: In function 'comedi_unlocked_ioctl':
drivers/staging/comedi/comedi_fops.c:1665:4: error: 'dev_file_info' undeclared 
(first use in this function)

Caused by commit 4da5fa9a439f ("staging: comedi: use comedi_dev_from_minor
()") interacting with commit 7d3135af399e ("staging: comedi: prevent
auto-unconfig of manually configured devices") from the staging.current
tree.

I just reverted the latter commit in the hope that the bug is fixed in
some other way in the staging tree.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it

2013-01-07 Thread Eric W. Biederman
Shuah Khan  writes:

> On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
>  wrote:
>> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
>>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan  wrote:
>>> > Pani'cing the system doesn't sound like a good option to me in this
>>> > case. This change to disable swiotlb is made for kdump. However, with
>>> > this change several system fail to boot, unless crashkernel_low=72M is
>>> > specified.
>>>
>>> this patchset is new feature to put second kdump kernel above 4G.
>>>
>>> >
>>> > I would the say the right approach to solve this would be to not
>>> > change the current pci_swiotlb_detect_override() behavior and treat
>>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
>>>
>>> that will make intel system have to take crashkernel_low=72M too.
>>> otherwise intel system will get panic during swiotlb allocation.
>>
>> Two things:
>>
>>  1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
>> that the function needs to go in a header file.
>>  2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
>>  b/c of crashkernel_low=72M (which I am not seeing in v3.8 
>> kernel-parameters.txt?
>>  Is that part of your mega-patchset?). Anyhow, there seems to be a 
>> disconnect -
>>  what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
>>  should also parse the bootparams to double-check that there is enough
>>  low-mem space? But then if the kernel grows then 72M might not be 
>> enough -
>>  you might need 82M with 3.9.
>>
>>  Perhaps a better way for this is to do:
>> 1). Change 'is_enough' to check only for 4MB.
>> 2). When booting as kexec, the SWIOTLB would only use 4MB instead of 
>> 64MB?
>>
>>  Or, we could also use the post-late SWIOTLB initialization similiary to 
>> how it was
>>  done on ia64. This would mean that the AMD VI code would just call the
>>  .. something like this - NOT tested or even compile tested:
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index c1c74e0..e7fa8f7 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
>> if (unhandled && max_pfn > MAX_DMA32_PFN) {
>> /* There are unhandled devices - initialize swiotlb for them 
>> */
>> swiotlb = 1;
>> +   /* Late (so no bootmem allocator) usage and only if the 
>> early SWIOTLB
>> +* hadn't been allocated (which can happen on kexec kernels 
>> booted
>> +* above 4GB). */
>> +   if (!swiotlb_nr_tbl()) {
>> +   int retry = 3;
>> +   int mb_size = 64;
>> +   int rc = 0;
>> +retry_me:
>> +   if (retry < 0)
>> +   panic("We tried setting %dMB for SWIOTLB but 
>> got -ENOMEM", mb_size << 1);
>> +   rc = swiotlb_late_init_with_default_size(mb_size * 
>> (1<<20));
>> +   if (rc) {
>> +   retry --;
>> +   mb_size >> 1;
>> +   goto retry_me;
>> +   }
>> +   dma_ops = &swiotlb_dma_ops;
>> +   }
>> }
>>
>> amd_iommu_stats_init();
>>
>> And then the early SWIOTLB initialization for 64MB can fail and we are still 
>> OK.
>>>
>
> Yinghai/Konrad,
>
> Did more testing. btw this patch depends on your [v7u1,25/31]
> memblock: add memblock_mem_size(). Here are the test results:
>
> 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
> system will panic in amd_iommu_init_dma_ops().
>
> 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
> swiotlb is reserved
> pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
> with that getting changed in amd_iommu_init_dma_ops().
>
> I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.

If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
code is architected wrong.  None of this logic has anything to do with
kexec except that the kexec path is one way to get this condition to
happen.  Especially since the kexec'd kernel where this condition occurs
does not need kexec support built in.

Yinghai I sat down and read your patch and the approach you are taking
is totally wrong.

The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
fail without panic'ing the system.

Which leaves two valid approaches.
- Create a variant of swiotlb_init that can fail for use on x86 and
  handle the failure.
- Delay initializing the swiotlb until someone actually needs a mapping
  from it.  

Delaying the initialization of the swiotlb is out because the code
needs an early memory allocation to get a large chunk of contiguous
memory for the bounce buffers.

Which means the 

[PATCH v12 1/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
Subject: [PATCH v12 1/9] LSM: Multiple concurrent LSMs

Change the infrastructure for Linux Security Modules (LSM)s
from a single vector of hook handlers to a list based method
for handling multiple concurrent modules. 

Changes for AppArmor. Abstract access to security blobs.
Remove commoncap calls.


Signed-off-by: Casey Schaufler 

---
 security/apparmor/context.c |   10 +++---
 security/apparmor/domain.c  |   19 --
 security/apparmor/include/context.h |   13 +--
 security/apparmor/lsm.c |   66 +--
 4 files changed, 45 insertions(+), 63 deletions(-)

diff --git a/security/apparmor/context.c b/security/apparmor/context.c
index 8a9b502..3d9e460 100644
--- a/security/apparmor/context.c
+++ b/security/apparmor/context.c
@@ -76,7 +76,7 @@ void aa_dup_task_context(struct aa_task_cxt *new, const 
struct aa_task_cxt *old)
  */
 int aa_replace_current_profile(struct aa_profile *profile)
 {
-   struct aa_task_cxt *cxt = current_cred()->security;
+	struct aa_task_cxt *cxt = lsm_get_cred(current_cred(), &apparmor_ops);
struct cred *new;
BUG_ON(!profile);
 
@@ -87,7 +87,7 @@ int aa_replace_current_profile(struct aa_profile *profile)
if (!new)
return -ENOMEM;
 
-   cxt = new->security;
+	cxt = lsm_get_cred(new, &apparmor_ops);
if (unconfined(profile) || (cxt->profile->ns != profile->ns)) {
/* if switching to unconfined or a different profile namespace
 * clear out context state
@@ -123,7 +123,7 @@ int aa_set_current_onexec(struct aa_profile *profile)
if (!new)
return -ENOMEM;
 
-   cxt = new->security;
+	cxt = lsm_get_cred(new, &apparmor_ops);
aa_get_profile(profile);
aa_put_profile(cxt->onexec);
cxt->onexec = profile;
@@ -150,7 +150,7 @@ int aa_set_current_hat(struct aa_profile *profile, u64 
token)
return -ENOMEM;
BUG_ON(!profile);
 
-   cxt = new->security;
+	cxt = lsm_get_cred(new, &apparmor_ops);
if (!cxt->previous) {
/* transfer refcount */
cxt->previous = cxt->profile;
@@ -187,7 +187,7 @@ int aa_restore_previous_profile(u64 token)
if (!new)
return -ENOMEM;
 
-   cxt = new->security;
+	cxt = lsm_get_cred(new, &apparmor_ops);
if (cxt->token != token) {
abort_creds(new);
return -EACCES;
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 60f0c76..7ad4e26 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -353,14 +353,12 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
bprm->file->f_path.dentry->d_inode->i_mode
};
const char *name = NULL, *target = NULL, *info = NULL;
-   int error = cap_bprm_set_creds(bprm);
-   if (error)
-   return error;
+   int error = 0;
 
if (bprm->cred_prepared)
return 0;
 
-   cxt = bprm->cred->security;
+	cxt = lsm_get_cred(bprm->cred, &apparmor_ops);
BUG_ON(!cxt);
 
profile = aa_get_profile(aa_newest_version(cxt->profile));
@@ -539,15 +537,10 @@ cleanup:
  */
 int apparmor_bprm_secureexec(struct linux_binprm *bprm)
 {
-   int ret = cap_bprm_secureexec(bprm);
-
/* the decision to use secure exec is computed in set_creds
 * and stored in bprm->unsafe.
 */
-   if (!ret && (bprm->unsafe & AA_SECURE_X_NEEDED))
-   ret = 1;
-
-   return ret;
+   return bprm->unsafe & AA_SECURE_X_NEEDED;
 }
 
 /**
@@ -557,7 +550,7 @@ int apparmor_bprm_secureexec(struct linux_binprm *bprm)
 void apparmor_bprm_committing_creds(struct linux_binprm *bprm)
 {
struct aa_profile *profile = __aa_current_profile();
-   struct aa_task_cxt *new_cxt = bprm->cred->security;
+	struct aa_task_cxt *new_cxt = lsm_get_cred(bprm->cred, &apparmor_ops);
 
/* bail out if unconfined or not changing profile */
if ((new_cxt->profile == profile) ||
@@ -634,7 +627,7 @@ int aa_change_hat(const char *hats[], int count, u64 token, 
bool permtest)
 
/* released below */
cred = get_current_cred();
-   cxt = cred->security;
+	cxt = lsm_get_cred(cred, &apparmor_ops);
profile = aa_cred_profile(cred);
previous_profile = cxt->previous;
 
@@ -770,7 +763,7 @@ int aa_change_profile(const char *ns_name, const char 
*hname, bool onexec,
}
 
cred = get_current_cred();
-   cxt = cred->security;
+	cxt = lsm_get_cred(cred, &apparmor_ops);
profile = aa_cred_profile(cred);
 
/*
diff --git a/security/apparmor/include/context.h 
b/security/apparmor/include/context.h
index a9cbee4..8484e55 100644
--- a/security/apparmor/include/context.h
+++ b/security/apparmor/include/context.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "policy.h"
 
@@ -81,6 +82,8 @@ int aa_set_current_onexec(struct aa_profile *profile);
 int 

[PATCH 05/33] cputime: Use accessors to read task cputime stats

2013-01-07 Thread Frederic Weisbecker
This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running on a full
dynticks CPU, we'll need to do some extra computation. This
way we can account for the time it spent tickless in userspace
since its last cputime snapshot.
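
For reference, the plain (non-full-dynticks) variant of the accessor added to
include/linux/sched.h by this patch looks roughly like this; the callers below
are converted to it:

	static inline void task_cputime(struct task_struct *t,
					cputime_t *utime, cputime_t *stime)
	{
		if (utime)
			*utime = t->utime;
		if (stime)
			*stime = t->stime;
	}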

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 arch/alpha/kernel/osf_sys.c |6 --
 arch/x86/kernel/apm_32.c|   11 ++-
 drivers/isdn/mISDN/stack.c  |7 ++-
 fs/binfmt_elf.c |8 ++--
 fs/binfmt_elf_fdpic.c   |7 +--
 include/linux/sched.h   |   18 ++
 kernel/acct.c   |6 --
 kernel/cpu.c|4 +++-
 kernel/delayacct.c  |7 +--
 kernel/exit.c   |6 --
 kernel/posix-cpu-timers.c   |   28 ++--
 kernel/sched/cputime.c  |9 +
 kernel/signal.c |   12 
 kernel/tsacct.c |   19 +--
 14 files changed, 109 insertions(+), 39 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 14db93e..dbc1760 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1139,6 +1139,7 @@ struct rusage32 {
 SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 __user *, ru)
 {
struct rusage32 r;
+   cputime_t utime, stime;
 
if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
return -EINVAL;
@@ -1146,8 +1147,9 @@ SYSCALL_DEFINE2(osf_getrusage, int, who, struct rusage32 
__user *, ru)
	memset(&r, 0, sizeof(r));
	switch (who) {
	case RUSAGE_SELF:
-		jiffies_to_timeval32(current->utime, &r.ru_utime);
-		jiffies_to_timeval32(current->stime, &r.ru_stime);
+		task_cputime(current, &utime, &stime);
+		jiffies_to_timeval32(utime, &r.ru_utime);
+		jiffies_to_timeval32(stime, &r.ru_stime);
r.ru_minflt = current->min_flt;
r.ru_majflt = current->maj_flt;
break;
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index d65464e..8d7012b 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -899,6 +899,7 @@ static void apm_cpu_idle(void)
static int use_apm_idle; /* = 0 */
static unsigned int last_jiffies; /* = 0 */
static unsigned int last_stime; /* = 0 */
+   cputime_t stime;
 
int apm_idle_done = 0;
unsigned int jiffies_since_last_check = jiffies - last_jiffies;
@@ -906,23 +907,23 @@ static void apm_cpu_idle(void)
 
WARN_ONCE(1, "deprecated apm_cpu_idle will be deleted in 2012");
 recalc:
+	task_cputime(current, NULL, &stime);
if (jiffies_since_last_check > IDLE_CALC_LIMIT) {
use_apm_idle = 0;
-   last_jiffies = jiffies;
-   last_stime = current->stime;
} else if (jiffies_since_last_check > idle_period) {
unsigned int idle_percentage;
 
-   idle_percentage = current->stime - last_stime;
+   idle_percentage = stime - last_stime;
idle_percentage *= 100;
idle_percentage /= jiffies_since_last_check;
use_apm_idle = (idle_percentage > idle_threshold);
if (apm_info.forbid_idle)
use_apm_idle = 0;
-   last_jiffies = jiffies;
-   last_stime = current->stime;
}
 
+   last_jiffies = jiffies;
+   last_stime = stime;
+
bucket = IDLE_LEAKY_MAX;
 
while (!need_resched()) {
diff --git a/drivers/isdn/mISDN/stack.c b/drivers/isdn/mISDN/stack.c
index 5f21f62..deda591 100644
--- a/drivers/isdn/mISDN/stack.c
+++ b/drivers/isdn/mISDN/stack.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "core.h"
 
 static u_int   *debug;
@@ -202,6 +203,9 @@ static int
 mISDNStackd(void *data)
 {
struct mISDNstack *st = data;
+#ifdef MISDN_MSG_STATS
+   cputime_t utime, stime;
+#endif
int err = 0;
 
sigfillset(>blocked);
@@ -303,9 +307,10 @@ mISDNStackd(void *data)
   "msg %d sleep %d stopped\n",
   dev_name(>dev->dev), st->msg_cnt, st->sleep_cnt,
   st->stopped_cnt);
+	task_cputime(st->thread, &utime, &stime);
printk(KERN_DEBUG
   "mISDNStackd daemon for %s utime(%ld) stime(%ld)\n",
-	       dev_name(&st->dev->dev), st->thread->utime, st->thread->stime);
+	       dev_name(&st->dev->dev), utime, stime);
printk(KERN_DEBUG
   "mISDNStackd daemon for %s nvcsw(%ld) nivcsw(%ld)\n",
   dev_name(>dev->dev), st->thread->nvcsw, st->thread->nivcsw);
diff --git 

[PATCH v12 2/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
Subject: [PATCH v12 2/9] LSM: Multiple concurrent LSMs

Change the infrastructure for Linux Security Modules (LSM)s
from a single vector of hook handlers to a list based method
for handling multiple concurrent modules. 

Remove security/capability.c as there is no longer need
of a "default" LSM. Remove unused commoncap function.

Signed-off-by: Casey Schaufler 

---
 security/Makefile |3 +-
 security/capability.c | 1081 -
 security/commoncap.c  |6 -
 3 files changed, 1 insertion(+), 1089 deletions(-)

diff --git a/security/Makefile b/security/Makefile
index c26c81e..b1875b1 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -14,9 +14,8 @@ obj-y += commoncap.o
 obj-$(CONFIG_MMU)  += min_addr.o
 
 # Object file lists
-obj-$(CONFIG_SECURITY) += security.o capability.o
+obj-$(CONFIG_SECURITY) += security.o
 obj-$(CONFIG_SECURITYFS)   += inode.o
-# Must precede capability.o in order to stack properly.
 obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
 obj-$(CONFIG_SECURITY_SMACK)   += smack/built-in.o
 obj-$(CONFIG_AUDIT)+= lsm_audit.o
diff --git a/security/capability.c b/security/capability.c
deleted file mode 100644
index 0fe5a02..000
--- a/security/capability.c
+++ /dev/null
@@ -1,1081 +0,0 @@
-/*
- *  Capabilities Linux Security Module
- *
- *  This is the default security module in case no other module is loaded.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- */
-
-#include 
-
-static int cap_syslog(int type)
-{
-   return 0;
-}
-
-static int cap_quotactl(int cmds, int type, int id, struct super_block *sb)
-{
-   return 0;
-}
-
-static int cap_quota_on(struct dentry *dentry)
-{
-   return 0;
-}
-
-static int cap_bprm_check_security(struct linux_binprm *bprm)
-{
-   return 0;
-}
-
-static void cap_bprm_committing_creds(struct linux_binprm *bprm)
-{
-}
-
-static void cap_bprm_committed_creds(struct linux_binprm *bprm)
-{
-}
-
-static int cap_sb_alloc_security(struct super_block *sb)
-{
-   return 0;
-}
-
-static void cap_sb_free_security(struct super_block *sb)
-{
-}
-
-static int cap_sb_copy_data(char *orig, char *copy)
-{
-   return 0;
-}
-
-static int cap_sb_remount(struct super_block *sb, void *data)
-{
-   return 0;
-}
-
-static int cap_sb_kern_mount(struct super_block *sb, int flags, void *data)
-{
-   return 0;
-}
-
-static int cap_sb_show_options(struct seq_file *m, struct super_block *sb)
-{
-   return 0;
-}
-
-static int cap_sb_statfs(struct dentry *dentry)
-{
-   return 0;
-}
-
-static int cap_sb_mount(const char *dev_name, struct path *path,
-   const char *type, unsigned long flags, void *data)
-{
-   return 0;
-}
-
-static int cap_sb_umount(struct vfsmount *mnt, int flags)
-{
-   return 0;
-}
-
-static int cap_sb_pivotroot(struct path *old_path, struct path *new_path)
-{
-   return 0;
-}
-
-static int cap_sb_set_mnt_opts(struct super_block *sb,
-  struct security_mnt_opts *opts)
-{
-   if (unlikely(opts->num_mnt_opts))
-   return -EOPNOTSUPP;
-   return 0;
-}
-
-static void cap_sb_clone_mnt_opts(const struct super_block *oldsb,
- struct super_block *newsb)
-{
-}
-
-static int cap_sb_parse_opts_str(char *options, struct security_mnt_opts *opts)
-{
-   return 0;
-}
-
-static int cap_inode_alloc_security(struct inode *inode)
-{
-   return 0;
-}
-
-static void cap_inode_free_security(struct inode *inode)
-{
-}
-
-static int cap_inode_init_security(struct inode *inode, struct inode *dir,
-  const struct qstr *qstr, char **name,
-  void **value, size_t *len)
-{
-   return -EOPNOTSUPP;
-}
-
-static int cap_inode_create(struct inode *inode, struct dentry *dentry,
-   umode_t mask)
-{
-   return 0;
-}
-
-static int cap_inode_link(struct dentry *old_dentry, struct inode *inode,
- struct dentry *new_dentry)
-{
-   return 0;
-}
-
-static int cap_inode_unlink(struct inode *inode, struct dentry *dentry)
-{
-   return 0;
-}
-
-static int cap_inode_symlink(struct inode *inode, struct dentry *dentry,
-const char *name)
-{
-   return 0;
-}
-
-static int cap_inode_mkdir(struct inode *inode, struct dentry *dentry,
-  umode_t mask)
-{
-   return 0;
-}
-
-static int cap_inode_rmdir(struct inode *inode, struct dentry *dentry)
-{
-   return 0;
-}
-
-static int cap_inode_mknod(struct inode *inode, struct dentry *dentry,
-   

[PATCH 09/33] nohz: Trace timekeeping update

2013-01-07 Thread Frederic Weisbecker
Not for merge. This may become a real tracepoint.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 kernel/time/tick-sched.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b75e302..a35ae96 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -118,8 +118,10 @@ static void tick_sched_do_timer(ktime_t now)
 #endif
 
/* Check, if the jiffies need an update */
-   if (tick_do_timer_cpu == cpu)
+   if (tick_do_timer_cpu == cpu) {
+   trace_printk("do timekeeping\n");
tick_do_update_jiffies64(now);
+   }
 }
 
 static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
-- 
1.7.5.4



[PATCH 10/33] nohz: Wake up full dynticks CPUs when a timer gets enqueued

2013-01-07 Thread Frederic Weisbecker
Wake up a CPU when a timer list timer is enqueued there and
the CPU is in full dynticks mode. Sending it an IPI makes it
reconsider the next timer to program, taking the recent update
into account.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 include/linux/sched.h |4 ++--
 kernel/sched/core.c   |   18 +-
 kernel/timer.c|2 +-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3bca36e..32860ae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2061,9 +2061,9 @@ static inline void idle_task_exit(void) {}
 #endif
 
 #if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
-extern void wake_up_idle_cpu(int cpu);
+extern void wake_up_nohz_cpu(int cpu);
 #else
-static inline void wake_up_idle_cpu(int cpu) { }
+static inline void wake_up_nohz_cpu(int cpu) { }
 #endif
 
 extern unsigned int sysctl_sched_latency;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..63b25e2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -587,7 +587,7 @@ unlock:
  * account when the CPU goes back to idle and evaluates the timer
  * wheel for the next timer event.
  */
-void wake_up_idle_cpu(int cpu)
+static void wake_up_idle_cpu(int cpu)
 {
struct rq *rq = cpu_rq(cpu);
 
@@ -617,6 +617,22 @@ void wake_up_idle_cpu(int cpu)
smp_send_reschedule(cpu);
 }
 
+static bool wake_up_full_nohz_cpu(int cpu)
+{
+   if (tick_nohz_full_cpu(cpu)) {
+   smp_send_reschedule(cpu);
+   return true;
+   }
+
+   return false;
+}
+
+void wake_up_nohz_cpu(int cpu)
+{
+   if (!wake_up_full_nohz_cpu(cpu))
+   wake_up_idle_cpu(cpu);
+}
+
 static inline bool got_nohz_idle_kick(void)
 {
int cpu = smp_processor_id();
diff --git a/kernel/timer.c b/kernel/timer.c
index ff3b516..970b57d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -936,7 +936,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 * makes sure that a CPU on the way to idle can not evaluate
 * the timer wheel.
 */
-   wake_up_idle_cpu(cpu);
+   wake_up_nohz_cpu(cpu);
	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
-- 
1.7.5.4



[PATCH 11/33] rcu: Restart the tick on non-responding full dynticks CPUs

2013-01-07 Thread Frederic Weisbecker
When a CPU in full dynticks mode doesn't respond to complete
a grace period, issue it a specific IPI so that it restarts
the tick and chases a quiescent state.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 

Signed-off-by: Steven Rostedt 
---
 kernel/rcutree.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e441b77..302d360 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include <linux/tick.h>
 
 #include "rcutree.h"
 #include 
@@ -743,6 +744,12 @@ static int dyntick_save_progress_counter(struct rcu_data 
*rdp)
return (rdp->dynticks_snap & 0x1) == 0;
 }
 
+static void rcu_kick_nohz_cpu(int cpu)
+{
+   if (tick_nohz_full_cpu(cpu))
+   smp_send_reschedule(cpu);
+}
+
 /*
  * Return true if the specified CPU has passed through a quiescent
  * state by virtue of being in or having passed through an dynticks
@@ -790,6 +797,9 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
rdp->offline_fqs++;
return 1;
}
+
+   rcu_kick_nohz_cpu(rdp->cpu);
+
return 0;
 }
 
-- 
1.7.5.4



[PATCH 08/33] nohz: Assign timekeeping duty to a non-full-nohz CPU

2013-01-07 Thread Frederic Weisbecker
This way the full nohz CPUs can safely run with the tick
stopped with a guarantee that somebody else is taking
care of the jiffies and gtod progression.

NOTE: this doesn't handle CPU hotplug. Also we could use something
more elaborate wrt. power saving if we have more than one non-full-nohz
CPU running. But let's use this KISS solution for now.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
[fix have_nohz_full_mask offcase]
Signed-off-by: Steven Rostedt 
---
 kernel/time/tick-broadcast.c |3 ++-
 kernel/time/tick-common.c|5 -
 kernel/time/tick-sched.c |9 -
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..596c547 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -537,7 +537,8 @@ void tick_broadcast_setup_oneshot(struct clock_event_device 
*bc)
bc->event_handler = tick_handle_oneshot_broadcast;
 
/* Take the do_timer update */
-   tick_do_timer_cpu = cpu;
+   if (!tick_nohz_full_cpu(cpu))
+   tick_do_timer_cpu = cpu;
 
/*
 * We must be careful here. There might be other CPUs
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..83f2bd9 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -163,7 +163,10 @@ static void tick_setup_device(struct tick_device *td,
 * this cpu:
 */
if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-   tick_do_timer_cpu = cpu;
+   if (!tick_nohz_full_cpu(cpu))
+   tick_do_timer_cpu = cpu;
+   else
+   tick_do_timer_cpu = TICK_DO_TIMER_NONE;
tick_next_period = ktime_get();
tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 494a2aa..b75e302 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -112,7 +112,8 @@ static void tick_sched_do_timer(ktime_t now)
 * this duty, then the jiffies update is still serialized by
 * jiffies_lock.
 */
-   if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+   if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)
+   && !tick_nohz_full_cpu(cpu))
tick_do_timer_cpu = cpu;
 #endif
 
@@ -163,6 +164,8 @@ static int __init tick_nohz_full_setup(char *str)
return 1;
 }
 __setup("full_nohz=", tick_nohz_full_setup);
+#else
+#define have_full_nohz_mask (0)
 #endif
 
 /*
@@ -512,6 +515,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched 
*ts)
return false;
}
 
+   /* If there are full nohz CPUs around, we need to keep the timekeeping duty */
+   if (have_full_nohz_mask && tick_do_timer_cpu == cpu)
+   return false;
+
return true;
 }
 
-- 
1.7.5.4



[PATCH 07/33] nohz: Basic full dynticks interface

2013-01-07 Thread Frederic Weisbecker
Start with a very simple interface to define the full dynticks CPUs:
a cpumask defined at boot time through the "full_nohz=" kernel
parameter.

Make sure you keep at least one CPU outside this range to handle
the timekeeping.

Also, the full_nohz= mask must match the rcu_nocb= value.
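
As a concrete illustration (the CPU numbers are made up for this example,
not part of the patch), an 8-CPU machine that keeps CPU 0 as the
timekeeper while CPUs 1-7 run tickless could boot with something like:

	full_nohz=1-7 rcu_nocbs=1-7

CPU 0 is left out of both masks so it keeps handling the jiffies/gtod
updates and the RCU callbacks offloaded from the tickless CPUs.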

Suggested-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 include/linux/tick.h |7 +++
 kernel/time/Kconfig  |9 +
 kernel/time/tick-sched.c |   23 +++
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 553272e..2d4f6f0 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -157,6 +157,13 @@ static inline u64 get_cpu_idle_time_us(int cpu, u64 
*unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
+#ifdef CONFIG_NO_HZ_FULL
+int tick_nohz_full_cpu(int cpu);
+#else
+static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+#endif
+
+
 # ifdef CONFIG_CPU_IDLE_GOV_MENU
 extern void menu_hrtimer_cancel(void);
 # else
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 8601f0d..dc6381d 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -70,6 +70,15 @@ config NO_HZ
  only trigger on an as-needed basis both when the system is
  busy and when the system is idle.
 
+config NO_HZ_FULL
+   bool "Full tickless system"
+   depends on NO_HZ && RCU_USER_QS && VIRT_CPU_ACCOUNTING_GEN && RCU_NOCB_CPU && SMP
+   select CONTEXT_TRACKING_FORCE
+   help
+ Try to be tickless everywhere, not just in idle. (You need
+to fill up the full_nohz_mask boot parameter).
+
+
 config HIGH_RES_TIMERS
bool "High Resolution Timer Support"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 314b9ee..494a2aa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -142,6 +142,29 @@ static void tick_sched_handle(struct tick_sched *ts, 
struct pt_regs *regs)
profile_tick(CPU_PROFILING);
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static cpumask_var_t full_nohz_mask;
+bool have_full_nohz_mask;
+
+int tick_nohz_full_cpu(int cpu)
+{
+   if (!have_full_nohz_mask)
+   return 0;
+
+   return cpumask_test_cpu(cpu, full_nohz_mask);
+}
+
+/* Parse the boot-time nohz CPU list from the kernel parameters. */
+static int __init tick_nohz_full_setup(char *str)
+{
+   alloc_bootmem_cpumask_var(&full_nohz_mask);
+   have_full_nohz_mask = true;
+   cpulist_parse(str, full_nohz_mask);
+   return 1;
+}
+__setup("full_nohz=", tick_nohz_full_setup);
+#endif
+
 /*
  * NOHZ - aka dynamic tick functionality
  */
-- 
1.7.5.4



[PATCH 04/33] cputime: Allow dynamic switch between tick/virtual based cputime accounting

2013-01-07 Thread Frederic Weisbecker
Allow dynamically switching between tick and virtual based cputime accounting.
This way we can provide a kind of "on-demand" virtual based cputime
accounting. In this mode, the kernel will rely on the user hooks
subsystem to dynamically hook on kernel boundaries.

This is in preparation for being able to stop the timer tick in
more places than just the idle state. Doing so will depend on
CONFIG_VIRT_CPU_ACCOUNTING which makes it possible to account the
cputime without the tick by hooking on kernel/user boundaries.

Depending on whether the tick is stopped or not, we can switch between
tick and vtime based accounting at any time in order to minimize the
overhead associated with user hooks.
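
Below is a minimal sketch, not the actual hunk (which is truncated in this
archive), of how the switch is meant to behave: the tick-based accounting
path steps aside whenever vtime accounting currently covers this CPU,
because the kernel/user boundary hooks account the elapsed time instead.

	void account_process_tick(struct task_struct *p, int user_tick)
	{
		/* The boundary hooks already accounted this time slice. */
		if (vtime_accounting_enabled())
			return;

		/* ... regular tick-based user/system/idle accounting ... */
	}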

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 include/linux/kernel_stat.h |2 +-
 include/linux/sched.h   |4 +-
 include/linux/vtime.h   |8 ++
 kernel/fork.c   |2 +-
 kernel/sched/cputime.c  |   58 +++---
 kernel/time/tick-sched.c|5 +++-
 6 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 66b7078..ed5f6ed 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -127,7 +127,7 @@ extern void account_system_time(struct task_struct *, int, 
cputime_t, cputime_t)
 extern void account_steal_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
 {
vtime_account_user(tsk);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 206bb08..66b2344 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -605,7 +605,7 @@ struct signal_struct {
cputime_t utime, stime, cutime, cstime;
cputime_t gtime;
cputime_t cgtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
struct cputime prev_cputime;
 #endif
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
@@ -1365,7 +1365,7 @@ struct task_struct {
 
cputime_t utime, stime, utimescaled, stimescaled;
cputime_t gtime;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
struct cputime prev_cputime;
 #endif
unsigned long nvcsw, nivcsw; /* context switch counts */
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 21ef703..5368af9 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -10,12 +10,20 @@ extern void vtime_account_system_irqsafe(struct task_struct 
*tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
 extern void vtime_account(struct task_struct *tsk);
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern bool vtime_accounting_enabled(void);
 #else
+static inline bool vtime_accounting_enabled(void) { return true; }
+#endif
+
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
 static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_account(struct task_struct *tsk) { }
+static inline bool vtime_accounting_enabled(void) { return false; }
 #endif
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
diff --git a/kernel/fork.c b/kernel/fork.c
index 65ca6d2..81b5209 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1230,7 +1230,7 @@ static struct task_struct *copy_process(unsigned long 
clone_flags,
 
p->utime = p->stime = p->gtime = 0;
p->utimescaled = p->stimescaled = 0;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
p->prev_cputime.utime = p->prev_cputime.stime = 0;
 #endif
 #if defined(SPLIT_RSS_COUNTING)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 3749a0e..3ea4233 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -317,8 +317,6 @@ out:
rcu_read_unlock();
 }
 
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
-
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 /*
  * Account a tick to a process and cpustat
@@ -388,6 +386,7 @@ static void irqtime_account_process_tick(struct task_struct 
*p, int user_tick,
struct rq *rq) {}
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
 
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 /*
  * Account a single tick of cpu time.
  * @p: the process that the cpu time gets accounted to
@@ -398,6 +397,11 @@ void account_process_tick(struct task_struct *p, int 

[PATCH 12/33] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz

2013-01-07 Thread Frederic Weisbecker
Just to avoid confusion.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 kernel/sched/core.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 63b25e2..bfac40f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1302,6 +1302,12 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int 
wake_flags)
if (p->sched_class->task_woken)
p->sched_class->task_woken(rq, p);
 
+   /*
+* For adaptive nohz case: We called ttwu_activate()
+* which just updated the rq clock. There is an
+* exception with p->on_rq != 0 but in this case
+* we are not idle and rq->idle_stamp == 0
+*/
if (rq->idle_stamp) {
u64 delta = rq->clock - rq->idle_stamp;
u64 max = 2*sysctl_sched_migration_cost;
-- 
1.7.5.4



[ANNOUNCE] 3.8-rc2-nohz2

2013-01-07 Thread Frederic Weisbecker

Hi,

Here is a new version of the full dynticks patchset based on 3.8-rc2.
It addresses most of the feedback I got on the previous release (see the
list of changes below).

Thank you for your reviews, they are really useful!

This version is pullable at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
3.8-rc2-nohz2

For the details on how to use it, check this link:
https://lwn.net/Articles/530345/ on the section "How to use".

Changes since 3.8-rc1-nohz1:

* Let the user choose between CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and
CONFIG_VIRT_CPU_ACCOUNTING_GEN if both are available (thanks Li Zhong).
[patch 03/33]

* Move the code that exports context tracking state to its own commit to
make the review easier (thanks Paul Gortmaker) [patch 02/33]

* Rename vtime_accounting() to vtime_accounting_enabled() (thanks
Paul Gortmaker) [patch 04/33]

* Fix vtime_enter_user / vtime_user_enter confusion. (thanks Li Zhong)
[patch 03/33]

* Fix grammar, spelling and foggy explanations. (thanks Paul Gortmaker)
[patch 04/33]

* Fix "hook" based naming (thanks Ingo Molnar) [patch 01/33]

* Fix is_nocb_cpu() orphan declaration (thanks Namhyung Kim) [patch 22/33]

* Add full dynticks runqueue clock debugging [patch 29-30/33]

* Fix missing rq clock update in update_cpu_load_nohz(), thanks to the
debugging code on the previous patch. [patch 32/33] That's not yet a full
solution for the nohz rt power scale though.

* Partly handle update_cpu_load_active() [patch 33/33] (we still have to handle
calc_load_account_active)


The TODO list has shrunk slightly and also grown slightly :)

- Handle calc_load_account_active().

- Handle sched_class->task_tick()

- Handle rt power scaling

- Make sure rcu_nocbs mask matches full_nohz's.

- Get the nohz printk patchset merged.

- Posix cpu timers enqueued while tick is off. Probably no big deal but
I need to look into that.

- Several trivial stuffs: perf_event_task_tick(), profile_tick(),
sched_clock_tick(), etc...

Enjoy!

---
Frederic Weisbecker (41):
  irq_work: Fix racy IRQ_WORK_BUSY flag setting
  irq_work: Fix racy check on work pending flag
  irq_work: Remove CONFIG_HAVE_IRQ_WORK
  nohz: Add API to check tick state
  irq_work: Don't stop the tick with pending works
  irq_work: Make self-IPIs optable
  printk: Wake up klogd using irq_work
  Merge branch 'nohz/printk-v8' into 3.8-rc2-nohz2-base
  context_tracking: Add comments on interface and internals
  context_tracking: Export context state for generic vtime
  cputime: Generic on-demand virtual cputime accounting
  cputime: Allow dynamic switch between tick/virtual based cputime 
accounting
  cputime: Use accessors to read task cputime stats
  cputime: Safely read cputime of full dynticks CPUs
  nohz: Basic full dynticks interface
  nohz: Assign timekeeping duty to a non-full-nohz CPU
  nohz: Trace timekeeping update
  nohz: Wake up full dynticks CPUs when a timer gets enqueued
  rcu: Restart the tick on non-responding full dynticks CPUs
  sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz
  sched: Update rq clock on nohz CPU before migrating tasks
  sched: Update rq clock on nohz CPU before setting fair group shares
  sched: Update rq clock on tickless CPUs before calling 
check_preempt_curr()
  sched: Update rq clock earlier in unthrottle_cfs_rq
  sched: Update clock of nohz busiest rq before balancing
  sched: Update rq clock before idle balancing
  sched: Update nohz rq clock before searching busiest group on load 
balancing
  nohz: Move nohz load balancer selection into idle logic
  nohz: Full dynticks mode
  nohz: Only stop the tick on RCU nocb CPUs
  nohz: Don't turn off the tick if rcu needs it
  nohz: Don't stop the tick if posix cpu timers are running
  nohz: Add some tracing
  rcu: Don't keep the tick for RCU while in userspace
  profiling: Remove unused timer hook
  timer: Don't run non-pinned timer to full dynticks CPUs
  sched: Use an accessor to read rq clock
  sched: Debug nohz rq clock
  sched: Remove broken check for skip clock update
  sched: Update rq clock before rt sched average scale
  sched: Disable lb_bias feature for full dynticks

Steven Rostedt (2):
  irq_work: Flush work on CPU_DYING
  irq_work: Warn if there's still work on cpu_down

 arch/alpha/Kconfig |1 -
 arch/alpha/kernel/osf_sys.c|6 +-
 arch/arm/Kconfig   |1 -
 arch/arm64/Kconfig |1 -
 arch/blackfin/Kconfig  |1 -
 arch/frv/Kconfig   |1 -
 arch/hexagon/Kconfig   |1 -
 arch/ia64/include/asm/cputime.h|6 +-
 arch/ia64/include/asm/thread_info.h|4 +-
 arch/ia64/include/asm/xen/minstate.h   |2 +-
 arch/ia64/kernel/asm-offsets.c |2 +-
 arch/ia64/kernel/entry.S  

[PATCH 13/33] sched: Update rq clock on nohz CPU before migrating tasks

2013-01-07 Thread Frederic Weisbecker
The sched_class::put_prev_task() callbacks of the rt and fair
classes refer to the rq clock to update their runtime
statistics. A CPU running in tickless mode may carry a stale value,
so we need to update the clock there.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 kernel/sched/core.c  |6 ++
 kernel/sched/sched.h |7 +++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bfac40f..2fcbb03 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4894,6 +4894,12 @@ static void migrate_tasks(unsigned int dead_cpu)
 */
rq->stop = NULL;
 
+   /*
+* ->put_prev_task() need to have an up-to-date value
+* of rq->clock[_task]
+*/
+   update_nohz_rq_clock(rq);
+
for ( ; ; ) {
/*
 * There's this thread running, bail when that's the only
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fc88644..f24d91e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include <linux/tick.h>
 
 #include "cpupri.h"
 
@@ -963,6 +964,12 @@ static inline void dec_nr_running(struct rq *rq)
 
 extern void update_rq_clock(struct rq *rq);
 
+static inline void update_nohz_rq_clock(struct rq *rq)
+{
+   if (tick_nohz_full_cpu(cpu_of(rq)))
+   update_rq_clock(rq);
+}
+
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
-- 
1.7.5.4



[PATCH v12 4/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
Subject: [PATCH v12 4/9] LSM: Multiple concurrent LSMs

Change the infrastructure for Linux Security Modules (LSM)s
from a single vector of hook handlers to a list based method
for handling multiple concurrent modules. 

Configuration changes.
Header files. Add securityfs files to report the
registered and present LSMs.
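
To make the per-LSM blob slots concrete, here is a hedged sketch of how a
module keeps its state in its own slot using the accessors added in
include/linux/lsm.h below. Everything named "foo" (foo_ops, struct
foo_cred, the label field) is invented for illustration and is not part of
this patch.

	static struct security_operations foo_ops;	/* .order assigned at registration */

	struct foo_cred {
		u32 label;			/* foo's per-credential state */
	};

	static void foo_cred_init(void)
	{
		struct cred *cred = (struct cred *) current->real_cred;
		struct foo_cred *fc = kzalloc(sizeof(*fc), GFP_KERNEL);

		/*
		 * lsm_set_init_cred() allocates the shared lsm_blob on first
		 * use and then fills only foo's slot in it.
		 */
		if (!fc || lsm_set_init_cred(cred, fc, &foo_ops))
			panic("foo: failed to set up the initial cred\n");
	}

	static u32 foo_current_label(void)
	{
		struct foo_cred *fc = lsm_get_cred(current_cred(), &foo_ops);

		return fc ? fc->label : 0;
	}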

Signed-off-by: Casey Schaufler 

---
 include/linux/lsm.h  |  174 +++
 include/linux/security.h |  255 +++---
 security/Kconfig |   79 +-
 security/inode.c |   79 +-
 4 files changed, 521 insertions(+), 66 deletions(-)

diff --git a/include/linux/lsm.h b/include/linux/lsm.h
new file mode 100644
index 000..5f36b6b
--- /dev/null
+++ b/include/linux/lsm.h
@@ -0,0 +1,174 @@
+/*
+ *
+ * Copyright (C) 2012 Casey Schaufler 
+ * Copyright (C) 2012 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, version 2.
+ *
+ * Author:
+ * Casey Schaufler 
+ *
+ */
+#ifndef _LINUX_LSM_H
+#define _LINUX_LSM_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Maximum number of LSMs that can be used at a time.
+ */
+#define COMPOSER_MAX   CONFIG_SECURITY_COMPOSER_MAX
+#define COMPOSER_NAMES_MAX ((SECURITY_NAME_MAX + 1) * COMPOSER_MAX)
+
+#include 
+
+/*
+ * Just a set of slots for each LSM to keep its blob in.
+ */
+struct lsm_blob {
+   int lsm_setcount;   /* Number of blobs set */
+   void*lsm_blobs[COMPOSER_MAX];   /* LSM specific blobs */
+};
+
+static inline struct lsm_blob *lsm_alloc_blob(gfp_t gfp)
+{
+   return kzalloc(sizeof(struct lsm_blob), gfp);
+}
+
+static inline void *lsm_get_blob(const struct lsm_blob *bp, const int lsm)
+{
+   if (bp == NULL)
+   return NULL;
+   return bp->lsm_blobs[lsm];
+}
+
+static inline void lsm_set_blob(void **vpp, void *value, const int lsm)
+{
+   struct lsm_blob *bp = *vpp;
+
+   if (value == NULL && bp->lsm_blobs[lsm] != NULL)
+   bp->lsm_setcount--;
+   if (value != NULL && bp->lsm_blobs[lsm] == NULL)
+   bp->lsm_setcount++;
+
+   bp->lsm_blobs[lsm] = value;
+}
+
+static inline void *lsm_get_cred(const struct cred *cred,
+   const struct security_operations *sop)
+{
+   return lsm_get_blob(cred->security, sop->order);
+}
+
+static inline void lsm_set_cred(struct cred *cred, void *value,
+   const struct security_operations *sop)
+{
+   lsm_set_blob(&cred->security, value, sop->order);
+}
+
+static inline int lsm_set_init_cred(struct cred *cred, void *value,
+   const struct security_operations *sop)
+{
+   if (cred->security == NULL) {
+   cred->security = lsm_alloc_blob(GFP_KERNEL);
+   if (cred->security == NULL)
+   return -ENOMEM;
+   }
+
+   lsm_set_blob(&cred->security, value, sop->order);
+   return 0;
+}
+
+static inline void *lsm_get_file(const struct file *file,
+   const struct security_operations *sop)
+{
+   return lsm_get_blob(file->f_security, sop->order);
+}
+
+static inline void lsm_set_file(struct file *file, void *value,
+   const struct security_operations *sop)
+{
+   lsm_set_blob(&file->f_security, value, sop->order);
+}
+
+static inline void *lsm_get_inode(const struct inode *inode,
+   const struct security_operations *sop)
+{
+   return lsm_get_blob(inode->i_security, sop->order);
+}
+
+static inline void lsm_set_inode(struct inode *inode, void *value,
+   const struct security_operations *sop)
+{
+   lsm_set_blob(&inode->i_security, value, sop->order);
+}
+
+static inline void *lsm_get_super(const struct super_block *super,
+   const struct security_operations *sop)
+{
+   return lsm_get_blob(super->s_security, sop->order);
+}
+
+static inline void lsm_set_super(struct super_block *super, void *value,
+   const struct security_operations *sop)
+{
+   lsm_set_blob(&super->s_security, value, sop->order);
+}
+
+static inline void *lsm_get_ipc(const struct kern_ipc_perm *ipc,
+   const struct security_operations *sop)
+{
+   return lsm_get_blob(ipc->security, sop->order);
+}
+
+static inline void lsm_set_ipc(struct kern_ipc_perm *ipc, void *value,
+   const struct security_operations *sop)
+{
+   lsm_set_blob(&ipc->security, value, sop->order);
+}
+
+static inline void *lsm_get_msg(const struct msg_msg *msg,
+   const struct security_operations *sop)
+{

[PATCH v12 6/9] LSM: Multiple concurrent LSMs

2013-01-07 Thread Casey Schaufler
Subject: [PATCH v12 6/9] LSM: Multiple concurrent LSMs

Change the infrastructure for Linux Security Modules (LSM)s
from a single vector of hook handlers to a list based method
for handling multiple concurrent modules. 

Changes for SELinux. Abstract access to security blobs.
Add the now required parameter to reset_security_ops().
Remove commoncap calls.

Signed-off-by: Casey Schaufler 

---
 security/selinux/hooks.c  |  410 ++---
 security/selinux/include/objsec.h |2 +
 security/selinux/include/xfrm.h   |2 +-
 security/selinux/netlabel.c   |   13 +-
 security/selinux/selinuxfs.c  |6 +-
 security/selinux/xfrm.c   |9 +-
 6 files changed, 222 insertions(+), 220 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 61a5336..8ec7ea0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -150,6 +150,7 @@ static int selinux_secmark_enabled(void)
  */
 static void cred_init_security(void)
 {
+   int rc;
struct cred *cred = (struct cred *) current->real_cred;
struct task_security_struct *tsec;
 
@@ -158,7 +159,9 @@ static void cred_init_security(void)
panic("SELinux:  Failed to initialize initial task.\n");
 
tsec->osid = tsec->sid = SECINITSID_KERNEL;
-   cred->security = tsec;
+   rc = lsm_set_init_cred(cred, tsec, &selinux_ops);
+   if (rc)
+   panic("SELinux:  Failed to initialize initial task.\n");
 }
 
 /*
@@ -168,7 +171,7 @@ static inline u32 cred_sid(const struct cred *cred)
 {
const struct task_security_struct *tsec;
 
-   tsec = cred->security;
+   tsec = lsm_get_cred(cred, &selinux_ops);
return tsec->sid;
 }
 
@@ -190,8 +193,9 @@ static inline u32 task_sid(const struct task_struct *task)
  */
 static inline u32 current_sid(void)
 {
-   const struct task_security_struct *tsec = current_security();
+   const struct task_security_struct *tsec;
 
+   tsec = lsm_get_cred(current_cred(), &selinux_ops);
return tsec->sid;
 }
 
@@ -212,22 +216,23 @@ static int inode_alloc_security(struct inode *inode)
isec->sid = SECINITSID_UNLABELED;
isec->sclass = SECCLASS_FILE;
isec->task_sid = sid;
-   inode->i_security = isec;
+   lsm_set_inode(inode, isec, &selinux_ops);
 
return 0;
 }
 
 static void inode_free_security(struct inode *inode)
 {
-   struct inode_security_struct *isec = inode->i_security;
-   struct superblock_security_struct *sbsec = inode->i_sb->s_security;
+   struct inode_security_struct *isec = lsm_get_inode(inode, &selinux_ops);
+   struct superblock_security_struct *sbsec =
+   lsm_get_super(inode->i_sb, &selinux_ops);
 
spin_lock(&sbsec->isec_lock);
if (!list_empty(&isec->list))
list_del_init(&isec->list);
spin_unlock(&sbsec->isec_lock);
 
-   inode->i_security = NULL;
+   lsm_set_inode(inode, NULL, &selinux_ops);
kmem_cache_free(sel_inode_cache, isec);
 }
 
@@ -242,15 +247,15 @@ static int file_alloc_security(struct file *file)
 
fsec->sid = sid;
fsec->fown_sid = sid;
-   file->f_security = fsec;
+   lsm_set_file(file, fsec, &selinux_ops);
 
return 0;
 }
 
 static void file_free_security(struct file *file)
 {
-   struct file_security_struct *fsec = file->f_security;
-   file->f_security = NULL;
+   struct file_security_struct *fsec = lsm_get_file(file, &selinux_ops);
+   lsm_set_file(file, NULL, &selinux_ops);
kfree(fsec);
 }
 
@@ -269,15 +274,16 @@ static int superblock_alloc_security(struct super_block 
*sb)
sbsec->sid = SECINITSID_UNLABELED;
sbsec->def_sid = SECINITSID_FILE;
sbsec->mntpoint_sid = SECINITSID_UNLABELED;
-   sb->s_security = sbsec;
+   lsm_set_super(sb, sbsec, &selinux_ops);
 
return 0;
 }
 
 static void superblock_free_security(struct super_block *sb)
 {
-   struct superblock_security_struct *sbsec = sb->s_security;
-   sb->s_security = NULL;
+   struct superblock_security_struct *sbsec =
+   lsm_get_super(sb, &selinux_ops);
+   lsm_set_super(sb, NULL, &selinux_ops);
kfree(sbsec);
 }
 
@@ -323,9 +329,10 @@ static int may_context_mount_sb_relabel(u32 sid,
struct superblock_security_struct *sbsec,
const struct cred *cred)
 {
-   const struct task_security_struct *tsec = cred->security;
+   const struct task_security_struct *tsec;
int rc;
 
+   tsec = lsm_get_cred(cred, &selinux_ops);
rc = avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
  FILESYSTEM__RELABELFROM, NULL);
if (rc)
@@ -340,8 +347,10 @@ static int may_context_mount_inode_relabel(u32 sid,
struct superblock_security_struct *sbsec,
const struct cred *cred)
 {
-   const struct task_security_struct *tsec = cred->security;
+   const struct task_security_struct *tsec;
  

[PATCH 16/33] sched: Update rq clock earlier in unthrottle_cfs_rq

2013-01-07 Thread Frederic Weisbecker
In this function we use rq->clock right before the rq clock is
updated, so call update_rq_clock() just before that use to avoid
reading a stale rq clock value.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Namhyung Kim 
Cc: Paul E. McKenney 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
---
 kernel/sched/fair.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a96f0f2..3d65ac7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2279,14 +2279,15 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
long task_delta;
 
se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
cfs_rq->throttled = 0;
+
+   update_rq_clock(rq);
+
raw_spin_lock(&cfs_b->lock);
cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);
 
-   update_rq_clock(rq);
/* update hierarchical throttle state */
walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-- 
1.7.5.4



  1   2   3   4   5   6   7   8   9   10   >