Re: [PATCH 2/6] ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver

2013-08-08 Thread Viresh Kumar
On 8 August 2013 19:52, Lucas Stach  wrote:
> You can certainly define the mapping table in DT, which a specialized
> Tegra cpufreq driver could read in and then use to map frequency to voltage.
> But that's a runtime decision, as Speedo and process ID are fuse values
> and cannot be represented in DT.

> The problem with this is that the hardware description now associates
> voltages with certain frequencies, and even if they are not used by the
> Linux driver they are plain wrong.

Hmm. I understand.
Then we probably need mach-tegra/opp.c to call opp_add() for all such
OPPs; neither DT nor the cpufreq driver is the right place for this.
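
A minimal sketch of what such board code might look like, assuming a
hypothetical fuse-selected table (none of these names come from the thread;
opp_add() here is the pre-3.13 name of dev_pm_opp_add()):

#include <linux/opp.h>
#include <linux/device.h>

struct tegra_cpu_opp {
	unsigned long freq_hz;
	unsigned long u_volt;	/* picked per speedo/process ID elsewhere */
};

/* Register runtime-determined OPPs for the CPU device from board code. */
static int __init tegra_register_cpu_opps(struct device *cpu_dev,
					  const struct tegra_cpu_opp *tbl,
					  int n)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = opp_add(cpu_dev, tbl[i].freq_hz, tbl[i].u_volt);
		if (ret)
			return ret;
	}

	return 0;
}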


Re: [PATCH] Revert "slub: do not put a slab to cpu partial list when cpu_partial is 0"

2013-08-08 Thread Pekka Enberg
On Thu, Aug 8, 2013 at 4:19 PM, Steven Rostedt  wrote:
> This reverts commit 318df36e57c0ca9f2146660d41ff28e8650af423

[snip]

> Signed-off-by: Steven Rostedt 

Acked-by: Pekka Enberg 

Linus, can you pick this up or do you want a pull request?

   Pekka


man-pages-3.53 is released

2013-08-08 Thread Michael Kerrisk (man-pages)
Gidday,

I've released man-pages-3.53 - man pages for Linux.

Tarball download:
http://www.kernel.org/doc/man-pages/download.html
Git repository:
https://git.kernel.org/cgit/docs/man-pages/man-pages.git/
Online changelog:
http://man7.org/linux/man-pages/changelog.html#release_3.53

A short summary of the release is blogged at:
http://linux-man-pages.blogspot.com/2013/08/man-pages-353-is-released.html

The current version of the pages is browsable at:
http://man7.org/linux/man-pages/

A few changes in this release that may be of interest to readers of
this list are given below.

Cheers,

Michael


 Changes in man-pages-3.53 

New and rewritten pages
-----------------------

restart_syscall.2
Michael Kerrisk
New page for restart_syscall(2) system call


Newly documented interfaces in existing pages
---------------------------------------------

fchownat.2
Michael Kerrisk
Document AT_EMPTY_PATH

fstatat.2
Michael Kerrisk
Document AT_EMPTY_PATH

linkat.2
Michael Kerrisk
Document AT_EMPTY_PATH

open.2
Michael Kerrisk [Al Viro]
Document O_PATH
See also https://bugzilla.redhat.com/show_bug.cgi?id=885740


Changes to individual pages
---------------------------

open.2
Michael Kerrisk [Geoffrey Thomas]
Remove warning that O_DIRECTORY is only for use with opendir(3)
O_DIRECTORY can also be used with, for example, O_PATH.

perf_event_open.2
Vince Weaver
Improve PERF_SAMPLE_BRANCH_STACK documentation
Vince Weaver
Fix indentation of the MMAP layout section
The indentation of the MMAP layout section wasn't quite right.
I think this improves things but I admit I'm not an expert at the
low-level indentation directives.
Vince Weaver
Update PERF_IOC_FLAG_GROUP info
It turns out PERF_IOC_FLAG_GROUP was broken from 75f937f24bd9
(in Linux 2.6.31, the initial perf_event release) until
724b6daa1 (Linux 3.4).

I've done some extensive kernel source code digging plus
running tests of various kernels and I hope the info
presented is accurate now.

(Patch edited somewhat by mtk.)
Vince Weaver
Improve sysfs files documentation
This improves the documentation of the various
perf_event_open()-related sysfs files.

ptrace.2
Denys Vlasenko  [Oleg Nesterov, Dmitry V. Levin]
If SEIZE was used, initial auto-attach stop is EVENT_STOP
For every PTRACE_O_TRACEfoo option, mention that old-style SIGSTOP
is replaced by PTRACE_EVENT_STOP if PTRACE_SEIZE attach was used.

Mention the same thing again in the description of
PTRACE_EVENT_STOP.
Denys Vlasenko  [Oleg Nesterov, Dmitry V. Levin]
Mention that PTRACE_PEEK* libc API and kernel API are different
Denys Vlasenko  [Oleg Nesterov, Dmitry V. Levin]
Clarify PTRACE_INTERRUPT, PTRACE_LISTEN, and group-stop behavior

readlink.2
Michael Kerrisk
Document use of empty 'pathname' argument

capabilities.7
Michael Kerrisk
Add open_by_handle_at(2) under CAP_DAC_READ_SEARCH

ld.so.8
Michael Kerrisk
Rework rpath token expansion text
Michael Kerrisk
Describe $PLATFORM rpath token
Michael Kerrisk
Describe $LIB rpath token
Michael Kerrisk
Document LD_BIND_NOT



Re: [PATCH cgroup/for-3.12] cgroup: make css_for_each_descendant() and friends include the origin css in the iteration

2013-08-08 Thread Michal Hocko
Ohh, this one totally slipped through the cracks.

On Sun 04-08-13 19:07:03, Tejun Heo wrote:
[...]
> From 0e84b0865ab8a87f1c1443e4777c20c7f14e13b6 Mon Sep 17 00:00:00 2001
> From: Tejun Heo 
> Date: Sun, 4 Aug 2013 19:01:23 -0400
> 
> Previously, all css descendant iterators didn't include the origin
> (root of subtree) css in the iteration.  The reasons were maintaining
> consistency with css_for_each_child() and that at the time of
> introduction more use cases needed skipping the origin anyway;
> however, given that css_is_descendant() considers self to be a
> descendant, omitting the origin css has become more confusing and
> looking at the accumulated use cases rather clearly indicates that
> including origin would result in simpler code overall.
> 
> While this is a change which can easily lead to subtle bugs, cgroup
> API including the iterators has recently gone through major
> restructuring and no out-of-tree changes will be applicable without
> adjustments making this a relatively acceptable opportunity for this
> type of change.
> 
> The conversions are mostly straight-forward.  If the iteration block
> had explicit origin handling before or after, it's moved inside the
> iteration.  If not, if (pos == origin) continue; is added.  Some
> conversions add extra reference get/put around origin handling by
> consolidating origin handling and the rest.  While the extra ref
> operations aren't strictly necessary, this shouldn't cause any
> noticeable difference.
> 
> Signed-off-by: Tejun Heo 
> Cc: Li Zefan 
> Cc: Jens Axboe 
> Cc: Vivek Goyal 
> Cc: Matt Helsley 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Balbir Singh 
> Cc: Aristeu Rozanski 

Yes, I like it. I found it strange to omit the root during the walk.
Acked-by: Michal Hocko 

> ---
>  block/blk-cgroup.c   |  8 ++--
>  block/blk-cgroup.h   |  4 +++-
>  block/blk-throttle.c |  3 ---
>  include/linux/cgroup.h   | 17 +
>  kernel/cgroup.c  | 29 +++--
>  kernel/cgroup_freezer.c  | 29 -
>  kernel/cpuset.c  | 42 ++
>  mm/memcontrol.c  |  9 +
>  security/device_cgroup.c |  2 +-
>  9 files changed, 69 insertions(+), 74 deletions(-)
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 54ad002..e90c7c1 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -615,12 +615,10 @@ u64 blkg_stat_recursive_sum(struct blkg_policy_data 
> *pd, int off)
>   struct blkcg_policy *pol = blkcg_policy[pd->plid];
>   struct blkcg_gq *pos_blkg;
>   struct cgroup_subsys_state *pos_css;
> - u64 sum;
> + u64 sum = 0;
>  
>   lockdep_assert_held(pd->blkg->q->queue_lock);
>  
> - sum = blkg_stat_read((void *)pd + off);
> -
>   rcu_read_lock();
>   blkg_for_each_descendant_pre(pos_blkg, pos_css, pd_to_blkg(pd)) {
>   struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol);
> @@ -650,13 +648,11 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct 
> blkg_policy_data *pd,
>   struct blkcg_policy *pol = blkcg_policy[pd->plid];
>   struct blkcg_gq *pos_blkg;
>   struct cgroup_subsys_state *pos_css;
> - struct blkg_rwstat sum;
> + struct blkg_rwstat sum = { };
>   int i;
>  
>   lockdep_assert_held(pd->blkg->q->queue_lock);
>  
> - sum = blkg_rwstat_read((void *)pd + off);
> -
>   rcu_read_lock();
>   blkg_for_each_descendant_pre(pos_blkg, pos_css, pd_to_blkg(pd)) {
>   struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol);
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 8555386..ae6969a 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -291,6 +291,7 @@ struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, 
> struct request_queue *q,
>   * read locked.  If called under either blkcg or queue lock, the iteration
>   * is guaranteed to include all and only online blkgs.  The caller may
>   * update @pos_css by calling css_rightmost_descendant() to skip subtree.
> + * @p_blkg is included in the iteration and the first node to be visited.
>   */
>  #define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg)
> \
>   css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css)   \
> @@ -304,7 +305,8 @@ struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, 
> struct request_queue *q,
>   * @p_blkg: target blkg to walk descendants of
>   *
>   * Similar to blkg_for_each_descendant_pre() but performs post-order
> - * traversal instead.  Synchronization rules are the same.
> + * traversal instead.  Synchronization rules are the same.  @p_blkg is
> + * included in the iteration and the last node to be visited.
>   */
>  #define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg)   
> \
>   css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css)  \
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 8cefa7f..8331aba 

Re: [PATCH 2/6] ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver

2013-08-08 Thread Lucas Stach
On Thursday, 08.08.2013, at 19:41 +0530, Viresh Kumar wrote:
> On 8 August 2013 19:28, Lucas Stach  wrote:
> > From what I learned those voltage levels are dependent on both the
> > Speedo and the process ID of the specific Tegra processor. So you really
> > get a two-dimensional mapping table instead of a single OPP.
> > Also you cannot scale the CPU voltage on its own, but have to make
> > sure the core voltage isn't too far away from it. The core voltage also
> > depends on the operating states of engines like GR2D or even the display.
> 
> So if they depend on a certain type of SoC, which they should, then we
> can get these initialized from that SoC's dts/dtsi file instead of a common
> file. That would resolve the issue you just reported.
> 
You can certainly define the mapping table in DT, which a specialized
Tegra cpufreq driver could read in and then use to map frequency to voltage.
But that's a runtime decision, as Speedo and process ID are fuse values
and cannot be represented in DT.
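
For illustration, a runtime lookup of that kind could be shaped like the
sketch below; all names, sizes and values are placeholders rather than the
real Tegra code:

#define NUM_SPEEDO_IDS	4
#define NUM_PROCESS_IDS	4

struct cpu_volt_entry {
	unsigned int freq_khz;
	/* one voltage (in microvolts) per (speedo, process) combination */
	unsigned int uv[NUM_SPEEDO_IDS][NUM_PROCESS_IDS];
};

static unsigned int lookup_cpu_uv(const struct cpu_volt_entry *tbl, int n,
				  unsigned int freq_khz,
				  int speedo_id, int process_id)
{
	int i;

	for (i = 0; i < n; i++)
		if (tbl[i].freq_khz == freq_khz)
			return tbl[i].uv[speedo_id][process_id];

	return 0;	/* unknown frequency */
}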

> Now, I haven't proposed in the patch that we will change these voltage
> levels at all. This is regulator-specific code and would come into play
> only when regulators are registered for the CPU; otherwise we will just
> play with the frequency.
> 
> Passing OPPs instead of just a list of frequencies is the generic way this
> is done nowadays.

The problem with this is that the hardware description now associates
voltages with certain frequencies, and even if they are not used by the
Linux driver they are plain wrong.

Regards,
Lucas
-- 
Pengutronix e.K.   | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |



Re: [PATCH 21/35] cpufreq: omap: use cpufreq_table_validate_and_show()

2013-08-08 Thread Santosh Shilimkar
On Thursday 08 August 2013 09:48 AM, Viresh Kumar wrote:
> Let's use cpufreq_table_validate_and_show() instead of calling
> cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().
> 
> Cc: Santosh Shilimkar 
> Signed-off-by: Viresh Kumar 
> ---
Acked-by: Santosh Shilimkar 


Re: [PATCH 17/35] cpufreq: kirkwood: use cpufreq_table_validate_and_show()

2013-08-08 Thread Andrew Lunn
On Thu, Aug 08, 2013 at 07:18:19PM +0530, Viresh Kumar wrote:
> Let's use cpufreq_table_validate_and_show() instead of calling
> cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().
> 
> Cc: Andrew Lunn 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/kirkwood-cpufreq.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/cpufreq/kirkwood-cpufreq.c 
> b/drivers/cpufreq/kirkwood-cpufreq.c
> index 45e4d7f..336f171 100644
> --- a/drivers/cpufreq/kirkwood-cpufreq.c
> +++ b/drivers/cpufreq/kirkwood-cpufreq.c
> @@ -125,19 +125,11 @@ static int kirkwood_cpufreq_target(struct 
> cpufreq_policy *policy,
>  /* Module init and exit code */
>  static int kirkwood_cpufreq_cpu_init(struct cpufreq_policy *policy)
>  {
> - int result;
> -
>   /* cpuinfo and default policy values */
>   policy->cpuinfo.transition_latency = 5000; /* 5uS */
>   policy->cur = kirkwood_cpufreq_get_cpu_frequency(0);
>  
> - result = cpufreq_frequency_table_cpuinfo(policy, kirkwood_freq_table);
> - if (result)
> - return result;
> -
> - cpufreq_frequency_table_get_attr(kirkwood_freq_table, policy->cpu);
> -
> - return 0;
> + return cpufreq_table_validate_and_show(policy, kirkwood_freq_table);
>  }
>  
>  static int kirkwood_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> -- 
> 1.7.12.rc2.18.g61b472e

Reviewed-by: Andrew Lunn 

Thanks
Andrew 


Re: [PATCH] ARM: at91/dt: split sama5d3 definition

2013-08-08 Thread Jean-Christophe PLAGNIOL-VILLARD
On 10:49 Wed 07 Aug , Boris BREZILLON wrote:
> This patch splits the sama5d3 SoCs definition:
> - a common base for all sama5d3 SoCs (sama5d3.dtsi)
> - several optional peripheral definitions which will be included by sama5d3
>   specific SoCs (sama5d3_'periph name'.dtsi)
> - sama5d3 specific SoC definitions (sama5d3x.dtsi)
> 
> This provides a better representation of the real hardware (drop unneeded
> DT nodes) and avoids peripheral ID conflicts (which is not the case for
> current sama5d3 SoCs, but could be if other SoCs of this family are
> released).

same comment as on the other patch

too many files for no real advantage, so no from me

Best Regards,
J.
> 
> Signed-off-by: Boris BREZILLON 
> ---
>  arch/arm/boot/dts/sama5d3.dtsi  |  203 
> ---
>  arch/arm/boot/dts/sama5d31.dtsi |   16 +++
>  arch/arm/boot/dts/sama5d31ek.dts|3 +-
>  arch/arm/boot/dts/sama5d33.dtsi |   14 +++
>  arch/arm/boot/dts/sama5d33ek.dts|1 +
>  arch/arm/boot/dts/sama5d34.dtsi |   16 +++
>  arch/arm/boot/dts/sama5d34ek.dts|1 +
>  arch/arm/boot/dts/sama5d35.dtsi |   18 
>  arch/arm/boot/dts/sama5d35ek.dts|1 +
>  arch/arm/boot/dts/sama5d3_can.dtsi  |   54 ++
>  arch/arm/boot/dts/sama5d3_emac.dtsi |   44 
>  arch/arm/boot/dts/sama5d3_gmac.dtsi |   77 +
>  arch/arm/boot/dts/sama5d3_lcd.dtsi  |   55 ++
>  arch/arm/boot/dts/sama5d3_mci2.dtsi |   47 
>  arch/arm/boot/dts/sama5d3_tcb1.dtsi |   27 +
>  arch/arm/boot/dts/sama5d3_uart.dtsi |   53 +
>  arch/arm/boot/dts/sama5d3xcm.dtsi   |1 -
>  17 files changed, 426 insertions(+), 205 deletions(-)
>  create mode 100644 arch/arm/boot/dts/sama5d31.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d33.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d34.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d35.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_can.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_emac.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_gmac.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_lcd.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_mci2.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_tcb1.dtsi
>  create mode 100644 arch/arm/boot/dts/sama5d3_uart.dtsi
> 
> diff --git a/arch/arm/boot/dts/sama5d3.dtsi b/arch/arm/boot/dts/sama5d3.dtsi
> index a1d5e25..b72f310 100644
> --- a/arch/arm/boot/dts/sama5d3.dtsi
> +++ b/arch/arm/boot/dts/sama5d3.dtsi
> @@ -31,7 +31,6 @@
>   gpio3 = 
>   gpio4 = 
>   tcb0 = 
> - tcb1 = 
>   i2c0 = 
>   i2c1 = 
>   i2c2 = 
> @@ -100,15 +99,6 @@
>   status = "disabled";
>   };
>  
> - can0: can@f000c000 {
> - compatible = "atmel,at91sam9x5-can";
> - reg = <0xf000c000 0x300>;
> - interrupts = <40 IRQ_TYPE_LEVEL_HIGH 3>;
> - pinctrl-names = "default";
> - pinctrl-0 = <_can0_rx_tx>;
> - status = "disabled";
> - };
> -
>   tcb0: timer@f001 {
>   compatible = "atmel,at91sam9x5-tcb";
>   reg = <0xf001 0x100>;
> @@ -161,15 +151,6 @@
>   status = "disabled";
>   };
>  
> - macb0: ethernet@f0028000 {
> - compatible = "cdns,pc302-gem", "cdns,gem";
> - reg = <0xf0028000 0x100>;
> - interrupts = <34 IRQ_TYPE_LEVEL_HIGH 3>;
> - pinctrl-names = "default";
> - pinctrl-0 = <_macb0_data_rgmii 
> _macb0_signal_rgmii>;
> - status = "disabled";
> - };
> -
>   isi: isi@f0034000 {
>   compatible = "atmel,at91sam9g45-isi";
>   reg = <0xf0034000 0x4000>;
> @@ -190,19 +171,6 @@
>   #size-cells = <0>;
>   };
>  
> - mmc2: mmc@f8004000 {
> - compatible = "atmel,hsmci";
> - reg = <0xf8004000 0x600>;
> - interrupts = <23 IRQ_TYPE_LEVEL_HIGH 0>;
> - dmas = < 2 AT91_DMA_CFG_PER_ID(1)>;
> - dma-names = "rxtx";
> - pinctrl-names = "default";
> - pinctrl-0 = <_mmc2_clk_cmd_dat0 
> _mmc2_dat1_3>;
> - status = "disabled";
> - #address-cells = <1>;
> - #size-cells = <0>;
> - };
> -
>   

Re: [PATCH 7/7] btrfs: cleanup: removed unused 'btrfs_get_inode_ref_index'

2013-08-08 Thread David Sterba
On Thu, Aug 08, 2013 at 12:43:23AM +0300, Sergei Trofimovich wrote:
> From: Sergei Trofimovich 
> 
> Found by uselex.rb:
> > btrfs_get_inode_ref_index: [R]: exported from: fs/btrfs/inode-item.o 
> > fs/btrfs/btrfs.o fs/btrfs/built-in.o
> 
> Signed-off-by: Sergei Trofimovich 

Safe to remove.

Reviewed-by: David Sterba 


[PATCH 05/35] cpufreq: acpi-cpufreq: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().
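
Presumably the new helper is little more than those two calls combined; a
minimal sketch of what it likely looks like (not the actual freq_table.c
code from this series):

int cpufreq_table_validate_and_show(struct cpufreq_policy *policy,
				    struct cpufreq_frequency_table *table)
{
	int ret = cpufreq_frequency_table_cpuinfo(policy, table);

	if (!ret)
		cpufreq_frequency_table_get_attr(table, policy->cpu);

	return ret;
}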

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/acpi-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 9b5d1b1..75e829d 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -837,7 +837,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
data->freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
perf->state = 0;
 
-   result = cpufreq_frequency_table_cpuinfo(policy, data->freq_table);
+   result = cpufreq_table_validate_and_show(policy, data->freq_table);
if (result)
goto err_freqfree;
 
@@ -868,8 +868,6 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
(u32) perf->states[i].power,
(u32) perf->states[i].transition_latency);
 
-   cpufreq_frequency_table_get_attr(data->freq_table, policy->cpu);
-
/*
 * the first call to ->target() should result in us actually
 * writing something to the appropriate registers.
-- 
1.7.12.rc2.18.g61b472e



Re: [PATCH part2 3/4] acpi: Remove "continue" in macro INVALID_TABLE().

2013-08-08 Thread Joe Perches
On Thu, 2013-08-08 at 20:18 +0800, Tang Chen wrote:
> Hi Joe,

Hello Tang.

> On 08/08/2013 01:27 PM, Joe Perches wrote:
> > On Thu, 2013-08-08 at 13:03 +0800, Tang Chen wrote:
> >
> >> Change it to the style like other macros:
> >>
> >>   #define INVALID_TABLE(x, path, name)\
> >>   do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } 
> >> while (0)
> >
> > Single statement macros do _not_ need to use
> > "do { foo(); } while (0)"
> > and should be written as
> > "foo()"
> 
> OK, will remove the do {} while (0).
> 
> But I think we'd better keep the macro, or rename it to something
> more meaningful. At least we can use it to avoid adding the "ACPI OVERRIDE:"
> prefix every time. Maybe this is why it is defined.

No, it's just silly.

If you really think that the #define is better, use
something like what HW_ERR does and embed the #define
in the pr_err() call.

#define ACPI_OVERRIDE   "ACPI OVERRIDE: "

pr_err(ACPI_OVERRIDE "Table smaller than ACPI header [%s%s]\n",
   cpio_path, file.name);

It's only used a few times by a single file so
I think it's unnecessary.
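
For reference, the two shapes being discussed, as a minimal sketch (the
wrapper function and its arguments are illustrative only):

#include <linux/printk.h>

/* Variant 1: keep the macro, but as a plain single statement. */
#define INVALID_TABLE(x, path, name) \
	pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name)

/* Variant 2: HW_ERR-style prefix define, embedded directly in pr_err(). */
#define ACPI_OVERRIDE	"ACPI OVERRIDE: "

static void report_bad_table(const char *cpio_path, const char *name)
{
	INVALID_TABLE("Table smaller than ACPI header", cpio_path, name);
	pr_err(ACPI_OVERRIDE "Table smaller than ACPI header [%s%s]\n",
	       cpio_path, name);
}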



[PATCH 02/35] cpufreq: pxa: call cpufreq_frequency_table_get_attr()

2013-08-08 Thread Viresh Kumar
This exposes the driver's frequency table to the cpufreq core, which needs it
to work out the index for a target frequency when it calls
cpufreq_frequency_table_target(). So this driver has to expose its table.
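
A rough sketch of the core-side lookup this enables, assuming the table
registered via cpufreq_frequency_table_get_attr() is fetched back with
cpufreq_frequency_get_table(); this is illustrative, not the actual cpufreq
core code:

#include <linux/cpufreq.h>

static int core_pick_index(struct cpufreq_policy *policy,
			   unsigned int target_freq, unsigned int relation)
{
	struct cpufreq_frequency_table *table;
	unsigned int index;
	int ret;

	table = cpufreq_frequency_get_table(policy->cpu);
	if (!table)
		return -ENODEV;

	ret = cpufreq_frequency_table_target(policy, table, target_freq,
					     relation, &index);
	return ret ? ret : index;
}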

Cc: Eric Miao 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/pxa2xx-cpufreq.c | 6 +-
 drivers/cpufreq/pxa3xx-cpufreq.c | 8 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/pxa2xx-cpufreq.c b/drivers/cpufreq/pxa2xx-cpufreq.c
index fb3981a..c7ea005 100644
--- a/drivers/cpufreq/pxa2xx-cpufreq.c
+++ b/drivers/cpufreq/pxa2xx-cpufreq.c
@@ -453,10 +453,14 @@ static int pxa_cpufreq_init(struct cpufreq_policy *policy)
find_freq_tables(&pxa255_freq_table, &pxa255_freqs);
pr_info("PXA255 cpufreq using %s frequency table\n",
pxa255_turbo_table ? "turbo" : "run");
+
cpufreq_frequency_table_cpuinfo(policy, pxa255_freq_table);
+   cpufreq_frequency_table_get_attr(pxa255_freq_table, 
policy->cpu);
}
-   else if (cpu_is_pxa27x())
+   else if (cpu_is_pxa27x()) {
cpufreq_frequency_table_cpuinfo(policy, pxa27x_freq_table);
+   cpufreq_frequency_table_get_attr(pxa27x_freq_table, 
policy->cpu);
+   }
 
printk(KERN_INFO "PXA CPU frequency change support initialized\n");
 
diff --git a/drivers/cpufreq/pxa3xx-cpufreq.c b/drivers/cpufreq/pxa3xx-cpufreq.c
index 9c92ef0..f53f28d6 100644
--- a/drivers/cpufreq/pxa3xx-cpufreq.c
+++ b/drivers/cpufreq/pxa3xx-cpufreq.c
@@ -91,7 +91,7 @@ static int setup_freqs_table(struct cpufreq_policy *policy,
 struct pxa3xx_freq_info *freqs, int num)
 {
struct cpufreq_frequency_table *table;
-   int i;
+   int i, ret;
 
table = kzalloc((num + 1) * sizeof(*table), GFP_KERNEL);
if (table == NULL)
@@ -108,7 +108,11 @@ static int setup_freqs_table(struct cpufreq_policy *policy,
pxa3xx_freqs_num = num;
pxa3xx_freqs_table = table;
 
-   return cpufreq_frequency_table_cpuinfo(policy, table);
+   ret = cpufreq_frequency_table_cpuinfo(policy, table);
+   if (!ret)
+   cpufreq_frequency_table_get_attr(table, policy->cpu);
+
+   return ret;
 }
 
 static void __update_core_freq(struct pxa3xx_freq_info *info)
-- 
1.7.12.rc2.18.g61b472e



Re: [PATCH 2/6] ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver

2013-08-08 Thread Viresh Kumar
On 8 August 2013 19:28, Lucas Stach  wrote:
> From what I learned those voltage levels are dependent on both the
> Speedo and the process ID of the specific Tegra processor. So you really
> get a two-dimensional mapping table instead of a single OPP.
> Also you cannot scale the CPU voltage on its own, but have to make
> sure the core voltage isn't too far away from it. The core voltage also
> depends on the operating states of engines like GR2D or even the display.

So if they depend on a certain type of SoC, which they should, then we
can get these initialized from that SoC's dts/dtsi file instead of a common
file. That would resolve the issue you just reported.

Now, I haven't proposed in the patch that we will change these voltage
levels at all. This is regulator-specific code and would come into play
only when regulators are registered for the CPU; otherwise we will just
play with the frequency.

Passing OPPs instead of just a list of frequencies is the generic way this
is done nowadays.


[PATCH 18/27] sched: Favour moving tasks towards the preferred node

2013-08-08 Thread Mel Gorman
This patch favours moving tasks towards NUMA node that recorded a higher
number of NUMA faults during active load balancing.  Ideally this is
self-reinforcing as the longer the task runs on that node, the more faults
it should incur causing task_numa_placement to keep the task running on that
node. In reality a big weakness is that the nodes CPUs can be overloaded
and it would be more efficient to queue tasks on an idle node and migrate
to the new node. This would require additional smarts in the balancer so
for now the balancer will simply prefer to place the task on the preferred
node for a PTE scans which is controlled by the numa_balancing_settle_count
sysctl. Once the settle_count number of scans has complete the schedule
is free to place the task on an alternative node if the load is imbalanced.
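
Roughly, the settle logic amounts to something like the following sketch
(task_faults_on() is a hypothetical stand-in for the per-node fault
counters; the real change is in the diff below):

#include <linux/sched.h>
#include <linux/topology.h>

/* Hypothetical stand-in for the per-node fault counters. */
static unsigned long task_faults_on(struct task_struct *p, int nid);

static bool resist_migration(struct task_struct *p, int dst_nid)
{
	int src_nid = cpu_to_node(task_cpu(p));

	/* Settled: normal load balancing applies. */
	if (p->numa_migrate_seq >= sysctl_numa_balancing_settle_count)
		return false;

	/* Still settling: keep the task where it faults the most. */
	return task_faults_on(p, src_nid) > task_faults_on(p, dst_nid);
}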

[sri...@linux.vnet.ibm.com: Fixed statistics]
[pet...@infradead.org: Tunable and use higher faults instead of preferred]
Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 Documentation/sysctl/kernel.txt |  8 +-
 include/linux/sched.h   |  1 +
 kernel/sched/core.c |  3 +-
 kernel/sched/fair.c | 62 ++---
 kernel/sched/features.h |  7 +
 kernel/sysctl.c |  7 +
 6 files changed, 82 insertions(+), 6 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ad8d4f5..23ff00a 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -374,7 +374,8 @@ feature should be disabled. Otherwise, if the system 
overhead from the
 feature is too high then the rate the kernel samples for NUMA hinting
 faults may be controlled by the numa_balancing_scan_period_min_ms,
 numa_balancing_scan_delay_ms, numa_balancing_scan_period_reset,
-numa_balancing_scan_period_max_ms and numa_balancing_scan_size_mb sysctls.
+numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb and
+numa_balancing_settle_count sysctls.
 
 ==
 
@@ -419,6 +420,11 @@ scanned for a given scan.
 numa_balancing_scan_period_reset is a blunt instrument that controls how
 often a tasks scan delay is reset to detect sudden changes in task behaviour.
 
+numa_balancing_settle_count is how many scan periods must complete before
+the schedule balancer stops pushing the task towards a preferred node. This
+gives the scheduler a chance to place the task on an alternative node if the
+preferred node is overloaded.
+
 ==
 
 osrelease, ostype & version:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d017be9..e7f3f87 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -776,6 +776,7 @@ enum cpu_idle_type {
 #define SD_ASYM_PACKING0x0800  /* Place busy groups earlier in 
the domain */
 #define SD_PREFER_SIBLING  0x1000  /* Prefer to place tasks in a sibling 
domain */
 #define SD_OVERLAP 0x2000  /* sched_domains of this level overlap 
*/
+#define SD_NUMA0x4000  /* cross-node balancing */
 
 extern int __weak arch_sd_sibiling_asym_packing(void);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aad32ff..4bd88bf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1632,7 +1632,7 @@ static void __sched_fork(struct task_struct *p)
 
p->node_stamp = 0ULL;
p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0;
-   p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
+   p->numa_migrate_seq = 0;
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_preferred_nid = -1;
p->numa_work.next = &p->numa_work;
@@ -5613,6 +5613,7 @@ sd_numa_init(struct sched_domain_topology_level *tl, int 
cpu)
| 0*SD_SHARE_PKG_RESOURCES
| 1*SD_SERIALIZE
| 0*SD_PREFER_SIBLING
+   | 1*SD_NUMA
| sd_local_flags(level)
,
.last_balance   = jiffies,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ca8901..dad3ae9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -877,6 +877,15 @@ static unsigned int task_scan_max(struct task_struct *p)
return max(smin, smax);
 }
 
+/*
+ * Once a preferred node is selected the scheduler balancer will prefer moving
+ * a task to that node for sysctl_numa_balancing_settle_count number of PTE
+ * scans. This will give the process the chance to accumulate more faults on
+ * the preferred node but still allow the scheduler to move the task again if
+ * the nodes CPUs are overloaded.
+ */
+unsigned int sysctl_numa_balancing_settle_count __read_mostly = 3;
+
 static void task_numa_placement(struct task_struct *p)
 {
int 

[PATCH 0/27] Basic scheduler support for automatic NUMA balancing V6

2013-08-08 Thread Mel Gorman
This is another revision of the scheduler patches for NUMA balancing. Peter
and Rik, note that the grouping patches, the cpunid conversion and the
task swapping patches are missing as I ran into trouble while testing
them. They are rebased and available in the linux-balancenuma.git tree to
save you the bother of having to rebase them yourselves.

Changelog since V6
o Various TLB flush optimisations
o Comment updates
o Sanitise task_numa_fault callsites for consistent semantics
o Revert some of the scanning adaption stuff
o Revert patch that defers scanning until task schedules on another node
o Start delayed scanning properly
o Avoid the same task always performing the PTE scan
o Continue PTE scanning even if migration is rate limited

Changelog since V5
o Add __GFP_NOWARN for numa hinting fault count
o Use is_huge_zero_page
o Favour moving tasks towards nodes with higher faults
o Optionally resist moving tasks towards nodes with lower faults
o Scan shared THP pages

Changelog since V4
o Added code that avoids overloading preferred nodes
o Swap tasks if nodes are overloaded and the swap does not impair locality

Changelog since V3
o Correct detection of unset last nid/pid information
o Dropped nr_preferred_running and replaced it with Peter's load balancing
o Pass in correct node information for THP hinting faults
o Pressure tasks sharing a THP page to move towards same node
o Do not set pmd_numa if false sharing is detected

Changelog since V2
o Reshuffle to match Peter's implied preference for layout
o Reshuffle to move private/shared split towards end of series to make it
  easier to evaluate the impact
o Use PID information to identify private accesses
o Set the floor for PTE scanning based on virtual address space scan rates
  instead of time
o Some locking improvements
o Do not preempt pinned tasks unless they are kernel threads

Changelog since V1
o Scan pages with elevated map count (shared pages)
o Scale scan rates based on the vsz of the process so the sampling of the
  task is independent of its size
o Favour moving towards nodes with more faults even if it's not the
  preferred node
o Laughably basic accounting of a compute overloaded node when selecting
  the preferred node.
o Applied review comments

This series integrates basic scheduler support for automatic NUMA balancing.
It borrows very heavily from Peter Zijlstra's work in "sched, numa, mm:
Add adaptive NUMA affinity support" but deviates too much to preserve
Signed-off-bys. As before, if the relevant authors are ok with it I'll
add Signed-off-bys (or add them yourselves if you pick the patches up).

This is still far from complete and there are known performance gaps
between this series and manual binding (when that is possible). As before,
the intention is not to complete the work but to incrementally improve
mainline and preserve bisectability for any bug reports that crop up. In
some cases performance may be worse unfortunately and when that happens
it will have to be judged if the system overhead is lower and if so,
is it still an acceptable direction as a stepping stone to something better.

Patches 1-2 adds sysctl documentation and comment fixlets

Patch 3 corrects a THP NUMA hint fault accounting bug

Patch 4 avoids trying to migrate the THP zero page

Patch 5 sanitizes task_numa_fault callsites to have consistent semantics and
always record the fault based on the correct location of the page.

Patch 6 avoids the same task being selected to perform the PTE scan within
a shared address space.

Patch 7 continues PTE scanning even if migration rate limited

Patch 8 notes that delaying the PTE scan until a task is scheduled on an
alternative node misses the case where the task is only accessing
shared memory on a partially loaded machine and reverts a patch.

Patch 9 initialises numa_next_scan properly so that PTE scanning is delayed
when a process starts.

Patch 10 slows the scanning rate if the task is idle

Patch 11 sets the scan rate proportional to the size of the task being
scanned.

Patch 12 is a minor adjustment to scan rate

Patches 13-14 avoids TLB flushes during the PTE scan if no updates are made

Patch 15 tracks NUMA hinting faults per-task and per-node

Patches 16-20 select a preferred node at the end of a PTE scan based on which
node incurred the highest number of NUMA faults. When the balancer
is comparing two CPUs it will prefer to locate tasks on their
preferred node. When the preferred node is initially selected, the task is
rescheduled on it if it is not running on that node already. This
avoids waiting for the scheduler to slowly move the task.

Patch 21 adds infrastructure to allow separate tracking of shared/private
pages but treats all faults as if they are private accesses. Laying
it out this way reduces churn later in the series when private
fault detection is introduced

Patch 22 avoids some unnecessary 

[PATCH 09/27] sched: numa: Initialise numa_next_scan properly

2013-08-08 Thread Mel Gorman
Scan delay logic and resets are currently initialised to start scanning
immediately instead of delaying properly. Initialise them properly at
fork time and catch when a new mm has been allocated.

Signed-off-by: Mel Gorman 
---
 kernel/sched/core.c | 4 ++--
 kernel/sched/fair.c | 7 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7c32cb..e148975 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1625,8 +1625,8 @@ static void __sched_fork(struct task_struct *p)
 
 #ifdef CONFIG_NUMA_BALANCING
if (p->mm && atomic_read(&p->mm->mm_users) == 1) {
-   p->mm->numa_next_scan = jiffies;
-   p->mm->numa_next_reset = jiffies;
+   p->mm->numa_next_scan = jiffies + 
msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
+   p->mm->numa_next_reset = jiffies + 
msecs_to_jiffies(sysctl_numa_balancing_scan_period_reset);
p->mm->numa_scan_seq = 0;
}
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bb5d978..2f16703 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -900,6 +900,13 @@ void task_numa_work(struct callback_head *work)
if (p->flags & PF_EXITING)
return;
 
+   if (!mm->numa_next_reset || !mm->numa_next_scan) {
+   mm->numa_next_scan = now +
+   msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
+   mm->numa_next_reset = now +
+   
msecs_to_jiffies(sysctl_numa_balancing_scan_period_reset);
+   }
+
/*
 * Reset the scan period if enough time has gone by. Objective is that
 * scanning will be reduced if pages are properly placed. As tasks
-- 
1.8.1.4



[PATCH 05/27] mm, numa: Sanitize task_numa_fault() callsites

2013-08-08 Thread Mel Gorman
From: Peter Zijlstra 

There are three callers of task_numa_fault():

 - do_huge_pmd_numa_page():
 Accounts against the current node, not the node where the
 page resides, unless we migrated, in which case it accounts
 against the node we migrated to.

 - do_numa_page():
 Accounts against the current node, not the node where the
 page resides, unless we migrated, in which case it accounts
 against the node we migrated to.

 - do_pmd_numa_page():
 Accounts not at all when the page isn't migrated, otherwise
 accounts against the node we migrated towards.

This seems wrong to me; all three sites should have the same
semantics. Furthermore, we should account against where the page
really is; we already know where the task is.

So modify all three sites to always account; we did after all receive
the fault; and always account to where the page is after migration,
regardless of success.

They all still differ on when they clear the PTE/PMD; ideally that
would get sorted too.
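
Paraphrased, the shape all three callsites converge on is roughly the
following (try_migrate_to_node() is a stand-in name; see the diff below for
the real code):

#include <linux/mm.h>
#include <linux/mempolicy.h>
#include <linux/sched.h>

/* Stand-in name for the respective migrate helper at each callsite. */
static bool try_migrate_to_node(struct page *page, int nid);

static void account_fault(struct page *page, struct vm_area_struct *vma,
			  unsigned long addr, int nr_pages)
{
	int page_nid = page_to_nid(page);
	int target_nid = mpol_misplaced(page, vma, addr);
	bool migrated = false;

	if (target_nid != -1) {
		migrated = try_migrate_to_node(page, target_nid);
		if (migrated)
			page_nid = target_nid;	/* where the page ended up */
	}

	/* Always account; we did, after all, receive the fault. */
	if (page_nid != -1)
		task_numa_fault(page_nid, nr_pages, migrated);
}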

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 mm/huge_memory.c | 21 -
 mm/memory.c  | 53 +
 2 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4ebe3aa..bf59194 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1292,9 +1292,9 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
 {
struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
+   int page_nid = -1, this_nid = numa_node_id();
int target_nid;
-   int src_nid = -1;
-   bool migrated;
+   bool migrated = false;
 
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
@@ -1311,9 +1311,9 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
goto clear_pmdnuma;
 
get_page(page);
-   src_nid = numa_node_id();
+   page_nid = page_to_nid(page);
count_vm_numa_event(NUMA_HINT_FAULTS);
-   if (src_nid == page_to_nid(page))
+   if (page_nid == this_nid)
count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
 
target_nid = mpol_misplaced(page, vma, haddr);
@@ -1338,11 +1338,12 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
/* Migrate the THP to the requested node */
migrated = migrate_misplaced_transhuge_page(mm, vma,
pmdp, pmd, addr, page, target_nid);
-   if (!migrated)
+   if (migrated)
+   page_nid = target_nid;
+   else
goto check_same;
 
-   task_numa_fault(target_nid, HPAGE_PMD_NR, true);
-   return 0;
+   goto out;
 
 check_same:
spin_lock(&mm->page_table_lock);
@@ -1355,8 +1356,10 @@ clear_pmdnuma:
update_mmu_cache_pmd(vma, addr, pmdp);
 out_unlock:
spin_unlock(&mm->page_table_lock);
-   if (src_nid != -1)
-   task_numa_fault(src_nid, HPAGE_PMD_NR, false);
+
+out:
+   if (page_nid != -1)
+   task_numa_fault(page_nid, HPAGE_PMD_NR, migrated);
return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 871b881..b0307f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,12 +3517,12 @@ static int do_nonlinear_fault(struct mm_struct *mm, 
struct vm_area_struct *vma,
 }
 
 int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
-   unsigned long addr, int current_nid)
+   unsigned long addr, int page_nid)
 {
get_page(page);
 
count_vm_numa_event(NUMA_HINT_FAULTS);
-   if (current_nid == numa_node_id())
+   if (page_nid == numa_node_id())
count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
 
return mpol_misplaced(page, vma, addr);
@@ -3533,7 +3533,7 @@ int do_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
 {
struct page *page = NULL;
spinlock_t *ptl;
-   int current_nid = -1;
+   int page_nid = -1;
int target_nid;
bool migrated = false;
 
@@ -3568,15 +3568,10 @@ int do_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
return 0;
}
 
-   current_nid = page_to_nid(page);
-   target_nid = numa_migrate_prep(page, vma, addr, current_nid);
+   page_nid = page_to_nid(page);
+   target_nid = numa_migrate_prep(page, vma, addr, page_nid);
pte_unmap_unlock(ptep, ptl);
if (target_nid == -1) {
-   /*
-* Account for the fault against the current node if it not
-* being replaced regardless of where the page is located.
-*/
-   current_nid = numa_node_id();
put_page(page);
goto out;
}
@@ -3584,11 +3579,11 @@ int do_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
/* Migrate to the requested node 

[PATCH 02/27] sched, numa: Comment fixlets

2013-08-08 Thread Mel Gorman
From: Peter Zijlstra 

Fix an 80-column violation and a PTE vs PMD reference.

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 8 
 mm/huge_memory.c| 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bb456f4..679cfcf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -988,10 +988,10 @@ void task_numa_work(struct callback_head *work)
 
 out:
/*
-* It is possible to reach the end of the VMA list but the last few 
VMAs are
-* not guaranteed to the vma_migratable. If they are not, we would find 
the
-* !migratable VMA on the next scan but not reset the scanner to the 
start
-* so check it now.
+* It is possible to reach the end of the VMA list but the last few
+* VMAs are not guaranteed to the vma_migratable. If they are not, we
+* would find the !migratable VMA on the next scan but not reset the
+* scanner to the start so check it now.
 */
if (vma)
mm->numa_scan_offset = start;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 243e710..45ef9dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1317,7 +1317,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
spin_unlock(&mm->page_table_lock);
lock_page(page);
 
-   /* Confirm the PTE did not while locked */
+   /* Confirm the PMD did not change while page_table_lock was released */
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
-- 
1.8.1.4



[PATCH 04/27] mm: numa: Do not migrate or account for hinting faults on the zero page

2013-08-08 Thread Mel Gorman
The zero page is not replicated between nodes and is often shared
between processes. The data is read-only and likely to be cached in
local CPUs if heavily accessed, meaning that the remote memory access
cost is less of a concern. This patch stops accounting for NUMA hinting
faults on the zero page, both in terms of counting faults and in
scheduling tasks on nodes.

[pet...@infradead.org: Correct use of is_huge_zero_page]
Signed-off-by: Mel Gorman 
---
 mm/huge_memory.c | 9 +
 mm/memory.c  | 7 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e52c131..4ebe3aa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1301,6 +1301,15 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
goto out_unlock;
 
page = pmd_page(pmd);
+
+   /*
+* Do not account for faults against the huge zero page. The read-only
+* data is likely to be read-cached on the local CPUs and it is less
+* useful to know about local versus remote hits on the zero page.
+*/
+   if (is_huge_zero_page(page))
+   goto clear_pmdnuma;
+
get_page(page);
src_nid = numa_node_id();
count_vm_numa_event(NUMA_HINT_FAULTS);
diff --git a/mm/memory.c b/mm/memory.c
index 1ce2e2a..871b881 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3557,8 +3557,13 @@ int do_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
update_mmu_cache(vma, addr, ptep);
 
+   /*
+* Do not account for faults against the zero page. The read-only data
+* is likely to be read-cached on the local CPUs and it is less useful
+* to know about local versus remote hits on the zero page.
+*/
page = vm_normal_page(vma, addr, pte);
-   if (!page) {
+   if (!page || is_zero_pfn(page_to_pfn(page))) {
pte_unmap_unlock(ptep, ptl);
return 0;
}
-- 
1.8.1.4



[PATCH 11/27] sched: Set the scan rate proportional to the memory usage of the task being scanned

2013-08-08 Thread Mel Gorman
The NUMA PTE scan rate is controlled with a combination of the
numa_balancing_scan_period_min, numa_balancing_scan_period_max and
numa_balancing_scan_size sysctls. This scan rate is independent of the size
of the task, and as an aside it is further complicated by the fact that
numa_balancing_scan_size controls how many pages are marked pte_numa and
not how much virtual memory is scanned.

In combination, it is almost impossible to meaningfully tune the min and
max scan periods, and reasoning about performance is complex when the time
to complete a full scan is partially a function of the task's memory
size. This patch alters the semantics of the min and max tunables so that they
tune the length of time it takes to complete a scan of a task's occupied
virtual address space. Conceptually this is a lot easier to understand. There
is a "sanity" check to ensure the scan rate is never extremely fast, based on
the amount of virtual memory that should be scanned in a second. The default
of 2.5G seems arbitrary but it is chosen so that the maximum scan rate after
the patch roughly matches the maximum scan rate before the patch was applied.
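
As a back-of-envelope illustration of the new semantics (the numbers are
examples and the real helpers are in the diff below):

/* rss_mb is the task's resident set size in megabytes. */
static unsigned int example_scan_period_ms(unsigned long rss_mb)
{
	unsigned int scan_size_mb = 256;	/* numa_balancing_scan_size */
	unsigned int full_scan_ms = 1000;	/* scan_period_min: whole task */
	unsigned int max_mb_per_sec = 2560;	/* the 2.5G sanity limit */
	unsigned int windows = (rss_mb + scan_size_mb - 1) / scan_size_mb;
	unsigned int floor_ms = 1000 / (max_mb_per_sec / scan_size_mb);

	if (!windows)
		windows = 1;

	/* e.g. rss_mb = 1024 -> 4 windows -> one 256MB window every 250ms */
	if (full_scan_ms / windows > floor_ms)
		return full_scan_ms / windows;

	return floor_ms;
}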

Signed-off-by: Mel Gorman 
---
 Documentation/sysctl/kernel.txt | 11 +++---
 include/linux/sched.h   |  1 +
 kernel/sched/fair.c | 84 -
 3 files changed, 81 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ccadb52..ad8d4f5 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -402,15 +402,16 @@ workload pattern changes and minimises performance impact 
due to remote
 memory accesses. These sysctls control the thresholds for scan delays and
 the number of pages scanned.
 
-numa_balancing_scan_period_min_ms is the minimum delay in milliseconds
-between scans. It effectively controls the maximum scanning rate for
-each task.
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
 
 numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
 when it initially forks.
 
-numa_balancing_scan_period_max_ms is the maximum delay between scans. It
-effectively controls the minimum scanning rate for each task.
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
 
 numa_balancing_scan_size_mb is how many megabytes worth of pages are
 scanned for a given scan.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 50d04b9..59c473b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1331,6 +1331,7 @@ struct task_struct {
int numa_scan_seq;
int numa_migrate_seq;
unsigned int numa_scan_period;
+   unsigned int numa_scan_period_max;
u64 node_stamp; /* migration stamp  */
struct callback_head numa_work;
 #endif /* CONFIG_NUMA_BALANCING */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 056ddc9..2908b4e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -818,10 +818,12 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
 
 #ifdef CONFIG_NUMA_BALANCING
 /*
- * numa task sample period in ms
+ * Approximate time to scan a full NUMA task in ms. The task scan period is
+ * calculated based on the tasks virtual memory size and
+ * numa_balancing_scan_size.
  */
-unsigned int sysctl_numa_balancing_scan_period_min = 100;
-unsigned int sysctl_numa_balancing_scan_period_max = 100*50;
+unsigned int sysctl_numa_balancing_scan_period_min = 1000;
+unsigned int sysctl_numa_balancing_scan_period_max = 60;
 unsigned int sysctl_numa_balancing_scan_period_reset = 100*600;
 
 /* Portion of address space to scan in MB */
@@ -830,6 +832,51 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
 /* Scan @scan_size MB every @scan_period after an initial @scan_delay in ms */
 unsigned int sysctl_numa_balancing_scan_delay = 1000;
 
+static unsigned int task_nr_scan_windows(struct task_struct *p)
+{
+   unsigned long rss = 0;
+   unsigned long nr_scan_pages;
+
+   /*
+* Calculations based on RSS as non-present and empty pages are skipped
+* by the PTE scanner and NUMA hinting faults should be trapped based
+* on resident pages
+*/
+   nr_scan_pages = sysctl_numa_balancing_scan_size << (20 - PAGE_SHIFT);
+   rss = get_mm_rss(p->mm);
+   if (!rss)
+   rss = nr_scan_pages;
+
+   rss = round_up(rss, nr_scan_pages);
+   return rss / nr_scan_pages;
+}
+
+/* For sanitys sake, never scan more PTEs than MAX_SCAN_WINDOW MB/sec. */
+#define MAX_SCAN_WINDOW 2560
+
+static unsigned int task_scan_min(struct task_struct *p)
+{
+   unsigned int scan, floor;
+   unsigned int windows = 1;
+
+   if (sysctl_numa_balancing_scan_size < 

[PATCH 10/27] sched: numa: Slow scan rate if no NUMA hinting faults are being recorded

2013-08-08 Thread Mel Gorman
NUMA PTE scanning slows if a NUMA hinting fault was trapped and no page
was migrated. For long-lived but idle processes there may be no faults
but the scan rate will be high and just waste CPU. This patch will slow
the scan rate for processes that are not trapping faults.

Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f16703..056ddc9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -975,6 +975,18 @@ void task_numa_work(struct callback_head *work)
 
 out:
/*
+* If the whole process was scanned without updates then no NUMA
+* hinting faults are being recorded and scan rate should be lower.
+*/
+   if (mm->numa_scan_offset == 0 && !nr_pte_updates) {
+   p->numa_scan_period = min(p->numa_scan_period_max,
+   p->numa_scan_period << 1);
+
+   next_scan = now + msecs_to_jiffies(p->numa_scan_period);
+   mm->numa_next_scan = next_scan;
+   }
+
+   /*
 * It is possible to reach the end of the VMA list but the last few
 * VMAs are not guaranteed to the vma_migratable. If they are not, we
 * would find the !migratable VMA on the next scan but not reset the
-- 
1.8.1.4



[PATCH 06/27] sched, numa: Mitigate chance that same task always updates PTEs

2013-08-08 Thread Mel Gorman
From: Peter Zijlstra 

With a trace_printk("working\n"); right after the cmpxchg in
task_numa_work() we can see that, in a 4-thread process, it's always the
same task winning the race and doing the protection change.

This is a problem since the task doing the protection change has a
penalty for taking faults -- it is busy marking the PTEs. If it's
always the same task, the ->numa_faults[] statistics get severely skewed.

Avoid this by delaying the task doing the protection change such that
it is unlikely to win the privilege again.

Before:

root@interlagos:~# grep "thread 0/.*working" /debug/tracing/trace | tail -15
  thread 0/0-3232  [022]    212.787402: task_numa_work: working
  thread 0/0-3232  [022]    212.888473: task_numa_work: working
  thread 0/0-3232  [022]    212.989538: task_numa_work: working
  thread 0/0-3232  [022]    213.090602: task_numa_work: working
  thread 0/0-3232  [022]    213.191667: task_numa_work: working
  thread 0/0-3232  [022]    213.292734: task_numa_work: working
  thread 0/0-3232  [022]    213.393804: task_numa_work: working
  thread 0/0-3232  [022]    213.494869: task_numa_work: working
  thread 0/0-3232  [022]    213.596937: task_numa_work: working
  thread 0/0-3232  [022]    213.699000: task_numa_work: working
  thread 0/0-3232  [022]    213.801067: task_numa_work: working
  thread 0/0-3232  [022]    213.903155: task_numa_work: working
  thread 0/0-3232  [022]    214.005201: task_numa_work: working
  thread 0/0-3232  [022]    214.107266: task_numa_work: working
  thread 0/0-3232  [022]    214.209342: task_numa_work: working

After:

root@interlagos:~# grep "thread 0/.*working" /debug/tracing/trace | tail -15
  thread 0/0-3253  [005]    136.865051: task_numa_work: working
  thread 0/2-3255  [026]    136.965134: task_numa_work: working
  thread 0/3-3256  [024]    137.065217: task_numa_work: working
  thread 0/3-3256  [024]    137.165302: task_numa_work: working
  thread 0/3-3256  [024]    137.265382: task_numa_work: working
  thread 0/0-3253  [004]    137.366465: task_numa_work: working
  thread 0/2-3255  [026]    137.466549: task_numa_work: working
  thread 0/0-3253  [004]    137.566629: task_numa_work: working
  thread 0/0-3253  [004]    137.666711: task_numa_work: working
  thread 0/1-3254  [028]    137.766799: task_numa_work: working
  thread 0/0-3253  [004]    137.866876: task_numa_work: working
  thread 0/2-3255  [026]    137.966960: task_numa_work: working
  thread 0/1-3254  [028]    138.067041: task_numa_work: working
  thread 0/2-3255  [026]    138.167123: task_numa_work: working
  thread 0/3-3256  [024]    138.267207: task_numa_work: working

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 679cfcf..2a08727 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -946,6 +946,12 @@ void task_numa_work(struct callback_head *work)
return;
 
/*
+* Delay this task enough that another task of this mm will likely win
+* the next time around.
+*/
+   p->node_stamp += 2 * TICK_NSEC;
+
+   /*
 * Do not set pte_numa if the current running node is rate-limited.
 * This loses statistics on the fault but if we are unwilling to
 * migrate to this node, it is less likely we can do useful work
@@ -1026,7 +1032,7 @@ void task_tick_numa(struct rq *rq, struct task_struct 
*curr)
if (now - curr->node_stamp > period) {
if (!curr->node_stamp)
curr->numa_scan_period = 
sysctl_numa_balancing_scan_period_min;
-   curr->node_stamp = now;
+   curr->node_stamp += period;
 
if (!time_before(jiffies, curr->mm->numa_next_scan)) {
init_task_work(work, task_numa_work); /* TODO: move 
this into sched_fork() */
-- 
1.8.1.4



[PATCH 13/27] mm: Only flush TLBs if a transhuge PMD is modified for NUMA pte scanning

2013-08-08 Thread Mel Gorman
NUMA PTE scanning is expensive both in terms of the scanning itself and
the TLB flush if there are any updates. The TLB flush is avoided if no
PTEs are updated but there is a bug where transhuge PMDs are considered
to be updated even if they were already pmd_numa. This patch addresses
the problem and TLB flushes should be reduced.

Signed-off-by: Mel Gorman 
---
 mm/huge_memory.c | 20 +---
 mm/mprotect.c| 14 ++
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bf59194..e6beb0f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1456,6 +1456,12 @@ out:
return ret;
 }
 
+/*
+ * Returns
+ *  - 0 if PMD could not be locked
+ *  - 1 if PMD was locked but protections unchange and TLB flush unnecessary
+ *  - HPAGE_PMD_NR is protections changed and TLB flush necessary
+ */
 int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, pgprot_t newprot, int prot_numa)
 {
@@ -1464,22 +1470,30 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
 
if (__pmd_trans_huge_lock(pmd, vma) == 1) {
pmd_t entry;
-   entry = pmdp_get_and_clear(mm, addr, pmd);
+   ret = 1;
if (!prot_numa) {
+   entry = pmdp_get_and_clear(mm, addr, pmd);
entry = pmd_modify(entry, newprot);
+   ret = HPAGE_PMD_NR;
BUG_ON(pmd_write(entry));
} else {
struct page *page = pmd_page(*pmd);
+   ret = 1;
 
/* only check non-shared pages */
if (page_mapcount(page) == 1 &&
!pmd_numa(*pmd)) {
+   entry = pmdp_get_and_clear(mm, addr, pmd);
entry = pmd_mknuma(entry);
+   ret = HPAGE_PMD_NR;
}
}
-   set_pmd_at(mm, addr, pmd, entry);
+
+   /* Set PMD if cleared earlier */
+   if (ret == HPAGE_PMD_NR)
+   set_pmd_at(mm, addr, pmd, entry);
+
spin_unlock(&vma->vm_mm->page_table_lock);
-   ret = 1;
}
 
return ret;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94722a4..1f4ab1c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -143,10 +143,16 @@ static inline unsigned long change_pmd_range(struct 
vm_area_struct *vma,
if (pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
split_huge_page_pmd(vma, addr, pmd);
-   else if (change_huge_pmd(vma, pmd, addr, newprot,
-prot_numa)) {
-   pages += HPAGE_PMD_NR;
-   continue;
+   else {
+   int nr_ptes = change_huge_pmd(vma, pmd, addr,
+   newprot, prot_numa);
+
+   if (nr_ptes) {
+   if (nr_ptes == HPAGE_PMD_NR)
+   pages += HPAGE_PMD_NR;
+
+   continue;
+   }
}
/* fall through */
}
-- 
1.8.1.4



[PATCH 03/27] mm: numa: Account for THP numa hinting faults on the correct node

2013-08-08 Thread Mel Gorman
THP NUMA hinting faults on pages that are not migrated are being accounted
for incorrectly. Currently the fault is counted as if the task were running
on a node local to the page, which is not necessarily true.
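
To illustrate the change, a simplified before/after assuming a task running
on node 1 that takes a hinting fault on a THP currently resident on node 0
(names match the hunk below):

	int page_nid = page_to_nid(page);	/* 0: where the page lives      */
	int src_nid  = numa_node_id();		/* 1: where the task is running */

	/* before: the fault was charged to page_nid, so it always looked local */
	/* after:  charge the node the task actually ran on at fault time       */
	task_numa_fault(src_nid, HPAGE_PMD_NR, false);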

Signed-off-by: Mel Gorman 
---
 mm/huge_memory.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 45ef9dc..e52c131 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1293,7 +1293,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
-   int current_nid = -1;
+   int src_nid = -1;
bool migrated;
 
spin_lock(>page_table_lock);
@@ -1302,9 +1302,9 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
 
page = pmd_page(pmd);
get_page(page);
-   current_nid = page_to_nid(page);
+   src_nid = numa_node_id();
count_vm_numa_event(NUMA_HINT_FAULTS);
-   if (current_nid == numa_node_id())
+   if (src_nid == page_to_nid(page))
count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
 
target_nid = mpol_misplaced(page, vma, haddr);
@@ -1346,8 +1346,8 @@ clear_pmdnuma:
update_mmu_cache_pmd(vma, addr, pmdp);
 out_unlock:
spin_unlock(>page_table_lock);
-   if (current_nid != -1)
-   task_numa_fault(current_nid, HPAGE_PMD_NR, false);
+   if (src_nid != -1)
+   task_numa_fault(src_nid, HPAGE_PMD_NR, false);
return 0;
 }
 
-- 
1.8.1.4



[PATCH 14/27] mm: Do not flush TLB during protection change if !pte_present && !migration_entry

2013-08-08 Thread Mel Gorman
NUMA PTE scanning is expensive both in terms of the scanning itself and
the TLB flush if there are any updates. Currently non-present PTEs are
accounted for as an update, incurring a TLB flush, even though a flush is
only necessary for anonymous migration entries. This patch addresses the
problem and should reduce TLB flushes.
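
A sketch of the intended accounting in change_pte_range() (approximate
context, not a verbatim hunk): only a writable migration entry is actually
rewritten, so only that case should bump the counter that later triggers a
TLB flush.

	if (is_write_migration_entry(entry)) {
		make_migration_entry_read(&entry);
		set_pte_at(mm, addr, pte, swp_entry_to_pte(entry));
		pages++;	/* a PTE really changed */
	}
	/* other non-present PTEs: nothing written, nothing counted, no flush */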

Signed-off-by: Mel Gorman 
---
 mm/mprotect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1f4ab1c..8e7e9bd 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -101,8 +101,9 @@ static unsigned long change_pte_range(struct vm_area_struct 
*vma, pmd_t *pmd,
make_migration_entry_read(&entry);
set_pte_at(mm, addr, pte,
swp_entry_to_pte(entry));
+
+   pages++;
}
-   pages++;
}
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
-- 
1.8.1.4



[PATCH 07/27] sched, numa: Continue PTE scanning even if migrate rate limited

2013-08-08 Thread Mel Gorman
From: Peter Zijlstra 

Avoiding marking PTEs pte_numa because a particular NUMA node is migrate
rate limited seems like a bad idea. Even if this node can't migrate any
more, other nodes might, and we want up-to-date information to make
balancing decisions. We already rate limit the actual migrations, so this
should leave enough bandwidth to allow the non-migrating scanning. I think
it's important we keep up-to-date information if we're going to do placement
based on it.

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a08727..cc2ee69 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -951,14 +951,6 @@ void task_numa_work(struct callback_head *work)
 */
p->node_stamp += 2 * TICK_NSEC;
 
-   /*
-* Do not set pte_numa if the current running node is rate-limited.
-* This loses statistics on the fault but if we are unwilling to
-* migrate to this node, it is less likely we can do useful work
-*/
-   if (migrate_ratelimited(numa_node_id()))
-   return;
-
start = mm->numa_scan_offset;
pages = sysctl_numa_balancing_scan_size;
pages <<= 20 - PAGE_SHIFT; /* MB in pages */
-- 
1.8.1.4



[PATCH 22/27] sched: Check current->mm before allocating NUMA faults

2013-08-08 Thread Mel Gorman
task_numa_placement checks current->mm, but only after buffers for faults
have already been uselessly allocated. Move the check earlier.

[pet...@infradead.org: Identified the problem]
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e2ff8c..f07073b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -930,8 +930,6 @@ static void task_numa_placement(struct task_struct *p)
int seq, nid, max_nid = -1;
unsigned long max_faults = 0;
 
-   if (!p->mm) /* for example, ksmd faulting in a user's mm */
-   return;
seq = ACCESS_ONCE(p->mm->numa_scan_seq);
if (p->numa_scan_seq == seq)
return;
@@ -998,6 +996,10 @@ void task_numa_fault(int last_nid, int node, int pages, 
bool migrated)
if (!sched_feat_numa(NUMA))
return;
 
+   /* for example, ksmd faulting in a user's mm */
+   if (!p->mm)
+   return;
+
/* For now, do not attempt to detect private/shared accesses */
priv = 1;
 
-- 
1.8.1.4



[PATCH 19/27] sched: Resist moving tasks towards nodes with fewer hinting faults

2013-08-08 Thread Mel Gorman
Just as "sched: Favour moving tasks towards the preferred node" favours
moving tasks towards nodes with a higher number of recorded NUMA hinting
faults, this patch resists moving tasks towards nodes with lower faults.

[mgor...@suse.de: changelog]
Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 33 +
 kernel/sched/features.h |  8 
 2 files changed, 41 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dad3ae9..f828803 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4057,12 +4057,43 @@ static bool migrate_improves_locality(struct 
task_struct *p, struct lb_env *env)
 
return false;
 }
+
+
+static bool migrate_degrades_locality(struct task_struct *p, struct lb_env 
*env)
+{
+   int src_nid, dst_nid;
+
+   if (!sched_feat(NUMA) || !sched_feat(NUMA_RESIST_LOWER))
+   return false;
+
+   if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
+   return false;
+
+   src_nid = cpu_to_node(env->src_cpu);
+   dst_nid = cpu_to_node(env->dst_cpu);
+
+   if (src_nid == dst_nid ||
+   p->numa_migrate_seq >= sysctl_numa_balancing_settle_count)
+   return false;
+
+   if (p->numa_faults[dst_nid] < p->numa_faults[src_nid])
+   return true;
+ 
+   return false;
+}
+
 #else
 static inline bool migrate_improves_locality(struct task_struct *p,
 struct lb_env *env)
 {
return false;
 }
+
+static inline bool migrate_degrades_locality(struct task_struct *p,
+struct lb_env *env)
+{
+   return false;
+}
 #endif
 
 /*
@@ -4125,6 +4156,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env 
*env)
 * 3) too many balance attempts have failed.
 */
tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
+   if (!tsk_cache_hot)
+   tsk_cache_hot = migrate_degrades_locality(p, env);
 
if (migrate_improves_locality(p, env)) {
 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index d9278ce..5716929 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -74,4 +74,12 @@ SCHED_FEAT(NUMA, false)
  * balancing.
  */
 SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
+
+/*
+ * NUMA_RESIST_LOWER will resist moving tasks towards nodes where a
+ * lower number of hinting faults have been recorded. As this has
+ * the potential to prevent a task ever migrating to a new node
+ * due to CPU overload it is disabled by default.
+ */
+SCHED_FEAT(NUMA_RESIST_LOWER, false)
 #endif
-- 
1.8.1.4



[PATCH 23/27] mm: numa: Scan pages with elevated page_mapcount

2013-08-08 Thread Mel Gorman
Currently automatic NUMA balancing is unable to distinguish between false
sharing and private pages except by ignoring pages with an elevated
page_mapcount entirely. This avoids shared pages bouncing between the
nodes whose tasks are using them, but it ignores quite a lot of data.

This patch kicks away the training wheels in preparation for adding support
for identifying shared/private pages. The ordering is so that the impact of
the shared/private detection can be easily measured. Note that the patch
does not migrate shared, file-backed pages within VMAs marked VM_EXEC as
these are generally shared library pages. Migrating such pages is not
beneficial as there is an expectation they are read-shared between caches,
and iTLB and iCache pressure is generally low.
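
Because the migrate.c hunk is truncated below, here is a hedged sketch of
the kind of check the changelog describes in migrate_misplaced_page(); the
exact predicate is an assumption rather than a quote of the patch:

	/*
	 * Don't migrate file pages that are mapped in multiple processes
	 * with execute permissions as they are probably shared libraries.
	 */
	if (page_mapcount(page) != 1 && page_is_file_cache(page) &&
	    (vma->vm_flags & VM_EXEC))
		goto out;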

Signed-off-by: Mel Gorman 
---
 include/linux/migrate.h |  7 ---
 mm/huge_memory.c|  5 +
 mm/memory.c |  7 ++-
 mm/migrate.c| 17 ++---
 mm/mprotect.c   |  4 +---
 5 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index a405d3dc..e7e26af 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -92,11 +92,12 @@ static inline int migrate_huge_page_move_mapping(struct 
address_space *mapping,
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_NUMA_BALANCING
-extern int migrate_misplaced_page(struct page *page, int node);
-extern int migrate_misplaced_page(struct page *page, int node);
+extern int migrate_misplaced_page(struct page *page,
+ struct vm_area_struct *vma, int node);
 extern bool migrate_ratelimited(int node);
 #else
-static inline int migrate_misplaced_page(struct page *page, int node)
+static inline int migrate_misplaced_page(struct page *page,
+struct vm_area_struct *vma, int node)
 {
return -EAGAIN; /* can't migrate now */
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 52c4706..a6153eb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1478,12 +1478,9 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
ret = HPAGE_PMD_NR;
BUG_ON(pmd_write(entry));
} else {
-   struct page *page = pmd_page(*pmd);
ret = 1;
 
-   /* only check non-shared pages */
-   if (page_mapcount(page) == 1 &&
-   !pmd_numa(*pmd)) {
+   if (!pmd_numa(*pmd)) {
entry = pmdp_get_and_clear(mm, addr, pmd);
entry = pmd_mknuma(entry);
ret = HPAGE_PMD_NR;
diff --git a/mm/memory.c b/mm/memory.c
index 7170707..0e7010c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3579,7 +3579,7 @@ int do_numa_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
}
 
/* Migrate to the requested node */
-   migrated = migrate_misplaced_page(page, target_nid);
+   migrated = migrate_misplaced_page(page, vma, target_nid);
if (migrated)
page_nid = target_nid;
 
@@ -3644,16 +3644,13 @@ static int do_pmd_numa_page(struct mm_struct *mm, 
struct vm_area_struct *vma,
page = vm_normal_page(vma, addr, pteval);
if (unlikely(!page))
continue;
-   /* only check non-shared pages */
-   if (unlikely(page_mapcount(page) != 1))
-   continue;
 
last_nid = page_nid_last(page);
page_nid = page_to_nid(page);
target_nid = numa_migrate_prep(page, vma, addr, page_nid);
pte_unmap_unlock(pte, ptl);
if (target_nid != -1) {
-   migrated = migrate_misplaced_page(page, target_nid);
+   migrated = migrate_misplaced_page(page, vma, 
target_nid);
if (migrated)
page_nid = target_nid;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f0c244..08ac3ba 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1596,7 +1596,8 @@ int numamigrate_isolate_page(pg_data_t *pgdat, struct 
page *page)
  * node. Caller is expected to have an elevated reference count on
  * the page that will be dropped by this function before returning.
  */
-int migrate_misplaced_page(struct page *page, int node)
+int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
+  int node)
 {
pg_data_t *pgdat = NODE_DATA(node);
int isolated;
@@ -1604,10 +1605,11 @@ int migrate_misplaced_page(struct page *page, int node)
LIST_HEAD(migratepages);
 
/*
-* Don't migrate pages that are mapped in multiple processes.
-* TODO: Handle false sharing detection instead of this hammer
+* Don't migrate file 

[PATCH 20/27] sched: Reschedule task on preferred NUMA node once selected

2013-08-08 Thread Mel Gorman
A preferred node is selected based on the node on which the most NUMA
hinting faults were incurred. There is no guarantee that the task is running
on that node at the time, so this patch reschedules the task to run on the
most idle CPU of the selected node. This avoids waiting for the load
balancer to make a decision.

Signed-off-by: Mel Gorman 
---
 kernel/sched/core.c  | 19 +++
 kernel/sched/fair.c  | 46 +-
 kernel/sched/sched.h |  1 +
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4bd88bf..2269f5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4318,6 +4318,25 @@ fail:
return ret;
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+/* Migrate current task p to target_cpu */
+int migrate_task_to(struct task_struct *p, int target_cpu)
+{
+   struct migration_arg arg = { p, target_cpu };
+   int curr_cpu = task_cpu(p);
+
+   if (curr_cpu == target_cpu)
+   return 0;
+
+   if (!cpumask_test_cpu(target_cpu, tsk_cpus_allowed(p)))
+   return -EINVAL;
+
+   /* TODO: This is not properly updating schedstats */
+
+   return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
+}
+#endif
+
 /*
  * migration_cpu_stop - this will be executed by a highprio stopper thread
  * and performs thread migration by bumping thread off CPU then
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f828803..dd2c0f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -886,6 +886,31 @@ static unsigned int task_scan_max(struct task_struct *p)
  */
 unsigned int sysctl_numa_balancing_settle_count __read_mostly = 3;
 
+static unsigned long weighted_cpuload(const int cpu);
+
+
+static int
+find_idlest_cpu_node(int this_cpu, int nid)
+{
+   unsigned long load, min_load = ULONG_MAX;
+   int i, idlest_cpu = this_cpu;
+
+   BUG_ON(cpu_to_node(this_cpu) == nid);
+
+   rcu_read_lock();
+   for_each_cpu(i, cpumask_of_node(nid)) {
+   load = weighted_cpuload(i);
+
+   if (load < min_load) {
+   min_load = load;
+   idlest_cpu = i;
+   }
+   }
+   rcu_read_unlock();
+
+   return idlest_cpu;
+}
+
 static void task_numa_placement(struct task_struct *p)
 {
int seq, nid, max_nid = -1;
@@ -916,10 +941,29 @@ static void task_numa_placement(struct task_struct *p)
}
}
 
-   /* Update the tasks preferred node if necessary */
+   /*
+* Record the preferred node as the node with the most faults,
+* requeue the task to be running on the idlest CPU on the
+* preferred node and reset the scanning rate to recheck
+* the working set placement.
+*/
if (max_faults && max_nid != p->numa_preferred_nid) {
+   int preferred_cpu;
+
+   /*
+* If the task is not on the preferred node then find the most
+* idle CPU to migrate to.
+*/
+   preferred_cpu = task_cpu(p);
+   if (cpu_to_node(preferred_cpu) != max_nid) {
+   preferred_cpu = find_idlest_cpu_node(preferred_cpu,
+max_nid);
+   }
+
+   /* Update the preferred nid and migrate task if possible */
p->numa_preferred_nid = max_nid;
p->numa_migrate_seq = 0;
+   migrate_task_to(p, preferred_cpu);
}
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c2f1c86..29d9b2c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -555,6 +555,7 @@ static inline u64 rq_clock_task(struct rq *rq)
 }
 
 #ifdef CONFIG_NUMA_BALANCING
+extern int migrate_task_to(struct task_struct *p, int cpu);
 static inline void task_numa_free(struct task_struct *p)
 {
kfree(p->numa_faults);
-- 
1.8.1.4



[PATCH 21/27] sched: Add infrastructure for split shared/private accounting of NUMA hinting faults

2013-08-08 Thread Mel Gorman
Ideally it would be possible to distinguish between NUMA hinting faults
that are private to a task and those that are shared.  This patch prepares
infrastructure for separately accounting shared and private faults by
allocating the necessary buffers and passing in relevant information. For
now, all faults are treated as private and detection will be introduced
later.
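
The array layout implied by task_faults_idx() is easy to miss, so here is a
small stand-alone illustration (ordinary user-space C, not kernel code) of
the indexing and of why the allocation doubles in size:

	#include <stdio.h>

	/* mirrors task_faults_idx(): two slots (shared=0, private=1) per node */
	static int task_faults_idx(int nid, int priv) { return 2 * nid + priv; }

	int main(void)
	{
		int nr_node_ids = 4;	/* example machine */

		/* numa_faults needs 2 * nr_node_ids counters; numa_faults_buffer
		 * shares the allocation, hence kzalloc(size * 2) in the patch */
		printf("counters per array: %d\n", 2 * nr_node_ids);
		printf("node 2, private -> index %d\n", task_faults_idx(2, 1));
		return 0;
	}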

Signed-off-by: Mel Gorman 
---
 include/linux/sched.h |  5 +++--
 kernel/sched/fair.c   | 46 +++---
 mm/huge_memory.c  |  5 +++--
 mm/memory.c   |  8 ++--
 4 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7f3f87..bbafa60 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1430,10 +1430,11 @@ struct task_struct {
 #define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
 
 #ifdef CONFIG_NUMA_BALANCING
-extern void task_numa_fault(int node, int pages, bool migrated);
+extern void task_numa_fault(int last_node, int node, int pages, bool migrated);
 extern void set_numabalancing_state(bool enabled);
 #else
-static inline void task_numa_fault(int node, int pages, bool migrated)
+static inline void task_numa_fault(int last_node, int node, int pages,
+  bool migrated)
 {
 }
 static inline void set_numabalancing_state(bool enabled)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd2c0f3..3e2ff8c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -886,6 +886,20 @@ static unsigned int task_scan_max(struct task_struct *p)
  */
 unsigned int sysctl_numa_balancing_settle_count __read_mostly = 3;
 
+static inline int task_faults_idx(int nid, int priv)
+{
+   return 2 * nid + priv;
+}
+
+static inline unsigned long task_faults(struct task_struct *p, int nid)
+{
+   if (!p->numa_faults)
+   return 0;
+
+   return p->numa_faults[task_faults_idx(nid, 0)] +
+   p->numa_faults[task_faults_idx(nid, 1)];
+}
+
 static unsigned long weighted_cpuload(const int cpu);
 
 
@@ -928,13 +942,19 @@ static void task_numa_placement(struct task_struct *p)
/* Find the node with the highest number of faults */
for (nid = 0; nid < nr_node_ids; nid++) {
unsigned long faults;
+   int priv, i;
 
-   /* Decay existing window and copy faults since last scan */
-   p->numa_faults[nid] >>= 1;
-   p->numa_faults[nid] += p->numa_faults_buffer[nid];
-   p->numa_faults_buffer[nid] = 0;
+   for (priv = 0; priv < 2; priv++) {
+   i = task_faults_idx(nid, priv);
 
-   faults = p->numa_faults[nid];
+   /* Decay existing window, copy faults since last scan */
+   p->numa_faults[i] >>= 1;
+   p->numa_faults[i] += p->numa_faults_buffer[i];
+   p->numa_faults_buffer[i] = 0;
+   }
+
+   /* Find maximum private faults */
+   faults = p->numa_faults[task_faults_idx(nid, 1)];
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -970,16 +990,20 @@ static void task_numa_placement(struct task_struct *p)
 /*
  * Got a PROT_NONE fault for a page on @node.
  */
-void task_numa_fault(int node, int pages, bool migrated)
+void task_numa_fault(int last_nid, int node, int pages, bool migrated)
 {
struct task_struct *p = current;
+   int priv;
 
if (!sched_feat_numa(NUMA))
return;
 
+   /* For now, do not attempt to detect private/shared accesses */
+   priv = 1;
+
/* Allocate buffer to track faults on a per-node basis */
if (unlikely(!p->numa_faults)) {
-   int size = sizeof(*p->numa_faults) * nr_node_ids;
+   int size = sizeof(*p->numa_faults) * 2 * nr_node_ids;
 
/* numa_faults and numa_faults_buffer share the allocation */
p->numa_faults = kzalloc(size * 2, GFP_KERNEL|__GFP_NOWARN);
@@ -987,7 +1011,7 @@ void task_numa_fault(int node, int pages, bool migrated)
return;
 
BUG_ON(p->numa_faults_buffer);
-   p->numa_faults_buffer = p->numa_faults + nr_node_ids;
+   p->numa_faults_buffer = p->numa_faults + (2 * nr_node_ids);
}
 
/*
@@ -1005,7 +1029,7 @@ void task_numa_fault(int node, int pages, bool migrated)
 
task_numa_placement(p);
 
-   p->numa_faults_buffer[node] += pages;
+   p->numa_faults_buffer[task_faults_idx(node, priv)] += pages;
 }
 
 static void reset_ptenuma_scan(struct task_struct *p)
@@ -4096,7 +4120,7 @@ static bool migrate_improves_locality(struct task_struct 
*p, struct lb_env *env)
p->numa_migrate_seq >= sysctl_numa_balancing_settle_count)
return false;
 
-   if (p->numa_faults[dst_nid] > 

[PATCH 27/27] sched: Retry migration of tasks to CPU on a preferred node

2013-08-08 Thread Mel Gorman
When a preferred node is selected for a task there is an attempt to migrate
the task to a CPU there. This may fail, in which case the task will only
migrate if the active load balancer takes action. This may never happen if
the conditions are not right. This patch checks at NUMA hinting fault time
whether another attempt should be made to migrate the task. It will make an
attempt at most once every five seconds.
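
The five second backoff is plain jiffies arithmetic; a minimal sketch of the
pattern used here (HZ jiffies per second, and time_after() rather than a raw
comparison so that jiffies wraparound is handled correctly):

	p->numa_migrate_retry = jiffies + HZ * 5;	/* arm the retry window */

	/* ... later, at NUMA hinting fault time ... */
	if (p->numa_migrate_retry && time_after(jiffies, p->numa_migrate_retry))
		numa_migrate_preferred(p);	/* at most one retry per window */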

Signed-off-by: Mel Gorman 
Signed-off-by: Peter Zijlstra 
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 40 +++-
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index bbafa60..cd67c95 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1333,6 +1333,7 @@ struct task_struct {
int numa_migrate_seq;
unsigned int numa_scan_period;
unsigned int numa_scan_period_max;
+   unsigned long numa_migrate_retry;
u64 node_stamp; /* migration stamp  */
struct callback_head numa_work;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9ea4d5c..7cffbf5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -989,6 +989,22 @@ static int task_numa_find_cpu(struct task_struct *p, int 
nid)
return dst_cpu;
 }
 
+/* Attempt to migrate a task to a CPU on the preferred node. */
+static void numa_migrate_preferred(struct task_struct *p)
+{
+   int preferred_cpu = task_cpu(p);
+
+   /* Success if task is already running on preferred CPU */
+   p->numa_migrate_retry = 0;
+   if (cpu_to_node(preferred_cpu) == p->numa_preferred_nid)
+   return;
+
+   /* Otherwise, try migrate to a CPU on the preferred node */
+   preferred_cpu = task_numa_find_cpu(p, p->numa_preferred_nid);
+   if (migrate_task_to(p, preferred_cpu) != 0)
+   p->numa_migrate_retry = jiffies + HZ*5;
+}
+
 static void task_numa_placement(struct task_struct *p)
 {
int seq, nid, max_nid = -1;
@@ -1023,27 +1039,13 @@ static void task_numa_placement(struct task_struct *p)
}
}
 
-   /*
-* Record the preferred node as the node with the most faults,
-* requeue the task to be running on the idlest CPU on the
-* preferred node and reset the scanning rate to recheck
-* the working set placement.
-*/
+   /* Preferred node as the node with the most faults */
if (max_faults && max_nid != p->numa_preferred_nid) {
-   int preferred_cpu;
-
-   /*
-* If the task is not on the preferred node then find
-* a suitable CPU to migrate to.
-*/
-   preferred_cpu = task_cpu(p);
-   if (cpu_to_node(preferred_cpu) != max_nid)
-   preferred_cpu = task_numa_find_cpu(p, max_nid);
 
-   /* Update the preferred nid and migrate task if possible */
+   /* Queue task on preferred node if possible */
p->numa_preferred_nid = max_nid;
p->numa_migrate_seq = 0;
-   migrate_task_to(p, preferred_cpu);
+   numa_migrate_preferred(p);
}
 }
 
@@ -1099,6 +1101,10 @@ void task_numa_fault(int last_nidpid, int node, int 
pages, bool migrated)
 
task_numa_placement(p);
 
+   /* Retry task to preferred node migration if it previously failed */
+   if (p->numa_migrate_retry && time_after(jiffies, p->numa_migrate_retry))
+   numa_migrate_preferred(p);
+
p->numa_faults_buffer[task_faults_idx(node, priv)] += pages;
 }
 
-- 
1.8.1.4



[PATCH 17/27] sched: Update NUMA hinting faults once per scan

2013-08-08 Thread Mel Gorman
NUMA hinting fault counts and placement decisions are both recorded in the
same array, which distorts the samples in an unpredictable fashion. The
values linearly accumulate during the scan and then decay, creating a
sawtooth-like pattern in the per-node counts. It also means that placement
decisions are time sensitive. At best it is very difficult to state that
the buffer holds a decaying average of past faulting behaviour. At worst,
it can confuse the load balancer if it sees one node with an artificially
high count due to very recent faulting activity, which may create a
bouncing effect.

This patch adds a second array. numa_faults stores the historical data
which is used for placement decisions. numa_faults_buffer holds the
fault activity during the current scan window. When the scan completes,
numa_faults decays and the values from numa_faults_buffer are copied
across.
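
The decay-and-fold step amounts to a per-node exponential moving average; a
stand-alone sketch (not kernel code) of what one completed scan window does
to the two arrays:

	#define MAX_NODES 8

	unsigned long numa_faults[MAX_NODES];		/* history: drives placement  */
	unsigned long numa_faults_buffer[MAX_NODES];	/* activity in current window */

	void fold_scan_window(int nr_node_ids)
	{
		int nid;

		for (nid = 0; nid < nr_node_ids; nid++) {
			numa_faults[nid] >>= 1;				/* decay old history */
			numa_faults[nid] += numa_faults_buffer[nid];	/* add new window    */
			numa_faults_buffer[nid] = 0;
		}
	}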

Signed-off-by: Mel Gorman 
---
 include/linux/sched.h | 13 +
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 16 +---
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d65e31c..d017be9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1335,7 +1335,20 @@ struct task_struct {
u64 node_stamp; /* migration stamp  */
struct callback_head numa_work;
 
+   /*
+* Exponential decaying average of faults on a per-node basis.
+* Scheduling placement decisions are made based on the these counts.
+* The values remain static for the duration of a PTE scan
+*/
unsigned long *numa_faults;
+
+   /*
+* numa_faults_buffer records faults per node during the current
+* scan window. When the scan completes, the counts in numa_faults
+* decay and these values are copied.
+*/
+   unsigned long *numa_faults_buffer;
+
int numa_preferred_nid;
 #endif /* CONFIG_NUMA_BALANCING */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 194559e..aad32ff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1637,6 +1637,7 @@ static void __sched_fork(struct task_struct *p)
p->numa_preferred_nid = -1;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
+   p->numa_faults_buffer = NULL;
 #endif /* CONFIG_NUMA_BALANCING */
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee8da21..8ca8901 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -892,8 +892,14 @@ static void task_numa_placement(struct task_struct *p)
 
/* Find the node with the highest number of faults */
for (nid = 0; nid < nr_node_ids; nid++) {
-   unsigned long faults = p->numa_faults[nid];
+   unsigned long faults;
+
+   /* Decay existing window and copy faults since last scan */
p->numa_faults[nid] >>= 1;
+   p->numa_faults[nid] += p->numa_faults_buffer[nid];
+   p->numa_faults_buffer[nid] = 0;
+
+   faults = p->numa_faults[nid];
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -919,9 +925,13 @@ void task_numa_fault(int node, int pages, bool migrated)
if (unlikely(!p->numa_faults)) {
int size = sizeof(*p->numa_faults) * nr_node_ids;
 
-   p->numa_faults = kzalloc(size, GFP_KERNEL|__GFP_NOWARN);
+   /* numa_faults and numa_faults_buffer share the allocation */
+   p->numa_faults = kzalloc(size * 2, GFP_KERNEL|__GFP_NOWARN);
if (!p->numa_faults)
return;
+
+   BUG_ON(p->numa_faults_buffer);
+   p->numa_faults_buffer = p->numa_faults + nr_node_ids;
}
 
/*
@@ -939,7 +949,7 @@ void task_numa_fault(int node, int pages, bool migrated)
 
task_numa_placement(p);
 
-   p->numa_faults[node] += pages;
+   p->numa_faults_buffer[node] += pages;
 }
 
 static void reset_ptenuma_scan(struct task_struct *p)
-- 
1.8.1.4



[PATCH 24/27] sched: Remove check that skips small VMAs

2013-08-08 Thread Mel Gorman
task_numa_work skips small VMAs. At the time the logic was to reduce the
scanning overhead which was considerable. It is a dubious hack at best.
It would make much more sense to cache where faults have been observed
and only rescan those regions during subsequent PTE scans. Remove this
hack as motivation to do it properly in the future.

Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f07073b..b66c662 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1127,10 +1127,6 @@ void task_numa_work(struct callback_head *work)
if (!vma_migratable(vma))
continue;
 
-   /* Skip small VMAs. They are not likely to be of relevance */
-   if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
-   continue;
-
do {
start = max(start, vma->vm_start);
end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
-- 
1.8.1.4



Re: [PATCH v2] PCI: avoid NULL deref in alloc_pcie_link_state

2013-08-08 Thread Radim Krčmář
Subject should have said this is a second version.


[PATCH 16/27] sched: Select a preferred node with the most numa hinting faults

2013-08-08 Thread Mel Gorman
This patch selects a preferred node for a task to run on based on the
NUMA hinting faults. This information is later used to migrate tasks
towards the node during balancing.

Signed-off-by: Mel Gorman 
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 17 +++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 702a5b6..d65e31c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1336,6 +1336,7 @@ struct task_struct {
struct callback_head numa_work;
 
unsigned long *numa_faults;
+   int numa_preferred_nid;
 #endif /* CONFIG_NUMA_BALANCING */
 
struct rcu_head rcu;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e6dda1b..194559e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1634,6 +1634,7 @@ static void __sched_fork(struct task_struct *p)
p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0;
p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
+   p->numa_preferred_nid = -1;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
 #endif /* CONFIG_NUMA_BALANCING */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index babac71..ee8da21 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -879,7 +879,8 @@ static unsigned int task_scan_max(struct task_struct *p)
 
 static void task_numa_placement(struct task_struct *p)
 {
-   int seq;
+   int seq, nid, max_nid = -1;
+   unsigned long max_faults = 0;
 
if (!p->mm) /* for example, ksmd faulting in a user's mm */
return;
@@ -889,7 +890,19 @@ static void task_numa_placement(struct task_struct *p)
p->numa_scan_seq = seq;
p->numa_scan_period_max = task_scan_max(p);
 
-   /* FIXME: Scheduling placement policy hints go here */
+   /* Find the node with the highest number of faults */
+   for (nid = 0; nid < nr_node_ids; nid++) {
+   unsigned long faults = p->numa_faults[nid];
+   p->numa_faults[nid] >>= 1;
+   if (faults > max_faults) {
+   max_faults = faults;
+   max_nid = nid;
+   }
+   }
+
+   /* Update the tasks preferred node if necessary */
+   if (max_faults && max_nid != p->numa_preferred_nid)
+   p->numa_preferred_nid = max_nid;
 }
 
 /*
-- 
1.8.1.4



[PATCH 15/27] sched: Track NUMA hinting faults on per-node basis

2013-08-08 Thread Mel Gorman
This patch tracks on which nodes NUMA hinting faults were incurred. This
information is later used to schedule a task on the node storing the pages
most frequently faulted on by the task.

Signed-off-by: Mel Gorman 
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   |  3 +++
 kernel/sched/fair.c   | 11 ++-
 kernel/sched/sched.h  | 12 
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 59c473b..702a5b6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1334,6 +1334,8 @@ struct task_struct {
unsigned int numa_scan_period_max;
u64 node_stamp; /* migration stamp  */
struct callback_head numa_work;
+
+   unsigned long *numa_faults;
 #endif /* CONFIG_NUMA_BALANCING */
 
struct rcu_head rcu;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e148975..e6dda1b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1635,6 +1635,7 @@ static void __sched_fork(struct task_struct *p)
p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
+   p->numa_faults = NULL;
 #endif /* CONFIG_NUMA_BALANCING */
 }
 
@@ -1896,6 +1897,8 @@ static void finish_task_switch(struct rq *rq, struct 
task_struct *prev)
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
+   task_numa_free(prev);
+
/*
 * Remove function-return probe instances associated with this
 * task and put them back on the free list.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d77bb32..babac71 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -902,7 +902,14 @@ void task_numa_fault(int node, int pages, bool migrated)
if (!sched_feat_numa(NUMA))
return;
 
-   /* FIXME: Allocate task-specific structure for placement policy here */
+   /* Allocate buffer to track faults on a per-node basis */
+   if (unlikely(!p->numa_faults)) {
+   int size = sizeof(*p->numa_faults) * nr_node_ids;
+
+   p->numa_faults = kzalloc(size, GFP_KERNEL|__GFP_NOWARN);
+   if (!p->numa_faults)
+   return;
+   }
 
/*
 * If pages are properly placed (did not migrate) then scan slower.
@@ -918,6 +925,8 @@ void task_numa_fault(int node, int pages, bool migrated)
}
 
task_numa_placement(p);
+
+   p->numa_faults[node] += pages;
 }
 
 static void reset_ptenuma_scan(struct task_struct *p)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ef0a7b2..c2f1c86 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include <linux/slab.h>
 
 #include "cpupri.h"
 #include "cpuacct.h"
@@ -553,6 +554,17 @@ static inline u64 rq_clock_task(struct rq *rq)
return rq->clock_task;
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+static inline void task_numa_free(struct task_struct *p)
+{
+   kfree(p->numa_faults);
+}
+#else /* CONFIG_NUMA_BALANCING */
+static inline void task_numa_free(struct task_struct *p)
+{
+}
+#endif /* CONFIG_NUMA_BALANCING */
+
 #ifdef CONFIG_SMP
 
 #define rcu_dereference_check_sched_domain(p) \
-- 
1.8.1.4



[PATCH 08/27] Revert "mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node"

2013-08-08 Thread Mel Gorman
PTE scanning and NUMA hinting fault handling are expensive, so commit
5bca2303 ("mm: sched: numa: Delay PTE scanning until a task is scheduled
on a new node") deferred the PTE scan until a task had been scheduled on
another node. The problem is that in the purely shared memory case this
may never happen and no NUMA hinting fault information will be captured.
We are not ruling out the possibility that something better can be done
here, but for now this needs to be reverted and we depend entirely on the
scan_delay to avoid punishing short-lived processes.

Signed-off-by: Mel Gorman 
---
 include/linux/mm_types.h | 10 --
 kernel/fork.c|  3 ---
 kernel/sched/fair.c  | 18 --
 kernel/sched/features.h  |  4 +---
 4 files changed, 1 insertion(+), 34 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fb425aa..86f3a56 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -426,20 +426,10 @@ struct mm_struct {
 
/* numa_scan_seq prevents two threads setting pte_numa */
int numa_scan_seq;
-
-   /*
-* The first node a task was scheduled on. If a task runs on
-* a different node than Make PTE Scan Go Now.
-*/
-   int first_nid;
 #endif
struct uprobes_state uprobes_state;
 };
 
-/* first nid will either be a valid NID or one of these values */
-#define NUMA_PTE_SCAN_INIT -1
-#define NUMA_PTE_SCAN_ACTIVE   -2
-
 static inline void mm_init_cpumask(struct mm_struct *mm)
 {
 #ifdef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/fork.c b/kernel/fork.c
index 403d2bb..61e799e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -820,9 +820,6 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
mm->pmd_huge_pte = NULL;
 #endif
-#ifdef CONFIG_NUMA_BALANCING
-   mm->first_nid = NUMA_PTE_SCAN_INIT;
-#endif
if (!mm_init(mm, tsk))
goto fail_nomem;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc2ee69..bb5d978 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -901,24 +901,6 @@ void task_numa_work(struct callback_head *work)
return;
 
/*
-* We do not care about task placement until a task runs on a node
-* other than the first one used by the address space. This is
-* largely because migrations are driven by what CPU the task
-* is running on. If it's never scheduled on another node, it'll
-* not migrate so why bother trapping the fault.
-*/
-   if (mm->first_nid == NUMA_PTE_SCAN_INIT)
-   mm->first_nid = numa_node_id();
-   if (mm->first_nid != NUMA_PTE_SCAN_ACTIVE) {
-   /* Are we running on a new node yet? */
-   if (numa_node_id() == mm->first_nid &&
-   !sched_feat_numa(NUMA_FORCE))
-   return;
-
-   mm->first_nid = NUMA_PTE_SCAN_ACTIVE;
-   }
-
-   /*
 * Reset the scan period if enough time has gone by. Objective is that
 * scanning will be reduced if pages are properly placed. As tasks
 * can enter different phases this needs to be re-examined. Lacking
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..cba5c61 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -63,10 +63,8 @@ SCHED_FEAT(LB_MIN, false)
 /*
  * Apply the automatic NUMA scheduling policy. Enabled automatically
  * at runtime if running on a NUMA machine. Can be controlled via
- * numa_balancing=. Allow PTE scanning to be forced on UMA machines
- * for debugging the core machinery.
+ * numa_balancing=
  */
 #ifdef CONFIG_NUMA_BALANCING
 SCHED_FEAT(NUMA,   false)
-SCHED_FEAT(NUMA_FORCE, false)
 #endif
-- 
1.8.1.4



Re: [PATCH part1 0/5] acpica: Split acpi_gbl_root_table_list initialization into two parts.

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 11:39:31 AM Tang Chen wrote:
> [Problem]
> 
> The current Linux cannot migrate pages used by the kernel because
> of the kernel direct mapping. In Linux kernel space, va = pa + PAGE_OFFSET.
> When the pa is changed, we cannot simply update the pagetable and
> keep the va unmodified. So the kernel pages are not migratable.
> 
> There are also some other issues that will cause kernel pages to be
> unmigratable. For example, the physical address may be cached somewhere
> and used later; it is not easy to update all such caches.
> 
> When doing memory hotplug in Linux, we first migrate all the pages in one
> memory device somewhere else, and then remove the device. But if pages are
> used by the kernel, they are not migratable. As a result, memory used by
> the kernel cannot be hot-removed.
> 
> Modifying the kernel direct mapping mechanism is too difficult to do, and
> it may degrade kernel performance and make it unstable. So we use the following
> way to do memory hotplug.
> 
> 
> [What we are doing]
> 
> In Linux, memory in one numa node is divided into several zones. One of the
> zones is ZONE_MOVABLE, which the kernel won't use.
> 
> In order to implement memory hotplug in Linux, we are going to arrange all
> hotpluggable memory in ZONE_MOVABLE so that the kernel won't use these memory.
> 
> To do this, we need ACPI's help.
> 
> 
> [How we do this]
> 
> In ACPI, SRAT(System Resource Affinity Table) contains NUMA info. The memory
> affinities in SRAT record every memory range in the system, and also, flags
> specifying if the memory range is hotpluggable.
> (Please refer to ACPI spec 5.0 5.2.16)
> 
> With the help of SRAT, we have to do the following two things to achieve our
> goal:
> 
> 1. When doing memory hot-add, allow the users arranging hotpluggable as
>ZONE_MOVABLE.
>(This has been done by the MOVABLE_NODE functionality in Linux.)
> 
> 2. when the system is booting, prevent bootmem allocator from allocating
>hotpluggable memory for the kernel before the memory initialization
>finishes.
>(This is what we are going to do. And we need to do some modification in
> ACPICA. See below.)
> 
> 
> [About this patch-set]
> 
> There is a bootmem allocator named memblock in Linux. memblock starts to work
> at very early time, and SRAT has not been parsed. So we don't know which 
> memory
> is hotpluggable. In order to prevent memblock from allocating hotpluggable
> memory for the kernel, we need to obtain SRAT memory affinity info earlier.
> 
> In the current Linux kernel, the acpica code iterates 
> acpi_gbl_root_table_list,
> and installs all the acpi tables into it at boot time. Then, it tries to find
> if there is any override table in global array acpi_tables_addr. If any, 
> reinstall
> the override table into acpi_gbl_root_table_list.
> 
> In Linux, global array acpi_tables_addr can be fulfilled by 
> ACPI_INITRD_TABLE_OVERRIDE
> mechanism, which allows users to specify their own ACPI tables in initrd 
> file, and
> override the ones from firmware.
> 
> The whole procedure looks like the following:
> 
> setup_arch()
>  |->   .. /* Setup direct mapping 
> pagetables */
>  |->acpi_initrd_override()/* Store all override 
> tables in acpi_tables_addr. */
>  |...
>  |->acpi_boot_table_init()
> |->acpi_table_init()
>|  
> (Linux code)
> ..
>|  
>(ACPICA code)
>|->acpi_initialize_tables()
>   |->acpi_tb_parse_root_table()   /* Parse RSDT or XSDT, find 
> all tables in firmware */
>  |->for (each item in acpi_gbl_root_table_list)
> |->acpi_tb_install_table()
>|->   ..   /* Install one single table 
> */
>|->acpi_tb_table_override()/* Override one single 
> table */
> 
> It does the table installation and overriding one by one.
> 
> In order to find SRAT at earlier time, we want to initialize 
> acpi_gbl_root_table_list
> earlier. But at the same time, keep ACPI_INITRD_TABLE_OVERRIDE procedure 
> works as well.
> 
> The basic idea is, split the acpi_gbl_root_table_list initialization 
> procedure into
> two steps:
> 1. Install all tables from firmware, not one by one.
> 2. Override any table if necessary, not one by one.
> 
> After this patch-set, it will work like this:
> 
> setup_arch()
>  |-> ..   /* Install all tables from 
> firmware (Step 1) */
>  |-> ..   /* Try to find if any 
> override SRAT in initrd file, if yes, use it */
>  |-> ..   /* Use the SRAT from 
> firmware */
>  |-> ..

[PATCH 25/27] sched: Set preferred NUMA node based on number of private faults

2013-08-08 Thread Mel Gorman
Ideally it would be possible to distinguish between NUMA hinting faults that
are private to a task and those that are shared. If treated identically
there is a risk that shared pages bounce between nodes depending on
the order they are referenced by tasks. Ultimately what is desirable is
that task private pages remain local to the task while shared pages are
interleaved between sharing tasks running on different nodes to give good
average performance. This is further complicated by THP as even
applications that partition their data may not be partitioning on a huge
page boundary.

To start with, this patch assumes that multi-threaded or multi-process
applications partition their data and that, in general, the private accesses
are more important for cpu->memory locality. Also,
no new infrastructure is required to treat private pages properly but
interleaving for shared pages requires additional infrastructure.

To detect private accesses the pid of the last accessing task is required
but the storage requirements are high. This patch borrows heavily from
Ingo Molnar's patch "numa, mm, sched: Implement last-CPU+PID hash tracking"
to encode some bits from the last accessing task in the page flags as
well as the node information. Collisions will occur but it is better than
just depending on the node information. Node information is then used to
determine if a page needs to migrate. The PID information is used to detect
private/shared accesses. The preferred NUMA node is selected based on where
the maximum number of approximately private faults were measured. Shared
faults are not taken into consideration for a few reasons.

First, if there are many tasks sharing the page then they'll all move
towards the same node. The node will be compute overloaded and then
scheduled away later only to bounce back again. Alternatively the shared
tasks would just bounce around nodes because the fault information is
effectively noise. Either way accounting for shared faults the same as
private faults can result in lower performance overall.

The second reason is based on a hypothetical workload that has a small
number of very important, heavily accessed private pages but a large shared
array. The shared array would dominate the number of faults and be selected
as a preferred node even though it's the wrong decision.

The third reason is that multiple threads in a process will race each
other to fault the shared page making the fault information unreliable.
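
A rough sketch of the packing idea, with made-up field widths purely for
illustration (the patch derives the real widths in page-flags-layout.h):

	#define EXAMPLE_PID_BITS	8
	#define EXAMPLE_PID_MASK	((1 << EXAMPLE_PID_BITS) - 1)

	/* keep the node and the low bits of the last accessing task's pid */
	static inline int nidpid_encode(int nid, int pid)
	{
		return (nid << EXAMPLE_PID_BITS) | (pid & EXAMPLE_PID_MASK);
	}

	static inline int nidpid_to_nid(int nidpid)
	{
		return nidpid >> EXAMPLE_PID_BITS;
	}

	/* a later fault looks "approximately private" if the stored pid bits
	 * match the current task; collisions are possible but rare enough */
	static inline int nidpid_to_pid(int nidpid)
	{
		return nidpid & EXAMPLE_PID_MASK;
	}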

[r...@redhat.com: Fix compilation error when !NUMA_BALANCING]
Signed-off-by: Mel Gorman 
---
 include/linux/mm.h| 89 +--
 include/linux/mm_types.h  |  4 +-
 include/linux/page-flags-layout.h | 28 +++-
 kernel/sched/fair.c   | 12 --
 mm/huge_memory.c  |  8 ++--
 mm/memory.c   | 16 +++
 mm/mempolicy.c|  8 ++--
 mm/migrate.c  |  4 +-
 mm/mm_init.c  | 18 
 mm/mmzone.c   | 14 +++---
 mm/mprotect.c | 26 
 mm/page_alloc.c   |  4 +-
 12 files changed, 149 insertions(+), 82 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..0a0db6c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -588,11 +588,11 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct 
vm_area_struct *vma)
  * sets it, so none of the operations on it need to be atomic.
  */
 
-/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_NID] | ... | FLAGS | */
+/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_NIDPID] | ... | FLAGS | */
 #define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH)
 #define NODES_PGOFF(SECTIONS_PGOFF - NODES_WIDTH)
 #define ZONES_PGOFF(NODES_PGOFF - ZONES_WIDTH)
-#define LAST_NID_PGOFF (ZONES_PGOFF - LAST_NID_WIDTH)
+#define LAST_NIDPID_PGOFF  (ZONES_PGOFF - LAST_NIDPID_WIDTH)
 
 /*
  * Define the bit shifts to access each section.  For non-existent
@@ -602,7 +602,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct 
vm_area_struct *vma)
 #define SECTIONS_PGSHIFT   (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0))
 #define NODES_PGSHIFT  (NODES_PGOFF * (NODES_WIDTH != 0))
 #define ZONES_PGSHIFT  (ZONES_PGOFF * (ZONES_WIDTH != 0))
-#define LAST_NID_PGSHIFT   (LAST_NID_PGOFF * (LAST_NID_WIDTH != 0))
+#define LAST_NIDPID_PGSHIFT(LAST_NIDPID_PGOFF * (LAST_NIDPID_WIDTH != 0))
 
 /* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
 #ifdef NODE_NOT_IN_PAGE_FLAGS
@@ -624,7 +624,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct 
vm_area_struct *vma)
 #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1)
 #define NODES_MASK ((1UL << NODES_WIDTH) - 1)
 #define SECTIONS_MASK  ((1UL << SECTIONS_WIDTH) - 1)
-#define LAST_NID_MASK  ((1UL << LAST_NID_WIDTH) 

Re: [PATCH 2/6] ARM: Tegra: Add CPU's OPPs for using cpufreq-cpu0 driver

2013-08-08 Thread Lucas Stach
Am Mittwoch, den 07.08.2013, 12:55 -0600 schrieb Stephen Warren:
> On 08/07/2013 12:06 PM, Viresh Kumar wrote:
> > On 7 August 2013 23:12, Stephen Warren  wrote:
> >> On 08/07/2013 08:46 AM, Viresh Kumar wrote:
> >>> cpufreq-cpu0 driver needs OPPs to be present in DT which can be probed by 
> >>> it to
> >>> get frequency table. This patch adds OPPs and clock-latency to tegra cpu0 
> >>> node
> >>> for multiple SoCs.
> >>>
> >>> Voltage levels aren't used until now for tegra and so a flat value which 
> >>> would
> >>> eventually be ignored is used to represent voltage.
> >>
> >> This patch is problematic w.r.t. DT being an ABI.
> > 
> > :(
> > 
> >> We can certainly add new optional properties to a DT binding that enable
> >> new features. However, a new version of a binding can't require new
> >> properties to exist that didn't before, since that means that old DTs
> >> won't work with new kernels that require the new properties.
> > 
> > To be honest I didn't get it completely. You meant operating-points
> > wasn't present before? Its here:
> > 
> > Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
> > Documentation/devicetree/bindings/power/opp.txt
> > 
> > Or you meant, Tegra never required voltage levels and we are getting
> > them in here.
> 
> The current Tegra *.dts files do not contain this property. The current
> Tegra *.dts files must continue to work without modification in future
> kernels.
> 
> >> As such, I believe we do need some Tegra-specific piece of code that
> >> defines these OPP tables in the kernel, so that the operating-points
> >> property is not needed.
> > 
> > Generic cpufreq driver depends on OPP library and so somebody has
> > to provide them. Now you can do it by calling opp_add() for each OPP
> > you have or otherwise.
> 
> Sure. That's what the Tegra-specific cpufreq driver should do. It should
> be the top-level cpufreq driver. If parts of the code can be implemented
> by library functions or a core parameterizable driver, then presumably
> the Tegra driver would simply exist to provide those parameters and/or
> callback functions to the generic driver.
> 
> > Btw, you must have some specific voltage level for each freq, we can
> > get them here..
> 
> Yes, I'm sure we do, but I have no idea what they are:-( It may even be
> board-specific or SoC-SKU-specific. I think we should defer this aspect
> for now.

From what I learned those voltage levels depend on both the Speedo and the
process ID of the specific Tegra processor. So you really get a
two-dimensional mapping table instead of a single OPP.
Also you cannot scale the CPU voltage on its own, but have to make sure the
core voltage isn't too far away from it. The core voltage also depends on
the operating states of engines like GR2D or even the display.
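
(A hypothetical sketch of the shape of such a table, just to make the
argument concrete; the names and dimensions are invented, not taken from the
Tegra code:)

	/* the voltage column depends on fuse values (speedo/process ID) that are
	 * only known at runtime, so it cannot be a fixed per-frequency DT value */
	static const unsigned int cpu_millivolts[NUM_SPEEDO_IDS][NUM_PROCESS_IDS][NUM_FREQS];

	static unsigned int cpu_mv(int speedo_id, int process_id, int freq_idx)
	{
		return cpu_millivolts[speedo_id][process_id][freq_idx];
	}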

Regards,
Lucas
-- 
Pengutronix e.K.   | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |



[PATCH 26/27] sched: Avoid overloading CPUs on a preferred NUMA node

2013-08-08 Thread Mel Gorman
This patch replaces find_idlest_cpu_node with task_numa_find_cpu.
find_idlest_cpu_node has two critical limitations. It does not take the
scheduling class into account when calculating the load and it is unsuitable
for use when comparing loads between NUMA nodes.

task_numa_find_cpu uses similar load calculations to wake_affine() when
selecting the least loaded CPU within a scheduling domain common to the
source and destination nodes. It avoids causing CPU load imbalances in
the machine by refusing to migrate if the relative load on the target
CPU is higher than the source CPU.
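
A back-of-the-envelope illustration of the balance test (numbers invented,
assuming imbalance_pct = 125, equal CPU power, and loads that already include
the effective_load() adjustment for the task being moved):

	long src_load = 1000, dst_load = 1100;

	long src_eff_load = (100 + (125 - 100) / 2) * src_load;	/* 112 * 1000 */
	long dst_eff_load = 100 * dst_load;			/* 100 * 1100 */

	/* 110000 <= 112000, so this destination still counts as balanced and the
	 * task may migrate; roughly 12% extra load on the target is tolerated */
	int balanced = (dst_eff_load <= src_eff_load);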

Signed-off-by: Peter Zijlstra 
Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 105 +---
 1 file changed, 83 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d8b5cb..9ea4d5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -901,28 +901,92 @@ static inline unsigned long task_faults(struct 
task_struct *p, int nid)
 }
 
 static unsigned long weighted_cpuload(const int cpu);
+static unsigned long source_load(int cpu, int type);
+static unsigned long target_load(int cpu, int type);
+static unsigned long power_of(int cpu);
+static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
+
+static int task_numa_find_cpu(struct task_struct *p, int nid)
+{
+   int node_cpu = cpumask_first(cpumask_of_node(nid));
+   int cpu, src_cpu = task_cpu(p), dst_cpu = src_cpu;
+   unsigned long src_load, dst_load;
+   unsigned long min_load = ULONG_MAX;
+   struct task_group *tg = task_group(p);
+   s64 src_eff_load, dst_eff_load;
+   struct sched_domain *sd;
+   unsigned long weight;
+   bool balanced;
+   int imbalance_pct, idx = -1;
 
+   /* No harm being optimistic */
+   if (idle_cpu(node_cpu))
+   return node_cpu;
 
-static int
-find_idlest_cpu_node(int this_cpu, int nid)
-{
-   unsigned long load, min_load = ULONG_MAX;
-   int i, idlest_cpu = this_cpu;
+   /*
+* Find the lowest common scheduling domain covering the nodes of both
+* the CPU the task is currently running on and the target NUMA node.
+*/
+   rcu_read_lock();
+   for_each_domain(src_cpu, sd) {
+   if (cpumask_test_cpu(node_cpu, sched_domain_span(sd))) {
+   /*
+* busy_idx is used for the load decision as it is the
+* same index used by the regular load balancer for an
+* active cpu.
+*/
+   idx = sd->busy_idx;
+   imbalance_pct = sd->imbalance_pct;
+   break;
+   }
+   }
+   rcu_read_unlock();
 
-   BUG_ON(cpu_to_node(this_cpu) == nid);
+   if (WARN_ON_ONCE(idx == -1))
+   return src_cpu;
 
-   rcu_read_lock();
-   for_each_cpu(i, cpumask_of_node(nid)) {
-   load = weighted_cpuload(i);
+   /*
+* XXX the below is mostly nicked from wake_affine(); we should
+* see about sharing a bit if at all possible; also it might want
+* some per entity weight love.
+*/
+   weight = p->se.load.weight;
 
-   if (load < min_load) {
-   min_load = load;
-   idlest_cpu = i;
+   src_load = source_load(src_cpu, idx);
+
+   src_eff_load = 100 + (imbalance_pct - 100) / 2;
+   src_eff_load *= power_of(src_cpu);
+   src_eff_load *= src_load + effective_load(tg, src_cpu, -weight, 
-weight);
+
+   for_each_cpu(cpu, cpumask_of_node(nid)) {
+   dst_load = target_load(cpu, idx);
+
+   /* If the CPU is idle, use it */
+   if (!dst_load)
+   return cpu;
+
+   /* Otherwise check the target CPU load */
+   dst_eff_load = 100;
+   dst_eff_load *= power_of(cpu);
+   dst_eff_load *= dst_load + effective_load(tg, cpu, weight, 
weight);
+
+   /*
+* Destination is considered balanced if the destination CPU is
+* less loaded than the source CPU. Unfortunately there is a
+* risk that a task running on a lightly loaded CPU will not
+* migrate to its preferred node due to load imbalances.
+*/
+   balanced = (dst_eff_load <= src_eff_load);
+   if (!balanced)
+   continue;
+
+   if (dst_load < min_load) {
+   min_load = dst_load;
+   dst_cpu = cpu;
}
}
-   rcu_read_unlock();
 
-   return idlest_cpu;
+   return dst_cpu;
 }
 
 static void task_numa_placement(struct task_struct *p)
@@ -969,14 +1033,12 @@ static void task_numa_placement(struct task_struct *p)
int preferred_cpu;
 
/*
-  

[PATCH 01/27] mm: numa: Document automatic NUMA balancing sysctls

2013-08-08 Thread Mel Gorman
Signed-off-by: Mel Gorman 
---
 Documentation/sysctl/kernel.txt | 66 +
 1 file changed, 66 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ab7d16e..ccadb52 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -354,6 +354,72 @@ utilize.
 
 ==
 
+numa_balancing
+
+Enables/disables automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes
+that access it often.
+
+Enables/disables automatic NUMA memory balancing. On NUMA machines, there
+is a performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples what task thread is accessing memory
+by periodically unmapping pages and later trapping a page fault. At the
+time of the page fault, it is determined if the data being accessed should
+be migrated to a local memory node.
+
+The unmapping of pages and trapping faults incur additional overhead that
+ideally is offset by improved memory locality but there is no universal
+guarantee. If the target workload is already bound to NUMA nodes then this
+feature should be disabled. Otherwise, if the system overhead from the
+feature is too high then the rate the kernel samples for NUMA hinting
+faults may be controlled by the numa_balancing_scan_period_min_ms,
+numa_balancing_scan_delay_ms, numa_balancing_scan_period_reset,
+numa_balancing_scan_period_max_ms and numa_balancing_scan_size_mb sysctls.
+
+==
+
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
+numa_balancing_scan_period_max_ms, numa_balancing_scan_period_reset,
+numa_balancing_scan_size_mb
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running.  Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases.  The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases.  The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These sysctls control the thresholds for scan delays and
+the number of pages scanned.
+
+numa_balancing_scan_period_min_ms is the minimum delay in milliseconds
+between scans. It effectively controls the maximum scanning rate for
+each task.
+
+numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+when it initially forks.
+
+numa_balancing_scan_period_max_ms is the maximum delay between scans. It
+effectively controls the minimum scanning rate for each task.
+
+numa_balancing_scan_size_mb is how many megabytes worth of pages are
+scanned for a given scan.
+
+numa_balancing_scan_period_reset is a blunt instrument that controls how
+often a tasks scan delay is reset to detect sudden changes in task behaviour.
+
+==
+
 osrelease, ostype & version:
 
 # cat osrelease
-- 
1.8.1.4
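
As a rough illustration of the scan-delay/scan-size relationship
documented above (a hedged example with made-up numbers, not kernel
defaults): the effective scan rate is simply the scan size divided by
the scan delay.

/*
 * Back-of-the-envelope example of the scan-rate relationship
 * described in the sysctl text above.
 */
#include <stdio.h>

static unsigned long scan_rate_mb_per_sec(unsigned long scan_period_ms,
					  unsigned long scan_size_mb)
{
	return scan_size_mb * 1000 / scan_period_ms;
}

int main(void)
{
	/* e.g. a 256 MB scan size with a 1000 ms scan delay ~= 256 MB/s */
	printf("%lu MB/s\n", scan_rate_mb_per_sec(1000, 256));
	return 0;
}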

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/27] sched: numa: Correct adjustment of numa_scan_period

2013-08-08 Thread Mel Gorman
numa_scan_period is in milliseconds, not jiffies. Properly placed pages
slow the scanning rate but adding 10 jiffies to numa_scan_period means
that the rate at which scanning slows depends on HZ, which is confusing. Get rid
of the jiffies_to_msec conversion and treat it as ms.

Signed-off-by: Mel Gorman 
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2908b4e..d77bb32 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -914,7 +914,7 @@ void task_numa_fault(int node, int pages, bool migrated)
p->numa_scan_period_max = task_scan_max(p);
 
p->numa_scan_period = min(p->numa_scan_period_max,
-   p->numa_scan_period + jiffies_to_msecs(10));
+   p->numa_scan_period + 10);
}
 
task_numa_placement(p);
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v9 04/16] iommu/exynos: allocate lv2 page table from own slab

2013-08-08 Thread Tomasz Figa
On Thursday 08 of August 2013 18:38:04 Cho KyongHo wrote:
> Since kmalloc() does not guarantee the alignment of 1KiB when it
> allocates 1KiB, it is required to allocate the lv2 page table from its own
> slab that guarantees alignment of 1KiB
> 
> Signed-off-by: Cho KyongHo 
> ---
>  drivers/iommu/exynos-iommu.c |   24 
>  1 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index d90e6fa..a318049 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -100,6 +100,8 @@
>  #define REG_PB1_SADDR0x054
>  #define REG_PB1_EADDR0x058
> 
> +static struct kmem_cache *lv2table_kmem_cache;
> +
>  static unsigned long *section_entry(unsigned long *pgtable, unsigned
> long iova) {
>   return pgtable + lv1ent_offset(iova);
> @@ -765,7 +767,8 @@ static void exynos_iommu_domain_destroy(struct
> iommu_domain *domain)
> 
>   for (i = 0; i < NUM_LV1ENTRIES; i++)
>   if (lv1ent_page(priv->pgtable + i))
> - kfree(__va(lv2table_base(priv->pgtable + i)));
> + kmem_cache_free(lv2table_kmem_cache,
> + __va(lv2table_base(priv->pgtable + i)));
> 
>   free_pages((unsigned long)priv->pgtable, 2);
>   free_pages((unsigned long)priv->lv2entcnt, 1);
> @@ -861,7 +864,7 @@ static unsigned long *alloc_lv2entry(unsigned long
> *sent, unsigned long iova, if (lv1ent_fault(sent)) {
>   unsigned long *pent;
> 
> - pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
> + pent = kmem_cache_zalloc(lv2table_kmem_cache, GFP_ATOMIC);
>   BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
>   if (!pent)
>   return ERR_PTR(-ENOMEM);
> @@ -881,7 +884,7 @@ static int lv1set_section(unsigned long *sent,
> phys_addr_t paddr, short *pgcnt)
> 
>   if (lv1ent_page(sent)) {
>   BUG_ON(*pgcnt != NUM_LV2ENTRIES);
> - kfree(page_entry(sent, 0));
> + kmem_cache_free(lv2table_kmem_cache, page_entry(sent, 0));
>   *pgcnt = 0;
>   }
> 
> @@ -1082,10 +1085,23 @@ static int __init exynos_iommu_init(void)
>  {
>   int ret;
> 
> + lv2table_kmem_cache = kmem_cache_create("exynos-iommu-lv2table",
> + LV2TABLE_SIZE, LV2TABLE_SIZE, 0, NULL);
> + if (!lv2table_kmem_cache) {
> + pr_err("%s: Failed to create kmem cache\n", __func__);
> + return -ENOMEM;
> + }
> +
>   ret = platform_driver_register(_sysmmu_driver);
> 
>   if (ret == 0)
> - bus_set_iommu(_bus_type, _iommu_ops);
> + ret = bus_set_iommu(_bus_type, _iommu_ops);
> +
> + if (ret) {
> + pr_err("%s: Failed to register exynos-iommu driver.\n",
> + __func__);
> + kmem_cache_destroy(lv2table_kmem_cache);
> + }

What about making the return value handling here cleaner? For example:

lv2table_kmem_cache = kmem_cache_create("exynos-iommu-lv2table",
LV2TABLE_SIZE, LV2TABLE_SIZE, 0, NULL);
if (!lv2table_kmem_cache) {
...
return -ENOMEM;
}

ret = platform_driver_register(_sysmmu_driver);
if (ret) {
...
goto err_destroy_kmem_cache;
}

ret = bus_set_iommu(_bus_type, _iommu_ops);
if (ret) {
...
goto err_platform_unregister;
}

return 0;

err_platform_unregister:
...
err_destroy_kmem_cache:
...
return ret;
}

Best regards,
Tomasz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/35] cpufreq: e_powersaver: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/e_powersaver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/e_powersaver.c b/drivers/cpufreq/e_powersaver.c
index 09f64cc..c1b7c99 100644
--- a/drivers/cpufreq/e_powersaver.c
+++ b/drivers/cpufreq/e_powersaver.c
@@ -403,13 +403,12 @@ static int eps_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = 14; /* 844mV -> 700mV in ns */
policy->cur = fsb * current_multiplier;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, >freq_table[0]);
+   ret = cpufreq_table_validate_and_show(policy, >freq_table[0]);
if (ret) {
kfree(centaur);
return ret;
}
 
-   cpufreq_frequency_table_get_attr(>freq_table[0], policy->cpu);
return 0;
 }
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
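
For context on this and the other conversions in the series: the helper
itself is introduced by patch 01/35, which is not quoted here. As a
hedged sketch of what it presumably reduces to, going by the cover
letter's description (the real implementation in
drivers/cpufreq/freq_table.c may differ in detail), it folds the
validate and expose steps together and only exposes the table when
validation succeeds:

/* Rough sketch, not the exact upstream code. */
#include <linux/cpufreq.h>

int cpufreq_table_validate_and_show(struct cpufreq_policy *policy,
				    struct cpufreq_frequency_table *table)
{
	int ret = cpufreq_frequency_table_cpuinfo(policy, table);

	if (!ret)
		cpufreq_frequency_table_get_attr(table, policy->cpu);

	return ret;
}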


Re: [PATCH part2 0/4] acpi: Trivial fix and improving for memory hotplug.

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 01:03:55 PM Tang Chen wrote:
> This patch-set does some trivial fix and improving in ACPI code
> for memory hotplug.
> 
> Patch 1,3,4 have been acked.
> 
> Tang Chen (4):
>   acpi: Print Hot-Pluggable Field in SRAT.
>   earlycpio.c: Fix the confusing comment of find_cpio_data().
>   acpi: Remove "continue" in macro INVALID_TABLE().
>   acpi: Introduce acpi_verify_initrd() to check if a table is invalid.
> 
>  arch/x86/mm/srat.c |   11 --
>  drivers/acpi/osl.c |   84 +++
>  lib/earlycpio.c|   27 
>  3 files changed, 85 insertions(+), 37 deletions(-)

It looks like this part doesn't depend on the other parts, is that correct?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/35] cpufreq: Introduce cpufreq_table_validate_and_show()

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 07:18:02 PM Viresh Kumar wrote:
> This is actually part of a bigger patchset which will change declaration of
> cpufreq_driver->target() to include index instead of target_freq and relation
> and hence cpufreq drivers wouldn't need to call cpufreq_frequency_table_target()
> anymore.
> 
> Almost every cpufreq driver is required to validate its frequency table with:
> cpufreq_frequency_table_cpuinfo() and then expose it to cpufreq core with:
> cpufreq_frequency_table_get_attr().
> 
> This patch creates another helper routine cpufreq_table_validate_and_show() 
> that
> will do both these steps in a single call and will return 0 for success, error
> otherwise.
> 
> This also fixes potential bugs in cpufreq drivers where people have called
> cpufreq_frequency_table_get_attr() before calling
> cpufreq_frequency_table_cpuinfo(), as the latter may fail.
> 
> Viresh Kumar (35):
>   cpufreq: Add new helper cpufreq_table_validate_and_show()
>   cpufreq: pxa: call cpufreq_frequency_table_get_attr()
>   cpufreq: s3cx4xx: call cpufreq_frequency_table_get_attr()
>   cpufreq: sparc: call cpufreq_frequency_table_get_attr()
>   cpufreq: acpi-cpufreq: use cpufreq_table_validate_and_show()
>   cpufreq: arm_big_little: use cpufreq_table_validate_and_show()
>   cpufreq: blackfin: use cpufreq_table_validate_and_show()
>   cpufreq: cpufreq-cpu0: use cpufreq_table_validate_and_show()
>   cpufreq: cris: use cpufreq_table_validate_and_show()
>   cpufreq: davinci: use cpufreq_table_validate_and_show()
>   cpufreq: dbx500: use cpufreq_table_validate_and_show()
>   cpufreq: e_powersaver: use cpufreq_table_validate_and_show()
>   cpufreq: elanfreq: use cpufreq_table_validate_and_show()
>   cpufreq: exynos: use cpufreq_table_validate_and_show()
>   cpufreq: ia64-acpi: use cpufreq_table_validate_and_show()
>   cpufreq: imx6q: use cpufreq_table_validate_and_show()
>   cpufreq: kirkwood: use cpufreq_table_validate_and_show()
>   cpufreq: longhaul: use cpufreq_table_validate_and_show()
>   cpufreq: loongson2: use cpufreq_table_validate_and_show()
>   cpufreq: maple: use cpufreq_table_validate_and_show()
>   cpufreq: omap: use cpufreq_table_validate_and_show()
>   cpufreq: p4-clockmod: use cpufreq_table_validate_and_show()
>   cpufreq: pasemi: use cpufreq_table_validate_and_show()
>   cpufreq: pmac: use cpufreq_table_validate_and_show()
>   cpufreq: powernow: use cpufreq_table_validate_and_show()
>   cpufreq: ppc: use cpufreq_table_validate_and_show()
>   cpufreq: pxa: use cpufreq_table_validate_and_show()
>   cpufreq: s3cx4xx: use cpufreq_table_validate_and_show()
>   cpufreq: s5pv210: use cpufreq_table_validate_and_show()
>   cpufreq: sc520: use cpufreq_table_validate_and_show()
>   cpufreq: sh: use cpufreq_table_validate_and_show()
>   cpufreq: sparc: use cpufreq_table_validate_and_show()
>   cpufreq: spear: use cpufreq_table_validate_and_show()
>   cpufreq: speedstep: use cpufreq_table_validate_and_show()
>   cpufreq: tegra: use cpufreq_table_validate_and_show()
> 
>  drivers/cpufreq/acpi-cpufreq.c |  4 +---
>  drivers/cpufreq/arm_big_little.c   |  4 +---
>  drivers/cpufreq/blackfin-cpufreq.c |  3 +--
>  drivers/cpufreq/cpufreq-cpu0.c |  4 +---
>  drivers/cpufreq/cris-artpec3-cpufreq.c | 10 +-
>  drivers/cpufreq/cris-etraxfs-cpufreq.c | 10 +-
>  drivers/cpufreq/davinci-cpufreq.c  |  6 ++
>  drivers/cpufreq/dbx500-cpufreq.c   |  6 ++
>  drivers/cpufreq/e_powersaver.c |  3 +--
>  drivers/cpufreq/elanfreq.c |  8 +---
>  drivers/cpufreq/exynos-cpufreq.c   |  4 +---
>  drivers/cpufreq/exynos5440-cpufreq.c   |  4 +---
>  drivers/cpufreq/freq_table.c   | 12 
>  drivers/cpufreq/ia64-acpi-cpufreq.c|  4 +---
>  drivers/cpufreq/imx6q-cpufreq.c|  3 +--
>  drivers/cpufreq/kirkwood-cpufreq.c | 10 +-
>  drivers/cpufreq/longhaul.c |  8 +---
>  drivers/cpufreq/loongson2_cpufreq.c|  5 +
>  drivers/cpufreq/maple-cpufreq.c|  4 +---
>  drivers/cpufreq/omap-cpufreq.c |  4 +---
>  drivers/cpufreq/p4-clockmod.c  |  3 +--
>  drivers/cpufreq/pasemi-cpufreq.c   |  4 +---
>  drivers/cpufreq/pmac32-cpufreq.c   |  3 +--
>  drivers/cpufreq/pmac64-cpufreq.c   |  4 +---
>  drivers/cpufreq/powernow-k6.c  |  9 +
>  drivers/cpufreq/powernow-k7.c  |  4 +---
>  drivers/cpufreq/powernow-k8.c  |  4 +---
>  drivers/cpufreq/ppc-corenet-cpufreq.c  |  3 +--
>  drivers/cpufreq/ppc_cbe_cpufreq.c  |  4 +---
>  drivers/cpufreq/pxa2xx-cpufreq.c   |  8 +---
>  drivers/cpufreq/pxa3xx-cpufreq.c   |  2 +-
>  drivers/cpufreq/s3c2416-cpufreq.c  |  4 +---
>  drivers/cpufreq/s3c24xx-cpufreq.c  |  2 +-
>  drivers/cpufreq/s3c64xx-cpufreq.c  |  2 +-
>  drivers/cpufreq/s5pv210-cpufreq.c  |  4 +---
>  drivers/cpufreq/sc520_freq.c   |  9 +
>  drivers/cpufreq/sh-cpufreq.c   |  6 +++---
>  

[PATCH 11/35] cpufreq: dbx500: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Linus Walleij 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/dbx500-cpufreq.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
index 26321cd..ea03b92 100644
--- a/drivers/cpufreq/dbx500-cpufreq.c
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -87,10 +87,8 @@ static int dbx500_cpufreq_init(struct cpufreq_policy *policy)
int res;
 
/* get policy fields based on the table */
-   res = cpufreq_frequency_table_cpuinfo(policy, freq_table);
-   if (!res)
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
-   else {
+   res = cpufreq_table_validate_and_show(policy, freq_table);
+   if (res)
pr_err("dbx500-cpufreq: Failed to read policy table\n");
return res;
}
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 13/35] cpufreq: elanfreq: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/elanfreq.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/cpufreq/elanfreq.c b/drivers/cpufreq/elanfreq.c
index 823a400..4000c34 100644
--- a/drivers/cpufreq/elanfreq.c
+++ b/drivers/cpufreq/elanfreq.c
@@ -202,7 +202,6 @@ static int elanfreq_cpu_init(struct cpufreq_policy *policy)
 {
struct cpuinfo_x86 *c = _data(0);
unsigned int i;
-   int result;
 
/* capability check */
if ((c->x86_vendor != X86_VENDOR_AMD) ||
@@ -223,12 +222,7 @@ static int elanfreq_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
policy->cur = elanfreq_get_cpu_frequency(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, elanfreq_table);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(elanfreq_table, policy->cpu);
-   return 0;
+   return cpufreq_table_validate_and_show(policy, elanfreq_table);
 }
 
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 14/35] cpufreq: exynos: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Kukjin Kim 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/exynos-cpufreq.c | 4 +---
 drivers/cpufreq/exynos5440-cpufreq.c | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/exynos-cpufreq.c b/drivers/cpufreq/exynos-cpufreq.c
index 3664751..71c4926 100644
--- a/drivers/cpufreq/exynos-cpufreq.c
+++ b/drivers/cpufreq/exynos-cpufreq.c
@@ -249,14 +249,12 @@ static int exynos_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 {
policy->cur = policy->min = policy->max = exynos_getspeed(policy->cpu);
 
-   cpufreq_frequency_table_get_attr(exynos_info->freq_table, policy->cpu);
-
/* set the transition latency value */
policy->cpuinfo.transition_latency = 10;
 
cpumask_setall(policy->cpus);
 
-   return cpufreq_frequency_table_cpuinfo(policy, exynos_info->freq_table);
+   return cpufreq_table_validate_and_show(policy, exynos_info->freq_table);
 }
 
 static int exynos_cpufreq_cpu_exit(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/exynos5440-cpufreq.c 
b/drivers/cpufreq/exynos5440-cpufreq.c
index 0c74018..1ac93e0 100644
--- a/drivers/cpufreq/exynos5440-cpufreq.c
+++ b/drivers/cpufreq/exynos5440-cpufreq.c
@@ -323,7 +323,7 @@ static int exynos_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 {
int ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, dvfs_info->freq_table);
+   ret = cpufreq_table_validate_and_show(policy, dvfs_info->freq_table);
if (ret) {
dev_err(dvfs_info->dev, "Invalid frequency table: %d\n", ret);
return ret;
@@ -333,8 +333,6 @@ static int exynos_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = dvfs_info->latency;
cpumask_setall(policy->cpus);
 
-   cpufreq_frequency_table_get_attr(dvfs_info->freq_table, policy->cpu);
-
return 0;
 }
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] PCI: avoid NULL deref in alloc_pcie_link_state

2013-08-08 Thread Radim Krčmář
A PCIe switch can be connected directly to the PCIe root complex in QEMU;
ASPM does not expect this topology and dereferences a NULL pointer when
initializing.

A downstream port can also be connected to the root complex without an
upstream one, so the code checks for both; otherwise we dereference NULL
on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
parent = pdev->bus->parent->self->link_state;
"pdev->bus->parent->self == NULL" if the upstream port is connected directly
to the root bus, and "pdev->bus->parent == NULL" in the second case.

v1 -> v2: (https://lkml.org/lkml/2013/6/19/753)
 - Initialization is aborted in pcie_aspm_init_link_state, where other
   special cases are being handled
 - pci_is_root_bus is used
 - Warning is printed

Reproducer for "downstream -- root" and "downstream -- upstream -- root"
(used qemu-kvm 1.5, q35 machine type might be missing on older ones)

  for parent in pcie.0 upstream; do
   qemu-kvm -m 128 -M q35 -nographic -no-reboot \
 -device x3130-upstream,bus=pcie.0,id=upstream \
 -device xio3130-downstream,bus=$parent,id=downstream,chassis=1 \
 -device virtio-blk-pci,bus=downstream,id=virtio-zero,drive=zero \
 -drive  file=/dev/zero,id=zero,format=raw \
 -kernel bzImage -append "console=ttyS0 panic=3" # pcie_aspm=off
  done

ASPM in QEMU works if we connect upstream through root port
  -device ioh3420,bus=pcie.0,id=root.0 \
  -device x3130-upstream,bus=root.0,id=upstream

Signed-off-by: Radim Krčmář 
---
 drivers/pci/pcie/aspm.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 403a443..209cd7f 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -570,6 +570,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
pdev->bus->self)
return;
 
+   /* We require at least two ports between downstream and root bus */
+   if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
+   (pci_is_root_bus(pdev->bus) ||
+pci_is_root_bus(pdev->bus->parent))) {
+   dev_warn(>dev, "ASPM disabled"
+" (connected directly to root bus)\n");
+   return;
+   }
+
down_read(_bus_sem);
if (list_empty(>subordinate->devices))
goto out;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Revert "slub: do not put a slab to cpu partial list when cpu_partial is 0"

2013-08-08 Thread Steven Rostedt
On Thu, 2013-08-08 at 09:19 -0400, Steven Rostedt wrote:

> Link:
> https://lkml.kernel.org/r/1375934936.6848.41.ca...@gandalf.local.home
> 

Evolution is really pissing me off. The wrapping that is used when I
compose a message is not the same as what is sent. This looked fine when
I hit send, but it should have been (/me switches to Preformatted mode):

Link: https://lkml.kernel.org/r/1375934936.6848.41.ca...@gandalf.local.home

Anyone know how to fix Evolution to have the same wrap in the composer
as what is sent?

I'm using Debian testing, XFCE desktop, and Evolution 3.4.4.

Thanks,

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] ARC: gdbserver breakage in Big-Endian configuration #2

2013-08-08 Thread Vineet Gupta
This is related to how different asm instructions write to a
bitfield member.

In BE config, @orig_r8 is at offset 0 while @event is at offset 2.

"ST 0x_, [addr]" correctly writes 0x to @event, however
a half word write, "STW 0x, [addr]", does not in Big Endian, while it
works perfectly fine for LE config (since @event is at offset 0).

-->8---
This issue is already fixed in the mainline 3.11 kernel as part of commit:
502a0c775c7f0a "ARC: pt_regs update #5"

However, that patch has a lot more changes than I would like to backport,
hence this separate change.
-->8---

Reported-by: Noam Camus 
Cc:  # [3.9 and 3.10 only]
Tested-by: Anton Kolesov 
Signed-off-by: Vineet Gupta 
---
 arch/arc/kernel/entry.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/kernel/entry.S b/arch/arc/kernel/entry.S
index 7b5f3ec..724de08 100644
--- a/arch/arc/kernel/entry.S
+++ b/arch/arc/kernel/entry.S
@@ -498,7 +498,7 @@ tracesys_exit:
 trap_with_param:
 
; stop_pc info by gdb needs this info
-   stw orig_r8_IS_BRKPT, [sp, PT_orig_r8]
+   st  orig_r8_IS_BRKPT, [sp, PT_orig_r8]
 
mov r0, r12
lr  r1, [efa]
@@ -727,7 +727,7 @@ not_exception:
; things to what they were, before returning from L2 context
;
 
-   ldw  r9, [sp, PT_orig_r8]  ; get orig_r8 to make sure it is
+   ld   r9, [sp, PT_orig_r8]  ; get orig_r8 to make sure it is
brne r9, orig_r8_IS_IRQ2, 149f ; infact a L2 ISR ret path
 
ld r9, [sp, PT_status32]   ; get statu32_l2 (saved in pt_regs)
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/35] cpufreq: Introduce cpufreq_table_validate_and_show()

2013-08-08 Thread Rafael J. Wysocki
On Thursday, August 08, 2013 07:23:36 PM Viresh Kumar wrote:
> On 8 August 2013 19:30, Rafael J. Wysocki  wrote:
> > I'm not going to take this for 3.12, sorry.  Please resend in the 3.12-rc1 /
> > 3.12-rc2 time frame.
> 
> Okay.. By that time I will accumulate all reviews/Acks for it
> and the next patchset that I will send.

OK

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 28/35] cpufreq: s3cx4xx: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Kukjin Kim 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/s3c2416-cpufreq.c | 4 +---
 drivers/cpufreq/s3c24xx-cpufreq.c | 6 ++
 drivers/cpufreq/s3c64xx-cpufreq.c | 5 +
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/cpufreq/s3c2416-cpufreq.c 
b/drivers/cpufreq/s3c2416-cpufreq.c
index 22dcb81..a7a4c61 100644
--- a/drivers/cpufreq/s3c2416-cpufreq.c
+++ b/drivers/cpufreq/s3c2416-cpufreq.c
@@ -494,12 +494,10 @@ static int __init s3c2416_cpufreq_driver_init(struct 
cpufreq_policy *policy)
policy->cpuinfo.transition_latency = (500 * 1000) +
 s3c_freq->regulator_latency;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, s3c_freq->freq_table);
+   ret = cpufreq_table_validate_and_show(policy, s3c_freq->freq_table);
if (ret)
goto err_freq_table;
 
-   cpufreq_frequency_table_get_attr(s3c_freq->freq_table, 0);
-
register_reboot_notifier(_cpufreq_reboot_notifier);
 
return 0;
diff --git a/drivers/cpufreq/s3c24xx-cpufreq.c 
b/drivers/cpufreq/s3c24xx-cpufreq.c
index 8843454..39c1d3e 100644
--- a/drivers/cpufreq/s3c24xx-cpufreq.c
+++ b/drivers/cpufreq/s3c24xx-cpufreq.c
@@ -386,10 +386,8 @@ static int s3c_cpufreq_init(struct cpufreq_policy *policy)
/* feed the latency information from the cpu driver */
policy->cpuinfo.transition_latency = cpu_cur.info->latency;
 
-   if (ftab) {
-   cpufreq_frequency_table_cpuinfo(policy, ftab);
-   cpufreq_frequency_table_get_attr(ftab, policy->cpu);
-   }
+   if (ftab)
+   return cpufreq_table_validate_and_show(policy, ftab);
 
return 0;
 }
diff --git a/drivers/cpufreq/s3c64xx-cpufreq.c 
b/drivers/cpufreq/s3c64xx-cpufreq.c
index 9024043..872f74d 100644
--- a/drivers/cpufreq/s3c64xx-cpufreq.c
+++ b/drivers/cpufreq/s3c64xx-cpufreq.c
@@ -251,15 +251,12 @@ static int s3c64xx_cpufreq_driver_init(struct 
cpufreq_policy *policy)
 */
policy->cpuinfo.transition_latency = (500 * 1000) + regulator_latency;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, s3c64xx_freq_table);
+   ret = cpufreq_table_validate_and_show(policy, s3c64xx_freq_table);
if (ret != 0) {
pr_err("Failed to configure frequency table: %d\n",
   ret);
regulator_put(vddarm);
clk_put(armclk);
-   } else {
-   cpufreq_frequency_table_get_attr(s3c64xx_freq_table,
-   policy->cpu);
}
 
return ret;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] ARC: gdbserver breakage in Big-Endian configuration #1

2013-08-08 Thread Vineet Gupta
Exception handling keeps additional state (whether the exception was a Trap
and whether it was due to a breakpoint) in pt_regs->event, a bitfield member

unsigned long orig_r8:16, event:16;

A bitfield essentially has an "offset" and a "length". What I wasn't
aware of was that bitfields in a union lose the "offset" attribute
and all of them are laid out at offset 0.

This obviously means that both @event and @orig_r8 will be incorrectly
referenced at the same "0" offset by "C" generated code, which is
certainly wrong, not because both members are accessed, but because asm
code updates them at different addresses.

In Little Endian config, @event is at offset 0 and @orig_r8 (not
actively used at all) clashing with it is OK. However in Big Endian
config,

ST 0x_, [addr]

writes 0x to @event (offset 2 in memory), while "C" code references
it from offset 0.

Needless to say, this causes the ptrace machinery to not detect the breakpoint
scenario (and an incorrect stop_pc is returned to gdbserver).
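
To make the layout issue concrete, here is a small userspace demo
(illustration only, not kernel code; it relies on GCC's union bitfield
layout and anonymous-struct support):

#include <stdio.h>

/*
 * Two bitfields declared directly inside a union are two separate
 * union members, so both start at bit offset 0 and alias each other.
 * Wrapping them in an anonymous struct restores the intended
 * side-by-side packing, which is what the patch below does.
 */
union broken {
	unsigned long a:16, b:16;	/* both members, both at offset 0 */
	long word;
};

union fixed {
	struct {
		unsigned long a:16, b:16;	/* packed into one word */
	};
	long word;
};

int main(void)
{
	union broken u1 = { .word = 0 };
	union fixed  u2 = { .word = 0 };

	u1.b = 0xABCD;		/* clobbers u1.a: same storage */
	u2.b = 0xABCD;		/* lands next to u2.a, as intended */

	printf("broken: a=%#lx\n", (unsigned long)u1.a);	/* 0xabcd */
	printf("fixed:  a=%#lx\n", (unsigned long)u2.a);	/* 0 */
	return 0;
}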

-->8---
This issue is already fixed in the mainline 3.11 kernel as part of commit:
502a0c775c7f0a "ARC: pt_regs update #5"

However, that patch has a lot more changes than I would like to backport,
hence this separate change.
-->8---

Reported-by: Noam Camus 
Cc:  # [3.9 and 3.10 only]
Tested-by: Anton Kolesov 
Signed-off-by: Vineet Gupta 
---
 arch/arc/include/asm/ptrace.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 6179de7..2046a89 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -52,12 +52,14 @@ struct pt_regs {
 
/*to distinguish bet excp, syscall, irq */
union {
+   struct {
 #ifdef CONFIG_CPU_BIG_ENDIAN
/* so that assembly code is same for LE/BE */
unsigned long orig_r8:16, event:16;
 #else
unsigned long event:16, orig_r8:16;
 #endif
+   };
long orig_r8_word;
};
 };
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] kernel/sys.c: return the current gid when error occurs

2013-08-08 Thread Michael Kerrisk (man-pages)
On 08/08/13 03:48, Chen Gang wrote:
> On 08/08/2013 09:35 AM, Andy Lutomirski wrote:
>> On Wed, Aug 7, 2013 at 6:30 PM, Chen Gang  wrote:
>>> On 08/08/2013 12:58 AM, Andy Lutomirski wrote:
 On Wed, Aug 7, 2013 at 9:21 AM, Oleg Nesterov  wrote:
> On 08/06, Andy Lutomirski wrote:
>>
>> I assume that what the man page means is that the return value is
>> whatever fsgid was prior to the call.  On error, fsgid isn't changed, so
>> the return value is still "current".
>
> Probably... Still
>
> On success, the previous value of fsuid is returned.
> On error, the current value of fsuid is returned.
>
> looks confusing. sys_setfsuid() always returns the old value.
>
>> (FWIW, this behavior is awful and is probably the cause of a security
>> bug or three, since success and failure are indistinguishable.
>
> At least this all looks strange.
>
> I dunno if we can change this old behaviour. I won't be surprized
> if someone already uses setfsuid(-1) as getfsuid().

>>>
>>> Oh, really it is.
>>>
>>> Hmm... as a pair function, we need add getfsuid() too, if we do not add
>>> it, it will make negative effect with setfsuid().
>>>
>>> Since it is a system call, we have to keep compitable.
>>>
>>> So in my opinion, better add getfsuid2()/setfsuid2() instead of current
>>> setfsuid()
>>
>> How about getfsuid() and setfsuid2()?
>>
> 
> Hmm... I have 2 reasons, please check.
> 
> 1st reason: I checked history (just like Kees Cook suggested),
> getfsuid() is mentioned before (you can google to find it), so need use
> getfsuid2() to bypass the history complex.
> 
> And 2nd reason: getfsuid() seems more like the pair of setfsuid(), not
> for setfsuid2().

Time to apply the brakes... *Why* add new system calls here? I don't 
think there is any good reason. Yes, the existing APIs are rubbish,
but, as far as I can tell, they are also obsolete and unneeded.
The fsuid/fsgid mechanism was a bizarre Linuxism whose only purpose
was (as far as I know) to allow for the fact that Linux long ago
applied nonstandard rules when determining when one process could 
send signals to another. Quoting some book on the subject:

Why does Linux provide the file-system IDs and in what 
circumstances would we want the effective and file-system 
IDs to differ? The reasons are primarily historical.
The file-system IDs first appeared in Linux 1.2. In 
that kernel version, one process could send a signal to 
another if the effective user ID of the sender matched
the real or effective user ID of the target process. 
This affected certain programs such as the Linux NFS 
(Network File System) server program, which needed to be
able to access files as though it had the effective IDs 
of the corresponding client process. However, if the NFS 
server changed its effective user ID, it would be 
vulnerable to signals from unprivileged user processes. 
To prevent this possibility, the separate file-system user
and group IDs were devised. By leaving its effective IDs
unchanged, but changing its file-system IDs, the NFS 
server could masquerade as another user for the purpose of 
accessing files without being vulnerable to signals from
user processes.

From kernel 2.0 onward, Linux adopted the SUSv3-mandated 
rules regarding permission for sending signals, and these 
rules don't involve the effective user ID of the target
process. Thus, the file-system ID feature is no longer 
strictly necessary (a process can nowadays achieve the 
desired results by making judicious use of the system
calls described later in this chapter to change
the value of the effective user ID to and from an
unprivileged value, as required), but it remains for
compatibility with existing software.

So, I don't think anything needs fixing: there should be no 
new users of these system calls anyway.

Cheers,

Michael


-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
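
As an aside on the return-value ambiguity discussed earlier in this
thread, a minimal userspace illustration (hedged example; assumes glibc
and that -1 is never a valid fsuid):

#define _GNU_SOURCE
#include <sys/fsuid.h>
#include <stdio.h>

int main(void)
{
	/* setfsuid() always returns the *previous* fsuid, so the return
	 * value alone cannot tell a failed call from a successful one. */
	int before = setfsuid(1000);	/* attempt to change the fsuid */
	int after  = setfsuid(-1);	/* invalid uid: no change, reads it back */

	printf("previous fsuid: %d, current fsuid: %d\n", before, after);
	/* The first call succeeded only if after == 1000. */
	return 0;
}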


Re: [PATCH v9 03/16] iommu/exynos: fix page table maintenance

2013-08-08 Thread Tomasz Figa
On Thursday 08 of August 2013 18:37:43 Cho KyongHo wrote:
> This prevents allocating lv2 page table for the lv1 page table entry
  ^ What this is this this about? :)

> that already has 1MB page mapping. In addition, changed to BUG_ON
> instead of returning -EADDRINUSE.

The change mentioned in last sentence should be a separate patch.

> Signed-off-by: Cho KyongHo 
> ---
>  drivers/iommu/exynos-iommu.c |   68
> - 1 files changed, 40
> insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index d545a25..d90e6fa 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -52,11 +52,11 @@
>  #define lv2ent_large(pent) ((*(pent) & 3) == 1)
> 
>  #define section_phys(sent) (*(sent) & SECT_MASK)
> -#define section_offs(iova) ((iova) & 0xF)
> +#define section_offs(iova) ((iova) & ~SECT_MASK)
>  #define lpage_phys(pent) (*(pent) & LPAGE_MASK)
> -#define lpage_offs(iova) ((iova) & 0x)
> +#define lpage_offs(iova) ((iova) & ~LPAGE_MASK)
>  #define spage_phys(pent) (*(pent) & SPAGE_MASK)
> -#define spage_offs(iova) ((iova) & 0xFFF)
> +#define spage_offs(iova) ((iova) & ~SPAGE_MASK)
> 
>  #define lv1ent_offset(iova) ((iova) >> SECT_ORDER)
>  #define lv2ent_offset(iova) (((iova) & 0xFF000) >> SPAGE_ORDER)
> @@ -856,13 +856,15 @@ finish:
>  static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long
> iova, short *pgcounter)
>  {
> + BUG_ON(lv1ent_section(sent));

Is this condition really a critical one, to the point that the system 
should not continue execution?

> +
>   if (lv1ent_fault(sent)) {
>   unsigned long *pent;
> 
>   pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
>   BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
>   if (!pent)
> - return NULL;
> + return ERR_PTR(-ENOMEM);
> 
>   *sent = mk_lv1ent_page(__pa(pent));
>   *pgcounter = NUM_LV2ENTRIES;
> @@ -875,15 +877,11 @@ static unsigned long *alloc_lv2entry(unsigned long
> *sent, unsigned long iova,
> 
>  static int lv1set_section(unsigned long *sent, phys_addr_t paddr, short
> *pgcnt) {
> - if (lv1ent_section(sent))
> - return -EADDRINUSE;
> + BUG_ON(lv1ent_section(sent));

Ditto.

>   if (lv1ent_page(sent)) {
> - if (*pgcnt != NUM_LV2ENTRIES)
> - return -EADDRINUSE;
> -
> + BUG_ON(*pgcnt != NUM_LV2ENTRIES);

Ditto.

>   kfree(page_entry(sent, 0));
> -
>   *pgcnt = 0;
>   }
> 
> @@ -894,24 +892,24 @@ static int lv1set_section(unsigned long *sent,
> phys_addr_t paddr, short *pgcnt) return 0;
>  }
> 
> +static void clear_page_table(unsigned long *ent, int n)
> +{
> + if (n > 0)
> + memset(ent, 0, sizeof(*ent) * n);
> +}

I don't see the point of creating this function. It seems to be used only
once, and with a constant n, so the check for n > 0 is
unnecessary.

And even if there is a need for this change, it should be done in a separate
patch, as this one is not about stylistic changes, but about fixing page table
maintenance (at least based on your commit message).

>  static int lv2set_page(unsigned long *pent, phys_addr_t paddr, size_t
> size, short *pgcnt)
>  {
>   if (size == SPAGE_SIZE) {
> - if (!lv2ent_fault(pent))
> - return -EADDRINUSE;
> -
> + BUG_ON(!lv2ent_fault(pent));

Ditto.

>   *pent = mk_lv2ent_spage(paddr);
>   pgtable_flush(pent, pent + 1);
>   *pgcnt -= 1;
>   } else { /* size == LPAGE_SIZE */
>   int i;
>   for (i = 0; i < SPAGES_PER_LPAGE; i++, pent++) {
> - if (!lv2ent_fault(pent)) {
> - memset(pent, 0, sizeof(*pent) * i);
> - return -EADDRINUSE;
> - }
> -
> + BUG_ON(!lv2ent_fault(pent));

Ditto.

>   *pent = mk_lv2ent_lpage(paddr);
>   }
>   pgtable_flush(pent - SPAGES_PER_LPAGE, pent);
> @@ -944,17 +942,16 @@ static int exynos_iommu_map(struct iommu_domain
> *domain, unsigned long iova, pent = alloc_lv2entry(entry, iova,
>   >lv2entcnt[lv1ent_offset(iova)]);
> 
> - if (!pent)
> - ret = -ENOMEM;
> + if (IS_ERR(pent))
> + ret = PTR_ERR(pent);
>   else
>   ret = lv2set_page(pent, paddr, size,
>   >lv2entcnt[lv1ent_offset(iova)]);
>   }
> 
> - if (ret) {
> - pr_debug("%s: Failed to map iova 0x%lx/0x%x bytes\n",
> - __func__, iova, size);
> - }
> + if (ret)
> + pr_err("%s: Failed(%d) to map 0x%#x bytes @ %#lx\n",
> + 

[PATCH 35/35] cpufreq: tegra: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Stephen Warren 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/tegra-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/tegra-cpufreq.c b/drivers/cpufreq/tegra-cpufreq.c
index cd66b85..b50d2c4 100644
--- a/drivers/cpufreq/tegra-cpufreq.c
+++ b/drivers/cpufreq/tegra-cpufreq.c
@@ -215,8 +215,7 @@ static int tegra_cpu_init(struct cpufreq_policy *policy)
clk_prepare_enable(emc_clk);
clk_prepare_enable(cpu_clk);
 
-   cpufreq_frequency_table_cpuinfo(policy, freq_table);
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
+   cpufreq_table_validate_and_show(policy, freq_table);
policy->cur = tegra_getspeed(policy->cpu);
target_cpu_speed[policy->cpu] = policy->cur;
 
@@ -233,7 +232,6 @@ static int tegra_cpu_init(struct cpufreq_policy *policy)
 
 static int tegra_cpu_exit(struct cpufreq_policy *policy)
 {
-   cpufreq_frequency_table_cpuinfo(policy, freq_table);
clk_disable_unprepare(emc_clk);
return 0;
 }
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 30/35] cpufreq: sc520: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/sc520_freq.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/cpufreq/sc520_freq.c b/drivers/cpufreq/sc520_freq.c
index d6f6c6f..bb9c0de 100644
--- a/drivers/cpufreq/sc520_freq.c
+++ b/drivers/cpufreq/sc520_freq.c
@@ -106,7 +106,6 @@ static int sc520_freq_target(struct cpufreq_policy *policy,
 static int sc520_freq_cpu_init(struct cpufreq_policy *policy)
 {
struct cpuinfo_x86 *c = _data(0);
-   int result;
 
/* capability check */
if (c->x86_vendor != X86_VENDOR_AMD ||
@@ -117,13 +116,7 @@ static int sc520_freq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 100; /* 1ms */
policy->cur = sc520_freq_get_cpu_frequency(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, sc520_freq_table);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(sc520_freq_table, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, sc520_freq_table);
 }
 
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 25/35] cpufreq: powernow: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/powernow-k6.c | 9 +
 drivers/cpufreq/powernow-k7.c | 4 +---
 drivers/cpufreq/powernow-k8.c | 4 +---
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/powernow-k6.c b/drivers/cpufreq/powernow-k6.c
index 85f1c8c..ab1de0d 100644
--- a/drivers/cpufreq/powernow-k6.c
+++ b/drivers/cpufreq/powernow-k6.c
@@ -145,7 +145,6 @@ static int powernow_k6_target(struct cpufreq_policy *policy,
 static int powernow_k6_cpu_init(struct cpufreq_policy *policy)
 {
unsigned int i, f;
-   int result;
 
if (policy->cpu != 0)
return -ENODEV;
@@ -167,13 +166,7 @@ static int powernow_k6_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 20;
policy->cur = busfreq * max_multiplier;
 
-   result = cpufreq_frequency_table_cpuinfo(policy, clock_ratio);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(clock_ratio, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, clock_ratio);
 }
 
 
diff --git a/drivers/cpufreq/powernow-k7.c b/drivers/cpufreq/powernow-k7.c
index 14ce480..c863c13 100644
--- a/drivers/cpufreq/powernow-k7.c
+++ b/drivers/cpufreq/powernow-k7.c
@@ -680,9 +680,7 @@ static int powernow_cpu_init(struct cpufreq_policy *policy)
 
policy->cur = powernow_get(0);
 
-   cpufreq_frequency_table_get_attr(powernow_table, policy->cpu);
-
-   return cpufreq_frequency_table_cpuinfo(policy, powernow_table);
+   return cpufreq_table_validate_and_show(policy, powernow_table);
 }
 
 static int powernow_cpu_exit(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index 2344a9e..8d4114a 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -1156,7 +1156,7 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
pr_debug("policy current frequency %d kHz\n", pol->cur);
 
/* min/max the cpu is capable of */
-   if (cpufreq_frequency_table_cpuinfo(pol, data->powernow_table)) {
+   if (cpufreq_table_validate_and_show(pol, data->powernow_table)) {
printk(KERN_ERR FW_BUG PFX "invalid powernow_table\n");
powernow_k8_cpu_exit_acpi(data);
kfree(data->powernow_table);
@@ -1164,8 +1164,6 @@ static int powernowk8_cpu_init(struct cpufreq_policy *pol)
return -EINVAL;
}
 
-   cpufreq_frequency_table_get_attr(data->powernow_table, pol->cpu);
-
pr_debug("cpu_init done, current fid 0x%x, vid 0x%x\n",
 data->currfid, data->currvid);
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 31/35] cpufreq: sh: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Paul Mundt 
Cc: linux...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/sh-cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/sh-cpufreq.c b/drivers/cpufreq/sh-cpufreq.c
index ffc6d24..1362e88 100644
--- a/drivers/cpufreq/sh-cpufreq.c
+++ b/drivers/cpufreq/sh-cpufreq.c
@@ -120,9 +120,9 @@ static int sh_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
if (freq_table) {
int result;
 
-   result = cpufreq_frequency_table_cpuinfo(policy, freq_table);
-   if (!result)
-   cpufreq_frequency_table_get_attr(freq_table, cpu);
+   result = cpufreq_table_validate_and_show(policy, freq_table);
+   if (result)
+   return result;
} else {
dev_notice(dev, "no frequency table found, falling back "
   "to rate rounding.\n");
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/35] cpufreq: Introduce cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
On 8 August 2013 19:30, Rafael J. Wysocki  wrote:
> I'm not going to take this for 3.12, sorry.  Please resend in the 3.12-rc1 /
> 3.12-rc2 time frame.

Okay.. By that time I will accumulate all reviews/Acks for it
and the next patchset that I will send.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 27/35] cpufreq: pxa: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Eric Miao 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/pxa2xx-cpufreq.c | 6 ++
 drivers/cpufreq/pxa3xx-cpufreq.c | 8 ++--
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/pxa2xx-cpufreq.c b/drivers/cpufreq/pxa2xx-cpufreq.c
index c7ea005..a429d7c 100644
--- a/drivers/cpufreq/pxa2xx-cpufreq.c
+++ b/drivers/cpufreq/pxa2xx-cpufreq.c
@@ -454,12 +454,10 @@ static int pxa_cpufreq_init(struct cpufreq_policy *policy)
pr_info("PXA255 cpufreq using %s frequency table\n",
pxa255_turbo_table ? "turbo" : "run");
 
-   cpufreq_frequency_table_cpuinfo(policy, pxa255_freq_table);
-   cpufreq_frequency_table_get_attr(pxa255_freq_table, 
policy->cpu);
+   cpufreq_table_validate_and_show(policy, pxa255_freq_table);
}
else if (cpu_is_pxa27x()) {
-   cpufreq_frequency_table_cpuinfo(policy, pxa27x_freq_table);
-   cpufreq_frequency_table_get_attr(pxa27x_freq_table, 
policy->cpu);
+   cpufreq_table_validate_and_show(policy, pxa27x_freq_table);
}
 
printk(KERN_INFO "PXA CPU frequency change support initialized\n");
diff --git a/drivers/cpufreq/pxa3xx-cpufreq.c b/drivers/cpufreq/pxa3xx-cpufreq.c
index f53f28d6..89841f5 100644
--- a/drivers/cpufreq/pxa3xx-cpufreq.c
+++ b/drivers/cpufreq/pxa3xx-cpufreq.c
@@ -91,7 +91,7 @@ static int setup_freqs_table(struct cpufreq_policy *policy,
 struct pxa3xx_freq_info *freqs, int num)
 {
struct cpufreq_frequency_table *table;
-   int i, ret;
+   int i;
 
table = kzalloc((num + 1) * sizeof(*table), GFP_KERNEL);
if (table == NULL)
@@ -108,11 +108,7 @@ static int setup_freqs_table(struct cpufreq_policy *policy,
pxa3xx_freqs_num = num;
pxa3xx_freqs_table = table;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, table);
-   if (!ret)
-   cpufreq_frequency_table_get_attr(table, policy->cpu);
-
-   return ret;
+   return cpufreq_table_validate_and_show(policy, table);
 }
 
 static void __update_core_freq(struct pxa3xx_freq_info *info)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 26/35] cpufreq: ppc: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/ppc-corenet-cpufreq.c | 3 +--
 drivers/cpufreq/ppc_cbe_cpufreq.c | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/ppc-corenet-cpufreq.c 
b/drivers/cpufreq/ppc-corenet-cpufreq.c
index 60e81d5..5716b44 100644
--- a/drivers/cpufreq/ppc-corenet-cpufreq.c
+++ b/drivers/cpufreq/ppc-corenet-cpufreq.c
@@ -202,7 +202,7 @@ static int corenet_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
table[i].frequency = CPUFREQ_TABLE_END;
 
/* set the min and max frequency properly */
-   ret = cpufreq_frequency_table_cpuinfo(policy, table);
+   ret = cpufreq_table_validate_and_show(policy, table);
if (ret) {
pr_err("invalid frequency table: %d\n", ret);
goto err_nomem1;
@@ -219,7 +219,6 @@ static int corenet_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
policy->cur = corenet_cpufreq_get_speed(policy->cpu);
 
-   cpufreq_frequency_table_get_attr(table, cpu);
of_node_put(np);
 
return 0;
diff --git a/drivers/cpufreq/ppc_cbe_cpufreq.c 
b/drivers/cpufreq/ppc_cbe_cpufreq.c
index 2e448f0..6c5be63 100644
--- a/drivers/cpufreq/ppc_cbe_cpufreq.c
+++ b/drivers/cpufreq/ppc_cbe_cpufreq.c
@@ -123,11 +123,9 @@ static int cbe_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
cpumask_copy(policy->cpus, cpu_sibling_mask(policy->cpu));
 #endif
 
-   cpufreq_frequency_table_get_attr(cbe_freqs, policy->cpu);
-
/* this ensures that policy->cpuinfo_min
 * and policy->cpuinfo_max are set correctly */
-   return cpufreq_frequency_table_cpuinfo(policy, cbe_freqs);
+   return cpufreq_table_validate_and_show(policy, cbe_freqs);
 }
 
 static int cbe_cpufreq_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 34/35] cpufreq: speedstep: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Lets use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: David S. Miller 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/speedstep-centrino.c | 10 +-
 drivers/cpufreq/speedstep-ich.c  |  9 +
 drivers/cpufreq/speedstep-smi.c  |  8 +---
 3 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/drivers/cpufreq/speedstep-centrino.c 
b/drivers/cpufreq/speedstep-centrino.c
index f897d51..f180561 100644
--- a/drivers/cpufreq/speedstep-centrino.c
+++ b/drivers/cpufreq/speedstep-centrino.c
@@ -345,7 +345,6 @@ static int centrino_cpu_init(struct cpufreq_policy *policy)
struct cpuinfo_x86 *cpu = _data(policy->cpu);
unsigned freq;
unsigned l, h;
-   int ret;
int i;
 
/* Only Intel makes Enhanced Speedstep-capable CPUs */
@@ -402,15 +401,8 @@ static int centrino_cpu_init(struct cpufreq_policy *policy)
 
pr_debug("centrino_cpu_init: cur=%dkHz\n", policy->cur);
 
-   ret = cpufreq_frequency_table_cpuinfo(policy,
+   return cpufreq_table_validate_and_show(policy,
per_cpu(centrino_model, policy->cpu)->op_points);
-   if (ret)
-   return (ret);
-
-   cpufreq_frequency_table_get_attr(
-   per_cpu(centrino_model, policy->cpu)->op_points, policy->cpu);
-
-   return 0;
 }
 
 static int centrino_cpu_exit(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/speedstep-ich.c b/drivers/cpufreq/speedstep-ich.c
index 5355abb..86a184e 100644
--- a/drivers/cpufreq/speedstep-ich.c
+++ b/drivers/cpufreq/speedstep-ich.c
@@ -320,7 +320,6 @@ static void get_freqs_on_cpu(void *_get_freqs)
 
 static int speedstep_cpu_init(struct cpufreq_policy *policy)
 {
-   int result;
unsigned int policy_cpu, speed;
struct get_freqs gf;
 
@@ -349,13 +348,7 @@ static int speedstep_cpu_init(struct cpufreq_policy 
*policy)
/* cpuinfo and default policy values */
policy->cur = speed;
 
-   result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, speedstep_freqs);
 }
 
 
diff --git a/drivers/cpufreq/speedstep-smi.c b/drivers/cpufreq/speedstep-smi.c
index abfba4f..f4d0318 100644
--- a/drivers/cpufreq/speedstep-smi.c
+++ b/drivers/cpufreq/speedstep-smi.c
@@ -329,13 +329,7 @@ static int speedstep_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
policy->cur = speed;
 
-   result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, speedstep_freqs);
 }
 
 static int speedstep_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 33/35] cpufreq: spear: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: spear-de...@list.st.com
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/spear-cpufreq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/spear-cpufreq.c b/drivers/cpufreq/spear-cpufreq.c
index c3efa7f..1d619dd 100644
--- a/drivers/cpufreq/spear-cpufreq.c
+++ b/drivers/cpufreq/spear-cpufreq.c
@@ -178,13 +178,12 @@ static int spear_cpufreq_init(struct cpufreq_policy 
*policy)
 {
int ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, spear_cpufreq.freq_tbl);
+   ret = cpufreq_table_validate_and_show(policy, spear_cpufreq.freq_tbl);
if (ret) {
-   pr_err("cpufreq_frequency_table_cpuinfo() failed");
+   pr_err("cpufreq_table_validate_and_show() failed");
return ret;
}
 
-   cpufreq_frequency_table_get_attr(spear_cpufreq.freq_tbl, policy->cpu);
policy->cpuinfo.transition_latency = spear_cpufreq.transition_latency;
policy->cur = spear_cpufreq_get(0);
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 29/35] cpufreq: s5pv210: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Kukjin Kim 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/s5pv210-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/s5pv210-cpufreq.c 
b/drivers/cpufreq/s5pv210-cpufreq.c
index 5c77570..c266a7e 100644
--- a/drivers/cpufreq/s5pv210-cpufreq.c
+++ b/drivers/cpufreq/s5pv210-cpufreq.c
@@ -553,11 +553,9 @@ static int __init s5pv210_cpu_init(struct cpufreq_policy 
*policy)
 
policy->cur = policy->min = policy->max = s5pv210_getspeed(0);
 
-   cpufreq_frequency_table_get_attr(s5pv210_freq_table, policy->cpu);
-
policy->cpuinfo.transition_latency = 40000;
 
-   return cpufreq_frequency_table_cpuinfo(policy, s5pv210_freq_table);
+   return cpufreq_table_validate_and_show(policy, s5pv210_freq_table);
 
 out_dmc1:
clk_put(dmc0_clk);
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 32/35] cpufreq: sparc: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/sparc-us2e-cpufreq.c | 6 +-
 drivers/cpufreq/sparc-us3-cpufreq.c  | 7 +--
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/cpufreq/sparc-us2e-cpufreq.c 
b/drivers/cpufreq/sparc-us2e-cpufreq.c
index 80e6d92..e35840d 100644
--- a/drivers/cpufreq/sparc-us2e-cpufreq.c
+++ b/drivers/cpufreq/sparc-us2e-cpufreq.c
@@ -307,7 +307,6 @@ static int __init us2e_freq_cpu_init(struct cpufreq_policy 
*policy)
unsigned long clock_tick = sparc64_get_clock_tick(cpu) / 1000;
struct cpufreq_frequency_table *table =
&us2e_freq_table[cpu].table[0];
-   int ret;
 
table[0].driver_data = 0;
table[0].frequency = clock_tick / 1;
@@ -325,10 +324,7 @@ static int __init us2e_freq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 0;
policy->cur = clock_tick;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, table);
-   if (!ret)
-   cpufreq_frequency_table_get_attr(table, policy->cpu);
-   return ret;
+   return cpufreq_table_validate_and_show(policy, table);
 }
 
 static int us2e_freq_cpu_exit(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/sparc-us3-cpufreq.c 
b/drivers/cpufreq/sparc-us3-cpufreq.c
index 73a90de..23f2d3b 100644
--- a/drivers/cpufreq/sparc-us3-cpufreq.c
+++ b/drivers/cpufreq/sparc-us3-cpufreq.c
@@ -168,7 +168,6 @@ static int __init us3_freq_cpu_init(struct cpufreq_policy 
*policy)
unsigned long clock_tick = sparc64_get_clock_tick(cpu) / 1000;
struct cpufreq_frequency_table *table =
&us3_freq_table[cpu].table[0];
-   int ret;
 
table[0].driver_data = 0;
table[0].frequency = clock_tick / 1;
@@ -182,11 +181,7 @@ static int __init us3_freq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 0;
policy->cur = clock_tick;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, table);
-   if (!ret)
-   cpufreq_frequency_table_get_attr(table, policy->cpu);
-
-   return ret;
+   return cpufreq_table_validate_and_show(policy, table);
 }
 
 static int us3_freq_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 21/35] cpufreq: omap: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Santosh Shilimkar 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/omap-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/omap-cpufreq.c b/drivers/cpufreq/omap-cpufreq.c
index f31fcfc..b68ce4e 100644
--- a/drivers/cpufreq/omap-cpufreq.c
+++ b/drivers/cpufreq/omap-cpufreq.c
@@ -191,12 +191,10 @@ static int omap_cpu_init(struct cpufreq_policy *policy)
 
atomic_inc_return(&freq_table_users);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+   result = cpufreq_table_validate_and_show(policy, freq_table);
if (result)
goto fail_table;
 
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
-
policy->cur = omap_getspeed(policy->cpu);
 
/*
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 22/35] cpufreq: p4-clockmod: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: David S. Miller 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/p4-clockmod.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/p4-clockmod.c b/drivers/cpufreq/p4-clockmod.c
index 2f0a2a6..03478bf 100644
--- a/drivers/cpufreq/p4-clockmod.c
+++ b/drivers/cpufreq/p4-clockmod.c
@@ -230,7 +230,6 @@ static int cpufreq_p4_cpu_init(struct cpufreq_policy 
*policy)
else
p4clockmod_table[i].frequency = (stock_freq * i)/8;
}
-   cpufreq_frequency_table_get_attr(p4clockmod_table, policy->cpu);
 
/* cpuinfo and default policy values */
 
@@ -239,7 +238,7 @@ static int cpufreq_p4_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 10000001;
policy->cur = stock_freq;
 
-   return cpufreq_frequency_table_cpuinfo(policy, &p4clockmod_table[0]);
+   return cpufreq_table_validate_and_show(policy, &p4clockmod_table[0]);
 }
 
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 19/35] cpufreq: loongson2: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: John Crispin 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/loongson2_cpufreq.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/cpufreq/loongson2_cpufreq.c 
b/drivers/cpufreq/loongson2_cpufreq.c
index 7bc3c44..5dd3692 100644
--- a/drivers/cpufreq/loongson2_cpufreq.c
+++ b/drivers/cpufreq/loongson2_cpufreq.c
@@ -133,10 +133,7 @@ static int loongson2_cpufreq_cpu_init(struct 
cpufreq_policy *policy)
 
policy->cur = loongson2_cpufreq_get(policy->cpu);
 
-   cpufreq_frequency_table_get_attr(&loongson2_clockmod_table[0],
-policy->cpu);
-
-   return cpufreq_frequency_table_cpuinfo(policy,
+   return cpufreq_table_validate_and_show(policy,
&loongson2_clockmod_table[0]);
 }
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 24/35] cpufreq: pmac: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/pmac32-cpufreq.c | 3 +--
 drivers/cpufreq/pmac64-cpufreq.c | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/pmac32-cpufreq.c b/drivers/cpufreq/pmac32-cpufreq.c
index 38cdc63..0b3efdb 100644
--- a/drivers/cpufreq/pmac32-cpufreq.c
+++ b/drivers/cpufreq/pmac32-cpufreq.c
@@ -407,8 +407,7 @@ static int pmac_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency  = transition_latency;
policy->cur = cur_freq;
 
-   cpufreq_frequency_table_get_attr(pmac_cpu_freqs, policy->cpu);
-   return cpufreq_frequency_table_cpuinfo(policy, pmac_cpu_freqs);
+   return cpufreq_table_validate_and_show(policy, pmac_cpu_freqs);
 }
 
 static u32 read_gpio(struct device_node *np)
diff --git a/drivers/cpufreq/pmac64-cpufreq.c b/drivers/cpufreq/pmac64-cpufreq.c
index b6850d9..366be61 100644
--- a/drivers/cpufreq/pmac64-cpufreq.c
+++ b/drivers/cpufreq/pmac64-cpufreq.c
@@ -362,10 +362,8 @@ static int g5_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 * cpufreq core if in the secondary policy we tell it that
 * it actually must be one policy together with all others. */
cpumask_copy(policy->cpus, cpu_online_mask);
-   cpufreq_frequency_table_get_attr(g5_cpu_freqs, policy->cpu);
 
-   return cpufreq_frequency_table_cpuinfo(policy,
-   g5_cpu_freqs);
+   return cpufreq_table_validate_and_show(policy, g5_cpu_freqs);
 }
 
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 15/35] cpufreq: ia64-acpi: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Tony Luck 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/ia64-acpi-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/ia64-acpi-cpufreq.c 
b/drivers/cpufreq/ia64-acpi-cpufreq.c
index 3e14f03..6cfad51 100644
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -335,7 +335,7 @@ acpi_cpufreq_cpu_init (
}
}
 
-   result = cpufreq_frequency_table_cpuinfo(policy, data->freq_table);
+   result = cpufreq_table_validate_and_show(policy, data->freq_table);
if (result) {
goto err_freqfree;
}
@@ -356,8 +356,6 @@ acpi_cpufreq_cpu_init (
(u32) data->acpi_data.states[i].status,
(u32) data->acpi_data.states[i].control);
 
-   cpufreq_frequency_table_get_attr(data->freq_table, policy->cpu);
-
/* the first call to ->target() should result in us actually
 * writing something to the appropriate registers. */
data->resume = 1;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 18/35] cpufreq: longhaul: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/longhaul.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index 4ada1cc..70b66fd 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -921,13 +921,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = 200000; /* nsec */
policy->cur = calc_speed(longhaul_get_cpu_mult());
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, longhaul_table);
-   if (ret)
-   return ret;
-
-   cpufreq_frequency_table_get_attr(longhaul_table, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, longhaul_table);
 }
 
 static int longhaul_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 23/35] cpufreq: pasemi: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/pasemi-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
index 534e43a..23bc8a82 100644
--- a/drivers/cpufreq/pasemi-cpufreq.c
+++ b/drivers/cpufreq/pasemi-cpufreq.c
@@ -219,12 +219,10 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 
ppc_proc_freq = policy->cur * 1000ul;
 
-   cpufreq_frequency_table_get_attr(pas_freqs, policy->cpu);
-
/* this ensures that policy->cpuinfo_min and policy->cpuinfo_max
 * are set correctly
 */
-   return cpufreq_frequency_table_cpuinfo(policy, pas_freqs);
+   return cpufreq_table_validate_and_show(policy, pas_freqs);
 
 out_unmap_sdcpwr:
iounmap(sdcpwr_mapbase);
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 16/35] cpufreq: imx6q: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Shawn Guo 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/imx6q-cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
index e37cdae..e6f40fa 100644
--- a/drivers/cpufreq/imx6q-cpufreq.c
+++ b/drivers/cpufreq/imx6q-cpufreq.c
@@ -177,7 +177,7 @@ static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
 {
int ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+   ret = cpufreq_table_validate_and_show(policy, freq_table);
if (ret) {
dev_err(cpu_dev, "invalid frequency table: %d\n", ret);
return ret;
@@ -186,7 +186,6 @@ static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = transition_latency;
policy->cur = clk_get_rate(arm_clk) / 1000;
cpumask_setall(policy->cpus);
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
 
return 0;
 }
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 20/35] cpufreq: maple: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Dmitry Eremin-Solenikov 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/maple-cpufreq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/maple-cpufreq.c b/drivers/cpufreq/maple-cpufreq.c
index 41c601f..19076cc 100644
--- a/drivers/cpufreq/maple-cpufreq.c
+++ b/drivers/cpufreq/maple-cpufreq.c
@@ -181,10 +181,8 @@ static int maple_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 * cpufreq core if in the secondary policy we tell it that
 * it actually must be one policy together with all others. */
cpumask_setall(policy->cpus);
-   cpufreq_frequency_table_get_attr(maple_cpu_freqs, policy->cpu);
 
-   return cpufreq_frequency_table_cpuinfo(policy,
-   maple_cpu_freqs);
+   return cpufreq_table_validate_and_show(policy, maple_cpu_freqs);
 }
 
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 17/35] cpufreq: kirkwood: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Andrew Lunn 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/kirkwood-cpufreq.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/cpufreq/kirkwood-cpufreq.c 
b/drivers/cpufreq/kirkwood-cpufreq.c
index 45e4d7f..336f171 100644
--- a/drivers/cpufreq/kirkwood-cpufreq.c
+++ b/drivers/cpufreq/kirkwood-cpufreq.c
@@ -125,19 +125,11 @@ static int kirkwood_cpufreq_target(struct cpufreq_policy 
*policy,
 /* Module init and exit code */
 static int kirkwood_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-   int result;
-
/* cpuinfo and default policy values */
policy->cpuinfo.transition_latency = 5000; /* 5uS */
policy->cur = kirkwood_cpufreq_get_cpu_frequency(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, kirkwood_freq_table);
-   if (result)
-   return result;
-
-   cpufreq_frequency_table_get_attr(kirkwood_freq_table, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, kirkwood_freq_table);
 }
 
 static int kirkwood_cpufreq_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


ARM: AM335x: Reboot broken in 3.11

2013-08-08 Thread Mark Jackson
Rebooting appears to have broken in 3.11 (at some point before rc1).

Here is the console output:-

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 3.11.0-rc1-6-gf550793 (mpfj@mpfj-nanobone) 
(gcc version 4.6.3 (Buildroot 2013.02-dirty) ) #328 Thu Aug 8 14:36:16 BST 2013
...
Welcome to Buildroot
nanobone login: root
Password:
# reboot
#
[   23.867076] UBIFS: background thread "ubifs_bgt0_0" stops
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[   25.924496] reboot: Restarting system

... and at this point the CPU seems to just freeze.

In 3.10, the board would reboot correctly back into U-Boot, etc.

I've also noticed that some of the output LEDs light up dimly while the kernel
is booting, and they come on at full brightness at the reboot "freeze" point.
There are 4 LEDs affected and they are all connected to UART transmit pins.

Before I start bisecting, does anyone have any ideas ?

Cheers
Mark J.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/35] cpufreq: davinci: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Sekhar Nori 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/davinci-cpufreq.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/davinci-cpufreq.c 
b/drivers/cpufreq/davinci-cpufreq.c
index 551dd65..f67196e 100644
--- a/drivers/cpufreq/davinci-cpufreq.c
+++ b/drivers/cpufreq/davinci-cpufreq.c
@@ -140,15 +140,13 @@ static int davinci_cpu_init(struct cpufreq_policy *policy)
 
policy->cur = davinci_getspeed(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+   result = cpufreq_table_validate_and_show(policy, freq_table);
if (result) {
-   pr_err("%s: cpufreq_frequency_table_cpuinfo() failed",
+   pr_err("%s: cpufreq_table_validate_and_show() failed",
__func__);
return result;
}
 
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
-
/*
 * Time measurement across the target() function yields ~1500-1800us
 * time taken with no drivers on notification list.
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 07/35] cpufreq: blackfin: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Steven Miao 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/blackfin-cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/blackfin-cpufreq.c 
b/drivers/cpufreq/blackfin-cpufreq.c
index ef05978..54c0a0c 100644
--- a/drivers/cpufreq/blackfin-cpufreq.c
+++ b/drivers/cpufreq/blackfin-cpufreq.c
@@ -210,8 +210,7 @@ static int __bfin_cpu_init(struct cpufreq_policy *policy)
policy->cpuinfo.transition_latency = 50000; /* 50us assumed */
 
policy->cur = cclk;
-   cpufreq_frequency_table_get_attr(bfin_freq_table, policy->cpu);
-   return cpufreq_frequency_table_cpuinfo(policy, bfin_freq_table);
+   return cpufreq_table_validate_and_show(policy, bfin_freq_table);
 }
 
 static struct freq_attr *bfin_freq_attr[] = {
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 06/35] cpufreq: arm_big_little: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/arm_big_little.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 3549f07..5070273 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -165,7 +165,7 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy)
if (ret)
return ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, freq_table[cur_cluster]);
+   ret = cpufreq_table_validate_and_show(policy, freq_table[cur_cluster]);
if (ret) {
dev_err(cpu_dev, "CPU %d, cluster: %d invalid freq table\n",
policy->cpu, cur_cluster);
@@ -173,8 +173,6 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy)
return ret;
}
 
-   cpufreq_frequency_table_get_attr(freq_table[cur_cluster], policy->cpu);
-
if (arm_bL_ops->get_transition_latency)
policy->cpuinfo.transition_latency =
arm_bL_ops->get_transition_latency(cpu_dev);
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/35] cpufreq: cris: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Jesper Nilsson 
Cc: Mikael Starvik 
Cc: linux-cris-ker...@axis.com
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cris-artpec3-cpufreq.c | 10 +-
 drivers/cpufreq/cris-etraxfs-cpufreq.c | 10 +-
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/cris-artpec3-cpufreq.c 
b/drivers/cpufreq/cris-artpec3-cpufreq.c
index cb8276d..444fd96 100644
--- a/drivers/cpufreq/cris-artpec3-cpufreq.c
+++ b/drivers/cpufreq/cris-artpec3-cpufreq.c
@@ -76,19 +76,11 @@ static int cris_freq_target(struct cpufreq_policy *policy,
 
 static int cris_freq_cpu_init(struct cpufreq_policy *policy)
 {
-   int result;
-
/* cpuinfo and default policy values */
policy->cpuinfo.transition_latency = 1000000; /* 1ms */
policy->cur = cris_freq_get_cpu_frequency(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, cris_freq_table);
-   if (result)
-   return (result);
-
-   cpufreq_frequency_table_get_attr(cris_freq_table, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, cris_freq_table);
 }
 
 
diff --git a/drivers/cpufreq/cris-etraxfs-cpufreq.c 
b/drivers/cpufreq/cris-etraxfs-cpufreq.c
index 72328f7..428395e 100644
--- a/drivers/cpufreq/cris-etraxfs-cpufreq.c
+++ b/drivers/cpufreq/cris-etraxfs-cpufreq.c
@@ -75,19 +75,11 @@ static int cris_freq_target(struct cpufreq_policy *policy,
 
 static int cris_freq_cpu_init(struct cpufreq_policy *policy)
 {
-   int result;
-
/* cpuinfo and default policy values */
policy->cpuinfo.transition_latency = 1000000;   /* 1ms */
policy->cur = cris_freq_get_cpu_frequency(0);
 
-   result = cpufreq_frequency_table_cpuinfo(policy, cris_freq_table);
-   if (result)
-   return (result);
-
-   cpufreq_frequency_table_get_attr(cris_freq_table, policy->cpu);
-
-   return 0;
+   return cpufreq_table_validate_and_show(policy, cris_freq_table);
 }
 
 static int cris_freq_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 08/35] cpufreq: cpufreq-cpu0: use cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Let's use cpufreq_table_validate_and_show() instead of calling
cpufreq_frequency_table_cpuinfo() and cpufreq_frequency_table_get_attr().

Cc: Shawn Guo 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq-cpu0.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index ad1fde2..65d70a3 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -128,7 +128,7 @@ static int cpu0_cpufreq_init(struct cpufreq_policy *policy)
 {
int ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+   ret = cpufreq_table_validate_and_show(policy, freq_table);
if (ret) {
pr_err("invalid frequency table: %d\n", ret);
return ret;
@@ -144,8 +144,6 @@ static int cpu0_cpufreq_init(struct cpufreq_policy *policy)
 */
cpumask_setall(policy->cpus);
 
-   cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
-
return 0;
 }
 
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/35] cpufreq: s3cx4xx: call cpufreq_frequency_table_get_attr()

2013-08-08 Thread Viresh Kumar
This exposes the driver's frequency table to the cpufreq core, which the core
needs in order to work out the index for a target frequency when it calls
cpufreq_frequency_table_target(). So this driver needs to expose it.

Cc: Kukjin Kim 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/s3c24xx-cpufreq.c | 4 +++-
 drivers/cpufreq/s3c64xx-cpufreq.c | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/s3c24xx-cpufreq.c 
b/drivers/cpufreq/s3c24xx-cpufreq.c
index f169ee5..8843454 100644
--- a/drivers/cpufreq/s3c24xx-cpufreq.c
+++ b/drivers/cpufreq/s3c24xx-cpufreq.c
@@ -386,8 +386,10 @@ static int s3c_cpufreq_init(struct cpufreq_policy *policy)
/* feed the latency information from the cpu driver */
policy->cpuinfo.transition_latency = cpu_cur.info->latency;
 
-   if (ftab)
+   if (ftab) {
cpufreq_frequency_table_cpuinfo(policy, ftab);
+   cpufreq_frequency_table_get_attr(ftab, policy->cpu);
+   }
 
return 0;
 }
diff --git a/drivers/cpufreq/s3c64xx-cpufreq.c 
b/drivers/cpufreq/s3c64xx-cpufreq.c
index 8a72b0c..9024043 100644
--- a/drivers/cpufreq/s3c64xx-cpufreq.c
+++ b/drivers/cpufreq/s3c64xx-cpufreq.c
@@ -257,6 +257,9 @@ static int s3c64xx_cpufreq_driver_init(struct 
cpufreq_policy *policy)
   ret);
regulator_put(vddarm);
clk_put(armclk);
+   } else {
+   cpufreq_frequency_table_get_attr(s3c64xx_freq_table,
+   policy->cpu);
}
 
return ret;
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 04/35] cpufreq: sparc: call cpufreq_frequency_table_get_attr()

2013-08-08 Thread Viresh Kumar
This exposes the driver's frequency table to the cpufreq core, which the core
needs in order to work out the index for a target frequency when it calls
cpufreq_frequency_table_target(). So this driver needs to expose it.

Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/sparc-us2e-cpufreq.c | 6 +-
 drivers/cpufreq/sparc-us3-cpufreq.c  | 7 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/sparc-us2e-cpufreq.c 
b/drivers/cpufreq/sparc-us2e-cpufreq.c
index cf5bc2c..80e6d92 100644
--- a/drivers/cpufreq/sparc-us2e-cpufreq.c
+++ b/drivers/cpufreq/sparc-us2e-cpufreq.c
@@ -307,6 +307,7 @@ static int __init us2e_freq_cpu_init(struct cpufreq_policy 
*policy)
unsigned long clock_tick = sparc64_get_clock_tick(cpu) / 1000;
struct cpufreq_frequency_table *table =
&us2e_freq_table[cpu].table[0];
+   int ret;
 
table[0].driver_data = 0;
table[0].frequency = clock_tick / 1;
@@ -324,7 +325,10 @@ static int __init us2e_freq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 0;
policy->cur = clock_tick;
 
-   return cpufreq_frequency_table_cpuinfo(policy, table);
+   ret = cpufreq_frequency_table_cpuinfo(policy, table);
+   if (!ret)
+   cpufreq_frequency_table_get_attr(table, policy->cpu);
+   return ret;
 }
 
 static int us2e_freq_cpu_exit(struct cpufreq_policy *policy)
diff --git a/drivers/cpufreq/sparc-us3-cpufreq.c 
b/drivers/cpufreq/sparc-us3-cpufreq.c
index ac76b48..73a90de 100644
--- a/drivers/cpufreq/sparc-us3-cpufreq.c
+++ b/drivers/cpufreq/sparc-us3-cpufreq.c
@@ -168,6 +168,7 @@ static int __init us3_freq_cpu_init(struct cpufreq_policy 
*policy)
unsigned long clock_tick = sparc64_get_clock_tick(cpu) / 1000;
struct cpufreq_frequency_table *table =
&us3_freq_table[cpu].table[0];
+   int ret;
 
table[0].driver_data = 0;
table[0].frequency = clock_tick / 1;
@@ -181,7 +182,11 @@ static int __init us3_freq_cpu_init(struct cpufreq_policy 
*policy)
policy->cpuinfo.transition_latency = 0;
policy->cur = clock_tick;
 
-   return cpufreq_frequency_table_cpuinfo(policy, table);
+   ret = cpufreq_frequency_table_cpuinfo(policy, table);
+   if (!ret)
+   cpufreq_frequency_table_get_attr(table, policy->cpu);
+
+   return ret;
 }
 
 static int us3_freq_cpu_exit(struct cpufreq_policy *policy)
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 01/35] cpufreq: Add new helper cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
Almost every cpufreq driver is required to validate its frequency table with:
cpufreq_frequency_table_cpuinfo() and then expose it to cpufreq core with:
cpufreq_frequency_table_get_attr().

This patch creates another helper routine cpufreq_table_validate_and_show() that
will do both these steps in a single call and will return 0 for success, error
otherwise.

This also fixes potential bugs in cpufreq drivers where people have called
cpufreq_frequency_table_get_attr() before calling
cpufreq_frequency_table_cpuinfo(), as the latter may fail.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/freq_table.c | 12 
 include/linux/cpufreq.h  |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index f111454a..11f6fa9 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -219,6 +219,18 @@ void cpufreq_frequency_table_put_attr(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpufreq_frequency_table_put_attr);
 
+int cpufreq_table_validate_and_show(struct cpufreq_policy *policy,
+ struct cpufreq_frequency_table *table)
+{
+   int ret = cpufreq_frequency_table_cpuinfo(policy, table);
+
+   if (!ret)
+   cpufreq_frequency_table_get_attr(table, policy->cpu);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(cpufreq_table_validate_and_show);
+
 void cpufreq_frequency_table_update_policy_cpu(struct cpufreq_policy *policy)
 {
pr_debug("Updating show_table for new_cpu %u from last_cpu %u\n",
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d568f39..c0297a6 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -411,5 +411,7 @@ extern struct freq_attr 
cpufreq_freq_attr_scaling_available_freqs;
 void cpufreq_frequency_table_get_attr(struct cpufreq_frequency_table *table,
  unsigned int cpu);
 void cpufreq_frequency_table_put_attr(unsigned int cpu);
+int cpufreq_table_validate_and_show(struct cpufreq_policy *policy,
+ struct cpufreq_frequency_table *table);
 
 #endif /* _LINUX_CPUFREQ_H */
-- 
1.7.12.rc2.18.g61b472e
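
[Editor's note: the sketch below is not part of the original posting. It only
illustrates how a driver's ->init() callback changes once the helper above is
available; the "foo" driver name and the foo_freq_table symbol are made up,
standing in for any driver-owned struct cpufreq_frequency_table array.]

    /* Before: validate the table, then expose it in a second call. */
    static int foo_cpufreq_init_old(struct cpufreq_policy *policy)
    {
            int ret = cpufreq_frequency_table_cpuinfo(policy, foo_freq_table);

            if (ret)
                    return ret;

            cpufreq_frequency_table_get_attr(foo_freq_table, policy->cpu);
            return 0;
    }

    /* After: one call that validates the table and exposes it to the
     * cpufreq core only when validation succeeds. */
    static int foo_cpufreq_init(struct cpufreq_policy *policy)
    {
            return cpufreq_table_validate_and_show(policy, foo_freq_table);
    }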

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 00/35] cpufreq: Introduce cpufreq_table_validate_and_show()

2013-08-08 Thread Viresh Kumar
This is actually part of a bigger patchset which will change the declaration of
cpufreq_driver->target() to take an index instead of target_freq and relation,
and hence cpufreq drivers wouldn't need to call cpufreq_frequency_table_target()
anymore.

Almost every cpufreq driver is required to validate its frequency table with:
cpufreq_frequency_table_cpuinfo() and then expose it to cpufreq core with:
cpufreq_frequency_table_get_attr().

This patch creates another helper routine cpufreq_table_validate_and_show() that
will do both these steps in a single call and will return 0 for success, error
otherwise.

This also fixes potential bugs in cpufreq drivers where people have called
cpufreq_frequency_table_get_attr() before calling
cpufreq_frequency_table_cpuinfo(), as the latter may fail.

Viresh Kumar (35):
  cpufreq: Add new helper cpufreq_table_validate_and_show()
  cpufreq: pxa: call cpufreq_frequency_table_get_attr()
  cpufreq: s3cx4xx: call cpufreq_frequency_table_get_attr()
  cpufreq: sparc: call cpufreq_frequency_table_get_attr()
  cpufreq: acpi-cpufreq: use cpufreq_table_validate_and_show()
  cpufreq: arm_big_little: use cpufreq_table_validate_and_show()
  cpufreq: blackfin: use cpufreq_table_validate_and_show()
  cpufreq: cpufreq-cpu0: use cpufreq_table_validate_and_show()
  cpufreq: cris: use cpufreq_table_validate_and_show()
  cpufreq: davinci: use cpufreq_table_validate_and_show()
  cpufreq: dbx500: use cpufreq_table_validate_and_show()
  cpufreq: e_powersaver: use cpufreq_table_validate_and_show()
  cpufreq: elanfreq: use cpufreq_table_validate_and_show()
  cpufreq: exynos: use cpufreq_table_validate_and_show()
  cpufreq: ia64-acpi: use cpufreq_table_validate_and_show()
  cpufreq: imx6q: use cpufreq_table_validate_and_show()
  cpufreq: kirkwood: use cpufreq_table_validate_and_show()
  cpufreq: longhaul: use cpufreq_table_validate_and_show()
  cpufreq: loongson2: use cpufreq_table_validate_and_show()
  cpufreq: maple: use cpufreq_table_validate_and_show()
  cpufreq: omap: use cpufreq_table_validate_and_show()
  cpufreq: p4-clockmod: use cpufreq_table_validate_and_show()
  cpufreq: pasemi: use cpufreq_table_validate_and_show()
  cpufreq: pmac: use cpufreq_table_validate_and_show()
  cpufreq: powernow: use cpufreq_table_validate_and_show()
  cpufreq: ppc: use cpufreq_table_validate_and_show()
  cpufreq: pxa: use cpufreq_table_validate_and_show()
  cpufreq: s3cx4xx: use cpufreq_table_validate_and_show()
  cpufreq: s5pv210: use cpufreq_table_validate_and_show()
  cpufreq: sc520: use cpufreq_table_validate_and_show()
  cpufreq: sh: use cpufreq_table_validate_and_show()
  cpufreq: sparc: use cpufreq_table_validate_and_show()
  cpufreq: spear: use cpufreq_table_validate_and_show()
  cpufreq: speedstep: use cpufreq_table_validate_and_show()
  cpufreq: tegra: use cpufreq_table_validate_and_show()

 drivers/cpufreq/acpi-cpufreq.c |  4 +---
 drivers/cpufreq/arm_big_little.c   |  4 +---
 drivers/cpufreq/blackfin-cpufreq.c |  3 +--
 drivers/cpufreq/cpufreq-cpu0.c |  4 +---
 drivers/cpufreq/cris-artpec3-cpufreq.c | 10 +-
 drivers/cpufreq/cris-etraxfs-cpufreq.c | 10 +-
 drivers/cpufreq/davinci-cpufreq.c  |  6 ++
 drivers/cpufreq/dbx500-cpufreq.c   |  6 ++
 drivers/cpufreq/e_powersaver.c |  3 +--
 drivers/cpufreq/elanfreq.c |  8 +---
 drivers/cpufreq/exynos-cpufreq.c   |  4 +---
 drivers/cpufreq/exynos5440-cpufreq.c   |  4 +---
 drivers/cpufreq/freq_table.c   | 12 
 drivers/cpufreq/ia64-acpi-cpufreq.c|  4 +---
 drivers/cpufreq/imx6q-cpufreq.c|  3 +--
 drivers/cpufreq/kirkwood-cpufreq.c | 10 +-
 drivers/cpufreq/longhaul.c |  8 +---
 drivers/cpufreq/loongson2_cpufreq.c|  5 +
 drivers/cpufreq/maple-cpufreq.c|  4 +---
 drivers/cpufreq/omap-cpufreq.c |  4 +---
 drivers/cpufreq/p4-clockmod.c  |  3 +--
 drivers/cpufreq/pasemi-cpufreq.c   |  4 +---
 drivers/cpufreq/pmac32-cpufreq.c   |  3 +--
 drivers/cpufreq/pmac64-cpufreq.c   |  4 +---
 drivers/cpufreq/powernow-k6.c  |  9 +
 drivers/cpufreq/powernow-k7.c  |  4 +---
 drivers/cpufreq/powernow-k8.c  |  4 +---
 drivers/cpufreq/ppc-corenet-cpufreq.c  |  3 +--
 drivers/cpufreq/ppc_cbe_cpufreq.c  |  4 +---
 drivers/cpufreq/pxa2xx-cpufreq.c   |  8 +---
 drivers/cpufreq/pxa3xx-cpufreq.c   |  2 +-
 drivers/cpufreq/s3c2416-cpufreq.c  |  4 +---
 drivers/cpufreq/s3c24xx-cpufreq.c  |  2 +-
 drivers/cpufreq/s3c64xx-cpufreq.c  |  2 +-
 drivers/cpufreq/s5pv210-cpufreq.c  |  4 +---
 drivers/cpufreq/sc520_freq.c   |  9 +
 drivers/cpufreq/sh-cpufreq.c   |  6 +++---
 drivers/cpufreq/sparc-us2e-cpufreq.c   |  2 +-
 drivers/cpufreq/sparc-us3-cpufreq.c|  2 +-
 drivers/cpufreq/spear-cpufreq.c|  5 ++---
 drivers/cpufreq/speedstep-centrino.c   | 10 +-
 drivers/cpufreq/speedstep-ich.c|  9 +
 

Re: [edk2] Corrupted EFI region

2013-08-08 Thread Andrew Fish

On Aug 8, 2013, at 3:17 AM, Matt Fleming  wrote:

> On Wed, 07 Aug, at 02:10:28PM, Andrew Fish wrote:
>> Well the issue I see is I don't think OS X or Windows are doing this.
>> So I'm guessing there is some unique thing beings done on the Linux
>> side and we don't have good tests to catch bugs in the EFI
>> implementations. If the Linux loader hides the bugs and we don't hit
>> them with other operating systems they are never going to get fixed.
>> It would be good if we could track down some of these issues and make
>> a request for some tests that can help catch these issues. The tests
>> would be part of UEFI.org, but since some of us play in both worlds we
>> can forward the known issues to the UEFI test work group. 
> 
> I'm all for helping to develop tests that catch these kind of bugs.
> What's the next step?
> 

I'll bring this up with UEFI.org.

>> Is it possible to have a switch to turn off the not required behavior
>> (hiding EFI implementation bugs) so that bad platforms could be
>> detected? This would be a good thing to try on platforms at the
>> upcoming UEFI Plugfest hosted by the Linux Foundation and the UEFI
>> Forum, so the bad behavior can be detected and the vendors can fix the
>> issue. 
> 
> We don't tend to provide switches for the kernel to turn off workarounds
> because users run the risk of inadvertently stopping their machines from
> booting correctly. Also, because the major distributions will always
> enable the workarounds, the kernel would need to be built manually to
> see any kind of informative error message.
> 
> What we do have though is the Firmware Testsuite - fwts,
> 
>  https://wiki.ubuntu.com/Kernel/Reference/fwts
> 
> I know that Brian (Cc'd) has been doing some excellent advocacy work,
> getting people at plugfetsts to run this testsuite which tests for
> implementation bugs from within a Linux environment.
> 
>> PS Also maybe it would be possible to key this work around behavior on
>> the EFI/UEFI version. So for example no work-around after UEFI v2.3.1?
> 
> That would really depend on who has seen this bug and on which
> platforms. Is there a particular reason that mapping the boot services
> regions as-is would cause problems?
> 

1) The firmware bug could also be a security hole and thus needs to get fixed. 
2) The kernel gets locked into a design that does not follow the specification, 
and this limits future design options.
3) Makes the code more complex to maintain and test. 

> -- 
> Matt Fleming, Intel Open Source Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: device thermal limits represented in device tree nodes

2013-08-08 Thread Pawel Moll
On Thu, 2013-08-08 at 09:53 +0100, Mark Rutland wrote:
> On Wed, Aug 07, 2013 at 09:18:29PM +0100, Eduardo Valentin wrote:
> > Pawel, all,
> > 
> > On 06-08-2013 07:14, Pawel Moll wrote:
> > > Apologies about the delay, I was "otherwise engaged" for a week...
> > > 
> > 
> > I do also excuse for my delay, as I was also "engaged" for a week or so.
> > 
> > > I hope you haven't lost all motivation to work on this subject, as it's
> > > really worth the while!
> > 
> > Not really! quite the opposite. Although I was looking at some other
> > stuff, I got this series also tested on different boards and wrote down
> > a couple of improvements I will be working in the coming days. Indeed,
> > it is worth moving forward with this work.
> > 
> > > 
> > > On Fri, 2013-07-26 at 20:55 +0100, Eduardo Valentin wrote:
> > >> On 25-07-2013 13:33, Pawel Moll wrote:
> > >>> On Thu, 2013-07-25 at 18:20 +0100, Eduardo Valentin wrote:
> > >>  thermal_zone {
> > >>  type = "CPU";
> > >
> > > So what does this exactly mean? What is so special about CPU? What 
> > > other
> > > types you've got there? (Am I just lazy not looking at the numerous
> > > links you provided? ;-)
> > 
> >  Hehehe. OK. Type is supposed to describe what your zone is 
> >  representing.
> > >>>
> > >>> As in "a name"? So, for example "The board", "PSU"? What I meant to ask
> > >>> was: does the string carry any meaning?
> > > 
> > > You haven't commended on this...
> > 
> > The string is supposed to carry meaning, yes. Couple of common used:
> > CPU, GPU, PCB, LCD
> 
> I think the point Pawel was getting at is that the string doesn't have a
> *well-defined* meaning that always allows an OS to figure out the set of
> relevant devices. If we have a thermal zone for "LCD", and have multiple
> LCDs, which LCDs are covered? If we have a "PCB" zone, does this cover all
> the devices attached to the PCB, a subset thereof, or the substrate of
> the PCB itself?

Or: what happens if I type "lcd" instead of "LCD"? Would there be any
decision made based on this string? Or is it just a label to be used
somewhere in debugging messages?

Paweł


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v9 02/16] iommu/exynos: add missing cache flush for removed page table entries

2013-08-08 Thread Tomasz Figa
On Thursday 08 of August 2013 18:37:34 Cho KyongHo wrote:
> This commit adds cache flush for removed small and large page entries
> in exynos_iommu_unmap(). A missing cache flush of removed page table
> entries can cause a page fault interrupt to be missed when a master IP
> accesses an unmapped area.
> 
> Tested-by: Grant Grundler 
> Signed-off-by: Cho KyongHo 
> ---
>  drivers/iommu/exynos-iommu.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 233f382..d545a25 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -1002,6 +1002,7 @@ static size_t exynos_iommu_unmap(struct
> iommu_domain *domain, if (lv2ent_small(ent)) {
>   *ent = 0;
>   size = SPAGE_SIZE;
> + pgtable_flush(ent, ent + 1);
>   priv->lv2entcnt[lv1ent_offset(iova)] += 1;
>   goto done;
>   }
> @@ -1010,6 +1011,7 @@ static size_t exynos_iommu_unmap(struct
> iommu_domain *domain, BUG_ON(size < LPAGE_SIZE);
> 
>   memset(ent, 0, sizeof(*ent) * SPAGES_PER_LPAGE);
> + pgtable_flush(ent, ent + SPAGES_PER_LPAGE);
> 
>   size = LPAGE_SIZE;
>   priv->lv2entcnt[lv1ent_offset(iova)] += SPAGES_PER_LPAGE;

Looks reasonable.

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v02 1/5] PowerCap: Documentation

2013-08-08 Thread Rob Landley

On 08/07/2013 11:12:41 AM, Srinivas Pandruvada wrote:
Added power cap framework documentation. This explains the use of power capping
framework, sysfs and programming interface.
There are two documents:
Documentation/powercap/PowerCappingFramework.txt: Explains use case and API in
details.
Documentation/ABI/testing/sysfs-class-powercap: Explains ABIs.

Reviewed-by: Len Brown 
Signed-off-by: Srinivas Pandruvada  


Signed-off-by: Jacob Pan 
Signed-off-by: Arjan van de Ven 
---
 Documentation/ABI/testing/sysfs-class-powercap   | 165 ++
 Documentation/powercap/PowerCappingFramework.txt | 686  
+++

...

--- /dev/null
+++ b/Documentation/powercap/PowerCappingFramework.txt
@@ -0,0 +1,686 @@
+Power Capping Framework
+==
+
+The Linux Power Capping Framework provides user-space with a common
+API to kernel-mode power-capping drivers.  At the same time,
+it provides the hardware-specific power-capping drivers with
+a uniform API to user-space.


s/.  At the same time, it provides/, and/


+Terminology
+=
+The Power Capping framework organizes power capping devices under a  
tree structure.
+At the root level, each device is under some "controller", which is the enabler
+of technology.


A controller is the enabler of technology?

What does that mean?


For example this can be "RAPL".


Ah, clears it right up.


+Under each controllers,


each doesn't take a plural.


there are multiple power zones, which can be independently
+monitored and controlled.
+Each power zone can be organized as a tree with parent, children and  
siblings.
+Each power zone defines attributes to enable power monitoring and  
constraints.

+
+Example sysfs interface tree:
+
+/sys/devices/virtual/power_cap
+└── intel-rapl

... intel intel intel intel...

+
+For example, above powercap sysfs tree represents RAPL(Running  
Average Power Limit)
+type controls available in the Intel® 64 and IA-32 Processor  
Architectures. Here


What are the chances of this ever being applied to a non-intel  
processor? (Should it be under Documentation/x86, or is it presented as  
something with a nonzero chance of actually ever being generic?)


+under controller "intel-rapl" there are two CPU packages  
(package-0/1), which can
+provide power monitoring and controls (intel-rapl:0 and  
intel-rapl:1). Each power

+zone has a name.
+For example:
+cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name
+package-0
+
+In addition to providing monitoring and control at package level, each package
+is further divided into child power zones (called domains in the RAPL
specifications).


Where are the RAPL specifications, and is this framework just an  
implementation of them or is it more generic?


+Here zones represent controls for core and dram parts. These zones can be
+represented as children of package.
+For example:
+cat /sys/class/power_cap/intel-rapl/intel-rapl:0/intel-rapl:0:1/name
+dram
+
+Under RAPL framework there are two constraints, one for
+short term and one for long term, with two different time windows.  
These can be
+represented as two constraints, with different time windows, power  
limits and names.

+For example:
+   constraint_0_name
+   constraint_0_power_limit_uw
+   constraint_0_time_window_us
+   constraint_1_name
+   constraint_1_power_limit_uw
+   constraint_1_time_window_us
+
+Power Zone Attributes
+=
+Monitoring attributes
+--
+
+energy_uj (rw): Current energy counter in micro joules. Write "0" to  
reset.

+If the counter can not be reset, then this attribute is read only.
+
+max_energy_range_uj (ro): Range of the above energy counter in  
micro-joules.

+
+power_uw (rw): Current power in micro watts. Write "0" to reset the
value.

+If the value can not be reset, then this attribute is read only.
+
+max_power_range_uw (ro): Range of the above power value in  
micro-watts.

+
+name (ro): Name of this power zone.
+
+It is possible that some domains can have both power and energy counters and
+ranges, but at least one is mandatory.
+
+Constraints
+
+constraint_X_power_limit_uw (rw): Power limit in micro watts, which  
should be
+applicable for the time window specified by  
"constraint_X_time_window_us".

+
+constraint_X_time_window_us (rw): Time window in micro seconds.
+
+constraint_X_name (ro): An optional name of the constraint
+
+constraint_X_max_power_uw(ro): Maximum allowed power in micro watts.
+
+constraint_X_min_power_uw(ro): Minimum allowed power in micro watts.
+
+constraint_X_max_time_window_us(ro): Maximum allowed time window in  
micro seconds.

+
+constraint_X_min_time_window_us(ro): Minimum allowed time window in  
micro seconds.

+
+In addition each node has an attribute "type", which shows, whether  
is a controller
+or power zone. Except power_limit_uw and time_window_us other fields  
are optional.

+
+Power Cap Client 
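
[Editorial aside, not part of the quoted patch or of the review: a minimal
sketch of how a userspace client could use the attributes described above. The
sysfs paths assume the intel-rapl example tree shown earlier, the 30 W limit is
an arbitrary illustration value, and error handling is kept to the bare
minimum.]

    #include <stdio.h>
    #include <stdlib.h>

    /* Zone path follows the intel-rapl example tree quoted above. */
    #define ZONE "/sys/class/power_cap/intel-rapl/intel-rapl:0"

    int main(void)
    {
            unsigned long long energy_uj;
            FILE *f;

            /* Read a monitoring attribute. */
            f = fopen(ZONE "/energy_uj", "r");
            if (!f || fscanf(f, "%llu", &energy_uj) != 1) {
                    perror(ZONE "/energy_uj");
                    return EXIT_FAILURE;
            }
            fclose(f);
            printf("package-0 energy counter: %llu uJ\n", energy_uj);

            /* Set constraint 0's power limit to 30 W (needs root; whether
             * index 0 is the short- or long-term constraint is up to the
             * platform driver). */
            f = fopen(ZONE "/constraint_0_power_limit_uw", "w");
            if (!f || fprintf(f, "%d", 30000000) < 0 || fclose(f) == EOF) {
                    perror(ZONE "/constraint_0_power_limit_uw");
                    return EXIT_FAILURE;
            }
            return EXIT_SUCCESS;
    }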

Re: [PATCH v9 01/16] iommu/exynos: do not include removed header

2013-08-08 Thread Tomasz Figa
Hi KyongHo,

On Thursday 08 of August 2013 18:37:25 Cho KyongHo wrote:
> This commit remove <mach/sysmmu.h> which is removed.

I would prefer a more meaningful commit message, something along the following
lines:

8<---
Commit 25e9d28d92 (ARM: EXYNOS: remove system mmu initialization from 
exynos tree) removed arch/arm/mach-exynos/mach/sysmmu.h header without 
removing remaining use of it from exynos-iommu driver, thus causing a 
compilation error.

This patch fixes the error by removing the respective include line from exynos-
iommu.c.
--->8

Also a sentence explaining why linux/kernel.h header must be included would 
be nice.

> Signed-off-by: Cho KyongHo 

Please note that, as far as I'm aware, tags should be written using the Western
name writing convention, i.e. starting with the first name.

Best regards,
Tomasz

> ---
>  drivers/iommu/exynos-iommu.c |3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 3f32d64..233f382 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -12,6 +12,7 @@
>  #define DEBUG
>  #endif
> 
> +#include <linux/kernel.h>
>  #include 
>  #include 
>  #include 
> @@ -29,8 +30,6 @@
>  #include 
>  #include 
> 
> -#include <mach/sysmmu.h>
> -
>  /* We does not consider super section mapping (16MB) */
>  #define SECT_ORDER 20
>  #define LPAGE_ORDER 16
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] btrfs: cleanup: mark 'btrfs_write_and_wait_marked_extents' as static

2013-08-08 Thread David Sterba
On Thu, Aug 08, 2013 at 12:43:19AM +0300, Sergei Trofimovich wrote:
> From: Sergei Trofimovich 
> 
> Found by uselex.rb:
> > btrfs_write_and_wait_marked_extents: [R]: exported from: fs/btrfs/btrfs.o 
> > fs/btrfs/transaction.o fs/btrfs/built-in.o
> 
> Signed-off-by: Sergei Trofimovich 
> ---
>  fs/btrfs/transaction.c | 4 ++--
>  fs/btrfs/transaction.h | 2 --
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index d58cce7..ff891d2 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -837,8 +837,8 @@ int btrfs_wait_marked_extents(struct btrfs_root *root,
>   * them in one of two extent_io trees.  This is used to make sure all of
>   * those extents are on disk for transaction or log commit
>   */
> -int btrfs_write_and_wait_marked_extents(struct btrfs_root *root,
> - struct extent_io_tree *dirty_pages, int mark)
> +static int btrfs_write_and_wait_marked_extents(struct btrfs_root *root,
> +struct extent_io_tree 
> *dirty_pages, int mark)

You may want to run the output through checkpatch.pl and fix obvious
style violations (line too long).

david
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] kernel/sys.c: return the current gid when error occurs

2013-08-08 Thread Michael Kerrisk (man-pages)
On 08/07/13 18:21, Oleg Nesterov wrote:
> On 08/06, Andy Lutomirski wrote:
>>
>> I assume that what the man page means is that the return value is
>> whatever fsgid was prior to the call.  On error, fsgid isn't changed, so
>> the return value is still "current".
> 
> Probably... Still
> 
>   On success, the previous value of fsuid is returned.
>   On error, the current value of fsuid is returned.
> 
> looks confusing. sys_setfsuid() always returns the old value.
> 
>> (FWIW, this behavior is awful and is probably the cause of a security
>> bug or three, since success and failure are indistinguishable.
> 
> At least this all looks strange.
> 
> I dunno if we can change this old behaviour. I won't be surprised
> if someone already uses setfsuid(-1) as getfsuid().
> 
> And perhaps the man page should be changed. Add Michael.

Thanks, Oleg. I've applied the following patch to setfsuid.2
(and a similar patch to setfsgid.2).

Cheers,

Michael

--- a/man2/setfsuid.2
+++ b/man2/setfsuid.2
@@ -67,12 +67,8 @@ matches either the real user ID, effective user ID, saved set-user-ID, or
 the current value of
 .IR fsuid .
 .SH RETURN VALUE
-On success, the previous value of
-.I fsuid
-is returned.
-On error, the current value of
-.I fsuid
-is returned.
+On both success and failure,
+this call returns the previous filesystem user ID of the caller.
 .SH VERSIONS
 This system call is present in Linux since version 1.2.
 .\" This system call is present since Linux 1.1.44
@@ -102,7 +98,16 @@ The glibc
 .BR setfsuid ()
 wrapper function transparently deals with the variation across kernel versions.
 .SH BUGS
-No error messages of any kind are returned to the caller.
+No error indications of any kind are returned to the caller,
+and the fact that both successful and unsuccessful calls return
+the same value makes it impossible to directly determine
+whether the call succeeded or failed.
+Instead, the caller must resort to looking at the return value
+from a further call such as
+.IR setfsuid(\-1)
+(which will always fail), in order to determine if a preceding call to
+.BR setfsuid ()
+changed the filesystem user ID.
 At the very
 least,
 .B EPERM
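
As a concrete illustration of the BUGS text above, here is a minimal userspace
sketch (not part of the patch; the helper name is invented, and the glibc
setfsuid() wrapper is assumed):

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/fsuid.h>

/* Sketch only: try to change the filesystem UID, then use setfsuid(-1),
 * which always fails, to read back the current fsuid and see whether the
 * first call actually took effect. */
int change_fsuid_checked(uid_t newuid)
{
	setfsuid(newuid);                    /* gives no error indication */
	if ((uid_t) setfsuid(-1) != newuid)  /* returns the current, unchanged fsuid */
		return -1;                   /* the change did not happen */
	return 0;                            /* fsuid is now newuid */
}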




[GIT PULL] AVR32 fix for 3.11

2013-08-08 Thread Hans-Christian Egtvedt
Hello Linus,

please pull

git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32.git for-linus

to receive the following AVR32 fix for 3.11

Cong Ding (1):
  avr32: boards/atngw100/mrmt.c: fix building error

 arch/avr32/boards/atngw100/mrmt.c | 1 -
 1 file changed, 1 deletion(-)

-- 
Best regards,
Hans-Christian Egtvedt


[PATCH 16/16] tg3: clean up unnecessary MSI/MSI-X capability find

2013-08-08 Thread Yijing Wang
The PCI core initializes the device MSI/MSI-X capability in
pci_msi_init_pci_dev(), so device drivers should use
pci_dev->msi_cap/msix_cap to determine whether the device
supports MSI/MSI-X, instead of calling
pci_find_capability(pci_dev, PCI_CAP_ID_MSI/MSIX).
Accessing the PCIe device config space again costs extra time.

Signed-off-by: Yijing Wang 
Cc: Nithin Nayak Sujir 
Cc: Michael Chan 
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/broadcom/tg3.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index ddebc7a..11cad77 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -15917,7 +15917,7 @@ static int tg3_get_invariants(struct tg3 *tp, const struct pci_device_id *ent)
 */
if (tg3_flag(tp, 5780_CLASS)) {
tg3_flag_set(tp, 40BIT_DMA_BUG);
-   tp->msi_cap = pci_find_capability(tp->pdev, PCI_CAP_ID_MSI);
+   tp->msi_cap = tp->pdev->msi_cap;
} else {
struct pci_dev *bridge = NULL;
 
-- 
1.7.1
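
For context, the idiom this series moves drivers to looks roughly like the
following sketch (illustrative only, not code from the patch; the example
function is hypothetical):

#include <linux/pci.h>

/* Sketch: the PCI core caches the capability offsets at enumeration time,
 * so a driver can test pdev->msi_cap / pdev->msix_cap directly instead of
 * re-walking the capability list with pci_find_capability(). */
static void example_report_msi_caps(struct pci_dev *pdev)
{
	if (pdev->msi_cap)	/* non-zero offset: device supports MSI */
		dev_info(&pdev->dev, "MSI cap at 0x%x\n", pdev->msi_cap);
	if (pdev->msix_cap)	/* non-zero offset: device supports MSI-X */
		dev_info(&pdev->dev, "MSI-X cap at 0x%x\n", pdev->msix_cap);
}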




Re: perf,arm -- another (different) fuzzer oops

2013-08-08 Thread Vince Weaver
On Thu, 8 Aug 2013, Will Deacon wrote:

> On the flip side, the good news is that we know the problem is there. We're
> probably generating interrupts at some horrendous rate for the lock-up.
> Are you running your fuzzer as root?

No, I'm running the fuzzer as a regular user.

> Also, is your fuzzer available somewhere? I could take it for a spin on some
> different architectures if you like.

Yes:

  git clone https://github.com/deater/perf_event_tests.git

and it's in the "fuzzer" subdirectory.  I think I've committed all of the 
ARM related patches.

To run the tool, it's just "./perf_fuzzer" and away you go.  There are a lot 
of other tools for generating and analyzing fuzzer syscall traces, but 
unfortunately they're not very user-friendly yet.
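
Put together, the steps above amount to something like this (a sketch; the
make step is an assumption about how the tests build):

  git clone https://github.com/deater/perf_event_tests.git
  cd perf_event_tests/fuzzer
  make            # assumed build step
  ./perf_fuzzer   # run as a regular (non-root) user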


As for other architectures (at least ARM), in addition to the Pandaboard I 
also have a Beagleboard and a Cortex-A15 Chromebook.  The challenge is 
always getting recent Linus-git kernels running on the things.

I also have a Raspberry Pi.  I've successfully accessed the perf counters 
on that by reading the low-level registers directly with a kernel 
module.  There's no perf driver because the PMU interrupt isn't hooked 
up.  I've been meaning to get perf support going by making things 
periodically polled rather than interrupt driven: has anybody looked into 
doing that yet?

Vince


