RE: Dangerous devm_request_irq() conversions

2013-02-21 Thread Jingoo Han
On Friday, February 22, 2013 4:27 PM, Dmitry Torokhov wrote:
> On Fri, Feb 22, 2013 at 04:12:36PM +0900, Jingoo Han wrote:
> > On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote:
> > >
> > > Hi,
> > >
> > > It looks like a whole slew of devm_request_irq() conversions just got
> > > applied to mainline and many of them are quite broken.
> > >
> > > Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or
> > > c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first
> > > free the IRQ and then unregister the corresponding device, ensuring that
> > > the device is still available to the IRQ handler while it runs. The
> > > mechanical conversion to devm_request_irq() reverses the order of these
> > > operations, opening a race window where the IRQ handler can reference a
> > > device (or other resource) that is already gone.
> > >
> > > It would be nice if these could be reverted and reviewed again for
> > > correctness.
> >
> > Um, other RTC drivers were already using devm_request_threaded_irq() or
> > devm_request_irq() like this before I added these patches.
> >
> > For example,
> > ./drivers/rtc/rtc-tegra.c
> > ./drivers/rtc/rtc-spear.c
> > ./drivers/rtc/rtc-s3c.c
> > ./drivers/rtc/rtc-mxc.c
> > ./drivers/rtc/rtc-ds1553.c
> > ./drivers/rtc/rtc-ds1511.c
> > ./drivers/rtc/rtc-snvs.c
> > ./drivers/rtc/rtc-imxdi.c
> > ./drivers/rtc/rtc-tx4939.c
> > ./drivers/rtc/rtc-mv.c
> > ./drivers/rtc/rtc-coh901331.c
> > ./drivers/rtc/rtc-stk17ta8.c
> > ./drivers/rtc/rtc-lpc32xx.c
> > ./drivers/rtc/rtc-tps65910.c
> > ./drivers/rtc/rtc-rc5t583.c
> >
> >
> > Moreover, some RTC drivers call rtc_device_unregister() first and
> > free_irq() afterwards.
> >
> > For example,
> > ./drivers/rtc/rtc-vr41xx.c
> > ./drivers/rtc/rtc-da9052.c
> > ./drivers/rtc/rtc-isl1208.c
> > ./drivers/rtc/rtc-88pm860x.c
> > ./drivers/rtc/rtc-tps6586x.c
> > ./drivers/rtc/rtc-mpc5121.c
> > ./drivers/rtc/rtc-m48t59.c
> >
> >
> > Please don't argue for a revert without concrete reasons.
> 
> What more concrete reason do you need? I explained to you the exact
> reason on the patches I noticed before and also on the 2 commits
> referenced above: blind conversion to devm_* changes the order of operations,
> which may be deadly with IRQs (but other resources, like clocks and
> regulators, matter too).
> 
> The fact that crap slipped into the kernel before is not a valid reason
> for adding more of the same crap.
> 
> Please *understand* APIs you are using before making changes.
> 
> >
> > If these devm_request_threaded_irq() or devm_request_irq() calls cause a problem,
> > devm_free_irq() will be added later.
> 
> And the point? If you use devm_request_irq() and then call
> devm_free_irq() manually in all paths, all you have achieved is wasting
> the memory required for devm_* tracking.

CC'ed Al Viro, Tejun Heo


So, is there any report that devm_request_threaded_irq() causes a fatal
IRQ-related problem in such cases?

According to your comment, it seems that there is no reason to use
devm_request_irq() or devm_request_threaded_irq().

Then please argue that devm_request_irq() or devm_request_threaded_irq()
should be deprecated.


> 
> --
> Dmitry
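Dmitry's ordering argument can be made concrete with a small userspace model. It relies only on the driver core's unbind behavior (the driver's remove() callback runs first, then managed resources are released in reverse order of acquisition); every function and event name below is a hypothetical stand-in, not a kernel API, and the probe order is illustrative.

```c
#include <assert.h>
#include <string.h>

#define MAX_MANAGED 8

/* Hypothetical model of a device's managed-resource (devres) list:
 * only the release actions are remembered, and they run LIFO. */
static const char *managed_release[MAX_MANAGED];
static int n_managed;
static char event_log[256];   /* ordered record of what happened */

static void log_event(const char *what)
{
	strcat(event_log, what);
	strcat(event_log, ";");
}

static void reset_model(void)
{
	event_log[0] = '\0';
	n_managed = 0;
}

/* Models a devm_* call: acquire now, remember how to undo it later. */
static void devm_acquire(const char *acquire, const char *release)
{
	assert(n_managed < MAX_MANAGED);
	log_event(acquire);
	managed_release[n_managed++] = release;
}

/* Models unbind: remove() first, then managed resources in reverse order. */
static void unbind(void (*remove)(void))
{
	remove();
	while (n_managed > 0)
		log_event(managed_release[--n_managed]);
}

/* Original driver: everything is undone by hand in remove(). */
static void probe_original(void)  { log_event("request_irq"); log_event("rtc_register"); }
static void remove_original(void) { log_event("free_irq"); log_event("rtc_unregister"); }

/* Mechanically converted driver: only the IRQ became managed. */
static void probe_converted(void)  { devm_acquire("request_irq", "free_irq"); log_event("rtc_register"); }
static void remove_converted(void) { log_event("rtc_unregister"); }
```

In the converted case free_irq lands after rtc_unregister, which is the window described above: an interrupt arriving between those two events runs a handler whose RTC device is already gone.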

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/35] mfd: ab8500-gpadc: Implemented suspend/resume

2013-02-21 Thread Lee Jones
On Thu, 21 Feb 2013, Ulf Hansson wrote:

> On 20 February 2013 14:19, Mark Brown
>  wrote:
> > On Fri, Feb 15, 2013 at 12:56:32PM +, Lee Jones wrote:
> >
> >> +static int ab8500_gpadc_suspend(struct device *dev)
> >> +{
> >> + struct ab8500_gpadc *gpadc = dev_get_drvdata(dev);
> >> +
> >> + mutex_lock(&gpadc->ab8500_gpadc_lock);
> >> +
> >> + pm_runtime_get_sync(dev);
> >> +
> >> + regulator_disable(gpadc->regu);
> >> + return 0;
> >> +}
> >
> > This doesn't look especially sane...  You're doing a runtime get, taking
> > the lock without releasing it and disabling the regulator.  This is
> > *very* odd, both the changelog and the code need to explain what's going
> > on and why it's safe in a lot more detail here.
> 
> You need to do pm_runtime_get_sync to be able to make sure resources
> (which seems to be only the regulator) are safe to switch off. To my
> understanding this is a generic way to use for being able to switch
> off resources at a device suspend when runtime pm is used in
> conjunction.
> 
> Regarding the mutex, I can't tell the reason behind it. It seems
> strange, but I'm not sure.

Daniel, any thoughts?

I'm happy to fix it up once I have the full story.
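Mark's objection is that suspend() takes the lock and a runtime PM reference without visibly releasing them. If the intent is for resume() to undo exactly what suspend() did (an assumption; the thread has not confirmed the design yet), the two callbacks must be strictly symmetric. The sketch below models that with plain counters; struct gpadc_model and both functions are hypothetical and only mirror the calls in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; counters model refcounts. */
struct gpadc_model {
	int pm_usage;       /* runtime PM usage count */
	int regulator_on;   /* regulator enable count */
	bool locked;        /* ab8500_gpadc_lock held */
};

static int gpadc_suspend_model(struct gpadc_model *g)
{
	g->locked = true;    /* mutex_lock(): block new conversions */
	g->pm_usage++;       /* pm_runtime_get_sync(): keep the device active */
	g->regulator_on--;   /* regulator_disable() */
	return 0;
}

/* Assumed counterpart: undo the suspend steps in reverse order. */
static int gpadc_resume_model(struct gpadc_model *g)
{
	g->regulator_on++;   /* regulator_enable() */
	g->pm_usage--;       /* matching runtime PM put */
	g->locked = false;   /* mutex_unlock() */
	return 0;
}
```

A suspend/resume round trip should return every counter to its starting value; any asymmetry is what the changelog would need to justify.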

-- 
Lee Jones
Linaro ST-Ericsson Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


[PATCH] PM: align buffers for LZO compression

2013-02-21 Thread Markus F.X.J. Oberhumer
Hi,

for performance reasons I'd strongly suggest that you explicitly align all
buffers passed to the LZO compress and decompress functions.

Below is a small (and completely untested!) patch, but I think you
get the idea.

BTW, it might be even more beneficial (esp. for NUMA systems) to align *all*
individual unc/cmp/wrk pointers to a multiple of the PAGE_SIZE, but this would
require some code restructuring.
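The suggested alignment can be expressed in standard C11 with alignas on each member. In this sketch the buffer sizes and the 64-byte cache line are assumptions (the real LZO_UNC_SIZE, LZO_CMP_SIZE and SMP_CACHE_BYTES come from the kernel), and kernel code would use one of its own cache-alignment attribute macros rather than alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHELINE 64     /* assumption: common L1 line size */
#define MODEL_UNC_SIZE  4096   /* hypothetical buffer sizes */
#define MODEL_CMP_SIZE  4500
#define MODEL_WRK_SIZE  16384

/* Each buffer starts on its own cache line, so its first line never
 * holds bytes belonging to the preceding field. */
struct cmp_data_model {
	size_t unc_len;
	size_t cmp_len;
	alignas(MODEL_CACHELINE) unsigned char unc[MODEL_UNC_SIZE];
	alignas(MODEL_CACHELINE) unsigned char cmp[MODEL_CMP_SIZE];
	alignas(MODEL_CACHELINE) unsigned char wrk[MODEL_WRK_SIZE];
};
```

Member offsets are only half the story: the containing object must itself be allocated with at least cache-line (or, per the NUMA remark above, page) alignment for the members to land on real line boundaries.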

Cheers,
Markus

completely untested patch:

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 7c33ed2..7af4293 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -532,9 +532,9 @@ struct cmp_data {
wait_queue_head_t done;   /* compression done */
size_t unc_len;   /* uncompressed length */
size_t cmp_len;   /* compressed length */
-   unsigned char unc[LZO_UNC_SIZE];  /* uncompressed buffer */
-   unsigned char cmp[LZO_CMP_SIZE];  /* compressed buffer */
-   unsigned char wrk[LZO1X_1_MEM_COMPRESS];  /* compression workspace */
+   unsigned char unc[LZO_UNC_SIZE] cacheline_aligned; /* uncompressed buffer */
+   unsigned char cmp[LZO_CMP_SIZE] cacheline_aligned; /* compressed buffer */
+   unsigned char wrk[LZO1X_1_MEM_COMPRESS] cacheline_aligned; /* compression workspace */
 };

 /**
@@ -1021,8 +1021,8 @@ struct dec_data {
wait_queue_head_t done;   /* decompression done */
size_t unc_len;   /* uncompressed length */
size_t cmp_len;   /* compressed length */
-   unsigned char unc[LZO_UNC_SIZE];  /* uncompressed buffer */
-   unsigned char cmp[LZO_CMP_SIZE];  /* compressed buffer */
+   unsigned char unc[LZO_UNC_SIZE] cacheline_aligned; /* uncompressed buffer */
+   unsigned char cmp[LZO_CMP_SIZE] cacheline_aligned; /* compressed buffer */
 };

 /**


Signed-off-by: Markus F.X.J. Oberhumer 

-- 
Markus Oberhumer, , http://www.oberhumer.com/


Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add

2013-02-21 Thread Viresh Kumar
On 22 February 2013 12:47, Amit Kucheria  wrote:
> On Fri, Feb 22, 2013 at 11:51 AM, Viresh Kumar  
> wrote:
>> BTW, I don't see kcpustat_cpu() used in
>>
>>  kernel/sched/core.c| 12 +---
>>  kernel/sched/cputime.c | 29 +--
>>
>> I searched tip/master as well as lnext/master.
>
> Added by Frederic's Adaptive NOHZ patchset?

I don't even see them on our unused-nohz-adaptive-tickless-v2 branch :)
Maybe some other latest work.


[PATCH] acpi: sleep: Avoid interleaved message on errors

2013-02-21 Thread Joe Perches
Got this dmesg log on an Acer Aspire 725.

[0.256351] ACPI: (supports S0ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20130117/hwxface-568)
[0.256373] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20130117/hwxface-568)
[0.256391]  S3 S4 S5)

Avoid these interleaved error messages by collecting the supported states
first and printing them with a single pr_info().

Signed-off-by: Joe Perches 
---
 drivers/acpi/sleep.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 6d3a06a..2421303 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -599,7 +599,6 @@ static void acpi_sleep_suspend_setup(void)
status = acpi_get_sleep_type_data(i, &type_a, &type_b);
if (ACPI_SUCCESS(status)) {
sleep_states[i] = 1;
-   pr_cont(" S%d", i);
}
}
 
@@ -742,7 +741,6 @@ static void acpi_sleep_hibernate_setup(void)
hibernation_set_ops(old_suspend_ordering ?
&acpi_hibernation_ops_old : &acpi_hibernation_ops);
sleep_states[ACPI_STATE_S4] = 1;
-   pr_cont(KERN_CONT " S4");
if (nosigcheck)
return;
 
@@ -788,6 +786,9 @@ int __init acpi_sleep_init(void)
 {
acpi_status status;
u8 type_a, type_b;
+   char supported[ACPI_S_STATE_COUNT * 3 + 1];
+   char *pos = supported;
+   int i;
 
if (acpi_disabled)
return 0;
@@ -795,7 +796,6 @@ int __init acpi_sleep_init(void)
acpi_sleep_dmi_check();
 
sleep_states[ACPI_STATE_S0] = 1;
-   pr_info(PREFIX "(supports S0");
 
acpi_sleep_suspend_setup();
acpi_sleep_hibernate_setup();
@@ -803,11 +803,17 @@ int __init acpi_sleep_init(void)
status = acpi_get_sleep_type_data(ACPI_STATE_S5, &type_a, &type_b);
if (ACPI_SUCCESS(status)) {
sleep_states[ACPI_STATE_S5] = 1;
-   pr_cont(" S5");
pm_power_off_prepare = acpi_power_off_prepare;
pm_power_off = acpi_power_off;
}
-   pr_cont(")\n");
+
+   supported[0] = 0;
+   for (i = 0; i < ACPI_S_STATE_COUNT; i++) {
+   if (sleep_states[i])
+   pos += sprintf(pos, " S%d", i);
+   }
+   pr_info(PREFIX "(supports%s)\n", supported);
+
/*
 * Register the tts_notifier to reboot notifier list so that the _TTS
 * object can also be evaluated when the system enters S5.
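The patch replaces several pr_cont() fragments, which other messages can interleave with, by one pr_info() of a pre-built string. A userspace sketch of the string-building step follows; S_STATE_COUNT = 6 (S0 through S5) is an assumption standing in for ACPI_S_STATE_COUNT, and format_supported() is a hypothetical helper. With single-digit state numbers each " S%d" entry is exactly three characters, which is where the COUNT * 3 + 1 buffer size comes from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define S_STATE_COUNT 6   /* assumption: S0..S5 */

/* Build " S0 S3 ..." into out so it can be printed with a single call.
 * out must hold S_STATE_COUNT * 3 + 1 bytes (worst case plus NUL). */
static void format_supported(const unsigned char states[S_STATE_COUNT],
			     char out[S_STATE_COUNT * 3 + 1])
{
	char *pos = out;
	int i;

	out[0] = '\0';   /* empty string if no state is supported */
	for (i = 0; i < S_STATE_COUNT; i++) {
		if (states[i])
			pos += sprintf(pos, " S%d", i);   /* 3 chars per entry */
	}
}
```

The whole line then goes out in one call, e.g. printf("ACPI: (supports%s)\n", buf), so no other message can land in the middle of it.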




[PATCH 4/8] ARM: PRIMA2: Divorce timer-marco from local timer API

2013-02-21 Thread Stephen Boyd
Separate the marco local timers from the local timer API. This
will allow us to remove ARM local timer support in the near future
and gets us closer to moving this driver to drivers/clocksource.

Cc: Barry Song 
Signed-off-by: Stephen Boyd 
---
 arch/arm/mach-prima2/timer-marco.c | 98 --
 1 file changed, 52 insertions(+), 46 deletions(-)

diff --git a/arch/arm/mach-prima2/timer-marco.c b/arch/arm/mach-prima2/timer-marco.c
index f4eea2e..d54aac2 100644
--- a/arch/arm/mach-prima2/timer-marco.c
+++ b/arch/arm/mach-prima2/timer-marco.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "common.h"
@@ -154,13 +154,7 @@ static void sirfsoc_clocksource_resume(struct clocksource *cs)
BIT(1) | BIT(0), sirfsoc_timer_base + SIRFSOC_TIMER_64COUNTER_CTRL);
 }
 
-static struct clock_event_device sirfsoc_clockevent = {
-   .name = "sirfsoc_clockevent",
-   .rating = 200,
-   .features = CLOCK_EVT_FEAT_ONESHOT,
-   .set_mode = sirfsoc_timer_set_mode,
-   .set_next_event = sirfsoc_timer_set_next_event,
-};
+static struct clock_event_device __percpu *sirfsoc_clockevent;
 
 static struct clocksource sirfsoc_clocksource = {
.name = "sirfsoc_clocksource",
@@ -176,11 +170,8 @@ static struct irqaction sirfsoc_timer_irq = {
.name = "sirfsoc_timer0",
.flags = IRQF_TIMER | IRQF_NOBALANCING,
.handler = sirfsoc_timer_interrupt,
-   .dev_id = &sirfsoc_clockevent,
 };
 
-#ifdef CONFIG_LOCAL_TIMERS
-
 static struct irqaction sirfsoc_timer1_irq = {
.name = "sirfsoc_timer1",
.flags = IRQF_TIMER | IRQF_NOBALANCING,
@@ -189,56 +180,75 @@ static struct irqaction sirfsoc_timer1_irq = {
 
 static int __cpuinit sirfsoc_local_timer_setup(struct clock_event_device *ce)
 {
-   /* Use existing clock_event for cpu 0 */
-   if (!smp_processor_id())
-   return 0;
+   int cpu = smp_processor_id();
+   struct irqaction *action;
+
+   if (cpu == 0)
+   action = &sirfsoc_timer_irq;
+   else
+   action = &sirfsoc_timer1_irq;
 
-   ce->irq = sirfsoc_timer1_irq.irq;
+   ce->irq = action->irq;
ce->name = "local_timer";
-   ce->features = sirfsoc_clockevent.features;
-   ce->rating = sirfsoc_clockevent.rating;
+   ce->features = CLOCK_EVT_FEAT_ONESHOT;
+   ce->rating = 200;
ce->set_mode = sirfsoc_timer_set_mode;
ce->set_next_event = sirfsoc_timer_set_next_event;
-   ce->shift = sirfsoc_clockevent.shift;
-   ce->mult = sirfsoc_clockevent.mult;
-   ce->max_delta_ns = sirfsoc_clockevent.max_delta_ns;
-   ce->min_delta_ns = sirfsoc_clockevent.min_delta_ns;
+   clockevents_calc_mult_shift(ce, CLOCK_TICK_RATE, 60);
+   ce->max_delta_ns = clockevent_delta2ns(-2, ce);
+   ce->min_delta_ns = clockevent_delta2ns(2, ce);
+   ce->cpumask = cpumask_of(cpu);
 
-   sirfsoc_timer1_irq.dev_id = ce;
-   BUG_ON(setup_irq(ce->irq, &sirfsoc_timer1_irq));
-   irq_set_affinity(sirfsoc_timer1_irq.irq, cpumask_of(1));
+   action->dev_id = ce;
+   BUG_ON(setup_irq(ce->irq, action));
+   irq_set_affinity(action->irq, cpumask_of(cpu));
 
clockevents_register_device(ce);
return 0;
 }
 
-static void sirfsoc_local_timer_stop(struct clock_event_device *ce)
+static void __cpuinit sirfsoc_local_timer_stop(struct clock_event_device *ce)
 {
+   int cpu = smp_processor_id();
+
sirfsoc_timer_count_disable(1);
 
-   remove_irq(sirfsoc_timer1_irq.irq, &sirfsoc_timer1_irq);
+   if (cpu == 0)
+   remove_irq(sirfsoc_timer_irq.irq, &sirfsoc_timer_irq);
+   else
+   remove_irq(sirfsoc_timer1_irq.irq, &sirfsoc_timer1_irq);
+}
+
+static int __cpuinit sirfsoc_cpu_notify(struct notifier_block *self,
+  unsigned long action, void *hcpu)
+{
+   struct clock_event_device *evt = this_cpu_ptr(sirfsoc_clockevent);
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_STARTING:
+   sirfsoc_local_timer_setup(evt);
+   break;
+   case CPU_DYING:
+   sirfsoc_local_timer_stop(evt);
+   break;
+   }
+
+   return NOTIFY_OK;
 }
 
-static struct local_timer_ops sirfsoc_local_timer_ops __cpuinitdata = {
-   .setup  = sirfsoc_local_timer_setup,
-   .stop   = sirfsoc_local_timer_stop,
+static struct notifier_block sirfsoc_cpu_nb __cpuinitdata = {
+   .notifier_call = sirfsoc_cpu_notify,
 };
-#endif /* CONFIG_LOCAL_TIMERS */
 
 static void __init sirfsoc_clockevent_init(void)
 {
-   clockevents_calc_mult_shift(&sirfsoc_clockevent, CLOCK_TICK_RATE, 60);
-
-   sirfsoc_clockevent.max_delta_ns =
-   clockevent_delta2ns(-2, &sirfsoc_clockevent);
-   sirfsoc_clockevent.min_delta_ns =
-   clockevent_delta2ns(2, &sirfsoc_clockevent);
-
-   sirfsoc_clockevent.cpumask = 
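Each converted driver in this series swaps struct local_timer_ops for a CPU notifier that dispatches on the hotplug action. Masking off CPU_TASKS_FROZEN makes the suspend/resume variants of CPU_STARTING and CPU_DYING take the same path. A userspace model of that dispatch; the MODEL_* constants are placeholders for this sketch, not the kernel's definitions, and the setup/stop functions only count calls.

```c
#include <assert.h>

/* Placeholder action codes for this model only. */
#define MODEL_CPU_ONLINE       0x0002UL
#define MODEL_CPU_DYING        0x0008UL
#define MODEL_CPU_STARTING     0x000AUL
#define MODEL_CPU_TASKS_FROZEN 0x0010UL
#define MODEL_NOTIFY_OK        0x0001

static int setup_calls, stop_calls;

static void model_local_timer_setup(void) { setup_calls++; }
static void model_local_timer_stop(void)  { stop_calls++; }

/* Mirrors the switch in sirfsoc_cpu_notify(): strip the "frozen" bit,
 * then act only on the two actions that run on the hotplugged CPU. */
static int model_cpu_notify(unsigned long action)
{
	switch (action & ~MODEL_CPU_TASKS_FROZEN) {
	case MODEL_CPU_STARTING:
		model_local_timer_setup();
		break;
	case MODEL_CPU_DYING:
		model_local_timer_stop();
		break;
	}
	return MODEL_NOTIFY_OK;   /* other actions are ignored */
}
```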

Re: [PATCH v4] mfd: syscon: Add non-DT support

2013-02-21 Thread Thierry Reding
On Fri, Feb 22, 2013 at 03:13:12PM +0800, Dong Aisheng wrote:
> On Fri, Feb 22, 2013 at 11:01:18AM +0400, Alexander Shiyan wrote:
> > > On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> > > > This patch allows the syscon driver to be used with platform data, i.e.
> > > > on systems without device tree (OF) support. To look up a syscon device
> > > > from client drivers, the function "syscon_regmap_lookup_by_pdevname"
> > > > was added.
> > > > 
> > > > Signed-off-by: Alexander Shiyan 
> > > 
> > > [...]
> > > 
> > > > +   syscon->base = devm_ioremap_resource(dev, res);
> > > > +   if (!syscon->base)
> > > 
> > > Is this correct?
> > 
> > Hmm, of course IS_ERR should be used here...
> > v5?
> > 
> 
> Yes.
> From here:
> https://lkml.org/lkml/2013/1/21/140
> It seems it is.
> 
> > > 
> > > > +   return -EBUSY;
> 
> This line should also be changed.
> 
> > > >
> > > 
> > > Otherwise, i'm also ok with this patch.
> > > Acked-by: Dong Aisheng 
> > > 
> > > BTW, I did not see this new API in Samuel's tree.
> > > So, who will pick up this patch?
> > 
> > I have same question.
> 
> I CCed Thierry and Greg who may know it.

Yes, devm_ioremap_resource() never returns NULL. You always need to
check the returned pointer with IS_ERR(). The value that you return
should be extracted from the pointer with PTR_ERR().

Thierry
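The point above can be checked with a userspace re-creation of the error-pointer convention: an errno is encoded in the last 4095 addresses, so a failed call returns a non-NULL pointer and a NULL test never fires. ERR_PTR(), IS_ERR() and PTR_ERR() below are simplified copies written for this sketch (the kernel versions live in include/linux/err.h), and fake_ioremap_resource() is a hypothetical stand-in for devm_ioremap_resource().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static char fake_regs[16];

/* Hypothetical stand-in: returns an error pointer, never NULL. */
static void *fake_ioremap_resource(bool busy)
{
	return busy ? ERR_PTR(-EBUSY) : fake_regs;
}

static int fake_probe(bool busy)
{
	void *base = fake_ioremap_resource(busy);

	if (IS_ERR(base))
		return PTR_ERR(base);   /* propagate the real error code */
	return 0;
}
```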




[PATCH 5/8] ARM: MSM: Divorce msm_timer from local timer API

2013-02-21 Thread Stephen Boyd
Separate the msm_timer from the local timer API. This will allow
us to remove ARM local timer support in the near future and gets
us closer to moving this driver to drivers/clocksource.

Cc: David Brown 
Cc: Daniel Walker 
Cc: Bryan Huntsman 
Signed-off-by: Stephen Boyd 
---
 arch/arm/mach-msm/timer.c | 125 +-
 1 file changed, 67 insertions(+), 58 deletions(-)

diff --git a/arch/arm/mach-msm/timer.c b/arch/arm/mach-msm/timer.c
index 2969027..4675c5e 100644
--- a/arch/arm/mach-msm/timer.c
+++ b/arch/arm/mach-msm/timer.c
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,7 +26,6 @@
 #include 
 
 #include 
-#include 
 #include 
 
 #include "common.h"
@@ -46,7 +46,7 @@ static void __iomem *event_base;
 
 static irqreturn_t msm_timer_interrupt(int irq, void *dev_id)
 {
-   struct clock_event_device *evt = *(struct clock_event_device **)dev_id;
+   struct clock_event_device *evt = dev_id;
/* Stop the timer tick */
if (evt->mode == CLOCK_EVT_MODE_ONESHOT) {
u32 ctrl = readl_relaxed(event_base + TIMER_ENABLE);
@@ -90,18 +90,7 @@ static void msm_timer_set_mode(enum clock_event_mode mode,
writel_relaxed(ctrl, event_base + TIMER_ENABLE);
 }
 
-static struct clock_event_device msm_clockevent = {
-   .name   = "gp_timer",
-   .features   = CLOCK_EVT_FEAT_ONESHOT,
-   .rating = 200,
-   .set_next_event = msm_timer_set_next_event,
-   .set_mode   = msm_timer_set_mode,
-};
-
-static union {
-   struct clock_event_device *evt;
-   struct clock_event_device * __percpu *percpu_evt;
-} msm_evt;
+static struct clock_event_device __percpu *msm_evt;
 
 static void __iomem *source_base;
 
@@ -127,40 +116,66 @@ static struct clocksource msm_clocksource = {
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
-#ifdef CONFIG_LOCAL_TIMERS
+static int msm_timer_irq;
+static int msm_timer_has_ppi;
+
 static int __cpuinit msm_local_timer_setup(struct clock_event_device *evt)
 {
-   /* Use existing clock_event for cpu 0 */
-   if (!smp_processor_id())
-   return 0;
+   int cpu = smp_processor_id();
+   int err;
 
writel_relaxed(0, event_base + TIMER_ENABLE);
writel_relaxed(0, event_base + TIMER_CLEAR);
writel_relaxed(~0, event_base + TIMER_MATCH_VAL);
-   evt->irq = msm_clockevent.irq;
+   evt->irq = msm_timer_irq;
evt->name = "local_timer";
-   evt->features = msm_clockevent.features;
-   evt->rating = msm_clockevent.rating;
+   evt->features = CLOCK_EVT_FEAT_ONESHOT;
+   evt->rating = 200;
evt->set_mode = msm_timer_set_mode;
evt->set_next_event = msm_timer_set_next_event;
+   evt->cpumask = cpumask_of(cpu);
+
+   clockevents_config_and_register(evt, GPT_HZ, 4, 0x);
+
+   if (msm_timer_has_ppi) {
+   enable_percpu_irq(evt->irq, IRQ_TYPE_EDGE_RISING);
+   } else {
+   err = request_irq(evt->irq, msm_timer_interrupt,
+   IRQF_TIMER | IRQF_NOBALANCING |
+   IRQF_TRIGGER_RISING, "gp_timer", evt);
+   if (err)
+   pr_err("request_irq failed\n");
+   }
 
-   *__this_cpu_ptr(msm_evt.percpu_evt) = evt;
-   clockevents_config_and_register(evt, GPT_HZ, 4, 0xf000);
-   enable_percpu_irq(evt->irq, IRQ_TYPE_EDGE_RISING);
return 0;
 }
 
-static void msm_local_timer_stop(struct clock_event_device *evt)
+static void __cpuinit msm_local_timer_stop(struct clock_event_device *evt)
 {
evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
disable_percpu_irq(evt->irq);
 }
 
-static struct local_timer_ops msm_local_timer_ops __cpuinitdata = {
-   .setup  = msm_local_timer_setup,
-   .stop   = msm_local_timer_stop,
+static int __cpuinit msm_timer_cpu_notify(struct notifier_block *self,
+  unsigned long action, void *hcpu)
+{
+   struct clock_event_device *evt = this_cpu_ptr(msm_evt);
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_STARTING:
+   msm_local_timer_setup(evt);
+   break;
+   case CPU_DYING:
+   msm_local_timer_stop(evt);
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block msm_timer_cpu_nb __cpuinitdata = {
+   .notifier_call = msm_timer_cpu_notify,
 };
-#endif /* CONFIG_LOCAL_TIMERS */
 
 static notrace u32 msm_sched_clock_read(void)
 {
@@ -170,41 +185,35 @@ static notrace u32 msm_sched_clock_read(void)
 static void __init msm_timer_init(u32 dgt_hz, int sched_bits, int irq,
  bool percpu)
 {
-   struct clock_event_device *ce = &msm_clockevent;
struct clocksource *cs = &msm_clocksource;
-   int res;
+   int res = 0;
 
-   writel_relaxed(0, event_base + TIMER_ENABLE);
-   writel_relaxed(0, 
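The handler change in this patch removes one level of indirection from dev_id: the old code received a pointer to a slot holding the clock_event_device pointer, the new code receives the device itself. Whatever is registered as dev_id must match what the handler dereferences. A minimal model; struct clock_event_model and both handlers are hypothetical stand-ins for msm_timer_interrupt() before and after the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for struct clock_event_device. */
struct clock_event_model {
	int events;   /* number of ticks delivered */
};

/* Before: dev_id points at a slot that holds the device pointer. */
static int handler_before(void *dev_id)
{
	struct clock_event_model *evt = *(struct clock_event_model **)dev_id;

	evt->events++;
	return 1;   /* models IRQ_HANDLED */
}

/* After: dev_id is the device itself. */
static int handler_after(void *dev_id)
{
	struct clock_event_model *evt = dev_id;

	evt->events++;
	return 1;   /* models IRQ_HANDLED */
}
```

Mixing the conventions (the device passed to handler_before(), or the slot passed to handler_after()) would reinterpret unrelated memory, which is why the non-PPI path above hands evt itself to request_irq().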

[PATCH 3/8] ARM: EXYNOS4: Divorce mct from local timer API

2013-02-21 Thread Stephen Boyd
Separate the mct local timers from the local timer API. This will
allow us to remove ARM local timer support in the near future and
gets us closer to moving this driver to drivers/clocksource.

Cc: Kukjin Kim 
Signed-off-by: Stephen Boyd 
---
 arch/arm/mach-exynos/mct.c | 53 --
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-exynos/mct.c b/arch/arm/mach-exynos/mct.c
index c9d6650..5a9a73f 100644
--- a/arch/arm/mach-exynos/mct.c
+++ b/arch/arm/mach-exynos/mct.c
@@ -16,13 +16,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
 #include 
-#include 
 
 #include 
 
@@ -42,7 +42,7 @@ static unsigned long clk_rate;
 static unsigned int mct_int_type;
 
 struct mct_clock_event_device {
-   struct clock_event_device *evt;
+   struct clock_event_device evt;
void __iomem *base;
char name[10];
 };
@@ -264,8 +264,6 @@ static void exynos4_clockevent_init(void)
setup_irq(EXYNOS4_IRQ_MCT_G0, &mct_comp_event_irq);
 }
 
-#ifdef CONFIG_LOCAL_TIMERS
-
 static DEFINE_PER_CPU(struct mct_clock_event_device, percpu_mct_tick);
 
 /* Clock event handling */
@@ -338,7 +336,7 @@ static inline void exynos4_tick_set_mode(enum clock_event_mode mode,
 
 static int exynos4_mct_tick_clear(struct mct_clock_event_device *mevt)
 {
-   struct clock_event_device *evt = mevt->evt;
+   struct clock_event_device *evt = &mevt->evt;
 
/*
 * This is for supporting oneshot mode.
@@ -360,7 +358,7 @@ static int exynos4_mct_tick_clear(struct mct_clock_event_device *mevt)
 static irqreturn_t exynos4_mct_tick_isr(int irq, void *dev_id)
 {
struct mct_clock_event_device *mevt = dev_id;
-   struct clock_event_device *evt = mevt->evt;
+   struct clock_event_device *evt = &mevt->evt;
 
exynos4_mct_tick_clear(mevt);
 
@@ -388,7 +386,6 @@ static int __cpuinit exynos4_local_timer_setup(struct clock_event_device *evt)
int mct_lx_irq;
 
mevt = this_cpu_ptr(&percpu_mct_tick);
-   mevt->evt = evt;
 
mevt->base = EXYNOS4_MCT_L_BASE(cpu);
sprintf(mevt->name, "mct_tick%d", cpu);
@@ -426,7 +423,7 @@ static int __cpuinit exynos4_local_timer_setup(struct clock_event_device *evt)
return 0;
 }
 
-static void exynos4_local_timer_stop(struct clock_event_device *evt)
+static void __cpuinit exynos4_local_timer_stop(struct clock_event_device *evt)
 {
unsigned int cpu = smp_processor_id();
evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
@@ -439,22 +436,38 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt)
disable_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER);
 }
 
-static struct local_timer_ops exynos4_mct_tick_ops __cpuinitdata = {
-   .setup  = exynos4_local_timer_setup,
-   .stop   = exynos4_local_timer_stop,
+static int __cpuinit exynos4_mct_cpu_notify(struct notifier_block *self,
+  unsigned long action, void *hcpu)
+{
+   struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
+   struct clock_event_device *evt = &mevt->evt;
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_STARTING:
+   exynos4_local_timer_setup(evt);
+   break;
+   case CPU_DYING:
+   exynos4_local_timer_stop(evt);
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block exynos4_mct_cpu_nb __cpuinitdata = {
+   .notifier_call = exynos4_mct_cpu_notify,
 };
-#endif /* CONFIG_LOCAL_TIMERS */
 
 static void __init exynos4_timer_resources(void)
 {
+   int err;
+   struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
struct clk *mct_clk;
mct_clk = clk_get(NULL, "xtal");
 
clk_rate = clk_get_rate(mct_clk);
 
-#ifdef CONFIG_LOCAL_TIMERS
if (mct_int_type == MCT_INT_PPI) {
-   int err;
 
err = request_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER,
 exynos4_mct_tick_isr, "MCT",
@@ -463,8 +476,16 @@ static void __init exynos4_timer_resources(void)
 EXYNOS_IRQ_MCT_LOCALTIMER, err);
}
 
-   local_timer_register(&exynos4_mct_tick_ops);
-#endif /* CONFIG_LOCAL_TIMERS */
+   err = register_cpu_notifier(&exynos4_mct_cpu_nb);
+   if (err)
+   goto out_irq;
+
+   /* Immediately configure the timer on the boot CPU */
+   exynos4_local_timer_setup(&mevt->evt);
+   return;
+
+out_irq:
+   free_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER, &percpu_mct_tick);
 }
 
 void __init exynos4_timer_init(void)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
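exynos4_timer_resources() now follows the usual acquire-then-unwind shape: request the IRQ, register the CPU notifier, configure the boot CPU directly (it is already online, so it will not see a CPU_STARTING event at this point), and on failure jump to a label that releases what was acquired. A userspace sketch of that control flow; every function here is a hypothetical stand-in, and the notifier failure is injected through a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool irq_held;      /* models the per-CPU IRQ being requested */
static int boot_setups;    /* times the boot CPU's timer was configured */

static int model_request_irq(void) { irq_held = true; return 0; }
static void model_free_irq(void)   { irq_held = false; }
static int model_register_notifier(bool fail) { return fail ? -ENOMEM : 0; }
static void model_boot_cpu_setup(void) { boot_setups++; }

static int model_timer_resources(bool notifier_fails)
{
	int err = model_request_irq();

	if (err)
		return err;

	err = model_register_notifier(notifier_fails);
	if (err)
		goto out_irq;

	/* The boot CPU is already running: configure it directly. */
	model_boot_cpu_setup();
	return 0;

out_irq:
	model_free_irq();   /* undo only what was acquired above */
	return err;
}
```

Unlike the patch, which returns void, the sketch returns the error so the unwind can be tested. In the patch the per-CPU IRQ is requested only when mct_int_type == MCT_INT_PPI, so a faithful port would need the same condition on its unwind path.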



[PATCH 2/8] ARM: smp_twd: Divorce smp_twd from local timer API

2013-02-21 Thread Stephen Boyd
Separate the smp_twd timers from the local timer API. This will
allow us to remove ARM local timer support in the near future and
gets us closer to moving this driver to drivers/clocksource.

Cc: Russell King 
Signed-off-by: Stephen Boyd 
---
 arch/arm/kernel/smp_twd.c | 48 +++
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index c092115..2439843 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -23,7 +24,6 @@
 #include 
 
 #include 
-#include 
 
 /* set up by the platform code */
 static void __iomem *twd_base;
@@ -32,7 +32,7 @@ static struct clk *twd_clk;
 static unsigned long twd_timer_rate;
 static DEFINE_PER_CPU(bool, percpu_setup_called);
 
-static struct clock_event_device __percpu **twd_evt;
+static struct clock_event_device __percpu *twd_evt;
 static int twd_ppi;
 
 static void twd_set_mode(enum clock_event_mode mode,
@@ -105,7 +105,7 @@ static void twd_update_frequency(void *new_rate)
 {
twd_timer_rate = *((unsigned long *) new_rate);
 
-   clockevents_update_freq(*__this_cpu_ptr(twd_evt), twd_timer_rate);
+   clockevents_update_freq(__this_cpu_ptr(twd_evt), twd_timer_rate);
 }
 
 static int twd_rate_change(struct notifier_block *nb,
@@ -131,7 +131,7 @@ static struct notifier_block twd_clk_nb = {
 
 static int twd_clk_init(void)
 {
-   if (twd_evt && *__this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk))
+   if (twd_evt && __this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk))
return clk_notifier_register(twd_clk, &twd_clk_nb);
 
return 0;
@@ -150,7 +150,7 @@ static void twd_update_frequency(void *data)
 {
twd_timer_rate = clk_get_rate(twd_clk);
 
-   clockevents_update_freq(*__this_cpu_ptr(twd_evt), twd_timer_rate);
+   clockevents_update_freq(__this_cpu_ptr(twd_evt), twd_timer_rate);
 }
 
 static int twd_cpufreq_transition(struct notifier_block *nb,
@@ -176,7 +176,7 @@ static struct notifier_block twd_cpufreq_nb = {
 
 static int twd_cpufreq_init(void)
 {
-   if (twd_evt && *__this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk))
+   if (twd_evt && __this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk))
return cpufreq_register_notifier(&twd_cpufreq_nb,
CPUFREQ_TRANSITION_NOTIFIER);
 
@@ -266,7 +266,6 @@ static void twd_get_clock(struct device_node *np)
  */
 static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
 {
-   struct clock_event_device **this_cpu_clk;
int cpu = smp_processor_id();
 
/*
@@ -275,7 +274,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
 */
if (per_cpu(percpu_setup_called, cpu)) {
__raw_writel(0, twd_base + TWD_TIMER_CONTROL);
-   clockevents_register_device(*__this_cpu_ptr(twd_evt));
+   clockevents_register_device(__this_cpu_ptr(twd_evt));
enable_percpu_irq(clk->irq, 0);
return 0;
}
@@ -296,9 +295,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
clk->set_mode = twd_set_mode;
clk->set_next_event = twd_set_next_event;
clk->irq = twd_ppi;
-
-   this_cpu_clk = __this_cpu_ptr(twd_evt);
-   *this_cpu_clk = clk;
+   clk->cpumask = cpumask_of(cpu);
 
clockevents_config_and_register(clk, twd_timer_rate,
0xf, 0x);
@@ -307,16 +304,32 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
return 0;
 }
 
-static struct local_timer_ops twd_lt_ops __cpuinitdata = {
-   .setup  = twd_timer_setup,
-   .stop   = twd_timer_stop,
+static int __cpuinit twd_timer_cpu_notify(struct notifier_block *self,
+  unsigned long action, void *hcpu)
+{
+   struct clock_event_device *evt = this_cpu_ptr(twd_evt);
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_STARTING:
+   twd_timer_setup(evt);
+   break;
+   case CPU_DYING:
+   twd_timer_stop(evt);
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block twd_timer_cpu_nb __cpuinitdata = {
+   .notifier_call = twd_timer_cpu_notify,
 };
 
 static int __init twd_local_timer_common_register(struct device_node *np)
 {
int err;
 
-   twd_evt = alloc_percpu(struct clock_event_device *);
+   twd_evt = alloc_percpu(struct clock_event_device);
if (!twd_evt) {
err = -ENOMEM;
goto out_free;
@@ -328,10 +341,13 @@ static int __init twd_local_timer_common_register(struct device_node *np)
goto out_free;
}
 
-   err = local_timer_register(&twd_lt_ops);
+   err = register_cpu_notifier(&twd_timer_cpu_nb);
if (err)
goto out_irq;
 
+   
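twd_evt changes from per-CPU pointers (filled in by the ARM local timer core) to per-CPU storage for the devices themselves, which is why every *__this_cpu_ptr(twd_evt) loses its leading dereference. A userspace model with plain arrays standing in for per-CPU data; all names and the CPU count are hypothetical. One consequence visible in the model: the address of a per-CPU slot is never NULL once the storage exists, so a NULL test on it no longer tells you whether setup has run.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count */

struct clock_event_model {
	unsigned long rate;   /* models the programmed timer rate */
};

/* Old layout: per-CPU slots hold pointers to storage owned elsewhere. */
static struct clock_event_model *evt_ptrs[NR_CPUS_MODEL];
/* New layout: per-CPU slots are the devices themselves. */
static struct clock_event_model evts[NR_CPUS_MODEL];

/* Old accessor: NULL until some setup path fills the slot. */
static struct clock_event_model *old_this_cpu(int cpu) { return evt_ptrs[cpu]; }
/* New accessor: always a valid address. */
static struct clock_event_model *new_this_cpu(int cpu) { return &evts[cpu]; }

static void update_freq(struct clock_event_model *evt, unsigned long rate)
{
	evt->rate = rate;
}
```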

[PATCH 8/8] ARM: smp: Remove local timer API

2013-02-21 Thread Stephen Boyd
There are no more users of this API, remove it.

Cc: Russell King 
Signed-off-by: Stephen Boyd 
---
 arch/arm/Kconfig  | 12 +--
 arch/arm/include/asm/localtimer.h | 34 
 arch/arm/kernel/smp.c | 67 ++-
 arch/arm/mach-omap2/Kconfig   |  1 -
 arch/arm/mach-omap2/timer.c   |  7 
 5 files changed, 11 insertions(+), 110 deletions(-)
 delete mode 100644 arch/arm/include/asm/localtimer.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..7d4338d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1527,6 +1527,7 @@ config SMP
depends on HAVE_SMP
depends on MMU
select HAVE_ARM_SCU if !ARCH_MSM_SCORPIONMP
+   select HAVE_ARM_TWD if (!ARCH_MSM_SCORPIONMP && !EXYNOS4_MCT)
select USE_GENERIC_SMP_HELPERS
help
  This enables support for systems with more than one CPU. If you have
@@ -1646,17 +1647,6 @@ config ARM_PSCI
  0022A ("Power State Coordination Interface System Software on
  ARM processors").
 
-config LOCAL_TIMERS
-   bool "Use local timer interrupts"
-   depends on SMP
-   default y
-   select HAVE_ARM_TWD if (!ARCH_MSM_SCORPIONMP && !EXYNOS4_MCT)
-   help
- Enable support for local timers on SMP platforms, rather then the
- legacy IPI broadcast method.  Local timers allows the system
- accounting to be spread across the timer interval, preventing a
- "thundering herd" at every timer tick.
-
 config ARCH_NR_GPIO
int
default 1024 if ARCH_SHMOBILE || ARCH_TEGRA
diff --git a/arch/arm/include/asm/localtimer.h b/arch/arm/include/asm/localtimer.h
deleted file mode 100644
index f77ffc1..000
--- a/arch/arm/include/asm/localtimer.h
+++ /dev/null
@@ -1,34 +0,0 @@
-/*
- *  arch/arm/include/asm/localtimer.h
- *
- *  Copyright (C) 2004-2005 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#ifndef __ASM_ARM_LOCALTIMER_H
-#define __ASM_ARM_LOCALTIMER_H
-
-#include 
-
-struct clock_event_device;
-
-struct local_timer_ops {
-   int  (*setup)(struct clock_event_device *);
-   void (*stop)(struct clock_event_device *);
-};
-
-#ifdef CONFIG_LOCAL_TIMERS
-/*
- * Register a local timer driver
- */
-int local_timer_register(struct local_timer_ops *);
-#else
-static inline int local_timer_register(struct local_timer_ops *ops)
-{
-   return -ENXIO;
-}
-#endif
-
-#endif
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 2d5197d..f628c79 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -41,7 +41,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -133,8 +132,6 @@ int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static void percpu_timer_stop(void);
-
 static int platform_cpu_kill(unsigned int cpu)
 {
if (smp_ops.cpu_kill)
@@ -178,11 +175,6 @@ int __cpuinit __cpu_disable(void)
migrate_irqs();
 
/*
-* Stop the local timer for this CPU.
-*/
-   percpu_timer_stop();
-
-   /*
 * Flush user cache and TLB mappings, and then remove this CPU
 * from the vm mask set of all processes.
 *
@@ -269,7 +261,7 @@ static void __cpuinit smp_store_cpu_info(unsigned int cpuid)
store_cpu_topology(cpuid);
 }
 
-static void percpu_timer_setup(void);
+static void broadcast_timer_setup(void);
 
 /*
  * This is the secondary CPU boot entry.  We're using this CPUs
@@ -325,9 +317,9 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
complete(&cpu_running);
 
/*
-* Setup the percpu timer for this CPU.
+* Setup the dummy broadcast timer for this CPU.
 */
-   percpu_timer_setup();
+   broadcast_timer_setup();
 
local_irq_enable();
local_fiq_enable();
@@ -375,10 +367,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
max_cpus = ncores;
if (ncores > 1 && max_cpus) {
/*
-* Enable the local timer or broadcast device for the
+* Enable the dummy broadcast device for the
 * boot CPU, but only if we have more than one CPU.
 */
-   percpu_timer_setup();
+   broadcast_timer_setup();
 
/*
 * Initialise the present map, which describes the set of CPUs
@@ -473,8 +465,12 @@ static void broadcast_timer_set_mode(enum clock_event_mode mode,
 {
 }
 
-static void __cpuinit broadcast_timer_setup(struct clock_event_device *evt)
+static void __cpuinit broadcast_timer_setup(void)
 {
+   unsigned int cpu = smp_processor_id();
+   struct clock_event_device *evt = &per_cpu(percpu_clockevent, cpu);
+
+   evt->cpumask= 

[PATCH 6/8] clocksource: time-armada-370-xp: Fix sparse warning

2013-02-21 Thread Stephen Boyd
drivers/clocksource/time-armada-370-xp.c:217:13: warning: symbol
'armada_370_xp_timer_init' was not declared. Should it be static?

Also remove the __init marking in the prototype as it's
unnecessary, and drop the init.h include.

Cc: Gregory CLEMENT 
Signed-off-by: Stephen Boyd 
---
 drivers/clocksource/time-armada-370-xp.c | 3 ++-
 include/linux/time-armada-370-xp.h   | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/time-armada-370-xp.c b/drivers/clocksource/time-armada-370-xp.c
index 47a6730..efe4aef 100644
--- a/drivers/clocksource/time-armada-370-xp.c
+++ b/drivers/clocksource/time-armada-370-xp.c
@@ -27,10 +27,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
-#include 
 /*
  * Timer block registers.
  */
diff --git a/include/linux/time-armada-370-xp.h b/include/linux/time-armada-370-xp.h
index dfdfdc0..6fb0856 100644
--- a/include/linux/time-armada-370-xp.h
+++ b/include/linux/time-armada-370-xp.h
@@ -11,8 +11,6 @@
 #ifndef __TIME_ARMADA_370_XPPRCMU_H
 #define __TIME_ARMADA_370_XPPRCMU_H
 
-#include 
-
-void __init armada_370_xp_timer_init(void);
+void armada_370_xp_timer_init(void);
 
 #endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/8] clocksource: time-armada-370-xp: Divorce from local timer API

2013-02-21 Thread Stephen Boyd
Separate the armada 370xp local timers from the local timer API.
This will allow us to remove ARM local timer support in the near
future and makes this driver multi-architecture friendly.

Cc: Gregory CLEMENT 
Signed-off-by: Stephen Boyd 
---
 drivers/clocksource/time-armada-370-xp.c | 85 ++--
 1 file changed, 38 insertions(+), 47 deletions(-)

diff --git a/drivers/clocksource/time-armada-370-xp.c b/drivers/clocksource/time-armada-370-xp.c
index efe4aef..ee2e50c5 100644
--- a/drivers/clocksource/time-armada-370-xp.c
+++ b/drivers/clocksource/time-armada-370-xp.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,7 +32,6 @@
 #include 
 
 #include 
-#include 
 /*
  * Timer block registers.
  */
@@ -70,7 +70,7 @@ static bool timer25Mhz = true;
  */
 static u32 ticks_per_jiffy;
 
-static struct clock_event_device __percpu **percpu_armada_370_xp_evt;
+static struct clock_event_device __percpu *armada_370_xp_evt;
 
 static u32 notrace armada_370_xp_read_sched_clock(void)
 {
@@ -143,14 +143,7 @@ armada_370_xp_clkevt_mode(enum clock_event_mode mode,
}
 }
 
-static struct clock_event_device armada_370_xp_clkevt = {
-   .name   = "armada_370_xp_per_cpu_tick",
-   .features   = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC,
-   .shift  = 32,
-   .rating = 300,
-   .set_next_event = armada_370_xp_clkevt_next_event,
-   .set_mode   = armada_370_xp_clkevt_mode,
-};
+static int armada_370_xp_clkevt_irq;
 
 static irqreturn_t armada_370_xp_timer_interrupt(int irq, void *dev_id)
 {
@@ -173,42 +166,53 @@ static int __cpuinit armada_370_xp_timer_setup(struct clock_event_device *evt)
u32 u;
int cpu = smp_processor_id();
 
-   /* Use existing clock_event for cpu 0 */
-   if (!smp_processor_id())
-   return 0;
-
u = readl(local_base + TIMER_CTRL_OFF);
if (timer25Mhz)
writel(u | TIMER0_25MHZ, local_base + TIMER_CTRL_OFF);
else
writel(u & ~TIMER0_25MHZ, local_base + TIMER_CTRL_OFF);
 
-   evt->name   = armada_370_xp_clkevt.name;
-   evt->irq= armada_370_xp_clkevt.irq;
-   evt->features   = armada_370_xp_clkevt.features;
-   evt->shift  = armada_370_xp_clkevt.shift;
-   evt->rating = armada_370_xp_clkevt.rating,
+   evt->name   = "armada_370_xp_per_cpu_tick",
+   evt->features   = CLOCK_EVT_FEAT_ONESHOT |
+ CLOCK_EVT_FEAT_PERIODIC;
+   evt->shift  = 32,
+   evt->rating = 300,
evt->set_next_event = armada_370_xp_clkevt_next_event,
evt->set_mode   = armada_370_xp_clkevt_mode,
+   evt->irq= armada_370_xp_clkevt_irq;
evt->cpumask= cpumask_of(cpu);
 
-   *__this_cpu_ptr(percpu_armada_370_xp_evt) = evt;
-
clockevents_config_and_register(evt, timer_clk, 1, 0xfffe);
enable_percpu_irq(evt->irq, 0);
 
return 0;
 }
 
-static void  armada_370_xp_timer_stop(struct clock_event_device *evt)
+static void __cpuinit armada_370_xp_timer_stop(struct clock_event_device *evt)
 {
evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
disable_percpu_irq(evt->irq);
 }
 
-static struct local_timer_ops armada_370_xp_local_timer_ops __cpuinitdata = {
-   .setup  = armada_370_xp_timer_setup,
-   .stop   =  armada_370_xp_timer_stop,
+static int __cpuinit armada_370_xp_timer_cpu_notify(struct notifier_block *self,
+  unsigned long action, void *hcpu)
+{
+   struct clock_event_device *evt = this_cpu_ptr(armada_370_xp_evt);
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_STARTING:
+   armada_370_xp_timer_setup(evt);
+   break;
+   case CPU_DYING:
+   armada_370_xp_timer_stop(evt);
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block armada_370_xp_timer_cpu_nb __cpuinitdata = {
+   .notifier_call = armada_370_xp_timer_cpu_notify,
 };
 
 void __init armada_370_xp_timer_init(void)
@@ -224,9 +228,6 @@ void __init armada_370_xp_timer_init(void)
 
if (of_find_property(np, "marvell,timer-25Mhz", NULL)) {
/* The fixed 25MHz timer is available so let's use it */
-   u = readl(local_base + TIMER_CTRL_OFF);
-   writel(u | TIMER0_25MHZ,
-  local_base + TIMER_CTRL_OFF);
u = readl(timer_base + TIMER_CTRL_OFF);
writel(u | TIMER0_25MHZ,
   timer_base + TIMER_CTRL_OFF);
@@ -236,9 +237,6 @@ void __init armada_370_xp_timer_init(void)
struct clk *clk = of_clk_get(np, 0);
WARN_ON(IS_ERR(clk));
rate =  clk_get_rate(clk);
-   u = readl(local_base + 

[PATCH 1/8] ARM: smp: Lower rating of dummy broadcast device

2013-02-21 Thread Stephen Boyd
In the near future the dummy broadcast device will always be
registered with the clockevent core. If the rating of the dummy
is higher than the rating of the real clockevent, the clockevents
core will try to replace the real clockevent with the dummy
broadcast. We don't want this to happen, so lower the rating to
something no good clockevent should choose.
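
Why the rating matters can be shown with a deliberately simplified userspace model (not the kernel's actual tick-layer code, which also checks cpumask and feature flags): a per-CPU slot keeps whichever device has the strictly higher rating. The struct and function names are hypothetical; 300 is the rating the Armada 370/XP per-CPU tick uses in patch 7/8.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct clock_event_device. */
struct sim_clockevent {
	const char *name;
	int rating;
};

/*
 * Simplified selection rule: keep the current device unless the newcomer
 * has a strictly higher rating. Returns the device that ends up in use.
 */
static const struct sim_clockevent *
sim_pick_device(const struct sim_clockevent *cur,
		const struct sim_clockevent *newdev)
{
	if (cur == NULL)
		return newdev;
	if (newdev->rating > cur->rating)
		return newdev;
	return cur;
}
```

With the old rating of 400 the dummy would displace a real 300-rated timer; at 100 it only wins when nothing better is registered.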

Cc: Russell King 
Signed-off-by: Stephen Boyd 
---
 arch/arm/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index fa86d1c..2d5197d 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -479,7 +479,7 @@ static void __cpuinit broadcast_timer_setup(struct clock_event_device *evt)
evt->features   = CLOCK_EVT_FEAT_ONESHOT |
  CLOCK_EVT_FEAT_PERIODIC |
  CLOCK_EVT_FEAT_DUMMY;
-   evt->rating = 400;
+   evt->rating = 100;
evt->mult   = 1;
evt->set_mode   = broadcast_timer_set_mode;
 


[PATCH 0/8] Remove ARM local timer API

2013-02-21 Thread Stephen Boyd
In light of Mark Rutland's recent work on divorcing the ARM architected
timers from the ARM local timer API and introducing a generic arch hook for
broadcast it seems that we should remove the local timer API entirely.
Doing so will reduce the architecture dependencies of our timer drivers,
reduce code in ARM core, and simplify timer drivers because they no longer
go through an architecture layer that is essentially a hotplug notifier.

Previous attempts[1] have been unsuccessful. I'm hoping this can
be accepted now so that we can clean up the timer drivers that are
used in both UP and SMP situations. Right now these drivers have to ignore
the timer setup callback on the boot CPU to avoid registering clockevents
twice. This is not very symmetric and leads to convoluted code that does
the same thing in two places.

Patches based on linux-next-20130221. Mostly compile tested as I don't
have access to the hardware.

[1] http://article.gmane.org/gmane.linux.ports.arm.kernel/145705

Note: A hotplug notifier is used by both x86 for the apb_timer (see 
apbt_cpuhp_notify) and by metag (see arch_timer_cpu_notify in
metag_generic.c) so this is not new.
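
The notifier shape the series moves to (compare armada_370_xp_timer_cpu_notify() in patch 7/8) can be modeled in userspace. This is a sketch: the SIM_ action codes are hypothetical stand-ins modeled on the kernel's CPU_* hotplug actions, the CPU number is passed explicitly instead of using this_cpu_ptr(), and how the boot CPU is handled is left out.

```c
#include <stdbool.h>

#define SIM_NR_CPUS 4

/* Hypothetical action codes, modeled on the CPU_* hotplug actions. */
#define SIM_CPU_DYING        0x0008UL
#define SIM_CPU_STARTING     0x000AUL
#define SIM_CPU_TASKS_FROZEN 0x0010UL
#define SIM_NOTIFY_OK        1

/* Whether each CPU currently has its per-CPU clockevent registered. */
static bool sim_timer_active[SIM_NR_CPUS];

static void sim_timer_setup(unsigned int cpu)
{
	sim_timer_active[cpu] = true;
}

static void sim_timer_stop(unsigned int cpu)
{
	sim_timer_active[cpu] = false;
}

/*
 * Strip the "tasks frozen" flag so suspend/resume transitions are handled
 * like ordinary hotplug, then set up or stop this CPU's timer.
 * Unrelated actions are ignored.
 */
static int sim_timer_cpu_notify(unsigned long action, unsigned int cpu)
{
	switch (action & ~SIM_CPU_TASKS_FROZEN) {
	case SIM_CPU_STARTING:
		sim_timer_setup(cpu);
		break;
	case SIM_CPU_DYING:
		sim_timer_stop(cpu);
		break;
	}
	return SIM_NOTIFY_OK;
}
```

The driver owns one setup and one stop routine; no architecture layer sits between it and the hotplug events.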

Stephen Boyd (8):
  ARM: smp: Lower rating of dummy broadcast device
  ARM: smp_twd: Divorce smp_twd from local timer API
  ARM: EXYNOS4: Divorce mct from local timer API
  ARM: PRIMA2: Divorce timer-marco from local timer API
  ARM: MSM: Divorce msm_timer from local timer API
  clocksource: time-armada-370-xp: Fix sparse warning
  clocksource: time-armada-370-xp: Divorce from local timer API
  ARM: smp: Remove local timer API

 arch/arm/Kconfig |  12 +--
 arch/arm/include/asm/localtimer.h|  34 -
 arch/arm/kernel/smp.c|  69 +++--
 arch/arm/kernel/smp_twd.c|  48 
 arch/arm/mach-exynos/mct.c   |  53 +
 arch/arm/mach-msm/timer.c| 125 +--
 arch/arm/mach-omap2/Kconfig  |   1 -
 arch/arm/mach-omap2/timer.c  |   7 --
 arch/arm/mach-prima2/timer-marco.c   |  98 
 drivers/clocksource/time-armada-370-xp.c |  88 ++
 include/linux/time-armada-370-xp.h   |   4 +-
 11 files changed, 241 insertions(+), 298 deletions(-)
 delete mode 100644 arch/arm/include/asm/localtimer.h



Re: [PATCH v2] memcg: Add memory.pressure_level events

2013-02-21 Thread Minchan Kim
On Thu, Feb 21, 2013 at 10:55:52PM -0800, Anton Vorontsov wrote:
> On Fri, Feb 22, 2013 at 08:56:08AM +0900, Minchan Kim wrote:
> > [...] My point is: do you have a plan to support it? I ask because
> > you said your goal is to replace the lowmemory killer
> 
> In short: yes, of course, if the non-memcg interface will be in demand.
> 
> > but Android doesn't enable CONFIG_MEMCG, as you know well,
> > so should they enable it just to use the notifier? Or do they need another
> > hack to connect the notifier to something global?
> 
> A hack is not an option for me. :-) My final goal is to switch Android to
> use the notifier without need for hacks/external patches or
> drivers/staging.
> 
> But my current goal is to make the most generic case work, and do this in
> the most correct way. That is, vmpressure + MEMCG. Once I accomplish this,
> I can then think of any niche needs (such as Android).
> 
> There will be two possibilities for Android:
> 
> 1. Obviously, turn on CONFIG_MEMCG. We need to measure its effect on real
>devices, and see if it makes sense. (Plus, maybe there are other uses
>for MEMCG on Android?)

I'd like to see this one.

> 
> or
> 
> 2. Implement /sys/fs/cgroups/memory/memory.pressure_level interface
>without MEMCG. Doing this will be really easy as we'll already have
>vmpressure() core, and Android has CGROUPS=y. But I do expect some
>discussion like 'why don't you fix memcg instead?'. We'll have to
>answer this question by looking back at '1.'

Of course.

> 
> Also note that cgroups vmpressure notifiers were tried by QEMU folks, and
> it seemed to be useful:
> 
>http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02821.html 

I saw that.

> 
> So, nowadays it is not only about Android. Some time ago I also got an
> email from Orna Agmon Ben-Yehuda, who suggested to use vmpressure stuff
> with 'memcached' (but I didn't find time to actually try it, so far. :(
> Thanks for the email, btw!).

I also received email from other people on the embedded side about
the memory notifier I worked on a long time ago, and I pointed them to
your work instead of my old solution.
It seems they don't use Android and have very little RAM, so they want
to handle memory very efficiently. For that purpose, I hope vmpressure
becomes tiny and can support even NOMMU systems.

> 
> So it is useful with or without MEMCG, and if we will really need to
> support vmpressure without MEMCG, I will have to implement the support in
> addition to MEMCG case, yes.

Thanks for your clarification.

> 
> Thanks,
> 
> Anton
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim


Re: Dangerous devm_request_irq() conversions

2013-02-21 Thread Dmitry Torokhov
On Fri, Feb 22, 2013 at 04:12:36PM +0900, Jingoo Han wrote:
> On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote:
> > 
> > Hi,
> > 
> > It looks like a whole slew of devm_request_irq() conversions just got
> > applied to mainline and many of them are quite broken.
> > 
> > Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or
> > c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first
> > free the IRQ and then unregister the corresponding device, ensuring that
> > the IRQ handler, while it runs, has the device available. The mechanical
> > conversion to devm_request_irq() reverses the order of these operations,
> > opening a race window where the IRQ handler can reference a device (or
> > other resource) that is already gone.
> > 
> > It would be nice if these could be reverted and reviewed again for
> > correctness.
> 
> Um, other RTC drivers were already using devm_request_threaded_irq() or
> devm_request_irq() like this before I added these patches.
> 
> For example, 
> ./drivers/rtc/rtc-tegra.c
> ./drivers/rtc/rtc-spear.c
> ./drivers/rtc/rtc-s3c.c
> ./drivers/rtc/rtc-mxc.c
> ./drivers/rtc/rtc-ds1553.c
> ./drivers/rtc/rtc-ds1511.c
> ./drivers/rtc/rtc-snvs.c
> ./drivers/rtc/rtc-imxdi.c
> ./drivers/rtc/rtc-tx4939.c
> ./drivers/rtc/rtc-mv.c
> ./drivers/rtc/rtc-coh901331.c
> ./drivers/rtc/rtc-stk17ta8.c
> ./drivers/rtc/rtc-lpc32xx.c
> ./drivers/rtc/rtc-tps65910.c
> ./drivers/rtc/rtc-rc5t583.c
> 
> 
> What's more, some RTC drivers call rtc_device_unregister() first,
> and then call free_irq() later.
> 
> For example,
> ./drivers/rtc/rtc-vr41xx.c
> ./drivers/rtc/rtc-da9052.c
> ./drivers/rtc/rtc-isl1208.c
> ./drivers/rtc/rtc-88pm860x.c
> ./drivers/rtc/rtc-tps6586x.c
> ./drivers/rtc/rtc-mpc5121.c
> ./drivers/rtc/rtc-m48t59.c
> 
> 
> Please don't argue for a revert without concrete reasons.

What more concrete reason do you need? I explained to you the exact
reason on the patches I noticed before and also on the 2 commits
referenced above: blind conversion to devm_* changes order of operation
which may be deadly with IRQs (but others, like clocks and regulators,
are important too).
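
The ordering problem can be shown with a toy model of devres: managed resources are released last-in, first-out, and only after the driver's remove() callback has returned. Everything here is a hypothetical userspace stand-in (sim_ names, event strings), not a kernel API; it models a driver that registers and unregisters its RTC device by hand.

```c
#include <string.h>

#define SIM_MAX_EVENTS 8
#define SIM_MAX_DEVRES 4

/* Teardown events, recorded in the order they happen. */
static const char *sim_log[SIM_MAX_EVENTS];
static int sim_nlog;

/* Toy devres list: released last-in, first-out. */
static const char *sim_devres[SIM_MAX_DEVRES];
static int sim_ndevres;

static void sim_record(const char *what)
{
	if (sim_nlog < SIM_MAX_EVENTS)
		sim_log[sim_nlog++] = what;
}

static void sim_devres_add(const char *release_event)
{
	if (sim_ndevres < SIM_MAX_DEVRES)
		sim_devres[sim_ndevres++] = release_event;
}

/* Runs after remove() returns, unwinding managed resources LIFO. */
static void sim_devres_release_all(void)
{
	while (sim_ndevres > 0)
		sim_record(sim_devres[--sim_ndevres]);
}

/*
 * One probe/remove cycle. irq_managed == 0: request_irq() in probe and
 * free_irq() in remove(). irq_managed == 1: devm_request_irq().
 */
static void sim_probe_and_remove(int irq_managed)
{
	sim_nlog = 0;
	sim_ndevres = 0;

	/* probe(): request the IRQ (the RTC registration is not logged). */
	if (irq_managed)
		sim_devres_add("free_irq");

	/* remove(): the driver's explicit teardown. */
	if (!irq_managed)
		sim_record("free_irq");
	sim_record("rtc_device_unregister");

	/* Driver core: devres cleanup after remove() returns. */
	sim_devres_release_all();
}

/* True if event number i in the log is `what`. */
static int sim_log_is(int i, const char *what)
{
	return i < sim_nlog && strcmp(sim_log[i], what) == 0;
}
```

In the managed case the log reads rtc_device_unregister, then free_irq: between the two, a still-installed handler can run against an unregistered device. Keeping the explicit free_irq() in remove(), or making every resource the handler touches devres-managed and acquired before the IRQ, keeps the IRQ released first.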

The fact that crap slipped into the kernel before is not a valid reason
for adding more of the same crap.

Please *understand* APIs you are using before making changes.

> 
> If these devm_request_threaded_irq() or devm_request_irq() calls cause a problem,
> devm_free_irq() will be added later.

And the point? If you use devm_request_irq() and then call
devm_free_irq() manually in all paths, all you have achieved is wasting
the memory required for devm_* tracking.

-- 
Dmitry


Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add

2013-02-21 Thread Amit Kucheria
On Fri, Feb 22, 2013 at 11:51 AM, Viresh Kumar  wrote:
> On Fri, Feb 22, 2013 at 11:26 AM, Kevin Hilman  wrote:
>> Add some accessor functions in order to facilitate the conversion to
>> atomic reads/writes of cpustat values.
>>
>> Signed-off-by: Kevin Hilman 
>> ---
>>  drivers/cpufreq/cpufreq_governor.c | 18 -
>>  drivers/cpufreq/cpufreq_ondemand.c |  2 +-
>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index 6c5f1d3..ec6c315 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>>
>> cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
>>
>> -   busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
>> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
>> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ];
>> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];
>> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
>> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
>> +   busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER);
>> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM);
>> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ);
>> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ);
>> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL);
>> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE);
>>
>> idle_time = cur_wall_time - busy_time;
>> if (wall)
>> @@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>> u64 cur_nice;
>> unsigned long cur_nice_jiffies;
>>
>> -   cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] -
>> +   cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) -
>>  cdbs->prev_cpu_nice;
>> /*
>>  * Assumption: nice time between sampling periods will
>> @@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>> cputime64_to_jiffies64(cur_nice);
>>
>> cdbs->prev_cpu_nice =
>> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>> +   kcpustat_cpu_get(j, CPUTIME_NICE);
>> idle_time += jiffies_to_usecs(cur_nice_jiffies);
>> }
>>
>> @@ -216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
>> &j_cdbs->prev_cpu_wall);
>> if (ignore_nice)
>> j_cdbs->prev_cpu_nice =
>> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>> +   kcpustat_cpu_get(j, CPUTIME_NICE);
>> }
>>
>> /*
>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
>> index 7731f7c..ac5d49f 100644
>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -403,7 +403,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
>> &dbs_info->cdbs.prev_cpu_wall);
>> if (od_tuners.ignore_nice)
>> dbs_info->cdbs.prev_cpu_nice =
>> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>> +   kcpustat_cpu_get(j, CPUTIME_NICE);
>>
>> }
>> return count;
>
> For cpufreq:
>
> Acked-by: Viresh Kumar 
>
> Though I believe you also need this:
>
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 64ef737..38e3ad7 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -242,7 +242,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
> &dbs_info->cdbs.prev_cpu_wall);
> if (cs_tuners.ignore_nice)
> dbs_info->cdbs.prev_cpu_nice =
> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> +   kcpustat_cpu_get(j, CPUTIME_NICE);
> }
> return count;
>  }
>
> BTW, I don't see kcpustat_cpu() used in
>
>  kernel/sched/core.c| 12 +---
>  kernel/sched/cputime.c | 29 +--
>
> I searched tip/master as well as lnext/master.

Added by Frederic's Adaptive NOHZ patchset?
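
The idea behind Kevin Hilman's accessors, routing every read and write through one helper so the storage can later switch to atomic operations, can be sketched in userspace with C11 atomics. The sim_ names, the three-entry enum subset, and the use of <stdatomic.h> are assumptions for illustration; this does not reproduce the patch's kcpustat_cpu_get() definition.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of the cpustat indices used above. */
enum sim_cpu_usage_stat {
	SIM_CPUTIME_USER,
	SIM_CPUTIME_NICE,
	SIM_CPUTIME_SYSTEM,
	SIM_NR_STATS,
};

#define SIM_NR_CPUS 2

/* Storage is touched only through the accessors, so it can be atomic. */
static _Atomic uint64_t sim_cpustat[SIM_NR_CPUS][SIM_NR_STATS];

static uint64_t sim_kcpustat_get(int cpu, enum sim_cpu_usage_stat idx)
{
	return atomic_load(&sim_cpustat[cpu][idx]);
}

static void sim_kcpustat_set(int cpu, enum sim_cpu_usage_stat idx, uint64_t v)
{
	atomic_store(&sim_cpustat[cpu][idx], v);
}

static void sim_kcpustat_add(int cpu, enum sim_cpu_usage_stat idx, uint64_t d)
{
	atomic_fetch_add(&sim_cpustat[cpu][idx], d);
}
```

Callers such as the cpufreq governors only change from indexing the array to calling the getter; the choice of storage stays private to the accessors.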

Re: [PATCH v4] mfd: syscon: Add non-DT support

2013-02-21 Thread Dong Aisheng
On Fri, Feb 22, 2013 at 11:01:18AM +0400, Alexander Shiyan wrote:
> > On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> > > This patch allows the syscon driver to be used from platform data, i.e.
> > > it makes the driver usable on systems without device tree support.
> > > To look up the syscon device from client drivers, the
> > > "syscon_regmap_lookup_by_pdevname" function was added.
> > > 
> > > Signed-off-by: Alexander Shiyan 
> > 
> > [...]
> > 
> > > + syscon->base = devm_ioremap_resource(dev, res);
> > > + if (!syscon->base)
> > 
> > Is this correct?
> 
> Hmm, of course IS_ERR should be used here...
> v5?
> 

Yes.
From here:
https://lkml.org/lkml/2013/1/21/140
It seems it is.

> > 
> > > + return -EBUSY;

This line could also be changed.
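
devm_ioremap_resource() reports failure with an error-encoded pointer rather than NULL, which is why a `!syscon->base` test never fires. A userspace re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() convention from include/linux/err.h (errors occupy the top MAX_ERRNO addresses) shows the difference. The sim_ names and the fake mapping helper are hypothetical; the fix under discussion would presumably be `if (IS_ERR(syscon->base)) return PTR_ERR(syscon->base);`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *sim_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* An error pointer lies in the last SIM_MAX_ERRNO addresses. */
static inline bool sim_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SIM_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static inline long sim_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical helper that fails the way devm_ioremap_resource() does. */
static void *sim_ioremap_resource(bool busy)
{
	static char fake_regs[16];

	return busy ? sim_err_ptr(-EBUSY) : fake_regs;
}
```

Because the error value is non-NULL, only the IS_ERR()-style check catches it, and PTR_ERR() passes the real errno up instead of a hard-coded -EBUSY.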

> > >
> > 
> > Otherwise, i'm also ok with this patch.
> > Acked-by: Dong Aisheng 
> > 
> > BTW, I did not see this new API in Samuel's tree.
> > So, who will pick this patch?
> 
> I have same question.

I CCed Thierry and Greg who may know it.

Regards
Dong Aisheng



Re: [PATCH 2/7] ksm: treat unstable nid like in stable tree

2013-02-21 Thread Ric Mason

On 02/21/2013 04:20 PM, Hugh Dickins wrote:

An inconsistency emerged in reviewing the NUMA node changes to KSM:
when meeting a page from the wrong NUMA node in a stable tree, we say
that it's okay for comparisons, but not as a leaf for merging; whereas
when meeting a page from the wrong NUMA node in an unstable tree, we
bail out immediately.


IIUC
- a KSM page from the wrong NUMA node will be added to the current node's
  stable tree
- a normal page from the wrong NUMA node will be merged into the current
  node's stable tree  <- what am I missing here? I didn't see any special
  handling in stable_tree_search() for this case.
- after the patch, a normal page from the wrong NUMA node will be used for
  comparison, but not as a leaf for merging




Now, it might be that a wrong NUMA node in an unstable tree is more
likely to correlate with instability (different content, with rbnode
now misplaced) than page migration; but even so, we are accustomed to
instability in the unstable tree.

Without strong evidence for which strategy is generally better, I'd
rather be consistent with what's done in the stable tree: accept a page
from the wrong NUMA node for comparison, but not as a leaf for merging.

Signed-off-by: Hugh Dickins 
---
  mm/ksm.c |   19 +--
  1 file changed, 9 insertions(+), 10 deletions(-)

--- mmotm.orig/mm/ksm.c 2013-02-20 22:28:23.584001392 -0800
+++ mmotm/mm/ksm.c  2013-02-20 22:28:27.288001480 -0800
@@ -1340,16 +1340,6 @@ struct rmap_item *unstable_tree_search_i
return NULL;
}
  
-		/*

-* If tree_page has been migrated to another NUMA node, it
-* will be flushed out and put into the right unstable tree
-* next time: only merge with it if merge_across_nodes.
-*/
-   if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) {
-   put_page(tree_page);
-   return NULL;
-   }
-
ret = memcmp_pages(page, tree_page);
  
  		parent = *new;

@@ -1359,6 +1349,15 @@ struct rmap_item *unstable_tree_search_i
} else if (ret > 0) {
put_page(tree_page);
new = &parent->rb_right;
+   } else if (!ksm_merge_across_nodes &&
+  page_to_nid(tree_page) != nid) {
+   /*
+* If tree_page has been migrated to another NUMA node,
+* it will be flushed out and put in the right unstable
+* tree next time: only merge with it when across_nodes.
+*/
+   put_page(tree_page);
+   return NULL;
} else {
*tree_pagep = tree_page;
return tree_rmap_item;

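The reordered check in unstable_tree_search_insert() can be modeled with a plain binary search tree: content comparison drives the descent regardless of node, and the NUMA test only decides whether an equal match may be returned for merging. This is a userspace sketch with hypothetical names; integers stand in for page contents and memcmp_pages(), and the real function also inserts a new node on a miss.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an unstable-tree node. */
struct sim_node {
	int content;              /* stands in for the page contents */
	int nid;                  /* NUMA node the tree page now lives on */
	struct sim_node *left, *right;
};

/*
 * Search for a merge candidate. A page on the wrong node still steers
 * the descent; it is only refused as the final merge target.
 */
static struct sim_node *sim_unstable_search(struct sim_node *root,
					     int content, int nid,
					     bool merge_across_nodes)
{
	struct sim_node *n = root;

	while (n) {
		if (content < n->content)
			n = n->left;
		else if (content > n->content)
			n = n->right;
		else if (!merge_across_nodes && n->nid != nid)
			return NULL;      /* equal, but on the wrong node */
		else
			return n;         /* equal and acceptable: merge */
	}
	return NULL;
}
```

In the test below the root sits on the "wrong" node, yet a search for other content still passes through it, which the old early bail-out would not have allowed.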


Re: Dangerous devm_request_irq() conversions

2013-02-21 Thread Jingoo Han
On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote:
> 
> Hi,
> 
> It looks like a whole slew of devm_request_irq() conversions just got
> applied to mainline and many of them are quite broken.
> 
> Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or
> c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first
> free the IRQ and then unregister the corresponding device, ensuring that
> the IRQ handler, while it runs, has the device available. The mechanical
> conversion to devm_request_irq() reverses the order of these operations,
> opening a race window where the IRQ handler can reference a device (or
> other resource) that is already gone.
> 
> It would be nice if these could be reverted and reviewed again for
> correctness.

Um, other RTC drivers were already using devm_request_threaded_irq() or
devm_request_irq() like this before I added these patches.

For example, 
./drivers/rtc/rtc-tegra.c
./drivers/rtc/rtc-spear.c
./drivers/rtc/rtc-s3c.c
./drivers/rtc/rtc-mxc.c
./drivers/rtc/rtc-ds1553.c
./drivers/rtc/rtc-ds1511.c
./drivers/rtc/rtc-snvs.c
./drivers/rtc/rtc-imxdi.c
./drivers/rtc/rtc-tx4939.c
./drivers/rtc/rtc-mv.c
./drivers/rtc/rtc-coh901331.c
./drivers/rtc/rtc-stk17ta8.c
./drivers/rtc/rtc-lpc32xx.c
./drivers/rtc/rtc-tps65910.c
./drivers/rtc/rtc-rc5t583.c


What's more, some RTC drivers call rtc_device_unregister() first,
and then call free_irq() later.

For example,
./drivers/rtc/rtc-vr41xx.c
./drivers/rtc/rtc-da9052.c
./drivers/rtc/rtc-isl1208.c
./drivers/rtc/rtc-88pm860x.c
./drivers/rtc/rtc-tps6586x.c
./drivers/rtc/rtc-mpc5121.c
./drivers/rtc/rtc-m48t59.c


Please don't argue for a revert without concrete reasons.

If these devm_request_threaded_irq() or devm_request_irq() calls cause a problem,
devm_free_irq() will be added later.


> 
> In general any conversion to devm_request_irq() needs double and triple
> checking.
> 
> Thanks.
> 
> --
> Dmitry



Re: [PATCH RFC 4/5] UBIFS: Add security.* XATTR support for the UBIFS

2013-02-21 Thread Artem Bityutskiy
OK, the lockdep warnings clearly tell the reason:

       CPU0                    CPU1

  lock(&ui->ui_mutex);
                               lock(&sb->s_type->i_mutex_key#10);
                               lock(&ui->ui_mutex);
  lock(&sb->s_type->i_mutex_key#10);

And then there are 2 tracebacks which are useful and show that you
unnecessarily initialize the inode security context while holding the
parent inode lock. I think you do not need to hold that lock. Move the
initialization out of the protected section.
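
The report is a classic AB-BA inversion: one path takes i_mutex and then ui_mutex, while the patched create path takes ui_mutex and then, inside ubifs_init_security(), i_mutex. A toy lock-order checker, far simpler than lockdep, records every "held A, then took B" pair and flags the reverse pair. Lock identifiers and functions are hypothetical; the second test scenario models doing the security initialization before ui_mutex is taken.

```c
#include <stdbool.h>

/* Two lock classes from the report. */
enum { SIM_UI_MUTEX, SIM_I_MUTEX, SIM_NLOCKS };

/* sim_order_seen[a][b]: lock b was taken at some point while a was held. */
static bool sim_order_seen[SIM_NLOCKS][SIM_NLOCKS];
static bool sim_held[SIM_NLOCKS];
static bool sim_inversion_found;

static void sim_lock(int l)
{
	for (int h = 0; h < SIM_NLOCKS; h++) {
		if (!sim_held[h] || h == l)
			continue;
		/* Taking l under h after having taken h under l: AB-BA. */
		if (sim_order_seen[l][h])
			sim_inversion_found = true;
		sim_order_seen[h][l] = true;
	}
	sim_held[l] = true;
}

static void sim_unlock(int l)
{
	sim_held[l] = false;
}

static void sim_reset(void)
{
	for (int a = 0; a < SIM_NLOCKS; a++) {
		sim_held[a] = false;
		for (int b = 0; b < SIM_NLOCKS; b++)
			sim_order_seen[a][b] = false;
	}
	sim_inversion_found = false;
}
```

Once the two locks are never held together on the create path, there is no ordering between them left to invert.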

See below my suggestions.

On Wed, 2013-02-13 at 11:23 +0100, Marc Kleine-Budde wrote:
> @@ -280,6 +280,10 @@ static int ubifs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
>   err = ubifs_jnl_update(c, dir, &dentry->d_name, inode, 0, 0);
>   if (err)
>   goto out_cancel;
> +
> + err = ubifs_init_security(dir, inode, &dentry->d_name);
> + if (err)
> + goto out_cancel;
>   mutex_unlock(&dir_ui->ui_mutex);

Can you move ubifs_init_security() up to before
'mutex_lock(&dir_ui->ui_mutex)'

> @@ -742,6 +746,10 @@ static int ubifs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
...
> + err = ubifs_init_security(dir, inode, &dentry->d_name);
> + if (err)
> + goto out_cancel;
>   mutex_unlock(&dir_ui->ui_mutex);

Ditto.

> @@ -818,6 +826,10 @@ static int ubifs_mknod(struct inode *dir, struct dentry *dentry,
...
> + err = ubifs_init_security(dir, inode, &dentry->d_name);
> + if (err)
> + goto out_cancel;
>   mutex_unlock(&dir_ui->ui_mutex);

Ditto.

> @@ -894,6 +906,10 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
...
> + err = ubifs_init_security(dir, inode, &dentry->d_name);
> + if (err)
> + goto out_cancel;
>   mutex_unlock(&dir_ui->ui_mutex);

Ditto.

> +int ubifs_init_security(struct inode *dentry, struct inode *inode,
> + const struct qstr *qstr)
> +{
> + int err;
> +
> + mutex_lock(>i_mutex);
> + err = security_inode_init_security(inode, dentry, qstr,
> +_initxattrs, 0);
> + mutex_unlock(>i_mutex);

I did not verify, but I doubt that you need i_mutex here, because you
only call this function when you create an inode, before it becomes
visible to VFS. Please, double-check this.

Thanks!

-- 
Best Regards,
Artem Bityutskiy



[PATCH] fusb300_udc: modify stall clear and idma reset procedure

2013-02-21 Thread Yuan-Hsin Chen
From: Yuan-Hsin Chen 

Due to a change in the fusb300 controller, the stall clear procedure must
be modified accordingly. This patch also fixes two software bugs: only
enter IDMA_RESET when the reset condition is met, and disable the
corresponding PRD interrupt in IDMA_RESET.
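
The patch uses two register idioms: the stall clear now sets a dedicated STL_CLR bit (bit 3 in the updated header) instead of clearing STL, and IDMA_RESET disables the PRD interrupt with a read-modify-write of IGER0. The sketch below models STL_CLR as a self-clearing command bit that also clears STL; that hardware behavior is an assumption inferred from the patch, not taken from a FUSB300 datasheet, and the IGER0 bit position used in the test is hypothetical.

```c
#include <stdint.h>

#define SIM_EPSET0_STL      (1u << 0)
#define SIM_EPSET0_STL_CLR  (1u << 3)

/* Hypothetical model of EPn_SET0: writing STL_CLR clears STL, and
 * STL_CLR itself always reads back as 0. */
static uint32_t sim_epset0;

static uint32_t sim_read_epset0(void)
{
	return sim_epset0;
}

static void sim_write_epset0(uint32_t val)
{
	if (val & SIM_EPSET0_STL_CLR)
		val &= ~(SIM_EPSET0_STL | SIM_EPSET0_STL_CLR);
	sim_epset0 = val;
}

/* Mirrors the flow of the patched fusb300_clear_epnstall(). */
static void sim_clear_stall(void)
{
	uint32_t reg = sim_read_epset0();

	if (reg & SIM_EPSET0_STL) {
		reg |= SIM_EPSET0_STL_CLR;
		sim_write_epset0(reg);
	}
}

/* Read-modify-write that disables one interrupt-enable bit and leaves
 * the others untouched, as the new IDMA_RESET path does for IGER0. */
static uint32_t sim_disable_irq_bit(uint32_t iger0, unsigned int bit)
{
	return iger0 & ~(1u << bit);
}
```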


Signed-off-by: Yuan-Hsin Chen 
---
 drivers/usb/gadget/fusb300_udc.c |9 ++---
 drivers/usb/gadget/fusb300_udc.h |2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/fusb300_udc.c b/drivers/usb/gadget/fusb300_udc.c
index 72cd5e6..109cab1 100644
--- a/drivers/usb/gadget/fusb300_udc.c
+++ b/drivers/usb/gadget/fusb300_udc.c
@@ -394,7 +394,7 @@ static void fusb300_clear_epnstall(struct fusb300 *fusb300, u8 ep)
 
if (reg & FUSB300_EPSET0_STL) {
printk(KERN_DEBUG "EP%d stall... Clear!!\n", ep);
-   reg &= ~FUSB300_EPSET0_STL;
+   reg |= FUSB300_EPSET0_STL_CLR;
iowrite32(reg, fusb300->reg + FUSB300_OFFSET_EPSET0(ep));
}
 }
@@ -930,9 +930,12 @@ static void fusb300_wait_idma_finished(struct fusb300_ep *ep)
 
fusb300_clear_int(ep->fusb300, FUSB300_OFFSET_IGR0,
FUSB300_IGR0_EPn_PRD_INT(ep->epnum));
+   return;
+
 IDMA_RESET:
-   fusb300_clear_int(ep->fusb300, FUSB300_OFFSET_IGER0,
-   FUSB300_IGER0_EEPn_PRD_INT(ep->epnum));
+   reg = ioread32(ep->fusb300->reg + FUSB300_OFFSET_IGER0);
+   reg &= ~FUSB300_IGER0_EEPn_PRD_INT(ep->epnum);
+   iowrite32(reg, ep->fusb300->reg + FUSB300_OFFSET_IGER0);
 }
 
 static void  fusb300_set_idma(struct fusb300_ep *ep,
diff --git a/drivers/usb/gadget/fusb300_udc.h b/drivers/usb/gadget/fusb300_udc.h
index 542cd83..ccae1b5 100644
--- a/drivers/usb/gadget/fusb300_udc.h
+++ b/drivers/usb/gadget/fusb300_udc.h
@@ -111,8 +111,8 @@
 /*
  * * EPn Setting 0 (EPn_SET0, offset = 020H+(n-1)*30H, n=1~15 )
  * */
+#define FUSB300_EPSET0_STL_CLR (1 << 3)
 #define FUSB300_EPSET0_CLRSEQNUM   (1 << 2)
-#define FUSB300_EPSET0_EPn_TX0BYTE (1 << 1)
 #define FUSB300_EPSET0_STL (1 << 0)
 
 /*
-- 
1.7.4.1
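The IDMA_RESET part of this patch fixes a goto fall-through: without the added `return;`, the normal completion path also ran the reset code. Below is a minimal userspace sketch of the fixed control flow. The simulated `iger0` register, the `PRD_INT_EN()` bit layout and the function name are hypothetical stand-ins for the driver's ioread32()/iowrite32() accesses and FUSB300_IGER0_EEPn_PRD_INT(), not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one PRD interrupt-enable bit per endpoint. */
#define PRD_INT_EN(ep) (1u << (ep))

/* Simulated IGER0 interrupt-enable register. */
static uint32_t iger0;

/* Sketch of the fixed flow: the success path returns early, so the
 * IDMA_RESET code only runs when the DMA did not finish. */
static void wait_idma_finished(int ep, bool finished)
{
	uint32_t reg;

	if (!finished)
		goto IDMA_RESET;

	/* ... acknowledge the PRD interrupt status here ... */
	return;	/* without this, control falls through into IDMA_RESET */

IDMA_RESET:
	/* Read-modify-write: clear only this endpoint's enable bit and
	 * leave the other endpoints' enables untouched. */
	reg = iger0;
	reg &= ~PRD_INT_EN(ep);
	iger0 = reg;
}
```

The read-modify-write mirrors the new IDMA_RESET body. Disabling the enable bit in IGER0 is a different operation from clearing a pending status bit, which is what the old fusb300_clear_int() call did.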



Re[2]: [PATCH v4] mfd: syscon: Add non-DT support

2013-02-21 Thread Alexander Shiyan
> On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> > This patch allow using syscon driver from the platform data, i.e.
> > possibility using driver on systems without oftree support.
> > For search syscon device from the client drivers,
> > "syscon_regmap_lookup_by_pdevname" function was added.
> > 
> > Signed-off-by: Alexander Shiyan 
> 
> [...]
> 
> > +   syscon->base = devm_ioremap_resource(dev, res);
> > +   if (!syscon->base)
> 
> Is this correct?

Hmm, of course IS_ERR should be used here...
v5?
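For context, devm_ioremap_resource() reports failure with an ERR_PTR()-encoded errno rather than NULL, so a `!syscon->base` test can never fire. The sketch below recreates the error-pointer helpers in userspace, modeled on include/linux/err.h with MAX_ERRNO = 4095. It uses a hypothetical `fake_ioremap_resource()` in place of the real call. It is an illustration of the check, not the v5 patch. Propagating PTR_ERR() instead of hardcoding -EBUSY is an assumption about how the fix would look.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers: errno
 * values -1..-MAX_ERRNO are encoded in the top of the address space,
 * so an error pointer is never NULL. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for devm_ioremap_resource(). */
static void *fake_ioremap_resource(int busy)
{
	static char regs[16];

	return busy ? ERR_PTR(-EBUSY) : regs;
}

/* The check as it would look in probe(): test with IS_ERR() and
 * propagate the encoded errno. */
static long map_regs(int busy, void **base)
{
	*base = fake_ioremap_resource(busy);
	if (IS_ERR(*base))
		return PTR_ERR(*base);
	return 0;
}
```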

> 
> > +   return -EBUSY;
> >
> 
> Otherwise, i'm also ok with this patch.
> Acked-by: Dong Aisheng 
> 
> BTW, i did not see Samuel's tree having this new API.
> So, who will pick this patch?

I have the same question.

---


Re: [PATCH v2] memcg: Add memory.pressure_level events

2013-02-21 Thread Anton Vorontsov
On Fri, Feb 22, 2013 at 08:56:08AM +0900, Minchan Kim wrote:
> [...] The my point is that you have a plan to support? Why I have a
> question is that you said your goal is to replace lowmemory killer

In short: yes, of course, if the non-memcg interface turns out to be in demand.

> but android don't have enabled CONFIG_MEMCG as you know well
> so they should enable it for using just notifier? or they need another hack to
> connect notifier to global thing?

A hack is not an option for me. :-) My final goal is to switch Android to
use the notifier without need for hacks/external patches or
drivers/staging.

But my current goal is to make the most generic case work, and do this in
the most correct way. That is, vmpressure + MEMCG. Once I accomplish this,
I can then think of any niche needs (such as Android).

There will be two possibilities for Android:

1. Obviously, turn on CONFIG_MEMCG. We need to measure its effect on real
   devices, and see if it makes sense. (Plus, maybe there are other uses
   for MEMCG on Android?)

or

2. Implement /sys/fs/cgroups/memory/memory.pressure_level interface
   without MEMCG. Doing this will be really easy as we'll already have
   the vmpressure() core, and Android has CGROUPS=y. But I do expect some
   discussion like 'why don't you fix memcg instead?'. We'll have to
   answer this question by looking back at '1.'
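For reference, here is a sketch of what a userspace listener for the cgroup interface above could look like. It assumes the cgroup-v1 event mechanism as eventually merged in mainline: an eventfd, an open fd of memory.pressure_level, and the line `<event_fd> <level_fd> <level>` written to cgroup.event_control, with levels "low", "medium" and "critical". The v2 patch under discussion may use different level names. Both function names are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Formats the registration line; returns its length or -1. */
static int format_pressure_event(char *buf, size_t len, int event_fd,
				 int level_fd, const char *level)
{
	int n;

	if (strcmp(level, "low") && strcmp(level, "medium") &&
	    strcmp(level, "critical"))
		return -1;
	n = snprintf(buf, len, "%d %d %s", event_fd, level_fd, level);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/* Registers an eventfd for @level in the memcg directory @cg.  Returns
 * the eventfd (each read of 8 bytes waits for a notification) or -1. */
static int register_pressure_listener(const char *cg, const char *level)
{
	char path[256], line[64];
	int efd, lfd, cfd, n;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;
	snprintf(path, sizeof(path), "%s/memory.pressure_level", cg);
	lfd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
	cfd = open(path, O_WRONLY);
	n = format_pressure_event(line, sizeof(line), efd, lfd, level);
	if (lfd < 0 || cfd < 0 || n < 0 || write(cfd, line, n) != n) {
		if (lfd >= 0)
			close(lfd);
		if (cfd >= 0)
			close(cfd);
		close(efd);
		return -1;
	}
	close(cfd);
	/* This sketch leaves lfd open for the lifetime of the listener. */
	return efd;
}
```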

Also note that cgroups vmpressure notifiers were tried by the QEMU folks, and
they seemed to be useful:

   http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02821.html 

So, nowadays it is not only about Android. Some time ago I also got an
email from Orna Agmon Ben-Yehuda, who suggested using vmpressure with
'memcached' (but I haven't found time to actually try it so far. :(
Thanks for the email, btw!).

So it is useful with or without MEMCG, and if we really need to
support vmpressure without MEMCG, I will implement that support in
addition to the MEMCG case, yes.

Thanks,

Anton


[PATCH] ARM: EXYNOS: Keep USB related LDOs always active on Origen

2013-02-21 Thread Tushar Behera
LDO3 and LDO8 power both the device and the host PHY controllers.
These regulators are not handled in the USB host driver, hence we get
unexpected behaviour when they are disabled elsewhere.

It would be best to keep these regulators always on.

Signed-off-by: Tushar Behera 
---

Based on v3.8.

 arch/arm/mach-exynos/mach-origen.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-exynos/mach-origen.c b/arch/arm/mach-exynos/mach-origen.c
index 5e34b9c..7351063 100644
--- a/arch/arm/mach-exynos/mach-origen.c
+++ b/arch/arm/mach-exynos/mach-origen.c
@@ -169,6 +169,7 @@ static struct regulator_init_data __initdata max8997_ldo3_data = {
.min_uV = 1100000,
.max_uV = 1100000,
.apply_uV   = 1,
+   .always_on  = 1,
.valid_ops_mask = REGULATOR_CHANGE_STATUS,
.state_mem  = {
.disabled   = 1,
@@ -227,6 +228,7 @@ static struct regulator_init_data __initdata max8997_ldo8_data = {
.min_uV = 3300000,
.max_uV = 3300000,
.apply_uV   = 1,
+   .always_on  = 1,
.valid_ops_mask = REGULATOR_CHANGE_STATUS,
.state_mem  = {
.disabled   = 1,
-- 
1.7.4.1
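Why `.always_on = 1` helps can be shown with a toy model. It is loosely based on the regulator core's rule that the hardware output is switched off only when the last consumer disables it and the constraints do not mark it always-on. The struct and functions are hypothetical and heavily simplified, not the framework's code.

```c
#include <stdbool.h>

/* Hypothetical, simplified regulator state. */
struct toy_regulator {
	int use_count;   /* consumers that currently hold it enabled */
	bool always_on;  /* mirrors .constraints.always_on */
	bool hw_enabled; /* whether the LDO output is actually on */
};

static void toy_enable(struct toy_regulator *r)
{
	r->use_count++;
	r->hw_enabled = true;
}

/* The output is cut only when the last consumer drops it and the
 * regulator is not marked always-on. */
static void toy_disable(struct toy_regulator *r)
{
	if (r->use_count == 0)
		return;
	if (--r->use_count == 0 && !r->always_on)
		r->hw_enabled = false;
}
```

In this model, an enable/disable pair from the OTG side alone is enough to cut power that the host PHY, which never took a reference, still relies on.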



[GIT PULL] arch/arc for v3.9-rc1

2013-02-21 Thread Vineet Gupta
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

I would like to introduce the Linux port to ARC Processors (from Synopsys) for
3.9-rc1. The patch-set has been discussed on the public lists since November and
has received a fair bit of review, especially from Arnd, tglx, Al and other
subsystem maintainers for DeviceTree and kgdb.

The arch bits are in arch/arc; there are also some asm-generic changes (acked by
Arnd) and a minor change to PARISC (acked by Helge).

The series is a bit bigger than usual for a new port, for 2 main reasons:
1. It enables a basic kernel in the first sub-series and adds ptrace/kgdb/..
   later.
2. Some of the review fallout (DeviceTree support, multi-platform-image support)
   was added on top of the original series, primarily to record the revision
   history.

Please consider pulling.

Thanks,
Vineet


The following changes since commit 949db153b6466c6f7cad5a427ecea94985927311:

  Linux 3.8-rc5 (2013-01-25 11:57:28 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git tags/arc-v3.9-rc1

for you to fetch changes up to fc32781bfdb56dad883469b65e468e749ef35fe5:

  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
(2013-02-15 23:16:22 +0530)

- 
Introducing Linux port to Synopsys ARC Processors (for 3.9-rc1)

This patchset contains architecture specific bits (arch/arc) to enable Linux on
ARC700 Processor and some minor adjustments to generic code (reviewed/acked).

- 
Gilad Ben-Yossef (1):
  ARC: Add support for ioremap_prot API

Mischa Jonker (1):
  ARC: kgdb support

Vineet Gupta (75):
  ARC: Generic Headers
  ARC: Build system: Makefiles, Kconfig, Linker script
  ARC: irqflags - Interrupt enabling/disabling at in-core intc
  ARC: Atomic/bitops/cmpxchg/barriers
  asm-generic headers: uaccess.h to conditionally define segment_eq()
  ARC: uaccess friends
  asm-generic: uaccess: Allow arches to over-ride __{get,put}_user_fn()
  ARC: [optim] uaccess __{get,put}_user() optimised
  asm-generic headers: Allow yet more arch overrides in checksum.h
  ARC: Checksum/byteorder/swab routines
  ARC: Fundamental ARCH data-types/defines
  ARC: Spinlock/rwlock/mutex primitives
  ARC: String library
  ARC: Low level IRQ/Trap/Exception Handling
  ARC: Interrupt Handling
  ARC: Non-MMU Exception Handling
  ARC: Syscall support (no-legacy-syscall ABI)
  ARC: Process-creation/scheduling/idle-loop
  ARC: Timers/counters/delay management
  ARC: Signal handling
  ARC: [Review] Preparing to fix incorrect syscall restarts due to signals
  ARC: [Review] Prevent incorrect syscall restarts
  ARC: Cache Flush Management
  ARC: Page Table Management
  ARC: MMU Context Management
  ARC: MMU Exception Handling
  ARC: TLB flush Handling
  ARC: Page Fault handling
  ARC: I/O and DMA Mappings
  ARC: Boot #1: low-level, setup_arch(), /proc/cpuinfo, mem init
  ARC: [plat-arcfpga] Static platform device for CONFIG_SERIAL_ARC
  ARC: [DeviceTree] Basic support
  ARC: [DeviceTree] Convert some Kconfig items to runtime values
  ARC: [plat-arcfpga]: Enabling DeviceTree for Angel4 board
  ARC: Last bits (stubs) to get to a running kernel with UART
  ARC: [plat-arcfpga] defconfig
  ARC: [optim] Cache "current" in Register r25
  ARC: ptrace support
  ARC: Futex support
  ARC: OProfile support
  ARC: Support for high priority interrupts in the in-core intc
  ARC: Module support
  ARC: Diagnostics: show_regs() etc
  ARC: SMP support
  ARC: DWARF2 .debug_frame based stack unwinder
  ARC: stacktracing APIs based on dw2 unwinder
  ARC: disassembly (needed by kprobes/kgdb/unaligned-access-emul)
  ARC: kprobes support
  sysctl: Enable PARISC "unaligned-trap" to be used cross-arch
  ARC: Unaligned access emulation
  ARC: Boot #2: Verbose Boot reporting / feature verification
  ARC: [plat-arfpga] BVCI Latency Unit setup
  perf, ARC: Enable building perf tools for ARC
  ARC: perf support (software counters only)
  ARC: Support for single cycle Close Coupled Mem (CCM)
  ARC: Hostlink Pseudo-Driver for Metaware Debugger
  ARC: UAPI Disintegrate arch/arc/include/asm
  ARC: [Review] Multi-platform image #1: Kconfig enablement
  ARC: Fold boards sub-menu into platform/SoC menu
  ARC: [Review] Multi-platform image #2: Board callback Infrastructure
  ARC: [Review] Multi-platform image #3: switch to board callback
  ARC: [Review] Multi-platform image #4: Isolate platform headers
  ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
  ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
  ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
  ARC: [Review] 

Re: [PATCH v4] mfd: syscon: Add non-DT support

2013-02-21 Thread Dong Aisheng
On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> This patch allow using syscon driver from the platform data, i.e.
> possibility using driver on systems without oftree support.
> For search syscon device from the client drivers,
> "syscon_regmap_lookup_by_pdevname" function was added.
> 
> Signed-off-by: Alexander Shiyan 

[...]

> + syscon->base = devm_ioremap_resource(dev, res);
> + if (!syscon->base)

Is this correct?

> + return -EBUSY;
>

Otherwise, I'm also OK with this patch.
Acked-by: Dong Aisheng 

BTW, I did not see this new API in Samuel's tree.
So, who will pick up this patch?

Regards
Dong Aisheng



Re: Origen board hang with functionfs

2013-02-21 Thread Tushar Behera
On 02/22/2013 12:03 AM, John Stultz wrote:
> On 02/20/2013 06:01 PM, John Stultz wrote:
>> Hey Kukjin, Andrzej,
>> I recently started playing around with functionfs, and have
>> noticed some strange behavior with my origen board.
>>
>> If I enable the FunctionFS gadget driver, I see the board hang at boot
>> here:
>>
>> [2.360000] USB Mass Storage support registered.
>> [2.365000] s3c-hsotg s3c-hsotg: regs f0040000, irq 103
>> [2.375000] s3c-hsotg s3c-hsotg: EPs:15
>> [2.380000] s3c-hsotg s3c-hsotg: dedicated fifos
>> [2.385000] g_ffs: file system registered

I think the issue is caused by the USB PHY regulators. LDO3 and LDO8
power the PHY regulators for OTG and HOST. These regulators are disabled
in the OTG probe, whereas they are not handled at all in the HOST driver.
Keeping these LDOs always active should solve the problem for the time being.

I will follow up with a patch shortly. But I am not sure whether this patch
will be considered for mainline, as board patches are not getting
accepted these days.

-- 
Tushar Behera


Dangerous devm_request_irq() conversions

2013-02-21 Thread Dmitry Torokhov
Hi,

It looks like a whole slew of devm_request_irq() conversions just got
applied to mainline and many of them are quite broken.

Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or
c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first
free the IRQ and then unregister the corresponding device, ensuring that
the IRQ handler, while it runs, has the device available. The mechanical
conversion to devm_request_irq() reverses the order of these operations,
opening a race window where the IRQ handler can reference a device (or
other resource) that is already gone.

It would be nice if these could be reverted and reviewed again for
correctness.

In general any conversion to devm_request_irq() needs double and triple
checking.
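The ordering problem can be modeled in a few lines. In the driver core, devres actions run in reverse order of registration, but only after the driver's remove() has returned. A devm-managed IRQ is therefore freed after anything remove() tears down by hand. Everything below is a hypothetical toy, and the logged strings merely name the real calls.

```c
#include <string.h>

#define MAX_EVENTS 8

/* Teardown log, so the ordering can be inspected. */
static const char *events[MAX_EVENTS];
static int nevents;

static void log_event(const char *what)
{
	if (nevents < MAX_EVENTS)
		events[nevents++] = what;
}

/* Toy devres list: release actions run LIFO after remove(). */
typedef void (*release_fn)(void);
static release_fn toy_devres[4];
static int toy_ndevres;

static void toy_devres_add(release_fn fn)
{
	if (toy_ndevres < 4)
		toy_devres[toy_ndevres++] = fn;
}

static void toy_release_all(void)
{
	while (toy_ndevres > 0)
		toy_devres[--toy_ndevres]();
}

static void free_irq_action(void) { log_event("free_irq"); }

/* Unbinding: the driver's remove() first, then managed resources. */
static void toy_unbind(void (*remove)(void))
{
	remove();
	toy_release_all();
}

/* Converted driver: the IRQ is devm-managed, remove() only unregisters. */
static void converted_remove(void) { log_event("rtc_device_unregister"); }

/* Original driver: free the IRQ first, then unregister the device. */
static void original_remove(void)
{
	log_event("free_irq");
	log_event("rtc_device_unregister");
}
```

With the converted driver, the device is unregistered while the IRQ can still fire. The original ordering frees the IRQ first.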

Thanks.

-- 
Dmitry


Re: [GIT PATCH] USB patches for 3.9-rc1

2013-02-21 Thread Thierry Reding
On Thu, Feb 21, 2013 at 01:58:39PM -0800, Greg KH wrote:
> On Thu, Feb 21, 2013 at 12:25:24PM -0800, Linus Torvalds wrote:
> > On Thu, Feb 21, 2013 at 10:40 AM, Greg KH  
> > wrote:
> > >
> > > USB patches for 3.9-rc1
> > >
> > > Here's the big USB merge for 3.9-rc1
> > >
> > > Nothing major, lots of gadget fixes, and of course, xhci stuff.
> > 
> > Ok, so there were a couple of conflicts with Thierry Reding's series
> > to convert devm_request_and_ioremap() users into
> > devm_ioremap_resource(), where some of the old users had been
> > converted to use other helper functions (eg omap_get_control_dev()).
> 
> That's fine.
> 
> > I left the omap_get_control_dev() users alone, but I do want to note
> > that omap_control_usb_probe() itself now uses that
> > devm_request_and_ioremap() function. And I did *not* extend the merge
> > to do that kind of conversion in the helper function, so I'm assuming
> > Thierry might want to extend his work. Assuming people care enough..
> 
> Yes, his plan was to do another sweep of the calls and hopefully remove
> the old api in 3.10 or so once that is all cleaned up.

Given that even devm_request_and_ioremap() is rather new and people have
been busy sending patches to use it, I had expected that the initial
series wouldn't catch all uses by the time it was merged. Grepping is easy
and I even have a semantic patch to help with the conversion, so I'll
keep an eye out for any new occurrences.

Thierry




Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Michael Wang
On 02/22/2013 01:02 PM, Mike Galbraith wrote:
> On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: 
>> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
>>>
>>>> But is this patch set really cause regression on your Q6600? It may
>>>> sacrificed some thing, but I still think it will benefit far more,
>>>> especially on huge systems.
>>>
>>> We spread on FORK/EXEC, and will no longer will pull communicating tasks
>>> back to a shared cache with the new logic preferring to leave wakee
>>> remote, so while no, I haven't tested (will try to find round tuit) it
>>> seems  it _must_ hurt.  Dragging data from one llc to the other on Q6600
>>> hurts a LOT.  Every time a client and server are cross llc, it's a huge
>>> hit.  The previous logic pulled communicating tasks together right when
>>> it matters the most, intermittent load... or interactive use.
>>
>> I agree that this is a problem need to be solved, but don't agree that
>> wake_affine() is the solution.
> 
> It's not perfect, but it's better than no countering force at all.  It's
> a relic of the dark ages, when affine meant L2, ie this cpu.  Now days,
> affine has a whole new meaning, L3, so it could be done differently, but
> _some_ kind of opposing force is required.
> 
>> According to my understanding, in the old world, wake_affine() will only
>> be used if curr_cpu and prev_cpu share cache, which means they are in
>> one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't
>> have the chance to spread the task out of that package.
> 
> ? affine_sd is the first domain spanning both cpus, that may be NODE.
> True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is
> set that is.  Would be nice to be able to do that without shredding
> performance.

That's right, we need two conditions in each select instance:
1. prev_cpu and curr_cpu are not affine
2. SD_WAKE_BALANCE

> 
> Off the top of my pointy head, I can think of a way to _maybe_ improve
> the "affine" wakeup criteria:  Add a small (package size? and very fast)
> FIFO queue to task struct, record waker/wakee relationship.  If
> relationship exists in that queue (rbtree), try to wake local, if not,
> wake remote.  The thought is to identify situations ala 1:N pgbench
> where you really need to keep the load spread.  That need arises when
> the sum wakees + waker won't fit in one cache.  True buddies would
> always hit (hm, hit rate), always try to become affine where they
> thrive.  1:N stuff starts missing when client count exceeds package
> size, starts expanding it's horizons. 'Course you would still need to
> NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls
> and whatnot.  With a little more smarts, we could have happy 1:N, and
> buddies don't have to chat through 2m thick walls to make 1:N scale as
> well as it can before it dies of stupidity.

So this is trying to take care of the case where curr_cpu (local) and
prev_cpu (remote) are on different nodes, in which wake_affine() won't
be invoked in the old world, correct?

Hmm... I think this may be a good additional check before entering the
balance path, but I can't estimate the cost of recording the relationship
at the moment...

In any case, once the affine logic is applied to the new world, it will
gain the ability to spread tasks across nodes just like the old world.
Your idea may be an optimization, but that logic is outside the changes
in this patch set, which means that if it helps, both the new and the
old code will benefit.
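To make the cost question concrete, here is a hypothetical sketch of the per-task waker/wakee FIFO Mike describes. It is not scheduler code, and it ignores the imbalance and NUMA vetoes he mentions. A small fixed history is scanned linearly, so each wakeup costs O(WAKEE_HIST). A hit means "buddy, try an affine wakeup" and a miss means "keep it spread".

```c
#include <stdbool.h>
#include <stddef.h>

#define WAKEE_HIST 4	/* hypothetical history size ("package size?") */

/* Hypothetical per-task record of recently woken tasks. */
struct toy_task {
	int pid;
	int recent_wakees[WAKEE_HIST];
	size_t next;	/* FIFO slot to overwrite next */
	size_t used;	/* slots filled so far */
};

/* Returns true if @wakee is already in @waker's history, otherwise
 * records it, evicting the oldest entry once the FIFO is full. */
static bool record_wakeup(struct toy_task *waker, int wakee)
{
	bool seen = false;
	size_t i;

	for (i = 0; i < waker->used; i++)
		if (waker->recent_wakees[i] == wakee)
			seen = true;

	if (!seen) {
		waker->recent_wakees[waker->next] = wakee;
		waker->next = (waker->next + 1) % WAKEE_HIST;
		if (waker->used < WAKEE_HIST)
			waker->used++;
	}
	return seen;
}
```

A 1:1 pair hits from its second wakeup onward. A 1:N waker that cycles through more clients than history slots misses every time, which is the "keep the load spread" signal.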

Regards,
Michael Wang

> 
> -Mike
> 



Re: sched: Fix signedness bug in yield_to()

2013-02-21 Thread Raghavendra KT
On Fri, Feb 22, 2013 at 4:56 AM, Marcelo Tosatti  wrote:
> On Thu, Feb 21, 2013 at 09:56:54AM +0100, Ingo Molnar wrote:
>>
>> * Shuah Khan  wrote:
>>
>> > On Tue, Feb 19, 2013 at 7:27 PM, Linux Kernel Mailing List
>> >  wrote:
>> > > Gitweb: 
>> > > http://git.kernel.org/linus/;a=commit;h=c3c186403c6abd32e719f005f0af950155a9e54d
>> > > Commit: c3c186403c6abd32e719f005f0af950155a9e54d
>> > > Parent: e0a79f529d5ba2507486d498b25da40911d95cf6
>> > > Author: Dan Carpenter 
>> > > AuthorDate: Tue Feb 5 14:37:51 2013 +0300
>> > > Committer:  Ingo Molnar 
>> > > CommitDate: Tue Feb 5 12:59:29 2013 +0100
>> > >
>> > > sched: Fix signedness bug in yield_to()
>> > >
>> > > In 7b270f6099 "sched: Bail out of yield_to when source and
>> > > target runqueue has one task" we changed this to store -ESRCH so
>> > > it needs to be signed.
>> >
>> > Dan, Ingo,
>> >
>> > I can't find the 7b270f6099 "sched: Bail out of yield_to when
>> > source and target runqueue has one task" in the latest Linus's
>> > git. Am I missing something.
>> >
>> > The current kenel/sched/core.c doesn't have the code from the
>> > associated patch https://patchwork.kernel.org/patch/2016651/
>>
>> As per the lkml discussion that one was supposed to go upstream
>> via the KVM tree. Marcelo?
>
> commit c3c186403c6abd32e719f005f0af950155a9e54d
> Author: Dan Carpenter 
> Date:   Tue Feb 5 14:37:51 2013 +0300
>
> sched: Fix signedness bug in yield_to()
>
> In 7b270f6099 "sched: Bail out of yield_to when source and
> target runqueue has one task" we changed this to store -ESRCH so
> it needs to be signed.
>
> Signed-off-by: Dan Carpenter 
> Cc: Peter Zijlstra 
> Cc: kbu...@01.org
> Cc: Steven Rostedt 
> Cc: Mike Galbraith 
> Link: http://lkml.kernel.org/r/20130205113751.GA20521@elgon.mountain
> Signed-off-by: Ingo Molnar 
>

IIUC, we are only changing the variable in yield_to() from bool to int.
I am curious whether we also need to change struct sched_class (sched.h):

 bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);
==>
int (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);

Otherwise we would be assigning a bool value to an int here:

 yielded = curr->sched_class->yield_to_task(rq, p, preempt);

This return value also cascades to kvm_main.c.

If we need to patch up the entire thing, I can cook up a correction patch.
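The signedness bug itself is plain C conversion semantics. Storing -ESRCH in a bool yields true, which is indistinguishable from a successful yield. Widening the local to int keeps the error distinct. A bool-returning hook assigned to that int still only ever produces 0 or 1. The helper names below are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Before the fix: the local was bool, so any nonzero value becomes 1. */
static bool store_in_bool(int value)
{
	bool yielded = value;
	return yielded;
}

/* After the fix: an int local keeps negative error codes. */
static int store_in_int(int value)
{
	int yielded = value;
	return yielded;
}

/* A bool hook result stored in an int is always 0 or 1. */
static int hook_result_into_int(bool hook_ret)
{
	int yielded = hook_ret;
	return yielded;
}
```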

Thanks and Regards
Raghu


Resend: [patch] hsi : Avoid race condition between HSI controller and HSI client when system restart and power down

2013-02-21 Thread xtu4
Avoid a race condition between the HSI controller and the HSI clients on
system restart and power-down.

hsi_isr_tasklet is disabled only when the HSI controller exits, but the
HSI clients are cleaned up before that, and this cleanup destroys the
spinlock used by hsi_isr_tasklet. If such a tasklet is still running
after the HSI client cleanup, the crash below occurs. Here is the stack:
hsi-ctrl: WAKEf514b000: f9a800e5 0010 6006  013f5bf9 
582bf9b8 3d565244

hsi-dlp: TTY device close request (mmgr, 133)
hsi-dlp: port shutdown request
mdm_ctrl: Unexpected RESET_OUT 0x0
BUG: spinlock bad magic on CPU#3, zygote/137
lock: f53a0fbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 137, comm: zygote Tainted: G C 3.0.34-141888-g9e0a6fb #1
Call Trace:
 [] ? printk+0x1d/0x1f
 [] spin_bug+0xa4/0xac
 [] do_raw_spin_lock+0x7d/0x170
 [] ? _raw_spin_unlock_irqrestore+0x26/0x50
 [] _raw_spin_lock_irqsave+0x2c/0x40
 [] complete+0x20/0x60
 [] ? _raw_spin_unlock_irqrestore+0x26/0x50
 [] dlp_ctrl_complete_tx+0x29/0x40
 [] hsi_isr_tasklet+0x394/0x11a0
 [] ? sched_clock_cpu+0xe5/0x150
 [] tasklet_hi_action+0x59/0x120
 [] ? it_real_fn+0x18/0xb0
 [] __do_softirq+0x9b/0x220
 [] ? remote_softirq_receive+0x110/0x110
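The patch's approach can be summarized as "check that the client's channel is still registered before touching its state in a completion callback". Below is a minimal single-threaded sketch of that guard. The table, struct and functions are hypothetical stand-ins for dlp_drv.channels[] and the dlp_*_complete_{tx,rx}() callbacks, and the sketch does not model concurrency between the tasklet and the cleanup.

```c
#include <stddef.h>

#define TOY_NUM_CHANNELS 4

/* Hypothetical client context whose state the tasklet would touch. */
struct toy_channel {
	int completions;
};

/* Hypothetical channel table, cleared by client cleanup. */
static struct toy_channel *toy_channels[TOY_NUM_CHANNELS];

/* Completion callback run from the controller's tasklet: bail out if
 * the client has already torn its channel down. */
static void toy_complete_tx(struct toy_channel *ctx, int ch)
{
	if (toy_channels[ch] == NULL)
		return;
	ctx->completions++;
}

static void toy_client_cleanup(int ch)
{
	toy_channels[ch] = NULL;
}
```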
Change-Id: I6a0ca0c14409bfc4cd7a2767a4f203c171ece007
Signed-off-by: xiaobing tu 
Signed-off-by: chao bi 
---
 drivers/hsi/clients/dlp_ctrl.c  |4 
 drivers/hsi/clients/dlp_flash.c |5 -
 drivers/hsi/clients/dlp_net.c   |4 
 drivers/hsi/clients/dlp_trace.c |1 -
 drivers/hsi/clients/dlp_tty.c   |5 -
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/hsi/clients/dlp_ctrl.c b/drivers/hsi/clients/dlp_ctrl.c
index b09f9e6..e980f0c 100644
--- a/drivers/hsi/clients/dlp_ctrl.c
+++ b/drivers/hsi/clients/dlp_ctrl.c
@@ -394,6 +394,8 @@ static void dlp_ctrl_complete_tx(struct hsi_msg *msg)
 struct dlp_command *dlp_cmd = msg->context;
 struct dlp_channel *ch_ctx = dlp_cmd->channel;

+if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL)
+return;
 dlp_cmd->status = (msg->status == HSI_STATUS_COMPLETED) ? 0 : -EIO;

 /* Command done, notify the sender */
@@ -433,6 +435,8 @@ static void dlp_ctrl_complete_rx(struct hsi_msg *msg)
 unsigned long flags;
 int hsi_channel, elp_channel, ret, response, msg_complete, state;

+if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL)
+return;
 /* Copy the reponse */
 memcpy(,
sg_virt(msg->sgt.sgl), sizeof(struct dlp_command_params));
diff --git a/drivers/hsi/clients/dlp_flash.c b/drivers/hsi/clients/dlp_flash.c
index 885b73a..b333d74 100644
--- a/drivers/hsi/clients/dlp_flash.c
+++ b/drivers/hsi/clients/dlp_flash.c
@@ -42,7 +42,6 @@
  */
 #define DLP_FLASH_NB_RX_MSG 10

-
 /*
  * struct flashing_driver - HSI Modem flashing driver protocol
  *
@@ -259,6 +258,8 @@ static struct hsi_msg *dlp_boot_rx_dequeue(struct dlp_channel *ch_ctx)
 */
 static void dlp_flash_complete_tx(struct hsi_msg *msg)
 {
+if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL)
+return;
 /* Delete the received msg */
 dlp_pdu_free(msg, -1);
 }
@@ -274,6 +275,8 @@ static void dlp_flash_complete_rx(struct hsi_msg *msg)
 struct dlp_flash_ctx *flash_ctx = ch_ctx->ch_data;
 int ret;

+if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL)
+return;
 if (msg->status != HSI_STATUS_COMPLETED) {
 pr_err(DRVNAME ": Invalid msg status: %d (ignored)\n",
 msg->status);
diff --git a/drivers/hsi/clients/dlp_net.c b/drivers/hsi/clients/dlp_net.c
index f3ca817..0c3e672 100644
--- a/drivers/hsi/clients/dlp_net.c
+++ b/drivers/hsi/clients/dlp_net.c
@@ -158,6 +158,8 @@ static void dlp_net_complete_tx(struct hsi_msg *pdu)
 struct dlp_net_context *net_ctx = ch_ctx->ch_data;
 struct dlp_xfer_ctx *xfer_ctx = _ctx->tx;

+if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL)
+return;
 /* TX done, free the skb */
 dev_kfree_skb(msg_param->skb);

@@ -197,6 +199,8 @@ static void dlp_net_complete_rx(struct hsi_msg *pdu)
 unsigned int *ptr;
 unsigned long flags;

+if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL)
+return;
 /* Pop the CTRL queue */
 write_lock_irqsave(_ctx->lock, flags);
 dlp_hsi_controller_pop(xfer_ctx);
diff --git a/drivers/hsi/clients/dlp_trace.c b/drivers/hsi/clients/dlp_trace.c
index fa91985..0067798 100644
--- a/drivers/hsi/clients/dlp_trace.c
+++ b/drivers/hsi/clients/dlp_trace.c
@@ -84,7 +84,6 @@ static unsigned int log_dropped_data;
 module_param_named(log_dropped_data, log_dropped_data, int, S_IRUGO | S_IWUSR);

 #endif

-
 /*
  *
  */
diff --git a/drivers/hsi/clients/dlp_tty.c b/drivers/hsi/clients/dlp_tty.c
index 7774484..47f6697 100644
--- a/drivers/hsi/clients/dlp_tty.c
+++ b/drivers/hsi/clients/dlp_tty.c
@@ -68,7 +68,6 @@ struct dlp_tty_context {
 struct work_structdo_tty_forward;
 };

-
 /**
  * Push as many RX PDUs  as possible to the controller FIFO
 

Re: Re: [PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()

2013-02-21 Thread MyungJoo Ham
> On 12:33-20130222, Wei Yongjun wrote:
> > From: Wei Yongjun 
> > 
> > Add the missing unlock before return from function
> > exynos4_busfreq_pm_notifier_event() in the error
> > handling case.
> > 
> > This issue introduced by commit 8fa938
> > (PM / devfreq: exynos4_bus: honor RCU lock usage)
> > 
> > Signed-off-by: Wei Yongjun 
> Arrgh.. Thanks for catching this :( My bad.
> 
> Fix looks good to me. upto  MyungJoo.

Applied to devfreq repository.
I'll send a pull request to Rafael soon, along with other patches.
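The fix follows the usual single-exit unlock pattern: every return path after the lock is taken goes through one label that drops it. The body of exynos4_busfreq_pm_notifier_event() is not shown in this thread, so the following is only a userspace pthread analogue with hypothetical names and a hypothetical error code.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical notifier body: the error path jumps to "out" instead of
 * returning directly, so the lock is always released. */
static int toy_pm_notifier_event(int fail)
{
	int err = 0;

	pthread_mutex_lock(&toy_lock);
	if (fail) {
		err = -EINVAL;
		goto out;	/* a bare "return err;" here would leak the lock */
	}
	/* ... normal suspend/resume handling ... */
out:
	pthread_mutex_unlock(&toy_lock);
	return err;
}
```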

> 
> MyungJoo, Rafael,
> btw, adding linux...@vger.kernel.org to  MAINTAINERS for devfreq might
> be a nice idea to have right audience.

It appears that replacing the current mailing list address with linux-pm is 
appropriate. If no one objects, I'll post the suggestion later.


Cheers,
MyungJoo


[patch] hsi : Avoid race condition between HSI controller and HSI client when system restart and power down

2013-02-21 Thread xtu4
Avoid a race condition between the HSI controller and the HSI clients on
system restart and power-down.

hsi_isr_tasklet is disabled only when the HSI controller exits, but the
HSI clients are cleaned up before that, and this cleanup destroys the
spinlock used by hsi_isr_tasklet. If such a tasklet is still running
after the HSI client cleanup, the crash below occurs. Here is the stack:
hsi-ctrl: WAKEf514b000: f9a800e5 0010 6006  013f5bf9 
582bf9b8 3d565244

hsi-dlp: TTY device close request (mmgr, 133)
hsi-dlp: port shutdown request
mdm_ctrl: Unexpected RESET_OUT 0x0
BUG: spinlock bad magic on CPU#3, zygote/137
lock: f53a0fbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 137, comm: zygote Tainted: G C  3.0.34-141888-g9e0a6fb #1
Call Trace:
 [] ? printk+0x1d/0x1f
 [] spin_bug+0xa4/0xac
 [] do_raw_spin_lock+0x7d/0x170
 [] ? _raw_spin_unlock_irqrestore+0x26/0x50
 [] _raw_spin_lock_irqsave+0x2c/0x40
 [] complete+0x20/0x60
 [] ? _raw_spin_unlock_irqrestore+0x26/0x50
 [] dlp_ctrl_complete_tx+0x29/0x40
 [] hsi_isr_tasklet+0x394/0x11a0
 [] ? sched_clock_cpu+0xe5/0x150
 [] tasklet_hi_action+0x59/0x120
 [] ? it_real_fn+0x18/0xb0
 [] __do_softirq+0x9b/0x220
 [] ? remote_softirq_receive+0x110/0x110
Change-Id: I6a0ca0c14409bfc4cd7a2767a4f203c171ece007
Signed-off-by: xiaobing tu 
Signed-off-by: chao bi 
---
 drivers/hsi/clients/dlp_ctrl.c  |4 
 drivers/hsi/clients/dlp_flash.c |5 -
 drivers/hsi/clients/dlp_net.c   |4 
 drivers/hsi/clients/dlp_trace.c |1 -
 drivers/hsi/clients/dlp_tty.c   |5 -
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/hsi/clients/dlp_ctrl.c b/drivers/hsi/clients/dlp_ctrl.c
index b09f9e6..e980f0c 100644
--- a/drivers/hsi/clients/dlp_ctrl.c
+++ b/drivers/hsi/clients/dlp_ctrl.c
@@ -394,6 +394,8 @@ static void dlp_ctrl_complete_tx(struct hsi_msg *msg)
 struct dlp_command *dlp_cmd = msg->context;
 struct dlp_channel *ch_ctx = dlp_cmd->channel;

+if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL)
+return;
 dlp_cmd->status = (msg->status == HSI_STATUS_COMPLETED) ? 0 : -EIO;

 /* Command done, notify the sender */
@@ -433,6 +435,8 @@ static void dlp_ctrl_complete_rx(struct hsi_msg *msg)
 unsigned long flags;
 int hsi_channel, elp_channel, ret, response, msg_complete, state;

+if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL)
+return;
 /* Copy the reponse */
 memcpy(,
sg_virt(msg->sgt.sgl), sizeof(struct dlp_command_params));
diff --git a/drivers/hsi/clients/dlp_flash.c b/drivers/hsi/clients/dlp_flash.c
index 885b73a..b333d74 100644
--- a/drivers/hsi/clients/dlp_flash.c
+++ b/drivers/hsi/clients/dlp_flash.c
@@ -42,7 +42,6 @@
  */
 #define DLP_FLASH_NB_RX_MSG 10

-
 /*
  * struct flashing_driver - HSI Modem flashing driver protocol
  *
@@ -259,6 +258,8 @@ static struct hsi_msg *dlp_boot_rx_dequeue(struct dlp_channel *ch_ctx)
 */
 static void dlp_flash_complete_tx(struct hsi_msg *msg)
 {
+if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL)
+return;
 /* Delete the received msg */
 dlp_pdu_free(msg, -1);
 }
@@ -274,6 +275,8 @@ static void dlp_flash_complete_rx(struct hsi_msg *msg)
 struct dlp_flash_ctx *flash_ctx = ch_ctx->ch_data;
 int ret;

+if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL)
+return;
 if (msg->status != HSI_STATUS_COMPLETED) {
 pr_err(DRVNAME ": Invalid msg status: %d (ignored)\n",
 msg->status);
diff --git a/drivers/hsi/clients/dlp_net.c b/drivers/hsi/clients/dlp_net.c
index f3ca817..0c3e672 100644
--- a/drivers/hsi/clients/dlp_net.c
+++ b/drivers/hsi/clients/dlp_net.c
@@ -158,6 +158,8 @@ static void dlp_net_complete_tx(struct hsi_msg *pdu)
 struct dlp_net_context *net_ctx = ch_ctx->ch_data;
 struct dlp_xfer_ctx *xfer_ctx = &net_ctx->tx;

+if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL)
+return;
 /* TX done, free the skb */
 dev_kfree_skb(msg_param->skb);

@@ -197,6 +199,8 @@ static void dlp_net_complete_rx(struct hsi_msg *pdu)
 unsigned int *ptr;
 unsigned long flags;

+if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL)
+return;
 /* Pop the CTRL queue */
 write_lock_irqsave(_ctx->lock, flags);
 dlp_hsi_controller_pop(xfer_ctx);
diff --git a/drivers/hsi/clients/dlp_trace.c b/drivers/hsi/clients/dlp_trace.c
index fa91985..0067798 100644
--- a/drivers/hsi/clients/dlp_trace.c
+++ b/drivers/hsi/clients/dlp_trace.c
@@ -84,7 +84,6 @@ static unsigned int log_dropped_data;
 module_param_named(log_dropped_data, log_dropped_data, int, S_IRUGO | S_IWUSR);

 #endif

-
 /*
  *
  */
diff --git a/drivers/hsi/clients/dlp_tty.c b/drivers/hsi/clients/dlp_tty.c
index 7774484..47f6697 100644
--- a/drivers/hsi/clients/dlp_tty.c
+++ b/drivers/hsi/clients/dlp_tty.c
@@ -68,7 +68,6 @@ struct dlp_tty_context {
 struct work_struct do_tty_forward;
 };

-
 /**
  * Push as many RX PDUs  as possible to the controller FIFO
  *
@@ 

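The functional hunks above all add the same early return to the HSI completion callbacks: if the channel's slot in dlp_drv.channels[] has already been cleared, the callback drops the message instead of dereferencing per-channel context. Below is a minimal userspace sketch of that guard, assuming a simple pointer table; `channels`, `struct channel`, `struct msg` and `complete_tx()` are hypothetical names, and the callback returns an int only so the outcome can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's channel table and message. */
#define NUM_CHANNELS 4
#define CHANNEL_CTRL 0

struct channel {
	int completed;		/* number of completions handled */
};

struct msg {
	int channel;		/* index into channels[] */
};

static struct channel *channels[NUM_CHANNELS];

/* Completion callback: bail out if the channel was already torn down. */
static int complete_tx(struct msg *m)
{
	struct channel *ch = channels[m->channel];

	if (ch == NULL)
		return -1;	/* channel gone: drop the completion */
	ch->completed++;
	return 0;
}
```

A bare NULL check like this narrows the race rather than closing it: unless channel teardown and the completion callbacks are serialized (for example by a lock, or by flushing outstanding messages before the slot is cleared), a callback can still pass the check just before the slot is cleared.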
Re: [PATCH v2] staging: comedi: drivers: usbduxsigma.c: fix DMA buffers on stack

2013-02-21 Thread Dan Carpenter
Looks good.

Reviewed-by: Dan Carpenter 

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add

2013-02-21 Thread Viresh Kumar
On Fri, Feb 22, 2013 at 11:26 AM, Kevin Hilman  wrote:
> Add some accessor functions in order to facilitate the conversion to
> atomic reads/writes of cpustat values.
>
> Signed-off-by: Kevin Hilman 
> ---
>  drivers/cpufreq/cpufreq_governor.c | 18 -
>  drivers/cpufreq/cpufreq_ondemand.c |  2 +-

> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 6c5f1d3..ec6c315 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>
> cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
>
> -   busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ];
> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];
> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
> -   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
> +   busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER);
> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM);
> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ);
> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ);
> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL);
> +   busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE);
>
> idle_time = cur_wall_time - busy_time;
> if (wall)
> @@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> u64 cur_nice;
> unsigned long cur_nice_jiffies;
>
> -   cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] -
> +   cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) -
>  cdbs->prev_cpu_nice;
> /*
>  * Assumption: nice time between sampling periods will
> @@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> cputime64_to_jiffies64(cur_nice);
>
> cdbs->prev_cpu_nice =
> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> +   kcpustat_cpu_get(j, CPUTIME_NICE);
> idle_time += jiffies_to_usecs(cur_nice_jiffies);
> }
>
> @@ -216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
> &j_cdbs->prev_cpu_wall);
> if (ignore_nice)
> j_cdbs->prev_cpu_nice =
> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> +   kcpustat_cpu_get(j, CPUTIME_NICE);
> }
>
> /*
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index 7731f7c..ac5d49f 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -403,7 +403,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
> 
> &dbs_info->cdbs.prev_cpu_wall);
> if (od_tuners.ignore_nice)
> dbs_info->cdbs.prev_cpu_nice =
> -   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> +   kcpustat_cpu_get(j, CPUTIME_NICE);
>
> }
> return count;

For cpufreq:

Acked-by: Viresh Kumar 

Though I believe you also need this:

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 64ef737..38e3ad7 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -242,7 +242,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
&dbs_info->cdbs.prev_cpu_wall);
if (cs_tuners.ignore_nice)
dbs_info->cdbs.prev_cpu_nice =
-   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+   kcpustat_cpu_get(j, CPUTIME_NICE);
}
return count;
 }

BTW, I don't see kcpustat_cpu() used in

 kernel/sched/core.c| 12 +---
 kernel/sched/cputime.c | 29 +--

I searched tip/master as well as lnext/master.


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Mike Galbraith
On Fri, 2013-02-22 at 14:06 +0800, Michael Wang wrote: 
> On 02/22/2013 01:08 PM, Mike Galbraith wrote:
> > On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote:
> > 
> >> According to the testing result, I could not agree this purpose of
> >> wake_affine() benefit us, but I'm sure that wake_affine() is a terrible
> >> performance killer when system is busy.
> > 
> > (hm, result is singular.. pgbench in 1:N mode only?)
> 
> I'm not sure about how pgbench implemented, all I know is it will create
> several instance and access the database, I suppose no different from
> several threads access database (1 server and N clients?).

It's user switchable.

-Mike



Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Mike Galbraith
On Fri, 2013-02-22 at 13:26 +0800, Michael Wang wrote:

> Just confirm that I'm not on the wrong way, did the 1:N mode here means
> 1 task forked N threads, and child always talk with father?

Yes, one server, many clients.

-Mike



[PATCH v2] staging: comedi: drivers: usbduxsigma.c: fix DMA buffers on stack

2013-02-21 Thread Kumar Amit Mehta
This patch fixes instances of a DMA buffer on the stack (passed to
usb_control_msg()) in the USB-DUXsigma board driver. Found using smatch.

Signed-off-by: Kumar Amit Mehta 
---
 drivers/staging/comedi/drivers/usbduxsigma.c |   27 --
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/comedi/drivers/usbduxsigma.c b/drivers/staging/comedi/drivers/usbduxsigma.c
index dc6b017..9e99a4b 100644
--- a/drivers/staging/comedi/drivers/usbduxsigma.c
+++ b/drivers/staging/comedi/drivers/usbduxsigma.c
@@ -681,7 +681,11 @@ static void usbduxsub_ao_IsocIrq(struct urb *urb)
 static int usbduxsub_start(struct usbduxsub *usbduxsub)
 {
int errcode = 0;
-   uint8_t local_transfer_buffer[16];
+   uint8_t *local_transfer_buffer;
+
+   local_transfer_buffer = kmalloc(16, GFP_KERNEL);
+   if (!local_transfer_buffer)
+   return -ENOMEM;
 
/* 7f92 to zero */
local_transfer_buffer[0] = 0;
@@ -702,19 +706,22 @@ static int usbduxsub_start(struct usbduxsub *usbduxsub)
  1,
  /* Timeout */
  BULK_TIMEOUT);
-   if (errcode < 0) {
+   if (errcode < 0)
dev_err(&usbduxsub->interface->dev,
"comedi_: control msg failed (start)\n");
-   return errcode;
-   }
-   return 0;
+
+   kfree(local_transfer_buffer);
+   return errcode;
 }
 
 static int usbduxsub_stop(struct usbduxsub *usbduxsub)
 {
int errcode = 0;
+   uint8_t *local_transfer_buffer;
 
-   uint8_t local_transfer_buffer[16];
+   local_transfer_buffer = kmalloc(16, GFP_KERNEL);
+   if (!local_transfer_buffer)
+   return -ENOMEM;
 
/* 7f92 to one */
local_transfer_buffer[0] = 1;
@@ -732,12 +739,12 @@ static int usbduxsub_stop(struct usbduxsub *usbduxsub)
  1,
  /* Timeout */
  BULK_TIMEOUT);
-   if (errcode < 0) {
+   if (errcode < 0)
dev_err(&usbduxsub->interface->dev,
"comedi_: control msg failed (stop)\n");
-   return errcode;
-   }
-   return 0;
+
+   kfree(local_transfer_buffer);
+   return errcode;
 }
 
 static int usbduxsub_upload(struct usbduxsub *usbduxsub,
-- 
1.7.9.5

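usb_control_msg() may hand its buffer to the host controller for DMA, and stack memory is not guaranteed to be DMA-capable, which is why the patch moves the 16-byte transfer buffer to kmalloc(). The userspace sketch below shows the same shape, a heap buffer with a single exit path that always frees it; `fake_control_msg()` and `send_one_byte()` are hypothetical, and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XFER_LEN 16

/* Hypothetical stand-in for usb_control_msg(): returns the number of
 * bytes "sent" on success or a negative errno on failure. */
static int fake_control_msg(const uint8_t *buf, int len, int fail)
{
	(void)buf;
	return fail ? -EIO : len;
}

/* Same structure as the patched usbduxsub_start()/usbduxsub_stop():
 * heap buffer, one exit path that frees it on success and on error. */
static int send_one_byte(uint8_t value, int fail)
{
	int errcode;
	uint8_t *buf = malloc(XFER_LEN);	/* kmalloc(16, GFP_KERNEL) in the driver */

	if (!buf)
		return -ENOMEM;

	memset(buf, 0, XFER_LEN);
	buf[0] = value;
	errcode = fake_control_msg(buf, 1, fail);

	free(buf);
	return errcode < 0 ? errcode : 0;	/* map "bytes sent" to 0 */
}
```

The sketch maps a non-negative transfer result to 0, which is what usbduxsub_start() and usbduxsub_stop() returned before the patch. As posted, the patched functions return errcode directly, which on success is usb_control_msg()'s transferred byte count rather than 0, so it may be worth checking how callers test the return value.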


[patch] remoteproc: off by one in rproc_virtio_new_vringh()

2013-02-21 Thread Dan Carpenter
The check should be >= ARRAY_SIZE() instead of > ARRAY_SIZE(), because
index is used as an array index: index == ARRAY_SIZE() is already one
past the end of the array.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index dba33ff..b5e3af5 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -208,7 +208,7 @@ rproc_virtio_new_vringh(struct virtio_device *vdev, unsigned index,
struct vringh *vrh;
int err;
 
-   if (index > ARRAY_SIZE(rvdev->vring)) {
+   if (index >= ARRAY_SIZE(rvdev->vring)) {
dev_err(&rvdev->vdev.dev, "bad vring index: %d\n", index);
return NULL;
}
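For an array of N elements the valid indices are 0 to N-1, so the rejection test has to be `index >= N`; with `>` the value N itself slips through and indexes one element past the end. A small userspace sketch, assuming a hypothetical two-element vring array (`struct vdev`, `get_vring()`) and the usual kernel-style ARRAY_SIZE() definition:

```c
#include <assert.h>
#include <stdio.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct vdev {
	int vring[2];		/* hypothetical: valid indices are 0 and 1 */
};

/* Returns the vring slot for index, or NULL when index is out of range. */
static int *get_vring(struct vdev *v, unsigned index)
{
	if (index >= ARRAY_SIZE(v->vring)) {	/* '>' would accept index == 2 */
		fprintf(stderr, "bad vring index: %u\n", index);
		return NULL;
	}
	return &v->vring[index];
}
```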


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Michael Wang
On 02/22/2013 01:08 PM, Mike Galbraith wrote:
> On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote:
> 
>> According to the testing result, I could not agree this purpose of
>> wake_affine() benefit us, but I'm sure that wake_affine() is a terrible
>> performance killer when system is busy.
> 
> (hm, result is singular.. pgbench in 1:N mode only?)

I'm not sure how pgbench is implemented; all I know is that it creates
several instances that access the database. I suppose that is no
different from several threads accessing the database (1 server and N
clients?).

There is an improvement because, when the system is busy, wake_affine()
is skipped.

In the old code, when the system is busy, wake_affine() is only
skipped if prev_cpu and curr_cpu belong to different nodes.

Regards,
Michael Wang



Re: [RFC/PATCH 2/5] kernel_cpustat: convert to atomic 64-bit accessors

2013-02-21 Thread Kevin Hilman
Frederic Weisbecker  writes:

> 2013/2/21 Frederic Weisbecker :
>> 2013/2/21 Kevin Hilman :
>>> Subject: [PATCH 2/5] kernel_cpustat: convert to atomic 64-bit accessors
>>>
>>> Use the atomic64_* accessors for all the kernel_cpustat fields to
>>> ensure atomic access on non-64 bit platforms.
>>>
>>> Thanks to Mats Liljegren for CGROUP_CPUACCT related fixes.
>>>
>>> Cc: Mats Liljegren 
>>> Signed-off-by: Kevin Hilman 
>>
>> Funny stuff, I thought struct kernel_cpustat was made of cputime_t
>> field. Actually it's u64. So the issue is independant from the new
>> full dynticks cputime accounting. It was already broken before.
>>
>> But yeah that's not the point, we still want to fix this anyway. But
>> let's just treat this patch as independant.

OK, I just sent an updated series based on your proposal.

Thanks for the review,

Kevin


[PATCH 1/2] cpustat: use accessor functions for get/set/add

2013-02-21 Thread Kevin Hilman
Add some accessor functions in order to facilitate the conversion to
atomic reads/writes of cpustat values.

Signed-off-by: Kevin Hilman 
---
 arch/s390/appldata/appldata_os.c   | 16 +++
 drivers/cpufreq/cpufreq_governor.c | 18 -
 drivers/cpufreq/cpufreq_ondemand.c |  2 +-
 drivers/macintosh/rack-meter.c |  6 +++---
 fs/proc/stat.c | 40 +++---
 fs/proc/uptime.c   |  2 +-
 include/linux/kernel_stat.h|  7 ++-
 kernel/sched/core.c| 12 +---
 kernel/sched/cputime.c | 29 +--
 9 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
index 87521ba..eff76f8 100644
--- a/arch/s390/appldata/appldata_os.c
+++ b/arch/s390/appldata/appldata_os.c
@@ -113,21 +113,21 @@ static void appldata_get_os_data(void *data)
j = 0;
for_each_online_cpu(i) {
os_data->os_cpu[j].per_cpu_user =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_USER));
os_data->os_cpu[j].per_cpu_nice =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_NICE));
os_data->os_cpu[j].per_cpu_system =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_SYSTEM));
os_data->os_cpu[j].per_cpu_idle =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IDLE));
os_data->os_cpu[j].per_cpu_irq =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IRQ));
os_data->os_cpu[j].per_cpu_softirq =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_SOFTIRQ));
os_data->os_cpu[j].per_cpu_iowait =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IOWAIT));
os_data->os_cpu[j].per_cpu_steal =
-   cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
+   cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_STEAL));
os_data->os_cpu[j].cpu_id = i;
j++;
}
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 6c5f1d3..ec6c315 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 
cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
 
-   busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
-   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
-   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ];
-   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];
-   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
-   busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
+   busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER);
+   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM);
+   busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ);
+   busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ);
+   busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL);
+   busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE);
 
idle_time = cur_wall_time - busy_time;
if (wall)
@@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
u64 cur_nice;
unsigned long cur_nice_jiffies;
 
-   cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] -
+   cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) -
 cdbs->prev_cpu_nice;
/*
 * Assumption: nice time between sampling periods will
@@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
cputime64_to_jiffies64(cur_nice);
 
cdbs->prev_cpu_nice =
-   kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+   kcpustat_cpu_get(j, CPUTIME_NICE);
idle_time += jiffies_to_usecs(cur_nice_jiffies);
}
 
@@ -216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,

Re: [PATCH v3 linux-next] cpufreq: ondemand: Calculate gradient of CPU load to early increase frequency

2013-02-21 Thread Viresh Kumar
On Fri, Feb 22, 2013 at 7:26 AM, Viresh Kumar  wrote:
> On 21 February 2013 23:09, Stratos Karafotis  wrote:

>> Instead of checking only the absolute value of CPU load_freq to increase
>> frequency, we detect forthcoming CPU load rise and increase frequency
>> earlier.
>>
>> Every sampling rate, we calculate the gradient of load_freq. If it is
>> too steep we assume that the load most probably will go over
>> up_threshold in next iteration(s) and we increase frequency immediately.
>>
>> New tuners are introduced:
>> - early_demand: to enable this functionality (disabled by default).
>> - grad_up_threshold: over this gradient of load we will increase
>> frequency immediately.
>>
>> Signed-off-by: Stratos Karafotis 
>
> Acked-by: Viresh Kumar 

Rafael,

I have applied it, with my Ack, on top of my patches here, to get a run
from the "kbuild test robot":

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-for-3.10
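The early-demand idea quoted above can be sketched as a per-sample decision function. This is a minimal illustration under stated assumptions, not the patch's code: it uses plain load percentages instead of load_freq, treats the gradient as the difference between two consecutive samples, and `struct od_tuners_sketch` and `want_freq_increase()` are hypothetical names; only the tunable names early_demand, up_threshold and grad_up_threshold come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tunables mirroring the names in the patch description. */
struct od_tuners_sketch {
	bool early_demand;		/* feature switch, off by default */
	unsigned int up_threshold;	/* absolute load threshold (%) */
	unsigned int grad_up_threshold;	/* load rise per sample that triggers a boost */
};

/* Decide whether to raise the frequency for this sample. */
static bool want_freq_increase(const struct od_tuners_sketch *t,
			       unsigned int prev_load, unsigned int cur_load)
{
	if (cur_load > t->up_threshold)
		return true;		/* classic ondemand behaviour */

	if (t->early_demand && cur_load > prev_load &&
	    cur_load - prev_load > t->grad_up_threshold)
		return true;		/* steep rise: raise frequency early */

	return false;
}
```

With up_threshold = 80 and grad_up_threshold = 50, a jump from 10% to 70% load stays below up_threshold but triggers an early increase once early_demand is enabled.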


[PATCH 2/2] cpustat: convert to atomic operations

2013-02-21 Thread Kevin Hilman
For non-64-bit platforms, convert the cpustat fields to the atomic64 type
so that reads and updates of cpustats are atomic on those platforms as well.

For 64-bit platforms, the cpustat field is left as u64 because on
64-bit, using atomic64_add will have the additional overhead of a lock.
We could also have used atomic64_set(atomic64_read() + delta), but on
32-bit platforms using the generic 64-bit ops (lib/atomic64.c), that
results in taking a lock twice.

Signed-off-by: Kevin Hilman 
---
 include/linux/kernel_stat.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index df8ad75..a433f87 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -32,7 +32,11 @@ enum cpu_usage_stat {
 };
 
 struct kernel_cpustat {
+#ifdef CONFIG_64BIT
u64 _cpustat[NR_STATS];
+#else
+   atomic64_t _cpustat[NR_STATS];
+#endif
 };
 
 struct kernel_stat {
@@ -51,11 +55,23 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat);
 #define kcpustat_this_cpu (&__get_cpu_var(kernel_cpustat))
 #define kstat_cpu(cpu) per_cpu(kstat, cpu)
 #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu)
+#ifdef CONFIG_64BIT
 #define kcpustat_cpu_get(cpu, i) (kcpustat_cpu(cpu)._cpustat[i])
 #define kcpustat_cpu_set(cpu, i, val) (kcpustat_cpu(cpu)._cpustat[i] = (val))
 #define kcpustat_cpu_add(cpu, i, val) (kcpustat_cpu(cpu)._cpustat[i] += (val))
 #define kcpustat_this_cpu_set(i, val) (kcpustat_this_cpu->_cpustat[i] = (val))
 #define kcpustat_this_cpu_add(i, val) (kcpustat_this_cpu->_cpustat[i] += (val))
+#else
+#define kcpustat_cpu_get(cpu, i) atomic64_read(&kcpustat_cpu(cpu)._cpustat[i])
+#define kcpustat_cpu_set(cpu, i, val) \
+   atomic64_set(val, &kcpustat_cpu(cpu)._cpustat[i])
+#define kcpustat_cpu_add(cpu, i, val) \
+   atomic64_add(val, &kcpustat_cpu(cpu)._cpustat[i])
+#define kcpustat_this_cpu_set(i, val) \
+   atomic64_set(val, &kcpustat_this_cpu->_cpustat[i])
+#define kcpustat_this_cpu_add(i, val) \
+   atomic64_add(val, &kcpustat_this_cpu->_cpustat[i])
+#endif
 
 extern unsigned long long nr_context_switches(void);
 
-- 
1.8.1.2

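Because call sites only go through the accessors, the storage type can be switched per architecture without touching them. A userspace analogue using C11 <stdatomic.h> is sketched below; SKETCH_64BIT is a hypothetical stand-in for CONFIG_64BIT, the enum and struct are illustrative, and atomic_load()/atomic_store()/atomic_fetch_add() play the roles of atomic64_read()/atomic64_set()/atomic64_add(). When mapping back to the kernel API, note that atomic64_read() and atomic64_set() take the atomic64_t pointer as their first argument, while atomic64_add() takes the value first and the pointer second.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum cpu_usage_stat_sketch { STAT_USER, STAT_NICE, STAT_SYSTEM, NR_STATS_SKETCH };

/* Define SKETCH_64BIT to 1 to model CONFIG_64BIT (plain u64 fields). */
#ifndef SKETCH_64BIT
#define SKETCH_64BIT 0
#endif

struct cpustat_sketch {
#if SKETCH_64BIT
	uint64_t stat[NR_STATS_SKETCH];
#else
	_Atomic uint64_t stat[NR_STATS_SKETCH];
#endif
};

#if SKETCH_64BIT
/* Native 64-bit loads/stores: plain accesses. */
#define cpustat_get(cs, i)      ((cs)->stat[i])
#define cpustat_set(cs, i, val) ((cs)->stat[i] = (val))
#define cpustat_add(cs, i, val) ((cs)->stat[i] += (val))
#else
/* Analogues of atomic64_read(), atomic64_set() and atomic64_add(). */
#define cpustat_get(cs, i)      atomic_load(&(cs)->stat[i])
#define cpustat_set(cs, i, val) atomic_store(&(cs)->stat[i], (val))
#define cpustat_add(cs, i, val) atomic_fetch_add(&(cs)->stat[i], (val))
#endif
```

On some 32-bit targets the compiler implements 64-bit C11 atomics through libatomic, so linking may need -latomic.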


[PATCH 0/2] cpustat: use atomic operations to read/update stats

2013-02-21 Thread Kevin Hilman
On 64-bit platforms, reads/writes of the various cpustat fields are
atomic due to native 64-bit loads/stores.  However, on non 64-bit
platforms, reads/writes of the cpustat fields are not atomic and could
lead to inconsistent statistics.

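To make the 32-bit problem concrete: if a 64-bit counter is loaded and stored as two 32-bit halves, a reader can combine the new low word with the old high word. The sketch below replays one such interleaving deterministically in a single thread; `struct split_u64`, `join()` and `torn_read_demo()` are hypothetical, and it only models the hazard, since the exact instructions a compiler emits for a u64 access on a 32-bit CPU depend on the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU may see it: two separate words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Replay one bad interleaving: the writer adds delta, but the
 * reader's two loads fall between the writer's two stores.
 */
static uint64_t torn_read_demo(struct split_u64 *c, uint64_t delta)
{
	uint64_t next = join(c->hi, c->lo) + delta;
	uint32_t seen_lo, seen_hi;

	c->lo = (uint32_t)next;		/* writer: store low word */
	seen_lo = c->lo;		/* reader: load low word (new) */
	seen_hi = c->hi;		/* reader: load high word (old) */
	c->hi = (uint32_t)(next >> 32);	/* writer: store high word */

	return join(seen_hi, seen_lo);
}
```

Here a counter at 0xFFFFFFFF is incremented to 0x100000000, but the reader observes 0, so the statistic appears to jump backwards.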
This problem was originally reported by Frederic Weisbecker as a
64-bit limitation with the nsec granularity cputime accounting for
full dynticks, but then we realized that it's a problem that's been
around for a while and not specific to the new cputime accounting.

This series fixes this by first converting all access to the cputime
fields to use accessor functions, and then converting the accessor
functions to use the atomic64 functions.

Implemented based on idea proposed by Frederic Weisbecker.

Kevin Hilman (2):
  cpustat: use accessor functions for get/set/add
  cpustat: convert to atomic operations

 arch/s390/appldata/appldata_os.c   | 16 +++
 drivers/cpufreq/cpufreq_governor.c | 18 -
 drivers/cpufreq/cpufreq_ondemand.c |  2 +-
 drivers/macintosh/rack-meter.c |  6 +++---
 fs/proc/stat.c | 40 +++---
 fs/proc/uptime.c   |  2 +-
 include/linux/kernel_stat.h| 11 ++-
 kernel/sched/core.c| 12 +---
 kernel/sched/cputime.c | 29 +--
 9 files changed, 70 insertions(+), 66 deletions(-)

-- 
1.8.1.2



[rebased][PATCH 0/4] acpi: do some changes for numa info

2013-02-21 Thread liguang
Just some trivial changes to make the handling of ACPI
NUMA info cleaner.

ChangeLog

v3->v4
 1. fix srat_disabled function
 spotted by Yasuaki Ishimatsu 

v2->v3
 1. rebase on linux-next
 2. bring back lost Makefile changes
 spotted by David Rientjes 
 spotted by Yasuaki Ishimatsu 

v1->v2
 1. fix-up several coding issues
 2. finish srat.c change
 spotted by David Rientjes 

Li Guang (4)
 acpi: move x86/mm/srat.c to x86/kernel/acpi/srat.c
 numa: avoid export acpi_numa variable
 acpi: add clock_domain field to acpi_srat_cpu_affinity
 remove include asm/acpi.h in process_driver.c

arch/x86/include/asm/acpi.h | 2 +-
arch/x86/kernel/acpi/Makefile   | 1 +
arch/x86/kernel/acpi/srat.c | 299 +
arch/x86/mm/Makefile| 1 -
arch/x86/mm/numa.c  | 2 +-
arch/x86/mm/srat.c  | 278 -
arch/x86/xen/enlighten.c| 2 +-
drivers/acpi/processor_driver.c | 1 -
include/acpi/actbl1.h   | 2 +-
9 files changed, 296 insertions(+), 292 deletions(-)
 create mode 100644 arch/x86/kernel/acpi/srat.c
 delete mode 100644 arch/x86/mm/srat.c
 


[rebased][PATCH 2/4] numa: avoid export acpi_numa variable

2013-02-21 Thread liguang
acpi_numa is used to prevent the SRAT table from
being parsed, so its name is a little misleading.
If 'noacpi' is specified on the command line and
CONFIG_ACPI_NUMA is enabled, acpi_numa is
modified directly from every place that needs to
disable/enable NUMA in ACPI mode, which is a bad
thing. So, export a function for disabling SRAT
table parsing instead.

Signed-off-by: liguang 
---
 arch/x86/include/asm/acpi.h |2 +-
 arch/x86/kernel/acpi/srat.c |   21 +
 arch/x86/mm/numa.c  |2 +-
 arch/x86/xen/enlighten.c|2 +-
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..449e12a 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -177,7 +177,7 @@ static inline void disable_acpi(void) { }
 #define ARCH_HAS_POWER_INIT1
 
 #ifdef CONFIG_ACPI_NUMA
-extern int acpi_numa;
+extern void disable_acpi_numa(void);
 extern int x86_acpi_numa_init(void);
 #endif /* CONFIG_ACPI_NUMA */
 
diff --git a/arch/x86/kernel/acpi/srat.c b/arch/x86/kernel/acpi/srat.c
index b20b5b7..469a0af 100644
--- a/arch/x86/kernel/acpi/srat.c
+++ b/arch/x86/kernel/acpi/srat.c
@@ -24,22 +24,27 @@
 #include 
 #include 
 
-int acpi_numa __initdata;
+static bool acpi_numa __initdata;
 
 static __init int setup_node(int pxm)
 {
return acpi_map_pxm_to_node(pxm);
 }
 
-static __init void bad_srat(void)
+void __init disable_acpi_numa(void)
 {
-   printk(KERN_ERR "SRAT: SRAT not used.\n");
-   acpi_numa = -1;
+   acpi_numa = false;
 }
 
-static __init inline int srat_disabled(void)
+static void __init bad_srat(void)
 {
-   return acpi_numa < 0;
+   disable_acpi_numa();
+   printk(KERN_ERR "SRAT: SRAT will not be used.\n");
+}
+
+static bool __init srat_disabled(void)
+{
+   return acpi_numa == false;
 }
 
 /* Callback for SLIT parsing */
@@ -88,7 +93,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
}
set_apicid_to_node(apic_id, node);
node_set(node, numa_nodes_parsed);
-   acpi_numa = 1;
+   acpi_numa = true;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
   pxm, apic_id, node);
 }
@@ -130,7 +135,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 
set_apicid_to_node(apic_id, node);
node_set(node, numa_nodes_parsed);
-   acpi_numa = 1;
+   acpi_numa = true;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
   pxm, apic_id, node);
 }
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 3545585..62e3b2a 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -47,7 +47,7 @@ static __init int numa_setup(char *opt)
 #endif
 #ifdef CONFIG_ACPI_NUMA
if (!strncmp(opt, "noacpi", 6))
-   acpi_numa = -1;
+   disable_acpi_numa();
 #endif
return 0;
 }
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bd4c134..724ac84 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1447,7 +1447,7 @@ asmlinkage void __init xen_start_kernel(void)
 * any NUMA information the kernel tries to get from ACPI will
 * be meaningless.  Prevent it from trying.
 */
-   acpi_numa = -1;
+   disable_acpi_numa();
 #endif
 
/* Don't do the full vcpu_info placement stuff until we have a
-- 
1.7.2.5

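The old int acpi_numa encoded three states, as used by the original bad_srat() and srat_disabled(): 0 before any SRAT entry is seen, -1 once the table is rejected or NUMA is disabled, and 1 after an affinity entry is accepted. A minimal userspace sketch of hiding such a flag behind accessor functions while keeping all three states explicit is shown below; `enum srat_state`, `disable_srat()`, `srat_is_disabled()` and `srat_mark_found()` are hypothetical names. With a plain bool that starts out false, a check like `acpi_numa == false` cannot tell "not parsed yet" apart from "disabled".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the values mirror the old int acpi_numa. */
enum srat_state {
	SRAT_DISABLED = -1,	/* bad table or "numa=noacpi" */
	SRAT_UNKNOWN  = 0,	/* nothing parsed yet (static zero-init) */
	SRAT_FOUND    = 1,	/* at least one affinity entry accepted */
};

static enum srat_state srat_state;	/* zero-initialized: SRAT_UNKNOWN */

static void disable_srat(void)
{
	srat_state = SRAT_DISABLED;
}

static bool srat_is_disabled(void)
{
	return srat_state == SRAT_DISABLED;
}

/* Called after an affinity entry is accepted; never re-enables. */
static void srat_mark_found(void)
{
	if (!srat_is_disabled())
		srat_state = SRAT_FOUND;
}
```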


[rebased][PATCH 4/4] remove include asm/acpi.h in process_driver.c

2013-02-21 Thread liguang
processor_driver.c includes linux/acpi.h, which already
includes asm/acpi.h, so remove the redundant include.

Reviewed-by: Yasuaki Ishimatsu 
Acked-by: David Rientjes 
Signed-off-by: liguang 
---
 drivers/acpi/processor_driver.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index df34bd0..341258a 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -53,7 +53,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
-- 
1.7.2.5



[rebased][PATCH 1/4] acpi: move x86/mm/srat.c to x86/kernel/acpi/srat.c

2013-02-21 Thread liguang
The SRAT table belongs to the ACPI domain only,
so mm/ does not seem to be the right place for it.

Reviewed-by: Yasuaki Ishimatsu 
Signed-off-by: liguang 
---
 arch/x86/kernel/acpi/Makefile |1 +
 arch/x86/kernel/acpi/srat.c   |  278 +
 arch/x86/mm/Makefile  |1 -
 arch/x86/mm/srat.c|  278 -
 4 files changed, 279 insertions(+), 279 deletions(-)
 create mode 100644 arch/x86/kernel/acpi/srat.c
 delete mode 100644 arch/x86/mm/srat.c

diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
index 163b225..98cea92 100644
--- a/arch/x86/kernel/acpi/Makefile
+++ b/arch/x86/kernel/acpi/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ACPI) += boot.o
 obj-$(CONFIG_ACPI_SLEEP)   += sleep.o wakeup_$(BITS).o
+obj-$(CONFIG_ACPI_NUMA)+= srat.o
 
 ifneq ($(CONFIG_ACPI_PROCESSOR),)
 obj-y  += cstate.o
diff --git a/arch/x86/kernel/acpi/srat.c b/arch/x86/kernel/acpi/srat.c
new file mode 100644
index 000..b20b5b7
--- /dev/null
+++ b/arch/x86/kernel/acpi/srat.c
@@ -0,0 +1,278 @@
+/*
+ * ACPI 3.0 based NUMA setup
+ * Copyright 2004 Andi Kleen, SuSE Labs.
+ *
+ * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
+ *
+ * Called from acpi_numa_init while reading the SRAT and SLIT tables.
+ * Assumes all memory regions belonging to a single proximity domain
+ * are in one chunk. Holes between them will be included in the node.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int acpi_numa __initdata;
+
+static __init int setup_node(int pxm)
+{
+   return acpi_map_pxm_to_node(pxm);
+}
+
+static __init void bad_srat(void)
+{
+   printk(KERN_ERR "SRAT: SRAT not used.\n");
+   acpi_numa = -1;
+}
+
+static __init inline int srat_disabled(void)
+{
+   return acpi_numa < 0;
+}
+
+/* Callback for SLIT parsing */
+void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
+{
+   int i, j;
+
+   for (i = 0; i < slit->locality_count; i++)
+   for (j = 0; j < slit->locality_count; j++)
+   numa_set_distance(pxm_to_node(i), pxm_to_node(j),
+   slit->entry[slit->locality_count * i + j]);
+}
+
+/* Callback for Proximity Domain -> x2APIC mapping */
+void __init
+acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
+{
+   int pxm, node;
+   int apic_id;
+
+   if (srat_disabled())
+   return;
+   if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) {
+   bad_srat();
+   return;
+   }
+   if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
+   return;
+   pxm = pa->proximity_domain;
+   apic_id = pa->apic_id;
+   if (!apic->apic_id_valid(apic_id)) {
+   printk(KERN_INFO "SRAT: PXM %u -> X2APIC 0x%04x ignored\n",
+pxm, apic_id);
+   return;
+   }
+   node = setup_node(pxm);
+   if (node < 0) {
+   printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
+   bad_srat();
+   return;
+   }
+
+   if (apic_id >= MAX_LOCAL_APIC) {
+   printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+   return;
+   }
+   set_apicid_to_node(apic_id, node);
+   node_set(node, numa_nodes_parsed);
+   acpi_numa = 1;
+   printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
+  pxm, apic_id, node);
+}
+
+/* Callback for Proximity Domain -> LAPIC mapping */
+void __init
+acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
+{
+   int pxm, node;
+   int apic_id;
+
+   if (srat_disabled())
+   return;
+   if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
+   bad_srat();
+   return;
+   }
+   if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
+   return;
+   pxm = pa->proximity_domain_lo;
+   if (acpi_srat_revision >= 2)
+   pxm |= *((unsigned int*)pa->proximity_domain_hi) << 8;
+   node = setup_node(pxm);
+   if (node < 0) {
+   printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
+   bad_srat();
+   return;
+   }
+
+   if (get_uv_system_type() >= UV_X2APIC)
+   apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
+   else
+   apic_id = pa->apic_id;
+
+   if (apic_id >= MAX_LOCAL_APIC) {
+   printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u 
skipped apicid that is too big\n", pxm, apic_id, node);
+   return;
+   }
+
+   set_apicid_to_node(apic_id, node);
+   node_set(node, numa_nodes_parsed);
+   
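In acpi_numa_slit_init() above, slit->entry is a locality_count x locality_count matrix of one-byte distances flattened row by row, so the distance from locality i to locality j sits at index locality_count * i + j. A minimal standalone sketch of that lookup follows; slit_distance() and the plain byte array are hypothetical stand-ins for the ACPI table, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel API: return the distance from
 * locality i to locality j in a SLIT-style matrix of one-byte entries
 * stored row by row (count * count entries in total).
 */
uint8_t slit_distance(const uint8_t *entry, size_t count, size_t i, size_t j)
{
	assert(i < count && j < count);	/* both localities must exist */
	return entry[count * i + j];	/* row i, column j */
}
```

With a two-locality table {10, 21, 21, 10}, slit_distance(table, 2, 0, 1) returns the remote distance 21, while the diagonal entries hold the normalized local distance 10.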

[rebased][PATCH 3/4] acpi: add clock_domain field to acpi_srat_cpu_affinity

2013-02-21 Thread liguang
According to the ACPI Specification v5.0, page 152, section 5.2.16.1
(Processor Local APIC/SAPIC Affinity Structure), the last member of
this structure is clock_domain, so rename the reserved field accordingly.

Reviewed-by: Yasuaki Ishimatsu 
Acked-by: David Rientjes 
Signed-off-by: liguang 
---
 include/acpi/actbl1.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 0bd750e..e21d22b 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -922,7 +922,7 @@ struct acpi_srat_cpu_affinity {
u32 flags;
u8 local_sapic_eid;
u8 proximity_domain_hi[3];
-   u32 reserved;   /* Reserved, must be zero */
+   u32 clock_domain;
 };
 
 /* Flags */
-- 
1.7.2.5
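The structure patched above splits the proximity domain between proximity_domain_lo (bits 7:0) and proximity_domain_hi[3] (bits 31:8), and acpi_numa_processor_affinity_init() recombines them when the SRAT revision is 2 or later. A hedged, byte-by-byte way to do the same recombination, independent of host endianness and without a 4-byte load from the 3-byte array, is sketched below. The struct name, the flattened type/length header and srat_pxm() are illustrative assumptions that only mirror the field order of acpi_srat_cpu_affinity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the SRAT Processor Local APIC/SAPIC Affinity
 * Structure; multi-byte fields are little-endian per the ACPI spec. */
struct srat_cpu_affinity_sketch {
	uint8_t  type;
	uint8_t  length;
	uint8_t  proximity_domain_lo;		/* domain bits 7:0 */
	uint8_t  apic_id;
	uint32_t flags;
	uint8_t  local_sapic_eid;
	uint8_t  proximity_domain_hi[3];	/* domain bits 31:8 */
	uint32_t clock_domain;
};

/* The SRAT entry is 16 bytes; natural alignment adds no padding here. */
static_assert(sizeof(struct srat_cpu_affinity_sketch) == 16,
	      "SRAT CPU affinity entries are 16 bytes");

/* Rebuild the 32-bit proximity domain one byte at a time. */
uint32_t srat_pxm(const struct srat_cpu_affinity_sketch *pa, int revision)
{
	uint32_t pxm = pa->proximity_domain_lo;

	if (revision >= 2)
		pxm |= (uint32_t)pa->proximity_domain_hi[0] << 8 |
		       (uint32_t)pa->proximity_domain_hi[1] << 16 |
		       (uint32_t)pa->proximity_domain_hi[2] << 24;
	return pxm;
}
```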

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] lib: devres: Fix misplaced #endif

2013-02-21 Thread Jingoo Han
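A misplaced #endif causes link errors related to the pcim_*() functions:
the #endif closing the CONFIG_HAS_IOPORT block sat at the end of the file,
so the CONFIG_PCI block was also compiled out whenever CONFIG_HAS_IOPORT
was not set. Move it to just after devm_ioport_unmap().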

Signed-off-by: Jingoo Han 
---
 lib/devres.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/devres.c b/lib/devres.c
index 88ad759..8235331 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -227,6 +227,7 @@ void devm_ioport_unmap(struct device *dev, void __iomem *addr)
   devm_ioport_map_match, (void *)addr));
 }
 EXPORT_SYMBOL(devm_ioport_unmap);
+#endif /* CONFIG_HAS_IOPORT */
 
 #ifdef CONFIG_PCI
 /*
@@ -432,4 +433,3 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
 }
 EXPORT_SYMBOL(pcim_iounmap_regions);
 #endif /* CONFIG_PCI */
-#endif /* CONFIG_HAS_IOPORT */
-- 
1.7.2.5




Re: thermal governor: does it actually work??

2013-02-21 Thread Peter Feuerer

Adding Boris,

sorry, I can't do anything currently, I'm down with influenza.

kind regards,
--peter;

Zhang Rui writes:


On Thu, 2013-02-14 at 16:32 +0100, Andreas Mohr wrote:

For me after having loaded acerhdf the fan never stops (with kernelmode
active), despite staying safely below trip point
(acerhdf_set_cur_state() actually never gets called).


BTW, could you please check if this one fixes the problem for you?
http://git.kernel.org/?p=linux/kernel/git/rzhang/linux.git;a=commit;h=b8bb6cb999858043489c1ddef08eed2127559169

thanks,
rui

And AFAIR in a 3.2.0 kernel acerhdf fan operation seemed to just work
(i.e., no fan for low temps, from the beginning).
Needless to say 3.2.0 didn't even feature all the modern thermal
governor crapyard yet ;)
(ok, well, it's more complex but it's also a very nice environment capability)

3.8-rc7:
CONFIG_ACPI_THERMAL=m
CONFIG_THERMAL=m
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_FAIR_SHARE=y
CONFIG_STEP_WISE=y
# CONFIG_USER_SPACE is not set
# CONFIG_CPU_THERMAL is not set



Terminology in this area seems to be quite a bit off, too, in several
places in the docs, at least according to my understanding:

e.g. drivers/thermal/step_wise.c has the following comment:

/**
 * step_wise_throttle - throttles devices associated with the given zone
 * @tz - thermal_zone_device
 * @trip - the trip point
 * @trip_type - type of the trip point
 *
 * Throttling Logic: This uses the trend of the thermal zone to
 * throttle.
 * If the thermal zone is 'heating up' this throttles all the cooling
 * devices associated with the zone and its particular trip point, by
 * one step. If the zone is 'cooling down' it brings back the performance of
 * the devices by one step.



if ... heating up ... throttles ...
Sorry, but at least for P4 clockmod stuff (or some such), throttle
states (P1...P8 IIRC) meant that the CPU operation was *reduced*,
i.e. with pause intervals.
And the translation of throttle clearly says that it does go that way
and not the other way...
(yes, you managed to confuse me that much that I even had to look up
things to verify)

... cooling down ... brings back ...
This should certainly be worded "reduces" or some such.
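The direction argued about here can be pinned down with a small model. In it, a higher cooling state means more cooling and less performance, so "throttling" on a rising trend moves one step up, and a falling trend moves one step back towards state 0. This is a hypothetical sketch of the rule as the quoted comment describes it, not the code in drivers/thermal/step_wise.c; the enum and function names are made up.

```c
#include <assert.h>

/* Hypothetical trend values; not the kernel's own enum. */
enum zone_trend { ZONE_STABLE, ZONE_HEATING, ZONE_COOLING };

/*
 * Next cooling state for one device. State 0 means no cooling;
 * max_state is the strongest cooling, i.e. the most throttled level.
 */
unsigned long step_wise_next(unsigned long cur, unsigned long max_state,
			     enum zone_trend trend)
{
	assert(cur <= max_state);

	switch (trend) {
	case ZONE_HEATING:
		/* Throttle: one step towards more cooling, capped at max. */
		return cur < max_state ? cur + 1 : cur;
	case ZONE_COOLING:
		/* Restore performance: one step back towards state 0. */
		return cur > 0 ? cur - 1 : cur;
	default:
		/* Stable trend: leave the device where it is. */
		return cur;
	}
}
```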

So, any idea why I'm missing callbacks in acerhdf (if that is what I'm
supposed to expect to happen)?
Kernel bug, .config mistake, missing/wrong user-side setup?

Needless to say, if this is a kernel bug, it ought to be fixed before 3.8, ideally.

Thanks,

Andreas Mohr






[GIT PULL] Update LZO compression code for v3.9

2013-02-21 Thread Markus F.X.J. Oberhumer
Hi Linus,

please pull my "lzo-update" branch from

  git://github.com/markus-oberhumer/linux.git lzo-update

You can also browse the branch at

  https://github.com/markus-oberhumer/linux/compare/lzo-update

The LZO update had actually been approved by akpm for a 3.7 merge and has
been available in linux-next since October, but I only recently learned
that there is no automatic flow from linux-next to mainline and that I
have to send a pull request personally.

Many thanks,
Markus


Summary:


Update the Linux kernel LZO compression and decompression code to the current
upstream version which features significant performance improvements
on modern machines.

$ git shortlog v3.8..lzo-update
Markus F.X.J. Oberhumer (3):
  lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c
  lib/lzo: Update LZO compression to current upstream version
  crypto: testmgr - update LZO compression test vectors

$ git diff --stat v3.8..lzo-update
 crypto/testmgr.h                |   38 +++--
 include/linux/lzo.h             |   15 +-
 lib/decompress_unlzo.c          |    2 +-
 lib/lzo/Makefile                |    2 +-
 lib/lzo/lzo1x_compress.c        |  335 ++
 lib/lzo/lzo1x_decompress.c      |  255 -
 lib/lzo/lzo1x_decompress_safe.c |  237 +++
 lib/lzo/lzodefs.h               |   38 +++--
 8 files changed, 488 insertions(+), 434 deletions(-)


Some *synthetic* benchmarks:


x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed  decompression speed

    LZO-2005    :      150 MB/sec          468 MB/sec
    LZO-2012    :      434 MB/sec         1210 MB/sec

i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed  decompression speed

    LZO-2005    :      143 MB/sec          409 MB/sec
    LZO-2012    :      372 MB/sec         1121 MB/sec

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed  decompression speed

    LZO-2005    :       27 MB/sec           84 MB/sec
    LZO-2012    :       44 MB/sec          117 MB/sec
  **LZO-2013-UA :       47 MB/sec          167 MB/sec

Legend:

  LZO-2005    : LZO version in the current 3.8 kernel (which is based on
                the LZO 2.02 release from 2005)
  LZO-2012    : updated LZO version available in linux-next
**LZO-2013-UA : updated LZO version available in linux-next plus experimental
                ARM Unaligned Access patch. This needs approval
                from an ARM maintainer and is NOT YET INCLUDED.
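The speedups implied by the synthetic figures above are plain ratios of new to old throughput. A trivial helper (the name is illustrative) makes the arithmetic explicit:

```c
#include <assert.h>

/* Ratio of new to old throughput, both in MB/sec; > 1.0 means faster. */
double lzo_speedup(double old_mb_s, double new_mb_s)
{
	assert(old_mb_s > 0.0);	/* avoid dividing by zero */
	return new_mb_s / old_mb_s;
}
```

On x86_64, lzo_speedup(150, 434) is about 2.89 for compression and lzo_speedup(468, 1210) about 2.59 for decompression; on armv7 the LZO-2012 gains are smaller, about 1.63 and 1.39.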


-- 
Markus Oberhumer, , http://www.oberhumer.com/


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Michael Wang
On 02/22/2013 01:02 PM, Mike Galbraith wrote:
> On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: 
>> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
>>>
>>>> But is this patch set really cause regression on your Q6600? It may
>>>> sacrificed some thing, but I still think it will benefit far more,
>>>> especially on huge systems.
>>>
>>> We spread on FORK/EXEC, and will no longer will pull communicating tasks
>>> back to a shared cache with the new logic preferring to leave wakee
>>> remote, so while no, I haven't tested (will try to find round tuit) it
>>> seems  it _must_ hurt.  Dragging data from one llc to the other on Q6600
>>> hurts a LOT.  Every time a client and server are cross llc, it's a huge
>>> hit.  The previous logic pulled communicating tasks together right when
>>> it matters the most, intermittent load... or interactive use.
>>
>> I agree that this is a problem need to be solved, but don't agree that
>> wake_affine() is the solution.
> 
> It's not perfect, but it's better than no countering force at all.  It's
> a relic of the dark ages, when affine meant L2, ie this cpu.  Now days,
> affine has a whole new meaning, L3, so it could be done differently, but
> _some_ kind of opposing force is required.
> 
>> According to my understanding, in the old world, wake_affine() will only
>> be used if curr_cpu and prev_cpu share cache, which means they are in
>> one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't
>> have the chance to spread the task out of that package.
> 
> ? affine_sd is the first domain spanning both cpus, that may be NODE.
> True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is
> set that is.  Would be nice to be able to do that without shredding
> performance.
> 
> Off the top of my pointy head, I can think of a way to _maybe_ improve
> the "affine" wakeup criteria:  Add a small (package size? and very fast)
> FIFO queue to task struct, record waker/wakee relationship.  If
> relationship exists in that queue (rbtree), try to wake local, if not,
> wake remote.  The thought is to identify situations ala 1:N pgbench
> where you really need to keep the load spread.  That need arises when
> the sum wakees + waker won't fit in one cache.  True buddies would
> always hit (hm, hit rate), always try to become affine where they
> thrive.  1:N stuff starts missing when client count exceeds package
> size, starts expanding it's horizons. 'Course you would still need to
> NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls
> and whatnot.  With a little more smarts, we could have happy 1:N, and
> buddies don't have to chat through 2m thick walls to make 1:N scale as
> well as it can before it dies of stupidity.

Just to confirm that I'm on the right track: does the 1:N mode here mean
1 task forks N threads, and the children always talk with the parent?

Regards,
Michael Wang

> 
> -Mike
> 
> 



Re: linux-next: build failure after merge of the final tree (drm tree related)

2013-02-21 Thread Rob Clark
Hmm, maybe DRM_GEM_CMA_HELPER should depend on ARM (or !PPC)?  Or
maybe there is an alternative fxn to use on other archs?

In truth, it is fine to make TILCDC depend on ARM, as it wouldn't be
used on any other platform (today.. until TI comes up with some crazy
chip w/ some TI DSP plus display controller), although that doesn't
quite feel like the right fix.  It would be nice to make the CMA
helpers do the right thing on other archs somehow.

BR,
-R

On Thu, Feb 21, 2013 at 11:17 PM, Stephen Rothwell wrote:
> Hi all,
>
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_buf_destroy':
> drivers/gpu/drm/drm_gem_cma_helper.c:38:2: error: implicit declaration of function 'dma_free_writecombine' [-Werror=implicit-function-declaration]
> drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_create':
> drivers/gpu/drm/drm_gem_cma_helper.c:61:2: error: implicit declaration of function 'dma_alloc_writecombine' [-Werror=implicit-function-declaration]
>
> Probably caused by commit 16ea975eac67 ("drm/tilcdc: add TI LCD
> Controller DRM driver (v4)") which forced CONFIG_DRM_GEM_CMA_HELPER to
> 'y'.  dma_alloc/free_writecombine are only defined on ARM.
>
> I added this patch for today.
>
> From: Stephen Rothwell 
> Date: Fri, 22 Feb 2013 15:14:50 +1100
> Subject: [PATCH] drm/tilcdc: only build on arm
>
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/gpu/drm/tilcdc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tilcdc/Kconfig b/drivers/gpu/drm/tilcdc/Kconfig
> index ae14fd6..d24d040 100644
> --- a/drivers/gpu/drm/tilcdc/Kconfig
> +++ b/drivers/gpu/drm/tilcdc/Kconfig
> @@ -1,6 +1,6 @@
>  config DRM_TILCDC
> tristate "DRM Support for TI LCDC Display Controller"
> -   depends on DRM && OF
> +   depends on DRM && OF && ARM
> select DRM_KMS_HELPER
> select DRM_KMS_CMA_HELPER
> select DRM_GEM_CMA_HELPER
> --
> 1.8.1
>
> --
> Cheers,
> Stephen Rothwell    s...@canb.auug.org.au


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Mike Galbraith
On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote:

> According to the testing result, I could not agree this purpose of
> wake_affine() benefit us, but I'm sure that wake_affine() is a terrible
> performance killer when system is busy.

(hm, result is singular.. pgbench in 1:N mode only?)



Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Michael Wang
On 02/22/2013 12:46 PM, Alex Shi wrote:
> On 02/22/2013 12:19 PM, Michael Wang wrote:
>>
>>>> Why not seek other way to change O(n^2) to O(n)?
>>>>
>>>> Access 2G memory is unbelievable performance cost.
>> Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N).
>>
>> And please notice that on 16k cpus system, topology will be deep if NUMA
>> enabled (O(log N) as Peter said), and that's really a good stage for
>> this idea to perform on, we could save lot's of recursed 'for' cycles.
>>
> 
> CPU execute part is very very fast compare to the memory access, the
> 'for' cycles cost is most on the memory access for many domain/groups
> data, not instruction execution.
> 
> In a hot patch, several KB memory access will cause clear cpu cache
> pollution then make kernel slowly.

Hmm...that's a good catch.

Comparing memory access with CPU execution, there is no doubt that CPU
execution is much faster; you are right.

But the same was true in the old world when accessing struct sched_domain,
wasn't it?

	for_each_domain(cpu, tmp) {
		if (weight <= tmp->span_weight)
			break;
		if (tmp->flags & sd_flag)
			sd = tmp;
	}

Both old and new may access data across nodes, but the old one accesses
it several times more often, doesn't it?
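The loop quoted above walks a CPU's scheduling domains from the smallest span upward, stops as soon as a domain's span_weight reaches weight, and remembers the last domain below that point whose flags include sd_flag. Each iteration dereferences another domain structure, which is where the memory-access cost discussed here comes from. A standalone model of that walk follows; struct domain_sketch, with a parent pointer in place of the kernel's per-CPU domain hierarchy, and pick_domain() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain: only the fields the
 * quoted loop reads, plus a link from each domain to its wider parent. */
struct domain_sketch {
	unsigned int span_weight;	/* number of CPUs spanned */
	int flags;
	struct domain_sketch *parent;	/* NULL at the top level */
};

/* Mirror of the quoted loop: the widest domain narrower than @weight
 * whose flags include @sd_flag, or NULL if there is none. */
struct domain_sketch *pick_domain(struct domain_sketch *base,
				  unsigned int weight, int sd_flag)
{
	struct domain_sketch *tmp, *sd = NULL;

	for (tmp = base; tmp; tmp = tmp->parent) {
		if (weight <= tmp->span_weight)
			break;		/* wide enough: stop walking up */
		if (tmp->flags & sd_flag)
			sd = tmp;	/* candidate; keep looking wider */
	}
	return sd;
}
```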

Regards,
Michael Wang

> 



Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Mike Galbraith
On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: 
> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
> > On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
> > 
> >> But is this patch set really cause regression on your Q6600? It may
> >> sacrificed some thing, but I still think it will benefit far more,
> >> especially on huge systems.
> > 
> > We spread on FORK/EXEC, and will no longer will pull communicating tasks
> > back to a shared cache with the new logic preferring to leave wakee
> > remote, so while no, I haven't tested (will try to find round tuit) it
> > seems  it _must_ hurt.  Dragging data from one llc to the other on Q6600
> > hurts a LOT.  Every time a client and server are cross llc, it's a huge
> > hit.  The previous logic pulled communicating tasks together right when
> > it matters the most, intermittent load... or interactive use.
> 
> I agree that this is a problem need to be solved, but don't agree that
> wake_affine() is the solution.

It's not perfect, but it's better than no countering force at all.  It's
a relic of the dark ages, when affine meant L2, ie this cpu.  Nowadays,
affine has a whole new meaning, L3, so it could be done differently, but
_some_ kind of opposing force is required.

> According to my understanding, in the old world, wake_affine() will only
> be used if curr_cpu and prev_cpu share cache, which means they are in
> one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't
> have the chance to spread the task out of that package.

? affine_sd is the first domain spanning both cpus, that may be NODE.
True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is
set that is.  Would be nice to be able to do that without shredding
performance.

Off the top of my pointy head, I can think of a way to _maybe_ improve
the "affine" wakeup criteria:  Add a small (package size? and very fast)
FIFO queue to task struct, record waker/wakee relationship.  If
relationship exists in that queue (rbtree), try to wake local, if not,
wake remote.  The thought is to identify situations ala 1:N pgbench
where you really need to keep the load spread.  That need arises when
the sum wakees + waker won't fit in one cache.  True buddies would
always hit (hm, hit rate), always try to become affine where they
thrive.  1:N stuff starts missing when client count exceeds package
size, starts expanding its horizons. 'Course you would still need to
NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls
and whatnot.  With a little more smarts, we could have happy 1:N, and
buddies don't have to chat through 2m thick walls to make 1:N scale as
well as it can before it dies of stupidity.
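The proposed criterion can be sketched as a tiny per-task FIFO of recently woken task IDs. A hit means waker and wakee look like buddies, so the wakee would be woken locally. A miss suggests a 1:N pattern where the load should stay spread. The ring size, the use of plain int IDs and all names below are assumptions for illustration. The rbtree, imbalance and NUMA vetoes mentioned above are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAKEE_RING_SIZE 4	/* hypothetical; "package size?" per the mail */

/* Hypothetical per-task record of recent wakees, oldest overwritten first. */
struct wakee_ring {
	int pids[WAKEE_RING_SIZE];
	size_t next;	/* slot to overwrite next (FIFO order) */
	size_t used;	/* number of valid slots */
};

/* Record a wakeup of @pid. Returns true if @pid was already known,
 * i.e. the relationship exists and a local (affine) wakeup is tried. */
bool wakee_ring_note(struct wakee_ring *r, int pid)
{
	assert(r != NULL);

	for (size_t i = 0; i < r->used; i++)
		if (r->pids[i] == pid)
			return true;	/* known partner: try to wake local */

	r->pids[r->next] = pid;	/* remember it, evicting the oldest */
	r->next = (r->next + 1) % WAKEE_RING_SIZE;
	if (r->used < WAKEE_RING_SIZE)
		r->used++;
	return false;		/* unknown partner: wake remote */
}
```

With a 1:N server that wakes more distinct clients than the ring holds, lookups keep missing and the wakees stay spread, while a true waker/wakee pair keeps hitting and stays affine.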

-Mike



[GIT PULL] x86/microcode for v3.9-rc1

2013-02-21 Thread H. Peter Anvin
Hi Linus,

This patchset lets us update the CPU microcode very, very early in
initialization if the BIOS fails to do so (never happens, right?).
This is handy for dealing with things like the Atom erratum where we
have to run without PSE because microcode loading happens too late.

As I mentioned in the x86/mm push request, it depends on that
infrastructure, but it is otherwise a standalone feature.



The following changes since commit ac2cbab21f318e19bc176a7f38a120cec835220f:

  x86: Don't panic if can not alloc buffer for swiotlb (2013-01-29 19:36:53 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/microcode

for you to fetch changes up to da76f64e7eb28b718501d15c1b79af560b7ca4ea:

  x86/Kconfig: Make early microcode loading a configuration feature (2013-01-31 13:20:42 -0800)



Fenghua Yu (12):
  x86, doc: Documentation for early microcode loading
  x86/microcode_intel.h: Define functions and macros for early loading ucode
  x86/common.c: Make have_cpuid_p() a global function
  x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP
  x86/microcode_core_early.c: Define interfaces for early loading ucode
  x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
  x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled()
  x86/microcode_intel_early.c: Early update ucode on Intel's CPU
  x86/head_32.S: Early update ucode in 32-bit
  x86/head64.c: Early update ucode in 64-bit
  x86/mm/init.c: Copy ucode from initrd image to kernel memory
  x86/Kconfig: Make early microcode loading a configuration feature

 Documentation/x86/early-microcode.txt   |  43 ++
 arch/x86/Kconfig                        |  18 +
 arch/x86/include/asm/microcode.h        |  14 +
 arch/x86/include/asm/microcode_intel.h  |  85
 arch/x86/include/asm/processor.h        |   8 +
 arch/x86/include/asm/tlbflush.h         |  18 +-
 arch/x86/kernel/Makefile                |   3 +
 arch/x86/kernel/cpu/common.c            |  17 +-
 arch/x86/kernel/head64.c                |   6 +
 arch/x86/kernel/head_32.S               |  11 +
 arch/x86/kernel/microcode_core.c        |   7 +-
 arch/x86/kernel/microcode_core_early.c  |  76 +++
 arch/x86/kernel/microcode_intel.c       | 198 ++--
 arch/x86/kernel/microcode_intel_early.c | 796
 arch/x86/kernel/microcode_intel_lib.c   | 174 +++
 arch/x86/mm/init.c                      |  10 +
 16 files changed, 1301 insertions(+), 183 deletions(-)

[full diff omitted due to length]



[PATCH] kexec: avoid freeing NULL pointer in function kimage_crash_alloc

2013-02-21 Thread Zhang Yanfei
Although freeing a NULL pointer is not an error, the unnecessary kfree()
can be avoided. Restructure the error paths in kimage_crash_alloc()
slightly so that kfree() is only called once image has been allocated.

Cc: "Eric W. Biederman" 
Cc: Andrew Morton 
Signed-off-by: Zhang Yanfei 
---
 kernel/kexec.c |   15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..4e96fa7 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -310,7 +310,7 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry,
mend = mstart + image->segment[i].memsz - 1;
/* Ensure we are within the crash kernel limits */
if ((mstart < crashk_res.start) || (mend > crashk_res.end))
-   goto out;
+   goto out_free;
}
 
/*
@@ -323,16 +323,15 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry,
   get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
-   goto out;
+   goto out_free;
}
 
-   result = 0;
-out:
-   if (result == 0)
-   *rimage = image;
-   else
-   kfree(image);
+   *rimage = image;
+   return 0;
 
+out_free:
+   kfree(image);
+out:
return result;
 }
 
-- 
1.7.1
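The reworked kimage_crash_alloc() uses the common kernel error-path layout: one label per amount of state to undo, with the success path returning early, so a failure before anything has been allocated skips the kfree() entirely. A userspace sketch of that layout follows; malloc()/free() stand in for the kernel allocators, and make_image(), struct image_sketch and the fail_at knob used to simulate failures are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct image_sketch {
	void *control_page;
};

/* Allocate an image and its control page. @fail_at simulates an
 * allocation failure: 1 = the image itself, 2 = the control page. */
int make_image(struct image_sketch **out_image, int fail_at)
{
	struct image_sketch *image;
	int result = -ENOMEM;

	assert(out_image != NULL);

	image = fail_at == 1 ? NULL : malloc(sizeof(*image));
	if (!image)
		goto out;		/* nothing allocated yet: skip free() */

	image->control_page = fail_at == 2 ? NULL : malloc(64);
	if (!image->control_page)
		goto out_free;		/* undo only the image allocation */

	*out_image = image;		/* success: caller owns both buffers */
	return 0;

out_free:
	free(image);
out:
	return result;
}
```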


Re: [PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()

2013-02-21 Thread Nishanth Menon
On 12:33-20130222, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Add the missing unlock before return from function
> exynos4_busfreq_pm_notifier_event() in the error
> handling case.
> 
> This issue introduced by commit 8fa938
> (PM / devfreq: exynos4_bus: honor RCU lock usage)
> 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/devfreq/exynos4_bus.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c
> index 46d94e9..6208a68 100644
> --- a/drivers/devfreq/exynos4_bus.c
> +++ b/drivers/devfreq/exynos4_bus.c
> @@ -974,6 +974,7 @@ static int exynos4_busfreq_pm_notifier_event(struct notifier_block *this,
>   rcu_read_unlock();
>   dev_err(data->dev, "%s: unable to find a min freq\n",
>   __func__);
> + mutex_unlock(>lock);
>   return PTR_ERR(opp);
>   }
>   new_oppinfo.rate = opp_get_freq(opp);
> 
> 
Arrgh.. Thanks for catching this :( My bad.

Fix looks good to me. Up to MyungJoo.

MyungJoo, Rafael,
btw, adding linux...@vger.kernel.org to MAINTAINERS for devfreq might
be a nice idea, so that patches reach the right audience.
-- 
Regards,
Nishanth Menon


Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Alex Shi
On 02/22/2013 12:19 PM, Michael Wang wrote:
> 
>> > Why not seek other way to change O(n^2) to O(n)?
>> > 
>> > Access 2G memory is unbelievable performance cost.
> Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N).
> 
> And please notice that on 16k cpus system, topology will be deep if NUMA
> enabled (O(log N) as Peter said), and that's really a good stage for
> this idea to perform on, we could save lot's of recursed 'for' cycles.
> 

The CPU execution part is very, very fast compared to the memory access;
most of the cost of the 'for' loops is in the memory accesses to the many
domain/group structures, not in instruction execution.

In a hot path, several KB of memory access will cause clear CPU cache
pollution and slow the kernel down.
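The 2G figure in this exchange is consistent with a map of N x N entries of 8 bytes each for N = 16K CPUs (the 8-byte entry size, e.g. one pointer per CPU pair, is an assumption). One CPU only touches its own row of N entries, which matches the (2G / 16K) quoted above. A small sketch of that arithmetic, with illustrative function names:

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes for an ncpus x ncpus map with entry_size-byte entries. */
uint64_t map_total_bytes(uint64_t ncpus, uint64_t entry_size)
{
	return ncpus * ncpus * entry_size;
}

/* Bytes in one CPU's row of that map, i.e. what a single CPU reads. */
uint64_t map_row_bytes(uint64_t ncpus, uint64_t entry_size)
{
	return ncpus * entry_size;
}
```

With ncpus = 16384 and 8-byte entries, map_total_bytes() gives 2147483648 bytes (2 GiB) and map_row_bytes() gives 131072 bytes (128 KiB).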

-- 
Thanks Alex


[PATCH] kexec: fix memory leak in function kimage_normal_alloc

2013-02-21 Thread Zhang Yanfei
If kimage_normal_alloc() fails to allocate pages for image->swap_page,
it should call kimage_free_page_list() to free the pages already
allocated on the image->control_pages list before it frees image.

Cc: "Eric W. Biederman" 
Cc: Andrew Morton 
Cc: Sasha Levin 
Signed-off-by: Zhang Yanfei 
---
 kernel/kexec.c |   18 ++
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..a57face 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -223,6 +223,8 @@ out:
 
 }
 
+static void kimage_free_page_list(struct list_head *list);
+
 static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
unsigned long nr_segments,
struct kexec_segment __user *segments)
@@ -248,22 +250,22 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
   get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
-   goto out;
+   goto out_free;
}
 
image->swap_page = kimage_alloc_control_pages(image, 0);
if (!image->swap_page) {
printk(KERN_ERR "Could not allocate swap buffer\n");
-   goto out;
+   goto out_free;
}
 
-   result = 0;
- out:
-   if (result == 0)
-   *rimage = image;
-   else
-   kfree(image);
+   *rimage = image;
+   return 0;
 
+out_free:
+   kimage_free_page_list(>control_pages);
+   kfree(image);
+out:
return result;
 }
 
-- 
1.7.1
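The leak fixed here follows a general rule: an object must release what it already owns before it is freed itself. The sketch below models image->control_pages as a plain singly linked list; struct page_node, struct image_with_pages, free_page_list() and destroy_image() are hypothetical stand-ins for the kernel's list_head-based kimage_free_page_list(), not its implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct page_node {
	struct page_node *next;
};

struct image_with_pages {
	struct page_node *control_pages;	/* pages allocated so far */
};

/* Free every node on the list; return how many were released. */
size_t free_page_list(struct page_node **head)
{
	size_t n = 0;

	while (*head) {
		struct page_node *victim = *head;

		*head = victim->next;
		free(victim);
		n++;
	}
	return n;
}

/* Error path: release the owned pages first, then the image itself. */
size_t destroy_image(struct image_with_pages *image)
{
	size_t released;

	assert(image != NULL);
	released = free_page_list(&image->control_pages);
	free(image);
	return released;
}
```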


[PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()

2013-02-21 Thread Wei Yongjun
From: Wei Yongjun 

Add the missing unlock before returning from
exynos4_busfreq_pm_notifier_event() in the error
handling case.

This issue was introduced by commit 8fa938
(PM / devfreq: exynos4_bus: honor RCU lock usage).

Signed-off-by: Wei Yongjun 
---
 drivers/devfreq/exynos4_bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c
index 46d94e9..6208a68 100644
--- a/drivers/devfreq/exynos4_bus.c
+++ b/drivers/devfreq/exynos4_bus.c
@@ -974,6 +974,7 @@ static int exynos4_busfreq_pm_notifier_event(struct notifier_block *this,
rcu_read_unlock();
dev_err(data->dev, "%s: unable to find a min freq\n",
__func__);
+   mutex_unlock(>lock);
return PTR_ERR(opp);
}
new_oppinfo.rate = opp_get_freq(opp);
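The bug is an early return that skips the unlock taken at the top of the function; the patch simply adds the missing mutex_unlock() call. Routing every exit through a single unlock label is one common way to keep that from recurring. In the sketch below a hypothetical counting lock stands in for struct mutex so the "always unlocked on return" property can be checked; handle_event() and find_min_freq() (with its arbitrary -ERANGE) are illustrative names, not the driver's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock that only records whether it is held. */
struct counting_lock {
	int held;
};

static void cl_lock(struct counting_lock *l)
{
	assert(l->held == 0);	/* a real mutex would block here forever */
	l->held = 1;
}

static void cl_unlock(struct counting_lock *l)
{
	assert(l->held == 1);
	l->held = 0;
}

/* Hypothetical lookup that fails when no frequency is available. */
static long find_min_freq(int available)
{
	return available ? 100000 : -ERANGE;
}

/* Every return path leaves through "out", so the lock is always dropped. */
int handle_event(struct counting_lock *l, int available, long *rate)
{
	long freq;
	int err = 0;

	cl_lock(l);
	freq = find_min_freq(available);
	if (freq < 0) {
		err = (int)freq;
		goto out;	/* error path still unlocks */
	}
	*rate = freq;
out:
	cl_unlock(l);
	return err;
}
```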




[GIT PULL] slave-dmaengine updates

2013-02-21 Thread Vinod Koul

Hi Linus,

Here are the slave-dmaengine updates.

This is a fairly big pull by my standards, as I missed the last merge window.
It adds device tree support for slave-dmaengine and large updates to the
dw_dmac driver from Andy so that it can be reused on different architectures.
Along with this, there are fixes for a bunch of drivers.

Thanks
~Vinod
--

The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed:
are available in the git repository at:

  git://git.infradead.org/users/vkoul/slave-dma.git next

Akinobu Mita (4):
  dmaengine: use for_each_set_bit
  dma: amba-pl08x: use vchan_dma_desc_free_list
  dmatest: adjust invalid module parameters for number of source buffers
  async_tx: use memchr_inv

Alessandro Rubini (1):
  pl080.h: moved from arm/include/asm/hardware to include/linux/amba/

Andy Shevchenko (34):
  dw_dmac: change dev_printk() to corresponding macros
  dw_dmac: don't call platform_get_drvdata twice
  dw_dmac: change dev_crit to dev_WARN in dwc_handle_error
  dw_dmac: introduce to_dw_desc() macro
  dw_dmac: absence of pdata isn't critical when autocfg is set
  dw_dmac: check for mapping errors
  dw_dmac: remove redundant check
  dw_dmac: update tx_node_active in dwc_do_single_block
  dma: dw_dmac: add dwc_chan_pause and dwc_chan_resume
  dmaengine: introduce is_slave_direction function
  dma: at_hdmac: check direction properly for cyclic transfers
  dma: dw_dmac: check direction properly in dw_dma_cyclic_prep
  dma: ep93xx_dma: reuse is_slave_direction helper
  dma: ipu_idmac: reuse is_slave_direction helper
  dma: ste_dma40: reuse is_slave_direction helper
  dw_dmac: call .probe after we have a device in place
  dw_dmac: store direction in the custom channel structure
  dw_dmac: make usage of dw_dma_slave optional
  dw_dmac: backlink to dw_dma in dw_dma_chan is superfluous
  dw_dmac: allocate dma descriptors from DMA_COHERENT memory
  dw_dmac: don't exceed AHB master number in dwc_get_data_width
  dw_dmac: move soft LLP code from tasklet to dwc_scan_descriptors
  dw_dmac: print out DW_PARAMS and DWC_PARAMS when debug
  dw_dmac: remove unnecessary tx_list field in dw_dma_chan
  dw_dmac: introduce total_len field in struct dw_desc
  dw_dmac: fill individual length of descriptor
  dw_dmac: return proper residue value
  dw_dmac: apply default dma_mask if needed
  dma: of-dma: protect list write operation by spin_lock
  dmaengine.h: remove redundant else keyword
  dma: coh901318: avoid unbalanced locking
  dma: coh901318: set residue only if dma is in progress
  edma: do not waste memory for dma_mask
  dma: tegra20-apb-dma: remove unnecessary assignment

Arnd Bergmann (1):
  Revert "ARM: SPEAr13xx: Pass DW DMAC platform data from DT"

Barry Song (4):
  dmaengine: sirf: enable the driver support new SiRFmarco SoC
  DMAEngine: add dmaengine_prep_interleaved_dma wrapper for interleaved api
  DMAEngine: sirf: add DMA pause/resume support
  DMAEngine: sirf: lock the shared registers access in sirfsoc_dma_terminate_all

Bartlomiej Zolnierkiewicz (10):
  async_tx: add missing DMA unmap to async_memcpy()
  ioat: add missing DMA unmap to ioat_dma_self_test()
  mtd: fsmc_nand: add missing DMA unmap to dma_xfer()
  carma-fpga: pass correct flags to ->device_prep_dma_memcpy()
  ioat3: add missing DMA unmap to ioat_xor_val_self_test()
  async_tx: fix build for async_memset
  dmaengine: remove dma_async_memcpy_pending() macro
  dmaengine: remove dma_async_memcpy_complete() macro
  dmaengine: add cpu_relax() to busy-loop in dma_sync_wait()
  async_tx: fix checking of dma_wait_for_async_tx() return value

Cong Ding (3):
  dma: remove unnecessary null pointer check in mmp_pdma.c
  dma: sh/shdma-base.c: remove unnecessary null pointer check
  dma: of-dma.c: fix memory leakage

Dave Jiang (3):
  ioat: Add alignment workaround for IVB platforms
  ioat: remove chanerr mask setting for IOAT v3.x
  ioatdma: fix race between updating ioat->head and IOAT_COMPLETION_PENDING

Fabio Baltieri (8):
  dmaengine: ste_dma40: add a done queue for completed descriptors
  dmaengine: ste_dma40: add missing kernel-doc entry
  dmaengine: ste_dma40: minor cosmetic fixes
  dmaengine: ste_dma40: minor code readability fixes
  dmaengine: ste_dma40: add software lli support
  dmaengine: set_dma40: ignore spurious interrupts
  dmaengine: set_dma40: balance clock in probe fail code
  dmaengine: ste_dma40: do not remove descriptors for cyclic transfers

Fabio Estevam (1):
  dma: mxs-dma: Fix build warnings with W=1

Fengguang Wu (1):
  dmaengine: ioat - fix spare sparse complain

Gerald Baeza (2):
  dmaengine: ste_dma40: support fixed physical channel allocation
  dmaengine: ste_dma40: physical channels number correction


Re: [PATCH 1/7] ksm: add some comments

2013-02-21 Thread Ric Mason

On 02/21/2013 04:19 PM, Hugh Dickins wrote:

Added slightly more detail to the Documentation of merge_across_nodes,
a few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it".  No functional change.

Signed-off-by: Hugh Dickins 
---
  Documentation/vm/ksm.txt |   16 
  mm/ksm.c |   18 ++
  2 files changed, 26 insertions(+), 8 deletions(-)

--- mmotm.orig/Documentation/vm/ksm.txt 2013-02-20 22:28:09.456001057 -0800
+++ mmotm/Documentation/vm/ksm.txt  2013-02-20 22:28:23.580001392 -0800
@@ -60,10 +60,18 @@ sleep_millisecs  - how many milliseconds
  
  merge_across_nodes - specifies if pages from different numa nodes can be merged.
 When set to 0, ksm merges only pages which physically
-   reside in the memory area of same NUMA node. It brings
-   lower latency to access to shared page. Value can be
-   changed only when there is no ksm shared pages in system.
-   Default: 1
+   reside in the memory area of same NUMA node. That brings
+   lower latency to access of shared pages. Systems with more
+   nodes, at significant NUMA distances, are likely to benefit
+   from the lower latency of setting 0. Smaller systems, which
+   need to minimize memory usage, are likely to benefit from
+   the greater sharing of setting 1 (default). You may wish to
+   compare how your system performs under each setting, before
+   deciding on which to use. merge_across_nodes setting can be
+   changed only when there are no ksm shared pages in system:
+   set run 2 to unmerge pages first, then to 1 after changing
+   merge_across_nodes, to remerge according to the new setting.


What is the root reason that the merge_across_nodes setting can be changed
only when there are no ksm shared pages in the system? Can they be unmerged
and merged again during a ksmd scan?



+   Default: 1 (merging across nodes as in earlier releases)
  
  run  - set 0 to stop ksmd from running but keep merged pages,

 set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run",
--- mmotm.orig/mm/ksm.c 2013-02-20 22:28:09.456001057 -0800
+++ mmotm/mm/ksm.c  2013-02-20 22:28:23.584001392 -0800
@@ -87,6 +87,9 @@
   *take 10 attempts to find a page in the unstable tree, once it is found,
   *it is secured in the stable tree.  (When we scan a new page, we first
   *compare it against the stable tree, and then against the unstable tree.)
+ *
+ * If the merge_across_nodes tunable is unset, then KSM maintains multiple
+ * stable trees and multiple unstable trees: one of each for each NUMA node.
   */
  
  /**

@@ -524,7 +527,7 @@ static void remove_node_from_stable_tree
   * a page to put something that might look like our key in page->mapping.
   * is on its way to being freed; but it is an anomaly to bear in mind.
   */
-static struct page *get_ksm_page(struct stable_node *stable_node, bool locked)
+static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
  {
struct page *page;
void *expected_mapping;
@@ -573,7 +576,7 @@ again:
goto stale;
}
  
-	if (locked) {
+   if (lock_it) {
lock_page(page);
if (ACCESS_ONCE(page->mapping) != expected_mapping) {
unlock_page(page);
@@ -703,10 +706,17 @@ static int remove_stable_node(struct sta
return 0;
}
  
-	if (WARN_ON_ONCE(page_mapped(page)))
+   if (WARN_ON_ONCE(page_mapped(page))) {
+   /*
+* This should not happen: but if it does, just refuse to let
+* merge_across_nodes be switched - there is no need to panic.
+*/
err = -EBUSY;
-   else {
+   } else {
/*
+* The stable node did not yet appear stale to get_ksm_page(),
+* since that allows for an unmapped ksm page to be recognized
+* right up until it is freed; but the node is safe to remove.
 * This page might be in a pagevec waiting to be freed,
 * or it might be PageSwapCache (perhaps under writeback),
 * or it might have been removed from swapcache a moment ago.
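The documentation change above describes a specific order for switching merge_across_nodes: unmerge with run=2, change the tunable, then set run=1 to remerge. A minimal userspace sketch of that sequence follows, assuming the attributes live under /sys/kernel/mm/ksm as the run example suggests. The directory is a parameter so the logic can be tried on a scratch directory; ksm_write_attr() and ksm_switch_merge_across_nodes() are hypothetical helper names, not part of KSM.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short value to <dir>/<name>. Returns 0 on success, -1 on failure.
 * A write rejected by the kernel (e.g. EBUSY) typically surfaces when stdio
 * flushes its buffer inside fclose(), so that result is checked as well. */
static int ksm_write_attr(const char *dir, const char *name, const char *val)
{
	char path[512];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: switch merge_across_nodes in the documented order.
 * 1. run = 2 unmerges all KSM pages, so no shared pages remain.
 * 2. merge_across_nodes can then be changed.
 * 3. run = 1 restarts ksmd, which remerges under the new setting. */
static int ksm_switch_merge_across_nodes(const char *dir, int value)
{
	assert(value == 0 || value == 1);
	if (ksm_write_attr(dir, "run", "2") != 0)
		return -1;
	if (ksm_write_attr(dir, "merge_across_nodes", value ? "1" : "0") != 0)
		return -1;
	return ksm_write_attr(dir, "run", "1");
}
```

On a real system this would be called as root with `ksm_switch_merge_across_nodes("/sys/kernel/mm/ksm", 0)`. Stopping at the first failed write keeps ksmd from being restarted with the old setting still in place.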

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Michael Wang
On 02/22/2013 11:33 AM, Alex Shi wrote:
> On 02/22/2013 10:53 AM, Michael Wang wrote:

>> And the final cost is 3000 ints and 103 pointers, plus some padding,
>> but it won't be bigger than 10M, which is not a big deal for a system
>> with 1000 cpus either.

>>> Maybe, but quadratic stuff should be frowned upon at all times, these
>>> things tend to explode when you least expect it.
>>>
>>> For instance, IIRC the biggest single image system SGI booted had 16k
>>> cpus in there, that ends up at something like 14+14+3=31 aka 2G of
>>> storage just for your lookup -- that seems somewhat preposterous.
>> Honestly, if I were an admin who owned a 16k-cpu system (I can't even
>> imagine how much memory it could have...), I would gladly trade 2G of
>> memory for some performance.
>>
>> I see your point here, the cost in space will grow quadratically, but
>> the memory of the system will also grow, and according to my understanding,
>> it grows faster.
> 

Hi, Alex

Thanks for your reply.

> Why not look for another way to turn O(n^2) into O(n)?
> 
> Accessing 2G of memory is an unbelievable performance cost.

It doesn't access 2G of memory, only (2G / 16K) of it; the sbm size is O(N).
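The 14+14+3=31 figure reads as log2(16k CPUs) twice plus log2 of an 8-byte entry: 2^31 bytes, or 2 GiB in total. Split into one row per CPU, that is 2^17 bytes (128 KiB) each, which is the (2G / 16K) per-CPU figure above. The sketch below checks that arithmetic. Treating each entry as an 8-byte pointer is an assumption, and the function names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of log2 for a nonzero value. */
static unsigned ilog2_u64(uint64_t v)
{
	unsigned r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Total bytes if each of ncpus CPUs keeps ncpus entries of entry_size
 * bytes: this total grows quadratically with the CPU count. */
static uint64_t sbm_total_bytes(uint64_t ncpus, uint64_t entry_size)
{
	return ncpus * ncpus * entry_size;
}

/* Bytes held (and walked) by a single CPU: linear in the CPU count. */
static uint64_t sbm_per_cpu_bytes(uint64_t ncpus, uint64_t entry_size)
{
	return ncpus * entry_size;
}
```

Under this rough model a 1000-CPU machine needs 1000 * 1000 * 8 = 8,000,000 bytes in total, below the 10M estimate quoted earlier. The quadratic total is what draws the objection, while the per-CPU access is what the reply is defending.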

And please note that on a 16k-cpu system the topology will be deep if NUMA is
enabled (O(log N), as Peter said), and that's exactly where this idea should
pay off: we could save a lot of nested 'for' loop iterations.

> 
> There are too many jokes on the short-sight of compute scalability, like
> Gates' 64K memory in 2000.

Please believe me that I won't pass up any chance to solve or lighten
this issue (like applying Mike's suggestion), and please let me know if you
have any suggestions to reduce the memory cost.

Maybe I could make this idea an option that overrides
select_task_rq_fair() when people want the new logic; if they
don't want to trade memory for it, they just leave the CONFIG option unset.

Regards,
Michael Wang

> 



linux-next: build failure after merge of the final tree (drm tree related)

2013-02-21 Thread Stephen Rothwell
Hi all,

After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_buf_destroy':
drivers/gpu/drm/drm_gem_cma_helper.c:38:2: error: implicit declaration of function 'dma_free_writecombine' [-Werror=implicit-function-declaration]
drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_create':
drivers/gpu/drm/drm_gem_cma_helper.c:61:2: error: implicit declaration of function 'dma_alloc_writecombine' [-Werror=implicit-function-declaration]

Probably caused by commit 16ea975eac67 ("drm/tilcdc: add TI LCD
Controller DRM driver (v4)") which forced CONFIG_DRM_GEM_CMA_HELPER to
'y'.  dma_alloc/free_writecombine are only defined on ARM.

I added this patch for today.

From: Stephen Rothwell 
Date: Fri, 22 Feb 2013 15:14:50 +1100
Subject: [PATCH] drm/tilcdc: only build on arm

Signed-off-by: Stephen Rothwell 
---
 drivers/gpu/drm/tilcdc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tilcdc/Kconfig b/drivers/gpu/drm/tilcdc/Kconfig
index ae14fd6..d24d040 100644
--- a/drivers/gpu/drm/tilcdc/Kconfig
+++ b/drivers/gpu/drm/tilcdc/Kconfig
@@ -1,6 +1,6 @@
 config DRM_TILCDC
tristate "DRM Support for TI LCDC Display Controller"
-   depends on DRM && OF
+   depends on DRM && OF && ARM
select DRM_KMS_HELPER
select DRM_KMS_CMA_HELPER
select DRM_GEM_CMA_HELPER
-- 
1.8.1

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




Re: [PATCH] staging/zcache: Fix/improve zcache writeback code, tie to a config option

2013-02-21 Thread Ric Mason

On 02/07/2013 02:27 AM, Dan Magenheimer wrote:

It was observed by Andrea Arcangeli in 2011 that zcache can get "full"
and there must be some way for compressed swap pages to be (uncompressed
and then) sent through to the backing swap disk.  A prototype of this
functionality, called "unuse", was added in 2012 as part of a major update
to zcache (aka "zcache2"), but was left unfinished due to the unfortunate
temporary fork of zcache.

This earlier version of the code had an unresolved memory leak
and was anyway dependent on not-yet-upstream frontswap and mm changes.
The code was meanwhile adapted by Seth Jennings for similar
functionality in zswap (which he calls "flush").  Seth also made some
clever simplifications which are herein ported back to zcache.  As a
result of those simplifications, the frontswap changes are no longer
necessary, but a slightly different (and simpler) set of mm changes are
still required [1].  The memory leak is also fixed.

Due to feedback from akpm in a zswap thread, this functionality in zcache
has now been renamed from "unuse" to "writeback".

Although this zcache writeback code now works, there are open questions
as to how best to handle the policy that drives it.  As a result, this
patch also ties writeback to a new config option.  And, since the
code still depends on not-yet-upstreamed mm patches, to avoid build
problems, the config option added by this patch temporarily depends
on "BROKEN"; this config dependency can be removed in trees that
contain the necessary mm patches.

[1] https://lkml.org/lkml/2013/1/29/540/ https://lkml.org/lkml/2013/1/29/539/


shrink_zcache_memory:

while(nr_evict-- > 0) {
page = zcache_evict_eph_pageframe();
if (page == NULL)
break;
zcache_free_page(page);
}

zcache_evict_eph_pageframe
->zbud_evict_pageframe_lru
->zbud_evict_tmem
->tmem_flush_page
->zcache_pampd_free
->zcache_free_page  <- zbudpage has already been free here

Can the zcache_free_page() call in shrink_zcache_memory() be treated as
a double free?




Signed-off-by: Dan Magenheimer 
---
  drivers/staging/zcache/Kconfig   |   17 ++
  drivers/staging/zcache/zcache-main.c |  332 +++---
  2 files changed, 284 insertions(+), 65 deletions(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index c1dbd04..7358270 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -24,3 +24,20 @@ config RAMSTER
  while minimizing total RAM across the cluster.  RAMster, like
  zcache2, compresses swap pages into local RAM, but then remotifies
  the compressed pages to another node in the RAMster cluster.
+
+# Depends on not-yet-upstreamed mm patches to export end_swap_bio_write and
+# __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage
+# without the frontswap call). When these are in-tree, the dependency on
+# BROKEN can be removed.
+config ZCACHE_WRITEBACK
+   bool "Allow compressed swap pages to be written back to swap disk"
+   depends on ZCACHE=y && BROKEN
+   default n
+   help
+ Zcache caches compressed swap pages (and other data) in RAM which
+ often improves performance by avoiding I/O's due to swapping.
+ In some workloads with very long-lived large processes, it can
+ instead reduce performance.  Writeback decompresses zcache-compressed
+ pages (in LRU order) when under memory pressure and writes them to
+ the backing swap disk to ameliorate this problem.  Policy driving
+ writeback is still under development.
diff --git a/drivers/staging/zcache/zcache-main.c 
b/drivers/staging/zcache/zcache-main.c
index c1ac905..5bf14c3 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -22,6 +22,10 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
  
  #include 
  #include 
@@ -55,6 +59,9 @@ static inline void frontswap_tmem_exclusive_gets(bool b)
  }
  #endif
  
+/* enable (or fix code) when Seth's patches are accepted upstream */
+#define zcache_writeback_enabled 0
+
  static int zcache_enabled __read_mostly;
  static int disable_cleancache __read_mostly;
  static int disable_frontswap __read_mostly;
@@ -181,6 +188,8 @@ static unsigned long zcache_last_active_anon_pageframes;
  static unsigned long zcache_last_inactive_anon_pageframes;
  static unsigned long zcache_eph_nonactive_puts_ignored;
  static unsigned long zcache_pers_nonactive_puts_ignored;
+static unsigned long zcache_writtenback_pages;
+static long zcache_outstanding_writeback_pages;
  
  #ifdef CONFIG_DEBUG_FS
  #include 
@@ -239,6 +248,9 @@ static int zcache_debugfs_init(void)
zdfs64("eph_zbytes_max", S_IRUGO, root, _eph_zbytes_max);
zdfs64("pers_zbytes", S_IRUGO, root, _pers_zbytes);
zdfs64("pers_zbytes_max", S_IRUGO, root, _pers_zbytes_max);
+   

Re: [PATCH 0/4] dcache: make Oracle more scalable on large systems

2013-02-21 Thread Waiman Long

On 02/21/2013 07:13 PM, Andi Kleen wrote:

Dave Chinner  writes:


On Tue, Feb 19, 2013 at 01:50:55PM -0500, Waiman Long wrote:

It was found that the Oracle database software issues a lot of calls
to the seq_path() kernel function which translates a (dentry, mnt)
pair to an absolute path. The seq_path() function will eventually
take the following two locks:

Nobody should be doing reverse dentry-to-name lookups in a quantity
sufficient for it to become a performance limiting factor. What is
the Oracle DB actually using this path for?

Yes, calling d_path frequently is usually a bug elsewhere.
Is that through /proc ?

-Andi


A sample strace of Oracle indicates that it opens many /proc
filesystem files, such as stat and maps, repeatedly while running.
Oracle has a very detailed system performance reporting infrastructure
in place to report almost all aspects of system performance through its
AWR reporting tool or the browser-based Enterprise Manager. Maybe that is
the reason why it is hitting this performance bottleneck.


Regards,
Longman


RE: [PATCH RFC] video: Add Hyper-V Synthetic Video Frame Buffer Driver

2013-02-21 Thread Haiyang Zhang

> From: Olaf Hering
> Sent: Thursday, February 21, 2013 10:53 AM
> To: Haiyang Zhang
> Cc: florianschandi...@gmx.de; linux-fb...@vger.kernel.org; KY Srinivasan; 
> jasow...@redhat.com; linux-kernel@vger.kernel.org; 
> de...@linuxdriverproject.org
> Subject: Re: [PATCH RFC] video: Add Hyper-V Synthetic Video Frame Buffer Driver
> 
> On Tue, Feb 19, Haiyang Zhang wrote:
> 
> > In my test, the vesafb doesn't automatically give up the emulated video device,
> > unless I add the DMI based mechanism to let it exit on Hyper-V.
> 
> From reading the code, it seems to do that via
> do_remove_conflicting_framebuffers(). hypervfb does not set apertures
> etc, so that function is a noop.

We are currently allocating a new framebuffer for hyperv_fb, which is different
from the framebuffer for the emulated video. So this cannot be detected by
do_remove_conflicting_framebuffers() based on apertures_overlap().
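Conflict removal only fires when the new driver's apertures overlap the old driver's. A buffer allocated at a different physical address, and with no apertures set, therefore never matches. The sketch below is a simplified, hypothetical model that uses a generic half-open range overlap test. It is not the fbmem apertures_overlap() implementation, and the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified aperture: the physical range [base, base + size). */
struct aperture_range {
	uint64_t base;
	uint64_t size;
};

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(struct aperture_range a, struct aperture_range b)
{
	assert(a.size > 0 && b.size > 0);
	return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

With made-up addresses, an emulated framebuffer at {0xf8000000, 0x800000} overlaps a range inside it, such as {0xf8100000, 0x100000}, but not a separately allocated buffer at {0xfc000000, 0x800000}. That separate buffer is the case that goes undetected.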

> My point is that with this new driver distro kernel will have no console
> output until hypervfb is loaded. On native hardware there is at least
> vesafb which can display something until initrd is running. So if the
> hypervisor allows that hypervfb can shutdown the emulated vesa hardware
> then it should do that.

Since the generic vga driver starts to work early in the boot process, the console
messages are still displayed without vesafb. Actually, I didn't see any console
messages missing when comparing it to the original VM before my patch.

Thanks,
- Haiyang


Re: [PATCH] kexec: prevent double free on image allocation failure

2013-02-21 Thread Zhang Yanfei
On 2013-02-22 11:41, Sasha Levin wrote:
> On 02/21/2013 09:46 PM, Zhang Yanfei wrote:
>> On 2013-02-22 09:55, Eric W. Biederman wrote:
>>> Sasha Levin  writes:
>>>
>>>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will free
>>>> the image but would still set 'rimage', as a result kexec_load will try
>>>> to free it again.
>>>>
>>>> This would explode as part of the freeing process is accessing internal
>>>> members which point to uninitialized memory.
>>>
>>> Agreed.
>>>
>>> I don't think that failure path has ever actually been exercised.
>>>
>>> The code is wrong, and it is worth fixing.
>>>
>>> Andrew, do you think you could queue this up?  I don't have a handy tree.
>>
>>
>> I found another malloc/free problem in this function, so I have updated the patch.
>>
>> -
>>
>> From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
>> From: Zhang Yanfei 
>> Date: Fri, 22 Feb 2013 10:34:02 +0800
>> Subject: [PATCH] kexec: fix allocation problems in function kimage_normal_alloc
>>
>> The function kimage_normal_alloc() has 2 allocation problems that may cause
>> failures:
>>
>>   1. If kimage_normal_alloc() fails to initialize an allocated kimage, it will
>>  free the image but would still set 'rimage', as a result kexec_load will
>>  try to free it again.
>>
>>  This would explode as part of the freeing process is accessing internal
>>  members which point to uninitialized memory.
>>
>>   2. If kimage_normal_alloc() fails to alloc pages for image->swap_page, it
>>  should call kimage_free_page_list() to free allocated pages in
>>  image->control_pages list before it frees image.
>>
>> Signed-off-by: Sasha Levin 
>> Signed-off-by: Zhang Yanfei 
>> ---
>>  kernel/kexec.c |   10 ++
>>  1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 5e4bd78..f219357 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -223,6 +223,8 @@ out:
>>  
>>  }
>>  
>> +static void kimage_free_page_list(struct list_head *list);
>> +
>>  static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>  unsigned long nr_segments,
>>  struct kexec_segment __user *segments)
>> @@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>  if (result)
>>  goto out;
>>  
>> -*rimage = image;
>> -
>>  /*
>>   * Find a location for the control code buffer, and add it
>>   * the vector of segments so that it's pages will also be
>> @@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>  
>>  result = 0;
>>   out:
>> -if (result == 0)
>> +if (result == 0) {
>>  *rimage = image;
>> -else
>> +} else {
>> +kimage_free_page_list(>control_pages);
>>  kfree(image);
>> +}
>>  
>>  return result;
>>  }
> 
> And if do_kimage_alloc() fails instead of kimage_alloc_control_pages()
> you will NULL deref 'image', so now instead of leaking pages the kernel
> will explode.

Oh, I missed this.

> 
> Either way, this issue you've pointed out should be fixed in a separate
> patch.
> 
> 

OK, I will send another patch.

Thanks
Zhang
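The failure paths discussed in this thread map onto a common C pattern. The object is published through the out-parameter only after every step succeeds. Failures unwind in reverse order with one label per step, so an early failure (the do_kimage_alloc() case Sasha raises) never dereferences a NULL image, and a late one still frees the pages already allocated. Below is a minimal userspace sketch with simplified, hypothetical types. image_alloc(), image_free(), free_page_list() and the fail_at knob are illustrative names, not kernel/kexec.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for a kimage and its control pages. */
struct page_node {
	struct page_node *next;
};

struct image {
	struct page_node *control_pages;	/* list of allocated pages */
	void *swap_page;
};

static void free_page_list(struct page_node *head)
{
	while (head) {
		struct page_node *next = head->next;

		free(head);
		head = next;
	}
}

static void image_free(struct image *image)
{
	if (!image)
		return;
	free(image->swap_page);
	free_page_list(image->control_pages);
	free(image);
}

/* fail_at is a hypothetical fault-injection knob: 0 means no failure,
 * 1..3 makes the corresponding allocation step fail. */
static int image_alloc(struct image **rimage, int fail_at)
{
	struct image *image;
	struct page_node *pg;

	assert(rimage != NULL);
	*rimage = NULL;

	image = (fail_at == 1) ? NULL : calloc(1, sizeof(*image));
	if (!image)
		return -ENOMEM;		/* nothing allocated yet, nothing to free */

	pg = (fail_at == 2) ? NULL : calloc(1, sizeof(*pg));
	if (!pg)
		goto out_free_image;
	pg->next = image->control_pages;
	image->control_pages = pg;

	image->swap_page = (fail_at == 3) ? NULL : malloc(64);
	if (!image->swap_page)
		goto out_free_pages;

	*rimage = image;		/* publish only once fully built */
	return 0;

out_free_pages:
	free_page_list(image->control_pages);
out_free_image:
	free(image);
	return -ENOMEM;
}
```

The caller never sees a partially built image. On any failure it gets -ENOMEM with *rimage still NULL, so there is nothing left for it to free a second time.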



Re: [PATCH] iommu: making IOMMU sysfs nodes API public

2013-02-21 Thread Alex Williamson
On Fri, 2013-02-22 at 11:04 +1100, David Gibson wrote:
> On Tue, Feb 19, 2013 at 01:11:51PM -0700, Alex Williamson wrote:
> > On Tue, 2013-02-19 at 18:38 +1100, David Gibson wrote:
> > > On Mon, Feb 18, 2013 at 10:24:00PM -0700, Alex Williamson wrote:
> > > > On Mon, 2013-02-18 at 17:15 +1100, Alexey Kardashevskiy wrote:
> [snip]
> > > >  Adding the window size to sysfs seems more readily convenient,
> > > > but is it so hard for userspace to open the files and call a couple
> > > > ioctls to get far enough to call IOMMU_GET_INFO?  I'm unconvinced the
> > > > clutter in sysfs is more than just a quick fix.  Thanks,
> > > 
> > > And finally, as Alexey points out, isn't the point here so we know how
> > > much rlimit to give qemu?  Using ioctls we'd need a special tool just
> > > to check the dma window sizes, which seems a bit hideous.
> > 
> > Is it more hideous than using iommu groups to report a vfio imposed
> > restriction?  Are a couple open files and a handful of ioctls worse than
> > code to parse directory entries and the future maintenance of an
> > unrestricted grab bag of sysfs entries?
> 
> The fact that the memory is locked is a vfio restriction, but the
> actual dma window size is, genuinely, a property of the group.

A group is an association of devices based on isolation and visibility.
The dma window happens to be associated with a group on your platform,
but that's not always the case.  This is why I was hoping something in
sysfs already reported the dma window so that we could point to it
rather than creating an interface where it doesn't really belong.
Thanks,

Alex



Re: [GIT] Networking

2013-02-21 Thread Mark Lord
On 13-02-21 09:26 PM, Paul Gortmaker wrote:
> On Thu, Feb 21, 2013 at 9:37 AM, Mark Lord  wrote:
>> On 13-02-20 10:05 PM, Linus Torvalds wrote:
>>> On Wed, Feb 20, 2013 at 2:09 PM, David Miller  wrote:
..
>>> Nooo You killed the 3c501 and 3c503 drivers! Snif.
>>>
>>> I wonder if they still worked..
>>
>> I hope they're not really dead, because we still use them in several machines here
>> as secondary interfaces for test rigs and whatnot.
..
> Did you actually look at the drivers deleted?
..

Finally got to one of the boxes here to check.
And you're right, I was confusing drivers.

I always seem to get the 3c509 (ISA) stuff confused with the 3c59x (PCI).
Our boxes here have the 3c59x (PCI) cards.

R.I.P. 3c50x.  :)


Re: [PATCH] staging/zcache: Fix/improve zcache writeback code, tie to a config option

2013-02-21 Thread Ric Mason

On 02/07/2013 02:27 AM, Dan Magenheimer wrote:

It was observed by Andrea Arcangeli in 2011 that zcache can get "full"
and there must be some way for compressed swap pages to be (uncompressed
and then) sent through to the backing swap disk.  A prototype of this
functionality, called "unuse", was added in 2012 as part of a major update
to zcache (aka "zcache2"), but was left unfinished due to the unfortunate
temporary fork of zcache.

This earlier version of the code had an unresolved memory leak
and was anyway dependent on not-yet-upstream frontswap and mm changes.
The code was meanwhile adapted by Seth Jennings for similar
functionality in zswap (which he calls "flush").  Seth also made some
clever simplifications which are herein ported back to zcache.  As a
result of those simplifications, the frontswap changes are no longer
necessary, but a slightly different (and simpler) set of mm changes are
still required [1].  The memory leak is also fixed.

Due to feedback from akpm in a zswap thread, this functionality in zcache
has now been renamed from "unuse" to "writeback".

Although this zcache writeback code now works, there are open questions
as to how best to handle the policy that drives it.  As a result, this
patch also ties writeback to a new config option.  And, since the
code still depends on not-yet-upstreamed mm patches, to avoid build
problems, the config option added by this patch temporarily depends
on "BROKEN"; this config dependency can be removed in trees that
contain the necessary mm patches.

[1] https://lkml.org/lkml/2013/1/29/540/ https://lkml.org/lkml/2013/1/29/539/


This patch makes the backend interact with core mm directly. Shouldn't core
mm interact with the frontend instead of the backend? In addition,
frontswap already has a shrink function; could we take advantage
of it?




Signed-off-by: Dan Magenheimer 
---
  drivers/staging/zcache/Kconfig   |   17 ++
  drivers/staging/zcache/zcache-main.c |  332 +++---
  2 files changed, 284 insertions(+), 65 deletions(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index c1dbd04..7358270 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -24,3 +24,20 @@ config RAMSTER
  while minimizing total RAM across the cluster.  RAMster, like
  zcache2, compresses swap pages into local RAM, but then remotifies
  the compressed pages to another node in the RAMster cluster.
+
+# Depends on not-yet-upstreamed mm patches to export end_swap_bio_write and
+# __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage
+# without the frontswap call). When these are in-tree, the dependency on
+# BROKEN can be removed.
+config ZCACHE_WRITEBACK
+   bool "Allow compressed swap pages to be written back to swap disk"
+   depends on ZCACHE=y && BROKEN
+   default n
+   help
+ Zcache caches compressed swap pages (and other data) in RAM which
+ often improves performance by avoiding I/O's due to swapping.
+ In some workloads with very long-lived large processes, it can
+ instead reduce performance.  Writeback decompresses zcache-compressed
+ pages (in LRU order) when under memory pressure and writes them to
+ the backing swap disk to ameliorate this problem.  Policy driving
+ writeback is still under development.
diff --git a/drivers/staging/zcache/zcache-main.c 
b/drivers/staging/zcache/zcache-main.c
index c1ac905..5bf14c3 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -22,6 +22,10 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
  
  #include 
  #include 
@@ -55,6 +59,9 @@ static inline void frontswap_tmem_exclusive_gets(bool b)
  }
  #endif
  
+/* enable (or fix code) when Seth's patches are accepted upstream */
+#define zcache_writeback_enabled 0
+
  static int zcache_enabled __read_mostly;
  static int disable_cleancache __read_mostly;
  static int disable_frontswap __read_mostly;
@@ -181,6 +188,8 @@ static unsigned long zcache_last_active_anon_pageframes;
  static unsigned long zcache_last_inactive_anon_pageframes;
  static unsigned long zcache_eph_nonactive_puts_ignored;
  static unsigned long zcache_pers_nonactive_puts_ignored;
+static unsigned long zcache_writtenback_pages;
+static long zcache_outstanding_writeback_pages;
  
  #ifdef CONFIG_DEBUG_FS
  #include 
@@ -239,6 +248,9 @@ static int zcache_debugfs_init(void)
zdfs64("eph_zbytes_max", S_IRUGO, root, _eph_zbytes_max);
zdfs64("pers_zbytes", S_IRUGO, root, _pers_zbytes);
zdfs64("pers_zbytes_max", S_IRUGO, root, _pers_zbytes_max);
+   zdfs("outstanding_writeback_pages", S_IRUGO, root,
+   _outstanding_writeback_pages);
+   zdfs("writtenback_pages", S_IRUGO, root, _writtenback_pages);
return 0;
  }
  #undefzdebugfs
@@ -285,6 +297,18 @@ void 

linux-next: manual merge of the akpm-current tree with the tree

2013-02-21 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in
fs/btrfs/inode.c and fs/btrfs/file.c between commit 55e301fd57a6 ("Btrfs: move
fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h") from the btrfs tree and
commit "aio: don't include aio.h in sched.h" from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc fs/btrfs/file.c
index 8614c5b,39f556f..000
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@@ -30,7 -30,7 +30,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  #include "ctree.h"
  #include "disk-io.h"
  #include "transaction.h"
diff --cc fs/btrfs/inode.c
index 40d49da,ed7ea0a..000
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@@ -39,8 -39,7 +39,9 @@@
  #include 
  #include 
  #include 
 +#include 
 +#include 
+ #include 
  #include "compat.h"
  #include "ctree.h"
  #include "disk-io.h"




Re: [PATCH 0/7] ksm: responses to NUMA review

2013-02-21 Thread Ric Mason

On 02/21/2013 04:17 PM, Hugh Dickins wrote:

Here's a second KSM series, based on mmotm 2013-02-19-17-20: partly in
response to Mel's review feedback, partly fixes to issues that I found
myself in doing more review and testing.  None of the issues fixed are
truly show-stoppers, though I would prefer them fixed sooner than later.


Do you have any ideas about making ksm support page cache and tmpfs?



1 ksm: add some comments
2 ksm: treat unstable nid like in stable tree
3 ksm: shrink 32-bit rmap_item back to 32 bytes
4 mm,ksm: FOLL_MIGRATION do migration_entry_wait
5 mm,ksm: swapoff might need to copy
6 mm: cleanup "swapcache" in do_swap_page
7 ksm: allocate roots when needed

  Documentation/vm/ksm.txt |   16 +++-
  include/linux/mm.h   |1
  mm/ksm.c |  137 +++--
  mm/memory.c  |   38 +++---
  mm/swapfile.c|   15 +++-
  5 files changed, 140 insertions(+), 67 deletions(-)

Thanks,
Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kexec: prevent double free on image allocation failure

2013-02-21 Thread Sasha Levin
On 02/21/2013 09:46 PM, Zhang Yanfei wrote:
> On 2013-02-22 09:55, Eric W. Biederman wrote:
>> Sasha Levin  writes:
>>
>>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will free
>>> the image but would still set 'rimage', as a result kexec_load will try
>>> to free it again.
>>>
>>> This would explode as part of the freeing process is accessing internal
>>> members which point to uninitialized memory.
>>
>> Agreed.
>>
>> I don't think that failure path has ever actually been exercised.
>>
>> The code is wrong, and it is worth fixing.
>>
>> Andrew, do you think you could queue this up?  I don't have a handy tree.
> 
> 
> I found another malloc/free problem in this function, so I have updated the
> patch.
> 
> -
> 
> From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
> From: Zhang Yanfei 
> Date: Fri, 22 Feb 2013 10:34:02 +0800
> Subject: [PATCH] kexec: fix allocation problems in function kimage_normal_alloc
> 
> The function kimage_normal_alloc() has 2 allocation problems that may cause
> failures:
> 
>   1. If kimage_normal_alloc() fails to initialize an allocated kimage, it will
>  free the image but would still set 'rimage', as a result kexec_load will
>  try to free it again.
> 
>  This would explode as part of the freeing process is accessing internal
>  members which point to uninitialized memory.
> 
>   2. If kimage_normal_alloc() fails to alloc pages for image->swap_page, it
>  should call kimage_free_page_list() to free allocated pages in
>  image->control_pages list before it frees image.
> 
> Signed-off-by: Sasha Levin 
> Signed-off-by: Zhang Yanfei 
> ---
>  kernel/kexec.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 5e4bd78..f219357 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -223,6 +223,8 @@ out:
>  
>  }
>  
> +static void kimage_free_page_list(struct list_head *list);
> +
>  static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>   unsigned long nr_segments,
>   struct kexec_segment __user *segments)
> @@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>   if (result)
>   goto out;
>  
> - *rimage = image;
> -
>   /*
>* Find a location for the control code buffer, and add it
>* the vector of segments so that it's pages will also be
> @@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>  
>   result = 0;
>   out:
> - if (result == 0)
> + if (result == 0) {
>   *rimage = image;
> - else
> + } else {
> + kimage_free_page_list(&image->control_pages);
>   kfree(image);
> + }
>  
>   return result;
>  }

And if do_kimage_alloc() fails instead of kimage_alloc_control_pages()
you will NULL deref 'image', so now instead of leaking pages the kernel
will explode.

Either way, this issue you've pointed out should be fixed in a separate
patch.


Thanks,
Sasha
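Sasha's objection is about ordering the cleanup so it only touches what was actually allocated. A minimal user-space sketch of that discipline follows; struct toy_image, toy_image_alloc(), toy_image_free() and the fail_at fault-injection parameter are hypothetical stand-ins, not kexec code. Each failure jumps to a label that undoes only the completed steps, so a failed first allocation never dereferences a NULL image, and the caller's pointer is set only on full success, so there is nothing for the caller to free twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kimage; not the real kexec structure. */
struct toy_image {
	void *control_page;	/* plays the role of the control_pages list */
	void *swap_page;
};

/*
 * fail_at is a hypothetical fault-injection knob:
 * 0 = succeed, 1 = image alloc fails, 2 = control page fails, 3 = swap page fails.
 */
static int toy_image_alloc(struct toy_image **rimage, int fail_at)
{
	struct toy_image *image;

	image = (fail_at == 1) ? NULL : calloc(1, sizeof(*image));
	if (!image)
		return -ENOMEM;		/* nothing to undo; image is never dereferenced */

	image->control_page = (fail_at == 2) ? NULL : malloc(64);
	if (!image->control_page)
		goto out_free_image;

	image->swap_page = (fail_at == 3) ? NULL : malloc(64);
	if (!image->swap_page)
		goto out_free_control;

	*rimage = image;		/* publish only after every step succeeded */
	return 0;

out_free_control:
	free(image->control_page);	/* undo in reverse order of allocation */
out_free_image:
	free(image);
	return -ENOMEM;			/* *rimage untouched: caller has nothing to free */
}

static void toy_image_free(struct toy_image *image)
{
	free(image->swap_page);
	free(image->control_page);
	free(image);
}
```

Label-per-step cleanup is one common kernel idiom for this; a NULL check before kimage_free_page_list() would be another way to avoid the crash Sasha describes.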



Re: [RFC] arm: use built-in byte swap function

2013-02-21 Thread Nicolas Pitre
On Thu, 21 Feb 2013, Kim Phillips wrote:

> Here's the asm version I'm working on now, based on compiler
> output of the C version.  Haven't tested beyond defconfig builds,
> which pass ok.
> 
> Is there anything I have to do for thumb mode?  If so, how to test?

You just need to pick a config that uses some ARMv7 processor, and 
enable CONFIG_THUMB2_KERNEL.  I don't see any problem with your patch 
wrt Thumb2.

Still, I have minor comments below.

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>   select CLONE_BACKWARDS
>   select OLD_SIGSUSPEND3
>   select OLD_SIGACTION
> + select ARCH_USE_BUILTIN_BSWAP
>   help
> The ARM series is a line of low-power-consumption RISC chip designs
> licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..a277e97 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>  
>  targets   := vmlinux vmlinux.lds \
>piggy.$(suffix_y) piggy.$(suffix_y).o \
> -  lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> +  lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \

Should be both bswapsdi2.o bswapsdi2.S

>font.o font.c head.o misc.o $(OBJS)
>  
>  # Make sure files are removed during clean
>  extra-y   += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> -  lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +  lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

Should be bswapsdi2.S.

>  ifeq ($(CONFIG_FUNCTION_TRACER),y)
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>   $(call cmd,shipped)
>  
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> + $(call cmd,shipped)
> +
>  # We need to prevent any GOTOFF relocs being used with references
>  # to symbols in the .bss section since we cannot relocate them
>  # independently from the rest at run time.  This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
>  fi
>  
>  $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> + $(bswapsdi2) FORCE
>   @$(check_for_multiple_zreladdr)
>   $(call if_changed,ld)
>   @$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
>  extern void __udivsi3(void);
>  extern void __umodsi3(void);
>  extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>  
>  extern void __aeabi_idiv(void);
>  extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
>  EXPORT_SYMBOL(__udivsi3);
>  EXPORT_SYMBOL(__umodsi3);
>  EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>  
>  #ifdef CONFIG_AEABI
>  EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y   := backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
>  ucmpdi2.o lib1funcs.o div64.o  \
>  io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
> -call_with_stack.o
> +call_with_stack.o bswapsdi2.o
>  
>  mmu-y:= clear_user.o copy_page.o getuser.o putuser.o
>  
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 000..e9c8ca7
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include 
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> + rev r0, r0
> + bx  lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + rev r3, r0
> + rev r0, r1
> + mov r1, r3
> + bx  lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> + eor r3, r0, r0, ror #16
> + lsr r3, r3, #8

Some older binutils versions used with pre-ARMv6 platforms don't understand
the latest unified syntax, so in this case it is better to use:

mov r3, r3, lsr #8

> + bic r3, r3, #65280  @ 0xff00

Please use #0xff00 directly rather than keeping it as a comment.

> + 
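For reference, the pre-ARMv6 branch of the quoted bswapsdi2.S follows the classic rotate-and-XOR byte-swap idiom. The quote is cut off after the bic, so the sketch below assumes the usual final step, eor r0, r3, r0, ror #8. It is a C model of that instruction sequence (ror32() and bswap32_ror() are illustrative names), not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; valid for 1 <= n <= 31. */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* C model of the rotate-based swap; input bytes are [a b c d], MSB first. */
static uint32_t bswap32_ror(uint32_t x)
{
	uint32_t t;

	t = x ^ ror32(x, 16);	/* eor r3, r0, r0, ror #16 -> [a^c b^d c^a d^b] */
	t >>= 8;		/* mov r3, r3, lsr #8      -> [0 a^c b^d c^a]   */
	t &= ~0xff00u;		/* bic r3, r3, #0xff00     -> [0 a^c 0 c^a]     */
	return t ^ ror32(x, 8);	/* eor r0, r3, r0, ror #8  -> [d c b a]         */
}
```

The sequence needs four data-processing instructions and a single scratch register, compared with the six-instruction shift-and-mask output that gcc 4.4 produced in the later message.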

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Alex Shi
On 02/22/2013 10:53 AM, Michael Wang wrote:
>> > 
>>> >> And the final cost is 3000 int and 103 pointer, and some padding,
>>> >> but won't be bigger than 10M, not a big deal for a system with 1000 cpu
>>> >> too.
>> > 
>> > Maybe, but quadratic stuff should be frowned upon at all times, these
>> > things tend to explode when you least expect it.
>> > 
>> > For instance, IIRC the biggest single image system SGI booted had 16k
>> > cpus in there, that ends up at something like 14+14+3=31 aka 2G of
>> > storage just for your lookup -- that seems somewhat preposterous.
> Honestly, if I were an admin who owned a 16k-cpu system (I cannot even
> imagine how much memory it could have...), I would really prefer to trade
> 2G of memory for some performance.
> 
> I see your point here: the cost in space will grow quadratically, but
> the memory of the system will also grow, and according to my understanding,
> it's faster.

Why not look for another way to reduce O(n^2) to O(n)?

Accessing 2G of memory is an enormous performance cost.

There are too many jokes about short-sighted predictions of computing
scalability, like the "640K ought to be enough" line attributed to Gates.
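The 2G figure in this thread follows from the lookup table being quadratic in the CPU count. Assuming one 8-byte pointer per (cpu, cpu) pair on a 64-bit kernel, and comparing with a hypothetical per-CPU (linear) layout:

```latex
N = 16384 = 2^{14}, \qquad
\text{quadratic: } N \cdot N \cdot 8\,\text{B} = 2^{14} \cdot 2^{14} \cdot 2^{3}\,\text{B} = 2^{31}\,\text{B} = 2\,\text{GiB}

\text{linear: } N \cdot 8\,\text{B} = 2^{14} \cdot 2^{3}\,\text{B} = 2^{17}\,\text{B} = 128\,\text{KiB}
```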

-- 
Thanks Alex


Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set

2013-02-21 Thread Kamezawa Hiroyuki

(2013/02/21 17:34), Glauber Costa wrote:

On 02/21/2013 03:00 AM, Tejun Heo wrote:

(cc'ing cgroup / memcg people and quoting whole body)

Looks like something is going wrong with memcg cache destruction.
Glauber, any ideas?  Also, can we please not use names as generic as
kmem_cache_destroy_work_func for something specific to memcg?  How
about something like memcg_destroy_cache_workfn?



I will take a look, and thanks to the reporter: I tested cgroup deletion
quite extensively (it is quite an important feature for me), so it is
nice to have found a case I missed.

About naming, I can change, no problem.



This seems reproducible on linux-3.8 in a KVM guest, with Fedora 18's config + kmemcg.
-Kame
==
[  250.533831] general protection fault:  [#1] SMP
[  250.538096] Modules linked in: ebtable_nat xt_CHECKSUM 
nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle 
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat 
iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun 
bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs 
ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio 
libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec 
snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc 8139too snd_timer 
microcode snd 8139cp mii floppy pcspkr virtio_balloon soundcore i2c_piix4 btrfs 
libcrc32c zlib_deflate cirrus drm_kms_helper ttm drm virtio_blk i2c_core
[  250.538096] CPU 1
[  250.538096] Pid: 38, comm: kworker/1:1 Not tainted 3.8.0 #3 Bochs Bochs
[  250.538096] RIP: 0010:[]  [] kmem_cache_free+0x13a/0x1d0
[  250.538096] RSP: 0018:880214345cc8  EFLAGS: 00010286
[  250.538096] RAX: 81d84020 RBX: 880217000f00 RCX: 0068
[  250.538096] RDX:  RSI: 880217000f00 RDI: 880217000f00
[  250.538096] RBP: 880214345ce8 R08: 13c0 R09: 006c
[  250.538096] R10: 0007ebc0ffe0 R11: 0007ebc0ffe0 R12: 880217001100
[  250.538096] R13: 880214042c00 R14: 0200 R15: 880217000ef0
[  250.538096] FS:  () GS:88021fc8() knlGS:
[  250.538096] CS:  0010 DS:  ES:  CR0: 8005003b
[  250.538096] CR2: 003e98ae6ef0 CR3: 00021365 CR4: 06e0
[  250.538096] DR0:  DR1:  DR2: 
[  250.538096] DR3:  DR6: 0ff0 DR7: 0400
[  250.538096] Process kworker/1:1 (pid: 38, threadinfo 880214344000, task 88021435)
[  250.538096] Stack:
[  250.538096]  e8c013c0    880214042c00
[  250.538096]  880214345d18 81182084 880214042c00 880217000ef0
[  250.538096]  880217000ef0 880214042c00 880214345d88 81184d7e
[  250.538096] Call Trace:
[  250.538096]  [] free_kmem_cache_nodes+0x64/0xb0
[  250.538096]  [] __kmem_cache_shutdown+0x24e/0x320
[  250.538096]  [] ? kmem_cache_shrink+0x210/0x230
[  250.538096]  [] kmem_cache_destroy+0x3f/0xe0
[  250.538096]  [] kmem_cache_destroy_work_func+0x30/0x60
[  250.538096]  [] process_one_work+0x147/0x490
[  250.538096]  [] ? mem_cgroup_slabinfo_read+0xb0/0xb0
[  250.538096]  [] worker_thread+0x15e/0x450
[  250.538096]  [] ? busy_worker_rebind_fn+0x110/0x110
[  250.538096]  [] kthread+0xc0/0xd0
[  250.538096]  [] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
[  250.538096]  [] ? kthread_create_on_node+0x120/0x120
[  250.538096]  [] ret_from_fork+0x7c/0xb0
[  250.538096]  [] ? kthread_create_on_node+0x120/0x120
[  250.538096] Code: c1 e0 06 48 01 d0 48 8b 10 80 e6 80 0f 85 98 00 00 00 48 8b 40 30 49 39 c4 0f 84 f9 fe ff ff 48 8b 90 b8 00 00 00 48 85 d2 74 06 <4c> 3b 62 20 74 50 48 8b 50 60 49 8b 4c 24 60 31 c0 48 c7 c6 68
[  250.538096] RIP  [] kmem_cache_free+0x13a/0x1d0
[  250.538096]  RSP 
[  250.746175] ---[ end trace 91abe13b8481aaaf ]---
[  250.748879] BUG: unable to handle kernel paging request at ffd8
[  250.749818] IP: [] kthread_data+0x10/0x20
[  250.749818] PGD 1c0e067 PUD 1c0f067 PMD 0
[  250.749818] Oops:  [#2] SMP
[  250.749818] Modules linked in: ebtable_nat xt_CHECKSUM 
nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle 
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat 
iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun 
bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs 
ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio 
libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec 
snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc 8139too snd_timer 
microcode snd 8139cp mii floppy pcspkr virtio_balloon soundcore i2c_piix4 btrfs 

Re: Questin about swap_slot free and invalidate page

2013-02-21 Thread Ric Mason

On 02/22/2013 05:42 AM, Dan Magenheimer wrote:

From: Ric Mason [mailto:ric.mas...@gmail.com]
Subject: Re: Questin about swap_slot free and invalidate page

On 02/19/2013 11:27 PM, Dan Magenheimer wrote:

From: Ric Mason [mailto:ric.mas...@gmail.com]

Hugh is right that handling the possibility of duplicates is
part of the tmem ABI.  If there is any possibility of duplicates,
the ABI defines how a backend must handle them to avoid data
coherency issues.

The kernel implements an in-kernel API which implements the tmem
ABI.  If the frontend and backend can always agree that duplicate

Which ABI in zcache implements that?

https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf

The in-kernel APIs are frontswap and cleancache.  For more information about
tmem, see http://lwn.net/Articles/454795/

But you mentioned that you have an in-kernel API which can handle
duplicates.  Do you mean zcache_cleancache/frontswap_put_page? I think
they just overwrite instead of optionally flushing the page on the
second (duplicate) put, as mentioned in your tmemspec.

Maybe I am misunderstanding your question...  The spec allows
overwrite (and return success) OR flush the page (and return
failure).  Zcache does the latter (flush).  The code that implements
it is in tmem_put.
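A toy model of the duplicate-put choice Dan describes, assuming a hypothetical fixed-size store (struct toy_slot, toy_put() and toy_get() are illustrative, not zcache or tmem code). It implements the "flush and return failure" option attributed to zcache above: on a duplicate put the stale copy is dropped, so a later get cannot return old data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 16

/* Hypothetical store entry; not zcache's real data structures. */
struct toy_slot {
	bool used;
	uint64_t key;		/* stands in for the (pool, object, index) handle */
	uint32_t data;		/* stands in for the page contents */
};

static struct toy_slot toy_store[TOY_SLOTS];

static struct toy_slot *toy_find(uint64_t key)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (toy_store[i].used && toy_store[i].key == key)
			return &toy_store[i];
	return NULL;
}

/* Duplicate put handled with the "flush and fail" option. */
static int toy_put(uint64_t key, uint32_t data)
{
	struct toy_slot *s = toy_find(key);

	if (s) {
		s->used = false;	/* flush the stale copy */
		return -1;		/* caller must treat the page as not stored */
	}
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_store[i].used) {
			toy_store[i] = (struct toy_slot){ true, key, data };
			return 0;
		}
	}
	return -1;			/* store full: a put may always be refused */
}

static int toy_get(uint64_t key, uint32_t *data)
{
	struct toy_slot *s = toy_find(key);

	if (!s)
		return -1;		/* miss: caller falls back to its own copy */
	*data = s->data;
	return 0;
}
```

The other permitted option would overwrite s->data and return 0; with either choice a get never returns stale data.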


Thanks for pointing that out.  Persistent (pers) pages can have duplicate
puts since a swap cache page can be reused. Can ephemeral (eph) pages also
have duplicate puts? If so, when can that happen?








Re: [PATCH] sched: Skip looking at skip if next or last is set

2013-02-21 Thread Michael Wang
On 02/22/2013 12:06 AM, Srikar Dronamraju wrote:
> * Peter Zijlstra  [2013-02-20 09:46:25]:
> 
>> On Mon, 2013-02-18 at 18:31 +0530, Srikar Dronamraju wrote:
>>> pick_next_entity() prefers next, then last. However code checks if the
>>> left entity can be skipped even if next / last is set.
>>>
>>> Check if left entity should be skipped only if next/last is not set.
>>
>> You fail to explain why it's a problem and continue to make a horrid mess
>> of the code..
>>
> 
> If we look at the comments above pick_next_entity(),  it states:
> /*
>  * Pick the next process, keeping these things in mind, in this order:
>  * 1) keep things fair between processes/task groups
>  * 2) pick the "next" process, since someone really wants that to run
>  * 3) pick the "last" process, for cache locality
>  * 4) do not run the "skip" process, if something else is available
>  */ 
> 
> Currently the code checks in the reverse order, though the preference is
> correctly maintained as listed in comments.  But in some cases, we might be
> doing redundant checks.  Let's assume next is set; then we should avoid
> checking for skip, last and their fairness with left.
> 
> So what I intended to do was change the order, i.e. check for last only if next
> is not set (or picking next was unfair wrt left) and check for "something
> else (second from left)" if last is not set (or picking last was unfair wrt
> left).
> 
> However after sending the patch, I stumbled across these links.
> https://lkml.org/lkml/2012/1/16/500 and https://lkml.org/lkml/2012/1/25/195

Hi, Srikar

That takes me back to the time when I was starting to look at the scheduler ;-)

Actually, I gave up on this idea because I had missed one point: the code will
be optimized by the compiler, and the result is often logic we could not
imagine.

My patch is logically correct, but it may not benefit the scheduler much. I
don't think any benchmark would show better results, and in the scheduler
world, benchmarks talk...

Regards,
Michael Wang

> 
>>> Signed-off-by: Srikar Dronamraju 
>>> ---
>>>  kernel/sched/fair.c |   31 +++++++++++++++----------------
>>>  1 files changed, 15 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fdee793..cc97b12 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1900,27 +1900,26 @@ static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
>>> struct sched_entity *left = se;
>>>  
>>> /*
>>> -* Avoid running the skip buddy, if running something else can
>>> -* be done without getting too unfair.
>>> +* Someone really wants next to run. If it's not unfair, run it.
>>>  */
>>> -   if (cfs_rq->skip == se) {
>>> -   struct sched_entity *second = __pick_next_entity(se);
>>> +   if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1) {
>>> +   se = cfs_rq->next;
>>> +   } else if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1) {
>>> +   /*
>>> +* Prefer last buddy, try to return the CPU to a preempted
>>> +* task.
>>> +*/
>>> +   se = cfs_rq->last;
>>> +   } else if (cfs_rq->skip == left) {
>>> +   /*
>>> +* Avoid running the skip buddy, if running something else
>>> +* can be done without getting too unfair.
>>> +*/
>>> +   struct sched_entity *second = __pick_next_entity(left);
>>> if (second && wakeup_preempt_entity(second, left) < 1)
>>> se = second;
>>> }
>>>  
>>> -   /*
>>> -* Prefer last buddy, try to return the CPU to a preempted task.
>>> -*/
>>> -   if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
>>> -   se = cfs_rq->last;
>>> -
>>> -   /*
>>> -* Someone really wants this to run. If it's not unfair, run it.
>>> -*/
>>> -   if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>> -   se = cfs_rq->next;
>>> -
>>> clear_buddies(cfs_rq, se);
>>>  
>>> return se;
>>>
>>
>>
> 
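To illustrate Michael's point that the reordering is logically equivalent, here is a user-space model with hypothetical simplified types: struct toy_se, struct toy_rq, toy_preempt() and the fixed TOY_GRAN granularity are assumptions (the real wakeup_preempt_entity() uses a tunable, load-scaled granularity, and clear_buddies() is omitted). pick_override() mirrors the current order, where later assignments win; pick_early_exit() mirrors the proposed order, which tests the highest-priority buddy first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for CFS structures. */
struct toy_se { long vruntime; };

struct toy_rq {
	struct toy_se *left, *second;		/* leftmost and next-leftmost */
	struct toy_se *next, *last, *skip;	/* buddy hints, may be NULL */
};

#define TOY_GRAN 4	/* assumed constant wakeup granularity */

/* Same sign convention as wakeup_preempt_entity(): < 1 means "not unfair". */
static int toy_preempt(const struct toy_se *curr, const struct toy_se *se)
{
	long vdiff = curr->vruntime - se->vruntime;

	if (vdiff <= 0)
		return -1;
	if (vdiff > TOY_GRAN)
		return 1;
	return 0;
}

/* Current order: each later check may override the earlier choice. */
static struct toy_se *pick_override(const struct toy_rq *rq)
{
	struct toy_se *se = rq->left;

	if (rq->skip == se && rq->second && toy_preempt(rq->second, rq->left) < 1)
		se = rq->second;
	if (rq->last && toy_preempt(rq->last, rq->left) < 1)
		se = rq->last;
	if (rq->next && toy_preempt(rq->next, rq->left) < 1)
		se = rq->next;
	return se;
}

/* Proposed order: check the highest-priority buddy first and stop early. */
static struct toy_se *pick_early_exit(const struct toy_rq *rq)
{
	if (rq->next && toy_preempt(rq->next, rq->left) < 1)
		return rq->next;
	if (rq->last && toy_preempt(rq->last, rq->left) < 1)
		return rq->last;
	if (rq->skip == rq->left && rq->second &&
	    toy_preempt(rq->second, rq->left) < 1)
		return rq->second;
	return rq->left;
}
```

Both functions return the same entity for every input; the early-exit form only skips comparisons, which the compiler may already optimize away.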



Re: [RFC] arm: use built-in byte swap function

2013-02-21 Thread Kim Phillips
On Thu, 21 Feb 2013 11:40:54 -0500
Nicolas Pitre  wrote:

> On Thu, 21 Feb 2013, Kim Phillips wrote:
> 
> > On Wed, 20 Feb 2013 23:29:58 -0500
> > Nicolas Pitre  wrote:
> > 
> > > On Wed, 20 Feb 2013, Kim Phillips wrote:
> > > 
> > > > On Wed, 20 Feb 2013 10:43:18 -0500
> > > > Nicolas Pitre  wrote:
> > > > 
> > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > > > ... in which case there is no harm shipping a .c file and trivially
> > > > > > > enforcing -O2, the rest being equal.
> > > > > > 
> > > > > > For today's compilers, unless the wind changes.
> > > > > 
> > > > > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> > > > 
> > > > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > > > 
> > > > https://patchwork.kernel.org/patch/2101491/
> > > > 
> > > > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > > > __bswap[sd]i2, even with -march=armv6k+.
> > 
> > argh, sorry - that script was testing support for 
> > __builtin_bswap{16,32,64} directly, which isn't the same as testing
> > code generation of a byte swap pattern in C.
> 
> Still, I'm not as confident as I was about this.

which part exactly?  Having -O2 as "protection"?  Yes, me neither.

> > I'll still try the assembly approach - gcc 4.4's armv6 output looks
> > worse than both the pre-armv6 and post-armv6 __arch_swab32
> > implementations currently in use:
> > 
> > mov ip, sp
> > push    {fp, ip, lr, pc}
> > sub fp, ip, #4
> 
> You should use -fomit-frame-pointer to compile this.  We don't need a 
> frame pointer here, especially for a leaf function that the compiler 
> decides to call on its own.
> 
> > and r2, r0, #65280  ; 0xff00
> > lsl ip, r0, #24
> > orr r1, ip, r0, lsr #24
> > and r0, r0, #16711680   ; 0xff0000
> > orr r3, r1, r2, lsl #8
> > orr r0, r3, r0, lsr #8
> 
> Other than that, it is true that the above is slightly suboptimal.
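Kim's distinction between testing __builtin_bswap32() directly and testing a byte-swap pattern written in C can be made concrete. Below is the usual shift-and-mask form (swab32_pattern() and swab64_pattern() are illustrative names), which a given GCC version may or may not turn into a single rev depending on target and flags, plus a 64-bit swap composed from two 32-bit swaps, the same way the quoted ARMv6 __bswapdi2 applies rev to each half and exchanges the halves.

```c
#include <assert.h>
#include <stdint.h>

/* Portable shift-and-mask byte swap that compilers may recognise as bswap. */
static uint32_t swab32_pattern(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Swap each 32-bit half, then exchange the halves. */
static uint64_t swab64_pattern(uint64_t x)
{
	return ((uint64_t)swab32_pattern((uint32_t)x) << 32) |
	       swab32_pattern((uint32_t)(x >> 32));
}
```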

Here's the asm version I'm working on now, based on compiler
output of the C version.  Haven't tested beyond defconfig builds,
which pass ok.

Is there anything I have to do for thumb mode?  If so, how to test?

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..e8a41d0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+   select ARCH_USE_BUILTIN_BSWAP
help
  The ARM series is a line of low-power-consumption RISC chip designs
  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..a277e97 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -108,12 +108,12 @@ endif
 
 targets   := vmlinux vmlinux.lds \
 piggy.$(suffix_y) piggy.$(suffix_y).o \
-lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
 font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y   += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
-lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+   $(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-   $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+   $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+   $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ 

Re: [PATCH] kexec: prevent double free on image allocation failure

2013-02-21 Thread Zhang Yanfei
On 2013-02-22 09:55, Eric W. Biederman wrote:
> Sasha Levin  writes:
> 
>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will free
>> the image but would still set 'rimage', as a result kexec_load will try
>> to free it again.
>>
>> This would explode as part of the freeing process is accessing internal
>> members which point to uninitialized memory.
> 
> Agreed.
> 
> I don't think that failure path has ever actually been exercised.
> 
> The code is wrong, and it is worth fixing.
> 
> Andrew, do you think you could queue this up?  I don't have a handy tree.


I found another malloc/free problem in this function, so I have updated the
patch.

-

From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
From: Zhang Yanfei 
Date: Fri, 22 Feb 2013 10:34:02 +0800
Subject: [PATCH] kexec: fix allocation problems in function kimage_normal_alloc

The function kimage_normal_alloc() has 2 allocation problems that may cause
failures:

  1. If kimage_normal_alloc() fails to initialize an allocated kimage, it will
 free the image but would still set 'rimage', as a result kexec_load will
 try to free it again.

 This would explode as part of the freeing process is accessing internal
 members which point to uninitialized memory.

  2. If kimage_normal_alloc() fails to alloc pages for image->swap_page, it
 should call kimage_free_page_list() to free allocated pages in
 image->control_pages list before it frees image.

Signed-off-by: Sasha Levin 
Signed-off-by: Zhang Yanfei 
---
 kernel/kexec.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..f219357 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -223,6 +223,8 @@ out:
 
 }
 
+static void kimage_free_page_list(struct list_head *list);
+
 static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
unsigned long nr_segments,
struct kexec_segment __user *segments)
@@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
if (result)
goto out;
 
-   *rimage = image;
-
/*
 * Find a location for the control code buffer, and add it
 * the vector of segments so that it's pages will also be
@@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
 
result = 0;
  out:
-   if (result == 0)
+   if (result == 0) {
*rimage = image;
-   else
+   } else {
+   kimage_free_page_list(&image->control_pages);
kfree(image);
+   }
 
return result;
 }
-- 
1.7.1



Re: [PATCHv5 2/8] zsmalloc: add documentation

2013-02-21 Thread Ric Mason

On 02/21/2013 11:50 PM, Seth Jennings wrote:

On 02/21/2013 02:49 AM, Ric Mason wrote:

On 02/19/2013 03:16 AM, Seth Jennings wrote:

On 02/16/2013 12:21 AM, Ric Mason wrote:

On 02/14/2013 02:38 AM, Seth Jennings wrote:

This patch adds a documentation file for zsmalloc at
Documentation/vm/zsmalloc.txt

Signed-off-by: Seth Jennings 
---
Documentation/vm/zsmalloc.txt |   68
+
1 file changed, 68 insertions(+)
create mode 100644 Documentation/vm/zsmalloc.txt

diff --git a/Documentation/vm/zsmalloc.txt
b/Documentation/vm/zsmalloc.txt
new file mode 100644
index 000..85aa617
--- /dev/null
+++ b/Documentation/vm/zsmalloc.txt
@@ -0,0 +1,68 @@
+zsmalloc Memory Allocator
+
+Overview
+
+zsmalloc is a new slab-based memory allocator
+for storing compressed pages.  It is designed for
+low fragmentation and high allocation success rate on
+large, but <= PAGE_SIZE, allocations.
+
+zsmalloc differs from the kernel slab allocator in two primary
+ways to achieve these design goals.
+
+zsmalloc never requires high order page allocations to back
+slabs, or "size classes" in zsmalloc terms. Instead it allows
+multiple single-order pages to be stitched together into a
+"zspage" which backs the slab.  This allows for higher allocation
+success rate under memory pressure.
+
+Also, zsmalloc allows objects to span page boundaries within the
+zspage.  This allows for lower fragmentation than could be had
+with the kernel slab allocator for objects between PAGE_SIZE/2
+and PAGE_SIZE.  With the kernel slab allocator, if a page compresses
+to 60% of its original size, the memory savings gained through
+compression is lost in fragmentation because another object of
+the same size can't be stored in the leftover space.
+
+This ability to span pages results in zsmalloc allocations not being
+directly addressable by the user.  The user is given a
+non-dereferenceable handle in response to an allocation request.
+That handle must be mapped, using zs_map_object(), which returns
+a pointer to the mapped region that can be used.  The mapping is
+necessary since the object data may reside in two different
+noncontiguous pages.

Do you mean that the reason a zsmalloc object must be mapped after
allocation is that the object data may reside in two different noncontiguous pages?

Yes, that is one reason for the mapping.  The other reason (more of an
added bonus) is below.
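A rough illustration of why an object that straddles two pages needs a mapping step before use. It assumes a tiny hypothetical page size and a copy-into-buffer strategy; struct toy_zspage and toy_map_object() are illustrative and are not zsmalloc's zs_map_object() implementation. If the object fits in the first page, a direct pointer suffices; otherwise its two fragments are copied into a contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny page size to keep the example readable */

/* Hypothetical: an object at 'offset' in page[0] may spill into page[1]. */
struct toy_zspage {
	unsigned char *page[2];	/* two pages that need not be contiguous */
};

/*
 * Return a pointer through which the whole object is contiguous.
 * Assumes offset < TOY_PAGE_SIZE and that the object spans at most two pages.
 */
static const unsigned char *toy_map_object(const struct toy_zspage *zs,
					   size_t offset, size_t len,
					   unsigned char *buf)
{
	size_t first = TOY_PAGE_SIZE - offset;	/* bytes left in the first page */

	if (len <= first)
		return zs->page[0] + offset;	/* no straddle: point straight at it */
	memcpy(buf, zs->page[0] + offset, first);
	memcpy(buf + first, zs->page[1], len - first);
	return buf;
}
```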


+
+For 32-bit systems, zsmalloc has the added benefit of being
+able to back slabs with HIGHMEM pages, something not possible

What's the meaning of "back slabs with HIGHMEM pages"?

By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems
with more than 1GB (actually a little less) of RAM.  The upper 3GB
of the 4GB address space, depending on kernel build options, is not
directly addressable by the kernel, but can be mapped into the kernel
address space with functions like kmap() or kmap_atomic().

These pages can't be used by slab/slub because they are not
contiguously mapped into the kernel address space.  However, since
zsmalloc requires a mapping anyway to handle objects that span
non-contiguous page boundaries, we do the kernel mapping as part of
the process.

So zspages, the conceptual slab in zsmalloc backed by single-order
pages can include pages from the HIGHMEM zone as well.

Thanks for the clarification.
  Regarding your LWN article about zswap, http://lwn.net/Articles/537422/:
  "Additionally, the kernel slab allocator does not allow objects that are less
than a page in size to span a page boundary. This means that if an object is
PAGE_SIZE/2 + 1 bytes in size, it effectively use an entire page, resulting in
~50% waste. Hense there are *no kmalloc() cache size* between PAGE_SIZE/2 and
PAGE_SIZE."
Are you sure? It seems that the kmalloc caches support large sizes; you can
check include/linux/kmalloc_sizes.h.

Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no
cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE.  For example, on a
system with 4k pages, there are no caches between kmalloc-2048 and
kmalloc-4096.
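The waste Seth describes can be checked with a little arithmetic. The sketch assumes that, above 192 bytes, the kmalloc caches are powers of two (..., 1024, 2048, 4096) on a system with 4k pages; toy_kmalloc_class() and toy_waste() are illustrative helpers, not kernel functions, and are only meaningful for requests larger than 192 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the assumed power-of-two kmalloc cache size. */
static size_t toy_kmalloc_class(size_t size)
{
	size_t cls = 256;	/* smallest power-of-two class above 192 bytes */

	while (cls < size)
		cls <<= 1;
	return cls;
}

/* Bytes lost to rounding up to the cache size. */
static size_t toy_waste(size_t size)
{
	return toy_kmalloc_class(size) - size;
}
```

With these assumptions a PAGE_SIZE/2 + 1 = 2049 byte object lands in kmalloc-4096 and wastes 2047 bytes, and a page compressed to about 60% (2458 bytes) wastes 1638 bytes, roughly 40% of the page, which is the loss that page-spanning zsmalloc objects aim to avoid.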


Since SLUB caches can be merged, is that the root reason?



Seth



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 2/8] zsmalloc: add documentation

2013-02-21 Thread Ric Mason

On 02/21/2013 11:50 PM, Seth Jennings wrote:

On 02/21/2013 02:49 AM, Ric Mason wrote:

On 02/19/2013 03:16 AM, Seth Jennings wrote:

On 02/16/2013 12:21 AM, Ric Mason wrote:

On 02/14/2013 02:38 AM, Seth Jennings wrote:

This patch adds a documentation file for zsmalloc at
Documentation/vm/zsmalloc.txt

Signed-off-by: Seth Jennings 
---
Documentation/vm/zsmalloc.txt |   68 +
1 file changed, 68 insertions(+)
create mode 100644 Documentation/vm/zsmalloc.txt

diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt
new file mode 100644
index 000..85aa617
--- /dev/null
+++ b/Documentation/vm/zsmalloc.txt
@@ -0,0 +1,68 @@
+zsmalloc Memory Allocator
+
+Overview
+
+zsmalloc is a new slab-based memory allocator
+for storing compressed pages.  It is designed for
+low fragmentation and high allocation success rate on
+large, but <= PAGE_SIZE, object allocations.
+
+zsmalloc differs from the kernel slab allocator in two primary
+ways to achieve these design goals.
+
+zsmalloc never requires high order page allocations to back
+slabs, or "size classes" in zsmalloc terms. Instead it allows
+multiple single-order pages to be stitched together into a
+"zspage" which backs the slab.  This allows for higher allocation
+success rate under memory pressure.
+
+Also, zsmalloc allows objects to span page boundaries within the
+zspage.  This allows for lower fragmentation than could be had
+with the kernel slab allocator for objects between PAGE_SIZE/2
+and PAGE_SIZE.  With the kernel slab allocator, if a page compresses
+to 60% of its original size, the memory savings gained through
+compression are lost in fragmentation because another object of
+the same size can't be stored in the leftover space.
+
+This ability to span pages results in zsmalloc allocations not being
+directly addressable by the user.  The user is given a
+non-dereferenceable handle in response to an allocation request.
+That handle must be mapped, using zs_map_object(), which returns
+a pointer to the mapped region that can be used.  The mapping is
+necessary since the object data may reside in two different
+noncontiguous pages.

Do you mean the reason a zsmalloc object must be mapped after
allocation is that the object data may reside in two different noncontiguous pages?

Yes, that is one reason for the mapping.  The other reason (more of an
added bonus) is below.


+
+For 32-bit systems, zsmalloc has the added benefit of being
+able to back slabs with HIGHMEM pages, something not possible

What's the meaning of "back slabs with HIGHMEM pages"?

By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems
with more than 1GB (actually a little less) of RAM.  The upper 3GB
of the 4GB address space, depending on kernel build options, is not
directly addressable by the kernel, but can be mapped into the kernel
address space with functions like kmap() or kmap_atomic().

These pages can't be used by slab/slub because they are not
permanently mapped into the kernel address space.  However, since
zsmalloc requires a mapping anyway to handle objects that span
non-contiguous page boundaries, we do the kernel mapping as part of
the process.

So zspages, the conceptual slabs in zsmalloc backed by single-order
pages, can include pages from the HIGHMEM zone as well.

Thanks for your clarification.
In http://lwn.net/Articles/537422/, your article about zswap on LWN:
  "Additionally, the kernel slab allocator does not allow objects that
are less than a page in size to span a page boundary. This means that
if an object is PAGE_SIZE/2 + 1 bytes in size, it effectively uses an
entire page, resulting in ~50% waste. Hence there are *no kmalloc()
cache sizes* between PAGE_SIZE/2 and PAGE_SIZE."
Are you sure? It seems that the kmalloc caches support big sizes; you
can check in include/linux/kmalloc_sizes.h

Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no
cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE.  For example, on a
system with 4k pages, there are no caches between kmalloc-2048 and
kmalloc-4096.


A kmalloc object > PAGE_SIZE/2 or > PAGE_SIZE should also be allocated
from a slab cache, correct? Then how can an object be allocated without
a slab cache that holds objects of that size?




Seth



Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-21 Thread Michael Wang
On 02/21/2013 07:37 PM, Peter Zijlstra wrote:
> On Thu, 2013-02-21 at 12:58 +0800, Michael Wang wrote:
>>
>> You are right, it costs space in order to accelerate the system. I've
>> calculated the cost once before (I'm really not good at this, please
>> let me know if I make any silly calculation...), 
> 
> The exact size isn't that important, but it's trivial to see it's quadratic.
> You have a NR_CPUS array per-cpu, thus O(n^2).
> 
> ( side note; invest in getting good at complexity analysis -- or at
> least competent, it's the single most important aspect of programming. )
> 
> ...
> 
>> And the final cost is 3000 ints and 103 pointers, plus some padding,
>> but it won't be bigger than 10M, not a big deal for a system with
>> 1000 cpus either.
> 
> Maybe, but quadratic stuff should be frowned upon at all times, these
> things tend to explode when you least expect it.
>
> For instance, IIRC the biggest single image system SGI booted had 16k
> cpus in there, that ends up at something like 14+14+3=31, i.e. 2^31 bytes
> or 2G of storage just for your lookup -- that seems somewhat preposterous.

Honestly, if I were an admin who owned a 16k-cpu system (I can't even
imagine how much memory it would have...), I would really prefer to trade
2G of memory for some performance.

I see your point here: the cost in space will grow quadratically, but
the memory of the system will also grow, and according to my understanding,
it's faster.

Regards,
Michael Wang

> 
> The domain levels are roughly O(log n) related to the total cpus, so
> what you're doing is replacing an O(log n) lookup with an O(1) lookup,
> but at the cost of O(n^2) storage. 


Re: [PATCH V2 0/4] CPUFreq: Implement per policy instances of governors

2013-02-21 Thread Viresh Kumar
On 11 February 2013 13:19, Viresh Kumar  wrote:
> This is targeted for 3.10-rc1 or linux-next just after the merge window.

Hi Rafael,

I have pushed this patch again with the modifications/fixups I posted:
cpufreq-for-3.10

Also I have swapped patches 3 & 4, in case you decide to drop that Kconfig
patch, which is no. 4 now :)

--
viresh


Re: [PATCH V2 3/4] cpufreq: Add Kconfig option to enable/disable have_multiple_policies

2013-02-21 Thread Viresh Kumar
On 22 February 2013 07:59, Rafael J. Wysocki  wrote:
> On Friday, February 22, 2013 07:44:23 AM Viresh Kumar wrote:

>> If you don't like this one then we can add another entry
>> into struct policy like: gov_sysfs_parent.
>
> I don't know.  This is going to look kind of ugly one way or another I think.
>
> Maybe I'll figure out something ...

Another simple way of doing this is to leave this patch out, and here is why I say so.

struct policy is allocated dynamically with kzalloc, so every field is zero,
including have_multiple_policies. Only the platforms needing this feature
would set it to 1, and all remaining ones would stay unchanged.

This variable would waste just "4" bytes for platforms that don't need this
feature.

About performance: this if/else is called only on policy creation or
destruction. For platforms that don't have multiple policies, and thus all
cpus share the same policy struct, the destruction might never happen unless
we rmmod/insmod the cpufreq driver, because policy destruction would only
happen when all the cpus are removed :)

So it will execute only once at boot time when we initialize policy struct.

Is this patch worth keeping then?

--
viresh


[GIT PULL] Blackfin updates for 3.9

2013-02-21 Thread Bob Liu
Hi Linus,

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

  Linux 3.8 (2013-02-18 15:58:34 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin.git for-linus

for you to fetch changes up to f656c240ae07c48ddf8552e83b64692121044c42:

  blackfin: time-ts: Remove duplicate assignment (2013-02-20 15:21:23 +0800)


Akinobu Mita (1):
  blackfin: use bitmap library functions

Bob Liu (1):
  blackfin: mem_init: update dmc config register

Sonic Zhang (1):
  blackfin: sync data in blackfin write buffer

Stephen Boyd (1):
  blackfin: time-ts: Remove duplicate assignment

Steven Miao (1):
  blackfin: pm: fix build error

 arch/blackfin/include/asm/mem_init.h      |    2 +-
 arch/blackfin/include/asm/uaccess.h       |    1 +
 arch/blackfin/kernel/dma-mapping.c        |   23 +++
 arch/blackfin/kernel/time-ts.c            |    6 --
 arch/blackfin/mach-common/ints-priority.c |    4
 5 files changed, 13 insertions(+), 23 deletions(-)



Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Michael Wang
On 02/21/2013 06:20 PM, Peter Zijlstra wrote:
> On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote:
>> The old logic when affine_sd is located is:
>>
>> if prev_cpu != curr_cpu
>> 	if wake_affine()
>> 		prev_cpu = curr_cpu
>> new_cpu = select_idle_sibling(prev_cpu)
>> return new_cpu
>>
>> The new logic is the same as the old one if prev_cpu == curr_cpu, so
>> let's simplify the old logic like:
>>
>> if wake_affine()
>> 	new_cpu = select_idle_sibling(curr_cpu)
>> else
>> 	new_cpu = select_idle_sibling(prev_cpu)
>>
>> return new_cpu
>>
>> Actually that doesn't make sense.
> 
> It does :-)
> 
>> I think wake_affine() is trying to check whether moving a task from
>> prev_cpu to curr_cpu will break the balance in affine_sd or not, but
>> why does not breaking the balance mean curr_cpu is better than
>> prev_cpu for searching for the idle cpu?
> 
> It doesn't, the whole affine wakeup stuff is meant to pull waking tasks
> towards the cpu that does the wakeup, we limit this by putting bounds on
> the imbalance this may create.
> 
> The reason we want to run tasks on the cpu that does the wakeup is
> because that cpu 'obviously' is running something related and it seems
> like a good idea to run related tasks close together.
> 
> So look at affine wakeups as a force that groups related tasks.

That's right, and it's one point I've missed when judging the
wake_affine()...

But that benefit is really hard to estimate, and when the workload is
heavy, the cost of wake_affine() is very high, since it calculates each
se one by one. Is that worth it for a benefit we cannot promise?

According to the test results, I cannot agree that this purpose of
wake_affine() benefits us, but I'm sure that wake_affine() is a terrible
performance killer when the system is busy.

> 
>> So the new logic in this patch set is:
>>
>> new_cpu = select_idle_sibling(prev_cpu)
>> if idle_cpu(new_cpu)
>> 	return new_cpu
>>
>> new_cpu = select_idle_sibling(curr_cpu)
>> if idle_cpu(new_cpu) {
>> 	if wake_affine()
>> 		return new_cpu
>> }
>>
>> return prev_cpu
>>
>> And now, unless we are really going to move load from prev_cpu to
>> curr_cpu, we won't use wake_affine() any more.
> 
> That completely breaks stuff, not cool.

Could you please give more details on what you think is bad?

Regards,
Michael Wang



linux-next: manual merge of the writeback tree with the btrfs tree

2013-02-21 Thread Stephen Rothwell
Hi Wu,

Today's linux-next merge of the writeback tree got a conflict in
fs/btrfs/extent-tree.c between commit da633a421701 ("Btrfs: flush all
dirty inodes if writeback can not start") from the btrfs tree and commit
10ee27a06cc8 ("vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and
rename them") from the writeback tree.

I fixed it up (I assumed that the former supersedes the latter and used
that) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-21 Thread Michael Wang
On 02/21/2013 05:43 PM, Mike Galbraith wrote:
> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
> 
>> But does this patch set really cause a regression on your Q6600? It may
>> sacrifice something, but I still think it will benefit far more,
>> especially on huge systems.
> 
> We spread on FORK/EXEC, and will no longer pull communicating tasks
> back to a shared cache with the new logic preferring to leave wakee
> remote, so while no, I haven't tested (will try to find round tuit) it
> seems it _must_ hurt.  Dragging data from one llc to the other on Q6600
> hurts a LOT.  Every time a client and server are cross llc, it's a huge
> hit.  The previous logic pulled communicating tasks together right when
> it matters the most, intermittent load... or interactive use.

I agree that this is a problem that needs to be solved, but I don't agree
that wake_affine() is the solution.

According to my understanding, in the old world, wake_affine() will only
be used if curr_cpu and prev_cpu share a cache, which means they are in
one package; whether we search in the llc sd of curr_cpu or prev_cpu, we
won't have the chance to spread the task out of that package.

I'm going to restore the logic that only does select_idle_sibling() when
prev_cpu and curr_cpu are affine, so the new logic will only prefer
leaving the task in the old package if both prev_cpu and curr_cpu are in
that package. I think this could solve the problem, couldn't it?

Regards,
Michael Wang



> 
> -Mike


Re: [PATCH] kexec: prevent double free on image allocation failure

2013-02-21 Thread Sasha Levin
On 02/21/2013 08:55 PM, ebied...@xmission.com wrote:
> Sasha Levin  writes:
> 
>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will
>> free the image but would still set 'rimage'; as a result kexec_load will
>> try to free it again.
>>
>> This would explode, as part of the freeing process accesses internal
>> members which point to uninitialized memory.
> 
> Agreed.
> 
> I don't think that failure path has ever actually been exercised.

trinity is actually quite good at hitting that, which is how I discovered
it:

[  418.138251] Could not allocate control_code_buffer
[  418.143739] general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  418.147131] Dumping ftrace buffer:
[  418.147901](ftrace buffer empty)
[  418.148697] Modules linked in:
[  418.153440] CPU 1
[  418.153440] Pid: 18098, comm: trinity Tainted: GW 3.8.0-next-20130220-sasha-00037-gc07b3b2-dirty #7
[  418.153440] RIP: 0010:[]  [] kimage_free_page_list+0x16/0x50
[  418.153440] RSP: 0018:88009bfade78  EFLAGS: 00010292
[  418.153440] RAX: 00180004 RBX: 0002 RCX: 
[  418.153440] RDX: 88009c1a RSI: 0001 RDI: 6b6b6b6b6b6b6b6b
[  418.153440] RBP: 88009bfade98 R08: 2782 R09: 
[  418.153440] R10:  R11:  R12: 88009c6cb4d0
[  418.153440] R13: 88009c6cb720 R14: 88009c6cb4d0 R15: 00f6
[  418.153440] FS:  7fb7eb95b700() GS:8800bb80() knlGS:
[  418.153440] CS:  0010 DS:  ES:  CR0: 80050033
[  418.153440] CR2: 004808e0 CR3: 9eaaa000 CR4: 000406e0
[  418.153440] DR0:  DR1:  DR2: 
[  418.153440] DR3:  DR6: 0ff0 DR7: 0400
[  418.153440] Process trinity (pid: 18098, threadinfo 88009bfac000, task 88009c1a)
[  418.153440] Stack:
[  418.153440]  8546e948 0002 88009c6cb4d0
[  418.153440]  88009bfaded8 8119b60f 0002 0002
[  418.153440]  88009c6cb4d0 88009c6cb4d0 fff4 00f6
[  418.153440] Call Trace:
[  418.153440]  [] kimage_free+0x2f/0x100
[  418.153440]  [] sys_kexec_load+0x593/0x660
[  418.153440]  [] ? trace_hardirqs_on+0xd/0x10
[  418.153440]  [] tracesys+0xe1/0xe6
[  418.153440] Code: c1 ef 0c 55 48 c1 e7 06 48 89 e5 48 01 c7 e8 82 ff ff ff 5d c3 55 48 89 e5 41 55 49 89 fd 41 54 53 48 83 ec 08 48 8b 3f 49 39 fd <48> 8b 1f 75 08 eb 22 0f 1f 00 48 89 d3 4c 8d 67 e0 e8 54 a6 8a
[  418.153440] RIP  [] kimage_free_page_list+0x16/0x50
[  418.153440]  RSP 
[  418.219646] ---[ end trace 0adb1d6b71fefb29 ]---


Thanks,
Sasha


Re: [PATCH V2 4/4] cpufreq: Get rid of "struct global_attr"

2013-02-21 Thread Viresh Kumar
On 22 February 2013 08:03, Rafael J. Wysocki  wrote:
> On Friday, February 22, 2013 07:47:44 AM Viresh Kumar wrote:
>> On 22 February 2013 05:15, Rafael J. Wysocki  wrote:
>> > Why did you change all of the lines of this macro instead of changing just
>> > the one line you needed to change?
>>
>> I didn't like the indentation used within the macro. So did it.
>
> In general, things like that are for separate cleanup patches.  If you mix
> functional changes with cleanups, people get confused and it's difficult to
> see what's needed and what's "optional".
>
> I know it's tempting to fix stuff like that along with doing functional
> changes and I do that sometimes.  Not very often, though, and with care.

Even I give similar comments sometimes but forget them while writing my
patches :)

Anyway, fixup:

commit b1bbb99467d56140cf3a8a2b70e61b456aa46e48
Author: Viresh Kumar 
Date:   Fri Feb 22 07:59:20 2013 +0530

fixup! cpufreq: Get rid of "struct global_attr"
---
 drivers/cpufreq/intel_pstate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index e795134..49846b9 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -273,12 +273,12 @@ static void intel_pstate_debug_expose_params(void)
 /** debugfs end /

 /** sysfs begin /
-#define show_one(file_name, object)\
-static ssize_t show_##file_name\
-(struct cpufreq_policy *policy, char *buf) \
-{  \
-   return sprintf(buf, "%u\n", limits.object); \
-}
+#define show_one(file_name, object)\
+   static ssize_t show_##file_name \
+   (struct cpufreq_policy *policy, char *buf)  \
+   {   \
+   return sprintf(buf, "%u\n", limits.object); \
+   }


Re: [PATCH V2 1/4] cpufreq: Add per policy governor-init/exit infrastructure

2013-02-21 Thread Viresh Kumar
On 22 February 2013 05:05, Rafael J. Wysocki  wrote:
> Why don't you use different values here?
>
> If you need only one value, one #define should be sufficient.

This is the fixup I have for this; I will push all patches again to
cpufreq-for-3.10 branch:

--

commit 4d7296fb64f2353aafad5104f0a046466d0f4ea9
Author: Viresh Kumar 
Date:   Fri Feb 22 07:56:31 2013 +0530

fixup! cpufreq: Add per policy governor-init/exit infrastructure
---
 include/linux/cpufreq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3b822ce..b7393b5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -183,7 +183,7 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div, u_int mu
 #define CPUFREQ_GOV_STOP   2
 #define CPUFREQ_GOV_LIMITS 3
 #define CPUFREQ_GOV_POLICY_INIT	4
-#define CPUFREQ_GOV_POLICY_EXIT	4
+#define CPUFREQ_GOV_POLICY_EXIT	5

 struct cpufreq_governor {
 	char	name[CPUFREQ_NAME_LEN];

