Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-15 Thread Alex Shi
On 12/14/2012 05:33 PM, Vincent Guittot wrote:
> On 14 December 2012 02:46, Alex Shi  wrote:
>> On 12/13/2012 11:48 PM, Vincent Guittot wrote:
>>> On 13 December 2012 15:53, Vincent Guittot wrote:
 On 13 December 2012 15:25, Alex Shi  wrote:
> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
>> On 13 December 2012 03:17, Alex Shi  wrote:
>>> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
 During the creation of sched_domain, we define a pack buddy CPU for each CPU
 when one is available. We want to pack at all levels where a group of CPU can
 be power gated independently from others.
 On a system that can't power gate a group of CPUs independently, the flag is
 set at all sched_domain level and the buddy is set to -1. This is the default
 behavior.
 On a dual clusters / dual cores system which can power gate each core and
 cluster independently, the buddy configuration will be :

       | Cluster 0   | Cluster 1   |
       | CPU0 | CPU1 | CPU2 | CPU3 |
 -----------------------------------
 buddy | CPU0 | CPU0 | CPU0 | CPU2 |

 Small tasks tend to slip out of the periodic load balance so the best place
 to choose to migrate them is during their wake up. The decision is in O(1) as
 we only check against one buddy CPU
>>>
>>> Just have a little worry about the scalability on a big machine, like on
>>> a 4 sockets NUMA machine * 8 cores * HT machine, the buddy cpu in whole
>>> system need care 64 LCPUs. and in your case cpu0 just care 4 LCPU. That
>>> is different on task distribution decision.
>>
>> The buddy CPU should probably not be the same for all 64 LCPU it
>> depends on where it's worth packing small tasks
>
> Do you have further ideas for buddy cpu on such example?

 yes, I have several ideas which were not really relevant for small
 system but could be interesting for larger system

 We keep the same algorithm in a socket but we could either use another
 LCPU in the targeted socket (conf0) or chain the socket (conf1)
 instead of packing directly in one LCPU

 The scheme below tries to summarize the idea:

 Socket      | socket 0 | socket 1   | socket 2   | socket 3   |
 LCPU        | 0 | 1-15 | 16 | 17-31 | 32 | 33-47 | 48 | 49-63 |
 buddy conf0 | 0 | 0    | 1  | 16    | 2  | 32    | 3  | 48    |
 buddy conf1 | 0 | 0    | 0  | 16    | 16 | 32    | 32 | 48    |
 buddy conf2 | 0 | 0    | 16 | 16    | 32 | 32    | 48 | 48    |

 But, I don't know how this can interact with NUMA load balance and the
 better might be to use conf3.
>>>
>>> I mean conf2 not conf3
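The conf0, conf1 and conf2 rows of the quoted table can be read as three small mapping functions. The following is only a sketch of that table, assuming 16 LCPUs per socket as in the example; the names `socket_leader`, `buddy_conf0`, `buddy_conf1` and `buddy_conf2` are hypothetical and do not come from the patch.

```c
#include <assert.h>

#define LCPUS_PER_SOCKET 16	/* 8 cores x 2 HT threads, as in the example */

/* First LCPU of the socket that 'cpu' belongs to. */
int socket_leader(int cpu)
{
	assert(cpu >= 0);
	return (cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
}

/* conf0: the leader of socket N points at LCPU N inside socket 0. */
int buddy_conf0(int cpu)
{
	int leader = socket_leader(cpu);

	if (cpu != leader)
		return leader;
	return cpu / LCPUS_PER_SOCKET;
}

/* conf1: socket leaders are chained to the leader of the previous socket. */
int buddy_conf1(int cpu)
{
	int leader = socket_leader(cpu);

	if (cpu != leader)
		return leader;
	return leader == 0 ? 0 : leader - LCPUS_PER_SOCKET;
}

/* conf2: packing never leaves the socket. */
int buddy_conf2(int cpu)
{
	return socket_leader(cpu);
}
```

In all three variants the lookup is a constant-time computation per LCPU; they differ only in where a socket leader forwards its small tasks.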
>>
>> So, it has 4 levels 0/16/32/ for socket 3 and 0 level for socket 0, it
>> is unbalanced for different socket.
> 
> That is the target, because we decided to pack the small tasks in
> socket 0 when we parsed the topology at boot.
> We don't have to loop over sched_domain or sched_group anymore to find
> the best LCPU when a small task wakes up.

Iterating over domains and groups is an advantage for power efficiency,
not a shortcoming. If some CPUs are already idle before forking,
letting the waking CPU check their load/util and then decide which one
is the best CPU can reduce later migrations, which saves both performance
and power.

On the contrary, moving a task by walking through the buddies at each level
is bad not only for performance but also for power. Consider the quite large
latency of waking a CPU from deep idle: we lose too much.

> 
>>
>> And the ground level has just one buddy for 16 LCPUs - 8 cores, that's
>> not a good design, consider my previous examples: if there are 4 or 8
>> tasks in one socket, you just has 2 choices: spread them into all cores,
>> or pack them into one LCPU. Actually, moving them just into 2 or 4 cores
>> maybe a better solution. but the design missed this.
> 
> You speak about tasks without any notion of load. This patch only cares
> about small tasks and light LCPU load, and it falls back to the default
> behavior in other situations. So if there are 4 or 8 small tasks, they
> will migrate to socket 0 after 1 or up to 3 migrations (it depends
> on the conf and the LCPU they come from).

According to your patch, what you mean by 'notion of load' is the
utilization of the CPU, not the load weight of tasks, right?

Yes, I just talked about task numbers, but it naturally extends to
task utilization on the CPU. Take 8 tasks with 25% util each: they can just
fill 2 CPUs, but that is clearly beyond the capacity of the buddy, so you need
to wake up another CPU socket while the local socket has some idle LCPUs...
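The arithmetic behind this example can be written out as a tiny helper. This is a sketch under two simplifying assumptions that the mail does not state explicitly: task utilizations add up linearly, and one CPU holds exactly 100%. The name `cpus_needed` is hypothetical.

```c
#include <assert.h>

/*
 * Minimum number of CPUs needed to hold 'ntasks' tasks that each use
 * 'util_pct' percent of one CPU, rounding up to whole CPUs.
 */
unsigned int cpus_needed(unsigned int ntasks, unsigned int util_pct)
{
	assert(util_pct <= 100);
	return (ntasks * util_pct + 99) / 100;
}
```

With 8 tasks at 25% each the total is 200%, so two CPUs are filled, which is more than a single buddy LCPU can absorb.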
> 
> Then, if too much small tasks wake up simultaneously on the same LCPU,
> the default load balance will spread them in the core/cluster/socket
> 
>>
>> Obviously, more and more cores is the trend on any 

Re: [PATCH] clk: factor: calculate rate by do_div

2012-12-15 Thread Haojian Zhuang
On Sun, Dec 16, 2012 at 4:54 AM, Mike Turquette  wrote:
> On Sat, Dec 15, 2012 at 8:41 AM, Haojian Zhuang wrote:
>> On Tue, Dec 4, 2012 at 9:32 AM, Haojian Zhuang wrote:
>>> On Mon, Dec 3, 2012 at 4:14 PM, Haojian Zhuang wrote:
 clk->rate = parent->rate / div * mult

 The formula is OK. But it may overflow while we do operate with
 unsigned long. So use do_div instead.

 Signed-off-by: Haojian Zhuang 
 ---
  drivers/clk/clk-fixed-factor.c |5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
 index a489985..1ef271e 100644
 --- a/drivers/clk/clk-fixed-factor.c
 +++ b/drivers/clk/clk-fixed-factor.c
 @@ -28,8 +28,11 @@ static unsigned long clk_factor_recalc_rate(struct clk_hw *hw,
 unsigned long parent_rate)
  {
 struct clk_fixed_factor *fix = to_clk_fixed_factor(hw);
 +   unsigned long long int rate;

 -   return parent_rate * fix->mult / fix->div;
 +   rate = (unsigned long long int)parent_rate * fix->mult;
 +   do_div(rate, fix->div);
 +   return (unsigned long)rate;
  }

  static long clk_factor_round_rate(struct clk_hw *hw, unsigned long rate,
 --
 1.7.10.4

>>>
>>> Correct Mike's email address.
>>
>> Any comments? Does it mean that nobody want to fix the bug?
>
> Thanks for the patch.  My apologies for letting this one slip through
> the cracks but my normal email workflow was unavoidably disrupted and
> I find myself playing catch-up with pending patches.
>
> The patch looks good to me but I'll change the $SUBJECT to "clk:
> fixed-factor: round_rate should use do_div" and do some testing before
> taking it in.
>
> Regards,
> Mike

It's nice. Thank you.

Best Regards
Haojian
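
The overflow that the patch addresses can be reproduced in userspace. The sketch below assumes a 32-bit `unsigned long` (modelled here with `uint32_t`) and uses plain 64-bit division instead of the kernel-only `do_div()` macro; `rate_narrow` and `rate_wide` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit intermediate product: wraps once parent_rate * mult >= 2^32. */
uint32_t rate_narrow(uint32_t parent_rate, uint32_t mult, uint32_t div)
{
	uint32_t product = parent_rate * mult;	/* may wrap around */

	assert(div != 0);
	return product / div;
}

/* 64-bit intermediate product, as in the patch: no wrap before dividing. */
uint32_t rate_wide(uint32_t parent_rate, uint32_t mult, uint32_t div)
{
	uint64_t rate = (uint64_t)parent_rate * mult;

	assert(div != 0);
	return (uint32_t)(rate / div);
}
```

For an 800 MHz parent with mult = 8 and div = 2, the product 6,400,000,000 does not fit in 32 bits, so the narrow version yields 1,052,516,352 while the wide version yields the expected 3,200,000,000.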
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] cpufreq: Don't use cpu removed during cpufreq_driver_unregister

2012-12-15 Thread Viresh Kumar
This is how the core works:
cpufreq_driver_unregister()
 - subsys_interface_unregister()
   - for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we
 unregister.

cpufreq_remove_dev():
 - Remove policy node
 - Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu.
   i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2,
   we call it for cpu 3.
   - cpufreq_add_dev() would call cpufreq_driver->init()
   - init would return mask as AND of 2, 3 and 4 for cluster A7.
   - cpufreq core would do online_cpu && policy->cpus
 Here is the BUG(). Because cpu hasn't died but we have just unregistered
 the cpufreq driver, online cpu would still have cpu 2 in it. And so things
 go bad again.

Solution: Keep cpumask of cpus that are registered with cpufreq core and clear
  cpus when we get a call from subsys_interface_unregister() via
  cpufreq_remove_dev().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a0a33bd..271d3be 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -47,6 +47,9 @@ static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
 #endif
 static DEFINE_SPINLOCK(cpufreq_driver_lock);
 
+/* Used when we unregister cpufreq driver */
+struct cpumask cpufreq_online_mask;
+
 /*
  * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
  * all cpufreq/hotplug/workqueue/etc related lock issues.
@@ -981,6 +984,7 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 * managing offline cpus here.
 */
cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
+   cpumask_and(policy->cpus, policy->cpus, &cpufreq_online_mask);
 
policy->user_policy.min = policy->min;
policy->user_policy.max = policy->max;
@@ -1064,7 +1068,6 @@ static int __cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif
}
per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
-
 #ifdef CONFIG_SMP
/* if this isn't the CPU which is the parent of the kobj, we
 * only need to unlink, put and exit
@@ -1185,6 +1188,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
if (unlikely(lock_policy_rwsem_write(cpu)))
BUG();
 
+   cpumask_clear_cpu(cpu, &cpufreq_online_mask);
retval = __cpufreq_remove_dev(dev, sif);
return retval;
 }
@@ -1903,6 +1907,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
cpufreq_driver = driver_data;
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+   cpumask_setall(&cpufreq_online_mask);
+
ret = subsys_interface_register(&cpufreq_interface);
if (ret)
goto err_null_driver;
-- 
1.7.12.rc2.18.g61b472e
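
The effect of the extra mask can be illustrated with plain bitmasks. This is a userspace sketch only: 32-bit integers stand in for `struct cpumask`, and `CPU_BIT`, `clear_cpu` and `policy_cpus` are hypothetical helpers, not kernel functions. The CPU numbers follow the example in the commit message (cluster A7 = CPUs 2, 3, 4; CPUs 0-4 online).

```c
#include <assert.h>
#include <stdint.h>

#define CPU_BIT(n) (UINT32_C(1) << (n))

/* Remove one CPU from a mask, as cpufreq_remove_dev() does in the patch. */
uint32_t clear_cpu(uint32_t mask, unsigned int cpu)
{
	assert(cpu < 32);
	return mask & ~CPU_BIT(cpu);
}

/*
 * CPUs a policy may manage: reported by the driver, online, and still
 * registered with the cpufreq core.
 */
uint32_t policy_cpus(uint32_t driver_mask, uint32_t online_mask,
		     uint32_t registered_mask)
{
	return driver_mask & online_mask & registered_mask;
}
```

Without the registered mask (all bits set), the policy for CPU 3 would still contain CPU 2 after CPU 2 has been removed, because CPU 2 is still online; clearing CPU 2 from the registered mask leaves only CPUs 3 and 4.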



[PATCH 2/3] cpufreq: Notify governors when cpus are hot-[un]plugged

2012-12-15 Thread Viresh Kumar
Because cpufreq core and governors worry only about the online cpus, if a cpu is
hot [un]plugged, we must notify governors about it, otherwise be ready to expect
something unexpected.

We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we
just need to call them now.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index de99517..a0a33bd 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -751,11 +751,16 @@ static int cpufreq_add_dev_policy(unsigned int cpu,
return -EBUSY;
}
 
+   __cpufreq_governor(managed_policy, CPUFREQ_GOV_STOP);
+
spin_lock_irqsave(&cpufreq_driver_lock, flags);
cpumask_copy(managed_policy->cpus, policy->cpus);
per_cpu(cpufreq_cpu_data, cpu) = managed_policy;
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+   __cpufreq_governor(managed_policy, CPUFREQ_GOV_START);
+   __cpufreq_governor(managed_policy, CPUFREQ_GOV_LIMITS);
+
pr_debug("CPU already managed, adding link\n");
ret = sysfs_create_link(&dev->kobj,
&managed_policy->kobj,
@@ -1066,8 +1071,13 @@ static int __cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif
 */
if (unlikely(cpu != data->cpu)) {
pr_debug("removing link\n");
+   __cpufreq_governor(data, CPUFREQ_GOV_STOP);
cpumask_clear_cpu(cpu, data->cpus);
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+   __cpufreq_governor(data, CPUFREQ_GOV_START);
+   __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
+
kobj = &dev->kobj;
cpufreq_cpu_put(data);
unlock_policy_rwsem_write(cpu);
-- 
1.7.12.rc2.18.g61b472e



[PATCH 1/3] cpufreq: Manage only online cpus

2012-12-15 Thread Viresh Kumar
cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.

We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1f93dbd..de99517 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -970,6 +970,13 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
pr_debug("initialization failed\n");
goto err_unlock_policy;
}
+
+   /*
+* affected cpus must always be the one, which are online. We aren't
+* managing offline cpus here.
+*/
+   cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
+
policy->user_policy.min = policy->min;
policy->user_policy.max = policy->max;
 
-- 
1.7.12.rc2.18.g61b472e



Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner  wrote:
> On Sun, Dec 16, 2012 at 03:35:49AM +, Eric Wong wrote:
> > Dave Chinner  wrote:
> > > On Sun, Dec 16, 2012 at 12:25:49AM +, Eric Wong wrote:
> > > > Alan Cox  wrote:
> > > > > On Sat, 15 Dec 2012 00:54:48 +
> > > > > Eric Wong  wrote:
> > > > > 
> > > > > > Applications streaming large files may want to reduce disk spinups and
> > > > > > I/O latency by performing large amounts of readahead up front
> 
> > This could also be a use case for an audio/video player.
> 
> Sure, but this can all be handled by a userspace application. If you
> want to avoid/batch IO to enable longer spindown times, then you
> have to load the file into RAM somewhere, and you don't need special
> kernel support for that.

From userspace, I don't know when/if I'm caching too much and possibly
getting the userspace cache itself swapped out.

> > So no, there's no difference that matters between the approaches.
> > But I think doing this in the kernel is easier for userspace users.
> 
> The kernel provides mechanisms for applications to use. You have not
> mentioned anything new that requires a new kernel mechanism to
> achieve - you just need to have the knowledge to put the pieces
> together properly.  People have been solving this same problem for
> the last 20 years without needing to tweak fadvise(). Or even having
> an fadvise() syscall...

fadvise() is fairly new, and AFAIK few apps use it.  Perhaps if it
were improved, more people would use it and not have to reinvent
the wheel.

> Nothing about low latency IO or streaming IO is simple or easy, and
> changing how readahead works doesn't change that fact. All it does
> is change the behaviour of every other application that uses
> fadvise() to minimise IO latency

I don't want to introduce regressions, either.

Perhaps if part of the FADV_WILLNEED read-ahead were handled
synchronously (maybe 2M?) and humongous large readaheads (like mine)
went to the background, that would be a good trade off?
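
The trade-off proposed here amounts to splitting one request into a bounded synchronous part and a background remainder. A minimal sketch of that split, assuming the 2M figure floated above is used as the cut-off; `SYNC_RA_MAX`, `struct ra_split` and `split_readahead` are hypothetical names and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SYNC_RA_MAX ((size_t)2 << 20)	/* 2 MiB handled in syscall context */

struct ra_split {
	size_t sync_len;	/* read ahead before the syscall returns */
	size_t async_len;	/* handed off to background work */
};

/* Split a WILLNEED request into its synchronous and background parts. */
struct ra_split split_readahead(size_t total_len)
{
	struct ra_split s;

	s.sync_len = total_len < SYNC_RA_MAX ? total_len : SYNC_RA_MAX;
	s.async_len = total_len - s.sync_len;
	assert(s.sync_len + s.async_len == total_len);
	return s;
}
```

Small requests would then behave as before, and only the part beyond the cut-off would be deferred.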


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner  wrote:
> On Sun, Dec 16, 2012 at 03:59:53AM +, Eric Wong wrote:
> > I want the first read() to happen sooner than it would under current
> > fadvise. 
> 
> You're not listening.  You do not need the kernel to be modified to
> avoid the latency of issuing 1GB of readahead on a file.
> 
> You don't need to do readahead before the first read. Nor do you
> need to wait for 1GB of readahead to be issued before you do the
> first read.
> 
> You could do readahead *concurrently* with the first read, so the
> first read only blocks until the readahead of the first part of the
> file completes.  i.e. just do readahead() in a background thread and
> don't wait for it to complete before doing the first read.

What you describe with concurrent readahead() is _exactly_ what my test
program (in other email) does with the RA environment variable set.

I know I do not _need_ fadvise + background WILLNEED support in the
kernel.

But I think the kernel can make life easier and allow us to avoid doing
background threads or writing our own (inferior) caching in userspace.


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 6:09 PM, Yinghai Lu  wrote:
> On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin  wrote:
>> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>>
>>> BTW, did you look at smp boot problem with early_level4_pgt version?
>>
>>
>> No, I have been busy with non-Linux stuff today.
>>
>
> ok, i sorted it out. I will split it to small pieces and post them.

I updated for-x86-boot branch with it, and it is based on
linus:master
tip:x86/mm
tip:x86/urgent
tip:x86/mm2.

Also attached are the 7 new ones that were just added to that branch.

Thanks

Yinghai


0003-x86-call-copy_bootdata-early.patch
Description: Binary data


0004-x86-mm-add-early-kernel-mapping-in-c.patch
Description: Binary data


0005-x86-realmode-use-init_level4_pgt-to-set-trapmoline_p.patch
Description: Binary data


0006-x86-mm-increase-BRK-area-for-early-page-table.patch
Description: Binary data


0007-x86-64bit-early-PF-handler-set-page-table.patch
Description: Binary data


0008-x86-64bit-PF-handler-set-page-to-cover-2M-only.patch
Description: Binary data


0009-x86-64bit-Print-init-kernel-lowmap-correctly.patch
Description: Binary data


Re: [ANNOUNCE] Multiple run-queues for BFS

2012-12-15 Thread Hillf Danton
On Sun, Dec 16, 2012 at 1:16 AM, Matthias Kohler
 wrote:
> I'm doing a CPU-Scheduler based on BFS by Con Kolivas with support for
> multiple run-queues. BFS in itself uses only one run-queue for all
> CPU's. This avoids the load-balancing overhead, but does not scale well.
> One run-queue per CPU does scale well, but then the scheduler has
> load-balancing overhead. The scheduler I'm developing supports every
> possible run-queues configuration. You can have one single run-queue
> like in BFS, or you can have one run-queue per CPU, or something
> completely different like one run-queue every two CPU's. This, in theory
> would  allow the scheduler to be fine-tuned to the hardware and the
> workload.
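
The run-queue layouts described in the quoted announcement can be pictured as a CPU-to-run-queue mapping. The sketch below assumes contiguous, equally sized groups of CPUs per run-queue, which is only one possible layout; `rq_index` and `cpus_per_rq` are hypothetical and not taken from the scheduler being announced.

```c
#include <assert.h>

/*
 * Index of the run-queue serving 'cpu' when every run-queue is shared
 * by 'cpus_per_rq' consecutive CPUs.
 */
unsigned int rq_index(unsigned int cpu, unsigned int cpus_per_rq)
{
	assert(cpus_per_rq > 0);
	return cpu / cpus_per_rq;
}
```

On a 4-CPU machine, `cpus_per_rq = 4` gives the single BFS-style run-queue, `cpus_per_rq = 1` gives one run-queue per CPU, and `cpus_per_rq = 2` gives one run-queue for every two CPUs.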

Cannot see the reason to install wings on horse back.
Is it developed to schedule apps on advanced servers?
Notebook, or smart phone?

Good Weekend
Hillf


Re: [PATCH] make CONFIG_EXPERIMENTAL invisible and default

2012-12-15 Thread Jan Engelhardt

On Wednesday 2012-10-03 18:17, Greg Kroah-Hartman wrote:
>> 
>> OK, I will bite...  How should I flag an option that is initially only
>> intended for those willing to take some level of risk?
>
>In the text say "You really don't want to enable this option, use at
>your own risk!"  Or something like that :)

You know that won't work, just like "everybody is encouraged
to upgrade" for -stable. It needs to say "All users must disable this!"


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Dave Chinner
On Sun, Dec 16, 2012 at 03:59:53AM +, Eric Wong wrote:
> Dave Chinner  wrote:
> > On Sun, Dec 16, 2012 at 03:04:42AM +, Eric Wong wrote:
> > > Dave Chinner  wrote:
> > > > On Sat, Dec 15, 2012 at 12:54:48AM +, Eric Wong wrote:
> > > > > 
> > > > >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> > > > >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>
> > > > 
> > > > You've basically asked fadvise() to readahead the entire file if it
> > > > can. That means it is likely to issue enough readahead to fill the
> > > > IO queue, and that's where all the latency is coming from. If all
> > > > you are trying to do is reduce the latency of the first read, then
> > > > only readahead the initial range that you are going to need to read...
> > > 
> > > Yes, I do want to read the whole file, eventually.  So I want to put
> > > the file into the page cache ASAP and allow the disk to spin down.
> > 
> > Issuing readahead is not going to speed up the first read. Either
> > you will spend more time issuing all the readahead, or you block
> > waiting for the first read to complete. And the way you are issuing
> > readahead does not guarantee the entire file is brought into the
> > page cache
> 
> I'm not relying on readahead to speed up the first read.
> 
> By using fadvise/readahead, I want a _best-effort_ attempt to
> keep the file in cache.
> 
> > > But I also want the first read() to be fast.
> > 
> > You can't have a pony, sorry.
> 
> I want the first read() to happen sooner than it would under current
> fadvise. 

You're not listening.  You do not need the kernel to be modified to
avoid the latency of issuing 1GB of readahead on a file.

You don't need to do readahead before the first read. Nor do you
need to wait for 1GB of readahead to be issued before you do the
first read.

You could do readahead *concurrently* with the first read, so the
first read only blocks until the readahead of the first part of the
file completes.  i.e. just do readahead() in a background thread and
don't wait for it to complete before doing the first read.

You could even do readahead *after* the first read, when the time it
takes *doesn't matter* to the processing of the incoming data...
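
The last variant, issuing readahead only after the first read, needs no thread at all. A minimal userspace sketch, assuming a POSIX system that provides `pread()` and `posix_fadvise()`; the function names `read_first_then_advise` and `demo_on_tmpfile` are hypothetical, and the advice call is treated as best-effort, so its return value is deliberately ignored.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the first 'first_len' bytes, then ask the kernel to read ahead
 * the rest of the file. Returns the byte count from pread(), or -1.
 */
ssize_t read_first_then_advise(int fd, void *buf, size_t first_len)
{
	ssize_t n;

	assert(buf != NULL);
	n = pread(fd, buf, first_len, 0);
	if (n < 0)
		return n;
	/* A length of 0 means "up to the end of the file". */
	(void)posix_fadvise(fd, (off_t)n, 0, POSIX_FADV_WILLNEED);
	return n;
}

/* Usage helper: exercise the function on an 8 KiB scratch file. */
ssize_t demo_on_tmpfile(void)
{
	char data[8192], buf[4096];
	FILE *f = tmpfile();
	ssize_t n;

	if (f == NULL)
		return -1;
	memset(data, 'x', sizeof(data));
	if (fwrite(data, 1, sizeof(data), f) != sizeof(data) || fflush(f) != 0) {
		fclose(f);
		return -1;
	}
	n = read_first_then_advise(fileno(f), buf, sizeof(buf));
	fclose(f);
	return n;
}
```

The first read pays only its own I/O latency; whatever time the advice call takes is spent after the application already has data to process.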

> I want "less-bad" initial latency than I was getting.

And you can do that by changing how you issue readahead from
userspace.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Dave Chinner
On Sun, Dec 16, 2012 at 03:35:49AM +, Eric Wong wrote:
> Dave Chinner  wrote:
> > On Sun, Dec 16, 2012 at 12:25:49AM +, Eric Wong wrote:
> > > Alan Cox  wrote:
> > > > On Sat, 15 Dec 2012 00:54:48 +
> > > > Eric Wong  wrote:
> > > > 
> > > > > Applications streaming large files may want to reduce disk spinups and
> > > > > I/O latency by performing large amounts of readahead up front

> This could also be a use case for an audio/video player.

Sure, but this can all be handled by a userspace application. If you
want to avoid/batch IO to enable longer spindown times, then you
have to load the file into RAM somewhere, and you don't need special
kernel support for that.

> So no, there's no difference that matters between the approaches.
> But I think doing this in the kernel is easier for userspace users.

The kernel provides mechanisms for applications to use. You have not
mentioned anything new that requires a new kernel mechanism to
achieve - you just need to have the knowledge to put the pieces
together properly.  People have been solving this same problem for
the last 20 years without needing to tweak fadvise(). Or even having
an fadvise() syscall...

Nothing about low latency IO or streaming IO is simple or easy, and
changing how readahead works doesn't change that fact. All it does
is change the behaviour of every other application that uses
fadvise() to minimise IO latency

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner  wrote:
> On Sun, Dec 16, 2012 at 03:04:42AM +, Eric Wong wrote:
> > Dave Chinner  wrote:
> > > On Sat, Dec 15, 2012 at 12:54:48AM +, Eric Wong wrote:
> > > > 
> > > >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> > > >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>
> > > 
> > > You've basically asked fadvise() to readahead the entire file if it
> > > can. That means it is likely to issue enough readahead to fill the
> > > IO queue, and that's where all the latency is coming from. If all
> > > you are trying to do is reduce the latency of the first read, then
> > > only readahead the initial range that you are going to need to read...
> > 
> > Yes, I do want to read the whole file, eventually.  So I want to put
> > the file into the page cache ASAP and allow the disk to spin down.
> 
> Issuing readahead is not going to speed up the first read. Either
> you will spend more time issuing all the readahead, or you block
> waiting for the first read to complete. And the way you are issuing
> readahead does not guarantee the entire file is brought into the
> page cache

I'm not relying on readahead to speed up the first read.

By using fadvise/readahead, I want a _best-effort_ attempt to
keep the file in cache.

> > But I also want the first read() to be fast.
> 
> You can't have a pony, sorry.

I want the first read() to happen sooner than it would under current
fadvise.  If it's slightly slower than w/o fadvise, that's fine.
The 1-2s slower with current fadvise is what bothers me.

> > > Also, Pushing readahead off to a workqueue potentially allows
> > > someone to DOS the system because readahead won't ever get throttled
> > > in the syscall context...
> > 
> > Yes, I'm a little worried about this, too.
> > Perhaps squashing something like the following will work?
> > 
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 56a80a9..51dc58e 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -246,16 +246,18 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >  {
> > struct wq_ra_req *req;
> >  
> > +   nr_to_read = max_sane_readahead(nr_to_read);
> > +   if (!nr_to_read)
> > +   goto skip_ra;
> 
> You do realise that anything you read ahead will be accounted as
> inactive pages, so nr_to_read doesn't decrease at all as you fill
> memory with readahead pages...

Ah, ok, I'll see if I can rework it.

> > req = kzalloc(sizeof(*req), GFP_ATOMIC);
> 
> GFP_ATOMIC? Really?

Sorry, I'm really new at this.

> In reality, I think you are looking in the wrong place to fix your
> "first read" latency problem. No matter what you do, there is going
> to be IO latency on the first read. And readahead doesn't guarantee
> that the pages are brought into the page cache (ever heard of
> readahead thrashing?) so the way you are doing your readahead is not
> going to result in you being able to spin the disk down after
> issuing a readahead command...

Right, I want a _best-effort_ readahead (which seems to be what an
advisory interface should offer).

> You've really got two problems - minimal initial latency, and
> reading the file quickly and pinning it in memory until you get
> around to needing it. The first can't be made faster by using
> readahead, and the second can not be guaranteed by using readahead.

Agreed.  I think I overstated the requirements.

I want "less-bad" initial latency than I was getting.

So I don't mind if open()+fadvise()+read() is a couple of milliseconds
slower than just open()+read(), but I do mind if fadvise() takes 1-2
seconds.

> IOWs, readahead is the wrong tool for solving your problems. Minimal
> IO latency from the first read will come from just issuing pread()
> after open(), and ensuring that the file is read quickly and pinned
> in memory can really only be done by allocating RAM in the
> application to hold it until it is needed

I definitely only want a best-effort method to put a file into memory.
I want the kernel to decide whether or not to cache it.

Thanks for looking at this!


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Dave Chinner
On Sun, Dec 16, 2012 at 03:04:42AM +, Eric Wong wrote:
> Dave Chinner  wrote:
> > On Sat, Dec 15, 2012 at 12:54:48AM +, Eric Wong wrote:
> > > Applications streaming large files may want to reduce disk spinups and
> > > I/O latency by performing large amounts of readahead up front.
> > > Applications also tend to read files soon after opening them, so waiting
> > > on a slow fadvise may cause unpleasant latency when the application
> > > starts reading the file.
> > > 
> > > As a userspace hacker, I'm sometimes tempted to create a background
> > > thread in my app to run readahead().  However, I believe doing this
> > > in the kernel will make life easier for other userspace hackers.
> > > 
> > > Since fadvise makes no guarantees about when (or even if) readahead
> > > is performed, this change should not hurt existing applications.
> > > 
> > > "strace -T" timing on an uncached, one gigabyte file:
> > > 
> > >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> > >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>
> > 
> > You've basically asked fadvise() to readahead the entire file if it
> > can. That means it is likely to issue enough readahead to fill the
> > IO queue, and that's where all the latency is coming from. If all
> > you are trying to do is reduce the latency of the first read, then
> > only readahead the initial range that you are going to need to read...
> 
> Yes, I do want to read the whole file, eventually.  So I want to put
> the file into the page cache ASAP and allow the disk to spin down.

Issuing readahead is not going to speed up the first read. Either
you will spend more time issuing all the readahead, or you block
waiting for the first read to complete. And the way you are issuing
readahead does not guarantee the entire file is brought into the
page cache

> But I also want the first read() to be fast.

You can't have a pony, sorry.

> > Also, Pushing readahead off to a workqueue potentially allows
> > someone to DOS the system because readahead won't ever get throttled
> > in the syscall context...
> 
> Yes, I'm a little worried about this, too.
> Perhaps squashing something like the following will work?
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 56a80a9..51dc58e 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -246,16 +246,18 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  {
>   struct wq_ra_req *req;
>  
> + nr_to_read = max_sane_readahead(nr_to_read);
> + if (!nr_to_read)
> + goto skip_ra;

You do realise that anything you read ahead will be accounted as
inactive pages, so nr_to_read doesn't decrease at all as you fill
memory with readahead pages...

> +
>   req = kzalloc(sizeof(*req), GFP_ATOMIC);

GFP_ATOMIC? Really?

In reality, I think you are looking in the wrong place to fix your
"first read" latency problem. No matter what you do, there is going
to be IO latency on the first read. And readahead doesn't guarantee
that the pages are brought into the page cache (ever heard of
readahead thrashing?) so the way you are doing your readahead is not
going to result in you being able to spin the disk down after
issuing a readahead command...

You've really got two problems - minimal initial latency, and
reading the file quickly and pinning it in memory until you get
around to needing it. The first can't be made faster by using
readahead, and the second can not be guaranteed by using readahead.

IOWs, readahead is the wrong tool for solving your problems. Minimal
IO latency from the first read will come from just issuing pread()
after open(), and ensuring that the file is read quickly and pinned
in memory can really only be done by allocating RAM in the
application to hold it until it is needed.
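
As a rough userspace sketch of that suggestion (slurp_file is a
hypothetical helper, and it assumes a regular file that fits comfortably
in RAM):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the whole file at `path` into a malloc()ed
 * buffer owned by the caller. The first pread() is issued right after
 * open(), so nothing else is queued ahead of it; the application's own
 * buffer (not the page cache) then holds the data until it is needed.
 * Returns the buffer and stores its size in *len, or NULL on error.
 */
static char *slurp_file(const char *path, size_t *len)
{
	struct stat st;
	char *buf;
	size_t off = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) != 0 || st.st_size < 0) {
		close(fd);
		return NULL;
	}

	/* +1 so that an empty file still gets a valid, non-NULL buffer */
	buf = malloc((size_t)st.st_size + 1);
	if (!buf) {
		close(fd);
		return NULL;
	}

	while (off < (size_t)st.st_size) {
		ssize_t r = pread(fd, buf + off, (size_t)st.st_size - off,
				  (off_t)off);

		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0)
			break;	/* error or unexpected EOF */
		off += (size_t)r;
	}
	close(fd);

	if (off != (size_t)st.st_size) {
		free(buf);
		return NULL;
	}
	*len = off;
	return buf;
}
```

The caller frees the buffer when it is done with the data; until then the
disk is not needed again for this file.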

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner  wrote:
> On Sun, Dec 16, 2012 at 12:25:49AM +, Eric Wong wrote:
> > Alan Cox  wrote:
> > > On Sat, 15 Dec 2012 00:54:48 +
> > > Eric Wong  wrote:
> > > 
> > > > Applications streaming large files may want to reduce disk spinups and
> > > > I/O latency by performing large amounts of readahead up front
> > > 
> > > How does it compare benchmark wise with a user thread or using the
> > > readahead() call ?
> > 
> > Very well.
> > 
> > My main concern is for the speed of the initial pread()/read() call
> > after open().
> > 
> > Setting EARLY_EXIT means my test program _exit()s immediately after the
> > first pread().  In my test program (below), I wait for the background
> > thread to become ready before open() so I would not take overhead from
> > pthread_create() into account.
> > 
> > RA=1 uses a pthread + readahead()
> > Not setting RA uses fadvise (with my patch)
> 
> And if you don't use fadvise/readahead at all?

Sorry for the confusion.  I believe my other reply to you summarized
what I wanted to say in my commit message and also reply to Alan.

I want all the following things:

- I want the first read to be fast.
- I want to read the whole file eventually (probably slowly,
  as processing takes a while).
- I want to let my disk spin down for as long as possible.

This could also be a use case for an audio/video player.

> You're not timing how long the first pread() takes at all. You're
> timing the entire set of operations, including cloning a thread,
> the readahead(2) call, and the messages passed back and forth
> through the eventfd interface to read the entire file.

You're right, I screwed up the measurement.  Using clock_gettime(),
there's hardly a difference between the approaches and I can't
get consistent timings between them.

So no, there's no difference that matters between the approaches.
But I think doing this in the kernel is easier for userspace users.

-- 8<
/* gcc -O2 -Wall -lpthread -lrt -o first_read first_read.c */
#define _GNU_SOURCE
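#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>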

static int efd1;
static int efd2;

static void clock_diff(struct timespec *a, const struct timespec *b)
{
	a->tv_sec -= b->tv_sec;
	a->tv_nsec -= b->tv_nsec;
	if (a->tv_nsec < 0) {
		/* borrow one second, expressed in nanoseconds */
		--a->tv_sec;
		a->tv_nsec += 1000000000;
	}
}

static void *start_ra(void *unused)
{
	struct stat st;
	eventfd_t val;
	int fd;

	/* tell parent to open() */
	assert(eventfd_write(efd1, 1) == 0);

	/* wait for parent to tell us fd is ready */
	assert(eventfd_read(efd2, &val) == 0);
	fd = (int)val;

	assert(fstat(fd, &st) == 0);
	assert(readahead(fd, 0, st.st_size) == 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	char buf[16384];
	pthread_t thr;
	int fd;
	struct timespec start;
	struct timespec finish;
	char *do_ra = getenv("RA");

	if (argc != 2) {
		fprintf(stderr, "Usage: strace -T %s LARGE_FILE\n", argv[0]);
		return 1;
	}

	if (do_ra) {
		eventfd_t val;
		efd1 = eventfd(0, 0);
		efd2 = eventfd(0, 0);
		assert(efd1 >= 0 && efd2 >= 0 && "eventfd failed");
		assert(pthread_create(&thr, NULL, start_ra, NULL) == 0);

		/* wait for child thread to spawn */
		assert(eventfd_read(efd1, &val) == 0);
	}

	fd = open(argv[1], O_RDONLY);
	assert(fd >= 0 && "open failed");

	assert(clock_gettime(CLOCK_MONOTONIC, &start) == 0);

	if (do_ra) {
		/* wake up the child thread, give it a chance to run */
		assert(eventfd_write(efd2, fd) == 0);
		sched_yield();
	} else
		assert(posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) == 0);

	assert(pread(fd, buf, sizeof(buf), 0) == sizeof(buf));
	assert(clock_gettime(CLOCK_MONOTONIC, &finish) == 0);
	clock_diff(&finish, &start);
	fprintf(stderr, "elapsed: %lu.%09lu\n", finish.tv_sec, finish.tv_nsec);

	if (getenv("FULL_READ")) {
		ssize_t r;
		do {
			r = read(fd, buf, sizeof(buf));
		} while (r > 0);
		assert(r == 0 && "EOF not reached");
	}

	if (getenv("EARLY_EXIT"))
		_exit(0);

	if (do_ra) {
		assert(pthread_join(thr, NULL) == 0);
		assert(close(efd1) == 0);
		assert(close(efd2) == 0);
	}

	assert(close(fd) == 0);

	return 0;
}

Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Eric Wong  wrote:
> Perhaps squashing something like the following will work?

Last hunk should've had a return before skip_ra:

--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -264,6 +266,10 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	req->nr_to_read = nr_to_read;
 
 	queue_work(readahead_wq, &req->work);
+
+	return;
+skip_ra:
+	fput(filp);
 }
 
 /*
-- 
Eric Wong


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Dave Chinner  wrote:
> On Sat, Dec 15, 2012 at 12:54:48AM +, Eric Wong wrote:
> > Applications streaming large files may want to reduce disk spinups and
> > I/O latency by performing large amounts of readahead up front.
> > Applications also tend to read files soon after opening them, so waiting
> > on a slow fadvise may cause unpleasant latency when the application
> > starts reading the file.
> > 
> > As a userspace hacker, I'm sometimes tempted to create a background
> > thread in my app to run readahead().  However, I believe doing this
> > in the kernel will make life easier for other userspace hackers.
> > 
> > Since fadvise makes no guarantees about when (or even if) readahead
> > is performed, this change should not hurt existing applications.
> > 
> > "strace -T" timing on an uncached, one gigabyte file:
> > 
> >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>
> 
> You've basically asked fadvise() to readahead the entire file if it
> can. That means it is likely to issue enough readahead to fill the
> IO queue, and that's where all the latency is coming from. If all
> you are trying to do is reduce the latency of the first read, then
> only readahead the initial range that you are going to need to read...

Yes, I do want to read the whole file, eventually.  So I want to put
the file into the page cache ASAP and allow the disk to spin down.
But I also want the first read() to be fast.

> Also, pushing readahead off to a workqueue potentially allows
> someone to DOS the system because readahead won't ever get throttled
> in the syscall context...

Yes, I'm a little worried about this, too.
Perhaps squashing something like the following will work?

diff --git a/mm/readahead.c b/mm/readahead.c
index 56a80a9..51dc58e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -246,16 +246,18 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
 {
 	struct wq_ra_req *req;
 
+	nr_to_read = max_sane_readahead(nr_to_read);
+	if (!nr_to_read)
+		goto skip_ra;
+
 	req = kzalloc(sizeof(*req), GFP_ATOMIC);
 
 	/*
 	 * we are fire-and-forget, not having enough memory means readahead
 	 * is not worth doing anyways
 	 */
-	if (!req) {
-		fput(filp);
-		return;
-	}
+	if (!req)
+		goto skip_ra;
 
 	INIT_WORK(&req->work, wq_ra_req_fn);
 	req->mapping = mapping;
@@ -264,6 +266,9 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	req->nr_to_read = nr_to_read;
 
 	queue_work(readahead_wq, &req->work);
+
+skip_ra:
+	fput(filp);
 }
 
 /*


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Dave Chinner
On Sun, Dec 16, 2012 at 12:25:49AM +, Eric Wong wrote:
> Alan Cox  wrote:
> > On Sat, 15 Dec 2012 00:54:48 +
> > Eric Wong  wrote:
> > 
> > > Applications streaming large files may want to reduce disk spinups and
> > > I/O latency by performing large amounts of readahead up front
> > 
> > How does it compare benchmark wise with a user thread or using the
> > readahead() call ?
> 
> Very well.
> 
> My main concern is for the speed of the initial pread()/read() call
> after open().
> 
> Setting EARLY_EXIT means my test program _exit()s immediately after the
> first pread().  In my test program (below), I wait for the background
> thread to become ready before open() so I would not take overhead from
> pthread_create() into account.
> 
> RA=1 uses a pthread + readahead()
> Not setting RA uses fadvise (with my patch)

And if you don't use fadvise/readahead at all?

> # readahead + pthread.
> $ EARLY_EXIT=1 RA=1 time  ./first_read 1G
> 0.00user 0.05system 0:01.37elapsed 3%CPU (0avgtext+0avgdata 600maxresident)k
> 0inputs+0outputs (1major+187minor)pagefaults 0swaps
> 
> # patched fadvise
> $ EARLY_EXIT=1 time ./first_read 1G
> 0.00user 0.00system 0:00.01elapsed 0%CPU (0avgtext+0avgdata 564maxresident)k
> 0inputs+0outputs (1major+178minor)pagefaults 0swaps

You're not timing how long the first pread() takes at all. You're
timing the entire set of operations, including cloning a thread,
the readahead(2) call, and the messages passed back and forth
through the eventfd interface to read the entire file.

Why even bother with another thread for readahead()? It implements
*exactly* the same operation as fadvise(WILLNEED) (i.e.
force_page_cache_readahead), so should perform identically when
called in exactly the same manner...

But again, you are interested in the latency of the first read of
16k from the file, but you are asking to readahead 1GB of data.
Perhaps you should be asking for readahead of something more
appropriate to what you care about - the first read.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH 1/1] ARM: OMAP2+: common: remove use of vram

2012-12-15 Thread Javier Martinez Canillas
Commit 966458f ("OMAP: remove vram allocator") removed the OMAP-specific
vram allocator, but the OMAP2 common code was still trying to use it,
which led to the following build error:

CC  arch/arm/mach-omap2/common.o
arch/arm/mach-omap2/common.c:19:23: fatal error: plat/vram.h: No such file or directory
compilation terminated.
make[1]: *** [arch/arm/mach-omap2/common.o] Error 1
make: *** [arch/arm/mach-omap2] Error 2

Signed-off-by: Javier Martinez Canillas 
---
 arch/arm/mach-omap2/common.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-omap2/common.c b/arch/arm/mach-omap2/common.c
index 5c2fd48..2dabb9e 100644
--- a/arch/arm/mach-omap2/common.c
+++ b/arch/arm/mach-omap2/common.c
@@ -16,8 +16,6 @@
 #include 
 #include 
 
-#include <plat/vram.h>
-
 #include "common.h"
 #include "omap-secure.h"
 
@@ -32,7 +30,6 @@ int __weak omap_secure_ram_reserve_memblock(void)
 
 void __init omap_reserve(void)
 {
-	omap_vram_reserve_sdram_memblock();
 	omap_dsp_reserve_sdram_memblock();
 	omap_secure_ram_reserve_memblock();
 	omap_barrier_reserve_memblock();
-- 
1.7.7.6



Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Dave Chinner
On Sat, Dec 15, 2012 at 12:54:48AM +, Eric Wong wrote:
> Applications streaming large files may want to reduce disk spinups and
> I/O latency by performing large amounts of readahead up front.
> Applications also tend to read files soon after opening them, so waiting
> on a slow fadvise may cause unpleasant latency when the application
> starts reading the file.
> 
> As a userspace hacker, I'm sometimes tempted to create a background
> thread in my app to run readahead().  However, I believe doing this
> in the kernel will make life easier for other userspace hackers.
> 
> Since fadvise makes no guarantees about when (or even if) readahead
> is performed, this change should not hurt existing applications.
> 
> "strace -T" timing on an uncached, one gigabyte file:
> 
>  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
>   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>

You've basically asked fadvise() to readahead the entire file if it
can. That means it is likely to issue enough readahead to fill the
IO queue, and that's where all the latency is coming from. If all
you are trying to do is reduce the latency of the first read, then
only readahead the initial range that you are going to need to read...
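
A minimal sketch of that, assuming the caller knows how many bytes its
first read needs; read_head is a hypothetical helper name, and the
fadvise hint is treated as best-effort:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that only the first `first_len` bytes will be
 * needed soon, then read them. Asking for a bounded range keeps the amount
 * of readahead queued ahead of the first read small.
 * Returns the number of bytes read, or -1 on error (as pread() does).
 */
static ssize_t read_head(int fd, void *buf, size_t first_len)
{
	/*
	 * The hint is only advice: if it is rejected, the read below still
	 * works, so the return value is deliberately ignored.
	 */
	(void)posix_fadvise(fd, 0, (off_t)first_len, POSIX_FADV_WILLNEED);

	return pread(fd, buf, first_len, 0);
}
```

With a 16k buffer this would be called as read_head(fd, buf, sizeof(buf))
right after open().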

Also, pushing readahead off to a workqueue potentially allows
someone to DOS the system because readahead won't ever get throttled
in the syscall context...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] vfs: update atimes over one day in the past or future

2012-12-15 Thread Dave Chinner
[ please place patches inline, not as attachments. ]

On Sat, Dec 15, 2012 at 11:25:23PM +0800, ys wrote:
> From 3d56c131b58a21c05bcd677b9d2ba915abcbf195 Mon Sep 17 00:00:00 2001
> From: yangsheng 
> Date: Sat, 15 Dec 2012 21:46:22 +0800
> Subject: [PATCH] vfs: update atimes over one day in the past or future
> 
> Relatime should update the inode atime if it is more than one day
> in the future.  The original problem seen was a tarball that had
> a bad atime in the distant future, but could also happen if someone
> fat-fingers a "touch".  The future atime will never be fixed.
> 
> Without relatime enabled, a future atime is updated to the current
> kernel time on access.  Relatime is meant to reduce the frequency
> of atime updates, not to decide whether the system clock or the
> inode timestamp is correct.
> 
> Signed-off-by: Yang Sheng 
> Signed-off-by: Andreas Dilger 
> Acked-by: David Chinner 

No I didn't. Please don't add tags that someone has not added
directly in a reply to the original patch.

> CC: sta...@vger.kernel.org
> ---
>  fs/inode.c | 7 ---
>  1 ??? 4 ???(+)? 3 ???(-)

There's something wrong with the character encoding you are using...

> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 14084b7..8713dc8 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1488,10 +1488,11 @@ static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
>   return 1;
>  
>   /*
> -  * Is the previous atime value older than a day? If yes,
> -  * update atime:
> +  * Update atime if it's older than a day or more than a day
> +  * in the future, which we assume is corrupt.
> +  *   

A time in the future is not a corruption - the comment should
reflect exactly what you've put in the commit message. i.e. that
relatime is for reducing updates, not preventing atime from ever
moving backwards.
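
For illustration only, the decision the commit message describes can be
modelled in userspace like this. relatime_wants_update and ONE_DAY_SECS
are hypothetical names, and this is a sketch of the described behaviour,
not the kernel's relatime_need_update():

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define ONE_DAY_SECS (24 * 60 * 60)

/*
 * Userspace model (hypothetical, not the kernel function) of the relatime
 * decision: update atime when it is not newer than mtime/ctime, when it is
 * at least a day old, or when it lies more than a day in the future.
 */
static bool relatime_wants_update(time_t atime, time_t mod_time,
				  time_t chg_time, time_t now)
{
	if (mod_time >= atime || chg_time >= atime)
		return true;	/* file changed since it was last accessed */
	if (now - atime >= ONE_DAY_SECS)
		return true;	/* atime is at least a day old */
	if (atime - now > ONE_DAY_SECS)
		return true;	/* atime is more than a day in the future */
	return false;
}
```

The third test is the only new one: it limits how long a future atime can
suppress updates, without saying anything about which clock is right.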

Also, you've added an extra line of whitespace damage that doesn't
need to be there.

FWIW, could you write a test for xfstests for this behaviour
so we can confirm that we don't break it in future?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: resend--[PATCH] improve read ahead in kernel

2012-12-15 Thread Eric Wong
xtu4  wrote:
> resend it, due to format error
> 
> Subject: [PATCH] when system in low memory scenario, imagine there is an mp3
>  play, or a video play, we need to read mp3 or video file
>  from memory to page cache, but when system lacks memory,
>  page cache of mp3 or video file will be reclaimed. Once read
>  in memory, then reclaimed, it will cause audio or video
>  glitch, and it will increase the io operation at the same
>  time.

To me, this basically describes how POSIX_FADV_NOREUSE should work.
I would like to have this ability via fadvise (and not CONFIG_).

Also, I think your patch has too many #ifdefs to be accepted.


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin  wrote:
> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>>
>> BTW, did you look at smp boot problem with early_level4_pgt version?
>
>
> No, I have been busy with non-Linux stuff today.
>

OK, I sorted it out. I will split it into small pieces and post them.


Re: /usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory

2012-12-15 Thread Jeff Chua
On Sun, Dec 16, 2012 at 9:53 AM, Al Viro  wrote:
> On Sun, Dec 16, 2012 at 09:39:01AM +0800, Jeff Chua wrote:
>> On Sun, Dec 16, 2012 at 9:28 AM, Al Viro  wrote:
>> > On Sun, Dec 16, 2012 at 09:23:38AM +0800, Jeff Chua wrote:
>> >> How should the symbolic links be setup to compile the latest kernel?
>> >>
>> >>
>> >> Currently I had these links and kernels compiled fine until 2 days ago.
>> >>
>> >>  asm -> /usr/src/linux/include/uapi/asm-generic/
>> >>  asm-generic -> /usr/src/linux/include/uapi/asm-generic
>> >>  linux -> /usr/src/linux/include/uapi/linux
>> >
>> > What symlinks?  /usr/include/* should not contain any symlinks into
>> > the kernel source.  At all.
>>
>> Al,
>>
>> Oh, perhaps I'm having the right setup. Where should I get the kernel
>> headers.
>
> From your libc.  Which ought to have its own copies, normally coming from
> make headers_install in kernel source.  And yes, it had been that way
> for many years by now.  Userland should *not* blindly grab the kernel
> headers.
>
> Incidentally, your 'asm' is obviously bogus - the headers that should end
> up there ought to come from arch/<arch>/include/uapi/asm (and _not_
> by pointing a symlink to it); yours points to the place where asm-generic
> ones ought to have been copied from.

Al,

Thanks for the pointers. Will try what you suggested :)

Merry Christmas.

Jeff


Re: /usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory

2012-12-15 Thread Al Viro
On Sun, Dec 16, 2012 at 09:39:01AM +0800, Jeff Chua wrote:
> On Sun, Dec 16, 2012 at 9:28 AM, Al Viro  wrote:
> > On Sun, Dec 16, 2012 at 09:23:38AM +0800, Jeff Chua wrote:
> >> How should the symbolic links be setup to compile the latest kernel?
> >>
> >>
> >> Currently I had these links and kernels compiled fine until 2 days ago.
> >>
> >>  asm -> /usr/src/linux/include/uapi/asm-generic/
> >>  asm-generic -> /usr/src/linux/include/uapi/asm-generic
> >>  linux -> /usr/src/linux/include/uapi/linux
> >
> > What symlinks?  /usr/include/* should not contain any symlinks into
> > the kernel source.  At all.
> 
> Al,
> 
> Oh, perhaps I'm having the right setup. Where should I get the kernel
> headers.

From your libc.  Which ought to have its own copies, normally coming from
make headers_install in kernel source.  And yes, it had been that way
for many years by now.  Userland should *not* blindly grab the kernel
headers.

Incidentally, your 'asm' is obviously bogus - the headers that should end
up there ought to come from arch/<arch>/include/uapi/asm (and _not_
by pointing a symlink to it); yours points to the place where asm-generic
ones ought to have been copied from.


Re: [PATCH 21/21] documentation: drop vmtruncate

2012-12-15 Thread Rob Landley

On 12/15/2012 05:00:38 AM, Marco Stornelli wrote:

Removed vmtruncate

Signed-off-by: Marco Stornelli 


Acked-by: Rob Landley 

(I can't help thinking there should have been some sort of  
feature-removal-schedule entry for this. Is there any sort of trailing  
record of major stuff that happened and when? The kernelnewbies  
http://kernelnewbies.org/LinuxVersions page is the best I've found, but  
it's a bit clumsy to use as a reference to find which version a change  
happened in. The https://lwn.net/Articles/2.6-kernel-api/ page was  
great but it stalled in 2009. Maybe I just miss kernel-traffic...)


Rob


Re: /usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory

2012-12-15 Thread Jeff Chua
On Sun, Dec 16, 2012 at 9:39 AM, Jeff Chua  wrote:
> On Sun, Dec 16, 2012 at 9:28 AM, Al Viro  wrote:
>> On Sun, Dec 16, 2012 at 09:23:38AM +0800, Jeff Chua wrote:
>>> How should the symbolic links be setup to compile the latest kernel?
>>>
>>>
>>> Currently I had these links and kernels compiled fine until 2 days ago.
>>>
>>>  asm -> /usr/src/linux/include/uapi/asm-generic/
>>>  asm-generic -> /usr/src/linux/include/uapi/asm-generic
>>>  linux -> /usr/src/linux/include/uapi/linux
>>
>> What symlinks?  /usr/include/* should not contain any symlinks into
>> the kernel source.  At all.
>
> Al,
>
> Oh, perhaps I'm having the right setup. Where should I get the kernel
> headers. After removing the links to the kernel source, here's what I
> got ...
>
> make[1]: Nothing to be done for `all'.
>   HOSTCC  scripts/basic/fixdep
> In file included from /usr/include/bits/posix1_lim.h:160:0,
>  from /usr/include/limits.h:144,
>  from scripts/basic/fixdep.c:114:
> /usr/include/bits/local_lim.h:38:26: fatal error: linux/limits.h: No
> such file or directory
> compilation terminated.

> Oh, perhaps I'm having the right setup. Where should I get the kernel

NOT having the right setup.


Jeff.


Re: /usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory

2012-12-15 Thread Jeff Chua
On Sun, Dec 16, 2012 at 9:28 AM, Al Viro  wrote:
> On Sun, Dec 16, 2012 at 09:23:38AM +0800, Jeff Chua wrote:
>> How should the symbolic links be setup to compile the latest kernel?
>>
>>
>> Currently I had these links and kernels compiled fine until 2 days ago.
>>
>>  asm -> /usr/src/linux/include/uapi/asm-generic/
>>  asm-generic -> /usr/src/linux/include/uapi/asm-generic
>>  linux -> /usr/src/linux/include/uapi/linux
>
> What symlinks?  /usr/include/* should not contain any symlinks into
> the kernel source.  At all.

Al,

Oh, perhaps I'm having the right setup. Where should I get the kernel
headers. After removing the links to the kernel source, here's what I
got ...

make[1]: Nothing to be done for `all'.
  HOSTCC  scripts/basic/fixdep
In file included from /usr/include/bits/posix1_lim.h:160:0,
 from /usr/include/limits.h:144,
 from scripts/basic/fixdep.c:114:
/usr/include/bits/local_lim.h:38:26: fatal error: linux/limits.h: No
such file or directory
compilation terminated.

Thanks,
Jeff.


Re: /usr/include/linux/errno.h:1:23: fatal error: asm/errno.h: No such file or directory

2012-12-15 Thread Al Viro
On Sun, Dec 16, 2012 at 09:23:38AM +0800, Jeff Chua wrote:
> How should the symbolic links be setup to compile the latest kernel?
> 
> 
> Currently I had these links and kernels compiled fine until 2 days ago.
> 
>  asm -> /usr/src/linux/include/uapi/asm-generic/
>  asm-generic -> /usr/src/linux/include/uapi/asm-generic
>  linux -> /usr/src/linux/include/uapi/linux

What symlinks?  /usr/include/* should not contain any symlinks into
the kernel source.  At all.


Re: [GIT PULL] target updates for v3.8-rc1

2012-12-15 Thread Nicholas A. Bellinger
On Sat, 2012-12-15 at 14:28 -0800, Linus Torvalds wrote:
> On Fri, Dec 14, 2012 at 3:53 PM, Nicholas A. Bellinger
>  wrote:
> >
> > Here are the target updates for v3.8-rc1 merge window code.  Please go
> > ahead and pull from:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next
> >
> > Just a heads up that there is a minor merge conflict that you'll
> > encounter in target_handle_task_attr() code, that sfr has been carrying
> > a fix for recently within -next.  After dropping the HEAD section, the
> > resolution should look like:
> 
> Hmm. This is *not* how I resolved that conflict - that seems to drop the new
> 
>  complete(&cmd->t_transport_stop_comp);
> 
> added by Roland in commit 3ea160b3e8f0 ("target: Fix handling of
> aborted commands").

You are most certainly correct.

> 
> So my conflict resolution looks different.
> 
> Which may be a bug, of course. Nicholas, Roland, please check my end result,
> 

Including the complete() from commit 3ea160b3e8f0 in the exception path
for transport_check_aborted_status() within target_execute_cmd() code
was/is the proper merge resolution.

Thank you for taking care of this.

--nab



Re: [Resend][PATCH] PM: Move disabling/enabling runtime PM to late suspend/early resume

2012-12-15 Thread Rafael J. Wysocki
On Saturday, December 15, 2012 10:16:29 PM Jiri Kosina wrote:
> On Sat, 15 Dec 2012, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki 
> > 
> > Currently, the PM core disables runtime PM for all devices right
> > after executing subsystem/driver .suspend() callbacks for them
> > and re-enables it right before executing subsystem/driver .resume()
> > callbacks for them.  This may lead to problems when there are
> > two devices such that the .suspend() callback executed for one of
> > them depends on runtime PM working for the other.  In that case,
> > if runtime PM has already been disabled for the second device,
> > the first one's .suspend() won't work correctly (and analogously
> > for resume).
> > 
> > To make those issues go away, make the PM core disable runtime PM
> > for devices right before executing subsystem/driver .suspend_late()
> > callbacks for them and enable runtime PM for them right after
> > executing subsystem/driver .resume_early() callbacks for them.  This
> > way the potential conflicts between .suspend_late()/.resume_early()
> > and their runtime PM counterparts are still prevented from happening,
> > but the subtle ordering issues related to disabling/enabling runtime
> > PM for devices during system suspend/resume are much easier to avoid.
> > 
> > Reported-and-tested-by: Jan-Matthias Braun 
> > Signed-off-by: Rafael J. Wysocki 
> 
> Hi Rafael,
> 
> just curious what is the reason for resend? Do you want to gather more 
> Acks before pushing this upstream?

Well, I thought that some people might actually look at it when they found it
again in their mailboxes. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [patch 1/8] mm: memcg: only evict file pages when we have plenty

2012-12-15 Thread Simon Jeons

On 12/13/2012 10:55 PM, Michal Hocko wrote:

On Wed 12-12-12 17:28:44, Johannes Weiner wrote:

On Wed, Dec 12, 2012 at 04:53:36PM -0500, Rik van Riel wrote:

On 12/12/2012 04:43 PM, Johannes Weiner wrote:

dc0422c "mm: vmscan: only evict file pages when we have plenty" makes
a point of not going for anonymous memory while there is still enough
inactive cache around.

The check was added only for global reclaim, but it is just as useful
for memory cgroup reclaim.

Signed-off-by: Johannes Weiner 
---
 mm/vmscan.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 157bb11..3874dcb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1671,6 +1671,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		denominator = 1;
 		goto out;
 	}
+	/*
+	 * There is enough inactive page cache, do not reclaim
+	 * anything from the anonymous working set right now.
+	 */
+	if (!inactive_file_is_low(lruvec)) {
+		fraction[0] = 0;
+		fraction[1] = 1;
+		denominator = 1;
+		goto out;
+	}

 	anon  = get_lru_size(lruvec, LRU_ACTIVE_ANON) +
 		get_lru_size(lruvec, LRU_INACTIVE_ANON);
@@ -1688,15 +1698,6 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			fraction[1] = 0;
 			denominator = 1;
 			goto out;
-		} else if (!inactive_file_is_low_global(zone)) {
-			/*
-			 * There is enough inactive page cache, do not
-			 * reclaim anything from the working set right now.
-			 */
-			fraction[0] = 0;
-			fraction[1] = 1;
-			denominator = 1;
-			goto out;
 		}
 	}



I believe the if() block should be moved to AFTER
the check where we make sure we actually have enough
file pages.
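
The ordering concern can be modelled in a few lines of userspace C. This is only a sketch: pick_scan(), struct reclaim_state and its fields are invented for illustration, and the watermark test is assumed to be the existing "file + free <= high watermark" check that get_scan_count() applies to global reclaim.

```c
#include <stdbool.h>

enum scan_choice { SCAN_ANON_ONLY, SCAN_FILE_ONLY, SCAN_BALANCED };

/* Hypothetical snapshot of the inputs get_scan_count() looks at. */
struct reclaim_state {
	bool global_reclaim;		/* false for memcg reclaim */
	unsigned long file;		/* pages on the file LRUs */
	unsigned long free;		/* free pages in the zone */
	unsigned long high_wmark;	/* zone high watermark */
	bool inactive_file_is_low;
};

/* Ordering suggested in review: watermark check first, file-only shortcut second. */
static enum scan_choice pick_scan(const struct reclaim_state *s)
{
	/* Too little file cache left to satisfy reclaim: force anon scanning. */
	if (s->global_reclaim && s->file + s->free <= s->high_wmark)
		return SCAN_ANON_ONLY;

	/* Plenty of inactive cache: leave the anonymous working set alone. */
	if (!s->inactive_file_is_low)
		return SCAN_FILE_ONLY;

	return SCAN_BALANCED;
}
```

With the checks in the opposite order, a zone with almost no file pages but a "not low" inactive file list would still be told to scan file pages only, which is the case the comment above is worried about.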

You are absolutely right, this makes more sense.  Although I'd figure
the impact would be small because if there actually is that little
file cache, it won't be there for long with force-file scanning... :-)

Yes, I think that the result would be worse (more swapping) so the
change can only help.


I moved the condition, but it throws conflicts in the rest of the
series.  Will re-run tests, wait for Michal and Mel, then resend.

Yes the patch makes sense for memcg as well. I guess you have tested
this primarily with memcg. Do you have any numbers? Would be nice to put
them into the changelog if you have (it should help to reduce swapping
with heavy streaming IO load).

Acked-by: Michal Hocko 


Hi Michal,

I still don't understand why "The goto out means that it should be fine 
either way." Could you explain it to me? Sorry for the stupid question. :-)



Regards,
Simon





Re: [PATCH] avoid entropy starvation due to stack protection

2012-12-15 Thread Theodore Ts'o
On Fri, Dec 14, 2012 at 06:36:41PM +0100, Stephan Mueller wrote:
> >> That patch is about one week from a mainline merge, btw.
> > Initially I was also thinking about get_random_int. But stack protection
> > depends on non-predictable numbers to ensure it cannot be defeated. As
> > get_random_int depends on MD5 which is assumed to be broken now, I
> > discarded the idea of using get_random_int.

The original use of get_random_int() was for applications where the
speed impact of using a heavier-weight cryptographic primitive was not
something which could be tolerated.

However, the strength of get_random_int() is actually pretty good.
Note that we never expose the full MD5 hash; we only export the first
32-bits of the hash.   So even if you ignore the effects of:

hash[0] += current->pid + jiffies + get_cycles();

What we effectively have is a deterministic RNG which is using MD5,
where the secret "key" is an initially seeded random value, and the
state counter is the MD5 hash accumulator, where we only expose the
first 32-bits with each turn of the crank.  Now, MD5 has been cracked,
but it's been cracked as a cryptographic checksum --- that is, given a
particular MD5 hash, it is possible to find an input value which will
result in that hash.  That doesn't necessarily mean that it can be
possible to take a stream of numbers produced by using the MD5 core in
this particular RNG configuration, and determine the secret value used
for the RNG (a collision attack allows you to find a possible input
value; that value may not be the one used as the secret).

That being said, it's not a question which has been studied
extensively by cryptographers, and so I can easily see how people
might be paranoid about whether this approach is good enough.

In the case of initializing 16 bytes of randomness passed to userspace
after an exec(), performance is presumably not as important, so if
someone wanted to use something that was stronger from a
certificational point of view than get_random_int(), that's certainly
understandable.  However, it's not clear to me that replicating the
full /dev/random pool infrastructure if you're never going to mix in
any additional randomness is the best way to go about things.

What I would do instead is use an AES-based cryptographic random
number generator.  That is, at boot time, grab enough randomness to
form an AES key, and then use that key to create a cryptographic random
number generator by encrypting a counter with said AES key.  This is a
cryptographic primitive which has been very carefully studied, and for
architectures where you have hardware support for AES (including
ARMv8, Power 7, Sparc T4, as well as x86 processors with the AES-NI
instructions), this will be much faster and require much less memory
and CPU resources than replicating the /dev/urandom infrastructure.
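
As a rough illustration of the "encrypt a counter under a fixed key" structure, here is a minimal userspace sketch.  Everything in it is an assumption made for the example: struct ctr_drng, ctr_drng_init() and ctr_drng_bytes() are invented names, and toy_encrypt() is a stand-in that merely XORs the block with the key so the sketch runs without a crypto library.  It is NOT a cipher; a real generator would plug an AES block encryption in its place.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16

/* Hypothetical hook: encrypt one 16-byte block under key (e.g. AES-128). */
typedef void (*block_encrypt_fn)(const uint8_t key[BLOCK_BYTES],
				 const uint8_t in[BLOCK_BYTES],
				 uint8_t out[BLOCK_BYTES]);

struct ctr_drng {
	uint8_t key[BLOCK_BYTES];	/* seeded once, never reseeded */
	uint64_t counter;		/* bumped for every output block */
	block_encrypt_fn encrypt;
};

/* Stand-in so the sketch runs on its own.  NOT a cipher, NOT secure. */
static void toy_encrypt(const uint8_t key[BLOCK_BYTES],
			const uint8_t in[BLOCK_BYTES],
			uint8_t out[BLOCK_BYTES])
{
	for (int i = 0; i < BLOCK_BYTES; i++)
		out[i] = (uint8_t)(in[i] ^ key[i]);
}

static void ctr_drng_init(struct ctr_drng *g, const uint8_t key[BLOCK_BYTES],
			  block_encrypt_fn fn)
{
	memcpy(g->key, key, BLOCK_BYTES);
	g->counter = 0;
	g->encrypt = fn;
}

/* Fill buf with len bytes; each output block is E_key(counter). */
static void ctr_drng_bytes(struct ctr_drng *g, uint8_t *buf, size_t len)
{
	uint8_t in[BLOCK_BYTES], out[BLOCK_BYTES];

	while (len > 0) {
		size_t n = len < BLOCK_BYTES ? len : BLOCK_BYTES;

		memset(in, 0, sizeof(in));
		memcpy(in, &g->counter, sizeof(g->counter));
		g->counter++;
		g->encrypt(g->key, in, out);
		memcpy(buf, out, n);
		buf += n;
		len -= n;
	}
}
```

The only state is the key and a 64-bit counter, which is where the memory saving over a full entropy pool would come from; the security of the output rests entirely on the block cipher that replaces toy_encrypt().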

Whether or not we really need this level of paranoia for hardening
stack randomization I'll leave for someone else to decide.
Personally, my philosophy is if someone has managed to get
unprivileged shell access, trying to protect against a privilege
escalation attack is largely hopeless on most Linux systems.  The name
of the game is to protect against someone who does not yet have the
ability to run arbitrary unprivileged code on the system of interest.
In that case, the attacker isn't going to be able to get access to the
output of get_random_int(), so even if there was a cryptographic
weakness where an attacker who had access to the get_random_int()
output stream could guess the internal state of the MD5-based RNG, in
the case of a remote attacker, they wouldn't have access to the output
of the RNG in the first place.

Regards,

- Ted


[ANNOUNCE] 3.7-ck1, BFS 426 for linux-3.7

2012-12-15 Thread Con Kolivas
These are patches designed to improve system responsiveness and
interactivity with specific emphasis on the desktop, but suitable to
any commodity hardware workload.

Apply to 3.7.x:
http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/patch-3.7-ck1.bz2
or
http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/patch-3.7-ck1.lrz

Broken out tarball:
http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/3.7-ck1-broken-out.tar.bz2
or
http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/3.7-ck1-broken-out.tar.lrz

Discrete patches:
http://ck.kolivas.org/patches/3.0/3.7/3.7-ck1/patches/

Latest BFS by itself:
http://ck.kolivas.org/patches/bfs/3.0/3.7/3.7-sched-bfs-426.patch

Web:
http://kernel.kolivas.org

Code blog when I feel like it:
http://ck-hack.blogspot.com/

-- 
-ck


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Eric Wong
Alan Cox  wrote:
> On Sat, 15 Dec 2012 00:54:48 +
> Eric Wong  wrote:
> 
> > Applications streaming large files may want to reduce disk spinups and
> > I/O latency by performing large amounts of readahead up front
> 
> How does it compare benchmark wise with a user thread or using the
> readahead() call ?

Very well.

My main concern is for the speed of the initial pread()/read() call
after open().

Setting EARLY_EXIT means my test program _exit()s immediately after the
first pread().  In my test program (below), I wait for the background
thread to become ready before open() so I would not take overhead from
pthread_create() into account.

RA=1 uses a pthread + readahead()
Not setting RA uses fadvise (with my patch)

# readahead + pthread.
$ EARLY_EXIT=1 RA=1 time  ./first_read 1G
0.00user 0.05system 0:01.37elapsed 3%CPU (0avgtext+0avgdata 600maxresident)k
0inputs+0outputs (1major+187minor)pagefaults 0swaps

# patched fadvise
$ EARLY_EXIT=1 time ./first_read 1G
0.00user 0.00system 0:00.01elapsed 0%CPU (0avgtext+0avgdata 564maxresident)k
0inputs+0outputs (1major+178minor)pagefaults 0swaps

Perhaps I screwed up my readahead() + threads path badly, but there
seems to be a huge benefit in using fadvise with my patch.  I'm not sure
why readahead() + thread does so badly, even...

Even if I badly screwed up my use of readahead(), the benefit of my
patch spares others from screwing up when using threads+readahead() :)

FULL_READ
---------
While full, fast reads are not my target use case, there's no noticeable
regression here, either.  Results for doing a full, fast read on the file
are closer and fluctuate more between runs.

# readahead + pthread.
$ FULL_READ=1 EARLY_EXIT=1 RA=1 time ./first_read 1G
0.01user 1.10system 0:09.24elapsed 12%CPU (0avgtext+0avgdata 596maxresident)k
0inputs+0outputs (1major+186minor)pagefaults 0swaps

# patched fadvise
$ FULL_READ=1 EARLY_EXIT=1 time ./first_read 1G
0.01user 1.04system 0:09.22elapsed 11%CPU (0avgtext+0avgdata 564maxresident)k
0inputs+0outputs (1major+178minor)pagefaults 0swaps

- 8< --
/* gcc -O2 -Wall -lpthread -o first_read first_read.c */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int efd1;
static int efd2;

static void *start_ra(void *unused)
{
	struct stat st;
	eventfd_t val;
	int fd;

	/* tell parent to open() */
	assert(eventfd_write(efd1, 1) == 0);

	/* wait for parent to tell us fd is ready */
	assert(eventfd_read(efd2, &val) == 0);
	fd = (int)val;

	assert(fstat(fd, &st) == 0);
	assert(readahead(fd, 0, st.st_size) == 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	char buf[16384];
	pthread_t thr;
	int fd;
	char *do_ra = getenv("RA");

	if (argc != 2) {
		fprintf(stderr, "Usage: strace -T %s LARGE_FILE\n", argv[0]);
		return 1;
	}

	if (do_ra) {
		eventfd_t val;
		efd1 = eventfd(0, 0);
		efd2 = eventfd(0, 0);
		assert(efd1 >= 0 && efd2 >= 0 && "eventfd failed");
		assert(pthread_create(&thr, NULL, start_ra, NULL) == 0);

		/* wait for child thread to spawn */
		assert(eventfd_read(efd1, &val) == 0);
	}

	fd = open(argv[1], O_RDONLY);
	assert(fd >= 0 && "open failed");

	if (do_ra) {
		/* wake up the child thread, give it a chance to run */
		assert(eventfd_write(efd2, fd) == 0);
		sched_yield();
	} else
		assert(posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) == 0);

	assert(pread(fd, buf, sizeof(buf), 0) == sizeof(buf));

	if (getenv("FULL_READ")) {
		ssize_t r;
		do {
			r = read(fd, buf, sizeof(buf));
		} while (r > 0);
		assert(r == 0 && "EOF not reached");
	}

	/* same variable name as in the command lines above */
	if (getenv("EARLY_EXIT"))
		_exit(0);

	if (do_ra) {
		assert(pthread_join(thr, NULL) == 0);
		assert(close(efd1) == 0);
		assert(close(efd2) == 0);
	}

	assert(close(fd) == 0);

	return 0;
}
- 8< --

Thanks for your interest in this!

-- 
Eric Wong


Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread H. Peter Anvin

On 12/15/2012 01:37 PM, Dave Jones wrote:

> On Sat, Dec 15, 2012 at 11:58:00AM -0800, Linus Torvalds wrote:
>
>   > It might also be that it causes some massive corruption at boot time,
>   > but it then requires that that particular memory is actually used. So
>   > maybe it's not so much about the memory map except indirectly.
>
> I wonder if this might explain the XFS corruption I've been seeing
> the last couple days.  Won't be able to get at the affected laptop
> until Monday to find out..



It seems somewhat unlikely, but not implausible, since the trampoline 
page table is only in use for very brief moments and usually not very 
often at all, but if it is just completely screwed and we do fandango on 
memory... yes we could have problems.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread H. Peter Anvin


> Anybody see anything else?
>
> And why do we have to call the get-time calls so early? Couldn't we
> move them later and avoid all the crazy "let's create silly magical
> page tables just for the idiotic EFI problems".



We need them anyway... actually the whole point of that patch is to try 
to *remove* silly magical page tables just for EFI and use another set 
of silly magical page tables we need anyway (for S3 resume, SMP bootup 
and so on.)  Reducing the sheer number of silly magical page tables has 
been a priority for some time -- I want to get it down to one if we can.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Linus Torvalds
On Sat, Dec 15, 2012 at 2:05 PM, Yinghai Lu  wrote:
> On Sat, Dec 15, 2012 at 1:06 PM, Linus Torvalds 
>  wrote:
>>
>> I've reverted the commit.
>
> more than that, 3 commits just after that commit should be reverted at
> the same time.
> they all depend on that commit.

Thanks for pointing that out, and just to make sure I verified that on
my Macbook Air which does use EFI. It was broken by the single revert,
and fixed by the additional three reverts.

Sadly:

> and first checking of that commit, it would have problem with system
> more than 512g ...

That particular bug isn't the cause for my non-EFI problems, since I
don't have that kind of memory..

So there is something else going on in addition to the bug you found.
But good eye.

Anybody see anything else?

And why do we have to call the get-time calls so early? Couldn't we
move them later and avoid all the crazy "let's create silly magical
page tables just for the idiotic EFI problems".

And while I'm asking, why the f*ck did Intel do that crazy EFI thing
in the first place again?

  Linus


Re: [GIT PULL] f2fs: request for tree inclusion

2012-12-15 Thread Jaegeuk Kim
Hi Linus,

I see that f2fs has not been merged yet.
Could you let me know the reason?
Is it a matter of priority, or something else?

BTW, I have added a couple of bug fixes since "for-3.8-merge".
Which would you prefer: a [GIT PULL v2], or an additional pull request
after the merge?

Thanks,
Jaegeuk Kim

2012-12-11 (Tue), 16:58 +0900, Jaegeuk Kim:
> Hi Linus,
> 
> This is the first pull request for tree inclusion of Flash-Friendly File
> System (F2FS) towards the 3.8 merge window.
> 
> http://lwn.net/Articles/518718/
> http://lwn.net/Articles/518988/
> http://en.wikipedia.org/wiki/F2FS
> 
> The f2fs has been in the linux-next tree for a while, and several issues
> have been cleared as described in the signed tag below.
> And also, I've done testing f2fs successfully based on Linux 3.7 with
> the following test scenarios.
> 
> - Reliability test:
>   Run fsstress on an SSD partition.
> 
> - Robustness test:
>   Conduct sudden-power-off and examine the fs consistency repeatedly,
>   while running a reliability test.
> 
> So, please pull the f2fs filesystem.
> If I'm missing any issues or made mistakes, please let me know.
> 
> Thanks,
> Jaegeuk Kim
> 
> The following changes since commit
> 29594404d7fe73cd80eaa4ee8c43dcc53970c60e:
> 
>   Linux 3.7 (2012-12-10 19:30:57 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
> tags/for-3.8-merge
> 
> for you to fetch changes up to e6aa9f36b2bfd6b30072c07b34f2a24becf1:
> 
>   f2fs: fix tracking parent inode number (2012-12-11 13:43:45 +0900)
> 
> 
> Introduce a new file system, Flash-Friendly File System (F2FS), to Linux
> 3.8.
> 
> Highlights:
> - Add initial f2fs source codes
> - Fix an endian conversion bug
> - Fix build failures on random configs
> - Fix the power-off-recovery routine
> - Minor cleanup, coding style, and typos patches
> 
> Greg Kroah-Hartman (1):
>   f2fs: move proc files to debugfs
> 
> Huajun Li (1):
>   f2fs: fix a typo in f2fs documentation
> 
> Jaegeuk Kim (22):
>   f2fs: add document
>   f2fs: add on-disk layout
>   f2fs: add superblock and major in-memory structure
>   f2fs: add super block operations
>   f2fs: add checkpoint operations
>   f2fs: add node operations
>   f2fs: add segment operations
>   f2fs: add file operations
>   f2fs: add address space operations for data
>   f2fs: add core inode operations
>   f2fs: add inode operations for special inodes
>   f2fs: add core directory operations
>   f2fs: add xattr and acl functionalities
>   f2fs: add garbage collection functions
>   f2fs: add recovery routines for roll-forward
>   f2fs: update Kconfig and Makefile
>   f2fs: update the f2fs document
>   f2fs: fix endian conversion bugs reported by sparse
>   f2fs: adjust kernel coding style
>   f2fs: resolve build failures
>   f2fs: cleanup the f2fs_bio_alloc routine
>   f2fs: fix tracking parent inode number
> 
> Namjae Jeon (10):
>   f2fs: fix the compiler warning for uninitialized use of variable
>   f2fs: show error in case of invalid mount arguments
>   f2fs: remove unneeded memset from init_once
>   f2fs: check read only condition before beginning write out
>   f2fs: remove unneeded initialization
>   f2fs: move error condition for mkdir at proper place
>   f2fs: rewrite f2fs_bio_alloc to make it simpler
>   f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
>   f2fs: remove redundant call to f2fs_put_page in delete entry
>   f2fs: introduce accessor to retrieve number of dentry slots
> 
> Sachin Kamat (1):
>   f2fs: remove unneeded version.h header file from f2fs.h
> 
> Wei Yongjun (1):
>   f2fs: remove unused variable
> 
>  Documentation/filesystems/00-INDEX |2 +
>  Documentation/filesystems/f2fs.txt |  421 +
>  fs/Kconfig |1 +
>  fs/Makefile|1 +
>  fs/f2fs/Kconfig|   53 ++
>  fs/f2fs/Makefile   |7 +
>  fs/f2fs/acl.c  |  414 +
>  fs/f2fs/acl.h  |   57 ++
>  fs/f2fs/checkpoint.c   |  794 
>  fs/f2fs/data.c |  702 ++
>  fs/f2fs/debug.c|  361 
>  fs/f2fs/dir.c  |  672 ++
>  fs/f2fs/f2fs.h | 1083 ++
>  fs/f2fs/file.c |  636 +
>  fs/f2fs/gc.c   |  742 +++
>  fs/f2fs/gc.h   |  117 +++
>  fs/f2fs/hash.c |   97 ++
>  fs/f2fs/inode.c|  268 ++
>  fs/f2fs/namei.c|  503 ++
>  fs/f2fs/node.c | 1764
> 

Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread H. Peter Anvin

On 12/15/2012 03:15 PM, Yinghai Lu wrote:


>> That is for the kernel region itself (that code is actually unchanged from
>> the current code), and yes, we could cap that one to _end if there are
>> systems which have bugs in that area.  The dynamic page tables map 1G
>> aligned at a time.
>
> dynamic should be 2M too.
>
> AMD system:
>
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
>
>   BIOS-e820: [mem 0x0001-0x00e037ff] usable
>   BIOS-e820: [mem 0x00e03800-0x00fc] reserved
>   BIOS-e820: [mem 0x0100-0x011ffeff] usable
>
> the hole is not 1G aligned.
>
> or HT region is from e04000 ?



The HT region starts at 0xfd -- after that reserved region, so I 
have no idea what that particular system is trying to do or what its 
requirements are (nor what its MTRR setup is, since you didn't post it.)


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 2:17 PM, H. Peter Anvin  wrote:
> On 12/15/2012 02:13 PM, Yinghai Lu wrote:
>>
>>
>> AMD system could have all mem between TOLM and TOHM all WB, and don't
>> need to set them in MTRRs entries.
>>
>
> I include the TOM2 mechanism in the overall umbrella of MTRRs for this
> purpose.
>
>
>> and also your switchover change that handle cross 1G, and 512g, and it
>> is not 1G aligned.
>> for example, if kernel at 4095G+512M, it will map from 4095G+512M to
>> 4096G + 512M.
>
>
> That is for the kernel region itself (that code is actually unchanged from
> the current code), and yes, we could cap that one to _end if there are
> systems which have bugs in that area.  The dynamic page tables map 1G
> aligned at a time.

dynamic should be 2M too.

AMD system:

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf

 BIOS-e820: [mem 0x0001-0x00e037ff] usable
 BIOS-e820: [mem 0x00e03800-0x00fc] reserved
 BIOS-e820: [mem 0x0100-0x011ffeff] usable

the hole is not 1G aligned.

or HT region is from e04000 ?


Re: [PATCH] avoid entropy starvation due to stack protection

2012-12-15 Thread Stephan Müller

On 15.12.2012 20:15, Ondřej Bílka wrote:

> Why not use nonblocking pool and seed nonblocking pool only with half of
> collected entropy to get /dev/random in almost all practical scenarios
> nonblocking?


I would not recommend changing /dev/urandom. First, we would change the 
characteristics of a kernel interface that a lot of user space cryptographic 
components rely on. According to Linus that is typically a no-go. 
Moreover, it raises the question of where the 50% figure comes from: 
why not 30% or 70%, and why (re)seed it at all?


Also, let us assume we pick 50% and we leave the create_elf_tables 
function as is (i.e. it pulls from get_random_bytes). I fear that we would 
not gain anything. The problem under discussion is the depletion of 
entropy via nonblocking_pool, because every execve() syscall requires 
128 bits of data from nonblocking_pool. Even if we seed nonblocking_pool 
more rarely, we still deplete the entropy of the input_pool and thus the 
entropy that a particular user wants for cryptographic purposes.


Thus, my recommendation is to disconnect the system entropy requirements 
from the user entropy requirements as much as possible. I am aware that 
there are in-kernel cryptographic users that must seed themselves from 
good entropy. Those users should be left untouched -- i.e. 
they should still call get_random_bytes.


But for users that do not require cryptographic strength, but a strength 
against guessing of a random number on the local system for a decent 
time (like the stack protection or ASLR), we can use a slightly less 
perfect DRNG which is seeded with good entropy and never thereafter.


Ciao
Stephan


On Thu, Dec 13, 2012 at 08:44:36AM +0100, Stephan Mueller wrote:

On 13.12.2012 01:43:21, +0100, Andrew Morton
  wrote:

Hi Andrew,

On Tue, 11 Dec 2012 13:33:04 +0100
Stephan Mueller  wrote:


Some time ago, I noticed the fact that for every newly
executed process, the function create_elf_tables requests 16 bytes of
randomness from get_random_bytes. This is easily visible when calling

while [ 1 ]
do
cat /proc/sys/kernel/random/entropy_avail
sleep 1
done

Please see
http://ozlabs.org/~akpm/mmotm/broken-out/binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch

That patch is about one week from a mainline merge, btw.

Initially I was also thinking about get_random_int. But stack protection
depends on non-predictable numbers to ensure it cannot be defeated. As
get_random_int depends on MD5 which is assumed to be broken now, I
discarded the idea of using get_random_int.

Moreover, please consider that get_cycles is an architecture-specific
function that on some architectures only returns 0 (For all
architectures where this is implemented, you have no guarantee that it
increments as a high-resolution timer). So, the quality of
get_random_int is questionable IMHO for the use as a stack protector.

Also note, that other in-kernel users of get_random_bytes may be
converted to using the proposed kernel pool to avoid more entropy drainage.

Please note that the suggested approach of fully seeding a deterministic
RNG never followed by a re-seeding is used elsewhere (e.g. the OpenSSL
RNG). Therefore, I think the suggested approach is viable.

Ciao
Stephan



Re: [GIT PULL] target updates for v3.8-rc1

2012-12-15 Thread Linus Torvalds
On Fri, Dec 14, 2012 at 3:53 PM, Nicholas A. Bellinger
 wrote:
>
> Here are the target updates for v3.8-rc1 merge window code.  Please go
> ahead and pull from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 
> for-next
>
> Just a heads up that there is a minor merge conflict that you'll
> encounter in target_handle_task_attr() code, that sfr has been carrying
> a fix for recently within -next.  After dropping the HEAD section, the
> resolution should look like:

Hmm. This is *not* how I resolved that conflict - that seems to drop the new

 complete(&cmd->t_transport_stop_comp);

added by Roland in commit 3ea160b3e8f0 ("target: Fix handling of
aborted commands").

So my conflict resolution looks different.

Which may be a bug, of course. Nicholas, Roland, please check my end result,

  Linus


Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-15 Thread Alan Cox
On Sat, 15 Dec 2012 00:54:48 +
Eric Wong  wrote:

> Applications streaming large files may want to reduce disk spinups and
> I/O latency by performing large amounts of readahead up front


How does it compare benchmark wise with a user thread or using the
readahead() call ?


Re: [GIT PULL] fbdev changes for 3.8

2012-12-15 Thread Dave Jones
On Sat, Dec 15, 2012 at 01:11:04PM -0800, Linus Torvalds wrote:
 > On Fri, Dec 14, 2012 at 2:22 AM, Tomi Valkeinen  
 > wrote:
 > > Hi Linus,
 > >
 > > Florian, the fbdev maintainer, has been very busy lately, so I offered to 
 > > send
 > > the pull request for fbdev for this merge window.
 > 
 > Pulled. However, with this I get the Kconfig question
 > 
 >OMAP2+ Display Subsystem support (OMAP2_DSS) [N/m/y/?] (NEW)
 > 
 > which doesn't make a whole lot of sense on x86-64, unless there's
 > something about OMAP2 that I don't know.
 > 
 > So I'd suggest making that OMAP2_DSS be dependent on OMAP2. Or at
 > least ARM. Because showing it to anybody else seems insane.
 > 
 > Same goes for FB_OMAP2 for that matter. I realize that it's likely
 > nice to get compile testing for this on x86-64 too, but if that's the
 > intent, we need to think about it some more. I don't think it's good
 > to ask actual normal users questions like this just for compile
 > coverage.

This OMAP stuff has been creeping into x86 builds for a while.
Grep from my current build config ..

# CONFIG_OMAP_OCP2SCP is not set
# CONFIG_KEYBOARD_OMAP4 is not set
# CONFIG_OMAP2_DSS is not set
# CONFIG_OMAP_USB2 is not set

There was some other arm-ism that does the same that I'm currently forgetting,
or maybe that got fixed..

Dave



Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread H. Peter Anvin

On 12/15/2012 02:13 PM, Yinghai Lu wrote:


> AMD system could have all mem between TOLM and TOHM all WB, and don't
> need to set them in MTRRs entries.



I include the TOM2 mechanism in the overall umbrella of MTRRs for this 
purpose.



> and also your switchover change that handle cross 1G, and 512g, and it
> is not 1G aligned.
> for example, if kernel at 4095G+512M, it will map from 4095G+512M to
> 4096G + 512M.


That is for the kernel region itself (that code is actually unchanged 
from the current code), and yes, we could cap that one to _end if there 
are systems which have bugs in that area.  The dynamic page tables map 
1G aligned at a time.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu

2012-12-15 Thread Robert Hancock

On 12/14/2012 03:32 PM, Don Dutile wrote:

On 12/13/2012 04:50 AM, Jason Gao wrote:

Dear List:

Description of problem:
After installing CentOS 6.3 (RHEL 6.3) on my Dell R710 (latest
BIOS: Version 6.3.0, Release Date 07/24/2012) server and updating to the
latest kernel "2.6.32-279.14.1.el6.x86_64", I want to use the Intel
82576 ET Dual Port NIC's SR-IOV feature, assigning VFs to KVM guests.

I appended the kernel boot parameter intel_iommu=on; after boot I get the
following messages:

Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 2
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe65000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set
Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 102
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe8a000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set
Dec 13 16:58:15 2 kernel: scsi 0:0:32:0: Enclosure DP BACKPLANE1.07 PQ: 0 ANSI: 5
Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 202
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe89000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set

full dmesg detail:
http://pastebin.com/BzFQV0jU
lspci -vvv full detail:
http://pastebin.com/9rP2d1br


It's a production server, and I'm not sure whether this is a critical
problem or how to fix it. Any help would be greatly appreciated.


DMAR table does not have an entry for this device to this region.
Once the driver reconfigs/resets the device to stop polling bios-boot
cmd rings and use (new) OS (dma-mapped) rings, there's a period of time
during this transition that the hw is babbling away to an area that is no
longer mapped.


Maybe some kind of boot PCI quirk is needed to stop the device DMA 
activity before enabling the IOMMU?




Re: [PATCH 3.7.0 5/9] i82975x_edac: optimise mode detection

2012-12-15 Thread Stephen Rothwell
On Sun, 16 Dec 2012 02:12:51 +0530 Arvind R  wrote:
>
> Subject: [PATCH 3.7.0 5/9] i82975x_edac: optimise mode detection
> 
> Minor optimisation of dual channel symmetric operation. Return
> value changed to bool.

And you moved the function for no reason that is obvious from the patch.

> +/* Return 1 if dual channel mode is active.  Else return 0. */

The comment is no longer correct, i.e. you are now returning true/false,
not 0/1.

> +static bool dual_channel_active(void __iomem *mch_window)
> +{
> + /*
> +  * We treat interleaved-symmetric configuration as dual-channel.
> +  * All other configurations are virtual single channel mode.
> +  * bit-0 of EAP always provides the real channel in error.
> +  */
> + u8  drb[2];
> + int row;
> + bool dualch;
> +
> + for (dualch = 1, row = 0; dualch &&

um  dualch = true

> + (row < I82975X_NR_CSROWS_PER_CHANNEL); row++) {
> + drb[0] = readb(mch_window + I82975X_DRB + row);
> + drb[1] = readb(mch_window + I82975X_DRB + row + 0x80);
> + dualch &= (drb[0] == drb[1]);

Don't do bit operations on a bool.

> + }
> + return dualch;
> +}

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 1:40 PM, H. Peter Anvin  wrote:
> On 12/15/2012 12:55 PM, Yinghai Lu wrote:
>> Also if we set map too large, could have chance to cover mem hole near
>> 1T for AMD HT system.
>
>
> Again, should not be cachable in the MTRRs, and even so, is 1G aligned
> already.

AMD system could have all mem between TOLM and TOHM all WB, and don't
need to set them in MTRRs entries.

and also your switchover change that handle cross 1G, and 512g, and it
is not 1G aligned.
for example, if kernel at 4095G+512M, it will map from 4095G+512M to
4096G + 512M.


Re: [PATCH 3.7.0 1/9] i82975x_edac.c: fix style errors

2012-12-15 Thread Stephen Rothwell
Hi Arvind,

On Sun, 16 Dec 2012 02:08:50 +0530 Arvind R  wrote:
>
> Subject: [PATCH 3.7.0 1/9] i82975x_edac.c: fix style errors
> 
> splits or shortens extra long lines in source.

Don't do this, except for the one marked below, these add no value.  The
line length is a guide.

> - snprintf(csrow->channels[chan]->dimm->label, 
> EDAC_MC_LABEL_LEN, "DIMM %c%d",
> + snprintf(csrow->channels[chan]->dimm->label,
> +  EDAC_MC_LABEL_LEN, "DIMM %c%d",

Only keep this one.

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 1:06 PM, Linus Torvalds
 wrote:
> On Sat, Dec 15, 2012 at 1:04 PM, Markus Trippelsdorf
>  wrote:
>>
>> So I wonder if the following simple patch might be enough?
>> It fixes the issue for me at least.
>
> Not enough.
>
> It presumably fixes the issue for you by hiding the problem. But if
> you were to boot a kernel with EFI support, it would re-surface.
> Including in any distro kernel that obviously will include EFI support
> in order to handle the generic case.
>
> I've reverted the commit.

more than that, 3 commits just after that commit should be reverted at
the same time.
they all depend on that commit.

and first checking of that commit, it would have problem with system
more than 512g ...

static int insert_identity_mapping(resource_size_t paddr, unsigned long vaddr,
   unsigned long size)
...
 pgd_t *vpgd, *ppgd;
 ppgd = __va(real_mode_header->trampoline_pgd) + pgd_index(paddr);

it missed one (pgd_t *) cast. we should use

ppgd = (pgd_t *)__va(real_mode_header->trampoline_pgd) + pgd_index(paddr);

Yinghai


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread H. Peter Anvin

On 12/15/2012 12:55 PM, Yinghai Lu wrote:
> On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin  wrote:
>> What is the point of only managing 2M at a time?  Now you have to have
>> more conditionals and you don't get any more memory efficiency.
>
> We don't need to, because real_data is less than 2M, and ramdisk is about 16M.

In other words, you make magic assumptions (some of which are very wrong
in many real-life scenarios -- people can and do use gigabyte-plus
initramfs).  That is exactly the wrong thing to do.  Furthermore it
doesn't buy you anything, because you still have to allocate the PMDs.

> Also if we set map too large, could have chance to cover mem hole near
> 1T for AMD HT system.

Again, should not be cachable in the MTRRs, and even so, is 1G aligned
already.

>> Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
>> area and all brk allocations need to be reserved at compile time, so the
>> overflow handling is still necessary.
>
> if run out of BRK, we will get panic, because early_make_pgtable will return -1.

And you consider that panic an acceptable failure mode

> and current BRK already have 64 slop space.
>
> BTW, did you look at smp boot problem with early_level4_pgt version?

No, I have been busy with non-Linux stuff today.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Dave Jones
On Sat, Dec 15, 2012 at 11:58:00AM -0800, Linus Torvalds wrote:

 > It might also be that it causes some massive corruption at boot time,
 > but it then requires that that particular memory is actually used. So
 > maybe it's not so much about the memory map except indirectly.

I wonder if this might explain the XFS corruption I've been seeing
the last couple days.  Won't be able to get at the affected laptop
until Monday to find out..

Dave



Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread H. Peter Anvin

On 12/15/2012 01:06 PM, Linus Torvalds wrote:
> On Sat, Dec 15, 2012 at 1:04 PM, Markus Trippelsdorf
>  wrote:
>>
>> So I wonder if the following simple patch might be enough?
>> It fixes the issue for me at least.
>
> Not enough.
>
> It presumably fixes the issue for you by hiding the problem. But if
> you were to boot a kernel with EFI support, it would re-surface.
> Including in any distro kernel that obviously will include EFI support
> in order to handle the generic case.
>
> I've reverted the commit.

Right... we'll work on fixing it properly.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread H. Peter Anvin
The mem hole at 1T should not be marked cachable in the MTRRs.

Yinghai Lu  wrote:

>On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin wrote:
>> What is the point of only managing 2M at a time?  Now you have to have
>> more conditionals and you don't get any more memory efficiency.
>
>We don't need to, because real_data is less than 2M, and ramdisk is
>about 16M.
>
>Also if we set map too large, could have chance to cover mem hole near
>1T for AMD HT system.
>
>>
>> Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
>> area and all brk allocations need to be reserved at compile time, so the
>> overflow handling is still necessary.
>
>if run out of BRK, we will get panic, because early_make_pgtable will
>return -1.
>
>and current BRK already have 64 slop space.
>
>BTW, did you look at smp boot problem with early_level4_pgt version?
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [Resend][PATCH] PM: Move disabling/enabling runtime PM to late suspend/early resume

2012-12-15 Thread Jiri Kosina
On Sat, 15 Dec 2012, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki 
> 
> Currently, the PM core disables runtime PM for all devices right
> after executing subsystem/driver .suspend() callbacks for them
> and re-enables it right before executing subsystem/driver .resume()
> callbacks for them.  This may lead to problems when there are
> two devices such that the .suspend() callback executed for one of
> them depends on runtime PM working for the other.  In that case,
> if runtime PM has already been disabled for the second device,
> the first one's .suspend() won't work correctly (and analogously
> for resume).
> 
> To make those issues go away, make the PM core disable runtime PM
> for devices right before executing subsystem/driver .suspend_late()
> callbacks for them and enable runtime PM for them right after
> executing subsystem/driver .resume_early() callbacks for them.  This
> way the potential conflicts between .suspend_late()/.resume_early()
> and their runtime PM counterparts are still prevented from happening,
> but the subtle ordering issues related to disabling/enabling runtime
> PM for devices during system suspend/resume are much easier to avoid.
> 
> Reported-and-tested-by: Jan-Matthias Braun 
> Signed-off-by: Rafael J. Wysocki 

Hi Rafael,

just curious what is the reason for resend? Do you want to gather more 
Acks before pushing this upstream?

Thanks.

-- 
Jiri Kosina
SUSE Labs


Re: [GIT PULL] fbdev changes for 3.8

2012-12-15 Thread Linus Torvalds
On Fri, Dec 14, 2012 at 2:22 AM, Tomi Valkeinen  wrote:
> Hi Linus,
>
> Florian, the fbdev maintainer, has been very busy lately, so I offered to send
> the pull request for fbdev for this merge window.

Pulled. However, with this I get the Kconfig question

   OMAP2+ Display Subsystem support (OMAP2_DSS) [N/m/y/?] (NEW)

which doesn't make a whole lot of sense on x86-64, unless there's
something about OMAP2 that I don't know.

So I'd suggest making that OMAP2_DSS be dependent on OMAP2. Or at
least ARM. Because showing it to anybody else seems insane.

Same goes for FB_OMAP2 for that matter. I realize that it's likely
nice to get compile testing for this on x86-64 too, but if that's the
intent, we need to think about it some more. I don't think it's good
to ask actual normal users questions like this just for compile
coverage.

Hmm?

Linus


Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Linus Torvalds
On Sat, Dec 15, 2012 at 1:04 PM, Markus Trippelsdorf
 wrote:
>
> So I wonder if the following simple patch might be enough?
> It fixes the issue for me at least.

Not enough.

It presumably fixes the issue for you by hiding the problem. But if
you were to boot a kernel with EFI support, it would re-surface.
Including in any distro kernel that obviously will include EFI support
in order to handle the generic case.

I've reverted the commit.

  Linus


Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Markus Trippelsdorf
On 2012.12.15 at 11:58 -0800, Linus Torvalds wrote:
> On Sat, Dec 15, 2012 at 11:41 AM, H. Peter Anvin  wrote:
> >
> > Matt is on vacation, and I'm partly offline for the weekend, but that
> > definitely seems suspicious.  Do we have a memory map of the affected
> > machine(s)?
> 
> 
> but as mentioned, there's bound to be some particular kernel layout
> that triggers this, because I definitely ran a few kernels with that
> commit in it without problems (and clearly other people are too).
> Looking at my boot log, I had successful boots with both 6a57d104c8cb
> and c2714334b944, which contains that commit.
> 
> It might also be that it causes some massive corruption at boot time,
> but it then requires that that particular memory is actually used. So
> maybe it's not so much about the memory map except indirectly.
> 
> But that commit *does* look a lot more likely than the things I looked at.

The commit message says that only some broken implementations of EFI
firmware require the mapping for the physical I/O device addresses.

So I wonder if the following simple patch might be enough?
It fixes the issue for me at least.

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e190f7b..402e4ca 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -50,7 +50,7 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
return err;
 }
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_EFI
 static void ident_pte_range(unsigned long paddr, unsigned long vaddr,
pmd_t *ppmd, pmd_t *vpmd, unsigned long end)
 {

-- 
Markus


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread Yinghai Lu
On Sat, Dec 15, 2012 at 11:30 AM, H. Peter Anvin  wrote:
> What is the point of only managing 2M at a time?  Now you have to have
> more conditionals and you don't get any more memory efficiency.

We don't need to, because real_data is less than 2M, and ramdisk is about 16M.

Also if we set map too large, could have chance to cover mem hole near
1T for AMD HT system.

>
> Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
> area and all brk allocations need to be reserved at compile time, so the
> overflow handling is still necessary.

if run out of BRK, we will get panic, because early_make_pgtable will return -1.

and current BRK already have 64 slop space.

BTW, did you look at smp boot problem with early_level4_pgt version?

Yinghai


[ANNOUNCE] Git v1.8.1-rc2

2012-12-15 Thread Junio C Hamano
A release candidate Git v1.8.1-rc2 is now available for testing
at the usual places.

The release tarballs are found at:

http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

0a65a3d203b8d6e320f15abb040e1137e333c967  git-1.8.1.rc2.tar.gz
e6bc111686e6864cc3f078b9523ef1057a7fff8f  git-htmldocs-1.8.1.rc2.tar.gz
2c97472ae861454ff445868c40b49db66fa09f50  git-manpages-1.8.1.rc2.tar.gz

Also the following public repositories all have a copy of the v1.8.1-rc2
tag and the master branch that the tag points at:

  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

Git v1.8.1 Release Notes (draft)


Backward compatibility notes


In the next major release (not *this* one), we will change the
behavior of the "git push" command.

When "git push [$there]" does not say what to push, we have used the
traditional "matching" semantics so far (all your branches were sent
to the remote as long as there already are branches of the same name
over there).  We will use the "simple" semantics that pushes the
current branch to the branch with the same name, only when the current
branch is set to integrate with that remote branch.  There is a user
preference configuration variable "push.default" to change this, and
"git push" will warn about the upcoming change until you set this
variable in this release.

"git branch --set-upstream" is deprecated and may be removed in a
relatively distant future.  "git branch [-u|--set-upstream-to]" has
been introduced with a saner order of arguments to replace it.


Updates since v1.8.0


UI, Workflows & Features

 * Command-line completion scripts for tcsh and zsh have been added.

 * A new remote-helper interface for Mercurial has been added to
   contrib/remote-helpers.

 * We used to have a workaround for a bug in ancient "less" that
   causes it to exit without any output when the terminal is resized.
   The bug has been fixed in "less" version 406 (June 2007), and the
   workaround has been removed in this release.

 * Some documentation pages that used to ship only in the plain text
   format are now formatted in HTML as well.

 * "git-prompt" scriptlet (in contrib/completion) can be told to paint
   pieces of the hints in the prompt string in colors.

 * A new configuration variable "diff.context" can be used to
   give the default number of context lines in the patch output, to
   override the hardcoded default of 3 lines.

 * When "git checkout" checks out a branch, it tells the user how far
   behind (or ahead) the new branch is relative to the remote tracking
   branch it builds upon.  The message now also advises how to sync
   them up by pushing or pulling.  This can be disabled with the
   advice.statusHints configuration variable.

 * "git config --get" used to diagnose presence of multiple
   definitions of the same variable in the same configuration file as
   an error, but it now applies the "last one wins" rule used by the
   internal configuration logic.  Strictly speaking, this may be an
   API regression but it is expected that nobody will notice it in
   practice.

 * "git log -p -S<string>" now looks for the <string> after applying
   the textconv filter (if defined); earlier it inspected the contents
   of the blobs without filtering.

 * "git format-patch" learned the "--notes=" option to give
   notes for the commit after the three-dash lines in its output.

 * "git log --grep=" learned to honor the "grep.patterntype"
   configuration set to "perl".

 * "git replace -d <object>" now interprets <object> as an extended
   SHA-1 (e.g. HEAD~4 is allowed), instead of only accepting full hex
   object name.

 * "git rm $submodule" used to punt on removing a submodule working
   tree to avoid losing the repository embedded in it.  Because
   recent git uses a mechanism to separate the submodule repository
   from the submodule working tree, "git rm" learned to detect this
   case and removes the submodule working tree when it is safe to do so.

 * "git send-email" used to prompt for the sender address, even when
   the committer identity is well specified (e.g. via user.name and
   user.email configuration variables).  The command no longer gives
   this prompt when not necessary.

 * "git send-email" did not allow non-address garbage strings to
   appear after addresses on Cc: lines in the patch files (and when
   told to pick them up to find more recipients), e.g.

 Cc: Stable Kernel  # for v3.2 and up

   The command now strips " # for v3.2 and up" part before adding the
   remainder of this line to the list of recipients.

 * "git submodule add" learned to add a new submodule at the same
   path as the path where an unrelated submodule was bound to in an
   existing revision via the "--name" option.

 * 

Re: [PATCH] clk: factor: calculate rate by do_div

2012-12-15 Thread Mike Turquette
On Sat, Dec 15, 2012 at 8:41 AM, Haojian Zhuang
 wrote:
> On Tue, Dec 4, 2012 at 9:32 AM, Haojian Zhuang  
> wrote:
>> On Mon, Dec 3, 2012 at 4:14 PM, Haojian Zhuang  
>> wrote:
>>> clk->rate = parent->rate / div * mult
>>>
>>> The formula is OK. But it may overflow while we do operate with
>>> unsigned long. So use do_div instead.
>>>
>>> Signed-off-by: Haojian Zhuang 
>>> ---
>>>  drivers/clk/clk-fixed-factor.c |    5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
>>> index a489985..1ef271e 100644
>>> --- a/drivers/clk/clk-fixed-factor.c
>>> +++ b/drivers/clk/clk-fixed-factor.c
>>> @@ -28,8 +28,11 @@ static unsigned long clk_factor_recalc_rate(struct clk_hw *hw,
>>> unsigned long parent_rate)
>>>  {
>>> struct clk_fixed_factor *fix = to_clk_fixed_factor(hw);
>>> +   unsigned long long int rate;
>>>
>>> -   return parent_rate * fix->mult / fix->div;
>>> +   rate = (unsigned long long int)parent_rate * fix->mult;
>>> +   do_div(rate, fix->div);
>>> +   return (unsigned long)rate;
>>>  }
>>>
>>>  static long clk_factor_round_rate(struct clk_hw *hw, unsigned long rate,
>>> --
>>> 1.7.10.4
>>>
>>
>> Correct Mike's email address.
>
> Any comments? Does it mean that nobody wants to fix the bug?

Thanks for the patch.  My apologies for letting this one slip through
the cracks but my normal email workflow was unavoidably disrupted and
I find myself playing catch-up with pending patches.

The patch looks good to me but I'll change the $SUBJECT to "clk:
fixed-factor: round_rate should use do_div" and do some testing before
taking it in.

Regards,
Mike


Re: [ 00/22] 3.0.57-stable review

2012-12-15 Thread Shuah Khan
On Sat, Dec 15, 2012 at 7:25 AM, Shuah Khan  wrote:
> On Fri, Dec 14, 2012 at 3:25 PM, Greg Kroah-Hartman
>  wrote:
>> This is the start of the stable review cycle for the 3.0.57 release.
>> There are 22 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Sun Dec 16 22:16:57 UTC 2012.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.57-rc1.gz
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>
> Patches applied cleanly to 3.0.y, 3.4.y, 3.6.y, and 3.7.y.
> Compiled and booted on the following systems:
> HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400  @ 2.53GHz
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
>
> Started cross-compile tests and will report the status.
>
> -- Shuah

Cross-compile tests:
alpha: defconfig passed on all
arm: defconfig passed on all
c6x: not applicable to 3.0.y, defconfig passed on the rest three
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig failed on 3.0.y (known issue), passed on the rest three
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah


Re: [ 00/28] 3.4.24-stable review

2012-12-15 Thread Shuah Khan
On Sat, Dec 15, 2012 at 7:27 AM, Shuah Khan  wrote:
> On Fri, Dec 14, 2012 at 3:26 PM, Greg Kroah-Hartman
>  wrote:
>> This is the start of the stable review cycle for the 3.4.24 release.
>> There are 28 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Sun Dec 16 22:16:59 UTC 2012.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.24-rc1.gz
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>>
>
> Patches applied cleanly to 3.0.y, 3.4.y, 3.6.y, and 3.7.y.
> Compiled and booted on the following systems:
> HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400  @ 2.53GHz
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
>
> Started cross-compile tests and will report status.
>
> -- Shuah

Cross-compile tests:
alpha: defconfig passed on all
arm: defconfig passed on all
c6x: not applicable to 3.0.y, defconfig passed on the rest three
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig failed on 3.0.y (known issue), passed on the rest three
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah


Re: [ 00/37] 3.6.11-stable review

2012-12-15 Thread Shuah Khan
On Sat, Dec 15, 2012 at 7:24 AM, Shuah Khan  wrote:
> On Fri, Dec 14, 2012 at 4:00 PM, Greg Kroah-Hartman
>  wrote:
>> Note: This is going to be the last 3.6.y kernel release, unless
>> something major comes up, everyone should be moving to the 3.7.y kernel
>> at this point in time.
>>
>> This is the start of the stable review cycle for the 3.6.11 release.
>> There are 37 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Sun Dec 16 22:16:49 UTC 2012.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.6.11-rc1.gz
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>
> Patches applied cleanly to 3.0.y, 3.4.y, 3.6.y, and 3.7.y.
> Compiled and booted on the following systems:
> HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400  @ 2.53GHz
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
>
> I started cross-compile tests and will report the status.
>
> -- Shuah

Cross-compile tests:
alpha: defconfig passed on all
arm: defconfig passed on all
c6x: not applicable to 3.0.y, defconfig passed on the rest three
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig failed on 3.0.y (known issue), passed on the rest three
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah


Re: [ 00/27] 3.7.1-stable review

2012-12-15 Thread Shuah Khan
On Sat, Dec 15, 2012 at 7:22 AM, Shuah Khan  wrote:
> On Fri, Dec 14, 2012 at 4:01 PM, Greg Kroah-Hartman
>  wrote:
>> This is the start of the stable review cycle for the 3.7.1 release.
>> There are 27 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Sun Dec 16 22:16:56 UTC 2012.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.7.1-rc1.gz
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>
> Patches applied cleanly to 3.0.y, 3.4.y, 3.6.y, and 3.7.y.
> Compiled and booted on the following systems:
> HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400  @ 2.53GHz
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
>
> I started cross-compile tests and will report the status
>
Cross-compile tests:
alpha: defconfig passed on all
arm: defconfig passed on all
c6x: not applicable to 3.0.y, defconfig passed on the rest three
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig failed on 3.0.y (known issue), passed on the rest three
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah


[PATCH 3.7.0 9/9] i82975x_edac: set sw-scrub mode, bump rev.

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 9/9] i82975x_edac: set sw-scrub mode, bump rev.

update revision number and enable software scrub mode.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 23:08:28.0 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 23:09:29.0 +0530
@@ -16,7 +16,7 @@
 #include 
 #include "edac_core.h"

-#define I82975X_REVISION   " Ver: 1.0.0"
+#define I82975X_REVISION   " Ver: 2.0.0"
 #define EDAC_MOD_STR   "i82975x_edac"

 #define i82975x_printk(level, fmt, arg...) \
@@ -586,7 +586,7 @@ static int i82975x_probe1(struct pci_dev
pvt = (struct i82975x_pvt *) mci->pvt_info;
pvt->chip = dev_idx;
i82975x_init_csrows(mci, mch_window);
-   mci->scrub_mode = SCRUB_HW_SRC;
+   mci->scrub_mode = SCRUB_SW_SRC;
i82975x_get_error_info(mci, );  /* clear counters */

/* finalize this instance of memory controller with edac core */


[PATCH 3.7.0 8/9] i82975x_edac: fix wrong offset reported

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 8/9] i82975x_edac: fix wrong offset reported

Cleanup error reporting function. This also corrects the wrong
calculation of the offset mask.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   59 +-
 1 file changed, 25 insertions(+), 34 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 23:55:02.0 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 23:54:00.0 +0530
@@ -34,6 +34,8 @@
 #define I82975X_NR_CSROWS_PER_CHANNEL   4
 #define I82975X_NR_CSROWS_PER_DIMM      2

+#define I82975X_ECC_GRAIN       (1 << 7)
+
 /* Intel 82975X register addresses - device 0 function 0 - DRAM Controller */
 #define I82975X_EAP     0x58    /* Dram Error Address Pointer (32b)
 *
@@ -205,6 +207,10 @@ NOTE: Only ONE of the three must be enab
 #define I82975X_DRC_CH0M1  0x124
 #define I82975X_DRC_CH1M1  0x1A4

+#define I82975X_BIT_ERROR_CE    0x01
+#define I82975X_BIT_ERROR_UE    0x02
+#define I82975X_BITS_ERROR      0x03
+
 enum i82975x_chips {
I82975X_chip = 0,
 };
@@ -239,7 +245,7 @@ static struct pci_dev *mci_pdev;/* init

 static int i82975x_registered = 1;

-static void i82975x_get_error_info(struct mem_ctl_info *mci,
+static bool i82975x_get_error_info(struct mem_ctl_info *mci,
struct i82975x_error_info *info)
 {
struct pci_dev *pdev;
@@ -258,7 +264,8 @@ static void i82975x_get_error_info(struc
pci_read_config_byte(pdev, I82975X_DERRSYN, &info->derrsyn);
pci_read_config_word(pdev, I82975X_ERRSTS, &info->errsts2);

-   pci_write_bits16(pdev, I82975X_ERRSTS, 0x0003, 0x0003);
+   pci_write_bits16(pdev, I82975X_ERRSTS, I82975X_BITS_ERROR,
+   I82975X_BITS_ERROR);

/*
 * If the error is the same then we can for both reads then
@@ -266,31 +273,30 @@ static void i82975x_get_error_info(struc
 * there is a CE no info and the second set of reads is valid
 * and should be UE info.
 */
-   if (!(info->errsts2 & 0x0003))
-   return;
+   if (!(info->errsts2 & I82975X_BITS_ERROR))
+   return false;

-   if ((info->errsts ^ info->errsts2) & 0x0003) {
+   if ((info->errsts ^ info->errsts2) & I82975X_BITS_ERROR) {
pci_read_config_dword(pdev, I82975X_EAP, &info->eap);
pci_read_config_byte(pdev, I82975X_XEAP, &info->xeap);
pci_read_config_byte(pdev, I82975X_DES, &info->des);
pci_read_config_byte(pdev, I82975X_DERRSYN,
&info->derrsyn);
}
+   return true;
 }

 static int i82975x_process_error_info(struct mem_ctl_info *mci,
struct i82975x_error_info *info, int handle_errors)
 {
+   enum hw_event_mc_err_type err_type;
int row, chan;
unsigned long offst, page;

-   if (!(info->errsts2 & 0x0003))
-   return 0;
-
if (!handle_errors)
return 1;

-   if ((info->errsts ^ info->errsts2) & 0x0003) {
+   if ((info->errsts ^ info->errsts2) & I82975X_BITS_ERROR) {
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1, 0, 0, 0,
 -1, -1, -1, "UE overwrote CE", "");
info->errsts = info->errsts2;
@@ -302,30 +308,15 @@ static int i82975x_process_error_info(st
page |= 0x8000;
page >>= (PAGE_SHIFT - 1);
row = edac_mc_find_csrow_by_page(mci, page);
+   chan = (mci->num_cschannel == 1) ? 0 : info->eap & 1;
+   offst = info->eap & ((1 << PAGE_SHIFT) - I82975X_ECC_GRAIN);

-   if (row == -1)  {
-   i82975x_mc_printk(mci, KERN_ERR, "error processing EAP:\n"
-   "\tXEAP=%u\n"
-   "\t EAP=0x%08x\n"
-   "\tPAGE=0x%08x\n",
-   (info->xeap & 1) ? 1 : 0, info->eap, (unsigned) page);
-   return 0;
-   }
-   chan = (mci->csrows[row]->nr_channels == 1) ? 0 : info->eap & 1;
-   offst = info->eap
-   & ((1 << PAGE_SHIFT) -
-   (1 << mci->csrows[row]->channels[chan]->dimm->grain));
-
-   if (info->errsts & 0x0002)
-   edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1,
-page, offst, 0,
-row, -1, -1,
-"i82975x UE", "");
-   else
-   edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 1,
+   err_type = (info->errsts & I82975X_BIT_ERROR_UE) ?
+   HW_EVENT_ERR_UNCORRECTED : HW_EVENT_ERR_CORRECTED;
+   edac_mc_handle_error(err_type, mci, 1,
 page, offst, info->derrsyn,
-row, chan ? chan : 0, -1,
-"i82975x CE", "");
+ 
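
The offset fix in this patch can be illustrated outside the kernel. The sketch below is a user-space approximation, not driver code: it assumes 4 KiB pages (PAGE_SHIFT of 12), and the names error_offset and ECC_GRAIN are hypothetical stand-ins. The grain is a size in bytes (128), so it is subtracted from the page size directly; shifting 1 by 128, which is what the old expression amounted to, is undefined for a 32-bit integer in C.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define ECC_GRAIN  (1u << 7)   /* 128-byte grain, mirroring I82975X_ECC_GRAIN */

/* The grain is a byte count; its shift is 7. Mixing the two up was the bug. */
static_assert(ECC_GRAIN == 128, "grain size is 128 bytes (shift of 7)");

/*
 * Hypothetical helper: keep only the in-page part of the error address,
 * rounded down to a grain boundary. The mask is 0x1000 - 0x80 = 0xF80.
 */
static uint32_t error_offset(uint32_t eap)
{
    return eap & ((1u << PAGE_SHIFT) - ECC_GRAIN);
}
```

With this mask the reported offset is always below the page size and always a multiple of the grain.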

[PATCH 3.7.0 7/9] i82975x_edac: correct dimm label initialisation

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 7/9] i82975x_edac: correct dimm label initialisation

DIMM labels are the legends on the mobo. Fix their initialisation
to correspond to the legends.
Channels are designated A/B. A single DIMM occupies 2 ranks. And
the first DIMM is 1, not 0. This is as found in the Asus P5WDG2 family
of mobos. This patch maps to that.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 22:32:00.0 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 22:42:00.0 +0530
@@ -32,6 +32,7 @@
 #define I82975X_NR_DIMMS   8
 #define I82975X_NR_CSROWS(nr_chans)(I82975X_NR_DIMMS / (nr_chans))
 #defineI82975X_NR_CSROWS_PER_CHANNEL   4
+#defineI82975X_NR_CSROWS_PER_DIMM  2

 /* Intel 82975X register addresses - device 0 function 0 - DRAM Controller */
 #define I82975X_EAP0x58/* Dram Error Address Pointer (32b)
@@ -339,13 +340,13 @@ static void i82975x_check(struct mem_ctl
 }

 static void i82975x_init_csrows(struct mem_ctl_info *mci,
-   struct pci_dev *pdev, void __iomem *mch_window)
+   void __iomem *mch_window)
 {
struct csrow_info *csrow;
unsigned long last_cumul_size;
u8 value;
u32 cumul_size, nr_pages;
-   int index, chan;
+   unsigned index, chan;
struct dimm_info *dimm;

last_cumul_size = 0;
@@ -370,7 +371,8 @@ static void i82975x_init_csrows(struct m
 * Adjust cumul_size w.r.t number of channels
 *
 */
-   if (csrow->nr_channels > 1)
+   if (mci->num_cschannel > 1)
+   /* dual_channel symmetric */
cumul_size <<= 1;
edac_dbg(3, "(%d) cumul_size 0x%x\n", index, cumul_size);

@@ -384,15 +386,18 @@ static void i82975x_init_csrows(struct m
 *   [0-7] for single-channel; i.e. csrow->nr_channels = 1
 *   [0-3] for dual-channel; i.e. csrow->nr_channels = 2
 */
-   for (chan = 0; chan < csrow->nr_channels; chan++) {
+   for (chan = 0; chan < mci->num_cschannel; chan++) {
dimm = mci->csrows[index]->channels[chan]->dimm;

-   dimm->nr_pages = nr_pages / csrow->nr_channels;
+   dimm->nr_pages = nr_pages / mci->num_cschannel;

snprintf(csrow->channels[chan]->dimm->label,
 EDAC_MC_LABEL_LEN, "DIMM %c%d",
-(chan == 0) ? 'A' : 'B',
-index);
+((mci->num_cschannel <= 1) ?
+   index / I82975X_NR_CSROWS_PER_CHANNEL :
+   chan) + 'A',
+((index % I82975X_NR_CSROWS_PER_CHANNEL) /
+   I82975X_NR_CSROWS_PER_DIMM) + 1);
dimm->grain = 1 << 7;   /* always */
dimm->dtype = DEV_X8;   /* only with ECC */
dimm->mtype = MEM_DDR2; /* only supported */
@@ -589,7 +594,7 @@ static int i82975x_probe1(struct pci_dev
mci->ctl_page_to_phys = NULL;
pvt = (struct i82975x_pvt *) mci->pvt_info;
pvt->chip = dev_idx;
-   i82975x_init_csrows(mci, pdev, mch_window);
+   i82975x_init_csrows(mci, mch_window);
mci->scrub_mode = SCRUB_HW_SRC;
i82975x_get_error_info(mci, );  /* clear counters */
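
The label arithmetic in the snprintf() call above can be checked in isolation. The following is a minimal user-space sketch under the patch's stated assumptions (2 ranks per DIMM, 4 csrows per channel, DIMMs not spanning channels); the function name dimm_label and the shortened macro names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CSROWS_PER_CHANNEL 4
#define NR_CSROWS_PER_DIMM    2

/*
 * Hypothetical helper mirroring the patch's label expressions.
 * Dual channel: the letter comes from the channel number.
 * Single channel: csrows 0-3 map to 'A', csrows 4-7 map to 'B'.
 * Two consecutive csrows (ranks) share one DIMM; DIMMs count from 1.
 */
static void dimm_label(char *buf, size_t len, unsigned num_channels,
                       unsigned index, unsigned chan)
{
    unsigned letter = (num_channels <= 1) ?
                      index / NR_CSROWS_PER_CHANNEL : chan;
    unsigned number = (index % NR_CSROWS_PER_CHANNEL) /
                      NR_CSROWS_PER_DIMM + 1;

    assert(letter < 2);   /* only channels A and B exist */
    snprintf(buf, len, "DIMM %c%u", (char)('A' + letter), number);
}
```

For example, in dual-channel mode csrow 3 on channel 1 yields "DIMM B2", and in single-channel mode csrow 5 yields "DIMM B1".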


[PATCH 3.7.0 6/9] i82975x_edac: unmap pcibar after init

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 6/9] i82975x_edac: unmap pcibar after init

Remove the unnecessary mapped window in private data structure.
Then the window can be unmapped right after driver initialisation
is done.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   24 
 1 file changed, 8 insertions(+), 16 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-16 00:09:23.0 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-16 00:08:52.0 +0530
@@ -205,11 +205,11 @@ NOTE: Only ONE of the three must be enab
 #define I82975X_DRC_CH1M1  0x1A4

 enum i82975x_chips {
-   I82975X = 0,
+   I82975X_chip = 0,
 };

 struct i82975x_pvt {
-   void __iomem *mch_window;
+   enum i82975x_chips chip;
 };

 struct i82975x_dev_info {
@@ -227,7 +227,7 @@ struct i82975x_error_info {
 };

 static const struct i82975x_dev_info i82975x_devs[] = {
-   [I82975X] = {
+   [I82975X_chip] = {
.ctl_name = "i82975x"
},
 };
@@ -588,7 +588,7 @@ static int i82975x_probe1(struct pci_dev
mci->edac_check = i82975x_check;
mci->ctl_page_to_phys = NULL;
pvt = (struct i82975x_pvt *) mci->pvt_info;
-   pvt->mch_window = mch_window;
+   pvt->chip = dev_idx;
i82975x_init_csrows(mci, pdev, mch_window);
mci->scrub_mode = SCRUB_HW_SRC;
i82975x_get_error_info(mci, );  /* clear counters */
@@ -596,15 +596,13 @@ static int i82975x_probe1(struct pci_dev
/* finalize this instance of memory controller with edac core */
if (edac_mc_add_mc(mci)) {
edac_dbg(0, "MC%d failed add_mc()\n", dev_idx);
-   goto fail2;
+   edac_mc_free(mci);
+   goto fail1;
}

/* get this far and it's successful */
edac_dbg(3, "MC%d success\n", dev_idx);
-   return 0;
-
-fail2:
-   edac_mc_free(mci);
+   rc = 0;

 fail1:
iounmap(mch_window);
@@ -632,23 +630,17 @@ static int __devinit i82975x_init_one(st
 static void __devexit i82975x_remove_one(struct pci_dev *pdev)
 {
struct mem_ctl_info *mci;
-   struct i82975x_pvt *pvt;

mci = edac_mc_del_mc(&pdev->dev);
if (mci  == NULL)
return;
-
-   pvt = mci->pvt_info;
-   if (pvt->mch_window)
-   iounmap( pvt->mch_window );
-
edac_mc_free(mci);
 }

 static DEFINE_PCI_DEVICE_TABLE(i82975x_pci_tbl) = {
{
PCI_VEND_DEV(INTEL, 82975_0), PCI_ANY_ID, PCI_ANY_ID, 0, 0,
-   I82975X
+   I82975X_chip
},
{
0,
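
The reworked exit path in i82975x_probe1() above (success and failure both falling through the fail1 label so the window is always unmapped) follows a common single-exit cleanup pattern. A minimal user-space sketch of that pattern, with a counter standing in for ioremap()/iounmap(); probe_sketch and mapped_windows are hypothetical names, not driver symbols:

```c
#include <assert.h>

static int mapped_windows;   /* stands in for live ioremap() mappings */

/*
 * Hypothetical probe: the mapping is only needed during initialisation,
 * so both the success and the failure path release it before returning.
 */
static int probe_sketch(int registration_fails)
{
    int rc = -1;             /* pessimistic default, like rc = -ENODEV */

    mapped_windows++;        /* "ioremap" the window */

    if (registration_fails)
        goto unmap;          /* rc keeps its error value */

    rc = 0;                  /* success, then fall through to cleanup */

unmap:
    assert(mapped_windows > 0);
    mapped_windows--;        /* "iounmap" on every path */
    return rc;
}
```

Because nothing keeps the mapping after probe, the remove path no longer has anything to unmap, which matches the simplification of i82975x_remove_one() in the patch.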


[PATCH 3.7.0 5/9] i82975x_edac: optimise mode detection

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 5/9] i82975x_edac: optimise mode detection

Minor optimisation of dual channel symmetric operation. Return
value changed to bool.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   45 --
 1 file changed, 22 insertions(+), 23 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 22:14:18.0 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 22:17:12.0 +0530
@@ -31,6 +31,7 @@

 #define I82975X_NR_DIMMS   8
 #define I82975X_NR_CSROWS(nr_chans)(I82975X_NR_DIMMS / (nr_chans))
+#defineI82975X_NR_CSROWS_PER_CHANNEL   4

 /* Intel 82975X register addresses - device 0 function 0 - DRAM Controller */
 #define I82975X_EAP0x58/* Dram Error Address Pointer (32b)
@@ -337,29 +338,6 @@ static void i82975x_check(struct mem_ctl
i82975x_process_error_info(mci, &info, 1);
 }

-/* Return 1 if dual channel mode is active.  Else return 0. */
-static int dual_channel_active(void __iomem *mch_window)
-{
-   /*
-* We treat interleaved-symmetric configuration as dual-channel - EAP's
-* bit-0 giving the channel of the error location.
-*
-* All other configurations are treated as single channel - the EAP's
-* bit-0 will resolve ok in symmetric area of mixed
-* (symmetric/asymmetric) configurations
-*/
-   u8  drb[4][2];
-   int row;
-   intdualch;
-
-   for (dualch = 1, row = 0; dualch && (row < 4); row++) {
-   drb[row][0] = readb(mch_window + I82975X_DRB + row);
-   drb[row][1] = readb(mch_window + I82975X_DRB + row + 0x80);
-   dualch = dualch && (drb[row][0] == drb[row][1]);
-   }
-   return dualch;
-}
-
 static void i82975x_init_csrows(struct mem_ctl_info *mci,
struct pci_dev *pdev, void __iomem *mch_window)
 {
@@ -527,6 +505,27 @@ static void i82975x_print_dram_timings(v
 }
 #endif

+/* Return 1 if dual channel mode is active.  Else return 0. */
+static bool dual_channel_active(void __iomem *mch_window)
+{
+   /*
+* We treat interleaved-symmetric configuration as dual-channel.
+* All other configurations are virtual single channel mode.
+* bit-0 of EAP always provides the real channel in error.
+*/
+   u8  drb[2];
+   int row;
+   booldualch;
+
+   for (dualch = 1, row = 0; dualch &&
+   (row < I82975X_NR_CSROWS_PER_CHANNEL); row++) {
+   drb[0] = readb(mch_window + I82975X_DRB + row);
+   drb[1] = readb(mch_window + I82975X_DRB + row + 0x80);
+   dualch &= (drb[0] == drb[1]);
+   }
+   return dualch;
+}
+
 static int i82975x_probe1(struct pci_dev *pdev, int dev_idx)
 {
int rc = -ENODEV;
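
The symmetric-mode test in dual_channel_active() boils down to comparing the rank boundary registers of the two channels row by row. Below is a user-space sketch of that comparison, assuming the DRB values have already been read into plain arrays (the driver reads them with readb() from the mapped window); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CSROWS_PER_CHANNEL 4

/*
 * Hypothetical helper: the configuration is treated as dual-channel
 * symmetric only if every rank boundary of channel 0 equals the
 * corresponding boundary of channel 1. Stops at the first mismatch.
 */
static bool dual_channel_symmetric(const uint8_t *drb_ch0,
                                   const uint8_t *drb_ch1)
{
    assert(drb_ch0 != NULL && drb_ch1 != NULL);

    for (int row = 0; row < NR_CSROWS_PER_CHANNEL; row++)
        if (drb_ch0[row] != drb_ch1[row])
            return false;
    return true;
}
```

An early return expresses the same short-circuit as the `dualch &&` loop condition in the patch.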


Re: [PATCH 162/241] Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027]

2012-12-15 Thread Herton Ronaldo Krzesinski
On Sat, Dec 15, 2012 at 07:59:18PM +, Ben Hutchings wrote:
> On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> > 3.5.7.2 -stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Marcos Chaparro 
> > 
> > commit acd9454433e28c1a365d8b069813c35c1c3a8ac3 upstream.
> > 
> > Added Atheros AR3011 internal bluetooth device found in Sony VAIO VPCEH to
> > the devices list.
> > Before this, the bluetooth module was identified as an Foxconn / Hai
> > bluetooth device [0489:e027], now it claims to be an AtherosAR3011 Bluetooth
> > [0cf3:3005].
> [...]
> 
> This seems to be applicable to 3.{0,2,4,6}.y as well...

While we're here, you may also want to consider adding the following to
the other stable trees (where applicable; I didn't verify exactly which
versions may need them):

[163/241] drm/i915: EBUSY status handling added to i915_gem_fault().
(reference: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1087302)

[164/241] MISC: hpilo, remove pci_disable_device
(reference: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1087860)

These are the bugfixes I recall being requested directly from me that
haven't yet come through as a stable mailing list request.

> 
> Ben.
> 
> -- 
> Ben Hutchings
> Theory and practice are closer in theory than in practice.
> - John Levine, moderator of comp.compilers

-- 
[]'s
Herton


[PATCH 3.7.0 4/9] i82975x_edac.c: remove unnecessary function

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 4/9] i82975x_edac.c: remove unnecessary function

remove function that returns a constant value and variable to
hold the returned value.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 20:19:16 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 20:19:02 +0530
@@ -360,14 +360,6 @@ static int dual_channel_active(void __io
return dualch;
 }

-static enum dev_type i82975x_dram_type(void __iomem *mch_window, int rank)
-{
-   /*
-* ECC is possible on i92975x ONLY with DEV_X8
-*/
-   return DEV_X8;
-}
-
 static void i82975x_init_csrows(struct mem_ctl_info *mci,
struct pci_dev *pdev, void __iomem *mch_window)
 {
@@ -377,7 +369,6 @@ static void i82975x_init_csrows(struct m
u32 cumul_size, nr_pages;
int index, chan;
struct dimm_info *dimm;
-   enum dev_type dtype;

last_cumul_size = 0;

@@ -415,7 +406,6 @@ static void i82975x_init_csrows(struct m
 *   [0-7] for single-channel; i.e. csrow->nr_channels = 1
 *   [0-3] for dual-channel; i.e. csrow->nr_channels = 2
 */
-   dtype = i82975x_dram_type(mch_window, index);
for (chan = 0; chan < csrow->nr_channels; chan++) {
dimm = mci->csrows[index]->channels[chan]->dimm;

@@ -426,7 +416,7 @@ static void i82975x_init_csrows(struct m
 (chan == 0) ? 'A' : 'B',
 index);
dimm->grain = 1 << 7;   /* always */
-   dimm->dtype = i82975x_dram_type(mch_window, index);
+   dimm->dtype = DEV_X8;   /* only with ECC */
dimm->mtype = MEM_DDR2; /* only supported */
dimm->edac_mode = EDAC_SECDED; /* only supported */
}


[PATCH 3.7.0 3/9] i82975x_edac.c: cleanup debug code

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 3/9] i82975x_edac.c: cleanup debug code

modify debug levels to sane levels. Also move random debug code
into CONFIG_EDAC_DEBUG sections.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |  171 +++---
 1 file changed, 97 insertions(+), 74 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 20:18:24 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 20:17:58 +0530
@@ -167,7 +167,8 @@ NOTE: Only ONE of the three must be enab
 #define I82975X_C0BNKARC   0x10e
 #define I82975X_C1BNKARC   0x18e

-
+#define I82975X_C0DRT1 0x114
+#define I82975X_C1DRT1 0x194

 #define I82975X_DRC0x120 /* DRAM Controller Mode0 (32b)
 *
@@ -331,7 +332,7 @@ static void i82975x_check(struct mem_ctl
 {
struct i82975x_error_info info;

-   edac_dbg(1, "MC%d\n", mci->mc_idx);
+   edac_dbg(4, "MC%d\n", mci->mc_idx);
i82975x_get_error_info(mci, &info);
i82975x_process_error_info(mci, &info, 1);
 }
@@ -436,27 +437,93 @@ static void i82975x_init_csrows(struct m
}
 }

-/* #define  i82975x_DEBUG_IOMEM */
-
-#ifdef i82975x_DEBUG_IOMEM
-static void i82975x_print_dram_timings(void __iomem *mch_window)
-{
-   /*
-* The register meanings are from Intel specs;
-* (shows 13-5-5-5 for 800-DDR2)
-* Asus P5W Bios reports 15-5-4-4
-* What's your religion?
-*/
+#ifdef CONFIG_EDAC_DEBUG
+static void i82975x_print_dram_settings(void __iomem *mch_window,
+   u32 mchbar, u32 *drc, bool is_symmetric)
+{
+   static const char *refresh_modes[8] = {
+   "disabled",
+   "15.6 uSec", "7.8 uSec", "3.9 uSec", "1.95 uSec",
+   "reserved", "reserved",
+   "fast refresh (64 clocks)"
+   };
+   static const char *rank_attr[8] = {
+   "empty   ", "reserved",
+   "4 Kb", "8 Kb", "16 Kb   ",
+   "reserved", "reserved", "reserved"
+   };
static const int caslats[4] = { 5, 4, 3, 6 };
u32 dtreg[2];
+   u8 drb[4];
+   u8 dra[2][2];
+
+   /* Show memory config if debug level is 1 or upper */
+   if (!edac_debug_level)
+   return;
+
+   i82975x_printk(KERN_INFO, "MCHBAR real = %0x, remapped = %p\n",
+   mchbar, mch_window);
+
+   drb[0] = readb(mch_window + I82975X_DRB_CH0R0);
+   drb[1] = readb(mch_window + I82975X_DRB_CH0R1);
+   drb[2] = readb(mch_window + I82975X_DRB_CH0R2);
+   drb[3] = readb(mch_window + I82975X_DRB_CH0R3);
+   i82975x_printk(KERN_INFO, "DRBCH0R0 = 0x%02x\n", drb[0]);
+   i82975x_printk(KERN_INFO, "DRBCH0R1 = 0x%02x\n", drb[1]);
+   i82975x_printk(KERN_INFO, "DRBCH0R2 = 0x%02x\n", drb[2]);
+   i82975x_printk(KERN_INFO, "DRBCH0R3 = 0x%02x\n\n", drb[3]);
+   drb[0] = readb(mch_window + I82975X_DRB_CH1R0);
+   drb[1] = readb(mch_window + I82975X_DRB_CH1R1);
+   drb[2] = readb(mch_window + I82975X_DRB_CH1R2);
+   drb[3] = readb(mch_window + I82975X_DRB_CH1R3);
+   i82975x_printk(KERN_INFO, "DRBCH1R0 = 0x%02x\n", drb[0]);
+   i82975x_printk(KERN_INFO, "DRBCH1R1 = 0x%02x\n", drb[1]);
+   i82975x_printk(KERN_INFO, "DRBCH1R2 = 0x%02x\n", drb[2]);
+   i82975x_printk(KERN_INFO, "DRBCH1R3 = 0x%02x\n", drb[3]);
+   i82975x_printk(KERN_INFO, "Memory in %ssymmetric mode\n",
+   is_symmetric ? "" : "as");

-   dtreg[0] = readl(mch_window + 0x114);
-   dtreg[1] = readl(mch_window + 0x194);
+   i82975x_printk(KERN_INFO, "DRC_CH0 = %0x, %s\n", drc[0],
+   ((drc[0] >> 21) & 3) == 1 ?
+   "ECC enabled" : "ECC disabled");
+   i82975x_printk(KERN_INFO, "DRC_CH1 = %0x, %s\n", drc[1],
+   ((drc[1] >> 21) & 3) == 1 ?
+   "ECC enabled" : "ECC disabled");
+
+   dra[0][0] = readb(mch_window + I82975X_DRA_CH0R01);
+   dra[0][1] = readb(mch_window + I82975X_DRA_CH0R23);
+   dra[1][0] = readb(mch_window + I82975X_DRA_CH1R01);
+   dra[1][1] = readb(mch_window + I82975X_DRA_CH1R23);
+   i82975x_printk(KERN_INFO, "Rank Attribute:\n"
+   " Rank:  0123\n"
+   "  Ch0:  %s   %s   %s   %s\n"
+   "  Ch1:  %s   %s   %s   %s\n",
+   rank_attr[dra[0][0] & 7],
+   rank_attr[(dra[0][0] >> 4) & 7],
+   rank_attr[dra[0][1] & 7],
+   rank_attr[(dra[0][1] >> 4) & 7],
+   rank_attr[dra[1][0] & 7],
+   rank_attr[(dra[1][0] >> 4) & 7],
+   rank_attr[dra[1][1] & 7],
+   rank_attr[(dra[1][1] >> 4) & 7]);
+
+   i82975x_printk(KERN_INFO, "Bank Architecture:\n"
+   "  2 bits 

[PATCH 3.7.0 2/9] i82975x_edac.c: fix layers initialisation

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 2/9] i82975x_edac.c: fix layers initialisation

correct the absolutely wrong initialisation of memory layout.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-15 16:17:28 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 16:16:51 +0530
@@ -544,10 +544,10 @@ static int i82975x_probe1(struct pci_dev

/* assuming only one controller, index thus is 0 */
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
-   layers[0].size = I82975X_NR_DIMMS;
+   layers[0].size = I82975X_NR_CSROWS(chans);
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;
-   layers[1].size = I82975X_NR_CSROWS(chans);
+   layers[1].size = chans;
layers[1].is_virt_csrow = false;
mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
if (!mci) {
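
The effect of the corrected layer sizes can be checked with a little arithmetic. With the old values the two layers described 8 x 4 = 32 slots (dual channel) or 8 x 8 = 64 slots (single channel); with the new values the product is always 8. A small sketch, reusing the driver's two macros; the struct and function names are hypothetical:

```c
#include <assert.h>

#define I82975X_NR_DIMMS            8
#define I82975X_NR_CSROWS(nr_chans) (I82975X_NR_DIMMS / (nr_chans))

/* Hypothetical pair holding the two layer sizes passed to the EDAC core. */
struct layer_sizes {
    unsigned csrows;     /* EDAC_MC_LAYER_CHIP_SELECT size */
    unsigned channels;   /* EDAC_MC_LAYER_CHANNEL size */
};

/* Layer sizes as set by the patch: csrows shrink as channels grow. */
static struct layer_sizes layout_for(unsigned chans)
{
    struct layer_sizes sizes;

    assert(chans == 1 || chans == 2);
    sizes.csrows = I82975X_NR_CSROWS(chans);
    sizes.channels = chans;
    return sizes;
}
```

Single channel gives 8 csrows x 1 channel and dual channel gives 4 csrows x 2 channels, so there are always 8 ranks.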


[PATCH 3.7.0 1/9] i82975x_edac.c: fix style errors

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 1/9] i82975x_edac.c: fix style errors

splits or shortens extra long lines in source.
Signed-off-by: Arvind R. 
---
 i82975x_edac.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/drivers/edac/i82975x_edac.c   2012-12-11 09:00:57 +0530
+++ b/drivers/edac/i82975x_edac.c   2012-12-15 16:01:29 +0530
@@ -48,7 +48,7 @@
 #define I82975X_DES0x5d/* Dram ERRor DeSTination (8b)
 * 0h:Processor Memory Reads
 * 1h:7h  reserved
-* More - See Page 65 of Intel DocSheet.
+* More - See Pg.65 of Intel DocSheet.
 */

 #define I82975X_ERRSTS 0xc8/* Error Status Register (16b)
@@ -98,7 +98,7 @@ NOTE: Only ONE of the three must be enab
 #define I82975X_XEAP   0xfc/* Extended Dram Error Address Pointer (8b)
 *
 * 7:1   reserved
-* 0 Bit32 of the Dram Error Address
+* 0 Bit32 of Dram Error Address
 */

 #define I82975X_MCHBAR 0x44/*
@@ -305,13 +305,13 @@ static int i82975x_process_error_info(st
"\tXEAP=%u\n"
"\t EAP=0x%08x\n"
"\tPAGE=0x%08x\n",
-   (info->xeap & 1) ? 1 : 0, info->eap, (unsigned int) page);
+   (info->xeap & 1) ? 1 : 0, info->eap, (unsigned) page);
return 0;
}
chan = (mci->csrows[row]->nr_channels == 1) ? 0 : info->eap & 1;
offst = info->eap
& ((1 << PAGE_SHIFT) -
-  (1 << mci->csrows[row]->channels[chan]->dimm->grain));
+   (1 << mci->csrows[row]->channels[chan]->dimm->grain));

if (info->errsts & 0x0002)
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 1,
@@ -420,12 +420,13 @@ static void i82975x_init_csrows(struct m

dimm->nr_pages = nr_pages / csrow->nr_channels;

-   snprintf(csrow->channels[chan]->dimm->label, EDAC_MC_LABEL_LEN, "DIMM %c%d",
+   snprintf(csrow->channels[chan]->dimm->label,
+EDAC_MC_LABEL_LEN, "DIMM %c%d",
 (chan == 0) ? 'A' : 'B',
 index);
-   dimm->grain = 1 << 7;   /* 128Byte cache-line resolution */
+   dimm->grain = 1 << 7;   /* always */
dimm->dtype = i82975x_dram_type(mch_window, index);
-   dimm->mtype = MEM_DDR2; /* I82975x supports only DDR2 */
+   dimm->mtype = MEM_DDR2; /* only supported */
dimm->edac_mode = EDAC_SECDED; /* only supported */
}


Re: [PATCH] omap_vout: find_vma() needs ->mmap_sem held

2012-12-15 Thread Al Viro
On Sat, Dec 15, 2012 at 08:12:37PM +, Al Viro wrote:
>   Walking rbtree while it's modified is a Bad Idea(tm); besides,
> the result of find_vma() can be freed just as it's getting returned
> to caller.  Fortunately, it's easy to fix - just take ->mmap_sem a bit
> earlier (and don't bother with find_vma() at all if virtp >= PAGE_OFFSET -
> in that case we don't even look at its result).

While we are at it, what prevents VIDIOC_PREPARE_BUF calling
v4l_prepare_buf() -> (e.g) vb2_ioctl_prepare_buf() -> vb2_prepare_buf() ->
__buf_prepare() -> __qbuf_userptr() -> vb2_vmalloc_get_userptr() -> find_vma(),
AFAICS without having taken ->mmap_sem anywhere in process?  The code flow
is bloody convoluted and depends on a bunch of things done by initialization,
so I certainly might've missed something...


[PATCH 3.7.0 0/9] i82975x_edac: driver cleanup

2012-12-15 Thread Arvind R
Subject: [PATCH 3.7.0 0/9] i82975x_edac: driver cleanup

This patchset cleans up the accumulated mess the driver has become.
Currently, it does not crash, but serves no other purpose. This patch-set
gets it to print correct DIMM labels on errors, and sync with the core
w.r.t memory layout.

It consists of 9 patches as follows:
1. fix style errors:
clean up the source w.r.t. long lines

2. fix layers initialisation:
the wrong initialisation caused the fatal 3.6 crash that has been
temporarily fixed. The csrow_init func did not handle rows
exceeding 8 in number.
This patch sets channels as 1 or 2 and chip_select accordingly, so
that there are always 8 ranks of memory.

3. cleanup debug code:
Remove / modify debug levels to sane values. Fixes log flooding when
CONFIG_EDAC_DEBUG is set.
Move local ifdef debug code to print DRAM settings into
CONFIG_EDAC_DEBUG section.

4. remove unnecessary function:
function returning a constant DEV_X8 value and used only once, removed.

5. optimise mode detection:
dual_channel_active now returns bool. With minor optimisations.

6. unmap pcibar after init:
remove the unused iomapped mch_window in private data structure, and
unmap the window after initialisation. Now that __devinit is deprecated,
this is needed.

7. correct dimm label initialisation:
fix dimm labels to correspond to mobo legends. Assumes 2 ranks per DIMM
and DIMMs not spanning channels, as it is on the Asus P5WDG2 family at
least. Currently, the label bears no correspondence to mobo legends.

8. fix wrong offset reported:
fixes the sometimes greater than PAGESIZE offset reporting caused by a
mixup of grain-size and grain-shift.

9. set SW-SCRUB mode and bump revision:
With the error address being correct in the core's view,
enable SCRUB_SW_SRC so CEs can be written back.
And update driver revision.

Signed-off-by: Arvind R. 
---
Total changes:
 i82975x_edac.c |  349 +-
  1 file changed, 175 insertions(+), 174 deletions(-)


[PATCH] asm-generic, mm: pgtable: fix include for my_zero_pfn()

2012-12-15 Thread Deng-Cheng Zhu
From: Deng-Cheng Zhu 

A MIPS build showed:

In file included from arch/mips/include/asm/pgtable.h:388,
 from mm/init-mm.c:9:
include/asm-generic/pgtable.h: In function 'my_zero_pfn':
include/asm-generic/pgtable.h:462: error: 'mem_map' undeclared (first use
in this function)
include/asm-generic/pgtable.h:462: error: (Each undeclared identifier is
reported only once
include/asm-generic/pgtable.h:462: error: for each function it appears in.)

This was caused by the following commit:
816422ad76 asm-generic, mm: pgtable: consolidate zero page helpers

Changing my_zero_pfn from #define to an inline function requires the
include fix. I believe s390 has the same problem as mips.

Although Ralf has added "#include " in
arch/mips/include/asm/pgtable.h in his "MIPS: Transparent Huge Pages
support" commit, and this error went away, I think this fix is needed
since asm-generic/pgtable.h is now the place of the function my_zero_pfn()
who requires the definition of mem_map and this header could be included by
others.

Cc: Kirill A. Shutemov 
Cc: Ralf Baechle 
Cc: Steven J. Hill 
Signed-off-by: Deng-Cheng Zhu 
---
 include/asm-generic/pgtable.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 284e808..628dbbb 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -4,7 +4,7 @@
 #ifndef __ASSEMBLY__
 #ifdef CONFIG_MMU
 
-#include 
+#include 
 #include 
 
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
-- 
1.7.1
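
The reason the #define-to-inline change exposes a missing include can be shown with a small sketch. A macro body is only compiled where it is expanded, so an undeclared identifier inside it goes unnoticed until some file actually uses the macro; an inline function body is compiled wherever the header is included, so everything it names must already be declared. All names below are hypothetical stand-ins, and the pfn arithmetic only loosely imitates a flat memory map.

```c
#include <assert.h>

/*
 * Never expanded in this file, so the compiler never looks up
 * undeclared_mem_map and the translation unit still builds.
 */
#define ZERO_PFN_MACRO_SKETCH(p) ((unsigned long)((p) - undeclared_mem_map))

/* For the inline version, the map must be declared before this point. */
struct page_sketch { int flags; };
static struct page_sketch pages_sketch[4];
static struct page_sketch *mem_map_sketch = pages_sketch;

/* Hypothetical analogue of my_zero_pfn(): index of a page in the map. */
static inline unsigned long zero_pfn_sketch(const struct page_sketch *zero_page)
{
    assert(zero_page >= mem_map_sketch);
    return (unsigned long)(zero_page - mem_map_sketch);
}
```

Moving the declaration of mem_map_sketch below zero_pfn_sketch() would reproduce the class of "undeclared" error quoted in the build log, while the macro would keep compiling.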



[PATCH] omap_vout: find_vma() needs ->mmap_sem held

2012-12-15 Thread Al Viro
Walking rbtree while it's modified is a Bad Idea(tm); besides,
the result of find_vma() can be freed just as it's getting returned
to caller.  Fortunately, it's easy to fix - just take ->mmap_sem a bit
earlier (and don't bother with find_vma() at all if virtp >= PAGE_OFFSET -
in that case we don't even look at its result).

Cc: sta...@vger.kernel.org [2.6.35]
Signed-off-by: Al Viro 
---
diff --git a/drivers/media/platform/omap/omap_vout.c b/drivers/media/platform/omap/omap_vout.c
index 9935040..984512f 100644
--- a/drivers/media/platform/omap/omap_vout.c
+++ b/drivers/media/platform/omap/omap_vout.c
@@ -207,19 +207,21 @@ static u32 omap_vout_uservirt_to_phys(u32 virtp)
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
 
-   vma = find_vma(mm, virtp);
/* For kernel direct-mapped memory, take the easy way */
-   if (virtp >= PAGE_OFFSET) {
-   physp = virt_to_phys((void *) virtp);
+   if (virtp >= PAGE_OFFSET)
+   return virt_to_phys((void *) virtp);
+
+   down_read(>mm->mmap_sem);
+   vma = find_vma(mm, virtp);
} else if (vma && (vma->vm_flags & VM_IO) && vma->vm_pgoff) {
/* this will catch, kernel-allocated, mmaped-to-usermode
   addresses */
physp = (vma->vm_pgoff << PAGE_SHIFT) + (virtp - vma->vm_start);
+   up_read(>mm->mmap_sem);
} else {
/* otherwise, use get_user_pages() for general userland pages */
int res, nr_pages = 1;
struct page *pages;
-   down_read(>mm->mmap_sem);
 
res = get_user_pages(current, current->mm, virtp, nr_pages, 1,
0, , NULL);


Re: [PATCH 167/241] SUNRPC: Set alloc_slot for backchannel tcp ops

2012-12-15 Thread Ben Hutchings
On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7.2 -stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Bryan Schumaker 
> 
> commit 84e28a307e376f271505af65a7b7e212dd6f61f4 upstream.
> 
> f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport
> regression) introduced the "alloc_slot" function for xprt operations,
> but never created one for the backchannel operations.  This patch fixes
> a null pointer dereference when mounting NFS over v4.1.
[...]

Greg, you missed this in 3.4.y.  It might need a context fix; I'm
attaching the version I used for 3.2.y.

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
From: Bryan Schumaker 
Date: Mon, 24 Sep 2012 13:39:01 -0400
Subject: SUNRPC: Set alloc_slot for backchannel tcp ops

commit 84e28a307e376f271505af65a7b7e212dd6f61f4 upstream.

f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations.  This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
 [] ? xprt_reserve+0x47/0x50 [sunrpc]
 [] call_reserve+0x34/0x60 [sunrpc]
 [] __rpc_execute+0x90/0x400 [sunrpc]
 [] rpc_async_schedule+0x2a/0x40 [sunrpc]
 [] process_one_work+0x139/0x500
 [] ? alloc_worker+0x70/0x70
 [] ? __rpc_execute+0x400/0x400 [sunrpc]
 [] worker_thread+0x15e/0x460
 [] ? preempt_schedule+0x49/0x70
 [] ? rescuer_thread+0x230/0x230
 [] kthread+0x93/0xa0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_freezable_should_stop+0x70/0x70
 [] ? gs_change+0x13/0x13

Signed-off-by: Bryan Schumaker 
Signed-off-by: Trond Myklebust 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
 net/sunrpc/xprtsock.c |1 +
 1 file changed, 1 insertion(+)

--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2488,6 +2488,7 @@ static struct rpc_xprt_ops xs_tcp_ops =
 static struct rpc_xprt_ops bc_tcp_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xprt_release_xprt,
+	.alloc_slot		= xprt_alloc_slot,
 	.buf_alloc		= bc_malloc,
 	.buf_free		= bc_free,
 	.send_request		= bc_send_request,




Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Markus Trippelsdorf
On 2012.12.15 at 11:58 -0800, Linus Torvalds wrote:
> On Sat, Dec 15, 2012 at 11:41 AM, H. Peter Anvin  wrote:
> >
> > Matt is on vacation, and I'm partly offline for the weekend, but that
> > definitely seems suspicious.  Do we have a memory map of the affected
> > machine(s)?
> 
> Here's mine.
> 
>   e820: BIOS-provided physical RAM map:
>   BIOS-e820: [mem 0x-0x0009e7ff] usable
>   BIOS-e820: [mem 0x0009e800-0x0009] reserved
>   BIOS-e820: [mem 0x000e4000-0x000f] reserved
>   BIOS-e820: [mem 0x0010-0xbdc6] usable
>   BIOS-e820: [mem 0xbdc7-0xbdc87fff] ACPI data
>   BIOS-e820: [mem 0xbdc88000-0xbdcdbfff] ACPI NVS
>   BIOS-e820: [mem 0xbdcdc000-0xbfff] reserved
>   BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
>   BIOS-e820: [mem 0xff80-0x] reserved
>   BIOS-e820: [mem 0x0001-0x0001fbff] usable
>   BIOS-e820: [mem 0x0001fc00-0x0001] reserved
>   BIOS-e820: [mem 0x0002-0x00023fff] usable
> 
> but as mentioned, there's bound to be some particular kernel layout
> that triggers this, because I definitely ran a few kernels with that
> commit in it without problems (and clearly other people are too).
> Looking at my boot log, I had successful boots with both 6a57d104c8cb
> and c2714334b944, which contains that commit.
> 
> It might also be that it causes some massive corruption at boot time,
> but it then requires that that particular memory is actually used. So
> maybe it's not so much about the memory map except indirectly.
> 
> But that commit *does* look a lot more likely than the things I looked at.
> 
> Markus, how did you happen to pinpoint that particular commit? Is it
> entirely repeatable for you?

Yes, although at one point during bisecting the BUG disappeared and the
screen went simply black during boot and X never started. I marked this
as bad and continued the bisection.

Here is my mem-map:

e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0100-0x0009fbff] usable
BIOS-e820: [mem 0x0009fc00-0x0009] reserved
BIOS-e820: [mem 0x000e6000-0x000f] reserved
BIOS-e820: [mem 0x0010-0xdfe8] usable
BIOS-e820: [mem 0xdfe9-0xdfea7fff] ACPI data
BIOS-e820: [mem 0xdfea8000-0xdfec] ACPI NVS
BIOS-e820: [mem 0xdfed-0xdfef] reserved
BIOS-e820: [mem 0xfff0-0x] reserved
BIOS-e820: [mem 0x0001-0x00021fff] usable

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 162/241] Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027]

2012-12-15 Thread Ben Hutchings
On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7.2 -stable review patch.  If anyone has any objections, please let me 
> know.
> 
> --
> 
> From: Marcos Chaparro 
> 
> commit acd9454433e28c1a365d8b069813c35c1c3a8ac3 upstream.
> 
> Added Atheros AR3011 internal bluetooth device found in Sony VAIO VPCEH to the
> devices list.
> Before this, the bluetooth module was identified as a Foxconn / Hai bluetooth
> device [0489:e027], now it claims to be an Atheros AR3011 Bluetooth
> [0cf3:3005].
[...]

This seems to be applicable to 3.{0,2,4,6}.y as well...

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers




Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Linus Torvalds
On Sat, Dec 15, 2012 at 11:41 AM, H. Peter Anvin  wrote:
>
> Matt is on vacation, and I'm partly offline for the weekend, but that
> definitely seems suspicious.  Do we have a memory map of the affected
> machine(s)?

Here's mine.

  e820: BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x-0x0009e7ff] usable
  BIOS-e820: [mem 0x0009e800-0x0009] reserved
  BIOS-e820: [mem 0x000e4000-0x000f] reserved
  BIOS-e820: [mem 0x0010-0xbdc6] usable
  BIOS-e820: [mem 0xbdc7-0xbdc87fff] ACPI data
  BIOS-e820: [mem 0xbdc88000-0xbdcdbfff] ACPI NVS
  BIOS-e820: [mem 0xbdcdc000-0xbfff] reserved
  BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
  BIOS-e820: [mem 0xff80-0x] reserved
  BIOS-e820: [mem 0x0001-0x0001fbff] usable
  BIOS-e820: [mem 0x0001fc00-0x0001] reserved
  BIOS-e820: [mem 0x0002-0x00023fff] usable

but as mentioned, there's bound to be some particular kernel layout
that triggers this, because I definitely ran a few kernels with that
commit in it without problems (and clearly other people are too).
Looking at my boot log, I had successful boots with both 6a57d104c8cb
and c2714334b944, which contains that commit.

It might also be that it causes some massive corruption at boot time,
but it then requires that that particular memory is actually used. So
maybe it's not so much about the memory map except indirectly.

But that commit *does* look a lot more likely than the things I looked at.

Markus, how did you happen to pinpoint that particular commit? Is it
entirely repeatable for you?

  Linus


Re: [PATCH 152/241] mm: vmscan: fix endless loop in kswapd balancing

2012-12-15 Thread Ben Hutchings
On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7.2 -stable review patch.  If anyone has any objections, please let me 
> know.
> 
> --
> 
> From: Johannes Weiner 
> 
> commit 60cefed485a02bd99b6299dad70666fe49245da7 upstream.
[...]

Greg, you missed this in 3.{0,4}.y.  I'm attaching the version I used
for 3.2.y, which seems to be applicable to 3.0.y.  One or other of these
should work for 3.4.y.

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
From 39d18dc4b8b0c000fa681cbae10ac3f8a132814b Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Thu, 29 Nov 2012 13:54:23 -0800
Subject: [PATCH] mm: vmscan: fix endless loop in kswapd balancing

commit 60cefed485a02bd99b6299dad70666fe49245da7 upstream.

Kswapd does not in all places have the same criteria for a balanced
zone.  Zones are only being reclaimed when their high watermark is
breached, but compaction checks loop over the zonelist again when the
zone does not meet the low watermark plus two times the size of the
allocation.  This gets kswapd stuck in an endless loop over a small
zone, like the DMA zone, where the high watermark is smaller than the
compaction requirement.

Add a function, zone_balanced(), that checks the watermark, and, for
higher order allocations, if compaction has enough free memory.  Then
use it uniformly to check for balanced zones.

This makes sure that when the compaction watermark is not met, at least
reclaim happens and progress is made - or the zone is declared
unreclaimable at some point and skipped entirely.

Signed-off-by: Johannes Weiner 
Reported-by: George Spelvin 
Reported-by: Johannes Hirte 
Reported-by: Tomas Racek 
Tested-by: Johannes Hirte 
Reviewed-by: Rik van Riel 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
 mm/vmscan.c |   27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 313381c..1e4ee1a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2492,6 +2492,19 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 }
 #endif
 
+static bool zone_balanced(struct zone *zone, int order,
+			  unsigned long balance_gap, int classzone_idx)
+{
+	if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) +
+balance_gap, classzone_idx, 0))
+		return false;
+
+	if (COMPACTION_BUILD && order && !compaction_suitable(zone, order))
+		return false;
+
+	return true;
+}
+
 /*
  * pgdat_balanced is used when checking if a node is balanced for high-order
  * allocations. Only zones that meet watermarks and are in a zone allowed
@@ -2551,8 +2564,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 			continue;
 		}
 
-		if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
-			i, 0))
+		if (!zone_balanced(zone, order, 0, i))
 			all_zones_ok = false;
 		else
 			balanced += zone->present_pages;
@@ -2655,8 +2667,7 @@ loop_again:
 shrink_active_list(SWAP_CLUSTER_MAX, zone,
 			&sc, priority, 0);
 
-			if (!zone_watermark_ok_safe(zone, order,
-	high_wmark_pages(zone), 0, 0)) {
+			if (!zone_balanced(zone, order, 0, 0)) {
 end_zone = i;
 break;
 			} else {
@@ -2717,9 +2728,8 @@ loop_again:
 (zone->present_pages +
 	KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
 KSWAPD_ZONE_BALANCE_GAP_RATIO);
-			if (!zone_watermark_ok_safe(zone, order,
-	high_wmark_pages(zone) + balance_gap,
-	end_zone, 0)) {
+			if (!zone_balanced(zone, order,
+	   balance_gap, end_zone)) {
 shrink_zone(priority, zone, &sc);
 
 reclaim_state->reclaimed_slab = 0;
@@ -2746,8 +2756,7 @@ loop_again:
 continue;
 			}
 
-			if (!zone_watermark_ok_safe(zone, order,
-	high_wmark_pages(zone), end_zone, 0)) {
+			if (!zone_balanced(zone, order, 0, end_zone)) {
 all_zones_ok = 0;
 /*
  * We are still under min water mark.  This
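
The mismatch described in the commit message can be reproduced with
plain arithmetic. What follows is a simplified user-space sketch, not
the kernel code: struct zone_model, watermark_ok(), compaction_ready(),
old_balanced() and zone_balanced_model() are hypothetical stand-ins that
only count pages. The one rule taken from the message above is that
compaction wants the low watermark plus two times the allocation size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a zone: page counts only. */
struct zone_model {
	unsigned long free_pages;
	unsigned long low_wmark;
	unsigned long high_wmark;
};

/* Simplified watermark test: free pages must exceed the mark. */
static bool watermark_ok(const struct zone_model *z, unsigned long mark)
{
	return z->free_pages > mark;
}

/*
 * Compaction requirement as stated in the commit message:
 * low watermark plus two times the size of the allocation.
 */
static bool compaction_ready(const struct zone_model *z, int order)
{
	assert(order >= 0);
	return watermark_ok(z, z->low_wmark + (2UL << order));
}

/* Old reclaim criterion: only the high watermark counts. */
static bool old_balanced(const struct zone_model *z)
{
	return watermark_ok(z, z->high_wmark);
}

/* One criterion for every caller, in the spirit of zone_balanced(). */
static bool zone_balanced_model(const struct zone_model *z, int order,
				unsigned long balance_gap)
{
	if (!watermark_ok(z, z->high_wmark + balance_gap))
		return false;
	if (order && !compaction_ready(z, order))
		return false;
	return true;
}
```

For a small zone with low_wmark = 10, high_wmark = 12, free_pages = 20
and an order-3 request, old_balanced() is true while compaction_ready()
is false, because compaction wants more than 10 + 16 = 26 free pages.
That is the disagreement that kept kswapd looping. With the single
check, zone_balanced_model() returns false and reclaim is attempted.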




Re: [RFC v2 1/1] RTL8712 alignment bug in 3.6 and up on ARMV5

2012-12-15 Thread Dan Carpenter
Thanks for fixing this bug.  Your patch works but it's not the right
way to do it.

The original code here adds 4 to pointers which are currently
aligned instead of leaving them as is.  We have a kernel ALIGN()
macro which works correctly, but actually, it's not needed.

On arm, the pointer returned from kmalloc() is already aligned at
the 8 byte boundary because "#define ARCH_SLAB_MINALIGN 8".  The
original code always adds 4 to the pointer so everything is
misaligned.

Your patch adds another 4 bytes so it is now aligned at the 8 byte
boundary again.  That works, of course, but it's better to remove
the whole mess.

pstapriv->pallocated_stainfo_buf = kmalloc(sizeof(struct sta_info) * NUM_STA);

Get rid of the ->pstainfo_buf pointer which is only used to store
the "aligned" version of ->pallocated_stainfo_buf.
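
To make the arithmetic concrete, here is a small user-space sketch.
It assumes the old computation had the form "addr + 4 - (addr & 3)",
which matches the behaviour described above but is not copied from the
driver. old_offset_by_4() and align_up() are hypothetical names;
align_up() rounds up in the same way the kernel's ALIGN() macro does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed shape of the old computation: move to the next multiple
 * of 4, but add a full 4 bytes when the address is already aligned.
 */
static uintptr_t old_offset_by_4(uintptr_t addr)
{
	return addr + 4 - (addr & 3);
}

/* Round up to a power-of-two boundary; aligned input is left as is. */
static uintptr_t align_up(uintptr_t addr, uintptr_t boundary)
{
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	return (addr + boundary - 1) & ~(boundary - 1);
}
```

With an 8-byte-aligned address such as 0x1000, old_offset_by_4() yields
0x1004, which is no longer 8-byte aligned, while align_up(0x1000, 4)
and align_up(0x1000, 8) both return 0x1000 unchanged.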

Please send a version which applies with "git am" and has the proper
sign-off.  Send it to yourself first.  Save the raw email (including
headers and everything).
cat raw_email.txt | git am
Type "git log -p" to verify that the commit message looks good.
Then resend it to the list.

Thanks again.  This is a good bugfix.

regards,
dan carpenter



Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread H. Peter Anvin
On 12/15/2012 10:35 AM, Linus Torvalds wrote:
> On Sat, Dec 15, 2012 at 8:33 AM, Markus Trippelsdorf
>  wrote:
>> On 2012.12.14 at 17:47 -0800, Linus Torvalds wrote:
>>>
>>> Ho humm. Anybody else see anything strange?
>>
>> Yes. I'm seeing a BUG early during boot on my machine (RIP=NULL):
>>
>>  BUG: unable to handle kernel NULL pointer dereference at   (null)
>>
>> This is caused by commit 53b87cf088e2 ("x86, mm: Include the
>> entire kernel memory map in trampoline_pgd")
> 
> Hmm. That reverts cleanly, and the result boots fine for me. And the
> commit looks like exactly the kind of thing that could result in
> problems with exactly the right memory layout, so it could explain why
> the bisect failed and some kernels randomly worked for me and others
> didn't.
> 
> So this at least looks like a very possible candidate.
> 
> Does anybody have an explanation for the problem?
> 
> Btw. the machine in question does not have EFI, and is a bog-standard
> PC (DMI string: "P7H57D-V EVO, BIOS 0999 01/19/2010")
> 
> Matt? Jan?
> 

Matt is on vacation, and I'm partly offline for the weekend, but that
definitely seems suspicious.  Do we have a memory map of the affected
machine(s)?

-hpa




Re: [PATCH 139/241] block: Don't access request after it might be freed

2012-12-15 Thread Ben Hutchings
On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7.2 -stable review patch.  If anyone has any objections, please let me 
> know.
> 
> --
> 
> From: Roland Dreier 
> 
> commit 893d290f1d7496db97c9471bc352ad4a11dc8a25 upstream.
> 
> After we've done __elv_add_request() and __blk_run_queue() in
> blk_execute_rq_nowait(), the request might finish and be freed
> immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
> isn't safe afterwards, because if it isn't, rq might be gone.
> Instead, check beforehand and stash the result in a temporary.
> 
> This fixes crashes in blk_execute_rq_nowait() I get occasionally when
> running with lots of memory debugging options enabled -- I think this
> race is usually harmless because the window for rq to be reallocated
> is so small.
> 
> Signed-off-by: Roland Dreier 
> Signed-off-by: Jens Axboe 
> [ herton: adjust context ]
> Signed-off-by: Herton Ronaldo Krzesinski 

This is missing from 3.{0,4} but I did apply it to 3.2, again with the
need to adjust context.

Perhaps the intermediate fixes to blk_execute_rq_nowait() that resulted
in the changed context should also be applied to stable updates.  The
fixes in question are:

8ba61435d73f block: add missing blk_queue_dead() checks
- applied in 3.3; missing from 3.{0,2}.y
e81ca6fe85b7 [SCSI] block: Fix blk_execute_rq_nowait() dead queue handling
- applied in 3.6; missing from 3.{0,2,4,5}.y
893d290f1d74 block: Don't access request after it might be freed
- applied in 3.7, then to 3.{2,6}.y; missing from 3.{0,4}.y

Jens?

Ben.
> ---
>  block/blk-exec.c |   10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index fb2cbd5..9925fbe 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -49,8 +49,16 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
> gendisk *bd_disk,
>  rq_end_io_fn *done)
>  {
>   int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
> + bool is_pm_resume;
>  
>   WARN_ON(irqs_disabled());
> +
> + /*
> +  * need to check this before __blk_run_queue(), because rq can
> +  * be freed before that returns.
> +  */
> + is_pm_resume = rq->cmd_type == REQ_TYPE_PM_RESUME;
> +
>   spin_lock_irq(q->queue_lock);
>  
>   if (unlikely(blk_queue_dead(q))) {
> @@ -66,7 +74,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
> gendisk *bd_disk,
>   __elv_add_request(q, rq, where);
>   __blk_run_queue(q);
>   /* the queue is stopped so it won't be run */
> - if (rq->cmd_type == REQ_TYPE_PM_RESUME)
> + if (is_pm_resume)
>   q->request_fn(q);
>   spin_unlock_irq(q->queue_lock);
>  }
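
The pattern in the patch above, reading a field before ownership of the
object is handed off, can be shown in isolation. This is a user-space
sketch under stated assumptions: struct request, run_queue(),
make_request(), execute_nowait() and manual_kicks are hypothetical
stand-ins, and free() plays the role of the request completing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum req_type { REQ_NORMAL, REQ_PM_RESUME };

struct request {
	enum req_type type;
};

static int manual_kicks;	/* how often the resume path ran */

/* Stand-in for running the queue: the request may be freed here. */
static void run_queue(struct request *rq)
{
	free(rq);
}

static struct request *make_request(enum req_type type)
{
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		abort();
	rq->type = type;
	return rq;
}

static void execute_nowait(struct request *rq)
{
	bool is_pm_resume;

	assert(rq != NULL);
	/* Read the field while the request is still known to be alive. */
	is_pm_resume = (rq->type == REQ_PM_RESUME);

	run_queue(rq);		/* rq must not be dereferenced after this */

	if (is_pm_resume)	/* uses the stashed copy, not rq->type */
		manual_kicks++;
}
```

Testing rq->type after run_queue() would read freed memory in this
sketch, which is the same class of bug the patch removes.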

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers




Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-15 Thread H. Peter Anvin
On 12/14/2012 11:57 PM, Yinghai Lu wrote:
> 
> I tailored your patch and made use 2M page increase to replace patch
> ioremap function.
> 
>[PATCH v6 12/27] x86: use io_remap to access real_mode_data
> 
> and it will extend init_level4_pgt to map extra range. that will limit
> affect to even others.
> 
> please check if that is ok to you.
> 

What is the point of only managing 2M at a time?  Now you have to have
more conditionals and you don't get any more memory efficiency.

Filling arbitrarily into the brk is not acceptable... the brk is an O(1)
area and all brk allocations need to be reserved at compile time, so the
overflow handling is still necessary.

So no, this patch is not acceptable.

-hpa




Re: [PATCH 135/241] md/raid10: close race that lose writes lost when replacement completes.

2012-12-15 Thread Ben Hutchings
On Thu, 2012-12-13 at 11:58 -0200, Herton Ronaldo Krzesinski wrote:
> 3.5.7.2 -stable review patch.  If anyone has any objections, please let me 
> know.
> 
> --
> 
> From: NeilBrown 
> 
> commit e7c0c3fa29280d62aa5e11101a674bb3064bd791 upstream.
> 
> When a replacement operation completes there is a small window
> when the original device is marked 'faulty' and the replacement
> still looks like a replacement.  The faulty should be removed and
> the replacement moved in place very quickly, but it isn't instant.
> 
> So the code that writes out to the array must handle the possibility that
> the only working device for some slot is the replacement - but it
> doesn't.  If the primary device is faulty it just gives up.  This
> can lead to corruption.
> 
> So make the code more robust: if either  the primary or the
> replacement is present and working, write to them.  Only when
> neither are present do we give up.
> 
> This bug has been present since replacement was introduced in
> 3.3, so it is suitable for any -stable kernel since then.

This is missing from 3.4, so Greg will presumably want to apply this (if
the backport is correct).

Ben.

> Reported-by: "George Spelvin" 
> Signed-off-by: NeilBrown 
> [ herton: hairy code adjustment on 3rd hunk (conf->copies for loop) ]
> Signed-off-by: Herton Ronaldo Krzesinski 
> ---
>  drivers/md/raid10.c |  113 
> +++
>  1 file changed, 59 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 17fae37..0920adf 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1267,18 +1267,21 @@ retry_write:
>   blocked_rdev = rrdev;
>   break;
>   }
> + if (rdev && (test_bit(Faulty, &rdev->flags)
> +  || test_bit(Unmerged, &rdev->flags)))
> + rdev = NULL;
>   if (rrdev && (test_bit(Faulty, &rrdev->flags)
> || test_bit(Unmerged, &rrdev->flags)))
>   rrdev = NULL;
>  
>   r10_bio->devs[i].bio = NULL;
>   r10_bio->devs[i].repl_bio = NULL;
> - if (!rdev || test_bit(Faulty, &rdev->flags) ||
> - test_bit(Unmerged, &rdev->flags)) {
> +
> + if (!rdev && !rrdev) {
>   set_bit(R10BIO_Degraded, &r10_bio->state);
>   continue;
>   }
> - if (test_bit(WriteErrorSeen, &rdev->flags)) {
> + if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
>   sector_t first_bad;
>   sector_t dev_sector = r10_bio->devs[i].addr;
>   int bad_sectors;
> @@ -1320,8 +1323,10 @@ retry_write:
>   max_sectors = good_sectors;
>   }
>   }
> - r10_bio->devs[i].bio = bio;
> - atomic_inc(&rdev->nr_pending);
> + if (rdev) {
> + r10_bio->devs[i].bio = bio;
> + atomic_inc(&rdev->nr_pending);
> + }
>   if (rrdev) {
>   r10_bio->devs[i].repl_bio = bio;
>   atomic_inc(&rrdev->nr_pending);
> @@ -1377,58 +1382,58 @@ retry_write:
>   for (i = 0; i < conf->copies; i++) {
>   struct bio *mbio;
>   int d = r10_bio->devs[i].devnum;
> - if (!r10_bio->devs[i].bio)
> - continue;
> -
> - mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
> - md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
> - max_sectors);
> - r10_bio->devs[i].bio = mbio;
> -
> - mbio->bi_sector = (r10_bio->devs[i].addr+
> -choose_data_offset(r10_bio,
> -   conf->mirrors[d].rdev));
> - mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
> - mbio->bi_end_io = raid10_end_write_request;
> - mbio->bi_rw = WRITE | do_sync | do_fua;
> - mbio->bi_private = r10_bio;
> -
> - atomic_inc(&r10_bio->remaining);
> - spin_lock_irqsave(&conf->device_lock, flags);
> - bio_list_add(&conf->pending_bio_list, mbio);
> - conf->pending_count++;
> - spin_unlock_irqrestore(&conf->device_lock, flags);
> - if (!mddev_check_plugged(mddev))
> - md_wakeup_thread(mddev->thread);
> -
> - if (!r10_bio->devs[i].repl_bio)
> - continue;
> + if (r10_bio->devs[i].bio) {
> + struct md_rdev *rdev = conf->mirrors[d].rdev;
> + mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
> + md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
> + max_sectors);
> + r10_bio->devs[i].bio = mbio;
> +
> + mbio->bi_sector = 

Re: man page for s390_runtime_instr syscall

2012-12-15 Thread Michael Kerrisk (man-pages)
Hello Jan,

On Mon, Dec 10, 2012 at 12:34 PM, Jan Glauber  wrote:
> Hi Michael,
>
> I've written a man page for the s390_runtime_instr syscall which was
> merged with 3.7 (e4b8b3f). Now the question is if you would like to
> include it in the man-pages although it is completely s390 specific
> and won't be available on any other arch? Or should it go into a
> different package?

Thanks for this page. The man-pages package is the right place for it,
but a few things need fixing. Could you see below and resubmit please?

> --- /dev/null   2012-12-04 10:52:46.657720288 +0100
> +++ s390_runtime_instr.2   2012-10-09 13:55:39.0 +0200
> @@ -0,0 +1,73 @@
> +.\" Copyright IBM Corp. 2012
> +.\" Author: Jan Glauber 

You have provided no license here. Can you please add one? Please see
http://www.kernel.org/doc/man-pages/licenses.html. (My preference is
the "verbatim" license, but others are of course possible.)

> +.\"
> +.TH S390_RUNTIME_INSTR 2 2012-10-09 "Linux Programmer's Manual"

Update the date here.

> +.SH NAME
> +s390_runtime_instr \- enable/disable s390 CPU runtime instrumentation
> +.SH SYNOPSIS
> +.nf
> +.B #include 
> +
> +.BI "int s390_runtime_instr(int " command ", int " signum ");
> +.fi
> +
> +.SH DESCRIPTION
> +The
> +.BR s390_runtime_instr ()
> +system call starts or stops CPU runtime instrumentation for the current 
> thread.
> +
> +The
> +.IR command
> +argument controls whether runtime instumentation is started

Spelling: instrumentation

> +( 1 ) or stopped ( 2 ) for the current thread.
> +
> +The
> +.IR signum
> +argument specifies the number of a real-time signal. The

Please start new sentences on a new source line.

> +real-time signal is sent to the thread if the runtime instrumentation
> +buffer is full or if the runtime-instrumentation-halted interrupt
> +occured.

Spelling: occurred.

> +
> +.SH RETURN VALUE
> +On success
> +.BR s390_runtime_instr ()
> +returns 0 and enables the thread for
> +runtime instrumentation by assigning the thread a default runtime
> +instrumentation control block. The caller can then read and modify the

Start new sentence on a new source line.

> +control block and start the runtime instrumentation. On error, -1 is

Start new sentence on a new source line.

> +returned and
> +.IR errno
> +is set to one of the error codes listed below.
> +
> +.SH ERRORS
> +.TP
> +.B EOPNOTSUPP
> +The runtime instrumentation facility is not available.
> +.TP
> +.B EINVAL
> +The value specified in
> +.IR command
> +is not a valid command or the value specified in
> +.IR signum
> +is not a real-time signal number.
> +.TP
> +.B ENOMEM
> +Allocating memory for the runtime instrumentation control block
> +failed.
> +
> +.SH VERSIONS
> +This system call is available since Linux 3.7.
> +
> +.SH CONFORMING TO
> +This system call

This Linux-specific system call

> +is only available on the s390 architecture. The runtime instrumentation 
> facility is available

http://www.kernel.org/doc/man-pages/licenses.html

> +beginning with System z EC12.
> +
> +.SH NOTES
> +Glibc does not provide a wrapper for this system call, use
> +.BR syscall (2)
> +to call it.

Somewhere around here it would be nice to have some notes on how one
uses this RI feature. The commit
e4b8b3f33fcaa0ed6e6b5482a606091d8cd20beb has a bit of info. I'd
suggest including that info in the page, with (possibly) an example.
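
A possible starting point for such an example is sketched below. It is
only an illustration under stated assumptions: ri_call(), RI_START and
RI_STOP are hypothetical names, the values 1 and 2 are the command
numbers given in the DESCRIPTION above, and the call is compiled in only
when the headers define __NR_s390_runtime_instr, so that the sketch
builds (and fails with ENOSYS) on other architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command values as described in the page under review. */
#define RI_START 1
#define RI_STOP  2

/* Thin wrapper, since glibc provides none for this system call. */
static int ri_call(int command, int signum)
{
#ifdef __NR_s390_runtime_instr
	return (int)syscall(__NR_s390_runtime_instr, command, signum);
#else
	(void)command;
	(void)signum;
	errno = ENOSYS;		/* not built for s390 */
	return -1;
#endif
}
```

A caller would start instrumentation with ri_call(RI_START, SIGRTMIN),
check for -1 and inspect errno, and later stop it with
ri_call(RI_STOP, SIGRTMIN).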

> +
> +.SH SEE ALSO
> +.BR signal (7),
> +.BR syscall (2)

Order entries here by section number.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


Re: [GIT PULL] x86/uapi for 3.8

2012-12-15 Thread Linus Torvalds
On Sat, Dec 15, 2012 at 8:33 AM, Markus Trippelsdorf
 wrote:
> On 2012.12.14 at 17:47 -0800, Linus Torvalds wrote:
>>
>> Ho humm. Anybody else see anything strange?
>
> Yes. I'm seeing a BUG early during boot on my machine (RIP=NULL):
>
>  BUG: unable to handle kernel NULL pointer dereference at   (null)
>
> This is caused by commit 53b87cf088e2 ("x86, mm: Include the
> entire kernel memory map in trampoline_pgd")

Hmm. That reverts cleanly, and the result boots fine for me. And the
commit looks like exactly the kind of thing that could result in
problems with exactly the right memory layout, so it could explain why
the bisect failed and some kernels randomly worked for me and others
didn't.

So this at least looks like a very possible candidate.

Does anybody have an explanation for the problem?

Btw. the machine in question does not have EFI, and is a bog-standard
PC (DMI string: "P7H57D-V EVO, BIOS 0999 01/19/2010")

Matt? Jan?

Linus


Re: [PATCH] avoid entropy starvation due to stack protection

2012-12-15 Thread Ondřej Bílka
Why not use the nonblocking pool, and seed it with only half of the
collected entropy, so that /dev/random is nonblocking in almost all
practical scenarios?

On Thu, Dec 13, 2012 at 08:44:36AM +0100, Stephan Mueller wrote:
> On 13.12.2012 01:43:21, +0100, Andrew Morton
>  wrote:
> 
> Hi Andrew,
> > On Tue, 11 Dec 2012 13:33:04 +0100
> > Stephan Mueller  wrote:
> >
> >> Some time ago, I noticed the fact that for every newly
> >> executed process, the function create_elf_tables requests 16 bytes of
> >> randomness from get_random_bytes. This is easily visible when calling
> >>
> >> while [ 1 ]
> >> do
> >>cat /proc/sys/kernel/random/entropy_avail
> >>sleep 1
> >> done
> > Please see
> > http://ozlabs.org/~akpm/mmotm/broken-out/binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch
> >
> > That patch is about one week from a mainline merge, btw.
> 
> Initially I was also thinking about get_random_int. But stack protection
> depends on non-predictable numbers to ensure it cannot be defeated. As
> get_random_int depends on MD5 which is assumed to be broken now, I
> discarded the idea of using get_random_int.
> 
> Moreover, please consider that get_cycles is an architecture-specific
> function that on some architectures only returns 0 (For all
> architectures where this is implemented, you have no guarantee that it
> increments as a high-resolution timer). So, the quality of
> get_random_int is questionable IMHO for the use as a stack protector.
> 
> Also note, that other in-kernel users of get_random_bytes may be
> converted to using the proposed kernel pool to avoid more entropy drainage.
> 
> Please note that the suggested approach of fully seeding a deterministic
> RNG never followed by a re-seeding is used elsewhere (e.g. the OpenSSL
> RNG). Therefore, I think the suggested approach is viable.
> 
> Ciao
> Stephan
> 

-- 

It's those computer people in X {city of world}.  They keep stuffing things up.


oopsable race in xen-gntdev (unsafe vma access)

2012-12-15 Thread Al Viro
1) find_vma() is *not* safe without ->mmap_sem and its result may
very well be freed just as it's returned to caller.  IOW,
gntdev_ioctl_get_offset_for_vaddr() is racy and may end up with
dereferencing freed memory.

2) gntdev_vma_close() is putting NULL into map->vma with only
->mmap_sem held by caller.  Things like
	if (!map->vma)
		continue;
	if (map->vma->vm_start >= end)
		continue;
	if (map->vma->vm_end <= start)
done with just priv->lock held are racy.

I'm not familiar with the code, but it looks like we need to
protect gntdev_vma_close() guts with the same spinlock and probably
hold ->mmap_sem shared around the "find_vma()+get to map->{index,count}"
in the ioctl.  Or replace the logic in ioctl with search through the
list of grant_map under the same spinlock...
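
The second option, searching the list under the one lock and copying
the needed fields out before dropping it, can be sketched in user space.
This is an analogy only, not gntdev code: struct region, struct
region_list, lookup_offset() and remove_region() are hypothetical, and a
pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapped region. */
struct region {
	unsigned long start, end;	/* covers [start, end) */
	int index, count;
	struct region *next;
};

struct region_list {
	pthread_mutex_t lock;		/* plays the role of the spinlock */
	struct region *head;
};

/* Look up addr and copy index/count out while the lock is held. */
static bool lookup_offset(struct region_list *rl, unsigned long addr,
			  int *index, int *count)
{
	struct region *r;
	bool found = false;

	assert(rl && index && count);
	pthread_mutex_lock(&rl->lock);
	for (r = rl->head; r; r = r->next) {
		if (addr >= r->start && addr < r->end) {
			*index = r->index;
			*count = r->count;
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&rl->lock);
	return found;	/* no pointer into the list escapes the lock */
}

/* Removal takes the same lock, so lookups never see a stale entry. */
static void remove_region(struct region_list *rl, struct region *victim)
{
	struct region **pp;

	pthread_mutex_lock(&rl->lock);
	for (pp = &rl->head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&rl->lock);
}
```

The point of the design is that the reader returns values rather than a
pointer, so nothing it hands back can be freed underneath the caller.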

Comments?


Re: [PATCH 12/12] edac: fix kernel panic on module unloading

2012-12-15 Thread Borislav Petkov
On Fri, Dec 14, 2012 at 03:03:10PM +0400, Konstantin Khlebnikov wrote:
> This patch fixes use-after-free and double-free bugs in
> edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
> calls mc_attr_release() which calls kfree(), thus following
> device_del() works with already released memory. Another kfree() in
> edac_mc_sysfs_exit() releases the same memory again. Great.

Applied and tagged for 3.6 and 3.7 stable.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Panic at shutdown in x86-64 3.7 kernel under qemu 1.3.0?

2012-12-15 Thread Rob Landley
Reasonably vanilla versions of both just did this. No idea why. Just  
did it the once, haven't gotten it to reproduce...


Rob

Restarting system.
reboot: machine restart
general protection fault: fff2 [#1]
CPU 0
Pid: 8542, comm: oneit Not tainted 3.7.0 #1 Bochs Bochs
RIP: 0010:[]  []  
lapic_shutdown+0x29/0x2b

RSP: 0018:88000fb57e28  EFLAGS: 0202
RAX: 8130e2d0 RBX: 0202 RCX: 
RDX: 81322a40 RSI: 00ff RDI: 00f0
RBP: 28121969 R08: 88000fb57fd8 R09: 
R10:  R11: 81015721 R12: fee1dead
R13:  R14: 0004 R15: 00425e02
FS:  () GS:81304000()  
knlGS:

CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00440f11 CR3: 0fb53000 CR4: 06b0
DR0:  DR1:  DR2: 
DR3:  DR6:  DR7: 
Process oneit (pid: 8542, threadinfo 88000fb56000, task  
88000e848f90)

Stack:
 01234567 810136cb 28121969 810136a8
 01234567 8102b451 0011 00040001
 0023  0006 0001802a0027
Call Trace:
 [] ? native_machine_shutdown+0x9/0x1e
 [] ? native_machine_restart+0x20/0x29
 [] ? sys_reboot+0x11f/0x14a
 [] ? __kill_pgrp_info+0x37/0x5f
 [] ? do_exit+0x61f/0x623
 [] ? schedule_tail+0x20/0x46
 [] ? ret_from_fork+0xf/0xb0
 [] ? system_call_fastpath+0x16/0x1b
Code: c8 c3 48 8b 05 8a cc 31 00 53 f6 c4 02 75 12 83 3d c9 da 38 00 00  
74 13 83 3d d8 ea 38 00 00 75 0a 9c 5b fa e8 3a ff ff ff 53 9d <5b> c3  
48 83 ec 08 eb 02 f3 90 48 8b 05 eb db 31 00 bf 00 03 00

RIP  [] lapic_shutdown+0x29/0x2b
 RSP 
---[ end trace 0c69c9c16377bd9d ]---


Re: [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section

2012-12-15 Thread Borislav Petkov
On Thu, Dec 13, 2012 at 02:01:58PM -0800, Yinghai Lu wrote:
> commit 08da5a2ca
> 
> x86_64: Early segment setup for VT
> 
> add lldt/ltr to clean more segments.
> 
> Those code are put in code64, and it is using gdt that is only
> loaded from code32 path.
> 
> That breaks booting with 64bit bootloader that does not go through
> code32 path. It get at startup_64 directly,  and it has different
> gdt.
> 
> Move those lines into code32 after their gdt is loaded.

Let me rewrite that commit message for ya, you tell me whether I got it
right:

"08da5a2ca479 ("x86_64: Early segment setup for VT") sets up LDT and TR
into a valid state in order to speed up boot decompression under VT. The
code which loads the GDT is executed in the 32-bit startup code while
the above change is in the 64-bit part.

However, this breaks 64-bit bootloaders which jump straight to the
64-bit startup entry point and thus skip LDT and TR setup because they
use a different GDT.

Fix this by moving the LDT and TR setup to the 32-bit section."

Is that correct?

If so, why not take the time and try to write your commits more
understandably so that bystanders like me don't have to look at the code
first and scramble to understand what you mean?

Thanks.
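
The ordering problem described in the rewritten commit message can be
sketched with a toy model. This is not kernel code. It assumes the breakage
is ltr faulting because the selector does not name an available TSS in the
GDT that is current at that point; the thread itself only says that booting
breaks. The selector values and every name below (BOOT_TSS, ltr, startup_32,
startup_64, boot) are hypothetical stand-ins, and the LDT half is left out
for brevity.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for the #GP the CPU raises on a bad selector."""

# Hypothetical selector value, chosen for illustration only.
BOOT_TSS = 0x20

def ltr(gdt, selector):
    """Model ltr: the selector must name an available TSS in the current GDT."""
    if gdt.get(selector) != "tss-available":
        raise GeneralProtectionFault(
            f"selector {selector:#x} is not an available TSS")
    return selector

def startup_32(state):
    # The 32-bit entry installs the decompressor's own GDT ...
    state["gdt"] = {0x10: "code64", 0x18: "data", BOOT_TSS: "tss-available"}
    if state["fixed"]:
        # ... and, with the fix, loads TR right away, while that GDT is
        # known to be the active one.
        state["tr"] = ltr(state["gdt"], BOOT_TSS)
    return startup_64(state)

def startup_64(state):
    if not state["fixed"]:
        # Before the fix: TR is loaded here, against whatever GDT is current.
        state["tr"] = ltr(state["gdt"], BOOT_TSS)
    return state

def boot(entry, fixed):
    # A 64-bit bootloader hands over its own GDT, modeled here as having
    # no TSS descriptor at BOOT_TSS.
    state = {"gdt": {0x10: "code64", 0x18: "data"}, "tr": None, "fixed": fixed}
    return entry(state)

if __name__ == "__main__":
    for entry in (startup_32, startup_64):
        for fixed in (False, True):
            try:
                boot(entry, fixed)
                outcome = "boots"
            except GeneralProtectionFault as exc:
                outcome = f"faults ({exc})"
            print(f"{entry.__name__}, fixed={fixed}: {outcome}")
```

In this model only the combination of direct 64-bit entry and the unfixed
layout faults, which is the case the patch is meant to address.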

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH 1/2 v3] Fix memory freeing issues

2012-12-15 Thread Hans J. Koch
On Fri, Dec 14, 2012 at 11:33:50AM +0200, Vitalii Demianets wrote:
> 
> Hans, why do you want to put in this patch, which is dealing with 
> memory-freeing issues only, completely unrelated functional changes?

Because during review of your patch we happened to find another issue
a few lines up and down. Why not fix it on the way?

What I'd like is simply

[PATCH] Fix uio_pdrv_genirq issues

If you like, make it two patches, one with your memory-freeing issue
and one "Remove irq tracking" or something like that. That's just
three or four lines difference, I'd even accept it if it were only
one patch.

I don't want to fix one thing now and leave the other one unresolved.
That would just be a waste of time.

To be clear, I have no objections regarding your memory freeing ideas.

Thanks,
Hans



Re: WARNING: at drivers/tty/tty_buffer.c:476 flush_to_ldisc+0x1de/0x1f0()

2012-12-15 Thread Greg Kroah-Hartman
On Fri, Dec 14, 2012 at 10:53:16PM -0500, Peter Hurley wrote:
> On Fri, 2012-12-14 at 18:29 -0800, Greg Kroah-Hartman wrote:
> > On Tue, Dec 11, 2012 at 10:01:24PM -0500, Dave Jones wrote:
> > > Fuzz-testing fallout from post 3.7 tree as of commit 
> > > 414a6750e59b0b687034764c464e9ddecac0f7a6
> > > 
> > > [ 2181.230579] ------------[ cut here ]------------
> > > [ 2181.231277] WARNING: at drivers/tty/tty_buffer.c:476 
> > > flush_to_ldisc+0x1de/0x1f0()
> > > [ 2181.232358] Hardware name: GA-MA78GM-S2H
> > > [ 2181.232925] tty is NULL
> > > [ 2181.233430] Modules linked in: l2tp_ppp l2tp_core fuse rfcomm 
> > > binfmt_misc hidp bnep scsi_transport_iscsi ipt_ULOG nfnetlink rose ipx 
> > > p8023 p8022 caif_socket caif af_rxrpc x25 irda af_key appletalk pppoe 
> > > netrom pppox ppp_generic decnet phonet slhc psnap crc_ccitt ax25 llc2 rds 
> > > atm llc nfc can nfsv3 nfs_acl nfs fscache lockd sunrpc ip6t_REJECT 
> > > nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack 
> > > ip6table_filter ip6_tables snd_hda_codec_realtek btusb snd_hda_intel 
> > > bluetooth usb_debug snd_hda_codec microcode snd_pcm serio_raw pcspkr 
> > > snd_page_alloc snd_timer edac_core snd soundcore r8169 mii vhost_net tun 
> > > macvtap macvlan kvm_amd kvm
> > > [ 2181.245632] Pid: 29787, comm: kworker/0:1 Not tainted 3.7.0+ #12
> > > [ 2181.246503] Call Trace:
> > > [ 2181.246851]  [] warn_slowpath_common+0x7f/0xc0
> > > [ 2181.247725]  [] warn_slowpath_fmt+0x46/0x50
> > > [ 2181.248558]  [] ? ___ratelimit+0x9a/0x120
> > > [ 2181.249347]  [] flush_to_ldisc+0x1de/0x1f0
> > > [ 2181.250164]  [] process_one_work+0x207/0x750
> > > [ 2181.251013]  [] ? process_one_work+0x197/0x750
> > > [ 2181.251893]  [] ? destroy_work_on_stack+0x20/0x20
> > > [ 2181.252809]  [] ? 
> > > tty_insert_flip_string_fixed_flag+0x110/0x110
> > > [ 2181.253993]  [] worker_thread+0x156/0x440
> > > [ 2181.254815]  [] ? rescuer_thread+0x240/0x240
> > > [ 2181.255638]  [] kthread+0xed/0x100
> > > [ 2181.256374]  [] ? put_lock_stats.isra.23+0xe/0x40
> > > [ 2181.257290]  [] ? kthread_create_on_node+0x160/0x160
> > > [ 2181.258223]  [] ret_from_fork+0x7c/0xb0
> > > [ 2181.259018]  [] ? kthread_create_on_node+0x160/0x160
> > > [ 2181.259969] ---[ end trace 12dd9f01acd7e09f ]---
> > 
> > Jiri, I thought we resolved these warnings in the linux-next tree, how
> > are they still showing up?
> 
> Greg, that's what the series that I just sent v2 of fixes. Look for
> "[PATCH v2 0/11] tty: Fix buffer work access-after-free" et al.

Ah, ok, I was holding off on looking at those until after 3.8-rc1 is
out, I'll do so then, thanks.

greg k-h

