date:20121026

SalesSBR chinachem shared photos with you

2012-10-26 Thread SalesSBR chinachem


Hello Sir/Mam:

Happy to contact you !

We would like to introduce our company. Our company is a large-scale
petrochemical enterprise with synthetic rubber as its main product. We
can supply SBR 1502 and 1712, manufactured by SINOPEC YANGZI.


The following is the SBR quotation :

* SBR 1502: USD 2500.00/TON FOB China port

 SBR 1712: USD 2350.00/TON FOB China port

* packing : 35kg ppbag

* quantity: 21ton/20'container

* price validity: within 3 days .

* payment terms : TT in advance or LC at sight .

* delivery time: within 20 days .

Any other question, contact me freely !

best wishes

Adelle Liang

Sales Manager

Xuzhou Yizhengyuan Chemical Technology Co., Ltd

email:chem...@163.com

Re: [PATCH v3] firmware loader: introduce module parameter to customize fw search path

2012-10-26 Thread anish kumar

On Sat, 2012-10-27 at 09:23 +0800, Ming Lei wrote:
> This patch introduces one module parameter of 'path' in firmware_class
> to support customizing firmware image search path, so that people can
> use its own firmware path if the default built-in paths can't meet their
> demand[1], and the typical usage is passing the below from kernel command
> parameter when 'firmware_class' is built in kernel:
> 
>   firmware_class.path=$CUSTOMIZED_PATH
> 
> [1], https://lkml.org/lkml/2012/10/11/337
> 
> Cc: Linus Torvalds 
> Signed-off-by: Ming Lei 
> ---
> v3
>   - fix one mistake on checking unset firmware path
> 
> v2
>   - take a cleaner approach suggested by Linus
>   - mark the path array as const because it needn't be changed
>   - fix one error in Document about the module name
> 
> v1:
>   - remove kernel boot parameter and only support the feature by
> module parameter as suggested by Greg
> ---
>  Documentation/firmware_class/README |5 +
>  drivers/base/firmware_class.c   |   17 -
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/firmware_class/README 
> b/Documentation/firmware_class/README
> index 815b711..e9fce78 100644
> --- a/Documentation/firmware_class/README
> +++ b/Documentation/firmware_class/README
> @@ -22,12 +22,17 @@
>   - calls request_firmware(_entry, $FIRMWARE, device)
>   - kernel searchs the fimware image with name $FIRMWARE directly
>   in the below search path of root filesystem:
> + User customized search path by module parameter 'path'[1]
>   "/lib/firmware/updates/" UTS_RELEASE,
>   "/lib/firmware/updates",
>   "/lib/firmware/" UTS_RELEASE,
>   "/lib/firmware"
>   - If found, goto 7), else goto 2)
>  
> + [1], the 'path' is a string parameter which length should be less
whose length should be less...
> + than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH'
> + if firmware_class is built in kernel(the general situation)
> +
>   2), userspace:
>   - /sys/class/firmware/xxx/{loading,data} appear.
>   - hotplug gets called with a firmware identifier in $FIRMWARE
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 8945f4e..62568c2 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf)
>  }
>  
>  /* direct firmware loading support */
> -static const char *fw_path[] = {
> +static char fw_path_para[256];
> +static const char * const fw_path[] = {
> + fw_path_para,
>   "/lib/firmware/updates/" UTS_RELEASE,
>   "/lib/firmware/updates",
>   "/lib/firmware/" UTS_RELEASE,
>   "/lib/firmware"
>  };
>  
> +/*
> + * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
> + * from kernel command because firmware_class is generally built in
do you mean kernel command line?
> + * kernel instead of module.
> + */
> +module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
> +MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
> priority than default path");
> +
>  /* Don't inline this: 'struct kstat' is biggish */
>  static noinline long fw_file_size(struct file *file)
>  {
> @@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct 
> firmware_buf *buf)
>  
>   for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
>   struct file *file;
> +
> + /* skip the unset customized path */
> + if (!fw_path[i][0])
> + continue;
> +
>   snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
>  
>   file = filp_open(path, O_RDONLY, 0);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
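
Note on the patch above: the search order it describes (an optional custom path first, then the built-in directories, with an unset custom path skipped) can be illustrated in user space. The following is only a minimal sketch, not the kernel code: set_custom_path() and fw_candidates() are hypothetical helpers, the UTS_RELEASE entries are left out, and FW_FULL_PATH_LEN is an arbitrary stand-in for PATH_MAX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FW_PATH_PARA_LEN 256   /* same size as fw_path_para in the patch */
#define FW_FULL_PATH_LEN 512   /* illustrative only; the kernel uses PATH_MAX */

/* Stays empty until a custom path is set (firmware_class.path=... in the kernel). */
static char fw_path_para[FW_PATH_PARA_LEN];

/* Custom path first, then the defaults (UTS_RELEASE entries omitted here). */
static const char * const fw_path[] = {
    fw_path_para,
    "/lib/firmware/updates",
    "/lib/firmware",
};

/* Hypothetical helper standing in for the module parameter being set. */
static void set_custom_path(const char *p)
{
    assert(strlen(p) < FW_PATH_PARA_LEN);   /* must fit including the NUL */
    strcpy(fw_path_para, p);
}

/* Fill out[] with candidate file names in search order; return how many. */
static size_t fw_candidates(const char *fw_id,
                            char out[][FW_FULL_PATH_LEN], size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof(fw_path) / sizeof(fw_path[0]) && n < max; i++) {
        /* skip the unset customized path */
        if (!fw_path[i][0])
            continue;
        snprintf(out[n++], FW_FULL_PATH_LEN, "%s/%s", fw_path[i], fw_id);
    }
    return n;
}
```

With no custom path set, fw_candidates() yields only the two default locations; after set_custom_path("/opt/fw") the custom directory comes first, which is the "higher priority than default path" behaviour the parameter description mentions.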

Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/memory.txt

2012-10-26 Thread Ninja Tekkaman

Once I notice that the original document has been updated, I will
update the translation immediately.

2012/10/25 Catalin Marinas :
> 2012/10/24 Tekkaman Ninja :
>> This is a Chinese translated version of
>> Documentation/arm64/memory.txt
>>
>> Signed-off-by: Fu Wei 
>
> As long as you maintain it and you keep it in sync:
>
> Acked-by: Catalin Marinas 
>
> I used google translate and it seems close to the original document :)
>
> --
> Catalin

Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/memory.txt

2012-10-26 Thread Ninja Tekkaman

Hi Catalin,
I have a question about the original document.
What is the exact meaning of "memory" in
"ffc0    256GB  memory"
Can I treat it as "kernel logical address space"? Or could it be
changed to "kernel"?

It's relevant to my Chinese translation.

--
Tekkaman Ninja
2012/10/25 Catalin Marinas :
> 2012/10/24 Tekkaman Ninja :
>> This is a Chinese translated version of
>> Documentation/arm64/memory.txt
>>
>> Signed-off-by: Fu Wei 
>
> As long as you maintain it and you keep it in sync:
>
> Acked-by: Catalin Marinas 
>
> I used google translate and it seems close to the original document :)
>
> --
> Catalin

Re: [sqlite] light weight write barriers

2012-10-26 Thread Theodore Ts'o

On Fri, Oct 26, 2012 at 09:54:53PM -0400, Vladislav Bolkhovitin wrote:
> What different in our positions is that you are considering storage
> as something you can connect to your desktop, while in my view
> storage is something, which stores data and serves them the best
> possible way with the best performance.

I don't get paid to make Linux storage work well for gold-plated
storage, and as far as I know, none of the purveyors of said gold
plated software systems are currently employing Linux file system
developers to make Linux file systems work well on said gold-plated
hardware.

As for what I might do on my own time, for fun, I can't afford said
gold-plated hardware, and personally I get a lot more satisfaction if
I know there will be a large number of people who benefit from my work
(it was really cool when I found out that millions and millions of
Android devices were going to be using ext4 :-), as opposed to a very
small number of people who have paid $$$ to storage vendors who don't
feel it's worthwhile to pay core Linux file system developers to
leverage their hardware.  Earlier, you were bemoaning why Linux file
system developers weren't paying attention to using said fancy SCSI
features.  Perhaps now you'll understand better it's not happening?

> Price doesn't matter here, because it's completely different topic.

It matters if you think I'm going to do it on my own time, out of my
own budget.  And if you think my employer is going to choose to use
said hardware, price definitely matters.  I consider engineering to be
the art of making tradeoffs, and price is absolutely one of the things
that we need to trade off against other goals.

It's rare that you get to design something where performance matters
above all else.  Maybe it's that way if you're paid by folks whose job
it is to destablize the world's financial markets by pushing the holes
into the right half plane (i.e., high frequency trading :-).  But for
the rest of the world, price absolutely matters.

- Ted

P.S.  All of the storage I have access to at home is SATA.  If someone
would like to change that and ship me free hardware, as long as it
doesn't require three-phase power (or require some exotic interconnect
which is ghastly expensive and which you are also not going to provide
me for free), do contact me off-line.  :-)

Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/booting.txt

2012-10-26 Thread Ninja Tekkaman

Hi Rob,
Thanks for your suggestion; that can be decided by the maintainer.
My view is that non-English translations in the kernel source
help developers who speak that language natively join
the development of the Linux kernel.
They can get the information directly from the kernel source, which is more convenient.

I made these translations in the hope that more and more Chinese
developers will join kernel development and contribute to the
kernel.

2012/10/26 Rob Landley :
> On 10/24/2012 11:11:36 AM, Catalin Marinas wrote:
>> 2012/10/24 Tekkaman Ninja :
>> > This is a Chinese translated version of
>> > Documentation/arm64/booting.txt
>> >
>> > Signed-off-by: Fu Wei 
>>
>> Same as my comment on the memory.txt patch, I don't understand what
>> it
>> says, so you need to keep it up to date.
>>
>> Acked-by: Catalin Marinas 
>
> I have a similar problem: I can't read translations to languages I
> don't speak. I argued for putting them on the web way back when, but
> Greg Kroah-Hartman incorporated stuff he can't read either into the
> Documentation directory.
>
> Last I pinged him he was willing to maintain non-english translations,
> try sending them to him?
>
> Rob
>
>

[PATCH 1/1] powerpc/book3e: store critical/machine/debug exception thread info

2012-10-26 Thread Tiejun Chen

We need to copy the current thread info into the thread info of the critical,
machine check and debug exception stacks, as we already do for PPC32.

Signed-off-by: Tiejun Chen 
---
This patch follows the three patches I sent recently:

[PATCH 1/3] powerpc/book3e: load critical/machine/debug exception stack
[PATCH 2/3] powerpc/book3e: support kgdb for kernel space
[PATCH 3/3] kgdb/kgdbts: support ppc64

Tiejun

 arch/powerpc/kernel/exceptions-64e.S |   20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index c5564d4..4e7083e 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -91,10 +91,28 @@
 #define SPRN_GDBELL_SRR0   SPRN_GSRR0
 #define SPRN_GDBELL_SRR1   SPRN_GSRR1
 
+/* Store something to exception thread info */
+#define BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(type)            \
+   std r14,PACA_EX##type+EX_R14(r13);                      \
+   std r15,PACA_EX##type+EX_R15(r13);                      \
+   ld  r14,PACA_EX##type+EX_R1(r13);                       \
+   clrrdi  r14,r14,THREAD_SHIFT;                           \
+   clrrdi  r15,r1,THREAD_SHIFT;                            \
+   ld  r10,TI_FLAGS(r14);                                  \
+   std r10,TI_FLAGS(r15);                                  \
+   ld  r10,TI_PREEMPT(r14);                                \
+   std r10,TI_PREEMPT(r1);                                 \
+   ld  r10,TI_TASK(r14);                                   \
+   std r10,TI_TASK(r1);                                    \
+   ld  r14,PACA_EX##type+EX_R14(r13);                      \
+   ld  r15,PACA_EX##type+EX_R15(r13);                      \
+1:
+
 #define CRIT_SET_KSTACK                                    \
    BOOK3E_LOAD_EXC_LEVEL_STACK(CRIT);                      \
    ld  r1,PACA_CRIT_STACK(r13);                            \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(CRIT);
 #define SPRN_CRIT_SRR0 SPRN_CSRR0
 #define SPRN_CRIT_SRR1 SPRN_CSRR1
 
@@ -102,6 +120,7 @@
    BOOK3E_LOAD_EXC_LEVEL_STACK(DBG);                       \
    ld  r1,PACA_DBG_STACK(r13);                             \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(DBG);
 #define SPRN_DBG_SRR0  SPRN_DSRR0
 #define SPRN_DBG_SRR1  SPRN_DSRR1
 
@@ -109,6 +128,7 @@
    BOOK3E_LOAD_EXC_LEVEL_STACK(MC);                        \
    ld  r1,PACA_MC_STACK(r13);                              \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(MC);
 #define SPRN_MC_SRR0   SPRN_MCSRR0
 #define SPRN_MC_SRR1   SPRN_MCSRR1
 
-- 
1.7.9.5
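
Note on the macro above: the two clrrdi instructions clear the low THREAD_SHIFT bits of a stack pointer, which gives the base of that stack, where thread_info is kept in this layout. A rough C equivalent of that masking step is sketched below. The THREAD_SHIFT value of 14 (16KB stacks) is an assumption chosen for illustration, and thread_info_base() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SHIFT 14                          /* assumed: 16KB kernel stacks */
#define THREAD_SIZE  ((uint64_t)1 << THREAD_SHIFT)

/*
 * C equivalent of "clrrdi rD,rS,THREAD_SHIFT": clear the low THREAD_SHIFT
 * bits of a stack pointer to get the base address of the stack it points into.
 */
static uint64_t thread_info_base(uint64_t sp)
{
    return sp & ~(THREAD_SIZE - 1);
}
```

Any stack pointer inside the same THREAD_SIZE-aligned stack maps to the same base, which is why the macro can derive both the interrupted context's thread_info (from the saved r1) and the exception stack's thread_info (from the new r1) with one instruction each.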


Re: [PATCH]Documentation:Chinese translation of Documentation/arm64/booting.txt

2012-10-26 Thread Ninja Tekkaman

OK, I will try my best to keep it up to date.
I hope that Catalin Marinas can Cc me on patches that touch
Documentation/arm64.
Then I can translate them into Chinese immediately.

Information about AArch64 is limited at the moment, so the translation may contain some errors.
When I find such errors in the future, I will fix them immediately.

Thank you for reminding me.

2012/10/26 harryxiyou :
> 2012/10/24 Tekkaman Ninja :
>> This is a Chinese translated version of
>> Documentation/arm64/booting.txt
>>
>> Signed-off-by: Fu Wei 
>>
>
> Acked-by: Harry Wei 
>
> Thanks for your good job and you should keep it up to date ;-)
>
>
> --
> Thanks
> Harry Wei

Re: [PATCH 2/3] x86,mm: drop TLB flush from ptep_set_access_flags

2012-10-26 Thread Rik van Riel


On 10/26/2012 05:12 PM, Alan Cox wrote:

> On Fri, 26 Oct 2012 14:45:02 -0400
> Rik van Riel  wrote:
>
>> Intel has an architectural guarantee that the TLB entry causing
>> a page fault gets invalidated automatically. This means
>> we should be able to drop the local TLB invalidation.
>>
>> Because of the way other areas of the page fault code work,
>> chances are good that all x86 CPUs do this.  However, if
>> someone somewhere has an x86 CPU that does not invalidate
>> the TLB entry causing a page fault, this one-liner should
>> be easy to revert.
>
> This does not strike me as a good standard of validation for such a change
>
> At the very least we should have an ACK from AMD and from VIA, and
> preferably ping RDC and some of the other embedded folks. Given an AMD
> and VIA ACK I'd be fine. I doubt anyone knows any more what Cyrix CPUs
> did or cared about and I imagine H Peter or Linus can answer for
> Transmeta ;-)


Fair enough.

If it turns out any of those CPUs need an explicit
flush, then we can also adjust flush_tlb_fix_spurious_fault
to actually do a local flush on x86 (or at least on those
CPUs).


Re: [RFC PATCH 00/13] sched: Integrating Per-entity-load-tracking with the core scheduler

2012-10-26 Thread Preeti U Murthy

On 10/26/2012 06:37 PM, Ingo Molnar wrote:
> 
> * Peter Zijlstra  wrote:
> 
>> [...]
>>
>> So a sane series would introduce maybe two functions: 
>> cpu_load() and task_load() and use those where we now use 
>> rq->load.weight and p->se.load.weight for load balancing 
>> purposes. Implement these functions using those two 
>> expression. So effectively this patch is a NOP.
>>
>> Secondly, switch these two functions over to the per-task 
>> based averages.
>>
>> Tada! all done. The load balancer will then try and equalize 
>> effective load instead of instant load.
>>
>> It will do the 3x10% vs 100% thing correctly with just those 
>> two patches. Simply because it will report a lower cpu-load 
>> for the 3x10% case than it will for the 100% case, no need to 
>> go fudge about in the load-balance internals.
>>
>> Once you've got this correctly done, you can go change 
>> balancing to better utilize the new metric, like use the 
>> effective load instead of nr_running against the capacity and 
>> things like that. But for every such change you want to be 
>> very careful and run all the benchmarks you can find -- in 
>> fact you want to do that after the 2nd patch too.
> 
> If anyone posted that simple two-patch series that switches over 
> to the new load metrics I'd be happy to test the performance of 
> those.
> 
> Having two parallel load metrics is really not something that we 
> should tolerate for too long.
> 
> Thanks,
> 
>   Ingo
> 
Right, Ingo. I will incorporate this approach and post the patches very soon.

Thank you

Regards
Preeti
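
Note on the two-patch approach quoted above: it can be pictured as a pair of accessor functions whose implementation is swapped from instantaneous weight to an averaged load while the callers stay unchanged. The sketch below is a toy user-space model only. The struct, the runnable_pct field and the function names are all hypothetical, and the real per-entity load tracking uses a decaying average rather than a fixed percentage.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task: weight is the instantaneous load, like p->se.load.weight. */
struct demo_task {
    unsigned long weight;
    unsigned int  runnable_pct;   /* assumed: share of time runnable, 0..100 */
};

/* Step 1: accessor that returns the existing metric (a NOP change). */
static unsigned long task_load_instant(const struct demo_task *p)
{
    return p->weight;
}

/* Step 2: same interface, switched to an averaged ("effective") load. */
static unsigned long task_load_avg(const struct demo_task *p)
{
    return p->weight * p->runnable_pct / 100;
}

typedef unsigned long (*task_load_fn)(const struct demo_task *);

/* CPU load is the sum of its tasks' loads under the chosen metric. */
static unsigned long cpu_load(const struct demo_task *tasks, size_t n,
                              task_load_fn task_load)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += task_load(&tasks[i]);
    return sum;
}
```

In this model a CPU with three 10% tasks of weight 1024 reports 3072 under the instantaneous metric but 306 under the averaged one, while a CPU with one 100% task reports 1024 either way. That is the "3x10% vs 100%" case: only the averaged metric ranks the single busy task as the heavier CPU.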


Re: [RFC PATCH 00/13] sched: Integrating Per-entity-load-tracking with the core scheduler

2012-10-26 Thread Preeti U Murthy

On 10/26/2012 05:59 PM, Peter Zijlstra wrote:
> On Thu, 2012-10-25 at 23:42 +0530, Preeti U Murthy wrote:

> firstly, cfs_rq is the wrong place for a per-cpu load measure, secondly
> why add another load field instead of fixing the one we have? 
Hmm..,rq->load.weight is the place.

>> So why didnt I replace? I added cfs_rq->runnable_load_avg as an
>> additional metric instead of replacing the older metric.I let the old
>> metric be as a dead metric and used the newer metric as an
>> alternative.so if this alternate metric does not do us good we have the
>> old metric to fall back on.
> 
> That's wrong.. either it works and we can apply the patches or it
> doesn't and we won't. What you did makes it very hard to see you
> preserve the current balancer -- which afaict you don't, you change the
> balancer with the very first patch.
You are right on this Peter.
> 
> Why not update weighted_cpuload(), update_idle_cpu_load() and
> update_cpu_load_active() to use another metric and go from there. If you
> do that the whole balancer will auto-magically use the new weight
> measure.
> 
> Once you have that running, you can look at modifying it.
Hmm...Correct.

>>  a.find_busiest_group/find_idlest_group/update_sg_lb_stats:use sum of
>> cfs_rq->runnable_load_avg to decide this instead of sum of
>> cfs_rq->load.weight.
> 
> But the first patches are about adding new balancing conditions, that's
> a complete fail..
> 
>>  b.find_busiest_queue/find_idlest_queue: use cfs_rq->runnable_load_avg
>> to decide this instead of cfs_rq->load.weight
> 
> See, if you would have changed the input function (weighted_cpuload),
> you wouldn't have needed to touch any of this.
> 
I see your point.
>>  c.move_tasks: use se->avg.load_avg_contrib to decide the weight of of
>> each task instead of se->load.weight as the former reflects the runtime
>> of the sched entity and hence its actual load.
> 
> The changelog in that patch (7) is completely devoid of any useful
> information.
> 
>> This is what my patches3-13 do.Merely *Replace*.
>>
>> *Why am I doing it*: Feed the load balancer with a more realistic metric
>> to do load balancing and observe the consequences.
> 
> I know why you're doing the entire thing, but you fail to motivate each
> individual change. A changelog should read something like:
> 
>   current code does blah, this has a problem when blah, we can improve
> this doing blah because blah.
> 
Ah! I get it.

>>> you start out by some weird avoid short running task movement.
>>> why is that a good start?
>>
>> The short running tasks are not really burdening you,they have very
>> little run time,so why move them?
> 
> The most important part is why this would be a good start to the series,
> it is not.
> 
> The patch is also dubious at best; short running might be all you have,
> your definition of 'short' is also iffy.
My definition of 'short' was bursty loads. What I observed from using the
new metric to study the effective load calculation was that, when there are
around 2-3 such bursty loads, the effective load stays below the
threshold that I have stated, and I thought this would be a good enough
reason to let the loads stay on the cpu.

In my experiments, a bursty load is one that sleeps for 9ms out of every 10ms over a duration of
10s (a 10% load).
> 
>> Throughout the concept of load balancing the focus is on *balancing the
>> *existing* load* between the sched groups.But not really evaluating the
>> *absolute load* of any given sched group.
> 
> Which is why you're going to change the metrics.. the balancer really
> only cares about making load equal, flipping the metric of the load
> doesn't change anything fundamental.
Ok.
> 
>> Why is this *the start*? This is like a round of elimination before the
>> actual interview round  Try to have only those sched groups as
>> candidates for load balancing that are sufficiently loaded.[Patch1]
>> This *sufficiently loaded* is decided by the new metric explained in the
>> *How* above.
> 
> No, this is absolutely wrong.
> 
> 
> So a sane series would introduce maybe two functions: cpu_load() and
> task_load() and use those where we now use rq->load.weight and
> p->se.load.weight for load balancing purposes. Implement these functions
> using those two expression. So effectively this patch is a NOP.
> 
> Secondly, switch these two functions over to the per-task based
> averages.
> 
> Tada! all done. The load balancer will then try and equalize effective
> load instead of instant load.
> 
> It will do the 3x10% vs 100% thing correctly with just those two
> patches. Simply because it will report a lower cpu-load for the 3x10%
> case than it will for the 100% case, no need to go fudge about in the
> load-balance internals.
> 
> Once you've got this correctly done, you can go change balancing to
> better utilize the new metric, like use the effective load instead of
> nr_running against the capacity and things like that. But for every such
> change you want to be very careful and run all the

Re: [update] Re: new execve/kernel_thread design

2012-10-26 Thread Al Viro

On Fri, Oct 26, 2012 at 07:31:07PM +0100, Al Viro wrote:
>   The situation got much better by now.  More than a half of
> architectures are done - alpha arm arm64 c6x hexagon ia64 m68k mips openrisc
> parisc sparc tile um unicore32 and x86.
> 
>   Two more avait ACKs from maintainers - powerpc and s390.  Should work,
> AFAICS.
> 
>   xtensa - Max was going to repost updated patches; waiting for that
> to happen, but essentially it's done and tested.
> 
>   microblaze - Michal was debugging kernel_execve side of it the last
> time I've heard from him...
> 
>   frv, mn10300 - dhowells was going to test those
> 
>   sh - Paul Mundt was going to test and send fixes
> 
>   avr32, blackfin, cris, h8300, m32r, score - no signs of life from
> maintainers.  Folks, please show up and at least test the damn patchsets.
> Hell knows, they might even work - unicore32 one did, modulo trivial typo,
> to my deep surprise...

BTW, there's a tangentially related issue: several architectures have
very odd clone(2).  Namely, blackfin, h8300, no-MMU microblaze and sh64 (==sh5)
silently ignore child_tidptr and parent_tidptr arguments.  I.e. treat them
as NULL - or as if CLONE_PARENT_SETTID/CLONE_CHILD_SETTID/CLONE_CHILD_CLEARTID
were never set.  With the patchset in the local part of queue it would be
trivial to switch to normal semantics; strictly speaking, it's an ABI change.
Somebody doing
    n = 0x69696969;
    if (clone(CLONE_PARENT_SETTID, 0, &n) > 0) {
        if (n != 0x69696969) {
            printf("oh, shit, we are not on blackfin\n");
            exit(-1);
        }
    }
would run into a user-visible behaviour change, but IMO that's in the realm
of testing for known architecture-dependent bugs and finding them fixed...
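
For readers who want to try the check described above, a self-contained variant might look like the sketch below. It is not from the original mail. It assumes a Linux system where the raw clone system call takes its first three arguments in the order flags, new stack pointer, parent_tidptr (true on x86 and arm64, not on every architecture), and parent_settid_is_honoured() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/sched.h>    /* CLONE_PARENT_SETTID */
#include <signal.h>         /* SIGCHLD */
#include <sys/syscall.h>    /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>         /* syscall(), _exit() */

/*
 * Returns 1 if the kernel stored the child's TID through parent_tidptr,
 * 0 if the pointer was silently ignored, and -1 if clone failed.
 * Assumed raw argument order: flags, newsp, parent_tidptr, ...
 */
static int parent_settid_is_honoured(void)
{
    pid_t n = 0x69696969;
    long pid = syscall(SYS_clone, CLONE_PARENT_SETTID | SIGCHLD, 0, &n, 0, 0);

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: nothing to do */

    waitpid((pid_t)pid, NULL, 0);
    return n == (pid_t)pid;     /* parent: was n overwritten with the TID? */
}
```

Without CLONE_VM and with a zero stack pointer the call behaves like fork(), so the child simply exits and the parent inspects n. On an architecture that ignores parent_tidptr, n would still hold 0x69696969 and the function would return 0.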

Opinions, vetoes?  Should we preserve the current behaviour in this case?
I would obviously prefer to just go ahead and fix the sucker - the odds of
any actual software depending on that behaviour are pretty much nil.
Linus, does that cross the boundary between bug fix and ABI breakage?

Another curious thing happens on blackfin; there we subtract 12 from usp
when it's non-zero (zero == inherit the parent's usp, as always).  No idea
why is that done; this one definitely has to be preserved, so I'm just
wondering about the reasons behind that oddity...  Mike?

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Jim Rees

Theodore Ts'o wrote:

  The problem is this code isn't done yet, and journal_checksum is
  really not ready for prime time.  When it is ready, my plan is to wire
  it up so it is enabled by default; at the moment, it was intended for
  developer experimentation only.  As I said, it's my fault for not
  clearly labelling it "Not for you!", or putting it under an #ifdef to
  prevent unwary civilians from coming across the feature and saying,
  "oooh, shiny!" and turning it on.  :-(

Perhaps a word or two in the mount man page would be appropriate?

Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Bjorn Helgaas

On Fri, Oct 26, 2012 at 7:39 PM, Cyberman Wu  wrote:
> On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas  wrote:
>> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf  wrote:
>>
>>> Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
>>> says, this sounds like a driver bug.  What happens if you just revert your
>>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>>> just say "if (!res_len)"?  That seems like the true error test.  If that
>>> works, you should submit that change to the community.
>>
>> I don't *think* that is going to be enough, even with the kernel that
>> has some I/O space support, because both devices are assigned
>> identical resources:
>>
>>   pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
>>   pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
>>
>> The I/O space support that's there is broken because we think the same
>> I/O range is available on both root buses, which is probably not the
>> case:
>>
>>   pci_bus :00: resource 0 [io  0x-0x]
>>   pci_bus 0001:00: resource 0 [io  0x-0x]
>>
> That's the problem I want to confirm what I've changed is correct. I've split
> the two RootComplex using separate I/O range, it seems works on our device,
> but since I'm not very clear about Linux kernel, I want some some to check it.
> For mvsas, I've already modified it some thing like Chris said when I began
> using MDE-4.0.0 GA release. I bring it out to see if there have some ideas
> about that issue.

Some architectures do implement multiple I/O ranges.  Typical HP
parisc and ia64 boxes have a PCI host bridge for every slot, so each
slot can be in a separate PCI domain, and each host bridge can support
a separate 64KB I/O port space for its slot.  In that case, the values
in the struct resource will be different from the actual addresses
that appear on the PCI buses.

For example, you might have bridge A leading to bus :00 with [io
0x-0x] and bridge B leading to bus 0001:00 with [io
0x1-0x1].  The I/O port addresses used by drivers don't
overlap, and there's no ambiguity, but if you put an analyzer on bus
0001:00, you'd see port addresses in the 0x-0x range.  If you
moved the analyzer to bus :00, you'd see the same 0x-0x
range of port addresses.  It's up to the architecture implementation
of inb()/outb()/etc. to map an I/O resource address to a host bridge
and a bus port address behind that bridge.
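
The mapping described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not code from any architecture: it assumes each host bridge owns one contiguous 64KB window of the CPU-visible I/O resource space, with bridge 0 first, and io_resource_to_bus() and struct io_target are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define IO_WINDOW_SIZE 0x10000u   /* 64KB of port space per host bridge */
#define NR_BRIDGES     2          /* assumed: two root complexes */

struct io_target {
    unsigned int bridge;      /* which host bridge the access goes through */
    uint16_t     bus_port;    /* port address as seen on the bus behind it */
};

/*
 * Translate an I/O resource address (what drivers see in struct resource)
 * into a host bridge and the port address on that bridge's bus.
 * Returns 0 on success, -1 if no bridge claims the address.
 */
static int io_resource_to_bus(uint32_t res_addr, struct io_target *out)
{
    unsigned int bridge = res_addr / IO_WINDOW_SIZE;

    if (bridge >= NR_BRIDGES)
        return -1;
    out->bridge = bridge;
    out->bus_port = (uint16_t)(res_addr % IO_WINDOW_SIZE);
    return 0;
}
```

Under these assumptions two different resource addresses, one in each window, can land on the same bus port address behind different bridges. The resource addresses never overlap, yet an analyzer on either bus would show the same 64KB range, as in the example above.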

The bottom line is that what you want to do seems possible and makes
some sense.  Of course, the diff you posted is useless for upstream
Linux because it's all entangled with MDE and it reverts a lot of the
recent Linux work.  But what you want to do is possible in principle.
It's up to you and Chris to figure out whether and how to rework the
changes to add this functionality cleanly.

Bjorn

[PATCH RFT] leds: blinkm: Avoid calling INIT_WORK in blinkm_led_common_set()

2012-10-26 Thread Axel Lin

Calling INIT_WORK in blinkm_led_common_set() means we initialize a work item every
time the brightness_set callback is called.
Move INIT_WORK to blinkm_probe() so each work item is initialized only once.

This patch also refactors the data structures blinkm_data and blinkm_work:
embed struct blinkm_led in struct blinkm_work, and embed struct
blinkm_work in blinkm_data. With this change, we don't need to allocate and free
memory for bl_work in blinkm_led_common_set() and led_work().

Signed-off-by: Axel Lin 
---
Hi Jan-Simon,
  I don't have this hardware; could you help review and test this patch?
Thank you,
Axel
 drivers/leds/leds-blinkm.c |   30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/leds/leds-blinkm.c b/drivers/leds/leds-blinkm.c
index f7c3d7f..5e4cd9b 100644
--- a/drivers/leds/leds-blinkm.c
+++ b/drivers/leds/leds-blinkm.c
@@ -44,7 +44,8 @@ struct blinkm_led {
 };
 
 struct blinkm_work {
-   struct blinkm_led *blinkm_led;
+   /* used for led class interface */
+   struct blinkm_led led;
struct work_struct work;
 };
 
@@ -54,8 +55,7 @@ struct blinkm_work {
 struct blinkm_data {
struct i2c_client *i2c_client;
struct mutex update_lock;
-   /* used for led class interface */
-   struct blinkm_led blinkm_leds[3];
+   struct blinkm_work bl_work[3];
/* used for "blinkm" sysfs interface */
u8 red; /* color red */
u8 green;   /* color green */
@@ -446,12 +446,10 @@ static int blinkm_transfer_hw(struct i2c_client *client, 
int cmd)
 static void led_work(struct work_struct *work)
 {
int ret;
-   struct blinkm_led *led;
-   struct blinkm_data *data ;
struct blinkm_work *blm_work = work_to_blmwork(work);
+   struct blinkm_led *led = &blm_work->led;
+   struct blinkm_data *data = i2c_get_clientdata(led->i2c_client);
 
-   led = blm_work->blinkm_led;
-   data = i2c_get_clientdata(led->i2c_client);
ret = blinkm_transfer_hw(led->i2c_client, BLM_GO_RGB);
atomic_dec(>active);
dev_dbg(>i2c_client->dev,
@@ -459,7 +457,6 @@ static void led_work(struct work_struct *work)
" next_blue = %d, active = %d\n",
data->next_red, data->next_green,
data->next_blue, atomic_read(>active));
-   kfree(blm_work);
 }
 
 static int blinkm_led_common_set(struct led_classdev *led_cdev,
@@ -468,7 +465,7 @@ static int blinkm_led_common_set(struct led_classdev 
*led_cdev,
/* led_brightness is 0, 127 or 255 - we just use it here as-is */
struct blinkm_led *led = cdev_to_blmled(led_cdev);
struct blinkm_data *data = i2c_get_clientdata(led->i2c_client);
-   struct blinkm_work *bl_work;
+   struct blinkm_work *bl_work = &data->bl_work[color];
 
switch (color) {
case RED:
@@ -510,10 +507,6 @@ static int blinkm_led_common_set(struct led_classdev 
*led_cdev,
return -EINVAL;
}
 
-   bl_work = kzalloc(sizeof(*bl_work), GFP_ATOMIC);
-   if (!bl_work)
-   return -ENOMEM;
-
atomic_inc(&led->active);
dev_dbg(&led->i2c_client->dev,
"#TO_SCHED# next_red = %d, next_green = %d,"
@@ -521,9 +514,6 @@ static int blinkm_led_common_set(struct led_classdev 
*led_cdev,
data->next_red, data->next_green,
data->next_blue, atomic_read(&led->active));
 
-   /* a fresh work _item_ for each change */
-   bl_work->blinkm_led = led;
-   INIT_WORK(&bl_work->work, led_work);
/* queue work in own queue for easy sync on exit*/
schedule_work(&bl_work->work);
 
@@ -667,8 +657,9 @@ static int __devinit blinkm_probe(struct i2c_client *client,
}
 
for (i = 0; i < 3; i++) {
+   INIT_WORK(&data->bl_work[i].work, led_work);
/* RED = 0, GREEN = 1, BLUE = 2 */
-   led[i] = &data->blinkm_leds[i];
+   led[i] = &data->bl_work[i].led;
led[i]->i2c_client = client;
led[i]->id = i;
led[i]->led_cdev.max_brightness = 255;
@@ -746,13 +737,12 @@ exit:
 static int __devexit blinkm_remove(struct i2c_client *client)
 {
struct blinkm_data *data = i2c_get_clientdata(client);
-   int ret = 0;
-   int i;
+   int i, ret;
 
/* make sure no workqueue entries are pending */
for (i = 0; i < 3; i++) {
flush_scheduled_work();
-   led_classdev_unregister(&data->blinkm_leds[i].led_cdev);
+   led_classdev_unregister(&data->bl_work[i].led.led_cdev);
}
 
/* reset rgb */
-- 
1.7.9.5
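
The core idea of the patch above, embedding the work item next to the LED
state so that nothing has to be allocated per brightness change, can be
sketched in userspace C. This is only an illustration under stated
assumptions: struct work_item, struct led, struct led_work, led_from_work()
and device_init() are hypothetical stand-ins, not the driver's types, and
container_of() is re-implemented here because the kernel header is not
available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() (assumption). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_item { int pending; };	/* stand-in for struct work_struct */
struct led { int id; };			/* stand-in for struct blinkm_led */

/* The LED state lives next to its work item, as in the patch. */
struct led_work {
	struct led led;
	struct work_item work;
};

struct device_data {
	struct led_work bl_work[3];	/* one slot per color, set up once */
};

/* Recover the LED from the work pointer that a handler receives. */
static struct led *led_from_work(struct work_item *work)
{
	struct led_work *lw = container_of(work, struct led_work, work);

	return &lw->led;
}

/* One-time setup, comparable to the loop in the probe function. */
static void device_init(struct device_data *d)
{
	for (int i = 0; i < 3; i++) {
		d->bl_work[i].led.id = i;
		d->bl_work[i].work.pending = 0;
	}
}
```

Because every slot is initialized once and lives as long as the device data,
the handler never needs to free anything, which is why the kzalloc()/kfree()
pair can go away.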



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [sqlite] light weight write barriers

2012-10-26 Thread Vladislav Bolkhovitin



Theodore Ts'o, on 10/25/2012 09:50 AM wrote:

Yeah  I don't buy that.  One, flash is still too expensive.  Two,
the capital costs to build enough Silicon foundries to replace the
current production volume of HDD's is way too expensive for any
company to afford (the cloud providers are buying *huge* numbers of
HDD's) --- and that's assuming companies wouldn't choose to use those
foundries for products with larger margins --- such as, for example,
CPU/GPU chips. :-) And third and finally, if you study the long-term
trends in terms of Data Retention Time (going down), Program and Read
Disturb (going up), and Write Endurance (going down) as a function of
feature size and/or time, you'd be wise to treat flash as nothing more
than short-term cache, and not as a long term stable store.

If end users completely give up on HDD's, and store all of their
precious family pictures on flash storage, after a couple of years,
they are likely going to be very disappointed.

Speaking personally, I wouldn't want to have anything on flash for
more than a few months at *most* before I made sure I had another copy
saved on spinning rust platters for long-term retention.


Here I agree with you.

Vlad

Re: [sqlite] light weight write barriers

2012-10-26 Thread Vladislav Bolkhovitin



Theodore Ts'o, on 10/25/2012 01:14 AM wrote:

On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:

Yes, SCSI has full support for ordered/simple commands designed
exactly for that task: to have steady flow of commands even in case
when some of them are ordered.


SCSI does, yes --- *if* the device actually implements Tagged Command
Queuing (TCQ).  Not all devices do.

More importantly, SATA drives do *not* have this capability, and when
you compare the price of SATA drives to uber-expensive "enterprise
drives", it's not surprising that most people don't actually use
SCSI/SAS drives that have implemented TCQ.


What is different in our positions is that you consider storage to be something 
you can connect to your desktop, while in my view storage is something that 
stores data and serves it in the best possible way with the best performance.


Hence, for you the least common denominator of all storage features is the most 
important, while for me getting the best of what is possible from storage is the 
most important.


In my view storage should offload from the host system as much as possible: data 
movements, ordered operations requirements, atomic operations, deduplication, 
snapshots, reliability measures (eg RAIDs), load balancing, etc.


It's the same as with 2D/3D video acceleration hardware. If you want the best 
performance from your system, you should offload as much as possible from it: in 
the case of video, to the video hardware; in the case of storage, to the storage. 
As with video, the better the offload, the better the performance. At hundreds 
of thousands of IOPS this is clearly visible.


Price doesn't matter here, because it's a completely different topic.


SATA's Native Command
Queuing (NCQ) is not equivalent; this allows the drive to reorder
requests (in particular read requests) so they can be serviced more
efficiently, but it does *not* allow the OS to specify a partial,
relative ordering of requests.


And so? If SATA can't do it, does it mean that nobody else can do it either? I 
know plenty of non-SATA devices that can meet the ordering requirements you need.


Vlad

Re: [sqlite] light weight write barriers

2012-10-26 Thread Vladislav Bolkhovitin



Nico Williams, on 10/24/2012 05:17 PM wrote:

Yes, SCSI has full support for ordered/simple commands designed exactly for
that task: [...]

[...]

But historically, for some reason, Linux storage developers were stuck with
the "barriers" concept, which is obviously not the same as ORDERED commands,
and hence had a lot of trouble with its ambiguous semantics. As far as I can
tell, the reason was a lack of sufficiently deep SCSI understanding
(how to handle errors, the belief that ACA is something legacy from parallel
SCSI times, etc.).


Barriers are a very simple abstraction, so there's that.


It isn't simple at all. If you think for some time about barriers from the storage 
point of view, you will soon realize how bad and ambiguous they are.



Before that happens, people will keep returning again and again with those
simple questions: why must the queue be flushed for any ordered operation?
Isn't it obvious overkill?


That [cache flushing]


It isn't cache flushing, it's _queue_ flushing. You can call it queue draining, if 
you like.


Often there's a big difference where it's done: on the system side, or on the 
storage side.


Actually, performance improvements from NCQ in many cases come not from 
allowing the drive to reorder requests, as is commonly thought, but from 
keeping the drive's internal processing stages busy without any 
idle time. Drives often have a long internal pipeline. Hence the need to keep 
every stage of it always busy, and hence why using ORDERED commands is important 
for performance.



is not what's being asked for here. Just a
light-weight barrier.  My proposal works without having to add new
system calls: a) use a COW format, b) have background threads doing
fsync()s, c) in each transaction's root block note the last
known-committed (from a completed fsync()) transaction's root block,
d) have an array of well-known uberblocks large enough to accommodate
as many transactions as possible without having to wait for any one
fsync() to complete, e) do not reclaim space from any one past
transaction until at least one subsequent transaction is fully
committed.  This obtains ACI- transaction semantics (survives power
failures but without durability for the last N transactions at
power-failure time) without requiring changes to the OS at all, and
with support for delayed D (durability) notification.


I believe what you really want is to be able to send to the storage a sequence of 
your favorite operations (FS operations, async IO operations, etc.) like:


Write back caching disabled:

data op11, ..., data op1N, ORDERED data op1, data op21, ..., data op2M, ...

Write back caching enabled:

data op11, ..., data op1N, ORDERED sync cache, ORDERED FUA data op1, data op21, 
..., data op2M, ...


Right?

(ORDERED means that it is guaranteed that this ordered command will never, under 
any circumstances, be executed before any previous command has completed or after 
any subsequent command has completed.)
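
The ORDERED rule stated above can be checked mechanically. The following is
a minimal sketch, not how any SCSI target implements it: enum tag,
position_of() and order_is_valid() are hypothetical names, commands are
identified by their submission index, and "executed" is simplified to a
single position in a completion sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tag { SIMPLE, ORDERED };

/* Position at which submitted command cmd was executed, or n if absent. */
static size_t position_of(const size_t *exec, size_t n, size_t cmd)
{
	for (size_t j = 0; j < n; j++)
		if (exec[j] == cmd)
			return j;
	return n;
}

/*
 * tags[i] is the tag of the i-th submitted command; exec[j] is the
 * submission index of the command executed j-th (a permutation of
 * 0..n-1). SIMPLE commands may be reordered freely, but never across
 * an ORDERED command.
 */
static bool order_is_valid(const enum tag *tags, const size_t *exec, size_t n)
{
	for (size_t k = 0; k < n; k++) {
		if (tags[k] != ORDERED)
			continue;
		size_t pk = position_of(exec, n, k);

		for (size_t i = 0; i < n; i++) {
			size_t pi = position_of(exec, n, i);

			if (i < k && pi > pk)	/* earlier command ran after k */
				return false;
			if (i > k && pi < pk)	/* later command ran before k */
				return false;
		}
	}
	return true;
}
```

With the sequence SIMPLE, SIMPLE, ORDERED, SIMPLE, SIMPLE, the device may
swap the first two commands or the last two, but may not move anything
across the third. No queue drain on the host side is needed to express that.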


Vlad

[PATCH v5] pwm: vt8500: Update vt8500 PWM driver support

2012-10-26 Thread Tony Prisk

This patch updates pwm-vt8500.c to support devicetree probing and
make use of the common clock subsystem.

A binding document describing the PWM controller found on
arch-vt8500 is also included.

Signed-off-by: Tony Prisk 
---
v4:
return err from clk_enable rather than -EBUSY
v5:
replace IS_ERR_OR_NULL with IS_ERR as pointed out by Chris Brand

 .../devicetree/bindings/pwm/vt8500-pwm.txt |   17 
 drivers/pwm/pwm-vt8500.c   |   86 ++--
 2 files changed, 80 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pwm/vt8500-pwm.txt

diff --git a/Documentation/devicetree/bindings/pwm/vt8500-pwm.txt 
b/Documentation/devicetree/bindings/pwm/vt8500-pwm.txt
new file mode 100644
index 000..bcc6367
--- /dev/null
+++ b/Documentation/devicetree/bindings/pwm/vt8500-pwm.txt
@@ -0,0 +1,17 @@
+VIA/Wondermedia VT8500/WM8xxx series SoC PWM controller
+
+Required properties:
+- compatible: should be "via,vt8500-pwm"
+- reg: physical base address and length of the controller's registers
+- #pwm-cells: should be 2.  The first cell specifies the per-chip index
+  of the PWM to use and the second cell is the period in nanoseconds.
+- clocks: phandle to the PWM source clock
+
+Example:
+
+pwm1: pwm@d822 {
+   #pwm-cells = <2>;
+   compatible = "via,vt8500-pwm";
+   reg = <0xd822 0x1000>;
+   clocks = <>;
+};
diff --git a/drivers/pwm/pwm-vt8500.c b/drivers/pwm/pwm-vt8500.c
index ad14389..970b0c6 100644
--- a/drivers/pwm/pwm-vt8500.c
+++ b/drivers/pwm/pwm-vt8500.c
@@ -1,7 +1,8 @@
 /*
  * drivers/pwm/pwm-vt8500.c
  *
- *  Copyright (C) 2010 Alexey Charkov 
+ * Copyright (C) 2012 Tony Prisk 
+ * Copyright (C) 2010 Alexey Charkov 
  *
  * This software is licensed under the terms of the GNU General Public
  * License version 2, as published by the Free Software Foundation, and
@@ -21,14 +22,24 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-#define VT8500_NR_PWMS 4
+#include 
+#include 
+#include 
+
+/*
+ * SoC architecture allocates register space for 4 PWMs but only
+ * 2 are currently implemented.
+ */
+#define VT8500_NR_PWMS 2
 
 struct vt8500_chip {
struct pwm_chip chip;
void __iomem *base;
+   struct clk *clk;
 };
 
 #define to_vt8500_chip(chip)   container_of(chip, struct vt8500_chip, chip)
@@ -52,7 +63,7 @@ static int vt8500_pwm_config(struct pwm_chip *chip, struct 
pwm_device *pwm,
unsigned long long c;
unsigned long period_cycles, prescale, pv, dc;
 
-   c = 2500/2; /* wild guess --- need to implement clocks */
+   c = clk_get_rate(vt8500->clk);
c = c * period_ns;
do_div(c, 10);
period_cycles = c;
@@ -85,8 +96,15 @@ static int vt8500_pwm_config(struct pwm_chip *chip, struct 
pwm_device *pwm,
 
 static int vt8500_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
 {
+   int err;
struct vt8500_chip *vt8500 = to_vt8500_chip(chip);
 
+   err = clk_enable(vt8500->clk);
+   if (err < 0) {
+   dev_err(chip->dev, "failed to enable clock\n");
+   return err;
+   }
+
pwm_busy_wait(vt8500->base + 0x40 + pwm->hwpwm, (1 << 0));
writel(5, vt8500->base + (pwm->hwpwm << 4));
return 0;
@@ -98,6 +116,8 @@ static void vt8500_pwm_disable(struct pwm_chip *chip, struct 
pwm_device *pwm)
 
pwm_busy_wait(vt8500->base + 0x40 + pwm->hwpwm, (1 << 0));
writel(0, vt8500->base + (pwm->hwpwm << 4));
+
+   clk_disable(vt8500->clk);
 }
 
 static struct pwm_ops vt8500_pwm_ops = {
@@ -107,12 +127,24 @@ static struct pwm_ops vt8500_pwm_ops = {
.owner = THIS_MODULE,
 };
 
-static int __devinit pwm_probe(struct platform_device *pdev)
+static const struct of_device_id vt8500_pwm_dt_ids[] = {
+   { .compatible = "via,vt8500-pwm", },
+   { /* Sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, vt8500_pwm_dt_ids);
+
+static int vt8500_pwm_probe(struct platform_device *pdev)
 {
struct vt8500_chip *chip;
struct resource *r;
+   struct device_node *np = pdev->dev.of_node;
int ret;
 
+   if (!np) {
+   dev_err(&pdev->dev, "invalid devicetree node\n");
+   return -EINVAL;
+   }
+
chip = devm_kzalloc(&pdev->dev, sizeof(*chip), GFP_KERNEL);
if (chip == NULL) {
dev_err(&pdev->dev, "failed to allocate memory\n");
@@ -124,6 +156,12 @@ static int __devinit pwm_probe(struct platform_device 
*pdev)
chip->chip.base = -1;
chip->chip.npwm = VT8500_NR_PWMS;
 
+   chip->clk = devm_clk_get(&pdev->dev, NULL);
+   if (IS_ERR(chip->clk)) {
+   dev_err(&pdev->dev, "clock source not specified\n");
+   return PTR_ERR(chip->clk);
+   }
+
r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (r == NULL) {
dev_err(&pdev->dev, "no memory resource defined\n");
@@ -131,18 +169,26 @@ static int __devinit pwm_probe(struct platform_device 
*pdev)
}
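
The period computation touched by the vt8500_pwm_config() hunk above
(multiply the clock rate by the period, then divide) can be sketched as
plain C. This is a sketch under assumptions: period_to_cycles() is a
hypothetical helper, the divisor is assumed to be the number of nanoseconds
per second because the period is given in nanoseconds, and the prescaler
logic that follows in the driver is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed divisor: nanoseconds per second. */
#define NSEC_PER_SEC 1000000000ULL

/*
 * Number of source-clock cycles in one PWM period.
 * clk_rate_hz plays the role of the clk_get_rate() result.
 */
static uint64_t period_to_cycles(uint64_t clk_rate_hz, uint64_t period_ns)
{
	/* 64-bit intermediate, like the unsigned long long in the driver */
	return clk_rate_hz * period_ns / NSEC_PER_SEC;
}
```

For example, a 12.5 MHz source clock and a 1 ms period give 12500 cycles.
Reading the rate from the clock framework instead of hard-coding it keeps
this result correct when the source clock differs between SoCs.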

Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Cyberman Wu

On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas  wrote:
> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf  wrote:
>
>> Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
>> says, this sounds like a driver bug.  What happens if you just revert your
>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>> just say "if (!res_len)"?  That seems like the true error test.  If that
>> works, you should submit that change to the community.
>
> I don't *think* that is going to be enough, even with the kernel that
> has some I/O space support, because both devices are assigned
> identical resources:
>
>   pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
>   pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
>
> The I/O space support that's there is broken because we think the same
> I/O range is available on both root buses, which is probably not the
> case:
>
>   pci_bus :00: resource 0 [io  0x-0x]
>   pci_bus 0001:00: resource 0 [io  0x-0x]
>
That is exactly the problem: I want to confirm that what I've changed is
correct. I've split the two root complexes so that they use separate I/O
ranges. It seems to work on our device, but since I don't know the Linux
kernel very well, I would like someone to check it.
For mvsas, I had already modified it much as Chris suggests when I began
using the MDE-4.0.0 GA release. I brought it up to see if anyone has ideas
about that issue.
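
The difference between the two tests discussed for mvsas can be shown with a
small sketch. It assumes nothing about the real driver beyond the condition
quoted in the thread: struct bar_resource, bar_usable_strict() and
bar_usable() are hypothetical names, and the 0x80 length in the usage note
is only an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one BAR: bus address and size in bytes. */
struct bar_resource {
	uint64_t start;
	uint64_t len;
};

/* Mirrors "if (!res_start || !res_len)": start address 0 is rejected. */
static bool bar_usable_strict(const struct bar_resource *r)
{
	return r->start != 0 && r->len != 0;
}

/* Mirrors the suggested "if (!res_len)": only an empty BAR is rejected. */
static bool bar_usable(const struct bar_resource *r)
{
	return r->len != 0;
}
```

An I/O BAR assigned at address 0 with a length of 0x80 fails the strict test
but passes the length-only test, which is why the strict form looks like a
driver bug rather than a platform problem.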

> If mvsas really doesn't need the I/O BAR, I think it's likely that
> making it use pci_enable_device_mem() will make both devices work even
> without I/O space support in the kernel.
>
>> Bjorn et al: does it seem reasonable to add a bias to the mappings so that
>> we never report a zero value as valid?  This may be sufficiently defensive
>> programming that it's just the right thing to do regardless of whether
>> drivers are technically at fault or not.  If so, what's a good bias?  (I'm
>> inclined to think 64K rather than 4K.)
>
> I/O space is very limited to begin with (many architectures only
> *have* 64K), so I hesitate to add a bias in the PCI core.  But we do
> something similar in arch_remove_reservations(), and I think you could
> implement it that way if you wanted to.
>
> Bjorn



-- 
Cyberman Wu

[PATCH v3] firmware loader: introduce module parameter to customize fw search path

2012-10-26 Thread Ming Lei

This patch introduces one module parameter of 'path' in firmware_class
to support customizing firmware image search path, so that people can
use its own firmware path if the default built-in paths can't meet their
demand[1], and the typical usage is passing the below from kernel command
parameter when 'firmware_class' is built in kernel:

firmware_class.path=$CUSTOMIZED_PATH

[1], https://lkml.org/lkml/2012/10/11/337

Cc: Linus Torvalds 
Signed-off-by: Ming Lei 
---
v3
- fix one mistake on checking unset firmware path

v2
- take a cleaner approach suggested by Linus
- mark the path array as const because it needn't be changed
- fix one error in Document about the module name

v1:
- remove kernel boot parameter and only support the feature by
module parameter as suggested by Greg
---
 Documentation/firmware_class/README |5 +
 drivers/base/firmware_class.c   |   17 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware_class/README 
b/Documentation/firmware_class/README
index 815b711..e9fce78 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -22,12 +22,17 @@
- calls request_firmware(&fw_entry, $FIRMWARE, device)
- kernel searchs the fimware image with name $FIRMWARE directly
in the below search path of root filesystem:
+   User customized search path by module parameter 'path'[1]
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
- If found, goto 7), else goto 2)
 
+   [1], the 'path' is a string parameter which length should be less
+   than 256, user should pass 'firmware_class.path=$CUSTOMIZED_PATH'
+   if firmware_class is built in kernel(the general situation)
+
  2), userspace:
- /sys/class/firmware/xxx/{loading,data} appear.
- hotplug gets called with a firmware identifier in $FIRMWARE
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..62568c2 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -267,13 +267,23 @@ static void fw_free_buf(struct firmware_buf *buf)
 }
 
 /* direct firmware loading support */
-static const char *fw_path[] = {
+static char fw_path_para[256];
+static const char * const fw_path[] = {
+   fw_path_para,
"/lib/firmware/updates/" UTS_RELEASE,
"/lib/firmware/updates",
"/lib/firmware/" UTS_RELEASE,
"/lib/firmware"
 };
 
+/*
+ * Typical usage is that passing 'firmware_class.path=$CUSTOMIZED_PATH'
+ * from kernel command because firmware_class is generally built in
+ * kernel instead of module.
+ */
+module_param_string(path, fw_path_para, sizeof(fw_path_para), 0644);
+MODULE_PARM_DESC(path, "customized firmware image search path with a higher 
priority than default path");
+
 /* Don't inline this: 'struct kstat' is biggish */
 static noinline long fw_file_size(struct file *file)
 {
@@ -315,6 +325,11 @@ static bool fw_get_filesystem_firmware(struct firmware_buf 
*buf)
 
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
struct file *file;
+
+   /* skip the unset customized path */
+   if (!fw_path[i][0])
+   continue;
+
snprintf(path, PATH_MAX, "%s/%s", fw_path[i], buf->fw_id);
 
file = filp_open(path, O_RDONLY, 0);
-- 
1.7.9.5
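
The lookup order introduced by this patch (custom path first, skipped while
it is unset, then the built-in directories) can be sketched in userspace C.
Assumptions: fw_candidate() is a hypothetical helper that only builds the
path strings and does not open files, and the two UTS_RELEASE entries are
left out because that macro only exists in a kernel build.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Empty until a custom path is set, like the module parameter buffer. */
static char fw_path_para[256];

static const char * const fw_path[] = {
	fw_path_para,			/* highest priority when non-empty */
	"/lib/firmware/updates",
	"/lib/firmware",
};

/*
 * Build the idx-th candidate file name, counting only directories that
 * are actually set. Returns 1 on success, 0 if no such candidate exists.
 */
static int fw_candidate(size_t idx, const char *fw_id, char *out, size_t outlen)
{
	size_t seen = 0;

	for (size_t i = 0; i < ARRAY_SIZE(fw_path); i++) {
		/* skip the unset customized path */
		if (!fw_path[i][0])
			continue;
		if (seen++ == idx) {
			snprintf(out, outlen, "%s/%s", fw_path[i], fw_id);
			return 1;
		}
	}
	return 0;
}
```

With fw_path_para empty, the first candidate for "a.bin" is
"/lib/firmware/updates/a.bin"; after copying "/opt/fw" (an example value)
into fw_path_para it becomes "/opt/fw/a.bin". Without the empty-string check
the unset entry would produce a candidate such as "/a.bin".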


Re: [RFC v2 0/2] vmevent: A bit reworked pressure attribute + docs + man page

2012-10-26 Thread Anton Vorontsov

On Fri, Oct 26, 2012 at 11:37:20AM +0900, Minchan Kim wrote:
[...]
> > > Of course, it's very flexible and potential to add new VM knob easily but
> > > the thing we is about to use now is only VMEVENT_ATTR_PRESSURE.
> > > Is there any other use cases for swap or free? or potential user?
> > 
> > Number of idle pages by itself might be not that interesting, but
> > cache+idle level is quite interesting.
> > 
> > By definition, _MED happens when performance already degraded, slightly,
> > but still -- we can be swapping.
> > 
> > But _LOW notifications are coming when kernel is just reclaiming, so by
> > using _LOW notifications + watching for cache level we can very easily
> > predict the swapping activity long before we have even _MED pressure.
> 
> So, for seeing cache level, we need new vmevent_attr?

Hopefully, not. We're not interested in the raw values of the cache level,
but what we want is to tell the kernel how many "easily reclaimable
pages" userland has, and get notified when the kernel believes that it's a
good time for userland to help. I.e. this new _MILD level:

> > Maybe it makes sense to implement something like PRESSURE_MILD with an
> > additional nr_pages threshold, which basically hints the kernel about how
> > many easily reclaimable pages userland has (that would be a part of our
> > definition for the mild pressure level). So, essentially it will be
> > 
> > if (pressure_index >= oom_level)
> > return PRESSURE_OOM;
> > else if (pressure_index >= med_level)
> > return PRESSURE_MEDIUM;
> > else if (userland_reclaimable_pages >= nr_reclaimable_pages)
> > return PRESSURE_MILD;
> > return PRESSURE_LOW;
> > 
> > I must admit I like the idea more than exposing NR_FREE and stuff, but the
> > scheme reminds me the blended attributes, which we abandoned. Although,
> > the definition sounds better now, and we seem to be doing it in the right
> > place.
> > 
> > And if we go this way, then sure, we won't need any other attributes, and
> > so we could make the API much simpler.
> 
> That's what I want! If there isn't any user who really are willing to use it,
> let's drop it. Do not persuade with imaginary scenario because we should be 
> careful to introduce new ABI.

Yeah, I think you're right. Let's make the vmevent_fd slim first. I won't
even focus on the _MILD/_BALANCE level for now, we can do it later, and we
always have the /proc/vmstat even if the _MILD turns out to be a bad idea.

Reading /proc/vmstat is a bit more overhead, but it's not that much at all
(especially when we don't have to timer-poll the vmstat).
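
For reference, the level selection quoted earlier in this mail can be
written out as a small, self-contained C function. This is only a sketch of
the quoted pseudo-code, not of any kernel interface: struct thresholds and
classify() are hypothetical names, and the threshold numbers in the usage
note are example values.

```c
#include <assert.h>

enum pressure {
	PRESSURE_LOW,
	PRESSURE_MILD,
	PRESSURE_MEDIUM,
	PRESSURE_OOM,
};

/* Hypothetical container for the limits named in the pseudo-code. */
struct thresholds {
	unsigned int med_level;
	unsigned int oom_level;
	unsigned long nr_reclaimable_pages;
};

/* Checks run from most to least severe, in the order they were quoted. */
static enum pressure classify(unsigned int pressure_index,
			      unsigned long userland_reclaimable_pages,
			      const struct thresholds *t)
{
	if (pressure_index >= t->oom_level)
		return PRESSURE_OOM;
	if (pressure_index >= t->med_level)
		return PRESSURE_MEDIUM;
	if (userland_reclaimable_pages >= t->nr_reclaimable_pages)
		return PRESSURE_MILD;
	return PRESSURE_LOW;
}
```

With med_level = 60, oom_level = 99 and nr_reclaimable_pages = 1000, an
index of 10 with 2000 reclaimable pages yields PRESSURE_MILD, while the same
index with 10 pages yields PRESSURE_LOW.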

> > > Adding vmevent_fd without them is rather overkill.
> > > 
> > > And I want to avoid timer-base polling of vmevent if possbile.
> > > mem_notify of KOSAKI doesn't use such timer.
> > 
> > For pressure notifications we don't use the timers. We also read the
> 
> Hmm, when I see the code, timer still works and can notify to user. No?

Yes, I was mostly saying that it is technically not required anymore, but
you're right, the code still fires the timer (it just runs needlessly for
the pressure attr).

Bad wording on my side.

[..]
> > We can do it via eventfd, or /dev/chardev (which has been discussed and
> > people didn't like it, IIRC), or signals (which also has been discussed
> > and there are problems with this approach as well).
> > 
> > I'm not sure why having a syscall is a big issue. If we're making eventfd
> > interface, then we'd need to maintain /sys/.../ ABI the same way as we
> > maintain the syscall. What's the difference? A dedicated syscall is just a
> 
> No difference. What I want is just to remove unnecessary stuff in vmevent_fd
> and keep it as simple. If we do via /dev/chardev, I expect we can do necessary
> things for VM pressure. But if we can diet with vmevent_fd, It would be 
> better.
> If so, maybe we have to change vmevent_fd to lowmem_fd or
> vmpressure_fd.

Sure, then I'm starting the work to slim the API down, and we'll see how
things are going to look after that.

Thanks a lot!

Anton.

target-pending updated to v3.7-rc2

2012-10-26 Thread Nicholas A. Bellinger

Hi folks,

This is just a friendly heads up that target-pending has been updated to
latest v3.7-rc2 mainline code.  As usual, the for-next* branches have
been recreated from master and include in-flight target development
patches destined for upstream during the v3.8 merge window.

Also, a heads up for bleeding edge users of for-next/auto-next branches.
There are some SCSI reservations related items that still need to be
resolved for-3.8 code, so please be careful when using SCSI reservations
logic with the current bleeding edge target-pending code.

Thanks!

--nab




Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Theodore Ts'o

On Fri, Oct 26, 2012 at 10:19:21PM +0100, Nix wrote:
> > prevent unwary civilians from coming across the feature and saying,
> > "oooh, shiny!" and turning it on.  :-(
> 
> Or having it turned on by default either, which seems to be the case
> now.

Huh?  It's not turned on by default.  If you mount with no mount
options, journal checksums are *not* turned on.

 - Ted

Re: ext4 issue after unclean shutdown and reboot

2012-10-26 Thread Theodore Ts'o

I've bcc'ed sta...@vger.kernel.org, LKML, and greg-kh, since I suspect
they aren't interested in all of these details... we'll keep this on
linux-ext4 for sanity's sake.

On Sat, Oct 27, 2012 at 01:15:42AM +0200, Martin wrote:
> 
> sorry for the repetition, but Theodore Ts'o asked me to re-post this
> as a new mail thread in order to keep track of this more easily.

1)  Is it repeatable?  This is what happened once.  Can you reliably reproduce 
it?

2)  Can you send the full dmesg of the previous boot and the boot where
you found the problem?

3)  Can you send me the fsck logs?

4)  Can you send me the output dumpe2fs on the file system in question?

Thanks

- Ted

Re: Linix-3.6.3 sda, sdb drives in reverse order (with a USB 2.0 drives and a monolithic kernel configuration)

2012-10-26 Thread Wallak


Chris Friesen wrote:

On 10/26/2012 01:43 PM, Wallak wrote:

Chris Friesen wrote:

On 10/25/2012 04:49 PM, Wallak wrote:
I've a very annoying behavior with the linux-3.6.x kernel releases and
a monolithic configuration. The USB 2.0 drives are mapped first to the
/dev/sda, /dev/sdb... devices, and then the SATA AHCI drives come after.
This is out of order with the BIOS configuration and breaks a program
like lilo. This is also annoying when we use a static partition mapping.

Linux-3.5 works fine. Where does this bug come from? Is there a patch to
get the old, classical behavior?


As you have discovered it's fragile to rely on /dev/sd* names since a
BIOS update, kernel update, or motherboard replacement could
conceivably cause them to change.

Better to use something like partition labels that you control and
that don't change.

Chris


You are right, when we have a configuration with a lot of drives and
adapters (SATA, old SCSI, etc.) the order may change. But having the
main SATA hard drive, defined as the BIOS boot device, behind external
and removable USB drives is in my opinion a bug. It may also lead to
security issues (drives with the same label, etc.).

Using =LABEL or =UUID with a bootloader like grub or lilo saves the
boot device's mapped drive partition number, so booting an older
kernel like linux 3.5 will fail. If we remove the external USB drive,
the boot process will fail too...

So such a bug has to be fixed.


If you specify "root=LABEL=" as part of the kernel boot args in 
grub does it not check the label at boot time?


Using root=LABEL= or root=UUID= doesn't work on a plain kernel; this
feature may be handled by an initrd trick. For all non-root
partitions, UUID= works fine.
Nevertheless, not fixing this bug causes other issues: using lilo
to launch a second OS (other= option) fails, because the command tries to
parse the partitions available on /dev/sda and misses the real main HDD.
The boot drive must be forced with lilo options...
SATA drives have, most of the time, no reason to be behind USB drives.
If we want reliable behavior, /dev/sda must be mapped to the
BIOS boot device. Using the same behavior as linux-3.5 would be fine.


Wallak.



Chris




Re: [PATCH 2/3 v2] firmware: Add /proc/firmware_path entry to list the firmware paths

2012-10-26 Thread Greg Kroah-Hartman

On Fri, Oct 26, 2012 at 11:07:31AM +0100, Dimitris Papastamos wrote:
> On Thu, Oct 25, 2012 at 11:48:33AM -0700, Greg Kroah-Hartman wrote:
> > On Tue, Oct 23, 2012 at 01:52:55PM +0100, Dimitris Papastamos wrote:
> > > This patch provides the aforementioned procfs file that lists
> > > the default firmware paths that are used during firmware lookup.
> > > 
> > > The file contains a white space separated list of paths.
> > 
> > Paths can have whitespaces :(
> > 
> > How about using the "universal" path separator of ':' that shells are
> > used to using?  Yeah, it gets messy if you have a ':' in a path, but
> > from what I can tell, only the openSUSE build system does looney things
> > like that.
> 
> Sure I can do that, however, could you comment on Ming's comments?  Is
> this patchset something we find useful at the moment as regards listing
> and/or updating the firmware path list.

I really don't remember anymore, sorry.  That was a few hundred or so
patches ago that I've reviewed...

greg k-h

Re: [PATCH] debugfs: add get/set for atomic types

2012-10-26 Thread Greg Kroah-Hartman

On Fri, Oct 26, 2012 at 01:46:38PM -0500, Seth Jennings wrote:
> On 10/26/2012 11:13 AM, Greg Kroah-Hartman wrote:
> > On Fri, Oct 26, 2012 at 10:49:01AM -0500, Seth Jennings wrote:
> >> debugfs currently lacks the ability to create attributes
> >> that set/get atomic_t values.
> >>
> >> This patch adds support for this through a new
> >> debugfs_create_atomic_t() function.
> > 
> > Why would you want to set an atomic variable type from userspace?
> 
> In my particular use case, I'm not setting from userspace, only getting.

That's good, you wouldn't want to set an atomic from userspace.

> All the other types are symmetric, being both settable and gettable,
> so I was just trying to blend with the other types.

Fair enough.

> > What in-kernel code needs this?
> 
> None, yet.  I am working on code that does.
> 
> If you would rather me send this as part of the patchset for that
> code, I can.  I just wanted to get this in early if I could since it
> is functionality I can see others using.
> 
> For an example, see drivers/staging/ramster/zcache-main.c:132 where
> both atomic and non-atomic version of stats are being maintained so
> that the non-atomic version can be read by debugfs.
> 
> Beside not having an in-kernel user yet, does this approach look good
> to you?

It looks reasonable, but please hold off and submit this as part of a
patch series that needs it.  No need to add it now.

thanks,

greg k-h
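
The duplicated-counter pattern Seth points at can be sketched in userspace
with C11 atomics. This is only an analogue under stated assumptions: the
names are hypothetical, and <stdatomic.h> stands in for the kernel's
atomic_t, which has its own API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical statistic, loosely modelled on the pattern described above. */
atomic_long pages_stored;	/* the counter that is updated concurrently */
long pages_stored_shadow;	/* plain copy kept only so a reader can see it */

/* Shadow-copy pattern: every update also refreshes the plain copy. */
void store_page_with_shadow(void)
{
	pages_stored_shadow = atomic_fetch_add(&pages_stored, 1) + 1;
}

/* Direct pattern: update only the atomic counter ... */
void store_page(void)
{
	atomic_fetch_add(&pages_stored, 1);
}

/* ... and let a read-only getter load it when asked. */
long read_pages_stored(void)
{
	return atomic_load(&pages_stored);
}
```

With a getter that reads the atomic directly, the plain copy and the code
that keeps it in sync are no longer needed.
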

Re: kmemleak report on isp1763 and sierra MC8705

2012-10-26 Thread Greg KH

On Fri, Oct 26, 2012 at 05:57:23PM -0400, Richard Retanubun wrote:
> Hi Guys,
> 
> I am debugging a reported kmemleak involving a sierra wireless MC8705 connected
> through isp1763 on powerpc linux-3.0.22

Does this also happen on 3.6.3?

thanks,

greg k-h

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Martin


On 10/26/2012 11:10 PM, Theodore Ts'o wrote:
> This looks very different.  The symptoms are quite different, and it's
> most likely that an unclean shutdown is involved.  In your case,
> you're doing clean shutdowns, with some suspend/resume cycles thrown
> in.

No no, the case I reported was triggered by an unclean shutdown: my son
hitting the power button after a system crash, or more likely when the
graphics subsystem became unresponsive.

> Are you running e2fsck to fix the file system consistency problems;
> what is e2fsck reporting?

By now it attests a clean bill of health. At first it reported issues,
the precise nature of which escapes my memory, and fixed them; after
the next reboot it reported some more issues, which again were fixed. Had I
known this would look similar to a prominent issue, I would have paid more
attention.

> Do you need to have a suspend/resume in order to trigger the problem?

No, I just mentioned the suspend/resume cycles to explain what is going
on in the syslog, which I didn't attach in the end. While the problem
was building up there was no suspend/resume event.

> This could very well be some kind of hardware problem or kernel bug related
> to suspend/resume.  Unfortunately, many different problems get noticed
> by the file system, but the root cause can often be something else;
> a hardware problem, or a bug somewhere else in the kernel.

I hear what you are saying. I just want to add that the hardware has
survived the past two or three years despite suspend/resume and the odd
abusive treatment (like unclean shutdown by non-techie users). I tend to
keep the kernel, patches, modules and user land up to date.

> Regards,
>
> - Ted
>
> P.S.  Can you do us a favor and start a separate mail thread with the
> information reposted?  It can get hard to track different cases when
> a lot of people assume that their random failure (some of which are
> hardware problems) are related to the issue we are trying to track
> down in this mail thread and then they all pile onto the same mail
> thread or the same web forum --- one of the reasons why I detest
> Ubuntu Launchpad.  Thanks!!

Shall do.

cu Martin


[PATCH v2] mm: memmap_init_zone() performance improvement

2012-10-26 Thread Mike Yoknis

memmap_init_zone() loops through every Page Frame Number (pfn),
including pfn values that are within the gaps between existing
memory sections.  The unneeded looping will become a boot
performance issue when machines configure larger memory ranges
that will contain larger and more numerous gaps.

The code will skip across invalid pfn values to reduce the
number of loops executed.

Signed-off-by: Mike Yoknis 
---
 mm/page_alloc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 45c916b..9f9c1a6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3857,8 +3857,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 * exist on hotplugged memory.
 */
if (context == MEMMAP_EARLY) {
-   if (!early_pfn_valid(pfn))
+   if (!early_pfn_valid(pfn)) {
+   pfn = ALIGN(pfn + MAX_ORDER_NR_PAGES,
+   MAX_ORDER_NR_PAGES) - 1;
continue;
+   }
if (!early_pfn_in_nid(pfn, nid))
continue;
}
-- 
1.7.11.3

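The effect of the skip can be checked with a small userspace model. This is
a sketch under stated assumptions: BLOCK_PAGES stands in for
MAX_ORDER_NR_PAGES with an illustrative value, and pfn_is_valid() is a
hypothetical stand-in for early_pfn_valid() with a made-up memory layout.

```c
#include <assert.h>

/* Illustrative value only; the kernel derives MAX_ORDER_NR_PAGES itself. */
#define BLOCK_PAGES 1024UL

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical layout: only pfns in [4096, 6144) exist. */
int pfn_is_valid(unsigned long pfn)
{
	return pfn >= 4096 && pfn < 6144;
}

/*
 * Walk [start, end) and return the number of loop iterations. With
 * skip != 0, an invalid pfn moves the cursor with the same expression
 * as the patch, and the loop's own pfn++ then lands on an aligned
 * block boundary. Note that for an unaligned pfn the expression
 * rounds up from pfn + BLOCK_PAGES, so the cursor can advance by more
 * than one block.
 */
unsigned long count_iterations(unsigned long start, unsigned long end,
			       int skip, unsigned long *valid_seen)
{
	unsigned long pfn, iterations = 0;

	*valid_seen = 0;
	for (pfn = start; pfn < end; pfn++) {
		iterations++;
		if (!pfn_is_valid(pfn)) {
			if (skip)
				pfn = ALIGN_UP(pfn + BLOCK_PAGES, BLOCK_PAGES) - 1;
			continue;
		}
		(*valid_seen)++;
	}
	return iterations;
}
```

For the range [0, 8192) in this model, the plain loop runs 8192 times and the
skipping loop runs 2054 times; both see the same 2048 valid pfns.
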


Re: [ANNOUNCE] 3.6.3-rt7

2012-10-26 Thread Anca Emanuel

On Sat, Oct 27, 2012 at 1:08 AM, Thomas Gleixner  wrote:
>
> Looks like CONFIG_NUMA=y exposes explosions. I just noticed that none
> of the machines which are in my basic set of test systems have that
> enabled.
>
> /me goes to do some homework

Try 
https://github.com/torvalds/linux/commit/6b187d0260b6cd1d0904309f32659b7ed5948af8

(mm, numa: avoid setting zone_reclaim_mode unless a node is
sufficiently distant)

RE: [PATCH 2/3] ext4: introduce ext4_error_remove_page

2012-10-26 Thread Luck, Tony

> Well, we could set a new attribute bit on the file which indicates
> that the file has been corrupted, and this could cause any attempts to
> open the file to return some error until the bit has been cleared.

That sounds a lot better than renaming/moving the file.

> This would persist across reboots.  The only problem is that system
> administrators might get very confused (at least at first, when they
> first run a kernel or a distribution which has this feature enabled).

Yes. This would require some education. But new attributes have been
added in the past (e.g. immutable) that caused confusion to users and
tools that didn't know about them.

> Application programs could also get very confused when any attempt to
> open or read from a file suddenly returned some new error code (EIO,
> or should we designate a new errno code for this purpose, so there is
> a better indication of what the heck was going on?)

EIO sounds wrong ... but it is perhaps the best of the existing codes. Adding
a new one is challenging too.

> Also, if we just log the message in dmesg, if the system administrator
> doesn't find the "this file is corrupted" bit right away

This is pretty much a given. Nobody will see the message in the console log
until it is far too late.

> I'm not sure it's worth it to go to these extents, but I could imagine
> some customers wanting to have this sort of information.  Do we know
> what their "nice to have" / "must have" requirements might be?

18 years ago Intel rather famously attempted to sell users on the idea that a
rare divide error that sometimes gave the wrong answer could be ignored. Before
my time at Intel, but it is still burned into the corporate psyche that customers
really don't like to get the wrong answers from their computers.

Whether it is worth it may depend on the relative frequency of data being
corrupted this way, compared to all the other ways that it might get messed
up. If it were a thousand times more likely that data got silently corrupted
on its path to media, sitting spinning on the media, and then back off the
drive again - then all this fancy stuff wouldn't make any real difference.
I have no data on the relative error rates of memory and i/o - so I can't
answer this.

-Tony

Re: [PATCH RFT RESEND linux-next] mips: dma-mapping: support debug_dma_mapping_error

2012-10-26 Thread David Daney


On 10/26/2012 02:51 PM, Shuah Khan wrote:
> On 10/26/2012 09:01 AM, Shuah Khan wrote:
> > Add support for debug_dma_mapping_error() call to avoid warning from
> > debug_dma_unmap() interface when it checks for mapping error checked
> > status. Without this patch, device driver failed to check map error
> > warning is generated.

I'm confused.

Your claim that a 'warning is generated' seems to be in conflict with...

> [...]
>
> Got it. Thanks. I would volunteer to look at fixing all the problems,
> but unfortunately I don't have a MIPS box handy

This statement that you don't have hardware that exhibits the problem.

How was the patch tested?  How do you even know there is a problem?

David Daney



[PATCH RT 1/2] rcu: rcutiny: Prevent RCU stall

2012-10-26 Thread Steven Rostedt

From: Thomas Gleixner 

rcu_read_unlock_special() checks in_serving_softirq() and leaves early
when true. On RT this is obviously wrong as softirq processing context
can be preempted and therefore such a task can be on the gp_tasks
list. Leaving early here will leave the task on the list and therefore
block RCU processing forever.

This cannot happen on mainline because softirq processing context
cannot be preempted and therefore this can never happen at all.

In fact this check looks quite questionable in general. Neither irq
context nor softirq processing context can ever be preempted in
mainline, so the special unlock case should not ever be
invoked in such context. Now the only explanation might be a
rcu_read_unlock() being interrupted and therefore leaving the rcu nest
count at 0 before the special unlock bit has been cleared. That looks
fragile. At least it's missing a big fat comment. Paul 

See mainline commits: ec433f0c5 and 8762705a for further enlightenment.

Reported-by: Kristian Lehmann 
Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 kernel/rcutiny_plugin.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
index 22ecea0..97e359b 100644
--- a/kernel/rcutiny_plugin.h
+++ b/kernel/rcutiny_plugin.h
@@ -572,7 +572,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
rcu_preempt_cpu_qs();
 
/* Hardware IRQ handlers cannot block. */
-   if (in_irq() || in_serving_softirq()) {
+   if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
local_irq_restore(flags);
return;
}
-- 
1.7.10.4

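What the new condition tests can be shown with a userspace model of the
preempt counter. The bit widths below are assumptions made for this sketch;
the real layout comes from the kernel headers and can differ by version and
architecture. The commit message does not spell out how the helpers differ
on RT, so the sketch only shows what the raw mask reports.

```c
#include <assert.h>

/*
 * Illustrative bit layout, assumed for this sketch: 8 preempt bits,
 * 8 softirq bits, 10 hardirq bits.
 */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	10

#define SOFTIRQ_SHIFT	PREEMPT_BITS
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
#define HARDIRQ_MASK	(((1UL << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)

/*
 * Nonzero when the counter value has a hardirq count or the lowest
 * softirq bit set, which is the condition used in the patch.
 */
int must_leave_early(unsigned long preempt_count)
{
	return (preempt_count & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) != 0;
}
```

In this model a counter holding only 2 * SOFTIRQ_OFFSET, or only a preempt
count, does not trigger the early return, while any hardirq count or the
lowest softirq bit does.
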


[PATCH RT 2/2] Linux 3.4.15-rt25-rc1

2012-10-26 Thread Steven Rostedt

From: Steven Rostedt 

---
 localversion-rt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index b2111a2..792cb5f 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt24
+-rt25-rc1
-- 
1.7.10.4



[PATCH RT 0/2] [ANNOUNCE] 3.4.15-rt25-rc1 stable review

2012-10-26 Thread Steven Rostedt


Dear RT Folks,

This is the RT stable review cycle of patch 3.4.15-rt25-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 10/29/2012.

Enjoy,

-- Steve


To build 3.4.15-rt25-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.4.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.4.15.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/patch-3.4.15-rt25-rc1.patch.xz

You can also build from 3.4.15-rt24 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/incr/patch-3.4.15-rt24-rt25-rc1.patch.xz


Changes from 3.4.15-rt24:

---


Steven Rostedt (1):
  Linux 3.4.15-rt25-rc1

Thomas Gleixner (1):
  rcu: rcutiny: Prevent RCU stall


 kernel/rcutiny_plugin.h |2 +-
 localversion-rt |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

[PATCH] staging:comedi: Use pr_ or dev_ printks in drivers/usbduxfast.c

2012-10-26 Thread YAMANE Toshiaki

Fixed the checkpatch warning below.
-WARNING: printk() should include KERN_ facility level

Signed-off-by: YAMANE Toshiaki 
---
 drivers/staging/comedi/drivers/usbduxfast.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/comedi/drivers/usbduxfast.c b/drivers/staging/comedi/drivers/usbduxfast.c
index 060d853dc..b4e987b 100644
--- a/drivers/staging/comedi/drivers/usbduxfast.c
+++ b/drivers/staging/comedi/drivers/usbduxfast.c
@@ -370,9 +370,8 @@ static void usbduxfastsub_ai_Irq(struct urb *urb)
return;
 
default:
-   printk("comedi%d: usbduxfast: non-zero urb status received in "
-  "ai intr context: %d\n",
-  udfs->comedidev->minor, urb->status);
+   pr_err("non-zero urb status received in ai intr context: %d\n",
+  urb->status);
s->async->events |= COMEDI_CB_EOA;
s->async->events |= COMEDI_CB_ERROR;
comedi_event(udfs->comedidev, s);
@@ -454,7 +453,8 @@ static int usbduxfastsub_start(struct usbduxfastsub_s *udfs)
  1,  /* Length */
  EZTIMEOUT);/* Timeout */
if (ret < 0) {
-   printk("comedi_: usbduxfast_: control msg failed (start)\n");
+   dev_err(&udfs->interface->dev,
+   "control msg failed (start)\n");
return ret;
}
 
-- 
1.7.9.5


[PATCHv2] serial: 8250 check iir rdi in interrupt

2012-10-26 Thread Min Zhang


The patch works around two UART interrupt bugs when the serial console is
flooded with input:
1. syslog shows "serial8250: too much work for irq"
2. serial console stops responding to keystrokes

serial8250_handle_irq() checks UART_IIR_RDI before reading the receive
FIFO and clears a bogus UART_IIR_RDI interrupt that has no accompanying
UART_LSR_DR. Otherwise the RDI interrupt could freeze, or there could be
too many unhandled RDI interrupts.

Added module parameter skip_rdi_check to opt out of this workaround.

Tested on Radisys ATCA 46XX which uses an FPGA 16550-compatible UART and
on other generic 16550 UARTs. It takes from an hour to days to reproduce by
pumping input to the serial console continuously using a TeraTerm script:

Signed-off-by: Min Zhang 
---
 drivers/tty/serial/8250/8250.c |   50 
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/8250/8250.c b/drivers/tty/serial/8250/8250.c
index 3ba4234..dfc13d1 100644
--- a/drivers/tty/serial/8250/8250.c
+++ b/drivers/tty/serial/8250/8250.c
@@ -64,6 +64,7 @@ static int serial_index(struct uart_port *port)
 }

 static unsigned int skip_txen_test; /* force skip of txen test at init time */
+static unsigned int skip_rdi_check = 1; /* skip of IIR RDI check in interrupt */

 /*
  * Debugging.
@@ -1479,6 +1480,46 @@ unsigned int serial8250_modem_status(struct uart_8250_port *up)
 EXPORT_SYMBOL_GPL(serial8250_modem_status);

 /*
+ * Check if status UART_LSR_DR accompanies interrupt UART_IIR_RDI.
+ * If they mismatch, massage the status or interrupt cause accordingly:
+ *
+ * Return a cleared UART_LSR_DR status if there is no accompanying
+ * UART_IIR_RDI. Hopefully the new status is used by interrupt handler
+ * to skip reading receive FIFO. Otherwise some UART controller stops
+ * generating RDI interrupt after this unnotified FIFO read, until other
+ * interrupts maybe transmit interrupt reads UART_LSR again.
+ *
+ * Or clear interrupt cause UART_IIR_RDI without UART_LSR_DR. The UART sets
+ * UART_IIR_RDI *even* if the received data has been read out from the FIFO
+ * before the timeout occurs.  To clear UART_IIR_RDI, read receive buffer
+ * register. Reading it also clears timeout interrupt for 16550+. Otherwise
+ * the uncleared UART_IIR_RDI will keep triggering IRQ but interrupt
+ * handler finds nothing to do.
+ *
+ * Skip this workaround if interrupt is not expected, such as backup timer,
+ * so that handler can still solely rely on original status register.
+ */
+static inline unsigned char serial8250_iir_rdi_check(struct uart_8250_port *up,
+unsigned char status,
+unsigned int iir)
+{
+   unsigned int rdi_stat, rdi_intr;
+
+   /* skip for timer based handler */
+   if (up->timer.data)
+   return status;
+
+   rdi_stat = status & UART_LSR_DR;
+   rdi_intr = iir & UART_IIR_RDI;
+
+   if (rdi_stat && !rdi_intr)
+   status &= ~UART_LSR_DR;
+   else if (!rdi_stat && rdi_intr)
+   serial_in(up, UART_RX);
+   return status;
+}
+
+/*
  * This handles the interrupt from one port.
  */
 int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
@@ -1497,6 +1538,12 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)

DEBUG_INTR("status = %x...", status);

+   /* Some UART controller has mismatched UART_IIR_RDI and UART_LSR_DR,
+  which causes either too many interrupts or interrupt freeze
+*/
+   if (!skip_rdi_check)
+   status = serial8250_iir_rdi_check(up, status, iir);
+
if (status & (UART_LSR_DR | UART_LSR_BI))
status = serial8250_rx_chars(up, status);
serial8250_modem_status(up);
@@ -3338,6 +3385,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
 module_param(skip_txen_test, uint, 0644);
 MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");

+module_param(skip_rdi_check, uint, 0644);
+MODULE_PARM_DESC(skip_rdi_check, "Skip checking IIR RDI bug in interrupt");
+
 #ifdef CONFIG_SERIAL_8250_RSA
 module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
 MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
--
1.7.0.1

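The decision made by serial8250_iir_rdi_check() in the patch above can be
modelled without hardware. This is a sketch, not the driver code: the struct
and function names are hypothetical, the timer-based early return is left
out, and the dummy read of the receive buffer is reported as a flag instead
of being performed. The two register bit values are the ones used for
UART_LSR_DR and UART_IIR_RDI in the kernel's serial register definitions.

```c
#include <assert.h>

#define UART_LSR_DR	0x01	/* receiver data ready */
#define UART_IIR_RDI	0x04	/* receiver data interrupt */

/* Hypothetical result type for this sketch. */
struct rdi_fixup {
	unsigned char status;	/* LSR value the handler should act on */
	int drain_rx;		/* nonzero: read the receive buffer once */
};

struct rdi_fixup iir_rdi_check(unsigned char status, unsigned int iir)
{
	struct rdi_fixup r = { status, 0 };
	int data_ready = (status & UART_LSR_DR) != 0;
	int rdi_pending = (iir & UART_IIR_RDI) != 0;

	if (data_ready && !rdi_pending)
		r.status &= ~UART_LSR_DR;	/* hide DR so the FIFO is not read */
	else if (!data_ready && rdi_pending)
		r.drain_rx = 1;			/* dummy read clears the stale RDI */
	return r;
}
```

When LSR and IIR agree, the status is passed through unchanged and no extra
read is requested.
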

kmemleak report on isp1763 and sierra MC8705

2012-10-26 Thread Richard Retanubun


Hi Guys,

I am debugging a reported kmemleak involving a sierra wireless MC8705 connected
through isp1763 on powerpc linux-3.0.22

We are still isolating the exact trigger, but this is a pretty good one so far

send "at!reset" to the modem control tty, wait until it finishes rebooting
then try to bring up a PPP link that will fail (non existent ISP).

After some time, we got the report (included at the end) from kmemleak.

There seem to be two prevalent variants of the trace:

something like this:

unreferenced object 0xd58e58c8 (size 8):
  comm "khubd", pid 1034, jiffies 74467293 (age 2380.122s)
  hex dump (first 8 bytes):
4d 43 38 37 30 35 00 00  MC8705..
  backtrace:
[] usb_cache_string+0x74/0xac [usbcore]
[] usb_enumerate_device+0x44/0xf8 [usbcore]
[] usb_new_device+0x3c/0x13c [usbcore]
[] hub_thread+0xc8c/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68

and something like this:

unreferenced object 0xd5893e00 (size 512):
  comm "khubd", pid 1034, jiffies 74467270 (age 2378.786s)
  hex dump (first 32 bytes):
09 02 a8 00 06 01 01 e0 00 00 00 00 d5 87 d6 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] usb_get_configuration+0x5c/0x13a8 [usbcore]
[] usb_enumerate_device+0xd8/0xf8 [usbcore]
[] usb_new_device+0x3c/0x13c [usbcore]
[] hub_thread+0xc8c/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68

Some questions:
1. Have you guys seen anything like this before?

2. The report does not point to sierra or isp1763, so our current understanding
   is that the memory is allocated outside these drivers and it is supposed
   to mark it done for someone to free it. We think this way because if
   we rigged a driver to leak a memory it allocates, kmemleak will trace
   right into it. Is this understanding correct?

3. Any ideas on how to deepen the probe to get more understanding of what
   happens?

4. Michael, is this similar to the problem you reported here?
   http://marc.info/?l=linux-usb&m=133432571801643&w=4
   From reading your report (serial device hanging), It doesn't look like it...

5. Our current hypothesis is this:
   we open the /dev/ttyUSB to send "at!reset", then a race begins
   between closing the file handle and freeing the driver resources
   and the modem hardware actually resetting, which then caused the leak.
   Can this be it? and if so, any ideas on how to solve it?

   To test this we are power cycling the modem using a gpio
   (without opening /dev/ttyUSB) to see if this is the culprit.

6. There is a worrisome line in our (old) version of isp1763, inherited from
   isp1760:

   isp1760_endpoint_disable()
   ...
qh_destroy(qh);
ep->hcpriv = NULL;
/* remove requests and leak them.
 * ATL are pretty fast done, INT could take a while...
 * The latter shoule be removed
 */
What is leaking here? qh_destroy release the memory already.


Thanks for everyone's time!

-- Richard Retanubun


unreferenced object 0xd5922c00 (size 1024):
  comm "khubd", pid 1034, jiffies 74467113 (age 2378.943s)
  hex dump (first 32 bytes):
ff ff ff ff 31 2e 32 00 00 00 00 00 00 00 00 00  1.2.
00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 03  
  backtrace:
[] usb_alloc_dev+0x48/0x290 [usbcore]
[] hub_thread+0x654/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68
unreferenced object 0xd58e52b0 (size 8):
  comm "khubd", pid 1034, jiffies 74467113 (age 2378.943s)
  hex dump (first 8 bytes):
32 2d 31 2e 32 00 04 00  2-1.2...
  backtrace:
[] kvasprintf+0x58/0x88
[] kobject_set_name_vargs+0x34/0x84
[] dev_set_name+0x50/0x60
[] usb_alloc_dev+0x190/0x290 [usbcore]
[] hub_thread+0x654/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68
unreferenced object 0xd5893e00 (size 512):
  comm "khubd", pid 1034, jiffies 74467270 (age 2378.786s)
  hex dump (first 32 bytes):
09 02 a8 00 06 01 01 e0 00 00 00 00 d5 87 d6 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] usb_get_configuration+0x5c/0x13a8 [usbcore]
[] usb_enumerate_device+0xd8/0xf8 [usbcore]
[] usb_new_device+0x3c/0x13c [usbcore]
[] hub_thread+0xc8c/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68
unreferenced object 0xd58e5930 (size 8):
  comm "khubd", pid 1034, jiffies 74467270 (age 2378.786s)
  hex dump (first 8 bytes):
d5 8a dc c0 00 00 00 00  
  backtrace:
[] usb_get_configuration+0x7c/0x13a8 [usbcore]
[] usb_enumerate_device+0xd8/0xf8 [usbcore]
[] usb_new_device+0x3c/0x13c [usbcore]
[] hub_thread+0xc8c/0x1544 [usbcore]
[] kthread+0x7c/0x80
[] kernel_thread+0x4c/0x68
unreferenced

Re: [ANNOUNCE] 3.6.3-rt7

2012-10-26 Thread Thomas Gleixner

On Fri, 26 Oct 2012, Thomas Gleixner wrote:

> Dear RT Folks,
> 
> I'm pleased to announce the 3.6.3-rt7 release.
> 
> Changes since 3.6.3-rt6:
> 
>* Enable SLUB for RT
> 
>  Last time I looked at SLUB for RT (some years ago) it was just
>  way more painful than dealing with SLAB, but Christoph Lameter
>  has done major surgery on the SLUB code since then and it turns
>  out that making SLUB usable for RT has become very simple. Thanks
>  Christoph!
> 
>   slab.c: 172 insertions(+), 58 deletions(-)
>   slub.c:  17 insertions(+), 13 deletions(-)
> 
>  I did some quick comparisons and even a simple hackbench run
>  shows a significant speedup with SLUB vs. SLAB on RT. I'm not too
>  surprised as SLUBs fastpath does not have the RT induced
>  contention problems which we can observe with SLAB.
> 
>  As usual, give it a good testing and report whatever explodes :)

Looks like CONFIG_NUMA=y exposes explosions. I just noticed that none
of the machines which are in my basic set of test systems have that
enabled.

/me goes to do some homework


Re: [PATCH] serial: 8250 check iir rdi in interrupt

2012-10-26 Thread Min Zhang

On Fri, Oct 26, 2012 at 7:19 AM, Alan Cox  wrote:

> So we only need to check this in serial8250_handle_irq when IIR indicates
> a data timeout interrupt ?
>
> Can we do
>
> if ((iir & 0x0F) == 0x0C) {
> /* Expensive RDI check */
> }
>
>
>
> Alan

Checking the data timeout interrupt only addresses the "too much work for irq"
problem. There is another console freeze problem which is caused by
reading the receive FIFO when there is no RDI interrupt, such as during a
THRI transmit interrupt, so there is no timeout interrupt. In that case one
has to check both IIR and LSR. After all this checking, it becomes the
original patch anyway.

I am posting another simpler revision to exclude timer handler from
this workaround, and make this workaround default off and inline.
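
The difference between the two conditions can be sketched as follows. The
macro and function names are hypothetical; 0x0F and 0x0C are the values from
Alan's snippet, and 0x04 and 0x01 are assumed here to be the RDI and
data-ready bits.

```c
#include <assert.h>

#define IIR_ID_MASK		0x0F	/* low nibble, as tested in Alan's snippet */
#define IIR_DATA_TIMEOUT	0x0C	/* value Alan compares against */
#define IIR_RDI			0x04	/* receiver data interrupt bit */
#define LSR_DR			0x01	/* receiver data ready bit */

/* Gate proposed by Alan: only a data timeout interrupt qualifies. */
int is_data_timeout(unsigned int iir)
{
	return (iir & IIR_ID_MASK) == IIR_DATA_TIMEOUT;
}

/* Condition Min describes: LSR and IIR disagree about receive data. */
int lsr_iir_mismatch(unsigned char lsr, unsigned int iir)
{
	int data_ready = (lsr & LSR_DR) != 0;
	int rdi_pending = (iir & IIR_RDI) != 0;

	return data_ready != rdi_pending;
}
```

With an IIR low nibble of 0x02 (a transmit interrupt) and data ready set in
LSR, is_data_timeout() is false while lsr_iir_mismatch() is true, which is
the freeze case a timeout-only gate would not cover.
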

[PATCH] Yama: enable link restrictions when selected

2012-10-26 Thread Kees Cook

Systems running with Yama enabled expect restrictions on various
potentially dangerous operations that could create backward-compatibility
issues with rare userspace corner-cases. Since 561ec64ae67e ("VFS:
don't do protected {sym,hard}links by default") has disabled VFS link
restrictions by default, enable them for systems using Yama.

Cc: sta...@vger.kernel.org # v3.6
Signed-off-by: Kees Cook 
---
 security/yama/Kconfig|4 +++-
 security/yama/yama_lsm.c |5 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 20ef514..2a91598 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -8,7 +8,9 @@ config SECURITY_YAMA
  This selects Yama, which extends DAC support with additional
  system-wide security settings beyond regular Linux discretionary
  access controls. Currently available is ptrace scope restriction.
- Further information can be found in Documentation/security/Yama.txt.
+ Enables the VFS link restriction sysctls (protected_symlinks and
+ protected_hardlinks) by default. Further information can be found
+ in Documentation/security/Yama.txt.
 
  If you are unsure how to answer this question, answer N.
 
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index b4c2984..3bef1c1 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define YAMA_SCOPE_DISABLED0
 #define YAMA_SCOPE_RELATIONAL  1
@@ -397,6 +398,10 @@ static __init int yama_init(void)
panic("Yama: sysctl registration failed.\n");
 #endif
 
+   /* Enable link restrictions when Yama is enabled. */
+   sysctl_protected_symlinks = 1;
+   sysctl_protected_hardlinks = 1;
+
return 0;
 }
 
-- 
1.7.9.5


-- 
Kees Cook
Chrome OS Security

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Thomas Gleixner

On Fri, 26 Oct 2012, Peter LaDow wrote:
> On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet  wrote:
> If this were safe, we wouldn't be seeing this lockup and your patch
> wouldn't be needed.  So it seems that your patch doesn't really
> address the issue that we are not "sure a thread cannot be interrupted
> by a softirq, and cannot migrate to another cpu".  Well, we know it
> cannot migrate to another CPU, because there isn't another CPU.  So
> apparently, it can be interrupted by a softirq.  So local_bh_disable
> isn't doing anything useful in the RT patches with regard to this.

RT changes the semantics slightly. And yes it's not prepared for stuff
which is relying on some of the magic mainline implicit semantics.

Let me have a look at the whole scenario, once I'm more awake.

Thanks,

tglx

Re: [PATCH RFT RESEND linux-next] mips: dma-mapping: support debug_dma_mapping_error

2012-10-26 Thread Shuah Khan

On Fri, 2012-10-26 at 13:07 -0700, David Daney wrote:
> On 10/26/2012 10:06 AM, Shuah Khan wrote:
> > On Fri, 2012-10-26 at 09:45 -0700, David Daney wrote:
> >> On 10/26/2012 09:01 AM, Shuah Khan wrote:
> >>> Add support for debug_dma_mapping_error() call to avoid warning from
> >>> debug_dma_unmap() interface when it checks for mapping error checked
> >>> status. Without this patch, device driver failed to check map error
> >>> warning is generated.
> >>>
> >>> Signed-off-by: Shuah Khan 
> >>> ---
> >>>arch/mips/include/asm/dma-mapping.h |2 ++
> >>>1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
> >>> index be39a12..006b43e 100644
> >>> --- a/arch/mips/include/asm/dma-mapping.h
> >>> +++ b/arch/mips/include/asm/dma-mapping.h
> >>> @@ -40,6 +40,8 @@ static inline int dma_supported(struct device *dev, u64 mask)
> >>>static inline int dma_mapping_error(struct device *dev, u64 mask)
> >>>{
> >>>   struct dma_map_ops *ops = get_dma_ops(dev);
> >>> +
> >>> + debug_dma_mapping_error(dev, mask);
> >>>   return ops->mapping_error(dev, mask);
> >>>}
> >>>
> >>>
> >>
> >> Although this is a start, I don't think it is sufficient.
> >>
> >> As far as I can tell, there are many missing calls to debug_dma_*() in
> >> the various MIPS common and sub-architecture DMA code.
> >>
> >> Really you (or someone) needs to look at *all* the functions in
> >> arch/mips/asm/dma-mapping.h, and arch/mips/mm/dma-default.c and find
> >> places missing a debug_dma_*().
> >
> > Is it correct to assume that this patch is not needed on MIPS until
> > debug_dma interfaces get added to MIPS common and sub-architecture DMA
> > code.
> 
> No, you have a false predicate here.
> 
> debug_dma_* *is* already mostly added.  The problem is that it is 
> incomplete.  That is the nature of the problem.  Your patch makes it 
> slightly better, but doesn't fully fix the problems.

Got it. Thanks. I would volunteer to look at fixing all the problems,
but unfortunately I don't have a MIPS box handy for a thorough testing
that would be necessary for such a change.

-- Shuah
> 
> 
> >
> > When I didn't see dma_map_page() in arch/mips/include/asm/dma-mapping.h
> > defined, and just an extern, I incorrectly assumed, it is getting picked
> > up from , hence the need for this
> > patch in the first place.
> >
> > -- Shuah
> >
> >
> >
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] mm: add support for zsmalloc and zcache

2012-10-26 Thread Seth Jennings

On 10/02/2012 01:17 PM, Dan Magenheimer wrote:
> If so,  and move forward?  What do you see as next steps?

I've been reviewing the changes between zcache and zcache2 and getting
a feel for the scope and direction of those changes.

- Getting the community engaged to review zcache1 at ~2300SLOC was
  difficult.
- Adding RAMSter has meant adding RAMSter-specific code broadly across
  zcache and increases the size of code to review to ~7600SLOC.
- The changes have blurred zcache's internal layering and increased
  complexity beyond what a simple SLOC metric can reflect.
- Getting the community engaged in reviewing zcache2 will be difficult
  and will require an exceptional amount of effort for maintainer and
  reviewer.

It is difficult for me to know when it could be ready for mainline and
production use.  While zcache2 isn't getting broad code reviews yet,
how do you suggest managing that complexity to make the code maintainable
and get it reviewed?

Seth


Re: [GIT PULL] slave-dmaengine fixes

2012-10-26 Thread David Woodhouse

On Fri, 2012-10-26 at 13:49 -0700, Linus Torvalds wrote:
> On Fri, Oct 26, 2012 at 5:29 AM, Vinod Koul 
> wrote:
> >
> >   git://git.infradead.org/users/vkoul/slave-dma.git fixes
> 
> git.infradead.org is sick, and is apparently not accepting
> connections...
> 
> Can you check whether git-daemon is actually running there? The
> machine is pinging ok.

Mea culpa; I've just finished upgrading it, and for some reason xinetd
got disabled in the upgrade. Should work now.

-- 
dwmw2




[PATCH v3 7/7] Add a format check for an existing variable name at erasing time

2012-10-26 Thread Seiji Aguchi

[Issue]

The format of the variable name has been updated to type, id, count and ctime
to support holding multiple logs.

Format of current variable name
  dump-type0-1-2-12345678

  type:0
  id:1
  count:2
  ctime:12345678

On the other hand, if a variable with the old name format remains,
users can't erase it via /dev/pstore.

Format of old variable name
  dump-type0-1-12345678

  type:0
  id:1
  ctime:12345678

[Solution]

This patch adds a format check for the old variable name in an erase callback
to make it erasable.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index dd228d5..b1cd028 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -777,6 +777,8 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 
id, int count,
struct efivars *efivars = psi->data;
struct efivar_entry *entry, *found = NULL;
int i;
+   unsigned int type_old, part_old;
+   unsigned long time_old;
 
sprintf(name, "dump-type%u-%u-%d-%lu", type, (unsigned int)id, count,
time.tv_sec);
@@ -796,8 +798,16 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 
id, int count,
if (efi_guidcmp(entry->var.VendorGuid, vendor))
continue;
if (utf16_strncmp(entry->var.VariableName, efi_name,
- utf16_strlen(efi_name)))
-   continue;
+ utf16_strlen(efi_name))) {
+   /*
+* Check if an old format,
+* which doesn't support holding
+* multiple logs, remains.
+*/
+   if (sscanf(name, "dump-type%u-%u-%lu",
+  &type_old, &part_old, &time_old) != 3)
+   continue;
+   }
 
/* found */
found = entry;
-- 1.7.1

[PATCH v3 6/7] efi_pstore: Add a format check for an existing variable name at reading time

2012-10-26 Thread Seiji Aguchi

[Issue]

The format of the variable name has been updated to type, id, count and ctime
to support holding multiple logs.

Format of current variable name
  dump-type0-1-2-12345678

  type:0
  id:1
  count:2
  ctime:12345678

On the other hand, if a variable with the old name format remains,
users can't read it via /dev/pstore.

Format of old variable name
  dump-type0-1-12345678

  type:0
  id:1
  ctime:12345678

[Solution]

This patch adds a format check for the old variable name in a read callback
to make it readable.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   38 --
 1 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index dc69802..dd228d5 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -681,17 +681,35 @@ static ssize_t efi_pstore_read(u64 *id, enum 
pstore_type_id *type,
*count = cnt;
timespec->tv_sec = time;
timespec->tv_nsec = 0;
-   get_var_data_locked(efivars, 
&efivars->walk_entry->var);
-   size = efivars->walk_entry->var.DataSize;
-   *buf = kmalloc(size, GFP_KERNEL);
-   if (*buf == NULL)
-   return -ENOMEM;
-   memcpy(*buf, efivars->walk_entry->var.Data,
-  size);
-   efivars->walk_entry = 
list_entry(efivars->walk_entry->list.next,
-  struct efivar_entry, list);
-   return size;
+   } else if (sscanf(name, "dump-type%u-%u-%lu",
+  type, &part, &time) == 3) {
+   /*
+* Check if an old format,
+* which doesn't support holding
+* multiple logs, remains.
+*/
+   *id = part;
+   *count = 0;
+   timespec->tv_sec = time;
+   timespec->tv_nsec = 0;
+   } else {
+   efivars->walk_entry = list_entry(
+   efivars->walk_entry->list.next,
+struct efivar_entry, list);
+   continue;
}
+
+   get_var_data_locked(efivars, &efivars->walk_entry->var);
+   size = efivars->walk_entry->var.DataSize;
+   *buf = kmalloc(size, GFP_KERNEL);
+   if (*buf == NULL)
+   return -ENOMEM;
+   memcpy(*buf, efivars->walk_entry->var.Data,
+  size);
+   efivars->walk_entry = list_entry(
+   efivars->walk_entry->list.next,
+   struct efivar_entry, list);
+   return size;
}
efivars->walk_entry = list_entry(efivars->walk_entry->list.next,
 struct efivar_entry, list);
-- 1.7.1
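
The read-side check described in this patch can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the code
from the patch: parse_dump_name() is a hypothetical helper, and it uses the C
library sscanf() rather than the kernel's. It tries the current four-field
format first and falls back to the old three-field format.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch).
 * Returns 4 if the name uses the current format (type-id-count-ctime),
 * 3 if it uses the old format (type-id-ctime), and 0 otherwise.
 */
int parse_dump_name(const char *name, unsigned int *type, unsigned int *part,
                    int *count, unsigned long *ctime)
{
	/* Current format: dump-type<type>-<id>-<count>-<ctime> */
	if (sscanf(name, "dump-type%u-%u-%d-%lu",
		   type, part, count, ctime) == 4)
		return 4;

	/* Old format: dump-type<type>-<id>-<ctime>; there is no counter */
	if (sscanf(name, "dump-type%u-%u-%lu", type, part, ctime) == 3) {
		*count = 0;
		return 3;
	}

	return 0;
}
```

The order of the two checks matters in this sketch: the old three-field pattern
also converts three fields from a current-format name such as
dump-type0-1-2-12345678 (reading the counter as the ctime), so the four-field
pattern has to be tried first.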

[PATCH v3 5/7] efi_pstore: Add a sequence counter to a variable name

2012-10-26 Thread Seiji Aguchi

[Issue]

Currently, a variable name, which identifies each entry, consists of type, id
and ctime. But if multiple events happen in a short time, a second or third
event may fail to be logged because efi_pstore can't distinguish the events by
the current variable name.

[Solution]

A reasonable way to identify all events precisely is to introduce a sequence
counter into the variable name.

The sequence counter is already supported in the pstore layer as "oopscount",
so this patch adds it to the variable name. It is also passed to the read/erase
callbacks of platform drivers, in accordance with the change to the variable
name.

  
 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-12345678

  type:0
  id:1
  ctime:12345678

 If multiple events happen in a short time, efi_pstore can't distinguish them
 because their variable names are the same.

  

 it can be distinguishable by adding a sequence counter as follows.

 a variable name of first event: dump-type0-1-1-12345678
 a variable name of Second event: dump-type0-1-2-12345678

  type:0
  id:1
  sequence counter: 1(first event), 2(second event)
  ctime:12345678 

Signed-off-by: Seiji Aguchi 
---
 drivers/acpi/apei/erst.c   |   12 ++--
 drivers/firmware/efivars.c |   18 +++---
 fs/pstore/inode.c  |8 +---
 fs/pstore/internal.h   |2 +-
 fs/pstore/platform.c   |   11 ++-
 fs/pstore/ram.c|7 +++
 include/linux/pstore.h |8 +---
 7 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index 0bd6ae4..6d894bf 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -931,13 +931,13 @@ static int erst_check_table(struct acpi_table_erst 
*erst_tab)
 
 static int erst_open_pstore(struct pstore_info *psi);
 static int erst_close_pstore(struct pstore_info *psi);
-static ssize_t erst_reader(u64 *id, enum pstore_type_id *type,
+static ssize_t erst_reader(u64 *id, enum pstore_type_id *type, int *count,
   struct timespec *time, char **buf,
   struct pstore_info *psi);
 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
-  u64 *id, unsigned int part,
+  u64 *id, unsigned int part, int count,
   size_t size, struct pstore_info *psi);
-static int erst_clearer(enum pstore_type_id type, u64 id,
+static int erst_clearer(enum pstore_type_id type, u64 id, int count,
struct timespec time, struct pstore_info *psi);
 
 static struct pstore_info erst_info = {
@@ -987,7 +987,7 @@ static int erst_close_pstore(struct pstore_info *psi)
return 0;
 }
 
-static ssize_t erst_reader(u64 *id, enum pstore_type_id *type,
+static ssize_t erst_reader(u64 *id, enum pstore_type_id *type, int *count,
   struct timespec *time, char **buf,
   struct pstore_info *psi)
 {
@@ -1055,7 +1055,7 @@ out:
 }
 
 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
-  u64 *id, unsigned int part,
+  u64 *id, unsigned int part, int count,
   size_t size, struct pstore_info *psi)
 {
struct cper_pstore_record *rcd = (struct cper_pstore_record *)
@@ -1101,7 +1101,7 @@ static int erst_writer(enum pstore_type_id type, enum 
kmsg_dump_reason reason,
return ret;
 }
 
-static int erst_clearer(enum pstore_type_id type, u64 id,
+static int erst_clearer(enum pstore_type_id type, u64 id, int count,
struct timespec time, struct pstore_info *psi)
 {
return erst_clear(id);
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 6cbeea7..dc69802 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -658,13 +658,14 @@ static int efi_pstore_close(struct pstore_info *psi)
 }
 
 static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
-  struct timespec *timespec,
+  int *count, struct timespec *timespec,
   char **buf, struct pstore_info *psi)
 {
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
char name[DUMP_NAME_LEN];
int i;
+   int cnt;
unsigned int part, size;
unsigned long time;
 
@@ -674,8 +675,10 @@ static ssize_t efi_pstore_read(u64 *id, enum 
pstore_type_id *type,
for (i = 0; i < DUMP_NAME_LEN; i++) {
name[i] = 
efivars->walk_entry->var.VariableName[i];
}
-   if (sscanf(name, "dump-type%u-%u-%lu", type, &part,
&time) == 3) {
+   if (sscanf(name, "dump-type%u-%u-%d-%lu",
+  type, &part, &cnt, &time) == 4) {

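The naming scheme described in this patch can be sketched in userspace C as
follows. This is a minimal illustration, not the code from the patch:
format_dump_name() is a hypothetical helper, and the value of DUMP_NAME_LEN is
an assumption made for the sketch. It shows that two events logged in the same
second get distinct names once the sequence counter is part of the name.

```c
#include <stdio.h>

#define DUMP_NAME_LEN 52	/* assumed buffer size for this sketch */

/*
 * Hypothetical helper (not part of the patch).
 * Builds "dump-type<type>-<id>-<count>-<ctime>" into buf and returns
 * the snprintf() result.
 */
int format_dump_name(char *buf, size_t len, unsigned int type,
                     unsigned int part, int count, unsigned long ctime)
{
	return snprintf(buf, len, "dump-type%u-%u-%d-%lu",
			type, part, count, ctime);
}
```

With type 0, id 1 and ctime 12345678, a counter of 1 and a counter of 2 produce
dump-type0-1-1-12345678 and dump-type0-1-2-12345678, matching the example names
in the description above.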
[PATCH v3 4/7] efi_pstore: Add ctime to argument of erase callback

2012-10-26 Thread Seiji Aguchi

[Issue]

Currently, a variable name, which is used to identify each log entry, consists 
of type,
id and ctime. But an erase callback does not use ctime.

If efi_pstore supported just one log, type and id were enough.
However, in case of supporting multiple logs, it doesn't work because
it can't distinguish each entry without ctime at erasing time.

 

 As you can see below, efi_pstore can't differentiate first event from second 
one without ctime.

 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-23456789

  type:0
  id:1
  ctime:12345678, 23456789

[Solution]

This patch adds ctime to an argument of an erase callback.

It works across reboots because the ctime of pstore means the date that the
record was originally stored. To do this, efi_pstore saves the ctime in the
variable name at writing time and passes it to pstore at reading time.

Signed-off-by: Seiji Aguchi 
---
 drivers/acpi/apei/erst.c   |4 ++--
 drivers/firmware/efivars.c |   15 +++
 fs/pstore/inode.c  |3 ++-
 fs/pstore/ram.c|2 +-
 include/linux/pstore.h |2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index e4d9d24..0bd6ae4 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -938,7 +938,7 @@ static int erst_writer(enum pstore_type_id type, enum 
kmsg_dump_reason reason,
   u64 *id, unsigned int part,
   size_t size, struct pstore_info *psi);
 static int erst_clearer(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi);
+   struct timespec time, struct pstore_info *psi);
 
 static struct pstore_info erst_info = {
.owner  = THIS_MODULE,
@@ -1102,7 +1102,7 @@ static int erst_writer(enum pstore_type_id type, enum 
kmsg_dump_reason reason,
 }
 
 static int erst_clearer(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi)
+   struct timespec time, struct pstore_info *psi)
 {
return erst_clear(id);
 }
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index fbe9202..6cbeea7 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -747,24 +747,25 @@ static int efi_pstore_write(enum pstore_type_id type,
 };
 
 static int efi_pstore_erase(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi)
+   struct timespec time, struct pstore_info *psi)
 {
-   char stub_name[DUMP_NAME_LEN];
+   char name[DUMP_NAME_LEN];
efi_char16_t efi_name[DUMP_NAME_LEN];
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
struct efivar_entry *entry, *found = NULL;
int i;
 
-   sprintf(stub_name, "dump-type%u-%u-", type, (unsigned int)id);
+   sprintf(name, "dump-type%u-%u-%lu", type, (unsigned int)id,
+   time.tv_sec);
 
spin_lock(&efivars->lock);
 
for (i = 0; i < DUMP_NAME_LEN; i++)
-   efi_name[i] = stub_name[i];
+   efi_name[i] = name[i];
 
/*
-* Clean up any entries with the same name
+* Clean up an entry with the same name
 */
 
list_for_each_entry(entry, &efivars->list, list) {
@@ -775,9 +776,6 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 
id,
if (utf16_strncmp(entry->var.VariableName, efi_name,
  utf16_strlen(efi_name)))
continue;
-   /* Needs to be a prefix */
-   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
-   continue;
 
/* found */
found = entry;
@@ -785,6 +783,7 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 
id,
   &entry->var.VendorGuid,
   PSTORE_EFI_ATTRIBUTES,
   0, NULL);
+   break;
}
 
if (found)
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 4ab572e..4300af6 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -175,7 +175,8 @@ static int pstore_unlink(struct inode *dir, struct dentry 
*dentry)
struct pstore_private *p = dentry->d_inode->i_private;
 
if (p->psi->erase)
-   p->psi->erase(p->type, p->id, p->psi);
+   p->psi->erase(p->type, p->id, dentry->d_inode->i_ctime,
+ p->psi);
 
return simple_unlink(dir, dentry);
 }
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 1a4f6da..749693f 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -237,7 +237,7 @@ static int notrace ramoops_pstore_write_buf(enum 
pstore_type_id type,
 }
 
 static int ramoops_pstore_erase(enum pstore_type_id type, u64 id,
-   struct

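The effect of adding ctime to the erase path can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the code from
the patch: find_dump_entry() is a hypothetical helper working on plain C
strings with strcmp(), whereas the patch compares UTF-16 names with
utf16_strncmp(). It builds the full name, including ctime, and stops at the
first entry that matches.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch).
 * Returns the index of the entry named "dump-type<type>-<id>-<ctime>",
 * or -1 if no such entry exists.
 */
int find_dump_entry(const char *const names[], int n, unsigned int type,
                    unsigned int id, unsigned long ctime)
{
	char name[64];
	int i;

	snprintf(name, sizeof(name), "dump-type%u-%u-%lu", type, id, ctime);

	for (i = 0; i < n; i++) {
		if (strcmp(names[i], name) == 0)
			return i;	/* one entry per name, so stop here */
	}

	return -1;
}
```

Given the two example names above, dump-type0-1-12345678 and
dump-type0-1-23456789, a lookup by type and id alone could not tell them apart,
while a lookup that includes ctime selects exactly one of them.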
Re: [PATCH] VFS: add config options to enable link restrictions

2012-10-26 Thread Kees Cook

On Fri, Oct 26, 2012 at 1:27 PM, Linus Torvalds
 wrote:
> On Fri, Oct 26, 2012 at 1:23 PM, Kees Cook  wrote:
>>
>> I'd like it to be the exception to turn it _off_, rather than the
>> exception to turn it on.
>
> Kees, you don't seem to understand.
>
> Breaking applications is unacceptable. End of story. It's broken them.
> Get over it.

No, I get that. I've been over it. I can handle it being off by
default. I just want there to be a way to make it enabled at build
time. I'll explore some other options; it's sensible to tie it to
other settings/things that are security-sensitive.

-Kees

-- 
Kees Cook
Chrome OS Security

[PATCH v3 3/7] efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs

2012-10-26 Thread Seiji Aguchi

[Issue]

Currently, efi_pstore driver simply overwrites existing panic messages in NVRAM.
So, in the following scenario, we will lose 1st panic messages.

1. kernel panics.
2. efi_pstore is kicked and writes panic messages to NVRAM.
3. system reboots.
4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

A reasonable solution to fix the issue is just holding multiple logs without
erasing existing entries.
This patch removes the logic that erases existing entries from the write
callback, because that logic is not needed there to support holding multiple
logs.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   39 ++-
 1 files changed, 2 insertions(+), 37 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index bee14cc..fbe9202 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -701,18 +701,13 @@ static int efi_pstore_write(enum pstore_type_id type,
unsigned int part, size_t size, struct pstore_info *psi)
 {
char name[DUMP_NAME_LEN];
-   char stub_name[DUMP_NAME_LEN];
efi_char16_t efi_name[DUMP_NAME_LEN];
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
-   struct efivar_entry *entry, *found = NULL;
int i, ret = 0;
u64 storage_space, remaining_space, max_variable_size;
efi_status_t status = EFI_NOT_FOUND;
 
-   sprintf(stub_name, "dump-type%u-%u-", type, part);
-   sprintf(name, "%s%lu", stub_name, get_seconds());
-
spin_lock(&efivars->lock);
 
/*
@@ -730,35 +725,8 @@ static int efi_pstore_write(enum pstore_type_id type,
return -ENOSPC;
}
 
-   for (i = 0; i < DUMP_NAME_LEN; i++)
-   efi_name[i] = stub_name[i];
-
-   /*
-* Clean up any entries with the same name
-*/
-
-   list_for_each_entry(entry, &efivars->list, list) {
-   get_var_data_locked(efivars, &entry->var);
-
-   if (efi_guidcmp(entry->var.VendorGuid, vendor))
-   continue;
-   if (utf16_strncmp(entry->var.VariableName, efi_name,
- utf16_strlen(efi_name)))
-   continue;
-   /* Needs to be a prefix */
-   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
-   continue;
-
-   /* found */
-   found = entry;
-   efivars->ops->set_variable(entry->var.VariableName,
-  &entry->var.VendorGuid,
-  PSTORE_EFI_ATTRIBUTES,
-  0, NULL);
-   }
-
-   if (found)
-   list_del(&found->list);
+   sprintf(name, "dump-type%u-%u-%lu", type, part,
+   get_seconds());
 
for (i = 0; i < DUMP_NAME_LEN; i++)
efi_name[i] = name[i];
@@ -768,9 +736,6 @@ static int efi_pstore_write(enum pstore_type_id type,
 
spin_unlock(&efivars->lock);
 
-   if (found)
-   efivar_unregister(found);
-
if (size)
ret = efivar_create_sysfs_entry(efivars,
  utf16_strsize(efi_name,
-- 1.7.1

[PATCH v3 2/7] efi_pstore: Add a logic erasing entries to an erase callback

2012-10-26 Thread Seiji Aguchi

[Issue]

Currently, efi_pstore driver simply overwrites existing panic messages in NVRAM.
So, in the following scenario, we will lose 1st panic messages.
 
 1. kernel panics.
 2. efi_pstore is kicked and writes panic messages to NVRAM.
 3. system reboots.
 4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

A reasonable solution to fix the issue is just holding multiple logs without
erasing existing entries.

This patch adds the logic that erases existing entries, shared with the write
callback, to the erase callback.
To support holding multiple logs, the write callback doesn't need to erase any
entries, so that logic will be removed from it in a subsequent patch.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   46 +++-
 1 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 37ac21a..bee14cc 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -784,7 +784,51 @@ static int efi_pstore_write(enum pstore_type_id type,
 static int efi_pstore_erase(enum pstore_type_id type, u64 id,
struct pstore_info *psi)
 {
-   efi_pstore_write(type, 0, &id, (unsigned int)id, 0, psi);
+   char stub_name[DUMP_NAME_LEN];
+   efi_char16_t efi_name[DUMP_NAME_LEN];
+   efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
+   struct efivars *efivars = psi->data;
+   struct efivar_entry *entry, *found = NULL;
+   int i;
+
+   sprintf(stub_name, "dump-type%u-%u-", type, (unsigned int)id);
+
+   spin_lock(&efivars->lock);
+
+   for (i = 0; i < DUMP_NAME_LEN; i++)
+   efi_name[i] = stub_name[i];
+
+   /*
+* Clean up any entries with the same name
+*/
+
+   list_for_each_entry(entry, &efivars->list, list) {
+   get_var_data_locked(efivars, &entry->var);
+
+   if (efi_guidcmp(entry->var.VendorGuid, vendor))
+   continue;
+   if (utf16_strncmp(entry->var.VariableName, efi_name,
+ utf16_strlen(efi_name)))
+   continue;
+   /* Needs to be a prefix */
+   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
+   continue;
+
+   /* found */
+   found = entry;
+   efivars->ops->set_variable(entry->var.VariableName,
+  &entry->var.VendorGuid,
+  PSTORE_EFI_ATTRIBUTES,
+  0, NULL);
+   }
+
+   if (found)
+   list_del(&found->list);
+
+   spin_unlock(&efivars->lock);
+
+   if (found)
+   efivar_unregister(found);
 
return 0;
 }
-- 1.7.1
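
The matching rule used by the erase loop in this patch (compare against the
"dump-type<type>-<id>-" stub, then require that something follows it) can be
sketched in userspace C as follows. This is an illustration only:
matches_stub() is a hypothetical helper on plain C strings, whereas the patch
works on UTF-16 variable names.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the patch).
 * Returns 1 if name starts with stub and continues past it, else 0.
 */
int matches_stub(const char *name, const char *stub)
{
	size_t n = strlen(stub);

	/* The name must begin with the stub. */
	if (strncmp(name, stub, n) != 0)
		return 0;

	/* Needs to be a prefix: a name equal to the bare stub is skipped. */
	return name[n] != '\0';
}
```

Because the stub ends with a dash, a stub for id 1 ("dump-type0-1-") does not
match names belonging to id 10 ("dump-type0-10-...").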

Re: [PATCH RFT RESEND linux-next] sparc: dma-mapping: support debug_dma_mapping_error

2012-10-26 Thread Shuah Khan

On Fri, 2012-10-26 at 15:10 -0400, David Miller wrote:
> From: David Miller 
> Date: Fri, 26 Oct 2012 15:01:53 -0400 (EDT)
> 
> > From: Shuah Khan 
> > Date: Fri, 26 Oct 2012 10:13:09 -0600
> > 
> >> Add support for debug_dma_mapping_error() call to avoid warning from
> >> debug_dma_unmap() interface when it checks for mapping error checked
> >> status. Without this patch, a "device driver failed to check map error"
> >> warning is generated.
> >> 
> >> Signed-off-by: Shuah Khan 
> > 
> > Why are you resending this?
> > 
> > When you resend patches, you have to give a reason.  Did the patch
> > change?  Are you updating the CC: list of the patch posting? Are
> > you correcting something else?
> 
> In fact you sent this _3_ times.  Please, don't do this, or at least
> explain why you are reposting something each time.

I am really sorry for the noise. I forgot to mark this explicitly for
linux-next the first time and this patch depends on a patch that went
into linux-next. I sent a few patches for several architectures and lost
track. I meant to add explanation before I hit send. The reason for
resend is to tag it for linux-next. 

-- Shuah



[PATCH v3 1/7] efi_pstore: Check remaining space with QueryVariableInfo() before writing data

2012-10-26 Thread Seiji Aguchi

[Issue]

As discussed in the thread below, running out of space in EFI isn't a
well-tested scenario, and we wouldn't expect all firmware to handle it
gracefully.
http://marc.info/?l=linux-kernel&m=134305325801789&w=2

On the other hand, the current efi_pstore doesn't check the remaining storage
space at writing time. Therefore, efi_pstore may not work if it tries to write
a large amount of data.

[Patch Description]

To avoid the situation above, this patch uses QueryVariableInfo() to check
whether there is enough space to log before writing data.

Signed-off-by: Seiji Aguchi 
Acked-by: Mike Waychison 
---
 drivers/firmware/efivars.c |   18 ++
 include/linux/efi.h|1 +
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index d10c987..37ac21a 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -707,12 +707,29 @@ static int efi_pstore_write(enum pstore_type_id type,
struct efivars *efivars = psi->data;
struct efivar_entry *entry, *found = NULL;
int i, ret = 0;
+   u64 storage_space, remaining_space, max_variable_size;
+   efi_status_t status = EFI_NOT_FOUND;
 
sprintf(stub_name, "dump-type%u-%u-", type, part);
sprintf(name, "%s%lu", stub_name, get_seconds());
 
spin_lock(&efivars->lock);
 
+   /*
+* Check if there is a space enough to log.
+* size: a size of logging data
+* DUMP_NAME_LEN * 2: a maximum size of variable name
+*/
+   status = efivars->ops->query_variable_info(PSTORE_EFI_ATTRIBUTES,
+  &storage_space,
+  &remaining_space,
+  &max_variable_size);
+   if (status || remaining_space < size + DUMP_NAME_LEN * 2) {
+   spin_unlock(&efivars->lock);
+   *id = part;
+   return -ENOSPC;
+   }
+
for (i = 0; i < DUMP_NAME_LEN; i++)
efi_name[i] = stub_name[i];
 
@@ -1237,6 +1254,7 @@ efivars_init(void)
ops.get_variable = efi.get_variable;
ops.set_variable = efi.set_variable;
ops.get_next_variable = efi.get_next_variable;
+   ops.query_variable_info = efi.query_variable_info;
error = register_efivars(&__efivars, &ops, efi_kobj);
if (error)
goto err_put;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8670eb1..c47ec36 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -643,6 +643,7 @@ struct efivar_operations {
efi_get_variable_t *get_variable;
efi_get_next_variable_t *get_next_variable;
efi_set_variable_t *set_variable;
+   efi_query_variable_info_t *query_variable_info;
 };
 
 struct efivars {
-- 1.7.1
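
The space check added by this patch can be sketched in userspace C as follows.
This is a minimal illustration, not the kernel code: check_log_space() is a
hypothetical helper, the value of DUMP_NAME_LEN is an assumption made for the
sketch, and the status and remaining-space arguments stand in for what
QueryVariableInfo() would report.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_NAME_LEN 52	/* assumed value for this sketch */

/*
 * Hypothetical helper (not part of the patch).
 * status:          non-zero means the space query itself failed
 * remaining_space: bytes still free in variable storage
 * size:            size of the data to be logged
 * Returns 0 if the write may proceed, -ENOSPC otherwise.
 */
int check_log_space(int status, uint64_t remaining_space, size_t size)
{
	/* DUMP_NAME_LEN * 2: maximum size of the UTF-16 variable name */
	if (status || remaining_space < (uint64_t)size + DUMP_NAME_LEN * 2)
		return -ENOSPC;

	return 0;
}
```

Treating a failed query the same as a lack of space mirrors the condition in
the patch, which returns -ENOSPC in both cases rather than attempting the write.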

[PATCH v3 0/7] efi_pstore: multiple event logging support

2012-10-26 Thread Seiji Aguchi

Changelog

v2 -> v3
   - Create patches 6/7 and 7/7 to work with an existing format of variable name

v1 -> v2
   - Separate into 5 patches in accordance with Mike's comment
   - Erase an extra line of comment in patch 1/5

[Issue]

Currently, efi_pstore driver simply overwrites existing panic messages in 
NVRAM.
So, in the following scenario, we will lose 1st panic messages.

1. kernel panics.
2. efi_pstore is kicked and writes panic messages to NVRAM.
3. system reboots.
4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

   Solutions to this problem have been discussed among Tony, Matthew, Don, Mike
   and me.

   http://marc.info/?l=linux-kernel&m=134273270704586&w=2

   There are two possible solutions right now.
 - The first is introducing some policy for overwriting existing logs.
 - The second is simply holding multiple logs without overwriting any
   entries.

   We haven't yet decided on an overwriting policy that is reasonable for all
   users, but I believe we agree that just holding multiple logs is a
   reasonable way.

   We may need further discussion on whether to introduce an overwriting
   policy, especially for getting critical messages in the multiple-oops case.
   But I would like to begin with an approach that is simple and reasonable for
   everyone, so this patch series takes the approach of just holding multiple
   logs.

[Patch Description]

(1/7) efi_pstore: Check remaining space with QueryVariableInfo() before writing 
data

(2/7) efi_pstore: Add a logic erasing entries to an erase callback

(3/7) efi_pstore: Remove a logic erasing entries from a write callback to hold 
multiple logs

(4/7) efi_pstore: Add ctime to argument of erase callback

(5/7) efi_pstore: Change a format of a variable name by adding a sequence 
counter

(6/7) efi_pstore: Add a format check for an existing variable name at reading 
time

(7/7) efi_pstore: Add a format check for an existing variable name at erasing 
time

Detailed explanations are written in each patch.

 drivers/acpi/apei/erst.c   |   16 +++---
 drivers/firmware/efivars.c |  150 ++-
 fs/pstore/inode.c  |7 ++-
 fs/pstore/internal.h   |2 +-
 fs/pstore/platform.c   |   11 ++--
 fs/pstore/ram.c|9 +--
 include/linux/efi.h|1 +
 include/linux/pstore.h |6 +-
 8 files changed, 133 insertions(+), 69 deletions(-)


[GIT] Networking

2012-10-26 Thread David Miller


This is what we usually expect at this stage of the game, lots of
little things, mostly in drivers.  With the occasional "oops didn't
mean to do that" kind of regressions in the core code.

1) Uninitialized data in __ip_vs_get_timeouts(), from Arnd Bergmann

2) Reject invalid ACK sequences in Fast Open sockets, from Jerry Chu.

3) Lost error code on return from _rtl_usb_receive(), from Christian
   Lamparter.

4) Fix reset resume on USB rt2x00, from Stanislaw Gruszka.

5) Release resources on error in pch_gbe driver, from Veaceslav Falico.

6) Default hop limit not set correctly in ip6_template_metrics[], fix
   from Li RongQing.

7) Gianfar PTP code requests wrong kind of resource during probe,
   fix from Wei Yang.

8) Fix VHOST net driver on big-endian, from Michael S. Tsirkin.

9) Mellanox driver bug fixes from Jack Morgenstein, Or Gerlitz,
   Moni Shoua, Dotan Barak, and Uri Habusha.

10) usbnet leaks memory on TX path, fix from Hemant Kumar.

11) Use socket state test, rather than presence of FIN bit packet,
to determine FIONREAD/SIOCINQ value.  Fix from Eric Dumazet.

12) Fix cxgb4 build failure, from Vipul Pandya.

13) Provide a SYN_DATA_ACKED state to complement SYN_FASTOPEN in socket
info dumps.  From Yuchung Cheng.

14) Fix leak of security path in kfree_skb_partial().  Fix from
Eric Dumazet.

15) Handle RX FIFO overflows more resiliently in pch_gbe driver, from
Veaceslav Falico.

16) Fix MAINTAINERS file pattern for networking drivers, from Jean
Delvare.

17) Add iPhone5 IDs to IPHETH driver, from Jay Purohit.

18) VLAN device type change restriction is too strict, and should not
trigger for the automatically generated vlan0 device.  Fix from
Jiri Pirko.

19) Make PMTU/redirect flushing work properly again in ipv4, from Steffen
Klassert.

20) Fix memory corruptions by using kfree_rcu() in netlink_release().
From Eric Dumazet.

21) More qmi_wwan device IDs, from Bjørn Mork.

22) Fix unintentional change of SNAT/DNAT hooks in generic NAT
infrastructure, from Elison Niven.

23) Fix 3.6.x regression in xt_TEE netfilter module, from Eric Dumazet.

Please pull, thanks a lot!

The following changes since commit 43c422eda99b894f18d1cca17bcd2401efaf7bd0:

  apparmor: fix apparmor OOPS in audit_log_untrustedstring+0x1c/0x40 
(2012-10-17 16:29:46 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 330ee00412bbaefa7d0597a1bed7804e818ba91c:

  Merge branch 'master' of 
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 
(2012-10-26 10:32:13 -0400)



Arnd Bergmann (1):
  ipvs: initialize returned data in do_ip_vs_get_ctl

Bing Zhao (2):
  mwifiex: return -EBUSY if specific scan request cannot be honored
  mwifiex: clean up scan state on error

Bjørn Mork (1):
  net: qmi_wwan: adding more ZTE devices

Chris Metcalf (1):
  tilegx: fix some issues in the SW TSO support

Christian Lamparter (1):
  rtlwifi: pass rx setup error code to caller

Dan Carpenter (1):
  brcmfmac: fix end of loop check (signedness bug)

Dan Williams (1):
  qmi_wwan/cdc_ether: move Novatel 551 and E362 to qmi_wwan

David S. Miller (4):
  Merge branch 'master' of git://1984.lsi.us.es/nf
  Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
  Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless
  Merge branch 'fixes-for-3.7' of git://gitorious.org/linux-can/linux-can

Dotan Barak (1):
  net/mlx4_core: Unmap UAR also in the case of error flow

Elison Niven (1):
  netfilter: xt_nat: fix incorrect hooks for SNAT and DNAT targets

Eric Dumazet (4):
  netfilter: xt_TEE: don't use destination address found in header
  netlink: use kfree_rcu() in netlink_release()
  tcp: fix FIONREAD/SIOCINQ
  net: fix secpath kmemleak

Felix Fietkau (2):
  mac80211: use ieee80211_free_txskb in a few more places
  Revert "ath9k_hw: Updated AR9003 tx gain table for 5GHz"

Franky Lin (2):
  brcmfmac: use control channel in roamed status reporting
  brcmfmac: set dongle mode accordingly when interface up

Hante Meuleman (1):
  brcmfmac: handle all exceptions as an error.

Hemant Kumar (1):
  net: usb: Fix memory leak on Tx data path

Jack Morgenstein (1):
  net/mlx4_en: Fix double-release-range in tx-rings

Jay Purohit (1):
  usb/ipheth: Add iPhone 5 support

Jean Delvare (1):
  MAINTAINERS: NETWORKING DRIVERS matches too much

Jerry Chu (1):
  tcp: Reject invalid ack_seq to Fast Open sockets

Jiri Pirko (1):
  vlan: allow to change type when no vlan device is hooked on netdev

Joe Jin (1):
  qla3xxx: Ensure request/response queue addr writes to the registers

Johan Hedberg (1):
  Bluetooth: SMP: Fix setting unknown auth_req bits

Johannes Berg (3):
  Merge

Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()

2012-10-26 Thread Tejun Heo

Hello,

On Fri, Oct 26, 2012 at 11:29:56PM +0200, Rafael J. Wysocki wrote:
> Actually, what tree is it supposed to apply to?
> 
> The change in kernel/cgroup_freezer.c doesn't look like anything in
> the current Linus' tree to me.

Ooh, right.  This depends on the earlier cgroup_freezer changes.
Sorry about the confusion.  I'll apply it to the following branch (the
same one used for the previous cgroup_freezer updates).

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git cgroup-freezer

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()

2012-10-26 Thread Rafael J. Wysocki

On Friday, October 26, 2012 11:14:17 PM Rafael J. Wysocki wrote:
> On Friday, October 26, 2012 08:01:49 PM Oleg Nesterov wrote:
> > On 10/26, Tejun Heo wrote:
> > >
> > >  Acked-by: Tejun Heo 
> > 
> > Thanks!
> > 
> > > Rafael, sorry that this one doesn't have pm cc'd
> > 
> > Ah, sorry Rafael. Yes, I have read you email, and I was going to
> > add linux-pm but forgot.
> > 
> > > but can you please
> > > pick up this one too?
> > 
> > Please, and thanks.
> 
> OK, but that will go to Linus in the next batch.

Actually, what tree is it supposed to apply to?

The change in kernel/cgroup_freezer.c doesn't look like anything in
the current Linus' tree to me.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow

On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet  wrote:
> Do you know what is per cpu data in linux kernel ?

I sorta did.  But since your response, I did more reading, and now I
see what you mean.  But I don't think this is a per cpu issue.  More
below.

> Because its not needed. Really I dont know why you want that.
>
> Once you are sure a thread cannot be interrupted by a softirq, and
> cannot migrate to another cpu, access to percpu data doesnt need other
> synchronization at all.

Because there are multiple entry points on the same CPU.  In
net/ipv4/netfilter/ip_tables.c, there are two entries to
xt_write_recseq_begin().  The first is in ipt_do_table and the other
is in get_counters.  Where we are seeing the lockup is with a
getsockopt syscall leading to do_counters.  The other path is through
ipt_do_table, which is installed as a hook.  I'm not sure from what
context the hooks are called, but it is certainly from a different
context than the syscall.

> Following sequence is safe :
>
> addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
> /*
>  * This is kind of a write_seqcount_begin(), but addend is 0 or 1
>  * We dont check addend value to avoid a test and conditional jump,
>  * since addend is most likely 1
>  */
> __this_cpu_add(xt_recseq.sequence, addend);

If this were safe, we wouldn't be seeing this lockup and your patch
wouldn't be needed.  So it seems that your patch doesn't really
address the issue that we are not "sure a thread cannot be interrupted
by a softirq, and cannot migrate to another cpu".  Well, we know it
cannot migrate to another CPU, because there isn't another CPU.  So
apparently, it can be interrupted by a softirq.  So local_bh_disable
isn't doing anything useful in the RT patches with regard to this.

As I mentioned earlier, I think perhaps what your patch did was ensure
an atomic update of the sequence counter.  But it does nothing to
synchronize two writers.  If they were already synchronized (such as
via the call to local_bh_disable), then we wouldn't see sequence
counter corruption, no?
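
For readers following the thread, here is a minimal userspace sketch of
why a sequence counter that is left odd shows up as a hang on the reader
side: a seqcount reader keeps retrying for as long as the counter is odd.
The names read_begin_bounded() and lost_update_demo(), the spin limit,
and the plain local counter are all illustrative assumptions, not kernel
API. The interleaving shown is just one hypothetical lost update between
two unsynchronized writers, not a diagnosis of this report.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Reader side of a sequence counter: wait until the counter is even
 * (no writer in progress) and return that snapshot.  The real reader
 * spins without a limit; the bound here only exists so that the sketch
 * terminates.
 */
static bool read_begin_bounded(const volatile unsigned int *seq,
			       unsigned int max_spins, unsigned int *snap)
{
	while (max_spins--) {
		unsigned int s = *seq;

		if (!(s & 1)) {
			*snap = s;
			return true;
		}
	}
	return false;	/* counter stayed odd: the real reader would hang */
}

/*
 * Hypothetical interleaving: two writers both load the counter before
 * either stores it back, so one "begin" increment is lost.
 */
static void lost_update_demo(void)
{
	unsigned int seq = 0, snap = 0;
	unsigned int a, b;

	a = seq;	/* writer A loads 0 */
	b = seq;	/* writer B loads 0 */
	seq = a + 1;	/* writer A stores 1 */
	seq = b + 1;	/* writer B stores 1, A's increment is lost */
	seq += 1;	/* writer A ends: 2 */
	seq += 1;	/* writer B ends: 3, odd with no writer active */

	assert(seq == 3);
	assert(!read_begin_bounded(&seq, 1000, &snap));
}
```

Calling lost_update_demo() runs the scenario. With properly serialized
writers every begin is paired with an end and the counter returns to an
even value, so the reader makes progress.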

Pete

Re: [PATCH] VFS: add config options to enable link restrictions

2012-10-26 Thread Boaz Harrosh

On 10/26/2012 01:23 PM, Kees Cook wrote:
> 
> Every distro will ship with this enabled (except perhaps Damn
> Vulnerable Linux), so why make it harder?
> 

So please remind me why can't it be on by default in code.
And the normal sysctl to turn it off for these who want to
experiment with "filesystem corruption".

So the basic premise is that you must not have any
filesystem corruption at the parts used by boot up until
the init portion that turns "filesystem corruption" on

> -Kees
> 

Cheers
Boaz

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix

On 26 Oct 2012, Theodore Ts'o uttered the following:

> The plan is that eventually, we will have checksums on a
> per-journalled block basis, instead of a per-commit basis, and when we
> get a failed checksum, we skip the replay of that block,

But not of everything it implies, since that's quite tricky to track
down (it's basically the same work needed for softupdates, but in
reverse). Hence the e2fsck check, I suppose.

> prevent unwary civilians from coming across the feature and saying,
> "oooh, shiny!" and turning it on.  :-(

Or having it turned on by default either, which seems to be the case
now.

-- 
NULL && (void)

[GIT PULL] Power management and ACPI fixes for 3.7-rc3

2012-10-26 Thread Rafael J. Wysocki

Hi Linus,

Please pull from the git repository at

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm+acpi-for-3.7-rc3

to receive power management and ACPI fixes for v3.7-rc3 with top-most commit
879dca019dc43a1622edca3e7dde644b14b5acc5

  ACPI: missing break

on top of commit 6f0c0580b70c89094b3422ba81118c7b959c7556

  Linux 3.7-rc2

Included are:

* Fix for a recently introduced memory leak in acpi_bind_one() from Jesper Juhl.

* PM domains fix for an error code path memory leak in pm_genpd_attach_cpuidle()
  from Jonghwan Choi.

* Fix for smp_processor_id() usage in preemptible code in powernow-k8 from
  Andreas Herrmann (stable material).

* Fix for a suspend-related memory leak in cpufreq stats from Xiaobing Tu.

* Freezer fix for failure to clear PF_NOFREEZE along with PF_KTHREAD
  in flush_old_exec() from Oleg Nesterov.

* One-liner acpi_processor_notify() fix from Alan Cox.

Thanks!


 drivers/acpi/glue.c | 1 +
 drivers/acpi/processor_driver.c | 1 +
 drivers/base/power/domain.c | 5 -
 drivers/cpufreq/cpufreq_stats.c | 1 +
 drivers/cpufreq/powernow-k8.c   | 9 +
 fs/exec.c   | 3 ++-
 6 files changed, 10 insertions(+), 10 deletions(-)

---

Alan Cox (1):
  ACPI: missing break

Andreas Herrmann (1):
  cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code

Jesper Juhl (1):
  ACPI: Fix memory leak in acpi_bind_one()

Oleg Nesterov (1):
  freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD

Tu, Xiaobing (1):
  Fix memory leak in cpufreq stats.

jhbird.c...@samsung.com (1):
  PM / Domains: Fix memory leak on error path in pm_genpd_attach_cpuidle


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Theodore Ts'o

> This isn't the first time that journal_checksum has proven problematic.
> It's a shame that we're stuck between two error-inducing stools here...

The problem is that it currently bails out by aborting the entire
journal replay, and the file system will get left in a mess when it
does that.  It's actually safer today to just be blissfully ignorant
of a corrupted block in the journal, than to have the journal getting
aborted mid-replay when we detect a corrupted commit.

The plan is that eventually, we will have checksums on a
per-journalled block basis, instead of a per-commit basis, and when we
get a failed checksum, we skip the replay of that block, but we keep
going and replay all of the other blocks and commits.  We'll then set
the "file system corrupted" bit and force an e2fsck check.
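
A rough userspace sketch of the replay policy described above, under
stated assumptions: struct jblock, toy_csum() and replay() are
hypothetical names invented for illustration, and the checksum is a toy
stand-in rather than whatever jbd2 will actually use.  It only shows the
control flow: a block with a bad checksum is skipped, the remaining
blocks are still replayed, and a flag records that a check is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical journalled block with its own recorded checksum. */
struct jblock {
	const uint8_t *data;
	size_t len;
	uint32_t csum;		/* checksum stored when the block was logged */
};

/* Toy checksum, an assumption for the sketch only. */
static uint32_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum = sum * 31 + *p++;
	return sum;
}

/*
 * Replay every block whose checksum matches.  A mismatch skips only
 * that block and sets *fs_corrupted so the caller can force a check,
 * instead of aborting the whole replay.  Returns the number of blocks
 * replayed.
 */
static size_t replay(const struct jblock *blk, size_t n, bool *fs_corrupted)
{
	size_t replayed = 0;

	*fs_corrupted = false;
	for (size_t i = 0; i < n; i++) {
		if (toy_csum(blk[i].data, blk[i].len) != blk[i].csum) {
			*fs_corrupted = true;
			continue;
		}
		/* a real replay would write the block back to disk here */
		replayed++;
	}
	return replayed;
}
```

With three blocks of which the middle one is damaged, replay() returns 2
and sets the flag, in contrast to the current per-commit behaviour of
stopping mid-replay.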

The problem is this code isn't done yet, and journal_checksum is
really not ready for prime time.  When it is ready, my plan is to wire
it up so it is enabled by default; at the moment, it was intended for
developer experimentation only.  As I said, it's my fault for not
clearly labelling it "Not for you!", or putting it under an #ifdef to
prevent unwary civilians from coming across the feature and saying,
"oooh, shiny!" and turning it on.  :-(

- Ted

Re: [PATCH v4] pwm: vt8500: Update vt8500 PWM driver support

2012-10-26 Thread Arnd Bergmann

On Friday 26 October 2012, Chris Brand wrote:
> > @@ -124,6 +156,12 @@ static int __devinit pwm_probe(struct
> > platform_device *pdev)
> >   chip->chip.base = -1;
> >   chip->chip.npwm = VT8500_NR_PWMS;
> > 
> > + chip->clk = devm_clk_get(&pdev->dev, NULL);
> > + if (IS_ERR_OR_NULL(chip->clk)) {
> > + dev_err(&pdev->dev, "clock source not specified\n");
> > + return PTR_ERR(chip->clk);
> > + }
> > +
> 
> Do you want IS_ERR() rather than IS_ERR_OR_NULL() here ?
> Otherwise you may return PTR_ERR(NULL).
> 

Right. devm_clk_get() never returns NULL, but as Russell pointed out
recently, this usage of IS_ERR_OR_NULL() is completely bogus.
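
To make the PTR_ERR(NULL) concern concrete, here is a small userspace
sketch.  The helpers below are simplified re-implementations modelled on
the kernel's include/linux/err.h, written so the example compiles
outside the kernel; probe_check_or_null() and probe_check() are
hypothetical stand-ins for the probe fragment, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Check as written in the patch: NULL takes the error path ... */
static long probe_check_or_null(const void *clk)
{
	if (IS_ERR_OR_NULL(clk))
		return PTR_ERR(clk);	/* ... but PTR_ERR(NULL) is 0 */
	return 0;
}

/* Check as suggested in the review: only real error pointers fail. */
static long probe_check(const void *clk)
{
	if (IS_ERR(clk))
		return PTR_ERR(clk);
	return 0;
}
```

With these definitions probe_check_or_null(NULL) returns 0, so a probe
function written that way would log an error and still report success,
while both variants return the negative errno for a real error pointer
such as ERR_PTR(-2).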

Arnd

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Theodore Ts'o

This looks very different.  The symptoms are quite different, and it's
most likely that an unclean shutdown is involved.  In your case,
you're doing clean shutdowns, with some suspend/resume cycles thrown
in.  Also, kernel version 3.5.5 doesn't have the commits that were
added between 3.6.1 and 3.6.3.

Are you running e2fsck to fix the file system consistency problems;
what is e2fsck reporting?

Do you need to have a suspend/resume in order to trigger the problem?

This could very well be some kind of hardware problem or kernel bug related
to suspend/resume.  Unfortunately, many different problems get noticed
by the file system, but the root cause can often be something else;
a hardware problem, or a bug somewhere else in the kernel.

Regards,

- Ted

P.S.  Can you do us a favor and start a separate mail thread with the
information reposted?  It can get hard to track different cases when
a lot of people assume that their random failures (some of which are
hardware problems) are related to the issue we are trying to track
down in this mail thread and then they all pile onto the same mail
thread or the same web forum --- one of the reasons why I detest
Ubuntu Launchpad.  Thanks!!

Re: [PATCH v2 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule()

2012-10-26 Thread Rafael J. Wysocki

On Friday, October 26, 2012 08:01:49 PM Oleg Nesterov wrote:
> On 10/26, Tejun Heo wrote:
> >
> >  Acked-by: Tejun Heo 
> 
> Thanks!
> 
> > Rafael, sorry that this one doesn't have pm cc'd
> 
> Ah, sorry Rafael. Yes, I have read you email, and I was going to
> add linux-pm but forgot.
> 
> > but can you please
> > pick up this one too?
> 
> Please, and thanks.

OK, but that will go to Linus in the next batch.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH 2/3] x86,mm: drop TLB flush from ptep_set_access_flags

2012-10-26 Thread Alan Cox

On Fri, 26 Oct 2012 14:45:02 -0400
Rik van Riel  wrote:

> Intel has an architectural guarantee that the TLB entry causing
> a page fault gets invalidated automatically. This means
> we should be able to drop the local TLB invalidation.
> 
> Because of the way other areas of the page fault code work,
> chances are good that all x86 CPUs do this.  However, if
> someone somewhere has an x86 CPU that does not invalidate
> the TLB entry causing a page fault, this one-liner should
> be easy to revert.

This does not strike me as a good standard of validation for such a change

At the very least we should have an ACK from AMD and from VIA, and
preferably ping RDC and some of the other embedded folks. Given an AMD
and VIA ACK I'd be fine. I doubt anyone knows any more what Cyrix CPUs
did or cared about and I imagine H Peter or Linus can answer for
Transmeta ;-)

Alan



Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Eric Dumazet

On Fri, 2012-10-26 at 11:51 -0700, Peter LaDow wrote:
> (I've added netfilter and linux-rt-users to try to pull in more help).
> 
> On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet  wrote:
> > Upstream kernel is fine, there is no race, as long as :
> >
> > local_bh_disable() disables BH and preemption.
> 
> Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
> doesn't appear that any of the code checks the return value for
> xt_write_recseq_begin to determine if it is safe to write.  And neither
> does the newly patched code.  How did the mainline code prevent
> corruption of the tables it is updating?
> 

Do you know what is per cpu data in linux kernel ?

> Why isn't there something like
> 
>   while ( (addend = xt_write_recseq_begin()) == 0 );
> 
> To make sure that only one person has write access to the tables?
> Better yet, why not use a seqlock_t instead?
> 

Because its not needed. Really I dont know why you want that.

Once you are sure a thread cannot be interrupted by a softirq, and
cannot migrate to another cpu, access to percpu data doesnt need other
synchronization at all.

Following sequence is safe :

addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
/*
 * This is kind of a write_seqcount_begin(), but addend is 0 or 1
 * We dont check addend value to avoid a test and conditional jump,
 * since addend is most likely 1
 */
__this_cpu_add(xt_recseq.sequence, addend);

Because any other thread will use a different percpu data.
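
As a userspace illustration of the sequence quoted above (a sketch only:
a plain global stands in for the per-CPU xt_recseq.sequence, and
recseq_begin(), recseq_end() and recseq_demo() are names made up for the
example), the addend is 1 for the outermost writer and 0 for a nested
one, so the counter is odd exactly while a write section is open on that
CPU:

```c
#include <assert.h>

/* Stand-in for the per-CPU counter; an assumption for the sketch. */
static unsigned int seq;

/*
 * Same arithmetic as the quoted code: addend is 1 when the counter is
 * even (outermost writer) and 0 when it is already odd (nested writer).
 */
static unsigned int recseq_begin(void)
{
	unsigned int addend = (seq + 1) & 1;

	seq += addend;
	return addend;
}

/* The matching end adds the same addend back. */
static void recseq_end(unsigned int addend)
{
	seq += addend;
}

/* One nested write section on a single "CPU". */
static void recseq_demo(void)
{
	unsigned int outer, inner;

	seq = 0;

	outer = recseq_begin();
	assert(outer == 1 && seq == 1);	/* counter odd: write in progress */

	inner = recseq_begin();
	assert(inner == 0 && seq == 1);	/* nested begin leaves it alone */

	recseq_end(inner);
	assert(seq == 1);

	recseq_end(outer);
	assert(seq == 2);		/* even again: readers may proceed */
}
```

The sketch is single-threaded on purpose.  It says nothing about what
happens if a second writer can run on the same CPU between the read and
the add, which is the case being debated in this thread.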



Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix

On 26 Oct 2012, Theodore Ts'o stated:

> On Fri, Oct 26, 2012 at 09:37:08PM +0100, Nix wrote:
>> 
>> I can reproduce this on a small filesystem and stick the image somewhere
>> if that would be of any use to anyone. (If I'm very lucky, merely making
>> this offer will make the problem go away. :} )
>
> I'm not sure the image is going to be that useful.  What we really
> need to do is to get a reliable reproduction of what _you_ are seeing.
>
> It's clear from Eric's experiments that journal_checksum is dangerous.
> 
> That's why one of the things I asked you to do when you had time was
> to see if you could reproduce the problem you are seeing w/o
> nobarrier,journal_checksum,journal_async_commit.

OK. Will do tomorrow.

> The other experiment that would be really useful if you could do is to
> try to apply these two patches which I sent earlier this week:
>
> [PATCH 1/2] ext4: revert "jbd2: don't write superblock when if its empty
> [PATCH 2/2] ext4: fix I/O error when unmounting an ro file system
>
> ... and see if they make a difference.

As of tomorrow I'll be able to reboot without causing a riot: I'll test
it then. (Sorry for the delay :( )

>   So I really don't want
> to push these patches to Linus until I get confirmation that they make
> a difference to *somebody*.

Agreed.

This isn't the first time that journal_checksum has proven problematic.
It's a shame that we're stuck between two error-inducing stools here...

-- 
NULL && (void)

Re: [drm:i915_hangcheck_hung] ERROR Hangcheck timer elapsed... GPU hung

2012-10-26 Thread Justin P. Mattock


On 10/26/2012 01:05 AM, Daniel Vetter wrote:
> On Fri, Oct 26, 2012 at 6:43 AM, Justin P. Mattock wrote:
>>> No worries, it is another ILK hang similar to the ones reported earlier
>>> - it just seems the ring stops advancing. Hopefully it is a missing w/a
>>> from http://cgit.freedesktop.org/~danvet/drm/log/?h=ilk-wa-pile
>>> -Chris
>>
>> well if this means building libdrm etc.. then thats not a problem, more time
>> consuming if anything. perhaps an *.rpm that I can test to see?
>
> It's not libdrm, the above is just a kernel git tree with a bunch of
> ironlake workarounds.
> -Daniel

nice..

:~/drm> git clone git://people.freedesktop.org/~danvet/drm
Cloning into 'drm'...
remote: Counting objects: 2728390, done.
remote: Compressing objects: 100% (418606/418606), done.
remote: Total 2728390 (delta 2293727), reused 2717443 (delta 2282880)
Receiving objects: 100% (2728390/2728390), 637.95 MiB | 599 KiB/s, done.
Resolving deltas: 100% (2293727/2293727), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.


so now I have to go on a witch hunt for 600MB's in my system.

Justin P. Mattock

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Theodore Ts'o

On Fri, Oct 26, 2012 at 09:37:08PM +0100, Nix wrote:
> 
> I can reproduce this on a small filesystem and stick the image somewhere
> if that would be of any use to anyone. (If I'm very lucky, merely making
> this offer will make the problem go away. :} )

I'm not sure the image is going to be that useful.  What we really
need to do is to get a reliable reproduction of what _you_ are seeing.

It's clear from Eric's experiments that journal_checksum is dangerous.
In fact, I will likely put it under an #ifdef EXT4_EXPERIMENTAL to try
to discourage people from using it in the future.  There are things
I've been planning on doing to make it be safer, but there's a very
good *reason* that both journal_checksum and journal_async_commit are
not on by default.

That's why one of the things I asked you to do when you had time was
to see if you could reproduce the problem you are seeing w/o
nobarrier,journal_checksum,journal_async_commit.

The other experiment that would be really useful if you could do is to
try to apply these two patches which I sent earlier this week:

[PATCH 1/2] ext4: revert "jbd2: don't write superblock when if its empty
[PATCH 2/2] ext4: fix I/O error when unmounting an ro file system

... and see if they make a difference.

If they don't make a difference, I don't want to apply patches just
for placebo/PR reasons.  And for Eric at least, he can reproduce the
journal checksum error followed by fairly significant corruption
reported by e2fsck with journal_checksum, and the presence or absence
of these patches make no difference for him.  So I really don't want
to push these patches to Linus until I get confirmation that they make
a difference to *somebody*.

Regards,

- Ted

Re: [GIT PULL] at91: fixes for 3.7-rc3

2012-10-26 Thread Arnd Bergmann

On Thursday 25 October 2012, Nicolas Ferre wrote:
>   ARM: at91: fix external interrupts in non-DT case

This patch now leads to build errors with at91x40_defconfig,
which I've fixed up by applying the patch below on top.

Please yell if this is not the right fix.

Arnd

commit 3a8852afde01682083028ee427e0678f9bbddd75
Author: Arnd Bergmann 
Date:   Fri Oct 26 22:49:09 2012 +0200

ARM: at91: fix at91x40 build

patch 738a0fd7 "ARM: at91: fix external interrupts in non-DT case"
fixed a run-time error on some at91 platforms but did not apply
the same change to at91x40, which now doesn't build.

This changes at91x40 in the same way that the other platforms
were changed.

Signed-off-by: Arnd Bergmann 

diff --git a/arch/arm/mach-at91/at91x40.c b/arch/arm/mach-at91/at91x40.c
index 6bd7300..bb7f544 100644
--- a/arch/arm/mach-at91/at91x40.c
+++ b/arch/arm/mach-at91/at91x40.c
@@ -88,6 +88,6 @@ void __init at91x40_init_interrupts(unsigned int priority[NR_AIC_IRQS])
if (!priority)
priority = at91x40_default_irq_priority;
 
-   at91_aic_init(priority);
+   at91_aic_init(priority, at91_extern_irq);
 }
 

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Thomas Gleixner

On Fri, 26 Oct 2012, Peter LaDow wrote:
> (I've added netfilter and linux-rt-users to try to pull in more help).
> 
> On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet  wrote:
> > Upstream kernel is fine, there is no race, as long as :
> >
> > local_bh_disable() disables BH and preemption.
> 
> Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
> doesn't appear that any of the code checks the return value for
> xt_write_recseq_begin to determine if it is safe to write.  And neither
> does the newly patched code.  How did the mainline code prevent
> corruption of the tables it is updating?
> 
> Why isn't there something like:
> 
>   while ( (addend = xt_write_recseq_begin()) == 0 );
> 
> To make sure that only one person has write access to the tables?
> Better yet, why not use a seqlock_t instead?
> 
> > Apparently RT changes this, so RT needs to change the code.
> 
> The RT patch only touches local_bh_disable/enable, not the code in
> ip_tables.c.  Does the local_bh_disable/enable in the mainline code
> protect against multiple writers?
> 
> > cmpxchg() has strong guarantees (and is also slower than upstream code),
> > and seems a reasonable way to fix RT/iptables
> 
> I see this now.  And I agree that your patch would prevent corruption
> of the sequence counter.

Thanks for the reminder. I'll have a look.

   tglx

RE: [PATCH v4] pwm: vt8500: Update vt8500 PWM driver support

2012-10-26 Thread Chris Brand

Hi Tony,

> @@ -124,6 +156,12 @@ static int __devinit pwm_probe(struct
> platform_device *pdev)
>   chip->chip.base = -1;
>   chip->chip.npwm = VT8500_NR_PWMS;
> 
> + chip->clk = devm_clk_get(&pdev->dev, NULL);
> + if (IS_ERR_OR_NULL(chip->clk)) {
> + dev_err(&pdev->dev, "clock source not specified\n");
> + return PTR_ERR(chip->clk);
> + }
> +

Do you want IS_ERR() rather than IS_ERR_OR_NULL() here ?
Otherwise you may return PTR_ERR(NULL).

Chris



Re: [GIT PULL] slave-dmaengine fixes

2012-10-26 Thread Linus Torvalds

On Fri, Oct 26, 2012 at 5:29 AM, Vinod Koul  wrote:
>
>   git://git.infradead.org/users/vkoul/slave-dma.git fixes

git.infradead.org is sick, and is apparently not accepting connections...

Can you check whether git-daemon is actually running there? The
machine is pinging ok.

  Linus

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix

On 26 Oct 2012, Martin said:

> On 10/26/2012 10:24 PM, Nix wrote:
>> On 26 Oct 2012, Martin spake thusly:
>>> Computer is booted again in order to copy a few files to memory stick. 
>>> Unbeknownst to me, the following entries are logged in the
>>> system log:
>>>
>>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5): 
>>> add_dirent_to_buf:1587: inode #655361: block 2629945: comm mount: bad
>>> entry in directory: rec_len % 4 != 0 - offset=360(360), inode=655682, 
>>> rec_len=18, name_len=5
>>> Oct 15 20:00:16 harold kernel: Aborting journal on device sda5-8.
>>> Oct 15 20:00:16 harold kernel: EXT4-fs (sda5): Remounting filesystem 
>>> read-only
>>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in 
>>> ext4_evict_inode:238: Journal has aborted
>>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in 
>>> ext4_create:2120: IO failure
>>
>> That's an interesting failure, but looks slightly different to what I
>> saw. No bad directory entries, no aborted journals: a replayed journal
>> and subsequent corruption. Still damaged though, and after a journal
>> abort I'm not surprised you had problems!
>
> So my corrupt journal is simply the result of a user turning off the machine 
> at a bad point in time? That's scary. In that scenario
> even the option data=journal wouldn't save me from harm, would it?

No, I think that's probably a bug -- but I don't know if it's the same
bug: the symptoms are slightly different.

(Note that some hard drives in the distant past had been known to write
rubbish if powered down during a write. I don't think this has been true
for a good decade or so, though.)

>> It's hard to reason about a kernel that's had *that* massive lump of
>> binary junk applied to it, alas. This may or may not be the same
>> problem: it has some common features with what I see, but not all.
>
> true, i normally re-create problems with vanilla kernels before
> reporting them. In this case I was cleanly sniped with no chance of
> re-play so far.

True. I'm stuck with a problem that I can only currently reproduce on
physical hardware myself :( In addition to seeing if Ted's proposed
patch reduces the frequency of corruption, I'll be doing some tests this
weekend with LVM block device suspension and subsequent reboots to see
if that causes similar symptoms even in virtualization.

-- 
NULL && (void)

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Martin


On 10/26/2012 10:24 PM, Nix wrote:
> On 26 Oct 2012, Martin spake thusly:
>
> [...]
>
>> I have studied my corruption problem more closely and can give you a
>> description of what happened below. Would you say this may be the same
>> bug?
>
> No. You want to keep up with the thread. Ted's first educated guess is
> not always guaranteed to be correct (though this is rare).

OK

>> Oct 15 19:56:12
>>
>> Computer is booted again in order to copy a few files to memory stick.
>> Unbeknownst to me, the following entries are logged in the
>> system log:
>>
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
>> add_dirent_to_buf:1587: inode #655361: block 2629945: comm mount: bad
>> entry in directory: rec_len % 4 != 0 - offset=360(360), inode=655682,
>> rec_len=18, name_len=5
>> Oct 15 20:00:16 harold kernel: Aborting journal on device sda5-8.
>> Oct 15 20:00:16 harold kernel: EXT4-fs (sda5): Remounting filesystem read-only
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in
>> ext4_evict_inode:238: Journal has aborted
>> Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in ext4_create:2120:
>> IO failure
>
> That's an interesting failure, but looks slightly different to what I
> saw. No bad directory entries, no aborted journals: a replayed journal
> and subsequent corruption. Still damaged though, and after a journal
> abort I'm not surprised you had problems!

So my corrupt journal is simply the result of a user turning off the
machine at a bad point in time? That's scary. In that scenario even the
option data=journal wouldn't save me from harm, would it?

Funny this happens to someone who has always said that robustness is the
most important quality of a filesystem (and who thinks data=writeback is
madness).

>> I will try to rename them to their
>> proper name on another machine, and restore them on the target
>> machine. However, due to the sheer number this might take forever.
>
> I relearned this week that backups are good.

Backups are good, and always too old.

>> Also I am worried the problem might re-surface, as it has neither been
>> identified nor fixed.
>
> I'm seeing it on almost every reboot.

Indeed the symptoms look different.

>> NB: kernel was v3.5.5
>
> Hm, this provides possible evidence that the problem does indeed extend
> into 3.5.x.
>
>> with CK1 and BFQ patches, tainted by nvidia module.
>
> It's hard to reason about a kernel that's had *that* massive lump of
> binary junk applied to it, alas. This may or may not be the same
> problem: it has some common features with what I see, but not all.

true, i normally re-create problems with vanilla kernels before
reporting them. In this case I was cleanly sniped with no chance of
re-play so far.



[PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Make perf report -j aware of the new intx,notx,abort branch qualifiers.

v2: ABORT -> ABORTTX
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |3 +++
 tools/perf/builtin-record.c  |3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index b38a1f9..4b9f477 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -172,6 +172,9 @@ following filters are defined:
 - u:  only when the branch target is at the user level
 - k: only when the branch target is in the kernel
 - hv: only when the target is at the hypervisor level
+   - intx: only when the target is in a hardware transaction
+   - notx: only when the target is not in a hardware transaction
+   - aborttx: only when the target is a hardware transaction abort
 
 +
 The option requires at least one branch type among any, any_call, any_ret, 
ind_call.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e9231659..88ecbbd 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -725,6 +725,9 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+   BRANCH_OPT("aborttx", PERF_SAMPLE_BRANCH_ABORTTX),
+   BRANCH_OPT("intx", PERF_SAMPLE_BRANCH_INTX),
+   BRANCH_OPT("notx", PERF_SAMPLE_BRANCH_NOTX),
BRANCH_END
 };
 
-- 
1.7.7.6


[PATCH 07/33] perf, x86: Support Haswell v4 LBR format

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Haswell has two additional LBR from flags for TSX: intx and abort, implemented
as a new v4 version of the LBR format.

Handle those flags and adjust the sign extension code so that it still extends
the address correctly. The flags are exported in the LBR record in the same way
as the existing misprediction flag.
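
As an illustration of the sign-extension change described above, here is a
minimal user-space sketch, not the kernel code itself. The three flag
positions are taken from the patch; the struct and function names are
hypothetical. It assumes a compiler where right-shifting a negative signed
value is an arithmetic shift (true for GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Flag positions as defined in the patch. */
#define LBR_FROM_FLAG_MISPRED (1ULL << 63)
#define LBR_FROM_FLAG_INTX    (1ULL << 62)
#define LBR_FROM_FLAG_ABORT   (1ULL << 61)

/* Hypothetical container for a decoded LBR "from" value. */
struct lbr_from {
	uint64_t addr;
	unsigned mispred, intx, abort;
};

/*
 * Drop the top nflags bits and replicate the highest remaining bit into
 * them. Assumes >> on a negative int64_t is an arithmetic shift.
 */
uint64_t sign_extend_drop_top(uint64_t v, unsigned nflags)
{
	assert(nflags < 64);
	return (uint64_t)((int64_t)(v << nflags) >> nflags);
}

/* Decode a v4-style "from" value: three flag bits, then the address. */
struct lbr_from decode_from_v4(uint64_t raw)
{
	struct lbr_from d;

	d.mispred = !!(raw & LBR_FROM_FLAG_MISPRED);
	d.intx    = !!(raw & LBR_FROM_FLAG_INTX);
	d.abort   = !!(raw & LBR_FROM_FLAG_ABORT);
	d.addr    = sign_extend_drop_top(raw, 3);
	return d;
}
```

With the older one-flag format the same helper would be called with
nflags = 1, which is why the shift count has to depend on the LBR format.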

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   18 +++---
 include/linux/perf_event.h |7 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
LBR_FORMAT_LIP  = 0x01,
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
+   LBR_FORMAT_EIP_FLAGS2   = 0x04,
 };
 
 /*
@@ -56,6 +57,8 @@ enum {
 LBR_FAR)
 
 #define LBR_FROM_FLAG_MISPRED  (1ULL << 63)
+#define LBR_FROM_FLAG_INTX (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT(1ULL << 61)
 
 #define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events 
*cpuc)
 
for (i = 0; i < x86_pmu.lbr_nr; i++) {
unsigned long lbr_idx = (tos - i) & mask;
-   u64 from, to, mis = 0, pred = 0;
+   u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;
 
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to   + lbr_idx, to);
 
-   if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+   lbr_format == LBR_FORMAT_EIP_FLAGS2) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
-   from = (u64)((((s64)from) << 1) >> 1);
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+   from = (u64)((((s64)from) << 1) >> 1);
+   else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+   intx = !!(from & LBR_FROM_FLAG_INTX);
+   abort = !!(from & LBR_FROM_FLAG_ABORT);
+   from = (u64)((((s64)from) << 3) >> 3);
+   }
}
 
cpuc->lbr_entries[i].from   = from;
cpuc->lbr_entries[i].to = to;
cpuc->lbr_entries[i].mispred= mis;
cpuc->lbr_entries[i].predicted  = pred;
+   cpuc->lbr_entries[i].intx   = intx;
+   cpuc->lbr_entries[i].abort  = abort;
cpuc->lbr_entries[i].reserved   = 0;
}
cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e90235..0e528fc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -74,13 +74,18 @@ struct perf_raw_record {
  *
  * support for mispred, predicted is optional. In case it
  * is not supported mispred = predicted = 0.
+ *
+ * intx: running in a hardware transaction
+ * abort: aborting a hardware transaction
  */
 struct perf_branch_entry {
__u64   from;
__u64   to;
__u64   mispred:1,  /* target mispredicted */
predicted:1,/* target predicted */
-   reserved:62;
+   intx:1, /* in transaction */
+   abort:1,/* transaction abort */
+   reserved:60;
 };
 
 /*
-- 
1.7.7.6


[PATCH 17/33] perf, tools: Handle XBEGIN like a jump

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

So that the browser still shows the abort label

Signed-off-by: Andi Kleen 
---
 tools/perf/util/annotate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f0a9103..a34a1ae 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -400,6 +400,8 @@ static struct ins instructions[] = {
{ .name = "testb", .ops  = _ops, },
{ .name = "testl", .ops  = _ops, },
{ .name = "xadd",  .ops  = _ops, },
+   { .name = "xbeginl", .ops  = _ops, },
+   { .name = "xbeginq", .ops  = _ops, },
 };
 
 static int ins__cmp(const void *name, const void *insp)
-- 
1.7.7.6


[PATCH 20/33] perf, x86: Add Haswell specific transaction flag reporting

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 3094caa..4b657c2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -677,6 +677,15 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
data.weight = ((struct pebs_record_v2 *)pebs)->nhm.lat;
}
 
+   if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
+   x86_pmu.intel_cap.pebs_format >= 2) {
+   data.transaction =
+((struct pebs_record_v2 *)pebs)->tsx_tuning >> 32;
+   if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
+   (pebs->ax & 1))
+   data.transaction |= pebs->ax & 0xff00;
+   }
+
if (has_branch_stack(event))
data.br_stack = >lbr_stack;
 
-- 
1.7.7.6


[PATCH 22/33] perf, tools: Point --sort documentation to --help

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

The --sort documentation for top and report was hopelessly out-of-date.
Instead of having two more places that would need to be updated,
just point to --help.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |2 +-
 tools/perf/Documentation/perf-top.txt|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index f4d91be..7cd5d0a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -57,7 +57,7 @@ OPTIONS
 
 -s::
 --sort=::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): See --help for a full list.
 
 -p::
 --parent=::
diff --git a/tools/perf/Documentation/perf-top.txt 
b/tools/perf/Documentation/perf-top.txt
index 5b80d84..0f0fa3e 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -112,7 +112,7 @@ Default is to monitor all CPUS.
 
 -s::
 --sort::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): see --help for a full list.
 
 -n::
 --show-nr-samples::
-- 
1.7.7.6


[PATCH 11/33] perf, tools: Support sorting by intx, abort branch flags

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Extend the perf branch sorting code to support sorting by intx
or abort qualifiers. Also print out those qualifiers.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-report.c |3 +-
 tools/perf/builtin-top.c|4 ++-
 tools/perf/perf.h   |4 ++-
 tools/perf/util/hist.h  |2 +
 tools/perf/util/sort.c  |   55 +++
 tools/perf/util/sort.h  |2 +
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a61725d..d46f887 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -595,7 +595,8 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __maybe_unused)
"Use the stdio interface"),
OPT_STRING('s', "sort", _order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-  " dso_from, symbol_to, symbol_from, mispredict"),
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN(0, "showcpuutilization", _conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", _pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ff6db80..3861118 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1221,7 +1221,9 @@ int cmd_top(int argc, const char **argv, const char 
*prefix __maybe_unused)
OPT_INCR('v', "verbose", ,
"be more verbose (show counter open errors, etc)"),
OPT_STRING('s', "sort", _order, "key[,key2...]",
-  "sort by key(s): pid, comm, dso, symbol, parent"),
+  "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN('n', "show-nr-samples", _conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_DEFAULT('G', "call-graph", , "output_type,min_percent, 
call_order",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c50985e..22a5502 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -194,7 +194,9 @@ struct ip_callchain {
 struct branch_flags {
u64 mispred:1;
u64 predicted:1;
-   u64 reserved:62;
+   u64 intx:1;
+   u64 abort:1;
+   u64 reserved:60;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 66cb31f..d918a1a 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -43,6 +43,8 @@ enum hist_column {
HISTC_PARENT,
HISTC_CPU,
HISTC_MISPREDICT,
+   HISTC_INTX,
+   HISTC_ABORT,
HISTC_SYMBOL_FROM,
HISTC_SYMBOL_TO,
HISTC_DSO_FROM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..a8d1f1a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -476,6 +476,55 @@ struct sort_entry sort_mispredict = {
.se_width_idx   = HISTC_MISPREDICT,
 };
 
+static int64_t
+sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.abort !=
+   right->branch_info->flags.abort;
+}
+
+static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   static const char *out = ".";
+
+   if (self->branch_info->flags.abort)
+   out = "A";
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_abort = {
+   .se_header  = "Transaction abort",
+   .se_cmp = sort__abort_cmp,
+   .se_snprintf= hist_entry__abort_snprintf,
+   .se_width_idx   = HISTC_ABORT,
+};
+
+static int64_t
+sort__intx_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.intx !=
+   right->branch_info->flags.intx;
+}
+
+static int hist_entry__intx_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   static const char *out = ".";
+
+   if (self->branch_info->flags.intx)
+   out = "T";
+
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_intx = {
+   .se_header  = "Branch in transaction",
+   .se_cmp = sort__intx_cmp,
+   .se_snprintf= hist_entry__intx_snprintf,
+   .se_width_idx   = HISTC_INTX,
+};
+
 struct sort_dimension {
const char  *name;
struct sort_entry   *entry;
@@ -497,6 +546,8 @@ static struct sort_dimension sort_dimensions[] = {
DIM(SORT_CPU, "cpu", sort_cpu),
DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
DIM(SORT_SRCLINE,

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix

On 26 Oct 2012, Eric Sandeen outgrape:

> On 10/23/12 3:57 PM, Nix wrote:
>> The only unusual thing about the filesystems on this machine are that
>> they have hardware RAID-5 (using the Areca driver), so I'm mounting with
>> 'nobarrier': the full set of options for all my ext4 filesystems are:
>> 
>> rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
>> usrquota,grpquota,commit=30,stripe=16,data=ordered,usrquota,grpquota
>
> Out of curiosity, when I test log replay with the journal_checksum option, I
> almost always get something like:
>
> [  999.917805] JBD2: journal transaction 84121 on dm-1-8 is corrupt.
> [  999.923904] EXT4-fs (dm-1): error loading journal
>
> after a simulated crash & log replay.
>
> Do you see anything like that in your logs?

I'm not seeing any corrupt journals or abort messages at all. The
journal claims to be fine, but plainly isn't.

I can reproduce this on a small filesystem and stick the image somewhere
if that would be of any use to anyone. (If I'm very lucky, merely making
this offer will make the problem go away. :} )

-- 
NULL && (void)

[PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest.
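
The filtering this patch applies to the guest's event select value can be
sketched roughly as follows. This is a simplified stand-alone model, not the
KVM code: the two boolean parameters stand in for the host and guest CPUID
checks, the function names are hypothetical, and the bit positions 32 and 33
are an assumption about where the qualifiers sit in the event select register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of the two TSX qualifiers. */
#define HSW_INTX              (1ULL << 32)
#define HSW_INTX_CHECKPOINTED (1ULL << 33)

/*
 * Silently drop the TSX qualifier bits unless both the host and the
 * guest CPUID advertise HLE or RTM. No fault is raised in either case.
 */
uint64_t filter_tsx_bits(uint64_t eventsel, bool host_has_tsx,
			 bool guest_has_tsx)
{
	if (!host_has_tsx || !guest_has_tsx)
		eventsel &= ~(HSW_INTX | HSW_INTX_CHECKPOINTED);
	return eventsel;
}

/*
 * An event that still carries a TSX qualifier is kept as a raw event
 * instead of being mapped to a generic hardware event. The patch makes
 * the same decision for the edge, inv and cmask bits, omitted here.
 */
bool needs_raw_event(uint64_t eventsel)
{
	return (eventsel & (HSW_INTX | HSW_INTX_CHECKPOINTED)) != 0;
}
```

Clearing the bits rather than rejecting the write matches the v3 note below:
a guest without the CPUID bits gets the plain event, not a #GP.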

Cc: a...@redhat.com
Cc: g...@redhat.com
v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
Cc: g...@redhat.com
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/pmu.c  |   34 ++
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..6783289 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_pmu {
u64 global_ovf_ctrl;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
+   u64 cpuid_word9;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cfc258a..8bc954a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
unsigned config, bool exclude_user, bool exclude_kernel,
-   bool intr)
+   bool intr, bool intx, bool intx_cp)
 {
struct perf_event *event;
struct perf_event_attr attr = {
@@ -173,6 +173,11 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 
type,
.exclude_kernel = exclude_kernel,
.config = config,
};
+   /* Will be ignored on CPUs that don't support this. */
+   if (intx)
+   attr.config |= HSW_INTX;
+   if (intx_cp)
+   attr.config |= HSW_INTX_CHECKPOINTED;
 
attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
 
@@ -206,7 +211,8 @@ static unsigned find_arch_event(struct kvm_pmu *pmu, u8 
event_select,
return arch_events[i].event_type;
 }
 
-static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+static void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, 
+u64 eventsel)
 {
unsigned config, type = PERF_TYPE_RAW;
u8 event_select, unit_mask;
@@ -224,9 +230,16 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 
eventsel)
event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 
+   if (!(boot_cpu_has(X86_FEATURE_HLE) ||
+ boot_cpu_has(X86_FEATURE_RTM)) ||
+   !(pmu->cpuid_word9 & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
+   eventsel &= ~(HSW_INTX|HSW_INTX_CHECKPOINTED);
+
if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
ARCH_PERFMON_EVENTSEL_INV |
-   ARCH_PERFMON_EVENTSEL_CMASK))) {
+   ARCH_PERFMON_EVENTSEL_CMASK |
+   HSW_INTX |
+   HSW_INTX_CHECKPOINTED))) {
config = find_arch_event(>vcpu->arch.pmu, event_select,
unit_mask);
if (config != PERF_COUNT_HW_MAX)
@@ -239,7 +252,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 
eventsel)
reprogram_counter(pmc, type, config,
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-   eventsel & ARCH_PERFMON_EVENTSEL_INT);
+   eventsel & ARCH_PERFMON_EVENTSEL_INT,
+   (eventsel & HSW_INTX),
+   (eventsel & HSW_INTX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +271,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 
en_pmi, int idx)
arch_events[fixed_pmc_events[idx]].event_type,
!(en & 0x2), /* exclude user */
!(en & 0x1), /* exclude kernel */
-   pmi);
+   pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
@@ -289,7 +304,7 @@ static void reprogram_idx(struct kvm_pmu *pmu, int idx)
return;
 
if (pmc_is_gp(pmc))
-   reprogram_gp_counter(pmc, pmc->eventsel);
+   reprogram_gp_counter(pmu, pmc, pmc->eventsel);
else {
int fidx = idx - INTEL_PMC_IDX_FIXED;
reprogram_fixed_counter(pmc,
@@ -400,8 +415,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 
data)
} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
if (data == pmc->eventsel)
return 0;
-   if (!(data &

[PATCH 18/33] perf, x86: Support for printing PMU state on spurious PMIs v3

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

I had some problems with spurious PMIs, so print the PMU state
on a spurious one. This will not interact well with other NMI users.
Disabled by default, has to be explicitly enabled through sysfs.

Optional, but useful for debugging.

v2: Move to /sys/devices/cpu
v3: Print in more cases
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |2 ++
 arch/x86/kernel/cpu/perf_event_intel.c |   11 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 81b5e65..4a35eef 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -38,6 +38,7 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+int   print_spurious_pmi __read_mostly;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
@@ -1636,9 +1637,11 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 }
 
 static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+static DEVICE_INT_ATTR(print_spurious_pmi, 0644, print_spurious_pmi);
 
 static struct attribute *x86_pmu_attrs[] = {
_attr_rdpmc.attr,
+   _attr_print_spurious_pmi.attr.attr,
NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 89394e1..7b43503 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -659,3 +659,5 @@ static inline struct intel_shared_regs 
*allocate_shared_regs(int cpu)
 }
 
 #endif /* CONFIG_CPU_SUP_INTEL */
+
+extern int print_spurious_pmi;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 3a7b962..bb1a539 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1146,11 +1146,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
-   if (!status) {
-   intel_pmu_enable_all(0);
-   return handled;
-   }
-
+   if (!status)
+   goto done;
loops = 0;
 again:
intel_pmu_ack_status(status);
@@ -1210,6 +1207,10 @@ again:
goto again;
 
 done:
+   if (!handled && print_spurious_pmi) {
+   pr_debug("Spurious PMI\n");
+   perf_event_print_debug();
+   }
intel_pmu_enable_all(0);
return handled;
 }
-- 
1.7.7.6


[PATCH 12/33] perf, x86: Support full width counting

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Recent Intel CPUs have a new alternative MSR range for perfctrs that allows
writing the full counter width. Enable this range if the hardware reports it
using a new capability bit. This slightly lowers the overhead of perf stat because
fewer interrupts are needed to accumulate the counter value. On Haswell it
also avoids some problems with TSX aborting when the end of the counter
range is reached.
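
To give a feel for the interrupt saving, here is a small stand-alone
calculation. It is only a sketch: the helper names are hypothetical, and the
31-bit legacy limit and 48-bit full width used in the example are assumptions
about typical hardware, not values stated in this patch.

```c
#include <stdint.h>

/* Largest period a counter can be programmed with, given its writable width. */
uint64_t max_period(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* Number of counter overflows (interrupts) needed to count `events` events. */
uint64_t overflows_needed(uint64_t events, uint64_t period)
{
	return events / period + (events % period != 0);
}
```

Under those assumptions, counting 10^12 events takes
overflows_needed(1000000000000, max_period(31)) = 466 overflows with the
legacy range, but a single one with max_period(48).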

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/msr-index.h   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |1 +
 arch/x86/kernel/cpu/perf_event_intel.c |6 ++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..2070f46 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -126,6 +126,9 @@
 #define MSR_KNC_EVNTSEL0   0x0028
 #define MSR_KNC_EVNTSEL1   0x0029
 
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0  0x04c1
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
complete list. */
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index e5da138..17cb08f 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -278,6 +278,7 @@ union perf_capabilities {
u64 pebs_arch_reg:1;
u64 pebs_format:4;
u64 smm_freeze:1;
+   u64 fw_write:1;
};
u64 capabilities;
 };
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 9502c19..9bff694 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2238,5 +2238,11 @@ __init int intel_pmu_init(void)
}
}
 
+   /* Support full width counters using alternative MSR range */
+   if (x86_pmu.intel_cap.fw_write) {
+   x86_pmu.max_period = x86_pmu.cntval_mask;
+   x86_pmu.perfctr = MSR_IA32_PMC0;
+   }
+
return 0;
 }
-- 
1.7.7.6


[PATCH 29/33] perf, tools: Add perf stat --transaction v2

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add support to perf stat to print the basic transactional execution statistics:
total cycles, cycles in transaction, and cycles in aborted transactions,
using the intx and intx_checkpoint qualifiers. Transaction starts and
elision starts are also counted, to compute the average transaction length.

This gives a reasonable overview of the success of the transactions.

Enable with a new --transaction / -T option.

This requires measuring these events in a group, since they depend on each
other.

This is implemented by using TM sysfs events exported by the kernel.

v2: Only print the extended statistics when the option is enabled.
This avoids negative output when the user specifies the -T events
in separate groups.
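
The derived numbers can be sketched as below. This is a guess at the
arithmetic, not the perf code: the struct and function names are
hypothetical, and it assumes that cycles-ct (the checkpointed count) only
retains cycles of transactions that committed, so that the difference to
cycles-t is the time spent in aborted transactions.

```c
#include <stdint.h>

/* Raw counts for the grouped events (hypothetical container). */
struct tsx_counts {
	uint64_t cycles;     /* total cycles */
	uint64_t cycles_t;   /* cycles inside transactions */
	uint64_t cycles_ct;  /* cycles inside transactions, checkpointed */
	uint64_t tx_start;   /* transaction starts */
	uint64_t el_start;   /* elision starts */
};

struct tsx_stats {
	double pct_in_tx;          /* % of cycles spent in transactions */
	double pct_aborted;        /* % of cycles in aborted transactions */
	double cycles_per_tx;      /* average transaction length */
	double cycles_per_elision; /* average elision length */
};

static double ratio(double num, uint64_t den)
{
	return den ? num / (double)den : 0.0;
}

struct tsx_stats tsx_derive(const struct tsx_counts *c)
{
	struct tsx_stats s;
	/* Clamp so that inconsistent counts never yield a negative value. */
	uint64_t aborted = c->cycles_t > c->cycles_ct ?
			   c->cycles_t - c->cycles_ct : 0;

	s.pct_in_tx          = ratio(100.0 * (double)c->cycles_t, c->cycles);
	s.pct_aborted        = ratio(100.0 * (double)aborted, c->cycles);
	s.cycles_per_tx      = ratio((double)c->cycles_t, c->tx_start);
	s.cycles_per_elision = ratio((double)c->cycles_t, c->el_start);
	return s;
}
```

The subtraction is the reason the events need to be measured in one group:
counts taken over different time windows would not be comparable.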
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |3 +
 tools/perf/builtin-stat.c  |  101 +++-
 tools/perf/util/evsel.h|6 ++
 3 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 2fa173b..653bdbd 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -108,7 +108,10 @@ with it.  --append may be used here.  Examples:
  3>results  perf stat --log-fd 3  -- $cmd
  3>>results perf stat --log-fd 3 --append -- $cmd
 
+-T::
+--transaction::
 
+Print statistics of transactional execution if supported.
 
 EXAMPLES
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 93b9011..a451490 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -64,6 +64,29 @@
 #define CNTR_NOT_SUPPORTED ""
 #define CNTR_NOT_COUNTED   ""
 
+static const char *transaction_attrs[] = {
+   "task-clock",
+   "{"
+   "instructions,"
+   "cycles,"
+   "cpu/cycles-t/,"
+   "cpu/cycles-ct/,"
+   "cpu/tx-start/,"
+   "cpu/el-start/"
+   "}"
+};
+
+/* must match the transaction_attrs above */
+enum {
+   T_TASK_CLOCK,
+   T_INSTRUCTIONS,
+   T_CYCLES,
+   T_CYCLES_INTX,
+   T_CYCLES_INTX_CP,
+   T_TRANSACTION_START,
+   T_ELISION_START
+};
+
 static struct perf_evlist  *evsel_list;
 
 static struct perf_target  target = {
@@ -77,6 +100,7 @@ static bool  no_aggr 
= false;
 static pid_t   child_pid   = -1;
 static boolnull_run=  false;
 static int detailed_run=  0;
+static booltransaction_run =  false;
 static boolbig_num =  true;
 static int big_num_opt =  -1;
 static const char  *csv_sep= NULL;
@@ -123,7 +147,11 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intx_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intxcp_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static int create_perf_stat_counter(struct perf_evsel *evsel,
struct perf_evsel *first)
@@ -183,6 +211,18 @@ static inline int nsec_counter(struct perf_evsel *evsel)
return 0;
 }
 
+static struct perf_evsel *nth_evsel(int n)
+{
+   struct perf_evsel *ev;
+   int j;
+
+   j = 0;
+   list_for_each_entry (ev, _list->entries, node)
+   if (j++ == n)
+   return ev;
+   return NULL;
+}
+
 /*
  * Update various tracking values we maintain to print
  * more semantic information such as miss/hit ratios,
@@ -194,8 +234,14 @@ static void update_shadow_stats(struct perf_evsel 
*counter, u64 *count)
update_stats(_nsecs_stats[0], count[0]);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_stats(_cycles_stats[0], count[0]);
-   else if (perf_evsel__match(counter, HARDWARE, 
HW_STALLED_CYCLES_FRONTEND))
-   update_stats(_stalled_cycles_front_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX)))
+   update_stats(_cycles_intx_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX_CP)))
+   update_stats(_cycles_intxcp_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_TRANSACTION_START)))
+   update_stats(_transaction_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_ELISION_START)))
+   update_stats(_elision_stats[0], count[0]);

[PATCH 27/33] tools, perf: Add a precise event qualifier

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add a precise qualifier, like cpu/event=0x3c,precise=1/

This is needed so that the kernel can request enabling PEBS
for TSX events. The parser bails out on any sysfs parse errors,
so this is needed in any case to handle any event on the TSX
perf kernel.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/parse-events.c |6 ++
 tools/perf/util/parse-events.h |1 +
 tools/perf/util/parse-events.l |1 +
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 409da3e..f800765 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -526,6 +526,12 @@ do {   
\
case PARSE_EVENTS__TERM_TYPE_NAME:
CHECK_TYPE_VAL(STR);
break;
+   case PARSE_EVENTS__TERM_TYPE_PRECISE:
+   CHECK_TYPE_VAL(NUM);
+   if ((unsigned)term->val.num > 2)
+   return -EINVAL;
+   attr->precise_ip = term->val.num;
+   break;
default:
return -EINVAL;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 839230c..0c78bb8 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -49,6 +49,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_NAME,
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
+   PARSE_EVENTS__TERM_TYPE_PRECISE,
 };
 
 struct parse_events__term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index ef602f0..c2e5142 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -168,6 +168,7 @@ period  { return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,  { return ','; }
 "/"{ BEGIN(INITIAL); return '/'; }
+precise{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_PRECISE); }
 }
 
 mem:   { BEGIN(mem); return PE_PREFIX_MEM; }
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
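
The range check added for the precise term in the patch above can be illustrated with a minimal user-space sketch. The struct and function names here (sketch_attr, sketch_set_precise) are hypothetical and not part of perf; only the accepted range 0..2 and the -EINVAL result are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the one perf_event_attr field the patch sets. */
struct sketch_attr {
    unsigned int precise_ip;
};

/*
 * Accept precise levels 0..2 and reject everything else, mirroring the
 * new PARSE_EVENTS__TERM_TYPE_PRECISE case. Unlike the patch, which casts
 * to unsigned before comparing, this compares the full 64-bit value.
 */
static int sketch_set_precise(struct sketch_attr *attr, uint64_t num)
{
    if (num > 2)
        return -EINVAL;
    attr->precise_ip = (unsigned int)num;
    return 0;
}
```

A rejected value leaves the previously stored level untouched, so a caller can report the error without resetting the attribute.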

[PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

The callers of parse_events usually have their own error handling.
Move the fprintf for a bad event to parse_events_option, which
is the only caller that should need it.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/parse-events.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 75c7b0f..409da3e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -827,8 +827,6 @@ int parse_events(struct perf_evlist *evlist, const char 
*str,
 * Both call perf_evlist__delete in case of error, so we dont
 * need to bother.
 */
-   fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
-   fprintf(stderr, "Run 'perf list' for a list of valid events\n");
return ret;
 }
 
@@ -836,7 +834,13 @@ int parse_events_option(const struct option *opt, const 
char *str,
int unset __maybe_unused)
 {
struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
-   return parse_events(evlist, str, unset);
+   int ret = parse_events(evlist, str, unset);
+
+   if (ret) {
+   fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
+   fprintf(stderr, "Run 'perf list' for a list of valid events\n");
+   }
+   return ret;
 }
 
 int parse_filter(const struct option *opt, const char *str,
-- 
1.7.7.6

[PATCH 19/33] perf, core: Add generic transaction flags

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add a generic qualifier for transaction events, as a new sample
type that returns a flag word. This is particularly useful
for qualifying aborts: to distinguish aborts which happen
due to asynchronous events (like conflicts caused by another
CPU) versus instructions that lead to an abort.

The tuning strategies are very different for those cases,
so it's important to distinguish them easily and early.

Since it's inconvenient and inflexible to filter for this
in the kernel we report all the events out and allow
some post processing in user space.

The flags are based on the Intel TSX events, but should be fairly
generic and mostly applicable to other architectures too. In addition
to various flag words there's also reserved space to report a
program-supplied abort code. For TSX this is used to distinguish specific
classes of aborts, like a lock busy abort when doing lock elision.

This adds the perf core glue needed for reporting the new flag word out.

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h  |2 ++
 include/uapi/linux/perf_event.h |   24 ++--
 kernel/events/core.c|6 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f4ded17..7e6a4b6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -589,6 +589,7 @@ struct perf_sample_data {
struct perf_regs_user   regs_user;
u64 stack_user_size;
u64 weight;
+   u64 transaction;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -603,6 +604,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->regs_user.regs = NULL;
data->stack_user_size = 0;
data->weight = 0;
+   data->transaction = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 809a5fd..7155205 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -133,9 +133,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_REGS_USER   = 1U << 12,
PERF_SAMPLE_STACK_USER  = 1U << 13,
PERF_SAMPLE_WEIGHT  = 1U << 14,
+   PERF_SAMPLE_TRANSACTION = 1U << 15,
 
-   PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
-
+   PERF_SAMPLE_MAX = 1U << 16, /* non-ABI */
 };
 
 /*
@@ -179,6 +179,26 @@ enum perf_sample_regs_abi {
 };
 
 /*
+ * Values for the transaction event qualifier, mostly for abort events.
+ */
+enum {
+   PERF_SAMPLE_TXN_ELISION = (1 << 0), /* From elision */
+   PERF_SAMPLE_TXN_TRANSACTION = (1 << 1), /* From transaction */
+   PERF_SAMPLE_TXN_SYNC= (1 << 2), /* Instruction is related */
+   PERF_SAMPLE_TXN_ASYNC   = (1 << 3), /* Instruction not related */
+   PERF_SAMPLE_TXN_RETRY   = (1 << 4), /* Retry possible */
+   PERF_SAMPLE_TXN_CONFLICT= (1 << 5), /* Conflict abort */
+   PERF_SAMPLE_TXN_CAPACITY= (1 << 6), /* Capacity abort */
+
+   PERF_SAMPLE_TXN_MAX = (1 << 7),  /* non-ABI */
+
+   /* bits 24..31 are reserved for the abort code */
+
+   PERF_SAMPLE_TXN_ABORT_MASK  = 0xff00,
+   PERF_SAMPLE_TXN_ABORT_SHIFT = 24,
+};
+
+/*
  * The format of the data returned by read() on a perf event fd,
  * as specified by attr.read_format:
  *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d633581..534810d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -955,6 +955,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_WEIGHT)
size += sizeof(data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   size += sizeof(data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -4086,6 +4089,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_WEIGHT)
perf_output_put(handle, data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   perf_output_put(handle, data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6
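
As a rough illustration of the user-space post-processing the commit message mentions, the sketch below decodes a transaction flag word. The SK_ names and both helper functions are hypothetical. The flag bit positions are copied from the enum in the patch, and the abort-code mask is an assumption derived from the patch comment that bits 24..31 hold the abort code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits copied from the enum added by the patch. */
enum {
    SK_TXN_ELISION     = 1 << 0,
    SK_TXN_TRANSACTION = 1 << 1,
    SK_TXN_SYNC        = 1 << 2,
    SK_TXN_ASYNC       = 1 << 3,
    SK_TXN_RETRY       = 1 << 4,
    SK_TXN_CONFLICT    = 1 << 5,
    SK_TXN_CAPACITY    = 1 << 6,
};

/* Assumption: the abort code occupies bits 24..31, per the patch comment. */
#define SK_TXN_ABORT_SHIFT 24
#define SK_TXN_ABORT_MASK  (0xffULL << SK_TXN_ABORT_SHIFT)

/* Extract the program-supplied abort code from the flag word. */
static unsigned int sk_txn_abort_code(uint64_t txn)
{
    return (unsigned int)((txn & SK_TXN_ABORT_MASK) >> SK_TXN_ABORT_SHIFT);
}

/* True when the sample is flagged as both asynchronous and a conflict. */
static bool sk_txn_is_async_conflict(uint64_t txn)
{
    const uint64_t want = SK_TXN_ASYNC | SK_TXN_CONFLICT;

    return (txn & want) == want;
}
```

A report tool could use a predicate like sk_txn_is_async_conflict to separate aborts caused by other CPUs from aborts tied to the sampled instruction, which is the distinction the commit message calls out.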

[PATCH 28/33] perf, x86: Add Haswell TSX event aliases

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add infrastructure to generate event aliases in /sys/devices/cpu/events/

And use this to set up user friendly aliases for the common TSX events.
TSX tuning relies heavily on the PMU, so it's important to be user friendly.

This replaces the generic transaction events in an earlier version
of this patchkit.

tx-start/commit/abort  to count RTM transactions
el-start/commit/abort  to count HLE ("elision") transactions
tx-conflict/overflow   to count conflict/overflow for both combined.

The general abort events exist in precise and non-precise variants.
Since the common case is sampling, plain "tx-aborts" is precise.

This is very important because abort sampling only really works
with PEBS enabled, otherwise it would report the IP after the abort,
not the abort point. But counting with PEBS has more overhead,
so also have tx/el-abort-count aliases that do not enable PEBS
for perf stat.

It would be nice to switch automatically between those two, like in the
previous version, but that would need more new infrastructure for sysfs
first.

There is a tx-abort<->tx-aborts alias too, because I found myself
using both variants.

Also added friendly aliases for cpu/cycles,intx=1/ and
cpu/cycles,intx=1,intx_cp=1/ and the same for instructions.
These will be used by perf stat -T, and are also useful for users directly.

So, for example, to get transactional cycles one can use "perf stat -e cycles-t".

Some of the sysfs macros/functions could probably move to generic code, but
I left it in the Intel code for now until there are more users.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   95 
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index c3beee1..e9706f0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2054,6 +2054,99 @@ static __init void intel_gen_arch_events(void)
x86_pmu.events_attrs = intel_arch_events;
 }
 
+struct sevent_attribute {
+   struct device_attribute attr;
+   const char  *val;
+};
+
+#define PMU_EVENT(_name, _id, _val)\
+   static struct sevent_attribute attr_ ## _name = \
+   { .attr =   \
+ { .attr = { .name = _id, .mode = 0444 },  \
+   .show = show_sevent },  \
+ .val = _val }
+
+static ssize_t show_sevent(struct device *dev,
+ struct device_attribute *attr,
+ char *page)
+{
+   struct sevent_attribute *e = container_of(attr, struct 
sevent_attribute, attr);
+
+   return sprintf(page, "%s", e->val);
+}
+
+/* Haswell special events */
+PMU_EVENT(tx_start,   "tx-start",   "event=0xc9,umask=0x1");
+PMU_EVENT(tx_commit,  "tx-commit",  "event=0xc9,umask=0x2");
+PMU_EVENT(tx_abort,   "tx-abort",   "event=0xc9,umask=0x4,precise=2");
+PMU_EVENT(tx_abort_count, "tx-abort-count", "event=0xc9,umask=0x4");
+/* alias */
+PMU_EVENT(tx_aborts,  "tx-aborts",  "event=0xc9,umask=0x4,precise=2");
+PMU_EVENT(tx_capacity,"tx-capacity","event=0x54,umask=0x2");
+PMU_EVENT(tx_conflict,"tx-conflict","event=0x54,umask=0x1");
+PMU_EVENT(el_start,   "el-start",   "event=0xc8,umask=0x1");
+PMU_EVENT(el_commit,  "el-commit",  "event=0xc8,umask=0x2");
+PMU_EVENT(el_abort,   "el-abort",   "event=0xc8,umask=0x4,precise=2");
+PMU_EVENT(el_abort_count, "el-abort-count", "event=0xc8,umask=0x4");
+/* alias */
+PMU_EVENT(el_aborts,  "el-aborts",  "event=0xc8,umask=0x4,precise=2");
+/* shared with tx-* */
+PMU_EVENT(el_capacity,"el-capacity","event=0x54,umask=0x2");
+/* shared with tx-* */
+PMU_EVENT(el_conflict,"el-conflict","event=0x54,umask=0x1");
+PMU_EVENT(cycles_t,   "cycles-t",   "event=0x3c,intx=1");
+PMU_EVENT(cycles_ct,  "cycles-ct",  "event=0x3c,intx=1,intx_cp=1");
+PMU_EVENT(insns_t,"instructions-t", "event=0xc0,intx=1");
+PMU_EVENT(insns_ct,   "instructions-ct","event=0xc0,intx=1,intx_cp=1");
+
+#define PMU_EVENT_PTR(x) _ ## x .attr.attr
+
+static struct attribute *hsw_events_attrs[] = {
+   PMU_EVENT_PTR(tx_start),
+   PMU_EVENT_PTR(tx_commit),
+   PMU_EVENT_PTR(tx_abort),
+   PMU_EVENT_PTR(tx_aborts),
+   PMU_EVENT_PTR(tx_abort_count),
+   PMU_EVENT_PTR(tx_capacity),
+   PMU_EVENT_PTR(tx_conflict),
+   PMU_EVENT_PTR(el_start),
+   PMU_EVENT_PTR(el_commit),
+   PMU_EVENT_PTR(el_abort),
+   PMU_EVENT_PTR(el_aborts),
+   PMU_EVENT_PTR(el_abort_count),
+   PMU_EVENT_PTR(el_capacity),
+   PMU_EVENT_PTR(el_conflict),
+   PMU_EVENT_PTR(cycles_t),
+   PMU_EVENT_PTR(cycles_ct),
+   PMU_EVENT_PTR(insns_t),
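
The alias table in the patch maps a friendly name to an event string that sysfs then exposes. A minimal user-space sketch of the same name-to-string mapping might look like the following. The struct, the table name and the lookup function are hypothetical, the table holds only a few of the aliases from the patch, and it does not model the sysfs attribute plumbing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space mirror of a name -> event string alias. */
struct sk_alias {
    const char *name;
    const char *val;
};

/* A subset of the Haswell aliases, with strings copied from the patch. */
static const struct sk_alias sk_hsw_aliases[] = {
    { "tx-start",       "event=0xc9,umask=0x1" },
    { "tx-commit",      "event=0xc9,umask=0x2" },
    { "tx-abort",       "event=0xc9,umask=0x4,precise=2" },
    { "tx-abort-count", "event=0xc9,umask=0x4" },
    { "cycles-t",       "event=0x3c,intx=1" },
};

/* Return the event string for an alias, or NULL if it is unknown. */
static const char *sk_alias_lookup(const char *name)
{
    size_t n = sizeof(sk_hsw_aliases) / sizeof(sk_hsw_aliases[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(sk_hsw_aliases[i].name, name) == 0)
            return sk_hsw_aliases[i].val;
    }
    return NULL;
}
```

Note how tx-abort and tx-abort-count differ only in the precise=2 term, which is the PEBS versus non-PEBS split described in the commit message.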
+

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Eric Sandeen

On 10/23/12 3:57 PM, Nix wrote:
> [Bruce, Trond, I fear it may be hard for me to continue chasing this NFS
>  lockd crash as long as ext4 on 3.6.3 is hosing my filesystems like
>  this. Apologies.]



> The only unusual thing about the filesystems on this machine are that
> they have hardware RAID-5 (using the Areca driver), so I'm mounting with
> 'nobarrier': the full set of options for all my ext4 filesystems are:
> 
> rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
> usrquota,grpquota,commit=30,stripe=16,data=ordered,usrquota,grpquota

Out of curiosity, when I test log replay with the journal_checksum option, I
almost always get something like:

[  999.917805] JBD2: journal transaction 84121 on dm-1-8 is corrupt.
[  999.923904] EXT4-fs (dm-1): error loading journal

after a simulated crash & log replay.

Do you see anything like that in your logs?



Thanks,
-Eric

[PATCH 21/33] perf, tools: Add support for record transaction flags

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add the glue in the user tools to record transaction flags with
--transaction (-T was already taken) and dump them.

Follow-on patches will use them.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |5 -
 tools/perf/builtin-record.c  |2 ++
 tools/perf/perf.h|1 +
 tools/perf/util/event.h  |1 +
 tools/perf/util/evsel.c  |9 +
 tools/perf/util/session.c|3 +++
 6 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index 0ffb436..34f4f1a 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -185,12 +185,15 @@ is enabled for all the sampling events. The sampled 
branch type is the same for
 The various filters must be specified as a comma separated list: 
--branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
--W::
 --weight::
 Enable weightened sampling. When the event supports an additional weight per 
sample scale
 the histogram by this weight. This currently works for TSX abort events and 
some memory events
 in precise mode on modern Intel CPUs.
 
+-T::
+--transaction::
+Record transaction flags for transaction related events.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9a25116..49de48e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1008,6 +1008,8 @@ const struct option record_options[] = {
 parse_branch_stack),
OPT_BOOLEAN('W', "weight", _weight,
"sample by weight (on special events only)"),
+   OPT_BOOLEAN(0, "transaction", _transaction,
+   "sample transaction flags (special events only)"),
OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 2365abf..395d216 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -246,6 +246,7 @@ struct perf_record_opts {
u64  default_interval;
u64  user_interval;
u16  stack_dump_size;
+   bool sample_transaction;
 };
 
 #endif
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index d60015b..28fd2eb 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -88,6 +88,7 @@ struct perf_sample {
u64 stream_id;
u64 period;
u64 weight;
+   u64 transaction;
u32 cpu;
u32 raw_size;
void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3800fb5..5c9790d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -448,6 +448,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct 
perf_record_opts *opts,
if (opts->sample_weight)
attr->sample_type   |= PERF_SAMPLE_WEIGHT;
 
+   if (opts->sample_transaction)
+   attr->sample_type   |= PERF_SAMPLE_TRANSACTION;
+
if (opts->call_graph) {
attr->sample_type   |= PERF_SAMPLE_CALLCHAIN;
 
@@ -951,6 +954,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, 
union perf_event *event,
array++;
}
 
+   data->transaction = 0;
+   if (type & PERF_SAMPLE_TRANSACTION) {
+   data->transaction = *array;
+   array++;
+   }
+
if (type & PERF_SAMPLE_READ) {
fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
return -1;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2009665..316dd91 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1009,6 +1009,9 @@ static void dump_sample(struct perf_evsel *evsel, union 
perf_event *event,
 
if (sample_type & PERF_SAMPLE_WEIGHT)
printf("... weight: %" PRIu64 "\n", sample->weight);
+
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   printf("... transaction: %" PRIx64 "\n", sample->transaction);
 }
 
 static struct machine *
-- 
1.7.7.6
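
The sample parsing in this patch relies on optional u64 fields appearing in a fixed order, each present only when its sample_type bit is set. The sketch below shows that pattern for the weight and transaction fields only. The SK_ names, the struct and the function are hypothetical; the bit positions 14 and 15 are the ones this series assigns to PERF_SAMPLE_WEIGHT and PERF_SAMPLE_TRANSACTION.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as assigned to the two sample types in this series. */
#define SK_SAMPLE_WEIGHT      (1U << 14)
#define SK_SAMPLE_TRANSACTION (1U << 15)

/* Hypothetical holder for the two optional fields. */
struct sk_sample {
    uint64_t weight;
    uint64_t transaction;
};

/*
 * Consume the optional fields in record order. A field that was not
 * requested is reported as 0 and does not advance the cursor.
 * Returns the position just past the last field consumed.
 */
static const uint64_t *sk_parse_tail(uint64_t type, const uint64_t *array,
                                     struct sk_sample *out)
{
    out->weight = 0;
    if (type & SK_SAMPLE_WEIGHT)
        out->weight = *array++;

    out->transaction = 0;
    if (type & SK_SAMPLE_TRANSACTION)
        out->transaction = *array++;

    return array;
}
```

Because the cursor only advances for requested fields, the same buffer word is read as the weight in one configuration and as the transaction word in another, so producer and consumer must agree on sample_type.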

[PATCH 13/33] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

With checkpointed counters there can be a situation where the counter
overflows, aborts the transaction, is set back to a non-overflowing
checkpoint, and then causes an interrupt. The interrupt doesn't see the overflow
because it has been checkpointed.  This is then a spurious PMI, typically with an
ugly NMI message.  It can also lead to excessive aborts.

Avoid this problem by:
- Using the full counter width for counting counters (previous patch)
- Forbid sampling for checkpointed counters. It's not too useful anyways,
checkpointing is mainly for counting.
- On a PMI always set back checkpointed counters to zero.

v2: Add unlikely. Add comment
v3: Allow large sampling periods with CP for KVM
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   33 
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 9bff694..bbd00cc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
x86_perf_event_update(event);
+   /*
+* For a checkpointed counter always reset back to 0.  This
+* avoids a situation where the counter overflows, aborts the
+* transaction and is then set back to shortly before the
+* overflow, and overflows and aborts again.
+*/
+   if (unlikely(event->hw.config & HSW_INTX_CHECKPOINTED)) {
+   /* No race with NMIs because the counter should not be armed */
+   wrmsrl(event->hw.event_base, 0);
+   local64_set(&event->hw.prev_count, 0);
+   }
return x86_perf_event_set_period(event);
 }
 
@@ -1162,6 +1173,15 @@ again:
x86_pmu.drain_pebs(regs);
}
 
+   /*
+* To avoid spurious interrupts with perf stat always reset checkpointed
+* counters.
+*
+* XXX move somewhere else.
+*/
+   if (cpuc->events[2] && (cpuc->events[2]->hw.config & 
HSW_INTX_CHECKPOINTED))
+   status |= (1ULL << 2);
+
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
 
@@ -1635,6 +1655,19 @@ static int hsw_hw_config(struct perf_event *event)
 ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
  event->attr.precise_ip > 0))
return -EIO;
+   if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+   /*
+* Sampling of checkpointed events can cause situations where
+* the CPU constantly aborts because of a overflow, which is
+* then checkpointed back and ignored. Forbid checkpointing
+* for sampling.
+*
+* But still allow a long sampling period, so that perf stat
+* from KVM works.
+*/
+   if (event->attr.sample_period < 0x7fff)
+   return -EIO;
+   }
return 0;
 }
 
-- 
1.7.7.6

[PATCH 15/33] perf, x86: Support weight samples for PEBS

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

When a weighted sample is requested, first try to report the TSX abort cost
on Haswell. If that is not available report the memory latency. This
allows profiling both by abort cost and by memory latencies.

Memory latencies require enabling a different PEBS mode (LL).
When both address and weight are requested, address wins.

The LL mode only works for memory related PEBS events, so add a
separate event constraint table for those.

I only did this for Haswell for now, but it could be added
for several other Intel CPUs too by just adding the right
table for them.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |4 ++
 arch/x86/kernel/cpu/perf_event_intel.c|4 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   47 +++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 17cb08f..89394e1 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -168,6 +168,7 @@ struct cpu_hw_events {
u64 perf_ctr_virt_mask;
 
void*kfree_on_online;
+   u8  *memory_latency_events;
 };
 
 #define __EVENT_CONSTRAINT(c, n, m, w, o) {\
@@ -388,6 +389,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void(*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
+   struct event_constraint *memory_lat_events;
 
/*
 * Intel LBR
@@ -594,6 +596,8 @@ extern struct event_constraint 
intel_ivb_pebs_event_constraints[];
 
 extern struct event_constraint intel_hsw_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_memory_latency_events[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index bbd00cc..3a7b962 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1644,6 +1644,9 @@ static int hsw_hw_config(struct perf_event *event)
 
if (ret)
return ret;
+   /* PEBS cannot capture both */
+   if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+   event->attr.sample_type &= ~PERF_SAMPLE_WEIGHT;
if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
return 0;
event->hw.config |= event->attr.config & 
(HSW_INTX|HSW_INTX_CHECKPOINTED);
@@ -2220,6 +2223,7 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.format_attrs = intel_hsw_formats_attr;
+   x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
pr_cont("Haswell events, ");
break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index aa0f5fa..3094caa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -456,6 +456,17 @@ struct event_constraint intel_hsw_pebs_event_constraints[] 
= {
EVENT_CONSTRAINT_END
 };
 
+/* Subset of PEBS events supporting memory latency. Not used for scheduling */
+
+struct event_constraint intel_hsw_memory_latency_events[] = {
+   INTEL_EVENT_CONSTRAINT(0xcd, 0), /* MEM_TRANS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd0, 0), /* MEM_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd1, 0), /* MEM_LOAD_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd2, 0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd3, 0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
@@ -473,6 +484,21 @@ struct event_constraint *intel_pebs_constraints(struct 
perf_event *event)
return 
 }
 
+static bool is_memory_lat_event(struct perf_event *event)
+{
+   struct event_constraint *c;
+
+   if (x86_pmu.intel_cap.pebs_format < 1)
+   return false;
+   if (!x86_pmu.memory_lat_events)
+   return false;
+   for_each_event_constraint(c, x86_pmu.memory_lat_events) {
+   if ((event->hw.config & c->cmask) == c->code)
+   return true;
+   }
+   return false;
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -480,7 +506,12 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
-   cpuc->pebs_enabled |= 1ULL << hwc->idx;
+   /* When weight is requested enable LL instead of normal PEBS */
+   if

[PATCH 14/33] perf, core: Add a concept of a weightened sample

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

For some events it's useful to weight a sample with a hardware-provided
number. This expresses how expensive the action the
sample represents was.  This allows the profiler to scale
the samples to be more informative to the programmer.

There is already the period which is used similarly, but it means
something different, so I chose to not overload it. Instead
a new sample type for WEIGHT is added.

This can be used for multiple things. Initially it is used for TSX abort costs
and profiling by memory latencies (to make expensive loads appear higher
up in the histograms). The concept is quite generic and can be extended
to many other kinds of events or architectures, as long as the hardware
provides suitable auxiliary values. In principle it could also be
used for software tracepoints.

This adds the generic glue. A new optional sample format for a 64bit
weight value.

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h  |2 ++
 include/uapi/linux/perf_event.h |8 ++--
 kernel/events/core.c|6 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0e528fc..f4ded17 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -588,6 +588,7 @@ struct perf_sample_data {
struct perf_branch_stack*br_stack;
struct perf_regs_user   regs_user;
u64 stack_user_size;
+   u64 weight;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -601,6 +602,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
data->stack_user_size = 0;
+   data->weight = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 8e38823..809a5fd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
PERF_SAMPLE_BRANCH_STACK= 1U << 11,
PERF_SAMPLE_REGS_USER   = 1U << 12,
PERF_SAMPLE_STACK_USER  = 1U << 13,
+   PERF_SAMPLE_WEIGHT  = 1U << 14,
+
+   PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
 
-   PERF_SAMPLE_MAX = 1U << 14, /* non-ABI */
 };
 
 /*
@@ -201,8 +203,9 @@ enum perf_event_read_format {
PERF_FORMAT_TOTAL_TIME_RUNNING  = 1U << 1,
PERF_FORMAT_ID  = 1U << 2,
PERF_FORMAT_GROUP   = 1U << 3,
+   PERF_FORMAT_WEIGHT  = 1U << 4,
 
-   PERF_FORMAT_MAX = 1U << 4,  /* non-ABI */
+   PERF_FORMAT_MAX = 1U << 5,  /* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER064  /* sizeof first published struct */
@@ -562,6 +565,7 @@ enum perf_event_type {
 *  { u64   stream_id;} && PERF_SAMPLE_STREAM_ID
 *  { u32   cpu, res; } && PERF_SAMPLE_CPU
 *  { u64   period;   } && PERF_SAMPLE_PERIOD
+*  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *
 *  { struct read_formatvalues;   } && PERF_SAMPLE_READ
 *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..d633581 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_PERIOD)
size += sizeof(data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   size += sizeof(data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PERIOD)
perf_output_put(handle, data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   perf_output_put(handle, data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6
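
The header-size change in this patch follows a simple rule: every optional field adds its size to the record only when the matching sample_type bit is set. A minimal sketch of that accounting, limited to the period and weight fields, is below. The SK_ names and the function are hypothetical; the period bit (1U << 8) matches the existing uapi header and the weight bit (1U << 14) is the one added here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_SAMPLE_PERIOD (1U << 8)   /* existing uapi bit */
#define SK_SAMPLE_WEIGHT (1U << 14)  /* bit added by this patch */

/*
 * Sum the sizes of the optional u64 fields selected by sample_type.
 * Only two fields are modelled here; the kernel function covers more.
 */
static size_t sk_header_size(uint64_t sample_type)
{
    size_t size = 0;

    if (sample_type & SK_SAMPLE_PERIOD)
        size += sizeof(uint64_t);

    if (sample_type & SK_SAMPLE_WEIGHT)
        size += sizeof(uint64_t);

    return size;
}
```

The output side has to emit the fields in the same order as they are counted here, which is why the patch adds the weight right after the period in both perf_event__header_size and perf_output_sample.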

[PATCH 03/33] perf, x86: Basic Haswell PEBS support v3

2012-10-26 Thread Andi Kleen

From: Andi Kleen 

Add basic PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new events.

v2: Readd missing pebs_aliases
v3: Readd missing hunk. Fix some constraints.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |2 ++
 arch/x86/kernel/cpu/perf_event_intel.c|6 --
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   29 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a9cac71..e5da138 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -591,6 +591,8 @@ extern struct event_constraint 
intel_snb_pebs_event_constraints[];
 
 extern struct event_constraint intel_ivb_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_pebs_event_constraints[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index b903eb0..1770fb0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -826,7 +826,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct 
perf_event *event)
return true;
 
/* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1)
+   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
+   x86_pmu.intel_cap.pebs_format < 2)
return true;
 
return false;
@@ -2118,8 +2119,9 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_nhm();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
-
+   x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snb_extra_regs;
+   x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 9d0dae0..16d7c58 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -427,6 +427,35 @@ struct event_constraint intel_ivb_pebs_event_constraints[] 
= {
 EVENT_CONSTRAINT_END
 };
 
+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+   INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+   INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+   INTEL_EVENT_CONSTRAINT(0xc4, 0xf),/* BR_INST_RETIRED.* */
+   INTEL_UEVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
+   INTEL_UEVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
+   INTEL_UEVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.NEAR_TAKEN */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8),/* MEM_TRANS_RETIRED.* */
+   INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES 
*/
+   INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB 
*/
+   INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+   INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d3, 0xf), /* 
MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
+   INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.Abort */
+   INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
+
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
-- 
1.7.7.6
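
To show how a constraint table like the one in this patch might be searched, here is a small user-space sketch. It assumes the usual x86 event select layout (event code in bits 0..7, umask in bits 8..15) and uses the comparison (config & cmask) == code, the same comparison is_memory_lat_event uses elsewhere in this series. All SK_ and sk_ names are hypothetical, and the table holds only four of the entries from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified constraint: match code, match mask, counter mask. */
struct sk_constraint {
    uint64_t code;
    uint64_t cmask;
    uint64_t idxmsk;
};

/* Match on the event code only (any umask). */
#define SK_EVENT(c, n)  { (c), 0xffULL, (n) }
/* Match on event code and umask together. */
#define SK_UEVENT(c, n) { (c), 0xffffULL, (n) }

static const struct sk_constraint sk_hsw_pebs[] = {
    SK_UEVENT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
    SK_EVENT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
    SK_EVENT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
    SK_UEVENT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
};

/* Return the first constraint matching config, or NULL if none does. */
static const struct sk_constraint *sk_find(uint64_t config)
{
    size_t n = sizeof(sk_hsw_pebs) / sizeof(sk_hsw_pebs[0]);

    for (size_t i = 0; i < n; i++) {
        if ((config & sk_hsw_pebs[i].cmask) == sk_hsw_pebs[i].code)
            return &sk_hsw_pebs[i];
    }
    return NULL;
}
```

An event-only entry such as 0xc4 matches every umask of that event, while a umask-qualified entry such as 0x01c0 matches exactly one encoding.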
