x86 force enabling perf events

2013-09-28 Thread Andi Kleen

Frederic,

I noticed that an allnoconfig x86 build builds perf events.

I think it comes from this commit of yours:

commit 99e8c5a3b875a34d894a711c9a3669858d6adf45
Author: Frederic Weisbecker 
Date:   Thu Dec 17 01:33:54 2009 +0100

hw-breakpoints: Fix hardware breakpoints -> perf events dependency

The kbuild's select command doesn't propagate through the config
dependencies.

Hence the current rules of hardware breakpoint's config can't
ensure perf can never be disabled under us.

Given that perf is quite big these days, and there are likely a lot of systems
which don't need it (everything which no one does development on),
that seems very drastic.

Is there a way to solve your old build problem without that
unconditional select?

   text    data     bss     dec     hex filename
  40154     879     216   41249    a121 kernel/events/built-in.o
    816      29       8     853     355 kernel/events/callchain.o
  36305     694     200   37199    914f kernel/events/core.o
   1840     136       8    1984     7c0 kernel/events/hw_breakpoint.o
   1170       0       0    1170     492 kernel/events/ring_buffer.o
   3122     985       0    4107    100b arch/x86/kernel/cpu/perf_event_amd.o
   1839     304      16    2159     86f arch/x86/kernel/cpu/perf_event_amd_uncore.o
   4641    4009       0    8650    21ca arch/x86/kernel/cpu/perf_event_intel_ds.o
  11189       1       0   11190    2bb6 arch/x86/kernel/cpu/perf_event_intel_lbr.o
  12590    4613       1   17204    4334 arch/x86/kernel/cpu/perf_event_intel.o
  20987   12523     164   33674    838a arch/x86/kernel/cpu/perf_event_intel_uncore.o
   1686     897       0    2583     a17 arch/x86/kernel/cpu/perf_event_knc.o
  12065    5117     108   17290    438a arch/x86/kernel/cpu/perf_event.o
   3436    1476       0    4912    1330 arch/x86/kernel/cpu/perf_event_p4.o
   1234     376       0    1610     64a arch/x86/kernel/cpu/perf_event_p6.o


-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Copy on write hard links?

2013-09-28 Thread Pádraig Brady
On 09/25/2013 03:37 PM, richard -rw- weinberger wrote:
> On Wed, Sep 25, 2013 at 4:28 PM, Thomas Meyer  wrote:
>> On Wednesday, 25.09.2013, 08:59 -0500, Rob Landley wrote:
>>> On 09/24/2013 01:36:56 PM, Thomas Meyer wrote:
>>>> Hi,
>>>>
>>>> Is there such a thing?
>>>
>>> In the kernel's vfs layer?
>>
>> Yes, that would be a nice feature!
> 
> You mean reflinks?
> Currently only OCFS2 and btrfs support them.
> Both using a fs specific ioctl().
> IIRC GNU cp uses the btrfs specific one if the --reflink parameter is used.

coreutils is waiting for a reflink syscall to materialize
rather than adding new per filesystem support
http://lwn.net/Articles/335380/
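
For readers unfamiliar with what a reflink copy amounts to, here is a minimal sketch, assuming a Linux system. It tries the filesystem clone ioctl and falls back to a plain byte copy when the filesystem does not support it. The helper name clone_or_copy() is hypothetical and this is not how coreutils is implemented; the request number is the btrfs clone ioctl, which later kernels also expose generically as FICLONE.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Same request number as BTRFS_IOC_CLONE; defined here in case the
 * headers in use do not provide FICLONE. */
#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)
#endif

/*
 * Hypothetical helper: make dst share src's data blocks if the
 * filesystem can do that, otherwise copy the bytes.
 * Returns 1 if cloned, 0 if copied, -1 on error.
 */
static int clone_or_copy(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n;
	int ret = -1;
	int in, out;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;

	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		close(in);
		return -1;
	}

	/* Copy-on-write clone: no data is duplicated until one side is modified. */
	if (ioctl(out, FICLONE, in) == 0) {
		ret = 1;
		goto done;
	}

	/* Clone not supported here: fall back to an ordinary copy. */
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (write(out, buf, (size_t)n) != n)
			goto done;
	}
	if (n == 0)
		ret = 0;
done:
	close(in);
	close(out);
	return ret;
}
```

A caller would treat a return of 1 or 0 as success; only the 1 case shares blocks between the two files.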

thanks,
Pádraig.


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread H. Peter Anvin
On 09/28/2013 01:10 PM, Borislav Petkov wrote:
> On Sat, Sep 28, 2013 at 09:58:04PM +0200, Ingo Molnar wrote:
>> It's also probably the first time that code entered on an ordinary
>> cell phone has gotten into the Linux kernel, so it's probably a new
>> Linux milestone, in a twisted, sick way. ;-)
> 
> We could put the following comment *below* the function in
> arch/x86/lib/misc.c:
> 
> /*
>  * Sent from my mobile phone.  Please pardon brevity and lack of formatting.
>  */
> 
> LoooL.
> 

*Snork* ;)

-hpa



Re: [PATCH 04/26] clk: prima2: declare OF clock provider

2013-09-28 Thread Barry Song
2013/9/28 Sebastian Hesselbarth :
> On 09/22/2013 12:37 PM, Barry Song wrote:
>>
>> 2013/9/22 Sebastian Hesselbarth :
>>>
>>> On 09/19/2013 10:48 AM, Sebastian Hesselbarth wrote:


>>>> On 09/19/13 10:45, Barry Song wrote:
>>>>>>
>>>>>>
>>>>>> @@ -1124,3 +1106,4 @@ void __init sirfsoc_of_clk_init(void)
>>>>>>
>>>>>>   of_clk_add_provider(np, of_clk_src_onecell_get, &clk_data);
>>>>>>  }
>>>>>> +CLK_OF_DECLARE(sirfsoc_clk, "sirf,prima2-clkc", sirfsoc_clk_init);
>>>>>
>>>>>
>>>>>
>>>>> hi Sebastian,
>>>>> what if the driver is compatible with multiple clk controllers and we
>>>>> want to add another compatible string here except "sirf,prima2-clkc"?
>>>>>
>>>>> -barry
>>>>>
>>>>
>>>> Just add
>>>> CLK_OF_DECLARE(other_fancy_clk, "sirf,different-compatible",
>>>> sirfsoc_clk_init);
>>>
>>>
>>>
>>> Barry, BTW, do I get your Acked-By for this and the corresponding
>>> removal patch?
>>
>>
>> Sebastian, would you wait for a while as i need to merge them and make a
>> test.
>
>
> Sure, take your time.
> I now have almost all Acked-by's for the other patches, would be
> great if you find some time to test this on prima2 anytime soon.

Acked-by: Barry Song 

I have another clock
patchset: http://permalink.gmane.org/gmane.linux.ports.arm.kernel/268241

I will rebase mine on top of this one of yours.

>
> Sebastian

-barry


Re: [PATCH 2/5] clocksource: Add Allwinner SoCs HS timers driver

2013-09-28 Thread Emilio López

Hi Maxime,

On 26/09/13 10:13, Maxime Ripard wrote:
> On Wed, Sep 25, 2013 at 08:23:14PM -0300, Emilio López wrote:
>> On 25/09/13 11:03, Maxime Ripard wrote:
>>> Most of the Allwinner SoCs (at this time, all but the A10) also have
>>> High Speed timers that are not using the 24MHz oscillator as a source
>>> but rather the AHB clock running much faster.
>>>
>>> The IP is slightly different between the A10s/A13 and the one used in
>>> the A20/A31, since the latter have 4 timers available, while the former
>>> have only 2 of them.
>>>
>>> Signed-off-by: Maxime Ripard 
>>> ---
>>
>> [...]
>>
>>> +static void __init sun5i_timer_init(struct device_node *node)
>>> +{
>>> +	unsigned long rate;
>>> +	struct clk *clk;
>>> +	int ret, irq;
>>> +	u32 val;
>>> +
>>> +	timer_base = of_iomap(node, 0);
>>> +	if (!timer_base)
>>> +		panic("Can't map registers");
>>> +
>>> +	irq = irq_of_parse_and_map(node, 0);
>>> +	if (irq <= 0)
>>> +		panic("Can't parse IRQ");
>>> +
>>> +	clk = of_clk_get(node, 0);
>>> +	if (IS_ERR(clk))
>>> +		panic("Can't get timer clock");
>>
>> I'm not familiar with clocksources, but does this have to be as
>> fatal as it is considering the kernel also supports the slower sun4i
>> timer?
>
> Hmmm, I don't know, one might choose to enable only this timer, in that
> case that would make sense to panic, since it would be the only timer in
> that case.

Fair enough.

>> Also, would any special considerations be needed when adjusting the
>> ahb clock? A future cpufreq driver will most likely need to.
>
> While this will be needed at some point, I don't really see how to
> handle that properly. The clock framework doesn't seem to have any
> callback when it comes to reconfiguring a clock that a device might
> use.

Maybe we should consider using one of the other timers; from a quick
look at the A20 user manual, it seems they can run at ~200MHz ("PLL6/6"
as input)

> This will also create trouble for IPs such as the I2C that have to
> setup internal dividers, and use clk_get_rate to do so.

Not really, because they use APB1 which is not scaled. The manual makes
it explicit when describing APB1:

"This clock is used for some special module apbclk(twi,uart,
ps2, can, scr). Because these modules need special clock rate
even if the apbclk changed."

Cheers,

Emilio


Re: [PATCH tip/core/rcu 08/11] rcu: Micro-optimize rcu_cpu_has_callbacks()

2013-09-28 Thread Chen Gang
On 09/27/2013 10:29 AM, Chen Gang wrote:
> On 09/27/2013 02:33 AM, Paul E. McKenney wrote:
>> On Thu, Sep 26, 2013 at 10:57:39AM +0800, Chen Gang wrote:
>>> On 09/26/2013 04:16 AM, Paul E. McKenney wrote:
>>>> On Wed, Sep 25, 2013 at 10:55:30AM +0800, Chen Gang wrote:
>>>>>
>>>>> Thank you for your whole work, firstly  :-).
>>>>>
>>>>> And your suggestion about testing (in our discussion) is also valuable
>>>>> to me.
>>>>>
>>>>> I need start LTP in q4. After referenced your suggestion, my first step
>>>>> for using/learning LTP is not mainly for finding kernel issues, but for
>>>>> testing kernel (to improve my kernel testing efficiency).
>>>>>
>>>>> When I want to find issues by reading code, I will consider about LTP
>>>>> too (I will try to find issues which can be tested by LTP).
>>>>
>>>> Doing more testing will be good!  You will probably need more tests
>>>> than just LTP, but you must of course start somewhere.
>>>
>>> Give more testing is good, but also mean more time resources cost. If
>>> spend the 'cost', also need get additional 'contributions' (not only
>>> prove an issue), or the 'efficiency' can not be 'acceptable'.
>>>
>>> When "I need more tests than just LTP", firstly I need perform this
>>> test, and then, also try to send "test case" to LTP (I guess, these
>>> kinds of mails are welcomed by LTP).
>>>
>>> And LTP is also a way to find kernel issues, although I will not mainly
>>> depend on it now (but maybe in future), it is better to familiar with it
>>> step by step.
>>>
>>> LTP (Linux Test Project) is one of main kernel mad user at downstream.
>>> Tool chain (GCC/Binutils) is one of kernel main mad tools at upstream.
>>> If we face to the whole kernel, suggest to use them. ;-)
>>
>> Yep, starting with just LTP is OK.  But by this time next year you
>> really should be using more than just LTP.
>>

What I have done is try to make full use of other members' contributions, not
to replace them.


And the reasons why I want to 'open' my 'ideas' to the public:

  To get more suggestions and additions from other members.

  To share my ideas, which can let other members contribute more (e.g. I
would be glad to find other members also trying 'allmodconfig' on all architectures).

  If some members replicate my work, I can save my current time resources and
devote them to other things (which are also based on other members' contributions).


In my opinion:

  "Open and Share" are both important and urgent for everyone, although this may
not be noticed directly. They are like the "Air and Water" with which God has
blessed everyone.


Thanks.

> 
> Hmm... LTP is "Linux Test Project", if I make some test cases which is
> useful for the issue which I find, I guess, these test cases are also
> welcomed by LTP.
> 
> Except testing, "I really should be using more than just LTP" (just
> like you said).
> 
> e.g.
> 
>   Tool Chain: just I am trying.
> 
> According to my current time resources, within this year, I can not 
> finish allmodconfig on all architectures. :-(
> I am just solving one gcc issue, it seems it is not quite difficult, but 
> at least now, I have no time on it. :-(
> 
>   Documents: just I am trying.
> 
> I am trying to discuss API definition comments, but it seems I am not 
> well done. :-(
> I am also trying some of trivial patches, neither seems what I have done 
> is well enough. :-(
> Communicating and discussing related issues with other members. Only 
> this, it seems not quite bad. :-)
> 
>   LTP:  I will try in q4 2013.
> 
> In fact, when I first came to our Public Kernel, I already used LTP (and 
> discussed an nfs issue found by an LTP test), which is still pending. :-(
> In my original plan (not declared publicly), I wanted to start LTP in q3 
> 2013, but failed (because of no time resources). :-(
> 
> 
>   Bugzilla: plan to try in next year.
> 
> I also want to solve some issues which comes from Bugzilla (especially 
> for some issues which no one wants to try).
> but according to my current action result and time resources, I can not 
> dare to declare it to outside in next year. :-(
> 
>   And I still have some company internal things to do (which may be urgent, 
> sometimes), it will consume my 20-40% time resources. :-(
> 
> 
> So, please let us understand each other: every member's time resources are
> expensive, and we have to take care of them. Also, I thank all members
> who spend their time resources on my mail and discuss with me.
> 
> 
> Thanks.
> 
>>  Thanx, Paul
>>
>>>>> On 09/25/2013 09:29 AM, Paul E. McKenney wrote:
>>>>>> From: "Paul E. McKenney" 
>>>>>>
>>>>>> The for_each_rcu_flavor() loop unconditionally scans all flavors, even
>>>>>> when the first flavor might have some non-lazy callbacks.  Once the
>>>>>> loop has seen a non-lazy callback, further passes through the loop
>>>>>> cannot change the state.  This is not a huge problem, given that there
>>>>>> can be at most three RCU 

Re: [tip:x86/boot] x86: Improve the printout of the SMP bootup CPU table

2013-09-28 Thread Yinghai Lu
On Sat, Sep 28, 2013 at 1:28 AM, tip-bot for Borislav Petkov
 wrote:
> Commit-ID:  646e29a1789a3a936871008c15199c50367bf291
> Gitweb: http://git.kernel.org/tip/646e29a1789a3a936871008c15199c50367bf291
> Author: Borislav Petkov 
> AuthorDate: Fri, 27 Sep 2013 16:35:54 +0200
> Committer:  Ingo Molnar 
> CommitDate: Sat, 28 Sep 2013 10:10:26 +0200
>
> x86: Improve the printout of the SMP bootup CPU table
>
> As the new x86 CPU bootup printout format code maintainer, I am
> taking immediate action to improve and clean (and thus indulge
> my OCD) the reporting of the cores when coming up online.
>
> Fix padding to a right-hand alignment, cleanup code and bind
> reporting width to the max number of supported CPUs on the
> system, like this:
>
>  [0.074509] smpboot: Booting Node   0, Processors:  #1  #2  #3  #4  #5  #6  #7 OK
>  [0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
>  [1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
>  [1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
>  [2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
>  [3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
>  [3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
>  [4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
>  [4.961413] Brought up 64 CPUs
>
> and this:
>
>  [0.072367] smpboot: Booting Node   0, Processors:#1 #2 #3 #4 #5 #6 #7 OK
>  [0.686329] Brought up 8 CPUs
>
> Signed-off-by: Borislav Petkov 
> Cc: Libin 
> Cc: wangyij...@huawei.com
> Cc: fenghua...@intel.com
> Cc: guohan...@huawei.com
> Cc: paul.gortma...@windriver.com
> Link: http://lkml.kernel.org/r/20130927143554.gf4...@pd.tnic
> Signed-off-by: Ingo Molnar 
> ---
>  arch/x86/include/asm/misc.h |  6 ++
>  arch/x86/kernel/smpboot.c   | 21 +++--
>  arch/x86/lib/Makefile   |  2 +-
>  arch/x86/lib/misc.c | 11 +++
>  4 files changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h
> new file mode 100644
> index 000..475f5bb
> --- /dev/null
> +++ b/arch/x86/include/asm/misc.h
> @@ -0,0 +1,6 @@
> +#ifndef _ASM_X86_MISC_H
> +#define _ASM_X86_MISC_H
> +
> +int num_digits(int val);
> +
> +#endif /* _ASM_X86_MISC_H */
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 6cacab6..d41f3ba 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -73,11 +73,10 @@
>  #include 
>  #include 
>  #include 
> -
>  #include 
>  #include 
> -
>  #include 
> +#include 
>
>  /* State of each CPU */
>  DEFINE_PER_CPU(int, cpu_state) = { 0 };
> @@ -653,17 +652,27 @@ static void announce_cpu(int cpu, int apicid)
>  {
>  	static int current_node = -1;
>  	int node = early_cpu_to_node(cpu);
> -	int max_cpu_present = find_last_bit(cpumask_bits(cpu_present_mask), NR_CPUS);
> +	static int width;
> +
> +	if (!width)
> +		width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */
>
>  	if (system_state == SYSTEM_BOOTING) {
>  		if (node != current_node) {
>  			if (current_node > (-1))
>  				pr_cont(" OK\n");
>  			current_node = node;
> -			pr_info("Booting Node %3d, Processors ", node);
> +			pr_info("Booting Node %3d, Processors:", node);
>  		}
> -		pr_cont(" #%4d%s", cpu, cpu == max_cpu_present ? " OK\n" : "");
> -		return;
> +
> +		/* Add padding for the BSP */
> +		if (cpu == 1)
> +			pr_cont("%*s", width + 1, " ");

What's the point of adding the pad?

How do you know that the BIOS MADT or the kernel's MADT parsing code would
put cpu1 on the same node as cpu0?
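
To make the width computation in the quoted hunk concrete, here is a minimal userspace sketch. It assumes non-negative CPU numbers and assumes each entry is printed as padding followed by "#N", which is one reading of the sample output in the commit message. Both num_digits() as written here and format_cpu_entry() are stand-ins, since the bodies of the real helpers are not shown in this thread.

```c
#include <stdio.h>

/* Stand-in for the num_digits() helper declared in the patch;
 * assumes val >= 0. */
static int num_digits(int val)
{
	int digits = 1;

	while (val >= 10) {
		val /= 10;
		digits++;
	}
	return digits;
}

/*
 * Hypothetical helper: format one CPU entry so that all entries
 * line up, given the number of possible CPUs.
 */
static int format_cpu_entry(char *buf, size_t len, int cpu, int possible_cpus)
{
	int width = num_digits(possible_cpus) + 1; /* + '#' sign */

	/* Pad on the left so that "#N" is right-aligned. */
	return snprintf(buf, len, "%*s#%d", width - num_digits(cpu), " ", cpu);
}
```

With 64 possible CPUs every entry is four characters wide, so CPU 1 and CPU 10 line up on the right; with 8 possible CPUs every entry is three characters wide.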

Yinghai


IRQ SMP affinity Problem

2013-09-28 Thread Yijing Wang
Hi Ingo,
   I found a problem with irq affinity on my machine. I don't know whether it is
a problem with my machine or whether Linux cannot support the irq affinity
setting on it. Do you have time to help point that out?
Thank you very much!

According to your Documentation/IRQ-affinity.txt,
I tried to test irq affinity on my machine.
Because apic flat is only selected when possible cpus <= 8, but my machine
has more than 8 CPUs (24 in fact),
I forced the code to select the apic flat model, like this:

void __init default_setup_apic_routing(void)
{
#ifdef CONFIG_X86_X2APIC
	if (x2apic_mode
#ifdef CONFIG_X86_UV
	    && apic != &apic_x2apic_uv_x
#endif
	    ) {
		if (x2apic_phys)
			apic = &apic_x2apic_phys;
		else
			apic = &apic_x2apic_cluster;
	}
#endif
	/* I commented out these two lines to force the kernel to use apic flat: */
	/*
	if (apic == &apic_flat && num_possible_cpus() > 8)
		apic = &apic_physflat;
	*/
	printk(KERN_INFO "Setting APIC routing to %s\n", apic->name);


dmesg:
.
Performance Events: Westmere events, Intel PMU driver.
... version:3
... bit width:  48
... generic registers:  4
... value mask: 
... max period: 7fff
... fixed-purpose events:   3
... event mask: 0007000f
ACPI: Core revision 20090903
Setting APIC routing to flat   <-- apic uses the flat model now
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz stepping 02
Booting Node   1, Processors  #1 #2 #3
Brought up 4 CPUs
Total of 4 processors activated (19200.97 BogoMIPS).


linux-drf:/lib/modules/2.6.32.12-qiuxishi-5-default/source # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz
stepping: 2
cpu MHz : 1600.000
cache size  : 12288 KB
physical id : 1
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 32
initial apicid  : 32
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority 
ept vpid
bogomips: 4800.57
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[.snip.]

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz
stepping: 2
cpu MHz : 1600.000
cache size  : 12288 KB
physical id : 1
siblings: 4
core id : 10
cpu cores   : 4
apicid  : 52
initial apicid  : 52
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority 
ept vpid
bogomips: 4800.15
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

linux-drf:/lib/modules/2.6.32.12-qiuxishi-5-default/source # cat 
/proc/interrupts
CPU0   CPU1   CPU2   CPU3
   0: 84  0  0  5   IO-APIC-edge  timer
   ..
  24:   2845  0  0  0   IO-APIC-fasteoi   eth2
  32:   8617   1201  0  0   IO-APIC-fasteoi   ioc0

linux-drf:/lib/modules/2.6.32.12-qiuxishi-5-default/source # cat 
/proc/irq/24/smp_affinity
ff

I tried to set the affinity of irq 24 (eth2) to all f, but interrupts from eth2
are always delivered to cpu0.
My machine's cpu is an Intel Xeon, so does Linux not support irq affinity on
Intel Xeon, or is something wrong in my test or irq setting?
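
As a reading aid for the smp_affinity value shown above, here is a minimal sketch that decodes such a hex mask into the list of CPUs it permits. It assumes a single hex word with no comma-separated groups, and mask_to_cpus() is a hypothetical helper, not a kernel or libc function. Note that the mask only says which CPUs are allowed; it does not by itself spread interrupts across them.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: decode an smp_affinity-style hex bitmap
 * (bit N set means CPU N is allowed) into an array of CPU numbers.
 * Returns the number of CPUs written to cpus[], at most max.
 */
static int mask_to_cpus(const char *hex, int *cpus, int max)
{
	unsigned long mask = strtoul(hex, NULL, 16);
	int n = 0;

	for (int cpu = 0; mask != 0 && n < max; cpu++, mask >>= 1) {
		if (mask & 1)
			cpus[n++] = cpu;
	}
	return n;
}
```

For the value "ff" this yields CPUs 0 through 7; on a system with only four CPUs online, that covers all of them.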


I found that the Intel Programming Guide 3A, section 10.6.2, says:
"In operating systems that use the lowest priority delivery mode but do not
update the TPR, the TPR information saved in the chipset will potentially cause
the interrupt to be always delivered to the same processor from the logical set".


So on my Xeon machine, is this irq affinity behavior normal?




-- 

Re: [PATCH net-next] virtio-net: switch to use XPS to choose txq

2013-09-28 Thread Jason Wang
On 09/27/2013 10:35 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 27, 2013 at 01:57:24PM +0800, Jason Wang wrote:
>> We used to use a percpu structure vq_index to record the cpu to queue
>> mapping, this is suboptimal since it duplicates the work of XPS and
>> loses all other XPS functionality such as allowing users to configure
>> their own transmission steering strategy.
>>
>> So this patch switches to use XPS and suggests a default mapping when
>> the number of cpus is equal to the number of queues. With XPS support,
>> there's no need for keeping per-cpu vq_index and .ndo_select_queue(),
>> so they were removed also.
>>
>> Cc: Rusty Russell 
>> Cc: Michael S. Tsirkin 
>> Signed-off-by: Jason Wang 
> More lines deleted than added is good :)
> But how does the result perform?
> About the same?
>

Yes, the same.
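
To make the default mapping described in the commit message concrete, here is a minimal userspace sketch, assuming at most 64 CPUs and a 1:1 CPU-to-queue layout. default_xps_masks() is a hypothetical helper, not a kernel function; each mask is the single-CPU bitmap that the patch builds with cpumask_set_cpu() before passing it to netif_set_xps_queue().

```c
#include <stdio.h>

/*
 * Hypothetical helper: compute the per-queue CPU masks for the
 * default mapping, where queue q is served by CPU q only.
 * Returns 0 on success, or -1 when the CPU and queue counts differ
 * (or exceed what fits in one 64-bit word), in which case no
 * default mapping is suggested.
 */
static int default_xps_masks(int ncpus, int nqueues, unsigned long long *masks)
{
	if (ncpus != nqueues || ncpus < 1 || ncpus > 64)
		return -1;

	for (int q = 0; q < nqueues; q++)
		masks[q] = 1ULL << q; /* only CPU q transmits on queue q */

	return 0;
}
```

Printed in hex, these are the same kind of values an administrator can write to /sys/class/net/<dev>/queues/tx-<n>/xps_cpus to replace the default with a different steering policy.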
>> ---
>>  drivers/net/virtio_net.c |   55 +++--
>>  1 files changed, 9 insertions(+), 46 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index defec2b..4102c1b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -127,9 +127,6 @@ struct virtnet_info {
>>  /* Does the affinity hint is set for virtqueues? */
>>  bool affinity_hint_set;
>>  
>> -/* Per-cpu variable to show the mapping from CPU to virtqueue */
>> -int __percpu *vq_index;
>> -
>>  /* CPU hot plug notifier */
>>  struct notifier_block nb;
>>  };
>> @@ -1063,7 +1060,6 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
>>  static void virtnet_clean_affinity(struct virtnet_info *vi, long hcpu)
>>  {
>>  int i;
>> -int cpu;
>>  
>>  if (vi->affinity_hint_set) {
>>  for (i = 0; i < vi->max_queue_pairs; i++) {
>> @@ -1073,20 +1069,11 @@ static void virtnet_clean_affinity(struct virtnet_info *vi, long hcpu)
>>  
>>  vi->affinity_hint_set = false;
>>  }
>> -
>> -i = 0;
>> -for_each_online_cpu(cpu) {
>> -if (cpu == hcpu) {
>> -*per_cpu_ptr(vi->vq_index, cpu) = -1;
>> -} else {
>> -*per_cpu_ptr(vi->vq_index, cpu) =
>> -++i % vi->curr_queue_pairs;
>> -}
>> -}
>>  }
>>  
>>  static void virtnet_set_affinity(struct virtnet_info *vi)
>>  {
>> +cpumask_var_t cpumask;
>>  int i;
>>  int cpu;
>>  
>> @@ -1100,15 +1087,21 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
>>  return;
>>  }
>>  
>> +if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
>> +return;
>> +
>>  i = 0;
>>  for_each_online_cpu(cpu) {
>>  virtqueue_set_affinity(vi->rq[i].vq, cpu);
>>  virtqueue_set_affinity(vi->sq[i].vq, cpu);
>> -*per_cpu_ptr(vi->vq_index, cpu) = i;
>> +cpumask_clear(cpumask);
>> +cpumask_set_cpu(cpu, cpumask);
>> +netif_set_xps_queue(vi->dev, cpumask, i);
>>  i++;
>>  }
>>  
>>  vi->affinity_hint_set = true;
>> +free_cpumask_var(cpumask);
>>  }
>>  
>>  static int virtnet_cpu_callback(struct notifier_block *nfb,
>> @@ -1217,28 +1210,6 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>  return 0;
>>  }
>>  
>> -/* To avoid contending a lock hold by a vcpu who would exit to host, select the
>> - * txq based on the processor id.
>> - */
>> -static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>> -{
>> -int txq;
>> -struct virtnet_info *vi = netdev_priv(dev);
>> -
>> -if (skb_rx_queue_recorded(skb)) {
>> -txq = skb_get_rx_queue(skb);
>> -} else {
>> -txq = *__this_cpu_ptr(vi->vq_index);
>> -if (txq == -1)
>> -txq = 0;
>> -}
>> -
>> -while (unlikely(txq >= dev->real_num_tx_queues))
>> -txq -= dev->real_num_tx_queues;
>> -
>> -return txq;
>> -}
>> -
>>  static const struct net_device_ops virtnet_netdev = {
>>  .ndo_open= virtnet_open,
>>  .ndo_stop= virtnet_close,
>> @@ -1250,7 +1221,6 @@ static const struct net_device_ops virtnet_netdev = {
>>  .ndo_get_stats64 = virtnet_stats,
>>  .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
>>  .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
>> -.ndo_select_queue = virtnet_select_queue,
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>  .ndo_poll_controller = virtnet_netpoll,
>>  #endif
>> @@ -1559,10 +1529,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>>  if (vi->stats == NULL)
>>  goto free;
>>  
>> -vi->vq_index = alloc_percpu(int);
>> -if (vi->vq_index == NULL)
>> -goto free_stats;
>> -
>>  mutex_init(&vi->config_lock);
>>  vi->config_enable = true;
>>  INIT_WORK(&vi->config_work, virtnet_config_changed_work);
>> @@ -1589,7 +1555,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>>  /* Allocate/initialize the 

[git pull] drm fixes

2013-09-28 Thread Dave Airlie
Hi Linus,

Nothing too major; radeon still has some dpm changes, for a feature that is
off by default.

radeon, intel, msm:
radeon: a few more dpm fixes (still off by default), uvd fixes
i915: runtime warn backtrace and regression fix
msm: iommu changes fallout

Dave.

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 665441791f553221901f679cb47236798f9a649c:

  Merge branch 'msm-fixes-3.12-rc2' of git://people.freedesktop.org/~robclark/linux into drm-fixes (2013-09-29 10:06:28 +1000)



Alex Deucher (13):
  drm/radeon: avoid UVD corruption on AGP cards using GPU gart
  drm/radeon: additional gcc fixes for radeon_atombios.c
  drm/radeon: fix missed variable sized access
  drm/radeon/dpm: fetch the max clk from voltage dep tables helper
  drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
  drm/radeon: don't set default clocks for SI when DPM is disabled
  drm/radeon: disable tests/benchmarks if accel is disabled
  drm/radeon: add missing hdmi callbacks for rv6xx
  drm/radeon/cik: fix overflow in vram fetch
  drm/radeon: fix hdmi audio on DCE3.0/3.1 asics

Alex Ivanov (1):
  drm/radeon: Make r100_cp_ring_info() and radeon_ring_gfx() safe (v2)

Chris Wilson (1):
  drm/i915: Use a temporary va_list for two-pass string handling

Christian König (1):
  drm/radeon/uvd: lower msg buffer requirements on UVD3

Daniel Vetter (3):
  drm/i915/tv: clear adjusted_mode.flags
  drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
  drm/i915: Fix up usage of SHRINK_STOP

Dave Airlie (3):
  Merge tag 'drm-intel-fixes-2013-09-26' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
  Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Merge branch 'msm-fixes-3.12-rc2' of git://people.freedesktop.org/~robclark/linux into drm-fixes

Jani Nikula (1):
  drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER

Joerg Roedel (1):
  drm/msm: Remove iommu include from mdp4_kms.c

Michel Dänzer (3):
  drm/radeon/cik: Fix printing of client name on VM protection fault
  drm/radeon/cik: Fix encoding of number of banks in tiling configuration info
  drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces

Rob Clark (2):
  drm/msm: deal with mach/iommu.h removal
  drm/msm: use drm_gem_dumb_destroy helper

Thomas Meyer (1):
  drm/msm: Odd PTR_ERR usage

 drivers/gpu/drm/i915/i915_gem.c  |  8 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c|  6 ++-
 drivers/gpu/drm/i915/intel_display.c |  4 ++
 drivers/gpu/drm/i915/intel_dp.c  | 13 ++-
 drivers/gpu/drm/i915/intel_tv.c  |  8 
 drivers/gpu/drm/msm/mdp4/mdp4_kms.c  |  2 -
 drivers/gpu/drm/msm/msm_drv.c|  8 ++--
 drivers/gpu/drm/msm/msm_gem.c|  7 
 drivers/gpu/drm/radeon/btc_dpm.c | 51 
 drivers/gpu/drm/radeon/btc_dpm.h |  2 +
 drivers/gpu/drm/radeon/ci_dpm.c  | 26 +
 drivers/gpu/drm/radeon/cik.c | 17 
 drivers/gpu/drm/radeon/ni_dpm.c  | 24 
 drivers/gpu/drm/radeon/r100.c|  8 ++--
 drivers/gpu/drm/radeon/r600_dpm.c|  2 +-
 drivers/gpu/drm/radeon/r600_hdmi.c   | 20 +++---
 drivers/gpu/drm/radeon/radeon_asic.c |  2 +
 drivers/gpu/drm/radeon/radeon_atombios.c | 66 +---
 drivers/gpu/drm/radeon/radeon_cs.c   |  5 ++-
 drivers/gpu/drm/radeon/radeon_device.c   | 15 ++--
 drivers/gpu/drm/radeon/radeon_pm.c   |  8 ++--
 drivers/gpu/drm/radeon/radeon_ring.c |  8 ++--
 drivers/gpu/drm/radeon/radeon_uvd.c  |  3 +-
 drivers/gpu/drm/radeon/si_dpm.c  | 24 
 drivers/gpu/drm/radeon/uvd_v1_0.c|  4 +-
 include/uapi/drm/radeon_drm.h|  2 +
 26 files changed, 266 insertions(+), 77 deletions(-)


Re: [PATCH] USB: OHCI: Check the overrides pointer for NULL in ohci_init_driver()

2013-09-28 Thread Alan Stern
On Sat, 28 Sep 2013, Tomasz Figa wrote:

> A series of commits starting at
> 
> 50a97e059b USB: OHCI: make ohci-exynos a separate driver
> 
> and ending at
> 
> b8ad5c3706 USB: OHCI: make ohci-pxa27x a separate driver
> 
> introduced the concept of separate OHCI drivers for particular
> controllers. Respective drivers need to call ohci_init_driver() to
> initialize hc_driver struct with generic data and to a certain extent
> with platform specific overrides through ohci_driver_overrides struct
> passed as second argument to this function. However the code does not
> check if the ohci_driver_overrides struct pointer is non-NULL, which
> leads to a NULL pointer dereference for drivers that do not need any
> overrides.
> 
> This patch fixes the problem by dereferencing the passed pointer to
> ohci_driver_overrides struct only if it is non-NULL.
> 
> Signed-off-by: Tomasz Figa 
> ---
>  drivers/usb/host/ohci-hcd.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
> index 21d937a..8ada13f 100644
> --- a/drivers/usb/host/ohci-hcd.c
> +++ b/drivers/usb/host/ohci-hcd.c
> @@ -1161,10 +1161,12 @@ void ohci_init_driver(struct hc_driver *drv,
>   /* Copy the generic table to drv and then apply the overrides */
>   *drv = ohci_hc_driver;
>  
> - drv->product_desc = over->product_desc;
> - drv->hcd_priv_size += over->extra_priv_size;
> - if (over->reset)
> - drv->reset = over->reset;
> + if (over) {
> + drv->product_desc = over->product_desc;
> + drv->hcd_priv_size += over->extra_priv_size;
> + if (over->reset)
> + drv->reset = over->reset;
> + }
>  }
>  EXPORT_SYMBOL_GPL(ohci_init_driver);

You were scooped by Kevin Hilman:

http://marc.info/?l=linux-usb&m=138029463906143&w=2
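
The pattern in the quoted patch can be tried outside the kernel with a minimal sketch. The struct and function names below are simplified stand-ins, not the real struct hc_driver, struct ohci_driver_overrides or ohci_init_driver(); only the field names product_desc, extra_priv_size and reset follow the quoted diff, and the generic values are made up for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for struct hc_driver. */
struct hc_driver_sketch {
	const char *product_desc;
	size_t hcd_priv_size;
	int (*reset)(void);
};

/* Simplified stand-in for struct ohci_driver_overrides. */
struct overrides_sketch {
	const char *product_desc;
	size_t extra_priv_size;
	int (*reset)(void);
};

static int generic_reset(void)
{
	return 0;
}

/* Hypothetical generic table, copied into every driver first. */
static const struct hc_driver_sketch generic_driver = {
	.product_desc = "generic OHCI",
	.hcd_priv_size = 64,
	.reset = generic_reset,
};

/* Copy the generic table, then apply overrides only if any were given. */
static void init_driver(struct hc_driver_sketch *drv,
			const struct overrides_sketch *over)
{
	*drv = generic_driver;

	if (over) {
		drv->product_desc = over->product_desc;
		drv->hcd_priv_size += over->extra_priv_size;
		if (over->reset)
			drv->reset = over->reset;
	}
}
```

A driver with nothing to override can then pass NULL and still get a fully initialised generic table, which is the case the unpatched code crashed on.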

Alan Stern



Re: [PATCH 2/2] x86: Cleanup boot CPUs table

2013-09-28 Thread Mike Galbraith
On Sat, 2013-09-28 at 10:25 +0200, Ingo Molnar wrote:

> > [0.072367] x86: Booting node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 <-- 
> > [0.686329] x86: Booted up 1 node, 8 CPUs
  ^ 
> - Added nodes count to the 'Booted up' line. Since we count nodes and
> CPUs, we might as well want to demonstrate our superior counting
> skills and print out both.

What superior counting skills?  ;-)



Re: [PATCH] kernel/groups.c: consider about NULL for 'group_info' in all related extern functions

2013-09-28 Thread Chen Gang
On 09/27/2013 08:19 PM, Chen Gang wrote:
> On 09/27/2013 07:53 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Fri, Sep 27, 2013 at 03:16:59PM +0800, Chen Gang wrote:
>>> Hmm... do you mean: "can not evaluate an interface before implement(or
>>> read details) them all"?
>>
>> No, I'm saying there are a lot more steps necessary between
>> recognizing that an interface needs an improvement and actually
>> improving it than what you're doing now.
>>
>>> If we are agree with each other that "this interface can be improved",
>>> I will go ahead:
>>>
>>>   I will reference the information which Paul McKenney provided.
>>>   And also, I will use LTP's some features to give a test.
>>>   And also, I will reference some contents you said above.
>>>
>>>   Hope I can finish within next month (2013-10-31).
>>
>> If you want to, go ahead but please see below.
>>
>>>> So, please take some time to mull over why your initial patch was
>>>> completely wrong and I didn't even have to read the code to predict
>>>> that your patch has high chance of being wrong.  Now, you're doing the
>>>> *exactly* same thing in the opposite direction.  You should be able to
>>>> recognize that there's something very wrong with that.
>>>
>>> No, I don't think so, in my opinion, for evaluate an api interface,
>>> don't need see the details implementation, even don't need know all
>>> demands.
>>>
>>> During discussing, anyone can make mistakes, in fact, that is the main
>>> reason why we need discussing.
>>>
>>> Hmm... in my opinion, for evaluate one's way/method whether suitable or
>>> not, it is not based on 1-2 mistakes, it need based on mistake/correct 
>>> ratio.
>>
>> The thing is you are showing a classical and common failure pattern
>> which is known to lead to bad code.  The only safe thing you'd be able
>> to do with your current pattern is making changes which are completely
>> contained and don't affect its interaction with large body of code,
>> and by not doing the necessary steps, you're shifting what you should
>> have done to your reviewers.
>>
>> Your patch is bascially just saying "this part looks a bit
>> inconsistent and may need to be improved" and that's all it is.  This
>> is bad in two ways.  Firstly, the workload on reviewer is higher as
>> they have to do the actual work.  Secondly, it's a lot more likely to
>> lead to bugs as the developer is supposed to be our first and best
>> line of defense against introducing silliness and reviewers operate on
>> the assumption that the developer did her role.
>>
>> Please recognize that obvious local changes and changes which may
>> affect larger interaction are different.  You will need to either
>> stick to obvious local changes or put a lot of effort into learning
>> how to do larger scope work.
>>
> 
> Do we agree with each other:
> 
>   Current 'groups' interface need be improved, although maybe my 2 fix 
> patches are incorrect (but also maybe one of them is correct).
>   And we need additional steps to find the correct fix.
> 
> If so, I should continue, or I think we still need discussing.
> 
> 
>> I hope you understand what I mean.  If not, I don't know what else I
>> can do.  I already spent too much time on this thread and probably
>> won't be as verbose in my future interactions, so if you can come up
>> with a good patch with convincing enough presentation, go for it.  If
>> not, I'm likely to nack it again.
>>
> 
> Hmm... I can understand your feelings. :-)
> 
>> Thanks.
>>

Hmm... excuse me, before getting agreement, I can not "go an inch". And
I think it is still valuable to discuss about it before "go an inch".

How about use WARN_ON() on "!group_info" for groups_search()? It can
let 'groups' interface 'explains' itself 'reasonably' (maybe the 3rd
'patch' is just bothering you? ;-) ).


---patch begin--

kernel/groups.c: add WARN_ON() on "!group_info" for groups_search()

  The 'groups' interface assumes that the caller makes sure 'group_info' is
  valid (if 'group_info' is not allocated, it needs to point to 'init_group').

  If a caller passes an invalid 'group_info' to groups_search(), we can be
  sure that "the current usage is incorrect and needs to be fixed", although
  we cannot be sure that "in the current condition, the OS must be continuing
  blindly".

  So add WARN_ON (not BUG_ON) in groups_search() as an alert.


Signed-off-by: Chen Gang 
---
 kernel/groups.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 90cf1c3..d201da0 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -136,7 +136,7 @@ int groups_search(const struct group_info *group_info, kgid_t grp)
 {
unsigned int left, right;

-   if (!group_info)
+   if (WARN_ON(!group_info))
return 0;

left = 0;
-- 
1.7.7.6


---patch end
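
For readers unfamiliar with the idiom, here is a rough userspace sketch of a check that both reports a bad argument and returns early. DEMO_WARN_ON and the demo_* names are made up for illustration; the kernel's WARN_ON() also dumps a stack trace, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

static unsigned int demo_warn_count;

/* Report a true condition and hand its truth value back to the caller. */
static int demo_warn_on(int cond, const char *expr)
{
	if (cond) {
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}

#define DEMO_WARN_ON(cond) demo_warn_on(!!(cond), #cond)

struct demo_group_info {
	size_t ngroups;
	const unsigned int *gids;	/* sorted in ascending order */
};

/* Binary search; a NULL table is reported and treated as "not found". */
static int demo_groups_search(const struct demo_group_info *gi,
			      unsigned int grp)
{
	size_t left, right;

	if (DEMO_WARN_ON(!gi))
		return 0;

	left = 0;
	right = gi->ngroups;
	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (grp > gi->gids[mid])
			left = mid + 1;
		else if (grp < gi->gids[mid])
			right = mid;
		else
			return 1;
	}
	return 0;
}
```

The caller still gets 0 for a NULL table, as before, but the misuse now leaves a trace instead of passing silently.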


> 
> 
> Thanks.
> 


-- 
Chen Gang

Re: [PATCH 2/3] clk: samsung: Add clock driver for s5pc100

2013-09-28 Thread Tomasz Figa
On Friday 27 of September 2013 18:37:56 Yadwinder Singh Brar wrote:
> Hi Tomasz,
> 
> On Thu, Sep 26, 2013 at 7:30 PM, Tomasz Figa  wrote:
> > Hi Yadwinder,
> > 
> > I haven't reviewed this series yet, but let me clarify some things
> > from
> > your comments.
> > 
> > On Thursday 26 of September 2013 17:38:58 Yadwinder Singh Brar wrote:
> >> > +
> >> > +/* Helper macros to define clock arrays. */
> >> > +#define FIXED_RATE_CLOCKS(name)\
> >> > +   static struct samsung_fixed_rate_clock name[]
> >> > +#define MUX_CLOCKS(name)   \
> >> > +   static struct samsung_mux_clock name[]
> >> > +#define DIV_CLOCKS(name)   \
> >> > +   static struct samsung_div_clock name[]
> >> > +#define GATE_CLOCKS(name)  \
> >> > +   static struct samsung_gate_clock name[]
> >> > +
> >> 
> >> These macros seems little bit odd in our common practice,
> >> perhaps these are making code harder to read below.
> > 
> > They allow array declaration to fit into single line. I agree that it
> > is not particularly easy to read at first sight, but shouldn't really
> > be much of nuisance.
> 
> Defining a macro just to use once/twice, especially hiding the
> definition of some array, doesn't looks justified.

If it makes the code look better, then I believe it's justified. If this
really looks that scary to you, then I won't insist on keeping it, though ;).

> > In addition, most of this driver is based on macros
> > like this, e.g. GATE(), MUX(), PNAME(), etc.
> > 
> >> > +PNAME(mout_i2s_2_p) = {
> >> > +   "fout_epll",
> >> > +   "i2scdclk0",
> >> > +   "dout_audio0",
> >> > +   "none"
> >> > +};
> >> > +
> >> 
> >> Using one line per parent isn't increasing length of file
> >> unnecessarily?
> > 
> > I believe this improves readability. Do we really care about size of
> > source code that much, over readability?
> 
> yes, its looks little bit clean but in this case I felt, its making
> the traversability in file difficult due to length of file.

Most modern editors (like vim or emacs) have symbol browsers, so I don't 
think this is an issue. Instead it's easy to look up which parent has 
which index and any further correction will not cause merge conflicts, due 
to having only one entry per line.

> >> > +   ALIAS(SCLK_AUDIO0, "soc-audio.0", "sclk_audio"),
> >> > +   ALIAS(SCLK_AUDIO1, "soc-audio.1", "sclk_audio"),
> >> > +   ALIAS(SCLK_AUDIO2, "soc-audio.2", "sclk_audio"),
> >> > +   ALIAS(KEYIF, NULL, "keypad"),
> >> > +
> >> > +   ALIAS(MFC, "s5p-mfc", "sclk_mfc"),
> >> > +   ALIAS(G2D, "s5p-g2d", "fimg2d"),
> >> > +
> >> > +};
> >> > +
> >> 
> >> Any reason/hidden advantage for using a separate of ALIAS,
> >> instead of using MUX_A/GATE_A ?
> > 
> > Yes, not even hidden. Alias is not a property of clock. One clock can
> > have multiple aliases, e.g. the same clock being input to multiple
> > devices.
> 
> Yes, its required if same clk has different alias for different devices,
> but while using same alias for different(all, in this case) devices,
> doesn't seems advantageous.

An alias (technically a clkdev lookup) is an entity separate from a
clock. It's a binding of a controller's clock output to a device's clock
input. Even if sometimes there is a 1:1 mapping of clocks and devices,
there is no reason to mix them together. Moreover, since there is a need
to provide more than one alias per clock, there is even less reason to
provide two different ways of defining them.

This way makes the code easier to read, because in clock tables you just 
have data internal to common clock framework and in alias tables you have 
data that belongs to clkdev.
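
A small userspace sketch of that separation, assuming hypothetical demo_* names (the "demo-codec.0" consumer is invented to show a second alias for one clock; this is not the samsung clk API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_clk_id { DEMO_SCLK_AUDIO0, DEMO_MFC, DEMO_NR_CLKS };

/* Clock table: data that belongs to the clocks themselves. */
static const char *const demo_clk_names[DEMO_NR_CLKS] = {
	[DEMO_SCLK_AUDIO0] = "sclk_audio0",
	[DEMO_MFC] = "mfc",
};

/* Alias table: one (device, connection) to clock binding per entry. */
struct demo_alias {
	enum demo_clk_id id;
	const char *dev_id;
	const char *con_id;
};

static const struct demo_alias demo_aliases[] = {
	{ DEMO_SCLK_AUDIO0, "soc-audio.0", "sclk_audio" },
	{ DEMO_SCLK_AUDIO0, "demo-codec.0", "mclk" },	/* hypothetical 2nd consumer */
	{ DEMO_MFC, "s5p-mfc", "sclk_mfc" },
};

/* Return the clock id bound to (dev_id, con_id), or -1 if there is none. */
static int demo_alias_lookup(const char *dev_id, const char *con_id)
{
	size_t i;

	assert(dev_id != NULL && con_id != NULL);
	for (i = 0; i < sizeof(demo_aliases) / sizeof(demo_aliases[0]); i++) {
		if (!strcmp(demo_aliases[i].dev_id, dev_id) &&
		    !strcmp(demo_aliases[i].con_id, con_id))
			return (int)demo_aliases[i].id;
	}
	return -1;
}
```

Adding another consumer of the same clock only touches the alias table; the clock table stays as it is.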

Best regards,
Tomasz



Re: Fwd: [v3.12-rc1] [regression] PM / hibernate: Create memory bitmaps after freezing user space

2013-09-28 Thread Rafael J. Wysocki
On Saturday, September 28, 2013 08:18:18 PM Ronald wrote:
> [ resend, forgot to disable HTML (sorry!) ]
> 
> Dear kernel developers,
> 
> Commit 8fd37a4c9 (PM / hibernate: Create memory bitmaps after freezing
> user space) causes resume to fail.
> 
> Only, when using the s2disk utility (through pm-hibernate).
> 
> Not when I do:
> 
> echo -n "disk" > /sys/power/state
> 
> Reverting the commit did not work.
> 
> I am using a encrypted LUKS partition with a temporary key that is
> functioning as the swap device used for the suspend image.
> 
> Awaiting further orders

I'm traveling now, so I can't really test things, but I think I know what the
problem is.

Can you please check if the appended patch makes any difference for you?

Rafael


---
 kernel/power/snapshot.c |5 -
 kernel/power/user.c |5 +
 2 files changed, 9 insertions(+), 1 deletion(-)

Index: linux-pm/kernel/power/snapshot.c
===
--- linux-pm.orig/kernel/power/snapshot.c
+++ linux-pm/kernel/power/snapshot.c
@@ -743,7 +743,10 @@ int create_basic_memory_bitmaps(void)
struct memory_bitmap *bm1, *bm2;
int error = 0;
 
-   BUG_ON(forbidden_pages_map || free_pages_map);
+   if (forbidden_pages_map && free_pages_map)
+   return 0;
+   else
+   BUG_ON(forbidden_pages_map || free_pages_map);
 
bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL);
if (!bm1)
Index: linux-pm/kernel/power/user.c
===
--- linux-pm.orig/kernel/power/user.c
+++ linux-pm/kernel/power/user.c
@@ -82,6 +82,9 @@ static int snapshot_open(struct inode *i
data->swap = -1;
data->mode = O_WRONLY;
error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
+   if (!error)
+   error = create_basic_memory_bitmaps();
+
if (error)
pm_notifier_call_chain(PM_POST_RESTORE);
}
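
The snapshot.c hunk above makes bitmap creation safe to call twice. A minimal userspace sketch of that pattern, with made-up demo_* names and assert() standing in for BUG_ON():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_bitmap {
	int placeholder;
};

static struct demo_bitmap *demo_forbidden_map;
static struct demo_bitmap *demo_free_map;

/* Create both maps, or do nothing if an earlier call already created them. */
static int demo_create_bitmaps(void)
{
	struct demo_bitmap *bm1, *bm2;

	if (demo_forbidden_map && demo_free_map)
		return 0;

	/* Only one of the two being set would mean inconsistent state. */
	assert(!demo_forbidden_map && !demo_free_map);

	bm1 = calloc(1, sizeof(*bm1));
	if (!bm1)
		return -ENOMEM;

	bm2 = calloc(1, sizeof(*bm2));
	if (!bm2) {
		free(bm1);
		return -ENOMEM;
	}

	demo_forbidden_map = bm1;
	demo_free_map = bm2;
	return 0;
}
```

With this shape, two different entry points can both request the bitmaps without tripping the consistency check.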



[PATCH 2/3] clk: Use kcalloc() to allocate arrays

2013-09-28 Thread Tomasz Figa
Instead of calculating sizes of arrays manually, kcalloc() can be used
to allocate arrays of elements with defined size. This is just a cleanup
patch without any functional changes.

Signed-off-by: Tomasz Figa 
---
 drivers/clk/clk.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 9e0a837..63f9ac1 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1085,8 +1085,8 @@ static int clk_fetch_parent_index(struct clk *clk, struct clk *parent)
int i;
 
if (!clk->parents) {
-   clk->parents = kzalloc((sizeof(struct clk*) * clk->num_parents),
-   GFP_KERNEL);
+   clk->parents = kcalloc(clk->num_parents,
+   sizeof(struct clk *), GFP_KERNEL);
if (!clk->parents)
return -ENOMEM;
}
@@ -1535,7 +1535,7 @@ static struct clk *__clk_init_parent(struct clk *clk)
 
if (!clk->parents)
clk->parents =
-   kzalloc((sizeof(struct clk*) * clk->num_parents),
+   kcalloc(clk->num_parents, sizeof(struct clk *),
GFP_KERNEL);
 
ret = clk_get_parent_by_index(clk, index);
@@ -1692,8 +1692,8 @@ int __clk_init(struct device *dev, struct clk *clk)
 * for clock drivers to statically initialize clk->parents.
 */
if (clk->num_parents > 1 && !clk->parents) {
-   clk->parents = kzalloc((sizeof(struct clk*) * clk->num_parents),
-   GFP_KERNEL);
+   clk->parents = kcalloc(clk->num_parents, sizeof(struct clk *),
+   GFP_KERNEL);
/*
 * __clk_lookup returns NULL for parents that have not been
 * clk_init'd; thus any access to clk->parents[] must check
@@ -1833,8 +1833,8 @@ static int _clk_register(struct device *dev, struct clk_hw *hw, struct clk *clk)
hw->clk = clk;
 
/* allocate local copy in case parent_names is __initdata */
-   clk->parent_names = kzalloc((sizeof(char*) * clk->num_parents),
-   GFP_KERNEL);
+   clk->parent_names = kcalloc(clk->num_parents, sizeof(char *),
+   GFP_KERNEL);
 
if (!clk->parent_names) {
pr_err("%s: could not allocate clk->parent_names\n", __func__);
-- 
1.8.3.2
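
One practical difference between the two call shapes is where the count-times-size multiplication happens. The userspace sketch below makes that check explicit; demo_calloc_array is a hypothetical helper written for illustration, not the kernel's kcalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed array of n elements, refusing products that overflow. */
static void *demo_calloc_array(size_t n, size_t size)
{
	void *p;

	assert(size > 0);	/* callers pass a sizeof() here */
	if (n > SIZE_MAX / size)
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

Passing the element count and element size separately lets the helper reject an overflowing request, which an open-coded `sizeof(x) * n` argument cannot do.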



[PATCH 3/3] clk: Correct lookup logic in clk_fetch_parent_index()

2013-09-28 Thread Tomasz Figa
This function is supposed to iterate over all parents of given child
clock to find the index of given parent clock in its parent list,
using parent cache if possible and falling back to string compare
otherwise. However currently the logic falls back to string compare in
every iteration in which clock cache entry does not match given parent,
due to wrong check conditions.

This patch corrects the logic to continue the loop if parent cache entry
is present and does not match requested parent clock. In addition,
redundant checks for parent cache array presence are removed, because it
is always allocated in the beginning of the function.

Signed-off-by: Tomasz Figa 
---
 drivers/clk/clk.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 63f9ac1..32e2fed 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1097,11 +1097,14 @@ static int clk_fetch_parent_index(struct clk *clk, struct clk *parent)
 * them now to avoid future calls to __clk_lookup.
 */
for (i = 0; i < clk->num_parents; i++) {
-   if (clk->parents && clk->parents[i] == parent)
+   if (clk->parents[i] == parent)
return i;
-   else if (!strcmp(clk->parent_names[i], parent->name)) {
-   if (clk->parents)
-   clk->parents[i] = __clk_lookup(parent->name);
+
+   if (clk->parents[i])
+   continue;
+
+   if (!strcmp(clk->parent_names[i], parent->name)) {
+   clk->parents[i] = __clk_lookup(parent->name);
return i;
}
}
-- 
1.8.3.2
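
The corrected loop can be tried out in userspace with a sketch like the one below. The demo_* names are hypothetical, and the cache is filled with the parent pointer directly where the kernel code calls __clk_lookup(); the counter exists only to show how often the string compare runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_clk {
	const char *name;
	int num_parents;
	const char **parent_names;
	struct demo_clk **parents;	/* cache; entries may still be NULL */
};

static int demo_strcmp_calls;

/* Return the index of parent in clk's parent list, or -EINVAL. */
static int demo_fetch_parent_index(struct demo_clk *clk,
				   struct demo_clk *parent)
{
	int i;

	assert(clk && parent && clk->parents);

	for (i = 0; i < clk->num_parents; i++) {
		if (clk->parents[i] == parent)
			return i;

		/* Cached and different: no need to compare names. */
		if (clk->parents[i])
			continue;

		demo_strcmp_calls++;
		if (!strcmp(clk->parent_names[i], parent->name)) {
			clk->parents[i] = parent;
			return i;
		}
	}

	return -EINVAL;
}
```

Slots whose cache entry is already populated are skipped without a string compare, so the fallback only runs for entries that have not been resolved yet.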



[PATCH 1/3] clk: Add error handling to clk_fetch_parent_index()

2013-09-28 Thread Tomasz Figa
There are at least two different error cases that can happen in
clk_fetch_parent_index() function:
 - allocation failure,
 - parent clock lookup failure,
however, it returns only a u8, which is supposed to contain the parent clock
index.

This patch modifies the function to return a full int instead, allowing
non-negative clock indices and negative error codes to be returned. All
users of this function are adjusted as well to handle the return value
correctly.

Signed-off-by: Tomasz Figa 
---
 drivers/clk/clk.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index a004769..9e0a837 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1080,13 +1080,16 @@ unsigned long clk_get_rate(struct clk *clk)
 }
 EXPORT_SYMBOL_GPL(clk_get_rate);
 
-static u8 clk_fetch_parent_index(struct clk *clk, struct clk *parent)
+static int clk_fetch_parent_index(struct clk *clk, struct clk *parent)
 {
-   u8 i;
+   int i;
 
-   if (!clk->parents)
+   if (!clk->parents) {
clk->parents = kzalloc((sizeof(struct clk*) * clk->num_parents),
GFP_KERNEL);
+   if (!clk->parents)
+   return -ENOMEM;
+   }
 
/*
 * find index of new parent clock using cached parent ptrs,
@@ -1095,15 +1098,15 @@ static u8 clk_fetch_parent_index(struct clk *clk, struct clk *parent)
 */
for (i = 0; i < clk->num_parents; i++) {
if (clk->parents && clk->parents[i] == parent)
-   break;
+   return i;
else if (!strcmp(clk->parent_names[i], parent->name)) {
if (clk->parents)
clk->parents[i] = __clk_lookup(parent->name);
-   break;
+   return i;
}
}
 
-   return i;
+   return -EINVAL;
 }
 
 static void clk_reparent(struct clk *clk, struct clk *new_parent)
@@ -1265,7 +1268,7 @@ static struct clk *clk_calc_new_rates(struct clk *clk, unsigned long rate)
struct clk *old_parent, *parent;
unsigned long best_parent_rate = 0;
unsigned long new_rate;
-   u8 p_index = 0;
+   int p_index = 0;
 
/* sanity */
if (IS_ERR_OR_NULL(clk))
@@ -1306,7 +1309,7 @@ static struct clk *clk_calc_new_rates(struct clk *clk, unsigned long rate)
/* try finding the new parent index */
if (parent) {
p_index = clk_fetch_parent_index(clk, parent);
-   if (p_index == clk->num_parents) {
+   if (p_index < 0) {
pr_debug("%s: clk %s can not be parent of clk %s\n",
 __func__, parent->name, clk->name);
return NULL;
@@ -1568,7 +1571,7 @@ void __clk_reparent(struct clk *clk, struct clk *new_parent)
 int clk_set_parent(struct clk *clk, struct clk *parent)
 {
int ret = 0;
-   u8 p_index = 0;
+   int p_index = 0;
unsigned long p_rate = 0;
 
if (!clk)
@@ -1597,10 +1600,10 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
if (parent) {
p_index = clk_fetch_parent_index(clk, parent);
p_rate = parent->rate;
-   if (p_index == clk->num_parents) {
+   if (p_index < 0) {
pr_debug("%s: clk %s can not be parent of clk %s\n",
__func__, parent->name, clk->name);
-   ret = -EINVAL;
+   ret = p_index;
goto out;
}
}
-- 
1.8.3.2



Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-28 Thread Rafael J. Wysocki
On Friday, September 27, 2013 04:44:20 PM Yinghai Lu wrote:
> [+ Rafael]
> 
> On Fri, Sep 27, 2013 at 4:19 PM, Benjamin Herrenschmidt
>  wrote:
> > On Fri, 2013-09-27 at 15:56 -0700, Yinghai Lu wrote:
> >
> >> ok, please if you are ok attached one instead. It will print some warning
> >> about driver skipping pci_set_master, so we can catch more problem with drivers.
> >
> > Except that the message is pretty cryptic :-) Especially since the
> > driver causing the message to be printed is not the one that did
> > the mistake in the first place, it's the next one coming up that
> > trips the warning.
> >
> > In any case, the root cause is indeed the PCIe port driver:
> >
> > We don't have ACPI, so pcie_port_platform_notify() isn't implemented,
> > and pcie_ports_auto is true, so we end up with capabilities set to 0.
> 
> in
> | commit fe31e69740eddc7316071ed5165fed6703c8cd12
> | Author: Rafael J. Wysocki 
> | Date:   Sun Dec 19 15:57:16 2010 +0100
> |
> |PCI/PCIe: Clear Root PME Status bits early during system resume
> |
> |I noticed that PCI Express PMEs don't work on my Toshiba Portege R500
> |after the system has been woken up from a sleep state by a PME
> |(through Wake-on-LAN).  After some investigation it turned out that
> |the BIOS didn't clear the Root PME Status bit in the root port that
> |received the wakeup PME and since the Requester ID was also set in
> |the port's Root Status register, any subsequent PMEs didn't trigger
> |interrupts.
> |
> |This problem can be avoided by clearing the Root PME Status bits in
> |all PCI Express root ports during early resume.  For this purpose,
> |add an early resume routine to the PCIe port driver and make this
> |driver be always registered, even if pci_ports_disable is set (in
> |which case the driver's only function is to provide the early
> |resume callback).
> |
> |
> |@@ -349,15 +349,18 @@ int pcie_port_device_register(struct pci_dev *dev)
> |int status, capabilities, i, nr_service;
> |int irqs[PCIE_PORT_DEVICE_MAXSERVICES];
> |
> |-   /* Get and check PCI Express port services */
> |-   capabilities = get_port_device_capability(dev);
> |-   if (!capabilities)
> |-   return -ENODEV;
> |-
> |/* Enable PCI Express port device */
> |status = pci_enable_device(dev);
> |if (status)
> |return status;
> |+
> |+   /* Get and check PCI Express port services */
> |+   capabilities = get_port_device_capability(dev);
> |+   if (!capabilities) {
> |+   pcie_no_aspm();
> |+   return 0;
> |+   }
> |+
> |pci_set_master(dev);
> |/*
> | * Initialize service irqs. Don't use service devices that
> 
> >
> > Thus the port driver bails out before calling pci_set_master(). The fix
> > is to call pci_set_master() unconditionally. However that lead me to
> > find to a few interesting oddities in that port driver code:
> 
> can we revert that partially change ? aka we should check get_port
> at first...
> 
> like attached.

It looks like we can do something like this (just pasting your patch):

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 31063ac..1ee6f16 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -362,16 +362,16 @@ int pcie_port_device_register(struct pci_dev *dev)
int status, capabilities, i, nr_service;
int irqs[PCIE_PORT_DEVICE_MAXSERVICES];
 
-   /* Enable PCI Express port device */
-   status = pci_enable_device(dev);
-   if (status)
-   return status;
-
/* Get and check PCI Express port services */
capabilities = get_port_device_capability(dev);
if (!capabilities)
return 0;
 
+   /* Enable PCI Express port device */
+   status = pci_enable_device(dev);
+   if (status)
+   return status;
+
pci_set_master(dev);
/*
 * Initialize service irqs. Don't use service devices that

but I don't have that box with me to test whether or not it still works
correctly after this change.  I'll be back home on the next Saturday if
all goes well.

Thanks,
Rafael
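
The effect of the reordering in the pasted patch can be modelled with a toy sketch. Everything here is an assumption for illustration: demo_port and its flags are invented, and the two assignments merely stand in for pci_enable_device() and pci_set_master().

```c
#include <assert.h>
#include <stddef.h>

struct demo_port {
	int capabilities;	/* 0 means no port services are available */
	int enabled;		/* stands in for pci_enable_device() */
	int bus_master;		/* stands in for pci_set_master() */
};

/* Check for services first so that a port without any is left untouched. */
static int demo_port_register(struct demo_port *port)
{
	assert(port != NULL);

	if (!port->capabilities)
		return 0;

	port->enabled = 1;
	port->bus_master = 1;
	return 0;
}
```

In this model a port never ends up enabled without bus mastering: either both steps run or neither does.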



Re: [[PATCH] 0/3] imx-dma: fixes

2013-09-28 Thread Michael Grzeschik
Hi Christoph,

On Tue, Sep 24, 2013 at 08:35:03AM +0200, Christoph Fritz wrote:
> On Mon, 2013-09-23 at 09:49 +0530, Vinod Koul wrote:
> > On Tue, Sep 17, 2013 at 03:56:05PM +0200, Michael Grzeschik wrote:
> > > Hello,
> > > 
> > > this series is solving some lockdep issues in the imx-dma code.
> > > There are some list_head and deadlock issues in the code,
> > > that is running the implementation into unsafe situations.
> > Thanks for this, I have trying to fix this with testing done by Christoph.
> > I had similar set of fixes
> > 
> > Christoph can you pls try runnning this on your setup and check and we can
> > apply these
> 
> Thanks for the update, I added Michaels imx-dma patchset to Kernel
> 3.4.62 and gave it a shot:
> 
> In contrast to DMA-disabled, a 'dd' copy still results in a hung:
> 
>dd if=/dev/zero of=/mnt/dd-test.bin count=102400 bs=1K
> 
> Please see the full log from boot to hung with DEBUG enabled below.
> With 2.6.31, copying to an SD-Card with DMA enabled works flawlessly,
> this log is also below.
> 
> Michael, any ideas? I suppose you have the same board?

The hardware we tested these patches for/with was custom hardware. But
yes, we have the board you refer to. We will need to set up the same
situation first for debugging.

Did you notice that the stalling mem2dev transfer in 3.4.62
generates this footprint:

> [   60.579646] imx-dma imx-dma: imxdma_xfer_desc channel: 0 sg=c70ff000 sgcount=8 total length=32768 dev_addr=0x10014038 (mem2dev)
> [   60.591192] imx-dma imx-dma:  imxdma_sg_next channel: 0 dst 0x10014038, src 0xa5527000, size 0x1000
> [   60.600624] imx-dma imx-dma: imxdma_enable_hw channel 0
> [   60.605887] imx-dma imx-dma:  imxdma_sg_next channel: 0 dst 0x10014038, src 0xa5525000, size 0x1000
> [   60.795424] imx-dma imx-dma: dma_irq_handler called, disr=0x0001
> [   60.801857] imx-dma imx-dma:  imxdma_sg_next channel: 0 dst 0x10014038, src 0xa5523000, size 0x1000
> [   61.290221] imx-dma imx-dma: channel 0: watchdog timeout!

Besides, on 2.6.31 the same transfer completes without failure.

> [   55.27] imxdma0: imx_dma_setup_sg sg=c7ae3800 sgcount=9 total length=32768 dev_addr=0x10014038 for write
> [   55.28] imxdma0: next sg chunk dst 0x10014038, src 0xa73a1c00, size 0x1000
> [   55.29] imxdma0: imx_dma_enable
> [   55.29] imxdma: dma_irq_handler called, disr=0x0001
> [   55.29] imxdma0: next sg chunk dst 0x10014038, src 0xa73a2c00, size 0x0400
> [   55.30] imxdma: dma_irq_handler called, disr=0x0001
> [   55.30] imxdma0: next sg chunk dst 0x10014038, src 0xa73a5000, size 0x1000
> [   55.32] imxdma: dma_irq_handler called, disr=0x0001
> [   55.32] imxdma0: next sg chunk dst 0x10014038, src 0xa73a7000, size 0x1000
> [   55.33] imxdma: dma_irq_handler called, disr=0x0001
> [   55.33] imxdma0: next sg chunk dst 0x10014038, src 0xa73a9000, size 0x1000
> [   55.34] imxdma: dma_irq_handler called, disr=0x0001
> [   55.34] imxdma0: next sg chunk dst 0x10014038, src 0xa73ab000, size 0x1000
> [   55.36] imxdma: dma_irq_handler called, disr=0x0001
> [   55.36] imxdma0: next sg chunk dst 0x10014038, src 0xa73ad000, size 0x1000
> [   55.38] imxdma: dma_irq_handler called, disr=0x0001
> [   55.38] imxdma0: next sg chunk dst 0x10014038, src 0xa73af000, size 0x1000
> [   55.39] imxdma: dma_irq_handler called, disr=0x0001
> [   55.39] imxdma0: next sg chunk dst 0x10014038, src 0xa73b1000, size 0x0c00
> [   55.40] imxdma: dma_irq_handler called, disr=0x0001
> [   55.41] imxdma0: imx_dma_disable

It looks suspicious that the same transfer in the newer kernel should use
fewer sg entries (sgcount=8 vs. sgcount=9) for the same amount of payload
data.
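
As a quick cross-check of the 2.6.31 log above, the nine chunk sizes it lists can be summed and compared with the reported total length. The helper name and array are made up for this sketch; the sizes are copied from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chunk sizes copied from the 2.6.31 log (sgcount=9). */
static const uint32_t demo_sg_2_6_31[] = {
	0x1000, 0x0400, 0x1000, 0x1000, 0x1000,
	0x1000, 0x1000, 0x1000, 0x0c00,
};

/* Sum the lengths of all scatter-gather chunks. */
static uint32_t demo_sg_total(const uint32_t *sizes, size_t count)
{
	uint32_t total = 0;
	size_t i;

	assert(sizes != NULL || count == 0);
	for (i = 0; i < count; i++)
		total += sizes[i];
	return total;
}
```

The nine entries add up to 0x8000 bytes, matching "total length=32768". Eight full 0x1000 chunks would give the same total, but the 3.4.62 log only shows three chunks before the watchdog fires, so that part remains a guess.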

I don't think this issue is related to the patch series I posted. But
it needs to be investigated anyway.

Regards,
Michael

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH 1/1] fs/binfmt_elf.c: fill_note_info: Reduce scope of a variable

2013-09-28 Thread Geyslan Gregório Bem
Please disregard this e-mail.

Geyslan Gregório Bem
hackingbits.com


2013/9/28 Geyslan G. Bem :
> The *t variable in the fill_note_info function is only used if
> siginfo->si_signo isn't 0. Moving "t" pointer to that inner scope.
>
> Tested.
>
> Signed-off-by: Geyslan G. Bem 
> ---
>  fs/binfmt_elf.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 100edcc..d0ac15b 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1848,12 +1848,11 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
>   struct elf_note_info *info,
>   siginfo_t *siginfo, struct pt_regs *regs)
>  {
> -   struct list_head *t;
> -
> if (!elf_note_info_init(info))
> return 0;
>
> if (siginfo->si_signo) {
> +   struct list_head *t;
> struct core_thread *ct;
> struct elf_thread_status *ets;
>
> --
> 1.8.4
>


Re: [PATCH] fs: exec.c: Coding style sanitization

2013-09-28 Thread Geyslan Gregório Bem
Please disregard this e-mail.

Geyslan Gregório Bem
hackingbits.com


2013/9/28 Geyslan G. Bem :
> Adjustment based on the checkpatch.pl.
>
> Tested.
>
> Signed-off-by: Geyslan G. Bem 
> ---
>  fs/exec.c | 92 
> ++-
>  1 file changed, 49 insertions(+), 43 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 8875dd1..b5c6086 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -19,7 +19,7 @@
>   * current->executable is only used by the procfs.  This allows a dispatch
>   * table to check for several different types  of binary formats.  We keep
>   * trying until we recognize the file or we run out of supported binary
> - * formats.
> + * formats.
>   */
>
>  #include 
> @@ -55,8 +55,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
> -#include 
>  #include 
>  #include 
>
> @@ -71,7 +71,7 @@ int suid_dumpable = 0;
>  static LIST_HEAD(formats);
>  static DEFINE_RWLOCK(binfmt_lock);
>
> -void __register_binfmt(struct linux_binfmt * fmt, int insert)
> +void __register_binfmt(struct linux_binfmt *fmt, int insert)
>  {
> BUG_ON(!fmt);
> if (WARN_ON(!fmt->load_binary))
> @@ -81,19 +81,17 @@ void __register_binfmt(struct linux_binfmt * fmt, int insert)
>  list_add_tail(&fmt->lh, &formats);
> write_unlock(&binfmt_lock);
>  }
> -
>  EXPORT_SYMBOL(__register_binfmt);
>
> -void unregister_binfmt(struct linux_binfmt * fmt)
> +void unregister_binfmt(struct linux_binfmt *fmt)
>  {
> write_lock(&binfmt_lock);
> list_del(&fmt->lh);
> write_unlock(&binfmt_lock);
>  }
> -
>  EXPORT_SYMBOL(unregister_binfmt);
>
> -static inline void put_binfmt(struct linux_binfmt * fmt)
> +static inline void put_binfmt(struct linux_binfmt *fmt)
>  {
> module_put(fmt->module);
>  }
> @@ -136,8 +134,8 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
> fsnotify_open(file);
>
> error = -ENOEXEC;
> -   if(file->f_op) {
> -   struct linux_binfmt * fmt;
> +   if (file->f_op) {
> +   struct linux_binfmt *fmt;
>
> read_lock(&binfmt_lock);
> list_for_each_entry(fmt, &formats, lh) {
> @@ -154,10 +152,11 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
> }
> read_unlock(&binfmt_lock);
> }
> +
>  exit:
> fput(file);
>  out:
> -   return error;
> +   return error;
>  }
>
>  #ifdef CONFIG_MMU
> @@ -180,7 +179,7 @@ static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
>  }
>
>  static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> -   int write)
> +int write)
>  {
> struct page *page;
> int ret;
> @@ -241,7 +240,7 @@ static void free_arg_pages(struct linux_binprm *bprm)
>  }
>
>  static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> -   struct page *page)
> +  struct page *page)
>  {
> flush_cache_page(bprm->vma, pos, page_to_pfn(page));
>  }
> @@ -268,7 +267,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
> BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
> vma->vm_end = STACK_TOP_MAX;
> vma->vm_start = vma->vm_end - PAGE_SIZE;
> -   vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
> +   vma->vm_flags = (VM_SOFTDIRTY |
> +VM_STACK_FLAGS |
> +VM_STACK_INCOMPLETE_SETUP);
> vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> INIT_LIST_HEAD(&vma->anon_vma_chain);
>
> @@ -299,7 +300,7 @@ static inline void acct_arg_size(struct linux_binprm 
> *bprm, unsigned long pages)
>  }
>
>  static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long 
> pos,
> -   int write)
> +int write)
>  {
> struct page *page;
>
> @@ -335,7 +336,7 @@ static void free_arg_pages(struct linux_binprm *bprm)
>  }
>
>  static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
> -   struct page *page)
> +  struct page *page)
>  {
>  }
>
> @@ -616,7 +617,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, 
> unsigned long shift)
>  * when the old and new regions overlap clear from new_end.
>  */
> free_pgd_range(&tlb, new_end, old_end, new_end,
> -   vma->vm_next ? vma->vm_next->vm_start : 
> USER_PGTABLES_CEILING);
> +  vma->vm_next
> +  ? vma->vm_next->vm_start
> +  : USER_PGTABLES_CEILING);
> } else {
> /*
>  * otherwise, clean from old_start; this is done to not touch
> @@ -625,7 +628,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, 
> unsigned long shift)
>  * for the others its just a little faster.
>

Re: [PATCH v3 2/2] ia32_aout: x86_64: Add safe check in a.out loaders, printks, coding style fixes

2013-09-28 Thread Geyslan Gregório Bem
Please disregard this e-mail.

Geyslan Gregório Bem
hackingbits.com


2013/9/28 Geyslan G. Bem :
> ia32_aout had no safe checks concerning the mmap and f_op in this module.
> It's not necessary to verify f_op in the load_aout_library, since the
> prior kernel_read/vfs_read function already does.
> Made coding style fixes and printks replacements.
>
> Tested using qemu, a handcrafted a.out binary and an a.out linked with a
> cross-compiled ld.
>
> Signed-off-by: Geyslan G. Bem 
> ---
>  arch/x86/ia32/ia32_aout.c | 63 
> +++
>  1 file changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
> index bae3aba..87d5114 100644
> --- a/arch/x86/ia32/ia32_aout.c
> +++ b/arch/x86/ia32/ia32_aout.c
> @@ -24,9 +24,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
> +#include 
>
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -224,9 +224,9 @@ static u32 __user *create_aout_tables(char __user *p, 
> struct linux_binprm *bprm)
> int argc = bprm->argc, envc = bprm->envc;
>
> sp = (u32 __user *) ((-(unsigned long)sizeof(u32)) & (unsigned long) 
> p);
> -   sp -= envc+1;
> +   sp -= envc + 1;
> envp = sp;
> -   sp -= argc+1;
> +   sp -= argc + 1;
> argv = sp;
> put_user((unsigned long) envp, --sp);
> put_user((unsigned long) argv, --sp);
> @@ -271,10 +271,17 @@ static int load_aout_binary(struct linux_binprm *bprm)
>  N_MAGIC(ex) != QMAGIC && N_MAGIC(ex) != NMAGIC) ||
> N_TRSIZE(ex) || N_DRSIZE(ex) ||
> i_size_read(file_inode(bprm->file)) <
> -   ex.a_text+ex.a_data+N_SYMSIZE(ex)+N_TXTOFF(ex)) {
> +   ex.a_text + ex.a_data + N_SYMSIZE(ex) + N_TXTOFF(ex)) {
> return -ENOEXEC;
> }
>
> +   /*
> +* Requires a mmap handler. This prevents people from using a.out
> +* as part of an exploit attack against /proc-related vulnerabilities.
> +*/
> +   if (!bprm->file->f_op || !bprm->file->f_op->mmap)
> +   return -ENOEXEC;
> +
> fd_offset = N_TXTOFF(ex);
>
> /* Check initial limits. This avoids letting people circumvent
> @@ -322,7 +329,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
> unsigned long text_addr, map_size;
>
> text_addr = N_TXTADDR(ex);
> -   map_size = ex.a_text+ex.a_data;
> +   map_size = ex.a_text + ex.a_data;
>
> error = vm_brk(text_addr & PAGE_MASK, map_size);
>
> @@ -339,28 +346,19 @@ static int load_aout_binary(struct linux_binprm *bprm)
> }
> } else {
>  #ifdef WARN_OLD
> -   static unsigned long error_time, error_time2;
> if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
> -   (N_MAGIC(ex) != NMAGIC) &&
> -   time_after(jiffies, error_time2 + 5*HZ)) {
> -   printk(KERN_NOTICE "executable not page aligned\n");
> -   error_time2 = jiffies;
> -   }
> +   (N_MAGIC(ex) != NMAGIC))
> +   pr_notice_ratelimited("executable not page 
> aligned\n");
>
> -   if ((fd_offset & ~PAGE_MASK) != 0 &&
> -   time_after(jiffies, error_time + 5*HZ)) {
> -   printk(KERN_WARNING
> -  "fd_offset is not page aligned. Please convert 
> "
> -  "program: %s\n",
> -  bprm->file->f_path.dentry->d_name.name);
> -   error_time = jiffies;
> -   }
> +   if ((fd_offset & ~PAGE_MASK) != 0)
> +   pr_warn_ratelimited("fd_offset is not page aligned. 
> Please convert program: %s\n",
> +   
> bprm->file->f_path.dentry->d_name.name);
>  #endif
>
> -   if (!bprm->file->f_op->mmap || (fd_offset & ~PAGE_MASK) != 0) 
> {
> -   vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
> +   if ((fd_offset & ~PAGE_MASK) != 0) {
> +   vm_brk(N_TXTADDR(ex), ex.a_text + ex.a_data);
> read_code(bprm->file, N_TXTADDR(ex), fd_offset,
> -   ex.a_text+ex.a_data);
> +   ex.a_text + ex.a_data);
> goto beyond_if;
> }
>
> @@ -424,10 +422,17 @@ static int load_aout_library(struct file *file)
> if ((N_MAGIC(ex) != ZMAGIC && N_MAGIC(ex) != QMAGIC) || N_TRSIZE(ex) 
> ||
> N_DRSIZE(ex) || ((ex.a_entry & 0xfff) && N_MAGIC(ex) == ZMAGIC) ||
> i_size_read(file_inode(file)) <
> -   ex.a_text+ex.a_data+N_SYMSIZE(ex)+N_TXTOFF(ex)) {
> +   ex.a_text + ex.a_data + N_SYMSIZE(ex) + N_TXTOFF(ex)) {
> goto out;
>

Re: [PATCH v3 1/2] binfmt_aout: x86: Useless inode var, printks coding style fixes

2013-09-28 Thread Geyslan Gregório Bem
Please disregard this e-mail.

Geyslan Gregório Bem
hackingbits.com


2013/9/28 Geyslan G. Bem :
> file size used only once, so removed due its useless prior allocation.
> It's not necessary to verify f_op in the load_aout_library, since the
> prior kernel_read/vfs_read function already does.
> Made coding style fixes and printk replacements.
>
> Tested using qemu, a handcrafted a.out binary and an a.out linked with a
> cross-compiled ld.
>
> Signed-off-by: Geyslan G. Bem 
> ---
>  fs/binfmt_aout.c | 98 
> +++-
>  1 file changed, 48 insertions(+), 50 deletions(-)
>
> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
> index 89dec7f..c732b8e 100644
> --- a/fs/binfmt_aout.c
> +++ b/fs/binfmt_aout.c
> @@ -25,13 +25,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
> -#include 
>  #include 
>  #include 
>
>  static int load_aout_binary(struct linux_binprm *);
> -static int load_aout_library(struct file*);
> +static int load_aout_library(struct file *);
>
>  #ifdef CONFIG_COREDUMP
>  /*
> @@ -62,7 +63,7 @@ static int aout_core_dump(struct coredump_params *cprm)
> fs = get_fs();
> set_fs(KERNEL_DS);
> has_dumped = 1;
> -   strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
> +   strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
> dump.u_ar0 = offsetof(struct user, regs);
> dump.signal = cprm->siginfo->si_signo;
> aout_dump_thread(cprm->regs, &dump);
> @@ -78,9 +79,11 @@ static int aout_core_dump(struct coredump_params *cprm)
>
>  /* make sure we actually have a data and stack area to dump */
> set_fs(USER_DS);
> -   if (!access_ok(VERIFY_READ, START_DATA(dump), dump.u_dsize << 
> PAGE_SHIFT))
> +   if (!access_ok(VERIFY_READ, START_DATA(dump),
> +  dump.u_dsize << PAGE_SHIFT))
> dump.u_dsize = 0;
> -   if (!access_ok(VERIFY_READ, START_STACK(dump), dump.u_ssize << 
> PAGE_SHIFT))
> +   if (!access_ok(VERIFY_READ, START_STACK(dump),
> +  dump.u_ssize << PAGE_SHIFT))
> dump.u_ssize = 0;
>
> set_fs(KERNEL_DS);
> @@ -142,7 +145,8 @@ static int set_brk(unsigned long start, unsigned long end)
>   * memory and creates the pointer tables from them, and puts their
>   * addresses on the "stack", returning the new stack pointer value.
>   */
> -static unsigned long __user *create_aout_tables(char __user *p, struct 
> linux_binprm * bprm)
> +static unsigned long __user *create_aout_tables(char __user *p,
> +   struct linux_binprm *bprm)
>  {
> char __user * __user *argv;
> char __user * __user *envp;
> @@ -150,7 +154,8 @@ static unsigned long __user *create_aout_tables(char 
> __user *p, struct linux_bin
> int argc = bprm->argc;
> int envc = bprm->envc;
>
> -   sp = (void __user *)((-(unsigned long)sizeof(char *)) & (unsigned 
> long) p);
> +   sp = (void __user *) ((-(unsigned long) sizeof(char *))
> + & (unsigned long) p);
>  #ifdef __alpha__
>  /* whee.. test-programs are so much fun. */
> put_user(0, --sp);
> @@ -169,28 +174,28 @@ static unsigned long __user *create_aout_tables(char 
> __user *p, struct linux_bin
> sp -= argc+1;
> argv = (char __user * __user *) sp;
>  #ifndef __alpha__
> -   put_user((unsigned long) envp,--sp);
> -   put_user((unsigned long) argv,--sp);
> +   put_user((unsigned long) envp, --sp);
> +   put_user((unsigned long) argv, --sp);
>  #endif
> -   put_user(argc,--sp);
> +   put_user(argc, --sp);
> current->mm->arg_start = (unsigned long) p;
> -   while (argc-->0) {
> +   while (argc-- > 0) {
> char c;
> -   put_user(p,argv++);
> +   put_user(p, argv++);
> do {
> -   get_user(c,p++);
> +   get_user(c, p++);
> } while (c);
> }
> -   put_user(NULL,argv);
> +   put_user(NULL, argv);
> current->mm->arg_end = current->mm->env_start = (unsigned long) p;
> -   while (envc-->0) {
> +   while (envc-- > 0) {
> char c;
> -   put_user(p,envp++);
> +   put_user(p, envp++);
> do {
> -   get_user(c,p++);
> +   get_user(c, p++);
> } while (c);
> }
> -   put_user(NULL,envp);
> +   put_user(NULL, envp);
> current->mm->env_end = (unsigned long) p;
> return sp;
>  }
> @@ -200,7 +205,7 @@ static unsigned long __user *create_aout_tables(char 
> __user *p, struct linux_bin
>   * libraries.  There is no binary dependent code anywhere else.
>   */
>
> -static int load_aout_binary(struct linux_binprm * bprm)
> +static int load_aout_binary(struct linux_binprm *bprm)
>  {
> struct pt_regs *regs = 

[PATCH] 9p: remove useless assignment

2013-09-28 Thread Geyslan G. Bem
There is no use of pointer 'fid' before the next assignment.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 94de6d1..7566477 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -867,7 +867,7 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,
return finish_no_open(file, res);
 
err = 0;
-   fid = NULL;
+
v9ses = v9fs_inode2v9ses(dir);
perm = unixmode2p9mode(v9ses, mode);
fid = v9fs_create(v9ses, dir, dentry, NULL, perm,
-- 
1.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] 9p: remove unused 'super_block' struct pointer

2013-09-28 Thread Geyslan G. Bem
Get rid of the useless '*sb' variable.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_inode.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 7566477..b352457 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -783,7 +783,6 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
  unsigned int flags)
 {
struct dentry *res;
-   struct super_block *sb;
struct v9fs_session_info *v9ses;
struct p9_fid *dfid, *fid;
struct inode *inode;
@@ -795,7 +794,6 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
if (dentry->d_name.len > NAME_MAX)
return ERR_PTR(-ENAMETOOLONG);
 
-   sb = dir->i_sb;
v9ses = v9fs_inode2v9ses(dir);
/* We can walk d_parent because we hold the dir->i_mutex */
dfid = v9fs_fid_lookup(dentry->d_parent);
-- 
1.8.4



[PATCH] 9p: remove never used return variable

2013-09-28 Thread Geyslan G. Bem
Get rid of the useless 'err' variable, since the return value is handled
further down without using it.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_inode_dotl.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index a7c4814..a892c2f 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -477,13 +477,11 @@ static int
 v9fs_vfs_getattr_dotl(struct vfsmount *mnt, struct dentry *dentry,
 struct kstat *stat)
 {
-   int err;
struct v9fs_session_info *v9ses;
struct p9_fid *fid;
struct p9_stat_dotl *st;
 
p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
-   err = -EPERM;
v9ses = v9fs_dentry2v9ses(dentry);
if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
generic_fillattr(dentry->d_inode, stat);
-- 
1.8.4



[PATCH 1/1] fs/binfmt_elf.c: fill_note_info: Reduce scope of a variable

2013-09-28 Thread Geyslan G. Bem
The *t variable in the fill_note_info function is only used if
siginfo->si_signo isn't 0. Move the "t" pointer into that inner scope.

Tested.

Signed-off-by: Geyslan G. Bem 
---
 fs/binfmt_elf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 100edcc..d0ac15b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1848,12 +1848,11 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
  struct elf_note_info *info,
  siginfo_t *siginfo, struct pt_regs *regs)
 {
-   struct list_head *t;
-
if (!elf_note_info_init(info))
return 0;
 
if (siginfo->si_signo) {
+   struct list_head *t;
struct core_thread *ct;
struct elf_thread_status *ets;
 
-- 
1.8.4
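
For readers less familiar with the pattern, here is a minimal user-space sketch of the same kind of scope reduction. It is not the binfmt_elf code; struct node and sum_if_enabled() are hypothetical names made up for illustration. The iterator is declared inside the only branch that uses it, so the compiler and the reader both see that it has no meaning elsewhere.

```c
#include <stddef.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
	struct node *next;
	int value;
};

/* Sum the list only when enabled is non-zero; otherwise return 0. */
static int sum_if_enabled(const struct node *head, int enabled)
{
	int total = 0;

	if (enabled) {
		const struct node *t;	/* only needed inside this branch */

		for (t = head; t; t = t->next)
			total += t->value;
	}
	return total;
}
```

The behaviour is unchanged compared to declaring t at function scope; only the lifetime of the name is narrowed.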



[PATCH v3 1/2] binfmt_aout: x86: Useless inode var, printks coding style fixes

2013-09-28 Thread Geyslan G. Bem
The inode is used only once (to read the file size), so remove the
useless local variable.
It's not necessary to verify f_op in load_aout_library, since the
prior kernel_read/vfs_read call already does so.
Made coding style fixes and printk replacements.

Tested using qemu, a handcrafted a.out binary and an a.out linked with a
cross-compiled ld.

Signed-off-by: Geyslan G. Bem 
---
 fs/binfmt_aout.c | 98 +++-
 1 file changed, 48 insertions(+), 50 deletions(-)

diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 89dec7f..c732b8e 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -25,13 +25,14 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
-#include 
 #include 
 #include 
 
 static int load_aout_binary(struct linux_binprm *);
-static int load_aout_library(struct file*);
+static int load_aout_library(struct file *);
 
 #ifdef CONFIG_COREDUMP
 /*
@@ -62,7 +63,7 @@ static int aout_core_dump(struct coredump_params *cprm)
fs = get_fs();
set_fs(KERNEL_DS);
has_dumped = 1;
-   strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
+   strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
dump.u_ar0 = offsetof(struct user, regs);
dump.signal = cprm->siginfo->si_signo;
aout_dump_thread(cprm->regs, &dump);
@@ -78,9 +79,11 @@ static int aout_core_dump(struct coredump_params *cprm)
 
 /* make sure we actually have a data and stack area to dump */
set_fs(USER_DS);
-   if (!access_ok(VERIFY_READ, START_DATA(dump), dump.u_dsize << PAGE_SHIFT))
+   if (!access_ok(VERIFY_READ, START_DATA(dump),
+  dump.u_dsize << PAGE_SHIFT))
dump.u_dsize = 0;
-   if (!access_ok(VERIFY_READ, START_STACK(dump), dump.u_ssize << PAGE_SHIFT))
+   if (!access_ok(VERIFY_READ, START_STACK(dump),
+  dump.u_ssize << PAGE_SHIFT))
dump.u_ssize = 0;
 
set_fs(KERNEL_DS);
@@ -142,7 +145,8 @@ static int set_brk(unsigned long start, unsigned long end)
  * memory and creates the pointer tables from them, and puts their
  * addresses on the "stack", returning the new stack pointer value.
  */
-static unsigned long __user *create_aout_tables(char __user *p, struct linux_binprm * bprm)
+static unsigned long __user *create_aout_tables(char __user *p,
+   struct linux_binprm *bprm)
 {
char __user * __user *argv;
char __user * __user *envp;
@@ -150,7 +154,8 @@ static unsigned long __user *create_aout_tables(char __user *p, struct linux_bin
int argc = bprm->argc;
int envc = bprm->envc;
 
-   sp = (void __user *)((-(unsigned long)sizeof(char *)) & (unsigned long) p);
+   sp = (void __user *) ((-(unsigned long) sizeof(char *))
+ & (unsigned long) p);
 #ifdef __alpha__
 /* whee.. test-programs are so much fun. */
put_user(0, --sp);
@@ -169,28 +174,28 @@ static unsigned long __user *create_aout_tables(char __user *p, struct linux_bin
sp -= argc+1;
argv = (char __user * __user *) sp;
 #ifndef __alpha__
-   put_user((unsigned long) envp,--sp);
-   put_user((unsigned long) argv,--sp);
+   put_user((unsigned long) envp, --sp);
+   put_user((unsigned long) argv, --sp);
 #endif
-   put_user(argc,--sp);
+   put_user(argc, --sp);
current->mm->arg_start = (unsigned long) p;
-   while (argc-->0) {
+   while (argc-- > 0) {
char c;
-   put_user(p,argv++);
+   put_user(p, argv++);
do {
-   get_user(c,p++);
+   get_user(c, p++);
} while (c);
}
-   put_user(NULL,argv);
+   put_user(NULL, argv);
current->mm->arg_end = current->mm->env_start = (unsigned long) p;
-   while (envc-->0) {
+   while (envc-- > 0) {
char c;
-   put_user(p,envp++);
+   put_user(p, envp++);
do {
-   get_user(c,p++);
+   get_user(c, p++);
} while (c);
}
-   put_user(NULL,envp);
+   put_user(NULL, envp);
current->mm->env_end = (unsigned long) p;
return sp;
 }
@@ -200,7 +205,7 @@ static unsigned long __user *create_aout_tables(char __user *p, struct linux_bin
  * libraries.  There is no binary dependent code anywhere else.
  */
 
-static int load_aout_binary(struct linux_binprm * bprm)
+static int load_aout_binary(struct linux_binprm *bprm)
 {
struct pt_regs *regs = current_pt_regs();
struct exec ex;
@@ -213,7 +218,8 @@ static int load_aout_binary(struct linux_binprm * bprm)
if ((N_MAGIC(ex) != ZMAGIC && N_MAGIC(ex) != OMAGIC &&
 N_MAGIC(ex) != QMAGIC && N_MAGIC(ex) != NMAGIC) ||
N_TRSIZE(ex) || N_DRSIZE(ex) ||
-   i_size_read(file_inode(bprm->file)) < 
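
A note on the stack pointer expression touched in create_aout_tables above: masking with -(unsigned long)sizeof(char *) rounds the address down to a pointer-size boundary, because for a power-of-two size the negated value equals ~(size - 1) in unsigned arithmetic. Below is a small user-space sketch of that arithmetic; align_down() is a hypothetical helper written for this note, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Round addr down to a multiple of align. align is assumed to be a
 * power of two; -(uintptr_t)align is then the same bit pattern as
 * ~(align - 1), i.e. a mask that clears the low bits.
 */
static uintptr_t align_down(uintptr_t addr, size_t align)
{
	return addr & -(uintptr_t)align;
}
```

For example, with align = 8 the mask clears the three low bits, so 0x1007 becomes 0x1000 while 0x1008 is left unchanged.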

[PATCH v3 2/2] ia32_aout: x86_64: Add safe check in a.out loaders, printks, coding style fixes

2013-09-28 Thread Geyslan G. Bem
ia32_aout had no safety checks on f_op and its mmap handler in this module.
It's not necessary to verify f_op in load_aout_library, since the
prior kernel_read/vfs_read call already does so.
Made coding style fixes and printk replacements.

Tested using qemu, a handcrafted a.out binary and an a.out linked with a
cross-compiled ld.

Signed-off-by: Geyslan G. Bem 
---
 arch/x86/ia32/ia32_aout.c | 63 +++
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index bae3aba..87d5114 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -24,9 +24,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 
-#include 
 #include 
 #include 
 #include 
@@ -224,9 +224,9 @@ static u32 __user *create_aout_tables(char __user *p, struct linux_binprm *bprm)
int argc = bprm->argc, envc = bprm->envc;
 
sp = (u32 __user *) ((-(unsigned long)sizeof(u32)) & (unsigned long) p);
-   sp -= envc+1;
+   sp -= envc + 1;
envp = sp;
-   sp -= argc+1;
+   sp -= argc + 1;
argv = sp;
put_user((unsigned long) envp, --sp);
put_user((unsigned long) argv, --sp);
@@ -271,10 +271,17 @@ static int load_aout_binary(struct linux_binprm *bprm)
 N_MAGIC(ex) != QMAGIC && N_MAGIC(ex) != NMAGIC) ||
N_TRSIZE(ex) || N_DRSIZE(ex) ||
i_size_read(file_inode(bprm->file)) <
-   ex.a_text+ex.a_data+N_SYMSIZE(ex)+N_TXTOFF(ex)) {
+   ex.a_text + ex.a_data + N_SYMSIZE(ex) + N_TXTOFF(ex)) {
return -ENOEXEC;
}
 
+   /*
+* Requires a mmap handler. This prevents people from using a.out
+* as part of an exploit attack against /proc-related vulnerabilities.
+*/
+   if (!bprm->file->f_op || !bprm->file->f_op->mmap)
+   return -ENOEXEC;
+
fd_offset = N_TXTOFF(ex);
 
/* Check initial limits. This avoids letting people circumvent
@@ -322,7 +329,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
unsigned long text_addr, map_size;
 
text_addr = N_TXTADDR(ex);
-   map_size = ex.a_text+ex.a_data;
+   map_size = ex.a_text + ex.a_data;
 
error = vm_brk(text_addr & PAGE_MASK, map_size);
 
@@ -339,28 +346,19 @@ static int load_aout_binary(struct linux_binprm *bprm)
}
} else {
 #ifdef WARN_OLD
-   static unsigned long error_time, error_time2;
if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
-   (N_MAGIC(ex) != NMAGIC) &&
-   time_after(jiffies, error_time2 + 5*HZ)) {
-   printk(KERN_NOTICE "executable not page aligned\n");
-   error_time2 = jiffies;
-   }
+   (N_MAGIC(ex) != NMAGIC))
+   pr_notice_ratelimited("executable not page aligned\n");
 
-   if ((fd_offset & ~PAGE_MASK) != 0 &&
-   time_after(jiffies, error_time + 5*HZ)) {
-   printk(KERN_WARNING
-  "fd_offset is not page aligned. Please convert "
-  "program: %s\n",
-  bprm->file->f_path.dentry->d_name.name);
-   error_time = jiffies;
-   }
+   if ((fd_offset & ~PAGE_MASK) != 0)
+   pr_warn_ratelimited("fd_offset is not page aligned. Please convert program: %s\n",
+   bprm->file->f_path.dentry->d_name.name);
 #endif
 
-   if (!bprm->file->f_op->mmap || (fd_offset & ~PAGE_MASK) != 0) {
-   vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
+   if ((fd_offset & ~PAGE_MASK) != 0) {
+   vm_brk(N_TXTADDR(ex), ex.a_text + ex.a_data);
read_code(bprm->file, N_TXTADDR(ex), fd_offset,
-   ex.a_text+ex.a_data);
+   ex.a_text + ex.a_data);
goto beyond_if;
}
 
@@ -424,10 +422,17 @@ static int load_aout_library(struct file *file)
if ((N_MAGIC(ex) != ZMAGIC && N_MAGIC(ex) != QMAGIC) || N_TRSIZE(ex) ||
N_DRSIZE(ex) || ((ex.a_entry & 0xfff) && N_MAGIC(ex) == ZMAGIC) ||
i_size_read(file_inode(file)) <
-   ex.a_text+ex.a_data+N_SYMSIZE(ex)+N_TXTOFF(ex)) {
+   ex.a_text + ex.a_data + N_SYMSIZE(ex) + N_TXTOFF(ex)) {
goto out;
}
 
+   /*
+* Requires a mmap handler. This prevents people from using a.out
+* as part of an exploit attack against /proc-related vulnerabilities.
+*/
+   if (!file->f_op->mmap)
+   goto out;
+
if (N_FLAGS(ex))
goto out;
 
@@ -438,14 +443,8 @@ static int 
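
The WARN_OLD hunk above drops the open-coded time_after(jiffies, error_time + 5*HZ) checks in favour of pr_notice_ratelimited()/pr_warn_ratelimited(). As a rough illustration of what such a limiter does, here is a user-space sketch. It is not the kernel implementation: struct ratelimit and ratelimit_ok() are hypothetical names, and the caller passes in the current time so the logic stays deterministic.

```c
#include <stdbool.h>

/* Hypothetical limiter state; fields are illustrative only. */
struct ratelimit {
	long interval;	/* window length, in the caller's time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	bool started;	/* false until the first call */
};

/* Return true if a message may be emitted at time now. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;	/* open a new window */
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed >= rs->burst)
		return false;		/* over budget: suppress */
	rs->printed++;
	return true;
}
```

With interval = 5 and burst = 1 this behaves roughly like the removed once-per-5-seconds check. As far as I recall, the kernel helpers default to a larger burst per interval, so treat the exact numbers here as an assumption.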

[PATCH] 9p: fix return value in case of error in v9fs_fid_xattr_set

2013-09-28 Thread Geyslan G. Bem
If p9_client_write fails, the function v9fs_fid_xattr_set should return
its negative error value, which was never being done.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/xattr.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index 3c28cdf..0788388 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -149,11 +149,10 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
write_count = value_len;
write_count = p9_client_write(fid, ((char *)value)+offset,
NULL, offset, write_count);
-   if (write_count < 0) {
-   /* error in xattr write */
-   retval = write_count;
-   break;
-   }
+   /* error in xattr write */
+   if (write_count < 0)
+   return write_count;
+
offset += write_count;
value_len -= write_count;
}
-- 
1.8.4
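
To make the intended behaviour concrete, here is a minimal user-space sketch of a chunked write loop that hands a negative return from the writer straight back to the caller. This is not the 9p code: chunk_writer, write_all() and the two demo writers are hypothetical names invented for this example, and treating a zero-byte write as -EIO is an assumed policy.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a client write call: returns the number of
 * bytes accepted, or a negative errno-style value on failure. */
typedef int (*chunk_writer)(const char *buf, size_t offset, size_t count);

/*
 * Write value_len bytes in chunks of at most chunk_size (must be > 0).
 * Returns 0 on success or a negative error code.
 */
static int write_all(chunk_writer wr, const char *value, size_t value_len,
		     size_t chunk_size)
{
	size_t offset = 0;

	while (value_len) {
		size_t want = value_len < chunk_size ? value_len : chunk_size;
		int n = wr(value + offset, offset, want);

		if (n < 0)
			return n;	/* propagate the writer's error */
		if (n == 0 || (size_t)n > want)
			return -EIO;	/* no progress or bogus count (assumed policy) */

		offset += (size_t)n;
		value_len -= (size_t)n;
	}
	return 0;
}

/* Demo writer: accepts at most 3 bytes per call and never fails. */
static int short_writer(const char *buf, size_t offset, size_t count)
{
	(void)buf;
	(void)offset;
	return count > 3 ? 3 : (int)count;
}

/* Demo writer: fails with -ENOSPC once 4 bytes have been accepted. */
static int failing_writer(const char *buf, size_t offset, size_t count)
{
	(void)buf;
	if (offset >= 4)
		return -ENOSPC;
	return count > 4 ? 4 : (int)count;
}
```

The point of the sketch is only that the caller sees the writer's error code instead of a success value.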



[PATCH] 9p: proper use of the 'name' variable

2013-09-28 Thread Geyslan G. Bem
The 'name' variable was assigned but never used. Move its
assignment to the top and make proper use of its value.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_inode_dotl.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 6803758..86b6944 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -772,8 +772,10 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
struct p9_fid *dfid, *oldfid;
struct v9fs_session_info *v9ses;
 
+   name = (char *) dentry->d_name.name;
+
p9_debug(P9_DEBUG_VFS, "dir ino: %lu, old_name: %s, new_name: %s\n",
-dir->i_ino, old_dentry->d_name.name, dentry->d_name.name);
+dir->i_ino, old_dentry->d_name.name, name);
 
v9ses = v9fs_inode2v9ses(dir);
dir_dentry = dentry->d_parent;
@@ -785,9 +787,7 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
if (IS_ERR(oldfid))
return PTR_ERR(oldfid);
 
-   name = (char *) dentry->d_name.name;
-
-   err = p9_client_link(dfid, oldfid, (char *)dentry->d_name.name);
+   err = p9_client_link(dfid, oldfid, name);
 
if (err < 0) {
p9_debug(P9_DEBUG_VFS, "p9_client_link failed %d\n", err);
-- 
1.8.4



[PATCH] 9p: remove useless variable and assignment

2013-09-28 Thread Geyslan G. Bem
The 'v9ses' pointer is never used. Also get rid of the useless 'retval'
assignment.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_inode_dotl.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index a892c2f..6803758 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -558,7 +558,6 @@ static int v9fs_mapped_iattr_valid(int iattr_valid)
 int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 {
int retval;
-   struct v9fs_session_info *v9ses;
struct p9_fid *fid;
struct p9_iattr_dotl p9attr;
struct inode *inode = dentry->d_inode;
@@ -579,8 +578,6 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
p9attr.mtime_sec = iattr->ia_mtime.tv_sec;
p9attr.mtime_nsec = iattr->ia_mtime.tv_nsec;
 
-   retval = -EPERM;
-   v9ses = v9fs_dentry2v9ses(dentry);
fid = v9fs_fid_lookup(dentry);
if (IS_ERR(fid))
return PTR_ERR(fid);
-- 
1.8.4



[PATCH] fs: exec.c: Coding style sanitization

2013-09-28 Thread Geyslan G. Bem
Adjustments based on checkpatch.pl output.

Tested.

Signed-off-by: Geyslan G. Bem 
---
 fs/exec.c | 92 ++-
 1 file changed, 49 insertions(+), 43 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8875dd1..b5c6086 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -19,7 +19,7 @@
  * current->executable is only used by the procfs.  This allows a dispatch
  * table to check for several different types  of binary formats.  We keep
  * trying until we recognize the file or we run out of supported binary
- * formats. 
+ * formats.
  */
 
 #include 
@@ -55,8 +55,8 @@
 #include 
 #include 
 #include 
+#include 
 
-#include 
 #include 
 #include 
 
@@ -71,7 +71,7 @@ int suid_dumpable = 0;
 static LIST_HEAD(formats);
 static DEFINE_RWLOCK(binfmt_lock);
 
-void __register_binfmt(struct linux_binfmt * fmt, int insert)
+void __register_binfmt(struct linux_binfmt *fmt, int insert)
 {
BUG_ON(!fmt);
if (WARN_ON(!fmt->load_binary))
@@ -81,19 +81,17 @@ void __register_binfmt(struct linux_binfmt * fmt, int insert)
 list_add_tail(&fmt->lh, &formats);
write_unlock(&binfmt_lock);
 }
-
 EXPORT_SYMBOL(__register_binfmt);
 
-void unregister_binfmt(struct linux_binfmt * fmt)
+void unregister_binfmt(struct linux_binfmt *fmt)
 {
write_lock(&binfmt_lock);
list_del(&fmt->lh);
write_unlock(&binfmt_lock);
 }
-
 EXPORT_SYMBOL(unregister_binfmt);
 
-static inline void put_binfmt(struct linux_binfmt * fmt)
+static inline void put_binfmt(struct linux_binfmt *fmt)
 {
module_put(fmt->module);
 }
@@ -136,8 +134,8 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
fsnotify_open(file);
 
error = -ENOEXEC;
-   if(file->f_op) {
-   struct linux_binfmt * fmt;
+   if (file->f_op) {
+   struct linux_binfmt *fmt;
 
read_lock(&binfmt_lock);
list_for_each_entry(fmt, &formats, lh) {
@@ -154,10 +152,11 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
}
read_unlock(&binfmt_lock);
}
+
 exit:
fput(file);
 out:
-   return error;
+   return error;
 }
 
 #ifdef CONFIG_MMU
@@ -180,7 +179,7 @@ static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
 }
 
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
-   int write)
+int write)
 {
struct page *page;
int ret;
@@ -241,7 +240,7 @@ static void free_arg_pages(struct linux_binprm *bprm)
 }
 
 static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
-   struct page *page)
+  struct page *page)
 {
flush_cache_page(bprm->vma, pos, page_to_pfn(page));
 }
@@ -268,7 +267,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
-   vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
+   vma->vm_flags = (VM_SOFTDIRTY |
+VM_STACK_FLAGS |
+VM_STACK_INCOMPLETE_SETUP);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
INIT_LIST_HEAD(&vma->anon_vma_chain);
 
@@ -299,7 +300,7 @@ static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
 }
 
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
-   int write)
+int write)
 {
struct page *page;
 
@@ -335,7 +336,7 @@ static void free_arg_pages(struct linux_binprm *bprm)
 }
 
 static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos,
-   struct page *page)
+  struct page *page)
 {
 }
 
@@ -616,7 +617,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 * when the old and new regions overlap clear from new_end.
 */
free_pgd_range(&tlb, new_end, old_end, new_end,
-   vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
+  vma->vm_next
+  ? vma->vm_next->vm_start
+  : USER_PGTABLES_CEILING);
} else {
/*
 * otherwise, clean from old_start; this is done to not touch
@@ -625,7 +628,9 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 * for the others its just a little faster.
 */
free_pgd_range(&tlb, old_start, old_end, new_end,
-   vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
+  vma->vm_next
+  ? vma->vm_next->vm_start
+  : USER_PGTABLES_CEILING);
}
tlb_finish_mmu(&tlb, old_start, old_end);
 

[PATCH] 9p: remove unused 'p9_fid' struct pointer

2013-09-28 Thread Geyslan G. Bem
Get rid of the useless '*fid' variable.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/cache.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index a9ea73d..50f9d9c 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -239,13 +239,12 @@ void v9fs_cache_inode_flush_cookie(struct inode *inode)
 void v9fs_cache_inode_set_cookie(struct inode *inode, struct file *filp)
 {
struct v9fs_inode *v9inode = V9FS_I(inode);
-   struct p9_fid *fid;
 
if (!v9inode->fscache)
return;
 
spin_lock(&v9inode->fscache_lock);
-   fid = filp->private_data;
+
if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
v9fs_cache_inode_flush_cookie(inode);
else
-- 
1.8.4



[PATCH] 9p: remove unused 'p9_client' struct pointer

2013-09-28 Thread Geyslan G. Bem
Get rid of the useless '*clnt' variable.

Signed-off-by: Geyslan G. Bem 
---
 fs/9p/vfs_file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index aa5ecf4..42b7286 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -463,14 +463,12 @@ v9fs_file_write_internal(struct inode *inode, struct p9_fid *fid,
int n;
loff_t i_size;
size_t total = 0;
-   struct p9_client *clnt;
loff_t origin = *offset;
unsigned long pg_start, pg_end;
 
p9_debug(P9_DEBUG_VFS, "data %p count %d offset %x\n",
 data, (int)count, (int)*offset);
 
-   clnt = fid->clnt;
do {
n = p9_client_write(fid, NULL, data+total, origin+total, count);
if (n <= 0)
-- 
1.8.4



Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread H. Peter Anvin
For negative numbers presumably we need to d++ for the minus sign, no?

Borislav Petkov  wrote:
>On Sat, Sep 28, 2013 at 09:58:33PM +0200, Ingo Molnar wrote:
>> 
>> * Borislav Petkov  wrote:
>> 
>> > On Sat, Sep 28, 2013 at 09:49:27PM +0200, Borislav Petkov wrote:
>> > > And yes, that one works too.
>> > 
>> > Btw, just to be thorough, we could handle negative numbers too:
>> > 
>> > int num_digits(int val)
>> > {
>> > int m = 10;
>> > int d = 1;
>> > 
>> > if (val < 0)
>> > val = -val;
>> > 
>> > while (val >= m) {
>> > m *= 10;
>> > d++;
>> > }
>> > 
>> > return d;
>> > }
>> 
>> I like the cell phone version better!
>
>This *is* the cell phone version + negative numbers handling.
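
To make the minus sign point concrete, here is a minimal user-space sketch of a num_digits() that also counts the '-' column. It is a variant for illustration, not the code from the patch: it divides val down instead of multiplying m up (so it cannot overflow for large values), widens to long long before negating, and the check_num_digits() helper is a hypothetical cross-check against snprintf().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Number of characters "%d" would print for val, counting the '-' of a
 * negative number.  Illustrative variant, not the code from the patch.
 */
int num_digits(int val)
{
	long long v = val;	/* widened so that negating INT_MIN is safe */
	int d = 1;

	if (v < 0) {
		v = -v;
		d++;		/* one extra column for the minus sign */
	}

	while (v >= 10) {
		v /= 10;
		d++;
	}

	return d;
}

/* Hypothetical helper: cross-check against what snprintf() reports. */
void check_num_digits(int val)
{
	assert(num_digits(val) == snprintf(NULL, 0, "%d", val));
}
```

Whether the sign should be counted depends on what the caller does with the result; for column alignment of printed numbers it presumably should be.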

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [RFC PATCH] mfd: arizona: Update device tree regulator bindings

2013-09-28 Thread Mark Brown
On Sat, Sep 28, 2013 at 04:53:08PM +0100, Charles Keepax wrote:

> > I don't see why this would only be an issue for ASoC - it happens to

> > I would suggest that rather than unconditionally doing this on lookup
> > it'd be easier to do it the other way round and explicitly add mappings
> > from the parent to the child when registering the device.  That doesn't
> > have the potential to explode and get the wrong thing like this does.

> Been a bit delayed on this as I have been investigating the
> situation a little further. Is it just moving the supplies of the
> parent node we object to? Or do we want to keep everything tidied

No, having the supplies bound to the parent is desired (especially given
that there isn't a child node) - it's the fact that you're bodging this
in the framework by just randomly peering at the parent device and
hoping it's an MFD parent when a lookup fails.  That's not a safe thing
to do.  

Like I said in the quote above trying to handle this in the child isn't
a good approach, it's both more idiomatic and more robust to put the
mappings from the parent device to the child devices in when creating
the child devices.

> onto the one device tree node? The reason I ask is because

Yes, for a device like this.

> looking at this further we run into other problems, the MFD
> framework is clearly intended to have a sub-node for each device.

No it isn't, see recent discussions on the list and look at the existing
drivers.  This is supported because for some devices it makes sense to
split things up but it's not something that needs to be used for every
device, it's useful when for example a device embeds a reusable generic
IP.  If you're ending up with the subdevices defined with the same level
of granularity as the parent device then that's not happening, and
remember that for these devices you do also need to support non-DT
systems.

> For example, the GPIO driver has a similar issue if anything else
> wishes to use an Arizona device's GPIO, because the GPIO driver
> is on a different device to the MFD so again it can't locate it.
> I haven't checked yet but I am guessing there will be similar
> issues with the interrupts.

No, this isn't an issue at all.  Look at how the regulator API resolves
DT lookups for example, the structure of the driver offering the service
should have no impact on anything referencing it.  The fact that Linux
happens to split things up into a particular set of subsystems at the
current time should have no bearing on the way that the DT bindings are
written since that's just a detail of how Linux works.

Linux (or any other operating system) can always decide to refactor
later on, something we take advantage of relatively often.  For example
with these devices the clocks are currently described as part of the
audio CODEC but should probably move over to the clock API in future,
once that's generally available, since they're shared within the
device and can be brought out of the device to other system components.
The device tree for a system using the device shouldn't need to change
when that happens since nothing about the hardware changed.


signature.asc
Description: Digital signature


Re: [PATCH 4/5] ASoC: samsung: Use CONFIG_ARCH_S3C64XX to check for S3C64xx support

2013-09-28 Thread Mark Brown
On Sat, Sep 28, 2013 at 08:21:36PM +0200, Tomasz Figa wrote:
> Since CONFIG_PLAT_S3C64XX is going to be removed, this patch modifies
> the s3c-i2s-v2 driver to use the proper way of checking for S3C64xx
> support - CONFIG_ARCH_S3C64XX.

Acked-by: Mark Brown 


signature.asc
Description: Digital signature



Re: [PATCH] misc: cs5535-mfgpt: Replace 'module_init' with 'module_platform_driver'

2013-09-28 Thread Richard Weinberger
On 29.09.2013 00:14, Elad Wexler wrote:
> On Sat, Sep 28, 2013 at 08:54:55PM +0200, richard -rw- weinberger wrote:
>> On Sat, Sep 28, 2013 at 8:14 PM,   wrote:
>>> From: Elad Wexler 
>>>
>>> Driver doesn't do anything special in 'module_init'.
>>>
>>> 'module_platform_driver' makes the code more readable.
>>>
>>> Signed-off-by: Elad Wexler 
>>
>> NAK.
>>
>> By moving to module_platform_driver() you make it possible to unload this module.
>> Currently it cannot be unloaded, for good reasons.
>> The cs5535-mfgpt chip is nasty because it has no reliable reset function.
>> Therefore by unloading and loading the module again bad things can happen.
>>
> I think this HW information should be well documented,
> explaining why the driver doesn't implement 'module_exit'.

Usually if a driver does not implement the exit function, it's because of
hardware limitations.
In other words, a missing exit function should ring alarm bells. ;-)

Thanks,
//richard


Re: [PATCHSET] sysfs: use seq_file and unify regular and bin file handling

2013-09-28 Thread Tejun Heo
Hey, again.

On Sat, Sep 28, 2013 at 05:49:30PM -0400, Tejun Heo wrote:
>  0001-sysfs-remove-unused-sysfs_buffer-pos.patch
>  0002-sysfs-remove-sysfs_buffer-needs_read_fill.patch
>  0003-sysfs-remove-sysfs_buffer-ops.patch
>  0004-sysfs-add-sysfs_open_file_mutex.patch
>  0005-sysfs-rename-sysfs_buffer-to-sysfs_open_file.patch
>  0006-sysfs-add-sysfs_open_file-sd-and-file.patch
>  0007-sysfs-use-transient-write-buffer.patch
>  0008-sysfs-use-seq_file-when-reading-regular-files.patch
>  0009-sysfs-prepare-llseek-path-for-unified-regular-bin-fi.patch
>  0010-sysfs-prepare-path-write-for-unified-regular-bin-fil.patch
>  0011-sysfs-prepare-read-path-for-unified-regular-bin-file.patch
>  0012-sysfs-copy-bin-mmap-support-from-fs-sysfs-bin.c-to-f.patch
>  0013-sysfs-prepare-open-path-for-unified-regular-bin-file.patch
>  0014-sysfs-merge-regular-and-bin-file-handling.patch

On second thought, 0011 seems too dangerous, especially for PCI IO
BAR regions.  Grumble, looks like I'll have to break out the bin read
path.  Please ignore patches >= 0009.  I'll update them and repost.

Thanks!

-- 
tejun


Re: [PATCH] misc: cs5535-mfgpt: Replace 'module_init' with 'module_platform_driver'

2013-09-28 Thread Elad Wexler
On Sat, Sep 28, 2013 at 08:54:55PM +0200, richard -rw- weinberger wrote:
> On Sat, Sep 28, 2013 at 8:14 PM,   wrote:
> > From: Elad Wexler 
> >
> > Driver doesn't do anything special in 'module_init'.
> >
> > 'module_platform_driver' makes the code more readable.
> >
> > Signed-off-by: Elad Wexler 
> 
> NAK.
> 
> By moving to module_platform_driver() you make it possible to unload this module.
> Currently it cannot be unloaded, for good reasons.
> The cs5535-mfgpt chip is nasty because it has no reliable reset function.
> Therefore by unloading and loading the module again bad things can happen.
> 
I think this HW information should be well documented,
explaining why the driver doesn't implement 'module_exit'.

> -- 
> Thanks,
> //richard


[PATCH 02/14] sysfs: remove sysfs_buffer->needs_read_fill

2013-09-28 Thread Tejun Heo
->needs_read_fill is used to implement the following behaviors.

1. Ensure buffer filling on the first read.
2. Force buffer filling after a write.
3. Force buffer filling after a successful poll.

However, #2 and #3 don't really work as sysfs doesn't reset file
position.  While the read buffer would be refilled, the next read
would continue from the position after the last read or write,
requiring an explicit seek to the start for it to be useful, which
makes ->needs_read_fill superfluous as the read buffer is always refilled
if f_pos == 0.

Update sysfs_read_file() to test buffer->page for #1 instead and
remove ->needs_read_fill.  While this changes behavior in extreme
corner cases - e.g. re-reading a sysfs file after seeking to non-zero
position after a write or poll, it's highly unlikely to lead to actual
breakage.  This change is to prepare for using seq_file in the read
path.

While at it, reformat a comment in fill_write_buffer().
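
The refill rule described above can be modelled in user space. The following is a rough sketch under stated assumptions, not the kernel code: struct model_buffer, model_fill() and model_read() are hypothetical stand-ins for sysfs_buffer, fill_read_buffer() and sysfs_read_file(), the attribute is just a string passed in by the caller, and MODEL_PAGE_SIZE replaces PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* stand-in for PAGE_SIZE */

/* Hypothetical user-space stand-in for struct sysfs_buffer. */
struct model_buffer {
	char page[MODEL_PAGE_SIZE];
	int has_page;		/* models buffer->page != NULL */
	size_t count;		/* bytes produced by the last fill */
	int fills;		/* how many times the attribute was sampled */
};

/* Models fill_read_buffer(): snapshot the attribute's current value. */
static void model_fill(struct model_buffer *b, const char *attr)
{
	size_t len = strlen(attr);

	assert(len < MODEL_PAGE_SIZE);
	memcpy(b->page, attr, len);
	b->has_page = 1;
	b->count = len;
	b->fills++;
}

/*
 * Models sysfs_read_file() after the patch: refill only at offset zero
 * or when nothing has been buffered yet, then copy from the snapshot.
 */
size_t model_read(struct model_buffer *b, const char *attr,
		  char *dst, size_t len, size_t *ppos)
{
	size_t avail;

	if (*ppos == 0 || !b->has_page)
		model_fill(b, attr);

	if (*ppos >= b->count)
		return 0;	/* end of the snapshot */

	avail = b->count - *ppos;
	if (len > avail)
		len = avail;
	memcpy(dst, b->page + *ppos, len);
	*ppos += len;
	return len;
}
```

In this model a second read without seeking back to offset zero returns nothing and does not resample the attribute, which is the behavior the changelog argues made forced refills after write or poll useless in practice.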

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
---
 fs/sysfs/file.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 81e3f72..e2fafc0 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -47,7 +47,6 @@ struct sysfs_buffer {
char*page;
const struct sysfs_ops  *ops;
struct mutexmutex;
-   int needs_read_fill;
int event;
struct list_headlist;
 };
@@ -95,12 +94,10 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
/* Try to struggle along */
count = PAGE_SIZE - 1;
}
-   if (count >= 0) {
-   buffer->needs_read_fill = 0;
+   if (count >= 0)
buffer->count = count;
-   } else {
+   else
ret = count;
-   }
return ret;
 }
 
@@ -130,7 +127,11 @@ sysfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
ssize_t retval = 0;
 
mutex_lock(&buffer->mutex);
-   if (buffer->needs_read_fill || *ppos == 0) {
+   /*
+* Fill on zero offset and the first read so that silly things like
+* "dd bs=1 skip=N" can work on sysfs files.
+*/
+   if (*ppos == 0 || !buffer->page) {
retval = fill_read_buffer(file->f_path.dentry, buffer);
if (retval)
goto out;
@@ -166,14 +167,15 @@ static int fill_write_buffer(struct sysfs_buffer *buffer,
if (count >= PAGE_SIZE)
count = PAGE_SIZE - 1;
error = copy_from_user(buffer->page, buf, count);
-   buffer->needs_read_fill = 1;
-   /* if buf is assumed to contain a string, terminate it by \0,
-  so e.g. sscanf() can scan the string easily */
+
+   /*
+* If buf is assumed to contain a string, terminate it by \0, so
+* e.g. sscanf() can scan the string easily.
+*/
buffer->page[count] = 0;
return error ? -EFAULT : count;
 }
 
-
 /**
  * flush_write_buffer - push buffer to kobject.
  * @dentry:dentry to the attribute
@@ -368,7 +370,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
goto err_out;
 
mutex_init(&buffer->mutex);
-   buffer->needs_read_fill = 1;
buffer->ops = ops;
file->private_data = buffer;
 
@@ -435,7 +436,6 @@ static unsigned int sysfs_poll(struct file *filp, poll_table *wait)
return DEFAULT_POLLMASK;
 
  trigger:
-   buffer->needs_read_fill = 1;
return DEFAULT_POLLMASK|POLLERR|POLLPRI;
 }
 
-- 
1.8.3.1



[PATCH 05/14] sysfs: rename sysfs_buffer to sysfs_open_file

2013-09-28 Thread Tejun Heo
sysfs read path will be converted to use seq_file which will handle
buffering making sysfs_buffer a misnomer.  Rename sysfs_buffer to
sysfs_open_file, and sysfs_open_dirent->buffers to ->files.

This patch is a pure rename.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 127 
 1 file changed, 63 insertions(+), 64 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 499cff8..4b55bcf 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -25,14 +25,14 @@
 #include "sysfs.h"
 
 /*
- * There's one sysfs_buffer for each open file and one sysfs_open_dirent
+ * There's one sysfs_open_file for each open file and one sysfs_open_dirent
  * for each sysfs_dirent with one or more open files.
  *
  * sysfs_dirent->s_attr.open points to sysfs_open_dirent.  s_attr.open is
  * protected by sysfs_open_dirent_lock.
  *
- * filp->private_data points to sysfs_buffer which is chained at
- * sysfs_open_dirent->buffers, which is protected by sysfs_open_file_mutex.
+ * filp->private_data points to sysfs_open_file which is chained at
+ * sysfs_open_dirent->files, which is protected by sysfs_open_file_mutex.
  */
 static DEFINE_SPINLOCK(sysfs_open_dirent_lock);
 static DEFINE_MUTEX(sysfs_open_file_mutex);
@@ -41,10 +41,10 @@ struct sysfs_open_dirent {
atomic_trefcnt;
atomic_tevent;
wait_queue_head_t   poll;
-   struct list_headbuffers; /* goes through sysfs_buffer.list */
+   struct list_headfiles; /* goes through sysfs_open_file.list */
 };
 
-struct sysfs_buffer {
+struct sysfs_open_file {
size_t  count;
char*page;
struct mutexmutex;
@@ -75,7 +75,7 @@ static const struct sysfs_ops *sysfs_file_ops(struct sysfs_dirent *sd)
  * This is called only once, on the file's first read unless an error
  * is returned.
  */
-static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
+static int fill_read_buffer(struct dentry *dentry, struct sysfs_open_file *of)
 {
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
@@ -83,19 +83,19 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
int ret = 0;
ssize_t count;
 
-   if (!buffer->page)
-   buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
-   if (!buffer->page)
+   if (!of->page)
+   of->page = (char *) get_zeroed_page(GFP_KERNEL);
+   if (!of->page)
return -ENOMEM;
 
/* need attr_sd for attr and ops, its parent for kobj */
if (!sysfs_get_active(attr_sd))
return -ENODEV;
 
-   buffer->event = atomic_read(&attr_sd->s_attr.open->event);
+   of->event = atomic_read(&attr_sd->s_attr.open->event);
 
ops = sysfs_file_ops(attr_sd);
-   count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);
+   count = ops->show(kobj, attr_sd->s_attr.attr, of->page);
 
sysfs_put_active(attr_sd);
 
@@ -110,7 +110,7 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
count = PAGE_SIZE - 1;
}
if (count >= 0)
-   buffer->count = count;
+   of->count = count;
else
ret = count;
return ret;
@@ -138,63 +138,62 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
 static ssize_t
 sysfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 {
-   struct sysfs_buffer *buffer = file->private_data;
+   struct sysfs_open_file *of = file->private_data;
ssize_t retval = 0;
 
-   mutex_lock(&buffer->mutex);
+   mutex_lock(&of->mutex);
/*
 * Fill on zero offset and the first read so that silly things like
 * "dd bs=1 skip=N" can work on sysfs files.
 */
-   if (*ppos == 0 || !buffer->page) {
-   retval = fill_read_buffer(file->f_path.dentry, buffer);
+   if (*ppos == 0 || !of->page) {
+   retval = fill_read_buffer(file->f_path.dentry, of);
if (retval)
goto out;
}
pr_debug("%s: count = %zd, ppos = %lld, buf = %s\n",
-__func__, count, *ppos, buffer->page);
-   retval = simple_read_from_buffer(buf, count, ppos, buffer->page,
-buffer->count);
+__func__, count, *ppos, of->page);
+   retval = simple_read_from_buffer(buf, count, ppos, of->page, of->count);
 out:
-   mutex_unlock(&buffer->mutex);
+   mutex_unlock(&of->mutex);
return retval;
 }
 
 /**
  * fill_write_buffer - copy buffer from userspace.
- * @buffer:data buffer for file.
+ * @of:open file struct.
  * @buf:   data from user.
  * @count: number of bytes in 

[PATCH 06/14] sysfs: add sysfs_open_file->sd and ->file

2013-09-28 Thread Tejun Heo
sysfs will be converted to use seq_file for read path, which will make
it difficult to pass around multiple pointers directly.  This patch
adds sysfs_open_file->sd and ->file so that we can reach all the
necessary data structures from sysfs_open_file.

flush_write_buffer() is updated to drop @dentry which was used to
discover the sysfs_dirent as it's now available through
sysfs_open_file->sd.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 4b55bcf..af6e909 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -45,6 +45,8 @@ struct sysfs_open_dirent {
 };
 
 struct sysfs_open_file {
+   struct sysfs_dirent *sd;
+   struct file *file;
size_t  count;
char*page;
struct mutexmutex;
@@ -192,7 +194,6 @@ static int fill_write_buffer(struct sysfs_open_file *of,
 
 /**
  * flush_write_buffer - push buffer to kobject.
- * @dentry:dentry to the attribute
  * @of:open file
  * @count: number of bytes
  *
@@ -200,22 +201,20 @@ static int fill_write_buffer(struct sysfs_open_file *of,
  * dealing with, then call the store() method for the attribute,
  * passing the buffer that we acquired in fill_write_buffer().
  */
-static int flush_write_buffer(struct dentry *dentry,
- struct sysfs_open_file *of, size_t count)
+static int flush_write_buffer(struct sysfs_open_file *of, size_t count)
 {
-   struct sysfs_dirent *attr_sd = dentry->d_fsdata;
-   struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
+   struct kobject *kobj = of->sd->s_parent->s_dir.kobj;
const struct sysfs_ops *ops;
int rc;
 
-   /* need attr_sd for attr and ops, its parent for kobj */
-   if (!sysfs_get_active(attr_sd))
+   /* need @of->sd for attr and ops, its parent for kobj */
+   if (!sysfs_get_active(of->sd))
return -ENODEV;
 
-   ops = sysfs_file_ops(attr_sd);
-   rc = ops->store(kobj, attr_sd->s_attr.attr, of->page, count);
+   ops = sysfs_file_ops(of->sd);
+   rc = ops->store(kobj, of->sd->s_attr.attr, of->page, count);
 
-   sysfs_put_active(attr_sd);
+   sysfs_put_active(of->sd);
 
return rc;
 }
@@ -245,7 +244,7 @@ static ssize_t sysfs_write_file(struct file *file, const char __user *buf,
mutex_lock(&of->mutex);
len = fill_write_buffer(of, buf, count);
if (len > 0)
-   len = flush_write_buffer(file->f_path.dentry, of, len);
+   len = flush_write_buffer(of, len);
if (len > 0)
*ppos += len;
mutex_unlock(&of->mutex);
@@ -385,6 +384,8 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
goto err_out;
 
mutex_init(&of->mutex);
+   of->sd = attr_sd;
+   of->file = file;
file->private_data = of;
 
/* make sure we have open dirent struct */
-- 
1.8.3.1



[PATCH 09/14] sysfs: prepare llseek path for unified regular / bin file handling

2013-09-28 Thread Tejun Heo
sysfs bin file handling will be merged into the regular file support.
This patch prepares the llseek path.

sysfs currently unconditionally uses seq_lseek() whether the file
supports read or not, which means that sysfs_seq_show() may be used
purely for seeking even if the file doesn't implement read.
sysfs_seq_show() simply doesn't produce any data if sysfs_ops->show()
is not available.  This is good enough for write-only files as open()
doesn't allow FMODE_READ if sysfs_ops->show() is not implemented and
seq_lseek() sets f_pos to the requested offset as long as show()
doesn't fail.

However, bin files allow FMODE_READ when ->mmap() is implemented even
if ->read() is not, which means that sysfs_seq_show() would need to
fail if ->read() is not implemented, which is fine for read(2) but
would break lseek(2).

This patch implements sysfs_llseek() which uses seq_lseek() iff read
is implemented.  If not, generic_file_llseek() is used instead.  This
removes the case where sysfs_seq_show() is used purely for seeking
thus solving the above issue.  Plus, it's weird to use seq_lseek() when
seq_file isn't being used anyway.

Note that sysfs_llseek() handles both regular and bin files.  While
this isn't used yet, it'll allow unifying handling of both types.
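
The selection rule can be summarised with a small user-space model. This is only a sketch of the decision, assuming a hypothetical struct model_node whose callback pointers are tested for NULL and never called; the real sysfs_llseek() in the diff below additionally takes an active reference on the sysfs_dirent before looking at the callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_seek { MODEL_SEQ_LSEEK, MODEL_GENERIC_LLSEEK };

/* Hypothetical stand-in for a sysfs node; callbacks are only tested for NULL. */
struct model_node {
	bool is_bin;
	int (*show)(char *buf);	/* regular files: sysfs_ops->show() */
	int (*read)(char *buf);	/* bin files: bin_attribute->read() */
	int (*mmap)(void);	/* bin files: bin_attribute->mmap() */
};

/* Placeholder callbacks so that example nodes have something to point at. */
int model_dummy_io(char *buf)
{
	(void)buf;
	return 0;
}

int model_dummy_mmap(void)
{
	return 0;
}

/* Use the seq_file seek path only if some read operation exists. */
enum model_seek model_pick_llseek(const struct model_node *n)
{
	bool has_read;

	assert(n != NULL);

	if (n->is_bin)
		has_read = n->read != NULL;
	else
		has_read = n->show != NULL;

	return has_read ? MODEL_SEQ_LSEEK : MODEL_GENERIC_LLSEEK;
}
```

The interesting case is a bin file that implements only mmap: it may be opened for reading, yet seeking must not go through the show path, so the model picks the generic seek for it.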

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 6211dd7..d9109d3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -54,6 +54,11 @@ struct sysfs_open_file {
struct list_headlist;
 };
 
+static bool sysfs_is_bin(struct sysfs_dirent *sd)
+{
+   return sysfs_type(sd) == SYSFS_KOBJ_BIN_ATTR;
+}
+
 static struct sysfs_open_file *sysfs_of(struct file *file)
 {
return ((struct seq_file *)file->private_data)->private;
@@ -72,6 +77,33 @@ static const struct sysfs_ops *sysfs_file_ops(struct sysfs_dirent *sd)
 }
 
 /*
+ * llseek for sysfs.  Use seq_lseek() if read operation is implemented;
+ * otherwise, fall back to generic_file_llseek().  This ensures that
+ * sysfs_seq_show() isn't invoked to seek in a file which doesn't
+ * implement read.
+ */
+static loff_t sysfs_llseek(struct file *file, loff_t offset, int whence)
+{
+   struct sysfs_open_file *of = sysfs_of(file);
+   bool has_read;
+
+   if (!sysfs_get_active(of->sd))
+   return -ENODEV;
+
+   if (sysfs_is_bin(of->sd))
+   has_read = of->sd->s_bin_attr.bin_attr->read;
+   else
+   has_read = sysfs_file_ops(of->sd)->show;
+
+   sysfs_put_active(of->sd);
+
+   if (has_read)
+   return seq_lseek(file, offset, whence);
+   else
+   return generic_file_llseek(file, offset, whence);
+}
+
+/*
  * Reads on sysfs are handled through seq_file, which takes care of hairy
  * details like buffering and seeking.  The following function pipes
  * sysfs_ops->show() result through seq_file.
@@ -104,15 +136,9 @@ static int sysfs_seq_show(struct seq_file *sf, void *v)
 
of->event = atomic_read(&of->sd->s_attr.open->event);
 
-   /*
-* Lookup @ops and invoke show().  Control may reach here via seq
-* file lseek even if @ops->show() isn't implemented.
-*/
+   /* lookup @ops and invoke show() */
ops = sysfs_file_ops(of->sd);
-   if (ops->show)
-   count = ops->show(kobj, of->sd->s_attr.attr, buf);
-   else
-   count = 0;
+   count = ops->show(kobj, of->sd->s_attr.attr, buf);
 
sysfs_put_active(of->sd);
mutex_unlock(&of->mutex);
@@ -465,7 +491,7 @@ EXPORT_SYMBOL_GPL(sysfs_notify);
 const struct file_operations sysfs_file_operations = {
.read   = seq_read,
.write  = sysfs_write_file,
-   .llseek = seq_lseek,
+   .llseek = sysfs_llseek,
.open   = sysfs_open_file,
.release= sysfs_release,
.poll   = sysfs_poll,
-- 
1.8.3.1



[PATCH 12/14] sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c

2013-09-28 Thread Tejun Heo
sysfs bin file handling will be merged into the regular file support.
This patch copies mmap support from bin so that fs/sysfs/file.c can
handle mmapping bin files.

The code is copied mostly verbatim with the following updates.

* ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
  references are replaced with sysfs_open_file ones.

* Symbols are prefixed with sysfs_.

* sysfs_bin_mmap() explicitly checks whether the file is a bin file
  and returns -ENODEV if not.

* sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
  ->files.  Invocation of this function is added to
  sysfs_addrm_finish().

This is a preparation and the new mmap path isn't used yet.
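
All of the copied vm_ops wrappers follow the same shape: bail out if no vm_ops were recorded, take an active reference, forward to the underlying callback if there is one, and drop the reference. A simplified user-space model of the fault wrapper is sketched below. Every name in it is hypothetical, and the plain counter only approximates sysfs_get_active()/sysfs_put_active(), which are more involved in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FAULT_OK		0
#define MODEL_FAULT_SIGBUS	1

/* Hypothetical stand-in for a sysfs_dirent with active-reference tracking. */
struct model_node {
	int active;		/* outstanding active references */
	bool deactivated;	/* set once the node is being removed */
};

struct model_vm_ops {
	int (*fault)(void);
};

struct model_open_file {
	struct model_node *sd;
	const struct model_vm_ops *vm_ops;	/* NULL until mmap() succeeds */
};

static bool model_get_active(struct model_node *sd)
{
	if (sd->deactivated)
		return false;
	sd->active++;
	return true;
}

static void model_put_active(struct model_node *sd)
{
	assert(sd->active > 0);
	sd->active--;
}

/* Example underlying callback that always succeeds. */
int model_fault_ok(void)
{
	return MODEL_FAULT_OK;
}

/* Models the wrapper shape of sysfs_bin_fault(). */
int model_bin_fault(struct model_open_file *of)
{
	int ret;

	if (!of->vm_ops)
		return MODEL_FAULT_SIGBUS;

	if (!model_get_active(of->sd))
		return MODEL_FAULT_SIGBUS;

	ret = MODEL_FAULT_SIGBUS;
	if (of->vm_ops->fault)
		ret = of->vm_ops->fault();

	model_put_active(of->sd);
	return ret;
}
```

Once the node is deactivated the wrapper fails the fault without ever calling into the driver, which is presumably the point of interposing on vm_ops in the first place.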

Signed-off-by: Tejun Heo 
---
 fs/sysfs/dir.c   |   1 +
 fs/sysfs/file.c  | 250 ++-
 fs/sysfs/sysfs.h |   2 +
 3 files changed, 252 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b518afd..c4040dd 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -595,6 +595,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->u.removed_list;
 
sysfs_deactivate(sd);
+   sysfs_unmap_bin_file(sd);
unmap_bin_file(sd);
sysfs_put(sd);
}
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 46f7d59..fe5c440 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "sysfs.h"
 
@@ -53,6 +54,9 @@ struct sysfs_open_file {
int event;
struct list_headlist;
 
+   boolmmapped;
+   const struct vm_operations_struct *vm_ops;
+
void*private_data;
 };
 
@@ -326,6 +330,221 @@ out_free:
return len;
 }
 
+static void sysfs_bin_vma_open(struct vm_area_struct *vma)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+
+   if (!of->vm_ops)
+   return;
+
+   if (!sysfs_get_active(of->sd))
+   return;
+
+   if (of->vm_ops->open)
+   of->vm_ops->open(vma);
+
+   sysfs_put_active(of->sd);
+}
+
+static int sysfs_bin_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+   int ret;
+
+   if (!of->vm_ops)
+   return VM_FAULT_SIGBUS;
+
+   if (!sysfs_get_active(of->sd))
+   return VM_FAULT_SIGBUS;
+
+   ret = VM_FAULT_SIGBUS;
+   if (of->vm_ops->fault)
+   ret = of->vm_ops->fault(vma, vmf);
+
+   sysfs_put_active(of->sd);
+   return ret;
+}
+
+static int sysfs_bin_page_mkwrite(struct vm_area_struct *vma,
+ struct vm_fault *vmf)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+   int ret;
+
+   if (!of->vm_ops)
+   return VM_FAULT_SIGBUS;
+
+   if (!sysfs_get_active(of->sd))
+   return VM_FAULT_SIGBUS;
+
+   ret = 0;
+   if (of->vm_ops->page_mkwrite)
+   ret = of->vm_ops->page_mkwrite(vma, vmf);
+   else
+   file_update_time(file);
+
+   sysfs_put_active(of->sd);
+   return ret;
+}
+
+static int sysfs_bin_access(struct vm_area_struct *vma, unsigned long addr,
+   void *buf, int len, int write)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+   int ret;
+
+   if (!of->vm_ops)
+   return -EINVAL;
+
+   if (!sysfs_get_active(of->sd))
+   return -EINVAL;
+
+   ret = -EINVAL;
+   if (of->vm_ops->access)
+   ret = of->vm_ops->access(vma, addr, buf, len, write);
+
+   sysfs_put_active(of->sd);
+   return ret;
+}
+
+#ifdef CONFIG_NUMA
+static int sysfs_bin_set_policy(struct vm_area_struct *vma,
+   struct mempolicy *new)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+   int ret;
+
+   if (!of->vm_ops)
+   return 0;
+
+   if (!sysfs_get_active(of->sd))
+   return -EINVAL;
+
+   ret = 0;
+   if (of->vm_ops->set_policy)
+   ret = of->vm_ops->set_policy(vma, new);
+
+   sysfs_put_active(of->sd);
+   return ret;
+}
+
+static struct mempolicy *sysfs_bin_get_policy(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+   struct file *file = vma->vm_file;
+   struct sysfs_open_file *of = sysfs_of(file);
+   struct mempolicy *pol;
+
+   if (!of->vm_ops)
+   return vma->vm_policy;
+
+   if (!sysfs_get_active(of->sd))
+   return vma->vm_policy;
+
+   pol = vma->vm_policy;
+   if (of->vm_ops->get_policy)
+   pol = 

[PATCH 11/14] sysfs: prepare read path for unified regular / bin file handling

2013-09-28 Thread Tejun Heo
sysfs bin file handling will be merged into the regular file support.
This patch prepares the read path.

This is a bit tricky as read support is quite different between
regular and bin files.  bin file supports arbitrarily large file size
and passes the read offset and size directly to the callback as long
as the size is <= PAGE_SIZE.  While it's doable to preserve the offset
and size parameters from userland and pass them to
bin_attribute->read() callback, none of the current users seems to
depend on the behavior and it's a lot simpler and more efficient to
implement paginated behavior.  After all, it is an extremely bad idea
to make reads of sysfs files have side effects.

sysfs_bin_start/next/stop() are implemented so that seq_file iterator
pointer is 1-based page index and sysfs_seq_show() is updated to
transfer data from bin_attribute->read() to seq_file buffer
page-by-page.  A comment clarifying that ->read() must not have
side-effects is added to bin_attribute definition.

This is a preparation and the new bin file path isn't used yet.
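
The page arithmetic behind the paginated read can be sketched in user space as follows. This is an illustration under assumptions: MODEL_PAGE_SHIFT, struct model_chunk and both functions are hypothetical names, a file size of zero models a bin file that does not declare its size, and the 1-based index mirrors the iterator pointer convention described above (with seq_file, a NULL iterator means end of sequence, so index zero cannot be used for the first page).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/* Offset and length handed to the read callback for one page. */
struct model_chunk {
	size_t off;
	size_t len;
};

/* Number of iterator positions for a file of the given size. */
size_t model_nr_pages(size_t file_size)
{
	return (file_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/*
 * Map a 1-based page index @v to the chunk to read.  A file_size of
 * zero models a file of unknown size, for which a full page is always
 * requested.
 */
struct model_chunk model_chunk_for(size_t file_size, size_t v)
{
	struct model_chunk c;
	size_t idx;

	assert(v >= 1);
	idx = v - 1;
	c.off = idx << MODEL_PAGE_SHIFT;

	if (file_size) {
		assert(c.off < file_size);
		c.len = file_size - c.off;
		if (c.len > MODEL_PAGE_SIZE)
			c.len = MODEL_PAGE_SIZE;
	} else {
		c.len = MODEL_PAGE_SIZE;
	}

	return c;
}
```

For a 10000 byte file this yields three positions, the last of which covers the 1808 bytes starting at offset 8192.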

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
---
 fs/sysfs/file.c   | 96 +++
 include/linux/sysfs.h |  7 
 2 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5380009..46f7d59 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -52,6 +52,8 @@ struct sysfs_open_file {
struct mutexmutex;
int event;
struct list_headlist;
+
+   void*private_data;
 };
 
 static bool sysfs_is_bin(struct sysfs_dirent *sd)
@@ -112,7 +114,6 @@ static int sysfs_seq_show(struct seq_file *sf, void *v)
 {
struct sysfs_open_file *of = sf->private;
struct kobject *kobj = of->sd->s_parent->s_dir.kobj;
-   const struct sysfs_ops *ops;
char *buf;
ssize_t count;
 
@@ -136,9 +137,48 @@ static int sysfs_seq_show(struct seq_file *sf, void *v)
 
of->event = atomic_read(&of->sd->s_attr.open->event);
 
-   /* lookup @ops and invoke show() */
-   ops = sysfs_file_ops(of->sd);
-   count = ops->show(kobj, of->sd->s_attr.attr, buf);
+   /* lookup ops and invoke read/show() */
+   if (sysfs_is_bin(of->sd)) {
+   struct bin_attribute *battr = of->sd->s_bin_attr.bin_attr;
+   size_t idx = (unsigned long)v - 1;
+   loff_t off = idx << PAGE_SHIFT;
+   size_t size = file_inode(of->file)->i_size;
+
+   if (size)
+   size = min_t(size_t, size - off, PAGE_SIZE);
+   else
+   size = PAGE_SIZE;
+
+   /* @battr may be implementing only ->mmap() */
+   count = -EIO;
+   if (battr->read) {
+   count = battr->read(of->file, kobj, battr, buf, off,
+   size);
+   /*
+* If read() returned zero, it is the end of the
+* file.  Record it so that ->next() terminates on
+* the next invocation.
+*/
+   if (!count)
+   of->private_data = (void *)(unsigned long)idx;
+   }
+   } else {
+   const struct sysfs_ops *ops = sysfs_file_ops(of->sd);
+
+   count = ops->show(kobj, of->sd->s_attr.attr, buf);
+
+   /*
+* The code works fine with PAGE_SIZE return but it's
+* likely to indicate truncated result or overflow in
+* normal use cases.
+*/
+   if (unlikely(count >= (ssize_t)PAGE_SIZE)) {
+   print_symbol("fill_read_buffer: %s returned bad count\n",
+(unsigned long)ops->show);
+   /* Try to struggle along */
+   count = PAGE_SIZE - 1;
+   }
+   }
 
sysfs_put_active(of->sd);
mutex_unlock(&of->mutex);
@@ -146,20 +186,48 @@ static int sysfs_seq_show(struct seq_file *sf, void *v)
if (count < 0)
return count;
 
-   /*
-* The code works fine with PAGE_SIZE return but it's likely to
-* indicate truncated result or overflow in normal use cases.
-*/
-   if (count >= (ssize_t)PAGE_SIZE) {
-   print_symbol("fill_read_buffer: %s returned bad count\n",
-   (unsigned long)ops->show);
-   /* Try to struggle along */
-   count = PAGE_SIZE - 1;
-   }
seq_commit(sf, count);
return 0;
 }
 
+static void *sysfs_bin_seq_start(struct seq_file *sf, loff_t *pidx)
+{
+   struct sysfs_open_file *of = sf->private;
+   loff_t size = file_inode(of->file)->i_size;
+   size_t nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+
+   /* record number of pages in 

[PATCH 10/14] sysfs: prepare path write for unified regular / bin file handling

2013-09-28 Thread Tejun Heo
sysfs bin file handling will be merged into the regular file support.
This patch prepares the write path.

bin file write is almost identical to regular file write except that
the write length is capped by the inode size and @off is passed to the
write method.  This patch adds bin file handling to sysfs_write_file()
so that it can handle both regular and bin files.
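
The length handling described above can be sketched in userspace as
follows.  This only illustrates the clamping rule; write_len() is a
hypothetical helper, PAGE_SIZE is assumed to be 4096, and plain integer
types stand in for the kernel's loff_t and ssize_t.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE ((size_t)4096)   /* assumption: 4 KiB pages */

/*
 * Return how many bytes a single write would pass to the callback.
 * Every write is capped at one page; bin files are additionally capped
 * by the bytes remaining between @pos and the inode size.
 */
static size_t write_len(size_t count, bool is_bin,
			long long file_size, long long pos)
{
	size_t len = count < PAGE_SIZE ? count : PAGE_SIZE;

	if (is_bin) {
		if (file_size <= pos)
			return 0;          /* at or past the end: nothing to write */
		if ((long long)len > file_size - pos)
			len = (size_t)(file_size - pos);
	}
	return len;
}
```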

This is a preparation and the new bin file path isn't used yet.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index d9109d3..5380009 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -164,16 +164,16 @@ static int sysfs_seq_show(struct seq_file *sf, void *v)
  * flush_write_buffer - push buffer to kobject
  * @of: open file
  * @buf: data buffer for file
+ * @off: file offset to write to
  * @count: number of bytes
  *
  * Get the correct pointers for the kobject and the attribute we're dealing
  * with, then call the store() method for it with @buf.
  */
-static int flush_write_buffer(struct sysfs_open_file *of, char *buf,
+static int flush_write_buffer(struct sysfs_open_file *of, char *buf, loff_t off,
  size_t count)
 {
struct kobject *kobj = of->sd->s_parent->s_dir.kobj;
-   const struct sysfs_ops *ops;
int rc = 0;
 
/*
@@ -187,8 +187,18 @@ static int flush_write_buffer(struct sysfs_open_file *of, char *buf,
return -ENODEV;
}
 
-   ops = sysfs_file_ops(of->sd);
-   rc = ops->store(kobj, of->sd->s_attr.attr, buf, count);
+   if (sysfs_is_bin(of->sd)) {
+   struct bin_attribute *battr = of->sd->s_bin_attr.bin_attr;
+
+   rc = -EIO;
+   if (battr->write)
+   rc = battr->write(of->file, kobj, battr, buf, off,
+ count);
+   } else {
+   const struct sysfs_ops *ops = sysfs_file_ops(of->sd);
+
+   rc = ops->store(kobj, of->sd->s_attr.attr, buf, count);
+   }
 
sysfs_put_active(of->sd);
mutex_unlock(&of->mutex);
@@ -216,9 +226,17 @@ static ssize_t sysfs_write_file(struct file *file, const char __user *user_buf,
size_t count, loff_t *ppos)
 {
struct sysfs_open_file *of = sysfs_of(file);
-   ssize_t len = min(count, PAGE_SIZE - 1);
+   ssize_t len = min(count, PAGE_SIZE);
char *buf;
 
+   if (sysfs_is_bin(of->sd)) {
+   loff_t size = file_inode(file)->i_size;
+
+   if (size <= *ppos)
+   return 0;
+   len = min_t(ssize_t, len, size - *ppos);
+   }
+
if (!len)
return 0;
 
@@ -232,7 +250,7 @@ static ssize_t sysfs_write_file(struct file *file, const char __user *user_buf,
}
buf[len] = '\0';/* guarantee string termination */
 
-   len = flush_write_buffer(of, buf, len);
+   len = flush_write_buffer(of, buf, *ppos, len);
if (len > 0)
*ppos += len;
 out_free:
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 04/14] sysfs: add sysfs_open_file_mutex

2013-09-28 Thread Tejun Heo
Add a separate mutex to protect sysfs_open_dirent->buffers list.  This
will allow performing sleepable operations while traversing
sysfs_buffers, which will be renamed to sysfs_open_file.

Note that currently sysfs_open_dirent->buffers list isn't being used
for anything and this patch doesn't make any functional difference.
It will be used to merge regular and bin file support.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 7dfcc33..499cff8 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -25,15 +25,17 @@
 #include "sysfs.h"
 
 /*
- * There's one sysfs_buffer for each open file and one
- * sysfs_open_dirent for each sysfs_dirent with one or more open
- * files.
+ * There's one sysfs_buffer for each open file and one sysfs_open_dirent
+ * for each sysfs_dirent with one or more open files.
  *
- * filp->private_data points to sysfs_buffer and
- * sysfs_dirent->s_attr.open points to sysfs_open_dirent.  s_attr.open
- * is protected by sysfs_open_dirent_lock.
+ * sysfs_dirent->s_attr.open points to sysfs_open_dirent.  s_attr.open is
+ * protected by sysfs_open_dirent_lock.
+ *
+ * filp->private_data points to sysfs_buffer which is chained at
+ * sysfs_open_dirent->buffers, which is protected by sysfs_open_file_mutex.
  */
 static DEFINE_SPINLOCK(sysfs_open_dirent_lock);
+static DEFINE_MUTEX(sysfs_open_file_mutex);
 
 struct sysfs_open_dirent {
atomic_t refcnt;
@@ -272,6 +274,7 @@ static int sysfs_get_open_dirent(struct sysfs_dirent *sd,
struct sysfs_open_dirent *od, *new_od = NULL;
 
  retry:
+   mutex_lock(&sysfs_open_file_mutex);
spin_lock_irq(&sysfs_open_dirent_lock);
 
if (!sd->s_attr.open && new_od) {
@@ -286,6 +289,7 @@ static int sysfs_get_open_dirent(struct sysfs_dirent *sd,
}
 
spin_unlock_irq(&sysfs_open_dirent_lock);
+   mutex_unlock(&sysfs_open_file_mutex);
 
if (od) {
kfree(new_od);
@@ -321,6 +325,7 @@ static void sysfs_put_open_dirent(struct sysfs_dirent *sd,
struct sysfs_open_dirent *od = sd->s_attr.open;
unsigned long flags;
 
+   mutex_lock(&sysfs_open_file_mutex);
spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
 
list_del(&buffer->list);
@@ -330,6 +335,7 @@ static void sysfs_put_open_dirent(struct sysfs_dirent *sd,
od = NULL;
 
spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);
+   mutex_unlock(&sysfs_open_file_mutex);
 
kfree(od);
 }
-- 
1.8.3.1



[PATCH 13/14] sysfs: prepare open path for unified regular / bin file handling

2013-09-28 Thread Tejun Heo
sysfs bin file handling will be merged into the regular file support.
All three access paths - read, write and mmap - can now handle both
regular and bin files.  This patch updates sysfs_open_file() and
sysfs_release() such that they can handle both regular and bin files.
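
As a rough illustration of the unified open check, the following
userspace sketch derives "can read" / "can write" from the available
callbacks and then applies one permission test for both file types.
The constants are local stand-ins for the kernel's FMODE_* and S_I*UGO
values, and struct file_caps, bin_caps(), regular_caps() and
open_allowed() are hypothetical names.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel constants (assumed values). */
#define FMODE_READ   0x1u
#define FMODE_WRITE  0x2u
#define S_IRUGO      0444u
#define S_IWUGO      0222u

struct file_caps {
	bool has_read;
	bool has_write;
};

/* bin file: mmap support counts as both read and write capability */
static struct file_caps bin_caps(bool read_cb, bool write_cb, bool mmap_cb)
{
	struct file_caps c = { read_cb || mmap_cb, write_cb || mmap_cb };

	return c;
}

/* regular file: capabilities come straight from show()/store() */
static struct file_caps regular_caps(bool show_cb, bool store_cb)
{
	struct file_caps c = { show_cb, store_cb };

	return c;
}

/* One check for both file types: inode perms and a matching callback. */
static bool open_allowed(unsigned f_mode, unsigned i_mode,
			 struct file_caps caps)
{
	if ((f_mode & FMODE_WRITE) &&
	    (!(i_mode & S_IWUGO) || !caps.has_write))
		return false;
	if ((f_mode & FMODE_READ) &&
	    (!(i_mode & S_IRUGO) || !caps.has_read))
		return false;
	return true;
}
```

In the patch itself a failed check leaves the function through err_out
with -EACCES; the boolean return here is only a simplification.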

This is a preparation and the new bin file path isn't used yet.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 61 ++---
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index fe5c440..723f78c 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -638,38 +638,40 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
struct sysfs_open_file *of;
-   const struct sysfs_ops *ops;
+   bool has_read, has_write;
int error = -EACCES;
 
/* need attr_sd for attr and ops, its parent for kobj */
if (!sysfs_get_active(attr_sd))
return -ENODEV;
 
-   /* every kobject with an attribute needs a ktype assigned */
-   ops = sysfs_file_ops(attr_sd);
-   if (WARN(!ops, KERN_ERR
-"missing sysfs attribute operations for kobject: %s\n",
-kobject_name(kobj)))
-   goto err_out;
+   if (sysfs_is_bin(attr_sd)) {
+   struct bin_attribute *battr = attr_sd->s_bin_attr.bin_attr;
 
-   /* File needs write support.
-* The inode's perms must say it's ok,
-* and we must have a store method.
-*/
-   if (file->f_mode & FMODE_WRITE) {
-   if (!(inode->i_mode & S_IWUGO) || !ops->store)
-   goto err_out;
-   }
+   has_read = battr->read || battr->mmap;
+   has_write = battr->write || battr->mmap;
+   } else {
+   const struct sysfs_ops *ops = sysfs_file_ops(attr_sd);
 
-   /* File needs read support.
-* The inode's perms must say it's ok, and we there
-* must be a show method for it.
-*/
-   if (file->f_mode & FMODE_READ) {
-   if (!(inode->i_mode & S_IRUGO) || !ops->show)
+   /* every kobject with an attribute needs a ktype assigned */
+   if (WARN(!ops, KERN_ERR
+"missing sysfs attribute operations for kobject: %s\n",
+kobject_name(kobj)))
goto err_out;
+
+   has_read = ops->show;
+   has_write = ops->store;
}
 
+   /* check perms and supported operations */
+   if ((file->f_mode & FMODE_WRITE) &&
+   (!(inode->i_mode & S_IWUGO) || !has_write))
+   goto err_out;
+
+   if ((file->f_mode & FMODE_READ) &&
+   (!(inode->i_mode & S_IRUGO) || !has_read))
+   goto err_out;
+
/* allocate a sysfs_open_file for the file */
error = -ENOMEM;
of = kzalloc(sizeof(struct sysfs_open_file), GFP_KERNEL);
@@ -685,10 +687,15 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
 * implemented or requested.  This unifies private data access and
 * most files are readable anyway.
 */
-   error = single_open(file, sysfs_seq_show, of);
+   if (sysfs_is_bin(attr_sd))
+   error = seq_open(file, &sysfs_bin_seq_ops);
+   else
+   error = single_open(file, sysfs_seq_show, NULL);
if (error)
goto err_free;
 
+   ((struct seq_file *)file->private_data)->private = of;
+
/* seq_file clears PWRITE unconditionally, restore it if WRITE */
if (file->f_mode & FMODE_WRITE)
file->f_mode |= FMODE_PWRITE;
@@ -703,7 +710,10 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
return 0;
 
 err_close:
-   single_release(inode, file);
+   if (sysfs_is_bin(attr_sd))
+   seq_release(inode, file);
+   else
+   single_release(inode, file);
 err_free:
kfree(of);
 err_out:
@@ -717,7 +727,10 @@ static int sysfs_release(struct inode *inode, struct file *filp)
struct sysfs_open_file *of = sysfs_of(filp);
 
sysfs_put_open_dirent(sd, of);
-   single_release(inode, filp);
+   if (sysfs_is_bin(sd))
+   seq_release(inode, filp);
+   else
+   single_release(inode, filp);
kfree(of);
 
return 0;
-- 
1.8.3.1



[PATCH 08/14] sysfs: use seq_file when reading regular files

2013-09-28 Thread Tejun Heo
sysfs read path implements its own buffering scheme between userland
and kernel callbacks, which essentially is a degenerate duplicate of
seq_file.  This patch replaces the custom read buffering
implementation in sysfs with seq_file.

While the amount of code reduction is small, this reduces low level
hairiness and enables future development of a new versatile API based
on seq_file so that sysfs features can be shared with other
subsystems.

As write path was already converted to not use sysfs_open_file->page,
this patch makes ->page and ->count unused and removes them.

Userland behavior remains the same except for some extreme corner
cases - e.g. sysfs will now regenerate the content each time a file is
read after a non-contiguous seek whereas the original code would keep
using the same content.  While this is a userland visible behavior
change, it is extremely unlikely to be noticeable and brings sysfs
behavior closer to that of procfs.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
---
 fs/sysfs/file.c | 164 +---
 1 file changed, 73 insertions(+), 91 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 642dbcc..6211dd7 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "sysfs.h"
 
@@ -31,7 +32,8 @@
  * sysfs_dirent->s_attr.open points to sysfs_open_dirent.  s_attr.open is
  * protected by sysfs_open_dirent_lock.
  *
- * filp->private_data points to sysfs_open_file which is chained at
+ * filp->private_data points to seq_file whose ->private points to
+ * sysfs_open_file.  sysfs_open_files are chained at
  * sysfs_open_dirent->files, which is protected by sysfs_open_file_mutex.
  */
 static DEFINE_SPINLOCK(sysfs_open_dirent_lock);
@@ -47,13 +49,16 @@ struct sysfs_open_dirent {
 struct sysfs_open_file {
struct sysfs_dirent *sd;
struct file *file;
-   size_t  count;
-   char *page;
struct mutex mutex;
int event;
struct list_head list;
 };
 
+static struct sysfs_open_file *sysfs_of(struct file *file)
+{
+   return ((struct seq_file *)file->private_data)->private;
+}
+
 /*
  * Determine ktype->sysfs_ops for the given sysfs_dirent.  This function
  * must be called while holding an active reference.
@@ -66,40 +71,54 @@ static const struct sysfs_ops *sysfs_file_ops(struct sysfs_dirent *sd)
return kobj->ktype ? kobj->ktype->sysfs_ops : NULL;
 }
 
-/**
- * fill_read_buffer - allocate and fill buffer from object.
- * @dentry:dentry pointer.
- * @buffer:data buffer for file.
- *
- * Allocate @buffer->page, if it hasn't been already, then call the
- * kobject's show() method to fill the buffer with this attribute's
- * data.
- * This is called only once, on the file's first read unless an error
- * is returned.
+/*
+ * Reads on sysfs are handled through seq_file, which takes care of hairy
+ * details like buffering and seeking.  The following function pipes
+ * sysfs_ops->show() result through seq_file.
  */
-static int fill_read_buffer(struct dentry *dentry, struct sysfs_open_file *of)
+static int sysfs_seq_show(struct seq_file *sf, void *v)
 {
-   struct sysfs_dirent *attr_sd = dentry->d_fsdata;
-   struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
+   struct sysfs_open_file *of = sf->private;
+   struct kobject *kobj = of->sd->s_parent->s_dir.kobj;
const struct sysfs_ops *ops;
-   int ret = 0;
+   char *buf;
ssize_t count;
 
-   if (!of->page)
-   of->page = (char *) get_zeroed_page(GFP_KERNEL);
-   if (!of->page)
-   return -ENOMEM;
+   /* acquire buffer and ensure that it's >= PAGE_SIZE */
+   count = seq_get_buf(sf, &buf);
+   if (count < PAGE_SIZE) {
+   seq_commit(sf, -1);
+   return 0;
+   }
 
-   /* need attr_sd for attr and ops, its parent for kobj */
-   if (!sysfs_get_active(attr_sd))
+   /*
+* Need @of->sd for attr and ops, its parent for kobj.  @of->mutex
+* nests outside active ref and is just to ensure that the ops
+* aren't called concurrently for the same open file.
+*/
+   mutex_lock(&of->mutex);
+   if (!sysfs_get_active(of->sd)) {
+   mutex_unlock(&of->mutex);
return -ENODEV;
+   }
 
-   of->event = atomic_read(&attr_sd->s_attr.open->event);
+   of->event = atomic_read(&of->sd->s_attr.open->event);
 
-   ops = sysfs_file_ops(attr_sd);
-   count = ops->show(kobj, attr_sd->s_attr.attr, of->page);
+   /*
+* Lookup @ops and invoke show().  Control may reach here via seq
+* file lseek even if @ops->show() isn't implemented.
+*/
+   ops = sysfs_file_ops(of->sd);
+   if (ops->show)
+   count = ops->show(kobj, of->sd->s_attr.attr, 

[PATCH 14/14] sysfs: merge regular and bin file handling

2013-09-28 Thread Tejun Heo
With the previous changes, sysfs regular file code is ready to handle
bin files too.  This patch makes bin files share the regular file
path.

* sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.

* sysfs_init_inode() is updated to use sysfs_file_operations for both
  regular and bin files.

* fs/sysfs/bin.c and the related pieces are removed.

This patch introduces subtle behavior changes in extreme corner cases.
Please refer to the preceding preparation patches which added bin file
handling to regular file paths for details.

Overall, this unification reduces the amount of duplicate logic, makes
behaviors more consistent and paves the road for building a simpler and
more versatile interface which will allow other subsystems to make use
of sysfs for their pseudo filesystems.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
---
 fs/sysfs/Makefile |   3 +-
 fs/sysfs/bin.c| 502 --
 fs/sysfs/dir.c|   1 -
 fs/sysfs/file.c   |  26 +++
 fs/sysfs/inode.c  |   2 +-
 fs/sysfs/sysfs.h  |   6 -
 6 files changed, 28 insertions(+), 512 deletions(-)
 delete mode 100644 fs/sysfs/bin.c

diff --git a/fs/sysfs/Makefile b/fs/sysfs/Makefile
index 7a1ceb9..8876ac1 100644
--- a/fs/sysfs/Makefile
+++ b/fs/sysfs/Makefile
@@ -2,5 +2,4 @@
 # Makefile for the sysfs virtual filesystem
 #
 
-obj-y  := inode.o file.o dir.o symlink.o mount.o bin.o \
-  group.o
+obj-y  := inode.o file.o dir.o symlink.o mount.o group.o
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
deleted file mode 100644
index d49e6ca..000
--- a/fs/sysfs/bin.c
+++ /dev/null
@@ -1,502 +0,0 @@
-/*
- * fs/sysfs/bin.c - sysfs binary file implementation
- *
- * Copyright (c) 2003 Patrick Mochel
- * Copyright (c) 2003 Matthew Wilcox
- * Copyright (c) 2004 Silicon Graphics, Inc.
- * Copyright (c) 2007 SUSE Linux Products GmbH
- * Copyright (c) 2007 Tejun Heo 
- *
- * This file is released under the GPLv2.
- *
- * Please see Documentation/filesystems/sysfs.txt for more information.
- */
-
-#undef DEBUG
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "sysfs.h"
-
-/*
- * There's one bin_buffer for each open file.
- *
- * filp->private_data points to bin_buffer and
- * sysfs_dirent->s_bin_attr.buffers points to a the bin_buffer s
- * sysfs_dirent->s_bin_attr.buffers is protected by sysfs_bin_lock
- */
-static DEFINE_MUTEX(sysfs_bin_lock);
-
-struct bin_buffer {
-   struct mutex mutex;
-   void *buffer;
-   int mmapped;
-   const struct vm_operations_struct *vm_ops;
-   struct file *file;
-   struct hlist_node   list;
-};
-
-static int
-fill_read(struct file *file, char *buffer, loff_t off, size_t count)
-{
-   struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
-   struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
-   struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
-   int rc;
-
-   /* need attr_sd for attr, its parent for kobj */
-   if (!sysfs_get_active(attr_sd))
-   return -ENODEV;
-
-   rc = -EIO;
-   if (attr->read)
-   rc = attr->read(file, kobj, attr, buffer, off, count);
-
-   sysfs_put_active(attr_sd);
-
-   return rc;
-}
-
-static ssize_t
-read(struct file *file, char __user *userbuf, size_t bytes, loff_t *off)
-{
-   struct bin_buffer *bb = file->private_data;
-   int size = file_inode(file)->i_size;
-   loff_t offs = *off;
-   int count = min_t(size_t, bytes, PAGE_SIZE);
-   char *temp;
-
-   if (!bytes)
-   return 0;
-
-   if (size) {
-   if (offs > size)
-   return 0;
-   if (offs + count > size)
-   count = size - offs;
-   }
-
-   temp = kmalloc(count, GFP_KERNEL);
-   if (!temp)
-   return -ENOMEM;
-
-   mutex_lock(&bb->mutex);
-
-   count = fill_read(file, bb->buffer, offs, count);
-   if (count < 0) {
-   mutex_unlock(&bb->mutex);
-   goto out_free;
-   }
-
-   memcpy(temp, bb->buffer, count);
-
-   mutex_unlock(&bb->mutex);
-
-   if (copy_to_user(userbuf, temp, count)) {
-   count = -EFAULT;
-   goto out_free;
-   }
-
-   pr_debug("offs = %lld, *off = %lld, count = %d\n", offs, *off, count);
-
-   *off = offs + count;
-
- out_free:
-   kfree(temp);
-   return count;
-}
-
-static int
-flush_write(struct file *file, char *buffer, loff_t offset, size_t count)
-{
-   struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata;
-   struct bin_attribute *attr = attr_sd->s_bin_attr.bin_attr;
-   struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
-   int rc;
-
-   /* need attr_sd for attr, its parent for kobj */
-   if (!sysfs_get_active(attr_sd))
-  

[PATCH 01/14] sysfs: remove unused sysfs_buffer->pos

2013-09-28 Thread Tejun Heo
Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1656a79..81e3f72 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -44,7 +44,6 @@ struct sysfs_open_dirent {
 
 struct sysfs_buffer {
size_t  count;
-   loff_t  pos;
char *page;
const struct sysfs_ops  *ops;
struct mutex mutex;
-- 
1.8.3.1



[PATCH 07/14] sysfs: use transient write buffer

2013-09-28 Thread Tejun Heo
There isn't much to be gained by keeping a kernel buffer around while a
file is open, especially as the read path is planned to be converted to
use seq_file and won't use the buffer.  This patch makes
sysfs_write_file() use a per-write transient buffer instead of
sysfs_open_file->page.

This simplifies the write path, enables removing sysfs_open_file->page
once read path is updated and will help merging bin file write path
which already requires the use of a transient buffer due to a locking
order issue.
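
The per-write pattern can be sketched in userspace roughly as below:
allocate a buffer sized for this write, copy the data in, NUL-terminate
it, hand it to the store callback and free it again.  malloc() and
memcpy() stand in for the kernel's allocation and copy_from_user();
write_once(), store_fn and store_number() are hypothetical names, and
PAGE_SIZE is assumed to be 4096.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE ((size_t)4096)   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a sysfs_ops->store() style callback. */
typedef long (*store_fn)(const char *buf, size_t count);

/* Handle one write with a buffer that lives only for this call. */
static long write_once(const char *user_buf, size_t count, store_fn store)
{
	size_t len = count < PAGE_SIZE - 1 ? count : PAGE_SIZE - 1;
	char *buf;
	long ret;

	if (!len)
		return 0;

	buf = malloc(len + 1);
	if (!buf)
		return -ENOMEM;

	memcpy(buf, user_buf, len);    /* stands in for copy_from_user() */
	buf[len] = '\0';               /* guarantee string termination */

	ret = store(buf, len);
	free(buf);                     /* nothing is kept across writes */
	return ret;
}

/* Example callback: parse a decimal number and remember it. */
static long last_value;

static long store_number(const char *buf, size_t count)
{
	last_value = strtol(buf, NULL, 10);
	return (long)count;
}
```

Since nothing survives the call, no buffer pointer has to be tracked in
the open-file structure.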

As the function comments of flush_write_buffer() and
sysfs_write_file() are being updated anyway, reformat them so that
they're more conventional.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 114 ++--
 1 file changed, 52 insertions(+), 62 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index af6e909..642dbcc 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -162,92 +162,82 @@ out:
 }
 
 /**
- * fill_write_buffer - copy buffer from userspace.
- * @of:open file struct.
- * @buf:   data from user.
- * @count: number of bytes in @userbuf.
+ * flush_write_buffer - push buffer to kobject
+ * @of: open file
+ * @buf: data buffer for file
+ * @count: number of bytes
  *
- * Allocate @of->page if it hasn't been already, then copy the
- * user-supplied buffer into it.
+ * Get the correct pointers for the kobject and the attribute we're dealing
+ * with, then call the store() method for it with @buf.
  */
-static int fill_write_buffer(struct sysfs_open_file *of,
-const char __user *buf, size_t count)
-{
-   int error;
-
-   if (!of->page)
-   of->page = (char *)get_zeroed_page(GFP_KERNEL);
-   if (!of->page)
-   return -ENOMEM;
-
-   if (count >= PAGE_SIZE)
-   count = PAGE_SIZE - 1;
-   error = copy_from_user(of->page, buf, count);
-
-   /*
-* If buf is assumed to contain a string, terminate it by \0, so
-* e.g. sscanf() can scan the string easily.
-*/
-   of->page[count] = 0;
-   return error ? -EFAULT : count;
-}
-
-/**
- * flush_write_buffer - push buffer to kobject.
- * @of:open file
- * @count: number of bytes
- *
- * Get the correct pointers for the kobject and the attribute we're
- * dealing with, then call the store() method for the attribute,
- * passing the buffer that we acquired in fill_write_buffer().
- */
-static int flush_write_buffer(struct sysfs_open_file *of, size_t count)
+static int flush_write_buffer(struct sysfs_open_file *of, char *buf,
+ size_t count)
 {
struct kobject *kobj = of->sd->s_parent->s_dir.kobj;
const struct sysfs_ops *ops;
-   int rc;
+   int rc = 0;
 
-   /* need @of->sd for attr and ops, its parent for kobj */
-   if (!sysfs_get_active(of->sd))
+   /*
+* Need @of->sd for attr and ops, its parent for kobj.  @of->mutex
+* nests outside active ref and is just to ensure that the ops
+* aren't called concurrently for the same open file.
+*/
+   mutex_lock(&of->mutex);
+   if (!sysfs_get_active(of->sd)) {
+   mutex_unlock(&of->mutex);
return -ENODEV;
+   }
 
ops = sysfs_file_ops(of->sd);
-   rc = ops->store(kobj, of->sd->s_attr.attr, of->page, count);
+   rc = ops->store(kobj, of->sd->s_attr.attr, buf, count);
 
sysfs_put_active(of->sd);
+   mutex_unlock(&of->mutex);
 
return rc;
 }
 
 /**
- * sysfs_write_file - write an attribute.
- * @file:  file pointer
- * @buf:   data to write
- * @count: number of bytes
- * @ppos:  starting offset
+ * sysfs_write_file - write an attribute
+ * @file: file pointer
+ * @user_buf: data to write
+ * @count: number of bytes
+ * @ppos: starting offset
  *
- * Similar to sysfs_read_file(), though working in the opposite direction.
- * We allocate and fill the data from the user in fill_write_buffer(),
- * then push it to the kobject in flush_write_buffer().
- * There is no easy way for us to know if userspace is only doing a partial
- * write, so we don't support them. We expect the entire buffer to come
- * on the first write.
- * Hint: if you're writing a value, first read the file, modify only the
- * the value you're changing, then write entire buffer back.
+ * Copy data in from userland and pass it to the matching
+ * sysfs_ops->store() by invoking flush_write_buffer().
+ *
+ * There is no easy way for us to know if userspace is only doing a partial
+ * write, so we don't support them. We expect the entire buffer to come on
+ * the first write.  Hint: if you're writing a value, first read the file,
+ * modify only the the value you're changing, then write entire buffer
+ * back.
  */
-static ssize_t sysfs_write_file(struct file *file, const char __user *buf,

[PATCH 03/14] sysfs: remove sysfs_buffer->ops

2013-09-28 Thread Tejun Heo
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
sysfs_buffer->ops to be used while the file is open.  This patch
removes the caching and makes each operation directly fetch sysfs_ops.
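
A minimal userspace sketch of the "look it up on every operation"
approach is shown below.  The *_sketch types are simplified,
hypothetical stand-ins for kobject, kobj_type and sysfs_ops, and the -1
error return in read_attr() is an arbitrary choice for the example.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct attr_ops_sketch {
	int (*show)(char *buf);
	int (*store)(const char *buf, size_t count);
};

struct ktype_sketch {
	const struct attr_ops_sketch *sysfs_ops;
};

struct kobj_sketch {
	const struct ktype_sketch *ktype;
};

/* Resolve the ops on demand rather than caching them at open time. */
static const struct attr_ops_sketch *file_ops(const struct kobj_sketch *kobj)
{
	return kobj->ktype ? kobj->ktype->sysfs_ops : NULL;
}

/* Each operation fetches the ops itself just before using them. */
static int read_attr(const struct kobj_sketch *kobj, char *buf)
{
	const struct attr_ops_sketch *ops = file_ops(kobj);

	if (!ops || !ops->show)
		return -1;
	return ops->show(buf);
}

/* Example show() callback writing a short string. */
static int show_hello(char *buf)
{
	buf[0] = 'h';
	buf[1] = 'i';
	buf[2] = '\n';
	buf[3] = '\0';
	return 3;
}
```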

This patch doesn't introduce any behavior difference and prepares for
merging regular and bin file support.

Signed-off-by: Tejun Heo 
---
 fs/sysfs/file.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index e2fafc0..7dfcc33 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -45,12 +45,23 @@ struct sysfs_open_dirent {
 struct sysfs_buffer {
size_t  count;
char *page;
-   const struct sysfs_ops  *ops;
struct mutex mutex;
int event;
struct list_head list;
 };
 
+/*
+ * Determine ktype->sysfs_ops for the given sysfs_dirent.  This function
+ * must be called while holding an active reference.
+ */
+static const struct sysfs_ops *sysfs_file_ops(struct sysfs_dirent *sd)
+{
+   struct kobject *kobj = sd->s_parent->s_dir.kobj;
+
+   lockdep_assert_held(sd);
+   return kobj->ktype ? kobj->ktype->sysfs_ops : NULL;
+}
+
 /**
  * fill_read_buffer - allocate and fill buffer from object.
  * @dentry:dentry pointer.
@@ -66,7 +77,7 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
 {
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
-   const struct sysfs_ops *ops = buffer->ops;
+   const struct sysfs_ops *ops;
int ret = 0;
ssize_t count;
 
@@ -80,6 +91,8 @@ static int fill_read_buffer(struct dentry *dentry, struct sysfs_buffer *buffer)
return -ENODEV;
 
buffer->event = atomic_read(&attr_sd->s_attr.open->event);
+
+   ops = sysfs_file_ops(attr_sd);
count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);
 
sysfs_put_active(attr_sd);
@@ -191,13 +204,14 @@ static int flush_write_buffer(struct dentry *dentry,
 {
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
-   const struct sysfs_ops *ops = buffer->ops;
+   const struct sysfs_ops *ops;
int rc;
 
/* need attr_sd for attr and ops, its parent for kobj */
if (!sysfs_get_active(attr_sd))
return -ENODEV;
 
+   ops = sysfs_file_ops(attr_sd);
rc = ops->store(kobj, attr_sd->s_attr.attr, buffer->page, count);
 
sysfs_put_active(attr_sd);
@@ -205,7 +219,6 @@ static int flush_write_buffer(struct dentry *dentry,
return rc;
 }
 
-
 /**
  * sysfs_write_file - write an attribute.
  * @file:  file pointer
@@ -334,14 +347,11 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
return -ENODEV;
 
/* every kobject with an attribute needs a ktype assigned */
-   if (kobj->ktype && kobj->ktype->sysfs_ops)
-   ops = kobj->ktype->sysfs_ops;
-   else {
-   WARN(1, KERN_ERR
-"missing sysfs attribute operations for kobject: %s\n",
-kobject_name(kobj));
+   ops = sysfs_file_ops(attr_sd);
+   if (WARN(!ops, KERN_ERR
+"missing sysfs attribute operations for kobject: %s\n",
+kobject_name(kobj)))
goto err_out;
-   }
 
/* File needs write support.
 * The inode's perms must say it's ok,
@@ -370,7 +380,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file)
goto err_out;
 
mutex_init(&buffer->mutex);
-   buffer->ops = ops;
file->private_data = buffer;
 
/* make sure we have open dirent struct */
-- 
1.8.3.1



[PATCHSET] sysfs: use seq_file and unify regular and bin file handling

2013-09-28 Thread Tejun Heo
Hello,

Currently, sysfs's file handling is a bit weird.

* Regular and bin file paths are similar but implemented completely
  separately, duplicating some hairy logic.

* Read path implements custom buffering which is essentially a
  degenerate seq_file.

In addition, sysfs core implementation is planned to be separated out
so that it can be shared by other subsystems and the current file
handling is too restrictive and quirky to spread further to other
parts of the kernel.  It'd be a lot more desirable to have read path
completely handled by seq_file which is a lot more versatile and would
also increase overall behavior consistency.

This patchset updates file handling such that read is handled by
seq_file and then merges bin file handling into regular file path.
While some patches introduce behavior changes in extreme corner
cases, they are highly unlikely to be noticeable (please refer to the
description of each patch for details) and generally bring sysfs's
behavior closer to that of procfs or any pseudo filesystem which
makes use of seq_file.

After the conversion, LOC is reduced by ~110 lines and read path is
fully handled by seq_file, which allows defining a new seq_file based
core interface which will enable sharing sysfs from other subsystems.

This patchset contains the following patches.

 0001-sysfs-remove-unused-sysfs_buffer-pos.patch
 0002-sysfs-remove-sysfs_buffer-needs_read_fill.patch
 0003-sysfs-remove-sysfs_buffer-ops.patch
 0004-sysfs-add-sysfs_open_file_mutex.patch
 0005-sysfs-rename-sysfs_buffer-to-sysfs_open_file.patch
 0006-sysfs-add-sysfs_open_file-sd-and-file.patch
 0007-sysfs-use-transient-write-buffer.patch
 0008-sysfs-use-seq_file-when-reading-regular-files.patch
 0009-sysfs-prepare-llseek-path-for-unified-regular-bin-fi.patch
 0010-sysfs-prepare-path-write-for-unified-regular-bin-fil.patch
 0011-sysfs-prepare-read-path-for-unified-regular-bin-file.patch
 0012-sysfs-copy-bin-mmap-support-from-fs-sysfs-bin.c-to-f.patch
 0013-sysfs-prepare-open-path-for-unified-regular-bin-file.patch
 0014-sysfs-merge-regular-and-bin-file-handling.patch

0001-0006 are misc preps.

0007 makes the write path use a transient buffer instead of one that
persists for as long as the file is open.

0008 makes read path use seq_file.

0009-0014 merge bin file handling into regular file support.

The patches are on top of

  linus#master c2d95729e3 ("Merge branch 'akpm' (patches from Andrew Morton)")
+ [1] [PATCHSET] sysfs: disentangle kobject namespace handling from sysfs
+ [2] [PATCHSET] sysfs: implement sysfs_remove()

and are available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-sysfs-seq_file

diffstat follows, thanks.

 fs/sysfs/Makefile     |    3
 fs/sysfs/bin.c        |  502
 fs/sysfs/dir.c        |    2
 fs/sysfs/file.c       |  805
 fs/sysfs/inode.c      |    2
 fs/sysfs/sysfs.h      |    6
 include/linux/sysfs.h |    7
 7 files changed, 606 insertions(+), 721 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1560372
[2] http://thread.gmane.org/gmane.linux.kernel/1564002
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] msi: free msi_desc entry only after we've released the kobject

2013-09-28 Thread Veaceslav Falico

On Thu, Sep 26, 2013 at 11:59:51AM +0200, Veaceslav Falico wrote:

Currently, we first do kobject_put(&entry->kobj) and then kfree(entry);
however, kobject_put() doesn't guarantee that it was the last reference
and that the kobj isn't currently used by someone else. So after we
kfree(entry), which embeds the struct kobject, other users will keep using
the freed memory instead of the actual kobject.


Hi Bjorn,

I've seen that you've dropped this bugfix (and the 3 cleanup patches) with
"Changes Requested"; however, I don't recall any request to change this.

I'm really sorry for bugging you, but I need this fix to get included for my
testing to work :(.

This patch fixes the critical bug where we free the entry while the kobject
.../msi_irqs/IRQ_NR is still in use. The issue with the msi_irqs kset
itself still exists, but it really is a different issue, and I'm trying to
find a fix for it.

Thank you, and sorry for the mess and the noise.



Fix this by using the kobject->release callback, which is called last, when
the kobject is indeed no longer used and is being cleaned up. Here that is
msi_kobj_release(), which can do the kfree(entry) safely (kobject_put/cleanup
doesn't use the kobj itself after ->release() has been called, so we're safe).

In case we've failed to create the sysfs directories, just kfree() the
entry, because we don't have the kobjects attached.

Also, remove the same functionality from populate_msi_sysfs(), because on
failure we call free_msi_irqs() anyway, which will take care of all the
kobjects properly.

And add the forgotten pci_dev_put(pdev) in case of failure to register the
kobject in populate_msi_sysfs().

CC: Bjorn Helgaas 
CC: Neil Horman 
CC: Greg Kroah-Hartman 
CC: linux-...@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Veaceslav Falico 
---

Notes:
   v1  -> v2:
   Make it a standalone patch, which is a bugfix, and add the forgotten
   pci_dev_put() so that it won't break bisecting. The pci_dev_put() will
   go away anyway in the following patchset, which removes
   kobject_del and the useless pci_dev_get/put().  Rebased on linux-pci/next.

 drivers/pci/msi.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5f90d6..5d70f49 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -374,19 +374,22 @@ static void free_msi_irqs(struct pci_dev *dev)
 				iounmap(entry->mask_base);
 		}

+		list_del(&entry->list);
+
 		/*
 		 * Its possible that we get into this path
 		 * When populate_msi_sysfs fails, which means the entries
 		 * were not registered with sysfs.  In that case don't
-		 * unregister them.
+		 * unregister them, and just free. Otherwise the
+		 * kobject->release will take care of freeing the entry via
+		 * msi_kobj_release().
 		 */
 		if (entry->kobj.parent) {
 			kobject_del(&entry->kobj);
 			kobject_put(&entry->kobj);
+		} else {
+			kfree(entry);
 		}
-
-		list_del(&entry->list);
-		kfree(entry);
 	}
 }

@@ -509,6 +512,7 @@ static void msi_kobj_release(struct kobject *kobj)
 	struct msi_desc *entry = to_msi_desc(kobj);

 	pci_dev_put(entry->dev);
+	kfree(entry);
 }

 static struct kobj_type msi_irq_ktype = {
@@ -522,7 +526,6 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 	struct msi_desc *entry;
 	struct kobject *kobj;
 	int ret;
-	int count = 0;

 	pdev->msi_kset = kset_create_and_add("msi_irqs", NULL, &pdev->dev.kobj);
 	if (!pdev->msi_kset)
@@ -534,23 +537,13 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 		pci_dev_get(pdev);
 		ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL,
 				     "%u", entry->irq);
-		if (ret)
-			goto out_unroll;
-
-		count++;
+		if (ret) {
+			pci_dev_put(pdev);
+			return ret;
+		}
 	}

 	return 0;
-
-out_unroll:
-	list_for_each_entry(entry, &pdev->msi_list, list) {
-		if (!count)
-			break;
-		kobject_del(&entry->kobj);
-		kobject_put(&entry->kobj);
-		count--;
-	}
-	return ret;
 }

 /**
--
1.8.4
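
The ownership rule this patch relies on can be sketched in user space: an object that embeds a reference count is freed only from the release callback, never by whoever happens to drop a reference. This is a minimal illustration under stated assumptions, not the kernel's kobject code. The names obj, entry, obj_get, obj_put and entry_release are hypothetical, and the counter is not atomic, so the sketch is single-threaded only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as struct kobject. */
struct obj {
	int refcount;
	void (*release)(struct obj *o);	/* runs when the last ref is dropped */
};

struct entry {
	int irq;
	struct obj kobj;	/* embedded, so it shares the entry's lifetime */
};

static int released;	/* counts release() calls, for the demonstration */

static void entry_release(struct obj *o)
{
	/* Recover the containing entry, as container_of() would. */
	struct entry *e = (struct entry *)((char *)o -
					   offsetof(struct entry, kobj));

	free(e);	/* the only place the entry is freed */
	released++;
}

static struct entry *entry_new(int irq)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->irq = irq;
	e->kobj.refcount = 1;	/* the creator's reference */
	e->kobj.release = entry_release;
	return e;
}

static void obj_get(struct obj *o)
{
	o->refcount++;
}

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->release(o);
}
```

If a second user holds a reference via obj_get(), the creator's obj_put() leaves the entry valid; the memory goes away only when the second user also calls obj_put(). Calling free() right after the creator's obj_put() would be the use-after-free described in the commit message.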




Re: [PATCH 19/21] cpuidle: create list of registered drivers

2013-09-28 Thread Paul E. McKenney
On Thu, Sep 26, 2013 at 10:19:14AM +0200, Daniel Lezcano wrote:
> On 09/26/2013 08:17 AM, Viresh Kumar wrote:
> >On 26 September 2013 04:00, Daniel Lezcano  wrote:
> >>If you introduce a list, you will have to introduce a lock to protect
> >>it.
> >
> >I missed it, should have added that :)
> >
> >>This lock will be in the fast path cpuidle_idle_call with the
> >>get_driver function and conforming to the comment: "NOTE: no locks or
> >>semaphores should be used here".
> >>
> >>A lock has been introduced in this function already and the system hangs
> >>with 1024 cpus.
> >
> >Hmm... I see.. I didn't know about this expectation.. What about an RCU
> >read/write lock? As far as I know it's very lightweight... Can we have that
> >in the fast path?
> 
> Nope, we can't use rcu in the idle path :)
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/083054.html

But you should be able to use SRCU in the idle path, if that helps.

Thanx, Paul



Re: [RFC] extending splice for copy offloading

2013-09-28 Thread Ric Wheeler

On 09/28/2013 11:20 AM, Myklebust, Trond wrote:

-Original Message-
From: Miklos Szeredi [mailto:mik...@szeredi.hu]
Sent: Saturday, September 28, 2013 12:50 AM
To: Zach Brown
Cc: J. Bruce Fields; Ric Wheeler; Anna Schumaker; Kernel Mailing List; Linux-
Fsdevel; linux-...@vger.kernel.org; Myklebust, Trond; Schumaker, Bryan;
Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
Subject: Re: [RFC] extending splice for copy offloading

On Fri, Sep 27, 2013 at 10:50 PM, Zach Brown  wrote:

Also, I don't get the first option above at all.  The argument is
that it's safer to have more copies?  How much safety does another
copy on the same disk really give you?  Do systems that do dedup
provide interfaces to turn it off per-file?

I don't find the safety argument very compelling either.  There are real
semantic differences, however: ENOSPC on a write to an
(apparently) already allocated block.  That could be a bit unexpected.  Do we
need a fallocate extension to deal with shared blocks?

The above has been the case for all enterprise storage arrays ever since the
invention of snapshots. The NFSv4.2 spec does allow you to set a per-file
attribute that causes the storage server to always preallocate enough buffers
to guarantee that you can rewrite the entire file; however, the fact that we've
lived without it for said 20 years leads me to believe that demand for it is
going to be limited. I haven't put it at the top of the list of features we
care to implement...

Cheers,
Trond


I agree - this has been common behaviour for a very long time in the array 
space. Even without an array,  this is the same as overwriting a block in btrfs 
or any file system with a read-write LVM snapshot.


Regards,

Ric



Re: Samsung N145 Plus lid state issue on sleep

2013-09-28 Thread Nick Warne
On Thu, Sep 26, 2013 at 06:25:00PM +0100, Nick Warne wrote:
> Hi all,
> 
> I have a strange problem, which has been going on for ages, and I finally 
> decided to look at it (as it is a pain in the arse).
> 
> Brief details:
> 
> Samsung N145 Plus running Slack 14 with handbuilt kernel
> Kernel: Linux 3.11.1 #3 SMP Mon Sep 23 19:09:00 BST 2013 i686 Intel(R) 
> Atom(TM) CPU N455   @ 1.66GHz GenuineIntel GNU/Linux
> I have no modules built in (.config on request if it helps).
> 
> This issue also happened with 'distro' kernel builds... so either it is BIOS 
> issue or hardware fault.  But just in case:
> 
> Boot laptop into console - no X - so running pure acpi events.
> 
> cat /proc/acpi/button/lid/LID0/state
> state:  open
> 
> shut lid
> 
> laptop goes to sleep all great.
> 
> open lid.  Laptop wakes up, video, wlan0 all comes on line, everything hunky 
> dory - but:
> 
> cat /proc/acpi/button/lid/LID0/state
> state:  closed
> 
> The lid is open, of course!
> 
> OK, shut lid.  LCD backlight goes off (so something knows the lid is shut), 
> but no sleep event.  Open lid after a few seconds (maybe 10), and screen 
> lights up and then laptop goes to sleep!
> 
> Shut lid (wait for a few seconds), open lid, laptop wakes up fine again, and 
> now:
> 
> cat /proc/acpi/button/lid/LID0/state
> state:  open
> 
> !
> 
> So it appears that closing the lid flags the 'closed' state but opening it
> doesn't flag the 'open' state... unless I then close it again and open it,
> which then flags the 'closed' state when open, so it goes to sleep.  So now
> open it again, and 'state' now reports 'open' again.  At this point, back to
> square one (confused?  I am!).
> 
> Using Fn [sleep] in any mode above works OK.  The same happens in X using 
> xfce4 PM or similar.
> 
> What is confusing me is that something can see the lid flapping as backlight 
> works on lid open/close.
> 
> acpi_listen reports the events as described above, but I can't work out how 
> to record the events when asleep :)
> 
> Any ideas/help etc. appreciated, and I am also in a position to be able to 
> debug (with help, of course)!

OK, after doing a lot of research, it appears the DSDT is well fubarred.

I have now managed to get a clean build of the extracted DSDT, and testing with
various (LIDS) stuff in the code it seems that something is drastically wrong.

Anyhow, I have now got a decent working DSDT that at least sleeps every time on
lid close, although it then goes to sleep again after the lid is opened, but I
can handle that (reverse of my original problem, almost, but at least lid close
makes it sleep 100%).

Sleep button (Fn Esc) works as it should.

Anybody good at ASL coding?  There is something obviously wrong with the logic
in this code.

Nick
-- 
FSF Associate Member 5508
http://linicks.net/


Re: Out-of-bounds access in get_wchan (arch/x86/kernel/process_64.c)

2013-09-28 Thread Kees Cook
On Sat, Sep 28, 2013 at 09:51:14PM +0200, Wolfram Gloger wrote:
> Kees Cook  writes:
> 
> > Please note that these bounds checks aren't correct to begin with. Since
> > a pointer is being dereferenced, the end boundary must be reduced by
> > sizeof(unsigned long) as well.
> >
> > It looks like process_32.c suffers the same problems, too.
> 
> I can't see the end boundary problem in process_32.c.  The end checks
> are properly reduced with the top_esp and top_ebp macros.
> 
> All I can see in process_32.c is that the check
> 
>   if (bp < stack_page || bp > top_ebp+stack_page)
> 
> could be replaced by:
> 
>   if (bp < stack_page-sizeof(unsigned long) || bp > 
> top_ebp+stack_page)
> 
> but that is a relaxation and not an over/underrun fix.
> 
> Can you elaborate what problem you see in process_32.c?

Ah, yes, sorry, this appears to only be a problem in process_64.c. I didn't look
closely enough. I see now that top_esp and top_ebp correctly reduce the size of
THREAD_SIZE. Thanks!

-Kees

-- 
Kees Cook@outflux.net
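
The end-boundary point can be shown with a small stand-alone check: when an unsigned long is read at an address, the whole object has to fit inside the region, so the highest valid address is the end of the region minus sizeof(unsigned long). This is a sketch under stated assumptions, not the get_wchan() code. The helper name can_read_ulong is hypothetical, and base and size stand in for the stack page and THREAD_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if an unsigned long can be read at addr while staying
 * entirely inside the region [base, base + size).
 */
static bool can_read_ulong(uintptr_t base, size_t size, uintptr_t addr)
{
	if (size < sizeof(unsigned long))
		return false;	/* region too small to hold one at all */
	return addr >= base && addr <= base + size - sizeof(unsigned long);
}
```

A check written as addr < base + size would accept addresses whose last bytes lie past the end of the region, which is the overrun being discussed.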


Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()

2013-09-28 Thread Paul E. McKenney
On Sat, Sep 28, 2013 at 02:48:59PM +0200, Oleg Nesterov wrote:
> On 09/27, Peter Zijlstra wrote:
> >
> > On Fri, Sep 27, 2013 at 08:15:32PM +0200, Oleg Nesterov wrote:
> >
> > > > +static bool cpuhp_readers_active_check(void)
> > > >  {
> > > > +   unsigned int seq = per_cpu_sum(cpuhp_seq);
> > > > +
> > > > +   smp_mb(); /* B matches A */
> > > > +
> > > > +   /*
> > > > +* In other words, if we see __get_online_cpus() cpuhp_seq increment,
> > > > +* we are guaranteed to also see its __cpuhp_refcount increment.
> > > > +*/
> > > >
> > > > +   if (per_cpu_sum(__cpuhp_refcount) != 0)
> > > > +   return false;
> > > >
> > > > +   smp_mb(); /* D matches C */
> > >
> > > It seems that both barriers could be smp_rmb()? I am not sure the comments
> > > from srcu_readers_active_idx_check() can explain mb(),
> 
> To avoid the confusion, I meant "those comments can't explain mb()s here,
> in cpuhp_readers_active_check()".
> 
> > > note that
> > > __srcu_read_lock() always succeeds unlike get_online_cpus().
> 
> And this is where cpu_hotplug_* and synchronize_srcu() differ, see below.
> 
> > I see what you mean; cpuhp_readers_active_check() is all purely reads;
> > there are no writes to order.
> >
> > Paul; is there any argument for the MB here as opposed to RMB;
> 
> Yes, Paul, please ;)

Sorry to be slow -- I will reply by end of Monday Pacific time at the
latest.  I need to allow myself enough time so that it seems new...

Also I might try some mechanical proofs of parts of it.

Thanx, Paul

> > and if
> > not should we change both these and SRCU?
> 
> I guess that SRCU is more "complex" in this respect. IIUC,
> cpuhp_readers_active_check() needs "more" barriers because if
> synchronize_srcu() succeeds it needs to synchronize with the new readers
> which call srcu_read_lock/unlock() "right now". Again, unlike cpu-hotplug
> srcu never blocks the readers, srcu_read_*() always succeeds.
> 
> 
> 
> Hmm. I am wondering why __srcu_read_lock() needs ACCESS_ONCE() to increment
> ->c and ->seq. A plain this_cpu_inc() should be fine?
> 
> And since it disables preemption, why can't it use __this_cpu_inc() to inc
> ->c[idx]? OK, in general __this_cpu_inc() is not irq-safe (rmw) so we can't
> do __this_cpu_inc(seq[idx]), but c[idx] should be fine? If an irq does
> srcu_read_lock() it should also do _unlock.
> 
> But this is minor/offtopic.
> 
> > > >  void cpu_hotplug_done(void)
> > > >  {
> ...
> > > > +   /*
> > > > +* Wait for any pending readers to be running. This ensures readers
> > > > +* after writer and avoids writers starving readers.
> > > > +*/
> > > > +   wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
> > > >  }
> > >
> > > OK, to some degree I can understand "avoids writers starving readers"
> > > part (although the next writer should do synchronize_sched() first),
> > > but could you explain "ensures readers after writer" ?
> >
> > Suppose reader A sees state == BLOCK and goes to sleep; our writer B
> > does cpu_hotplug_done() and wakes all pending readers. If for some
> > reason A doesn't schedule to inc ref until B again executes
> > cpu_hotplug_begin() and state is once again BLOCK, A will not have made
> > any progress.
> 
> Yes, yes, thanks, this is clear. But this explains "writers starving readers".
> And let me repeat, if B again executes cpu_hotplug_begin() it will do
> another synchronize_sched() before it sets BLOCK, so I am not sure we
> need this "in practice".
> 
> I was confused by "ensures readers after writer", I thought this means
> we need the additional synchronization with the readers which are going
> to increment cpuhp_waitcount, say, some sort of barriers.
> 
> Please note that this wait_event() adds a problem... it doesn't allow
> to "offload" the final synchronize_sched(). Suppose a 4k cpu machine
> does disable_nonboot_cpus(), we do not want 2 * 4k * synchronize_sched's
> in this case. We can solve this, but this wait_event() complicates
> the problem.
> 
> Oleg.
> 
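
As background for the per_cpu_sum(__cpuhp_refcount) check quoted above, here is a minimal single-threaded model of why the writer sums the per-CPU counters instead of looking at any one of them: a reader may take its reference on one CPU and drop it on another, so individual counters can be non-zero or even negative while the sum is zero. The sketch deliberately ignores memory ordering, which is the actual subject of this thread, and all model_* names are hypothetical.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* arbitrary size for the model */

/* One counter per modelled CPU; only the sum is meaningful. */
static long model_refcount[MODEL_NR_CPUS];

static void model_get(int cpu)
{
	model_refcount[cpu]++;
}

static void model_put(int cpu)
{
	model_refcount[cpu]--;
}

static long model_sum(void)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += model_refcount[cpu];
	return sum;
}

/* Readers are active exactly when the counters do not sum to zero. */
static bool model_readers_active(void)
{
	return model_sum() != 0;
}
```

After model_get(0) followed by model_put(2), CPU 0 holds 1 and CPU 2 holds -1, yet model_readers_active() reports no readers. In the real code the sum is taken while readers run concurrently, which is why the barrier pairing matters there.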



Re: [rfc][possible solution] RCU vfsmounts

2013-09-28 Thread Linus Torvalds
On Sat, Sep 28, 2013 at 1:27 PM, Al Viro  wrote:
> FWIW, I think I have a kinda-sorta solution for that and I'd like
> to hear your comments on that.  I want to replace vfsmount_lock with seqlock
> and store additional seq number in nameidata, set to vfsmount_seq in the
> beginning and rechecked in unlazy_walk/complete_walk.

Yes, that would be lovely.

> The obvious variant would be to have unlazy_walk/complete_walk to
> grab refcount, check vfsmount_seq and mntput on mismatch.  The trouble
> with that is race with what would've been the final mntput() done by
> umount(2); complete_walk() would drop that temporary reference and
> fail, all right, but... we would get a umount(2) returning without having
> actually shut the filesystem down.  Said shutdown would happen in whoever
> had been doing pathname resolution that stepped into the race.

Sounds reasonable to me.

Side note: I really wish there was some way to avoid having to
finalize the vfsmount entirely for some common things. For example,
"[l]stat[at]()" really doesn't need it for the common cases (network
filesystems may need to revalidate), and is a very critical operation,
and we *could* just look up the inode under RCU and never finalize the
dentry _or_ the vfsmount. However, very annoyingly, the security layer
wants the vfsmount, and we don't know if that is RCU-safe...

Linus


Re: 3.12 regression: i915 warnings

2013-09-28 Thread Daniel Vetter
On Sat, Sep 28, 2013 at 10:33 PM, Woody Suwalski  wrote:
> Warnings are gone with this patch (and the other 4 patches since 3.12-rc2
> from the drm tree).

Ok, the fix should land in stable kernels soonish (depends upon how
quickly the pull request propagates). Thanks for reporting this issue
and testing the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Good day

2013-09-28 Thread Mr. Sun Zhijun
Good day,

My name is Mr. SUN Zhijun; I work with the Bank of China. I need your
assistance in carrying out a transaction valued at $18.5 million, and I would
like to offer you 30% of the total funds as compensation for your assistance
in this transaction. I will inform you of the full details of the transaction
once I receive your reply. If interested, please send me your full details,
as listed below, to my e-mail: sun_zhi...@yahoo.com.hk

1. Full name
2. Private telephone number
3. Current home address

Yours sincerely,
Mr. SUN Zhijun


Re: 3.12 regression: i915 warnings

2013-09-28 Thread Woody Suwalski

Daniel Vetter wrote:

On Fri, Sep 27, 2013 at 7:27 PM, Woody Suwalski  wrote:

Daniel Vetter wrote:

On Thu, Sep 26, 2013 at 2:36 PM, Woody Suwalski 
wrote:

Daniel, I have noticed these warnings on 3.12-rc1, and they did not go away on
3.12-rc2.
I see them only on an EeePC with i915, not on a ThinkPad with Radeon.
It is a 32-bit kernel with overlayfs and TuxOnIce patches, so not perfectly
clean; however, the same config and patches on 3.11 do not show these issues.
No, sorry, I did not have time to investigate further or bisect. If you have a
quick test in mind, I will try ;-)


a) Please always cc: relevant mailing lists, not just your maintainer.
b) Please retest with latest drm-intel-fixes from
http://cgit.freedesktop.org/~danvet/drm-intel/
c) If that doesn't help please boot with drm.debug=0xe, reproduce the
issue once and attach the complete dmesg. Please make sure that it
contains everything from boot-up to the WARN.

Thanks, Daniel

For now I can do the last: a complete dmesg with drm.debug is attached...
I had hoped that you would reply "Yawn, an old issue..." 8-)

Here we go: Mode flag mismatch in the tv encoder, should be fixed with

http://cgit.freedesktop.org/~danvet/drm-intel/commit/?id=1062b81598bc00e2f6620e6f3788f8f8df2f01e7

Pull request is already out and patch is cc: stable, so should show up
in a kernel near you soon.
-Daniel
Warnings are gone with this patch (and the other 4 patches since 3.12-rc2
from the drm tree).


Thanks, Woody




[PATCH 0/3] perf stat cleanups

2013-09-28 Thread David Ahern
Arnaldo:

A few perf-stat cleanups.

David Ahern (3):
  perf stat: Fix misleading message when specifying cpu list or system wide
  perf stat: Don't require a workload when using system wide or CPU options
  perf stat: Add units to nanosec-based counters

 tools/perf/builtin-stat.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

-- 
1.7.10.1



[PATCH 3/3] perf stat: Add units to nanosec-based counters

2013-09-28 Thread David Ahern
Ingo pointed out that the task-clock counter should have the units explicitly
stated since it is not a counter.

Before:

perf stat -a -- sleep 1

 Performance counter stats for 'sleep 1':

  16186.874834 task-clock  #   16.154 CPUs utilized
...

After:

perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

  16146.402138 task-clock (msec)   #   16.125 CPUs utilized
...

Reported-by: Ingo Molnar 
Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/builtin-stat.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 60239fe..746a6db 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -628,10 +628,13 @@ static void nsec_printout(int cpu, int nr, struct perf_evsel *evsel, double avg)
 {
 	double msecs = avg / 1e6;
 	const char *fmt = csv_output ? "%.6f%s%s" : "%18.6f%s%-25s";
+	char name[25];

 	aggr_printout(evsel, cpu, nr);

-	fprintf(output, fmt, msecs, csv_sep, perf_evsel__name(evsel));
+	scnprintf(name, sizeof(name), "%s%s",
+		  perf_evsel__name(evsel), csv_output ? "" : " (msec)");
+	fprintf(output, fmt, msecs, csv_sep, name);

 	if (evsel->cgrp)
 		fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
-- 
1.7.10.1
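
The name handling in this patch can be tried outside of perf with a small stand-alone sketch: the unit suffix is appended only for the human-readable output, and the result is built in a fixed 25-byte buffer as in the patch. This is an illustration under stated assumptions. The helper name format_counter_name is hypothetical, and it uses the standard snprintf(), which returns the untruncated length, whereas the patch uses perf's scnprintf(), which returns the number of characters actually written.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the displayed counter name: plain for CSV output, with a
 * " (msec)" unit suffix otherwise.  Output is truncated to fit dst.
 */
static int format_counter_name(char *dst, size_t size, const char *name,
			       int csv_output)
{
	return snprintf(dst, size, "%s%s", name, csv_output ? "" : " (msec)");
}
```

With a 25-byte buffer, "task-clock" becomes "task-clock (msec)" in normal mode and stays "task-clock" in CSV mode; an event name longer than the buffer allows is cut off at 24 characters, suffix included.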



[PATCH 2/3] perf stat: Don't require a workload when using system wide or CPU options

2013-09-28 Thread David Ahern
perf-stat can collect counters system wide or on one or more CPUs. For
these options, do not require a workload to be specified.

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/builtin-stat.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6cc0aa2..60239fe 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1517,8 +1517,10 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	} else if (big_num_opt == 0) /* User passed --no-big-num */
 		big_num = false;

-	if (!argc && !perf_target__has_task(&target))
+	if (!argc && !perf_target__has_task(&target) &&
+	    !perf_target__has_cpu(&target))
 		usage_with_options(stat_usage, options);
+
 	if (run_count < 0) {
 		usage_with_options(stat_usage, options);
 	} else if (run_count == 0) {
-- 
1.7.10.1



[PATCH 1/3] perf stat: Fix misleading message when specifying cpu list or system wide

2013-09-28 Thread David Ahern
perf-stat displays the command run in its summary output which is misleading
when using a cpu list or system wide collection.

Before:

perf stat -a -- sleep 1

 Performance counter stats for 'sleep 1':

      16152.670249 task-clock                #   16.132 CPUs utilized
               417 context-switches          #    0.002 M/sec
                 7 cpu-migrations            #    0.030 K/sec
...

After:

perf stat -a -- sleep 1

 Performance counter stats for 'system wide':

      16206.931120 task-clock                #   16.144 CPUs utilized
               395 context-switches          #    0.002 M/sec
                 5 cpu-migrations            #    0.030 K/sec
...

or

perf stat -C1 -- sleep 1

 Performance counter stats for 'CPU(s) 1':

       1001.669257 task-clock                #    1.000 CPUs utilized
             4,264 context-switches          #    0.004 M/sec
                 3 cpu-migrations            #    0.003 K/sec
...

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/builtin-stat.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f686d5f..6cc0aa2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1115,7 +1115,11 @@ static void print_stat(int argc, const char **argv)
 	if (!csv_output) {
 		fprintf(output, "\n");
 		fprintf(output, " Performance counter stats for ");
-		if (!perf_target__has_task(&target)) {
+		if (target.system_wide)
+			fprintf(output, "\'system wide");
+		else if (target.cpu_list)
+			fprintf(output, "\'CPU(s) %s", target.cpu_list);
+		else if (!perf_target__has_task(&target)) {
 			fprintf(output, "\'%s", argv[0]);
 			for (i = 1; i < argc; i++)
 				fprintf(output, " %s", argv[i]);
-- 
1.7.10.1



[rfc][possible solution] RCU vfsmounts

2013-09-28 Thread Al Viro
FWIW, I think I have a kinda-sorta solution for that and I'd like
to hear your comments on that.  I want to replace vfsmount_lock with seqlock
and store additional seq number in nameidata, set to vfsmount_seq in the
beginning and rechecked in unlazy_walk/complete_walk.

The obvious variant would be to have unlazy_walk/complete_walk to
grab refcount, check vfsmount_seq and mntput on mismatch.  The trouble
with that is race with what would've been the final mntput() done by
umount(2); complete_walk() would drop that temporary reference and
fail, all right, but... we would get a umount(2) returning without having
actually shut the filesystem down.  Said shutdown would happen in whoever
had been doing pathname resolution that stepped into the race.

I _think_ I have a workable variant:
* new vfsmount flag (MNT_SYNC_UMOUNT or something like that) and
  ability to tell umount_tree() to set that on all victims; done on
  non-lazy umount and on expiry.  Never cleared once set, and set only
  when propagate_mount_busy() has been called and returned true.
  Set before bumping vfsmount_seq.
* rcu_barrier() added in namespace_unlock(), between
  dropping namespace_sem and doing mntput() on the victims.
* unlazy_walk() and complete_walk() use the common helper along
  the lines of

legitimize_mnt(struct vfsmount *mnt, unsigned seq)
{
	if (read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return false;
	}
	mntget(mnt);
	if (!read_seqcount_retry(&vfsmount_seq, seq)) {
		rcu_read_unlock();
		return true;
	}
	if (mnt->mnt_flags & MNT_SYNC_UMOUNT) {
		/* it couldn't have gotten through rcu_barrier() yet */
		mnt_add_count(real_mount(mnt), -1);
		rcu_read_unlock();
		return false;
	}
	rcu_read_unlock();
	mntput(mnt);
	return false;
}

Freeing vfsmounts would be done with rcu delay, vfsmount hash lookups,
d_path(), etc. do the obvious things as we do with rename_lock for dentry
side of things - that stuff is all obvious.  Not ending up with final
mntput() stolen from something that really expects it to be final is the
hard part and it looks like the above would be a solution.

Comments?  AFAICS, that would've killed *all* vfsmount-related locked stores
in RCU-mode pathwalks...
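
The control flow of the proposed helper can be exercised with a single-threaded user-space model. This is only a sketch under stated assumptions: the sequence counter is a plain integer compared for equality (a real seqcount also needs barriers and the odd/even writer protocol), RCU is not modelled at all, and a hook simulates a umount that lands between the two sequence checks. All names (mnt_model, legitimize_model, race_hook and so on) are hypothetical.

```c
#include <stdbool.h>

struct mnt_model {
	int count;		/* reference count */
	bool sync_umount;	/* models the proposed MNT_SYNC_UMOUNT flag */
};

static unsigned mount_seq;	/* models vfsmount_seq */
static int final_puts;		/* how often the last reference was dropped */

/* Optional hook run between the two sequence checks to simulate a race. */
static void (*race_hook)(struct mnt_model *m);

static void model_mntput(struct mnt_model *m)
{
	if (--m->count == 0)
		final_puts++;	/* the real code would shut the fs down here */
}

/* Simulated non-lazy umount racing with the walker: flag first, then bump. */
static void hook_sync_umount(struct mnt_model *m)
{
	m->sync_umount = true;
	mount_seq++;
}

static bool legitimize_model(struct mnt_model *m, unsigned seq)
{
	if (mount_seq != seq)
		return false;		/* stale before we even tried */
	m->count++;			/* models mntget() */
	if (race_hook)
		race_hook(m);
	if (mount_seq == seq)
		return true;		/* nothing changed, reference is ours */
	if (m->sync_umount) {
		m->count--;		/* plain decrement, never the final put */
		return false;
	}
	model_mntput(m);		/* ordinary put for the other cases */
	return false;
}
```

With race_hook set to hook_sync_umount, legitimize_model() fails and leaves the count where it started without ever running the final-put path; that path runs only when the simulated umount later drops its own reference, which is the property the proposal is after.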


Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-28 Thread Yinghai Lu
On Fri, Sep 27, 2013 at 8:05 PM, Benjamin Herrenschmidt
 wrote:
> On Fri, 2013-09-27 at 16:44 -0700, Yinghai Lu wrote:
>
> In the meantime, can you properly submit the other one with the warning
> to Linus ? It will make things more robust overall...

https://patchwork.kernel.org/patch/2959121/


[PATCH] PCI: Workaround missing pci_set_master in pci drivers

2013-09-28 Thread Yinghai Lu
BenH found:
| 928bea964827d7824b548c1f8e06eccbbc4d0d7d
| PCI: Delay enabling bridges until they're needed

break PCI on powerpc.  The reason is that the PCIe port driver will
call pci_enable_device() on the bridge, so device enabled (but skip
pci_set_master because pcie_port_auto and no acpi on powerpc ).

Because of that, pci_enable_bridge() later on (called as a result of the
child device driver doing pci_enable_device) will see the bridge as
already enabled and will not call pci_set_master() on it.

Fix this by adding a check in pci_enable_bridge() that calls pci_set_master()
if the driver skipped it.
That makes the code more robust and wards off problems from a missing
pci_set_master() in drivers.
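
The control flow described above can be sketched with a toy userspace model.
This is only an illustration under stated assumptions: struct toy_dev and
toy_enable_bridge() are hypothetical stand-ins, not struct pci_dev or the real
pci_enable_bridge(), and the return value (number of bridges that needed the
fixup) exists only so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct pci_dev; only the fields the argument needs. */
struct toy_dev {
	struct toy_dev *parent;	/* upstream bridge, NULL at the root */
	bool enabled;		/* models pci_is_enabled() */
	bool is_busmaster;	/* models dev->is_busmaster */
};

/*
 * Models the patched logic: walk up to the root first, then either
 * enable the bridge or, if a driver already enabled it without bus
 * mastering, turn bus mastering on. Returns the number of fixups.
 */
int toy_enable_bridge(struct toy_dev *dev)
{
	int fixed = 0;

	if (!dev)
		return 0;

	fixed += toy_enable_bridge(dev->parent);

	if (dev->enabled) {
		if (!dev->is_busmaster) {
			/* the case the patch warns about */
			dev->is_busmaster = true;
			fixed++;
		}
		return fixed;
	}

	dev->enabled = true;
	dev->is_busmaster = true;
	return fixed;
}
```

With a root bridge that was enabled by a port driver but never made a bus
master, the first call reports one fixup and later calls report none.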

Reported-by: Benjamin Herrenschmidt 
Signed-off-by: Yinghai Lu 

---
 drivers/pci/pci.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1156,8 +1156,14 @@ static void pci_enable_bridge(struct pci
 
pci_enable_bridge(dev->bus->self);
 
-   if (pci_is_enabled(dev))
+   if (pci_is_enabled(dev)) {
+   if (!dev->is_busmaster) {
+   dev_warn(&dev->dev, "driver skip pci_set_master, fix it!\n");
+   pci_set_master(dev);
+   }
return;
+   }
+
retval = pci_enable_device(dev);
if (retval)
dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
On Sat, Sep 28, 2013 at 09:58:04PM +0200, Ingo Molnar wrote:
> It's also probably the first time that code entered on an ordinary
> cell phone has gotten into the Linux kernel, so it's probably a new
> Linux milestone, in a twisted, sick way. ;-)

We could put the following comment *below* the function in
arch/x86/lib/misc.c:

/*
 * Sent from my mobile phone.  Please pardon brevity and lack of formatting.
 */

LoooL.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
On Sat, Sep 28, 2013 at 09:58:33PM +0200, Ingo Molnar wrote:
> 
> * Borislav Petkov  wrote:
> 
> > On Sat, Sep 28, 2013 at 09:49:27PM +0200, Borislav Petkov wrote:
> > > And yes, that one works too.
> > 
> > Btw, just to be thorough, we could handle negative numbers too:
> > 
> > int num_digits(int val)
> > {
> > int m = 10;
> > int d = 1;
> > 
> > if (val < 0)
> > val = -val;
> > 
> > while (val >= m) {
> > m *= 10;
> > d++;
> > }
> > 
> > return d;
> > }
> 
> I like the cell phone version better!

This *is* the cell phone version + negative numbers handling.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


[PATCH] serial: 8250_pci: clean up printk() calls

2013-09-28 Thread Greg Kroah-Hartman
Move the printk() calls to dev_*() instead, to tie into the dynamic
debugging infrastructure.

Also change some "raw" printk() calls to dev_err() to provide a better
error message to userspace so it can properly identify the device and
not just have to guess.

Signed-off-by: Greg Kroah-Hartman 

---
 drivers/tty/serial/8250/8250_pci.c |   40 +
 1 file changed, 19 insertions(+), 21 deletions(-)

--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -9,6 +9,7 @@
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License.
  */
+#undef DEBUG
 #include 
 #include 
 #include 
@@ -27,8 +28,6 @@
 
 #include "8250.h"
 
-#undef SERIAL_DEBUG_PCI
-
 /*
  * init function returns:
  *  > 0 - number of ports
@@ -63,7 +62,7 @@ static int pci_default_setup(struct seri
 
 static void moan_device(const char *str, struct pci_dev *dev)
 {
-   printk(KERN_WARNING
+   dev_err(&dev->dev,
   "%s: %s\n"
   "Please send the output of lspci -vv, this\n"
   "message (0x%04x,0x%04x,0x%04x,0x%04x), the\n"
@@ -233,7 +232,7 @@ static int pci_inteli960ni_init(struct p
/* is firmware started? */
pci_read_config_dword(dev, 0x44, (void *)&oldval);
if (oldval == 0x1000L) { /* RESET value */
-   printk(KERN_DEBUG "Local i960 firmware missing");
+   dev_dbg(&dev->dev, "Local i960 firmware missing\n");
return -ENODEV;
}
return 0;
@@ -827,7 +826,7 @@ static int pci_netmos_9900_numports(stru
if (sub_serports > 0) {
return sub_serports;
} else {
-   printk(KERN_NOTICE "NetMos/Mostech serial driver ignoring port on ambiguous config.\n");
+   dev_err(&dev->dev, "NetMos/Mostech serial driver ignoring port on ambiguous config.\n");
return 0;
}
}
@@ -931,7 +930,7 @@ static int pci_ite887x_init(struct pci_d
}
 
if (!inta_addr[i]) {
-   printk(KERN_ERR "ite887x: could not find iobase\n");
+   dev_err(&dev->dev, "ite887x: could not find iobase\n");
return -ENODEV;
}
 
@@ -1024,9 +1023,9 @@ static int pci_oxsemi_tornado_init(struc
/* Tornado device */
if (deviceID == 0x07000200) {
number_uarts = ioread8(p + 4);
-   printk(KERN_DEBUG
+   dev_dbg(&dev->dev,
"%d ports detected on Oxford PCI Express device\n",
-   number_uarts);
+   number_uarts);
}
pci_iounmap(dev, p);
return number_uarts;
@@ -1349,12 +1348,10 @@ static int skip_tx_en_setup(struct seria
struct uart_8250_port *port, int idx)
 {
port->port.flags |= UPF_NO_TXEN_TEST;
-   printk(KERN_DEBUG "serial8250: skipping TxEn test for device "
- "[%04x:%04x] subsystem [%04x:%04x]\n",
- priv->dev->vendor,
- priv->dev->device,
- priv->dev->subsystem_vendor,
- priv->dev->subsystem_device);
+   dev_dbg(&priv->dev->dev,
+   "serial8250: skipping TxEn test for device [%04x:%04x] subsystem [%04x:%04x]\n",
+   priv->dev->vendor, priv->dev->device,
+   priv->dev->subsystem_vendor, priv->dev->subsystem_device);
 
return pci_default_setup(priv, board, port, idx);
 }
@@ -3361,14 +3358,15 @@ pciserial_init_ports(struct pci_dev *dev
if (quirk->setup(priv, board, &uart, i))
break;
 
-#ifdef SERIAL_DEBUG_PCI
-   printk(KERN_DEBUG "Setup PCI port: port %lx, irq %d, type %d\n",
-  uart.port.iobase, uart.port.irq, uart.port.iotype);
-#endif
+   dev_dbg(&dev->dev, "Setup PCI port: port %lx, irq %d, type %d\n",
+   uart.port.iobase, uart.port.irq, uart.port.iotype);
 
priv->line[i] = serial8250_register_8250_port(&uart);
if (priv->line[i] < 0) {
-   printk(KERN_WARNING "Couldn't register serial port %s: %d\n", pci_name(dev), priv->line[i]);
+   dev_err(&dev->dev,
+   "Couldn't register serial port %lx, irq %d, type %d, error %d\n",
+   uart.port.iobase, uart.port.irq,
+   uart.port.iotype, priv->line[i]);
break;
}
}
@@ -3461,7 +3459,7 @@ pciserial_init_one(struct pci_dev *dev,
}
 
if (ent->driver_data >= ARRAY_SIZE(pci_boards)) {
-   printk(KERN_ERR "pci_init_one: invalid driver_data: %ld\n",
+   dev_err(&dev->dev, "invalid driver_data: %ld\n",

Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Ingo Molnar

* Borislav Petkov  wrote:

> On Sat, Sep 28, 2013 at 09:49:27PM +0200, Borislav Petkov wrote:
> > And yes, that one works too.
> 
> Btw, just to be thorough, we could handle negative numbers too:
> 
> int num_digits(int val)
> {
> int m = 10;
> int d = 1;
> 
> if (val < 0)
> val = -val;
> 
> while (val >= m) {
> m *= 10;
> d++;
> }
> 
> return d;
> }

I like the cell phone version better!

Thanks,

Ingo


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> How about:
> 
> m = 10;
> d = 1;
> 
> while ( val >= m ) {
> m *= 10;
> d++;
> }
> 
> ... and not have a *divide* in there?

Yeah :-)

> Man, entering code on a cell phone sucks...

Indeed that explains the BASIC style! (and you got the 8-space tabs right 
on an HTML mailer - wow, that's impressive.)

It's also probably the first time that code entered on an ordinary cell 
phone has gotten into the Linux kernel, so it's probably a new Linux 
milestone, in a twisted, sick way. ;-)

Thanks,

Ingo


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
On Sat, Sep 28, 2013 at 09:49:27PM +0200, Borislav Petkov wrote:
> And yes, that one works too.

Btw, just to be thorough, we could handle negative numbers too:

int num_digits(int val)
{
int m = 10;
int d = 1;

if (val < 0)
val = -val;

while (val >= m) {
m *= 10;
d++;
}

return d;
}
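
A hedged variant of the function above, as a sketch only: "val = -val"
overflows when val is INT_MIN, so this version computes the magnitude in
unsigned arithmetic and keeps the multiplier in a wider type. The name
num_digits_signed is made up for illustration, and like the original it does
not count the minus sign.

```c
#include <limits.h>

/* Hypothetical name; counts decimal digits of |val| without signed overflow. */
int num_digits_signed(int val)
{
	/* 0u - (unsigned)val is well defined even for INT_MIN */
	unsigned int mag = val < 0 ? 0u - (unsigned int)val : (unsigned int)val;
	unsigned long long m = 10;	/* wide enough that m *= 10 cannot wrap here */
	int d = 1;

	while (mag >= m) {
		m *= 10;
		d++;
	}

	return d;
}
```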

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: Out-of-bounds access in get_wchan (arch/x86/kernel/process_64.c)

2013-09-28 Thread Wolfram Gloger
Kees Cook  writes:

> Please note that these bounds checks aren't correct to begin with. Since
> a pointer is being dereferenced, the end boundary must be reduced by
> sizeof(unsigned long) as well.
>
> It looks like process_32.c suffers the same problems, too.

I can't see the end boundary problem in process_32.c.  The end checks
are properly reduced with the top_esp and top_ebp macros.

All I can see in process_32.c is that the check

if (bp < stack_page || bp > top_ebp+stack_page)

could be replaced by:

if (bp < stack_page-sizeof(unsigned long) || bp > top_ebp+stack_page)

but that is a relaxation and not an over/underrun fix.
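
For reference, the general rule Kees describes (the end boundary shrinks by
the size of the object being read) can be sketched as a standalone helper.
This is not the get_wchan() code; word_in_range() and its parameters are
hypothetical, and it assumes base + size does not wrap.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if a whole unsigned long can be read at addr without leaving
 * the region [base, base + size). The last valid start address is
 * base + size - sizeof(unsigned long), not base + size.
 */
bool word_in_range(unsigned long addr, unsigned long base, size_t size)
{
	if (size < sizeof(unsigned long))
		return false;

	return addr >= base && addr <= base + size - sizeof(unsigned long);
}
```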

Can you elaborate what problem you see in process_32.c?

Regards,
Wolfram.


[PATCH, v2] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-28 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sat, Sep 28, 2013 at 12:37 PM, Ingo Molnar  wrote:
> >
> > -   down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
> > +   down_write_nest_lock(&anon_vma->root->rwlock, &mm->mmap_sem);
> 
> That's just completely bogus, and cannot work.
> 
> Maybe just a "write_lock(&anon_vma->root->rwlock)" (which is just
> anon_vma_lock_write(anon_vma)). But I think we might have a lockdep
> issue. I'm not quite sure what's up with the nesting there.
> 
> > -   if (rwsem_is_locked(&anon_vma->root->rwsem)) {
> > +   if (write_can_lock(&anon_vma->root->rwlock)) {
> > anon_vma_lock_write(anon_vma);
> > anon_vma_unlock_write(anon_vma);
> > }
> 
> That's the wrong way around. It should be
> 
> if (!write_can_lock(&anon_vma->root->rwlock)) {
> 
> so some more testing definitely needed.

Yeah, that silly API asymmetry has bitten me before as well :-/

The attached version booted up fine under 16-way KVM:

 sh-4.2# uptime
  19:50:08 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00

That's all the testing it will get this evening though. Patch should be 
good enough for Tim to try?

Thanks,

Ingo

>
Subject: anon_vmas: Convert the rwsem to an rwlock_t
From: Ingo Molnar 
Date: Sat, 28 Sep 2013 21:37:39 +0200

Here's an almost totally untested patch to convert the anon vma lock to 
an rwlock_t.

I think its lack of modern queueing will hurt on big systems big time - it 
might even regress. But ... it's hard to tell such things in advance.

[ That might as well be for the better as it will eventually be fixed,
  which in turn will improve tasklist_lock workloads ;-) ]

--
 include/linux/mmu_notifier.h |2 +-
 include/linux/rmap.h |   19 +--
 mm/huge_memory.c |4 ++--
 mm/mmap.c|   10 +-
 mm/rmap.c|   24 
 5 files changed, 29 insertions(+), 30 deletions(-)

Signed-off-by: Ingo Molnar 
Index: tip/include/linux/mmu_notifier.h
===
--- tip.orig/include/linux/mmu_notifier.h
+++ tip/include/linux/mmu_notifier.h
@@ -151,7 +151,7 @@ struct mmu_notifier_ops {
  * Therefore notifier chains can only be traversed when either
  *
  * 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwsem).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwlock).
  * 3. No other concurrent thread can access the list (release)
  */
 struct mmu_notifier {
Index: tip/include/linux/rmap.h
===
--- tip.orig/include/linux/rmap.h
+++ tip/include/linux/rmap.h
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
@@ -26,7 +26,7 @@
  */
 struct anon_vma {
struct anon_vma *root;  /* Root of this anon_vma tree */
-   struct rw_semaphore rwsem;  /* W: modification, R: walking the list */
+   rwlock_t rwlock;  /* W: modification, R: walking the list */
/*
 * The refcount is taken on an anon_vma when there is no
 * guarantee that the vma of page tables will exist for
@@ -64,7 +64,7 @@ struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
-   struct rb_node rb;  /* locked by anon_vma->rwsem */
+   struct rb_node rb;  /* locked by anon_vma->rwlock */
unsigned long rb_subtree_last;
 #ifdef CONFIG_DEBUG_VM_RB
unsigned long cached_vma_start, cached_vma_last;
@@ -108,37 +108,36 @@ static inline void vma_lock_anon_vma(str
 {
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
-   down_write(&anon_vma->root->rwsem);
+   write_lock(&anon_vma->root->rwlock);
 }
 
 static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
-   up_write(&anon_vma->root->rwsem);
+   write_unlock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
 {
-   down_write(&anon_vma->root->rwsem);
+   write_lock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
 {
-   up_write(&anon_vma->root->rwsem);
+   write_unlock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
 {
-   down_read(&anon_vma->root->rwsem);
+   read_lock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
 {
-   up_read(&anon_vma->root->rwsem);
+   read_unlock(&anon_vma->root->rwlock);
 }
 
-
 /*
  * anon_vma helper functions.
  */
Index: tip/mm/huge_memory.c
===
--- 

Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
On Sat, Sep 28, 2013 at 12:42:37PM -0700, H. Peter Anvin wrote:
> How about:
> 
> m = 10;
> d = 1;
> 
> while ( val >= m ) {
> m *= 10;
> d++;
> }
> 
> ... and not have a *divide* in there?

Yep, I know why :-)

And yes, that one works too.

> Man, entering code on a cell phone sucks...

Haha, that's why your variables are single letters :-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-28 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sat, Sep 28, 2013 at 12:37 PM, Ingo Molnar  wrote:
> >
> > -   down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
> > +   down_write_nest_lock(&anon_vma->root->rwlock, &mm->mmap_sem);
> 
> That's just completely bogus, and cannot work.

Told you it's totally untested :-) Found that build failure a few minutes 
ago (the place escaped my search pattern), I'm trying the fix below.

Ingo

diff --git a/mm/mmap.c b/mm/mmap.c
index 25ce233..7ee85bf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2955,7 +2955,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 * The LSB of head.next can't change from under us
 * because we hold the mm_all_locks_mutex.
 */
-   down_write_nest_lock(&anon_vma->root->rwlock, &mm->mmap_sem);
+   write_lock(&anon_vma->root->rwlock);
/*
 * We can safely modify head.next after taking the
 * anon_vma->root->rwlock. If some other vma in this mm shares


Re: [PATCH] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-28 Thread Linus Torvalds
On Sat, Sep 28, 2013 at 12:37 PM, Ingo Molnar  wrote:
>
> -   down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
> +   down_write_nest_lock(&anon_vma->root->rwlock, &mm->mmap_sem);

That's just completely bogus, and cannot work.

Maybe just a "write_lock(&anon_vma->root->rwlock)" (which is just
anon_vma_lock_write(anon_vma)). But I think we might have a lockdep
issue. I'm not quite sure what's up with the nesting there.

> -   if (rwsem_is_locked(&anon_vma->root->rwsem)) {
> +   if (write_can_lock(&anon_vma->root->rwlock)) {
> anon_vma_lock_write(anon_vma);
> anon_vma_unlock_write(anon_vma);
> }

That's the wrong way around. It should be

if (!write_can_lock(&anon_vma->root->rwlock)) {

so some more testing definitely needed.
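
The polarity issue can be illustrated with a toy model. These are not the
kernel primitives, just stand-ins with hypothetical names: rwsem_is_locked()
answers "is somebody holding it?", while write_can_lock() answers "is nobody
holding it?", so a mechanical conversion from one to the other needs a
negation.

```c
#include <stdbool.h>

/* Toy lock state; not the kernel's rwlock_t or rw_semaphore. */
struct toy_lock {
	int readers;	/* number of read holders */
	bool writer;	/* true while a writer holds the lock */
};

/* Same sense as rwsem_is_locked(): true when somebody holds the lock. */
bool toy_is_locked(const struct toy_lock *l)
{
	return l->writer || l->readers > 0;
}

/* Same sense as write_can_lock(): true when nobody holds the lock. */
bool toy_write_can_lock(const struct toy_lock *l)
{
	return !toy_is_locked(l);
}
```

In this model, "if (toy_is_locked(l))" corresponds to
"if (!toy_write_can_lock(l))", which is the form suggested above.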

  Linus


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread H. Peter Anvin
How about:

m = 10;
d = 1;

while ( val >= m ) {
m *= 10;
d++;
}

... and not have a *divide* in there?
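
One caveat with the multiply form, sketched here under the assumption of a
32-bit int: once val reaches 1000000000, the next "m *= 10" overflows a
signed int, which is undefined behaviour. For CPU counts that never matters,
but a wider multiplier sidesteps it. The name num_digits_mul is hypothetical.

```c
/* Hypothetical name; multiply-based digit count for non-negative val. */
int num_digits_mul(int val)
{
	long long m = 10;	/* wider than int, so m *= 10 cannot overflow here */
	int d = 1;

	while (val >= m) {
		m *= 10;
		d++;
	}

	return d;
}
```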

Man, entering code on a cell phone sucks...


Ingo Molnar  wrote:
>
>* Borislav Petkov  wrote:
>
>> @@ -2,6 +2,10 @@ int num_digits(int val)
>>  {
>>  int digits = 0;
>>  
>> +/* Handle special case */
>> +if (!val)
>> +return 1;
>> +
>>  while (val) {
>>  val /= 10;
>>  digits++;
>
>Hm. I suspect this could then be written as:
>
>int num_digits(int val)
>{
>   int digits = 0;
>
>   do {
>   val /= 10;
>   digits++;
>   } while (val);
>
>   return digits;
>}
>
>No ugly special case! :-)
>
>Thanks,
>
>   Ingo

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-28 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sat, Sep 28, 2013 at 12:21 PM, Ingo Molnar  wrote:
> >
> > If we do that then I suspect the next step will be queued rwlocks :-/ The
> > current rwlock_t implementation is rather primitive by modern standards.
> > (We'd probably have killed rwlock_t long ago if not for the
> > tasklist_lock.)
> 
> Yeah, I'm not happy about our rwlocks. That's one lock that currently
> is so broken that I think we could easily argue for making that one
> queued.
> 
> Waiman had a qrwlock series that looked reasonable, and I think his
> later versions were drop-in replacements (ie they automatically just
> did the RightThing(tm) wrt interrupts taking a recursive read lock - I
> objected to the first versions that required that to be stated
> explicitly).
> 
> I think Waiman's patches (even the later ones) made the queued rwlocks
> be a side-by-side implementation with the old rwlocks, and I think
> that was just being unnecessarily careful. It might be useful for
> testing to have a config option to switch between the two, but we
> might as well go all the way.
> 
> The old rwlock's really have been a disappointment - they are slower 
> than spinlocks, and seldom/never end up scaling any better.  Their main 
> advantage was literally the irq behavior - allowing readers to happen 
> without the expense of worrying about irq's.

Yeah.

But at least here the read side will not come into play, as the AIM7 workload 
used for the testing exercises the write path exclusively, I think.

Still, the lack of queueing ought to hurt - the question is by how much.

Thanks,

Ingo


Re: [PATCH 2/2] drivers: net: vmxnet3 : vmxnet3_drv.c: removed checkpatch warning related to msleep()

2013-09-28 Thread David Miller

I see only patch #2 and #3.

Sort out why only 2 of the 3 patches were posted, and resend them
all.

Thank you.


[PATCH] anon_vmas: Convert the rwsem to an rwlock_t

2013-09-28 Thread Ingo Molnar

* Ingo Molnar  wrote:

> If we do that then I suspect the next step will be queued rwlocks :-/ 
> The current rwlock_t implementation is rather primitive by modern 
> standards. (We'd probably have killed rwlock_t long ago if not for the 
> tasklist_lock.)
> 
> But yeah, it would work and conceptually a hard spinlock fits something 
> as lowlevel as the anon-vma lock.
> 
> I did a quick review pass and it appears nothing obvious is scheduling 
> with the anon-vma lock held. If it did in a non-obvious way it's likely 
> a bug anyway. The hugepage code grew a lot of logic running under the 
> anon-vma lock, but it all seems atomic.
> 
> So a conversion to rwlock_t could be attempted. (It should be relatively 
> easy patch as well, because the locking operation is now nicely 
> abstracted out.)

Here's a totally untested patch to convert the anon vma lock to an 
rwlock_t.

I think its lack of modern queueing will hurt on big systems big time - it 
might even regress. But ... it's hard to tell such things in advance.

[ That might as well be for the better as it will eventually be fixed, 
  which in turn will improve tasklist_lock workloads ;-) ]

Thanks,

Ingo

->
Subject: anon_vmas: Convert the rwsem to an rwlock_t
From: Ingo Molnar 

--
 include/linux/mmu_notifier.h |  2 +-
 include/linux/rmap.h | 19 +--
 mm/huge_memory.c |  4 ++--
 mm/mmap.c| 10 +-
 mm/rmap.c| 24 
 5 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index deca874..628e807 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -151,7 +151,7 @@ struct mmu_notifier_ops {
  * Therefore notifier chains can only be traversed when either
  *
  * 1. mmap_sem is held.
- * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwsem).
+ * 2. One of the reverse map locks is held (i_mmap_mutex or anon_vma->rwlock).
  * 3. No other concurrent thread can access the list (release)
  */
 struct mmu_notifier {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 6dacb93..f4ab929 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
@@ -26,7 +26,7 @@
  */
 struct anon_vma {
struct anon_vma *root;  /* Root of this anon_vma tree */
-   struct rw_semaphore rwsem;  /* W: modification, R: walking the list */
+   rwlock_t rwlock;  /* W: modification, R: walking the list */
/*
 * The refcount is taken on an anon_vma when there is no
 * guarantee that the vma of page tables will exist for
@@ -64,7 +64,7 @@ struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
-   struct rb_node rb;  /* locked by anon_vma->rwsem */
+   struct rb_node rb;  /* locked by anon_vma->rwlock */
unsigned long rb_subtree_last;
 #ifdef CONFIG_DEBUG_VM_RB
unsigned long cached_vma_start, cached_vma_last;
@@ -108,37 +108,36 @@ static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
 {
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
-   down_write(&anon_vma->root->rwsem);
+   write_lock(&anon_vma->root->rwlock);
 }
 
 static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
 {
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
-   up_write(&anon_vma->root->rwsem);
+   write_unlock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
 {
-   down_write(&anon_vma->root->rwsem);
+   write_lock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
 {
-   up_write(&anon_vma->root->rwsem);
+   write_unlock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
 {
-   down_read(&anon_vma->root->rwsem);
+   read_lock(&anon_vma->root->rwlock);
 }
 
 static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
 {
-   up_read(&anon_vma->root->rwsem);
+   read_unlock(&anon_vma->root->rwlock);
 }
 
-
 /*
  * anon_vma helper functions.
  */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7489884..78f6c08 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1542,7 +1542,7 @@ static int __split_huge_page_splitting(struct page *page,
 * We can't temporarily set the pmd to null in order
 * to split it, the pmd must remain marked huge at all
 * times or the VM won't take the pmd_trans_huge paths
-* and it won't wait on the anon_vma->root->rwsem to
+* and it won't wait on the anon_vma->root->rwlock to
 

[PATCH -v1.3] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
From: Borislav Petkov 

Turn it into (for example):

[0.073380] x86: Booting SMP configuration:
[0.074005]  node   #0, CPUs:  #1   #2   #3   #4   #5   #6   #7
[0.603005]  node   #1, CPUs: #8   #9  #10  #11  #12  #13  #14  #15
[1.25]  node   #2, CPUs:#16  #17  #18  #19  #20  #21  #22  #23
[1.796005]  node   #3, CPUs:#24  #25  #26  #27  #28  #29  #30  #31
[2.393005]  node   #4, CPUs:#32  #33  #34  #35  #36  #37  #38  #39
[2.996005]  node   #5, CPUs:#40  #41  #42  #43  #44  #45  #46  #47
[3.65]  node   #6, CPUs:#48  #49  #50  #51  #52  #53  #54  #55
[4.202005]  node   #7, CPUs:#56  #57  #58  #59  #60  #61  #62  #63
[4.811005]  node   #8, CPUs:#64  #65  #66  #67  #68  #69  #70  #71
[5.421006]  node   #9, CPUs:#72  #73  #74  #75  #76  #77  #78  #79
[6.032005]  node  #10, CPUs:#80  #81  #82  #83  #84  #85  #86  #87
[6.648006]  node  #11, CPUs:#88  #89  #90  #91  #92  #93  #94  #95
[7.262005]  node  #12, CPUs:#96  #97  #98  #99 #100 #101 #102 #103
[7.865005]  node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[8.466005]  node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[9.073006]  node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

While at it, change num_digits() to handle arg of 0 correctly, as Ingo
suggested.
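
As a rough userspace illustration of the column alignment used in the output
above (a sketch, not the kernel code): "%*s" pads the single-space string to
"width - num_digits(cpu)" columns, so every "#N" entry occupies the same
total width. format_cpu() and digits_of() are hypothetical helper names, and
snprintf() stands in for pr_cont().

```c
#include <stdio.h>
#include <stddef.h>

/* Digit count using the do/while form, so 0 yields 1. */
static int digits_of(int val)
{
	int digits = 0;

	do {
		val /= 10;
		digits++;
	} while (val);

	return digits;
}

/*
 * Writes the padded "#cpu" entry into buf; width is the digit count of
 * the largest CPU number plus one for the '#'. Returns snprintf()'s
 * result, i.e. the length of the formatted entry.
 */
int format_cpu(char *buf, size_t len, int cpu, int width)
{
	return snprintf(buf, len, "%*s#%d", width - digits_of(cpu), " ", cpu);
}
```

With width 4 (three digits plus the '#'), CPU 1 is preceded by three spaces
and CPU 127 by one, so both entries are five columns wide.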

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/smpboot.c | 24 +++-
 arch/x86/lib/misc.c   |  4 ++--
 kernel/smp.c  |  7 ++-
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d41f3ba26ced..2a165580fa16 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -647,22 +647,38 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
return (send_status | accept_status);
 }
 
+void smp_announce(void)
+{
+   int num_nodes = num_online_nodes();
+
+   printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n",
+  num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
+}
+
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
static int current_node = -1;
int node = early_cpu_to_node(cpu);
-   static int width;
+   static int width, node_width;
 
if (!width)
width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */
 
+   if (!node_width)
+   node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */
+
+   if (cpu == 1)
+   printk(KERN_INFO "x86: Booting SMP configuration:\n");
+
if (system_state == SYSTEM_BOOTING) {
if (node != current_node) {
if (current_node > (-1))
-   pr_cont(" OK\n");
+   pr_cont("\n");
current_node = node;
-   pr_info("Booting Node %3d, Processors:", node);
+
+   printk(KERN_INFO " node %*s#%d, CPUs:  ",
+  node_width - num_digits(node), " ", node);
}
 
/* Add padding for the BSP */
@@ -671,8 +687,6 @@ static void announce_cpu(int cpu, int apicid)
 
pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
 
-   if (cpu == num_present_cpus() - 1)
-   pr_cont(" OK\n");
} else
pr_info("Booting Node %d Processor %d APIC 0x%x\n",
node, cpu, apicid);
diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
index bc35cde9769f..903dfdc4dfd4 100644
--- a/arch/x86/lib/misc.c
+++ b/arch/x86/lib/misc.c
@@ -2,10 +2,10 @@ int num_digits(int val)
 {
int digits = 0;
 
-   while (val) {
+   do {
val /= 10;
digits++;
-   }
+   } while (val);
 
return digits;
 }
diff --git a/kernel/smp.c b/kernel/smp.c
index 0564571dcdf7..f5768b0c816a 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -524,6 +524,11 @@ void __init setup_nr_cpu_ids(void)
nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
+void __weak smp_announce(void)
+{
+   printk(KERN_INFO "Brought up %d CPUs\n", num_online_cpus());
+}
+
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
@@ -540,7 +545,7 @@ void __init smp_init(void)
}
 
/* Any cleanup work */
-   printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus());
+   smp_announce();
smp_cpus_done(setup_max_cpus);
 }
 
-- 
1.8.4

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-28 Thread Linus Torvalds
On Sat, Sep 28, 2013 at 12:21 PM, Ingo Molnar  wrote:
>
> If we do that then I suspect the next step will be queued rwlocks :-/ The
> current rwlock_t implementation is rather primitive by modern standards.
> (We'd probably have killed rwlock_t long ago if not for the
> tasklist_lock.)

Yeah, I'm not happy about our rwlocks. That's one lock that currently
is so broken that I think we could easily argue for making that one
queued.

Waiman had a qrwlock series that looked reasonable, and I think his
later versions were drop-in replacements (ie they automatically just
did the RightThing(tm) wrt interrupts taking a recursive read lock - I
objected to the first versions that required that to be stated
explicitly).

I think Waiman's patches (even the later ones) made the queued rwlocks
be a side-by-side implementation with the old rwlocks, and I think
that was just being unnecessarily careful. It might be useful for
testing to have a config option to switch between the two, but we
might as well go all the way.

The old rwlock's really have been a disappointment - they are slower
than spinlocks, and seldom/never end up scaling any better.  Their
main advantage was literally the irq behavior - allowing readers to
happen without the expense of worrying about irq's.

  Linus


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Ingo Molnar

* Borislav Petkov  wrote:

> @@ -2,6 +2,10 @@ int num_digits(int val)
>  {
>   int digits = 0;
>  
> + /* Handle special case */
> + if (!val)
> + return 1;
> +
>   while (val) {
>   val /= 10;
>   digits++;

Hm. I suspect this could then be written as:

int num_digits(int val)
{
int digits = 0;

do {
val /= 10;
digits++;
} while (val);

return digits;
}

No ugly special case! :-)
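
As a quick sanity check of the do/while form, here is a minimal userspace
sketch (not part of the patch, and assuming plain hosted C rather than the
kernel build). The function body is the one above, only re-indented:

```c
/*
 * Count the decimal digits of val. The do/while body always runs at
 * least once, so val == 0 yields 1 without a separate special case.
 * A minus sign is not counted as a digit.
 */
int num_digits(int val)
{
	int digits = 0;

	do {
		val /= 10;
		digits++;
	} while (val);

	return digits;
}
```

With this version num_digits(0) and num_digits(7) both give 1, and
num_digits(128) gives 3.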

Thanks,

Ingo


Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-28 Thread Linus Torvalds
On Sat, Sep 28, 2013 at 12:13 PM, Andi Kleen  wrote:
>
> And afaik anon_vma is usually held short.

Yes.

But the problem with anon_vma is that the "usually" may be the 99.9%
case, but then there are some insane loads that do tons of forking
without execve, and they really make some of the rmap code work very
very hard. And then they all not only share that one root vma, but the
mm/rmap.c code ends up having to walk all their VM's because there
could be a page in there somewhere.

These loads aren't necessarily very realistic and very much not
common, but I think AIM7 actually has one of those cases, iirc.

Our anon_vma locking really is some of the more complex parts of the
kernel. Not because of the lock itself, but because of the subtle
rules about the whole anon_vma chain and how we have to lock the root
of the chain etc etc. And under all _normal_ behavior it's not a
problem at all. But I personally dread looking at some of that code,
because if we get anything wrong there (and it's happened), it's too
painful for words.

  Linus


Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-28 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sat, Sep 28, 2013 at 12:41 AM, Ingo Molnar  wrote:
> >
> >
> > Yeah, I fully agree. The reason I'm still very sympathetic to Tim's
> > efforts is that they address a regression caused by a mechanical
> > mutex->rwsem conversion:
> >
> >   5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem
> >
> > ... and Tim's patches turn that regression into an actual speedup.
> 
> Btw, I really hate that thing. I think we should turn it back into a 
> spinlock. None of what it protects needs a mutex or an rwsem.
> 
> Because you guys talk about the regression of turning it into a rwsem, 
> but nobody talks about the *original* regression.
> 
> And it *used* to be a spinlock, and it was changed into a mutex back in 
> 2011 by commit 2b575eb64f7a. That commit doesn't even have a reason 
> listed for it, although my dim memory of it is that the reason was 
> preemption latency.

Yeah, I think it was latency.

> And that caused big regressions too.
> 
> Of course, since then, we may well have screwed things up and now we 
> sleep under it, but I still really think it was a mistake to do it in 
> the first place.
> 
> So if the primary reason for this is really just that f*cking anon_vma 
> lock, then I would seriously suggest:
> 
>  - turn it back into a spinlock (or rwlock_t, since we subsequently
>separated the read and write paths)
> 
>  - fix up any breakage (ie new scheduling points) that exposes
> 
>  - look at possible other approaches wrt latency on that thing.
> 
> Hmm?

If we do that then I suspect the next step will be queued rwlocks :-/ The 
current rwlock_t implementation is rather primitive by modern standards. 
(We'd probably have killed rwlock_t long ago if not for the 
tasklist_lock.)

But yeah, it would work and conceptually a hard spinlock fits something as 
lowlevel as the anon-vma lock.

I did a quick review pass and it appears nothing obvious is scheduling 
with the anon-vma lock held. If it did in a non-obvious way it's likely a 
bug anyway. The hugepage code grew a lot of logic running under the 
anon-vma lock, but it all seems atomic.

So a conversion to rwlock_t could be attempted. (It should be a relatively 
easy patch as well, because the locking operation is now nicely abstracted 
out.)

Thanks,

Ingo


Re: [PATCH 11/12] netfilter: Remove extern from function prototypes

2013-09-28 Thread Jan Engelhardt
On Monday 2013-09-23 20:37, Joe Perches wrote:

>There is a mix of function prototypes with and without extern
>in the kernel sources.  Standardize on not using extern for
>function prototypes.
>
>Function prototypes don't need to be written with extern.
>extern is assumed by the compiler.  Its use is as unnecessary as
>using auto to declare automatic/local variables in a block.

Or you could just extern all functions for consistency with variables.
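
For readers following the argument, a minimal sketch of the language rule
being discussed. All names here are hypothetical and not taken from the
netfilter headers. For functions, a declaration with and without extern
means the same thing; for objects, extern is what separates a declaration
from a definition:

```c
/* Function declarations: "extern" is implied, so these two forms are equivalent. */
extern int add_explicit(int a, int b);
int add_implicit(int a, int b);

/* Object declarations differ: dropping "extern" here would make it a definition. */
extern int shared_counter;	/* declaration only */
int shared_counter = 0;		/* the single definition */

int add_explicit(int a, int b)
{
	return a + b;
}

int add_implicit(int a, int b)
{
	shared_counter++;	/* touch the object to show both declarations name it */
	return a + b;
}
```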


Re: [PATCH -v1.1] x86, boot: Further compress CPUs bootup message

2013-09-28 Thread Borislav Petkov
On Sat, Sep 28, 2013 at 08:12:36PM +0200, Ingo Molnar wrote:
> 
> * Borislav Petkov  wrote:
> 
> > -   pr_info("Booting Node %3d, Processors:", node);
> > +   printk(KERN_INFO " node #%3d, CPUs:  ", node);
> 
> I think this should use %*s# and num_digits(node) as well?

How's that:

[0.073380] x86: Booting SMP configuration:
[0.074005]  node   #0, CPUs:  #1   #2   #3   #4   #5   #6   #7
[0.603005]  node   #1, CPUs: #8   #9  #10  #11  #12  #13  #14  #15
[1.25]  node   #2, CPUs:#16  #17  #18  #19  #20  #21  #22  #23
[1.796005]  node   #3, CPUs:#24  #25  #26  #27  #28  #29  #30  #31
[2.393005]  node   #4, CPUs:#32  #33  #34  #35  #36  #37  #38  #39
[2.996005]  node   #5, CPUs:#40  #41  #42  #43  #44  #45  #46  #47
[3.65]  node   #6, CPUs:#48  #49  #50  #51  #52  #53  #54  #55
[4.202005]  node   #7, CPUs:#56  #57  #58  #59  #60  #61  #62  #63
[4.811005]  node   #8, CPUs:#64  #65  #66  #67  #68  #69  #70  #71
[5.421006]  node   #9, CPUs:#72  #73  #74  #75  #76  #77  #78  #79
[6.032005]  node  #10, CPUs:#80  #81  #82  #83  #84  #85  #86  #87
[6.648006]  node  #11, CPUs:#88  #89  #90  #91  #92  #93  #94  #95
[7.262005]  node  #12, CPUs:#96  #97  #98  #99 #100 #101 #102 #103
[7.865005]  node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[8.466005]  node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[9.073006]  node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[9.679901] x86: Booted up 16 nodes, 128 CPUs

?

Btw, num_digits() needed to handle the val==0 case :)
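
The "%*s#" padding suggested above can be tried in userspace with the
sketch below. format_cpu_label() is a hypothetical helper, not a function
from the patch, and it pads with an empty string, which is an assumption
about how one might write it rather than the exact kernel format call:

```c
#include <stdio.h>

/* Digit counter as discussed in this thread; returns 1 for val == 0. */
static int num_digits(int val)
{
	int digits = 0;

	do {
		val /= 10;
		digits++;
	} while (val);

	return digits;
}

/*
 * Hypothetical helper: write "#<cpu>" into buf, left-padded with spaces
 * so that every label has the same length for a given max_cpus.
 * Returns what snprintf() returns (the label length when buf is big enough).
 */
static int format_cpu_label(char *buf, size_t len, int cpu, int max_cpus)
{
	int width = num_digits(max_cpus) + 1;	/* + '#' sign */

	/* "%*s" with an empty string emits exactly (width - digits) spaces */
	return snprintf(buf, len, "%*s#%d", width - num_digits(cpu), "", cpu);
}
```

For max_cpus = 128 the width is 4, so CPU 1 gets three spaces of padding
and CPU 127 gets one, and the columns line up.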

---
From: Borislav Petkov 
Date: Sat, 28 Sep 2013 19:37:18 +0200
Subject: [PATCH -v1.2] x86, boot: Further compress CPUs bootup message

Turn it into (for example):

[0.073380] x86: Booting SMP configuration:
[0.074005]  node   #0, CPUs:  #1   #2   #3   #4   #5   #6   #7
[0.603005]  node   #1, CPUs: #8   #9  #10  #11  #12  #13  #14  #15
[1.25]  node   #2, CPUs:#16  #17  #18  #19  #20  #21  #22  #23
[1.796005]  node   #3, CPUs:#24  #25  #26  #27  #28  #29  #30  #31
[2.393005]  node   #4, CPUs:#32  #33  #34  #35  #36  #37  #38  #39
[2.996005]  node   #5, CPUs:#40  #41  #42  #43  #44  #45  #46  #47
[3.65]  node   #6, CPUs:#48  #49  #50  #51  #52  #53  #54  #55
[4.202005]  node   #7, CPUs:#56  #57  #58  #59  #60  #61  #62  #63
[4.811005]  node   #8, CPUs:#64  #65  #66  #67  #68  #69  #70  #71
[5.421006]  node   #9, CPUs:#72  #73  #74  #75  #76  #77  #78  #79
[6.032005]  node  #10, CPUs:#80  #81  #82  #83  #84  #85  #86  #87
[6.648006]  node  #11, CPUs:#88  #89  #90  #91  #92  #93  #94  #95
[7.262005]  node  #12, CPUs:#96  #97  #98  #99 #100 #101 #102 #103
[7.865005]  node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[8.466005]  node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[9.073006]  node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/smpboot.c | 24 +++-
 arch/x86/lib/misc.c   |  4 
 kernel/smp.c  |  7 ++-
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d41f3ba26ced..2a165580fa16 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -647,22 +647,38 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
return (send_status | accept_status);
 }
 
+void smp_announce(void)
+{
+   int num_nodes = num_online_nodes();
+
+   printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n",
+  num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
+}
+
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
static int current_node = -1;
int node = early_cpu_to_node(cpu);
-   static int width;
+   static int width, node_width;
 
if (!width)
width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */
 
+   if (!node_width)
+   node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */
+
+   if (cpu == 1)
+   printk(KERN_INFO "x86: Booting SMP configuration:\n");
+
if (system_state == SYSTEM_BOOTING) {
if (node != current_node) {
if (current_node > (-1))
-   pr_cont(" OK\n");
+   pr_cont("\n");
current_node = node;
-   pr_info("Booting Node %3d, Processors:", node);
+
+   

[PATCH 4/7] perf trace: Add record option

2013-09-28 Thread David Ahern
record option is a convenience alias to include the -e raw_syscalls:*
argument to perf-record. All other options are passed to perf-record's
handler. Resulting data file can be analyzed by perf-trace -i.

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/Documentation/perf-trace.txt |   12 +---
 tools/perf/builtin-trace.c  |   32 +++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index b927e6f..6728b00 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 
 [verse]
 'perf trace'
+'perf trace record'
 
 DESCRIPTION
 ---
@@ -16,9 +17,14 @@ This command will show the events associated with the target, initially
 syscalls, but other system events like pagefaults, task lifetime events,
 scheduling events, etc.
 
-Initially this is a live mode only tool, but eventually will work with
-perf.data files like the other tools, allowing a detached 'record' from
-analysis phases.
+This is a live mode tool in addition to working with perf.data files like
+the other perf tools. Files can be generated using the 'perf record' command
+but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*').
+Alternatively, the 'perf trace record' can be used as a shortcut to
+automatically include the raw_syscalls events when writing events to a file.
+
+The following options apply to perf trace; options to perf trace record are
+found in the perf record man page.
 
 OPTIONS
 ---
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 420e48b..ab42148 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1538,6 +1538,33 @@ static void perf_evlist__add_vfs_getname(struct perf_evlist *evlist)
perf_evlist__add(evlist, evsel);
 }
 
+static int trace__record(int argc, const char **argv)
+{
+   unsigned int rec_argc, i, j;
+   const char **rec_argv;
+   const char * const record_args[] = {
+   "record",
+   "-R",
+   "-m", "1024",
+   "-c", "1",
+   "-e", "raw_syscalls:sys_enter,raw_syscalls:sys_exit",
+   };
+
+   rec_argc = ARRAY_SIZE(record_args) + argc;
+   rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+   if (rec_argv == NULL)
+   return -ENOMEM;
+
+   for (i = 0; i < ARRAY_SIZE(record_args); i++)
+   rec_argv[i] = record_args[i];
+
+   for (j = 0; j < (unsigned int)argc; j++, i++)
+   rec_argv[i] = argv[j];
+
+   return cmd_record(i, rec_argv, NULL);
+}
+
 static int trace__run(struct trace *trace, int argc, const char **argv)
 {
struct perf_evlist *evlist = perf_evlist__new();
@@ -1857,6 +1884,8 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
const char * const trace_usage[] = {
"perf trace [<options>] [<command>]",
"perf trace [<options>] -- <command> [<options>]",
+   "perf trace record [<options>] [<command>]",
+   "perf trace record [<options>] -- <command> [<options>]",
NULL
};
struct trace trace = {
@@ -1917,6 +1946,9 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
int err;
char bf[BUFSIZ];
 
+   if ((argc > 1) && (strcmp(argv[1], "record") == 0))
+   return trace__record(argc-2, &argv[2]);
+
argc = parse_options(argc, argv, trace_options, trace_usage, 0);
 
if (output_name != NULL) {
-- 
1.7.10.1
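
The argument handling in trace__record() above boils down to prepending a
fixed argument list to whatever the user passed. A stand-alone sketch of
that merge step follows. merge_argv() and its parameters are hypothetical
names for illustration, and the sketch does not call cmd_record():

```c
#include <stdlib.h>

/*
 * Hypothetical stand-alone version of the argv merge: returns a
 * NULL-terminated array holding `fixed` followed by `extra`, or NULL on
 * allocation failure. Only the array is allocated; the strings are
 * shared with the inputs, so the caller frees just the array.
 */
static const char **merge_argv(const char * const *fixed, size_t nfixed,
			       const char * const *extra, size_t nextra,
			       size_t *out_argc)
{
	size_t i, j;
	const char **merged = calloc(nfixed + nextra + 1, sizeof(*merged));

	if (merged == NULL)
		return NULL;

	/* fixed arguments first ... */
	for (i = 0; i < nfixed; i++)
		merged[i] = fixed[i];

	/* ... then the caller's arguments, continuing at index i */
	for (j = 0; j < nextra; j++, i++)
		merged[i] = extra[j];

	*out_argc = i;	/* merged[i] is still NULL from calloc() */
	return merged;
}
```

Allocating one extra slot keeps the result NULL-terminated, which is the
same reason the patch sizes its array as rec_argc + 1.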



[PATCH 1/7] perf util: Add machine method to loop over threads and invoke handler

2013-09-28 Thread David Ahern
Loop over all threads within a machine, including threads moved to the
dead threads list, and invoke a function. This allows commands to run
some specific function on each thread (e.g., dump statistics) yet hides
how the threads are maintained within the machine.

Signed-off-by: David Ahern 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/util/machine.c |   23 +++
 tools/perf/util/machine.h |4 
 2 files changed, 27 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 40083df..d5055be 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1376,3 +1376,26 @@ int machine__resolve_callchain(struct machine *machine,
   sample);
 
 }
+
+int machine__for_each_thread(struct machine *machine,
+int (*fn)(struct thread *thread, void *p),
+void *priv)
+{
+   struct rb_node *nd;
+   struct thread *thread;
+   int rc = 0;
+
+   for (nd = rb_first(&machine->threads); nd; nd = rb_next(nd)) {
+   thread = rb_entry(nd, struct thread, rb_node);
+   rc = fn(thread, priv);
+   if (rc != 0)
+   return rc;
+   }
+
+   list_for_each_entry(thread, &machine->dead_threads, node) {
+   rc = fn(thread, priv);
+   if (rc != 0)
+   return rc;
+   }
+   return rc;
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 58a6be1..dc9c57e 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -165,4 +165,8 @@ void machines__destroy_kernel_maps(struct machines *machines);
 
 size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 
+int machine__for_each_thread(struct machine *machine,
+int (*fn)(struct thread *thread, void *p),
+void *priv);
+
 #endif /* __PERF_MACHINE_H */
-- 
1.7.10.1
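
The calling convention of machine__for_each_thread() (invoke a callback
per thread, stop on the first non-zero return) can be illustrated in
isolation. The sketch below uses a hypothetical, simplified struct thread
on a plain linked list instead of perf's rbtree and dead-threads list, so
it shows the pattern only, not the perf data structures:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's struct thread. */
struct thread {
	int tid;
	struct thread *next;
};

/*
 * Call fn() on each thread in a singly linked list. A non-zero return
 * from fn() stops the walk and is handed back to the caller, the same
 * early-exit convention the patch uses.
 */
static int for_each_thread(struct thread *head,
			   int (*fn)(struct thread *thread, void *p),
			   void *priv)
{
	struct thread *t;
	int rc = 0;

	for (t = head; t != NULL; t = t->next) {
		rc = fn(t, priv);
		if (rc != 0)
			return rc;
	}
	return rc;
}

/* Example callback: count threads, abort with -1 on a negative tid. */
static int count_thread(struct thread *thread, void *p)
{
	int *count = p;

	if (thread->tid < 0)
		return -1;
	(*count)++;
	return 0;
}
```

The void *priv argument lets each caller carry its own state (a counter
here, a FILE pointer for a statistics dump, and so on) without the
iterator knowing about it.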



[PATCH 3/7] perf trace: Fix comm resolution when reading events from file

2013-09-28 Thread David Ahern
Task comms are getting lost when processing events from a file. The problem is
that the trace struct used by live processing has its own host machine, and the
perf-session used for file-based processing has a separate host machine. Fix by
having both references point to the same machine.

Before:

 0.030 ( 0.001 ms): :27743/27743 brk( ...
 0.057 ( 0.004 ms): :27743/27743 mmap(len: 4096, prot: READ|WRITE, flags: ...
 0.075 ( 0.006 ms): :27743/27743 access(filename: 0x7f3809fbce00, mode: R ...
 0.091 ( 0.005 ms): :27743/27743 open(filename: 0x7f3809fba14c, flags: CLOEXEC ...
...

After:
 0.030 ( 0.001 ms): make/27743 brk( ...
 0.057 ( 0.004 ms): make/27743 mmap(len: 4096, prot: READ|WRITE, flags: ...
 0.075 ( 0.006 ms): make/27743 access(filename: 0x7f3809fbce00, mode: R ...
 0.091 ( 0.005 ms): make/27743 open(filename: 0x7f3809fba14c, flags: CLOEXEC ...
...
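
Why an embedded machine loses the comms while a pointer does not can be
shown with a stripped-down sketch. Every type and function name below is
a hypothetical stand-in, not the perf structures themselves:

```c
/* Hypothetical, stripped-down stand-ins for the perf structures. */
struct machine {
	int nr_comms;			/* number of comm events seen */
};

struct session {
	struct machine host;		/* the session owns the machine */
};

struct trace_embedded {
	struct machine host;		/* old layout: a private, separate machine */
};

struct trace_shared {
	struct machine *host;		/* new layout: reference to the session's machine */
};

/* Stand-in for processing one comm event against a machine. */
static void record_comm(struct machine *m)
{
	m->nr_comms++;
}
```

If events are recorded against the session's machine, a trace_embedded
never sees them, while a trace_shared whose host points at the session's
machine sees every update.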

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/builtin-trace.c |   29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 0c93928..420e48b 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -892,7 +892,7 @@ struct trace {
struct syscall  *table;
} syscalls;
struct perf_record_opts opts;
-   struct machine  host;
+   struct machine  *host;
u64 base_time;
boolfull_time;
FILE*output;
@@ -1079,26 +1079,33 @@ static int trace__tool_process(struct perf_tool *tool,
 
 static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 {
+   struct machine *host;
int err = symbol__init();
 
if (err)
return err;
 
-   machine__init(&trace->host, "", HOST_KERNEL_ID);
-   machine__create_kernel_maps(&trace->host);
+   host = malloc(sizeof(*host));
+   if (host == NULL)
+   return -ENOMEM;
+
+   machine__init(host, "", HOST_KERNEL_ID);
+   machine__create_kernel_maps(host);
 
if (perf_target__has_task(&trace->opts.target)) {
err = perf_event__synthesize_thread_map(&trace->tool, evlist->threads,
trace__tool_process,
-   &trace->host);
+   host);
} else {
err = perf_event__synthesize_threads(&trace->tool, trace__tool_process,
-   &trace->host);
+   host);
}
 
if (err)
symbol__exit();
 
+   trace->host = host;
+
return err;
 }
 
@@ -1303,7 +1310,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
if (sc->filtered)
return 0;
 
-   thread = machine__findnew_thread(&trace->host, sample->pid,
+   thread = machine__findnew_thread(trace->host, sample->pid,
 sample->tid);
ttrace = thread__trace(thread, trace->output);
if (ttrace == NULL)
@@ -1357,7 +1364,7 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
if (sc->filtered)
return 0;
 
-   thread = machine__findnew_thread(&trace->host, sample->pid,
+   thread = machine__findnew_thread(trace->host, sample->pid,
 sample->tid);
ttrace = thread__trace(thread, trace->output);
if (ttrace == NULL)
@@ -1427,7 +1434,7 @@ static int trace__sched_stat_runtime(struct trace *trace, struct perf_evsel *evs
 {
 u64 runtime = perf_evsel__intval(evsel, sample, "runtime");
double runtime_ms = (double)runtime / NSEC_PER_MSEC;
-   struct thread *thread = machine__findnew_thread(&trace->host,
+   struct thread *thread = machine__findnew_thread(trace->host,
sample->pid,
sample->tid);
struct thread_trace *ttrace = thread__trace(thread, trace->output);
@@ -1628,7 +1635,7 @@ again:
trace->base_time = sample.time;
 
if (type != PERF_RECORD_SAMPLE) {
-   trace__process_event(trace, &trace->host, event);
+   trace__process_event(trace, trace->host, event);
continue;
}
 
@@ -1722,6 +1729,8 @@ static int trace__replay(struct trace *trace)
if (session == NULL)
return -ENOMEM;
 
+   trace->host = &session->machines.host;
+
err = perf_session__set_tracepoints_handlers(session, handlers);
if (err)
goto out;
@@ 
