[PATCH, REGRESSION v4] mm: make apply_to_page_range more robust

2016-01-21 Thread Mika Penttilä
Recent changes (4.4.0+) in the module loader triggered an oops on ARM:

The module in question is the in-tree module:
drivers/misc/ti-st/st_drv.ko

The BUG is here:

[ 53.638335] [ cut here ]
[ 53.642967] kernel BUG at mm/memory.c:1878!
[ 53.647153] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 53.652987] Modules linked in:
[ 53.656061] CPU: 0 PID: 483 Comm: insmod Not tainted 4.4.0 #3
[ 53.661808] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 53.668338] task: a989d400 ti: 9e6a2000 task.ti: 9e6a2000
[ 53.673751] PC is at apply_to_page_range+0x204/0x224
[ 53.678723] LR is at change_memory_common+0x90/0xdc
[ 53.683604] pc : [<800ca0ec>] lr : [<8001d668>] psr: 600b0013
[ 53.683604] sp : 9e6a3e38 ip : 8001d6b4 fp : 7f0042fc
[ 53.695082] r10:  r9 : 9e6a3e90 r8 : 0080
[ 53.700309] r7 :  r6 : 7f008000 r5 : 7f008000 r4 : 7f008000
[ 53.706837] r3 : 8001d5a4 r2 : 7f008000 r1 : 7f008000 r0 : 80b8d3c0
[ 53.713368] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 53.720504] Control: 10c5387d Table: 2e6b804a DAC: 0055
[ 53.726252] Process insmod (pid: 483, stack limit = 0x9e6a2210)
[ 53.732173] Stack: (0x9e6a3e38 to 0x9e6a4000)
[ 53.736532] 3e20: 7f007fff 7f008000
[ 53.744714] 3e40: 80b8d3c0 80b8d3c0  7f007000 7f00426c 7f008000 
 7f008000
[ 53.752895] 3e60: 7f004140 7f008000  0080   
7f0042fc 8001d668
[ 53.761076] 3e80: 9e6a3e90  8001d6b4 7f00426c 0080  
9e6a3f58 7f004140
[ 53.769257] 3ea0: 7f004240 7f00414c  8008bbe0  7f00 
 
[ 53.777438] 3ec0: a8b12f00 0001cfd4 7f004250 7f004240 80b8159c  
00e0 7f0042fc
[ 53.785619] 3ee0: c183d000 74f8 18fd  0b3c  
 7f002024
[ 53.793800] 3f00: 0002      
 
[ 53.801980] 3f20:     0040  
0003 0001cfd4
[ 53.810161] 3f40: 017b 8000f7e4 9e6a2000  0002 8008c498 
c183d000 74f8
[ 53.818342] 3f60: c1841588 c1841409 c1842950 5000 52a0  
 
[ 53.826523] 3f80: 0023 0024 001a 001e 0016  
 
[ 53.834703] 3fa0: 003e3d60 8000f640   0003 0001cfd4 
 003e3d60
[ 53.842884] 3fc0:   003e3d60 017b 003e3d20 7eabc9d4 
76f2c000 0002
[ 53.851065] 3fe0: 7eabc990 7eabc980 00016320 76e81d00 600b0010 0003 
 
[ 53.859256] [<800ca0ec>] (apply_to_page_range) from [<8001d668>] (change_memory_common+0x90/0xdc)
[ 53.868139] [<8001d668>] (change_memory_common) from [<8008bbe0>] (load_module+0x194c/0x2068)
[ 53.876671] [<8008bbe0>] (load_module) from [<8008c498>] (SyS_finit_module+0x64/0x74)
[ 53.884512] [<8008c498>] (SyS_finit_module) from [<8000f640>] (ret_fast_syscall+0x0/0x34)
[ 53.892694] Code: e0834104 eabc e51a1008 eaac (e7f001f2)
[ 53.898792] ---[ end trace fe43fc78ebde29a3 ]---


apply_to_page_range() is called with a zero length, which triggers:

  BUG_ON(addr >= end)

This is a regression and a consequence of the changes in module section handling.

BUG_ON() is not needed here. Keeping it would require checking every call site,
because there may be callers that expect a zero size to succeed, and BUG_ON()
gives an easy way to DoS the system.

With this patch, loading this module produces a warning, but that can be
handled in arch code with a separate patch.

v2: add more explanation
v3: added even more explanation and stack trace, tagged as regression
v4: change BUG_ON() to WARN_ON() and return -EINVAL

Signed-off-by: Mika Penttilä mika.pentt...@nextfour.com
---

diff --git a/mm/memory.c b/mm/memory.c
index 30991f8..9178ee6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1871,7 +1871,9 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
 	unsigned long end = addr + size;
 	int err;
 
-	BUG_ON(addr >= end);
+	if (WARN_ON(addr >= end))
+		return -EINVAL;
+
 	pgd = pgd_offset(mm, addr);
 	do {
 		next = pgd_addr_end(addr, end);

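As a rough illustration of the behavioural change in the patch above, here is a minimal user-space sketch, not kernel code. The function name range_end() and the fprintf() warning are assumptions made for this example; the only thing taken from the patch is the `addr >= end` test, which rejects a zero size as well as an address range that wraps around.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space analogue of the patched check: compute the end of
 * [addr, addr + size) and reject an empty or wrapped range with -EINVAL
 * instead of aborting the whole program.
 */
static int range_end(unsigned long addr, unsigned long size,
		     unsigned long *end)
{
	assert(end != NULL);		/* programming error, not bad input */

	*end = addr + size;		/* unsigned arithmetic wraps on overflow */
	if (addr >= *end) {
		/* stands in for WARN_ON(): complain loudly, but keep running */
		fprintf(stderr, "warning: bad range addr=%#lx size=%#lx\n",
			addr, size);
		return -EINVAL;
	}
	return 0;
}
```

A caller that passes a zero size gets -EINVAL back and can decide for itself whether that is an error, which is the point of replacing BUG_ON() with a warning and an error return.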



Re: [perf/x86] 75925e1ad7: BUG: unable to handle kernel paging request at 000045b8

2016-01-21 Thread Peter Zijlstra
On Fri, Jan 22, 2016 at 12:33:24PM +0800, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit 75925e1ad7f5a4e867bd14ff8e7f114ea1596434
> Author: Andi Kleen 
> AuthorDate: Thu Oct 22 15:07:21 2015 -0700
> Commit: Ingo Molnar 
> CommitDate: Mon Nov 23 09:58:25 2015 +0100
> 
> perf/x86: Optimize stack walk user accesses
> 
> Change the perf user stack walking to use the new
> __copy_from_user_nmi(), and split each access into word sized transfer
> sizes. This allows to inline the complete access and optimize it all
> into a single load.

Andi, please have a look at this. Also note that x86_64
__copy_from_user_nocheck() actually supports .size=16.


Re: [PATCH 0/2] Fix for ADJ_SETOFFSET w/ ADJ_NANO

2016-01-21 Thread Thomas Gleixner
On Thu, 21 Jan 2016, Shuah Khan wrote:
> On 01/21/2016 04:03 PM, John Stultz wrote:
> > David Herrmann mailed me pointing out that one of the
> > changes that landed in 4.5-rc broke users of ADJ_SETOFFSET
> > when used with ADJ_NANO.
> > 
> > I've implemented a fix to this issue and also introduced
> > more unit tests to validate these going forward.
> > 
> > Thomas: Can you queue the first patch for tip/timers/urgent?
> > 
> > Shuah: The kselftests patch can wait to the next merge window
> > if you'd prefer.
> 
> Yeah. Probably it has to wait until the next merge window as
> this is a new test. I can pull this into linux-kselftest next
> after merge window closes.

We really should not delay selftests, especially if they have been written
along with a fix for a recently detected problem.

Thanks,

tglx




Re: [PATCH 4.3 00/55] 4.3.4-stable review

2016-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2016 at 09:42:53AM +, Mel Gorman wrote:
> On Wed, Jan 20, 2016 at 04:43:35PM -0800, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.3.4 release.
> 
> Any particular reason why "[PATCH 4.3-stable 0/5] Disable automatic numa
> balancing on UMA" was rejected?

It wasn't "rejected" at all, I have over 500 patches in my stable queue
right now to dig through.  That series is still in there, I'll get to
it, give me a few weeks to catch up, sorry for the delay.

greg k-h


Re: [PATCH 4.3 02/55] vrf: fix double free and memory corruption on register_netdevice failure

2016-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2016 at 01:37:34AM +, Ben Hutchings wrote:
> On Wed, 2016-01-20 at 16:43 -0800, Greg Kroah-Hartman wrote:
> > 4.3-stable review patch.  If anyone has any objections, please let me
> > know.
> > 
> > --
> > 
> > From: Ben Hutchings 
> [...]
> 
> It's really From: Nikolay Aleksandrov .
>  Or at least the upstream version and commit message is his.
> 
> I probably introduced this error when backporting the patch.

Now fixed up, thanks.

greg k-h


Re: [PATCH 3.10 00/35] 3.10.95-stable review

2016-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2016 at 08:06:27AM +0100, Willy Tarreau wrote:
> On Wed, Jan 20, 2016 at 04:14:51PM -0700, Shuah Khan wrote:
> > On 01/20/2016 03:00 PM, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 3.10.95 release.
> > > There are 35 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Fri Jan 22 21:19:15 UTC 2016.
> > > Anything received after that time might be too late.
> > > 
> > > The whole patch series can be found in one patch at:
> > >   kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.10.95-rc1.gz
> > > and the diffstat can be found below.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > 
> > 
> > Compiled and booted on my test system. No dmesg regressions.
> 
> And running fine on my laptop FWIW.

Heh, a laptop on 3.10, hopefully it's old hardware :)

thanks for testing,

greg k-h


Re: [PATCH 4.3 00/55] 4.3.4-stable review

2016-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2016 at 04:24:36AM -0800, Guenter Roeck wrote:
> On 01/20/2016 04:43 PM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 4.3.4 release.
> >There are 55 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Fri Jan 22 23:21:49 UTC 2016.
> >Anything received after that time might be too late.
> >
> 
> Build results:
>   total: 146 pass: 146 fail: 0
> Qemu test results:
>   total: 95 pass: 95 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.3 00/55] 4.3.4-stable review

2016-01-21 Thread Greg Kroah-Hartman
On Wed, Jan 20, 2016 at 06:39:48PM -0700, Shuah Khan wrote:
> On 01/20/2016 05:43 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.3.4 release.
> > There are 55 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri Jan 22 23:21:49 UTC 2016.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.3.4-rc1.gz
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH] clk: rockchip: rk3036: Add apll as the critical clock

2016-01-21 Thread Xing Zheng

Hi Heiko,

On 2016-01-21 17:21, Heiko Stuebner wrote:

Hi Xing,

On Wednesday, 20 January 2016, 16:37:17, Xing Zheng wrote:

The apll may be disabled when the device starts up if there are child
clock nodes below it. Therefore, the apll should be kept critical.

The apll tree looks like this:
 pll_apll
apll
   armclk
  pclk_dbg
  aclk_core_pre
   aclk_hvec
   uart_pll_clk
  uart2_src
 uart2_frac
  uart1_src
 uart1_frac
  uart0_src
 uart0_frac

Can you find out which of those clocks causes your hang?
Things like the uart clocks, for example, should already be handled by their
driver by the time clk_disable_unused() runs. So I'd really
like the critical clock to be the clock that is actually needed.

Thanks
Heiko


It looks like calling rockchip_rk3036_pll_disable is what causes the
apll to be disabled.

I think the disable sequence is as follows:
1. All of uart0_frac~uart2_frac are of the branch_fraction_divider type, and
they have the CLK_SET_RATE_UNGATE flag.
2. I enabled the cpufreq configs in rk3036_defconfig. The default CPU
frequency is 600MHz during the loader, and at startup it is 816MHz with the
default DTS. Therefore, the CPU frequency is changed from 600MHz to 816MHz,
which calls clk_change_rate.
3. With the CLK_SET_RATE_UNGATE flag, this triggers a call to clk_core_disable.
Here it recursively closes all of the uart gates and, finally, calls the root
parent's disable callback, which is rockchip_rk3036_pll_disable.

The disable log:
[ 1.074186] clk_change_rate -- CLK_SET_RATE_UNGATE name: uart2_frac, parent: uart2_src, core->flags = 0x0424
[ 1.105722] clk_gate_endisable -- name: uart2_frac, parent: uart2_src, enable = 0
[ 1.110125] clk_gate_endisable -- name: uart2_src, parent: uart_pll_clk, enable = 0

[ 2.604445] rockchip_rk3036_pll_disable -- name: pll_apll, parent: xin24m

Therefore, I am considering either hanging uart_pll_clk onto gpll, or adding
it to the critical clocks instead of apll...


If I have made any mistakes, please correct me. :-)

Thanks.
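
The sequence described in this message can be sketched with a toy reference-counted clock tree. This is a simplified user-space model and an assumption about how the effect arises, not the kernel's clk framework: struct fake_clk and both functions are hypothetical names. It shows how a temporary enable/disable pair on a leaf clock can switch off a root PLL that nobody else holds a reference to, and why a standing ("critical") reference on the root prevents that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified clock node; not the kernel's struct clk_core. */
struct fake_clk {
	struct fake_clk *parent;
	unsigned int enable_count;	/* number of active users */
	int hw_on;			/* 1 while the hardware gate/PLL is on */
};

static void fake_clk_enable(struct fake_clk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count++ == 0) {
		fake_clk_enable(clk->parent);	/* parents first */
		clk->hw_on = 1;
	}
}

static void fake_clk_disable(struct fake_clk *clk)
{
	if (!clk)
		return;
	assert(clk->enable_count > 0);		/* unbalanced disable is a bug */
	if (--clk->enable_count == 0) {
		clk->hw_on = 0;			/* last user gone: gate it */
		fake_clk_disable(clk->parent);	/* then release the parent */
	}
}
```

In this model a PLL left running by the bootloader starts with hw_on = 1 but enable_count = 0, so the first balanced enable/disable on any descendant turns it off. Taking one extra reference on the PLL up front keeps its count above zero.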





Signed-off-by: Xing Zheng
---

  drivers/clk/rockchip/clk-rk3036.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/clk/rockchip/clk-rk3036.c b/drivers/clk/rockchip/clk-rk3036.c
index ebce980..483913b 100644
--- a/drivers/clk/rockchip/clk-rk3036.c
+++ b/drivers/clk/rockchip/clk-rk3036.c
@@ -425,6 +425,7 @@ static struct rockchip_clk_branch rk3036_clk_branches[] __initdata = { };

 static const char *const rk3036_critical_clocks[] __initconst = {
+	"apll",
 	"aclk_cpu",
 	"aclk_peri",
 	"hclk_peri",









[PATCH] mmc: dw_mmc: fix err handle of dw_mci_probe

2016-01-21 Thread Shawn Lin
This patch adds correct error handling if dw_mci_ctrl_reset
fails while probing.

Signed-off-by: Shawn Lin 
---

 drivers/mmc/host/dw_mmc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 065a8f5..ec19984 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -3046,8 +3046,10 @@ int dw_mci_probe(struct dw_mci *host)
 	}
 
 	/* Reset all blocks */
-	if (!dw_mci_ctrl_reset(host, SDMMC_CTRL_ALL_RESET_FLAGS))
-		return -ENODEV;
+	if (!dw_mci_ctrl_reset(host, SDMMC_CTRL_ALL_RESET_FLAGS)) {
+		ret = -ENODEV;
+		goto err_clk_ciu;
+	}
 
 	host->dma_ops = host->pdata->dma_ops;
 	dw_mci_init_dma(host);
-- 
2.3.7
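
The pattern this patch switches to, jumping to a cleanup label instead of returning directly, can be sketched in user space as follows. Everything here is hypothetical: struct fake_host, fake_probe() and the two flags only stand in for the clocks that the real probe function has already enabled by the time the reset is attempted, and the assumption that both clocks are undone at the label is mine.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with two clocks to undo on failure. */
struct fake_host {
	int biu_enabled;	/* 1 while the "biu" clock is held */
	int ciu_enabled;	/* 1 while the "ciu" clock is held */
	bool reset_ok;		/* what the simulated controller reset returns */
};

static int fake_probe(struct fake_host *host)
{
	int ret;

	assert(host != NULL);

	/* resources acquired earlier in the probe path */
	host->biu_enabled = 1;
	host->ciu_enabled = 1;

	if (!host->reset_ok) {
		/* A bare "return -ENODEV" here would leave both clocks on. */
		ret = -ENODEV;
		goto err_clk_ciu;
	}
	return 0;

err_clk_ciu:
	/* release in reverse order of acquisition */
	host->ciu_enabled = 0;
	host->biu_enabled = 0;
	return ret;
}
```

With reset_ok set to false, fake_probe() returns -ENODEV and both flags are back at 0. With an early return instead of the goto, they would stay at 1, which is the leak the patch removes.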




Re: tags: Unify emacs and exuberant rules

2016-01-21 Thread Michal Marek
On 2016-01-21 06:22, Dave Jones wrote:
> On Wed, Jan 20, 2016 at 06:22:04PM +, Linux Kernel wrote:
>  > Web:        https://git.kernel.org/torvalds/c/93209d65c1d38f86ffb3f61a1214130b581a9709
>  > Commit: 93209d65c1d38f86ffb3f61a1214130b581a9709
>  > Parent: a1ccdb63b5535dc3446b0a9efc6d97aca82c72ef
>  > Refname:refs/heads/master
>  > Author: Michal Marek 
>  > AuthorDate: Wed Oct 14 11:48:06 2015 +0200
>  > Committer:  Michal Marek 
>  > CommitDate: Tue Jan 5 22:18:48 2016 +0100
>  > 
>  > tags: Unify emacs and exuberant rules
>  > 
>  > The emacs rules were constantly lagging behind the exuberant ones. Use a
>  > single set of rules for both, to make the script easier to maintain.
>  > The language understood by both tools is basic regular expression with
>  > some limitations, which are documented in a comment. To be able to store
>  > the rules in an array and easily iterate over it, the script requires
>  > bash now. In the exuberant case, the change fixes some false matches in
>  >  and also some too greedy matches in the arguments
>  > of the DECLARE_*/DEFINE_* macros. In the emacs case, several previously
>  > not working rules are matching now. Tested with these versions of the
>  > tools:
>  > 
>  >   Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
>  >   etags (GNU Emacs 24.5)
>  > 
>  > Signed-off-by: Michal Marek 
> 
> Since today, make tags got a lot more noisy for me on Debian unstable
> (exuberant-ctags 1:5.9~svn20110310-10)
> 
> $ make tags
> GEN tags
> ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
> ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
> ctags: Warning: kernel/locking/lockdep.c:153: null expansion of name pattern "\1"
> ctags: Warning: kernel/workqueue.c:307: null expansion of name pattern "\1"
> ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
> ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
> ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
> ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
> ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
> 
> Looks like it's choking on DEFINE_PER_CPU definitions ?

Yes. But each time I submitted a patch to remove the line breaks in
DEFINE_PER_CPU, somebody came up with the clever idea to fix ctags instead.

Michal



[PATCH] mmc: dw_mmc: fix num_slots setting

2016-01-21 Thread Shawn Lin
This patch sets num_slots to 1 if pdata->num_slots is not
defined. Meanwhile, we need to make sure num_slots is
not larger than the number of supported slots.

Signed-off-by: Shawn Lin 
---

 drivers/mmc/host/dw_mmc.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7128351..065a8f5 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2949,12 +2949,6 @@ int dw_mci_probe(struct dw_mci *host)
 		}
 	}
 
-	if (host->pdata->num_slots < 1) {
-		dev_err(host->dev,
-			"Platform data must supply num_slots.\n");
-		return -ENODEV;
-	}
-
 	host->biu_clk = devm_clk_get(host->dev, "biu");
 	if (IS_ERR(host->biu_clk)) {
 		dev_dbg(host->dev, "biu clock not available\n");
@@ -3111,7 +3105,15 @@ int dw_mci_probe(struct dw_mci *host)
 	if (host->pdata->num_slots)
 		host->num_slots = host->pdata->num_slots;
 	else
-		host->num_slots = SDMMC_GET_SLOT_NUM(mci_readl(host, HCON));
+		host->num_slots = 1;
+
+	if (host->num_slots < 1 ||
+	    host->num_slots > SDMMC_GET_SLOT_NUM(mci_readl(host, HCON))) {
+		dev_err(host->dev,
+			"Platform data must supply correct num_slots.\n");
+		ret = -ENODEV;
+		goto err_clk_ciu;
+	}
 
 	/*
 	 * Enable interrupts for command done, data over, data empty,
-- 
2.3.7
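
The slot-count rule introduced by this patch can be restated as a small stand-alone function. This is only a sketch under assumptions: pick_num_slots() is a hypothetical helper, pdata_slots plays the role of the platform-data value (0 meaning "not set") and hw_slots plays the role of the slot count read back from the hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the patched logic: pdata_slots is the value
 * from platform data (0 means "not set"), hw_slots is what the hardware
 * reports. On success the chosen slot count is stored in *out.
 */
static int pick_num_slots(unsigned int pdata_slots, unsigned int hw_slots,
			  unsigned int *out)
{
	unsigned int n;

	assert(out != NULL);

	n = pdata_slots ? pdata_slots : 1;	/* default to a single slot */
	if (n < 1 || n > hw_slots)		/* n < 1 kept to match the patch */
		return -ENODEV;

	*out = n;
	return 0;
}
```

For example, an unset platform value with two hardware slots yields 1, while a platform value of 3 with two hardware slots is rejected with -ENODEV.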




Re: [RFC PATCH v3] POWER/runtime: refining the rpm_suspend function

2016-01-21 Thread Zhaoyang Huang
On 22 January 2016 at 03:32, Pavel Machek  wrote:
>
>> - goto repeat;
>> +
>> + /*check expires firstly for auto suspend mode,
>> + *if not, just go ahead to the async
>> + */
>
> English, coding style.
> Pavel
Hi Pavel,
Thank you for the review. I will fix it in the next version. In terms
of readability, do you think the patch improves things a little or not?

>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Ingo Molnar

* Alexei Starovoitov  wrote:

> > > I could be missing something. I think either this patch is not needed or you
> > > need to teach the tool to ignore all JITed stuff. I don't think it's
> > > practical to annotate everything. Different JITs do their own magic. s390
> > > JIT is even more fancy.
> > 
> > Well, but the point of these patches isn't to make the tool happy.  It's
> > really to make sure that runtime stack traces can be made reliable. Maybe I'm
> > missing something but I don't see why JIT code can't honor
> > CONFIG_FRAME_POINTER just like any other code.
> 
> It can if there is no performance cost added. I can speak for x64 JIT, but the
> rest needs to be analyzed as well. My point was that maybe it's easier to
> ignore all JITed code and just say that such call stacks may be unreliable?
> live-patching is not applicable to JITed code anyway or you want to livepatch
> the callees of it?

So the rule is that if frame pointers are enabled all kernel code should have 
correct stack frames - in case an IRQ (or NMI) hits it or it crashes.

Thanks,

Ingo


Re: wireless-drivers: random cleanup patches piling up

2016-01-21 Thread Dan Carpenter
On Thu, Jan 21, 2016 at 04:52:45PM -0800, Joe Perches wrote:
> Whitespace patches, where git diff -w does not show
> any difference and objdiff on the objects for a few
> randconfigs are identical, maybe could be sifted
> into a separate category from other patches.
> Maybe the kbuild test robot could help identify the
> whitespace style patches that can be easily applied.

It sort of takes a while to test the randconfig thing...  diff -w
doesn't catch every single issue.  For example:

http://www.spinics.net/lists/linux-driver-devel/msg78707.html

That's the only time I noticed a mistake like that which diff -w missed.

Also here is my rename_rev.pl script.  It's pretty useful.  Pipe
patches to it and it filters out the white space changes.

regards,
dan carpenter

#!/usr/bin/perl

# This is a tool to help review variable rename patches. The goal is
# to strip out the automatic sed renames and the white space changes
# and leaves the interesting code changes.
#
# Example 1: A patch renames openInfo to open_info:
# cat diff | rename_review.pl openInfo open_info
#
# Example 2: A patch swaps the first two arguments to some_func():
# cat diff | rename_review.pl \
#-e 's/some_func\((.*?),(.*?),/some_func\($2, $1,/'
#
# Example 3: A patch removes the xkcd_ prefix from some but not all the
# variables.  Instead of trying to figure out which variables were renamed
# just remove the prefix from them all:
# cat diff | rename_review.pl -ea 's/xkcd_//g'
#
# Example 4: A patch renames 20 CamelCase variables.  To review this let's
# just ignore all case changes and all '_' chars.
# cat diff | rename_review -ea 'tr/[A-Z]/[a-z]/' -ea 's/_//g'
#
# The other arguments are:
# -nc removes comments
# -ns removes '\' chars if they are at the end of the line.

use strict;
use File::Temp qw/ :mktemp  /;

sub usage() {
print "usage: cat diff | $0 old new old new old new...\n";
print "   or: cat diff | $0 -e 's/old/new/g'\n";
print " -a : auto";
print " -e : execute on old lines\n";
print " -ea: execute on all lines\n";
print " -nc: no comments\n";
print " -nb: no unneeded braces\n";
print " -ns: no slashes at the end of a line\n";
print " -pull: for function pull.  deletes context.\n";
print " -r : NULL, bool";
exit(1);
}
my @subs;
my @strict_subs;
my @cmds;
my $strip_comments;
my $strip_braces;
my $strip_slashes;
my $pull_context;
my $auto;

sub filter($) {
my $_ = shift();
my $old = 0;
if ($_ =~ /^-/) {
$old = 1;
}
# remove the first char
s/^[ +-]//;
if ($strip_comments) {
s/\/\*.*?\*\///g;
s/\/\/.*//;
}
foreach my $cmd (@cmds) {
if ($old || $cmd->[0] =~ /^-ea$/) {
eval $cmd->[1];
}
}
foreach my $sub (@subs) {
if ($old) {
s/$sub->[0]/$sub->[1]/g;
}
}
foreach my $sub (@strict_subs) {
if ($old) {
s/\b$sub->[0]\b/$sub->[1]/g;
}
}

# remove the newline so we can move curly braces here if we want.
s/\n//;
return $_;
}

while (my $param1 = shift()) {
if ($param1 =~ /^-a$/) {
$auto = 1;
next;
}
if ($param1 =~ /^-nc$/) {
$strip_comments = 1;
next;
}
if ($param1 =~ /^-nb$/) {
$strip_braces = 1;
next;
}
if ($param1 =~ /^-ns$/) {
$strip_slashes = 1;
next;
}
if ($param1 =~ /^-pull$/) {
$pull_context = 1;
next;
}
my $param2 = shift();
if ($param2 =~ /^$/) {
usage();
}
if ($param1 =~ /^-e(a|)$/) {
push @cmds, [$param1, $param2];
next;
}
if ($param1 =~ /^-r$/) {
if ($param2 =~ /bool/) {
push @cmds, ["-e", "s/== true//"];
push @cmds, ["-e", "s/true ==//"];
push @cmds, ["-e", "s/([a-zA-Z\-\>\._]+) == false/!\$1/"];
next;
} elsif ($param2 =~ /NULL/) {
push @cmds, ["-e", "s/ != NULL//"];
push @cmds, ["-e", "s/([a-zA-Z\-\>\._0-9]+) == NULL/!\$1/"];
next;
} elsif ($param2 =~ /BIT/) {
push @cmds, ["-e", 's/1[uUlL]* *<< *(\d+)/BIT($1)/'];
push @cmds, ["-e", 's/\(1 *<< *(\w+)\)/BIT($1)/'];
push @cmds, ["-e", 's/\(BIT\((.*?)\)\)/BIT($1)/'];
next;
}
usage();
}

push @subs, [$param1, $param2];
}

my ($oldfh, $oldfile) = mkstemp("/tmp/oldX");
my ($newfh, $newfile) = mkstemp("/tmp/newX");

my @input = <STDIN>;

# auto works on the observation that the - line comes before the + line when we
# rename variables.  Take the first - line.  Find the first + line.  Find the
# one word difference.  Test that the old word never occurs in the new text.
if ($auto) {
my %c_keywords = (  auto => 1,
break => 1,
case => 1,
char => 1,
const => 1,
continue 

Re: [PATCH V3 00/11] Add T210 support in Tegra soctherm

2016-01-21 Thread Wei Ni


On 2016-01-21 22:56, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Mon, Jan 18, 2016 at 06:02:25PM +0800, Wei Ni wrote:
>> This patchset adds following functions for tegra_soctherm driver:
>> 1. add T210 support.
>> 2. export debugfs to show some registers.
>> 3. add thermtrip function.
>> 4. add suspend/resume function.
>>
>> The V1 series is in:
>> http://www.spinics.net/lists/linux-tegra/msg24808.html
>> The V2 series is in:
>> http://www.spinics.net/lists/linux-tegra/msg24901.html
>>
>> Main changes from V2:
>> 1. Fix build error in patch [1/11].
>> 2. Use of_get_child_by_name instead of of_find_node_by_name in patch [8/11].
>> 3. Use debugfs_remove_recursive to remove debugfs in patch [6/11].
>>
>> Main changes from V1:
>> 1. Use the new type to handle different Tegra chips in one driver,
>> which was suggested by Thierry.
>> 2. Changes per Thierry's other comments.
>>
>> Wei Ni (11):
>>   thermal: tegra: move tegra thermal files into tegra directory
>>   thermal: tegra: combine sensor group-related data
>>   thermal: tegra: get rid of PDIV/HOTSPOT hack
>>   thermal: tegra: split tegra_soctherm driver
>>   thermal: tegra: add T210-specific SOC_THERM driver
>>   thermal: tegra: add a debugfs to show registers
>>   of: Add bindings of hw-trips for soctherm
>>   thermal: tegra: add thermtrip function
>>   thermal: tegra: add PM support
>>   arm64: tegra: add soctherm node for Tegra210
>>   ARM: tegra: set hw trips for Tegra124
> 
> Hi Wei,
> 
> This series looks mostly good to me. I've commented on a couple of minor
> things as replies to the individual patches.

Thanks for your review. I will check your comments and send out the next
version in the next few days.

> 
> On a higher level, what's the test procedure that we can use to validate
> that this code works?

You can check the following files:
1. Run "cat /sys/class/thermal/thermal*/temp" to read the temperature.
This driver registers four thermal zones: cpu, gpu, mem and pll.
2. Run "cat /sys/kernel/debug/tegra_soctherm/regs" to show the register
contents.
3. Write a low temperature value to
/sys/kernel/debug/tegra_soctherm/thermtrip/xxx to trigger the thermtrip
function.
For example, if the cpu temperature is 3 now (you can read it from the
thermal zones above), then run "echo 25000 >
/sys/kernel/debug/tegra_soctherm/thermtrip/cpu" and the system will shut down
immediately.

> 
> Thierry
> 
> * Unknown Key
> * 0x7F3EB3A1
> 


[PATCH V2 1/1] gpio-f7188x: Add F81866 GPIO supports

2016-01-21 Thread Peter Hung
Add F81866 GPIO supports

Fintek F81866 is a SuperIO chip. It contains HWMON/GPIO/serial ports
and has a total of 72 (9x8 sets) GPIO pins.

Here is the PDF spec:
http://www.alldatasheet.com/datasheet-pdf/pdf/459085/FINTEK/F81866AD-I.html

The control method is the same as for the F7188x, but we need to take care
with the address of GPIO8x.

The GPIO base addresses are:
GPIO0x based: 0xf0
GPIO1x based: 0xe0
GPIO2x based: 0xd0
GPIO3x based: 0xc0
GPIO4x based: 0xb0
GPIO5x based: 0xa0
GPIO6x based: 0x90
GPIO7x based: 0x80
GPIO8x based: 0x88 <-- not 0x70.

Signed-off-by: Peter Hung 
---
 drivers/gpio/Kconfig   |  4 ++--
 drivers/gpio/gpio-f7188x.c | 27 ++++++++++++++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index cb212eb..c1ad573 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -513,10 +513,10 @@ config GPIO_104_IDI_48
  via the idi_48_irq module parameter.
 
 config GPIO_F7188X
-   tristate "F71869, F71869A, F71882FG and F71889F GPIO support"
+   tristate "F71869, F71869A, F71882FG, F71889F and F81866 GPIO support"
help
  This option enables support for GPIOs found on Fintek Super-I/O
- chips F71869, F71869A, F71882FG and F71889F.
+ chips F71869, F71869A, F71882FG, F71889F and F81866.
 
  To compile this driver as a module, choose M here: the module will
  be called f7188x-gpio.
diff --git a/drivers/gpio/gpio-f7188x.c b/drivers/gpio/gpio-f7188x.c
index d62fd6b..0417798 100644
--- a/drivers/gpio/gpio-f7188x.c
+++ b/drivers/gpio/gpio-f7188x.c
@@ -1,5 +1,5 @@
 /*
- * GPIO driver for Fintek Super-I/O F71869, F71869A, F71882 and F71889
+ * GPIO driver for Fintek Super-I/O F71869, F71869A, F71882, F71889 and F81866
  *
  * Copyright (C) 2010-2013 LaCie
  *
@@ -36,14 +36,16 @@
 #define SIO_F71869A_ID 0x1007  /* F71869A chipset ID */
 #define SIO_F71882_ID  0x0541  /* F71882 chipset ID */
 #define SIO_F71889_ID  0x0909  /* F71889 chipset ID */
+#define SIO_F81866_ID  0x1010  /* F81866 chipset ID */
 
-enum chips { f71869, f71869a, f71882fg, f71889f };
+enum chips { f71869, f71869a, f71882fg, f71889f, f81866 };
 
 static const char * const f7188x_names[] = {
"f71869",
"f71869a",
"f71882fg",
"f71889f",
+   "f81866",
 };
 
 struct f7188x_sio {
@@ -190,6 +192,18 @@ static struct f7188x_gpio_bank f71889_gpio_bank[] = {
F7188X_GPIO_BANK(70, 8, 0x80),
 };
 
+static struct f7188x_gpio_bank f81866_gpio_bank[] = {
+   F7188X_GPIO_BANK(0, 8, 0xF0),
+   F7188X_GPIO_BANK(10, 8, 0xE0),
+   F7188X_GPIO_BANK(20, 8, 0xD0),
+   F7188X_GPIO_BANK(30, 8, 0xC0),
+   F7188X_GPIO_BANK(40, 8, 0xB0),
+   F7188X_GPIO_BANK(50, 8, 0xA0),
+   F7188X_GPIO_BANK(60, 8, 0x90),
+   F7188X_GPIO_BANK(70, 8, 0x80),
+   F7188X_GPIO_BANK(80, 8, 0x88),
+};
+
 static int f7188x_gpio_direction_in(struct gpio_chip *chip, unsigned offset)
 {
int err;
@@ -318,6 +332,10 @@ static int f7188x_gpio_probe(struct platform_device *pdev)
data->nr_bank = ARRAY_SIZE(f71889_gpio_bank);
data->bank = f71889_gpio_bank;
break;
+   case f81866:
+   data->nr_bank = ARRAY_SIZE(f81866_gpio_bank);
+   data->bank = f81866_gpio_bank;
+   break;
default:
return -ENODEV;
}
@@ -395,6 +413,9 @@ static int __init f7188x_find(int addr, struct f7188x_sio *sio)
case SIO_F71889_ID:
sio->type = f71889f;
break;
+   case SIO_F81866_ID:
+   sio->type = f81866;
+   break;
default:
pr_info(DRVNAME ": Unsupported Fintek device 0x%04x\n", devid);
goto err;
@@ -485,6 +506,6 @@ static void __exit f7188x_gpio_exit(void)
 }
 module_exit(f7188x_gpio_exit);
 
-MODULE_DESCRIPTION("GPIO driver for Super-I/O chips F71869, F71869A, F71882FG and F71889F");
+MODULE_DESCRIPTION("GPIO driver for Super-I/O chips F71869, F71869A, F71882FG, F71889F and F81866");
 MODULE_AUTHOR("Simon Guinot ");
 MODULE_LICENSE("GPL");
-- 
1.9.1
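
The address table in the patch description can be captured as a small lookup, which makes the GPIO8x exception explicit. This is an illustrative sketch: f81866_bank_base() and F81866_NR_BANKS are hypothetical names, and only the nine base values come from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define F81866_NR_BANKS 9	/* GPIO0x..GPIO8x, 8 pins each */

/*
 * Hypothetical lookup of the base register for a bank, using the table from
 * the patch description. Returns 0 on success, -1 for an unknown bank.
 */
static int f81866_bank_base(unsigned int bank, unsigned int *base)
{
	static const unsigned int bases[F81866_NR_BANKS] = {
		0xf0, 0xe0, 0xd0, 0xc0, 0xb0, 0xa0, 0x90, 0x80,
		0x88,	/* GPIO8x breaks the "minus 0x10 per bank" pattern */
	};

	assert(base != NULL);

	if (bank >= F81866_NR_BANKS)
		return -1;
	*base = bases[bank];
	return 0;
}
```

A table is used rather than the formula 0xf0 - 0x10 * bank precisely because the formula would give 0x70 for bank 8, which the description says is wrong.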



[PATCH V2 0/1] gpio-f7188x: Add F81866 GPIO supports

2016-01-21 Thread Peter Hung
Fintek F81866 is a SuperIO chip. It contains HWMON/GPIO/serial ports
and has a total of 72 (9x8 sets) GPIO pins.

Here is the PDF spec:
http://www.alldatasheet.com/datasheet-pdf/pdf/459085/FINTEK/F81866AD-I.html

The control method is the same as for the F7188x, but we need to take care
with the address of GPIO8x.

The GPIO base addresses are:
GPIO0x based: 0xf0
GPIO1x based: 0xe0
GPIO2x based: 0xd0
GPIO3x based: 0xc0
GPIO4x based: 0xb0
GPIO5x based: 0xa0
GPIO6x based: 0x90
GPIO7x based: 0x80
GPIO8x based: 0x88 <-- not 0x70.

Change Log:
V2:
1. V1 contained 2 patches: the first added F81866 support and the second
   added a filter to find enabled GPIOs. But Simon said that some mainboards
   may configure the SuperIO with wrong settings. So the V2 patch only
   implements the F81866 GPIO control method, the same as for the F7188x.

Peter Hung (1):
  gpio-f7188x: Add F81866 GPIO supports

 drivers/gpio/Kconfig   |  4 ++--
 drivers/gpio/gpio-f7188x.c | 27 ++++++++++++++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

-- 
1.9.1



[PATCH v6 2/5] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-01-21 Thread Shilpasri G Bhat
cpu_to_chip_id() walks the device tree to find the chip id, taking a
contended device tree lock. This adds unnecessary overhead in a hot
path. So instead of calling cpu_to_chip_id() every time, cache the
chip ids for all cores in the array 'core_to_chip_map' and use it
in the hot path.

Reported-by: Anton Blanchard 
Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
---
No changes from v5.

Changes from v4:
- Taken care of Shreyas's comments to add a core_to_chip_map array to
  store the chip id.

 drivers/cpufreq/powernv-cpufreq.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 140c75f..6f186dc 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -43,6 +43,7 @@
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
+static unsigned int *core_to_chip_map;
 
 static struct chip {
unsigned int id;
@@ -313,13 +314,14 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(void *data)
 {
unsigned int cpu = smp_processor_id();
+   unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
unsigned long pmsr;
int pmsr_pmax, i;
 
pmsr = get_pmspr(SPRN_PMSR);
 
for (i = 0; i < nr_chips; i++)
-   if (chips[i].id == cpu_to_chip_id(cpu))
+   if (chips[i].id == chip_id)
break;
 
/* Check for Pmax Capping */
@@ -559,19 +561,29 @@ static int init_chip_info(void)
unsigned int chip[256];
unsigned int cpu, i;
unsigned int prev_chip_id = UINT_MAX;
+   cpumask_t cpu_mask;
+   int ret = -ENOMEM;
 
-   for_each_possible_cpu(cpu) {
+   cpumask_copy(&cpu_mask, cpu_possible_mask);
+   core_to_chip_map = kcalloc(cpu_nr_cores(), sizeof(unsigned int),
+  GFP_KERNEL);
+   if (!core_to_chip_map)
+   goto out;
+
+   for_each_cpu(cpu, &cpu_mask) {
unsigned int id = cpu_to_chip_id(cpu);
 
if (prev_chip_id != id) {
prev_chip_id = id;
chip[nr_chips++] = id;
}
+   core_to_chip_map[cpu_core_index_of_thread(cpu)] = id;
+   cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
}
 
chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
if (!chips)
-   return -ENOMEM;
+   goto free_chip_map;
 
for (i = 0; i < nr_chips; i++) {
chips[i].id = chip[i];
@@ -582,6 +594,10 @@ static int init_chip_info(void)
}
 
return 0;
+free_chip_map:
+   kfree(core_to_chip_map);
+out:
+   return ret;
 }
 
 static int __init powernv_cpufreq_init(void)
@@ -615,6 +631,8 @@ static void __exit powernv_cpufreq_exit(void)
unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
opal_message_notifier_unregister(OPAL_MSG_OCC,
 &powernv_cpufreq_opal_nb);
+   kfree(chips);
+   kfree(core_to_chip_map);
cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
-- 
1.9.3



[PATCH v6 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-21 Thread Shilpasri G Bhat
Create sysfs attributes to export throttle information in
/sys/devices/system/cpu/cpufreq/chipN. The newly added sysfs files are as
follows:

1)/sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  This gives the throttle stats for each of the available frequencies.
  The throttle stat of a frequency is the total number of times the max
  frequency is reduced to that frequency.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_frequencies
  4023000 0
  399 0
  3956000 1
  3923000 0
  389 0
  3857000 2
  3823000 0
  379 0
  3757000 2
  3724000 1
  369 1
  ...

2)/sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  This directory contains throttle reason files. Each file gives the
  total number of times the max frequency is throttled, except for
  'throttle_reset', which gives the total number of times the max
  frequency is unthrottled after being throttled.
  # cd /sys/devices/system/cpu/cpufreq/chip0/throttle_reasons
  # cat cpu_over_temperature
  7
  # cat occ_reset
  0
  # cat over_current
  0
  # cat power_cap
  0
  # cat power_supply_failure
  0
  # cat throttle_reset
  7

3)/sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  This gives the total number of events of max frequency throttling to
  lower frequencies in the turbo range of frequencies and the sub-turbo(at
  and below nominal) range of frequencies.
  # cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
  turbo 7
  sub-turbo 0
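
A minimal sketch of how a userspace tool might consume the throttle_stat
output shown above, assuming the two-line "turbo N" / "sub-turbo M" layout
from the example. parse_throttle_stat() is a hypothetical helper, not part
of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse text in the "turbo N\nsub-turbo M\n" layout shown above.
 * Returns 0 on success, -1 if either counter is missing.
 */
static int parse_throttle_stat(const char *text, int *turbo, int *sub_turbo)
{
	const char *p = strstr(text, "sub-turbo ");

	/* The first line must start with "turbo <number>". */
	if (sscanf(text, "turbo %d", turbo) != 1)
		return -1;
	/* The second counter follows the "sub-turbo " label. */
	if (!p || sscanf(p, "sub-turbo %d", sub_turbo) != 1)
		return -1;
	return 0;
}
```

The text would typically come from reading the throttle_stat file into a
buffer first; that part is left out to keep the sketch short.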

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
---
No changes from v5.

Changes from v4:
- Taken care of Gautham's comments to use inline get_chip_index()

Changes from v3:
- Separate the patch to contain only the throttle sysfs attribute changes.
- Add helper inline function get_chip_index()

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Changes from v1:
- Added a kobject to struct chip
- Grouped the throttle reasons under a separate attribute_group and
  exported each reason as individual file.
- Moved the sysfs files from /sys/devices/system/node/nodeN to
  /sys/devices/system/cpu/cpufreq/chipN
- As suggested by Paul Clarke replaced 'Nominal' with 'sub-turbo'.

 drivers/cpufreq/powernv-cpufreq.c | 205 --
 1 file changed, 196 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 2d09274..7d65c82 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -55,6 +55,16 @@ static const char * const throttle_reason[] = {
"OCC Reset"
 };
 
+enum throt_reason_type {
+   NO_THROTTLE = 0,
+   POWERCAP,
+   CPU_OVERTEMP,
+   POWER_SUPPLY_FAILURE,
+   OVERCURRENT,
+   OCC_RESET_THROTTLE,
+   OCC_MAX_REASON
+};
+
 static struct chip {
unsigned int id;
bool throttled;
@@ -62,6 +72,11 @@ static struct chip {
u8 throt_reason;
cpumask_t mask;
struct work_struct throttle;
+   int throt_turbo;
+   int throt_nominal;
+   int reason[OCC_MAX_REASON];
+   int *pstate_stat;
+   struct kobject *kobj;
 } *chips;
 
 static int nr_chips;
@@ -196,6 +211,128 @@ static struct freq_attr *powernv_cpu_freq_attr[] = {
NULL,
 };
 
+static inline int get_chip_index(unsigned int id)
+{
+   int i;
+
+   for (i = 0; i < nr_chips; i++)
+   if (chips[i].id == id)
+   return i;
+
+   return -EINVAL;
+}
+
+static ssize_t throttle_freq_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+   int i, count = 0, id;
+
+   i = kstrtoint(kobj->name + 4, 0, &id);
+   if (i)
+   return i;
+
+   id = get_chip_index(id);
+   if (id < 0) {
+   pr_warn_once("%s Matching chip-id not found\n", __func__);
+   return id;
+   }
+
+   for (i = 0; i < powernv_pstate_info.nr_pstates; i++)
+   count += sprintf(&buf[count], "%d %d\n",
+  powernv_freqs[i].frequency,
+  chips[id].pstate_stat[i]);
+
+   return count;
+}
+
+static struct kobj_attribute attr_throttle_frequencies =
+__ATTR(throttle_frequencies, 0444, throttle_freq_show, NULL);
+
+static ssize_t throttle_stat_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+   int ret, id, count = 0;
+
+   ret = kstrtoint(kobj->name + 4, 0, &id);
+   if (ret)
+   return ret;
+
+   id = get_chip_index(id);
+   if (id < 0) {
+   pr_warn_once("%s Matching chip-id not found\n", __func__);
+   return id;
+   }
+
+   count += sprintf(&buf[count], "turbo %d\n", chips[id].throt_turbo);
+   count += sprintf(&buf[count], "sub-turbo %d\n",
+   

[PATCH v6 1/5] cpufreq: powernv: Hot-plug safe the kworker thread

2016-01-21 Thread Shilpasri G Bhat
In the kworker_thread powernv_cpufreq_work_fn(), we can end up
sending an IPI to a cpu going offline. This is a rare corner case
which is fixed using {get/put}_online_cpus(). Along with this fix,
this patch adds changes to do oneshot cpumask_{clear/and} operation.
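
The one-shot mask handling can be pictured with a plain 64-bit bitmask in
userspace. This is only a sketch under assumptions: policy_cpus() and the
4-CPUs-per-policy layout are invented for illustration, and the hotplug
locking done by {get/put}_online_cpus() is not modelled.

```c
#include <stdint.h>

/* Hypothetical policy layout for the sketch: each policy spans 4 CPUs. */
static uint64_t policy_cpus(unsigned int cpu)
{
	return (uint64_t)0xf << (cpu & ~3u);
}

/*
 * Visit one CPU per policy among the CPUs that are both on the chip and
 * online. Returns the number of policies visited.
 */
static unsigned int visit_policies(uint64_t chip_mask, uint64_t online_mask)
{
	uint64_t mask = chip_mask & online_mask;   /* like cpumask_and() */
	unsigned int cpu, visited = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(mask & ((uint64_t)1 << cpu)))
			continue;
		visited++;                     /* per-policy work would go here */
		mask &= ~policy_cpus(cpu);     /* like cpumask_andnot() */
	}
	return visited;
}
```

Clearing the whole policy in one operation replaces the inner loop that
cleared one CPU at a time, and computing the online subset once up front
means offline CPUs are never considered.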

Suggested-by: Shreyas B Prabhu 
Suggested-by: Gautham R Shenoy 
Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
---
Changes from v5:
- Fix the kbuild-error:
drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
function 'get_online_cpus' [-Werror=implicit-function-declaration]

 drivers/cpufreq/powernv-cpufreq.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 547890f..140c75f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -423,18 +424,19 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
 {
struct chip *chip = container_of(work, struct chip, throttle);
unsigned int cpu;
-   cpumask_var_t mask;
+   cpumask_t mask;
 
-   smp_call_function_any(&chip->mask,
+   get_online_cpus();
+   cpumask_and(&mask, &chip->mask, cpu_online_mask);
+   smp_call_function_any(&mask,
  powernv_cpufreq_throttle_check, NULL, 0);
 
if (!chip->restore)
-   return;
+   goto out;
 
chip->restore = false;
-   cpumask_copy(mask, &chip->mask);
-   for_each_cpu_and(cpu, mask, cpu_online_mask) {
-   int index, tcpu;
+   for_each_cpu(cpu, &mask) {
+   int index;
struct cpufreq_policy policy;
 
cpufreq_get_policy(&policy, cpu);
@@ -442,9 +444,10 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
   policy.cur,
   CPUFREQ_RELATION_C, );
powernv_cpufreq_target_index(&policy, index);
-   for_each_cpu(tcpu, policy.cpus)
-   cpumask_clear_cpu(tcpu, mask);
+   cpumask_andnot(&mask, &mask, policy.cpus);
}
+out:
+   put_online_cpus();
 }
 
 static char throttle_reason[][30] = {
-- 
1.9.3



[PATCH v6 4/5] cpufreq: powernv: Replace pr_info with trace print for throttle event

2016-01-21 Thread Shilpasri G Bhat
Currently we use a printk message to notify the throttle event. But this
can flood the console if the cpu is throttled frequently. So replace the
printk with a tracepoint to notify the throttle event. Also, events
like throttle below nominal frequency and OCC_RESET are reduced to
pr_warn/pr_warn_once, as pointed out by MFG, so that they are not marked
as critical messages. This patch adds 'throt_reason' to struct chip to
store the throttle reason.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
---
No changes from v5.

Changes from v4:
- Taken care of Gautham's comments to remove the new function
  powernv_cpufreq_check_pmax()
- Modified commit message

Changes from v3:
- Separate this patch to contain trace_point changes
- Move struct chip member 'restore' of type bool above 'mask' to reduce
  structure padding.

No changes from v2.

Changes from v1:
- As suggested by Paul Clarke replaced char * throttle_reason[][30] by 
  const char * const throttle_reason[].

 drivers/cpufreq/powernv-cpufreq.c | 73 ++-
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 6f186dc..2d09274 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -45,12 +46,22 @@ static struct cpufreq_frequency_table 
powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 static unsigned int *core_to_chip_map;
 
+static const char * const throttle_reason[] = {
+   "No throttling",
+   "Power Cap",
+   "Processor Over Temperature",
+   "Power Supply Failure",
+   "Over Current",
+   "OCC Reset"
+};
+
 static struct chip {
unsigned int id;
bool throttled;
+   bool restore;
+   u8 throt_reason;
cpumask_t mask;
struct work_struct throttle;
-   bool restore;
 } *chips;
 
 static int nr_chips;
@@ -331,17 +342,17 @@ static void powernv_cpufreq_throttle_check(void *data)
goto next;
chips[i].throttled = true;
if (pmsr_pmax < powernv_pstate_info.nominal)
-   pr_crit("CPU %d on Chip %u has Pmax reduced below 
nominal frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.nominal);
-   else
-   pr_info("CPU %d on Chip %u has Pmax reduced below turbo 
frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.max);
+   pr_warn_once("CPU %d on Chip %u has Pmax reduced below 
nominal frequency (%d < %d)\n",
+cpu, chips[i].id, pmsr_pmax,
+powernv_pstate_info.nominal);
+   trace_powernv_throttle(chips[i].id,
+  throttle_reason[chips[i].throt_reason],
+  pmsr_pmax);
} else if (chips[i].throttled) {
chips[i].throttled = false;
-   pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
-   chips[i].id, pmsr_pmax);
+   trace_powernv_throttle(chips[i].id,
+  throttle_reason[chips[i].throt_reason],
+  pmsr_pmax);
}
 
/* Check if Psafe_mode_active is set in PMSR. */
@@ -359,7 +370,7 @@ next:
 
if (throttled) {
pr_info("PMSR = %16lx\n", pmsr);
-   pr_crit("CPU Frequency could be throttled\n");
+   pr_warn("CPU Frequency could be throttled\n");
}
 }
 
@@ -452,15 +463,6 @@ out:
put_online_cpus();
 }
 
-static char throttle_reason[][30] = {
-   "No throttling",
-   "Power Cap",
-   "Processor Over Temperature",
-   "Power Supply Failure",
-   "Over Current",
-   "OCC Reset"
-};
-
 static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
   unsigned long msg_type, void *_msg)
 {
@@ -486,7 +488,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block 
*nb,
 */
if (!throttled) {
throttled = true;
-   pr_crit("CPU frequency is throttled for duration\n");
+   pr_warn("CPU frequency is throttled for duration\n");
}
 
break;
@@ -510,23 +512,18 @@ static int powernv_cpufreq_occ_msg(struct notifier_block 
*nb,
return 0;
}
 
-  

[PATCH v6 3/5] cpufreq: powernv/tracing: Add powernv_throttle tracepoint

2016-01-21 Thread Shilpasri G Bhat
This patch adds the powernv_throttle tracepoint to trace the CPU
frequency throttling event, which is used by the powernv-cpufreq
driver in POWER8.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
CC: Ingo Molnar 
CC: Steven Rostedt 
---
No changes since v2.

 include/trace/events/power.h | 22 ++
 kernel/trace/power-traces.c  |  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 284244e..19e5030 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -38,6 +38,28 @@ DEFINE_EVENT(cpu, cpu_idle,
TP_ARGS(state, cpu_id)
 );
 
+TRACE_EVENT(powernv_throttle,
+
+   TP_PROTO(int chip_id, const char *reason, int pmax),
+
+   TP_ARGS(chip_id, reason, pmax),
+
+   TP_STRUCT__entry(
+   __field(int, chip_id)
+   __string(reason, reason)
+   __field(int, pmax)
+   ),
+
+   TP_fast_assign(
+   __entry->chip_id = chip_id;
+   __assign_str(reason, reason);
+   __entry->pmax = pmax;
+   ),
+
+   TP_printk("Chip %d Pmax %d %s", __entry->chip_id,
+ __entry->pmax, __get_str(reason))
+);
+
 TRACE_EVENT(pstate_sample,
 
TP_PROTO(u32 core_busy,
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index eb4220a..81b8745 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -15,4 +15,5 @@
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(suspend_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
+EXPORT_TRACEPOINT_SYMBOL_GPL(powernv_throttle);
 
-- 
1.9.3



[PATCH v6 0/5] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver

2016-01-21 Thread Shilpasri G Bhat
In POWER8, OCC(On-Chip-Controller) can throttle the frequency of the
CPU when the chip crosses its thermal and power limits. Currently,
powernv-cpufreq driver detects and reports this event as a console
message. Some machines may not sustain the max turbo frequency in all
conditions and can be throttled frequently. This can lead to the
flooding of console with throttle messages. So this patchset aims to
redesign the presentation of this event via sysfs counters and
tracepoints. And it also fixes a couple of bugs reported in the driver.

- Patch [1] fixes the cpu hot-plug bug in powernv_cpufreq_work_fn().
- Patch [2] solves a bug in powernv_cpufreq_throttle_check(), which
  calls into cpu_to_chip_id() in the hot path, reading the DT every
  time to find the chip id.
- Patches [3] to [5] will add a perf trace point
  "power:powernv_throttle" and sysfs throttle counter stats in
  /sys/devices/system/cpu/cpufreq/chipN.

Changes from v5:
- Fix kbuild error:
drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
function 'get_online_cpus' [-Werror=implicit-function-declaration]

Changes from v4:
- Fix a hot-plug bug in powernv_cpufreq_work_fn()
- Changes wrt Gautham's and Shreyas's comments 

Changes from v3:
- Add a fix to replace cpu_to_chip_id() with simpler PIR shift to 
  obtain the chip id.
- Break patch2 in to two patches separating the tracepoint and sysfs
  attribute changes.

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Shilpasri G Bhat (5):
  cpufreq: powernv: Hot-plug safe the kworker thread
  cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  cpufreq: powernv/tracing: Add powernv_throttle tracepoint
  cpufreq: powernv: Replace pr_info with trace print for throttle event
  cpufreq: powernv: Add sysfs attributes to show throttle stats

 drivers/cpufreq/powernv-cpufreq.c | 313 +++---
 include/trace/events/power.h  |  22 +++
 kernel/trace/power-traces.c   |   1 +
 3 files changed, 281 insertions(+), 55 deletions(-)

-- 
1.9.3



Re: [PATCH, REGRESSION v3] mm: make apply_to_page_range more robust

2016-01-21 Thread Pekka Enberg

On 01/22/2016 01:12 AM, David Rientjes wrote:
NACK to your patch as it is just covering up buggy code silently. The 
problem needs to be addressed in change_memory_common() to return if 
there is no size to change (numpages == 0). It's a two line fix to 
that function. 


So add a WARN_ON there to *warn* about the situations. There's really no 
need to BUG_ON here.
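
For illustration only, a userspace sketch of the behaviour being argued
for: an empty range is reported and rejected rather than treated as fatal.
walk_range() and WARN_ON_SKETCH() are hypothetical names, and the idea
that the failing check is a "start >= end" test is an assumption, not
something stated in this thread.

```c
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Userspace stand-in for a WARN_ON-style check: report and continue. */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "warning: %s\n", #cond), 1) : 0)

/*
 * Hypothetical range walker: rejects an empty range with a warning
 * instead of aborting. Returns the number of pages visited, or -1.
 */
static long walk_range(unsigned long addr, unsigned long numpages)
{
	unsigned long end = addr + numpages * SKETCH_PAGE_SIZE;
	long visited = 0;

	if (WARN_ON_SKETCH(addr >= end))
		return -1;

	for (; addr < end; addr += SKETCH_PAGE_SIZE)
		visited++;
	return visited;
}
```

A caller that passes numpages == 0 gets an error and a log line it can act
on, while the rest of the system keeps running.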


- Pekka


Re: [PATCH v13 04/23] perf config: Document variables for 'annotate' section in man page

2016-01-21 Thread Taeung Song

Hi, Arnaldo

Sorry for my tardy response.

On 01/21/2016 11:45 PM, Arnaldo Carvalho de Melo wrote:

Em Fri, Jan 08, 2016 at 08:39:34PM +0900, Taeung Song escreveu:

Explain 'annotate' section and its variables.

'hide_src_code', 'use_offset', 'jump_arrows',
'show_linenr', 'show_nr_jump' and 'show_total_period'.

Cc: Namhyung Kim 
Cc: Jiri Olsa 
Signed-off-by: Taeung Song 
---
  tools/perf/Documentation/perf-config.txt | 110 +++
  1 file changed, 110 insertions(+)

diff --git a/tools/perf/Documentation/perf-config.txt 
b/tools/perf/Documentation/perf-config.txt
index 8835215..85b811f 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -168,6 +168,116 @@ buildid.*::
cache location, or to disable it altogether. If you want to 
disable it,
set buildid.dir to /dev/null. The default is $HOME/.debug


I suggest you document here also the hotkeys that are available in the
TUI to toggle those knobs, i.e. please go to the annotate browser and
press 'h', you'll get the list of hotkeys, this way, for someone reading
the man page the information will know that this can be done
interactively, not just by changing a config file.



I don't know whether this patch was applied as-is
because of my tardy response.
I saw that it was included in the [GIT PULL] mail.

Would you mind if I add this hotkey information
to the perf-config documentation in a new patchset?


Also it would be interesting to change the annotate/top/report man page
to point to this documentation.



OK, I understand this as adding 'linkperf:perf-config[1]' to the
'SEE ALSO' section of the annotate/top/report man pages.
Is that right?

If not, should I add a sentence such as
'Please refer to the perf-config manual.'
to the annotate/top/report documentation?


But this can be done on top, I'm doing quick text flowing/grammar fixes
and applying as much as I can from this patchkit, thanks for continuing
work on it.


Although it is a minor contribution,
I'm so glad I could contribute. :-)

Thanks,
Taeung


Re: [PATCH v2 3/4] x86/efi: print size and base in binary units in efi_print_memmap

2016-01-21 Thread Andy Shevchenko
On Fri, Jan 22, 2016 at 12:35 AM, Andrew Morton
 wrote:
> On Thu, 21 Jan 2016 17:22:31 +0200 Andy Shevchenko 
>  wrote:
>
>> From: Robert Elliott 
>>
>> Print the base in the best-fit B, KiB, MiB, etc. units rather than
>> always MiB. This avoids rounding, which can be misleading.
>>
>> Use proper IEC binary units (KiB, MiB, etc.) rather than misuse SI
>> decimal units (KB, MB, etc.).
>>
>> old:
>> efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] 
>> range=[0x00088000-0x000c7fff) (16384MB)
>>
>> new:
>> efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] 
>> range=[0x00088000-0x000c7fff] (16 GiB)
>
> hm,
>
>> @@ -225,21 +235,20 @@ int __init efi_memblock_x86_reserve_range(void)
>>  void __init efi_print_memmap(void)
>>  {
>>  #ifdef EFI_DEBUG
>> - efi_memory_desc_t *md;
>>   void *p;
>>   int i;
>>
>>   for (p = memmap.map, i = 0;
>>p < memmap.map_end;
>>p += memmap.desc_size, i++) {
>> - char buf[64];
>> + efi_memory_desc_t *md = p;
>> + u64 size = md->num_pages << EFI_PAGE_SHIFT;
>> + char buf[64], buf3[32];
>>
>> - md = p;
>> - pr_info("mem%02u: %s range=[0x%016llx-0x%016llx] (%lluMB)\n",
>> + pr_info("mem%02u: %s range=[0x%016llx-0x%016llx] (%s)\n",
>>   i, efi_md_typeattr_format(buf, sizeof(buf), md),
>> - md->phys_addr,
>> - md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1,
>
> Where did this " - 1" come from?  I can't find a tree which has this.

http://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=next=b324ee15566d9de933e7926c37a4d091904a513b

>
>> - (md->num_pages >> (20 - EFI_PAGE_SHIFT)));
>> + md->phys_addr, md->phys_addr + size - 1,
>
> So I did s/ - 1// here, but worried.

I'm pretty sure the series should go via efi tree.

>
>> + efi_size_format(buf3, sizeof(buf3), size));
>>   }
>>  #endif  /*  EFI_DEBUG  */
>>  }
>



-- 
With Best Regards,
Andy Shevchenko
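
As a sketch of the "best-fit unit without rounding" idea from the quoted
commit message: pick the largest IEC unit that divides the size exactly.
size_format_iec() is a hypothetical userspace helper; the actual
efi_size_format() in the series may work differently.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Format size using the largest IEC unit that divides it exactly, so the
 * printed number is never rounded. Hypothetical helper for illustration.
 */
static char *size_format_iec(char *buf, size_t len, uint64_t size)
{
	static const char *const units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	unsigned int i = 0;

	/* Step up one unit for as long as no low-order bits would be lost. */
	while (size && (size & 1023) == 0 && i < 6) {
		size >>= 10;
		i++;
	}
	snprintf(buf, len, "%llu %s", (unsigned long long)size, units[i]);
	return buf;
}
```

With this rule the 16384 MiB region from the example prints as "16 GiB",
while a size such as 1536 bytes stays in bytes instead of being rounded to
a larger unit.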


Re: [PATCH] perf core: Get rid of 'uses dynamic stack allocation' warning

2016-01-21 Thread kbuild test robot
Hi Wang,

[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.4 next-20160121]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Wang-Nan/perf-core-Get-rid-of-uses-dynamic-stack-allocation-warning/20160122-145515
config: i386-tinyconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   kernel/events/core.c: In function 'perf_event_read_event':
>> kernel/events/core.c:5571:2: warning: ISO C90 forbids mixed declarations and 
>> code [-Wdeclaration-after-statement]
 struct perf_read_event read_event = {
 ^
   kernel/events/core.c: In function 'perf_event_task_output':
   kernel/events/core.c:5695:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 struct task_struct *task = task_event->task;
 ^
   kernel/events/core.c: In function 'perf_event_comm_output':
   kernel/events/core.c:5791:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 int size = comm_event->event_id.header.size;
 ^
   kernel/events/core.c: In function 'perf_event_mmap_output':
   kernel/events/core.c:5904:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 int size = mmap_event->event_id.header.size;
 ^
   kernel/events/core.c: In function 'perf_event_aux_event':
   kernel/events/core.c:6109:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 struct perf_aux_event {
 ^
   kernel/events/core.c: In function 'perf_log_lost_samples':
   kernel/events/core.c:6145:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 int ret;
 ^
   kernel/events/core.c: In function 'perf_event_switch_output':
   kernel/events/core.c:6196:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 int ret;
 ^
   kernel/events/core.c: In function 'perf_log_throttle':
   kernel/events/core.c:6264:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 int ret;
 ^
   kernel/events/core.c: In function 'perf_log_itrace_start':
   kernel/events/core.c:6301:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 struct perf_aux_event {
 ^
   kernel/events/core.c: In function 'perf_bp_event':
   kernel/events/core.c:7078:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 struct pt_regs *regs = data;
 ^
   kernel/events/core.c: In function 'perf_swevent_hrtimer':
   kernel/events/core.c:7095:2: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 struct pt_regs *regs;
 ^
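
For readers unfamiliar with this warning, a small standalone example of
what -Wdeclaration-after-statement complains about and the usual fix. The
function names are invented; how the new macro in the patch ends up placing
code ahead of the later declarations is not shown in this report, so that
part is left out.

```c
/* Triggers the warning when built with -Wdeclaration-after-statement. */
static int sum_mixed(const int *v, int n)   /* requires n >= 1 */
{
	int total = 0;

	total += v[0];          /* a statement ...                  */
	int i;                  /* ... followed by a declaration    */

	for (i = 1; i < n; i++)
		total += v[i];
	return total;
}

/* Same logic with every declaration ahead of the first statement. */
static int sum_c90(const int *v, int n)     /* requires n >= 1 */
{
	int total = 0;
	int i;

	total += v[0];
	for (i = 1; i < n; i++)
		total += v[i];
	return total;
}
```

Both functions compute the same result; only the placement of the
declaration of i differs.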

vim +5571 kernel/events/core.c

cdd6c482 kernel/perf_event.c   Ingo Molnar      2009-09-21         * read event_id
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5556   */
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5557
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5558  struct perf_read_event {
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5559      struct perf_event_header header;
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5560
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5561      u32 pid;
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5562      u32 tid;
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5563  };
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5564
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5565  static void
cdd6c482 kernel/perf_event.c   Ingo Molnar      2009-09-21  5566  perf_event_read_event(struct perf_event *event,
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5567              struct task_struct *task)
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5568  {
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5569      struct perf_output_handle handle;
3ff786eb kernel/events/core.c  Wang Nan         2016-01-22  5570      DEFINE_PERF_SAMPLE_DATA_ALIGNED(psample, temp);
dfc65094 kernel/perf_counter.c Ingo Molnar      2009-09-21 @5571      struct perf_read_event read_event = {
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5572          .header = {
cdd6c482 kernel/perf_event.c   Ingo Molnar      2009-09-21  5573              .type = PERF_RECORD_READ,
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5574
38b200d6 kernel/perf_counter.c Peter Zijlstra   2009-06-23  5574

[RFC] mm: shall we add an entry in meminfo to show the memory from module?

2016-01-21 Thread Xishi Qiu
Currently /proc/meminfo does not show the memory used by modules.

The entry "VmallocUsed: xxx" only shows the memory in the range
[VMALLOC_START, VMALLOC_END] allocated by vmalloc() -> ... ->
__vmalloc_node_range().

The memory used by modules comes from module_alloc() ->
__vmalloc_node_range().

So we miss some memory when we calculate the total in meminfo.

Thanks,
Xishi Qiu



RE: [RFC] spi-nor: fix cross die reads on Micron multi-die devices

2016-01-21 Thread beanhuo
> Hi Bean,
> 
> On Thu, 21 Jan 2016 01:06:48 +
> Bean Huo 霍斌斌 (beanhuo)  wrote:
> 
> >  Hi, Adam and Boris
> >
> > For Micron MT25Q ,MT25T and MT35Q, they does not exist this action
> > even they are Multi-die devices. So when the last byte of the die
> > selected is read, the next byte output is the first byte of next die(not the
> same die).
> > You can check this by extended address register chapter in our
> > datasheet, there are detail Information.
> 
> I never said you were wrong ;), I just asked if it was relevant to 
> differentiate
> the two cases. IOW, would the implementation proposed by Adam work
> correctly on all chips? And what is the real performance penalty for
> MT25Q ,MT25T and MT35Q if we decide to split the read command in several
> reads to handle this cross die case?
For this, the performance penalty is tiny and can be ignored.
SPI NOR read performance depends only on the SPI I/O clock, unlike NAND.
> Best Regards,
> 
> Boris
> 
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
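
A minimal sketch of the "split the read into several reads" approach
discussed above, assuming all dies have the same size. raw_read(),
read_split_at_die() and the logging array are hypothetical userspace
stand-ins, not spi-nor API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level read; it only records calls so the split is visible. */
struct chunk { uint32_t from; size_t len; };
static struct chunk chunk_log[16];
static size_t chunk_count;

static int raw_read(uint32_t from, size_t len)
{
	if (chunk_count == 16)
		return -1;
	chunk_log[chunk_count].from = from;
	chunk_log[chunk_count].len = len;
	chunk_count++;
	return 0;
}

/* Issue one raw read per die so that no single read crosses a die boundary. */
static int read_split_at_die(uint32_t from, size_t len, uint32_t die_size)
{
	while (len) {
		uint32_t room = die_size - (from % die_size);   /* bytes left in this die */
		size_t n = len < room ? len : room;
		int ret = raw_read(from, n);

		if (ret)
			return ret;
		from += (uint32_t)n;
		len -= n;
	}
	return 0;
}
```

A read that stays within one die still results in a single raw read, so
the extra cost is limited to requests that actually cross a boundary.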


[PATCH v7 2/2] power: add documentation for ACT8945A's charger DT bindings

2016-01-21 Thread Wenyou Yang
This patch adds documentation for the DT bindings of the charger
subdevice of ACT8945A MFD.

Signed-off-by: Wenyou Yang 
Reviewed-by: Krzysztof Kozlowski 
---

Changes in v7: None
Changes in v6: None
Changes in v5:
 - collect Reviewed-by from Krzysztof.

Changes in v4:
 - change the properties with more legible name, clearer description.

Changes in v3: None
Changes in v2: None

 .../devicetree/bindings/power/act8945a-charger.txt |   33 
 1 file changed, 33 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/act8945a-charger.txt

diff --git a/Documentation/devicetree/bindings/power/act8945a-charger.txt 
b/Documentation/devicetree/bindings/power/act8945a-charger.txt
new file mode 100644
index 000..2055da7
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/act8945a-charger.txt
@@ -0,0 +1,33 @@
+Device-Tree bindings for charger of Active-semi ACT8945A Multi-Function Device
+
+Required properties:
+ - compatible: "active-semi,act8945a-charger".
+ - active-semi,chglev-gpios: charge current level phandle with args
+   as described in ../gpio/gpio.txt.
+
+Optional properties:
+ - active-semi,check-battery-temperature: boolean to check the battery
+   temperature or not.
+ - active-semi,input-voltage-threshold-microvolt: unit: mV;
+   Specifies the charger's input over-voltage threshold value;
+   The value can be: 6600, 7000, 7500, 8000; default: 6600
+ - active-semi,precondition-timeout: unit: minutes;
+   Specifies the charger's PRECONDITION safety timer setting value;
+   The value can be: 40, 60, 80, 0; If 0, it means to disable this timer;
+   default: 40.
+ - active-semi,total-timeout: unit: hours;
+   Specifies the charger's total safety timer setting value;
+   The value can be: 3, 4, 5, 0; If 0, it means to disable this timer;
+   default: 3.
+
+Example:
+
+   charger {
+   compatible = "active-semi,act8945a-charger";
+   pinctrl-names = "default";
+   pinctrl-0 = <_charger_chglev>;
+   active-semi,chglev-gpios = < 12 GPIO_ACTIVE_HIGH>;
+   active-semi,input-voltage-threshold-microvolt = <6600>;
+   active-semi,precondition-timeout = <40>;
+   active-semi,total-timeout = <3>;
+   };
-- 
1.7.9.5
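
A small sketch of how the value lists in the binding text above could be
checked, assuming the values exactly as documented there. value_allowed()
and the table names are hypothetical; the real driver may validate
differently or not at all.

```c
#include <stddef.h>

/* Allowed values as listed in the binding text; 0 disables a timer. */
static const unsigned int ovp_threshold[]        = { 6600, 7000, 7500, 8000 };
static const unsigned int precondition_timeout[] = { 40, 60, 80, 0 };
static const unsigned int total_timeout[]        = { 3, 4, 5, 0 };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if v is one of the n values in set, else 0. */
static int value_allowed(unsigned int v, const unsigned int *set, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (set[i] == v)
			return 1;
	return 0;
}
```

A property value that is not in its list could then fall back to the
documented default (6600, 40 and 3 respectively).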



Re: [lkp] [kallsyms] 06862f34f6: BUG: unable to handle kernel NULL pointer dereference at (null)

2016-01-21 Thread Ard Biesheuvel
On 22 January 2016 at 03:20, kernel test robot
 wrote:
> FYI, we noticed the below changes on
>
> https://git.linaro.org/people/ard.biesheuvel/linux-arm kallsyms-text-relative
> commit 06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d ("kallsyms: add support for 
> relative offsets in kallsyms address table")
>
>
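> +-----------------------------------------------------------+------------+------------+
> |                                                           | 30f05309bd | 06862f34f6 |
> +-----------------------------------------------------------+------------+------------+
> | boot_successes                                            | 0          | 0          |
> | boot_failures                                             | 6          | 8          |
> | Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 4          |            |
> | BUG:kernel_test_oversize                                  | 2          |            |
> | BUG:unable_to_handle_kernel                               | 0          | 8          |
> +-----------------------------------------------------------+------------+------------+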
>
>
>
> [0.568228] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> [0.568971] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> [0.569835] CPU: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 
> 0x2a, stepping: 0x1)
> [0.598441] BUG: unable to handle kernel NULL pointer dereference at   
> (null)
> [0.599646] IP:
> [0.599884] BUG: unable to handle kernel NULL pointer dereference at   
> (null)
> [0.610184] IP:
> Elapsed time: 10
> qemu-system-x86_64 -enable-kvm -cpu SandyBridge -kernel 
> /pkg/linux/x86_64-randconfig-s4-01220217/gcc-5/06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d/vmlinuz-4.4.0-10063-g06862f34
>  -append 'root=/dev/ram0 user=lkp 
> job=/lkp/scheduled/vm-kbuild-yocto-x86_64-62/bisect_boot-1-yocto-minimal-x86_64.cgz-x86_64-randconfig-s4-01220217-06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d-20160122-39812-uowsx2-0.yaml
>  ARCH=x86_64 kconfig=x86_64-randconfig-s4-01220217 
> branch=linux-devel/devel-spot-201601220143 
> commit=06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d 
> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-s4-01220217/gcc-5/06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d/vmlinuz-4.4.0-10063-g06862f34
>  max_uptime=600 
> RESULT_ROOT=/result/boot/1/vm-kbuild-yocto-x86_64/yocto-minimal-x86_64.cgz/x86_64-randconfig-s4-01220217/gcc-5/06862f34f614bb6ff6a9fc9c4b0d849e2ee2018d/0
>  LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug 
> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
> softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
> prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
> ip=vm-kbuild-yocto-x86_64-62::dhcp drbd.minor_count=8'  -initrd 
> /fs/sdg1/initrd-vm-kbuild-yocto-x86_64-62 -m 320 -smp 1 -device 
> e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog 
> i6300esb -rtc base=localtime -drive 
> file=/fs/sdg1/disk0-vm-kbuild-yocto-x86_64-62,media=disk,if=virtio -pidfile 
> /dev/shm/kboot/pid-vm-kbuild-yocto-x86_64-62 -serial 
> file:/dev/shm/kboot/serial-vm-kbuild-yocto-x86_64-62 -daemonize -display none 
> -monitor null
>

Hello Ying,

Thanks for the report. Is this a clean build? I cannot reproduce the
issue with the attached config. Do you have the vmlinux file available
for inspection?

Thanks,
Ard.


[PATCH v7 1/2] power: act8945a: add charger driver for ACT8945A

2016-01-21 Thread Wenyou Yang
This patch adds new driver for Active-semi ACT8945A ActivePath
charger (part of ACT8945A MFD driver) providing power supply class
information to userspace.

The driver can be configured through DT (such as, total timer,
precondition timer and input over-voltage threshold).
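
As a rough illustration of the DT-driven configuration described above, the
sketch below maps the three values onto APCH_CFG bits. The field macros are
the ones defined in this patch; the helper name apch_cfg_from_dt(), its units
(millivolts, minutes, hours) and the convention that a timeout of 0 means
"disabled" are assumptions made for illustration, not the driver's actual code.

```c
#include <assert.h>

/* Field values as defined in the register map of this patch. */
#define APCH_CFG_OVPSET_6V6        (0x0 << 0)
#define APCH_CFG_OVPSET_7V         (0x1 << 0)
#define APCH_CFG_OVPSET_7V5        (0x2 << 0)
#define APCH_CFG_OVPSET_8V         (0x3 << 0)
#define APCH_CFG_PRETIMO_40_MIN    (0x0 << 2)
#define APCH_CFG_PRETIMO_60_MIN    (0x1 << 2)
#define APCH_CFG_PRETIMO_80_MIN    (0x2 << 2)
#define APCH_CFG_PRETIMO_DISABLED  (0x3 << 2)
#define APCH_CFG_TOTTIMO_3_HOUR    (0x0 << 4)
#define APCH_CFG_TOTTIMO_4_HOUR    (0x1 << 4)
#define APCH_CFG_TOTTIMO_5_HOUR    (0x2 << 4)
#define APCH_CFG_TOTTIMO_DISABLED  (0x3 << 4)

/*
 * Hypothetical helper: translate DT-style values into APCH_CFG bits.
 * Returns the register bits, or -1 for a value the hardware cannot encode.
 * Treating 0 as "timer disabled" is an assumption of this sketch.
 */
int apch_cfg_from_dt(unsigned int ovp_mv, unsigned int pre_min,
                     unsigned int total_hours)
{
	int val = 0;

	switch (ovp_mv) {
	case 6600: val |= APCH_CFG_OVPSET_6V6; break;
	case 7000: val |= APCH_CFG_OVPSET_7V;  break;
	case 7500: val |= APCH_CFG_OVPSET_7V5; break;
	case 8000: val |= APCH_CFG_OVPSET_8V;  break;
	default:   return -1;
	}

	switch (pre_min) {
	case 40: val |= APCH_CFG_PRETIMO_40_MIN;   break;
	case 60: val |= APCH_CFG_PRETIMO_60_MIN;   break;
	case 80: val |= APCH_CFG_PRETIMO_80_MIN;   break;
	case 0:  val |= APCH_CFG_PRETIMO_DISABLED; break;
	default: return -1;
	}

	switch (total_hours) {
	case 3:  val |= APCH_CFG_TOTTIMO_3_HOUR;   break;
	case 4:  val |= APCH_CFG_TOTTIMO_4_HOUR;   break;
	case 5:  val |= APCH_CFG_TOTTIMO_5_HOUR;   break;
	case 0:  val |= APCH_CFG_TOTTIMO_DISABLED; break;
	default: return -1;
	}

	/* Only bits 0..5 are touched by the three fields above. */
	assert((val & ~0x3f) == 0);
	return val;
}
```

With the values from the binding example (6600, 40, 3) every field selects
encoding 0, so the helper returns 0.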

Signed-off-by: Wenyou Yang 
---

Changes in v7:
 - use the helper dev_get_regmap(pdev->dev.parent, NULL) to get regmap.
 - remove *act8945a_dev member from struct act8945a_charger.
 - remove *psy member from struct act8945a_charger.
 - merge _parse_dt() and _charger_config() functions, remove relevant
   members from struct act8945a_charger.
 - remove unused platform_set_drvdata(pdev, charger) statement.

Changes in v6:
 - change the type value to unsigned int.

Changes in v5:
 - remove spare spaces after #define.
 - add OF match table.

Changes in v4:
 - use spaces after #define, not tabs.
 - use BIT(n) macros to substitute (0x01 << x).
 - change dt properties with more legible name.

Changes in v3:
 - update the file header with short version license and author line.
 - remove unused member of struct act8945a_charger, dev.
 - action due to removing the member of struct act8945a_dev, dev.
 - remove the unnecessary print out.
 - remove the unnecessary act8945a_charger_remove().
 - fix align of the code-style.

Changes in v2:
 1./ Substitute of_property_read_bool() for of_get_property().
 2./ Substitute devm_power_supply_register() for power_supply_register().
 3./ Use module_platform_driver(), instead of subsys_initcall().
 4./ Substitute MODULE_LICENSE("GPL") for MODULE_LICENSE("GPL v2").

 drivers/power/Kconfig|7 +
 drivers/power/Makefile   |1 +
 drivers/power/act8945a_charger.c |  362 ++
 3 files changed, 370 insertions(+)
 create mode 100644 drivers/power/act8945a_charger.c

diff --git a/drivers/power/Kconfig b/drivers/power/Kconfig
index 1ddd13c..ae75211 100644
--- a/drivers/power/Kconfig
+++ b/drivers/power/Kconfig
@@ -75,6 +75,13 @@ config BATTERY_88PM860X
help
  Say Y here to enable battery monitor for Marvell 88PM860x chip.
 
+config BATTERY_ACT8945A
+   tristate "Active-semi ACT8945A charger driver"
+   depends on MFD_ACT8945A
+   help
+ Say Y here to enable support for power supply provided by
+ Active-semi ActivePath ACT8945A charger.
+
 config BATTERY_DS2760
tristate "DS2760 battery driver (HP iPAQ & others)"
depends on W1 && W1_SLAVE_DS2760
diff --git a/drivers/power/Makefile b/drivers/power/Makefile
index 0e4eab5..e46b75d 100644
--- a/drivers/power/Makefile
+++ b/drivers/power/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_WM8350_POWER)+= wm8350_power.o
 obj-$(CONFIG_TEST_POWER)   += test_power.o
 
 obj-$(CONFIG_BATTERY_88PM860X) += 88pm860x_battery.o
+obj-$(CONFIG_BATTERY_ACT8945A) += act8945a_charger.o
 obj-$(CONFIG_BATTERY_DS2760)   += ds2760_battery.o
 obj-$(CONFIG_BATTERY_DS2780)   += ds2780_battery.o
 obj-$(CONFIG_BATTERY_DS2781)   += ds2781_battery.o
diff --git a/drivers/power/act8945a_charger.c b/drivers/power/act8945a_charger.c
new file mode 100644
index 000..930614e
--- /dev/null
+++ b/drivers/power/act8945a_charger.c
@@ -0,0 +1,362 @@
+/*
+ * Power supply driver for the Active-semi ACT8945A PMIC
+ *
+ * Copyright (C) 2015 Atmel Corporation
+ *
+ * Author: Wenyou Yang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const char *act8945a_charger_model = "ACT8945A";
+static const char *act8945a_charger_manufacturer = "Active-semi";
+
+/**
+ * ACT8945A Charger Register Map
+ */
+
+/* 0x70: Reserved */
+#define ACT8945A_APCH_CFG          0x71
+#define ACT8945A_APCH_STATUS       0x78
+#define ACT8945A_APCH_CTRL         0x79
+#define ACT8945A_APCH_STATE        0x7A
+
+/* ACT8945A_APCH_CFG */
+#define APCH_CFG_OVPSET            (0x3 << 0)
+#define APCH_CFG_OVPSET_6V6        (0x0 << 0)
+#define APCH_CFG_OVPSET_7V         (0x1 << 0)
+#define APCH_CFG_OVPSET_7V5        (0x2 << 0)
+#define APCH_CFG_OVPSET_8V         (0x3 << 0)
+#define APCH_CFG_PRETIMO           (0x3 << 2)
+#define APCH_CFG_PRETIMO_40_MIN    (0x0 << 2)
+#define APCH_CFG_PRETIMO_60_MIN    (0x1 << 2)
+#define APCH_CFG_PRETIMO_80_MIN    (0x2 << 2)
+#define APCH_CFG_PRETIMO_DISABLED  (0x3 << 2)
+#define APCH_CFG_TOTTIMO           (0x3 << 4)
+#define APCH_CFG_TOTTIMO_3_HOUR    (0x0 << 4)
+#define APCH_CFG_TOTTIMO_4_HOUR    (0x1 << 4)
+#define APCH_CFG_TOTTIMO_5_HOUR    (0x2 << 4)
+#define APCH_CFG_TOTTIMO_DISABLED  (0x3 << 4)
+#define APCH_CFG_SUSCHG            (0x1 << 7)
+
+#define APCH_STATUS_CHGDAT         BIT(0)
+#define APCH_STATUS_INDAT          BIT(1)

[PATCH v7 0/2] power: act8945a: add charger driver for the sub-device of ACT8945A MFD

2016-01-21 Thread Wenyou Yang
The ACT8945A is a Multi Function Device with the following subdevices:
 - Regulator
 - Charger

This patch set adds the charger driver for the ACT8945A.
It is based on the patch set:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401044.html

Changes in v7:
 - use the helper dev_get_regmap(pdev->dev.parent, NULL) to get regmap.
 - remove *act8945a_dev member from struct act8945a_charger.
 - remove *psy member from struct act8945a_charger.
 - merge _parse_dt() and _charger_config() functions, remove relevant
   members from struct act8945a_charger.
 - remove unused platform_set_drvdata(pdev, charger) statement.

Changes in v6:
 - change the type value to unsigned int.

Changes in v5:
 - remove spare spaces after #define.
 - add OF match table.
 - collect Reviewed-by from Krzysztof.

Changes in v4:
 - use spaces after #define, not tabs.
 - use BIT(n) macros to substitute (0x01 << x).
 - change dt properties with more legible name.
 - change the properties with more legible name, clearer description.

Changes in v3:
 - update the file header with short version license and author line.
 - remove unused member of struct act8945a_charger, dev.
 - action due to removing the member of struct act8945a_dev, dev.
 - remove the unnecessary print out.
 - remove the unnecessary act8945a_charger_remove().
 - fix align of the code-style.

Changes in v2:
 1./ Substitute of_property_read_bool() for of_get_property().
 2./ Substitute devm_power_supply_register() for power_supply_register().
 3./ Use module_platform_driver(), instead of subsys_initcall().
 4./ Substitute MODULE_LICENSE("GPL") for MODULE_LICENSE("GPL v2").

Wenyou Yang (2):
  power: act8945a: add charger driver for ACT8945A
  power: add documentation for ACT8945A's charger DT bindings

 .../devicetree/bindings/power/act8945a-charger.txt |   33 ++
 drivers/power/Kconfig  |7 +
 drivers/power/Makefile |1 +
 drivers/power/act8945a_charger.c   |  362 
 4 files changed, 403 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/act8945a-charger.txt
 create mode 100644 drivers/power/act8945a_charger.c

-- 
1.7.9.5



[PATCH] perf core: Get rid of 'uses dynamic stack allocation' warning

2016-01-21 Thread Wang Nan
On s390 with CONFIG_WARN_DYNAMIC_STACK set, 'uses dynamic stack
allocation' warning is issued when defining 'struct perf_sample_data'
local variable.

This patch suppresses the warning by allocating an extra 255 bytes and
computing the aligned pointer manually.

Reported-by: kbuild test robot 
Signed-off-by: Wang Nan 
Cc: Peter Zijlstra 
Cc: pi3or...@163.com
---

I'm not confident about this patch because I know nothing about s390,
and the extra 255 bytes seem too large. Please simply ignore this
patch if it is inappropriate.
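
For readers unfamiliar with the trick, here is a minimal userspace sketch of
the manual alignment described above: over-allocate a byte buffer by
(alignment - 1) bytes and round the start address up. The names
demo_ptr_align(), demo_sample_in() and struct demo_sample are hypothetical
stand-ins, and DEMO_CACHE_BYTES = 256 is an assumption chosen to match the
255 extra bytes mentioned above; this is not the kernel's PTR_ALIGN() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for SMP_CACHE_BYTES (255 extra bytes implies 256). */
#define DEMO_CACHE_BYTES 256

/* Hypothetical stand-in for struct perf_sample_data. */
struct demo_sample {
	uint64_t addr;
	uint64_t period;
};

/* Round p up to the next multiple of align; align must be a power of two. */
void *demo_ptr_align(void *p, size_t align)
{
	uintptr_t v = (uintptr_t)p;

	assert(align != 0 && (align & (align - 1)) == 0);
	return (void *)((v + align - 1) & ~((uintptr_t)align - 1));
}

/*
 * Return an aligned struct location inside raw. The caller must provide at
 * least DEMO_CACHE_BYTES - 1 + sizeof(struct demo_sample) bytes, so that
 * the worst-case shift of DEMO_CACHE_BYTES - 1 still leaves room.
 */
struct demo_sample *demo_sample_in(unsigned char *raw)
{
	return demo_ptr_align(raw, DEMO_CACHE_BYTES);
}
```

The buffer has a fixed size, so the compiler no longer needs to realign the
stack frame dynamically; the cost is the extra (alignment - 1) bytes of stack
per function.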

The kbuild robot says:

 kernel/events/ring_buffer.c: In function 'perf_output_begin':
 kernel/events/ring_buffer.c:251:1: warning: 'perf_output_begin' uses dynamic 
stack allocation
  }
  ^

---
 include/linux/perf_event.h  | 11 ++
 kernel/events/core.c| 86 ++---
 kernel/events/ring_buffer.c |  6 ++--
 3 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f9828a4..263b6ef 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -797,6 +797,17 @@ struct perf_sample_data {
u64 stack_user_size;
 } cacheline_aligned;
 
+#ifdef CONFIG_WARN_DYNAMIC_STACK
+#define DEFINE_PERF_SAMPLE_DATA_ALIGNED(pn, an) \
+   u8 an[SMP_CACHE_BYTES - 1 + sizeof(struct perf_sample_data)]; \
+   struct perf_sample_data *pn = \
+   (struct perf_sample_data *)PTR_ALIGN(&an, SMP_CACHE_BYTES)
+#else
+#define DEFINE_PERF_SAMPLE_DATA_ALIGNED(pn, an) \
+   struct perf_sample_data an; \
+   struct perf_sample_data *pn = &an
+#endif
+
 /* default value for data source */
 #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
PERF_MEM_S(LVL, NA)   |\
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9e9c84da..36abe60 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5580,7 +5580,7 @@ perf_event_read_event(struct perf_event *event,
struct task_struct *task)
 {
struct perf_output_handle handle;
-   struct perf_sample_data sample;
+   DEFINE_PERF_SAMPLE_DATA_ALIGNED(psample, temp);
struct perf_read_event read_event = {
.header = {
.type = PERF_RECORD_READ,
@@ -5592,14 +5592,14 @@ perf_event_read_event(struct perf_event *event,
};
int ret;
 
-   perf_event_header__init_id(&read_event.header, &sample, event);
+   perf_event_header__init_id(&read_event.header, psample, event);
ret = perf_output_begin(&handle, event, read_event.header.size);
if (ret)
return;
 
perf_output_put(&handle, read_event);
perf_output_read(&handle, event);
-   perf_event__output_id_sample(event, &handle, &sample);
+   perf_event__output_id_sample(event, &handle, psample);
 
perf_output_end(&handle);
 }
@@ -5704,14 +5704,14 @@ static void perf_event_task_output(struct perf_event 
*event,
 {
struct perf_task_event *task_event = data;
struct perf_output_handle handle;
-   struct perf_sample_data sample;
+   DEFINE_PERF_SAMPLE_DATA_ALIGNED(psample, temp);
struct task_struct *task = task_event->task;
int ret, size = task_event->event_id.header.size;
 
if (!perf_event_task_match(event))
return;
 
-   perf_event_header__init_id(&task_event->event_id.header, &sample, event);
+   perf_event_header__init_id(&task_event->event_id.header, psample, event);
 
ret = perf_output_begin(&handle, event,
task_event->event_id.header.size);
@@ -5728,7 +5728,7 @@ static void perf_event_task_output(struct perf_event *event,
 
perf_output_put(&handle, task_event->event_id);
 
-   perf_event__output_id_sample(event, &handle, &sample);
+   perf_event__output_id_sample(event, &handle, psample);
 
perf_output_end(&handle);
 out:
@@ -5800,14 +5800,14 @@ static void perf_event_comm_output(struct perf_event 
*event,
 {
struct perf_comm_event *comm_event = data;
struct perf_output_handle handle;
-   struct perf_sample_data sample;
+   DEFINE_PERF_SAMPLE_DATA_ALIGNED(psample, temp);
int size = comm_event->event_id.header.size;
int ret;
 
if (!perf_event_comm_match(event))
return;
 
-   perf_event_header__init_id(&comm_event->event_id.header, &sample, event);
+   perf_event_header__init_id(&comm_event->event_id.header, psample, event);
ret = perf_output_begin(&handle, event,
comm_event->event_id.header.size);
 
@@ -5821,7 +5821,7 @@ static void perf_event_comm_output(struct perf_event *event,
__output_copy(&handle, comm_event->comm,
   comm_event->comm_size);
 
-   perf_event__output_id_sample(event, &handle, &sample);
+   perf_event__output_id_sample(event, &handle, psample);
 
perf_output_end(&handle);
 out:
@@ -5913,7 +5913,7 @@ static void perf_event_mmap_output(struct perf_event 
*event,
 {
struct perf_mmap_event *mmap_event = data;
struct 

Re: [PATCH kernel] vfio: Only check for bus IOMMU when NOIOMMU is selected

2016-01-21 Thread Alexey Kardashevskiy

On 01/22/2016 05:34 PM, Alexey Kardashevskiy wrote:

Recent change 03a76b60 "vfio: Include No-IOMMU mode" disabled VFIO
on systems which do not implement iommu_ops for the PCI bus even though
there is a VFIO IOMMU driver for it, such as the SPAPR TCE driver for
the PPC64/powernv platform.

This moves the iommu_present() call under #ifdef CONFIG_VFIO_NOIOMMU,
as is done in the rest of the file, to re-enable VFIO on powernv.

Signed-off-by: Alexey Kardashevskiy 
---
  drivers/vfio/vfio.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82f25cc..3f8060e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -343,7 +343,6 @@ static struct vfio_group *vfio_create_group(struct 
iommu_group *iommu_group,
atomic_set(&group->opened, 0);
group->iommu_group = iommu_group;
group->noiommu = !iommu_present;
-



Argh. Unrelated change, repost?



group->nb.notifier_call = vfio_iommu_group_notifier;

/*
@@ -767,7 +766,11 @@ int vfio_add_group_dev(struct device *dev,

group = vfio_group_get_from_iommu(iommu_group);
if (!group) {
+#ifdef CONFIG_VFIO_NOIOMMU
group = vfio_create_group(iommu_group, iommu_present(dev->bus));
+#else
+   group = vfio_create_group(iommu_group, true);
+#endif
if (IS_ERR(group)) {
iommu_group_put(iommu_group);
return PTR_ERR(group);




--
Alexey


[PATCH v7 2/2] mfd: add documentation for ACT8945A DT bindings

2016-01-21 Thread Wenyou Yang
The Active-semi ACT8945A PMIC is a Multi-Function Device with
two subdevices:
 - Regulator
 - Charger

This patch adds documentation for ACT8945A DT bindings.

Signed-off-by: Wenyou Yang 
Acked-by: Rob Herring 
---

Changes in v7: None
Changes in v6:
 - change the regulator name.

Changes in v5: None
Changes in v4: None
Changes in v3:
 - fix the tabbing errors in Example.
 - use "pmic@5b" label, not "act8945a@5b" in Example.
 - collect Acked-by from Rob.

Changes in v2:
 - use more specific label in Example.
 - add pmic and charger nodes in Example.

 Documentation/devicetree/bindings/mfd/act8945a.txt |   82 
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/act8945a.txt

diff --git a/Documentation/devicetree/bindings/mfd/act8945a.txt 
b/Documentation/devicetree/bindings/mfd/act8945a.txt
new file mode 100644
index 000..f2a8387
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/act8945a.txt
@@ -0,0 +1,82 @@
+Device-Tree bindings for Active-semi ACT8945A MFD driver
+
+Required properties:
+ - compatible: "active-semi,act8945a".
+ - reg: the I2C slave address for the ACT8945A chip
+
+The chip exposes two subdevices:
+ - a regulator: see ../regulator/act8945a-regulator.txt
+ - a charger: see ../power/act8945a-charger.txt
+
+Example:
+   pmic@5b {
+   compatible = "active-semi,act8945a";
+   reg = <0x5b>;
+   status = "okay";
+
+   pmic {
+   compatible = "active-semi,act8945a-regulator";
+   active-semi,vsel-high;
+
+   regulators {
+   vdd_1v35_reg: REG_DCDC1 {
+   regulator-name = "VDD_1V35";
+   regulator-min-microvolt = <1350000>;
+   regulator-max-microvolt = <1350000>;
+   regulator-always-on;
+   };
+
+   vdd_1v2_reg: REG_DCDC2 {
+   regulator-name = "VDD_1V2";
+   regulator-min-microvolt = <1100000>;
+   regulator-max-microvolt = <1300000>;
+   regulator-always-on;
+   };
+
+   vdd_3v3_reg: REG_DCDC3 {
+   regulator-name = "VDD_3V3";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-always-on;
+   };
+
+   vdd_fuse_reg: REG_LDO1 {
+   regulator-name = "VDD_FUSE";
+   regulator-min-microvolt = <2500000>;
+   regulator-max-microvolt = <2500000>;
+   regulator-always-on;
+   };
+
+   vdd_3v3_lp_reg: REG_LDO2 {
+   regulator-name = "VDD_3V3_LP";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-always-on;
+   };
+
+   vdd_led_reg: REG_LDO3 {
+   regulator-name = "VDD_LED";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-always-on;
+   };
+
+   vdd_sdhc_1v8_reg: REG_LDO4 {
+   regulator-name = "VDD_SDHC_1V8";
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   regulator-always-on;
+   };
+   };
+   };
+
+   charger {
+   compatible = "active-semi,act8945a-charger";
+   pinctrl-names = "default";
+   pinctrl-0 = <_charger_chglev>;
+   active-semi,chglev-gpio = < 12 GPIO_ACTIVE_HIGH>;
+   active-semi,input_voltage_threshold = <6600>;
+   active-semi,precondition_timeout = <40>;
+   active-semi,total_timeout = <3>;
+   };
+   };
-- 
1.7.9.5



[PATCH v7 1/2] mfd: act8945a: add Active-semi ACT8945A PMIC MFD driver

2016-01-21 Thread Wenyou Yang
This patch adds support for the Active-semi ACT8945A PMIC.
It is a Multi Function Device with the following subdevices:
 - Regulator
 - Charger

It is connected to the host controller over the I2C interface;
the ACT8945A is a child device of the I2C bus.

Signed-off-by: Wenyou Yang 
Reviewed-by: Krzysztof Kozlowski 
---

Changes in v7:
 - move struct act8945a_dev to .c file.
 - remove unused .h file.

Changes in v6:
 - change MFD_ACT8945A type from bool to tristate.
 - revert depends on to 'I2C'.

Changes in v5:
 - change depends on to 'I2C=y'.

Changes in v4:
 - add a space before .compatible.
 - collect Reviewed-by from Krzysztof Kozlowski.

Changes in v3: None
Changes in v2:
 - add more help information in Kconfig.
 - update the file header with short version license and author line.
 - remove unused structure members (dev, i2c_client) of struct act8945a_dev.
 - use define "PLATFORM_DEVID_NONE" for mfd_add_devices(), instead of '-1'.
 - use more explicit info to indicate the failure to add sub devices.
 - remove the unnecessary print out.
 - substitute MODULE_LICENSE("GPL") for MODULE_LICENSE("GPL v2").

 drivers/mfd/Kconfig|   11 +
 drivers/mfd/Makefile   |1 +
 drivers/mfd/act8945a.c |  112 
 3 files changed, 124 insertions(+)
 create mode 100644 drivers/mfd/act8945a.c

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 9ca66de..5038e78 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -18,6 +18,17 @@ config MFD_CS5535
  This is the core driver for CS5535/CS5536 MFD functions.  This is
   necessary for using the board's GPIO and MFGPT functionality.
 
+config MFD_ACT8945A
+   tristate "Active-semi ACT8945A"
+   select MFD_CORE
+   select REGMAP_I2C
+   depends on I2C && OF
+   help
+ Support for the ACT8945A PMIC from Active-semi. This device
+ features three step-down DC/DC converters and four low-dropout
+ linear regulators, along with a complete ActivePath battery
+ charger.
+
 config MFD_AS3711
bool "AMS AS3711"
select MFD_CORE
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 0f230a6..2f1ca82 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -6,6 +6,7 @@
 obj-$(CONFIG_MFD_88PM860X) += 88pm860x.o
 obj-$(CONFIG_MFD_88PM800)  += 88pm800.o 88pm80x.o
 obj-$(CONFIG_MFD_88PM805)  += 88pm805.o 88pm80x.o
+obj-$(CONFIG_MFD_ACT8945A) += act8945a.o
 obj-$(CONFIG_MFD_SM501)+= sm501.o
 obj-$(CONFIG_MFD_ASIC3)+= asic3.o tmio_core.o
 obj-$(CONFIG_MFD_BCM590XX) += bcm590xx.o
diff --git a/drivers/mfd/act8945a.c b/drivers/mfd/act8945a.c
new file mode 100644
index 000..d2e01941
--- /dev/null
+++ b/drivers/mfd/act8945a.c
@@ -0,0 +1,112 @@
+/*
+ * MFD driver for Active-semi ACT8945a PMIC
+ *
+ * Copyright (C) 2015 Atmel Corporation.
+ *
+ * Author: Wenyou Yang 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under  the terms of the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct act8945a_dev {
+   struct regmap *regmap;
+};
+
+static const struct mfd_cell act8945a_devs[] = {
+   {
+   .name = "act8945a-pmic",
+   .of_compatible = "active-semi,act8945a-regulator",
+   },
+   {
+   .name = "act8945a-charger",
+   .of_compatible = "active-semi,act8945a-charger",
+   },
+};
+
+static const struct regmap_config act8945a_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+};
+
+static int act8945a_i2c_probe(struct i2c_client *i2c,
+ const struct i2c_device_id *id)
+{
+   struct act8945a_dev *act8945a;
+   int ret;
+
+   act8945a = devm_kzalloc(&i2c->dev, sizeof(*act8945a), GFP_KERNEL);
+   if (!act8945a)
+   return -ENOMEM;
+
+   i2c_set_clientdata(i2c, act8945a);
+
+   act8945a->regmap = devm_regmap_init_i2c(i2c, &act8945a_regmap_config);
+   if (IS_ERR(act8945a->regmap)) {
+   ret = PTR_ERR(act8945a->regmap);
+   dev_err(&i2c->dev, "regmap init failed: %d\n", ret);
+   return ret;
+   }
+
+   ret = mfd_add_devices(&i2c->dev, PLATFORM_DEVID_NONE, act8945a_devs,
+ ARRAY_SIZE(act8945a_devs), NULL, 0, NULL);
+   if (ret) {
+   dev_err(&i2c->dev, "Failed to add sub devices\n");
+   return ret;
+   }
+
+   return 0;
+}
+
+static int act8945a_i2c_remove(struct i2c_client *i2c)
+{
+   mfd_remove_devices(&i2c->dev);
+
+   return 0;
+}
+
+static const struct i2c_device_id act8945a_i2c_id[] = {
+   { "act8945a", 0 },
+   {}
+};
+MODULE_DEVICE_TABLE(i2c, act8945a_i2c_id);
+
+static const struct of_device_id act8945a_of_match[] = {
+   { .compatible 

[PATCH v7 0/2] mfd: act8945a: add Active-semi ACT8945A PMIC MFD driver

2016-01-21 Thread Wenyou Yang
This patch set adds support for the Active-semi ACT8945A PMIC
MFD driver. It is a Multi Function Device with the following
subdevices:
 - Regulator
 - Charger

It is connected to the host controller over the I2C interface;
the ACT8945A is a child device of the I2C bus.

Changes in v7:
 - move struct act8945a_dev to .c file.
 - remove unused .h file.

Changes in v6:
 - change MFD_ACT8945A type from bool to tristate.
 - revert depends on to 'I2C'.
 - change the regulator name.

Changes in v5:
 - change depends on to 'I2C=y'.

Changes in v4:
 - add a space before .compatible.
 - collect Reviewed-by from Krzysztof Kozlowski.

Changes in v3:
 - fix the tabbing errors in Example.
 - use "pmic@5b" label, not "act8945a@5b" in Example.
 - collect Acked-by from Rob.

Changes in v2:
 - add more help information in Kconfig.
 - update the file header with short version license and author line.
 - remove unused structure members (dev, i2c_client) of struct act8945a_dev.
 - use define "PLATFORM_DEVID_NONE" for mfd_add_devices(), instead of '-1'.
 - use more explicit info to indicate the failure to add sub devices.
 - remove the unnecessary print out.
 - substitute MODULE_LICENSE("GPL") for MODULE_LICENSE("GPL v2").
 - use more specific label in Example.
 - add pmic and charger nodes in Example.

Wenyou Yang (2):
  mfd: act8945a: add Active-semi ACT8945A PMIC MFD driver
  mfd: add documentation for ACT8945A DT bindings

 Documentation/devicetree/bindings/mfd/act8945a.txt |   82 ++
 drivers/mfd/Kconfig|   11 ++
 drivers/mfd/Makefile   |1 +
 drivers/mfd/act8945a.c |  112 
 4 files changed, 206 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/act8945a.txt
 create mode 100644 drivers/mfd/act8945a.c

-- 
1.7.9.5



[RFC PATCH 0/1] Adding previous syscall context to seccomp

2016-01-21 Thread Daniel Sangorrin
Hi,

During my presentation last year at Linuxcon Japan [1], I released
a proof-of-concept patch [2] for the seccomp subsystem. The main
purpose of that patch was to let applications restrict the order
in which their system calls are requested. In more technical terms,
it implemented a host-based anomaly intrusion detection system (HIDS)
that uses call sequence monitoring to detect unusual patterns, for
example when the execution flow unexpectedly diverts towards the
'mprotect' syscall, perhaps after a stack overflow.

The main target for the patch was embedded real-time systems
where applications have a high degree of determinism. For that
reason, my original proof-of-concept patch was using bitmaps,
which allow for a constant O(1) overhead (correct me if
I'm wrong but I think the current seccomp-filter implementation
introduces an O(n) overhead proportional to the number of system
calls that one wants to allow or prohibit).
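
To make the O(1) claim concrete, here is a minimal userspace sketch of a
bitmap-based allow list: one bit per system call number, so a lookup is a
single index and mask regardless of how many calls are allowed. All names
(struct demo_policy, demo_allow(), demo_is_allowed()) and the limit of 512
syscall numbers are assumptions for illustration; this is not the code from
the original proof-of-concept patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 512	/* assumption: upper bound on syscall numbers */
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_NR_SYSCALLS + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* One bit per syscall number: set means "allowed". */
struct demo_policy {
	unsigned long allowed[DEMO_WORDS];
};

/* Mark syscall nr as allowed. */
void demo_allow(struct demo_policy *p, unsigned int nr)
{
	assert(nr < DEMO_NR_SYSCALLS);
	p->allowed[nr / DEMO_BITS_PER_WORD] |= 1UL << (nr % DEMO_BITS_PER_WORD);
}

/* Constant-time check; out-of-range numbers are always rejected. */
bool demo_is_allowed(const struct demo_policy *p, unsigned int nr)
{
	if (nr >= DEMO_NR_SYSCALLS)
		return false;
	return (p->allowed[nr / DEMO_BITS_PER_WORD] >>
		(nr % DEMO_BITS_PER_WORD)) & 1UL;
}
```

Restricting call order with this scheme would need one such bitmap per
"previous" syscall, which trades memory for the constant lookup time.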

However, I realized that it would be too hard to merge with the
current code. I have therefore adapted my original patch so that it
now allows BPF filters to retrieve information regarding the previous
system call requested by the application.

The patch can be tested on linux-master as follows (tested
on Debian Jessie x86_64):

  $ sudo vi /usr/include/linux/seccomp.h
   ...
   struct seccomp_data {
int nr;
int prev_nr; <-- add this entry
  ...
  $ cd samples/seccomp/
  $ make bpf-prev
  $ ./bpf-prev
parent msgsnd: hello
parent msgrcv after prctl: hello (128 bytes)
parent msgsnd: world
parent msgrcv after msgsnd: world (128 bytes)
parent msgsnd: this is mars
child msgrcv after clone: this is mars (128 bytes)
parent: child 11409 exited with status 0
Should fail: Bad system call

For simplicity, at the moment the patch only records the last
requested system call. Despite being vulnerable to specially
crafted mimicry attacks, I think it can deter common attacks,
especially during the "initial phase" of the attack (e.g. a
return-oriented jump).

It could also be extended with longer call sequences (NGRAMs),
call stack and call site information, or scratch memory for
restricting a system call to the application's initialization,
for example. However, I'm not sure whether such complexity would
be worth it. I would like to know at this early stage whether any
of you are interested in this type of approach and what you
think about it.

Thanks,
Daniel

[1] Kernel security hacking for the Internet of Things
http://events.linuxfoundation.jp/sites/events/files/slides/linuxcon-2015-daniel-sangorrin-final.pdf
[2] https://github.com/sangorrin/linuxcon-japan-2015/tree/master/hids

Daniel Sangorrin (1):
  seccomp: provide information about the previous syscall

 include/linux/seccomp.h  |   2 +
 include/uapi/linux/seccomp.h |   2 +
 kernel/seccomp.c |  10 +++
 samples/seccomp/.gitignore   |   1 +
 samples/seccomp/Makefile |   9 ++-
 samples/seccomp/bpf-prev.c   | 160 +++
 6 files changed, 183 insertions(+), 1 deletion(-)
 create mode 100644 samples/seccomp/bpf-prev.c

-- 
2.1.4




[RFC PATCH 1/1] seccomp: provide information about the previous syscall

2016-01-21 Thread Daniel Sangorrin
This patch allows applications to restrict the order in which
their system calls may be requested. In order to do that, we
provide seccomp-BPF scripts with information about the
previous system call requested.

An example use case consists of detecting (and stopping) return
oriented attacks that disturb the normal execution flow of
a user program.
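
The bookkeeping can be modelled in userspace roughly as follows: the per-task
state is copied into the data the filter sees, the filter runs, and the
current syscall number then becomes the new "previous" one whatever the
verdict. Everything here (struct demo_task, demo_run_filter(), the DEMO_NR_*
numbers and the example policy) is a hypothetical model for illustration, not
kernel code and not a real BPF program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the filter sees: current and previous syscall number. */
struct demo_data {
	int nr;
	int prev_nr;
};

/* Per-task state carried from one syscall to the next. */
struct demo_task {
	int prev_nr;
};

typedef bool (*demo_filter_fn)(const struct demo_data *sd);

/* Run one filter for syscall nr, then record nr as the previous syscall. */
bool demo_run_filter(struct demo_task *t, int nr, demo_filter_fn filter)
{
	struct demo_data sd = { .nr = nr, .prev_nr = t->prev_nr };
	bool allowed;

	assert(filter != NULL);
	allowed = filter(&sd);
	t->prev_nr = sd.nr;	/* updated even when the call is denied */
	return allowed;
}

/* Hypothetical syscall numbers, chosen only for this example. */
enum { DEMO_NR_OPEN = 1, DEMO_NR_MPROTECT = 2, DEMO_NR_READ = 3 };

/* Example policy: mprotect is only acceptable directly after open. */
bool demo_mprotect_only_after_open(const struct demo_data *sd)
{
	if (sd->nr == DEMO_NR_MPROTECT)
		return sd->prev_nr == DEMO_NR_OPEN;
	return true;
}
```

A diverted control flow that jumps straight to mprotect after, say, a read
would be rejected by such a policy, while the expected open/mprotect pair
passes.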

Signed-off-by: Daniel Sangorrin 
---
 include/linux/seccomp.h  |   2 +
 include/uapi/linux/seccomp.h |   2 +
 kernel/seccomp.c |  10 +++
 samples/seccomp/.gitignore   |   1 +
 samples/seccomp/Makefile |   9 ++-
 samples/seccomp/bpf-prev.c   | 160 +++
 6 files changed, 183 insertions(+), 1 deletion(-)
 create mode 100644 samples/seccomp/bpf-prev.c

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b..8c6de6d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -16,6 +16,7 @@ struct seccomp_filter;
  *
  * @mode:  indicates one of the valid values above for controlled
  * system calls available to a process.
+ * @prev_nr: stores the previous system call number.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *  accessed without locking during system call entry.
  *
@@ -24,6 +25,7 @@ struct seccomp_filter;
  */
 struct seccomp {
int mode;
+   int prev_nr;
struct seccomp_filter *filter;
 };
 
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..42775dc 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -38,6 +38,7 @@
 /**
  * struct seccomp_data - the format the BPF program executes over.
  * @nr: the system call number
+ * @prev_nr: the previous system call number
  * @arch: indicates system call convention as an AUDIT_ARCH_* value
  *as defined in .
  * @instruction_pointer: at the time of the system call.
@@ -46,6 +47,7 @@
  */
 struct seccomp_data {
int nr;
+   int prev_nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 580ac2d..98b2c9d3 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -190,6 +190,8 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
sd = &sd_local;
}
 
+   sd->prev_nr = current->seccomp.prev_nr;
+
/*
 * All filters in the list are evaluated and the lowest BPF return
 * value always takes priority (ignoring the DATA).
@@ -200,6 +202,9 @@ static u32 seccomp_run_filters(struct seccomp_data *sd)
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
+
+   current->seccomp.prev_nr = sd->nr;
+
return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
@@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
return ret;
}
 
+   /* Initialize the prev_nr field only once */
+   if (current->seccomp.filter == NULL)
+   current->seccomp.prev_nr =
+   syscall_get_nr(current, task_pt_regs(current));
+
/*
 * If there is an existing filter, make it the prev and don't drop its
 * task reference.
diff --git a/samples/seccomp/.gitignore b/samples/seccomp/.gitignore
index 78fb781..11dda7a 100644
--- a/samples/seccomp/.gitignore
+++ b/samples/seccomp/.gitignore
@@ -1,3 +1,4 @@
 bpf-direct
 bpf-fancy
 dropper
+bpf-prev
diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
index 1b4e4b8..b50821c 100644
--- a/samples/seccomp/Makefile
+++ b/samples/seccomp/Makefile
@@ -1,7 +1,7 @@
 # kbuild trick to avoid linker error. Can be omitted if a module is built.
 obj- := dummy.o
 
-hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct
+hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct bpf-prev
 
 HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
 HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
@@ -17,6 +17,11 @@ HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
 HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
 bpf-direct-objs := bpf-direct.o
 
+HOSTCFLAGS_bpf-prev.o += -I$(objtree)/usr/include
+HOSTCFLAGS_bpf-prev.o += -idirafter $(objtree)/include
+bpf-prev-objs := bpf-prev.o
+
+
 # Try to match the kernel target.
 ifndef CROSS_COMPILE
 ifndef CONFIG_64BIT
@@ -29,10 +34,12 @@ MFLAG = -m31
 endif
 
 HOSTCFLAGS_bpf-direct.o += $(MFLAG)
+HOSTCFLAGS_bpf-prev.o += $(MFLAG)
 HOSTCFLAGS_dropper.o += $(MFLAG)
 HOSTCFLAGS_bpf-helper.o += $(MFLAG)
 HOSTCFLAGS_bpf-fancy.o += $(MFLAG)
 HOSTLOADLIBES_bpf-direct += $(MFLAG)
+HOSTLOADLIBES_bpf-prev += $(MFLAG)
 HOSTLOADLIBES_bpf-fancy += $(MFLAG)
 HOSTLOADLIBES_dropper += $(MFLAG)
 endif
diff --git a/samples/seccomp/bpf-prev.c b/samples/seccomp/bpf-prev.c
new file mode 100644
index 000..138c584
--- /dev/null
+++ b/samples/seccomp/bpf-prev.c
@@ -0,0 +1,160 @@
+/*
+ * Seccomp BPF example that uses 

Re: [PATCH v3] kallsyms: add support for relative offsets in kallsyms address table

2016-01-21 Thread Ard Biesheuvel
On 22 January 2016 at 04:44, Michael Ellerman  wrote:
> On Thu, 2016-01-21 at 14:55 -0800, Kees Cook wrote:
>> On Thu, Jan 21, 2016 at 2:50 PM, Andrew Morton
>>  wrote:
>> > On Thu, 21 Jan 2016 18:19:43 +0100 Ard Biesheuvel 
>> >  wrote:
>> >
>> > > Similar to how relative extables are implemented, it is possible to emit
>> > > the kallsyms table in such a way that it contains offsets relative to 
>> > > some
>> > > anchor point in the kernel image rather than absolute addresses. The 
>> > > benefit
>> > > is that such table entries are no longer subject to dynamic relocation 
>> > > when
>> > > the build time and runtime offsets of the kernel image are different. 
>> > > Also,
>> > > on 64-bit architectures, it essentially cuts the size of the address 
>> > > table
>> > > in half since offsets can typically be expressed in 32 bits.
>> > >
>> > > Since it is useful for some architectures (like x86) to retain the 
>> > > ability
>> > > to emit absolute values as well, this patch adds support for both, by
>> > > emitting absolute addresses as positive 32-bit values, and addresses
>> > > relative to the lowest encountered relative symbol as negative values, 
>> > > which
>> > > are subtracted from the runtime address of this base symbol to produce 
>> > > the
>> > > actual address.
>> > >
>> > > Support for the above is enabled by default for all architectures except
>> > > IA-64, whose symbols are too far apart to capture in this manner.
>> >
>> > I'm not really understanding the benefits of this.  A smaller address
>> > table is nice, but why is it desirable that "such table entries are no
>> > longer subject to dynamic relocation when the build time and runtime
>> > offsets of the kernel image are different"?
>>
>> IIUC, this means that the relocation work done after decompression now
>> doesn't have to do relocation updates for all these values, which
>> means a smaller relocation table as well.
>
> Yep. If I remember the figures rightly it saves ~250K of relocations for the
> powerpc build.
>

For ppc64_defconfig (which has CONFIG_RELOCATABLE=y, i.e., it has a
dynamic relocation section containing a 24-byte RELA entry per
relocated quantity), I got the following numbers

101740 kallsyms entries
397 KB saved in permanent .rodata
2.4 MB saved in __init rela.dyn section
~500 KB saved in compressed image

For arm64, we don't have a compressed image, which is the reason I
need this for my arm64 implementation of CONFIG_RELOCATABLE (for
KASLR), since the RELA overhead goes straight into the distributed
image.

Thanks,
Ard.
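
For illustration, the encoding described in the quoted commit message can be
sketched as below. This is a user-space model under stated assumptions, not
the patch's code: decode_sym_address() and relative_base are hypothetical
names, and any extra bias the real patch applies to negative entries is not
visible in this thread. The two size helpers reproduce the arithmetic behind
the ppc64_defconfig figures, assuming each 8-byte absolute entry shrinks to a
4-byte offset and each entry previously needed one 24-byte RELA record.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoder for one 32-bit table entry:
 *   entry >= 0 : the entry is an absolute address
 *   entry <  0 : the entry is subtracted from the runtime address of the
 *                base symbol, so the result lies above the base
 */
static uint64_t decode_sym_address(int32_t entry, uint64_t relative_base)
{
	if (entry >= 0)
		return (uint64_t)entry;

	/* Subtracting a negative value adds its magnitude to the base. */
	return relative_base + (uint64_t)(-(int64_t)entry);
}

/* Bytes saved in .rodata when an 8-byte address becomes a 4-byte offset. */
static size_t rodata_bytes_saved(size_t nr_syms)
{
	return nr_syms * (sizeof(uint64_t) - sizeof(int32_t));
}

/* Bytes saved in the RELA section at 24 bytes per relocated entry. */
static size_t rela_bytes_saved(size_t nr_syms)
{
	return nr_syms * 24;
}
```

With 101740 entries the helpers give 406960 bytes (about 397 KB) and 2441760
bytes (about 2.4 MB), which is consistent with the numbers reported above.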


RE: [PATCH v6 1/2] power: act8945a: add charger driver for ACT8945A

2016-01-21 Thread Yang, Wenyou
Hi Peter,

Thank you for so much advice.


> -Original Message-
> From: Peter Korsgaard [mailto:jac...@gmail.com] On Behalf Of Peter Korsgaard
> Sent: January 21, 2016 4:25
> To: Yang, Wenyou 
> Cc: Sebastian Reichel ; Dmitry Eremin-Solenikov
> ; David Woodhouse ; Rob
> Herring ; Pawel Moll ; Mark
> Rutland ; Ian Campbell ;
> Kumar Gala ; Krzysztof Kozlowski
> ; Javier Martinez Canillas ;
> Lee Jones ; Peter Korsgaard ; Ferre,
> Nicolas ; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux...@vger.kernel.org
> Subject: Re: [PATCH v6 1/2] power: act8945a: add charger driver for ACT8945A
> 
> > "Wenyou" == Wenyou Yang  writes:
> 
>  > This patch adds new driver for Active-semi ACT8945A ActivePath
>  > charger (part of ACT8945A MFD driver) providing power supply class
>  > information to userspace.
>
>  > The driver can be configured through DT (such as, total timer,
>  > precondition timer and input over-voltage threshold).
>
>  > Signed-off-by: Wenyou Yang
>  > ---
> 
>  > Changes in v6:
>  >  - change the type value to unsigned int.
> 
> [snip]
> 
> > +++ b/drivers/power/act8945a_charger.c
>  > @@ -0,0 +1,375 @@
>  > +/*
>  > + * Power supply driver for the Active-semi ACT8945A PMIC
>  > + *
>  > + * Copyright (C) 2015 Atmel Corporation
>  > + *
>  > + * Author: Wenyou Yang
>  > + *
>  > + * This program is free software; you can redistribute it and/or modify
>  > + * it under the terms of the GNU General Public License version 2 as
>  > + * published by the Free Software Foundation.
>  > + *
>  > + */
>  > +#include 
>  > +#include 
>  > +#include 
>  > +#include 
>  > +#include 
>  > +#include 
>  > +#include 
>  > +
>  > +static const char *act8945a_charger_model = "ACT8945A";
>  > +static const char *act8945a_charger_manufacturer = "Active-semi";
>  > +
>  > +/**
>  > + * ACT8945A Charger Register Map
>  > + */
>  > +
>  > +/* 0x70: Reserved */
>  > +#define ACT8945A_APCH_CFG 0x71
>  > +#define ACT8945A_APCH_STATUS  0x78
>  > +#define ACT8945A_APCH_CTRL    0x79
>  > +#define ACT8945A_APCH_STATE   0x7A
>  > +
>  > +/* ACT8945A_APCH_CFG */
>  > +#define APCH_CFG_OVPSET   (0x3 << 0)
>  > +#define APCH_CFG_OVPSET_6V6   (0x0 << 0)
>  > +#define APCH_CFG_OVPSET_7V    (0x1 << 0)
>  > +#define APCH_CFG_OVPSET_7V5   (0x2 << 0)
>  > +#define APCH_CFG_OVPSET_8V    (0x3 << 0)
>  > +#define APCH_CFG_PRETIMO  (0x3 << 2)
>  > +#define APCH_CFG_PRETIMO_40_MIN   (0x0 << 2)
>  > +#define APCH_CFG_PRETIMO_60_MIN   (0x1 << 2)
>  > +#define APCH_CFG_PRETIMO_80_MIN   (0x2 << 2)
>  > +#define APCH_CFG_PRETIMO_DISABLED (0x3 << 2)
>  > +#define APCH_CFG_TOTTIMO  (0x3 << 4)
>  > +#define APCH_CFG_TOTTIMO_3_HOUR   (0x0 << 4)
>  > +#define APCH_CFG_TOTTIMO_4_HOUR   (0x1 << 4)
>  > +#define APCH_CFG_TOTTIMO_5_HOUR   (0x2 << 4)
>  > +#define APCH_CFG_TOTTIMO_DISABLED (0x3 << 4)
>  > +#define APCH_CFG_SUSCHG   (0x1 << 7)
>  > +
>  > +#define APCH_STATUS_CHGDAT    BIT(0)
>  > +#define APCH_STATUS_INDAT BIT(1)
>  > +#define APCH_STATUS_TEMPDAT   BIT(2)
>  > +#define APCH_STATUS_TIMRDAT   BIT(3)
>  > +#define APCH_STATUS_CHGSTAT   BIT(4)
>  > +#define APCH_STATUS_INSTAT    BIT(5)
>  > +#define APCH_STATUS_TEMPSTAT  BIT(6)
>  > +#define APCH_STATUS_TIMRSTAT  BIT(7)
>  > +
>  > +#define APCH_CTRL_CHGEOCOUT   BIT(0)
>  > +#define APCH_CTRL_INDIS   BIT(1)
>  > +#define APCH_CTRL_TEMPOUT BIT(2)
>  > +#define APCH_CTRL_TIMRPRE BIT(3)
>  > +#define APCH_CTRL_CHGEOCIN    BIT(4)
>  > +#define APCH_CTRL_INCON   BIT(5)
>  > +#define APCH_CTRL_TEMPIN  BIT(6)
>  > +#define APCH_CTRL_TIMRTOT BIT(7)
>  > +
>  > +#define APCH_STATE_ACINSTAT   (0x1 << 1)
>  > +#define APCH_STATE_CSTATE (0x3 << 4)
>  > +#define APCH_STATE_CSTATE_SHIFT   4
>  > +#define APCH_STATE_CSTATE_DISABLED    0x00
>  > +#define APCH_STATE_CSTATE_EOC 0x01
>  > +#define APCH_STATE_CSTATE_FAST    0x02
>  > +#define APCH_STATE_CSTATE_PRE 0x03
>  > +
>  > +struct act8945a_charger {
>  > +  struct act8945a_dev *act8945a_dev;
> 
> I still don't see the point in this struct act8945a_dev instead of just
> having a pointer to a regmap here (more about that later).
> 
> > +   struct power_supply *psy;
> 
> You use devm and only use this in the probe, so not needed.
> 
> > +
>  > +  u32 tolal_time_out;
> 
> Typo, should be total_time_out
> 
> > +   u32 pre_time_out;
>  > +  u32 input_voltage_threshold;
> 
> These 3 parameters are only used to keep track of data between
> _parse_dt() and _charger_config() which you call right after each other. If you
> merge those two functions then these can be dropped.

Yes, you are right,  it is clearer.

> 
> 

[PATCH kernel] vfio: Only check for bus IOMMU when NOIOMMU is selected

2016-01-21 Thread Alexey Kardashevskiy
Recent change 03a76b60 "vfio: Include No-IOMMU mode" disabled VFIO
on systems which do not implement iommu_ops for the PCI bus even though
there is a VFIO IOMMU driver for it, such as the SPAPR TCE driver for
the PPC64/powernv platform.

This moves iommu_present() call under #ifdef CONFIG_VFIO_NOIOMMU as
it is done in the rest of the file to re-enable VFIO on powernv.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82f25cc..3f8060e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -343,7 +343,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
atomic_set(&group->opened, 0);
group->iommu_group = iommu_group;
group->noiommu = !iommu_present;
-
group->nb.notifier_call = vfio_iommu_group_notifier;
 
/*
@@ -767,7 +766,11 @@ int vfio_add_group_dev(struct device *dev,
 
group = vfio_group_get_from_iommu(iommu_group);
if (!group) {
+#ifdef CONFIG_VFIO_NOIOMMU
group = vfio_create_group(iommu_group, iommu_present(dev->bus));
+#else
+   group = vfio_create_group(iommu_group, true);
+#endif
if (IS_ERR(group)) {
iommu_group_put(iommu_group);
return PTR_ERR(group);
-- 
2.5.0.rc3



Re: [PATCH v5 5/5] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-01-21 Thread Gautham R Shenoy
On Thu, Jan 21, 2016 at 03:08:59PM +0530, Shilpasri G Bhat wrote:
> Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 

--
Thanks and Regards
gautham.



Re: [PATCH perf 3/4] perf tools: Fix unused variables: x86_{32,64}_regoffset_table

2016-01-21 Thread Wangnan (F)



On 2016/1/22 13:56, 平松雅巳 / HIRAMATU,MASAMI wrote:

From: Wangnan (F) [mailto:wangn...@huawei.com]

On 2016/1/20 21:59, Arnaldo Carvalho de Melo wrote:

Em Tue, Jan 19, 2016 at 09:33:06PM +, Ben Hutchings escreveu:

gcc 5 doesn't seem to care about these, but gcc 6 does and that
results in a build failure.

Ben, please CC the people on the CC list for the patch that introduces
the problem, Wang, He, can I have your Acked-by?

- Arnaldo


This patch led me to find a bug in the original code.

If both perf and the target ELF binary are x86_64, the following command works okay:

  # perf probe -v -n --exec /tmp/oxygen_root/lib64/libc.so.6 pselect
data exceptfds readfds writefds nfds sigmask tval timeout
  
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
  Writing event: p:probe_libc/pselect
/home/w00229757/oxygen_root-w00229757/lib64/libc-2.18.so:0xdfef0
data=-216(%sp):u64 exceptfds=%cx:u64 readfds=%si:u64 writefds=%dx:u64
nfds=%di:s32 sigmask=%r9:u64 tval=-232(%sp):u64 timeout=%r8:u64
  

But if the library is x86_32, the result is incorrect:

   # perf probe -v -n --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
data exceptfds readfds writefds nfds sigmask tval
   
   Writing event: p:probe_libc/pselect
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330 data=-172(%si):u64
exceptfds=+16(%si):u32 readfds=+8(%si):u32 writefds=+12(%si):u32
nfds=+4(%si):s32 sigmask=+24(%si):u32 tval=-180(%si):u64
timeout=+20(%si):u32
   

We know that (%si) is used for passing arguments. Here we should see
'%sp' or '$stack'.

Using an x86_32 perf we get the correct result:

  # ~/perf probe -v -n --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
data exceptfds readfds writefds nfds sigmask tval
  
  Writing event: p:probe_libc/pselect
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330
data=-172($stack):u64 exceptfds=+16($stack):u32 readfds=+8($stack):u32
writefds=+12($stack):u32 nfds=+4($stack):s32 sigmask=+24($stack):u32
tval=-180($stack):u64
  

Ah, I see. Uprobes may not check whether the target binary is in 32-bit mode.
Since the stack location in pt_regs differs between x86-64 and x86-32
(regs->sp points to the stack on x86-64, &(regs->pt) points to the stack on
x86-32), uprobes had better check this and change its behavior.

But anyway, it is also fixed by changing perf's register table.



Use a small test program to check the result:

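  /* header names were lost in the archive; these four cover the calls below */
  #include <signal.h>
  #include <string.h>
  #include <sys/select.h>
  #include <time.h>

  static struct {
     fd_set r, w, e;
     struct timespec ts;
     sigset_t m;
  } s;

  int main()
  {
     memset(&s, '\0', sizeof(s));

     /* nfds, readfds, writefds, exceptfds, timeout, sigmask */
     pselect(0, &s.r, &s.w, &s.e, &s.ts, &s.m);
     return 0;
  }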

# gcc -m32 -g ./test_pselect.c

Use x86_32 perf:

# ./perf probe -v  --exec /tmp/oxygen_root/lib32/libc.so.6 pselect data
exceptfds readfds writefds nfds sigmask tval
Writing event: p:probe_libc/pselect
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330
data=-172($stack):u64 exceptfds=+16($stack):u32 readfds=+8($stack):u32
writefds=+12($stack):u32 nfds=+4($stack):s32 sigmask=+24($stack):u32
tval=-180($stack):u64
Added new event:
   probe_libc:pselect   (on pselect in
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so with data exceptfds
readfds writefds nfds sigmask tval)

You can now use it in all perf tools, such as:

 perf record -e probe_libc:pselect -aR sleep 1

# ./perf record -e probe_libc:pselect ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
# ./perf script
a.out 25336 [006] 64588.457597: probe_libc:pselect:
(f7663330) data=0xf772e000 exceptfds=0x8049880 readfds=0x8049780
writefds=0x8049800 nfds=0 sigmask=0x8049908 tval=0x0

Switch to x86_64 perf:

  # ./perf probe -v  --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
data exceptfds readfds writefds nfds sigmask tval
  
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
Writing event: p:probe_libc/pselect
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330 data=-172(%si):u64
exceptfds=+16(%si):u32 readfds=+8(%si):u32 writefds=+12(%si):u32
nfds=+4(%si):s32 sigmask=+24(%si):u32 tval=-180(%si):u64
Added new event:
   probe_libc:pselect   (on pselect in
/tmp/oxygen_root-w00229757/lib32/libc-2.18.so with data exceptfds
readfds writefds nfds sigmask tval)

You can now use it in all perf tools, such as:

 perf record -e probe_libc:pselect -aR sleep 1

# ./perf record -e probe_libc:pselect ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
# ./perf script
a.out 25599 [002] 64759.743554: probe_libc:pselect:
(f76e7330) data=0x0 exceptfds=0x0 readfds=0x0 writefds=0x0 nfds=0
sigmask=0x0 tval=0x0

Sad...

I think this problem is not introduced by my patch. In fact
there's a fundamental problem in get_arch_regstr() that it is
impossible to switch sub ISA.

Right, but I guess this can be fixed by switching between %sp (for x86-64)
and +0(%sp) (for x86-32) instead of $stack.
  


It may not work.

No matter how we change regoffset_table, when 

Re: [PATCH, REGRESSION v3] mm: make apply_to_page_range more robust

2016-01-21 Thread Mika Penttilä
On 01/22/2016 01:12 AM, David Rientjes wrote:
> On Thu, 21 Jan 2016, Mika Penttilä wrote:
> 
>> Recent changes (4.4.0+) in module loader triggered oops on ARM : 
>>
>> The module in question is in-tree module :
>> drivers/misc/ti-st/st_drv.ko
>>
>> The BUG is here :
>>
>> [ 53.638335] [ cut here ]
>> [ 53.642967] kernel BUG at mm/memory.c:1878!
>> [ 53.647153] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> [ 53.652987] Modules linked in:
>> [ 53.656061] CPU: 0 PID: 483 Comm: insmod Not tainted 4.4.0 #3
>> [ 53.661808] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> [ 53.668338] task: a989d400 ti: 9e6a2000 task.ti: 9e6a2000
>> [ 53.673751] PC is at apply_to_page_range+0x204/0x224
>> [ 53.678723] LR is at change_memory_common+0x90/0xdc
>> [ 53.683604] pc : [<800ca0ec>] lr : [<8001d668>] psr: 600b0013
>> [ 53.683604] sp : 9e6a3e38 ip : 8001d6b4 fp : 7f0042fc
>> [ 53.695082] r10:  r9 : 9e6a3e90 r8 : 0080
>> [ 53.700309] r7 :  r6 : 7f008000 r5 : 7f008000 r4 : 7f008000
>> [ 53.706837] r3 : 8001d5a4 r2 : 7f008000 r1 : 7f008000 r0 : 80b8d3c0
>> [ 53.713368] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>> [ 53.720504] Control: 10c5387d Table: 2e6b804a DAC: 0055
>> [ 53.726252] Process insmod (pid: 483, stack limit = 0x9e6a2210)
>> [ 53.732173] Stack: (0x9e6a3e38 to 0x9e6a4000)
>> [ 53.736532] 3e20: 7f007fff 7f008000
>> [ 53.744714] 3e40: 80b8d3c0 80b8d3c0  7f007000 7f00426c 7f008000 
>>  7f008000
>> [ 53.752895] 3e60: 7f004140 7f008000  0080   
>> 7f0042fc 8001d668
>> [ 53.761076] 3e80: 9e6a3e90  8001d6b4 7f00426c 0080  
>> 9e6a3f58 7f004140
>> [ 53.769257] 3ea0: 7f004240 7f00414c  8008bbe0  7f00 
>>  
>> [ 53.777438] 3ec0: a8b12f00 0001cfd4 7f004250 7f004240 80b8159c  
>> 00e0 7f0042fc
>> [ 53.785619] 3ee0: c183d000 74f8 18fd  0b3c  
>>  7f002024
>> [ 53.793800] 3f00: 0002      
>>  
>> [ 53.801980] 3f20:     0040  
>> 0003 0001cfd4
>> [ 53.810161] 3f40: 017b 8000f7e4 9e6a2000  0002 8008c498 
>> c183d000 74f8
>> [ 53.818342] 3f60: c1841588 c1841409 c1842950 5000 52a0  
>>  
>> [ 53.826523] 3f80: 0023 0024 001a 001e 0016  
>>  
>> [ 53.834703] 3fa0: 003e3d60 8000f640   0003 0001cfd4 
>>  003e3d60
>> [ 53.842884] 3fc0:   003e3d60 017b 003e3d20 7eabc9d4 
>> 76f2c000 0002
>> [ 53.851065] 3fe0: 7eabc990 7eabc980 00016320 76e81d00 600b0010 0003 
>>  
>> [ 53.859256] [<800ca0ec>] (apply_to_page_range) from [<8001d668>] 
>> (change_memory_common+0x90/0xdc)
>> [ 53.868139] [<8001d668>] (change_memory_common) from [<8008bbe0>] 
>> (load_module+0x194c/0x2068)
>> [ 53.876671] [<8008bbe0>] (load_module) from [<8008c498>] 
>> (SyS_finit_module+0x64/0x74)
>> [ 53.884512] [<8008c498>] (SyS_finit_module) from [<8000f640>] 
>> (ret_fast_syscall+0x0/0x34)
>> [ 53.892694] Code: e0834104 eabc e51a1008 eaac (e7f001f2)
>> [ 53.898792] ---[ end trace fe43fc78ebde29a3 ]---
>>
> 
> NACK to your patch as it is just covering up buggy code silently.  The 
> problem needs to be addressed in change_memory_common() to return if there 
> is no size to change (numpages == 0).  It's a two line fix to that 
> function.
> 

That surely would make this particular problem disappear on ARM. But we would
probably get similar behavior on other arches too (arm64 at least).

Also, you are suggesting it is OK to call set_memory_xx() with numpages==0, but
a bug to call apply_to_page_range() with size==0?
I think these are similar APIs with a size type of argument. Functions taking a
range [start, end) are a different story, and it should be illegal to call them
with start==end.

Also, taking a quick look at all call sites of apply_to_page_range, not all are
checking for !size (some Xen code for instance) and could trigger a kernel BUG
(potentially triggerable from user code). So something that was meant to help
find buggy code could be turned into an easy way to DoS.

Thanks,
--Mika
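
To make the two positions concrete, here is a small user-space model (not
kernel code). model_apply_to_page_range() and model_change_memory() are
hypothetical stand-ins: the first rejects an empty range, returning an error
where the kernel function hits the BUG shown in the oops, and the second has
the early return David suggests for numpages == 0.

```c
#define MODEL_PAGE_SIZE 4096UL

/* Counts how many pages the model walked over. */
static unsigned long pages_touched;

/* Stand-in for apply_to_page_range(): an empty range is an error. */
static int model_apply_to_page_range(unsigned long addr, unsigned long size)
{
	unsigned long end = addr + size;

	if (addr >= end)
		return -1;	/* the kernel function BUGs at this point */

	for (; addr < end; addr += MODEL_PAGE_SIZE)
		pages_touched++;
	return 0;
}

/* Stand-in for change_memory_common() with the suggested early return. */
static int model_change_memory(unsigned long addr, int numpages)
{
	if (!numpages)
		return 0;	/* nothing to change, skip the range walk */

	return model_apply_to_page_range(addr, numpages * MODEL_PAGE_SIZE);
}
```

In this model a caller passing numpages == 0 is harmless, while any other
caller that reaches the range walk directly with size == 0 still fails, which
is the asymmetry discussed above.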






Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Waiman Long wrote:


On 01/21/2016 04:29 AM, Ding Tianhong wrote:



I got the vmcore and found that ifconfig had already been in the wait_list of
the rtnl_lock for 120 seconds, while my process could still take and release
the rtnl_lock normally several times a second. That means my process kept
jumping the queue and ifconfig could never get the rtnl lock. I checked the
mutex slow path and found that a task may spin on the owner regardless of
whether the wait list is empty, so tasks on the wait list can always be cut in
front of. So add a test for the wait list in mutex_can_spin_on_owner() to avoid
this problem.


So this has been somewhat always known, at least in theory, until now. It's the
cost of spinning without going through the wait-queue, unlike other locks.


[...]



From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a
woken task in the wait list to enter the optimistic spinning loop itself
with precedence over the ones in the OSQ. This should prevent the
live-lock condition from happening.


And one of the reasons why we never bothered 'fixing' things was the additional
branching out in the slowpath (and lack of a real issue, although this one is so
damn pathological). I fear that your approach is one of those scenarios where
the code ends up being bloated, albeit most of it is actually duplicated and can
be refactored *sigh*. So now we'd spin, then sleep, then try spinning, then
sleep again... phew. Not to mention the performance implications, i.e. losing
the benefits of osq over waiter spinning in scenarios that would otherwise have
more osq spinners as opposed to waiter spinners, or in setups where it is
actually best to block instead of spinning.

Thanks,
Davidlohr
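
For reference, the check Ding originally proposed (refusing to spin when the
wait list is not empty) can be modelled roughly as follows. This is a
simplified user-space sketch, not the kernel's mutex code: struct model_mutex
and model_can_spin_on_owner() are hypothetical names, and the real decision
involves more state than these two fields.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a mutex for illustration only. */
struct model_mutex {
	bool owner_on_cpu;		/* lock holder is currently running */
	unsigned int nr_waiters;	/* tasks sleeping on the wait list */
};

/* Spin only if the owner is running and nobody is already queued. */
static bool model_can_spin_on_owner(const struct model_mutex *lock)
{
	if (lock->nr_waiters)
		return false;	/* do not cut in front of sleeping waiters */

	return lock->owner_on_cpu;
}
```

The trade-off raised above still applies to this model: every new locker
blocks as soon as one waiter is queued, which gives up the benefit of
optimistic spinning in exactly the contended cases.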


[GIT PULL] ext4 changes for 4.5

2016-01-21 Thread Theodore Ts'o
(I thought I had sent this earlier, but apparently the e-mail never
left my machine.  Apologies if this is a duplicate, but I'm pretty
sure it was never sent on my end.)


The following changes since commit f41683a204ea61568f0fd0804d47c19561f2ee39:

  Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (2015-12-07 10:25:00 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus

for you to fetch changes up to 68ce7bfcd995a8a393b1b14fa67dbc16fa3dc784:

  fs: clean up the flags definition in uapi/linux/fs.h (2016-01-08 16:01:25 -0500)


Some locking and page fault bug fixes from Jan Kara, some ext4
encryption fixes from me, and Li Xi's Project Quota commits.


Jan Kara (9):
  ext4: fix races between page faults and hole punching
  ext4: move unlocked dio protection from ext4_alloc_file_blocks()
  ext4: fix races between buffered IO and collapse / insert range
  ext4: fix races of writeback with punch hole and zero range
  ext4: document lock ordering
  ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
  ext4: provide ext4_issue_zeroout()
  ext4: implement allocation of pre-zeroed blocks
  ext4: use pre-zeroed blocks for DAX page faults

Li Xi (3):
  ext4: adds project ID support
  ext4: add project quota support
  ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support

Theodore Ts'o (3):
  ext4 crypto: add missing locking for keyring_key access
  ext4 crypto: simplify interfaces to directory entry insert functions
  fs: clean up the flags definition in uapi/linux/fs.h

 fs/ext4/crypto.c            |   6 +-
 fs/ext4/crypto_key.c        |   4 +
 fs/ext4/ext4.h              |  99 ---
 fs/ext4/extents.c           | 153 +++
 fs/ext4/file.c              |  82 +--
 fs/ext4/ialloc.c            |   7 ++
 fs/ext4/inline.c            |  10 +--
 fs/ext4/inode.c             | 268
 fs/ext4/ioctl.c             | 376 +
 fs/ext4/namei.c             |  34 +---
 fs/ext4/super.c             |  97 --
 fs/ext4/truncate.h          |   2 +
 include/trace/events/ext4.h |   2 +-
 include/uapi/linux/fs.h     |  31 ++-
 14 files changed, 895 insertions(+), 276 deletions(-)


Re: [PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2016-01-21 Thread Ravi Bangoria

Hi Arnaldo,

On Wednesday 13 January 2016 10:29 PM, Arnaldo Carvalho de Melo wrote:

Em Tue, Dec 29, 2015 at 03:38:40PM +0530, Ravi Bangoria escreveu:

'perf kvm {record|report}' is used to record and report the profiled
performance of any workload on a guest. From the host, we can collect
guest kernel statistics which is useful in finding out any contentions
in guest kernel symbols for a certain workload.
This feature is not available on powerpc because 'perf' relies on the
'cycles' event (a PMU event) to profile the guest. However, for powerpc,
this can't be used from the host because the PMUs are controlled by the
guest rather than the host.

Without getting into whether the approach is the right one, which I
leave to PowerPC experts, Ingo, PeterZ, etc:

So, in these cases, please break this into a series, where you, for
instance, will add that extra evsel parameter to the functions that will
ultimately use it to extract those event fields, that should be a
separate patch, so that when reviewing the "meat" of your patch we can
quickly see what it does, not having to extract that from leg work.

Two other patches should introduce arch__get_{ip,cpumode}().

- Arnaldo


Thanks for the suggestion. I've sent v2 with the changes you suggested.

Can you please take a look?

Regards,
Ravi



Re: [PATCHv8 1/4] EDAC, altera: Add Altera L2 Cache and OCRAM EDAC Support

2016-01-21 Thread Vladimir Zapolskiy
Hi Thor,

On 21.01.2016 19:34, ttha...@opensource.altera.com wrote:
> From: Thor Thayer 
> 
> Adding L2 Cache and On-Chip RAM EDAC support for the
> Altera SoCs using the EDAC device  model. The SDRAM
> controller is using the Memory Controller model.
> 
> Each type of ECC is individually configurable.
> 
> Signed-off-by: Thor Thayer 
> Signed-off-by: Dinh Nguyen 

You are sending a change authored by yourself for review, but you add Dinh's
SoB, what's his role here?

See Documentation/SubmittingPatches "Sign your work".

[snip]

> +/*
> + * altr_edac_device_probe()
> + *   This is a generic EDAC device driver that will support
> + *   various Altera memory devices such as the L2 cache ECC and
> + *   OCRAM ECC as well as the memories for other peripherals.
> + *   Module specific initialization is done by passing the
> + *   function index in the device tree.
> + */
> +static int altr_edac_device_probe(struct platform_device *pdev)
> +{
> + struct edac_device_ctl_info *dci;
> + struct altr_edac_device_dev *drvdata;
> + struct resource *r;
> + int res = 0;
> + struct device_node *np = pdev->dev.of_node;
> + char *ecc_name = (char *)np->name;
> + static int dev_instance;
> + struct dentry *debugfs;
> +
> + if (!devres_open_group(&pdev->dev, NULL, GFP_KERNEL)) {
> + edac_printk(KERN_ERR, EDAC_DEVICE,
> + "Unable to open devm\n");
> + return -ENOMEM;
> + }
> +
> + r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!r) {
> + edac_printk(KERN_ERR, EDAC_DEVICE,
> + "Unable to get mem resource\n");

Missing devres_release_group(&pdev->dev, NULL) on error path.

> + return -ENODEV;
> + }
> +
> + if (!devm_request_mem_region(&pdev->dev, r->start, resource_size(r),
> +  dev_name(&pdev->dev))) {
> + edac_printk(KERN_ERR, EDAC_DEVICE,
> + "%s:Error requesting mem region\n", ecc_name);

See above.

> + return -EBUSY;
> + }
> +
> + dci = edac_device_alloc_ctl_info(sizeof(*drvdata), ecc_name,
> +  1, ecc_name, 1, 0, NULL, 0,
> +  dev_instance++);
> +
> + if (!dci) {
> + edac_printk(KERN_ERR, EDAC_DEVICE,
> + "%s: Unable to allocate EDAC device\n", ecc_name);

See above.

> + return -ENOMEM;
> + }
> +
> + drvdata = dci->pvt_info;
> + dci->dev = &pdev->dev;
> + platform_set_drvdata(pdev, dci);
> + drvdata->edac_dev_name = ecc_name;
> +
> + drvdata->base = devm_ioremap(&pdev->dev, r->start, resource_size(r));
> + if (!drvdata->base)
> + goto err;
> +
> + /* Get driver specific data for this EDAC device */
> + drvdata->data = of_match_node(altr_edac_device_of_match, np)->data;
> +
> + /* Check specific dependencies for the module */
> + if (drvdata->data->setup) {
> + res = drvdata->data->setup(pdev, drvdata->base);
> + if (res < 0)
> + goto err;
> + }
> +
> + drvdata->sb_irq = platform_get_irq(pdev, 0);
> + res = devm_request_irq(&pdev->dev, drvdata->sb_irq,
> +altr_edac_device_handler,
> +0, dev_name(&pdev->dev), dci);
> + if (res < 0)
> + goto err;
> +
> + drvdata->db_irq = platform_get_irq(pdev, 1);
> + res = devm_request_irq(&pdev->dev, drvdata->db_irq,
> +altr_edac_device_handler,
> +0, dev_name(&pdev->dev), dci);
> + if (res < 0)
> + goto err;
> +
> + dci->mod_name = "Altera ECC Manager";
> + dci->dev_name = drvdata->edac_dev_name;
> +
> + debugfs = edac_debugfs_create_dir(ecc_name);
> + if (debugfs)
> + altr_create_edacdev_dbgfs(dci, drvdata->data, debugfs);
> +
> + if (edac_device_add_device(dci))
> + goto err;
> +
> + devres_close_group(&pdev->dev, NULL);
> +
> + return 0;
> +err:
> + edac_printk(KERN_ERR, EDAC_DEVICE,
> + "%s:Error setting up EDAC device: %d\n", ecc_name, res);
> + devres_release_group(&pdev->dev, NULL);
> + edac_device_free_ctl_info(dci);
> +
> + return res;
> +}
> +
> +static int altr_edac_device_remove(struct platform_device *pdev)
> +{
> + struct edac_device_ctl_info *dci = platform_get_drvdata(pdev);
> +
> + edac_device_del_device(&pdev->dev);
> + edac_device_free_ctl_info(dci);
> +
> + return 0;
> +}
> +
> +static struct platform_driver altr_edac_device_driver = {
> + .probe =  altr_edac_device_probe,
> + .remove = altr_edac_device_remove,
> + .driver = {
> + .name = "altr_edac_device",
> + .of_match_table = altr_edac_device_of_match,
> + },
> +};
> +module_platform_driver(altr_edac_device_driver);
> +
> +/*** OCRAM EDAC Device Functions */
> +
> +#ifdef 
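
The error-path comments above boil down to routing every failure after the
group is opened through one label that releases it. A minimal user-space
sketch of that pattern follows; model_open_group(), model_close_group(),
model_release_group() and model_probe() are hypothetical stand-ins for the
devres calls and the probe function, not the driver's code.

```c
#include <errno.h>
#include <stdbool.h>

static bool group_open;	/* true while the model's resource group is open */

static void model_open_group(void)    { group_open = true; }
static void model_close_group(void)   { group_open = false; } /* success path */
static void model_release_group(void) { group_open = false; } /* error path */

/* fail_step selects which step fails (0 means every step succeeds). */
static int model_probe(int fail_step)
{
	int res;

	model_open_group();

	if (fail_step == 1) {		/* e.g. no memory resource */
		res = -ENODEV;
		goto err_release;
	}
	if (fail_step == 2) {		/* e.g. region already claimed */
		res = -EBUSY;
		goto err_release;
	}

	model_close_group();
	return 0;

err_release:
	/* Single exit for all failures: the group never stays open. */
	model_release_group();
	return res;
}
```

The sketch also sets an error code before every jump. In the quoted probe
function, the devm_ioremap() failure jumps to err with res still 0, so it may
be worth assigning an error value there as well.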

[PATCH v2 2/3] perf kvm: enable record|report feature on powerpc

2016-01-21 Thread Ravi Bangoria
This patch contains core logic for enabling perf kvm {record|report} on
powerpc.

For perf kvm record,
This patch will replace default event(cycle) with kvm_hv:kvm_guest_exit
while recording guest data from host.

For perf kvm report,
This patch makes use of the 'kvm_guest_exit' tracepoint and checks the
exit reason for any kvm exit. If it is HV_DECREMENTER, then the
instruction pointer dumped along with this tracepoint is retrieved and
mapped with the guest kallsyms.

Signed-off-by: Ravi Bangoria 
Signed-off-by: Hemant Kumar 
---
changes in v2:
- Breakdown of v1 patch into two sub patches
- Merged parse-tp.c and evlist.c from tools/perf/arch/powerpc/util/ into
  single file with name kvm.c

 tools/perf/arch/powerpc/util/Build |   1 +
 tools/perf/arch/powerpc/util/kvm.c | 104 +
 tools/perf/util/event.c|  12 -
 tools/perf/util/evlist.c   |   9 
 tools/perf/util/evlist.h   |   1 +
 tools/perf/util/evsel.c|   7 +++
 tools/perf/util/evsel.h|   4 ++
 tools/perf/util/session.c  |   9 ++--
 tools/perf/util/util.c |   5 ++
 tools/perf/util/util.h |   1 +
 10 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/kvm.c

diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index 7b8b0d1..eb819e0 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
+libperf-y += kvm.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/kvm.c b/tools/perf/arch/powerpc/util/kvm.c
new file mode 100644
index 000..317f29a
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/kvm.c
@@ -0,0 +1,104 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2016 Hemant Kumar Shaw, IBM Corporation
+ * Copyright (C) 2016 Ravikumar B. Bangoria, IBM Corporation
+ */
+
+#include 
+#include "../../../util/evsel.h"
+#include "../../../util/evlist.h"
+#include "../../../util/trace-event.h"
+#include "../../../util/session.h"
+#include "../../../util/util.h"
+
+#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
+#define HV_DECREMENTER 2432
+#define HV_BIT 3
+#define PR_BIT 49
+#define PPC_MAX 63
+
+/*
+ * To sample for only guest, record kvm_hv:kvm_guest_exit.
+ * Otherwise go via normal way(cycles).
+ */
+int perf_evlist__arch_add_default(struct perf_evlist *evlist)
+{
+   struct perf_evsel *evsel;
+
+   if (!perf_guest_only())
+   return -1;
+
+   evsel = perf_evsel__newtp_idx("kvm_hv", "kvm_guest_exit", 0);
+   if (IS_ERR(evsel))
+   return PTR_ERR(evsel);
+
+   perf_evlist__add(evlist, evsel);
+   return 0;
+}
+
+static bool is_kvmppc_exit_event(struct perf_evsel *evsel)
+{
+   static unsigned int kvmppc_exit;
+
+   if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
+   return false;
+
+   if (unlikely(kvmppc_exit == 0)) {
+   if (strcmp(KVMPPC_EXIT, evsel->name))
+   return false;
+   kvmppc_exit = evsel->attr.config;
+   } else if (kvmppc_exit != evsel->attr.config) {
+   return false;
+   }
+
+   return true;
+}
+
+static bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+   int trap = perf_evsel__intval(evsel, sample, "trap");
+   return trap == HV_DECREMENTER;
+}
+
+/*
+ * Get the instruction pointer from the tracepoint data
+ */
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+   if (perf_guest_only() &&
+   is_kvmppc_exit_event(evsel) &&
+   is_hv_dec_trap(evsel, sample))
+   return perf_evsel__intval(evsel, sample, "pc");
+
+   return sample->ip;
+}
+
+/*
+ * Get the HV and PR bits and accordingly, determine the cpumode
+ */
+u8 arch__get_cpumode(const union perf_event *event, struct perf_evsel *evsel,
+struct perf_sample *sample)
+{
+   unsigned long hv, pr, msr;
+   u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+   if (!perf_guest_only() || !is_kvmppc_exit_event(evsel))
+   goto ret;
+
+   if (sample->raw_data && is_hv_dec_trap(evsel, sample)) {
+   msr = perf_evsel__intval(evsel, sample, "msr");
+   hv = msr & ((unsigned long)1 << (PPC_MAX - HV_BIT));
+   pr = msr & ((unsigned long)1 << (PPC_MAX - PR_BIT));
+
+   if (!hv && pr)
+   cpumode = PERF_RECORD_MISC_GUEST_USER;
+   else
+   cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+   }
+
+ret:
+   return cpumode;
+}
diff --git 
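
The HV/PR test in arch__get_cpumode() relies on IBM bit numbering, where bit 0
is the most significant bit of the 64-bit MSR. A standalone sketch of the same
decoding is shown below. ibm_bit(), enum model_cpumode and
model_cpumode_from_msr() are hypothetical names for illustration, and the
sketch uses uint64_t instead of unsigned long so the shift stays well-defined
on hosts where long is 32 bits wide.

```c
#include <stdint.h>

#define HV_BIT  3	/* MSB-0 bit number used for MSR[HV] in the patch */
#define PR_BIT  49	/* MSB-0 bit number used for MSR[PR] in the patch */
#define PPC_MAX 63

enum model_cpumode { MODEL_GUEST_KERNEL, MODEL_GUEST_USER };

/* Convert an MSB-0 bit number into a mask on a 64-bit value. */
static uint64_t ibm_bit(unsigned int nr)
{
	return UINT64_C(1) << (PPC_MAX - nr);
}

/* Guest user mode means PR set and HV clear; anything else is kernel. */
static enum model_cpumode model_cpumode_from_msr(uint64_t msr)
{
	uint64_t hv = msr & ibm_bit(HV_BIT);
	uint64_t pr = msr & ibm_bit(PR_BIT);

	return (!hv && pr) ? MODEL_GUEST_USER : MODEL_GUEST_KERNEL;
}
```

In conventional LSB-0 terms the two masks are 1 << 60 for HV and 1 << 14 for
PR.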

Re: [PATCH V2 3/3] vhost_net: basic polling support

2016-01-21 Thread Jason Wang


On 01/20/2016 10:35 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 01, 2015 at 02:39:45PM +0800, Jason Wang wrote:
>> This patch tries to poll for newly added tx buffers or the socket receive
>> queue for a while at the end of tx/rx processing. The maximum time
>> spent on polling is specified through a new kind of vring ioctl.
>>
>> Signed-off-by: Jason Wang 
>> ---
>>  drivers/vhost/net.c| 72 
>> ++
>>  drivers/vhost/vhost.c  | 15 ++
>>  drivers/vhost/vhost.h  |  1 +
>>  include/uapi/linux/vhost.h | 11 +++
>>  4 files changed, 94 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..ce6da77 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -287,6 +287,41 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  rcu_read_unlock_bh();
>>  }
>>  
>> +static inline unsigned long busy_clock(void)
>> +{
>> +return local_clock() >> 10;
>> +}
>> +
>> +static bool vhost_can_busy_poll(struct vhost_dev *dev,
>> +unsigned long endtime)
>> +{
>> +return likely(!need_resched()) &&
>> +   likely(!time_after(busy_clock(), endtime)) &&
>> +   likely(!signal_pending(current)) &&
>> +   !vhost_has_work(dev) &&
>> +   single_task_running();
>> +}
>> +
>> +static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>> +struct vhost_virtqueue *vq,
>> +struct iovec iov[], unsigned int iov_size,
>> +unsigned int *out_num, unsigned int *in_num)
>> +{
>> +unsigned long uninitialized_var(endtime);
>> +
>> +if (vq->busyloop_timeout) {
>> +preempt_disable();
>> +endtime = busy_clock() + vq->busyloop_timeout;
>> +while (vhost_can_busy_poll(vq->dev, endtime) &&
>> +   !vhost_vq_more_avail(vq->dev, vq))
>> +cpu_relax();
>> +preempt_enable();
>> +}
> Isn't there a way to call all this after vhost_get_vq_desc?

We can.

> First, this will reduce the good path overhead as you
> won't have to play with timers and preemption.

For good path, yes.

>
> Second, this will reduce the chance of a pagefault on avail ring read.

Yes.
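
To make the reordering concrete, here is a small user-space model of the flow
being agreed on: try to fetch a descriptor first, and busy-poll only when the
ring turns out to be empty. Everything in it (struct fake_vq, fake_get_desc(),
fake_tick(), RING_EMPTY, the tick-based clock) is a hypothetical stand-in and
not the vhost API; it only sketches the control flow, not preemption handling
or the real exit conditions.

```c
/* User-space model only: none of these names exist in drivers/vhost. */
#define RING_EMPTY (-1)        /* hypothetical "no descriptor available" result */

struct fake_vq {
	int pending;           /* descriptors currently available */
	int arrives_at;        /* tick at which the producer adds one descriptor */
	int now;               /* simulated clock, in ticks */
	int busyloop_timeout;  /* 0 disables busy polling */
	int polls;             /* number of busy-poll iterations performed */
};

/* Stand-in for vhost_get_vq_desc(): consume one descriptor if present. */
static int fake_get_desc(struct fake_vq *vq)
{
	if (vq->pending == 0)
		return RING_EMPTY;
	vq->pending--;
	return 0;              /* head index of the descriptor */
}

/* Advance the simulated clock; the producer may publish a descriptor. */
static void fake_tick(struct fake_vq *vq)
{
	vq->now++;
	if (vq->now == vq->arrives_at)
		vq->pending++;
}

/* Good path first: only an empty ring pays for the polling loop. */
static int tx_get_desc_poll_after(struct fake_vq *vq)
{
	int head = fake_get_desc(vq);

	if (head == RING_EMPTY && vq->busyloop_timeout) {
		int endtime = vq->now + vq->busyloop_timeout;

		while (vq->now < endtime && vq->pending == 0) {
			fake_tick(vq);
			vq->polls++;
		}
		head = fake_get_desc(vq);
	}
	return head;
}
```

In this model a ring that already holds a descriptor returns immediately with
zero polls, which is the reduced good-path overhead mentioned above.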

>
>> +
>> +return vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> + out_num, in_num, NULL, NULL);
>> +}
>> +
>>  /* Expects to be always run from workqueue - which acts as
>>   * read-size critical section for our kind of RCU. */
>>  static void handle_tx(struct vhost_net *net)
>> @@ -331,10 +366,9 @@ static void handle_tx(struct vhost_net *net)
>>% UIO_MAXIOV == nvq->done_idx))
>>  break;
>>  
>> -head = vhost_get_vq_desc(vq, vq->iov,
>> - ARRAY_SIZE(vq->iov),
>> - &out, &in,
>> - NULL, NULL);
>> +head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
>> +ARRAY_SIZE(vq->iov),
>> +&out, &in);
>>  /* On error, stop handling until the next kick. */
>>  if (unlikely(head < 0))
>>  break;
>> @@ -435,6 +469,34 @@ static int peek_head_len(struct sock *sk)
>>  return len;
>>  }
>>  
>> +static int vhost_net_peek_head_len(struct vhost_net *net, struct sock *sk)
> Need a hint that it's rx related in the name.

Ok.

>
>> +{
>> +struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> +struct vhost_virtqueue *vq = &nvq->vq;
>> +unsigned long uninitialized_var(endtime);
>> +
>> +if (vq->busyloop_timeout) {
>> +mutex_lock(&vq->mutex);
> This appears to be called under vq mutex in handle_rx.
> So how does this work then?

This is the tx mutex. It is an optimization: both the rx socket and the
tx ring are polled, so there is no need for tx notification anymore. This
can save lots of vmexits.

>
>
>> +vhost_disable_notify(&net->dev, vq);
> This appears to be called after disable notify
> in handle_rx - so why disable here again?

It disables the tx notification, not the rx one.

>
>> +
>> +preempt_disable();
>> +endtime = busy_clock() + vq->busyloop_timeout;
>> +
>> +while (vhost_can_busy_poll(&net->dev, endtime) &&
>> +   skb_queue_empty(&sk->sk_receive_queue) &&
>> +   !vhost_vq_more_avail(&net->dev, vq))
>> +cpu_relax();
> This seems to mix in several items.
> RX queue is normally not empty. I don't think
> we need to poll for that.
> So IMHO we only need to poll for sk_receive_queue really.

Same as above, tx virt queue is being polled here.

>
>> +
>> +preempt_enable();
>> +
>> +if (vhost_enable_notify(&net->dev, vq))
>> +vhost_poll_queue(&vq->poll);
> But 

[PATCH v2 3/3] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc

2016-01-21 Thread Ravi Bangoria
commit d49dadea7862 ("perf tools: Make 'trace' or 'trace_fields' sort key
default for tracepoint events") makes the 'trace' sort key the default
when displaying a report for tracepoint events.

As the tracepoint kvm_hv:kvm_guest_exit is used as the default event for
recording data, perf kvm report displays its output as a list of
tracepoint hits rather than with the normal report columns.

This patch uses the 'overhead,comm,dso,sym' fields instead of 'trace'
when displaying the perf kvm report on powerpc.

Before applying patch:

  $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
  # Event count (approx.): 181061
  #
  # Overhead  Trace output
  # ........  ............
  #
     0.02%  VCPU 8: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0
     0.00%  VCPU 0: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x10005c7c msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x1001ef14 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff83398830 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a6fe4 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a7a64 msr=0x8280f032, ceded=0

After applying patch:

  $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 181K of event 'kvm_hv:kvm_guest_exit'
  # Event count (approx.): 181061
  #
  # Overhead  Command  Shared ObjectSymbol
  #   ...  ...  ..
  #
   0.02%  :57276   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
   0.00%  :57274   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
   0.00%  :57276   [guest.kernel.kallsyms]  [g] .__copy_tofrom_user_power7
   0.00%  :57276   [guest.kernel.kallsyms]  [g] ._atomic_dec_and_lock
   0.00%  :57276   [guest.kernel.kallsyms]  [g] ._raw_spin_lock
   0.00%  :57276   [guest.kernel.kallsyms]  [g] ._switch
   0.00%  :57276   [guest.kernel.kallsyms]  [g] .bio_add_page
   0.00%  :57276   [guest.kernel.kallsyms]  [g] .kmem_cache_alloc

Signed-off-by: Ravi Bangoria 
---
changes in v2:
- Fixes output format of perf kvm report on powerpc

 tools/perf/arch/powerpc/util/kvm.c | 30 ++
 tools/perf/builtin-kvm.c   | 23 +--
 tools/perf/builtin.h   |  3 +++
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/kvm.c 
b/tools/perf/arch/powerpc/util/kvm.c
index 317f29a..e5d88cc 100644
--- a/tools/perf/arch/powerpc/util/kvm.c
+++ b/tools/perf/arch/powerpc/util/kvm.c
@@ -8,11 +8,13 @@
  */
 
 #include 
+#include 
 #include "../../../util/evsel.h"
 #include "../../../util/evlist.h"
 #include "../../../util/trace-event.h"
 #include "../../../util/session.h"
 #include "../../../util/util.h"
+#include "../../../builtin.h"
 
 #define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
 #define HV_DECREMENTER 2432
@@ -102,3 +104,31 @@ u8 arch__get_cpumode(const union perf_event *event, struct 
perf_evsel *evsel,
 ret:
return cpumode;
 }
+
+const char **arch__cmd_kvm_report_argv(const char *file_name, int argc,
+  int *rec_argc, const char **argv)
+{
+   int i = 0, j, arch_argc = 0;
+   const char **rec_argv;
+
+   if (perf_guest_only())
+   arch_argc = 2;
+
+   *rec_argc = argc + arch_argc + 2;
+   rec_argv = calloc(*rec_argc + 1, sizeof(char *));
+   rec_argv[i++] = strdup("report");
+   rec_argv[i++] = strdup("-i");
+   rec_argv[i++] = strdup(file_name);
+
+   if (arch_argc) {
+   rec_argv[i++] = strdup("-F");
+   rec_argv[i++] = strdup("overhead,comm,dso,sym");
+   }
+
+   for (j = 1; j < argc; j++, i++)
+   rec_argv[i] = argv[j];
+
+   BUG_ON(i != *rec_argc);
+
+   return rec_argv;
+}
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 4418d92..48455c9 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1480,22 +1480,33 @@ static int __cmd_record(const char *file_name, int 
argc, const char **argv)
return cmd_record(i, rec_argv, NULL);
 }
 
-static int __cmd_report(const char *file_name, int argc, const char **argv)
+
+const char ** __weak arch__cmd_kvm_report_argv(const char *file_name, int argc,
+  int *rec_argc, const char **argv)
 {
-   int 

[PATCH v2 1/3] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample

2016-01-21 Thread Ravi Bangoria
This patch changes prototype of perf_event__preprocess_sample() with
additional argument evsel added at last.

This change is required because perf_event__preprocess_sample()
function will use evsel to determine cpumode of samples for powerpc
architecture.

Signed-off-by: Ravi Bangoria 
---
changes in v2:
- Breakdown of v1 patch into two sub patches

 tools/perf/builtin-annotate.c |  3 ++-
 tools/perf/builtin-diff.c |  3 ++-
 tools/perf/builtin-mem.c  | 10 ++
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   |  3 ++-
 tools/perf/builtin-timechart.c|  8 +---
 tools/perf/builtin-top.c  |  3 ++-
 tools/perf/tests/hists_cumulate.c |  2 +-
 tools/perf/tests/hists_filter.c   |  2 +-
 tools/perf/tests/hists_link.c |  4 ++--
 tools/perf/tests/hists_output.c   |  2 +-
 tools/perf/util/event.c   |  3 ++-
 tools/perf/util/event.h   |  3 ++-
 13 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cc5c126..b488a5c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -94,7 +94,8 @@ static int process_sample_event(struct perf_tool *tool,
struct addr_location al;
int ret = 0;
 
-   if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+   if (perf_event__preprocess_sample(event, machine, &al,
+ sample, evsel) < 0) {
pr_warning("problem processing %d event, skipping it.\n",
   event->header.type);
return -1;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 36ccc2b..d2a27fe 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -330,7 +330,8 @@ static int diff__process_sample_event(struct perf_tool 
*tool __maybe_unused,
struct hists *hists = evsel__hists(evsel);
int ret = -1;
 
-   if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+   if (perf_event__preprocess_sample(event, machine, &al,
+ sample, evsel) < 0) {
pr_warning("problem processing %d event, skipping it.\n",
   event->header.type);
return -1;
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3901700..eb27b49 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -61,13 +61,15 @@ static int
 dump_raw_samples(struct perf_tool *tool,
 union perf_event *event,
 struct perf_sample *sample,
-struct machine *machine)
+struct machine *machine,
+struct perf_evsel *evsel)
 {
struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
struct addr_location al;
const char *fmt;
 
-   if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+   if (perf_event__preprocess_sample(event, machine, &al,
+ sample, evsel) < 0) {
fprintf(stderr, "problem processing %d event, skipping it.\n",
event->header.type);
return -1;
@@ -111,10 +113,10 @@ out_put:
 static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
-   struct perf_evsel *evsel __maybe_unused,
+   struct perf_evsel *evsel,
struct machine *machine)
 {
-   return dump_raw_samples(tool, event, sample, machine);
+   return dump_raw_samples(tool, event, sample, machine, evsel);
 }
 
 static int report_raw_events(struct perf_mem *mem)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2bf537f..fa7bbd9 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -151,7 +151,8 @@ static int process_sample_event(struct perf_tool *tool,
};
int ret = 0;
 
-   if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+   if (perf_event__preprocess_sample(event, machine, &al,
+ sample, evsel) < 0) {
pr_debug("problem processing %d event, skipping it.\n",
 event->header.type);
return -1;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c691214..4363e8a 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -783,7 +783,8 @@ static int process_sample_event(struct perf_tool *tool,
return 0;
}
 
-   if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
+   if (perf_event__preprocess_sample(event, machine, &al,
+ sample, evsel) < 0) {
pr_err("problem 

[PATCH v2 0/3] perf kvm: Guest Symbol Resolution for powerpc

2016-01-21 Thread Ravi Bangoria
'perf kvm {record|report}' is used to record and report the profiled
performance of any workload on a guest. From the host, we can collect
guest kernel statistics which is useful in finding out any contentions
in guest kernel symbols for a certain workload.
This feature is not available on powerpc because 'perf' relies on the
'cycles' event (a PMU event) to profile the guest. However, for powerpc,
this can't be used from the host because the PMUs are controlled by the
guest rather than the host.

Due to this issue, we need a different approach to profile the
workload in the guest. There exists a tracepoint 'kvm_hv:kvm_guest_exit'
in powerpc which is hit whenever any of the threads exit the guest
context. The guest instruction pointer dumped along with this
tracepoint data in the field 'pc', can be used as guest instruction
pointer while postprocessing the trace data to map this IP to symbol
from guest.kallsyms.

However, to get some kind of periodicity, we can't use all the kvm
exits, only those that are bound to happen at regular intervals. The
HV_DECREMENTER interrupt forces the threads to exit after an interval
of 10 ms.

This patch makes use of the 'kvm_guest_exit' tracepoint and checks the
exit reason for any kvm exit. If it is HV_DECREMENTER, then the
instruction pointer dumped along with this tracepoint is retrieved and
mapped with the guest kallsyms. So for powerpc, 'perf kvm record' will
record 'kvm_hv:kvm_guest_exit' events instead of cycles.
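
Once a HV_DECREMENTER exit has been selected, the sample still has to be
attributed to guest user or guest kernel mode. The arch__get_cpumode() hunk
posted in this series does that from the tracepoint's 'msr' field. The sketch
below isolates just that bit test so it can be tried outside perf. The macro
values (PPC_MAX = 63, HV_BIT = 3, PR_BIT = 49) are an assumption based on the
Power ISA's MSB-first bit numbering and are not shown in this thread; the
enum and function names are hypothetical.

```c
#include <stdint.h>

/* Power ISA numbers MSR bits from the most significant end (bit 0 = MSB). */
#define PPC_MAX 63   /* assumed: index of the least significant bit */
#define HV_BIT   3   /* assumed: MSR[HV], hypervisor state */
#define PR_BIT  49   /* assumed: MSR[PR], problem (user) state */

enum guest_mode { GUEST_KERNEL, GUEST_USER };

/* Same test as in the patch: user mode only when HV is clear and PR is set. */
static enum guest_mode classify_guest_mode(uint64_t msr)
{
	uint64_t hv = msr & (UINT64_C(1) << (PPC_MAX - HV_BIT));
	uint64_t pr = msr & (UINT64_C(1) << (PPC_MAX - PR_BIT));

	return (!hv && pr) ? GUEST_USER : GUEST_KERNEL;
}
```

Under these assumptions, the msr value 0x80009032 printed in the patch 3/3
sample output classifies as guest kernel and 0x8280f032 as guest user, which
matches the kernel-looking and user-looking pc values on those lines.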

This patch will enable --guest option for perf kvm {record|report} on
powerpc. Still --host --guest together won't work.

This patch can be considered as a next iteration to RFC patch sent by
Hemant Kumar: https://lkml.org/lkml/2015/6/15/670. Hemant's patch is used
for enabling 'perf kvm report', while I've added code to enable
'perf kvm record' on powerpc.

Patches are developed on acme's perf/core branch.

 * changes in v2)
- Patch 1,2 are breakdown of v1 patch with little changes
- Patch 3 is new. It fixes output format of perf kvm report

Before applying patch:
[Note: one needs to run vm with kvm enabled]

  $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules record -a
  [ perf record: Captured and wrote 1.530 MB perf.data.guest (28768 samples) ]

  $ ./perf script -i perf.data.guest
   qemu-system-ppc  9688 [000] 842566.451558:  1 cycles:ppp:  
c01f2860 .mmap_region ([kernel.kallsyms])
   qemu-system-ppc  9688 [000] 842566.451562:  1 cycles:ppp:  
c00a2d68 .kvmppc_do_h_enter ([kernel.kallsyms])
   qemu-system-ppc  9688 [000] 842566.451564:  7 cycles:ppp:  
c001f26c .vsx_unavailable_tm ([kernel.kallsyms])
   qemu-system-ppc  9688 [000] 842566.451565:138 cycles:ppp:  
c001f26c .vsx_unavailable_tm ([kernel.kallsyms])
   qemu-system-ppc  9688 [000] 842566.451567:   3128 cycles:ppp:  
c00097d8 ._switch ([kernel.kallsyms])
   qemu-system-ppc  9688 [000] 842566.451570:  81568 cycles:ppp:  
c00ea8bc .wake_up_new_task ([kernel.kallsyms])
   swapper 0 [004] 842566.451580:  1 cycles:ppp:  
c01f2d88 .sys_munmap ([kernel.kallsyms])
   swapper 0 [004] 842566.451583:  1 cycles:ppp:  
c001f26c .vsx_unavailable_tm ([kernel.kallsyms])
   swapper 0 [004] 842566.451584: 11 cycles:ppp:  
c001f26c .vsx_unavailable_tm ([kernel.kallsyms])
   swapper 0 [004] 842566.451585:226 cycles:ppp:  
c00097d4 ._switch ([kernel.kallsyms])
   swapper 0 [004] 842566.451586:   5664 cycles:ppp:  
c000990c resume_kernel ([kernel.kallsyms])
   swapper 0 [004] 842566.451591: 147929 cycles:ppp:  
c010a4fc .freeze_set_ops ([kernel.kallsyms])
   swapper 0 [008] 842566.451597:  1 cycles:ppp:  
c01f2d98 .sys_munmap ([kernel.kallsyms])
   swapper 0 [008] 842566.451600:  1 cycles:ppp:  
c00a2ee0 .kvmppc_do_h_enter ([kernel.kallsyms])
   swapper 0 [008] 842566.451602: 11 cycles:ppp:  
c00a2ee0 .kvmppc_do_h_enter ([kernel.kallsyms])
   swapper 0 [008] 842566.451603:224 cycles:ppp:  
c001f274 .vsx_unavailable_tm ([kernel.kallsyms])
   swapper 0 [008] 842566.451604:   5240 cycles:ppp:  
c0009984 fast_exception_return ([kernel.kallsyms])
   swapper 0 [008] 842566.451608: 134752 cycles:ppp:  
c0780af4 .inet_diag_handler_get_info ([kernel.kallsyms])
   swapper 0 [012] 842566.451616:  1 cycles:ppp:  
c01f2828 .mmap_region ([kernel.kallsyms])
   swapper 0 [012] 842566.451619:  1 cycles:ppp:  
c00a2d78 .kvmppc_do_h_enter ([kernel.kallsyms])
   swapper 0 [012] 842566.451620: 11 cycles:ppp:  
c001f26c .vsx_unavailable_tm ([kernel.kallsyms])
   swapper 0 [012] 842566.451621:226 

RE: [PATCH perf 3/4] perf tools: Fix unused variables: x86_{32,64}_regoffset_table

2016-01-21 Thread 平松雅巳 / HIRAMATU,MASAMI
>From: Wangnan (F) [mailto:wangn...@huawei.com]
>
>On 2016/1/20 21:59, Arnaldo Carvalho de Melo wrote:
>> Em Tue, Jan 19, 2016 at 09:33:06PM +, Ben Hutchings escreveu:
>>> gcc 5 doesn't seem to care about these, but gcc 6 does and that
>>> results in a build failure.
>> Ben, please CC the people on the CC list for the patch that introduces
>> the problem, Wang, He, can I have your Acked-by?
>>
>> - Arnaldo
>>
>
>This patch led me to find a bug in the original code.
>
>If both perf and the target ELF binary are x86_64, the following command works okay:
>
>  # perf probe -v -n --exec /tmp/oxygen_root/lib64/libc.so.6 pselect
>data exceptfds readfds writefds nfds sigmask tval timeout
>  
>  Opening /sys/kernel/debug/tracing//uprobe_events write=1
>  Writing event: p:probe_libc/pselect
>/home/w00229757/oxygen_root-w00229757/lib64/libc-2.18.so:0xdfef0
>data=-216(%sp):u64 exceptfds=%cx:u64 readfds=%si:u64 writefds=%dx:u64
>nfds=%di:s32 sigmask=%r9:u64 tval=-232(%sp):u64 timeout=%r8:u64
>  
>
>But if the library is x86_32, result is incorrect:
>
>   # perf probe -v -n --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
>data exceptfds readfds writefds nfds sigmask tval
>   
>   Writing event: p:probe_libc/pselect
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330 data=-172(%si):u64
>exceptfds=+16(%si):u32 readfds=+8(%si):u32 writefds=+12(%si):u32
>nfds=+4(%si):s32 sigmask=+24(%si):u32 tval=-180(%si):u64
>timeout=+20(%si):u32
>   
>
>We know that (%si) is used for passing arguments. Here we should see
>'%sp' or '$stack'.
>
>Using an x86_32 perf, we get the correct result:
>
>  # ~/perf probe -v -n --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
>data exceptfds readfds writefds nfds sigmask tval
>  
>  Writing event: p:probe_libc/pselect
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330
>data=-172($stack):u64 exceptfds=+16($stack):u32 readfds=+8($stack):u32
>writefds=+12($stack):u32 nfds=+4($stack):s32 sigmask=+24($stack):u32
>tval=-180($stack):u64
>  

Ah, I see. Uprobes may not check whether the target binary is in 32-bit mode.
Since pt_regs describes the stack differently on x86-64 and x86-32
(regs->sp points to the stack on x86-64, &(regs->pt) points to the stack on
x86-32), uprobes had better check this and change its behavior.

But anyway, it is also fixed by changing perf's register table.

>
>
>Use a small test program to check the result:
>
>  #include <string.h>
>  #include <signal.h>
>  #include <time.h>
>  #include <sys/select.h>
>
>  static struct {
> fd_set r, w, e;
> struct timespec ts;
> sigset_t m;
>  } s;
>
>  int main()
>  {
> memset(&s, '\0', sizeof(s));
>
> pselect(0, &s.r, &s.w, &s.e, &s.ts, &s.m);
> return 0;
>  }
>
># gcc -m32 -g ./test_pselect.c
>
>Use x86_32 perf:
>
># ./perf probe -v  --exec /tmp/oxygen_root/lib32/libc.so.6 pselect data
>exceptfds readfds writefds nfds sigmask tval
>Writing event: p:probe_libc/pselect
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330
>data=-172($stack):u64 exceptfds=+16($stack):u32 readfds=+8($stack):u32
>writefds=+12($stack):u32 nfds=+4($stack):s32 sigmask=+24($stack):u32
>tval=-180($stack):u64
>Added new event:
>   probe_libc:pselect   (on pselect in
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so with data exceptfds
>readfds writefds nfds sigmask tval)
>
>You can now use it in all perf tools, such as:
>
> perf record -e probe_libc:pselect -aR sleep 1
>
># ./perf record -e probe_libc:pselect ./a.out
>[ perf record: Woken up 1 times to write data ]
>[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
># ./perf script
>a.out 25336 [006] 64588.457597: probe_libc:pselect:
>(f7663330) data=0xf772e000 exceptfds=0x8049880 readfds=0x8049780
>writefds=0x8049800 nfds=0 sigmask=0x8049908 tval=0x0
>
>Switch to x86_64 perf:
>
>  # ./perf probe -v  --exec /tmp/oxygen_root/lib32/libc.so.6 pselect
>data exceptfds readfds writefds nfds sigmask tval
>  
>  Opening /sys/kernel/debug/tracing//uprobe_events write=1
>Writing event: p:probe_libc/pselect
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so:0xd1330 data=-172(%si):u64
>exceptfds=+16(%si):u32 readfds=+8(%si):u32 writefds=+12(%si):u32
>nfds=+4(%si):s32 sigmask=+24(%si):u32 tval=-180(%si):u64
>Added new event:
>   probe_libc:pselect   (on pselect in
>/tmp/oxygen_root-w00229757/lib32/libc-2.18.so with data exceptfds
>readfds writefds nfds sigmask tval)
>
>You can now use it in all perf tools, such as:
>
> perf record -e probe_libc:pselect -aR sleep 1
>
># ./perf record -e probe_libc:pselect ./a.out
>[ perf record: Woken up 1 times to write data ]
>[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
># ./perf script
>a.out 25599 [002] 64759.743554: probe_libc:pselect:
>(f76e7330) data=0x0 exceptfds=0x0 readfds=0x0 writefds=0x0 nfds=0
>sigmask=0x0 tval=0x0
>
>Sad...
>
>I think this problem is not introduced by my patch. In fact
>there's a fundamental problem in get_arch_regstr() that it is
>impossible to switch sub ISA.
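
The two probe definitions above are consistent with DWARF register 4 being
the stack pointer on i386 but %rsi on x86_64, so a lookup that always uses
the host's table mislabels 32-bit targets. A minimal sketch of a lookup that
takes the target's sub-ISA into account follows. The table contents are a
transcription of the standard DWARF register numbering for registers 0 to 7,
and regstr_for() with its target_is_32bit parameter is hypothetical; it is
not the existing get_arch_regstr() interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* DWARF register numbers 0..7 for each sub-ISA, as uprobe fetch-arg names. */
static const char *const regs_x86_32[] = {
	"%ax", "%cx", "%dx", "%bx", "$stack", "%bp", "%si", "%di",
};
static const char *const regs_x86_64[] = {
	"%ax", "%dx", "%cx", "%bx", "%si", "%di", "%bp", "%sp",
};

/* Hypothetical variant of get_arch_regstr() that knows the target's sub-ISA. */
static const char *regstr_for(unsigned int dwarf_reg, bool target_is_32bit)
{
	const char *const *tbl = target_is_32bit ? regs_x86_32 : regs_x86_64;

	/* Only the eight classic registers are modelled in this sketch. */
	return dwarf_reg < 8 ? tbl[dwarf_reg] : NULL;
}
```

With this shape, register 4 resolves to "$stack" for a 32-bit library and to
"%si" for a 64-bit one, regardless of how perf itself was built.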

Right, but I guess this can be fixed by switching %sp (for x86-64)
and +0(%sp) (for x86-32) 

Re: [PATCH V2 2/3] vhost: introduce vhost_vq_more_avail()

2016-01-21 Thread Jason Wang


On 01/20/2016 10:09 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 01, 2015 at 02:39:44PM +0800, Jason Wang wrote:
>> Signed-off-by: Jason Wang 
> Wow new API with no comments anywhere, and no
> commit log to say what it's good for.
> Want to know what it does and whether
> it's correct? You have to read the next patch.
>
> So what is the point of splitting it out?
> It's confusing, and in fact it made you
> miss a bug.

Ok, will add comments to explain the function.

>
>> ---
>>  drivers/vhost/vhost.c | 13 +
>>  drivers/vhost/vhost.h |  1 +
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 163b365..4f45a03 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -1633,6 +1633,19 @@ void vhost_add_used_and_signal_n(struct vhost_dev 
>> *dev,
>>  }
>>  EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
>>  
>> +bool vhost_vq_more_avail(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>> +{
>> +__virtio16 avail_idx;
>> +int r;
>> +
>> +r = __get_user(avail_idx, &vq->avail->idx);
>> +if (r)
>> +return false;
> So the result is that if the page is not present,
> you return false (empty ring) and the
> caller will busy wait with preempt disabled.
> Nasty.
>
> So it should return something that breaks
> the loop, and this means it should have
> a different name for the return code
> to make sense.
>
> Maybe reverse the polarity: vhost_vq_avail_empty?
> And add a comment saying we can't be sure ring
> is empty so return false.

Sounds good, will do this.
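
A user-space sketch of the reversed polarity discussed above: the helper
answers "is the ring known to be empty?" and returns false when the index
cannot be read, so a polling loop conditioned on it stops instead of spinning.
struct ring_model, read_avail_idx() and busy_poll_spins() are hypothetical
stand-ins (read_avail_idx() plays the role of __get_user()); only the name
vhost_vq_avail_empty comes from the review comment, and it is shortened here
to make clear this is not the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the guest-visible avail ring state. */
struct ring_model {
	uint16_t avail_idx;     /* index last published by the guest */
	bool page_present;      /* false simulates a fault on the read */
};

/* Stand-in for __get_user(): 0 on success, nonzero if the read faults. */
static int read_avail_idx(const struct ring_model *r, uint16_t *out)
{
	if (!r->page_present)
		return -1;
	*out = r->avail_idx;
	return 0;
}

/* True only when the ring is positively known to be empty. */
static bool vq_avail_empty(const struct ring_model *r, uint16_t last_avail_idx)
{
	uint16_t idx;

	if (read_avail_idx(r, &idx))
		return false;   /* can't be sure it is empty: break the loop */
	return idx == last_avail_idx;
}

/* Count how many iterations a bounded busy-poll loop would spin. */
static unsigned int busy_poll_spins(const struct ring_model *r,
				    uint16_t last_avail_idx,
				    unsigned int budget)
{
	unsigned int spins = 0;

	while (spins < budget && vq_avail_empty(r, last_avail_idx))
		spins++;
	return spins;
}
```

In this model a faulting read yields zero spins, whereas the original
"more available" polarity with a false return on fault would have spun for
the whole budget.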

>
>> +
>> +return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
>> +}
>> +EXPORT_SYMBOL_GPL(vhost_vq_more_avail);
>> +
>>  /* OK, now we need to know about added descriptors. */
>>  bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>  {
>> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
>> index 43284ad..2f3c57c 100644
>> --- a/drivers/vhost/vhost.h
>> +++ b/drivers/vhost/vhost.h
>> @@ -159,6 +159,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, 
>> struct vhost_virtqueue *,
>> struct vring_used_elem *heads, unsigned count);
>>  void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
>>  void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>> +bool vhost_vq_more_avail(struct vhost_dev *, struct vhost_virtqueue *);
>>  bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>>  
>>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>> -- 
>> 2.5.0
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 1/2] regulator: ltc3589: make IRQ optional

2016-01-21 Thread Lothar Waßmann
Hi,

> On Thu, Jan 21, 2016 at 12:33:11PM +0100, Lothar Waßmann wrote:
> > > On Thu, Jan 21, 2016 at 11:26:11AM +0100, Lothar Waßmann wrote:
> > > > > On Thu, Jan 21, 2016 at 08:05:24AM +0100, Lothar Waßmann wrote:
> > > > > > > On Wed, Jan 20, 2016 at 01:29:51PM +0100, Lothar Waßmann wrote:
> 
> > > > > > > > > This pin is used as IRQ pin for the LTC3589 PMIC on the Ka-Ro
> > > > > > > > > electronics TX48 module. Make the IRQ optional in the driver
> > > > > > > > > and use a polling routine instead if no IRQ is specified in DT.
> > > > > > > > > Otherwise the driver will continuously generate interrupts and
> > > > > > > > > make the system unusable.
> 
> > It won't. That's the whole purpose of this patch.
> > I'm afraid, I don't quite understand what you want to say...
> 
> Your commit message (quoted above) claims that without this patch if no
> interrupt is supplied then the unsupplied interrupt will somehow be left
> screaming and make the system unusable.  This doesn't make sense, if
> there is no interrupt there is nothing to scream.
> 
"Otherwise" meant the case where the IRQ is specified in DT as is
currently required to get the driver loaded at all.

> > Without this patch there will be a constantly active interrupt, which
> > will stall the system because the nNMI interrupt (on the EXTINTn pin) is
> > level triggered.
> > Since the polarity of the interrupt input is fixed, there is no way to
> > use it in our HW.
> 
> So, contrary to what you've been saying, the interrupt is actually
> connected (and worse, connected to a NMI) but apparently not described
> in DT.  Why is it sensible to make the driver poll (which will affect
> all systems using this device, even those that don't care) and not just
> describe the interrupt in DT so it can be handled promptly in the normal
> fashion?  Presumably this will run into serious problems if the
> interrupt actually fires at runtime since the NMI will scream, it's not
> clear to me how the poll will manage to run successfully in that case.
> 
Currently the driver won't even load without an IRQ specified in DT.
My patch makes it possible to use the driver without requiring an IRQ!


Lothar Waßmann


[PATCH] MAINTAINERS: return arch/sh to maintained state, with new maintainers

2016-01-21 Thread Rich Felker
From: Rich Felker 

Add Yoshinori Sato and Rich Felker as maintainers for arch/sh
(SUPERH).

Signed-off-by: Rich Felker 
Signed-off-by: Yoshinori Sato 
Acked-by: D. Jeff Dionne 
Acked-by: Rob Landley 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Simon Horman 
Acked-by: Geert Uytterhoeven 

---

Andrew, since we don't have our own repo up for Linus to pull from
yet, could you commit this? Geert (who's the closest there is to an
acting maintainer for sh right now) and I thought that would make the
most sense. If possible I'd really like to get this in the merge
window to make it official that sh isn't abandoned.


diff --git a/MAINTAINERS b/MAINTAINERS
index 9bff63c..55e48b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10274,9 +10274,11 @@ S: Maintained
 F: drivers/net/ethernet/dlink/sundance.c
 
 SUPERH
+M: Yoshinori Sato 
+M: Rich Felker 
 L: linux...@vger.kernel.org
 Q: http://patchwork.kernel.org/project/linux-sh/list/
-S: Orphan
+S: Maintained
 F: Documentation/sh/
 F: arch/sh/
 F: drivers/sh/


Re: UBSAN: run-time undefined behavior sanity checker

2016-01-21 Thread Dave Jones
On Thu, Jan 21, 2016 at 08:57:17PM +, Linux Kernel wrote:
 > Web:
 > https://git.kernel.org/torvalds/c/c6d308534aef6c99904bf5862066360ae067abc4
 > Commit: c6d308534aef6c99904bf5862066360ae067abc4
 > Parent: 68920c973254c5b71a684645c5f6f82d6732c5d6
 > Refname:refs/heads/master
 > Author: Andrey Ryabinin 
 > AuthorDate: Wed Jan 20 15:00:55 2016 -0800
 > Committer:  Linus Torvalds 
 > CommitDate: Wed Jan 20 17:09:18 2016 -0800
 > 
 > UBSAN: run-time undefined behavior sanity checker
 > 
 > UBSAN uses compile-time instrumentation to catch undefined behavior
 > (UB).  The compiler inserts code that performs certain kinds of checks
 > before operations that could cause UB.  If a check fails (i.e. UB is
 > detected), a __ubsan_handle_* function is called to print an error message.
 > 
 > So most of the work is done by the compiler.  This patch just implements
 > the ubsan handlers that print the errors.
 > 
 > GCC has this capability since 4.9.x [1] (see -fsanitize=undefined
 > option and its suboptions).
 > However GCC 5.x has more checkers implemented [2].
 > Article [3] has a bit more details about UBSAN in the GCC.

If I enable this and CONFIG_UBSAN_ALIGNMENT, the kernel doesn't boot,
and hangs really early (pretty much as soon as I hit return in grub)
far too early for serial console or even tty output.

Compiler is debian unstable's 5.3.1 20160114

I don't know if this is worth chasing down, I chose to just disable it,
but figured I'd post in case other people stumble across the same issue.

That aside though, neat feature. I look forward to breaking kernels with it :)

Dave



RE: [PATCH v3 3/4] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2016-01-21 Thread Wu, Feng


> -Original Message-
> From: Radim Krčmář [mailto:rkrc...@redhat.com]
> Sent: Friday, January 22, 2016 4:17 AM
> To: Wu, Feng 
> Cc: pbonz...@redhat.com; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org
> Subject: Re: [PATCH v3 3/4] KVM: x86: Add lowest-priority support for vt-d
> posted-interrupts
> 
> 2016-01-20 09:42+0800, Feng Wu:
> > Use vector-hashing to deliver lowest-priority interrupts for
> > VT-d posted-interrupts. This patch extends kvm_intr_is_single_vcpu()
> > to support lowest-priority handling.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - Remove unnecessary check in fast irq delivery patch
> > - print an error message only once for each guest when we find hardware
> >   disabled LAPIC during interrupt injection.
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > @@ -1316,8 +1316,8 @@ int x86_set_memory_region(struct kvm *kvm, int
> id, gpa_t gpa, u32 size);
> >  bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
> >  bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
> >
> > -bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > -struct kvm_vcpu **dest_vcpu);
> > +bool kvm_intr_can_posting(struct kvm *kvm, struct kvm_lapic_irq *irq,
> 
> I prefer the original one;  I think it's better to describe function
> than intent in names -- intention should be obvious from its use.
> 
> > + struct kvm_vcpu **dest_vcpu);
> >
> >  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> >  struct kvm_lapic_irq *irq);
> > diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> > @@ -300,13 +300,13 @@ out:
> > return r;
> >  }
> >
> > -bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > -struct kvm_vcpu **dest_vcpu)
> > +bool kvm_intr_can_posting(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > +   struct kvm_vcpu **dest_vcpu)
> >  {
> > int i, r = 0;
> > struct kvm_vcpu *vcpu;
> >
> > -   if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
> > +   if (kvm_intr_can_posting_fast(kvm, irq, dest_vcpu))
> > return true;
> 
> There is one pitfall:  xAPIC flat logical broadcast returns false,

Do you mean kvm_intr_can_posting_fast() returns false for
xAPIC flat logical lowest-priority broadcast?

After carefully reading the code several times, I still cannot
find the reason. Could you please give more hints?

BTW, I noticed there is an "if(irq->dest_id == 0xFF) goto out;" in
this function, but it is for the physical dest mode. I am not
sure whether this is what you mean.

> but lowest priority is defined for it (practically isn't a broadcast) and
> the rest of this function doesn't check for lowest priority, so the
> interrupt won't be posted.
> 
> We could modify our _fast functions to cover 0xff in flat logical, but
> ignoring this case isn't bad either ... it can happen only with 8 VCPU
> guests. 

Could you please elaborate a bit more on why this can happen only with
8 VCPU guests?
Thanks a lot!

Thanks,
Feng


RE: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-21 Thread Wu, Feng


> -Original Message-
> From: Radim Krčmář [mailto:rkrc...@redhat.com]
> Sent: Friday, January 22, 2016 3:50 AM
> To: Wu, Feng 
> Cc: pbonz...@redhat.com; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org
> Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> 2016-01-20 09:42+0800, Feng Wu:
> > Use vector-hashing to deliver lowest-priority interrupts, As an
> > example, modern Intel CPUs in server platform use this method to
> > handle lowest-priority interrupts.
> >
> > Signed-off-by: Feng Wu 
> > ---
> 
> Functionality looks good, so I had a lot of stylistic comments, sorry :)

Any comments are welcome! Thank you! :)

> 
> > +  const unsigned long *bitmap, u32 bitmap_size)
> > +{
> > +   u32 mod;
> > +   int i, idx = 0;
> > +
> > +   mod = vector % dest_vcpus;
> > +
> > +   for (i = 0; i <= mod; i++) {
> > +   idx = find_next_bit(bitmap, bitmap_size, idx) + 1;
> 
> I'd remove this "+ 1".  Current users don't check for errors and always
> do "- 1".  The new error value could be 'idx = bitmap_size', with u32 as
> return type.
> 

Does the following code look good to you:

u32 mod;
int i, idx = -1;

mod = vector % dest_vcpus;

for (i = 0; i <= mod; i++) {
	idx = find_next_bit(bitmap, bitmap_size, idx + 1);
	BUG_ON(idx == bitmap_size);
}

return idx;
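
For anyone who wants to try the selection logic outside the kernel, here is
a minimal userspace sketch of the loop above. It is not the patch's code:
next_set_bit() is a hypothetical stand-in for the kernel's find_next_bit(),
pick_dest_index() is a made-up name, the bitmap is limited to 64 bits, and it
returns bitmap_size as the error value (as suggested in the review) instead
of calling BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's find_next_bit(): index of the first
 * set bit at or after 'start', or 'size' when there is none.
 */
static unsigned int next_set_bit(uint64_t bitmap, unsigned int size,
				 unsigned int start)
{
	unsigned int i;

	for (i = start; i < size; i++)
		if (bitmap & (UINT64_C(1) << i))
			return i;
	return size;
}

/*
 * Pick the (vector % dest_vcpus)-th set bit of 'bitmap', counting from 0.
 * Returns 'size' when the bitmap has fewer set bits than required.
 */
static unsigned int pick_dest_index(unsigned int vector,
				    unsigned int dest_vcpus,
				    uint64_t bitmap, unsigned int size)
{
	unsigned int mod, i, start = 0, idx = size;

	assert(dest_vcpus > 0 && size <= 64);
	mod = vector % dest_vcpus;

	for (i = 0; i <= mod; i++) {
		idx = next_set_bit(bitmap, size, start);
		if (idx == size)
			return size;	/* ran out of set bits */
		start = idx + 1;
	}
	return idx;
}
```

With bits 1, 2 and 4 set, three destinations and vector 65, 65 % 3 = 2
selects the third set bit, i.e. index 4.
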

Thanks,
Feng


Re: [RFC PATCH] mmc: dw_mmc: remove redundant num_slots check

2016-01-21 Thread Jaehoon Chung
On 01/22/2016 12:07 PM, Shawn Lin wrote:
> On 2016/1/22 10:46, Jaehoon Chung wrote:
>> Hi, Shawn.
>>
>> On 01/21/2016 04:52 PM, Shawn Lin wrote:
>>> num_slots comes from pdata if existing, otherwise from
>>> dw_mci_parse_dt which make it at least one slot. If
>>> num_slots is less than 1 for the existing pdata case,
>>> current code return -ENODEV. But dw_mci_probe seems to
>>> treat this a optional case as it will call SDMMC_GET_SLOT_NUM
>>> if no slot assigned.
>>
>> Well, we need to consider more thing..
>> Host can get the number of slot from SDMMC_GET_SLOT_NUM().
>> But i think this way also has the problem.
>>
>> num_slot isn't defined anywhere, and num_slot should be set to value of 
>> SDMMC_GET_SLOT_NUM.
>> If that value is higher than 1, it should be blocking..(I didn't test all 
>> cases..)
>>
> 
> Actually, the way the code obtains num_slot confused me. At least, we
> should try to clean it up somehow to make it a little clearer. And just
> as you point out, we see some problem here.
> 
>> Even though this patch is not correct, i could check the problem relevant to 
>> num_slot, because of this patch. :)
>>
> 
> Nice to hear that. I made it an RFC patch since I am also not quite sure
> about all cases, including some corner cases. Let's think it over.
> 
>> my suggestion is if pdata->num_slot is not defined anywhere, just set to 1 
>> by default.
>> not take from SDMMC_GET_SLOT_NUM.
>>
> 
> yes, SDMMC_GET_SLOT_NUM is the capability of the controller, num_slot is
> the hardware-wired number. So, getting it from SDMMC_GET_SLOT_NUM has a
> problem.
> 
>> if (host->pdata->num_slots < 1 ||
>> host->pdata->num_slots > SDMMC_GET_SLOT_NUM())
>>
>> This is correct condition. num_slots can't be higher than number of 
>> supported slots.
>> how about?
> 
> Seems reasonable.
> I guess you want to come up with a new patch dealing with it? :)

It doesn't matter who does this.
If you are OK with it, I will wait for your patch. :)
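
The check discussed above could be sketched like this in plain C, outside the
driver. This is only an illustration under assumptions: resolve_num_slots()
is a hypothetical helper, a value of 0 is assumed to mean "num_slots not
set", and hw_slots stands for whatever SDMMC_GET_SLOT_NUM reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Resolve the number of slots to use.
 *   requested: value from platform data, 0 assumed to mean "not set"
 *   hw_slots:  number of slots the controller supports
 * On success stores the result in *out and returns 0; returns -ENODEV
 * when the requested value is out of range.
 */
static int resolve_num_slots(int requested, int hw_slots, int *out)
{
	assert(out != NULL);

	if (requested == 0)
		requested = 1;		/* default to a single slot */

	if (requested < 1 || requested > hw_slots)
		return -ENODEV;		/* more slots than the hardware has */

	*out = requested;
	return 0;
}
```
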

Best Regards,
Jaehoon Chung

> 
>>
>> Best Regards,
>> Jaehoon Chung
>>
>>>
>>> Signed-off-by: Shawn Lin 
>>>
>>> ---
>>>
>>>   drivers/mmc/host/dw_mmc.c | 6 --
>>>   1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
>>> index 7128351..a116ec6 100644
>>> --- a/drivers/mmc/host/dw_mmc.c
>>> +++ b/drivers/mmc/host/dw_mmc.c
>>> @@ -2949,12 +2949,6 @@ int dw_mci_probe(struct dw_mci *host)
>>>   }
>>>   }
>>>
>>> -if (host->pdata->num_slots < 1) {
>>> -dev_err(host->dev,
>>> -"Platform data must supply num_slots.\n");
>>> -return -ENODEV;
>>> -}
>>> -
>>>   host->biu_clk = devm_clk_get(host->dev, "biu");
>>>   if (IS_ERR(host->biu_clk)) {
>>>   dev_dbg(host->dev, "biu clock not available\n");
>>>
>>
>>
>>
>>
> 
> 



Re: [PATCH 0/6] perf core: Read from overwrite ring buffer

2016-01-21 Thread Wangnan (F)



On 2016/1/22 11:21, Alexei Starovoitov wrote:

On Fri, Jan 22, 2016 at 10:21:19AM +0800, Wangnan (F) wrote:


On 2016/1/21 14:51, Wangnan (F) wrote:


On 2016/1/20 10:20, Alexei Starovoitov wrote:

On Wed, Jan 20, 2016 at 09:37:42AM +0800, Wangnan (F) wrote:

On 2016/1/20 1:42, Alexei Starovoitov wrote:

On Tue, Jan 19, 2016 at 11:16:44AM +, Wang Nan wrote:

This patchset introduces two methods to support reading from
overwrite.

  1) Tailsize: write the size of an event at the end of it
  2) Backward writing: write the ring buffer from the end of it to
the
 beginning.

what happened with your other idea of moving the whole header to the
end?
That felt better than either of these options.

I'll try it today. However, putting all of the three together is
not as easy as this patchset.

I'm missing something. Why all three in one set?

Can't implement all three in one, but implementing two of them makes
benchmarking simpler :)

Here come some numbers.

I attach a target program at the end of this mail. It calls
close(-1) 300 times and uses gettimeofday to check
how many us it takes.
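
The attached program is not reproduced in this archive. A minimal sketch of
such a target, assuming only what is described above (a loop of close(-1)
calls timed with gettimeofday), might look like the following. The function
name time_close_calls() and the iteration parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Call close(-1) 'iterations' times and return the elapsed wall-clock
 * time in microseconds. close(-1) always fails with EBADF, so the loop
 * measures little more than syscall entry and exit.
 */
static long long time_close_calls(long iterations)
{
	struct timeval start, end;
	long i;

	assert(iterations >= 0);

	gettimeofday(&start, NULL);
	for (i = 0; i < iterations; i++)
		(void)close(-1);
	gettimeofday(&end, NULL);

	return (end.tv_sec - start.tv_sec) * 1000000LL +
	       (end.tv_usec - start.tv_usec);
}
```

Printing the return value once per run gives one sample per invocation,
which is what the script below collects into the data_* files.
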

Following cases are tested:


BASE: ./a.out
RAWPERF : ./perf record -o /dev/null -e raw_syscalls:* ./a.out
WRTBKWRD: ./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out
TAILSIZE: ./perf record --no-has-write-backward -o /dev/null -e
raw_syscalls:*/overwrite/ ./a.out
RAWOVWRT: ./perf record --no-has-write-backward --no-has-tailsize -o
/dev/null -e raw_syscalls:*/overwrite/ ./a.out

With this script:

func() {
for x in `seq 1 100` ; do $1; done | tee data_$2
}

func ./a.out base
func "./perf record -o /dev/null -e raw_syscalls:* ./a.out" rawperf
func "./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out"
wrtbkwrd
func "./perf record -o /dev/null --no-has-write-backward -e
raw_syscalls:*/overwrite/ ./a.out" tailsize
func "./perf record -o /dev/null --no-has-write-backward --no-has-tailsize
-o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" rawovwrt

Result:

              MEAN         STDVAR
BASE    :   879870.81     11913.13
RAWPERF :  2603854.7     706658.4
WRTBKWRD:  2313301.220     6727.957
TAILSIZE:  2383051.860     5248.061
RAWOVWRT:  2315273.180     5221.025

Add a number: I tested original perf overwrite ring buffer in pure v4.4
on the same machine:

                        MEAN        STDVAR
RAWOVWRT(original):  2323970.45     5103.39

So I think backward writing method doesn't add extra overhead into
fastpath.

I will send this patchset again with several bugs fixed. After that
I'll start working on tail-header if it is still required.

interesting.
did I read the numbers correctly that 'write backwards' method
is actually the fastest? even faster than no-overwrite?


Yes. But notice the STDVAR: we can't say 'WRTBKWRD' outperforms 'RAWOVWRT'.
However, 'WRTBKWRD' should be at least as fast as 'RAWOVWRT'.


nice. I guess it makes sense that overwrite is faster.


In no-overwrite case perf itself wakes up many times to collect data,
I guess it is the source of high stdvar.


I guess then moving the header to the end will have the same
performance in this benchmark, since RAWOVWRT is the same as well.


Yes.

Do you want to test it by yourself? The code is ready.

Thank you.



Re: [PATCH 2/2] i2c: enable i2c adapter to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui

Hi Wolfram,

What do you think about this patch?


Thanks,
Zhonghui



On 12/24/2015 10:44 PM, Fu, Zhonghui wrote:
> Now, PM core supports asynchronous suspend/resume mode for devices
> during system suspend/resume, and the power state transition of one
> device may be completed in separate kernel thread. PM core ensures
> all power state transition dependency between devices. This patch
> enables i2c adapters to suspend/resume asynchronously. This will take
> advantage of multicore and improve system suspend/resume speed. After
> enabling all i2c devices, i2c adapters and i2c controllers on ASUS
> T100TA tablet, the system suspend-to-idle time is reduced to about
> 510ms from 750ms, and the system resume time is reduced to about 790ms
> from 900ms.
>
> Signed-off-by: Zhonghui Fu 
> ---
>  drivers/i2c/i2c-core.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index ba8eb08..72d5466 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -1564,6 +1564,8 @@ static int i2c_register_adapter(struct i2c_adapter 
> *adap)
>  
>   pm_runtime_no_callbacks(&adap->dev);
>  
> + device_enable_async_suspend(&adap->dev);
> +
>  #ifdef CONFIG_I2C_COMPAT
>   res = class_compat_create_link(i2c_adapter_compat_class, &adap->dev,
>  adap->dev.parent);
> -- 1.7.1
>



Re: [PATCH 1/2] i2c: enable i2c device to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui

Hi Wolfram,

What do you think about this patch?


Thanks,
Zhonghui



On 12/24/2015 10:41 PM, Fu, Zhonghui wrote:
> Now, PM core supports asynchronous suspend/resume mode for devices
> during system suspend/resume, and the power state transition of one
> device may be completed in separate kernel thread. PM core ensures
> all power state transition dependency between devices. This patch
> enables i2c devices to suspend/resume asynchronously. This will take
> advantage of multicore and improve system suspend/resume speed. After
> enabling all i2c devices, i2c adapters and i2c controllers on ASUS
> T100TA tablet, the system suspend-to-idle time is reduced to about
> 510ms from 750ms, and the system resume time is reduced to about 790ms
> from 900ms.
>
> Signed-off-by: Zhonghui Fu 
> ---
>  drivers/i2c/i2c-core.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index ba8eb08..4ff620e 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -1072,6 +1072,7 @@ i2c_new_device(struct i2c_adapter *adap, struct 
> i2c_board_info const *info)
>   client->dev.of_node = info->of_node;
>   client->dev.fwnode = info->fwnode;
>  
> + device_enable_async_suspend(&client->dev);
>   i2c_dev_set_name(adap, client);
>   status = device_register(&client->dev);
>   if (status)
> -- 1.7.1
>



Re: [PATCH 2/2] mmc/sdhci-acpi: enable sdhci-acpi device to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui


On 1/12/2016 10:43 PM, Ulf Hansson wrote:
> On 28 December 2015 at 16:41, Fu, Zhonghui  
> wrote:
>> Now, PM core supports asynchronous suspend/resume mode for devices
>> during system suspend/resume, and the power state transition of one
>> device may be completed in separate kernel thread. PM core ensures
>> all power state transition dependency between devices. This patch
>> enables sdhci-acpi devices to suspend/resume asynchronously. This
>> will take advantage of multicore and improve system suspend/resume
>> speed. After enabling the sdhci-acpi devices and all their child
>> devices to suspend/resume asynchronously on ASUS T100TA, the system
>> suspend-to-idle time is reduced from 1645ms to 1089ms, and the
>> system resume time is reduced from 940ms to 908ms.
> Same comment as for patch 1.

I have updated the change log according to your comments and resent this patch 
- "[PATCH 2/2 v2] mmc/sdhci-acpi: enable sdhci-acpi device to suspend/resume 
asynchronously".


Thanks,
Zhonghui


>
>> Signed-off-by: Zhonghui Fu 
>> ---
>>  drivers/mmc/host/sdhci-acpi.c |2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c
>> index f6047fc..3d27f2d 100644
>> --- a/drivers/mmc/host/sdhci-acpi.c
>> +++ b/drivers/mmc/host/sdhci-acpi.c
>> @@ -388,6 +388,8 @@ static int sdhci_acpi_probe(struct platform_device *pdev)
>> pm_runtime_enable(dev);
>> }
>>
>> +   device_enable_async_suspend(dev);
>> +
>> return 0;
>>
>>  err_free:
>> -- 1.7.1
>>
> Otherwise, looks good!
>
> Kind regards
> Uffe



[PATCH 2/2 v2] mmc/sdhci-acpi: enable sdhci-acpi device to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui
This patch enables sdhci-acpi devices to suspend/resume asynchronously.
This will improve system suspend/resume speed. After enabling the
sdhci-acpi devices and all their child devices to suspend/resume
asynchronously on ASUS T100TA, the system suspend-to-idle time is
reduced from 1645ms to 1089ms, and the system resume time is reduced
from 940ms to 908ms.

Signed-off-by: Zhonghui Fu 
---
Changes in v2:
- Update commit message

 drivers/mmc/host/sdhci-acpi.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c
index f6047fc..3d27f2d 100644
--- a/drivers/mmc/host/sdhci-acpi.c
+++ b/drivers/mmc/host/sdhci-acpi.c
@@ -388,6 +388,8 @@ static int sdhci_acpi_probe(struct platform_device *pdev)
pm_runtime_enable(dev);
}
 
+   device_enable_async_suspend(dev);
+
return 0;
 
 err_free:
-- 1.7.1



Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Alexei Starovoitov
On Thu, Jan 21, 2016 at 09:55:31PM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 21, 2016 at 06:44:28PM -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> > > bpf_jit.S has several callable non-leaf functions which don't honor
> > > CONFIG_FRAME_POINTER, which can result in bad stack traces.
> > > 
> > > Create a stack frame before the call instructions when
> > > CONFIG_FRAME_POINTER is enabled.
> > > 
> > > Signed-off-by: Josh Poimboeuf 
> > > Cc: Alexei Starovoitov 
> > > Cc: net...@vger.kernel.org
> > > ---
> > >  arch/x86/net/bpf_jit.S | 9 +++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
> > > index eb4a3bd..f2a7faf 100644
> > > --- a/arch/x86/net/bpf_jit.S
> > > +++ b/arch/x86/net/bpf_jit.S
> > > @@ -8,6 +8,7 @@
> > >   * of the License.
> > >   */
> > >  #include 
> > > +#include 
> > >  
> > >  /*
> > >   * Calling convention :
> > > @@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
> > >  
> > >  /* rsi contains offset and can be scratched */
> > >  #define bpf_slow_path_common(LEN)\
> > > + lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> > > + FRAME_BEGIN;\
> > >   mov %rbx, %rdi; /* arg1 == skb */   \
> > >   push%r9;\
> > >   pushSKBDATA;\
> > >  /* rsi already has offset */ \
> > >   mov $LEN,%ecx;  /* len */   \
> > > - lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
> > >   callskb_copy_bits;  \
> > >   test%eax,%eax;  \
> > >   pop SKBDATA;\
> > > - pop %r9;
> > > + pop %r9;\
> > > + FRAME_END
> > 
> > I'm not sure what above is doing.
> > There is already 'push rbp; mov rbp,rsp' at the beginning of generated
> > code and with above the stack trace will show two function at the same ip?
> > since there were no calls between them?
> > I think the stack walker will get even more confused?
> > Also the JIT of bpf_call insn will emit variable number of push/pop
> > around the call and I definitely don't want to add extra push rbp
> > there, since it's the critical path and callee will do its own
> > push rbp.
> > Also there are push/pops emitted around div/mod
> > and there is indirect goto emitted as well for bpf_tail_call
> > that jumps into different function body without touching
> > current stack.
> 
> Hm, I'm not sure I follow.  Let me try to explain my understanding.
> 
> As you mentioned, the generated code sets up the frame pointer.  From
> emit_prologue():
> 
> EMIT1(0x55); /* push rbp */
>   EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
> 
> And then later, do_jit() can generate a call into the functions in
> bpf_jit.S.  For example:
> 
>   func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
>   ...
>   EMIT1_off32(0xE8, jmp_offset); /* call */
> 
> So the functions in bpf_jit.S are being called by the generated code.
> They're not part of the generated code itself.  So they're callees and
> need to create their own stack frame before they call out to somewhere
> else.
> 
> Or did I miss something?

yes. all correct.
This particular patch is ok, since it adds to
bpf_slow_path_common and as the name says it's slow and rare,
but wanted to make sure the rest of it is understood.

> > Also none of the JITed function are dwarf annotated.
> 
> But what does that have to do with frame pointers?

nothing, but then why do you need
.type name, @function
annotations in another patch?

> > I could be missing something. I think either this patch
> > is not need or you need to teach the tool to ignore
> > all JITed stuff. I don't think it's practical to annotate
> > everything. Different JITs do their own magic.
> > s390 JIT is even more fancy.
> 
> Well, but the point of these patches isn't to make the tool happy.  It's
> really to make sure that runtime stack traces can be made reliable.
> Maybe I'm missing something but I don't see why JIT code can't honor
> CONFIG_FRAME_POINTER just like any other code.

It can if there is no performance cost added.
I can speak for x64 JIT, but the rest needs to be analyzed as well.
My point was that maybe it's easier to ignore all JITed code and
just say that such call stacks may be unreliable?
live-patching is not applicable to JITed code anyway
or you want to livepatch the callees of it?



[ANNOUNCE]: SCST 3.1 release

2016-01-21 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce that SCST version 3.1 has just been released and is
available for download from http://scst.sourceforge.net/downloads.html.

Highlights for this release:

 - Cluster support for SCSI reservations. This feature is essential for
initiator-side clustering approaches based on persistent reservations, e.g.
the quorum disk implementation in Windows Clustering.

 - Full support for VAAI, or vStorage API for Array Integration: Extended Copy
command support has been added, and the performance of WRITE SAME and of
Atomic Test & Set, also known as COMPARE AND WRITE, has been improved.

 - T10-PI support has been added.

 - ALUA support has been improved: explicit ALUA (SET TARGET PORT GROUPS
command) has been added and DRBD compatibility has been improved.

 - SCST events user space infrastructure has been added, so now SCST can
notify a user space agent about important internal and fabric events.

 - QLogic target driver has been significantly improved.

SCST is an alternative SCSI target stack for Linux. SCST allows creation of
sophisticated storage devices, which can provide advanced functionality, like
replication, thin provisioning, deduplication, high availability, automatic
backup, etc. The majority of recently developed SAN appliances, especially
higher end ones, are SCST based. It might well be that your favorite storage
appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to SanDisk for the great support!
Development of all the highlights above was supported by SanDisk.

Vlad



Re: [BUG] Regression introduced with "block: split bios to max possible length"

2016-01-21 Thread Linus Torvalds
On Thu, Jan 21, 2016 at 7:21 PM, Keith Busch  wrote:
> On Thu, Jan 21, 2016 at 05:12:13PM -0800, Linus Torvalds wrote:
>>
>> I assume that in this case it's simply that
>>
>>  - max_sectors is some odd number in sectors (ie 65535)
>>
>>  - the block size is larger than a sector (ie 4k)
>
> Wouldn't that make max sectors non-sensical? Or am I mistaken to think max
> sectors is supposed to be a valid transfer in multiples of the physical
> sector size?

If the controller interface is some 16-bit register, then the maximum
number of sectors you can specify is 65535.

But if the disk then doesn't like 512-byte accesses, but wants 4kB or
whatever, then clearly you can't actually *feed* it that maximum
number. Not because it's a maximal, but because it's not aligned.

But that doesn't mean that it's non-sensical. It just means that you
have to take both things into account.  There may be two totally
independent things that cause the two (very different) rules on what
the IO can look like.

Obviously there are probably games we could play, like always limiting
the maximum sector number to a multiple of the sector size. That would
presumably work for Stefan's case, by simply "artificially" making
max_sectors be 65528 instead.
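
As a small worked example of that rounding (a sketch only, not a proposed
block-layer change; align_max_sectors() is a hypothetical helper and sectors
are assumed to be 512 bytes):

```c
#include <assert.h>

/*
 * Round max_sectors (counted in 512-byte sectors) down to a multiple of
 * the device block size, so that a maximal transfer is still aligned.
 */
static unsigned int align_max_sectors(unsigned int max_sectors,
				      unsigned int block_bytes)
{
	unsigned int sectors_per_block = block_bytes / 512;

	assert(sectors_per_block > 0);
	return max_sectors - (max_sectors % sectors_per_block);
}
```

A 4kB block is 8 sectors, and 65535 % 8 = 7, so 65535 becomes 65528; with
512-byte blocks the limit stays at 65535.
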

But I do think it's better to consider them independent issues, and
just make sure that we always honor those things independently.

That "honor things independently" used to happen automatically before,
simply because we'd never split in the middle of a bio segment. And
since each bio segment was created with the limitations of the device
in mind, that all worked.

Now that it splits in the middle of a vector entry, that splitting
just needs to honor _all_ the rules. Not just the max sector one.

>> What I think it _should_ do is:
>>
>>  (a) check against max sectors like it used to do:
>>
>> if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
>> goto split;
>
> This can create less optimal splits for h/w that advertise chunk size. I
> know it's a quirky feature (wasn't my idea), but the h/w is very slow
> to not split at the necessary alignments, and we used to handle this
> split correctly.

I suspect few high-performance controllers will really have big issues
with the max_sectors thing. If you have big enough IO that you could
hit the maximum sector number, you're already pretty well off, you
might as well split at that point.

So I think it's ok to split at the max sector case early.

For the case of nvme, for example, I think the max sector number is so
high that you'll never hit that anyway, and you'll only ever hit the
chunk limit. No?

So in practice it won't matter, I suspect.

 Linus


Re: [PATCH 31/33] bpf: Add __bpf_prog_run() to stacktool whitelist

2016-01-21 Thread Josh Poimboeuf
On Thu, Jan 21, 2016 at 06:55:41PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 21, 2016 at 04:49:35PM -0600, Josh Poimboeuf wrote:
> > stacktool reports the following false positive warnings:
> > 
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from 
> > callable instruction with changed frame pointer
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has 
> > unreachable instruction
> >   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has 
> > unreachable instruction
> >   [...]
> > 
> > It's confused by the following dynamic jump instruction in
> > __bpf_prog_run()::
> > 
> >   jmp *(%r12,%rax,8)
> > 
> > which corresponds to the following line in the C code:
> > 
> >   goto *jumptable[insn->code];
> > 
> > There's no way for stacktool to deterministically find all possible
> > branch targets for a dynamic jump, so it can't verify this code.
> > 
> > In this case the jumps all stay within the function, and there's nothing
> > unusual going on related to the stack, so we can whitelist the function.
> 
> well, few things are very unusual in this function.
> did you see what JMP_CALL does? it's a call into a different function,
> but not like typical indirect call. Will it be ok as well?
> 
> In general it's not possible for any tool to identify all possible
> branch targets. bpf programs can be loaded on the fly and
> jumping sequence will change.
> So if this marking says 'don't bother analyzing this function because
> it does sane stuff' that's probably not the case.
> If this marking says 'don't bother analyzing, the stack may be crazy
> from here on' then it's ok.

So the tool doesn't need to follow all possible call targets.  Instead
it just verifies that all functions follow the frame pointer convention.
That way it doesn't matter *which* function is being called because they
all do the right thing.

But it *does* need to follow all jump targets, so that it can analyze
all possible code paths within the function itself.  With a dynamic
jump, it can't do that.

So the JMP_CALL is fine, but the goto *jumptable[insn->code] isn't.
(And BTW that's the only occurrence of such a dynamic jump table in the
entire kernel.)
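
To illustrate what such a dynamic jump looks like, here is a tiny standalone
interpreter using the same GNU C "labels as values" construct. It is only a
sketch with made-up opcodes (OP_ADD, OP_SUB, OP_EXIT) and has nothing to do
with the real BPF instruction set. The point is that the target of each
"goto *" depends on data that is only known at run time.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_ADD, OP_SUB, OP_EXIT, OP_COUNT };

struct insn {
	unsigned char code;
	int imm;
};

/* Run a program made of the toy opcodes above and return the accumulator. */
static int run(const struct insn *insn)
{
	/* GNU C extension: a table of label addresses. */
	static const void *jumptable[OP_COUNT] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_EXIT] = &&do_exit,
	};
	int acc = 0;

	assert(insn != NULL);

/* The branch target is chosen by the instruction stream at run time. */
#define DISPATCH() do {					\
		assert(insn->code < OP_COUNT);		\
		goto *jumptable[insn->code];		\
	} while (0)

	DISPATCH();
do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();
do_sub:
	acc -= insn->imm;
	insn++;
	DISPATCH();
do_exit:
	return acc;
#undef DISPATCH
}
```

All the jumps stay inside run(), but a static tool looking at the generated
"jmp *(...)" cannot tell which labels are reachable from it.
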

-- 
Josh


Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-21 Thread Yang Zhang

On 2016/1/22 1:21, rkrc...@redhat.com wrote:

2016-01-21 05:33+, Wu, Feng:

From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
On 2016/1/20 9:42, Feng Wu wrote:

+   /*
+* We may find a hardware disabled LAPIC here, if that
+* is the case, print out a error message once for each
+* guest and return.
+*/
+   if (!dst[idx-1] &&
+   (kvm->arch.disabled_lapic_found == 0)) {
+   kvm->arch.disabled_lapic_found = 1;
+   printk(KERN_ERR
+   "Disabled LAPIC found during irq injection\n");
+   goto out;


What does "goto out" mean? Injected successfully or failed? According to the
value of ret, which is set to true here, it means injected successfully but


(true actually means that fast path did the job and slow path isn't
  needed.)


i = -1.


(I think there isn't a practical difference between *r=-1 and *r=0.)


Currently, if *r == -1, the remote_irr may get set. But it seems wrong. 
I need to have a double check to see whether it is a bug in current code.





Oh, I didn't notice 'ret' is initialized to true; I thought it was initialized
to false like in another function. I should add a 'ret = false' here. We should
fail to inject the interrupt since a hardware disabled LAPIC is found.


'ret = true' is the better one.  We know that the interrupt is not
deliverable [1], so there's no point in trying to deliver with the slow
path.  We behave similarly when the interrupt targets a single disabled
APIC.

---
1: Well ... it's possible that slowpath would deliver it thanks to
different handling of disabled APICs, but it's undefined behavior,


Why is it undefined behavior? Besides, why would we keep two different
handling logics for the fast path and the slow path? It looks weird.



so it doesn't matter if we don't try.




--
best regards
yang


[PATCH v3] HID: Support for CMedia CM6533 HID audio jack controls

2016-01-21 Thread Ben Chen
Thanks for your time.

The C-Media CM6533 is a USB audio chip featuring jack detection
capability. The device originates an interrupt transfer via the HID interface
each time a jack event occurs.
The purpose of this patch is to handle HID raw events to keep the operating
system informed of user interactions.

Signed-off-by: Ben Chen 
---
Changes in v3:
- Renaming the driver to hid-cmedia.
Changes in v2:
- The return type of input_configured callback has been changed from void to
int.
 drivers/hid/Kconfig  |   6 ++
 drivers/hid/Makefile |   1 +
 drivers/hid/hid-cmedia.c | 168 +++
 drivers/hid/hid-core.c   |   1 +
 drivers/hid/hid-ids.h|   1 +
 5 files changed, 177 insertions(+)
 create mode 100644 drivers/hid/hid-cmedia.c

diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index 513a16c..4117225 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -196,6 +196,12 @@ config HID_PRODIKEYS
  multimedia keyboard, but will lack support for the musical keyboard
  and some additional multimedia keys.
 
+config HID_CMEDIA
+   tristate "CMedia CM6533 HID audio jack controls"
+   depends on HID
+   ---help---
+   Support for CMedia CM6533 HID audio jack controls.
+
 config HID_CP2112
tristate "Silicon Labs CP2112 HID USB-to-SMBus Bridge support"
depends on USB_HID && I2C && GPIOLIB
diff --git a/drivers/hid/Makefile b/drivers/hid/Makefile
index 00011fe..be56ab6 100644
--- a/drivers/hid/Makefile
+++ b/drivers/hid/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_HID_BELKIN)  += hid-belkin.o
 obj-$(CONFIG_HID_BETOP_FF) += hid-betopff.o
 obj-$(CONFIG_HID_CHERRY)   += hid-cherry.o
 obj-$(CONFIG_HID_CHICONY)  += hid-chicony.o
+obj-$(CONFIG_HID_CMEDIA)   += hid-cmedia.o
 obj-$(CONFIG_HID_CORSAIR)  += hid-corsair.o
 obj-$(CONFIG_HID_CP2112)   += hid-cp2112.o
 obj-$(CONFIG_HID_CYPRESS)  += hid-cypress.o
diff --git a/drivers/hid/hid-cmedia.c b/drivers/hid/hid-cmedia.c
new file mode 100644
index 000..7230f85
--- /dev/null
+++ b/drivers/hid/hid-cmedia.c
@@ -0,0 +1,168 @@
+/*
+ * HID driver for CMedia CM6533 audio jack controls
+ *
+ * Copyright (C) 2015 Ben Chen 
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include "hid-ids.h"
+
+MODULE_AUTHOR("Ben Chen");
+MODULE_DESCRIPTION("CM6533 HID jack controls");
+MODULE_LICENSE("GPL");
+
+#define CM6533_JD_TYPE_COUNT  1
+#define CM6533_JD_RAWEV_LEN 16
+#define CM6533_JD_SFX_OFFSET 8
+
+/*
+*
+*CM6533 audio jack HID raw events:
+*
+*Plug in:
+*01000600 002083xx 080008c0 1000
+*about 3 seconds later...
+*01000a00 002083xx 08000380 1000
+*01000600 002083xx 08000380 1000
+*
+*Plug out:
+*01000400 002083xx 080008c0 x000
+*/
+
+static const u8 ji_sfx[] = { 0x08, 0x00, 0x08, 0xc0 };
+static const u8 ji_in[]  = { 0x01, 0x00, 0x06, 0x00 };
+static const u8 ji_out[] = { 0x01, 0x00, 0x04, 0x00 };
+
+static int jack_switch_types[CM6533_JD_TYPE_COUNT] = {
+   SW_HEADPHONE_INSERT,
+};
+
+struct cmhid {
+   struct input_dev *input_dev;
+   struct hid_device *hid;
+   unsigned short switch_map[CM6533_JD_TYPE_COUNT];
+};
+
+static void hp_ev(struct hid_device *hid, struct cmhid *cm, int value)
+{
+   input_report_switch(cm->input_dev, SW_HEADPHONE_INSERT, value);
+   input_sync(cm->input_dev);
+}
+
+static int cmhid_raw_event(struct hid_device *hid, struct hid_report *report,
+u8 *data, int len)
+{
+   struct cmhid *cm = hid_get_drvdata(hid);
+
+   if (len != CM6533_JD_RAWEV_LEN)
+   goto out;
+   if (memcmp(data+CM6533_JD_SFX_OFFSET, ji_sfx, sizeof(ji_sfx)))
+   goto out;
+
+   if (!memcmp(data, ji_out, sizeof(ji_out))) {
+   hp_ev(hid, cm, 0);
+   goto out;
+   }
+   if (!memcmp(data, ji_in, sizeof(ji_in))) {
+   hp_ev(hid, cm, 1);
+   goto out;
+   }
+
+out:
+   return 0;
+}
+
+static int cmhid_input_configured(struct hid_device *hid,
+   struct hid_input *hidinput)
+{
+   struct input_dev *input_dev = hidinput->input;
+   struct cmhid *cm = hid_get_drvdata(hid);
+   int i;
+
+   cm->input_dev = input_dev;
+   memcpy(cm->switch_map, jack_switch_types, sizeof(cm->switch_map));
+   input_dev->evbit[0] = BIT(EV_SW);
+   for (i = 0; i < CM6533_JD_TYPE_COUNT; i++)
+   input_set_capability(cm->input_dev,
+   EV_SW, jack_switch_types[i]);
+   return 0;
+}
+
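
The matching done in cmhid_raw_event() above can be tried in userspace with a
sketch like the one below. It reuses the byte patterns from the patch, but
classify_jack_event() and the shortened macro names are hypothetical, and the
function returns a value instead of reporting an input event.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JD_RAWEV_LEN	16
#define JD_SFX_OFFSET	8

static const uint8_t ji_sfx[] = { 0x08, 0x00, 0x08, 0xc0 };
static const uint8_t ji_in[]  = { 0x01, 0x00, 0x06, 0x00 };
static const uint8_t ji_out[] = { 0x01, 0x00, 0x04, 0x00 };

/*
 * Classify a raw HID report:
 *   1  -> headphone plugged in
 *   0  -> headphone unplugged
 *  -1  -> not a jack event this sketch recognizes
 */
static int classify_jack_event(const uint8_t *data, int len)
{
	assert(data != NULL);

	if (len != JD_RAWEV_LEN)
		return -1;
	/* Bytes 8..11 must carry the jack-detect suffix. */
	if (memcmp(data + JD_SFX_OFFSET, ji_sfx, sizeof(ji_sfx)))
		return -1;
	/* Bytes 0..3 tell plug out from plug in. */
	if (!memcmp(data, ji_out, sizeof(ji_out)))
		return 0;
	if (!memcmp(data, ji_in, sizeof(ji_in)))
		return 1;
	return -1;
}
```
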

Re: [Xen-devel] [PATCH v2 16/16] ARM64: XEN: Initialize Xen specific UEFI runtime services

2016-01-21 Thread Shannon Zhao


On 2016/1/19 1:03, Stefano Stabellini wrote:
> On Fri, 15 Jan 2016, Shannon Zhao wrote:
>> > From: Shannon Zhao 
>> > 
>> > When running on Xen hypervisor, runtime services are supported through
>> > hypercall. So call Xen specific function to initialize runtime services.
>> > 
>> > Signed-off-by: Shannon Zhao 
> Thanks Shannon, much much better!  Just a couple of questions.
> 
> 
>> >  arch/arm/xen/enlighten.c |  5 +
>> >  arch/arm64/xen/Makefile  |  1 +
>> >  arch/arm64/xen/efi.c | 36 
>> >  drivers/xen/Kconfig  |  2 +-
>> >  include/xen/xen-ops.h|  1 +
>> >  5 files changed, 44 insertions(+), 1 deletion(-)
>> >  create mode 100644 arch/arm64/xen/efi.c
>> > 
>> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> > index 485e117..84f27ec 100644
>> > --- a/arch/arm/xen/enlighten.c
>> > +++ b/arch/arm/xen/enlighten.c
>> > @@ -414,6 +414,11 @@ static int __init xen_guest_init(void)
>> >if (xen_initial_domain())
>> >pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier);
>> >  
>> > +  if (IS_ENABLED(CONFIG_XEN_EFI)) {
>> > +  if (efi_enabled(EFI_PARAVIRT))
>> > +  xen_efi_runtime_setup();
>> > +  }
>> > +
>> >return 0;
>> >  }
>> >  early_initcall(xen_guest_init);
>> > diff --git a/arch/arm64/xen/Makefile b/arch/arm64/xen/Makefile
>> > index 74a8d87..62e6fe2 100644
>> > --- a/arch/arm64/xen/Makefile
>> > +++ b/arch/arm64/xen/Makefile
>> > @@ -1,2 +1,3 @@
>> >  xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o p2m.o 
>> > mm.o)
>> >  obj-y := xen-arm.o hypercall.o
>> > +obj-$(CONFIG_XEN_EFI) += efi.o
>> > diff --git a/arch/arm64/xen/efi.c b/arch/arm64/xen/efi.c
>> > new file mode 100644
>> > index 000..33046b0
>> > --- /dev/null
>> > +++ b/arch/arm64/xen/efi.c
>> > @@ -0,0 +1,36 @@
>> > +/*
>> > + * Copyright (c) 2015, Linaro Limited, Shannon Zhao
>> > + *
>> > + * This program is free software; you can redistribute it and/or modify
>> > + * it under the terms of the GNU General Public License as published by
>> > + * the Free Software Foundation; either version 2 of the License, or
>> > + * (at your option) any later version.
>> > + *
>> > + * This program is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> > + * GNU General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU General Public License along
>> > + * with this program.  If not, see .
>> > + */
>> > +
>> > +#include 
>> > +#include 
>> > +
>> > +void __init xen_efi_runtime_setup(void)
>> > +{
>> > +  efi.get_time = xen_efi_get_time;
>> > +  efi.set_time = xen_efi_set_time;
>> > +  efi.get_wakeup_time  = xen_efi_get_wakeup_time;
>> > +  efi.set_wakeup_time  = xen_efi_set_wakeup_time;
>> > +  efi.get_variable = xen_efi_get_variable;
>> > +  efi.get_next_variable= xen_efi_get_next_variable;
>> > +  efi.set_variable = xen_efi_set_variable;
>> > +  efi.query_variable_info  = xen_efi_query_variable_info;
>> > +  efi.update_capsule   = xen_efi_update_capsule;
>> > +  efi.query_capsule_caps   = xen_efi_query_capsule_caps;
>> > +  efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
>> > +  efi.reset_system = NULL;
>> > +}
>> > +EXPORT_SYMBOL_GPL(xen_efi_runtime_setup);
> This looks very similar to struct efi efi_xen previously in
> drivers/xen/efi.c.  Maybe it makes sense to leave struct efi efi_xen in
> drivers/xen/efi.c, export it in include/xen/xen-ops.h, then here just:
> 
>   efi = efi_xen;
> 
> Would that improve code readability?

I have rethought this. It's a little different on ARM, since we call
xen_efi_runtime_setup after parsing the FDT, when some members of efi
(e.g. efi.systab, efi.acpi20) have already been set. So it is necessary
to have a different way to initialize the struct efi.
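
The difference between the two initialization styles can be sketched in plain C. This is only an illustration, assuming a hypothetical `struct demo_efi` with two data members and two hooks; it is not the kernel's `struct efi`, and all `demo_` names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct efi; not the real layout. */
struct demo_efi {
	unsigned long systab;	/* assumed to be set earlier, e.g. from the FDT */
	unsigned long acpi20;	/* assumed to be set earlier, e.g. from the FDT */
	int (*get_time)(void);
	int (*set_time)(void);
};

static int demo_xen_get_time(void) { return 1; }
static int demo_xen_set_time(void) { return 2; }

/* Template in the style of a prebuilt "efi_xen" structure. */
static const struct demo_efi demo_efi_xen = {
	.get_time = demo_xen_get_time,
	.set_time = demo_xen_set_time,
};

/* Whole-struct assignment: every member is overwritten, including systab. */
void demo_setup_by_copy(struct demo_efi *efi)
{
	assert(efi != NULL);
	*efi = demo_efi_xen;
}

/* Per-member assignment: only the runtime-service hooks change. */
void demo_setup_by_member(struct demo_efi *efi)
{
	assert(efi != NULL);
	efi->get_time = demo_xen_get_time;
	efi->set_time = demo_xen_set_time;
}
```

With the copy variant, members filled in before the call are reset to the template's zero values, which is the situation described above for ARM.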

Thanks,
-- 
Shannon



Re: [GIT PULL] NVMe changes for 4.5-rc1

2016-01-21 Thread Linus Torvalds
On Thu, Jan 21, 2016 at 1:27 PM, Jens Axboe  wrote:
>
> Note that pulling this in will conflict with master, since the code was
> forked off pretty early, and we had a good chunk of nvme fixes later in
> the 4.4 cycle.

Not just conflict, but conflict in bad ways. I don't think I'll be
able to fix it up sanely.

In particular, commit b5875222de2f ("NVMe: IO ending fixes on surprise
removal") by Keith Busch added this to nvme_dev_remove():

if (nvme_io_incapable(dev)) {
/*
 * If the device is not capable of IO (surprise hot-removal,
 * for example), we need to quiesce prior to deleting the
 * namespaces. This will end outstanding requests and prevent
 * attempts to sync dirty data.
 */
nvme_dev_shutdown(dev);
}

and in your branch we now have:

 - nvme_dev_shutdown() is now nvme_dev_disable(dev, false). Fine.

 - nvme_dev_remove() got renamed to nvme_remove_namespaces(), but also
lost the "dev" argument (it takes "struct nvme_ctrl *ctrl" now).

I think I will end up doing the merge by just dropping that part of
the surprise removal commit, and letting you and Keith work out what
the real solution is..

 Linus


Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Josh Poimboeuf
On Thu, Jan 21, 2016 at 06:44:28PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> > bpf_jit.S has several callable non-leaf functions which don't honor
> > CONFIG_FRAME_POINTER, which can result in bad stack traces.
> > 
> > Create a stack frame before the call instructions when
> > CONFIG_FRAME_POINTER is enabled.
> > 
> > Signed-off-by: Josh Poimboeuf 
> > Cc: Alexei Starovoitov 
> > Cc: net...@vger.kernel.org
> > ---
> >  arch/x86/net/bpf_jit.S | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
> > index eb4a3bd..f2a7faf 100644
> > --- a/arch/x86/net/bpf_jit.S
> > +++ b/arch/x86/net/bpf_jit.S
> > @@ -8,6 +8,7 @@
> >   * of the License.
> >   */
> >  #include 
> > +#include 
> >  
> >  /*
> >   * Calling convention :
> > @@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
> >  
> >  /* rsi contains offset and can be scratched */
> >  #define bpf_slow_path_common(LEN)  \
> > +   lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> > +   FRAME_BEGIN;\
> > mov %rbx, %rdi; /* arg1 == skb */   \
> > push%r9;\
> > pushSKBDATA;\
> >  /* rsi already has offset */   \
> > mov $LEN,%ecx;  /* len */   \
> > -   lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
> > callskb_copy_bits;  \
> > test%eax,%eax;  \
> > pop SKBDATA;\
> > -   pop %r9;
> > +   pop %r9;\
> > +   FRAME_END
> 
> I'm not sure what above is doing.
> There is already 'push rbp; mov rbp,rsp' at the beginning of generated
> code and with above the stack trace will show two function at the same ip?
> since there were no calls between them?
> I think the stack walker will get even more confused?
> Also the JIT of bpf_call insn will emit variable number of push/pop
> around the call and I definitely don't want to add extra push rbp
> there, since it's the critical path and callee will do its own
> push rbp.
> Also there are push/pops emitted around div/mod
> and there is indirect goto emitted as well for bpf_tail_call
> that jumps into different function body without touching
> current stack.

Hm, I'm not sure I follow.  Let me try to explain my understanding.

As you mentioned, the generated code sets up the frame pointer.  From
emit_prologue():

EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */

And then later, do_jit() can generate a call into the functions in
bpf_jit.S.  For example:

func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
...
EMIT1_off32(0xE8, jmp_offset); /* call */

So the functions in bpf_jit.S are being called by the generated code.
They're not part of the generated code itself.  So they're callees and
need to create their own stack frame before they call out to somewhere
else.

Or did I miss something?
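
To make the effect on a frame-pointer walk concrete, here is a small simulation in C. It is a sketch under simplifying assumptions: the stack is an array of words, "call" pushes an identifier of the caller instead of a real return address, and the function names (`FN_JIT`, `FN_HELPER`, `FN_COPY`, ...) are hypothetical labels for the generated code, a bpf_jit.S helper and its callee.

```c
#include <assert.h>
#include <stddef.h>

enum fn_id { FN_NONE = 0, FN_KERNEL, FN_JIT, FN_HELPER, FN_COPY };

#define SIM_WORDS 32

struct sim_stack {
	size_t word[SIM_WORDS];
	size_t sp;	/* index of the most recently pushed word */
	size_t fp;	/* index of the current frame record, 0 = none */
};

void sim_init(struct sim_stack *s)
{
	s->sp = SIM_WORDS;
	s->fp = 0;
}

static void sim_push(struct sim_stack *s, size_t v)
{
	assert(s->sp > 1);	/* index 0 stays free as the "no frame" marker */
	s->word[--s->sp] = v;
}

/*
 * Model a call made by 'caller': the return address is pushed, then the
 * callee optionally does the equivalent of "push rbp; mov rbp,rsp".
 */
void sim_call(struct sim_stack *s, enum fn_id caller, int callee_makes_frame)
{
	sim_push(s, (size_t)caller);
	if (callee_makes_frame) {
		sim_push(s, s->fp);
		s->fp = s->sp;
	}
}

/* Frame-pointer walk: current function first, then one caller per frame. */
size_t sim_unwind(const struct sim_stack *s, enum fn_id current,
		  enum fn_id *out, size_t max)
{
	size_t n = 0, fp = s->fp;

	if (n < max)
		out[n++] = current;
	while (fp != 0 && n < max) {
		out[n++] = (enum fn_id)s->word[fp + 1];	/* return address */
		fp = s->word[fp];			/* saved frame pointer */
	}
	return n;
}

/* kernel -> JIT code -> helper -> copy routine, walked from the copy routine */
size_t sim_trace(int helper_makes_frame, enum fn_id *out, size_t max)
{
	struct sim_stack s;

	sim_init(&s);
	sim_call(&s, FN_KERNEL, 1);			/* enter JIT code */
	sim_call(&s, FN_JIT, helper_makes_frame);	/* enter helper */
	sim_call(&s, FN_HELPER, 1);			/* enter copy routine */
	return sim_unwind(&s, FN_COPY, out, max);
}
```

In this model, `sim_trace(1, ...)` yields copy, helper, JIT, kernel, while `sim_trace(0, ...)` yields copy, helper, kernel: when the non-leaf helper skips its frame, its own caller drops out of the walk.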

> Also none of the JITed function are dwarf annotated.

But what does that have to do with frame pointers?

> I could be missing something. I think either this patch
> is not need or you need to teach the tool to ignore
> all JITed stuff. I don't think it's practical to annotate
> everything. Different JITs do their own magic.
> s390 JIT is even more fancy.

Well, but the point of these patches isn't to make the tool happy.  It's
really to make sure that runtime stack traces can be made reliable.
Maybe I'm missing something but I don't see why JIT code can't honor
CONFIG_FRAME_POINTER just like any other code.

-- 
Josh


Re: [PATCH] watchdog: Add watchdog timer support for the WinSystems EBC-C384

2016-01-21 Thread Guenter Roeck

On 01/21/2016 05:11 PM, William Breathitt Gray wrote:

The WinSystems EBC-C384 has an onboard watchdog timer. The timeout range
supported by the watchdog timer is 1 second to 255 minutes. Timeouts
under 256 seconds have a 1 second resolution, while the rest have a 1
minute resolution.

This driver adds watchdog timer support for this onboard watchdog timer.
The timeout may be configured via the timeout module parameter.

Signed-off-by: William Breathitt Gray 
---
  MAINTAINERS |   6 ++
  drivers/watchdog/Kconfig|   9 ++
  drivers/watchdog/Makefile   |   1 +
  drivers/watchdog/ebc-c384_wdt.c | 226 
  4 files changed, 242 insertions(+)
  create mode 100644 drivers/watchdog/ebc-c384_wdt.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b1e3da7..c058abf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11629,6 +11629,12 @@ M: David Härdeman 
  S:Maintained
  F:drivers/media/rc/winbond-cir.c

+WINSYSTEMS EBC-C384 WATCHDOG DRIVER
+M: William Breathitt Gray 
+L: linux-watch...@vger.kernel.org
+S: Maintained
+F: drivers/watchdog/ebc-c384_wdt.c
+
  WIMAX STACK
  M:Inaky Perez-Gonzalez 
  M:linux-wi...@intel.com
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 4f0e7be..94569ec 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -711,6 +711,15 @@ config ALIM7101_WDT

  Most people will say N.

+config EBC_C386_WDT
+   tristate "WinSystems EBC-C384 watchdog timer support"
+   depends on X86
+   select WATCHDOG_CORE
+   help
+ Enables watchdog timer support for the watchdog timer on the
+ WinSystems EBC-C384 motherboard. The timeout may be configured via
+ the timeout module parameter.
+
  config F71808E_WDT
tristate "Fintek F71808E, F71862FG, F71869, F71882FG and F71889FG 
Watchdog"
depends on X86
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index f566753..1522316 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_ACQUIRE_WDT) += acquirewdt.o
  obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o
  obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o
  obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o
+obj-$(CONFIG_EBC_C386_WDT) += ebc-c384_wdt.o
  obj-$(CONFIG_F71808E_WDT) += f71808e_wdt.o
  obj-$(CONFIG_SP5100_TCO) += sp5100_tco.o
  obj-$(CONFIG_GEODE_WDT) += geodewdt.o
diff --git a/drivers/watchdog/ebc-c384_wdt.c b/drivers/watchdog/ebc-c384_wdt.c
new file mode 100644
index 000..1d7bd67
--- /dev/null
+++ b/drivers/watchdog/ebc-c384_wdt.c
@@ -0,0 +1,226 @@
+/*
+ * Watchdog timer driver for the WinSystems EBC-C384
+ * Copyright (C) 2016 William Breathitt Gray
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MODULE_NAME "ebc-c384_wdt"
+#define WATCHDOG_TIMEOUT 60
+
+static bool nowayout = WATCHDOG_NOWAYOUT;
+module_param(nowayout, bool, 0);
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+   __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+
+static unsigned timeout = WATCHDOG_TIMEOUT;
+module_param(timeout, uint, 0);
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default="
+   __MODULE_STRING(WATCHDOG_TIMEOUT) ")");
+
+/**
+ * struct ebc_c384_wdt - Watchdog timer device private data structure
+ * @wdd:   instance of struct watchdog_device
+ * @lock:  synchronization lock to prevent race conditions
+ * @base:  base port address of the device
+ * @extent:extent of port address region of the device
+ */
+struct ebc_c384_wdt {
+   struct watchdog_device wdd;
+   spinlock_t lock;
+   unsigned base;
+   unsigned extent;
+};
+
+static int ebc_c384_wdt_start(struct watchdog_device *wdev)
+{
+   struct ebc_c384_wdt *wdt = watchdog_get_drvdata(wdev);
+
+   return wdt->wdd.ops->set_timeout(wdev, wdt->wdd.timeout);


This implies that setting the timeout would start the watchdog,
which is inappropriate (the timeout can be set while the watchdog
is stopped).

Also, setting the timeout sets both the resolution _and_ the timeout,
which is probably unnecessary when starting or pinging the watchdog.
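
One way to picture the separation being asked for is the following sketch. It is not the driver's code: `struct demo_wdt` and the `demo_` functions are hypothetical, the hardware is replaced by plain struct members, and rounding minute-resolution timeouts up is an assumption of this example. The ranges (1 second to 255 minutes, 1-second resolution below 256 seconds) come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_SECONDS (255u * 60u)	/* 255 minutes, per the description */

struct demo_wdt {
	bool minutes;			/* false: 1 s ticks, true: 1 min ticks */
	unsigned char count;		/* configured tick count, 1..255 */
	bool running;
	unsigned char hw_counter;	/* stand-in for the hardware counter */
};

/* Record resolution and count only; the watchdog is not started here. */
int demo_set_timeout(struct demo_wdt *w, unsigned int seconds)
{
	if (seconds < 1 || seconds > DEMO_MAX_SECONDS)
		return -1;
	if (seconds < 256) {
		w->minutes = false;
		w->count = (unsigned char)seconds;
	} else {
		w->minutes = true;
		w->count = (unsigned char)((seconds + 59) / 60); /* round up */
	}
	return 0;
}

/* Start (or ping): reload the counter from the stored configuration. */
void demo_start(struct demo_wdt *w)
{
	assert(w->count != 0);	/* a timeout must have been configured */
	w->hw_counter = w->count;
	w->running = true;
}

/* Stop: a zero count disables the timer in this model. */
void demo_stop(struct demo_wdt *w)
{
	w->hw_counter = 0;
	w->running = false;
}

/* Timeout actually in effect, in seconds, after rounding. */
unsigned int demo_effective_timeout(const struct demo_wdt *w)
{
	return w->minutes ? w->count * 60u : w->count;
}
```

Here `demo_set_timeout()` can be called while the watchdog is stopped without arming it, and `demo_start()` does not touch the resolution.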


+}
+
+static int ebc_c384_wdt_stop(struct watchdog_device *wdev)
+{
+   struct ebc_c384_wdt *wdt = watchdog_get_drvdata(wdev);
+   unsigned long flags;
+
+   spin_lock_irqsave(&wdt->lock, flags);
+
+   outb(0x00, wdt->base + 2);
+
+   

Re: [PATCH v3] kallsyms: add support for relative offsets in kallsyms address table

2016-01-21 Thread Michael Ellerman
On Thu, 2016-01-21 at 14:55 -0800, Kees Cook wrote:
> On Thu, Jan 21, 2016 at 2:50 PM, Andrew Morton
>  wrote:
> > On Thu, 21 Jan 2016 18:19:43 +0100 Ard Biesheuvel 
> >  wrote:
> > 
> > > Similar to how relative extables are implemented, it is possible to emit
> > > the kallsyms table in such a way that it contains offsets relative to some
> > > anchor point in the kernel image rather than absolute addresses. The 
> > > benefit
> > > is that such table entries are no longer subject to dynamic relocation 
> > > when
> > > the build time and runtime offsets of the kernel image are different. 
> > > Also,
> > > on 64-bit architectures, it essentially cuts the size of the address table
> > > in half since offsets can typically be expressed in 32 bits.
> > > 
> > > Since it is useful for some architectures (like x86) to retain the ability
> > > to emit absolute values as well, this patch adds support for both, by
> > > emitting absolute addresses as positive 32-bit values, and addresses
> > > relative to the lowest encountered relative symbol as negative values, 
> > > which
> > > are subtracted from the runtime address of this base symbol to produce the
> > > actual address.
> > > 
> > > Support for the above is enabled by default for all architectures except
> > > IA-64, whose symbols are too far apart to capture in this manner.
> > 
> > I'm not really understanding the benefits of this.  A smaller address
> > table is nice, but why is it desirable that "such table entries are no
> > longer subject to dynamic relocation when the build time and runtime
> > offsets of the kernel image are different"?
> 
> IIUC, this means that the relocation work done after decompression now
> doesn't have to do relocation updates for all these values, which
> means a smaller relocation table as well.

Yep. If I remember the figures rightly it saves ~250K of relocations for the
powerpc build.
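
For readers following along, the encoding described in the patch can be sketched as below. This is an illustration, not the patch's code: the `demo_` names are hypothetical, and the bias of one (so that the base symbol itself also gets a negative entry) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode one table entry as a signed 32-bit value:
 *   absolute symbol -> the address itself (must be non-negative)
 *   relative symbol -> -1 - (addr - base), always negative
 */
int32_t demo_encode(uint64_t addr, uint64_t base, int is_absolute)
{
	if (is_absolute) {
		assert(addr <= (uint64_t)INT32_MAX);
		return (int32_t)addr;
	}
	assert(addr >= base && addr - base < (uint64_t)INT32_MAX);
	return (int32_t)(-1 - (int64_t)(addr - base));
}

/*
 * Decode against the *runtime* address of the base symbol. Only this one
 * base value depends on where the image was loaded; the table itself
 * needs no relocation.
 */
uint64_t demo_decode(int32_t entry, uint64_t runtime_base)
{
	if (entry >= 0)
		return (uint64_t)entry;
	return runtime_base - 1 - (uint64_t)(int64_t)entry;
}
```

If the image is loaded at an offset from its build-time address, decoding with the shifted base yields the shifted symbol address, while absolute entries stay unchanged.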

cheers



Re: [RFC 1/3] dt-bindings: soc: Add documentation for the MediaTek GCE unit

2016-01-21 Thread Horng-Shyang Liao
Hi Rob,

On Wed, 2016-01-20 at 10:38 -0600, Rob Herring wrote:
> On Wed, Jan 20, 2016 at 01:14:38PM +0800, hs.l...@mediatek.com wrote:
> > From: HS Liao 
> > 
> > This adds documentation for the MediaTek Global Command Engine (GCE) unit
> > found in MT8173 SoCs.
> > 
> > Signed-off-by: HS Liao 
> > ---
> >  .../devicetree/bindings/soc/mediatek/gce.txt   |   33 
> > 
> >  1 file changed, 33 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/soc/mediatek/gce.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/soc/mediatek/gce.txt 
> > b/Documentation/devicetree/bindings/soc/mediatek/gce.txt
> > new file mode 100644
> > index 000..878b11e
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/soc/mediatek/gce.txt
> > @@ -0,0 +1,33 @@
> > +MediaTek GCE
> > +===
> > +
> > +The Global Command Engine (GCE) is used to help read/write registers with
> > +critical time limitation, such as updating display configuration during the
> > +vblank. The GCE can be used to implement the Command Queue (CMDQ) driver.
> > +Currently, the GCE only supports display related hardwares, but we expect
> > +it can be extended to other hardwares for future requirements.
> 
> That's a hardware limitation or just s/w is only using it for display? 
> If the latter, that's not really relevant to this binding and should be 
> removed.

Just s/w is only using it for display.
I will remove it from next patch.

> > +
> > +Required properties:
> > +- compatible: Must be "mediatek,mt8173-gce"
> > +- reg: Address range of the GCE unit
> > +- interrupts: The interrupt signal from the GCE block
> > +- clock: Clocks according to the common clock binding
> > +- clock-names: Must be "gce" to stand for GCE clock
> > +
> > +Example:
> > +
> > +   gce: gce@10212000 {
> > +   compatible = "mediatek,mt8173-gce";
> > +   reg = <0 0x10212000 0 0x1000>;
> > +   interrupts = ;
> > +   clocks = < CLK_INFRA_GCE>;
> > +   clock-names = "gce";
> > +   };
> > +
> > +   mmsys: clock-controller@1400 {
> > +   compatible = "mediatek,mt8173-mmsys", "syscon";
> > +   reg = <0 0x1400 0 0x1000>;
> > +   power-domains = < MT8173_POWER_DOMAIN_MM>;
> > +   #clock-cells = <1>;
> > +   mediatek,gce = <>;
> 
> Not documented.

It's just an example about how gce is used by display mmsys.
After I discussed with Mediatek display owner,
we think this can be moved to display device tree document.
Do you agree with this suggestion?
If so, I will remove it from next patch, too.

> > +   };
> > -- 
> > 1.7.9.5

Thanks,
HS Liao
> > 




Re: [PATCH 1/2] mmc: enable mmc host device to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui


On 1/12/2016 10:42 PM, Ulf Hansson wrote:
> On 28 December 2015 at 16:39, Fu, Zhonghui  
> wrote:
>> Now, PM core supports asynchronous suspend/resume mode for devices
>> during system suspend/resume, and the power state transition of one
>> device may be completed in separate kernel thread. PM core ensures
>> all power state transition dependency between devices. This patch
>> enables mmc hosts to suspend/resume asynchronously. This will take
>> advantage of multicore and improve system suspend/resume speed.
>> After applying this patch and enabling all mmc hosts' child devices
>> to suspend/resume asynchronously on ASUS T100TA, the system
>> suspend-to-idle time is reduced from 1645ms to 1107ms, and the
>> system resume time is reduced from 940ms to 914ms.
> Please update the  change log as I don't think the above is really correct.
>
> I think you can simplify the change log quite a bit and just mention
> what and why we want this change.

I have updated the change log according to your comments and resent this patch 
- "[PATCH 1/2 v2] mmc: enable mmc host device to suspend/resume asynchronously".


Thanks,
Zhonghui
>
>> Signed-off-by: Zhonghui Fu 
>> ---
>>  drivers/mmc/core/host.c |1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
>> index da950c4..7222fd7 100644
>> --- a/drivers/mmc/core/host.c
>> +++ b/drivers/mmc/core/host.c
>> @@ -339,6 +339,7 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
>> host->class_dev.parent = dev;
>> host->class_dev.class = &mmc_host_class;
>> device_initialize(&host->class_dev);
>> +   device_enable_async_suspend(&host->class_dev);
>>
>> if (mmc_gpio_alloc(host)) {
>> put_device(&host->class_dev);
>> -- 1.7.1
>>
> Otherwise I think this looks good!
>
> Kind regards
> Uffe



[PATCH 1/2 v2] mmc: enable mmc host device to suspend/resume asynchronously

2016-01-21 Thread Fu, Zhonghui
This patch enables mmc hosts to suspend/resume asynchronously.
This will improve system suspend/resume speed. After applying
this patch and enabling all mmc hosts' child devices to
suspend/resume asynchronously on ASUS T100TA, the system
suspend-to-idle time is reduced from 1645ms to 1107ms, and the
system resume time is reduced from 940ms to 914ms.

Signed-off-by: Zhonghui Fu 
Acked-by: Venu Byravarasu 
---
Changes in v2:
- Update commit message

 drivers/mmc/core/host.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 0aecd5c..1d94607 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -339,6 +339,7 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
host->class_dev.parent = dev;
host->class_dev.class = &mmc_host_class;
device_initialize(&host->class_dev);
+   device_enable_async_suspend(&host->class_dev);
 
if (mmc_gpio_alloc(host)) {
put_device(&host->class_dev);
-- 1.7.1



linux-next: Tree for Jan 22

2016-01-21 Thread Stephen Rothwell
Hi all,

Please do not add any material for v4.6 to your linux-next included
branches until after v4.5-rc1 is released.

Changes since 20160121:

The aio tree still had a build failure so I used the version from
next-20160111.

Non-merge commits (relative to Linus' tree): 1023
 910 files changed, 49808 insertions(+), 16189 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After the
final fixups (if any), I do an x86_64 modules_install followed by builds
for powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc and
sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 238 trees (counting Linus' and 36 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (404a47410c26 Merge branch 'uaccess' (batched user access infrastructure))
Merging fixes/master (25cb62b76430 Linux 4.3-rc5)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install)
Merging arc-current/for-curr (74bf8efb5fa6 Linux 4.4-rc7)
Merging arm-current/fixes (34bfbae33ae8 ARM: 8475/1: SWP emulation: Restore original *data when failed)
Merging m68k-current/for-linus (eb37bc3f85b6 m68k: Provide __phys_to_pfn() and __pfn_to_phys())
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-fixes/fixes (0e2bce741154 powerpc: Remove newly added extra definition of pmd_dirty)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (5807fcaa9bf7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security)
Merging net/master (0a9c453eef65 net: phy: smsc: Fix disabling energy detect mode)
Merging ipsec/master (a8a572a6b5f2 xfrm: dst_entries_init() per-net dst_ops)
Merging ipvs/master (8e662164abb4 netfilter: nfnetlink_queue: avoid harmless unnitialized variable warnings)
Merging wireless-drivers/master (e0045bf80f62 brcmfmac: fix sdio sg table alloc crash)
Merging mac80211/master (da629cf111a2 mac80211: Don't buffer non-bufferable MMPDUs)
Merging sound-current/for-linus (40ed9444cd24 ALSA: timer: Introduce disconnect op to snd_timer_instance)
Merging pci-current/for-linus (5c3b99d05752 PCI: dra7xx: Mark driver as broken)
Merging driver-core.current/driver-core-linus (2b4015e9fb33 Merge tag 'platform-drivers-x86-v4.5-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86)
Merging tty.current/tty-linus (ece6267878ae Merge tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux)
Merging usb.current/usb-linus (ece6267878ae Merge tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux)
Merging usb-gadget-fixes/fixes (7d32cdef5356 usb: musb: fail with error when no DMA controller set)
Merging usb-serial-fixes/usb-linus (f7d7f59ab124 USB: cp210x: add ID for ELV Marble Sound Board 1)
Merging usb-chipidea-fixes/ci-for-usb-stable (6f51bc340d2a usb: chipidea: imx: fix a possible NULL dereference)
Merging staging.current/staging-linus (f744c423cacf Merge tag 'iio-fixes-for-4.4c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (ece6267878ae Merge tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux)
Merging input-current/for-linus (009f77383651 Merge branch 'next' into for-linus)
Merging crypto-current/master (202736d99b7f crypto: algif_skcipher - sendmsg SG marking is off by one)
Merging ide/master (e04a2bd6d8c9 drivers/ide: make ide-scan-pci.

Re: [PATCH] mm: memcontrol: only manage socket pressure for CONFIG_INET

2016-01-21 Thread Masanari Iida
Hi,
I hit this while testing 4.5-rc1 with randconfig during the merge window.
I have now noticed that it was fixed once Linus merged the akpm branch.

commit eae21770b4fed5597623aad0d618190fa60426ff
Merge: e9f57eb 9f273c2
Author: Linus Torvalds 
Date:   Thu Jan 21 12:32:08 2016 -0800

Merge branch 'akpm' (patches from Andrew)

The commit just before this one (commit e9f57ebcba563e0cd532926cab83c92bb4d79360)
DOES have the issue.
So I believe it is fixed now.
Thanks

Masanari


On Thu, Dec 10, 2015 at 8:13 AM, Andrew Morton
 wrote:
> On Wed, 9 Dec 2015 18:05:05 -0500 Johannes Weiner  wrote:
>
>> On Wed, Dec 09, 2015 at 02:28:36PM -0800, Andrew Morton wrote:
>> > On Wed, 9 Dec 2015 13:58:58 -0500 Johannes Weiner  
>> > wrote:
>> > > The calls to tcp_init_cgroup() appear earlier in the series than "mm:
>> > > memcontrol: hook up vmpressure to socket pressure". However, they get
>> > > moved around a few times so fixing it earlier means respinning the
>> > > series. Andrew, it's up to you whether we take the bisectability hit
>> > > for !CONFIG_INET && CONFIG_MEMCG (how common is this?) or whether you
>> > > want me to resend the series.
>> >
>> > hm, drat, I was suspecting dependency issues here, but a test build
>> > said it was OK.
>> >
>> > Actually, I was expecting this patch series to depend on the linux-next
>> > cgroup2 changes, but that doesn't appear to be the case.  *should* this
>> > series be staged after the cgroup2 code?
>>
>> Code-wise they are independent. My stuff is finishing up the new memcg
>> control knobs, the cgroup2 stuff is changing how and when those knobs
>> are exposed from within the cgroup core. I'm not relying on any recent
>> changes in the cgroup core AFAICS, so the order shouldn't matter here.
>
> OK, thanks.
>
>> > Regarding this particular series: yes, I think we can live with a
>> > bisection hole for !CONFIG_INET && CONFIG_MEMCG users.  But I'm not
>> > sure why we're discussing bisection issues, because Arnd's build
>> > failure occurs with everything applied?
>>
>> Arnd's patches apply to the top of the stack, but they address issues
>> introduced early in the series and the problematic code gets touched a
>> lot in subsequent patches. E.g. the first build breakage is in ("net:
>> tcp_memcontrol: simplify linkage between socket and page counter")
>> when the tcp_init_cgroup() and tcp_destroy_cgroup() function calls get
>> moved around and lose the CONFIG_INET protection.
>
> Yeah, this is a pain.  I think I'll fold Arnd's fix into
> mm-memcontrol-introduce-config_memcg_legacy_kmem.patch (which is staged
> after all the other MM patches and after linux-next) and will pretend I
> didn't know about the issue ;)
>
>> Anyway, if we can live with the bisection caveat then Arnd's fixes on
>> top of the kmem series look good to me. Depending on what Vladimir
>> thinks we might want to replace the CONFIG_SLOB fix with something
>> else later on, but that shouldn't be a problem, either.
>
> I don't have a fix for the CONFIG_SLOB&_MEMCG issue yet.  I
> agree that it would be best to make the combination work correctly
> rather than banning it, but that does require a bit of runtime testing.
>


Re: [BUG] Regression introduced with "block: split bios to max possible length"

2016-01-21 Thread Keith Busch
On Thu, Jan 21, 2016 at 05:12:13PM -0800, Linus Torvalds wrote:
> On Thu, Jan 21, 2016 at 2:51 PM, Keith Busch  wrote:
> >
> > My apologies for the trouble. I trust it really is broken, but I don't
> > quite see how. The patch supposedly splits the transfer to the max size
> > the request queue says it allows. How does the max allowed size end up
> > an invalid multiple?
> 
> I assume that in this case it's simply that
> 
>  - max_sectors is some odd number in sectors (ie 65535)
> 
>  - the block size is larger than a sector (ie 4k)

Wouldn't that make max sectors nonsensical? Or am I mistaken to think max
sectors is supposed to be a valid transfer size in multiples of the physical
sector size?

I do think this is what's happening though. A recent commit (ca369d51b)
limits the max_sectors to 255 max by default, which isn't right for
4k. A driver has to override the queue's limits.max_dev_sectors first
to get the desired limit for their storage.

I'm not sure if that was the intention. There are lots of drivers
requesting more than 255 and probably unaware they're not getting it,
DASD included. I don't think we'd have seen this problem if the requested
setting wasn't overridden.
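
The arithmetic behind the mismatch is easy to check in isolation. The helper below is a hypothetical illustration, not a proposed fix and not a block-layer function: it rounds a limit expressed in 512-byte sectors down to a whole number of blocks.

```c
#include <assert.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Round a max_sectors value down to a multiple of the block size.
 * block_bytes is assumed to be a power of two and at least 512.
 */
unsigned int demo_round_down_sectors(unsigned int max_sectors,
				     unsigned int block_bytes)
{
	unsigned int per_block;

	assert(block_bytes >= (1u << DEMO_SECTOR_SHIFT));
	assert((block_bytes & (block_bytes - 1)) == 0);

	per_block = block_bytes >> DEMO_SECTOR_SHIFT;	/* sectors per block */
	return max_sectors & ~(per_block - 1);
}
```

With 4k blocks (8 sectors each), a limit of 255 sectors covers only 31 whole blocks, so the largest aligned transfer is 248 sectors; a split at exactly 255 sectors would end in the middle of a block.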
 
> What I think it _should_ do is:
> 
>  (a) check against max sectors like it used to do:
> 
> if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> goto split;

This can create less optimal splits for h/w that advertise chunk size. I
know it's a quirky feature (wasn't my idea), but the h/w is very slow
to not split at the necessary alignments, and we used to handle this
split correctly.

Point taken, though. The implementation needs some cleaning up.


Re: [PATCH 0/6] perf core: Read from overwrite ring buffer

2016-01-21 Thread Alexei Starovoitov
On Fri, Jan 22, 2016 at 10:21:19AM +0800, Wangnan (F) wrote:
> 
> 
> On 2016/1/21 14:51, Wangnan (F) wrote:
> >
> >
> >On 2016/1/20 10:20, Alexei Starovoitov wrote:
> >>On Wed, Jan 20, 2016 at 09:37:42AM +0800, Wangnan (F) wrote:
> >>>
> >>>On 2016/1/20 1:42, Alexei Starovoitov wrote:
> On Tue, Jan 19, 2016 at 11:16:44AM +, Wang Nan wrote:
> >This patchset introduces two methods to support reading from
> >overwrite.
> >
> >  1) Tailsize: write the size of an event at the end of it
> >  2) Backward writing: write the ring buffer from the end of it to
> >the
> > beginning.
> what happend with your other idea of moving the whole header to the
> end?
> That felt better than either of these options.
> >>>I'll try it today. However, putting all of the three together is
> >>>not as easy as this patchset.
> >>I'm missing something. Why all three in one set?
> >
> >Can't implement all three in one, but implement two of them make
> >benchmarking simpler :)
> >
> >Here comes some numbers.
> >
> >I attach a target program at the end of this mail. It calls
> >close(-1) for 300 times, and use gettimeofday to check
> >how many us it takes.
> >
> >Following cases are tested:
> >
> >
> > BASE: ./a.out
> > RAWPERF : ./perf record -o /dev/null -e raw_syscalls:* ./a.out
> > WRTBKWRD: ./perf record -o /dev/null -e raw_syscalls:* ./a.out
> > TAILSIZE: ./perf record --no-has-write-backward -o /dev/null -e
> >raw_syscalls:*/overwrite/ ./a.out
> > RAWOVWRT: ./perf record --no-has-write-backward --no-has-tailsize -o
> >/dev/null -e raw_syscalls:*/overwrite/ ./a.out
> >
> >With this script:
> >
> >func() {
> >for x in `seq 1 100` ; do $1; done | tee data_$2
> >}
> >
> >func ./a.out base
> >func "./perf record -o /dev/null -e raw_syscalls:* ./a.out" rawperf
> >func "./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out"
> >wrtbkwrd
> >func "./perf record -o /dev/null --no-has-write-backward -e
> >raw_syscalls:*/overwrite/ ./a.out" tailsize
> >func "./perf record -o /dev/null --no-has-write-backward --no-has-tailsize
> >-o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" rawovwrt
> >
> >Result:
> >
> >MEAN   STDVAR
> >BASE:  879870.81  11913.13
> >RAWPERF : 2603854.7  706658.4
> >WRTBKWRD: 2313301.220  6727.957
> >TAILSIZE: 2383051.860  5248.061
> >RAWOVWRT: 2315273.180  5221.025
> 
> Add a number: I tested original perf overwrite ring buffer in pure v4.4
> on the same machine:
> 
> MEAN  STDVAR
> RAWOVWRT(original): 2323970.45  5103.39
> 
> So I think backward writing method doesn't add extra overhead into
> fastpath.
> 
> I will send this patchset again with several bugs fixed. After that
> I'll start working on tail-header if it is still required.

interesting.
did I read the numbers correctly that 'write backwards' method
is actually the fastest? even faster than no-overwrite?
nice. I guess it makes sense that overwrite is faster.
I guess then moving the header to the end will have the same
performance in this benchmark, since RAWOVWRT is the same as well.
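
To make the 'write backwards' idea concrete, here is a toy model of it. It is only a sketch under my own assumptions, not the perf ring buffer layout: the names, the 64-byte buffer and the 1-byte length header are all hypothetical. Each record is written below the previous one, so a reader starting at head sees the newest record first and can parse forward through older ones until it hits a record whose tail has been overwritten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 64u   /* hypothetical size, must be a power of two */

struct rb {
    uint8_t  data[RB_SIZE];
    uint64_t head;    /* decreases on every write; index is head % RB_SIZE */
};

static void rb_init(struct rb *rb)
{
    memset(rb, 0, sizeof(*rb));
}

static void rb_copy_in(struct rb *rb, uint64_t pos, const void *src, uint32_t len)
{
    const uint8_t *s = src;

    for (uint32_t i = 0; i < len; i++)
        rb->data[(pos + i) % RB_SIZE] = s[i];
}

/* Write one record: a 1-byte payload length followed by the payload. */
static void rb_write(struct rb *rb, const void *payload, uint8_t len)
{
    assert((uint32_t)len + 1 <= RB_SIZE);
    rb->head -= (uint64_t)len + 1;          /* move backward first */
    rb_copy_in(rb, rb->head, &len, 1);
    rb_copy_in(rb, rb->head + 1, payload, len);
}

/*
 * Walk forward from head and collect payload lengths, newest first.
 * Stops at the first record that no longer fits in the valid window,
 * i.e. the oldest record whose tail was overwritten.
 */
static int rb_read_lengths(const struct rb *rb, uint8_t *lens, int max)
{
    uint64_t written = 0 - rb->head;        /* total bytes ever written */
    uint64_t valid = written < RB_SIZE ? written : RB_SIZE;
    uint64_t used = 0;
    int n = 0;

    while (n < max && used < valid) {
        uint8_t len = rb->data[(rb->head + used) % RB_SIZE];

        if (used + len + 1 > valid)
            break;
        lens[n++] = len;
        used += (uint64_t)len + 1;
    }
    return n;
}
```

The point of the layout is that the header of every record in the window is always intact: overwriting eats the oldest record from its tail, never from its header, so the reader needs no extra size field at the end of each event.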



[PATCH] bus: arm-cci: Add missing of_node_put

2016-01-21 Thread Amitoj Kaur Chawla
for_each_child_of_node performs an of_node_get on each iteration, so
to break out of the loop an of_node_put is required.

Found using Coccinelle. The semantic patch used for this is as follows:

// 
@@
expression e;
local idexpression n;
@@

 for_each_child_of_node(n, ...) {
   ... when != of_node_put(n)
   when != e = n
(
   return n;
+  of_node_put(n);
?  return ...;
)
   ...
 }
// 

Signed-off-by: Amitoj Kaur Chawla 
---
 drivers/bus/arm-cci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 577cc4b..011b2f6 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -1983,8 +1983,10 @@ static int cci_probe_ports(struct device_node *np)
 
i = nb_ace + nb_ace_lite;
 
-   if (i >= nb_cci_ports)
+   if (i >= nb_cci_ports) {
+   of_node_put(cp);
break;
+   }
 
if (of_property_read_string(cp, "interface-type",
_str)) {
-- 
1.9.1
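
The reference counting pattern behind this fix can be illustrated with a small userspace model. This is a sketch only: the struct and helpers below are hypothetical stand-ins, not the OF API. The iterator takes a reference on the child it hands out and drops the reference on the previous one, so a loop that runs to completion is balanced, while an early break leaves one reference held unless the caller drops it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node. */
struct node {
    int refcount;
    struct node *sibling;
};

static struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

static void node_put(struct node *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}

/* Return the child after prev (or the first one), moving the reference. */
static struct node *next_child(struct node *first, struct node *prev)
{
    struct node *next = prev ? prev->sibling : first;

    node_get(next);
    node_put(prev);
    return next;
}

/* Visit at most `limit` children; returns how many were visited. */
static int scan_children(struct node *first, int limit, int put_on_break)
{
    struct node *cp;
    int i = 0;

    for (cp = next_child(first, NULL); cp; cp = next_child(first, cp)) {
        if (i >= limit) {
            if (put_on_break)
                node_put(cp);   /* drop the iterator's reference */
            break;
        }
        i++;
    }
    return i;
}
```

Calling scan_children() with put_on_break set to 0 and a limit smaller than the number of children leaves the refcount of the child at the break point raised by one, which is the leak the patch above closes.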



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> 
> >I did some testing, which exposed it to the 0day test robot, which
> >did note some performance differences.  I was hoping that it would
> >clear up some instability from other patches, but no such luck.  ;-)
> 
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email.  ;-)

Thanx, Paul



Re: [RFC PATCH] mmc: dw_mmc: remove redundant num_slots check

2016-01-21 Thread Shawn Lin

On 2016/1/22 10:46, Jaehoon Chung wrote:

Hi, Shawn.

On 01/21/2016 04:52 PM, Shawn Lin wrote:

num_slots comes from pdata if existing, otherwise from
dw_mci_parse_dt which make it at least one slot. If
num_slots is less than 1 for the existing pdata case,
current code return -ENODEV. But dw_mci_probe seems to
treat this a optional case as it will call SDMMC_GET_SLOT_NUM
if no slot assigned.


Well, we need to consider more thing..
Host can get the number of slot from SDMMC_GET_SLOT_NUM().
But i think this way also has the problem.

num_slot isn't defined anywhere, and num_slot should be set to value of 
SDMMC_GET_SLOT_NUM.
If that value is higher than 1, it should be blocking..(I didn't test all 
cases..)



Actually, from the code itself, the way we get num_slot confused me.
At least, we should try to clean it up somehow to make it a little
clearer. And just as you point out, we see some problem here.


Even though this patch is not correct, i could check the problem relevant to 
num_slot, because of this patch. :)



Nice to hear that. I made it an RFC patch since I am also not quite sure
about all cases, including some corner cases. Let's think it over twice.


my suggestion is if pdata->num_slot is not defined anywhere, just set to 1 by 
default.
not take from SDMMC_GET_SLOT_NUM.



yes, SDMMC_GET_SLOT_NUM is the capability of controller, num_slot is
hardware wired number. So, getting it from SDMMC_GET_SLOT_NUM has
problem.


if (host->pdata->nums_slots < 1 ||
host->pdata->nums_slots > SDMMC_GET_SLOT_NUM())

This is correct condition. num_slots can't be higher than number of supported 
slots.
how about?


Seems reasonable.
I guess you want to come up with a new patch dealing with it? :)



Best Regards,
Jaehoon Chung



Signed-off-by: Shawn Lin 

---

  drivers/mmc/host/dw_mmc.c | 6 --
  1 file changed, 6 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7128351..a116ec6 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2949,12 +2949,6 @@ int dw_mci_probe(struct dw_mci *host)
}
}

-   if (host->pdata->num_slots < 1) {
-   dev_err(host->dev,
-   "Platform data must supply num_slots.\n");
-   return -ENODEV;
-   }
-
host->biu_clk = devm_clk_get(host->dev, "biu");
if (IS_ERR(host->biu_clk)) {
dev_dbg(host->dev, "biu clock not available\n");









--
Best Regards
Shawn Lin



Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-21 Thread Michel Dänzer

[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:
> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>> 
>>> Can you please point me at the vblank on/off jump bug please?
>>
>> AFAIR I originally reported it in response to
>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>> , but I can't find that in the archives, so maybe that was just on IRC.
>> See
>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>> . Basically, I ran into the bug fixed by your patch because the counter
>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>> just a few days.
> 
> Ok, so just uncovered the overflow bug.

Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1

I suspect this may not be evident with current Intel hardware because
dev->max_vblank_count = 0x, which makes the wraparound code in
drm_update_vblank_count a no-op. Maybe you can reproduce it if you
artificially set a lower max_vblank_count in the driver.
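
The diff values in the log above are what a masked subtraction over a 24-bit counter would produce. A minimal sketch, assuming a mask of 0xffffff (inferred from the logged numbers, not taken from the driver source) and a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: vblanks elapsed between two reads of a hardware
 * counter that wraps at max_count (an all-ones mask such as 0xffffff).
 */
static uint32_t vblank_diff(uint32_t hw, uint32_t hw_last, uint32_t max_count)
{
    /* the wrap arithmetic only holds for masks of the form 2^n - 1 */
    assert((max_count & (max_count + 1)) == 0);
    return (hw - hw_last) & max_count;
}
```

With hw=1 and hw_last=916 this gives 16776301, and with hw=0 and hw_last=1 it gives 16777215, matching the two jumps in the log: a counter that resets to a small value is indistinguishable from one that almost completed a full wrap.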


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: linux-next: build failure after merge of the akpm tree

2016-01-21 Thread Stephen Rothwell
Hi all,

On Fri, 22 Jan 2016 11:24:42 +1100 Stephen Rothwell  
wrote:
>
> On Thu, 21 Jan 2016 07:38:59 +1100 Stephen Rothwell  
> wrote:
> >
> > On Wed, 20 Jan 2016 15:09:47 +0100 Takashi Iwai  wrote:  
> > >
> > > On Sat, 16 Jan 2016 09:51:29 +0100,
> > > Takashi Iwai wrote:
> > > > 
> > > > There are a few ways to fix this, but all are not comfortable.
> > > > 
> > > > A. Disable compress API for powerpc.
> > 
> > This also affects alpha, mips and (maybe) sparc.  
> 
> This was exposed on PowerPC by commit bf76f73c5f65 ("powerpc: enable
> UBSAN support") which is in Linus' tree as of this morning.  The only
> relevant change that made was in the compiler flags (I tested this by
> building the file without that commit but with these new compiler flags:
> 
> -fsanitize=shift -fsanitize=integer-divide-by-zero
> -fsanitize=unreachable -fsanitize=vla-bound -fsanitize=null
> -fsanitize=signed-integer-overflow -fsanitize=bounds
> -fsanitize=object-size -fsanitize=returns-nonnull-attribute
> -fsanitize=bool -fsanitize=enum -fsanitize=alignment
> 
> The preprocessed file is the same in both cases, but with these flags
> the compiler errors.

I have discussed this with the PowerPC maintainer (Michael) and he
figured out why the compiler does not produce an error (normally).  It
is because this driver is using _IOC_NR(xxx) to match ioctls instead of
the full ioctl number.  Because of that, the compiler can figure out
that it does not care about the undefined reference to
__invalid_size_argument_for_IOC that the size check should generate
(since _IOC_NR shifts and masks it out).

So, the switch statement in snd_compr_ioctl() should be rewritten to
check against the full ioctl number (since currently it could
theoretically match any number of ioctls, not just the relevant ones).
And then something needs to be done about the very large structure
being passed.
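
To show why matching on the NR field alone is too loose, here is a standalone sketch. The X_ macros are hypothetical and only modelled on the generic ioctl layout (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); the PowerPC layout differs in the size and direction widths, and none of this is the ALSA code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding modelled on the generic ioctl layout. */
#define X_IOC(dir, type, nr, size)      \
    (((uint32_t)(dir)  << 30) |         \
     ((uint32_t)(size) << 16) |         \
     ((uint32_t)(type) << 8)  |         \
     ((uint32_t)(nr)))
#define X_IOC_NR(cmd) ((uint32_t)(cmd) & 0xffu)

/* Matching on the NR field only: type, size and direction are ignored. */
static int match_by_nr(uint32_t cmd, uint32_t want)
{
    return X_IOC_NR(cmd) == X_IOC_NR(want);
}

/* Matching on the full number: every field has to agree. */
static int match_full(uint32_t cmd, uint32_t want)
{
    return cmd == want;
}
```

Two commands that share nr 0x22 but differ in type, size and direction compare equal under match_by_nr() and unequal under match_full(), which is the "could theoretically match any number of ioctls" case described above.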
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au


Re: [PATCH 31/33] bpf: Add __bpf_prog_run() to stacktool whitelist

2016-01-21 Thread Alexei Starovoitov
On Thu, Jan 21, 2016 at 04:49:35PM -0600, Josh Poimboeuf wrote:
> stacktool reports the following false positive warnings:
> 
>   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from 
> callable instruction with changed frame pointer
>   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has 
> unreachable instruction
>   stacktool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has 
> unreachable instruction
>   [...]
> 
> It's confused by the following dynamic jump instruction in
> __bpf_prog_run()::
> 
>   jmp *(%r12,%rax,8)
> 
> which corresponds to the following line in the C code:
> 
>   goto *jumptable[insn->code];
> 
> There's no way for stacktool to deterministically find all possible
> branch targets for a dynamic jump, so it can't verify this code.
> 
> In this case the jumps all stay within the function, and there's nothing
> unusual going on related to the stack, so we can whitelist the function.

well, few things are very unusual in this function.
did you see what JMP_CALL does? it's a call into a different function,
but not like typical indirect call. Will it be ok as well?

In general it's not possible for any tool to identify all possible
branch targets. bpf programs can be loaded on the fly and
jumping sequence will change.
So if this marking says 'don't bother analyzing this function because
it does sane stuff' that's probably not the case.
If this marking says 'don't bother analyzing, the stack may be crazy
from here on' then it's ok.
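
For readers not familiar with the construct, this is roughly what a jumptable dispatch looks like. It is a toy interpreter, not the bpf one: the opcodes and struct are made up, and it relies on the GCC/Clang labels-as-values extension.

```c
#include <assert.h>

enum { OP_ADD, OP_SUB, OP_EXIT, OP_MAX };

/* Hypothetical instruction format for the toy interpreter. */
struct insn {
    unsigned char code;
    int imm;
};

static int run(const struct insn *insn)
{
    /* one label address per opcode (GNU C extension) */
    static const void *jumptable[OP_MAX] = {
        [OP_ADD]  = &&do_add,
        [OP_SUB]  = &&do_sub,
        [OP_EXIT] = &&do_exit,
    };
    int acc = 0;

/* dynamic jump: the target depends on data, not on the code */
#define DISPATCH() do {                     \
        assert(insn->code < OP_MAX);        \
        goto *jumptable[insn->code];        \
    } while (0)

    DISPATCH();
do_add:
    acc += insn->imm;
    insn++;
    DISPATCH();
do_sub:
    acc -= insn->imm;
    insn++;
    DISPATCH();
do_exit:
    return acc;
#undef DISPATCH
}
```

Every DISPATCH() compiles to an indirect jump whose target is only known at run time from the loaded program, which is why a static tool cannot enumerate the branch targets by looking at the object code.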



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Paul E. McKenney wrote:


I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)


Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr


Re: [lkp] [locking/mutexes] cb4bbc457b: -40.0% unixbench.score

2016-01-21 Thread Davidlohr Bueso

On Fri, 22 Jan 2016, kernel test robot wrote:


FYI, we noticed the below changes on

https://github.com/0day-ci/linux 
Ding-Tianhong/locking-mutexes-don-t-spin-on-owner-when-wait-list-is-not-NULL/20160121-173317
commit cb4bbc457bfed6194ffab1b10c7be73b3f16ca2d ("locking/mutexes: don't spin on 
owner when wait list is not NULL.")


I'm not sure why this would even be reported, as this patch has not been 
accepted
or acked or nothin', by anyone. In this particular case that raw performance 
drop
is because spinning is pretty much disabled by Ding's change. Totally expected 
for
the kind of workload unixbench triggers.

All this does is hurt lkml-searchability.

Thanks,
Davidlohr


Re: [RFC PATCH] mmc: dw_mmc: remove redundant num_slots check

2016-01-21 Thread Jaehoon Chung
Hi, Shawn.

On 01/21/2016 04:52 PM, Shawn Lin wrote:
> num_slots comes from pdata if existing, otherwise from
> dw_mci_parse_dt which make it at least one slot. If
> num_slots is less than 1 for the existing pdata case,
> current code return -ENODEV. But dw_mci_probe seems to
> treat this a optional case as it will call SDMMC_GET_SLOT_NUM
> if no slot assigned.

Well, we need to consider more thing..
Host can get the number of slot from SDMMC_GET_SLOT_NUM().
But i think this way also has the problem.

num_slot isn't defined anywhere, and num_slot should be set to value of 
SDMMC_GET_SLOT_NUM.
If that value is higher than 1, it should be blocking..(I didn't test all 
cases..)

Even though this patch is not correct, i could check the problem relevant to 
num_slot, because of this patch. :)

my suggestion is if pdata->num_slot is not defined anywhere, just set to 1 by 
default.
not take from SDMMC_GET_SLOT_NUM.

if (host->pdata->nums_slots < 1 || 
host->pdata->nums_slots > SDMMC_GET_SLOT_NUM())

This is correct condition. num_slots can't be higher than number of supported 
slots.
how about?
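
Put together, the default and the range check could look something like this. It is just a sketch of the proposal in plain C, not a patch: the helper name is made up, and hw_slots stands for whatever the controller reports as its supported slot count.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: pick the number of slots to use.
 * pdata_slots == 0 means platform data did not define it.
 * Returns the slot count, or -ENODEV if platform data asks for
 * more slots than the controller supports.
 */
static int resolve_num_slots(unsigned int pdata_slots, unsigned int hw_slots)
{
    assert(hw_slots >= 1);

    if (pdata_slots == 0)
        return 1;               /* default when nothing is defined */
    if (pdata_slots > hw_slots)
        return -ENODEV;         /* more than the hardware supports */
    return (int)pdata_slots;
}
```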

Best Regards,
Jaehoon Chung

> 
> Signed-off-by: Shawn Lin 
> 
> ---
> 
>  drivers/mmc/host/dw_mmc.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 7128351..a116ec6 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -2949,12 +2949,6 @@ int dw_mci_probe(struct dw_mci *host)
>   }
>   }
>  
> - if (host->pdata->num_slots < 1) {
> - dev_err(host->dev,
> - "Platform data must supply num_slots.\n");
> - return -ENODEV;
> - }
> -
>   host->biu_clk = devm_clk_get(host->dev, "biu");
>   if (IS_ERR(host->biu_clk)) {
>   dev_dbg(host->dev, "biu clock not available\n");
> 



Re: [PATCH 23/33] x86/asm/bpf: Create stack frames in bpf_jit.S

2016-01-21 Thread Alexei Starovoitov
On Thu, Jan 21, 2016 at 04:49:27PM -0600, Josh Poimboeuf wrote:
> bpf_jit.S has several callable non-leaf functions which don't honor
> CONFIG_FRAME_POINTER, which can result in bad stack traces.
> 
> Create a stack frame before the call instructions when
> CONFIG_FRAME_POINTER is enabled.
> 
> Signed-off-by: Josh Poimboeuf 
> Cc: Alexei Starovoitov 
> Cc: net...@vger.kernel.org
> ---
>  arch/x86/net/bpf_jit.S | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
> index eb4a3bd..f2a7faf 100644
> --- a/arch/x86/net/bpf_jit.S
> +++ b/arch/x86/net/bpf_jit.S
> @@ -8,6 +8,7 @@
>   * of the License.
>   */
>  #include 
> +#include 
>  
>  /*
>   * Calling convention :
> @@ -65,16 +66,18 @@ FUNC(sk_load_byte_positive_offset)
>  
>  /* rsi contains offset and can be scratched */
>  #define bpf_slow_path_common(LEN)\
> + lea -MAX_BPF_STACK + 32(%rbp), %rdx;\
> + FRAME_BEGIN;\
>   mov %rbx, %rdi; /* arg1 == skb */   \
>   push%r9;\
>   pushSKBDATA;\
>  /* rsi already has offset */ \
>   mov $LEN,%ecx;  /* len */   \
> - lea - MAX_BPF_STACK + 32(%rbp),%rdx;\
>   callskb_copy_bits;  \
>   test%eax,%eax;  \
>   pop SKBDATA;\
> - pop %r9;
> + pop %r9;\
> + FRAME_END

I'm not sure what above is doing.
There is already 'push rbp; mov rbp,rsp' at the beginning of generated
code and with above the stack trace will show two function at the same ip?
since there were no calls between them?
I think the stack walker will get even more confused?
Also the JIT of bpf_call insn will emit variable number of push/pop
around the call and I definitely don't want to add extra push rbp
there, since it's the critical path and callee will do its own
push rbp.
Also there are push/pops emitted around div/mod
and there is indirect goto emitted as well for bpf_tail_call
that jumps into different function body without touching
current stack.
Also none of the JITed function are dwarf annotated.
I could be missing something. I think either this patch
is not need or you need to teach the tool to ignore
all JITed stuff. I don't think it's practical to annotate
everything. Different JITs do their own magic.
s390 JIT is even more fancy.



Re: linux-next: build failure after merge of the akpm tree

2016-01-21 Thread Stephen Rothwell
Hi all,

On Fri, 22 Jan 2016 11:24:42 +1100 Stephen Rothwell  
wrote:
>
> On Thu, 21 Jan 2016 07:38:59 +1100 Stephen Rothwell  
> wrote:
> >
> > On Wed, 20 Jan 2016 15:09:47 +0100 Takashi Iwai  wrote:  
> > >
> > > On Sat, 16 Jan 2016 09:51:29 +0100,
> > > Takashi Iwai wrote:
> > > > 
> > > > There are a few ways to fix this, but all are not comfortable.
> > > > 
> > > > A. Disable compress API for powerpc.
> > 
> > This also affects alpha, mips and (maybe) sparc.  
> 
> This was exposed on PowerPC by commit bf76f73c5f65 ("powerpc: enable
> UBSAN support") which is in Linus' tree as of this morning.  The only
> relevant change that made was in the compiler flags (I tested this by
> building the file without that commit but with these new compiler flags:
> 
> -fsanitize=shift -fsanitize=integer-divide-by-zero
> -fsanitize=unreachable -fsanitize=vla-bound -fsanitize=null
> -fsanitize=signed-integer-overflow -fsanitize=bounds
> -fsanitize=object-size -fsanitize=returns-nonnull-attribute
> -fsanitize=bool -fsanitize=enum -fsanitize=alignment
> 
> The preprocessed file is the same in both cases, but with these flags
> the compiler errors.

So for now I have suppressed the error using the following patch (which
I will keep in my fixes tree until some other fix is applied):

From: Stephen Rothwell 
Date: Fri, 22 Jan 2016 13:24:57 +1100
Subject: [PATCH] next: suppress the building of all the sound codecs on PPC
 for now

Signed-off-by: Stephen Rothwell 
---
 sound/soc/codecs/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig
index 50693c867e71..ee5f36b9c787 100644
--- a/sound/soc/codecs/Kconfig
+++ b/sound/soc/codecs/Kconfig
@@ -13,6 +13,7 @@ menu "CODEC drivers"
 config SND_SOC_ALL_CODECS
tristate "Build all ASoC CODEC drivers"
depends on COMPILE_TEST
+   depends on !PPC
select SND_SOC_88PM860X if MFD_88PM860X
select SND_SOC_L3
select SND_SOC_AB8500_CODEC if ABX500_CORE
-- 
2.6.4

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> 
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> > *lock)
> > struct task_struct *owner;
> > int retval = 1;
> >  
> > -   if (need_resched())
> > +   if (need_resched() || atomic_read(&lock->count) == -1)
> > return 0;
> >  
> 
> One concern I have is this change will eliminate any optimistic spinning
> as long as there is a waiter.  Is there a middle ground that we
> can allow only one spinner if there are waiters?  
> 
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock that queue up the spinners (i.e. no other process doing
> optimistic spinning).
> 
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)

Thanx, Paul



Re: [PATCH 0/6] perf core: Read from overwrite ring buffer

2016-01-21 Thread Wangnan (F)



On 2016/1/21 14:51, Wangnan (F) wrote:



On 2016/1/20 10:20, Alexei Starovoitov wrote:

On Wed, Jan 20, 2016 at 09:37:42AM +0800, Wangnan (F) wrote:


On 2016/1/20 1:42, Alexei Starovoitov wrote:

On Tue, Jan 19, 2016 at 11:16:44AM +, Wang Nan wrote:
This patchset introduces two methods to support reading from 
overwrite.


  1) Tailsize: write the size of an event at the end of it
  2) Backward writing: write the ring buffer from the end of it to 
the

 beginning.
what happened with your other idea of moving the whole header to the
end?

That felt better than either of these options.

I'll try it today. However, putting all of the three together is
not as easy as this patchset.

I'm missing something. Why all three in one set?


Can't implement all three in one, but implement two of them make
benchmarking simpler :)

Here comes some numbers.

I attach a target program at the end of this mail. It calls
close(-1) for 300 times, and use gettimeofday to check
how many us it takes.

Following cases are tested:


 BASE: ./a.out
 RAWPERF : ./perf record -o /dev/null -e raw_syscalls:* ./a.out
 WRTBKWRD: ./perf record -o /dev/null -e raw_syscalls:* ./a.out
 TAILSIZE: ./perf record --no-has-write-backward -o /dev/null -e 
raw_syscalls:*/overwrite/ ./a.out
 RAWOVWRT: ./perf record --no-has-write-backward --no-has-tailsize -o 
/dev/null -e raw_syscalls:*/overwrite/ ./a.out


With this script:

func() {
for x in `seq 1 100` ; do $1; done | tee data_$2
}

func ./a.out base
func "./perf record -o /dev/null -e raw_syscalls:* ./a.out" rawperf
func "./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" 
wrtbkwrd
func "./perf record -o /dev/null --no-has-write-backward -e 
raw_syscalls:*/overwrite/ ./a.out" tailsize
func "./perf record -o /dev/null --no-has-write-backward 
--no-has-tailsize -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" 
rawovwrt


Result:

MEAN   STDVAR
BASE:  879870.81  11913.13
RAWPERF : 2603854.7  706658.4
WRTBKWRD: 2313301.220  6727.957
TAILSIZE: 2383051.860  5248.061
RAWOVWRT: 2315273.180  5221.025


Add a number: I tested original perf overwrite ring buffer in pure v4.4
on the same machine:

MEAN  STDVAR
RAWOVWRT(original): 2323970.45  5103.39

So I think backward writing method doesn't add extra overhead into
fastpath.

I will send this patchset again with several bugs fixed. After that
I'll start working on tail-header if it is still required.

Thank you.


