Re: [PATCH v6 0/5] /dev/random - a new approach

2016-08-18 Thread Herbert Xu
On Thu, Aug 18, 2016 at 10:49:47PM -0400, Theodore Ts'o wrote:
>
> That really depends on the system.  We can't assume that people are
> using systems with a 100Hz clock interrupt.  More often than not
> people are using tickless kernels these days.  That's actually the
> problem with changing /dev/urandom to block until things are
> initialized.

Couldn't we disable tickless until urandom has been seeded? In fact
perhaps we should accelerate the timer interrupt rate until it has
been seeded?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH-tip v4 07/10] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2016-08-18 Thread Wanpeng Li
2016-08-19 5:11 GMT+08:00 Waiman Long :
> When the count value is in between 0 and RWSEM_WAITING_BIAS, there
> are 2 possibilities.
> Either a writer is present and there is no waiter

count = 0x0001

>or there are waiters and readers. There is no easy way to

count = 0x000X

However, RWSEM_WAITING_BIAS is equal to 0x, so both these two
cases are beyond RWSEM_WAITING_BIAS, right?

Regards,
Wanpeng Li


Re: [PATCH 4/8] pipe: fix limit checking in pipe_set_size()

2016-08-18 Thread Willy Tarreau
Hi Michael,

Since you're changing this code, it's probably worth swapping the size
check and capable() below to save a function call in the normal path:

On Fri, Aug 19, 2016 at 05:25:35PM +1200, Michael Kerrisk (man-pages) wrote:
> + if (nr_pages > pipe->buffers) {
> + if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {

=>  if (size > pipe_max_size && !capable(CAP_SYS_RESOURCE)) {

> + ret = -EPERM;
> + goto out_revert_acct;
> + } else if ((too_many_pipe_buffers_hard(pipe->user) ||
> + too_many_pipe_buffers_soft(pipe->user)) &&
> + !capable(CAP_SYS_RESOURCE) &&
> + !capable(CAP_SYS_ADMIN)) {
> + ret = -EPERM;
> + goto out_revert_acct;
> + }
> + }
(...)
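
The effect of the suggested reordering can be sketched in user space. This is only an illustration of C short-circuit evaluation, not kernel code: fake_capable() is a hypothetical stand-in for capable() that counts its calls, and the sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's capable(); counts how often it runs. */
static int capable_calls;

static bool fake_capable(void)
{
    capable_calls++;
    return false;               /* model an unprivileged caller */
}

/* Order in the posted patch: the capability check runs on every call. */
static bool denied_capable_first(unsigned long size, unsigned long max_size)
{
    return !fake_capable() && size > max_size;
}

/* Suggested order: the capability check runs only when size is over the limit. */
static bool denied_size_first(unsigned long size, unsigned long max_size)
{
    return size > max_size && !fake_capable();
}

/* Compare how often the capability check runs for a request within the limit. */
static void compare_orderings(void)
{
    capable_calls = 0;
    (void)denied_capable_first(4096, 1048576);
    assert(capable_calls == 1); /* called although the size is acceptable */

    capable_calls = 0;
    (void)denied_size_first(4096, 1048576);
    assert(capable_calls == 0); /* short-circuit skips the call */
}
```

Both orderings return the same result for every input; only the number of calls on the common (within-limit) path differs.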

Cheers,
Willy


Re: [PATCH v3] mm/slab: Improve performance of gathering slabinfo stats

2016-08-18 Thread aruna . ramakrishna

On 08/18/2016 04:52 AM, Michal Hocko wrote:

> I am not opposing the patch (to be honest it is quite neat) but this
> has been bugging me for quite some time. Sorry for hijacking this email
> thread but I couldn't resist. Why are we trying to optimize SLAB and
> slowly converge it to SLUB feature-wise? I always thought that SLAB
> should remain a stable and time-tested solution which works reasonably
> well for many/most workloads, while SLUB is an optimized implementation
> which experiments with slightly different concepts that might boost the
> performance considerably but might also surprise from time to time. If
> this is not the case then why do we have both of them in the kernel? It
> is a lot of code and some features need tweaking in both while only one
> gets testing coverage. So this is mainly a question for maintainers. Why
> do we maintain both and what is the purpose of them?


Michal,

Speaking about this patch specifically - I'm not trying to optimize SLAB 
or make it more similar to SLUB. This patch is a bug fix for an issue 
where the slowness of 'cat /proc/slabinfo' caused timeouts in other 
drivers. While optimizing that flow, it became apparent (as Christoph 
pointed out) that one could converge this patch to SLUB's current 
implementation. Though I have not done that in this patch (because that 
warrants a separate patch), I think it makes sense to converge where 
appropriate, since they both do share some common data structures and 
code already.


Thanks,
Aruna


Re: [PATCH] dmaengine: qcom_hidma: release the descriptor before the callback

2016-08-18 Thread Vinod Koul
On Thu, Aug 18, 2016 at 11:48:52PM -0400, Sinan Kaya wrote:
> On 8/18/2016 11:42 PM, Vinod Koul wrote:
> > On Thu, Aug 18, 2016 at 11:26:28PM -0400, Sinan Kaya wrote:
> >> On 8/18/2016 10:48 PM, Vinod Koul wrote:
>  Keep a size limited list with error cookies and flush them in terminate all?
> >>> I think so, terminate_all anyway cleans up the channel. Btw what is the
> >>> behaviour on error? Do you terminate or something else?
> >>>
> >>
> >> On error, I flush all outstanding transactions with an error code and I reset
> >> the channel. After the reset, the DMA channel is functional again. The client
> >> doesn't need to shut down anything.
> > 
> > You mean from the client context or driver?
> > 
> 
> The client doesn't need to call device_free_chan_resources and device_terminate_all
> to be specific. Client can certainly call these if it needs to but it is not
> required to recover the channel.

You didn't answer my question!

On error you said you flush, so who does that?

> After the reset in error condition, the client can continue issuing new requests
> with tx_submit and device_issue_pending as usual.

-- 
~Vinod


[PATCH 3/8] pipe: refactor argument for account_pipe_buffers()

2016-08-18 Thread Michael Kerrisk (man-pages)
This is a preparatory patch for following work. account_pipe_buffers()
performs accounting in the 'user_struct'. There is no need to pass
a pointer to a 'pipe_inode_info' struct (which is then dereferenced
to obtain a pointer to the 'user' field). Instead, pass a pointer
directly to the 'user_struct'. This change is needed in preparation
for subsequent patches (and the resulting code is a little more logical).

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 4b98fd0..37b7f5e 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -604,10 +604,10 @@ pipe_fasync(int fd, struct file *filp, int on)
return retval;
 }
 
-static void account_pipe_buffers(struct pipe_inode_info *pipe,
+static void account_pipe_buffers(struct user_struct *user,
  unsigned long old, unsigned long new)
 {
-   atomic_long_add(new - old, &pipe->user->pipe_bufs);
+   atomic_long_add(new - old, &user->pipe_bufs);
 }
 
 static bool too_many_pipe_buffers_soft(struct user_struct *user)
@@ -644,7 +644,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
pipe->r_counter = pipe->w_counter = 1;
pipe->buffers = pipe_bufs;
pipe->user = user;
-   account_pipe_buffers(pipe, 0, pipe_bufs);
+   account_pipe_buffers(user, 0, pipe_bufs);
mutex_init(&pipe->mutex);
return pipe;
}
@@ -659,7 +659,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 {
int i;
 
-   account_pipe_buffers(pipe, pipe->buffers, 0);
+   account_pipe_buffers(pipe->user, pipe->buffers, 0);
free_uid(pipe->user);
for (i = 0; i < pipe->buffers; i++) {
struct pipe_buffer *buf = pipe->bufs + i;
@@ -1080,7 +1080,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
memcpy(bufs + head, pipe->bufs, tail * sizeof(struct pipe_buffer));
}
 
-   account_pipe_buffers(pipe, pipe->buffers, nr_pages);
+   account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
pipe->curbuf = 0;
kfree(pipe->bufs);
pipe->bufs = bufs;
-- 
2.5.5
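
For readers without the kernel tree at hand, the shape of this refactor can be sketched in user space. The structures below are simplified, hypothetical stand-ins (C11 atomics instead of the kernel's atomic_long_t, and only the fields the accounting touches); resize_accounting() is an illustrative caller, not a function from fs/pipe.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct user_struct {
    atomic_long pipe_bufs;          /* pages charged to this user */
};

struct pipe_inode_info {
    unsigned long buffers;          /* current pipe capacity, in pages */
    struct user_struct *user;       /* owner charged for the buffers */
};

/* After the refactor: the helper needs only the user being charged. */
static void account_pipe_buffers(struct user_struct *user,
                                 unsigned long old_pages,
                                 unsigned long new_pages)
{
    atomic_fetch_add(&user->pipe_bufs, (long)new_pages - (long)old_pages);
}

/* A caller that holds a pipe passes pipe->user explicitly. */
static void resize_accounting(struct pipe_inode_info *pipe,
                              unsigned long nr_pages)
{
    assert(pipe->user != NULL);
    account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
    pipe->buffers = nr_pages;
}
```

Because the helper no longer needs a pipe, it can also be called at points where only the user is known, which is what the later patches in the series rely on according to the description above.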



Re: [PATCH v10 8/9] arm: dts: mt2701: Add clock controller device nodes

2016-08-18 Thread James Liao
On Thu, 2016-08-18 at 17:18 -0700, Stephen Boyd wrote:
> On 08/16, Erin Lo wrote:
> > From: James Liao 
> > 
> > Add clock controller nodes for MT2701, include topckgen, infracfg,
> > pericfg, apmixedsys, mmsys, imgsys, vdecsys, hifsys, ethsys and
> > bdpsys. This patch also add two oscillators that provide clocks for
> > MT2701.
> > 
> > Signed-off-by: James Liao 
> > Signed-off-by: Erin Lo 
> > ---
> 
> This should go through arm-soc tree, so do you need a stable
> branch in clk tree to pull through arm-soc, or are we going to
> wait a release cycle on the dts patches?

Hi Stephen,

I prefer to wait a release cycle. We may merge clk driver first, then
merge dts patches in next kernel release.


Best regards,

James




[PATCH 4/8] pipe: fix limit checking in pipe_set_size()

2016-08-18 Thread Michael Kerrisk (man-pages)
The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ))
has the following problems:

(1) When increasing the pipe capacity, the checks against the limits
in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
existing consumption, and exclude the memory required for the
increased pipe capacity. The new increase in pipe capacity
can then push the total memory used by the user for pipes
(possibly far) over a limit. This can also trigger the problem
described next.

(2) The limit checks are performed even when the new pipe capacity
is less than the existing pipe capacity. This can lead to
problems if a user sets a large pipe capacity, and then the
limits are lowered, with the result that the user will no
longer be able to decrease the pipe capacity.

(3) As currently implemented, accounting and checking against the limits
is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racy. Multiple processes may pass point (a) simultaneously,
and then allocate pipe buffers that are accounted for only in step (c).
The race means that the user's pipe buffer allocation could be pushed
over the limit (by an arbitrary amount, depending on how unlucky we
were in the race). [Thanks to Vegard Nossum for spotting this point,
which I had missed.]

This patch addresses the above problems as follows:

* Perform checks against the limits only when increasing a pipe's
  capacity; an unprivileged user can always decrease a pipe's capacity.
* Alter the checks against limits to include the memory required for the
  new pipe capacity.
* Re-order the accounting step so that it precedes the buffer
  allocation. If the accounting step determines that a limit has
  been reached, revert the accounting and cause the operation to fail.
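
The three points above can be sketched as a single user-space helper. This is a minimal illustration under stated assumptions, not the patch itself: struct user_acct, charge_pipe_pages() and the "0 means no limit" convention are hypothetical names and choices for the sketch, and C11 atomics stand in for the kernel's atomic_long_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user counter and limit, both in pages. */
struct user_acct {
    atomic_long pipe_bufs;
    long hard_limit;                /* 0 means "no limit" in this sketch */
};

/*
 * Charge the change from old_pages to new_pages before any allocation.
 * Returns false, with the charge reverted, if growing would exceed the limit.
 */
static bool charge_pipe_pages(struct user_acct *u, long old_pages, long new_pages)
{
    long delta = new_pages - old_pages;
    long total;

    assert(old_pages >= 0 && new_pages >= 0);

    /* Account first, so concurrent callers see each other's charges. */
    total = atomic_fetch_add(&u->pipe_bufs, delta) + delta;

    /* Only an increase is checked; shrinking always succeeds. */
    if (delta > 0 && u->hard_limit && total > u->hard_limit) {
        atomic_fetch_sub(&u->pipe_bufs, delta);     /* revert the accounting */
        return false;
    }
    return true;
}
```

The check uses the total that already includes the new charge, so two racing callers cannot both slip under the limit, and a caller that is refused leaves the counter as it found it.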

The program below can be used to demonstrate problems 1 and 2, and the
effect of the fix. The program takes one or more command-line
arguments. The first argument specifies the number of pipes that the
program should create. The remaining arguments are, alternately, pipe
capacities that should be set using fcntl(F_SETPIPE_SZ), and sleep
intervals (in seconds) between the fcntl() operations. (The sleep
intervals allow the possibility to change the limits between fcntl()
operations.)

Problem 1
=========

Using the test program on an unpatched kernel, we first set some limits:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB

Then show that we can set a pipe with capacity (100MB) that is
over the hard limit:

# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
F_SETPIPE_SZ returned 134217728
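
The returned value is larger than the 100MB requested because the kernel rounds the capacity up to a power-of-two number of pages. The arithmetic can be sketched as follows; this is an illustration only, assuming that "100MB" means 100000000 bytes and using the 4096-byte page size that getconf PAGESIZE reports later in this message. round_pipe_size_sketch() is a hypothetical name, not the kernel function.

```c
#include <assert.h>

/* Page size assumed for the sketch (the value reported by getconf PAGESIZE). */
#define SKETCH_PAGE_SIZE 4096UL

/* Round a byte count up to a power-of-two number of pages; result in bytes. */
static unsigned long round_pipe_size_sketch(unsigned long size)
{
    unsigned long pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned long pow2 = 1;

    assert(size > 0);

    /* Smallest power of two that is >= the number of pages needed. */
    while (pow2 < pages)
        pow2 <<= 1;

    return pow2 * SKETCH_PAGE_SIZE;
}
```

With these assumptions, 100000000 bytes needs 24415 pages, which rounds up to 32768 pages, i.e. 134217728 bytes. That is well above the 40.96 MB hard limit, which is the point of the demonstration.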

Now set the capacity to 100MB twice. The second call fails (which is
probably surprising to most users, since it seems like a no-op):

# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
F_SETPIPE_SZ returned 134217728
Loop 2: set pipe capacity to 100000000 bytes
Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

With a patched kernel, setting a capacity over the limit fails at the
first attempt:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard
# sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
Initial pipe capacity: 65536
Loop 1: set pipe capacity to 100000000 bytes
Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

There is a small chance that the change to fix this problem could
break user-space, since there are cases where fcntl(F_SETPIPE_SZ)
calls that previously succeeded might fail. However, the chances
are small, since (a) the pipe-user-pages-{soft,hard} limits are
new (in 4.5), and (b) the default soft/hard limits are high/unlimited.
Therefore, it seems warranted to make these limits operate more
precisely (and behave more like what users probably expect).

Problem 2
=========

Running the test program on an unpatched kernel, we first set some limits:

# getconf PAGESIZE
4096
# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB

Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
first setting a pipe capacity (10MB), sleeping for a few seconds,
during which time the hard limit is lowered, and then set pipe
capacity to a smaller amount (5MB):

# sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
[1] 748
# Initial pipe capacity: 65536
Loop 1: set pipe capacity 

[PATCH v5 12/12] [media] vivid: Add support for HSV encoding

2016-08-18 Thread Ricardo Ribalda Delgado
Support HSV encoding. Most of the logic is replicated from ycbcr_enc.

Signed-off-by: Ricardo Ribalda Delgado 
---
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c   | 25 +
 drivers/media/platform/vivid/vivid-core.h   |  1 +
 drivers/media/platform/vivid/vivid-ctrls.c  | 25 +
 drivers/media/platform/vivid/vivid-vid-cap.c| 17 +++--
 drivers/media/platform/vivid/vivid-vid-common.c |  2 ++
 drivers/media/platform/vivid/vivid-vid-out.c|  1 +
 include/media/v4l2-tpg.h| 15 +++
 7 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index ed37ae307cac..28d7b072d867 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -504,6 +504,7 @@ static void color_to_hsv(struct tpg_data *tpg, int r, int g, int b,
int max_rgb, min_rgb, diff_rgb;
int aux;
int third;
+   int third_size;
 
r >>= 4;
g >>= 4;
@@ -530,30 +531,36 @@ static void color_to_hsv(struct tpg_data *tpg, int r, int g, int b,
return;
}
 
+   third_size = (tpg->real_hsv_enc == V4L2_HSV_ENC_180) ? 60 : 85;
+
/* Hue */
if (max_rgb == r) {
aux =  g - b;
third = 0;
} else if (max_rgb == g) {
aux =  b - r;
-   third = 60;
+   third = third_size;
} else {
aux =  r - g;
-   third = 120;
+   third = third_size * 2;
}
 
-   aux *= 30;
+   aux *= third_size / 2;
aux += diff_rgb / 2;
aux /= diff_rgb;
aux += third;
 
/* Clamp Hue */
-   if (aux < 0)
-   aux += 180;
-   else if (aux > 180)
-   aux -= 180;
-   *h = aux;
+   if (tpg->real_hsv_enc == V4L2_HSV_ENC_180) {
+   if (aux < 0)
+   aux += 180;
+   else if (aux > 180)
+   aux -= 180;
+   } else {
+   aux = aux & 0xff;
+   }
 
+   *h = aux;
 }
 
 static void rgb2ycbcr(const int m[3][3], int r, int g, int b,
@@ -1928,6 +1935,7 @@ static void tpg_recalc(struct tpg_data *tpg)
tpg->recalc_lines = true;
tpg->real_xfer_func = tpg->xfer_func;
tpg->real_ycbcr_enc = tpg->ycbcr_enc;
+   tpg->real_hsv_enc = tpg->hsv_enc;
tpg->real_quantization = tpg->quantization;
 
if (tpg->xfer_func == V4L2_XFER_FUNC_DEFAULT)
@@ -2018,6 +2026,7 @@ void tpg_log_status(struct tpg_data *tpg)
pr_info("tpg colorspace: %d\n", tpg->colorspace);
pr_info("tpg transfer function: %d/%d\n", tpg->xfer_func, tpg->real_xfer_func);
pr_info("tpg Y'CbCr encoding: %d/%d\n", tpg->ycbcr_enc, tpg->real_ycbcr_enc);
+   pr_info("tpg HSV encoding: %d/%d\n", tpg->hsv_enc, tpg->real_hsv_enc);
pr_info("tpg quantization: %d/%d\n", tpg->quantization, tpg->real_quantization);
pr_info("tpg RGB range: %d/%d\n", tpg->rgb_range, tpg->real_rgb_range);
 }
diff --git a/drivers/media/platform/vivid/vivid-core.h b/drivers/media/platform/vivid/vivid-core.h
index b59b49456d45..5cdf95bdc4d1 100644
--- a/drivers/media/platform/vivid/vivid-core.h
+++ b/drivers/media/platform/vivid/vivid-core.h
@@ -346,6 +346,7 @@ struct vivid_dev {
struct v4l2_dv_timings  dv_timings_out;
u32 colorspace_out;
u32 ycbcr_enc_out;
+   u32 hsv_enc_out;
u32 quantization_out;
u32 xfer_func_out;
u32 service_set_out;
diff --git a/drivers/media/platform/vivid/vivid-ctrls.c b/drivers/media/platform/vivid/vivid-ctrls.c
index aceb38d9f7e7..34731f71cc00 100644
--- a/drivers/media/platform/vivid/vivid-ctrls.c
+++ b/drivers/media/platform/vivid/vivid-ctrls.c
@@ -79,6 +79,7 @@
 #define VIVID_CID_MAX_EDID_BLOCKS  (VIVID_CID_VIVID_BASE + 40)
 #define VIVID_CID_PERCENTAGE_FILL  (VIVID_CID_VIVID_BASE + 41)
 #define VIVID_CID_REDUCED_FPS  (VIVID_CID_VIVID_BASE + 42)
+#define VIVID_CID_HSV_ENC  (VIVID_CID_VIVID_BASE + 43)
 
 #define VIVID_CID_STD_SIGNAL_MODE  (VIVID_CID_VIVID_BASE + 60)
 #define VIVID_CID_STANDARD (VIVID_CID_VIVID_BASE + 61)
@@ -378,6 +379,14 @@ static int vivid_vid_cap_s_ctrl(struct v4l2_ctrl *ctrl)
vivid_send_source_change(dev, HDMI);
vivid_send_source_change(dev, WEBCAM);
break;
+   case VIVID_CID_HSV_ENC:
+   tpg_s_hsv_enc(&dev->tpg, ctrl->val ? V4L2_HSV_ENC_256 :
+V4L2_HSV_ENC_180);
+   vivid_send_source_change(dev, TV);
+   
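For readers following the hue arithmetic in the color_to_hsv() hunk above,
here is a minimal userspace sketch of the same integer calculation. It is
not the driver code: hue_from_rgb() and its enc_256 flag are names invented
for this illustration, the inputs are assumed to be already reduced to
8 bits per channel, and only the hue channel is computed.

```c
/*
 * Hypothetical helper (not part of the patch): integer hue of an
 * 8-bit RGB triple.
 * enc_256 == 0 models V4L2_HSV_ENC_180 (each third of the circle is 60),
 * enc_256 != 0 models V4L2_HSV_ENC_256 (each third of the circle is 85).
 */
static int hue_from_rgb(int r, int g, int b, int enc_256)
{
	int third_size = enc_256 ? 85 : 60;
	int max_rgb = r > g ? (r > b ? r : b) : (g > b ? g : b);
	int min_rgb = r < g ? (r < b ? r : b) : (g < b ? g : b);
	int diff_rgb = max_rgb - min_rgb;
	int aux, third;

	/* Grey pixels have no defined hue; report 0. */
	if (!diff_rgb)
		return 0;

	/* Pick the third of the colour circle from the dominant channel. */
	if (max_rgb == r) {
		aux = g - b;
		third = 0;
	} else if (max_rgb == g) {
		aux = b - r;
		third = third_size;
	} else {
		aux = r - g;
		third = third_size * 2;
	}

	/* Scale the offset into half a third, rounding via diff_rgb / 2. */
	aux *= third_size / 2;
	aux += diff_rgb / 2;
	aux /= diff_rgb;
	aux += third;

	/* Wrap around: explicit wrap for 180, byte mask for 256. */
	if (!enc_256) {
		if (aux < 0)
			aux += 180;
		else if (aux > 180)
			aux -= 180;
	} else {
		aux &= 0xff;
	}

	return aux;
}
```

With this sketch, pure green maps to 60 in the 180 encoding and to 85 in
the 256 encoding, which is the effect of replacing the constants 60 and
120 with third_size and third_size * 2.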

Re: [PATCH v4 0/3] perf annotate: Enable cross arch annotate

2016-08-18 Thread Ravi Bangoria
I've sent v5 series for this. Please review it.

Thanks,
Ravi

On Wednesday 13 July 2016 03:15 PM, Ravi Bangoria wrote:
> Arnaldo, Michael,
>
> I've tested this patchset on ppc64 BE and LE both. Please review this.
>
> -Ravi
>
> On Friday 08 July 2016 10:10 AM, Ravi Bangoria wrote:
>> Perf can currently only support code navigation (branches and calls) in
>> annotate when run on the same architecture where perf.data was recorded.
>> But cross arch annotate is not supported.
>>
>> This patchset enables cross arch annotate. Currently I've used x86
>> and arm instructions which are already available and adding support
>> for powerpc as well. Adding support for other arch will be easy.
>>
>> I've created this patch on top of acme/perf/core. And tested it with
>> x86 and powerpc only.
>>
>> Note for arm:
>> A few instructions were defined under #if __arm__, which I've used as the
>> table for arm. But I'm not sure whether the instructions defined outside of
>> that block also include arm instructions. Apart from that, 'call__parse()'
>> and 'move__parse()' contain #ifdef __arm__ directives. I've changed them
>> to if (!strcmp(norm_arch, arm)). I don't have an arm machine to test
>> these changes.
>>
>> Example:
>>
>>Record on powerpc:
>>$ ./perf record -a
>>
>>Report -> Annotate on x86:
>>$ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
>>
>> Changes in v4:
>>- powerpc: Added support for branch instructions that includes 'ctr'
>>- __maybe_unused was misplaced at few location. Corrected it.
>>- Moved position of v3 last patch that define macro for each arch name
>>
>> v3 link: https://lkml.org/lkml/2016/6/30/99
>>
>> Naveen N. Rao (1):
>>perf annotate: add powerpc support
>>
>> Ravi Bangoria (2):
>>perf: Define macro for normalized arch names
>>perf annotate: Enable cross arch annotate
>>
>>   tools/perf/arch/common.c   |  36 ++---
>>   tools/perf/arch/common.h   |  11 ++
>>   tools/perf/builtin-top.c   |   2 +-
>>   tools/perf/ui/browsers/annotate.c  |   3 +-
>>   tools/perf/ui/gtk/annotate.c   |   2 +-
>>   tools/perf/util/annotate.c | 273 ++---
>>   tools/perf/util/annotate.h |   6 +-
>>   tools/perf/util/unwind-libunwind.c |   4 +-
>>   8 files changed, 265 insertions(+), 72 deletions(-)
>>
>> -- 
>> 2.5.5
>>
>
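The "Note for arm" above describes replacing a compile-time #ifdef __arm__
with a runtime comparison against the normalized arch name. A rough sketch
of that idea follows. It is not the perf implementation: struct
arch_ins_table, ins_table_for_arch() and is_call_ins() are hypothetical
names, and the instruction lists are tiny illustrative samples rather than
the tables used by perf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-arch table: call mnemonics, NULL-terminated. */
struct arch_ins_table {
	const char *norm_arch;
	const char *const *call_ins;
};

/* Illustrative samples only; real tables would be much larger. */
static const char *const x86_calls[] = { "call", "callq", NULL };
static const char *const arm_calls[] = { "bl", "blx", NULL };
static const char *const powerpc_calls[] = { "bl", "bctrl", NULL };

static const struct arch_ins_table ins_tables[] = {
	{ "x86", x86_calls },
	{ "arm", arm_calls },
	{ "powerpc", powerpc_calls },
};

/* Select the table at runtime from the arch the data was recorded on. */
static const struct arch_ins_table *ins_table_for_arch(const char *norm_arch)
{
	size_t i;

	for (i = 0; i < sizeof(ins_tables) / sizeof(ins_tables[0]); i++)
		if (!strcmp(norm_arch, ins_tables[i].norm_arch))
			return &ins_tables[i];
	return NULL;
}

/* Returns 1 if name is a call instruction on norm_arch, 0 otherwise. */
static int is_call_ins(const char *norm_arch, const char *name)
{
	const struct arch_ins_table *t = ins_table_for_arch(norm_arch);
	const char *const *p;

	if (!t)
		return 0;
	for (p = t->call_ins; *p; p++)
		if (!strcmp(name, *p))
			return 1;
	return 0;
}
```

Because the table is chosen from a string rather than from the host's
preprocessor macros, a perf.data recorded on powerpc can be interpreted on
an x86 host, which is the point of the series.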



Re: [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern

2016-08-18 Thread Sebastian Andrzej Siewior
On 2016-08-18 08:05:42 [-0500], Josh Poimboeuf wrote:
> There is no init_rsp variable.  Remove its extern.

You could add that it was removed in 9cf4f298e29a ("x86: use stack_start
in x86_64") (merged in v2.6.27-rc1).

> Signed-off-by: Josh Poimboeuf 

Sebastian


[PATCH 2/8] pipe: move limit checking logic into pipe_set_size()

2016-08-18 Thread Michael Kerrisk (man-pages)
This is a preparatory patch for following work. Move the F_SETPIPE_SZ
limit-checking logic from pipe_fcntl() into pipe_set_size().
This simplifies the code a little, and allows for reworking
required in later patches.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 41 ++---
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7d7c21e..4b98fd0 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1026,9 +1026,24 @@ static inline unsigned int round_pipe_size(unsigned int size)
  * Allocate a new array of pipe buffers and copy the info over. Returns the
  * pipe size if successful, or return -ERROR on error.
  */
-static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages)
+static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 {
struct pipe_buffer *bufs;
+   unsigned int size, nr_pages;
+
+   size = round_pipe_size(arg);
+   nr_pages = size >> PAGE_SHIFT;
+
+   if (!nr_pages)
+   return -EINVAL;
+
+   if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size)
+   return -EPERM;
+
+   if ((too_many_pipe_buffers_hard(pipe->user) ||
+   too_many_pipe_buffers_soft(pipe->user)) &&
+   !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+   return -EPERM;
 
/*
 * We can shrink the pipe, if arg >= pipe->nrbufs. Since we don't
@@ -1112,28 +1127,9 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
__pipe_lock(pipe);
 
switch (cmd) {
-   case F_SETPIPE_SZ: {
-   unsigned int size, nr_pages;
-
-   size = round_pipe_size(arg);
-   nr_pages = size >> PAGE_SHIFT;
-
-   ret = -EINVAL;
-   if (!nr_pages)
-   goto out;
-
-   if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
-   ret = -EPERM;
-   goto out;
-   } else if ((too_many_pipe_buffers_hard(pipe->user) ||
-   too_many_pipe_buffers_soft(pipe->user)) &&
-  !capable(CAP_SYS_RESOURCE) && 
!capable(CAP_SYS_ADMIN)) {
-   ret = -EPERM;
-   goto out;
-   }
-   ret = pipe_set_size(pipe, nr_pages);
+   case F_SETPIPE_SZ:
+   ret = pipe_set_size(pipe, arg);
break;
-   }
case F_GETPIPE_SZ:
ret = pipe->buffers * PAGE_SIZE;
break;
@@ -1142,7 +1138,6 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
break;
}
 
-out:
__pipe_unlock(pipe);
return ret;
 }
-- 
2.5.5
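
The order of the checks that this patch moves into pipe_set_size() can be
modelled in plain userspace C. The sketch below is only a model under
stated assumptions: struct model_caller, model_round_pipe_size() and
model_pipe_set_size() are hypothetical names, the page size is fixed at
4096 bytes, and the rounding helper is a stand-in for round_pipe_size(),
whose body is not shown in the patch (it is assumed here to round up to a
power-of-two number of pages).

```c
#include <errno.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4096-byte pages */

/* Hypothetical stand-in for the capability and per-user limit state. */
struct model_caller {
	int cap_sys_resource;
	int cap_sys_admin;
	int over_user_limit;	/* soft or hard per-user limit exceeded */
};

/* Assumed rounding: up to a power-of-two number of pages; 0 stays 0. */
static unsigned int model_round_pipe_size(unsigned long size)
{
	unsigned long nr_pages =
		(size + (1UL << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
	unsigned long rounded = 1;

	if (!nr_pages)
		return 0;
	while (rounded < nr_pages)
		rounded <<= 1;
	return (unsigned int)(rounded << MODEL_PAGE_SHIFT);
}

/* Mirrors the check order of the patched function, nothing more. */
static long model_pipe_set_size(const struct model_caller *c,
				unsigned long arg, unsigned int pipe_max_size)
{
	unsigned int size = model_round_pipe_size(arg);
	unsigned int nr_pages = size >> MODEL_PAGE_SHIFT;

	if (!nr_pages)
		return -EINVAL;

	if (!c->cap_sys_resource && size > pipe_max_size)
		return -EPERM;

	if (c->over_user_limit &&
	    !c->cap_sys_resource && !c->cap_sys_admin)
		return -EPERM;

	/* The real function would go on to reallocate the buffer array. */
	return (long)nr_pages << MODEL_PAGE_SHIFT;
}
```

Since every failure now returns directly from the callee, the caller no
longer needs the "out:" label, which matches the last hunk of the patch.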



[PATCH v11 6/9] reset: mediatek: Add MT2701 reset controller dt-binding file

2016-08-18 Thread Erin Lo
From: Shunli Wang 

The dt-binding file for the reset controller provides the reset index
definitions, which are referenced by the dts file and by the
IC-specific reset controller driver code.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Acked-by: Philipp Zabel 
---
 include/dt-bindings/reset/mt2701-resets.h | 83 +++
 1 file changed, 83 insertions(+)
 create mode 100644 include/dt-bindings/reset/mt2701-resets.h

diff --git a/include/dt-bindings/reset/mt2701-resets.h b/include/dt-bindings/reset/mt2701-resets.h
new file mode 100644
index 000..aaf0305
--- /dev/null
+++ b/include/dt-bindings/reset/mt2701-resets.h
@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2015 MediaTek, Shunli Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DT_BINDINGS_RESET_CONTROLLER_MT2701
+#define _DT_BINDINGS_RESET_CONTROLLER_MT2701
+
+/* INFRACFG resets */
+#define MT2701_INFRA_EMI_REG_RST        0
+#define MT2701_INFRA_DRAMC0_A0_RST      1
+#define MT2701_INFRA_FHCTL_RST          2
+#define MT2701_INFRA_APCIRQ_EINT_RST    3
+#define MT2701_INFRA_APXGPT_RST         4
+#define MT2701_INFRA_SCPSYS_RST         5
+#define MT2701_INFRA_KP_RST             6
+#define MT2701_INFRA_PMIC_WRAP_RST      7
+#define MT2701_INFRA_MIPI_RST           8
+#define MT2701_INFRA_IRRX_RST           9
+#define MT2701_INFRA_CEC_RST            10
+#define MT2701_INFRA_EMI_RST            32
+#define MT2701_INFRA_DRAMC0_RST         34
+#define MT2701_INFRA_TRNG_RST           37
+#define MT2701_INFRA_SYSIRQ_RST         38
+
+/*  PERICFG resets */
+#define MT2701_PERI_UART0_SW_RST        0
+#define MT2701_PERI_UART1_SW_RST        1
+#define MT2701_PERI_UART2_SW_RST        2
+#define MT2701_PERI_UART3_SW_RST        3
+#define MT2701_PERI_GCPU_SW_RST         5
+#define MT2701_PERI_BTIF_SW_RST         6
+#define MT2701_PERI_PWM_SW_RST          8
+#define MT2701_PERI_AUXADC_SW_RST       10
+#define MT2701_PERI_DMA_SW_RST          11
+#define MT2701_PERI_NFI_SW_RST          14
+#define MT2701_PERI_NLI_SW_RST          15
+#define MT2701_PERI_THERM_SW_RST        16
+#define MT2701_PERI_MSDC2_SW_RST        17
+#define MT2701_PERI_MSDC0_SW_RST        19
+#define MT2701_PERI_MSDC1_SW_RST        20
+#define MT2701_PERI_I2C0_SW_RST         22
+#define MT2701_PERI_I2C1_SW_RST         23
+#define MT2701_PERI_I2C2_SW_RST         24
+#define MT2701_PERI_I2C3_SW_RST         25
+#define MT2701_PERI_USB_SW_RST          28
+#define MT2701_PERI_ETH_SW_RST          29
+#define MT2701_PERI_SPI0_SW_RST         33
+
+/* TOPRGU resets */
+#define MT2701_TOPRGU_INFRA_RST         0
+#define MT2701_TOPRGU_MM_RST            1
+#define MT2701_TOPRGU_MFG_RST           2
+#define MT2701_TOPRGU_ETHDMA_RST        3
+#define MT2701_TOPRGU_VDEC_RST          4
+#define MT2701_TOPRGU_VENC_IMG_RST      5
+#define MT2701_TOPRGU_DDRPHY_RST        6
+#define MT2701_TOPRGU_MD_RST            7
+#define MT2701_TOPRGU_INFRA_AO_RST      8
+#define MT2701_TOPRGU_CONN_RST          9
+#define MT2701_TOPRGU_APMIXED_RST       10
+#define MT2701_TOPRGU_HIFSYS_RST        11
+#define MT2701_TOPRGU_CONN_MCU_RST      12
+#define MT2701_TOPRGU_BDP_DISP_RST      13
+
+/* HIFSYS resets */
+#define MT2701_HIFSYS_UHOST0_RST        3
+#define MT2701_HIFSYS_UHOST1_RST        4
+#define MT2701_HIFSYS_UPHY0_RST         21
+#define MT2701_HIFSYS_UPHY1_RST         22
+#define MT2701_HIFSYS_PCIE0_RST         24
+#define MT2701_HIFSYS_PCIE1_RST         25
+#define MT2701_HIFSYS_PCIE2_RST         26
+
+#endif  /* _DT_BINDINGS_RESET_CONTROLLER_MT2701 */
-- 
1.9.1
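
The header above only lists reset indices. One plausible way a driver could
turn such an index into a register offset and a bit mask is sketched below.
This is an assumption, not something the patch shows: it supposes that the
indices pack 32 reset bits into each 32-bit register, which would explain
values such as MT2701_INFRA_EMI_RST = 32. struct reset_loc and
reset_locate() are hypothetical names; the three defines are copied from
the header so that the sketch is self-contained.

```c
#include <stdint.h>

/* Copied from the header in the patch above. */
#define MT2701_INFRA_CEC_RST	10
#define MT2701_INFRA_EMI_RST	32
#define MT2701_INFRA_SYSIRQ_RST	38

/* Hypothetical result type: where a reset line lives. */
struct reset_loc {
	uint32_t reg_offset;	/* byte offset from the first reset register */
	uint32_t mask;		/* single bit to set or clear */
};

/* Assumption: 32 reset bits per consecutive 32-bit register. */
static struct reset_loc reset_locate(unsigned int id)
{
	struct reset_loc loc;

	loc.reg_offset = (id / 32) * 4;
	loc.mask = UINT32_C(1) << (id % 32);
	return loc;
}
```

Under that assumption, MT2701_INFRA_CEC_RST would be bit 10 of the first
register, and MT2701_INFRA_SYSIRQ_RST would be bit 6 of the second.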



[PATCH v11 2/9] clk: mediatek: Refine the makefile to support multiple clock drivers

2016-08-18 Thread Erin Lo
From: James Liao 

Add a Kconfig to define clock configuration for each SoC, and
modify the Makefile to build only the drivers selected in the config.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Reviewed-by: Matthias Brugger 
---
 drivers/clk/Kconfig   |  1 +
 drivers/clk/mediatek/Kconfig  | 21 +
 drivers/clk/mediatek/Makefile |  6 +++---
 3 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 drivers/clk/mediatek/Kconfig

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index e2d9bd7..4265471 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -210,6 +210,7 @@ config COMMON_CLK_OXNAS
 
 source "drivers/clk/bcm/Kconfig"
 source "drivers/clk/hisilicon/Kconfig"
+source "drivers/clk/mediatek/Kconfig"
 source "drivers/clk/meson/Kconfig"
 source "drivers/clk/mvebu/Kconfig"
 source "drivers/clk/qcom/Kconfig"
diff --git a/drivers/clk/mediatek/Kconfig b/drivers/clk/mediatek/Kconfig
new file mode 100644
index 000..380c372
--- /dev/null
+++ b/drivers/clk/mediatek/Kconfig
@@ -0,0 +1,21 @@
+#
+# MediaTek SoC drivers
+#
+config COMMON_CLK_MEDIATEK
+   bool
+   ---help---
+ Mediatek SoCs' clock support.
+
+config COMMON_CLK_MT8135
+   bool "Clock driver for Mediatek MT8135"
+   select COMMON_CLK_MEDIATEK
+   default ARCH_MEDIATEK
+   ---help---
+ This driver supports Mediatek MT8135 clocks.
+
+config COMMON_CLK_MT8173
+   bool "Clock driver for Mediatek MT8173"
+   select COMMON_CLK_MEDIATEK
+   default ARCH_MEDIATEK
+   ---help---
+ This driver supports Mediatek MT8173 clocks.
diff --git a/drivers/clk/mediatek/Makefile b/drivers/clk/mediatek/Makefile
index 95fdfac..32e7222 100644
--- a/drivers/clk/mediatek/Makefile
+++ b/drivers/clk/mediatek/Makefile
@@ -1,4 +1,4 @@
-obj-y += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
+obj-$(CONFIG_COMMON_CLK_MEDIATEK) += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
 obj-$(CONFIG_RESET_CONTROLLER) += reset.o
-obj-y += clk-mt8135.o
-obj-y += clk-mt8173.o
+obj-$(CONFIG_COMMON_CLK_MT8135) += clk-mt8135.o
+obj-$(CONFIG_COMMON_CLK_MT8173) += clk-mt8173.o
-- 
1.9.1



[PATCH v11 3/9] dt-bindings: ARM: Mediatek: Document bindings for MT2701

2016-08-18 Thread Erin Lo
From: James Liao 

This patch adds the binding documentation for apmixedsys, bdpsys,
ethsys, hifsys, imgsys, infracfg, mmsys, pericfg, topckgen and
vdecsys for Mediatek MT2701.

Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Acked-by: Rob Herring 
---
 .../bindings/arm/mediatek/mediatek,apmixedsys.txt  |  3 ++-
 .../bindings/arm/mediatek/mediatek,bdpsys.txt  | 22 
 .../bindings/arm/mediatek/mediatek,ethsys.txt  | 22 
 .../bindings/arm/mediatek/mediatek,hifsys.txt  | 24 ++
 .../bindings/arm/mediatek/mediatek,imgsys.txt  |  3 ++-
 .../bindings/arm/mediatek/mediatek,infracfg.txt|  3 ++-
 .../bindings/arm/mediatek/mediatek,mmsys.txt   |  3 ++-
 .../bindings/arm/mediatek/mediatek,pericfg.txt |  3 ++-
 .../bindings/arm/mediatek/mediatek,topckgen.txt|  3 ++-
 .../bindings/arm/mediatek/mediatek,vdecsys.txt |  3 ++-
 10 files changed, 82 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt

diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
index 936166f..cb0054a 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
@@ -5,7 +5,8 @@ The Mediatek apmixedsys controller provides the PLLs to the system.
 
 Required Properties:
 
-- compatible: Should be:
+- compatible: Should be one of:
+   - "mediatek,mt2701-apmixedsys"
- "mediatek,mt8135-apmixedsys"
- "mediatek,mt8173-apmixedsys"
 - #clock-cells: Must be 1
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
new file mode 100644
index 000..4137196
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
@@ -0,0 +1,22 @@
+Mediatek bdpsys controller
+
+
+The Mediatek bdpsys controller provides various clocks to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-bdpsys", "syscon"
+- #clock-cells: Must be 1
+
+The bdpsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+bdpsys: clock-controller@1c00 {
+   compatible = "mediatek,mt2701-bdpsys", "syscon";
+   reg = <0 0x1c00 0 0x1000>;
+   #clock-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
new file mode 100644
index 000..768f3a5
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
@@ -0,0 +1,22 @@
+Mediatek ethsys controller
+
+
+The Mediatek ethsys controller provides various clocks to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-ethsys", "syscon"
+- #clock-cells: Must be 1
+
+The ethsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+ethsys: clock-controller@1b00 {
+   compatible = "mediatek,mt2701-ethsys", "syscon";
+   reg = <0 0x1b00 0 0x1000>;
+   #clock-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
new file mode 100644
index 000..beed7b5
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
@@ -0,0 +1,24 @@
+Mediatek hifsys controller
+
+
+The Mediatek hifsys controller provides various clocks and reset
+outputs to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-hifsys", "syscon"
+- #clock-cells: Must be 1
+
+The hifsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+hifsys: clock-controller@1a00 {
+   compatible = "mediatek,mt2701-hifsys", "syscon";
+   reg = <0 0x1a00 0 0x1000>;
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt
index b1f2ce1..f6a9166 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt

[PATCH v11 3/9] dt-bindings: ARM: Mediatek: Document bindings for MT2701

2016-08-18 Thread Erin Lo
From: James Liao 

This patch adds the binding documentation for apmixedsys, bdpsys,
ethsys, hifsys, imgsys, infracfg, mmsys, pericfg, topckgen and
vdecsys for Mediatek MT2701.

Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Acked-by: Rob Herring 
---
 .../bindings/arm/mediatek/mediatek,apmixedsys.txt  |  3 ++-
 .../bindings/arm/mediatek/mediatek,bdpsys.txt  | 22 
 .../bindings/arm/mediatek/mediatek,ethsys.txt  | 22 
 .../bindings/arm/mediatek/mediatek,hifsys.txt  | 24 ++
 .../bindings/arm/mediatek/mediatek,imgsys.txt  |  3 ++-
 .../bindings/arm/mediatek/mediatek,infracfg.txt|  3 ++-
 .../bindings/arm/mediatek/mediatek,mmsys.txt   |  3 ++-
 .../bindings/arm/mediatek/mediatek,pericfg.txt |  3 ++-
 .../bindings/arm/mediatek/mediatek,topckgen.txt|  3 ++-
 .../bindings/arm/mediatek/mediatek,vdecsys.txt |  3 ++-
 10 files changed, 82 insertions(+), 7 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
 create mode 100644 
Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
 create mode 100644 
Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt

diff --git 
a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt 
b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
index 936166f..cb0054a 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
@@ -5,7 +5,8 @@ The Mediatek apmixedsys controller provides the PLLs to the 
system.
 
 Required Properties:
 
-- compatible: Should be:
+- compatible: Should be one of:
+   - "mediatek,mt2701-apmixedsys"
- "mediatek,mt8135-apmixedsys"
- "mediatek,mt8173-apmixedsys"
 - #clock-cells: Must be 1
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt 
b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
new file mode 100644
index 000..4137196
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
@@ -0,0 +1,22 @@
+Mediatek bdpsys controller
+
+
+The Mediatek bdpsys controller provides various clocks to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-bdpsys", "syscon"
+- #clock-cells: Must be 1
+
+The bdpsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+bdpsys: clock-controller@1c00 {
+   compatible = "mediatek,mt2701-bdpsys", "syscon";
+   reg = <0 0x1c00 0 0x1000>;
+   #clock-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt 
b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
new file mode 100644
index 000..768f3a5
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
@@ -0,0 +1,22 @@
+Mediatek ethsys controller
+
+
+The Mediatek ethsys controller provides various clocks to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-ethsys", "syscon"
+- #clock-cells: Must be 1
+
+The ethsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+ethsys: clock-controller@1b00 {
+   compatible = "mediatek,mt2701-ethsys", "syscon";
+   reg = <0 0x1b00 0 0x1000>;
+   #clock-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
new file mode 100644
index 0000000..beed7b5
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
@@ -0,0 +1,24 @@
+Mediatek hifsys controller
+
+
+The Mediatek hifsys controller provides various clocks and reset
+outputs to the system.
+
+Required Properties:
+
+- compatible: Should be:
+   - "mediatek,mt2701-hifsys", "syscon"
+- #clock-cells: Must be 1
+
+The hifsys controller uses the common clk binding from
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+The available clocks are defined in dt-bindings/clock/mt*-clk.h.
+
+Example:
+
+hifsys: clock-controller@1a000000 {
+   compatible = "mediatek,mt2701-hifsys", "syscon";
+   reg = <0 0x1a000000 0 0x1000>;
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+};
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt
index b1f2ce1..f6a9166 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt
+++ 

[PATCH v11 5/9] clk: mediatek: Add MT2701 clock support

2016-08-18 Thread Erin Lo
From: Shunli Wang 

Add MT2701 clock support, including topckgen, apmixedsys,
infracfg, pericfg and subsystem clocks.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
---
 drivers/clk/mediatek/Kconfig   |   43 ++
 drivers/clk/mediatek/Makefile  |7 +
 drivers/clk/mediatek/clk-gate.c|   52 ++
 drivers/clk/mediatek/clk-gate.h|2 +
 drivers/clk/mediatek/clk-mt2701-bdp.c  |  140 +
 drivers/clk/mediatek/clk-mt2701-eth.c  |   82 +++
 drivers/clk/mediatek/clk-mt2701-hif.c  |   79 +++
 drivers/clk/mediatek/clk-mt2701-img.c  |   82 +++
 drivers/clk/mediatek/clk-mt2701-mm.c   |  125 
 drivers/clk/mediatek/clk-mt2701-vdec.c |   93 +++
 drivers/clk/mediatek/clk-mt2701.c  | 1033 
 drivers/clk/mediatek/clk-mtk.c |   40 ++
 drivers/clk/mediatek/clk-mtk.h |   41 +-
 drivers/clk/mediatek/clk-pll.c |1 +
 14 files changed, 1815 insertions(+), 5 deletions(-)
 create mode 100644 drivers/clk/mediatek/clk-mt2701-bdp.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-eth.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-hif.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-img.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-mm.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-vdec.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701.c

diff --git a/drivers/clk/mediatek/Kconfig b/drivers/clk/mediatek/Kconfig
index 380c372..7202db5 100644
--- a/drivers/clk/mediatek/Kconfig
+++ b/drivers/clk/mediatek/Kconfig
@@ -6,6 +6,49 @@ config COMMON_CLK_MEDIATEK
    ---help---
      Mediatek SoCs' clock support.
 
+config COMMON_CLK_MT2701
+   bool "Clock driver for Mediatek MT2701"
+   select COMMON_CLK_MEDIATEK
+   default ARCH_MEDIATEK
+   ---help---
+ This driver supports Mediatek MT2701 basic clocks.
+
+config COMMON_CLK_MT2701_MMSYS
+   bool "Clock driver for Mediatek MT2701 mmsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 mmsys clocks.
+
+config COMMON_CLK_MT2701_IMGSYS
+   bool "Clock driver for Mediatek MT2701 imgsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 imgsys clocks.
+
+config COMMON_CLK_MT2701_VDECSYS
+   bool "Clock driver for Mediatek MT2701 vdecsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 vdecsys clocks.
+
+config COMMON_CLK_MT2701_HIFSYS
+   bool "Clock driver for Mediatek MT2701 hifsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 hifsys clocks.
+
+config COMMON_CLK_MT2701_ETHSYS
+   bool "Clock driver for Mediatek MT2701 ethsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 ethsys clocks.
+
+config COMMON_CLK_MT2701_BDPSYS
+   bool "Clock driver for Mediatek MT2701 bdpsys"
+   select COMMON_CLK_MT2701
+   ---help---
+ This driver supports Mediatek MT2701 bdpsys clocks.
+
 config COMMON_CLK_MT8135
bool "Clock driver for Mediatek MT8135"
select COMMON_CLK_MEDIATEK
diff --git a/drivers/clk/mediatek/Makefile b/drivers/clk/mediatek/Makefile
index 32e7222..19ae7ef 100644
--- a/drivers/clk/mediatek/Makefile
+++ b/drivers/clk/mediatek/Makefile
@@ -1,4 +1,11 @@
 obj-$(CONFIG_COMMON_CLK_MEDIATEK) += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
 obj-$(CONFIG_RESET_CONTROLLER) += reset.o
+obj-$(CONFIG_COMMON_CLK_MT2701) += clk-mt2701.o
+obj-$(CONFIG_COMMON_CLK_MT2701_BDPSYS) += clk-mt2701-bdp.o
+obj-$(CONFIG_COMMON_CLK_MT2701_ETHSYS) += clk-mt2701-eth.o
+obj-$(CONFIG_COMMON_CLK_MT2701_HIFSYS) += clk-mt2701-hif.o
+obj-$(CONFIG_COMMON_CLK_MT2701_IMGSYS) += clk-mt2701-img.o
+obj-$(CONFIG_COMMON_CLK_MT2701_MMSYS) += clk-mt2701-mm.o
+obj-$(CONFIG_COMMON_CLK_MT2701_VDECSYS) += clk-mt2701-vdec.o
 obj-$(CONFIG_COMMON_CLK_MT8135) += clk-mt8135.o
 obj-$(CONFIG_COMMON_CLK_MT8173) += clk-mt8173.o
diff --git a/drivers/clk/mediatek/clk-gate.c b/drivers/clk/mediatek/clk-gate.c
index d8787bf..934bf0e 100644
--- a/drivers/clk/mediatek/clk-gate.c
+++ b/drivers/clk/mediatek/clk-gate.c
@@ -61,6 +61,22 @@ static void mtk_cg_clr_bit(struct clk_hw *hw)
regmap_write(cg->regmap, cg->clr_ofs, BIT(cg->bit));
 }
 
+static void mtk_cg_set_bit_no_setclr(struct clk_hw *hw)
+{
+   struct mtk_clk_gate *cg = to_mtk_clk_gate(hw);
+   u32 cgbit = BIT(cg->bit);
+
+   regmap_update_bits(cg->regmap, cg->sta_ofs, cgbit, cgbit);
+}
+
+static void mtk_cg_clr_bit_no_setclr(struct clk_hw *hw)
+{
+   struct mtk_clk_gate *cg = to_mtk_clk_gate(hw);
+   u32 cgbit = BIT(cg->bit);
+
+   regmap_update_bits(cg->regmap, cg->sta_ofs, cgbit, 0);
+}
+
 static int mtk_cg_enable(struct clk_hw *hw)
 {
mtk_cg_clr_bit(hw);
@@ -85,6 +101,30 @@ static void mtk_cg_disable_inv(struct clk_hw *hw)
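
A note on the two no_setclr helpers added to clk-gate.c in this patch:
mtk_cg_clr_bit() writes BIT(cg->bit) to a dedicated clear register, while
the new helpers go through regmap_update_bits(), which does a
read-modify-write of the masked bits in the status register. The sketch
below models that behaviour in plain userspace C so it can be compiled
outside the kernel. It is only an illustration: struct fake_gate_regs,
update_bits() and the gate_*() names are hypothetical and are not part of
the patch or of the regmap API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a gate block that has only a status register. */
struct fake_gate_regs {
	uint32_t sta;	/* one bit per clock gate */
};

/* Models the read-modify-write that regmap_update_bits() performs:
 * only the bits selected by mask are changed, all others are kept. */
static void update_bits(uint32_t *reg, uint32_t mask, uint32_t val)
{
	*reg = (*reg & ~mask) | (val & mask);
}

/* Counterpart of mtk_cg_set_bit_no_setclr(): set one gate bit. */
static void gate_set_bit_no_setclr(struct fake_gate_regs *r, unsigned int bit)
{
	uint32_t cgbit = UINT32_C(1) << bit;

	update_bits(&r->sta, cgbit, cgbit);
}

/* Counterpart of mtk_cg_clr_bit_no_setclr(): clear one gate bit. */
static void gate_clr_bit_no_setclr(struct fake_gate_regs *r, unsigned int bit)
{
	uint32_t cgbit = UINT32_C(1) << bit;

	update_bits(&r->sta, cgbit, 0);
}
```

Because the update is a read-modify-write rather than a single write to a
set or clear register, neighbouring gate bits in the same register are
preserved by masking, not by the hardware.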

[git pull] drm fixes - part two

2016-08-18 Thread Dave Airlie

Hey,

Daniel pointed out I'd missed some i915 fixes, and I also found a single 
etnaviv fix I missed.

So here they are,

Thanks,
Dave.

The following changes since commit 91d62d9f30206be6f7749a0e6f7fa58c6d70c702:

  Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux 
into drm-fixes (2016-08-18 12:51:27 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux tags/drm-fixes-for-4.8-rc3-2

for you to fetch changes up to 2c24ba2116d653b4a1315210e38eefbc9eeb1058:

  Merge tag 'drm-intel-fixes-2016-08-15' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes (2016-08-19 08:51:13 
+1000)


Chris Wilson (5):
  drm/i915: Flush GT idle status upon reset
  drm/i915: Handle ENOSPC after failing to insert a mappable node
  drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
  drm/i915: Add missing rpm wakelock to GGTT pread
  drm/i915: Acquire audio powerwell for HD-Audio registers

Dave Airlie (2):
  Merge branch 'drm-etnaviv-fixes' of 
git://git.pengutronix.de/git/lst/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2016-08-15' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes

Lucas Stach (1):
  drm/etnaviv: take GPU lock later in the submit process

Maarten Lankhorst (1):
  drm/i915: Fix modeset handling during gpu reset, v5.

Matt Roper (1):
  drm/i915/gen9: Give one extra block per line for SKL plane WM calculations

Matthew Auld (2):
  drm/i915: fix WaInsertDummyPushConstPs
  drm/i915: fix aliasing_ppgtt leak

Ville Syrjälä (4):
  drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
  drm/i915: Program iboost settings for HDMI/DVI on SKL
  drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
  drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2

 drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  10 +-
 drivers/gpu/drm/i915/i915_drv.h |   1 +
 drivers/gpu/drm/i915/i915_gem.c |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c |   1 +
 drivers/gpu/drm/i915/i915_reg.h |   1 +
 drivers/gpu/drm/i915/intel_audio.c  |   6 ++
 drivers/gpu/drm/i915/intel_ddi.c|  91 -
 drivers/gpu/drm/i915/intel_display.c| 170 +---
 drivers/gpu/drm/i915/intel_fbc.c|  20 
 drivers/gpu/drm/i915/intel_pm.c |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   8 +-
 11 files changed, 224 insertions(+), 100 deletions(-)

[PATCH v11 9/9] arm: dts: mt2701: Use real clock for UARTs

2016-08-18 Thread Erin Lo
We used to use a fixed-rate clock for the UARTs. Now that we have clock
support, we can associate the correct clocks with the UARTs and drop the
26MHz fixed-rate UART clock.

Signed-off-by: Erin Lo 
---
 arch/arm/boot/dts/mt2701.dtsi | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi
index c9a8dbf..7eab6f4 100644
--- a/arch/arm/boot/dts/mt2701.dtsi
+++ b/arch/arm/boot/dts/mt2701.dtsi
@@ -73,12 +73,6 @@
#clock-cells = <0>;
};
 
-   uart_clk: dummy26m {
-   compatible = "fixed-clock";
-   clock-frequency = <26000000>;
-   #clock-cells = <0>;
-   };
-
clk26m: oscillator@0 {
compatible = "fixed-clock";
#clock-cells = <0>;
@@ -186,7 +180,8 @@
 "mediatek,mt6577-uart";
reg = <0 0x11002000 0 0x400>;
interrupts = ;
-   clocks = <&uart_clk>;
+   clocks = <&pericfg CLK_PERI_UART0_SEL>, <&pericfg CLK_PERI_UART0>;
+   clock-names = "baud", "bus";
status = "disabled";
};
 
@@ -195,7 +190,8 @@
 "mediatek,mt6577-uart";
reg = <0 0x11003000 0 0x400>;
interrupts = ;
-   clocks = <&uart_clk>;
+   clocks = <&pericfg CLK_PERI_UART1_SEL>, <&pericfg CLK_PERI_UART1>;
+   clock-names = "baud", "bus";
status = "disabled";
};
 
@@ -204,7 +200,8 @@
 "mediatek,mt6577-uart";
reg = <0 0x11004000 0 0x400>;
interrupts = ;
-   clocks = <&uart_clk>;
+   clocks = <&pericfg CLK_PERI_UART2_SEL>, <&pericfg CLK_PERI_UART2>;
+   clock-names = "baud", "bus";
status = "disabled";
};
 
@@ -213,7 +210,8 @@
 "mediatek,mt6577-uart";
reg = <0 0x11005000 0 0x400>;
interrupts = ;
-   clocks = <&uart_clk>;
+   clocks = <&pericfg CLK_PERI_UART3_SEL>, <&pericfg CLK_PERI_UART3>;
+   clock-names = "baud", "bus";
status = "disabled";
};
 };
-- 
1.9.1



[PATCH v11 8/9] arm: dts: mt2701: Add clock controller device nodes

2016-08-18 Thread Erin Lo
From: James Liao 

Add clock controller nodes for MT2701, including topckgen, infracfg,
pericfg, apmixedsys, mmsys, imgsys, vdecsys, hifsys, ethsys and
bdpsys. This patch also adds two oscillators that provide clocks for
MT2701.

Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
---
 arch/arm/boot/dts/mt2701.dtsi | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi
index 18596a2..c9a8dbf 100644
--- a/arch/arm/boot/dts/mt2701.dtsi
+++ b/arch/arm/boot/dts/mt2701.dtsi
@@ -12,8 +12,10 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
+#include 
 #include "skeleton64.dtsi"
 #include "mt2701-pinfunc.h"
 
@@ -77,6 +79,20 @@
#clock-cells = <0>;
};
 
+   clk26m: oscillator@0 {
+   compatible = "fixed-clock";
+   #clock-cells = <0>;
+   clock-frequency = <26000000>;
+   clock-output-names = "clk26m";
+   };
+
+   rtc32k: oscillator@1 {
+   compatible = "fixed-clock";
+   #clock-cells = <0>;
+   clock-frequency = <32000>;
+   clock-output-names = "rtc32k";
+   };
+
timer {
compatible = "arm,armv7-timer";
interrupt-parent = <>;
@@ -104,6 +120,26 @@
reg = <0 0x10005000 0 0x1000>;
};
 
+   topckgen: syscon@10000000 {
+   compatible = "mediatek,mt2701-topckgen", "syscon";
+   reg = <0 0x10000000 0 0x1000>;
+   #clock-cells = <1>;
+   };
+
+   infracfg: syscon@10001000 {
+   compatible = "mediatek,mt2701-infracfg", "syscon";
+   reg = <0 0x10001000 0 0x1000>;
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
+
+   pericfg: syscon@10003000 {
+   compatible = "mediatek,mt2701-pericfg", "syscon";
+   reg = <0 0x10003000 0 0x1000>;
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
+
watchdog: watchdog@10007000 {
compatible = "mediatek,mt2701-wdt",
 "mediatek,mt6589-wdt";
@@ -128,6 +164,12 @@
reg = <0 0x10200100 0 0x1c>;
};
 
+   apmixedsys: syscon@10209000 {
+   compatible = "mediatek,mt2701-apmixedsys", "syscon";
+   reg = <0 0x10209000 0 0x1000>;
+   #clock-cells = <1>;
+   };
+
gic: interrupt-controller@10211000 {
compatible = "arm,cortex-a7-gic";
interrupt-controller;
-- 
1.9.1
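
For readers unfamiliar with "#clock-cells = <1>" on the controller nodes
in this patch: under the common clock binding it means a consumer's clock
specifier carries one cell, an index that selects which of the
controller's clocks is meant. The index values come from
dt-bindings/clock/mt2701-clk.h. The following is a rough userspace model
of that lookup, not kernel code. struct fake_onecell_data,
fake_onecell_get() and FAKE_NR_CLK are hypothetical names for
illustration; the two CLK_TOP_* values are the ones defined in the
mt2701-clk.h patch of this series.

```c
#include <stddef.h>

/* Index values as defined in include/dt-bindings/clock/mt2701-clk.h. */
#define CLK_TOP_SYSPLL		1
#define CLK_TOP_SYSPLL_D2	2

/* Hypothetical table size, chosen only for this example. */
#define FAKE_NR_CLK		3

/* Hypothetical model of a provider's table: one entry per clock index. */
struct fake_onecell_data {
	const char *const *clks;	/* clock names stand in for struct clk * */
	unsigned int clk_num;		/* number of entries in clks */
};

/* Resolve the single specifier cell to a table entry.
 * Returns NULL when the index is outside the table. */
static const char *fake_onecell_get(const struct fake_onecell_data *d,
				    unsigned int idx)
{
	if (idx >= d->clk_num)
		return NULL;
	return d->clks[idx];
}
```

The bounds check matters because the index comes straight from the device
tree, so a provider cannot assume it is valid.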



[PATCH v11 1/9] clk: mediatek: remove __init from clk registration functions

2016-08-18 Thread Erin Lo
From: James Liao 

Remove __init from functions that will be used by init functions
that support probe deferral.

Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
---
 drivers/clk/mediatek/clk-gate.c |  2 +-
 drivers/clk/mediatek/clk-mtk.c  | 12 ++--
 drivers/clk/mediatek/clk-pll.c  |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/mediatek/clk-gate.c b/drivers/clk/mediatek/clk-gate.c
index 2a76901..d8787bf 100644
--- a/drivers/clk/mediatek/clk-gate.c
+++ b/drivers/clk/mediatek/clk-gate.c
@@ -97,7 +97,7 @@ const struct clk_ops mtk_clk_gate_ops_setclr_inv = {
.disable= mtk_cg_disable_inv,
 };
 
-struct clk * __init mtk_clk_register_gate(
+struct clk *mtk_clk_register_gate(
const char *name,
const char *parent_name,
struct regmap *regmap,
diff --git a/drivers/clk/mediatek/clk-mtk.c b/drivers/clk/mediatek/clk-mtk.c
index 5ada644..bb30f70 100644
--- a/drivers/clk/mediatek/clk-mtk.c
+++ b/drivers/clk/mediatek/clk-mtk.c
@@ -24,7 +24,7 @@
 #include "clk-mtk.h"
 #include "clk-gate.h"
 
-struct clk_onecell_data * __init mtk_alloc_clk_data(unsigned int clk_num)
+struct clk_onecell_data *mtk_alloc_clk_data(unsigned int clk_num)
 {
int i;
struct clk_onecell_data *clk_data;
@@ -49,7 +49,7 @@ err_out:
return NULL;
 }
 
-void __init mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
+void mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
int num, struct clk_onecell_data *clk_data)
 {
int i;
@@ -72,7 +72,7 @@ void __init mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
}
 }
 
-void __init mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
+void mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
int num, struct clk_onecell_data *clk_data)
 {
int i;
@@ -95,7 +95,7 @@ void __init mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
}
 }
 
-int __init mtk_clk_register_gates(struct device_node *node,
+int mtk_clk_register_gates(struct device_node *node,
const struct mtk_gate *clks,
int num, struct clk_onecell_data *clk_data)
 {
@@ -135,7 +135,7 @@ int __init mtk_clk_register_gates(struct device_node *node,
return 0;
 }
 
-struct clk * __init mtk_clk_register_composite(const struct mtk_composite *mc,
+struct clk *mtk_clk_register_composite(const struct mtk_composite *mc,
void __iomem *base, spinlock_t *lock)
 {
struct clk *clk;
@@ -222,7 +222,7 @@ err_out:
return ERR_PTR(ret);
 }
 
-void __init mtk_clk_register_composites(const struct mtk_composite *mcs,
+void mtk_clk_register_composites(const struct mtk_composite *mcs,
int num, void __iomem *base, spinlock_t *lock,
struct clk_onecell_data *clk_data)
 {
diff --git a/drivers/clk/mediatek/clk-pll.c b/drivers/clk/mediatek/clk-pll.c
index 966cab1..0c2deac 100644
--- a/drivers/clk/mediatek/clk-pll.c
+++ b/drivers/clk/mediatek/clk-pll.c
@@ -313,7 +313,7 @@ static struct clk *mtk_clk_register_pll(const struct mtk_pll_data *data,
return clk;
 }
 
-void __init mtk_clk_register_plls(struct device_node *node,
+void mtk_clk_register_plls(struct device_node *node,
const struct mtk_pll_data *plls, int num_plls, struct clk_onecell_data *clk_data)
 {
void __iomem *base;
-- 
1.9.1



[PATCH v11 4/9] clk: mediatek: Add dt-bindings for MT2701 clocks

2016-08-18 Thread Erin Lo
From: Shunli Wang 

Add MT2701 clock dt-bindings, including topckgen, apmixedsys,
infracfg, pericfg and subsystem clocks.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Reviewed-by: Matthias Brugger 
---
 include/dt-bindings/clock/mt2701-clk.h | 486 +
 1 file changed, 486 insertions(+)
 create mode 100644 include/dt-bindings/clock/mt2701-clk.h

diff --git a/include/dt-bindings/clock/mt2701-clk.h b/include/dt-bindings/clock/mt2701-clk.h
new file mode 100644
index 0000000..2062c67
--- /dev/null
+++ b/include/dt-bindings/clock/mt2701-clk.h
@@ -0,0 +1,486 @@
+/*
+ * Copyright (c) 2014 MediaTek Inc.
+ * Author: Shunli Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DT_BINDINGS_CLK_MT2701_H
+#define _DT_BINDINGS_CLK_MT2701_H
+
+/* TOPCKGEN */
+#define CLK_TOP_SYSPLL          1
+#define CLK_TOP_SYSPLL_D2       2
+#define CLK_TOP_SYSPLL_D3       3
+#define CLK_TOP_SYSPLL_D5       4
+#define CLK_TOP_SYSPLL_D7       5
+#define CLK_TOP_SYSPLL1_D2      6
+#define CLK_TOP_SYSPLL1_D4      7
+#define CLK_TOP_SYSPLL1_D8      8
+#define CLK_TOP_SYSPLL1_D16     9
+#define CLK_TOP_SYSPLL2_D2      10
+#define CLK_TOP_SYSPLL2_D4      11
+#define CLK_TOP_SYSPLL2_D8      12
+#define CLK_TOP_SYSPLL3_D2      13
+#define CLK_TOP_SYSPLL3_D4      14
+#define CLK_TOP_SYSPLL4_D2      15
+#define CLK_TOP_SYSPLL4_D4      16
+#define CLK_TOP_UNIVPLL         17
+#define CLK_TOP_UNIVPLL_D2      18
+#define CLK_TOP_UNIVPLL_D3      19
+#define CLK_TOP_UNIVPLL_D5      20
+#define CLK_TOP_UNIVPLL_D7      21
+#define CLK_TOP_UNIVPLL_D26     22
+#define CLK_TOP_UNIVPLL_D52     23
+#define CLK_TOP_UNIVPLL_D108    24
+#define CLK_TOP_USB_PHY48M      25
+#define CLK_TOP_UNIVPLL1_D2     26
+#define CLK_TOP_UNIVPLL1_D4     27
+#define CLK_TOP_UNIVPLL1_D8     28
+#define CLK_TOP_UNIVPLL2_D2     29
+#define CLK_TOP_UNIVPLL2_D4     30
+#define CLK_TOP_UNIVPLL2_D8     31
+#define CLK_TOP_UNIVPLL2_D16    32
+#define CLK_TOP_UNIVPLL2_D32    33
+#define CLK_TOP_UNIVPLL3_D2     34
+#define CLK_TOP_UNIVPLL3_D4     35
+#define CLK_TOP_UNIVPLL3_D8     36
+#define CLK_TOP_MSDCPLL         37
+#define CLK_TOP_MSDCPLL_D2      38
+#define CLK_TOP_MSDCPLL_D4      39
+#define CLK_TOP_MSDCPLL_D8      40
+#define CLK_TOP_MMPLL           41
+#define CLK_TOP_MMPLL_D2        42
+#define CLK_TOP_DMPLL           43
+#define CLK_TOP_DMPLL_D2        44
+#define CLK_TOP_DMPLL_D4        45
+#define CLK_TOP_DMPLL_X2        46
+#define CLK_TOP_TVDPLL          47
+#define CLK_TOP_TVDPLL_D2       48
+#define CLK_TOP_TVDPLL_D4       49
+#define CLK_TOP_TVD2PLL         50
+#define CLK_TOP_TVD2PLL_D2      51
+#define CLK_TOP_HADDS2PLL_98M   52
+#define CLK_TOP_HADDS2PLL_294M  53
+#define CLK_TOP_HADDS2_FB       54
+#define CLK_TOP_MIPIPLL_D2      55
+#define CLK_TOP_MIPIPLL_D4      56
+#define CLK_TOP_HDMIPLL         57
+#define CLK_TOP_HDMIPLL_D2      58
+#define CLK_TOP_HDMIPLL_D3      59
+#define CLK_TOP_HDMI_SCL_RX     60
+#define CLK_TOP_HDMI_0_PIX340M  61
+#define CLK_TOP_HDMI_0_DEEP340M 62
+#define CLK_TOP_HDMI_0_PLL340M  63
+#define CLK_TOP_AUD1PLL_98M     64
+#define CLK_TOP_AUD2PLL_90M     65
+#define CLK_TOP_AUDPLL          66
+#define CLK_TOP_AUDPLL_D4       67
+#define CLK_TOP_AUDPLL_D8       68
+#define CLK_TOP_AUDPLL_D16      69
+#define 

[PATCH v11 1/9] clk: mediatek: remove __init from clk registration functions

2016-08-18 Thread Erin Lo
From: James Liao 

Remove __init from functions that will be used by init functions
that support probe deferral.

Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
---
 drivers/clk/mediatek/clk-gate.c |  2 +-
 drivers/clk/mediatek/clk-mtk.c  | 12 ++--
 drivers/clk/mediatek/clk-pll.c  |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/mediatek/clk-gate.c b/drivers/clk/mediatek/clk-gate.c
index 2a76901..d8787bf 100644
--- a/drivers/clk/mediatek/clk-gate.c
+++ b/drivers/clk/mediatek/clk-gate.c
@@ -97,7 +97,7 @@ const struct clk_ops mtk_clk_gate_ops_setclr_inv = {
.disable= mtk_cg_disable_inv,
 };
 
-struct clk * __init mtk_clk_register_gate(
+struct clk *mtk_clk_register_gate(
const char *name,
const char *parent_name,
struct regmap *regmap,
diff --git a/drivers/clk/mediatek/clk-mtk.c b/drivers/clk/mediatek/clk-mtk.c
index 5ada644..bb30f70 100644
--- a/drivers/clk/mediatek/clk-mtk.c
+++ b/drivers/clk/mediatek/clk-mtk.c
@@ -24,7 +24,7 @@
 #include "clk-mtk.h"
 #include "clk-gate.h"
 
-struct clk_onecell_data * __init mtk_alloc_clk_data(unsigned int clk_num)
+struct clk_onecell_data *mtk_alloc_clk_data(unsigned int clk_num)
 {
int i;
struct clk_onecell_data *clk_data;
@@ -49,7 +49,7 @@ err_out:
return NULL;
 }
 
-void __init mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
+void mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
int num, struct clk_onecell_data *clk_data)
 {
int i;
@@ -72,7 +72,7 @@ void __init mtk_clk_register_fixed_clks(const struct 
mtk_fixed_clk *clks,
}
 }
 
-void __init mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
+void mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
int num, struct clk_onecell_data *clk_data)
 {
int i;
@@ -95,7 +95,7 @@ void __init mtk_clk_register_factors(const struct 
mtk_fixed_factor *clks,
}
 }
 
-int __init mtk_clk_register_gates(struct device_node *node,
+int mtk_clk_register_gates(struct device_node *node,
const struct mtk_gate *clks,
int num, struct clk_onecell_data *clk_data)
 {
@@ -135,7 +135,7 @@ int __init mtk_clk_register_gates(struct device_node *node,
return 0;
 }
 
-struct clk * __init mtk_clk_register_composite(const struct mtk_composite *mc,
+struct clk *mtk_clk_register_composite(const struct mtk_composite *mc,
void __iomem *base, spinlock_t *lock)
 {
struct clk *clk;
@@ -222,7 +222,7 @@ err_out:
return ERR_PTR(ret);
 }
 
-void __init mtk_clk_register_composites(const struct mtk_composite *mcs,
+void mtk_clk_register_composites(const struct mtk_composite *mcs,
int num, void __iomem *base, spinlock_t *lock,
struct clk_onecell_data *clk_data)
 {
diff --git a/drivers/clk/mediatek/clk-pll.c b/drivers/clk/mediatek/clk-pll.c
index 966cab1..0c2deac 100644
--- a/drivers/clk/mediatek/clk-pll.c
+++ b/drivers/clk/mediatek/clk-pll.c
@@ -313,7 +313,7 @@ static struct clk *mtk_clk_register_pll(const struct mtk_pll_data *data,
return clk;
 }
 
-void __init mtk_clk_register_plls(struct device_node *node,
+void mtk_clk_register_plls(struct device_node *node,
const struct mtk_pll_data *plls, int num_plls, struct clk_onecell_data *clk_data)
 {
void __iomem *base;
-- 
1.9.1



[PATCH v11 4/9] clk: mediatek: Add dt-bindings for MT2701 clocks

2016-08-18 Thread Erin Lo
From: Shunli Wang 

Add MT2701 clock dt-bindings, including topckgen, apmixedsys,
infracfg, pericfg and subsystem clocks.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Reviewed-by: Matthias Brugger 
---
 include/dt-bindings/clock/mt2701-clk.h | 486 +
 1 file changed, 486 insertions(+)
 create mode 100644 include/dt-bindings/clock/mt2701-clk.h

diff --git a/include/dt-bindings/clock/mt2701-clk.h b/include/dt-bindings/clock/mt2701-clk.h
new file mode 100644
index 000..2062c67
--- /dev/null
+++ b/include/dt-bindings/clock/mt2701-clk.h
@@ -0,0 +1,486 @@
+/*
+ * Copyright (c) 2014 MediaTek Inc.
+ * Author: Shunli Wang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DT_BINDINGS_CLK_MT2701_H
+#define _DT_BINDINGS_CLK_MT2701_H
+
+/* TOPCKGEN */
+#define CLK_TOP_SYSPLL          1
+#define CLK_TOP_SYSPLL_D2       2
+#define CLK_TOP_SYSPLL_D3       3
+#define CLK_TOP_SYSPLL_D5       4
+#define CLK_TOP_SYSPLL_D7       5
+#define CLK_TOP_SYSPLL1_D2      6
+#define CLK_TOP_SYSPLL1_D4      7
+#define CLK_TOP_SYSPLL1_D8      8
+#define CLK_TOP_SYSPLL1_D16     9
+#define CLK_TOP_SYSPLL2_D2      10
+#define CLK_TOP_SYSPLL2_D4      11
+#define CLK_TOP_SYSPLL2_D8      12
+#define CLK_TOP_SYSPLL3_D2      13
+#define CLK_TOP_SYSPLL3_D4      14
+#define CLK_TOP_SYSPLL4_D2      15
+#define CLK_TOP_SYSPLL4_D4      16
+#define CLK_TOP_UNIVPLL         17
+#define CLK_TOP_UNIVPLL_D2      18
+#define CLK_TOP_UNIVPLL_D3      19
+#define CLK_TOP_UNIVPLL_D5      20
+#define CLK_TOP_UNIVPLL_D7      21
+#define CLK_TOP_UNIVPLL_D26     22
+#define CLK_TOP_UNIVPLL_D52     23
+#define CLK_TOP_UNIVPLL_D108    24
+#define CLK_TOP_USB_PHY48M      25
+#define CLK_TOP_UNIVPLL1_D2     26
+#define CLK_TOP_UNIVPLL1_D4     27
+#define CLK_TOP_UNIVPLL1_D8     28
+#define CLK_TOP_UNIVPLL2_D2     29
+#define CLK_TOP_UNIVPLL2_D4     30
+#define CLK_TOP_UNIVPLL2_D8     31
+#define CLK_TOP_UNIVPLL2_D16    32
+#define CLK_TOP_UNIVPLL2_D32    33
+#define CLK_TOP_UNIVPLL3_D2     34
+#define CLK_TOP_UNIVPLL3_D4     35
+#define CLK_TOP_UNIVPLL3_D8     36
+#define CLK_TOP_MSDCPLL         37
+#define CLK_TOP_MSDCPLL_D2      38
+#define CLK_TOP_MSDCPLL_D4      39
+#define CLK_TOP_MSDCPLL_D8      40
+#define CLK_TOP_MMPLL           41
+#define CLK_TOP_MMPLL_D2        42
+#define CLK_TOP_DMPLL           43
+#define CLK_TOP_DMPLL_D2        44
+#define CLK_TOP_DMPLL_D4        45
+#define CLK_TOP_DMPLL_X2        46
+#define CLK_TOP_TVDPLL          47
+#define CLK_TOP_TVDPLL_D2       48
+#define CLK_TOP_TVDPLL_D4       49
+#define CLK_TOP_TVD2PLL         50
+#define CLK_TOP_TVD2PLL_D2      51
+#define CLK_TOP_HADDS2PLL_98M   52
+#define CLK_TOP_HADDS2PLL_294M  53
+#define CLK_TOP_HADDS2_FB       54
+#define CLK_TOP_MIPIPLL_D2      55
+#define CLK_TOP_MIPIPLL_D4      56
+#define CLK_TOP_HDMIPLL         57
+#define CLK_TOP_HDMIPLL_D2      58
+#define CLK_TOP_HDMIPLL_D3      59
+#define CLK_TOP_HDMI_SCL_RX     60
+#define CLK_TOP_HDMI_0_PIX340M  61
+#define CLK_TOP_HDMI_0_DEEP340M 62
+#define CLK_TOP_HDMI_0_PLL340M  63
+#define CLK_TOP_AUD1PLL_98M     64
+#define CLK_TOP_AUD2PLL_90M     65
+#define CLK_TOP_AUDPLL          66
+#define CLK_TOP_AUDPLL_D4       67
+#define CLK_TOP_AUDPLL_D8       68
+#define CLK_TOP_AUDPLL_D16      69
+#define CLK_TOP_AUDPLL_D24      70
+#define CLK_TOP_ETHPLL_500M     71
+#define CLK_TOP_VDECPLL         72
+#define CLK_TOP_VENCPLL 

[PATCH v11 0/9] Add clock support for Mediatek MT2701

2016-08-18 Thread Erin Lo
This series is based on v4.8-rc1 and adds clock and reset controller
support for the Mediatek MT2701.

This series also refines the Makefile and Kconfig so that clock support
for multiple SoCs can be configured individually.

changes since v10:
- Remove COMMON_CLK dependency from clk/mediatek/Kconfig.

changes since v9:
- Rebase to v4.8-rc1.
- Drop the patch that fixed the parent clock initial state. It will be
  replaced by a new patch from Mike/Stephen.
- Replace clk.h with clk-provider.h.
- Correct register settings of clocks.

changes since v8:
- Rebase to v4.7-rc1.
- Include mt2701-resets.h in mt2701.dtsi.
- Remove an unused property from apmixedsys DT node.

changes since v7:
- Rebase to clk-next.
- Implement subsystem clocks in separate files.
- Replace critical clock enabling with CLK_IS_CRITICAL flag.
- Reduce most clock registrations in CLK_OF_DECLARE().
- Remove __init and __initconst from most init functions and data,
  and replace driver registration with platform_driver_register().
- Replace some common function or variable names with unique names.
- Use real clock for UARTs.

changes since v6:
- Rebase to v4.6-rc1.
- Register subsystem clocks in probe() instead of CLK_OF_DECLARE().
- Add clocks that are referred to by subsystem clocks.
- Fix clk_data size of apmixedsys.
- Add config options for each subsystem clock provider.

changes since v5:
- Rebase to v4.5-rc1 and [1].
- Enable critical clocks for MT2701
- Refine dt-binding documents, add reset controller support for hifsys.

changes since v4:
- Rebase to v4.5-rc1.
- Remove CLK_SET_RATE_PARENT from divider flags.
- Add img_jpgdec_smi clock.
- Move clk/mediatek/Kconfig into menu section in clk/Kconfig.

changes since v3:
- Change the parent of mm_mdp_bls_26m from clk26m to pwm_sel.

changes since v2:
- Fix ethsys definition.
- Replace read-modify-write with regmap_update_bits() in clock operations.
- Move mt2701-resets.h to include/dt-bindings/reset/.
- Add hifsys reset patch from John Crispin.

changes since v1:
- Document MT2701 compatible strings.

[1] https://patchwork.kernel.org/patch/8147901/

Erin Lo (1):
  arm: dts: mt2701: Use real clock for UARTs

James Liao (4):
  clk: mediatek: remove __init from clk registration functions
  clk: mediatek: Refine the makefile to support multiple clock drivers
  dt-bindings: ARM: Mediatek: Document bindings for MT2701
  arm: dts: mt2701: Add clock controller device nodes

Shunli Wang (4):
  clk: mediatek: Add dt-bindings for MT2701 clocks
  clk: mediatek: Add MT2701 clock support
  reset: mediatek: Add MT2701 reset controller dt-binding file
  reset: mediatek: Add MT2701 reset driver

 .../bindings/arm/mediatek/mediatek,apmixedsys.txt  |3 +-
 .../bindings/arm/mediatek/mediatek,bdpsys.txt  |   22 +
 .../bindings/arm/mediatek/mediatek,ethsys.txt  |   22 +
 .../bindings/arm/mediatek/mediatek,hifsys.txt  |   24 +
 .../bindings/arm/mediatek/mediatek,imgsys.txt  |3 +-
 .../bindings/arm/mediatek/mediatek,infracfg.txt|3 +-
 .../bindings/arm/mediatek/mediatek,mmsys.txt   |3 +-
 .../bindings/arm/mediatek/mediatek,pericfg.txt |3 +-
 .../bindings/arm/mediatek/mediatek,topckgen.txt|3 +-
 .../bindings/arm/mediatek/mediatek,vdecsys.txt |3 +-
 arch/arm/boot/dts/mt2701.dtsi  |   50 +-
 drivers/clk/Kconfig|1 +
 drivers/clk/mediatek/Kconfig   |   64 ++
 drivers/clk/mediatek/Makefile  |   13 +-
 drivers/clk/mediatek/clk-gate.c|   54 +-
 drivers/clk/mediatek/clk-gate.h|2 +
 drivers/clk/mediatek/clk-mt2701-bdp.c  |  140 +++
 drivers/clk/mediatek/clk-mt2701-eth.c  |   82 ++
 drivers/clk/mediatek/clk-mt2701-hif.c  |   81 ++
 drivers/clk/mediatek/clk-mt2701-img.c  |   82 ++
 drivers/clk/mediatek/clk-mt2701-mm.c   |  125 +++
 drivers/clk/mediatek/clk-mt2701-vdec.c |   93 ++
 drivers/clk/mediatek/clk-mt2701.c  | 1037 
 drivers/clk/mediatek/clk-mtk.c |   52 +-
 drivers/clk/mediatek/clk-mtk.h |   41 +-
 drivers/clk/mediatek/clk-pll.c |3 +-
 include/dt-bindings/clock/mt2701-clk.h |  486 +
 include/dt-bindings/reset/mt2701-resets.h  |   83 ++
 28 files changed, 2550 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
 create mode 100644 drivers/clk/mediatek/Kconfig
 create mode 100644 drivers/clk/mediatek/clk-mt2701-bdp.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-eth.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-hif.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-img.c
 create mode 100644 

[PATCH v11 7/9] reset: mediatek: Add MT2701 reset driver

2016-08-18 Thread Erin Lo
From: Shunli Wang 

infrasys and perifsys contain many reset control bits for
various modules. Register these bits as reset controllers
with the kernel's generic reset controller framework.

Signed-off-by: Shunli Wang 
Signed-off-by: James Liao 
Signed-off-by: Erin Lo 
Tested-by: John Crispin 
Acked-by: Philipp Zabel 
---
 drivers/clk/mediatek/clk-mt2701-hif.c | 2 ++
 drivers/clk/mediatek/clk-mt2701.c | 4 
 2 files changed, 6 insertions(+)

diff --git a/drivers/clk/mediatek/clk-mt2701-hif.c b/drivers/clk/mediatek/clk-mt2701-hif.c
index 33ead83..0ca0537 100644
--- a/drivers/clk/mediatek/clk-mt2701-hif.c
+++ b/drivers/clk/mediatek/clk-mt2701-hif.c
@@ -55,6 +55,8 @@ static void mtk_hifsys_init(struct device_node *node)
if (r)
pr_err("%s(): could not register clock provider: %d\n",
__func__, r);
+
+   mtk_register_reset_controller(node, 1, 0x34);
 }
 
 static const struct of_device_id of_match_clk_mt2701_hif[] = {
diff --git a/drivers/clk/mediatek/clk-mt2701.c b/drivers/clk/mediatek/clk-mt2701.c
index f64dc4e..9dab533 100644
--- a/drivers/clk/mediatek/clk-mt2701.c
+++ b/drivers/clk/mediatek/clk-mt2701.c
@@ -791,6 +791,8 @@ static void mtk_infrasys_init(struct device_node *node)
if (r)
pr_err("%s(): could not register clock provider: %d\n",
__func__, r);
+
+   mtk_register_reset_controller(node, 2, 0x30);
 }
 
 static const struct mtk_gate_regs peri0_cg_regs = {
@@ -911,6 +913,8 @@ static void mtk_pericfg_init(struct device_node *node)
if (r)
pr_err("%s(): could not register clock provider: %d\n",
__func__, r);
+
+   mtk_register_reset_controller(node, 2, 0x0);
 }
 
 #define MT8590_PLL_FMAX(2000 * MHZ)
-- 
1.9.1
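
For readers following the mtk_register_reset_controller(node, N, offset)
calls above, here is a minimal user-space sketch of how a reset id could
map to a register and bit. It assumes one reset bit per id, 32 ids per
32-bit register, and banks laid out consecutively from the given offset;
the helper's internals are not shown in this patch, so this layout is an
assumption. The names mtk_reset_loc and mtk_reset_locate are hypothetical
and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical result type: register offset plus bit mask for one reset id. */
struct mtk_reset_loc {
	uint32_t reg;	/* byte offset of the 32-bit reset register */
	uint32_t mask;	/* single-bit mask inside that register */
};

/*
 * Assumed layout (not confirmed by this patch): ids 0..31 live in the
 * register at regofs, ids 32..63 in the next 32-bit register, and so on.
 */
static struct mtk_reset_loc mtk_reset_locate(uint32_t regofs, unsigned int id)
{
	struct mtk_reset_loc loc;

	loc.reg = regofs + (id / 32) * 4;	/* 4 bytes per register bank */
	loc.mask = UINT32_C(1) << (id % 32);	/* bit position within the bank */
	return loc;
}
```

Under that assumption, a call registered with 2 banks at 0x30 would cover
ids 0..63 in the registers at 0x30 and 0x34.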



[PATCH 0/4] zswap: Optimize compressed pool memory utilization

2016-08-18 Thread Srividya Desireddy
On 17 August 2016 at 18:08, Pekka Enberg  wrote:
> On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy
>  wrote:
>> This series of patches optimize the memory utilized by zswap for storing
>> the swapped out pages.
>>
>> Zswap is a cache which compresses the pages that are being swapped out
>> and stores them into a dynamically allocated RAM-based memory pool.
>> Experiments have shown that around 10-15% of pages stored in zswap are
>> duplicates which results in 10-12% more RAM required to store these
>> duplicate compressed pages. Around 10-20% of pages stored in zswap
>> are zero-filled pages, but these pages are handled as normal pages by
>> compressing and allocating memory in the pool.
>>
>> The following patch-set optimizes memory utilized by zswap by avoiding the
>> storage of duplicate pages and zero-filled pages in zswap compressed memory
>> pool.
>>
>> Patch 1/4: zswap: Share zpool memory of duplicate pages
>> This patch shares compressed pool memory of the duplicate pages. When a new
>> page is requested for swap-out to zswap; search for an identical page in
>> the pages already stored in zswap. If an identical page is found then share
>> the compressed page data of the identical page with the new page. This
>> avoids allocation of memory in the compressed pool for a duplicate page.
>> This feature is tested on devices with 1GB, 2GB and 3GB RAM by executing
>> performance test at low memory conditions. Around 15-20% of the pages
>> swapped are duplicate of the pages existing in zswap, resulting in 15%
>> saving of zswap memory pool when compared to the baseline version.
>>
>> Test Parameters         Baseline    With patch  Improvement
>> Total RAM               955MB       955MB
>> Available RAM           254MB       269MB       15MB
>> Avg. App entry time     2.469sec    2.207sec    7%
>> Avg. App close time     1.151sec    1.085sec    6%
>> Apps launched in 1sec   5           12          7
>>
>> There is little overhead in zswap store function due to the search
>> operation for finding duplicate pages. However, if duplicate page is
>> found it saves the compression and allocation time of the page. The average
>> overhead per zswap_frontswap_store() function call in the experimental
>> device is 9us. There is no overhead in case of zswap_frontswap_load()
>> operation.
>>
>> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
>> This patch adds a module parameter to enable or disable the sharing of
>> duplicate zswap pages at runtime.
>>
>> Patch 3/4: zswap: Zero-filled pages handling
>> This patch checks if a page to be stored in zswap is a zero-filled page
>> (i.e. contents of the page are all zeros). If such page is found,
>> compression and allocation of memory for the compressed page is avoided
>> and instead the page is just marked as zero-filled page.
>> Although, compressed size of a zero-filled page using LZO compressor is
>> very less (52 bytes including zswap_header), this patch saves compression
>> and allocation time during store operation and decompression time during
>> zswap load operation for zero-filled pages. Experiments have shown that
>> around 10-20% of pages stored in zswap are zero-filled.
>
> Aren't zero-filled pages already handled by patch 1/4 as their
> contents match? So the overall memory saving is 52 bytes?
>
> - Pekka

Thanks for the quick reply.

Zero-filled pages can also be handled by patch 1/4, which searches for
a duplicate page among the pages already stored in zswap. However, we
have observed that the average time to identify a duplicate zero-filled
page this way (using patch 1/4) is almost three times the cost of
checking every page for being zero-filled.

Also, with patch 1/4, the zswap_frontswap_load() operation has to
decompress the compressed zero-filled page. With patch 3/4,
zswap_frontswap_load() just fills the page with zeros when loading a
zero-filled page, which is faster than decompression.
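
To illustrate the idea, here is a minimal user-space sketch of the
zero-filled check and the corresponding load path. It is not the code
from patch 3/4: the 4096-byte page size and the names
sketch_page_zero_filled() and sketch_load_zero_page() are assumptions
made for this example only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Store side: scan the page one word at a time and stop at the first
 * non-zero word. The buffer must be aligned for unsigned long.
 */
static bool sketch_page_zero_filled(const void *page)
{
	const unsigned long *word = page;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*word); i++) {
		if (word[i])
			return false;
	}
	return true;
}

/* Load side: no decompression, just fill the destination with zeros. */
static void sketch_load_zero_page(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
}
```

The check touches each page once and needs no lookup among stored
pages, which is where the time difference against the duplicate search
comes from.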

- Srividya

Re: linux-next: build warnings after merge of the kbuild tree

2016-08-18 Thread Nicholas Piggin
On Fri, 19 Aug 2016 15:09:14 +1000
Stephen Rothwell  wrote:

> Hi Nick,
> 
> On Fri, 19 Aug 2016 13:38:54 +1000 Stephen Rothwell  
> wrote:
> >
> > On Thu, 18 Aug 2016 11:09:48 +1000 Nicholas Piggin  
> > wrote:  
> > >
> > > On Wed, 17 Aug 2016 14:59:59 +0200
> > > Michal Marek  wrote:
> > > 
> > > > On 2016-08-17 03:44, Stephen Rothwell wrote:  
> > > > > 
> > > > > After merging the kbuild tree, today's linux-next build (powerpc
> > > > > ppc64_defconfig) produced these warnings:
> > > > > 
> > > > > WARNING: 25 bad relocations
> > > > > c0cf2570 R_PPC64_ADDR64__crc___arch_hweight16
> > > > [...]  
> > > > > Introduced by commit
> > > > > 
> > > > >   9445aa1a3062 ("ppc: move exports to definitions")
> > > > > 
> > > > > I have reverted that commit for today.
> > > > > 
> > > > > [cc-ing the ppc guys for clues - also involved is commit
> > > > > 
> > > > >   22823ab419d8 ("EXPORT_SYMBOL() for asm")
> > > > > ]
> > > > 
> > > > FWIW, I see these warnings as well. Any help from ppc developers is
> > > > appreciated - should the R_PPC64_ADDR64 be whitelisted for exported asm
> > > > symbols (their CRCs actually)?  
> > > 
> > > The dangling relocation is a side effect of linker unable to resolve the
> > > reference to the undefined weak symbols. So the real question is, why has
> > > genksyms not overridden these symbols with their CRC values?
> > > 
> > > This may not even be powerpc specific, but  I'll poke at it a bit more
> > > when I get a chance.
> > 
> > Not sure if this is relevant, but with the commit reverted, the
> > __crc___... symbols are absolute.
> > 
> > f55b3b3d A __crc___arch_hweight16  
> 
> Ignore that :-)
> 
> I just had a look at a x86_64 allmodconfig result and it looks like the
> weak symbols are not resolved their either ...
> 
> I may be missing something, but genksyms generates the crc's off the
> preprocessed C source code and we don't have any for the asm files ...

Looks like you're right, good find!

Thanks,
Nick


[PATCH v5 3/7] perf annotate: Add support for powerpc

2016-08-18 Thread Ravi Bangoria
From: "Naveen N. Rao" 

perf can currently disassemble an annotated function, but it has no
parsing logic for powerpc instructions, so none of the navigation
options are available on powerpc.

In addition, powerpc has a long list of branch instructions, and
hardcoding them in a table would be error-prone. So add a function
that finds the instruction instead of using a static table. This
function builds the table dynamically (as a list of 'struct ins'):
rather than creating an object on every lookup, it first checks
whether the list already contains an object for that instruction.

Signed-off-by: Naveen N. Rao 
Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - Removed hacks for instructions like bctr and bctrl from this patch.
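
For review purposes, here is a minimal user-space sketch of the suffix
rules that ins__find_powerpc() applies in the diff below. It is only an
illustration: the enum and the function name ppc_classify() are made up
here and are not part of perf, and the three branch kinds stand in for
the jump, call and return ops chosen in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification result, standing in for the perf ins_ops. */
enum ppc_branch_kind { PPC_NOT_BRANCH, PPC_JUMP, PPC_CALL, PPC_RET };

static enum ppc_branch_kind ppc_classify(const char *name)
{
	/* "branch if not less than" ends in 'l'/'la' but does not update LR. */
	static const char *const bnl_forms[] = {
		"bnl", "bnl+", "bnl-", "bnla", "bnla+", "bnla-",
	};
	enum ppc_branch_kind kind = PPC_JUMP;
	size_t len = strlen(name);
	bool is_bnl = false;
	size_t i, k;

	/* Only mnemonics starting with 'b' matter; a few of those are not branches. */
	if (len == 0 || name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) ||
	    !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4))
		return PPC_NOT_BRANCH;

	/* Ignore an optional '+' or '-' prediction hint at the end. */
	i = len - 1;
	if (i > 0 && (name[i] == '+' || name[i] == '-'))
		i--;

	for (k = 0; k < sizeof(bnl_forms) / sizeof(bnl_forms[0]); k++) {
		if (!strcmp(name, bnl_forms[k]))
			is_bnl = true;
	}

	/* Ending in 'l' or 'la' means the link register is updated: a call. */
	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l')) {
		if (!is_bnl)
			kind = PPC_CALL;
	}

	/* Ending in 'lr' means branch to the link register: a return. */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		kind = PPC_RET;

	return kind;
}
```

Unlike the patch, the sketch does no caching; it only shows why a table
of every branch mnemonic is unnecessary.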

 tools/perf/util/annotate.c | 116 +
 1 file changed, 116 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index deb9af0..0b64841 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -459,6 +459,11 @@ static struct ins instructions_arm[] = {
{ .name = "bne",   .ops  = &jump_ops, },
 };
 
+struct instructions_powerpc {
+   struct ins *ins;
+   struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
const struct ins *ins = insp;
@@ -474,6 +479,115 @@ static int ins__cmp(const void *a, const void *b)
return strcmp(ia->name, ib->name);
 }
 
+static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head,
+const char *name, struct ins_ops *ops)
+{
+   struct instructions_powerpc *ins_powerpc;
+   struct ins *ins;
+
+   ins = zalloc(sizeof(struct ins));
+   if (!ins)
+   return NULL;
+
+   ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+   if (!ins_powerpc)
+   goto out_free_ins;
+
+   ins->name = strdup(name);
+   if (!ins->name)
+   goto out_free_ins_power;
+
+   ins->ops = ops;
+   ins_powerpc->ins = ins;
+   list_add_tail(&(ins_powerpc->list), &(head->list));
+
+   return ins;
+
+out_free_ins_power:
+   zfree(&ins_powerpc);
+out_free_ins:
+   zfree(&ins);
+   return NULL;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+   const char *name)
+{
+   struct instructions_powerpc *pos;
+
+   list_for_each_entry(pos, &head->list, list) {
+   if (!strcmp(pos->ins->name, name))
+   return pos->ins;
+   }
+   return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+	int i;
+	struct ins *ins;
+	struct ins_ops *ops;
+	static struct instructions_powerpc head;
+	static bool list_initialized;
+
+	/*
+	 * - Interested only if instruction starts with 'b'.
+	 * - Few start with 'b', but aren't branch instructions.
+	 */
+	if (name[0] != 'b' ||
+	    !strncmp(name, "bcd", 3)   ||
+	    !strncmp(name, "brinc", 5) ||
+	    !strncmp(name, "bper", 4))
+		return NULL;
+
+	if (!list_initialized) {
+		INIT_LIST_HEAD(&head.list);
+		list_initialized = true;
+	}
+
+	/*
+	 * Return if we already have object of 'struct ins' for this instruction
+	 */
+	ins = list_search__ins_powerpc(&head, name);
+	if (ins)
+		return ins;
+
+	ops = &jump_ops;
+
+	i = strlen(name) - 1;
+	if (i < 0)
+		return NULL;
+
+	/* ignore optional hints at the end of the instructions */
+	if (name[i] == '+' || name[i] == '-')
+		i--;
+
+	if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+		/*
+		 * if the instruction ends up with 'l' or 'la', then
+		 * those are considered 'calls' since they update LR.
+		 * ... except for 'bnl' which is branch if not less than
+		 * and the absolute form of the same.
+		 */
+		if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+		    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+		    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+			ops = &call_ops;
+	}
+	if (name[i] == 'r' && name[i-1] == 'l')
+		/*
+		 * instructions ending with 'lr' are considered to be
+		 * return instructions
+		 */
+		ops = &ret_ops;
+
+	/*
+	 * Add instruction to list so next time no need to
+	 * allocate memory for it.
+	 */
+	return list_add__ins_powerpc(&head, name, ops);
+}
+
 static void ins__sort(struct ins *instructions, int nmemb)
 {
qsort(instructions, nmemb, sizeof(struct ins), ins__cmp);
@@ -509,6 +623,8 @@ static struct ins *ins__find(const 
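
For readers following the mnemonic rules in ins__find_powerpc() above, here
is a minimal standalone sketch of just the classification step, without the
caching list. The enum and the names ppc_classify() and is_bnl_variant() are
hypothetical and exist only for this illustration; the patch itself assigns
jump_ops, call_ops or ret_ops instead of returning an enum.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories standing in for perf's jump/call/ret ops. */
enum ppc_branch_kind { PPC_NOT_BRANCH, PPC_JUMP, PPC_CALL, PPC_RET };

/* 'bnl' (branch if not less than) and its absolute/hinted forms end in
 * 'l' or 'la' but do not update LR, so they are not calls. */
static int is_bnl_variant(const char *name)
{
	return !strcmp(name, "bnl") || !strcmp(name, "bnl+") ||
	       !strcmp(name, "bnl-") || !strcmp(name, "bnla") ||
	       !strcmp(name, "bnla+") || !strcmp(name, "bnla-");
}

static enum ppc_branch_kind ppc_classify(const char *name)
{
	size_t i;

	/* Only mnemonics starting with 'b' are candidates; a few of
	 * those (bcd*, brinc*, bper*) are not branches at all. */
	if (name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) ||
	    !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4))
		return PPC_NOT_BRANCH;

	/* name[0] == 'b', so the string has at least one character. */
	i = strlen(name) - 1;

	/* Skip an optional branch prediction hint at the end. */
	if (i > 0 && (name[i] == '+' || name[i] == '-'))
		i--;

	/* Mnemonics ending in 'lr' are treated as returns. */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		return PPC_RET;

	/* Mnemonics ending in 'l' or 'la' update LR: treat as calls. */
	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l'))
		return is_bnl_variant(name) ? PPC_JUMP : PPC_CALL;

	return PPC_JUMP;
}
```

Under these rules "bl" and "bctrl" come out as calls, "blr" as a return, and
"beq" and "bnl+" as plain jumps, while "bcdadd" is not treated as a branch.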

[PATCH v5 7/7] perf annotate: Fix jump target outside of function address range

2016-08-18 Thread Ravi Bangoria
If a jump target is outside the function's address range, perf does not
handle it correctly. In particular, when the target address is lower than
the function start address, the target offset is negative. But the target
offset is stored as an unsigned value, so the negative number wraps around
to its two's complement form. See the example below. Here the target of the
'jmpq' instruction at 34cf8 is 34ac0, which is lower than the function start
address (34cf0).

34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

Objdump output:

  00034cf0 <__sigaction>:
  __GI___sigaction():
    34cf0: lea    -0x20(%rdi),%eax
    34cf3: cmp    $0x1,%eax
    34cf6: jbe    34d00 <__sigaction+0x10>
    34cf8: jmpq   34ac0 <__GI___libc_sigaction>
    34cfd: nopl   (%rax)
    34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
    34d07: movl   $0x16,%fs:(%rax)
    34d0e: mov    $0xffffffff,%eax
    34d13: retq

perf annotate before applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    $0x1,%eax
        V  jbe    10
        V  jmpq   fffffffffffffdd0
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   $0x16,%fs:(%rax)
           mov    $0xffffffff,%eax
           retq

perf annotate after applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    $0x1,%eax
        V  jbe    10
        ^  jmpq   34ac0 <__GI___libc_sigaction>
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   $0x16,%fs:(%rax)
           mov    $0xffffffff,%eax
           retq

Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - New patch

 tools/perf/ui/browsers/annotate.c |  5 +++--
 tools/perf/util/annotate.c| 14 +-
 tools/perf/util/annotate.h|  5 +++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 21c5e10..c13df5b 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -215,7 +215,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 		ui_browser__set_color(browser, color);
 	if (dl->ins && dl->ins->ops->scnprintf) {
 		if (ins__is_jump(dl->ins)) {
-			bool fwd = dl->ops.target.offset > (u64)dl->offset;
+			bool fwd = dl->ops.target.offset > dl->offset;
 
 			ui_browser__write_graph(browser, fwd ? SLSMG_DARROW_CHAR :
 							    SLSMG_UARROW_CHAR);
@@ -245,7 +245,8 @@ static bool disasm_line__is_valid_jump(struct disasm_line *dl, struct symbol *sy
 {
if (!dl || !dl->ins || !ins__is_jump(dl->ins)
|| !disasm_line__has_offset(dl)
-   || dl->ops.target.offset >= symbol__size(sym))
+   || dl->ops.target.offset < 0
+   || dl->ops.target.offset >= (s64)symbol__size(sym))
return false;
 
return true;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 678fb81..c8b017c 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -124,10 +124,12 @@ static int jump__parse(struct ins_operands *ops,
else
ops->target.addr = strtoull(ops->raw, NULL, 16);
 
-   if (s++ != NULL)
+   if (s++ != NULL) {
ops->target.offset = strtoull(s, NULL, 16);
-   else
-   ops->target.offset = UINT64_MAX;
+   ops->target.offset_avail = true;
+   } else {
+   ops->target.offset_avail = false;
+   }
 
return 0;
 }
@@ -135,7 +137,7 @@ static int jump__parse(struct ins_operands *ops,
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
-   if (!ops->target.addr)
+   if (!ops->target.addr || ops->target.offset < 0)
return ins__raw_scnprintf(ins, bf, size, ops);
 
 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
@@ -1228,9 +1230,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
if (dl == NULL)
return -1;
 
-   if (dl->ops.target.offset == UINT64_MAX)
+   if (!disasm_line__has_offset(dl)) {
dl->ops.target.offset = dl->ops.target.addr -
map__rip_2objdump(map, sym->start);
+   dl->ops.target.offset_avail = true;
+   }
 
/* kcore has no symbols, so add the call target name */
if (dl->ins && ins__is_call(dl->ins) && !dl->ops.target.name) {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 5cfad4e..5787ed8 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -19,7 +19,8 @@ struct ins_operands {
char   
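
The wraparound described in this patch can be reproduced in isolation. The
sketch below is only an illustration with hypothetical helper names
(offset_unsigned(), offset_signed(), jump_in_function()); it is not the perf
code. The function size of 0x24 used in the note afterwards is an assumption
read off the objdump listing (34cf0 through the one-byte retq at 34d13).

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset kept in an unsigned type: a target below the function start
 * wraps around to a huge positive value. */
static uint64_t offset_unsigned(uint64_t target, uint64_t start)
{
	return target - start;
}

/* Same subtraction, reinterpreted as signed. Converting an out-of-range
 * uint64_t to int64_t is implementation-defined before C23, but yields
 * the two's complement value on the compilers perf is built with. */
static int64_t offset_signed(uint64_t target, uint64_t start)
{
	return (int64_t)(target - start);
}

/* A jump is inside the function only if its signed offset lies in
 * [0, size); this mirrors the two checks the patch adds. */
static bool jump_in_function(int64_t offset, uint64_t size)
{
	return offset >= 0 && offset < (int64_t)size;
}
```

With target 0x34ac0 and start 0x34cf0, offset_unsigned() gives
0xfffffffffffffdd0 while offset_signed() gives -0x230, so
jump_in_function() rejects it for any size, whereas the 'jbe' target at
offset 0x10 is accepted for a size of 0x24.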

[PATCH v5 4/7] perf annotate: Do not ignore call instruction with indirect target

2016-08-18 Thread Ravi Bangoria
Do not ignore a call instruction with an indirect target when it is already
identified as a call. This is an extension of commit e8ea1561952b
("perf annotate: Use raw form for register indirect call instructions")
to generalize annotation for all instructions with indirect calls.

This is needed for certain powerpc call instructions that use address
in a register (such as bctrl, btarl, ...).

Apart from that, when kcore is used to disassemble function, all call
instructions were ignored. This patch will fix it as a side effect by
not ignoring them. For example,

Before (with kcore):
       mov    %r13,%rdi
       callq  0x811a7e70
     ^ jmpq   64
       mov    %gs:0x7ef41a6e(%rip),%al

After (with kcore):
       mov    %r13,%rdi
     > callq  0x811a7e70
     ^ jmpq   64
       mov    %gs:0x7ef41a6e(%rip),%al

Suggested-by: Michael Ellerman 
[Suggested about 'bctrl' instruction]
Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - New patch, introduced to annotate all indirect call instructions.

 tools/perf/util/annotate.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0b64841..6368ba9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -81,16 +81,12 @@ static int call__parse(struct ins_operands *ops, const char *norm_arch)
return ops->target.name == NULL ? -1 : 0;
 
 indirect_call:
-   tok = strchr(endptr, '(');
-   if (tok != NULL) {
+   tok = strchr(endptr, '*');
+   if (tok == NULL) {
ops->target.addr = 0;
return 0;
}
 
-   tok = strchr(endptr, '*');
-   if (tok == NULL)
-   return -1;
-
ops->target.addr = strtoull(tok + 1, NULL, 16);
return 0;
 }
-- 
2.5.5
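
To make the new indirect_call behaviour easier to follow, here is a small
standalone sketch of the logic after this patch. The function name
indirect_call_addr() and its out-parameter are hypothetical; in perf the
result is stored in ops->target.addr.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the post-patch logic: an operand without '*' is no longer
 * an error. The address is left at 0 so the caller falls back to
 * printing the raw operand, and the instruction is still kept as a
 * call. */
static int indirect_call_addr(const char *operand, uint64_t *addr)
{
	const char *tok = strchr(operand, '*');

	if (tok == NULL) {
		*addr = 0;
		return 0;
	}

	/* x86 style "*<hex address>": parse what follows the '*'. */
	*addr = strtoull(tok + 1, NULL, 16);
	return 0;
}
```

An operand such as "*0x1234" yields address 0x1234, while a register operand
or an empty operand (as with powerpc 'bctrl') yields 0 and success, where the
old code returned -1 for anything without '(' or '*'.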



[PATCH v5 2/7] perf annotate: Add cross arch annotate support

2016-08-18 Thread Ravi Bangoria
Change the current data structures and functions to enable cross-arch
annotate.

The current perf implementation does not support cross-arch annotate.
To make it truly cross-arch, the instruction tables of all architectures
must be present in the perf binary, and the appropriate table must be
selected based on the arch where perf.data was recorded.

Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - Replaced symbol__annotate with symbol__disassemble.

 tools/perf/builtin-top.c  |   2 +-
 tools/perf/ui/browsers/annotate.c |   3 +-
 tools/perf/ui/gtk/annotate.c  |   2 +-
 tools/perf/util/annotate.c| 133 --
 tools/perf/util/annotate.h|   5 +-
 5 files changed, 92 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index a3223aa..fdd4203 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -129,7 +129,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
return err;
}
 
-   err = symbol__disassemble(sym, map, 0);
+   err = symbol__disassemble(sym, map, 0, NULL);
if (err == 0) {
 out_assign:
top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 2e2d100..21c5e10 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
  (nr_pcnt - 1);
}
 
-   err = symbol__disassemble(sym, map, sizeof_bdl);
+   err = symbol__disassemble(sym, map, sizeof_bdl,
+ perf_evsel__env_arch(evsel));
if (err) {
char msg[BUFSIZ];
symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 42d3199..c127aba 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -167,7 +167,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
if (map->dso->annotate_warned)
return -1;
 
-   err = symbol__disassemble(sym, map, 0);
+   err = symbol__disassemble(sym, map, 0, perf_evsel__env_arch(evsel));
if (err) {
char msg[BUFSIZ];
symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 25a9259..deb9af0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -20,12 +20,14 @@
 #include 
 #include 
 #include 
+#include 
+#include "../arch/common.h"
 
 const char *disassembler_style;
 const char *objdump_path;
 static regex_t  file_lineno;
 
-static struct ins *ins__find(const char *name);
+static struct ins *ins__find(const char *name, const char *norm_arch);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
 
 static void ins__delete(struct ins_operands *ops)
@@ -53,7 +55,7 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
-static int call__parse(struct ins_operands *ops)
+static int call__parse(struct ins_operands *ops, const char *norm_arch)
 {
char *endptr, *tok, *name;
 
@@ -65,10 +67,8 @@ static int call__parse(struct ins_operands *ops)
 
name++;
 
-#ifdef __arm__
-   if (strchr(name, '+'))
+   if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+'))
return -1;
-#endif
 
tok = strchr(name, '>');
if (tok == NULL)
@@ -117,7 +117,8 @@ bool ins__is_call(const struct ins *ins)
 	return ins->ops == &call_ops;
 }
 
-static int jump__parse(struct ins_operands *ops)
+static int jump__parse(struct ins_operands *ops,
+  const char *norm_arch __maybe_unused)
 {
const char *s = strchr(ops->raw, '+');
 
@@ -172,7 +173,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
return 0;
 }
 
-static int lock__parse(struct ins_operands *ops)
+static int lock__parse(struct ins_operands *ops, const char *norm_arch)
 {
char *name;
 
@@ -183,7 +184,7 @@ static int lock__parse(struct ins_operands *ops)
 	if (disasm_line__parse(ops->raw, &name, &ops->locked.ops->raw) < 0)
goto out_free_ops;
 
-   ops->locked.ins = ins__find(name);
+   ops->locked.ins = ins__find(name, norm_arch);
free(name);
 
if (ops->locked.ins == NULL)
@@ -193,7 +194,7 @@ static int lock__parse(struct ins_operands *ops)
return 0;
 
if (ops->locked.ins->ops->parse &&
-   ops->locked.ins->ops->parse(ops->locked.ops) < 0)
+   ops->locked.ins->ops->parse(ops->locked.ops, norm_arch) < 0)
goto out_free_ops;
 
return 0;
@@ -236,7 +237,7 @@ static struct ins_ops lock_ops = {
.scnprintf = lock__scnprintf,
 };
 

Re: [LKP] [lkp] [sctp] a6c2f79287: netperf.Throughput_Mbps -37.2% regression

2016-08-18 Thread Aaron Lu
On Thu, Aug 18, 2016 at 08:45:42PM +0800, Xin Long wrote:
> >> Hi, Aaron
> >>
> >> 1)
> >> I talked with Marcelo about this one.
> >> He said it might be related with cacheline.  the  new field distroyed
> >> the prior cacheline. So on top of commit 826d253d57b1, pls only add
> >> +   unsigned long prsctp_param;
> >>
> >> to the end of struct sctp_chunk, then try.
> >
> > This doesn't work.
> >
> 
> If it's because of cache lines changed, I'm not sure this, either.
> Maybe 2) is a good way to fix it.

A comparison of the good commit 826d253d57b1 and the bad a6c2f792873a:

tests: 8
testcase/path_params/tbox_group/run: 
netperf/ipv4-300s-200%-cs-localhost-10K-SCTP_STREAM_MANY-performance/lkp-ivb-d02

826d253d57b11f69  a6c2f792873aff332a4689717c
----------------  --------------------------
         %stddev    change          %stddev
             \         |                \
      3923            -37%        2461         netperf.Throughput_Mbps
         9            -78%           2         vmstat.procs.r
    112616             19%      133981         vmstat.system.cs
      4053              7%        4350         vmstat.system.in
      8598 ±  4%      957%       90912         softirqs.SCHED
  16466114            -37%    10305467         softirqs.NET_RX
    605899            -46%      329262         softirqs.TIMER
     72067 ± 10%      -63%       26356 ±  3%   softirqs.RCU
      4785 ±  7%       -9%        4352         slabinfo.anon_vma_chain.num_objs
       642 ±  7%       14%         731 ±  6%   slabinfo.kmalloc-512.active_objs
      4993             15%        5735         slabinfo.kmalloc-64.active_objs
      4993             15%        5735         slabinfo.kmalloc-64.num_objs
      2529 ±  4%      -15%        2150         proc-vmstat.nr_alloc_batch
 4.733e+08            -37%   2.999e+08         proc-vmstat.pgalloc_normal
 8.476e+08            -37%    5.36e+08         proc-vmstat.pgfree
 3.742e+08            -37%   2.361e+08         proc-vmstat.pgalloc_dma32
  1.48e+08            -37%    93033641         proc-vmstat.numa_hit
  1.48e+08            -37%    93033640         proc-vmstat.numa_local
      0.05 ± 17%    52102%       24.80         turbostat.CPU%c1
      0.64           3065%       20.10 ±  3%   turbostat.CPU%c6
      0.12 ± 39%     1900%        2.35 ±  3%   turbostat.Pkg%pc2
      0.46 ± 10%     1686%        8.22 ±  6%   turbostat.Pkg%pc6
     37.54            -14%       32.11         turbostat.PkgWatt
     20.20            -25%       15.22         turbostat.CorWatt
     99.31            -45%       54.97         turbostat.%Busy
      3269            -45%        1803         turbostat.Avg_MHz
     76510 ± 46%    3e+05%   1.954e+08         cpuidle.C1-IVB.time
     19769 ± 17%     5534%     1113742 ±  5%   cpuidle.C1E-IVB.time
       151 ± 11%     4175%        6454 ±  7%   cpuidle.C1E-IVB.usage
       114 ± 14%     6216%        7232 ±  5%   cpuidle.C3-IVB.usage
     33074 ± 14%     5159%     1739419 ±  3%   cpuidle.C3-IVB.time
      8874           4203%      381901         cpuidle.C6-IVB.usage
   8006184           4072%    3.34e+08         cpuidle.C6-IVB.time
     12019 ± 35%      303%       48398         perf-stat.cpu-migrations
  34232822             19%    40780053         perf-stat.context-switches
    339045              5%      354573         perf-stat.minor-faults
    339041              5%      354568         perf-stat.page-faults
 2.776e+11            -28%   2.003e+11         perf-stat.branch-instructions
 1.505e+12            -29%   1.065e+12         perf-stat.instructions
 6.421e+11            -30%   4.473e+11         perf-stat.dTLB-loads
  5.32e+11            -34%   3.536e+11         perf-stat.dTLB-stores
 1.173e+11            -38%   7.271e+10         perf-stat.cache-references
 3.735e+08 ±  5%      -48%   1.959e+08 ±  4%   perf-stat.iTLB-load-misses
 3.864e+09            -51%     1.9e+09         perf-stat.branch-misses
 4.069e+09 ± 20%      -56%   1.798e+09 ± 40%   perf-stat.dTLB-load-misses
 5.285e+08 ± 22%      -70%   1.585e+08 ± 16%   perf-stat.dTLB-store-misses
 7.126e+09 ± 16%      -97%    2.27e+08 ±  4%   perf-stat.cache-misses

The obvious changes are:
1. the bad commit has far fewer runnable processes (vmstat.procs.r)
2. the context switches are much higher in the bad commit (vmstat.system.cs)

This all suggests that the netperf processes go to sleep for some reason in
the bad commit.

I used "perf record -p one_netperf_pid -e probe:pick_next_task_idle" as
suggested by Tim to see where it went to sleep:

Samples: 78  of event 'probe:pick_next_task_idle', Event count(approx.): 78
  Children  Self  Trace output
  -  100.00%   100.00%  (810fc750)
       __sendmsg_nocancel
       entry_SYSCALL_64_fastpath
       sys_sendmsg
       __sys_sendmsg
       ___sys_sendmsg
       inet_sendmsg
       sctp_sendmsg
       sctp_wait_for_sndbuf
       schedule_timeout
       schedule
       pick_next_task_idle

It doesn't look insane and sctp_wait_for_sndbuf may actually have
something to do 

[PATCH v5 6/7] perf annotate: Support jump instruction with target as second operand

2016-08-18 Thread Ravi Bangoria
perf currently cannot parse a jump instruction whose second operand
contains the target address. Architectures such as powerpc have such
instructions. For example, 'beq  cr7,10173e60'.

Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - New patch

 tools/perf/util/annotate.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4a4a583..678fb81 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -117,8 +117,12 @@ static int jump__parse(struct ins_operands *ops,
   const char *norm_arch __maybe_unused)
 {
const char *s = strchr(ops->raw, '+');
+   const char *c = strchr(ops->raw, ',');
 
-   ops->target.addr = strtoull(ops->raw, NULL, 16);
+   if (c++ != NULL)
+   ops->target.addr = strtoull(c, NULL, 16);
+   else
+   ops->target.addr = strtoull(ops->raw, NULL, 16);
 
if (s++ != NULL)
ops->target.offset = strtoull(s, NULL, 16);
-- 
2.5.5
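
For illustration only, the parsing rule described in this patch can be
sketched as a standalone program fragment. This is not the perf code:
parse_jump_target() is a hypothetical helper name, and the operand strings
in the usage note are just the example from the commit message. Unlike the
'c++ != NULL' idiom in the diff, the sketch does not increment the pointer
when no comma is found.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): return the branch target
 * parsed from a raw operand string such as "cr7,10173e60" or "10173e60".
 */
static unsigned long long parse_jump_target(const char *raw)
{
	const char *comma;

	assert(raw != NULL);

	comma = strchr(raw, ',');

	/*
	 * If there is a comma, the target address is the second operand;
	 * otherwise it is the first (and only) operand.
	 */
	return strtoull(comma ? comma + 1 : raw, NULL, 16);
}
```

With the powerpc example above, parse_jump_target("cr7,10173e60") and
parse_jump_target("10173e60") both yield 0x10173e60.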



[PATCH v5 1/7] perf: Define macro for normalized arch names

2016-08-18 Thread Ravi Bangoria
Define a macro for each normalized arch name and use it instead
of repeating the arch name as a string literal.

Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - No changes.

 tools/perf/arch/common.c   | 36 ++--
 tools/perf/arch/common.h   | 11 +++
 tools/perf/util/unwind-libunwind.c |  4 ++--
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 886dd2a..f763666 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -123,25 +123,25 @@ static int lookup_triplets(const char *const *triplets, const char *name)
 const char *normalize_arch(char *arch)
 {
 	if (!strcmp(arch, "x86_64"))
-		return "x86";
+		return NORM_X86;
 	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
+		return NORM_X86;
 	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
+		return NORM_SPARC;
 	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-		return "arm64";
+		return NORM_ARM64;
 	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
+		return NORM_ARM;
 	if (!strncmp(arch, "s390", 4))
-		return "s390";
+		return NORM_S390;
 	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
+		return NORM_PARISC;
 	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
+		return NORM_POWERPC;
 	if (!strncmp(arch, "mips", 4))
-		return "mips";
+		return NORM_MIPS;
 	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
+		return NORM_SH;
 
 	return arch;
 }
@@ -181,21 +181,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree();
 	}
 
-	if (!strcmp(arch, "arm"))
+	if (!strcmp(arch, NORM_ARM))
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+	else if (!strcmp(arch, NORM_ARM64))
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+	else if (!strcmp(arch, NORM_POWERPC))
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "sh"))
+	else if (!strcmp(arch, NORM_SH))
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+	else if (!strcmp(arch, NORM_S390))
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+	else if (!strcmp(arch, NORM_SPARC))
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+	else if (!strcmp(arch, NORM_X86))
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+	else if (!strcmp(arch, NORM_MIPS))
 		path_list = mips_triplets;
 	else {
 		ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@
 
 extern const char *objdump_path;
 
+/* Macro for normalized arch names */
+#define NORM_X86	"x86"
+#define NORM_SPARC	"sparc"
+#define NORM_ARM64	"arm64"
+#define NORM_ARM	"arm"
+#define NORM_S390	"s390"
+#define NORM_PARISC	"parisc"
+#define NORM_POWERPC	"powerpc"
+#define NORM_MIPS	"mips"
+#define NORM_SH		"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);
 
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 6d542a4..6199102 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -40,10 +40,10 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
 
 	arch = normalize_arch(thread->mg->machine->env->arch);
 
-	if (!strcmp(arch, "x86")) {
+	if (!strcmp(arch, NORM_X86)) {
 		if (dso_type != DSO__TYPE_64BIT)
 			ops = x86_32_unwind_libunwind_ops;
-	} else if (!strcmp(arch, "arm64") || !strcmp(arch, "arm")) {
+	} else if (!strcmp(arch, NORM_ARM64) || !strcmp(arch, NORM_ARM)) {
 		if (dso_type == DSO__TYPE_64BIT)
 			ops = arm64_unwind_libunwind_ops;
 	}
-- 
2.5.5
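
A reduced sketch of how the macros are meant to be used, assuming only two
of the arch families from the patch. normalize_arch_sketch() is a
hypothetical name chosen so it is not confused with perf's
normalize_arch(), and the strlen() guard is an addition of this sketch,
not something the patch contains.

```c
#include <assert.h>
#include <string.h>

/* Same values as the macros added to tools/perf/arch/common.h. */
#define NORM_X86	"x86"
#define NORM_POWERPC	"powerpc"

/*
 * Hypothetical reduced version of normalize_arch(): map a raw arch
 * string to its normalized name, or return it unchanged if unknown.
 */
static const char *normalize_arch_sketch(const char *arch)
{
	assert(arch != NULL);

	if (!strcmp(arch, "x86_64"))
		return NORM_X86;
	/* i386, i486, i586, i686: guard the length before indexing. */
	if (strlen(arch) >= 4 &&
	    arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
		return NORM_X86;
	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
		return NORM_POWERPC;

	return arch;
}
```

Callers then compare against the macro rather than a literal, for example
!strcmp(normalize_arch_sketch("ppc64le"), NORM_POWERPC), so a typo in the
name becomes a compile error instead of a silent mismatch.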



[PATCH v5 0/7] perf: Cross arch annotate + few miscellaneous fixes

2016-08-18 Thread Ravi Bangoria
Currently perf annotate supports code navigation (branches and calls)
only when run on the same architecture where perf.data was recorded.
For example, recording on a powerpc server and annotating on a client's
x86 desktop is not supported.

This patchset enables cross arch annotate. I've reused the x86 and arm
instruction tables that are already available and added support for
powerpc.

Additionally, this patch series contains a few other related fixes.

The patches are prepared on top of acme/perf/core and tested with x86
and powerpc only.

Note for arm:
A few instructions were defined under #if __arm__, which I've used as the
table for arm. But I'm not sure whether the instructions defined outside
of that block also include arm instructions. Apart from that,
'call__parse()' and 'move__parse()' contain an #ifdef __arm__ directive.
I've changed it to  if (!strcmp(norm_arch, arm)). I don't have an arm
machine to test these changes.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v5:
  - Replaced symbol__annotate with symbol__disassemble.
  - Removed hacks for jump and call instructions like bctr and bctrl
    respectively from the generic patch that enables support for powerpc
    and made a separate patch for that.
  - v4 was not annotating powerpc 'btar' instruction. Included that.
  - Added a few generic fixes.

v4 link:
  https://lkml.org/lkml/2016/7/8/10

Naveen N. Rao (1):
  perf annotate: Add support for powerpc

Ravi Bangoria (6):
  perf: Define macro for normalized arch names
  perf annotate: Add cross arch annotate support
  perf annotate: Do not ignore call instruction with indirect target
  perf annotate: Show raw form for jump instruction with indirect
target
  perf annotate: Support jump instruction with target as second operand
  perf annotate: Fix jump target outside of function address range

 tools/perf/arch/common.c   |  36 ++---
 tools/perf/arch/common.h   |  11 ++
 tools/perf/builtin-top.c   |   2 +-
 tools/perf/ui/browsers/annotate.c  |   8 +-
 tools/perf/ui/gtk/annotate.c   |   2 +-
 tools/perf/util/annotate.c | 276 +
 tools/perf/util/annotate.h |  10 +-
 tools/perf/util/unwind-libunwind.c |   4 +-
 8 files changed, 262 insertions(+), 87 deletions(-)

-- 
2.5.5



[PATCH v5 5/7] perf annotate: Show raw form for jump instruction with indirect target

2016-08-18 Thread Ravi Bangoria
For jump instructions that do not include the target address as a direct
operand, show the raw form of the instruction. This is needed for certain
powerpc jump instructions that take the target address from a register
(such as bctr, btar, ...).

Suggested-by: Michael Ellerman 
Signed-off-by: Ravi Bangoria 
---
Changes in v5:
  - New patch introduced to annotate jump instruction with indirect target

 tools/perf/util/annotate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6368ba9..4a4a583 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -131,6 +131,9 @@ static int jump__parse(struct ins_operands *ops,
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops)
 {
+	if (!ops->target.addr)
+		return ins__raw_scnprintf(ins, bf, size, ops);
+
 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
 }
 
-- 
2.5.5
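
The fallback this patch adds can be sketched outside perf as follows.
struct jump_ops and format_jump() are hypothetical, simplified stand-ins
for perf's ins_operands and jump__scnprintf(), and the raw-form format
string is an assumption about what ins__raw_scnprintf() prints.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for perf's operand record. */
struct jump_ops {
	const char *raw;	/* operand text as disassembled */
	uint64_t target_addr;	/* 0 when no direct target was parsed */
	uint64_t target_offset;	/* offset of the target within the symbol */
};

/*
 * Format a jump: fall back to the raw operand text when there is no
 * direct target address, otherwise print the target offset in hex.
 */
static int format_jump(char *bf, size_t size, const char *name,
		       const struct jump_ops *ops)
{
	assert(bf != NULL && name != NULL && ops != NULL);

	if (!ops->target_addr)
		return snprintf(bf, size, "%-6.6s %s", name, ops->raw);

	return snprintf(bf, size, "%-6.6s %" PRIx64, name, ops->target_offset);
}
```

An indirect jump such as bctr, which carries no address operand, is then
shown with whatever operand text the disassembler produced instead of a
misleading offset of 0.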



[PATCH 6/8] pipe: fix limit checking in alloc_pipe_info()

2016-08-18 Thread Michael Kerrisk (man-pages)
The limit checking in alloc_pipe_info() (used by pipe(2) and when
opening a FIFO) has the following problems:

(1) When checking capacity required for the new pipe, the checks
against the limit in /proc/sys/fs/pipe-user-pages-{soft,hard}
are made against existing consumption, and exclude the memory
required for the new pipe capacity. As a consequence: (1) the
memory allocation throttling provided by the soft limit does
not kick in quite as early as it should, and (2) the user can
overrun the hard limit.

(2) As currently implemented, accounting and checking against the limits
is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racy. Multiple processes may pass point (a)
simultaneously, and then allocate pipe buffers that are
accounted for only in step (c).  The race means that the
user's pipe buffer allocation could be pushed over the limit
(by an arbitrary amount, depending on how unlucky we were in
the race). [Thanks to Vegard Nossum for spotting this point,
which I had missed.]

This patch addresses the above problems as follows:

* Alter the checks against limits to include the memory required for the
  new pipe.
* Re-order the accounting step so that it precedes the buffer allocation.
  If the accounting step determines that a limit has been reached, revert
  the accounting and cause the operation to fail.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 613c6b9..705d79f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -632,24 +632,28 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	if (pipe == NULL)
 		goto out_free_uid;
 
-	if (!too_many_pipe_buffers_hard(user)) {
-		if (too_many_pipe_buffers_soft(user))
-			pipe_bufs = 1;
-		pipe->bufs = kcalloc(pipe_bufs,
-				     sizeof(struct pipe_buffer),
-				     GFP_KERNEL_ACCOUNT);
-	}
+	if (too_many_pipe_buffers_soft(user))
+		pipe_bufs = 1;
+
+	account_pipe_buffers(user, 0, pipe_bufs);
+
+	if (too_many_pipe_buffers_hard(user))
+		goto out_revert_acct;
+
+	pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
+			     GFP_KERNEL_ACCOUNT);
 
 	if (pipe->bufs) {
 		init_waitqueue_head(&pipe->wait);
 		pipe->r_counter = pipe->w_counter = 1;
 		pipe->buffers = pipe_bufs;
 		pipe->user = user;
-		account_pipe_buffers(user, 0, pipe_bufs);
 		mutex_init(&pipe->mutex);
 		return pipe;
 	}
 
+out_revert_acct:
+	account_pipe_buffers(user, pipe_bufs, 0);
 	kfree(pipe);
 out_free_uid:
 	free_uid(user);
-- 
2.5.5



[PATCH 8/8] pipe: cap initial pipe capacity according to pipe-max-size limit

2016-08-18 Thread Michael Kerrisk (man-pages)
This is a patch that provides behavior that is more consistent,
and probably less surprising to users. I consider the change
optional, and welcome opinions about whether it should be applied.

By default, pipes are created with a capacity of 64 kiB.  However,
/proc/sys/fs/pipe-max-size may be set smaller than this value.  In
this scenario, an unprivileged user could thus create a pipe whose
initial capacity exceeds the limit. Therefore, it seems logical to
cap the initial pipe capacity according to the value of
pipe-max-size.

The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this
patch:

# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 1 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536

The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index ada1777..caced8b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -631,6 +631,9 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	if (pipe == NULL)
 		goto out_free_uid;
 
+	if (!capable(CAP_SYS_RESOURCE) && pipe_bufs * PAGE_SIZE > pipe_max_size)
+		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
+
 	if (too_many_pipe_buffers_soft(atomic_long_read(&user->pipe_bufs)))
 		pipe_bufs = 1;
 
-- 
2.5.5
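
The arithmetic behind the capped capacity can be checked with a small
sketch. It assumes 4096-byte pages and a default of 16 pipe buffers (which
gives the 64 kiB default mentioned above); the SKETCH_* macros and
initial_pipe_bufs() are hypothetical names, and the privileged flag stands
in for the capable(CAP_SYS_RESOURCE) test.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this sketch: 4 kiB pages, 16 default buffers. */
#define SKETCH_PAGE_SHIFT	12UL
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PIPE_DEF_BUFFERS	16UL

/*
 * Return the number of buffers (pages) a new pipe starts with, capping
 * the default at pipe_max_size for unprivileged callers only.
 */
static unsigned long initial_pipe_bufs(bool privileged,
				       unsigned long pipe_max_size)
{
	unsigned long pipe_bufs = SKETCH_PIPE_DEF_BUFFERS;

	/* The sketch assumes the limit is at least one page. */
	assert(pipe_max_size >= SKETCH_PAGE_SIZE);

	if (!privileged && pipe_bufs * SKETCH_PAGE_SIZE > pipe_max_size)
		pipe_bufs = pipe_max_size >> SKETCH_PAGE_SHIFT;

	return pipe_bufs;
}
```

With pipe-max-size at 16384, an unprivileged caller gets 4 buffers
(4 * 4096 = 16384 bytes) while a privileged caller keeps 16 buffers
(65536 bytes), which matches the two capacities in the session above.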



Seeking recommendation on whether a behaviour is right/wrong

2016-08-18 Thread Ajay Garg
Hi All.

I have been trying to debug a strange issue occurring on a "mostly
mainline"-linux-kernel, running on a proprietary embedded-platform.

I still haven't been able to zero in on the issue with 100% confirmation,
but I think the following might be happening:

a)
A C-user-application is running, and a file is being written, one byte
at a time.
Let's say the file-name being written is "file1.txt"

b)
There's another file "file2.txt", which is in absolutely sane-state
(no open file-descriptors, etc.)

c)
Now, a cron-script reboots the machine via /sbin/reboot "abruptly"
(i.e. without closing the open-file-descriptor of "file1.txt").

d)
When the machine comes up, we find that "file2.txt" is corrupted.


Given this behaviour, is the kernel at fault?
Or is the cron job the culprit, for rebooting abruptly?



Thanks and Regards,
Ajay


[PATCH 5/8] pipe: simplify logic in alloc_pipe_info()

2016-08-18 Thread Michael Kerrisk (man-pages)
Replace an 'if' block that covers most of the code in this
function with a 'goto'. This makes the code a little simpler
to read, and also simplifies the next patch.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index a7470a9..613c6b9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -625,33 +625,34 @@ static bool too_many_pipe_buffers_hard(struct user_struct *user)
 struct pipe_inode_info *alloc_pipe_info(void)
 {
 	struct pipe_inode_info *pipe;
+	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
+	struct user_struct *user = get_current_user();
 
 	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
-	if (pipe) {
-		unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
-		struct user_struct *user = get_current_user();
-
-		if (!too_many_pipe_buffers_hard(user)) {
-			if (too_many_pipe_buffers_soft(user))
-				pipe_bufs = 1;
-			pipe->bufs = kcalloc(pipe_bufs,
-					     sizeof(struct pipe_buffer),
-					     GFP_KERNEL_ACCOUNT);
-		}
+	if (pipe == NULL)
+		goto out_free_uid;
+
+	if (!too_many_pipe_buffers_hard(user)) {
+		if (too_many_pipe_buffers_soft(user))
+			pipe_bufs = 1;
+		pipe->bufs = kcalloc(pipe_bufs,
+				     sizeof(struct pipe_buffer),
+				     GFP_KERNEL_ACCOUNT);
+	}
 
-		if (pipe->bufs) {
-			init_waitqueue_head(&pipe->wait);
-			pipe->r_counter = pipe->w_counter = 1;
-			pipe->buffers = pipe_bufs;
-			pipe->user = user;
-			account_pipe_buffers(user, 0, pipe_bufs);
-			mutex_init(&pipe->mutex);
-			return pipe;
-		}
-		free_uid(user);
-		kfree(pipe);
+	if (pipe->bufs) {
+		init_waitqueue_head(&pipe->wait);
+		pipe->r_counter = pipe->w_counter = 1;
+		pipe->buffers = pipe_bufs;
+		pipe->user = user;
+		account_pipe_buffers(user, 0, pipe_bufs);
+		mutex_init(&pipe->mutex);
+		return pipe;
 	}
 
+	kfree(pipe);
+out_free_uid:
+	free_uid(user);
 	return NULL;
 }
 
-- 
2.5.5



[PATCH 6/8] pipe: fix limit checking in alloc_pipe_info()

2016-08-18 Thread Michael Kerrisk (man-pages)
The limit checking in alloc_pipe_info() (used by pipe(2) and when
opening a FIFO) has the following problems:

(1) When checking capacity required for the new pipe, the checks
against the limit in /proc/sys/fs/pipe-user-pages-{soft,hard}
are made against existing consumption, and exclude the memory
required for the new pipe capacity. As a consequence: (1) the
memory allocation throttling provided by the soft limit does
not kick in quite as early as it should, and (2) the user can
overrun the hard limit.

(2) As currently implemented, accounting and checking against the limits
is done as follows:

(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.

This is racey. Multiple processes may pass point (a)
simultaneously, and then allocate pipe buffers that are
accounted for only in step (c).  The race means that the
user's pipe buffer allocation could be pushed over the limit
(by an arbitrary amount, depending on how unlucky we were in
the race). [Thanks to Vegard Nossum for spotting this point,
which I had missed.]

This patch addresses the above problems as follows:

* Alter the checks against limits to include the memory required for the
  new pipe.
* Re-order the accounting step so that it precedes the buffer allocation.
  If the accounting step determines that a limit has been reached, revert
  the accounting and cause the operation to fail.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 613c6b9..705d79f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -632,24 +632,28 @@ struct pipe_inode_info *alloc_pipe_info(void)
if (pipe == NULL)
goto out_free_uid;
 
-   if (!too_many_pipe_buffers_hard(user)) {
-   if (too_many_pipe_buffers_soft(user))
-   pipe_bufs = 1;
-   pipe->bufs = kcalloc(pipe_bufs,
-sizeof(struct pipe_buffer),
-GFP_KERNEL_ACCOUNT);
-   }
+   if (too_many_pipe_buffers_soft(user))
+   pipe_bufs = 1;
+
+   account_pipe_buffers(user, 0, pipe_bufs);
+
+   if (too_many_pipe_buffers_hard(user))
+   goto out_revert_acct;
+
+   pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
+GFP_KERNEL_ACCOUNT);
 
if (pipe->bufs) {
init_waitqueue_head(>wait);
pipe->r_counter = pipe->w_counter = 1;
pipe->buffers = pipe_bufs;
pipe->user = user;
-   account_pipe_buffers(user, 0, pipe_bufs);
mutex_init(>mutex);
return pipe;
}
 
+out_revert_acct:
+   account_pipe_buffers(user, pipe_bufs, 0);
kfree(pipe);
 out_free_uid:
free_uid(user);
-- 
2.5.5



[PATCH 8/8] pipe: cap initial pipe capacity according to pipe-max-size limit

2016-08-18 Thread Michael Kerrisk (man-pages)
This is an patch that provides behavior that is more consistent,
and probably less surprising to users. I consider the change
optional, and welcome opinions about whether it should be applied.

By default, pipes are created with a capacity of 64 kiB.  However,
/proc/sys/fs/pipe-max-size may be set smaller than this value.  In
this scenario, an unprivileged user could thus create a pipe whose
initial capacity exceeds the limit. Therefore, it seems logical to
cap the initial pipe capacity according to the value of
pipe-max-size.

The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this
patch:

# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 1 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536

The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index ada1777..caced8b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -631,6 +631,9 @@ struct pipe_inode_info *alloc_pipe_info(void)
if (pipe == NULL)
goto out_free_uid;
 
+   if (!capable(CAP_SYS_RESOURCE) && pipe_bufs * PAGE_SIZE > pipe_max_size)
+   pipe_bufs = pipe_max_size >> PAGE_SHIFT;
+
if (too_many_pipe_buffers_soft(atomic_long_read(&user->pipe_bufs)))
pipe_bufs = 1;
 
-- 
2.5.5



Seeking recommendation on whether a behaviour is right/wrong

2016-08-18 Thread Ajay Garg
Hi All.

I have been trying to debug a strange issue occurring on a "mostly
mainline"-linux-kernel, running on a proprietary embedded-platform.

I still haven't been able to zero in on the issue with 100% confirmation,
but I think the following might be happening ::

a)
A C-user-application is running, and a file is being written, one byte
at a time.
Let's say the file-name being written is "file1.txt"

b)
There's another file "file2.txt", which is in an absolutely sane state
(no open file-descriptors, etc.)

c)
Now, a cron-script reboots the machine via /sbin/reboot "abruptly"
(i.e. without closing the open-file-descriptor of "file1.txt").

d)
When the machine comes up, we find that "file2.txt" is corrupted.


In this behaviour, is the kernel at fault?
Or is the cron-job the culprit, because of the abrupt reboot?



Thanks and Regards,
Ajay


[PATCH 5/8] pipe: simplify logic in alloc_pipe_info()

2016-08-18 Thread Michael Kerrisk (man-pages)
Replace an 'if' block that covers most of the code in this
function with a 'goto'. This makes the code a little simpler
to read, and also simplifies the next patch.
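
The general shape of this change (early exits to cleanup labels
instead of one large 'if' block) can be sketched outside the kernel.
This is a minimal illustration, not the kernel code: struct box,
box_alloc() and box_free() are hypothetical names, and calloc()/free()
stand in for kzalloc()/kcalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>

struct box {
	int *slots;
	size_t nslots;
};

/* Returns a new box, or NULL on failure with nothing leaked. */
static struct box *box_alloc(size_t nslots)
{
	struct box *b;

	if (nslots == 0)
		return NULL;

	b = calloc(1, sizeof(*b));
	if (b == NULL)
		goto out;

	b->slots = calloc(nslots, sizeof(*b->slots));
	if (b->slots == NULL)
		goto out_free_box;

	b->nslots = nslots;
	return b;

	/* Cleanup labels undo the steps above in reverse order. */
out_free_box:
	free(b);
out:
	return NULL;
}

static void box_free(struct box *b)
{
	if (b) {
		free(b->slots);
		free(b);
	}
}
```

The success path stays at one indentation level, and a later step that
needs its own undo action only has to add one more label.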

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index a7470a9..613c6b9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -625,33 +625,34 @@ static bool too_many_pipe_buffers_hard(struct user_struct 
*user)
 struct pipe_inode_info *alloc_pipe_info(void)
 {
struct pipe_inode_info *pipe;
+   unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
+   struct user_struct *user = get_current_user();
 
pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
-   if (pipe) {
-   unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
-   struct user_struct *user = get_current_user();
-
-   if (!too_many_pipe_buffers_hard(user)) {
-   if (too_many_pipe_buffers_soft(user))
-   pipe_bufs = 1;
-   pipe->bufs = kcalloc(pipe_bufs,
-sizeof(struct pipe_buffer),
-GFP_KERNEL_ACCOUNT);
-   }
+   if (pipe == NULL)
+   goto out_free_uid;
+
+   if (!too_many_pipe_buffers_hard(user)) {
+   if (too_many_pipe_buffers_soft(user))
+   pipe_bufs = 1;
+   pipe->bufs = kcalloc(pipe_bufs,
+sizeof(struct pipe_buffer),
+GFP_KERNEL_ACCOUNT);
+   }
 
-   if (pipe->bufs) {
-   init_waitqueue_head(&pipe->wait);
-   pipe->r_counter = pipe->w_counter = 1;
-   pipe->buffers = pipe_bufs;
-   pipe->user = user;
-   account_pipe_buffers(user, 0, pipe_bufs);
-   mutex_init(&pipe->mutex);
-   return pipe;
-   }
-   free_uid(user);
-   kfree(pipe);
+   if (pipe->bufs) {
+   init_waitqueue_head(&pipe->wait);
+   pipe->r_counter = pipe->w_counter = 1;
+   pipe->buffers = pipe_bufs;
+   pipe->user = user;
+   account_pipe_buffers(user, 0, pipe_bufs);
+   mutex_init(&pipe->mutex);
+   return pipe;
}
 
+   kfree(pipe);
+out_free_uid:
+   free_uid(user);
return NULL;
 }
 
-- 
2.5.5



Re: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2

2016-08-18 Thread Dan Williams
On Thu, Aug 18, 2016 at 12:56 PM, Matthew Wilcox  wrote:
> Compared to a patch which removes 5 lines of code, almost any additional work 
> is ocean-boiling.
>

Did you check the state of NFIT enabling in Hyper-V?  Not patching the
Linux kernel at all is even less work.


[PATCH 7/8] pipe: make account_pipe_buffers() return a value, and use it

2016-08-18 Thread Michael Kerrisk (man-pages)
This is an optional patch, to provide a small performance improvement.
Alter account_pipe_buffers() so that it returns the new value in
user->pipe_bufs. This means that we can refactor too_many_pipe_buffers_soft()
and too_many_pipe_buffers_hard() to avoid the costs of repeated use of
atomic_long_read() to get the value user->pipe_bufs.
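
The idea can be modelled in user space with C11 atomics. This is a
sketch under stated assumptions, not the kernel implementation:
pipe_bufs_model, account_bufs_model() and too_many_model() are
hypothetical names, and atomic_fetch_add() plus the delta is used here
to play the role of the kernel's atomic_long_add_return().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-user counter user->pipe_bufs. */
static atomic_long pipe_bufs_model;

/*
 * Adjust the counter from 'old' to 'new_bufs' buffers and return the
 * resulting total. atomic_fetch_add() returns the previous value, so
 * the new total is that value plus the delta.
 */
static long account_bufs_model(long old, long new_bufs)
{
	long delta = new_bufs - old;

	return atomic_fetch_add(&pipe_bufs_model, delta) + delta;
}

/* Limit check on an already-fetched total; a limit of 0 means no limit. */
static int too_many_model(long num_bufs, long limit)
{
	assert(limit >= 0);
	return limit && num_bufs >= limit;
}
```

The caller keeps the value returned by the accounting step and passes
it to the limit checks, so the counter is touched once per operation
rather than once per check.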

Cc: Willy Tarreau 
Cc: Vegard Nossum 
Cc: socketp...@gmail.com
Cc: Tetsuo Handa 
Cc: Jens Axboe 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk 

---
 fs/pipe.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 705d79f..ada1777 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -604,22 +604,20 @@ pipe_fasync(int fd, struct file *filp, int on)
return retval;
 }
 
-static void account_pipe_buffers(struct user_struct *user,
+static unsigned long account_pipe_buffers(struct user_struct *user,
  unsigned long old, unsigned long new)
 {
-   atomic_long_add(new - old, &user->pipe_bufs);
+   return atomic_long_add_return(new - old, &user->pipe_bufs);
 }
 
-static bool too_many_pipe_buffers_soft(struct user_struct *user)
+static bool too_many_pipe_buffers_soft(unsigned long num_bufs)
 {
-   return pipe_user_pages_soft &&
-  atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_soft;
+   return pipe_user_pages_soft && num_bufs >= pipe_user_pages_soft;
 }
 
-static bool too_many_pipe_buffers_hard(struct user_struct *user)
+static bool too_many_pipe_buffers_hard(unsigned long num_bufs)
 {
-   return pipe_user_pages_hard &&
-  atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_hard;
+   return pipe_user_pages_hard && num_bufs >= pipe_user_pages_hard;
 }
 
 struct pipe_inode_info *alloc_pipe_info(void)
@@ -627,17 +625,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
struct pipe_inode_info *pipe;
unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
struct user_struct *user = get_current_user();
+   unsigned long num_bufs;
 
pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
if (pipe == NULL)
goto out_free_uid;
 
-   if (too_many_pipe_buffers_soft(user))
+   if (too_many_pipe_buffers_soft(atomic_long_read(&user->pipe_bufs)))
pipe_bufs = 1;
 
-   account_pipe_buffers(user, 0, pipe_bufs);
+   num_bufs = account_pipe_buffers(user, 0, pipe_bufs);
 
-   if (too_many_pipe_buffers_hard(user))
+   if (too_many_pipe_buffers_hard(num_bufs))
goto out_revert_acct;
 
pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
@@ -653,7 +652,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
}
 
 out_revert_acct:
-   account_pipe_buffers(user, pipe_bufs, 0);
+   (void) account_pipe_buffers(user, pipe_bufs, 0);
kfree(pipe);
 out_free_uid:
free_uid(user);
@@ -664,7 +663,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 {
int i;
 
-   account_pipe_buffers(pipe->user, pipe->buffers, 0);
+   (void) account_pipe_buffers(pipe->user, pipe->buffers, 0);
free_uid(pipe->user);
for (i = 0; i < pipe->buffers; i++) {
struct pipe_buffer *buf = pipe->bufs + i;
@@ -1035,6 +1034,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, 
unsigned long arg)
 {
struct pipe_buffer *bufs;
unsigned int size, nr_pages;
+   unsigned long num_bufs;
long ret = 0;
 
size = round_pipe_size(arg);
@@ -1043,7 +1043,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, 
unsigned long arg)
if (!nr_pages)
return -EINVAL;
 
-   account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
+   num_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
 
/*
 * If trying to increase the pipe capacity, check that an
@@ -1055,8 +1055,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, 
unsigned long arg)
if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
ret = -EPERM;
goto out_revert_acct;
-   } else if ((too_many_pipe_buffers_hard(pipe->user) ||
-   too_many_pipe_buffers_soft(pipe->user)) &&
+   } else if ((too_many_pipe_buffers_hard(num_bufs) ||
+   too_many_pipe_buffers_soft(num_bufs)) &&
!capable(CAP_SYS_RESOURCE) &&
!capable(CAP_SYS_ADMIN)) {
ret = -EPERM;
@@ -1110,7 +1110,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, 
unsigned long arg)
return nr_pages * PAGE_SIZE;
 
 out_revert_acct:
-   account_pipe_buffers(pipe->user, nr_pages, pipe->buffers);
+   (void) account_pipe_buffers(pipe->user, nr_pages, pipe->buffers);
return ret;
 }
 
-- 
2.5.5



Re: [RFC PATCH 0/3] UART slave device bus

2016-08-18 Thread Sebastian Reichel
Hi,

On Thu, Aug 18, 2016 at 06:08:24PM -0500, Rob Herring wrote:
> On Thu, Aug 18, 2016 at 3:29 PM, Sebastian Reichel  wrote:
> > Thanks for going forward and implementing this. I also started,
> > but was far from a functional state.
> >
> > On Wed, Aug 17, 2016 at 08:14:42PM -0500, Rob Herring wrote:
> >> Currently, devices attached via a UART are not well supported in
> >> the kernel. The problem is the device support is done in tty line
> >> disciplines, various platform drivers to handle some sideband, and
> >> in userspace with utilities such as hciattach.
> >>
> >> There have been several attempts to improve support, but they suffer from
> >> still being tied into the tty layer and/or abusing the platform bus. This
> >> is a prototype to show creating a proper UART bus for UART devices. It is
> >> tied into the serial core (really struct uart_port) below the tty layer
> >> in order to use existing serial drivers.
> >>
> >> This is functional with minimal testing using the loopback driver and
> >> pl011 (w/o DMA) UART under QEMU (modified to add a DT node for the slave
> >> device). It still needs lots of work and polish.
> >>
> >> TODOs:
> >> - Figure out the port locking. mutex plus spinlock plus refcounting? I'm
> >>   hoping all that complexity is from the tty layer and not needed here.
> >> - Split out the controller for uart_ports into separate driver. Do we see
> >>   a need for controller drivers that are not standard serial drivers?
> >> - Implement/test the removal paths
> >> - Fix the receive callbacks for more than one character at a time (i.e. DMA)
> >> - Need better receive buffering than just a simple circular buffer or
> >>   perhaps a different receive interface (e.g. direct to client buffer)?
> >> - Test with other UART drivers
> >> - Convert a real driver/line discipline over to UART bus.
> >>
> >> Before I spend more time on this, I'm looking mainly for feedback on the
> >> general direction and structure (the interface with the existing serial
> >> drivers in particular).
> >
> > I had a look at the uart_dev API:
> >
> > int uart_dev_config(struct uart_device *udev, int baud, int parity, int 
> > bits, int flow);
> > int uart_dev_connect(struct uart_device *udev);
> >
> >   The flow control configuration should be done separately. e.g.:
> >   uart_dev_flow_control(struct uart_device *udev, bool enable);
> 
> No objection, but out of curiosity, why?

Nokia's bluetooth uart protocol disables flow control during speed
changes.

> > int uart_dev_tx(struct uart_device *udev, u8 *buf, size_t count);
> > int uart_dev_rx(struct uart_device *udev, u8 *buf, size_t count);
> >
> >   UART communication does not have to be host-initiated, so this
> >   API requires polling. Either some function similar to poll in
> >   userspace is needed, or it should be implemented as callback.
> 
> What's the userspace need?

I meant "Either some function similar to userspace's poll() is
needed, ...". Something like uart_dev_wait_for_rx()

Alternatively the rx function could be a callback, that
is called when there is new data.

> I'm assuming the only immediate consumers are in-kernel.

Yes, but the driver should be notified about incoming data.
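
The callback variant could look roughly like the following user-space
sketch. It is not the API from the posted patches: struct
uart_dev_model, uart_model_set_rx_handler() and uart_model_push_rx()
are hypothetical names, used only to illustrate how a client driver
would be notified of incoming data instead of having to poll.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

struct uart_dev_model;

/* Client-supplied handler, called whenever new bytes have arrived. */
typedef void (*uart_rx_fn)(struct uart_dev_model *udev,
			   const u8 *buf, size_t count);

struct uart_dev_model {
	uart_rx_fn rx;		/* NULL if no client handler is set */
	void *client_data;	/* opaque pointer owned by the client */
};

static void uart_model_set_rx_handler(struct uart_dev_model *udev,
				      uart_rx_fn rx, void *client_data)
{
	assert(udev != NULL);
	udev->rx = rx;
	udev->client_data = client_data;
}

/* Called by the (modelled) controller side when data comes in. */
static void uart_model_push_rx(struct uart_dev_model *udev,
			       const u8 *buf, size_t count)
{
	if (udev->rx && count > 0)
		udev->rx(udev, buf, count);
}

/* Example client: just counts the bytes it has been handed. */
static void count_bytes(struct uart_dev_model *udev,
			const u8 *buf, size_t count)
{
	(void)buf;
	*(size_t *)udev->client_data += count;
}
```

A client would register count_bytes() (or its real handler) once, and
the controller side would call uart_model_push_rx() for each chunk of
received data.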

-- Sebastian


signature.asc
Description: PGP signature


Re: linux-next: build warnings after merge of the kbuild tree

2016-08-18 Thread Stephen Rothwell
Hi Nick,

On Fri, 19 Aug 2016 13:38:54 +1000 Stephen Rothwell  
wrote:
>
> On Thu, 18 Aug 2016 11:09:48 +1000 Nicholas Piggin  wrote:
> >
> > On Wed, 17 Aug 2016 14:59:59 +0200
> > Michal Marek  wrote:
> >   
> > > On 2016-08-17 03:44, Stephen Rothwell wrote:
> > > > 
> > > > After merging the kbuild tree, today's linux-next build (powerpc
> > > > ppc64_defconfig) produced these warnings:
> > > > 
> > > > WARNING: 25 bad relocations
> > > > c0cf2570 R_PPC64_ADDR64__crc___arch_hweight16  
> > > [...]
> > > > Introduced by commit
> > > > 
> > > >   9445aa1a3062 ("ppc: move exports to definitions")
> > > > 
> > > > I have reverted that commit for today.
> > > > 
> > > > [cc-ing the ppc guys for clues - also involved is commit
> > > > 
> > > >   22823ab419d8 ("EXPORT_SYMBOL() for asm")
> > > > ]  
> > > 
> > > FWIW, I see these warnings as well. Any help from ppc developers is
> > > appreciated - should the R_PPC64_ADDR64 be whitelisted for exported asm
> > > symbols (their CRCs actually)?
> > 
> > The dangling relocation is a side effect of linker unable to resolve the
> > reference to the undefined weak symbols. So the real question is, why has
> > genksyms not overridden these symbols with their CRC values?
> > 
> > This may not even be powerpc specific, but  I'll poke at it a bit more
> > when I get a chance.  
> 
> Not sure if this is relevant, but with the commit reverted, the
> __crc___... symbols are absolute.
> 
> f55b3b3d A __crc___arch_hweight16

Ignore that :-)

I just had a look at an x86_64 allmodconfig result and it looks like the
weak symbols are not resolved there either ...

I may be missing something, but genksyms generates the crc's off the
preprocessed C source code and we don't have any for the asm files ...
-- 
Cheers,
Stephen Rothwell


Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps

2016-08-18 Thread Sonny Rao
On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko  wrote:
> On Wed 17-08-16 11:57:56, Sonny Rao wrote:
>> On Wed, Aug 17, 2016 at 6:03 AM, Michal Hocko  wrote:
>> > On Wed 17-08-16 11:31:25, Jann Horn wrote:
> [...]
>> >> That's at least 30.43% + 9.12% + 7.66% = 47.21% of the task's kernel
>> >> time spent on evaluating format strings. The new interface
>> >> wouldn't have to spend that much time on format strings because there
>> >> isn't so much text to format.
>> >
>> > well, this is true of course but I would much rather try to reduce the
>> > overhead of smaps file than add a new file. The following should help
>> > already. I've measured ~7% systime cut down. I guess there is still some
>> > room for improvements but I have to say I'm far from being convinced about
>> > a new proc file just because we suck at dumping information to the
>> > userspace.
>> > If this was something like /proc/<pid>/stat which is
>> > essentially read all the time then it would be a different question but
>> > is the rss, pss going to be all that often? If yes why?
>>
>> If the question is why do we need to read RSS, PSS, Private_*, Swap
>> and the other fields so often?
>>
>> I have two use cases so far involving monitoring per-process memory
>> usage, and we usually need to read stats for about 25 processes.
>>
>> Here's a timing example on a fairly recent ARM system, a 4-core RK3288
>> running at 1.8 GHz
>>
>> localhost ~ # time cat /proc/25946/smaps > /dev/null
>>
>> real    0m0.036s
>> user    0m0.020s
>> sys     0m0.020s
>>
>> localhost ~ # time cat /proc/25946/totmaps > /dev/null
>>
>> real    0m0.027s
>> user    0m0.010s
>> sys     0m0.010s
>> localhost ~ #
>>
>> I'll ignore the user time for now, and we see about 20 ms of system
>> time with smaps and 10 ms with totmaps, with 20 similar processes it
>> would be 400 milliseconds of cpu time for the kernel to get this
>> information from smaps vs 200 milliseconds with totmaps.  Even totmaps
>> is still pretty slow, but much better than smaps.
>>
>> Use cases:
>> 1) Basic task monitoring -- like "top" that shows memory consumption
>> including PSS, Private, Swap
>> 1 second update means about 40% of one CPU is spent in the kernel
>> gathering the data with smaps
>
> I would argue that even 20% is way too much for such a monitoring. What
> is the value to do it so often that 20 vs 40ms really matters?

Yeah it is too much (I believe I said that) but it's significantly better.

>> 2) User space OOM handling -- we'd rather do a more graceful shutdown
>> than let the kernel's OOM killer activate and need to gather this
>> information and we'd like to be able to get this information to make
>> the decision much faster than 400ms
>
> Global OOM handling in userspace is really dubious if you ask me. I
> understand you want something better than SIGKILL and in fact this is
> already possible with memory cgroup controller (btw. memcg will give
> you a cheap access to rss, amount of shared, swapped out memory as
> well). Anyway if you are getting close to the OOM your system will most
> probably be really busy and chances are that also reading your new file
> will take much more time. I am also not quite sure how is pss useful for
> oom decisions.

I mentioned it before, but based on experience RSS just isn't good
enough -- there's too much sharing going on in our use case to make
the correct decision based on RSS.  If RSS were good enough, simply
put, this patch wouldn't exist.  So even with memcg I think we'd have
the same problem?

>
> Don't take me wrong, /proc/<pid>/totmaps might be suitable for your
> specific usecase but so far I haven't heard any sound argument for it to
> be generally usable. It is true that smaps is unnecessarily costly but
> at least I can see some room for improvements. A simple patch I've
> posted cut the formatting overhead by 7%. Maybe we can do more.

It seems like a general problem that if you want these values the
existing kernel interface can be very expensive, so it would be
generally usable by any application which wants a per process PSS,
private data, dirty data or swap value.   I mentioned two use cases,
but I guess I don't understand the comment about why it's not usable
by other use cases.
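
For context, this is roughly what an application has to do today to
get a per-process total out of smaps: read every line and add up the
field of interest. It is a minimal sketch; sum_smaps_field() is a
hypothetical helper name, and it assumes the usual "Field:  <n> kB"
line layout of smaps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field> <n> kB" line in an smaps-formatted stream and
 * return the total in kB. 'field' includes the trailing colon, for
 * example "Pss:", so that "Swap:" does not also match "SwapPss:".
 */
static unsigned long sum_smaps_field(FILE *f, const char *field)
{
	char line[256];
	size_t len = strlen(field);
	unsigned long total = 0, kb;

	assert(f != NULL);
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, field, len) == 0 &&
		    sscanf(line + len, "%lu", &kb) == 1)
			total += kb;
	}
	return total;
}
```

It would be used on a stream opened with something like
fopen("/proc/self/smaps", "r"). The kernel still has to format every
per-mapping line for this to work, which is the cost discussed above.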

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v3 1/3] devicetree: Sort vendor prefixes in alphabetical order

2016-08-18 Thread Rask Ingemann Lambertsen
On Fri, Aug 19, 2016 at 01:02:37AM +0200, Andrew Lunn wrote:
> > @@ -54,8 +54,8 @@ chipone   ChipOne
> >  chipspark  ChipSPARK
> >  chrp   Common Hardware Reference Platform
> >  chunghwa   Chunghwa Picture Tubes Ltd.
> > -ciaa   Computadora Industrial Abierta Argentina
> >  cirrus Cirrus Logic, Inc.
> > +ciaa   Computadora Industrial Abierta Argentina
> 
> ciaa comes after cirrus?

It does with LC_COLLATE=da_DK :-(
I'm sorry about that. I'll post v4 this afternoon.
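
For anyone wondering how that happens: Danish collation traditionally treats "aa" like "å", which sorts at the end of the alphabet, so a locale-aware comparison can put "ciaa" after "cirrus". A small sketch of the difference, assuming nothing beyond standard C; collate_compare is a name invented for this example, and the da_DK result depends on the locale data installed on the machine:

```c
#include <locale.h>
#include <string.h>

/*
 * Compare two strings under the named collation locale.
 * Returns <0, 0 or >0 like strcoll(). When the locale is not
 * installed, *ok is set to 0 and the return value is meaningless.
 * (collate_compare is a hypothetical helper, for illustration only.)
 */
int collate_compare(const char *locale_name, const char *a,
                    const char *b, int *ok)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return strcoll(a, b);
}
```

With "C" as the locale the comparison is plain byte order and "ciaa" sorts before "cirrus", which is the order the file wants; sorting with LC_ALL=C avoids the surprise.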

-- 
Rask Ingemann Lambertsen
Danish law requires addresses in e-mail to be logged and stored for a year


[GIT PULL] xfs, iomap: fixes for 4.8-rc3

2016-08-18 Thread Dave Chinner
Hi Linus,

Can you please pull the fixes from the tag list below? This update
contains fixes for most of the outstanding regressions introduced
with the 4.8-rc1 XFS and iomap infrastructure merge.

The only regression that isn't addressed by this pullreq is the aim7
write regression.  I'm still testing Christoph's patches that address
the simple cases we've reproduced, but the cause of the aim7
regression is still not clear so there's more work to be done there.
Still, that's no reason to hold up all the other issues we have
tested fixes for.

Thanks!

-Dave.

The following changes since commit 694d0d0bb2030d2e36df73e2d23d5770511dbc8d:

  Linux 4.8-rc2 (2016-08-14 19:11:36 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git tags/xfs-iomap-for-linus-4.8-rc3

for you to fetch changes up to 32438cf9d54bd53b531f6d98814e84dd278360c1:

  Merge branch 'iomap-fixes-4.8-rc3' into for-next (2016-08-17 11:13:37 +1000)


xfs, iomap: update for 4.8-rc3

Changes in this update:
- regression fixes for XFS changes introduced in 4.8-rc1
    - buffer IO accounting assert failure
    - ENOSPC block accounting reservation issue
    - DAX IO path page cache invalidation fix
    - rmapbt on-disk block count in agf
    - correct classification of rmap block type when updating AGFL.
    - iomap support for attribute fork mapping
- regression fixes for iomap infrastructure in 4.8-rc1
    - fiemap: honor FIEMAP_FLAG_SYNC
    - fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
    - make mark_page_accessed and pagefault_disable usage consistent with
      other IO paths

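For readers unfamiliar with the two fiemap flags named above, this is roughly how user space passes them through the FS_IOC_FIEMAP ioctl. It is only a sketch: count_extents is a name made up for this example, it merely counts extents, and whether the call succeeds depends on the filesystem behind the descriptor.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_* */

/*
 * Ask the kernel how many extents back the file open on fd.
 * With fm_extent_count == 0 the kernel only fills in
 * fm_mapped_extents and copies no extent records back.
 * Returns the extent count, or -1 on error (errno set by ioctl).
 * (count_extents is a hypothetical helper, for illustration only.)
 */
long count_extents(int fd, unsigned int flags)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm.fm_flags = flags;               /* e.g. FIEMAP_FLAG_SYNC */
    fm.fm_extent_count = 0;            /* count only */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -1;
    return (long)fm.fm_mapped_extents;
}
```

Passing FIEMAP_FLAG_SYNC asks for the file to be synced before mapping, and FIEMAP_FLAG_XATTR asks for the extended-attribute extents instead of the data extents.
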

Brian Foster (1):
  xfs: don't assert fail on non-async buffers on ioacct decrement

Christoph Hellwig (6):
  xfs: fix bogus space reservation in xfs_iomap_write_allocate
  iomap: remove superflous mark_page_accessed from iomap_write_actor
  iomap: remove superflous pagefault_disable from iomap_write_actor
  iomap: mark ->iomap_end as optional
  xfs: simplify xfs_file_iomap_begin
  xfs: (re-)implement FIEMAP_FLAG_XATTR

Darrick J. Wong (2):
  xfs: store rmapbt block count in the AGF
  xfs: remove OWN_AG rmap when allocating a block from the AGFL

Dave Chinner (4):
  xfs: don't invalidate whole file on DAX read/write
  iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
  iomap: prepare iomap_fiemap for attribute mappings
  Merge branch 'iomap-fixes-4.8-rc3' into for-next

 fs/iomap.c | 21 -
 fs/xfs/libxfs/xfs_alloc.c  | 14 +
 fs/xfs/libxfs/xfs_format.h | 11 +--
 fs/xfs/libxfs/xfs_rmap_btree.c |  6 
 fs/xfs/xfs_buf.c   |  1 -
 fs/xfs/xfs_file.c  | 13 +++-
 fs/xfs/xfs_fsops.c |  1 +
 fs/xfs/xfs_iomap.c | 69 ++
 fs/xfs/xfs_iomap.h |  1 +
 fs/xfs/xfs_iops.c  |  9 +-
 fs/xfs/xfs_trace.h |  1 -
 11 files changed, 119 insertions(+), 28 deletions(-)
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 1/2] pipe: check limits only when increasing pipe capacity

2016-08-18 Thread Michael Kerrisk (man-pages)
Andrew,

Thanks for picking up this patch series in -mm. Please drop it.
After discussions with Vegard, I have something better now.

Cheers,

Michael

On 08/16/2016 11:10 PM, Michael Kerrisk (man-pages) wrote:
> When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various
> limits defined by /proc/sys/fs/pipe-* files are checked to see
> if unprivileged users are exceeding limits on memory consumption.
> 
> While documenting and testing the operation of these limits I noticed
> that, as currently implemented, these checks can lead to cases where
> a user can increase a pipe's capacity and is then unable to decrease
> the capacity. The origin of the problem is two-fold:
> 
> (1) When increasing the pipe capacity, the checks against the limits
> in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
> existing consumption, and exclude the memory required for the
> increased pipe capacity. The new increase in pipe capacity
> can then push the total memory used by the user for pipes
> (possibly far) over a limit.
> 
> (2) The limit checks are performed even when the new pipe capacity
> is less than the existing pipe capacity. This can lead to
> problems if a user sets a large pipe capacity, and then the
> limits are lowered, with the result that the user will no
> longer be able to decrease the pipe capacity.
> 
> The simple solution given by this patch is to perform the checks
> only when the pipe capacity is being increased. The patch does not
> address the broken check in (1), which allows a user to (one-time)
> set a pipe capacity that pushes the user's consumption over the user
> pipe limits. A change to fix that check is proposed in a subsequent
> patch. I've separated the two fixes because the second fix is a
> little more complex, and could possibly (though unlikely) break
> existing user-space. The current patch implements the simple fix
> that carries little risk and seems obviously correct: allowing an
> unprivileged user always to decrease a pipe's capacity.
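
The difference between the two behaviours described in the quoted text can be modelled in a few lines. This is a simplified user-space sketch, not the kernel code: the struct and function names are invented for the example, all values are in pages, and the capability checks that exempt privileged users are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-user accounting state (values in pages). */
struct pipe_limits {
    unsigned long user_pages;  /* pages the user currently holds in pipes */
    unsigned long soft_limit;  /* 0 means "no limit" */
    unsigned long hard_limit;  /* 0 means "no limit" */
};

/* True when current usage has already reached either configured limit. */
static bool over_limit(const struct pipe_limits *l)
{
    return (l->soft_limit && l->user_pages >= l->soft_limit) ||
           (l->hard_limit && l->user_pages >= l->hard_limit);
}

/* Behaviour described as the problem: limits are checked on every resize. */
bool resize_allowed_before(const struct pipe_limits *l,
                           unsigned long old_pages, unsigned long new_pages)
{
    (void)old_pages;
    (void)new_pages;
    return !over_limit(l);
}

/* Behaviour described as the fix: only a capacity increase is checked. */
bool resize_allowed_after(const struct pipe_limits *l,
                          unsigned long old_pages, unsigned long new_pages)
{
    if (new_pages <= old_pages)
        return true;
    return !over_limit(l);
}
```

With 4096 pages in use and the hard limit lowered to 1000, a shrink from 4096 to 2048 pages is refused by the first function and allowed by the second, while a further increase is still refused by both.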
> 
> The program below can be used to demonstrate the problem, and the
> effect of the fix. The program takes one or more command-line
> arguments. The first argument specifies the number of pipes
> that the program should create. The remaining arguments are,
> alternately, pipe capacities that should be set using
> fcntl(F_SETPIPE_SZ), and sleep intervals (in seconds) between
> the fcntl() operations. (The sleep intervals allow the possibility
> to change the limits between fcntl() operations.)
> 
> Running this program on an unpatched kernel, we first set some limits:
> 
>     # getconf PAGESIZE
>     4096
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 1000000000 > /proc/sys/fs/pipe-max-size
>     # echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
> 
> Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
> first setting a pipe capacity (10MB), sleeping for a few seconds,
> during which time the hard limit is lowered, and then set pipe
> capacity to a smaller amount (5MB):
> 
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
>     [1] 748
>     # Loop 1: set pipe capacity to 10000000 bytes
>         F_SETPIPE_SZ returned 16777216
>         Sleeping 15 seconds
> 
>     # echo 1000 > /proc/sys/fs/pipe-user-pages-hard     # 4.096 MB
> 
>     # Loop 2: set pipe capacity to 5000000 bytes
>         Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
> 
> In this case, the user should be able to lower the limit.
> 
> With a kernel that has the patch below, the second fcntl() succeeds:
> 
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 1000000000 > /proc/sys/fs/pipe-max-size
>     # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
>     [1] 3215
>     # Loop 1: set pipe capacity to 10000000 bytes
>         F_SETPIPE_SZ returned 16777216
>         Sleeping 15 seconds
> 
>     # echo 1000 > /proc/sys/fs/pipe-user-pages-hard
> 
>     # Loop 2: set pipe capacity to 5000000 bytes
>         F_SETPIPE_SZ returned 8388608
> 
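
The returned capacities in the sessions above are larger than the requested 10MB and 5MB. That is consistent with the request being converted to whole pages and the page count then being rounded up to a power of two. A sketch of that arithmetic, assuming the 4096-byte page size shown by getconf; rounded_pipe_capacity is a name invented for this example, not a kernel function:

```c
#include <stddef.h>

/*
 * Model of the capacity rounding: the byte count is converted to whole
 * pages and the page count is rounded up to a power of two.
 * page_size is assumed to be a power of two (4096 in the quoted session).
 * (rounded_pipe_capacity is a hypothetical helper, for illustration only.)
 */
size_t rounded_pipe_capacity(size_t bytes, size_t page_size)
{
    size_t pages = (bytes + page_size - 1) / page_size;
    size_t pow2 = 1;

    while (pow2 < pages)
        pow2 *= 2;
    return pow2 * page_size;
}
```

A request of 10000000 bytes needs 2442 pages, which rounds up to 4096 pages or 16777216 bytes; 5000000 bytes needs 1221 pages, which rounds up to 2048 pages or 8388608 bytes. Both match the F_SETPIPE_SZ return values quoted above.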
> 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
> 
> /* test_F_SETPIPE_SZ.c
> 
>    (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later
> 
>    Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
>    and interactions with limits defined by /proc/sys/fs/pipe-* files.
> */
> 
> #define _GNU_SOURCE             /* for F_SETPIPE_SZ */
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> 
> int
> main(int argc, char *argv[])
> {
>     int (*pfd)[2];
>     int npipes;
>     int pcap, rcap;
>     int j, p, s, stime, loop;
> 
>     if (argc < 2) {
>         fprintf(stderr, "Usage: %s num-pipes "
>                 "[pipe-capacity sleep-time]...\n", argv[0]);
>         exit(EXIT_FAILURE);
>     }
> 
>     npipes = atoi(argv[1]);
> 
>     pfd = calloc(npipes, sizeof (int [2]));
>     if (pfd == NULL) {
>         perror("calloc");
>         exit(EXIT_FAILURE);
> 

Re: [PATCH 2/2] pipe: make pipe user buffer limit checks more precise

2016-08-18 Thread Michael Kerrisk (man-pages)
Andrew,

Thanks for picking up this patch series in -mm. Please drop it.
After discussions with Vegard, I have something better now.

Cheers,

Michael


On 08/16/2016 11:14 PM, Michael Kerrisk (man-pages) wrote:
> As currently implemented, when creating a new pipe or increasing
> a pipe's capacity with fcntl(F_SETPIPE_SZ), the checks against
> the limits in /proc/sys/fs/pipe-user-pages-{soft,hard} (added by
> commit 759c01142a5d0) do not include the pages required for the
> new pipe or increased capacity.  In the case of fcntl(F_SETPIPE_SZ),
> this means that an unprivileged user can make a one-time capacity
> increase that pushes the user consumption over the limits by up
> to the value specified in /proc/sys/fs/pipe-max-size (which
> defaults to 1 MiB, but might be set to a much higher value).
> 
> This patch remedies the problem by including the capacity required
> for the new pipe or the pipe capacity increase in the check against
> the limit.
> 
> There is a small chance that this change could break user-space,
> since there are cases where pipe() and fcntl(F_SETPIPE_SZ) calls
> that previously succeeded might fail. However, the chances are
> small, since (a) the pipe-user-pages-{soft,hard} limits are new
> (in 4.5), and (b) the default soft/hard limits are high/unlimited.
> Therefore, it seems warranted to make these limits operate more
> precisely (and behave more like what users probably expect).
> 
> Using the test program shown in the previous patch, on an unpatched
> kernel, we first set some limits:
> 
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 1000000000 > /proc/sys/fs/pipe-max-size
>     # echo 10000 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
> 
> Then show that we can set a pipe with capacity (100MB) that is
> over the hard limit
> 
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
>     Loop 1: set pipe capacity to 100000000 bytes
>         F_SETPIPE_SZ returned 134217728
> 
> Now set the capacity to 100MB twice. The second call fails (which is
> probably surprising to most users, since it seems like a no-op):
> 
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000 0 100000000
>     Loop 1: set pipe capacity to 100000000 bytes
>         F_SETPIPE_SZ returned 134217728
>     Loop 2: set pipe capacity to 100000000 bytes
>         Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
> 
> With a patched kernel, setting a capacity over the limit fails at the
> first attempt:
> 
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 1000000000 > /proc/sys/fs/pipe-max-size
>     # echo 10000 > /proc/sys/fs/pipe-user-pages-hard
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 100000000
>     Loop 1: set pipe capacity to 100000000 bytes
>         Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
> 
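
The two forms of the limit check that appear as removed and added lines in the diff quoted further down can be modelled outside the kernel as below. This is a sketch only: plain integers stand in for the atomic counter and the sysctl values, the function names are invented, and how nr_pages is computed for a resize is not visible in the quoted part of the patch, so the call sites are illustrative.

```c
#include <stdbool.h>

/* Check as in the removed ("-") lines: only current usage is compared. */
bool too_many_before(unsigned long limit, unsigned long in_use)
{
    return limit && in_use >= limit;
}

/* Check as in the added ("+") lines: the requested pages count as well. */
bool too_many_after(unsigned long limit, unsigned long in_use,
                    unsigned long nr_pages)
{
    return limit && in_use + nr_pages >= limit;
}
```

Take a hard limit of 10000 pages, a user holding 16 pages (one default-sized pipe is assumed here) and a request for 32768 pages. The old check sees 16 < 10000 and lets the request through; once the user holds 32768 pages the same check refuses everything, including the seemingly no-op second call in the session above. The new check refuses the first request.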
> Cc: Willy Tarreau 
> Cc: Vegard Nossum 
> Cc: socketp...@gmail.com
> Cc: Tetsuo Handa 
> Cc: Jens Axboe 
> Cc: Al Viro 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Michael Kerrisk 
> ---
>  fs/pipe.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index a98ebca..397d8d9 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -610,16 +610,20 @@ static void account_pipe_buffers(struct pipe_inode_info *pipe,
>  	atomic_long_add(new - old, &pipe->user->pipe_bufs);
>  }
>  
> -static bool too_many_pipe_buffers_soft(struct user_struct *user)
> +static bool too_many_pipe_buffers_soft(struct user_struct *user,
> +				       unsigned int nr_pages)
>  {
>  	return pipe_user_pages_soft &&
> -	       atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_soft;
> +	       atomic_long_read(&user->pipe_bufs) + nr_pages >=
> +			pipe_user_pages_soft;
>  }
>  
> -static bool too_many_pipe_buffers_hard(struct user_struct *user)
> +static bool too_many_pipe_buffers_hard(struct user_struct *user,
> +				       unsigned int nr_pages)
>  {
>  	return pipe_user_pages_hard &&
> -	       atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_hard;
> +	       atomic_long_read(&user->pipe_bufs) + nr_pages >=
> +			pipe_user_pages_hard;
>  }
>  
>  struct pipe_inode_info *alloc_pipe_info(void)
> @@ -631,13 +635,13 @@ struct pipe_inode_info *alloc_pipe_info(void)
>  	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
>  	struct user_struct *user = get_current_user();
>  
> -	if (!too_many_pipe_buffers_hard(user)) {
> -		if (too_many_pipe_buffers_soft(user))
> -			pipe_bufs = 1;
> +	if (too_many_pipe_buffers_soft(user, PIPE_DEF_BUFFERS))
> +		pipe_bufs = 1;
> +
> +	if (!too_many_pipe_buffers_hard(user, pipe_bufs))
>  		pipe->bufs = kcalloc(pipe_bufs,
>  				     sizeof(struct 

RE: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2

2016-08-18 Thread Matthew Wilcox
Yes, but this actually *removes a bug* in the Linux kernel; if any memory 
resource is left to be set up later, it is currently not set up on x86 machines 
which don't have PCI busses.  That's not very many x86 systems, I'll agree, but 
I'm sure some enterprising person is busy creating an SoC which lacks PCI.

-----Original Message-----
From: Dan Williams [mailto:dan.j.willi...@intel.com] 
Sent: Thursday, August 18, 2016 4:17 PM
To: Matthew Wilcox 
Cc: X86 ML ; linux-kernel@vger.kernel.org; 
linux-nvd...@lists.01.org
Subject: Re: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2

On Thu, Aug 18, 2016 at 12:56 PM, Matthew Wilcox  wrote:
> Compared to a patch which removes 5 lines of code, almost any additional work 
> is ocean-boiling.
>

Did you check the state of NFIT enabling in Hyper-V?  Not patching the Linux 
kernel at all is even less work.


Re: [PATCH] sched: fix incorrect PELT values on SMT

2016-08-18 Thread Steve Muckle
On Fri, Aug 19, 2016 at 10:30:36AM +0800, Wanpeng Li wrote:
> 2016-08-19 9:55 GMT+08:00 Steve Muckle :
> > PELT scales its util_sum and util_avg values via
> > arch_scale_cpu_capacity(). If that function is passed the CPU's sched
> > domain then it will reduce the scaling capacity if SD_SHARE_CPUCAPACITY
> > is set. PELT does not pass in the sd however. The other caller of
> > arch_scale_cpu_capacity, update_cpu_capacity(), does. This means
> > util_sum and util_avg scale beyond the CPU capacity on SMT.
> >
> > On an Intel i7-3630QM for example rq->cpu_capacity_orig is 589 but
> > util_avg scales up to 1024.
> >
> > Fix this by passing in the sd in __update_load_avg() as well.
> 
> I believe we noticed this at least several months ago.
> https://lkml.org/lkml/2016/5/25/228

Glad to see I'm not alone in thinking this is an issue.

It causes an issue with schedutil, effectively doubling the apparent
demand on SMT. I don't know the load balance code well enough offhand to
say whether it's an issue there.
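
To make the numbers concrete, here is a user-space model of the capacity calculation as described in the patch text above. It is a sketch under assumptions: the struct is a cut-down stand-in for the real sched domain, the flag value is arbitrary, and the smt_gain of 1178 is an assumed value chosen because two siblings then get the 589 quoted in the thread.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL
#define SD_SHARE_CPUCAPACITY 0x0080  /* flag value chosen for this model */

/* Reduced stand-in for the scheduler-domain fields the calculation needs. */
struct sched_domain_model {
    unsigned int flags;
    unsigned int span_weight;  /* number of CPUs (SMT siblings) in the domain */
    unsigned long smt_gain;    /* combined capacity credited to the siblings */
};

/*
 * Capacity used for scaling: full scale when no domain is passed,
 * otherwise the SMT gain shared between the siblings.
 */
unsigned long scale_cpu_capacity(const struct sched_domain_model *sd)
{
    if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && sd->span_weight > 1)
        return sd->smt_gain / sd->span_weight;
    return SCHED_CAPACITY_SCALE;
}
```

With no domain passed the result is 1024, while the same CPU described with an SMT domain gets 589, so utilization scaled by the former can exceed the capacity computed with the latter by a factor of roughly 1.7.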

cheers,
Steve


[x86/mm] e1a58320a3: WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page()

2016-08-18 Thread kernel test robot
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit e1a58320a38dfa72be48a0f1a3a92273663ba6db
Author: Stephen Smalley 
AuthorDate: Mon Oct 5 12:55:20 2015 -0400
Commit: Ingo Molnar 
CommitDate: Tue Oct 6 11:11:48 2015 +0200

x86/mm: Warn on W^X mappings

Warn on any residual W+X mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success it prints this to the kernel log:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure it prints a warning and a count of the failed pages:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
  x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
  [...]
  Call Trace:
   [] dump_stack+0x44/0x55
   [] warn_slowpath_common+0x82/0xc0
   [] warn_slowpath_fmt+0x5c/0x80
   [] ? note_page+0x5c9/0x7b0
   [] note_page+0x610/0x7b0
   [] ptdump_walk_pgd_level_core+0x259/0x3c0
   [] ptdump_walk_pgd_level_checkwx+0x17/0x20
   [] mark_rodata_ro+0xf5/0x100
   [] ? rest_init+0x80/0x80
   [] kernel_init+0x1d/0xe0
   [] ret_from_fork+0x3f/0x70
   [] ? rest_init+0x80/0x80
  ---[ end trace a1f23a1e42a2ac76 ]---
  x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.
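
The test behind the warning and the page count above comes down to "writable and not marked no-execute". A user-space sketch of that predicate, assuming the x86-64 page-table bit positions for the writable and no-execute bits; the macro and function names are invented for the example and this is not the code from dump_pagetables.c:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_RW (UINT64_C(1) << 1)   /* x86 page-table "writable" bit */
#define PAGE_NX (UINT64_C(1) << 63)  /* x86-64 "no-execute" bit */

/* A mapping violates W^X when it is writable and not marked no-execute. */
bool is_wx(uint64_t prot)
{
    return (prot & PAGE_RW) && !(prot & PAGE_NX);
}

/* Count the W+X entries in an array of protection values, one per page. */
unsigned long count_wx_pages(const uint64_t *prot, unsigned long n)
{
    unsigned long found = 0;

    for (unsigned long i = 0; i < n; i++)
        if (is_wx(prot[i]))
            found++;
    return found;
}
```

A read-only executable page and a writable non-executable page both pass; only the combination of the two is counted.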

Signed-off-by: Stephen Smalley 
Acked-by: Kees Cook 
Cc: Andy Lutomirski 
Cc: Arjan van de Ven 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-...@tycho.nsa.gov
[ Improved the Kconfig help text and made the new option default-y
  if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
  so we really want people to have this on by default. ]
Signed-off-by: Ingo Molnar 

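+--------------------------------------------------------+------------+------------+------+
|                                                        | 38a413cbc2 | e1a58320a3 | v4.4 |
+--------------------------------------------------------+------------+------------+------+
| boot_successes                                         | 63         | 0          | 0    |
| boot_failures                                          | 0          | 22         | 45   |
| WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page()  | 0          | 22         | 45   |
| calltrace:mark_rodata_ro                               | 0          | 22         | 45   |
+--------------------------------------------------------+------------+------------+------+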

[   50.648376] debug: unmapping init [mem 0x8800139e9000-0x8800139f]
[   50.652158] debug: unmapping init [mem 0x880013d38000-0x880013df]
[   50.654923] ------------[ cut here ]------------
[   50.655544] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x334/0x340()
[   50.664908] x86/mm: Found insecure W+X mapping at address c00f6000/0xc00f6000
[   50.665893] Modules linked in:
[   50.666282] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-00013-ge1a5832 #1
[   50.667144] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[   50.680247]  00e1 8819fce8 93698935 
88198000
[   50.681279]  8819fd38 8819fd28 93495f2d 

[   50.682318]  8819fe88   
0004
[   50.683342] Call Trace:
[   50.683668]  [] dump_stack+0x4c/0x67
[   50.690347]  [] warn_slowpath_common+0x8d/0xd0
[   50.691179]  [] warn_slowpath_fmt+0x41/0x50
[   50.696101]  [] note_page+0x334/0x340
[   50.696723]  [] walk_pmd_level+0x13a/0x1c0
[   50.697382]  [] walk_pud_level+0xfe/0x110
[   50.698034]  [] ptdump_walk_pgd_level_core+0xb1/0x130
[   50.698788]  [] ptdump_walk_pgd_level_checkwx+0x12/0x20
[   50.699680]  [] mark_rodata_ro+0xec/0x100
[   50.708648]  [] ? rest_init+0x150/0x150
[   50.709400]  [] kernel_init+0x18/0xe0
[   50.712290]  [] ret_from_fork+0x3f/0x70
[   50.712991]  [] ? rest_init+0x150/0x150
[   50.713686] ---[ end trace 77c60916b05835a9 ]---
[   50.714324] x86/mm: Checked W+X mappings: FAILED, 2 W+X pages found.
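
For readers following the trace, the check that produces the "FAILED, N W+X pages found" line can be sketched roughly as below. This is a minimal userspace illustration, not the code in arch/x86/mm/dump_pagetables.c: the `sketch_` names, the range structure and the fixed 4 KiB page size are assumptions made for the example. Only the two flag bits (bit 1 for writable, bit 63 for no-execute on x86-64) are taken from the architecture. A range counts as W+X when it is writable and NX is clear, and the page count is the range length divided by the page size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the kernel uses its own macros. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_RW   (1ULL << 1)   /* x86 page-table "writable" bit */
#define SKETCH_PAGE_NX   (1ULL << 63)  /* x86-64 "no-execute" bit */

/* One contiguous run of pages sharing the same protection bits. */
struct sketch_range {
    uint64_t start;  /* first address in the range */
    uint64_t end;    /* one past the last address in the range */
    uint64_t prot;   /* effective protection bits for the range */
};

/* A mapping is W+X when it is writable and not marked no-execute. */
static bool sketch_is_wx(uint64_t prot)
{
    return (prot & SKETCH_PAGE_RW) && !(prot & SKETCH_PAGE_NX);
}

/* Sum the number of pages covered by all W+X ranges. */
static unsigned long sketch_count_wx_pages(const struct sketch_range *r,
                                           size_t n)
{
    unsigned long pages = 0;

    for (size_t i = 0; i < n; i++) {
        if (sketch_is_wx(r[i].prot))
            pages += (r[i].end - r[i].start) / SKETCH_PAGE_SIZE;
    }
    return pages;
}
```

Under these assumptions, a single writable range of 0x2000 bytes with NX clear would be reported as 2 W+X pages, which is the shape of the count in the dmesg above.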

git bisect start v4.4 v4.3 --
git bisect  bad cd6caf550a2adc763c6301ecc0be01f422fb2aea  # 10:51      0-     17  Merge tag