Re: [PATCH v6 0/5] /dev/random - a new approach
On Thu, Aug 18, 2016 at 10:49:47PM -0400, Theodore Ts'o wrote:
>
> That really depends on the system. We can't assume that people are
> using systems with a 100Hz clock interrupt. More often than not
> people are using tickless kernels these days. That's actually the
> problem with changing /dev/urandom to block until things are
> initialized.

Couldn't we disable tickless until urandom has been seeded? In fact
perhaps we should accelerate the timer interrupt rate until it has
been seeded?

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH-tip v4 07/10] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation
2016-08-19 5:11 GMT+08:00 Waiman Long:
> When the count value is in between 0 and RWSEM_WAITING_BIAS, there
> are 2 possibilities.
> Either a writer is present and there is no waiter

count = 0x0001

> or there are waiters and readers. There is no easy way to

count = 0x000X

However, RWSEM_WAITING_BIAS is equal to 0x, so both these two cases
are beyond RWSEM_WAITING_BIAS, right?

Regards,
Wanpeng Li
Re: [PATCH 4/8] pipe: fix limit checking in pipe_set_size()
Hi Michael,

Since you're changing this code, it's probably worth swapping the size
check and capable() below to save a function call in the normal path:

On Fri, Aug 19, 2016 at 05:25:35PM +1200, Michael Kerrisk (man-pages) wrote:
> +	if (nr_pages > pipe->buffers) {
> +		if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {

=>		if (size > pipe_max_size && !capable(CAP_SYS_RESOURCE)) {

> +			ret = -EPERM;
> +			goto out_revert_acct;
> +		} else if ((too_many_pipe_buffers_hard(pipe->user) ||
> +				too_many_pipe_buffers_soft(pipe->user)) &&
> +				!capable(CAP_SYS_RESOURCE) &&
> +				!capable(CAP_SYS_ADMIN)) {
> +			ret = -EPERM;
> +			goto out_revert_acct;
> +		}
> +	}
(...)

Cheers,
Willy
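To illustrate the reordering suggested above, here is a minimal user-space
sketch of how C's short-circuit evaluation of && skips the right-hand
operand when the left one is false. fake_capable() and the two helper
functions are hypothetical stand-ins written for this illustration, not
kernel code; the real capable() takes a capability argument and consults
the caller's credentials.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts how often the stand-in privilege check runs. */
static unsigned int capable_calls;

/* Hypothetical stand-in for the kernel's capable(); always denies. */
static bool fake_capable(void)
{
	capable_calls++;
	return false;
}

/* Original order: the privilege check runs on every call. */
static bool over_limit_check_first(size_t size, size_t max)
{
	return !fake_capable() && size > max;
}

/*
 * Suggested order: the cheap comparison comes first, so
 * fake_capable() is only reached when size exceeds max.
 */
static bool over_limit_size_first(size_t size, size_t max)
{
	return size > max && !fake_capable();
}
```

Both functions return the same result for the same inputs; only the
number of privilege checks differs. In the common case (a requested size
within the limit) the second form never calls the check at all.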
Re: [PATCH v3] mm/slab: Improve performance of gathering slabinfo stats
On 08/18/2016 04:52 AM, Michal Hocko wrote:
> I am not opposing the patch (to be honest it is quite neat) but this
> is buggering me for quite some time. Sorry for hijacking this email
> thread but I couldn't resist. Why are we trying to optimize SLAB and
> slowly converge it to SLUB feature-wise. I always thought that SLAB
> should remain stable and time challenged solution which works
> reasonably well for many/most workloads, while SLUB is an optimized
> implementation which experiment with slightly different concepts that
> might boost the performance considerably but might also surprise from
> time to time. If this is not the case then why do we have both of
> them in the kernel. It is a lot of code and some features need
> tweaking both while only one gets testing coverage. So this is mainly
> a question for maintainers. Why do we maintain both and what is the
> purpose of them.

Michal,

Speaking about this patch specifically - I'm not trying to optimize
SLAB or make it more similar to SLUB. This patch is a bug fix for an
issue where the slowness of 'cat /proc/slabinfo' caused timeouts in
other drivers. While optimizing that flow, it became apparent (as
Christoph pointed out) that one could converge this patch to SLUB's
current implementation. Though I have not done that in this patch
(because that warrants a separate patch), I think it makes sense to
converge where appropriate, since they both do share some common data
structures and code already.

Thanks,
Aruna
Re: [PATCH] dmaengine: qcom_hidma: release the descriptor before the callback
On Thu, Aug 18, 2016 at 11:48:52PM -0400, Sinan Kaya wrote:
> On 8/18/2016 11:42 PM, Vinod Koul wrote:
> > On Thu, Aug 18, 2016 at 11:26:28PM -0400, Sinan Kaya wrote:
> >> On 8/18/2016 10:48 PM, Vinod Koul wrote:
> >>>> Keep a size limited list with error cookies and flush them in
> >>>> terminate all?
> >>> I think so, terminate_all anyway cleans up the channel. Btw what is the
> >>> behaviour on error? Do you terminate or somthing else?
> >>>
> >>
> >> On error, I flush all outstanding transactions with an error code and I
> >> reset the channel. After the reset, the DMA channel is functional again.
> >> The client doesn't need to shutdown anything.
> >
> > You mean from the client context or driver?
> >
>
> The client doesn't need to call device_free_chan_resources and
> device_terminate_all to be specific. Client can certainly call these if
> it needs to but it is not required to recover the channel.

You didn't answer my question! On error you said you flush, so who does
that?

> After the reset in error condition, the client can continue issuing new
> requests with tx_submit and device_issue_pending as usual.

-- 
~Vinod
[PATCH 3/8] pipe: refactor argument for account_pipe_buffers()
This is a preparatory patch for following work. account_pipe_buffers()
performs accounting in the 'user_struct'. There is no need to pass a
pointer to a 'pipe_inode_info' struct (which is then dereferenced to
obtain a pointer to the 'user' field). Instead, pass a pointer directly
to the 'user_struct'. This change is needed in preparation for
subsequent patches (and the resulting code is a little more logical).

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 4b98fd0..37b7f5e 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -604,10 +604,10 @@ pipe_fasync(int fd, struct file *filp, int on)
 	return retval;
 }

-static void account_pipe_buffers(struct pipe_inode_info *pipe,
+static void account_pipe_buffers(struct user_struct *user,
 				 unsigned long old, unsigned long new)
 {
-	atomic_long_add(new - old, &pipe->user->pipe_bufs);
+	atomic_long_add(new - old, &user->pipe_bufs);
 }

 static bool too_many_pipe_buffers_soft(struct user_struct *user)
@@ -644,7 +644,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
 			pipe->r_counter = pipe->w_counter = 1;
 			pipe->buffers = pipe_bufs;
 			pipe->user = user;
-			account_pipe_buffers(pipe, 0, pipe_bufs);
+			account_pipe_buffers(user, 0, pipe_bufs);
 			mutex_init(&pipe->mutex);
 			return pipe;
 		}
@@ -659,7 +659,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 {
 	int i;

-	account_pipe_buffers(pipe, pipe->buffers, 0);
+	account_pipe_buffers(pipe->user, pipe->buffers, 0);
 	free_uid(pipe->user);
 	for (i = 0; i < pipe->buffers; i++) {
 		struct pipe_buffer *buf = pipe->bufs + i;
@@ -1080,7 +1080,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 			memcpy(bufs + head, pipe->bufs, tail * sizeof(struct pipe_buffer));
 	}

-	account_pipe_buffers(pipe, pipe->buffers, nr_pages);
+	account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
 	pipe->curbuf = 0;
 	kfree(pipe->bufs);
 	pipe->bufs = bufs;
-- 
2.5.5
Re: [PATCH v10 8/9] arm: dts: mt2701: Add clock controller device nodes
On Thu, 2016-08-18 at 17:18 -0700, Stephen Boyd wrote:
> On 08/16, Erin Lo wrote:
> > From: James Liao
> >
> > Add clock controller nodes for MT2701, include topckgen, infracfg,
> > pericfg, apmixedsys, mmsys, imgsys, vdecsys, hifsys, ethsys and
> > bdpsys. This patch also add two oscillators that provide clocks for
> > MT2701.
> >
> > Signed-off-by: James Liao
> > Signed-off-by: Erin Lo
> > ---
>
> This should go through arm-soc tree, so do you need a stable
> branch in clk tree to pull through arm-soc, or are we going to
> wait a release cycle on the dts patches?

Hi Stephen,

I prefer to wait a release cycle. We may merge clk driver first, then
merge dts patches in next kernel release.

Best regards,
James
[PATCH 4/8] pipe: fix limit checking in pipe_set_size()
The limit checking in pipe_set_size() (used by fcntl(F_SETPIPE_SZ))
has the following problems:

(1) When increasing the pipe capacity, the checks against the limits
    in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
    existing consumption, and exclude the memory required for the
    increased pipe capacity. The new increase in pipe capacity can
    then push the total memory used by the user for pipes (possibly
    far) over a limit. This can also trigger the problem described
    next.

(2) The limit checks are performed even when the new pipe capacity
    is less than the existing pipe capacity. This can lead to
    problems if a user sets a large pipe capacity, and then the
    limits are lowered, with the result that the user will no
    longer be able to decrease the pipe capacity.

(3) As currently implemented, accounting and checking against the
    limits is done as follows:

    (a) Test whether the user has exceeded the limit.
    (b) Make new pipe buffer allocation.
    (c) Account new allocation against the limits.

    This is racey. Multiple processes may pass point (a)
    simultaneously, and then allocate pipe buffers that are
    accounted for only in step (c). The race means that the user's
    pipe buffer allocation could be pushed over the limit (by an
    arbitrary amount, depending on how unlucky we were in the
    race). [Thanks to Vegard Nossum for spotting this point, which
    I had missed.]

This patch addresses the above problems as follows:

* Perform checks against the limits only when increasing a pipe's
  capacity; an unprivileged user can always decrease a pipe's
  capacity.
* Alter the checks against limits to include the memory required
  for the new pipe capacity.
* Re-order the accounting step so that it precedes the buffer
  allocation. If the accounting step determines that a limit has
  been reached, revert the accounting and cause the operation to
  fail.

The program below can be used to demonstrate problems 1 and 2, and the
effect of the fix. The program takes one or more command-line
arguments. The first argument specifies the number of pipes that the
program should create. The remaining arguments are, alternately, pipe
capacities that should be set using fcntl(F_SETPIPE_SZ), and sleep
intervals (in seconds) between the fcntl() operations. (The sleep
intervals allow the possibility to change the limits between fcntl()
operations.)

Problem 1
=========

Using the test program on an unpatched kernel, we first set some
limits:

    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 10 > /proc/sys/fs/pipe-max-size
    # echo 1 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB

Then show that we can set a pipe with capacity (100MB) that is
over the hard limit

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 1
    Initial pipe capacity: 65536
        Loop 1: set pipe capacity to 1 bytes
            F_SETPIPE_SZ returned 134217728

Now set the capacity to 100MB twice. The second call fails (which is
probably surprising to most users, since it seems like a no-op):

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 1 0 1
    Initial pipe capacity: 65536
        Loop 1: set pipe capacity to 1 bytes
            F_SETPIPE_SZ returned 134217728
        Loop 2: set pipe capacity to 1 bytes
            Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

With a patched kernel, setting a capacity over the limit fails at the
first attempt:

    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 10 > /proc/sys/fs/pipe-max-size
    # echo 1 > /proc/sys/fs/pipe-user-pages-hard
    # sudo -u mtk ./test_F_SETPIPE_SZ 1 1
    Initial pipe capacity: 65536
        Loop 1: set pipe capacity to 1 bytes
            Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

There is a small chance that the change to fix this problem could
break user-space, since there are cases where fcntl(F_SETPIPE_SZ)
calls that previously succeeded might fail. However, the chances are
small, since (a) the pipe-user-pages-{soft,hard} limits are new (in
4.5), and the default soft/hard limits are high/unlimited. Therefore,
it seems warranted to make these limits operate more precisely (and
behave more like what users probably expect).

Problem 2
=========

Running the test program on an unpatched kernel, we first set some
limits:

    # getconf PAGESIZE
    4096
    # echo 0 > /proc/sys/fs/pipe-user-pages-soft
    # echo 10 > /proc/sys/fs/pipe-max-size
    # echo 1 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB

Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
first setting a pipe capacity (10MB), sleeping for a few seconds,
during which time the hard limit is lowered, and then set pipe
capacity to a smaller amount (5MB):

    # sudo -u mtk ./test_F_SETPIPE_SZ 1 1000 15 500 &
    [1] 748
    # Initial pipe capacity: 65536
        Loop 1: set pipe capacity
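The re-ordering described in the commit message above (charge first,
check the new total, revert on failure, allocate last) can be sketched in
user space with C11 atomics. This is only an illustration under stated
assumptions, not the patch's code: struct user_acct, account_pages() and
try_resize() are hypothetical names, the limit is passed in as a plain
argument, and the comparison against the limit (strictly greater than) is
an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-user counter kept in 'user_struct'. */
struct user_acct {
	atomic_long pipe_bufs;	/* pages currently charged to this user */
};

/* Charge (new_pages - old_pages) to the user; return the resulting total. */
static long account_pages(struct user_acct *user, long old_pages, long new_pages)
{
	long delta = new_pages - old_pages;

	return atomic_fetch_add(&user->pipe_bufs, delta) + delta;
}

/*
 * Resize from old_pages to new_pages under a hard limit (0 means no limit).
 * Returns true on success; on failure the charge is reverted.
 */
static bool try_resize(struct user_acct *user, long old_pages,
		       long new_pages, long hard_limit)
{
	/* Account first, so concurrent callers see each other's charges. */
	long total = account_pages(user, old_pages, new_pages);

	/* Limits apply only when growing; shrinking always succeeds. */
	if (new_pages > old_pages && hard_limit && total > hard_limit) {
		account_pages(user, new_pages, old_pages);	/* revert */
		return false;
	}

	/* The buffer allocation would happen here, after the charge. */
	return true;
}
```

Because the counter is updated before the check, two racing callers can
no longer both pass the test against a stale total; at worst one of them
sees the other's charge and backs out.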
[PATCH v5 12/12] [media] vivid: Add support for HSV encoding
Support HSV encoding. Most of the logic is replicated from ycbcr_enc.

Signed-off-by: Ricardo Ribalda Delgado
---
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c   | 25 +
 drivers/media/platform/vivid/vivid-core.h       |  1 +
 drivers/media/platform/vivid/vivid-ctrls.c      | 25 +
 drivers/media/platform/vivid/vivid-vid-cap.c    | 17 +++--
 drivers/media/platform/vivid/vivid-vid-common.c |  2 ++
 drivers/media/platform/vivid/vivid-vid-out.c    |  1 +
 include/media/v4l2-tpg.h                        | 15 +++
 7 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index ed37ae307cac..28d7b072d867 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -504,6 +504,7 @@ static void color_to_hsv(struct tpg_data *tpg, int r, int g, int b,
 	int max_rgb, min_rgb, diff_rgb;
 	int aux;
 	int third;
+	int third_size;

 	r >>= 4;
 	g >>= 4;
@@ -530,30 +531,36 @@ static void color_to_hsv(struct tpg_data *tpg, int r, int g, int b,
 		return;
 	}

+	third_size = (tpg->real_hsv_enc == V4L2_HSV_ENC_180) ? 60 : 85;
+
 	/* Hue */
 	if (max_rgb == r) {
 		aux = g - b;
 		third = 0;
 	} else if (max_rgb == g) {
 		aux = b - r;
-		third = 60;
+		third = third_size;
 	} else {
 		aux = r - g;
-		third = 120;
+		third = third_size * 2;
 	}

-	aux *= 30;
+	aux *= third_size / 2;
 	aux += diff_rgb / 2;
 	aux /= diff_rgb;
 	aux += third;

 	/* Clamp Hue */
-	if (aux < 0)
-		aux += 180;
-	else if (aux > 180)
-		aux -= 180;
-	*h = aux;
+	if (tpg->real_hsv_enc == V4L2_HSV_ENC_180) {
+		if (aux < 0)
+			aux += 180;
+		else if (aux > 180)
+			aux -= 180;
+	} else {
+		aux = aux & 0xff;
+	}

+	*h = aux;
 }

 static void rgb2ycbcr(const int m[3][3], int r, int g, int b,
@@ -1928,6 +1935,7 @@ static void tpg_recalc(struct tpg_data *tpg)
 		tpg->recalc_lines = true;
 		tpg->real_xfer_func = tpg->xfer_func;
 		tpg->real_ycbcr_enc = tpg->ycbcr_enc;
+		tpg->real_hsv_enc = tpg->hsv_enc;
 		tpg->real_quantization = tpg->quantization;

 		if (tpg->xfer_func == V4L2_XFER_FUNC_DEFAULT)
@@ -2018,6 +2026,7 @@ void tpg_log_status(struct tpg_data *tpg)
 	pr_info("tpg colorspace: %d\n", tpg->colorspace);
 	pr_info("tpg transfer function: %d/%d\n", tpg->xfer_func, tpg->real_xfer_func);
 	pr_info("tpg Y'CbCr encoding: %d/%d\n", tpg->ycbcr_enc, tpg->real_ycbcr_enc);
+	pr_info("tpg HSV encoding: %d/%d\n", tpg->hsv_enc, tpg->real_hsv_enc);
 	pr_info("tpg quantization: %d/%d\n", tpg->quantization, tpg->real_quantization);
 	pr_info("tpg RGB range: %d/%d\n", tpg->rgb_range, tpg->real_rgb_range);
 }
diff --git a/drivers/media/platform/vivid/vivid-core.h b/drivers/media/platform/vivid/vivid-core.h
index b59b49456d45..5cdf95bdc4d1 100644
--- a/drivers/media/platform/vivid/vivid-core.h
+++ b/drivers/media/platform/vivid/vivid-core.h
@@ -346,6 +346,7 @@ struct vivid_dev {
 	struct v4l2_dv_timings dv_timings_out;
 	u32 colorspace_out;
 	u32 ycbcr_enc_out;
+	u32 hsv_enc_out;
 	u32 quantization_out;
 	u32 xfer_func_out;
 	u32 service_set_out;
diff --git a/drivers/media/platform/vivid/vivid-ctrls.c b/drivers/media/platform/vivid/vivid-ctrls.c
index aceb38d9f7e7..34731f71cc00 100644
--- a/drivers/media/platform/vivid/vivid-ctrls.c
+++ b/drivers/media/platform/vivid/vivid-ctrls.c
@@ -79,6 +79,7 @@
 #define VIVID_CID_MAX_EDID_BLOCKS	(VIVID_CID_VIVID_BASE + 40)
 #define VIVID_CID_PERCENTAGE_FILL	(VIVID_CID_VIVID_BASE + 41)
 #define VIVID_CID_REDUCED_FPS		(VIVID_CID_VIVID_BASE + 42)
+#define VIVID_CID_HSV_ENC		(VIVID_CID_VIVID_BASE + 43)

 #define VIVID_CID_STD_SIGNAL_MODE	(VIVID_CID_VIVID_BASE + 60)
 #define VIVID_CID_STANDARD		(VIVID_CID_VIVID_BASE + 61)
@@ -378,6 +379,14 @@ static int vivid_vid_cap_s_ctrl(struct v4l2_ctrl *ctrl)
 		vivid_send_source_change(dev, HDMI);
 		vivid_send_source_change(dev, WEBCAM);
 		break;
+	case VIVID_CID_HSV_ENC:
+		tpg_s_hsv_enc(&dev->tpg, ctrl->val ? V4L2_HSV_ENC_256 :
+						     V4L2_HSV_ENC_180);
+		vivid_send_source_change(dev, TV);
+
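As a rough stand-alone illustration of the hue arithmetic in the
color_to_hsv() hunk above, the sketch below applies the same steps
(third_size of 60 or 85, scaling by third_size / 2, rounding, then
wrapping) to 8-bit RGB inputs. The enum hue_enc names, the helper
functions and the choice to return 0 for grey inputs are assumptions made
for this example; the patch itself works on struct tpg_data and the
V4L2_HSV_ENC_180 / V4L2_HSV_ENC_256 constants.

```c
/* Hypothetical names; the patch uses V4L2_HSV_ENC_180 / V4L2_HSV_ENC_256. */
enum hue_enc { HUE_ENC_180, HUE_ENC_256 };

static int max3(int a, int b, int c)
{
	int m = a > b ? a : b;

	return m > c ? m : c;
}

static int min3(int a, int b, int c)
{
	int m = a < b ? a : b;

	return m < c ? m : c;
}

/* Hue of an 8-bit RGB triple, scaled to 0..179 or 0..255. */
static int rgb_to_hue(int r, int g, int b, enum hue_enc enc)
{
	int max_rgb = max3(r, g, b);
	int diff_rgb = max_rgb - min3(r, g, b);
	/* One third of the hue circle in the chosen encoding. */
	int third_size = (enc == HUE_ENC_180) ? 60 : 85;
	int aux, third;

	if (diff_rgb == 0)
		return 0;	/* grey: hue is undefined, report 0 (assumption) */

	if (max_rgb == r) {
		aux = g - b;
		third = 0;
	} else if (max_rgb == g) {
		aux = b - r;
		third = third_size;
	} else {
		aux = r - g;
		third = third_size * 2;
	}

	aux *= third_size / 2;	/* half a third on each side of the primary */
	aux += diff_rgb / 2;	/* round before the integer division */
	aux /= diff_rgb;
	aux += third;

	if (enc == HUE_ENC_180) {
		if (aux < 0)
			aux += 180;
		else if (aux > 180)
			aux -= 180;
	} else {
		aux &= 0xff;	/* wrap into 0..255 */
	}
	return aux;
}
```

With this sketch, pure green maps to 60 in the 180 encoding and to 85 in
the 256 encoding, which is the offset the third_size variable introduces.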
Re: [PATCH v4 0/3] perf annotate: Enable cross arch annotate
I've sent v5 series for this. Please review it.

Thanks,
Ravi

On Wednesday 13 July 2016 03:15 PM, Ravi Bangoria wrote:
> Arnaldo, Michael,
>
> I've tested this patchset on ppc64 BE and LE both. Please review this.
>
> -Ravi
>
> On Friday 08 July 2016 10:10 AM, Ravi Bangoria wrote:
>> Perf can currently only support code navigation (branches and calls) in
>> annotate when run on the same architecture where perf.data was recorded.
>> But cross arch annotate is not supported.
>>
>> This patchset enables cross arch annotate. Currently I've used x86
>> and arm instructions which are already available and adding support
>> for powerpc as well. Adding support for other arch will be easy.
>>
>> I've created this patch on top of acme/perf/core. And tested it with
>> x86 and powerpc only.
>>
>> Note for arm:
>> Few instructions were defined under #if __arm__ which I've used as a
>> table for arm. But I'm not sure whether instruction defined outside of
>> that also contains arm instructions. Apart from that, 'call__parse()'
>> and 'move__parse()' contains #ifdef __arm__ directive. I've changed it
>> to if (!strcmp(norm_arch, arm)). I don't have a arm machine to test
>> these changes.
>>
>> Example:
>>
>>   Record on powerpc:
>>   $ ./perf record -a
>>
>>   Report -> Annotate on x86:
>>   $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
>>
>> Changes in v4:
>>   - powerpc: Added support for branch instructions that includes 'ctr'
>>   - __maybe_unused was misplaced at few location. Corrected it.
>>   - Moved position of v3 last patch that define macro for each arch name
>>
>> v3 link: https://lkml.org/lkml/2016/6/30/99
>>
>> Naveen N. Rao (1):
>>   perf annotate: add powerpc support
>>
>> Ravi Bangoria (2):
>>   perf: Define macro for normalized arch names
>>   perf annotate: Enable cross arch annotate
>>
>>  tools/perf/arch/common.c           |  36 ++---
>>  tools/perf/arch/common.h           |  11 ++
>>  tools/perf/builtin-top.c           |   2 +-
>>  tools/perf/ui/browsers/annotate.c  |   3 +-
>>  tools/perf/ui/gtk/annotate.c       |   2 +-
>>  tools/perf/util/annotate.c         | 273 ++---
>>  tools/perf/util/annotate.h         |   6 +-
>>  tools/perf/util/unwind-libunwind.c |   4 +-
>>  8 files changed, 265 insertions(+), 72 deletions(-)
>>
>> --
>> 2.5.5
>>
>
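The cover letter's central move, replacing a compile-time `#ifdef __arm__` with a runtime comparison against the normalized arch name, can be illustrated with a small table lookup. This is only a sketch of the idea and not perf's code: the table layout, the `is_call_ins` helper, the arch strings and the short mnemonic lists are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-arch lists of call-like mnemonics, NULL terminated. */
static const char *const x86_calls[] = { "call", "callq", NULL };
static const char *const arm_calls[] = { "bl", "blx", NULL };
static const char *const powerpc_calls[] = { "bl", "bctrl", NULL };

struct arch_ins {
	const char *arch;		/* normalized arch name */
	const char *const *calls;	/* mnemonics treated as calls */
};

static const struct arch_ins arch_table[] = {
	{ "x86", x86_calls },
	{ "arm", arm_calls },
	{ "powerpc", powerpc_calls },
};

/*
 * Decide at run time, from the arch recorded in the data file, whether
 * a mnemonic is a call. The host the tool was built on plays no role.
 */
static int is_call_ins(const char *norm_arch, const char *name)
{
	size_t i, j;

	assert(norm_arch && name);

	for (i = 0; i < sizeof(arch_table) / sizeof(arch_table[0]); i++) {
		if (strcmp(arch_table[i].arch, norm_arch))
			continue;
		for (j = 0; arch_table[i].calls[j]; j++)
			if (!strcmp(arch_table[i].calls[j], name))
				return 1;
		return 0;
	}
	return 0;	/* unknown arch: nothing is recognized */
}
```

Because the lookup is keyed on a string rather than on preprocessor state, an x86 build can classify powerpc mnemonics, which is what cross arch annotate needs.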
Re: [PATCH v4 02/57] x86/asm/head: remove unused init_rsp variable extern
On 2016-08-18 08:05:42 [-0500], Josh Poimboeuf wrote:
> There is no init_rsp variable. Remove its extern.

You could add that it was removed in 9cf4f298e29a ("x86: use
stack_start in x86_64") (merged in v2.6.27-rc1).

> Signed-off-by: Josh Poimboeuf

Sebastian
[PATCH 2/8] pipe: move limit checking logic into pipe_set_size()
This is a preparatory patch for following work. Move the F_SETPIPE_SZ
limit-checking logic from pipe_fcntl() into pipe_set_size(). This
simplifies the code a little, and allows for reworking required in
later patches.

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 41 ++---
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7d7c21e..4b98fd0 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1026,9 +1026,24 @@ static inline unsigned int round_pipe_size(unsigned int size)
  * Allocate a new array of pipe buffers and copy the info over. Returns the
  * pipe size if successful, or return -ERROR on error.
  */
-static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages)
+static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 {
 	struct pipe_buffer *bufs;
+	unsigned int size, nr_pages;
+
+	size = round_pipe_size(arg);
+	nr_pages = size >> PAGE_SHIFT;
+
+	if (!nr_pages)
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size)
+		return -EPERM;
+
+	if ((too_many_pipe_buffers_hard(pipe->user) ||
+			too_many_pipe_buffers_soft(pipe->user)) &&
+			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
 
 	/*
 	 * We can shrink the pipe, if arg >= pipe->nrbufs. Since we don't
@@ -1112,28 +1127,9 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 	__pipe_lock(pipe);
 
 	switch (cmd) {
-	case F_SETPIPE_SZ: {
-		unsigned int size, nr_pages;
-
-		size = round_pipe_size(arg);
-		nr_pages = size >> PAGE_SHIFT;
-
-		ret = -EINVAL;
-		if (!nr_pages)
-			goto out;
-
-		if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
-			ret = -EPERM;
-			goto out;
-		} else if ((too_many_pipe_buffers_hard(pipe->user) ||
-			    too_many_pipe_buffers_soft(pipe->user)) &&
-			   !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
-			ret = -EPERM;
-			goto out;
-		}
-		ret = pipe_set_size(pipe, nr_pages);
+	case F_SETPIPE_SZ:
+		ret = pipe_set_size(pipe, arg);
 		break;
-	}
 	case F_GETPIPE_SZ:
 		ret = pipe->buffers * PAGE_SIZE;
 		break;
@@ -1142,7 +1138,6 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 		break;
 	}
 
-out:
 	__pipe_unlock(pipe);
 	return ret;
 }
-- 
2.5.5
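The order of the checks that this patch moves into pipe_set_size() can be modelled in userspace. The sketch below is a simplified model under stated assumptions and not the kernel code: it assumes 4 KiB pages, folds the CAP_SYS_RESOURCE and CAP_SYS_ADMIN tests into a single `privileged` flag, replaces the per-user buffer accounting with an `over_user_limit` flag, and all `model_*` names are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Round a byte count up to a power-of-two number of pages (0 stays 0). */
static unsigned long model_round_pipe_size(unsigned long size)
{
	unsigned long nr_pages = (size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	unsigned long pow2 = 1;

	if (!nr_pages)
		return 0;
	while (pow2 < nr_pages)
		pow2 <<= 1;
	return pow2 << MODEL_PAGE_SHIFT;
}

/*
 * Same check order as the new pipe_set_size(): size sanity first, then
 * the global maximum, then the per-user limits. Returns the rounded
 * size on success or a negative errno value.
 */
static long model_pipe_set_size(unsigned long arg, unsigned long max_size,
				bool privileged, bool over_user_limit)
{
	unsigned long size = model_round_pipe_size(arg);
	unsigned long nr_pages = size >> MODEL_PAGE_SHIFT;

	if (!nr_pages)
		return -EINVAL;

	if (!privileged && size > max_size)
		return -EPERM;

	if (over_user_limit && !privileged)
		return -EPERM;

	return (long)size;
}
```

Taking the raw byte argument instead of a page count is what lets every limit check sit in one function, so the caller is reduced to a single call per command.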
[PATCH v11 6/9] reset: mediatek: Add MT2701 reset controller dt-binding file
From: Shunli Wang Dt-binding file about reset controller is used to provide kinds of definition, which is referenced by dts file and IC-specified reset controller driver code. Signed-off-by: Shunli Wang Signed-off-by: James Liao Signed-off-by: Erin Lo Tested-by: John Crispin Acked-by: Philipp Zabel --- include/dt-bindings/reset/mt2701-resets.h | 83 +++ 1 file changed, 83 insertions(+) create mode 100644 include/dt-bindings/reset/mt2701-resets.h diff --git a/include/dt-bindings/reset/mt2701-resets.h b/include/dt-bindings/reset/mt2701-resets.h new file mode 100644 index 000..aaf0305 --- /dev/null +++ b/include/dt-bindings/reset/mt2701-resets.h @@ -0,0 +1,83 @@ +/* + * Copyright (c) 2015 MediaTek, Shunli Wang + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */
+
+#ifndef _DT_BINDINGS_RESET_CONTROLLER_MT2701
+#define _DT_BINDINGS_RESET_CONTROLLER_MT2701
+
+/* INFRACFG resets */
+#define MT2701_INFRA_EMI_REG_RST		0
+#define MT2701_INFRA_DRAMC0_A0_RST		1
+#define MT2701_INFRA_FHCTL_RST			2
+#define MT2701_INFRA_APCIRQ_EINT_RST		3
+#define MT2701_INFRA_APXGPT_RST			4
+#define MT2701_INFRA_SCPSYS_RST			5
+#define MT2701_INFRA_KP_RST			6
+#define MT2701_INFRA_PMIC_WRAP_RST		7
+#define MT2701_INFRA_MIPI_RST			8
+#define MT2701_INFRA_IRRX_RST			9
+#define MT2701_INFRA_CEC_RST			10
+#define MT2701_INFRA_EMI_RST			32
+#define MT2701_INFRA_DRAMC0_RST			34
+#define MT2701_INFRA_TRNG_RST			37
+#define MT2701_INFRA_SYSIRQ_RST			38
+
+/* PERICFG resets */
+#define MT2701_PERI_UART0_SW_RST		0
+#define MT2701_PERI_UART1_SW_RST		1
+#define MT2701_PERI_UART2_SW_RST		2
+#define MT2701_PERI_UART3_SW_RST		3
+#define MT2701_PERI_GCPU_SW_RST			5
+#define MT2701_PERI_BTIF_SW_RST			6
+#define MT2701_PERI_PWM_SW_RST			8
+#define MT2701_PERI_AUXADC_SW_RST		10
+#define MT2701_PERI_DMA_SW_RST			11
+#define MT2701_PERI_NFI_SW_RST			14
+#define MT2701_PERI_NLI_SW_RST			15
+#define MT2701_PERI_THERM_SW_RST		16
+#define MT2701_PERI_MSDC2_SW_RST		17
+#define MT2701_PERI_MSDC0_SW_RST		19
+#define MT2701_PERI_MSDC1_SW_RST		20
+#define MT2701_PERI_I2C0_SW_RST			22
+#define MT2701_PERI_I2C1_SW_RST			23
+#define MT2701_PERI_I2C2_SW_RST			24
+#define MT2701_PERI_I2C3_SW_RST			25
+#define MT2701_PERI_USB_SW_RST			28
+#define MT2701_PERI_ETH_SW_RST			29
+#define MT2701_PERI_SPI0_SW_RST			33
+
+/* TOPRGU resets */
+#define MT2701_TOPRGU_INFRA_RST			0
+#define MT2701_TOPRGU_MM_RST			1
+#define MT2701_TOPRGU_MFG_RST			2
+#define MT2701_TOPRGU_ETHDMA_RST		3
+#define MT2701_TOPRGU_VDEC_RST			4
+#define MT2701_TOPRGU_VENC_IMG_RST		5
+#define MT2701_TOPRGU_DDRPHY_RST		6
+#define MT2701_TOPRGU_MD_RST			7
+#define MT2701_TOPRGU_INFRA_AO_RST		8
+#define MT2701_TOPRGU_CONN_RST			9
+#define MT2701_TOPRGU_APMIXED_RST		10
+#define MT2701_TOPRGU_HIFSYS_RST		11
+#define MT2701_TOPRGU_CONN_MCU_RST		12
+#define MT2701_TOPRGU_BDP_DISP_RST		13
+
+/* HIFSYS resets */
+#define MT2701_HIFSYS_UHOST0_RST		3
+#define MT2701_HIFSYS_UHOST1_RST		4
+#define MT2701_HIFSYS_UPHY0_RST			21
+#define MT2701_HIFSYS_UPHY1_RST			22
+#define MT2701_HIFSYS_PCIE0_RST			24
+#define MT2701_HIFSYS_PCIE1_RST			25
+#define MT2701_HIFSYS_PCIE2_RST			26
+
+#endif /* _DT_BINDINGS_RESET_CONTROLLER_MT2701 */
-- 
1.9.1
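The jump from index 10 to index 32 in the INFRACFG list suggests that each index packs a register word and a bit position. The sketch below shows that decoding under the assumption that reset registers are consecutive 32-bit words; the binding header itself does not state this, and `reset_index_to_loc`, `struct reset_loc` and the base offset used with it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where a reset line lives: byte offset of its register and its bit mask. */
struct reset_loc {
	uint32_t reg_offset;
	uint32_t mask;
};

/*
 * Decode a reset index, assuming 32 reset bits per 4-byte register
 * starting at 'base' (an assumption, not taken from the header).
 */
static struct reset_loc reset_index_to_loc(uint32_t base, uint32_t id)
{
	struct reset_loc loc;

	loc.reg_offset = base + (id / 32) * 4;	/* which 32-bit word */
	loc.mask = UINT32_C(1) << (id % 32);	/* which bit inside it */
	return loc;
}
```

Under that reading, index 37 (the TRNG reset above) would be bit 5 of the second register, and indices below 32 all fall in the first one.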
[PATCH v11 2/9] clk: mediatek: Refine the makefile to support multiple clock drivers
From: James Liao

Add a Kconfig to define clock configuration for each SoC, and
modify the Makefile to build drivers that only selected in config.

Signed-off-by: Shunli Wang
Signed-off-by: James Liao
Signed-off-by: Erin Lo
Tested-by: John Crispin
Reviewed-by: Matthias Brugger
---
 drivers/clk/Kconfig           |  1 +
 drivers/clk/mediatek/Kconfig  | 21 +
 drivers/clk/mediatek/Makefile |  6 +++---
 3 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 drivers/clk/mediatek/Kconfig

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index e2d9bd7..4265471 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -210,6 +210,7 @@ config COMMON_CLK_OXNAS
 
 source "drivers/clk/bcm/Kconfig"
 source "drivers/clk/hisilicon/Kconfig"
+source "drivers/clk/mediatek/Kconfig"
 source "drivers/clk/meson/Kconfig"
 source "drivers/clk/mvebu/Kconfig"
 source "drivers/clk/qcom/Kconfig"
diff --git a/drivers/clk/mediatek/Kconfig b/drivers/clk/mediatek/Kconfig
new file mode 100644
index 000..380c372
--- /dev/null
+++ b/drivers/clk/mediatek/Kconfig
@@ -0,0 +1,21 @@
+#
+# MediaTek SoC drivers
+#
+config COMMON_CLK_MEDIATEK
+	bool
+	---help---
+	  Mediatek SoCs' clock support.
+
+config COMMON_CLK_MT8135
+	bool "Clock driver for Mediatek MT8135"
+	select COMMON_CLK_MEDIATEK
+	default ARCH_MEDIATEK
+	---help---
+	  This driver supports Mediatek MT8135 clocks.
+
+config COMMON_CLK_MT8173
+	bool "Clock driver for Mediatek MT8173"
+	select COMMON_CLK_MEDIATEK
+	default ARCH_MEDIATEK
+	---help---
+	  This driver supports Mediatek MT8173 clocks.
diff --git a/drivers/clk/mediatek/Makefile b/drivers/clk/mediatek/Makefile
index 95fdfac..32e7222 100644
--- a/drivers/clk/mediatek/Makefile
+++ b/drivers/clk/mediatek/Makefile
@@ -1,4 +1,4 @@
-obj-y += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
+obj-$(CONFIG_COMMON_CLK_MEDIATEK) += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
 obj-$(CONFIG_RESET_CONTROLLER) += reset.o
-obj-y += clk-mt8135.o
-obj-y += clk-mt8173.o
+obj-$(CONFIG_COMMON_CLK_MT8135) += clk-mt8135.o
+obj-$(CONFIG_COMMON_CLK_MT8173) += clk-mt8173.o
-- 
1.9.1
[PATCH v11 3/9] dt-bindings: ARM: Mediatek: Document bindings for MT2701
From: James Liao This patch adds the binding documentation for apmixedsys, bdpsys, ethsys, hifsys, imgsys, infracfg, mmsys, pericfg, topckgen and vdecsys for Mediatek MT2701. Signed-off-by: James Liao Signed-off-by: Erin Lo Tested-by: John Crispin Acked-by: Rob Herring --- .../bindings/arm/mediatek/mediatek,apmixedsys.txt | 3 ++- .../bindings/arm/mediatek/mediatek,bdpsys.txt | 22 .../bindings/arm/mediatek/mediatek,ethsys.txt | 22 .../bindings/arm/mediatek/mediatek,hifsys.txt | 24 ++ .../bindings/arm/mediatek/mediatek,imgsys.txt | 3 ++- .../bindings/arm/mediatek/mediatek,infracfg.txt| 3 ++- .../bindings/arm/mediatek/mediatek,mmsys.txt | 3 ++- .../bindings/arm/mediatek/mediatek,pericfg.txt | 3 ++- .../bindings/arm/mediatek/mediatek,topckgen.txt| 3 ++- .../bindings/arm/mediatek/mediatek,vdecsys.txt | 3 ++- 10 files changed, 82 insertions(+), 7 deletions(-) create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt index 936166f..cb0054a 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt @@ -5,7 +5,8 @@ The Mediatek apmixedsys controller provides the PLLs to the system. 
Required Properties: -- compatible: Should be: +- compatible: Should be one of: + - "mediatek,mt2701-apmixedsys" - "mediatek,mt8135-apmixedsys" - "mediatek,mt8173-apmixedsys" - #clock-cells: Must be 1 diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt new file mode 100644 index 000..4137196 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt @@ -0,0 +1,22 @@ +Mediatek bdpsys controller + + +The Mediatek bdpsys controller provides various clocks to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-bdpsys", "syscon" +- #clock-cells: Must be 1 + +The bdpsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +bdpsys: clock-controller@1c00 { + compatible = "mediatek,mt2701-bdpsys", "syscon"; + reg = <0 0x1c00 0 0x1000>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt new file mode 100644 index 000..768f3a5 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt @@ -0,0 +1,22 @@ +Mediatek ethsys controller + + +The Mediatek ethsys controller provides various clocks to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-ethsys", "syscon" +- #clock-cells: Must be 1 + +The ethsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. 
+ +Example: + +ethsys: clock-controller@1b00 { + compatible = "mediatek,mt2701-ethsys", "syscon"; + reg = <0 0x1b00 0 0x1000>; + #clock-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt new file mode 100644 index 000..beed7b5 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt @@ -0,0 +1,24 @@ +Mediatek hifsys controller + + +The Mediatek hifsys controller provides various clocks and reset +outputs to the system. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-hifsys", "syscon" +- #clock-cells: Must be 1 + +The hifsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +hifsys: clock-controller@1a00 { + compatible = "mediatek,mt2701-hifsys", "syscon"; + reg = <0 0x1a00 0 0x1000>; + #clock-cells = <1>; + #reset-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt index b1f2ce1..f6a9166 100644 --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,imgsys.txt +++
[PATCH v11 5/9] clk: mediatek: Add MT2701 clock support
From: Shunli Wang

Add MT2701 clock support, including topckgen, apmixedsys,
infracfg, pericfg and subsystem clocks.

Signed-off-by: Shunli Wang
Signed-off-by: James Liao
Signed-off-by: Erin Lo
Tested-by: John Crispin
---
 drivers/clk/mediatek/Kconfig           |   43 ++
 drivers/clk/mediatek/Makefile          |    7 +
 drivers/clk/mediatek/clk-gate.c        |   52 ++
 drivers/clk/mediatek/clk-gate.h        |    2 +
 drivers/clk/mediatek/clk-mt2701-bdp.c  |  140 +
 drivers/clk/mediatek/clk-mt2701-eth.c  |   82 +++
 drivers/clk/mediatek/clk-mt2701-hif.c  |   79 +++
 drivers/clk/mediatek/clk-mt2701-img.c  |   82 +++
 drivers/clk/mediatek/clk-mt2701-mm.c   |  125
 drivers/clk/mediatek/clk-mt2701-vdec.c |   93 +++
 drivers/clk/mediatek/clk-mt2701.c      | 1033
 drivers/clk/mediatek/clk-mtk.c         |   40 ++
 drivers/clk/mediatek/clk-mtk.h         |   41 +-
 drivers/clk/mediatek/clk-pll.c         |    1 +
 14 files changed, 1815 insertions(+), 5 deletions(-)
 create mode 100644 drivers/clk/mediatek/clk-mt2701-bdp.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-eth.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-hif.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-img.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-mm.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-vdec.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701.c

diff --git a/drivers/clk/mediatek/Kconfig b/drivers/clk/mediatek/Kconfig
index 380c372..7202db5 100644
--- a/drivers/clk/mediatek/Kconfig
+++ b/drivers/clk/mediatek/Kconfig
@@ -6,6 +6,49 @@ config COMMON_CLK_MEDIATEK
 	---help---
 	  Mediatek SoCs' clock support.
 
+config COMMON_CLK_MT2701
+	bool "Clock driver for Mediatek MT2701"
+	select COMMON_CLK_MEDIATEK
+	default ARCH_MEDIATEK
+	---help---
+	  This driver supports Mediatek MT2701 basic clocks.
+
+config COMMON_CLK_MT2701_MMSYS
+	bool "Clock driver for Mediatek MT2701 mmsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 mmsys clocks.
+
+config COMMON_CLK_MT2701_IMGSYS
+	bool "Clock driver for Mediatek MT2701 imgsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 imgsys clocks.
+
+config COMMON_CLK_MT2701_VDECSYS
+	bool "Clock driver for Mediatek MT2701 vdecsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 vdecsys clocks.
+
+config COMMON_CLK_MT2701_HIFSYS
+	bool "Clock driver for Mediatek MT2701 hifsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 hifsys clocks.
+
+config COMMON_CLK_MT2701_ETHSYS
+	bool "Clock driver for Mediatek MT2701 ethsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 ethsys clocks.
+
+config COMMON_CLK_MT2701_BDPSYS
+	bool "Clock driver for Mediatek MT2701 bdpsys"
+	select COMMON_CLK_MT2701
+	---help---
+	  This driver supports Mediatek MT2701 bdpsys clocks.
+
 config COMMON_CLK_MT8135
 	bool "Clock driver for Mediatek MT8135"
 	select COMMON_CLK_MEDIATEK
diff --git a/drivers/clk/mediatek/Makefile b/drivers/clk/mediatek/Makefile
index 32e7222..19ae7ef 100644
--- a/drivers/clk/mediatek/Makefile
+++ b/drivers/clk/mediatek/Makefile
@@ -1,4 +1,11 @@
 obj-$(CONFIG_COMMON_CLK_MEDIATEK) += clk-mtk.o clk-pll.o clk-gate.o clk-apmixed.o
 obj-$(CONFIG_RESET_CONTROLLER) += reset.o
+obj-$(CONFIG_COMMON_CLK_MT2701) += clk-mt2701.o
+obj-$(CONFIG_COMMON_CLK_MT2701_BDPSYS) += clk-mt2701-bdp.o
+obj-$(CONFIG_COMMON_CLK_MT2701_ETHSYS) += clk-mt2701-eth.o
+obj-$(CONFIG_COMMON_CLK_MT2701_HIFSYS) += clk-mt2701-hif.o
+obj-$(CONFIG_COMMON_CLK_MT2701_IMGSYS) += clk-mt2701-img.o
+obj-$(CONFIG_COMMON_CLK_MT2701_MMSYS) += clk-mt2701-mm.o
+obj-$(CONFIG_COMMON_CLK_MT2701_VDECSYS) += clk-mt2701-vdec.o
 obj-$(CONFIG_COMMON_CLK_MT8135) += clk-mt8135.o
 obj-$(CONFIG_COMMON_CLK_MT8173) += clk-mt8173.o
diff --git a/drivers/clk/mediatek/clk-gate.c b/drivers/clk/mediatek/clk-gate.c
index d8787bf..934bf0e 100644
--- a/drivers/clk/mediatek/clk-gate.c
+++ b/drivers/clk/mediatek/clk-gate.c
@@ -61,6 +61,22 @@ static void mtk_cg_clr_bit(struct clk_hw *hw)
 	regmap_write(cg->regmap, cg->clr_ofs, BIT(cg->bit));
 }
 
+static void mtk_cg_set_bit_no_setclr(struct clk_hw *hw)
+{
+	struct mtk_clk_gate *cg = to_mtk_clk_gate(hw);
+	u32 cgbit = BIT(cg->bit);
+
+	regmap_update_bits(cg->regmap, cg->sta_ofs, cgbit, cgbit);
+}
+
+static void mtk_cg_clr_bit_no_setclr(struct clk_hw *hw)
+{
+	struct mtk_clk_gate *cg = to_mtk_clk_gate(hw);
+	u32 cgbit = BIT(cg->bit);
+
+	regmap_update_bits(cg->regmap, cg->sta_ofs, cgbit, 0);
+}
+
 static int mtk_cg_enable(struct clk_hw *hw)
 {
 	mtk_cg_clr_bit(hw);
@@ -85,6 +101,30 @@ static void mtk_cg_disable_inv(struct clk_hw *hw)
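Note on the *_no_setclr helpers in the clk-gate.c hunk above: they change one gate bit with a read-modify-write of the status register (sta_ofs) instead of writing to dedicated set/clear registers. The following is only a userspace sketch of that masking behavior, assuming a plain uint32_t stands in for the regmap-backed register. The names update_bits(), gate_set_bit() and gate_clr_bit() are made up for illustration and are not kernel API.

```c
#include <stdint.h>

/*
 * Userspace model of a masked register update: only the bits selected
 * by 'mask' are changed, all other bits keep their old value.
 * (Hypothetical helper, modelled on how regmap_update_bits() is used
 * in the patch.)
 */
static void update_bits(uint32_t *reg, uint32_t mask, uint32_t val)
{
	*reg = (*reg & ~mask) | (val & mask);
}

/* Set one gate bit in the status register, as in the *_set_bit_no_setclr case. */
static void gate_set_bit(uint32_t *sta, unsigned int bit)
{
	uint32_t cgbit = UINT32_C(1) << bit;

	update_bits(sta, cgbit, cgbit);
}

/* Clear one gate bit in the status register, as in the *_clr_bit_no_setclr case. */
static void gate_clr_bit(uint32_t *sta, unsigned int bit)
{
	uint32_t cgbit = UINT32_C(1) << bit;

	update_bits(sta, cgbit, 0);
}
```

Because the update is masked, neighbouring gate bits in the same register are left untouched, which is the property the set/clr register pair otherwise provides in hardware.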
[git pull] drm fixes - part two
Hey,

Daniel pointed out I'd missed some i915 fixes, and I also found a
single etnaviv fix I missed.

So here they are,

Thanks,
Dave.

The following changes since commit 91d62d9f30206be6f7749a0e6f7fa58c6d70c702:

  Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-fixes (2016-08-18 12:51:27 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux tags/drm-fixes-for-4.8-rc3-2

for you to fetch changes up to 2c24ba2116d653b4a1315210e38eefbc9eeb1058:

  Merge tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes (2016-08-19 08:51:13 +1000)

Chris Wilson (5):
      drm/i915: Flush GT idle status upon reset
      drm/i915: Handle ENOSPC after failing to insert a mappable node
      drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
      drm/i915: Add missing rpm wakelock to GGTT pread
      drm/i915: Acquire audio powerwell for HD-Audio registers

Dave Airlie (2):
      Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Lucas Stach (1):
      drm/etnaviv: take GPU lock later in the submit process

Maarten Lankhorst (1):
      drm/i915: Fix modeset handling during gpu reset, v5.

Matt Roper (1):
      drm/i915/gen9: Give one extra block per line for SKL plane WM calculations

Matthew Auld (2):
      drm/i915: fix WaInsertDummyPushConstPs
      drm/i915: fix aliasing_ppgtt leak

Ville Syrjälä (4):
      drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
      drm/i915: Program iboost settings for HDMI/DVI on SKL
      drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
      drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2

 drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  10 +-
 drivers/gpu/drm/i915/i915_drv.h         |   1 +
 drivers/gpu/drm/i915/i915_gem.c         |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   1 +
 drivers/gpu/drm/i915/i915_reg.h         |   1 +
 drivers/gpu/drm/i915/intel_audio.c      |   6 ++
 drivers/gpu/drm/i915/intel_ddi.c        |  91 -
 drivers/gpu/drm/i915/intel_display.c    | 170 +---
 drivers/gpu/drm/i915/intel_fbc.c        |  20
 drivers/gpu/drm/i915/intel_pm.c         |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   8 +-
 11 files changed, 224 insertions(+), 100 deletions(-)
[PATCH v11 9/9] arm: dts: mt2701: Use real clock for UARTs
We used to use a fixed rate clock for the UARTs. Now that we have clock
support we can associate the correct clocks with the UARTs and drop the
26MHz fixed rate UART clock.

Signed-off-by: Erin Lo
---
 arch/arm/boot/dts/mt2701.dtsi | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi
index c9a8dbf..7eab6f4 100644
--- a/arch/arm/boot/dts/mt2701.dtsi
+++ b/arch/arm/boot/dts/mt2701.dtsi
@@ -73,12 +73,6 @@
 		#clock-cells = <0>;
 	};
 
-	uart_clk: dummy26m {
-		compatible = "fixed-clock";
-		clock-frequency = <26000000>;
-		#clock-cells = <0>;
-	};
-
 	clk26m: oscillator@0 {
 		compatible = "fixed-clock";
 		#clock-cells = <0>;
@@ -186,7 +180,8 @@
 			     "mediatek,mt6577-uart";
 		reg = <0 0x11002000 0 0x400>;
 		interrupts = ;
-		clocks = <_clk>;
+		clocks = < CLK_PERI_UART0_SEL>, < CLK_PERI_UART0>;
+		clock-names = "baud", "bus";
 		status = "disabled";
 	};
 
@@ -195,7 +190,8 @@
 			     "mediatek,mt6577-uart";
 		reg = <0 0x11003000 0 0x400>;
 		interrupts = ;
-		clocks = <_clk>;
+		clocks = < CLK_PERI_UART1_SEL>, < CLK_PERI_UART1>;
+		clock-names = "baud", "bus";
 		status = "disabled";
 	};
 
@@ -204,7 +200,8 @@
 			     "mediatek,mt6577-uart";
 		reg = <0 0x11004000 0 0x400>;
 		interrupts = ;
-		clocks = <_clk>;
+		clocks = < CLK_PERI_UART2_SEL>, < CLK_PERI_UART2>;
+		clock-names = "baud", "bus";
 		status = "disabled";
 	};
 
@@ -213,7 +210,8 @@
 			     "mediatek,mt6577-uart";
 		reg = <0 0x11005000 0 0x400>;
 		interrupts = ;
-		clocks = <_clk>;
+		clocks = < CLK_PERI_UART3_SEL>, < CLK_PERI_UART3>;
+		clock-names = "baud", "bus";
 		status = "disabled";
 	};
 };
-- 
1.9.1
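Note on the clock-names property added in the patch above: each name is positional, so "baud" refers to the first entry of clocks (CLK_PERI_UARTn_SEL) and "bus" to the second (CLK_PERI_UARTn). A rough userspace sketch of that name-to-index lookup follows. It assumes the names are already available as a C string array, and clock_index_by_name() is a hypothetical helper, not a kernel function.

```c
#include <string.h>

/*
 * Return the position of 'name' in the clock-names list, which is also
 * the index of the matching entry in the clocks list, or -1 if the
 * name is not present. (Illustrative helper, not kernel API.)
 */
static int clock_index_by_name(const char *const names[], int count,
			       const char *name)
{
	for (int i = 0; i < count; i++) {
		if (strcmp(names[i], name) == 0)
			return i;
	}
	return -1;
}
```

With names = { "baud", "bus" }, a lookup of "baud" yields index 0 and "bus" yields index 1, matching the order of the two clocks entries in each UART node.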
[PATCH v11 8/9] arm: dts: mt2701: Add clock controller device nodes
From: James Liao

Add clock controller nodes for MT2701, including topckgen, infracfg,
pericfg, apmixedsys, mmsys, imgsys, vdecsys, hifsys, ethsys and
bdpsys.

This patch also adds two oscillators that provide clocks for MT2701.

Signed-off-by: James Liao
Signed-off-by: Erin Lo
---
 arch/arm/boot/dts/mt2701.dtsi | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi
index 18596a2..c9a8dbf 100644
--- a/arch/arm/boot/dts/mt2701.dtsi
+++ b/arch/arm/boot/dts/mt2701.dtsi
@@ -12,8 +12,10 @@
  * GNU General Public License for more details.
  */
 
+#include
 #include
 #include
+#include
 #include "skeleton64.dtsi"
 #include "mt2701-pinfunc.h"
 
@@ -77,6 +79,20 @@
 		#clock-cells = <0>;
 	};
 
+	clk26m: oscillator@0 {
+		compatible = "fixed-clock";
+		#clock-cells = <0>;
+		clock-frequency = <26000000>;
+		clock-output-names = "clk26m";
+	};
+
+	rtc32k: oscillator@1 {
+		compatible = "fixed-clock";
+		#clock-cells = <0>;
+		clock-frequency = <32000>;
+		clock-output-names = "rtc32k";
+	};
+
 	timer {
 		compatible = "arm,armv7-timer";
 		interrupt-parent = <>;
@@ -104,6 +120,26 @@
 		reg = <0 0x10005000 0 0x1000>;
 	};
 
+	topckgen: syscon@1000 {
+		compatible = "mediatek,mt2701-topckgen", "syscon";
+		reg = <0 0x1000 0 0x1000>;
+		#clock-cells = <1>;
+	};
+
+	infracfg: syscon@10001000 {
+		compatible = "mediatek,mt2701-infracfg", "syscon";
+		reg = <0 0x10001000 0 0x1000>;
+		#clock-cells = <1>;
+		#reset-cells = <1>;
+	};
+
+	pericfg: syscon@10003000 {
+		compatible = "mediatek,mt2701-pericfg", "syscon";
+		reg = <0 0x10003000 0 0x1000>;
+		#clock-cells = <1>;
+		#reset-cells = <1>;
+	};
+
 	watchdog: watchdog@10007000 {
 		compatible = "mediatek,mt2701-wdt",
 			     "mediatek,mt6589-wdt";
@@ -128,6 +164,12 @@
 		reg = <0 0x10200100 0 0x1c>;
 	};
 
+	apmixedsys: syscon@10209000 {
+		compatible = "mediatek,mt2701-apmixedsys", "syscon";
+		reg = <0 0x10209000 0 0x1000>;
+		#clock-cells = <1>;
+	};
+
 	gic: interrupt-controller@10211000 {
 		compatible = "arm,cortex-a7-gic";
 		interrupt-controller;
-- 
1.9.1
[PATCH v11 1/9] clk: mediatek: remove __init from clk registration functions
From: James Liao

Remove __init from functions that will be used by init functions that
support probe deferral.

Signed-off-by: James Liao
Signed-off-by: Erin Lo
---
 drivers/clk/mediatek/clk-gate.c |  2 +-
 drivers/clk/mediatek/clk-mtk.c  | 12 ++++++------
 drivers/clk/mediatek/clk-pll.c  |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/mediatek/clk-gate.c b/drivers/clk/mediatek/clk-gate.c
index 2a76901..d8787bf 100644
--- a/drivers/clk/mediatek/clk-gate.c
+++ b/drivers/clk/mediatek/clk-gate.c
@@ -97,7 +97,7 @@ const struct clk_ops mtk_clk_gate_ops_setclr_inv = {
 	.disable	= mtk_cg_disable_inv,
 };
 
-struct clk * __init mtk_clk_register_gate(
+struct clk *mtk_clk_register_gate(
 		const char *name,
 		const char *parent_name,
 		struct regmap *regmap,
diff --git a/drivers/clk/mediatek/clk-mtk.c b/drivers/clk/mediatek/clk-mtk.c
index 5ada644..bb30f70 100644
--- a/drivers/clk/mediatek/clk-mtk.c
+++ b/drivers/clk/mediatek/clk-mtk.c
@@ -24,7 +24,7 @@
 #include "clk-mtk.h"
 #include "clk-gate.h"
 
-struct clk_onecell_data * __init mtk_alloc_clk_data(unsigned int clk_num)
+struct clk_onecell_data *mtk_alloc_clk_data(unsigned int clk_num)
 {
 	int i;
 	struct clk_onecell_data *clk_data;
@@ -49,7 +49,7 @@ err_out:
 	return NULL;
 }
 
-void __init mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
+void mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
 		int num, struct clk_onecell_data *clk_data)
 {
 	int i;
@@ -72,7 +72,7 @@ void __init mtk_clk_register_fixed_clks(const struct mtk_fixed_clk *clks,
 	}
 }
 
-void __init mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
+void mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
 		int num, struct clk_onecell_data *clk_data)
 {
 	int i;
@@ -95,7 +95,7 @@ void __init mtk_clk_register_factors(const struct mtk_fixed_factor *clks,
 	}
 }
 
-int __init mtk_clk_register_gates(struct device_node *node,
+int mtk_clk_register_gates(struct device_node *node,
 		const struct mtk_gate *clks,
 		int num, struct clk_onecell_data *clk_data)
 {
@@ -135,7 +135,7 @@ int __init mtk_clk_register_gates(struct device_node *node,
 	return 0;
 }
 
-struct clk * __init mtk_clk_register_composite(const struct mtk_composite *mc,
+struct clk *mtk_clk_register_composite(const struct mtk_composite *mc,
 		void __iomem *base, spinlock_t *lock)
 {
 	struct clk *clk;
@@ -222,7 +222,7 @@ err_out:
 	return ERR_PTR(ret);
 }
 
-void __init mtk_clk_register_composites(const struct mtk_composite *mcs,
+void mtk_clk_register_composites(const struct mtk_composite *mcs,
 		int num, void __iomem *base, spinlock_t *lock,
 		struct clk_onecell_data *clk_data)
 {
diff --git a/drivers/clk/mediatek/clk-pll.c b/drivers/clk/mediatek/clk-pll.c
index 966cab1..0c2deac 100644
--- a/drivers/clk/mediatek/clk-pll.c
+++ b/drivers/clk/mediatek/clk-pll.c
@@ -313,7 +313,7 @@ static struct clk *mtk_clk_register_pll(const struct mtk_pll_data *data,
 	return clk;
 }
 
-void __init mtk_clk_register_plls(struct device_node *node,
+void mtk_clk_register_plls(struct device_node *node,
 		const struct mtk_pll_data *plls, int num_plls, struct clk_onecell_data *clk_data)
 {
 	void __iomem *base;
-- 
1.9.1
[PATCH v11 4/9] clk: mediatek: Add dt-bindings for MT2701 clocks
From: Shunli Wang

Add MT2701 clock dt-bindings, including topckgen, apmixedsys,
infracfg, pericfg and subsystem clocks.

Signed-off-by: Shunli Wang
Signed-off-by: James Liao
Signed-off-by: Erin Lo
Tested-by: John Crispin
Reviewed-by: Matthias Brugger
---
 include/dt-bindings/clock/mt2701-clk.h | 486 +
 1 file changed, 486 insertions(+)
 create mode 100644 include/dt-bindings/clock/mt2701-clk.h

diff --git a/include/dt-bindings/clock/mt2701-clk.h b/include/dt-bindings/clock/mt2701-clk.h
new file mode 100644
index 000..2062c67
--- /dev/null
+++ b/include/dt-bindings/clock/mt2701-clk.h
@@ -0,0 +1,486 @@
+/*
+ * Copyright (c) 2014 MediaTek Inc.
+ * Author: Shunli Wang
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DT_BINDINGS_CLK_MT2701_H
+#define _DT_BINDINGS_CLK_MT2701_H
+
+/* TOPCKGEN */
+#define CLK_TOP_SYSPLL	1
+#define CLK_TOP_SYSPLL_D2	2
+#define CLK_TOP_SYSPLL_D3	3
+#define CLK_TOP_SYSPLL_D5	4
+#define CLK_TOP_SYSPLL_D7	5
+#define CLK_TOP_SYSPLL1_D2	6
+#define CLK_TOP_SYSPLL1_D4	7
+#define CLK_TOP_SYSPLL1_D8	8
+#define CLK_TOP_SYSPLL1_D16	9
+#define CLK_TOP_SYSPLL2_D2	10
+#define CLK_TOP_SYSPLL2_D4	11
+#define CLK_TOP_SYSPLL2_D8	12
+#define CLK_TOP_SYSPLL3_D2	13
+#define CLK_TOP_SYSPLL3_D4	14
+#define CLK_TOP_SYSPLL4_D2	15
+#define CLK_TOP_SYSPLL4_D4	16
+#define CLK_TOP_UNIVPLL	17
+#define CLK_TOP_UNIVPLL_D2	18
+#define CLK_TOP_UNIVPLL_D3	19
+#define CLK_TOP_UNIVPLL_D5	20
+#define CLK_TOP_UNIVPLL_D7	21
+#define CLK_TOP_UNIVPLL_D26	22
+#define CLK_TOP_UNIVPLL_D52	23
+#define CLK_TOP_UNIVPLL_D108	24
+#define CLK_TOP_USB_PHY48M	25
+#define CLK_TOP_UNIVPLL1_D2	26
+#define CLK_TOP_UNIVPLL1_D4	27
+#define CLK_TOP_UNIVPLL1_D8	28
+#define CLK_TOP_UNIVPLL2_D2	29
+#define CLK_TOP_UNIVPLL2_D4	30
+#define CLK_TOP_UNIVPLL2_D8	31
+#define CLK_TOP_UNIVPLL2_D16	32
+#define CLK_TOP_UNIVPLL2_D32	33
+#define CLK_TOP_UNIVPLL3_D2	34
+#define CLK_TOP_UNIVPLL3_D4	35
+#define CLK_TOP_UNIVPLL3_D8	36
+#define CLK_TOP_MSDCPLL	37
+#define CLK_TOP_MSDCPLL_D2	38
+#define CLK_TOP_MSDCPLL_D4	39
+#define CLK_TOP_MSDCPLL_D8	40
+#define CLK_TOP_MMPLL	41
+#define CLK_TOP_MMPLL_D2	42
+#define CLK_TOP_DMPLL	43
+#define CLK_TOP_DMPLL_D2	44
+#define CLK_TOP_DMPLL_D4	45
+#define CLK_TOP_DMPLL_X2	46
+#define CLK_TOP_TVDPLL	47
+#define CLK_TOP_TVDPLL_D2	48
+#define CLK_TOP_TVDPLL_D4	49
+#define CLK_TOP_TVD2PLL	50
+#define CLK_TOP_TVD2PLL_D2	51
+#define CLK_TOP_HADDS2PLL_98M	52
+#define CLK_TOP_HADDS2PLL_294M	53
+#define CLK_TOP_HADDS2_FB	54
+#define CLK_TOP_MIPIPLL_D2	55
+#define CLK_TOP_MIPIPLL_D4	56
+#define CLK_TOP_HDMIPLL	57
+#define CLK_TOP_HDMIPLL_D2	58
+#define CLK_TOP_HDMIPLL_D3	59
+#define CLK_TOP_HDMI_SCL_RX	60
+#define CLK_TOP_HDMI_0_PIX340M	61
+#define CLK_TOP_HDMI_0_DEEP340M	62
+#define CLK_TOP_HDMI_0_PLL340M	63
+#define CLK_TOP_AUD1PLL_98M	64
+#define CLK_TOP_AUD2PLL_90M	65
+#define CLK_TOP_AUDPLL	66
+#define CLK_TOP_AUDPLL_D4	67
+#define CLK_TOP_AUDPLL_D8	68
+#define CLK_TOP_AUDPLL_D16	69
+#define CLK_TOP_AUDPLL_D24	70
+#define CLK_TOP_ETHPLL_500M	71
+#define CLK_TOP_VDECPLL	72
+#define CLK_TOP_VENCPLL
[PATCH v11 0/9] Add clock support for Mediatek MT2701
This series is based on v4.8-rc1 and adds clock and reset controller support for Mediatek MT2701. It also refines the Makefile and Kconfig so that clock support for multiple SoCs is configurable.

changes since v10:
- Remove COMMON_CLK dependency from clk/mediatek/Kconfig.

changes since v9:
- Rebase to v4.8-rc1.
- Drop a fix patch of parent clock initial state. It will be replaced by a new patch from Mike/Stephen.
- Replace clk.h with clk-provider.h.
- Correct register settings of clocks.

changes since v8:
- Rebase to v4.7-rc1.
- Include mt2701-resets.h in mt2701.dtsi.
- Remove an unused property from apmixedsys DT node.

changes since v7:
- Rebase to clk-next.
- Implement subsystem clocks in separate files.
- Replace critical clock enabling with CLK_IS_CRITICAL flag.
- Reduce most clock registrations in CLK_OF_DECLARE().
- Remove __init and __initconst from most init functions and data, and replace driver registration with platform_driver_register().
- Replace some common function or variable names with unique names.
- Use real clock for UARTs.

changes since v6:
- Rebase to v4.6-rc1.
- Register subsystem clocks in probe() instead of CLK_OF_DECLARE().
- Add clocks that are referred to by subsystem clocks.
- Fix clk_data size of apmixedsys.
- Add config options for each subsystem clock provider.

changes since v5:
- Rebase to v4.5-rc1 and [1].
- Enable critical clocks for MT2701.
- Refine dt-binding documents, add reset controller support for hifsys.

changes since v4:
- Rebase to v4.5-rc1.
- Remove CLK_SET_RATE_PARENT from divider flags.
- Add img_jpgdec_smi clock.
- Move clk/mediatek/Kconfig into menu section in clk/Kconfig.

changes since v3:
- Change the parent of mm_mdp_bls_26m from clk26m to pwm_sel.

changes since v2:
- Fix ethsys definition.
- Replace read-modify-write with regmap_update_bits() in clock operations.
- Move mt2701-resets.h to include/dt-bindings/reset/.
- Add hifsys reset patch from John Crispin.

changes since v1:
- Document MT2701 compatible strings.

[1] https://patchwork.kernel.org/patch/8147901/

Erin Lo (1):
  arm: dts: mt2701: Use real clock for UARTs

James Liao (4):
  clk: mediatek: remove __init from clk registration functions
  clk: mediatek: Refine the makefile to support multiple clock drivers
  dt-bindings: ARM: Mediatek: Document bindings for MT2701
  arm: dts: mt2701: Add clock controller device nodes

Shunli Wang (4):
  clk: mediatek: Add dt-bindings for MT2701 clocks
  clk: mediatek: Add MT2701 clock support
  reset: mediatek: Add MT2701 reset controller dt-binding file
  reset: mediatek: Add MT2701 reset driver

 .../bindings/arm/mediatek/mediatek,apmixedsys.txt |    3 +-
 .../bindings/arm/mediatek/mediatek,bdpsys.txt     |   22 +
 .../bindings/arm/mediatek/mediatek,ethsys.txt     |   22 +
 .../bindings/arm/mediatek/mediatek,hifsys.txt     |   24 +
 .../bindings/arm/mediatek/mediatek,imgsys.txt     |    3 +-
 .../bindings/arm/mediatek/mediatek,infracfg.txt   |    3 +-
 .../bindings/arm/mediatek/mediatek,mmsys.txt      |    3 +-
 .../bindings/arm/mediatek/mediatek,pericfg.txt    |    3 +-
 .../bindings/arm/mediatek/mediatek,topckgen.txt   |    3 +-
 .../bindings/arm/mediatek/mediatek,vdecsys.txt    |    3 +-
 arch/arm/boot/dts/mt2701.dtsi                     |   50 +-
 drivers/clk/Kconfig                               |    1 +
 drivers/clk/mediatek/Kconfig                      |   64 ++
 drivers/clk/mediatek/Makefile                     |   13 +-
 drivers/clk/mediatek/clk-gate.c                   |   54 +-
 drivers/clk/mediatek/clk-gate.h                   |    2 +
 drivers/clk/mediatek/clk-mt2701-bdp.c             |  140 +++
 drivers/clk/mediatek/clk-mt2701-eth.c             |   82 ++
 drivers/clk/mediatek/clk-mt2701-hif.c             |   81 ++
 drivers/clk/mediatek/clk-mt2701-img.c             |   82 ++
 drivers/clk/mediatek/clk-mt2701-mm.c              |  125 +++
 drivers/clk/mediatek/clk-mt2701-vdec.c            |   93 ++
 drivers/clk/mediatek/clk-mt2701.c                 | 1037
 drivers/clk/mediatek/clk-mtk.c                    |   52 +-
 drivers/clk/mediatek/clk-mtk.h                    |   41 +-
 drivers/clk/mediatek/clk-pll.c                    |    3 +-
 include/dt-bindings/clock/mt2701-clk.h            |  486 +
 include/dt-bindings/reset/mt2701-resets.h         |   83 ++
 28 files changed, 2550 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
 create mode 100644 drivers/clk/mediatek/Kconfig
 create mode 100644 drivers/clk/mediatek/clk-mt2701-bdp.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-eth.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-hif.c
 create mode 100644 drivers/clk/mediatek/clk-mt2701-img.c
 create mode 100644
[PATCH v11 7/9] reset: mediatek: Add MT2701 reset driver
From: Shunli Wang

In infrasys and perifsys, there are many reset control bits for various modules. These bits are used as actual reset controllers to be registered into the kernel's generic reset controller framework.

Signed-off-by: Shunli Wang
Signed-off-by: James Liao
Signed-off-by: Erin Lo
Tested-by: John Crispin
Acked-by: Philipp Zabel
---
 drivers/clk/mediatek/clk-mt2701-hif.c | 2 ++
 drivers/clk/mediatek/clk-mt2701.c     | 4
 2 files changed, 6 insertions(+)

diff --git a/drivers/clk/mediatek/clk-mt2701-hif.c b/drivers/clk/mediatek/clk-mt2701-hif.c
index 33ead83..0ca0537 100644
--- a/drivers/clk/mediatek/clk-mt2701-hif.c
+++ b/drivers/clk/mediatek/clk-mt2701-hif.c
@@ -55,6 +55,8 @@ static void mtk_hifsys_init(struct device_node *node)
 	if (r)
 		pr_err("%s(): could not register clock provider: %d\n",
 			__func__, r);
+
+	mtk_register_reset_controller(node, 1, 0x34);
 }
 
 static const struct of_device_id of_match_clk_mt2701_hif[] = {
diff --git a/drivers/clk/mediatek/clk-mt2701.c b/drivers/clk/mediatek/clk-mt2701.c
index f64dc4e..9dab533 100644
--- a/drivers/clk/mediatek/clk-mt2701.c
+++ b/drivers/clk/mediatek/clk-mt2701.c
@@ -791,6 +791,8 @@ static void mtk_infrasys_init(struct device_node *node)
 	if (r)
 		pr_err("%s(): could not register clock provider: %d\n",
 			__func__, r);
+
+	mtk_register_reset_controller(node, 2, 0x30);
 }
 
 static const struct mtk_gate_regs peri0_cg_regs = {
@@ -911,6 +913,8 @@ static void mtk_pericfg_init(struct device_node *node)
 	if (r)
 		pr_err("%s(): could not register clock provider: %d\n",
 			__func__, r);
+
+	mtk_register_reset_controller(node, 2, 0x0);
 }
 
 #define MT8590_PLL_FMAX		(2000 * MHZ)
-- 
1.9.1
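The three mtk_register_reset_controller() calls in this patch each pass a small count and a register offset. As a rough illustration of what such a pair could describe, the sketch below maps a flat reset index onto a register offset and a bit position. It assumes that the count is a number of 32-bit reset registers laid out at consecutive 4-byte offsets starting at the given offset. That layout, and the names reset_slot and locate_reset, are assumptions made for this example; the patch text here does not show how the function uses its arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch, not taken from the patch. */
struct reset_slot {
	uint32_t reg_offset;	/* byte offset of the 32-bit reset register */
	uint32_t bit;		/* bit position inside that register */
	int valid;		/* 0 when the index is out of range */
};

/*
 * Map a flat reset index to (register offset, bit), assuming num_regs
 * consecutive 32-bit registers that start at base_offset.
 */
static struct reset_slot locate_reset(uint32_t id, uint32_t num_regs,
				      uint32_t base_offset)
{
	struct reset_slot slot = { 0, 0, 0 };

	if (id >= num_regs * 32)
		return slot;

	slot.reg_offset = base_offset + (id / 32) * 4;
	slot.bit = id % 32;
	slot.valid = 1;
	return slot;
}
```

Under these assumptions, index 33 with the infrasys arguments (2, 0x30) would land on bit 1 of the register at offset 0x34, and index 64 would be rejected as out of range.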
[PATCH 0/4] zswap: Optimize compressed pool memory utilization
On 17 August 2016 at 18:08, Pekka Enberg wrote:
> On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy
> wrote:
>> This series of patches optimize the memory utilized by zswap for storing
>> the swapped out pages.
>>
>> Zswap is a cache which compresses the pages that are being swapped out
>> and stores them into a dynamically allocated RAM-based memory pool.
>> Experiments have shown that around 10-15% of pages stored in zswap are
>> duplicates which results in 10-12% more RAM required to store these
>> duplicate compressed pages. Around 10-20% of pages stored in zswap
>> are zero-filled pages, but these pages are handled as normal pages by
>> compressing and allocating memory in the pool.
>>
>> The following patch-set optimizes memory utilized by zswap by avoiding the
>> storage of duplicate pages and zero-filled pages in zswap compressed memory
>> pool.
>>
>> Patch 1/4: zswap: Share zpool memory of duplicate pages
>> This patch shares compressed pool memory of the duplicate pages. When a new
>> page is requested for swap-out to zswap; search for an identical page in
>> the pages already stored in zswap. If an identical page is found then share
>> the compressed page data of the identical page with the new page. This
>> avoids allocation of memory in the compressed pool for a duplicate page.
>> This feature is tested on devices with 1GB, 2GB and 3GB RAM by executing
>> performance test at low memory conditions. Around 15-20% of the pages
>> swapped are duplicate of the pages existing in zswap, resulting in 15%
>> saving of zswap memory pool when compared to the baseline version.
>>
>> Test Parameters         Baseline    With patch    Improvement
>> Total RAM               955MB       955MB
>> Available RAM           254MB       269MB         15MB
>> Avg. App entry time     2.469sec    2.207sec      7%
>> Avg. App close time     1.151sec    1.085sec      6%
>> Apps launched in 1sec   5           12            7
>>
>> There is little overhead in zswap store function due to the search
>> operation for finding duplicate pages. However, if duplicate page is
>> found it saves the compression and allocation time of the page. The average
>> overhead per zswap_frontswap_store() function call in the experimental
>> device is 9us. There is no overhead in case of zswap_frontswap_load()
>> operation.
>>
>> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
>> This patch adds a module parameter to enable or disable the sharing of
>> duplicate zswap pages at runtime.
>>
>> Patch 3/4: zswap: Zero-filled pages handling
>> This patch checks if a page to be stored in zswap is a zero-filled page
>> (i.e. contents of the page are all zeros). If such page is found,
>> compression and allocation of memory for the compressed page is avoided
>> and instead the page is just marked as zero-filled page.
>> Although, compressed size of a zero-filled page using LZO compressor is
>> very less (52 bytes including zswap_header), this patch saves compression
>> and allocation time during store operation and decompression time during
>> zswap load operation for zero-filled pages. Experiments have shown that
>> around 10-20% of pages stored in zswap are zero-filled.
>
> Aren't zero-filled pages already handled by patch 1/4 as their
> contents match? So the overall memory saving is 52 bytes?
>
> - Pekka

Thanks for the quick reply. Zero-filled pages can also be handled by patch 1/4, which searches for a duplicate page among the pages already stored in zswap. However, it has been observed that the average search time to identify duplicate zero-filled pages (using patch 1/4) is almost three times that of checking all pages for being zero-filled. Also, with patch 1/4, the zswap_frontswap_load() operation requires the compressed zero-filled page to be decompressed. The zswap_frontswap_load() function in patch 3/4 just fills the page with zeros while loading a zero-filled page, which is faster than decompression.

- Srividya
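The zero-filled handling described in this thread comes down to two small operations: scanning a page for any non-zero word at store time, and filling the page with zeros at load time instead of decompressing. The following is a minimal user-space sketch of those two steps, not the code from patch 3/4. The 4096-byte page size and the names SKETCH_PAGE_SIZE, page_is_zero_filled and fill_zero_page are assumptions for illustration, and the buffer is assumed to be aligned for unsigned long access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Return 1 if every word of the page is zero, 0 otherwise.
 * The scan stops at the first non-zero word.
 */
static int page_is_zero_filled(const void *page)
{
	const unsigned long *word = page;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*word); i++) {
		if (word[i] != 0)
			return 0;
	}
	return 1;
}

/* Load path for a page marked zero-filled: no decompression needed. */
static void fill_zero_page(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
}
```

A page that fails the check would go through the normal compress-and-store path; only pages that pass it are recorded as zero-filled.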
Re: linux-next: build warnings after merge of the kbuild tree
On Fri, 19 Aug 2016 15:09:14 +1000
Stephen Rothwell wrote:

> Hi Nick,
>
> On Fri, 19 Aug 2016 13:38:54 +1000 Stephen Rothwell
> wrote:
> >
> > On Thu, 18 Aug 2016 11:09:48 +1000 Nicholas Piggin
> > wrote:
> > >
> > > On Wed, 17 Aug 2016 14:59:59 +0200
> > > Michal Marek wrote:
> > >
> > > > On 2016-08-17 03:44, Stephen Rothwell wrote:
> > > > >
> > > > > After merging the kbuild tree, today's linux-next build (powerpc
> > > > > ppc64_defconfig) produced these warnings:
> > > > >
> > > > > WARNING: 25 bad relocations
> > > > > c0cf2570 R_PPC64_ADDR64 __crc___arch_hweight16
> > > > [...]
> > > > > Introduced by commit
> > > > >
> > > > > 9445aa1a3062 ("ppc: move exports to definitions")
> > > > >
> > > > > I have reverted that commit for today.
> > > > >
> > > > > [cc-ing the ppc guys for clues - also involved is commit
> > > > >
> > > > > 22823ab419d8 ("EXPORT_SYMBOL() for asm")
> > > > > ]
> > > >
> > > > FWIW, I see these warnings as well. Any help from ppc developers is
> > > > appreciated - should the R_PPC64_ADDR64 be whitelisted for exported asm
> > > > symbols (their CRCs actually)?
> > >
> > > The dangling relocation is a side effect of linker unable to resolve the
> > > reference to the undefined weak symbols. So the real question is, why has
> > > genksyms not overridden these symbols with their CRC values?
> > >
> > > This may not even be powerpc specific, but I'll poke at it a bit more
> > > when I get a chance.
> >
> > Not sure if this is relevant, but with the commit reverted, the
> > __crc___... symbols are absolute.
> >
> > f55b3b3d A __crc___arch_hweight16
>
> Ignore that :-)
>
> I just had a look at a x86_64 allmodconfig result and it looks like the
> weak symbols are not resolved their either ...
>
> I may be missing something, but genksyms generates the crc's off the
> preprocessed C source code and we don't have any for the asm files ...

Looks like you're right, good find!

Thanks,
Nick
[PATCH v5 3/7] perf annotate: Add support for powerpc
From: "Naveen N. Rao"

Currently, perf can disassemble an annotated function, but it has no
parsing logic for powerpc instructions, so the navigation options are
not available on powerpc.

Apart from that, powerpc has a long list of branch instructions, and
hardcoding them in a table appears to be error-prone. So, add a function
that finds an instruction instead of creating a table. This function
builds the table dynamically (as a list of 'struct ins'), and instead of
creating an object every time, it first checks whether the list already
contains an object for that instruction.

Signed-off-by: Naveen N. Rao
Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - Removed hacks for instructions like bctr and bctrl from this patch.

 tools/perf/util/annotate.c | 116 +
 1 file changed, 116 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index deb9af0..0b64841 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -459,6 +459,11 @@ static struct ins instructions_arm[] = {
 	{ .name = "bne",   .ops  = &jump_ops, },
 };

+struct instructions_powerpc {
+	struct ins *ins;
+	struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
@@ -474,6 +479,115 @@ static int ins__cmp(const void *a, const void *b)
 	return strcmp(ia->name, ib->name);
 }

+static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head,
+					 const char *name, struct ins_ops *ops)
+{
+	struct instructions_powerpc *ins_powerpc;
+	struct ins *ins;
+
+	ins = zalloc(sizeof(struct ins));
+	if (!ins)
+		return NULL;
+
+	ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+	if (!ins_powerpc)
+		goto out_free_ins;
+
+	ins->name = strdup(name);
+	if (!ins->name)
+		goto out_free_ins_power;
+
+	ins->ops = ops;
+	ins_powerpc->ins = ins;
+	list_add_tail(&(ins_powerpc->list), &(head->list));
+
+	return ins;
+
+out_free_ins_power:
+	zfree(&ins_powerpc);
+out_free_ins:
+	zfree(&ins);
+	return NULL;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+					    const char *name)
+{
+	struct instructions_powerpc *pos;
+
+	list_for_each_entry(pos, &head->list, list) {
+		if (!strcmp(pos->ins->name, name))
+			return pos->ins;
+	}
+	return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+	int i;
+	struct ins *ins;
+	struct ins_ops *ops;
+	static struct instructions_powerpc head;
+	static bool list_initialized;
+
+	/*
+	 * - Interested only if instruction starts with 'b'.
+	 * - Few start with 'b', but aren't branch instructions.
+	 */
+	if (name[0] != 'b'             ||
+	    !strncmp(name, "bcd", 3)   ||
+	    !strncmp(name, "brinc", 5) ||
+	    !strncmp(name, "bper", 4))
+		return NULL;
+
+	if (!list_initialized) {
+		INIT_LIST_HEAD(&head.list);
+		list_initialized = true;
+	}
+
+	/*
+	 * Return if we already have object of 'struct ins' for this instruction
+	 */
+	ins = list_search__ins_powerpc(&head, name);
+	if (ins)
+		return ins;
+
+	ops = &jump_ops;
+
+	i = strlen(name) - 1;
+	if (i < 0)
+		return NULL;
+
+	/* ignore optional hints at the end of the instructions */
+	if (name[i] == '+' || name[i] == '-')
+		i--;
+
+	if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+		/*
+		 * if the instruction ends up with 'l' or 'la', then
+		 * those are considered 'calls' since they update LR.
+		 * ... except for 'bnl' which is branch if not less than
+		 * and the absolute form of the same.
+		 */
+		if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+		    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+		    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+			ops = &call_ops;
+	}
+	if (name[i] == 'r' && name[i-1] == 'l')
+		/*
+		 * instructions ending with 'lr' are considered to be
+		 * return instructions
+		 */
+		ops = &ret_ops;
+
+	/*
+	 * Add instruction to list so next time no need to
+	 * allocate memory for it.
+	 */
+	return list_add__ins_powerpc(&head, name, ops);
+}
+
 static void ins__sort(struct ins *instructions, int nmemb)
 {
 	qsort(instructions, nmemb, sizeof(struct ins), ins__cmp);
@@ -509,6 +623,8 @@ static struct ins *ins__find(const char *name, const char *norm_arch)
 	} else if (!strcmp(norm_arch, NORM_ARM)) {
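The suffix rules in ins__find_powerpc() can be exercised on their own. The following is a minimal sketch, assuming a hypothetical enum in place of the 'struct ins_ops' pointers and a hypothetical function name; it leaves out the caching list and only mirrors the mnemonic classification described in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the patch returns a struct ins_ops pointer instead. */
enum branch_kind { KIND_NONE, KIND_JUMP, KIND_CALL, KIND_RET };

/* Classify a powerpc mnemonic using the suffix rules from the patch. */
enum branch_kind classify_powerpc_branch(const char *name)
{
	static const char *const bnl_forms[] = {
		"bnl", "bnl+", "bnl-", "bnla", "bnla+", "bnla-",
	};
	enum branch_kind kind = KIND_JUMP;
	size_t i, k;

	/* Only mnemonics starting with 'b' matter; a few of those are not branches. */
	if (name == NULL || name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) ||
	    !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4))
		return KIND_NONE;

	i = strlen(name) - 1;	/* safe: the string holds at least "b" */

	/* Ignore an optional branch-prediction hint at the end. */
	if (i > 0 && (name[i] == '+' || name[i] == '-'))
		i--;

	/* A trailing 'l' or 'la' updates LR, so treat it as a call ... */
	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l')) {
		int is_bnl = 0;

		/* ... except 'bnl' (branch if not less than) and its variants. */
		for (k = 0; k < sizeof(bnl_forms) / sizeof(bnl_forms[0]); k++)
			if (!strcmp(name, bnl_forms[k]))
				is_bnl = 1;
		if (!is_bnl)
			kind = KIND_CALL;
	}

	/* A trailing 'lr' branches to the link register: treat it as a return. */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		kind = KIND_RET;

	return kind;
}
```

Unlike the patch, the sketch guards every name[i - 1] access with i > 0. This does not change the result for any mnemonic, because the first character is always 'b'.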
[PATCH v5 7/7] perf annotate: Fix jump target outside of function address range
If a jump target is outside the function's address range, perf does not
handle it correctly. In particular, when the target address is lower
than the function start address, the target offset is negative. But
because the target offset is declared unsigned, the negative number is
stored as its two's-complement value. See the example below.

Here the target of the 'jmpq' instruction at 34cf8 is 34ac0, which is
lower than the function start address (34cf0):

        34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

Objdump output:

  00034cf0 <__sigaction>:
  __GI___sigaction():
    34cf0: lea    -0x20(%rdi),%eax
    34cf3: cmp    $0x1,%eax
    34cf6: jbe    34d00 <__sigaction+0x10>
    34cf8: jmpq   34ac0 <__GI___libc_sigaction>
    34cfd: nopl   (%rax)
    34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
    34d07: movl   $0x16,%fs:(%rax)
    34d0e: mov    $0x,%eax
    34d13: retq

perf annotate before applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    $0x1,%eax
        V  jbe    10
        V  jmpq   fffffffffffffdd0
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   $0x16,%fs:(%rax)
           mov    $0x,%eax
           retq

perf annotate after applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    $0x1,%eax
        V  jbe    10
        ^  jmpq   34ac0 <__GI___libc_sigaction>
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   $0x16,%fs:(%rax)
           mov    $0x,%eax
           retq

Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - New patch

 tools/perf/ui/browsers/annotate.c |  5 +++--
 tools/perf/util/annotate.c        | 14 +++++++++-----
 tools/perf/util/annotate.h        |  5 +++--
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 21c5e10..c13df5b 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -215,7 +215,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 		ui_browser__set_color(browser, color);
 		if (dl->ins && dl->ins->ops->scnprintf) {
 			if (ins__is_jump(dl->ins)) {
-				bool fwd = dl->ops.target.offset > (u64)dl->offset;
+				bool fwd = dl->ops.target.offset > dl->offset;

 				ui_browser__write_graph(browser, fwd ? SLSMG_DARROW_CHAR :
 								    SLSMG_UARROW_CHAR);
@@ -245,7 +245,8 @@ static bool disasm_line__is_valid_jump(struct disasm_line *dl, struct symbol *sy
 {
 	if (!dl || !dl->ins || !ins__is_jump(dl->ins)
 	    || !disasm_line__has_offset(dl)
-	    || dl->ops.target.offset >= symbol__size(sym))
+	    || dl->ops.target.offset < 0
+	    || dl->ops.target.offset >= (s64)symbol__size(sym))
 		return false;

 	return true;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 678fb81..c8b017c 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -124,10 +124,12 @@ static int jump__parse(struct ins_operands *ops,
 	else
 		ops->target.addr = strtoull(ops->raw, NULL, 16);

-	if (s++ != NULL)
+	if (s++ != NULL) {
 		ops->target.offset = strtoull(s, NULL, 16);
-	else
-		ops->target.offset = UINT64_MAX;
+		ops->target.offset_avail = true;
+	} else {
+		ops->target.offset_avail = false;
+	}

 	return 0;
 }
@@ -135,7 +137,7 @@ static int jump__parse(struct ins_operands *ops,
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops)
 {
-	if (!ops->target.addr)
+	if (!ops->target.addr || ops->target.offset < 0)
 		return ins__raw_scnprintf(ins, bf, size, ops);

 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
@@ -1228,9 +1230,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	if (dl == NULL)
 		return -1;

-	if (dl->ops.target.offset == UINT64_MAX)
+	if (!disasm_line__has_offset(dl)) {
 		dl->ops.target.offset = dl->ops.target.addr -
 					map__rip_2objdump(map, sym->start);
+		dl->ops.target.offset_avail = true;
+	}

 	/* kcore has no symbols, so add the call target name */
 	if (dl->ins && ins__is_call(dl->ins) && !dl->ops.target.name) {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 5cfad4e..5787ed8 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -19,7 +19,8 @@ struct ins_operands {
 	char
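The arithmetic from the commit message can be reproduced in isolation. This is a minimal sketch with hypothetical helper names (signed_offset, offset_in_function) that are not part of perf; it only shows why an unsigned offset hides a backward jump out of the function and how a signed offset makes the range check straightforward. The function size 0x24 used in the usage note is an assumption read off the objdump listing (34cf0 up to and including the one-byte retq at 34d13).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset of a jump target relative to the start of the function.
 * The subtraction wraps in unsigned arithmetic; converting the result
 * to int64_t recovers the negative value on two's-complement targets.
 */
int64_t signed_offset(uint64_t target_addr, uint64_t sym_start)
{
	return (int64_t)(target_addr - sym_start);
}

/* A jump stays inside the function only if 0 <= offset < size. */
bool offset_in_function(int64_t offset, uint64_t sym_size)
{
	return offset >= 0 && offset < (int64_t)sym_size;
}
```

With the values from the example, signed_offset(0x34ac0, 0x34cf0) yields -0x230, whose unsigned representation is 0xfffffffffffffdd0, and offset_in_function() rejects it for a size of 0x24, whereas the jbe target at 34d00 (offset 0x10) is accepted.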
[PATCH v5 4/7] perf annotate: Do not ignore call instruction with indirect target
Do not ignore a call instruction with an indirect target when it's
already identified as a call. This is an extension of commit
e8ea1561952b ("perf annotate: Use raw form for register indirect call
instructions") to generalize annotation for all instructions with
indirect calls.

This is needed for certain powerpc call instructions that use an address
in a register (such as bctrl, btarl, ...).

Apart from that, when kcore is used to disassemble a function, all call
instructions were ignored. This patch fixes that as a side effect by
not ignoring them. For example,

Before (with kcore):
       mov    %r13,%rdi
       callq  0x811a7e70
     ^ jmpq   64
       mov    %gs:0x7ef41a6e(%rip),%al

After (with kcore):
       mov    %r13,%rdi
     > callq  0x811a7e70
     ^ jmpq   64
       mov    %gs:0x7ef41a6e(%rip),%al

Suggested-by: Michael Ellerman
[Suggested about 'bctrl' instruction]
Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - New patch, introduced to annotate all indirect call instructions.

 tools/perf/util/annotate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0b64841..6368ba9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -81,16 +81,12 @@ static int call__parse(struct ins_operands *ops, const char *norm_arch)
 	return ops->target.name == NULL ? -1 : 0;

 indirect_call:
-	tok = strchr(endptr, '(');
-	if (tok != NULL) {
+	tok = strchr(endptr, '*');
+	if (tok == NULL) {
 		ops->target.addr = 0;
 		return 0;
 	}

-	tok = strchr(endptr, '*');
-	if (tok == NULL)
-		return -1;
-
 	ops->target.addr = strtoull(tok + 1, NULL, 16);
 	return 0;
 }
-- 
2.5.5
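The new indirect_call branch boils down to "look for '*', otherwise keep the instruction with a zero target address". Below is a minimal sketch of that rule, assuming a hypothetical standalone helper that receives the operand text directly rather than the 'endptr' position used in call__parse().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the absolute address written after '*', or 0 when the operand
 * has no '*' (for example a register-indirect call). A zero address is
 * the signal to print the operand in raw form instead of rejecting the
 * instruction.
 */
uint64_t indirect_call_target(const char *operands)
{
	const char *tok = strchr(operands, '*');

	if (tok == NULL)
		return 0;

	return strtoull(tok + 1, NULL, 16);
}
```

An operand such as "*%rax" also yields 0, because strtoull() finds no hexadecimal digits after the '*'. An empty operand, as with powerpc 'bctrl', takes the first branch.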
[PATCH v5 2/7] perf annotate: Add cross arch annotate support
Change the current data structures and functions to enable cross arch
annotate.

The current perf implementation does not support cross arch annotate.
To make it truly cross arch, the instruction tables of all archs should
be present in the perf binary, and the appropriate table should be used
based on the arch where perf.data was recorded.

Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - Replaced symbol__annotate with symbol__disassemble.

 tools/perf/builtin-top.c          |   2 +-
 tools/perf/ui/browsers/annotate.c |   3 +-
 tools/perf/ui/gtk/annotate.c      |   2 +-
 tools/perf/util/annotate.c        | 133 --
 tools/perf/util/annotate.h        |   5 +-
 5 files changed, 92 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index a3223aa..fdd4203 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -129,7 +129,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 		return err;
 	}

-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, 0, NULL);
 	if (err == 0) {
 out_assign:
 		top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 2e2d100..21c5e10 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 		  (nr_pcnt - 1);
 	}

-	err = symbol__disassemble(sym, map, sizeof_bdl);
+	err = symbol__disassemble(sym, map, sizeof_bdl,
+				  perf_evsel__env_arch(evsel));
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 42d3199..c127aba 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -167,7 +167,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
 	if (map->dso->annotate_warned)
 		return -1;

-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, 0, perf_evsel__env_arch(evsel));
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 25a9259..deb9af0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -20,12 +20,14 @@
 #include
 #include
 #include
+#include
+#include "../arch/common.h"

 const char *disassembler_style;
 const char *objdump_path;
 static regex_t file_lineno;

-static struct ins *ins__find(const char *name);
+static struct ins *ins__find(const char *name, const char *norm_arch);
 static int disasm_line__parse(char *line, char **namep, char **rawp);

 static void ins__delete(struct ins_operands *ops)
@@ -53,7 +55,7 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 	return ins__raw_scnprintf(ins, bf, size, ops);
 }

-static int call__parse(struct ins_operands *ops)
+static int call__parse(struct ins_operands *ops, const char *norm_arch)
 {
 	char *endptr, *tok, *name;

@@ -65,10 +67,8 @@ static int call__parse(struct ins_operands *ops)

 	name++;

-#ifdef __arm__
-	if (strchr(name, '+'))
+	if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+'))
 		return -1;
-#endif

 	tok = strchr(name, '>');
 	if (tok == NULL)
@@ -117,7 +117,8 @@ bool ins__is_call(const struct ins *ins)
 	return ins->ops == &call_ops;
 }

-static int jump__parse(struct ins_operands *ops)
+static int jump__parse(struct ins_operands *ops,
+		       const char *norm_arch __maybe_unused)
 {
 	const char *s = strchr(ops->raw, '+');

@@ -172,7 +173,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
 	return 0;
 }

-static int lock__parse(struct ins_operands *ops)
+static int lock__parse(struct ins_operands *ops, const char *norm_arch)
 {
 	char *name;

@@ -183,7 +184,7 @@ static int lock__parse(struct ins_operands *ops)
 	if (disasm_line__parse(ops->raw, &name, &ops->locked.ops->raw) < 0)
 		goto out_free_ops;

-	ops->locked.ins = ins__find(name);
+	ops->locked.ins = ins__find(name, norm_arch);
 	free(name);

 	if (ops->locked.ins == NULL)
@@ -193,7 +194,7 @@ static int lock__parse(struct ins_operands *ops)
 		return 0;

 	if (ops->locked.ins->ops->parse &&
-	    ops->locked.ins->ops->parse(ops->locked.ops, norm_arch) < 0)
+	    ops->locked.ins->ops->parse(ops->locked.ops, norm_arch) < 0)
 		goto out_free_ops;

 	return 0;
@@ -236,7 +237,7 @@ static struct ins_ops lock_ops = {
 	.scnprintf = lock__scnprintf,
 };
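The idea of choosing an instruction table by the recorded arch, rather than by the arch perf was compiled for, can be sketched as follows. This is only an illustration under stated assumptions: the tables are tiny made-up subsets, 'struct ins' is reduced to a name, and find_ins() is a hypothetical stand-in for ins__find(). Each table must be sorted by name for bsearch() to work.

```c
#include <stdlib.h>
#include <string.h>

#define NORM_X86 "x86"
#define NORM_ARM "arm"

/* Simplified stand-in for perf's struct ins (no ops pointer). */
struct ins {
	const char *name;
};

/* Illustrative subsets only, sorted by name as bsearch() requires. */
static const struct ins instructions_x86[] = {
	{ "call" }, { "jmp" }, { "ret" },
};
static const struct ins instructions_arm[] = {
	{ "b" }, { "bl" }, { "bne" },
};

static int ins_key_cmp(const void *name, const void *insp)
{
	const struct ins *ins = insp;

	return strcmp(name, ins->name);
}

/* Look up 'name' in the table that matches the normalized arch string. */
const struct ins *find_ins(const char *name, const char *norm_arch)
{
	const struct ins *table;
	size_t nmemb;

	if (name == NULL || norm_arch == NULL)
		return NULL;

	if (!strcmp(norm_arch, NORM_X86)) {
		table = instructions_x86;
		nmemb = sizeof(instructions_x86) / sizeof(instructions_x86[0]);
	} else if (!strcmp(norm_arch, NORM_ARM)) {
		table = instructions_arm;
		nmemb = sizeof(instructions_arm) / sizeof(instructions_arm[0]);
	} else {
		return NULL;
	}

	return bsearch(name, table, nmemb, sizeof(*table), ins_key_cmp);
}
```

The sketch returns NULL when no arch string is supplied, so a caller that passes NULL for the arch never reaches strcmp() with a null pointer.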
Re: [LKP] [lkp] [sctp] a6c2f79287: netperf.Throughput_Mbps -37.2% regression
On Thu, Aug 18, 2016 at 08:45:42PM +0800, Xin Long wrote:
> >> Hi, Aaron
> >>
> >> 1)
> >> I talked with Marcelo about this one.
> >> He said it might be related with cacheline. the new field distroyed
> >> the prior cacheline. So on top of commit 826d253d57b1, pls only add
> >> +       unsigned long prsctp_param;
> >>
> >> to the end of struct sctp_chunk, then try.
> >
> > This doesn't work.
> >
>
> If it's because of cache lines changed, I'm not sure this, either.
> Maybe 2) is a good way to fix it.

A comparison of the good commit 826d253d57b1 and the bad a6c2f792873a:

tests: 8
testcase/path_params/tbox_group/run: netperf/ipv4-300s-200%-cs-localhost-10K-SCTP_STREAM_MANY-performance/lkp-ivb-d02

826d253d57b11f69   a6c2f792873aff332a4689717c
----------------   --------------------------
        %stddev    change           %stddev
 3923               -37%    2461               netperf.Throughput_Mbps
 9                  -78%    2                  vmstat.procs.r
 112616              19%    133981             vmstat.system.cs
 4053                 7%    4350               vmstat.system.in
 8598 ±  4%         957%    90912              softirqs.SCHED
 16466114           -37%    10305467           softirqs.NET_RX
 605899             -46%    329262             softirqs.TIMER
 72067 ± 10%        -63%    26356 ±  3%        softirqs.RCU
 4785 ±  7%          -9%    4352               slabinfo.anon_vma_chain.num_objs
 642 ±  7%           14%    731 ±  6%          slabinfo.kmalloc-512.active_objs
 4993                15%    5735               slabinfo.kmalloc-64.active_objs
 4993                15%    5735               slabinfo.kmalloc-64.num_objs
 2529 ±  4%         -15%    2150               proc-vmstat.nr_alloc_batch
 4.733e+08          -37%    2.999e+08          proc-vmstat.pgalloc_normal
 8.476e+08          -37%    5.36e+08           proc-vmstat.pgfree
 3.742e+08          -37%    2.361e+08          proc-vmstat.pgalloc_dma32
 1.48e+08           -37%    93033641           proc-vmstat.numa_hit
 1.48e+08           -37%    93033640           proc-vmstat.numa_local
 0.05 ± 17%       52102%    24.80              turbostat.CPU%c1
 0.64              3065%    20.10 ±  3%        turbostat.CPU%c6
 0.12 ± 39%        1900%    2.35 ±  3%         turbostat.Pkg%pc2
 0.46 ± 10%        1686%    8.22 ±  6%         turbostat.Pkg%pc6
 37.54              -14%    32.11              turbostat.PkgWatt
 20.20              -25%    15.22              turbostat.CorWatt
 99.31              -45%    54.97              turbostat.%Busy
 3269               -45%    1803               turbostat.Avg_MHz
 76510 ± 46%      3e+05%    1.954e+08          cpuidle.C1-IVB.time
 19769 ± 17%       5534%    1113742 ±  5%      cpuidle.C1E-IVB.time
 151 ± 11%         4175%    6454 ±  7%         cpuidle.C1E-IVB.usage
 114 ± 14%         6216%    7232 ±  5%         cpuidle.C3-IVB.usage
 33074 ± 14%       5159%    1739419 ±  3%      cpuidle.C3-IVB.time
 8874              4203%    381901             cpuidle.C6-IVB.usage
 8006184           4072%    3.34e+08           cpuidle.C6-IVB.time
 12019 ± 35%        303%    48398              perf-stat.cpu-migrations
 34232822            19%    40780053           perf-stat.context-switches
 339045               5%    354573             perf-stat.minor-faults
 339041               5%    354568             perf-stat.page-faults
 2.776e+11          -28%    2.003e+11          perf-stat.branch-instructions
 1.505e+12          -29%    1.065e+12          perf-stat.instructions
 6.421e+11          -30%    4.473e+11          perf-stat.dTLB-loads
 5.32e+11           -34%    3.536e+11          perf-stat.dTLB-stores
 1.173e+11          -38%    7.271e+10          perf-stat.cache-references
 3.735e+08 ±  5%    -48%    1.959e+08 ±  4%    perf-stat.iTLB-load-misses
 3.864e+09          -51%    1.9e+09            perf-stat.branch-misses
 4.069e+09 ± 20%    -56%    1.798e+09 ± 40%    perf-stat.dTLB-load-misses
 5.285e+08 ± 22%    -70%    1.585e+08 ± 16%    perf-stat.dTLB-store-misses
 7.126e+09 ± 16%    -97%    2.27e+08 ±  4%     perf-stat.cache-misses

The obvious changes are:
1. the bad commit has far fewer runnable processes - vmstat.procs.r
2. context switches are much higher in the bad commit - vmstat.system.cs

It all suggests that the netperf processes go to sleep for some reason
in the bad commit. I used
"perf record -p one_netperf_pid -e probe:pick_next_task_idle", as
suggested by Tim, to see where it went to sleep:

Samples: 78  of event 'probe:pick_next_task_idle', Event count (approx.): 78
  Children      Self  Trace output
-  100.00%   100.00%  (810fc750)
     __sendmsg_nocancel
     entry_SYSCALL_64_fastpath
     sys_sendmsg
     __sys_sendmsg
     ___sys_sendmsg
     inet_sendmsg
     sctp_sendmsg
     sctp_wait_for_sndbuf
     schedule_timeout
     schedule
     pick_next_task_idle

It doesn't look insane and sctp_wait_for_sndbuf may actually have something to do
[PATCH v5 6/7] perf annotate: Support jump instruction with target as second operand
Currently, perf is not able to parse a jump instruction whose second
operand contains the target address. Archs like powerpc have such
instructions. For example, 'beq cr7,10173e60'.

Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - New patch

 tools/perf/util/annotate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4a4a583..678fb81 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -117,8 +117,12 @@ static int jump__parse(struct ins_operands *ops,
 		       const char *norm_arch __maybe_unused)
 {
 	const char *s = strchr(ops->raw, '+');
+	const char *c = strchr(ops->raw, ',');

-	ops->target.addr = strtoull(ops->raw, NULL, 16);
+	if (c++ != NULL)
+		ops->target.addr = strtoull(c, NULL, 16);
+	else
+		ops->target.addr = strtoull(ops->raw, NULL, 16);

 	if (s++ != NULL)
 		ops->target.offset = strtoull(s, NULL, 16);
-- 
2.5.5
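The operand handling in this patch can be tried outside perf. The following is a minimal sketch, assuming a hypothetical helper name and taking the raw operand string as its only input; it applies the same rule (parse after the comma if there is one, otherwise from the start) without incrementing a possibly NULL pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the jump target address from raw operand text.
 * "cr7,10173e60" -> parse after the comma (powerpc style)
 * "10173e60"     -> parse from the start  (single-operand style)
 */
uint64_t jump_target_addr(const char *raw)
{
	const char *c = strchr(raw, ',');

	return strtoull(c != NULL ? c + 1 : raw, NULL, 16);
}
```

For the example in the commit message, jump_target_addr("cr7,10173e60") gives 0x10173e60. Operands that carry a trailing symbol, such as "34d00 <__sigaction+0x10>", still work because strtoull() stops at the first non-hexadecimal character.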
[PATCH v5 1/7] perf: Define macro for normalized arch names
Define a macro for each normalized arch name and use the macros instead
of spelling out the arch name as a string.

Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - No changes.

 tools/perf/arch/common.c           | 36 ++++++++++++++++++------------------
 tools/perf/arch/common.h           | 11 +++++++++++
 tools/perf/util/unwind-libunwind.c |  4 ++--
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index 886dd2a..f763666 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -123,25 +123,25 @@ static int lookup_triplets(const char *const *triplets, const char *name)
 const char *normalize_arch(char *arch)
 {
 	if (!strcmp(arch, "x86_64"))
-		return "x86";
+		return NORM_X86;
 	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
+		return NORM_X86;
 	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
+		return NORM_SPARC;
 	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-		return "arm64";
+		return NORM_ARM64;
 	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
+		return NORM_ARM;
 	if (!strncmp(arch, "s390", 4))
-		return "s390";
+		return NORM_S390;
 	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
+		return NORM_PARISC;
 	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
+		return NORM_POWERPC;
 	if (!strncmp(arch, "mips", 4))
-		return "mips";
+		return NORM_MIPS;
 	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
+		return NORM_SH;

 	return arch;
 }
@@ -181,21 +181,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree(&buf);
 	}

-	if (!strcmp(arch, "arm"))
+	if (!strcmp(arch, NORM_ARM))
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+	else if (!strcmp(arch, NORM_ARM64))
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+	else if (!strcmp(arch, NORM_POWERPC))
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "sh"))
+	else if (!strcmp(arch, NORM_SH))
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+	else if (!strcmp(arch, NORM_S390))
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+	else if (!strcmp(arch, NORM_SPARC))
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+	else if (!strcmp(arch, NORM_X86))
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+	else if (!strcmp(arch, NORM_MIPS))
 		path_list = mips_triplets;
 	else {
 		ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@

 extern const char *objdump_path;

+/* Macro for normalized arch names */
+#define NORM_X86	"x86"
+#define NORM_SPARC	"sparc"
+#define NORM_ARM64	"arm64"
+#define NORM_ARM	"arm"
+#define NORM_S390	"s390"
+#define NORM_PARISC	"parisc"
+#define NORM_POWERPC	"powerpc"
+#define NORM_MIPS	"mips"
+#define NORM_SH		"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);

diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 6d542a4..6199102 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -40,10 +40,10 @@ int unwind__prepare_access(struct thread *thread, struct map *map,

 	arch = normalize_arch(thread->mg->machine->env->arch);

-	if (!strcmp(arch, "x86")) {
+	if (!strcmp(arch, NORM_X86)) {
 		if (dso_type != DSO__TYPE_64BIT)
 			ops = x86_32_unwind_libunwind_ops;
-	} else if (!strcmp(arch, "arm64") || !strcmp(arch, "arm")) {
+	} else if (!strcmp(arch, NORM_ARM64) || !strcmp(arch, NORM_ARM)) {
 		if (dso_type == DSO__TYPE_64BIT)
 			ops = arm64_unwind_libunwind_ops;
 	}
-- 
2.5.5
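To see the macros at work without building perf, here is a minimal standalone sketch of the normalization. The function name normalize_arch_sketch is hypothetical, the macro values are copied from the patch, and the length check before reading arch[2] and arch[3] is an addition of this sketch rather than part of normalize_arch().

```c
#include <ctype.h>
#include <string.h>

/* Normalized arch names, as defined by the patch in arch/common.h. */
#define NORM_X86	"x86"
#define NORM_SPARC	"sparc"
#define NORM_ARM64	"arm64"
#define NORM_ARM	"arm"
#define NORM_S390	"s390"
#define NORM_PARISC	"parisc"
#define NORM_POWERPC	"powerpc"
#define NORM_MIPS	"mips"
#define NORM_SH		"sh"

/* Map a uname-style machine string to its normalized arch name. */
const char *normalize_arch_sketch(const char *arch)
{
	if (!strcmp(arch, "x86_64"))
		return NORM_X86;
	/* i386, i486, i586, i686: guard the length before indexing. */
	if (strlen(arch) >= 4 && arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
		return NORM_X86;
	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
		return NORM_SPARC;
	/* The exact arm64 names must be tested before the "arm" prefix. */
	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
		return NORM_ARM64;
	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
		return NORM_ARM;
	if (!strncmp(arch, "s390", 4))
		return NORM_S390;
	if (!strncmp(arch, "parisc", 6))
		return NORM_PARISC;
	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
		return NORM_POWERPC;
	if (!strncmp(arch, "mips", 4))
		return NORM_MIPS;
	if (!strncmp(arch, "sh", 2) && isdigit((unsigned char)arch[2]))
		return NORM_SH;

	/* Unknown names are passed through unchanged. */
	return arch;
}
```

Callers can then compare against the macros, for example !strcmp(normalize_arch_sketch("ppc64le"), NORM_POWERPC), so a misspelled arch name becomes a compile error instead of a silent string mismatch.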
[PATCH v5 0/7] perf: Cross arch annotate + few miscellaneous fixes
Currently, perf annotate supports code navigation (branches and calls)
only when run on the same architecture where perf.data was recorded.
For example, recording on a powerpc server and annotating on a client's
x86 desktop is not supported.

This patchset enables cross-arch annotate. Currently I've used the x86
and arm instructions which are already available, and added support for
powerpc. Additionally, this patch series contains a few other related
fixes.

Patches are prepared on top of acme/perf/core and tested with x86 and
powerpc only.

Note for arm:
A few instructions were defined under #if __arm__, which I've used as
the table for arm. But I'm not sure whether the instructions defined
outside of that block also contain arm instructions. Apart from that,
'call__parse()' and 'move__parse()' contain an #ifdef __arm__ directive.
I've changed it to if (!strcmp(norm_arch, arm)). I don't have an arm
machine to test these changes.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v5:
  - Replaced symbol__annotate with symbol__disassemble.
  - Removed hacks for jump and call instructions like bctr and bctrl
    respectively from the generic patch that enables support for
    powerpc, and made a separate patch for that.
  - v4 was not annotating the powerpc 'btar' instruction. Included that.
  - Added a few generic fixes.

v4 link: https://lkml.org/lkml/2016/7/8/10

Naveen N. Rao (1):
  perf annotate: Add support for powerpc

Ravi Bangoria (6):
  perf: Define macro for normalized arch names
  perf annotate: Add cross arch annotate support
  perf annotate: Do not ignore call instruction with indirect target
  perf annotate: Show raw form for jump instruction with indirect target
  perf annotate: Support jump instruction with target as second operand
  perf annotate: Fix jump target outside of function address range

 tools/perf/arch/common.c           |  36 ++---
 tools/perf/arch/common.h           |  11 ++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |   8 +-
 tools/perf/ui/gtk/annotate.c       |   2 +-
 tools/perf/util/annotate.c         | 276 +
 tools/perf/util/annotate.h         |  10 +-
 tools/perf/util/unwind-libunwind.c |   4 +-
 8 files changed, 262 insertions(+), 87 deletions(-)

--
2.5.5
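The switch from compile-time #ifdef __arm__ to a run-time check on the
normalized arch name, as described in the cover letter above, could look
roughly like the sketch below. It is an illustration only, not the code
from the series: the struct names, the two tiny instruction tables and
the helpers find_arch_table() and find_ins() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-ins for perf's instruction tables. */
struct ins_sketch {
	const char *name;	/* instruction mnemonic */
	int is_jump;		/* non-zero for jump instructions */
};

static const struct ins_sketch x86_ins[] = {
	{ "call", 0 }, { "jmp", 1 }, { "jne", 1 },
};

static const struct ins_sketch powerpc_ins[] = {
	{ "b", 1 }, { "bctr", 1 }, { "bl", 0 },
};

/* One entry per normalized arch name (hypothetical layout). */
struct arch_table_sketch {
	const char *norm_arch;
	const struct ins_sketch *ins;
	size_t nr_ins;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const struct arch_table_sketch arch_tables[] = {
	{ "x86",     x86_ins,     ARRAY_LEN(x86_ins) },
	{ "powerpc", powerpc_ins, ARRAY_LEN(powerpc_ins) },
};

/*
 * Select the table from the arch recorded with the data, not from the
 * arch the tool was built for. Returns NULL for an unknown arch.
 */
static const struct arch_table_sketch *find_arch_table(const char *norm_arch)
{
	size_t i;

	assert(norm_arch != NULL);
	for (i = 0; i < ARRAY_LEN(arch_tables); i++) {
		if (!strcmp(arch_tables[i].norm_arch, norm_arch))
			return &arch_tables[i];
	}
	return NULL;
}

/* Look up a mnemonic in the table of the given arch; NULL if not found. */
static const struct ins_sketch *find_ins(const char *norm_arch, const char *name)
{
	const struct arch_table_sketch *table = find_arch_table(norm_arch);
	size_t i;

	if (table == NULL || name == NULL)
		return NULL;
	for (i = 0; i < table->nr_ins; i++) {
		if (!strcmp(table->ins[i].name, name))
			return &table->ins[i];
	}
	return NULL;
}
```

With a layout like this, an x86 build can still resolve "bctr" when the
recorded arch is "powerpc", which is the case the #ifdef approach cannot
handle.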
[PATCH v5 5/7] perf annotate: Show raw form for jump instruction with indirect target
For jump instructions that do not include the target address as a direct
operand, use the raw value instead. This is needed for certain powerpc
jump instructions that take the target address from a register (such as
bctr, btar, ...).

Suggested-by: Michael Ellerman
Signed-off-by: Ravi Bangoria
---
Changes in v5:
  - New patch introduced to annotate jump instruction with indirect
    target

 tools/perf/util/annotate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6368ba9..4a4a583 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -131,6 +131,9 @@ static int jump__parse(struct ins_operands *ops,
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops)
 {
+	if (!ops->target.addr)
+		return ins__raw_scnprintf(ins, bf, size, ops);
+
 	return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset);
 }

--
2.5.5
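The fallback added by this patch can be tried outside of perf with a
small user-space sketch. The struct and function names below are
hypothetical, snprintf() stands in for the kernel-style scnprintf(), and
the raw format "%-6.6s %s" is an assumption about what
ins__raw_scnprintf() prints.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, flattened version of the operands a jump carries. */
struct jump_ops_sketch {
	const char *raw;	/* operand text as printed by the disassembler */
	uint64_t target_addr;	/* 0 when the operand is not a literal address */
	uint64_t target_offset;	/* offset of the target within the function */
};

/*
 * Format a jump for display: without a literal target address, fall
 * back to the raw operand text; otherwise print the target offset.
 * Returns the value returned by snprintf().
 */
static int jump_format_sketch(const char *name, char *bf, size_t size,
			      const struct jump_ops_sketch *ops)
{
	assert(bf != NULL && size > 0);

	if (!ops->target_addr)
		return snprintf(bf, size, "%-6.6s %s", name, ops->raw);

	return snprintf(bf, size, "%-6.6s %" PRIx64, name, ops->target_offset);
}
```

For an indirect jump such as "jmpq *%rax" the operand text is kept as is,
while a direct jump still shows the offset in hex.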
[PATCH 6/8] pipe: fix limit checking in alloc_pipe_info()
The limit checking in alloc_pipe_info() (used by pipe(2) and when
opening a FIFO) has the following problems:

(1) When checking capacity required for the new pipe, the checks against
    the limit in /proc/sys/fs/pipe-user-pages-{soft,hard} are made
    against existing consumption, and exclude the memory required for
    the new pipe capacity. As a consequence: (1) the memory allocation
    throttling provided by the soft limit does not kick in quite as
    early as it should, and (2) the user can overrun the hard limit.

(2) As currently implemented, accounting and checking against the limits
    is done as follows:

    (a) Test whether the user has exceeded the limit.
    (b) Make new pipe buffer allocation.
    (c) Account new allocation against the limits.

    This is racy. Multiple processes may pass point (a) simultaneously,
    and then allocate pipe buffers that are accounted for only in step
    (c). The race means that the user's pipe buffer allocation could be
    pushed over the limit (by an arbitrary amount, depending on how
    unlucky we were in the race). [Thanks to Vegard Nossum for spotting
    this point, which I had missed.]

This patch addresses the above problems as follows:

* Alter the checks against limits to include the memory required for
  the new pipe.
* Re-order the accounting step so that it precedes the buffer
  allocation. If the accounting step determines that a limit has been
  reached, revert the accounting and cause the operation to fail.

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 20
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 613c6b9..705d79f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -632,24 +632,28 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	if (pipe == NULL)
 		goto out_free_uid;

-	if (!too_many_pipe_buffers_hard(user)) {
-		if (too_many_pipe_buffers_soft(user))
-			pipe_bufs = 1;
-		pipe->bufs = kcalloc(pipe_bufs,
-				     sizeof(struct pipe_buffer),
-				     GFP_KERNEL_ACCOUNT);
-	}
+	if (too_many_pipe_buffers_soft(user))
+		pipe_bufs = 1;
+
+	account_pipe_buffers(user, 0, pipe_bufs);
+
+	if (too_many_pipe_buffers_hard(user))
+		goto out_revert_acct;
+
+	pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
+			     GFP_KERNEL_ACCOUNT);

 	if (pipe->bufs) {
 		init_waitqueue_head(&pipe->wait);
 		pipe->r_counter = pipe->w_counter = 1;
 		pipe->buffers = pipe_bufs;
 		pipe->user = user;
-		account_pipe_buffers(user, 0, pipe_bufs);
 		mutex_init(&pipe->mutex);
 		return pipe;
 	}

+out_revert_acct:
+	account_pipe_buffers(user, pipe_bufs, 0);
 	kfree(pipe);
 out_free_uid:
 	free_uid(user);
--
2.5.5
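The "account first, then check, revert on failure" ordering described in
this patch can be sketched in user space with C11 atomics. This is a
simplified model under stated assumptions, not the kernel code: the
global counter stands in for user->pipe_bufs, hard_limit stands in for
pipe-user-pages-hard, all names are hypothetical, and the check uses the
total returned by the accounting step (which is closer to what patch 7/8
of this series does than to the diff above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for user->pipe_bufs; static storage, so it starts at zero. */
static atomic_ulong user_pipe_bufs;

/* Stand-in for pipe-user-pages-hard; 0 means "no limit". */
static unsigned long hard_limit = 8;

/*
 * Replace an old charge with a new one and return the resulting total.
 * atomic_fetch_add() returns the previous value, so the delta is added
 * again to obtain the new total. Unsigned wrap-around makes a negative
 * delta (new < old) behave as a subtraction.
 */
static unsigned long account_bufs(unsigned long old, unsigned long new)
{
	unsigned long delta = new - old;

	return atomic_fetch_add(&user_pipe_bufs, delta) + delta;
}

/* Same ">=" comparison as the too_many_pipe_buffers_hard() in this series. */
static bool over_hard_limit(unsigned long total)
{
	return hard_limit && total >= hard_limit;
}

/*
 * Charge nr buffers before allocating anything. Concurrent callers each
 * see a total that already includes their own charge, so they cannot
 * all slip under the limit together. On failure the charge is reverted.
 */
static bool reserve_bufs(unsigned long nr)
{
	unsigned long total;

	assert(nr > 0);
	total = account_bufs(0, nr);
	if (over_hard_limit(total)) {
		(void) account_bufs(nr, 0);	/* revert the accounting */
		return false;
	}
	return true;
}
```

The point of the ordering is that the check and the charge can no longer
be separated by another task's allocation; a failed caller only has to
undo its own charge.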
[PATCH 8/8] pipe: cap initial pipe capacity according to pipe-max-size limit
This is a patch that provides behavior that is more consistent, and
probably less surprising to users. I consider the change optional, and
welcome opinions about whether it should be applied.

By default, pipes are created with a capacity of 64 KiB. However,
/proc/sys/fs/pipe-max-size may be set smaller than this value. In this
scenario, an unprivileged user could thus create a pipe whose initial
capacity exceeds the limit. Therefore, it seems logical to cap the
initial pipe capacity according to the value of pipe-max-size.

The test program shown earlier in this patch series can be used to
demonstrate the effect of the change brought about with this patch:

    # cat /proc/sys/fs/pipe-max-size
    1048576
    # sudo -u mtk ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 65536
    # echo 1 > /proc/sys/fs/pipe-max-size
    # cat /proc/sys/fs/pipe-max-size
    16384
    # sudo -u mtk ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 16384
    # ./test_F_SETPIPE_SZ 1
    Initial pipe capacity: 65536

The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size
caps the initial allocation for a new pipe for unprivileged users, but
not for privileged users.

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index ada1777..caced8b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -631,6 +631,9 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	if (pipe == NULL)
 		goto out_free_uid;

+	if (!capable(CAP_SYS_RESOURCE) && pipe_bufs * PAGE_SIZE > pipe_max_size)
+		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
+
 	if (too_many_pipe_buffers_soft(atomic_long_read(&user->pipe_bufs)))
 		pipe_bufs = 1;

--
2.5.5
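The arithmetic of the cap can be checked with a small stand-alone
function. This is only a sketch: initial_pipe_bufs() and its parameters
are hypothetical, the boolean replaces the capable(CAP_SYS_RESOURCE)
test, and a 4 KiB page (page shift of 12) is assumed in the usage note.

```c
#include <assert.h>
#include <stdbool.h>

/* Default number of pipe buffers: 16 pages, i.e. 64 KiB with 4 KiB pages. */
#define DEF_BUFFERS_SKETCH 16UL

/*
 * Return the number of buffers a new pipe would start with. An
 * unprivileged caller is capped at max_size_bytes; a privileged one
 * keeps the default regardless of the limit.
 */
static unsigned long initial_pipe_bufs(bool privileged,
				       unsigned long max_size_bytes,
				       unsigned int page_shift)
{
	unsigned long bufs = DEF_BUFFERS_SKETCH;

	assert(page_shift < 8 * sizeof(unsigned long));

	if (!privileged && (bufs << page_shift) > max_size_bytes)
		bufs = max_size_bytes >> page_shift;

	return bufs;
}
```

With a limit of 16384 bytes and 4 KiB pages this gives 4 buffers
(16384 bytes) for an unprivileged caller and 16 buffers (65536 bytes)
for a privileged one, matching the capacities in the transcript above.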
Seeking recommendation on whether a behaviour is right/wrong
Hi All.

I have been trying to debug a strange issue occurring on a "mostly
mainline" Linux kernel, running on a proprietary embedded platform.

I haven't yet been able to zero in on the issue with 100% confidence,
but I think the following might be happening:

a) A C user application is running, and a file is being written one
   byte at a time. Let's say the file being written is "file1.txt".

b) There's another file, "file2.txt", which is in an absolutely sane
   state (no open file descriptors, etc.).

c) Now, a cron script reboots the machine via /sbin/reboot "abruptly"
   (i.e. without closing the open file descriptor of "file1.txt").

d) When the machine comes up, we find that "file2.txt" is corrupted.

Given this behaviour, is the kernel at fault? Or is the cron job the
culprit, for the abrupt reboot?

Thanks and Regards,
Ajay
[PATCH 5/8] pipe: simplify logic in alloc_pipe_info()
Replace an 'if' block that covers most of the code in this function
with a 'goto'. This makes the code a little simpler to read, and also
simplifies the next patch.

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index a7470a9..613c6b9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -625,33 +625,34 @@ static bool too_many_pipe_buffers_hard(struct user_struct *user)
 struct pipe_inode_info *alloc_pipe_info(void)
 {
 	struct pipe_inode_info *pipe;
+	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
+	struct user_struct *user = get_current_user();

 	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
-	if (pipe) {
-		unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
-		struct user_struct *user = get_current_user();
-
-		if (!too_many_pipe_buffers_hard(user)) {
-			if (too_many_pipe_buffers_soft(user))
-				pipe_bufs = 1;
-			pipe->bufs = kcalloc(pipe_bufs,
-					     sizeof(struct pipe_buffer),
-					     GFP_KERNEL_ACCOUNT);
-		}
+	if (pipe == NULL)
+		goto out_free_uid;
+
+	if (!too_many_pipe_buffers_hard(user)) {
+		if (too_many_pipe_buffers_soft(user))
+			pipe_bufs = 1;
+		pipe->bufs = kcalloc(pipe_bufs,
+				     sizeof(struct pipe_buffer),
+				     GFP_KERNEL_ACCOUNT);
+	}

-		if (pipe->bufs) {
-			init_waitqueue_head(&pipe->wait);
-			pipe->r_counter = pipe->w_counter = 1;
-			pipe->buffers = pipe_bufs;
-			pipe->user = user;
-			account_pipe_buffers(user, 0, pipe_bufs);
-			mutex_init(&pipe->mutex);
-			return pipe;
-		}
-		free_uid(user);
-		kfree(pipe);
+	if (pipe->bufs) {
+		init_waitqueue_head(&pipe->wait);
+		pipe->r_counter = pipe->w_counter = 1;
+		pipe->buffers = pipe_bufs;
+		pipe->user = user;
+		account_pipe_buffers(user, 0, pipe_bufs);
+		mutex_init(&pipe->mutex);
+		return pipe;
 	}

+	kfree(pipe);
+out_free_uid:
+	free_uid(user);
 	return NULL;
 }

--
2.5.5
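The shape this patch moves towards (early exits through labels that undo
earlier steps in reverse order) can be shown with a small user-space
sketch. Everything here is hypothetical: the struct names, the use of
calloc()/free() instead of the kernel allocators, and the "nr_bufs == 0"
test that stands in for a failed limit check.

```c
#include <assert.h>
#include <stdlib.h>

struct buf_sketch {
	char data[64];
};

struct pipe_sketch {
	struct buf_sketch *bufs;
	size_t nr_bufs;
};

/*
 * Allocate a pipe and its buffers. Each failure jumps to the label that
 * releases exactly what has been set up so far, so the success path
 * stays at a single indentation level.
 */
static struct pipe_sketch *alloc_pipe_sketch(size_t nr_bufs)
{
	struct pipe_sketch *pipe;

	pipe = calloc(1, sizeof(*pipe));
	if (pipe == NULL)
		goto out;

	if (nr_bufs == 0)	/* stands in for "limit exceeded" */
		goto out_free_pipe;

	pipe->bufs = calloc(nr_bufs, sizeof(*pipe->bufs));
	if (pipe->bufs == NULL)
		goto out_free_pipe;

	pipe->nr_bufs = nr_bufs;
	return pipe;

out_free_pipe:
	free(pipe);
out:
	return NULL;
}

/* Release a pipe returned by alloc_pipe_sketch(); NULL is ignored. */
static void free_pipe_sketch(struct pipe_sketch *pipe)
{
	if (pipe == NULL)
		return;
	assert(pipe->bufs != NULL);	/* a live pipe always has buffers */
	free(pipe->bufs);
	free(pipe);
}
```

Adding a further step, such as the accounting revert introduced by the
next patch, then only needs one more label rather than another level of
nesting.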
Re: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2
On Thu, Aug 18, 2016 at 12:56 PM, Matthew Wilcox wrote:
> Compared to a patch which removes 5 lines of code, almost any additional work
> is ocean-boiling.
>

Did you check the state of NFIT enabling in Hyper-V? Not patching the
Linux kernel at all is even less work.
[PATCH 7/8] pipe: make account_pipe_buffers() return a value, and use it
This is an optional patch, to provide a small performance improvement.
Alter account_pipe_buffers() so that it returns the new value in
user->pipe_bufs. This means that we can refactor
too_many_pipe_buffers_soft() and too_many_pipe_buffers_hard() to avoid
the costs of repeated use of atomic_long_read() to get the value
user->pipe_bufs.

Cc: Willy Tarreau
Cc: Vegard Nossum
Cc: socketp...@gmail.com
Cc: Tetsuo Handa
Cc: Jens Axboe
Cc: Al Viro
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Michael Kerrisk
---
 fs/pipe.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 705d79f..ada1777 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -604,22 +604,20 @@ pipe_fasync(int fd, struct file *filp, int on)
 	return retval;
 }
 
-static void account_pipe_buffers(struct user_struct *user,
+static unsigned long account_pipe_buffers(struct user_struct *user,
 				 unsigned long old, unsigned long new)
 {
-	atomic_long_add(new - old, &user->pipe_bufs);
+	return atomic_long_add_return(new - old, &user->pipe_bufs);
 }
 
-static bool too_many_pipe_buffers_soft(struct user_struct *user)
+static bool too_many_pipe_buffers_soft(unsigned long num_bufs)
 {
-	return pipe_user_pages_soft &&
-	       atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_soft;
+	return pipe_user_pages_soft && num_bufs >= pipe_user_pages_soft;
 }
 
-static bool too_many_pipe_buffers_hard(struct user_struct *user)
+static bool too_many_pipe_buffers_hard(unsigned long num_bufs)
 {
-	return pipe_user_pages_hard &&
-	       atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_hard;
+	return pipe_user_pages_hard && num_bufs >= pipe_user_pages_hard;
 }
 
 struct pipe_inode_info *alloc_pipe_info(void)
@@ -627,17 +625,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	struct pipe_inode_info *pipe;
 	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
 	struct user_struct *user = get_current_user();
+	unsigned long num_bufs;
 
 	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
 	if (pipe == NULL)
 		goto out_free_uid;
 
-	if (too_many_pipe_buffers_soft(user))
+	if (too_many_pipe_buffers_soft(atomic_long_read(&user->pipe_bufs)))
 		pipe_bufs = 1;
 
-	account_pipe_buffers(user, 0, pipe_bufs);
+	num_bufs = account_pipe_buffers(user, 0, pipe_bufs);
 
-	if (too_many_pipe_buffers_hard(user))
+	if (too_many_pipe_buffers_hard(num_bufs))
 		goto out_revert_acct;
 
 	pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
@@ -653,7 +652,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	}
 
 out_revert_acct:
-	account_pipe_buffers(user, pipe_bufs, 0);
+	(void) account_pipe_buffers(user, pipe_bufs, 0);
 	kfree(pipe);
 out_free_uid:
 	free_uid(user);
@@ -664,7 +663,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 {
 	int i;
 
-	account_pipe_buffers(pipe->user, pipe->buffers, 0);
+	(void) account_pipe_buffers(pipe->user, pipe->buffers, 0);
 	free_uid(pipe->user);
 	for (i = 0; i < pipe->buffers; i++) {
 		struct pipe_buffer *buf = pipe->bufs + i;
@@ -1035,6 +1034,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 {
 	struct pipe_buffer *bufs;
 	unsigned int size, nr_pages;
+	unsigned long num_bufs;
 	long ret = 0;
 
 	size = round_pipe_size(arg);
@@ -1043,7 +1043,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	if (!nr_pages)
 		return -EINVAL;
 
-	account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
+	num_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
 
 	/*
 	 * If trying to increase the pipe capacity, check that an
@@ -1055,8 +1055,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 		if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
 			ret = -EPERM;
 			goto out_revert_acct;
-		} else if ((too_many_pipe_buffers_hard(pipe->user) ||
-				too_many_pipe_buffers_soft(pipe->user)) &&
+		} else if ((too_many_pipe_buffers_hard(num_bufs) ||
+				too_many_pipe_buffers_soft(num_bufs)) &&
 				!capable(CAP_SYS_RESOURCE) &&
 				!capable(CAP_SYS_ADMIN)) {
 			ret = -EPERM;
@@ -1110,7 +1110,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	return nr_pages * PAGE_SIZE;
 
 out_revert_acct:
-	account_pipe_buffers(pipe->user, nr_pages, pipe->buffers);
+	(void) account_pipe_buffers(pipe->user, nr_pages, pipe->buffers);
 	return ret;
 }
 
-- 
2.5.5
Re: [RFC PATCH 0/3] UART slave device bus
Hi, On Thu, Aug 18, 2016 at 06:08:24PM -0500, Rob Herring wrote: > On Thu, Aug 18, 2016 at 3:29 PM, Sebastian Reichel wrote: > > Thanks for going forward and implementing this. I also started, > > but was far from a functional state. > > > > On Wed, Aug 17, 2016 at 08:14:42PM -0500, Rob Herring wrote: > >> Currently, devices attached via a UART are not well supported in > >> the kernel. The problem is the device support is done in tty line > >> disciplines, various platform drivers to handle some sideband, and > >> in userspace with utilities such as hciattach. > >> > >> There have been several attempts to improve support, but they suffer from > >> still being tied into the tty layer and/or abusing the platform bus. This > >> is a prototype to show creating a proper UART bus for UART devices. It is > >> tied into the serial core (really struct uart_port) below the tty layer > >> in order to use existing serial drivers. > >> > >> This is functional with minimal testing using the loopback driver and > >> pl011 (w/o DMA) UART under QEMU (modified to add a DT node for the slave > >> device). It still needs lots of work and polish. > >> > >> TODOs: > >> - Figure out the port locking. mutex plus spinlock plus refcounting? I'm > >> hoping all that complexity is from the tty layer and not needed here. > >> - Split out the controller for uart_ports into separate driver. Do we see > >> a need for controller drivers that are not standard serial drivers? > >> - Implement/test the removal paths > >> - Fix the receive callbacks for more than character at a time (i.e. DMA) > >> - Need better receive buffering than just a simple circular buffer or > >> perhaps a different receive interface (e.g. direct to client buffer)? > >> - Test with other UART drivers > >> - Convert a real driver/line discipline over to UART bus. 
> >> > >> Before I spend more time on this, I'm looking mainly for feedback on the > >> general direction and structure (the interface with the existing serial > >> drivers in particular). > > > > I had a look at the uart_dev API: > > > > int uart_dev_config(struct uart_device *udev, int baud, int parity, int > > bits, int flow); > > int uart_dev_connect(struct uart_device *udev); > > > > The flow control configuration should be done separately. e.g.: > > uart_dev_flow_control(struct uart_device *udev, bool enable); > > No objection, but out of curiosity, why? Nokia's bluetooth uart protocol disables flow control during speed changes. > > int uart_dev_tx(struct uart_device *udev, u8 *buf, size_t count); > > int uart_dev_rx(struct uart_device *udev, u8 *buf, size_t count); > > > > UART communication does not have to be host-initiated, so this > > API requires polling. Either some function similar to poll in > > userspace is needed, or it should be implemented as callback. > > What's the userspace need? I meant "Either some function similar to userspace's poll() is needed, ...". Something like uart_dev_wait_for_rx() Alternatively the rx function could be a callback, that is called when there is new data. > I'm assuming the only immediate consumers are in-kernel. Yes, but the driver should be notified about incoming data. -- Sebastian signature.asc Description: PGP signature
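To make the callback suggestion concrete, here is a rough userspace sketch of what a callback-driven receive path could look like. It is not part of the proposed uart_dev API: uart_dev_set_rx_handler(), uart_dev_deliver_rx(), the rx/drvdata fields and the byte_counter client are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct uart_device;

/* Hypothetical callback type: invoked whenever new bytes have arrived. */
typedef void (*uart_rx_fn)(struct uart_device *udev,
			   const uint8_t *buf, size_t count);

/* Minimal stand-in for the device structure; only what the sketch needs. */
struct uart_device {
	uart_rx_fn rx;
	void *drvdata;
};

/* A client driver registers its handler once, e.g. at probe time. */
static void uart_dev_set_rx_handler(struct uart_device *udev,
				    uart_rx_fn fn, void *drvdata)
{
	assert(udev != NULL);
	udev->rx = fn;
	udev->drvdata = drvdata;
}

/*
 * What an imagined bus core would call when the port has received
 * data. Returns the number of bytes handed to the client, or 0 if no
 * handler is registered.
 */
static size_t uart_dev_deliver_rx(struct uart_device *udev,
				  const uint8_t *buf, size_t count)
{
	if (!udev->rx)
		return 0;
	udev->rx(udev, buf, count);
	return count;
}

/* Example client: just counts the bytes it is notified about. */
struct byte_counter {
	size_t total;
};

static void count_rx(struct uart_device *udev, const uint8_t *buf, size_t count)
{
	struct byte_counter *c = udev->drvdata;

	(void)buf;
	c->total += count;
}
```

With this shape the client never polls; device-initiated traffic reaches the driver as soon as the core has it. A blocking uart_dev_wait_for_rx(), as mentioned above, would be the alternative design.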
Re: linux-next: build warnings after merge of the kbuild tree
Hi Nick, On Fri, 19 Aug 2016 13:38:54 +1000 Stephen Rothwell wrote: > > On Thu, 18 Aug 2016 11:09:48 +1000 Nicholas Piggin wrote: > > > > On Wed, 17 Aug 2016 14:59:59 +0200 > > Michal Marek wrote: > > > > > On 2016-08-17 03:44, Stephen Rothwell wrote: > > > > > > > > After merging the kbuild tree, today's linux-next build (powerpc > > > > ppc64_defconfig) produced these warnings: > > > > > > > > WARNING: 25 bad relocations > > > > c0cf2570 R_PPC64_ADDR64__crc___arch_hweight16 > > > [...] > > > > Introduced by commit > > > > > > > > 9445aa1a3062 ("ppc: move exports to definitions") > > > > > > > > I have reverted that commit for today. > > > > > > > > [cc-ing the ppc guys for clues - also involved is commit > > > > > > > > 22823ab419d8 ("EXPORT_SYMBOL() for asm") > > > > ] > > > > > > FWIW, I see these warnings as well. Any help from ppc developers is > > > appreciated - should the R_PPC64_ADDR64 be whitelisted for exported asm > > > symbols (their CRCs actually)? > > > > The dangling relocation is a side effect of linker unable to resolve the > > reference to the undefined weak symbols. So the real question is, why has > > genksyms not overridden these symbols with their CRC values? > > > > This may not even be powerpc specific, but I'll poke at it a bit more > > when I get a chance. > > Not sure if this is relevant, but with the commit reverted, the > __crc___... symbols are absolute. > > f55b3b3d A __crc___arch_hweight16 Ignore that :-) I just had a look at a x86_64 allmodconfig result and it looks like the weak symbols are not resolved their either ... I may be missing something, but genksyms generates the crc's off the preprocessed C source code and we don't have any for the asm files ... -- Cheers, Stephen Rothwell
Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps
On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko wrote: > On Wed 17-08-16 11:57:56, Sonny Rao wrote: >> On Wed, Aug 17, 2016 at 6:03 AM, Michal Hocko wrote: >> > On Wed 17-08-16 11:31:25, Jann Horn wrote: > [...] >> >> That's at least 30.43% + 9.12% + 7.66% = 47.21% of the task's kernel >> >> time spent on evaluating format strings. The new interface >> >> wouldn't have to spend that much time on format strings because there >> >> isn't so much text to format. >> > >> > well, this is true of course but I would much rather try to reduce the >> > overhead of smaps file than add a new file. The following should help >> > already. I've measured ~7% systime cut down. I guess there is still some >> > room for improvements but I have to say I'm far from being convinced about >> > a new proc file just because we suck at dumping information to the >> > userspace. >> > If this was something like /proc//stat which is >> > essentially read all the time then it would be a different question but >> > is the rss, pss going to be all that often? If yes why? >> >> If the question is why do we need to read RSS, PSS, Private_*, Swap >> and the other fields so often? >> >> I have two use cases so far involving monitoring per-process memory >> usage, and we usually need to read stats for about 25 processes. >> >> Here's a timing example on an fairly recent ARM system 4 core RK3288 >> running at 1.8Ghz >> >> localhost ~ # time cat /proc/25946/smaps > /dev/null >> >> real0m0.036s >> user0m0.020s >> sys 0m0.020s >> >> localhost ~ # time cat /proc/25946/totmaps > /dev/null >> >> real0m0.027s >> user0m0.010s >> sys 0m0.010s >> localhost ~ # >> >> I'll ignore the user time for now, and we see about 20 ms of system >> time with smaps and 10 ms with totmaps, with 20 similar processes it >> would be 400 milliseconds of cpu time for the kernel to get this >> information from smaps vs 200 milliseconds with totmaps. Even totmaps >> is still pretty slow, but much better than smaps. 
>> >> Use cases: >> 1) Basic task monitoring -- like "top" that shows memory consumption >> including PSS, Private, Swap >> 1 second update means about 40% of one CPU is spent in the kernel >> gathering the data with smaps > > I would argue that even 20% is way too much for such a monitoring. What > is the value to do it so often tha 20 vs 40ms really matters? Yeah it is too much (I believe I said that) but it's significantly better. >> 2) User space OOM handling -- we'd rather do a more graceful shutdown >> than let the kernel's OOM killer activate and need to gather this >> information and we'd like to be able to get this information to make >> the decision much faster than 400ms > > Global OOM handling in userspace is really dubious if you ask me. I > understand you want something better than SIGKILL and in fact this is > already possible with memory cgroup controller (btw. memcg will give > you a cheap access to rss, amount of shared, swapped out memory as > well). Anyway if you are getting close to the OOM your system will most > probably be really busy and chances are that also reading your new file > will take much more time. I am also not quite sure how is pss useful for > oom decisions. I mentioned it before, but based on experience RSS just isn't good enough -- there's too much sharing going on in our use case to make the correct decision based on RSS. If RSS were good enough, simply put, this patch wouldn't exist. So even with memcg I think we'd have the same problem? > > Don't take me wrong, /proc//totmaps might be suitable for your > specific usecase but so far I haven't heard any sound argument for it to > be generally usable. It is true that smaps is unnecessarily costly but > at least I can see some room for improvements. A simple patch I've > posted cut the formatting overhead by 7%. Maybe we can do more. 
It seems like a general problem that if you want these values the existing kernel interface can be very expensive, so it would be generally usable by any application which wants a per process PSS, private data, dirty data or swap value. I mentioned two use cases, but I guess I don't understand the comment about why it's not usable by other use cases. > -- > Michal Hocko > SUSE Labs
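For context, this is roughly the work a monitoring tool has to do today to get a per-process total out of smaps: read every per-mapping block and add up one field. The sketch below is an illustration only; smaps_sum_kb() is a made-up helper, and it assumes the usual "Field: <n> kB" line layout of smaps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field>: <n> kB" line in an smaps-formatted stream and
 * return the total in kB. The field name must match the whole label,
 * so asking for "Pss" does not pick up "SwapPss".
 * Assumption: no line is longer than the 512-byte buffer.
 */
static unsigned long long smaps_sum_kb(FILE *fp, const char *field)
{
	char line[512];
	size_t flen;
	unsigned long long total = 0, kb;

	assert(fp != NULL && field != NULL);
	flen = strlen(field);

	while (fgets(line, sizeof(line), fp)) {
		if (strncmp(line, field, flen) == 0 && line[flen] == ':' &&
		    sscanf(line + flen + 1, "%llu", &kb) == 1)
			total += kb;
	}
	return total;
}
```

A caller would fopen() /proc/<pid>/smaps and pass the stream in. The cost discussed in this thread is on the kernel side, which has to format every one of those lines for each read; the userspace parsing shown here comes on top of that.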
Re: [PATCH v3 1/3] devicetree: Sort vendor prefixes in alphabetical order
On Fri, Aug 19, 2016 at 01:02:37AM +0200, Andrew Lunn wrote:
> > @@ -54,8 +54,8 @@ chipone	ChipOne
> >  chipspark	ChipSPARK
> >  chrp	Common Hardware Reference Platform
> >  chunghwa	Chunghwa Picture Tubes Ltd.
> > -ciaa	Computadora Industrial Abierta Argentina
> >  cirrus	Cirrus Logic, Inc.
> > +ciaa	Computadora Industrial Abierta Argentina
>
> ciaa comes after cirrus?

It does with LC_COLLATE=da_DK :-(

I'm sorry about that. I'll post v4 this afternoon.

-- 
Rask Ingemann Lambertsen
Danish law requires addresses in e-mail to be logged and stored for a year
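One way to avoid this class of surprise is to sort by plain byte values, which do not depend on the locale. The sketch below is an illustration, not the script used for the patch: sort_prefixes() is a made-up helper, and the explanation that Danish collation treats "aa" like the letter "å" (sorted after "z") is my assumption about why da_DK behaves this way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: strcmp() compares bytes and ignores LC_COLLATE. */
static int cmp_prefix(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of vendor prefixes in locale-independent order. */
static void sort_prefixes(const char **prefixes, size_t n)
{
	assert(prefixes != NULL);
	qsort(prefixes, n, sizeof(*prefixes), cmp_prefix);
}
```

Using strcoll() in the comparator instead would follow the current locale, which is the behaviour that moved ciaa after cirrus here.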
[GIT PULL] xfs, iomap: fixes for 4.8-rc3
Hi Linus,

Can you please pull the fixes from the tag listed below? This update
contains fixes for most of the outstanding regressions introduced with
the 4.8-rc1 XFS and iomap infrastructure merge.

The only regression that isn't addressed by this pullreq is the aim7
write regression. I'm still testing Christoph's patches that address
the simple cases we've reproduced, but the cause of the aim7 regression
is still not clear so there's more work to be done there. Still, that's
no reason to hold up all the other issues we have tested fixes for.

Thanks!

-Dave.

The following changes since commit 694d0d0bb2030d2e36df73e2d23d5770511dbc8d:

  Linux 4.8-rc2 (2016-08-14 19:11:36 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git tags/xfs-iomap-for-linus-4.8-rc3

for you to fetch changes up to 32438cf9d54bd53b531f6d98814e84dd278360c1:

  Merge branch 'iomap-fixes-4.8-rc3' into for-next (2016-08-17 11:13:37 +1000)

xfs, iomap: update for 4.8-rc3

Changes in this update
- regression fixes for XFS changes introduced in 4.8-rc1
  - buffer IO accounting assert failure
  - ENOSPC block accounting reservation issue
  - DAX IO path page cache invalidation fix
  - rmapbt on-disk block count in agf
  - correct classification of rmap block type when updating AGFL.
  - iomap support for attribute fork mapping
- regression fixes for iomap infrastructure in 4.8-rc1
  - fiemap: honor FIEMAP_FLAG_SYNC
  - fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
  - make mark_page_accessed and pagefault_disable usage consistent
    with other IO paths

Brian Foster (1):
      xfs: don't assert fail on non-async buffers on ioacct decrement

Christoph Hellwig (6):
      xfs: fix bogus space reservation in xfs_iomap_write_allocate
      iomap: remove superflous mark_page_accessed from iomap_write_actor
      iomap: remove superflous pagefault_disable from iomap_write_actor
      iomap: mark ->iomap_end as optional
      xfs: simplify xfs_file_iomap_begin
      xfs: (re-)implement FIEMAP_FLAG_XATTR

Darrick J. Wong (2):
      xfs: store rmapbt block count in the AGF
      xfs: remove OWN_AG rmap when allocating a block from the AGFL

Dave Chinner (4):
      xfs: don't invalidate whole file on DAX read/write
      iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
      iomap: prepare iomap_fiemap for attribute mappings
      Merge branch 'iomap-fixes-4.8-rc3' into for-next

 fs/iomap.c                     | 21
 fs/xfs/libxfs/xfs_alloc.c      | 14
 fs/xfs/libxfs/xfs_format.h     | 11
 fs/xfs/libxfs/xfs_rmap_btree.c |  6
 fs/xfs/xfs_buf.c               |  1
 fs/xfs/xfs_file.c              | 13
 fs/xfs/xfs_fsops.c             |  1
 fs/xfs/xfs_iomap.c             | 69
 fs/xfs/xfs_iomap.h             |  1
 fs/xfs/xfs_iops.c              |  9
 fs/xfs/xfs_trace.h             |  1
 11 files changed, 119 insertions(+), 28 deletions(-)

-- 
Dave Chinner
da...@fromorbit.com
Re: [PATCH 2/2] pipe: make pipe user buffer limit checks more precise
Andrew, Thanks for picking up this patch series in -mm. Please drop it. After discussions with Vegard, I have something better now. Cheers, Michael On 08/16/2016 11:14 PM, Michael Kerrisk (man-pages) wrote: > As currently implemented, when creating a new pipe or increasing > a pipe's capacity with fcntl(F_SETPIPE_SZ), the checks against > the limits in /proc/sys/fs/pipe-user-pages-{soft,hard} (added by > commit 759c01142a5d0) do not include the pages required for the > new pipe or increased capacity. In the case of fcntl(F_SETPIPE_SZ), > this means that an unprivileged user can make a one-time capacity > increase that pushes the user consumption over the limits by up > to the value specified in /proc/sys/fs/pipe-max-size (which > defaults to 1 MiB, but might be set to a much higher value). > > This patch remedies the problem by including the capacity required > for the new pipe or the pipe capacity increase in the check against > the limit. > > There is a small chance that this change could break user-space, > since there are cases where pipe() and fcntl(F_SETPIPE_SZ) calls > that previously succeeded might fail. However, the chances are > small, since (a) the pipe-user-pages-{soft,hard} limits are new > (in 4.5), and the default soft/hard limits are high/unlimited. > Therefore, it seems warranted to make these limits operate more > precisely (and behave more like what users probably expect). > > Using the test program shown in the previous patch, on an unpatched > kernel, we first set some limits: > > # echo 0 > /proc/sys/fs/pipe-user-pages-soft > # echo 10 > /proc/sys/fs/pipe-max-size > # echo 1 > /proc/sys/fs/pipe-user-pages-hard# 40.96 MB > > Then show that we can set a pipe with capacity (100MB) that is > over the hard limit > > # sudo -u mtk ./test_F_SETPIPE_SZ 1 1 > Loop 1: set pipe capacity to 1 bytes > F_SETPIPE_SZ returned 134217728 > > Now set the capacity to 100MB twice. 
The second call fails (which is > probably surprising to most users, since it seems like a no-op): > > # sudo -u mtk ./test_F_SETPIPE_SZ 1 1 0 1 > Loop 1: set pipe capacity to 1 bytes > F_SETPIPE_SZ returned 134217728 > Loop 2: set pipe capacity to 1 bytes > Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted > > With a patched kernel, setting a capacity over the limit fails at the > first attempt: > > # echo 0 > /proc/sys/fs/pipe-user-pages-soft > # echo 10 > /proc/sys/fs/pipe-max-size > # echo 1 > /proc/sys/fs/pipe-user-pages-hard > # sudo -u mtk ./test_F_SETPIPE_SZ 1 1 > Loop 1: set pipe capacity to 1 bytes > Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not > permitted > > Cc: Willy Tarreau> Cc: Vegard Nossum > Cc: socketp...@gmail.com > Cc: Tetsuo Handa > Cc: Jens Axboe > Cc: Al Viro > Cc: sta...@vger.kernel.org > Cc: linux-...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Michael Kerrisk > --- > fs/pipe.c | 24 ++-- > 1 file changed, 14 insertions(+), 10 deletions(-) > > diff --git a/fs/pipe.c b/fs/pipe.c > index a98ebca..397d8d9 100644 > --- a/fs/pipe.c > +++ b/fs/pipe.c > @@ -610,16 +610,20 @@ static void account_pipe_buffers(struct pipe_inode_info > *pipe, > atomic_long_add(new - old, >user->pipe_bufs); > } > > -static bool too_many_pipe_buffers_soft(struct user_struct *user) > +static bool too_many_pipe_buffers_soft(struct user_struct *user, > +unsigned int nr_pages) > { > return pipe_user_pages_soft && > -atomic_long_read(>pipe_bufs) >= pipe_user_pages_soft; > +atomic_long_read(>pipe_bufs) + nr_pages >= > + pipe_user_pages_soft; > } > > -static bool too_many_pipe_buffers_hard(struct user_struct *user) > +static bool too_many_pipe_buffers_hard(struct user_struct *user, > +unsigned int nr_pages) > { > return pipe_user_pages_hard && > -atomic_long_read(>pipe_bufs) >= pipe_user_pages_hard; > +atomic_long_read(>pipe_bufs) + nr_pages >= > + pipe_user_pages_hard; > } > > struct pipe_inode_info *alloc_pipe_info(void) 
> @@ -631,13 +635,13 @@ struct pipe_inode_info *alloc_pipe_info(void) > unsigned long pipe_bufs = PIPE_DEF_BUFFERS; > struct user_struct *user = get_current_user(); > > - if (!too_many_pipe_buffers_hard(user)) { > - if (too_many_pipe_buffers_soft(user)) > - pipe_bufs = 1; > + if (too_many_pipe_buffers_soft(user, PIPE_DEF_BUFFERS)) > + pipe_bufs = 1; > + > + if
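The difference between the two forms of the check in the quoted patch can be shown with plain numbers. This is only a sketch of the comparison logic, not kernel code: over_limit_old() and over_limit_new() are made-up names. The 10000-page limit is inferred from the "# 40.96 MB" comment (10000 pages of 4096 bytes), and 32768 pages is the 134217728-byte capacity that F_SETPIPE_SZ returned in the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Check as it stood: only what the user already holds is compared. */
static bool over_limit_old(unsigned long limit, unsigned long cur_pages)
{
	return limit && cur_pages >= limit;
}

/* Check as proposed: the pages being requested are included as well. */
static bool over_limit_new(unsigned long limit, unsigned long cur_pages,
			   unsigned long nr_pages)
{
	return limit && cur_pages + nr_pages >= limit;
}
```

With a limit of 10000 pages and 16 pages in use, the old form lets a request for 32768 more pages through (16 is below 10000), while the new form rejects it (16 + 32768 is not). A limit of 0 disables the check in both forms.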
Re: [PATCH 1/2] pipe: check limits only when increasing pipe capacity
Andrew,

Thanks for picking up this patch series in -mm. Please drop it. After
discussions with Vegard, I have something better now.

Cheers,

Michael

On 08/16/2016 11:10 PM, Michael Kerrisk (man-pages) wrote:
> When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various
> limits defined by /proc/sys/fs/pipe-* files are checked to see
> if unprivileged users are exceeding limits on memory consumption.
>
> While documenting and testing the operation of these limits I noticed
> that, as currently implemented, these checks can lead to cases where
> a user can increase a pipe's capacity and is then unable to decrease
> the capacity. The origin of the problem is two-fold:
>
> (1) When increasing the pipe capacity, the checks against the limits
>     in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
>     existing consumption, and exclude the memory required for the
>     increased pipe capacity. The new increase in pipe capacity
>     can then push the total memory used by the user for pipes
>     (possibly far) over a limit.
>
> (2) The limit checks are performed even when the new pipe capacity
>     is less than the existing pipe capacity. This can lead to
>     problems if a user sets a large pipe capacity, and then the
>     limits are lowered, with the result that the user will no
>     longer be able to decrease the pipe capacity.
>
> The simple solution given by this patch is to perform the checks
> only when the pipe capacity is being increased. The patch does not
> address the broken check in (1), which allows a user to (one-time)
> set a pipe capacity that pushes the user's consumption over the user
> pipe limits. A change to fix that check is proposed in a subsequent
> patch. I've separated the two fixes because the second fix is a
> little more complex, and could possibly (though unlikely) break
> existing user-space. The current patch implements the simple fix
> that carries little risk and seems obviously correct: allowing an
> unprivileged user always to decrease a pipe's capacity.
>
> The program below can be used to demonstrate the problem, and the
> effect of the fix. The program takes one or more command-line
> arguments. The first argument specifies the number of pipes
> that the program should create. The remaining arguments are,
> alternately, pipe capacities that should be set using
> fcntl(F_SETPIPE_SZ), and sleep intervals (in seconds) between
> the fcntl() operations. (The sleep intervals allow the possibility
> to change the limits between fcntl() operations.)
>
> Running this program on an unpatched kernel, we first set some limits:
>
>     # getconf PAGESIZE
>     4096
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 10 > /proc/sys/fs/pipe-max-size
>     # echo 1 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
>
> Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
> first setting a pipe capacity (10MB), sleeping for a few seconds,
> during which time the hard limit is lowered, and then set pipe
> capacity to a smaller amount (5MB):
>
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 1000 15 500 &
>     [1] 748
>     # Loop 1: set pipe capacity to 1000 bytes
>         F_SETPIPE_SZ returned 16777216
>         Sleeping 15 seconds
>
>     # echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB
>
>     # Loop 2: set pipe capacity to 500 bytes
>         Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not
>         permitted
>
> In this case, the user should be able to lower the limit.
>
> With a kernel that has the patch below, the second fcntl()
> succeeds:
>
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 10 > /proc/sys/fs/pipe-max-size
>     # echo 1 > /proc/sys/fs/pipe-user-pages-hard
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 1000 15 500 &
>     [1] 3215
>     # Loop 1: set pipe capacity to 1000 bytes
>         F_SETPIPE_SZ returned 16777216
>         Sleeping 15 seconds
>
>     # echo 1000 > /proc/sys/fs/pipe-user-pages-hard
>
>     # Loop 2: set pipe capacity to 500 bytes
>         F_SETPIPE_SZ returned 8388608
>
> 8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
>
> /* test_F_SETPIPE_SZ.c
>
>    (C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later
>
>    Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
>    and interactions with limits defined by /proc/sys/fs/pipe-* files.
> */
>
> int
> main(int argc, char *argv[])
> {
>     int (*pfd)[2];
>     int npipes;
>     int pcap, rcap;
>     int j, p, s, stime, loop;
>
>     if (argc < 2) {
>         fprintf(stderr, "Usage: %s num-pipes "
>                 "[pipe-capacity sleep-time]...\n", argv[0]);
>         exit(EXIT_FAILURE);
>     }
>
>     npipes = atoi(argv[1]);
>
>     pfd = calloc(npipes, sizeof (int [2]));
>     if (pfd == NULL) {
>         perror("calloc");
>         exit(EXIT_FAILURE);
>
Re: [PATCH 2/2] pipe: make pipe user buffer limit checks more precise
Andrew,

Thanks for picking up this patch series in -mm. Please drop it. After
discussions with Vegard, I have something better now.

Cheers,

Michael

On 08/16/2016 11:14 PM, Michael Kerrisk (man-pages) wrote:
> As currently implemented, when creating a new pipe or increasing
> a pipe's capacity with fcntl(F_SETPIPE_SZ), the checks against
> the limits in /proc/sys/fs/pipe-user-pages-{soft,hard} (added by
> commit 759c01142a5d0) do not include the pages required for the
> new pipe or increased capacity. In the case of fcntl(F_SETPIPE_SZ),
> this means that an unprivileged user can make a one-time capacity
> increase that pushes the user consumption over the limits by up
> to the value specified in /proc/sys/fs/pipe-max-size (which
> defaults to 1 MiB, but might be set to a much higher value).
>
> This patch remedies the problem by including the capacity required
> for the new pipe or the pipe capacity increase in the check against
> the limit.
>
> There is a small chance that this change could break user-space,
> since there are cases where pipe() and fcntl(F_SETPIPE_SZ) calls
> that previously succeeded might fail. However, the chances are
> small, since (a) the pipe-user-pages-{soft,hard} limits are new
> (in 4.5), and the default soft/hard limits are high/unlimited.
> Therefore, it seems warranted to make these limits operate more
> precisely (and behave more like what users probably expect).
>
> Using the test program shown in the previous patch, on an unpatched
> kernel, we first set some limits:
>
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 10 > /proc/sys/fs/pipe-max-size
>     # echo 1 > /proc/sys/fs/pipe-user-pages-hard    # 40.96 MB
>
> Then show that we can set a pipe with capacity (100MB) that is
> over the hard limit
>
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 1
>     Loop 1: set pipe capacity to 1 bytes
>         F_SETPIPE_SZ returned 134217728
>
> Now set the capacity to 100MB twice. The second call fails (which is
> probably surprising to most users, since it seems like a no-op):
>
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 1 0 1
>     Loop 1: set pipe capacity to 1 bytes
>         F_SETPIPE_SZ returned 134217728
>     Loop 2: set pipe capacity to 1 bytes
>         Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted
>
> With a patched kernel, setting a capacity over the limit fails at the
> first attempt:
>
>     # echo 0 > /proc/sys/fs/pipe-user-pages-soft
>     # echo 10 > /proc/sys/fs/pipe-max-size
>     # echo 1 > /proc/sys/fs/pipe-user-pages-hard
>     # sudo -u mtk ./test_F_SETPIPE_SZ 1 1
>     Loop 1: set pipe capacity to 1 bytes
>         Loop 1, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not
>         permitted
>
> Cc: Willy Tarreau
> Cc: Vegard Nossum
> Cc: socketp...@gmail.com
> Cc: Tetsuo Handa
> Cc: Jens Axboe
> Cc: Al Viro
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Michael Kerrisk
> ---
>  fs/pipe.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index a98ebca..397d8d9 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -610,16 +610,20 @@ static void account_pipe_buffers(struct pipe_inode_info *pipe,
>  	atomic_long_add(new - old, &pipe->user->pipe_bufs);
>  }
>
> -static bool too_many_pipe_buffers_soft(struct user_struct *user)
> +static bool too_many_pipe_buffers_soft(struct user_struct *user,
> +				       unsigned int nr_pages)
>  {
>  	return pipe_user_pages_soft &&
> -		atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_soft;
> +		atomic_long_read(&user->pipe_bufs) + nr_pages >=
> +				pipe_user_pages_soft;
>  }
>
> -static bool too_many_pipe_buffers_hard(struct user_struct *user)
> +static bool too_many_pipe_buffers_hard(struct user_struct *user,
> +				       unsigned int nr_pages)
>  {
>  	return pipe_user_pages_hard &&
> -		atomic_long_read(&user->pipe_bufs) >= pipe_user_pages_hard;
> +		atomic_long_read(&user->pipe_bufs) + nr_pages >=
> +				pipe_user_pages_hard;
>  }
>
>  struct pipe_inode_info *alloc_pipe_info(void)
> @@ -631,13 +635,13 @@ struct pipe_inode_info *alloc_pipe_info(void)
>  		unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
>  		struct user_struct *user = get_current_user();
>
> -		if (!too_many_pipe_buffers_hard(user)) {
> -			if (too_many_pipe_buffers_soft(user))
> -				pipe_bufs = 1;
> +		if (too_many_pipe_buffers_soft(user, PIPE_DEF_BUFFERS))
> +			pipe_bufs = 1;
> +
> +		if (!too_many_pipe_buffers_hard(user, pipe_bufs))
>  			pipe->bufs = kcalloc(pipe_bufs,
>  					     sizeof(struct
RE: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2
Yes, but this actually *removes a bug* in the Linux kernel; if any memory
resource is left to be set up later, it is currently not set up on x86
machines which don't have PCI busses. That's not very many x86 systems,
I'll agree, but I'm sure some enterprising person is busy creating an SoC
which lacks PCI.

-----Original Message-----
From: Dan Williams [mailto:dan.j.willi...@intel.com]
Sent: Thursday, August 18, 2016 4:17 PM
To: Matthew Wilcox
Cc: X86 ML; linux-kernel@vger.kernel.org; linux-nvd...@lists.01.org
Subject: Re: [PATCH 1/2] x86: Set up resources correctly on Hyper-V Generation 2

On Thu, Aug 18, 2016 at 12:56 PM, Matthew Wilcox wrote:
> Compared to a patch which removes 5 lines of code, almost any additional work
> is ocean-boiling.
> Did you check the state of NFIT enabling in Hyper-V?

Not patching the Linux kernel at all is even less work.
Re: [PATCH] sched: fix incorrect PELT values on SMT
On Fri, Aug 19, 2016 at 10:30:36AM +0800, Wanpeng Li wrote:
> 2016-08-19 9:55 GMT+08:00 Steve Muckle:
> > PELT scales its util_sum and util_avg values via
> > arch_scale_cpu_capacity(). If that function is passed the CPU's sched
> > domain then it will reduce the scaling capacity if SD_SHARE_CPUCAPACITY
> > is set. PELT does not pass in the sd however. The other caller of
> > arch_scale_cpu_capacity, update_cpu_capacity(), does. This means
> > util_sum and util_avg scale beyond the CPU capacity on SMT.
> >
> > On an Intel i7-3630QM for example rq->cpu_capacity_orig is 589 but
> > util_avg scales up to 1024.
> >
> > Fix this by passing in the sd in __update_load_avg() as well.
>
> I believe we notice this at least several months ago.
> https://lkml.org/lkml/2016/5/25/228

Glad to see I'm not alone in thinking this is an issue. It causes an
issue with schedutil, effectively doubling the apparent demand on SMT.
I don't know the load balance code well enough offhand to say whether
it's an issue there.

cheers,
Steve
[x86/mm] e1a58320a3: WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page()
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit e1a58320a38dfa72be48a0f1a3a92273663ba6db
Author:     Stephen Smalley
AuthorDate: Mon Oct 5 12:55:20 2015 -0400
Commit:     Ingo Molnar
CommitDate: Tue Oct 6 11:11:48 2015 +0200

    x86/mm: Warn on W^X mappings

    Warn on any residual W+X mappings after setting NX
    if DEBUG_WX is enabled. Introduce a separate
    X86_PTDUMP_CORE config that enables the code for
    dumping the page tables without enabling the debugfs
    interface, so that DEBUG_WX can be enabled without
    exposing the debugfs interface. Switch EFI_PGT_DUMP
    to using X86_PTDUMP_CORE so that it also does not require
    enabling the debugfs interface.

    On success it prints this to the kernel log:

      x86/mm: Checked W+X mappings: passed, no W+X pages found.

    On failure it prints a warning and a count of the failed pages:

      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
      x86/mm: Found insecure W+X mapping at address 81755000/__stop___ex_table+0xfa8/0xabfa8
      [...]
      Call Trace:
       [] dump_stack+0x44/0x55
       [] warn_slowpath_common+0x82/0xc0
       [] warn_slowpath_fmt+0x5c/0x80
       [] ? note_page+0x5c9/0x7b0
       [] note_page+0x610/0x7b0
       [] ptdump_walk_pgd_level_core+0x259/0x3c0
       [] ptdump_walk_pgd_level_checkwx+0x17/0x20
       [] mark_rodata_ro+0xf5/0x100
       [] ? rest_init+0x80/0x80
       [] kernel_init+0x1d/0xe0
       [] ret_from_fork+0x3f/0x70
       [] ? rest_init+0x80/0x80
      ---[ end trace a1f23a1e42a2ac76 ]---
      x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

    Signed-off-by: Stephen Smalley
    Acked-by: Kees Cook
    Cc: Andy Lutomirski
    Cc: Arjan van de Ven
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-...@tycho.nsa.gov
    [ Improved the Kconfig help text and made the new option default-y
      if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
      so we really want people to have this on by default. ]
    Signed-off-by: Ingo Molnar

+-------------------------------------------------------+------------+------------+------+
|                                                       | 38a413cbc2 | e1a58320a3 | v4.4 |
+-------------------------------------------------------+------------+------------+------+
| boot_successes                                        | 63         | 0          | 0    |
| boot_failures                                         | 0          | 22         | 45   |
| WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page() | 0          | 22         | 45   |
| calltrace:mark_rodata_ro                              | 0          | 22         | 45   |
+-------------------------------------------------------+------------+------------+------+

[ 50.648376] debug: unmapping init [mem 0x8800139e9000-0x8800139f]
[ 50.652158] debug: unmapping init [mem 0x880013d38000-0x880013df]
[ 50.654923] ------------[ cut here ]------------
[ 50.655544] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x334/0x340()
[ 50.664908] x86/mm: Found insecure W+X mapping at address c00f6000/0xc00f6000
[ 50.665893] Modules linked in:
[ 50.666282] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc3-00013-ge1a5832 #1
[ 50.667144] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 50.680247] 00e1 8819fce8 93698935 88198000
[ 50.681279] 8819fd38 8819fd28 93495f2d
[ 50.682318] 8819fe88 0004
[ 50.683342] Call Trace:
[ 50.683668] [] dump_stack+0x4c/0x67
[ 50.690347] [] warn_slowpath_common+0x8d/0xd0
[ 50.691179] [] warn_slowpath_fmt+0x41/0x50
[ 50.696101] [] note_page+0x334/0x340
[ 50.696723] [] walk_pmd_level+0x13a/0x1c0
[ 50.697382] [] walk_pud_level+0xfe/0x110
[ 50.698034] [] ptdump_walk_pgd_level_core+0xb1/0x130
[ 50.698788] [] ptdump_walk_pgd_level_checkwx+0x12/0x20
[ 50.699680] [] mark_rodata_ro+0xec/0x100
[ 50.708648] [] ? rest_init+0x150/0x150
[ 50.709400] [] kernel_init+0x18/0xe0
[ 50.712290] [] ret_from_fork+0x3f/0x70
[ 50.712991] [] ? rest_init+0x150/0x150
[ 50.713686] ---[ end trace 77c60916b05835a9 ]---
[ 50.714324] x86/mm: Checked W+X mappings: FAILED, 2 W+X pages found.

git bisect start v4.4 v4.3 --
git bisect bad cd6caf550a2adc763c6301ecc0be01f422fb2aea # 10:51 0- 17 Merge tag