date:20170209

Re: [PATCH] proc/sysctl: prune stale dentries during unregistering

2017-02-09 Thread Konstantin Khlebnikov


On 10.02.2017 10:47, Al Viro wrote:

On Fri, Feb 10, 2017 at 10:35:02AM +0300, Konstantin Khlebnikov wrote:


# time sysctl -a > /dev/null
real1m12.806s
user0m0.016s
sys 1m12.400s

Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.

This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.


I'd probably go for hlist, but that's mostly cosmetic difference; how about
the matching stats *after* that patch?



dcache size doesn't grow endlessly, so stats are fine

# sysctl fs.dentry-state
fs.dentry-state = 92712 58376   45  0   0   0

# time sysctl -a &>/dev/null

real0m0.013s
user0m0.004s
sys 0m0.008s

linux-next: Tree for Feb 10

2017-02-09 Thread Stephen Rothwell

Hi all,

Changes since 20170209:

The l2mtd-tree still had ist build failure so I used the version from
next-20170208.

The kvm tree gained conflicts against the powerpc tree.

The akpm-current tree gained conflicts against the xfs tree.

The akpm tree gained a conflict against the net tree.

Non-merge commits (relative to Linus' tree): 8369
 9421 files changed, 356691 insertions(+), 174353 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (55aac6ef53e1 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending)
Merging fixes/master (30066ce675d3 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (c7858bf16c0b asm-prototypes: Clear any CPP 
defines before declaring the functions)
Merging arc-current/for-curr (8ba605b607b7 ARC: [plat-*] ARC_HAS_COH_CACHES no 
longer relevant)
Merging arm-current/fixes (228dbbfb5d77 ARM: 8643/3: arm/ptrace: Preserve 
previous registers for short regset write)
Merging m68k-current/for-linus (ad595b77c4a8 m68k/atari: Use seq_puts() in 
atari_get_hardware_list())
Merging metag-fixes/fixes (35d04077ad96 metag: Only define 
atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (f83e6862047e powerpc/powernv: Properly set 
"host-ipi" on IPIs)
Merging sparc/master (f9a42e0d58cf Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (bb1a619735b4 net: phy: Initialize mdio clock at probe 
function)
Merging ipsec/master (4e5da369df64 Documentation/networking: fix typo in 
mpls-sysctl)
Merging netfilter/master (f95d7a46bc57 netfilter: ctnetlink: Fix regression in 
CTA_HELP processing)
Merging ipvs/master (045169816b31 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging wireless-drivers/master (52f5631a4c05 rtlwifi: rtl8192ce: Fix loading 
of incorrect firmware)
Merging mac80211/master (fd551bac4795 nl80211: Fix mesh HT operation check)
Merging sound-current/for-linus (af677166cf63 ALSA: hda - adding a new NV 
HDMI/DP codec ID in the driver)
Merging pci-current/for-linus (d98e0929071e Revert "PCI: pciehp: Add runtime PM 
support for PCIe hotplug ports")
Merging driver-core.current/driver-core-linus (49def1853334 Linux 4.10-rc4)
Merging tty.current/tty-linus (49def1853334 Linux 4.10-rc4)
Merging usb.current/usb-linus (d5adbfcd5f7b Linux 4.10-rc7)
Merging usb-gadget-fixes/fixes (efe357f4633a usb: dwc2: host: fix 
Wmaybe-uninitialized warning)
Merging usb-serial-fixes/usb-linus (d07830db1bdb USB: serial: pl2303: add ATEN 
device ID)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move 
the lock initialization to core file)
Merging phy/fixes (7ce7d89f4883 Linux 4.10-rc1)
Merging staging.current/staging-linus (d5adbfcd5f7b Linux 4.10-rc7)
Merging char-misc.current/char-misc-linus (d5adbfcd5f7b Linux 4.10-rc7)
Merging input-current/for-linus (413d37326700 Input: synaptics-rmi4 - select 
'SERIO' when needed)
Merging crypto-current/master (7c2cf1c4615c crypto: chcr - Fix key length for 
RFC4106)
Merging ide/master (

Re: [PATCH 3/3 staging-next] mm: Remove RCU and tasklocks from lmk

2017-02-09 Thread Michal Hocko

On Fri 10-02-17 08:39:11, peter enderborg wrote:
> On 02/09/2017 09:05 PM, Michal Hocko wrote:
> > On Thu 09-02-17 14:21:52, peter enderborg wrote:
> >> Fundamental changes:
> >> 1 Does NOT take any RCU lock in shrinker functions.
> >> 2 It returns same result for scan and counts, so  we dont need to do
> >>   shinker will know when it is pointless to call scan.
> >> 3 It does not lock any other process than the one that is
> >>   going to be killed.
> >>
> >> Background.
> >> The low memory killer scans for process that can be killed to free
> >> memory. This can be cpu consuming when there is a high demand for
> >> memory. This can be seen by analysing the kswapd0 task work.
> >> The stats function added in earler patch adds a counter for waste work.
> >>
> >> How it works.
> >> This patch create a structure within the lowmemory killer that caches
> >> the user spaces processes that it might kill. It is done with a
> >> sorted rbtree so we can very easy find the candidate to be killed,
> >> and knows its properies as memory usage and sorted by oom_score_adj
> >> to look up the task with highest oom_score_adj. To be able to achive
> >> this it uses oom_score_notify events.
> >>
> >> This patch also as a other effect, we are now free to do other
> >> lowmemorykiller configurations.  Without the patch there is a need
> >> for a tradeoff between freed memory and task and rcu locks. This
> >> is no longer a concern for tuning lmk. This patch is not intended
> >> to do any calculation changes other than we do use the cache for
> >> calculate the count values and that makes kswapd0 to shrink other
> >> areas.
> > I have to admit I really do not understand big part of the above
> > paragraph as well as how this all is supposed to work. A quick glance
> > over the implementation. __lmk_task_insert seems to be only called from
> > the oom_score notifier context. If nobody updates the value then no task
> > will get into the tree. Or am I missing something really obvious here?
> > Moreover oom scores tend to be mostly same for tasks. That means that
> > your sorted tree will become sorted by pids in most cases. I do not see
> > any sorting based on the rss nor any updates that would reflect updates
> > of rss. How can this possibly work?
> 
> The task tree nodes are created,updated or removed from the notifier when
> there is a relevant oom_score_adj change. If no one create a task that
> is in the range for the lowmemorykiller the tree will be empty. This is
> an android feature so the score will be updated very often. It is
> part of activity manager to prioritise tasks.  Why should we do sort of
> rss?
 
Because the current lmk selects the tasks based on rss. And the patch
doesn't explain why this is no longer suitable and a different metric
shoult be used. If you also consider that the scale of oom_score_adj is
quite small, conllisions when you simply sort based on pids which is
more than questionable. I really fail to see how this can work
reasonably and why the change of the lmk semantic is even acceptable.

-- 
Michal Hocko
SUSE Labs

[tip:perf/core] perf intel-pt: Use __fallthrough

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  7ea6856d6f5629d742edc23b8b76e6263371ef45
Gitweb: http://git.kernel.org/tip/7ea6856d6f5629d742edc23b8b76e6263371ef45
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 9 Feb 2017 15:22:22 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 9 Feb 2017 16:32:03 -0300

perf intel-pt: Use __fallthrough

To address new warnings emmited by gcc 7, e.g.::

CC   /tmp/build/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.o
CC   /tmp/build/perf/tests/parse-events.o
  util/intel-pt-decoder/intel-pt-pkt-decoder.c: In function 'intel_pt_pkt_desc':
  util/intel-pt-decoder/intel-pt-pkt-decoder.c:499:6: error: this statement may 
fall through [-Werror=implicit-fallthrough=]
 if (!(packet->count))
^
  util/intel-pt-decoder/intel-pt-pkt-decoder.c:501:2: note: here
case INTEL_PT_CYC:
^~~~
CC   /tmp/build/perf/util/intel-pt-decoder/intel-pt-decoder.o
  cc1: all warnings being treated as errors

Acked-by: Andi Kleen 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-mf0hw789pu9x855us5l32...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 5 +
 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c 
b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index e4e7dc7..7cf7f7a 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../cache.h"
 #include "../util.h"
@@ -1746,6 +1747,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder 
*decoder)
switch (decoder->packet.type) {
case INTEL_PT_TIP_PGD:
decoder->continuous_period = false;
+   __fallthrough;
case INTEL_PT_TIP_PGE:
case INTEL_PT_TIP:
intel_pt_log("ERROR: Unexpected packet\n");
@@ -1799,6 +1801,8 @@ static int intel_pt_walk_psb(struct intel_pt_decoder 
*decoder)
decoder->pge = false;
decoder->continuous_period = false;
intel_pt_clear_tx_flags(decoder);
+   __fallthrough;
+
case INTEL_PT_TNT:
decoder->have_tma = false;
intel_pt_log("ERROR: Unexpected packet\n");
@@ -1839,6 +1843,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder 
*decoder)
switch (decoder->packet.type) {
case INTEL_PT_TIP_PGD:
decoder->continuous_period = false;
+   __fallthrough;
case INTEL_PT_TIP_PGE:
case INTEL_PT_TIP:
decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c 
b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
index 4f7b320..7528ae4 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "intel-pt-pkt-decoder.h"
 
@@ -498,6 +499,7 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, 
char *buf,
case INTEL_PT_FUP:
if (!(packet->count))
return snprintf(buf, buf_len, "%s no ip", name);
+   __fallthrough;
case INTEL_PT_CYC:
case INTEL_PT_VMCS:
case INTEL_PT_MTC:

Re: [PATCH] sched: Enhance readability of iterating wake_list

2017-02-09 Thread Peter Zijlstra

On Fri, Feb 10, 2017 at 01:09:31PM +0900, Byungchul Park wrote:
> +#define for_each_wake_list(task, node) \
> + for ((task) = llist_entry((node), struct task_struct, wake_entry); \
> +  node; (node) = llist_next(node), \
> +  (task) = llist_entry((node), struct task_struct, wake_entry))
> +

How about you make that llist_for_each(pos, member) or similar and
replace all while (foo) { foo = llist_next(foo); } instances?

Because most llist_next() users have the same pattern.

Re: [PATCH v4 1/2] staging: omap4iss: fix multiline comment style

2017-02-09 Thread Laurent Pinchart

Hi Avraham,

Thank you for the patches.

On Thursday 09 Feb 2017 18:57:55 Avraham Shukron wrote:
> Fixed multi-line comments to their preferred style (First line empty)
> 
> Signed-off-by: Avraham Shukron 

For both of them,

Acked-by: Laurent Pinchart 

I've applied the patches to my tree and will send a pull request for v4.12.

> ---
>  drivers/staging/media/omap4iss/iss_video.c | 38 ++-
>  1 file changed, 25 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/staging/media/omap4iss/iss_video.c
> b/drivers/staging/media/omap4iss/iss_video.c index bb0e3b4..e21811a 100644
> --- a/drivers/staging/media/omap4iss/iss_video.c
> +++ b/drivers/staging/media/omap4iss/iss_video.c
> @@ -128,7 +128,8 @@ static unsigned int iss_video_mbus_to_pix(const struct
> iss_video *video, pix->width = mbus->width;
>   pix->height = mbus->height;
> 
> - /* Skip the last format in the loop so that it will be selected if no
> + /*
> +  * Skip the last format in the loop so that it will be selected if no
>* match is found.
>*/
>   for (i = 0; i < ARRAY_SIZE(formats) - 1; ++i) {
> @@ -138,7 +139,8 @@ static unsigned int iss_video_mbus_to_pix(const struct
> iss_video *video,
> 
>   min_bpl = pix->width * ALIGN(formats[i].bpp, 8) / 8;
> 
> - /* Clamp the requested bytes per line value. If the maximum bytes per
> + /*
> +  * Clamp the requested bytes per line value. If the maximum bytes per
>* line value is zero, the module doesn't support user configurable 
line
>* sizes. Override the requested value with the minimum in that case.
>*/
> @@ -172,7 +174,8 @@ static void iss_video_pix_to_mbus(const struct
> v4l2_pix_format *pix, mbus->width = pix->width;
>   mbus->height = pix->height;
> 
> - /* Skip the last format in the loop so that it will be selected if no
> + /*
> +  * Skip the last format in the loop so that it will be selected if no
>* match is found.
>*/
>   for (i = 0; i < ARRAY_SIZE(formats) - 1; ++i) {
> @@ -360,7 +363,8 @@ static void iss_video_buf_queue(struct vb2_buffer *vb)
> 
>   spin_lock_irqsave(&video->qlock, flags);
> 
> - /* Mark the buffer is faulty and give it back to the queue immediately
> + /*
> +  * Mark the buffer is faulty and give it back to the queue immediately
>* if the video node has registered an error. vb2 will perform the 
same
>* check when preparing the buffer, but that is inherently racy, so we
>* need to handle the race condition with an authoritative check here.
> @@ -443,7 +447,8 @@ struct iss_buffer *omap4iss_video_buffer_next(struct
> iss_video *video)
> 
>   buf->vb.vb2_buf.timestamp = ktime_get_ns();
> 
> - /* Do frame number propagation only if this is the output video node.
> + /*
> +  * Do frame number propagation only if this is the output video node.
>* Frame number either comes from the CSI receivers or it gets
>* incremented here if H3A is not active.
>* Note: There is no guarantee that the output buffer will finish
> @@ -605,7 +610,8 @@ iss_video_set_format(struct file *file, void *fh, struct
> v4l2_format *format)
> 
>   mutex_lock(&video->mutex);
> 
> - /* Fill the bytesperline and sizeimage fields by converting to media 
bus
> + /*
> +  * Fill the bytesperline and sizeimage fields by converting to media 
bus
>* format and back to pixel format.
>*/
>   iss_video_pix_to_mbus(&format->fmt.pix, &fmt);
> @@ -678,8 +684,9 @@ iss_video_get_selection(struct file *file, void *fh,
> struct v4l2_selection *sel) if (subdev == NULL)
>   return -EINVAL;
> 
> - /* Try the get selection operation first and fallback to get format if 
not
> -  * implemented.
> + /*
> +  * Try the get selection operation first and fallback to get format if
> +  * not implemented.
>*/
>   sdsel.pad = pad;
>   ret = v4l2_subdev_call(subdev, pad, get_selection, NULL, &sdsel);
> @@ -867,7 +874,8 @@ iss_video_streamon(struct file *file, void *fh, enum
> v4l2_buf_type type)
> 
>   mutex_lock(&video->stream_lock);
> 
> - /* Start streaming on the pipeline. No link touching an entity in the
> + /*
> +  * Start streaming on the pipeline. No link touching an entity in the
>* pipeline can be activated or deactivated once streaming is started.
>*/
>   pipe = entity->pipe
> @@ -895,7 +903,8 @@ iss_video_streamon(struct file *file, void *fh, enum
> v4l2_buf_type type) while ((entity = media_graph_walk_next(&graph)))
>   media_entity_enum_set(&pipe->ent_enum, entity);
> 
> - /* Verify that the currently configured format matches the output of
> + /*
> +  * Verify that the currently configured format matches the output of
>* the connected subdev.
>*/
>   ret = iss_video_check_format(video, vfh);
> @@ -905,7 +914,8 @@ iss_video_streamon(struct file *file

[tip:perf/core] perf tests: Avoid possible truncation with dirent->d_name + snprintf

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  2e2bbc039fad9eabad6c4c1a473c8b2554cdd2d4
Gitweb: http://git.kernel.org/tip/2e2bbc039fad9eabad6c4c1a473c8b2554cdd2d4
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 9 Feb 2017 14:48:46 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 9 Feb 2017 14:48:46 -0300

perf tests: Avoid possible truncation with dirent->d_name + snprintf

Addressing a few cases spotted by a new warning in gcc 7:

  tests/parse-events.c: In function 'test_pmu_events':
  tests/parse-events.c:1790:39: error: '%s' directive output may be truncated 
writing up to 255 bytes into a region of size 90 [-Werror=format-truncation=]
 snprintf(name, MAX_NAME, "cpu/event=%s/u", ent->d_name);
   ^~
  In file included from /usr/include/stdio.h:939:0,
   from /git/linux/tools/perf/util/map.h:9,
   from /git/linux/tools/perf/util/symbol.h:7,
   from /git/linux/tools/perf/util/evsel.h:10,
   from tests/parse-events.c:3:
  /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 
between 13 and 268 bytes into a destination of size 100
 return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~
  __bos (__s), __fmt, __va_arg_pack ());
  ~
  tests/parse-events.c:1798:29: error: '%s' directive output may be truncated 
writing up to 255 bytes into a region of size 100 [-Werror=format-truncation=]
 snprintf(name, MAX_NAME, "%s:u,cpu/event=%s/u", ent->d_name, ent->d_name);

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Fixes: 945aea220bb8 ("perf tests: Move test objects into 'tests' directory")
Link: http://lkml.kernel.org/n/tip-ty4q2p8zp1dp3mskvubxs...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/parse-events.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 20c2e64..aa9276b 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1779,15 +1779,14 @@ static int test_pmu_events(void)
}
 
while (!ret && (ent = readdir(dir))) {
-#define MAX_NAME 100
struct evlist_test e;
-   char name[MAX_NAME];
+   char name[2 * NAME_MAX + 1 + 12 + 3];
 
/* Names containing . are special and cannot be used directly */
if (strchr(ent->d_name, '.'))
continue;
 
-   snprintf(name, MAX_NAME, "cpu/event=%s/u", ent->d_name);
+   snprintf(name, sizeof(name), "cpu/event=%s/u", ent->d_name);
 
e.name  = name;
e.check = test__checkevent_pmu_events;
@@ -1795,11 +1794,10 @@ static int test_pmu_events(void)
ret = test_event(&e);
if (ret)
break;
-   snprintf(name, MAX_NAME, "%s:u,cpu/event=%s/u", ent->d_name, 
ent->d_name);
+   snprintf(name, sizeof(name), "%s:u,cpu/event=%s/u", 
ent->d_name, ent->d_name);
e.name  = name;
e.check = test__checkevent_pmu_events_mix;
ret = test_event(&e);
-#undef MAX_NAME
}
 
closedir(dir);

[tip:perf/core] perf header: Fix handling of PERF_EVENT_UPDATE__SCALE

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  8434a2ec13d5c8cb25716950bfbf7c9d7b64628a
Gitweb: http://git.kernel.org/tip/8434a2ec13d5c8cb25716950bfbf7c9d7b64628a
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 8 Feb 2017 21:57:22 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 22:06:18 -0300

perf header: Fix handling of PERF_EVENT_UPDATE__SCALE

In commit daeecbc0c431 ("perf tools: Add event_update event scale type"), the
handling of PERF_EVENT_UPDATE__SCALE cast struct event_update_event->data to a
pointer to event_update_event_scale, uses some field from this casted struct
and then ends up falling through to the handling of another event type,
PERF_EVENT_UPDATE__CPUS were it casts that ev->data to yet another type, oops,
fix it by inserting the missing break.

Noticed when building perf using gcc 7 on Fedora Rawhide:

  util/header.c: In function 'perf_event__process_event_update':
  util/header.c:3207:16: error: this statement may fall through 
[-Werror=implicit-fallthrough=]
 evsel->scale = ev_scale->scale;
 ~^
  util/header.c:3208:2: note: here
case PERF_EVENT_UPDATE__CPUS:
^~~~

This wasn't noticed because probably PERF_EVENT_UPDATE__CPUS comes after
PERF_EVENT_UPDATE__SCALE, so we would just create a bogus evsel->own_cpus when
processing a PERF_EVENT_UPDATE__SCALE to then leak it and create a new cpu map
with the correct data.

Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Fixes: daeecbc0c431 ("perf tools: Add event_update event scale type")
Link: http://lkml.kernel.org/n/tip-lukcf9hdj092ax2914ss9...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/header.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c567d9f..3d12c16 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3205,6 +3205,7 @@ int perf_event__process_event_update(struct perf_tool 
*tool __maybe_unused,
case PERF_EVENT_UPDATE__SCALE:
ev_scale = (struct event_update_event_scale *) ev->data;
evsel->scale = ev_scale->scale;
+   break;
case PERF_EVENT_UPDATE__CPUS:
ev_cpus = (struct event_update_event_cpus *) ev->data;

[tip:perf/core] perf bench numa: Avoid possible truncation when using snprintf()

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  3aff8ba0a4c9c9191bb788171a1c54778e1246a2
Gitweb: http://git.kernel.org/tip/3aff8ba0a4c9c9191bb788171a1c54778e1246a2
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 9 Feb 2017 14:39:42 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Thu, 9 Feb 2017 14:39:42 -0300

perf bench numa: Avoid possible truncation when using snprintf()

Addressing this warning from gcc 7:

CC   /tmp/build/perf/bench/numa.o
  bench/numa.c: In function '__bench_numa':
  bench/numa.c:1582:42: error: '%d' directive output may be truncated writing 
between 1 and 10 bytes into a region of size between 8 and 17 
[-Werror=format-truncation=]
   snprintf(tname, 32, "process%d:thread%d", p, t);
^~
  bench/numa.c:1582:25: note: directive argument in the range [0, 2147483647]
   snprintf(tname, 32, "process%d:thread%d", p, t);
   ^~~~
  In file included from /usr/include/stdio.h:939:0,
   from bench/../util/util.h:47,
   from bench/../builtin.h:4,
   from bench/numa.c:11:
  /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 
between 17 and 35 bytes into a destination of size 32
 return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~
  __bos (__s), __fmt, __va_arg_pack ());
  ~
  cc1: all warnings being treated as errors

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Petr Holasek 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-twa37vsfqcie5gwpqwnju...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/bench/numa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 8efe904..9e5a02d 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -1573,13 +1573,13 @@ static int __bench_numa(const char *name)
"GB/sec,", "total-speed",   "GB/sec total speed");
 
if (g->p.show_details >= 2) {
-   char tname[32];
+   char tname[14 + 2 * 10 + 1];
struct thread_data *td;
for (p = 0; p < g->p.nr_proc; p++) {
for (t = 0; t < g->p.nr_threads; t++) {
-   memset(tname, 0, 32);
+   memset(tname, 0, sizeof(tname));
td = g->threads + p*g->p.nr_threads + t;
-   snprintf(tname, 32, "process%d:thread%d", p, t);
+   snprintf(tname, sizeof(tname), 
"process%d:thread%d", p, t);
print_res(tname, td->speed_gbs,
"GB/sec",   "thread-speed", 
"GB/sec/thread speed");
print_res(tname, td->system_time_ns / 
NSEC_PER_SEC,

[tip:perf/core] tools string: Use __fallthrough in perf_atoll()

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  94bdd5edb34e472980d1e18b4600d6fb92bd6b0a
Gitweb: http://git.kernel.org/tip/94bdd5edb34e472980d1e18b4600d6fb92bd6b0a
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 8 Feb 2017 17:01:46 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 17:31:01 -0300

tools string: Use __fallthrough in perf_atoll()

The implicit fall through case label here is intended, so let us inform
that to gcc >= 7:

CC   /tmp/build/perf/util/string.o
  util/string.c: In function 'perf_atoll':
  util/string.c:22:7: error: this statement may fall through 
[-Werror=implicit-fallthrough=]
  if (*p)
 ^
  util/string.c:24:3: note: here
 case '\0':
 ^~~~

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-0ophb30v9apkk6o95el0r...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/string.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/string.c b/tools/perf/util/string.c
index d8dfaf6..bddca51 100644
--- a/tools/perf/util/string.c
+++ b/tools/perf/util/string.c
@@ -21,6 +21,8 @@ s64 perf_atoll(const char *str)
case 'b': case 'B':
if (*p)
goto out_err;
+
+   __fallthrough;
case '\0':
return length;
default:

Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg

Im not speaking for google, but I think there is a work ongoing to
replace this with user-space code. Until then we have to polish
this version as good as we can. It is essential for android as it is now.

On 02/09/2017 09:54 PM, Michal Hocko wrote:
> On Thu 09-02-17 21:07:37, Greg KH wrote:
>> On Thu, Feb 09, 2017 at 08:26:41PM +0100, Michal Hocko wrote:
>>> On Thu 09-02-17 14:21:45, peter enderborg wrote:
 This collects stats for shrinker calls and how much
 waste work we do within the lowmemorykiller.
>>> This doesn't explain why do we need this information and who is going to
>>> use it. Not to mention it exports it in /proc which is considered a
>>> stable user API. This is a no-go, especially for something that is still
>>> lingering in the staging tree without any actuall effort to make it
>>> fully supported MM feature. I am actually strongly inclined to simply
>>> drop lmk from the tree completely.
>> I thought that someone was working to get the "native" mm features to
>> work properly with the lmk "feature"  Do you recall if that work got
>> rejected, or just never happened?
> Never happened AFAIR. There were some attempts to tune the current
> behavior which has been rejected for one reason or another but I am not
> really aware of anybody working on moving the code from staging area.
>
> I already have this in the to-send queue, just didn't get to post it yet
> because I planned to polish the reasoning some more.
> ---
> From 9f871b54a387e0a7cdfaf0fa256d1440093e427c Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 1 Feb 2017 10:37:30 +0100
> Subject: [PATCH] staging, android: remove lowmemory killer from the tree
>
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
>   - it hooks into slab shrinker API which is not suitable for this
> purpose. lowmem_count implementation just shows this nicely.
> There is no scaling based on the memory pressure and no
> feedback to the generic shrinker infrastructure.
>   - it is not reclaim context aware - no NUMA and/or memcg
> awareness.
>
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well.
>
> Signed-off-by: Michal Hocko 
> ---
>  drivers/staging/android/Kconfig   |  10 --
>  drivers/staging/android/Makefile  |   1 -
>  drivers/staging/android/lowmemorykiller.c | 212 
> --
>  include/linux/sched.h |   4 -
>  4 files changed, 227 deletions(-)
>  delete mode 100644 drivers/staging/android/lowmemorykiller.c
>
> diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
> index 6c00d6f765c6..71a50b99caff 100644
> --- a/drivers/staging/android/Kconfig
> +++ b/drivers/staging/android/Kconfig
> @@ -14,16 +14,6 @@ config ASHMEM
> It is, in theory, a good memory allocator for low-memory devices,
> because it can discard shared memory units when under memory pressure.
>  
> -config ANDROID_LOW_MEMORY_KILLER
> - bool "Android Low Memory Killer"
> - ---help---
> -   Registers processes to be killed when low memory conditions, this is 
> useful
> -   as there is no particular swap space on android.
> -
> -   The registered process will kill according to the priorities in 
> android init
> -   scripts (/init.rc), and it defines priority values with minimum free 
> memory size
> -   for each priority.
> -
>  source "drivers/staging/android/ion/Kconfig"
>  
>  endif # if ANDROID
> diff --git a/drivers/staging/android/Makefile 
> b/drivers/staging/android/Makefile
> index 7ed1be798909..7cf1564a49a5 100644
> --- a/drivers/staging/android/Makefile
> +++ b/drivers/staging/android/Makefile
> @@ -3,4 +3,3 @@ ccflags-y += -I$(src) # needed for trace 
> events
>  obj-y+= ion/
>  
>  obj-$(CONFIG_ASHMEM) += ashmem.o
> -obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)  += lowmemorykiller.o
> diff --git a/drivers/staging/android/lowmemorykiller.c 
> b/drivers/staging/android/lowmemorykiller.c
> deleted file mode 100644
> index ec3b66561412..
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ /dev/null
> @@ -1,212 +0,0 @@
> -/* drivers/misc/lowmemorykiller.c
> - *
> - * The lowmemorykiller driver lets user-space specify a set of memory 
> thresholds
> - * where processes with a range of oom_score_adj values will get killed. 
> Specify
> - * the minimum oom_score_adj values in
> - * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
> - * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
> - * separated list of numbers in ascending order.
> - *
> - * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> - * "1

[tip:perf/core] tools include: Add a __fallthrough statement

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  b5bf1733d6a391c4e90ea8f8468d83023be74a2a
Gitweb: http://git.kernel.org/tip/b5bf1733d6a391c4e90ea8f8468d83023be74a2a
Author: Arnaldo Carvalho de Melo 
AuthorDate: Wed, 8 Feb 2017 17:01:46 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 17:30:58 -0300

tools include: Add a __fallthrough statement

For cases where implicit fall through case labels are intended,
to let us inform that to gcc >= 7:

CC   /tmp/build/perf/util/string.o
  util/string.c: In function 'perf_atoll':
  util/string.c:22:7: error: this statement may fall through 
[-Werror=implicit-fallthrough=]
  if (*p)
 ^
  util/string.c:24:3: note: here
 case '\0':
 ^~~~

So we introduce:

  #define __fallthrough __attribute__ ((fallthrough))

And use it in such cases.

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Cc: William Cohen 
Link: http://lkml.kernel.org/n/tip-qnpig0xfop4hwv6k4mv1w...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/include/linux/compiler.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index e33fc1d..d94179f 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -126,4 +126,13 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
 #define WRITE_ONCE(x, val) \
({ union { typeof(x) __val; char __c[1]; } __u = { .__val = (val) }; 
__write_once_size(&(x), __u.__c, sizeof(x)); __u.__val; })
 
+
+#ifndef __fallthrough
+# if defined(__GNUC__) && __GNUC__ >= 7
+#  define __fallthrough __attribute__ ((fallthrough))
+# else
+#  define __fallthrough
+# endif
+#endif
+
 #endif /* _TOOLS_LINUX_COMPILER_H */

Re: [RFC][PATCH] tracing: Have traceprobe_probes_write() not access userspace unnecessarily

2017-02-09 Thread Ingo Molnar


* Masami Hiramatsu  wrote:

> On Thu, 9 Feb 2017 18:04:58 -0500
> Steven Rostedt  wrote:
> 
> > 
> > The code in traceprobe_probes_write() reads up to 4096 bytes from userpace
> > for each line. If userspace passes in several lines to execute, the code
> > will do a large read for each line, even though, it is highly likely that
> > the first read from userspace received all of the lines at one.
> > 
> > I changed the logic to do a single read from userspace, and to only read
> > from userspace again if not all of the read from userspace made it in.
> > 
> > I tested this by adding printk()s and writing files that would test -1, ==,
> > and +1 the buffer size, to make sure that there's no overflows and that if a
> > single line is written with +1 the buffer size, that it fails properly.
> > 
> 
> Thanks Steve!
> 
> Acked-by: Masami Hiramatsu 
> 
> BTW, this can conflict with my previous patch.
> 
> https://lkml.org/lkml/2017/2/6/1048
> https://lkml.org/lkml/2017/2/7/203
> 
> I'll update this. Ingo, Can I send these patch to Steve?

Sure, I've not applied your patch yet - mind sending it to Steve on top of 
Steve's 
patch?

Thanks,

Ingo

[tip:perf/core] perf tools: Fix include of linux/mman.h

2017-02-09 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  2f7db55579943cb7723e7567bd9b9927d3777d29
Gitweb: http://git.kernel.org/tip/2f7db55579943cb7723e7567bd9b9927d3777d29
Author: Arnaldo Carvalho de Melo 
AuthorDate: Tue, 7 Feb 2017 16:15:21 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 16:06:21 -0300

perf tools: Fix include of linux/mman.h

It was using uapi/linux/mmap.h which caused for at least one reporter,
that hasn't specified in what environment the problem manifests itself:

 
The original error is:

In file included from util/event.c:2:0:
...tools/include/uapi/linux/mman.h:4:27: fatal error: uapi/asm/mman.h:
No such file or directory
 #include 
   ^
compilation terminated.
 

Test built it on these containers:

  # dm
   1 alpine:3.4: Ok
   2 android-ndk:r12b-arm: Ok
   3 archlinux:latest: Ok
   4 centos:5: Ok
   5 centos:6: Ok
   6 centos:7: Ok
   7 debian:7: Ok
   8 debian:8: Ok
   9 debian:experimental: Ok
  10 debian:experimental-x-arm64: Ok
  11 debian:experimental-x-mips: Ok
  12 debian:experimental-x-mips64: Ok
  13 debian:experimental-x-mipsel: Ok
  14 fedora:20: Ok
  15 fedora:21: Ok
  16 fedora:22: Ok
  17 fedora:23: Ok
  18 fedora:24: Ok
  19 fedora:24-x-ARC-uClibc: Ok
  20 fedora:25: Ok
  21 fedora:rawhide: Ok
  22 mageia:5: Ok
  23 opensuse:13.2: Ok
  24 opensuse:42.1: Ok
  25 opensuse:tumbleweed: Ok
  26 ubuntu:12.04.5: Ok
  27 ubuntu:14.04.4-x-linaro-arm64: Ok
  28 ubuntu:15.10: Ok
  29 ubuntu:16.04: Ok
  30 ubuntu:16.04-x-arm: Ok
  31 ubuntu:16.04-x-arm64: Ok
  32 ubuntu:16.04-x-powerpc: Ok
  33 ubuntu:16.04-x-powerpc64: Ok
  34 ubuntu:16.04-x-powerpc64el: Ok
  35 ubuntu:16.04-x-s390: Ok
  36 ubuntu:16.10: Ok

Reported-by: David Carrillo-Cisneros 
Cc: Alexander Shishkin 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Michal Marek 
Cc: Paul Turner 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Uwe Kleine-König 
Cc: Wang Nan 
Fixes: fbef103fad50 ("perf tools: Do hugetlb handling in more systems")
Link: http://lkml.kernel.org/n/tip-4wm5xmjz5wgbq7ucyz4dy...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 8ab0d7d..4ea7ce7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1,5 +1,5 @@
 #include 
-#include  /* To get things like MAP_HUGETLB even on older 
libc headers */
+#include  /* To get things like MAP_HUGETLB even on older libc 
headers */
 #include 
 #include "event.h"
 #include "debug.h"

Re: [PATCH] media: usb: uvc: add a quirk for Generalplus Technology Inc. 808 Camera

2017-02-09 Thread Laurent Pinchart

Hi Neil,

Thank you for the patch.

On Thursday 09 Feb 2017 14:26:46 Neil Armstrong wrote:
> As reported on [1], this device needs this quirk to be able to
> reliably initialise the webcam.
> 
> [1] https://sourceforge.net/p/linux-uvc/mailman/message/33791098/
> 
> Signed-off-by: Neil Armstrong 

Reviewed-by: Laurent Pinchart 

and applied to my tree. I'll send a pull request for v4.12.

> ---
>  drivers/media/usb/uvc/uvc_driver.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/media/usb/uvc/uvc_driver.c
> b/drivers/media/usb/uvc/uvc_driver.c index 04bf350..6b2d761 100644
> --- a/drivers/media/usb/uvc/uvc_driver.c
> +++ b/drivers/media/usb/uvc/uvc_driver.c
> @@ -2671,6 +2671,15 @@ static int uvc_clock_param_set(const char *val,
> struct kernel_param *kp) .bInterfaceSubClass  = 1,
> .bInterfaceProtocol   = 0,
> .driver_info  = UVC_QUIRK_PROBE_MINMAX },
> + /* Generalplus Technology Inc. 808 Camera */
> + { .match_flags  = USB_DEVICE_ID_MATCH_DEVICE
> + | USB_DEVICE_ID_MATCH_INT_INFO,
> +   .idVendor = 0x1b3f,
> +   .idProduct= 0x2002,
> +   .bInterfaceClass  = USB_CLASS_VIDEO,
> +   .bInterfaceSubClass   = 1,
> +   .bInterfaceProtocol   = 0,
> +   .driver_info  = UVC_QUIRK_PROBE_MINMAX },
>   /* SiGma Micro USB Web Camera */
>   { .match_flags  = USB_DEVICE_ID_MATCH_DEVICE
> 
>   | USB_DEVICE_ID_MATCH_INT_INFO,

-- 
Regards,

Laurent Pinchart

[PATCH 3/4] phy: rockchip-typec: support DP phy switch

2017-02-09 Thread Chris Zhong

There are 2 Type-c PHYs in RK3399, but only one DP controller. Hence
only one PHY can connect to DP controller at one time, the other should
be disconnected. The GRF_SOC_CON26 register has a switch bit to do it,
set this bit means enable PHY 1, clear this bit means enable PHY 0.

Signed-off-by: Chris Zhong 
---

 drivers/phy/phy-rockchip-typec.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/phy/phy-rockchip-typec.c b/drivers/phy/phy-rockchip-typec.c
index 7cfb0f8..1604aaa 100644
--- a/drivers/phy/phy-rockchip-typec.c
+++ b/drivers/phy/phy-rockchip-typec.c
@@ -267,6 +267,7 @@ struct rockchip_usb3phy_port_cfg {
struct usb3phy_reg usb3tousb2_en;
struct usb3phy_reg external_psm;
struct usb3phy_reg pipe_status;
+   struct usb3phy_reg uphy_dp_sel;
 };
 
 struct rockchip_typec_phy {
@@ -736,6 +737,7 @@ static const struct phy_ops rockchip_usb3_phy_ops = {
 static int rockchip_dp_phy_power_on(struct phy *phy)
 {
struct rockchip_typec_phy *tcphy = phy_get_drvdata(phy);
+   struct rockchip_usb3phy_port_cfg *cfg = &tcphy->port_cfgs;
int new_mode, ret = 0;
u32 val;
 
@@ -766,6 +768,8 @@ static int rockchip_dp_phy_power_on(struct phy *phy)
tcphy_phy_init(tcphy, new_mode);
}
 
+   property_enable(tcphy, &cfg->uphy_dp_sel, 1);
+
ret = readx_poll_timeout(readl, tcphy->base + DP_MODE_CTL,
 val, val & DP_MODE_A2, 1000,
 PHY_MODE_SET_TIMEOUT);
@@ -869,6 +873,11 @@ static int tcphy_parse_dt(struct rockchip_typec_phy *tcphy,
if (ret)
return ret;
 
+   ret = tcphy_get_param(dev, &cfg->uphy_dp_sel,
+ "rockchip,uphy-dp-sel");
+   if (ret)
+   return ret;
+
tcphy->grf_regs = syscon_regmap_lookup_by_phandle(dev->of_node,
  "rockchip,grf");
if (IS_ERR(tcphy->grf_regs)) {
-- 
2.6.3

[PATCH 4/4] drm/rockchip: cdn-dp: remove the DP phy switch

2017-02-09 Thread Chris Zhong

There are 2 Type-c PHYs in RK3399, but only one DP controller. Hence
only one PHY can connect to DP controller at one time, the other should
be disconnected. The GRF_SOC_CON26 register has a switch bit to do it,
set this bit means enable PHY 1, clear this bit means enable PHY 0.

If the board has 2 Type-C ports, the DP driver get the phy id from
devm_of_phy_get_by_index, and then control this switch according to
this id. But some others board only has one Type-C port, it may be PHY 0
or PHY 1. The dts node id can not tell us the correct PHY id. Hence move
this switch to PHY driver, the PHY driver can distinguish between PHY 0
and PHY 1, and then write the correct register bit.

Signed-off-by: Chris Zhong 
---

 drivers/gpu/drm/rockchip/cdn-dp-core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c 
b/drivers/gpu/drm/rockchip/cdn-dp-core.c
index 9ab67a6..d3f6e6b 100644
--- a/drivers/gpu/drm/rockchip/cdn-dp-core.c
+++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c
@@ -43,8 +43,6 @@
 #define GRF_SOC_CON9   0x6224
 #define DP_SEL_VOP_LIT BIT(12)
 #define GRF_SOC_CON26  0x6268
-#define UPHY_SEL_BIT   3
-#define UPHY_SEL_MASK  BIT(19)
 #define DPTX_HPD_SEL   (3 << 12)
 #define DPTX_HPD_DEL   (2 << 12)
 #define DPTX_HPD_SEL_MASK  (3 << 28)
@@ -403,11 +401,6 @@ static int cdn_dp_enable_phy(struct cdn_dp_device *dp, 
struct cdn_dp_port *port)
union extcon_property_value property;
int ret;
 
-   ret = cdn_dp_grf_write(dp, GRF_SOC_CON26,
-  (port->id << UPHY_SEL_BIT) | UPHY_SEL_MASK);
-   if (ret)
-   return ret;
-
if (!port->phy_enabled) {
ret = phy_power_on(port->phy);
if (ret) {
-- 
2.6.3

[PATCH 0/4] Move DP phy switch to PHY driver

2017-02-09 Thread Chris Zhong


There are 2 Type-c PHYs in RK3399, but only one DP controller. Hence
only one PHY can connect to DP controller at one time, the other should
be disconnected. The GRF_SOC_CON26 register has a switch bit to do it,
set this bit means enable PHY 1, clear this bit means enable PHY 0.

If the board has 2 Type-C ports, the DP driver get the phy id from
devm_of_phy_get_by_index, and then control this switch according to
this id. But some others board only has one Type-C port, it may be PHY 0
or PHY 1. The dts node id can not tell us the correct PHY id. Hence move
this switch to PHY driver, the PHY driver can distinguish between PHY 0
and PHY 1, and then write the correct register bit.



Chris Zhong (4):
  Documentation: bindings: add uphy-dp-sel for Rockchip USB Type-C PHY
  arm64: dts: rockchip: add rockchip,uphy-dp-sel for Type-C phy
  phy: rockchip-typec: support DP phy switch
  drm/rockchip: cdn-dp: remove the DP phy switch

 Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt | 5 +
 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 ++
 drivers/gpu/drm/rockchip/cdn-dp-core.c   | 7 ---
 drivers/phy/phy-rockchip-typec.c | 9 +
 4 files changed, 16 insertions(+), 7 deletions(-)

-- 
2.6.3

Re: [PATCH 1/3 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread Greg Kroah-Hartman

On Fri, Feb 10, 2017 at 08:21:32AM +0100, peter enderborg wrote:
> Im not speaking for google, but I think there is a work ongoing to
> replace this with user-space code.

Really?  I have not heard this at all, any pointers to whom in Google is
doing it?

> Until then we have to polish this version as good as we can. It is
> essential for android as it is now.

But if no one is willing to do the work to fix the reported issues, why
should it remain?  Can you do the work here?  You're already working on
fixing some of the issues in a differnt way, why not do the "real work"
here instead for everyone to benifit from?

thanks,

greg k-h

[PATCH 1/4] Documentation: bindings: add uphy-dp-sel for Rockchip USB Type-C PHY

2017-02-09 Thread Chris Zhong

rockchip,uphy-dp-sel is the register of type-c phy enable DP function.

Signed-off-by: Chris Zhong 
---

 Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt 
b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
index 6ea867e..c3be83b 100644
--- a/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
+++ b/Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt
@@ -33,6 +33,9 @@ offset, enable bit, write mask bit.
  - rockchip,pipe-status : the register of type-c phy pipe status.
for type-c phy0, it must be <0xe5c0 0 0>;
for type-c phy1, it must be <0xe5c0 16 16>;
+ - rockchip,uphy-dp-sel : the register of type-c phy enable DP function
+   for type-c phy0, it must be <0x6268 19 19>;
+   for type-c phy1, it must be <0x6268 3 19>;
 
 Required nodes : a sub-node is required for each port the phy provides.
 The sub-node name is used to identify dp or usb3 port,
@@ -62,6 +65,7 @@ Example:
rockchip,usb3tousb2-en = <0xe580 3 19>;
rockchip,external-psm = <0xe588 14 30>;
rockchip,pipe-status = <0xe5c0 0 0>;
+   rockchip,uphy-dp-sel = <0x6268 19 19>;
 
tcphy0_dp: dp-port {
#phy-cells = <0>;
@@ -90,6 +94,7 @@ Example:
rockchip,usb3tousb2-en = <0xe58c 3 19>;
rockchip,external-psm = <0xe594 14 30>;
rockchip,pipe-status = <0xe5c0 16 16>;
+   rockchip,uphy-dp-sel = <0x6268 3 19>;
 
tcphy1_dp: dp-port {
#phy-cells = <0>;
-- 
2.6.3

Re: [PATCH 1/3] kprobes: introduce weak variant of kprobe_exceptions_notify

2017-02-09 Thread Ingo Molnar


* Michael Ellerman  wrote:

> "Naveen N. Rao"  writes:
> 
> > kprobe_exceptions_notify() is not used on some of the architectures such
> > as arm[64] and powerpc anymore. Introduce a weak variant for such
> > architectures.
> 
> I'll merge patch 1 & 3 via the powerpc tree for v4.11.

Acked-by: Ingo Molnar 

Thanks,

Ingo

[tip:perf/core] perf vendor events intel: Add uncore events for Sandy Bridge Server

2017-02-09 Thread tip-bot for Andi Kleen

Commit-ID:  dd32cb5d8fd42316bf8c2b9f7e5c51a38625f755
Gitweb: http://git.kernel.org/tip/dd32cb5d8fd42316bf8c2b9f7e5c51a38625f755
Author: Andi Kleen 
AuthorDate: Sat, 17 Sep 2016 18:10:03 -0700
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 16:37:35 -0300

perf vendor events intel: Add uncore events for Sandy Bridge Server

This is not a full uncore event list, but a short list of useful
and understandable metrics.

Signed-off-by: Andi Kleen 
Cc: Jiri Olsa 
Link: http://lkml.kernel.org/n/tip-c0cix4eprbldfrx5zf60s...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 .../pmu-events/arch/x86/jaketown/uncore-cache.json | 209 +
 .../arch/x86/jaketown/uncore-interconnect.json |  46 
 .../arch/x86/jaketown/uncore-memory.json   |  79 +++
 .../pmu-events/arch/x86/jaketown/uncore-power.json | 248 +
 4 files changed, 582 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json 
b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
new file mode 100644
index 000..2f23cf0
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
@@ -0,0 +1,209 @@
+[
+{
+"BriefDescription": "Uncore cache clock ticks. Derived from 
unc_c_clockticks",
+"Counter": "0,1,2,3",
+"EventName": "UNC_C_CLOCKTICKS",
+"PerPkg": "1",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "All LLC Misses (code+ data rd + data wr - 
including demand and prefetch). Derived from unc_c_llc_lookup.any",
+"Counter": "0,1",
+"EventCode": "0x34",
+"EventName": "UNC_C_LLC_LOOKUP.ANY",
+"Filter": "filter_state=0x1",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x11",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "M line evictions from LLC (writebacks to memory). 
Derived from unc_c_llc_victims.m_state",
+"Counter": "0,1",
+"EventCode": "0x37",
+"EventName": "UNC_C_LLC_VICTIMS.M_STATE",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x1",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "LLC misses - demand and prefetch data reads - 
excludes LLC prefetches. Derived from unc_c_tor_inserts.miss_opcode.demand",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_MISSES.DATA_READ",
+"Filter": "filter_opc=0x182",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x3",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "LLC misses - Uncacheable reads. Derived from 
unc_c_tor_inserts.miss_opcode.uncacheable",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_MISSES.UNCACHEABLE",
+"Filter": "filter_opc=0x187",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x3",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "PCIe allocating writes that miss LLC - DDIO 
misses. Derived from unc_c_tor_inserts.miss_opcode.ddio_miss",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_MISSES.PCIE_WRITE",
+"Filter": "filter_opc=0x19c",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x3",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "LLC misses for ItoM writes (as part of fast 
string memcpy stores). Derived from unc_c_tor_inserts.miss_opcode.itom_write",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_MISSES.ITOM_WRITE",
+"Filter": "filter_opc=0x1c8",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x3",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "Streaming stores (full cache line). Derived from 
unc_c_tor_inserts.opcode.streaming_full",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_REFERENCES.STREAMING_FULL",
+"Filter": "filter_opc=0x18c",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x1",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "Streaming stores (partial cache line). Derived 
from unc_c_tor_inserts.opcode.streaming_partial",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_REFERENCES.STREAMING_PARTIAL",
+"Filter": "filter_opc=0x18d",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x1",
+"Unit": "CBO"
+},
+{
+"BriefDescription": "Partial PCIe reads. Derived from 
unc_c_tor_inserts.opcode.pcie_partial",
+"Counter": "0,1",
+"EventCode": "0x35",
+"EventName": "LLC_REFERENCES.PCIE_PARTIAL_READ",
+"Filter": "filter_opc=0x195",
+"PerPkg": "1",
+"ScaleUnit": "64Bytes",
+"UMask": "0x1",
+"Unit": "CBO"
+},
+{
+"BriefDesc

[tip:perf/core] perf tools: Use zfree() instead of ad hoc equivalent

2017-02-09 Thread tip-bot for Taeung Song

Commit-ID:  506fde11a35f39e1b44478339c41e94dfd278aa2
Gitweb: http://git.kernel.org/tip/506fde11a35f39e1b44478339c41e94dfd278aa2
Author: Taeung Song 
AuthorDate: Wed, 1 Feb 2017 21:34:06 +0900
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 09:41:11 -0300

perf tools: Use zfree() instead of ad hoc equivalent

We have zfree(&ptr) for this very common pattern:

   free(ptr);
   ptr = NULL;

So use it in a few more places.

Signed-off-by: Taeung Song 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Link: 
http://lkml.kernel.org/r/1485952447-7013-4-git-send-email-treeze.tae...@gmail.com
[ rewrote commit log ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/parse-events.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1f1f77d..ab1ba22 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -254,8 +254,7 @@ struct tracepoint_path *tracepoint_name_to_path(const char 
*name)
if (path->system == NULL || path->name == NULL) {
zfree(&path->system);
zfree(&path->name);
-   free(path);
-   path = NULL;
+   zfree(&path);
}
 
return path;
@@ -1482,8 +1481,7 @@ static void perf_pmu__parse_cleanup(void)
p = perf_pmu_events_list + i;
free(p->symbol);
}
-   free(perf_pmu_events_list);
-   perf_pmu_events_list = NULL;
+   zfree(&perf_pmu_events_list);
perf_pmu_events_list_num = 0;
}
 }

Re: [PATCH] proc/sysctl: prune stale dentries during unregistering

2017-02-09 Thread Al Viro

On Fri, Feb 10, 2017 at 10:35:02AM +0300, Konstantin Khlebnikov wrote:

> # time sysctl -a > /dev/null
> real1m12.806s
> user0m0.016s
> sys 1m12.400s
> 
> Currently only memory reclaimer could remove this garbage.
> But without significant memory pressure this never happens.
> 
> This patch collects sysctl inodes into list on sysctl table header and
> prunes all their dentries once that table unregisters.

I'd probably go for hlist, but that's mostly cosmetic difference; how about
the matching stats *after* that patch?

RE: [PATCH v4 3/3] USB3/DWC3: Enable undefined length INCR burst type

2017-02-09 Thread Jerry Huang

> -Original Message-
> From: Changming Huang [mailto:jerry.hu...@nxp.com]
> Sent: Wednesday, January 18, 2017 4:12 PM
> To: ba...@kernel.org; robh...@kernel.org; mark.rutl...@arm.com;
> catalin.mari...@arm.com
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Jerry
> Huang ; Rajesh Bhagat 
> Subject: [PATCH v4 3/3] USB3/DWC3: Enable undefined length INCR burst
> type
> 
> Enable the undefined length INCR burst type and set INCRx.
> Different platform may has the different burst size type.
> In order to get best performance, we need to tune the burst size to one
> special value, instead of the default value.
> 
> Signed-off-by: Changming Huang 
> Signed-off-by: Rajesh Bhagat 
> ---
> Changes in v4:
>   - Modify the codes according to the definition of this property.
> Changes in v3:
>   - add new property for INCR burst in usb node to reset GSBUSCFG0.
> Changes in v2:
>   - split patch
>   - create one new function to handle soc bus configuration register.
> 
>  drivers/usb/dwc3/core.c |   83
> +++
>  drivers/usb/dwc3/core.h |7 
>  2 files changed, 90 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index
> 369bab1..446aec3 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -650,6 +650,87 @@ static void dwc3_core_setup_global_control(struct
> dwc3 *dwc)
>   dwc3_writel(dwc->regs, DWC3_GCTL, reg);  }
> 
> +/* set global soc bus configuration registers */ static void
> +dwc3_set_soc_bus_cfg(struct dwc3 *dwc) {
> + struct device *dev = dwc->dev;
> + u32 *vals;
> + u32 cfg;
> + int ntype;
> + int ret;
> + int i;
> +
> + cfg = dwc3_readl(dwc->regs, DWC3_GSBUSCFG0);
> +
> + /*
> +  * Handle property "snps,incr-burst-type-adjustment".
> +  * Get the number of value from this property:
> +  * result <= 0, means this property is not supported.
> +  * result = 1, means INCRx burst mode supported.
> +  * result > 1, means undefined length burst mode supported.
> +  */
> + ntype = device_property_read_u32_array(dev,
> + "snps,incr-burst-type-adjustment", NULL, 0);
> + if (ntype > 0) {
> + vals = kcalloc(ntype, sizeof(u32), GFP_KERNEL);
> + if (!vals) {
> + dev_err(dev, "Error to get memory\n");
> + return;
> + }
> + /* Get INCR burst type, and parse it */
> + ret = device_property_read_u32_array(dev,
> + "snps,incr-burst-type-adjustment", vals, ntype);
> + if (ret) {
> + dev_err(dev, "Error to get property\n");
> + return;
> + }
> + *(dwc->incrx_type + 1) = vals[0];
> + if (ntype > 1) {
> + *dwc->incrx_type = 1;
> + for (i = 1; i < ntype; i++) {
> + if (vals[i] > *(dwc->incrx_type + 1))
> + *(dwc->incrx_type + 1) = vals[i];
> + }
> + } else
> + *dwc->incrx_type = 0;
> +
> + /* Enable Undefined Length INCR Burst and Enable INCRx
> Burst */
> + cfg &= ~DWC3_GSBUSCFG0_INCRBRST_MASK;
> + if (*dwc->incrx_type)
> + cfg |= DWC3_GSBUSCFG0_INCRBRSTENA;
> + switch (*(dwc->incrx_type + 1)) {
> + case 256:
> + cfg |= DWC3_GSBUSCFG0_INCR256BRSTENA;
> + break;
> + case 128:
> + cfg |= DWC3_GSBUSCFG0_INCR128BRSTENA;
> + break;
> + case 64:
> + cfg |= DWC3_GSBUSCFG0_INCR64BRSTENA;
> + break;
> + case 32:
> + cfg |= DWC3_GSBUSCFG0_INCR32BRSTENA;
> + break;
> + case 16:
> + cfg |= DWC3_GSBUSCFG0_INCR16BRSTENA;
> + break;
> + case 8:
> + cfg |= DWC3_GSBUSCFG0_INCR8BRSTENA;
> + break;
> + case 4:
> + cfg |= DWC3_GSBUSCFG0_INCR4BRSTENA;
> + break;
> + case 1:
> + break;
> + default:
> + dev_err(dev, "Invalid property\n");
> + break;
> + }
> + }
> +
> + dwc3_writel(dwc->regs, DWC3_GSBUSCFG0, cfg); }
> +
>  /**
>   * dwc3_core_init - Low-level initialization of DWC3 Core
>   * @dwc: Pointer to our controller context structure @@ -698,6 +779,8 @@
> static int dwc3_core_init(struct dwc3 *dwc)
>   /* Adjust Frame Length */
>   dwc3_frame_length_adjustment(dwc);
> 
> + dwc3_set_soc_bus_cfg(dwc);
> +
>   usb_phy_set_suspend(dwc->usb2_phy, 0);
>   usb_phy_set_susp

Re: [PATCH 2/6] tpm: export tpm2_flush_context_cmd

2017-02-09 Thread Jarkko Sakkinen

On Wed, Feb 08, 2017 at 10:58:30AM -0700, Jason Gunthorpe wrote:
> >  /**
> > + * tpm2_flush_context_cmd() - execute a TPM2_FlushContext command
> > + * @chip: TPM chip to use
> > + * @payload: the key data in clear and encrypted form
> > + * @options: authentication values and other options
> > + *
> > + * Return: same as with tpm_transmit_cmd
> > + */
> > +void tpm2_flush_context_cmd(struct tpm_chip *chip, u32 handle,
> > +   unsigned int flags)
> 
> Why did you move the function in the same file? Adding a prototype
> shouldn't require that..
> 
> Jason

To be logically positioned in the same way as it is in the header.

It cannot in-between the helper functions for tpm2_seal_trusted().

/Jarkko

Re: [PATCH V3 2/2] Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN

2017-02-09 Thread Johannes Thumshirn

On 02/09/2017 06:20 PM, Scott Bauer wrote:
> When CONFIG_KASAN is enabled, compilation fails:
> 
> block/sed-opal.c: In function 'sed_ioctl':
> block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 
> 2048 bytes [-Werror=frame-larger-than=]
> 
> Moved all the ioctl structures off the stack and dynamically activate
> using _IOC_SIZE()
> 
> Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
> 
> Reported-by: Arnd Bergmann 
> Signed-off-by: Scott Bauer 
> ---
>  block/sed-opal.c | 134 
> +--
>  1 file changed, 50 insertions(+), 84 deletions(-)
> 

[...]

> - if (copy_from_user(&session, arg, sizeof(session)))
> - return -EFAULT;
> - return opal_erase_locking_range(dev, &session);
> + ioctl_ptr = kzalloc(cmd_size, GFP_KERNEL);
> + if (!ioctl_ptr)
> + return -ENOMEM;
> + if (copy_from_user(ioctl_ptr, arg, cmd_size)) {
> + ret = -EFAULT;
> + goto out;
>   }

Can't we use memdup_user() instead of kzalloc() + copy_from_user()?



-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

[tip:perf/core] perf sdt: Show proper hint when event not yet in place via 'perf probe'

2017-02-09 Thread tip-bot for Ravi Bangoria

Commit-ID:  27cf5706a04e53f6844c71be1cbbf1df665f5d19
Gitweb: http://git.kernel.org/tip/27cf5706a04e53f6844c71be1cbbf1df665f5d19
Author: Ravi Bangoria 
AuthorDate: Fri, 3 Feb 2017 15:56:42 +0530
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 09:28:54 -0300

perf sdt: Show proper hint when event not yet in place via 'perf probe'

All events from 'perf list', except SDT events, can be directly recorded
with 'perf record'. But, the flow is little different for SDT events.

Probe points for SDT event needs to be created using 'perf probe' before
recording it using 'perf record'.

Perf shows misleading hint when a user tries to record SDT event without
first creating a probe point. Show proper hint there.

Before patch:

  $ perf record -a -e sdt_glib:idle__add
event syntax error: 'sdt_glib:idle__add'
 \___ unknown tracepoint

Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add not found.
Hint:  Perhaps this kernel misses some CONFIG_ setting to enable this 
feature?.
...

After patch:

  $ perf record -a -e sdt_glib:idle__add
event syntax error: 'sdt_glib:idle__add'
 \___ unknown tracepoint

Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add not found.
Hint:  SDT event cannot be directly recorded on.
   Please first use 'perf probe sdt_glib:idle__add' before recording it.
...

  $ perf probe sdt_glib:idle__add
Added new event:
  sdt_glib:idle__add   (on %idle__add in /usr/lib64/libglib-2.0.so.0.5000.2)

You can now use it in all perf tools, such as:

perf record -e sdt_glib:idle__add -aR sleep 1

  $ perf record -a -e sdt_glib:idle__add
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.175 MB perf.data ]

Suggested-and-Acked-by: Ingo Molnar 
Signed-off-by: Ravi Bangoria 
Tested-by: Arnaldo Carvalho de Melo 
Acked-by: Masami Hiramatsu 
Cc: Alexander Shishkin 
Cc: Alexis Berlemont 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Naveen N. Rao 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/20170203102642.17258-1-ravi.bango...@linux.vnet.ibm.com
[ s/Please use/Please first use/ and break the Hint line in two ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/api/fs/tracing_path.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 251b7c3..3e606b9 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -86,9 +86,13 @@ void put_tracing_file(char *file)
free(file);
 }
 
-static int strerror_open(int err, char *buf, size_t size, const char *filename)
+int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
+  const char *sys, const char *name)
 {
char sbuf[128];
+   char filename[PATH_MAX];
+
+   snprintf(filename, PATH_MAX, "%s/%s", sys, name ?: "*");
 
switch (err) {
case ENOENT:
@@ -99,10 +103,19 @@ static int strerror_open(int err, char *buf, size_t size, 
const char *filename)
 * - jirka
 */
if (debugfs__configured() || tracefs__configured()) {
-   snprintf(buf, size,
-"Error:\tFile %s/%s not found.\n"
-"Hint:\tPerhaps this kernel misses some 
CONFIG_ setting to enable this feature?.\n",
-tracing_events_path, filename);
+   /* sdt markers */
+   if (!strncmp(filename, "sdt_", 4)) {
+   snprintf(buf, size,
+   "Error:\tFile %s/%s not found.\n"
+   "Hint:\tSDT event cannot be directly 
recorded on.\n"
+   "\tPlease first use 'perf probe %s:%s' 
before recording it.\n",
+   tracing_events_path, filename, sys, 
name);
+   } else {
+   snprintf(buf, size,
+"Error:\tFile %s/%s not found.\n"
+"Hint:\tPerhaps this kernel misses 
some CONFIG_ setting to enable this feature?.\n",
+tracing_events_path, filename);
+   }
break;
}
snprintf(buf, size, "%s",
@@ -125,12 +138,3 @@ static int strerror_open(int err, char *buf, size_t size, 
const char *filename)
 
return 0;
 }
-
-int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char 
*sys, const char *name)
-{
-   char path[PATH_MAX];
-
-   snprintf(path, PATH_MAX, "%s/%s", sys, name ?: "*");
-
-   return strerror_open(err, buf, size, path);
-}

[tip:perf/core] perf symbols: Take into account symfs setting when reading file build ID

2017-02-09 Thread tip-bot for Victor Kamensky

Commit-ID:  9b200653518ea9ccc331b204c7d555d2440570d1
Gitweb: http://git.kernel.org/tip/9b200653518ea9ccc331b204c7d555d2440570d1
Author: Victor Kamensky 
AuthorDate: Mon, 6 Feb 2017 15:48:28 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 09:28:55 -0300

perf symbols: Take into account symfs setting when reading file build ID

After commit 5baecbcd9c9a ("perf symbols: we can now read separate
debug-info files based on a build ID") and when --symfs option is used
perf failed to pick up symbols for file with the same name between host
and sysroot specified by --symfs option.  One can see message like this:

  bin/bash with build id 26f0062cb6950d4d1ab0fd9c43eae8b10ca42062 not found, 
continuing without symbols

It happens because code added by 5baecbcd9c9a opens files directly by
dso->long_name without symbol_conf.symfs consideration, which as result
picks one from the host. It reads its build ID and later even code finds
another proper file in directory pointed by --symfs perf ignores it
because build id mismatches.

Fix is to use __symbol__join_symfs to adjust file name according to
--symfs setting. If no --symfs passed the operation would noop and picks
the same host file as before.

Also note in latter tree after 5baecbcd9c9a commit additional check for
'!dso->has_build_id' was added, so to observe error condition 'perf
record' should run with --no-buildid, so perf.data itself would not have
build id for target binary in buildid perf section and 'perf report'
will pass '!dso->has_build_id' condition. Or target binary should not
have build id, but the same binary on host has build id, again
'!dso->has_build_id' will pass in this case and incorrect build id could
be read if --symfs is used.

Signed-off-by: Victor Kamensky 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Chris Phlipot 
Cc: Dima Kogan 
Cc: He Kuang 
Cc: Kan Liang 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Cc: xe-linux-exter...@cisco.com
Fixes: 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files 
based on a build ID")
Link: 
http://lkml.kernel.org/r/1486424908-17094-1-git-send-email-kamen...@cisco.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/symbol.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index dc93940..70e389b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1460,9 +1460,11 @@ int dso__load(struct dso *dso, struct map *map)
 * DSO_BINARY_TYPE__BUILDID_DEBUGINFO to work
 */
if (!dso->has_build_id &&
-   is_regular_file(dso->long_name) &&
-   filename__read_build_id(dso->long_name, build_id, BUILD_ID_SIZE) > 
0)
+   is_regular_file(dso->long_name)) {
+   __symbol__join_symfs(name, PATH_MAX, dso->long_name);
+   if (filename__read_build_id(name, build_id, BUILD_ID_SIZE) > 0)
dso__set_build_id(dso, build_id);
+   }
 
/*
 * Iterate over candidate debug images.

[tip:perf/core] perf pmu: Support per pmu json aliases

2017-02-09 Thread tip-bot for Andi Kleen

Commit-ID:  15b22ed369aa23ef4d083ffb9621650c353d3ddd
Gitweb: http://git.kernel.org/tip/15b22ed369aa23ef4d083ffb9621650c353d3ddd
Author: Andi Kleen 
AuthorDate: Fri, 27 Jan 2017 18:03:38 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 08:55:03 -0300

perf pmu: Support per pmu json aliases

Add support for registering json aliases per PMU. Any alias with an unit
matching the prefix is registered to the PMU.  Uncore has multiple
instances of most units, so all these aliases get registered for each
individual PMU (this is important later to run the event on every
instance of the PMU).

To avoid printing the events multiple times in perf list filter out
duplicated events during printing.

v2: Rely on uncore_ prefix already in unit
v3: Document why calls were reordered

Signed-off-by: Andi Kleen 
Cc: Jiri Olsa 
Link: http://lkml.kernel.org/r/20170128020345.19007-4-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/pmu.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6dc3cc0..8e9d00f 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -590,14 +590,16 @@ static struct perf_pmu *pmu_lookup(const char *name)
if (pmu_format(name, &format))
return NULL;
 
-   if (pmu_aliases(name, &aliases))
+   /*
+* Check the type first to avoid unnecessary work.
+*/
+   if (pmu_type(name, &type))
return NULL;
 
-   pmu_add_cpu_aliases(&aliases, name);
-
-   if (pmu_type(name, &type))
+   if (pmu_aliases(name, &aliases))
return NULL;
 
+   pmu_add_cpu_aliases(&aliases, name);
pmu = zalloc(sizeof(*pmu));
if (!pmu)
return NULL;
@@ -1195,6 +1197,9 @@ void print_pmu_events(const char *event_glob, bool 
name_only, bool quiet_flag,
len = j;
qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
for (j = 0; j < len; j++) {
+   /* Skip duplicates */
+   if (j > 0 && !strcmp(aliases[j].name, aliases[j - 1].name))
+   continue;
if (name_only) {
printf("%s ", aliases[j].name);
continue;

[tip:perf/core] perf tools arm64: Add support for generating bpf prologue

2017-02-09 Thread tip-bot for He Kuang

Commit-ID:  3bb53c9f124bd9297f18d58a395cff59dfaf8541
Gitweb: http://git.kernel.org/tip/3bb53c9f124bd9297f18d58a395cff59dfaf8541
Author: He Kuang 
AuthorDate: Tue, 7 Feb 2017 07:34:11 +
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 08:55:01 -0300

perf tools arm64: Add support for generating bpf prologue

Since HAVE_KPROBES can be enabled in arm64, this patch introduces
regs_query_register_offset() to convert register name to offset for
arm64, so the BPF prologue feature is ready to use.

Signed-off-by: He Kuang 
Reviewed-by: Will Deacon 
Acked-by: Masami Hiramatsu 
Cc: Alexander Shishkin 
Cc: Bintian Wang 
Cc: Jiri Olsa 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Cc: linux-arm-ker...@lists.infradead.org
Link: http://lkml.kernel.org/r/20170207073412.26983-1-heku...@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/arm64/Makefile  |  1 +
 tools/perf/arch/arm64/util/dwarf-regs.c | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
index 18b1351..eebe1ec 100644
--- a/tools/perf/arch/arm64/Makefile
+++ b/tools/perf/arch/arm64/Makefile
@@ -2,3 +2,4 @@ ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
 PERF_HAVE_JITDUMP := 1
+PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
diff --git a/tools/perf/arch/arm64/util/dwarf-regs.c 
b/tools/perf/arch/arm64/util/dwarf-regs.c
index d49efeb..068b618 100644
--- a/tools/perf/arch/arm64/util/dwarf-regs.c
+++ b/tools/perf/arch/arm64/util/dwarf-regs.c
@@ -10,17 +10,20 @@
 
 #include 
 #include 
+#include  /* for struct user_pt_regs */
+#include "util.h"
 
 struct pt_regs_dwarfnum {
const char *name;
unsigned int dwarfnum;
 };
 
-#define STR(s) #s
 #define REG_DWARFNUM_NAME(r, num) {.name = r, .dwarfnum = num}
 #define GPR_DWARFNUM_NAME(num) \
{.name = STR(%x##num), .dwarfnum = num}
 #define REG_DWARFNUM_END {.name = NULL, .dwarfnum = 0}
+#define DWARFNUM2OFFSET(index) \
+   (index * sizeof((struct user_pt_regs *)0)->regs[0])
 
 /*
  * Reference:
@@ -78,3 +81,13 @@ const char *get_arch_regstr(unsigned int n)
return roff->name;
return NULL;
 }
+
+int regs_query_register_offset(const char *name)
+{
+   const struct pt_regs_dwarfnum *roff;
+
+   for (roff = regdwarfnum_table; roff->name != NULL; roff++)
+   if (!strcmp(roff->name, name))
+   return DWARFNUM2OFFSET(roff->dwarfnum);
+   return -EINVAL;
+}

[tip:perf/core] perf pmu: Support event aliases for non cpu// pmus

2017-02-09 Thread tip-bot for Andi Kleen

Commit-ID:  231bb2aa32498cbebef1306889a02114e9dfc934
Gitweb: http://git.kernel.org/tip/231bb2aa32498cbebef1306889a02114e9dfc934
Author: Andi Kleen 
AuthorDate: Fri, 27 Jan 2017 18:03:39 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 08:55:04 -0300

perf pmu: Support event aliases for non cpu// pmus

The code for handling pmu aliases without specifying the PMU hardcoded
only supported the cpu PMU.

This patch extends it to work for all PMUs. We always duplicate the
event for all PMUs that have an matching alias.  This allows to
automatically expand an alias for all instances of a PMU (so for example
you can monitor all cache boxes with a single event)

Signed-off-by: Andi Kleen 
Acked-by: Jiri Olsa 
Link: http://lkml.kernel.org/r/20170128020345.19007-5-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/parse-events.c | 46 --
 tools/perf/util/parse-events.y | 32 ++---
 2 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3c876b8..6dbcba7 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1504,35 +1504,41 @@ static void perf_pmu__parse_init(void)
struct perf_pmu_alias *alias;
int len = 0;
 
-   pmu = perf_pmu__find("cpu");
-   if ((pmu == NULL) || list_empty(&pmu->aliases)) {
+   pmu = NULL;
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   list_for_each_entry(alias, &pmu->aliases, list) {
+   if (strchr(alias->name, '-'))
+   len++;
+   len++;
+   }
+   }
+
+   if (len == 0) {
perf_pmu_events_list_num = -1;
return;
}
-   list_for_each_entry(alias, &pmu->aliases, list) {
-   if (strchr(alias->name, '-'))
-   len++;
-   len++;
-   }
perf_pmu_events_list = malloc(sizeof(struct perf_pmu_event_symbol) * 
len);
if (!perf_pmu_events_list)
return;
perf_pmu_events_list_num = len;
 
len = 0;
-   list_for_each_entry(alias, &pmu->aliases, list) {
-   struct perf_pmu_event_symbol *p = perf_pmu_events_list + len;
-   char *tmp = strchr(alias->name, '-');
-
-   if (tmp != NULL) {
-   SET_SYMBOL(strndup(alias->name, tmp - alias->name),
-   PMU_EVENT_SYMBOL_PREFIX);
-   p++;
-   SET_SYMBOL(strdup(++tmp), PMU_EVENT_SYMBOL_SUFFIX);
-   len += 2;
-   } else {
-   SET_SYMBOL(strdup(alias->name), PMU_EVENT_SYMBOL);
-   len++;
+   pmu = NULL;
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   list_for_each_entry(alias, &pmu->aliases, list) {
+   struct perf_pmu_event_symbol *p = perf_pmu_events_list 
+ len;
+   char *tmp = strchr(alias->name, '-');
+
+   if (tmp != NULL) {
+   SET_SYMBOL(strndup(alias->name, tmp - 
alias->name),
+   PMU_EVENT_SYMBOL_PREFIX);
+   p++;
+   SET_SYMBOL(strdup(++tmp), 
PMU_EVENT_SYMBOL_SUFFIX);
+   len += 2;
+   } else {
+   SET_SYMBOL(strdup(alias->name), 
PMU_EVENT_SYMBOL);
+   len++;
+   }
}
}
qsort(perf_pmu_events_list, len,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 879115f..f3b5ec9 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include "util.h"
+#include "pmu.h"
 #include "parse-events.h"
 #include "parse-events-bison.h"
 
@@ -236,15 +237,32 @@ PE_KERNEL_PMU_EVENT sep_dc
struct list_head *head;
struct parse_events_term *term;
struct list_head *list;
+   struct perf_pmu *pmu = NULL;
+   int ok = 0;
 
-   ALLOC_LIST(head);
-   ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
-   $1, 1, &@1, NULL));
-   list_add_tail(&term->list, head);
-
+   /* Add it for all PMUs that support the alias */
ALLOC_LIST(list);
-   ABORT_ON(parse_events_add_pmu(data, list, "cpu", head));
-   parse_events_terms__delete(head);
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   struct perf_pmu_alias *alias;
+
+   list_for_each_entry(alias, &pmu->aliases, list) {
+   if (!strcasecmp(alias->name, $1)) {
+   ALLOC_LIST(head);
+

[tip:perf/core] perf jevents: Parse eventcode as number

2017-02-09 Thread tip-bot for Andi Kleen

Commit-ID:  d581141970ef3965c1624960fa928d765afd8a3e
Gitweb: http://git.kernel.org/tip/d581141970ef3965c1624960fa928d765afd8a3e
Author: Andi Kleen 
AuthorDate: Fri, 27 Jan 2017 18:03:36 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Wed, 8 Feb 2017 08:55:02 -0300

perf jevents: Parse eventcode as number

The next patch needs to modify event code. Previously eventcode was just
passed through as a string. Now parse it as a number.

v2: Don't special case 0

Signed-off-by: Andi Kleen 
Acked-by: Jiri Olsa 
Link: http://lkml.kernel.org/r/20170128020345.19007-2-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/pmu-events/jevents.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 41611d7..551377b 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -135,7 +135,6 @@ static struct field {
const char *field;
const char *kernel;
 } fields[] = {
-   { "EventCode",  "event=" },
{ "UMask",  "umask=" },
{ "CounterMask", "cmask=" },
{ "Invert", "inv=" },
@@ -343,6 +342,7 @@ int json_events(const char *fn,
jsmntok_t *tokens, *tok;
int i, j, len;
char *map;
+   char buf[128];
 
if (!fn)
return -ENOENT;
@@ -356,6 +356,7 @@ int json_events(const char *fn,
char *event = NULL, *desc = NULL, *name = NULL;
char *long_desc = NULL;
char *extra_desc = NULL;
+   unsigned long long eventcode = 0;
struct msrmap *msr = NULL;
jsmntok_t *msrval = NULL;
jsmntok_t *precise = NULL;
@@ -376,6 +377,11 @@ int json_events(const char *fn,
nz = !json_streq(map, val, "0");
if (match_field(map, field, nz, &event, val)) {
/* ok */
+   } else if (json_streq(map, field, "EventCode")) {
+   char *code = NULL;
+   addfield(map, &code, "", "", val);
+   eventcode |= strtoul(code, NULL, 0);
+   free(code);
} else if (json_streq(map, field, "EventName")) {
addfield(map, &name, "", "", val);
} else if (json_streq(map, field, "BriefDescription")) {
@@ -410,6 +416,8 @@ int json_events(const char *fn,
addfield(map, &extra_desc, " ",
"(Precise event)", NULL);
}
+   snprintf(buf, sizeof buf, "event=%#llx", eventcode);
+   addfield(map, &event, ",", buf, NULL);
if (desc && extra_desc)
addfield(map, &desc, " ", extra_desc, NULL);
if (long_desc && extra_desc)

Re: [PATCHSET 0/4] perf diff: Introduce delta-abs compute method (v2)

2017-02-09 Thread Ingo Molnar


* Namhyung Kim  wrote:

> Hello,
> 
> This patchset adds 'delta-abs' compute method to -c/--compute option.
> The 'delta-abs' is same as 'delta' but shows entries with bigger
> absolute delta first instead of sorting numerically.  This is only
> useful together with -o option.
> 
>  * v2 changes
>   - rebase onto acme/perf/core
>   - change default option to '-o 1 -c delta-abs'
> 
> 
> Below is default output (-c delta):
> 
>   $ perf diff -o 1 -c delta | grep -v ^# | head
> 42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
>  0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
>  +1.15%  [kernel.kallsyms]  [k] copy_user_generic_string
>  2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
>  0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
>  +0.64%  [kernel.kallsyms]  [k] kmem_cache_alloc
>  0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock
>  +0.45%  [kernel.kallsyms]  [k] alloc_set_pte
>  0.16%   +0.45%  [kernel.kallsyms]  [k] menu_select
>  +0.41%  ld-2.24.so [.] do_lookup_x
> 
> Now with 'delta-abs' it shows entries have bigger delta value either
> positive or negative.
> 
>   $ perf diff -o 1 -c delta-abs | grep -v ^# | head
> 42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
> 12.72%   -3.01%  [kernel.kallsyms]  [k] intel_idle
>  9.72%   -1.31%  [unknown]  [.] 0x00411343
>  0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
>  +1.15%  [kernel.kallsyms]  [k] copy_user_generic_string
>  2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
>  0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
>  1.35%   -0.71%  [kernel.kallsyms]  [k] smp_call_function_single
>  +0.64%  [kernel.kallsyms]  [k] kmem_cache_alloc
>  0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock

Nice!

BTW., to me the second output looks a lot more intuitive and useful than the 
default one.

Would it be possible to flip over the default to the 'most useful' options, and 
see whether anyone complains?

Thanks,

Ingo

[no subject]

2017-02-09 Thread Marty Plummer

Greetings.

I think I may have found a bug with the hix5hd2_gmac driver; unless I'm
missing something, it appears that somehow the net_device struct is not
being initialized properly in the hix5hd2_dev_probe function.

Having set up my devicetree properly (I hope, still new to this), I first
recieved an error when inserting the module:
"(unnamed net_device) (uninitialized): No irq resource"
while I very clearly have the interrupts property defined within this node.

Removing the phy-handle node for testing purposes, I get a similar message:
"(unnamed net_device) (uninitialized): not find phy-handle"

So, it seams to my (admittedly inexperienced) mind that the ndev pointer is
not being initialized properly, or that the error checking at line 
is not functioning properly either, for it to have gotten so far along
into the function, only to fail at the attempt to access the ndev pointer.

If you require more information from me, please let me know.

Marty

Re: [PATCH 3/3 staging-next] mm: Remove RCU and tasklocks from lmk

2017-02-09 Thread peter enderborg

On 02/09/2017 09:05 PM, Michal Hocko wrote:
> On Thu 09-02-17 14:21:52, peter enderborg wrote:
>> Fundamental changes:
>> 1 Does NOT take any RCU lock in shrinker functions.
>> 2 It returns same result for scan and counts, so  we dont need to do
>>   shinker will know when it is pointless to call scan.
>> 3 It does not lock any other process than the one that is
>>   going to be killed.
>>
>> Background.
>> The low memory killer scans for process that can be killed to free
>> memory. This can be cpu consuming when there is a high demand for
>> memory. This can be seen by analysing the kswapd0 task work.
>> The stats function added in earler patch adds a counter for waste work.
>>
>> How it works.
>> This patch create a structure within the lowmemory killer that caches
>> the user spaces processes that it might kill. It is done with a
>> sorted rbtree so we can very easy find the candidate to be killed,
>> and knows its properies as memory usage and sorted by oom_score_adj
>> to look up the task with highest oom_score_adj. To be able to achive
>> this it uses oom_score_notify events.
>>
>> This patch also as a other effect, we are now free to do other
>> lowmemorykiller configurations.  Without the patch there is a need
>> for a tradeoff between freed memory and task and rcu locks. This
>> is no longer a concern for tuning lmk. This patch is not intended
>> to do any calculation changes other than we do use the cache for
>> calculate the count values and that makes kswapd0 to shrink other
>> areas.
> I have to admit I really do not understand big part of the above
> paragraph as well as how this all is supposed to work. A quick glance
> over the implementation. __lmk_task_insert seems to be only called from
> the oom_score notifier context. If nobody updates the value then no task
> will get into the tree. Or am I missing something really obvious here?
> Moreover oom scores tend to be mostly same for tasks. That means that
> your sorted tree will become sorted by pids in most cases. I do not see
> any sorting based on the rss nor any updates that would reflect updates
> of rss. How can this possibly work?

The task tree nodes are created,updated or removed from the notifier when
there is a relevant oom_score_adj change. If no one create a task that
is in the range for the lowmemorykiller the tree will be empty. This is
an android feature so the score will be updated very often. It is
part of activity manager to prioritise tasks.  Why should we do sort of
rss?

[PATCH v2 2/4] perf diff: Add diff.order config option

2017-02-09 Thread Namhyung Kim

In many cases, I need to look at differences between two data so I often
used the -o option to sort the result base on the difference first.
It'd be nice to have a config option to set it by default.

The diff.order config option is to set the default value of -o/--order
option.

Cc: Taeung Song 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-config.txt |  7 +++
 tools/perf/Documentation/perf-diff.txt   |  6 +-
 tools/perf/builtin-diff.c| 14 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-config.txt 
b/tools/perf/Documentation/perf-config.txt
index 9365b75fd04f..5b54d47ef713 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -498,6 +498,13 @@ Variables
But if this option is 'no-cache', it will not update the 
build-id cache.
'skip' skips post-processing and does not update the cache.
 
+diff.*::
+   diff.order::
+   This option sets the number of column to sort the result.
+   Default is 0 which means sorting by baseline.
+   Setting it to 1 will sort the result by delta (or other
+   compute method selected).
+
 SEE ALSO
 
 linkperf:perf[1]
diff --git a/tools/perf/Documentation/perf-diff.txt 
b/tools/perf/Documentation/perf-diff.txt
index af80284cd2f6..6ba3bf582d79 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -99,7 +99,11 @@ OPTIONS
 
 -o::
 --order::
-   Specify compute sorting column number.
+   Specify compute sorting column number.  0 means sorting by baseline
+   overhead (default) and 1 means sorting by computed value of column 1
+   (data from the first file other base baseline).  Values more than 1
+   can be used only if enough data files are provided.
+   Default value can be set using diff.order config option.
 
 --percentage::
Determine how to display the overhead percentage of filtered entries.
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 781c9e60bd21..181ff996e039 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -17,6 +17,7 @@
 #include "util/symbol.h"
 #include "util/util.h"
 #include "util/data.h"
+#include "util/config.h"
 
 #include 
 #include 
@@ -1291,6 +1292,17 @@ static int data_init(int argc, const char **argv)
return 0;
 }
 
+static int diff__config(const char *var, const char *value,
+   void *cb __maybe_unused)
+{
+   if (!strcmp(var, "diff.order")) {
+   sort_compute = perf_config_int(var, value);
+   return 0;
+   }
+
+   return 0;
+}
+
 int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
 {
int ret = hists__init();
@@ -1298,6 +1310,8 @@ int cmd_diff(int argc, const char **argv, const char 
*prefix __maybe_unused)
if (ret < 0)
return ret;
 
+   perf_config(diff__config, NULL);
+
argc = parse_options(argc, argv, options, diff_usage, 0);
 
if (symbol__init(NULL) < 0)
-- 
2.11.0

[PATCHSET 0/4] perf diff: Introduce delta-abs compute method (v2)

2017-02-09 Thread Namhyung Kim

Hello,

This patchset adds 'delta-abs' compute method to -c/--compute option.
The 'delta-abs' is same as 'delta' but shows entries with bigger
absolute delta first instead of sorting numerically.  This is only
useful together with -o option.

 * v2 changes
  - rebase onto acme/perf/core
  - change default option to '-o 1 -c delta-abs'


Below is default output (-c delta):

  $ perf diff -o 1 -c delta | grep -v ^# | head
42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
 0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
 +1.15%  [kernel.kallsyms]  [k] copy_user_generic_string
 2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
 0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
 +0.64%  [kernel.kallsyms]  [k] kmem_cache_alloc
 0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock
 +0.45%  [kernel.kallsyms]  [k] alloc_set_pte
 0.16%   +0.45%  [kernel.kallsyms]  [k] menu_select
 +0.41%  ld-2.24.so [.] do_lookup_x

Now with 'delta-abs' it shows entries have bigger delta value either
positive or negative.

  $ perf diff -o 1 -c delta-abs | grep -v ^# | head
42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
12.72%   -3.01%  [kernel.kallsyms]  [k] intel_idle
 9.72%   -1.31%  [unknown]  [.] 0x00411343
 0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
 +1.15%  [kernel.kallsyms]  [k] copy_user_generic_string
 2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
 0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
 1.35%   -0.71%  [kernel.kallsyms]  [k] smp_call_function_single
 +0.64%  [kernel.kallsyms]  [k] kmem_cache_alloc
 0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock

The patch 2 and 3 are to add config options to control the default
behavior of perf diff command.  And patch 4 changes the default setting.

The code is avaiable at 'perf/diff-delta-abs-v2' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (4):
  perf diff: Add 'delta-abs' compute method
  perf diff: Add diff.order config option
  perf diff: Add diff.compute config option
  perf diff: Change default setting to "delta-abs"

 tools/perf/Documentation/perf-config.txt | 12 +
 tools/perf/Documentation/perf-diff.txt   | 15 --
 tools/perf/builtin-diff.c| 78 ++--
 3 files changed, 98 insertions(+), 7 deletions(-)

-- 
2.11.0

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Joe Perches

On Thu, 2017-02-09 at 23:14 -0800, Adrian Chadd wrote:

> If there
> were accessors for the skb data / len fields (like we do for mbufs)
> then porting the code would've involved about 5,000 less changed
> lines.

What generic mechanisms would you suggest to make
porting easier between bsd and linux and what in
your opinion are the best naming schemes to make
these functions easiest to read and implement
without resorting to excessive identifier lengths?

If you have some, please provide examples.

[PATCH v2 1/4] perf diff: Add 'delta-abs' compute method

2017-02-09 Thread Namhyung Kim

The 'delta-abs' compute method is same as 'delta' but shows entries with
bigger absolute delta first instead of sorting numerically.  This is
only useful together with -o option.

Below is default output (-c delta):

  $ perf diff -o 1 -c delta | grep -v ^# | head
42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
 0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
 +1.15%  [kernel.kallsyms]  [k] copy_user_generic_string
 2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
 0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
 +0.64%  [kernel.kallsyms]  [k] kmem_cache_alloc
 0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock
 +0.45%  [kernel.kallsyms]  [k] alloc_set_pte
 0.16%   +0.45%  [kernel.kallsyms]  [k] menu_select
 +0.41%  ld-2.24.so [.] do_lookup_x

Now with 'delta-abs' it shows entries have bigger delta value either
positive or negative.

  $ perf diff -o 1 -c delta-abs | grep -v ^# | head
42.22%   +4.97%  [kernel.kallsyms]  [k] cfb_imageblit
12.72%   -3.01%  [kernel.kallsyms]  [k] intel_idle
 9.72%   -1.31%  [unknown]  [.] 0x00411343
 0.62%   +1.23%  [kernel.kallsyms]  [k] mutex_lock
 2.40%   +0.95%  [kernel.kallsyms]  [k] bit_putcs
 0.31%   +0.79%  [kernel.kallsyms]  [k] link_path_walk
 1.35%   -0.71%  [kernel.kallsyms]  [k] smp_call_function_single
 0.00%   +0.57%  [kernel.kallsyms]  [k] __rcu_read_unlock
 0.16%   +0.45%  [kernel.kallsyms]  [k] menu_select
 0.72%   -0.44%  [kernel.kallsyms]  [k] lookup_fast

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-diff.txt |  6 -
 tools/perf/builtin-diff.c  | 46 --
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-diff.txt 
b/tools/perf/Documentation/perf-diff.txt
index 3e9490b9c533..af80284cd2f6 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -86,7 +86,7 @@ OPTIONS
 
 -c::
 --compute::
-Differential computation selection - delta,ratio,wdiff (default is 
delta).
+Differential computation selection - delta,ratio,wdiff,delta-abs 
(default is delta).
 See COMPARISON METHODS section for more info.
 
 -p::
@@ -181,6 +181,10 @@ delta
 relative to how entries are filtered.  Use --percentage=absolute to
 prevent such fluctuation.
 
+delta-abs
+~
+Same as 'delta` method, but sort the result with the absolute values.
+
 ratio
 ~
 If specified the 'Ratio' column is displayed with value 'r' computed as:
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 933aeec46f4a..781c9e60bd21 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -30,6 +30,7 @@ enum {
PERF_HPP_DIFF__RATIO,
PERF_HPP_DIFF__WEIGHTED_DIFF,
PERF_HPP_DIFF__FORMULA,
+   PERF_HPP_DIFF__DELTA_ABS,
 
PERF_HPP_DIFF__MAX_INDEX
 };
@@ -73,11 +74,13 @@ enum {
COMPUTE_DELTA,
COMPUTE_RATIO,
COMPUTE_WEIGHTED_DIFF,
+   COMPUTE_DELTA_ABS,
COMPUTE_MAX,
 };
 
 const char *compute_names[COMPUTE_MAX] = {
[COMPUTE_DELTA] = "delta",
+   [COMPUTE_DELTA_ABS] = "delta-abs",
[COMPUTE_RATIO] = "ratio",
[COMPUTE_WEIGHTED_DIFF] = "wdiff",
 };
@@ -86,6 +89,7 @@ static int compute;
 
 static int compute_2_hpp[COMPUTE_MAX] = {
[COMPUTE_DELTA] = PERF_HPP_DIFF__DELTA,
+   [COMPUTE_DELTA_ABS] = PERF_HPP_DIFF__DELTA_ABS,
[COMPUTE_RATIO] = PERF_HPP_DIFF__RATIO,
[COMPUTE_WEIGHTED_DIFF] = PERF_HPP_DIFF__WEIGHTED_DIFF,
 };
@@ -111,6 +115,10 @@ static struct header_column {
.name  = "Delta",
.width = 7,
},
+   [PERF_HPP_DIFF__DELTA_ABS] = {
+   .name  = "Delta Abs",
+   .width = 7,
+   },
[PERF_HPP_DIFF__RATIO] = {
.name  = "Ratio",
.width = 14,
@@ -298,6 +306,7 @@ static int formula_fprintf(struct hist_entry *he, struct 
hist_entry *pair,
 {
switch (compute) {
case COMPUTE_DELTA:
+   case COMPUTE_DELTA_ABS:
return formula_delta(he, pair, buf, size);
case COMPUTE_RATIO:
return formula_ratio(he, pair, buf, size);
@@ -461,6 +470,7 @@ static void hists__precompute(struct hists *hists)
 
switch (compute) {
case COMPUTE_DELTA:
+   case COMPUTE_DELTA_ABS:
compute_delta(he, pair);
break;
case COMPUTE_RATIO:
@@ -498,6 +508,13 @@ __hist_entry__cmp_compute(struct hist_entry *left, struct 
hist_entry *right,
 
return cmp_doubles(l, r);
}
+   case COMPUTE_DELTA_ABS:
+   {
+   double l = fabs(left->diff.period_ratio_delta);
+   double r = fabs(righ

[PATCH v2 4/4] perf diff: Change default setting to "delta-abs"

2017-02-09 Thread Namhyung Kim

The "delta-abs" compute method will show most changed entries on top.
So users can easily see how much effect between the data.  To see
original-style (sorted by baseline) use -o 0 option.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 4b4004d41c6a..27300f2c665b 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -66,7 +66,7 @@ static bool force;
 static bool show_period;
 static bool show_formula;
 static bool show_baseline_only;
-static unsigned int sort_compute;
+static unsigned int sort_compute = 1;
 
 static s64 compute_wdiff_w1;
 static s64 compute_wdiff_w2;
@@ -86,7 +86,7 @@ const char *compute_names[COMPUTE_MAX] = {
[COMPUTE_WEIGHTED_DIFF] = "wdiff",
 };
 
-static int compute = COMPUTE_DELTA;
+static int compute = COMPUTE_DELTA_ABS;
 
 static int compute_2_hpp[COMPUTE_MAX] = {
[COMPUTE_DELTA] = PERF_HPP_DIFF__DELTA,
-- 
2.11.0

[PATCH v2 3/4] perf diff: Add diff.compute config option

2017-02-09 Thread Namhyung Kim

The diff.compute config variable is to set the default compute method of
perf diff command (-c option).  Possible values 'delta' (default),
'delta-abs', 'ratio' and 'wdiff'.

Cc: Taeung Song 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-config.txt |  5 +
 tools/perf/Documentation/perf-diff.txt   |  5 +++--
 tools/perf/builtin-diff.c| 16 +++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-config.txt 
b/tools/perf/Documentation/perf-config.txt
index 5b54d47ef713..f2d758dc1edc 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -505,6 +505,11 @@ Variables
Setting it to 1 will sort the result by delta (or other
compute method selected).
 
+   diff.compute::
+   This options sets the method of computing diff result.
+   Possible values are 'delta', 'delta-abs', 'ratio' and
+   'wdiff'.  Default is 'delta'.
+
 SEE ALSO
 
 linkperf:perf[1]
diff --git a/tools/perf/Documentation/perf-diff.txt 
b/tools/perf/Documentation/perf-diff.txt
index 6ba3bf582d79..70f490408262 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -86,8 +86,9 @@ OPTIONS
 
 -c::
 --compute::
-Differential computation selection - delta,ratio,wdiff,delta-abs 
(default is delta).
-See COMPARISON METHODS section for more info.
+Differential computation selection - delta,ratio,wdiff,delta-abs
+   (default is delta).  Default can be changed using diff.compute
+   config option.  See COMPARISON METHODS section for more info.
 
 -p::
 --period::
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 181ff996e039..4b4004d41c6a 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -86,7 +86,7 @@ const char *compute_names[COMPUTE_MAX] = {
[COMPUTE_WEIGHTED_DIFF] = "wdiff",
 };
 
-static int compute;
+static int compute = COMPUTE_DELTA;
 
 static int compute_2_hpp[COMPUTE_MAX] = {
[COMPUTE_DELTA] = PERF_HPP_DIFF__DELTA,
@@ -1299,6 +1299,20 @@ static int diff__config(const char *var, const char 
*value,
sort_compute = perf_config_int(var, value);
return 0;
}
+   if (!strcmp(var, "diff.compute")) {
+   if (!strcmp(value, "delta"))
+   compute = COMPUTE_DELTA;
+   else if (!strcmp(value, "delta-abs"))
+   compute = COMPUTE_DELTA_ABS;
+   else if (!strcmp(value, "ratio"))
+   compute = COMPUTE_RATIO;
+   else if (!strcmp(value, "wdiff"))
+   compute = COMPUTE_WEIGHTED_DIFF;
+   else {
+   pr_err("Invalid compute method: %s\n", value);
+   return -1;
+   }
+   }
 
return 0;
 }
-- 
2.11.0

[PATCH] ipc/mqueue: use unsigned int for retval

2017-02-09 Thread Nicholas Mc Guire

The retval variable is assigned by bitops only and thus there is no
reason to use a signed type. Further matching it with the return 
type of the function also makes static code checkers happy.

Signed-off-by: Nicholas Mc Guire 
---

Found by experimental type-checking coccinelle script

Patch was compile tested with: x86_64_defconfig

Patch is against 4.10-rc6 (localversion-next is next-20170209)

 ipc/mqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4fdd970..897ce70 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -519,7 +519,7 @@ static int mqueue_flush_file(struct file *filp, fl_owner_t 
id)
 static unsigned int mqueue_poll_file(struct file *filp, struct 
poll_table_struct *poll_tab)
 {
struct mqueue_inode_info *info = MQUEUE_I(file_inode(filp));
-   int retval = 0;
+   unsigned int retval = 0;
 
poll_wait(filp, &info->wait_q, poll_tab);
 
-- 
2.1.4

Re: [PATCH ipsec] xfrm: policy: init locks early

2017-02-09 Thread Steffen Klassert

On Wed, Feb 08, 2017 at 11:52:29AM +0100, Florian Westphal wrote:
> Dmitry reports following splat:
>  INFO: trying to register non-static key.
>  the code is fine but needs lockdep annotation.
>  turning off the locking correctness validator.
>  CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1
> [..]
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963
>  xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041
>  xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091
>  ops_init+0x10a/0x530 net/core/net_namespace.c:115
>  setup_net+0x2ed/0x690 net/core/net_namespace.c:291
>  copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
>  create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
>  SYSC_unshare kernel/fork.c:2281 [inline]
> 
> Problem is that when we get error during xfrm_net_init we will call
> xfrm_policy_fini which will acquire xfrm_policy_lock before it was
> initialized.  Just move it around so locks get set up first.
> 
> Reported-by: Dmitry Vyukov 
> Fixes: 283bc9f35bbbcb0e9 ("xfrm: Namespacify xfrm state/policy locks")
> Signed-off-by: Florian Westphal 

Applied, thanks everyone!

[PATCH] proc/sysctl: prune stale dentries during unregistering

2017-02-09 Thread Konstantin Khlebnikov

Currently unregistering sysctl table does not prune its dentries.
Stale dentries could slowdown sysctl operations significantly.

For example, command:

# for i in {1..10} ; do unshare -n -- sysctl -a &> /dev/null ; done

creates a millions of stale denties around sysctls of loopback interface:

# sysctl fs.dentry-state
fs.dentry-state = 25812579  2472413545  0   0   0

All of them have matching names thus lookup have to scan though whole
hash chain and call d_compare (proc_sys_compare) which checks them
under system-wide spinlock (sysctl_lock).

# time sysctl -a > /dev/null
real1m12.806s
user0m0.016s
sys 1m12.400s

Currently only memory reclaimer could remove this garbage.
But without significant memory pressure this never happens.

This patch collects sysctl inodes into list on sysctl table header and
prunes all their dentries once that table unregisters.

Signed-off-by: Konstantin Khlebnikov 
Suggested-by: Al Viro 
---
 fs/proc/inode.c|3 ++
 fs/proc/internal.h |7 --
 fs/proc/proc_sysctl.c  |   59 +++-
 include/linux/sysctl.h |1 +
 4 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 842a5ff5b85c..7ad9ed7958af 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -43,10 +43,11 @@ static void proc_evict_inode(struct inode *inode)
de = PDE(inode);
if (de)
pde_put(de);
+
head = PROC_I(inode)->sysctl;
if (head) {
RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL);
-   sysctl_head_put(head);
+   proc_sys_evict_inode(inode, head);
}
 }
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2de5194ba378..ed1d762160e6 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -65,6 +65,7 @@ struct proc_inode {
struct proc_dir_entry *pde;
struct ctl_table_header *sysctl;
struct ctl_table *sysctl_entry;
+   struct list_head sysctl_inodes;
const struct proc_ns_operations *ns_ops;
struct inode vfs_inode;
 };
@@ -249,10 +250,12 @@ extern void proc_thread_self_init(void);
  */
 #ifdef CONFIG_PROC_SYSCTL
 extern int proc_sys_init(void);
-extern void sysctl_head_put(struct ctl_table_header *);
+extern void proc_sys_evict_inode(struct inode *inode,
+struct ctl_table_header *head);
 #else
 static inline void proc_sys_init(void) { }
-static inline void sysctl_head_put(struct ctl_table_header *head) { }
+static inline void proc_sys_evict_inode(struct  inode *inode,
+   struct ctl_table_header *head) { }
 #endif
 
 /*
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index d4e37acd4821..8efb1e10b025 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -190,6 +190,7 @@ static void init_header(struct ctl_table_header *head,
head->set = set;
head->parent = NULL;
head->node = node;
+   INIT_LIST_HEAD(&head->inodes);
if (node) {
struct ctl_table *entry;
for (entry = table; entry->procname; entry++, node++)
@@ -259,6 +260,29 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
 }
 
+/* called under sysctl_lock */
+static void proc_sys_prune_dcache(struct ctl_table_header *head)
+{
+   struct inode *inode, *prev = NULL;
+   struct proc_inode *ei;
+
+   list_for_each_entry(ei, &head->inodes, sysctl_inodes) {
+   inode = igrab(&ei->vfs_inode);
+   if (inode) {
+   spin_unlock(&sysctl_lock);
+   iput(prev);
+   prev = inode;
+   d_prune_aliases(inode);
+   spin_lock(&sysctl_lock);
+   }
+   }
+   if (prev) {
+   spin_unlock(&sysctl_lock);
+   iput(prev);
+   spin_lock(&sysctl_lock);
+   }
+}
+
 /* called under sysctl_lock, will reacquire if has to wait */
 static void start_unregistering(struct ctl_table_header *p)
 {
@@ -278,27 +302,17 @@ static void start_unregistering(struct ctl_table_header 
*p)
p->unregistering = ERR_PTR(-EINVAL);
}
/*
+* Prune dentries for unregistered sysctls: namespaced sysctls
+* can have duplicate names and contaminate dcache very badly.
+*/
+   proc_sys_prune_dcache(p);
+   /*
 * do not remove from the list until nobody holds it; walking the
 * list in do_sysctl() relies on that.
 */
erase_header(p);
 }
 
-static void sysctl_head_get(struct ctl_table_header *head)
-{
-   spin_lock(&sysctl_lock);
-   head->count++;
-   spin_unlock(&sysctl_lock);
-}
-
-void sysctl_head_put(struct ctl_table_header *head)
-{
-   spin_lock(&sysctl_lock);
-   if (!--head->count)
-   kfree_rcu(head, rcu);
-

[PATCH] Staging: iio: meter: meter.h - style fix

2017-02-09 Thread Derek Robson

Changed file permissions to octal.
Found with checkpatch.

Signed-off-by: Derek Robson 
---
 drivers/staging/iio/meter/meter.h | 60 +++
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/staging/iio/meter/meter.h 
b/drivers/staging/iio/meter/meter.h
index dfba510f29be..0e37f23853f1 100644
--- a/drivers/staging/iio/meter/meter.h
+++ b/drivers/staging/iio/meter/meter.h
@@ -81,94 +81,94 @@
IIO_DEVICE_ATTR(reactive_power_c_gain, _mode, _show, _store, _addr)
 
 #define IIO_DEV_ATTR_CURRENT_A(_show, _addr)   \
-   IIO_DEVICE_ATTR(current_a, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(current_a, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_CURRENT_B(_show, _addr)   \
-   IIO_DEVICE_ATTR(current_b, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(current_b, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_CURRENT_C(_show, _addr)   \
-   IIO_DEVICE_ATTR(current_c, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(current_c, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_VOLT_A(_show, _addr)  \
-   IIO_DEVICE_ATTR(volt_a, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(volt_a, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_VOLT_B(_show, _addr)  \
-   IIO_DEVICE_ATTR(volt_b, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(volt_b, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_VOLT_C(_show, _addr)  \
-   IIO_DEVICE_ATTR(volt_c, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(volt_c, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_AENERGY(_show, _addr) \
-   IIO_DEVICE_ATTR(aenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(aenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_LENERGY(_show, _addr) \
-   IIO_DEVICE_ATTR(lenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(lenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_RAENERGY(_show, _addr)\
-   IIO_DEVICE_ATTR(raenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(raenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_LAENERGY(_show, _addr)\
-   IIO_DEVICE_ATTR(laenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(laenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_VAENERGY(_show, _addr)\
-   IIO_DEVICE_ATTR(vaenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(vaenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_LVAENERGY(_show, _addr)   \
-   IIO_DEVICE_ATTR(lvaenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(lvaenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_RVAENERGY(_show, _addr)   \
-   IIO_DEVICE_ATTR(rvaenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(rvaenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_LVARENERGY(_show, _addr)  \
-   IIO_DEVICE_ATTR(lvarenergy, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(lvarenergy, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_CHKSUM(_show, _addr)   \
-   IIO_DEVICE_ATTR(chksum, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(chksum, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_ANGLE0(_show, _addr)   \
-   IIO_DEVICE_ATTR(angle0, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(angle0, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_ANGLE1(_show, _addr)   \
-   IIO_DEVICE_ATTR(angle1, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(angle1, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_ANGLE2(_show, _addr)   \
-   IIO_DEVICE_ATTR(angle2, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(angle2, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_AWATTHR(_show, _addr) \
-   IIO_DEVICE_ATTR(awatthr, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(awatthr, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_BWATTHR(_show, _addr) \
-   IIO_DEVICE_ATTR(bwatthr, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(bwatthr, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_CWATTHR(_show, _addr) \
-   IIO_DEVICE_ATTR(cwatthr, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(cwatthr, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_AFWATTHR(_show, _addr)\
-   IIO_DEVICE_ATTR(afwatthr, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(afwatthr, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_BFWATTHR(_show, _addr)\
-   IIO_DEVICE_ATTR(bfwatthr, S_IRUGO, _show, NULL, _addr)
+   IIO_DEVICE_ATTR(bfwatthr, 0444, _show, NULL, _addr)
 
 #define IIO_DEV_ATTR_CFWATTHR(_show, _addr)\
-   IIO_DEVICE_ATTR(cfwatthr,

linux-next: manual merge of the akpm tree with the net tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  drivers/net/usb/sierra_net.c

between commit:

  5a70348e1187 ("sierra_net: Add support for IPv6 and Dual-Stack Link Sense 
Indications")

from the net tree and patch:

  "lib/vsprintf.c: remove %Z support"

from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/usb/sierra_net.c
index d9440bc022f2,88ace5024306..
--- a/drivers/net/usb/sierra_net.c
+++ b/drivers/net/usb/sierra_net.c
@@@ -376,11 -349,11 +376,11 @@@ static inline int sierra_net_is_valid_a
  static int sierra_net_parse_lsi(struct usbnet *dev, char *data, int datalen)
  {
struct lsi_umts *lsi = (struct lsi_umts *)data;
 +  u32 expected_length;
  
 -  if (datalen < sizeof(struct lsi_umts)) {
 -  netdev_err(dev->net, "%s: Data length %d, exp %zu\n",
 -  __func__, datalen,
 -  sizeof(struct lsi_umts));
 +  if (datalen < sizeof(struct lsi_umts_single)) {
-   netdev_err(dev->net, "%s: Data length %d, exp >= %Zu\n",
++  netdev_err(dev->net, "%s: Data length %d, exp >= %zu\n",
 + __func__, datalen, sizeof(struct lsi_umts_single));
return -1;
}

Re: [PATCHSET 0/3] perf diff: Introduce delta-abs compute method

2017-02-09 Thread Namhyung Kim

Hi Arnaldo,

On Tue, Feb 07, 2017 at 01:02:14PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Feb 06, 2017 at 11:26:16PM +0900, Namhyung Kim escreveu:
> > Hi Arnaldo,
> > 
> > On Mon, Feb 06, 2017 at 09:51:49AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Feb 06, 2017 at 04:20:34PM +0900, Namhyung Kim escreveu:
> > > I agree on having the default changed to 'delta-abs', Ingo?
>  
> > Good.  Also, as I said in the changelog, it needs to change default
> > value of -o option to 1 in order to make the 'delta-abs' effective.
> 
> ok
>  
> > > Namhyung, and perhaps we should have a single letter option to do that
> > > '| grep -v ^#' bit :-) and perhaps we also should have, for all tools
> > > the equivalent of that "| head", that git log has:
> > > 
> > > [acme@jouet linux]$ git log --oneline -5
> > > d7cb3a507d23 Merge tag 'perf-core-for-mingo-4.11-20170201' of 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
> > > 5443624bedd0 perf/x86/intel/pt: Add format strings for PTWRITE and power 
> > > event tracing
> > > b05d1093987a perf ftrace: Add ftrace.tracer config option
> > > 43d41deb71fe perf tools: Create for_each_event macro for tracepoints 
> > > iteration
> > > a26305363d4b perf test: Add libbpf pinning test
> > > [acme@jouet linux]$
> > > 
> > > That '-5' to show just the first 5 lines worth of output.
> > > 
> > > With all that we would have:
> > > 
> > >   perf diff -o 1 -q10
> > > 
> > > As the equivalent to "perf diff -o 1 -c delta-abs | grep -v ^# | head".
> > 
> > The -q/--quiet looks ok since it corresponds to -v/--verbose option.
> 
> Ok, agreed on this one.
> 
> > But I'm not sure about the number option.
>  
> > In case of git, it'll stop processing commits after the given number
> > of them, so it will reduce significant processing time IMHO.  However,
> > in perf, we need to process whole data anyway and sort at the final
> > stage, and then stop displaying entries after the given number.
>  
> > Maybe it's just a shortcut of piping to the head command.  Then I
> 
> I wasn't thinking about the processing savings from stopping to process
> at that many lines, my suggestion was just about making the command line
> more compact, to type less.
> 
> If that can also map to processing savings, the better.

Ok, I'll take a look at it later.  I just want to finish this work first.

Thanks,
Namhyung

Re: [PATCH V2 2/7] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list

2017-02-09 Thread Minchan Kim

Hi Shaohua,

On Fri, Feb 03, 2017 at 03:33:18PM -0800, Shaohua Li wrote:
> Userspace indicates MADV_FREE pages could be freed without pageout, so
> it pretty much likes used once file pages. For such pages, we'd like to
> reclaim them once there is memory pressure. Also it might be unfair
> reclaiming MADV_FREE pages always before used once file pages and we
> definitively want to reclaim the pages before other anonymous and file
> pages.
> 
> To speed up MADV_FREE pages reclaim, we put the pages into
> LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny
> nowadays and should be full of used once file pages. Reclaiming
> MADV_FREE pages will not have much interfere of anonymous and active
> file pages. And the inactive file pages and MADV_FREE pages will be
> reclaimed according to their age, so we don't reclaim too many MADV_FREE
> pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also
> means we can reclaim the pages without swap support. This idea is
> suggested by Johannes.
> 
> We also clear the pages SwapBacked flag to indicate they are MADV_FREE
> pages.

I think this patch should be merged with 3/7. Otherwise, MADV_FREE will
be broken during the bisect.

> 
> Cc: Michal Hocko 
> Cc: Minchan Kim 
> Cc: Hugh Dickins 
> Cc: Johannes Weiner 
> Cc: Rik van Riel 
> Cc: Mel Gorman 
> Cc: Andrew Morton 
> Signed-off-by: Shaohua Li 
> ---
>  include/linux/mm_inline.h |  5 +
>  include/linux/swap.h  |  2 +-
>  include/linux/vm_event_item.h |  2 +-
>  mm/huge_memory.c  |  5 ++---
>  mm/madvise.c  |  3 +--
>  mm/swap.c | 50 
> ---
>  mm/vmstat.c   |  1 +
>  7 files changed, 39 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index e030a68..fdded06 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -22,6 +22,11 @@ static inline int page_is_file_cache(struct page *page)
>   return !PageSwapBacked(page);
>  }
>  
> +static inline bool page_is_lazyfree(struct page *page)
> +{
> + return PageAnon(page) && !PageSwapBacked(page);
> +}
> +

trivial:

How about using PageLazyFree for consistency with other PageXXX?
As well, use SetPageLazyFree/ClearPageLazyFree rather than using
raw {Set,Clear}PageSwapBacked.

>  static __always_inline void __update_lru_size(struct lruvec *lruvec,
>   enum lru_list lru, enum zone_type zid,
>   int nr_pages)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 45e91dd..486494e 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -279,7 +279,7 @@ extern void lru_add_drain_cpu(int cpu);
>  extern void lru_add_drain_all(void);
>  extern void rotate_reclaimable_page(struct page *page);
>  extern void deactivate_file_page(struct page *page);
> -extern void deactivate_page(struct page *page);
> +extern void mark_page_lazyfree(struct page *page);

trivial:

How about "deactivate_lazyfree_page"? IMO, it would show intention
clear that move the lazy free page to inactive list.

It's just matter of preference so I'm not strong against.

>  extern void swap_setup(void);
>  
>  extern void add_page_to_unevictable_list(struct page *page);
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 6aa1b6c..94e58da 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -25,7 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>   FOR_ALL_ZONES(PGALLOC),
>   FOR_ALL_ZONES(ALLOCSTALL),
>   FOR_ALL_ZONES(PGSCAN_SKIP),
> - PGFREE, PGACTIVATE, PGDEACTIVATE,
> + PGFREE, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE,
>   PGFAULT, PGMAJFAULT,
>   PGLAZYFREED,
>   PGREFILL,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ecf569d..ddb9a94 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1391,9 +1391,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, 
> struct vm_area_struct *vma,
>   ClearPageDirty(page);
>   unlock_page(page);
>  
> - if (PageActive(page))
> - deactivate_page(page);
> -
>   if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
>   orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
>   tlb->fullmm);
> @@ -1404,6 +1401,8 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, 
> struct vm_area_struct *vma,
>   set_pmd_at(mm, addr, pmd, orig_pmd);
>   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>   }
> +
> + mark_page_lazyfree(page);
>   ret = true;
>  out:
>   spin_unlock(ptl);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c867d88..c24549e 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -378,10 +378,9 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned 
> l

Re: [PATCH] media: fix s5p_mfc_set_dec_frame_buffer_v6() to print buf size in hex

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 23:10, Shuah Khan wrote:
> Fix s5p_mfc_set_dec_frame_buffer_v6() to print buffer size in hex to be
> consistent with the rest of the messages in the routine.
>
> Signed-off-by: Shuah Khan 

As Nicolas said please fix the subject.

After this you can add my:
Acked-by: Andrzej Hajda 

--
Regards
Andrzej

> ---
>  drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c 
> b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c
> index d6f207e..fc45980 100644
> --- a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c
> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c
> @@ -497,7 +497,7 @@ static int s5p_mfc_set_dec_frame_buffer_v6(struct 
> s5p_mfc_ctx *ctx)
>   }
>   }
>  
> - mfc_debug(2, "Buf1: %zu, buf_size1: %d (frames %d)\n",
> + mfc_debug(2, "Buf1: %zx, buf_size1: %d (frames %d)\n",
>   buf_addr1, buf_size1, ctx->total_dpb_count);
>   if (buf_size1 < 0) {
>   mfc_debug(2, "Not enough memory has been allocated.\n");

linux-next: manual merge of the akpm-current tree with the xfs tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/iomap.h

between commit:

  8ff6daa17b6a ("iomap: constify struct iomap_ops")

from the xfs tree and commit:

  d6fb008f420c ("mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take 
only vmf")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/iomap.h
index 891459caa278,857e440af9d5..
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@@ -72,17 -72,16 +72,16 @@@ struct iomap_ops 
  };
  
  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 -  struct iomap_ops *ops);
 +  const struct iomap_ops *ops);
  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 -  struct iomap_ops *ops);
 +  const struct iomap_ops *ops);
  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 -  bool *did_zero, struct iomap_ops *ops);
 +  bool *did_zero, const struct iomap_ops *ops);
  int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 -  struct iomap_ops *ops);
 -int iomap_page_mkwrite(struct vm_fault *vmf, struct iomap_ops *ops);
 +  const struct iomap_ops *ops);
- int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
-   const struct iomap_ops *ops);
++int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 -  loff_t start, loff_t len, struct iomap_ops *ops);
 +  loff_t start, loff_t len, const struct iomap_ops *ops);
  
  /*
   * Flags for direct I/O ->end_io:

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Adrian Chadd

On 9 February 2017 at 23:03, Valo, Kalle  wrote:
> Ben Greear  writes:
>
>> On 02/07/2017 01:14 AM, Valo, Kalle wrote:
>>> Adrian Chadd  writes:
>>>
 Removing this method makes the diff to FreeBSD larger, as "vif" in
 FreeBSD is a different pointer.

 (Yes, I have ath10k on freebsd working and I'd like to find a way to
 reduce the diff moving forward.)
>>>
>>> I don't like this "(void *) vif->drv_priv" style that much either but
>>> apparently it's commonly used in Linux wireless code and already parts
>>> of ath10k. So this patch just unifies the coding style.
>>
>> Surely the code compiles to the same thing, so why add a patch that
>> makes it more difficult for Adrian and makes the code no easier to read
>> for the rest of us?
>
> Because that's the coding style used already in Linux. It's great to see
> that parts of ath10k can be used also in other systems but in principle
> I'm not very fond of the idea starting to reject valid upstream patches
> because of driver forks.
>
> I think backports project is doing it right, it's not limiting upstream
> development in any way and handles all the API changes internally. Maybe
> FreeBSD could do something similar?

I tried, but ... well, imagine renaming vif->drv_priv to something
else. That's what you're suggesting. :-) You can do it with
coccinelle, but not via just backports API implementations. I'm a big
fan of light weight accessor APIs for the same reason.

(Since FreeBSD doesn't have that pointer in ieee80211vap, it's done a
different way.)

If you could convert other direct uses over to ath10k_vif_to_arvif()
then that'd make me happier. If not, it's fine, when I push this into
freebsd and fast-forward commits, I'll have to just maintain it.

For what it's worth - the linux skb accessors are the same. If there
were accessors for the skb data / len fields (like we do for mbufs)
then porting the code would've involved about 5,000 less changed
lines.

-adrian

Re: [PATCH 1/3 v2 staging-next] android: Collect statistics from lowmemorykiller

2017-02-09 Thread peter enderborg

On 02/09/2017 09:13 PM, Greg Kroah-Hartman wrote:
> On Thu, Feb 09, 2017 at 04:42:35PM +0100, peter enderborg wrote:
>> This collects stats for shrinker calls and how much
>> waste work we do within the lowmemorykiller.
>>
>> Signed-off-by: Peter Enderborg 
>> ---
>>  drivers/staging/android/Kconfig | 11 
>>  drivers/staging/android/Makefile|  1 +
>>  drivers/staging/android/lowmemorykiller.c   |  9 ++-
>>  drivers/staging/android/lowmemorykiller_stats.c | 85 
>> +
>>  drivers/staging/android/lowmemorykiller_stats.h | 29 +
>>  5 files changed, 134 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.c
>>  create mode 100644 drivers/staging/android/lowmemorykiller_stats.h
> What changed from v1?
Nothing. I thought I found the reason why my tabs are replaced by spaces in 
transport.

>> @@ -72,6 +73,7 @@ static unsigned long lowmem_deathpending_timeout;
>>  static unsigned long lowmem_count(struct shrinker *s,
>>struct shrink_control *sc)
>>  {
>> +lmk_inc_stats(LMK_COUNT);
>>  return global_node_page_state(NR_ACTIVE_ANON) +
>>  global_node_page_state(NR_ACTIVE_FILE) +
>>  global_node_page_state(NR_INACTIVE_ANON) +
> Your email client is eating tabs and spitting out spaces, making this
> impossible to even consider being merged :(
>
> Please fix your email client, documentation for how to do so is in the
> kernel Documentation directory.
>
> thanks,
>
> greg k-h

Re: [PATCH] media: s5p_mfc - remove unneeded io_modes initialzation in s5p_mfc_open()

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 23:11, Shuah Khan wrote:
> Remove unneeded io_modes initialzation in s5p_mfc_open(). It gets done
> right below in vdev == dev->vfd_dec conditional.
>
> Signed-off-by: Shuah Khan 
Acked-by: Andrzej Hajda 
--
Regards
Andrzej
> ---
>  drivers/media/platform/s5p-mfc/s5p_mfc.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
> b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> index bb0a588..20beaa2 100644
> --- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> @@ -866,7 +866,6 @@ static int s5p_mfc_open(struct file *file)
>   /* Init videobuf2 queue for OUTPUT */
>   q = &ctx->vq_src;
>   q->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
> - q->io_modes = VB2_MMAP;
>   q->drv_priv = &ctx->fh;
>   q->lock = &dev->mfc_mutex;
>   if (vdev == dev->vfd_dec) {

Re: [PATCH v2 4/4] [media] s5p-mfc: Always check and set 'v4l2_pix_format:field' field

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 21:04, Thibault Saunier wrote:
> It is required by the standard that the field order is set by the
> driver.
>
> Signed-off-by: Thibault Saunier 
> ---
>  drivers/media/platform/s5p-mfc/s5p_mfc_dec.c | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c 
> b/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> index 960d6c7052bd..309b0a1bbca5 100644
> --- a/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> @@ -345,7 +345,6 @@ static int vidioc_g_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>  rectangle. */
>   pix_mp->width = ctx->buf_width;
>   pix_mp->height = ctx->buf_height;
> - pix_mp->field = V4L2_FIELD_NONE;
>   pix_mp->num_planes = 2;
>   /* Set pixelformat to the format in which MFC
>  outputs the decoded frame */
> @@ -369,7 +368,6 @@ static int vidioc_g_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>  so width and height have no meaning */
>   pix_mp->width = 0;
>   pix_mp->height = 0;
> - pix_mp->field = V4L2_FIELD_NONE;
>   pix_mp->plane_fmt[0].bytesperline = ctx->dec_src_buf_size;
>   pix_mp->plane_fmt[0].sizeimage = ctx->dec_src_buf_size;
>   pix_mp->pixelformat = ctx->src_fmt->fourcc;
> @@ -379,6 +377,14 @@ static int vidioc_g_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>   mfc_debug(2, "%s-- with error\n", __func__);
>   return -EINVAL;
>   }
> +
> + if (pix_mp->field == V4L2_FIELD_ANY) {
> + pix_mp->field = V4L2_FIELD_NONE;
> + } else if (pix_mp->field != V4L2_FIELD_NONE) {
> + mfc_err("Not supported field order(%d)\n", pix_mp->field);
> + return -EINVAL;
> + }
> +

Again in g_fmt you should not check current value of the field.

>   mfc_debug_leave();
>   return 0;
>  }
> @@ -389,6 +395,19 @@ static int vidioc_try_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>   struct s5p_mfc_dev *dev = video_drvdata(file);
>   struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
>   struct s5p_mfc_fmt *fmt;
> + enum v4l2_field field;
> +
> + field = f->fmt.pix.field;
> + if (field == V4L2_FIELD_ANY) {
> + field = V4L2_FIELD_NONE;
> + } else if (V4L2_FIELD_NONE != field) {
> + mfc_debug(2, "Not supported field order(%d)\n", pix_mp->field);
> + return -EINVAL;
> + }
> +
> + /* V4L2 specification suggests the driver corrects the format struct
> +  * if any of the dimensions is unsupported */
> + f->fmt.pix.field = field;

According to docs [1], you should just adjust the field.

[1]:
http://hverkuil.home.xs4all.nl/spec/uapi/v4l/vidioc-g-fmt.html?highlight=g_fmt#c.VIDIOC_G_FMT

Regards
Andrzej

>  
>   mfc_debug(2, "Type is %d\n", f->type);
>   if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {

Re: [RFC] 3.10 kernel- oom with about 24G free memory

2017-02-09 Thread Michal Hocko

On Fri 10-02-17 09:13:58, Yisheng Xie wrote:
> hi Michal,
> Thanks for your comment.
> 
> On 2017/2/9 21:41, Michal Hocko wrote:
> > On Thu 09-02-17 14:26:28, Michal Hocko wrote:
> >> On Thu 09-02-17 20:54:49, Yisheng Xie wrote:
> >>> Hi all,
> >>> I get an oom on a linux 3.10 kvm guest OS. when it triggers the oom
> >>> it have about 24G free memory(and host OS have about 10G free memory)
> >>> and watermark is sure ok.
> >>>
> >>> I also check about about memcg limit value, also cannot find the
> >>> root cause.
> >>>
> >>> Is there anybody ever meet similar problem and have any idea about it?
> >>>
> >>> Any comment is more than welcome!
> >>>
> >>> Thanks
> >>> Yisheng Xie
> >>>
> >>> -
> >>> [   81.234289] DefSch0200 invoked oom-killer: gfp_mask=0xd0, order=0, 
> >>> oom_score_adj=0
> >>> [   81.234295] DefSch0200 cpuset=/ mems_allowed=0
> >>> [   81.234299] CPU: 3 PID: 8284 Comm: DefSch0200 Tainted: G   O E 
> >>> V---   3.10.0-229.42.1.105.x86_64 #1
> >>> [   81.234301] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 
> >>> rel-1.8.1-0-g4adadbd-2016_105425-HGH108200 04/01/2014
> >>> [   81.234303]  880ae290 2b3489d7 880b6cec7c58 
> >>> 81608d3d
> >>> [   81.234307]  880b6cec7ce8 81603d1c  
> >>> 880b6cd09000
> >>> [   81.234311]  880b6cec7cd8 2b3489d7 880b6cec7ce0 
> >>> 811bdd77
> >>> [   81.234314] Call Trace:
> >>> [   81.234323]  [] dump_stack+0x19/0x1b
> >>> [   81.234327]  [] dump_header+0x8e/0x214
> >>> [   81.234333]  [] ? mem_cgroup_iter+0x177/0x2b0
> >>> [   81.234339]  [] check_panic_on_oom+0x2e/0x60
> >>> [   81.234342]  [] 
> >>> mem_cgroup_oom_synchronize+0x34f/0x580
> >>
> >> OK, so this is a memcg OOM killer which panics because the configuration
> >> says so. The OOM report doesn't say so and that is the bug. dump_header
> >> is memcg aware and mem_cgroup_out_of_memory initializes oom_control
> >> properly. Is this Vanilla kernel?
>
> That means we should raise the limit of that memcg to avoid memcg OOM killer, 
> right?

Why do you configure the system to panic on memcg OOM in the first
place. This is a wrong thing to do in 99% of cases.

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-09 Thread Valo, Kalle

Ben Greear  writes:

> On 02/07/2017 01:14 AM, Valo, Kalle wrote:
>> Adrian Chadd  writes:
>>
>>> Removing this method makes the diff to FreeBSD larger, as "vif" in
>>> FreeBSD is a different pointer.
>>>
>>> (Yes, I have ath10k on freebsd working and I'd like to find a way to
>>> reduce the diff moving forward.)
>>
>> I don't like this "(void *) vif->drv_priv" style that much either but
>> apparently it's commonly used in Linux wireless code and already parts
>> of ath10k. So this patch just unifies the coding style.
>
> Surely the code compiles to the same thing, so why add a patch that
> makes it more difficult for Adrian and makes the code no easier to read
> for the rest of us?

Because that's the coding style used already in Linux. It's great to see
that parts of ath10k can be used also in other systems but in principle
I'm not very fond of the idea starting to reject valid upstream patches
because of driver forks.

I think backports project is doing it right, it's not limiting upstream
development in any way and handles all the API changes internally. Maybe
FreeBSD could do something similar?

-- 
Kalle Valo

Re: [PATCH v2 3/4] [media] s5p-mfc: Set colorspace in VIDIO_{G,TRY}_FMT

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 21:04, Thibault Saunier wrote:
> The media documentation says that the V4L2_COLORSPACE_SMPTE170M colorspace
> should be used for SDTV and V4L2_COLORSPACE_REC709 for HDTV but the driver
> didn't set the colorimetry, also respect usespace setting.
>
> Use 576p display resolution as a threshold to set this.
>
> Signed-off-by: Thibault Saunier 
> ---
>  drivers/media/platform/s5p-mfc/s5p_mfc_dec.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c 
> b/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> index 367ef8e8dbf0..960d6c7052bd 100644
> --- a/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c
> @@ -354,6 +354,15 @@ static int vidioc_g_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>   pix_mp->plane_fmt[0].sizeimage = ctx->luma_size;
>   pix_mp->plane_fmt[1].bytesperline = ctx->buf_width;
>   pix_mp->plane_fmt[1].sizeimage = ctx->chroma_size;
> +
> + if (pix_mp->colorspace != V4L2_COLORSPACE_REC709 &&
> + pix_mp->colorspace != V4L2_COLORSPACE_SMPTE170M &&
> + pix_mp->colorspace != V4L2_COLORSPACE_DEFAULT) {
> +   if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> + pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> +   else /* SD */
> + pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> + }

Again, in g_fmt you should not check values of fields you have to fill.


Regards
Andrzej

>   } else if (f->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
>   /* This is run on OUTPUT
>  The buffer contains compressed image
> @@ -378,6 +387,7 @@ static int vidioc_g_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>  static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format 
> *f)
>  {
>   struct s5p_mfc_dev *dev = video_drvdata(file);
> + struct v4l2_pix_format_mplane *pix_mp = &f->fmt.pix_mp;
>   struct s5p_mfc_fmt *fmt;
>  
>   mfc_debug(2, "Type is %d\n", f->type);
> @@ -405,6 +415,15 @@ static int vidioc_try_fmt(struct file *file, void *priv, 
> struct v4l2_format *f)
>   mfc_err("Unsupported format by this MFC version.\n");
>   return -EINVAL;
>   }
> +
> + if (pix_mp->colorspace != V4L2_COLORSPACE_REC709 &&
> + pix_mp->colorspace != V4L2_COLORSPACE_SMPTE170M &&
> + pix_mp->colorspace != V4L2_COLORSPACE_DEFAULT) {
> +   if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> + pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> +   else /* SD */
> + pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> + }
>   }
>  
>   return 0;

Re: [PATCH v2 2/4] [media] exynos-gsc: Respect userspace colorspace setting

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 21:04, Thibault Saunier wrote:
> If the colorspace is specified by userspace we should respect
> it and not reset it ourself if we can support it.
>
> Signed-off-by: Thibault Saunier 
> ---
>  drivers/media/platform/exynos-gsc/gsc-core.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/media/platform/exynos-gsc/gsc-core.c 
> b/drivers/media/platform/exynos-gsc/gsc-core.c
> index 2beb43401987..63bb4577827d 100644
> --- a/drivers/media/platform/exynos-gsc/gsc-core.c
> +++ b/drivers/media/platform/exynos-gsc/gsc-core.c
> @@ -445,10 +445,14 @@ int gsc_try_fmt_mplane(struct gsc_ctx *ctx, struct 
> v4l2_format *f)
>  
>   pix_mp->num_planes = fmt->num_planes;
>  
> - if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> - pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> - else /* SD */
> - pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> + if (pix_mp->colorspace != V4L2_COLORSPACE_REC709 &&
> + pix_mp->colorspace != V4L2_COLORSPACE_SMPTE170M &&
> + pix_mp->colorspace != V4L2_COLORSPACE_DEFAULT) {
> + if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> +   pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> + else /* SD */
> +   pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> +   }
>  
>   for (i = 0; i < pix_mp->num_planes; ++i) {
>   struct v4l2_plane_pix_format *plane_fmt = &pix_mp->plane_fmt[i];
> @@ -492,12 +496,17 @@ int gsc_g_fmt_mplane(struct gsc_ctx *ctx, struct 
> v4l2_format *f)
>   pix_mp->height  = frame->f_height;
>   pix_mp->field   = V4L2_FIELD_NONE;
>   pix_mp->pixelformat = frame->fmt->pixelformat;
> - if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> - pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> - else /* SD */
> - pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
>   pix_mp->num_planes  = frame->fmt->num_planes;
>  
> + if (pix_mp->colorspace != V4L2_COLORSPACE_REC709 &&
> + pix_mp->colorspace != V4L2_COLORSPACE_SMPTE170M &&
> + pix_mp->colorspace != V4L2_COLORSPACE_DEFAULT) {
> + if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> +   pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> + else /* SD */
> +   pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> +   }
> +

This is g_fmt, so you set colorspace regardless of its current value, am
I right?

Regards
Andrzej

>   for (i = 0; i < pix_mp->num_planes; ++i) {
>   pix_mp->plane_fmt[i].bytesperline = (frame->f_width *
>   frame->fmt->depth[i]) / 8;

Re: [PATCH v2 1/4] [media] exynos-gsc: Use 576p instead 720p as a threshold for colorspaces

2017-02-09 Thread Andrzej Hajda

On 09.02.2017 21:04, Thibault Saunier wrote:
> From: Javier Martinez Canillas 
>
> The media documentation says that the V4L2_COLORSPACE_SMPTE170M colorspace
> should be used for SDTV and V4L2_COLORSPACE_REC709 for HDTV. But drivers
> don't agree on the display resolution that should be used as a threshold.
>
> Some drivers set V4L2_COLORSPACE_REC709 for 720p and higher while others
> set V4L2_COLORSPACE_REC709 for anything higher than 576p. Newers drivers
> use the latter and that also matches what user-space multimedia programs
> do (i.e: GStreamer), so change the driver logic to be aligned with this.
>
> Also, check for the resolution in G_FMT instead unconditionally setting
> the V4L2_COLORSPACE_REC709 colorspace.
>
> Signed-off-by: Javier Martinez Canillas 

Reviewed-by: Andrzej Hajda 

--
Regards
Andrzej
> ---
>  drivers/media/platform/exynos-gsc/gsc-core.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/platform/exynos-gsc/gsc-core.c 
> b/drivers/media/platform/exynos-gsc/gsc-core.c
> index cbb03768f5d7..2beb43401987 100644
> --- a/drivers/media/platform/exynos-gsc/gsc-core.c
> +++ b/drivers/media/platform/exynos-gsc/gsc-core.c
> @@ -445,7 +445,7 @@ int gsc_try_fmt_mplane(struct gsc_ctx *ctx, struct 
> v4l2_format *f)
>  
>   pix_mp->num_planes = fmt->num_planes;
>  
> - if (pix_mp->width >= 1280) /* HD */
> + if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
>   pix_mp->colorspace = V4L2_COLORSPACE_REC709;
>   else /* SD */
>   pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
> @@ -492,7 +492,10 @@ int gsc_g_fmt_mplane(struct gsc_ctx *ctx, struct 
> v4l2_format *f)
>   pix_mp->height  = frame->f_height;
>   pix_mp->field   = V4L2_FIELD_NONE;
>   pix_mp->pixelformat = frame->fmt->pixelformat;
> - pix_mp->colorspace  = V4L2_COLORSPACE_REC709;
> + if (pix_mp->width > 720 && pix_mp->height > 576) /* HD */
> + pix_mp->colorspace = V4L2_COLORSPACE_REC709;
> + else /* SD */
> + pix_mp->colorspace = V4L2_COLORSPACE_SMPTE170M;
>   pix_mp->num_planes  = frame->fmt->num_planes;
>  
>   for (i = 0; i < pix_mp->num_planes; ++i) {

linux-next: manual merge of the akpm-current tree with the xfs tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/dax.h

between commit:

  8ff6daa17b6a ("omap: constify struct iomap_ops")

from the xfs tree and commits:

  d6fb008f420c ("mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take 
only vmf")
  db09ccff3e4e ("mm: replace FAULT_FLAG_SIZE with parameter to huge_fault")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/dax.h
index 2983e52efd07,df63730e86d2..
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@@ -37,9 -37,9 +37,9 @@@ static inline void *dax_radix_locked_en
  }
  
  ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 -  struct iomap_ops *ops);
 +  const struct iomap_ops *ops);
- int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
-   const struct iomap_ops *ops);
+ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 -  struct iomap_ops *ops);
++  const struct iomap_ops *ops);
  int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
  int dax_invalidate_mapping_entry(struct address_space *mapping, pgoff_t 
index);
  int dax_invalidate_mapping_entry_sync(struct address_space *mapping,

Re: [PATCH V2 3/7] mm: reclaim MADV_FREE pages

2017-02-09 Thread Minchan Kim

On Fri, Feb 03, 2017 at 03:33:19PM -0800, Shaohua Li wrote:
> When memory pressure is high, we free MADV_FREE pages. If the pages are
> not dirty in pte, the pages could be freed immediately. Otherwise we
> can't reclaim them. We put the pages back to anonumous LRU list (by
> setting SwapBacked flag) and the pages will be reclaimed in normal
> swapout way.
> 
> We use normal page reclaim policy. Since MADV_FREE pages are put into
> inactive file list, such pages and inactive file pages are reclaimed
> according to their age. This is expected, because we don't want to
> reclaim too many MADV_FREE pages before used once pages.
> 
> Cc: Michal Hocko 
> Cc: Minchan Kim 
> Cc: Hugh Dickins 
> Cc: Johannes Weiner 
> Cc: Rik van Riel 
> Cc: Mel Gorman 
> Cc: Andrew Morton 
> Signed-off-by: Shaohua Li 
> ---
>  mm/rmap.c   |  4 
>  mm/vmscan.c | 43 +++
>  2 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c8d6204..5f05926 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1554,6 +1554,10 @@ static int try_to_unmap_one(struct page *page, struct 
> vm_area_struct *vma,
>   dec_mm_counter(mm, MM_ANONPAGES);
>   rp->lazyfreed++;
>   goto discard;
> + } else if (flags & TTU_LZFREE) {
> + set_pte_at(mm, address, pte, pteval);
> + ret = SWAP_FAIL;
> + goto out_unmap;

trivial:

How about this?

if (flags && TTU_LZFREE) {
if (PageDirty(page)) {
set_pte_at(XXX);
ret = SWAP_FAIL;
goto out_unmap;
} else {
dec_mm_counter(mm, MM_ANONPAGES);
rp->lazyfreed++;
goto discard;
}
}

>   }
>  
>   if (swap_duplicate(entry) < 0) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 947ab6f..b304a84 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -864,7 +864,7 @@ static enum page_references page_check_references(struct 
> page *page,
>   return PAGEREF_RECLAIM;
>  
>   if (referenced_ptes) {
> - if (PageSwapBacked(page))
> + if (PageSwapBacked(page) || PageAnon(page))

If anyone accesses MADV_FREEed range with load op, not store,
why shouldn't we discard that pages?

>   return PAGEREF_ACTIVATE;
>   /*
>* All mapped pages start out with page table
> @@ -903,7 +903,7 @@ static enum page_references page_check_references(struct 
> page *page,
>  
>  /* Check if a page is dirty or under writeback */
>  static void page_check_dirty_writeback(struct page *page,
> -bool *dirty, bool *writeback)
> + bool *dirty, bool *writeback, bool lazyfree)
>  {
>   struct address_space *mapping;
>  
> @@ -911,7 +911,7 @@ static void page_check_dirty_writeback(struct page *page,
>* Anonymous pages are not handled by flushers and must be written
>* from reclaim context. Do not stall reclaim based on them
>*/
> - if (!page_is_file_cache(page)) {
> + if (!page_is_file_cache(page) || lazyfree) {

tivial:

We can check it with PageLazyFree in here rather than passing lazyfree
argument. It's consistent like page_is_file_cache in here.

>   *dirty = false;
>   *writeback = false;
>   return;
> @@ -971,7 +971,7 @@ static unsigned long shrink_page_list(struct list_head 
> *page_list,
>   int may_enter_fs;
>   enum page_references references = PAGEREF_RECLAIM_CLEAN;
>   bool dirty, writeback;
> - bool lazyfree = false;
> + bool lazyfree;
>   int ret = SWAP_SUCCESS;
>  
>   cond_resched();
> @@ -986,6 +986,8 @@ static unsigned long shrink_page_list(struct list_head 
> *page_list,
>  
>   sc->nr_scanned++;
>  
> + lazyfree = page_is_lazyfree(page);
> +
>   if (unlikely(!page_evictable(page)))
>   goto cull_mlocked;
>  
> @@ -993,7 +995,7 @@ static unsigned long shrink_page_list(struct list_head 
> *page_list,
>   goto keep_locked;
>  
>   /* Double the slab pressure for mapped and swapcache pages */
> - if (page_mapped(page) || PageSwapCache(page))
> + if ((page_mapped(page) || PageSwapCache(page)) && !lazyfree)
>   sc->nr_scanned++;

In this phase, we cannot know whether lazyfree marked page is discarable
or not. If it is freeable and mapped, this logic makes sense. However,
if the page is dirty?

>  
>   may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
> @@ -1005,7 +1007,7 @@ static unsigned long shrink_page_list(struct list_head 
> *page_list,
>* will stall and start writing pages if the tail of the LRU
>* is all dirty unqueued pages.
>

Re: [BUG] port（belong to a lacp aggregate group）cannot be unselected while the connected port failed

2017-02-09 Thread zhou zhengwu

ad_rx_machine() -> __update_selected(), this funcition will check if any 
lacp parameter is different. If any change happened, related port will 
be unselected.


However, this function will not check the changing of port state, for 
example, whether partner port's port state is synchronization, which 
will make lacp state machine fail.



On 2017/2/9 17:19, zhou zhengwu wrote:

Hi

[1] Testing OS: SUSE 11SP3( kernel version 3.0.93)

[2]Problem description:
   ServerA [bond] --- [bond] ServerB
   Two servers are connected with bonding interfaces LACP is enabled.
   Firstly, it works well.
   Then, one port (for example, port B) in serverB failed, it can send
but not receive packets including the LACPDU however the port status is
still UP.
   Result:
   In serverB, port B is unselected and traffic is sent through other
aggregate port.
   While In serverA, port A connecting with port B of serverB is still
selected by LACP and traffic is still sent through it

[3]Two question about the implementation of LACP RX machine:
  1. Following judging condition marked with “*” is reasonable?

static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
{
 .
   case AD_RX_CURRENT:
// detect loopback situation
if
(!MAC_ADDRESS_COMPARE(&(lacpdu->actor_system), &(port->actor_system))) {
 // INFO_RECEIVED_LOOPBACK_FRAMES
 pr_err("%s: An illegal loopback
occurred on adapter (%s).\n"
"Check the configuration to
verify that all adapters are connected to 802.3ad compliant switch
ports\n",

port->slave->dev->master->name, port->slave->dev->name);
 return;
}
__update_selected(lacpdu, port);
__update_ntt(lacpdu, port);
__record_pdu(lacpdu, port);
port->sm_rx_timer_counter =
__ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER,
(u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT));
port->actor_oper_port_state &=
~AD_STATE_EXPIRED;
// verify that if the aggregator is enabled,
the port is enabled too.
//(because if the link goes down for a short
time, the 802.3ad will not
// catch it, and the port will continue to
be disabled)
 *if (port->aggregator
 *   && port->aggregator->is_active
 *   && !__port_is_enabled(port))
 *__enable_port(port); 
is the judging condition reasonable?
break;
   .
}

   2. While lacp rx machine receive the lacpdu packets, whether the port
should be selected or not should be done. Currently,if lacp rx machine
is in AD_RX_CURRENT and if the partner's configuration is unchanged,
port will still be selected. However, it is not reasonable in some
conditions. For example, partner port failed which can send but not
receive pkts. At the same time, the port still be UP.

static void __update_selected(struct lacpdu *lacpdu, struct port *port)
{
if (lacpdu && port) {
const struct port_params *partner = &port->partner_oper;

// check if any parameter is different
if (ntohs(lacpdu->actor_port) != partner->port_number ||
ntohs(lacpdu->actor_port_priority) != partner->port_priority ||
MAC_ADDRESS_COMPARE(&lacpdu->actor_system, &partner->system) ||
ntohs(lacpdu->actor_system_priority) !=
partner->system_priority ||
ntohs(lacpdu->actor_key) != partner->key ||
(lacpdu->actor_state & AD_STATE_AGGREGATION) !=
(partner->port_state & AD_STATE_AGGREGATION)) {
// update the state machine Selected variable
port->sm_vars &= ~AD_PORT_SELECTED;
}
}
}

linux-next: manual merge of the akpm-current tree with the xfs tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  fs/dax.c

between commit:

  8ff6daa17b6a ("iomap: constify struct iomap_ops")

from the xfs tree and commit:

  5dce5c335124 ("mm,fs,dax: change ->pmd_fault to ->huge_fault")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/dax.c
index 99b5b4458a78,97b8ecbed1c4..
--- a/fs/dax.c
+++ b/fs/dax.c
@@@ -1116,20 -1114,9 +1118,9 @@@ static int dax_fault_return(int error
return VM_FAULT_SIGBUS;
  }
  
- /**
-  * dax_iomap_fault - handle a page fault on a DAX file
-  * @vma: The virtual memory area where the fault occurred
-  * @vmf: The description of the fault
-  * @ops: iomap ops passed from the file system
-  *
-  * When a page fault occurs, filesystems may call this helper in their fault
-  * or mkwrite handler for DAX files. Assumes the caller has done all the
-  * necessary locking for the page fault to proceed successfully.
-  */
- int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
-   const struct iomap_ops *ops)
 -static int dax_iomap_pte_fault(struct vm_fault *vmf, struct iomap_ops *ops)
++static int dax_iomap_pte_fault(struct vm_fault *vmf, const struct iomap_ops 
*ops)
  {
-   struct address_space *mapping = vma->vm_file->f_mapping;
+   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
struct inode *inode = mapping->host;
unsigned long vaddr = vmf->address;
loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
@@@ -1309,28 -1300,33 +1304,33 @@@ static int dax_pmd_load_hole(struct vm_
ret = dax_insert_mapping_entry(mapping, vmf, *entryp, 0,
RADIX_DAX_PMD | RADIX_DAX_HZP);
if (IS_ERR(ret))
-   return VM_FAULT_FALLBACK;
+   goto fallback;
*entryp = ret;
  
-   ptl = pmd_lock(vma->vm_mm, pmd);
-   if (!pmd_none(*pmd)) {
+   ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
+   if (!pmd_none(*(vmf->pmd))) {
spin_unlock(ptl);
-   return VM_FAULT_FALLBACK;
+   goto fallback;
}
  
-   pmd_entry = mk_pmd(zero_page, vma->vm_page_prot);
+   pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
pmd_entry = pmd_mkhuge(pmd_entry);
-   set_pmd_at(vma->vm_mm, pmd_addr, pmd, pmd_entry);
+   set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
spin_unlock(ptl);
+   trace_dax_pmd_load_hole(inode, vmf, zero_page, ret);
return VM_FAULT_NOPAGE;
+ 
+ fallback:
+   trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, ret);
+   return VM_FAULT_FALLBACK;
  }
  
- int dax_iomap_pmd_fault(struct vm_area_struct *vma, unsigned long address,
-   pmd_t *pmd, unsigned int flags, const struct iomap_ops *ops)
 -static int dax_iomap_pmd_fault(struct vm_fault *vmf, struct iomap_ops *ops)
++static int dax_iomap_pmd_fault(struct vm_fault *vmf, const struct iomap_ops 
*ops)
  {
+   struct vm_area_struct *vma = vmf->vma;
struct address_space *mapping = vma->vm_file->f_mapping;
-   unsigned long pmd_addr = address & PMD_MASK;
-   bool write = flags & FAULT_FLAG_WRITE;
+   unsigned long pmd_addr = vmf->address & PMD_MASK;
+   bool write = vmf->flags & FAULT_FLAG_WRITE;
unsigned int iomap_flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
struct inode *inode = mapping->host;
int result = VM_FAULT_FALLBACK;
@@@ -1429,10 -1422,40 +1426,40 @@@
}
   fallback:
if (result == VM_FAULT_FALLBACK) {
-   split_huge_pmd(vma, pmd, address);
+   split_huge_pmd(vma, vmf->pmd, vmf->address);
count_vm_event(THP_FAULT_FALLBACK);
}
+ out:
+   trace_dax_pmd_fault_done(inode, vmf, max_pgoff, result);
return result;
  }
- EXPORT_SYMBOL_GPL(dax_iomap_pmd_fault);
+ #else
 -static int dax_iomap_pmd_fault(struct vm_fault *vmf, struct iomap_ops *ops)
++static int dax_iomap_pmd_fault(struct vm_fault *vmf, const struct iomap_ops 
*ops)
+ {
+   return VM_FAULT_FALLBACK;
+ }
  #endif /* CONFIG_FS_DAX_PMD */
+ 
+ /**
+  * dax_iomap_fault - handle a page fault on a DAX file
+  * @vmf: The description of the fault
+  * @ops: iomap ops passed from the file system
+  *
+  * When a page fault occurs, filesystems may call this helper in
+  * their fault handler for DAX files. dax_iomap_fault() assumes the caller
+  * has done all the necessary locking for page fault to proceed
+  * successfully.
+  */
+ int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 -  st

Re: [PATCH v3] usb: gadget: configfs: Fix KASAN use-after-free

2017-02-09 Thread Macpaul Lin

Hi Jim,

> Jim Lin  writes:
> > When gadget is disconnected, running sequence is like this.
> > . composite_disconnect
> > . Call trace:
> >   usb_string_copy+0xd0/0x128
> >   gadget_config_name_configuration_store+0x4
> >   gadget_config_name_attr_store+0x40/0x50
> >   configfs_write_file+0x198/0x1f4
> >   vfs_write+0x100/0x220
> >   SyS_write+0x58/0xa8
> > . configfs_composite_unbind
> > . configfs_composite_bind
> >

[deleted]

> > When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling
> > memory is accessed, "BUG: KASAN: use-after-free" error occurs.
> >
> > Signed-off-by: Jim Lin 
> > ---
> > Changes in v2:
> > Changes in v3:
> >  Change commit description
>
> well, I need to be sure you tested this with Linus' tree. The reason I'm
> asking is because this could be a bug caused by Android changes. From
> your previous patch, the problem started with android_setup().
>
> Please test with v4.10-rc4 and any configfs-based gadget.
>
> --
> balbi

I've got the similar problem on Android, however,
Linux guys require you and other people to test your patch on pure Linux.
Since Linux is exactly a "PC" based OS, only common patches should be
commit to Linux code base.
Except the bug is quite common in 3 OS, in "Linux PC" and in "Android
Linux" or "Chromium OS".

I'm not sure about the difference between Chromium OS and Linux PC.
According to CVE report, it looks like the change is from  Chromium OS?
Dose Nvidia has a pure Linux software team can verify your patch on
your platform?
I think if you can prove the result is okay on Linux PC or on Chromium
OS will help.

-- 
Best regards,
Macpaul Lin

linux-next: manual merge of the akpm-current tree with the xfs tree

2017-02-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  fs/iomap.c

between commit:

  8ff6daa17b6a ("iomap: constify struct iomap_ops")

from the xfs tree and commit:

  d6fb008f420c ("mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take 
only vmf")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/iomap.c
index d89f70bbb952,ccba165bed49..
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@@ -445,11 -445,10 +445,10 @@@ iomap_page_mkwrite_actor(struct inode *
return length;
  }
  
- int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
-   const struct iomap_ops *ops)
 -int iomap_page_mkwrite(struct vm_fault *vmf, struct iomap_ops *ops)
++int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
  {
struct page *page = vmf->page;
-   struct inode *inode = file_inode(vma->vm_file);
+   struct inode *inode = file_inode(vmf->vma->vm_file);
unsigned long length;
loff_t offset, size;
ssize_t ret;

Re: [PATCH] i2c: designware: Fix regression when dynamic TAR update is disabled

2017-02-09 Thread Suravee Suthikulpanit




On 2/10/17 04:34, De Marchi, Lucas wrote:

On Thu, 2017-02-09 at 22:07 +0200, Andy Shevchenko wrote:

On Fri, 2017-02-10 at 01:20 +0530, Shah Nehal-Bakulchandra wrote:

The following commit causes a regression when dynamic TAR update is
disabled:

 commit 63d0f0a6952a1a02bc4f116b7da7c7887e46efa3 ("i2c:
designware:
 detect when dynamic tar update is possible")


Please, leave just 12 characters, it still enough.


In such case, the DW_IC_CON_10BITADDR_MASTER is R/W, and is changed
by the logic that's trying to detect  dynamic TAR update.The original
value of DW_IC_CON_10BITADDR_MASTER bit should be restored.


You are right, thanks for the fix.  This may also explains why
0317e6c (i2c: designware: do not disable adapter after transfer) caused problems
and ended up being reverted.  Could you try that on your hardware?


After looking at the patch (i2c: designware: do not disable adapter after 
transfer),
we see that it modifies the i2c_dw_xfer_init(). However, this function is never
called on our platform. So, this patch would not have any effects.

At this points, my understanding is there are probably two options here:

1) Keep the commit 63d0f0a6952a (i2c: designware: detect when dynamic tar update
   is possible) and apply V2 of this patch in 4.10. We might need to back-port 
the change
   to v4.9 stable as well.

2) Revert the 63d0f0a6952a (i2c: designware: detect when dynamic tar update is 
possible) in 4.10,
   and also in v4.9 stable as well.

Thanks,
Suravee


The dynamic tar update detection was only done as preparation work to allow not
disabling the adapter, which is reverted.  We may also just revert this commit
instead of fixing the logic.


thanks
Lucas De Marchi

[PATCH v3 0/2] drm/rockchip: switch to drm_mm for support arm64 iommu

2017-02-09 Thread Mark Yao

Some iommu patches on the series[0] "iommu/rockchip: Fix bugs and
enable on ARM64" already landed, So drm/rockchip related patches [1] and [2]
ready to landed, this series just rebase them to lastest drm-next.

Thanks Heiko's Tested-by on rk3288 and rk3399 platform.

Changes in v3:
Advices by Tomasz and Thierry:
  merge fixes into origin patch.

Changes in v2:
Advices by Tomasz:
  add some fixes patches from chromeos project.

Shunqian Zheng (1):
  drm/rockchip: Use common IOMMU API to attach devices

Tomasz Figa (1):
  drm/rockchip: Do not use DMA mapping API if attached to IOMMU domain

 drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 101 ++--
 drivers/gpu/drm/rockchip/rockchip_drm_drv.h |   6 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 244 ++--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.h |   8 +
 4 files changed, 298 insertions(+), 61 deletions(-)

-- 
1.9.1

[PATCH v3 1/2] drm/rockchip: Do not use DMA mapping API if attached to IOMMU domain

2017-02-09 Thread Mark Yao

From: Tomasz Figa 

The API is not suitable for subsystems consisting of multiple devices
and requires severe hacks to use it. To mitigate this, this patch
implements allocation and address space management locally by using
helpers provided by DRM framework, like other DRM drivers do, e.g.
Tegra.

This patch should not introduce any functional changes until the driver
is made to attach subdevices into an IOMMU domain with the generic IOMMU
API, which will happen in following patch. Based heavily on GEM
implementation of Tegra DRM driver.

Signed-off-by: Tomasz Figa 
Signed-off-by: Shunqian Zheng 
Signed-off-by: Mark Yao 
Signed-off-by: rjan Eide 
Tested-by: Heiko Stuebner 
---
 drivers/gpu/drm/rockchip/rockchip_drm_drv.h |   6 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 244 ++--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.h |   8 +
 3 files changed, 244 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h 
b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
index fb6226c..adc3930 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
@@ -30,6 +30,7 @@
 
 struct drm_device;
 struct drm_connector;
+struct iommu_domain;
 
 /*
  * Rockchip drm private crtc funcs.
@@ -60,7 +61,10 @@ struct rockchip_drm_private {
struct drm_gem_object *fbdev_bo;
const struct rockchip_crtc_funcs *crtc_funcs[ROCKCHIP_MAX_CRTC];
struct drm_atomic_state *state;
-
+   struct iommu_domain *domain;
+   /* protect drm_mm on multi-threads */
+   struct mutex mm_lock;
+   struct drm_mm mm;
struct list_head psr_list;
spinlock_t psr_list_lock;
 };
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b70f942..df9e570 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -16,11 +16,146 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rockchip_drm_drv.h"
 #include "rockchip_drm_gem.h"
 
-static int rockchip_gem_alloc_buf(struct rockchip_gem_object *rk_obj,
+static int rockchip_gem_iommu_map(struct rockchip_gem_object *rk_obj)
+{
+   struct drm_device *drm = rk_obj->base.dev;
+   struct rockchip_drm_private *private = drm->dev_private;
+   int prot = IOMMU_READ | IOMMU_WRITE;
+   ssize_t ret;
+
+   mutex_lock(&private->mm_lock);
+
+   ret = drm_mm_insert_node_generic(&private->mm, &rk_obj->mm,
+rk_obj->base.size, PAGE_SIZE,
+0, 0);
+
+   mutex_unlock(&private->mm_lock);
+   if (ret < 0) {
+   DRM_ERROR("out of I/O virtual memory: %zd\n", ret);
+   return ret;
+   }
+
+   rk_obj->dma_addr = rk_obj->mm.start;
+
+   ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
+  rk_obj->sgt->nents, prot);
+   if (ret < rk_obj->base.size) {
+   DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
+ ret, rk_obj->base.size);
+   ret = -ENOMEM;
+   goto err_remove_node;
+   }
+
+   rk_obj->size = ret;
+
+   return 0;
+
+err_remove_node:
+   drm_mm_remove_node(&rk_obj->mm);
+
+   return ret;
+}
+
+static int rockchip_gem_iommu_unmap(struct rockchip_gem_object *rk_obj)
+{
+   struct drm_device *drm = rk_obj->base.dev;
+   struct rockchip_drm_private *private = drm->dev_private;
+
+   iommu_unmap(private->domain, rk_obj->dma_addr, rk_obj->size);
+
+   mutex_lock(&private->mm_lock);
+
+   drm_mm_remove_node(&rk_obj->mm);
+
+   mutex_unlock(&private->mm_lock);
+
+   return 0;
+}
+
+static int rockchip_gem_get_pages(struct rockchip_gem_object *rk_obj)
+{
+   struct drm_device *drm = rk_obj->base.dev;
+   int ret, i;
+   struct scatterlist *s;
+
+   rk_obj->pages = drm_gem_get_pages(&rk_obj->base);
+   if (IS_ERR(rk_obj->pages))
+   return PTR_ERR(rk_obj->pages);
+
+   rk_obj->num_pages = rk_obj->base.size >> PAGE_SHIFT;
+
+   rk_obj->sgt = drm_prime_pages_to_sg(rk_obj->pages, rk_obj->num_pages);
+   if (IS_ERR(rk_obj->sgt)) {
+   ret = PTR_ERR(rk_obj->sgt);
+   goto err_put_pages;
+   }
+
+   /*
+* Fake up the SG table so that dma_sync_sg_for_device() can be used
+* to flush the pages associated with it.
+*
+* TODO: Replace this by drm_clflush_sg() once it can be implemented
+* without relying on symbols that are not exported.
+*/
+   for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i)
+   sg_dma_address(s) = sg_phys(s);
+
+   dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, rk_obj->sgt->nents,
+  DMA_TO_DEVICE);
+
+   return 0;
+
+err_put_pages:
+   drm_gem_put_pages(&rk_obj->base, rk_obj-

[PATCH v3 2/2] drm/rockchip: Use common IOMMU API to attach devices

2017-02-09 Thread Mark Yao

From: Shunqian Zheng 

Rockchip DRM used the arm special API, arm_iommu_*(), to attach
iommu for ARM32 SoCs. This patch convert to common iommu API
so it would support ARM64 like RK3399.

Since previous patch added support for direct IOMMU address space
management, there is no need to use DMA API anymore and this patch wires
things to use the new method.

Signed-off-by: Shunqian Zheng 
Signed-off-by: Tomasz Figa 
Signed-off-by: Mark Yao 
Tested-by: Heiko Stuebner 
---
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 101 +++-
 1 file changed, 54 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index c30d649..b360e62 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -14,19 +14,19 @@
  * GNU General Public License for more details.
  */
 
-#include 
-
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "rockchip_drm_drv.h"
 #include "rockchip_drm_fb.h"
@@ -50,28 +50,31 @@
 int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
   struct device *dev)
 {
-   struct dma_iommu_mapping *mapping = drm_dev->dev->archdata.mapping;
+   struct rockchip_drm_private *private = drm_dev->dev_private;
int ret;
 
if (!is_support_iommu)
return 0;
 
-   ret = dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
-   if (ret)
+   ret = iommu_attach_device(private->domain, dev);
+   if (ret) {
+   dev_err(dev, "Failed to attach iommu device\n");
return ret;
+   }
 
-   dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
-
-   return arm_iommu_attach_device(dev, mapping);
+   return 0;
 }
 
 void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
struct device *dev)
 {
+   struct rockchip_drm_private *private = drm_dev->dev_private;
+   struct iommu_domain *domain = private->domain;
+
if (!is_support_iommu)
return;
 
-   arm_iommu_detach_device(dev);
+   iommu_detach_device(domain, dev);
 }
 
 int rockchip_register_crtc_funcs(struct drm_crtc *crtc,
@@ -123,11 +126,46 @@ static void rockchip_drm_crtc_disable_vblank(struct 
drm_device *dev,
priv->crtc_funcs[pipe]->disable_vblank(crtc);
 }
 
+static int rockchip_drm_init_iommu(struct drm_device *drm_dev)
+{
+   struct rockchip_drm_private *private = drm_dev->dev_private;
+   struct iommu_domain_geometry *geometry;
+   u64 start, end;
+
+   if (!is_support_iommu)
+   return 0;
+
+   private->domain = iommu_domain_alloc(&platform_bus_type);
+   if (!private->domain)
+   return -ENOMEM;
+
+   geometry = &private->domain->geometry;
+   start = geometry->aperture_start;
+   end = geometry->aperture_end;
+
+   DRM_DEBUG("IOMMU context initialized (aperture: %#llx-%#llx)\n",
+ start, end);
+   drm_mm_init(&private->mm, start, end - start + 1);
+   mutex_init(&private->mm_lock);
+
+   return 0;
+}
+
+static void rockchip_iommu_cleanup(struct drm_device *drm_dev)
+{
+   struct rockchip_drm_private *private = drm_dev->dev_private;
+
+   if (!is_support_iommu)
+   return;
+
+   drm_mm_takedown(&private->mm);
+   iommu_domain_free(private->domain);
+}
+
 static int rockchip_drm_bind(struct device *dev)
 {
struct drm_device *drm_dev;
struct rockchip_drm_private *private;
-   struct dma_iommu_mapping *mapping = NULL;
int ret;
 
drm_dev = drm_dev_alloc(&rockchip_drm_driver, dev);
@@ -151,38 +189,14 @@ static int rockchip_drm_bind(struct device *dev)
 
rockchip_drm_mode_config_init(drm_dev);
 
-   dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms),
- GFP_KERNEL);
-   if (!dev->dma_parms) {
-   ret = -ENOMEM;
+   ret = rockchip_drm_init_iommu(drm_dev);
+   if (ret)
goto err_config_cleanup;
-   }
-
-   if (is_support_iommu) {
-   /* TODO(djkurtz): fetch the mapping start/size from somewhere */
-   mapping = arm_iommu_create_mapping(&platform_bus_type,
-  0x,
-  SZ_2G);
-   if (IS_ERR(mapping)) {
-   ret = PTR_ERR(mapping);
-   goto err_config_cleanup;
-   }
-
-   ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
-   if (ret)
-   goto err_release_mapping;
-
-   dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
-
-   ret = arm_iommu_attach_device(dev, mapping);
-   if (ret)
-

Re: [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support

2017-02-09 Thread Namhyung Kim

On Wed, Feb 08, 2017 at 11:25:05AM -0600, Tom Zanussi wrote:
> Add support for a timestamp event field.  This is actually a 'pseudo-'
> event field in that it behaves like it's part of the event record, but
> is really part of the corresponding ring buffer event.
> 
> To make use of the timestamp field, users can specify "common_timestamp"
> as a field name for any histogram.  Note that this doesn't make much
> sense on its own either as either a key or value, but needs to be
> supported even so, since follow-on patches will add support for making
> use of this field in time deltas.
> 
> Note that the use of this field requires the ring buffer be put into
> TIME_EXTEND_ABS mode, which saves the complete timestamp for each
> event rather than an offset.  This mode will be enabled if and only if
> a histogram makes use of the "common_timestamp" field.
> 
> Signed-off-by: Tom Zanussi 
> ---
>  kernel/trace/trace_events_hist.c | 85 
> +---
>  1 file changed, 62 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c 
> b/kernel/trace/trace_events_hist.c
> index 4e70872..8d7f7dd 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -88,6 +88,12 @@ static u64 hist_field_log2(struct hist_field *hist_field, 
> void *event,
>   return (u64) ilog2(roundup_pow_of_two(val));
>  }
>  
> +static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
> + struct ring_buffer_event *rbe)
> +{
> + return ring_buffer_event_time_stamp(rbe);
> +}
> +
>  #define DEFINE_HIST_FIELD_FN(type)   \
>   static u64 hist_field_##type(struct hist_field *hist_field, \
>void *event,   \
> @@ -134,6 +140,7 @@ enum hist_field_flags {
>   HIST_FIELD_FL_SYSCALL   = 128,
>   HIST_FIELD_FL_STACKTRACE= 256,
>   HIST_FIELD_FL_LOG2  = 512,
> + HIST_FIELD_FL_TIMESTAMP = 1024,
>  };
>  
>  struct hist_trigger_attrs {
> @@ -158,6 +165,7 @@ struct hist_trigger_data {
>   struct trace_event_file *event_file;
>   struct hist_trigger_attrs   *attrs;
>   struct tracing_map  *map;
> + boolenable_timestamps;
>  };
>  
>  static const char *hist_field_name(struct hist_field *field)
> @@ -404,6 +412,7 @@ static struct hist_field *create_hist_field(struct 
> ftrace_event_field *field,
>   hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
>   if (!hist_field)
>   return NULL;
> + hist_field->is_signed = false;

I think it's not needed since hist_field is allocated by kzalloc().
Also I cannot find where ->is_signed is set.

>  
>   if (flags & HIST_FIELD_FL_HITCOUNT) {
>   hist_field->fn = hist_field_counter;
> @@ -423,6 +432,12 @@ static struct hist_field *create_hist_field(struct 
> ftrace_event_field *field,
>   goto out;
>   }
>  
> + if (flags & HIST_FIELD_FL_TIMESTAMP) {
> + hist_field->fn = hist_field_timestamp;
> + hist_field->size = sizeof(u64);
> + goto out;
> + }
> +
>   if (WARN_ON_ONCE(!field))
>   goto out;
>

[SNIP]
> @@ -1520,6 +1557,8 @@ static int hist_register_trigger(char *glob, struct 
> event_trigger_ops *ops,
>  
>   update_cond_flag(file);
>  
> + tracing_set_time_stamp_abs(file->tr, true);
> +

Hmm.. does it makes all events using absolute timestamp?  Where is it
turned off?

Thanks,
Namhyung


>   if (trace_event_trigger_enable_disable(file, 1) < 0) {
>   list_del_rcu(&data->list);
>   update_cond_flag(file);
> -- 
> 1.9.3
>

Re: [RFC][PATCH] tracing: Have traceprobe_probes_write() not access userspace unnecessarily

2017-02-09 Thread Namhyung Kim

On Thu, Feb 09, 2017 at 06:04:58PM -0500, Steven Rostedt wrote:
> 
> The code in traceprobe_probes_write() reads up to 4096 bytes from userpace
> for each line. If userspace passes in several lines to execute, the code
> will do a large read for each line, even though, it is highly likely that
> the first read from userspace received all of the lines at one.
> 
> I changed the logic to do a single read from userspace, and to only read
> from userspace again if not all of the read from userspace made it in.
> 
> I tested this by adding printk()s and writing files that would test -1, ==,
> and +1 the buffer size, to make sure that there's no overflows and that if a
> single line is written with +1 the buffer size, that it fails properly.
> 
> Signed-off-by: Steven Rostedt (VMware) 

Acked-by: Namhyung Kim 

Thanks,
Namhyung


> ---
>  kernel/trace/trace_probe.c | 48 
> --
>  1 file changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 8c0553d..2a06f1f 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -647,7 +647,7 @@ ssize_t traceprobe_probes_write(struct file *file, const 
> char __user *buffer,
>   size_t count, loff_t *ppos,
>   int (*createfn)(int, char **))
>  {
> - char *kbuf, *tmp;
> + char *kbuf, *buf, *tmp;
>   int ret = 0;
>   size_t done = 0;
>   size_t size;
> @@ -667,27 +667,37 @@ ssize_t traceprobe_probes_write(struct file *file, 
> const char __user *buffer,
>   goto out;
>   }
>   kbuf[size] = '\0';
> - tmp = strchr(kbuf, '\n');
> + buf = kbuf;
> + do {
> + tmp = strchr(buf, '\n');
> + if (tmp) {
> + *tmp = '\0';
> + size = tmp - buf + 1;
> + } else {
> + size = strlen(buf);
> + if (done + size < count) {
> + if (buf != kbuf)
> + break;
> + pr_warn("Line length is too long: 
> Should be less than %d\n",
> + WRITE_BUFSIZE);
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> + done += size;
>  
> - if (tmp) {
> - *tmp = '\0';
> - size = tmp - kbuf + 1;
> - } else if (done + size < count) {
> - pr_warn("Line length is too long: Should be less than 
> %d\n",
> - WRITE_BUFSIZE);
> - ret = -EINVAL;
> - goto out;
> - }
> - done += size;
> - /* Remove comments */
> - tmp = strchr(kbuf, '#');
> + /* Remove comments */
> + tmp = strchr(buf, '#');
>  
> - if (tmp)
> - *tmp = '\0';
> + if (tmp)
> + *tmp = '\0';
>  
> - ret = traceprobe_command(kbuf, createfn);
> - if (ret)
> - goto out;
> + ret = traceprobe_command(buf, createfn);
> + if (ret)
> + goto out;
> + buf += size;
> +
> + } while (done < count);
>   }
>   ret = done;
>  
> -- 
> 2.9.3
>

Re: [PATCH 4/4] [media] s5p-mfc: Always check and set 'v4l2_pix_format:field' field

2017-02-09 Thread kbuild test robot

Hi Thibault,

[auto build test ERROR on linuxtv-media/master]
[also build test ERROR on v4.10-rc7 next-20170209]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Thibault-Saunier/exynos-gsc-Use-576p-instead-720p-as-a-threshold-for-colorspaces/20170210-113658
base:   git://linuxtv.org/media_tree.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All error/warnings (new ones prefixed by >>):

   In file included from drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:28:0:
   drivers/media/platform/s5p-mfc/s5p_mfc_dec.c: In function 'vidioc_try_fmt':
>> drivers/media/platform/s5p-mfc/s5p_mfc_debug.h:25:23: warning: comparison 
>> between pointer and integer
  if (mfc_debug_level >= level)   \
  ^
>> drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:404:3: note: in expansion of 
>> macro 'mfc_debug'
  mfc_debug("Not supported field order(%d)\n", pix_mp->field);
  ^
>> drivers/media/platform/s5p-mfc/s5p_mfc_debug.h:25:23: warning: comparison 
>> with string literal results in unspecified behavior [-Waddress]
  if (mfc_debug_level >= level)   \
  ^
>> drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:404:3: note: in expansion of 
>> macro 'mfc_debug'
  mfc_debug("Not supported field order(%d)\n", pix_mp->field);
  ^
>> drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:404:48: error: expected ')' 
>> before 'pix_mp'
  mfc_debug("Not supported field order(%d)\n", pix_mp->field);
   ^
   drivers/media/platform/s5p-mfc/s5p_mfc_debug.h:26:32: note: in definition of 
macro 'mfc_debug'
   printk(KERN_DEBUG "%s:%d: " fmt, \
   ^~~
   In file included from include/linux/printk.h:6:0,
from include/linux/kernel.h:13,
from include/linux/clk.h:16,
from drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:14:
   include/linux/kern_levels.h:4:18: warning: format '%s' expects a matching 
'char *' argument [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/kern_levels.h:14:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
   ^~~~
   drivers/media/platform/s5p-mfc/s5p_mfc_debug.h:26:11: note: in expansion of 
macro 'KERN_DEBUG'
   printk(KERN_DEBUG "%s:%d: " fmt, \
  ^~
>> drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:404:3: note: in expansion of 
>> macro 'mfc_debug'
  mfc_debug("Not supported field order(%d)\n", pix_mp->field);
  ^
   include/linux/kern_levels.h:4:18: warning: format '%d' expects a matching 
'int' argument [-Wformat=]
#define KERN_SOH "\001"  /* ASCII Start Of Header */
 ^
   include/linux/kern_levels.h:14:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
   ^~~~
   drivers/media/platform/s5p-mfc/s5p_mfc_debug.h:26:11: note: in expansion of 
macro 'KERN_DEBUG'
   printk(KERN_DEBUG "%s:%d: " fmt, \
  ^~
>> drivers/media/platform/s5p-mfc/s5p_mfc_dec.c:404:3: note: in expansion of 
>> macro 'mfc_debug'
  mfc_debug("Not supported field order(%d)\n", pix_mp->field);
  ^

vim +404 drivers/media/platform/s5p-mfc/s5p_mfc_dec.c

   398  enum v4l2_field field;
   399  
   400  field = f->fmt.pix.field;
   401  if (field == V4L2_FIELD_ANY) {
   402  field = V4L2_FIELD_NONE;
   403  } else if (V4L2_FIELD_NONE != field) {
 > 404  mfc_debug("Not supported field order(%d)\n", 
 > pix_mp->field);
   405  return -EINVAL;
   406  }
   407  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v4 4/8] ASoC: sun8i-codec-analog: Add amplifier event to fix first delay

2017-02-09 Thread Chen-Yu Tsai

On Thu, Feb 2, 2017 at 5:24 PM, Mylène Josserand
 wrote:
> When playing a sound for the first time, a short delay, where the audio
> file is not played, can be noticed.
> On a second play (right after), the sound is played correctly.
> If we wait a short time (~5 sec which corresponds to the aplay
> timeout), the delay is back.
>
> This patch fixes it by using an event on headphone amplifier.
> It allows to keep the amplifier enable while playing a sound.
> A delay of 700ms allows to wait that the amplifier is powered-up
> before playing the sound.
>
> Signed-off-by: Mylène Josserand 
> Acked-by: Maxime Ripard 

I get some static in my headphones in the time between when the
amplifier is enabled and when sound starts playing. Wonder if
this can be fixed in any way?

One solution that might work is to mute the headphone output
while the amp is being charged, by setting SUN8I_ADDA_HP_VOLC_HP_VOL
to 0, and then restoring the value once it is charged. In other
words, overriding the value for the duration of sun8i_headphone_amp_event.

Regards
ChenYu

Can EP bars be called as device memory ?

2017-02-09 Thread Valmiki Saleem

Hi,

In ARMv8 TRM:
An unaligned access to any type of Device memory causes an Alignment fault.

What is meant by device memory ?
Can we call PCIe BAR memory on End point cards as device memory ?

Thanks,
valmiki

Re: [PATCH 2/2] dmaengine: Add DW AXI DMAC driver

2017-02-09 Thread Vinod Koul

On Wed, Jan 25, 2017 at 06:34:17PM +0300, Eugeniy Paltsev wrote:
> This patch adds support for the DW AXI DMAC controller.
> 
> DW AXI DMAC is a part of upcoming development board from Synopsys.

How different is AXI from DW Synopsys?

Is the spec publicly available?

> +config AXI_DW_DMAC
> + tristate "Synopsys DesignWare AXI DMA support"
> + depends on OF && !64BIT

why not 64 bit, can you also add compile test

> +#define AXI_DMA_BUSWIDTHS  \
> + (DMA_SLAVE_BUSWIDTH_UNDEFINED   | \

DMA_SLAVE_BUSWIDTH_UNDEFINED??

> +static irqreturn_t dw_axi_dma_intretupt(int irq, void *dev_id)
> +{
> + struct axi_dma_chip *chip = dev_id;
> +
> + /* Disable DMAC inerrupts. We'll enable them in the tasklet */
> + axi_dma_irq_disable(chip);
> +
> + tasklet_schedule(&chip->dw->tasklet);

This is very inefficient, we would want to submit next txn here and not in
tasklet

> +static void axi_chan_block_xfer_complete(struct axi_dma_chan *chan)
> +{
> + struct virt_dma_desc *vd;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&chan->vc.lock, flags);
> + if (unlikely(axi_chan_is_hw_enable(chan))) {
> + dev_err(chan2dev(chan), "BUG: %s catched DWAXIDMAC_IRQ_DMA_TRF, 
> but channel not idle!\n",
> + axi_chan_name(chan));
> + axi_chan_disable(chan);
> + }
> +
> + /* The completed descriptor currently is in the head of vc list */
> + vd = vchan_next_desc(&chan->vc);
> + /* Remove the completed descriptor from issued list before completing */
> + list_del(&vd->node);
> + vchan_cookie_complete(vd);

this should be called from irq handler


> +static int dw_remove(struct platform_device *pdev)
> +{
> + struct axi_dma_chip *chip = platform_get_drvdata(pdev);
> + struct dw_axi_dma *dw = chip->dw;
> + struct axi_dma_chan *chan, *_chan;
> + u32 i;
> +
> + axi_dma_irq_disable(chip);
> + for (i = 0; i < dw->hdata->nr_channels; i++) {
> + axi_chan_disable(&chip->dw->chan[i]);
> + axi_chan_irq_disable(&chip->dw->chan[i], DWAXIDMAC_IRQ_ALL);
> + }
> + axi_dma_disable(chip);
> +
> + tasklet_kill(&dw->tasklet);
> +
> + list_for_each_entry_safe(chan, _chan, &dw->dma.channels,
> + vc.chan.device_node) {
> + list_del(&chan->vc.chan.device_node);
> + tasklet_kill(&chan->vc.task);
> + }

very nice :)

But please freeup irq as well

> +module_platform_driver(dw_driver);
> +
> +static int __init dw_init(void)
> +{
> + return platform_driver_register(&dw_driver);
> +}
> +subsys_initcall(dw_init);

why subsys_initcall??

> +/* Common registers offset */
> +#define DMAC_ID  0x000 // R DMAC ID
> +#define DMAC_COMPVER 0x008 // R DMAC Component Version
> +#define DMAC_CFG 0x010 // R/W DMAC Configuration
> +#define DMAC_CHEN0x018 // R/W DMAC Channel Enable
> +#define DMAC_CHEN_L  0x018 // R/W DMAC Channel Enable 00-31
> +#define DMAC_CHEN_H  0x01C // R/W DMAC Channel Enable 32-63
> +#define DMAC_INTSTATUS   0x030 // R DMAC Interrupt Status
> +#define DMAC_COMMON_INTCLEAR 0x038 // W DMAC Interrupt Clear
> +#define DMAC_COMMON_INTSTATUS_ENA 0x040 // R DMAC Interrupt Status Enable
> +#define DMAC_COMMON_INTSIGNAL_ENA 0x048 // R/W DMAC Interrupt Signal Enable
> +#define DMAC_COMMON_INTSTATUS0x050 // R DMAC Interrupt Status
> +#define DMAC_RESET   0x058 // R DMAC Reset Register1

DMAC is a generic term, AX_DMAC and no C99 style comment, checkpatch would
have cribbed

Use BITS and GENMASK here

> +
> +/* DMA channel registers offset */
> +#define CH_SAR   0x000 // R/W Chan Source Address
> +#define CH_DAR   0x008 // R/W Chan Destination Address
> +#define CH_BLOCK_TS  0x010 // R/W Chan Block Transfer Size
> +#define CH_CTL   0x018 // R/W Chan Control
> +#define CH_CTL_L 0x018 // R/W Chan Control 00-31
> +#define CH_CTL_H 0x01C // R/W Chan Control 32-63
> +#define CH_CFG   0x020 // R/W Chan Configuration
> +#define CH_CFG_L 0x020 // R/W Chan Configuration 00-31
> +#define CH_CFG_H 0x024 // R/W Chan Configuration 32-63
> +#define CH_LLP   0x028 // R/W Chan Linked List Pointer
> +#define CH_STATUS0x030 // R Chan Status
> +#define CH_SWHSSRC   0x038 // R/W Chan SW Handshake Source
> +#define CH_SWHSDST   0x040 // R/W Chan SW Handshake Destination
> +#define CH_BLK_TFR_RESUMEREQ 0x048 // W Chan Block Transfer Resume Req
> +#define CH_AXI_ID0x050 // R/W Chan AXI ID
> +#define CH_AXI_QOS   0x058 // R/W Chan AXI QOS
> +#define CH_SSTAT 0x060 // R Chan Source Status
> +#define CH_DSTAT 0x068 // R Chan Destination Status
> +#define CH_SSTATAR   0x070 // R/W Chan Source Status Fetch Addr
> +#define CH_DSTATAR

Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type

2017-02-09 Thread Namhyung Kim

On Wed, Feb 08, 2017 at 03:32:00PM -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:59 -0600
> Tom Zanussi  wrote:
> 
> > Replace the unused RINGBUF_TYPE_TIME_STAMP ring buffer type with
> > RINGBUF_TYPE_TIME_EXTEND_ABS, which forces extended time_deltas for
> > all events.
> 
> Hmm, I could probably have this be used for nested commits :-/
> 
> > 
> > Having time_deltas that aren't dependent on previous events in the
> > ring buffer makes it feasible to use the ring_buffer_event timetamps
> > in a more random-access way, to be used for purposes other than serial
> > event printing.
> > 
> > To set/reset this mode, use tracing_set_timestamp_abs().
> > 
> > Signed-off-by: Tom Zanussi 
> > ---
> >  include/linux/ring_buffer.h |  12 -
> >  kernel/trace/ring_buffer.c  | 109 
> > 
> >  kernel/trace/trace.c|  25 +-
> >  kernel/trace/trace.h|   2 +
> >  4 files changed, 117 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index b6d4568..c3a1064 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -36,6 +36,12 @@ struct ring_buffer_event {
> >   *  array[0] = time delta (28 .. 59)
> >   *  size = 8 bytes
> >   *
> > + * @RINGBUF_TYPE_TIME_EXTEND_ABS:
> > + *  Extend the time delta, but interpret it as
> > + *  absolute, not relative
> > + *  array[0] = time delta (28 .. 59)

It's not a delta.

> > + *  size = 8 bytes
> > + *
> >   * @RINGBUF_TYPE_TIME_STAMP:   Sync time stamp with external clock
> 
> I guess you need to nuke this comment too.
> 
> >   *  array[0]= tv_nsec
> >   *  array[1..2] = tv_sec
> > @@ -56,12 +62,12 @@ enum ring_buffer_type {
> > RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
> > RINGBUF_TYPE_PADDING,
> > RINGBUF_TYPE_TIME_EXTEND,
> > -   /* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
> > -   RINGBUF_TYPE_TIME_STAMP,
> > +   RINGBUF_TYPE_TIME_EXTEND_ABS,
> >  };
> >  
> >  unsigned ring_buffer_event_length(struct ring_buffer_event *event);
> >  void *ring_buffer_event_data(struct ring_buffer_event *event);
> > +u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
> >  
> >  /*
> >   * ring_buffer_discard_commit will remove an event that has not
> > @@ -180,6 +186,8 @@ void ring_buffer_normalize_time_stamp(struct 
> > ring_buffer *buffer,
> >   int cpu, u64 *ts);
> >  void ring_buffer_set_clock(struct ring_buffer *buffer,
> >u64 (*clock)(void));
> > +void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
> > +bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
> >  
> >  size_t ring_buffer_page_len(void *page);
> >  
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index a85739e..c9c9a83 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -41,6 +41,8 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
> >  RINGBUF_TYPE_PADDING);
> > trace_seq_printf(s, "\ttime_extend : type == %d\n",
> >  RINGBUF_TYPE_TIME_EXTEND);
> > +   trace_seq_printf(s, "\ttime_extend_abs : type == %d\n",
> > +RINGBUF_TYPE_TIME_EXTEND_ABS);
> > trace_seq_printf(s, "\tdata max type_len  == %d\n",
> >  RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
> >  
> > @@ -186,11 +188,9 @@ static void rb_event_set_padding(struct 
> > ring_buffer_event *event)
> > return  event->array[0] + RB_EVNT_HDR_SIZE;
> >  
> > case RINGBUF_TYPE_TIME_EXTEND:
> > +   case RINGBUF_TYPE_TIME_EXTEND_ABS:
> > return RB_LEN_TIME_EXTEND;
> >  
> > -   case RINGBUF_TYPE_TIME_STAMP:
> > -   return RB_LEN_TIME_STAMP;
> > -
> > case RINGBUF_TYPE_DATA:
> > return rb_event_data_length(event);
> > default:
> > @@ -209,7 +209,8 @@ static void rb_event_set_padding(struct 
> > ring_buffer_event *event)
> >  {
> > unsigned len = 0;
> >  
> > -   if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> > +   if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> > +   event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS) {
> 
> Hmm, we could micro-optimize this with:
> 
>   event->type_len > RINGBUF_TYPE_PADDING
> 
> But it would require comments and/or a wrapper to define it so people
> in the future know what it is doing.

What about

event->type_len >= RINGBUF_TYPE_TIME_EXTEND

?  I think it's easier to understand what it's doing.

Thanks,
Namhyung

> 
> 
> > /* time extends include the data event after it */
> > len = RB_LEN_TIME_EXTEND;
> > event = skip_time_extend(event);

Re: [PATCH V2 1/6] PM / QOS: Add default case to the switch

2017-02-09 Thread Viresh Kumar

On 09-02-17, 15:24, Pavel Machek wrote:
> On Thu 2017-02-09 09:11:47, Viresh Kumar wrote:
> > The switch block handles all the QOS request types present today, but
> > starts giving compilation warnings as soon as a new type is added and
> > not handled in this.
> > 
> > To prevent against that, add the default case as well and do a WARN from
> > it.
> 
> I'd say compilation-time warning is better than hmm stacktrace and memory 
> leak
> at runtime?

Of course we aren't going to allow a compilation warning for each and every
platform that compiles this file. How do you wish to fix the issue then ?

-- 
viresh

Re: [RFC][PATCH] tracing: Have traceprobe_probes_write() not access userspace unnecessarily

2017-02-09 Thread Masami Hiramatsu

On Thu, 9 Feb 2017 18:04:58 -0500
Steven Rostedt  wrote:

> 
> The code in traceprobe_probes_write() reads up to 4096 bytes from userpace
> for each line. If userspace passes in several lines to execute, the code
> will do a large read for each line, even though, it is highly likely that
> the first read from userspace received all of the lines at one.
> 
> I changed the logic to do a single read from userspace, and to only read
> from userspace again if not all of the read from userspace made it in.
> 
> I tested this by adding printk()s and writing files that would test -1, ==,
> and +1 the buffer size, to make sure that there's no overflows and that if a
> single line is written with +1 the buffer size, that it fails properly.
> 

Thanks Steve!

Acked-by: Masami Hiramatsu 

BTW, this can conflict with my previous patch.

https://lkml.org/lkml/2017/2/6/1048
https://lkml.org/lkml/2017/2/7/203

I'll update this. Ingo, Can I send these patch to Steve?

Thank you,

> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  kernel/trace/trace_probe.c | 48 
> --
>  1 file changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 8c0553d..2a06f1f 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -647,7 +647,7 @@ ssize_t traceprobe_probes_write(struct file *file, const 
> char __user *buffer,
>   size_t count, loff_t *ppos,
>   int (*createfn)(int, char **))
>  {
> - char *kbuf, *tmp;
> + char *kbuf, *buf, *tmp;
>   int ret = 0;
>   size_t done = 0;
>   size_t size;
> @@ -667,27 +667,37 @@ ssize_t traceprobe_probes_write(struct file *file, 
> const char __user *buffer,
>   goto out;
>   }
>   kbuf[size] = '\0';
> - tmp = strchr(kbuf, '\n');
> + buf = kbuf;
> + do {
> + tmp = strchr(buf, '\n');
> + if (tmp) {
> + *tmp = '\0';
> + size = tmp - buf + 1;
> + } else {
> + size = strlen(buf);
> + if (done + size < count) {
> + if (buf != kbuf)
> + break;
> + pr_warn("Line length is too long: 
> Should be less than %d\n",
> + WRITE_BUFSIZE);
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> + done += size;
>  
> - if (tmp) {
> - *tmp = '\0';
> - size = tmp - kbuf + 1;
> - } else if (done + size < count) {
> - pr_warn("Line length is too long: Should be less than 
> %d\n",
> - WRITE_BUFSIZE);
> - ret = -EINVAL;
> - goto out;
> - }
> - done += size;
> - /* Remove comments */
> - tmp = strchr(kbuf, '#');
> + /* Remove comments */
> + tmp = strchr(buf, '#');
>  
> - if (tmp)
> - *tmp = '\0';
> + if (tmp)
> + *tmp = '\0';
>  
> - ret = traceprobe_command(kbuf, createfn);
> - if (ret)
> - goto out;
> + ret = traceprobe_command(buf, createfn);
> + if (ret)
> + goto out;
> + buf += size;
> +
> + } while (done < count);
>   }
>   ret = done;
>  
> -- 
> 2.9.3
> 


-- 
Masami Hiramatsu

Re: [RFC] syscalls: Restore address limit after a syscall

2017-02-09 Thread Andy Lutomirski

On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier  wrote:
> On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski  wrote:
>> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook  wrote:
>>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier  wrote:
>>>> This patch prevents a syscall to modify the address limit of the
>>>> caller. The address limit is kept by the syscall wrapper and restored
>>>> just after the syscall ends.
>>>>
>>>> For example, it would mitigation this bug:
>>>>
>>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>>
>>>> Signed-off-by: Thomas Garnier 
>>>> ---
>>>> Based on next-20170209
>>>> ---
>>>>  include/linux/syscalls.h | 5 -
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index 91a740f6b884..a1b6a62a9849 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions 
>>>> exit_syscall_print_funcs;
>>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));  \
>>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))   \
>>>> {   \
>>>> -   long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>>> +   long ret;   \
>>>> +   mm_segment_t fs = get_fs(); \
>>>> +   ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));   \
>>>> +   set_fs(fs); \
>>>> __MAP(x,__SC_TEST,__VA_ARGS__); \
>>>> __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));   \
>>>> return ret; \
>>>> --
>>>> 2.11.0.483.g087da7b7c-goog
>>>>
>>>
>>> I have a memory of Andy looking at this before, and there was some
>>> problem with how a bunch of compat code would set fs and then re-call
>>> the syscall... but I can't quite find the conversation. Andy, do you
>>> remember the details?
>>>
>>> This seems like an entirely reasonable thing to enforce for syscalls,
>>> though I'm sure there's a gotcha somewhere. :)
>>
>> This sounds vaguely familiar, but that's about all.
>>
>> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
>> only used for syscalls and not for other things, so the code should
>> *work*.  That being said, I think there's room for several
>> improvements.
>>
>> 1. Why save the old "fs" value?  For that matter, why restore it?
>> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>>
>
> I guess that make sense in the wrapper.
>
>> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>>
>> asmlinkage long SyS_foo(...) {
>>   sys_foo();
>>   verify_pre_usermode_state();
>> }
>>
>> and let verify_pre_usermode_state() potentially do more things, we'd
>> get a more flexible mechanism.  On arches like x86_32, we could save a
>> decent amount of code size by moving verify_pre_usermode_state() into
>> prepare_exit_to_usermode(), but that would have to be a per-arch
>> opt-in.  x86_64 probably would *not* select this due to the fast path
>> (or it would do it in asm.  hmm.).
>>
>
> I will look into that. I like this design better.
>
>> 3. If this thing gets factored out, then arch code can call it for
>> non-syscall entries, too.
>>
>
> Yes, it makes sense.
>
>> 4. Can we make this configurable?
>>
>>
>> For x86, a nice implementation might be:
>>
>> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>>
>> ... in prepare_exit_to_usermode():
>>
>> verify_pre_usermode_state();  // right at the beginning
>>
>> ... in the asm syscall fast path:
>>
>> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
>> call verify_pre_usermode_staet
>> #endif
>>
>> (or just inline the interesting bit)
>>
>
> So by default it is in the wrapper. If selected, an architecture can
> disable the wrapper put it in the best places. Understood correctly?

Sounds good to me.

Presumably the result should go through -mm.  Want to cc: akpm and
linux-arch@ on the next version?

I've also cc'd arm and s390 folks -- those are the other arches that
try to be on top of hardening.

Re: [PATCHv3] arm: ftrace: Adds support for CONFIG_DYNAMIC_FTRACE_WITH_REGS

2017-02-09 Thread Jean-Jacques Hiblot

2017-02-07 23:57 GMT+01:00 Abel Vesa :
> The DYNAMIC_FTRACE_WITH_REGS configuration makes it possible for a ftrace
> operation to specify if registers need to saved/restored by the ftrace 
> handler.
> This is needed by kgraft and possibly other ftrace-based tools, and the ARM
> architecture is currently lacking this feature. It would also be the first 
> step
> to support the "Kprobes-on-ftrace" optimization on ARM.
>
> This patch introduces a new ftrace handler that stores the registers on the
> stack before calling the next stage. The registers are restored from the stack
> before going back to the instrumented function.
>
> A side-effect of this patch is to activate the support for 
> ftrace_modify_call()
> as it defines ARCH_SUPPORTS_FTRACE_OPS for the ARM architecture.
>
> Signed-off-by: Abel Vesa 
> ---
>  arch/arm/Kconfig   |  1 +
>  arch/arm/include/asm/ftrace.h  |  4 +++
>  arch/arm/kernel/entry-ftrace.S | 79 
> ++
>  arch/arm/kernel/ftrace.c   | 37 
>  4 files changed, 121 insertions(+)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 186c4c2..df401f4 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -50,6 +50,7 @@ config ARM
> select HAVE_DMA_API_DEBUG
> select HAVE_DMA_CONTIGUOUS if MMU
> select HAVE_DYNAMIC_FTRACE if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> +   select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE && 
> OLD_MCOUNT
> select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || 
> CPU_V7) && MMU
> select HAVE_EXIT_THREAD
> select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
> diff --git a/arch/arm/include/asm/ftrace.h b/arch/arm/include/asm/ftrace.h
> index 22b7311..f379881 100644
> --- a/arch/arm/include/asm/ftrace.h
> +++ b/arch/arm/include/asm/ftrace.h
> @@ -1,6 +1,10 @@
>  #ifndef _ASM_ARM_FTRACE
>  #define _ASM_ARM_FTRACE
>
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#define ARCH_SUPPORTS_FTRACE_OPS 1
> +#endif
> +
>  #ifdef CONFIG_FUNCTION_TRACER
>  #define MCOUNT_ADDR((unsigned long)(__gnu_mcount_nc))
>  #define MCOUNT_INSN_SIZE   4 /* sizeof mcount call */
> diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
> index c73c403..b1fee6c 100644
> --- a/arch/arm/kernel/entry-ftrace.S
> +++ b/arch/arm/kernel/entry-ftrace.S
> @@ -92,12 +92,74 @@
>  2: mcount_exit
>  .endm
>
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +
> +.macro __ftrace_regs_caller
> +
> +   add ip, sp, #4  @ move in IP the value of SP as it was
> +   @ before the push {lr} of the mcount mechanism
> +   stmdb   sp!, {ip,lr,pc}
> +   stmdb   sp!, {r0-r11,lr}
> +
> +   @ stack content at this point:
> +   @ 0  4  4448   52   56   60   64
> +   @ R0 | R1 | ... | R11 | LR | SP + 4 | LR | PC | previous LR |
> +
> +   mov r3, sp  @ struct pt_regs*
> +   ldr r2, =function_trace_op
> +   ldr r2, [r2]@ pointer to the current
> +   @ function tracing op
> +   ldr r1, [sp, #64]   @ lr of instrumented func
> +   mcount_adjust_addr  r0, lr  @ instrumented function
> +
> +   .globl ftrace_regs_call
> +ftrace_regs_call:
> +   bl  ftrace_stub
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +   .globl ftrace_graph_regs_call
> +ftrace_graph_regs_call:
> +   mov r0, r0
> +#endif
> +   @ pop saved regs
> +   ldr lr, [sp, #64]   @ get the previous LR value 
> from stack
> +   ldmia   sp, {r0-r11, ip, sp, pc}@ pop the saved registers 
> INCLUDING
> +   @ the stack pointer
> +.endm
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +.macro __ftrace_graph_regs_caller
> +
> +   sub r0, fp, #4  @ lr of instrumented routine (parent)
> +
> +   @ called from __ftrace_regs_caller
> +   ldr r1, [sp, #56]   @ instrumented routine (func)
> +   mcount_adjust_addr  r1, r1
> +
> +   sub r2, r0, #4  @ frame pointer
why not mov r2, fp  ?


> +   bl  prepare_ftrace_return
> +
> +   @ pop regs saved in ftrace_regs_caller
> +   ldr lr, [sp, #64]   @ restore lr from the stack
> +   ldmia   sp, {r0-r11, ip, sp, pc}@ restore r0 through pc
> +
> +.endm
> +#endif
> +#endif
> +
>  .macro __ftrace_caller suffix
> mcount_enter
>
> mcount_get_lr   r1  @ lr of instrumented func
> mcount_adjust_addr  r0, lr  @ instrumented function
>
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +   ldr r2, =function_trace_op
> +   ldr r2, [r2]@ pointer to the current
> +   @ function tracing op
> +   mo

Re: [PATCH linux v7 4/6] hwmon: occ: Add callbacks for parsing P8 OCC datastructures

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:
> From: "Edward A. James" 
>
> Add functions to parse the data structures that are specific to the OCC on
> the POWER8 processor. These are the sensor data structures, including
> temperature, frequency, power, and "caps."
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Andrew Jeffery 
> ---
>  Documentation/hwmon/occ|   9 ++
>  drivers/hwmon/occ/occ_p8.c | 248 
> +
>  drivers/hwmon/occ/occ_p8.h |  30 ++
>  3 files changed, 287 insertions(+)
>  create mode 100644 drivers/hwmon/occ/occ_p8.c
>  create mode 100644 drivers/hwmon/occ/occ_p8.h

>
> diff --git a/drivers/hwmon/occ/occ_p8.c b/drivers/hwmon/occ/occ_p8.c
> new file mode 100644
> index 000..5c61fc4
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ_p8.c

> +void p8_parse_sensor(u8 *data, void *sensor, int sensor_type, int off,
> +int snum)
> +{
> +   switch (sensor_type) {
> +   case FREQ:
> +   case TEMP:
> +   {
> +   struct p8_occ_sensor *os =
> +   &(((struct p8_occ_sensor *)sensor)[snum]);
> +
> +   os->sensor_id = be16_to_cpu(get_unaligned((u16 *)&data[off]));
> +   os->value = be16_to_cpu(get_unaligned((u16 *)&data[off + 2]));
> +   }
> +   break;
> +   case POWER:
> +   {
> +   struct p8_power_sensor *ps =
> +   &(((struct p8_power_sensor *)sensor)[snum]);
> +
> +   ps->sensor_id = be16_to_cpu(get_unaligned((u16 *)&data[off]));
> +   ps->update_tag =
> +   be32_to_cpu(get_unaligned((u32 *)&data[off + 2]));

This might be more readable if you wrote a
cast_get_unaliged_be32_to_cpu() macro.

> +   ps->accumulator =
> +   be32_to_cpu(get_unaligned((u32 *)&data[off + 6]));
> +   ps->value = be16_to_cpu(get_unaligned((u16 *)&data[off + 
> 10]));
> +   }
> +   break;
> +   case CAPS:
> +   {

> +const u32 *p8_get_sensor_hwmon_configs()
> +{
> +   return p8_sensor_hwmon_configs;
> +}
> +EXPORT_SYMBOL(p8_get_sensor_hwmon_configs);
> +
> +struct occ *p8_occ_start(struct device *dev, void *bus,
> +struct occ_bus_ops *bus_ops)
> +{
> +   return occ_start(dev, bus, bus_ops, &p8_ops, &p8_config);
> +}
> +EXPORT_SYMBOL(p8_occ_start);

We don't need to export these symbols; they're not used outside of the
OCC module. The same goes for all of the exports you've made in this
series.

I suggest we re-architect the drivers so we build all of the objects
and link them into one module for each platform, instead of having an
occ module and occ-p8/occ-p9 modules and i2c modules that all depend
on each other. The Makefile could look like this:

obj-$(CONFIG_SENSORS_PPC_OCC_P8_I2C) += hwmon_occ_p8.o
obj-$(CONFIG_SENSORS_PPC_OCC_P9) += hwmon_occ_p9.o

hwmon_occ_p8-$(CONFIG_SENSORS_PPC_OCC_P8_I2C) += occ_scom_i2c.o
occ_p8.o p8_occ_i2c.o occ_sysfs.o occ.o
hwmon_occ_p9-$(CONFIG_SENSORS_PPC_OCC_P9) += occ_p9.o occ_sysfs.o occ.o

And the Kbuild like this:

menuconfig SENSORS_PPC_OCC
bool "PPC On-Chip Controller"

if SENSORS_PPC_OCC

config SENSORS_PPC_OCC_P8_I2C
bool "POWER8 OCC hwmon support"
depends on I2C

config SENSORS_PPC_OCC_P9
bool "POWER9 OCC hwmon support"

endif

Re: [PATCH linux v7 5/6] hwmon: occ: Add hwmon implementation for the P8 OCC

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:
> From: "Edward A. James" 
>
> Add code to tie the hwmon sysfs code and the POWER8 OCC code together, as
> well as probe the entire driver from the I2C bus. I2C is the communication
> method between the BMC and the P8 OCC.
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Andrew Jeffery 
> Acked-by: Rob Herring 
> ---
>  Documentation/devicetree/bindings/hwmon/occ.txt |  13 +++
>  drivers/hwmon/occ/Kconfig   |  14 
>  drivers/hwmon/occ/Makefile  |   1 +
>  drivers/hwmon/occ/p8_occ_i2c.c  | 104 
> 

For consistency, how about we call this occ_p8_i2c.c? The other files
have the machine name second.


>  4 files changed, 132 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/hwmon/occ.txt
>  create mode 100644 drivers/hwmon/occ/p8_occ_i2c.c
>

> diff --git a/drivers/hwmon/occ/Kconfig b/drivers/hwmon/occ/Kconfig
> index cdb64a7..3a5188f 100644
> --- a/drivers/hwmon/occ/Kconfig
> +++ b/drivers/hwmon/occ/Kconfig
> @@ -13,3 +13,17 @@ menuconfig SENSORS_PPC_OCC
>
>   This driver can also be built as a module. If so, the module
>   will be called occ.
> +
> +if SENSORS_PPC_OCC
> +
> +config SENSORS_PPC_OCC_P8_I2C
> +   tristate "POWER8 OCC hwmon support"
> +   depends on I2C
> +   help
> +Provide a hwmon sysfs interface for the POWER8 On-Chip Controller,
> +exposing temperature, frequency and power measurements.
> +
> +This driver can also be built as a module. If so, the module will be
> +called p8-occ-i2c.

Mention that this driver is for the BMC (or just "service processor"),
and is not useful in the Power8 kernel.

I'm trying to think of a better prefix than PPC. PPC means much more
than Power8 and Power9, which is what we mean. It's also confusing as
we don't run this on any PowerPC machine - it's for an ARM chip.

> +
> +int p8_i2c_getscom(void *bus, u32 address, u64 *data)
> +{
> +   /* P8 i2c slave requires address to be shifted by 1 */
> +   address = address << 1;

I think these are scom addresses? Please indicate that so we don't get
them confused with i2c.

Or are they scom operations?

> +
> +   return occ_i2c_getscom(bus, address, data);
> +}
> +
> +int p8_i2c_putscom(void *bus, u32 address, u32 data0, u32 data1)

The functions can be static.

> +{
> +   /* P8 i2c slave requires address to be shifted by 1 */
> +   address = address << 1;
> +
> +   return occ_i2c_putscom(bus, address, data0, data1);
> +}

Re: [PATCH linux v7 1/6] hwmon: Add core On-Chip Controller support for POWER CPUs

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:

> diff --git a/Documentation/hwmon/occ b/Documentation/hwmon/occ
> new file mode 100644
> index 000..79d1642
> --- /dev/null
> +++ b/Documentation/hwmon/occ

The kernel is using  reStructuredText these days. You should consider
following suit:

 
https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html#writing-documentation


> @@ -0,0 +1,40 @@
> +Kernel driver occ
> +=
> +
> +Supported chips:
> + * ASPEED AST2400
> + * ASPEED AST2500

Not really - this will run on any chip that's connected to a P8 or P9.


> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5f10c28..193a13b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9128,6 +9128,13 @@ T:   git git://linuxtv.org/media_tree.git
>  S: Maintained
>  F: drivers/media/i2c/ov7670.c
>
> +ON-CHIP CONTROLLER HWMON DRIVER

Mention P8 and P9?

> +M: Eddie James 
> +L: linux-hw...@vger.kernel.org

Have you subscribed to this list? Would you prefer the mail to come to
the openbmc list?

> +S: Maintained
> +F: Documentation/hwmon/occ
> +F: drivers/hwmon/occ/
> +
>  ONENAND FLASH DRIVER
>  M: Kyungmin Park 
>  L: linux-...@lists.infradead.org
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 190d270..e80ca81 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1240,6 +1240,8 @@ config SENSORS_NSA320
>   This driver can also be built as a module. If so, the module
>   will be called nsa320-hwmon.
>
> +source drivers/hwmon/occ/Kconfig
> +
>  config SENSORS_PCF8591
> tristate "Philips PCF8591 ADC/DAC"
> depends on I2C
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index d2cb7e8..c7ec5d4 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -169,6 +169,7 @@ obj-$(CONFIG_SENSORS_WM831X)+= wm831x-hwmon.o
>  obj-$(CONFIG_SENSORS_WM8350)   += wm8350-hwmon.o
>  obj-$(CONFIG_SENSORS_XGENE)+= xgene-hwmon.o
>
> +obj-$(CONFIG_SENSORS_PPC_OCC)  += occ/
>  obj-$(CONFIG_PMBUS)+= pmbus/
>
>  ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG


> diff --git a/drivers/hwmon/occ/occ.c b/drivers/hwmon/occ/occ.c
> new file mode 100644
> index 000..af077f2
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ.c

> +   sensors = &resp->data.blocks[b].sensors;
> +   if (!sensors) {
> +   /* first poll response */
> +   sensors = driver->ops.alloc_sensor(dev, sensor_type,
> +  
> block->num_sensors);
> +   if (!sensors)
> +   return -ENOMEM;
> +
> +   resp->data.blocks[b].sensors = sensors;
> +   resp->data.sensor_block_id[sensor_type] = b;
> +   resp->data.blocks[b].header = *block;
> +   } else if (block->sensor_length !=
> +resp->data.blocks[b].header.sensor_length) {
> +   dev_warn(dev,
> +"different sensor length than first poll\n");

The driver has changed behaviour from your previous version; now it
discards data if the sensors change.

Under what circumstances does the sensor data change?

> +   continue;
> +   }
> +
> +   for (s = 0; s < block->num_sensors &&
> +s < resp->data.blocks[b].header.num_sensors; s++) {
> +   if (offset + block->sensor_length > num_bytes) {
> +   dev_warn(dev, "exceeded data length\n");
> +   return 0;
> +   }
> +
> +   driver->ops.parse_sensor(data, sensors, sensor_type,
> +offset, s);
> +   offset += block->sensor_length;
> +   }
> +   }
> +
> +   return 0;
> +}
> +
> +static u8 occ_send_cmd(struct occ *driver, u8 seq, u8 type, u16 length,
> +  const u8 *data, u8 *resp)
> +{
> +   u32 cmd1, cmd2 = 0;
> +   u16 checksum = 0;
> +   bool retry = false;
> +   int i, rc, tries = 0;
> +
> +   cmd1 = (seq << 24) | (type << 16) | length;
> +   memcpy(&cmd2, data, length);
> +   cmd2 <<= ((4 - length) * 8);
> +
> +   /* checksum: sum of every bytes of cmd1, cmd2 */
> +   for (i = 0; i < 4; i++) {
> +   checksum += (cmd1 >> (i * 8)) & 0xFF;
> +   checksum += (cmd2 >> (i * 8)) & 0xFF;
> +   }
> +
> +   cmd2 |= checksum << ((2 - length) * 8);
> +
> +   /* Init OCB */

You should mention what OCB means in your documentation.

> +   rc = driver->bus_ops.putscom(driver->bus, OCB_STATUS_CONTROL_OR,
> +OCB_OR_INIT0, OCB_OR_INIT1);
> +   if (rc)
> +   goto err;
> +

> +int occ_set_user_powercap(struct occ *occ, u16 cap)
> +{
>

Re: [PATCH linux v7 6/6] hwmon: occ: Add callbacks for parsing P9 OCC datastructures

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:
> From: "Edward A. James" 
>
> Add functions to parse the data structures that are specific to the OCC on
> the POWER9 processor. These are the sensor data structures, including
> temperature, frequency, power, and "caps."
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Andrew Jeffery 
> ---
>  Documentation/hwmon/occ|   3 +
>  drivers/hwmon/occ/occ_p9.c | 309 
> +
>  drivers/hwmon/occ/occ_p9.h |  30 +
>  3 files changed, 342 insertions(+)
>  create mode 100644 drivers/hwmon/occ/occ_p9.c
>  create mode 100644 drivers/hwmon/occ/occ_p9.h

> diff --git a/drivers/hwmon/occ/occ_p9.c b/drivers/hwmon/occ/occ_p9.c
> new file mode 100644
> index 000..9c1283c
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ_p9.c
> @@ -0,0 +1,309 @@
> +/*
> + * occ_p9.c - OCC hwmon driver
> + *
> + * This file contains the Power9-specific methods and data structures for
> + * the OCC hwmon driver.
> + *
> + * Copyright 2016 IBM Corp.

It's 2017.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

We generally just include the first paragraph. Same goes for all of the files.

> +
> +static const u32 p9_sensor_hwmon_configs[MAX_OCC_SENSOR_TYPE] = {
> +   HWMON_I_INPUT | HWMON_I_LABEL,  /* freq: value | label */
> +   /* temp: value | label | fru_type */
> +   HWMON_T_INPUT | HWMON_T_LABEL | HWMON_T_TYPE,
> +   /* power: value | label | accum[0] | accum[1] | update_tag |
> +*   (function_id | (apss_channel << 8))
> +*/
> +   HWMON_P_INPUT | HWMON_P_LABEL | HWMON_P_AVERAGE_MIN |
> +   HWMON_P_AVERAGE_MAX | HWMON_P_AVERAGE_INTERVAL |
> +   HWMON_P_RESET_HISTORY,
> +   /* caps: curr | max | min | norm | user | source */
> +   HWMON_P_CAP | HWMON_P_CAP_MAX | HWMON_P_CAP_MIN | HWMON_P_MAX |
> +   HWMON_P_ALARM | HWMON_P_CAP_ALARM,

I find this really hard to read. Perhaps something like this:


#define FREQ_CONFIG(HWMON_I_INPUT | HWMON_I_LABEL)
#deifne TEMP_CONFIG(HWMON_T_INPUT | HWMON_T_LABEL | HWMON_T_TYPE)
#define POWER_CONFIG( HWMON_P_INPUT | HWMON_P_LABEL |
HWMON_P_AVERAGE_MIN | \
 HWMON_P_AVERAGE_MAX |
HWMON_P_AVERAGE_INTERVAL | \
 HWMON_P_RESET_HISTORY)

etc. Do the same in the p8 driver.


> diff --git a/drivers/hwmon/occ/occ_p9.h b/drivers/hwmon/occ/occ_p9.h
> new file mode 100644
> index 000..18ca16a
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ_p9.h

> +
> +#ifndef __OCC_P9_H__
> +#define __OCC_P9_H__
> +
> +#include "scom.h"
> +
> +struct device;

Include the header for struct device instead.

Did you consider the one header file for all of your shared functions?
I don't think there's much value in having a whole heap of small ones.

> +
> +const u32 *p9_get_sensor_hwmon_configs(void);
> +struct occ *p9_occ_start(struct device *dev, void *bus,
> +struct occ_bus_ops *bus_ops);
> +
> +#endif /* __OCC_P9_H__ */
> --
> 1.8.3.1
>

Re: [PATCH linux v7 2/6] hwmon: occ: Add sysfs interface

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:
> From: "Edward A. James" 
>
> Add a generic mechanism to expose the sensors provided by the OCC in
> sysfs.
>
> Signed-off-by: Edward A. James 
> Signed-off-by: Andrew Jeffery 
> ---
>  Documentation/hwmon/occ   |  62 +++
>  drivers/hwmon/occ/Makefile|   2 +-
>  drivers/hwmon/occ/occ_sysfs.c | 251 
> ++
>  drivers/hwmon/occ/occ_sysfs.h |  30 +
>  4 files changed, 344 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/hwmon/occ/occ_sysfs.c
>  create mode 100644 drivers/hwmon/occ/occ_sysfs.h
>
> diff --git a/Documentation/hwmon/occ b/Documentation/hwmon/occ
> index 79d1642..d0bdf06 100644
> --- a/Documentation/hwmon/occ
> +++ b/Documentation/hwmon/occ
> @@ -25,6 +25,68 @@ Currently, all versions of the OCC support four types of 
> sensor data: power,
>  temperature, frequency, and "caps," which indicate limits and thresholds used
>  internally on the OCC.
>
> +sysfs Entries
> +-
> +
> +The OCC driver uses the hwmon sysfs framework to provide data to userspace.
> +
> +The driver exports a number of sysfs files for each type of sensor. The
> +sensor-specific files vary depending on the processor type, though many of 
> the
> +attributes are common for both the POWER8 and POWER9.
> +
> +The hwmon interface cannot define every type of sensor that may be used.
> +Therefore, the frequency sensor on the OCC uses the "input" type sensor 
> defined
> +by the hwmon interface, rather than defining a new type of custom sensor.
> +
> +Below are detailed the names and meaning of each sensor file for both types 
> of
> +processors. All sensors are read-only unless otherwise specified.  
> indicates
> +the hwmon index. sensor id indicates the unique internal OCC identifer. 
> Please
> +see the POWER OCC specification for details on all these sensor values.
> +
> +frequency:
> +   all processors:
> +   in_input - frequency value
> +   in_label - sensor id
> +temperature:
> +   POWER8:
> +   temp_input - temperature value
> +   temp_label - sensor id
> +   POWER9 (in addition to above):
> +   temp_type - FRU type
> +
> +power:
> +   POWER8:
> +   power_input - power value
> +   power_label - sensor id
> +   power_average - accumulator
> +   power_average_interval - update tag (number of samples in
> +   accumulator)
> +   POWER9:
> +   power_input - power value
> +   power_label - sensor id
> +   power_average_min - accumulator[0]
> +   power_average_max - accumulator[1] (64 bits total)
> +   power_average_interval - update tag
> +   power_reset_history - (function_id | (apss_channel << 8)
> +
> +caps:
> +   POWER8:
> +   power_cap - current powercap
> +   power_cap_max - max powercap
> +   power_cap_min - min powercap
> +   power_max - normal powercap
> +   power_alarm - user powercap, r/w
> +   POWER9:
> +   power_cap_alarm - user powercap source
> +
> +The driver also provides two sysfs entries through hwmon to better
> +control the driver and monitor the master OCC. Though there may be multiple
> +OCCs present on the system, these two files are only present for the "master"
> +OCC.
> +   name - read the name of the driver
> +   update_interval - read or write the minimum interval for polling the
> +   OCC.
> +
>  BMC - Host Communications
>  -
>
> diff --git a/drivers/hwmon/occ/Makefile b/drivers/hwmon/occ/Makefile
> index 93cb52f..a6881f9 100644
> --- a/drivers/hwmon/occ/Makefile
> +++ b/drivers/hwmon/occ/Makefile
> @@ -1 +1 @@
> -obj-$(CONFIG_SENSORS_PPC_OCC) += occ.o
> +obj-$(CONFIG_SENSORS_PPC_OCC) += occ.o occ_sysfs.o
> diff --git a/drivers/hwmon/occ/occ_sysfs.c b/drivers/hwmon/occ/occ_sysfs.c
> new file mode 100644
> index 000..9598f78
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ_sysfs.c
> @@ -0,0 +1,251 @@
> +/*
> + * occ_sysfs.c - OCC sysfs interface
> + *
> + * This file contains the methods and data structures for implementing the 
> OCC
> + * hwmon sysfs entries.
> + *
> + * Copyright 2016 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
>

Re: [PATCH linux v7 3/6] hwmon: occ: Add I2C transport implementation for SCOM operations

2017-02-09 Thread Joel Stanley

On Wed, Feb 8, 2017 at 9:40 AM,   wrote:
> From: "Edward A. James" 
>
> Add functions to send SCOM operations over I2C bus. The BMC can
> communicate with the Power8 host processor over I2C, but needs to use SCOM
> operations in order to access the OCC register space.

This doesn't need to be separate from the p8_occ_i2c.c file. You can
remove a layer of function calls by merging this in and having these
be your getscom putscom bus_ops callbacks.

> Signed-off-by: Edward A. James 
> Signed-off-by: Andrew Jeffery 
> ---
>  drivers/hwmon/occ/occ_scom_i2c.c | 77 
> 
>  drivers/hwmon/occ/occ_scom_i2c.h | 26 ++
>  2 files changed, 103 insertions(+)
>  create mode 100644 drivers/hwmon/occ/occ_scom_i2c.c
>  create mode 100644 drivers/hwmon/occ/occ_scom_i2c.h
>
> diff --git a/drivers/hwmon/occ/occ_scom_i2c.c 
> b/drivers/hwmon/occ/occ_scom_i2c.c
> new file mode 100644
> index 000..74bd6ff
> --- /dev/null
> +++ b/drivers/hwmon/occ/occ_scom_i2c.c
> @@ -0,0 +1,77 @@

> +
> +int occ_i2c_getscom(void *bus, u32 address, u64 *data)
> +{
> +   ssize_t rc;
> +   u64 buf;

If you add endianness annotations sparse can check your types are
consistent. The warning looks like this:

make C=2 drivers/hwmon/occ/occ_scom_i2c.o
drivers/hwmon/occ/occ_scom_i2c.c:48:17: warning: cast to restricted __be64

Which tells you it expects the type you pass to be64_to_cpu to be __be64.


> +   struct i2c_client *client = bus;
> +   struct i2c_msg msgs[2];
> +
> +   msgs[0].addr = client->addr;
> +   msgs[0].flags = client->flags & I2C_M_TEN;
> +   msgs[0].len = sizeof(u32);
> +   msgs[0].buf = (char *)&address;
> +
> +   msgs[1].addr = client->addr;
> +   msgs[1].flags = client->flags & I2C_M_TEN;
> +   msgs[1].flags |= I2C_M_RD;

I first thought you had made a mistake here. Instead you could do:

   msgs[1].flags = client->flags & I2C_M_TEN | I2C_M_RD;

> +   msgs[1].len = sizeof(u64);
> +   msgs[1].buf = (char *)&buf;
> +
> +   rc = i2c_transfer(client->adapter, msgs, 2);
> +   if (rc < 0)
> +   return rc;
> +

Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-09 Thread Jason Wang




On 2017年02月10日 02:10, Michael S. Tsirkin wrote:

On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote:

- Original Message -

Hello,

I've got the following report while running syzkaller fuzzer on mmotm
(git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:


[...]


other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

CPU0CPU1

   lock(&(&r->consumer_lock)->rlock);
local_irq_disable();
lock(&(&r->producer_lock)->rlock);
lock(&(&r->consumer_lock)->rlock);
   
 lock(&(&r->producer_lock)->rlock);


Thanks a lot for the testing.

Looks like we could address this by using skb_array_consume_bh() instead.

Could you pls verify if the following patch works?

I think we should use _bh for the produce call as well,
since resizing takes the producer lock.


Looks not since irq was disabled during resizing?

Thanks

Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-09 Thread Jason Wang




On 2017年02月09日 18:49, Dmitry Vyukov wrote:

On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang  wrote:

- Original Message -

Hello,

I've got the following report while running syzkaller fuzzer on mmotm
(git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:


[...]


other info that might help us debug this:

  Possible interrupt unsafe locking scenario:

CPU0CPU1

   lock(&(&r->consumer_lock)->rlock);
local_irq_disable();
lock(&(&r->producer_lock)->rlock);
lock(&(&r->consumer_lock)->rlock);
   
 lock(&(&r->producer_lock)->rlock);


Thanks a lot for the testing.

Looks like we could address this by using skb_array_consume_bh() instead.

Could you pls verify if the following patch works?

No, I can't test it, sorry. This happened once on bots. And bots
currently test only upstream versions.




No problem, will try to test my self.

Thanks

Re: [PATCH] jump_label: mark start/stop_jump_table const

2017-02-09 Thread kbuild test robot

Hi Michael,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.10-rc7 next-20170209]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Michael-S-Tsirkin/jump_label-mark-__start-stop___jump_table-const/20170210-104623
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   kernel/jump_label.c: In function 'jump_label_init':
>> kernel/jump_label.c:284:34: warning: initialization discards 'const' 
>> qualifier from pointer target type [-Wdiscarded-qualifiers]
 struct jump_entry *iter_start = __start___jump_table;
 ^~~~
   kernel/jump_label.c:285:33: warning: initialization discards 'const' 
qualifier from pointer target type [-Wdiscarded-qualifiers]
 struct jump_entry *iter_stop = __stop___jump_table;
^~~
   kernel/jump_label.c: In function 'jump_label_text_reserved':
>> kernel/jump_label.c:549:39: warning: passing argument 1 of 
>> '__jump_label_text_reserved' discards 'const' qualifier from pointer target 
>> type [-Wdiscarded-qualifiers]
 int ret = __jump_label_text_reserved(__start___jump_table,
  ^~~~
   kernel/jump_label.c:210:12: note: expected 'struct jump_entry *' but 
argument is of type 'const struct jump_entry *'
static int __jump_label_text_reserved(struct jump_entry *iter_start,
   ^~
   kernel/jump_label.c:550:4: warning: passing argument 2 of 
'__jump_label_text_reserved' discards 'const' qualifier from pointer target 
type [-Wdiscarded-qualifiers]
   __stop___jump_table, start, end);
   ^~~
   kernel/jump_label.c:210:12: note: expected 'struct jump_entry *' but 
argument is of type 'const struct jump_entry *'
static int __jump_label_text_reserved(struct jump_entry *iter_start,
   ^~
   kernel/jump_label.c: In function 'jump_label_update':
   kernel/jump_label.c:563:28: warning: initialization discards 'const' 
qualifier from pointer target type [-Wdiscarded-qualifiers]
 struct jump_entry *stop = __stop___jump_table;
   ^~~

vim +/const +284 kernel/jump_label.c

706249c22 Peter Zijlstra  2015-07-24  268   struct 
jump_entry *entry,
706249c22 Peter Zijlstra  2015-07-24  269   struct 
jump_entry *stop)
706249c22 Peter Zijlstra  2015-07-24  270  {
706249c22 Peter Zijlstra  2015-07-24  271   for (; (entry < stop) && 
(jump_entry_key(entry) == key); entry++) {
706249c22 Peter Zijlstra  2015-07-24  272   /*
706249c22 Peter Zijlstra  2015-07-24  273* entry->code set to 0 
invalidates module init text sections
706249c22 Peter Zijlstra  2015-07-24  274* 
kernel_text_address() verifies we are not in core kernel
706249c22 Peter Zijlstra  2015-07-24  275* init code, see 
jump_label_invalidate_module_init().
706249c22 Peter Zijlstra  2015-07-24  276*/
706249c22 Peter Zijlstra  2015-07-24  277   if (entry->code && 
kernel_text_address(entry->code))
706249c22 Peter Zijlstra  2015-07-24  278   
arch_jump_label_transform(entry, jump_label_type(entry));
706249c22 Peter Zijlstra  2015-07-24  279   }
706249c22 Peter Zijlstra  2015-07-24  280  }
706249c22 Peter Zijlstra  2015-07-24  281  
97ce2c88f Jeremy Fitzhardinge 2011-10-12  282  void __init jump_label_init(void)
bf5438fca Jason Baron 2010-09-17  283  {
bf5438fca Jason Baron 2010-09-17 @284   struct jump_entry *iter_start = 
__start___jump_table;
bf5438fca Jason Baron 2010-09-17  285   struct jump_entry *iter_stop = 
__stop___jump_table;
c5905afb0 Ingo Molnar 2012-02-24  286   struct static_key *key = NULL;
bf5438fca Jason Baron 2010-09-17  287   struct jump_entry *iter;
bf5438fca Jason Baron 2010-09-17  288  
1f69bf9c6 Jason Baron 2016-08-03  289   /*
1f69bf9c6 Jason Baron 2016-08-03  290* Since we are initializing 
the static_key.enabled field with
1f69bf9c6 Jason Baron 2016-08-03  291* with the 'raw' int values 
(to avoid pulling in atomic.h) in
1f69bf9c6 Jason Baron 2016-08-03  292* jump_label.h, let's make 
sure that is safe. There are only two

:: The code at line 284 was first introduced by commit
:: bf5438fca2950b03c21ad868090cc1a8fcd49

Re: [PATCH 4.4 05/48] mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

2017-02-09 Thread Dan Williams

On Thu, Feb 9, 2017 at 7:26 AM, Ben Hutchings  wrote:
> On Wed, 2017-01-18 at 11:46 +0100, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> --
>>
>> From: Dan Williams 
>>
>> commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.
>>
>> Both arch_add_memory() and arch_remove_memory() expect a single threaded
>> context.
> [...]
>> The result is that two threads calling devm_memremap_pages()
>> simultaneously can end up colliding on pgd initialization.  This leads
>> to crash signatures like the following where the loser of the race
>> initializes the wrong pgd entry:
> [...]
>> Hold the standard memory hotplug mutex over calls to
>> arch_{add,remove}_memory().
> [...]
>
> This is not a sufficient fix, because memory_hotplug.c still assumes
> there's only one 'writer':
>
> void put_online_mems(void)
> {
> ...
> if (!--mem_hotplug.refcount && unlikely(mem_hotplug.active_writer))
> wake_up_process(mem_hotplug.active_writer);
> ...
> }
>
> void mem_hotplug_begin(void)
> {
> mem_hotplug.active_writer = current;
>
> memhp_lock_acquire();
> for (;;) {
> mutex_lock(&mem_hotplug.lock);
> if (likely(!mem_hotplug.refcount))
> break;
> __set_current_state(TASK_UNINTERRUPTIBLE);
> mutex_unlock(&mem_hotplug.lock);
> schedule();
> }
> }
>
> With multiple writers, one or more of them may hang or
> {get,put}_online_mems() may mess up the hotplug reference count.

You're right. We need to hold lock_device_hotplug_sysfs() before
calling mem_hotplug_begin().  I'll take a look at a follow-on fix and
also add an assert_held_device_hotplug() helper to catch this in the
future.

Re: [PATCH v8 3/3] dmaengine: pl330: Don't require irq-safe runtime PM

2017-02-09 Thread Vinod Koul

On Thu, Feb 09, 2017 at 03:22:51PM +0100, Marek Szyprowski wrote:

> +static int pl330_set_slave(struct dma_chan *chan, struct device *slave)
> +{
> + struct dma_pl330_chan *pch = to_pchan(chan);
> + struct pl330_dmac *pl330 = pch->dmac;
> + int i;
> +
> + mutex_lock(&pl330->rpm_lock);
> +
> + for (i = 0; i < pl330->num_peripherals; i++) {
> + if (pl330->peripherals[i].chan.slave == slave &&
> + pl330->peripherals[i].slave_link) {
> + pch->slave_link = pl330->peripherals[i].slave_link;
> + goto done;
> + }
> + }
> +
> + pch->slave_link = device_link_add(slave, pl330->ddma.dev,
> +DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);

So you are going to add the link on channel allocation and tear down on the
freeup. I am not sure I really like the idea here.

First, these thing shouldn't be handled in the drivers. These things should
be set in core and each driver setting the links doesn't sound great to me.

Second, should the link be always there and we only mange the state? Here it
seems that we have link being created and destroyed, so why not mark it
ACTIVE and DORMANT instead...

Lastly, looking at th description of the issue here, am perceiving (maybe my
understanding is not quite right here) that you have an IP block in SoC
which has multiple things and share common stuff and doing right PM is a
challenge for you, right?

-- 
~Vinod

Re: [PATCH v8 1/3] dmaengine: Add new device_{set,release}_slave callbacks

2017-02-09 Thread Vinod Koul

On Thu, Feb 09, 2017 at 03:22:49PM +0100, Marek Szyprowski wrote:
> Add two new callbacks to DMA engine device. They will used to provide
> access to slave device (the device which requested given DMA channel)

You mean access to client devices?

> for DMA engine driver. Access to slave device might be useful for example
> for implementing advanced runtime power management.
> 
> DMA slave channels are exclusive, so only one slave device can be set
> for a given DMA slave channel.

That is not a right assumption and my worry here. With virt-dma we don't
really assume a hardware channel and exclusive. Certain implementation may
do that but from framework we cannot assume that.

> device_set_slave() will be called after the device_alloc_chan_resources()
> and device_release_slave() before the device_free_chan_resources().

Okay, I had to relook at the series to get around this part. Sorry but we
can't call it set_slave, it is actually set_client/consumer

In our context slaves means dmaengine slave devices aka provider.
Client would be the consumer and not slave.

> Signed-off-by: Marek Szyprowski 
> ---
>  drivers/dma/dmaengine.c   | 27 ---
>  include/linux/dmaengine.h | 10 ++
>  2 files changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 24e0221fd66d..5b7089d8be4d 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -705,6 +705,7 @@ struct dma_chan *dma_request_chan(struct device *dev, 
> const char *name)
>  {
>   struct dma_device *d, *_d;
>   struct dma_chan *chan = NULL;
> + int ret;
>  
>   /* If device-tree is present get slave info from here */
>   if (dev->of_node)
> @@ -715,8 +716,9 @@ struct dma_chan *dma_request_chan(struct device *dev, 
> const char *name)
>   chan = acpi_dma_request_slave_chan_by_name(dev, name);
>  
>   if (chan) {
> - /* Valid channel found or requester need to be deferred */
> - if (!IS_ERR(chan) || PTR_ERR(chan) == -EPROBE_DEFER)
> + if (!IS_ERR(chan))
> + goto found;
> + if (PTR_ERR(chan) == -EPROBE_DEFER)
>   return chan;
>   }
>  
> @@ -738,7 +740,21 @@ struct dma_chan *dma_request_chan(struct device *dev, 
> const char *name)
>   }
>   mutex_unlock(&dma_list_mutex);
>  
> - return chan ? chan : ERR_PTR(-EPROBE_DEFER);
> + if (!chan)
> + return ERR_PTR(-EPROBE_DEFER);
> + if (IS_ERR(chan))
> + return chan;
> +found:
> + if (chan->device->device_set_slave) {
> + chan->slave = dev;
> + ret = chan->device->device_set_slave(chan, dev);
> + if (ret) {
> + chan->slave = NULL;
> + dma_release_channel(chan);
> + chan = ERR_PTR(ret);
> + }
> + }
> + return chan;
>  }
>  EXPORT_SYMBOL_GPL(dma_request_chan);
>  
> @@ -786,6 +802,11 @@ void dma_release_channel(struct dma_chan *chan)
>   mutex_lock(&dma_list_mutex);
>   WARN_ONCE(chan->client_count != 1,
> "chan reference count %d != 1\n", chan->client_count);
> + if (chan->slave) {
> + if (chan->device->device_release_slave)
> + chan->device->device_release_slave(chan);
> + chan->slave = NULL;
> + }
>   dma_chan_put(chan);
>   /* drop PRIVATE cap enabled by __dma_request_channel() */
>   if (--chan->device->privatecnt == 0)
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 533680860865..d22299e37e69 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -277,6 +277,9 @@ struct dma_chan {
>   struct dma_router *router;
>   void *route_data;
>  
> + /* Only for SLAVE channels */
> + struct device *slave;

so assuming you refer to consumer aka client here, why do we need set if we
store it here.

> +
>   void *private;
>  };
>  
> @@ -686,6 +689,10 @@ struct dma_filter {
>   * @device_alloc_chan_resources: allocate resources and return the
>   *   number of allocated descriptors
>   * @device_free_chan_resources: release DMA channel's resources
> + * @device_set_slave: provide access to the slave device, which requested
> + *   given DMA channel, called after @device_alloc_chan_resources
> + * @device_release_slave: finishes access to the slave device, called
> + *   before @device_free_chan_resources
>   * @device_prep_dma_memcpy: prepares a memcpy operation
>   * @device_prep_dma_xor: prepares a xor operation
>   * @device_prep_dma_xor_val: prepares a xor validation operation
> @@ -746,6 +753,9 @@ struct dma_device {
>   int (*device_alloc_chan_resources)(struct dma_chan *chan);
>   void (*device_free_chan_resources)(struct dma_chan *chan);
>  
> + int (*device_set_slave)(struct dma_chan *chan, struct device *slave);
> + void (*device_release_slave)(struct

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Eric Dumazet

On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
 wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in acceessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?
>

We have probably a dozen of bugs to fix in af_packet.c

The race in fanout_add() is one ot theml.

I do not believe Anoob Soman sent his fixes btw ...

( Look for this thread : http://marc.info/?l=linux-netdev&m=148588680525648&w=2

[PATCH] staging: r8712u: remove unnecessary le32_to_cpu

2017-02-09 Thread Perry Hooker

This patch fixes the following sparse warning:
drivers/staging/rtl8712/usb_ops_linux.c:212:33: warning: cast to restricted 
__le32

Signed-off-by: Perry Hooker 
---
 drivers/staging/rtl8712/usb_ops_linux.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8712/usb_ops_linux.c 
b/drivers/staging/rtl8712/usb_ops_linux.c
index fc6bb0b..259ef8f 100644
--- a/drivers/staging/rtl8712/usb_ops_linux.c
+++ b/drivers/staging/rtl8712/usb_ops_linux.c
@@ -209,7 +209,7 @@ static void r8712_usb_read_port_complete(struct urb *purb)
 
precvbuf->transfer_len = purb->actual_length;
pbuf = (uint *)precvbuf->pbuf;
-   isevt = le32_to_cpu(*(pbuf + 1)) & 0x1ff;
+   isevt = *(pbuf + 1) & 0x1ff;
if ((isevt & 0x1ff) == 0x1ff) {
r8712_rxcmd_event_hdl(padapter, pbuf);
skb_queue_tail(&precvpriv->rx_skb_queue, pskb);
-- 
2.4.11

Re: [PATCH v1] platform/x86: intel_pmc_ipc: fix division in 32-bit case

2017-02-09 Thread Rajneesh Bhardwaj

On Wed, Feb 08, 2017 at 06:00:28PM +0200, Andy Shevchenko wrote:
> On 32-bit x86 platforms we can't do 64-bit divisions:
> 
>   ERROR: "__udivdi3" [drivers/platform/x86/intel_pmc_ipc.ko] undefined!
> 
> Replace plain division by do_div() macro call.
> 
> Reported-by: Darren Hart 
> Cc: Shanth Murthy 
> Cc: Rajneesh Bhardwaj 
> Signed-off-by: Andy Shevchenko 
> ---
>  drivers/platform/x86/intel_pmc_ipc.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/intel_pmc_ipc.c 
> b/drivers/platform/x86/intel_pmc_ipc.c
> index 46202b6e7c87..8ad4d7b43423 100644
> --- a/drivers/platform/x86/intel_pmc_ipc.c
> +++ b/drivers/platform/x86/intel_pmc_ipc.c
> @@ -62,7 +62,12 @@
>  #define GCR_TELEM_SHLW_S0IX_OFFSET   0x1080
>  
>  /* Residency with clock rate at 19.2MHz to usecs */
> -#define S0IX_RESIDENCY_IN_USECS(d, s)(((d) + (s)) * 10 / 192)
> +#define S0IX_RESIDENCY_IN_USECS(d, s)\
> +({   \
> + u64 result = 10ull * ((d) + (s));   \
> + do_div(result, 192);\
> + result; \
> +})
> 

Looks good, we verified it. We are planning to squash this one and send V2.
Hope thats fine?
 
>  /*
>   * 16-byte buffer for sending data associated with IPC command.
> -- 
> 2.11.0
> 

-- 
Best Regards,
Rajneesh

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread David Miller

From: Eric Biggers 
Date: Thu, 9 Feb 2017 10:29:14 -0800

> On Thu, Feb 09, 2017 at 12:02:11PM +0100, Sven Schmidt wrote:
>> > 
>> > [Also, for some reason linux-crypto is apparently still not receiving 
>> > patch 1/5
>> > in the series.  It's missing from the linux-crypto archive at
>> > http://www.spinics.net/lists/linux-crypto/, so it's not just me.]
>> > 
>> 
>> I don't really know what to do about this. I think the matter is the size of 
>> the E-Mail.
>> Are there filters or something like that? Since in linux-kernel the patch 
>> seems to get delivered.
>> I could otherwise CC you if you wish.
>> 
> 
> If I'm not mistaken, David Miller is the admin of the mail server on
> vger.kernel.org, and he already happens to be Cc'ed on this thread, so maybe 
> he
> can answer as to why linux-crypto may be configured differently?
> 
> Anyway, since the patch did make it to linux-kernel anyone can still download 
> it
> from patchwork if they know where to look:
> https://patchwork.kernel.org/patch/9556271/

I've increased the maxlength parameter for linux-cryto so this doesn't
happen again.

Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support

2017-02-09 Thread Namhyung Kim

Hi Tom,

On Wed, Feb 08, 2017 at 11:24:56AM -0600, Tom Zanussi wrote:
> This patchset adds support for 'inter-event' quantities to the trace
> event subsystem.  The most important example of inter-event quantities
> are latencies, or the time differences between two events.
> 
> One of the main motivations for adding this capability is to provide a
> general-purpose base that existing existing tools such as the -RT
> latency_hist patchset can be built upon, while at the same time
> providing a simple way for users to track latencies (or any
> inter-event quantity) generically between any two events.
> 
> Previous -RT latency_hist patchsets that take advantage of the trace
> event subsystem have been submitted, but they essentially hard-code
> special-case tracepoints and application logic in ways that can't be
> reused.  It seemed to me that rather than providing a one-off patchset
> devoted specifically to generating the specific histograms in the
> latency_hist patchset, it should be possible to build the same
> functionality on top of a generic layer allowing users to do similar
> things for other non-latency_hist applications.
> 
> In addition to preliminary patches that add some basic missing
> functionality such as a common ringbuffer-derived timestamp and
> dynamically-creatable tracepoints, the overall patchset is divided up
> into a few different areas that combine to produce the overall goal
> (The Documentation patch explains all the details):

Looks very nice!

> 
>   - variables and simple expressions required to calculate a latency
> 
> In order to calculate a latency or any inter-event value,
> something from one event needs to be saved and later retrieved,
> and some operation such as subtraction or addition is performed on
> it.  This means some minimal form of variables and expressions,
> which the first set of patches implements.  Saving and retrieving
> events to use in a latency calculation is normally done using a
> hash table, and that's exactly what we have with trace event hist
> triggers, so that's where variables are instantiated, set, and
> retrieved.  Basically, variables are set on one entry and
> retrieved and used by a 'matching' event.
> 
>   - 'synthetic' events, combining variables from other events
> 
> The trace event interface is based on pseudo-files associated with
> individual events, so it wouldn't really make sense to have
> quantities derived from multiple events attached to any one of
> those events.  For that reason, the patchset implements a means of
> combining variables from other events into a separate 'synthetic'
> event, which can be treated as if it were just like any other
> trace event in the system.
> 
>   - 'actions' generating synthetic events, among other things
> 
> Variables and synthetic events provide the data and data structure
> for new events, but something still needs to actually generate an
> event using that data.  'Actions' are expanded to provide that
> capability.  Though it hasn't been explicitly called as much
> before, the default 'action' currently for a hist trigger is to
> update the matching histogram entry's sum values.  This patchset
> essentially expands that to provide a new 'onmatch.trace(event)'
> action that can be used to have one event generate another.  The
> mechanism is extensible to other actions, and in fact the patchset
> also includes another, 'onmax(var).save(field,...)' that can be
> used to save context whenever a value exceeds the previous maximum
> (something also needed by latency_hist).
> 
> I'm submitting the patchset (based on tracing/for-next) as an RFC not
> only to get comments, but because there are still some problems I
> haven't fixed yet...
> 
> Here are some examples that should make things less abstract.
> 
>   
>   Example - wakeup latency
>   
> 
>   This basically implements the -RT latency_hist 'wakeup_latency'
>   histogram using the synthetic events, variables, and actions
>   described.  The output below is from a run of cyclictest using the
>   following command:
> 
> # rt-tests/cyclictest -p 80 -n -s -t 2
> 
>   What we're measuring the latency of is the time between when a
>   thread (of cyclictest) is awakened and when it's scheduled in.  To
>   do that we add triggers to sched_wakeup and sched_switch with the
>   appropriate variables, and on a matching sched_switch event,
>   generate a synthetic 'wakeup_latency' event.  Since it's just
>   another trace event like any other, we can also define a histogram
>   on that event, the output of which is what we see displayed when
>   reading the wakeup_latency 'hist' file.
> 
>   First, we create a synthetic event called wakeup_latency, that
>   references 3 variables from other events:
> 
> # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
>pid=sched_switch:woken_pid \
>

[PATCH] sched: Enhance readability of iterating wake_list

2017-02-09 Thread Byungchul Park

This is just for enhancing readability and a trivial one. Please take
this only if you agree.

-8<-
>From bb8cfcaa21ce157fb890a3e878fed2f2b7dadf81 Mon Sep 17 00:00:00 2001
From: Byungchul Park 
Date: Fri, 10 Feb 2017 12:53:14 +0900
Subject: [PATCH] sched: Enhance readability of iterating wake_list

Code for interating wake_list can be simpler so that it becomes more
readable and can be seen at a glance.

Signed-off-by: Byungchul Park 
---
 kernel/sched/core.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d01f9d0..66e7743 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1769,6 +1769,11 @@ static int ttwu_remote(struct task_struct *p, int 
wake_flags)
 }
 
 #ifdef CONFIG_SMP
+#define for_each_wake_list(task, node) \
+   for ((task) = llist_entry((node), struct task_struct, wake_entry); \
+node; (node) = llist_next(node), \
+(task) = llist_entry((node), struct task_struct, wake_entry))
+
 void sched_ttwu_pending(void)
 {
struct rq *rq = this_rq();
@@ -1783,17 +1788,8 @@ void sched_ttwu_pending(void)
raw_spin_lock_irqsave(&rq->lock, flags);
rq_pin_lock(rq, &rf);
 
-   while (llist) {
-   int wake_flags = 0;
-
-   p = llist_entry(llist, struct task_struct, wake_entry);
-   llist = llist_next(llist);
-
-   if (p->sched_remote_wakeup)
-   wake_flags = WF_MIGRATED;
-
-   ttwu_do_activate(rq, p, wake_flags, &rf);
-   }
+   for_each_wake_list(p, llist)
+   ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 
0, &rf);
 
rq_unpin_lock(rq, &rf);
raw_spin_unlock_irqrestore(&rq->lock, flags);
-- 
1.9.1

RE: [PATCH] ARCv2: intc: Disable all core interrupts by default

2017-02-09 Thread Yuriy Kolerov

Hi Vineet,

> -Original Message-
> From: Vineet Gupta [mailto:vgu...@synopsys.com]
> Sent: Wednesday, February 08, 2017 7:08 PM
> To: Yuriy Kolerov ; linux-snps-
> a...@lists.infradead.org
> Cc: alexey.brod...@synopsys.com; linux-kernel@vger.kernel.org;
> marc.zyng...@arm.com
> Subject: Re: [PATCH] ARCv2: intc: Disable all core interrupts by default
> 
> On 02/07/2017 03:04 PM, Yuriy Kolerov wrote:
> > The kernel emits a lot of warnings about unexpected IRQs when an
> > appropriate driver is not presented. It happens because all interrupts
> > in the core controller are enabled by default after reset. It would be
> > wise to keep all interrupts masked by default.
> >
> > Thus disable all local and common interrupts. If CPU consists of only
> > 1 core without IDU then it is necessary to disable all interrupts in
> > the core interrupt controller. If CPU contains IDU it means that there
> > are may be more than 1 cores and common interrupts (>= FIRST_EXT_IRQ)
> > must be disabled in IDU.
> 
> This is not elegant. core intc needs to do same thing to all interrupts coming
> in
> - irrespective of whether they are funneled via IDU or not.
> 
> 
> > Signed-off-by: Yuriy Kolerov 
> > ---
> >  arch/arc/kernel/intc-arcv2.c | 19 +++
> 
> [snip...]
> 
> > +
> > for (i = NR_EXCEPTIONS; i < irq_bcr.irqs + NR_EXCEPTIONS; i++) {
> > write_aux_reg(AUX_IRQ_SELECT, i);
> > write_aux_reg(AUX_IRQ_PRIORITY, ARCV2_IRQ_DEF_PRIO);
> > +
> > +   /*
> > +* If IDU exists then all common interrupts >= FIRST_EXT_IRQ
> > +* are masked by IDU thus disable only local interrupts (below
> > +* FIRST_EXT_IRQ). Otherwise disable all interrupts.
> > +*/
> > +   if (!mp.idu || i < FIRST_EXT_IRQ)
> > +   write_aux_reg(AUX_IRQ_ENABLE, 0);
> 
> So you seem to assume that anything > FIRST_EXT_IRQ is coming in via IDU
> which may not be case.
> external Interrupts can be wired to core directly - not via IDU
>  - 16 .. 23 are cpu private reserved irq
>  - 24 .. C are common irqs (via IDU)
>  - C + 1 .. N are cpu private external irqs
> 
> so better to disable all irq_bcr.irqs independent of how they are hooked up !

All IRQs >= 24 in ARC are marked as common interrupts and it is reasonable. It 
means that when "enable" or "mask" is called for hwirq < 24 then these 
functions are called on all cores. On the other hand these functions are called 
once only on 1 core for common interrupts. Then we have 3 cases:

1. We have UP and everything is easy. Just disable everything by default. When 
a driver comes up an appropriate IRQ is enable on single core.
2. We have SMP with IDU. We can enable/disable common interrupts in IDU and 
keep core interrupts enabled/unmasked by default. But what happens if a device 
is connected to one of the cores directly (is it even possible on SMP 
systems?)? Device Tree does not contain such information and the kernel does 
not know how it enable external IRQ for the specific core (as far as I know).
3. Assume that all core interrupts on all cores are disabled by default. When 
chained IRQ is created (IDU -> core intc) IDU automatically calls "enable" for 
an appropriate IRQ of core intc. But this "enable" is called only for 1 core so 
it is necessary to find a way to call "enable" on the rest of cores. I have a 
solution which solves this problem using "smp_call_function_single_async" in 
"enable" function of the core intc for IRQs >= 24 but I think that it may be 
overkill for such problem. Anyway I cannot find a solution for the same problem 
in other archs...

> -Vineet

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 925 matches

Mail list logo