date:20170327

[tip:perf/core] perf report: Introduce --inline option

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  f3a60646cc3e0524d8f1083db1da7532a1590b40
Gitweb: http://git.kernel.org/tip/f3a60646cc3e0524d8f1083db1da7532a1590b40
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:27 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:01:46 -0300

perf report: Introduce --inline option

Looking up the inline stack for callgraph addresses takes some time, so
add a new "--inline" option that lets the user decide whether to enable
this feature.

  --inline:

  If a callgraph address belongs to an inlined function, the inline stack
  will be printed. Each entry is the inline function name or file/line.

Signed-off-by: Yao Jin 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: http://lkml.kernel.org/r/1490474069-15823-4-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-report.txt | 4 ++++
 tools/perf/builtin-report.c              | 2 ++
 tools/perf/util/symbol.h                 | 3 ++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e9a61f5..248bba4 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -430,6 +430,10 @@ include::itrace.txt[]
 --hierarchy::
    Enable hierarchical output.
 
+--inline::
+   If a callgraph address belongs to an inlined function, the inline stack
+   will be printed. Each entry is function name or file/line.
+
 include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3c8885a..c18158b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -845,6 +845,8 @@ int cmd_report(int argc, const char **argv)
     stdio__config_color, "always"),
    OPT_STRING(0, "time", &report.time_str, "str",
               "Time span of interest (start,stop)"),
+   OPT_BOOLEAN(0, "inline", &symbol_conf.inline_name,
+               "Show inline function"),
    OPT_END()
    };
    struct perf_data_file file = {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index e36213c..5245d2f 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -118,7 +118,8 @@ struct symbol_conf {
    show_ref_callgraph,
    hide_unresolved,
    raw_trace,
-   report_hierarchy;
+   report_hierarchy,
+   inline_name;
    const char  *vmlinux_name,
    *kallsyms_name,
    *source_prefix,

[tip:perf/core] perf report: Refactor common code in srcline.c

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  5580338d0f207921bc1fef5b668cd564adcc3419
Gitweb: http://git.kernel.org/tip/5580338d0f207921bc1fef5b668cd564adcc3419
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:25 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 11:59:23 -0300

perf report: Refactor common code in srcline.c

Factor dso__name() and filename_split() out of existing code, because
this code will be used in several places in the next patch.

filename_split() may also fix a potential memory leak in the existing
code. The existing addr2line() ends with:

        sep = strchr(filename, ':');
        if (sep) {
                *sep++ = '\0';
                *file = filename;
                *line_nr = strtoul(sep, NULL, 0);
                ret = 1;
        }

out:
        pclose(fp);
        return ret;

If sep is NULL, filename is not freed or returned via file.
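
The ownership rule behind the fix can be sketched as follows. The
filename_split() body mirrors the helper added by this patch; the caller
parse_srcline() is hypothetical and only illustrates that every failure
path, including the "no ':' found" case, frees the buffer, while success
hands it to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Same logic as the filename_split() helper introduced by this patch. */
int filename_split(char *filename, unsigned int *line_nr)
{
	char *sep;

	/* Drop the trailing newline that addr2line prints. */
	sep = strchr(filename, '\n');
	if (sep)
		*sep = '\0';

	/* "??:0" is what addr2line prints when it has no information. */
	if (!strcmp(filename, "??:0"))
		return 0;

	sep = strchr(filename, ':');
	if (sep) {
		*sep++ = '\0';
		*line_nr = strtoul(sep, NULL, 0);
		return 1;
	}

	return 0;
}

/*
 * Hypothetical caller: copies one line of addr2line-style output and
 * transfers ownership of the file name to *file only on success.
 */
int parse_srcline(const char *output, char **file, unsigned int *line_nr)
{
	size_t len = strlen(output) + 1;
	char *filename = malloc(len);

	if (filename == NULL)
		return 0;
	memcpy(filename, output, len);

	if (filename_split(filename, line_nr) != 1) {
		free(filename);	/* covers "??:0" and the missing-':' case */
		return 0;
	}

	*file = filename;	/* the caller must free() this */
	return 1;
}
```

With a single exit test on the return value of filename_split(), there is
no path left on which the buffer is neither freed nor returned.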

Signed-off-by: Yao Jin 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: http://lkml.kernel.org/r/1490474069-15823-2-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/srcline.c | 68 +++
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index b4db3f4..2953c9f 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -12,6 +12,24 @@
 
 bool srcline_full_filename;
 
+static const char *dso__name(struct dso *dso)
+{
+   const char *dso_name;
+
+   if (dso->symsrc_filename)
+   dso_name = dso->symsrc_filename;
+   else
+   dso_name = dso->long_name;
+
+   if (dso_name[0] == '[')
+   return NULL;
+
+   if (!strncmp(dso_name, "/tmp/perf-", 10))
+   return NULL;
+
+   return dso_name;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -207,6 +225,27 @@ void dso__free_a2l(struct dso *dso)
 
 #else /* HAVE_LIBBFD_SUPPORT */
 
+static int filename_split(char *filename, unsigned int *line_nr)
+{
+   char *sep;
+
+   sep = strchr(filename, '\n');
+   if (sep)
+   *sep = '\0';
+
+   if (!strcmp(filename, "??:0"))
+   return 0;
+
+   sep = strchr(filename, ':');
+   if (sep) {
+   *sep++ = '\0';
+   *line_nr = strtoul(sep, NULL, 0);
+   return 1;
+   }
+
+   return 0;
+}
+
 static int addr2line(const char *dso_name, u64 addr,
 char **file, unsigned int *line_nr,
 struct dso *dso __maybe_unused,
@@ -216,7 +255,6 @@ static int addr2line(const char *dso_name, u64 addr,
    char cmd[PATH_MAX];
    char *filename = NULL;
    size_t len;
-   char *sep;
    int ret = 0;
 
    scnprintf(cmd, sizeof(cmd), "addr2line -e %s %016"PRIx64,
@@ -233,23 +271,14 @@ static int addr2line(const char *dso_name, u64 addr,
    goto out;
    }
 
-   sep = strchr(filename, '\n');
-   if (sep)
-   *sep = '\0';
-
-   if (!strcmp(filename, "??:0")) {
-   pr_debug("no debugging info in %s\n", dso_name);
+   ret = filename_split(filename, line_nr);
+   if (ret != 1) {
    free(filename);
    goto out;
    }
 
-   sep = strchr(filename, ':');
-   if (sep) {
-   *sep++ = '\0';
-   *file = filename;
-   *line_nr = strtoul(sep, NULL, 0);
-   ret = 1;
-   }
+   *file = filename;
+
 out:
    pclose(fp);
    return ret;
@@ -278,15 +307,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
    if (!dso->has_srcline)
    goto out;
 
-   if (dso->symsrc_filename)
-   dso_name = dso->symsrc_filename;
-   else
-   dso_name = dso->long_name;
-
-   if (dso_name[0] == '[')
-   goto out;
-
-   if (!strncmp(dso_name, "/tmp/perf-", 10))
+   dso_name = dso__name(dso);
+   if (dso_name == NULL)
    goto out;
 
    if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines))

[tip:perf/core] perf report: Enable sorting by srcline as key

2017-03-27 Thread tip-bot for Milian Wolff

Commit-ID:  5dfa210e407d0fedf746958bff206995bd46570d
Gitweb: http://git.kernel.org/tip/5dfa210e407d0fedf746958bff206995bd46570d
Author: Milian Wolff 
AuthorDate: Sat, 18 Mar 2017 22:49:28 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:13:28 -0300

perf report: Enable sorting by srcline as key

Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
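
The aggregation that previously had to be done by hand can be sketched
like this. It is an illustration only: perf merges hist entries through
its own sort-key machinery, while this example uses a linear scan over a
small fixed-size array. The struct names, MAX_KEYS and the sample values
are all hypothetical.

```c
#include <string.h>

#define MAX_KEYS 16

/* Hypothetical sample: one address already resolved to a source line. */
struct sample {
	const char *srcline;	/* e.g. "random.tcc:3326" */
	unsigned long period;	/* cost attributed to this address */
};

struct srcline_sum {
	const char *srcline;
	unsigned long period;
};

/*
 * Merge samples that share a srcline into one entry each.
 * Returns the number of distinct source lines written to out[]
 * (at most MAX_KEYS; additional new keys are ignored).
 */
size_t sum_by_srcline(const struct sample *samples, size_t nr,
		      struct srcline_sum out[MAX_KEYS])
{
	size_t nr_out = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t j;

		/* Look for an existing entry with the same key. */
		for (j = 0; j < nr_out; j++) {
			if (!strcmp(out[j].srcline, samples[i].srcline))
				break;
		}
		if (j == nr_out) {
			if (nr_out == MAX_KEYS)
				continue;
			out[nr_out].srcline = samples[i].srcline;
			out[nr_out].period = 0;
			nr_out++;
		}
		out[j].period += samples[i].period;
	}
	return nr_out;
}
```

Several addresses that map to the same source line collapse into one
entry whose cost is the sum of the individual costs.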

Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ workloads.

The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:

  
  $ perf report --stdio --inline -g address
  # Children      Self  Command       Shared Object  Symbol
  # ........  ........  ............  .............  ........
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining   [.] main
                |
                |--64.55%--main complex:655
                |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                |          /usr/include/c++/6.3.1/complex:664 (inline)
                |          |
                |          |--60.31%--hypot +20
                |          |          |
                |          |          |--8.52%--__hypot_finite +273
                |          |          |
                |          |          |--7.32%--__hypot_finite +411
  ...
                 --35.34%--_start +4194346
                           __libc_start_main +241
                           |
                           |--6.65%--main random.tcc:3326
                           |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                           |
                           |--2.70%--main random.tcc:3326
                           |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                           |
                           |--1.69%--main random.tcc:3326
                           |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  

With this patch and `-g srcline` we instead get the following output:

  
  $ perf report --stdio --inline -g srcline
  # Children      Self  Command       Shared Object  Symbol
  # ........  ........  ............  .............  ........
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining   [.] main
                |
                |--64.55%--main complex:655
                |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                |          /usr/include/c++/6.3.1/complex:664 (inline)
                |          |
                |          |--64.02%--hypot
                |          |          |
                |          |           --59.81%--__hypot_finite
                |          |
                |           --0.53%--cabs
                |
                 --35.34%--_start
                           __libc_start_main
                           |
                           |--12.48%--main random.tcc:3326
                           |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                           |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  

Signed-off-by: Milian Wolff 
Cc: Jiri Olsa 
Cc: Yao Jin 
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-report.txt |  1 +

[tip:perf/core] perf tools: Remove unused 'prefix' from builtin functions

2017-03-27 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  b0ad8ea66445d64a469df0c710947f4cdb8ef16b
Gitweb: http://git.kernel.org/tip/b0ad8ea66445d64a469df0c710947f4cdb8ef16b
Author: Arnaldo Carvalho de Melo 
AuthorDate: Mon, 27 Mar 2017 11:47:20 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 11:58:09 -0300

perf tools: Remove unused 'prefix' from builtin functions

We inherited it from the git sources but never used it for anything. The
only place where it would be used somehow is this remnant:

  static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
  {
          prefix = NULL;
          if (p->option & RUN_SETUP)
                  prefix = NULL; /* setup_perf_directory(); */

Ditch it.
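
As a rough sketch of what the builtin interface looks like once the
argument is gone: every entry point takes only argc and argv, so the
dispatch table needs just one function-pointer type. The table contents,
the two toy builtins and run_builtin_by_name() are hypothetical; perf's
real cmd_struct also carries an option field, which is left out here.

```c
#include <string.h>

/* Builtin entry point after the cleanup: no 'prefix' argument. */
typedef int (*builtin_fn)(int argc, const char **argv);

struct cmd_struct {
	const char *cmd;
	builtin_fn fn;
};

/* Two toy builtins standing in for cmd_report(), cmd_version(), ... */
int cmd_count_args(int argc, const char **argv)
{
	(void)argv;
	return argc;
}

int cmd_noop(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

static const struct cmd_struct commands[] = {
	{ "count", cmd_count_args },
	{ "noop",  cmd_noop },
};

/* Look up argv[0] in the table and call it; -1 means "no such builtin". */
int run_builtin_by_name(int argc, const char **argv)
{
	if (argc < 1)
		return -1;

	for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		if (!strcmp(commands[i].cmd, argv[0]))
			return commands[i].fn(argc, argv);
	}
	return -1;
}
```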

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lp...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/bench/bench.h   | 20 ++--
 tools/perf/bench/futex-hash.c  |  3 +-
 tools/perf/bench/futex-lock-pi.c   |  3 +-
 tools/perf/bench/futex-requeue.c   |  3 +-
 tools/perf/bench/futex-wake-parallel.c |  3 +-
 tools/perf/bench/futex-wake.c  |  3 +-
 tools/perf/bench/mem-functions.c   |  4 +--
 tools/perf/bench/numa.c|  2 +-
 tools/perf/bench/sched-messaging.c |  3 +-
 tools/perf/bench/sched-pipe.c  |  2 +-
 tools/perf/builtin-annotate.c  |  2 +-
 tools/perf/builtin-bench.c | 12 +++
 tools/perf/builtin-buildid-cache.c |  3 +-
 tools/perf/builtin-buildid-list.c  |  3 +-
 tools/perf/builtin-c2c.c   |  4 +--
 tools/perf/builtin-config.c|  2 +-
 tools/perf/builtin-data.c  |  9 +++---
 tools/perf/builtin-diff.c  |  2 +-
 tools/perf/builtin-evlist.c|  2 +-
 tools/perf/builtin-ftrace.c|  2 +-
 tools/perf/builtin-help.c  |  2 +-
 tools/perf/builtin-inject.c|  2 +-
 tools/perf/builtin-kallsyms.c  |  2 +-
 tools/perf/builtin-kmem.c  |  4 +--
 tools/perf/builtin-kvm.c   | 16 +-
 tools/perf/builtin-list.c  |  2 +-
 tools/perf/builtin-lock.c  |  6 ++--
 tools/perf/builtin-mem.c   |  6 ++--
 tools/perf/builtin-probe.c |  6 ++--
 tools/perf/builtin-record.c|  2 +-
 tools/perf/builtin-report.c|  2 +-
 tools/perf/builtin-sched.c |  6 ++--
 tools/perf/builtin-script.c|  4 +--
 tools/perf/builtin-stat.c  |  2 +-
 tools/perf/builtin-timechart.c |  7 ++--
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/builtin-trace.c |  4 +--
 tools/perf/builtin-version.c   |  3 +-
 tools/perf/builtin.h   | 58 +-
 tools/perf/perf.c  | 11 ++-
 tools/perf/tests/builtin-test.c|  2 +-
 41 files changed, 110 insertions(+), 126 deletions(-)

diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 579a592..842ab27 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -25,17 +25,17 @@
 # endif
 #endif
 
-int bench_numa(int argc, const char **argv, const char *prefix);
-int bench_sched_messaging(int argc, const char **argv, const char *prefix);
-int bench_sched_pipe(int argc, const char **argv, const char *prefix);
-int bench_mem_memcpy(int argc, const char **argv, const char *prefix);
-int bench_mem_memset(int argc, const char **argv, const char *prefix);
-int bench_futex_hash(int argc, const char **argv, const char *prefix);
-int bench_futex_wake(int argc, const char **argv, const char *prefix);
-int bench_futex_wake_parallel(int argc, const char **argv, const char *prefix);
-int bench_futex_requeue(int argc, const char **argv, const char *prefix);
+int bench_numa(int argc, const char **argv);
+int bench_sched_messaging(int argc, const char **argv);
+int bench_sched_pipe(int argc, const char **argv);
+int bench_mem_memcpy(int argc, const char **argv);
+int bench_mem_memset(int argc, const char **argv);
+int bench_futex_hash(int argc, const char **argv);
+int bench_futex_wake(int argc, const char **argv);
+int bench_futex_wake_parallel(int argc, const char **argv);
+int bench_futex_requeue(int argc, const char **argv);
 /* pi futexes */
-int bench_futex_lock_pi(int argc, const char **argv, const char *prefix);
+int bench_futex_lock_pi(int argc, const char **argv);
 
 #define BENCH_FORMAT_DEFAULT_STR   "default"
 #define BENCH_FORMAT_DEFAULT   0
diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 2499e1b..fe16b31 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -114,8 +114,7 @@ static void print_summary(void)
   (int) runtime.tv_sec);
 }
 
-int bench_futex_hash(int argc, const char **argv,
-const char *prefix __maybe_unused)
+int bench_futex_hash(int argc, const char

[tip:perf/core] perf report: Show inline stack for browser mode

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  0d3eb0b7783f1ee6d3314f101b9cbfb988020222
Gitweb: http://git.kernel.org/tip/0d3eb0b7783f1ee6d3314f101b9cbfb988020222
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:29 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:12:59 -0300

perf report: Show inline stack for browser mode

If the address belongs to an inlined function, the source information
back to the first non-inlined function will be printed.

For example:

1. Show inlined function name
   perf report -g function --inline

-    0.69%     0.00%  inline   ld-2.23.so   [.] dl_main
   - dl_main
        0.56% _dl_relocate_object
         _dl_relocate_object (inline)
         elf_dynamic_do_Rela (inline)

2. Show the file/line information
   perf report -g address --inline

-    0.69%     0.00%  inline   ld-2.23.so   [.] _dl_start
     _dl_start rtld.c:307
      /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
   + _dl_sysdep_start dl-sysdep.c:250
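
The browser has to reserve one extra row for every printable inline entry
under a callchain line. A simplified sketch of that counting follows; it
uses a plain singly linked list instead of the kernel-style list_head
that perf uses, and struct inline_entry and both function names are
hypothetical stand-ins rather than the types in the patch.

```c
#include <stddef.h>

/* Simplified stand-in for perf's inline list entries. */
struct inline_entry {
	const char *filename;	/* "file:line" text, or NULL */
	const char *funcname;	/* inlined function name, or NULL */
	struct inline_entry *next;
};

/* Count the entries that have something printable. */
int inline_count_rows(const struct inline_entry *head)
{
	int rows = 0;

	for (; head != NULL; head = head->next) {
		if (head->filename != NULL || head->funcname != NULL)
			rows++;
	}
	return rows;
}

/* One row for the callchain entry itself, plus inline rows if enabled. */
int callchain_entry_rows(const struct inline_entry *inlines, int show_inline)
{
	int rows = 1;

	if (show_inline)
		rows += inline_count_rows(inlines);
	return rows;
}
```

For the first example above, the two "(inline)" lines under
_dl_relocate_object would add two rows to the one used by the callchain
entry itself.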

Signed-off-by: Yao Jin 
Tested-by: Arnaldo Carvalho de Melo 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/browsers/hists.c | 180 +++--
 tools/perf/util/hist.c |   5 ++
 tools/perf/util/sort.h |   1 +
 3 files changed, 178 insertions(+), 8 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 2dc82be..62ecaeb 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -144,9 +144,60 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
    cl->unfolded = unfold ? cl->has_children : false;
 }
 
+static struct inline_node *inline_node__create(struct map *map, u64 ip)
+{
+   struct dso *dso;
+   struct inline_node *node;
+
+   if (map == NULL)
+   return NULL;
+
+   dso = map->dso;
+   if (dso == NULL)
+   return NULL;
+
+   if (dso->kernel != DSO_TYPE_USER)
+   return NULL;
+
+   node = dso__parse_addr_inlines(dso,
+  map__rip_2objdump(map, ip));
+
+   return node;
+}
+
+static int inline__count_rows(struct inline_node *node)
+{
+   struct inline_list *ilist;
+   int i = 0;
+
+   if (node == NULL)
+   return 0;
+
+   list_for_each_entry(ilist, &node->val, list) {
+   if ((ilist->filename != NULL) || (ilist->funcname != NULL))
+   i++;
+   }
+
+   return i;
+}
+
+static int callchain_list__inline_rows(struct callchain_list *chain)
+{
+   struct inline_node *node;
+   int rows;
+
+   node = inline_node__create(chain->ms.map, chain->ip);
+   if (node == NULL)
+   return 0;
+
+   rows = inline__count_rows(node);
+   inline_node__delete(node);
+   return rows;
+}
+
 static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 {
-   int n = 0;
+   int n = 0, inline_rows;
    struct rb_node *nd;
 
    for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
@@ -156,6 +207,13 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
 
    list_for_each_entry(chain, &child->val, list) {
    ++n;
+
+   if (symbol_conf.inline_name) {
+   inline_rows =
+   callchain_list__inline_rows(chain);
+   n += inline_rows;
+   }
+
    /* We need this because we may not have children */
    folded_sign = callchain_list__folded(chain);
    if (folded_sign == '+')
@@ -207,7 +265,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
 {
    struct callchain_list *chain;
    bool unfolded = false;
-   int n = 0;
+   int n = 0, inline_rows;
 
    if (callchain_param.mode == CHAIN_FLAT)
    return callchain_node__count_flat_rows(node);
@@ -216,6 +274,11 @@ static int callchain_node__count_rows(struct callchain_node *node)
 
    list_for_each_entry(chain, &node->val, list) {
    ++n;
+   if (symbol_conf.inline_name) {
+   inline_rows = callchain_list__inline_rows(chain);
+   n += inline_rows;
+   }
+
    unfolded = chain->unfolded;
    }
 
@@ -362,6 +425,19 @@ static void hist_entry__init_have_children(struct hist_entry *he)
    he->init_have_children = true;
 }
 
+static void hist_entry_init_inline_node(struct hist_entry *he)
+{
+   if (he->inline_node)
+   return;
+
+   he->inline_node = inline_node__create(he->ms.map, he->ip);
+
+   if (he->inline_node == NULL)
+   return;
+
+   he->has_children =

[tip:perf/core] perf report: Refactor common code in srcline.c

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  5580338d0f207921bc1fef5b668cd564adcc3419
Gitweb: http://git.kernel.org/tip/5580338d0f207921bc1fef5b668cd564adcc3419
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:25 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 11:59:23 -0300

perf report: Refactor common code in srcline.c

Introduce dso__name() and filename_split() out of existing code because
these codes will be used in several places in next patch.

For filename_split(), it may also solve a potential memory leak in
existing code. In existing addr2line(),

sep = strchr(filename, ':');
if (sep) {
*sep++ = '\0';
*file = filename;
*line_nr = strtoul(sep, NULL, 0);
ret = 1;
}

out:
pclose(fp);
return ret;

If sep is NULL, filename is not freed or returned via file.

Signed-off-by: Yao Jin 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: 
http://lkml.kernel.org/r/1490474069-15823-2-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/srcline.c | 68 +++
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index b4db3f4..2953c9f 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -12,6 +12,24 @@
 
 bool srcline_full_filename;
 
+static const char *dso__name(struct dso *dso)
+{
+   const char *dso_name;
+
+   if (dso->symsrc_filename)
+   dso_name = dso->symsrc_filename;
+   else
+   dso_name = dso->long_name;
+
+   if (dso_name[0] == '[')
+   return NULL;
+
+   if (!strncmp(dso_name, "/tmp/perf-", 10))
+   return NULL;
+
+   return dso_name;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -207,6 +225,27 @@ void dso__free_a2l(struct dso *dso)
 
 #else /* HAVE_LIBBFD_SUPPORT */
 
+static int filename_split(char *filename, unsigned int *line_nr)
+{
+   char *sep;
+
+   sep = strchr(filename, '\n');
+   if (sep)
+   *sep = '\0';
+
+   if (!strcmp(filename, "??:0"))
+   return 0;
+
+   sep = strchr(filename, ':');
+   if (sep) {
+   *sep++ = '\0';
+   *line_nr = strtoul(sep, NULL, 0);
+   return 1;
+   }
+
+   return 0;
+}
+
 static int addr2line(const char *dso_name, u64 addr,
 char **file, unsigned int *line_nr,
 struct dso *dso __maybe_unused,
@@ -216,7 +255,6 @@ static int addr2line(const char *dso_name, u64 addr,
char cmd[PATH_MAX];
char *filename = NULL;
size_t len;
-   char *sep;
int ret = 0;
 
scnprintf(cmd, sizeof(cmd), "addr2line -e %s %016"PRIx64,
@@ -233,23 +271,14 @@ static int addr2line(const char *dso_name, u64 addr,
goto out;
}
 
-   sep = strchr(filename, '\n');
-   if (sep)
-   *sep = '\0';
-
-   if (!strcmp(filename, "??:0")) {
-   pr_debug("no debugging info in %s\n", dso_name);
+   ret = filename_split(filename, line_nr);
+   if (ret != 1) {
free(filename);
goto out;
}
 
-   sep = strchr(filename, ':');
-   if (sep) {
-   *sep++ = '\0';
-   *file = filename;
-   *line_nr = strtoul(sep, NULL, 0);
-   ret = 1;
-   }
+   *file = filename;
+
 out:
pclose(fp);
return ret;
@@ -278,15 +307,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
if (!dso->has_srcline)
goto out;
 
-   if (dso->symsrc_filename)
-   dso_name = dso->symsrc_filename;
-   else
-   dso_name = dso->long_name;
-
-   if (dso_name[0] == '[')
-   goto out;
-
-   if (!strncmp(dso_name, "/tmp/perf-", 10))
+   dso_name = dso__name(dso);
+   if (dso_name == NULL)
goto out;
 
if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines))

[tip:perf/core] perf report: Enable sorting by srcline as key

2017-03-27 Thread tip-bot for Milian Wolff

Commit-ID:  5dfa210e407d0fedf746958bff206995bd46570d
Gitweb: http://git.kernel.org/tip/5dfa210e407d0fedf746958bff206995bd46570d
Author: Milian Wolff 
AuthorDate: Sat, 18 Mar 2017 22:49:28 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:13:28 -0300

perf report: Enable sorting by srcline as key

Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
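
As a rough illustration of what "doing the aggregation" means, the
sketch below sums per-address costs that share the same source line
into one row. It is a simplified stand-in, not how perf's hist code is
implemented: struct sample, struct line_cost and aggregate_by_srcline()
are invented for this example, and a linear search replaces perf's
real data structures.

```c
#include <stdio.h>
#include <string.h>

struct sample {
	const char *srcline;	/* e.g. "random.tcc:3326" */
	double percent;		/* cost attributed to one address */
};

struct line_cost {
	const char *srcline;
	double percent;		/* summed cost for this source line */
};

/*
 * Sum sample costs per source line. Returns the number of distinct
 * source lines written to 'out'. Samples that do not fit into 'out'
 * are dropped.
 */
static size_t aggregate_by_srcline(const struct sample *s, size_t n,
				   struct line_cost *out, size_t max)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an existing row with the same srcline key. */
		for (j = 0; j < used; j++)
			if (!strcmp(out[j].srcline, s[i].srcline))
				break;

		if (j == used) {
			if (used == max)
				continue;	/* table full */
			out[used].srcline = s[i].srcline;
			out[used].percent = 0.0;
			used++;
		}

		out[j].percent += s[i].percent;
	}

	return used;
}
```

Several addresses that resolve to the same srcline collapse into a
single entry whose cost is the sum of the individual costs.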

Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ workloads.

The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:

  $ perf report --stdio --inline -g address
  # Children      Self  Command       Shared Object  Symbol
  # ........  ........  ............  .............  ........
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining   [.] main
              |
              |--64.55%--main complex:655
              |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
              |          /usr/include/c++/6.3.1/complex:664 (inline)
              |          |
              |          |--60.31%--hypot +20
              |          |          |
              |          |          |--8.52%--__hypot_finite +273
              |          |          |
              |          |          |--7.32%--__hypot_finite +411
  ...
               --35.34%--_start +4194346
                         __libc_start_main +241
                         |
                         |--6.65%--main random.tcc:3326
                         |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                         |
                         |--2.70%--main random.tcc:3326
                         |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                         |
                         |--1.69%--main random.tcc:3326
                         |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  

With this patch and `-g srcline` we instead get the following output:

  $ perf report --stdio --inline -g srcline
  # Children      Self  Command       Shared Object  Symbol
  # ........  ........  ............  .............  ........
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining   [.] main
              |
              |--64.55%--main complex:655
              |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
              |          /usr/include/c++/6.3.1/complex:664 (inline)
              |          |
              |          |--64.02%--hypot
              |          |          |
              |          |           --59.81%--__hypot_finite
              |          |
              |           --0.53%--cabs
              |
               --35.34%--_start
                         __libc_start_main
                         |
                         |--12.48%--main random.tcc:3326
                         |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                         |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  

Signed-off-by: Milian Wolff 
Cc: Jiri Olsa 
Cc: Yao Jin 
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wo...@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/ui/browsers/hists.c   |  3 +-
 tools/perf/ui/stdio/hist.c   |  3 +-
 tools/perf/util/annotate.c

[tip:perf/core] perf report: Show inline stack for stdio mode

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  0db64dd060f7fd77921be8f10fa9f7a5f49a3a43
Gitweb: http://git.kernel.org/tip/0db64dd060f7fd77921be8f10fa9f7a5f49a3a43
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:28 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:02:22 -0300

perf report: Show inline stack for stdio mode

If the address belongs to an inlined function, the source information
back to the first non-inlined function will be printed.
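
The output format described here can be sketched roughly as follows.
This is an illustration under assumptions, not the code from the patch:
struct inline_entry, enum report_key and both functions are invented
names, and the fallback rules (prefer file:line when sorting by
address, otherwise the function name, skip entries that have neither)
are a simplification.

```c
#include <stdio.h>

enum report_key { KEY_FUNCTION, KEY_ADDRESS };

struct inline_entry {
	const char *funcname;	/* may be NULL */
	const char *filename;	/* may be NULL */
	unsigned int line_nr;
	const struct inline_entry *next;
};

/*
 * Format one inline frame. Returns the snprintf() result, or 0 when
 * the entry carries neither a file name nor a function name.
 */
static int format_inline_entry(char *buf, size_t size,
			       const struct inline_entry *e,
			       enum report_key key)
{
	if (key == KEY_ADDRESS && e->filename != NULL)
		return snprintf(buf, size, "%s:%u (inline)",
				e->filename, e->line_nr);
	if (e->funcname != NULL)
		return snprintf(buf, size, "%s (inline)", e->funcname);
	if (size > 0)
		buf[0] = '\0';
	return 0;
}

/* Print every usable entry on its own line, indented by 'indent'. */
static int print_inline_stack(FILE *fp, const struct inline_entry *head,
			      enum report_key key, int indent)
{
	char buf[512];
	int printed = 0;

	for (const struct inline_entry *e = head; e != NULL; e = e->next) {
		if (format_inline_entry(buf, sizeof(buf), e, key) <= 0)
			continue;
		printed += fprintf(fp, "%*s%s\n", indent, "", buf);
	}

	return printed;
}
```

Each inlined frame becomes one extra line under the callchain entry it
belongs to, tagged with "(inline)".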

For example:

1. Show inlined function name
   perf report --stdio -g function --inline

     0.69%     0.00%  inline   ld-2.23.so   [.] dl_main
            |
            ---dl_main
               |
                --0.56%--_dl_relocate_object
                          _dl_relocate_object (inline)
                          elf_dynamic_do_Rela (inline)

2. Show the file/line information
   perf report --stdio -g address --inline

     0.69%     0.00%  inline   ld-2.23.so   [.] _dl_start_user
            |
            ---_dl_start_user .:0
               _dl_start rtld.c:307
               /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
               _dl_sysdep_start dl-sysdep.c:250
               |
                --0.56%--dl_main rtld.c:2076

Committer tests:

  # perf record --call-graph dwarf ~/bin/perf stat usleep 1

   Performance counter stats for 'usleep 1':

            0.443020      task-clock (msec)         #    0.449 CPUs utilized
                   1      context-switches          #    0.002 M/sec
                   0      cpu-migrations            #    0.000 K/sec
                  52      page-faults               #    0.117 M/sec
           1,049,423      cycles                    #    2.369 GHz
             801,456      instructions              #    0.76  insn per cycle
             155,609      branches                  #  351.246 M/sec
               7,026      branch-misses             #    4.52% of all branches

         0.000987570 seconds time elapsed

  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.553 MB perf.data (66 samples) ]
  # perf report --stdio --inline fs__get_mountpoint

     1.73%     0.00%  perf     perf   [.] fs__get_mountpoint
            |
            ---fs__get_mountpoint
               fs__get_mountpoint (inline)
               fs__check_mounts (inline)
               __statfs
               entry_SYSCALL_64
               sys_statfs
               SYSC_statfs
               user_statfs
               user_path_at_empty
               filename_lookup
               path_lookupat
               link_path_walk
               inode_permission
               __inode_permission
               kernfs_iop_permission
               kernfs_refresh_inode
               security_inode_notifysecctx
               selinux_inode_notifysecctx
               selinux_inode_setsecurity
               security_context_to_sid
               security_context_to_sid_core
               string_to_context_struct
               symcmp

Signed-off-by: Yao Jin 
Tested-by: Arnaldo Carvalho de Melo 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: http://lkml.kernel.org/r/1490474069-15823-5-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/ui/stdio/hist.c | 85 +-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 668f4ae..6128f48 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -17,6 +17,66 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
return ret;
 }
 
+static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
+ int depth, int depth_mask, FILE *fp)
+{
+   struct dso *dso;
+   struct inline_node *node;
+   struct inline_list *ilist;
+   int ret = 0, i;
+
+   if (map == NULL)
+   return 0;
+
+   dso = map->dso;
+   if (dso == NULL)
+   return 0;
+
+   if (dso->kernel != DSO_TYPE_USER)
+   return 0;
+
+   node = dso__parse_addr_inlines(dso,
+  map__rip_2objdump(map, ip));
+   if (node == NULL)
+   return 0;
+
+   list_for_each_entry(ilist, &node->val, list) {
+   if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
+   ret += callchain__fprintf_left_margin(fp, left_margin);
+
+   for (i = 0; i < depth; i++) {
+   if (depth_mask & (1 << i))
+   ret += fprintf(fp, "|");
+   else
+   ret += fprintf(fp, " ");
+   ret += fprintf(fp, "  ");
+   }
+
+   if (callchain_param.key == CCKEY_ADDRESS) {
+   if

[tip:perf/core] perf report: Find the inline stack for a given address

2017-03-27 Thread tip-bot for Jin Yao

Commit-ID:  a64489c56c307bf0955f0489158c5ecf6aa10fe2
Gitweb: http://git.kernel.org/tip/a64489c56c307bf0955f0489158c5ecf6aa10fe2
Author: Jin Yao 
AuthorDate: Sun, 26 Mar 2017 04:34:26 +0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 12:00:38 -0300

perf report: Find the inline stack for a given address

It would be useful for perf to support a mode to query the inline stack
for a given callgraph address. This would simplify finding the right
code in code that does a lot of inlining.

srcline.c already contains the code that translates an address to
filename:line_nr. This patch extends that function so that it can also
return the inline stack.

It introduces the inline_list which will store the inline function
result (filename:line_nr and funcname).

If BFD lib is not supported, the result is only filename:line_nr.
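
A rough sketch of the data structure idea, under stated assumptions:
the real patch uses the kernel-style list_head helpers, while the
version below uses a plain singly linked list so that it compiles on
its own. struct inline_item, struct inline_stack and the
inline_stack_*() functions are names invented for this example.
Entries are appended in the order the resolver reports them and the
list is reversed afterwards when the display order is not callee-first.

```c
#include <stdbool.h>
#include <stdlib.h>

struct inline_item {
	const char *filename;
	const char *funcname;
	unsigned int line_nr;
	struct inline_item *next;
};

struct inline_stack {
	struct inline_item *head;
	struct inline_item *tail;
};

/* Append one resolved frame at the tail; returns 0 or -1 on ENOMEM. */
static int inline_stack_append(struct inline_stack *st, const char *filename,
			       const char *funcname, unsigned int line_nr)
{
	struct inline_item *it = calloc(1, sizeof(*it));

	if (it == NULL)
		return -1;

	it->filename = filename;
	it->funcname = funcname;
	it->line_nr = line_nr;

	if (st->tail != NULL)
		st->tail->next = it;
	else
		st->head = it;
	st->tail = it;

	return 0;
}

/* Reverse the list in place. */
static void inline_stack_reverse(struct inline_stack *st)
{
	struct inline_item *prev = NULL, *cur = st->head;

	st->tail = st->head;
	while (cur != NULL) {
		struct inline_item *next = cur->next;

		cur->next = prev;
		prev = cur;
		cur = next;
	}
	st->head = prev;
}

/* Keep resolver order for callee-first output, otherwise flip it. */
static void inline_stack_finish(struct inline_stack *st, bool callee_first)
{
	if (!callee_first)
		inline_stack_reverse(st);
}

static void inline_stack_free(struct inline_stack *st)
{
	while (st->head != NULL) {
		struct inline_item *next = st->head->next;

		free(st->head);
		st->head = next;
	}
	st->tail = NULL;
}
```

Building the list once and flipping it at the end keeps the append
path simple regardless of the requested callchain order.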

Signed-off-by: Yao Jin 
Tested-by: Milian Wolff 
Cc: Andi Kleen 
Cc: Jiri Olsa 
Cc: Kan Liang 
Link: http://lkml.kernel.org/r/1490474069-15823-3-git-send-email-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/srcline.c| 167 +--
 tools/perf/util/symbol-elf.c |   5 ++
 tools/perf/util/symbol-minimal.c |   7 ++
 tools/perf/util/symbol.h |   2 +
 tools/perf/util/util.h   |  16 
 5 files changed, 192 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 2953c9f..3ce28f7 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -7,6 +7,7 @@
 #include "util/dso.h"
 #include "util/util.h"
 #include "util/debug.h"
+#include "util/callchain.h"
 
 #include "symbol.h"
 
@@ -30,6 +31,34 @@ static const char *dso__name(struct dso *dso)
return dso_name;
 }
 
+static int inline_list__append(char *filename, char *funcname, int line_nr,
+  struct inline_node *node, struct dso *dso)
+{
+   struct inline_list *ilist;
+   char *demangled;
+
+   ilist = zalloc(sizeof(*ilist));
+   if (ilist == NULL)
+   return -1;
+
+   ilist->filename = filename;
+   ilist->line_nr = line_nr;
+
+   if (dso != NULL) {
+   demangled = dso__demangle_sym(dso, 0, funcname);
+   if (demangled == NULL) {
+   ilist->funcname = funcname;
+   } else {
+   ilist->funcname = demangled;
+   free(funcname);
+   }
+   }
+
+   list_add_tail(&ilist->list, &node->val);
+
+   return 0;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -169,9 +198,17 @@ static void addr2line_cleanup(struct a2l_data *a2l)
 
 #define MAX_INLINE_NEST 1024
 
+static void inline_list__reverse(struct inline_node *node)
+{
+   struct inline_list *ilist, *n;
+
+   list_for_each_entry_safe_reverse(ilist, n, &node->val, list)
+   list_move_tail(&ilist->list, &node->val);
+}
+
 static int addr2line(const char *dso_name, u64 addr,
 char **file, unsigned int *line, struct dso *dso,
-bool unwind_inlines)
+bool unwind_inlines, struct inline_node *node)
 {
int ret = 0;
struct a2l_data *a2l = dso->a2l;
@@ -196,8 +233,21 @@ static int addr2line(const char *dso_name, u64 addr,
 
while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
 &a2l->funcname, &a2l->line) &&
-  cnt++ < MAX_INLINE_NEST)
-   ;
+  cnt++ < MAX_INLINE_NEST) {
+
+   if (node != NULL) {
+   if (inline_list__append(strdup(a2l->filename),
+   strdup(a2l->funcname),
+   a2l->line, node,
+   dso) != 0)
+   return 0;
+   }
+   }
+
+   if ((node != NULL) &&
+   (callchain_param.order != ORDER_CALLEE)) {
+   inline_list__reverse(node);
+   }
}
 
if (a2l->found && a2l->filename) {
@@ -223,6 +273,35 @@ void dso__free_a2l(struct dso *dso)
dso->a2l = NULL;
 }
 
+static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
+   struct dso *dso)
+{
+   char *file = NULL;
+   unsigned int line = 0;
+   struct inline_node *node;
+
+   node = zalloc(sizeof(*node));
+   if (node == NULL) {
+   perror("not enough memory for the inline node");
+   return NULL;
+   }
+
+   INIT_LIST_HEAD(&node->val);
+   node->addr = addr;
+
+   if (!addr2line(dso_name, addr, &file, &line, dso, TRUE, node))
+   goto out_free_inline_node;
+
+   if (list_empty(&node->val))
+   goto out_free_inline_node;
+
+   return node;
+
+out_free_inline_node:
+

[PATCH 0/2] drivers: serial: Aspeed VUART driver

2017-03-27 Thread Joel Stanley

This is a driver for the Aspeed VUART. The VUART is a serial device on the BMC
side of the LPC bus that connects a BMC to its host processor.

We add a flag to the serial core to allow the driver to skip probing of the
THRE irq behaviour, which could hang due to the host not reading bytes out of
the buffer.

We've been using this on systems for over a year, so it has seen a good amount
of testing.

Cheers,

Joel


Jeremy Kerr (1):
  drivers/serial: Add driver for Aspeed virtual UART

Joel Stanley (1):
  serial: 8250: Add flag so drivers can avoid THRE probe

 Documentation/devicetree/bindings/serial/8250.txt |   2 +
 drivers/tty/serial/8250/8250_port.c   |   2 +-
 drivers/tty/serial/Kconfig|  10 +
 drivers/tty/serial/Makefile   |   1 +
 drivers/tty/serial/aspeed-vuart.c | 335 ++
 include/linux/serial_core.h   |   1 +
 6 files changed, 350 insertions(+), 1 deletion(-)
 create mode 100644 drivers/tty/serial/aspeed-vuart.c

-- 
2.11.0

[tip:perf/core] perf trace: Fixup thread refcounting

2017-03-27 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  ef65e96e0762cb98d9abeb6737c721ca840f8092
Gitweb: http://git.kernel.org/tip/ef65e96e0762cb98d9abeb6737c721ca840f8092
Author: Arnaldo Carvalho de Melo 
AuthorDate: Fri, 24 Mar 2017 15:03:19 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 24 Mar 2017 16:05:31 -0300

perf trace: Fixup thread refcounting

In trace__vfs_getname() and when checking if a thread is filtered in
trace__process_sample() we were not dropping the reference obtained via
machine__findnew_thread(), fix it.
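
The shape of the fix can be sketched in isolation like this. It is a
simplified model rather than the perf code: struct thread,
thread_get(), thread_put() and handle_event() are stand-ins invented
for this example, with a plain integer in place of perf's real
reference counting. The point is that every exit taken after the
reference has been obtained funnels through one label that drops it.

```c
#include <stddef.h>

struct thread {
	int refcnt;
	void *priv;	/* per-thread private data, may be NULL */
};

/* Stand-in for a lookup that returns a new reference. */
static struct thread *thread_get(struct thread *t)
{
	if (t != NULL)
		t->refcnt++;
	return t;
}

static void thread_put(struct thread *t)
{
	if (t != NULL)
		t->refcnt--;
}

static int handle_event(struct thread *candidate, const char *filename)
{
	struct thread *thread = thread_get(candidate);

	if (thread == NULL)
		goto out;	/* no reference taken, nothing to drop */

	if (thread->priv == NULL)
		goto out_put;	/* early exit, but the reference is held */

	if (filename[0] == '\0')
		goto out_put;

	/* ... the real work would happen here ... */

out_put:
	thread_put(thread);
out:
	return 0;
}
```

After any call, the reference count is back where it started, whichever
branch was taken.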

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-9gc470phavxwxv5d9w7ck...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-trace.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 2425605..60053d4 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1653,17 +1653,17 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
 
ttrace = thread__priv(thread);
if (!ttrace)
-   goto out;
+   goto out_put;
 
filename_len = strlen(filename);
if (filename_len == 0)
-   goto out;
+   goto out_put;
 
if (ttrace->filename.namelen < filename_len) {
char *f = realloc(ttrace->filename.name, filename_len + 1);
 
if (f == NULL)
-   goto out;
+   goto out_put;
 
ttrace->filename.namelen = filename_len;
ttrace->filename.name = f;
@@ -1673,12 +1673,12 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
ttrace->filename.pending_open = true;
 
if (!ttrace->filename.ptr)
-   goto out;
+   goto out_put;
 
entry_str_len = strlen(ttrace->entry_str);
remaining_space = trace__entry_str_size - entry_str_len - 1; /* \0 */
if (remaining_space <= 0)
-   goto out;
+   goto out_put;
 
if (filename_len > (size_t)remaining_space) {
filename += filename_len - remaining_space;
@@ -1692,6 +1692,8 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
 
ttrace->filename.ptr = 0;
ttrace->filename.entry_str_pos = 0;
+out_put:
+   thread__put(thread);
 out:
return 0;
 }
@@ -1712,6 +1714,7 @@ static int trace__sched_stat_runtime(struct trace *trace, struct perf_evsel *evs
 
ttrace->runtime_ms += runtime_ms;
trace->runtime_ms += runtime_ms;
+out_put:
thread__put(thread);
return 0;
 
@@ -1722,8 +1725,7 @@ out_dump:
   (pid_t)perf_evsel__intval(evsel, sample, "pid"),
   runtime,
   perf_evsel__intval(evsel, sample, "vruntime"));
-   thread__put(thread);
-   return 0;
+   goto out_put;
 }
 
 static void bpf_output__printer(enum binary_printer_ops op,
@@ -1922,7 +1924,7 @@ static int trace__process_sample(struct perf_tool *tool,
 
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
if (thread && thread__is_filtered(thread))
-   return 0;
+   goto out;
 
trace__set_base_time(trace, evsel, sample);
 
@@ -1930,7 +1932,8 @@ static int trace__process_sample(struct perf_tool *tool,
++trace->nr_events;
handler(trace, evsel, event, sample);
}
-
+out:
+   thread__put(thread);
return err;
 }

[tip:perf/core] perf trace: Check for vfs_getname.pathname length

2017-03-27 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  39f0e7a825cfc971dc9ad40b0770c22f6f4f89b8
Gitweb: http://git.kernel.org/tip/39f0e7a825cfc971dc9ad40b0770c22f6f4f89b8
Author: Arnaldo Carvalho de Melo 
AuthorDate: Fri, 24 Mar 2017 14:51:28 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 24 Mar 2017 16:05:31 -0300

perf trace: Check for vfs_getname.pathname length

It shouldn't be zero, but if the 'perf probe' on getname_flags() (or
elsewhere in the future we need to probe to catch the pathname for
syscalls like 'open' being copied from userspace to the kernel) is
misplaced somehow, then we will end up not allocating space and trying
to copy the "" empty string to ttrace->filename.name, causing a
segfault, fix it.
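
The failure mode is easy to reproduce in a standalone sketch. The names
below (struct name_buf, name_buf_set()) are made up for illustration
and only mirror the shape of the code being patched: with an empty
string and a buffer that was never allocated, the "grow if too small"
test is false, so without the length check the copy would write through
a NULL pointer.

```c
#include <stdlib.h>
#include <string.h>

struct name_buf {
	char *name;	/* NULL until the first non-empty name arrives */
	size_t namelen;	/* capacity of 'name', not counting the NUL */
};

/* Returns 0 on success, -1 if the name is empty or allocation fails. */
static int name_buf_set(struct name_buf *nb, const char *filename)
{
	size_t len = strlen(filename);

	/*
	 * Without this check, len == 0 and namelen == 0 would skip the
	 * realloc() below and then copy "" into a NULL nb->name.
	 */
	if (len == 0)
		return -1;

	if (nb->namelen < len) {
		char *f = realloc(nb->name, len + 1);

		if (f == NULL)
			return -1;

		nb->namelen = len;
		nb->name = f;
	}

	strcpy(nb->name, filename);
	return 0;
}
```

Rejecting the empty name up front means the copy only ever runs with a
buffer that is known to exist.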

Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: http://lkml.kernel.org/n/tip-c4f1t6sx1nczuzop19r5s...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-trace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 912fedc..33c657c 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1656,6 +1656,8 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
goto out;
 
filename_len = strlen(filename);
+   if (filename_len == 0)
+   goto out;
 
if (ttrace->filename.namelen < filename_len) {
char *f = realloc(ttrace->filename.name, filename_len + 1);

Re: [PATCH 4.10 012/167] mmc: sdhci-acpi: support deferred probe

2017-03-27 Thread Zhang Rui

On Mon, 2017-03-27 at 18:36 +0200, Greg Kroah-Hartman wrote:
> On Mon, Mar 27, 2017 at 10:40:23AM +0800, Zhang Rui wrote:
> > 
> > On Sun, 2017-03-26 at 12:26 +0100, Andrey Utkin wrote:
> > > 
> > > On Fri, Mar 10, 2017 at 10:07:35AM +0100, Greg Kroah-Hartman wrote:
> > > > 
> > > > 
> > > > 4.10-stable review patch.  If anyone has any objections, please
> > > > let me know.
> > > > 
> > > > --
> > > > 
> > > > From: Zhang Rui 
> > > > 
> > > > commit e28d6f048799acb0014491e6b74e580d84bd7916 upstream.
> > > > 
> > > > With commit 67bf5156edc4 ("gpio / ACPI: fix returned error from
> > > > acpi_dev_gpio_irq_get()"), mmc_gpiod_request_cd() returns
> > > > -EPROBE_DEFER if GPIO is not ready when sdhci-acpi driver is probed,
> > > > and sdhci-acpi driver should be probed again later in this case.
> > > > 
> > > > This fixes an order issue when both GPIO and sdhci-acpi drivers are
> > > > built as modules.
> > > > 
> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=177101
> > > > Tested-by: Jonas Aaberg 
> > > > Signed-off-by: Zhang Rui 
> > > > Acked-by: Adrian Hunter 
> > > > Signed-off-by: Ulf Hansson 
> > > > Signed-off-by: Greg Kroah-Hartman 
> > > > 
> > > > ---
> > > >  drivers/mmc/host/sdhci-acpi.c |5 -
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > --- a/drivers/mmc/host/sdhci-acpi.c
> > > > +++ b/drivers/mmc/host/sdhci-acpi.c
> > > > @@ -467,7 +467,10 @@ static int sdhci_acpi_probe(struct platf
> > > >     if (sdhci_acpi_flag(c, SDHCI_ACPI_SD_CD)) {
> > > >     bool v = sdhci_acpi_flag(c, SDHCI_ACPI_SD_CD_OVERRIDE_LEVEL);
> > > >  
> > > > -   if (mmc_gpiod_request_cd(host->mmc, NULL, 0, v, 0, NULL)) {
> > > > +   err = mmc_gpiod_request_cd(host->mmc, NULL, 0, v, 0, NULL);
> > > > +   if (err) {
> > > > +   if (err == -EPROBE_DEFER)
> > > > +   goto err_free;
> > > >     dev_warn(dev, "failed to setup card detect gpio\n");
> > > >     c->use_runtime_pm = false;
> > > >     }
> > > > 
> > > > 
> > > Regression reported: https://bugzilla.kernel.org/show_bug.cgi?id=194871
> > > 
> > > Reverting this patch is said to fix the issue for 4.10.2.
> > Thanks for raising the issue. Let's check why it breaks in the
> > bugzilla report.
> Is this also broken in Linus's tree?
> 
Well, I think so.

Although it is still being debugged, the root cause seems to be that
when mmc_gpiod_request_cd() returns -EPROBE_DEFER, it means either that
the GPIO controller driver has not been probed yet, OR that the GPIO
controller driver is not available at all. The latter case causes this
problem, because the patch above makes the sdhci-acpi driver wait for
the GPIO controller.

This is not a problem for distro kernels where all the drivers are built
as modules. And the problem should be fixed by enabling the GPIO
controller driver in the kernel config.

thanks,
rui
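
To illustrate the ambiguity described above, here is a small userspace model. It is only a sketch: `enum gpio_state`, `request_cd()` and `probe_once()` are hypothetical stand-ins, not the mmc or driver-core API, and EPROBE_DEFER is defined locally because it is a kernel-internal errno value.

```c
#include <errno.h>
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal (517); defined here so the sketch builds in userspace. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical model of what the GPIO controller looks like at probe time. */
enum gpio_state {
	GPIO_READY,    /* controller driver already bound */
	GPIO_NOT_YET,  /* driver exists but has not been probed yet */
	GPIO_NEVER,    /* driver not built or not available at all */
	GPIO_BROKEN,   /* some other failure */
};

/* Stand-in for the card-detect request: "not yet" and "never" look identical. */
static int request_cd(enum gpio_state st)
{
	switch (st) {
	case GPIO_READY:
		return 0;
	case GPIO_BROKEN:
		return -ENODEV;
	default:
		return -EPROBE_DEFER;
	}
}

/* Stand-in for the patched probe path. */
static int probe_once(enum gpio_state st, bool *use_runtime_pm)
{
	int err = request_cd(st);

	if (err) {
		if (err == -EPROBE_DEFER)
			return err;              /* ask to be probed again later */
		*use_runtime_pm = false;         /* degrade, but keep the host */
	}
	return 0;
}
```

In this model probe_once() returns -EPROBE_DEFER for both GPIO_NOT_YET and GPIO_NEVER, so a caller cannot tell a late controller from a missing one. In the second case the host is never registered.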

Re: [PATCH net-next v2 5/5] net-next: dsa: add dsa support for Mediatek MT7530 switch

2017-03-27 Thread Sean Wang

Hi Florian,

Thanks for taking the time to review. My comments are inline.

On Wed, 2017-03-22 at 11:39 -0700, Florian Fainelli wrote:
> On 03/21/2017 02:35 AM, sean.w...@mediatek.com wrote:
> > From: Sean Wang 
> > 
> > MT7530 is a 7-port Gigabit Ethernet switch found on Mediatek router
> > platforms such as MT7623A or MT7623N, and includes a 7-port Gigabit
> > Ethernet MAC and a 5-port Gigabit Ethernet PHY. Among these ports,
> > ports 0 to 4 are the user ports connecting to remote devices, while
> > ports 5 and 6 are the CPU ports connecting to the Mediatek Ethernet GMAC.
> > 
> > Port 6 can communicate with the CPU via the Mediatek Ethernet GMAC
> > through either TRGMII or RGMII; the phy-mode property in the dt-bindings
> > selects which mode to use. For port 5, only RGMII can be specified.
> > However, currently only port 6 is supported in this DSA driver.
> > 
> > The driver is written with reference to qca8k and other existing DSA
> > drivers. Most of the essential DSA callbacks are already supported in
> > the driver, including tag insertion for distinguishing user ports,
> > port control, bridge offloading, STP setup and ethtool operations, which
> > allows DSA to model each user port as a standalone netdevice as the
> > other DSA drivers do.
> 
> Overall, this looks pretty nice and clean, a few comments below
> 
> > 
> > Signed-off-by: Sean Wang 
> > Signed-off-by: Landen Chao 
> > ---
> 
> > +static void
> > +mt7530_fdb_read(struct mt7530_priv *priv, struct mt7530_fdb *fdb)
> > +{
> > +   u32 reg[3];
> > +   int i;
> > +
> > +   /* Read from ARL table into an array */
> > +   for (i = 0; i < 3; i++) {
> > +   reg[i] = mt7530_read(priv, MT7530_TSRA1 + (i * 4));
> > +
> > +   dev_dbg(priv->dev, "%s(%d) reg[%d]=0x%x\n",
> > +   __func__, __LINE__, i, reg[i]);
> > +   }
> > +
> > +   /* vid - 11:0 on reg[1] */
> > +   fdb->vid = (reg[1] >> 0) & 0xfff;
> > +   /* aging - 31:24 on reg[2] */
> > +   fdb->aging = (reg[2] >> 24) & 0xff;
> > +   /* portmask - 11:4 on reg[2] */
> > +   fdb->port_mask = (reg[2] >> 4) & 0xff;
> > +   /* mac - 31:0 on reg[0] and 31:16 on reg[1] */
> > +   fdb->mac[0] = (reg[0] >> 24) & 0xff;
> > +   fdb->mac[1] = (reg[0] >> 16) & 0xff;
> > +   fdb->mac[2] = (reg[0] >>  8) & 0xff;
> > +   fdb->mac[3] = (reg[0] >>  0) & 0xff;
> > +   fdb->mac[4] = (reg[1] >> 24) & 0xff;
> > +   fdb->mac[5] = (reg[1] >> 16) & 0xff;
> > +   /* noarp - 3:2 on reg[2] */
> > +   fdb->noarp = ((reg[2] >> 2) & 0x3) == STATIC_ENT;
> 
> Could you add some definitions for the bits and masks that you are
> shifting here?
> 

Okay, I'll turn these into proper macros for readability.
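
As a rough sketch of what such definitions could look like (all names below are hypothetical and the final driver may well choose different ones; the field positions are taken from the comments in the patch, and plain stdint types stand in for the kernel's u8/u16/u32):

```c
#include <stdint.h>

/* Hypothetical field definitions; positions follow the comments in the patch. */
#define FDB_BYTE_MASK    0xffu
#define FDB_VID_SHIFT    0      /* reg[1] bits 11:0  */
#define FDB_VID_MASK     0xfffu
#define FDB_AGING_SHIFT  24     /* reg[2] bits 31:24 */
#define FDB_PORT_SHIFT   4      /* reg[2] bits 11:4  */
#define FDB_TYPE_SHIFT   2      /* reg[2] bits 3:2   */
#define FDB_TYPE_MASK    0x3u

/* Extract or place a field; the cast avoids shifting a promoted signed int. */
#define FDB_GET(reg, shift, mask)   (((uint32_t)(reg) >> (shift)) & (mask))
#define FDB_PREP(val, shift, mask)  (((uint32_t)(val) & (mask)) << (shift))

struct fdb_entry {
	uint16_t vid;
	uint8_t aging;
	uint8_t port_mask;
	uint8_t type;
	uint8_t mac[6];
};

static void fdb_pack(uint32_t reg[3], uint16_t vid, uint8_t port_mask,
		     const uint8_t mac[6], uint8_t aging, uint8_t type)
{
	/* mac[0..3] fill reg[0] 31:0, mac[4..5] fill reg[1] 31:16 */
	reg[0] = FDB_PREP(mac[0], 24, FDB_BYTE_MASK) |
		 FDB_PREP(mac[1], 16, FDB_BYTE_MASK) |
		 FDB_PREP(mac[2], 8, FDB_BYTE_MASK) |
		 FDB_PREP(mac[3], 0, FDB_BYTE_MASK);
	reg[1] = FDB_PREP(mac[4], 24, FDB_BYTE_MASK) |
		 FDB_PREP(mac[5], 16, FDB_BYTE_MASK) |
		 FDB_PREP(vid, FDB_VID_SHIFT, FDB_VID_MASK);
	reg[2] = FDB_PREP(aging, FDB_AGING_SHIFT, FDB_BYTE_MASK) |
		 FDB_PREP(port_mask, FDB_PORT_SHIFT, FDB_BYTE_MASK) |
		 FDB_PREP(type, FDB_TYPE_SHIFT, FDB_TYPE_MASK);
}

static void fdb_unpack(const uint32_t reg[3], struct fdb_entry *e)
{
	e->mac[0] = (uint8_t)FDB_GET(reg[0], 24, FDB_BYTE_MASK);
	e->mac[1] = (uint8_t)FDB_GET(reg[0], 16, FDB_BYTE_MASK);
	e->mac[2] = (uint8_t)FDB_GET(reg[0], 8, FDB_BYTE_MASK);
	e->mac[3] = (uint8_t)FDB_GET(reg[0], 0, FDB_BYTE_MASK);
	e->mac[4] = (uint8_t)FDB_GET(reg[1], 24, FDB_BYTE_MASK);
	e->mac[5] = (uint8_t)FDB_GET(reg[1], 16, FDB_BYTE_MASK);
	e->vid = (uint16_t)FDB_GET(reg[1], FDB_VID_SHIFT, FDB_VID_MASK);
	e->aging = (uint8_t)FDB_GET(reg[2], FDB_AGING_SHIFT, FDB_BYTE_MASK);
	e->port_mask = (uint8_t)FDB_GET(reg[2], FDB_PORT_SHIFT, FDB_BYTE_MASK);
	e->type = (uint8_t)FDB_GET(reg[2], FDB_TYPE_SHIFT, FDB_TYPE_MASK);
}
```

Using the same shift/mask pair on both the read and the write side keeps the two paths from drifting apart.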

> > +}
> > +
> > +static void
> > +mt7530_fdb_write(struct mt7530_priv *priv, u16 vid,
> > +u8 port_mask, const u8 *mac,
> > +u8 aging, u8 type)
> > +{
> > +   u32 reg[3] = { 0 };
> > +   int i;
> > +
> > +   /* vid - 11:0 on reg[1] */
> > +   reg[1] |= (vid & 0xfff) << 0;
> > +   /* aging - 31:24 on reg[2] */
> > +   reg[2] |= (aging & 0xff) << 24;
> > +   /* portmask - 11:4 on reg[2] */
> > +   reg[2] |= (port_mask & 0xff) << 4;
> > +   /* type - 3 indicate that entry is static wouldn't
> > +* be aged out and 0 specified as erasing an entry
> > +*/
> > +   reg[2] |= (type & 0x3) << 2;
> > +   /* mac - 31:0 on reg[0] and 31:16 on reg[1] */
> > +   reg[1] |= mac[5] << 16;
> > +   reg[1] |= mac[4] << 24;
> > +   reg[0] |= mac[3] << 0;
> > +   reg[0] |= mac[2] << 8;
> > +   reg[0] |= mac[1] << 16;
> > +   reg[0] |= mac[0] << 24;
> > +
> > +   /* Write array into the ARL table */
> > +   for (i = 0; i < 3; i++)
> > +   mt7530_write(priv, MT7530_ATA1 + (i * 4), reg[i]);
> > +}
> 
> Same here.
> 

As above. I will improve them.


> > +
> > +static int
> > +mt7530_pad_clk_setup(struct dsa_switch *ds, int mode)
> > +{
> > +   struct mt7530_priv *priv = ds->priv;
> > +   u32 ncpo1, ssc_delta, trgint, i;
> > +
> > +   switch (mode) {
> > +   case PHY_INTERFACE_MODE_RGMII:
> > +   trgint = 0;
> > +   ncpo1 = 0x0c80;
> > +   ssc_delta = 0x87;
> > +   break;
> > +   case PHY_INTERFACE_MODE_TRGMII:
> > +   trgint = 1;
> > +   ncpo1 = 0x1400;
> > +   ssc_delta = 0x57;
> > +   break;
> > +   default:
> > +   pr_err("xMII mode %d not supported\n", mode);
> > +   return -EINVAL;
> > +   }
> 
> You may be able to move this to an adjust_link callback that the PHY
> library would call when the PHY gets setup and the port is finally used,
> as opposed to doing this upfront during driver initialization.
> 


Good point. I will follow up.


> 
> > +mt7530_setup(struct dsa_switch *ds)
> > +{
> > +   struct mt7530_priv *priv = ds->priv;
> > +   int ret, i, phy_mode;
> > +   u8  cpup_mask = 0;
> > +   u32 id, val;
> > +   struct regmap *regmap;
> > +   struct device_node *dn;
> > +
> > +   /* Make sure that

[tip:perf/core] perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()

2017-03-27 Thread tip-bot for Adrian Hunter

Commit-ID:  c3a0bbc7ad7598dec5a204868bdf8a2b1b51df14
Gitweb: http://git.kernel.org/tip/c3a0bbc7ad7598dec5a204868bdf8a2b1b51df14
Author: Adrian Hunter 
AuthorDate: Fri, 24 Mar 2017 14:15:52 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 27 Mar 2017 11:58:08 -0300

perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()

Address filtering with kernel symbols incorrectly resulted in the error
"Cannot determine size of symbol" because the no_size logic was the wrong
way around.

Signed-off-by: Adrian Hunter 
Tested-by: Andi Kleen 
Cc: sta...@vger.kernel.org # v4.9+
Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hun...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/auxtrace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c5a6e0b1..78bd632 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1826,7 +1826,7 @@ static int addr_filter__resolve_kernel_syms(struct addr_filter *filt)
filt->addr = start;
if (filt->range && !filt->size && !filt->sym_to) {
filt->size = size;
-   no_size = !!size;
+   no_size = !size;
}
}
 
@@ -1840,7 +1840,7 @@ static int addr_filter__resolve_kernel_syms(struct addr_filter *filt)
if (err)
return err;
filt->size = start + size - filt->addr;
-   no_size = !!size;
+   no_size = !size;
}
 
/* The very last symbol in kallsyms does not imply a particular size */
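
The inversion fixed above can be shown in isolation. This is a minimal sketch, not the perf code: the two helpers are hypothetical, and returning -1 merely stands in for the "Cannot determine size of symbol" error path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before the fix: no_size is true exactly when a size WAS found. */
static int check_size_buggy(uint64_t size)
{
	bool no_size = !!size;

	return no_size ? -1 : 0;  /* -1 stands in for the error path */
}

/* After the fix: no_size is true only when the size is zero. */
static int check_size_fixed(uint64_t size)
{
	bool no_size = !size;

	return no_size ? -1 : 0;
}
```

With a symbol size of 4096 the buggy variant reports an error and the fixed one succeeds. With a size of 0 the outcome is reversed.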

[tip:perf/core] perf trace: Fix up error path indentation

2017-03-27 Thread tip-bot for Arnaldo Carvalho de Melo

Commit-ID:  c04dfafa6033ca2eddc56fe188017d9ae50414c9
Gitweb: http://git.kernel.org/tip/c04dfafa6033ca2eddc56fe188017d9ae50414c9
Author: Arnaldo Carvalho de Melo 
AuthorDate: Fri, 24 Mar 2017 14:54:06 -0300
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 24 Mar 2017 16:05:31 -0300

perf trace: Fix up error path indentation

Trivial fix removing a tab in an error path.

Link: http://lkml.kernel.org/n/tip-c14mk6cqaiby8gf5rpft3...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 33c657c..2425605 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1663,7 +1663,7 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
char *f = realloc(ttrace->filename.name, filename_len + 1);
 
if (f == NULL)
-   goto out;
+   goto out;
 
ttrace->filename.namelen = filename_len;
ttrace->filename.name = f;

Re: [PATCH 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread Bjorn Andersson

On Mon 27 Mar 16:04 PDT 2017, David Miller wrote:

> From: Bjorn Andersson 
> Date: Mon, 27 Mar 2017 15:58:37 -0700
> 
> > I'm sorry, but I can't figure out how to reproduce this.
> 
> All of my builds are "make allmodconfig" so it should be easy to reproduce.

Thanks, it turns out that while it was possible to select CONFIG_SMD_RPM
and CONFIG_QCOM_WCNSS_CTRL, drivers/soc/Makefile does not traverse into
qcom/ unless CONFIG_ARCH_QCOM is set.

So I just sent out version 2 of the three patches, where I add an
explicit dependency on ARCH_QCOM for those two options.

Regards,
Bjorn

[PATCH 2/2] drivers/serial: Add driver for Aspeed virtual UART

2017-03-27 Thread Joel Stanley

From: Jeremy Kerr 

This change adds a driver for the 16550-based Aspeed virtual UART
device. We use a similar process to the of_serial driver for device
probe, but expose some VUART-specific functions through sysfs too.

OpenPOWER host firmware doesn't like it when the host-side of the
VUART's FIFO is not drained. This driver only disables host TX discard
mode when the port is in use. We set the VUART enabled bit when we bind
to the device, and clear it on unbind.

We don't want to do this on open/release, as the host may be using this
bit to configure serial output modes, which is independent of whether
the device has been opened by BMC userspace.

Signed-off-by: Jeremy Kerr 
Signed-off-by: Joel Stanley 
---
 Documentation/devicetree/bindings/serial/8250.txt |   2 +
 drivers/tty/serial/Kconfig                        |  10 +
 drivers/tty/serial/Makefile                       |   1 +
 drivers/tty/serial/aspeed-vuart.c                 | 335 ++
 4 files changed, 348 insertions(+)
 create mode 100644 drivers/tty/serial/aspeed-vuart.c

diff --git a/Documentation/devicetree/bindings/serial/8250.txt b/Documentation/devicetree/bindings/serial/8250.txt
index f86bb06c39e9..a12e9277ac5d 100644
--- a/Documentation/devicetree/bindings/serial/8250.txt
+++ b/Documentation/devicetree/bindings/serial/8250.txt
@@ -19,6 +19,8 @@ Required properties:
- "altr,16550-FIFO128"
- "fsl,16550-FIFO64"
- "fsl,ns16550"
+   - "aspeed,ast2400-vuart"
+   - "aspeed,ast2500-vuart"
- "serial" if the port type is unknown.
 - reg : offset and length of the register set for the device.
 - interrupts : should contain uart interrupt.
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index e9cf5b67f1b7..758b69a51078 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -1129,6 +1129,16 @@ config SERIAL_NETX_CONSOLE
  If you have enabled the serial port on the Hilscher NetX SoC
  you can make it the console by answering Y to this option.
 
+config SERIAL_ASPEED_VUART
+   tristate "Aspeed Virtual UART"
+   depends on OF
+   depends on SERIAL_8250
+   help
+ If you want to use the virtual UART (VUART) device on Aspeed
+ BMC platforms, enable this option. This enables the 16550A-
+ compatible device on the local LPC bus, giving a UART device
+ with no physical RS232 connections.
+
 config SERIAL_OMAP
tristate "OMAP serial port support"
depends on ARCH_OMAP2PLUS
diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
index 2d6288bc4554..5b97b0fa29e2 100644
--- a/drivers/tty/serial/Makefile
+++ b/drivers/tty/serial/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_SERIAL_8250) += 8250/
 
 obj-$(CONFIG_SERIAL_AMBA_PL010) += amba-pl010.o
 obj-$(CONFIG_SERIAL_AMBA_PL011) += amba-pl011.o
+obj-$(CONFIG_SERIAL_ASPEED_VUART) += aspeed-vuart.o
 obj-$(CONFIG_SERIAL_CLPS711X) += clps711x.o
 obj-$(CONFIG_SERIAL_PXA_NON8250) += pxa.o
 obj-$(CONFIG_SERIAL_PNX8XXX) += pnx8xxx_uart.o
diff --git a/drivers/tty/serial/aspeed-vuart.c b/drivers/tty/serial/aspeed-vuart.c
new file mode 100644
index ..fc6fa6d243c8
--- /dev/null
+++ b/drivers/tty/serial/aspeed-vuart.c
@@ -0,0 +1,335 @@
+/*
+ *  Serial Port driver for Aspeed VUART device
+ *
+ *Copyright (C) 2016 Jeremy Kerr , IBM Corp.
+ *Copyright (C) 2006 Arnd Bergmann , IBM Corp.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "8250/8250.h"
+
+#define AST_VUART_GCRA                 0x20
+#define AST_VUART_GCRA_VUART_EN        0x01
+#define AST_VUART_GCRA_HOST_TX_DISCARD 0x20
+#define AST_VUART_GCRB                 0x24
+#define AST_VUART_GCRB_HOST_SIRQ_MASK  0xf0
+#define AST_VUART_GCRB_HOST_SIRQ_SHIFT 4
+#define AST_VUART_ADDRL                0x28
+#define AST_VUART_ADDRH                0x2c
+
+struct ast_vuart {
+   struct platform_device *pdev;
+   void __iomem *regs;
+   struct clk *clk;
+   int line;
+};
+
+static ssize_t ast_vuart_show_addr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct ast_vuart *vuart = dev_get_drvdata(dev);
+   u16 addr;
+
+   addr = (readb(vuart->regs + AST_VUART_ADDRH) << 8) |
+   (readb(vuart->regs + AST_VUART_ADDRL));
+
+   return snprintf(buf, PAGE_SIZE - 1, "0x%x\n", addr);
+}
+
+static ssize_t ast_vuart_set_addr(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct ast_vuart *vuart = dev_get_drvdata(dev);
+   unsigned long val;
+   int err;
+
+   err = kstrtoul(buf, 0, &val);
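
For illustration only, the address handling in the show function above can be sketched in plain user-space C. This is not the driver's code: the regs array stands in for the memory-mapped registers, and vuart_write_addr() is a hypothetical counterpart to the store function (which is cut off above), assuming it writes the low and high bytes to ADDRL and ADDRH.

```c
#include <assert.h>
#include <stdint.h>

#define AST_VUART_ADDRL 0x28
#define AST_VUART_ADDRH 0x2c

/* Stand-in for the MMIO window; the real driver uses readb()/writeb(). */
uint8_t regs[0x40];

/* Combine the two byte-wide registers into one 16-bit address. */
uint16_t vuart_read_addr(void)
{
	return (uint16_t)((regs[AST_VUART_ADDRH] << 8) | regs[AST_VUART_ADDRL]);
}

/* Hypothetical inverse: split a 16-bit address across ADDRL and ADDRH. */
int vuart_write_addr(unsigned long val)
{
	if (val > UINT16_MAX)
		return -1;	/* does not fit in the two registers */

	regs[AST_VUART_ADDRL] = val & 0xff;
	regs[AST_VUART_ADDRH] = (val >> 8) & 0xff;
	return 0;
}
```

Writing 0x3f8 this way leaves 0xf8 in ADDRL and 0x03 in ADDRH, and reading back yields 0x3f8 again.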
+

[PATCH 1/2] serial: 8250: Add flag so drivers can avoid THRE probe

2017-03-27 Thread Joel Stanley

The probing of THRE irq behaviour assumes the other end will be reading
bytes out of the buffer in order to probe the port at driver init. In
some cases the other end cannot be relied upon to read these bytes, so
provide a flag for them to skip this step.

Bit 19 was chosen as the flags are an int and the top bits are taken.

Acked-by: Benjamin Herrenschmidt 
Signed-off-by: Joel Stanley 
---
 drivers/tty/serial/8250/8250_port.c | 2 +-
 include/linux/serial_core.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fe4399b41df6..f4c6da107dec 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2205,7 +2205,7 @@ int serial8250_do_startup(struct uart_port *port)
}
}
 
-   if (port->irq) {
+   if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
unsigned char iir1;
/*
 * Test for UARTs that do not reassert THRE when the
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 5def8e830fb0..f9e1fa39f553 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -195,6 +195,7 @@ struct uart_port {
 #define UPF_NO_TXEN_TEST   ((__force upf_t) (1 << 15))
 #define UPF_MAGIC_MULTIPLIER   ((__force upf_t) ASYNC_MAGIC_MULTIPLIER /* 16 */ )
 
+#define UPF_NO_THRE_TEST   ((__force upf_t) (1 << 19))
 /* Port has hardware-assisted h/w flow control */
 #define UPF_AUTO_CTS   ((__force upf_t) (1 << 20))
 #define UPF_AUTO_RTS   ((__force upf_t) (1 << 21))
-- 
2.11.0
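
As a rough illustration of the condition the patch adds, here is a minimal user-space sketch. Only the flag values are taken from the diff above; struct uart_port_sketch and should_probe_thre() are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int upf_t;

/* Values mirror the patch above; everything else is a simplified stand-in. */
#define UPF_NO_THRE_TEST (1u << 19)
#define UPF_AUTO_CTS     (1u << 20)

struct uart_port_sketch {
	unsigned int irq;	/* 0 means "no interrupt line" */
	upf_t flags;
};

/* Mirrors the new condition in serial8250_do_startup(). */
bool should_probe_thre(const struct uart_port_sketch *port)
{
	return port->irq && !(port->flags & UPF_NO_THRE_TEST);
}
```

A driver whose peer cannot be relied upon to drain the buffer would presumably set UPF_NO_THRE_TEST in its port flags before registering the port; ports without the flag behave as before.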

Re: [GIT PULL 00/20] perf/core improvements and fixes

2017-03-27 Thread Ingo Molnar


* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit e3a6a62400520452fe39740dca90a1d0b94b8f92:
> 
>   Merge tag 'perf-core-for-mingo-4.12-20170324' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core 
> (2017-03-24 19:37:40 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> tags/perf-core-for-mingo-4.12-20170327
> 
> for you to fetch changes up to 55f77128e7652e537d6c226d5b56821cdb5c22de:
> 
>   perf utils: Readlink /proc/self/exe to find the perf binary (2017-03-27 
> 15:37:54 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Handle inline functions in callchains (Jin Yao)
> 
> - Enable sorting by srcline as key (Milian Wolff)
> 
> Fixes:
> 
> - Fix no_size logic in addr_filter__resolve_kernel_syms() in the
>   auxtrace code (Adrian Hunter)
> 
> - Fix some thread refcount leaks in 'perf trace' (Arnaldo Carvalho de Melo)
> 
> - Fix divide by zero when calculating percent for an event in a group in
>   the annotate by source line code (Taeung Song)
> 
> - build-id files are no longer symlinks, their parent directories
>   are, so readlink the latter (Taeung Song)
> 
> - Assorted fixes for null termination problems, mostly related to
>   readlink, detected by valgrind (Tommi Rantala)
> 
> Infrastructure:
> 
> - Make vfs_getname probe point logic in 'perf trace' more robust
>   wrt length of pathname (Arnaldo Carvalho de Melo)
> 
> - Remove unused 'prefix' parameter from builtins main functions (Arnaldo 
> Carvalho de Melo)
> 
> - Show 'perf list sdt' option in man page (Ravi Bangoria)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
> 
> 
> Adrian Hunter (1):
>   perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()
> 
> Arnaldo Carvalho de Melo (4):
>   perf trace: Check for vfs_getname.pathname length
>   perf trace: Fix up error path indentation
>   perf trace: Fixup thread refcounting
>   perf tools: Remove unused 'prefix' from builtin functions
> 
> Jin Yao (5):
>   perf report: Refactor common code in srcline.c
>   perf report: Find the inline stack for a given address
>   perf report: Introduce --inline option
>   perf report: Show inline stack for stdio mode
>   perf report: Show inline stack for browser mode
> 
> Milian Wolff (1):
>   perf report: Enable sorting by srcline as key
> 
> Ravi Bangoria (1):
>   perf list sdt: Show option in man page
> 
> Taeung Song (2):
>   perf annotate: Fix a bug following symbolic link of a build-id file
>   perf annotate: Fix a bug of division by zero when calculating percent
> 
> Tommi Rantala (6):
>   perf buildid: Do not update SDT cache with null filename
>   perf buildid: Do not assume that readlink() returns a null terminated 
> string
>   perf tests: Do not assume that readlink() returns a null terminated 
> string
>   perf utils: use sizeof(buf) - 1 in readlink() call
>   perf utils: Null terminate buf in read_ftrace_printk()
>   perf utils: Readlink /proc/self/exe to find the perf binary
> 
>  tools/perf/Documentation/perf-list.txt   |   4 +-
>  tools/perf/Documentation/perf-report.txt |   5 +
>  tools/perf/bench/bench.h |  20 +--
>  tools/perf/bench/futex-hash.c|   3 +-
>  tools/perf/bench/futex-lock-pi.c |   3 +-
>  tools/perf/bench/futex-requeue.c |   3 +-
>  tools/perf/bench/futex-wake-parallel.c   |   3 +-
>  tools/perf/bench/futex-wake.c|   3 +-
>  tools/perf/bench/mem-functions.c |   4 +-
>  tools/perf/bench/numa.c  |   2 +-
>  tools/perf/bench/sched-messaging.c   |   3 +-
>  tools/perf/bench/sched-pipe.c|   2 +-
>  tools/perf/builtin-annotate.c|   2 +-
>  tools/perf/builtin-bench.c   |  12 +-
>  tools/perf/builtin-buildid-cache.c   |   3 +-
>  tools/perf/builtin-buildid-list.c|   3 +-
>  tools/perf/builtin-c2c.c |   4 +-
>  tools/perf/builtin-config.c  |   2 +-
>  tools/perf/builtin-data.c|   9 +-
>  tools/perf/builtin-diff.c|   2 +-
>  tools/perf/builtin-evlist.c  |   2 +-
>  tools/perf/builtin-ftrace.c  |   2 +-
>  tools/perf/builtin-h

Re: [BUG nohz]: wrong user and system time accounting

2017-03-27 Thread Wanpeng Li

2017-03-28 2:38 GMT+08:00 Luiz Capitulino :
> On Mon, 27 Mar 2017 09:56:47 +0800
> Wanpeng Li  wrote:
>
>> Actually after I bisect, the first bad commit is ff9a9b4c4334 ("sched,
>> time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity"). The bug
>> can be reproduced readily if CONFIG_CONTEXT_TRACKING_FORCE is true,
>> then just stress all the online cpus or just one cpu and leave others
>> idle(so it stresses the global timekeeping one), top show 100%
>> sys-time. And another way to reproduce it is by nohz_full, and gives
>> the stress to the house keeping cpu, the top show 100% sys-time of the
>> house keeping cpu, and also the other cpus who have at least two tasks
>> running on and in full_nohz mode.
>
> We're not short on reproducers, I have a new one too:
>
>  http://people.redhat.com/~lcapitul/real-time/acct-bug.c
>
> This is a single threaded task that reproduces the issue. If you
> run it as instructed, you'll get:
>
>  - nohz_full CPU: 95% system time 5% idle time
>  - non-nohz_full CPU: 95% user time 5% idle time (expected behavior)
>
> This reproduces the issue, but not for the reasons I expected. I was
> trying to mimic what I was seeing on my trace when tracing the two
> task problem. Which is: a task stays 995us in user-space and then
> enters the kernel. Time won't be accounted for user-space because
> we're not 1 jiffies yet, but if the task stays in the kernel for more
> than 5us, then time will be accounted for system time when going
> back to user-space.
>
> However, what really seems to be happening is: acct-bug is causing
> the tick to be re-activated (why? it shouldn't) and that causes the
> issue to appear. This is consistent with my other observations: I
> can only reproduce the issue if the nohz_full CPU re-activates the tick.

I see there are other kthreads like migration, kworker,
torture_shuffle etc on the isolated CPU.

Regards,
Wanpeng Li

>
>> Let's consider the cpu which has responsibility for the global
>> timekeeping, as the tracing posted above, the vtime_account_user() is
>> called before tick_sched_timer() which will update jiffies,
>
> But the vtime_account_user() call and the jiffies update happen
> on different CPUs, no? So the ordering shouldn't matter.
>
>> so jiffies
>> is stale in vtime_account_user() and the run time in userspace is
>> skipped, the vtime_user_enter() is called after jiffies update, so
>> both the time in userspace and in  kernel are accumulated to sys time.
>>
>> If the housekeeping cpu is idle when CONFIG_NO_HZ_FULL, everything is
>> fine. However, if you give stress to the housekeeping cpu, top will
>> show 100% sys-time of both the housekeeping cpu and the other cpus who
>> have at least two tasks running on and in full_nohz mode.
>
> The housekeeping CPUs are idle with my reproducers.
>
>> I think it
>> is because the stress delays the timer interrupt handling in some
>> degree, then the jiffies is not updated timely before other cpus
>> access it in vtime_account_user().
>>
>> I think we can keep syscalls/exceptions context tracking still in
>> jiffies based sampling and utilize local_clock() in vtime_delta()
>> again for irqs which avoids jiffies stale influence. I can make a
>> patch if the idea is acceptable or there is any better proposal. :)
>>
>> Regards,
>> Wanpeng Li
>>
>
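
The 995us/5us scenario described in the quoted text can be modelled with a small user-space simulation. This is a deliberately simplified sketch, not the kernel's vtime code: it assumes HZ=1000, a single task, and that time is only accounted in whole clock units at each user/kernel transition. The names struct acct, account() and simulate() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_US 1000u	/* assumes HZ=1000, i.e. one jiffy per 1000us */

struct acct {
	uint64_t user;		/* in units of the chosen clock */
	uint64_t system;
	uint64_t snapshot;	/* clock value at the last accounting */
};

/* Account elapsed time to *bucket, but only in whole clock units. */
static void account(struct acct *a, uint64_t now, uint64_t *bucket)
{
	uint64_t delta = now - a->snapshot;

	if (!delta)
		return;		/* less than one unit elapsed: nothing is accounted */
	*bucket += delta;
	a->snapshot = now;
}

/*
 * Run `periods` iterations of: 995us in user space, then 5us in the kernel.
 * `unit_us` is the clock granularity: TICK_US models jiffies, 1 models a
 * fine-grained clock.
 */
struct acct simulate(unsigned int periods, uint64_t unit_us)
{
	struct acct a = { 0, 0, 0 };
	uint64_t t = 0;

	for (unsigned int i = 0; i < periods; i++) {
		t += 995;				/* user -> kernel */
		account(&a, t / unit_us, &a.user);
		t += 5;					/* kernel -> user */
		account(&a, t / unit_us, &a.system);
	}
	return a;
}
```

With jiffy granularity, simulate(100, TICK_US) charges all 100 jiffies to system time and none to user time, because the jiffy boundary is always crossed while the task is in the kernel. With simulate(100, 1) the same workload is split into 99500us of user time and 500us of system time.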

Re: [PATCH] Make EN2 pin optional in the TRF7970A driver

2017-03-27 Thread Heiko Schocher


Hello all,

Am 21.02.2017 um 17:43 schrieb Rob Herring:

On Sun, Feb 19, 2017 at 11:19 PM, Heiko Schocher  wrote:

Hello all,

Am 13.02.2017 um 22:31 schrieb Rob Herring:


On Mon, Feb 13, 2017 at 12:38 AM, Heiko Schocher  wrote:


Hello Rob,


Am 10.02.2017 um 16:51 schrieb Rob Herring:



On Tue, Feb 07, 2017 at 06:22:04AM +0100, Heiko Schocher wrote:



From: Guan Ben 

Make the EN2 pin optional. This is useful for boards that
have this pin hard-wired, for example to ground.

Signed-off-by: Guan Ben 
Signed-off-by: Mark Jonas 
Signed-off-by: Heiko Schocher 

---

.../devicetree/bindings/net/nfc/trf7970a.txt   |  4 ++--
drivers/nfc/trf7970a.c | 26
--
2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..5889a3d 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -5,8 +5,8 @@ Required properties:
- spi-max-frequency: Maximum SPI frequency (<= 200).
- interrupt-parent: phandle of parent interrupt handler.
- interrupts: A single interrupt specifier.
-- ti,enable-gpios: Two GPIO entries used for 'EN' and 'EN2' pins on the
-  TRF7970A.
+- ti,enable-gpios: One or two GPIO entries used for 'EN' and 'EN2' pins on the
+  TRF7970A. EN2 is optional.




Could EN ever be optional/fixed? If so, perhaps deprecate this property
and do 2 properties, one for each pin.




The hardware I have has the EN2 pin hard-wired to ground. Looking
into http://www.ti.com/lit/ds/slos743k/slos743k.pdf, page 19, tables 6-3
and 6-4, the EN2 pin is a don't care if EN = 1. If EN = 0, the EN2 pin
selects between Power Down and Sleep Mode ... I see no reason why
this is not possible/allowed ...

Hmm.. I do not like the idea of deprecating the "ti,enable-gpios"
property in favour of 2 separate properties ... but if this would be a
reason for not accepting this patch, I can do this ... How should I name
the 2 new properties?



I guess if this ever happens, then we just add "ti,enable2-gpios" and
ti,enable-gpios continues to point to EN. We don't need to deprecate
anything (or maybe just deprecate having both GPIOs on single
property).

In that case,

Acked-by: Rob Herring 



gentle ping.

Are there any more comments to this patch? Is it acceptable as it
is?


I acked it, so yes, it is fine.


Gentle ping. Any more issues or can this patch go into mainline?

bye,
Heiko
--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
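
The EN/EN2 behaviour discussed in this thread can be summarised as a small decision function. This is a sketch for illustration, not code from the driver: the enum and trf_decode_mode() are hypothetical, and the EN2 polarity (high = sleep, low = power down) is an assumption that should be checked against the datasheet tables cited above.

```c
#include <assert.h>
#include <stdbool.h>

enum trf_power_mode {
	TRF_MODE_ACTIVE,
	TRF_MODE_SLEEP,
	TRF_MODE_POWER_DOWN,
};

/*
 * EN high: the chip is enabled and EN2 is a don't care.
 * EN low: EN2 picks the low-power mode. The polarity used here
 * (EN2 high = sleep, EN2 low = power down) is an assumption.
 */
enum trf_power_mode trf_decode_mode(bool en, bool en2)
{
	if (en)
		return TRF_MODE_ACTIVE;
	return en2 ? TRF_MODE_SLEEP : TRF_MODE_POWER_DOWN;
}
```

Under that assumption, a board with EN2 tied to ground can still reach the active mode through EN alone; it only loses the ability to choose sleep over power down, which is why a single GPIO entry is sufficient there.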

[PATCH v2] selftests: gpio: fix Makefile

2017-03-27 Thread Fathi Boudra

* Fix hardcoded and misplaced libmount headers. Use pkg-config instead to
  figure out CFLAGS/LDLIBS, fixing also their value for cross-compilation.
  Note: if pkg-config is missing (command not found), it will fail to
  build because headers can't be found or libmount library can't be
  linked.

* Fix the clean target to clean up also gpio-utils.

* Fix gpio-mockup-chardev installation by using TEST_PROGS_EXTENDED
  instead of BINARIES which is not supported by the top-level lib.mk.

* Get rid of INSTALL_HDR_PATH. We don't need it since make -C is putting
  us in the right location.

* Improve readability:
  - introduce GPIODIR/GPIOOBJ/GPIOINC variables
  - split CFLAGS on multiple lines

Signed-off-by: Fathi Boudra 
---

in v2:
 * per Michael Ellerman's request, revert to using exported headers instead of uapi.

 tools/testing/selftests/gpio/Makefile | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
index 205e4d10e085..41826897af35 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -1,23 +1,28 @@
+CFLAGS += -O2 -g -std=gnu99 -Wall
+CFLAGS += -I../../../../usr/include/
+CFLAGS += $(shell pkg-config --cflags mount)
+LDLIBS += $(shell pkg-config --libs mount)
 
 TEST_PROGS := gpio-mockup.sh
-TEST_FILES := gpio-mockup-sysfs.sh $(BINARIES)
-BINARIES := gpio-mockup-chardev
+TEST_PROGS_EXTENDED := gpio-mockup-chardev
+TEST_FILES := gpio-mockup-sysfs.sh
+
+GPIODIR := ../../../gpio
+GPIOOBJ := gpio-utils.o
+GPIOINC := gpio.h
 
 include ../lib.mk
 
-all: $(BINARIES)
+all: $(GPIOINC) $(TEST_PROGS_EXTENDED)
 
 clean:
-   $(RM) $(BINARIES)
-
-CFLAGS += -O2 -g -std=gnu99 -Wall -I../../../../usr/include/
-LDLIBS += -lmount -I/usr/include/libmount
-
-$(BINARIES): ../../../gpio/gpio-utils.o ../../../../usr/include/linux/gpio.h
+   $(RM) $(TEST_PROGS_EXTENDED)
+   $(MAKE) -C $(GPIODIR) clean
 
-../../../gpio/gpio-utils.o:
-   make ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C ../../../gpio
+$(TEST_PROGS_EXTENDED): $(GPIODIR)/$(GPIOOBJ)
 
-../../../../usr/include/linux/gpio.h:
-   make -C ../../../.. headers_install INSTALL_HDR_PATH=$(shell pwd)/../../../../usr/
+$(GPIODIR)/$(GPIOOBJ):
+   $(MAKE) -C $(GPIODIR)
 
+$(GPIOINC):
+   $(MAKE) -C ../../../.. headers_install
-- 
2.11.0

Re: [PATCH] mm: fix section name for .data..ro_after_init

2017-03-27 Thread Heiko Carstens

On Mon, Mar 27, 2017 at 12:22:13PM -0700, Kees Cook wrote:
> A section name for .data..ro_after_init was added by both:
> 
> commit d07a980c1b8d ("s390: add proper __ro_after_init support")
> 
> and
> 
> commit d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init")
> 
> The latter adds incorrect wrapping around the existing s390 section,
> and came later. I'd prefer the s390 naming, so this moves the
> s390-specific name up to the asm-generic/sections.h and renames the
> section as used by kmemleak (and in the future, kernel/extable.c).
> 
> Cc: Jakub Kicinski 
> Cc: Heiko Carstens 
> Cc: Eddie Kovsky 
> Signed-off-by: Kees Cook 
> ---
>  arch/s390/include/asm/sections.h  | 1 -
>  arch/s390/kernel/vmlinux.lds.S| 2 --
>  include/asm-generic/sections.h| 6 +++---
>  include/asm-generic/vmlinux.lds.h | 4 ++--
>  mm/kmemleak.c | 2 +-
>  5 files changed, 6 insertions(+), 9 deletions(-)

For the s390 bits:
Acked-by: Heiko Carstens

Re: [PATCH v5 08/13] powerpc/perf: PMU functions for Core IMC and hotplugging

2017-03-27 Thread Madhavan Srinivasan




On Thursday 23 March 2017 06:39 PM, Gautham R Shenoy wrote:

Hi Maddy, Hemant, Anju,

On Thu, Mar 16, 2017 at 01:05:02PM +0530, Madhavan Srinivasan wrote:

[..snip..]


+
+static void core_imc_change_cpu_context(int old_cpu, int new_cpu)
+{
+   if (!core_imc_pmu)
+   return;
+   perf_pmu_migrate_context(&core_imc_pmu->pmu, old_cpu, new_cpu);
+}
+
+
+static int ppc_core_imc_cpu_online(unsigned int cpu)
+{
+   int ret;
+
+   /* If a cpu for this core is already set, then, don't do anything */
+   ret = cpumask_any_and(&core_imc_cpumask,
+cpu_sibling_mask(cpu));
+   if (ret < nr_cpu_ids)
+   return 0;
+
+   /* Else, set the cpu in the mask, and change the context */
+   cpumask_set_cpu(cpu, &core_imc_cpumask);
+   core_imc_change_cpu_context(-1, cpu);

So, in the core case, we are ok as long as any cpu in the core is
present in the imc_cpumask. It need not have to be the smallest online
cpu in the core.

Can the same logic be applied to the earlier nest case ?


Yes. This makes sense. Let me look at this.

Thanks for review
Maddy



We can have a single function for cpu_offline and cpu_online which
implements these checks and sets the cpu bit if required.

ppc_entity_imc_cpu_offline(unsigned int cpu, cpumask_t
   entity_imc_mask,
   entity_imc_change_cpu_context_fn)
{
.
.
.

}


static ppc_nest_imc_cpu_offline(unsigned int cpu)
{
return ppc_entity_imc_cpu_offline(cpu, nest_imc_mask,
  nest_imc_change_cpu_context);
}

And similar ones for core imc and thread imc.

Does this sound reasonable ?



+   return 0;
+}
+
+static int ppc_core_imc_cpu_offline(unsigned int cpu)
+{
+   int target;
+   unsigned int ncpu;
+
+   /*
+* clear this cpu out of the mask, if not present in the mask,
+* don't bother doing anything.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
+   return 0;
+
+   /* Find any online cpu in that core except the current "cpu" */
+   ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
+
+   if (ncpu < nr_cpu_ids) {
+   target = ncpu;
+   cpumask_set_cpu(target, &core_imc_cpumask);
+   } else
+   target = -1;
+
+   /* migrate the context */
+   core_imc_change_cpu_context(cpu, target);
+
+   return 0;
+}
+

--
Thanks and Regards
gautham.
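
For reference, the refactoring proposed in this thread (one shared hotplug helper, parameterized by the entity's mask and a context-change callback) can be modelled in plain userspace C. This is only a sketch under assumptions: cpumask_model_t, entity_imc_cpu_online(), entity_imc_cpu_offline() and record_change_ctx() are hypothetical names, a CPU mask is reduced to a 64-bit word, and the sibling mask is passed in by the caller instead of coming from cpu_sibling_mask(). It is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model: a cpumask is a 64-bit word, one bit per CPU. */
typedef uint64_t cpumask_model_t;
typedef void (*change_ctx_fn)(int old_cpu, int new_cpu);

#define NO_CPU (-1)

/* Example callback: remembers the last migration for inspection. */
static int last_old_cpu = NO_CPU, last_new_cpu = NO_CPU;

static void record_change_ctx(int old_cpu, int new_cpu)
{
	last_old_cpu = old_cpu;
	last_new_cpu = new_cpu;
}

/* Lowest set bit in mask, or NO_CPU when the mask is empty. */
static int mask_first(cpumask_model_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/*
 * Online: if some CPU of this entity (core, nest, ...) is already
 * designated in imc_mask, do nothing; otherwise designate the new CPU.
 */
static int entity_imc_cpu_online(int cpu, cpumask_model_t *imc_mask,
				 cpumask_model_t siblings,
				 change_ctx_fn change_ctx)
{
	assert(cpu >= 0 && cpu < 64);
	if (*imc_mask & siblings)
		return 0;
	*imc_mask |= UINT64_C(1) << cpu;
	change_ctx(NO_CPU, cpu);
	return 0;
}

/*
 * Offline: if the outgoing CPU was the designated one, hand over to any
 * other sibling, or to NO_CPU when it was the last one in the entity.
 */
static int entity_imc_cpu_offline(int cpu, cpumask_model_t *imc_mask,
				  cpumask_model_t siblings,
				  change_ctx_fn change_ctx)
{
	cpumask_model_t bit;
	int target;

	assert(cpu >= 0 && cpu < 64);
	bit = UINT64_C(1) << cpu;
	if (!(*imc_mask & bit))
		return 0;
	*imc_mask &= ~bit;

	target = mask_first(siblings & ~bit);
	if (target != NO_CPU)
		*imc_mask |= UINT64_C(1) << target;
	change_ctx(cpu, target);
	return 0;
}
```

A per-entity wrapper (core, nest, thread) would then only supply its own mask and callback, which is the shape suggested by the ppc_nest_imc_cpu_offline() outline above.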

[PATCH -mm -v7 8/9] mm, THP, swap: Support to split THP in swap cache

2017-03-27 Thread Huang, Ying

From: Huang Ying 

This patch enhances split_huge_page_to_list() to work properly for a
THP (Transparent Huge Page) in the swap cache during swap-out.

This is used to delay splitting the THP during swap-out: for a THP to
be swapped out, we allocate a swap cluster, add the THP to the swap
cache, and then split the THP.  The page lock is held during this
process, so in any code path other than swap-out, if the THP needs to
be split, PageSwapCache(THP) will always be false.

Cc: Andrea Arcangeli 
Cc: Ebru Akagunduz 
Signed-off-by: "Huang, Ying" 
Acked-by: Kirill A. Shutemov 
---
 mm/huge_memory.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 08ccf0cebe8f..459c7d5cdeb3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2185,7 +2185,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 * atomic_set() here would be safe on all archs (and not only on x86),
 * it's safer to use atomic_inc()/atomic_add().
 */
-   if (PageAnon(head)) {
+   if (PageAnon(head) && !PageSwapCache(head)) {
page_ref_inc(page_tail);
} else {
/* Additional pin to radix tree */
@@ -2196,6 +2196,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
page_tail->flags |= (head->flags &
((1L << PG_referenced) |
 (1L << PG_swapbacked) |
+(1L << PG_swapcache) |
 (1L << PG_mlocked) |
 (1L << PG_uptodate) |
 (1L << PG_active) |
@@ -2258,7 +2259,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
ClearPageCompound(head);
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
-   page_ref_inc(head);
+   /* Additional pin to radix tree of swap cache */
+   if (PageSwapCache(head))
+   page_ref_add(head, 2);
+   else
+   page_ref_inc(head);
} else {
/* Additional pin to radix tree */
page_ref_add(head, 2);
@@ -2370,10 +2375,12 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 /* Racy check whether the huge page can be split */
 bool can_split_huge_page(struct page *page, int *pextra_pins)
 {
-   int extra_pins = 0;
+   int extra_pins;
 
/* Additional pins from radix tree */
-   if (!PageAnon(page))
+   if (PageAnon(page))
+   extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
+   else
extra_pins = HPAGE_PMD_NR;
if (pextra_pins)
*pextra_pins = extra_pins;
@@ -2427,7 +2434,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
ret = -EBUSY;
goto out;
}
-   extra_pins = 0;
mapping = NULL;
anon_vma_lock_write(anon_vma);
} else {
-- 
2.11.0
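
The reference-count rule that this patch changes in can_split_huge_page() can be restated as a small standalone function. The following is a sketch, assuming HPAGE_PMD_NR is 512 (2 MiB huge pages with 4 KiB base pages); struct page_model and extra_pins_model() are hypothetical stand-ins for struct page, PageAnon() and PageSwapCache(), not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR_MODEL 512	/* assumed: 2 MiB THP, 4 KiB base pages */

/* Minimal stand-in for the two page predicates used by the patch. */
struct page_model {
	bool anon;	/* models PageAnon() */
	bool swapcache;	/* models PageSwapCache() */
};

/*
 * Extra pins held by a radix tree (page cache or swap cache).
 * An anonymous THP that is not in the swap cache has none; an anonymous
 * THP in the swap cache is pinned like a file-backed one.
 */
static int extra_pins_model(const struct page_model *page)
{
	if (page->anon)
		return page->swapcache ? HPAGE_PMD_NR_MODEL : 0;
	return HPAGE_PMD_NR_MODEL;
}
```

The only case that differs from the behaviour before the patch is the anonymous page that is already in the swap cache, which previously reported zero extra pins.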

[PATCH -mm -v7 4/9] mm, THP, swap: Add get_huge_swap_page()

2017-03-27 Thread Huang, Ying

From: Huang Ying 

A variation of get_swap_page(), get_huge_swap_page(), is added to
allocate a swap cluster (HPAGE_PMD_NR swap slots) based on the swap
cluster allocation function.  A fairly simple algorithm is used: only
the first swap device in the priority list is tried for the swap
cluster allocation.  The function fails if that attempt is
unsuccessful, and the caller falls back to allocating a single swap
slot instead.  This works well enough for normal cases.

This will be used for THP (Transparent Huge Page) swap support, where
get_huge_swap_page() will be used to allocate one swap cluster for
each THP swapped out.

Because of the algorithm adopted, if the number of free swap clusters
differs significantly among multiple swap devices, some THPs may be
split earlier than necessary.  For example, this could be caused by a
big size difference among the swap devices.

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: "Huang, Ying" 
---
 include/linux/swap.h | 19 ++-
 mm/swap_slots.c  |  5 +++--
 mm/swapfile.c| 18 +++---
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 278e1349a424..e3a7609a8989 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -388,7 +388,7 @@ static inline long get_nr_swap_pages(void)
 extern void si_swapinfo(struct sysinfo *);
 extern swp_entry_t get_swap_page(void);
 extern swp_entry_t get_swap_page_of_type(int);
-extern int get_swap_pages(int n, swp_entry_t swp_entries[]);
+extern int get_swap_pages(int n, swp_entry_t swp_entries[], bool huge);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
@@ -527,6 +527,23 @@ static inline swp_entry_t get_swap_page(void)
 
 #endif /* CONFIG_SWAP */
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+static inline swp_entry_t get_huge_swap_page(void)
+{
+   swp_entry_t entry;
+
+   if (get_swap_pages(1, &entry, true))
+   return entry;
+   else
+   return (swp_entry_t) {0};
+}
+#else
+static inline swp_entry_t get_huge_swap_page(void)
+{
+   return (swp_entry_t) {0};
+}
+#endif
+
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 9b5bc86f96ad..075bb39e03c5 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -258,7 +258,8 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
 
cache->cur = 0;
if (swap_slot_cache_active)
-   cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots);
+   cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE, cache->slots,
+  false);
 
return cache->nr;
 }
@@ -334,7 +335,7 @@ swp_entry_t get_swap_page(void)
return entry;
}
 
-   get_swap_pages(1, &entry);
+   get_swap_pages(1, &entry, false);
 
return entry;
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 54480acbbeef..382e84541e16 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -904,13 +904,14 @@ static unsigned long scan_swap_map(struct swap_info_struct *si,
 
 }
 
-int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
+int get_swap_pages(int n_goal, swp_entry_t swp_entries[], bool huge)
 {
struct swap_info_struct *si, *next;
long avail_pgs;
int n_ret = 0;
+   int nr_pages = huge_cluster_nr_entries(huge);
 
-   avail_pgs = atomic_long_read(&nr_swap_pages);
+   avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
if (avail_pgs <= 0)
goto noswap;
 
@@ -920,7 +921,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
if (n_goal > avail_pgs)
n_goal = avail_pgs;
 
-   atomic_long_sub(n_goal, &nr_swap_pages);
+   atomic_long_sub(n_goal * nr_pages, &nr_swap_pages);
 
spin_lock(&swap_avail_lock);
 
@@ -946,10 +947,13 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[])
spin_unlock(&si->lock);
goto nextsi;
}
-   n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
-   n_goal, swp_entries);
+   if (likely(!huge))
+   n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
+   n_goal, swp_entries);
+   else
+   n_ret = swap_alloc_huge_cluster(si, swp_entries);
spin_unlock(&si->lock);
-   if (n_ret)
+   if (n_ret || unlikely(huge))
goto check_out;
pr_debug("scan_swap_map of si %d failed to find offset\n",
si->type);
@@ -975,7 +979,7 @@ int get_swap_pages(int n_goal,
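
The accounting change in get_swap_pages() above (divide the free-slot count by the unit size, clamp the goal, then debit whole units) can be illustrated outside the kernel. This is a sketch under assumptions: slots_per_unit() is a hypothetical stand-in for huge_cluster_nr_entries(), whose definition is not part of this message, the cluster size of 512 slots is assumed, and a plain long replaces the atomic nr_swap_pages counter.

```c
#include <assert.h>
#include <stdbool.h>

#define CLUSTER_SLOTS_MODEL 512	/* assumed number of slots per huge cluster */

/* Hypothetical stand-in for huge_cluster_nr_entries() from the series. */
static int slots_per_unit(bool huge)
{
	return huge ? CLUSTER_SLOTS_MODEL : 1;
}

/*
 * Reserve up to n_goal allocation units from the free-slot counter.
 * A unit is one slot, or one whole cluster when huge is true.
 * Returns the number of units reserved and debits *nr_free_slots.
 */
static int reserve_swap_units(long *nr_free_slots, int n_goal, bool huge)
{
	int nr = slots_per_unit(huge);
	long avail = *nr_free_slots / nr;

	if (avail <= 0 || n_goal <= 0)
		return 0;
	if (n_goal > avail)
		n_goal = (int)avail;
	*nr_free_slots -= (long)n_goal * nr;
	return n_goal;
}
```

With 1000 free slots, one huge request succeeds and leaves 488 slots; a second huge request then fails even though single slots are still available, which is the situation where the caller falls back to a single swap slot.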

[PATCH -mm -v7 2/9] mm, memcg: Support to charge/uncharge multiple swap entries

2017-03-27 Thread Huang, Ying

From: Huang Ying 

This patch makes it possible to charge or uncharge a set of contiguous
swap entries in the swap cgroup.  The number of swap entries is
specified via an added parameter.

This will be used for THP (Transparent Huge Page) swap support, where
a swap cluster backing a THP may be allocated and freed as a whole.
So a set of (HPAGE_PMD_NR) contiguous swap entries backing one THP
needs to be charged or uncharged together.  This also batches the
cgroup operations for THP swap.

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Vladimir Davydov 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Tejun Heo 
Cc: cgro...@vger.kernel.org
Signed-off-by: "Huang, Ying" 
---
 include/linux/swap.h| 12 ++
 include/linux/swap_cgroup.h |  6 +++--
 mm/memcontrol.c | 57 +
 mm/shmem.c  |  2 +-
 mm/swap_cgroup.c| 40 +++
 mm/swap_state.c |  2 +-
 mm/swapfile.c   |  2 +-
 7 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 486494e6b2fc..278e1349a424 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -550,8 +550,10 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 
 #ifdef CONFIG_MEMCG_SWAP
 extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
-extern int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry);
-extern void mem_cgroup_uncharge_swap(swp_entry_t entry);
+extern int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry,
+ unsigned int nr_entries);
+extern void mem_cgroup_uncharge_swap(swp_entry_t entry,
+unsigned int nr_entries);
 extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
 extern bool mem_cgroup_swap_full(struct page *page);
 #else
@@ -560,12 +562,14 @@ static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 }
 
 static inline int mem_cgroup_try_charge_swap(struct page *page,
-swp_entry_t entry)
+swp_entry_t entry,
+unsigned int nr_entries)
 {
return 0;
 }
 
-static inline void mem_cgroup_uncharge_swap(swp_entry_t entry)
+static inline void mem_cgroup_uncharge_swap(swp_entry_t entry,
+   unsigned int nr_entries)
 {
 }
 
diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h
index 145306bdc92f..b2b8ec7bda3f 100644
--- a/include/linux/swap_cgroup.h
+++ b/include/linux/swap_cgroup.h
@@ -7,7 +7,8 @@
 
 extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
unsigned short old, unsigned short new);
-extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);
+extern unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
+unsigned int nr_ents);
 extern unsigned short lookup_swap_cgroup_id(swp_entry_t ent);
 extern int swap_cgroup_swapon(int type, unsigned long max_pages);
 extern void swap_cgroup_swapoff(int type);
@@ -15,7 +16,8 @@ extern void swap_cgroup_swapoff(int type);
 #else
 
 static inline
-unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
+unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
+ unsigned int nr_ents)
 {
return 0;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 490d5b4676c1..13ee82fe81c8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2393,10 +2393,9 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 
 #ifdef CONFIG_MEMCG_SWAP
 static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
-bool charge)
+  int nr_entries)
 {
-   int val = (charge) ? 1 : -1;
-   this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val);
+   this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], nr_entries);
 }
 
 /**
@@ -2422,8 +2421,8 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
new_id = mem_cgroup_id(to);
 
if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
-   mem_cgroup_swap_statistics(from, false);
-   mem_cgroup_swap_statistics(to, true);
+   mem_cgroup_swap_statistics(from, -1);
+   mem_cgroup_swap_statistics(to, 1);
return 0;
}
return -EINVAL;
@@ -5451,7 +5450,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
 * let's not wait for it.  The page already received a
 * memory+swap charge, drop the swap entry duplicate.
 */
-   mem_cgroup_uncharge_swap(entry);
+
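
The interface change to mem_cgroup_swap_statistics() (a signed entry count instead of a charge/uncharge flag) can be shown with a toy counter. This is a sketch under assumptions: struct memcg_model and both functions are hypothetical, a plain long replaces the per-CPU statistic, and the batched move generalizes the single-entry move visible in the hunk above.

```c
#include <assert.h>

/* Hypothetical per-cgroup swap counter, standing in for the per-CPU stat. */
struct memcg_model {
	long swap_entries;
};

/* Signed delta: a positive value charges, a negative value uncharges. */
static void swap_statistics_model(struct memcg_model *memcg, int nr_entries)
{
	memcg->swap_entries += nr_entries;
}

/* Move nr_entries contiguous swap entries from one cgroup to another. */
static void move_swap_account_model(struct memcg_model *from,
				    struct memcg_model *to, int nr_entries)
{
	swap_statistics_model(from, -nr_entries);
	swap_statistics_model(to, nr_entries);
}
```

Passing the count directly lets a whole cluster be accounted in one call instead of one call per entry.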

[PATCH -mm -v7 6/9] mm, THP, swap: Support to add/delete THP to/from swap cache

2017-03-27 Thread Huang, Ying

From: Huang Ying 

With this patch, a THP (Transparent Huge Page) can be added to or
deleted from the swap cache as a set of (HPAGE_PMD_NR) sub-pages.

This will be used for THP (Transparent Huge Page) swap support, where
one THP may be added to or deleted from the swap cache.  This also
batches the swap cache operations, reducing the number of lock
acquire/release cycles for THP swap.

Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Signed-off-by: "Huang, Ying" 
---
 include/linux/page-flags.h |  5 ++--
 mm/swap_state.c| 64 ++
 2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d6de32..f4acd6c4f808 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -326,11 +326,12 @@ PAGEFLAG_FALSE(HighMem)
 #ifdef CONFIG_SWAP
 static __always_inline int PageSwapCache(struct page *page)
 {
+   page = compound_head(page);
return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
 
 }
-SETPAGEFLAG(SwapCache, swapcache, PF_NO_COMPOUND)
-CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_COMPOUND)
+SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
+CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 #else
 PAGEFLAG_FALSE(SwapCache)
 #endif
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 199a07efc44d..504f67d73f67 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -38,6 +38,7 @@ struct address_space *swapper_spaces[MAX_SWAPFILES];
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES];
 
 #define INC_CACHE_INFO(x)  do { swap_cache_info.x++; } while (0)
+#define ADD_CACHE_INFO(x, nr)  do { swap_cache_info.x += (nr); } while (0)
 
 static struct {
unsigned long add_total;
@@ -90,39 +91,52 @@ void show_swap_cache_info(void)
  */
 int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 {
-   int error;
+   int error, i, nr = hpage_nr_pages(page);
struct address_space *address_space;
+   struct page *cur_page;
+   swp_entry_t cur_entry;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
-   get_page(page);
+   page_ref_add(page, nr);
SetPageSwapCache(page);
-   set_page_private(page, entry.val);
 
address_space = swap_address_space(entry);
+   cur_page = page;
+   cur_entry.val = entry.val;
spin_lock_irq(&address_space->tree_lock);
-   error = radix_tree_insert(&address_space->page_tree,
- swp_offset(entry), page);
-   if (likely(!error)) {
-   address_space->nrpages++;
-   __inc_node_page_state(page, NR_FILE_PAGES);
-   INC_CACHE_INFO(add_total);
+   for (i = 0; i < nr; i++, cur_page++, cur_entry.val++) {
+   set_page_private(cur_page, cur_entry.val);
+   error = radix_tree_insert(&address_space->page_tree,
+ swp_offset(cur_entry), cur_page);
+   if (unlikely(error))
+   break;
}
-   spin_unlock_irq(&address_space->tree_lock);
-
-   if (unlikely(error)) {
+   if (likely(!error)) {
+   address_space->nrpages += nr;
+   __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
+   ADD_CACHE_INFO(add_total, nr);
+   } else {
/*
 * Only the context which have set SWAP_HAS_CACHE flag
 * would call add_to_swap_cache().
 * So add_to_swap_cache() doesn't returns -EEXIST.
 */
VM_BUG_ON(error == -EEXIST);
-   set_page_private(page, 0UL);
+   set_page_private(cur_page, 0UL);
+   while (i--) {
+   cur_page--;
+   cur_entry.val--;
+   radix_tree_delete(&address_space->page_tree,
+ swp_offset(cur_entry));
+   set_page_private(cur_page, 0UL);
+   }
ClearPageSwapCache(page);
-   put_page(page);
+   page_ref_sub(page, nr);
}
+   spin_unlock_irq(&address_space->tree_lock);
 
return error;
 }
@@ -132,7 +146,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
 {
int error;
 
-   error = radix_tree_maybe_preload(gfp_mask);
+   error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
if (!error) {
error = __add_to_swap_cache(page, entry);
radix_tree_preload_end();
@@ -148,6 +162,7 @@ void __delete_from_swap_cache(struct page *page)
 {
swp_entry_t entry;
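
The core pattern of the new __add_to_swap_cache() (insert all sub-pages in one pass, account once on success, undo the partial insertions on failure) can be sketched with a toy index. Everything here is an assumption for illustration: struct index_model replaces the radix tree, index_insert() and index_insert_range() are hypothetical names, and locking and page reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_MODEL 16

/* Toy index: slot[i] holds a pointer or NULL; stands in for the radix tree. */
struct index_model {
	const void *slot[SLOTS_MODEL];
	size_t nr_items;
};

/* Insert one item; fails if out of range or the slot is already occupied. */
static int index_insert(struct index_model *idx, size_t pos, const void *item)
{
	if (pos >= SLOTS_MODEL || idx->slot[pos])
		return -1;
	idx->slot[pos] = item;
	return 0;
}

/*
 * Insert nr consecutive items starting at position first. On any failure,
 * remove the items already inserted so the index is left unchanged.
 */
static int index_insert_range(struct index_model *idx, size_t first,
			      const int *items, size_t nr)
{
	size_t i;
	int error = 0;

	for (i = 0; i < nr; i++) {
		error = index_insert(idx, first + i, &items[i]);
		if (error)
			break;
	}
	if (!error) {
		idx->nr_items += nr;	/* account once for the whole batch */
		return 0;
	}
	while (i--)			/* roll back the i entries inserted so far */
		idx->slot[first + i] = NULL;
	return error;
}
```

The rollback loop only touches slots that this call filled, so an entry owned by someone else at the failing position is left intact.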

[PATCH -mm -v7 6/9] mm, THP, swap: Support to add/delete THP to/from swap cache

2017-03-27 Thread Huang, Ying

From: Huang Ying 

With this patch, a THP (Transparent Huge Page) can be added/deleted
to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages.

This will be used for the THP (Transparent Huge Page) swap support.
Where one THP may be added/delted to/from the swap cache.  This will
batch the swap cache operations to reduce the lock acquire/release times
for the THP swap too.

Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Signed-off-by: "Huang, Ying" 
---
 include/linux/page-flags.h |  5 ++--
 mm/swap_state.c| 64 ++
 2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d6de32..f4acd6c4f808 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -326,11 +326,12 @@ PAGEFLAG_FALSE(HighMem)
 #ifdef CONFIG_SWAP
 static __always_inline int PageSwapCache(struct page *page)
 {
+   page = compound_head(page);
return PageSwapBacked(page) && test_bit(PG_swapcache, >flags);
 
 }
-SETPAGEFLAG(SwapCache, swapcache, PF_NO_COMPOUND)
-CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_COMPOUND)
+SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
+CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
 #else
 PAGEFLAG_FALSE(SwapCache)
 #endif
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 199a07efc44d..504f67d73f67 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -38,6 +38,7 @@ struct address_space *swapper_spaces[MAX_SWAPFILES];
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES];
 
 #define INC_CACHE_INFO(x)  do { swap_cache_info.x++; } while (0)
+#define ADD_CACHE_INFO(x, nr)  do { swap_cache_info.x += (nr); } while (0)
 
 static struct {
unsigned long add_total;
@@ -90,39 +91,52 @@ void show_swap_cache_info(void)
  */
 int __add_to_swap_cache(struct page *page, swp_entry_t entry)
 {
-   int error;
+   int error, i, nr = hpage_nr_pages(page);
struct address_space *address_space;
+   struct page *cur_page;
+   swp_entry_t cur_entry;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
-   get_page(page);
+   page_ref_add(page, nr);
SetPageSwapCache(page);
-   set_page_private(page, entry.val);
 
address_space = swap_address_space(entry);
+   cur_page = page;
+   cur_entry.val = entry.val;
spin_lock_irq(&address_space->tree_lock);
-   error = radix_tree_insert(&address_space->page_tree,
- swp_offset(entry), page);
-   if (likely(!error)) {
-   address_space->nrpages++;
-   __inc_node_page_state(page, NR_FILE_PAGES);
-   INC_CACHE_INFO(add_total);
+   for (i = 0; i < nr; i++, cur_page++, cur_entry.val++) {
+   set_page_private(cur_page, cur_entry.val);
+   error = radix_tree_insert(&address_space->page_tree,
+ swp_offset(cur_entry), cur_page);
+   if (unlikely(error))
+   break;
}
-   spin_unlock_irq(&address_space->tree_lock);
-
-   if (unlikely(error)) {
+   if (likely(!error)) {
+   address_space->nrpages += nr;
+   __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
+   ADD_CACHE_INFO(add_total, nr);
+   } else {
/*
 * Only the context which have set SWAP_HAS_CACHE flag
 * would call add_to_swap_cache().
 * So add_to_swap_cache() doesn't returns -EEXIST.
 */
VM_BUG_ON(error == -EEXIST);
-   set_page_private(page, 0UL);
+   set_page_private(cur_page, 0UL);
+   while (i--) {
+   cur_page--;
+   cur_entry.val--;
+   radix_tree_delete(&address_space->page_tree,
+ swp_offset(cur_entry));
+   set_page_private(cur_page, 0UL);
+   }
ClearPageSwapCache(page);
-   put_page(page);
+   page_ref_sub(page, nr);
}
+   spin_unlock_irq(&address_space->tree_lock);
 
return error;
 }
@@ -132,7 +146,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, 
gfp_t gfp_mask)
 {
int error;
 
-   error = radix_tree_maybe_preload(gfp_mask);
+   error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
if (!error) {
error = __add_to_swap_cache(page, entry);
radix_tree_preload_end();
@@ -148,6 +162,7 @@ void __delete_from_swap_cache(struct page *page)
 {
swp_entry_t entry;
struct address_space *address_space;
+   int i, nr = hpage_nr_pages(page);
 
VM_BUG_ON_PAGE(!PageLocked(page), page);

[PATCH -mm -v7 0/9] THP swap: Delay splitting THP during swapping out

2017-03-27 Thread Huang, Ying

From: Huang Ying 

Hi, Andrew, could you help me to check whether the overall design is
reasonable?

Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the
swap part of the patchset?  Especially [1/9], [3/9], [4/9], [5/9],
[6/9], [9/9].

Hi, Andrea could you help me to review the THP part of the patchset?
Especially [2/9], [7/9] and [8/9].

Hi, Johannes, Michal and Vladimir, I am not very confident about the
memory cgroup part, especially [2/9].  Could you help me to review it?

And to everyone: any comments are welcome!


Recently, the performance of storage devices has improved so fast that
we cannot saturate the disk bandwidth with a single logical CPU when
doing page swap out, even on a high-end server machine, because the
performance of storage devices has improved faster than that of a single
logical CPU.  It seems that this trend will not change in the near
future.  On the other hand, THP is becoming more and more popular
because of increased memory sizes.  So it becomes necessary to optimize
THP swap performance.

The advantages of the THP swap support include:

- Batch the swap operations for the THP to reduce lock
  acquiring/releasing, including allocating/freeing the swap space,
  adding/deleting to/from the swap cache, and writing/reading the swap
  space, etc.  This will help improve the performance of the THP swap.

- The THP swap space read/write will be 2M sequential IO.  This is
  particularly helpful for swap reads, which are usually 4k random
  IO.  This will improve the performance of the THP swap too.

- It will help reduce memory fragmentation, especially when the THP is
  heavily used by the applications.  The 2M contiguous pages will be
  freed up after the THP is swapped out.

- It will improve the THP utilization on systems with swap
  turned on, because khugepaged is quite slow at collapsing normal
  pages into a THP.  If the THP is split during swap out, it will
  take quite a long time for the normal pages to collapse back into
  a THP after being swapped in.  High THP utilization also helps the
  efficiency of page-based memory management.

There are some concerns regarding THP swap in, mainly because the possibly
enlarged read/write IO size (for swap in/out) may put more overhead on
the storage device.  To deal with that, THP swap in should be
turned on only when necessary.  For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMAs with MADV_HUGEPAGE, etc.

This patchset is based on 03/17 head of mmotm/master.

This patchset is the first step towards THP swap support.  The plan is
to delay splitting the THP step by step, and finally to avoid splitting
the THP during swap out altogether and swap the THP out/in as a whole.

As the first step, in this patchset, splitting the huge page is
delayed from almost the first step of swapping out to after allocating
the swap space for the THP and adding the THP into the swap cache.
This will reduce lock acquiring/releasing for the locks used for
swap cache management.

With the patchset, the swap out throughput improves 14.9% (from about
3.77GB/s to about 4.34GB/s) in the vm-scalability swap-w-seq test case
with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
device used is a RAM-simulated PMEM (persistent memory) device.  To
test sequential swapping out, the test case creates 8 processes,
which sequentially allocate and write to anonymous pages until the
RAM and part of the swap device are used up.

The detailed comparison result is as follows:

            base             base+patchset
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   7043990 ±  0%     +21.2%    8536807 ±  0%  vm-scalability.throughput
    109.94 ±  1%     -16.2%      92.09 ±  0%  vm-scalability.time.elapsed_time
   3957091 ±  0%     +14.9%    4547173 ±  0%  vmstat.swap.so
     31.46 ±  1%     -38.3%      19.42 ±  0%  perf-stat.cache-miss-rate%
      1.04 ±  1%     +22.2%       1.27 ±  0%  perf-stat.ipc
      9.33 ±  2%     -60.7%       3.67 ±  1%  perf-profile.calltrace.cycles-pp.add_to_swap.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node
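
The %change column and the GB/s figures quoted earlier can be
cross-checked with a few lines of arithmetic.  This is only a sketch:
pct_change() and kib_to_gib() are hypothetical helper names, and it
assumes that vmstat.swap.so is reported in KiB/s, which the message does
not state explicitly.

```c
#include <assert.h>

/* Relative change from base to patched, in percent. */
static double pct_change(double base, double patched)
{
    return (patched - base) / base * 100.0;
}

/* Assumption: the input is in KiB/s; the result is in GiB/s. */
static double kib_to_gib(double kib_per_s)
{
    return kib_per_s / (1024.0 * 1024.0);
}
```

Under that unit assumption, 3957091 and 4547173 correspond to roughly
3.77 and 4.34 GiB/s, which matches the throughput numbers in the text.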

Changelog:

v7:

- Rebased on latest -mm tree
- Revise get_swap_pages() THP support per Tim's comments

v6:

- Rebased on latest -mm tree (cluster lock, etc).
- Fix a potential uninitialized variable bug in __swap_entry_free()
- Revise the swap read-ahead changes to avoid a potential race
  condition between swap off and swap out in theory.

v5:

- Per Hillf's comments, fix a locking bug in error path of
  __add_to_swap_cache().  And merge the code to calculate extra_pins
  into can_split_huge_page().

v4:

- Per Johannes' comments, simplified swap cgroup array accessing code.
- Per Kirill and Dave Hansen's comments, used HPAGE_PMD_NR instead of
  HPAGE_SIZE/PAGE_SIZE.
- Per Anshuman's comments, used

[PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out

2017-03-27 Thread Huang, Ying

From: Huang Ying 

In this patch, splitting the huge page is delayed from almost the first
step of swapping out to after allocating the swap space for the
THP (Transparent Huge Page) and adding the THP into the swap cache.
This will reduce lock acquiring/releasing for the locks used for
swap cache management.
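
The ordering described above (allocate swap space, add to the swap
cache, split last) can be sketched as a small user-space state machine.
Everything here is hypothetical: struct toy_thp and toy_swap_out_thp()
are illustration names, each boolean knob merely lets one step succeed
or fail, and the memory cgroup charge is left out.  The return
convention (1 = done, 0 = fall back to splitting first, negative errno =
skip the page) is modelled on the diff in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical knobs and state for one huge page in the sketch. */
struct toy_thp {
    bool splittable;     /* stands in for can_split_huge_page()     */
    bool pmd_mapped;     /* stands in for compound_mapcount() != 0  */
    bool cluster_free;   /* a free swap cluster is available        */
    bool cache_add_ok;   /* adding to the swap cache succeeds       */
    bool split_ok;       /* the final split succeeds                */
    bool has_cluster;    /* state: swap cluster currently held      */
    bool in_swap_cache;  /* state: page is in the swap cache        */
    bool is_split;       /* state: huge page has been split         */
};

static int toy_swap_out_thp(struct toy_thp *p)
{
    if (!p->splittable)
        return -EBUSY;              /* cannot split later: skip page  */
    if (!p->pmd_mapped || !p->cluster_free)
        return 0;                   /* fall back: split first         */

    p->has_cluster = true;          /* step 1: allocate swap space    */
    if (!p->cache_add_ok) {
        p->has_cluster = false;     /* undo step 1                    */
        return 0;
    }
    p->in_swap_cache = true;        /* step 2: add to the swap cache  */
    if (!p->split_ok) {
        /* Toy simplification: undo both earlier steps in place. */
        p->in_swap_cache = false;
        p->has_cluster = false;
        return -EBUSY;
    }
    p->is_split = true;             /* step 3: split last             */
    return 1;
}
```

A caller would treat 0 as "use the old path" rather than as an error,
which keeps the new path strictly optional.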

This is the first step towards THP swap support.  The plan is to delay
splitting the THP step by step and finally to avoid splitting it at all.

The advantages of the THP swap support include:

- Batch the swap operations for the THP to reduce lock
  acquiring/releasing, including allocating/freeing the swap space,
  adding/deleting to/from the swap cache, and writing/reading the swap
  space, etc.  This will help to improve the THP swap performance.

- The THP swap space read/write will be 2M sequential IO.  This is
  particularly helpful for swap reads, which are usually 4k random
  IO.  This will help to improve the THP swap performance too.

- It will help reduce memory fragmentation, especially when the THP is
  heavily used by the applications.  The 2M contiguous pages will be
  freed up after the THP is swapped out.

- It will improve the THP utilization on systems with swap
  turned on, because khugepaged is quite slow at collapsing normal
  pages into a THP.  If the THP is split during swap out, it will
  take quite a long time for the normal pages to collapse back into
  a THP after being swapped in.  High THP utilization also helps the
  efficiency of page-based memory management.

There are some concerns regarding THP swap in, mainly because the possibly
enlarged read/write IO size (for swap in/out) may put more overhead on
the storage device.  To deal with that, THP swap in should be
turned on only when necessary.  For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMAs with MADV_HUGEPAGE, etc.

With the patchset, the swap out throughput improves 14.9% (from about
3.77GB/s to about 4.34GB/s) in the vm-scalability swap-w-seq test case
with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
device used is a RAM-simulated PMEM (persistent memory) device.  To
test sequential swapping out, the test case creates 8 processes,
which sequentially allocate and write to anonymous pages until the
RAM and part of the swap device are used up.

The detailed comparison result is as follows:

            base             base+patchset
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   7043990 ±  0%     +21.2%    8536807 ±  0%  vm-scalability.throughput
    109.94 ±  1%     -16.2%      92.09 ±  0%  vm-scalability.time.elapsed_time
   3957091 ±  0%     +14.9%    4547173 ±  0%  vmstat.swap.so
     31.46 ±  1%     -38.3%      19.42 ±  0%  perf-stat.cache-miss-rate%
      1.04 ±  1%     +22.2%       1.27 ±  0%  perf-stat.ipc
      9.33 ±  2%     -60.7%       3.67 ±  1%  perf-profile.calltrace.cycles-pp.add_to_swap.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node

Signed-off-by: "Huang, Ying" 
---
 mm/swap_state.c | 60 ++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 504f67d73f67..6d63ff703d39 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page)
ADD_CACHE_INFO(del_total, nr);
 }
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+int add_to_swap_trans_huge(struct page *page, struct list_head *list)
+{
+   swp_entry_t entry;
+   int ret = 0;
+
+   /* cannot split, which may be needed during swap in, skip it */
+   if (!can_split_huge_page(page, NULL))
+   return -EBUSY;
+   /* fallback to split huge page firstly if no PMD map */
+   if (!compound_mapcount(page))
+   return 0;
+   entry = get_huge_swap_page();
+   if (!entry.val)
+   return 0;
+   if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) {
+   __swapcache_free(entry, true);
+   return -EOVERFLOW;
+   }
+   ret = add_to_swap_cache(page, entry,
+   __GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN);
+   /* -ENOMEM radix-tree allocation failure */
+   if (ret) {
+   __swapcache_free(entry, true);
+   return 0;
+   }
+   ret = split_huge_page_to_list(page, list);
+   if (ret) {
+   delete_from_swap_cache(page);
+   return -EBUSY;
+   }
+   return 1;
+}
+#else
+static inline int add_to_swap_trans_huge(struct page *page,
+struct list_head *list)
+{
+   return 0;
+}
+#endif
+
 /**
  *

[PATCH -mm -v7 3/9] mm, THP, swap: Add swap cluster allocate/free functions

2017-03-27 Thread Huang, Ying

From: Huang Ying 

The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSDs.  These functions
don't work for rotating hard disks because the existing swap cluster
management mechanism doesn't work for them.  Hard disk support may
be added if someone really needs it, but it needn't be included in
this patchset.

This will be used for the THP (Transparent Huge Page) swap support,
where one swap cluster will hold the contents of each THP swapped out.
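
As a rough picture of what the helpers factored out below do, here is a
user-space sketch of a cluster free list: allocation takes the first
free cluster, freeing appends to the tail, and a discardable device
defers the free until the discard has finished.  All names (toy_swap,
toy_alloc_cluster(), toy_free_cluster(), toy_discard_done()) are
hypothetical.  Unlike the kernel's alloc_cluster(), which is handed an
index and asserts that it is the list head, the toy version simply
returns the head.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLUSTERS 8
#define NIL (-1)

struct toy_swap {
    int next[NR_CLUSTERS];            /* free list links, by index      */
    int head, tail;                   /* NIL when the list is empty     */
    bool is_free[NR_CLUSTERS];
    bool pending_discard[NR_CLUSTERS];
    bool discardable;                 /* device wants discard first     */
};

/* Append a cluster to the tail of the free list and mark it free. */
static void toy_add_tail(struct toy_swap *s, int idx)
{
    s->next[idx] = NIL;
    s->is_free[idx] = true;
    if (s->tail == NIL)
        s->head = idx;
    else
        s->next[s->tail] = idx;
    s->tail = idx;
}

/* Start with every cluster free, in index order. */
static void toy_init(struct toy_swap *s, bool discardable)
{
    s->head = s->tail = NIL;
    s->discardable = discardable;
    for (int i = 0; i < NR_CLUSTERS; i++) {
        s->pending_discard[i] = false;
        toy_add_tail(s, i);
    }
}

/* Take the first free cluster; returns NIL if none is left. */
static int toy_alloc_cluster(struct toy_swap *s)
{
    int idx = s->head;

    if (idx == NIL)
        return NIL;
    s->head = s->next[idx];
    if (s->head == NIL)
        s->tail = NIL;
    s->is_free[idx] = false;
    return idx;
}

/* Release a cluster: defer if discardable, else free immediately. */
static void toy_free_cluster(struct toy_swap *s, int idx)
{
    assert(!s->is_free[idx]);
    if (s->discardable) {
        s->pending_discard[idx] = true;
        return;
    }
    toy_add_tail(s, idx);
}

/* Called once the (simulated) discard of a cluster has completed. */
static void toy_discard_done(struct toy_swap *s, int idx)
{
    assert(s->pending_discard[idx]);
    s->pending_discard[idx] = false;
    toy_add_tail(s, idx);
}
```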

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: "Huang, Ying" 
---
 mm/swapfile.c | 217 +-
 1 file changed, 156 insertions(+), 61 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1ef4fc82c0fa..54480acbbeef 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -378,6 +378,14 @@ static void swap_cluster_schedule_discard(struct 
swap_info_struct *si,
schedule_work(&si->discard_work);
 }
 
+static void __free_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info;
+
+   cluster_set_flag(ci + idx, CLUSTER_FLAG_FREE);
+   cluster_list_add_tail(&si->free_clusters, ci, idx);
+}
+
 /*
  * Doing discard actually. After a cluster discard is finished, the cluster
  * will be added to free cluster list. caller should hold si->lock.
@@ -398,10 +406,7 @@ static void swap_do_scheduled_discard(struct 
swap_info_struct *si)
 
spin_lock(&si->lock);
ci = lock_cluster(si, idx * SWAPFILE_CLUSTER);
-   cluster_set_flag(ci, CLUSTER_FLAG_FREE);
-   unlock_cluster(ci);
-   cluster_list_add_tail(&si->free_clusters, info, idx);
-   ci = lock_cluster(si, idx * SWAPFILE_CLUSTER);
+   __free_cluster(si, idx);
memset(si->swap_map + idx * SWAPFILE_CLUSTER,
0, SWAPFILE_CLUSTER);
unlock_cluster(ci);
@@ -419,6 +424,34 @@ static void swap_discard_work(struct work_struct *work)
spin_unlock(&si->lock);
 }
 
+static void alloc_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info;
+
+   VM_BUG_ON(cluster_list_first(&si->free_clusters) != idx);
+   cluster_list_del_first(&si->free_clusters, ci);
+   cluster_set_count_flag(ci + idx, 0, 0);
+}
+
+static void free_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info + idx;
+
+   VM_BUG_ON(cluster_count(ci) != 0);
+   /*
+* If the swap is discardable, prepare discard the cluster
+* instead of free it immediately. The cluster will be freed
+* after discard.
+*/
+   if ((si->flags & (SWP_WRITEOK | SWP_PAGE_DISCARD)) ==
+   (SWP_WRITEOK | SWP_PAGE_DISCARD)) {
+   swap_cluster_schedule_discard(si, idx);
+   return;
+   }
+
+   __free_cluster(si, idx);
+}
+
 /*
  * The cluster corresponding to page_nr will be used. The cluster will be
  * removed from free cluster list and its usage counter will be increased.
@@ -430,11 +463,8 @@ static void inc_cluster_info_page(struct swap_info_struct 
*p,
 
if (!cluster_info)
return;
-   if (cluster_is_free(&cluster_info[idx])) {
-   VM_BUG_ON(cluster_list_first(&p->free_clusters) != idx);
-   cluster_list_del_first(&p->free_clusters, cluster_info);
-   cluster_set_count_flag(&cluster_info[idx], 0, 0);
-   }
+   if (cluster_is_free(&cluster_info[idx]))
+   alloc_cluster(p, idx);
 
VM_BUG_ON(cluster_count(&cluster_info[idx]) >= SWAPFILE_CLUSTER);
cluster_set_count(&cluster_info[idx],
@@ -458,21 +488,8 @@ static void dec_cluster_info_page(struct swap_info_struct 
*p,
cluster_set_count(&cluster_info[idx],
cluster_count(&cluster_info[idx]) - 1);
 
-   if (cluster_count(&cluster_info[idx]) == 0) {
-   /*
-* If the swap is discardable, prepare discard the cluster
-* instead of free it immediately. The cluster will be freed
-* after discard.
-*/
-   if ((p->flags & (SWP_WRITEOK | SWP_PAGE_DISCARD)) ==
-(SWP_WRITEOK | SWP_PAGE_DISCARD)) {
-   swap_cluster_schedule_discard(p, idx);
-   return;
-   }
-
-   cluster_set_flag(&cluster_info[idx], CLUSTER_FLAG_FREE);
-   cluster_list_add_tail(&p->free_clusters, cluster_info, idx);
-   }
+   if (cluster_count(&cluster_info[idx]) == 0)
+   free_cluster(p, idx);
 }
 
 /*
@@ -562,6 +579,71 @@ static bool

[PATCH -mm -v7 5/9] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page

2017-03-27 Thread Huang, Ying

From: Huang Ying 

__swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag
for a huge page.  For now, this frees the specified swap cluster,
because the function is called only in the error path to free
a swap cluster that was just allocated.  So the corresponding swap_map[i] ==
SWAP_HAS_CACHE, that is, the swap count is 0.  This makes the
implementation simpler than that for an ordinary swap entry.

This will be used for delaying splitting the THP (Transparent Huge Page)
during swap out, where for each THP to be swapped out, we will allocate a
swap cluster, add the THP into the swap cache, then split the THP.  If
anything fails after allocating the swap cluster and before splitting
the THP successfully, swapcache_free_trans_huge() will be used to
free the allocated swap space.
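
The reason the huge-page case is simpler can be seen in a user-space
sketch: every entry of the cluster is expected to hold exactly
SWAP_HAS_CACHE (no swap count), so the whole range can be reset in one
pass.  toy_free_huge() is a hypothetical name; where the kernel code
uses VM_BUG_ON() for the invariant, the toy version returns false and
leaves the map untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWAP_HAS_CACHE   0x40  /* "in swap cache" flag in a swap_map byte */
#define SWAPFILE_CLUSTER 512   /* cluster size with THP swap on x86_64    */

/*
 * Clear the cache-only marker for one whole cluster starting at offset.
 * Returns false, changing nothing, if any entry is not exactly
 * SWAP_HAS_CACHE (for example because it also carries a swap count).
 */
static bool toy_free_huge(unsigned char *swap_map, size_t offset)
{
    unsigned char *map = swap_map + offset;
    size_t i;

    for (i = 0; i < SWAPFILE_CLUSTER; i++)
        if (map[i] != SWAP_HAS_CACHE)
            return false;
    for (i = 0; i < SWAPFILE_CLUSTER; i++)
        map[i] = 0;
    return true;
}
```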

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: "Huang, Ying" 
---
 include/linux/swap.h |  9 +++--
 mm/swapfile.c| 34 --
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index e3a7609a8989..2f2a6c0363aa 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -394,7 +394,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t);
+extern void __swapcache_free(swp_entry_t entry, bool huge);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -456,7 +456,7 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
-static inline void swapcache_free(swp_entry_t swp)
+static inline void __swapcache_free(swp_entry_t swp, bool huge)
 {
 }
 
@@ -544,6 +544,11 @@ static inline swp_entry_t get_huge_swap_page(void)
 }
 #endif
 
+static inline void swapcache_free(swp_entry_t entry)
+{
+   __swapcache_free(entry, false);
+}
+
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 382e84541e16..055cfc1be057 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -855,6 +855,29 @@ static void swap_free_huge_cluster(struct swap_info_struct 
*si,
_swap_entry_free(si, offset, true);
 }
 
+static void swapcache_free_trans_huge(struct swap_info_struct *si,
+ swp_entry_t entry)
+{
+   unsigned long offset = swp_offset(entry);
+   unsigned long idx = offset / SWAPFILE_CLUSTER;
+   struct swap_cluster_info *ci;
+   unsigned char *map;
+   unsigned int i;
+
+   spin_lock(&si->lock);
+   ci = lock_cluster(si, offset);
+   map = si->swap_map + offset;
+   for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+   VM_BUG_ON(map[i] != SWAP_HAS_CACHE);
+   map[i] = 0;
+   }
+   unlock_cluster(ci);
+   /* Cluster size is same as huge pmd size */
+   mem_cgroup_uncharge_swap(entry, HPAGE_PMD_NR);
+   swap_free_huge_cluster(si, idx);
+   spin_unlock(&si->lock);
+}
+
 static int swap_alloc_huge_cluster(struct swap_info_struct *si,
   swp_entry_t *slot)
 {
@@ -887,6 +910,11 @@ static inline int swap_alloc_huge_cluster(struct 
swap_info_struct *si,
 {
return 0;
 }
+
+static inline void swapcache_free_trans_huge(struct swap_info_struct *si,
+swp_entry_t entry)
+{
+}
 #endif
 
 static unsigned long scan_swap_map(struct swap_info_struct *si,
@@ -1157,13 +1185,15 @@ void swap_free(swp_entry_t entry)
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
-void swapcache_free(swp_entry_t entry)
+void __swapcache_free(swp_entry_t entry, bool huge)
 {
struct swap_info_struct *p;
 
p = _swap_info_get(entry);
if (p) {
-   if (!__swap_entry_free(p, entry, SWAP_HAS_CACHE))
+   if (unlikely(huge))
+   swapcache_free_trans_huge(p, entry);
+   else if (!__swap_entry_free(p, entry, SWAP_HAS_CACHE))
free_swap_slot(entry);
}
 }
-- 
2.11.0

[PATCH -mm -v7 7/9] mm, THP: Add can_split_huge_page()

2017-03-27 Thread Huang, Ying

From: Huang Ying 

Separate the check of whether we can split the huge page out of
split_huge_page_to_list() into its own function.  This makes it possible
to perform the check before actually splitting the THP (Transparent Huge
Page).

This will be used for delaying splitting the THP during swap out, where
for a THP, we will allocate a swap cluster, add the THP into the swap
cache, then split the THP.  To avoid unnecessary operations on an
un-splittable THP, we will perform this check first.
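
The check itself is a simple reference-count comparison, which the
following user-space sketch mirrors.  struct toy_page and
toy_can_split() are hypothetical stand-ins for struct page and
can_split_huge_page(); the fields replace the kernel's PageAnon(),
total_mapcount() and page_count() accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512  /* sub-pages per THP on x86_64 with 4 KiB pages */

/* Hypothetical snapshot of the counters the check looks at. */
struct toy_page {
    bool anon;            /* anonymous page (not in a file's radix tree) */
    int total_mapcount;   /* number of page table mappings               */
    int page_count;       /* total references held on the page           */
};

/*
 * The page is splittable when the only references besides the mappings
 * are the caller's own pin (the "- 1") and, for non-anonymous pages,
 * one pin per sub-page from the radix tree.
 */
static bool toy_can_split(const struct toy_page *p, int *pextra_pins)
{
    int extra_pins = p->anon ? 0 : HPAGE_PMD_NR;

    if (pextra_pins)
        *pextra_pins = extra_pins;
    return p->total_mapcount == p->page_count - extra_pins - 1;
}
```

As in the patch, the result is only a racy hint: the counters can change
right after the comparison.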

There is no functionality change in this patch.

Cc: Andrea Arcangeli 
Cc: Ebru Akagunduz 
Signed-off-by: "Huang, Ying" 
Acked-by: Kirill A. Shutemov 
---
 include/linux/huge_mm.h |  7 +++
 mm/huge_memory.c| 17 ++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a3762d49ba39..d3b3e8fcc717 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -113,6 +113,7 @@ extern unsigned long thp_get_unmapped_area(struct file 
*filp,
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
+bool can_split_huge_page(struct page *page, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
@@ -231,6 +232,12 @@ static inline void prep_transhuge_page(struct page *page) 
{}
 
 #define thp_get_unmapped_area  NULL
 
+static inline bool
+can_split_huge_page(struct page *page, int *pextra_pins)
+{
+   BUILD_BUG();
+   return false;
+}
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d14dd961f626..08ccf0cebe8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2367,6 +2367,19 @@ int page_trans_huge_mapcount(struct page *page, int 
*total_mapcount)
return ret;
 }
 
+/* Racy check whether the huge page can be split */
+bool can_split_huge_page(struct page *page, int *pextra_pins)
+{
+   int extra_pins = 0;
+
+   /* Additional pins from radix tree */
+   if (!PageAnon(page))
+   extra_pins = HPAGE_PMD_NR;
+   if (pextra_pins)
+   *pextra_pins = extra_pins;
+   return total_mapcount(page) == page_count(page) - extra_pins - 1;
+}
+
 /*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
@@ -2426,8 +2439,6 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
goto out;
}
 
-   /* Addidional pins from radix tree */
-   extra_pins = HPAGE_PMD_NR;
anon_vma = NULL;
i_mmap_lock_read(mapping);
}
@@ -2436,7 +2447,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
 * Racy check if we can split the page, before freeze_page() will
 * split PMDs
 */
-   if (total_mapcount(head) != page_count(head) - extra_pins - 1) {
+   if (!can_split_huge_page(head, &extra_pins)) {
ret = -EBUSY;
goto out_unlock;
}
-- 
2.11.0

[PATCH -mm -v7 1/9] mm, swap: Make swap cluster size same of THP size on x86_64

2017-03-27 Thread Huang, Ying

From: Huang Ying 

In this patch, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on the x86_64 architecture (512).  This is for
the THP swap support on x86_64, where one swap cluster will be used to
hold the contents of each THP swapped out.  Some information about the
swapped-out THP (such as the compound map count) will be recorded in the
swap_cluster_info data structure.

For other architectures which want THP swap support,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
the architecture.

In effect, this doubles the swap cluster size on x86_64 (from 256 to
512 pages), which may make it harder to find a free cluster when the swap
space becomes fragmented.  In theory, this may reduce contiguous swap
space allocation and sequential writes.  The performance test in 0day
shows no regressions caused by this.
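
For reference, the 512 figure and the factor of two follow directly
from the page sizes.  The sketch below redefines the constants locally
for x86_64 with 4 KiB base pages and 2 MiB PMD-mapped huge pages;
OLD_SWAPFILE_CLUSTER and cluster_bytes() are names made up for the
illustration.

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* 4 KiB base pages on x86_64      */
#define PMD_SHIFT  21  /* 2 MiB PMD-mapped pages on x86_64 */
#define HPAGE_PMD_NR (1UL << (PMD_SHIFT - PAGE_SHIFT))
#define OLD_SWAPFILE_CLUSTER 256UL  /* cluster size without THP swap */

/* Size in bytes of a swap cluster made of nr_pages base pages. */
static unsigned long cluster_bytes(unsigned long nr_pages)
{
    return nr_pages << PAGE_SHIFT;
}
```

With these values a cluster grows from 1 MiB to 2 MiB, exactly one THP.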

Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Suggested-by: Andrew Morton 
Signed-off-by: "Huang, Ying" 
---
 arch/x86/Kconfig |  1 +
 mm/Kconfig   | 13 +
 mm/swapfile.c|  4 
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index abfc31fb0bee..852d13878793 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -178,6 +178,7 @@ config X86
select USER_STACKTRACE_SUPPORT
select VIRT_TO_BUS
select X86_FEATURE_NAMES if PROC_FS
+   select ARCH_USES_THP_SWAP_CLUSTER   if X86_64
 
 config INSTRUCTION_DECODER
def_bool y
diff --git a/mm/Kconfig b/mm/Kconfig
index 9b8fccb969dc..7b708e200c29 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -499,6 +499,19 @@ config FRONTSWAP
 
  If unsure, say Y to enable frontswap.
 
+config ARCH_USES_THP_SWAP_CLUSTER
+   bool
+   default n
+
+config THP_SWAP_CLUSTER
+   bool
+   depends on SWAP && TRANSPARENT_HUGEPAGE && ARCH_USES_THP_SWAP_CLUSTER
+   default y
+   help
+ Use one swap cluster to hold the contents of the THP
+ (Transparent Huge Page) swapped out.  The size of the swap
+ cluster will be same as that of THP.
+
 config CMA
bool "Contiguous Memory Allocator"
depends on HAVE_MEMBLOCK && MMU
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 53b5881ee0d6..abc401f72a0a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -199,7 +199,11 @@ static void discard_swap_cluster(struct swap_info_struct 
*si,
}
 }
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+#define SWAPFILE_CLUSTER   HPAGE_PMD_NR
+#else
 #define SWAPFILE_CLUSTER   256
+#endif
 #define LATENCY_LIMIT  256
 
 static inline void cluster_set_flag(struct swap_cluster_info *info,
-- 
2.11.0

[PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out

2017-03-27 Thread Huang, Ying

From: Huang Ying 

In this patch, splitting the huge page is delayed from almost the first
step of swapping out to after allocating the swap space for the
THP (Transparent Huge Page) and adding the THP into the swap cache.
This will reduce lock acquiring/releasing for the locks used for
swap cache management.

This is the first step towards THP swap support.  The plan is to delay
splitting the THP step by step and finally to avoid splitting it at all.

The advantages of the THP swap support include:

- Batch the swap operations for the THP to reduce lock
  acquiring/releasing, including allocating/freeing the swap space,
  adding/deleting to/from the swap cache, and writing/reading the swap
  space, etc.  This will help to improve the THP swap performance.

- The THP swap space read/write will be 2M sequential IO.  This is
  particularly helpful for swap reads, which are usually 4k random
  IO.  This will help to improve the THP swap performance too.

- It will help reduce memory fragmentation, especially when the THP is
  heavily used by the applications.  The 2M contiguous pages will be
  freed up after the THP is swapped out.

- It will improve the THP utilization on systems with swap
  turned on, because khugepaged is quite slow at collapsing normal
  pages into a THP.  If the THP is split during swap out, it will
  take quite a long time for the normal pages to collapse back into
  a THP after being swapped in.  High THP utilization also helps the
  efficiency of page-based memory management.

There are some concerns regarding THP swap in, mainly because the
possibly enlarged read/write IO size (for swap in/out) may put more
overhead on the storage device.  To deal with that, THP swap in should
be turned on only when necessary.  For example, it can be selected via
"always/never/madvise" logic: turned on globally, turned off globally,
or turned on only for VMAs with MADV_HUGEPAGE, etc.

With the patchset, the swap out throughput improves by 14.9% (from about
3.77GB/s to about 4.34GB/s) in the vm-scalability swap-w-seq test case
with 8 processes.  The test was done on a Xeon E5 v3 system.  The swap
device used is a RAM-simulated PMEM (persistent memory) device.  To
test sequential swap out, the test case creates 8 processes, which
sequentially allocate and write to anonymous pages until the RAM and
part of the swap device are used up.

The detailed comparison results are as follows:

           base              base+patchset
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   7043990 ±  0%     +21.2%    8536807 ±  0%  vm-scalability.throughput
    109.94 ±  1%     -16.2%      92.09 ±  0%  vm-scalability.time.elapsed_time
   3957091 ±  0%     +14.9%    4547173 ±  0%  vmstat.swap.so
     31.46 ±  1%     -38.3%      19.42 ±  0%  perf-stat.cache-miss-rate%
      1.04 ±  1%     +22.2%       1.27 ±  0%  perf-stat.ipc
      9.33 ±  2%     -60.7%       3.67 ±  1%  perf-profile.calltrace.cycles-pp.add_to_swap.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node
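
As a quick sanity check of the figures above, the %change column can be
recomputed from the base and base+patchset values.  The sketch below is
only illustrative: percent_change() and kib_to_gib() are made-up helper
names, and it assumes that vmstat.swap.so is reported in KiB/s, which is
what makes 3957091 and 4547173 line up with the roughly 3.77GB/s and
4.34GB/s quoted in the text.

```c
#include <assert.h>

/* Relative change from the base value to the patched value, in percent. */
double percent_change(double base, double patched)
{
	assert(base != 0.0);	/* a zero baseline has no relative change */
	return (patched - base) / base * 100.0;
}

/* Convert KiB/s (assumed unit of vmstat.swap.so) to GiB/s. */
double kib_to_gib(double kib)
{
	return kib / (1024.0 * 1024.0);
}
```

percent_change(3957091, 4547173) comes out at roughly 14.9, which matches
the vmstat.swap.so row.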

Signed-off-by: "Huang, Ying" 
---
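
Note on the return values of add_to_swap_trans_huge() in the diff
below: it returns 1 when the THP was added to the swap cache and split,
0 when the caller should fall back to splitting the THP first, and a
negative errno (-EBUSY, -EOVERFLOW) otherwise.  The caller is not shown
in this excerpt, so the mapping below is only a sketch of how such a
three-way result might be consumed; the enum and the function name are
hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, used only to illustrate the return convention. */
enum thp_swap_action {
	THP_SWAP_DONE,		/* ret == 1: in swap cache, already split */
	THP_SWAP_SPLIT_FIRST,	/* ret == 0: split the THP, then swap pages */
	THP_SWAP_GIVE_UP,	/* ret <  0: do not swap this THP out now */
};

enum thp_swap_action classify_thp_swap_result(int ret)
{
	assert(ret <= 1);	/* the function never returns more than 1 */
	if (ret > 0)
		return THP_SWAP_DONE;
	if (ret == 0)
		return THP_SWAP_SPLIT_FIRST;
	return THP_SWAP_GIVE_UP;
}
```
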
 mm/swap_state.c | 60 ++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 504f67d73f67..6d63ff703d39 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page)
ADD_CACHE_INFO(del_total, nr);
 }
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+int add_to_swap_trans_huge(struct page *page, struct list_head *list)
+{
+   swp_entry_t entry;
+   int ret = 0;
+
+   /* cannot split, which may be needed during swap in, skip it */
+   if (!can_split_huge_page(page, NULL))
+   return -EBUSY;
+   /* fallback to split huge page firstly if no PMD map */
+   if (!compound_mapcount(page))
+   return 0;
+   entry = get_huge_swap_page();
+   if (!entry.val)
+   return 0;
+   if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) {
+   __swapcache_free(entry, true);
+   return -EOVERFLOW;
+   }
+   ret = add_to_swap_cache(page, entry,
+   __GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN);
+   /* -ENOMEM radix-tree allocation failure */
+   if (ret) {
+   __swapcache_free(entry, true);
+   return 0;
+   }
+   ret = split_huge_page_to_list(page, list);
+   if (ret) {
+   delete_from_swap_cache(page);
+   return -EBUSY;
+   }
+   return 1;
+}
+#else
+static inline int add_to_swap_trans_huge(struct page *page,
+struct list_head *list)
+{
+   return 0;
+}
+#endif
+
 /**
  * add_to_swap - allocate swap space for a page
  *

[PATCH -mm -v7 3/9] mm, THP, swap: Add swap cluster allocate/free functions

2017-03-27 Thread Huang, Ying

From: Huang Ying 

The swap cluster allocation/free functions are added based on the
existing swap cluster management mechanism for SSD.  These functions
don't work for rotating hard disks because the existing swap cluster
management mechanism doesn't work for them.  Hard disk support may be
added if someone really needs it, but that needn't be included in
this patchset.

These functions will be used for the THP (Transparent Huge Page) swap
support, where one swap cluster will hold the contents of each THP
swapped out.

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: "Huang, Ying" 
---
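
For readers who do not have the cluster list code at hand, the
alloc/free pairing in the diff below can be pictured with a small
userspace model: allocation takes the cluster at the head of the free
list, and freeing appends an empty cluster at the tail.  This is only a
sketch under simplifying assumptions (fixed pool size, no locking, no
discard handling); struct cluster_pool and the pool_*() names are
invented for the illustration and do not exist in mm/swapfile.c.

```c
#include <assert.h>

#define NR_CLUSTERS	8u
#define CLUSTER_NIL	(~0u)

struct cluster {
	unsigned int next;	/* next free cluster, or CLUSTER_NIL */
	unsigned int count;	/* swap entries in use in this cluster */
	int is_free;
};

struct cluster_pool {
	struct cluster c[NR_CLUSTERS];
	unsigned int head;	/* first free cluster */
	unsigned int tail;	/* last free cluster */
};

/* Put every cluster on the free list, in index order. */
void pool_init(struct cluster_pool *p)
{
	unsigned int i;

	for (i = 0; i < NR_CLUSTERS; i++) {
		p->c[i].next = (i + 1 < NR_CLUSTERS) ? i + 1 : CLUSTER_NIL;
		p->c[i].count = 0;
		p->c[i].is_free = 1;
	}
	p->head = 0;
	p->tail = NR_CLUSTERS - 1;
}

/* Take the first free cluster; returns CLUSTER_NIL if none is left. */
unsigned int pool_alloc(struct cluster_pool *p)
{
	unsigned int idx = p->head;

	if (idx == CLUSTER_NIL)
		return CLUSTER_NIL;
	p->head = p->c[idx].next;
	if (p->head == CLUSTER_NIL)
		p->tail = CLUSTER_NIL;
	p->c[idx].next = CLUSTER_NIL;
	p->c[idx].is_free = 0;
	return idx;
}

/* Return an empty cluster to the tail of the free list. */
void pool_free(struct cluster_pool *p, unsigned int idx)
{
	assert(idx < NR_CLUSTERS);
	assert(!p->c[idx].is_free && p->c[idx].count == 0);

	p->c[idx].is_free = 1;
	p->c[idx].next = CLUSTER_NIL;
	if (p->tail == CLUSTER_NIL)
		p->head = idx;
	else
		p->c[p->tail].next = idx;
	p->tail = idx;
}
```

Appending at the tail means a just-freed cluster is reused last, after
all clusters that were already free.
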
 mm/swapfile.c | 217 +-
 1 file changed, 156 insertions(+), 61 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1ef4fc82c0fa..54480acbbeef 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -378,6 +378,14 @@ static void swap_cluster_schedule_discard(struct 
swap_info_struct *si,
schedule_work(&si->discard_work);
 }
 
+static void __free_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info;
+
+   cluster_set_flag(ci + idx, CLUSTER_FLAG_FREE);
+   cluster_list_add_tail(&si->free_clusters, ci, idx);
+}
+
 /*
  * Doing discard actually. After a cluster discard is finished, the cluster
  * will be added to free cluster list. caller should hold si->lock.
@@ -398,10 +406,7 @@ static void swap_do_scheduled_discard(struct 
swap_info_struct *si)
 
spin_lock(&si->lock);
ci = lock_cluster(si, idx * SWAPFILE_CLUSTER);
-   cluster_set_flag(ci, CLUSTER_FLAG_FREE);
-   unlock_cluster(ci);
-   cluster_list_add_tail(&si->free_clusters, info, idx);
-   ci = lock_cluster(si, idx * SWAPFILE_CLUSTER);
+   __free_cluster(si, idx);
memset(si->swap_map + idx * SWAPFILE_CLUSTER,
0, SWAPFILE_CLUSTER);
unlock_cluster(ci);
@@ -419,6 +424,34 @@ static void swap_discard_work(struct work_struct *work)
spin_unlock(&si->lock);
 }
 
+static void alloc_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info;
+
+   VM_BUG_ON(cluster_list_first(&si->free_clusters) != idx);
+   cluster_list_del_first(&si->free_clusters, ci);
+   cluster_set_count_flag(ci + idx, 0, 0);
+}
+
+static void free_cluster(struct swap_info_struct *si, unsigned long idx)
+{
+   struct swap_cluster_info *ci = si->cluster_info + idx;
+
+   VM_BUG_ON(cluster_count(ci) != 0);
+   /*
+* If the swap is discardable, prepare discard the cluster
+* instead of free it immediately. The cluster will be freed
+* after discard.
+*/
+   if ((si->flags & (SWP_WRITEOK | SWP_PAGE_DISCARD)) ==
+   (SWP_WRITEOK | SWP_PAGE_DISCARD)) {
+   swap_cluster_schedule_discard(si, idx);
+   return;
+   }
+
+   __free_cluster(si, idx);
+}
+
 /*
  * The cluster corresponding to page_nr will be used. The cluster will be
  * removed from free cluster list and its usage counter will be increased.
@@ -430,11 +463,8 @@ static void inc_cluster_info_page(struct swap_info_struct 
*p,
 
if (!cluster_info)
return;
-   if (cluster_is_free(&cluster_info[idx])) {
-   VM_BUG_ON(cluster_list_first(&p->free_clusters) != idx);
-   cluster_list_del_first(&p->free_clusters, cluster_info);
-   cluster_set_count_flag(&cluster_info[idx], 0, 0);
-   }
+   if (cluster_is_free(&cluster_info[idx]))
+   alloc_cluster(p, idx);
 
VM_BUG_ON(cluster_count(&cluster_info[idx]) >= SWAPFILE_CLUSTER);
cluster_set_count(&cluster_info[idx],
@@ -458,21 +488,8 @@ static void dec_cluster_info_page(struct swap_info_struct 
*p,
cluster_set_count(&cluster_info[idx],
cluster_count(&cluster_info[idx]) - 1);
 
-   if (cluster_count(&cluster_info[idx]) == 0) {
-   /*
-* If the swap is discardable, prepare discard the cluster
-* instead of free it immediately. The cluster will be freed
-* after discard.
-*/
-   if ((p->flags & (SWP_WRITEOK | SWP_PAGE_DISCARD)) ==
-(SWP_WRITEOK | SWP_PAGE_DISCARD)) {
-   swap_cluster_schedule_discard(p, idx);
-   return;
-   }
-
-   cluster_set_flag(&cluster_info[idx], CLUSTER_FLAG_FREE);
-   cluster_list_add_tail(&p->free_clusters, cluster_info, idx);
-   }
+   if (cluster_count(&cluster_info[idx]) == 0)
+   free_cluster(p, idx);
 }
 
 /*
@@ -562,6 +579,71 @@ static bool scan_swap_map_try_ssd_cluster(struct 
swap_info_struct *si,
return found_free;
 }
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+static inline unsigned int huge_cluster_nr_entries(bool huge)
+{
+   return

[PATCH -mm -v7 5/9] mm, THP, swap: Support to clear SWAP_HAS_CACHE for huge page

2017-03-27 Thread Huang, Ying

From: Huang Ying 

__swapcache_free() is added to support clearing the SWAP_HAS_CACHE flag
for a huge page.  For now, this frees the specified swap cluster,
because the function is only called in the error path to free a swap
cluster that was just allocated.  So the corresponding swap_map[i] ==
SWAP_HAS_CACHE, that is, the swap count is 0.  This makes the
implementation simpler than that of the ordinary swap entry.

This will be used for delaying splitting the THP (Transparent Huge Page)
during swap out: for one THP to swap out, we will allocate a swap
cluster, add the THP into the swap cache, then split the THP.  If
anything fails after allocating the swap cluster and before splitting
the THP successfully, swapcache_free_trans_huge() will be used to
free the swap space allocated.

Cc: Andrea Arcangeli 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: "Huang, Ying" 
---
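
The core of the huge-page error path is a walk over every swap_map slot
of one cluster, checking that it holds exactly SWAP_HAS_CACHE (swap
count 0) and clearing it.  A minimal userspace sketch of just that loop
follows; it leaves out the locking, the memcg uncharge and the return
of the cluster to the free list.  release_huge_cluster() is a made-up
name, and CLUSTER_ENTRIES = 512 assumes x86_64 with 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_HAS_CACHE	0x40	/* flag value from include/linux/swap.h */
#define CLUSTER_ENTRIES	512	/* assumed: HPAGE_PMD_NR on x86_64 */

/*
 * Clear SWAP_HAS_CACHE for all slots of the cluster starting at
 * @offset.  Returns the number of slots released.
 */
size_t release_huge_cluster(unsigned char *swap_map, size_t offset)
{
	unsigned char *map = swap_map + offset;
	size_t i;

	assert(offset % CLUSTER_ENTRIES == 0);	/* cluster aligned */
	for (i = 0; i < CLUSTER_ENTRIES; i++) {
		/* Freshly allocated: cached, but no swap count yet. */
		assert(map[i] == SWAP_HAS_CACHE);
		map[i] = 0;
	}
	return i;
}
```
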
 include/linux/swap.h |  9 +++--
 mm/swapfile.c| 34 --
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index e3a7609a8989..2f2a6c0363aa 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -394,7 +394,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t);
+extern void __swapcache_free(swp_entry_t entry, bool huge);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -456,7 +456,7 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
-static inline void swapcache_free(swp_entry_t swp)
+static inline void __swapcache_free(swp_entry_t swp, bool huge)
 {
 }
 
@@ -544,6 +544,11 @@ static inline swp_entry_t get_huge_swap_page(void)
 }
 #endif
 
+static inline void swapcache_free(swp_entry_t entry)
+{
+   __swapcache_free(entry, false);
+}
+
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 382e84541e16..055cfc1be057 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -855,6 +855,29 @@ static void swap_free_huge_cluster(struct swap_info_struct 
*si,
_swap_entry_free(si, offset, true);
 }
 
+static void swapcache_free_trans_huge(struct swap_info_struct *si,
+ swp_entry_t entry)
+{
+   unsigned long offset = swp_offset(entry);
+   unsigned long idx = offset / SWAPFILE_CLUSTER;
+   struct swap_cluster_info *ci;
+   unsigned char *map;
+   unsigned int i;
+
+   spin_lock(&si->lock);
+   ci = lock_cluster(si, offset);
+   map = si->swap_map + offset;
+   for (i = 0; i < SWAPFILE_CLUSTER; i++) {
+   VM_BUG_ON(map[i] != SWAP_HAS_CACHE);
+   map[i] = 0;
+   }
+   unlock_cluster(ci);
+   /* Cluster size is same as huge pmd size */
+   mem_cgroup_uncharge_swap(entry, HPAGE_PMD_NR);
+   swap_free_huge_cluster(si, idx);
+   spin_unlock(&si->lock);
+}
+
 static int swap_alloc_huge_cluster(struct swap_info_struct *si,
   swp_entry_t *slot)
 {
@@ -887,6 +910,11 @@ static inline int swap_alloc_huge_cluster(struct 
swap_info_struct *si,
 {
return 0;
 }
+
+static inline void swapcache_free_trans_huge(struct swap_info_struct *si,
+swp_entry_t entry)
+{
+}
 #endif
 
 static unsigned long scan_swap_map(struct swap_info_struct *si,
@@ -1157,13 +1185,15 @@ void swap_free(swp_entry_t entry)
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
-void swapcache_free(swp_entry_t entry)
+void __swapcache_free(swp_entry_t entry, bool huge)
 {
struct swap_info_struct *p;
 
p = _swap_info_get(entry);
if (p) {
-   if (!__swap_entry_free(p, entry, SWAP_HAS_CACHE))
+   if (unlikely(huge))
+   swapcache_free_trans_huge(p, entry);
+   else if (!__swap_entry_free(p, entry, SWAP_HAS_CACHE))
free_swap_slot(entry);
}
 }
-- 
2.11.0

[PATCH -mm -v7 7/9] mm, THP: Add can_split_huge_page()

2017-03-27 Thread Huang, Ying

From: Huang Ying 

Separate the check of whether the huge page can be split from
split_huge_page_to_list() into its own function.  This makes it possible
to perform the check before actually splitting the THP (Transparent Huge
Page).

This will be used for delaying splitting the THP during swap out: for a
THP, we will allocate a swap cluster, add the THP into the swap cache,
then split the THP.  To avoid unnecessary operations on an un-splittable
THP, we will check that first.

There is no functional change in this patch.

Cc: Andrea Arcangeli 
Cc: Ebru Akagunduz 
Signed-off-by: "Huang, Ying" 
Acked-by: Kirill A. Shutemov 
---
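
The check itself is a comparison of reference counts: the page can be
split only if every reference is accounted for by a mapping, by the
extra pins (HPAGE_PMD_NR for non-anonymous pages, 0 otherwise) and by
the one reference held by the caller.  The toy model below restates
that arithmetic outside the kernel.  struct toy_page and
toy_can_split() are invented for the illustration, and HPAGE_PMD_NR is
hard-coded to 512 as an assumption (x86_64, 4 KiB pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR	512	/* assumed value for x86_64 */

/* Toy stand-in for the fields the real check reads from struct page. */
struct toy_page {
	bool anon;		/* PageAnon() */
	int total_mapcount;	/* total_mapcount() */
	int page_count;		/* page_count() */
};

bool toy_can_split(const struct toy_page *page, int *pextra_pins)
{
	/* Non-anonymous pages carry one extra pin per subpage. */
	int extra_pins = page->anon ? 0 : HPAGE_PMD_NR;

	assert(page->page_count > 0);	/* the caller holds a reference */
	if (pextra_pins)
		*pextra_pins = extra_pins;
	return page->total_mapcount == page->page_count - extra_pins - 1;
}
```

An anonymous THP with one mapping and two references can be split; a
third reference, for example from a concurrent pin, makes the check
fail.
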
 include/linux/huge_mm.h |  7 +++
 mm/huge_memory.c| 17 ++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a3762d49ba39..d3b3e8fcc717 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -113,6 +113,7 @@ extern unsigned long thp_get_unmapped_area(struct file 
*filp,
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
+bool can_split_huge_page(struct page *page, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
@@ -231,6 +232,12 @@ static inline void prep_transhuge_page(struct page *page) 
{}
 
 #define thp_get_unmapped_area  NULL
 
+static inline bool
+can_split_huge_page(struct page *page, int *pextra_pins)
+{
+   BUILD_BUG();
+   return false;
+}
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d14dd961f626..08ccf0cebe8f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2367,6 +2367,19 @@ int page_trans_huge_mapcount(struct page *page, int 
*total_mapcount)
return ret;
 }
 
+/* Racy check whether the huge page can be split */
+bool can_split_huge_page(struct page *page, int *pextra_pins)
+{
+   int extra_pins = 0;
+
+   /* Additional pins from radix tree */
+   if (!PageAnon(page))
+   extra_pins = HPAGE_PMD_NR;
+   if (pextra_pins)
+   *pextra_pins = extra_pins;
+   return total_mapcount(page) == page_count(page) - extra_pins - 1;
+}
+
 /*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
@@ -2426,8 +2439,6 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
goto out;
}
 
-   /* Addidional pins from radix tree */
-   extra_pins = HPAGE_PMD_NR;
anon_vma = NULL;
i_mmap_lock_read(mapping);
}
@@ -2436,7 +2447,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
 * Racy check if we can split the page, before freeze_page() will
 * split PMDs
 */
-   if (total_mapcount(head) != page_count(head) - extra_pins - 1) {
+   if (!can_split_huge_page(head, &extra_pins)) {
ret = -EBUSY;
goto out_unlock;
}
-- 
2.11.0

[PATCH -mm -v7 1/9] mm, swap: Make swap cluster size same of THP size on x86_64

2017-03-27 Thread Huang, Ying

From: Huang Ying 

In this patch, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on the x86_64 architecture (512).  This is
for THP swap support on x86_64, where one swap cluster will be used to
hold the contents of each THP swapped out.  Some information about the
swapped out THP (such as the compound map count) will be recorded in the
swap_cluster_info data structure.

For other architectures which want THP swap support,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
the architecture.

In effect, this doubles the swap cluster size on x86_64 (from 256 to
512), which may make it harder to find a free cluster when the swap
space becomes fragmented.  In theory, this may reduce contiguous swap
space allocation and sequential writes.  The performance tests in 0day
show no regressions caused by this.

Cc: Hugh Dickins 
Cc: Shaohua Li 
Cc: Minchan Kim 
Cc: Rik van Riel 
Suggested-by: Andrew Morton 
Signed-off-by: "Huang, Ying" 
---
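
For reference, the 512 above is the number of base pages in one
PMD-sized huge page.  The arithmetic can be spelled out as below; the
shift values are assumptions that hold for x86_64 with 4 KiB pages and
2 MiB huge pages, and pages_to_bytes() is a helper invented for this
note.

```c
#include <assert.h>

#define PAGE_SHIFT		12	/* 4 KiB base pages (assumed) */
#define PMD_SHIFT		21	/* 2 MiB PMD mappings (assumed) */
#define HPAGE_PMD_NR		(1UL << (PMD_SHIFT - PAGE_SHIFT))
#define OLD_SWAPFILE_CLUSTER	256UL	/* value before this patch */

/* Size in bytes of a swap cluster holding @nr_pages page-sized slots. */
unsigned long pages_to_bytes(unsigned long nr_pages)
{
	assert(nr_pages <= (~0UL >> PAGE_SHIFT));	/* no overflow */
	return nr_pages << PAGE_SHIFT;
}
```

So a cluster grows from 256 slots (1 MiB) to 512 slots (2 MiB), which is
exactly the space needed for one THP.
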
 arch/x86/Kconfig |  1 +
 mm/Kconfig   | 13 +
 mm/swapfile.c|  4 
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index abfc31fb0bee..852d13878793 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -178,6 +178,7 @@ config X86
select USER_STACKTRACE_SUPPORT
select VIRT_TO_BUS
select X86_FEATURE_NAMESif PROC_FS
+   select ARCH_USES_THP_SWAP_CLUSTER   if X86_64
 
 config INSTRUCTION_DECODER
def_bool y
diff --git a/mm/Kconfig b/mm/Kconfig
index 9b8fccb969dc..7b708e200c29 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -499,6 +499,19 @@ config FRONTSWAP
 
  If unsure, say Y to enable frontswap.
 
+config ARCH_USES_THP_SWAP_CLUSTER
+   bool
+   default n
+
+config THP_SWAP_CLUSTER
+   bool
+   depends on SWAP && TRANSPARENT_HUGEPAGE && ARCH_USES_THP_SWAP_CLUSTER
+   default y
+   help
+ Use one swap cluster to hold the contents of the THP
+ (Transparent Huge Page) swapped out.  The size of the swap
+ cluster will be same as that of THP.
+
 config CMA
bool "Contiguous Memory Allocator"
depends on HAVE_MEMBLOCK && MMU
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 53b5881ee0d6..abc401f72a0a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -199,7 +199,11 @@ static void discard_swap_cluster(struct swap_info_struct 
*si,
}
 }
 
+#ifdef CONFIG_THP_SWAP_CLUSTER
+#define SWAPFILE_CLUSTER   HPAGE_PMD_NR
+#else
 #define SWAPFILE_CLUSTER   256
+#endif
 #define LATENCY_LIMIT  256
 
 static inline void cluster_set_flag(struct swap_cluster_info *info,
-- 
2.11.0

[PATCH v2 2/3] soc: qcom: smd: Remove standalone driver

2017-03-27 Thread Bjorn Andersson

Remove the standalone SMD implementation as we have transitioned the
client drivers to use the RPMSG based one.

Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
keep them selectable in the absence of the removed symbol.

Acked-by: Andy Gross 
Signed-off-by: Bjorn Andersson 
---
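
A note on the "depends on X || (COMPILE_TEST && X=n)" idiom that remains
after this change.  As I understand the Kconfig language, tristate
values are ordered n < m < y, "||" takes the maximum, "&&" the minimum,
and "X=n" evaluates to y or n.  The sketch below evaluates the new
dependency expression under those rules; the enum and helper names are
made up for the illustration and are not part of any kernel tool.

```c
#include <assert.h>

enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;		/* "||" is max() */
}

static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;		/* "&&" is min() */
}

static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;	/* "A=B" yields y or n */
}

/* RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n) */
enum tristate pil_dependency(enum tristate rpmsg_qcom_smd,
			     enum tristate compile_test)
{
	assert(compile_test != TRI_M);	/* COMPILE_TEST is a bool */
	return tri_or(rpmsg_qcom_smd,
		      tri_and(compile_test,
			      tri_eq(rpmsg_qcom_smd, TRI_N)));
}
```

The result is the upper limit for the dependent symbol: with
RPMSG_QCOM_SMD=m the driver can be at most a module, and with
RPMSG_QCOM_SMD=n it is only selectable under COMPILE_TEST.
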
 drivers/remoteproc/Kconfig |6 +-
 drivers/rpmsg/Kconfig  |1 -
 drivers/soc/qcom/Kconfig   |8 -
 drivers/soc/qcom/Makefile  |1 -
 drivers/soc/qcom/smd.c | 1560 
 include/linux/rpmsg/qcom_smd.h |2 +-
 include/linux/soc/qcom/smd.h   |  139 
 7 files changed, 4 insertions(+), 1713 deletions(-)
 delete mode 100644 drivers/soc/qcom/smd.c
 delete mode 100644 include/linux/soc/qcom/smd.h

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 1dc43fc5f65f..faad69a1a597 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -76,7 +76,7 @@ config QCOM_ADSP_PIL
depends on OF && ARCH_QCOM
depends on REMOTEPROC
depends on QCOM_SMEM
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && 
RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
select MFD_SYSCON
select QCOM_MDT_LOADER
select QCOM_RPROC_COMMON
@@ -93,7 +93,7 @@ config QCOM_Q6V5_PIL
depends on OF && ARCH_QCOM
depends on QCOM_SMEM
depends on REMOTEPROC
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && 
RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
select MFD_SYSCON
select QCOM_RPROC_COMMON
select QCOM_SCM
@@ -104,7 +104,7 @@ config QCOM_Q6V5_PIL
 config QCOM_WCNSS_PIL
tristate "Qualcomm WCNSS Peripheral Image Loader"
depends on OF && ARCH_QCOM
-   depends on RPMSG_QCOM_SMD || QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n && 
RPMSG_QCOM_SMD=n)
+   depends on RPMSG_QCOM_SMD || (COMPILE_TEST && RPMSG_QCOM_SMD=n)
depends on QCOM_SMEM
depends on REMOTEPROC
select QCOM_MDT_LOADER
diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
index f12ac0b28263..edc008f55663 100644
--- a/drivers/rpmsg/Kconfig
+++ b/drivers/rpmsg/Kconfig
@@ -16,7 +16,6 @@ config RPMSG_CHAR
 config RPMSG_QCOM_SMD
tristate "Qualcomm Shared Memory Driver (SMD)"
depends on QCOM_SMEM
-   depends on QCOM_SMD=n
select RPMSG
help
  Say y here to enable support for the Qualcomm Shared Memory Driver
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 4e090c697eb6..9fca977ef18d 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -33,14 +33,6 @@ config QCOM_SMEM
  The driver provides an interface to items in a heap shared among all
  processors in a Qualcomm platform.
 
-config QCOM_SMD
-   tristate "Qualcomm Shared Memory Driver (SMD)"
-   depends on QCOM_SMEM
-   help
- Say y here to enable support for the Qualcomm Shared Memory Driver
- providing communication channels to remote processors in Qualcomm
- platforms.
-
 config QCOM_SMD_RPM
tristate "Qualcomm Resource Power Manager (RPM) over SMD"
depends on ARCH_QCOM
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 1f30260b06b8..414f0de274fa 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -1,7 +1,6 @@
 obj-$(CONFIG_QCOM_GSBI)+=  qcom_gsbi.o
 obj-$(CONFIG_QCOM_MDT_LOADER)  += mdt_loader.o
 obj-$(CONFIG_QCOM_PM)  +=  spm.o
-obj-$(CONFIG_QCOM_SMD) +=  smd.o
 obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o
 obj-$(CONFIG_QCOM_SMEM) += smem.o
 obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o
diff --git a/drivers/soc/qcom/smd.c b/drivers/soc/qcom/smd.c
deleted file mode 100644
index 322034ab9d37..
--- a/drivers/soc/qcom/smd.c
+++ /dev/null
@@ -1,1560 +0,0 @@
-/*
- * Copyright (c) 2015, Sony Mobile Communications AB.
- * Copyright (c) 2012-2013, The Linux Foundation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-/*
- * The Qualcomm Shared Memory communication solution provides point-to-point
- * channels for clients to send and receive streaming or packet based data.
- *
- * Each channel consists of a

[PATCH v2 3/3] soc: qcom: smd-rpm: Add msm8996 compatibility

2017-03-27 Thread Bjorn Andersson

With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM
driver on top of GLINK for 8996, without any modifications.

Acked-by: Andy Gross 
Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/smd-rpm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/qcom/smd-rpm.c b/drivers/soc/qcom/smd-rpm.c
index 0dcf1bf33126..c2346752b3ea 100644
--- a/drivers/soc/qcom/smd-rpm.c
+++ b/drivers/soc/qcom/smd-rpm.c
@@ -225,6 +225,7 @@ static const struct of_device_id qcom_smd_rpm_of_match[] = {
{ .compatible = "qcom,rpm-apq8084" },
{ .compatible = "qcom,rpm-msm8916" },
{ .compatible = "qcom,rpm-msm8974" },
+   { .compatible = "qcom,rpm-msm8996" },
{}
 };
 MODULE_DEVICE_TABLE(of, qcom_smd_rpm_of_match);
-- 
2.12.0

Re: [RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask

2017-03-27 Thread Oza Oza

On Mon, Mar 27, 2017 at 8:16 PM, Rob Herring  wrote:
> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  wrote:
>> It is possible that a PCI device supports 64-bit DMA addressing,
>> and thus its driver sets the device's dma_mask to DMA_BIT_MASK(64),
>> however PCI host bridge may have limitations on the inbound
>> transaction addressing. As an example, consider NVME SSD device
>> connected to iproc-PCIe controller.
>>
>> Currently, the IOMMU DMA ops only considers PCI device dma_mask
>> when allocating an IOVA. This is particularly problematic on
>> ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to
>> PA for in-bound transactions only after PCI Host has forwarded
>> these transactions on SOC IO bus. This means on such ARM/ARM64
>> SOCs the IOVA of in-bound transactions has to honor the addressing
>> restrictions of the PCI Host.
>>
>> current pcie framework and of framework integration assumes dma-ranges
>> in a way where memory-mapped devices define their dma-ranges.
>> dma-ranges: (child-bus-address, parent-bus-address, length).
>>
>> but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges.
>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>
> If you implement a common function, then I expect to see other users
> converted to use it. There's also PCI hosts in arch/powerpc that parse
> dma-ranges.

The common function should be similar to what
of_pci_get_host_bridge_resources() does right now; that function
currently parses the ranges property.

The new function would look like the following:

of_pci_get_dma_ranges(struct device_node *dev, struct list_head *resources)
where resources would return the dma-ranges.

But right now, if you look at the patch, of_dma_configure() calls the new
function, which actually returns the largest possible size.
So this new function has to be generic in a way that other PCI hosts
can use it. iproc (Broadcom SoC) and R-Car based SoCs can certainly
use it.

Having powerpc use it is a separate exercise, though, since I do
not have access to other PCI hosts such as powerpc. But we can
work it out with them on this forum if required.

So overall, of_pci_get_dma_ranges() has to serve the following two purposes:

1) It has to return the largest possible size to of_dma_configure() to
generate the largest possible dma_mask.

2) It also has to return the parsed resources (dma-ranges) to the users.

So, to address the above needs:

of_pci_get_dma_ranges(struct device_node *dev, struct list_head
*resources, u64 *size)

dev -> device node.
resources -> dma-ranges in an allocated list.
size -> highest possible size, used to generate the dma_mask for
of_dma_configure().
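
To make the "largest possible size -> dma_mask" step concrete, here is a
rough userspace sketch.  It is not the proposed kernel code: struct
toy_dma_range and toy_dma_mask() are names invented for this mail, the
ranges are plain (bus address, size) pairs rather than parsed device
tree cells, and the mask is taken as the smallest all-ones value that
covers the highest inbound address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one parsed dma-ranges entry. */
struct toy_dma_range {
	uint64_t bus_addr;
	uint64_t size;
};

/*
 * Return the smallest (2^n - 1) mask that covers the highest address
 * reachable through any of the @n ranges.  Returns 0 for no ranges.
 */
uint64_t toy_dma_mask(const struct toy_dma_range *r, size_t n)
{
	uint64_t highest = 0;
	unsigned int bits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t last;

		assert(r[i].size > 0);
		/* The range must not wrap around the 64-bit space. */
		assert(r[i].bus_addr <= UINT64_MAX - (r[i].size - 1));
		last = r[i].bus_addr + r[i].size - 1;
		if (last > highest)
			highest = last;
	}

	/* Count the bits needed to represent the highest address. */
	while (bits < 64 && (highest >> bits) != 0)
		bits++;

	return bits == 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}
```

For example, a single 4 GB inbound window starting at 0 gives a 32-bit
mask, so a device asking for DMA_BIT_MASK(64) would be limited to 32 bits.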

Let me know how this sounds.

Regards,
Oza.

[PATCH v2 1/3] soc: qcom: smd: Transition client drivers from smd to rpmsg

2017-03-27 Thread Bjorn Andersson

By moving these client drivers to use RPMSG instead of the direct SMD
API, we can reuse them on top of the newly added GLINK wire-protocol
support found in the 820 and 835 Qualcomm platforms.

As the new (RPMSG-based) and old SMD implementations are mutually
exclusive, we have to change all client drivers in one commit, to make
sure we have a working system before and after this transition.

Acked-by: Andy Gross 
Acked-by: Kalle Valo 
Acked-by: Marcel Holtmann 
Signed-off-by: Bjorn Andersson 
---

Changes since v1:
- Add dependency on ARCH_QCOM for soc config options, to match the fact that
  drivers/soc/Makefile only enters qcom/ iff ARCH_QCOM is set.
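
For readers unfamiliar with the callback change in the hunks below, here is a minimal userspace sketch of the pattern, not the kernel API. The old callbacks looked up their driver state from the channel; the new ones receive it as a priv argument. All mock_* names are hypothetical stand-ins; only the callback argument order (data, count, priv, addr) follows the signatures shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of an endpoint: it stores the callback and its context. */
struct mock_endpoint;
typedef int (*mock_rx_cb)(struct mock_endpoint *ept, void *data, int count,
			  void *priv, uint32_t addr);

struct mock_endpoint {
	mock_rx_cb cb;
	void *priv;  /* handed back to cb on every message */
};

/* Hypothetical driver state, loosely modelled on struct btqcomsmd. */
struct mock_bt {
	unsigned long byte_rx;
};

/* Receive callback: the context arrives as an argument, no lookup needed. */
int mock_acl_callback(struct mock_endpoint *ept, void *data, int count,
		      void *priv, uint32_t addr)
{
	struct mock_bt *bt = priv;

	(void)ept;
	(void)data;
	(void)addr;
	bt->byte_rx += (unsigned long)count;
	return 0;
}

/* Open an endpoint and remember which context belongs to it. */
void mock_open(struct mock_endpoint *ept, mock_rx_cb cb, void *priv)
{
	assert(ept != NULL && cb != NULL);
	ept->cb = cb;
	ept->priv = priv;
}

/* Deliver one message the way a transport would. */
int mock_deliver(struct mock_endpoint *ept, void *data, int count)
{
	return ept->cb(ept, data, count, ept->priv, 0);
}
```

This mirrors why qcom_wcnss_open_channel gains an extra argument in the hunks below: the context pointer has to be supplied when the channel is opened so that it can be passed back on receive.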

 drivers/bluetooth/Kconfig                  |  2 +-
 drivers/bluetooth/btqcomsmd.c              | 32 +--
 drivers/net/wireless/ath/wcn36xx/Kconfig   |  2 +-
 drivers/net/wireless/ath/wcn36xx/main.c    |  6 ++--
 drivers/net/wireless/ath/wcn36xx/smd.c     | 10 +++---
 drivers/net/wireless/ath/wcn36xx/smd.h     |  6 ++--
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  2 +-
 drivers/soc/qcom/Kconfig                   |  6 ++--
 drivers/soc/qcom/smd-rpm.c                 | 43 +
 drivers/soc/qcom/wcnss_ctrl.c              | 50 +-
 include/linux/soc/qcom/wcnss_ctrl.h        | 11 ---
 net/qrtr/Kconfig                           |  2 +-
 net/qrtr/smd.c                             | 42 -
 13 files changed, 110 insertions(+), 104 deletions(-)

diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
index 08e054507d0b..a6a9dd4d0eef 100644
--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -344,7 +344,7 @@ config BT_WILINK
 
 config BT_QCOMSMD
tristate "Qualcomm SMD based HCI support"
-   depends on QCOM_SMD || (COMPILE_TEST && QCOM_SMD=n)
+   depends on RPMSG || (COMPILE_TEST && RPMSG=n)
depends on QCOM_WCNSS_CTRL || (COMPILE_TEST && QCOM_WCNSS_CTRL=n)
select BT_QCA
help
diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
index 8d4868af9bbd..ef730c173d4b 100644
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -26,8 +26,8 @@
 struct btqcomsmd {
struct hci_dev *hdev;
 
-   struct qcom_smd_channel *acl_channel;
-   struct qcom_smd_channel *cmd_channel;
+   struct rpmsg_endpoint *acl_channel;
+   struct rpmsg_endpoint *cmd_channel;
 };
 
 static int btqcomsmd_recv(struct hci_dev *hdev, unsigned int type,
@@ -48,19 +48,19 @@ static int btqcomsmd_recv(struct hci_dev *hdev, unsigned int type,
return hci_recv_frame(hdev, skb);
 }
 
-static int btqcomsmd_acl_callback(struct qcom_smd_channel *channel,
- const void *data, size_t count)
+static int btqcomsmd_acl_callback(struct rpmsg_device *rpdev, void *data,
+ int count, void *priv, u32 addr)
 {
-   struct btqcomsmd *btq = qcom_smd_get_drvdata(channel);
+   struct btqcomsmd *btq = priv;
 
btq->hdev->stat.byte_rx += count;
return btqcomsmd_recv(btq->hdev, HCI_ACLDATA_PKT, data, count);
 }
 
-static int btqcomsmd_cmd_callback(struct qcom_smd_channel *channel,
- const void *data, size_t count)
+static int btqcomsmd_cmd_callback(struct rpmsg_device *rpdev, void *data,
+ int count, void *priv, u32 addr)
 {
-   struct btqcomsmd *btq = qcom_smd_get_drvdata(channel);
+   struct btqcomsmd *btq = priv;
 
return btqcomsmd_recv(btq->hdev, HCI_EVENT_PKT, data, count);
 }
@@ -72,12 +72,12 @@ static int btqcomsmd_send(struct hci_dev *hdev, struct sk_buff *skb)
 
switch (hci_skb_pkt_type(skb)) {
case HCI_ACLDATA_PKT:
-   ret = qcom_smd_send(btq->acl_channel, skb->data, skb->len);
+   ret = rpmsg_send(btq->acl_channel, skb->data, skb->len);
hdev->stat.acl_tx++;
hdev->stat.byte_tx += skb->len;
break;
case HCI_COMMAND_PKT:
-   ret = qcom_smd_send(btq->cmd_channel, skb->data, skb->len);
+   ret = rpmsg_send(btq->cmd_channel, skb->data, skb->len);
hdev->stat.cmd_tx++;
break;
default:
@@ -114,18 +114,15 @@ static int btqcomsmd_probe(struct platform_device *pdev)
wcnss = dev_get_drvdata(pdev->dev.parent);
 
btq->acl_channel = qcom_wcnss_open_channel(wcnss, "APPS_RIVA_BT_ACL",
-  btqcomsmd_acl_callback);
+  btqcomsmd_acl_callback, btq);
if (IS_ERR(btq->acl_channel))
return PTR_ERR(btq->acl_channel);
 
btq->cmd_channel = qcom_wcnss_open_channel(wcnss, "APPS_RIVA_BT_CMD",
-  btqcomsmd_cmd_callback);

linux-next: Tree for Mar 28

2017-03-27 Thread Stephen Rothwell

Hi all,

Changes since 20170327:

The kbuild tree gained a conflict against the input-current tree.

The s390 tree gained a conflict against the kbuild tree.

The md tree gained a conflict against the device-mapper tree.

The vhost tree gained a conflict against the kbuild tree.

The livepatching tree gained a conflict against the security tree.

Non-merge commits (relative to Linus' tree): 5112
 5683 files changed, 405824 insertions(+), 95186 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 254 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (ad0376eb1483 Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp)
Merging fixes/master (97da3854c526 Linux 4.11-rc3)
Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading parentheses around a condition)
Merging arc-current/for-curr (ae9955aeb8e4 ARC: vdk: Fix support of UIO)
Merging arm-current/fixes (a1016e94cce9 ARM: wire up statx syscall)
Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (cc638a488a52 gcc-plugins: update architecture list in documentation)
Merging sparc/master (f8e6859ea9d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (6ac3b77a6fff irda: vlsi_ir: fix check for DMA mapping errors)
Merging ipsec/master (72ef9c4125c7 dccp: fix memory leak during tear-down of unsuccessful connection request)
Merging netfilter/master (f83bf8da1135 netfilter: nfnl_cthelper: Fix memory leak)
Merging ipvs/master (5371bbf4b295 net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled)
Merging wireless-drivers/master (6be3b6cce1e2 ath10k: fix incorrect wlan_mac_base in qca6174_regs)
Merging mac80211/master (ea90e0dc8cec nl80211: fix dumpit error path RTNL deadlocks)
Merging sound-current/for-linus (3f307834e695 ALSA: hda - Adding a group of pin definition to fix headset problem)
Merging pci-current/for-linus (9abb27c7594a PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller)
Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging tty.current/tty-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb.current/usb-linus (1633682053a7 USB: fix linked-list corruption in rh_call_control())
Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held)
Merging usb-serial-fixes/usb-linus (436ecf5519d8 USB: serial: qcserial: add Dell DW5811e)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file)
Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON)
Merging staging.current/staging-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging input-current/for-linus (5659495a7a14 uapi: add missing install of userio.h)
Merging crypto-current/master (9df0eb180c20 crypto: xts,lrw - fix out-of-bounds write after kmalloc failure)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info)
Merging vfio-fixes/for-l

[PATCH v6 5/5] i2c: aspeed: added slave support for Aspeed I2C driver

2017-03-27 Thread Brendan Higgins

Added slave support for Aspeed I2C controller. Supports fourteen busses
present in AST24XX and AST25XX BMC SoCs by Aspeed.

Signed-off-by: Brendan Higgins 
---
Added in v6:
  - Pulled slave support out of initial driver commit into its own commit.
  - No longer arbitrarily restrict bus to be slave xor master.
---
 drivers/i2c/busses/i2c-aspeed.c | 186 
 1 file changed, 186 insertions(+)

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index 04266acc6c46..a9ee58a2c4e2 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -49,6 +49,7 @@
 #define ASPEED_I2CD_SDA_DRIVE_1T_EN    BIT(8)
 #define ASPEED_I2CD_M_SDA_DRIVE_1T_EN  BIT(7)
 #define ASPEED_I2CD_M_HIGH_SPEED_EN    BIT(6)
+#define ASPEED_I2CD_SLAVE_EN           BIT(1)
 #define ASPEED_I2CD_MASTER_EN          BIT(0)
 
 /* 0x04 : I2CD Clock and AC Timing Control Register #1 */
@@ -69,6 +70,7 @@
  */
 #define ASPEED_I2CD_INTR_SDA_DL_TIMEOUT    BIT(14)
 #define ASPEED_I2CD_INTR_BUS_RECOVER_DONE  BIT(13)
+#define ASPEED_I2CD_INTR_SLAVE_MATCH       BIT(7)
 #define ASPEED_I2CD_INTR_SCL_TIMEOUT       BIT(6)
 #define ASPEED_I2CD_INTR_ABNORMAL          BIT(5)
 #define ASPEED_I2CD_INTR_NORMAL_STOP       BIT(4)
@@ -106,6 +108,9 @@
 #define ASPEED_I2CD_M_TX_CMD     BIT(1)
 #define ASPEED_I2CD_M_START_CMD  BIT(0)
 
+/* 0x18 : I2CD Slave Device Address Register   */
+#define ASPEED_I2CD_DEV_ADDR_MASK  GENMASK(6, 0)
+
 enum aspeed_i2c_master_state {
ASPEED_I2C_MASTER_START,
ASPEED_I2C_MASTER_TX_FIRST,
@@ -115,6 +120,15 @@ enum aspeed_i2c_master_state {
ASPEED_I2C_MASTER_INACTIVE,
 };
 
+enum aspeed_i2c_slave_state {
+   ASPEED_I2C_SLAVE_START,
+   ASPEED_I2C_SLAVE_READ_REQUESTED,
+   ASPEED_I2C_SLAVE_READ_PROCESSED,
+   ASPEED_I2C_SLAVE_WRITE_REQUESTED,
+   ASPEED_I2C_SLAVE_WRITE_RECEIVED,
+   ASPEED_I2C_SLAVE_STOP,
+};
+
 struct aspeed_i2c_bus {
struct i2c_adapter  adap;
struct device   *dev;
@@ -207,6 +221,110 @@ static int aspeed_i2c_recover_bus(struct aspeed_i2c_bus *bus)
return ret;
 }
 
+#if IS_ENABLED(CONFIG_I2C_SLAVE)
+static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
+{
+   u32 command, irq_status, status_ack = 0;
+   struct i2c_client *slave = bus->slave;
+   bool irq_handled = true;
+   u8 value;
+
+   spin_lock(&bus->lock);
+   if (!slave) {
+   irq_handled = false;
+   goto out;
+   }
+
+   command = aspeed_i2c_read(bus, ASPEED_I2C_CMD_REG);
+   irq_status = aspeed_i2c_read(bus, ASPEED_I2C_INTR_STS_REG);
+
+   /* Slave was requested, restart state machine. */
+   if (irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
+   status_ack |= ASPEED_I2CD_INTR_SLAVE_MATCH;
+   bus->slave_state = ASPEED_I2C_SLAVE_START;
+   }
+
+   /* Slave is not currently active, irq was for someone else. */
+   if (bus->slave_state == ASPEED_I2C_SLAVE_STOP) {
+   irq_handled = false;
+   goto out;
+   }
+
+   dev_dbg(bus->dev, "slave irq status 0x%08x, cmd 0x%08x\n",
+   irq_status, command);
+
+   /* Slave was sent something. */
+   if (irq_status & ASPEED_I2CD_INTR_RX_DONE) {
+   value = aspeed_i2c_read(bus, ASPEED_I2C_BYTE_BUF_REG) >> 8;
+   /* Handle address frame. */
+   if (bus->slave_state == ASPEED_I2C_SLAVE_START) {
+   if (value & 0x1)
+   bus->slave_state =
+   ASPEED_I2C_SLAVE_READ_REQUESTED;
+   else
+   bus->slave_state =
+   ASPEED_I2C_SLAVE_WRITE_REQUESTED;
+   }
+   status_ack |= ASPEED_I2CD_INTR_RX_DONE;
+   }
+
+   /* Slave was asked to stop. */
+   if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
+   status_ack |= ASPEED_I2CD_INTR_NORMAL_STOP;
+   bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+   }
+   if (irq_status & ASPEED_I2CD_INTR_TX_NAK) {
+   status_ack |= ASPEED_I2CD_INTR_TX_NAK;
+   bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+   }
+
+   switch (bus->slave_state) {
+   case ASPEED_I2C_SLAVE_READ_REQUESTED:
+   if (irq_status & ASPEED_I2CD_INTR_TX_ACK)
+   dev_err(bus->dev, "Unexpected ACK on read request.\n");
+   bus->slave_state = ASPEED_I2C_SLAVE_READ_PROCESSED;
+
+   i2c_slave_event(slave, I2C_SLAVE_READ_REQUESTED, &value);
+   aspeed_i2c_write(bus, value, ASPEED_I2C_BYTE_BUF_REG);

[PATCH v6 4/5] i2c: aspeed: added driver for Aspeed I2C

2017-03-27 Thread Brendan Higgins

Added initial master support for Aspeed I2C controller. Supports
fourteen busses present in AST24XX and AST25XX BMC SoCs by Aspeed.

Signed-off-by: Brendan Higgins 
---
Changes for v2:
  - Added single module_init (multiple was breaking some builds).
Changes for v3:
  - Removed "bus" device tree param; now extracted from bus address offset
Changes for v4:
  - I2C adapter number is now generated dynamically unless specified in alias.
Changes for v5:
  - Removed irq_chip used to multiplex IRQ and replaced it with dummy_irq_chip
along with some other IRQ cleanup.
  - Addressed comments from Cedric, and Vladimir, mostly stylistic things and
using devm managed resources.
  - Increased max clock frequency before the bus is put in HighSpeed mode, as
per Kachalov's comment.
Changes for v6:
  - No longer arbitrarily restrict bus to be slave xor master.
  - Pulled out "struct aspeed_i2c_controller" as an interrupt controller.
  - Pulled out slave support into its own commit.
  - Rewrote code that sets clock divider register because the original version
set it incorrectly.
  - Rewrote the aspeed_i2c_master_irq handler because the old method of
completing a completion in between restarts was too slow causing devices to
misbehave.
  - Added support for I2C_M_RECV_LEN which I had incorrectly said was supported
before.
  - Addressed other comments from Vladimir.
---
 drivers/i2c/busses/Kconfig  |  10 +
 drivers/i2c/busses/Makefile |   1 +
 drivers/i2c/busses/i2c-aspeed.c | 610 
 3 files changed, 621 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-aspeed.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 8adc0f1d7ad0..e5ea5641a874 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -326,6 +326,16 @@ config I2C_POWERMAC
 
 comment "I2C system bus drivers (mostly embedded / system-on-chip)"
 
+config I2C_ASPEED
+   tristate "Aspeed AST2xxx SoC I2C Controller"
+   depends on ARCH_ASPEED
+   help
+ If you say yes to this option, support will be included for the
+ Aspeed AST2xxx SoC I2C controller.
+
+ This driver can also be built as a module.  If so, the module
+ will be called i2c-aspeed.
+
 config I2C_AT91
tristate "Atmel AT91 I2C Two-Wire interface (TWI)"
depends on ARCH_AT91
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 30b60855fbcd..e84604b9bf3b 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_I2C_HYDRA)   += i2c-hydra.o
 obj-$(CONFIG_I2C_POWERMAC) += i2c-powermac.o
 
 # Embedded system I2C/SMBus host controller drivers
+obj-$(CONFIG_I2C_ASPEED)   += i2c-aspeed.o
 obj-$(CONFIG_I2C_AT91) += i2c-at91.o
 obj-$(CONFIG_I2C_AU1550)   += i2c-au1550.o
 obj-$(CONFIG_I2C_AXXIA)+= i2c-axxia.o
diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
new file mode 100644
index ..04266acc6c46
--- /dev/null
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -0,0 +1,610 @@
+/*
+ *  Aspeed 24XX/25XX I2C Interrupt Controller.
+ *
+ *  Copyright (C) 2012-2017 ASPEED Technology Inc.
+ *  Copyright 2017 IBM Corporation
+ *  Copyright 2017 Google, Inc.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* I2C Register */
+#define ASPEED_I2C_FUN_CTRL_REG    0x00
+#define ASPEED_I2C_AC_TIMING_REG1  0x04
+#define ASPEED_I2C_AC_TIMING_REG2  0x08
+#define ASPEED_I2C_INTR_CTRL_REG   0x0c
+#define ASPEED_I2C_INTR_STS_REG    0x10
+#define ASPEED_I2C_CMD_REG         0x14
+#define ASPEED_I2C_DEV_ADDR_REG    0x18
+#define ASPEED_I2C_BYTE_BUF_REG    0x20
+
+/* Global Register Definition */
+/* 0x00 : I2C Interrupt Status Register  */
+/* 0x08 : I2C Interrupt Target Assignment  */
+
+/* Device Register Definition */
+/* 0x00 : I2CD Function Control Register  */
+#define ASPEED_I2CD_MULTI_MASTER_DIS   BIT(15)
+#define ASPEED_I2CD_SDA_DRIVE_1T_EN    BIT(8)
+#define ASPEED_I2CD_M_SDA_DRIVE_1T_EN  BIT(7)
+#define ASPEED_I2CD_M_HIGH_SPEED_EN    BIT(6)
+#define ASPEED_I2CD_MASTER_EN          BIT(0)
+
+/* 0x04 : I2CD Clock and AC Timing Control Register #1 */
+#define ASPEED_I2CD_TIME_SCL_HIGH_SHIFT  16
+#define ASPEED_I2CD_TIME_SCL_HIGH_MASK

[PATCH v6 5/5] i2c: aspeed: added slave support for Aspeed I2C driver

2017-03-27 Thread Brendan Higgins

Added slave support for Aspeed I2C controller. Supports fourteen busses
present in AST24XX and AST25XX BMC SoCs by Aspeed.

Signed-off-by: Brendan Higgins 
---
Added in v6:
  - Pulled slave support out of initial driver commit into its own commit.
  - No longer arbitrarily restrict bus to be slave xor master.
---
 drivers/i2c/busses/i2c-aspeed.c | 186 
 1 file changed, 186 insertions(+)

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index 04266acc6c46..a9ee58a2c4e2 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -49,6 +49,7 @@
 #define ASPEED_I2CD_SDA_DRIVE_1T_ENBIT(8)
 #define ASPEED_I2CD_M_SDA_DRIVE_1T_EN  BIT(7)
 #define ASPEED_I2CD_M_HIGH_SPEED_ENBIT(6)
+#define ASPEED_I2CD_SLAVE_EN   BIT(1)
 #define ASPEED_I2CD_MASTER_EN  BIT(0)
 
 /* 0x04 : I2CD Clock and AC Timing Control Register #1 */
@@ -69,6 +70,7 @@
  */
 #define ASPEED_I2CD_INTR_SDA_DL_TIMEOUTBIT(14)
 #define ASPEED_I2CD_INTR_BUS_RECOVER_DONE  BIT(13)
+#define ASPEED_I2CD_INTR_SLAVE_MATCH   BIT(7)
 #define ASPEED_I2CD_INTR_SCL_TIMEOUT   BIT(6)
 #define ASPEED_I2CD_INTR_ABNORMAL  BIT(5)
 #define ASPEED_I2CD_INTR_NORMAL_STOP   BIT(4)
@@ -106,6 +108,9 @@
 #define ASPEED_I2CD_M_TX_CMD   BIT(1)
 #define ASPEED_I2CD_M_START_CMDBIT(0)
 
+/* 0x18 : I2CD Slave Device Address Register   */
+#define ASPEED_I2CD_DEV_ADDR_MASK  GENMASK(6, 0)
+
 enum aspeed_i2c_master_state {
ASPEED_I2C_MASTER_START,
ASPEED_I2C_MASTER_TX_FIRST,
@@ -115,6 +120,15 @@ enum aspeed_i2c_master_state {
ASPEED_I2C_MASTER_INACTIVE,
 };
 
+enum aspeed_i2c_slave_state {
+   ASPEED_I2C_SLAVE_START,
+   ASPEED_I2C_SLAVE_READ_REQUESTED,
+   ASPEED_I2C_SLAVE_READ_PROCESSED,
+   ASPEED_I2C_SLAVE_WRITE_REQUESTED,
+   ASPEED_I2C_SLAVE_WRITE_RECEIVED,
+   ASPEED_I2C_SLAVE_STOP,
+};
+
 struct aspeed_i2c_bus {
struct i2c_adapter  adap;
struct device   *dev;
@@ -207,6 +221,110 @@ static int aspeed_i2c_recover_bus(struct aspeed_i2c_bus 
*bus)
return ret;
 }
 
+#if IS_ENABLED(CONFIG_I2C_SLAVE)
+static bool aspeed_i2c_slave_irq(struct aspeed_i2c_bus *bus)
+{
+   u32 command, irq_status, status_ack = 0;
+   struct i2c_client *slave = bus->slave;
+   bool irq_handled = true;
+   u8 value;
+
+   spin_lock(>lock);
+   if (!slave) {
+   irq_handled = false;
+   goto out;
+   }
+
+   command = aspeed_i2c_read(bus, ASPEED_I2C_CMD_REG);
+   irq_status = aspeed_i2c_read(bus, ASPEED_I2C_INTR_STS_REG);
+
+   /* Slave was requested, restart state machine. */
+   if (irq_status & ASPEED_I2CD_INTR_SLAVE_MATCH) {
+   status_ack |= ASPEED_I2CD_INTR_SLAVE_MATCH;
+   bus->slave_state = ASPEED_I2C_SLAVE_START;
+   }
+
+   /* Slave is not currently active, irq was for someone else. */
+   if (bus->slave_state == ASPEED_I2C_SLAVE_STOP) {
+   irq_handled = false;
+   goto out;
+   }
+
+   dev_dbg(bus->dev, "slave irq status 0x%08x, cmd 0x%08x\n",
+   irq_status, command);
+
+   /* Slave was sent something. */
+   if (irq_status & ASPEED_I2CD_INTR_RX_DONE) {
+   value = aspeed_i2c_read(bus, ASPEED_I2C_BYTE_BUF_REG) >> 8;
+   /* Handle address frame. */
+   if (bus->slave_state == ASPEED_I2C_SLAVE_START) {
+   if (value & 0x1)
+   bus->slave_state =
+   ASPEED_I2C_SLAVE_READ_REQUESTED;
+   else
+   bus->slave_state =
+   ASPEED_I2C_SLAVE_WRITE_REQUESTED;
+   }
+   status_ack |= ASPEED_I2CD_INTR_RX_DONE;
+   }
+
+   /* Slave was asked to stop. */
+   if (irq_status & ASPEED_I2CD_INTR_NORMAL_STOP) {
+   status_ack |= ASPEED_I2CD_INTR_NORMAL_STOP;
+   bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+   }
+   if (irq_status & ASPEED_I2CD_INTR_TX_NAK) {
+   status_ack |= ASPEED_I2CD_INTR_TX_NAK;
+   bus->slave_state = ASPEED_I2C_SLAVE_STOP;
+   }
+
+   switch (bus->slave_state) {
+   case ASPEED_I2C_SLAVE_READ_REQUESTED:
+   if (irq_status & ASPEED_I2CD_INTR_TX_ACK)
+   dev_err(bus->dev, "Unexpected ACK on read request.\n");
+   bus->slave_state = ASPEED_I2C_SLAVE_READ_PROCESSED;
+
+   i2c_slave_event(slave, I2C_SLAVE_READ_REQUESTED, &value);
+   aspeed_i2c_write(bus, value, ASPEED_I2C_BYTE_BUF_REG);
+

[PATCH v6 4/5] i2c: aspeed: added driver for Aspeed I2C

2017-03-27 Thread Brendan Higgins

Added initial master support for Aspeed I2C controller. Supports
fourteen busses present in AST24XX and AST25XX BMC SoCs by Aspeed.

Signed-off-by: Brendan Higgins 
---
Changes for v2:
  - Added single module_init (multiple was breaking some builds).
Changes for v3:
  - Removed "bus" device tree param; now extracted from bus address offset
Changes for v4:
  - I2C adapter number is now generated dynamically unless specified in alias.
Changes for v5:
  - Removed irq_chip used to multiplex IRQ and replaced it with dummy_irq_chip
along with some other IRQ cleanup.
  - Addressed comments from Cedric, and Vladimir, mostly stylistic things and
using devm managed resources.
  - Increased max clock frequency before the bus is put in HighSpeed mode, as
per Kachalov's comment.
Changes for v6:
  - No longer arbitrarily restrict bus to be slave xor master.
  - Pulled out "struct aspeed_i2c_controller" as an interrupt controller.
  - Pulled out slave support into its own commit.
  - Rewrote code that sets clock divider register because the original version
set it incorrectly.
  - Rewrote the aspeed_i2c_master_irq handler because the old method of
completing a completion in between restarts was too slow causing devices to
misbehave.
  - Added support for I2C_M_RECV_LEN which I had incorrectly said was supported
before.
  - Addressed other comments from Vladimir.
---
 drivers/i2c/busses/Kconfig  |  10 +
 drivers/i2c/busses/Makefile |   1 +
 drivers/i2c/busses/i2c-aspeed.c | 610 
 3 files changed, 621 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-aspeed.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 8adc0f1d7ad0..e5ea5641a874 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -326,6 +326,16 @@ config I2C_POWERMAC
 
 comment "I2C system bus drivers (mostly embedded / system-on-chip)"
 
+config I2C_ASPEED
+   tristate "Aspeed AST2xxx SoC I2C Controller"
+   depends on ARCH_ASPEED
+   help
+ If you say yes to this option, support will be included for the
+ Aspeed AST2xxx SoC I2C controller.
+
+ This driver can also be built as a module.  If so, the module
+ will be called i2c-aspeed.
+
 config I2C_AT91
tristate "Atmel AT91 I2C Two-Wire interface (TWI)"
depends on ARCH_AT91
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 30b60855fbcd..e84604b9bf3b 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_I2C_HYDRA)   += i2c-hydra.o
 obj-$(CONFIG_I2C_POWERMAC) += i2c-powermac.o
 
 # Embedded system I2C/SMBus host controller drivers
+obj-$(CONFIG_I2C_ASPEED)   += i2c-aspeed.o
 obj-$(CONFIG_I2C_AT91) += i2c-at91.o
 obj-$(CONFIG_I2C_AU1550)   += i2c-au1550.o
 obj-$(CONFIG_I2C_AXXIA)+= i2c-axxia.o
diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
new file mode 100644
index 000000000000..04266acc6c46
--- /dev/null
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -0,0 +1,610 @@
+/*
+ *  Aspeed 24XX/25XX I2C Interrupt Controller.
+ *
+ *  Copyright (C) 2012-2017 ASPEED Technology Inc.
+ *  Copyright 2017 IBM Corporation
+ *  Copyright 2017 Google, Inc.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* I2C Register */
+#define ASPEED_I2C_FUN_CTRL_REG    0x00
+#define ASPEED_I2C_AC_TIMING_REG1  0x04
+#define ASPEED_I2C_AC_TIMING_REG2  0x08
+#define ASPEED_I2C_INTR_CTRL_REG   0x0c
+#define ASPEED_I2C_INTR_STS_REG    0x10
+#define ASPEED_I2C_CMD_REG         0x14
+#define ASPEED_I2C_DEV_ADDR_REG    0x18
+#define ASPEED_I2C_BYTE_BUF_REG    0x20
+
+/* Global Register Definition */
+/* 0x00 : I2C Interrupt Status Register  */
+/* 0x08 : I2C Interrupt Target Assignment  */
+
+/* Device Register Definition */
+/* 0x00 : I2CD Function Control Register  */
+#define ASPEED_I2CD_MULTI_MASTER_DIS   BIT(15)
+#define ASPEED_I2CD_SDA_DRIVE_1T_EN    BIT(8)
+#define ASPEED_I2CD_M_SDA_DRIVE_1T_EN  BIT(7)
+#define ASPEED_I2CD_M_HIGH_SPEED_EN    BIT(6)
+#define ASPEED_I2CD_MASTER_EN          BIT(0)
+
+/* 0x04 : I2CD Clock and AC Timing Control Register #1 */
+#define ASPEED_I2CD_TIME_SCL_HIGH_SHIFT  16
+#define ASPEED_I2CD_TIME_SCL_HIGH_MASK   GENMASK(19, 16)

[PATCH v6 2/5] irqchip/aspeed-i2c-ic: Add I2C IRQ controller for Aspeed

2017-03-27 Thread Brendan Higgins

The Aspeed 24XX/25XX chips share a single hardware interrupt across 14
separate I2C busses. This adds a dummy irqchip which maps the single
hardware interrupt to software interrupts for each of the busses.

Signed-off-by: Brendan Higgins 
---
Added in v6:
  - Pulled "aspeed_i2c_controller" out into an interrupt controller since that is
what it actually does.
---
 drivers/irqchip/Makefile|   2 +-
 drivers/irqchip/irq-aspeed-i2c-ic.c | 102 
 2 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 drivers/irqchip/irq-aspeed-i2c-ic.c

diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 152bc40b6762..c136c2bd1761 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -74,6 +74,6 @@ obj-$(CONFIG_MVEBU_ODMI)  += irq-mvebu-odmi.o
 obj-$(CONFIG_MVEBU_PIC)+= irq-mvebu-pic.o
 obj-$(CONFIG_LS_SCFG_MSI)  += irq-ls-scfg-msi.o
 obj-$(CONFIG_EZNPS_GIC)+= irq-eznps.o
-obj-$(CONFIG_ARCH_ASPEED)  += irq-aspeed-vic.o
+obj-$(CONFIG_ARCH_ASPEED)  += irq-aspeed-vic.o irq-aspeed-i2c-ic.o
 obj-$(CONFIG_STM32_EXTI)   += irq-stm32-exti.o
 obj-$(CONFIG_QCOM_IRQ_COMBINER)+= qcom-irq-combiner.o
diff --git a/drivers/irqchip/irq-aspeed-i2c-ic.c b/drivers/irqchip/irq-aspeed-i2c-ic.c
new file mode 100644
index 000000000000..59c50b28dec0
--- /dev/null
+++ b/drivers/irqchip/irq-aspeed-i2c-ic.c
@@ -0,0 +1,102 @@
+/*
+ *  Aspeed 24XX/25XX I2C Interrupt Controller.
+ *
+ *  Copyright (C) 2012-2017 ASPEED Technology Inc.
+ *  Copyright 2017 IBM Corporation
+ *  Copyright 2017 Google, Inc.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#define ASPEED_I2C_IC_NUM_BUS 14
+
+struct aspeed_i2c_ic {
+   void __iomem*base;
+   int parent_irq;
+   struct irq_domain   *irq_domain;
+};
+
+/*
+ * The aspeed chip provides a single hardware interrupt for all of the I2C
+ * busses, so we use a dummy interrupt chip to translate this single interrupt
+ * into multiple interrupts, each associated with a single I2C bus.
+ */
+static void aspeed_i2c_ic_irq_handler(struct irq_desc *desc)
+{
+   struct aspeed_i2c_ic *i2c_ic = irq_desc_get_handler_data(desc);
+   struct irq_chip *chip = irq_desc_get_chip(desc);
+   unsigned long bit, status;
+   unsigned int bus_irq;
+
+   chained_irq_enter(chip, desc);
+   status = readl(i2c_ic->base);
+   for_each_set_bit(bit, &status, ASPEED_I2C_IC_NUM_BUS) {
+   bus_irq = irq_find_mapping(i2c_ic->irq_domain, bit);
+   generic_handle_irq(bus_irq);
+   }
+   chained_irq_exit(chip, desc);
+}
+
+/*
+ * Set simple handler and mark IRQ as valid. Nothing interesting to do here
+ * since we are using a dummy interrupt chip.
+ */
+static int aspeed_i2c_ic_map_irq_domain(struct irq_domain *domain,
+   unsigned int irq, irq_hw_number_t hwirq)
+{
+   irq_set_chip_and_handler(irq, &dummy_irq_chip, handle_simple_irq);
+   irq_set_chip_data(irq, domain->host_data);
+
+   return 0;
+}
+
+static const struct irq_domain_ops aspeed_i2c_ic_irq_domain_ops = {
+   .map = aspeed_i2c_ic_map_irq_domain,
+};
+
+static int __init aspeed_i2c_ic_of_init(struct device_node *node,
+   struct device_node *parent)
+{
+   struct aspeed_i2c_ic *i2c_ic;
+
+   i2c_ic = kzalloc(sizeof(*i2c_ic), GFP_KERNEL);
+   if (!i2c_ic)
+   return -ENOMEM;
+
+   i2c_ic->base = of_iomap(node, 0);
+   if (IS_ERR(i2c_ic->base))
+   return PTR_ERR(i2c_ic->base);
+
+   i2c_ic->parent_irq = irq_of_parse_and_map(node, 0);
+   if (i2c_ic->parent_irq < 0)
+   return i2c_ic->parent_irq;
+
+   i2c_ic->irq_domain = irq_domain_add_linear(
+   node, ASPEED_I2C_IC_NUM_BUS,
+   &aspeed_i2c_ic_irq_domain_ops, NULL);
+   if (!i2c_ic->irq_domain)
+   return -ENOMEM;
+
+   i2c_ic->irq_domain->name = "ast-i2c-domain";
+
+   irq_set_chained_handler_and_data(i2c_ic->parent_irq,
+aspeed_i2c_ic_irq_handler, i2c_ic);
+
+   pr_info("i2c controller registered, irq %d\n", i2c_ic->parent_irq);
+
+   return 0;
+}
+
+IRQCHIP_DECLARE(ast2400_i2c_ic, "aspeed,ast2400-i2c-ic", aspeed_i2c_ic_of_init);
+IRQCHIP_DECLARE(ast2500_i2c_ic, "aspeed,ast2500-i2c-ic", aspeed_i2c_ic_of_init);
-- 
2.12.2.564.g063fe858b8-goog
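
The demultiplexing that the patch above describes (one shared status word, one bit per bus, one dispatch per set bit) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the callback type and the `demux_status()` and `record_bus()` names are hypothetical, the callback stands in for `generic_handle_irq()`, and no kernel APIs are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUS 14	/* same count as ASPEED_I2C_IC_NUM_BUS in the patch */

/* Hypothetical per-bus callback; stands in for generic_handle_irq(). */
typedef void (*bus_handler_t)(unsigned int bus, void *ctx);

/*
 * Walk the low NUM_BUS bits of a status word and call the handler once
 * per set bit, lowest bit first. Bits at or above NUM_BUS are ignored.
 * Returns the number of buses that were dispatched.
 */
static unsigned int demux_status(uint32_t status, bus_handler_t handler,
				 void *ctx)
{
	unsigned int bus, handled = 0;

	assert(handler != NULL);

	for (bus = 0; bus < NUM_BUS; bus++) {
		if (status & (UINT32_C(1) << bus)) {
			handler(bus, ctx);
			handled++;
		}
	}
	return handled;
}

/* Example handler: record which buses fired in a caller-owned bitmap. */
static void record_bus(unsigned int bus, void *ctx)
{
	uint32_t *seen = ctx;

	*seen |= UINT32_C(1) << bus;
}
```

Calling `demux_status(0x2005, record_bus, &seen)` would dispatch buses 0, 2 and 13. In the kernel the loop is expressed with `for_each_set_bit()` and the bit index is first translated to a Linux IRQ number through the IRQ domain.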

[PATCH v6 3/5] i2c: aspeed: added documentation for Aspeed I2C driver

2017-03-27 Thread Brendan Higgins

Added device tree binding documentation for Aspeed I2C busses.

Signed-off-by: Brendan Higgins 
---
Changes for v2:
  - None
Changes for v3:
  - Removed reference to "bus" device tree param
Changes for v4:
  - None
Changes for v5:
  - None
Changes for v6:
  - Replaced the controller property with an interrupt controller, leaving only
the busses in the I2C documentation.
---
 .../devicetree/bindings/i2c/i2c-aspeed.txt | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-aspeed.txt

diff --git a/Documentation/devicetree/bindings/i2c/i2c-aspeed.txt b/Documentation/devicetree/bindings/i2c/i2c-aspeed.txt
new file mode 100644
index 000000000000..fbcc501706b1
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-aspeed.txt
@@ -0,0 +1,49 @@
+Device tree configuration for the I2C busses on the AST24XX and AST25XX SoCs.
+
+Required Properties:
+- #address-cells   : should be 1
+- #size-cells  : should be 0
+- reg  : address offset and range of bus
+- compatible   : should be "aspeed,ast2400-i2c-bus"
+ or "aspeed,ast2500-i2c-bus"
+- clocks   : root clock of bus, should reference the APB
+ clock
+- interrupts   : interrupt number
+- interrupt-parent : interrupt controller for bus, should reference a
+ aspeed,ast2400-i2c-ic or aspeed,ast2500-i2c-ic
+ interrupt controller
+
+Optional Properties:
+- clock-frequency  : frequency of the bus clock in Hz
+ defaults to 100 kHz when not specified
+
+Example:
+
+i2c {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0 0x1e78a000 0x1000>;
+
+   i2c_ic: interrupt-controller@0 {
+   #interrupt-cells = <1>;
+   compatible = "aspeed,ast2400-i2c-ic";
+   reg = <0x0 0x40>;
+   interrupts = <12>;
+   interrupt-controller;
+   };
+
+   i2c0: i2c-bus@40 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   #interrupt-cells = <1>;
+   reg = <0x40 0x40>;
+   compatible = "aspeed,ast2400-i2c-bus";
+   bus = <0>;
+   clocks = <_apb>;
+   clock-frequency = <10>;
+   status = "disabled";
+   interrupts = <0>;
+   interrupt-parent = <&i2c_ic>;
+   };
+};
-- 
2.12.2.564.g063fe858b8-goog

Re: [PATCH 4/4] zram: make deduplication feature optional

2017-03-27 Thread Sergey Senozhatsky

Hello Minchan,

On (03/28/17 11:50), Minchan Kim wrote:
[..]
> > the reason I asked was that both zram and zswap sort of trying to
> > have same optimizations - zero filled pages handling, for example.
> > zram is a bit ahead now (to the best of my knowledge), because of
> > the recent 'same element' filled pages. zswap, probably, will have
> > something like this as well some day. or may be it won't, up to Seth
> > and Dan. de-duplication definitely can improve both zram and zswap,
> > which, once again, suggests that at some point zswap will have its
> > own implementation. well, or it won't.
> 
> As I pointed out, at least, dedup was no benefit for the swap case.
> I don't want to disrupt zsmalloc without any *proved* benefit.
> Even though it *might* have benefit, it shouldn't be in allocator
> layer unless it's really huge benefit like performance.

sure.

zpool, I meant zpool. I mistakenly used the word 'allocator'.

I meant some intermediate layer between zram and actual memory allocator,
a common layer which both zram and zswap can use and which can have
common functionality. just an idea. haven't really thought about it yet.

> It makes hard zram's allocator change in future.
> And please consider zswap is born for the latency in server workload
> while zram is memory efficiency in embedded world.

maybe. I do suspect zswap is used in embedded as well [1]. there is even
a brand new allocator that 'reportedly' uses less memory than zsmalloc
and outperforms zsmalloc in embedded setups [1] (once again, reportedly.
I haven't tried it).

if z3fold is actually this good (I'm not saying it is not, haven't
tested it), then it makes sense to switch to zpool API in zram and let
zram users select the allocator that fits their setups better.

just saying.

[1] http://events.linuxfoundation.org/sites/events/files/slides/zram1.pdf

-ss

[PATCH v6 1/5] irqchip/aspeed-i2c-ic: binding docs for Aspeed I2C Interrupt Controller

2017-03-27 Thread Brendan Higgins

Added device tree binding documentation for Aspeed I2C Interrupt
Controller.

Signed-off-by: Brendan Higgins 
---
Added in v6:
  - Pulled "aspeed_i2c_controller" out into an interrupt controller since that is
what it actually does.
---
 .../interrupt-controller/aspeed,ast2400-i2c-ic.txt | 25 ++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-i2c-ic.txt

diff --git a/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-i2c-ic.txt b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-i2c-ic.txt
new file mode 100644
index 000000000000..033cc82e5684
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-i2c-ic.txt
@@ -0,0 +1,25 @@
+Device tree configuration for the I2C Interrupt Controller on the AST24XX and
+AST25XX SoCs.
+
+Required Properties:
+- #address-cells   : should be 1
+- #size-cells  : should be 1
+- #interrupt-cells : should be 1
+- compatible   : should be "aspeed,ast2400-i2c-ic"
+ or "aspeed,ast2500-i2c-ic"
+- reg  : address start and range of controller
+- interrupts   : interrupt number
+- interrupt-controller : denotes that the controller receives and fires
+ new interrupts for child busses
+
+Example:
+
+i2c_ic: interrupt-controller@0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #interrupt-cells = <1>;
+   compatible = "aspeed,ast2400-i2c-ic";
+   reg = <0x0 0x40>;
+   interrupts = <12>;
+   interrupt-controller;
+};
-- 
2.12.2.564.g063fe858b8-goog

[PATCH v6 0/5] i2c: aspeed: added driver for Aspeed I2C

2017-03-27 Thread Brendan Higgins

Sorry for the delay, I went on a long vacation prior to receiving feedback and
got back in the middle of a hardware bring up that consumed all of my attention
for an extended period of time. I will try to plan upstream submissions around
my other responsibilities better in the future.

Addressed comments from:
  - Vladimir in: https://www.spinics.net/lists/linux-i2c/msg27387.html
and: https://www.spinics.net/lists/linux-i2c/msg27386.html
  - Wolfram in: https://www.spinics.net/lists/linux-i2c/msg27476.html
and: https://www.spinics.net/lists/linux-i2c/msg27483.html

Changes since previous update:
  - No longer arbitrarily restrict bus to be slave xor master.
  - Pulled out "struct aspeed_i2c_controller" as an interrupt controller.
  - Pulled out slave support into its own commit.
  - Rewrote code that sets clock divider register because the original version
set it incorrectly.
  - Discovered and fixed issue in implementation that caused certain slave
devices to misbehave; the cause was that the master IRQ handler would return
control to the requesting thread after the last RX or TX command was handled
such that the requesting thread would issue either a repeated start or stop.
This was incorrect because the time taken to complete the completion was too
great. I fixed this by rewriting the master IRQ handler so that it now
manages the entire transaction only returning control to the requesting
thread once the entire transaction is complete.
  - Rewrote the aspeed_i2c_master_irq handler because the old method of
completing a completion in between restarts was too slow causing devices to
misbehave.
  - Added support for I2C_M_RECV_LEN which I had incorrectly said was supported
before.
  - Addressed other comments from Vladimir.

Changes have been tested on the Aspeed 2500 evaluation board, as before, and now
on a real platform with an Aspeed 2520.
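
The handler rework described in the changelog above (the IRQ handler drives the whole transaction and wakes the requesting thread only at the end) can be illustrated with a small userspace model of a write transfer. This is a sketch under stated assumptions, not the driver's code: the `bus_cmd` values, `struct tx_xfer` and `on_tx_ack()` are hypothetical names, and setting `done` stands in for completing a kernel completion. As a simplification, `done` is set when the stop command is issued; a real driver would presumably wait for the stop interrupt first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical commands the handler may issue; not the driver's names. */
enum bus_cmd { CMD_NONE, CMD_TX_BYTE, CMD_STOP };

struct tx_xfer {
	const uint8_t *buf;	/* bytes to be written */
	size_t len;		/* total number of bytes */
	size_t pos;		/* index of the next byte to send */
	bool done;		/* set only once the whole transfer is over */
};

/*
 * Called once per "TX ACK" event. The next command is chosen here, in
 * the handler, so no round trip to the waiting thread happens between
 * bytes. The thread is released (done = true) only after the stop
 * command has been chosen.
 */
static enum bus_cmd on_tx_ack(struct tx_xfer *x, uint8_t *next_byte)
{
	assert(x != NULL && next_byte != NULL);

	if (x->done)
		return CMD_NONE;

	if (x->pos < x->len) {
		*next_byte = x->buf[x->pos++];
		return CMD_TX_BYTE;
	}

	x->done = true;		/* stands in for complete() */
	return CMD_STOP;
}
```

With a two-byte buffer, three successive calls would yield `CMD_TX_BYTE`, `CMD_TX_BYTE`, then `CMD_STOP`, and only the last call sets `done`. The older scheme described in the changelog would instead have returned to the thread before the stop or repeated start was issued, which is where the extra latency came from.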

Re: [PATCH v2] arch/sparc: Avoid DCTI Couples

2017-03-27 Thread David Miller

From: Babu Moger 
Date: Fri, 17 Mar 2017 14:52:21 -0600

> Avoid un-intended DCTI Couples. Use of DCTI couples is deprecated.
> Also address the "Programming Note" for optimal performance.
> 
> Here is the complete text from Oracle SPARC Architecture Specs.
> 
> 6.3.4.7 DCTI Couples
> "A delayed control transfer instruction (DCTI) in the delay slot of
> another DCTI is referred to as a “DCTI couple”. The use of DCTI couples
> is deprecated in the Oracle SPARC Architecture; no new software should
> place a DCTI in the delay slot of another DCTI, because on future Oracle
> SPARC Architecture implementations DCTI couples may execute either
> slowly or differently than the programmer assumes it will.
> 
> SPARC V8 and SPARC V9 Compatibility Note
> The SPARC V8 architecture left behavior undefined for a DCTI couple. The
> SPARC V9 architecture defined behavior in that case, but as of
> UltraSPARC Architecture 2005, use of DCTI couples was deprecated.
> Software should not expect high performance from DCTI couples, and
> performance of DCTI couples should be expected to decline further in
> future processors.
> 
> Programming Note
> As noted in TABLE 6-5 on page 115, an annulled branch-always
> (branch-always with a = 1) instruction is not architecturally a DCTI.
> However, since not all implementations make that distinction, for
> optimal performance, a DCTI should not be placed in the instruction word
> immediately following an annulled branch-always instruction (BA,A or
> BPA,A)."
> 
> Signed-off-by: Babu Moger 
> Reviewed-by: Rob Gardner 

Applied, thanks.

Re: [RFC PATCH 3/3] of: fix node traversing in of_dma_get_range

2017-03-27 Thread Oza Oza

Please find my comments inline.

On Mon, Mar 27, 2017 at 8:15 PM, Robin Murphy  wrote:
> Hi Rob,
>
> On 27/03/17 15:34, Rob Herring wrote:
>> On Sat, Mar 25, 2017 at 12:31 AM, Oza Pawandeep  wrote:
>>> it jumps to the parent node without examining the child node.
>>> also with that, it throws "no dma-ranges found for node"
>>> for pci dma-ranges.
>>>
>>> this patch fixes device node traversing for dma-ranges.
>>
>> What's the DT look like that doesn't work?
>
> The problem is the bodge in pci_dma_configure() where we don't have an
> OF node for the actual device itself, so pass in the host bridge's OF
> node instead. This happens to work well enough for dma-coherent, but I
> don't think dma-ranges was even considered at the time.
>
> As it happens I'm currently halfway through writing an experiment
> wherein pci_dma_configure() creates a temporary child node for the
> of_dma_configure() call if no other suitable alternative (e.g. some
> intermediate bridge node) exists. How hard are you likely to NAK that
> approach? ;)
>
>> dma-ranges is supposed to be a bus property, not a device's property.
>> So looking at the parent is correct behavior generally.
>
> Indeed, this patch as-is will break currently correct DTs (because we
> won't find dma-ranges on the device, so will bail before even looking at
> the parent as we should).

The current parsing of dma-ranges assumes that dma-ranges is always
found in the parent node.

Based on that, my thinking is as follows; there are a couple of options:

1)
Instead of while (1), use a meaningful loop condition such as while (node).
The following bail-out is then not required:
  if (!ranges)
   break;

2)
Add a check based on a DT property to identify PCI, and handle the
dma-ranges of the root bridge.

But again, these changes do not solve the entire problem of choosing the
right dma_mask, nor do they correctly address root bridge PCI dma-ranges.

Hence I have written:
https://lkml.org/lkml/2017/3/27/540

My final take is that this function does not need to change; let it parse
memory-mapped dma-ranges as it does now.

I am more inclined to have generic PCI dma-ranges parsing, which the
following already addresses:
https://lkml.org/lkml/2017/3/27/540

Regards,
Oza.
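
The traversal discussed in this thread can be sketched outside the kernel as follows. This is a toy model only: `struct toy_node` and `toy_find_dma_ranges()` are hypothetical stand-ins, not the real `struct device_node` or `of_dma_get_range()`. It illustrates the behaviour described above, where the lookup starts at the parent of the node passed in and stops at the first ancestor that either has a non-empty dma-ranges or has no dma-ranges at all.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device tree node; not the kernel's struct device_node. */
struct toy_node {
	struct toy_node *parent;
	int has_dma_ranges;	/* is the property present on this node? */
	int dma_ranges_len;	/* 0 means an empty property (no translation here) */
};

/*
 * Walk upwards starting from the parent of @node. Return the ancestor that
 * carries a non-empty dma-ranges, or NULL if the walk reaches the root or an
 * ancestor without the property.
 */
struct toy_node *toy_find_dma_ranges(struct toy_node *node)
{
	while (node) {
		node = node->parent;	/* the starting node itself is never examined */
		if (!node)
			break;
		if (!node->has_dma_ranges)
			return NULL;	/* models the "if (!ranges) break;" bail-out */
		if (node->dma_ranges_len > 0)
			return node;
		/* empty dma-ranges: keep climbing */
	}
	return NULL;
}
```

If a host bridge node that carries dma-ranges itself is passed in directly (the pci_dma_configure() case Robin describes), this walk skips it and only looks at its ancestors, which is the failure reported in the original patch description.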

Re: [Patch v2 10/11] s5p-mfc: Add support for HEVC encoder

2017-03-27 Thread Smitha T Murthy

On Mon, 2017-03-27 at 14:09 +0200, Andrzej Hajda wrote:
> Hi Smitha,
> 
> Sorry for late reply, it seems I have missed this email.
> 
> 
> On 14.03.2017 12:41, Smitha T Murthy wrote:
> > On Tue, 2017-03-07 at 12:33 +0100, Andrzej Hajda wrote: 
> >> On 03.03.2017 10:07, Smitha T Murthy wrote:
> >>> Add HEVC encoder support and necessary registers, V4L2 CIDs,
> >>> and hevc encoder parameters
> >>>
> >>> Signed-off-by: Smitha T Murthy 
> >>> ---
> >>>  drivers/media/platform/s5p-mfc/regs-mfc-v10.h   |   28 +-
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc.c        |    1 +
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c |    3 +
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_common.h |   55 ++-
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_enc.c    |  595 +++
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_opr.h    |    8 +
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c |  200 
> >>>  drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.h |    8 +
> >>>
> >>> diff --git a/drivers/media/platform/s5p-mfc/regs-mfc-v10.h 
> >>> b/drivers/media/platform/s5p-mfc/regs-mfc-v10.h
> >>> index 846dcf5..caf02ff 100644
> >>> --- a/drivers/media/platform/s5p-mfc/regs-mfc-v10.h
> >>> +++ b/drivers/media/platform/s5p-mfc/regs-mfc-v10.h
> >>> @@ -20,13 +20,35 @@
> >>>  #define S5P_FIMV_MFC_STATE_V10   0x7124
> >>>  #define S5P_FIMV_D_STATIC_BUFFER_ADDR_V10  0xF570
> >>>  #define S5P_FIMV_D_STATIC_BUFFER_SIZE_V10  0xF574
> >>> +#define S5P_FIMV_E_NUM_T_LAYER_V10   0xFBAC
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER0_V10  0xFBB0
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER1_V10  0xFBB4
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER2_V10  0xFBB8
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER3_V10  0xFBBC
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER4_V10  0xFBC0
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER5_V10  0xFBC4
> >>> +#define S5P_FIMV_E_HIERARCHICAL_QP_LAYER6_V10  0xFBC8
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER0_V10  0xFD18
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER1_V10  0xFD1C
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER2_V10  0xFD20
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER3_V10  0xFD24
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER4_V10  0xFD28
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER5_V10  0xFD2C
> >>> +#define S5P_FIMV_E_HIERARCHICAL_BIT_RATE_LAYER6_V10  0xFD30
> >>> +#define S5P_FIMV_E_HEVC_OPTIONS_V10  0xFDD4
> >>> +#define S5P_FIMV_E_HEVC_REFRESH_PERIOD_V10   0xFDD8
> >>> +#define S5P_FIMV_E_HEVC_CHROMA_QP_OFFSET_V10 0xFDDC
> >>> +#define S5P_FIMV_E_HEVC_LF_BETA_OFFSET_DIV2_V10  0xFDE0
> >>> +#define S5P_FIMV_E_HEVC_LF_TC_OFFSET_DIV2_V10  0xFDE4
> >>> +#define S5P_FIMV_E_HEVC_NAL_CONTROL_V10  0xFDE8
> >>>  
> >>>  /* MFCv10 Context buffer sizes */
> >>>  #define MFC_CTX_BUF_SIZE_V10 (30 * SZ_1K) /* 30KB */
> >>>  #define MFC_H264_DEC_CTX_BUF_SIZE_V10 (2 * SZ_1M) /* 2MB */
> >>>  #define MFC_OTHER_DEC_CTX_BUF_SIZE_V10   (20 * SZ_1K) /* 20KB */
> >>>  #define MFC_H264_ENC_CTX_BUF_SIZE_V10 (100 * SZ_1K)   /* 100KB */
> >>> -#define MFC_OTHER_ENC_CTX_BUF_SIZE_V10   (15 * SZ_1K) /* 15KB */
> >>> +#define MFC_HEVC_ENC_CTX_BUF_SIZE_V10 (30 * SZ_1K) /* 30KB */
> >>> +#define MFC_OTHER_ENC_CTX_BUF_SIZE_V10  (15 * SZ_1K) /* 15KB */
> >>>  
> >>>  /* MFCv10 variant defines */
> >>>  #define MAX_FW_SIZE_V10  (SZ_1M) /* 1MB */
> >>> @@ -58,5 +80,9 @@
> >>>  #define ENC_V100_VP8_ME_SIZE(x, y) \
> >>>   ENC_V100_BASE_SIZE(x, y)
> >>>  
> >>> +#define ENC_V100_HEVC_ME_SIZE(x, y)  \
> >>> + (((x + 3) * (y + 3) * 32)   \
> >>> +  + ((y * 128) + 1280) * DIV_ROUND_UP(x, 4))
> >>> +
> >>>  #endif /*_REGS_MFC_V10_H*/
> >>>  
> >>> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
> >>> b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> >>> index b014038..b01c556 100644
> >>> --- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
> >>> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
> >>> @@ -1549,6 +1549,7 @@ static int s5p_mfc_resume(struct device *dev)
> >>>   .h264_dec_ctx   = MFC_H264_DEC_CTX_BUF_SIZE_V10,
> >>>   .other_dec_ctx  = MFC_OTHER_DEC_CTX_BUF_SIZE_V10,
> >>>   .h264_enc_ctx   = MFC_H264_ENC_CTX_BUF_SIZE_V10,
> >>> + .hevc_enc_ctx   = MFC_HEVC_ENC_CTX_BUF_SIZE_V10,
> >>>   .other_enc_ctx  = MFC_OTHER_ENC_CTX_BUF_SIZE_V10,
> >>>  };
> >>>  
> >>> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c 
> >>> b/drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c
> >>> index 102b47e..7521fce 100644
> >>> --- a/drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c
> >>> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c
> >>> @@
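
For readers who want to see what the ENC_V100_HEVC_ME_SIZE(x, y) macro quoted above evaluates to, here is a minimal stand-alone sketch of the same arithmetic. The function names `toy_div_round_up()` and `toy_hevc_me_size()` are hypothetical; the thread does not say what x and y denote, so the sample values below are arbitrary. A function is used instead of a macro so that arguments such as `w + 1` are evaluated as a unit.

```c
#include <stddef.h>

/* Round-up integer division, the same idea as the kernel's DIV_ROUND_UP(). */
size_t toy_div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/*
 * Same expression as the quoted macro:
 *   ((x + 3) * (y + 3) * 32) + ((y * 128) + 1280) * DIV_ROUND_UP(x, 4)
 * Multiplication binds tighter than addition, so the second term is the
 * product of (y * 128 + 1280) and the rounded-up quotient.
 */
size_t toy_hevc_me_size(size_t x, size_t y)
{
	return ((x + 3) * (y + 3) * 32)
		+ ((y * 128) + 1280) * toy_div_round_up(x, 4);
}
```

With the arbitrary inputs x = 120 and y = 68, the first term is 123 * 71 * 32 = 279456 and the second is 9984 * 30 = 299520, for a total of 578976.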

[PATCH] usb: mtu3: Replace the extcon API

2017-03-27 Thread Chanwoo Choi

This patch uses the resource-managed variant of extcon_register_notifier()
and replaces the deprecated extcon API as follows:
- extcon_get_cable_state_() -> extcon_get_state()

Cc: Greg Kroah-Hartman 
Cc: Chunfeng Yun 
Cc: linux-...@vger.kernel.org
Signed-off-by: Chanwoo Choi 
---
 drivers/usb/mtu3/mtu3_dr.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/mtu3/mtu3_dr.c b/drivers/usb/mtu3/mtu3_dr.c
index 1a8987e7c5b0..11a0d3b84c5e 100644
--- a/drivers/usb/mtu3/mtu3_dr.c
+++ b/drivers/usb/mtu3/mtu3_dr.c
@@ -223,25 +223,25 @@ static int ssusb_extcon_register(struct otg_switch_mtk *otg_sx)
return 0;
 
otg_sx->vbus_nb.notifier_call = ssusb_vbus_notifier;
-   ret = extcon_register_notifier(edev, EXTCON_USB,
+   ret = devm_extcon_register_notifier(ssusb->dev, edev, EXTCON_USB,
&otg_sx->vbus_nb);
if (ret < 0)
dev_err(ssusb->dev, "failed to register notifier for USB\n");
 
otg_sx->id_nb.notifier_call = ssusb_id_notifier;
-   ret = extcon_register_notifier(edev, EXTCON_USB_HOST,
+   ret = devm_extcon_register_notifier(ssusb->dev, edev, EXTCON_USB_HOST,
&otg_sx->id_nb);
if (ret < 0)
dev_err(ssusb->dev, "failed to register notifier for USB-HOST\n");
 
dev_dbg(ssusb->dev, "EXTCON_USB: %d, EXTCON_USB_HOST: %d\n",
-   extcon_get_cable_state_(edev, EXTCON_USB),
-   extcon_get_cable_state_(edev, EXTCON_USB_HOST));
+   extcon_get_state(edev, EXTCON_USB),
+   extcon_get_state(edev, EXTCON_USB_HOST));
 
/* default as host, switch to device mode if needed */
-   if (extcon_get_cable_state_(edev, EXTCON_USB_HOST) == false)
+   if (extcon_get_state(edev, EXTCON_USB_HOST) == false)
ssusb_set_mailbox(otg_sx, MTU3_ID_FLOAT);
-   if (extcon_get_cable_state_(edev, EXTCON_USB) == true)
+   if (extcon_get_state(edev, EXTCON_USB) == true)
ssusb_set_mailbox(otg_sx, MTU3_VBUS_VALID);
 
return 0;
@@ -367,13 +367,6 @@ void ssusb_otg_switch_exit(struct ssusb_mtk *ssusb)
 
cancel_delayed_work(&otg_sx->extcon_reg_dwork);
 
-   if (otg_sx->edev) {
-   extcon_unregister_notifier(otg_sx->edev,
-   EXTCON_USB, &otg_sx->vbus_nb);
-   extcon_unregister_notifier(otg_sx->edev,
-   EXTCON_USB_HOST, &otg_sx->id_nb);
-   }
-
-
if (otg_sx->manual_drd_enabled)
ssusb_debugfs_exit(ssusb);
 }
-- 
1.9.1
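
The patch above can drop the explicit extcon_unregister_notifier() calls because a resource-managed registration ties the cleanup to the lifetime of the device. The following is a toy user-space model of that idea only. Every name in it (`toy_device`, `toy_devm_add_action()`, `toy_devm_register_notifier()`, and so on) is hypothetical and none of it is the kernel's devres or extcon implementation.

```c
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_release_fn)(void *data);

/* Hypothetical device that remembers cleanup actions for later. */
struct toy_device {
	toy_release_fn release[TOY_MAX_ACTIONS];
	void *data[TOY_MAX_ACTIONS];
	size_t count;
};

/* Hypothetical notifier with a flag showing whether it is registered. */
struct toy_notifier {
	int registered;
};

/* Record a cleanup action; return 0 on success, -1 if the table is full. */
int toy_devm_add_action(struct toy_device *dev, toy_release_fn fn, void *data)
{
	if (dev->count >= TOY_MAX_ACTIONS)
		return -1;
	dev->release[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Run the recorded actions, newest first, when the device goes away. */
void toy_device_release_all(struct toy_device *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->data[dev->count]);
	}
}

/* Cleanup callback: mark the notifier as no longer registered. */
void toy_unregister(void *data)
{
	((struct toy_notifier *)data)->registered = 0;
}

/* "Managed" registration: register now and queue the matching unregister. */
int toy_devm_register_notifier(struct toy_device *dev, struct toy_notifier *nb)
{
	nb->registered = 1;
	if (toy_devm_add_action(dev, toy_unregister, nb) < 0) {
		nb->registered = 0;	/* undo on failure */
		return -1;
	}
	return 0;
}
```

In this model the caller registers two notifiers and never unregisters them by hand; a single call to `toy_device_release_all()` undoes both, which mirrors why the exit path in the patch becomes shorter.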

[PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-27 Thread Fam Zheng

If a device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
end up using BLK_DEF_MAX_SECTORS, which is wrong: reads and writes of that
size may fail.

Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits")
Signed-off-by: Fam Zheng 

---

v2: Fix granularity mismatch. [Martin]
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc..a5c7e67 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
} else
rw_max = BLK_DEF_MAX_SECTORS;
+   rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max));
 
/* Combine with controller limits */
q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
-- 
2.9.3
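
The effect of the added line can be sketched in plain C. This is an illustration under stated assumptions, not the driver code: `toy_min_not_zero()`, `toy_logical_to_sectors()` and `toy_rw_max()` are hypothetical helpers, `opt_usable` stands in for the result of the driver's validation of the optimal transfer size, and the default of 2560 sectors is an assumed value for BLK_DEF_MAX_SECTORS.

```c
#include <stdint.h>

/* Assumed default for illustration; check the kernel version in use. */
#define TOY_DEF_MAX_SECTORS 2560u

/* Smaller of two values, ignoring an operand that is zero. */
uint32_t toy_min_not_zero(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Convert logical blocks to 512-byte sectors for a given block size. */
uint32_t toy_logical_to_sectors(uint32_t block_size, uint32_t blocks)
{
	return blocks * (block_size / 512u);
}

/*
 * Pick the read/write limit in 512-byte sectors. The last step models the
 * line added by the patch: cap the result by the device maximum, unless the
 * device reported no maximum (zero).
 */
uint32_t toy_rw_max(uint32_t block_size, uint32_t opt_blocks, int opt_usable,
		    uint32_t dev_max_blocks)
{
	uint32_t rw_max;

	if (opt_usable)
		rw_max = toy_logical_to_sectors(block_size, opt_blocks);
	else
		rw_max = TOY_DEF_MAX_SECTORS;

	return toy_min_not_zero(rw_max,
				toy_logical_to_sectors(block_size, dev_max_blocks));
}
```

With 512-byte blocks, an unusable optimal size and a device maximum of 1024 blocks, the sketch returns 1024 instead of the default 2560. The conversion to sectors before the comparison corresponds to the granularity fix noted for v2.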
