Re: WARNING in task_participate_group_stop

2017-10-31 Thread Dmitry Vyukov
On Tue, Oct 31, 2017 at 7:34 PM, Oleg Nesterov  wrote:
> On 10/30, Dmitry Vyukov wrote:
>>
>> On Mon, Oct 30, 2017 at 10:12 PM, syzbot
>> 
>> wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > d95e159cd1da1ed4dbf76bf203e8ffaf231395e4
>> > git://git.cmpxchg.org/linux-mmots.git/master
>> > compiler: gcc (GCC) 7.1.1 20170620
>> > .config is attached
>> > Raw console output is attached.
>> > C reproducer is attached
>
> Hmm. I do not see reproducer in this email...

Ah, sorry. You can see the full thread with attachments here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/EUmYZU4m5gU


>> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> > for information about syzkaller reproducers
>>
>> This also happens on more recent commits, including linux-next
>> 36ef71cae353f88fd6e095e2aaa3e5953af1685d (Oct 19) and upstream
>> 3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (Oct 18).
>>
>> > WARNING: CPU: 0 PID: 1 at kernel/signal.c:340
>> > task_participate_group_stop+0x1ce/0x230 kernel/signal.c:340
>> > Kernel panic - not syncing: panic_on_warn set ...
>> >
>> > CPU: 0 PID: 1 Comm: init Not tainted 4.13.0-mm1+ #5
>
> Looks familiar... I need some time to recall the details, will try to send
> the fix(es) this week.
>
> So this is init process with SIGNAL_UNKILLABLE flag set. And I hope it has
> the pending SIGKILL, otherwise there is something else.
>
> IIRC the problem is that complete_signal(SIGKILL) does nothing if
> SIGNAL_UNKILLABLE is set, in particular it doesn't set SIGNAL_GROUP_EXIT.
> This fools the signal_group_exit() check in do_signal_stop().
>
> Actually there are more problems with SIGNAL_UNKILLABLE && SIGKILL, we need
> some nasty cleanups.
>
> Oleg.
>
>
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Call Trace:
>> >  __dump_stack lib/dump_stack.c:16 [inline]
>> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
>> >  panic+0x1e4/0x417 kernel/panic.c:181
>> >  __warn+0x1c4/0x1d9 kernel/panic.c:542
>> >  report_bug+0x211/0x2d0 lib/bug.c:183
>> >  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>> >  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>> >  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>> >  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>> >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>> >  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> > RIP: 0010:task_participate_group_stop+0x1ce/0x230 kernel/signal.c:340
>> > RSP: 0018:8801d9ee77f0 EFLAGS: 00010097
>> > RAX: 8801d9ed8040 RBX: 8801d9ed8040 RCX: 8801d9edb2c0
>> > RDX:  RSI: 00060013 RDI: 8801d9ed84d0
>> > RBP: 8801d9ee7808 R08: 8801d9ee7180 R09: 8801d9ee7178
>> > R10: 8801d9ee70f0 R11: 11003b3db29b R12: 8801d9ee9740
>> > R13:  R14: dc00 R15: 8801d9ed85c8
>> >  do_signal_stop+0x217/0x900 kernel/signal.c:2042
>> >  get_signal+0x61c/0x17e0 kernel/signal.c:2297
>> >  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
>> >  exit_to_usermode_loop+0x224/0x300 arch/x86/entry/common.c:158
>> >  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>> >  syscall_return_slowpath+0x42f/0x500 arch/x86/entry/common.c:266
>> >  entry_SYSCALL_64_fastpath+0xbc/0xbe
>> > RIP: 0033:0x7f33f723fdd3
>> > RSP: 002b:7fffb5303398 EFLAGS: 0246 ORIG_RAX: 0017
>> > RAX: fdfe RBX: 7fffb5303540 RCX: 7f33f723fdd3
>> > RDX:  RSI: 7fffb53036f0 RDI: 000b
>> > RBP: 7fffb53036f0 R08: 7fffb5303770 R09: 0001
>> > R10:  R11: 0246 R12: 
>> > R13: 7fffb5303ad0 R14:  R15: 
>> >
>> >
>> > ---
>> > This bug is generated by a dumb bot. It may contain errors.
>> > See https://goo.gl/tpsmEJ for details.
>> > Direct all questions to syzkal...@googlegroups.com.
>> >
>> > syzbot will keep track of this bug report.
>> > Once a fix for this bug is committed, please reply to this email with:
>> > #syz fix: exact-commit-title
>> > To mark this as a duplicate of another syzbot report, please reply with:
>> > #syz dup: exact-subject-of-another-report
>> > If it's a one-off invalid bug report, please reply with:
>> > #syz invalid
>> > Note: if the crash happens again, it will cause creation of a new bug
>> > report.
>> >
>
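
For readers following along, the step Oleg recalls above can be modelled in a
few lines of user-space C. This is only a sketch under assumptions: the MODEL_*
flag values and the model_* helpers are hypothetical stand-ins, not the
definitions in kernel/signal.c, and the model covers just the one step he
describes (SIGKILL leaving SIGNAL_GROUP_EXIT unset when SIGNAL_UNKILLABLE is
set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's signal flags; the bit values
 * are made up for this model. */
#define MODEL_SIGNAL_GROUP_EXIT 0x1u
#define MODEL_SIGNAL_UNKILLABLE 0x2u

struct model_signal {
	unsigned int flags;
};

/* Model of delivering SIGKILL: when the task is unkillable (or already
 * exiting) nothing changes, so GROUP_EXIT is never set for it. */
static void model_complete_sigkill(struct model_signal *sig)
{
	assert(sig != NULL);
	if (sig->flags & (MODEL_SIGNAL_UNKILLABLE | MODEL_SIGNAL_GROUP_EXIT))
		return;
	sig->flags |= MODEL_SIGNAL_GROUP_EXIT;
}

/* Model of the signal_group_exit() test. */
static bool model_signal_group_exit(const struct model_signal *sig)
{
	return (sig->flags & MODEL_SIGNAL_GROUP_EXIT) != 0;
}

/* Model of the check in the stop path: the stop is only abandoned when
 * a group exit is visible. */
static bool model_stop_check_passes(const struct model_signal *sig)
{
	return !model_signal_group_exit(sig);
}
```

In this model, a task whose flags contain MODEL_SIGNAL_UNKILLABLE still passes
model_stop_check_passes() after model_complete_sigkill(), while an ordinary
task does not. That corresponds to the "fooled" check Oleg mentions; the real
code paths involve more state than is shown here.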


[PATCH 2/2] perf record: Replace 'overwrite' by 'flightrecorder' for better naming

2017-10-31 Thread Wang Nan
The meaning of perf record's "overwrite" option, and of the many uses of
"overwrite" in the source code, is not clear. In perf's code, 'overwrite'
has 2 meanings:
 1. Make ringbuffer readonly (perf_evlist__mmap_ex's argument).
 2. Set evsel's "backward" attribute (in apply_config_terms).

perf record doesn't use meaning 1 at all, but it has an overwrite option
whose real meaning is setting backward.

This patch separates these two concepts and introduces a 'flightrecorder'
mode, which is what we really want. It combines the 2 concepts and wraps
them into a record mode. In flight recorder mode, perf only dumps the data
recorded before something happens.
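
To illustrate the flight recorder idea in isolation, here is a minimal
user-space sketch of an overwriting ring buffer. It is not perf's
implementation: struct flight_ring, fr_push(), fr_snapshot() and FR_CAPACITY
are hypothetical names, and the real perf buffer is a kernel-written mmap area
rather than an int array.

```c
#include <assert.h>
#include <stddef.h>

#define FR_CAPACITY 4 /* hypothetical size, tiny on purpose */

struct flight_ring {
	int slots[FR_CAPACITY];
	size_t next;  /* slot the next record will land in */
	size_t count; /* number of valid records, at most FR_CAPACITY */
};

/* Store a record; once the ring is full this overwrites the oldest one. */
static void fr_push(struct flight_ring *r, int record)
{
	assert(r != NULL);
	r->slots[r->next] = record;
	r->next = (r->next + 1) % FR_CAPACITY;
	if (r->count < FR_CAPACITY)
		r->count++;
}

/* Copy the surviving records, oldest first, into out[] (which must hold
 * FR_CAPACITY entries); returns how many records were copied. */
static size_t fr_snapshot(const struct flight_ring *r, int *out)
{
	size_t start = (r->next + FR_CAPACITY - r->count) % FR_CAPACITY;

	for (size_t i = 0; i < r->count; i++)
		out[i] = r->slots[(start + i) % FR_CAPACITY];
	return r->count;
}
```

Pushing more records than FR_CAPACITY and then calling fr_snapshot() yields
only the most recent FR_CAPACITY records, which is the behaviour the option
text describes: older records never make it into the dump.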

Signed-off-by: Wang Nan 
---
 tools/perf/Documentation/perf-record.txt |  8 
 tools/perf/builtin-record.c  |  4 ++--
 tools/perf/perf.h|  2 +-
 tools/perf/util/evsel.c  |  6 +++---
 tools/perf/util/evsel.h  |  4 ++--
 tools/perf/util/parse-events.c   | 20 ++--
 tools/perf/util/parse-events.h   |  4 ++--
 tools/perf/util/parse-events.l   |  4 ++--
 8 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index 5a626ef..463c2d3 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -467,19 +467,19 @@ the beginning of record, collect them during finalizing 
an output file.
 The collected non-sample events reflects the status of the system when
 record is finished.
 
---overwrite::
+--flight-recorder::
 Makes all events use an overwritable ring buffer. An overwritable ring
 buffer works like a flight recorder: when it gets full, the kernel will
 overwrite the oldest records, that thus will never make it to the
 perf.data file.
 
-When '--overwrite' and '--switch-output' are used perf records and drops
+When '--flight-recorder' and '--switch-output' are used perf records and drops
 events until it receives a signal, meaning that something unusual was
 detected that warrants taking a snapshot of the most current events,
 those fitting in the ring buffer at that moment.
 
-'overwrite' attribute can also be set or canceled for an event using
-config terms. For example: 'cycles/overwrite/' and 
'instructions/no-overwrite/'.
+'flightrecorder' attribute can also be set or canceled separately for an event 
using
+config terms. For example: 'cycles/flightrecorder/' and 
'instructions/no-flightrecorder/'.
 
 Implies --tail-synthesize.
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc5..315ea09 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1489,7 +1489,7 @@ static struct option __record_options[] = {
"child tasks do not inherit counters"),
OPT_BOOLEAN(0, "tail-synthesize", _synthesize,
"synthesize non-sample events at the end of output"),
-   OPT_BOOLEAN(0, "overwrite", , "use overwrite 
mode"),
+   OPT_BOOLEAN(0, "flight-recorder", _recorder, "use 
flight recorder mode"),
OPT_UINTEGER('F', "freq", _freq, "profile at this 
frequency"),
OPT_CALLBACK('m', "mmap-pages", , "pages[,pages]",
 "number of mmap data pages and AUX area tracing mmap 
pages",
@@ -1733,7 +1733,7 @@ int cmd_record(int argc, const char **argv)
}
}
 
-   if (record.opts.overwrite)
+   if (record.opts.flight_recorder)
record.opts.tail_synthesize = true;
 
if (rec->evlist->nr_entries == 0 &&
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index fbb0a9c..a7f7618 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -57,7 +57,7 @@ struct record_opts {
bool all_kernel;
bool all_user;
bool tail_synthesize;
-   bool overwrite;
+   bool flight_recorder;
bool ignore_missing_thread;
unsigned int freq;
unsigned int mmap_pages;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f894893..0e1e8e8 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -772,8 +772,8 @@ static void apply_config_terms(struct perf_evsel *evsel,
 */
attr->inherit = term->val.inherit ? 1 : 0;
break;
-   case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
-   attr->write_backward = term->val.overwrite ? 1 : 0;
+   case PERF_EVSEL__CONFIG_TERM_FLIGHTRECORDER:
+   attr->write_backward = term->val.flightrecorder ? 1 : 0;
break;
default:
break;
@@ -856,7 +856,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct 
record_opts *opts,
 
attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
attr->inherit   = !opts->no_inherit;
- 

[PATCH 0/2] perf record: Fix --overwrite and clarify concepts

2017-10-31 Thread Wang Nan
Kan reports that 'perf record --overwrite' is not working as it should.

Patch 1/2 fixes a bug: it maps backward events to a readonly ring buffer so
the kernel can overwrite that ring buffer.

Patch 2/2 clarifies the concepts of 'overwrite' and 'backward' in the source
code by introducing the concept of 'flightrecorder' and converting many uses
of 'overwrite' to it, to make clear that what we really want is a perf record
flightrecorder mode, not only mapping the ring buffer overwritable.
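
As a rough picture of how the two concepts relate, the sketch below treats
flight recorder mode as a single switch that implies both of them. The names
struct ring_setup and setup_for_mode() are hypothetical and do not exist in
tools/perf; this only restates the intent of the series, it is not the code in
either patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the two meanings that 'overwrite' used to mix. */
struct ring_setup {
	bool map_readonly;   /* ring buffer mapped without write access */
	bool write_backward; /* evsel "backward" attribute */
};

/* Flight recorder mode asks for both properties together; normal
 * recording asks for neither. */
static struct ring_setup setup_for_mode(bool flight_recorder)
{
	struct ring_setup s = {
		.map_readonly = flight_recorder,
		.write_backward = flight_recorder,
	};
	return s;
}
```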

Wang Nan (2):
  perf mmap: Fix perf backward recording
  perf record: Replace 'overwrite' by 'flightrecorder' for better naming

 tools/perf/Documentation/perf-record.txt |  8 
 tools/perf/builtin-record.c  |  4 ++--
 tools/perf/perf.h|  2 +-
 tools/perf/util/evlist.c |  8 +++-
 tools/perf/util/evsel.c  |  6 +++---
 tools/perf/util/evsel.h  |  4 ++--
 tools/perf/util/parse-events.c   | 20 ++--
 tools/perf/util/parse-events.h   |  4 ++--
 tools/perf/util/parse-events.l   |  4 ++--
 9 files changed, 33 insertions(+), 27 deletions(-)

-- 
2.10.1



[PATCH 1/2] perf mmap: Fix perf backward recording

2017-10-31 Thread Wang Nan
perf record backward recording doesn't work as we expected: it never
overwrites when the ring buffer is full.

Test:

(Run a busy-printing Python task in the background like this:

 while True:
     print(123)

and send SIGUSR2 to perf to capture a snapshot.)

 # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit 
--exclude-perf -a --switch-output
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101520743 ]
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101521251 ]
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101521692 ]
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Dump perf.data.2017110101521936 ]
 [ perf record: Captured and wrote 0.826 MB perf.data. ]

 # ./perf script -i ./perf.data.2017110101520743 | head -n3
 perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 
2400, 0, 59, 100, 0)
 perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 
(4112340, 2, , 3df, 100, 0)
   python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4
 # ./perf script -i ./perf.data.2017110101521251 | head -n3
 perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 
2400, 0, 59, 100, 0)
 perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 
(4112340, 2, , 3df, 100, 0)
   python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4
 # ./perf script -i ./perf.data.2017110101521692 | head -n3
 perf  2717 [000] 12449.310785: raw_syscalls:sys_enter: NR 16 (5, 
2400, 0, 59, 100, 0)
 perf  2717 [000] 12449.310790: raw_syscalls:sys_enter: NR 7 
(4112340, 2, , 3df, 100, 0)
   python  2545 [000] 12449.310800:  raw_syscalls:sys_exit: NR 1 = 4

The timestamps never change, even though my background task is an infinite
loop that can easily overwhelm the ring buffer.

This patch fixes it by forcibly unsetting PROT_WRITE for backward ring
buffers, so every backward ring buffer becomes an overwrite ring buffer.
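
The idea can be sketched in user-space C as follows. This is an illustration
under assumptions, not the change made to evlist.c: ring_buffer_prot() is a
hypothetical helper, and only PROT_READ and PROT_WRITE from <sys/mman.h> are
real names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical helper: pick the mmap protection for a ring buffer.
 * A backward buffer always loses PROT_WRITE, whatever was requested;
 * a forward buffer keeps the requested protection unchanged. */
static int ring_buffer_prot(bool backward, int requested_prot)
{
	/* A ring buffer that cannot be read would be useless. */
	assert((requested_prot & PROT_READ) != 0);

	if (backward)
		return requested_prot & ~PROT_WRITE;
	return requested_prot;
}
```

For example, ring_buffer_prot(true, PROT_READ | PROT_WRITE) evaluates to
PROT_READ, while the same request for a forward buffer is returned unchanged.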

Test result:

 # ./perf record --overwrite -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit 
--exclude-perf -a --switch-output
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101285323 ]
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101290053 ]
 [ perf record: dump data: Woken up 1 times ]
 [ perf record: Dump perf.data.2017110101290446 ]
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Dump perf.data.2017110101290837 ]
 [ perf record: Captured and wrote 0.826 MB perf.data. ]
 # ./perf script -i ./perf.data.2017110101285323 | head -n3
   python  2545 [000] 11064.268083:  raw_syscalls:sys_exit: NR 1 = 4
   python  2545 [000] 11064.268084: raw_syscalls:sys_enter: NR 1 (1, 
12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
   python  2545 [000] 11064.268086:  raw_syscalls:sys_exit: NR 1 = 4
 # ./perf script -i ./perf.data.2017110101290 | head -n3
 failed to open ./perf.data.2017110101290: No such file or directory
 # ./perf script -i ./perf.data.2017110101290053 | head -n3
   python  2545 [000] 11071.564062: raw_syscalls:sys_enter: NR 1 (1, 
12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
   python  2545 [000] 11071.564064:  raw_syscalls:sys_exit: NR 1 = 4
   python  2545 [000] 11071.564066: raw_syscalls:sys_enter: NR 1 (1, 
12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
 # ./perf script -i ./perf.data.2017110101290 | head -n3
 perf.data.2017110101290053  perf.data.2017110101290446  
perf.data.2017110101290837
 # ./perf script -i ./perf.data.2017110101290446 | head -n3
 sshd  1321 [000] 11075.499473:  raw_syscalls:sys_exit: NR 14 = 0
 sshd  1321 [000] 11075.499474: raw_syscalls:sys_enter: NR 14 (2, 
7ffe98899490, 0, 8, 0, 3000)
 sshd  1321 [000] 11075.499474:  raw_syscalls:sys_exit: NR 14 = 0
 # ./perf script -i ./perf.data.2017110101290837 | head -n3
   python  2545 [000] 11079.280844:  raw_syscalls:sys_exit: NR 1 = 4
   python  2545 [000] 11079.280847: raw_syscalls:sys_enter: NR 1 (1, 
12cc330, 4, 7fc237280370, 7fc2373d0700, 2c7b0)
   python  2545 [000] 11079.280850:  raw_syscalls:sys_exit: NR 1 = 4

Signed-off-by: Wang Nan 
---
 tools/perf/util/evlist.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c6c891e..4c5daba 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -799,22 +799,28 @@ perf_evlist__should_poll(struct perf_evlist *evlist 
__maybe_unused,
 }
 
 static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
-  struct mmap_params *mp, int cpu_idx,
+  struct mmap_params *_mp, int cpu_idx,
   int thread, int 

linux-next: manual merge of the tip tree with the powerpc tree

2017-10-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/powerpc/mm/numa.c

between commit:

  cee5405da402 ("powerpc/hotplug: Improve responsiveness of hotplug change")

from the powerpc tree and commit:

  df7e828c1b69 ("timer: Remove init_timer_deferrable() in favor of timer_setup()")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non-trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/mm/numa.c
index eb604b3574fa,73016451f330..
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@@ -1506,9 -1466,7 +1505,7 @@@ static struct timer_list topology_timer
  
  static void reset_topology_timer(void)
  {
-   topology_timer.data = 0;
-   topology_timer.expires = jiffies + topology_timer_secs * HZ;
-   mod_timer(_timer, topology_timer.expires);
 -  mod_timer(_timer, jiffies + 60 * HZ);
++  mod_timer(_timer, jiffies + topology_timer_secs * HZ);
  }
  
  #ifdef CONFIG_SMP
@@@ -1561,13 -1520,14 +1558,14 @@@ int start_topology_update(void
rc = of_reconfig_notifier_register(_update_nb);
  #endif
}
 -  } else if (firmware_has_feature(FW_FEATURE_VPHN) &&
 +  }
 +  if (firmware_has_feature(FW_FEATURE_VPHN) &&
   lppaca_shared_proc(get_lppaca())) {
if (!vphn_enabled) {
 -  prrn_enabled = 0;
vphn_enabled = 1;
setup_cpu_associativity_change_counters();
-   init_timer_deferrable(_timer);
+   timer_setup(_timer, topology_timer_fn,
+   TIMER_DEFERRABLE);
reset_topology_timer();
}
}


linux-next: manual merge of the tip tree with the arm64 tree

2017-10-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/arm64/Kconfig

between commit:

  396a5d4a5c32 ("arm64: Unconditionally support {ARCH_}HAVE_NMI{_SAFE_CMPXCHG}")

from the arm64 tree and commit:

  087133ac9076 ("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/Kconfig
index 38f8d26208af,6205f521b648..
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@@ -21,8 -21,25 +21,25 @@@ config ARM6
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 -  select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
 +  select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_INLINE_READ_LOCK if !PREEMPT
+   select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
+   select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
+   select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPT
+   select ARCH_INLINE_READ_UNLOCK if !PREEMPT
+   select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPT
+   select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPT
+   select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPT
+   select ARCH_INLINE_WRITE_LOCK if !PREEMPT
+   select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPT
+   select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPT
+   select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPT
+   select ARCH_INLINE_WRITE_UNLOCK if !PREEMPT
+   select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT
+   select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT
+   select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
select ARCH_USE_CMPXCHG_LOCKREF
+   select ARCH_USE_QUEUED_RWLOCKS
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING



Re: [PATCH 1/2] mm:swap: clean up swap readahead

2017-10-31 Thread Minchan Kim
Hi Huang,

On Wed, Nov 01, 2017 at 01:41:00PM +0800, Huang, Ying wrote:
> Hi, Minchan,
> 
> Minchan Kim  writes:
> 
> > When I see recent change of swap readahead, I am very unhappy
> > about current code structure which diverges two swap readahead
> > algorithm in do_swap_page. This patch is to clean it up.
> >
> > Main motivation is that fault handler doesn't need to be aware of
> > readahead algorithms but just should call swapin_readahead.
> >
> > As first step, this patch cleans up a little bit but not perfect
> > (I just separate for review easier) so next patch will make the goal
> > complete.
> >
> > Signed-off-by: Minchan Kim 
> > ---
> >  include/linux/swap.h | 17 ++
> >  mm/memory.c  | 17 +++---
> >  mm/swap_state.c  | 89 
> > 
> >  3 files changed, 55 insertions(+), 68 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 84255b3da7c1..7c7c8b344bc9 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -427,12 +427,8 @@ extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
> > bool *new_page_allocated);
> >  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
> > struct vm_area_struct *vma, unsigned long addr);
> > -
> > -extern struct page *swap_readahead_detect(struct vm_fault *vmf,
> > - struct vma_swap_readahead *swap_ra);
> >  extern struct page *do_swap_page_readahead(swp_entry_t fentry, gfp_t gfp_mask,
> > -  struct vm_fault *vmf,
> > -  struct vma_swap_readahead *swap_ra);
> > +  struct vm_fault *vmf);
> >  
> >  /* linux/mm/swapfile.c */
> >  extern atomic_long_t nr_swap_pages;
> > @@ -551,15 +547,8 @@ static inline bool swap_use_vma_readahead(void)
> > return false;
> >  }
> >  
> > -static inline struct page *swap_readahead_detect(
> > -   struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
> > -{
> > -   return NULL;
> > -}
> > -
> > -static inline struct page *do_swap_page_readahead(
> > -   swp_entry_t fentry, gfp_t gfp_mask,
> > -   struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
> > +static inline struct page *do_swap_page_readahead(swp_entry_t fentry,
> > +   gfp_t gfp_mask, struct vm_fault *vmf)
> >  {
> > return NULL;
> >  }
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8a0c410037d2..e955298e4290 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2849,21 +2849,14 @@ int do_swap_page(struct vm_fault *vmf)
> > struct vm_area_struct *vma = vmf->vma;
> > struct page *page = NULL, *swapcache = NULL;
> > struct mem_cgroup *memcg;
> > -   struct vma_swap_readahead swap_ra;
> > swp_entry_t entry;
> > pte_t pte;
> > int locked;
> > int exclusive = 0;
> > int ret = 0;
> > -   bool vma_readahead = swap_use_vma_readahead();
> >  
> > -   if (vma_readahead)
> > -   page = swap_readahead_detect(vmf, &swap_ra);
> > -   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
> > -   if (page)
> > -   put_page(page);
> > +   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
> > goto out;
> > -   }
> 
> The page table holding PTE may be unmapped in pte_unmap_same(), so is it
> safe for us to access page table after this in do_swap_page_readahead()?

That's why I call pte_offset_map() in swap_ra_info() before the access.

> 
> Best Regards,
> Huang, Ying
> 
> > entry = pte_to_swp_entry(vmf->orig_pte);
> > if (unlikely(non_swap_entry(entry))) {
> > @@ -2889,9 +2882,7 @@ int do_swap_page(struct vm_fault *vmf)
> >  
> >  
> > delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> > -   if (!page)
> > -   page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
> > -vmf->address);
> > +   page = lookup_swap_cache(entry, vma, vmf->address);
> > if (!page) {
> > struct swap_info_struct *si = swp_swap_info(entry);
> >  
> > @@ -2907,9 +2898,9 @@ int do_swap_page(struct vm_fault *vmf)
> > swap_readpage(page, true);
> > }
> > } else {
> > -   if (vma_readahead)
> > +   if (swap_use_vma_readahead())
> > page = do_swap_page_readahead(entry,
> > -   GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> > +   GFP_HIGHUSER_MOVABLE, vmf);
> > else
> > page = swapin_readahead(entry,
> >GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 6c017ced11e6..e3c535fcd2df 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c

Re: [PATCH 1/2] mm:swap: clean up swap readahead

2017-10-31 Thread Huang, Ying
Hi, Minchan,

Minchan Kim  writes:

> When I see recent change of swap readahead, I am very unhappy
> about current code structure which diverges two swap readahead
> algorithm in do_swap_page. This patch is to clean it up.
>
> Main motivation is that fault handler doesn't need to be aware of
> readahead algorithms but just should call swapin_readahead.
>
> As first step, this patch cleans up a little bit but not perfect
> (I just separate for review easier) so next patch will make the goal
> complete.
>
> Signed-off-by: Minchan Kim 
> ---
>  include/linux/swap.h | 17 ++
>  mm/memory.c  | 17 +++---
>  mm/swap_state.c  | 89 
> 
>  3 files changed, 55 insertions(+), 68 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 84255b3da7c1..7c7c8b344bc9 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -427,12 +427,8 @@ extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
>   bool *new_page_allocated);
>  extern struct page *swapin_readahead(swp_entry_t, gfp_t,
>   struct vm_area_struct *vma, unsigned long addr);
> -
> -extern struct page *swap_readahead_detect(struct vm_fault *vmf,
> -   struct vma_swap_readahead *swap_ra);
>  extern struct page *do_swap_page_readahead(swp_entry_t fentry, gfp_t gfp_mask,
> -struct vm_fault *vmf,
> -struct vma_swap_readahead *swap_ra);
> +struct vm_fault *vmf);
>  
>  /* linux/mm/swapfile.c */
>  extern atomic_long_t nr_swap_pages;
> @@ -551,15 +547,8 @@ static inline bool swap_use_vma_readahead(void)
>   return false;
>  }
>  
> -static inline struct page *swap_readahead_detect(
> - struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
> -{
> - return NULL;
> -}
> -
> -static inline struct page *do_swap_page_readahead(
> - swp_entry_t fentry, gfp_t gfp_mask,
> - struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
> +static inline struct page *do_swap_page_readahead(swp_entry_t fentry,
> + gfp_t gfp_mask, struct vm_fault *vmf)
>  {
>   return NULL;
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 8a0c410037d2..e955298e4290 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2849,21 +2849,14 @@ int do_swap_page(struct vm_fault *vmf)
>   struct vm_area_struct *vma = vmf->vma;
>   struct page *page = NULL, *swapcache = NULL;
>   struct mem_cgroup *memcg;
> - struct vma_swap_readahead swap_ra;
>   swp_entry_t entry;
>   pte_t pte;
>   int locked;
>   int exclusive = 0;
>   int ret = 0;
> - bool vma_readahead = swap_use_vma_readahead();
>  
> - if (vma_readahead)
> - page = swap_readahead_detect(vmf, &swap_ra);
> - if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
> - if (page)
> - put_page(page);
> + if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
>   goto out;
> - }

The page table holding PTE may be unmapped in pte_unmap_same(), so is it
safe for us to access page table after this in do_swap_page_readahead()?

Best Regards,
Huang, Ying

>   entry = pte_to_swp_entry(vmf->orig_pte);
>   if (unlikely(non_swap_entry(entry))) {
> @@ -2889,9 +2882,7 @@ int do_swap_page(struct vm_fault *vmf)
>  
>  
>   delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> - if (!page)
> - page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
> -  vmf->address);
> + page = lookup_swap_cache(entry, vma, vmf->address);
>   if (!page) {
>   struct swap_info_struct *si = swp_swap_info(entry);
>  
> @@ -2907,9 +2898,9 @@ int do_swap_page(struct vm_fault *vmf)
>   swap_readpage(page, true);
>   }
>   } else {
> - if (vma_readahead)
> + if (swap_use_vma_readahead())
>   page = do_swap_page_readahead(entry,
> - GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> + GFP_HIGHUSER_MOVABLE, vmf);
>   else
>   page = swapin_readahead(entry,
>  GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 6c017ced11e6..e3c535fcd2df 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -331,32 +331,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
>  unsigned long addr)
>  {
>   struct page *page;
> - unsigned long ra_info;
> - int win, hits, readahead;
>  
>   page = 

[RFC] EPOLL_KILLME: New flag to epoll_wait() that subscribes process to death row (new syscall)

2017-10-31 Thread Shawn Landden
It is common for services to be stateless around their main event loop.
If a process passes the EPOLL_KILLME flag to epoll_wait5() then it
signals to the kernel that epoll_wait5() may not complete, and the kernel
may send SIGKILL if resources get tight.

See my systemd patch: https://github.com/shawnl/systemd/tree/killme

Android uses this memory model for all programs, and having it in the
kernel will enable integration with the page cache (not in this
series).
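
For readers unfamiliar with the pattern, here is a minimal userspace sketch of a service whose only idle point is its epoll wait. It uses the existing epoll_wait(2); the proposed epoll_wait5() and EPOLL_KILLME do not exist in released kernels or libc, so they appear only in a comment. The helper names (make_loop, wait_idle, post_event) are invented for this example.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an epoll instance watching one eventfd.
 * Returns the epoll fd (or -1) and stores the eventfd in *efd. */
static int make_loop(int *efd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    *efd = eventfd(0, EFD_NONBLOCK);
    if (*efd < 0) {
        close(epfd);
        return -1;
    }
    ev.data.fd = *efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, *efd, &ev) < 0) {
        close(*efd);
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Idle point of a stateless service: nothing needs saving while blocked
 * here. Under the RFC this is where the call would become something like
 * epoll_wait5(epfd, out, 1, timeout_ms, EPOLL_KILLME) (proposed API only). */
static int wait_idle(int epfd, struct epoll_event *out, int timeout_ms)
{
    return epoll_wait(epfd, out, 1, timeout_ms);
}

/* Make the eventfd readable so the loop wakes up. */
static int post_event(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

A service written this way keeps all durable state outside the process, so being killed while parked in wait_idle() loses nothing that cannot be rebuilt on restart.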
---
 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/eventpoll.c | 74 +-
 include/linux/eventpoll.h  |  2 +
 include/linux/sched.h  |  3 ++
 include/uapi/asm-generic/unistd.h  |  5 ++-
 include/uapi/linux/eventpoll.h |  3 ++
 kernel/exit.c  |  2 +
 mm/oom_kill.c  | 17 
 9 files changed, 105 insertions(+), 3 deletions(-)
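
The fs/eventpoll.c hunk below brackets ep_poll() with "add to the death row list" and "remove from the death row list" steps, both under a mutex. A userspace model of that bookkeeping might look like the following. It is only a sketch: struct waiter, deathrow_enter(), deathrow_leave() and deathrow_size() are hypothetical names, a pthread mutex stands in for the kernel mutex, and a hand-rolled circular list stands in for the kernel's list_head.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task list linkage in the patch. */
struct waiter {
    struct waiter *prev, *next;
    int on_deathrow;
};

/* Circular list head, empty when it points at itself. */
static struct waiter deathrow_q = { &deathrow_q, &deathrow_q, 0 };
static long deathrow_len;
static pthread_mutex_t deathrow_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called just before blocking: insert at the head and mark the waiter. */
static void deathrow_enter(struct waiter *w)
{
    pthread_mutex_lock(&deathrow_mutex);
    deathrow_len++;
    w->next = deathrow_q.next;
    w->prev = &deathrow_q;
    deathrow_q.next->prev = w;
    deathrow_q.next = w;
    w->on_deathrow = 1;
    pthread_mutex_unlock(&deathrow_mutex);
}

/* Called after the wait returns: unlink and clear the mark. */
static void deathrow_leave(struct waiter *w)
{
    pthread_mutex_lock(&deathrow_mutex);
    w->on_deathrow = 0;
    w->prev->next = w->next;
    w->next->prev = w->prev;
    w->prev = w->next = NULL;
    deathrow_len--;
    pthread_mutex_unlock(&deathrow_mutex);
}

/* Read the current queue length under the lock. */
static long deathrow_size(void)
{
    long n;

    pthread_mutex_lock(&deathrow_mutex);
    n = deathrow_len;
    pthread_mutex_unlock(&deathrow_mutex);
    return n;
}
```

The model makes one property easy to see: every path that enters the list must also leave it, so any early return between the two steps would leave a stale entry behind.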

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..040e5d02bdcc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
 382	i386	pkey_free		sys_pkey_free
 383	i386	statx			sys_statx
 384	i386	arch_prctl		sys_arch_prctl			compat_sys_arch_prctl
+385	i386	epoll_wait5		sys_epoll_wait5
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..c72802e8cf65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 332	common	statx			sys_statx
+333	common	epoll_wait5		sys_epoll_wait5
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19cdeea..76d1c91d940b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -297,6 +297,14 @@ static LIST_HEAD(visited_list);
  */
 static LIST_HEAD(tfile_check_list);
 
+static LIST_HEAD(deathrow_q);
+static long deathrow_len __read_mostly;
+
+/* TODO: Can this lock be removed by using atomic instructions to update
+ * queue?
+ */
+static DEFINE_MUTEX(deathrow_mutex);
+
 #ifdef CONFIG_SYSCTL
 
 #include <linux/sysctl.h>
@@ -314,6 +322,15 @@ struct ctl_table epoll_table[] = {
.extra1 = &zero,
.extra2 = &long_max,
},
+   {
+   .procname   = "deathrow_size",
+   .data   = &deathrow_len,
+   .maxlen = sizeof(deathrow_len),
+   .mode   = 0444,
+   .proc_handler   = proc_doulongvec_minmax,
+   .extra1 = &zero,
+   .extra2 = &long_max,
+   },
{ }
 };
 #endif /* CONFIG_SYSCTL */
@@ -2164,9 +2181,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_wait(2).
+ *
+ * A flags argument cannot be added to epoll_pwait because it already has
+ * the maximum number of arguments (6). Can this be fixed?
  */
-SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
-   int, maxevents, int, timeout)
+SYSCALL_DEFINE5(epoll_wait5, int, epfd, struct epoll_event __user *, events,
+   int, maxevents, int, timeout, int, flags)
 {
int error;
struct fd f;
@@ -2199,14 +2219,44 @@ SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 */
ep = f.file->private_data;
 
+   /* Check the EPOLL_* constants for conflicts.  */
+   BUILD_BUG_ON(EPOLL_KILLME == EPOLL_CLOEXEC);
+
+   if (flags & ~EPOLL_KILLME)
+   return -EINVAL;
+
+   if (flags & EPOLL_KILLME) {
+   /* Put process on death row. */
+   mutex_lock(&deathrow_mutex);
+   deathrow_len++;
+   list_add(&current->se.deathrow, &deathrow_q);
+   current->se.on_deathrow = 1;
+   mutex_unlock(&deathrow_mutex);
+   }
+
/* Time to fish for events ... */
error = ep_poll(ep, events, maxevents, timeout);
 
+   if (flags & EPOLL_KILLME) {
+   /* Remove process from death row. */
+   mutex_lock(&deathrow_mutex);
+   current->se.on_deathrow = 0;
+   list_del(&current->se.deathrow);
+   deathrow_len--;
+   mutex_unlock(&deathrow_mutex);
+   }
+
 error_fput:
fdput(f);
return error;
 }
 
+SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
+   int, maxevents, int, timeout)
+{
+   return sys_epoll_wait5(epfd, events, maxevents, timeout, 0);
+}
+
 /*
  * Implement the event wait interface for the 

[PATCH 0/2] swap readahead clean up

2017-10-31 Thread Minchan Kim
This patchset cleans up the recently added vma-based readahead code by
unifying it with cluster-based readahead.

Minchan Kim (2):
  mm:swap: clean up swap readahead
  mm:swap: unify cluster-based and vma-based swap readahead

 include/linux/swap.h |  32 +++
 mm/memory.c  |  24 +++
 mm/shmem.c   |   5 ++-
 mm/swap_state.c  | 110 +--
 4 files changed, 87 insertions(+), 84 deletions(-)

-- 
2.7.4



[PATCH 1/2] mm:swap: clean up swap readahead

2017-10-31 Thread Minchan Kim
When I look at the recent changes to swap readahead, I am very unhappy
with the current code structure, which forks into two swap readahead
algorithms inside do_swap_page. This patch cleans that up.

The main motivation is that the fault handler doesn't need to be aware
of readahead algorithms; it should just call swapin_readahead.

As a first step, this patch cleans things up a little but not completely
(I split it out only to make review easier), so the next patch will
finish the job.

Signed-off-by: Minchan Kim 
---
 include/linux/swap.h | 17 ++
 mm/memory.c  | 17 +++---
 mm/swap_state.c  | 89 
 3 files changed, 55 insertions(+), 68 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 84255b3da7c1..7c7c8b344bc9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -427,12 +427,8 @@ extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
bool *new_page_allocated);
 extern struct page *swapin_readahead(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr);
-
-extern struct page *swap_readahead_detect(struct vm_fault *vmf,
- struct vma_swap_readahead *swap_ra);
 extern struct page *do_swap_page_readahead(swp_entry_t fentry, gfp_t gfp_mask,
-  struct vm_fault *vmf,
-  struct vma_swap_readahead *swap_ra);
+  struct vm_fault *vmf);
 
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
@@ -551,15 +547,8 @@ static inline bool swap_use_vma_readahead(void)
return false;
 }
 
-static inline struct page *swap_readahead_detect(
-   struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
-{
-   return NULL;
-}
-
-static inline struct page *do_swap_page_readahead(
-   swp_entry_t fentry, gfp_t gfp_mask,
-   struct vm_fault *vmf, struct vma_swap_readahead *swap_ra)
+static inline struct page *do_swap_page_readahead(swp_entry_t fentry,
+   gfp_t gfp_mask, struct vm_fault *vmf)
 {
return NULL;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 8a0c410037d2..e955298e4290 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2849,21 +2849,14 @@ int do_swap_page(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct page *page = NULL, *swapcache = NULL;
struct mem_cgroup *memcg;
-   struct vma_swap_readahead swap_ra;
swp_entry_t entry;
pte_t pte;
int locked;
int exclusive = 0;
int ret = 0;
-   bool vma_readahead = swap_use_vma_readahead();
 
-   if (vma_readahead)
-   page = swap_readahead_detect(vmf, &swap_ra);
-   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
-   if (page)
-   put_page(page);
+   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
goto out;
-   }
 
entry = pte_to_swp_entry(vmf->orig_pte);
if (unlikely(non_swap_entry(entry))) {
@@ -2889,9 +2882,7 @@ int do_swap_page(struct vm_fault *vmf)
 
 
delayacct_set_flag(DELAYACCT_PF_SWAPIN);
-   if (!page)
-   page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
-vmf->address);
+   page = lookup_swap_cache(entry, vma, vmf->address);
if (!page) {
struct swap_info_struct *si = swp_swap_info(entry);
 
@@ -2907,9 +2898,9 @@ int do_swap_page(struct vm_fault *vmf)
swap_readpage(page, true);
}
} else {
-   if (vma_readahead)
+   if (swap_use_vma_readahead())
page = do_swap_page_readahead(entry,
-   GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
+   GFP_HIGHUSER_MOVABLE, vmf);
else
page = swapin_readahead(entry,
   GFP_HIGHUSER_MOVABLE, vma, vmf->address);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 6c017ced11e6..e3c535fcd2df 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -331,32 +331,38 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
   unsigned long addr)
 {
struct page *page;
-   unsigned long ra_info;
-   int win, hits, readahead;
 
page = find_get_page(swap_address_space(entry), swp_offset(entry));
 
INC_CACHE_INFO(find_total);
if (page) {
+   bool vma_ra = swap_use_vma_readahead();
+   bool readahead = TestClearPageReadahead(page);
+
INC_CACHE_INFO(find_success);
if (unlikely(PageTransCompound(page)))
return page;
- 

[PATCH 2/2] mm:swap: unify cluster-based and vma-based swap readahead

2017-10-31 Thread Minchan Kim
With this patch, do_swap_page no longer needs to be aware of the two
different swap readahead algorithms. It simply unifies the cluster-based
and vma-based readahead function calls.

Signed-off-by: Minchan Kim 
---
 include/linux/swap.h | 17 -
 mm/memory.c  | 11 ---
 mm/shmem.c   |  5 -
 mm/swap_state.c  | 21 +++--
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7c7c8b344bc9..9cc330360eac 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -425,9 +425,11 @@ extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr,
bool *new_page_allocated);
-extern struct page *swapin_readahead(swp_entry_t, gfp_t,
-   struct vm_area_struct *vma, unsigned long addr);
-extern struct page *do_swap_page_readahead(swp_entry_t fentry, gfp_t gfp_mask,
+extern struct page *cluster_readahead(swp_entry_t entry, gfp_t flag,
+   struct vm_fault *vmf);
+extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
+   struct vm_fault *vmf);
+extern struct page *vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
   struct vm_fault *vmf);
 
 /* linux/mm/swapfile.c */
@@ -536,8 +538,13 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp)
 {
 }
 
+static inline struct page *cluster_readahead(swp_entry_t swp, gfp_t gfp_mask,
+   struct vm_fault *vmf)
+{
+}
+
 static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-   struct vm_area_struct *vma, unsigned long addr)
+   struct vm_fault *vmf)
 {
return NULL;
 }
@@ -547,7 +554,7 @@ static inline bool swap_use_vma_readahead(void)
return false;
 }
 
-static inline struct page *do_swap_page_readahead(swp_entry_t fentry,
+static inline struct page *vma_readahead(swp_entry_t fentry,
gfp_t gfp_mask, struct vm_fault *vmf)
 {
return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index e955298e4290..ce5e3d7ccc5c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2889,7 +2889,8 @@ int do_swap_page(struct vm_fault *vmf)
if (si->flags & SWP_SYNCHRONOUS_IO &&
__swap_count(si, entry) == 1) {
/* skip swapcache */
-   page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+   page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
+   vmf->address);
if (page) {
__SetPageLocked(page);
__SetPageSwapBacked(page);
@@ -2898,12 +2899,8 @@ int do_swap_page(struct vm_fault *vmf)
swap_readpage(page, true);
}
} else {
-   if (swap_use_vma_readahead())
-   page = do_swap_page_readahead(entry,
-   GFP_HIGHUSER_MOVABLE, vmf);
-   else
-   page = swapin_readahead(entry,
-  GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+   page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
+   vmf);
swapcache = page;
}
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 62dfdc097e44..2522bc0958e1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1413,9 +1413,12 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 {
struct vm_area_struct pvma;
struct page *page;
+   struct vm_fault vmf;
 
shmem_pseudo_vma_init(&pvma, info, index);
-   page = swapin_readahead(swap, gfp, &pvma, 0);
+   vmf.vma = &pvma;
+   vmf.address = 0;
+   page = cluster_readahead(swap, gfp, &vmf);
shmem_pseudo_vma_destroy(&pvma);
 
return page;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e3c535fcd2df..5ee53d4ee047 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -538,11 +538,10 @@ static unsigned long swapin_nr_pages(unsigned long offset)
 }
 
 /**
- * swapin_readahead - swap in pages in hope we need them soon
+ * cluster_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
  * @gfp_mask: memory allocation flags
- * @vma: user vma this address belongs to
- * @addr: target address for mempolicy
+ * @vmf: fault information
  *
  * Returns the struct page for entry and addr, after queueing swapin.
  *
@@ -556,8 +555,8 @@ static unsigned long swapin_nr_pages(unsigned long offset)
  *
  * Caller must hold down_read on the vma->vm_mm if vma 
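
The message is cut off in this archive before the new body of
swapin_readahead is shown. As a rough illustration of the unification the
changelog describes, here is a small user-space sketch. It assumes that
swapin_readahead simply dispatches on swap_use_vma_readahead(); the types
are hypothetical stand-ins, the gfp argument is dropped, and the flag
use_vma_readahead is invented for the example. It is not the code from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct page { int id; };
struct vm_fault { unsigned long address; };

/* Invented flag that models the result of swap_use_vma_readahead(). */
static bool use_vma_readahead;

static struct page cluster_page = { 1 };
static struct page vma_page = { 2 };

/* Models the cluster-based algorithm: just reports which path ran. */
static struct page *cluster_readahead(unsigned long entry, struct vm_fault *vmf)
{
	(void)entry;
	(void)vmf;
	return &cluster_page;
}

/* Models the vma-based algorithm: just reports which path ran. */
static struct page *vma_readahead(unsigned long entry, struct vm_fault *vmf)
{
	(void)entry;
	(void)vmf;
	return &vma_page;
}

/*
 * Single entry point (assumed shape): the fault handler calls this and
 * never chooses between the two algorithms itself.
 */
static struct page *swapin_readahead(unsigned long entry, struct vm_fault *vmf)
{
	return use_vma_readahead ? vma_readahead(entry, vmf)
				 : cluster_readahead(entry, vmf);
}
```

With this shape, a caller such as do_swap_page only needs the one call to
swapin_readahead, which matches the mm/memory.c hunk above.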

Re: xfs: list corruption in xfs_setup_inode()

2017-10-31 Thread Dave Chinner
On Tue, Oct 31, 2017 at 09:43:03PM -0700, Cong Wang wrote:
> On Tue, Oct 31, 2017 at 8:05 PM, Dave Chinner  wrote:
> > On Tue, Oct 31, 2017 at 06:51:08PM -0700, Cong Wang wrote:
> >> >> Please let me know if I can provide any other information.
> >> >
> >> > How do you reproduce the problem?
> >>
> >> The warning is reported via ABRT email, we don't know what was
> >> happening at the time of crash.
> >
> > Which makes it even harder to track down. Perhaps you should
> > configure the box to crashdump on such a failure and then we
> > can do some post-failure forensic analysis...
> 
> Yeah.
> 
> We are trying to get kdump working, but even if kdump works
> we still can't turn on panic_on_warn since this is a production
> machine.

Hmmm. Ok, maybe you could leave a trace of the xfs_iget* trace
points running and check the log tail for unusual events around the
time of the next crash. e.g. xfs_iget_reclaim_fail events. That
might point us to a potential interaction we can look at more
closely. I'd also suggest slab poisoning as well, as that will
catch other lifecycle problems that could be causing list
corruptions such as use-after-free.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
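
For readers unfamiliar with slab poisoning, the idea can be modelled in a
few lines of user-space C: freed objects are overwritten with a known
byte pattern, and any later deviation from that pattern shows that
something wrote to the object after it was freed. The sketch below is an
illustration only. The helper names and OBJ_SIZE are invented; 0x6b is
the value the kernel uses for POISON_FREE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define POISON_BYTE 0x6b	/* same value as the kernel's POISON_FREE */
#define OBJ_SIZE    32		/* arbitrary object size for the example */

/* "Free" an object: overwrite it so stale users see the pattern, not old data. */
static void toy_free(void *obj)
{
	memset(obj, POISON_BYTE, OBJ_SIZE);
}

/* Returns true if every byte of the freed object still holds the pattern. */
static bool toy_poison_intact(const void *obj)
{
	const unsigned char *p = obj;
	size_t i;

	for (i = 0; i < OBJ_SIZE; i++)
		if (p[i] != POISON_BYTE)
			return false;
	return true;
}
```

An allocator that runs a check like toy_poison_intact before handing the
object out again can report a write-after-free close to where it
happened, instead of much later when a list walk trips over the damage.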


Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-10-31 Thread Alexei Starovoitov

On 10/31/17 8:45 AM, Josef Bacik wrote:

From: Josef Bacik 

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with its kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_function
helper.  This will modify the probed caller's return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Signed-off-by: Josef Bacik 
---
 arch/Kconfig |  3 +++
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/kprobes.h   |  4 
 arch/x86/include/asm/ptrace.h|  5 +
 arch/x86/kernel/kprobes/ftrace.c | 14 ++
 include/linux/trace_events.h |  7 +++
 include/uapi/linux/bpf.h |  7 ++-
 kernel/trace/Kconfig | 11 +++
 kernel/trace/bpf_trace.c | 30 
 kernel/trace/trace_kprobe.c  | 42 +---
 10 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d789a89cb32c..4fb618082259 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -195,6 +195,9 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
bool

+config HAVE_KPROBE_OVERRIDE
+   bool
+
 config HAVE_NMI
bool

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 971feac13506..5126d2750dd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -152,6 +152,7 @@ config X86
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
+   select HAVE_KPROBE_OVERRIDE
select HAVE_KRETPROBES
select HAVE_KVM
select HAVE_LIVEPATCH   if X86_64
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 6cf65437b5e5..c6c3b1f4306a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
 void arch_remove_kprobe(struct kprobe *p);
 asmlinkage void kretprobe_trampoline(void);

+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
+#endif
+
 /* Architecture specific copy of original instruction*/
 struct arch_specific_insn {
/* copy of the original instruction */
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..f04e71800c2f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -108,6 +108,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
return regs->ax;
 }

+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+   regs->ax = rc;
+}
+
 /*
  * user_mode(regs) determines whether a register set came from user
  * mode.  On x86_32, this is true if V8086 mode was enabled OR if the
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 041f7b6dfa0f..3c455bf490cb 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
p->ainsn.boostable = false;
return 0;
 }
+
+asmlinkage void override_func(void);
+asm(
+   ".type override_func, @function\n"
+   "override_func:\n"
+   "  ret\n"
+   ".size override_func, .-override_func\n"
+);
+
+void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
+{
+   regs->ip = (unsigned long)&override_func;
+}
+NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index fc6aeca945db..9179f109c49b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -521,7 +521,14 @@ do { \
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;

+enum {
+   BPF_STATE_NORMAL_KPROBE = 0,
+   BPF_STATE_FTRACE_KPROBE,
+   BPF_STATE_MODIFIED_PC,
+};
+
 DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+DECLARE_PER_CPU(int, bpf_kprobe_state);

 extern int  perf_trace_init(struct perf_event *event);
 extern void perf_trace_destroy(struct perf_event *event);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..1ad5b87a42f6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  * @buf: buf to fill
  * @buf_size: size of the buf
  * Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -732,7 +736,8 @@ union bpf_attr {
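
The mechanism in the changelog (set the return-value register, then point
the program counter at a function that only returns) can be modelled in
user space. The sketch below is a toy model under loud assumptions:
struct mock_regs, call_with_probe, inject_enomem and the function-pointer
"ip" are all invented for illustration and stand in for pt_regs, the
kprobe machinery, and a BPF program. It is not kernel or BPF code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register file: only the two fields the patch touches. */
struct mock_regs {
	unsigned long ax;			/* return-value register */
	void (*ip)(struct mock_regs *);		/* where execution resumes */
};

static int probed_ran;

/* The function under test; on success it "returns" 0 through ax. */
static void probed_function(struct mock_regs *regs)
{
	probed_ran = 1;
	regs->ax = 0;
}

/* Models override_func from the patch: returns at once, leaving ax alone. */
static void override_func(struct mock_regs *regs)
{
	(void)regs;
}

/* Models the helper: set the return value and redirect execution. */
static void mock_override_return(struct mock_regs *regs, unsigned long rc)
{
	regs->ax = rc;
	regs->ip = override_func;
}

/* Example probe handler: inject -ENOMEM (-12). */
static void inject_enomem(struct mock_regs *regs)
{
	mock_override_return(regs, (unsigned long)-12);
}

/* Models a probed call: run the hook, resume at regs.ip, read the result. */
static unsigned long call_with_probe(void (*hook)(struct mock_regs *))
{
	struct mock_regs regs = { 0, probed_function };

	if (hook)
		hook(&regs);
	regs.ip(&regs);
	return regs.ax;
}
```

In the model, calling call_with_probe(inject_enomem) yields -12 without
probed_function ever running, which is the "bypassing the originally
probed function" behaviour the changelog describes.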

Re: xfs: list corruption in xfs_setup_inode()

2017-10-31 Thread Cong Wang
On Tue, Oct 31, 2017 at 8:05 PM, Dave Chinner  wrote:
> On Tue, Oct 31, 2017 at 06:51:08PM -0700, Cong Wang wrote:
>> On Mon, Oct 30, 2017 at 5:33 PM, Dave Chinner  wrote:
>> > On Mon, Oct 30, 2017 at 02:55:43PM -0700, Cong Wang wrote:
>> >> Hello,
>> >>
>> >> We triggered a list corruption (double add) warning below on our 4.9
>> >> kernel (the 4.9 kernel we use is based on -stable release, with only a
>> >> few unrelated networking backports):
> ...
>> >> 4.9.34.el7.x86_64 #1
>> >> Hardware name: TYAN S5512/S5512, BIOS V8.B13 03/20/2014
>> >>  b0d48a0abb30 8e389f47 b0d48a0abb80 
>> >>  b0d48a0abb70 8e08989b 0024 8d9d691e0aa0
>> >>  8d9d7a716608 8d9d691e0aa0 4000 8d9d7de6d800
>> >> Call Trace:
>> >>  [] dump_stack+0x4d/0x66
>> >>  [] __warn+0xcb/0xf0
>> >>  [] warn_slowpath_fmt+0x5f/0x80
>> >>  [] __list_add+0xac/0xb0
>> >>  [] inode_sb_list_add+0x3b/0x50
>> >>  [] xfs_setup_inode+0x2c/0x170 [xfs]
>> >>  [] xfs_ialloc+0x317/0x5c0 [xfs]
>> >>  [] xfs_dir_ialloc+0x77/0x220 [xfs]
>> >
>> > Inode allocation, so should be a new inode straight from the slab
>> > cache. That implies memory corruption of some kind. Please turn on
>> > slab poisoning and try to reproduce.
>>
>> Are you sure? xfs_iget() seems to search a cache before allocating
>> a new one:
>
> /me sighs
>
> You started with "I don't know the XFS code very well", so I omitted
> the complexity of describing about 10 different corner cases where
> we /could/ find the unlinked inode still in the cache via the
> lookup. But they aren't common cases - the common case in the real
> world is allocation of cache cold inodes. IOWs: "so should be a new
> inode straight from the slab cache".
>
> So, yes, we could find the old unlinked inode still cached in the
> XFS inode cache, but I don't have the time to explain how RCU lookup
> code works to everyone who reports a bug.

Oh, sorry about it. I understand it now.


>
> All you need to understand is that all of this happens below the VFS
> and so inodes being reclaimed or newly allocated the in-cache inode
> should never, ever be on the VFS sb inode list.
>

OK.


>> >>  [] ? down_write+0x12/0x40
>> >>  [] xfs_create+0x482/0x760 [xfs]
>> >>  [] xfs_generic_create+0x21e/0x2c0 [xfs]
>> >>  [] xfs_vn_mknod+0x14/0x20 [xfs]
>> >>  [] xfs_vn_mkdir+0x16/0x20 [xfs]
>> >>  [] vfs_mkdir+0xe8/0x140
>> >>  [] SyS_mkdir+0x7a/0xf0
>> >>  [] entry_SYSCALL_64_fastpath+0x13/0x94
>> >>
>> >> _Without_ looking deeper, it seems this warning could be shut up by:
>> >>
>> >> --- a/fs/xfs/xfs_icache.c
>> >> +++ b/fs/xfs/xfs_icache.c
>> >> @@ -1138,6 +1138,8 @@ xfs_reclaim_inode(
>> >> xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> >>
>> >> XFS_STATS_INC(ip->i_mount, xs_ig_reclaims);
>> >> +
>> >> +   inode_sb_list_del(VFS_I(ip));
>> >>
>> >> with properly exporting inode_sb_list_del(). Does this make any sense?
>> >
>> > No, because by this stage the inode has already been removed from
>> > the superblock inode list. Doing this sort of thing here would just
>> > paper over whatever the underlying problem might be.
>>
>>
>> For me, it looks like the inode in the cache pag->pag_ici_root
>> is not removed from sb list before removing from cache.
>
> Sure, we have list corruption. Where we detect that corruption
> implies nothing about the cause of the list corruption. The two
> events are not connected in any way. Clearing that VFS list here
> does nothing to fix the problem causing the list corruption to
> occur.

OK.
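
To make the point about detection versus cause concrete, here is a minimal userspace sketch of the kind of sanity check a debug list add performs. It is modelled loosely on the kernel's list debugging, but every name in it (list_add_valid, list_add_checked) is made up for illustration and this is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, as in the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Hypothetical check: returns false in the situations where a debug
 * list implementation would warn about corruption.
 */
static bool list_add_valid(const struct list_head *entry,
			   const struct list_head *prev,
			   const struct list_head *next)
{
	if (next->prev != prev)		/* neighbour's back link is wrong */
		return false;
	if (prev->next != next)		/* neighbour's forward link is wrong */
		return false;
	if (entry == prev || entry == next)	/* "double add" */
		return false;
	return true;
}

/* Insert entry right after head, refusing to touch a suspicious list. */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}
```

Note that the check only looks at the immediate neighbours of the insertion point. A stale entry elsewhere on the list goes unnoticed until some later add happens to land next to it, which is why the place where the warning fires says little about where the damage was done.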

>
>> >> Please let me know if I can provide any other information.
>> >
>> > How do you reproduce the problem?
>>
>> The warning is reported via ABRT email; we don't know what was
>> happening at the time of the crash.
>
> Which makes it even harder to track down. Perhaps you should
> configure the box to crashdump on such a failure and then we
> can do some post-failure forensic analysis...

Yeah.

We are trying to get kdump working, but even if it works we still
can't turn on panic_on_warn, since this is a production machine.


Thanks!


Re: [PATCH v2] sched/sysctl: Fix attributes of some extern declarations

2017-10-31 Thread Nick Desaulniers
> On Tue, Oct 30, 2017 at 10:57:58AM +0100, Ingo Molnar wrote:
>> So I hate this change, because it pointlessly duplicates an attribute that
>> should only matter at the definition site.
>
> It's certainly not ideal, and then again essentially the same is done
> in kernel/sched/sched.h, just that here the specific attribute is
> hidden behind const_debug.
>
>> The Clang warning:
>>
>> >   kernel/sched/sched.h:1618:33: warning: section attribute is specified on
>> > redeclared variable [-Wsection]
>>
>> suggests that the -Wsection warning can be turned off. The Clang build should
>> probably do that.

Naive question: can these definitions be hoisted to include/linux/sched.h?


[PATCH V10 1/2] kasan: use %pK to print addresses instead of %p

2017-10-31 Thread Tobin C. Harding
This is in preparation for hashing addresses printed using %p. We need the
actual address for error reporting in KASAN.

Use %pK instead of %p to print addresses.

Signed-off-by: Tobin C. Harding 
---
 mm/kasan/report.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 6bcfb01ba038..ad042f025a1a 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -134,7 +134,7 @@ static void print_error_description(struct kasan_access_info *info)
 
pr_err("BUG: KASAN: %s in %pS\n",
bug_type, (void *)info->ip);
-   pr_err("%s of size %zu at addr %p by task %s/%d\n",
+   pr_err("%s of size %zu at addr %pK by task %s/%d\n",
info->is_write ? "Write" : "Read", info->access_size,
info->access_addr, current->comm, task_pid_nr(current));
 }
@@ -206,7 +206,7 @@ static void describe_object_addr(struct kmem_cache *cache, void *object,
const char *rel_type;
int rel_bytes;
 
-   pr_err("The buggy address belongs to the object at %p\n"
+   pr_err("The buggy address belongs to the object at %pK\n"
   " which belongs to the cache %s of size %d\n",
object, cache->name, cache->object_size);
 
@@ -225,7 +225,7 @@ static void describe_object_addr(struct kmem_cache *cache, void *object,
}
 
pr_err("The buggy address is located %d bytes %s of\n"
-  " %d-byte region [%p, %p)\n",
+  " %d-byte region [%pK, %pK)\n",
rel_bytes, rel_type, cache->object_size, (void *)object_addr,
(void *)(object_addr + cache->object_size));
 }
@@ -302,7 +302,7 @@ static void print_shadow_for_address(const void *addr)
char shadow_buf[SHADOW_BYTES_PER_ROW];
 
snprintf(buffer, sizeof(buffer),
-   (i == 0) ? ">%p: " : " %p: ", kaddr);
+   (i == 0) ? ">%pK: " : " %pK: ", kaddr);
/*
 * We should not pass a shadow pointer to generic
 * function, because generic functions may try to
-- 
2.7.4



[PATCH V10 2/2] printk: hash addresses printed with %p

2017-10-31 Thread Tobin C. Harding
Currently there are many places in the kernel where addresses are being
printed using an unadorned %p. Kernel pointers should be printed using
%pK allowing some control via the kptr_restrict sysctl. Exposing addresses
gives attackers sensitive information about the kernel layout in memory.

We can reduce the attack surface by hashing all addresses printed with
%p. This will of course break some users, forcing code printing needed
addresses to be updated.

For what it's worth, usage of unadorned %p can be broken down as
follows (thanks to Joe Perches).

$ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
   1084 arch
     20 block
     10 crypto
     32 Documentation
   8121 drivers
   1221 fs
    143 include
    101 kernel
     69 lib
    100 mm
   1510 net
     40 samples
      7 scripts
     11 security
    166 sound
    152 tools
      2 virt

Add function ptr_to_id() to map an address to a 32 bit unique
identifier. Hash any unadorned usage of specifier %p and any malformed
specifiers.

Signed-off-by: Tobin C. Harding 

---
 Documentation/printk-formats.txt |  17 +++-
 lib/test_printf.c                | 108 +++-
 lib/vsprintf.c                   | 176 ---
 3 files changed, 213 insertions(+), 88 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 361789df51ec..ec7deb80d035 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -5,6 +5,9 @@ How to get printk format specifiers right
 :Author: Randy Dunlap 
 :Author: Andrew Murray 
 
+Please do not print kernel addresses using %x. Exposing kernel addresses to
+user space leaks sensitive information that increases the attack surface of the
+kernel. In order to print pointers, please see 'Pointer Types' below.
 
 Integer types
 =============
@@ -45,6 +48,18 @@ return from vsnprintf.
 Raw pointer value SHOULD be printed with %p. The kernel supports
 the following extended format specifiers for pointer types:
 
+Pointer Types
+=============
+
+Pointers printed without a specifier extension (i.e unadorned %p) are hashed
+to give a unique identifier without leaking kernel addresses to user space.
+If you _really_ want to see the address please use %pK (see 'Kernel Pointers'
+below). On 64 bit machines the first 32 bits are zeroed.
+
+::
+
+   %p  abcdef12 or 00000000abcdef12
+
 Symbols/Function Pointers
 =========================
 
@@ -91,7 +106,7 @@ Kernel Pointers
 
 ::
 
-   %pK 0x01234567 or 0x0123456789abcdef
+   %pK 01234567 or 0123456789abcdef
 
 For printing kernel pointers which should be hidden from unprivileged
 users. The behaviour of ``%pK`` depends on the ``kptr_restrict sysctl`` - see
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 563f10e6876a..71ebfa43ad05 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -24,24 +24,6 @@
 #define PAD_SIZE 16
 #define FILL_CHAR '$'
 
-#define PTR1 ((void*)0x01234567)
-#define PTR2 ((void*)(long)(int)0xfedcba98)
-
-#if BITS_PER_LONG == 64
-#define PTR1_ZEROES "000000000"
-#define PTR1_SPACES "         "
-#define PTR1_STR "1234567"
-#define PTR2_STR "fffffffffedcba98"
-#define PTR_WIDTH 16
-#else
-#define PTR1_ZEROES "0"
-#define PTR1_SPACES " "
-#define PTR1_STR "1234567"
-#define PTR2_STR "fedcba98"
-#define PTR_WIDTH 8
-#endif
-#define PTR_WIDTH_STR stringify(PTR_WIDTH)
-
 static unsigned total_tests __initdata;
 static unsigned failed_tests __initdata;
 static char *test_buffer __initdata;
@@ -217,30 +199,79 @@ test_string(void)
test("a  |   |   ", "%-3.s|%-3.0s|%-3.*s", "a", "b", 0, "c");
 }
 
+#define PLAIN_BUF_SIZE 64  /* leave some space so we don't oops */
+
+#if BITS_PER_LONG == 64
+
+#define PTR_WIDTH 16
+#define PTR ((void *)0xffff0123456789ab)
+#define PTR_STR "ffff0123456789ab"
+#define ZEROS "00000000"   /* hex 32 zero bits */
+
+static int __init
+plain_format(void)
+{
+   char buf[PLAIN_BUF_SIZE];
+   int nchars;
+
+   nchars = snprintf(buf, PLAIN_BUF_SIZE, "%p", PTR);
+
+   if (nchars != PTR_WIDTH || strncmp(buf, ZEROS, strlen(ZEROS)) != 0)
+   return -1;
+
+   return 0;
+}
+
+#else
+
+#define PTR_WIDTH 8
+#define PTR ((void *)0x456789ab)
+#define PTR_STR "456789ab"
+
+static int __init
+plain_format(void)
+{
+   /* Format is implicitly tested for 32 bit machines by plain_hash() */
+   return 0;
+}
+
+#endif /* BITS_PER_LONG == 64 */
+
+static int __init
+plain_hash(void)
+{
+   char buf[PLAIN_BUF_SIZE];
+   int nchars;
+
+   nchars = snprintf(buf, PLAIN_BUF_SIZE, "%p", PTR);
+
+   if (nchars != PTR_WIDTH || strncmp(buf, PTR_STR, PTR_WIDTH) == 0)
+   return -1;
+
+   return 0;
+}
+
+/*
+ * We can't use test() to test %p because we don't know what output to expect
+ * after an address is hashed.
+ */
 static void __init
 plain(void)
 {
-   test(PTR1_ZEROES PTR1_STR " " PTR2_STR, "%p %p", PTR1, PTR2);

[PATCH V10 0/2] printk: hash addresses printed with %p

2017-10-31 Thread Tobin C. Harding
Currently there are many places in the kernel where addresses are being
printed using an unadorned %p. Kernel pointers should be printed using
%pK allowing some control via the kptr_restrict sysctl. Exposing
addresses gives attackers sensitive information about the kernel layout
in memory.

We can reduce the attack surface by hashing all addresses printed with
%p. This will of course break some users, forcing code printing needed
addresses to be updated.

This version adds testing; this is my first effort at kernel unit
testing. Modules in `lib` don't seem to be contained within a selftest
target, so in order to develop the tests incrementally I implemented
them in `lib/test_printf.c`, built with `make M=lib`, and then, instead
of running selftest, spun up a VM and inserted the module manually.
Comments or suggestions much appreciated.

Here is the behaviour that this series implements.

For kptr_restrict==0

Randomness not ready:
  printed with %p:              (ptrval) # NOTE: with padding
Valid pointer:
  printed with %pK:             deadbeefdeadbeef
  printed with %p:              00000000deadbeef
  malformed specifier (eg %i):  00000000deadbeef
NULL pointer:
  printed with %pK:             0000000000000000
  printed with %p:                        (null) # NOTE: with padding
  malformed specifier (eg %i):            (null)

For kptr_restrict==2

Valid pointer:
  printed with %pK:             0000000000000000

All other output as for kptr_restrict==0
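
The behaviour described above for unadorned %p can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: the demo_* names are hypothetical, the mixer is a simple stand-in and NOT SipHash, and the field width of 16 assumes a 64 bit machine.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the random hash key and its "ready" flag. */
static uint64_t demo_key;
static bool demo_key_ready;

static void demo_set_key(uint64_t key)
{
	demo_key = key;
	demo_key_ready = true;
}

/* NOT SipHash: a simple keyed 64-bit mixer, used only for illustration. */
static uint64_t demo_mix(uint64_t x, uint64_t key)
{
	x ^= key;
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	x *= 0xc4ceb9fe1a85ec53ULL;
	x ^= x >> 33;
	return x;
}

/* Format a pointer roughly the way the table above describes plain %p. */
static int demo_format_ptr(char *buf, size_t len, const void *ptr)
{
	uint32_t id;

	if (!ptr)
		return snprintf(buf, len, "%16s", "(null)");
	if (!demo_key_ready)
		return snprintf(buf, len, "%16s", "(ptrval)");

	/* Keep only the low 32 bits of the hash; the high 32 print as zeros. */
	id = (uint32_t)demo_mix((uint64_t)(uintptr_t)ptr, demo_key);
	return snprintf(buf, len, "%016llx", (unsigned long long)id);
}
```

The same pointer always maps to the same identifier for a given key, so log lines can still be correlated without revealing the address itself.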

V10:
 - Add patch so KASAN uses %pK instead of %p. 
 - Add documentation to Documentation/printk-formats.txt
 - Add tests to lib/test_printf.c
 - Change "(pointer value)" -> "(ptrval)" to fit within columns on 32
   bit machines.

V9:
 - Drop the initial patch from V8, leaving null pointer handling as is.
 - Print the hashed ID _without_ a '0x' prefix.
 - Mask the first 32 bits of the hashed ID to all zeros on 64 bit
   architectures.

V8:
 - Add second patch cleaning up null pointer printing in pointer()
 - Move %pK handling to separate function, further cleaning up pointer()
 - Move ptr_to_id() call outside of switch statement making hashing
   the default behaviour (including malformed specifiers).
 - Remove use of static_key, replace with simple boolean.

V7:
 - Use tabs instead of spaces (ouch!).

V6:
 - Use __early_initcall() to fill the SipHash key.
 - Use static keys to guard hashing before the key is available.

V5:
 - Remove spin lock.
 - Add Jason A. Donenfeld to CC list by request.
 - Add Theodore Ts'o to CC list due to comment on previous version.

V4:
 - Remove changes to siphash.{ch}
 - Do word size check, and return value cast, directly in ptr_to_id().
 - Use add_ready_random_callback() to guard call to get_random_bytes()

V3:
 - Use atomic_xchg() to guard setting [random] key.
 - Remove erroneous white space change.

V2:
 - Use SipHash to do the hashing.

The discussion related to this patch has been fragmented. There are
three threads associated with this patch. Email threads by subject:

[PATCH] printk: hash addresses printed with %p
[PATCH 0/3] add %pX specifier
[kernel-hardening] [RFC V2 0/6] add more kernel pointer filter options

Tobin C. Harding (2):
  kasan: use %pK to print addresses instead of %p
  printk: hash addresses printed with %p

 Documentation/printk-formats.txt |  17 +++-
 lib/test_printf.c                | 108 +++-
 lib/vsprintf.c                   | 176 ---
 mm/kasan/report.c                |   8 +-
 4 files changed, 217 insertions(+), 92 deletions(-)

-- 
2.7.4



Re: [PATCH] at24: support eeproms that do not roll over page reads.

2017-10-31 Thread kbuild test robot
Hi Sven,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.14-rc7]
[cannot apply to next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sven-Van-Asbroeck/at24-support-eeproms-that-do-not-roll-over-page-reads/20171101-114231
config: tile-allyesconfig (attached as .config)
compiler: tilegx-linux-gcc (GCC) 4.6.2
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=tile

All warnings (new ones prefixed by >>):

   drivers/misc/eeprom/at24.c: In function 'at24_translate_offset':
>> drivers/misc/eeprom/at24.c:210:12: warning: comparison of distinct pointer types lacks a cast [enabled by default]

vim +210 drivers/misc/eeprom/at24.c

   185  
   186  /*
   187   * This routine supports chips which consume multiple I2C addresses. It
   188   * computes the addressing information to be used for a given r/w request.
   189   * Assumes that sanity checks for offset happened at sysfs-layer.
   190   *
   191   * Slave address and byte offset derive from the offset. Always
   192   * set the byte address; on a multi-master board, another master
   193   * may have changed the chip's "current" address pointer.
   194   *
   195   * In case of chips that don't rollover page reads, truncate the count
   196   * to the nearest page boundary. This might result in the
   197   * at24_eeprom_read_XXX functions reading fewer bytes than requested,
   198   * but this is compensated for in at24_read().
   199   */
   200  static struct i2c_client *at24_translate_offset(struct at24_data *at24,
   201                                                  unsigned int *offset, size_t *count)
   202  {
   203          unsigned int i, bits, remainder;
   204  
   205          bits = (at24->chip.flags & AT24_FLAG_ADDR16) ? 16 : 8;
   206          i = *offset >> bits;
   207          *offset &= AT24_BITMASK(bits);
   208          if ((at24->chip.flags & AT24_FLAG_NO_RDROL) && count) {
   209                  remainder = BIT(bits) - *offset;
 > 210                  *count = min(*count, remainder);
   211          }
   212  
   213          return at24->client[i];
   214  }
   215  
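
The warning at line 210 most likely comes from mixing a size_t (*count) with an unsigned int (remainder) in min(), whose type check compares pointers to its two arguments. A minimal userspace sketch of the clamp with both operands brought to one type first, which is what a min_t(size_t, ...) style call would do, might look like this. The helper name demo_clamp_to_page is hypothetical and is not part of the driver.

```c
#include <stddef.h>

/*
 * Hypothetical helper: limit a read length so that it does not run past
 * the end of the region addressed by `bits` address bits.
 */
static size_t demo_clamp_to_page(size_t count, unsigned int offset,
				 unsigned int bits)
{
	/* Bytes left in the region, mirroring BIT(bits) - *offset. */
	unsigned int remainder = (1U << bits) - offset;

	/* Convert explicitly so both operands of the comparison are size_t. */
	size_t limit = (size_t)remainder;

	return count < limit ? count : limit;
}
```

With an 8 bit address and an offset of 0xF0, for example, a request for 100 bytes would be cut down to the 16 bytes that remain.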

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] selftests/ftrace: Introduce exit_pass and exit_fail

2017-10-31 Thread Masami Hiramatsu
On Tue, 31 Oct 2017 17:44:32 -0400
Steven Rostedt  wrote:

> On Tue, 31 Oct 2017 23:51:42 +0900
> Masami Hiramatsu  wrote:
> 
> > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc b/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc
> > index 74507db8bbc8..b8701fa0b8f2 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc
> > @@ -8,7 +8,7 @@ echo > kprobe_events
> >  echo p:myevent _do_fork > kprobe_events
> >  test -d events/kprobes/myevent
> >  echo 1 > events/kprobes/myevent/enable
> > -echo > kprobe_events && exit 1 # this must fail
> > +echo > kprobe_events && exit_fail
> 
> Should we keep the comment about "this must fail", otherwise it may
> look like a mistake. Echoing in kprobe_events returns failure here?

Ah, good catch! I misread the comment as being for "exit 1"...

Thank you,

> 
> -- Steve
> 
> 
> >  echo 0 > events/kprobes/myevent/enable
> >  echo > kprobe_events # this must succeed
> >  clear_trace


-- 
Masami Hiramatsu 


Re: [PATCH] [irq] Fix boot failure when irqaffinity is passed.

2017-10-31 Thread Rakib Mullick
On Tue, Oct 31, 2017 at 5:29 PM, Ingo Molnar  wrote:
>
>
> Not applied, because this patch causes the following build warning:
>
>   kernel/irq/irqdesc.c:43:6: warning: the address of ‘irq_default_affinity’ will always evaluate as ‘true’ [-Waddress]
>
Ah, sorry, I didn't check the build log. The warning came from removing
the #ifdef. It is now fixed by using cpumask_available().

> Also, please pick up the improved changelog below for the next version of the
> patch.
>
Thanks for the improved changelog, I have sent a new version here:
https://lkml.org/lkml/2017/11/1/6.

Thanks,
Rakib.


Re: [PATCH] at24: support eeproms that do not roll over page reads.

2017-10-31 Thread kbuild test robot
Hi Sven,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.14-rc7]
[cannot apply to next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sven-Van-Asbroeck/at24-support-eeproms-that-do-not-roll-over-page-reads/20171101-114231
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64

All warnings (new ones prefixed by >>):

   In file included from drivers/misc/eeprom/at24.c:12:0:
   drivers/misc/eeprom/at24.c: In function 'at24_translate_offset':
   include/linux/kernel.h:790:16: warning: comparison of distinct pointer types lacks a cast
     (void) (&min1 == &min2);   \
                   ^
   include/linux/kernel.h:799:2: note: in expansion of macro '__min'
     __min(typeof(x), typeof(y),   \
     ^
>> drivers/misc/eeprom/at24.c:210:12: note: in expansion of macro 'min'
  *count = min(*count, remainder);
   ^~~

vim +/min +210 drivers/misc/eeprom/at24.c

  > 12  #include 
13  #include 
14  #include 
15  #include 
16  #include 
17  #include 
18  #include 
19  #include 
20  #include 
21  #include 
22  #include 
23  #include 
24  #include 
25  #include 
26  #include 
27  
28  /*
29   * I2C EEPROMs from most vendors are inexpensive and mostly interchangeable.
30   * Differences between different vendor product lines (like Atmel AT24C or
31   * MicroChip 24LC, etc) won't much matter for typical read/write access.
32   * There are also I2C RAM chips, likewise interchangeable. One example
33   * would be the PCF8570, which acts like a 24c02 EEPROM (256 bytes).
34   *
35   * However, misconfiguration can lose data. "Set 16-bit memory address"
36   * to a part with 8-bit addressing will overwrite data. Writing with too
37   * big a page size also loses data. And it's not safe to assume that the
38   * conventional addresses 0x50..0x57 only hold eeproms; a PCF8563 RTC
39   * uses 0x51, for just one example.
40   *
41   * Accordingly, explicit board-specific configuration data should be used
42   * in almost all cases. (One partial exception is an SMBus used to access
43   * "SPD" data for DRAM sticks. Those only use 24c02 EEPROMs.)
44   *
45   * So this driver uses "new style" I2C driver binding, expecting to be
46   * told what devices exist. That may be in arch/X/mach-Y/board-Z.c or
47   * similar kernel-resident tables; or, configuration data coming from
48   * a bootloader.
49   *
50   * Other than binding model, current differences from "eeprom" driver are
51   * that this one handles write access and isn't restricted to 24c02 devices.
52   * It also handles larger devices (32 kbit and up) with two-byte addresses,
53   * which won't work on pure SMBus systems.
54   */
55  
56  struct at24_data {
57  	struct at24_platform_data chip;
58  	int use_smbus;
59  	int use_smbus_write;
60  
61  	ssize_t (*read_func)(struct at24_data *, char *, unsigned int, size_t);
62  	ssize_t (*write_func)(struct at24_data *,
63  			      const char *, unsigned int, size_t);
64  
65  	/*
66  	 * Lock protects against activities from other Linux tasks,
67  	 * but not from changes by other I2C masters.
68  	 */
69  	struct mutex lock;
70  
71  	u8 *writebuf;
72  	unsigned write_max;
73  	unsigned num_addresses;
74  
75  	struct nvmem_config nvmem_config;
76  	struct nvmem_device *nvmem;
77  
78  	/*
79  	 * Some chips tie up multiple I2C addresses; dummy devices reserve
80  	 * them for us, and we'll use them with SMBus calls.
81  	 */
82  	struct i2c_client *client[];
83  };
84  
85  /*
86   * This parameter is to help this driver avoid blocking other drivers out
87   * of I2C for potentially troublesome amounts of time. With a 100 kHz I2C
88   * clock, one 256 byte read takes about 1/43 second which is excessive;
89   * but the 1/170 second it takes at 400 kHz may be quite reasonable; and
90   * at 1 MHz (Fm+) a 1/430 second delay could easily be invisible.
91   *
92   * This value is forced to be a power of two so that writes align on pages.
93   */
94  static unsigned io_limit = 128;
95  module_param(io_limit, uint, 0);
96  

Re: linux-next: manual merge of the net-next tree with the net tree

2017-10-31 Thread Cong Wang
On Tue, Oct 31, 2017 at 5:58 PM, Stephen Rothwell  wrote:
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   net/sched/cls_api.c
>
> between commit:
>
>   822e86d997e4 ("net_sched: remove tcf_block_put_deferred()")
>
> from the net tree and commit:
>
>   8c4083b30e56 ("net: sched: add block bind/unbind notif. and extended block_get/put")
>
> from the net-next tree.

Seems good.

Thanks!


Re: [RFC/RFT PATCH 3/6] ACPI / APEI: Replace ioremap_page_range() with fixmap

2017-10-31 Thread gengdongjiu
On 2017/10/31 23:38, James Morse wrote:
> CC'd people I've seen posting CPER log fragments, could you give this a
> test on your platforms?
Thanks for the fix; I did not find any obvious issues.



[PATCH] irq/core: Fix boot crash when the irqaffinity= boot parameter is passed on CPUMASK_OFFSTACK=y kernels(v1)

2017-10-31 Thread Rakib Mullick
When the irqaffinity= kernel parameter is passed in a CPUMASK_OFFSTACK=y
kernel, it fails to boot, because zalloc_cpumask_var() cannot be used before
initializing the slab allocator to allocate a cpumask.

So, use alloc_bootmem_cpumask_var() instead.

Also do some cleanups while at it: in init_irq_default_affinity() remove
an unnecessary #ifdef.

Change since v0:
* Fix build warning.

Signed-off-by: Rakib Mullick 
Cc: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20171026045800.27087-1-rakib.mull...@gmail.com
Signed-off-by: Ingo Molnar 
---
Patch created against -rc7 (commit 0b07194bb55ed836c2). I found tip had a merge
conflict, so used -rc7 instead.

 kernel/irq/irqdesc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 82afb7e..e97bbae 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -27,7 +27,7 @@ static struct lock_class_key irq_desc_lock_class;
 #if defined(CONFIG_SMP)
 static int __init irq_affinity_setup(char *str)
 {
-	zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
+	alloc_bootmem_cpumask_var(&irq_default_affinity);
 	cpulist_parse(str, irq_default_affinity);
 	/*
 	 * Set at least the boot cpu. We don't want to end up with
@@ -40,10 +40,8 @@ __setup("irqaffinity=", irq_affinity_setup);
 
 static void __init init_irq_default_affinity(void)
 {
-#ifdef CONFIG_CPUMASK_OFFSTACK
-	if (!irq_default_affinity)
+	if (!cpumask_available(irq_default_affinity))
 		zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
-#endif
 	if (cpumask_empty(irq_default_affinity))
 		cpumask_setall(irq_default_affinity);
 }
-- 
2.9.3
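
As a side note on the v0 build warning: when CPUMASK_OFFSTACK is off, a cpumask_var_t is an array rather than a pointer, so testing the variable directly compares an address that can never be NULL. The userspace sketch below illustrates why a helper such as cpumask_available() lets the #ifdef go away. The struct layout and the SKETCH_OFFSTACK macro are made up for illustration; the real definitions live in include/linux/cpumask.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in; the kernel's struct cpumask wraps a bitmap. */
struct cpumask { unsigned long bits[1]; };

#ifdef SKETCH_OFFSTACK
/* Off-stack flavour: the variable is a pointer and starts out NULL. */
typedef struct cpumask *cpumask_var_t;
static bool cpumask_available(cpumask_var_t mask)
{
	return mask != NULL;
}
#else
/* On-stack flavour: the variable is a one-element array, always present. */
typedef struct cpumask cpumask_var_t[1];
static bool cpumask_available(cpumask_var_t mask)
{
	(void)mask;
	return true;
}
#endif

static cpumask_var_t default_affinity;

/* Same shape as the check in init_irq_default_affinity(), minus the #ifdef. */
static bool needs_allocation(void)
{
	return !cpumask_available(default_affinity);
}
```

Built without SKETCH_OFFSTACK, needs_allocation() is always false; built with it, the answer depends on whether an earlier setup step has already allocated the mask.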



linux-next: build warning after merge of the sound-asoc tree

2017-10-31 Thread Stephen Rothwell
Hi all,

After merging the sound-asoc tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

sound/soc/stm/stm32_sai_sub.c: In function 'stm32_sai_hw_params':
sound/soc/stm/stm32_sai_sub.c:485:7: warning: 'cr1' may be used uninitialized in this function [-Wmaybe-uninitialized]
   cr1 |= SAI_XCR1_DS_SET(SAI_DATASIZE_8);
   ^
sound/soc/stm/stm32_sai_sub.c:469:6: note: 'cr1' was declared here
  int cr1, cr1_mask, ret;
  ^

Introduced by commit

  61fb4ff70377 ("ASoC: stm32: sai: Move static settings to DAI init")

-- 
Cheers,
Stephen Rothwell


[PATCH v3 2/2] gpio: aspeed: Add support for reset tolerance

2017-10-31 Thread Andrew Jeffery
Use the new pinconf parameter for state persistence to expose the
associated capability of the Aspeed GPIO controller.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpio-aspeed.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-aspeed.c b/drivers/gpio/gpio-aspeed.c
index 00dc1c020198..3125dcb9211d 100644
--- a/drivers/gpio/gpio-aspeed.c
+++ b/drivers/gpio/gpio-aspeed.c
@@ -60,6 +60,7 @@ struct aspeed_gpio_bank {
 	uint16_t	val_regs;
 	uint16_t	irq_regs;
 	uint16_t	debounce_regs;
+	uint16_t	tolerance_regs;
 	const char	names[4][3];
 };
 
@@ -70,48 +71,56 @@ static const struct aspeed_gpio_bank aspeed_gpio_banks[] = {
 		.val_regs = 0x0000,
.irq_regs = 0x0008,
.debounce_regs = 0x0040,
+   .tolerance_regs = 0x001c,
.names = { "A", "B", "C", "D" },
},
{
.val_regs = 0x0020,
.irq_regs = 0x0028,
.debounce_regs = 0x0048,
+   .tolerance_regs = 0x003c,
.names = { "E", "F", "G", "H" },
},
{
.val_regs = 0x0070,
.irq_regs = 0x0098,
.debounce_regs = 0x00b0,
+   .tolerance_regs = 0x00ac,
.names = { "I", "J", "K", "L" },
},
{
.val_regs = 0x0078,
.irq_regs = 0x00e8,
.debounce_regs = 0x0100,
+   .tolerance_regs = 0x00fc,
.names = { "M", "N", "O", "P" },
},
{
.val_regs = 0x0080,
.irq_regs = 0x0118,
.debounce_regs = 0x0130,
+   .tolerance_regs = 0x012c,
.names = { "Q", "R", "S", "T" },
},
{
.val_regs = 0x0088,
.irq_regs = 0x0148,
.debounce_regs = 0x0160,
+   .tolerance_regs = 0x015c,
.names = { "U", "V", "W", "X" },
},
{
.val_regs = 0x01E0,
.irq_regs = 0x0178,
.debounce_regs = 0x0190,
+   .tolerance_regs = 0x018c,
.names = { "Y", "Z", "AA", "AB" },
},
{
-   .val_regs = 0x01E8,
-   .irq_regs = 0x01A8,
+   .val_regs = 0x01e8,
+   .irq_regs = 0x01a8,
.debounce_regs = 0x01c0,
+   .tolerance_regs = 0x01bc,
.names = { "AC", "", "", "" },
},
 };
@@ -534,6 +543,30 @@ static int aspeed_gpio_setup_irqs(struct aspeed_gpio *gpio,
 	return 0;
 }
 
+static int aspeed_gpio_reset_tolerance(struct gpio_chip *chip,
+					unsigned int offset, bool enable)
+{
+	struct aspeed_gpio *gpio = gpiochip_get_data(chip);
+	const struct aspeed_gpio_bank *bank;
+	unsigned long flags;
+	u32 val;
+
+	bank = to_bank(offset);
+
+	spin_lock_irqsave(&gpio->lock, flags);
+	val = readl(gpio->base + bank->tolerance_regs);
+
+	if (enable)
+		val |= GPIO_BIT(offset);
+	else
+		val &= ~GPIO_BIT(offset);
+
+	writel(val, gpio->base + bank->tolerance_regs);
+	spin_unlock_irqrestore(&gpio->lock, flags);
+
+	return 0;
+}
+
 static int aspeed_gpio_request(struct gpio_chip *chip, unsigned int offset)
 {
 	if (!have_gpio(gpiochip_get_data(chip), offset))
@@ -771,6 +804,8 @@ static int aspeed_gpio_set_config(struct gpio_chip *chip, unsigned int offset,
 			param == PIN_CONFIG_DRIVE_OPEN_SOURCE)
 		/* Return -ENOTSUPP to trigger emulation, as per datasheet */
 		return -ENOTSUPP;
+	else if (param == PIN_CONFIG_PERSIST_STATE)
+		return aspeed_gpio_reset_tolerance(chip, offset, arg);
 
 	return -ENOTSUPP;
 }
-- 
2.11.0
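
To illustrate the read-modify-write in aspeed_gpio_reset_tolerance() outside the kernel, here is a minimal userspace sketch. It replaces the MMIO register with a plain variable, drops the spinlock, and assumes 32 GPIOs per bank so that the low five bits of the offset select the bit. The names fake_tolerance_reg, SKETCH_GPIO_BIT and sketch_reset_tolerance are hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32 lines per bank, so offset & 0x1f picks the bit in the bank. */
#define SKETCH_GPIO_BIT(offset) (UINT32_C(1) << ((offset) & 0x1f))

/* Stand-in for one bank's reset tolerance register. */
static uint32_t fake_tolerance_reg;

/* Set or clear the tolerance bit for one line, leaving the others alone. */
static void sketch_reset_tolerance(unsigned int offset, bool enable)
{
	uint32_t val = fake_tolerance_reg;	/* readl() in the driver */

	if (enable)
		val |= SKETCH_GPIO_BIT(offset);
	else
		val &= ~SKETCH_GPIO_BIT(offset);

	fake_tolerance_reg = val;		/* writel() in the driver */
}
```

In the driver the same sequence runs under the bank spinlock, since two callers updating different bits of the same register would otherwise race.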



[PATCH v3 0/2] gpio: Generalise state persistence

2017-10-31 Thread Andrew Jeffery
Hello,

This series provides an API to configure general GPIO state persistence in
gpiolib. Previously, only sleep persistence was considered, but controllers
like one found in Aspeed BMCs also support persistence of state across
controller resets. There is some prior discussion on v1[1] and the initial
RFC[2], and minor comments on v2[3]. v3 addresses minor issues with comments
and debug statements[4], removing remaining references to reset tolerance.

Please review!

Andrew

[1] https://www.spinics.net/lists/devicetree/msg200027.html
[2] https://www.spinics.net/lists/devicetree/msg199559.html
[3] https://www.spinics.net/lists/kernel/msg2635769.html
[4] https://www.spinics.net/lists/devicetree/msg200040.html

Andrew Jeffery (2):
  gpio: gpiolib: Generalise state persistence beyond sleep
  gpio: aspeed: Add support for reset tolerance

 drivers/gpio/gpio-aspeed.c  | 39 +++--
 drivers/gpio/gpiolib-of.c   |  6 ++--
 drivers/gpio/gpiolib-sysfs.c| 14 +---
 drivers/gpio/gpiolib.c  | 61 ++---
 drivers/gpio/gpiolib.h  |  2 +-
 include/dt-bindings/gpio/gpio.h |  6 ++--
 include/linux/gpio/consumer.h   |  8 +
 include/linux/gpio/machine.h|  4 +--
 include/linux/of_gpio.h |  2 +-
 include/linux/pinctrl/pinconf-generic.h |  2 ++
 10 files changed, 124 insertions(+), 20 deletions(-)

-- 
2.11.0



[PATCH v3 1/2] gpio: gpiolib: Generalise state persistence beyond sleep

2017-10-31 Thread Andrew Jeffery
General support for state persistence is added to gpiolib with the
introduction of a new pinconf parameter to propagate the request to
hardware. The existing persistence support for sleep is adapted to
include hardware support if the GPIO driver provides it. Persistence
continues to be enabled by default; in-kernel consumers can opt out, but
userspace (currently) does not have a choice.

The *_SLEEP_MAY_LOSE_VALUE and *_SLEEP_MAINTAIN_VALUE symbols are
renamed, dropping the SLEEP prefix to reflect that the concept is no
longer sleep-specific.  I feel that renaming to just *_MAY_LOSE_VALUE
could initially be misinterpreted, so I've further changed the symbols
to *_TRANSITORY and *_PERSISTENT to address this.

The sysfs interface is modified only to keep consistency with the
chardev interface in enforcing persistence for userspace exports.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpiolib-of.c   |  6 ++--
 drivers/gpio/gpiolib-sysfs.c| 14 +---
 drivers/gpio/gpiolib.c  | 61 ++---
 drivers/gpio/gpiolib.h  |  2 +-
 include/dt-bindings/gpio/gpio.h |  6 ++--
 include/linux/gpio/consumer.h   |  8 +
 include/linux/gpio/machine.h|  4 +--
 include/linux/of_gpio.h |  2 +-
 include/linux/pinctrl/pinconf-generic.h |  2 ++
 9 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index e0d59e61b52f..4a2b8d3397c7 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -153,8 +153,8 @@ struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
 			*flags |= GPIO_OPEN_SOURCE;
 	}
 
-	if (of_flags & OF_GPIO_SLEEP_MAY_LOSE_VALUE)
-		*flags |= GPIO_SLEEP_MAY_LOSE_VALUE;
+	if (of_flags & OF_GPIO_TRANSITORY)
+		*flags |= GPIO_TRANSITORY;
 
 	return desc;
 }
@@ -214,6 +214,8 @@ static struct gpio_desc *of_parse_own_gpio(struct device_node *np,
 
 	if (xlate_flags & OF_GPIO_ACTIVE_LOW)
 		*lflags |= GPIO_ACTIVE_LOW;
+	if (xlate_flags & OF_GPIO_TRANSITORY)
+		*lflags |= GPIO_TRANSITORY;
 
 	if (of_property_read_bool(np, "input"))
 		*dflags |= GPIOD_IN;
diff --git a/drivers/gpio/gpiolib-sysfs.c b/drivers/gpio/gpiolib-sysfs.c
index 3f454eaf2101..0bd472ffb072 100644
--- a/drivers/gpio/gpiolib-sysfs.c
+++ b/drivers/gpio/gpiolib-sysfs.c
@@ -474,11 +474,15 @@ static ssize_t export_store(struct class *class,
 			status = -ENODEV;
 		goto done;
 	}
-	status = gpiod_export(desc, true);
-	if (status < 0)
-		gpiod_free(desc);
-	else
-		set_bit(FLAG_SYSFS, &desc->flags);
+
+	status = gpiod_set_transitory(desc, false);
+	if (!status) {
+		status = gpiod_export(desc, true);
+		if (status < 0)
+			gpiod_free(desc);
+		else
+			set_bit(FLAG_SYSFS, &desc->flags);
+	}
 
 done:
 	if (status)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 3827f0767101..a5e81dc03aba 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -503,6 +503,10 @@ static int linehandle_create(struct gpio_device *gdev, void __user *ip)
 		if (lflags & GPIOHANDLE_REQUEST_OPEN_SOURCE)
 			set_bit(FLAG_OPEN_SOURCE, &desc->flags);
 
+		ret = gpiod_set_transitory(desc, false);
+		if (ret < 0)
+			goto out_free_descs;
+
 		/*
 		 * Lines have to be requested explicitly for input
 		 * or output, else the line will be treated "as is".
@@ -2424,6 +2428,49 @@ int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce)
 EXPORT_SYMBOL_GPL(gpiod_set_debounce);
 
 /**
+ * gpiod_set_transitory - Lose or retain GPIO state on suspend or reset
+ * @desc: descriptor of the GPIO for which to configure persistence
+ * @transitory: True to lose state on suspend or reset, false for persistence
+ *
+ * Returns:
+ * 0 on success, otherwise a negative error code.
+ */
+int gpiod_set_transitory(struct gpio_desc *desc, bool transitory)
+{
+	struct gpio_chip *chip;
+	unsigned long packed;
+	int gpio;
+	int rc;
+
+	/*
+	 * Handle FLAG_TRANSITORY first, enabling queries to gpiolib for
+	 * persistence state.
+	 */
+	if (transitory)
+		set_bit(FLAG_TRANSITORY, &desc->flags);
+	else
+		clear_bit(FLAG_TRANSITORY, &desc->flags);
+
+	/* If the driver supports it, set the persistence state now */
+	chip = desc->gdev->chip;
+	if (!chip->set_config)
+		return 0;
+
+	packed = pinconf_to_config_packed(PIN_CONFIG_PERSIST_STATE,
+					  !transitory);
+	gpio = gpio_chip_hwgpio(desc);
+   

[PATCH v3 2/2] gpio: aspeed: Add support for reset tolerance

2017-10-31 Thread Andrew Jeffery
Use the new pinconf parameter for state persistence to expose the
associated capability of the Aspeed GPIO controller.

Signed-off-by: Andrew Jeffery 
---
 drivers/gpio/gpio-aspeed.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-aspeed.c b/drivers/gpio/gpio-aspeed.c
index 00dc1c020198..3125dcb9211d 100644
--- a/drivers/gpio/gpio-aspeed.c
+++ b/drivers/gpio/gpio-aspeed.c
@@ -60,6 +60,7 @@ struct aspeed_gpio_bank {
uint16_tval_regs;
uint16_tirq_regs;
uint16_tdebounce_regs;
+   uint16_ttolerance_regs;
const char  names[4][3];
 };
 
@@ -70,48 +71,56 @@ static const struct aspeed_gpio_bank aspeed_gpio_banks[] = {
.val_regs = 0x,
.irq_regs = 0x0008,
.debounce_regs = 0x0040,
+   .tolerance_regs = 0x001c,
.names = { "A", "B", "C", "D" },
},
{
.val_regs = 0x0020,
.irq_regs = 0x0028,
.debounce_regs = 0x0048,
+   .tolerance_regs = 0x003c,
.names = { "E", "F", "G", "H" },
},
{
.val_regs = 0x0070,
.irq_regs = 0x0098,
.debounce_regs = 0x00b0,
+   .tolerance_regs = 0x00ac,
.names = { "I", "J", "K", "L" },
},
{
.val_regs = 0x0078,
.irq_regs = 0x00e8,
.debounce_regs = 0x0100,
+   .tolerance_regs = 0x00fc,
.names = { "M", "N", "O", "P" },
},
{
.val_regs = 0x0080,
.irq_regs = 0x0118,
.debounce_regs = 0x0130,
+   .tolerance_regs = 0x012c,
.names = { "Q", "R", "S", "T" },
},
{
.val_regs = 0x0088,
.irq_regs = 0x0148,
.debounce_regs = 0x0160,
+   .tolerance_regs = 0x015c,
.names = { "U", "V", "W", "X" },
},
{
.val_regs = 0x01E0,
.irq_regs = 0x0178,
.debounce_regs = 0x0190,
+   .tolerance_regs = 0x018c,
.names = { "Y", "Z", "AA", "AB" },
},
{
-   .val_regs = 0x01E8,
-   .irq_regs = 0x01A8,
+   .val_regs = 0x01e8,
+   .irq_regs = 0x01a8,
.debounce_regs = 0x01c0,
+   .tolerance_regs = 0x01bc,
.names = { "AC", "", "", "" },
},
 };
@@ -534,6 +543,30 @@ static int aspeed_gpio_setup_irqs(struct aspeed_gpio *gpio,
return 0;
 }
 
+static int aspeed_gpio_reset_tolerance(struct gpio_chip *chip,
+   unsigned int offset, bool enable)
+{
+   struct aspeed_gpio *gpio = gpiochip_get_data(chip);
+   const struct aspeed_gpio_bank *bank;
+   unsigned long flags;
+   u32 val;
+
+   bank = to_bank(offset);
+
+   spin_lock_irqsave(>lock, flags);
+   val = readl(gpio->base + bank->tolerance_regs);
+
+   if (enable)
+   val |= GPIO_BIT(offset);
+   else
+   val &= ~GPIO_BIT(offset);
+
+   writel(val, gpio->base + bank->tolerance_regs);
+   spin_unlock_irqrestore(>lock, flags);
+
+   return 0;
+}
+
 static int aspeed_gpio_request(struct gpio_chip *chip, unsigned int offset)
 {
if (!have_gpio(gpiochip_get_data(chip), offset))
@@ -771,6 +804,8 @@ static int aspeed_gpio_set_config(struct gpio_chip *chip, 
unsigned int offset,
param == PIN_CONFIG_DRIVE_OPEN_SOURCE)
/* Return -ENOTSUPP to trigger emulation, as per datasheet */
return -ENOTSUPP;
+   else if (param == PIN_CONFIG_PERSIST_STATE)
+   return aspeed_gpio_reset_tolerance(chip, offset, arg);
 
return -ENOTSUPP;
 }
-- 
2.11.0
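
The read-modify-write at the heart of aspeed_gpio_reset_tolerance() can be
tried out in userspace with a small sketch. This is only an illustration under
stated assumptions: the GPIO_BIT() definition below is a guess at what the
driver's macro does (32 lines per register), set_reset_tolerance() is a
hypothetical name, and a plain variable stands in for the memory-mapped
tolerance register that the patch accesses with readl()/writel() under the
spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed helper, mirroring how the driver appears to use GPIO_BIT():
 * 32 lines share one 32-bit register, so only the low 5 bits of the
 * line offset select the bit.
 */
#define GPIO_BIT(x) (UINT32_C(1) << ((x) & 0x1f))

/*
 * Hypothetical stand-in for the readl()/writel() pair in
 * aspeed_gpio_reset_tolerance(): update one line's tolerance bit in a
 * plain variable that plays the role of the bank's tolerance register.
 */
static void set_reset_tolerance(uint32_t *reg, unsigned int offset, bool enable)
{
	assert(reg != NULL);

	if (enable)
		*reg |= GPIO_BIT(offset);
	else
		*reg &= ~GPIO_BIT(offset);
}
```

Only the addressed line's bit changes; the other lines sharing the register
keep their setting, which is why the patch reads the register before writing
it back.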



[PATCH v3 0/2] gpio: Generalise state persistence

2017-10-31 Thread Andrew Jeffery
Hello,

This series provides an API to configure general GPIO state persistence in
gpiolib. Previously, only sleep persistence was considered, but controllers
like the one found in Aspeed BMCs also support persistence of state across
controller resets. There is some prior discussion on v1[1] and the initial
RFC[2], and minor comments on v2[3]. v3 addresses minor issues with comments
and debug statements[4], removing the remaining references to reset tolerance.

Please review!

Andrew

[1] https://www.spinics.net/lists/devicetree/msg200027.html
[2] https://www.spinics.net/lists/devicetree/msg199559.html
[3] https://www.spinics.net/lists/kernel/msg2635769.html
[4] https://www.spinics.net/lists/devicetree/msg200040.html

Andrew Jeffery (2):
  gpio: gpiolib: Generalise state persistence beyond sleep
  gpio: aspeed: Add support for reset tolerance

 drivers/gpio/gpio-aspeed.c  | 39 +++--
 drivers/gpio/gpiolib-of.c   |  6 ++--
 drivers/gpio/gpiolib-sysfs.c| 14 +---
 drivers/gpio/gpiolib.c  | 61 ++---
 drivers/gpio/gpiolib.h  |  2 +-
 include/dt-bindings/gpio/gpio.h |  6 ++--
 include/linux/gpio/consumer.h   |  8 +
 include/linux/gpio/machine.h|  4 +--
 include/linux/of_gpio.h |  2 +-
 include/linux/pinctrl/pinconf-generic.h |  2 ++
 10 files changed, 124 insertions(+), 20 deletions(-)

-- 
2.11.0



linux-next: manual merge of the sound-asoc tree with the drm tree

2017-10-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the sound-asoc tree got a conflict in:

  drivers/gpu/drm/amd/include/amd_shared.h

between commit:

  cfa289fd4986 ("drm/amdgpu: rename amdgpu_dpm_funcs to amd_pm_funcs")

from the drm tree and commit:

  f674bd281460 ("drm/amdgpu Moving amdgpu asic types to a separate file")

from the sound-asoc tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/include/amd_shared.h
index de6fc2731b98,3a49fbd8baf8..
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@@ -23,36 -23,9 +23,11 @@@
  #ifndef __AMD_SHARED_H__
  #define __AMD_SHARED_H__
  
- #define AMD_MAX_USEC_TIMEOUT  20  /* 200 ms */
+ #include 
  
 +struct seq_file;
 +
- /*
-  * Supported ASIC types
-  */
- enum amd_asic_type {
-   CHIP_TAHITI = 0,
-   CHIP_PITCAIRN,
-   CHIP_VERDE,
-   CHIP_OLAND,
-   CHIP_HAINAN,
-   CHIP_BONAIRE,
-   CHIP_KAVERI,
-   CHIP_KABINI,
-   CHIP_HAWAII,
-   CHIP_MULLINS,
-   CHIP_TOPAZ,
-   CHIP_TONGA,
-   CHIP_FIJI,
-   CHIP_CARRIZO,
-   CHIP_STONEY,
-   CHIP_POLARIS10,
-   CHIP_POLARIS11,
-   CHIP_POLARIS12,
-   CHIP_VEGA10,
-   CHIP_RAVEN,
-   CHIP_LAST,
- };
+ #define AMD_MAX_USEC_TIMEOUT  20  /* 200 ms */
  
  /*
   * Chip flags


[PATCH 2/2] Documentation: fsl: dspi: Add a compatible string for ls1088a DSPI

2017-10-31 Thread Zhiqiang Hou
From: Hou Zhiqiang 

Add a new compatible string "fsl,ls1088a-dspi".

Signed-off-by: Hou Zhiqiang 
---
 Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt 
b/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
index dcc7eaada511..5fc467211cc6 100644
--- a/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
+++ b/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
@@ -6,6 +6,7 @@ Required properties:
or
"fsl,ls2080a-dspi" followed by "fsl,ls2085a-dspi"
"fsl,ls1012a-dspi" followed by "fsl,ls1021a-v1.0-dspi"
+   "fsl,ls1088a-dspi" followed by "fsl,ls2085a-dspi"
 - reg : Offset and length of the register set for the device
 - interrupts : Should contain SPI controller interrupt
 - clocks: from common clock binding: handle to dspi clock.
-- 
2.14.1



[PATCH 1/2] arm64: dts: ls1088a: add DT nodes for DSPI support

2017-10-31 Thread Zhiqiang Hou
From: Hou Zhiqiang 

Signed-off-by: Hou Zhiqiang 
---
 arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts | 28 +++
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi| 13 +++
 2 files changed, 41 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts
index 30128051d0c0..cf5b85b93ae6 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts
@@ -134,6 +134,34 @@
};
 };
 
+ {
+   status = "okay";
+
+   dflash0: n25q128a {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "n25q128a11", "jedec,spi-nor";
+   reg = <0>;
+   spi-max-frequency = <300>;
+   };
+
+   dflash1: sst25wf040b {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "sst,sst25wf040b", "jedec,spi-nor";
+   reg = <1>;
+   spi-max-frequency = <300>;
+   };
+
+   dflash2: en25s64 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "eon,en25s64", "jedec,spi-nor";
+   reg = <2>;
+   spi-max-frequency = <300>;
+   };
+};
+
  {
status = "okay";
 };
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
index bd80e9a2e67c..f5ed3878abb7 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
@@ -276,6 +276,19 @@
};
};
 
+   dspi: dspi@210 {
+   compatible = "fsl,ls1088a-dspi", "fsl,ls2085a-dspi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x0 0x210 0x0 0x1>;
+   interrupts = <0 26 0x4>; /* Level high type */
+   clocks = < 4 3>;
+   clock-names = "dspi";
+   spi-num-chipselects = <5>;
+   bus-num = <0>;
+   status = "disabled";
+   };
+
duart0: serial@21c0500 {
compatible = "fsl,ns16550", "ns16550a";
reg = <0x0 0x21c0500 0x0 0x100>;
-- 
2.14.1



[PATCH 0/2] arm64: dts: Add ls1088a DSPI device tree nodes

2017-10-31 Thread Zhiqiang Hou
From: Hou Zhiqiang 

The LS1088A reuses the LS2085A DSPI driver, so this patchset only adds the
device tree nodes and a compatible entry to the documentation.

Hou Zhiqiang (2):
  arm64: dts: ls1088a: add DT nodes for DSPI support
  Documentation: fsl: dspi: Add a compatible string for ls1088a DSPI

 .../devicetree/bindings/spi/spi-fsl-dspi.txt   |  1 +
 arch/arm64/boot/dts/freescale/fsl-ls1088a-qds.dts  | 28 ++
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 13 ++
 3 files changed, 42 insertions(+)

-- 
2.14.1



linux-next: build warning after merge of the drm-msm tree

2017-10-31 Thread Stephen Rothwell
Hi Rob,

After merging the drm-msm tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

In file included from include/drm/drm_mm.h:49:0,
 from include/drm/drmP.h:73,
 from drivers/gpu/drm/msm/msm_drv.h:37,
 from drivers/gpu/drm/msm/msm_gpu.h:24,
 from drivers/gpu/drm/msm/msm_gpu.c:18:
drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_init':
drivers/gpu/drm/msm/msm_gpu.c:780:31: warning: format '%lu' expects argument of 
type 'long unsigned int', but argument 7 has type 'unsigned int' [-Wformat=]
   DRM_DEV_INFO_ONCE(drm->dev, "Only creating %lu ringbuffers\n",
   ^
include/drm/drm_print.h:237:60: note: in definition of macro 'DRM_DEV_INFO'
  drm_dev_printk(dev, KERN_INFO, DRM_UT_NONE, __func__, "", fmt, \
^
drivers/gpu/drm/msm/msm_gpu.c:780:3: note: in expansion of macro 
'DRM_DEV_INFO_ONCE'
   DRM_DEV_INFO_ONCE(drm->dev, "Only creating %lu ringbuffers\n",
   ^

Introduced by commit

  f97decac5f4c ("drm/msm: Support multiple ringbuffers")

-- 
Cheers,
Stephen Rothwell
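
For reference, this class of -Wformat warning usually comes from printing a
size_t with '%lu': on 32-bit ARM size_t is 'unsigned int', while on 64-bit
builds it is 'unsigned long'. The sketch below assumes the offending argument
is a size_t (for example an ARRAY_SIZE() result); that is an assumption, since
the warning only shows the resolved type. format_ring_count() and the local
ARRAY_SIZE() are hypothetical userspace stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE(); it yields a size_t. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Format a ring count passed as size_t. '%zu' matches size_t on both
 * 32-bit and 64-bit targets, whereas '%lu' only matches where size_t
 * happens to be unsigned long.
 */
static int format_ring_count(char *buf, size_t buflen, size_t nr_rings)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "Only creating %zu ringbuffers\n", nr_rings);
}
```

If the argument is really a plain 'unsigned int', '%u' would be the matching
specifier instead.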


linux-next: manual merge of the drm-misc tree with the drm tree

2017-10-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm-misc tree got a conflict in:

  include/drm/drmP.h

between commit:

  e7646f84ad4f ("drm: Add new LEASE debug level")

from the drm tree and commit:

  02c9656b2f0d ("drm: Move debug macros out of drmP.h")

from the drm-misc tree.

I fixed it up (I used the drm-misc version of the file and added the below
merge fix patch) and can carry the fix as necessary. This is now fixed
as far as linux-next is concerned, but any non trivial conflicts should
be mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

From: Stephen Rothwell 
Date: Wed, 1 Nov 2017 14:33:07 +1100
Subject: [PATCH] drm-misc: merge fix up for DEBUG printing macros move

Signed-off-by: Stephen Rothwell 
---
 include/drm/drm_print.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 7b9c86a6ca3e..edcea83a5050 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -171,6 +171,7 @@ static inline struct drm_printer drm_debug_printer(const 
char *prefix)
 #define DRM_UT_ATOMIC  0x10
 #define DRM_UT_VBL 0x20
 #define DRM_UT_STATE   0x40
+#define DRM_UT_LEASE   0x80
 
 __printf(6, 7)
 void drm_dev_printk(const struct device *dev, const char *level,
@@ -287,6 +288,9 @@ void drm_printk(const char *level, unsigned int category,
 #define DRM_DEBUG_VBL(fmt, ...)\
drm_printk(KERN_DEBUG, DRM_UT_VBL, fmt, ##__VA_ARGS__)
 
+#define DRM_DEBUG_LEASE(fmt, ...)  \
+   drm_printk(KERN_DEBUG, DRM_UT_LEASE, fmt, ##__VA_ARGS__)
+
 #define _DRM_DEV_DEFINE_DEBUG_RATELIMITED(dev, level, fmt, args...)\
 ({ \
static DEFINE_RATELIMIT_STATE(_rs,  \


-- 
Cheers,
Stephen Rothwell


[PATCH v3] tracing: Allocate mask_str buffer dynamically

2017-10-31 Thread changbin . du
From: Changbin Du 

The default NR_CPUS can be very large, but the actual possible nr_cpu_ids
is usually very small. On my x86 distribution, NR_CPUS is 8192 and
nr_cpu_ids is 4, so about 2 pages are wasted.

Most machines don't have that many CPUs, so defining an array of NR_CPUS
bytes just wastes memory. Allocate the buffer dynamically when needed instead.

The exact buffer size should be:
  DIV_ROUND_UP(nr_cpu_ids, 4) + nr_cpu_ids/32 + 2;

Example output:
  ff,

With this change, the mutex tracing_cpumask_update_lock, which was used to
protect mask_str, can also be removed.
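
As a rough illustration of the size formula above, the following userspace
sketch computes the buffer length for a given CPU count. mask_str_len() is a
hypothetical helper name, and DIV_ROUND_UP() is redefined locally as a
stand-in for the kernel macro; neither is part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Bytes reserved for printing an nr_cpu_ids-bit mask with "%*pb\n":
 * one hex digit per 4 bits, one ',' per 32 bits, plus two bytes for
 * the newline and the terminating '\0'.
 */
static size_t mask_str_len(unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids > 0);
	return DIV_ROUND_UP(nr_cpu_ids, 4) + nr_cpu_ids / 32 + 2;
}
```

With nr_cpu_ids = 4 this gives 3 bytes, compared with the 8193 bytes of the
static array when NR_CPUS is 8192. The comma term counts one separator per
32 bits, so for exact multiples of 32 the result appears to be one byte larger
than strictly needed, which is harmless.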

Signed-off-by: Changbin Du 
Cc: Steven Rostedt 

---
v3:
  - remove tracing_cpumask_update_lock which was used to protect mask_str. 
(Rostedt)
v2:
  - remove 'static' declaration.
  - fix buffer size.
---
 kernel/trace/trace.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 752e5da..5d2ec80 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4178,37 +4178,30 @@ static const struct file_operations show_traces_fops = {
.llseek = seq_lseek,
 };
 
-/*
- * The tracer itself will not take this lock, but still we want
- * to provide a consistent cpumask to user-space:
- */
-static DEFINE_MUTEX(tracing_cpumask_update_lock);
-
-/*
- * Temporary storage for the character representation of the
- * CPU bitmask (and one more byte for the newline):
- */
-static char mask_str[NR_CPUS + 1];
-
 static ssize_t
 tracing_cpumask_read(struct file *filp, char __user *ubuf,
 size_t count, loff_t *ppos)
 {
struct trace_array *tr = file_inode(filp)->i_private;
+   char *mask_str;
int len;
 
-   mutex_lock(_cpumask_update_lock);
+   /* Bitmap, ',' and two more bytes for the newline and '\0'. */
+   len = DIV_ROUND_UP(nr_cpu_ids, 4) + nr_cpu_ids/32 + 2;
+   mask_str = kmalloc(len, GFP_KERNEL);
+   if (!mask_str)
+   return -ENOMEM;
 
-   len = snprintf(mask_str, count, "%*pb\n",
+   len = snprintf(mask_str, len, "%*pb\n",
   cpumask_pr_args(tr->tracing_cpumask));
if (len >= count) {
count = -EINVAL;
goto out_err;
}
-   count = simple_read_from_buffer(ubuf, count, ppos, mask_str, NR_CPUS+1);
+   count = simple_read_from_buffer(ubuf, count, ppos, mask_str, len);
 
 out_err:
-   mutex_unlock(_cpumask_update_lock);
+   kfree(mask_str);
 
return count;
 }
@@ -4228,8 +4221,6 @@ tracing_cpumask_write(struct file *filp, const char 
__user *ubuf,
if (err)
goto err_unlock;
 
-   mutex_lock(_cpumask_update_lock);
-
local_irq_disable();
arch_spin_lock(>max_lock);
for_each_tracing_cpu(cpu) {
@@ -4252,8 +4243,6 @@ tracing_cpumask_write(struct file *filp, const char 
__user *ubuf,
local_irq_enable();
 
cpumask_copy(tr->tracing_cpumask, tracing_cpumask_new);
-
-   mutex_unlock(_cpumask_update_lock);
free_cpumask_var(tracing_cpumask_new);
 
return count;
-- 
2.7.4



Re: [PATCH] atm: iphase: Fix space before '[' error.

2017-10-31 Thread David Miller
From: Arvind Yadav 
Date: Mon, 30 Oct 2017 21:22:03 +0530

> Fix checkpatch.pl error:
> ERROR: space prohibited before open square bracket '['.
> 
> Signed-off-by: Arvind Yadav 

Applied.


Re: [PATCH 2/3] thermal: int340x: processor_thermal: Add Coffee Lake support

2017-10-31 Thread Zhang Rui
On Thu, 2017-10-19 at 14:51 -0700, Srinivas Pandruvada wrote:
> Add new PCI ID for the Coffee Lake processor thermal device.
> 
> Signed-off-by: Srinivas Pandruvada
> ---
>  drivers/thermal/int340x_thermal/processor_thermal_device.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git
> a/drivers/thermal/int340x_thermal/processor_thermal_device.c
> b/drivers/thermal/int340x_thermal/processor_thermal_device.c
> index e724a23..1d9f524 100644
> --- a/drivers/thermal/int340x_thermal/processor_thermal_device.c
> +++ b/drivers/thermal/int340x_thermal/processor_thermal_device.c
> @@ -32,6 +32,7 @@
>  
>  /* CannonLake thermal reporting device */
>  #define PCI_DEVICE_ID_PROC_CNL_THERMAL   0x5a03
> +#define PCI_DEVICE_ID_PROC_CFL_THERMAL   0x3E83
>  
Shouldn't it also be added to proc_thermal_pci_ids[]?

thanks,
rui
>  /* Braswell thermal reporting device */
>  #define PCI_DEVICE_ID_PROC_BSW_THERMAL   0x22DC


Re: [PATCH net-next 0/6] net: ppv2: various improvements

2017-10-31 Thread David Miller
From: Antoine Tenart 
Date: Mon, 30 Oct 2017 11:23:27 +0100

> This series includes various patches improving the Marvell PPv2 driver.
> I send them as a series to avoid any possible merge conflict.
> 
> - Patches 1 and 2 improve the initialization of the Tx and Rx FIFOs.
> - Patch 3 initializes the RSS table to evenly distribute the ingress
>   packets across multiple Rx queues based on their hashes.
> - Patch 4 limits the number of TSO segments sent to the driver, to avoid
>   having more segments to handle than the corresponding number of
>   available descriptors.
> - Patches 5 and 6 are cosmetic improvements.
> 
> This applies on today's net-next branch. The patches were tested
> extensively (I ran iperf and http downloads in parallel, transferring
> TBs of data).

Series applied, thanks.


Re: Kernel crash in free_pipe_info()

2017-10-31 Thread Cong Wang
On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds
 wrote:
> On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang  wrote:
>>
>> 1. The faulty addresses are all near 0001, with one exception
>> of null (which is the most recent one)
>
> Well, they're at 8(%rax), except for that last case.
>
> And in every case (_including_ that last case), %rax has a very
> interesting pattern. That's the (bad) buf->ops pointer that was
> loaded from the somehow corrupted "buf".
>
> The values in all cases are
>
> fffa
> fffd
> fff1
> fff7
> fff4
> fffa
> fffd
> fffd
> fffa
> ffe8
> fff1
> fff7
>
> which kind of looks like a 32-bit error value. So we have (n, val, (errno)):
>
>   1 -24 (EMFILE)
>   2 -15 (ENOTBLK)
>   1 -12 (ENOMEM)
>   2 -9 (EBADF)
>   3 -6 (ENXIO)
>   3 -3 (ESRCH)
>
> none of which makes any sense to me, but it's an interesting pattern
> nonetheless.


Yeah, good find!
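
For reference, the decoding in the quoted table can be reproduced with a short Python sketch. It treats each listed value as a two's-complement number of its printed width and counts how often each result occurs. The helper names as_signed and tally are illustrative and do not come from the thread. Sign-extending the same bit patterns to 32 bits (for example 0xfffffffa) gives identical numbers. The errno names are taken from Python's errno module, so they reflect the numbering of the host the script runs on.

```python
import errno
from collections import Counter

# Register values exactly as they appear in the listing above.
RAW_VALUES = [
    "fffa", "fffd", "fff1", "fff7", "fff4", "fffa",
    "fffd", "fffd", "fffa", "ffe8", "fff1", "fff7",
]


def as_signed(hex_digits):
    """Interpret a hex string as a two's-complement integer of its own width."""
    bits = 4 * len(hex_digits)
    value = int(hex_digits, 16)
    # If the top bit is set, the value is negative: subtract 2**bits.
    return value - (1 << bits) if value >> (bits - 1) else value


def tally(values):
    """Return a Counter mapping each signed value to its occurrence count."""
    return Counter(as_signed(v) for v in values)


if __name__ == "__main__":
    for val, n in sorted(tally(RAW_VALUES).items()):
        # errno.errorcode maps positive errno numbers to symbolic names.
        name = errno.errorcode.get(-val, "?")
        print(f"{n:3d} {val:4d} ({name})")
```

The counts should match the table: three each of -6 and -3, two each of -15 and -9, and one each of -12 and -24.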


>
>> 2. R12 register, which should map to the local variable 'i', is always 0x8
>> at the time of crash.
>
> So _if_ this is some kind of use-after-free thing, and the allocation
> got re-used for something else, that might just be related to whatever
> ends up being the offset that is filled in with the (int) error
> number.
>
> Except the offset is that %r12*0x28+0x10, so we're talking a byte
> offset of 336 bytes into the allocation, and apparently the eight
> previous (0-7) iterations were fine.
>
> Which is really odd.
>
> I'm not seeing anything that makes sense. I'll have to think about this.
>
> I'm assuming you don't have slub debugging enabled, and no way to
> enable it and try to catch this?

We enable it at compile-time but not at run-time:

CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set

I can try to manually add slub_debug to the boot parameters, but I still
have no idea how or when this bug can be triggered again.
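
The difference between building SLUB debugging in and switching it on can be checked mechanically. The Python sketch below parses the config fragment quoted above and reports which state applies. parse_kconfig and slub_debug_state are hypothetical helper names, and reading CONFIG_SLUB_DEBUG_ON as "debugging enabled by default at boot" follows that option's Kconfig help text.

```python
SAMPLE = """\
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
"""

NOT_SET = " is not set"


def parse_kconfig(text):
    """Map CONFIG_* names to their value; options marked 'is not set' map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(NOT_SET)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def slub_debug_state(options):
    """Classify the SLUB debugging setup described by a parsed config."""
    if options.get("CONFIG_SLUB_DEBUG") != "y":
        return "not built"
    if options.get("CONFIG_SLUB_DEBUG_ON") == "y":
        return "on by default"
    return "built, off until slub_debug is passed at boot"


if __name__ == "__main__":
    print(slub_debug_state(parse_kconfig(SAMPLE)))
```

For the fragment above this reports the third state, which is why adding slub_debug to the kernel command line is the remaining step.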


Thanks!


Re: [PATCH] x86, build: Improve the isolinux searching of isoimage generation

2017-10-31 Thread Masahiro Yamada
2017-10-31 18:39 GMT+09:00 Ingo Molnar :
>
> * changbin...@intel.com  wrote:
>
>> From: Changbin Du 
>>
>> Recently I failed to build the isoimage target, because the path of isolinux.bin
>> changed to /usr/xxx/ISOLINUX/isolinux.bin, and that of ldlinux.c32
>> changed to /usr/xxx/syslinux/modules/bios/ldlinux.c32.
>>
>> This patch improves the file search:
>>   - Don't print the raw shell commands. It doesn't make sense to show the
>>     entire big block.
>>   - Show an error message instead of failing silently.
>>   - Add the new paths above.
>>
>> Now it becomes:
>> Kernel: arch/x86/boot/bzImage is ready  (#62)
>> rm -rf arch/x86/boot/isoimage
>> mkdir arch/x86/boot/isoimage
>> Using /usr/lib/ISOLINUX/isolinux.bin
>> Using /usr/lib/syslinux/modules/bios/ldlinux.c32
>> cp arch/x86/boot/bzImage arch/x86/boot/isoimage/linux
>> ...
>>
>> Before:
>> Kernel: arch/x86/boot/bzImage is ready  (#63)
>> rm -rf arch/x86/boot/isoimage
>> mkdir arch/x86/boot/isoimage
>> for i in lib lib64 share end ; do \
>>   if [ -f /usr/$i/syslinux/isolinux.bin ] ; then \
>>     cp /usr/$i/syslinux/isolinux.bin arch/x86/boot/isoimage ; \
>>     if [ -f /usr/$i/syslinux/ldlinux.c32 ]; then \
>>       cp /usr/$i/syslinux/ldlinux.c32 arch/x86/boot/isoimage ; \
>>     fi ; \
>>     break ; \
>>   fi ; \
>>   if [ $i = end ] ; then exit 1 ; fi ; \
>> done
>> arch/x86/boot/Makefile:161: recipe for target 'isoimage' failed
>> make[1]: *** [isoimage] Error 1
>
> I like these changes. Could we please further improve it: for example the boot
> image build messages are still pretty unstructured, while regular build system
> messages come in the following format:
>
>   CC  arch/x86/events/msr.o
>   RELOCS  arch/x86/realmode/rm/realmode.relocs
>   OBJCOPY arch/x86/realmode/rm/realmode.bin
>   CC  arch/x86/kernel/signal.o
>   AS  arch/x86/realmode/rmpiggy.o
>   CC  ipc/msg.o
>   AR  arch/x86/ia32/built-in.o
>   CC  arch/x86/events/amd/iommu.o
>   CC  init/do_mounts.o
>   AR  arch/x86/realmode/built-in.o
>
> So instead of:
>
>> Kernel: arch/x86/boot/bzImage is ready  (#62)
>> rm -rf arch/x86/boot/isoimage
>> mkdir arch/x86/boot/isoimage
>> Using /usr/lib/ISOLINUX/isolinux.bin
>> Using /usr/lib/syslinux/modules/bios/ldlinux.c32
>> cp arch/x86/boot/bzImage arch/x86/boot/isoimage/linux
>
> Could we make it something more streamlined and similar to the rest of the
> build as well, like:
>
>   GEN arch/x86/boot/bzImage
>   GEN arch/x86/boot/isoimage
>   GEN arch/x86/boot/isoimage/linux
>
> I.e. only mention the new files built, with an appropriate prefix.
>
> I've Cc:-ed the kbuild maintainers, maybe they have a better suggestion
> instead of the 'GEN' abbreviation?
>

Generally, the abbreviation is the tool that has processed the target,
but if you do not find an appropriate one, 'GEN' is fine.
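
The search behaviour described in the quoted patch (try several locations, report which file is used, and fail with a message rather than silently) can be sketched in Python as follows. This is not the Makefile change itself. The function names are illustrative, and applying the lib, lib64 and share directories from the old loop to the new ISOLINUX layout is an assumption.

```python
import os


def find_first(candidates):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def isolinux_candidates(root="/usr", dirs=("lib", "lib64", "share")):
    """List old-style and new-style locations (directory list is an assumption)."""
    paths = []
    for d in dirs:
        paths.append(os.path.join(root, d, "syslinux", "isolinux.bin"))
        paths.append(os.path.join(root, d, "ISOLINUX", "isolinux.bin"))
    return paths


def locate_isolinux(root="/usr"):
    """Print and return the file that will be used, or raise with a clear message."""
    found = find_first(isolinux_candidates(root))
    if found is None:
        raise FileNotFoundError("isolinux.bin not found; is syslinux installed?")
    print(f"Using {found}")
    return found


if __name__ == "__main__":
    try:
        locate_isolinux()
    except FileNotFoundError as err:
        print(f"error: {err}")
```

Keeping the candidate list separate from the lookup means a new distribution layout only needs one more entry in the list.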




-- 
Best Regards
Masahiro Yamada


Re: [PATCH] net: hns: set correct return value

2017-10-31 Thread David Miller
From: Pan Bian 
Date: Mon, 30 Oct 2017 16:50:01 +0800

> The function of_parse_phandle() returns a NULL pointer if it cannot
> resolve a phandle property to a device_node pointer. In function
> hns_nic_dev_probe(), its return value is passed to PTR_ERR to extract
> the error code. However, in this case, the extracted error code will
> always be zero, which is unexpected.
> 
> Signed-off-by: Pan Bian 

Applied.
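
To see why the extracted code is always zero, here is a userspace model in Python of the kernel's pointer/errno encoding. It assumes 64-bit pointers and mirrors the semantics of ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h; it does not reproduce the driver. probe_buggy and probe_fixed are hypothetical stand-ins for the probe path, and -ENODEV is an assumed replacement error code, not necessarily the one the patch uses.

```python
MAX_ERRNO = 4095   # bound used by the kernel's IS_ERR_VALUE()
WORD = 1 << 64     # assumption: 64-bit pointers
NULL = 0
ENODEV = 19


def ERR_PTR(err):
    """Encode a negative errno as an unsigned pointer-sized value."""
    return err % WORD


def PTR_ERR(ptr):
    """Reinterpret a pointer value as a signed long (a plain cast in C)."""
    return ptr - WORD if ptr >= WORD // 2 else ptr


def IS_ERR(ptr):
    """True only for values in the top MAX_ERRNO addresses."""
    return ptr >= WORD - MAX_ERRNO


def probe_buggy(node_ptr):
    """Model of the reported pattern: PTR_ERR() applied to a NULL result."""
    if node_ptr == NULL:
        return PTR_ERR(node_ptr)  # NULL is 0, so this reports success
    return 0


def probe_fixed(node_ptr):
    """Model of a fix: return an explicit error code on NULL."""
    if node_ptr == NULL:
        return -ENODEV  # assumed error code, for illustration only
    return 0


if __name__ == "__main__":
    print("buggy:", probe_buggy(NULL))
    print("fixed:", probe_fixed(NULL))
```

The model shows that NULL is not an encoded error at all: IS_ERR(NULL) is false and PTR_ERR(NULL) is 0, so a failed lookup would be reported to the caller as success.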


linux-next: build warnings after merge of the drm tree

2017-10-31 Thread Stephen Rothwell
Hi Dave,

After merging the drm tree, today's linux-next build (x86_64 allmodconfig)
produced these warnings:

drivers/gpu/drm/vc4/vc4_bo.c: In function 'vc4_bo_stats_debugfs':
drivers/gpu/drm/vc4/vc4_bo.c:91:17: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
   seq_printf(m, "%30s: %6dkb BOs (%d)\n", "userspace BO cache",
 ^
drivers/gpu/drm/vc4/vc4_bo.c:95:17: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t {aka long unsigned int}' [-Wformat=]
   seq_printf(m, "%30s: %6dkb BOs (%d)\n", "total purged BO",
 ^

Introduced by commit

  b9f19259b84d ("drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl")

-- 
Cheers,
Stephen Rothwell


Re: net: lapbether: fix double free

2017-10-31 Thread David Miller
From: Pan Bian 
Date: Sun, 29 Oct 2017 21:57:22 +0800

> The function netdev_priv() returns the private data of the device. The
> memory to store the private data is allocated in alloc_netdev() and is
> released in free_netdev(). Calling kfree() on the return value of
> netdev_priv() after free_netdev() results in a double free bug.
> 
> Signed-off-by: Pan Bian 

Applied.


Re: [PATCH] mkiss: remove redundant assignment of len to ax->mtu

2017-10-31 Thread David Miller
From: Colin King 
Date: Sun, 29 Oct 2017 13:30:25 +

> From: Colin Ian King 
> 
> Variable len is being assigned a value that is never read,
> hence the assignment is redundant and can be removed. Cleans
> up clang warning:
> 
> drivers/net/hamradio/mkiss.c:443:3: warning: Value stored to
> 'len' is never read
> 
> Signed-off-by: Colin Ian King 

Applied.


Re: [PATCH v3] dmaengine: rcar-dmac: use TCRB instead of TCR for residue

2017-10-31 Thread Kuninori Morimoto

Hi Geert, Vinod

Geert, thank you for your report,
Vinod, thank you for your quick help.

> > > This is now commit 847449f23dcbff68 ("dmaengine: rcar-dmac: use TCRB
> > > instead of TCR for residue") in slave-dma/next, and breaks serial console
> > > input on koelsch (shmobile_defconfig) and salvator-x (renesas_defconfig).
> > > Reverting that commit fixes the issue for me.

This patch solved my issue (sound noise), but that case transfers
large amounts of data. From a transfer-size point of view, is my
sound case the same as your large serial console input case?

I will ask the HW guys about this.
Thanks

Best regards
---
Kuninori Morimoto


Re: [PATCH] net: decnet: dn_nsp_out: use swap macro in dn_mk_ack_header

2017-10-31 Thread David Miller
From: "Gustavo A. R. Silva" 
Date: Sat, 28 Oct 2017 15:39:48 -0500

> Make use of the swap macro and remove unnecessary variable tmp.
> This makes the code easier to read and maintain.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Applied.


Re: [PATCH] net: dccp: ccids: lib: packet_history: use swap macro in tfrc_rx_hist_swap

2017-10-31 Thread David Miller
From: "Gustavo A. R. Silva" 
Date: Sat, 28 Oct 2017 15:48:47 -0500

> Make use of the swap macro and remove unnecessary variable tmp.
> This makes the code easier to read and maintain.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Applied.


Re: [PATCH] net: decnet: dn_nsp_in: use swap macro in dn_nsp_rx_packet

2017-10-31 Thread David Miller
From: "Gustavo A. R. Silva" 
Date: Sat, 28 Oct 2017 14:38:45 -0500

> Make use of the swap macro and remove unnecessary variable tmp.
> This makes the code easier to read and maintain.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Applied.

