RE: [PATCH V8 1/5] crypto: Multi-buffer encryption infrastructure support

2018-04-18 Thread Dey, Megha


>-Original Message-
>From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
>Sent: Wednesday, April 18, 2018 4:01 AM
>To: Dey, Megha 
>Cc: linux-kernel@vger.kernel.org; linux-cry...@vger.kernel.org;
>da...@davemloft.net
>Subject: Re: [PATCH V8 1/5] crypto: Multi-buffer encryption infrastructure
>support
>
>On Tue, Apr 17, 2018 at 06:40:17PM +, Dey, Megha wrote:
>>
>>
>> >-Original Message-
>> >From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
>> >Sent: Friday, March 16, 2018 7:54 AM
>> >To: Dey, Megha 
>> >Cc: linux-kernel@vger.kernel.org; linux-cry...@vger.kernel.org;
>> >da...@davemloft.net
>> >Subject: Re: [PATCH V8 1/5] crypto: Multi-buffer encryption
>> >infrastructure support
>> >
>> >I have taken a deeper look and I'm even more convinced now that
>> >mcryptd is simply not needed in your current model.
>> >
>> >The only reason you would need mcryptd is if you need to limit the
>> >rate of requests going into the underlying mb algorithm.
>> >
>> >However, it doesn't do that at all.  Even though it seems to have a
>> >batch size of 10, it immediately reschedules itself after the batch
>> >runs out, so it's essentially just dumping all requests at the
>> >underlying algorithm as fast as they're coming in.  The underlying
>> >algorithm doesn't need throttling anyway because it'll do the work
>> >synchronously when the queue is full.
>> >
>> >So why not just get rid of mcryptd completely and expose the
>> >underlying algorithm as a proper async skcipher/hash?
>>
>> Hi Herbert,
>>
>> Most of cryptd.c and mcryptd.c is similar, except for the logic
>> used to flush out partially completed jobs in the case of multibuffer
>> algorithms.
>>
>> I think I will try to merge cryptd and mcryptd, adding the necessary
>> quirks for multibuffer where needed.
>
>I think you didn't quite get my point.  From what I'm seeing you don't need
>either cryptd or mcryptd.  You just need to expose the underlying mb
>algorithm directly.

Hi Herbert,

Yeah, I think I misunderstood. I think what you mean is to remove mcryptd.c
completely, avoid the extra layer of indirection and call the underlying
algorithm directly, correct?

So currently we have 3 algorithms registered for every multibuffer algorithm:
name : __sha1-mb
driver   : mcryptd(__intel_sha1-mb)

name : sha1
driver   : sha1_mb

name : __sha1-mb
driver   : __intel_sha1-mb

If we remove mcryptd, will we have just the other 2?

The outer algorithm:sha1-mb, will 
>
>So I'm not sure what we would gain from merging cryptd and mcryptd.
>
>Cheers,
>--
>Email: Herbert Xu  Home Page:
>http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [net-next PATCH v4 00/13] Add support for netcp driver on K2G SoC

2018-04-18 Thread David Miller
From: Murali Karicheri 
Date: Tue, 17 Apr 2018 17:30:29 -0400

> K2G SoC is another variant of the Keystone family of SoCs. This patch
> series adds support for the NetCP driver on this SoC. The QMSS found on
> K2G SoC is a cut-down version of the QMSS found on other Keystone
> devices, with fewer queues, less internal link RAM, etc. The patch
> series has 2 patches that go into drivers/soc and the
> rest have to be applied to the net subsystem. Please review and merge
> if this looks good.
> 
> K2G TRM is located at http://www.ti.com/lit/ug/spruhy8g/spruhy8g.pdf
> Thanks
> 
> The boot logs on K2G ICE board (tftp boot over Ethernet and from mmc)
> https://pastebin.ubuntu.com/p/yvZ6drFhkW/
> 
> 
> The boot logs on K2G GP board (tftp boot over Ethernet and from mmc)
> https://pastebin.ubuntu.com/p/QTr6K7s4Zp/
> 
> Also regression-tested boot on K2HK and K2L EVMs, as we have modified
> the GBE version detection logic (K2E uses the same version of NetCP as
> K2L, so regression testing only one of them is needed).
> 
> Boot logs on K2L and K2HK EVMs are at
> https://pastebin.ubuntu.com/p/N9DBdPjbvR/ 
> 
> This series applies to net-next master branch.

Series applied, thank you.


Re: [RFC PATCH ghak32 V2 01/13] audit: add container id

2018-04-18 Thread Casey Schaufler
On 4/18/2018 5:46 PM, Paul Moore wrote:
> On Wed, Apr 18, 2018 at 8:41 PM, Casey Schaufler  
> wrote:
>> On 4/18/2018 4:47 PM, Paul Moore wrote:
>>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
 Implement the proc fs write to set the audit container ID of a process,
 emitting an AUDIT_CONTAINER record to document the event.
 ...

 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index d258826..1b82191 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -796,6 +796,7 @@ struct task_struct {
  #ifdef CONFIG_AUDITSYSCALL
 kuid_t  loginuid;
 unsigned int sessionid;
 +   u64 containerid;
>>> This one line addition to the task_struct scares me the most of
>>> anything in this patchset.  Why?  It's a field named "containerid" in
>>> perhaps one of the most widely used core kernel structures; the
>>> possibilities for abuse are endless, and it's foolish to think we
>>> would ever be able to adequately police this.
>> If we can get the LSM infrastructure managed task blobs from
>> module stacking in ahead of this we could create a trivial security
>> module to manage this. It's not as if there aren't all sorts of
>> interactions between security modules and the audit system already.
> While yes, there are plenty of interactions between the two, it is
> possible to use audit without the LSMs and I would like to preserve
> that.  

Fair enough.

> Further, I don't want to entangle two very complicated code
> changes or make the audit container ID effort dependent on LSM
> stacking.

Also fair, although the use case for container audit IDs is
already pulling in audit, namespaces (yeah, I know it's not
necessary for a container to use namespaces), security modules
(stacked and/or namespaced), cgroups and who knows what else.

> You're a good salesman Casey, but you're not that good ;)

I have to keep the skills sharpened somehow!

OK, I'll grant that this isn't a great fit.



[PATCH v2] prctl: fix compat handling for prctl

2018-04-18 Thread Li Bin
The auxv member of struct prctl_mm_map, which is shared with userspace,
is a pointer type, but the kernel's COMPAT support didn't handle it.
This patch fixes the compat handling for the prctl syscall.

Signed-off-by: Li Bin 
---
 kernel/sys.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index ad69218..d4259938 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1969,6 +1969,26 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
+
+#ifdef CONFIG_COMPAT
+struct compat_prctl_mm_map {
+   __u64   start_code; /* code section bounds */
+   __u64   end_code;
+   __u64   start_data; /* data section bounds */
+   __u64   end_data;
+   __u64   start_brk;  /* heap for brk() syscall */
+   __u64   brk;
+   __u64   start_stack;/* stack starts at */
+   __u64   arg_start;  /* command line arguments bounds */
+   __u64   arg_end;
+   __u64   env_start;  /* environment variables bounds */
+   __u64   env_end;
+   compat_uptr_t   auxv;   /* auxiliary vector */
+   __u32   auxv_size;  /* vector size */
+   __u32   exe_fd; /* /proc/$pid/exe link file */
+};
+#endif
+
 static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size)
 {
struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, };
@@ -1986,6 +2006,28 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
if (data_size != sizeof(prctl_map))
return -EINVAL;
 
+#ifdef CONFIG_COMPAT
+   if (in_compat_syscall()) {
+   struct compat_prctl_mm_map prctl_map32;
+   if (copy_from_user(&prctl_map32, addr, sizeof(prctl_map32)))
+   return -EFAULT;
+
+   prctl_map.start_code = prctl_map32.start_code;
+   prctl_map.end_code = prctl_map32.end_code;
+   prctl_map.start_data = prctl_map32.start_data;
+   prctl_map.end_data = prctl_map32.end_data;
+   prctl_map.start_brk = prctl_map32.start_brk;
+   prctl_map.brk = prctl_map32.brk;
+   prctl_map.start_stack = prctl_map32.start_stack;
+   prctl_map.arg_start = prctl_map32.arg_start;
+   prctl_map.arg_end = prctl_map32.arg_end;
+   prctl_map.env_start = prctl_map32.env_start;
+   prctl_map.env_end = prctl_map32.env_end;
+   prctl_map.auxv = compat_ptr(prctl_map32.auxv);
+   prctl_map.auxv_size = prctl_map32.auxv_size;
+   prctl_map.exe_fd = prctl_map32.exe_fd;
+   } else
+#endif
if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
return -EFAULT;
 
-- 
1.7.12.4



Re: [PATCH 2/2] printk: wake up klogd in vprintk_emit

2018-04-18 Thread Sergey Senozhatsky
On (04/18/18 16:04), Petr Mladek wrote:
[..]
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 2f4af216bd6e..86f0b337cbf6 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1888,6 +1888,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> > printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
> >  
> > +   wake_up_klogd();
> > logbuf_unlock_irqrestore(flags);
> 
> The change makes perfect sense and I am fine with the idea. I just
> wonder if there is a strong reason to do the wake_up before
> releasing the logbuf_lock. It makes an assumption that it needs
> to be synchronized by logbuf_lock.

No, not really, I just wanted to wake up klogd from the same CPU which
called printk().

> In fact, I would feel more comfortable if we moved this to the end
> of vprintk_emit() right before return printed_len. This will be
> closer to the current behavior (console first). But it will
> still wake up klogd much earlier and regularly if there is
> a flood of messages.

Hm, the idea of the patch is that the existing "push everything to slow
consoles first, then wakeup syslog" is not very robust. But probably we
can do what you suggested, yes.

-ss


Re: [RFC PATCH ghak32 V2 10/13] audit: add containerid support for seccomp and anom_abend records

2018-04-18 Thread Paul Moore
On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
> Add container ID auxiliary records to secure computing and abnormal end
> standalone records.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/auditsc.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 7103d23..2f02ed9 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2571,6 +2571,7 @@ static void audit_log_task(struct audit_buffer *ab)
>  void audit_core_dumps(long signr)
>  {
> struct audit_buffer *ab;
> +   struct audit_context *context = audit_alloc_local();

Looking quickly at do_coredump() I *believe* we can use current here.

> if (!audit_enabled)
> return;
> @@ -2578,19 +2579,22 @@ void audit_core_dumps(long signr)
> if (signr == SIGQUIT)   /* don't care for those */
> return;
>
> -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
> +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_ANOM_ABEND);
> if (unlikely(!ab))
> return;
> audit_log_task(ab);
> audit_log_format(ab, " sig=%ld res=1", signr);
> audit_log_end(ab);
> +   audit_log_container_info(context, "abend", audit_get_containerid(current));
> +   audit_free_context(context);
>  }
>
>  void __audit_seccomp(unsigned long syscall, long signr, int code)
>  {
> struct audit_buffer *ab;
> +   struct audit_context *context = audit_alloc_local();

We can definitely use current here.

> -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_SECCOMP);
> +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_SECCOMP);
> if (unlikely(!ab))
> return;
> audit_log_task(ab);
> @@ -2598,6 +2602,8 @@ void __audit_seccomp(unsigned long syscall, long signr, int code)
>  signr, syscall_get_arch(), syscall,
>  in_compat_syscall(), KSTK_EIP(current), code);
> audit_log_end(ab);
> +   audit_log_container_info(context, "seccomp", audit_get_containerid(current));
> +   audit_free_context(context);
>  }
>
>  struct list_head *audit_killed_trees(void)

-- 
paul moore
www.paul-moore.com


[PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip

2018-04-18 Thread Leo Yan
The code defines the macro 'PAGE_OFFSET' and uses it to decide whether an
address is in kernel space. But different architectures have different
'PAGE_OFFSET' values, so this program cannot be used on all platforms.

This commit instead checks the pointer returned by ksym_search() to judge
whether the address falls into kernel space, and removes the macro
'PAGE_OFFSET' as it isn't used anymore. As a result, this program has no
architecture dependency.

Signed-off-by: Leo Yan 
---
 samples/bpf/sampleip_user.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 4ed690b..0eea1b3 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -26,7 +26,6 @@
 #define DEFAULT_FREQ   99
 #define DEFAULT_SECS   5
 #define MAX_IPS        8192
-#define PAGE_OFFSET0x8800
 
 static int nr_cpus;
 
@@ -107,14 +106,13 @@ static void print_ip_map(int fd)
/* sort and print */
qsort(counts, max, sizeof(struct ipcount), count_cmp);
for (i = 0; i < max; i++) {
-   if (counts[i].ip > PAGE_OFFSET) {
-   sym = ksym_search(counts[i].ip);
+   sym = ksym_search(counts[i].ip);
+   if (sym)
printf("0x%-17llx %-32s %u\n", counts[i].ip, sym->name,
   counts[i].count);
-   } else {
+   else
printf("0x%-17llx %-32s %u\n", counts[i].ip, "(user)",
   counts[i].count);
-   }
}
 
if (max == MAX_IPS) {
-- 
1.9.1



[PATCH bpf-next 5/5] samples/bpf: Handle NULL pointer returned by ksym_search()

2018-04-18 Thread Leo Yan
This commit handles the NULL pointer returned by ksym_search() by directly
printing the address as a hexadecimal value; the change is applied to the
'trace_event', 'spintest' and 'offwaketime' programs.

Signed-off-by: Leo Yan 
---
 samples/bpf/offwaketime_user.c | 5 +
 samples/bpf/spintest_user.c| 5 -
 samples/bpf/trace_event_user.c | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 512f87a..fce2113 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -27,6 +27,11 @@ static void print_ksym(__u64 addr)
if (!addr)
return;
sym = ksym_search(addr);
+   if (!sym) {
+   printf("%llx;", addr);
+   return;
+   }
+
if (PRINT_RAW_ADDR)
printf("%s/%llx;", sym->name, addr);
else
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 3d73621..3140803 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -36,7 +36,10 @@ int main(int ac, char **argv)
bpf_map_lookup_elem(map_fd[0], &next_key, &value);
assert(next_key == value);
sym = ksym_search(value);
-   printf(" %s", sym->name);
+   if (!sym)
+   printf(" %lx", value);
+   else
+   printf(" %s", sym->name);
key = next_key;
}
if (key)
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 56f7a25..d2ab33e 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -33,6 +33,11 @@ static void print_ksym(__u64 addr)
if (!addr)
return;
sym = ksym_search(addr);
+   if (!sym) {
+   printf("%llx;", addr);
+   return;
+   }
+
printf("%s;", sym->name);
if (!strcmp(sym->name, "sys_read"))
sys_read_seen = true;
-- 
1.9.1



[PATCH bpf-next 2/5] samples/bpf: Dynamically allocate structure 'syms'

2018-04-18 Thread Leo Yan
Structure 'syms' is used to store kernel symbol info read from the proc fs
node '/proc/kallsyms'. It is declared with 300000 entries and statically
linked into the bss section. In most cases the kernel has fewer than
300000 symbols, so it's safe to define such a large array, but the side
effect is that this structure makes the bss section big, and it isn't
flexible.

To fix this, this patch dynamically allocates memory for 'syms' based on
the number of lines parsed from '/proc/kallsyms' at runtime, which
significantly reduces the memory required by the ELF file.

Before:
   text    data     bss     dec     hex filename
  18841    1172 5199776 5219789  4fa5cd samples/bpf/sampleip

After:
   text    data     bss     dec     hex filename
  19101    1188  399792  420081   668f1 samples/bpf/sampleip

Signed-off-by: Leo Yan 
---
 samples/bpf/bpf_load.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 28e4678..c2bf7ca 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -651,8 +651,7 @@ void read_trace_pipe(void)
}
 }
 
-#define MAX_SYMS 300000
-static struct ksym syms[MAX_SYMS];
+static struct ksym *syms;
 static int sym_cnt;
 
 static int ksym_cmp(const void *p1, const void *p2)
@@ -678,12 +677,30 @@ int load_kallsyms(void)
break;
if (!addr)
continue;
+   sym_cnt++;
+   }
+
+   syms = calloc(sym_cnt, sizeof(*syms));
+   if (!syms) {
+   fclose(f);
+   return -ENOMEM;
+   }
+
+   rewind(f);
+   while (!feof(f)) {
+   if (!fgets(buf, sizeof(buf), f))
+   break;
+   if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
+   break;
+   if (!addr)
+   continue;
syms[i].addr = (long) addr;
syms[i].name = strdup(func);
i++;
}
-   sym_cnt = i;
qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
+
+   fclose(f);
return 0;
 }
 
-- 
1.9.1



[PATCH bpf-next 0/5] samples/bpf: Minor fixes and cleanup

2018-04-18 Thread Leo Yan
This patch series contains minor fixes and cleanup for the bpf load and
samples code. The first patch fixes a typo; patch 0002 refactors the code
to dynamically allocate memory for the kernel symbol structures; the last
three patches mainly refactor the function ksym_search(), and the main
benefit of this refactoring is that the sampleip program can be used
without architecture dependency.

The patch series has been tested on ARM64 Hikey960 boards.

Leo Yan (5):
  samples/bpf: Fix typo in comment
  samples/bpf: Dynamically allocate structure 'syms'
  samples/bpf: Use NULL for failed to find symbol
  samples/bpf: Refine printing symbol for sampleip
  samples/bpf: Handle NULL pointer returned by ksym_search()

 samples/bpf/bpf_load.c | 29 +++--
 samples/bpf/offwaketime_user.c |  5 +
 samples/bpf/sampleip_user.c|  8 +++-
 samples/bpf/spintest_user.c|  5 -
 samples/bpf/trace_event_user.c |  5 +
 5 files changed, 40 insertions(+), 12 deletions(-)

-- 
1.9.1



[PATCH bpf-next 1/5] samples/bpf: Fix typo in comment

2018-04-18 Thread Leo Yan
Fix typo by replacing 'iif' with 'if'.

Signed-off-by: Leo Yan 
---
 samples/bpf/bpf_load.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..28e4678 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
continue;
if (sym[nr_maps].st_shndx != maps_shndx)
continue;
-   /* Only increment iif maps section */
+   /* Only increment if maps section */
nr_maps++;
}
 
-- 
1.9.1



[PATCH bpf-next 3/5] samples/bpf: Use NULL for failed to find symbol

2018-04-18 Thread Leo Yan
Function ksym_search() is used to parse an address and return the symbol
structure. When the address is out of range for kernel symbols, it
returns the symbol structure of the kernel '_stext' entry; this is
confusing and misses the chance to tell intuitively that the address is
out of range.

This commit changes it to return a NULL pointer when it fails to find a
symbol; callers need to check whether the pointer is NULL to learn that
the address has no corresponding kernel symbol.

Signed-off-by: Leo Yan 
---
 samples/bpf/bpf_load.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index c2bf7ca..0c0584f 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -726,7 +726,7 @@ struct ksym *ksym_search(long key)
/* valid ksym */
return &syms[start - 1];
 
-   /* out of range. return _stext */
-   return &syms[0];
+   /* out of range. return NULL */
+   return NULL;
 }
 
-- 
1.9.1



Re: [PATCH v2] prctl: fix compat handling for prctl

2018-04-18 Thread Andy Lutomirski


> On Apr 18, 2018, at 9:06 PM, Li Bin  wrote:
> 
> The auxv member of struct prctl_mm_map, which is shared with userspace,
> is a pointer type, but the kernel's COMPAT support didn't handle it.
> This patch fixes the compat handling for the prctl syscall.
> 
> Signed-off-by: Li Bin 
> ---
> kernel/sys.c | 42 ++
> 1 file changed, 42 insertions(+)
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ad69218..d4259938 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1969,6 +1969,26 @@ static int validate_prctl_map(struct prctl_mm_map *prctl_map)
> }
> 
> #ifdef CONFIG_CHECKPOINT_RESTORE
> +
> +#ifdef CONFIG_COMPAT
> +struct compat_prctl_mm_map {
> +__u64   start_code; /* code section bounds */
> +__u64   end_code;
> +__u64   start_data; /* data section bounds */
> +__u64   end_data;
> +__u64   start_brk;  /* heap for brk() syscall */
> +__u64   brk;
> +__u64   start_stack;/* stack starts at */
> +__u64   arg_start;  /* command line arguments bounds */
> +__u64   arg_end;
> +__u64   env_start;  /* environment variables bounds */
> +__u64   env_end;
> +compat_uptr_t   auxv;   /* auxiliary vector */
> +__u32   auxv_size;  /* vector size */
> +__u32   exe_fd; /* /proc/$pid/exe link file */
> +};
> +#endif
> +
> static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size)
> {
>struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, };
> @@ -1986,6 +2006,28 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>if (data_size != sizeof(prctl_map))
>return -EINVAL;
> 
> +#ifdef CONFIG_COMPAT
> +if (in_compat_syscall()) {
> +struct compat_prctl_mm_map prctl_map32;
> +if (copy_from_user(&prctl_map32, addr, sizeof(prctl_map32)))
> +return -EFAULT;
> +
> +prctl_map.start_code = prctl_map32.start_code;
> +prctl_map.end_code = prctl_map32.end_code;
> +prctl_map.start_data = prctl_map32.start_data;
> +prctl_map.end_data = prctl_map32.end_data;
> +prctl_map.start_brk = prctl_map32.start_brk;
> +prctl_map.brk = prctl_map32.brk;
> +prctl_map.start_stack = prctl_map32.start_stack;
> +prctl_map.arg_start = prctl_map32.arg_start;
> +prctl_map.arg_end = prctl_map32.arg_end;
> +prctl_map.env_start = prctl_map32.env_start;
> +prctl_map.env_end = prctl_map32.env_end;
> +prctl_map.auxv = compat_ptr(prctl_map32.auxv);
> +prctl_map.auxv_size = prctl_map32.auxv_size;
> +prctl_map.exe_fd = prctl_map32.exe_fd;
> +} else
> +#endif
>if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
>return -EFAULT;
> 
> -- 
> 1.7.12.4
> 


[PATCHv2] printk: wake up klogd in vprintk_emit

2018-04-18 Thread Sergey Senozhatsky
We wake up klogd very late - only when the current console_sem owner
is done pushing pending kernel messages to the serial/net consoles.
In some cases this results in lost syslog messages, because the kernel
log buffer is a circular buffer, and if we don't wake up syslog for long
enough there is a chance that logbuf simply wraps around.

The patch moves the klogd wake-up call to vprintk_emit(), which is
the only legit way for a kernel message to appear in the logbuf,
right before we attempt to grab the console_sem (possibly spinning
on it waiting for the hand-off) and call console drivers.

Signed-off-by: Sergey Senozhatsky 
---
 kernel/printk/printk.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2f4af216bd6e..247808333ba4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1908,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
preempt_enable();
}
 
+   wake_up_klogd();
return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2289,9 +2290,7 @@ void console_unlock(void)
 {
static char ext_text[CONSOLE_EXT_LOG_MAX];
static char text[LOG_LINE_MAX + PREFIX_MAX];
-   static u64 seen_seq;
unsigned long flags;
-   bool wake_klogd = false;
bool do_cond_resched, retry;
 
if (console_suspended) {
@@ -2335,11 +2334,6 @@ void console_unlock(void)
 
printk_safe_enter_irqsave(flags);
raw_spin_lock(&logbuf_lock);
-   if (seen_seq != log_next_seq) {
-   wake_klogd = true;
-   seen_seq = log_next_seq;
-   }
-
if (console_seq < log_first_seq) {
len = sprintf(text, "** %u printk messages dropped **\n",
  (unsigned)(log_first_seq - console_seq));
@@ -2397,7 +2391,7 @@ void console_unlock(void)
 
if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
-   goto out;
+   return;
}
 
printk_safe_exit_irqrestore(flags);
@@ -2429,10 +2423,6 @@ void console_unlock(void)
 
if (retry && console_trylock())
goto again;
-
-out:
-   if (wake_klogd)
-   wake_up_klogd();
 }
 EXPORT_SYMBOL(console_unlock);
 
-- 
2.17.0



Re: [RFC PATCH ghak32 V2 11/13] audit: add support for containerid to network namespaces

2018-04-18 Thread Paul Moore
On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
> Audit events could happen in a network namespace outside of a task
> context due to packets received from the net that trigger an auditing
> rule prior to being associated with a running task.  The network
> namespace could be in use by multiple containers by association to the
> tasks in that network namespace.  We still want a way to attribute
> these events to any potential containers.  Keep a list per network
> namespace to track these container identifiers.
>
> Add/increment the container identifier on:
> - initial setting of the container id via /proc
> - clone/fork call that inherits a container identifier
> - unshare call that inherits a container identifier
> - setns call that inherits a container identifier
> Delete/decrement the container identifier on:
> - an inherited container id dropped when child set
> - process exit
> - unshare call that drops a net namespace
> - setns call that drops a net namespace
>
> See: https://github.com/linux-audit/audit-kernel/issues/32
> See: https://github.com/linux-audit/audit-testsuite/issues/64
> Signed-off-by: Richard Guy Briggs 
> ---
>  include/linux/audit.h   |  7 +++
>  include/net/net_namespace.h | 12 
>  kernel/auditsc.c|  9 ++---
>  kernel/nsproxy.c|  6 ++
>  net/core/net_namespace.c| 45 
> +
>  5 files changed, 76 insertions(+), 3 deletions(-)

...

> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 0490084..343a428 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct user_namespace;
>  struct proc_dir_entry;
> @@ -150,6 +151,7 @@ struct net {
>  #endif
> struct sock *diag_nlsk;
> atomic_t fnhe_genid;
> +   struct list_head audit_containerid;
>  } __randomize_layout;

We talked about this briefly off-list, you should be using audit_net
and the net_generic mechanism instead of this.

>  #include 
> @@ -301,6 +303,16 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
>  #define __net_initconst   __initconst
>  #endif
>
> +#ifdef CONFIG_NET_NS
> +void net_add_audit_containerid(struct net *net, u64 containerid);
> +void net_del_audit_containerid(struct net *net, u64 containerid);
> +#else
> +static inline void net_add_audit_containerid(struct net *, u64)
> +{ }
> +static inline void net_del_audit_containerid(struct net *, u64)
> +{ }
> +#endif
> +
>  int peernet2id_alloc(struct net *net, struct net *peer);
>  int peernet2id(struct net *net, struct net *peer);
>  bool peernet_has_id(struct net *net, struct net *peer);
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 2f02ed9..208da962 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -75,6 +75,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "audit.h"
>
> @@ -2175,16 +2176,18 @@ static void audit_log_set_containerid(struct task_struct *task, u64 oldcontainer
>   */
>  int audit_set_containerid(struct task_struct *task, u64 containerid)
>  {
> -   u64 oldcontainerid;
> +   u64 oldcontainerid = audit_get_containerid(task);
> int rc;
> -
> -   oldcontainerid = audit_get_containerid(task);
> +   struct net *net = task->nsproxy->net_ns;
>
> rc = audit_set_containerid_perm(task, containerid);
> if (!rc) {
> +   if (cid_valid(oldcontainerid))
> +   net_del_audit_containerid(net, oldcontainerid);

Using audit_net we can handle this internal to audit, which is a Good Thing.

> task_lock(task);
> task->containerid = containerid;
> task_unlock(task);
> +   net_add_audit_containerid(net, containerid);

Same.

> }
>
> audit_log_set_containerid(task, oldcontainerid, containerid, rc);
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f6c5d33..d9f1090 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -140,6 +140,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> struct nsproxy *old_ns = tsk->nsproxy;
> struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
> struct nsproxy *new_ns;
> +   u64 containerid = audit_get_containerid(tsk);
>
> if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
>   CLONE_NEWPID | CLONE_NEWNET |
> @@ -167,6 +168,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
> return  PTR_ERR(new_ns);
>
> tsk->nsproxy = new_ns;
> +   net_add_audit_containerid(new_ns->net_ns, containerid);
> return 0;
>  }

Hopefully we can handle this in audit_net_init(), we just need to
figure out where we can get the correct task_struct for 

Re: [PATCH 1/6] rhashtable: remove outdated comments about grow_decision etc

2018-04-18 Thread David Miller
From: NeilBrown 
Date: Thu, 19 Apr 2018 09:09:05 +1000

> On Wed, Apr 18 2018, Herbert Xu wrote:
> 
>> On Wed, Apr 18, 2018 at 04:47:01PM +1000, NeilBrown wrote:
>>> grow_decision and shrink_decision no longer exist, so remove
>>> the remaining references to them.
>>> 
>>> Signed-off-by: NeilBrown 
>>
>> Acked-by: Herbert Xu 
> 
> Thanks.  Is that Ack sufficient for this patch to go upstream, or is
> there something else that I need to do?

One patch being ACK'd does not release the whole series to be applied
and the whole series will be treated as a complete unit for that
purpose.

So if discussion is holding up one patch in the series, it holds up
the entire series.

So get the entire series in acceptable condition, or submit only one
change at a time individually and wait for that one to be accepted
before you submit and ask for feedback on the next one.

I hope that makes things clear for you.


Re: c9e97a1997 BUG: kernel reboot-without-warning in early-boot stage, last printk: early console in setup code

2018-04-18 Thread Pavel Tatashin
Thank you, I am studying the problem.

Pavel

On Wed, Apr 18, 2018 at 9:31 PM, Fengguang Wu  wrote:
> On Wed, Apr 18, 2018 at 06:38:25PM -0500, Dennis Zhou wrote:
>>Hi,
>>
>>On Wed, Apr 18, 2018 at 09:55:53PM +0800, Fengguang Wu wrote:
>>>
>>> Hello,
>>>
>>> FYI here is a slightly different boot error in mainline kernel 4.17.0-rc1.
>>> It also dates back to v4.16 .
>>>
>>> It occurs in 4 out of 4 boots.
>>>
>>> [0.00] Built 1 zonelists, mobility grouping on.  Total pages: 128873
>>> [0.00] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug 
>>> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 
>>> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 
>>> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 
>>> drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 
>>> earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw 
>>> link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-a0-04172313/linux-devel:devel-hourly-2018041714:60cc43fc888428bb2f18f08997432d426a243338/.vmlinuz-60cc43fc888428bb2f18f08997432d426a243338-20180418000325-19:yocto-lkp-nhm-dp2-4
>>>  branch=linux-devel/devel-hourly-2018041714 
>>> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-04172313/gcc-7/60cc43fc888428bb2f18f08997432d426a243338/vmlinuz-4.17.0-rc1
>>>  drbd.minor_count=8 rcuperf.shutdown=0
>>> [0.00] sysrq: sysrq always enabled.
>>> [0.00] Dentry cache hash table entries: 65536 (order: 7, 524288 
>>> bytes)
>>> [0.00] Inode-cache hash table entries: 32768 (order: 6, 262144 
>>> bytes)
>>> PANIC: early exception 0x0d IP 10:a892f15f error 0 cr2 
>>> 0x88001fbff000
>>> [0.00] CPU: 0 PID: 0 Comm: swapper Tainted: GT 
>>> 4.17.0-rc1 #238
>>> [0.00] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>> 1.10.2-1 04/01/2014
>>> [0.00] RIP: 0010:per_cpu_ptr_to_phys+0x16a/0x298:
>>>  __section_mem_map_addr at 
>>> include/linux/mmzone.h:1188
>>>   (inlined by) 
>>> per_cpu_ptr_to_phys at mm/percpu.c:1849
>>> [0.00] RSP: :ab407e50 EFLAGS: 00010046 ORIG_RAX: 
>>> 
>>> [0.00] RAX: dc00 RBX: 88001f17c340 RCX: 
>>> 000f
>>> [0.00] RDX:  RSI: 0001 RDI: 
>>> acfbf580
>>> [0.00] RBP: ab40d000 R08: fbfff57c4eca R09: 
>>> 
>>> [0.00] R10: 880015421000 R11: fbfff57c4ec9 R12: 
>>> 
>>> [0.00] R13: 88001fb03ff8 R14: 88001fc051c0 R15: 
>>> 
>>> [0.00] FS:  () GS:ab4c5000() 
>>> knlGS:
>>> [0.00] CS:  0010 DS:  ES:  CR0: 80050033
>>> [0.00] CR2: 88001fbff000 CR3: 1a06c000 CR4: 
>>> 06b0
>>> [0.00] Call Trace:
>>> [0.00]  setup_cpu_entry_areas+0x7b/0x27b:
>>>  setup_cpu_entry_area at 
>>> arch/x86/mm/cpu_entry_area.c:104
>>>   (inlined by) 
>>> setup_cpu_entry_areas at arch/x86/mm/cpu_entry_area.c:177
>>> [0.00]  trap_init+0xb/0x13d:
>>>  trap_init at 
>>> arch/x86/kernel/traps.c:949
>>> [0.00]  start_kernel+0x2a5/0x91d:
>>>  mm_init at init/main.c:519
>>>   (inlined by) start_kernel at 
>>> init/main.c:589
>>> [0.00]  ? thread_stack_cache_init+0x6/0x6
>>> [0.00]  ? memcpy_orig+0x16/0x110:
>>>  memcpy_orig at 
>>> arch/x86/lib/memcpy_64.S:77
>>> [0.00]  ? x86_family+0x5/0x1d:
>>>  x86_family at 
>>> arch/x86/lib/cpu.c:8
>>> [0.00]  ? load_ucode_bsp+0x42/0x13e:
>>>  load_ucode_bsp at 
>>> arch/x86/kernel/cpu/microcode/core.c:183
>>> [0.00]  secondary_startup_64+0xa5/0xb0:
>>>  secondary_startup_64 at 
>>> arch/x86/kernel/head_64.S:242
>>> [0.00] Code: 78 06 00 49 8b 45 00 48 85 c0 74 a5 49 c1 ec 28 41 81 
>>> e4 e0 0f 00 00 49 01 c4 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 
>>> <80> 3c 02 00 74 08 4c 89 e7 e8 63 78 06 00 49 8b 04 24 81 e5 ff
>>> BUG: kernel hang in boot stage
>>>
>>
>>I spent some time bisecting this one and it seems to be an intermittent
>>issue starting with this commit for me:
>>c9e97a1997, mm: initialize pages on demand during boot. The prior
>>commit, 3a2d7fa8a3, did not run into this issue after 10+ boots.
>
> Dennis, thanks for bisecting it down!
>
> Pavel, here is an early boot error bisected to c9e97a1997 ("mm:
> initialize pages on demand during boot"). 

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-18 Thread Yang, Shunyong
Hi, Gary,

On Wed, 2018-04-18 at 16:51 -0400, Hook, Gary wrote:
> On 4/18/2018 4:16 PM, Mehta, Sohil wrote:
> > 
> > On Wed, 2018-04-18 at 08:31 +, Yang, Shunyong wrote:
> > > 
> > > Maybe the original design is to call debugfs_initialized() before
> > > calling debugfs_create_xxx()?
> > I am unaware of the original design. Someone else would probably have
> > more context. However, looking at other places in the kernel where
> > debugfs_create_xx() is used, the common convention seems to be to avoid
> > calling debugfs_initialized().
> > 
> >   Sohil
> > 
> debugfs_initialized() was introduced in commit c0f92ba99 back in
> 2.6.30-rc1. It was intended as a helper, not as a gatekeeper, which is
> why one doesn't see it used. Given that my use in this proposed patch is
> straightforward, I'm not seeing the need here. I had just seen some
> other code that used it, and copied the model.
> 
> Unless someone comes along to say, yes, use it, I'll not.
> 

I agree with you and Sohil on removing the unnecessary function
call.

Thanks.
Shunyong.


> Gary


Re: [RFC PATCH ghak32 V2 12/13] audit: NETFILTER_PKT: record each container ID associated with a netNS

2018-04-18 Thread Paul Moore
On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
> Add container ID auxiliary record(s) to NETFILTER_PKT event standalone
> records.  Iterate through all potential container IDs associated with a
> network namespace.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit.c   |  1 +
>  kernel/auditsc.c |  2 ++
>  net/netfilter/xt_AUDIT.c | 15 ++-
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 08662b4..3c77e47 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2102,6 +2102,7 @@ int audit_log_container_info(struct audit_context 
> *context,
> audit_log_end(ab);
> return 0;
>  }
> +EXPORT_SYMBOL(audit_log_container_info);
>
>  void audit_log_key(struct audit_buffer *ab, char *key)
>  {
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 208da962..af68d01 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -975,6 +975,7 @@ struct audit_context *audit_alloc_local(void)
> context->in_syscall = 1;
> return context;
>  }
> +EXPORT_SYMBOL(audit_alloc_local);
>
>  inline void audit_free_context(struct audit_context *context)
>  {
> @@ -989,6 +990,7 @@ inline void audit_free_context(struct audit_context 
> *context)
> audit_proctitle_free(context);
> kfree(context);
>  }
> +EXPORT_SYMBOL(audit_free_context);
>
>  static int audit_log_pid_context(struct audit_context *context, pid_t pid,
>  kuid_t auid, kuid_t uid, unsigned int 
> sessionid,
> diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> index c502419..edaa456 100644
> --- a/net/netfilter/xt_AUDIT.c
> +++ b/net/netfilter/xt_AUDIT.c
> @@ -71,10 +71,14 @@ static bool audit_ip6(struct audit_buffer *ab, struct 
> sk_buff *skb)
>  {
> struct audit_buffer *ab;
> int fam = -1;
> +   struct audit_context *context = audit_alloc_local();
> +   struct audit_containerid *cont;
> +   int i = 0;
> +   struct net *net;
>
> if (audit_enabled == 0)
> goto errout;

Do I need to say it?  I probably should ... the allocation should
happen after the audit_enabled check.

> -   ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> +   ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> if (ab == NULL)
> goto errout;
>
> @@ -104,7 +108,16 @@ static bool audit_ip6(struct audit_buffer *ab, struct 
> sk_buff *skb)
>
> audit_log_end(ab);
>
> +   net = sock_net(NETLINK_CB(skb).sk);
> +   list_for_each_entry(cont, &net->audit_containerid, list) {
> +   char buf[14];
> +
> +   sprintf(buf, "net%u", i++);
> +   audit_log_container_info(context, buf, cont->id);
> +   }

It seems like this could (should?) be hidden inside an audit function,
e.g. audit_log_net_containers() or something like that.

>  errout:
> +   audit_free_context(context);
> return XT_CONTINUE;
>  }

-- 
paul moore
www.paul-moore.com
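Paul's suggestion to move the per-netns loop into a helper such as audit_log_net_containers() could look roughly like the userspace sketch below. Everything here is hypothetical: `struct net_model` stands in for the netns list, and `record_container_info()` captures what would presumably be audit_log_container_info() calls in the kernel. It also shows why `buf[14]` is enough for "net%u": three letters, at most ten digits for a 32-bit unsigned int, and the terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-netns list of audit container IDs. */
struct net_model {
	const uint64_t *containerids;
	size_t count;
};

/* Captured records so the sketch can be checked; in the kernel this would
 * presumably be an audit_log_container_info(context, op, id) call. */
#define MAX_RECORDS 8
static char logged_op[MAX_RECORDS][14];
static uint64_t logged_id[MAX_RECORDS];
static size_t nlogged;

static void record_container_info(const char *op, uint64_t id)
{
	if (nlogged >= MAX_RECORDS)
		return;
	snprintf(logged_op[nlogged], sizeof(logged_op[nlogged]), "%s", op);
	logged_id[nlogged] = id;
	nlogged++;
}

/* Hypothetical audit_log_net_containers()-style helper: callers hand over
 * the netns, and the "netN" numbering lives in one place. */
static void log_net_containers_model(const struct net_model *net)
{
	/* "net" + at most 10 digits of a 32-bit unsigned int + NUL = 14 bytes */
	char buf[14];
	unsigned int i;

	for (i = 0; i < net->count; i++) {
		snprintf(buf, sizeof(buf), "net%u", i);
		record_container_info(buf, net->containerids[i]);
	}
}
```

With the loop behind one helper, xt_AUDIT (or any later caller) only has to pass the netns; the format string and buffer sizing are then defined once.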


Re: [PATCH] blkcg: not hold blkcg lock when deactivating policy.

2018-04-18 Thread Jens Axboe
On 4/18/18 6:54 PM, jiang.bi...@zte.com.cn wrote:
 by chance, did you check whether this may cause problems with bfq,
 being the latter not protected by the queue lock as cfq?
>>> I checked the bfq code: bfq never seems to use the blkcg lock directly, and
>>> updates of blkg in the common code are protected by both the queue and
>>> blkcg locks, so IMHO this patch would not introduce any new problem
>>> with bfq, even though bfq is not protected by the queue lock.
>>> On the other hand, the locks (queue lock/blkcg lock) used to protect
>>> the update of blkg seem a bit too heavyweight, especially the queue lock,
>>> which is used so widely that it may cause races with other contexts. I wonder
>>> if there is any way to ease this, e.g. by adding a dedicated lock for blkg. :)
>>
>> It might make sense to lock it separately, but I would not worry
>> about it unless it shows up as hot in your testing.
> Actually, we've met a triggering of nmi_watchdog, blocked at the queue lock
> in blkcg_print_blkgs(), caused by the slow serial console and too many 
> printks.
> Related discussion is here,
> https://bugzilla.kernel.org/show_bug.cgi?id=199003
> Even though it's not caused by the queue lock directly, it would not happen
> without using the queue lock. The queue lock is big and used very widely, so
> using it intensifies the race; we're trying to understand the locks used in
> blkg and see whether we could improve the situation.

The queue lock is only used widely on non blk-mq, where it is the only
lock really. Doing serial IO under a spinlock is always going to suck,
regardless of how contended it is.

-- 
Jens Axboe



Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread Ming Lei
On Thu, Apr 19, 2018 at 09:51:16AM +0800, jianchao.wang wrote:
> Hi Ming
> 
> Thanks for your kindly response.
> 
> On 04/18/2018 11:40 PM, Ming Lei wrote:
> >> Regarding this patchset, it is mainly to fix the dependency between
> >> nvme_timeout and nvme_dev_disable, as you can see:
> >> nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
> >> depend on nvme_timeout when the controller does not respond.
> > Do you mean nvme_disable_io_queues()? If yes, this one has been handled
> > by wait_for_completion_io_timeout() already, and looks the block timeout
> > can be disabled simply. Or are there others?
> > 
> Here is one possible scenario currently
> 
> nvme_dev_disable // hold shutdown_lock
>   -> nvme_set_host_mem
>     -> nvme_submit_sync_cmd
>       -> __nvme_submit_sync_cmd
>         -> blk_execute_rq
>           //if sysctl_hung_task_timeout_secs == 0
>           -> wait_for_completion_io
>
> nvme_timeout
>   -> nvme_dev_disable
>     -> try to acquire shutdown_lock
>
> And maybe nvme_dev_disable needs to issue other commands in the future.

OK, thanks for sharing this one. For now I think it might need to be
handled by wait_for_completion_io_timeout() as a workaround for this issue.

> 
> Even if we could fix these kinds of issues as with nvme_disable_io_queues,
> it is still a risk I think.

Yeah, I can't agree more, that is why I think the nvme time/eh code should
be refactored, and solve the current issues in a more clean/maintainable
way.

Thanks,
Ming


Re: [PATCH] perf tools: set kernel end address properly

2018-04-18 Thread Namhyung Kim
On Wed, Apr 18, 2018 at 07:37:59PM -0500, Kim Phillips wrote:
> On Tue, 17 Apr 2018 11:27:26 +0900
> Namhyung Kim  wrote:
> > On Mon, Apr 16, 2018 at 05:48:11PM -0500, Kim Phillips wrote:
> > > > a perf/urgent from last week (commit 918965d4897) + this patch:
> > > > 
> > > > $ sudo ./perf test -vv 1 |& head 
> > > >  1: vmlinux symtab matches kallsyms   :
> > > > --- start ---
> > > > test child forked, pid 6194
> > > > Looking at the vmlinux_path (8 entries long)
> > > > Using /lib/modules/4.16.0+/build/vmlinux for symbols
> > > > ERR : 0x28081000: do_undefinstr not on kallsyms
> > > > ERR : 0x280810b8: do_sysinstr not on kallsyms
> > > > ERR : 0x28081258: do_debug_exception not on kallsyms
> > > > ERR : 0x28081648: do_mem_abort not on kallsyms
> > > > ERR : 0x280818b8: do_el0_irq_bp_hardening not on kallsyms
> > > > $ sudo ./perf test -vv 1 |& tail
> > > > ERR : 0x2a1d37c8: tramp_exit_native not on kallsyms
> > > > ERR : 0x2a1d37e8: tramp_exit_compat not on kallsyms
> > > > ERR : 0x2a1d4000: __entry_tramp_text_end not on kallsyms
> > > > WARN: Maps only in vmlinux:
> > > >  2808-28081000 1 [kernel].head.text
> > > >  2aec-2aff7548 2e5 [kernel].init.text
> > > >  2aff7548-2b0126d4 2f87548 [kernel].exit.text
> > > > test child finished with -1
> > > >  end 
> > > > vmlinux symtab matches kallsyms: FAILED!
> > > 
> > > this patch's advertised "If there's no module after the kernel map, the
> > > end address will be ~0ULL." doesn't seem to be working: the value it
> > > gets for 'end' is 0x2808.
> > 
> > For the vmlinux, right?
> 
> yes, map__next(machine__kernel_map(machine)) has the start address
> of the single module currently loaded:
> 
> 2229 t $x [arm_ccn]
> 2229 t arm_ccn_pmu_events_is_visible  [arm_ccn]
> 
> The beginning of the kernel is..later:
> 
> 2808 t _head
> 2808 T _text
> 
> and its end according to grep -w _end /proc/kallsyms is:
> 
> 2d5f9000 B _end
> 
> but end was assigned to the beginning of arm_ccn (0x2229),
> which is upside-down.

So ARM64 has modules below the kernel.

> 
> > To be precise, it should be "If there's no map after the kernel map".
> 
> In numerical address order, in maps in map_groups__insert order, or
> some other order?

The map_groups__insert() also sorts the map by address, so they should
be identical.  I think the problem is perf assumes the kernel is the
first map in the kmaps.  When it calls map_groups__insert() it uses
start address of 0 for the kernel map.  It seems always true for x86
but not for ARM64.

While it changes the start address in machine__set_kernel_mmap() it
doesn't change the order in the kmaps.
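The ordering problem described above can be modeled with a small sorted array. This is a hypothetical userspace sketch, not perf code: `struct kmaps_model` stands in for the kmaps (which, per the explanation above, map_groups__insert() keeps sorted by address), and any addresses used with it are made up. Changing the kernel map's start in place leaves it at the front, so the "next" map can be a module that actually sits below the kernel; removing it, fixing the start, and reinserting it, as the patch does, restores address order before the end is computed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a kernel map or module map. */
struct map_model {
	uint64_t start;
	uint64_t end;
	int is_kernel;
};

/* Hypothetical model of the kmaps: at most 8 maps, sorted by start. */
struct kmaps_model {
	struct map_model maps[8];
	size_t n;
};

/* Insert while keeping ascending order by start address. */
static void kmaps_insert(struct kmaps_model *km, struct map_model m)
{
	size_t i = km->n;

	while (i > 0 && km->maps[i - 1].start > m.start) {
		km->maps[i] = km->maps[i - 1];
		i--;
	}
	km->maps[i] = m;
	km->n++;
}

static struct map_model kmaps_remove(struct kmaps_model *km, size_t idx)
{
	struct map_model m = km->maps[idx];

	memmove(&km->maps[idx], &km->maps[idx + 1],
		(km->n - idx - 1) * sizeof(km->maps[0]));
	km->n--;
	return m;
}

/* Index of the kernel map, or km->n if there is none. */
static size_t kernel_index(const struct kmaps_model *km)
{
	size_t i;

	for (i = 0; i < km->n; i++)
		if (km->maps[i].is_kernel)
			return i;
	return km->n;
}

/* Assumes a kernel map is present: take it out, set the real start with an
 * open end, reinsert it in sorted position, then clamp its end to the start
 * of the following map, if any. */
static void set_kernel_start(struct kmaps_model *km, uint64_t addr)
{
	struct map_model k = kmaps_remove(km, kernel_index(km));
	size_t idx;

	k.start = addr;
	k.end = ~0ULL;
	kmaps_insert(km, k);

	idx = kernel_index(km);
	if (idx + 1 < km->n)
		km->maps[idx].end = km->maps[idx + 1].start;
}
```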

Could you please test below patch (on top) then?

Thanks,
Namhyung


---8<---

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index be328416de61..0f3c4bc7b90f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1228,7 +1228,6 @@ int machine__create_kernel_maps(struct machine *machine)
const char *name = NULL;
struct map *map;
u64 addr = 0;
-   u64 end = ~0ULL;
int ret;
 
if (kernel == NULL)
@@ -1254,14 +1253,25 @@ int machine__create_kernel_maps(struct machine *machine)
machine__destroy_kernel_maps(machine);
return -1;
}
+
+   /* we have a real start address now, so re-order the kmaps */
+   map = machine__kernel_map(machine);
+
+   map__get(map);
+   map_groups__remove(&machine->kmaps, map);
+
+   /* assume it's the last in the kmaps */
+   machine__set_kernel_mmap(machine, addr, ~0ULL);
+
+   map_groups__insert(&machine->kmaps, map);
+   map__put(map);
}
 
/* update end address of the kernel map using adjacent module address */
map = map__next(machine__kernel_map(machine));
if (map)
-   end = map->start;
+   machine__set_kernel_mmap(machine, addr, map->start);
 
-   machine__set_kernel_mmap(machine, addr, end);
return 0;
 }
 


Re: [RFC PATCH ghak32 V2 07/13] audit: add container aux record to watch/tree/mark

2018-04-18 Thread Paul Moore
On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
> Add container ID auxiliary record to mark, watch and tree rule
> configuration standalone records.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit_fsnotify.c |  5 -
>  kernel/audit_tree.c |  5 -
>  kernel/audit_watch.c| 33 +++--
>  3 files changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> index 52f368b..18c110d 100644
> --- a/kernel/audit_fsnotify.c
> +++ b/kernel/audit_fsnotify.c
> @@ -124,10 +124,11 @@ static void audit_mark_log_rule_change(struct 
> audit_fsnotify_mark *audit_mark, c
>  {
> struct audit_buffer *ab;
> struct audit_krule *rule = audit_mark->rule;
> +   struct audit_context *context = audit_alloc_local();
>
> if (!audit_enabled)
> return;

Move the audit_alloc_local() after the audit_enabled check.

> -   ab = audit_log_start(NULL, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> +   ab = audit_log_start(context, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> if (unlikely(!ab))
> return;
> audit_log_format(ab, "auid=%u ses=%u op=%s",
> @@ -138,6 +139,8 @@ static void audit_mark_log_rule_change(struct 
> audit_fsnotify_mark *audit_mark, c
> audit_log_key(ab, rule->filterkey);
> audit_log_format(ab, " list=%d res=1", rule->listnr);
> audit_log_end(ab);
> +   audit_log_container_info(context, "config", 
> audit_get_containerid(current));
> +   audit_free_context(context);
>  }
>
>  void audit_remove_mark(struct audit_fsnotify_mark *audit_mark)
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 67e6956..7c085be 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -496,8 +496,9 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>  static void audit_tree_log_remove_rule(struct audit_krule *rule)
>  {
> struct audit_buffer *ab;
> +   struct audit_context *context = audit_alloc_local();

Sort of independent of the audit container ID work, but shouldn't we
have an audit_enabled check here?

> -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> if (unlikely(!ab))
> return;
> audit_log_format(ab, "op=remove_rule");
> @@ -506,6 +507,8 @@ static void audit_tree_log_remove_rule(struct audit_krule 
> *rule)
> audit_log_key(ab, rule->filterkey);
> audit_log_format(ab, " list=%d res=1", rule->listnr);
> audit_log_end(ab);
> +   audit_log_container_info(context, "config", 
> audit_get_containerid(current));
> +   audit_free_context(context);
>  }
>
>  static void kill_rules(struct audit_tree *tree)
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 9eb8b35..60d75a2 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -238,20 +238,25 @@ static struct audit_watch *audit_dupe_watch(struct 
> audit_watch *old)
>
>  static void audit_watch_log_rule_change(struct audit_krule *r, struct 
> audit_watch *w, char *op)
>  {
> -   if (audit_enabled) {
> -   struct audit_buffer *ab;
> -   ab = audit_log_start(NULL, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> -   if (unlikely(!ab))
> -   return;
> -   audit_log_format(ab, "auid=%u ses=%u op=%s",
> -from_kuid(&init_user_ns, 
> audit_get_loginuid(current)),
> -audit_get_sessionid(current), op);
> -   audit_log_format(ab, " path=");
> -   audit_log_untrustedstring(ab, w->path);
> -   audit_log_key(ab, r->filterkey);
> -   audit_log_format(ab, " list=%d res=1", r->listnr);
> -   audit_log_end(ab);
> -   }
> +   struct audit_buffer *ab;
> +   struct audit_context *context = audit_alloc_local();
> +
> +   if (!audit_enabled)
> +   return;

Same as above, do the allocation after the audit_enabled check.

> +   ab = audit_log_start(context, GFP_NOFS, AUDIT_CONFIG_CHANGE);
> +   if (unlikely(!ab))
> +   return;
> +   audit_log_format(ab, "auid=%u ses=%u op=%s",
> +from_kuid(&init_user_ns, 
> audit_get_loginuid(current)),
> +audit_get_sessionid(current), op);
> +   audit_log_format(ab, " path=");
> +   audit_log_untrustedstring(ab, w->path);
> +   audit_log_key(ab, r->filterkey);
> +   audit_log_format(ab, " list=%d res=1", r->listnr);
> +   audit_log_end(ab);
> +   audit_log_container_info(context, "config", 
> audit_get_containerid(current));
> +   audit_free_context(context);
>  }

-- 
paul moore
www.paul-moore.com
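Paul's allocation-order comments on this patch come down to one pattern: test audit_enabled before allocating, and release the context on every exit path. In the hunks above, the `if (unlikely(!ab)) return;` paths that follow audit_alloc_local() would also skip audit_free_context(). A hypothetical userspace sketch, with stand-ins (`alloc_local_model()`, `log_start_model()`, `free_context_model()`) in place of the real audit functions:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel audit API. */
struct audit_context { int unused; };
struct audit_buffer { int unused; };

static bool audit_enabled;   /* models the global audit switch */
static bool fail_log_start;  /* forces the "!ab" early-exit path */
static int total_allocs;     /* how often the allocator ran */
static int live_contexts;    /* allocated but not yet freed */

static struct audit_context *alloc_local_model(void)
{
	struct audit_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx) {
		total_allocs++;
		live_contexts++;
	}
	return ctx;
}

static void free_context_model(struct audit_context *ctx)
{
	if (ctx) {
		live_contexts--;
		free(ctx);
	}
}

static struct audit_buffer *log_start_model(struct audit_context *ctx)
{
	static struct audit_buffer buf;

	(void)ctx;
	return fail_log_start ? NULL : &buf;
}

/* Check audit_enabled first, allocate second, and free on every exit. */
static void log_rule_change_model(void)
{
	struct audit_context *context;
	struct audit_buffer *ab;

	if (!audit_enabled)
		return;  /* nothing allocated yet, so nothing can leak */

	context = alloc_local_model();
	ab = log_start_model(context);
	if (!ab)
		goto out;  /* a bare "return" here would leak context */

	/* ... audit_log_format(), audit_log_end(), container aux record ... */
out:
	free_context_model(context);
}
```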


Re: [PATCH 2/6] tracing: Add trace event error log

2018-04-18 Thread Namhyung Kim
Hi guys, :)

On Wed, Apr 18, 2018 at 09:49:24AM -0400, Steven Rostedt wrote:
> On Wed, 18 Apr 2018 18:34:34 +0900
> Masami Hiramatsu  wrote:
> 
> > On Fri, 13 Apr 2018 10:44:32 -0400
> > Steven Rostedt  wrote:
> > 
> > > On Fri, 13 Apr 2018 09:24:34 -0500
> > > Tom Zanussi  wrote:
> > >   
> > > > Yeah, I agree - I'd rather get it right than get it in now.  I thought
> > > > this made sense, and was based on input from Masami, which I may have
> > > > misinterpreted, but I'll wait for some more ideas about the best way to
> > > > do this.  
> > > 
> > > Too bad we are not closer to November, as this would actually be a good
> > > Plumbers topic. Maybe it's not that important and we should wait until
> > > then. I'd like to get some brainstorming ideas out before we decide on
> > > anything, and this is something I believe is better done face to face
> > > than over email.  
> > 
> > OK, sounds good for me too :)
> > My point was that the printk buffer is not a good place for ftrace parser
> > errors, nor should each sub-feature (like hist, trigger, probe_events etc.)
> > have a different place to show them. I just want to unify the user experience
> > over the ftrace UI.
> 
> I totally agree. I just want to make sure that whatever we come up with
> will be well thought out. Perhaps we can wait till November to talk
> about it.

I'm not sure I can go to LPC this year, but definitely interested in
improving error logging for tracing.

Thanks,
Namhyung


Re: [RFC PATCH ghak32 V2 01/13] audit: add container id

2018-04-18 Thread Casey Schaufler
On 4/18/2018 4:47 PM, Paul Moore wrote:
> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
>> Implement the proc fs write to set the audit container ID of a process,
>> emitting an AUDIT_CONTAINER record to document the event.
>> ...
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index d258826..1b82191 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -796,6 +796,7 @@ struct task_struct {
>>  #ifdef CONFIG_AUDITSYSCALL
>> kuid_t  loginuid;
>> unsigned intsessionid;
>> +   u64 containerid;
> This one line addition to the task_struct scares me the most of
> anything in this patchset.  Why?  It's a field named "containerid" in
> perhaps one of the most widely used core kernel structures; the
> possibilities for abuse are endless, and it's foolish to think we
> would ever be able to adequately police this.

If we can get the LSM infrastructure managed task blobs from 
module stacking in ahead of this we could create a trivial security
module to manage this. It's not as if there aren't all sorts of
interactions between security modules and the audit system already.




Re: [PATCH 1/2] hfs: fix potential refcnt problem of nls module

2018-04-18 Thread cgxu...@gmx.com
On Apr 19, 2018, at 3:42 AM, Andrew Morton wrote:
> 
> On Tue, 17 Apr 2018 15:05:32 +0800 Chengguang Xu  wrote:
> 
>> When specifying iocharset/codepage multiple times in a mount,
>> current option parsing will cause inaccurate refcount of nls
>> module. Hence, call unload_nls for previous one in this case.
>> 
>> ...
>> 
>> --- a/fs/hfs/super.c
>> +++ b/fs/hfs/super.c
>> @@ -329,8 +329,10 @@ static int parse_options(char *options, struct 
>> hfs_sb_info *hsb)
>>  return 0;
>>  }
>>  p = match_strdup(&args[0]);
>> -if (p)
>> +if (p) {
>> +unload_nls(hsb->nls_disk);
>>  hsb->nls_disk = load_nls(p);
>> +}
>>  if (!hsb->nls_disk) {
>>  pr_err("unable to load codepage \"%s\"\n", p);
>>  kfree(p);
>> @@ -344,8 +346,10 @@ static int parse_options(char *options, struct 
>> hfs_sb_info *hsb)
>>  return 0;
>>  }
>>  p = match_strdup(&args[0]);
>> -if (p)
>> +if (p) {
>> +unload_nls(hsb->nls_io);
>>  hsb->nls_io = load_nls(p);
>> +}
>>  if (!hsb->nls_io) {
>>  pr_err("unable to load iocharset \"%s\"\n", p);
>>  kfree(p);
> 
> Confused.
> 
>   break;
> : case opt_codepage:
> : if (hsb->nls_disk) {
> : pr_err("unable to change codepage\n");
> : return 0;
> : }
> 
> Here, hsb->nls_disk is known to be zero.
> 
> : p = match_strdup(&args[0]);
> : if (p) {
> : unload_nls(hsb->nls_disk);
> 
> So this will always do unload_nls(0).
> 
> : hsb->nls_disk = load_nls(p);
> : }
> 
> And the same applies to your opt_iocharset change.

You are right. Sorry I just misread this part, please just drop the patch.

Thanks.
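Andrew's reading can be checked with a small model. Because the opt_codepage branch returns early whenever hsb->nls_disk is already set, the added unload_nls() only ever sees NULL (and unload_nls() treats NULL as a no-op), so the refcount was already balanced without the patch. A hypothetical userspace sketch; `struct nls_model` and the `*_model()` helpers are stand-ins, not the kernel NLS API:

```c
#include <stddef.h>

/* Hypothetical refcounted stand-in for an NLS table; not the kernel API. */
struct nls_model { int refcnt; };

static struct nls_model codepage_table;
static int nonnull_unloads;  /* unload calls that actually dropped a ref */

static struct nls_model *load_nls_model(const char *name)
{
	(void)name;
	codepage_table.refcnt++;
	return &codepage_table;
}

/* Like unload_nls(), treat NULL as a no-op. */
static void unload_nls_model(struct nls_model *nls)
{
	if (nls) {
		nonnull_unloads++;
		nls->refcnt--;
	}
}

/* Mirrors the opt_codepage branch with the proposed hunk applied. */
static int parse_codepage_model(struct nls_model **nls_disk, const char *p)
{
	if (*nls_disk)
		return 0;  /* "unable to change codepage": a repeat is rejected */
	if (p) {
		unload_nls_model(*nls_disk);  /* *nls_disk is NULL on this path */
		*nls_disk = load_nls_model(p);
	}
	return *nls_disk != NULL;
}
```

A second codepage= option never reaches the load, so the table is referenced exactly once either way.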





Re: [PATCH 2/2] perf: add arm64 smmuv3 pmu driver

2018-04-18 Thread Yisheng Xie
Hi Shameerali,

On 2018/4/18 19:05, Shameerali Kolothum Thodi wrote:
> 
> 
>> -Original Message-
>> From: linux-arm-kernel [mailto:linux-arm-kernel-boun...@lists.infradead.org]
>> On Behalf Of Yisheng Xie
>> Sent: Thursday, March 29, 2018 8:04 AM
>> To: Neil Leeder ; Will Deacon
>> ; Mark Rutland 
>> Cc: Mark Langsdorf ; Jon Masters
>> ; Timur Tabi ; linux-
>> ker...@vger.kernel.org; Mark Brown ; Mark Salter
>> ; linux-arm-ker...@lists.infradead.org
>> Subject: Re: [PATCH 2/2] perf: add arm64 smmuv3 pmu driver
>>
>> Hi Neil,
>>
>> On 2017/8/5 3:59, Neil Leeder wrote:
>>> +   mem_resource_0 = platform_get_resource(pdev, IORESOURCE_MEM,
>> 0);
>>> +   mem_map_0 = devm_ioremap_resource(&pdev->dev,
>> mem_resource_0);
>>> +
>> Can we use devm_ioremap instead? The reg_base of smmu_pmu is
>> IMPLEMENTATION DEFINED. If the smmu_pmu registers are inside the smmu,
>> devm_ioremap_resource will fail and return -EBUSY, e.g.:
>>
>>  smmu reg ranges:0x18000 ~ 0x1801f
>>  its smmu_pmu reg ranges:0x180001000 ~ 0x180001fff
> 
> I think this will not solve the issue completely as the smmu v3 driver 
> uses devm_ioremap_resource() currently and that will fail because of
> the overlap.

Right, I get your point.
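The -EBUSY failure discussed here comes from the request step: devm_ioremap_resource() first claims the region via devm_request_mem_region() and fails if that conflicts with a region another driver (here the SMMUv3 driver) has already claimed, whereas devm_ioremap() only maps. A hypothetical userspace model of just the overlap test, using inclusive ranges like struct resource; the helper names are illustrative:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] physical range, like struct resource. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

static bool ranges_overlap(struct range_model a, struct range_model b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* Models only the "request" half of devm_ioremap_resource(): the region is
 * refused if it overlaps one that was already claimed. devm_ioremap()
 * maps without claiming, so it never hits this check. */
static int request_region_model(const struct range_model *claimed,
				size_t nclaimed, struct range_model want)
{
	size_t i;

	for (i = 0; i < nclaimed; i++)
		if (ranges_overlap(claimed[i], want))
			return -EBUSY;
	return 0;
}
```

This is also why switching only the PMU driver to devm_ioremap() helps just one side: whichever driver claims the shared window second still fails if it uses devm_ioremap_resource().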

> 
> Please find the discussion here:
> https://lkml.org/lkml/2018/1/31/235

Thanks for the information.

Thanks
Yisheng

> 
> Thanks,
> Shameer
> 
>>> +   if (IS_ERR(mem_map_0)) {
>>> +   dev_err(&pdev->dev, "Can't map SMMU PMU @%pa\n",
>>> +   &mem_resource_0->start);
>>> +   return PTR_ERR(mem_map_0);
>>> +   }
>>> +
>>> +   smmu_pmu->reg_base = mem_map_0;
>>> +   smmu_pmu->pmu.name =
>>> +   devm_kasprintf(&pdev->dev, GFP_KERNEL, "smmu_0_%llx",
>>> +  (mem_resource_0->start) >> SMMU_PA_SHIFT);
>>> +
>>> +   if (!smmu_pmu->pmu.name) {
>>> +   dev_err(&pdev->dev, "Failed to create PMU name");
>>> +   return -EINVAL;
>>> +   }
>>> +
>>> +   ceid_64 = readq(smmu_pmu->reg_base + SMMU_PMCG_CEID0);
>>> +   ceid[0] = ceid_64 & GENMASK(31, 0);
>>> +   ceid[1] = ceid_64 >> 32;
>>> +   ceid_64 = readq(smmu_pmu->reg_base + SMMU_PMCG_CEID1);
>>> +   ceid[2] = ceid_64 & GENMASK(31, 0);
>>> +   ceid[3] = ceid_64 >> 32;
>>> +   bitmap_from_u32array(smmu_pmu->supported_events,
>> SMMU_MAX_EVENT_ID,
>>> +ceid, SMMU_NUM_EVENTS_U32);
>>> +
>>> +   /* Determine if page 1 is present */
>>> +   if (readl(smmu_pmu->reg_base + SMMU_PMCG_CFGR) &
>>> +   SMMU_PMCG_CFGR_RELOC_CTRS) {
>>> +   mem_resource_1 = platform_get_resource(pdev,
>> IORESOURCE_MEM, 1);
>>> +   mem_map_1 = devm_ioremap_resource(&pdev->dev,
>> mem_resource_1);
>>> +
>> The same as above.
>>
>> Thanks
>> Yisheng
>>
>>> +   if (IS_ERR(mem_map_1)) {
>>> +   dev_err(&pdev->dev, "Can't map SMMU PMU
>> @%pa\n",
>>> +   &mem_resource_1->start);
>>> +   return PTR_ERR(mem_map_1);
>>> +   }
>>> +   smmu_pmu->reloc_base = mem_map_1;
>>> +   } else {
>>> +   smmu_pmu->reloc_base = smmu_pmu->reg_base;
>>> +   }
>>> +
>>> +   irq = platform_get_irq(pdev, 0);
>>> +   if (irq < 0) {
>>> +   dev_err(&pdev->dev,
>>> +   "Failed to get valid irq for smmu @%pa\n",
>>> +   &mem_resource_0->start);
>>> +   return irq;
>>> +   }
>>
>>
> 
> .
> 



Re: [PATCH 2/2] printk: wake up klogd in vprintk_emit

2018-04-18 Thread Sergey Senozhatsky
On (04/18/18 11:10), Steven Rostedt wrote:
> 
> > > Calling wake_up_klogd() will grab the rq lock and give us a A-B<->B-A
> > > locking order.  
> > 
> > wake_up_klogd() uses the lockless irq_work_queue(). So it is actually
> > safe.
> 
> I didn't look at the code. OK then we don't need to worry about that.

OK.

> > 
> > But the name is confusing. We should rename it.
> 
> Yes, I would because the old wake_up_klogd() did do a wakeup. Perhaps
> we should name it: kick_klogd().

Agreed.

-ss


Re: [PATCH net-next] hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation

2018-04-18 Thread David Miller
From: Haiyang Zhang 
Date: Tue, 17 Apr 2018 15:31:47 -0700

> From: Haiyang Zhang 
> 
> This patch adds the NetVSP v6 and 6.1 message structures, and includes
> these versions into NetVSC/NetVSP version negotiation process.
> 
> Signed-off-by: Haiyang Zhang 

Applied to net-next, thank you.


Re: [RFC PATCH ghak32 V2 09/13] audit: add containerid support for config/feature/user records

2018-04-18 Thread Paul Moore
On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs  wrote:
> Add container ID auxiliary records to configuration change, feature set change
> and user generated standalone records.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/audit.c   | 50 --
>  kernel/auditfilter.c |  5 -
>  2 files changed, 44 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index b238be5..08662b4 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -400,8 +400,9 @@ static int audit_log_config_change(char *function_name, 
> u32 new, u32 old,
>  {
> struct audit_buffer *ab;
> int rc = 0;
> +   struct audit_context *context = audit_alloc_local();

We should be able to use current->audit_context here right?  If we
can't for every caller, perhaps we pass an audit_context as an
argument and only allocate a local context when the passed
audit_context is NULL.

Also, if you're not comfortable always using current, just pass the
audit_context as you do with audit_log_common_recv_msg().
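Paul's alternative, passing an audit_context in and allocating a local one only when the caller passes NULL, might look like this hypothetical userspace sketch. The names (`log_config_change_model()`, `alloc_local_model()`, `free_context_model()`) and the -1 error return are assumptions for illustration, not the audit API; the key detail is that only a locally allocated context is freed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel audit API. */
struct audit_context { int unused; };

static int allocs, frees;
static bool last_used_local;  /* did the last call fall back to a local ctx? */

static struct audit_context *alloc_local_model(void)
{
	struct audit_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx)
		allocs++;
	return ctx;
}

static void free_context_model(struct audit_context *ctx)
{
	if (ctx) {
		frees++;
		free(ctx);
	}
}

/* Use the caller's context when there is one; fall back to a local one
 * only when the caller passes NULL, and free only what was allocated here. */
static int log_config_change_model(struct audit_context *context)
{
	struct audit_context *local = NULL;

	if (!context) {
		local = alloc_local_model();
		if (!local)
			return -1;  /* hypothetical error convention */
		context = local;
	}

	last_used_local = (local != NULL);
	/* ... audit_log_start(context, ...) and the records would go here ... */
	(void)context;

	free_context_model(local);  /* no-op when the caller's context was used */
	return 0;
}
```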

> -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> if (unlikely(!ab))
> return rc;
> audit_log_format(ab, "%s=%u old=%u", function_name, new, old);
> @@ -411,6 +412,8 @@ static int audit_log_config_change(char *function_name, 
> u32 new, u32 old,
> allow_changes = 0; /* Something weird, deny request */
> audit_log_format(ab, " res=%d", allow_changes);
> audit_log_end(ab);
> +   audit_log_container_info(context, "config", 
> audit_get_containerid(current));
> +   audit_free_context(context);
> return rc;
>  }
>
> @@ -1058,7 +1061,8 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 
> msg_type)
> return err;
>  }
>
> -static void audit_log_common_recv_msg(struct audit_buffer **ab, u16 msg_type)
> +static void audit_log_common_recv_msg(struct audit_context *context,
> + struct audit_buffer **ab, u16 msg_type)
>  {
> uid_t uid = from_kuid(&init_user_ns, current_uid());
> pid_t pid = task_tgid_nr(current);
> @@ -1068,7 +1072,7 @@ static void audit_log_common_recv_msg(struct 
> audit_buffer **ab, u16 msg_type)
> return;
> }
>
> -   *ab = audit_log_start(NULL, GFP_KERNEL, msg_type);
> +   *ab = audit_log_start(context, GFP_KERNEL, msg_type);
> if (unlikely(!*ab))
> return;
> audit_log_format(*ab, "pid=%d uid=%u", pid, uid);
> @@ -1097,11 +1101,12 @@ static void audit_log_feature_change(int which, u32 
> old_feature, u32 new_feature
>  u32 old_lock, u32 new_lock, int res)
>  {
> struct audit_buffer *ab;
> +   struct audit_context *context = audit_alloc_local();

So I know based on the other patch we are currently discussing that we
can use current here ...

> if (audit_enabled == AUDIT_OFF)
> return;
>
> -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
> +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
> if (!ab)
> return;
> audit_log_task_info(ab, current);
> @@ -1109,6 +1114,8 @@ static void audit_log_feature_change(int which, u32 
> old_feature, u32 new_feature
>  audit_feature_names[which], !!old_feature, 
> !!new_feature,
>  !!old_lock, !!new_lock, res);
> audit_log_end(ab);
> +   audit_log_container_info(context, "feature", 
> audit_get_containerid(current));
> +   audit_free_context(context);
>  }
>
>  static int audit_set_feature(struct sk_buff *skb)
> @@ -1337,13 +1344,15 @@ static int audit_receive_msg(struct sk_buff *skb, 
> struct nlmsghdr *nlh)
>
> err = audit_filter(msg_type, AUDIT_FILTER_USER);
> if (err == 1) { /* match or error */
> +   struct audit_context *context = audit_alloc_local();

I'm pretty sure we can use current here.

> err = 0;
> if (msg_type == AUDIT_USER_TTY) {
> err = tty_audit_push();
> if (err)
> break;
> }
> -   audit_log_common_recv_msg(&ab, msg_type);
> +   audit_log_common_recv_msg(context, &ab, msg_type);
> if (msg_type != AUDIT_USER_TTY)
> audit_log_format(ab, " msg='%.*s'",
>  AUDIT_MESSAGE_TEXT_MAX,
> @@ -1359,6 +1368,9 @@ static int audit_receive_msg(struct sk_buff *skb, 
> struct nlmsghdr *nlh)
> audit_log_n_untrustedstring(ab, data, size);
> }
> 

c9e97a1997 BUG: kernel reboot-without-warning in early-boot stage, last printk: early console in setup code

2018-04-18 Thread Fengguang Wu
On Wed, Apr 18, 2018 at 06:38:25PM -0500, Dennis Zhou wrote:
>Hi,
>
>On Wed, Apr 18, 2018 at 09:55:53PM +0800, Fengguang Wu wrote:
>>
>> Hello,
>>
>> FYI here is a slightly different boot error in mainline kernel 4.17.0-rc1.
>> It also dates back to v4.16.
>>
>> It occurs in 4 out of 4 boots.
>>
>> [0.00] Built 1 zonelists, mobility grouping on.  Total pages: 128873
>> [0.00] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug 
>> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 
>> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 
>> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 
>> drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 
>> earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw 
>> link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-a0-04172313/linux-devel:devel-hourly-2018041714:60cc43fc888428bb2f18f08997432d426a243338/.vmlinuz-60cc43fc888428bb2f18f08997432d426a243338-20180418000325-19:yocto-lkp-nhm-dp2-4
>>  branch=linux-devel/devel-hourly-2018041714 
>> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-04172313/gcc-7/60cc43fc888428bb2f18f08997432d426a243338/vmlinuz-4.17.0-rc1
>>  drbd.minor_count=8 rcuperf.shutdown=0
>> [0.00] sysrq: sysrq always enabled.
>> [0.00] Dentry cache hash table entries: 65536 (order: 7, 524288 
>> bytes)
>> [0.00] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
>> PANIC: early exception 0x0d IP 10:a892f15f error 0 cr2 
>> 0x88001fbff000
>> [0.00] CPU: 0 PID: 0 Comm: swapper Tainted: GT 
>> 4.17.0-rc1 #238
>> [0.00] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.10.2-1 04/01/2014
>> [0.00] RIP: 0010:per_cpu_ptr_to_phys+0x16a/0x298:
>>  __section_mem_map_addr at 
>> include/linux/mmzone.h:1188
>>   (inlined by) 
>> per_cpu_ptr_to_phys at mm/percpu.c:1849
>> [0.00] RSP: :ab407e50 EFLAGS: 00010046 ORIG_RAX: 
>> 
>> [0.00] RAX: dc00 RBX: 88001f17c340 RCX: 
>> 000f
>> [0.00] RDX:  RSI: 0001 RDI: 
>> acfbf580
>> [0.00] RBP: ab40d000 R08: fbfff57c4eca R09: 
>> 
>> [0.00] R10: 880015421000 R11: fbfff57c4ec9 R12: 
>> 
>> [0.00] R13: 88001fb03ff8 R14: 88001fc051c0 R15: 
>> 
>> [0.00] FS:  () GS:ab4c5000() 
>> knlGS:
>> [0.00] CS:  0010 DS:  ES:  CR0: 80050033
>> [0.00] CR2: 88001fbff000 CR3: 1a06c000 CR4: 
>> 06b0
>> [0.00] Call Trace:
>> [0.00]  setup_cpu_entry_areas+0x7b/0x27b:
>>  setup_cpu_entry_area at 
>> arch/x86/mm/cpu_entry_area.c:104
>>   (inlined by) 
>> setup_cpu_entry_areas at arch/x86/mm/cpu_entry_area.c:177
>> [0.00]  trap_init+0xb/0x13d:
>>  trap_init at 
>> arch/x86/kernel/traps.c:949
>> [0.00]  start_kernel+0x2a5/0x91d:
>>  mm_init at init/main.c:519
>>   (inlined by) start_kernel at 
>> init/main.c:589
>> [0.00]  ? thread_stack_cache_init+0x6/0x6
>> [0.00]  ? memcpy_orig+0x16/0x110:
>>  memcpy_orig at 
>> arch/x86/lib/memcpy_64.S:77
>> [0.00]  ? x86_family+0x5/0x1d:
>>  x86_family at 
>> arch/x86/lib/cpu.c:8
>> [0.00]  ? load_ucode_bsp+0x42/0x13e:
>>  load_ucode_bsp at 
>> arch/x86/kernel/cpu/microcode/core.c:183
>> [0.00]  secondary_startup_64+0xa5/0xb0:
>>  secondary_startup_64 at 
>> arch/x86/kernel/head_64.S:242
>> [0.00] Code: 78 06 00 49 8b 45 00 48 85 c0 74 a5 49 c1 ec 28 41 81 
>> e4 e0 0f 00 00 49 01 c4 4c 89 e2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 
>> <80> 3c 02 00 74 08 4c 89 e7 e8 63 78 06 00 49 8b 04 24 81 e5 ff
>> BUG: kernel hang in boot stage
>>
>
>I spent some time bisecting this one and it seems to be an intermittent
>issue starting with this commit for me:
>c9e97a1997, mm: initialize pages on demand during boot. The prior
>commit, 3a2d7fa8a3, did not run into this issue after 10+ boots.

Dennis, thanks for bisecting it down!

Pavel, here is an early boot error bisected to c9e97a1997 ("mm:
initialize pages on demand during boot"). Reproduce script attached.

3a2d7fa8a3  mm: disable interrupts while initializing deferred pages
c9e97a1997  mm: initialize pages on demand during boot
48023102b7  Merge branch 'overlayfs-linus' of 

Early timeouts due to inaccurate jiffies during system suspend/resume

2018-04-18 Thread Imre Deak
Hi,

while checking bug [1], I noticed that jiffies-based timing loops like

	expire = jiffies + timeout + 1;
	while (!time_after(jiffies, expire))
		do_something;

can finish sooner than expected (that is, in less than timeout). This
happens at least on an Intel Geminilake system when running the timing
loop from a driver's system suspend and resume hooks. To me it looks like
expire above is calculated from a stale jiffies value at the beginning,
and then jiffies is updated to the actual current time with an increment
of more than 1 in the middle of the loop, causing an early expiry.
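A minimal user-space model of the loop above (not kernel code) shows how a stale starting value followed by a multi-tick catch-up shortens the wait. It assumes HZ=1000, that one loop iteration costs exactly one tick of real time, and a hypothetical simulate_wait() helper; the time_after() macro mirrors the kernel's wrap-safe comparison, minus its typecheck.

```c
/* Wrap-safe "a is later than b", as in the kernel (typecheck omitted). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/*
 * Toy model: 'stale' is how many ticks jiffies lags real time when
 * 'expire' is computed.  The lag is caught up in a single jump at
 * iteration 'catchup_at', mimicking a late timer interrupt that
 * advances jiffies by more than one tick.  Returns the real time, in
 * ticks, that the polling loop actually waited.
 */
static unsigned long simulate_wait(unsigned long timeout,
				   unsigned long stale,
				   unsigned long catchup_at)
{
	unsigned long real = 1000;		/* true time, in ticks */
	unsigned long jiffies = real - stale;	/* lagging tick counter */
	unsigned long expire = jiffies + timeout + 1;
	unsigned long start = real;
	unsigned long iter = 0;

	while (!time_after(jiffies, expire)) {
		real++;		/* do_something: one tick of real time */
		iter++;
		/* Before catch-up jiffies ticks but stays 'stale' behind;
		 * at catch-up it jumps straight to the real time. */
		jiffies = (iter >= catchup_at) ? real : real - stale;
	}
	return real - start;
}
```

With timeout = 10, an up-to-date counter waits 12 ticks, and a counter that lags by 10 ticks but never jumps before the deadline also waits 12. Only the jump hurts: lagging by 10 and catching up on the first iteration returns after 2 ticks.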

With the following simplified testcase run from a driver's suspend/resume
hooks

	cpu = raw_smp_processor_id();
	cpu_clock_start = cpu_clock(cpu);	// 1.
	jiffies_start = jiffies;		// 2.

	usleep_range(200, 200);

	jiffies_end = jiffies;			// 3.
	cpu_clock_end = cpu_clock(cpu);		// 4.

	jiffies_delta = jiffies_end - jiffies_start;
	cpu_clock_delta = cpu_clock_end - cpu_clock_start;

	if (jiffies_to_nsecs(jiffies_delta) >
	    cpu_clock_delta + jiffies_to_nsecs(1))
		pr_info("cpu %d jiffies-delta %llu ns (%llu->%llu) cpu_clock-delta %llu ns (%llu -> %llu)\n",
			cpu,
			jiffies_to_nsecs(jiffies_delta), jiffies_start, jiffies_end,
			cpu_clock_delta, cpu_clock_start, cpu_clock_end);



and doing suspend/resume to mem cycles in a loop, I can trigger the
following:

[   42.415713] cpu 1 jiffies-delta 1100 ns (4294709700->4294709711) cpu_clock-delta 215738 ns (42415489466 -> 42415705204)

So according to jiffies the delay was 11ms while according to cpu_clock()
it was ~216usec. I have CONFIG_HZ=1000, so AFAIU - due to the ordering of
1.,2.,3.,4. - cpu_clock-delta should be bigger than jiffies-delta
minus 1ms.
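The sanity check in the testcase can be reproduced with plain arithmetic on the raw counters from the log line above. This is a user-space sketch with a hypothetical helper, assuming HZ=1000 so one jiffy is 1,000,000 ns. Because samples 2 and 3 fall between samples 1 and 4, an up-to-date jiffies delta should never exceed the cpu_clock interval by more than one tick.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ		1000ULL		/* CONFIG_HZ=1000, as in the report */
#define NSEC_PER_SEC	1000000000ULL

/* Same result as jiffies_to_nsecs() when HZ divides NSEC_PER_SEC. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
	return ticks * (NSEC_PER_SEC / HZ);
}

/*
 * Samples were taken in the order cpu_clock (1), jiffies (2),
 * jiffies (3), cpu_clock (4), so [2,3] lies inside [1,4].  An
 * up-to-date jiffies counter can over-count an interval by at most
 * one tick, so a larger excess suggests jiffies was stale when
 * sampled.  Hypothetical helper; same condition as the testcase.
 */
static bool jiffies_delta_suspicious(uint64_t jiffies_start,
				     uint64_t jiffies_end,
				     uint64_t clk_start, uint64_t clk_end)
{
	uint64_t jiffies_ns = ticks_to_ns(jiffies_end - jiffies_start);
	uint64_t clk_ns = clk_end - clk_start;

	return jiffies_ns > clk_ns + ticks_to_ns(1);
}
```

Feeding in the reported values (jiffies 4294709700 to 4294709711, cpu_clock 42415489466 to 42415705204) gives 11 ticks, or 11,000,000 ns, against 215,738 ns, so the check fires.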

Are the above timing loops/assumptions incorrect?

After some ftracing it seems like jiffies gets stale due to a missed
LAPIC timer interrupt after the interrupt is armed in
lapic_next_deadline() and before jiffies is sampled at 2. above.
Eventually the interrupt does get delivered, at which point jiffies gets
updated via tick_do_update_jiffies64() with a >1 ticks increment.
Between lapic_next_deadline() and the - late - delivery of the interrupt
the CPU on which the interrupt is armed doesn't go idle.

When booting with nolapic_timer, I haven't been able to trigger the problem yet.

I'm still trying to do a bisect, without high hopes, since triggering
the problem can take rather long and I suspect this could also be some
HW issue.

Any idea what could go wrong or how to debug this further?

I attached my dmesg.

Thanks,
Imre

[1] https://bugs.freedesktop.org/show_bug.cgi?id=105771
[0.00] Linux version 4.17.0-rc1-CI-CI_DRM_4040+ (jim@ideak-desk) (gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)) #86 SMP PREEMPT Wed Apr 18 22:35:05 EEST 2018
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.17.0-rc1-CI-CI_DRM_4040+ root=UUID=39b51f91-8fce-4449-acb4-3e740303a4fe ro quiet splash drm.debug=0xe intel_iommu=igfx_off 3 modprobe.blacklist=i915,snd_hda_intel apic=debug vt.handoff=1
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: xstate_offset[3]:  576, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  640, xstate_sizes[4]:   64
[0.00] x86/fpu: Enabled xstate features 0x1b, context size is 704 bytes, using 'compacted' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0003dfff] usable
[0.00] BIOS-e820: [mem 0x0003e000-0x0003] reserved
[0.00] BIOS-e820: [mem 0x0004-0x0009dfff] usable
[0.00] BIOS-e820: [mem 0x0009e000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x0fff] usable
[0.00] BIOS-e820: [mem 0x1000-0x12150fff] reserved
[0.00] BIOS-e820: [mem 0x12151000-0x674a9fff] usable
[0.00] BIOS-e820: [mem 0x674aa000-0x69d58fff] reserved
[0.00] BIOS-e820: [mem 0x69d59000-0x69d7cfff] ACPI data
[0.00] BIOS-e820: [mem 0x69d7d000-0x69ddcfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x69ddd000-0x6a0a3fff] reserved
[0.00] BIOS-e820: [mem 0x6a0a4000-0x6a164fff] type 20
[0.00] BIOS-e820: [mem 0x6a165000-0x6a562fff] usable
[0.00] BIOS-e820: [mem 0x6a563000-0x6a60efff] reserved
[0.00] BIOS-e820: [mem 

Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Dave Young
On 04/18/18 at 06:01pm, Rahul Lakkireddy wrote:
> On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> > Hi Rahul,
> > On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> > > On production servers running a variety of workloads over time, kernel
> > > panics can happen sporadically after days or even months. It is
> > > important to collect as many debug logs as possible to root-cause
> > > and fix a problem that may not be easy to reproduce. A snapshot of
> > > the underlying hardware/firmware state (register dump, firmware
> > > logs, adapter memory, etc.) at the time of the kernel panic is very
> > > helpful when debugging the culprit device driver.
> > > 
> > > This series of patches adds a new generic framework that enables device
> > > drivers to collect a device-specific snapshot of the hardware/firmware
> > > state of the underlying device in the crash recovery kernel. In the crash
> > > recovery kernel, the collected logs are added as elf notes to
> > > /proc/vmcore, which is copied by user space scripts for post-analysis.
> > > 
> > > The sequence of actions device drivers take to append their
> > > device-specific hardware/firmware logs to /proc/vmcore is as follows:
> > > 
> > > 1. During probe (before the hardware is initialized), device drivers
> > > register with the vmcore module (via vmcore_add_device_dump()), providing
> > > a callback function along with the buffer size and log name needed for
> > > firmware/hardware log collection.
> > 
> > I assumed the elf note info would be prepared during the kexec_[file_]load
> > phase. But I did not read the old comments, so I'm not sure whether this
> > has been discussed.
> > 
> 
> We must not collect dumps in the crashing kernel. Adding more things to the
> crash dump path risks not collecting the vmcore at all. Eric
> discussed this in more detail at:
> 
> https://lkml.org/lkml/2018/3/24/319
> 
> It is safe to collect dumps in the second kernel. Each device dump
> will be exported as an elf note in /proc/vmcore.

I understand that we should avoid adding anything to the crash path, and I
also agree with collecting the device dump in the second kernel. I just
assumed the device dump uses some memory area to store the debug info and
that the memory is persistent, so this could be done in 2 steps: first
register the address in the elf header during kexec_load, then collect
the dump in the 2nd kernel. But it seems the driver does more work to
collect the info than the simple approach I had in mind.

> 
> > If this is done in the 2nd kernel, one question is that the driver can
> > be loaded later than vmcore init.
> 
> Yes, drivers will add their device dumps after vmcore init.
> 
> > How can we guarantee this works if vmcore is read before
> > the driver is loaded?
> > 
> > Also, it is possible that the kdump initramfs does not contain the driver
> > module.
> > 
> > Am I missing something?
> > 
> 
> Yes, the driver must be in the initramfs if it wants to collect and add a
> device dump to /proc/vmcore in the second kernel.

In the RH/Fedora kdump scripts we only add the things that are required to
bring up the dump target, so that we use as little memory as possible.

For example, if a net driver panicked and the dump target is a rootfs
on a SCSI disk, then nothing network-related will be added to the
initramfs.

In this case the device dump info will not be collected.
> 
> > > 
> > > 2. The vmcore module allocates a buffer of the requested size. It adds
> > > an elf note and invokes the device driver's registered callback
> > > function.
> > > 
> > > 3. The device driver collects all hardware/firmware logs into the buffer
> > > and returns control to the vmcore module.
> > > 
> > > The device specific hardware/firmware logs can be seen as elf notes:
> > > 
> > > # readelf -n /proc/vmcore
> > > 
> > > Displaying notes found at file offset 0x1000 with length 0x04003288:
> > >   Owner Data size Description
> > >   VMCOREDD_cxgb4_:02:00.4 0x02000fd8  Unknown note type: (0x0700)
> > >   VMCOREDD_cxgb4_:04:00.4 0x02000fd8  Unknown note type: (0x0700)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   CORE 0x0150 NT_PRSTATUS (prstatus structure)
> > >   VMCOREINFO   0x074f Unknown note type: (0x)
> > > 
> > > Patch 1 adds API to vmcore module to allow drivers to register callback
> > > to collect the device specific hardware/firmware logs.  The logs will
> > > be added to /proc/vmcore as elf notes.
> > > 
> > > Patch 2 updates read and mmap logic to append device specific hardware/
> > > 

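The three-step flow quoted above (register, allocate a buffer and add an elf note, invoke the driver callback) can be modelled in user space to show how one device dump becomes an ELF note. Everything named dd_* here is hypothetical and is not the vmcore API from the patch; the note type 0x700 is an assumption read from the readelf listing, and Elf64_Nhdr comes from glibc's <elf.h>.

```c
#include <elf.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DD_NOTE_TYPE	0x700			/* assumed, per readelf listing */
#define ALIGN4(x)	(((x) + 3u) & ~3u)	/* note fields are 4-byte aligned */

/* Step 1 (hypothetical): what a driver registers for one dump. */
struct dd_entry {
	char name[32];				/* note owner name */
	uint32_t size;				/* bytes the driver will write */
	int (*collect)(void *buf, uint32_t size); /* driver callback */
};

/* Bytes one dump occupies as an ELF note: header + padded name + data. */
static size_t dd_note_size(const struct dd_entry *e)
{
	uint32_t namesz = (uint32_t)strlen(e->name) + 1; /* includes NUL */

	return sizeof(Elf64_Nhdr) + ALIGN4(namesz) + ALIGN4(e->size);
}

/*
 * Steps 2 and 3: allocate the note, write its header and name, then
 * let the driver's callback fill the descriptor area.  Returns the
 * note (caller frees) and its length, or NULL on failure.
 */
static unsigned char *dd_collect(const struct dd_entry *e, size_t *len)
{
	uint32_t namesz = (uint32_t)strlen(e->name) + 1;
	unsigned char *note = calloc(1, dd_note_size(e));
	Elf64_Nhdr *nhdr = (Elf64_Nhdr *)note;

	if (!note)
		return NULL;
	nhdr->n_namesz = namesz;
	nhdr->n_descsz = e->size;
	nhdr->n_type = DD_NOTE_TYPE;
	memcpy(note + sizeof(*nhdr), e->name, namesz);

	/* Step 3: the driver writes its logs after the padded name. */
	if (e->collect(note + sizeof(*nhdr) + ALIGN4(namesz), e->size)) {
		free(note);
		return NULL;
	}
	*len = dd_note_size(e);
	return note;
}

/* Example driver callback (hypothetical): fills the dump with 0xab. */
static int fake_collect(void *buf, uint32_t size)
{
	memset(buf, 0xab, size);
	return 0;
}
```

For a 13-character name and an 8-byte dump, the note occupies 12 (header) + 16 (name plus NUL, padded) + 8 = 36 bytes. The sketch also makes Dave's point concrete: nothing is produced unless a driver has registered an entry before /proc/vmcore is read.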
Re: PATCH V4 0/5 nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-04-18 Thread jianchao.wang
Hi Ming

Thanks for your kind response.

On 04/18/2018 11:40 PM, Ming Lei wrote:
>> Regarding this patchset, it is mainly to fix the dependency between
>> nvme_timeout and nvme_dev_disable, as you can see:
>> nvme_timeout will invoke nvme_dev_disable, and nvme_dev_disable has to
>> depend on nvme_timeout when the controller does not respond.
> Do you mean nvme_disable_io_queues()? If yes, this one has been handled
> by wait_for_completion_io_timeout() already, and it looks like the block
> timeout can simply be disabled. Or are there others?
> 
Here is one possible scenario with the current code:

nvme_dev_disable  // holds shutdown_lock    nvme_timeout
  -> nvme_set_host_mem                        -> nvme_dev_disable
    -> nvme_submit_sync_cmd                     -> tries to acquire shutdown_lock
      -> __nvme_submit_sync_cmd
        -> blk_execute_rq
          // if sysctl_hung_task_timeout_secs == 0
          -> wait_for_completion_io

And nvme_dev_disable may need to issue other commands in the future.

Even if we fix these kinds of issues the way nvme_disable_io_queues
does, I think it is still a risk.
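The circular wait in the call chains above can be reduced to a deterministic toy model. The struct and functions below are hypothetical stand-ins, not nvme driver code, and a bounded loop replaces wait_for_completion_io() so the sketch terminates: with a responsive controller the disable path completes, while with an unresponsive one the timeout path never gets shutdown_lock and the command never completes.

```c
#include <stdbool.h>

/* Toy state for the scenario above; not nvme driver code. */
struct toy_ctrl {
	bool shutdown_lock_held;	/* taken by the disable path */
	bool cmd_completed;		/* the sync admin command finished */
	bool ctrl_responds;		/* false: controller is unresponsive */
};

/*
 * Timeout path (nvme_timeout -> nvme_dev_disable): on an unresponsive
 * controller, only disabling the device can complete the stuck
 * command, and that needs shutdown_lock.  Returns true on progress.
 */
static bool toy_timeout(struct toy_ctrl *c)
{
	if (c->shutdown_lock_held)
		return false;		/* would block on shutdown_lock */
	c->cmd_completed = true;	/* disable path cancels the command */
	return true;
}

/*
 * Disable path: hold shutdown_lock while issuing a synchronous command
 * (nvme_set_host_mem in the chain above) and wait for it.  The wait is
 * bounded by max_rounds so the sketch terminates; an unbounded
 * wait_for_completion_io() would hang instead.  Returns true if the
 * command completed.
 */
static bool toy_dev_disable(struct toy_ctrl *c, int max_rounds)
{
	c->shutdown_lock_held = true;
	c->cmd_completed = false;
	for (int i = 0; i < max_rounds && !c->cmd_completed; i++) {
		if (c->ctrl_responds)
			c->cmd_completed = true;	/* normal completion */
		else
			(void)toy_timeout(c);		/* request timed out */
	}
	c->shutdown_lock_held = false;
	return c->cmd_completed;
}
```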

Thanks
Jianchao


[PATCH v2 3/5] dmaengine: sprd: Move DMA request mode and interrupt type into head file

2018-04-18 Thread Baolin Wang
From: Eric Long 

This patch moves the Spreadtrum DMA request mode and interrupt type
definitions into a header file so that users can configure them.

Signed-off-by: Eric Long 
Signed-off-by: Baolin Wang 
---
Changes since v1:
- No updates.
---
 drivers/dma/sprd-dma.c   |   52 +-
 include/linux/dma/sprd-dma.h |   57 ++
 2 files changed, 58 insertions(+), 51 deletions(-)
 create mode 100644 include/linux/dma/sprd-dma.h

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index 65ff0a58..ccdeb8f 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -125,57 +126,6 @@
 
 #define SPRD_DMA_SOFTWARE_UID  0
 
-/*
- * enum sprd_dma_req_mode: define the DMA request mode
- * @SPRD_DMA_FRAG_REQ: fragment request mode
- * @SPRD_DMA_BLK_REQ: block request mode
- * @SPRD_DMA_TRANS_REQ: transaction request mode
- * @SPRD_DMA_LIST_REQ: link-list request mode
- *
- * We have 4 types request mode: fragment mode, block mode, transaction mode
- * and linklist mode. One transaction can contain several blocks, one block can
- * contain several fragments. Link-list mode means we can save several DMA
- * configuration into one reserved memory, then DMA can fetch each DMA
- * configuration automatically to start transfer.
- */
-enum sprd_dma_req_mode {
-   SPRD_DMA_FRAG_REQ,
-   SPRD_DMA_BLK_REQ,
-   SPRD_DMA_TRANS_REQ,
-   SPRD_DMA_LIST_REQ,
-};
-
-/*
- * enum sprd_dma_int_type: define the DMA interrupt type
- * @SPRD_DMA_NO_INT: do not need generate DMA interrupts.
- * @SPRD_DMA_FRAG_INT: fragment done interrupt when one fragment request
- * is done.
- * @SPRD_DMA_BLK_INT: block done interrupt when one block request is done.
- * @SPRD_DMA_BLK_FRAG_INT: block and fragment interrupt when one fragment
- * or one block request is done.
- * @SPRD_DMA_TRANS_INT: tansaction done interrupt when one transaction
- * request is done.
- * @SPRD_DMA_TRANS_FRAG_INT: transaction and fragment interrupt when one
- * transaction request or fragment request is done.
- * @SPRD_DMA_TRANS_BLK_INT: transaction and block interrupt when one
- * transaction request or block request is done.
- * @SPRD_DMA_LIST_INT: link-list done interrupt when one link-list request
- * is done.
- * @SPRD_DMA_CFGERR_INT: configure error interrupt when configuration is
- * incorrect.
- */
-enum sprd_dma_int_type {
-   SPRD_DMA_NO_INT,
-   SPRD_DMA_FRAG_INT,
-   SPRD_DMA_BLK_INT,
-   SPRD_DMA_BLK_FRAG_INT,
-   SPRD_DMA_TRANS_INT,
-   SPRD_DMA_TRANS_FRAG_INT,
-   SPRD_DMA_TRANS_BLK_INT,
-   SPRD_DMA_LIST_INT,
-   SPRD_DMA_CFGERR_INT,
-};
-
 /* dma data width values */
 enum sprd_dma_datawidth {
SPRD_DMA_DATAWIDTH_1_BYTE,
diff --git a/include/linux/dma/sprd-dma.h b/include/linux/dma/sprd-dma.h
new file mode 100644
index 000..c545162
--- /dev/null
+++ b/include/linux/dma/sprd-dma.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _SPRD_DMA_H_
+#define _SPRD_DMA_H_
+
+/*
+ * enum sprd_dma_req_mode: define the DMA request mode
+ * @SPRD_DMA_FRAG_REQ: fragment request mode
+ * @SPRD_DMA_BLK_REQ: block request mode
+ * @SPRD_DMA_TRANS_REQ: transaction request mode
+ * @SPRD_DMA_LIST_REQ: link-list request mode
+ *
+ * We have 4 types request mode: fragment mode, block mode, transaction mode
+ * and linklist mode. One transaction can contain several blocks, one block can
+ * contain several fragments. Link-list mode means we can save several DMA
+ * configuration into one reserved memory, then DMA can fetch each DMA
+ * configuration automatically to start transfer.
+ */
+enum sprd_dma_req_mode {
+   SPRD_DMA_FRAG_REQ,
+   SPRD_DMA_BLK_REQ,
+   SPRD_DMA_TRANS_REQ,
+   SPRD_DMA_LIST_REQ,
+};
+
+/*
+ * enum sprd_dma_int_type: define the DMA interrupt type
+ * @SPRD_DMA_NO_INT: do not need generate DMA interrupts.
+ * @SPRD_DMA_FRAG_INT: fragment done interrupt when one fragment request
+ * is done.
+ * @SPRD_DMA_BLK_INT: block done interrupt when one block request is done.
+ * @SPRD_DMA_BLK_FRAG_INT: block and fragment interrupt when one fragment
+ * or one block request is done.
+ * @SPRD_DMA_TRANS_INT: tansaction done interrupt when one transaction
+ * request is done.
+ * @SPRD_DMA_TRANS_FRAG_INT: transaction and fragment interrupt when one
+ * transaction request or fragment request is done.
+ * @SPRD_DMA_TRANS_BLK_INT: transaction and block interrupt when one
+ * transaction request or block request is done.
+ * @SPRD_DMA_LIST_INT: link-list done interrupt when one link-list request
+ * is done.
+ * @SPRD_DMA_CFGERR_INT: configure error interrupt when configuration is
+ * incorrect.
+ */
+enum sprd_dma_int_type {
+   SPRD_DMA_NO_INT,
+   SPRD_DMA_FRAG_INT,
+   SPRD_DMA_BLK_INT,
+   SPRD_DMA_BLK_FRAG_INT,
+   

[PATCH v2 4/5] dmaengine: sprd: Add Spreadtrum DMA configuration

2018-04-18 Thread Baolin Wang
From: Eric Long 

This patch adds a 'struct sprd_dma_config' structure to save the Spreadtrum
DMA configuration for each DMA channel. It also optimizes sprd_dma_config()
and sprd_dma_prep_dma_memcpy() in preparation for letting users configure
the DMA.

Signed-off-by: Eric Long 
Signed-off-by: Baolin Wang 
---
Changes since v1:
 - Remove 'struct sprd_dma_config' structure in 'sprd_dma.h'.
 - Add 'struct sprd_dma_config' for DMA channel.
 - Remove sprd_dma_get_datawidth() and sprd_dma_get_step().
 - Other optimization.
---
 drivers/dma/sprd-dma.c |  205 ++--
 1 file changed, 110 insertions(+), 95 deletions(-)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index ccdeb8f..23846ed 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -100,6 +100,8 @@
 #define SPRD_DMA_DES_DATAWIDTH_OFFSET  28
 #define SPRD_DMA_SWT_MODE_OFFSET   26
 #define SPRD_DMA_REQ_MODE_OFFSET   24
+#define SPRD_DMA_WRAP_SEL_OFFSET   23
+#define SPRD_DMA_WRAP_EN_OFFSET22
 #define SPRD_DMA_REQ_MODE_MASK GENMASK(1, 0)
 #define SPRD_DMA_FIX_SEL_OFFSET21
 #define SPRD_DMA_FIX_EN_OFFSET 20
@@ -154,6 +156,41 @@ struct sprd_dma_chn_hw {
u32 des_blk_step;
 };
 
+/*
+ * struct sprd_dma_config - DMA configuration structure
+ * @src_addr: the physical address where DMA slave data should be read
+ * @dst_addr: the physical address where DMA slave data should be written
+ * @fragment_len: specify one fragment transfer length
+ * @block_len: specify one block transfer length
+ * @transcation_len: specify one transcation transfer length
+ * @src_step: source transfer step
+ * @dst_step: destination transfer step
+ * @src_datawidth: source transfer data width
+ * @dst_datawidth: destination transfer data width
+ * @wrap_ptr: wrap pointer address, once the transfer address reaches the
+ * 'wrap_ptr', the next transfer address will jump to the 'wrap_to' address.
+ * @wrap_to: wrap jump to address
+ * @req_mode: specify the DMA request mode
+ * @int_mode: specify the DMA interrupt type
+ * @slave_id: slave channel requester id
+ */
+struct sprd_dma_config {
+   phys_addr_t src_addr;
+   phys_addr_t dst_addr;
+   u32 fragment_len;
+   u32 block_len;
+   u32 transcation_len;
+   u32 src_step;
+   u32 dst_step;
+   enum sprd_dma_datawidth src_datawidth;
+   enum sprd_dma_datawidth dst_datawidth;
+   phys_addr_t wrap_ptr;
+   phys_addr_t wrap_to;
+   enum sprd_dma_req_mode req_mode;
+   enum sprd_dma_int_type int_mode;
+   u32 slave_id;
+};
+
 /* dma request description */
 struct sprd_dma_desc {
struct virt_dma_descvd;
@@ -164,6 +201,7 @@ struct sprd_dma_desc {
 struct sprd_dma_chn {
struct virt_dma_chanvc;
void __iomem*chn_base;
+   struct sprd_dma_config  slave_cfg;
u32 chn_num;
u32 dev_id;
struct sprd_dma_desc*cur_desc;
@@ -553,125 +591,74 @@ static void sprd_dma_issue_pending(struct dma_chan *chan)
 }
 
 static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
-  dma_addr_t dest, dma_addr_t src, size_t len)
+  struct sprd_dma_config *slave_cfg)
 {
struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan);
+   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
struct sprd_dma_chn_hw *hw = >chn_hw;
-   u32 datawidth, src_step, des_step, fragment_len;
-   u32 block_len, req_mode, irq_mode, transcation_len;
-   u32 fix_mode = 0, fix_en = 0;
-
-   if (IS_ALIGNED(len, 4)) {
-   datawidth = SPRD_DMA_DATAWIDTH_4_BYTES;
-   src_step = SPRD_DMA_WORD_STEP;
-   des_step = SPRD_DMA_WORD_STEP;
-   } else if (IS_ALIGNED(len, 2)) {
-   datawidth = SPRD_DMA_DATAWIDTH_2_BYTES;
-   src_step = SPRD_DMA_SHORT_STEP;
-   des_step = SPRD_DMA_SHORT_STEP;
-   } else {
-   datawidth = SPRD_DMA_DATAWIDTH_1_BYTE;
-   src_step = SPRD_DMA_BYTE_STEP;
-   des_step = SPRD_DMA_BYTE_STEP;
-   }
+   u32 fix_mode = 0, fix_en = 0, wrap_en = 0, wrap_mode = 0;
 
-   fragment_len = SPRD_DMA_MEMCPY_MIN_SIZE;
-   if (len <= SPRD_DMA_BLK_LEN_MASK) {
-   block_len = len;
-   transcation_len = 0;
-   req_mode = SPRD_DMA_BLK_REQ;
-   irq_mode = SPRD_DMA_BLK_INT;
-   } else {
-   block_len = SPRD_DMA_MEMCPY_MIN_SIZE;
-   transcation_len = len;
-   req_mode = SPRD_DMA_TRANS_REQ;
-   irq_mode = SPRD_DMA_TRANS_INT;
-   }
+   if (slave_cfg->slave_id)
+   schan->dev_id = slave_cfg->slave_id;
 
hw->cfg = SPRD_DMA_DONOT_WAIT_BDONE << SPRD_DMA_WAIT_BDONE_OFFSET;
-   

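The wrap_ptr/wrap_to comment in patch 4/5 describes ring-buffer style addressing. The sketch below is one reading of that comment, in user-space C with a hypothetical wrap_addresses() helper: the address equal to wrap_ptr is still transferred, and the next address jumps to wrap_to. The comment does not say what happens when the step skips over wrap_ptr, so the sketch assumes wrap_ptr lies on the step grid.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generate the first n transfer addresses for a channel that starts
 * at 'start', advances by 'step' bytes, and wraps as described in the
 * struct sprd_dma_config comment.  Assumes (start - wrap_ptr) is a
 * multiple of step; otherwise the '==' test never matches.
 */
static void wrap_addresses(uint64_t start, uint32_t step,
			   uint64_t wrap_ptr, uint64_t wrap_to,
			   uint64_t *out, size_t n)
{
	uint64_t addr = start;

	for (size_t i = 0; i < n; i++) {
		out[i] = addr;
		/* Once wrap_ptr is reached, the next address is wrap_to. */
		addr = (addr == wrap_ptr) ? wrap_to : addr + step;
	}
}
```

For example, starting at 0x1000 with a 4-byte step, wrap_ptr 0x1008 and wrap_to 0x1000, the sequence is 0x1000, 0x1004, 0x1008, 0x1000, 0x1004, which is how a peripheral FIFO-to-ring-buffer transfer would typically want to behave.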
[PATCH v2 2/5] dmaengine: sprd: Define the DMA data width type

2018-04-18 Thread Baolin Wang
Define the DMA data width type to make the code more readable.

Signed-off-by: Baolin Wang 
---
Changes since v1:
- No updates.
---
 drivers/dma/sprd-dma.c |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index dcfa417..65ff0a58 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -176,6 +176,14 @@ enum sprd_dma_int_type {
SPRD_DMA_CFGERR_INT,
 };
 
+/* dma data width values */
+enum sprd_dma_datawidth {
+   SPRD_DMA_DATAWIDTH_1_BYTE,
+   SPRD_DMA_DATAWIDTH_2_BYTES,
+   SPRD_DMA_DATAWIDTH_4_BYTES,
+   SPRD_DMA_DATAWIDTH_8_BYTES,
+};
+
 /* dma channel hardware configuration */
 struct sprd_dma_chn_hw {
u32 pause;
@@ -604,15 +612,15 @@ static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
u32 fix_mode = 0, fix_en = 0;
 
if (IS_ALIGNED(len, 4)) {
-   datawidth = 2;
+   datawidth = SPRD_DMA_DATAWIDTH_4_BYTES;
src_step = SPRD_DMA_WORD_STEP;
des_step = SPRD_DMA_WORD_STEP;
} else if (IS_ALIGNED(len, 2)) {
-   datawidth = 1;
+   datawidth = SPRD_DMA_DATAWIDTH_2_BYTES;
src_step = SPRD_DMA_SHORT_STEP;
des_step = SPRD_DMA_SHORT_STEP;
} else {
-   datawidth = 0;
+   datawidth = SPRD_DMA_DATAWIDTH_1_BYTE;
src_step = SPRD_DMA_BYTE_STEP;
des_step = SPRD_DMA_BYTE_STEP;
}
-- 
1.7.9.5
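The data-width values from patch 2/5 and the step macros from patch 1/5 line up so that the step in bytes equals 1 << width. The sketch below restates those values in user-space C and mirrors the alignment test in the memcpy path of sprd_dma_config(); memcpy_datawidth() and step_for() are hypothetical helpers, and IS_ALIGNED is a simplified copy of the kernel macro. Note that the memcpy path never selects the 8-byte width.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as defined in patches 1/5 and 2/5. */
enum sprd_dma_datawidth {
	SPRD_DMA_DATAWIDTH_1_BYTE,
	SPRD_DMA_DATAWIDTH_2_BYTES,
	SPRD_DMA_DATAWIDTH_4_BYTES,
	SPRD_DMA_DATAWIDTH_8_BYTES,
};

#define SPRD_DMA_BYTE_STEP	1
#define SPRD_DMA_SHORT_STEP	2
#define SPRD_DMA_WORD_STEP	4
#define SPRD_DMA_DWORD_STEP	8

/* Simplified user-space version of the kernel's IS_ALIGNED(). */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/* Memcpy path: the widest of 4/2/1 bytes that divides len. */
static enum sprd_dma_datawidth memcpy_datawidth(size_t len)
{
	if (IS_ALIGNED(len, 4))
		return SPRD_DMA_DATAWIDTH_4_BYTES;
	if (IS_ALIGNED(len, 2))
		return SPRD_DMA_DATAWIDTH_2_BYTES;
	return SPRD_DMA_DATAWIDTH_1_BYTE;
}

/* The enum encodes log2(bytes), so the step is 1 << width. */
static uint32_t step_for(enum sprd_dma_datawidth w)
{
	return 1u << w;
}
```

The explicit switch in sprd_dma_get_step() from patch 5/5 gives the same results for valid widths; the shift form is only an observation about these particular values, not something the driver relies on.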



[PATCH v2 5/5] dmaengine: sprd: Add 'device_config' and 'device_prep_slave_sg' interfaces

2018-04-18 Thread Baolin Wang
From: Eric Long 

This patch adds the 'device_config' and 'device_prep_slave_sg' interfaces
to let users configure the DMA.

Signed-off-by: Eric Long 
Signed-off-by: Baolin Wang 
---
Changes since v1:
 - The request mode and interrupt type are now passed via flags.
 - Add sprd_dma_get_step() to get the src/dst step.
 - Add sprd_dma_get_datawidth() to convert generic data width values into
 values that the Spreadtrum DMA can use.
---
 drivers/dma/sprd-dma.c   |  115 ++
 include/linux/dma/sprd-dma.h |4 ++
 2 files changed, 119 insertions(+)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index 23846ed..f2598ed 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -590,6 +590,47 @@ static void sprd_dma_issue_pending(struct dma_chan *chan)
spin_unlock_irqrestore(>vc.lock, flags);
 }
 
+static enum sprd_dma_datawidth
+sprd_dma_get_datawidth(enum dma_slave_buswidth buswidth)
+{
+   switch (buswidth) {
+   case DMA_SLAVE_BUSWIDTH_1_BYTE:
+   return SPRD_DMA_DATAWIDTH_1_BYTE;
+
+   case DMA_SLAVE_BUSWIDTH_2_BYTES:
+   return SPRD_DMA_DATAWIDTH_2_BYTES;
+
+   case DMA_SLAVE_BUSWIDTH_4_BYTES:
+   return SPRD_DMA_DATAWIDTH_4_BYTES;
+
+   case DMA_SLAVE_BUSWIDTH_8_BYTES:
+   return SPRD_DMA_DATAWIDTH_8_BYTES;
+
+   default:
+   return SPRD_DMA_DATAWIDTH_4_BYTES;
+   }
+}
+
+static u32 sprd_dma_get_step(enum sprd_dma_datawidth datawidth)
+{
+   switch (datawidth) {
+   case SPRD_DMA_DATAWIDTH_1_BYTE:
+   return SPRD_DMA_BYTE_STEP;
+
+   case SPRD_DMA_DATAWIDTH_2_BYTES:
+   return SPRD_DMA_SHORT_STEP;
+
+   case SPRD_DMA_DATAWIDTH_4_BYTES:
+   return SPRD_DMA_WORD_STEP;
+
+   case SPRD_DMA_DATAWIDTH_8_BYTES:
+   return SPRD_DMA_DWORD_STEP;
+
+   default:
+   return SPRD_DMA_DWORD_STEP;
+   }
+}
+
 static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
   struct sprd_dma_config *slave_cfg)
 {
@@ -711,6 +752,78 @@ static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
return vchan_tx_prep(>vc, >vd, flags);
 }
 
+static struct dma_async_tx_descriptor *
+sprd_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
+  unsigned int sglen, enum dma_transfer_direction dir,
+  unsigned long flags, void *context)
+{
+   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
+   struct sprd_dma_config *slave_cfg = >slave_cfg;
+   struct sprd_dma_desc *sdesc;
+   struct scatterlist *sg;
+   int ret, i;
+
+   /* TODO: now we only support one sg for each DMA configuration. */
+   if (!is_slave_direction(dir) || sglen > 1)
+   return NULL;
+
+   sdesc = kzalloc(sizeof(*sdesc), GFP_NOWAIT);
+   if (!sdesc)
+   return NULL;
+
+   for_each_sg(sgl, sg, sglen, i) {
+   if (dir == DMA_MEM_TO_DEV) {
+   slave_cfg->src_addr = sg_dma_address(sg);
+   slave_cfg->src_step =
+   sprd_dma_get_step(slave_cfg->src_datawidth);
+   slave_cfg->dst_step = SPRD_DMA_NONE_STEP;
+   } else {
+   slave_cfg->dst_addr = sg_dma_address(sg);
+   slave_cfg->src_step = SPRD_DMA_NONE_STEP;
+   slave_cfg->dst_step =
+   sprd_dma_get_step(slave_cfg->dst_datawidth);
+   }
+
+   slave_cfg->block_len = sg_dma_len(sg);
+   slave_cfg->transcation_len = sg_dma_len(sg);
+   }
+
+   slave_cfg->req_mode =
+   (flags >> SPRD_DMA_REQ_SHIFT) & SPRD_DMA_REQ_MODE_MASK;
+   slave_cfg->int_mode = flags & SPRD_DMA_INT_MASK;
+
+   ret = sprd_dma_config(chan, sdesc, slave_cfg);
+   if (ret) {
+   kfree(sdesc);
+   return NULL;
+   }
+
+   return vchan_tx_prep(>vc, >vd, flags);
+}
+
+static int sprd_dma_slave_config(struct dma_chan *chan,
+struct dma_slave_config *config)
+{
+   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
+   struct sprd_dma_config *slave_cfg = >slave_cfg;
+
+   if (!is_slave_direction(config->direction))
+   return -EINVAL;
+
+   memset(slave_cfg, 0, sizeof(*slave_cfg));
+
+   slave_cfg->slave_id = config->slave_id;
+   slave_cfg->src_addr = config->src_addr;
+   slave_cfg->dst_addr = config->dst_addr;
+   slave_cfg->fragment_len = config->src_maxburst;
+   slave_cfg->src_datawidth =
+   sprd_dma_get_datawidth(config->src_addr_width);
+   slave_cfg->dst_datawidth =
+   sprd_dma_get_datawidth(config->dst_addr_width);
+
+   return 0;
+}
+
 static int 

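sprd_dma_prep_slave_sg() in patch 5/5 decodes the request mode and interrupt type from the prep flags. The sketch below shows a matching encode and decode in user-space C. SPRD_DMA_REQ_MODE_MASK matches GENMASK(1, 0) from patch 4/5, but the shift and interrupt-mask values are hypothetical placeholders (the real ones live in include/linux/dma/sprd-dma.h), as are the helper names.

```c
/* GENMASK(1, 0), as in patch 4/5. */
#define SPRD_DMA_REQ_MODE_MASK	0x3UL
/* Hypothetical placeholders; the patch defines its own values. */
#define SPRD_DMA_REQ_SHIFT	16
#define SPRD_DMA_INT_MASK	0xffUL

enum sprd_dma_req_mode {
	SPRD_DMA_FRAG_REQ, SPRD_DMA_BLK_REQ,
	SPRD_DMA_TRANS_REQ, SPRD_DMA_LIST_REQ,
};

enum sprd_dma_int_type {
	SPRD_DMA_NO_INT, SPRD_DMA_FRAG_INT, SPRD_DMA_BLK_INT,
	SPRD_DMA_BLK_FRAG_INT, SPRD_DMA_TRANS_INT, SPRD_DMA_TRANS_FRAG_INT,
	SPRD_DMA_TRANS_BLK_INT, SPRD_DMA_LIST_INT, SPRD_DMA_CFGERR_INT,
};

/* Client side (hypothetical helper): pack both values into 'flags'. */
static unsigned long sprd_pack_flags(enum sprd_dma_req_mode req,
				     enum sprd_dma_int_type irq)
{
	return ((unsigned long)req << SPRD_DMA_REQ_SHIFT) |
	       (unsigned long)irq;
}

/* Driver side: the same decoding as in sprd_dma_prep_slave_sg(). */
static void sprd_unpack_flags(unsigned long flags,
			      enum sprd_dma_req_mode *req,
			      enum sprd_dma_int_type *irq)
{
	*req = (enum sprd_dma_req_mode)
		((flags >> SPRD_DMA_REQ_SHIFT) & SPRD_DMA_REQ_MODE_MASK);
	*irq = (enum sprd_dma_int_type)(flags & SPRD_DMA_INT_MASK);
}
```

Because that function also hands the same flags value to vchan_tx_prep(), the bits chosen for these fields share space with the generic dmaengine prep flags, which is worth keeping in mind when picking the shift and mask.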
[PATCH v2 1/5] dmaengine: sprd: Define the DMA transfer step type

2018-04-18 Thread Baolin Wang
From: Eric Long 

Define the DMA transfer step type to make the code more readable.

Signed-off-by: Eric Long 
Signed-off-by: Baolin Wang 
---
Changes since v1:
 - Convert the enum to macro definitions for the DMA step type.
---
 drivers/dma/sprd-dma.c |   19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index b106e8a..dcfa417 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -116,6 +116,13 @@
 #define SPRD_DMA_SRC_TRSF_STEP_OFFSET  0
 #define SPRD_DMA_TRSF_STEP_MASKGENMASK(15, 0)
 
+/* define the DMA transfer step type */
+#define SPRD_DMA_NONE_STEP 0
+#define SPRD_DMA_BYTE_STEP 1
+#define SPRD_DMA_SHORT_STEP2
+#define SPRD_DMA_WORD_STEP 4
+#define SPRD_DMA_DWORD_STEP8
+
 #define SPRD_DMA_SOFTWARE_UID  0
 
 /*
@@ -598,16 +605,16 @@ static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
 
if (IS_ALIGNED(len, 4)) {
datawidth = 2;
-   src_step = 4;
-   des_step = 4;
+   src_step = SPRD_DMA_WORD_STEP;
+   des_step = SPRD_DMA_WORD_STEP;
} else if (IS_ALIGNED(len, 2)) {
datawidth = 1;
-   src_step = 2;
-   des_step = 2;
+   src_step = SPRD_DMA_SHORT_STEP;
+   des_step = SPRD_DMA_SHORT_STEP;
} else {
datawidth = 0;
-   src_step = 1;
-   des_step = 1;
+   src_step = SPRD_DMA_BYTE_STEP;
+   des_step = SPRD_DMA_BYTE_STEP;
}
 
fragment_len = SPRD_DMA_MEMCPY_MIN_SIZE;
-- 
1.7.9.5



Re: [RESEND PATCH] x86/boot/KASLR: Extend movable_node option for KASLR

2018-04-18 Thread Dou Liyang

Hi Ingo,

Any comments about that?

Currently, when users want node hotplug to work with KASLR, they use
'mem=' to restrict the boot-up memory to the size of the first node's
memory. If we want to boot up with some hotpluggable nodes, their memory
can't be seen.

IMO, only a few machines support physical NUMA node hotplug, and we
currently can't get the memory hotplug info from the ACPI SRAT early
enough (if we could, we could even remove the 'movable_node' option).

So, IMO, we should extend movable_node to replace the misuse of the
'mem=' option.

Thoughts?

Thanks,

dou

At 04/03/2018 11:36 AM, Dou Liyang wrote:

The movable_node option is a boot-time switch to make sure the physical
NUMA nodes can be hot-added/removed when ACPI table can't be parsed to
provide the memory hotplug information.

As we all know, there is always one node, called "home node", which
can't be movabled and the kernel image resides in it. With movable_node
option, Linux allocates new early memorys near the kernel image to avoid
using the other movable node.

But, due to KASLR also can't get the the memory hotplug information, it may
randomize the kernel image into a movable node which breaks the rule of
movable_node option and makes the physical hot-add/remove operation failed.

The ideal solution is to provide the memory hotplug information to KASLR,
but that needs effort from both hardware and software engineers.

Here is an alternative method: extend the movable_node option with a
parameter that restricts the kernel to being randomized within the home
node. This parameter sets up the boundary between the home node and the
other nodes.
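The command-line handling this relies on can be sketched in userspace C. This is a simplified illustration, not the decompressor code: parse_size_kmg() and movable_node_limit() are hypothetical helpers that mimic the K/M/G subset of the kernel's memparse() suffix handling, and the sketch assumes the option only counts when it starts a command-line parameter.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: parse "nn[KMG]" with binary suffixes, like memparse() does. */
static unsigned long long parse_size_kmg(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Hypothetical: return the limit given by "movable_node=nn[KMG]", or
 * ULLONG_MAX (no limit, like the initial mem_limit) if it is absent.
 * A bare "movable_node" without '=' sets no limit.
 */
static unsigned long long movable_node_limit(const char *cmdline)
{
	const char *opt = "movable_node=";
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		/* Accept only a match at the start of a parameter. */
		if (p == cmdline || p[-1] == ' ')
			return parse_size_kmg(p + strlen(opt));
		p += strlen(opt);
	}
	return ULLONG_MAX;
}
```

With "movable_node=4G", the randomization range would be capped at 4 GiB, i.e. kept inside a home node of that size.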

Reported-by: Chao Fan 
Signed-off-by: Dou Liyang 
Reviewed-by: Kees Cook 
---
Changelog:
   -Rewrite the commit log and document.

  Documentation/admin-guide/kernel-parameters.txt | 12 ++--
  arch/x86/boot/compressed/kaslr.c| 19 ---
  2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..0cfc0b10a117 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2353,7 +2353,8 @@
mousedev.yres=  [MOUSE] Vertical screen resolution, used for devices
reporting absolute coordinates, such as tablets
  
-	movablecore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter
+   movablecore=nn[KMG]
+   [KNL,X86,IA-64,PPC] This parameter
is similar to kernelcore except it specifies the
amount of memory used for migratable allocations.
If both kernelcore and movablecore is specified,
@@ -2363,12 +2364,19 @@
that the amount of memory usable for all allocations
is not too small.
  
-	movable_node	[KNL] Boot-time switch to make hotplugable memory
+	movable_node	[KNL] Boot-time switch to make hot-pluggable memory
NUMA nodes to be movable. This means that the memory
of such nodes will be usable only for movable
allocations which rules out almost all kernel
allocations. Use with caution!
  
+	movable_node=nn[KMG]
+   [KNL] Extend movable_node to make it work well with KASLR.
+   This parameter is the boundaries between the "home node" and
+   the other nodes. The "home node" is an immovable node and is
+   defined by BIOS. Set the 'nn' to the memory size of "home
+   node", the kernel image will be extracted in immovable nodes.
+
MTD_Partition=  [MTD]
Format: ,,,
  
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8199a6187251..f906d7890e69 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -92,7 +92,10 @@ struct mem_vector {
  static bool memmap_too_large;
  
  
-/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
+/*
+ * Store memory limit specified by the following situations:
+ * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
+ */
  unsigned long long mem_limit = ULLONG_MAX;
  
  
@@ -214,7 +217,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;
  
-	if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "movable_node="))
return 0;
  
  	tmp_cmdline = malloc(len + 1);
@@ -249,7 +253,16 @@ static int handle_mem_memmap(void)
free(tmp_cmdline);
return -EINVAL;
}
-   mem_limit = mem_size;
+   

d17a1d97dc ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow"): [ 0.001000] BUG: KASAN: use-after-scope in console_unlock

2018-04-18 Thread Fengguang Wu
On Thu, Apr 19, 2018 at 10:17:57AM +0800, Fengguang Wu wrote:
>Hello,
>
>FYI this happens in mainline kernel 4.17.0-rc1.
>It dates back to at least v4.15-rc1.
>
>The regression was reported before:
>
> https://lkml.org/lkml/2017/11/30/33
>
>The last message there, from Dmitry, mentions that use-after-scope has
>known false positives with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y.
>If so, what would be the best way to work around such false positives
>in boot testing? Disable the above config?
>
>0day bisects produced divergent results: 2 of them converged on
>commit d17a1d97dc ("x86/mm/kasan: don't use vmemmap_populate() to
>initialize shadow") and 1 bisected to the earlier a4a3ede213 ("mm:
>zero reserved and unavailable struct pages"). I'll send the bisect
>reports in follow-up emails.

Here is the bisect report for

commit d17a1d97dc208d664c91cc387ffb752c7f85dc61
Author: Andrey Ryabinin 
AuthorDate: Wed Nov 15 17:36:35 2017 -0800
Commit: Linus Torvalds 
CommitDate: Wed Nov 15 18:21:05 2017 -0800

 x86/mm/kasan: don't use vmemmap_populate() to initialize shadow
 
 The kasan shadow is currently mapped using vmemmap_populate() since that
 provides a semi-convenient way to map pages into init_top_pgt.  However,
 since that no longer zeroes the mapped pages, it is not suitable for
 kasan, which requires zeroed shadow memory.
 
 Add kasan_populate_shadow() interface and use it instead of
 vmemmap_populate().  Besides, this allows us to take advantage of
 gigantic pages and use them to populate the shadow, which should save us
 some memory wasted on page tables and reduce TLB pressure.
 
 Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatas...@oracle.com
 Signed-off-by: Andrey Ryabinin 
 Signed-off-by: Pavel Tatashin 
 Cc: Steven Sistare 
 Cc: Daniel Jordan 
 Cc: Bob Picco 
 Cc: Michal Hocko 
 Cc: Alexander Potapenko 
 Cc: Ard Biesheuvel 
 Cc: Catalin Marinas 
 Cc: Christian Borntraeger 
 Cc: David S. Miller 
 Cc: Dmitry Vyukov 
 Cc: Heiko Carstens 
 Cc: "H. Peter Anvin" 
 Cc: Ingo Molnar 
 Cc: Mark Rutland 
 Cc: Matthew Wilcox 
 Cc: Mel Gorman 
 Cc: Michal Hocko 
 Cc: Sam Ravnborg 
 Cc: Thomas Gleixner 
 Cc: Will Deacon 
 Signed-off-by: Andrew Morton 
 Signed-off-by: Linus Torvalds 

a4a3ede213  mm: zero reserved and unavailable struct pages
d17a1d97dc  x86/mm/kasan: don't use vmemmap_populate() to initialize shadow
d6bbd51587  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
73005e1a35  Add linux-next specific files for 20180103
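+--------------------------------+------------+------------+------------+---------------+
|                                | a4a3ede213 | d17a1d97dc | d6bbd51587 | next-20180103 |
+--------------------------------+------------+------------+------------+---------------+
| boot_successes                 | 35         | 0          | 0          | 10            |
| boot_failures                  | 0          | 15         | 17         |               |
| BUG:KASAN:use-after-scope_in_c | 0          | 15         | 17         |               |
+--------------------------------+------------+------------+------------+---------------+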

[0.004000]  Tasks RCU enabled.
[0.004000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[0.004000] NR_IRQS: 4352, nr_irqs: 440, preallocated irqs: 16
[0.004000]  Offload RCU callbacks from CPUs: .
[0.004000] ==
[0.004000] BUG: KASAN: use-after-scope in console_unlock+0x516/0x7bf
[0.004000] Write of size 4 at addr af207aa0 by task swapper/0
[0.004000] 
[0.004000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-04319-gd17a1d9 #2
[0.004000] Call Trace:
[0.004000]  ? dump_stack+0xd1/0x178
[0.004000]  ? _atomic_dec_and_lock+0x11a/0x11a
[0.004000]  ? show_regs_print_info+0x51/0x51
[0.004000]  ? do_raw_spin_unlock+0x223/0x247
[0.004000]  ? print_address_description+0x94/0x2d9
[0.004000]  ? console_unlock+0x516/0x7bf
[0.004000]  ? kasan_report+0x21e/0x244
[0.004000]  ? console_unlock+0x516/0x7bf
[0.004000]  ? wake_up_klogd+0xe6/0xe6
[0.004000]  ? vprintk_emit+0x3ee/0x426
[0.004000]  ? 

Re: [PATCH v2] prctl: fix compat handling for prctl

2018-04-18 Thread Andy Lutomirski
> On Apr 18, 2018, at 9:06 PM, Li Bin  wrote:
>
> The auxv member of the prctl_mm_map structure, which is shared with
> userspace, is a pointer type, but the kernel's COMPAT support didn't
> handle it. This patch fixes the compat handling for the prctl syscall.

I would propose an alternative fix: change the type to u64. As far as
I know, this thing is only used by CRIU, and CRIU doesn’t work (AFAIK)
on native 32-bit anyway.   Do you know of some reason that this
wouldn't work?
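To illustrate Andy's suggestion: with a pointer member, the struct layout differs between 32-bit and 64-bit ABIs, whereas a u64 field has the same layout everywhere and userspace simply stores the pointer value in it. The structs below are simplified, hypothetical stand-ins for struct prctl_mm_map (only an auxv-related tail is kept); on the kernel side the reader would convert the value back with something like u64_to_user_ptr().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a pointer member: size depends on the ABI. */
struct mm_map_ptr {
	uint64_t start_code;
	uint64_t *auxv;		/* 4 bytes on i386, 8 bytes on x86-64 */
	uint32_t auxv_size;
	uint32_t exe_fd;
};

/* Hypothetical layout with a u64 member: identical on every ABI. */
struct mm_map_u64 {
	uint64_t start_code;
	uint64_t auxv;		/* user pointer carried as an integer */
	uint32_t auxv_size;	/* in bytes */
	uint32_t exe_fd;
};

/* Userspace side: store the pointer in the fixed-width field. */
static void set_auxv(struct mm_map_u64 *m, uint64_t *auxv, uint32_t n)
{
	m->auxv = (uint64_t)(uintptr_t)auxv;
	m->auxv_size = (uint32_t)(n * sizeof(*auxv));
}

/* Reader side: convert the integer back into a pointer. */
static uint64_t *get_auxv(const struct mm_map_u64 *m)
{
	return (uint64_t *)(uintptr_t)m->auxv;
}
```

On i386, sizeof(struct mm_map_ptr) is 20 while on x86-64 it is 24; struct mm_map_u64 is 24 bytes on both, so no compat translation is needed.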


Re: [RFC PATCH spi] spi: pxa2xx: pxa2xx_spi_transfer_one() can be static

2018-04-18 Thread Jarkko Nikula

On 04/17/18 22:53, kbuild test robot wrote:


Fixes: d5898e19c0d7 ("spi: pxa2xx: Use core message processing loop")
Signed-off-by: Fengguang Wu 
---
  spi-pxa2xx.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index c852ea5..40f1346 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -911,9 +911,9 @@ static bool pxa2xx_spi_can_dma(struct spi_controller *master,
   xfer->len >= chip->dma_burst_size;
  }
  
-int pxa2xx_spi_transfer_one(struct spi_controller *master,
-   struct spi_device *spi,
-   struct spi_transfer *transfer)
+static int pxa2xx_spi_transfer_one(struct spi_controller *master,
+  struct spi_device *spi,
+  struct spi_transfer *transfer)


Thanks Fengguang. I don't understand how I managed to drop "static" 
while doing manual s/pump_transfers/pxa2xx_spi_transfer_one/ :-)


Reviewed-by: Jarkko Nikula 


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Dave Young
Hi Rahul,
On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> On production servers running a variety of workloads over time, a kernel
> panic can happen sporadically after days or even months. It is
> important to collect as many debug logs as possible to root-cause
> and fix the problem, which may not be easy to reproduce. A snapshot of
> the underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.) at the time of the kernel panic will be very
> helpful while debugging the culprit device driver.
> 
> This series of patches adds a new generic framework that enables device
> drivers to collect a device-specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In the crash
> recovery kernel, the collected logs are added as ELF notes to
> /proc/vmcore, which is copied by user space scripts for post-analysis.
> 
> The sequence of actions done by device drivers to append their device
> specific hardware/firmware logs to /proc/vmcore is as follows:
> 
> 1. During probe (before hardware is initialized), device drivers
> register with the vmcore module (via vmcore_add_device_dump()),
> providing a callback function along with the buffer size and log name
> needed for firmware/hardware log collection.

I assumed the ELF notes info would be prepared during the kexec_[file_]load
phase. But I have not read the old comments, so I'm not sure whether this
has been discussed or not.

If this is done in the 2nd kernel, one question is that the driver can be
loaded later than vmcore init. How can we guarantee this works if vmcore
reading happens before the driver is loaded?

Also, it is possible that the kdump initramfs does not contain the driver
module.

Am I missing something?

> 
> 2. vmcore module allocates the buffer with requested size. It adds
> an elf note and invokes the device driver's registered callback
> function.
> 
> 3. Device driver collects all hardware/firmware logs into the buffer
> and returns control back to vmcore module.
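The three steps above can be modeled in userspace C to make the ownership of the buffer explicit (the core allocates, the driver only fills). All names here (struct dd_data, struct dd_note, dd_add_device_dump(), fake_collect()) are hypothetical; the real entry point is vmcore_add_device_dump() from the patch, whose structure layout is not shown in this excerpt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DD_MAX_NAME 32

/* Step 1 (hypothetical): what a driver registers with the core. */
struct dd_data {
	char name[DD_MAX_NAME];
	size_t size;					/* buffer size needed */
	int (*collect)(struct dd_data *d, void *buf);	/* driver callback */
};

/* Hypothetical stand-in for one note the core would expose. */
struct dd_note {
	char name[DD_MAX_NAME];
	size_t size;
	void *buf;
};

/*
 * Steps 2 and 3 (hypothetical): allocate a buffer of the requested size,
 * let the driver's callback fill it, and keep it as a named note.
 * Returns 0 on success, -1 on failure (error codes simplified).
 */
static int dd_add_device_dump(struct dd_data *d, struct dd_note *out)
{
	void *buf;

	if (!d || !d->collect || d->size == 0)
		return -1;
	buf = calloc(1, d->size);
	if (!buf)
		return -1;
	if (d->collect(d, buf) != 0) {
		free(buf);
		return -1;
	}
	memcpy(out->name, d->name, DD_MAX_NAME);
	out->size = d->size;
	out->buf = buf;
	return 0;
}

/* Example driver callback: pretend to snapshot two registers. */
static int fake_collect(struct dd_data *d, void *buf)
{
	unsigned int regs[2] = { 0xdeadbeef, 0x12345678 };

	if (d->size < sizeof(regs))
		return -1;
	memcpy(buf, regs, sizeof(regs));
	return 0;
}
```

In this model the callback runs at registration time, which is why the timing question about late-loaded drivers matters.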
> 
> The device specific hardware/firmware logs can be seen as elf notes:
> 
> # readelf -n /proc/vmcore
> 
> Displaying notes found at file offset 0x1000 with length 0x04003288:
>   Owner                    Data size   Description
>   VMCOREDD_cxgb4_:02:00.4  0x02000fd8  Unknown note type: (0x0700)
>   VMCOREDD_cxgb4_:04:00.4  0x02000fd8  Unknown note type: (0x0700)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   CORE                     0x0150      NT_PRSTATUS (prstatus structure)
>   VMCOREINFO               0x074f      Unknown note type: (0x)
> 
> Patch 1 adds API to vmcore module to allow drivers to register callback
> to collect the device specific hardware/firmware logs.  The logs will
> be added to /proc/vmcore as elf notes.
> 
> Patch 2 updates read and mmap logic to append device specific hardware/
> firmware logs as elf notes.
> 
> Patch 3 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs in crash recovery kernel, before hardware is
> initialized.
> 
> Thanks,
> Rahul
> 
> RFC v1: https://lkml.org/lkml/2018/3/2/542
> RFC v2: https://lkml.org/lkml/2018/3/16/326
> 
> ---
> v4:
> - Made __vmcore_add_device_dump() static.
> - Moved compile check to define vmcore_add_device_dump() to
>   crash_dump.h to fix compilation when vmcore.c is not compiled in.
> - Convert ---help--- to help in Kconfig as indicated by checkpatch.
> - Rebased to tip.
> 
> v3:
> - Dropped sysfs crashdd module.
> - Exported dumps as elf notes. Suggested by Eric Biederman
>   .  Added as patch 2 in this version.
> - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
>   dump support.
> - Moved logic related to adding dumps from crashdd to vmcore module.
> - Rename all crashdd* to vmcoredd*.
> - Updated comments.
> 
> v2:
> - Added ABI Documentation for crashdd.
> - Directly use octal permission instead of macro.
> 
> Changes since rfc v2:
> - Moved exporting crashdd from procfs to sysfs. Suggested by
>   Stephen Hemminger 
> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
> - Replaced all proc API with sysfs API and updated comments.
> - Calling driver callback before creating the binary file under
>   crashdd sysfs.
> - Changed binary dump file permission from S_IRUSR to S_IRUGO.
> - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
> 
> rfc v2:
> - Collecting logs in 2nd kernel instead of during kernel panic.
>   Suggested by Eric Biederman .
> - Added new crashdd module that exports /proc/crashdd/ containing
>   driver's registered 

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-18 Thread Yang, Shunyong
Hi, Gary and Sohil,

On Tue, 2018-04-17 at 13:38 -0400, Hook, Gary wrote:
> On 4/13/2018 8:08 PM, Mehta, Sohil wrote:
> > 
> > On Fri, 2018-04-06 at 08:17 -0500, Gary R Hook wrote:
> > > 
> > >   
> > > +
> > > +void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
> > > +{
> > > + char name[MAX_NAME_LEN + 1];
> > > + struct dentry *d_top;
> > > +
> > > + if (!debugfs_initialized())
> > Probably not needed.
> Right.

When is this check needed?
IMO, this function checks whether debugfs is ready before we use it.
I just want to understand when we should use debugfs_initialized().

Thanks.
Shunyong.

> 
> > 
> > 
> > > 
> > > + return;
> > > +
> > > + mutex_lock(&amd_iommu_debugfs_lock);
> > > + if (!amd_iommu_debugfs) {
> > > + d_top = iommu_debugfs_setup();
> > > + if (d_top)
> > > + amd_iommu_debugfs = debugfs_create_dir("amd", d_top);
> > > + }
> > > + mutex_unlock(&amd_iommu_debugfs_lock);
> > 
> > You can do the above only once if you iterate over the IOMMUs here
> >   instead of doing it in amd_iommu_init.
> I'm not sure it matters, given the finite number of IOMMUs in a
> system and the fact that this work is done exactly once. However,
> removing a lock is a fine thing, so I'll move this around.
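Sohil's suggestion (create the shared top-level directory once, then loop over all IOMMUs) could look roughly like the userspace sketch below. struct fake_iommu, fake_debugfs_setup_all() and the MAX_NAME_LEN value of 20 are hypothetical, and the sketch assumes the setup runs once, single-threaded, during init, which is what would make the mutex unnecessary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME_LEN 20		/* hypothetical value */

/* Hypothetical stand-in for struct amd_iommu. */
struct fake_iommu {
	int index;
	char dir[MAX_NAME_LEN + 1];	/* name of its per-IOMMU directory */
};

/* Counts how often the shared "amd" directory would be created. */
static int top_dir_created;

/*
 * Create the shared top-level directory once, then one "iommuNN"
 * directory per IOMMU, in a single pass over all of them.
 */
static void fake_debugfs_setup_all(struct fake_iommu *iommus, int n)
{
	top_dir_created++;	/* stands in for debugfs_create_dir("amd", ...) */
	for (int i = 0; i < n; i++)
		snprintf(iommus[i].dir, sizeof(iommus[i].dir),
			 "iommu%02d", iommus[i].index);
}
```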
> 
> > 
> > 
> > > 
> > > + if (amd_iommu_debugfs) {
> > > + snprintf(name, MAX_NAME_LEN, "iommu%02d", iommu->index);
> > > + iommu->debugfs = debugfs_create_dir(name,
> > > + amd_iommu_debugfs);
> > > + if (!iommu->debugfs) {
> > > + debugfs_remove_recursive(amd_iommu_debugfs);
> > > + amd_iommu_debugfs = NULL;
> > > + }
> > > + }
> > > +}
> > -Sohil
> > 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] locking/rwsem: Synchronize task state & waiter->task of readers

2018-04-18 Thread Benjamin Herrenschmidt
On Tue, 2018-04-10 at 13:22 -0400, Waiman Long wrote:
> It was occasionally observed on PowerPC systems that there was a reader
> who had not been woken up even though its waiter->task had been cleared.
> 
> One probable cause of this missed wakeup may be the fact that the
> waiter->task and the task state have not been properly synchronized as
> the lock release-acquire pair of different locks in the wakeup code path
> does not provide a full memory barrier guarantee. So smp_store_mb()
> is now used to set waiter->task to NULL to provide a proper memory
> barrier for synchronization.
> 
> Signed-off-by: Waiman Long 

That looks right... nothing in either lock or unlock will prevent a
store going past a load.
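The ordering problem here is the classic store-buffering (SB) shape: each side stores, then loads the other side's variable, and only a full barrier on both sides rules out both loads seeing stale values. A userspace sketch with C11 atomics, where atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb() and the two globals are hypothetical stand-ins for waiter->task and the sleeper's task state:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: 1 = waiter still queued; 1 = task SLEEPING. */
static atomic_int waiter_task;
static atomic_int task_state;
static int waker_saw_state, sleeper_saw_task;

static void *waker(void *arg)
{
	(void)arg;
	/* [S] waiter->task = NULL */
	atomic_store_explicit(&waiter_task, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* MB */
	/* [L] state, as try_to_wake_up() would read it */
	waker_saw_state = atomic_load_explicit(&task_state, memory_order_relaxed);
	return NULL;
}

static void *sleeper(void *arg)
{
	(void)arg;
	/* [S] set_current_state(SLEEPING) */
	atomic_store_explicit(&task_state, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* MB */
	/* [L] waiter->task */
	sleeper_saw_task = atomic_load_explicit(&waiter_task, memory_order_relaxed);
	return NULL;
}

/*
 * Run n trials and count "missed wakeups": the waker saw RUNNING (so it
 * would not wake the task) AND the sleeper still saw itself queued (so it
 * would go to sleep). With both fences, C11 forbids this outcome.
 */
static int count_missed_wakeups(int n)
{
	int missed = 0;

	for (int i = 0; i < n; i++) {
		pthread_t a, b;

		atomic_store(&waiter_task, 1);
		atomic_store(&task_state, 0);
		pthread_create(&a, NULL, waker, NULL);
		pthread_create(&b, NULL, sleeper, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (waker_saw_state == 0 && sleeper_saw_task == 1)
			missed++;
	}
	return missed;
}
```

Removing either fence makes the missed-wakeup outcome legal under the C11 memory model, and store-to-load reordering of this kind is observable even on x86.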

> ---
>  kernel/locking/rwsem-xadd.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index e795908..b3c588c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -209,6 +209,23 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>   smp_store_release(&waiter->task, NULL);
>   }
>  
> + /*
> +  * To avoid missed wakeup of reader, we need to make sure
> +  * that task state and waiter->task are properly synchronized.
> +  *
> +  *   wakeup                      sleep
> +  *   ------                      -----
> +  * __rwsem_mark_wake:          rwsem_down_read_failed*:
> +  *   [S] waiter->task            [S] set_current_state(state)
> +  *       MB                          MB
> +  * try_to_wake_up:
> +  *   [L] state                   [L] waiter->task
> +  *
> +  * For the wakeup path, the original lock release-acquire pair
> +  * does not provide enough guarantee of proper synchronization.
> +  */
> + smp_mb();
> +
>   adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
>   if (list_empty(&sem->wait_list)) {
>   /* hit end of list above */


RE: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for other devices too

2018-04-18 Thread Nipun Gupta


> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, April 17, 2018 10:23 PM
> To: Nipun Gupta ; robh...@kernel.org;
> frowand.l...@gmail.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; catalin.mari...@arm.com;
> h...@lst.de; gre...@linuxfoundation.org; j...@8bytes.org;
> m.szyprow...@samsung.com; shawn...@kernel.org; bhelg...@google.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linuxppc-
> d...@lists.ozlabs.org; linux-...@vger.kernel.org; Bharat Bhushan
> ; stuyo...@gmail.com; Laurentiu Tudor
> ; Leo Li 
> Subject: Re: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for
> other devices too
> 
> On 17/04/18 11:21, Nipun Gupta wrote:
> > The iommu-map property is also used by fsl-mc devices. This patch
> > moves of_pci_map_rid() to a generic location so that it can be used
> > by other buses too.
> >
> > Signed-off-by: Nipun Gupta 
> > ---
> >   drivers/iommu/of_iommu.c | 106 +--
> 
> Doesn't this break "msi-parent" parsing for !CONFIG_OF_IOMMU? I guess you
> don't want fsl-mc to have to depend on PCI, but this looks like a step in the
> wrong direction.

Thanks for pointing out.
Agree, this will break "msi-parent" parsing for !CONFIG_OF_IOMMU case.

> 
> I'm not entirely sure where of_map_rid() fits best, but from a quick look around
> the least-worst option might be drivers/of/of_address.c, unless Rob and Frank
> have a better idea of where generic DT-based ID translation routines could live?
> 
> >   drivers/of/irq.c |   6 +--
> >   drivers/pci/of.c | 101
> >   include/linux/of_iommu.h |  11 +
> >   include/linux/of_pci.h   |  10 -
> >   5 files changed, 117 insertions(+), 117 deletions(-)
> >

[...]

> >   struct of_pci_iommu_alias_info {
> > struct device *dev;
> > struct device_node *np;
> > @@ -149,9 +249,9 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
> > struct of_phandle_args iommu_spec = { .args_count = 1 };
> > int err;
> >
> > -   err = of_pci_map_rid(info->np, alias, "iommu-map",
> > -"iommu-map-mask", &iommu_spec.np,
> > -iommu_spec.args);
> > +   err = of_map_rid(info->np, alias, "iommu-map",
> > +"iommu-map-mask", &iommu_spec.np,
> > +iommu_spec.args);
> 
> Super-nit: Apparently I missed rewrapping this to 2 lines in d87beb749281, but if
> it's being touched again, that would be nice ;)

Sure.. I'll take care of this in the next version :)

Regards,
Nipun
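For context on what is being moved: the iommu-map binding describes tuples of (rid-base, iommu phandle, iommu-base, length), and iommu-map-mask is applied to the input ID first. A minimal sketch of that lookup, assuming an all-ones mask when none is given; struct id_map_entry and map_rid() are hypothetical names, and the real of_map_rid() additionally resolves the phandle to a device node and reports errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical iommu-map entry: (rid-base, phandle, iommu-base, length). */
struct id_map_entry {
	uint32_t rid_base;
	uint32_t phandle;	/* stands in for the IOMMU node reference */
	uint32_t out_base;
	uint32_t length;
};

/*
 * Mask the input ID, find the entry whose [rid_base, rid_base + length)
 * range contains it, and rebase it onto out_base. Returns 0 and fills
 * *phandle / *id_out on a match, -1 if no entry matches.
 */
static int map_rid(const struct id_map_entry *map, size_t n, uint32_t mask,
		   uint32_t rid, uint32_t *phandle, uint32_t *id_out)
{
	uint32_t masked = rid & mask;

	for (size_t i = 0; i < n; i++) {
		const struct id_map_entry *e = &map[i];

		/* Subtraction form avoids overflow in rid_base + length. */
		if (masked < e->rid_base || masked - e->rid_base >= e->length)
			continue;
		*phandle = e->phandle;
		*id_out = masked - e->rid_base + e->out_base;
		return 0;
	}
	return -1;
}
```

Nothing in this logic is PCI-specific, which is the argument for giving it a bus-neutral home.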


[PATCH 1/2] tracing: fix bad use of igrab in trace_uprobe.c

2018-04-18 Thread Song Liu
As Miklos reported and suggested:

  This pattern repeats two times in trace_uprobe.c and in
  kernel/events/core.c as well:

  ret = kern_path(filename, LOOKUP_FOLLOW, &path);
  if (ret)
  goto fail_address_parse;

  inode = igrab(d_inode(path.dentry));
  path_put(&path);

  And it's wrong.  You can only hold a reference to the inode if you
  have an active ref to the superblock as well (which is normally
  through path.mnt) or holding s_umount.

  This way unmounting the containing filesystem while the tracepoint is
  active will give you the "VFS: Busy inodes after unmount..." message
  and a crash when the inode is finally put.

  Solution: store path instead of inode.

This patch fixes two instances in trace_uprobe.c.
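Miklos's point can be illustrated with a deliberately simplified refcount model: pinning only the inode (the igrab() pattern) does not stop the filesystem from being unmounted, while holding the whole path pins the mount, so a plain (non-lazy) umount fails with EBUSY. The toy_* types and functions are hypothetical and are not the VFS data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT the real struct vfsmount / inode / path. */
struct toy_mount { int refs; bool mounted; };
struct toy_inode { int refs; };
struct toy_path  { struct toy_mount *mnt; struct toy_inode *inode; };

/* Taking a path reference pins both the mount and the inode. */
static void toy_path_get(struct toy_path *p)
{
	p->mnt->refs++;
	p->inode->refs++;
}

static void toy_path_put(struct toy_path *p)
{
	p->inode->refs--;
	p->mnt->refs--;
}

/* igrab()-style: pins only the inode. */
static void toy_igrab(struct toy_inode *i)
{
	i->refs++;
}

enum umount_result { UMOUNT_EBUSY, UMOUNT_CLEAN, UMOUNT_BUSY_INODES };

/*
 * The namespace itself holds one mount reference. Any extra reference
 * makes a plain umount fail; otherwise the filesystem is torn down, and
 * a leftover inode reference is the "Busy inodes after unmount" case.
 */
static enum umount_result toy_umount(struct toy_mount *m,
				     const struct toy_inode *i)
{
	if (m->refs > 1)
		return UMOUNT_EBUSY;
	m->mounted = false;
	return i->refs > 0 ? UMOUNT_BUSY_INODES : UMOUNT_CLEAN;
}
```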

Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes")
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Howard McLauchlan 
Cc: Josef Bacik 
Cc: Srikar Dronamraju 
Reported-by: Miklos Szeredi 
Signed-off-by: Song Liu 
---
 kernel/trace/trace_uprobe.c | 42 ++
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 0d450b4..80dfcdf 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -55,7 +55,7 @@ struct trace_uprobe {
struct list_headlist;
struct trace_uprobe_filter  filter;
struct uprobe_consumer  consumer;
-   struct inode*inode;
+   struct path path;
char*filename;
unsigned long   offset;
unsigned long   nhit;
@@ -289,7 +289,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
for (i = 0; i < tu->tp.nr_args; i++)
traceprobe_free_probe_arg(&tu->tp.args[i]);
 
-   iput(tu->inode);
+   path_put(&tu->path);
kfree(tu->tp.call.class->system);
kfree(tu->tp.call.name);
kfree(tu->filename);
@@ -363,7 +363,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
struct trace_uprobe *tu;
-   struct inode *inode;
char *arg, *event, *group, *filename;
char buf[MAX_EVENT_NAME_LEN];
struct path path;
@@ -371,7 +370,6 @@ static int create_trace_uprobe(int argc, char **argv)
bool is_delete, is_return;
int i, ret;
 
-   inode = NULL;
ret = 0;
is_delete = false;
is_return = false;
@@ -448,14 +446,6 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
-   path_put(&path);
-
-   if (!inode || !S_ISREG(inode->i_mode)) {
-   ret = -EINVAL;
-   goto fail_address_parse;
-   }
-
ret = kstrtoul(arg, 0, &offset);
if (ret)
goto fail_address_parse;
@@ -490,7 +480,8 @@ static int create_trace_uprobe(int argc, char **argv)
goto fail_address_parse;
}
tu->offset = offset;
-   tu->inode = inode;
+   tu->path.mnt = path.mnt;
+   tu->path.dentry = path.dentry;
tu->filename = kstrdup(filename, GFP_KERNEL);
 
if (!tu->filename) {
@@ -558,7 +549,7 @@ static int create_trace_uprobe(int argc, char **argv)
return ret;
 
 fail_address_parse:
-   iput(inode);
+   path_put(&path);
 
pr_info("Failed to parse address or file.\n");
 
@@ -937,7 +928,8 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
goto err_flags;
 
tu->consumer.filter = filter;
-   ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+   ret = uprobe_register(d_inode(tu->path.dentry), tu->offset,
+ &tu->consumer);
if (ret)
goto err_buffer;
 
@@ -981,7 +973,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
 
WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
-   uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
+   uprobe_unregister(d_inode(tu->path.dentry), tu->offset, &tu->consumer);
tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
uprobe_buffer_disable();
@@ -1056,7 +1048,8 @@ static int uprobe_perf_close(struct trace_uprobe *tu, struct perf_event *event)
write_unlock(&tu->filter.rwlock);
 
if (!done)
-   return uprobe_apply(tu->inode, tu->offset, &tu->consumer, false);
+   return uprobe_apply(d_inode(tu->path.dentry), tu->offset,
+   &tu->consumer, false);
 
return 0;
 }
@@ -1088,7 +1081,8 @@ static int uprobe_perf_open(struct trace_uprobe *tu, struct perf_event *event)
 
err = 0;
if (!done) {
-   err = uprobe_apply(tu->inode, tu->offset, 

[PATCH 2/2] perf/core: fix bad use of igrab in kernel/event/core.c

2018-04-18 Thread Song Liu
As Miklos reported and suggested:

  This pattern repeats two times in trace_uprobe.c and in
  kernel/events/core.c as well:

  ret = kern_path(filename, LOOKUP_FOLLOW, &path);
  if (ret)
  goto fail_address_parse;

  inode = igrab(d_inode(path.dentry));
  path_put(&path);

  And it's wrong.  You can only hold a reference to the inode if you
  have an active ref to the superblock as well (which is normally
  through path.mnt) or holding s_umount.

  This way unmounting the containing filesystem while the tracepoint is
  active will give you the "VFS: Busy inodes after unmount..." message
  and a crash when the inode is finally put.

  Solution: store path instead of inode.

This patch fixes the issue in kernel/events/core.c.

NOTE: Based on my understanding, perf_addr_filter only supports intel_pt.
However, my test system doesn't support address filtering (or I made a
mistake?). Therefore, I have NOT tested this patch.

Could someone please help test it?

Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Cc: Alexander Shishkin 
Cc: Ingo Molnar 
Cc: Peter Zijlstra (Intel) 
Reported-by: Miklos Szeredi 
Signed-off-by: Song Liu 
---
 arch/x86/events/intel/pt.c |  4 ++--
 include/linux/perf_event.h |  2 +-
 kernel/events/core.c   | 21 +
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 3b99394..8d016ce 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1194,7 +1194,7 @@ static int pt_event_addr_filters_validate(struct list_head *filters)
filter->action == PERF_ADDR_FILTER_ACTION_START)
return -EOPNOTSUPP;
 
-   if (!filter->inode) {
+   if (!filter->path.dentry) {
if (!valid_kernel_ip(filter->offset))
return -EINVAL;
 
@@ -1221,7 +1221,7 @@ static void pt_event_addr_filters_sync(struct perf_event *event)
return;
 
list_for_each_entry(filter, >list, entry) {
-   if (filter->inode && !offs[range]) {
+   if (filter->path.dentry && !offs[range]) {
msr_a = msr_b = 0;
} else {
/* apply the offset */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e71e99e..88922d8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -467,7 +467,7 @@ enum perf_addr_filter_action_t {
  */
 struct perf_addr_filter {
struct list_headentry;
-   struct inode*inode;
+   struct path path;
unsigned long   offset;
unsigned long   size;
enum perf_addr_filter_action_t  action;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d7af828..7d711ed 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6668,7 +6668,7 @@ static void perf_event_addr_filters_exec(struct perf_event *event, void *data)
 
raw_spin_lock_irqsave(>lock, flags);
list_for_each_entry(filter, >list, entry) {
-   if (filter->inode) {
+   if (filter->path.dentry) {
event->addr_filters_offs[count] = 0;
restart++;
}
@@ -7333,7 +7333,7 @@ static bool perf_addr_filter_match(struct perf_addr_filter *filter,
 struct file *file, unsigned long offset,
 unsigned long size)
 {
-   if (filter->inode != file_inode(file))
+   if (d_inode(filter->path.dentry) != file_inode(file))
return false;
 
if (filter->offset > offset + size)
@@ -8674,8 +8674,7 @@ static void free_filters_list(struct list_head *filters)
struct perf_addr_filter *filter, *iter;
 
list_for_each_entry_safe(filter, iter, filters, entry) {
-   if (filter->inode)
-   iput(filter->inode);
+   path_put(&filter->path);
list_del(&filter->entry);
kfree(filter);
}
@@ -8772,7 +8771,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 * Adjust base offset if the filter is associated to a binary
 * that needs to be mapped:
 */
-   if (filter->inode)
+   if (filter->path.dentry)
event->addr_filters_offs[count] =
perf_addr_filter_apply(filter, mm);
 
@@ -8846,7 +8845,6 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr,
 {
struct perf_addr_filter *filter = NULL;
char *start, *orig, *filename = NULL;
-   struct path path;
substring_t args[MAX_OPT_ARGS];
int state = IF_STATE_ACTION, token;
unsigned int kernel = 0;
@@ -8959,19 +8957,18 @@ perf_event_parse_addr_filter(struct perf_event *event, 
char 
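After this patch, perf_addr_filter_match() compares d_inode(filter->path.dentry) with file_inode(file) before checking offsets. A simplified sketch of that match, with a hypothetical struct toy_filter standing in for struct perf_addr_filter; the second range check (filter ends before the mapping starts) is an assumption, since the hunk above is cut off after the first one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified filter: which file, and which byte range. */
struct toy_filter {
	const void *inode;	/* identity of the filtered binary */
	unsigned long offset;
	unsigned long size;
};

/*
 * Match a mapping of file_inode covering [offset, offset + size] against
 * the filter: same file, and the two ranges overlap (inclusive ends).
 */
static bool toy_filter_match(const struct toy_filter *f, const void *file_inode,
			     unsigned long offset, unsigned long size)
{
	if (f->inode != file_inode)
		return false;
	if (f->offset > offset + size)
		return false;	/* filter starts after the mapping ends */
	if (f->offset + f->size < offset)
		return false;	/* filter ends before the mapping starts */
	return true;
}
```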

Re: [PATCH] usb: always build usb/common/ targets; fixes extcon-axp288 build error

2018-04-18 Thread Randy Dunlap
On 04/17/18 02:01, Hans de Goede wrote:
> Hi,
> 
> On 17-04-18 07:14, Randy Dunlap wrote:
>> From: Randy Dunlap 
>>
>> The extcon-axp288 driver selects USB_ROLE_SWITCH, but the USB
>> Makefile does not currently build drivers/usb/common/ (where
>> USB_ROLE_SWITCH code is) unless USB_COMMON is set, so modify
>> the USB Makefile to always descend into drivers/usb/common/
>> to build its configured targets.
>>
>> Fixes these build errors:
>>
>> ERROR: "usb_role_switch_get" [drivers/extcon/extcon-axp288.ko] undefined!
>> ERROR: "usb_role_switch_set_role" [drivers/extcon/extcon-axp288.ko] undefined!
>> ERROR: "usb_role_switch_get_role" [drivers/extcon/extcon-axp288.ko] undefined!
>> ERROR: "usb_role_switch_put" [drivers/extcon/extcon-axp288.ko] undefined!
>>
>> An alternative patch would be to select USB_COMMON in the EXTCON_AXP288
>> driver Kconfig entry, but this would build more code in
>> drivers/usb/common/ than is necessary.
> 
> Ah, that variant of fixing this got posted yesterday and I acked that,
> but I agree that this version is better.

That was my first patch version, but I didn't like it.

However, I missed that patch. If I had seen it, I wouldn't have posted
this patch.


> Greg, what is your take on this fix?
> 
> Chanwoo Choi, please hold off on merging the fix from yesterday until
> we've decided which fix to use.
> 
> Regards,
> 
> Hans
> 
> 
> 
>>
>> Reported-by: Fengguang Wu 
>> Signed-off-by: Randy Dunlap 
>> Cc: MyungJoo Ham 
>> Cc: Chanwoo Choi 
>> Cc: Hans de Goede 
>> Cc: Greg Kroah-Hartman 
>> Cc: Andy Shevchenko 
>> Cc: Heikki Krogerus 
>> Cc: linux-...@vger.kernel.org
>> ---
>>   drivers/usb/Makefile |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- lnx-417-rc1.orig/drivers/usb/Makefile
>> +++ lnx-417-rc1/drivers/usb/Makefile
>> @@ -60,7 +60,7 @@ obj-$(CONFIG_USB_CHIPIDEA)    += chipidea/
>>   obj-$(CONFIG_USB_RENESAS_USBHS)    += renesas_usbhs/
>>   obj-$(CONFIG_USB_GADGET)    += gadget/
>>   -obj-$(CONFIG_USB_COMMON)    += common/
>> +obj-y    += common/
>>     obj-$(CONFIG_USBIP_CORE)    += usbip/
>>  
>>
> 


-- 
~Randy


Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-18 Thread Peter Ujfalusi

On 2018-04-17 18:54, Lars-Peter Clausen wrote:
> On 04/17/2018 04:53 PM, Peter Ujfalusi wrote:
>> On 2018-04-17 16:58, Lars-Peter Clausen wrote:
> There are two options.
>
> Either you extend the generic interfaces so it can cover your usecase in a
> generic way. E.g. the ability to attach meta data to transfer.

 Fwiw I have this patch as part of a bigger work to achieve similar results:
>>>
>>> That's good stuff. Is this in a public tree somewhere?
>>
>> Not atm. I cannot send the user of the new API, and I did not want to
>> send something like this out of the blue without context.
>>
>> But as it is a generic patch, I can send it as well. The only concern is
>> the need for the memcpy, so I might end up with
>> ptr = get_metadata_ptr(desc, ); /* size: in RX the valid size */
>>
>> and set_metadata_size(); /* in TX to tell how the client placed */
>>
>> Or something like that; the attach_metadata() as it is works just fine,
>> but high-throughput users might not like the memcpy.
>>
> 
> In the most abstracted way I'd say metadata and data are two different data
> streams that are correlated and sent/received at the same time.

In my case the metadata is sideband information or parameters for/from
the remote end, like timestamps, algorithm parameters, keys, etc.

It is tied to the data payload, but it is not part of it.

But the API should be generic enough to cover other use cases where
clients need to provide additional information.
For me, the metadata is part of the descriptor we give to and receive back
from the DMA; others might have a sideband channel to send it.

For metadata handling we could have:

struct dma_desc_metadata_ops {
	/* To give a buffer for the DMA with the metadata, as it was in my
	 * original patch
	 */
	int (*desc_attach_metadata)(struct dma_async_tx_descriptor *desc,
				    void *data, size_t len);

	void *(*desc_get_metadata_ptr)(struct dma_async_tx_descriptor *desc,
				       size_t *payload_len, size_t *max_len);
	int (*desc_set_payload_len)(struct dma_async_tx_descriptor *desc,
				    size_t payload_len);
};

Probably a simple flag variable to indicate which of the two modes are
supported:
1. Client-provided metadata buffer handling
Clients provide the buffer via desc_attach_metadata(), and the DMA driver
does whatever it needs to do with it: copy it in place, send it
differently, use the parameters.
In RX the received metadata is placed into the provided buffer.
2. Giving the client a pointer to the metadata so it can work on it
In TX, clients can use desc_get_metadata_ptr() to get the pointer, the
current payload size and the maximum size of the metadata, and can work
directly on the buffer to place the data. They then call
desc_set_payload_len() to let the DMA know how much data was actually
placed there.
In RX, desc_get_metadata_ptr() gives the client the pointer and the
payload size so it can process that information correctly.

A DMA driver can implement either mode or both, but a client must use
only one of them (1 or 2) to work with the metadata.
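A minimal userspace sketch of the two modes, assuming a descriptor that carries a fixed-size metadata area owned by the DMA driver. The names demo_desc, demo_attach_metadata(), demo_get_metadata_ptr() and demo_set_payload_len() are hypothetical, modeled loosely on the ops proposed above; they are not an existing dmaengine API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MD_MAX 32	/* hypothetical per-descriptor metadata capacity */

/* Hypothetical stand-in for a DMA descriptor with driver-owned metadata. */
struct demo_desc {
	unsigned char md[DEMO_MD_MAX];	/* metadata area owned by the driver */
	size_t md_len;			/* number of valid metadata bytes */
};

/* Mode 1: the client hands over its buffer and the driver copies it in. */
static int demo_attach_metadata(struct demo_desc *desc, const void *data,
				size_t len)
{
	assert(desc != NULL);
	if (len > DEMO_MD_MAX)
		return -EINVAL;
	memcpy(desc->md, data, len);
	desc->md_len = len;
	return 0;
}

/* Mode 2: the client gets a pointer into the driver's area (no memcpy). */
static void *demo_get_metadata_ptr(struct demo_desc *desc,
				   size_t *payload_len, size_t *max_len)
{
	assert(desc != NULL);
	*payload_len = desc->md_len;
	*max_len = DEMO_MD_MAX;
	return desc->md;
}

/* Mode 2, TX side: the client reports how many bytes it wrote in place. */
static int demo_set_payload_len(struct demo_desc *desc, size_t payload_len)
{
	assert(desc != NULL);
	if (payload_len > DEMO_MD_MAX)
		return -EINVAL;
	desc->md_len = payload_len;
	return 0;
}
```

Mode 2 avoids the extra copy that the high-throughput case was worried about, at the cost of the client writing directly into driver-owned memory.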


> Think multi-planar transfer, like for audio when the right and left channel
> are in separate buffers and not interleaved. Or video with different
> color/luminance components in separate buffers. This is something that is at
> the moment not covered by the dmaengine API either.

Hrm, true, but it is hardly the metadata use case. It is more like a
different DMA transfer type.

> Or you can implement an interface that is specific to your DMA controller and
> any client using this interface knows it is talking to your DMA controller.

 Hrm, so we can have DMA driver specific calls? The reason TI's keystone 2
 navigator DMA support was rejected was that it introduced NAV specific calls
 for clients to configure features not yet supported by the framework.
>>>
>>> In my opinion it is OK, somebody else might have different ideas. I mean it
>>> is not nice, but it is better than the alternative of overloading the
>>> generic API with driver specific semantics or introducing some kind of IOCTL
>>> catch all callback.
>>
>> True, but the generic API can be extended as well to cover new grounds,
>> features. Like this metadata thing.
>>
>>> If there is tight coupling between the DMA core and client and there is no
>>> intention of using a generic client, the best solution might even be to not
>>> use DMAengine at all.
>>
>> This is how the knav stuff ended up. Well it is only used by networking
>> atm, so it is 'fine' to have custom API, but it is not portable.
> 
> I totally agree generic APIs are better, but not everybody has the resources
> to rewrite the whole framework just because they want to do this tiny thing
> that isn't covered by the framework yet. In that case it is better to go
> with a custom API (that might evolve into a generic API), rather than
> overloading the generic API and putting a strain on 

Re: [PATCH] powerpc: Allow selection of CONFIG_LD_DEAD_CODE_DATA_ELIMINATION

2018-04-18 Thread Christophe LEROY



On 17/04/2018 at 19:10, Mathieu Malaterre wrote:

On Tue, Apr 17, 2018 at 6:49 PM, Christophe LEROY
 wrote:



On 17/04/2018 at 18:45, Mathieu Malaterre wrote:


On Tue, Apr 17, 2018 at 12:49 PM, Christophe Leroy
 wrote:


This option does dead code and data elimination with the linker by
compiling with -ffunction-sections -fdata-sections and linking with
--gc-sections.

By selecting this option on mpc885_ads_defconfig,
vmlinux LOAD segment size gets reduced by 10%

Program Header before the patch:
  LOAD off0x0001 vaddr 0xc000 paddr 0x align 2**16
   filesz 0x0036eda4 memsz 0x0038de04 flags rwx

Program Header after the patch:
  LOAD off0x0001 vaddr 0xc000 paddr 0x align 2**16
   filesz 0x00316da4 memsz 0x00334268 flags rwx

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/Kconfig | 8 
   1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8fe4353be5e3..e1fac49cf465 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -888,6 +888,14 @@ config PPC_MEM_KEYS

If unsure, say y.

+config PPC_UNUSED_ELIMINATION
+   bool "Eliminate unused functions and data from vmlinux"
+   default n
+   select LD_DEAD_CODE_DATA_ELIMINATION
+   help
+ Select this to do dead code and data elimination with the linker
+ by compiling with -ffunction-sections -fdata-sections and linking
+ with --gc-sections.
   endmenu



Just for reference, I cannot boot my Mac Mini G4 anymore (yaboot). The
messages I can see (prom_init) are:



Which version of GCC do you use ?


$ powerpc-linux-gnu-gcc --version
powerpc-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

this is simply coming from:

$ apt-cache policy crossbuild-essential-powerpc
crossbuild-essential-powerpc:
   Installed: 12.3
   Candidate: 12.3
   Version table:
  *** 12.3 500
 500 http://ftp.fr.debian.org/debian stretch/main amd64 Packages
 500 http://ftp.fr.debian.org/debian stretch/main i386 Packages
 100 /var/lib/dpkg/status



Can you provide the generated System.map with and without that option active?


$ du -sh g4/System.map.*
1.7M g4/System.map.with
1.8M g4/System.map.without


Below is the list of objects removed when the option is selected. Nothing
looks suspicious at first glance.
Do you use one of the defconfigs of the kernel ? Otherwise, can you 
provide your .config ?
Can you also provide a copy of the messages you can see (prom_init ...) 
when boot is ok ?

Maybe you can also send me the two vmlinux objects.

Thanks
Christophe

account_steal_time
adbhid_exit
adb_reset_bus
add_range
add_range_with_merge
aes_fini
af_unix_exit
agp_exit
agp_find_client_by_pid
agp_find_mem_by_key
agp_find_private
agp_free_memory_wrap
agpioc_protect_wrap
agpioc_release_wrap
agp_uninorth_cleanup
__alloc_reserved_percpu
all_stat_sessions
all_stat_sessions_mutex
apple_driver_exit
arch_cpu_idle_dead
arch_setup_msi_irq
arch_teardown_msi_irq
arch_tlb_gather_mmu
asymmetric_key_cleanup
asymmetric_key_hex_to_key_id
ata_exit
ata_tf_to_lba
ata_tf_to_lba48
attribute_container_add_class_device_adapter
attribute_container_trigger
backlight_class_exit
bdi_lock
bhrb_table
biovec_create_pool
blk_stat_enable_accounting
boot_mapsize
bpf_map_meta_equal
bvec_free
bvec_nr_vecs
calc_load_fold_active
can_request_irq
capacity_margin
cap_inode_getsecurity
cap_mmap_file
cfq_exit
cgroup_is_threaded
cgroup_is_thread_root
cgroup_migrate_add_src
cgroup_migrate_vet_dst
cgroup_on_dfl
cgroup_sk_update_lock
cgroupstats_build
cgroup_task_count
cgroup_transfer_tasks
change_protection
clean_sort_range
clear_ftrace_function
clear_zone_contiguous
__clockevents_update_freq
clockevents_update_freq
clocksource_mark_unstable
clocksource_touch_watchdog
clone_property.isra.2
cmp_range
cn_fini
cn_queue_free_dev
collect_mounts
compaction_restarting
copy_fpr_from_user
copy_fpr_to_user
copy_mount_string
copy_msg
cpu_check_up_prepare
cpufreq_boost_trigger_state
cpufreq_gov_performance_exit
cpu_hotplug_state
cpu_in_idle
cpu_report_state
cpu_set_state_online
cpu_temp
crashk_low_res
crash_wake_offline
create_prof_cpu_mask
crypto_algapi_exit
crypto_exit_proc
crypto_null_mod_fini
crypto_wq_exit
css_rightmost_descendant
css_set_lock
cubictcp_unregister
__current_kernel_time
d_absolute_path
dbg_release_bp_slot
dbg_reserve_bp_slot
deadline_exit
deadline_exit
debug_guardpage_ops
default_restore_msi_irqs
default_teardown_msi_irqs
del_named_trigger
dereference_module_function_descriptor
__dev_pm_qos_flags
dev_pm_qos_read_value
devtree_lock
die_will_crash
disable_cpufreq
dma_buf_deinit
dma_common_contiguous_remap
dma_common_pages_remap
__dma_get_required_mask
dma_pfn_limit_to_zone
do_execveat
do_fork
__domain_nr
do_msg_redirect_map

4.14.34: kernel stack regs has bad 'bp' value

2018-04-18 Thread Daniel J Blueman
When running stress-ng on 4.14.34 mainline on x86, I ran into a
"kernel stack regs has bad 'bp' value" warning [1].

Let me know if any more information/debug is useful.

Thanks,
  Daniel

-- [1]

WARNING: kernel stack regs at 880638ad76b8 in
stress-ng-af-al:32670 has bad 'bp' value 737a756eb765e87a
unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0
88105ef47af8: 88105ef47b88 (0x88105ef47b88)
88105ef47b00: 810a5a22 (__save_stack_trace+0x82/0x100)
88105ef47b08:  ...
88105ef47b10: 880638ad (0x880638ad)
88105ef47b18: 880638ad8000 (0x880638ad8000)
88105ef47b20:  ...
88105ef47b28: 0006 (0x6)
88105ef47b30: 881004cd9f40 (0x881004cd9f40)
88105ef47b38: 0101 (0x101)
88105ef47b40:  ...
88105ef47b48: 88105ef47af8 (0x88105ef47af8)
88105ef47b50: a0e201e3 (._mainloop+0x8c/0x4ca [salsa20_x86_64])
88105ef47b58: 880638ad76b8 (0x880638ad76b8)
88105ef47b60: b5d8152ac2b05d00 (0xb5d8152ac2b05d00)
88105ef47b68: 0100 (0x100)
88105ef47b70: 8810513d4a00 (0x8810513d4a00)
88105ef47b78: 8810513d4b00 (0x8810513d4b00)
88105ef47b80: 816b2fb3 (file_free_rcu+0x53/0x70)
88105ef47b88: 88105ef47b98 (0x88105ef47b98)
88105ef47b90: 810a5abb (save_stack_trace+0x1b/0x20)
88105ef47b98: 88105ef47dc8 (0x88105ef47dc8)
88105ef47ba0: 81648e53 (save_stack+0x43/0xd0)
88105ef47ba8: 004b (0x4b)
88105ef47bb0: 88105ef47bc0 (0x88105ef47bc0)
88105ef47bb8:  (0x)
88105ef47bc0: 810a5abb (save_stack_trace+0x1b/0x20)
88105ef47bc8: 81648e53 (save_stack+0x43/0xd0)
88105ef47bd0: 81649762 (kasan_slab_free+0x72/0xc0)
88105ef47bd8: 8164473c (kmem_cache_free+0x7c/0x1f0)
88105ef47be0: 816b2fb3 (file_free_rcu+0x53/0x70)
88105ef47be8: 812b8cbd (rcu_process_callbacks+0x39d/0xde0)
88105ef47bf0: 82c00184 (__do_softirq+0x184/0x5b7)
88105ef47bf8: 811721e8 (irq_exit+0x1e8/0x220)
88105ef47c00: 82a03ca8 (smp_apic_timer_interrupt+0xd8/0x2f0)
88105ef47c08: 82a0213e (apic_timer_interrupt+0x8e/0xa0)
88105ef47c10: a0e201e3 (._mainloop+0x8c/0x4ca [salsa20_x86_64])
88105ef47c18: 8332be80 (inat_primary_table+0x17efc0/0x1d0d97)
88105ef47c20: 812ef0a0 (posix_cpu_timers_exit_group+0x50/0x50)
88105ef47c28: 0008 (0x8)
88105ef47c30: 88105ef47c88 (0x88105ef47c88)
88105ef47c38: 811f9feb (__update_load_avg_se.isra.30+0x3cb/0x550)
88105ef47c40: 811f9feb (__update_load_avg_se.isra.30+0x3cb/0x550)
88105ef47c48: 88100a5b9f90 (0x88100a5b9f90)
88105ef47c50: 03b5 (0x03b5)
88105ef47c58: 88100a5b9f01 (0x88100a5b9f01)
88105ef47c60: 0007 (0x7)
88105ef47c68: 0005 (0x5)
88105ef47c70: 88105ef6bbf8 (0x88105ef6bbf8)
88105ef47c78:  ...
88105ef47c80: 0007 (0x7)
88105ef47c88: 88105ef47d28 (0x88105ef47d28)
88105ef47c90: 811fd2f6 (cpu_load_update+0x1b6/0x3a0)
88105ef47c98: 00015ef6bc30 (0x15ef6bc30)
88105ef47ca0: 88100a5b9e00 (0x88100a5b9e00)
88105ef47ca8: 88105ef6bc30 (0x88105ef6bc30)
88105ef47cb0: 2432 (0x2432)
88105ef47cb8: 0001 (0x1)
88105ef47cc0: dc00 (0xdc00)
88105ef47cc8: 88105ef47d00 (0x88105ef47d00)
88105ef47cd0: 8120eaef (0x8120eaef)
88105ef47cd8: 0007 (0x7)
88105ef47ce0: 88105ef6bc30 (0x88105ef6bc30)
88105ef47ce8: 88105ef6bbc0 (0x88105ef6bbc0)
88105ef47cf0: 88105ef47cf0 (0x88105ef47cf0)
88105ef47cf8: 88105ef47cf0 (0x88105ef47cf0)
88105ef47d00: 88105ef6bbc0 (0x88105ef6bbc0)
88105ef47d08: 0001002fda17 (0x1002fda17)
88105ef47d10: 0007 (0x7)
88105ef47d18: 83382628 (__per_cpu_offset+0x268/0x1)
88105ef47d20: 04cd9f40 (0x4cd9f40)
88105ef47d28: 88105ef67b80 (0x88105ef67b80)
88105ef47d30: 8354b600 (rcu_sched_state+0xa00/0x413e0)
88105ef47d38: 88105ef6ca78 (0x88105ef6ca78)
88105ef47d40: 88105ef6ca40 (0x88105ef6ca40)
88105ef47d48: 8354ac00 (rcu_bh_varname+0x60/0x60)
88105ef47d50: 8354ac00 (rcu_bh_varname+0x60/0x60)
88105ef47d58: 88105ef47d90 (0x88105ef47d90)
88105ef47d60: 812b313d (rcu_accelerate_cbs+0x7d/0xd0)
88105ef47d68: 88105ef663e0 (0x88105ef663e0)
88105ef47d70: 88105ef6ca40 (0x88105ef6ca40)
88105ef47d78: 88105ef6ca78 (0x88105ef6ca78)
88105ef47d80: 8354b600 

Re: [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

Yeah, I definitely agree on the idea of expanding the use case to the
general domain where dmabuf sharing is used. However, what you are
targeting with the proposed changes is identical to the core design of
hyper_dmabuf.

On top of these basic functionalities, hyper_dmabuf has driver level
inter-domain communication, that is needed for dma-buf remote tracking
(no fence forwarding though), event triggering and event handling, extra
meta data exchange and hyper_dmabuf_id that represents grefs
(grefs are shared implicitly on driver level)

This really isn't a positive design aspect of hyperdmabuf imo. The core
code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is
very simple & clean.

If there's a clear need later on we can extend that. But for now xen-zcopy
seems to cover the basic use-case needs, so gets the job done.


Also it is designed with a frontend (common core framework) + backend
(hypervisor-specific communication and memory sharing) structure for portability.
We just can't limit this feature to Xen because we want to use the same
uapis not only for Xen but also for other applicable hypervisors, like ACRN.

See the discussion around udmabuf and the needs for kvm. I think trying to
make an ioctl/uapi that works for multiple hypervisors is misguided - it
likely won't work.

On top of that the 2nd hypervisor you're aiming to support is ACRN. That's
not even upstream yet, nor have I seen any patches proposing to land linux
support for ACRN. Since it's not upstream, it doesn't really matter for
upstream consideration. I'm doubting that ACRN will use the same grant
references as xen, so the same uapi won't work on ACRN as on Xen anyway.

Yeah, ACRN doesn't have grant tables. Only Xen supports them. But that is why
hyper_dmabuf has been architected around the concept of a backend.
If you look at the structure of the backend, you will find that the
backend is just a set of standard function calls, as shown here:

struct hyper_dmabuf_bknd_ops {
 /* backend initialization routine (optional) */
 int (*init)(void);

 /* backend cleanup routine (optional) */
 int (*cleanup)(void);

 /* retrieving id of current virtual machine */
 int (*get_vm_id)(void);

 /* get pages shared via hypervisor-specific method */
 int (*share_pages)(struct page **pages, int vm_id,
int nents, void **refs_info);

 /* make shared pages unshared via hypervisor specific method */
 int (*unshare_pages)(void **refs_info, int nents);

 /* map remotely shared pages on importer's side via
  * hypervisor-specific method
  */
 struct page ** (*map_shared_pages)(unsigned long ref, int vm_id,
int nents, void **refs_info);

 /* unmap and free shared pages on importer's side via
  * hypervisor-specific method
  */
 int (*unmap_shared_pages)(void **refs_info, int nents);

 /* initialize communication environment */
 int (*init_comm_env)(void);

 void (*destroy_comm)(void);

 /* upstream ch setup (receiving and responding) */
 int (*init_rx_ch)(int vm_id);

 /* downstream ch setup (transmitting and parsing responses) */
 int (*init_tx_ch)(int vm_id);

 int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait);
};

All of these can be mapped to any hypervisor-specific implementation.
We designed backend implementation for Xen using grant-table, Xen event
and ring buffer communication. For ACRN, we have another backend using Virt-IO
for both memory sharing and communication.

We tried to define this backend structure to be general enough (and it
can even be modified or extended to support more cases) so that it can
fit other hypervisors. The only requirements on the hypervisor are
page-level memory sharing and inter-domain communication, which I think
are standard features of modern hypervisors.
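A minimal userspace sketch of the frontend/backend split described above, assuming a heavily trimmed ops table. The names demo_bknd_ops, demo_frontend_init(), demo_frontend_export() and the "loopback" backend are hypothetical and only illustrate the dispatch pattern; this is not the actual hyper_dmabuf code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down backend ops table in the spirit of
 * struct hyper_dmabuf_bknd_ops. */
struct demo_bknd_ops {
	int (*init)(void);				/* optional */
	int (*get_vm_id)(void);
	int (*share_pages)(int vm_id, int nents);	/* ref id, or < 0 */
};

/* Frontend: hypervisor-agnostic logic that only talks to the ops table. */
static const struct demo_bknd_ops *demo_bknd;

static int demo_frontend_init(const struct demo_bknd_ops *ops)
{
	assert(ops != NULL && ops->get_vm_id && ops->share_pages);
	demo_bknd = ops;
	return ops->init ? ops->init() : 0;	/* init is optional */
}

static int demo_frontend_export(int nents)
{
	if (!demo_bknd || nents <= 0)
		return -1;
	return demo_bknd->share_pages(demo_bknd->get_vm_id(), nents);
}

/* Dummy "loopback" backend standing in for a Xen (grant table) or
 * virtio-based implementation: hands out consecutive reference ids. */
static int loop_next_ref = 100;

static int loop_get_vm_id(void)
{
	return 1;
}

static int loop_share_pages(int vm_id, int nents)
{
	(void)vm_id;
	int ref = loop_next_ref;

	loop_next_ref += nents;
	return ref;
}

static const struct demo_bknd_ops loop_ops = {
	.init = NULL,
	.get_vm_id = loop_get_vm_id,
	.share_pages = loop_share_pages,
};
```

The frontend never sees grant references or virtio details; swapping loop_ops for another table changes the transport without touching demo_frontend_export().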

And please review the common UAPIs that hyper_dmabuf and xen-zcopy support. They
are very general: one takes an FD (dmabuf) and shares it; the other
generates a dmabuf from a global handle (a secure handle hiding the gref behind it).
On top of this, hyper_dmabuf has "unshare" and "query", which are also useful
in any case.

So I don't know why we wouldn't want to try to make these standard for most
hypervisors instead of limiting them to a certain hypervisor like Xen.
A frontend-backend structure is optimal for this, I think.


So I am wondering whether we can start with hyper_dmabuf, then modify it for
your use case if needed, and polish and fix any glitches if we want to
use this for all general dma-buf use cases.

Imo xen-zcopy is a much more reasonable starting point for upstream, which
can 

[PATCH v2] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas Richter
Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()'
when not restricting") for /sys/module/*/sections/.text file.

Reading file /proc/modules shows the correct address:
[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000

whereas reading the file /sys/module/qeth_l2/sections/.text
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363
displays a random (hashed) address.

This breaks the perf tool, which uses this address on s390
to calculate the start of the .text section in memory.

Fix this by printing the correct (unhashed) address.

Thanks to Jessica Yu for helping on this.

Suggested-by: Linus Torvalds 
Signed-off-by: Thomas Richter 
Cc: Jessica Yu 
Cc: sta...@vger.kernel.org
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index a6e43a5806a1..40b42000bd80 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1472,7 +1472,8 @@ static ssize_t module_sect_show(struct module_attribute *mattr,
 {
struct module_sect_attr *sattr =
container_of(mattr, struct module_sect_attr, mattr);
-   return sprintf(buf, "0x%pK\n", (void *)sattr->address);
+   return sprintf(buf, "0x%px\n", kptr_restrict < 2 ?
+  (void *)sattr->address : 0);
 }
 
 static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
-- 
2.14.3
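A minimal userspace model of the logic the patch puts into module_sect_show(): the raw (unhashed) address is printed unless kptr_restrict is 2 or higher, in which case 0 is printed. demo_sect_show() is a hypothetical name, and the kernel's "%px" output is only approximated here with a zero-padded 16-digit hex number, assuming a 64-bit address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of the patched module_sect_show(): show the real
 * address when kptr_restrict < 2, otherwise show 0. "%016llx" stands in
 * for the kernel's "%px" on a 64-bit system.
 */
static int demo_sect_show(char *buf, size_t size, uintptr_t address,
			  int kptr_restrict)
{
	uintptr_t shown = kptr_restrict < 2 ? address : 0;

	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "0x%016llx\n", (unsigned long long)shown);
}
```

With the qeth_l2 address from /proc/modules above, kptr_restrict values 0 and 1 would print 0x000003ff80401000, while 2 would print all zeros.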



Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-18 Thread Peter Ujfalusi


On 2018-04-17 18:42, Vinod Koul wrote:
> On Tue, Apr 17, 2018 at 04:46:43PM +0300, Peter Ujfalusi wrote:
> 
>> @@ -709,6 +709,11 @@ struct dma_filter {
>>   *  be called after period_len bytes have been transferred.
>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
>> + * @device_attach_metadata: Some DMA engines can send and receive side band
>> + *  information, commands or parameters which is not transferred within the
>> + *  data stream itself. In such case clients can set the metadata to the
>> + *  given descriptor and it is going to be sent to the peripheral, or in
>> + *  case of DEV_TO_MEM the provided buffer will receive the metadata.
>>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>>   *  code
>>   * @device_pause: Pauses any transfer happening on a channel. Returns
>> @@ -796,6 +801,9 @@ struct dma_device {
>>  struct dma_chan *chan, dma_addr_t dst, u64 data,
>>  unsigned long flags);
>>  
>> +int (*device_attach_metadata)(struct dma_async_tx_descriptor *desc,
>> +  void *data, size_t len);
> 
> While I am okay with the concept, I would not want to go down the custom
> pointer route again; this is a no-go for me.
> 
> Instead, let's add the vendor data and define it explicitly. We can use a struct,
> tokens or something else to define these. But let's try to stay away from
> opaque objects, please :-)

The DMA does not interpret the metadata; it is information that can
only be understood by the client driver and the remote peripheral. It is
just a chunk of data (parameters, timestamps, keys, etc.) that needs to
travel along with the payload.

The content is not relevant for the DMA itself.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


I NEED YOUR RESPONSE AS SOON AS POSSIBLE

2018-04-18 Thread Brenda Wilson


Hello, I am Sgt. Brenda Wilson, from Lake Jackson, Texas, in the United
States. I personally conducted a special search and came across your
information. I am writing this message to you from the United States
military base in Kabul, Afghanistan. I have a secure business proposal
for you.


Re: [PATCH 4.16 00/68] 4.16.3-stable review

2018-04-18 Thread Greg Kroah-Hartman
On Wed, Apr 18, 2018 at 10:43:26AM +0530, Naresh Kamboju wrote:
> On 17 April 2018 at 21:27, Greg Kroah-Hartman
>  wrote:
> > This is the start of the stable review cycle for the 4.16.3 release.
> > There are 68 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Apr 19 15:57:33 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.3-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.16.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm and x86_64.

Great, thanks for testing these two and letting me know.

greg k-h


Re: [PATCH 4.16 00/68] 4.16.3-stable review

2018-04-18 Thread Greg Kroah-Hartman
On Tue, Apr 17, 2018 at 03:03:52PM -0600, Shuah Khan wrote:
> On 04/17/2018 09:57 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.16.3 release.
> > There are 68 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Apr 19 15:57:33 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.3-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.16.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [v5,08/13] ARM: dts: ipq4019: Add ipq4019-ap.dk07.1 common data

2018-04-18 Thread Sven Eckelmann
On Friday, 23 March 2018 15:48:51 CEST Sricharan R wrote:
> Add the common data for all dk07 based boards.
> 
> Reviewed-by: Abhishek Sahu 
> Signed-off-by: Sricharan R 
> ---
>  arch/arm/boot/dts/qcom-ipq4019-ap.dk07.1.dtsi | 69 
> +++
>  1 file changed, 69 insertions(+)
>  create mode 100644 arch/arm/boot/dts/qcom-ipq4019-ap.dk07.1.dtsi

The no-map reserved-memory nodes for tz and smem are missing. Linux has no
control over these regions, and they are placed in the middle of the RAM before
Linux even starts. U-Boot also does not add these ranges automatically.

reserved-memory {
#address-cells = <0x1>;
#size-cells = <0x1>;
ranges;

smem@87e0 {
reg = <0x87e0 0x08>;
no-map;
};

tz@87e8 {
reg = <0x87e8 0x18>;
no-map;
};
};

Depending on the HW/SW configuration, this can lead either to a failed boot [1]
or to runtime crashes like:

root@OpenWrt:/# /tmp/memory-allocator-test
main 0
[  571.758058] Unhandled fault: imprecise external abort (0xc06) at 0x01715ff8
[  571.758099] pgd = cebec000
[  571.763826] [01715ff8] *pgd=8e7fa835, *pte=87e7f75f, *ppte=87e7fc7f
Bus error

I don't know how to disable QSEE on these boards, so I would assume
that these nodes should be part of this dtsi.

Kind regards,
Sven

[1] https://www.spinics.net/lists/linux-arm-msm/msg21536.html



Re: [v5,05/13] ARM: dts: ipq4019: Add ipq4019-ap.dk04.dtsi

2018-04-18 Thread Sven Eckelmann
On Friday, 23 March 2018 15:48:48 CEST Sricharan R wrote:
> Add the common parts for the dk04 boards.
> 
> Reviewed-by: Abhishek Sahu 
> Signed-off-by: Sricharan R 
> ---
>  arch/arm/boot/dts/qcom-ipq4019-ap.dk04.1.dtsi | 115 
> ++
>  arch/arm/boot/dts/qcom-ipq4019.dtsi   |   2 +-
>  2 files changed, 116 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/boot/dts/qcom-ipq4019-ap.dk04.1.dtsi

The no-map reserved-memory nodes for tz and smem are missing. Linux has no
control over these regions, and they are placed in the middle of the RAM before
Linux even starts. U-Boot also does not add these ranges automatically.

reserved-memory {
#address-cells = <0x1>;
#size-cells = <0x1>;
ranges;

smem@87e0 {
reg = <0x87e0 0x08>;
no-map;
};

tz@87e8 {
reg = <0x87e8 0x18>;
no-map;
};
};

Depending on the HW/SW configuration, this can lead either to a failed boot [1]
or to runtime crashes like:

root@OpenWrt:/# /tmp/memory-allocator-test
main 0
[  571.758058] Unhandled fault: imprecise external abort (0xc06) at 0x01715ff8
[  571.758099] pgd = cebec000
[  571.763826] [01715ff8] *pgd=8e7fa835, *pte=87e7f75f, *ppte=87e7fc7f
Bus error

I don't know how to disable QSEE on these boards, so I would assume
that these nodes should be part of this dtsi.

Kind regards,
Sven

[1] https://www.spinics.net/lists/linux-arm-msm/msg21536.html




Re: [PATCH v3] gpio: dwapb: Add support for 1 interrupt per port A GPIO

2018-04-18 Thread Hoan Tran
Hi Phil,

On Fri, Apr 13, 2018 at 9:47 AM, Phil Edworthy
 wrote:
> Hi Hoan,
>
> On 13 April 2018 17:37 Hoan Tran wrote:
>> On Fri, Apr 13, 2018 at 1:51 AM, Phil Edworthy wrote:
>> > The DesignWare GPIO IP can be configured for either 1 interrupt or 1
>> > per GPIO in port A, but the driver currently only supports 1 interrupt.
>> > See the DesignWare DW_apb_gpio Databook description of the
>> > 'GPIO_INTR_IO' parameter.
>> >
>> > This change allows the driver to work with up to 32 interrupts, it
>> > will get as many interrupts as specified in the DT 'interrupts' property.
>> > It doesn't do anything clever with the different interrupts, it just
>> > calls the same handler used for single interrupt hardware.
>> >
>> > Signed-off-by: Phil Edworthy 
>> > ---
>> > One point to mention is that I have made it possible for users to have
>> > unconnected interrupts by specifying holes in the list of interrupts.
>> > This is done by supporting the interrupts-extended DT prop.
>> > However, I have no use for this and had to hack up a test case for it.
>> > Perhaps the driver should support either 1 interrupt or all GPIOs as interrupts?
>> >
>> > v3:
>> >  - Rolled mfd: intel_quark_i2c_gpio fix into this patch to avoid
>> > bisect problems
>> > v2:
>> >  - Replaced interrupt-mask DT prop with support for the interrupts-extended
>> >    prop. This means replacing the call to irq_of_parse_and_map() with calls
>> >    to of_irq_parse_one() and irq_create_of_mapping().
>> >
>> > Note: There are a few *code* lines over 80 chars, but this is just guidance,
>> >    right? Especially as there are already some lines over 80 chars.
>> > ---
> [snip]
>
>> > -   if (has_acpi_companion(dev) && pp->idx == 0)
>> > -   pp->irq = platform_get_irq(to_platform_device(dev), 0);
>> > +   if (has_acpi_companion(dev) && pp->idx == 0) {
>> > +   pp->irq[0] = platform_get_irq(to_platform_device(dev), 0);
>> > +   if (pp->irq[0])
>> > +   pp->has_irq = true;
>> > +   }
>>
>> It doesn't work for ACPI. Could you do the same logic for ACPI?
> I don’t have access to any device that was baked (i.e. fabbed) with multiple
> output interrupts from the Synopsys GPIO blocks and use ACPI. I don't
> know if any such device exists.

Below code is tested on X-Gene system which supports 1 interrupt per
GPIO on Port A. You can update it into your patch.

-   if (has_acpi_companion(dev) && pp->idx == 0)
-   pp->irq = platform_get_irq(to_platform_device(dev), 0);
+   if (has_acpi_companion(dev) && pp->idx == 0) {
+   unsigned int j;
+   for (j = 0; j < pp->ngpio; j++) {
+   pp->irq[j] = platform_get_irq(to_platform_device(dev), j);
+   if (pp->irq[j])
+   pp->has_irq = true;
+   }
+   }
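A hedged userspace sketch of a more defensive version of this loop. Two assumptions drive it: platform_get_irq() reports failure with a negative errno, and the patch declares irq as `unsigned int irq[32]`. If I read the snippet correctly, a negative error stored there becomes a large non-zero value that passes the `if (pp->irq[j])` check, and a port with more than 32 GPIOs would index past the array. demo_port, demo_collect_irqs() and demo_two_irqs() are hypothetical names; the callback stands in for platform_get_irq().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PORTA_IRQS 32	/* matches irq[32] in the patch */

/* Hypothetical stand-in for platform_get_irq(pdev, index): returns an
 * IRQ number (> 0) on success or a negative errno on failure. */
typedef int (*demo_get_irq_fn)(unsigned int index);

struct demo_port {
	unsigned int ngpio;
	int irq[DEMO_MAX_PORTA_IRQS];	/* 0 = no interrupt for this GPIO */
	bool has_irq;
};

static void demo_collect_irqs(struct demo_port *pp, demo_get_irq_fn get_irq)
{
	unsigned int n, j;

	assert(pp != NULL && get_irq != NULL);
	/* Never index past the fixed-size irq[] array. */
	n = pp->ngpio < DEMO_MAX_PORTA_IRQS ? pp->ngpio : DEMO_MAX_PORTA_IRQS;
	pp->has_irq = false;
	for (j = 0; j < n; j++) {
		int irq = get_irq(j);

		/* A negative return is an error, not an IRQ: record a hole. */
		pp->irq[j] = irq > 0 ? irq : 0;
		if (irq > 0)
			pp->has_irq = true;
	}
}

/* Example provider: only the first two GPIOs have interrupts wired up. */
static int demo_two_irqs(unsigned int index)
{
	return index < 2 ? 40 + (int)index : -ENXIO;
}
```

Keeping the loop variable signed at the call site and storing 0 for holes keeps the later "has any IRQ" test simple.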

Thanks
Hoan

>
> I would prefer not to write code that cannot be tested easily. I cannot even
> test the current, albeit small, changes to the Intel Quark MFD.
>
> Regards
> Phil
>
>> Thanks
>> Hoan
>>
>> >
>> > pp->irq_shared  = false;
>> > pp->gpio_base   = -1;
>> > diff --git a/drivers/mfd/intel_quark_i2c_gpio.c b/drivers/mfd/intel_quark_i2c_gpio.c
>> > index 90e35de..5bddb84 100644
>> > --- a/drivers/mfd/intel_quark_i2c_gpio.c
>> > +++ b/drivers/mfd/intel_quark_i2c_gpio.c
>> > @@ -233,7 +233,8 @@ static int intel_quark_gpio_setup(struct pci_dev *pdev, struct mfd_cell *cell)
>> > pdata->properties->idx  = 0;
>> > pdata->properties->ngpio= INTEL_QUARK_MFD_NGPIO;
>> > pdata->properties->gpio_base= INTEL_QUARK_MFD_GPIO_BASE;
>> > -   pdata->properties->irq  = pdev->irq;
>> > +   pdata->properties->irq[0]   = pdev->irq;
>> > +   pdata->properties->has_irq  = true;
>> > pdata->properties->irq_shared   = true;
>> >
>> > cell->platform_data = pdata;
>> > diff --git a/include/linux/platform_data/gpio-dwapb.h
>> > b/include/linux/platform_data/gpio-dwapb.h
>> > index 2dc7f4a..5a52d69 100644
>> > --- a/include/linux/platform_data/gpio-dwapb.h
>> > +++ b/include/linux/platform_data/gpio-dwapb.h
>> > @@ -19,7 +19,8 @@ struct dwapb_port_property {
>> > unsigned intidx;
>> > unsigned intngpio;
>> > unsigned intgpio_base;
>> > -   unsigned intirq;
>> > +   unsigned intirq[32];
>> > +   boolhas_irq;
>> > boolirq_shared;
>> >  };
>> >
>> > --
>> > 2.7.4
>> >


Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-18 Thread Peter Ujfalusi


On 2018-04-18 09:39, Peter Ujfalusi wrote:
> 
> 
> On 2018-04-17 18:42, Vinod Koul wrote:
>> On Tue, Apr 17, 2018 at 04:46:43PM +0300, Peter Ujfalusi wrote:
>>
>>> @@ -709,6 +709,11 @@ struct dma_filter {
>>>   * be called after period_len bytes have been transferred.
>>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>>>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
>>> + * @device_attach_metadata: Some DMA engines can send and receive side band
>>> + * information, commands or parameters which is not transferred within the
>>> + * data stream itself. In such case clients can set the metadata to the
>>> + * given descriptor and it is going to be sent to the peripheral, or in
>>> + * case of DEV_TO_MEM the provided buffer will receive the metadata.
>>>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>>>   * code
>>>   * @device_pause: Pauses any transfer happening on a channel. Returns
>>> @@ -796,6 +801,9 @@ struct dma_device {
>>> struct dma_chan *chan, dma_addr_t dst, u64 data,
>>> unsigned long flags);
>>>  
>>> +   int (*device_attach_metadata)(struct dma_async_tx_descriptor *desc,
>>> + void *data, size_t len);
>>
>> While I am okay with the concept, I would not want to go down the custom
>> pointer route again; this is a no-go for me.
>>
>> Instead, let's add the vendor data and define it explicitly. We can use a struct,
>> tokens or something else to define these. But let's try to stay away from
>> opaque objects, please :-)
> 
> The DMA does not interpret the metadata; it is information which can
> only be understood by the client driver and the remote peripheral. It is
> just a chunk of data (parameters, timestamps, keys, etc.) that needs to
> travel along with the payload.
> 
> The content is not relevant for the DMA itself.

To add: different peripherals need to send or receive different metadata,
and even the same peripheral might pass different information depending
on its operating mode. The size of the metadata can differ as well.

So it is not really vendor-specific metadata, but a chunk of data whose
layout depends on the peripheral, its operating mode and other factors.
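To illustrate the "opaque chunk" model described above, here is a minimal sketch of a descriptor that carries a metadata buffer which the engine records but never parses. `struct sketch_desc`, its `max_metadata` limit and `sketch_attach_metadata()` are hypothetical; they only follow the shape of the proposed `device_attach_metadata()` callback (descriptor, data pointer, length).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical descriptor: only the fields needed for this sketch. */
struct sketch_desc {
	void   *metadata;      /* opaque to the DMA engine */
	size_t  metadata_len;
	size_t  max_metadata;  /* per-channel limit, engine specific (assumed) */
};

/*
 * Hypothetical attach hook in the spirit of device_attach_metadata():
 * the engine only remembers where the side-band data lives and how long
 * it is.  For MEM_TO_DEV the buffer would be sent to the peripheral, for
 * DEV_TO_MEM it would be filled in on completion.  Contents are never
 * inspected here.
 */
static int sketch_attach_metadata(struct sketch_desc *desc,
				  void *data, size_t len)
{
	if (!desc || !data || len == 0)
		return -EINVAL;
	if (len > desc->max_metadata)
		return -EINVAL;
	desc->metadata = data;
	desc->metadata_len = len;
	return 0;
}
```

Because the engine never looks inside the buffer, the same hook serves timestamps, keys or command words alike; only the client driver and the peripheral agree on the layout.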

> 
> - Péter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 

- Péter



Re: [PATCH v2 1/7] powerpc: Add TIDR CPU feature for Power9

2018-04-18 Thread Andrew Donnellan

On 18/04/18 11:08, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva 


Per my previous email:

Reviewed-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 1/1] i2c: dev: check i2c_msg len before memdup_user() to prevent ZERO_SIZE_PTR deref

2018-04-18 Thread Uwe Kleine-König
Hello,

On Wed, Apr 18, 2018 at 03:16:45AM +0300, Alexander Popov wrote:
> Currently i2cdev_ioctl_rdwr() doesn't check i2c_msg len against zero
> before calling memdup_user(). If this len is zero memdup_user() returns
> ZERO_SIZE_PTR, which is later considered as valid since
> IS_ERR(ZERO_SIZE_PTR) is false. That causes ZERO_SIZE_PTR deref oops.

You're saying that

memdup_user(ptr, 0)

reads from *ptr? I'd say this is a bug in memdup_user, not its user.

If, however, the problem only happens later in

if (msgs[i].flags & I2C_M_RECV_LEN) {
if (!(msgs[i].flags & I2C_M_RD) || msgs[i].buf[0] < 1 || ...)

then your commit log is wrong (and I think the patch is, too).
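A userspace sketch of the pointer conventions at stake may help. It re-implements `ZERO_SIZE_PTR`, `ERR_PTR()`, `IS_ERR()` and `ZERO_OR_NULL_PTR()` locally, modelled on the kernel definitions as I understand them, and uses a hypothetical `copy_msg_buf()` in place of `memdup_user()`. It shows that a zero-length copy never reads the source buffer, and that the `ZERO_SIZE_PTR` it returns passes an `IS_ERR()` check, so the fault only occurs when something like `buf[0]` is dereferenced later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local models of kernel definitions (assumed, not included). */
#define MAX_ERRNO      4095
#define ZERO_SIZE_PTR  ((void *)16)
#define ERR_PTR(err)   ((void *)(intptr_t)(err))

/* Error pointers occupy the top MAX_ERRNO addresses. */
static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL and for the zero-size sentinel. */
static bool ZERO_OR_NULL_PTR(const void *ptr)
{
	return (uintptr_t)ptr <= (uintptr_t)ZERO_SIZE_PTR;
}

/*
 * Hypothetical stand-in for memdup_user(): a zero-length request yields
 * ZERO_SIZE_PTR (as kmalloc(0) does) and copies nothing from src.
 */
static void *copy_msg_buf(const void *src, size_t len)
{
	void *p;

	if (len == 0)
		return ZERO_SIZE_PTR;
	p = malloc(len);
	if (!p)
		return ERR_PTR(-12); /* -ENOMEM */
	memcpy(p, src, len);
	return p;
}
```

A caller that only checks `IS_ERR()` will accept the zero-size result; guarding with a length check (or `ZERO_OR_NULL_PTR()`) before reading `buf[0]` is what avoids the oops.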

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [v5,05/13] ARM: dts: ipq4019: Add ipq4019-ap.dk04.dtsi

2018-04-18 Thread Sven Eckelmann
On Mittwoch, 18. April 2018 08:59:46 CEST Sven Eckelmann wrote:
[...]
> I would not know how to disable QSEE on these boards and thus would assume 
> that it should be part of this dtsi.


Just did some reviews of the reserved-memory regions in other QCA devices and 
it looks like this tz and smem are often directly added to the SoC dtsi. So I 
will prepare a similar change for qcom-ipq4019.dtsi and this would then solve 
it for AP-DK01/04/07 and no changes in the board-family specific dtsi would be 
necessary.

But maybe someone has an objection because tz and smem can actually be 
disabled in a sane way on these SoCs and thus it would be better to have these 
regions in the board specific dts(i) files. We will see...

Kind regards,
Sven



Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-18 Thread Michal Hocko
On Tue 17-04-18 20:08:24, Matthew Wilcox wrote:
> On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> > If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily even though it's an order-0 request.
> > I got the warning below 9 times during a normal boot.
> > 
> > [   17.072747] c0 0  : page allocation failure: order:0, mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> > 
> > Let's not scare users.
> >  
> > -   cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> > +   cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> > if (!cw)
> 
> Not arguing against this patch.  But how many places do we want to use
> GFP_NOWAIT without __GFP_NOWARN?  Not many, and the few which do do this
> seem like they simply haven't added it yet.  Maybe this would be a good idea?
> 
> -#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)
> +#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)

We have tried something like this in the past and Linus was strongly
against it. I do not have a reference handy, but his argument was that
each __GFP_NOWARN should be explicit rather than implicit because it is
a deliberate decision to make.
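The explicitness argument can be illustrated with a small model of how the failure warning hinges on a single flag bit. The flag values below are illustrative placeholders, not the kernel's `___GFP_*` values (which vary between versions), and `warn_on_alloc_failure()` is a hypothetical simplification of the allocator's failure path.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's ___GFP_* constants. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_NOWARN          0x1u
#define SKETCH_GFP_KSWAPD_RECLAIM  0x2u

/* Composite mask as it stands: no implicit NOWARN bit. */
#define SKETCH_GFP_NOWAIT  (SKETCH_GFP_KSWAPD_RECLAIM)

/* Warn on failure unless the caller explicitly opted out. */
static bool warn_on_alloc_failure(sketch_gfp_t gfp)
{
	return !(gfp & SKETCH_GFP_NOWARN);
}
```

Keeping the NOWARN bit out of the composite mask means every caller that wants silence has to write `| __GFP_NOWARN` at the call site, which is the deliberate, visible decision being argued for.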

-- 
Michal Hocko
SUSE Labs


Re: [v5,08/13] ARM: dts: ipq4019: Add ipq4019-ap.dk07.1 common data

2018-04-18 Thread Sricharan R
Hi Sven,

On 4/18/2018 12:29 PM, Sven Eckelmann wrote:
> On Freitag, 23. März 2018 15:48:51 CEST Sricharan R wrote:
>> Add the common data for all dk07 based boards.
>>
>> Reviewed-by: Abhishek Sahu 
>> Signed-off-by: Sricharan R 
>> ---
>>  arch/arm/boot/dts/qcom-ipq4019-ap.dk07.1.dtsi | 69 +++
>>  1 file changed, 69 insertions(+)
>>  create mode 100644 arch/arm/boot/dts/qcom-ipq4019-ap.dk07.1.dtsi
> 
> The no-map reserved-memory nodes for tz and smem are missing. Linux doesn't
> have control over these regions and they are placed in the middle of the RAM
> before Linux even starts. And u-boot does not add these ranges automatically
> either.
> 
>   reserved-memory {
>   #address-cells = <0x1>;
>   #size-cells = <0x1>;
>   ranges;
> 
>   smem@87e0 {
>   reg = <0x87e0 0x08>;
>   no-map;
>   };
> 
>   tz@87e8 {
>   reg = <0x87e8 0x18>;
>   no-map;
>   };
>   };
> 
> This can either (depending on HW/SW configuration) lead to a failed boot [1] 
> or to runtime crashes like:
> 
> root@OpenWrt:/# /tmp/memory-allocator-test
> main 0
> [  571.758058] Unhandled fault: imprecise external abort (0xc06) at 0x01715ff8
> [  571.758099] pgd = cebec000
> [  571.763826] [01715ff8] *pgd=8e7fa835, *pte=87e7f75f, *ppte=87e7fc7f
> Bus error
> 
> I would not know how to disable QSEE on these boards and thus would assume 
> that it should be part of this dtsi.

As we discussed offline, I agree that the smem and tz reserved-memory nodes
need to be added. It still boots today without them, but would abort when that
memory region is allocated and written. I will add the reserved-memory node
in V6 along with the other comments.

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation

2018-04-18 Thread Andrew Donnellan

On 18/04/18 11:08, Alastair D'Silva wrote:

From: Alastair D'Silva 

Switch the use of TIDR to depend on its CPU feature, rather than assuming it
is available based on architecture.

Signed-off-by: Alastair D'Silva 


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



[PATCH 0/6] Assorted rhashtable improvements. RESEND

2018-04-18 Thread NeilBrown
[[ I mistyped linux-kernel the first time I sent these, so
   resending.  Please reply to this set.  Sorry - neilb ]]


Some of these have been posted before and a couple
received an Ack from Herbert, but haven't appeared in any git tree
yet.
Another (the first) has been sent but received no ack.

I've added the second patch, which removes more incorrect
documentation, and added the last two patches.

One further improves rhashtable_walk stability.
The last adds rhashtable_walk_prev(), as discussed with Herbert,
which should be useful for seq_files.
(Separately I've posted a patch to Al Viro to make seq_file even
easier to use with rhashtables, but this series does not depend
on that patch).

I don't see these patches as particularly urgent, though the third is a
bugfix that currently prevents me from allowing one rhashtable in
lustre to auto-shrink.

I previously suggested it might be good for some of these patches to
go upstream through 'staging' with the lustre patches.  I no longer
think that is necessary.  It is probably best for them to go upstream
through net or net-next.

Thanks,
NeilBrown


---

NeilBrown (6):
  rhashtable: remove outdated comments about grow_decision etc
  rhashtable: remove incorrect comment on r{hl,hash}table_walk_enter()
  rhashtable: reset iter when rhashtable_walk_start sees new table
  rhashtable: improve rhashtable_walk stability when stop/start used.
  rhashtable: further improve stability of rhashtable_walk
  rhashtable: add rhashtable_walk_prev()


 include/linux/rhashtable.h |   49 +-
 lib/rhashtable.c   |  121 +---
 2 files changed, 126 insertions(+), 44 deletions(-)




[PATCH 2/6] rhashtable: remove incorrect comment on r{hl, hash}table_walk_enter()

2018-04-18 Thread NeilBrown
Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, so
remove the comments which suggest that they do.

Signed-off-by: NeilBrown 
---
 include/linux/rhashtable.h |    3 ---
 lib/rhashtable.c           |    3 ---
 2 files changed, 6 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 87d443a5b11d..b01d88e196c2 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -1268,9 +1268,6 @@ static inline int rhashtable_walk_init(struct rhashtable *ht,
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
- *
  * You must call rhashtable_walk_exit after this function returns.
  */
 static inline void rhltable_walk_enter(struct rhltable *hlt,
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2b2b79974b61..19db8e563c40 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -668,9 +668,6 @@ EXPORT_SYMBOL_GPL(rhashtable_insert_slow);
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
- *
  * You must call rhashtable_walk_exit after this function returns.
  */
 void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)




[PATCH 3/6] rhashtable: reset iter when rhashtable_walk_start sees new table

2018-04-18 Thread NeilBrown
The documentation claims that when rhashtable_walk_start_check()
detects a resize event, it will rewind back to the beginning
of the table.  This is not true.  We need to set ->slot and
->skip to be zero for it to be true.
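A reduced model of the fixed branch, assuming only the iterator fields named in this patch (`slot`, `skip`, the walker's table pointer and `end_of_table`). `struct sketch_iter` and `sketch_walk_start_check()` are hypothetical stand-ins for `struct rhashtable_iter` and `rhashtable_walk_start_check()`; RCU and locking are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of the iterator fields touched by the fix. */
struct sketch_iter {
	const void  *tbl;          /* table the walker is attached to */
	unsigned int slot;         /* current bucket index */
	unsigned int skip;         /* entries already returned in bucket */
	bool         end_of_table;
};

/*
 * If the walker lost its table (a resize happened while stopped),
 * attach to the current table and rewind to its first bucket, so the
 * -EAGAIN returned to the caller really means "restarted from the
 * beginning".
 */
static int sketch_walk_start_check(struct sketch_iter *iter,
				   const void *current_tbl)
{
	if (!iter->tbl && !iter->end_of_table) {
		iter->tbl = current_tbl;
		iter->slot = 0;
		iter->skip = 0;
		return -EAGAIN;
	}
	return 0;
}
```

Without the two resets, the iterator would attach to the new table but resume at a stale `slot`/`skip`, which is the mismatch with the documentation described above.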

Acked-by: Herbert Xu 
Signed-off-by: NeilBrown 
---
 lib/rhashtable.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 19db8e563c40..28e1be9f681b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -733,6 +733,8 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 
if (!iter->walker.tbl && !iter->end_of_table) {
iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
+   iter->slot = 0;
+   iter->skip = 0;
return -EAGAIN;
}
 




[PATCH 1/6] rhashtable: remove outdated comments about grow_decision etc

2018-04-18 Thread NeilBrown
grow_decision and shrink_decision no longer exist, so remove
the remaining references to them.

Signed-off-by: NeilBrown 
---
 include/linux/rhashtable.h |   33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1f8ad121eb43..87d443a5b11d 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -836,9 +836,8 @@ static inline void *__rhashtable_insert_fast(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhashtable_insert_fast(
struct rhashtable *ht, struct rhash_head *obj,
@@ -866,9 +865,8 @@ static inline int rhashtable_insert_fast(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhltable_insert_key(
struct rhltable *hlt, const void *key, struct rhlist_head *list,
@@ -890,9 +888,8 @@ static inline int rhltable_insert_key(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhltable_insert(
struct rhltable *hlt, struct rhlist_head *list,
@@ -922,9 +919,8 @@ static inline int rhltable_insert(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhashtable_lookup_insert_fast(
struct rhashtable *ht, struct rhash_head *obj,
@@ -981,9 +977,8 @@ static inline void *rhashtable_lookup_get_insert_fast(
  *
  * Lookups may occur in parallel with hashtable mutations and resizing.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  *
  * Returns zero on success.
  */
@@ -1134,8 +1129,8 @@ static inline int __rhashtable_remove_fast(
  * walk the bucket chain upon removal. The removal operation is thus
  * considerable slow if the hash table is not correctly sized.
  *
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%.
  *
  * Returns zero on success, -ENOENT if the entry could not be found.
  */
@@ -1156,8 +1151,8 @@ static inline int rhashtable_remove_fast(
  * walk the bucket chain upon removal. The removal operation is thus
  * considerable slow if the hash table is not correctly sized.
  *
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%.
  *
  * Returns zero on success, -ENOENT if the entry could not be found.
  */
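The resize triggers reduce to integer load-factor checks. The sketch below models them with hypothetical names; the arithmetic follows my understanding of rhashtable's internal helpers (grow when `nelems > size / 4 * 3`, shrink when `nelems < size * 3 / 10` and `automatic_shrinking` is set), so treat the exact thresholds as an assumption. Note that the grow helper called in patch 5/6, `rht_grow_above_75()`, is named for a 75% threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the state the resize decision looks at. */
struct sketch_table {
	unsigned int size;       /* number of buckets */
	unsigned int nelems;     /* number of stored objects */
	unsigned int min_size;
	unsigned int max_size;   /* 0 means unlimited */
	bool automatic_shrinking;
};

/* Expand when load exceeds 75%, using integer arithmetic. */
static bool sketch_should_grow(const struct sketch_table *t)
{
	return t->nelems > t->size / 4 * 3 &&
	       (!t->max_size || t->size < t->max_size);
}

/* Shrink when load drops beneath 30%, only if the user opted in. */
static bool sketch_should_shrink(const struct sketch_table *t)
{
	return t->automatic_shrinking &&
	       t->nelems < t->size * 3 / 10 &&
	       t->size > t->min_size;
}
```

For a 64-bucket table this means growth is scheduled at the 49th element and shrinking below 19 elements.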




[PATCH 5/6] rhashtable: further improve stability of rhashtable_walk

2018-04-18 Thread NeilBrown
If the sequence:
   obj = rhashtable_walk_next(iter);
   rhashtable_walk_stop(iter);
   rhashtable_remove_fast(ht, >head, params);
   rhashtable_walk_start(iter);

races with another thread inserting or removing
an object on the same hash chain, a subsequent
rhashtable_walk_next() is not guaranteed to get the "next"
object. It is possible that an object could be
repeated, or missed.

This can be made more reliable by keeping the objects in a hash chain
sorted by memory address.  A subsequent rhashtable_walk_next()
call can reliably find the correct position in the list, and thus
find the 'next' object.

It is not possible (or at least not easy) to achieve this with an
rhltable, as keeping the hash chain in order is harder there.  When the
first object with a given key is removed, it is replaced in the chain
with the next object with the same key, and the address of that
object may not be correctly ordered.
No current user of rhltable_walk_enter() calls
rhashtable_walk_start() more than once, so no current code
could benefit from a more reliable walk of rhltables.

This patch only attempts to improve walks for rhashtables:
- a new object is always inserted after the last object with a
  smaller address, or at the start
- when rhashtable_walk_start() is called, it records that 'p' is not
  'safe', meaning that it cannot be dereferenced.  The revalidation
  that was previously done here is moved to rhashtable_walk_next()
- when rhashtable_walk_next() is called while p is not NULL and not
  safe, it walks the chain looking for the first object with an
  address greater than p and returns that.  If there is none, it moves
  to the next hash chain.
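The two ordering rules in the list can be sketched on a plain singly linked chain. This is a userspace model, not the rhashtable code: `sketch_node`, `sketch_insert_sorted()` and `sketch_next_after()` are hypothetical, nulls markers and RCU are left out, and addresses are compared through `uintptr_t` so the comparison is well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Singly linked hash chain; NULL terminates (no nulls markers here). */
struct sketch_node {
	struct sketch_node *next;
};

/* Insert obj after the last node with a smaller address (or at the
 * start), so the chain stays sorted by address. */
static void sketch_insert_sorted(struct sketch_node **head,
				 struct sketch_node *obj)
{
	struct sketch_node **pprev = head;

	while (*pprev && (uintptr_t)*pprev < (uintptr_t)obj)
		pprev = &(*pprev)->next;
	obj->next = *pprev;
	*pprev = obj;
}

/* Resume after 'last' without dereferencing it: return the first node
 * with a greater address, or NULL to move on to the next bucket. */
static struct sketch_node *sketch_next_after(struct sketch_node *head,
					     const struct sketch_node *last)
{
	for (; head; head = head->next)
		if ((uintptr_t)head > (uintptr_t)last)
			return head;
	return NULL;
}
```

Because `last` is only compared, never dereferenced, the walk can resume correctly even if that object was removed while the walk was stopped.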

Signed-off-by: NeilBrown 
---
 include/linux/rhashtable.h |   12 ++--
 lib/rhashtable.c   |   66 +++-
 2 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index b01d88e196c2..5ce6201f246e 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -207,6 +207,7 @@ struct rhashtable_iter {
struct rhashtable_walker walker;
unsigned int slot;
unsigned int skip;
+   bool p_is_unsafe;
bool end_of_table;
 };
 
@@ -730,6 +731,7 @@ static inline void *__rhashtable_insert_fast(
.ht = ht,
.key = key,
};
+   struct rhash_head __rcu **inspos;
struct rhash_head __rcu **pprev;
struct bucket_table *tbl;
struct rhash_head *head;
@@ -757,6 +759,7 @@ static inline void *__rhashtable_insert_fast(
data = ERR_PTR(-ENOMEM);
if (!pprev)
goto out;
+   inspos = pprev;
 
rht_for_each_continue(head, *pprev, tbl, hash) {
struct rhlist_head *plist;
@@ -768,6 +771,8 @@ static inline void *__rhashtable_insert_fast(
 params.obj_cmpfn(, rht_obj(ht, head)) :
 rhashtable_compare(, rht_obj(ht, head {
pprev = >next;
+   if (head < obj)
+   inspos = >next;
continue;
}
 
@@ -798,7 +803,7 @@ static inline void *__rhashtable_insert_fast(
if (unlikely(rht_grow_above_100(ht, tbl)))
goto slow_path;
 
-   head = rht_dereference_bucket(*pprev, tbl, hash);
+   head = rht_dereference_bucket(*inspos, tbl, hash);
 
RCU_INIT_POINTER(obj->next, head);
if (rhlist) {
@@ -808,7 +813,7 @@ static inline void *__rhashtable_insert_fast(
RCU_INIT_POINTER(list->next, NULL);
}
 
-   rcu_assign_pointer(*pprev, obj);
+   rcu_assign_pointer(*inspos, obj);
 
atomic_inc(>nelems);
if (rht_grow_above_75(ht, tbl))
@@ -1263,7 +1268,8 @@ static inline int rhashtable_walk_init(struct rhashtable *ht,
  * Note that if you restart a walk after rhashtable_walk_stop you
  * may see the same object twice.  Also, you may miss objects if
  * there are removals in between rhashtable_walk_stop and the next
- * call to rhashtable_walk_start.
+ * call to rhashtable_walk_start.  Note that this is different to
+ * rhashtable_walk_enter() which never leads to missing objects.
  *
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 16cde54a553b..be7eb57d9398 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -566,6 +566,10 @@ static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
return ERR_PTR(-ENOMEM);
 
head = rht_dereference_bucket(*pprev, tbl, hash);
+   while (!rht_is_a_nulls(head) && head < obj) {
+   pprev = >next;
+   head = rht_dereference_bucket(*pprev, tbl, hash);
+   }
 
RCU_INIT_POINTER(obj->next, head);
if (ht->rhlist) {
@@ -660,10 +664,10 @@ 

Re: [v5,05/13] ARM: dts: ipq4019: Add ipq4019-ap.dk04.dtsi

2018-04-18 Thread Sricharan R
Hi Sven,

On 4/18/2018 12:37 PM, Sven Eckelmann wrote:
> On Mittwoch, 18. April 2018 08:59:46 CEST Sven Eckelmann wrote:
> [...]
>> I would not know how to disable QSEE on these boards and thus would assume 
>> that it should be part of this dtsi.
> 
> 
> Just did some reviews of the reserved-memory regions in other QCA devices and 
> it looks like this tz and smem are often directly added to the SoC dtsi. So I 
> will prepare a similar change for qcom-ipq4019.dtsi and this would then solve 
> it for AP-DK01/04/07 and no changes in the board-family specific dtsi would
> be necessary.
> 
> But maybe someone has an objection because tz and smem can actually be 
> disabled in a sane way on these SoCs and thus it would be better to have
> these regions in the board specific dts(i) files. We will see...

Right, I will add the above change to soc.dtsi in V6. Does that sound OK
to you?

Regards,
 Sricharan



[PATCH 6/6] rhashtable: add rhashtable_walk_prev()

2018-04-18 Thread NeilBrown
rhashtable_walk_prev() returns the object returned by
the previous rhashtable_walk_next(), providing it is still in the
table (or was during this grace period).
This works even if rhashtable_walk_stop() and rhashtable_walk_start()
have been called since the last rhashtable_walk_next().

If there have been no calls to rhashtable_walk_next(), or if the
object is gone from the table, then NULL is returned.

This can usefully be used in a seq_file ->start() function.
If the pos is the same as was returned by the last ->next() call,
then rhashtable_walk_prev() can be used to re-establish the
current location in the table.  If it returns NULL, then
rhashtable_walk_next() should be used.
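A reduced model of the `rhashtable_walk_prev()` logic for the plain rhashtable case. The rhlist case, RCU and table resizing are left out, and `prev_node`, `sketch_walker` and `sketch_walk_prev()` are hypothetical names; `p_is_unsafe` plays the role of the flag introduced in patch 5/6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct prev_node {
	struct prev_node *next;
};

struct sketch_walker {
	struct prev_node *bucket;  /* chain the walker is positioned in */
	struct prev_node *p;       /* last object returned by walk_next */
	bool p_is_unsafe;          /* set once the walk was stopped/restarted */
};

/* Return the previously returned object if it can still be found,
 * otherwise NULL.  The iterator itself is never moved. */
static struct prev_node *sketch_walk_prev(const struct sketch_walker *w)
{
	struct prev_node *pos;

	if (!w->p)
		return NULL;
	if (!w->p_is_unsafe)
		return w->p;       /* still inside the same read-side section */
	for (pos = w->bucket; pos; pos = pos->next)
		if (pos == w->p)
			return pos;
	return NULL;               /* gone: caller falls back to walk_next */
}
```

In a seq_file `->start()` that sees the same `pos` as the last `->next()`, trying the prev call first and falling back to `rhashtable_walk_next()` on NULL is the pattern described above.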

Signed-off-by: NeilBrown 
---
 include/linux/rhashtable.h |    1 +
 lib/rhashtable.c   |   30 ++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 5ce6201f246e..b1ad2b6a3f3f 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -397,6 +397,7 @@ static inline void rhashtable_walk_start(struct rhashtable_iter *iter)
 
 void *rhashtable_walk_next(struct rhashtable_iter *iter);
 void *rhashtable_walk_peek(struct rhashtable_iter *iter);
+void *rhashtable_walk_prev(struct rhashtable_iter *iter);
 void rhashtable_walk_stop(struct rhashtable_iter *iter) __releases(RCU);
 
 void rhashtable_free_and_destroy(struct rhashtable *ht,
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index be7eb57d9398..d2f941146ea3 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -910,6 +910,36 @@ void *rhashtable_walk_next(struct rhashtable_iter *iter)
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_next);
 
+/**
+ * rhashtable_walk_prev - Return the previously returned object, if available
+ * @iter:  Hash table iterator
+ *
+ * If rhashtable_walk_next() has previously been called and the object
+ * it returned is still in the hash table, that object is returned again,
+ * otherwise %NULL is returned.
+ *
+ * If the recent rhashtable_walk_next() call was since the most recent
+ * rhashtable_walk_start() call then the returned object may not, strictly
+ * speaking, still be in the table.  It will be safe to dereference.
+ *
+ * Note that the iterator is not changed and in particular it does not
+ * step backwards.
+ */
+void *rhashtable_walk_prev(struct rhashtable_iter *iter)
+{
+   struct rhashtable *ht = iter->ht;
+   struct rhash_head *p = iter->p;
+
+   if (!p)
+   return NULL;
+   if (!iter->p_is_unsafe || ht->rhlist)
+   return p;
+   rht_for_each_rcu(p, iter->walker.tbl, iter->slot)
+   if (p == iter->p)
+   return p;
+   return NULL;
+}
+
 /**
  * rhashtable_walk_peek - Return the next object but don't advance the iterator
  * @iter:  Hash table iterator




[PATCH v3] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas Richter
Reading file /proc/modules shows the correct address:
[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000

and reading file /sys/module/qeth_l2/sections/.text
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363
displays a random address.

This breaks the perf tool which uses this address on s390
to calculate start of .text section in memory.

Fix this by printing the correct (unhashed) address.

Thanks to Jessica Yu for helping on this.

Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not restricting")
Cc:  # v4.15+
Suggested-by: Linus Torvalds 
Signed-off-by: Thomas Richter 
Cc: Jessica Yu 
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index a6e43a5806a1..40b42000bd80 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1472,7 +1472,8 @@ static ssize_t module_sect_show(struct module_attribute *mattr,
 {
struct module_sect_attr *sattr =
container_of(mattr, struct module_sect_attr, mattr);
-   return sprintf(buf, "0x%pK\n", (void *)sattr->address);
+   return sprintf(buf, "0x%px\n", kptr_restrict < 2 ?
+  (void *)sattr->address : NULL);
 }
 
 static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
-- 
2.14.3
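The behavioural change can be modelled in userspace: the fixed show function prints the real address unless `kptr_restrict` is 2 or higher, in which case it prints a null address. `sketch_sect_show()` is hypothetical, pointer hashing is not modelled, and the formatting is simplified (the kernel's `%px` zero-pads to the pointer width).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Model of the fixed module_sect_show() decision: expose the raw
 * section address unless kptr_restrict >= 2, in which case show 0.
 * Output format simplified: no zero padding as %px would add.
 */
static int sketch_sect_show(char *buf, size_t len, uintptr_t address,
			    int kptr_restrict)
{
	uintptr_t shown = kptr_restrict < 2 ? address : 0;

	return snprintf(buf, len, "0x%" PRIxPTR "\n", shown);
}
```

Tools such as perf then see the same value that `/proc/modules` reports, while the strictest `kptr_restrict` setting still hides it.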



Re: [PATCH v2] regulator: Don't return or expect -errno from of_map_mode()

2018-04-18 Thread Javier Martinez Canillas
Hi Doug,

Patch looks good to me, I just have some minor comments.

On Wed, Apr 18, 2018 at 5:31 AM, Douglas Anderson  wrote:
> In of_get_regulation_constraints() we were taking the result of
> of_map_mode() (an unsigned int) and assigning it to an int.  We were
> then checking whether this value was -EINVAL.  Some implementers of
> of_map_mode() were returning -EINVAL (even though the return type of
> their function needed to be unsigned int) because they needed to to

s/to to/to

> signal an error back to of_get_regulation_constraints().
>
> In general in the regulator framework the mode is always referred to
> as an unsigned int.  While we could fix this to be a signed int (the
> highest value we store in there right now is 0x8), it's actually
> pretty clean to just define the regulator mode 0x0 (the lack of any
> bits set) as an invalid mode.  Let's do that.
>
> Suggested-by: Javier Martinez Canillas 
> Fixes: 5e5e3a42c653 ("regulator: of: Add support for parsing initial and suspend modes")
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v2:
> - Use Javier's suggestion of defining 0x0 as invalid
>
>  drivers/regulator/cpcap-regulator.c |  2 +-
>  drivers/regulator/of_regulator.c| 15 +--
>  drivers/regulator/twl-regulator.c   |  2 +-
>  include/linux/regulator/consumer.h  |  1 +
>  4 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/regulator/cpcap-regulator.c b/drivers/regulator/cpcap-regulator.c
> index f541b80f1b54..bd910fe123d9 100644
> --- a/drivers/regulator/cpcap-regulator.c
> +++ b/drivers/regulator/cpcap-regulator.c
> @@ -222,7 +222,7 @@ static unsigned int cpcap_map_mode(unsigned int mode)
> case CPCAP_BIT_AUDIO_LOW_PWR:
> return REGULATOR_MODE_STANDBY;
> default:
> -   return -EINVAL;
> +   return REGULATOR_MODE_INVALID;
> }
>  }
>
> diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
> index f47264fa1940..22c02b7a338b 100644
> --- a/drivers/regulator/of_regulator.c
> +++ b/drivers/regulator/of_regulator.c
> @@ -124,11 +124,12 @@ static void of_get_regulation_constraints(struct device_node *np,
>
> if (!of_property_read_u32(np, "regulator-initial-mode", )) {
> if (desc && desc->of_map_mode) {
> -   ret = desc->of_map_mode(pval);
> -   if (ret == -EINVAL)
> +   unsigned int mode = desc->of_map_mode(pval);

I think the convention is to always declare local variables at the
start of the function? Although I couldn't find anything in the coding
style document...

> +
> +   if (mode == REGULATOR_MODE_INVALID)
> pr_err("%s: invalid mode %u\n", np->name, pval);
> else
> -   constraints->initial_mode = ret;
> +   constraints->initial_mode = mode;
> } else {
> pr_warn("%s: mapping for mode %d not defined\n",
> np->name, pval);
> @@ -163,12 +164,14 @@ static void of_get_regulation_constraints(struct device_node *np,
> if (!of_property_read_u32(suspend_np, "regulator-mode",
>   )) {
> if (desc && desc->of_map_mode) {
> -   ret = desc->of_map_mode(pval);
> -   if (ret == -EINVAL)
> +   unsigned int mode = desc->of_map_mode(pval);
> +
> +   mode = desc->of_map_mode(pval);

You are calling .of_map_mode and assigning the return value twice here.

If you post a new version, feel free to add:

Reviewed-by: Javier Martinez Canillas 

Best regards,
Javier


[PATCH 4/6] rhashtable: improve rhashtable_walk stability when stop/start used.

2018-04-18 Thread NeilBrown
When a walk of an rhashtable is interrupted with rhashtable_walk_stop()
and then rhashtable_walk_start(), the location to restart from is based
on a 'skip' count in the current hash chain, and this can be incorrect
if insertions or deletions have happened.  This does not happen when
the walk is not stopped and started as iter->p is a placeholder which
is safe to use while holding the RCU read lock.

In rhashtable_walk_start() we can revalidate that 'p' is still in the
same hash chain.  If it isn't then the current method is still used.

With this patch, if an rhashtable walker ensures that the current
object remains in the table over a stop/start period (possibly by
elevating the reference count if that is sufficient), it can be sure
that a walk will not miss objects that were in the hashtable for the
whole time of the walk.

rhashtable_walk_start() may not find the object even though it is
still in the hashtable if a rehash has moved it to a new table.  In
this case it will (eventually) get -EAGAIN and will need to proceed
through the whole table again to be sure to see everything at least
once.
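The revalidation step in the plain-rhashtable branch amounts to rescanning the bucket for the remembered object and recounting its position. Below is a hypothetical userspace model (`reval_node` and `sketch_revalidate()` are invented names; RCU, rhlist handling and table changes are omitted).

```c
#include <assert.h>
#include <stddef.h>

struct reval_node {
	struct reval_node *next;
};

/*
 * Return the new skip count (the position just after p) if p is still
 * in the bucket, or -1 if it is gone and the old skip-based restart has
 * to be used instead.
 */
static int sketch_revalidate(const struct reval_node *bucket,
			     const struct reval_node *p)
{
	int skip = 0;

	for (; bucket; bucket = bucket->next) {
		skip++;
		if (bucket == p)
			return skip;
	}
	return -1;
}
```

If an earlier entry in the chain was removed while the walk was stopped, the recomputed skip shrinks accordingly, so the walk resumes right after the remembered object instead of jumping past one.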

Acked-by: Herbert Xu 
Signed-off-by: NeilBrown 
---
 lib/rhashtable.c |   44 +---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 28e1be9f681b..16cde54a553b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -723,6 +723,7 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
__acquires(RCU)
 {
struct rhashtable *ht = iter->ht;
+   bool rhlist = ht->rhlist;
 
rcu_read_lock();
 
@@ -731,13 +732,52 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
list_del(>walker.list);
spin_unlock(>lock);
 
-   if (!iter->walker.tbl && !iter->end_of_table) {
+   if (iter->end_of_table)
+   return 0;
+   if (!iter->walker.tbl) {
iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
iter->slot = 0;
iter->skip = 0;
return -EAGAIN;
}
 
+   if (iter->p && !rhlist) {
+   /*
+* We need to validate that 'p' is still in the table, and
+* if so, update 'skip'
+*/
+   struct rhash_head *p;
+   int skip = 0;
+   rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+   skip++;
+   if (p == iter->p) {
+   iter->skip = skip;
+   goto found;
+   }
+   }
+   iter->p = NULL;
+   } else if (iter->p && rhlist) {
+   /* Need to validate that 'list' is still in the table, and
+* if so, update 'skip' and 'p'.
+*/
+   struct rhash_head *p;
+   struct rhlist_head *list;
+   int skip = 0;
+   rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+   for (list = container_of(p, struct rhlist_head, rhead);
+list;
+list = rcu_dereference(list->next)) {
+   skip++;
+   if (list == iter->list) {
+   iter->p = p;
+   iter->skip = skip;
+   goto found;
+   }
+   }
+   }
+   iter->p = NULL;
+   }
+found:
return 0;
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);
@@ -913,8 +953,6 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
iter->walker.tbl = NULL;
spin_unlock(>lock);
 
-   iter->p = NULL;
-
 out:
rcu_read_unlock();
 }




Re: [PATCH v2 2/6] dt-bindings: display: atmel: optional video-interface of endpoints

2018-04-18 Thread Boris Brezillon
Hi Peter,

On Tue, 17 Apr 2018 15:10:48 +0200
Peter Rosin  wrote:

> With bus-type/bus-width properties in the endpoint nodes, the video-
> interface of the connection can be specified for cases where the
> heuristic fails to select the correct output mode. This can happen
> e.g. if not all RGB pins are routed on the PCB; the driver has no
> way of knowing this, and needs to be told explicitly.
> 
> This is critical for the devices that have the "conflicting output
> formats" issue (SAM9N12, SAM9X5, SAMA5D3), since the most significant
> RGB bits move around depending on the selected output mode. For
> devices that do not have the "conflicting output formats" issue
> (SAMA5D2, SAMA5D4), this is completely irrelevant.
> 
> Signed-off-by: Peter Rosin 
> ---
>  Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
> index 82f2acb3d374..244b48869eb4 100644
> --- a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
> +++ b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
> @@ -15,6 +15,14 @@ Required children nodes:
>   to external devices using the OF graph reprensentation (see ../graph.txt).
>   At least one port node is required.
>  
> +Optional properties in grandchild nodes:
> + Any endpoint grandchild node may specify a desired video interface
> + according to ../../media/video-interfaces.txt, specifically
> + - bus-type: must be <0>.
> + - bus-width: recognized values are <12>, <16>, <18> and <24>, and
> +   override any output mode selection heuristic, forcing "rgb444",
> +   "rgb565", "rgb666" and "rgb888" respectively.
> +

Can you add an example or update the existing one to show how this
should be defined?

>  Example:
>  
>   hlcdc: hlcdc@f003 {


Thanks,

Boris


Re: [PATCH v3] module: Fix display of wrong module .text address

2018-04-18 Thread Tobin C. Harding
On Wed, Apr 18, 2018 at 09:14:36AM +0200, Thomas Richter wrote:
> Reading file /proc/modules shows the correct address:
> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
> qeth_l2 94208 1 - Live 0x03ff80401000
> 
> and reading file /sys/module/qeth_l2/sections/.text
> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
> 0x18ea8363
> displays a random address.
> 
> This breaks the perf tool which uses this address on s390
> to calculate start of .text section in memory.
> 
> Fix this by printing the correct (unhashed) address.
> 
> Thanks to Jessica Yu for helping on this.
> 
> Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not restricting")
> Cc:  # v4.15+
> Suggested-by: Linus Torvalds 
> Signed-off-by: Thomas Richter 
> Cc: Jessica Yu 
> ---

What's changed in each version please?


thanks,
Tobin.


Re: nds32 build failures

2018-04-18 Thread Greentime Hu
2018-04-17 20:47 GMT+08:00 Arnd Bergmann :
> On Mon, Apr 16, 2018 at 11:06 AM, Greentime Hu  wrote:
>> 2018-04-16 11:58 GMT+08:00 Guenter Roeck :
>>
>> This build failure occurs because the toolchain version you used does not
>> support the latest intrinsic functions/macros.
>> We are sending the latest patchset now, and we expect all of the new
>> features to be supported in gcc 8.0.0 and binutils 2.31+.
>>
>> If you'd like to use these new toolchain features, you can use the
>> github version.
>> This is the build-script repo: https://github.com/andestech/build_script.git
>
> I've taken the gcc-6.3 sources from there, and updated them to gcc-6.4.0
> in order to build a nds32le-linux toolchain based on the same version as
> the other ones.
>
> Unfortunately neither the usual binutils-2.29.1 nor your binutils worked
> for me, but I eventually managed to get a build using the binutils-2.30
> release.
>
> With this, I could build a mainline kernel with a couple of warnings,
> but an 'allmodconfig' build still failed.
>
> Guenter, can you try my binary from
> www.kernel.org/pub/tools/crosstool/files/bin/x86_64/6.4.0/x86_64-gcc-6.4.0-nolibc-nds32le-linux.tar.xz
> ?
>
> If that works for you, I'll update the front-page and remove the nds32-elf
> toolchains.
>
> Greentime, do you have a patch set for gcc-7.3 as well, or are 6.3 and 8.0 the
> only working compilers for nds32le-linux?
>

Hi, all:

I just discussed this with our toolchain colleagues. We only have gcc 6.3 and
gcc 8.0 for nds32le-linux.
I also hit the ld segmentation fault when building the kernel with
'allmodconfig'. We are working on it.


Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-18 Thread Michal Hocko
On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with GFP_NOWAIT
> fails easily even though it is an order-0 request.
> I got the warning below 9 times during a normal boot.
> 
> [   17.072747] c0 0  : page allocation failure: order:0, mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> < snip >
> [   17.072789] c0 0  Call trace:
> [   17.072803] c0 0  [] dump_backtrace+0x0/0x4
> [   17.072813] c0 0  [] dump_stack+0xa4/0xc0
> [   17.072822] c0 0  [] warn_alloc+0xd4/0x15c
> [   17.072829] c0 0  [] __alloc_pages_nodemask+0xf88/0x10fc
> [   17.072838] c0 0  [] alloc_slab_page+0x40/0x18c
> [   17.072843] c0 0  [] new_slab+0x2b8/0x2e0
> [   17.072849] c0 0  [] ___slab_alloc+0x25c/0x464
> [   17.072858] c0 0  [] __kmalloc+0x394/0x498
> [   17.072865] c0 0  [] memcg_kmem_get_cache+0x114/0x2b8
> [   17.072870] c0 0  [] kmem_cache_alloc+0x98/0x3e8
> [   17.072878] c0 0  [] mmap_region+0x3bc/0x8c0
> [   17.072884] c0 0  [] do_mmap+0x40c/0x43c
> [   17.072890] c0 0  [] vm_mmap_pgoff+0x15c/0x1e4
> [   17.072898] c0 0  [] sys_mmap+0xb0/0xc8
> [   17.072904] c0 0  [] el0_svc_naked+0x24/0x28
> [   17.072908] c0 0  Mem-Info:
> [   17.072920] c0 0  active_anon:17124 inactive_anon:193 isolated_anon:0
> [   17.072920] c0 0   active_file:7898 inactive_file:712955 isolated_file:55
> [   17.072920] c0 0   unevictable:0 dirty:27 writeback:18 unstable:0
> [   17.072920] c0 0   slab_reclaimable:12250 slab_unreclaimable:23334
> [   17.072920] c0 0   mapped:19310 shmem:212 pagetables:816 bounce:0
> [   17.072920] c0 0   free:36561 free_pcp:1205 free_cma:35615
> [   17.072933] c0 0  Node 0 active_anon:68496kB inactive_anon:772kB active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [   17.072945] c0 0  DMA free:142188kB min:3056kB low:3820kB high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> [   17.072949] c0 0  lowmem_reserve[]: 0 1842 1842
> [   17.072966] c0 0  Normal free:4056kB min:4172kB low:5212kB high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB free_pcp:3392kB local_pcp:688kB free_cma:0kB
> [   17.072971] c0 0  lowmem_reserve[]: 0 0 0
> [   17.072982] c0 0  DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> [   17.073024] c0 0  Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3872kB
> [   17.073069] c0 0  721350 total pagecache pages
> [   17.073073] c0 0  0 pages in swap cache
> [   17.073078] c0 0  Swap cache stats: add 0, delete 0, find 0/0
> [   17.073081] c0 0  Free swap  = 0kB
> [   17.073085] c0 0  Total swap = 0kB
> [   17.073089] c0 0  945512 pages RAM
> [   17.073093] c0 0  0 pages HighMem/MovableOnly
> [   17.073097] c0 0  63408 pages reserved
> [   17.073100] c0 0  51200 pages cma reserved
> 
> Let's not scare users.

This is not a proper explanation. So what exactly happens when this
allocation fails? I would suggest something like the following
"
__memcg_schedule_kmem_cache_create tries to create a shadow slab cache
and the worker allocation failure is not really critical because we will
retry on the next kmem charge. We might miss some charges but that
shouldn't be critical. The excessive allocation failure report is not
very helpful. Replace it with a rate-limited single-line output so
that we know that there are a lot of these failures and that we need to
do something about it in the future.
"

With the last part implemented, of course.
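A rate-limited single-line report of the kind suggested above can be sketched as an interval/burst limiter: allow a few messages per time window and count the rest as suppressed. This is a userspace illustration only; `struct ratelimit` and `ratelimit_ok()` are hypothetical names, and the current time is passed in explicitly so the logic is easy to test. Inside the kernel, existing helpers such as pr_warn_ratelimited() would be the natural tool.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical limiter: at most 'burst' messages per 'interval' seconds. */
struct ratelimit {
	time_t interval; /* window length in seconds */
	int burst;       /* messages allowed per window */
	time_t begin;    /* start of the current window (0 = not started) */
	int printed;     /* messages emitted in the current window */
	int missed;      /* messages suppressed in the current window */
};

/* Returns true if the caller may print its one-line message now. */
static bool ratelimit_ok(struct ratelimit *rs, time_t now)
{
	/* Start a fresh window on first use or once the old one expired. */
	if (rs->begin == 0 || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++; /* suppressed, but still counted */
	return false;
}
```

The `missed` counter is what would let a single summary line say how many failures were hidden, instead of dumping a full allocation-failure report for each one.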
 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Signed-off-by: Minchan Kim 
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 448db08d97a0..671d07e73a3b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2200,7 +2200,7 @@ static void __memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
>  {
>   struct memcg_kmem_cache_create_work *cw;
>  
> - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
>   if (!cw)
>   return;
>  
> -- 


Re: [PATCH v3] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas-Mich Richter
On 04/18/2018 09:17 AM, Tobin C. Harding wrote:
> On Wed, Apr 18, 2018 at 09:14:36AM +0200, Thomas Richter wrote:
>> Reading file /proc/modules shows the correct address:
>> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
>> qeth_l2 94208 1 - Live 0x03ff80401000
>>
>> and reading file /sys/module/qeth_l2/sections/.text
>> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
>> 0x18ea8363
>> displays a random address.
>>
>> This breaks the perf tool which uses this address on s390
>> to calculate start of .text section in memory.
>>
>> Fix this by printing the correct (unhashed) address.
>>
>> Thanks to Jessica Yu for helping on this.
>>
>> Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not restricting")
>> Cc:  # v4.15+
>> Suggested-by: Linus Torvalds 
>> Signed-off-by: Thomas Richter 
>> Cc: Jessica Yu 
>> ---
> 
> What's changed in each version please?
> 
> 
> thanks,
> Tobin.
> 

V2: Changed sprintf format string from %#lx to 0x%px (suggested by Kees Cook).
V3: Changed sprintf argument from 0 to NULL to avoid sparse warning.
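One user-visible difference between the two V2 format strings can be shown with ordinary printf conversions. %px is a kernel-only extension that prints the raw, unhashed pointer, so it is not used here. The '#' flag only adds the "0x" prefix to non-zero values, while a literal "0x" in the format is always printed. This is a general C note, not a statement of why the change was made; `format_both()` is a hypothetical helper.

```c
#include <stdio.h>

/* Format one address with both styles into caller-provided buffers. */
static void format_both(unsigned long addr, char *alt, char *prefixed,
			size_t len)
{
	snprintf(alt, len, "%#lx", addr);       /* "0x" only when addr != 0 */
	snprintf(prefixed, len, "0x%lx", addr); /* "0x" unconditionally */
}
```

For example, an address of 0 comes out as "0" with "%#lx" but as "0x0" with "0x%lx"; any non-zero value gets the prefix either way.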

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Register court: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH 4.14 00/49] 4.14.35-stable review

2018-04-18 Thread Naresh Kamboju
On 17 April 2018 at 21:28, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 4.14.35 release.
> There are 49 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Apr 19 15:56:59 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.35-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

kselftest: the BPF tests test_xdp_meta.sh and test_xdp_redirect.sh were previously
skipped with "Could not run test without the ip {xdp,xdpgeneric} support".
That support was added in iproute2 4.11, so the tests now run, and they are
reported as failing on stable-rc-4.14.35-rc1 and also on the linux-mainline 4.17 kernel.

We have an open bug to investigate this failure.
LKFT: mainline: BPF: test_xdp_redirect.sh and test_xdp_meta.sh skipped -
Could not run test without the ip xdpgeneric support
https://bugs.linaro.org/show_bug.cgi?id=3630

Summary

kernel: 4.14.35-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: adacb0d813172896acb603b9af86907f1ee62a1f
git describe: v4.14.34-50-gadacb0d81317
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.34-50-gadacb0d81317

No regressions (compared to build v4.14.34)



Boards, architectures and test suites:
-

dragonboard-410c - arm64
* boot - pass: 20
* kselftest - skip: 20, fail: 2, pass: 43
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 17, fail: 2, pass: 46
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 135, pass: 1015
* ltp-timers-tests - pass: 13

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 18, fail: 2, pass: 45
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

qemu_x86_64
* boot - pass: 22
* kselftest - skip: 22, fail: 2, pass: 56
* kselftest-vsyscall-mode-native - skip: 22, fail: 2, pass: 56
* kselftest-vsyscall-mode-none - skip: 22, fail: 2, pass: 56
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 13
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 147, pass: 1003
* ltp-timers-tests - pass: 13

x15 - arm
* boot - pass: 20
* kselftest - skip: 19, fail: 4, pass: 39
* libhugetlbfs - skip: 1, pass: 87
* 

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-18 Thread Mehta, Sohil
On Wed, 2018-04-18 at 05:58 +, Yang, Shunyong wrote:
> Hi, Gary and Sohil,
> 
> On Tue, 2018-04-17 at 13:38 -0400, Hook, Gary wrote:
> > On 4/13/2018 8:08 PM, Mehta, Sohil wrote:
> > > 
> > > On Fri, 2018-04-06 at 08:17 -0500, Gary R Hook wrote:
> > > > 
> > > >   
> > > > +
> > > > +void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
> > > > +{
> > > > + char name[MAX_NAME_LEN + 1];
> > > > + struct dentry *d_top;
> > > > +
> > > > + if (!debugfs_initialized())
> > > Probably not needed.
> > Right.
> 
> When is this check needed?
> IMO, this function checks whether debugfs is ready before we
> use it. I just want to understand when we should use
> debugfs_initialized().
> 

You are right that debugfs_initialized() can be used to check whether debugfs is
ready. However, in this case we can also rely on debugfs_create_dir(),
which is called in iommu_debugfs_setup().

debugfs_create_dir() says:

 * If debugfs is not enabled in the kernel, the value -%ENODEV will be
 * returned.

Sohil

> Thanks.
> Shunyong.
> 
> > 
> > > 
> > > 
> > > > 
> > > > + return;
> > > > +
> > > > + mutex_lock(&amd_iommu_debugfs_lock);
> > > > + if (!amd_iommu_debugfs) {
> > > > + d_top = iommu_debugfs_setup();
> > > > + if (d_top)
> > > > + amd_iommu_debugfs = debugfs_create_dir("amd", d_top);
> > > > + }
> > > > + mutex_unlock(&amd_iommu_debugfs_lock);
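The quoted setup code creates one shared "amd" directory the first time any IOMMU instance is set up, under a mutex so that concurrent callers cannot both create it. A minimal userspace sketch of that create-once pattern follows, assuming pthreads in place of the kernel mutex API; `shared_dir`, `create_shared()` and `instance_setup()` are hypothetical stand-ins, not debugfs calls.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared object, standing in for the "amd" directory. */
static int *shared_dir;
static int shared_storage;
static int create_calls; /* how many times the creation path ran */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical creator; a real one could fail and return NULL. */
static int *create_shared(void)
{
	create_calls++;
	return &shared_storage;
}

/* Per-instance setup: only the first caller creates the shared object. */
static int *instance_setup(void)
{
	int *dir;

	pthread_mutex_lock(&shared_lock);
	if (!shared_dir)
		shared_dir = create_shared();
	dir = shared_dir;
	pthread_mutex_unlock(&shared_lock);
	return dir;
}
```

Checking and creating under the same lock is what makes the "only once" guarantee hold; testing the pointer outside the lock would let two instances race into the creation path.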

Re: [PATCH] drm/xen-front: Remove CMA support

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 12:08 PM, Oleksandr Andrushchenko wrote:

On 04/17/2018 12:04 PM, Daniel Vetter wrote:

On Tue, Apr 17, 2018 at 10:40:12AM +0300, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Even if xen-front allocates its buffers from contiguous memory,
those are still not contiguous in PA space, i.e. the buffer is only
contiguous in IPA space.
The only use-case for this mode was xen-front allocating dumb buffers
that would later be used by some other driver requiring contiguous
memory, but currently there is no such use-case, or it can be worked
around with xen-front.
Please also mention the nents confusion here, and the patch that fixes it.

Or just outright take the commit message from my patch with all the
details:

ok, if you don't mind then I'll use your commit message entirely

 drm/xen: Disable CMA support

 It turns out this was only needed to paper over a bug in the CMA
 helpers, which was addressed in

 commit 998fb1a0f478b83492220ff79583bf9ad538bdd8
 Author: Liviu Dudau
 Date:   Fri Nov 10 13:33:10 2017 +

     drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1

 Without this the following pipeline didn't work:

 domU:
 1. xen-front allocates a non-contig buffer
 2. creates grants out of it

 dom0:
 3. converts the grants into a dma-buf. Since they're non-contig, the
 scatter-list is huge.
 4. imports it into rcar-du, which requires dma-contig memory for
 scanout.

 -> On this given platform there's an IOMMU, so in theory this should
 work. But in practice this failed, because of the huge number of sg
 entries, even though the IOMMU driver mapped it all into a dma-contig
 range.

 With a guest-contig buffer allocated in step 1, this problem doesn't
 exist. But there's technically no reason to require guest-contig
 memory for xen buffer sharing using grants.
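The referenced commit's title says the CMA helper may import scatterlists with nents > 1 as long as they are contiguous. The check that idea implies can be sketched as follows. This is an illustration only, assuming a simplified `struct seg` (DMA address plus length) in place of the kernel's struct scatterlist; `segs_contiguous()` is a hypothetical name, not the helper's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct seg {
	uint64_t dma_addr;
	uint64_t len;
};

/*
 * True if every entry starts exactly where the previous one ended, i.e.
 * the list describes one contiguous DMA range even though nents > 1.
 */
static bool segs_contiguous(const struct seg *s, size_t nents)
{
	for (size_t i = 1; i < nents; i++)
		if (s[i].dma_addr != s[i - 1].dma_addr + s[i - 1].len)
			return false;
	return true;
}
```

This matches the pipeline described above: an IOMMU can map many guest pages into one dma-contig range, so judging contiguity by the mapped addresses rather than by nents == 1 avoids rejecting such buffers.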

With the commit message improved:

Acked-by: Daniel Vetter 

Thank you,
I'll wait for a day and apply to drm-misc-next if this is ok

applied to drm-misc-next


Signed-off-by: Oleksandr Andrushchenko 


Suggested-by: Daniel Vetter 
---
  Documentation/gpu/xen-front.rst | 12 
  drivers/gpu/drm/xen/Kconfig | 13 
  drivers/gpu/drm/xen/Makefile    |  9 +--
  drivers/gpu/drm/xen/xen_drm_front.c | 62 +++-
  drivers/gpu/drm/xen/xen_drm_front.h | 42 ++-
  drivers/gpu/drm/xen/xen_drm_front_gem.c | 12 +---
  drivers/gpu/drm/xen/xen_drm_front_gem.h |  3 -
  drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 79 -
  drivers/gpu/drm/xen/xen_drm_front_shbuf.c   | 22 --
  drivers/gpu/drm/xen/xen_drm_front_shbuf.h   |  8 ---
  10 files changed, 21 insertions(+), 241 deletions(-)
  delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c

diff --git a/Documentation/gpu/xen-front.rst b/Documentation/gpu/xen-front.rst
index 009d942386c5..d988da7d1983 100644
--- a/Documentation/gpu/xen-front.rst
+++ b/Documentation/gpu/xen-front.rst
@@ -18,18 +18,6 @@ Buffers allocated by the frontend driver
  .. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
 :doc: Buffers allocated by the frontend driver
  -With GEM CMA helpers
-
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: With GEM CMA helpers
-
-Without GEM CMA helpers
-~~~
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: Without GEM CMA helpers
-
  Buffers allocated by the backend
  
diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig
index 4f4abc91f3b6..4cca160782ab 100644
--- a/drivers/gpu/drm/xen/Kconfig
+++ b/drivers/gpu/drm/xen/Kconfig
@@ -15,16 +15,3 @@ config DRM_XEN_FRONTEND
  help
    Choose this option if you want to enable a para-virtualized
    frontend DRM/KMS driver for Xen guest OSes.
-
-config DRM_XEN_FRONTEND_CMA
-    bool "Use DRM CMA to allocate dumb buffers"
-    depends on DRM_XEN_FRONTEND
-    select DRM_KMS_CMA_HELPER
-    select DRM_GEM_CMA_HELPER
-    help
-  Use DRM CMA helpers to allocate display buffers.
-  This is useful for the use-cases when guest driver needs to
-  share or export buffers to other drivers which only expect
-  contiguous buffers.
-  Note: in this mode driver cannot use buffers allocated
-  by the backend.
diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile
index 352730dc6c13..712afff5ffc3 100644
--- a/drivers/gpu/drm/xen/Makefile
+++ b/drivers/gpu/drm/xen/Makefile
@@ -5,12 +5,7 @@ drm_xen_front-objs := xen_drm_front.o \
    xen_drm_front_conn.o \
    xen_drm_front_evtchnl.o \
    xen_drm_front_shbuf.o \
-  xen_drm_front_cfg.o
-
-ifeq ($(CONFIG_DRM_XEN_FRONTEND_CMA),y)
-    drm_xen_front-objs += xen_drm_front_gem_cma.o

Re: [PATCH v2 4/6] drm/atmel-hlcdc: support bus-width (12/16/18/24) in endpoint nodes

2018-04-18 Thread Boris Brezillon
On Tue, 17 Apr 2018 15:10:50 +0200
Peter Rosin  wrote:

> This overrides the heuristic, in which the connector decides what format
> should be output, for cases where that heuristic fails.
> 
> This can happen e.g. if there is a bridge that changes format between the
> encoder and the connector, or if some of the RGB pins between the lcd
> controller and the encoder are not routed on the PCB.
> 
> This is critical for the devices that have the "conflicting output
> formats" issue (SAM9N12, SAM9X5, SAMA5D3), since the most significant
> RGB bits move around depending on the selected output mode. For
> devices that do not have the "conflicting output formats" issue
> (SAMA5D2, SAMA5D4), this is completely irrelevant.
> 
> Signed-off-by: Peter Rosin 
> ---
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c | 85 
> --
>  1 file changed, 65 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> index d73281095fac..2e718959981e 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> @@ -19,12 +19,14 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -226,6 +228,68 @@ static void atmel_hlcdc_crtc_atomic_enable(struct drm_crtc *c,
>  #define ATMEL_HLCDC_RGB888_OUTPUTBIT(3)
>  #define ATMEL_HLCDC_OUTPUT_MODE_MASK GENMASK(3, 0)
>  
> +static int atmel_hlcdc_connector_output_mode(struct drm_connector_state *state)
> +{
> + struct drm_connector *connector = state->connector;
> + struct drm_display_info *info = &connector->display_info;
> + unsigned int supported_fmts = 0;
> + struct device_node *ep;
> + int j;
> +
> + /*
> +  * Use the connector index as an approximation of the
> +  * endpoint node index. We know it's true for our case
> +  * depending on the driver implementation.
> +  */
> + ep = of_graph_get_endpoint_by_regs(connector->dev->dev->of_node, 0,
> +connector->index);
> +

Hm, this sounds a bit fragile. Can't we have a reference to the of_node
attached to the connector? Or maybe we can parse this earlier and set a
constraint on the accepted modes.

> + if (ep) {
> + int bus_fmt = drm_of_media_bus_fmt(ep);

Hm, you're extracting this piece of information from the DT every time
an atomic modeset is done. I'd really prefer to have this done once at
probe time. Since this property is attached to the connector, maybe we
should overwrite the info->bus_formats[] array or mark some of its
entries as invalid.

> +
> + of_node_put(ep);
> +
> + if (bus_fmt < 0)
> + return bus_fmt;
> +
> + switch (bus_fmt) {
> + case 0:
> + break;
> + case MEDIA_BUS_FMT_RGB444_1X12:
> + return ATMEL_HLCDC_RGB444_OUTPUT;
> + case MEDIA_BUS_FMT_RGB565_1X16:
> + return ATMEL_HLCDC_RGB565_OUTPUT;
> + case MEDIA_BUS_FMT_RGB666_1X18:
> + return ATMEL_HLCDC_RGB666_OUTPUT;
> + case MEDIA_BUS_FMT_RGB888_1X24:
> + return ATMEL_HLCDC_RGB888_OUTPUT;
> + default:
> + return -EINVAL;
> + }
> + }
> +
> + for (j = 0; j < info->num_bus_formats; j++) {
> + switch (info->bus_formats[j]) {
> + case MEDIA_BUS_FMT_RGB444_1X12:
> + supported_fmts |= ATMEL_HLCDC_RGB444_OUTPUT;
> + break;
> + case MEDIA_BUS_FMT_RGB565_1X16:
> + supported_fmts |= ATMEL_HLCDC_RGB565_OUTPUT;
> + break;
> + case MEDIA_BUS_FMT_RGB666_1X18:
> + supported_fmts |= ATMEL_HLCDC_RGB666_OUTPUT;
> + break;
> + case MEDIA_BUS_FMT_RGB888_1X24:
> + supported_fmts |= ATMEL_HLCDC_RGB888_OUTPUT;
> + break;
> + default:
> + break;
> + }
> + }
> +
> + return supported_fmts;
> +}
> +
>  static int atmel_hlcdc_crtc_select_output_mode(struct drm_crtc_state *state)
>  {
>   unsigned int output_fmts = ATMEL_HLCDC_OUTPUT_MODE_MASK;
> @@ -238,31 +302,12 @@ static int atmel_hlcdc_crtc_select_output_mode(struct drm_crtc_state *state)
>   crtc = drm_crtc_to_atmel_hlcdc_crtc(state->crtc);
>  
>   for_each_new_connector_in_state(state->state, connector, cstate, i) {
> - struct drm_display_info *info = &connector->display_info;
>   unsigned int supported_fmts = 0;
> - int j;
>  
>   if (!cstate->crtc)
>   continue;
>  
> - for (j = 0; j < info->num_bus_formats; j++) {
> - switch (info->bus_formats[j]) {
> -   
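
The bus-width to output-mode mapping described in the binding (12/16/18/24 forcing rgb444/rgb565/rgb666/rgb888) can be sketched as a plain lookup. Following Boris's suggestion, such a value could be computed once at probe time and cached instead of being re-derived from DT on every atomic modeset. Only RGB888 = BIT(3) and the 4-bit GENMASK(3, 0) mask appear in the quoted hunk; the other bit positions below are assumptions, and the `SKETCH_*` names and `bus_width_to_output()` are hypothetical.

```c
/* Output-mode bits. Only RGB888 = BIT(3) is confirmed by the quoted
 * hunk; the RGB444/RGB565/RGB666 positions are assumed here. */
#define SKETCH_RGB444_OUTPUT (1 << 0)
#define SKETCH_RGB565_OUTPUT (1 << 1)
#define SKETCH_RGB666_OUTPUT (1 << 2)
#define SKETCH_RGB888_OUTPUT (1 << 3)

/*
 * Map an endpoint's bus-width to a forced output mode.
 * Returns 0 when no width is given (fall back to the heuristic),
 * or -1 for a width the controller cannot output.
 */
static int bus_width_to_output(unsigned int bus_width)
{
	switch (bus_width) {
	case 0:
		return 0;
	case 12:
		return SKETCH_RGB444_OUTPUT;
	case 16:
		return SKETCH_RGB565_OUTPUT;
	case 18:
		return SKETCH_RGB666_OUTPUT;
	case 24:
		return SKETCH_RGB888_OUTPUT;
	default:
		return -1;
	}
}
```

Keeping 0 distinct from an error mirrors the quoted driver code, where a zero bus format breaks out of the switch and lets the existing bus_formats[] heuristic decide.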

Re: INFO: task hung in fsnotify_mark_destroy_workfn

2018-04-18 Thread Dan Carpenter
This looks like a binder bug, but none of the Android devs are CC'd.
They've probably already seen it, but let me forward it to them.

regards,
dan carpenter

On Tue, Apr 17, 2018 at 06:02:02PM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +)
> Merge branch 'parisc-4.17-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=e38306788a2e7102a3b6
> 
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5126465372815360
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5956756370882560
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+e38306788a2e7102a...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: 23363:23363 got reply transaction with no transaction stack
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: 23363:23363 transaction failed 29201/-71, size 0-0 line 2763
> binder: undelivered TRANSACTION_ERROR: 29201
> INFO: task kworker/u4:4:853 blocked for more than 120 seconds.
> binder: undelivered TRANSACTION_ERROR: 29201
>   Not tainted 4.17.0-rc1+ #6
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u4:4D11512   853  2 0x8000
> Workqueue: events_unbound fsnotify_mark_destroy_workfn
> binder: undelivered TRANSACTION_ERROR: 29201
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
>  schedule+0xef/0x430 kernel/sched/core.c:3549
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
>  schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
> binder: undelivered TRANSACTION_ERROR: 29201
>  do_wait_for_common kernel/sched/completion.c:83 [inline]
>  __wait_for_common kernel/sched/completion.c:104 [inline]
>  wait_for_common kernel/sched/completion.c:115 [inline]
>  wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
> binder: undelivered TRANSACTION_ERROR: 29201
>  __synchronize_srcu+0x189/0x240 kernel/rcu/srcutree.c:924
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
>  synchronize_srcu+0x408/0x54f kernel/rcu/srcutree.c:1002
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
>  fsnotify_mark_destroy_workfn+0x1aa/0x530 fs/notify/mark.c:759
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: 23369:23369 got reply transaction with no transaction stack
> binder: 23369:23369 transaction failed 29201/-71, size 0-0 line 2763
> binder: 23366:23366 got reply transaction with no transaction stack
> binder: 23366:23366 transaction failed 29201/-71, size 0-0 line 2763
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: undelivered TRANSACTION_ERROR: 29201
> binder: 23379:23379 got reply transaction with no transaction stack
> binder: 23379:23379 transaction failed 29201/-71, size 0-0 line 2763
> binder: undelivered TRANSACTION_ERROR: 29201
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> binder: undelivered TRANSACTION_ERROR: 29201
> 
> Showing all locks held in the system:
> 2 locks held by kworker/u4:4/853:
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> __write_once_size include/linux/compiler.h:215 [inline]
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> set_work_data kernel/workqueue.c:617 [inline]
>  #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
> 

Re: [PATCH v2 7/7] ocxl: Document new OCXL IOCTLs

2018-04-18 Thread Andrew Donnellan

On 18/04/18 11:08, Alastair D'Silva wrote:

From: Alastair D'Silva 

Signed-off-by: Alastair D'Silva 


This looks better.

Acked-by: Andrew Donnellan 


---
  Documentation/accelerators/ocxl.rst | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
index 7904adcc07fd..3b8d3b99795c 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA:
Obtains configuration information from the card, such at the size of
MMIO areas, the AFU version, and the PASID for the current context.
  
+OCXL_IOCTL_ENABLE_P9_WAIT:

+
+  Allows the AFU to wake a userspace thread executing 'wait'. Returns
+  information to userspace to allow it to configure the AFU. Note that
+  this is only available on Power 9.


Nitpicking time, if you do a v3 you should stay on brand and call it 
POWER9. :D



+
+OCXL_IOCTL_GET_FEATURES:
+
+  Reports on which CPU features that affect OpenCAPI are usable from
+  userspace.
+
  mmap
  
  



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v2 2/6] dt-bindings: display: atmel: optional video-interface of endpoints

2018-04-18 Thread Peter Rosin
On 2018-04-18 09:16, Boris Brezillon wrote:
> Hi Peter,
> 
> On Tue, 17 Apr 2018 15:10:48 +0200
> Peter Rosin  wrote:
> 
>> With bus-type/bus-width properties in the endpoint nodes, the video-
>> interface of the connection can be specified for cases where the
>> heuristic fails to select the correct output mode. This can happen
>> e.g. if not all RGB pins are routed on the PCB; the driver has no
>> way of knowing this, and needs to be told explicitly.
>>
>> This is critical for the devices that have the "conflicting output
>> formats" issue (SAM9N12, SAM9X5, SAMA5D3), since the most significant
>> RGB bits move around depending on the selected output mode. For
>> devices that do not have the "conflicting output formats" issue
>> (SAMA5D2, SAMA5D4), this is completely irrelevant.
>>
>> Signed-off-by: Peter Rosin 
>> ---
>>  Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> index 82f2acb3d374..244b48869eb4 100644
>> --- a/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> +++ b/Documentation/devicetree/bindings/display/atmel/hlcdc-dc.txt
>> @@ -15,6 +15,14 @@ Required children nodes:
>>   to external devices using the OF graph reprensentation (see ../graph.txt).
>>   At least one port node is required.
>>  
>> +Optional properties in grandchild nodes:
>> + Any endpoint grandchild node may specify a desired video interface
>> + according to ../../media/video-interfaces.txt, specifically
>> + - bus-type: must be <0>.
>> + - bus-width: recognized values are <12>, <16>, <18> and <24>, and
>> +   override any output mode selection hueristic, forcing "rgb444",

heuristic, I'll fix that for v3, so please review as if it wasn't there...

>> +   "rgb565", "rgb666" and "rgb888" respectively.
>> +
> 
> Can you add an example or update the existing one to show how this
> should be defined?

For v3, I'll extend the binding with this after the preexisting example:

--8<-
Example 2: With a video interface override to force rgb565, as above
but with these changes/additions:

	&hlcdc {
		hlcdc-display-controller {
			pinctrl-names = "default";
			pinctrl-0 = <&pinctrl_lcd_base &pinctrl_lcd_rgb565>;

			port@0 {
				hlcdc_panel_output: endpoint@0 {
					bus-type = <0>;
					bus-width = <16>;
				};
			};
		};
	};
--8<-

Is that a good plan, or should I perhaps duplicate the whole example?
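
The bus-width override in the proposed binding text maps each recognized width to exactly one forced output mode: <12> to rgb444, <16> to rgb565, <18> to rgb666 and <24> to rgb888. A minimal userspace model of that rule follows; it is not the atmel-hlcdc driver code, the enum and function names are hypothetical, and treating an unrecognized width as "no override" is an assumption the binding text does not state.

```c
/*
 * Hypothetical model (not atmel-hlcdc driver code) of the proposed
 * binding rule: an endpoint bus-width forces a specific RGB output mode.
 */
enum hlcdc_output_mode {
	HLCDC_MODE_HEURISTIC,	/* no override: the driver heuristic decides */
	HLCDC_MODE_RGB444,
	HLCDC_MODE_RGB565,
	HLCDC_MODE_RGB666,
	HLCDC_MODE_RGB888,
};

enum hlcdc_output_mode hlcdc_mode_for_bus_width(unsigned int bus_width)
{
	switch (bus_width) {
	case 12:
		return HLCDC_MODE_RGB444;
	case 16:
		return HLCDC_MODE_RGB565;
	case 18:
		return HLCDC_MODE_RGB666;
	case 24:
		return HLCDC_MODE_RGB888;
	default:
		/* Assumption: an unrecognized width means "no override". */
		return HLCDC_MODE_HEURISTIC;
	}
}
```

With the Example 2 endpoint (bus-width = <16>), hlcdc_mode_for_bus_width(16) yields HLCDC_MODE_RGB565, which matches the rgb565 pinctrl group selected in that example.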

Cheers,
Peter


>>  Example:
>>  
>>  hlcdc: hlcdc@f003 {
> 
> 
> Thanks,
> 
> Boris
> 



[PATCH] nvme: fix the suspicious RCU usage warning in nvme_mpath_clear_current_path

2018-04-18 Thread Jianchao Wang
With lockdep enabled, triggering nvme_remove prints a suspicious RCU
usage warning. Fix it by wrapping the srcu_dereference() in
nvme_mpath_clear_current_path() with srcu_read_lock()/srcu_read_unlock().

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/nvme.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 061fecf..d326c23 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -446,9 +446,14 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head);
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
struct nvme_ns_head *head = ns->head;
+   int srcu_idx;
 
-   if (head && ns == srcu_dereference(head->current_path, &head->srcu))
-   rcu_assign_pointer(head->current_path, NULL);
+   if (head) {
+   srcu_idx = srcu_read_lock(&head->srcu);
+   if (ns == srcu_dereference(head->current_path, &head->srcu))
+   rcu_assign_pointer(head->current_path, NULL);
+   srcu_read_unlock(&head->srcu, srcu_idx);
+   }
 }
 struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
 
-- 
2.7.4



Re: [PATCH] Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader

2018-04-18 Thread Marcel Holtmann
Hi Amit,

> AOSP uses a userspace firmware loader to load firmware, which returns
> -EAGAIN if qca/rampatch_00440302.bin is not found. Since there is no
> rampatch for the dragonboard820c QCA controller revision, just make it
> work as is.
> 
> CC: Loic Poulain 
> CC: Nicolas Dechesne 
> CC: Marcel Holtmann 
> CC: Johan Hedberg 
> CC: Stable 
> Signed-off-by: Amit Pundir 
> ---
> drivers/bluetooth/hci_qca.c | 6 ++
> 1 file changed, 6 insertions(+)
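
The fallback described in the quoted commit message amounts to treating a "firmware not found" result from the rampatch step as success. The hci_qca.c diff itself is not quoted here, so the following is only a hypothetical userspace model: the function name is illustrative, and also treating -ENOENT (the not-found code of the in-kernel request_firmware() path) the same way is an assumption.

```c
#include <errno.h>

/*
 * Hypothetical model (not the hci_qca.c change): filter the result of the
 * rampatch/NVM setup step. A missing file, reported as -ENOENT (assumed,
 * in-kernel loader) or -EAGAIN (AOSP userspace loader, per the commit
 * message), means "keep running with the controller's current firmware".
 */
int qca_filter_setup_result(int ret)
{
	if (ret == -ENOENT || ret == -EAGAIN)
		return 0;	/* no rampatch available: continue as is */
	return ret;		/* success (0) or a real error is passed through */
}
```

One trade-off of this design: mapping -EAGAIN to success also hides a genuinely transient loader failure, since the controller then silently runs without the rampatch.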

patch has been applied to bluetooth-next tree.

Regards

Marcel



Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> 3.2 Backend exports dma-buf to xen-front
> 
> In this case Dom0 pages are shared with DomU. As before, DomU can only write
> to these pages, not any other page from Dom0, so it can be still considered
> safe.
> But, the following must be considered (highlighted in xen-front's Kernel
> documentation):
>  - If guest domain dies then pages/grants received from the backend cannot
>    be claimed back - think of it as memory lost to Dom0 (won't be used for
>    any other guest)
>  - Misbehaving guest may send too many requests to the backend exhausting
>    its grant references and memory (consider this from security POV). As the
>    backend runs in the trusted domain we also assume that it is trusted as
>    well, e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

In any case, all(?) PV protocols are implemented with the frontend
sharing pages with the backend, and I think there's a reason why this
model is used, and it should continue to be used.

Having to add logic in the backend to prevent such attacks means
that:

 - We need more code in the backend, which increases complexity and
   chances of bugs.
 - Such code/logic could be wrong, thus allowing DoS.

> 4. xen-front/backend/xen-zcopy synchronization
> 
> 4.1. As I already said in 2) all the inter VM communication happens between
> xen-front and the backend, xen-zcopy is NOT involved in that.
> When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a
> XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
> This call is synchronous, so xen-front expects that backend does free the
> buffer pages on return.
> 
> 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
>   - closes all dumb handles/fd's of the buffer according to [3]
>   - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
>     sure the buffer is freed (think of it as it waits for dma-buf->release
>     callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

>   - replies to xen-front that the buffer can be destroyed.
> This way deletion of the buffer happens synchronously on both Dom0 and DomU
> sides. If DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with a time-out error
> (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> reference removal and will retry later until those are free.
> 
> Hope this helps understand how buffers are synchronously deleted in case
> of xen-zcopy with a single protocol command.
> 
> I think the above logic can also be re-used by the hyper-dmabuf driver with
> some additional work:
> 
> 1. xen-zcopy can be split into 2 parts and extend:
> 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
> vice versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT user-space applications
don't have any notion of dma buffers. How are such buffers useful for
user-space? Why can't this just be called memory?

Also (with my FreeBSD maintainer hat on), how is this going to translate
to other OSes? So far the operations performed by the gntdev device
are mostly OS-agnostic because they just map/unmap memory, and in fact
they are implemented by both Linux and FreeBSD.

> implement "wait" ioctl (wait for dma-buf->release): currently these are
> DRM_XEN_ZCOPY_DUMB_FROM_REFS, DRM_XEN_ZCOPY_DUMB_TO_REFS and
> DRM_XEN_ZCOPY_DUMB_WAIT_FREE
> 1.2. Xen balloon driver [6] to allow allocating contiguous buffers (not
> needed by current hyper-dmabuf, but is a must for xen-zcopy use-cases)

I think this needs clarifying. In which memory space do you need those
regions to be contiguous?

Do they need to be contiguous in host physical memory, or guest
physical memory?

If it's in guest memory space, isn't there any generic interface that
you can use?

If it's in host physical memory space, why do you need this buffer to
be contiguous in host physical memory space? The IOMMU should hide all
this.

Thanks, Roger.


Re: [PATCH 00/35 v5] PTI support for x32

2018-04-18 Thread Joerg Roedel
On Mon, Apr 16, 2018 at 09:13:22AM -0700, Linus Torvalds wrote:
> See for example commit 8c06c7740d19 ("x86/pti: Leave kernel text
> global for !PCID") and in particular the performance numbers (that's
> an Atom microserver, but it was chosen due to lack of PCID).

Okay, I checked this on 32 bit and after some small changes I got
identical mappings with GLB set in all page-tables. The changes were:

* Don't change permission bits in pti_clone_kernel_text().
  Changing them does not make a difference on 64 bit as
  everything cloned in this function is RO anyway. On 32 bit
  some areas are mapped RW, so it does make a difference there.
  
  Having different permissions in the kernel and user
  page-tables also makes no sense, because a permission
  mismatch in the TLB causes a re-walk, which costs as much as
  not mapping it at all.

* Map kernel text into user-space on 32 bit too. Since there
  is no PCID, this should improve performance. I have not
  measured that yet, but will do so before posting the next
  version.

I will do some more testing and performance measurements and will send
version 6 of my patches at the beginning of next week, when v4.17-rc2 is out.


Regards,

Joerg



Re: [PATCH v2 5/6] drm/atmel-hlcdc: add support for connecting to tda998x HDMI encoder

2018-04-18 Thread Boris Brezillon
On Tue, 17 Apr 2018 15:10:51 +0200
Peter Rosin  wrote:

> When the of-graph points to a tda998x-compatible HDMI encoder, register
> as a component master and bind to the encoder/connector provided by
> the tda998x driver.

Can't we do the opposite and make the tda998x driver expose its devices
as drm bridges? I'd rather not add another way to connect external
encoders (or bridges) to display controller drivers, especially since,
when I asked DRM maintainers/devs what the right approach was to
represent such external encoders, they pointed me to the drm_bridge
interface.
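
The probe-time dispatch in the quoted patch reduces to a three-way decision on the return value of atmel_hlcdc_get_external_components(): a negative value is an error, a positive value means the of-graph reaches external components (such as a tda998x encoder) and the driver registers as a component master, and zero means DRM is initialized directly. A hypothetical userspace model of that decision, with illustrative names that are not part of the driver:

```c
/*
 * Hypothetical model (not driver code) of the probe dispatch in the patch,
 * keyed on the return value of atmel_hlcdc_get_external_components().
 */
enum hlcdc_probe_path {
	HLCDC_PROBE_FAIL,		/* propagate the lookup error */
	HLCDC_PROBE_COMPONENT_MASTER,	/* component_master_add_with_match() */
	HLCDC_PROBE_DIRECT_INIT,	/* atmel_hlcdc_dc_drm_init() right away */
};

enum hlcdc_probe_path hlcdc_choose_probe_path(int nr_external)
{
	if (nr_external < 0)
		return HLCDC_PROBE_FAIL;
	if (nr_external > 0)
		return HLCDC_PROBE_COMPONENT_MASTER;
	return HLCDC_PROBE_DIRECT_INIT;
}
```

In the component-master case, DRM initialization is deferred until the master's bind callback (atmel_hlcdc_bind() in the patch) runs, which is the extra connection path the drm_bridge approach would avoid.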

> 
> Signed-off-by: Peter Rosin 
> ---
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c |  81 --
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h |  15 +++
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c | 130 +++
>  3 files changed, 220 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> index c1ea5c36b006..8523c40fac94 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> @@ -20,6 +20,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -568,10 +569,13 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device *dev)
>  
>   drm_mode_config_init(dev);
>  
> - ret = atmel_hlcdc_create_outputs(dev);
> - if (ret) {
> - dev_err(dev->dev, "failed to create HLCDC outputs: %d\n", ret);
> - return ret;
> + if (!dc->is_componentized) {
> + ret = atmel_hlcdc_create_outputs(dev);
> + if (ret) {
> + dev_err(dev->dev,
> + "failed to create HLCDC outputs: %d\n", ret);
> + return ret;
> + }
>   }
>  
>   ret = atmel_hlcdc_create_planes(dev);
> @@ -586,6 +590,16 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device *dev)
>   return ret;
>   }
>  
> + if (dc->is_componentized) {
> + ret = component_bind_all(dev->dev, dev);
> + if (ret < 0)
> + return ret;
> +
> + ret = atmel_hlcdc_add_component_encoder(dev);
> + if (ret < 0)
> + return ret;
> + }
> +
>   dev->mode_config.min_width = dc->desc->min_width;
>   dev->mode_config.min_height = dc->desc->min_height;
>   dev->mode_config.max_width = dc->desc->max_width;
> @@ -617,6 +631,9 @@ static int atmel_hlcdc_dc_load(struct drm_device *dev)
>   if (!dc)
>   return -ENOMEM;
>  
> + dc->is_componentized =
> + atmel_hlcdc_get_external_components(dev->dev, NULL) > 0;
> +
>   dc->wq = alloc_ordered_workqueue("atmel-hlcdc-dc", 0);
>   if (!dc->wq)
>   return -ENOMEM;
> @@ -751,7 +768,7 @@ static struct drm_driver atmel_hlcdc_dc_driver = {
>   .minor = 0,
>  };
>  
> -static int atmel_hlcdc_dc_drm_probe(struct platform_device *pdev)
> +static int atmel_hlcdc_dc_drm_init(struct platform_device *pdev)
>  {
>   struct drm_device *ddev;
>   int ret;
> @@ -779,7 +796,7 @@ static int atmel_hlcdc_dc_drm_probe(struct platform_device *pdev)
>   return ret;
>  }
>  
> -static int atmel_hlcdc_dc_drm_remove(struct platform_device *pdev)
> +static int atmel_hlcdc_dc_drm_fini(struct platform_device *pdev)
>  {
>   struct drm_device *ddev = platform_get_drvdata(pdev);
>  
> @@ -790,6 +807,58 @@ static int atmel_hlcdc_dc_drm_remove(struct platform_device *pdev)
>   return 0;
>  }
>  
> +static int atmel_hlcdc_bind(struct device *dev)
> +{
> + return atmel_hlcdc_dc_drm_init(to_platform_device(dev));
> +}
> +
> +static void atmel_hlcdc_unbind(struct device *dev)
> +{
> + struct drm_device *ddev = dev_get_drvdata(dev);
> +
> + /* Check if a subcomponent has already triggered the unloading. */
> + if (!ddev->dev_private)
> + return;
> +
> + atmel_hlcdc_dc_drm_fini(to_platform_device(dev));
> +}
> +
> +static const struct component_master_ops atmel_hlcdc_comp_ops = {
> + .bind = atmel_hlcdc_bind,
> + .unbind = atmel_hlcdc_unbind,
> +};
> +
> +static int atmel_hlcdc_dc_drm_probe(struct platform_device *pdev)
> +{
> + struct component_match *match = NULL;
> + int ret;
> +
> + ret = atmel_hlcdc_get_external_components(&pdev->dev, &match);
> + if (ret < 0)
> + return ret;
> + else if (ret)
> + return component_master_add_with_match(&pdev->dev,
> + &atmel_hlcdc_comp_ops,
> + match);
> + else
> + return atmel_hlcdc_dc_drm_init(pdev);
> +}
> +
> +static int atmel_hlcdc_dc_drm_remove(struct platform_device *pdev)
> +{
> + int ret;
> +
> + ret = atmel_hlcdc_get_external_components(&pdev->dev, NULL);
> + if (ret < 0)
> + return ret;
> + else if (ret)
> +  

Re: [v5,05/13] ARM: dts: ipq4019: Add ipq4019-ap.dk04.dtsi

2018-04-18 Thread Sven Eckelmann
Hi,

On Wednesday, 18 April 2018 12:45:20 CEST, Sricharan R wrote:
>  Right, will add the above change to soc.dtsi in V6. Does that sound ok for
> you?

I have submitted a patch for this now [1] because I need this for OpenWrt 
(sooner rather than later). And I am not sure whether it is good to have this 
in your feature series because it is a bugfix which might even qualify for 
sta...@vger.kernel.org.

I hope this patch [1] is ok for you.

Kind regards,
Sven

[1] https://patchwork.kernel.org/patch/10347459/



Re: [PATCH v7 0/4] Bluetooth: hci_qca: Add serdev support

2018-04-18 Thread Marcel Holtmann
Hi Thierry,

> This patchset enables the Qualcomm BT controller QCA6174 node in the
> device tree of the db820c board. This allows the bluetooth chipset to
> be probed and registered against the hci layer by using the serdev
> framework.
> 
> This patchset also contains the documentation for the compatible
> string "qcom,qca6174-bt" related to this chipset.
> 
> v7:
> - Add a new patch enabling regulators and gpios for the bt/wlan
>  combo chip
> 
> v6:
> - Move pinctrl properties into subnodes
> - fix binding documentation
> 
> v5:
> - Rename 'bt-disable-n' gpio as 'enable'
> 
> v4:
> - Fix dt binding documentation
> - Address some other issues in patch #3
> 
> v3:
> - Address comments for patch #3 (details in patch)
> 
> v2:
> - Fix author email
> 
> 
> Srinivas Kandagatla (1):
>  arm64: dts: apq8096-db820c: Enable wlan and bt en pins
> 
> Thierry Escande (3):
>  arm64: dts: apq8096-db820c: enable bluetooth node
>  dt-bindings: net: bluetooth: Add qualcomm-bluetooth
>  Bluetooth: hci_qca: Add serdev support
> 
> .../devicetree/bindings/net/qualcomm-bluetooth.txt |  30 ++
> arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi  |  26 +
> .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi|  32 ++
> arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi   |  62 
> arch/arm64/boot/dts/qcom/msm8996.dtsi  |  10 ++
> drivers/bluetooth/Kconfig  |   1 +
> drivers/bluetooth/hci_qca.c| 109 -
> 7 files changed, 268 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt

all 4 patches have been applied to bluetooth-next tree.

Regards

Marcel


