date:20181023

Re: [PATCH 0/7] x86/mm/tlb: make lazy TLB mode even lazier

2018-10-23 Thread Ingo Molnar

* Rik van Riel  wrote:

> The big thing remaining is the reference count overhead of
> the lazy TLB mm_struct, but getting rid of that is rather a
> lot of code for a small performance gain. Not quite what
> Linus asked for :)

BTW., what would be the plan to improve scalability there,
is it even possible?

Also, it would be nice to integrate some of those workloads
into a simple 'perf bench mm' or 'perf bench tlb' subcommand,
see tools/perf/bench/ on how to add benchmarking modules.

Thanks,

Ingo

Re: [GIT] Sparc

2018-10-23 Thread Linus Torvalds

On Wed, Oct 24, 2018 at 4:31 AM David Miller  wrote:
>
> Mostly VDSO cleanups and optimizations.

Pulled,

  Linus

[PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-23 Thread Enke Chen

For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification. A new signal code CLD_PREDUMP is also
defined for SIGCHLD.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
          Set the child pre-coredump signal of the calling process to
          arg2 (either a signal value in the range 1..maxsig, or 0 to
          clear). This is the signal that the calling process will get
          prior to the coredump of a child process. This value is
          cleared across execve(2), or for the child of a fork(2).

          When SIGCHLD is specified, the signal code will be set to
          CLD_PREDUMP in such a SIGCHLD signal.

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
          Return the current value of the child pre-coredump signal,
          in the location pointed to by (int *) arg2.
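
For illustration only, here is how a parent's SIGCHLD handler might tell
the proposed notification apart from other child state changes. This is a
userspace sketch, not part of the patch: CLD_PREDUMP_ASSUMED and its value
7 are assumptions (inferred from the NSIGCHLD build check moving from 6 to
7; the siginfo.h hunk is not shown in this excerpt), and the
PR_SET_PREDUMP_SIG call itself is left out because its option number is
not shown either. The helper names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* ASSUMPTION: value of the proposed CLD_PREDUMP code; not a mainline constant. */
#define CLD_PREDUMP_ASSUMED 7

/* The assumed code must not collide with the last existing SIGCHLD code. */
static_assert(CLD_PREDUMP_ASSUMED != CLD_CONTINUED, "assumed si_code collides");

/* Last si_code seen by the handler; sig_atomic_t so the handler may write it. */
static volatile sig_atomic_t last_cld_code;

/* Map a SIGCHLD si_code to a short label. */
static const char *cld_code_name(int code)
{
	switch (code) {
	case CLD_EXITED:          return "exited";
	case CLD_KILLED:          return "killed";
	case CLD_DUMPED:          return "dumped";
	case CLD_TRAPPED:         return "trapped";
	case CLD_STOPPED:         return "stopped";
	case CLD_CONTINUED:       return "continued";
	case CLD_PREDUMP_ASSUMED: return "predump";   /* hypothetical */
	default:                  return "unknown";
	}
}

/* Handler: only record the code; decoding happens outside signal context. */
static void on_sigchld(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_cld_code = info->si_code;
}

/* Install the handler with SA_SIGINFO so si_code is delivered. */
static int install_sigchld_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigchld;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}
```

A process manager would call install_sigchld_handler() once at startup
and, in its main loop, act early when cld_code_name(last_cld_code)
reports "predump" instead of waiting for the later exit notification.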

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process Event    POSIX Signal                Connector-based
----------------------------------------------------------------------
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                 SIGCHLD / CLD_CONTINUED

pre_coredump/    N/A                         proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()          proc_exit_connector()
do_exit()        SIGCHLD / exit_signal
----------------------------------------------------------------------

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
v2 -> v3:

Addressed review comments from Oleg Nesterov, including:
o remove the restriction on signal for PR_SET_PREDUMP_SIG.
o code simplification

 arch/x86/kernel/signal_compat.c                  |   2 +-
 fs/coredump.c                                    |   6 +
 fs/exec.c                                        |   3 +
 include/linux/sched/signal.h                     |   4 +
 include/uapi/asm-generic/siginfo.h               |   3 +-
 include/uapi/linux/prctl.h                       |   4 +
 kernel/fork.c                                    |   3 +
 kernel/signal.c                                  |  31 +
 kernel/sys.c                                     |  13 ++
 tools/testing/selftests/prctl/Makefile           |   2 +-
 tools/testing/selftests/prctl/predump-sig-test.c | 169 +++
 11 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(NSIGSEGV != 7);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
-	BUILD_BUG_ON(NSIGCHLD != 6);
+	BUILD_BUG_ON(NSIGCHLD != 7);
 	BUILD_BUG_ON(NSIGSYS  != 1);
 
 	/* This is part of the ABI and can never change in size: */
diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..d6ca1a3 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	if (retval < 0)
 		goto fail_creds;
 
+	/*
+	 * Send the pre-coredump signal to the parent if requested.
+	 */
+	do_notify_parent_predump();
+	cond_resched();
+
 	old_cred = override_creds(cred);
 
 	ispipe = format_corename(&cn, &cprm);
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
 	/* we have changed execution domain */
 	tsk->exit_signal = SIGCHLD;
 
+	/* Clear the pre-coredump signal before loading a new binary */
+	sig->predump_signal = 0;
+
 #ifdef

Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.

2018-10-23 Thread Arun KS


On 2018-10-24 01:34, Kees Cook wrote:
> On Mon, Oct 22, 2018 at 10:11 PM, Konstantin Khlebnikov wrote:
> > On 23.10.2018 7:15, Joe Perches wrote:
> > > On Mon, 2018-10-22 at 22:53 +0530, Arun KS wrote:
> > > > Remove managed_page_count_lock spinlock and instead use atomic
> > > > variables.
> > >
> > > Perhaps better to define and use macros for the accesses
> > > instead of specific uses of atomic_long_
> > >
> > > Something like:
> > >
> > > #define totalram_pages()  (unsigned long)atomic_long_read(&_totalram_pages)
> >
> > or proper static inline
> > this code isn't so low level for breaking include dependencies with macro
>
> BTW, I noticed a few places in the patch that did multiple evaluations
> of totalram_pages. It might be worth fixing those prior to doing the
> conversion, too. e.g.:
>
> if (totalram_pages > something)
>    foobar(totalram_pages); <- value may have changed here
>
> should, instead, be:
>
> var = totalram_pages; <- get stable view of the value
> if (var > something)
>    foobar(var);

Thanks for reviewing. Point taken.

> -Kees
>
> > [dropped bloated cc - my server rejects this mess]
>
> Thank you -- I was struggling to figure out the best way to reply to
> this. :)

I'm sorry for the trouble caused. Sent the email using,
git send-email --to-cmd="scripts/get_maintainer.pl -i" \
    0001-convert-totalram_pages-totalhigh_pages-and-managed_p.patch

Is this not a recommended approach?

Regards,
Arun

> -Kees
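
The "stable view" advice in this thread can be sketched in userspace C11.
This is only an illustration under stated assumptions: <stdatomic.h>
stands in for the kernel's atomic_long_t, and total_pages,
total_pages_read(), total_pages_set() and pick_limit() are made-up names
that do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the kernel counter; other threads may update it at any time. */
static atomic_ulong total_pages;

/* Accessor in the spirit of the suggested static inline helper. */
static inline unsigned long total_pages_read(void)
{
	return atomic_load_explicit(&total_pages, memory_order_relaxed);
}

/* Writer side: publish a new page count. */
static inline void total_pages_set(unsigned long n)
{
	atomic_store_explicit(&total_pages, n, memory_order_relaxed);
}

/*
 * Return the page count to act on if it exceeds threshold, else 0.
 * The counter is read exactly once, so the value that passed the
 * comparison is the same value that is returned.
 */
static unsigned long pick_limit(unsigned long threshold)
{
	unsigned long snapshot = total_pages_read();

	if (snapshot > threshold)
		return snapshot;
	return 0;
}

/* Usage: with 10 pages recorded, only thresholds below 10 yield a value. */
static void demo(void)
{
	total_pages_set(10);
	assert(pick_limit(5) == 10);
	assert(pick_limit(10) == 0);
}
```

Reading the counter twice (once for the test, once for the call) would
compile just as well, but the two reads could then observe different
values if a writer runs in between.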

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-23 Thread Khalid Aziz

On Tue, 2018-10-23 at 10:30 -0700, Mike Kravetz wrote:
> . snip
> Here is updated patch without the drop_caches change and updated
> fixes tag.
> 
> From: Mike Kravetz 
> 
> hugetlbfs: dirty pages as they are added to pagecache
> 
> Some test systems were experiencing negative huge page reserve
> counts and incorrect file block counts.  This was traced to
> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> file pagecaches.  When non-hugetlbfs explicit code removes the
> pages, the appropriate accounting is not performed.
> 
> This can be recreated as follows:
>  fallocate -l 2M /dev/hugepages/foo
>  echo 1 > /proc/sys/vm/drop_caches
>  fallocate -l 2M /dev/hugepages/foo
>  grep -i huge /proc/meminfo
>    AnonHugePages:         0 kB
>    ShmemHugePages:        0 kB
>    HugePages_Total:    2048
>    HugePages_Free:     2047
>    HugePages_Rsvd:    18446744073709551615
>    HugePages_Surp:        0
>    Hugepagesize:       2048 kB
>    Hugetlb:         4194304 kB
>  ls -lsh /dev/hugepages/foo
>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> 
> To address this issue, dirty pages as they are added to pagecache.
> This can easily be reproduced with fallocate as shown above. Read
> faulted pages will eventually end up being marked dirty.  But there
> is a window where they are clean and could be impacted by code such
> as drop_caches.  So, just dirty them all as they are added to the
> pagecache.
> 
> Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Mike Kravetz 
> ---
>  mm/hugetlb.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5c390f5a5207..7b5c0ad9a6bd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3690,6 +3690,12 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  		return err;
>  	ClearPagePrivate(page);
>  
> +	/*
> +	 * set page dirty so that it will not be removed from cache/file
> +	 * by non-hugetlbfs specific code paths.
> +	 */
> +	set_page_dirty(page);
> +
>  	spin_lock(&inode->i_lock);
>  	inode->i_blocks += blocks_per_huge_page(h);
>  	spin_unlock(&inode->i_lock);

This looks good.

Reviewed-by: Khalid Aziz 

--
Khalid
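
As a side note on the reproduction quoted above: the HugePages_Rsvd value
18446744073709551615 is what an unsigned 64-bit counter reads after being
decremented below zero, which is consistent with the "negative reserve
count" described in the changelog. A minimal userspace sketch follows;
resv_dec() and resv_format() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decrement an unsigned 64-bit counter; at zero it wraps modulo 2^64. */
static uint64_t resv_dec(uint64_t resv)
{
	return resv - 1;
}

/* Print the counter as an unsigned decimal, as a meminfo-style field would. */
static int resv_format(char *buf, size_t len, uint64_t resv)
{
	return snprintf(buf, len, "%" PRIu64, resv);
}

/* Usage: one decrement too many turns 0 into the value seen in the report. */
static void demo(void)
{
	char buf[32];

	assert(resv_dec(0) == UINT64_MAX);
	resv_format(buf, sizeof(buf), resv_dec(0));
	assert(strcmp(buf, "18446744073709551615") == 0);
}
```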

[PATCH RFC v2 0/1] hugetlbfs: Use i_mmap_rwsem for pmd share and fault/trunc

2018-10-23 Thread Mike Kravetz

This patch addresses issues with page fault/truncation synchronization.
The first issue was noticed as negative hugetlb reserved page counts
during DB development testing.  Code inspection revealed that the most
likely cause was races between truncate and page faults.  In fact, I could
write a not too complicated program to cause the races and recreate the
issue.

A more dangerous issue exists when you introduce huge pmd sharing to
page fault/truncate races.  The first thing that happens in huge page
fault processing is a call to huge_pte_alloc to get a ptep.  Suppose
that ptep points to a shared pmd.  Now, another thread could perform
a truncate and unmap everyone mapping the file.  huge_pmd_unshare can
be called for the mapping on which the first thread is operating.
huge_pmd_unshare can clear the pud pointing to the pmd.  After this, the
ptep points to another task's page table or worse.  This leads to bad
things such as incorrect page map/reference counts or invalid memory
references.

Fix this all by modifying the usage of i_mmap_rwsem to cover
fault/truncate races as well as handling of shared pmds.
Mike Kravetz (1):
  hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync

 fs/hugetlbfs/inode.c | 21 ++
 mm/hugetlb.c         | 65 +---
 mm/rmap.c            | 10 +++
 mm/userfaultfd.c     | 11 ++--
 4 files changed, 84 insertions(+), 23 deletions(-)

-- 
2.17.2

Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT

2018-10-23 Thread Pavan Kondeti

Hi Vincent,

Thanks for the detailed explanation.

On Tue, Oct 23, 2018 at 02:15:08PM +0200, Vincent Guittot wrote:
> Hi Pavan,
> 
> On Tue, 23 Oct 2018 at 07:59, Pavan Kondeti  wrote:
> >
> > Hi Vincent,
> >
> > On Fri, Oct 19, 2018 at 06:17:51PM +0200, Vincent Guittot wrote:
> > >
> > >  /*
> > > + * The clock_pelt scales the time to reflect the effective amount of
> > > + * computation done during the running delta time but then sync back to
> > > + * clock_task when rq is idle.
> > > + *
> > > + *
> > > > + * absolute time   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> > > > + * @ max capacity  ------******---------------******---------------
> > > > + * @ half capacity ------************---------************---------
> > > > + * clock pelt      | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16
> > > > + *
> > > > + */
> > > > +void update_rq_clock_pelt(struct rq *rq, s64 delta)
> > > > +{
> > > > +
> > > > +	if (is_idle_task(rq->curr)) {
> > > > +		u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> > > > +		u32 overload = rq->cfs.avg.util_sum + LOAD_AVG_MAX;
> > > > +		overload += rq->avg_rt.util_sum;
> > > > +		overload += rq->avg_dl.util_sum;
> > > > +
> > > > +		/*
> > > > +		 * Reflecting some stolen time makes sense only if the idle
> > > > +		 * phase would be present at max capacity. As soon as the
> > > > +		 * utilization of a rq has reached the maximum value, it is
> > > > +		 * considered as an always running rq without idle time to
> > > > +		 * steal. This potential idle time is considered as lost in
> > > > +		 * this case. We keep track of this lost idle time compare to
> > > > +		 * rq's clock_task.
> > > > +		 */
> > > > +		if (overload >= divider)
> > > > +			rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
> > > +
> >
> > I am trying to understand this better. I believe we run into this scenario
> > when the frequency is limited due to thermal/userspace constraints. Let's say
>
> Yes these are the most common UCs but this can also happen after tasks
> migration or with a cpufreq governor that doesn't increase OPP fast
> enough for current utilization.
>
> > frequency is limited to Fmax/2. A 50% task at Fmax, becomes 100% running at
> > Fmax/2. The utilization is built up to 100% after several periods.
> > The clock_pelt runs at 1/2 speed of the clock_task. We are losing the idle
> > time all along. What happens when the CPU enters idle for a short duration
> > and comes back to run this 100% utilization task?
>
> If you are at 100%, we only apply the short idle duration
>
> >
> > If the above block is not present i.e lost_idle_time is not tracked, we
> > stretch the idle time (since clock_pelt is synced to clock_task) and the
> > utilization is dropped. Right?
>
> yes that's what would happen. I give more details below
>
> >
> > With the above block, we don't stretch the idle time. In fact we don't
> > consider the idle time at all. Because,
> >
> > idle_time = now - last_time;
> >
> > idle_time = (rq->clock_pelt - rq->lost_idle_time) - last_time
> > idle_time = (rq->clock_task - rq_clock_task + rq->clock_pelt_old) - last_time
> > idle_time = rq->clock_pelt_old - last_time
> >
> > The last time is nothing but the last snapshot of the rq->clock_pelt when
> > the task entered sleep due to which CPU entered idle.
> 
> The condition for dropping this idle time is quite important. This
> only happens when the utilization reaches max compute capacity of the
> CPU. Otherwise, the idle time will be fully applied

Right.

rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt

This not only tracks the lost idle time due to running slow but also the
absolute/real sleep time. For example, when the slow running 100% task
sleeps for 100 msec, aren't we ignoring the 100 msec sleep there?

For example, a task runs for 323 msec at full capacity and sleeps for
(1000-323) msec. When it wakes up, the utilization has dropped. If the same
task runs for 626 msec at half capacity and sleeps for (1000-626) msec,
shouldn't we drop the utilization by taking the (1000-626) msec sleep time
into account? I understand why we don't stretch the idle time to (1000-323)
msec, but it is not clear to me why we completely drop the idle time.
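
The idle_time algebra discussed in this thread can be checked with a toy
model. This is a hedged sketch, not the scheduler code: struct toy_rq and
the toy_* helpers are invented for the example, the PELT decay itself is
not modelled, and only the single step of entering idle is covered. It
assumes, as the thread describes, that clock_pelt advances scaled by
capacity while running, is synced to clock_task when the rq goes idle, and
that the PELT time base is clock_pelt minus lost_idle_time.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CAPACITY_SHIFT 10	/* full capacity == 1024 */

struct toy_rq {
	uint64_t clock_task;		/* unscaled task clock */
	uint64_t clock_pelt;		/* capacity-scaled clock, synced when idle */
	uint64_t lost_idle_time;	/* idle time dropped once utilization saturates */
};

/* Run for delta time units at the given capacity (0..1024). */
static void toy_run(struct toy_rq *rq, uint64_t delta, uint64_t capacity)
{
	rq->clock_task += delta;
	rq->clock_pelt += (delta * capacity) >> TOY_CAPACITY_SHIFT;
}

/* Enter idle: record lost idle time if saturated, then sync the clocks. */
static void toy_enter_idle(struct toy_rq *rq, int saturated)
{
	if (saturated)
		rq->lost_idle_time += rq->clock_task - rq->clock_pelt;
	rq->clock_pelt = rq->clock_task;
}

/* Time base seen by the averages in this model. */
static uint64_t toy_pelt_now(const struct toy_rq *rq)
{
	return rq->clock_pelt - rq->lost_idle_time;
}

/* Usage: 626 units at half capacity, then idle, with and without saturation. */
static void demo(void)
{
	struct toy_rq sat = {0}, unsat = {0};
	uint64_t last_time;

	toy_run(&sat, 626, 512);
	last_time = sat.clock_pelt;	/* clock_pelt_old: 313 */
	toy_enter_idle(&sat, 1);
	/* Saturated: idle_time = clock_pelt_old - last_time = 0 */
	assert(toy_pelt_now(&sat) - last_time == 0);

	toy_run(&unsat, 626, 512);
	last_time = unsat.clock_pelt;
	toy_enter_idle(&unsat, 0);
	/* Not saturated: the 313 units of stolen time show up as idle time */
	assert(toy_pelt_now(&unsat) - last_time == 313);
}
```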

> 
> >
> > Can you please explain the significance of the above block with an example?
> 
> The pelt signal reaches its max value after 323ms at full capacity,
> which means that we can't make any difference between tasks running
> 323ms, 500ms or more at max capacity. As a result, we consider that
> the CPU is fully used and there is no idle time when the utilization
> equals max capacity. If CPU runs at half the capacity, it will run
> 626ms before reaching max utilization and at that time we will stop to
> stretch the idle time because we consider that there is no idle


[PATCH RFC v2 1/1] hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync

2018-10-23 Thread Mike Kravetz

hugetlbfs does not correctly handle page faults racing with truncation.
In addition, shared pmds can cause additional issues.

Without pmd sharing, issues can occur as follows:
  A hugetlbfs file is mmap(MAP_SHARED) with a size of 4 pages.  At
  mmap time, 4 huge pages are reserved for the file/mapping.  So,
  the global reserve count is 4.  In addition, since this is a shared
  mapping an entry for 4 pages is added to the file's reserve map.
  The first 3 of the 4 pages are faulted into the file.  As a result,
  the global reserve count is now 1.

  Task A starts to fault in the last page (routines hugetlb_fault,
  hugetlb_no_page).  It allocates a huge page (alloc_huge_page).
  The reserve map indicates there is a reserved page, so this is
  used and the global reserve count goes to 0.

  Now, task B truncates the file to size 0.  It starts by setting
  inode size to 0 (hugetlb_vmtruncate).  It then unmaps all mappings
  of the file (hugetlb_vmdelete_list).  Since task A's page table
  lock is not held at the time, truncation is not blocked.  Truncation
  removes the 3 pages from the file (remove_inode_hugepages).  When
  cleaning up the reserved pages (hugetlb_unreserve_pages), it notices
  the reserve map was for 4 pages.  However, it has only freed 3 pages.
  So it assumes there is still (4 - 3) 1 reserved page.  It then
  decrements the global reserve count by 1 and it goes negative.

  Task A then continues the page fault process and adds its newly
  acquired page to the page cache.  Note that the index of this page
  is beyond the size of the truncated file (0).  The page fault process
  then notices the file has been truncated and exits.  However, the
  page is left in the cache associated with the file.

  Now, if the file is immediately deleted the truncate code runs again.
  It will find and free the one page associated with the file.  When
  cleaning up reserves, it notices the reserve map is empty.  Yet, one
  page was freed.  So, the global reserve count is decremented by (0 - 1) -1.
  This returns the global count to 0 as it should be.  But, it is
  possible for someone else to mmap this file/range before it is deleted.
  If this happens, a reserve map entry for the allocated page is created
  and the reserved page is forever leaked.
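The counter arithmetic in the scenario above can be replayed with a small user-space model. This is only a sketch under stated assumptions: the struct and every toy_* function are hypothetical names invented for illustration, and the model tracks nothing but the three numbers the description mentions (global reserve count, reserve map size, pages in the page cache). It is not the hugetlbfs implementation.

```c
#include <assert.h>

/* Toy model of the counters in the scenario; not kernel code. */
struct toy_hugetlb {
	long global_resv;	/* global reserve count */
	long map_resv;		/* pages recorded in the file's reserve map */
	long in_cache;		/* pages currently in the file's page cache */
};

/* mmap(MAP_SHARED) of npages: reserve globally and in the reserve map. */
static void toy_mmap(struct toy_hugetlb *t, long npages)
{
	t->global_resv += npages;
	t->map_resv += npages;
}

/* First half of a fault: allocating the page consumes one reservation. */
static void toy_fault_alloc(struct toy_hugetlb *t)
{
	t->global_resv -= 1;
}

/* Second half of a fault: the page is added to the page cache. */
static void toy_fault_add_to_cache(struct toy_hugetlb *t)
{
	t->in_cache += 1;
}

/* Truncate to 0: free cached pages, then subtract (map size - freed). */
static void toy_truncate(struct toy_hugetlb *t)
{
	long freed = t->in_cache;

	t->in_cache = 0;
	t->global_resv -= t->map_resv - freed;
	t->map_resv = 0;
}

/* Replays the interleaving and records the global count after each step. */
void toy_replay(long out[4])
{
	struct toy_hugetlb t = { 0, 0, 0 };
	int i;

	toy_mmap(&t, 4);
	for (i = 0; i < 3; i++) {	/* first 3 pages fault in completely */
		toy_fault_alloc(&t);
		toy_fault_add_to_cache(&t);
	}
	out[0] = t.global_resv;

	toy_fault_alloc(&t);		/* task A allocates the last page */
	out[1] = t.global_resv;

	toy_truncate(&t);		/* task B truncates: frees 3, map says 4 */
	out[2] = t.global_resv;

	toy_fault_add_to_cache(&t);	/* task A resumes and caches its page */
	toy_truncate(&t);		/* file deleted: frees 1, map is empty */
	out[3] = t.global_resv;
}
```

Calling toy_replay() fills the array with the count after the first three faults, after task A's allocation, after task B's truncate, and after the final delete, which makes the transient negative value easy to see.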

With pmd sharing, the situation is even worse.  Consider the following:
  A task processes a page fault on a shared hugetlbfs file and calls
  huge_pte_alloc to get a ptep.  Suppose the returned ptep points to a
  shared pmd.

  Now, another task truncates the hugetlbfs file.  As part of truncation,
  it unmaps everyone who has the file mapped.  If a task has a shared pmd
  in this range, huge_pmd_unshare will be called.  If this is not the last
  user sharing the pmd, huge_pmd_unshare will clear the pud pointing to the
  pmd.  For the task in the middle of the page fault, the ptep returned by
  huge_pte_alloc points to another task's page table or worse.  This leads
  to bad things such as incorrect page map/reference counts or invalid
  memory references.

i_mmap_rwsem is currently used for pmd sharing synchronization.  It is also
held during unmap and whenever a call to huge_pmd_unshare is possible.  It
is only acquired in write mode.  Expand and modify the use of i_mmap_rwsem
as follows:
- i_mmap_rwsem is held in write mode for the duration of truncate
  processing.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
  Today that is only via huge_pte_alloc.
- i_mmap_rwsem is held in read mode after huge_pte_alloc, until the caller
  is finished with the returned ptep.
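The effect of these rules can be illustrated with a single-threaded state model of a reader/writer semaphore. This is a sketch, not kernel code: struct toy_rwsem and the toy_* helpers are hypothetical, and the model only captures the rule that write mode is exclusive while read mode can be shared. It shows why a truncate (write mode) cannot slip in between huge_pte_alloc and the last use of the returned ptep (read mode).

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded toy model of a reader/writer semaphore's state. */
struct toy_rwsem {
	int readers;	/* number of read-mode holders */
	bool writer;	/* true while held in write mode */
};

/* Read mode is granted unless a writer holds the semaphore. */
bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

void toy_up_read(struct toy_rwsem *s)
{
	s->readers--;
}

/* Write mode needs exclusive ownership: no readers and no writer. */
bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

void toy_up_write(struct toy_rwsem *s)
{
	s->writer = false;
}

/*
 * The fault path holds read mode from before "huge_pte_alloc" until it
 * is done with the ptep.  Returns true if a truncate attempted in the
 * middle of that window could acquire write mode.
 */
bool toy_truncate_can_interrupt_fault(void)
{
	struct toy_rwsem sem = { 0, false };
	bool truncate_ran;

	toy_down_read_trylock(&sem);			/* fault begins */
	truncate_ran = toy_down_write_trylock(&sem);	/* truncate tries */
	if (truncate_ran)
		toy_up_write(&sem);
	toy_up_read(&sem);				/* fault is finished */
	return truncate_ran;
}
```

In the real kernel the truncating task would sleep until the reader drops the semaphore rather than fail; the trylock form is used here only so the outcome can be observed without threads.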

Signed-off-by: Mike Kravetz 
---
 fs/hugetlbfs/inode.c | 21 ++
 mm/hugetlb.c | 65 +---
 mm/rmap.c| 10 +++
 mm/userfaultfd.c | 11 ++--
 4 files changed, 84 insertions(+), 23 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 32920a10100e..6ee97622a231 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -426,10 +426,16 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			u32 hash;
 
 			index = page->index;
-			hash = hugetlb_fault_mutex_hash(h, current->mm,
+			/*
+			 * No need to take fault mutex for truncation as we
+			 * are synchronized via i_mmap_rwsem.
+			 */
+			if (!truncate_op) {
+				hash = hugetlb_fault_mutex_hash(h, current->mm,
 							&pseudo_vma,
 							mapping, index, 0);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
+				mutex_lock(&hugetlb_fault_mutex_table[hash]);
+			}
 
 			/*

Re: [PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks()

2018-10-23 Thread Sergey Senozhatsky

On (10/24/18 13:30), Sergey Senozhatsky wrote:
> -  * OK, the message is on the console.  Now we call printk()
> -  * without oops_in_progress set so that printk will give klogd
> -  * a poke.  Hold onto your hats...
> -  */
> - console_loglevel = 15;
> - printk(" ");
>   console_loglevel = loglevel_save;
> +
> + oops_in_progress = 0;
> + wake_up_klogd();

D'oh... Fat fingers!
I noticed that I have removed "console_loglevel = 15".
Sorry about that.

From: Sergey Senozhatsky 
Subject: [PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks()

printk() without oops_in_progress set is potentially dangerous.
It will attempt to call into console driver, so if oops happened
while console driver port->lock spin_lock was locked on the same
CPU (NMI oops or oops from console driver), then re-entering
console driver from bust_spinlocks() will deadlock the system.

Some serial drivers are re-entrant from oops path:

static void serial_console_write(struct console *co, const char *s,
				 unsigned count)
{
	...
	if (port->sysrq)
		locked = 0;
	else if (oops_in_progress)
		locked = spin_trylock_irqsave(&port->lock, flags);
	else
		spin_lock_irqsave(&port->lock, flags);
	...

	uart_console_write(port, s, count, serial_console_putchar);
	...
	if (locked)
		spin_unlock_irqrestore(&port->lock, flags);
}

So it's OK to call printk() or console_unblank() and re-enter
serial console drivers when oops_in_progress is set. But once we
clear oops_in_progress serial consoles become non-reentrant.

From the comment it seems that s390 wants to just poke klogd.
There is wake_up_klogd() for this purpose, so we can replace
that printk(" ").
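The difference between the two locking branches can be made explicit with a tiny decision model. This is a sketch under assumptions: enum toy_result and toy_console_write() are hypothetical names, the sysrq branch is left out, and "deadlock" is represented as a return value instead of an actual spin. It only encodes what the driver excerpt above does depending on oops_in_progress.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_result {
	TOY_WROTE_LOCKED,	/* took the lock, wrote, released it */
	TOY_WROTE_UNLOCKED,	/* trylock failed, wrote without the lock */
	TOY_DEADLOCK		/* would spin forever on a lock this CPU holds */
};

/*
 * lock_held_by_this_cpu: the oops interrupted code on this CPU that
 * already owns the port lock (NMI oops or oops inside the driver).
 */
enum toy_result toy_console_write(bool lock_held_by_this_cpu,
				  bool oops_in_progress)
{
	if (oops_in_progress) {
		/* trylock path: failure is tolerated, output still happens */
		return lock_held_by_this_cpu ? TOY_WROTE_UNLOCKED
					     : TOY_WROTE_LOCKED;
	}

	/* plain lock path: waits for an owner that can never run again */
	return lock_held_by_this_cpu ? TOY_DEADLOCK : TOY_WROTE_LOCKED;
}
```

The only combination that ends in TOY_DEADLOCK is the one the old printk(" ") could hit: the lock is already held on this CPU and oops_in_progress has been cleared before the console is re-entered.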

Signed-off-by: Sergey Senozhatsky 
---
 arch/s390/mm/fault.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 2b8f32f56e0c..53915c61ad95 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -92,16 +92,12 @@ void bust_spinlocks(int yes)
oops_in_progress = 1;
} else {
int loglevel_save = console_loglevel;
-   console_unblank();
-   oops_in_progress = 0;
-   /*
-* OK, the message is on the console.  Now we call printk()
-* without oops_in_progress set so that printk will give klogd
-* a poke.  Hold onto your hats...
-*/
+
console_loglevel = 15;
-   printk(" ");
+   console_unblank();
console_loglevel = loglevel_save;
+   oops_in_progress = 0;
+   wake_up_klogd();
}
 }

-- 
2.19.1

[PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks()

2018-10-23 Thread Sergey Senozhatsky

printk() without oops_in_progress set is potentially dangerous.
It will attempt to call into console driver, so if oops happened
while console driver port->lock spin_lock was locked on the same
CPU (NMI oops or oops from console driver), then re-entering
console driver from bust_spinlocks() will deadlock the system.

Some serial drivers are re-entrant from oops path:

static void serial_console_write(struct console *co, const char *s,
				 unsigned count)
{
	...
	if (port->sysrq)
		locked = 0;
	else if (oops_in_progress)
		locked = spin_trylock_irqsave(&port->lock, flags);
	else
		spin_lock_irqsave(&port->lock, flags);
	...

	uart_console_write(port, s, count, serial_console_putchar);
	...
	if (locked)
		spin_unlock_irqrestore(&port->lock, flags);
}

So it's OK to call printk() or console_unblank() and re-enter
serial console drivers when oops_in_progress is set. But once we
clear oops_in_progress serial consoles become non-reentrant.

From the comment it seems that s390 wants to just poke klogd.
There is wake_up_klogd() for this purpose, so we can replace
that printk(" ").

Signed-off-by: Sergey Senozhatsky 
---
 arch/s390/mm/fault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 2b8f32f56e0c..244993dc3c70 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -93,15 +93,10 @@ void bust_spinlocks(int yes)
} else {
int loglevel_save = console_loglevel;
console_unblank();
-   oops_in_progress = 0;
-   /*
-* OK, the message is on the console.  Now we call printk()
-* without oops_in_progress set so that printk will give klogd
-* a poke.  Hold onto your hats...
-*/
-   console_loglevel = 15;
-   printk(" ");
console_loglevel = loglevel_save;
+
+   oops_in_progress = 0;
+   wake_up_klogd();
}
 }
 
-- 
2.19.1

Re: [PATCH] scsi: 3w-{sas,9xxx}: Use unsigned char for cdb

2018-10-23 Thread Martin K. Petersen



Nathan,

> Clang warns a few times:
>
> drivers/scsi/3w-sas.c:386:11: warning: implicit conversion from 'int' to
> 'char' changes value from 128 to -128 [-Wconstant-conversion]
> cdb[4] = TW_ALLOCATION_LENGTH; /* allocation length */
>~ ^~~~
>
> Update cdb's type to unsigned char, which matches the type of the cdb
> member in struct TW_Command_Apache.

Applied to 4.20/scsi-queue. Thank you.
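For readers unfamiliar with the warning quoted above, a small stand-alone sketch shows why the element type matters. Assumptions: TOY_ALLOCATION_LENGTH is a stand-in defined here with the value 128 taken from the warning text, the toy_* functions are hypothetical, and signed char is used explicitly because the signedness of plain char depends on the platform.

```c
#include <assert.h>

/* Stand-in for the constant in the warning; 128 is taken from its text. */
#define TOY_ALLOCATION_LENGTH 128

/* Store the constant in a signed 8-bit slot and read it back as int. */
int toy_store_signed(void)
{
	signed char cdb[16] = { 0 };

	/* 128 does not fit in -128..127; the result is implementation-defined */
	cdb[4] = (signed char)TOY_ALLOCATION_LENGTH;
	return cdb[4];
}

/* Store the constant in an unsigned 8-bit slot and read it back as int. */
int toy_store_unsigned(void)
{
	unsigned char cdb[16] = { 0 };

	cdb[4] = TOY_ALLOCATION_LENGTH;	/* 128 fits in 0..255 */
	return cdb[4];
}
```

With GCC and Clang the signed variant typically reads back as -128, matching the warning, while the unsigned variant preserves 128.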

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH V12 00/14] Krait clocks + Krait CPUfreq

2018-10-23 Thread Sricharan R

Hi Niklas,

On 10/22/2018 9:00 PM, Niklas Cassel wrote:
> On Mon, Oct 22, 2018 at 09:39:03AM +0530, Sricharan R wrote:
>> Hi Stephen,
>>
>> On 10/18/2018 1:46 AM, Stephen Boyd wrote:
>>> Quoting Stephen Boyd (2018-10-17 08:44:12)
>>>> Quoting Sricharan R (2018-09-20 06:03:31)
>>>>>
>>>>>
>>>>> On 9/20/2018 1:54 AM, Craig wrote:
>>>>>> Yup, this patch seems to have fixed the higher frequencies from the quick test I did.
>>>>>>
>>>>>   Thanks !!. Can i take that as 
>>>>>   Tested-by: Craig Tatlor   ?
>>>>>
>>>>
>>>> Is this patch series going to be resent?
>>>>
>>>
>>> Nevermind. Looking at it I think I can apply all the clk ones and we're
>>> good to go. If you can send a followup patch series to change the
>>> registration and provider APIs to be clk_hw instead of clk based I would
>>> appreciate it.
>>>
>>
>> Sorry for the late response. Was away.
>> Only pending thing was separating out the binding documentation for the cpu-freq
>> driver and fixing the text in documentation.  That means, yes it's fine to merge
>> the clk ones as you said. I will resend that. Also, will send a follow up series for clk_hw to
>> clk change as you mentioned separately.
> 
> Hello Sricharan,
> 
> Great to see that the clk parts have been merged to clk-next!
> 
> Are you also planning on sending out a new version of the cpufreq driver
> consolidation parts?
> 
   yeah right, will send a new version, sometime next week.

> I'm planning on extending your consolidated cpufreq driver with support
> for msm8916 (Cortex-A53), where I plan to read PVS/speedbin, in order to
> set opp_supported_hw(), and also register with cpufreq (since Viresh/Ulf
> suggested that we shouldn't register with cpufreq in the CPR power-domain
> driver).

   ok sure.

Regards,
 Sricharan

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation

[PATCH v3 3/5] Creates macro to avoid variable shadowing

2018-10-23 Thread Leonardo Bras

Creates DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADDR
macro, allowing usage of a variable name other than the struct element name.
Also, sets DEF_FIELD_ADDR as a specific usage of DEF_FIELD_ADDR_VAR in which
the var name is the same as the struct element name.
Then, makes use of DEF_FIELD_ADDR_VAR to create a variable of another name,
in order to avoid variable shadowing.
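The idea can be tried outside the kernel tree with a reduced version of the two macros. This is a sketch under assumptions: struct toy_id, the TOY_* macros and toy_count_dups() are hypothetical, offsetof() stands in for the generated OFF_* constants, and __typeof__ is the GCC/Clang spelling of typeof.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry standing in for a real device-id struct. */
struct toy_id {
	char name[8];
	int devs[4];
};

/* Generic form: the caller picks the variable name v. */
#define TOY_FIELD_ADDR_VAR(m, type, f, v) \
	__typeof__(((struct type *)0)->f) *v = \
		(void *)((char *)(m) + offsetof(struct type, f))

/* Specific form: the variable is named after the field. */
#define TOY_FIELD_ADDR(m, type, f) TOY_FIELD_ADDR_VAR(m, type, f, f)

/* Counts earlier entries whose first dev equals that of entry i. */
int toy_count_dups(struct toy_id *tbl, int i)
{
	int dups = 0;
	int i2;
	TOY_FIELD_ADDR(&tbl[i], toy_id, devs);		/* int (*devs)[4] */

	for (i2 = 0; i2 < i; i2++) {
		/* A second TOY_FIELD_ADDR here would shadow the outer devs. */
		TOY_FIELD_ADDR_VAR(&tbl[i2], toy_id, devs, devs_dup);

		if ((*devs)[0] == (*devs_dup)[0])
			dups++;
	}
	return dups;
}
```

Because the inner pointer gets its own name, both the outer entry and the candidate duplicate stay visible inside the loop and -Wshadow has nothing to report.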

Signed-off-by: Leonardo Bras 
---
 scripts/mod/file2alias.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 7be43697ff84..ed468313ddeb 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -95,12 +95,20 @@ extern struct devtable *__start___devtable[], *__stop___devtable[];
  */
 #define DEF_FIELD(m, devid, f) \
 	typeof(((struct devid *)0)->f) f = TO_NATIVE(*(typeof(f) *)((m) + OFF_##devid##_##f))
+
+/* Define a variable v that holds the address of field f of struct devid
+ * based at address m.  Due to the way typeof works, for a field of type
+ * T[N] the variable has type T(*)[N], _not_ T*.
+ */
+#define DEF_FIELD_ADDR_VAR(m, devid, f, v) \
+   typeof(((struct devid *)0)->f) *v = ((m) + OFF_##devid##_##f)
+
 /* Define a variable f that holds the address of field f of struct devid
  * based at address m.  Due to the way typeof works, for a field of type
  * T[N] the variable has type T(*)[N], _not_ T*.
  */
 #define DEF_FIELD_ADDR(m, devid, f) \
-   typeof(((struct devid *)0)->f) *f = ((m) + OFF_##devid##_##f)
+   DEF_FIELD_ADDR_VAR(m, devid, f, f)
 
 /* Add a table entry.  We test function type matches while we're here. */
 #define ADD_TO_DEVTABLE(device_id, type, function) \
@@ -644,7 +652,7 @@ static void do_pnp_card_entries(void *symval, unsigned long size,
 
for (i = 0; i < count; i++) {
unsigned int j;
-   DEF_FIELD_ADDR(symval + i*id_size, pnp_card_device_id, devs);
+   DEF_FIELD_ADDR(symval + i * id_size, pnp_card_device_id, devs);
 
for (j = 0; j < PNP_MAX_DEVICES; j++) {
const char *id = (char *)(*devs)[j].id;
@@ -656,10 +664,13 @@ static void do_pnp_card_entries(void *symval, unsigned long size,
 
/* find duplicate, already added value */
for (i2 = 0; i2 < i && !dup; i2++) {
-   DEF_FIELD_ADDR(symval + i2*id_size, pnp_card_device_id, devs);
+   DEF_FIELD_ADDR_VAR(symval + i2 * id_size,
+  pnp_card_device_id,
+  devs, devs_dup);
 
for (j2 = 0; j2 < PNP_MAX_DEVICES; j2++) {
-   const char *id2 = (char *)(*devs)[j2].id;
+   const char *id2 =
+   (char *)(*devs_dup)[j2].id;
 
if (!id2[0])
break;
-- 
2.19.1

[PATCH v3 5/5] Adds -Wshadow on KBUILD_HOSTCFLAGS

2018-10-23 Thread Leonardo Bras

Adds -Wshadow on KBUILD_HOSTCFLAGS to show shadow warnings
on tools built for HOST.

Signed-off-by: Leonardo Bras 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e8b599b4dcde..3edae5d359b5 100644
--- a/Makefile
+++ b/Makefile
@@ -360,7 +360,7 @@ HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null)
 
 HOSTCC   = gcc
 HOSTCXX  = g++
-KBUILD_HOSTCFLAGS   := -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 \
+KBUILD_HOSTCFLAGS   := -Wall -Wshadow -Wmissing-prototypes -Wstrict-prototypes -O2 \
-fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS) \
$(HOSTCFLAGS)
 KBUILD_HOSTCXXFLAGS := -O2 $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS)
-- 
2.19.1

[PATCH v3 0/5] Adds -Wshadow on KBUILD_HOSTCFLAGS and fix warnings

2018-10-23 Thread Leonardo Bras

This patchset adds -Wshadow on KBUILD_HOSTCFLAGS and fixes
all code that shows this warning.

Changes in v3:
- Better Cover letter
- Better commit message for patch 1/5.
- Fixes what should change on patch 3/5
- Removes accent of my second name (better for searching at lkml.org)


v2: https://lkml.org/lkml/2018/10/23/151
v1: https://lkml.org/lkml/2018/10/17/169

Leonardo Bras (5):
  x86/vdso: Renames variable to fix shadow warning.
  kbuild: Removes unnecessary shadowed local variable.
  Creates macro to avoid variable shadowing
  modpost: Changes parameter name to avoid shadowing.
  Adds -Wshadow on KBUILD_HOSTCFLAGS

 Makefile |  2 +-
 arch/x86/entry/vdso/vdso2c.h | 13 +++--
 scripts/asn1_compiler.c  |  2 +-
 scripts/mod/file2alias.c | 19 +++
 scripts/mod/modpost.c|  4 ++--
 5 files changed, 26 insertions(+), 14 deletions(-)

-- 
2.19.1

[PATCH v3 1/5] x86/vdso: Renames variable to fix shadow warning.

2018-10-23 Thread Leonardo Bras

The go32() and go64() functions have an argument and a local variable
called ‘name’.  Rename both to clarify the code and to fix a warning
with -Wshadow.
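A reduced illustration of the rename, as a sketch only: toy_count_required() and its parameters are hypothetical and merely mirror the shape of the real function, where an outer parameter is still needed after a loop that declares its own per-symbol string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts how many entries of syms[] equal 'required'.  Returns -1 when
 * no image name was given, to show that the parameter is used again
 * after the loop.
 */
int toy_count_required(const char *image_name, const char *const syms[],
		       size_t n, const char *required)
{
	int hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/*
		 * If this local shared its name with the parameter, it would
		 * hide the parameter inside the loop body and -Wshadow
		 * would warn.  Distinct names keep both readable.
		 */
		const char *sym_name = syms[i];

		if (!strcmp(sym_name, required))
			hits++;
	}

	return image_name ? hits : -1;
}
```

Building a file like this with gcc -Wshadow stays silent; renaming sym_name back to image_name reproduces the kind of warning the patch removes.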

Signed-off-by: Leonardo Bras 
---
 arch/x86/entry/vdso/vdso2c.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index fa847a620f40..a20b134de2a8 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -7,7 +7,7 @@
 
 static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 void *stripped_addr, size_t stripped_len,
-FILE *outfile, const char *name)
+FILE *outfile, const char *image_name)
 {
int found_load = 0;
unsigned long load_size = -1;  /* Work around bogus warning */
@@ -93,11 +93,12 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
int k;
ELF(Sym) *sym = raw_addr + GET_LE(&symtab_hdr->sh_offset) +
GET_LE(&symtab_hdr->sh_entsize) * i;
-   const char *name = raw_addr + GET_LE(&strtab_hdr->sh_offset) +
-   GET_LE(&sym->st_name);
+   const char *sym_name = raw_addr +
+  GET_LE(&strtab_hdr->sh_offset) +
+  GET_LE(&sym->st_name);
 
for (k = 0; k < NSYMS; k++) {
-   if (!strcmp(name, required_syms[k].name)) {
+   if (!strcmp(sym_name, required_syms[k].name)) {
if (syms[k]) {
fail("duplicate symbol %s\n",
 required_syms[k].name);
@@ -134,7 +135,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
if (syms[sym_vvar_start] % 4096)
fail("vvar_begin must be a multiple of 4096\n");
 
-   if (!name) {
+   if (!image_name) {
fwrite(stripped_addr, stripped_len, 1, outfile);
return;
}
@@ -157,7 +158,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
}
fprintf(outfile, "\n};\n\n");
 
-   fprintf(outfile, "const struct vdso_image %s = {\n", name);
+   fprintf(outfile, "const struct vdso_image %s = {\n", image_name);
fprintf(outfile, "\t.data = raw_data,\n");
fprintf(outfile, "\t.size = %lu,\n", mapping_size);
if (alt_sec) {
-- 
2.19.1

[PATCH v3 4/5] modpost: Changes parameter name to avoid shadowing.

2018-10-23 Thread Leonardo Bras

Rename the 'modules' parameter of add_depends() to 'module_list' so that
it no longer shadows the file-scope 'modules' variable.

Signed-off-by: Leonardo Bras 
---
 scripts/mod/modpost.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 0d998c54564d..368fe42340df 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2228,13 +2228,13 @@ static int add_versions(struct buffer *b, struct module *mod)
 }
 
 static void add_depends(struct buffer *b, struct module *mod,
-   struct module *modules)
+   struct module *module_list)
 {
struct symbol *s;
struct module *m;
int first = 1;
 
-   for (m = modules; m; m = m->next)
+   for (m = module_list; m; m = m->next)
m->seen = is_vmlinux(m->name);
 
buf_printf(b, "\n");
-- 
2.19.1

[PATCH v3 2/5] kbuild: Removes unnecessary shadowed local variable.

2018-10-23 Thread Leonardo Bras

Remove an unnecessary local variable (start) that shadows an outer
variable of the same name. It was used only once, and it held the same
value that the outer 'start' is assigned just before the if block.

Signed-off-by: Leonardo Bras 
---
 scripts/asn1_compiler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
index c146020fc783..1b28787028d3 100644
--- a/scripts/asn1_compiler.c
+++ b/scripts/asn1_compiler.c
@@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end)
 
/* Handle string tokens */
if (isalpha(*p)) {
-   const char **dir, *start = p;
+   const char **dir;
 
/* Can be a directive, type name or element
 * name.  Find the end of the name.
-- 
2.19.1

[PATCH] rpmsg: virtio_rpmsg_bus: replace "%p" with "%pK"

2018-10-23 Thread Suman Anna

The virtio_rpmsg_bus driver uses the "%p" format specifier for
printing the vring buffer address. This prints only a hashed
pointer, even for privileged users. Use "%pK" instead so that
the address can be printed while debugging, as controlled by the
kptr_restrict sysctl.

Signed-off-by: Suman Anna 
---
 drivers/rpmsg/virtio_rpmsg_bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index f29dee731026..1345f373a1a0 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -950,7 +950,7 @@ static int rpmsg_probe(struct virtio_device *vdev)
goto vqs_del;
}
 
-   dev_dbg(&vdev->dev, "buffers: va %p, dma %pad\n",
+   dev_dbg(&vdev->dev, "buffers: va %pK, dma %pad\n",
bufs_va, &vrp->bufs_dma);
 
/* half of the buffers is dedicated for RX */
-- 
2.19.1

Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with specific dma memory pool

2018-10-23 Thread Suman Anna

On 10/23/18 8:22 PM, Suman Anna wrote:
> On 9/27/18 3:18 PM, Wendy Liang wrote:
>> Hi Loic,
>>
>>
>> On Thu, Sep 27, 2018 at 12:22 PM Loic PALLARDY  wrote:
>>>
>>> Hi Wendy
>>>
>>>> -----Original Message-----
>>>> From: Wendy Liang
>>>> Sent: Thursday, September 27, 2018 7:17 PM
>>>> To: Loic PALLARDY
>>>> Cc: Bjorn Andersson ; Ohad Ben-Cohen
>>>> ; linux-remotep...@vger.kernel.org; Linux Kernel
>>>> Mailing List ; Arnaud POULIQUEN
>>>> ; benjamin.gaign...@linaro.org; Suman Anna
>>>> Subject: Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with
>>>> specific dma memory pool
>>>>
>>>> On Fri, Jul 27, 2018 at 6:16 AM Loic Pallardy  wrote:
>>>>>
>>>>> This patch creates a dedicated vdev subdevice for each vdev declared
>>>>> in firmware resource table and associates carveout named "vdev%dbuffer"
>>>>> (with %d vdev index in resource table) if any as dma coherent memory pool.
>>>>>
>>>>> Then vdev subdevice is used as parent for virtio device.
>>>>>
>>>>> Signed-off-by: Loic Pallardy
>>>>> ---
>>>>>  drivers/remoteproc/remoteproc_core.c | 35 +++---
>>>>>  drivers/remoteproc/remoteproc_internal.h |  1 +
>>>>>  drivers/remoteproc/remoteproc_virtio.c   | 42 +++-
>>>>>  include/linux/remoteproc.h   |  1 +
>>>>>  4 files changed, 75 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>>> index 4edc6f0..adcc66e 100644
>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>> @@ -39,6 +39,7 @@
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> +#include
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> @@ -145,7 +146,7 @@ static void rproc_disable_iommu(struct rproc *rproc)
>>>>> iommu_domain_free(domain);
>>>>>  }
>>>>>
>>>>> -static phys_addr_t rproc_va_to_pa(void *cpu_addr)
>>>>> +phys_addr_t rproc_va_to_pa(void *cpu_addr)
>>>>>  {
>>>>> /*
>>>>>  * Return physical address according to virtual address location
>>>>> @@ -160,6 +161,7 @@ static phys_addr_t rproc_va_to_pa(void *cpu_addr)
>>>>> WARN_ON(!virt_addr_valid(cpu_addr));
>>>>> return virt_to_phys(cpu_addr);
>>>>>  }
>>>>> +EXPORT_SYMBOL(rproc_va_to_pa);
>>>>>
>>>>>  /**
>>>>>   * rproc_da_to_va() - lookup the kernel virtual address for a remoteproc address
>>>>> @@ -423,6 +425,20 @@ static void rproc_vdev_do_stop(struct rproc_subdev *subdev, bool crashed)
>>>>>  }
>>>>>
>>>>>  /**
>>>>> + * rproc_rvdev_release() - release the existence of a rvdev
>>>>> + *
>>>>> + * @dev: the subdevice's dev
>>>>> + */
>>>>> +static void rproc_rvdev_release(struct device *dev)
>>>>> +{
>>>>> +   struct rproc_vdev *rvdev = container_of(dev, struct rproc_vdev, dev);
>>>>> +
>>>>> +   of_reserved_mem_device_release(dev);
>>>>> +
>>>>> +   kfree(rvdev);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>>   * rproc_handle_vdev() - handle a vdev fw resource
>>>>>   * @rproc: the remote processor
>>>>>   * @rsc: the vring resource descriptor
>>>>> @@ -455,6 +471,7 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc,
>>>>> struct device *dev = &rproc->dev;
>>>>> struct rproc_vdev *rvdev;
>>>>> int i, ret;
>>>>> +   char name[16];
>>>>>
>>>>> /* make sure resource isn't truncated */
>>>>> if (sizeof(*rsc) + rsc->num_of_vrings * sizeof(struct fw_rsc_vdev_vring)
>>>>> @@ -488,6 +505,18 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc,
>>>>> rvdev->rproc = rproc;
>>>>> rvdev->index = rproc->nb_vdev++;
>>>>>
>>>>> +   /* Initialise vdev subdevice */
>>>>> +   snprintf(name, sizeof(name), "vdev%dbuffer", rvdev->index);
>>>>> +   rvdev->dev.parent = rproc->dev.parent;
>>>>> +   rvdev->dev.release = rproc_rvdev_release;
>>>>> +   dev_set_name(&rvdev->dev, "%s#%s", dev_name(rvdev->dev.parent), name);
>>>>> +   dev_set_drvdata(&rvdev->dev, rvdev);
>>>>> +   dma_set_coherent_mask(&rvdev->dev, DMA_BIT_MASK(32));
>>>> I tried the latest kernel, this function will not set the DMA coherent mask as
>>>> dma_supported() of the &rvdev->dev will return false.
>>>> As this is a device created at run time, should it be force to support DMA?
>>>> should it directly set the dma_coherent_mask?
>>>
>>> Thanks for pointing out this issue. I tested on top of 4.18-rc1 a few months ago...
>>> Could you please give me the kernel version on which you are testing the series?
>>> Is your platform 32-bit or 64-bit?
>>> I'll rebase and check on my side.
>>
>> I am testing with 4.19-rc4 on aarch64 platform.
> 
> Btw, I ran into this on my v7 platform as well (4.19-rc6). The
> dma_set_coherent_mask fails with error EIO. I did get my allocations
> through though.

Correction, that was before Patch 17. With patch 17, this fails.

regards
Suman

> 
> regards
> Suman
> 
>>
>> Best Regards,
>> Wendy
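
The failure discussed in this thread comes from calling dma_set_coherent_mask() without looking at its return value. The sketch below is a user-space model of that situation, not kernel code: struct fake_dev, fake_set_coherent_mask() and init_vdev_subdevice() are hypothetical stand-ins, and only the DMA_BIT_MASK() arithmetic is copied from the kernel macro. It assumes, as reported above, that the call returns -EIO when the device does not support DMA.

```c
#include <errno.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n), reproduced for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for a struct device: just enough state for the model. */
struct fake_dev {
    int dma_supported;          /* models what dma_supported() would report */
    uint64_t coherent_dma_mask; /* mask stored on success */
};

/* Model of the mask setter: refuse with -EIO when DMA is not supported. */
static int fake_set_coherent_mask(struct fake_dev *dev, uint64_t mask)
{
    if (!dev->dma_supported)
        return -EIO;
    dev->coherent_dma_mask = mask;
    return 0;
}

/* Propagate the error instead of silently continuing with an unset mask. */
static int init_vdev_subdevice(struct fake_dev *dev)
{
    int ret = fake_set_coherent_mask(dev, DMA_BIT_MASK(32));

    if (ret)
        return ret;
    return 0;
}
```

With dma_supported left at 0, init_vdev_subdevice() returns -EIO and the mask stays 0. Once the flag is set, the same call stores the 32-bit mask 0xffffffff.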

Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with specific dma memory pool

2018-10-23 Thread Suman Anna

On 10/10/18 2:17 PM, Loic PALLARDY wrote:
> 
> 
>> -----Original Message-----
>> From: Bjorn Andersson [mailto:bjorn.anders...@linaro.org]
>> Sent: mercredi 10 octobre 2018 07:58
>> To: Loic PALLARDY 
>> Cc: o...@wizery.com; linux-remotep...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; Arnaud POULIQUEN ;
>> benjamin.gaign...@linaro.org; s-a...@ti.com
>> Subject: Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with
>> specific dma memory pool
>>
>> On Fri 27 Jul 06:14 PDT 2018, Loic Pallardy wrote:
>>
>>> This patch creates a dedicated vdev subdevice for each vdev declared
>>> in firmware resource table and associates carveout named "vdev%dbuffer"
>>> (with %d vdev index in resource table) if any as dma coherent memory
>> pool.
>>>
>>> Then vdev subdevice is used as parent for virtio device.
>>>
>>> Signed-off-by: Loic Pallardy 
>>> ---
>>>  drivers/remoteproc/remoteproc_core.c | 35
>> +++---
>>>  drivers/remoteproc/remoteproc_internal.h |  1 +
>>>  drivers/remoteproc/remoteproc_virtio.c   | 42
>> +++-
>>>  include/linux/remoteproc.h   |  1 +
>>>  4 files changed, 75 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>> b/drivers/remoteproc/remoteproc_core.c
>>> index 4edc6f0..adcc66e 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -39,6 +39,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -145,7 +146,7 @@ static void rproc_disable_iommu(struct rproc
>> *rproc)
>>> iommu_domain_free(domain);
>>>  }
>>>
>>> -static phys_addr_t rproc_va_to_pa(void *cpu_addr)
>>> +phys_addr_t rproc_va_to_pa(void *cpu_addr)
>>>  {
>>> /*
>>>  * Return physical address according to virtual address location
>>> @@ -160,6 +161,7 @@ static phys_addr_t rproc_va_to_pa(void
>> *cpu_addr)
>>> WARN_ON(!virt_addr_valid(cpu_addr));
>>> return virt_to_phys(cpu_addr);
>>>  }
>>> +EXPORT_SYMBOL(rproc_va_to_pa);
>>>
>>>  /**
>>>   * rproc_da_to_va() - lookup the kernel virtual address for a remoteproc
>> address
>>> @@ -423,6 +425,20 @@ static void rproc_vdev_do_stop(struct
>> rproc_subdev *subdev, bool crashed)
>>>  }
>>>
>>>  /**
>>> + * rproc_rvdev_release() - release the existence of a rvdev
>>> + *
>>> + * @dev: the subdevice's dev
>>> + */
>>> +static void rproc_rvdev_release(struct device *dev)
>>> +{
>>> +   struct rproc_vdev *rvdev = container_of(dev, struct rproc_vdev, dev);
>>> +
>>> +   of_reserved_mem_device_release(dev);
>>> +
>>> +   kfree(rvdev);
>>> +}
>>> +
>>> +/**
>>>   * rproc_handle_vdev() - handle a vdev fw resource
>>>   * @rproc: the remote processor
>>>   * @rsc: the vring resource descriptor
>>> @@ -455,6 +471,7 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc,
>>> struct device *dev = &rproc->dev;
>>> struct rproc_vdev *rvdev;
>>> int i, ret;
>>> +   char name[16];
>>>
>>> /* make sure resource isn't truncated */
>>> if (sizeof(*rsc) + rsc->num_of_vrings * sizeof(struct
>> fw_rsc_vdev_vring)
>>> @@ -488,6 +505,18 @@ static int rproc_handle_vdev(struct rproc *rproc,
>> struct fw_rsc_vdev *rsc,
>>> rvdev->rproc = rproc;
>>> rvdev->index = rproc->nb_vdev++;
>>>
>>> +   /* Initialise vdev subdevice */
>>> +   snprintf(name, sizeof(name), "vdev%dbuffer", rvdev->index);
>>> +   rvdev->dev.parent = rproc->dev.parent;
>>> +   rvdev->dev.release = rproc_rvdev_release;
>>> +   dev_set_name(&rvdev->dev, "%s#%s", dev_name(rvdev->dev.parent), name);
>>> +   dev_set_drvdata(&rvdev->dev, rvdev);
>>> +   dma_set_coherent_mask(&rvdev->dev, DMA_BIT_MASK(32));
>>> +
>>> +   ret = device_register(&rvdev->dev);
>>> +   if (ret)
>>> +   goto free_rvdev;
>>> +
>>> /* parse the vrings */
>>> for (i = 0; i < rsc->num_of_vrings; i++) {
>>> ret = rproc_parse_vring(rvdev, rsc, i);
>>> @@ -518,7 +547,7 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc,
>>> for (i--; i >= 0; i--)
>>> rproc_free_vring(&rvdev->vring[i]);
>>>  free_rvdev:
>>> -   kfree(rvdev);
>>> +   device_unregister(&rvdev->dev);
>>> return ret;
>>>  }
>>>
>>> @@ -536,7 +565,7 @@ void rproc_vdev_release(struct kref *ref)
>>>
>>> rproc_remove_subdev(rproc, &rvdev->subdev);
>>> list_del(&rvdev->node);
>>> -   kfree(rvdev);
>>> +   device_unregister(&rvdev->dev);
>>>  }
>>>
>>>  /**
>>> diff --git a/drivers/remoteproc/remoteproc_internal.h
>> b/drivers/remoteproc/remoteproc_internal.h
>>> index f6cad24..bfeacfd 100644
>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>> @@ -52,6 +52,7 @@ struct dentry *rproc_create_trace_file(const char
>> *name, struct rproc *rproc,
>>>  int rproc_alloc_vring(struct rproc_vdev *rvdev, int i);
>>>
>>>  void *rproc_da_to_va(struct rproc *rproc, u64 da, int len);
>>> +phys_addr_t rproc_va_to_pa(void *cpu_addr);
>>>  int rproc_trigger_recovery(struct rproc

Re: [v4] i2c: Add PCI and platform drivers for the AMD MP2 I2C controller

2018-10-23 Thread kbuild test robot

Hi Elie,

I love your patch! Perhaps something to improve:

[auto build test WARNING on wsa/i2c/for-next]
[also build test WARNING on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Elie-Morisse/i2c-Add-PCI-and-platform-drivers-for-the-AMD-MP2-I2C-controller/20181024-013625
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/i2c/busses/i2c-amd-pci-mp2.c: In function 'amd_mp2_debugfs_read':
>> drivers/i2c/busses/i2c-amd-pci-mp2.c:298:66: warning: comparison of distinct pointer types lacks a cast
 buf_size = min(count, 0x800ul);
 ^ 

vim +298 drivers/i2c/busses/i2c-amd-pci-mp2.c

   285  
   286  static ssize_t amd_mp2_debugfs_read(struct file *filp, char __user *ubuf,
   287  size_t count, loff_t *offp)
   288  {
   289  struct amd_mp2_dev *privdata;
   290  void __iomem *mmio;
   291  u8 *buf;
   292  size_t buf_size;
   293  ssize_t ret, off;
   294  u32 v32;
   295  
   296  privdata = filp->private_data;
   297  mmio = privdata->mmio;
 > 298  buf_size = min(count, 0x800ul);
   299  buf = kmalloc(buf_size, GFP_KERNEL);
   300  
   301  if (!buf)
   302  return -ENOMEM;
   303  
   304  off = 0;
   305  off += scnprintf(buf + off, buf_size - off,
   306  "Mp2 Device Information:\n");
   307  
   308  off += scnprintf(buf + off, buf_size - off,
   309  "\n");
   310  off += scnprintf(buf + off, buf_size - off,
   311  "\tMP2 C2P Message Register Dump:\n\n");
   312  v32 = readl(privdata->mmio + AMD_C2P_MSG0);
   313  off += scnprintf(buf + off, buf_size - off,
   314  "AMD_C2P_MSG0 -\t\t\t%#06x\n", v32);
   315  
   316  v32 = readl(privdata->mmio + AMD_C2P_MSG1);
   317  off += scnprintf(buf + off, buf_size - off,
   318  "AMD_C2P_MSG1 -\t\t\t%#06x\n", v32);
   319  
   320  v32 = readl(privdata->mmio + AMD_C2P_MSG2);
   321  off += scnprintf(buf + off, buf_size - off,
   322  "AMD_C2P_MSG2 -\t\t\t%#06x\n", v32);
   323  
   324  v32 = readl(privdata->mmio + AMD_C2P_MSG3);
   325  off += scnprintf(buf + off, buf_size - off,
   326  "AMD_C2P_MSG3 -\t\t\t%#06x\n", v32);
   327  
   328  v32 = readl(privdata->mmio + AMD_C2P_MSG4);
   329  off += scnprintf(buf + off, buf_size - off,
   330  "AMD_C2P_MSG4 -\t\t\t%#06x\n", v32);
   331  
   332  v32 = readl(privdata->mmio + AMD_C2P_MSG5);
   333  off += scnprintf(buf + off, buf_size - off,
   334  "AMD_C2P_MSG5 -\t\t\t%#06x\n", v32);
   335  
   336  v32 = readl(privdata->mmio + AMD_C2P_MSG6);
   337  off += scnprintf(buf + off, buf_size - off,
   338  "AMD_C2P_MSG6 -\t\t\t%#06x\n", v32);
   339  
   340  v32 = readl(privdata->mmio + AMD_C2P_MSG7);
   341  off += scnprintf(buf + off, buf_size - off,
   342  "AMD_C2P_MSG7 -\t\t\t%#06x\n", v32);
   343  
   344  v32 = readl(privdata->mmio + AMD_C2P_MSG8);
   345  off += scnprintf(buf + off, buf_size - off,
   346  "AMD_C2P_MSG8 -\t\t\t%#06x\n", v32);
   347  
   348  v32 = readl(privdata->mmio + AMD_C2P_MSG9);
   349  off += scnprintf(buf + off, buf_size - off,
   350  "AMD_C2P_MSG9 -\t\t\t%#06x\n", v32);
   351  
   352  off += scnprintf(buf + off, buf_size - off,
   353  "\n\tMP2 P2C Message Register Dump:\n\n");
   354  
   355  v32 = readl(privdata->mmio + AMD_P2C_MSG1);
   356  off += scnprintf(buf + off, buf_size - off,
   357  "AMD_P2C_MSG1 -\t\t\t%#06x\n", v32);
   358  
   359  v32 = readl(privdata->mmio + AMD_P2C_MSG2);
   360  off += scnprintf(buf + off, buf_size - off,
   361  "AMD_P2C_MSG2 -\t\t\t%#06x\n", v32);
   362  
   363  v32 = readl(privdata->mmio + AMD_P2C_MSG_INTEN);
   364  off += scnprintf(buf + off, buf_size - off,
   365  "AMD_P2C_MSG_INTEN -\t\t%#06x\n", v32);
   366  
   367  v32 = readl(privdata->mmio + AMD_P2C_MSG_INTSTS);
   368  off += scnprintf(buf + off, buf_size - off,
   369
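
The warning at line 298 is a type mismatch rather than a pointer bug: on i386, count is a size_t (unsigned int) while 0x800ul is an unsigned long, and the kernel's min() macro rejects operands of different types. In kernel code a common way to silence it is min_t(size_t, count, 0x800). The user-space sketch below shows the same idea with both operands forced to size_t; clamp_read_size() and DEBUGFS_BUF_MAX are hypothetical names, not part of the driver.

```c
#include <stddef.h>

/* Same cap as in the quoted code; the macro name is an assumption. */
#define DEBUGFS_BUF_MAX 0x800

/*
 * Returns the smaller of count and the buffer cap. Both operands have
 * type size_t, so the comparison is between identical types.
 */
static size_t clamp_read_size(size_t count)
{
    const size_t limit = DEBUGFS_BUF_MAX;

    return count < limit ? count : limit;
}
```

A request for 16 bytes is passed through unchanged, while a request for 0x1000 bytes is capped at 0x800.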

   368  off += scnprintf(buf + off, buf_size - off,
   369
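
For context on the warning above: the kernel's min() checks that both operands have the same type. Here `count` is a size_t (unsigned int on i386) while 0x800ul is an unsigned long, so the check fires on this config. The usual in-kernel fix would be something like min_t(size_t, count, 0x800). The following is only a user-space sketch of that idea; min_size(), clamp_buf_size() and DEBUGFS_BUF_MAX are hypothetical names and are not part of the driver.

```c
#include <stddef.h>

/* Hypothetical cap mirroring the 0x800 limit used in the quoted code. */
#define DEBUGFS_BUF_MAX ((size_t)0x800)

/*
 * User-space stand-in for min_t(size_t, a, b): both operands are
 * converted to one explicitly chosen type before they are compared.
 */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Clamp a requested read length to the buffer cap. */
static size_t clamp_buf_size(size_t count)
{
	return min_size(count, DEBUGFS_BUF_MAX);
}
```

Because both operands are size_t on every architecture, the comparison no longer depends on whether size_t and unsigned long happen to be the same type.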

Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device address requested

2018-10-23 Thread Suman Anna

On 10/23/18 2:40 PM, Loic PALLARDY wrote:
> Hi Suman,
> 
>> -Original Message-
>> From: Suman Anna 
>> Sent: Tuesday, 23 October 2018 19:26
>> To: Loic PALLARDY ; bjorn.anders...@linaro.org;
>> o...@wizery.com
>> Cc: linux-remotep...@vger.kernel.org; linux-kernel@vger.kernel.org;
>> Arnaud POULIQUEN ;
>> benjamin.gaign...@linaro.org
>> Subject: Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device
>> address requested
>>
>> Hi Loic,
>>
>> On 7/27/18 8:14 AM, Loic Pallardy wrote:
>>> If there is no IOMMU associated with the remote processor device,
>>> remoteproc_core won't be able to satisfy device address requested
>>> in firmware resource table.
>>> Return an error as configuration won't be coherent.
>>>
>>> Signed-off-by: Loic Pallardy 
>>
>> This patch is breaking my Davinci platforms. It is not really required
>> that you _should_ have IOMMUs when a valid DA is mentioned. Please see
>> the existing description (paras 4 and 5) on the fw_rsc_carveout
>> kerneldoc in remoteproc.h file.
> 
> Thanks for pointing out this comment. Indeed sMMU is not mandatory, and at first 
> sight I agree we should remove the restriction introduced by the patch.
> Driver porting on the series should be done before adding this.
>>
>> We do have platforms where we have some internal sub-modules within the
>> remote processor sub-system that provides some linear
>> address-translation (most common case with 32-bit processors supporting
>> 64-bit addresses). Also, we have some upcoming SoCs where we have an MMU
>> that is not programmable by Linux.
>>
>> There is one comment there, but I don't think this is actually handled
>> in the current remoteproc core.
>> "If @da is set to
>>  * FW_RSC_ADDR_ANY, then the host will dynamically allocate it, and then
>>  * overwrite @da with the dynamically allocated address."
>>
> I don't remember it being implemented as described.

Yes, it was missing, and one of your patches seems to add this behavior
now. That said, I really don't think the remoteproc core can dictate the
da. Even if the individual remoteproc driver were to furnish this, how
would you get such data without forcing a fixed behavior for all
possible firmwares (not desirable)? We should get rid of this comment,
and any code that seems to do this.

> 
> I have remarks about the comment:
> "* We will always use @da to negotiate the device addresses, even if it
>  * isn't using an iommu. In that case, though, it will obviously contain
>  * physical addresses."
> 
> When there is no sMMU, we can't consider that da contains a physical address 
> because coprocessor can have its own memory map just because it is a 32bit 
> processor accessing only a part of the memory and the main is 64bit one. The 
> 2 processors won't see the internal memory at the same base address for 
> example.

Agreed, I believe it was valid when it was written (32-bit platforms
supporting 32-bit addresses). I think this is akin to an IPA
(Intermediate Physical Address).

> So what should we do when carveout allocated by host is not fitting with 
> resource table request?
> - put a warning and overwrite da address in the resource table?

Hmm, why da? This goes to my earlier comment about how you are able to
decide the da. At least your current ST driver seems to be assigning the
same value as the physical bus address for da, which raises the question
of why you would still need a carveout entry in the resource table if it
is truly one-to-one.

E.g., I have an upcoming use case with R5Fs on newer TI SoCs where we
actually have a sub-module called Region Address Translator (RAT) which
can only be programmed by the R5F for translating the 32-bit CPU
addresses to a larger physical address space, and yet I need the da and
pa to be able to do loading. I cannot dictate the da since that is what
the firmware images are linked against. So I have to rely on the firmware
providing this data for me.

> - stop rproc probe as no match detected?

I think that is the safest approach.
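
To make the "stop the probe on mismatch" option concrete, here is a minimal user-space sketch. It assumes the firmware's da is authoritative (the image is linked against it), so a mismatch is reported as an error instead of being patched into the resource table. The struct and function names are hypothetical and do not exist in the remoteproc core, and the length check is an extra assumption that was not discussed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of one carveout: device address and length. */
struct sketch_carveout {
	uint32_t da;
	uint32_t len;
};

/*
 * Compare the firmware's request with the region the host provides.
 * The firmware's da is treated as authoritative, so any mismatch is
 * an error and the resource table is never rewritten.
 */
static int sketch_match_carveout(const struct sketch_carveout *fw_req,
				 const struct sketch_carveout *host)
{
	if (fw_req->da != host->da)
		return -EINVAL;	/* addresses disagree: stop the probe */
	if (fw_req->len > host->len)
		return -ENOMEM;	/* host region too small (assumed policy) */
	return 0;
}
```

A caller would propagate a non-zero return up through the probe path, so a firmware image whose expectations cannot be met is never booted.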

> 
> Later in the series, carveout allocation is changed. Resource table carveouts 
> are either linked with an existing carveout registered by driver or added to 
> carveout list for allocations.
> In the case you described, TI driver should first register the specific 
> carveout regions thanks to the helper.

The current series should still continue to work without having to
enforce new name assignments (unless needed and being defined to use the
new features being added).

> 
> Regards,
> Loic
> 
>> regards
>> Suman
>>
>>> ---
>>>  drivers/remoteproc/remoteproc_core.c | 10 +-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>> b/drivers/remoteproc/remoteproc_core.c
>>> index 4cd1a8e..437fabf 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -657,7 +657,15 @@ static int rproc_handle_carveout(struct rproc
>> *rproc,
>>>  * to use the iommu-based DMA API: we expect 'dma' to

[PATCH 1/1] nds32: Power management for nds32

2018-10-23 Thread Nickhu

There are three sleep states in nds32:
suspend to idle,
suspend to standby,
suspend to ram

In suspend to ram, we use the 'standby' instruction to emulate a
power management device and hang the system until a wakeup source
sends a wakeup event to break the loop.

First, we push the general purpose registers and system registers
onto the stack. Second, we translate the stack pointer to a physical
address and store it in memory. Third, after writing back and
invalidating the cache, we hang in the 'standby' instruction.
When a wakeup source triggers a wakeup event, the loop is broken
and the system resumes.

Signed-off-by: Nickhu 
---
 arch/nds32/Kconfig   |  10 +++
 arch/nds32/include/asm/suspend.h |  11 +++
 arch/nds32/kernel/Makefile   |   2 +-
 arch/nds32/kernel/pm.c   |  91 ++
 arch/nds32/kernel/sleep.S| 129 +++
 drivers/irqchip/irq-ativic32.c   |  29 +++
 6 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 arch/nds32/include/asm/suspend.h
 create mode 100644 arch/nds32/kernel/pm.c
 create mode 100644 arch/nds32/kernel/sleep.S

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index dd448d431f5a..8e2c5ac6acd1 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -95,3 +95,13 @@ endmenu
 menu "Kernel Features"
 source "kernel/Kconfig.hz"
 endmenu
+
+menu "Power management options"
+config SYS_SUPPORTS_APM_EMULATION
+   bool
+
+config ARCH_SUSPEND_POSSIBLE
+   def_bool y
+
+source "kernel/power/Kconfig"
+endmenu
diff --git a/arch/nds32/include/asm/suspend.h b/arch/nds32/include/asm/suspend.h
new file mode 100644
index ..6ed2418af1ac
--- /dev/null
+++ b/arch/nds32/include/asm/suspend.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2008-2017 Andes Technology Corporation
+
+#ifndef __ASM_NDS32_SUSPEND_H
+#define __ASM_NDS32_SUSPEND_H
+
+extern void suspend2ram(void);
+extern void cpu_resume(void);
+extern unsigned long wake_mask;
+
+#endif
diff --git a/arch/nds32/kernel/Makefile b/arch/nds32/kernel/Makefile
index f52bd2744f50..8d62f2ecb1ab 100644
--- a/arch/nds32/kernel/Makefile
+++ b/arch/nds32/kernel/Makefile
@@ -16,7 +16,7 @@ obj-$(CONFIG_STACKTRACE)  += stacktrace.o
 obj-$(CONFIG_OF)   += devtree.o
 obj-$(CONFIG_CACHE_L2) += atl2c.o
 obj-$(CONFIG_PERF_EVENTS) += perf_event_cpu.o
-
+obj-$(CONFIG_PM)   += pm.o sleep.o
 extra-y := head.o vmlinux.lds
 
 obj-y  += vdso/
diff --git a/arch/nds32/kernel/pm.c b/arch/nds32/kernel/pm.c
new file mode 100644
index ..e1eaf3bac709
--- /dev/null
+++ b/arch/nds32/kernel/pm.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2008-2017 Andes Technology Corporation
+
+/*
+ * nds32 Power Management Routines
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License.
+ *
+ *  Abstract:
+ *
+ *This program is for nds32 power management routines.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+unsigned int resume_addr;
+unsigned int *phy_addr_sp_tmp;
+
+static void nds32_suspend2ram(void)
+{
+   pgd_t *pgdv;
+   pud_t *pudv;
+   pmd_t *pmdv;
+   pte_t *ptev;
+
+   pgdv = (pgd_t *)__va((__nds32__mfsr(NDS32_SR_L1_PPTB) &
+   L1_PPTB_mskBASE)) + pgd_index((unsigned int)cpu_resume);
+
+   pudv = pud_offset(pgdv, (unsigned int)cpu_resume);
+   pmdv = pmd_offset(pudv, (unsigned int)cpu_resume);
+   ptev = pte_offset_map(pmdv, (unsigned int)cpu_resume);
+
+   resume_addr = ((*ptev) & TLB_DATA_mskPPN)
+   | ((unsigned int)cpu_resume & 0x0fff);
+
+   suspend2ram();
+}
+
+static void nds32_suspend_cpu(void)
+{
+   while (!(__nds32__mfsr(NDS32_SR_INT_PEND) & wake_mask))
+   __asm__ volatile ("standby no_wake_grant\n\t");
+}
+
+static int nds32_pm_valid(suspend_state_t state)
+{
+   switch (state) {
+   case PM_SUSPEND_ON:
+   case PM_SUSPEND_STANDBY:
+   case PM_SUSPEND_MEM:
+   return 1;
+   default:
+   return 0;
+   }
+}
+
+static int nds32_pm_enter(suspend_state_t state)
+{
+   pr_debug("%s:state:%d\n", __func__, state);
+   switch (state) {
+   case PM_SUSPEND_STANDBY:
+   nds32_suspend_cpu();
+   return 0;
+   case PM_SUSPEND_MEM:
+   nds32_suspend2ram();
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
+static const struct platform_suspend_ops nds32_pm_ops = {
+   .valid = nds32_pm_valid,
+   .enter = nds32_pm_enter,
+};
+
+static int __init nds32_pm_init(void)
+{
+   pr_debug("Enter %s\n", __func__);
+   suspend_set_ops(&nds32_pm_ops);
+   return 0;
+}
+late_initcall(nds32_pm_init);
diff --git a/arch/nds32/kernel/sleep.S b/arch/nds32/kernel/sleep.S
new

[PATCH 0/1] nds32: Power management

2018-10-23 Thread Nickhu

This patch adds power management support for nds32.

Nickhu (1):
  nds32: Power management for nds32

 arch/nds32/Kconfig   |  10 +++
 arch/nds32/include/asm/suspend.h |  11 +++
 arch/nds32/kernel/Makefile   |   2 +-
 arch/nds32/kernel/pm.c   |  91 ++
 arch/nds32/kernel/sleep.S| 129 +++
 drivers/irqchip/irq-ativic32.c   |  29 +++
 6 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 arch/nds32/include/asm/suspend.h
 create mode 100644 arch/nds32/kernel/pm.c
 create mode 100644 arch/nds32/kernel/sleep.S

-- 
2.17.0

[PATCH 14/25] afs: Commit the status on a new file/dir/symlink [ver #2]

2018-10-23 Thread David Howells

Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.

Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells 
---

 fs/afs/dir.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 024b7cf7441c..8936731c59ff 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1089,6 +1089,7 @@ static void afs_vnode_new_inode(struct afs_fs_cursor *fc,
 
vnode = AFS_FS_I(inode);
set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+   afs_vnode_commit_status(fc, vnode, 0);
d_add(new_dentry, inode);
 }

[PATCH 24/25] afs: Fix callback handling [ver #2]

2018-10-23 Thread David Howells

In some circumstances, the callback interest pointer is NULL, so in such a
case we can't dereference it when checking to see if the callback is
broken.  This causes an oops in some circumstances.

Fix this by replacing the function that worked out the aggregate break
counter with one that actually does the comparison, and then make that
return true (ie. broken) if there is no callback interest as yet (ie. the
pointer is NULL).

Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells 
---

 fs/afs/fsclient.c  |2 +-
 fs/afs/internal.h  |9 ++---
 fs/afs/security.c  |7 ---
 fs/afs/yfsclient.c |2 +-
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 3975969719de..7c75a1813321 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -269,7 +269,7 @@ static void xdr_decode_AFSCallBack(struct afs_call *call,
 
write_seqlock(&vnode->cb_lock);
 
-   if (call->cb_break == afs_cb_break_sum(vnode, cbi)) {
+   if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) {
vnode->cb_version   = ntohl(*bp++);
cb_expiry   = ntohl(*bp++);
vnode->cb_type  = ntohl(*bp++);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e5b596bd8acf..b60d15212975 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -776,10 +776,13 @@ static inline unsigned int afs_calc_vnode_cb_break(struct afs_vnode *vnode)
return vnode->cb_break + vnode->cb_s_break + vnode->cb_v_break;
 }
 
-static inline unsigned int afs_cb_break_sum(struct afs_vnode *vnode,
-   struct afs_cb_interest *cbi)
+static inline bool afs_cb_is_broken(unsigned int cb_break,
+   const struct afs_vnode *vnode,
+   const struct afs_cb_interest *cbi)
 {
-   return vnode->cb_break + cbi->server->cb_s_break + vnode->volume->cb_v_break;
+   return !cbi || cb_break != (vnode->cb_break +
+   cbi->server->cb_s_break +
+   vnode->volume->cb_v_break);
 }
 
 /*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index d1ae53fd3739..5f58a9a17e69 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -147,7 +147,8 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
break;
}
 
-   if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest)) {
+   if (afs_cb_is_broken(cb_break, vnode,
+vnode->cb_interest)) {
changed = true;
break;
}
@@ -177,7 +178,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
}
}
 
-   if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest))
+   if (afs_cb_is_broken(cb_break, vnode, vnode->cb_interest))
goto someone_else_changed_it;
 
/* We need a ref on any permits list we want to copy as we'll have to
@@ -256,7 +257,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key,
 
spin_lock(&vnode->lock);
zap = rcu_access_pointer(vnode->permit_cache);
-   if (cb_break == afs_cb_break_sum(vnode, vnode->cb_interest) &&
+   if (!afs_cb_is_broken(cb_break, vnode, vnode->cb_interest) &&
zap == permits)
rcu_assign_pointer(vnode->permit_cache, replacement);
else
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index d5e3f0095040..12658c1363ae 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -324,7 +324,7 @@ static void xdr_decode_YFSCallBack(struct afs_call *call,
 
write_seqlock(&vnode->cb_lock);
 
-   if (call->cb_break == afs_cb_break_sum(vnode, cbi)) {
+   if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) {
cb_expiry = xdr_to_u64(xdr->expiration_time);
do_div(cb_expiry, 10 * 1000 * 1000);
vnode->cb_version   = ntohl(xdr->version);
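
To illustrate the shape of the fix in isolation, here is a user-space sketch of the new predicate. The NULL test on the callback interest comes first, so the short-circuit prevents the dereference that caused the oops, and "no interest yet" is reported as broken. The sketch_* types are hypothetical, stripped-down stand-ins for the real afs structures and carry only the counters the comparison needs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the afs structures involved. */
struct sketch_server { unsigned int cb_s_break; };
struct sketch_volume { unsigned int cb_v_break; };
struct sketch_cb_interest { struct sketch_server *server; };
struct sketch_vnode {
	unsigned int cb_break;
	struct sketch_volume *volume;
};

/*
 * Return true if the callback must be considered broken: either no
 * callback interest exists yet, or the aggregate break counter has
 * moved since cb_break was sampled.
 */
static bool sketch_cb_is_broken(unsigned int cb_break,
				const struct sketch_vnode *vnode,
				const struct sketch_cb_interest *cbi)
{
	if (!cbi)
		return true;	/* nothing to dereference, treat as broken */
	return cb_break != (vnode->cb_break +
			    cbi->server->cb_s_break +
			    vnode->volume->cb_v_break);
}
```

Folding the comparison into the helper means every caller gets the NULL handling for free, instead of each call site having to remember to check the pointer before summing the counters.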

[PATCH 19/25] afs: Get the target vnode in afs_rmdir() and get a callback on it [ver #2]

2018-10-23 Thread David Howells

Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion.  The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells 
---

 fs/afs/dir.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 8936731c59ff..f2dd48d4363f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1174,7 +1174,7 @@ static void afs_dir_remove_subdir(struct dentry *dentry)
 static int afs_rmdir(struct inode *dir, struct dentry *dentry)
 {
struct afs_fs_cursor fc;
-   struct afs_vnode *dvnode = AFS_FS_I(dir);
+   struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL;
struct key *key;
u64 data_version = dvnode->status.data_version;
int ret;
@@ -1188,6 +1188,14 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
goto error;
}
 
+   /* Try to make sure we have a callback promise on the victim. */
+   if (d_really_is_positive(dentry)) {
+   vnode = AFS_FS_I(d_inode(dentry));
+   ret = afs_validate(vnode, key);
+   if (ret < 0)
+   goto error_key;
+   }
+
ret = -ERESTARTSYS;
if (afs_begin_vnode_operation(&fc, dvnode, key)) {
while (afs_select_fileserver(&fc)) {
@@ -1206,6 +1214,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
}
}
 
+error_key:
key_put(key);
 error:
return ret;

[PATCH 22/25] afs: Allow dumping of server cursor on operation failure [ver #2]

2018-10-23 Thread David Howells

Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to
contact a server.

Signed-off-by: David Howells 
---
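Note: a minimal userspace sketch of the two ideas in this patch, counting
selection passes in the cursor and limiting how many dumps are emitted.
The demo_* names are hypothetical and fprintf() stands in for pr_notice();
the limit of four dumps follows the "count > 3" test in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_MAX_DUMPS 4	/* the patch stops dumping once count exceeds 3 */

struct demo_cursor {
	unsigned short index;
	unsigned short nr_iterations;	/* bumped on every selection pass */
	short error;
};

/* Count one pass through the selection routine. */
static void demo_select(struct demo_cursor *c)
{
	assert(c);
	c->nr_iterations++;
}

/*
 * Dump the cursor, but only for the first few failures.  Returns true if a
 * dump was emitted.  The static counter is unsynchronised, so under
 * concurrency the limit is only approximate.
 */
static bool demo_dump_cursor(const struct demo_cursor *c, bool enabled)
{
	static int count;

	assert(c);
	if (!enabled || count >= DEMO_MAX_DUMPS)
		return false;
	count++;

	fprintf(stderr, "cursor: ix=%u ni=%u err=%d\n",
		(unsigned int)c->index, (unsigned int)c->nr_iterations,
		(int)c->error);
	return true;
}
```

The iteration count is what makes a dump useful: a cursor that failed after
zero passes points at a different problem from one that tried every server.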

 fs/afs/Kconfig     |   12 +++
 fs/afs/addr_list.c |    2 ++
 fs/afs/internal.h  |    3 +++
 fs/afs/rotate.c    |   57
 fs/afs/vl_rotate.c |   53
 5 files changed, 127 insertions(+)

diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index ebba3b18e5da..701aaa9b1899 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -27,3 +27,15 @@ config AFS_FSCACHE
help
  Say Y here if you want AFS data to be cached locally on disk through
  the generic filesystem cache manager
+
+config AFS_DEBUG_CURSOR
+   bool "AFS server cursor debugging"
+   depends on AFS_FS
+   help
+ Say Y here to cause the contents of a server cursor to be dumped to
+ the dmesg log if the server rotation algorithm fails to successfully
+ contact a server.
+
+ See  for more information.
+
+ If unsure, say N.
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 3f60b4012587..bc5ce31a4ae4 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -358,6 +358,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
if (!ac->alist)
return false;
 
+   ac->nr_iterations++;
+
if (ac->begun) {
ac->index++;
if (ac->index == ac->alist->nr_addrs)
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ce79bd514331..ac9da1e4050e 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -660,6 +660,7 @@ struct afs_addr_cursor {
short   error;
bool            begun;          /* T if we've begun iteration */
bool            responded;      /* T if the current address responded */
+   unsigned short  nr_iterations;  /* Number of address iterations */
 };
 
 /*
@@ -677,6 +678,7 @@ struct afs_vl_cursor {
 #define AFS_VL_CURSOR_STOP 0x0001  /* Set to cease iteration */
 #define AFS_VL_CURSOR_RETRY    0x0002  /* Set to do a retry */
 #define AFS_VL_CURSOR_RETRIED  0x0004  /* Set if started a retry */
+   unsigned short  nr_iterations;  /* Number of server iterations */
 };
 
 /*
@@ -700,6 +702,7 @@ struct afs_fs_cursor {
 #define AFS_FS_CURSOR_VNOVOL   0x0008  /* Set if seen VNOVOL */
 #define AFS_FS_CURSOR_CUR_ONLY 0x0010  /* Set if current server only (file lock held) */
 #define AFS_FS_CURSOR_NO_VSLEEP 0x0020  /* Set to prevent sleep on VBUSY, VOFFLINE, ... */
+   unsigned short  nr_iterations;  /* Number of server iterations */
 };
 
 /*
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index 41405dde0113..7c4487781637 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -156,6 +156,8 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
return false;
}
 
+   fc->nr_iterations++;
+
/* Evaluate the result of the previous operation, if there was one. */
switch (error) {
case SHRT_MAX:
@@ -519,6 +521,56 @@ bool afs_select_current_fileserver(struct afs_fs_cursor *fc)
return false;
 }
 
+/*
+ * Dump cursor state in the case of the error being EDESTADDRREQ.
+ */
+static void afs_dump_edestaddrreq(const struct afs_fs_cursor *fc)
+{
+   static int count;
+   int i;
+
+   if (!IS_ENABLED(CONFIG_AFS_DEBUG_CURSOR) || count > 3)
+   return;
+   count++;
+
+   rcu_read_lock();
+
+   pr_notice("EDESTADDR occurred\n");
+   pr_notice("FC: cbb=%x cbb2=%x fl=%hx err=%hd\n",
+ fc->cb_break, fc->cb_break_2, fc->flags, fc->error);
+   pr_notice("FC: st=%u ix=%u ni=%u\n",
+ fc->start, fc->index, fc->nr_iterations);
+
+   if (fc->server_list) {
+   const struct afs_server_list *sl = fc->server_list;
+   pr_notice("FC: SL nr=%u ix=%u vnov=%hx\n",
+ sl->nr_servers, sl->index, sl->vnovol_mask);
+   for (i = 0; i < sl->nr_servers; i++) {
+   const struct afs_server *s = sl->servers[i].server;
+   pr_notice("FC: server fl=%lx av=%u %pU\n",
+ s->flags, s->addr_version, &s->uuid);
+   if (s->addresses) {
+   const struct afs_addr_list *a =
+   rcu_dereference(s->addresses);
+   pr_notice("FC:  - av=%u nr=%u/%u/%u ax=%u\n",
+ a->version,
+ a->nr_ipv4, a->nr_addrs, a->max_addrs,
+ a->index);
+   pr_notice("FC:  - pr=%lx yf=%lx\n",
+

[PATCH 12/25] afs: Don't invoke the server to read data beyond EOF [ver #2]

2018-10-23 Thread David Howells

When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.

Signed-off-by: David Howells 
---
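Note: the offset arithmetic in the hunk below can be tried out in userspace
with a sketch like this.  It is an illustration under assumptions: a 4096
byte page, a plain buffer in place of kmap()/kunmap(), assert() in place of
ASSERTCMP(), and hypothetical demo_* names.  A range that starts at or
beyond EOF is zero-filled locally; anything else is left for the caller to
fetch from the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL
/* Same shape as the kernel's PAGE_MASK: clears the in-page offset bits. */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * Prepare [pos, pos + len) within a single page buffer.  Returns true if
 * the range lay at or beyond EOF and was zero-filled locally, false if the
 * caller would still have to read it from the server.
 */
static bool demo_fill_page(unsigned char *page, unsigned long long pos,
			   size_t len, unsigned long long i_size)
{
	size_t p;

	if (pos < i_size)
		return false;

	p = (size_t)(pos & ~DEMO_PAGE_MASK);	/* offset of pos inside its page */
	assert(p + len <= DEMO_PAGE_SIZE);	/* range must not cross the page */
	memset(page + p, 0, len);
	return true;
}
```

With pos = 8292 and i_size = 8192, the in-page offset works out to 100, so
bytes 100 to 149 of the buffer are cleared for a 50 byte request.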

 fs/afs/write.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index fdb9d6024126..11066a3248ba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -33,10 +33,21 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
 loff_t pos, unsigned int len, struct page *page)
 {
struct afs_read *req;
+   size_t p;
+   void *data;
int ret;
 
_enter(",,%llu", (unsigned long long)pos);
 
+   if (pos >= vnode->vfs_inode.i_size) {
+   p = pos & ~PAGE_MASK;
+   ASSERTCMP(p + len, <=, PAGE_SIZE);
+   data = kmap(page);
+   memset(data + p, 0, len);
+   kunmap(page);
+   return 0;
+   }
+
req = kzalloc(sizeof(struct afs_read) + sizeof(struct page *),
  GFP_KERNEL);
if (!req)

[PATCH 08/25] afs: Implement VL server rotation [ver #2]

2018-10-23 Thread David Howells

Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:

 (1) Add the concept of a VL server list, where each server has its own
 separate address list.  This code is similar to the FS server list.

 (2) Use the DNS resolver to retrieve a set of servers and their associated
 addresses, ports, preference and weight ratings.

 (3) In the case of a legacy DNS resolver or an address list given directly
 through /proc/net/afs/cells, create a list containing just a dummy
 server record and attach all the addresses to that.

 (4) Implement a simple rotation policy, for the moment ignoring the
 priorities and weights assigned to the servers.

 (5) Show the address list through /proc/net/afs//vlservers.  This
 also displays the source and status of the data as indicated by the
 upcall.

Signed-off-by: David Howells 
---
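Note: points (1) and (4) above can be pictured with a small userspace
sketch.  Everything named demo_* is hypothetical and the layout is an
assumption made for illustration, not the structures this patch adds.
Each server carries its own address list, and the rotation simply walks
the server list once from a starting point, ignoring priority and weight.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ADDRS 4

struct demo_vlserver {
	const char *name;
	const char *addrs[DEMO_MAX_ADDRS];	/* each server owns its addresses */
	unsigned int nr_addrs;
	unsigned int priority, weight;		/* recorded but not yet consulted */
};

struct demo_vlserver_list {
	struct demo_vlserver *servers;
	unsigned int nr_servers;
};

struct demo_vl_cursor {
	unsigned int start;	/* server we began with */
	unsigned int tried;	/* how many servers have been offered so far */
};

/*
 * Offer the next server in simple rotation order, wrapping around the list
 * once.  Returns NULL when every server has been offered.
 */
static const struct demo_vlserver *
demo_select_vlserver(const struct demo_vlserver_list *l,
		     struct demo_vl_cursor *c)
{
	assert(l && c);
	if (c->tried >= l->nr_servers)
		return NULL;
	return &l->servers[(c->start + c->tried++) % l->nr_servers];
}
```

The legacy case in point (3) fits the same shape: a list with one dummy
server whose address list holds every address that was supplied.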

 fs/afs/Makefile    |    2
 fs/afs/addr_list.c |  163 +
 fs/afs/cell.c      |   39 +++---
 fs/afs/dynroot.c   |    2
 fs/afs/internal.h  |  114 --
 fs/afs/proc.c      |   90 +++---
 fs/afs/server.c    |   42 ++-
 fs/afs/vl_list.c   |  336
 fs/afs/vl_rotate.c |  251 +++
 fs/afs/vlclient.c  |   32 ++---
 fs/afs/volume.c    |   52 ++--
 11 files changed, 905 insertions(+), 218 deletions(-)
 create mode 100644 fs/afs/vl_list.c
 create mode 100644 fs/afs/vl_rotate.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 546874057bd3..03e9f7afea1b 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -29,6 +29,8 @@ kafs-y := \
super.o \
netdevices.o \
vlclient.o \
+   vl_rotate.o \
+   vl_list.o \
volume.o \
write.o \
xattr.o
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 7b34fad4f8f5..3f60b4012587 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -64,19 +64,25 @@ struct afs_addr_list *afs_alloc_addrlist(unsigned int nr,
 /*
  * Parse a text string consisting of delimited addresses.
  */
-struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len,
-  char delim,
-  unsigned short service,
-  unsigned short port)
+struct afs_vlserver_list *afs_parse_text_addrs(struct afs_net *net,
+  const char *text, size_t len,
+  char delim,
+  unsigned short service,
+  unsigned short port)
 {
+   struct afs_vlserver_list *vllist;
struct afs_addr_list *alist;
const char *p, *end = text + len;
+   const char *problem;
unsigned int nr = 0;
+   int ret = -ENOMEM;
 
_enter("%*.*s,%c", (int)len, (int)len, text, delim);
 
-   if (!len)
+   if (!len) {
+   _leave(" = -EDESTADDRREQ [empty]");
return ERR_PTR(-EDESTADDRREQ);
+   }
 
if (delim == ':' && (memchr(text, ',', len) || !memchr(text, '.', len)))
delim = ',';
@@ -84,18 +90,24 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len,
/* Count the addresses */
p = text;
do {
-   if (!*p)
-   return ERR_PTR(-EINVAL);
+   if (!*p) {
+   problem = "nul";
+   goto inval;
+   }
if (*p == delim)
continue;
nr++;
if (*p == '[') {
p++;
-   if (p == end)
-   return ERR_PTR(-EINVAL);
+   if (p == end) {
+   problem = "brace1";
+   goto inval;
+   }
p = memchr(p, ']', end - p);
-   if (!p)
-   return ERR_PTR(-EINVAL);
+   if (!p) {
+   problem = "brace2";
+   goto inval;
+   }
p++;
if (p >= end)
break;
@@ -109,10 +121,19 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len,
 
_debug("%u/%u addresses", nr, AFS_MAX_ADDRESSES);
 
-   alist = afs_alloc_addrlist(nr, service, port);
-   if (!alist)
+   vllist = afs_alloc_vlserver_list(1);
+   if (!vllist)
return ERR_PTR(-ENOMEM);
 
+   vllist->nr_servers = 1;
+   vllist->servers[0].server = afs_alloc_vlserver("", 7, AFS_VL_PORT);
+   if (!vllist->servers[0].server)
+   goto error_vl;
+
+

[PATCH 16/25] afs: Implement the YFS cache manager service [ver #2]

2018-10-23 Thread David Howells

Implement the YFS cache manager service which gives extra capabilities on
top of AFS.  This is done by listening for an additional service on the
same port and indicating that anyone requesting an upgrade should be
upgraded to the YFS port.

Signed-off-by: David Howells 
---
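Note: the routing rule added to afs_cm_incoming_call() can be sketched in
userspace as below.  The numeric IDs are placeholders invented for the
sketch (the real service and operation numbers live in the kernel headers
and are not quoted in this message), and the demo_* types are hypothetical.
The point illustrated is that the YFS callback operation is accepted only
when the call arrived on the YFS service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder numbers for the sketch only; not the real protocol IDs. */
enum { DEMO_CM_SERVICE = 1, DEMO_YFS_CM_SERVICE = 2 };
enum { DEMO_OP_CALLBACK = 10, DEMO_OP_YFS_CALLBACK = 20 };

struct demo_call_type {
	const char *name;
};

static const struct demo_call_type demo_cb_callback = { "CB.CallBack" };
static const struct demo_call_type demo_yfs_cb_callback = { "CB.YFS_CallBack" };

struct demo_call {
	unsigned int service_id;	/* service the peer addressed */
	unsigned int operation_id;	/* operation requested */
	const struct demo_call_type *type;
};

/* Route an incoming call: true if supported, false if not. */
static bool demo_cm_incoming_call(struct demo_call *call)
{
	assert(call);

	switch (call->operation_id) {
	case DEMO_OP_CALLBACK:
		call->type = &demo_cb_callback;
		return true;
	case DEMO_OP_YFS_CALLBACK:
		/* The YFS variant is only valid on the YFS service. */
		if (call->service_id != DEMO_YFS_CM_SERVICE)
			return false;
		call->type = &demo_yfs_cb_callback;
		return true;
	default:
		return false;
	}
}
```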

 fs/afs/cmservice.c    |  103 +
 fs/afs/protocol_yfs.h |   57 +++
 fs/afs/rxrpc.c        |   15 +++
 3 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 fs/afs/protocol_yfs.h

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index fc0010d800a0..8cf8d10daa6c 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -16,6 +16,7 @@
 #include 
 #include "internal.h"
 #include "afs_cm.h"
+#include "protocol_yfs.h"
 
 static int afs_deliver_cb_init_call_back_state(struct afs_call *);
 static int afs_deliver_cb_init_call_back_state3(struct afs_call *);
@@ -30,6 +31,8 @@ static void SRXAFSCB_Probe(struct work_struct *);
 static void SRXAFSCB_ProbeUuid(struct work_struct *);
 static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
 
+static int afs_deliver_yfs_cb_callback(struct afs_call *);
+
 #define CM_NAME(name) \
const char afs_SRXCB##name##_name[] __tracepoint_string =   \
"CB." #name
@@ -100,13 +103,24 @@ static const struct afs_call_type afs_SRXCBTellMeAboutYourself = {
.work   = SRXAFSCB_TellMeAboutYourself,
 };
 
+/*
+ * YFS CB.CallBack operation type
+ */
+static CM_NAME(YFS_CallBack);
+static const struct afs_call_type afs_SRXYFSCB_CallBack = {
+   .name   = afs_SRXCBYFS_CallBack_name,
+   .deliver= afs_deliver_yfs_cb_callback,
+   .destructor = afs_cm_destructor,
+   .work   = SRXAFSCB_CallBack,
+};
+
 /*
  * route an incoming cache manager call
  * - return T if supported, F if not
  */
 bool afs_cm_incoming_call(struct afs_call *call)
 {
-   _enter("{CB.OP %u}", call->operation_ID);
+   _enter("{%u, CB.OP %u}", call->service_id, call->operation_ID);
 
switch (call->operation_ID) {
case CBCallBack:
@@ -127,6 +141,11 @@ bool afs_cm_incoming_call(struct afs_call *call)
case CBTellMeAboutYourself:
call->type = &afs_SRXCBTellMeAboutYourself;
return true;
+   case YFSCBCallBack:
+   if (call->service_id != YFS_CM_SERVICE)
+   return false;
+   call->type = &afs_SRXYFSCB_CallBack;
+   return true;
default:
return false;
}
@@ -570,3 +589,85 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call)
 
return afs_queue_call_work(call);
 }
+
+/*
+ * deliver request data to a YFS CB.CallBack call
+ */
+static int afs_deliver_yfs_cb_callback(struct afs_call *call)
+{
+   struct afs_callback_break *cb;
+   struct sockaddr_rxrpc srx;
+   struct yfs_xdr_YFSFid *bp;
+   size_t size;
+   int ret, loop;
+
+   _enter("{%u}", call->unmarshall);
+
+   switch (call->unmarshall) {
+   case 0:
+   afs_extract_to_tmp(call);
+   call->unmarshall++;
+
+   /* extract the FID array and its count in two steps */
+   case 1:
+   _debug("extract FID count");
+   ret = afs_extract_data(call, true);
+   if (ret < 0)
+   return ret;
+
+   call->count = ntohl(call->tmp);
+   _debug("FID count: %u", call->count);
+   if (call->count > YFSCBMAX)
+   return afs_protocol_error(call, -EBADMSG,
+ afs_eproto_cb_fid_count);
+
+   size = array_size(call->count, sizeof(struct yfs_xdr_YFSFid));
+   call->buffer = kmalloc(size, GFP_KERNEL);
+   if (!call->buffer)
+   return -ENOMEM;
+   afs_extract_to_buf(call, size);
+   call->unmarshall++;
+
+   case 2:
+   _debug("extract FID array");
+   ret = afs_extract_data(call, false);
+   if (ret < 0)
+   return ret;
+
+   _debug("unmarshall FID array");
+   call->request = kcalloc(call->count,
+   sizeof(struct afs_callback_break),
+   GFP_KERNEL);
+   if (!call->request)
+   return -ENOMEM;
+
+   cb = call->request;
+   bp = call->buffer;
+   for (loop = call->count; loop > 0; loop--, cb++) {
+   cb->fid.vid = xdr_to_u64(bp->volume);
+   cb->fid.vnode   = xdr_to_u64(bp->vnode.lo);
+   cb->fid.vnode_hi = ntohl(bp->vnode.hi);
+   cb->fid.unique  = ntohl(bp->vnode.unique);
+   bp++;
+   }
+
+   afs_extract_to_tmp(call);

[PATCH 23/25] afs: Eliminate the address pointer from the address list cursor [ver #2]

2018-10-23 Thread David Howells

Eliminate the address pointer from the address list cursor.  It's redundant,
since &ac->alist->addrs[ac->index] finds the same address, and it's of
limited value because address lists must be replaced rather than being
rearranged.

Signed-off-by: David Howells 
---
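Note: the idea can be sketched in userspace as follows, with hypothetical
demo_* types standing in for the cursor and address list.  Rather than
caching a pointer into the list, the cursor keeps only the list and an
index, and the address is derived when needed.  The accessor function is an
addition for the sketch; the diff below open-codes the expression.

```c
#include <assert.h>
#include <stddef.h>

struct demo_addr {
	unsigned short port;
};

struct demo_addr_list {
	unsigned int nr_addrs;
	struct demo_addr addrs[8];
};

struct demo_addr_cursor {
	struct demo_addr_list *alist;	/* current list; may be swapped for a new one */
	unsigned short index;		/* position within alist->addrs[] */
};

/* Derive the current address on demand instead of caching a pointer. */
static struct demo_addr *demo_cursor_addr(const struct demo_addr_cursor *ac)
{
	assert(ac && ac->alist && ac->index < ac->alist->nr_addrs);
	return &ac->alist->addrs[ac->index];
}
```

Because nothing points into the old list, replacing ac->alist with a new
list cannot leave a stale address pointer behind in the cursor.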

 fs/afs/addr_list.c |    2 --
 fs/afs/internal.h  |    1 -
 fs/afs/rxrpc.c     |    2 +-
 fs/afs/server.c    |    2 --
 fs/afs/vl_rotate.c |    2 +-
 fs/afs/volume.c    |    6 +++---
 6 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index bc5ce31a4ae4..1536d1d21c33 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -371,7 +371,6 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 
ac->begun = true;
ac->responded = false;
-   ac->addr = &ac->alist->addrs[ac->index];
return true;
 }
 
@@ -389,7 +388,6 @@ int afs_end_cursor(struct afs_addr_cursor *ac)
afs_put_addrlist(alist);
}
 
-   ac->addr = NULL;
ac->alist = NULL;
ac->begun = false;
return ac->error;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ac9da1e4050e..e5b596bd8acf 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -653,7 +653,6 @@ struct afs_interface {
  */
 struct afs_addr_cursor {
struct afs_addr_list    *alist;         /* Current address list (pins ref) */
-   struct sockaddr_rxrpc   *addr;
u32                     abort_code;
unsigned short          start;          /* Starting point in alist->addrs[] */
unsigned short          index;          /* Wrapping offset from start to current addr */
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 444ba0d511ef..42e1ea7372e9 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -359,7 +359,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg)
 long afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call,
   gfp_t gfp, bool async)
 {
-   struct sockaddr_rxrpc *srx = ac->addr;
+   struct sockaddr_rxrpc *srx = &ac->alist->addrs[ac->index];
struct rxrpc_call *rxcall;
struct msghdr msg;
struct kvec iov[1];
diff --git a/fs/afs/server.c b/fs/afs/server.c
index aa35cfae5440..7c1be8b4dc9a 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -367,7 +367,6 @@ static void afs_destroy_server(struct afs_net *net, struct afs_server *server)
.alist  = alist,
.start  = alist->index,
.index  = 0,
-   .addr   = &alist->addrs[alist->index],
.error  = 0,
};
_enter("%p", server);
@@ -518,7 +517,6 @@ static bool afs_do_probe_fileserver(struct afs_fs_cursor *fc)
 
_enter("");
 
-   fc->ac.addr = NULL;
fc->ac.start = READ_ONCE(fc->ac.alist->index);
fc->ac.index = fc->ac.start;
fc->ac.error = 0;
diff --git a/fs/afs/vl_rotate.c b/fs/afs/vl_rotate.c
index 5b99ea7be194..ead6dedbb561 100644
--- a/fs/afs/vl_rotate.c
+++ b/fs/afs/vl_rotate.c
@@ -209,7 +209,7 @@ bool afs_select_vlserver(struct afs_vl_cursor *vc)
if (!afs_iterate_addresses(&vc->ac))
goto next_server;
 
-   _leave(" = t %pISpc", &vc->ac.addr->transport);
+   _leave(" = t %pISpc", &vc->ac.alist->addrs[vc->ac.index].transport);
return true;
 
 next_server:
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index f0020e35bf6f..7527c081726e 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -88,16 +88,16 @@ static struct afs_vldb_entry *afs_vl_lookup_vldb(struct afs_cell *cell,
case VL_SERVICE:
clear_bit(vc.ac.index, >yfs);
set_bit(vc.ac.index, >probed);
-   vc.ac.addr->srx_service = ret;
+   vc.ac.alist->addrs[vc.ac.index].srx_service = ret;
break;
case YFS_VL_SERVICE:
set_bit(vc.ac.index, >yfs);
set_bit(vc.ac.index, >probed);
-   vc.ac.addr->srx_service = ret;
+   vc.ac.alist->addrs[vc.ac.index].srx_service = ret;
break;
}
}
-   
+
vldb = afs_vl_get_entry_by_name_u(&vc, volname, volnamesz);
}

[PATCH 25/25] afs: Probe multiple fileservers simultaneously [ver #2]

2018-10-23 Thread David Howells

Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.

This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.

Signed-off-by: David Howells 
---
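Note: the new address selection in afs_iterate_addresses() (see the diff
below) can be sketched in userspace like this.  It is an illustration under
assumptions: the demo_* types are hypothetical, plain bit operations stand
in for test_bit()/set_bit(), and __builtin_ctzl() (a GCC/Clang builtin)
stands in for __ffs().  Candidates are the addresses that responded to a
probe, minus those that failed or were already tried; the preferred address
wins if it is a candidate, otherwise the lowest-numbered candidate is used.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_addr_list {
	unsigned long responded;	/* bit n set: address n answered a probe */
	unsigned long failed;		/* bit n set: address n failed */
	unsigned int preferred;		/* index of the favoured address */
};

struct demo_addr_cursor {
	const struct demo_addr_list *alist;
	unsigned long tried;		/* addresses this operation already used */
	int index;			/* address currently selected */
};

/* Pick the next untried, responsive address; false when none remain. */
static bool demo_iterate_addresses(struct demo_addr_cursor *ac)
{
	unsigned long set;
	unsigned int index;

	assert(ac);
	if (!ac->alist)
		return false;

	set = ac->alist->responded & ~(ac->alist->failed | ac->tried);
	if (!set)
		return false;

	index = ac->alist->preferred;
	assert(index < sizeof(unsigned long) * CHAR_BIT);
	if (!(set & (1UL << index)))
		index = (unsigned int)__builtin_ctzl(set);	/* lowest set bit */

	ac->index = (int)index;
	ac->tried |= 1UL << index;
	return true;
}
```

With responded = 0xe, failed = 0x2 and preferred = 2, successive calls
select address 2, then address 3, and then report that nothing is left.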

 fs/afs/Makefile            |    4 -
 fs/afs/addr_list.c         |   40 --
 fs/afs/cmservice.c         |  129 +++--
 fs/afs/fs_probe.c          |  270
 fs/afs/fsclient.c          |   27 +++-
 fs/afs/internal.h          |   98 +---
 fs/afs/proc.c              |    6 -
 fs/afs/rotate.c            |  174 ++--
 fs/afs/rxrpc.c             |   44 ---
 fs/afs/server.c            |  109 +-
 fs/afs/server_list.c       |    6 -
 fs/afs/vl_list.c           |    6 +
 fs/afs/vl_probe.c          |  273
 fs/afs/vl_rotate.c         |  159 +-
 fs/afs/vlclient.c          |   35 +++---
 fs/afs/volume.c            |   16 ---
 include/trace/events/afs.h |    4 -
 17 files changed, 1050 insertions(+), 350 deletions(-)
 create mode 100644 fs/afs/fs_probe.c
 create mode 100644 fs/afs/vl_probe.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index cc942b790cff..0738e2bf5193 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -17,6 +17,7 @@ kafs-y := \
file.o \
flock.o \
fsclient.o \
+   fs_probe.o \
inode.o \
main.o \
misc.o \
@@ -29,8 +30,9 @@ kafs-y := \
super.o \
netdevices.o \
vlclient.o \
-   vl_rotate.o \
vl_list.o \
+   vl_probe.o \
+   vl_rotate.o \
volume.o \
write.o \
xattr.o \
diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 1536d1d21c33..967db336d11a 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -303,6 +303,8 @@ void afs_merge_fs_addr4(struct afs_addr_list *alist, __be32 xdr, u16 port)
sizeof(alist->addrs[0]) * (alist->nr_addrs - i));
 
srx = &alist->addrs[i];
+   srx->srx_family = AF_RXRPC;
+   srx->transport_type = SOCK_DGRAM;
srx->transport_len = sizeof(srx->transport.sin);
srx->transport.sin.sin_family = AF_INET;
srx->transport.sin.sin_port = htons(port);
@@ -341,6 +343,8 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port)
sizeof(alist->addrs[0]) * (alist->nr_addrs - i));
 
srx = &alist->addrs[i];
+   srx->srx_family = AF_RXRPC;
+   srx->transport_type = SOCK_DGRAM;
srx->transport_len = sizeof(srx->transport.sin6);
srx->transport.sin6.sin6_family = AF_INET6;
srx->transport.sin6.sin6_port = htons(port);
@@ -353,23 +357,32 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port)
  */
 bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 {
-   _enter("%hu+%hd", ac->start, (short)ac->index);
+   unsigned long set, failed;
+   int index;
 
if (!ac->alist)
return false;
 
+   set = ac->alist->responded;
+   failed = ac->alist->failed;
+   _enter("%lx-%lx-%lx,%d", set, failed, ac->tried, ac->index);
+
ac->nr_iterations++;
 
-   if (ac->begun) {
-   ac->index++;
-   if (ac->index == ac->alist->nr_addrs)
-   ac->index = 0;
+   set &= ~(failed | ac->tried);
 
-   if (ac->index == ac->start)
-   return false;
-   }
+   if (!set)
+   return false;
+
+   index = READ_ONCE(ac->alist->preferred);
+   if (test_bit(index, &set))
+   goto selected;
+
+   index = __ffs(set);
 
-   ac->begun = true;
+selected:
+   ac->index = index;
+   set_bit(index, &ac->tried);
ac->responded = false;
return true;
 }
@@ -383,12 +396,13 @@ int afs_end_cursor(struct afs_addr_cursor *ac)
 
alist = ac->alist;
if (alist) {
-   if (ac->responded && ac->index != ac->start)
-   WRITE_ONCE(alist->index, ac->index);
+   if (ac->responded &&
+   ac->index != alist->preferred &&
+   test_bit(ac->alist->preferred, &ac->tried))
+   WRITE_ONCE(alist->preferred, ac->index);
afs_put_addrlist(alist);
+   ac->alist = NULL;
}
 
-   ac->alist = NULL;
-   ac->begun = false;
return ac->error;
 }
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 8cf8d10daa6c..8ee5972893ed 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -122,6 +122,8 @@ bool afs_cm_incoming_call(struct afs_call *call)
 {
_enter("{%u, CB.OP %u}",

[PATCH 10/25] afs: Handle EIO from delivery function [ver #2]

2018-10-23 Thread David Howells

Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.

Currently, an assertion failure will occur.  This can happen, say, if the
delivery function falls off the end without calling afs_extract_data() with
the want_more parameter set to false to collect the end of the Rx phase of
a call.

The assertion failure looks like:

AFS: Assertion failed
4 == 7 is false
0x4 == 0x7 is false
------------[ cut here ]------------
kernel BUG at fs/afs/rxrpc.c:462!

and is matched in the trace buffer by a line like:

kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY
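
The reordering of the switch cases can be illustrated outside the kernel.
The following is a sketch only: classify_delivery_result() and the outcome
enum are made up for illustration, and fprintf() stands in for pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

enum outcome { OUT_WAIT, OUT_DONE, OUT_ABORT };

/* Hypothetical reduction of the result switch in afs_deliver_to_call(). */
static enum outcome classify_delivery_result(int ret, unsigned int debug_id,
					     unsigned int state)
{
	assert(ret <= 0);

	switch (ret) {
	case 0:
		return OUT_DONE;
	case -EINPROGRESS:
	case -EAGAIN:
		return OUT_WAIT;	/* more data is needed */
	case -ECONNABORTED:
		return OUT_DONE;	/* the call is already complete */
	case -EIO:
		/* Report the bad state, then abort like any unmarshal error. */
		fprintf(stderr, "Call %u in bad state %u\n", debug_id, state);
		/* Fall through */
	case -ENODATA:
	case -EBADMSG:
	case -EMSGSIZE:
	default:
		return OUT_ABORT;
	}
}
```

With -EIO moved next to the abort cases, the state assertion is no longer
reached for it.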

Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne 
Signed-off-by: David Howells 
---

 fs/afs/rxrpc.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index a3904a8315de..947ae3ab389b 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -499,7 +499,6 @@ static void afs_deliver_to_call(struct afs_call *call)
case -EINPROGRESS:
case -EAGAIN:
goto out;
-   case -EIO:
case -ECONNABORTED:
ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
goto done;
@@ -508,6 +507,10 @@ static void afs_deliver_to_call(struct afs_call *call)
rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
abort_code, ret, "KIV");
goto local_abort;
+   case -EIO:
+   pr_err("kAFS: Call %u in bad state %u\n",
+  call->debug_id, state);
+   /* Fall through */
case -ENODATA:
case -EBADMSG:
case -EMSGSIZE:

[PATCH 16/25] afs: Implement the YFS cache manager service [ver #2]

2018-10-23 Thread David Howells

Implement the YFS cache manager service which gives extra capabilities on
top of AFS.  This is done by listening for an additional service on the
same port and indicating that anyone requesting an upgrade should be
upgraded to the YFS port.
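
As a rough sketch of the routing this implies, an incoming call is matched
on its operation ID and YFS-only operations are refused unless they arrived
on the YFS service.  All names and numbers below are placeholders chosen for
illustration, not the values used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder numbers for illustration only; not taken from the patch. */
enum { DEMO_CM_SERVICE = 1, DEMO_YFS_CM_SERVICE = 2 };
enum { DEMO_OP_CALLBACK = 100, DEMO_OP_YFS_CALLBACK = 200 };

struct demo_call {
	unsigned int service_id;	/* service the call arrived on */
	unsigned int operation_id;	/* requested operation */
	const char *type;		/* selected handler name, or NULL */
};

/* Return true if the operation is supported on the call's service. */
static bool demo_route_incoming_call(struct demo_call *call)
{
	assert(call != NULL);

	switch (call->operation_id) {
	case DEMO_OP_CALLBACK:
		call->type = "CB.CallBack";
		return true;
	case DEMO_OP_YFS_CALLBACK:
		/* YFS operations are only valid on the YFS service. */
		if (call->service_id != DEMO_YFS_CM_SERVICE)
			return false;
		call->type = "CB.YFS_CallBack";
		return true;
	default:
		return false;
	}
}
```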

Signed-off-by: David Howells 
---

 fs/afs/cmservice.c|  103 +
 fs/afs/protocol_yfs.h |   57 +++
 fs/afs/rxrpc.c|   15 +++
 3 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 fs/afs/protocol_yfs.h

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index fc0010d800a0..8cf8d10daa6c 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -16,6 +16,7 @@
 #include 
 #include "internal.h"
 #include "afs_cm.h"
+#include "protocol_yfs.h"
 
 static int afs_deliver_cb_init_call_back_state(struct afs_call *);
 static int afs_deliver_cb_init_call_back_state3(struct afs_call *);
@@ -30,6 +31,8 @@ static void SRXAFSCB_Probe(struct work_struct *);
 static void SRXAFSCB_ProbeUuid(struct work_struct *);
 static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
 
+static int afs_deliver_yfs_cb_callback(struct afs_call *);
+
 #define CM_NAME(name) \
const char afs_SRXCB##name##_name[] __tracepoint_string =   \
"CB." #name
@@ -100,13 +103,24 @@ static const struct afs_call_type 
afs_SRXCBTellMeAboutYourself = {
.work   = SRXAFSCB_TellMeAboutYourself,
 };
 
+/*
+ * YFS CB.CallBack operation type
+ */
+static CM_NAME(YFS_CallBack);
+static const struct afs_call_type afs_SRXYFSCB_CallBack = {
+   .name   = afs_SRXCBYFS_CallBack_name,
+   .deliver= afs_deliver_yfs_cb_callback,
+   .destructor = afs_cm_destructor,
+   .work   = SRXAFSCB_CallBack,
+};
+
 /*
  * route an incoming cache manager call
  * - return T if supported, F if not
  */
 bool afs_cm_incoming_call(struct afs_call *call)
 {
-   _enter("{CB.OP %u}", call->operation_ID);
+   _enter("{%u, CB.OP %u}", call->service_id, call->operation_ID);
 
switch (call->operation_ID) {
case CBCallBack:
@@ -127,6 +141,11 @@ bool afs_cm_incoming_call(struct afs_call *call)
case CBTellMeAboutYourself:
call->type = &afs_SRXCBTellMeAboutYourself;
return true;
+   case YFSCBCallBack:
+   if (call->service_id != YFS_CM_SERVICE)
+   return false;
+   call->type = &afs_SRXYFSCB_CallBack;
+   return true;
default:
return false;
}
@@ -570,3 +589,85 @@ static int afs_deliver_cb_tell_me_about_yourself(struct 
afs_call *call)
 
return afs_queue_call_work(call);
 }
+
+/*
+ * deliver request data to a YFS CB.CallBack call
+ */
+static int afs_deliver_yfs_cb_callback(struct afs_call *call)
+{
+   struct afs_callback_break *cb;
+   struct sockaddr_rxrpc srx;
+   struct yfs_xdr_YFSFid *bp;
+   size_t size;
+   int ret, loop;
+
+   _enter("{%u}", call->unmarshall);
+
+   switch (call->unmarshall) {
+   case 0:
+   afs_extract_to_tmp(call);
+   call->unmarshall++;
+
+   /* extract the FID array and its count in two steps */
+   case 1:
+   _debug("extract FID count");
+   ret = afs_extract_data(call, true);
+   if (ret < 0)
+   return ret;
+
+   call->count = ntohl(call->tmp);
+   _debug("FID count: %u", call->count);
+   if (call->count > YFSCBMAX)
+   return afs_protocol_error(call, -EBADMSG,
+ afs_eproto_cb_fid_count);
+
+   size = array_size(call->count, sizeof(struct yfs_xdr_YFSFid));
+   call->buffer = kmalloc(size, GFP_KERNEL);
+   if (!call->buffer)
+   return -ENOMEM;
+   afs_extract_to_buf(call, size);
+   call->unmarshall++;
+
+   case 2:
+   _debug("extract FID array");
+   ret = afs_extract_data(call, false);
+   if (ret < 0)
+   return ret;
+
+   _debug("unmarshall FID array");
+   call->request = kcalloc(call->count,
+   sizeof(struct afs_callback_break),
+   GFP_KERNEL);
+   if (!call->request)
+   return -ENOMEM;
+
+   cb = call->request;
+   bp = call->buffer;
+   for (loop = call->count; loop > 0; loop--, cb++) {
+   cb->fid.vid = xdr_to_u64(bp->volume);
+   cb->fid.vnode   = xdr_to_u64(bp->vnode.lo);
+   cb->fid.vnode_hi = ntohl(bp->vnode.hi);
+   cb->fid.unique  = ntohl(bp->vnode.unique);
+   bp++;
+   }
+
+   afs_extract_to_tmp(call);

[PATCH 23/25] afs: Eliminate the address pointer from the address list cursor [ver #2]

2018-10-23 Thread David Howells

Eliminate the address pointer from the address list cursor as it's
redundant (ac->alist->addrs[ac->index] can be used to find the same address) and
address lists must be replaced rather than being rearranged, so is of
limited value.
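
A small userspace sketch of the idea, using made-up demo_* types rather than
the real structures: the address is derived from the list pointer and index
on demand, so it cannot go stale when the list is replaced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down types for illustration. */
struct demo_addr {
	unsigned short port;
};

struct demo_addr_list {
	unsigned int nr_addrs;
	struct demo_addr addrs[4];
};

struct demo_cursor {
	struct demo_addr_list *alist;	/* current address list */
	unsigned short index;		/* current position in alist->addrs[] */
};

/* Look the address up each time instead of caching a pointer to it. */
static struct demo_addr *demo_cursor_addr(const struct demo_cursor *ac)
{
	assert(ac->alist != NULL && ac->index < ac->alist->nr_addrs);
	return &ac->alist->addrs[ac->index];
}
```

A cached pointer would still refer to the old list after a replacement,
whereas the lookup follows whatever list the cursor currently holds.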

Signed-off-by: David Howells 
---

 fs/afs/addr_list.c |2 --
 fs/afs/internal.h  |1 -
 fs/afs/rxrpc.c |2 +-
 fs/afs/server.c|2 --
 fs/afs/vl_rotate.c |2 +-
 fs/afs/volume.c|6 +++---
 6 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index bc5ce31a4ae4..1536d1d21c33 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -371,7 +371,6 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
 
ac->begun = true;
ac->responded = false;
-   ac->addr = &ac->alist->addrs[ac->index];
return true;
 }
 
@@ -389,7 +388,6 @@ int afs_end_cursor(struct afs_addr_cursor *ac)
afs_put_addrlist(alist);
}
 
-   ac->addr = NULL;
ac->alist = NULL;
ac->begun = false;
return ac->error;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ac9da1e4050e..e5b596bd8acf 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -653,7 +653,6 @@ struct afs_interface {
  */
 struct afs_addr_cursor {
struct afs_addr_list*alist; /* Current address list (pins 
ref) */
-   struct sockaddr_rxrpc   *addr;
u32 abort_code;
unsigned short  start;  /* Starting point in 
alist->addrs[] */
unsigned short  index;  /* Wrapping offset from start 
to current addr */
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 444ba0d511ef..42e1ea7372e9 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -359,7 +359,7 @@ static int afs_send_pages(struct afs_call *call, struct 
msghdr *msg)
 long afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call,
   gfp_t gfp, bool async)
 {
-   struct sockaddr_rxrpc *srx = ac->addr;
+   struct sockaddr_rxrpc *srx = &ac->alist->addrs[ac->index];
struct rxrpc_call *rxcall;
struct msghdr msg;
struct kvec iov[1];
diff --git a/fs/afs/server.c b/fs/afs/server.c
index aa35cfae5440..7c1be8b4dc9a 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -367,7 +367,6 @@ static void afs_destroy_server(struct afs_net *net, struct 
afs_server *server)
.alist  = alist,
.start  = alist->index,
.index  = 0,
-   .addr   = &alist->addrs[alist->index],
.error  = 0,
};
_enter("%p", server);
@@ -518,7 +517,6 @@ static bool afs_do_probe_fileserver(struct afs_fs_cursor 
*fc)
 
_enter("");
 
-   fc->ac.addr = NULL;
fc->ac.start = READ_ONCE(fc->ac.alist->index);
fc->ac.index = fc->ac.start;
fc->ac.error = 0;
diff --git a/fs/afs/vl_rotate.c b/fs/afs/vl_rotate.c
index 5b99ea7be194..ead6dedbb561 100644
--- a/fs/afs/vl_rotate.c
+++ b/fs/afs/vl_rotate.c
@@ -209,7 +209,7 @@ bool afs_select_vlserver(struct afs_vl_cursor *vc)
if (!afs_iterate_addresses(&vc->ac))
goto next_server;
 
-   _leave(" = t %pISpc", &vc->ac.addr->transport);
+   _leave(" = t %pISpc", &vc->ac.alist->addrs[vc->ac.index].transport);
return true;
 
 next_server:
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index f0020e35bf6f..7527c081726e 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -88,16 +88,16 @@ static struct afs_vldb_entry *afs_vl_lookup_vldb(struct 
afs_cell *cell,
case VL_SERVICE:
clear_bit(vc.ac.index, &vc.ac.alist->yfs);
set_bit(vc.ac.index, &vc.ac.alist->probed);
-   vc.ac.addr->srx_service = ret;
+   vc.ac.alist->addrs[vc.ac.index].srx_service = 
ret;
break;
case YFS_VL_SERVICE:
set_bit(vc.ac.index, &vc.ac.alist->yfs);
set_bit(vc.ac.index, &vc.ac.alist->probed);
-   vc.ac.addr->srx_service = ret;
+   vc.ac.alist->addrs[vc.ac.index].srx_service = 
ret;
break;
}
}
-   
+
vldb = afs_vl_get_entry_by_name_u(&vc, volname, volnamesz);
}

[PATCH 07/25] afs: Improve FS server rotation error handling [ver #2]

2018-10-23 Thread David Howells

Improve the error handling in FS server rotation by:

 (1) Cache the latest useful error value for the fs operation as a whole in
 struct afs_fs_cursor separately from the error cached in the
 afs_addr_cursor struct.  The one in the address cursor gets clobbered
 occasionally.  Copy over the error to the fs operation only when it's
 something we'd be interested in passing to userspace.

 (2) Make it so that EDESTADDRREQ is the default that is seen only if no
 addresses are available to be accessed.

 (3) When calling utility functions, such as checking a volume status or
 probing a fileserver, don't let a successful result clobber the cached
 error in the cursor; instead, stash the result in a temporary variable
 until it has been assessed.

 (4) Don't return ETIMEDOUT or ETIME if a better error, such as
 ENETUNREACH, is already cached.

 (5) On leaving the rotation loop, turn any remote abort code into a more
 useful error than ECONNABORTED.
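
Points (2) and (4) can be sketched as a small merge rule.  This is one
possible reading of the description, not the code from the patch:
demo_merge_error() is a made-up helper, and it assumes that a timeout is
only recorded while the cached error is still the EDESTADDRREQ default.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: fold the latest per-address error into the error
 * cached for the operation as a whole.
 */
static int demo_merge_error(int cached, int latest)
{
	assert(latest < 0);

	switch (latest) {
	case -ETIMEDOUT:
	case -ETIME:
		/* Keep any more specific error that is already cached. */
		if (cached != -EDESTADDRREQ)
			return cached;
		return latest;
	default:
		return latest;
	}
}
```

Starting from -EDESTADDRREQ, a timeout followed by -ENETUNREACH and then
another timeout leaves -ENETUNREACH as the error to report.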

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and 
fileserver rotation")
Signed-off-by: David Howells 
---

 fs/afs/addr_list.c |4 +-
 fs/afs/internal.h  |1 +
 fs/afs/rotate.c|   95 +---
 3 files changed, 55 insertions(+), 45 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 55a756c60746..7b34fad4f8f5 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -318,10 +318,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
if (ac->index == ac->alist->nr_addrs)
ac->index = 0;
 
-   if (ac->index == ac->start) {
-   ac->error = -EDESTADDRREQ;
+   if (ac->index == ac->start)
return false;
-   }
}
 
ac->begun = true;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 36e9cc74ac11..81936a4d5035 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -629,6 +629,7 @@ struct afs_fs_cursor {
unsigned intcb_break_2; /* cb_break + cb_s_break (2nd 
vnode) */
unsigned char   start;  /* Initial index in server list 
*/
unsigned char   index;  /* Number of servers tried 
beyond start */
+   short   error;
unsigned short  flags;
 #define AFS_FS_CURSOR_STOP 0x0001  /* Set to cease iteration */
 #define AFS_FS_CURSOR_VBUSY0x0002  /* Set if seen VBUSY */
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index 1faef56b12bd..d7cbc3c230ee 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -39,9 +39,10 @@ bool afs_begin_vnode_operation(struct afs_fs_cursor *fc, 
struct afs_vnode *vnode
fc->vnode = vnode;
fc->key = key;
fc->ac.error = SHRT_MAX;
+   fc->error = -EDESTADDRREQ;
 
if (mutex_lock_interruptible(&vnode->io_lock) < 0) {
-   fc->ac.error = -EINTR;
+   fc->error = -EINTR;
fc->flags |= AFS_FS_CURSOR_STOP;
return false;
}
@@ -80,7 +81,7 @@ static bool afs_start_fs_iteration(struct afs_fs_cursor *fc,
 * and have to return an error.
 */
if (fc->flags & AFS_FS_CURSOR_CUR_ONLY) {
-   fc->ac.error = -ESTALE;
+   fc->error = -ESTALE;
return false;
}
 
@@ -127,7 +128,7 @@ static bool afs_sleep_and_retry(struct afs_fs_cursor *fc)
 {
msleep_interruptible(1000);
if (signal_pending(current)) {
-   fc->ac.error = -ERESTARTSYS;
+   fc->error = -ERESTARTSYS;
return false;
}
 
@@ -143,11 +144,12 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
struct afs_addr_list *alist;
struct afs_server *server;
struct afs_vnode *vnode = fc->vnode;
+   int error = fc->ac.error;
 
_enter("%u/%u,%u/%u,%d,%d",
   fc->index, fc->start,
   fc->ac.index, fc->ac.start,
-  fc->ac.error, fc->ac.abort_code);
+  error, fc->ac.abort_code);
 
if (fc->flags & AFS_FS_CURSOR_STOP) {
_leave(" = f [stopped]");
@@ -155,15 +157,16 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
}
 
/* Evaluate the result of the previous operation, if there was one. */
-   switch (fc->ac.error) {
+   switch (error) {
case SHRT_MAX:
goto start;
 
case 0:
default:
/* Success or local failure.  Stop. */
+   fc->error = error;
fc->flags |= AFS_FS_CURSOR_STOP;
-   _leave(" = f [okay/local %d]", fc->ac.error);
+   _leave(" = f [okay/local %d]", error);
return false;
 
case -ECONNABORTED:
@@ -178,7 +181,7 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)

Re: [PATCH] scsi/pmcraid.c: Use dma_pool_zalloc

2018-10-23 Thread Souptick Joarder

On Mon, Oct 8, 2018 at 9:58 PM Souptick Joarder  wrote:
>
> On Tue, Oct 2, 2018 at 10:53 AM Souptick Joarder  wrote:
> >
> > Replaced dma_pool_alloc + memset with dma_pool_zalloc.
> >
> > Signed-off-by: Sabyasachi Gupta 
> > Signed-off-by: Souptick Joarder 
>
> Any comment on this patch ?

Any comment on this patch ?

>
> > ---
> >  drivers/scsi/pmcraid.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
> > index 4e86994..84a2734 100644
> > --- a/drivers/scsi/pmcraid.c
> > +++ b/drivers/scsi/pmcraid.c
> > @@ -4681,7 +4681,7 @@ static int pmcraid_allocate_control_blocks(struct 
> > pmcraid_instance *pinstance)
> >
> > for (i = 0; i < PMCRAID_MAX_CMD; i++) {
> > pinstance->cmd_list[i]->ioa_cb =
> > -   dma_pool_alloc(
> > +   dma_pool_zalloc(
> > pinstance->control_pool,
> > GFP_KERNEL,
> > &(pinstance->cmd_list[i]->ioa_cb_bus_addr));
> > @@ -4690,8 +4690,6 @@ static int pmcraid_allocate_control_blocks(struct 
> > pmcraid_instance *pinstance)
> > pmcraid_release_control_blocks(pinstance, i);
> > return -ENOMEM;
> > }
> > -   memset(pinstance->cmd_list[i]->ioa_cb, 0,
> > -   sizeof(struct pmcraid_control_block));
> > }
> > return 0;
> >  }
> > --
> > 1.9.1
> >

[PATCH 21/25] afs: Implement YFS support in the fs client [ver #2]

2018-10-23 Thread David Howells

Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.
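
The afs_unlink() change in this patch tries the newer YFS operation first
and falls back to the older one if the server rejects the opcode.  The
pattern can be sketched as below; the demo_* names, the fake transport and
the abort code value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ABORT_BAD_OPCODE 1	/* placeholder, not the real RXGEN_OPCODE */

struct demo_server {
	bool is_yfs;	/* server offers the YFS service */
	bool no_rm2;	/* server is known not to support the new op */
};

struct demo_result {
	int error;
	int abort_code;
};

/* Fake transport: the server aborts if it does not know the new op. */
static struct demo_result demo_call_new_op(bool server_supports_it)
{
	struct demo_result r = { 0, 0 };

	if (!server_supports_it) {
		r.error = -ECONNABORTED;
		r.abort_code = DEMO_ABORT_BAD_OPCODE;
	}
	return r;
}

/* Fake transport: the old op always succeeds in this sketch. */
static struct demo_result demo_call_old_op(void)
{
	struct demo_result r = { 0, 0 };

	return r;
}

/* Returns 2 if the new operation was used, 1 if the old one was. */
static int demo_remove(struct demo_server *srv, bool server_supports_new)
{
	assert(srv != NULL);

	if (srv->is_yfs && !srv->no_rm2) {
		struct demo_result r = demo_call_new_op(server_supports_new);

		if (r.error != -ECONNABORTED ||
		    r.abort_code != DEMO_ABORT_BAD_OPCODE)
			return 2;	/* definite answer from the new op */
		srv->no_rm2 = true;	/* do not try it on this server again */
	}

	(void)demo_call_old_op();
	return 1;
}
```

Recording the rejection in a per-server flag means the extra round trip is
paid at most once per server.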

Signed-off-by: David Howells 
---

 fs/afs/Makefile|3 
 fs/afs/callback.c  |9 
 fs/afs/dir.c   |   21 
 fs/afs/fsclient.c  |  104 ++
 fs/afs/internal.h  |   35 +
 fs/afs/protocol_yfs.h  |  106 ++
 fs/afs/server.c|8 
 fs/afs/yfsclient.c | 2184 
 include/trace/events/afs.h |   58 +
 9 files changed, 2500 insertions(+), 28 deletions(-)
 create mode 100644 fs/afs/yfsclient.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 03e9f7afea1b..cc942b790cff 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -33,7 +33,8 @@ kafs-y := \
vl_list.o \
volume.o \
write.o \
-   xattr.o
+   xattr.o \
+   yfsclient.o
 
 kafs-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_AFS_FS)  := kafs.o
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index df9bfee698ad..1c7955f5cdaf 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -210,12 +210,10 @@ void afs_init_callback_state(struct afs_server *server)
 /*
  * actually break a callback
  */
-void afs_break_callback(struct afs_vnode *vnode)
+void __afs_break_callback(struct afs_vnode *vnode)
 {
_enter("");
 
-   write_seqlock(&vnode->cb_lock);
-
clear_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) {
vnode->cb_break++;
@@ -230,7 +228,12 @@ void afs_break_callback(struct afs_vnode *vnode)
afs_lock_may_be_available(vnode);
spin_unlock(&vnode->lock);
}
+}
 
+void afs_break_callback(struct afs_vnode *vnode)
+{
+   write_seqlock(&vnode->cb_lock);
+   __afs_break_callback(vnode);
write_sequnlock(&vnode->cb_lock);
 }
 
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index f2dd48d4363f..43dea3b00c29 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1200,7 +1200,7 @@ static int afs_rmdir(struct inode *dir, struct dentry 
*dentry)
if (afs_begin_vnode_operation(&fc, dvnode, key)) {
while (afs_select_fileserver(&fc)) {
fc.cb_break = afs_calc_vnode_cb_break(dvnode);
-   afs_fs_remove(&fc, dentry->d_name.name, true,
+   afs_fs_remove(&fc, vnode, dentry->d_name.name, true,
  data_version);
}
 
@@ -1245,7 +1245,9 @@ static int afs_dir_remove_link(struct dentry *dentry, 
struct key *key,
if (d_really_is_positive(dentry)) {
struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
 
-   if (dir_valid) {
+   if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) {
+   /* Already done */
+   } else if (dir_valid) {
drop_nlink(&vnode->vfs_inode);
if (vnode->vfs_inode.i_nlink == 0) {
set_bit(AFS_VNODE_DELETED, &vnode->flags);
@@ -1274,7 +1276,7 @@ static int afs_dir_remove_link(struct dentry *dentry, 
struct key *key,
 static int afs_unlink(struct inode *dir, struct dentry *dentry)
 {
struct afs_fs_cursor fc;
-   struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode;
+   struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL;
struct key *key;
unsigned long d_version = (unsigned long)dentry->d_fsdata;
u64 data_version = dvnode->status.data_version;
@@ -1304,7 +1306,18 @@ static int afs_unlink(struct inode *dir, struct dentry 
*dentry)
if (afs_begin_vnode_operation(&fc, dvnode, key)) {
while (afs_select_fileserver(&fc)) {
fc.cb_break = afs_calc_vnode_cb_break(dvnode);
-   afs_fs_remove(&fc, dentry->d_name.name, false,
+
+   if (test_bit(AFS_SERVER_FL_IS_YFS, &fc.cbi->server->flags) &&
+   !test_bit(AFS_SERVER_FL_NO_RM2, &fc.cbi->server->flags)) {
+   yfs_fs_remove_file2(&fc, vnode, dentry->d_name.name,
+   data_version);
+   if (fc.ac.error != -ECONNABORTED ||
+   fc.ac.abort_code != RXGEN_OPCODE)
+   continue;
+   set_bit(AFS_SERVER_FL_NO_RM2, &fc.cbi->server->flags);
+   }
+
+   afs_fs_remove(&fc, vnode, dentry->d_name.name, false,
  data_version);
}
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 2da65309e0de..3975969719de 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -17,6 +17,7 @@
 #include "internal.h"
 #include "afs_fs.h"
 #include "xdr_fs.h"
+#include "protocol_yfs.h"
 
 static const struct afs_fid afs_zero_fid;

[PATCH 07/25] afs: Improve FS server rotation error handling [ver #2]

2018-10-23 Thread David Howells

Improve the error handling in FS server rotation by:

 (1) Cache the latest useful error value for the fs operation as a whole in
 struct afs_fs_cursor separately from the error cached in the
 afs_addr_cursor struct.  The one in the address cursor gets clobbered
 occasionally.  Copy over the error to the fs operation only when it's
 something we'd be interested in passing to userspace.

 (2) Make it so that EDESTADDRREQ is the default that is seen only if no
 addresses are available to be accessed.

 (3) When calling utility functions, such as checking a volume status or
 probing a fileserver, don't let a successful result clobber the cached
 error in the cursor; instead, stash the result in a temporary variable
 until it has been assessed.

 (4) Don't return ETIMEDOUT or ETIME if a better error, such as
 ENETUNREACH, is already cached.

 (5) On leaving the rotation loop, turn any remote abort code into a more
 useful error than ECONNABORTED.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and 
fileserver rotation")
Signed-off-by: David Howells 
---

 fs/afs/addr_list.c |4 +-
 fs/afs/internal.h  |1 +
 fs/afs/rotate.c|   95 +---
 3 files changed, 55 insertions(+), 45 deletions(-)

diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c
index 55a756c60746..7b34fad4f8f5 100644
--- a/fs/afs/addr_list.c
+++ b/fs/afs/addr_list.c
@@ -318,10 +318,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac)
if (ac->index == ac->alist->nr_addrs)
ac->index = 0;
 
-   if (ac->index == ac->start) {
-   ac->error = -EDESTADDRREQ;
+   if (ac->index == ac->start)
return false;
-   }
}
 
ac->begun = true;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 36e9cc74ac11..81936a4d5035 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -629,6 +629,7 @@ struct afs_fs_cursor {
unsigned intcb_break_2; /* cb_break + cb_s_break (2nd 
vnode) */
unsigned char   start;  /* Initial index in server list 
*/
unsigned char   index;  /* Number of servers tried 
beyond start */
+   short   error;
unsigned short  flags;
 #define AFS_FS_CURSOR_STOP 0x0001  /* Set to cease iteration */
 #define AFS_FS_CURSOR_VBUSY0x0002  /* Set if seen VBUSY */
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index 1faef56b12bd..d7cbc3c230ee 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -39,9 +39,10 @@ bool afs_begin_vnode_operation(struct afs_fs_cursor *fc, 
struct afs_vnode *vnode
fc->vnode = vnode;
fc->key = key;
fc->ac.error = SHRT_MAX;
+   fc->error = -EDESTADDRREQ;
 
if (mutex_lock_interruptible(>io_lock) < 0) {
-   fc->ac.error = -EINTR;
+   fc->error = -EINTR;
fc->flags |= AFS_FS_CURSOR_STOP;
return false;
}
@@ -80,7 +81,7 @@ static bool afs_start_fs_iteration(struct afs_fs_cursor *fc,
 * and have to return an error.
 */
if (fc->flags & AFS_FS_CURSOR_CUR_ONLY) {
-   fc->ac.error = -ESTALE;
+   fc->error = -ESTALE;
return false;
}
 
@@ -127,7 +128,7 @@ static bool afs_sleep_and_retry(struct afs_fs_cursor *fc)
 {
msleep_interruptible(1000);
if (signal_pending(current)) {
-   fc->ac.error = -ERESTARTSYS;
+   fc->error = -ERESTARTSYS;
return false;
}
 
@@ -143,11 +144,12 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
struct afs_addr_list *alist;
struct afs_server *server;
struct afs_vnode *vnode = fc->vnode;
+   int error = fc->ac.error;
 
_enter("%u/%u,%u/%u,%d,%d",
   fc->index, fc->start,
   fc->ac.index, fc->ac.start,
-  fc->ac.error, fc->ac.abort_code);
+  error, fc->ac.abort_code);
 
if (fc->flags & AFS_FS_CURSOR_STOP) {
_leave(" = f [stopped]");
@@ -155,15 +157,16 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
}
 
/* Evaluate the result of the previous operation, if there was one. */
-   switch (fc->ac.error) {
+   switch (error) {
case SHRT_MAX:
goto start;
 
case 0:
default:
/* Success or local failure.  Stop. */
+   fc->error = error;
fc->flags |= AFS_FS_CURSOR_STOP;
-   _leave(" = f [okay/local %d]", fc->ac.error);
+   _leave(" = f [okay/local %d]", error);
return false;
 
case -ECONNABORTED:
@@ -178,7 +181,7 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)

Re: [PATCH] scsi/pmcraid.c: Use dma_pool_zalloc

2018-10-23 Thread Souptick Joarder

On Mon, Oct 8, 2018 at 9:58 PM Souptick Joarder  wrote:
>
> On Tue, Oct 2, 2018 at 10:53 AM Souptick Joarder  wrote:
> >
> > Replaced dma_pool_alloc + memset with dma_pool_zalloc.
> >
> > Signed-off-by: Sabyasachi Gupta 
> > Signed-off-by: Souptick Joarder 
>
> Any comment on this patch ?

Any comment on this patch ?

>
> > ---
> >  drivers/scsi/pmcraid.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
> > index 4e86994..84a2734 100644
> > --- a/drivers/scsi/pmcraid.c
> > +++ b/drivers/scsi/pmcraid.c
> > @@ -4681,7 +4681,7 @@ static int pmcraid_allocate_control_blocks(struct 
> > pmcraid_instance *pinstance)
> >
> > for (i = 0; i < PMCRAID_MAX_CMD; i++) {
> > pinstance->cmd_list[i]->ioa_cb =
> > -   dma_pool_alloc(
> > +   dma_pool_zalloc(
> > pinstance->control_pool,
> > GFP_KERNEL,
> > &(pinstance->cmd_list[i]->ioa_cb_bus_addr));
> > @@ -4690,8 +4690,6 @@ static int pmcraid_allocate_control_blocks(struct 
> > pmcraid_instance *pinstance)
> > pmcraid_release_control_blocks(pinstance, i);
> > return -ENOMEM;
> > }
> > -   memset(pinstance->cmd_list[i]->ioa_cb, 0,
> > -   sizeof(struct pmcraid_control_block));
> > }
> > return 0;
> >  }
> > --
> > 1.9.1
> >

[PATCH 21/25] afs: Implement YFS support in the fs client [ver #2]

2018-10-23 Thread David Howells

Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells 
---

 fs/afs/Makefile|3 
 fs/afs/callback.c  |9 
 fs/afs/dir.c   |   21 
 fs/afs/fsclient.c  |  104 ++
 fs/afs/internal.h  |   35 +
 fs/afs/protocol_yfs.h  |  106 ++
 fs/afs/server.c|8 
 fs/afs/yfsclient.c | 2184 
 include/trace/events/afs.h |   58 +
 9 files changed, 2500 insertions(+), 28 deletions(-)
 create mode 100644 fs/afs/yfsclient.c

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 03e9f7afea1b..cc942b790cff 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -33,7 +33,8 @@ kafs-y := \
vl_list.o \
volume.o \
write.o \
-   xattr.o
+   xattr.o \
+   yfsclient.o
 
 kafs-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_AFS_FS)  := kafs.o
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index df9bfee698ad..1c7955f5cdaf 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -210,12 +210,10 @@ void afs_init_callback_state(struct afs_server *server)
 /*
  * actually break a callback
  */
-void afs_break_callback(struct afs_vnode *vnode)
+void __afs_break_callback(struct afs_vnode *vnode)
 {
_enter("");
 
-   write_seqlock(>cb_lock);
-
clear_bit(AFS_VNODE_NEW_CONTENT, >flags);
if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, >flags)) {
vnode->cb_break++;
@@ -230,7 +228,12 @@ void afs_break_callback(struct afs_vnode *vnode)
afs_lock_may_be_available(vnode);
spin_unlock(>lock);
}
+}
 
+void afs_break_callback(struct afs_vnode *vnode)
+{
+   write_seqlock(>cb_lock);
+   __afs_break_callback(vnode);
write_sequnlock(>cb_lock);
 }
 
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index f2dd48d4363f..43dea3b00c29 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1200,7 +1200,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry)
if (afs_begin_vnode_operation(&fc, dvnode, key)) {
while (afs_select_fileserver(&fc)) {
fc.cb_break = afs_calc_vnode_cb_break(dvnode);
-   afs_fs_remove(&fc, dentry->d_name.name, true,
+   afs_fs_remove(&fc, vnode, dentry->d_name.name, true,
  data_version);
}
 
@@ -1245,7 +1245,9 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key,
if (d_really_is_positive(dentry)) {
struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
 
-   if (dir_valid) {
+   if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) {
+   /* Already done */
+   } else if (dir_valid) {
drop_nlink(&vnode->vfs_inode);
if (vnode->vfs_inode.i_nlink == 0) {
set_bit(AFS_VNODE_DELETED, &vnode->flags);
@@ -1274,7 +1276,7 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key,
 static int afs_unlink(struct inode *dir, struct dentry *dentry)
 {
struct afs_fs_cursor fc;
-   struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode;
+   struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL;
struct key *key;
unsigned long d_version = (unsigned long)dentry->d_fsdata;
u64 data_version = dvnode->status.data_version;
@@ -1304,7 +1306,18 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry)
if (afs_begin_vnode_operation(&fc, dvnode, key)) {
while (afs_select_fileserver(&fc)) {
fc.cb_break = afs_calc_vnode_cb_break(dvnode);
-   afs_fs_remove(&fc, dentry->d_name.name, false,
+
+   if (test_bit(AFS_SERVER_FL_IS_YFS, &fc.cbi->server->flags) &&
+   !test_bit(AFS_SERVER_FL_NO_RM2, &fc.cbi->server->flags)) {
+   yfs_fs_remove_file2(&fc, vnode, dentry->d_name.name,
+   data_version);
+   if (fc.ac.error != -ECONNABORTED ||
+   fc.ac.abort_code != RXGEN_OPCODE)
+   continue;
+   set_bit(AFS_SERVER_FL_NO_RM2, &fc.cbi->server->flags);
+   }
+
+   afs_fs_remove(&fc, vnode, dentry->d_name.name, false,
  data_version);
}
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 2da65309e0de..3975969719de 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -17,6 +17,7 @@
 #include "internal.h"
 #include "afs_fs.h"
 #include "xdr_fs.h"
+#include "protocol_yfs.h"
 
 static const struct afs_fid afs_zero_fid;

[PATCH 18/25] afs: Calc callback expiry in op reply delivery [ver #2]

2018-10-23 Thread David Howells

Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.
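
As a rough illustration of the calculation (a userspace sketch, not the
kernel code): the reply carries a relative expiry in seconds, and the
absolute expiry is the reply timestamp plus that interval. The demo_* names
are hypothetical. Widening to 64 bits before the multiply is an assumption
about how to keep the product from overflowing, since a 32-bit second count
times 10^9 does not fit in 32 bits.

```c
#include <stdint.h>

/* Nanoseconds per second, kept as a 64-bit constant so the multiply below
 * is always carried out in 64-bit arithmetic. */
#define DEMO_NSEC_PER_SEC INT64_C(1000000000)

/*
 * Hypothetical helper: turn a relative expiry (seconds, as carried in the
 * reply) into an absolute time in nanoseconds, based on the time at which
 * the reply was received.
 */
static int64_t demo_expiry_ns(int64_t reply_time_ns, uint32_t expiry_secs)
{
	return reply_time_ns + (int64_t)expiry_secs * DEMO_NSEC_PER_SEC;
}
```

Basing the result on the reply time rather than on the time at which the
reply is later processed keeps the expiry from drifting with queueing delay.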

Signed-off-by: David Howells 
---

 fs/afs/afs.h      |    2 +-
 fs/afs/fsclient.c |   22 +-
 fs/afs/inode.c    |    4 ++--
 fs/afs/internal.h |    2 ++
 fs/afs/rxrpc.c    |    6 ++
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index fb9bcb8758ea..417cd23529c5 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -68,8 +68,8 @@ typedef enum {
 } afs_callback_type_t;
 
 struct afs_callback {
+   time64_t expires_at; /* Time at which expires */
unsigned version; /* Callback version */
-   unsigned expiry; /* Time at which expires */
afs_callback_type_t type; /* Type of callback */
 };
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index f758750e81d8..6105cdb17163 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -287,13 +287,19 @@ static void xdr_decode_AFSCallBack(struct afs_call *call,
*_bp = bp;
 }
 
-static void xdr_decode_AFSCallBack_raw(const __be32 **_bp,
+static ktime_t xdr_decode_expiry(struct afs_call *call, u32 expiry)
+{
+   return ktime_add_ns(call->reply_time, expiry * NSEC_PER_SEC);
+}
+
+static void xdr_decode_AFSCallBack_raw(struct afs_call *call,
+  const __be32 **_bp,
   struct afs_callback *cb)
 {
const __be32 *bp = *_bp;
 
cb->version = ntohl(*bp++);
-   cb->expiry  = ntohl(*bp++);
+   cb->expires_at  = xdr_decode_expiry(call, ntohl(*bp++));
cb->type = ntohl(*bp++);
*_bp = bp;
 }
@@ -440,6 +446,7 @@ int afs_fs_fetch_file_status(struct afs_fs_cursor *fc, struct afs_volsync *volsy
call->reply[0] = vnode;
call->reply[1] = volsync;
call->expected_version = new_inode ? 1 : vnode->status.data_version;
+   call->want_reply_time = true;
 
/* marshall the parameters */
bp = call->request;
@@ -627,6 +634,7 @@ static int afs_fs_fetch_data64(struct afs_fs_cursor *fc, struct afs_read *req)
call->reply[1] = NULL; /* volsync */
call->reply[2] = req;
call->expected_version = vnode->status.data_version;
+   call->want_reply_time = true;
 
/* marshall the parameters */
bp = call->request;
@@ -672,6 +680,7 @@ int afs_fs_fetch_data(struct afs_fs_cursor *fc, struct afs_read *req)
call->reply[1] = NULL; /* volsync */
call->reply[2] = req;
call->expected_version = vnode->status.data_version;
+   call->want_reply_time = true;
 
/* marshall the parameters */
bp = call->request;
@@ -714,7 +723,7 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call)
&call->expected_version, NULL);
if (ret < 0)
return ret;
-   xdr_decode_AFSCallBack_raw(&bp, call->reply[3]);
+   xdr_decode_AFSCallBack_raw(call, &bp, call->reply[3]);
/* xdr_decode_AFSVolSync(&bp, call->reply[X]); */
 
_leave(" = 0 [done]");
@@ -773,6 +782,7 @@ int afs_fs_create(struct afs_fs_cursor *fc,
call->reply[2] = newstatus;
call->reply[3] = newcb;
call->expected_version = current_data_version + 1;
+   call->want_reply_time = true;
 
/* marshall the parameters */
bp = call->request;
@@ -2042,7 +2052,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
&call->expected_version, NULL);
if (ret < 0)
return ret;
-   xdr_decode_AFSCallBack_raw(&bp, callback);
+   xdr_decode_AFSCallBack_raw(call, &bp, callback);
if (volsync)
xdr_decode_AFSVolSync(&bp, volsync);
 
@@ -2088,6 +2098,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc,
call->reply[2] = callback;
call->reply[3] = volsync;
call->expected_version = 1; /* vnode->status.data_version */
+   call->want_reply_time = true;
 
/* marshall the parameters */
bp = call->request;
@@ -2188,7 +2199,7 @@ static int afs_deliver_fs_inline_bulk_status(struct afs_call *call)
bp = call->buffer;
callbacks = call->reply[2];
callbacks[call->count].version  = ntohl(bp[0]);
-   callbacks[call->count].expiry   = ntohl(bp[1]);
+   callbacks[call->count].expires_at = xdr_decode_expiry(call, ntohl(bp[1]));
callbacks[call->count].type = ntohl(bp[2]);
statuses = call->reply[1];
if (call->count == 0 && vnode && statuses[0].abort_code == 0)
@@ -2261,6 +2272,7 @@ int afs_fs_inline_bulk_status(struct afs_fs_cursor *fc,
call->reply[2] = callbacks;
call->reply[3] = volsync;
call->count2 = nr_fids;
+   call->want_reply_time = true;
 
/*

[PATCH 20/25] afs: Expand data structure fields to support YFS [ver #2]

2018-10-23 Thread David Howells

Expand fields in various data structures to support the expanded
information that YFS is capable of returning.

Signed-off-by: David Howells 
---

 fs/afs/afs.h      |   35 ++-
 fs/afs/fsclient.c |    9 +
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index 417cd23529c5..d12ffb457e47 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -130,19 +130,18 @@ typedef u32 afs_access_t;
 struct afs_file_status {
u64 size;   /* file size */
afs_dataversion_t   data_version;   /* current data version */
-   time_t  mtime_client;   /* last time client changed data */
-   time_t  mtime_server;   /* last time server changed data */
-   unsigned abort_code; /* Abort if bulk-fetching this failed */
-
-   afs_file_type_t type;   /* file type */
-   unsigned nlink;  /* link count */
-   u32 author; /* author ID */
-   u32 owner;  /* owner ID */
-   u32 group;  /* group ID */
+   struct timespec64   mtime_client;   /* Last time client changed data */
+   struct timespec64   mtime_server;   /* Last time server changed data */
+   s64 author; /* author ID */
+   s64 owner;  /* owner ID */
+   s64 group;  /* group ID */
afs_access_t caller_access;  /* access rights for authenticated caller */
afs_access_t anon_access;    /* access rights for unauthenticated caller */
umode_t mode;   /* UNIX mode */
+   afs_file_type_t type;   /* file type */
+   u32 nlink;  /* link count */
s32 lock_count; /* file lock count (0=UNLK -1=WRLCK +ve=#RDLCK */
+   u32 abort_code; /* Abort if bulk-fetching this failed */
 };
 
 /*
@@ -159,25 +158,27 @@ struct afs_file_status {
  * AFS volume synchronisation information
  */
 struct afs_volsync {
-   time_t  creation;   /* volume creation time */
+   time64_t creation;   /* volume creation time */
 };
 
 /*
  * AFS volume status record
  */
 struct afs_volume_status {
-   u32 vid;        /* volume ID */
-   u32 parent_id;  /* parent volume ID */
+   afs_volid_t vid;        /* volume ID */
+   afs_volid_t parent_id;  /* parent volume ID */
u8  online; /* true if volume currently online and available */
u8  in_service; /* true if volume currently in service */
u8  blessed;/* same as in_service */
u8  needs_salvage;  /* true if consistency checking required */
u32 type;   /* volume type (afs_voltype_t) */
-   u32 min_quota;  /* minimum space set aside (blocks) */
-   u32 max_quota;  /* maximum space this volume may occupy (blocks) */
-   u32 blocks_in_use;  /* space this volume currently occupies (blocks) */
-   u32 part_blocks_avail; /* space available in volume's partition */
-   u32 part_max_blocks; /* size of volume's partition */
+   u64 min_quota;  /* minimum space set aside (blocks) */
+   u64 max_quota;  /* maximum space this volume may occupy (blocks) */
+   u64 blocks_in_use;  /* space this volume currently occupies (blocks) */
+   u64 part_blocks_avail; /* space available in volume's partition */
+   u64 part_max_blocks; /* size of volume's partition */
+   s64 vol_copy_date;
+   s64 vol_backup_date;
 };
 
 #define AFS_BLOCK_SIZE 1024
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 6105cdb17163..2da65309e0de 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -69,8 +69,7 @@ void afs_update_inode_from_status(struct afs_vnode *vnode,
struct timespec64 t;
umode_t mode;
 
-   t.tv_sec = status->mtime_client;
-   t.tv_nsec = 0;
+   t = status->mtime_client;
vnode->vfs_inode.i_ctime = t;
vnode->vfs_inode.i_mtime = t;
vnode->vfs_inode.i_atime = t;
@@ -194,8 +193,10 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call,
EXTRACT_M(mode);
EXTRACT_M(group);
 
-   status->mtime_client = ntohl(xdr->mtime_client);
-   status->mtime_server =

[PATCH 11/25] afs: Add a couple of tracepoints to log I/O errors [ver #2]

2018-10-23 Thread David Howells

Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.
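
The diff below replaces bare "return -EIO" statements with calls to a helper
that takes a label for the failing site. The helper's definition is not part
of this excerpt, so the following is only a userspace sketch of the general
pattern, assuming the helper records where the error arose and then returns
-EIO. The demo_* names are hypothetical and fprintf() stands in for the
tracepoint.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical labels for the places an I/O error can originate. */
enum demo_io_error_site {
	demo_io_error_cm_reply,
	demo_io_error_dir_small,
};

static const char *const demo_io_error_names[] = {
	[demo_io_error_cm_reply]  = "cm_reply",
	[demo_io_error_dir_small] = "dir_small",
};

/* Log where the error was produced, then hand back -EIO to the caller. */
static int demo_io_error(enum demo_io_error_site where)
{
	fprintf(stderr, "io error at %s\n", demo_io_error_names[where]);
	return -EIO;
}
```

Because the helper returns the error code itself, a call site can stay a
single statement, e.g. "return demo_io_error(demo_io_error_cm_reply);".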

Signed-off-by: David Howells 
---

 fs/afs/cmservice.c         |   10 +++--
 fs/afs/dir.c               |   18 ++
 fs/afs/internal.h          |   11 ++
 fs/afs/mntpt.c             |    5 ++-
 fs/afs/rxrpc.c             |    2 +
 fs/afs/server.c            |    2 +
 fs/afs/write.c             |    1 +
 include/trace/events/afs.h |   81 
 8 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 4db62ae8dc1a..186f621f8722 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -260,7 +260,7 @@ static int afs_deliver_cb_callback(struct afs_call *call)
}
 
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-   return -EIO;
+   return afs_io_error(call, afs_io_error_cm_reply);
 
/* we'll need the file server record as that tells us which set of
 * vnodes to operate upon */
@@ -368,7 +368,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call)
}
 
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-   return -EIO;
+   return afs_io_error(call, afs_io_error_cm_reply);
 
/* we'll need the file server record as that tells us which set of
 * vnodes to operate upon */
@@ -409,7 +409,7 @@ static int afs_deliver_cb_probe(struct afs_call *call)
return ret;
 
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-   return -EIO;
+   return afs_io_error(call, afs_io_error_cm_reply);
 
return afs_queue_call_work(call);
 }
@@ -490,7 +490,7 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *call)
}
 
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-   return -EIO;
+   return afs_io_error(call, afs_io_error_cm_reply);
 
return afs_queue_call_work(call);
 }
@@ -573,7 +573,7 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call)
return ret;
 
if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING))
-   return -EIO;
+   return afs_io_error(call, afs_io_error_cm_reply);
 
return afs_queue_call_work(call);
 }
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 855bf2b79fed..78f9754fd03d 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -138,6 +138,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
   ntohs(dbuf->blocks[tmp].hdr.magic));
trace_afs_dir_check_failed(dvnode, off, i_size);
kunmap(page);
+   trace_afs_file_error(dvnode, -EIO, afs_file_error_dir_bad_magic);
goto error;
}
 
@@ -190,9 +191,11 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 retry:
i_size = i_size_read(&dvnode->vfs_inode);
if (i_size < 2048)
-   return ERR_PTR(-EIO);
-   if (i_size > 2048 * 1024)
+   return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small));
+   if (i_size > 2048 * 1024) {
+   trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big);
return ERR_PTR(-EFBIG);
+   }
 
_enter("%llu", i_size);
 
@@ -315,7 +318,8 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
 /*
  * deal with one block in an AFS directory
  */
-static int afs_dir_iterate_block(struct dir_context *ctx,
+static int afs_dir_iterate_block(struct afs_vnode *dvnode,
+struct dir_context *ctx,
 union afs_xdr_dir_block *block,
 unsigned blkoff)
 {
@@ -365,7 +369,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx,
   " (len %u/%zu)",
   blkoff / sizeof(union afs_xdr_dir_block),
   offset, next, tmp, nlen);
-   return -EIO;
+   return afs_bad(dvnode, afs_file_error_dir_over_end);
}
if (!(block->hdr.bitmap[next / 8] &
  (1 << (next % 8)))) {
@@ -373,7 +377,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx,
   " %u unmarked extension (len %u/%zu)",
   blkoff / sizeof(union afs_xdr_dir_block),
   offset, next, tmp, nlen);
-   return -EIO;
+   return afs_bad(dvnode, afs_file_error_dir_unmarked_ext);
}
 
_debug("ENT[%zu.%u]: ext %u/%zu",
@@ -442,7 +446,7 @@ static int afs_dir_iterate(struct

[PATCH 09/25] afs: Fix TTL on VL server and address lists [ver #2]

2018-10-23 Thread David Howells

Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.

Fix the TTL by limiting the range to a minimum and maximum from the current
time.  At some point these can be made into sysctl knobs.  Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.
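
The clamping described above can be sketched in userspace C as follows. This
mirrors the logic of the afs_update_cell() hunk below, but the demo_* name
and the plain integer types are assumptions made for illustration, not the
kernel interface.

```c
#include <stdint.h>

/*
 * Hypothetical helper: limit an absolute expiry time (seconds) to the
 * window [now + min_ttl, now + max_ttl].  If the limits are inconsistent,
 * the minimum wins, as in the patch.
 */
static int64_t demo_clamp_expiry(int64_t now, int64_t expiry,
				 uint32_t min_ttl, uint32_t max_ttl)
{
	if (min_ttl > max_ttl)
		max_ttl = min_ttl;
	if (expiry < now + min_ttl)
		return now + min_ttl;
	if (expiry > now + max_ttl)
		return now + max_ttl;
	return expiry;
}
```

With the patch's defaults of 600 and 86400 seconds, an expiry of 0 (no TTL
returned by the lookup) at now = 1000 becomes 1600, and an expiry far in the
future is pulled back to 87400.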

Signed-off-by: David Howells 
---

 fs/afs/cell.c |   26 ++
 fs/afs/proc.c |   14 +++---
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 963b6fa51fdf..cf445dbd5f2e 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -20,6 +20,8 @@
 #include "internal.h"
 
 static unsigned __read_mostly afs_cell_gc_delay = 10;
+static unsigned __read_mostly afs_cell_min_ttl = 10 * 60;
+static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60;
 
 static void afs_manage_cell(struct work_struct *);
 
@@ -171,6 +173,8 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
 
rcu_assign_pointer(cell->vl_servers, vllist);
cell->dns_expiry = TIME64_MAX;
+   } else {
+   cell->dns_expiry = ktime_get_real_seconds();
}
 
_leave(" = %p", cell);
@@ -358,25 +362,39 @@ int afs_cell_init(struct afs_net *net, const char *rootcell)
 static void afs_update_cell(struct afs_cell *cell)
 {
struct afs_vlserver_list *vllist, *old;
-   time64_t now, expiry;
+   unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl);
+   unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl);
+   time64_t now, expiry = 0;
 
_enter("%s", cell->name);
 
vllist = afs_dns_query(cell, &expiry);
+
+   now = ktime_get_real_seconds();
+   if (min_ttl > max_ttl)
+   max_ttl = min_ttl;
+   if (expiry < now + min_ttl)
+   expiry = now + min_ttl;
+   else if (expiry > now + max_ttl)
+   expiry = now + max_ttl;
+
if (IS_ERR(vllist)) {
switch (PTR_ERR(vllist)) {
case -ENODATA:
-   /* The DNS said that the cell does not exist */
+   case -EDESTADDRREQ:
+   /* The DNS said that the cell does not exist or there
+* weren't any addresses to be had.
+*/
set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags);
clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-   cell->dns_expiry = ktime_get_real_seconds() + 61;
+   cell->dns_expiry = expiry;
break;
 
case -EAGAIN:
case -ECONNREFUSED:
default:
set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-   cell->dns_expiry = ktime_get_real_seconds() + 10;
+   cell->dns_expiry = now + 10;
break;
}
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 6585f4bec0d3..fc36c41641ab 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -37,16 +37,24 @@ static inline struct afs_net *afs_seq2net_single(struct seq_file *m)
  */
 static int afs_proc_cells_show(struct seq_file *m, void *v)
 {
-   struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link);
+   struct afs_vlserver_list *vllist;
+   struct afs_cell *cell;
 
if (v == SEQ_START_TOKEN) {
/* display header on line 1 */
-   seq_puts(m, "USE NAME\n");
+   seq_puts(m, "USE    TTL SV NAME\n");
return 0;
}
 
+   cell = list_entry(v, struct afs_cell, proc_link);
+   vllist = rcu_dereference(cell->vl_servers);
+
/* display one cell per line on subsequent lines */
-   seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name);
+   seq_printf(m, "%3u %6lld %2u %s\n",
+  atomic_read(&cell->usage),
+  cell->dns_expiry - ktime_get_real_seconds(),
+  vllist ? vllist->nr_servers : 0,
+  cell->name);
return 0;
 }

[PATCH 09/25] afs: Fix TTL on VL server and address lists [ver #2]

2018-10-23 Thread David Howells

Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.

Fix the TTL by limiting the range to a minimum and maximum from the current
time.  At some point these can be made into sysctl knobs.  Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.

Signed-off-by: David Howells 
---

 fs/afs/cell.c |   26 ++
 fs/afs/proc.c |   14 +++---
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index 963b6fa51fdf..cf445dbd5f2e 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -20,6 +20,8 @@
 #include "internal.h"
 
 static unsigned __read_mostly afs_cell_gc_delay = 10;
+static unsigned __read_mostly afs_cell_min_ttl = 10 * 60;
+static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60;
 
 static void afs_manage_cell(struct work_struct *);
 
@@ -171,6 +173,8 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net,
 
rcu_assign_pointer(cell->vl_servers, vllist);
cell->dns_expiry = TIME64_MAX;
+   } else {
+   cell->dns_expiry = ktime_get_real_seconds();
}
 
_leave(" = %p", cell);
@@ -358,25 +362,39 @@ int afs_cell_init(struct afs_net *net, const char 
*rootcell)
 static void afs_update_cell(struct afs_cell *cell)
 {
struct afs_vlserver_list *vllist, *old;
-   time64_t now, expiry;
+   unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl);
+   unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl);
+   time64_t now, expiry = 0;
 
_enter("%s", cell->name);
 
vllist = afs_dns_query(cell, );
+
+   now = ktime_get_real_seconds();
+   if (min_ttl > max_ttl)
+   max_ttl = min_ttl;
+   if (expiry < now + min_ttl)
+   expiry = now + min_ttl;
+   else if (expiry > now + max_ttl)
+   expiry = now + max_ttl;
+
if (IS_ERR(vllist)) {
switch (PTR_ERR(vllist)) {
case -ENODATA:
-   /* The DNS said that the cell does not exist */
+   case -EDESTADDRREQ:
+   /* The DNS said that the cell does not exist or there
+* weren't any addresses to be had.
+*/
set_bit(AFS_CELL_FL_NOT_FOUND, >flags);
clear_bit(AFS_CELL_FL_DNS_FAIL, >flags);
-   cell->dns_expiry = ktime_get_real_seconds() + 61;
+   cell->dns_expiry = expiry;
break;
 
case -EAGAIN:
case -ECONNREFUSED:
default:
set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags);
-   cell->dns_expiry = ktime_get_real_seconds() + 10;
+   cell->dns_expiry = now + 10;
break;
}
 
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 6585f4bec0d3..fc36c41641ab 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -37,16 +37,24 @@ static inline struct afs_net *afs_seq2net_single(struct seq_file *m)
  */
 static int afs_proc_cells_show(struct seq_file *m, void *v)
 {
-   struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link);
+   struct afs_vlserver_list *vllist;
+   struct afs_cell *cell;
 
if (v == SEQ_START_TOKEN) {
/* display header on line 1 */
-   seq_puts(m, "USE NAME\n");
+   seq_puts(m, "USE    TTL SV NAME\n");
return 0;
}
 
+   cell = list_entry(v, struct afs_cell, proc_link);
+   vllist = rcu_dereference(cell->vl_servers);
+
/* display one cell per line on subsequent lines */
-   seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name);
+   seq_printf(m, "%3u %6lld %2u %s\n",
+  atomic_read(&cell->usage),
+  cell->dns_expiry - ktime_get_real_seconds(),
+  vllist ? vllist->nr_servers : 0,
+  cell->name);
return 0;
 }
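
The expiry clamping added to afs_update_cell() in the cell.c hunk above can be modelled in user space. The following is a minimal Python sketch, not the kernel code: the function name clamp_expiry is hypothetical, and the defaults simply mirror the afs_cell_min_ttl and afs_cell_max_ttl values in the patch (10 minutes and 24 hours, in seconds).

```python
def clamp_expiry(expiry, now, min_ttl=10 * 60, max_ttl=24 * 60 * 60):
    """Clamp an absolute expiry time into [now + min_ttl, now + max_ttl].

    Mirrors the order of checks in the patch: a misconfigured
    min_ttl > max_ttl collapses the window to min_ttl.
    """
    if min_ttl > max_ttl:
        max_ttl = min_ttl
    if expiry < now + min_ttl:
        return now + min_ttl
    if expiry > now + max_ttl:
        return now + max_ttl
    return expiry


now = 1_000_000
print(clamp_expiry(0, now) - now)             # 600: no TTL supplied, use the minimum
print(clamp_expiry(now + 3600, now) - now)    # 3600: already inside the window
print(clamp_expiry(now + 10**7, now) - now)   # 86400: capped at the maximum
```

Because expiry starts at 0 in the patch, a lookup that returns no TTL ends up with the minimum TTL rather than expiring immediately.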

[PATCH 17/25] afs: Fix FS.FetchStatus delivery from updating wrong vnode [ver #2]

2018-10-23 Thread David Howells

The FS.FetchStatus reply delivery function was updating the inode of the
directory in which a lookup had been done with the status of the looked-up
file.  This corrupts some of the directory state.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells 
---

 fs/afs/fsclient.c |   16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 5e3027f21390..f758750e81d8 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -2026,7 +2026,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
struct afs_file_status *status = call->reply[1];
struct afs_callback *callback = call->reply[2];
struct afs_volsync *volsync = call->reply[3];
-   struct afs_vnode *vnode = call->reply[0];
+   struct afs_fid *fid = call->reply[0];
const __be32 *bp;
int ret;
 
@@ -2034,21 +2034,15 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call)
if (ret < 0)
return ret;
 
-   _enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);
+   _enter("{%llx:%llu}", fid->vid, fid->vnode);
 
/* unmarshall the reply once we've received all of it */
bp = call->buffer;
-   ret = afs_decode_status(call, &bp, status, vnode,
+   ret = afs_decode_status(call, &bp, status, NULL,
&call->expected_version, NULL);
if (ret < 0)
return ret;
-   callback[call->count].version   = ntohl(bp[0]);
-   callback[call->count].expiry= ntohl(bp[1]);
-   callback[call->count].type  = ntohl(bp[2]);
-   if (vnode)
-   xdr_decode_AFSCallBack(call, vnode, &bp);
-   else
-   bp += 3;
+   xdr_decode_AFSCallBack_raw(&bp, callback);
if (volsync)
xdr_decode_AFSVolSync(&bp, volsync);
 
@@ -2089,7 +2083,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc,
}
 
call->key = fc->key;
-   call->reply[0] = NULL; /* vnode for fid[0] */
+   call->reply[0] = fid;
call->reply[1] = status;
call->reply[2] = callback;
call->reply[3] = volsync;
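
The removed lines in the hunk above read a callback record as three consecutive big-endian 32-bit words (version, expiry, type). The sketch below models that decoding step in Python. The body of xdr_decode_AFSCallBack_raw() is not shown in this patch, so this illustrates the wire layout implied by the removed ntohl() calls rather than the kernel helper itself; Callback and decode_callback_raw are hypothetical names.

```python
import struct
from collections import namedtuple

# Field names follow the removed lines in the diff (version, expiry, type).
Callback = namedtuple("Callback", ["version", "expiry", "type"])


def decode_callback_raw(buf, offset=0):
    """Decode one callback record and return it with the advanced offset.

    Assumes three network-byte-order (big-endian) unsigned 32-bit words,
    which is what ntohl(bp[0..2]) reads in the removed code.
    """
    version, expiry, cb_type = struct.unpack_from(">III", buf, offset)
    return Callback(version, expiry, cb_type), offset + 12


wire = struct.pack(">III", 1, 3600, 2) + b"\x00" * 4
cb, offset = decode_callback_raw(wire)
print(cb.version, cb.expiry, cb.type, offset)  # 1 3600 2 12
```

Returning the new offset plays the role of the `bp` pointer being advanced by the kernel decoder, so the next field (the volsync record) can be read from where the callback ended.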

[PATCH 13/25] afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS [ver #2]

2018-10-23 Thread David Howells

Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it through stat.

Signed-off-by: David Howells 
---

 fs/afs/afs.h   |   11 ++-
 fs/afs/cache.c |2 +-
 fs/afs/callback.c  |2 +-
 fs/afs/dir.c   |   24 
 fs/afs/dynroot.c   |2 +-
 fs/afs/file.c  |8 
 fs/afs/flock.c |   22 +++---
 fs/afs/fsclient.c  |   24 
 fs/afs/inode.c |   31 +--
 fs/afs/proc.c  |2 +-
 fs/afs/rotate.c|2 +-
 fs/afs/security.c  |6 +++---
 fs/afs/super.c |5 +++--
 fs/afs/volume.c|2 +-
 fs/afs/write.c |   18 +-
 fs/afs/xattr.c |2 +-
 include/trace/events/afs.h |4 ++--
 17 files changed, 86 insertions(+), 81 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index b4ff1f7ae4ab..c23b31b742fa 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -23,9 +23,9 @@
 #define AFSPATHMAX     1024    /* Maximum length of a pathname plus NUL */
 #define AFSOPAQUEMAX   1024    /* Maximum length of an opaque field */
 
-typedef unsigned           afs_volid_t;
-typedef unsigned           afs_vnodeid_t;
-typedef unsigned long long afs_dataversion_t;
+typedef u64                afs_volid_t;
+typedef u64                afs_vnodeid_t;
+typedef u64                afs_dataversion_t;
 
 typedef enum {
AFSVL_RWVOL,/* read/write volume */
@@ -52,8 +52,9 @@ typedef enum {
  */
 struct afs_fid {
afs_volid_t vid;        /* volume ID */
-   afs_vnodeid_t   vnode;  /* file index within volume */
-   unsigned        unique; /* unique ID number (file index version) */
+   afs_vnodeid_t   vnode;  /* Lower 64-bits of file index within volume */
+   u32             vnode_hi;   /* Upper 32-bits of file index */
+   u32             unique; /* unique ID number (file index version) */
 };
 
 /*
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index b1c31ec4523a..f6d0a21e8052 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -49,7 +49,7 @@ static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
struct afs_vnode *vnode = cookie_netfs_data;
struct afs_vnode_cache_aux aux;
 
-   _enter("{%x,%x,%llx},%p,%u",
+   _enter("{%llx,%x,%llx},%p,%u",
   vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
   buffer, buflen);
 
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 5f261fbf2182..8698198ad427 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -310,7 +310,7 @@ void afs_break_callbacks(struct afs_server *server, size_t count,
/* TODO: Sort the callback break list by volume ID */
 
for (; count > 0; callbacks++, count--) {
-   _debug("- Fid { vl=%08x n=%u u=%u }  CB { v=%u x=%u t=%u }",
+   _debug("- Fid { vl=%08llx n=%llu u=%u }  CB { v=%u x=%u t=%u }",
   callbacks->fid.vid,
   callbacks->fid.vnode,
   callbacks->fid.unique,
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 78f9754fd03d..024b7cf7441c 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -552,7 +552,7 @@ static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry,
}
 
*fid = cookie.fid;
-   _leave(" = 0 { vn=%u u=%u }", fid->vnode, fid->unique);
+   _leave(" = 0 { vn=%llu u=%u }", fid->vnode, fid->unique);
return 0;
 }
 
@@ -830,7 +830,7 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
struct key *key;
int ret;
 
-   _enter("{%x:%u},%p{%pd},",
+   _enter("{%llx:%llu},%p{%pd},",
   dvnode->fid.vid, dvnode->fid.vnode, dentry, dentry);
 
ASSERTCMP(d_inode(dentry), ==, NULL);
@@ -900,7 +900,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
 
if (d_really_is_positive(dentry)) {
vnode = AFS_FS_I(d_inode(dentry));
-   _enter("{v={%x:%u} n=%pd fl=%lx},",
+   _enter("{v={%llx:%llu} n=%pd fl=%lx},",
   vnode->fid.vid, vnode->fid.vnode, dentry,
   vnode->flags);
} else {
@@ -969,7 +969,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
/* if the vnode ID has changed, then the dirent points to a
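
The 96-bit vnode ID described in this patch is stored as two fields of struct afs_fid: vnode (lower 64 bits) and vnode_hi (upper 32 bits). A minimal Python sketch of that split follows; the helper names are hypothetical, and stat_ino only models the commit message's statement that the top 32 bits are discarded when the ID is reported through stat.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def split_vnode_id(vnode_id_96):
    """Split a 96-bit vnode ID into (vnode, vnode_hi) as in struct afs_fid."""
    if not 0 <= vnode_id_96 < (1 << 96):
        raise ValueError("vnode ID does not fit in 96 bits")
    return vnode_id_96 & MASK64, (vnode_id_96 >> 64) & MASK32


def join_vnode_id(vnode, vnode_hi):
    """Rebuild the 96-bit ID from its two stored halves."""
    return (vnode_hi << 64) | vnode


def stat_ino(vnode, vnode_hi):
    """Inode number as reported through stat: the upper 32 bits are dropped."""
    return vnode


full_id = (0xABCD << 64) | 0x1122334455667788
vnode, vnode_hi = split_vnode_id(full_id)
print(hex(vnode), hex(vnode_hi))       # 0x1122334455667788 0xabcd
print(hex(stat_ino(vnode, vnode_hi)))  # 0x1122334455667788
```

Two files whose IDs differ only in vnode_hi would report the same inode number in this model, which is why the commit message says the iget comparator has to check vnode->fid rather than i_ino.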

[PATCH v3 2/4] nds32: Perf porting

2018-10-23 Thread Nickhu

This commit ports perf to nds32.

1. Raw events:
Raw event names start with 'r'.
Usage:
perf stat -e rXYZ ./app
X: the index of the performance counter.
YZ: the index of the event, in hexadecimal.

Example:
'perf stat -e r101 ./app' means that counter 1 counts the instruction
event.

The counter and event indices can be found in the
"Andes System Privilege Architecture Version 3 Manual".

Alternatively, run 'perf list' to find the symbolic names of the raw events.
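
The rXYZ encoding above can be illustrated with a small decoder. This is a Python sketch under stated assumptions, not part of the perf tool: parse_raw_event is a hypothetical helper, and it assumes X is a single hexadecimal digit and YZ exactly two hexadecimal digits, as the usage line suggests.

```python
import re

# 'r', one digit for the counter index (X), two hex digits for the event (YZ).
RAW_EVENT_RE = re.compile(r"^r([0-9a-fA-F])([0-9a-fA-F]{2})$")


def parse_raw_event(name):
    """Return (counter_index, event_index) for a raw event such as 'r101'."""
    match = RAW_EVENT_RE.match(name)
    if match is None:
        raise ValueError("not a raw event of the form rXYZ: " + name)
    return int(match.group(1), 16), int(match.group(2), 16)


print(parse_raw_event("r101"))  # (1, 1): counter 1, event index 0x01
```

The mapping from event index to event name (for example, that index 0x01 on counter 1 is the instruction event) comes from the Andes manual and is not encoded in this sketch.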

2. Perf mmap2:

Fix an unexpected perf mmap2() page fault.

When the perf application calls mmap2(), it fails with the message
"failed to write." and the return value -EFAULT.

This is due to a page fault caused by "reading" the buffer from the
legally mapped address region in order to write it to the descriptor.
The page fault handler gets a VM_FAULT_SIGBUS return value, which
should not happen here because this is a read request.

See kernel/events/core.c:perf_mmap_fault(...): if
"(vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))" evaluates to true,
VM_FAULT_SIGBUS is returned.

However, this is not a write request. The flag that indicates why the
page fault happened is wrong.

Furthermore, NDS32 SPAv3 cannot detect whether an access is a read or a
write. It only knows whether it is an instruction fetch or a data access.

Therefore, removing the wrong flag assignment (the hardware cannot
report the reason anyway) fixes this bug.

3. Perf multiple events mapped to the same counter.

When multiple events map to the same counter, the counter counts
inaccurately, because each counter can only count one event at a time.
So when multiple events map to the same counter, they have to take
turns in each context.

There are two solutions:
1. Print an error message when multiple events map to the same counter.
But printing the error message makes the program hang in a loop, and LTP
(Linux Test Project) fails when the program hangs in a loop.

2. Don't print the error message, so that LTP passes. But the user needs
to know not to count events that map to the same counter at the same
time, or the results will be inaccurate.

We choose method 2.
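
Since method 2 leaves it to the user to avoid such combinations, a script could check a list of raw events before passing them to perf. The following Python sketch is one possible way to do that; counter_conflicts is a hypothetical helper (not part of this patch or of perf) and it assumes the rXYZ raw-event form from section 1, with X as a single hexadecimal digit.

```python
import re
from collections import defaultdict

RAW_EVENT_RE = re.compile(r"^r([0-9a-fA-F])([0-9a-fA-F]{2})$")


def counter_conflicts(raw_events):
    """Group raw events by counter index and return counters used more than once."""
    by_counter = defaultdict(list)
    for name in raw_events:
        match = RAW_EVENT_RE.match(name)
        if match is None:
            raise ValueError("not a raw event of the form rXYZ: " + name)
        by_counter[int(match.group(1), 16)].append(name)
    return {ctr: names for ctr, names in by_counter.items() if len(names) > 1}


print(counter_conflicts(["r101", "r102", "r003"]))  # {1: ['r101', 'r102']}
```

An empty result means every requested event has its own counter, so no two events have to take turns.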

Signed-off-by: Nickhu 
---
 arch/nds32/Kconfig|1 +
 arch/nds32/boot/dts/ae3xx.dts |5 +
 arch/nds32/include/asm/Kbuild |1 +
 arch/nds32/include/asm/perf_event.h   |   16 +
 arch/nds32/include/asm/pmu.h  |  386 ++
 arch/nds32/include/asm/stacktrace.h   |   39 +
 arch/nds32/kernel/Makefile|3 +-
 arch/nds32/kernel/perf_event_cpu.c| 1223 +
 arch/nds32/mm/fault.c |   13 +-
 tools/include/asm/barrier.h   |2 +
 tools/perf/arch/nds32/Build   |1 +
 tools/perf/arch/nds32/util/Build  |1 +
 tools/perf/arch/nds32/util/header.c   |   29 +
 tools/perf/pmu-events/arch/nds32/mapfile.csv  |   15 +
 .../pmu-events/arch/nds32/n13/atcpmu.json |  290 
 15 files changed, 2019 insertions(+), 6 deletions(-)
 create mode 100644 arch/nds32/include/asm/perf_event.h
 create mode 100644 arch/nds32/include/asm/pmu.h
 create mode 100644 arch/nds32/include/asm/stacktrace.h
 create mode 100644 arch/nds32/kernel/perf_event_cpu.c
 create mode 100644 tools/perf/arch/nds32/Build
 create mode 100644 tools/perf/arch/nds32/util/Build
 create mode 100644 tools/perf/arch/nds32/util/header.c
 create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv
 create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 7068f341133d..dd448d431f5a 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -31,6 +31,7 @@ config NDS32
select HAVE_DEBUG_KMEMLEAK
select HAVE_MEMBLOCK
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_PERF_EVENTS
select IRQ_DOMAIN
select LOCKDEP_SUPPORT
select MODULES_USE_ELF_RELA
diff --git a/arch/nds32/boot/dts/ae3xx.dts b/arch/nds32/boot/dts/ae3xx.dts
index bb39749a6673..16a9f54a805e 100644
--- a/arch/nds32/boot/dts/ae3xx.dts
+++ b/arch/nds32/boot/dts/ae3xx.dts
@@ -82,4 +82,9 @@
interrupts = <18>;
};
};
+
+   pmu {
+   compatible = "andestech,nds32v3-pmu";
+   interrupts= <13>;
+   };
 };
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index

[PATCH 15/25] afs: Remove callback details from afs_callback_break struct [ver #2]

2018-10-23 Thread David Howells

Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.

Signed-off-by: David Howells 
---

 fs/afs/afs.h   |2 +-
 fs/afs/callback.c  |8 ++--
 fs/afs/cmservice.c |   17 +
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index c23b31b742fa..fb9bcb8758ea 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -75,7 +75,7 @@ struct afs_callback {
 
 struct afs_callback_break {
struct afs_fid  fid;/* File identifier */
-   struct afs_callback cb; /* Callback details */
+   //struct afs_callback   cb; /* Callback details */
 };
 
 #define AFSCBMAX 50/* maximum callbacks transferred per bulk op */
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 8698198ad427..df9bfee698ad 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -310,14 +310,10 @@ void afs_break_callbacks(struct afs_server *server, size_t count,
/* TODO: Sort the callback break list by volume ID */
 
for (; count > 0; callbacks++, count--) {
-   _debug("- Fid { vl=%08llx n=%llu u=%u }  CB { v=%u x=%u t=%u }",
+   _debug("- Fid { vl=%08llx n=%llu u=%u }",
   callbacks->fid.vid,
   callbacks->fid.vnode,
-  callbacks->fid.unique,
-  callbacks->cb.version,
-  callbacks->cb.expiry,
-  callbacks->cb.type
-  );
+  callbacks->fid.unique);
afs_break_one_callback(server, &callbacks->fid);
}
 
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 186f621f8722..fc0010d800a0 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -218,7 +218,6 @@ static int afs_deliver_cb_callback(struct afs_call *call)
cb->fid.vid = ntohl(*bp++);
cb->fid.vnode   = ntohl(*bp++);
cb->fid.unique  = ntohl(*bp++);
-   cb->cb.type = AFSCM_CB_UNTYPED;
}
 
afs_extract_to_tmp(call);
@@ -236,24 +235,18 @@ static int afs_deliver_cb_callback(struct afs_call *call)
if (call->count2 != call->count && call->count2 != 0)
return afs_protocol_error(call, -EBADMSG,
  afs_eproto_cb_count);
-   afs_extract_to_buf(call, call->count2 * 3 * 4);
+   call->_iter = &call->iter;
+   iov_iter_discard(&call->iter, READ, call->count2 * 3 * 4);
call->unmarshall++;
 
case 4:
-   _debug("extract CB array");
+   _debug("extract discard %zu/%u",
+  iov_iter_count(&call->iter), call->count2 * 3 * 4);
+
ret = afs_extract_data(call, false);
if (ret < 0)
return ret;
 
-   _debug("unmarshall CB array");
-   cb = call->request;
-   bp = call->buffer;
-   for (loop = call->count2; loop > 0; loop--, cb++) {
-   cb->cb.version  = ntohl(*bp++);
-   cb->cb.expiry   = ntohl(*bp++);
-   cb->cb.type = ntohl(*bp++);
-   }
-
call->unmarshall++;
case 5:
break;
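
The effect of the cmservice.c change is that the FID array is still decoded, while the trailing callback array (count2 records of three 32-bit words each) is skipped instead of being unmarshalled. The Python sketch below models that on a flat byte buffer. It is an illustration under assumptions: parse_cb_callback is a hypothetical name, and the layout (count, FIDs, count2, callback array) is inferred from the unmarshalling stages visible in the diff.

```python
import struct


def parse_cb_callback(payload):
    """Return (fids, bytes_consumed) for a CB.CallBack-style argument buffer.

    Assumed layout, all big-endian u32: count, then count * (vid, vnode,
    unique), then count2, then count2 * 3 words of callback details that
    are discarded rather than decoded.
    """
    (count,) = struct.unpack_from(">I", payload, 0)
    offset = 4
    fids = []
    for _ in range(count):
        fids.append(struct.unpack_from(">III", payload, offset))
        offset += 12
    (count2,) = struct.unpack_from(">I", payload, offset)
    offset += 4
    # Same sanity check as the diff: count2 must equal count or be zero.
    if count2 != count and count2 != 0:
        raise ValueError("callback count mismatch")
    discard = count2 * 3 * 4
    if len(payload) < offset + discard:
        raise ValueError("short callback array")
    return fids, offset + discard  # skip the callback array without reading it


payload = (struct.pack(">I", 2)
           + struct.pack(">III", 1, 2, 3)
           + struct.pack(">III", 4, 5, 6)
           + struct.pack(">I", 2)
           + b"\xff" * 24)
print(parse_cb_callback(payload))  # ([(1, 2, 3), (4, 5, 6)], 56)
```

Skipping by length means the per-record buffer for callback details is no longer needed, which matches the commit message's point about the list taking less memory.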

[RFC v2 10/14] kunit: add Python libraries for handing KUnit config and kernel

2018-10-23 Thread Brendan Higgins

The ultimate goal is to create minimal isolated test binaries; in the
meantime we are using UML to provide the infrastructure to run tests, so
define an abstract way to configure and run tests that allows us to
change the context in which tests are built without affecting the user.
This also makes pretty and dynamic error reporting, and a lot of other
nice features, easier.

kunit_config.py:
  - parse .config and Kconfig files.

kunit_kernel.py: provides helper functions to:
  - configure the kernel using kunitconfig.
  - build the kernel with the appropriate configuration.
  - provide a function to invoke the kernel and stream the output back.
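
The parsing and subset check that kunit_config.py provides can be illustrated with a standalone sketch. It reuses the two regular expressions from the patch but is otherwise a simplified stand-in: parse_kconfig is a hypothetical function that returns a plain set of lines instead of a Kconfig object, and it raises ValueError where the patch raises KconfigParseError.

```python
import re

# Patterns copied from kunit_config.py in this patch.
CONFIG_IS_NOT_SET = re.compile(r'^# CONFIG_\w+ is not set$')
CONFIG_SET = re.compile(r'^CONFIG_\w+=\S+$')


def parse_kconfig(blob):
    """Return the set of config entries found in a .config-style string."""
    entries = set()
    for line in blob.split('\n'):
        line = line.strip()
        if not line:
            continue
        if CONFIG_SET.match(line) or CONFIG_IS_NOT_SET.match(line):
            entries.add(line)
        elif line.startswith('#'):
            continue  # ordinary comment
        else:
            raise ValueError('Failed to parse: ' + line)
    return entries


wanted = parse_kconfig('CONFIG_KUNIT=y\n')
actual = parse_kconfig('# a comment\nCONFIG_KUNIT=y\n# CONFIG_FOO is not set\n')
print(wanted <= actual)  # True: every wanted entry is present in the .config
```

The subset test corresponds to is_subset_of() in the patch. Note that the `\S+` in the value pattern rejects values containing spaces, so a quoted string option with embedded spaces would fail to parse in this sketch.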

Signed-off-by: Felix Guo 
Signed-off-by: Brendan Higgins 
---
 tools/testing/kunit/.gitignore  |   3 +
 tools/testing/kunit/kunit_config.py |  60 ++
 tools/testing/kunit/kunit_kernel.py | 123 
 3 files changed, 186 insertions(+)
 create mode 100644 tools/testing/kunit/.gitignore
 create mode 100644 tools/testing/kunit/kunit_config.py
 create mode 100644 tools/testing/kunit/kunit_kernel.py

diff --git a/tools/testing/kunit/.gitignore b/tools/testing/kunit/.gitignore
new file mode 100644
index 0..c791ff59a37a9
--- /dev/null
+++ b/tools/testing/kunit/.gitignore
@@ -0,0 +1,3 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
\ No newline at end of file
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
new file mode 100644
index 0..183bd5e758762
--- /dev/null
+++ b/tools/testing/kunit/kunit_config.py
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import collections
+import re
+
+CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
+CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'
+
+KconfigEntryBase = collections.namedtuple('KconfigEntry', ['raw_entry'])
+
+
+class KconfigEntry(KconfigEntryBase):
+
+   def __str__(self) -> str:
+   return self.raw_entry
+
+
+class KconfigParseError(Exception):
+   """Error parsing Kconfig defconfig or .config."""
+
+
+class Kconfig(object):
+   """Represents defconfig or .config specified using the Kconfig language."""
+
+   def __init__(self):
+   self._entries = []
+
+   def entries(self):
+   return set(self._entries)
+
+   def add_entry(self, entry: KconfigEntry) -> None:
+   self._entries.append(entry)
+
+   def is_subset_of(self, other: "Kconfig") -> bool:
+   return self.entries().issubset(other.entries())
+
+   def write_to_file(self, path: str) -> None:
+   with open(path, 'w') as f:
+   for entry in self.entries():
+   f.write(str(entry) + '\n')
+
+   def parse_from_string(self, blob: str) -> None:
+   """Parses a string containing KconfigEntrys and populates this Kconfig."""
+   self._entries = []
+   is_not_set_matcher = re.compile(CONFIG_IS_NOT_SET_PATTERN)
+   config_matcher = re.compile(CONFIG_PATTERN)
+   for line in blob.split('\n'):
+   line = line.strip()
+   if not line:
+   continue
+   elif config_matcher.match(line) or is_not_set_matcher.match(line):
+   self._entries.append(KconfigEntry(line))
+   elif line[0] == '#':
+   continue
+   else:
+   raise KconfigParseError('Failed to parse: ' + line)
+
+   def read_from_file(self, path: str) -> None:
+   with open(path, 'r') as f:
+   self.parse_from_string(f.read())
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
new file mode 100644
index 0..87abaede50513
--- /dev/null
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -0,0 +1,123 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import logging
+import subprocess
+import os
+
+import kunit_config
+
+KCONFIG_PATH = '.config'
+
+class ConfigError(Exception):
+   """Represents an error trying to configure the Linux kernel."""
+
+
+class BuildError(Exception):
+   """Represents an error trying to build the Linux kernel."""
+
+
+class LinuxSourceTreeOperations(object):
+   """An abstraction over command line operations performed on a source tree."""
+
+   def make_mrproper(self):
+   try:
+   subprocess.check_output(['make', 'mrproper'])
+   except OSError as e:
+   raise ConfigError('Could not call make command: ' + str(e))
+   except subprocess.CalledProcessError as e:
+   raise ConfigError(e.output)
+
+   def make_olddefconfig(self):
+   try:
+   subprocess.check_output(['make', 'ARCH=um', 'olddefconfig'])
+   except OSError as e:
+

[RFC v2 10/14] kunit: add Python libraries for handing KUnit config and kernel

2018-10-23 Thread Brendan Higgins

The ultimate goal is to create minimal isolated test binaries; in the
meantime we are using UML to provide the infrastructure to run tests, so
define an abstract way to configure and run tests that allows us to
change the context in which tests are built without affecting the user.
This also makes it easier to add pretty and dynamic error reporting, and
a lot of other nice features.

kunit_config.py:
  - parse .config and Kconfig files.

kunit_kernel.py: provides helper functions to:
  - configure the kernel using kunitconfig.
  - build the kernel with the appropriate configuration.
  - invoke the kernel and stream the output back.
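
As a rough illustration of the parsing step listed above, the following standalone sketch classifies .config lines with the same two regular expressions that kunit_config.py in this patch defines. The helper name classify_lines and the sample input are assumptions made for illustration only; they are not part of the patch.

```python
import re

# Patterns copied from kunit_config.py in this patch.
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'


def classify_lines(blob):
    """Hypothetical helper: return the lines that count as Kconfig entries."""
    entries = []
    for line in blob.split('\n'):
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if (re.match(CONFIG_PATTERN, line)
                or re.match(CONFIG_IS_NOT_SET_PATTERN, line)):
            entries.append(line)  # "CONFIG_X=..." or "# CONFIG_X is not set"
        elif line.startswith('#'):
            continue  # ordinary comment, ignored
        else:
            raise ValueError('Failed to parse: ' + line)
    return entries


# Illustrative input, not taken from a real kunitconfig.
sample = "# a comment\nCONFIG_KUNIT=y\n# CONFIG_DEBUG_INFO is not set\n"
entries = classify_lines(sample)

# A subset check in the spirit of Kconfig.is_subset_of().
required = {'CONFIG_KUNIT=y'}
is_satisfied = required.issubset(set(entries))
```

Note that "is not set" comments are kept as entries while other comments are dropped, so a required "# CONFIG_X is not set" line can take part in the subset comparison.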

Signed-off-by: Felix Guo 
Signed-off-by: Brendan Higgins 
---
 tools/testing/kunit/.gitignore  |   3 +
 tools/testing/kunit/kunit_config.py |  60 ++
 tools/testing/kunit/kunit_kernel.py | 123 
 3 files changed, 186 insertions(+)
 create mode 100644 tools/testing/kunit/.gitignore
 create mode 100644 tools/testing/kunit/kunit_config.py
 create mode 100644 tools/testing/kunit/kunit_kernel.py

diff --git a/tools/testing/kunit/.gitignore b/tools/testing/kunit/.gitignore
new file mode 100644
index 0..c791ff59a37a9
--- /dev/null
+++ b/tools/testing/kunit/.gitignore
@@ -0,0 +1,3 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
\ No newline at end of file
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
new file mode 100644
index 0..183bd5e758762
--- /dev/null
+++ b/tools/testing/kunit/kunit_config.py
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import collections
+import re
+
+CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
+CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'
+
+KconfigEntryBase = collections.namedtuple('KconfigEntry', ['raw_entry'])
+
+
+class KconfigEntry(KconfigEntryBase):
+
+   def __str__(self) -> str:
+   return self.raw_entry
+
+
+class KconfigParseError(Exception):
+   """Error parsing Kconfig defconfig or .config."""
+
+
+class Kconfig(object):
+   """Represents defconfig or .config specified using the Kconfig language."""
+
+   def __init__(self):
+   self._entries = []
+
+   def entries(self):
+   return set(self._entries)
+
+   def add_entry(self, entry: KconfigEntry) -> None:
+   self._entries.append(entry)
+
+   def is_subset_of(self, other: "Kconfig") -> bool:
+   return self.entries().issubset(other.entries())
+
+   def write_to_file(self, path: str) -> None:
+   with open(path, 'w') as f:
+   for entry in self.entries():
+   f.write(str(entry) + '\n')
+
+   def parse_from_string(self, blob: str) -> None:
+   """Parses a string containing KconfigEntrys and populates this Kconfig."""
+   self._entries = []
+   is_not_set_matcher = re.compile(CONFIG_IS_NOT_SET_PATTERN)
+   config_matcher = re.compile(CONFIG_PATTERN)
+   for line in blob.split('\n'):
+   line = line.strip()
+   if not line:
+   continue
+   elif config_matcher.match(line) or is_not_set_matcher.match(line):
+   self._entries.append(KconfigEntry(line))
+   elif line[0] == '#':
+   continue
+   else:
+   raise KconfigParseError('Failed to parse: ' + line)
+
+   def read_from_file(self, path: str) -> None:
+   with open(path, 'r') as f:
+   self.parse_from_string(f.read())
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
new file mode 100644
index 0..87abaede50513
--- /dev/null
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -0,0 +1,123 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import logging
+import subprocess
+import os
+
+import kunit_config
+
+KCONFIG_PATH = '.config'
+
+class ConfigError(Exception):
+   """Represents an error trying to configure the Linux kernel."""
+
+
+class BuildError(Exception):
+   """Represents an error trying to build the Linux kernel."""
+
+
+class LinuxSourceTreeOperations(object):
+   """An abstraction over command line operations performed on a source tree."""
+
+   def make_mrproper(self):
+   try:
+   subprocess.check_output(['make', 'mrproper'])
+   except OSError as e:
+   raise ConfigError('Could not call make command: ' + str(e))
+   except subprocess.CalledProcessError as e:
+   raise ConfigError(e.output)
+
+   def make_olddefconfig(self):
+   try:
+   subprocess.check_output(['make', 'ARCH=um', 'olddefconfig'])
+   except OSError as e:
+

[PATCH v3 4/4] nds32: Add document for NDS32 PMU.

2018-10-23 Thread Nickhu

Add a document describing how to add the NDS32 PMU
to the devicetree.

Signed-off-by: Nickhu 
---
 Documentation/devicetree/bindings/nds32/pmu.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nds32/pmu.txt

diff --git a/Documentation/devicetree/bindings/nds32/pmu.txt b/Documentation/devicetree/bindings/nds32/pmu.txt
new file mode 100644
index ..1bd15785b4ae
--- /dev/null
+++ b/Documentation/devicetree/bindings/nds32/pmu.txt
@@ -0,0 +1,17 @@
+* NDS32 Performance Monitor Units
+
+NDS32 cores have a PMU for counting CPU and cache events such as cache misses.
+The NDS32 PMU is represented in the device tree as follows:
+
+Required properties:
+
+- compatible :
+   "andestech,nds32v3-pmu"
+
+- interrupts : The interrupt number for NDS32 PMU is 13.
+
+Example:
+pmu {
+   compatible = "andestech,nds32v3-pmu";
+   interrupts = <13>;
+};
-- 
2.17.0

[PATCH v3 3/4] nds32: Add perf call-graph support.

2018-10-23 Thread Nickhu

The perf call-graph option can trace the callchain
between functions. This commit adds perf callchain support
for nds32. There are kernel and user callchains.
The kernel callchain traces functions in kernel
space. There are two types of user callchain: one for when
the 'optimize for size' config is set, and another for when
it is not set. The difference between the two types is that
the index of the frame pointer in the user stack is not the same.

For example:
With optimize for size:
User Stack:
---------
|   lp  |
---------
|   gp  |
---------
|   fp  |

Without optimize for size:
User Stack:
1. non-leaf function:
---------
|   lp  |
---------
|   fp  |

2. leaf function:
---------
|   fp  |
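
To make the effect of the differing frame-pointer index concrete, here is a toy model of the walk, not the nds32 code from this patch. It treats the stack as a mapping from word index to value and follows saved frame pointers. The offsets used (lp at -1, saved fp at -2 without size optimization, saved fp at -3 when gp sits in between) are assumptions read off the diagrams above, and the function name walk_frames is hypothetical.

```python
def walk_frames(stack, fp, fp_index, lp_index=-1, max_depth=16):
    """Follow saved frame pointers through a toy stack.

    stack maps a word index to the value stored there; fp is a word index.
    Returns the saved return addresses (lp values) found along the chain.
    """
    chain = []
    for _ in range(max_depth):
        # Stop at a null frame pointer or when a slot lies outside the stack.
        if fp == 0 or (fp + lp_index) not in stack or (fp + fp_index) not in stack:
            break
        chain.append(stack[fp + lp_index])
        fp = stack[fp + fp_index]
    return chain


# Two frames without size optimization: lp at fp-1, saved fp at fp-2.
plain = {9: 0x1000, 8: 20, 19: 0x2000, 18: 0}

# Two frames with size optimization: lp at fp-1, gp at fp-2, saved fp at fp-3.
sized = {9: 0x1000, 8: 0xAAAA, 7: 20, 19: 0x2000, 18: 0xAAAA, 17: 0}

chain_plain = walk_frames(plain, fp=10, fp_index=-2)
chain_sized = walk_frames(sized, fp=10, fp_index=-3)

# Using the wrong index on the size-optimized layout reads gp as the next fp,
# so the walk stops after the first frame.
chain_wrong = walk_frames(sized, fp=10, fp_index=-2)
```

In this model both layouts yield the same two return addresses only when the index matches the layout, which is why the two user-callchain variants need different handling.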

Signed-off-by: Nickhu 
---
 arch/nds32/kernel/perf_event_cpu.c | 299 +
 1 file changed, 299 insertions(+)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a6e723d0fdbc..5e00ce54d0ff 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1193,6 +1193,305 @@ static int __init register_pmu_driver(void)
 
 device_initcall(register_pmu_driver);
 
+/*
+ * References: arch/nds32/kernel/traps.c:__dump()
+ * You will need to know the NDS ABI first.
+ */
+static int unwind_frame_kernel(struct stackframe *frame)
+{
+   int graph = 0;
+#ifdef CONFIG_FRAME_POINTER
+   /* 0x3 means misalignment */
+   if (!kstack_end((void *)frame->fp) &&
+   !((unsigned long)frame->fp & 0x3) &&
+   ((unsigned long)frame->fp >= TASK_SIZE)) {
+   /*
+*  The array index is based on the ABI, the below graph
+*  illustrate the reasons.
+*  Function call procedure: "smw" and "lmw" will always
+*  update SP and FP for you automatically.
+*
+*  Stack Relative Address
+*  |  |  0
+*  
+*  |LP| <-- SP(before smw)  <-- FP(after smw)   -1
+*  
+*  |FP| -2
+*  
+*  |  | <-- SP(after smw)   -3
+*/
+   frame->lp = ((unsigned long *)frame->fp)[-1];
+   frame->fp = ((unsigned long *)frame->fp)[FP_OFFSET];
+   /* make sure CONFIG_FUNCTION_GRAPH_TRACER is turned on */
+   if (__kernel_text_address(frame->lp))
+   frame->lp = ftrace_graph_ret_addr
+   (NULL, &graph, frame->lp, NULL);
+
+   return 0;
+   } else {
+   return -EPERM;
+   }
+#else
+   /*
+* You can refer to arch/nds32/kernel/traps.c:__dump()
+* Treat "sp" as "fp", but the "sp" is one frame ahead of "fp".
+* And, the "sp" is not always correct.
+*
+*   Stack Relative Address
+*   |  |  0
+*   
+*   |LP| <-- SP(before smw)  -1
+*   
+*   |  | <-- SP(after smw)   -2
+*   
+*/
+   if (!kstack_end((void *)frame->sp)) {
+   frame->lp = ((unsigned long *)frame->sp)[1];
+   /* TODO: How to deal with the value in first
+* "sp" is not correct?
+*/
+   if (__kernel_text_address(frame->lp))
+   frame->lp = ftrace_graph_ret_addr
+   (tsk, &graph, frame->lp, NULL);
+
+   frame->sp = ((unsigned long *)frame->sp) + 1;
+
+   return 0;
+   } else {
+   return -EPERM;
+   }
+#endif
+}
+
+static void notrace
+walk_stackframe(struct stackframe *frame,
+   int (*fn_record)(struct stackframe *, void *),
+   void *data)
+{
+   while (1) {
+   int ret;
+
+   if (fn_record(frame, data))
+   break;
+
+   ret = unwind_frame_kernel(frame);
+   if (ret < 0)
+   break;
+   }
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whilst unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int callchain_trace(struct stackframe *fr, void *data)
+{
+   struct

[PATCH v3 1/4] nds32: Fix bug in bitfield.h

2018-10-23 Thread Nickhu

There are two bitfield bugs for the performance counter
in bitfield.h:

PFM_CTL_offSEL1 21 --> 16
PFM_CTL_offSEL2 27 --> 22

This commit fixes them.
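
A small worked example of why the start offsets matter, written as a sketch rather than as code from the kernel. The offsets are the ones in this patch; the 6-bit field width is an assumption inferred from the corrected offsets (16 and 22) together with the "bit 28:31 reserved" comment in the header, and the helper names are hypothetical.

```python
PFM_CTL_offSEL1 = 16  # corrected value (was 21)
PFM_CTL_offSEL2 = 22  # corrected value (was 27)

SEL_WIDTH = 6  # ASSUMPTION: inferred from offsets 16, 22 and reserved bits 28:31


def field_mask(offset, width):
    """Mask covering `width` bits starting at bit `offset`."""
    return ((1 << width) - 1) << offset


def set_field(reg, offset, width, value):
    """Return reg with the given field replaced by value."""
    mask = field_mask(offset, width)
    return (reg & ~mask) | ((value << offset) & mask)


new_sel1 = field_mask(PFM_CTL_offSEL1, SEL_WIDTH)  # bits 16..21
new_sel2 = field_mask(PFM_CTL_offSEL2, SEL_WIDTH)  # bits 22..27

# With the old offset, a 6-bit SEL2 field would start at bit 27 and spill
# into the reserved bits 28:31 (and past bit 31 of a 32-bit register).
old_sel2 = field_mask(27, SEL_WIDTH)
reserved = 0xF0000000

reg = set_field(0, PFM_CTL_offSEL2, SEL_WIDTH, 0x3F)
```

Under that width assumption the corrected fields sit back to back below the reserved bits, while the old SEL2 offset would have overlapped them.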

Signed-off-by: Nickhu 
---
 arch/nds32/include/asm/bitfield.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/nds32/include/asm/bitfield.h b/arch/nds32/include/asm/bitfield.h
index 8e84fc385b94..19b2841219ad 100644
--- a/arch/nds32/include/asm/bitfield.h
+++ b/arch/nds32/include/asm/bitfield.h
@@ -692,8 +692,8 @@
 #define PFM_CTL_offKU1 13  /* Enable user mode event counting for PFMC1 */
 #define PFM_CTL_offKU2 14  /* Enable user mode event counting for PFMC2 */
 #define PFM_CTL_offSEL0 15  /* The event selection for PFMC0 */
-#define PFM_CTL_offSEL1 21  /* The event selection for PFMC1 */
-#define PFM_CTL_offSEL2 27  /* The event selection for PFMC2 */
+#define PFM_CTL_offSEL1 16  /* The event selection for PFMC1 */
+#define PFM_CTL_offSEL2 22  /* The event selection for PFMC2 */
 /* bit 28:31 reserved */
 
 #define PFM_CTL_mskEN0 ( 0x01  << PFM_CTL_offEN0 )
-- 
2.17.0
