Re: [PATCH 0/7] x86/mm/tlb: make lazy TLB mode even lazier
* Rik van Riel wrote: > The big thing remaining is the reference count overhead of > the lazy TLB mm_struct, but getting rid of that is rather a > lot of code for a small performance gain. Not quite what > Linus asked for :) BTW., what would be the plan to improve scalability there, is it even possible? Also, it would be nice to integrate some of those workloads into a simple 'perf bench mm' or 'perf bench tlb' subcommand, see tools/perf/bench/ on how to add benchmarking modules. Thanks, Ingo
Re: [GIT] Sparc
On Wed, Oct 24, 2018 at 4:31 AM David Miller wrote: > > Mostly VDSO cleanups and optimizations. Pulled, Linus
[PATCH v3] kernel/signal: Signal-based pre-coredump notification
For simplicity and consistency, this patch provides an implementation for signal-based fault notification prior to the coredump of a child process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can be used by an application to express its interest and to specify the signal for such a notification. A new signal code, CLD_PREDUMP, is also defined for SIGCHLD.

Changes to prctl(2):

PR_SET_PREDUMP_SIG (since Linux 4.20.x)
    Set the child pre-coredump signal of the calling process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the calling process will get prior to the coredump of a child process. This value is cleared across execve(2), or for the child of a fork(2). When SIGCHLD is specified, the signal code will be set to CLD_PREDUMP in such a SIGCHLD signal.

PR_GET_PREDUMP_SIG (since Linux 4.20.x)
    Return the current value of the child pre-coredump signal, in the location pointed to by (int *) arg2.

Background:

As the coredump of a process may take time, in certain time-sensitive applications it is necessary for a parent process (e.g., a process manager) to be notified of a child's imminent death before the coredump so that the parent process can act sooner, such as re-spawning an application process or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a child process's state change: one is to use a POSIX signal, and the other is to use the kernel connector module.
The specific events and actions are summarized as follows:

  Process event        POSIX signal                  Connector-based
  ---------------------------------------------------------------------------
  ptrace_attach()      do_notify_parent_cldstop()    proc_ptrace_connector()
                       SIGCHLD / CLD_STOPPED
  ptrace_detach()      do_notify_parent_cldstop()    proc_ptrace_connector()
                       SIGCHLD / CLD_CONTINUED
  pre_coredump /       N/A                           proc_coredump_connector()
  get_signal()
  post_coredump /      do_notify_parent()            proc_exit_connector()
  do_exit()            SIGCHLD / exit_signal
  ---------------------------------------------------------------------------

As shown in the table, a signal-based pre-coredump notification is not currently available. In some cases using a connector-based notification can be quite complicated (e.g., when a process manager is written in shell scripts and thus is subject to certain inherent limitations), and a signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen
---
v2 -> v3: Addressed review comments from Oleg Nesterov, including:
  o remove the restriction on signal for PR_SET_PREDUMP_SIG.
  o code simplification

 arch/x86/kernel/signal_compat.c                  |   2 +-
 fs/coredump.c                                    |   6 +
 fs/exec.c                                        |   3 +
 include/linux/sched/signal.h                     |   4 +
 include/uapi/asm-generic/siginfo.h               |   3 +-
 include/uapi/linux/prctl.h                       |   4 +
 kernel/fork.c                                    |   3 +
 kernel/signal.c                                  |  31 +
 kernel/sys.c                                     |  13 ++
 tools/testing/selftests/prctl/Makefile           |   2 +-
 tools/testing/selftests/prctl/predump-sig-test.c | 169 +++
 11 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(NSIGSEGV != 7);
 	BUILD_BUG_ON(NSIGBUS != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
-	BUILD_BUG_ON(NSIGCHLD != 6);
+	BUILD_BUG_ON(NSIGCHLD != 7);
 	BUILD_BUG_ON(NSIGSYS != 1);
 
 	/* This is part of the ABI and can never change in size: */
diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..d6ca1a3 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	if (retval < 0)
 		goto fail_creds;
 
+	/*
+	 * Send the pre-coredump signal to the parent if requested.
+	 */
+	do_notify_parent_predump();
+	cond_resched();
+
 	old_cred = override_creds(cred);
 
 	ispipe = format_corename(, );
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
 	/* we have changed execution domain */
 	tsk->exit_signal = SIGCHLD;
 
+	/* Clear the pre-coredump signal before loading a new binary */
+	sig->predump_signal = 0;
+
 #ifdef
Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.
On 2018-10-24 01:34, Kees Cook wrote: On Mon, Oct 22, 2018 at 10:11 PM, Konstantin Khlebnikov wrote: On 23.10.2018 7:15, Joe Perches wrote:> On Mon, 2018-10-22 at 22:53 +0530, Arun KS wrote: Remove managed_page_count_lock spinlock and instead use atomic variables. Perhaps better to define and use macros for the accesses instead of specific uses of atomic_long_ Something like: #define totalram_pages() (unsigned long)atomic_long_read(&_totalram_pages) or proper static inline this code isn't so low level for breaking include dependencies with macro BTW, I noticed a few places in the patch that did multiple evaluations of totalram_pages. It might be worth fixing those prior to doing the conversion, too. e.g.: if (totalram_pages > something) foobar(totalram_pages); <- value may have changed here should, instead, be: var = totalram_pages; <- get stable view of the value if (var > something) foobar(var); Thanks for reviewing. Point taken. -Kees [dropped bloated cc - my server rejects this mess] Thank you -- I was struggling to figure out the best way to reply to this. :) I'm sorry for the trouble caused. Sent the email using, git send-email --to-cmd="scripts/get_maintainer.pl -i" 0001-convert-totalram_pages-totalhigh_pages-and-managed_p.patch Is this not a recommended approach? Regards, Arun -Kees
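Kees's point about multiple evaluations can be sketched in plain C. This is an illustrative stand-in, not the kernel code: a helper wraps a single atomic load (as the proposed totalram_pages() macro/inline would), and callers take one snapshot rather than re-reading a value that memory hotplug could change between reads.

```c
#include <stdatomic.h>

/* Illustrative stand-in for the kernel's _totalram_pages after the
 * conversion; the value here is made up. */
static atomic_long _totalram_pages = 1024;

/* One atomic load per call, as the suggested accessor would do. */
static unsigned long totalram_pages(void)
{
	return (unsigned long)atomic_load(&_totalram_pages);
}

/*
 * Broken pattern from the review: two reads may observe two different
 * values if memory is hot-added/removed in between:
 *
 *	if (totalram_pages() > limit)
 *		foobar(totalram_pages());	// may differ from tested value
 *
 * Correct pattern: take one snapshot and use it consistently.
 */
unsigned long clamp_to_ram(unsigned long want)
{
	unsigned long pages = totalram_pages();	/* stable view of the value */
	return want > pages ? pages : want;
}
```

The snapshot makes the comparison and the use agree even if the atomic variable changes concurrently, which is exactly why fixing double evaluations before the conversion was suggested.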
Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache
On Tue, 2018-10-23 at 10:30 -0700, Mike Kravetz wrote:
> . snip
> Here is updated patch without the drop_caches change and updated
> fixes tag.
>
> From: Mike Kravetz
>
> hugetlbfs: dirty pages as they are added to pagecache
>
> Some test systems were experiencing negative huge page reserve
> counts and incorrect file block counts. This was traced to
> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> file pagecaches. When non-hugetlbfs explicit code removes the
> pages, the appropriate accounting is not performed.
>
> This can be recreated as follows:
>  fallocate -l 2M /dev/hugepages/foo
>  echo 1 > /proc/sys/vm/drop_caches
>  fallocate -l 2M /dev/hugepages/foo
>  grep -i huge /proc/meminfo
>    AnonHugePages:         0 kB
>    ShmemHugePages:        0 kB
>    HugePages_Total:    2048
>    HugePages_Free:     2047
>    HugePages_Rsvd:    18446744073709551615
>    HugePages_Surp:        0
>    Hugepagesize:       2048 kB
>    Hugetlb:         4194304 kB
>  ls -lsh /dev/hugepages/foo
>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
>
> To address this issue, dirty pages as they are added to pagecache.
> This can easily be reproduced with fallocate as shown above. Read
> faulted pages will eventually end up being marked dirty. But there
> is a window where they are clean and could be impacted by code such
> as drop_caches. So, just dirty them all as they are added to the
> pagecache.
>
> Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into
> huge_no_page()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Mike Kravetz
> ---
>  mm/hugetlb.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5c390f5a5207..7b5c0ad9a6bd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3690,6 +3690,12 @@ int huge_add_to_page_cache(struct page *page,
> struct address_space *mapping,
> 		return err;
> 	ClearPagePrivate(page);
>
> +	/*
> +	 * set page dirty so that it will not be removed from
> +	 * cache/file by non-hugetlbfs specific code paths.
> +	 */
> +	set_page_dirty(page);
> +
> 	spin_lock(>i_lock);
> 	inode->i_blocks += blocks_per_huge_page(h);
> 	spin_unlock(>i_lock);

This looks good.

Reviewed-by: Khalid Aziz

--
Khalid
[PATCH RFC v2 0/1] hugetlbfs: Use i_mmap_rwsem for pmd share and fault/trunc
This patch addresses issues with page fault/truncation synchronization.

The first issue was noticed as negative hugetlb reserved page counts during DB development testing. Code inspection revealed that the most likely cause was races between truncate and page faults. In fact, I could write a not too complicated program to cause the races and recreate the issue.

A more dangerous issue exists when you introduce huge pmd sharing to page fault/truncate races. The first thing that happens in huge page fault processing is a call to huge_pte_alloc to get a ptep. Suppose that ptep points to a shared pmd. Now, another thread could perform a truncate and unmap everyone mapping the file. huge_pmd_unshare can be called for the mapping on which the first thread is operating. huge_pmd_unshare can clear the pud pointing to the pmd. After this, the ptep points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references.

Fix this all by modifying the usage of i_mmap_rwsem to cover fault/truncate races as well as handling of shared pmds.

Mike Kravetz (1):
  hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync

 fs/hugetlbfs/inode.c | 21 ++
 mm/hugetlb.c         | 65 +---
 mm/rmap.c            | 10 +++
 mm/userfaultfd.c     | 11 ++--
 4 files changed, 84 insertions(+), 23 deletions(-)

--
2.17.2
Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT
Hi Vincent, Thanks for the detailed explanation. On Tue, Oct 23, 2018 at 02:15:08PM +0200, Vincent Guittot wrote: > Hi Pavan, > > On Tue, 23 Oct 2018 at 07:59, Pavan Kondeti wrote: > > > > Hi Vincent, > > > > On Fri, Oct 19, 2018 at 06:17:51PM +0200, Vincent Guittot wrote: > > > > > > /* > > > + * The clock_pelt scales the time to reflect the effective amount of > > > + * computation done during the running delta time but then sync back to > > > + * clock_task when rq is idle. > > > + * > > > + * > > > + * absolute time | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16 > > > + * @ max capacity --**---**--- > > > + * @ half capacity ---- > > > + * clock pelt | 1| 2|3|4| 7| 8| 9| 10| 11|14|15|16 > > > + * > > > + */ > > > +void update_rq_clock_pelt(struct rq *rq, s64 delta) > > > +{ > > > + > > > + if (is_idle_task(rq->curr)) { > > > + u32 divider = (LOAD_AVG_MAX - 1024 + > > > rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT; > > > + u32 overload = rq->cfs.avg.util_sum + LOAD_AVG_MAX; > > > + overload += rq->avg_rt.util_sum; > > > + overload += rq->avg_dl.util_sum; > > > + > > > + /* > > > + * Reflecting some stolen time makes sense only if the idle > > > + * phase would be present at max capacity. As soon as the > > > + * utilization of a rq has reached the maximum value, it is > > > + * considered as an always runnnig rq without idle time to > > > + * steal. This potential idle time is considered as lost in > > > + * this case. We keep track of this lost idle time compare > > > to > > > + * rq's clock_task. > > > + */ > > > + if (overload >= divider) > > > + rq->lost_idle_time += rq_clock_task(rq) - > > > rq->clock_pelt; > > > + > > > > I am trying to understand this better. I believe we run into this scenario, > > when > > the frequency is limited due to thermal/userspace constraints. 
Lets say > > Yes these are the most common UCs but this can also happen after tasks > migration or with a cpufreq governor that doesn't increase OPP fast > enough for current utilization. > > > frequency is limited to Fmax/2. A 50% task at Fmax, becomes 100% running at > > Fmax/2. The utilization is built up to 100% after several periods. > > The clock_pelt runs at 1/2 speed of the clock_task. We are loosing the idle > > time > > all along. What happens when the CPU enters idle for a short duration and > > comes > > back to run this 100% utilization task? > > If you are at 100%, we only apply the short idle duration > > > > > If the above block is not present i.e lost_idle_time is not tracked, we > > stretch the idle time (since clock_pelt is synced to clock_task) and the > > utilization is dropped. Right? > > yes that 's what would happen. I gives more details below > > > > > With the above block, we don't stretch the idle time. In fact we don't > > consider the idle time at all. Because, > > > > idle_time = now - last_time; > > > > idle_time = (rq->clock_pelt - rq->lost_idle_time) - last_time > > idle_time = (rq->clock_task - rq_clock_task + rq->clock_pelt_old) - > > last_time > > idle_time = rq->clock_pelt_old - last_time > > > > The last time is nothing but the last snapshot of the rq->clock_pelt when > > the > > task entered sleep due to which CPU entered idle. > > The condition for dropping this idle time is quite important. This > only happens when the utilization reaches max compute capacity of the > CPU. Otherwise, the idle time will be fully applied Right. rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt This not only tracks the lost idle time due to running slow but also the absolute/real sleep time. For example, when the slow running 100% task sleeps for 100 msec, are not we ignoring the 100 msec sleep there? For example a task ran 323 msec at full capacity and sleeps for (1000-323) msec. when it wakes up the utilization is dropped. 
If the same task runs for 626 msec at half capacity and sleeps for (1000-626) msec, should we not drop the utilization by taking the (1000-626) msec sleep time into account? I understand why we don't stretch the idle time to (1000-323) msec, but it is not clear to me why we completely drop the idle time. > > > > > Can you please explain the significance of the above block with an example? > > The pelt signal reaches its max value after 323ms at full capacity, > which means that we can't make any difference between tasks running > 323ms, 500ms or more at max capacity. As a result, we consider that > the CPU is fully used and there is no idle time when the utilization > equals max capacity. If CPU runs at half the capacity, it will run > 626ms before reaching max utilization and at that time we will stop to > stretch the idle time because we consider that there is no idle
[PATCH RFC v2 1/1] hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync
hugetlbfs does not correctly handle page faults racing with truncation. In addition, shared pmds can cause additional issues.

Without pmd sharing, issues can occur as follows:

A hugetlbfs file is mmap(MAP_SHARED) with a size of 4 pages. At mmap time, 4 huge pages are reserved for the file/mapping. So, the global reserve count is 4. In addition, since this is a shared mapping, an entry for 4 pages is added to the file's reserve map.

The first 3 of the 4 pages are faulted into the file. As a result, the global reserve count is now 1.

Task A starts to fault in the last page (routines hugetlb_fault, hugetlb_no_page). It allocates a huge page (alloc_huge_page). The reserve map indicates there is a reserved page, so this is used and the global reserve count goes to 0.

Now, task B truncates the file to size 0. It starts by setting the inode size to 0 (hugetlb_vmtruncate). It then unmaps all mappings of the file (hugetlb_vmdelete_list). Since task A's page table lock is not held at the time, truncation is not blocked.

Truncation removes the 3 pages from the file (remove_inode_hugepages). When cleaning up the reserved pages (hugetlb_unreserve_pages), it notices the reserve map was for 4 pages. However, it has only freed 3 pages. So it assumes there is still (4 - 3) = 1 reserved page. It then decrements the global reserve count by 1 and the count goes negative.

Task A then continues the page fault process and adds its newly acquired page to the page cache. Note that the index of this page is beyond the size of the truncated file (0). The page fault process then notices the file has been truncated and exits. However, the page is left in the cache associated with the file.

Now, if the file is immediately deleted, the truncate code runs again. It will find and free the one page associated with the file. When cleaning up reserves, it notices the reserve map is empty, yet one page was freed. So, the global reserve count is decremented by (0 - 1) = -1.
This returns the global count to 0 as it should be. But, it is possible for someone else to mmap this file/range before it is deleted. If this happens, a reserve map entry for the allocated page is created and the reserved page is forever leaked.

With pmd sharing, the situation is even worse. Consider the following: A task processes a page fault on a shared hugetlbfs file and calls huge_pte_alloc to get a ptep. Suppose the returned ptep points to a shared pmd. Now, another task truncates the hugetlbfs file. As part of truncation, it unmaps everyone who has the file mapped. If a task has a shared pmd in this range, huge_pmd_unshare will be called. If this is not the last user sharing the pmd, huge_pmd_unshare will clear the pud pointing to the pmd. For the task in the middle of the page fault, the ptep returned by huge_pte_alloc now points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references.

i_mmap_rwsem is currently used for pmd sharing synchronization. It is also held during unmap and whenever a call to huge_pmd_unshare is possible. It is only acquired in write mode.

Expand and modify the use of i_mmap_rwsem as follows:
- i_mmap_rwsem is held in write mode for the duration of truncate processing.
- i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
- i_mmap_rwsem is held in read mode whenever huge_pmd_share is called. Today that is only via huge_pte_alloc.
- i_mmap_rwsem is held in read mode after huge_pte_alloc, until the caller is finished with the returned ptep.
Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 21 ++ mm/hugetlb.c | 65 +--- mm/rmap.c| 10 +++ mm/userfaultfd.c | 11 ++-- 4 files changed, 84 insertions(+), 23 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e..6ee97622a231 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -426,10 +426,16 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, + /* +* No need to take fault mutex for truncation as we +* are synchronized via i_mmap_rwsem. +*/ + if (!truncate_op) { + hash = hugetlb_fault_mutex_hash(h, current->mm, _vma, mapping, index, 0); - mutex_lock(_fault_mutex_table[hash]); + mutex_lock(_fault_mutex_table[hash]); + } /*
[PATCH RFC v2 1/1] hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync
hugetlbfs does not correctly handle page faults racing with truncation. In addition, shared pmds can cause additional issues. Without pmd sharing, issues can occur as follows: A hugetlbfs file is mmap(MAP_SHARED) with a size of 4 pages. At mmap time, 4 huge pages are reserved for the file/mapping. So, the global reserve count is 4. In addition, since this is a shared mapping, an entry for 4 pages is added to the file's reserve map. The first 3 of the 4 pages are faulted into the file. As a result, the global reserve count is now 1. Task A starts to fault in the last page (routines hugetlb_fault, hugetlb_no_page). It allocates a huge page (alloc_huge_page). The reserve map indicates there is a reserved page, so this is used and the global reserve count goes to 0. Now, task B truncates the file to size 0. It starts by setting the inode size to 0 (hugetlb_vmtruncate). It then unmaps all mappings of the file (hugetlb_vmdelete_list). Since task A's page table lock is not held at the time, truncation is not blocked. Truncation removes the 3 pages from the file (remove_inode_hugepages). When cleaning up the reserved pages (hugetlb_unreserve_pages), it notices the reserve map was for 4 pages. However, it has only freed 3 pages. So it assumes there is still (4 - 3) = 1 reserved page. It then decrements the global reserve count by 1 and it goes negative. Task A then continues the page fault process and adds its newly acquired page to the page cache. Note that the index of this page is beyond the size of the truncated file (0). The page fault process then notices the file has been truncated and exits. However, the page is left in the cache associated with the file. Now, if the file is immediately deleted, the truncate code runs again. It will find and free the one page associated with the file. When cleaning up reserves, it notices the reserve map is empty. Yet, one page was freed. So, the global reserve count is decremented by (0 - 1) = -1. 
This returns the global count to 0 as it should be. But, it is possible for someone else to mmap this file/range before it is deleted. If this happens, a reserve map entry for the allocated page is created and the reserved page is forever leaked. With pmd sharing, the situation is even worse. Consider the following: A task processes a page fault on a shared hugetlbfs file and calls huge_pte_alloc to get a ptep. Suppose the returned ptep points to a shared pmd. Now, another task truncates the hugetlbfs file. As part of truncation, it unmaps everyone who has the file mapped. If a task has a shared pmd in this range, huge_pmd_unshare will be called. If this is not the last user sharing the pmd, huge_pmd_unshare will clear the pud pointing to the pmd. For the task in the middle of the page fault, the ptep returned by huge_pte_alloc now points to another task's page table or worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references. i_mmap_rwsem is currently used for pmd sharing synchronization. It is also held during unmap and whenever a call to huge_pmd_unshare is possible. It is only acquired in write mode. Expand and modify the use of i_mmap_rwsem as follows: - i_mmap_rwsem is held in write mode for the duration of truncate processing. - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is possible. - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called. Today that is only via huge_pte_alloc. - i_mmap_rwsem is held in read mode after huge_pte_alloc, until the caller is finished with the returned ptep. 
Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 21 ++ mm/hugetlb.c | 65 +--- mm/rmap.c | 10 +++ mm/userfaultfd.c | 11 ++-- 4 files changed, 84 insertions(+), 23 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e..6ee97622a231 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -426,10 +426,16 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, + /* +* No need to take fault mutex for truncation as we +* are synchronized via i_mmap_rwsem. +*/ + if (!truncate_op) { + hash = hugetlb_fault_mutex_hash(h, current->mm, &pseudo_vma, mapping, index, 0); - mutex_lock(&hugetlb_fault_mutex_table[hash]); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + } /*
Re: [PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks()
On (10/24/18 13:30), Sergey Senozhatsky wrote: > - * OK, the message is on the console. Now we call printk() > - * without oops_in_progress set so that printk will give klogd > - * a poke. Hold onto your hats... > - */ > - console_loglevel = 15; > - printk(" "); > console_loglevel = loglevel_save; > + > + oops_in_progress = 0; > + wake_up_klogd(); D'oh... Fat fingers! I noticed that I had removed "console_loglevel = 15". Sorry about that. From: Sergey Senozhatsky Subject: [PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks() printk() without oops_in_progress set is potentially dangerous. It will attempt to call into the console driver, so if an oops happened while the console driver's port->lock spin_lock was locked on the same CPU (NMI oops or oops from the console driver), then re-entering the console driver from bust_spinlocks() will deadlock the system. Some serial drivers are re-entrant from the oops path: static void serial_console_write(struct console *co, const char *s, unsigned count) { ... if (port->sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(&port->lock, flags); else spin_lock_irqsave(&port->lock, flags); ... uart_console_write(port, s, count, serial_console_putchar); ... if (locked) spin_unlock_irqrestore(&port->lock, flags); } So it's OK to call printk() or console_unblank() and re-enter serial console drivers when oops_in_progress is set. But once we clear oops_in_progress, serial consoles become non-reentrant. From the comment it seems that s390 wants to just poke klogd. There is wake_up_klogd() for this purpose, so we can replace that printk(" "). 
Signed-off-by: Sergey Senozhatsky --- arch/s390/mm/fault.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 2b8f32f56e0c..53915c61ad95 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -92,16 +92,12 @@ void bust_spinlocks(int yes) oops_in_progress = 1; } else { int loglevel_save = console_loglevel; - console_unblank(); - oops_in_progress = 0; - /* -* OK, the message is on the console. Now we call printk() -* without oops_in_progress set so that printk will give klogd -* a poke. Hold onto your hats... -*/ + console_loglevel = 15; - printk(" "); + console_unblank(); console_loglevel = loglevel_save; + oops_in_progress = 0; + wake_up_klogd(); } } -- 2.19.1
[PATCH] s390/fault: use wake_up_klogd() in bust_spinlocks()
printk() without oops_in_progress set is potentially dangerous. It will attempt to call into the console driver, so if an oops happened while the console driver's port->lock spin_lock was locked on the same CPU (NMI oops or oops from the console driver), then re-entering the console driver from bust_spinlocks() will deadlock the system. Some serial drivers are re-entrant from the oops path: static void serial_console_write(struct console *co, const char *s, unsigned count) { ... if (port->sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(&port->lock, flags); else spin_lock_irqsave(&port->lock, flags); ... uart_console_write(port, s, count, serial_console_putchar); ... if (locked) spin_unlock_irqrestore(&port->lock, flags); } So it's OK to call printk() or console_unblank() and re-enter serial console drivers when oops_in_progress is set. But once we clear oops_in_progress, serial consoles become non-reentrant. From the comment it seems that s390 wants to just poke klogd. There is wake_up_klogd() for this purpose, so we can replace that printk(" "). Signed-off-by: Sergey Senozhatsky --- arch/s390/mm/fault.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 2b8f32f56e0c..244993dc3c70 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -93,15 +93,10 @@ void bust_spinlocks(int yes) } else { int loglevel_save = console_loglevel; console_unblank(); - oops_in_progress = 0; - /* -* OK, the message is on the console. Now we call printk() -* without oops_in_progress set so that printk will give klogd -* a poke. Hold onto your hats... -*/ - console_loglevel = 15; - printk(" "); console_loglevel = loglevel_save; + + oops_in_progress = 0; + wake_up_klogd(); } } -- 2.19.1
Re: [PATCH] scsi: 3w-{sas,9xxx}: Use unsigned char for cdb
Nathan, > Clang warns a few times: > > drivers/scsi/3w-sas.c:386:11: warning: implicit conversion from 'int' to > 'char' changes value from 128 to -128 [-Wconstant-conversion] > cdb[4] = TW_ALLOCATION_LENGTH; /* allocation length */ >~ ^~~~ > > Update cdb's type to unsigned char, which matches the type of the cdb > member in struct TW_Command_Apache. Applied to 4.20/scsi-queue. Thank you. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH V12 00/14] Krait clocks + Krait CPUfreq
Hi Niklas, On 10/22/2018 9:00 PM, Niklas Cassel wrote: > On Mon, Oct 22, 2018 at 09:39:03AM +0530, Sricharan R wrote: >> Hi Stephen, >> >> On 10/18/2018 1:46 AM, Stephen Boyd wrote: >>> Quoting Stephen Boyd (2018-10-17 08:44:12) Quoting Sricharan R (2018-09-20 06:03:31) > > > On 9/20/2018 1:54 AM, Craig wrote: >> Yup, this patch seems to have fixed the higher frequencies from the >> quick test I did. >> > Thanks !!. Can i take that as > Tested-by: Craig Tatlor ? > Is this patch series going to be resent? >>> >>> Nevermind. Looking at it I think I can apply all the clk ones and we're >>> good to go. If you can send a followup patch series to change the >>> registration and provider APIs to be clk_hw instead of clk based I would >>> appreciate it. >>> >> >> Sorry for the late response. Was away. >> Only pending thing was separating out the binding documentation for the >> cpu-freq >> driver and fixing the text in documentation. That means, yes its fine to >> merge >> the clk ones as you said. I will resend that. Also, will send a follow up >> series for clk_hw to >> clk change as you mentioned separately. > > Hello Sricharan, > > Great to see that the clk parts have been merged to clk-next! > > Are you also planning on sending out a new version of the cpufreq driver > consolidation parts? > yeah right, will send a new version sometime next week. > I'm planning on extending your consolidated cpufreq driver with support > for msm8916 (Cortex-A53), where I plan to read PVS/speedbin, in order to > set opp_supported_hw(), and also register with cpufreq (since Viresh/Ulf > suggested that we shouldn't register with cpufreq in the CPR power-domain > driver). ok sure. Regards, Sricharan -- "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v3 3/5] Creates macro to avoid variable shadowing
Creates DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADDR macro, allowing usage of a variable name other than the struct element name. Also, sets DEF_FIELD_ADDR as a specific usage of DEF_FIELD_ADDR_VAR in which the var name is the same as the struct element name. Then, makes use of DEF_FIELD_ADDR_VAR to create a variable of another name, in order to avoid variable shadowing. Signed-off-by: Leonardo Bras --- scripts/mod/file2alias.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c index 7be43697ff84..ed468313ddeb 100644 --- a/scripts/mod/file2alias.c +++ b/scripts/mod/file2alias.c @@ -95,12 +95,20 @@ extern struct devtable *__start___devtable[], *__stop___devtable[]; */ #define DEF_FIELD(m, devid, f) \ typeof(((struct devid *)0)->f) f = TO_NATIVE(*(typeof(f) *)((m) + OFF_##devid##_##f)) + +/* Define a variable v that holds the address of field f of struct devid + * based at address m. Due to the way typeof works, for a field of type + * T[N] the variable has type T(*)[N], _not_ T*. + */ +#define DEF_FIELD_ADDR_VAR(m, devid, f, v) \ + typeof(((struct devid *)0)->f) *v = ((m) + OFF_##devid##_##f) + /* Define a variable f that holds the address of field f of struct devid * based at address m. Due to the way typeof works, for a field of type * T[N] the variable has type T(*)[N], _not_ T*. */ #define DEF_FIELD_ADDR(m, devid, f) \ - typeof(((struct devid *)0)->f) *f = ((m) + OFF_##devid##_##f) + DEF_FIELD_ADDR_VAR(m, devid, f, f) /* Add a table entry. We test function type matches while we're here. 
*/ #define ADD_TO_DEVTABLE(device_id, type, function) \ @@ -644,7 +652,7 @@ static void do_pnp_card_entries(void *symval, unsigned long size, for (i = 0; i < count; i++) { unsigned int j; - DEF_FIELD_ADDR(symval + i*id_size, pnp_card_device_id, devs); + DEF_FIELD_ADDR(symval + i * id_size, pnp_card_device_id, devs); for (j = 0; j < PNP_MAX_DEVICES; j++) { const char *id = (char *)(*devs)[j].id; @@ -656,10 +664,13 @@ static void do_pnp_card_entries(void *symval, unsigned long size, /* find duplicate, already added value */ for (i2 = 0; i2 < i && !dup; i2++) { - DEF_FIELD_ADDR(symval + i2*id_size, pnp_card_device_id, devs); + DEF_FIELD_ADDR_VAR(symval + i2 * id_size, + pnp_card_device_id, + devs, devs_dup); for (j2 = 0; j2 < PNP_MAX_DEVICES; j2++) { - const char *id2 = (char *)(*devs)[j2].id; + const char *id2 = + (char *)(*devs_dup)[j2].id; if (!id2[0]) break; -- 2.19.1
[PATCH v3 5/5] Adds -Wshadow on KBUILD_HOSTCFLAGS
Adds -Wshadow on KBUILD_HOSTCFLAGS to show shadow warnings on tools built for HOST. Signed-off-by: Leonardo Bras --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e8b599b4dcde..3edae5d359b5 100644 --- a/Makefile +++ b/Makefile @@ -360,7 +360,7 @@ HOST_LFS_LIBS := $(shell getconf LFS_LIBS 2>/dev/null) HOSTCC = gcc HOSTCXX = g++ -KBUILD_HOSTCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 \ +KBUILD_HOSTCFLAGS := -Wall -Wshadow -Wmissing-prototypes -Wstrict-prototypes -O2 \ -fomit-frame-pointer -std=gnu89 $(HOST_LFS_CFLAGS) \ $(HOSTCFLAGS) KBUILD_HOSTCXXFLAGS := -O2 $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS) -- 2.19.1
[PATCH v3 0/5] Adds -Wshadow on KBUILD_HOSTCFLAGS and fix warnings
This patchset adds -Wshadow on KBUILD_HOSTCFLAGS and fixes all code that shows this warning. Changes in v3: - Better cover letter - Better commit message for patch 1/5. - Fixes what should change on patch 3/5 - Removes accent of my second name (better for searching at lkml.org) v2: https://lkml.org/lkml/2018/10/23/151 v1: https://lkml.org/lkml/2018/10/17/169 Leonardo Bras (5): x86/vdso: Renames variable to fix shadow warning. kbuild: Removes unnecessary shadowed local variable. Creates macro to avoid variable shadowing modpost: Changes parameter name to avoid shadowing. Adds -Wshadow on KBUILD_HOSTCFLAGS Makefile | 2 +- arch/x86/entry/vdso/vdso2c.h | 13 +++-- scripts/asn1_compiler.c | 2 +- scripts/mod/file2alias.c | 19 +++ scripts/mod/modpost.c| 4 ++-- 5 files changed, 26 insertions(+), 14 deletions(-) -- 2.19.1
[PATCH v3 1/5] x86/vdso: Renames variable to fix shadow warning.
The go32() and go64() functions have an argument and a local variable called ‘name’. Rename both to clarify the code and to fix a warning with -Wshadow. Signed-off-by: Leonardo Bras --- arch/x86/entry/vdso/vdso2c.h | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h index fa847a620f40..a20b134de2a8 100644 --- a/arch/x86/entry/vdso/vdso2c.h +++ b/arch/x86/entry/vdso/vdso2c.h @@ -7,7 +7,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, void *stripped_addr, size_t stripped_len, -FILE *outfile, const char *name) +FILE *outfile, const char *image_name) { int found_load = 0; unsigned long load_size = -1; /* Work around bogus warning */ @@ -93,11 +93,12 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, int k; ELF(Sym) *sym = raw_addr + GET_LE(_hdr->sh_offset) + GET_LE(_hdr->sh_entsize) * i; - const char *name = raw_addr + GET_LE(_hdr->sh_offset) + - GET_LE(>st_name); + const char *sym_name = raw_addr + + GET_LE(_hdr->sh_offset) + + GET_LE(>st_name); for (k = 0; k < NSYMS; k++) { - if (!strcmp(name, required_syms[k].name)) { + if (!strcmp(sym_name, required_syms[k].name)) { if (syms[k]) { fail("duplicate symbol %s\n", required_syms[k].name); @@ -134,7 +135,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, if (syms[sym_vvar_start] % 4096) fail("vvar_begin must be a multiple of 4096\n"); - if (!name) { + if (!image_name) { fwrite(stripped_addr, stripped_len, 1, outfile); return; } @@ -157,7 +158,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, } fprintf(outfile, "\n};\n\n"); - fprintf(outfile, "const struct vdso_image %s = {\n", name); + fprintf(outfile, "const struct vdso_image %s = {\n", image_name); fprintf(outfile, "\t.data = raw_data,\n"); fprintf(outfile, "\t.size = %lu,\n", mapping_size); if (alt_sec) { -- 2.19.1
[PATCH v3 4/5] modpost: Changes parameter name to avoid shadowing.
Changes the parameter name to avoid shadowing a variable. Signed-off-by: Leonardo Bras --- scripts/mod/modpost.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 0d998c54564d..368fe42340df 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -2228,13 +2228,13 @@ static int add_versions(struct buffer *b, struct module *mod) } static void add_depends(struct buffer *b, struct module *mod, - struct module *modules) + struct module *module_list) { struct symbol *s; struct module *m; int first = 1; - for (m = modules; m; m = m->next) + for (m = module_list; m; m = m->next) m->seen = is_vmlinux(m->name); buf_printf(b, "\n"); -- 2.19.1
[PATCH v3 2/5] kbuild: Removes unnecessary shadowed local variable.
Removes an unnecessary shadowed local variable (start). It was used only once, with the same value it was initialized with before the if block. Signed-off-by: Leonardo Bras --- scripts/asn1_compiler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c index c146020fc783..1b28787028d3 100644 --- a/scripts/asn1_compiler.c +++ b/scripts/asn1_compiler.c @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end) /* Handle string tokens */ if (isalpha(*p)) { - const char **dir, *start = p; + const char **dir; /* Can be a directive, type name or element * name. Find the end of the name. -- 2.19.1
[PATCH v3 1/5] x86/vdso: Renames variable to fix shadow warning.
The go32() and go64() functions have an argument and a local variable called ‘name’. Rename both to clarify the code and to fix a warning with -Wshadow. Signed-off-by: Leonardo Bras --- arch/x86/entry/vdso/vdso2c.h | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h index fa847a620f40..a20b134de2a8 100644 --- a/arch/x86/entry/vdso/vdso2c.h +++ b/arch/x86/entry/vdso/vdso2c.h @@ -7,7 +7,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, void *stripped_addr, size_t stripped_len, -FILE *outfile, const char *name) +FILE *outfile, const char *image_name) { int found_load = 0; unsigned long load_size = -1; /* Work around bogus warning */ @@ -93,11 +93,12 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, int k; ELF(Sym) *sym = raw_addr + GET_LE(_hdr->sh_offset) + GET_LE(_hdr->sh_entsize) * i; - const char *name = raw_addr + GET_LE(_hdr->sh_offset) + - GET_LE(>st_name); + const char *sym_name = raw_addr + + GET_LE(_hdr->sh_offset) + + GET_LE(>st_name); for (k = 0; k < NSYMS; k++) { - if (!strcmp(name, required_syms[k].name)) { + if (!strcmp(sym_name, required_syms[k].name)) { if (syms[k]) { fail("duplicate symbol %s\n", required_syms[k].name); @@ -134,7 +135,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, if (syms[sym_vvar_start] % 4096) fail("vvar_begin must be a multiple of 4096\n"); - if (!name) { + if (!image_name) { fwrite(stripped_addr, stripped_len, 1, outfile); return; } @@ -157,7 +158,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len, } fprintf(outfile, "\n};\n\n"); - fprintf(outfile, "const struct vdso_image %s = {\n", name); + fprintf(outfile, "const struct vdso_image %s = {\n", image_name); fprintf(outfile, "\t.data = raw_data,\n"); fprintf(outfile, "\t.size = %lu,\n", mapping_size); if (alt_sec) { -- 2.19.1
[PATCH v3 4/5] modpost: Changes parameter name to avoid shadowing.
Changes the parameter name to avoid shadowing a variable. Signed-off-by: Leonardo Bras --- scripts/mod/modpost.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 0d998c54564d..368fe42340df 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -2228,13 +2228,13 @@ static int add_versions(struct buffer *b, struct module *mod) } static void add_depends(struct buffer *b, struct module *mod, - struct module *modules) + struct module *module_list) { struct symbol *s; struct module *m; int first = 1; - for (m = modules; m; m = m->next) + for (m = module_list; m; m = m->next) m->seen = is_vmlinux(m->name); buf_printf(b, "\n"); -- 2.19.1
[PATCH v3 2/5] kbuild: Removes unnecessary shadowed local variable.
Removes an unnecessary shadowed local variable (start). It was used only once, and at that point it still held the value it was initialized with before the if block. Signed-off-by: Leonardo Bras --- scripts/asn1_compiler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c index c146020fc783..1b28787028d3 100644 --- a/scripts/asn1_compiler.c +++ b/scripts/asn1_compiler.c @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end) /* Handle string tokens */ if (isalpha(*p)) { - const char **dir, *start = p; + const char **dir; /* Can be a directive, type name or element * name. Find the end of the name. -- 2.19.1
[PATCH] rpmsg: virtio_rpmsg_bus: replace "%p" with "%pK"
The virtio_rpmsg_bus driver uses the "%p" format specifier for printing the vring buffer address. This prints only a hashed pointer even for privileged users. Use "%pK" instead so that the address can be printed during debug using the kptr_restrict sysctl. Signed-off-by: Suman Anna --- drivers/rpmsg/virtio_rpmsg_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c index f29dee731026..1345f373a1a0 100644 --- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -950,7 +950,7 @@ static int rpmsg_probe(struct virtio_device *vdev) goto vqs_del; } - dev_dbg(&vdev->dev, "buffers: va %p, dma %pad\n", + dev_dbg(&vdev->dev, "buffers: va %pK, dma %pad\n", bufs_va, &vrp->bufs_dma); /* half of the buffers is dedicated for RX */ -- 2.19.1
Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with specific dma memory pool
On 10/23/18 8:22 PM, Suman Anna wrote: > On 9/27/18 3:18 PM, Wendy Liang wrote: >> Hi Loic, >> >> >> On Thu, Sep 27, 2018 at 12:22 PM Loic PALLARDY wrote: >>> >>> Hi Wendy >>> -Original Message- From: Wendy Liang Sent: Thursday, September 27, 2018 7:17 PM To: Loic PALLARDY Cc: Bjorn Andersson ; Ohad Ben-Cohen ; linux-remotep...@vger.kernel.org; Linux Kernel Mailing List ; Arnaud POULIQUEN ; benjamin.gaign...@linaro.org; Suman Anna Subject: Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with specific dma memory pool On Fri, Jul 27, 2018 at 6:16 AM Loic Pallardy wrote: > > This patch creates a dedicated vdev subdevice for each vdev declared > in firmware resource table and associates carveout named "vdev%dbuffer" > (with %d vdev index in resource table) if any as dma coherent memory pool. > > Then vdev subdevice is used as parent for virtio device. > > Signed-off-by: Loic Pallardy > --- > drivers/remoteproc/remoteproc_core.c | 35 +++--- > drivers/remoteproc/remoteproc_internal.h | 1 + > drivers/remoteproc/remoteproc_virtio.c | 42 +++- > include/linux/remoteproc.h | 1 + > 4 files changed, 75 insertions(+), 4 deletions(-) > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > index 4edc6f0..adcc66e 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -39,6 +39,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -145,7 +146,7 @@ static void rproc_disable_iommu(struct rproc *rproc) > iommu_domain_free(domain); > } > > -static phys_addr_t rproc_va_to_pa(void *cpu_addr) > +phys_addr_t rproc_va_to_pa(void *cpu_addr) > { > /* > * Return physical address according to virtual address location > @@ -160,6 +161,7 @@ static phys_addr_t rproc_va_to_pa(void *cpu_addr) > WARN_ON(!virt_addr_valid(cpu_addr)); > return virt_to_phys(cpu_addr); > } > +EXPORT_SYMBOL(rproc_va_to_pa); > > /** > * rproc_da_to_va() - lookup the kernel virtual address for a 
remoteproc address > @@ -423,6 +425,20 @@ static void rproc_vdev_do_stop(struct rproc_subdev *subdev, bool crashed) > } > > /** > + * rproc_rvdev_release() - release the existence of a rvdev > + * > + * @dev: the subdevice's dev > + */ > +static void rproc_rvdev_release(struct device *dev) > +{ > + struct rproc_vdev *rvdev = container_of(dev, struct rproc_vdev, > dev); > + > + of_reserved_mem_device_release(dev); > + > + kfree(rvdev); > +} > + > +/** > * rproc_handle_vdev() - handle a vdev fw resource > * @rproc: the remote processor > * @rsc: the vring resource descriptor > @@ -455,6 +471,7 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc, > struct device *dev = >dev; > struct rproc_vdev *rvdev; > int i, ret; > + char name[16]; > > /* make sure resource isn't truncated */ > if (sizeof(*rsc) + rsc->num_of_vrings * sizeof(struct fw_rsc_vdev_vring) > @@ -488,6 +505,18 @@ static int rproc_handle_vdev(struct rproc *rproc, struct fw_rsc_vdev *rsc, > rvdev->rproc = rproc; > rvdev->index = rproc->nb_vdev++; > > + /* Initialise vdev subdevice */ > + snprintf(name, sizeof(name), "vdev%dbuffer", rvdev->index); > + rvdev->dev.parent = rproc->dev.parent; > + rvdev->dev.release = rproc_rvdev_release; > + dev_set_name(>dev, "%s#%s", dev_name(rvdev- > dev.parent), name); > + dev_set_drvdata(>dev, rvdev); > + dma_set_coherent_mask(>dev, DMA_BIT_MASK(32)); I tried the latest kernel, this function will not set the DMA coherent mask as dma_supported() of the >dev will return false. As this is a device created at run time, should it be force to support DMA? should it directly set the dma_coherent_mask? >>> >>> Thanks for pointing me this issue. I tested on top of 4.18-rc1 few months >>> ago... >>> Could you please give me kernel version on which you are testing the series? >>> Is you platform 32bit or 64bit ? >>> I'll rebase and check on my side. >> >> I am testing with 4.19-rc4 on aarch64 platform. 
> > Btw, I ran into this on my v7 platform as well (4.19-rc6). The > dma_set_coherent_mask fails with error EIO. I did get my allocations > through though. Correction, that was before Patch 17. With patch 17, this fails. regards Suman > > regards > Suman > >> >> Best Regards, >> Wendy
Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with specific dma memory pool
On 10/10/18 2:17 PM, Loic PALLARDY wrote: > > >> -Original Message- >> From: Bjorn Andersson [mailto:bjorn.anders...@linaro.org] >> Sent: mercredi 10 octobre 2018 07:58 >> To: Loic PALLARDY >> Cc: o...@wizery.com; linux-remotep...@vger.kernel.org; linux- >> ker...@vger.kernel.org; Arnaud POULIQUEN ; >> benjamin.gaign...@linaro.org; s-a...@ti.com >> Subject: Re: [PATCH v4 13/17] remoteproc: create vdev subdevice with >> specific dma memory pool >> >> On Fri 27 Jul 06:14 PDT 2018, Loic Pallardy wrote: >> >>> This patch creates a dedicated vdev subdevice for each vdev declared >>> in firmware resource table and associates carveout named "vdev%dbuffer" >>> (with %d vdev index in resource table) if any as dma coherent memory >> pool. >>> >>> Then vdev subdevice is used as parent for virtio device. >>> >>> Signed-off-by: Loic Pallardy >>> --- >>> drivers/remoteproc/remoteproc_core.c | 35 >> +++--- >>> drivers/remoteproc/remoteproc_internal.h | 1 + >>> drivers/remoteproc/remoteproc_virtio.c | 42 >> +++- >>> include/linux/remoteproc.h | 1 + >>> 4 files changed, 75 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/remoteproc/remoteproc_core.c >> b/drivers/remoteproc/remoteproc_core.c >>> index 4edc6f0..adcc66e 100644 >>> --- a/drivers/remoteproc/remoteproc_core.c >>> +++ b/drivers/remoteproc/remoteproc_core.c >>> @@ -39,6 +39,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> #include >>> #include >>> @@ -145,7 +146,7 @@ static void rproc_disable_iommu(struct rproc >> *rproc) >>> iommu_domain_free(domain); >>> } >>> >>> -static phys_addr_t rproc_va_to_pa(void *cpu_addr) >>> +phys_addr_t rproc_va_to_pa(void *cpu_addr) >>> { >>> /* >>> * Return physical address according to virtual address location >>> @@ -160,6 +161,7 @@ static phys_addr_t rproc_va_to_pa(void >> *cpu_addr) >>> WARN_ON(!virt_addr_valid(cpu_addr)); >>> return virt_to_phys(cpu_addr); >>> } >>> +EXPORT_SYMBOL(rproc_va_to_pa); >>> >>> /** >>> * rproc_da_to_va() - lookup the kernel 
virtual address for a remoteproc >> address >>> @@ -423,6 +425,20 @@ static void rproc_vdev_do_stop(struct >> rproc_subdev *subdev, bool crashed) >>> } >>> >>> /** >>> + * rproc_rvdev_release() - release the existence of a rvdev >>> + * >>> + * @dev: the subdevice's dev >>> + */ >>> +static void rproc_rvdev_release(struct device *dev) >>> +{ >>> + struct rproc_vdev *rvdev = container_of(dev, struct rproc_vdev, >> dev); >>> + >>> + of_reserved_mem_device_release(dev); >>> + >>> + kfree(rvdev); >>> +} >>> + >>> +/** >>> * rproc_handle_vdev() - handle a vdev fw resource >>> * @rproc: the remote processor >>> * @rsc: the vring resource descriptor >>> @@ -455,6 +471,7 @@ static int rproc_handle_vdev(struct rproc *rproc, >> struct fw_rsc_vdev *rsc, >>> struct device *dev = >dev; >>> struct rproc_vdev *rvdev; >>> int i, ret; >>> + char name[16]; >>> >>> /* make sure resource isn't truncated */ >>> if (sizeof(*rsc) + rsc->num_of_vrings * sizeof(struct >> fw_rsc_vdev_vring) >>> @@ -488,6 +505,18 @@ static int rproc_handle_vdev(struct rproc *rproc, >> struct fw_rsc_vdev *rsc, >>> rvdev->rproc = rproc; >>> rvdev->index = rproc->nb_vdev++; >>> >>> + /* Initialise vdev subdevice */ >>> + snprintf(name, sizeof(name), "vdev%dbuffer", rvdev->index); >>> + rvdev->dev.parent = rproc->dev.parent; >>> + rvdev->dev.release = rproc_rvdev_release; >>> + dev_set_name(>dev, "%s#%s", dev_name(rvdev- >>> dev.parent), name); >>> + dev_set_drvdata(>dev, rvdev); >>> + dma_set_coherent_mask(>dev, DMA_BIT_MASK(32)); >>> + >>> + ret = device_register(>dev); >>> + if (ret) >>> + goto free_rvdev; >>> + >>> /* parse the vrings */ >>> for (i = 0; i < rsc->num_of_vrings; i++) { >>> ret = rproc_parse_vring(rvdev, rsc, i); >>> @@ -518,7 +547,7 @@ static int rproc_handle_vdev(struct rproc *rproc, >> struct fw_rsc_vdev *rsc, >>> for (i--; i >= 0; i--) >>> rproc_free_vring(>vring[i]); >>> free_rvdev: >>> - kfree(rvdev); >>> + device_unregister(>dev); >>> return ret; >>> } >>> >>> @@ -536,7 +565,7 @@ void 
rproc_vdev_release(struct kref *ref) >>> >>> rproc_remove_subdev(rproc, >subdev); >>> list_del(>node); >>> - kfree(rvdev); >>> + device_unregister(>dev); >>> } >>> >>> /** >>> diff --git a/drivers/remoteproc/remoteproc_internal.h >> b/drivers/remoteproc/remoteproc_internal.h >>> index f6cad24..bfeacfd 100644 >>> --- a/drivers/remoteproc/remoteproc_internal.h >>> +++ b/drivers/remoteproc/remoteproc_internal.h >>> @@ -52,6 +52,7 @@ struct dentry *rproc_create_trace_file(const char >> *name, struct rproc *rproc, >>> int rproc_alloc_vring(struct rproc_vdev *rvdev, int i); >>> >>> void *rproc_da_to_va(struct rproc *rproc, u64 da, int len); >>> +phys_addr_t rproc_va_to_pa(void *cpu_addr); >>> int rproc_trigger_recovery(struct rproc
Re: [v4] i2c: Add PCI and platform drivers for the AMD MP2 I2C controller
Hi Elie, I love your patch! Perhaps something to improve: [auto build test WARNING on wsa/i2c/for-next] [also build test WARNING on v4.19 next-20181019] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Elie-Morisse/i2c-Add-PCI-and-platform-drivers-for-the-AMD-MP2-I2C-controller/20181024-013625 base: https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-next config: i386-allyesconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): drivers/i2c/busses/i2c-amd-pci-mp2.c: In function 'amd_mp2_debugfs_read': >> drivers/i2c/busses/i2c-amd-pci-mp2.c:298:66: warning: comparison of distinct >> pointer types lacks a cast buf_size = min(count, 0x800ul); ^ vim +298 drivers/i2c/busses/i2c-amd-pci-mp2.c 285 286 static ssize_t amd_mp2_debugfs_read(struct file *filp, char __user *ubuf, 287 size_t count, loff_t *offp) 288 { 289 struct amd_mp2_dev *privdata; 290 void __iomem *mmio; 291 u8 *buf; 292 size_t buf_size; 293 ssize_t ret, off; 294 u32 v32; 295 296 privdata = filp->private_data; 297 mmio = privdata->mmio; > 298 buf_size = min(count, 0x800ul); 299 buf = kmalloc(buf_size, GFP_KERNEL); 300 301 if (!buf) 302 return -ENOMEM; 303 304 off = 0; 305 off += scnprintf(buf + off, buf_size - off, 306 "Mp2 Device Information:\n"); 307 308 off += scnprintf(buf + off, buf_size - off, 309 "\n"); 310 off += scnprintf(buf + off, buf_size - off, 311 "\tMP2 C2P Message Register Dump:\n\n"); 312 v32 = readl(privdata->mmio + AMD_C2P_MSG0); 313 off += scnprintf(buf + off, buf_size - off, 314 "AMD_C2P_MSG0 -\t\t\t%#06x\n", v32); 315 316 v32 = readl(privdata->mmio + AMD_C2P_MSG1); 317 off += scnprintf(buf + off, buf_size - off, 318 "AMD_C2P_MSG1 -\t\t\t%#06x\n", v32); 319 320 v32 = readl(privdata->mmio + AMD_C2P_MSG2); 321 off += scnprintf(buf + off, buf_size - off, 
322 "AMD_C2P_MSG2 -\t\t\t%#06x\n", v32); 323 324 v32 = readl(privdata->mmio + AMD_C2P_MSG3); 325 off += scnprintf(buf + off, buf_size - off, 326 "AMD_C2P_MSG3 -\t\t\t%#06x\n", v32); 327 328 v32 = readl(privdata->mmio + AMD_C2P_MSG4); 329 off += scnprintf(buf + off, buf_size - off, 330 "AMD_C2P_MSG4 -\t\t\t%#06x\n", v32); 331 332 v32 = readl(privdata->mmio + AMD_C2P_MSG5); 333 off += scnprintf(buf + off, buf_size - off, 334 "AMD_C2P_MSG5 -\t\t\t%#06x\n", v32); 335 336 v32 = readl(privdata->mmio + AMD_C2P_MSG6); 337 off += scnprintf(buf + off, buf_size - off, 338 "AMD_C2P_MSG6 -\t\t\t%#06x\n", v32); 339 340 v32 = readl(privdata->mmio + AMD_C2P_MSG7); 341 off += scnprintf(buf + off, buf_size - off, 342 "AMD_C2P_MSG7 -\t\t\t%#06x\n", v32); 343 344 v32 = readl(privdata->mmio + AMD_C2P_MSG8); 345 off += scnprintf(buf + off, buf_size - off, 346 "AMD_C2P_MSG8 -\t\t\t%#06x\n", v32); 347 348 v32 = readl(privdata->mmio + AMD_C2P_MSG9); 349 off += scnprintf(buf + off, buf_size - off, 350 "AMD_C2P_MSG9 -\t\t\t%#06x\n", v32); 351 352 off += scnprintf(buf + off, buf_size - off, 353 "\n\tMP2 P2C Message Register Dump:\n\n"); 354 355 v32 = readl(privdata->mmio + AMD_P2C_MSG1); 356 off += scnprintf(buf + off, buf_size - off, 357 "AMD_P2C_MSG1 -\t\t\t%#06x\n", v32); 358 359 v32 = readl(privdata->mmio + AMD_P2C_MSG2); 360 off += scnprintf(buf + off, buf_size - off, 361 "AMD_P2C_MSG2 -\t\t\t%#06x\n", v32); 362 363 v32 = readl(privdata->mmio + AMD_P2C_MSG_INTEN); 364 off += scnprintf(buf + off, buf_size - off, 365 "AMD_P2C_MSG_INTEN -\t\t%#06x\n", v32); 366 367 v32 = readl(privdata->mmio + AMD_P2C_MSG_INTSTS); 368 off += scnprintf(buf + off, buf_size - off, 369
Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device address requested
On 10/23/18 2:40 PM, Loic PALLARDY wrote: > Hi Suman, > >> -Original Message- >> From: Suman Anna >> Sent: mardi 23 octobre 2018 19:26 >> To: Loic PALLARDY ; bjorn.anders...@linaro.org; >> o...@wizery.com >> Cc: linux-remotep...@vger.kernel.org; linux-kernel@vger.kernel.org; >> Arnaud POULIQUEN ; >> benjamin.gaign...@linaro.org >> Subject: Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device >> address requested >> >> Hi Loic, >> >> On 7/27/18 8:14 AM, Loic Pallardy wrote: >>> If there is no IOMMU associate to remote processor device, >>> remoteproc_core won't be able to satisfy device address requested >>> in firmware resource table. >>> Return an error as configuration won't be coherent. >>> >>> Signed-off-by: Loic Pallardy >> >> This patch is breaking my Davinci platforms. It is not really required >> that you _should_ have IOMMUs when a valid DA is mentioned. Please see >> the existing description (paras 4 and 5) on the fw_rsc_carveout >> kerneldoc in remoteproc.h file. > > Thanks for pointing this comment. Indeed sMMU is not mandatory, and at first > sight I agree we should remove the restriction introduced by the patch. > Driver porting on the series should be done before adding this. >> >> We do have platforms where we have some internal sub-modules within the >> remote processor sub-system that provides some linear >> address-translation (most common case with 32-bit processors supporting >> 64-bit addresses). Also, we have some upcoming SoCs where we have an >> MMU >> but is not programmable by Linux. >> >> There is one comment there, but I don't think this is actually handled >> in the current remoteproc core. >> "If @da is set to >> * FW_RSC_ADDR_ANY, then the host will dynamically allocate it, and then >> * overwrite @da with the dynamically allocated address." >> > I don't remember it was implemented like described. Yes, it was missing, and one of your patches seem to add this behavior now. 
Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device address requested
On 10/23/18 2:40 PM, Loic PALLARDY wrote:
> Hi Suman,
>
>> -----Original Message-----
>> From: Suman Anna
>> Sent: Tuesday, October 23, 2018 19:26
>> To: Loic PALLARDY ; bjorn.anders...@linaro.org; o...@wizery.com
>> Cc: linux-remotep...@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud POULIQUEN ; benjamin.gaign...@linaro.org
>> Subject: Re: [PATCH v4 01/17] remoteproc: configure IOMMU only if device address requested
>>
>> Hi Loic,
>>
>> On 7/27/18 8:14 AM, Loic Pallardy wrote:
>>> If there is no IOMMU associated to the remote processor device,
>>> remoteproc_core won't be able to satisfy a device address requested
>>> in the firmware resource table.
>>> Return an error as the configuration won't be coherent.
>>>
>>> Signed-off-by: Loic Pallardy
>>
>> This patch is breaking my Davinci platforms. It is not really required
>> that you _should_ have IOMMUs when a valid DA is mentioned. Please see
>> the existing description (paras 4 and 5) on the fw_rsc_carveout
>> kerneldoc in the remoteproc.h file.
>
> Thanks for pointing out this comment. Indeed the sMMU is not mandatory,
> and at first sight I agree we should remove the restriction introduced
> by the patch. Driver porting on the series should be done before adding
> this.
>
>> We do have platforms where we have some internal sub-modules within the
>> remote processor sub-system that provide some linear address
>> translation (the most common case being 32-bit processors supporting
>> 64-bit addresses). Also, we have some upcoming SoCs where we have an
>> MMU, but it is not programmable by Linux.
>>
>> There is one comment there, but I don't think this is actually handled
>> in the current remoteproc core.
>> "If @da is set to
>> * FW_RSC_ADDR_ANY, then the host will dynamically allocate it, and then
>> * overwrite @da with the dynamically allocated address."
>
> I don't remember it being implemented as described.

Yes, it was missing, and one of your patches seems to add this behavior now.
That said, I really don't think the remoteproc core can dictate the da.
Even if the individual remoteproc driver were to furnish this, how would
you get such data without forcing a fixed behavior for all possible
firmwares (not desirable)? We should get rid of this comment, and any
code that seems to do this.

> I have remarks about the comment:
> "* We will always use @da to negotiate the device addresses, even if it
> * isn't using an iommu. In that case, though, it will obviously contain
> * physical addresses."
>
> When there is no sMMU, we can't consider that da contains a physical
> address, because the coprocessor can have its own memory map simply
> because it is a 32-bit processor accessing only a part of the memory
> while the main one is 64-bit. The two processors won't see the internal
> memory at the same base address, for example.

Agreed, I believe it was valid when it was written (32-bit platforms
supporting 32-bit addresses). I think this is akin to an IPA
(Intermediate Physical Address).

> So what should we do when a carveout allocated by the host does not fit
> the resource table request?
> - put a warning and overwrite the da address in the resource table?

Hmm, why da? This goes back to my earlier comment about how you are able
to decide the da. At least your current ST driver seems to be assigning
the same value as the physical bus address for da, which prompts the
question why you would still need a carveout entry in the resource table
if it is truly one-to-one.

E.g., I have an upcoming usecase with R5Fs on newer TI SoCs where we
actually have a sub-module called Region Address Translator (RAT) which
can only be programmed by the R5F for translating the 32-bit CPU
addresses to a larger physical address space, and yet I need the da and
pa to be able to do loading. I cannot dictate the da since that is what
the firmware images are linked against. So, I have to rely on the
firmware providing this data for me.

> - stop rproc probe as no match detected?

I think that is the safest approach.
> Later in the series, carveout allocation is changed. Resource table
> carveouts are either linked with an existing carveout registered by the
> driver or added to the carveout list for allocations.
> In the case you described, the TI driver should first register the
> specific carveout regions thanks to the helper.

The current series should still continue to work without having to
enforce new name assignments (unless needed and being defined to use the
new features being added).

> Regards,
> Loic
>
>> regards
>> Suman
>>
>>> ---
>>>  drivers/remoteproc/remoteproc_core.c | 10 +-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c
>> b/drivers/remoteproc/remoteproc_core.c
>>> index 4cd1a8e..437fabf 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -657,7 +657,15 @@ static int rproc_handle_carveout(struct rproc
>> *rproc,
>>>  * to use the iommu-based DMA API: we expect 'dma' to
[PATCH 1/1] nds32: Power management for nds32
There are three sleep states in nds32: suspend to idle, suspend to standby, and suspend to RAM. In suspend to RAM, we use the 'standby' instruction to emulate a power management device and hang the system until a wakeup source sends wakeup events to break the loop. First, we push the general purpose registers and system registers onto the stack. Second, we translate the stack pointer to a physical address and store it to memory to save the stack pointer. Third, after writing back and invalidating the cache, we hang in the 'standby' instruction. When a wakeup source triggers wakeup events, the loop breaks and the system resumes. Signed-off-by: Nickhu --- arch/nds32/Kconfig | 10 +++ arch/nds32/include/asm/suspend.h | 11 +++ arch/nds32/kernel/Makefile | 2 +- arch/nds32/kernel/pm.c | 91 ++ arch/nds32/kernel/sleep.S| 129 +++ drivers/irqchip/irq-ativic32.c | 29 +++ 6 files changed, 271 insertions(+), 1 deletion(-) create mode 100644 arch/nds32/include/asm/suspend.h create mode 100644 arch/nds32/kernel/pm.c create mode 100644 arch/nds32/kernel/sleep.S diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig index dd448d431f5a..8e2c5ac6acd1 100644 --- a/arch/nds32/Kconfig +++ b/arch/nds32/Kconfig @@ -95,3 +95,13 @@ endmenu menu "Kernel Features" source "kernel/Kconfig.hz" endmenu + +menu "Power management options" +config SYS_SUPPORTS_APM_EMULATION + bool + +config ARCH_SUSPEND_POSSIBLE + def_bool y + +source "kernel/power/Kconfig" +endmenu diff --git a/arch/nds32/include/asm/suspend.h b/arch/nds32/include/asm/suspend.h new file mode 100644 index ..6ed2418af1ac --- /dev/null +++ b/arch/nds32/include/asm/suspend.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +// Copyright (C) 2008-2017 Andes Technology Corporation + +#ifndef __ASM_NDS32_SUSPEND_H +#define __ASM_NDS32_SUSPEND_H + +extern void suspend2ram(void); +extern void cpu_resume(void); +extern unsigned long wake_mask; + +#endif diff --git a/arch/nds32/kernel/Makefile b/arch/nds32/kernel/Makefile index f52bd2744f50..8d62f2ecb1ab 100644 ---
a/arch/nds32/kernel/Makefile +++ b/arch/nds32/kernel/Makefile @@ -16,7 +16,7 @@ obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_OF) += devtree.o obj-$(CONFIG_CACHE_L2) += atl2c.o obj-$(CONFIG_PERF_EVENTS) += perf_event_cpu.o - +obj-$(CONFIG_PM) += pm.o sleep.o extra-y := head.o vmlinux.lds obj-y += vdso/ diff --git a/arch/nds32/kernel/pm.c b/arch/nds32/kernel/pm.c new file mode 100644 index ..e1eaf3bac709 --- /dev/null +++ b/arch/nds32/kernel/pm.c @@ -0,0 +1,91 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2008-2017 Andes Technology Corporation + +/* + * nds32 Power Management Routines + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License. + * + * Abstract: + * + *This program is for nds32 power management routines. + * + */ + +#include +#include +#include +#include +#include +#include +#include + +unsigned int resume_addr; +unsigned int *phy_addr_sp_tmp; + +static void nds32_suspend2ram(void) +{ + pgd_t *pgdv; + pud_t *pudv; + pmd_t *pmdv; + pte_t *ptev; + + pgdv = (pgd_t *)__va((__nds32__mfsr(NDS32_SR_L1_PPTB) & + L1_PPTB_mskBASE)) + pgd_index((unsigned int)cpu_resume); + + pudv = pud_offset(pgdv, (unsigned int)cpu_resume); + pmdv = pmd_offset(pudv, (unsigned int)cpu_resume); + ptev = pte_offset_map(pmdv, (unsigned int)cpu_resume); + + resume_addr = ((*ptev) & TLB_DATA_mskPPN) + | ((unsigned int)cpu_resume & 0x0fff); + + suspend2ram(); +} + +static void nds32_suspend_cpu(void) +{ + while (!(__nds32__mfsr(NDS32_SR_INT_PEND) & wake_mask)) + __asm__ volatile ("standby no_wake_grant\n\t"); +} + +static int nds32_pm_valid(suspend_state_t state) +{ + switch (state) { + case PM_SUSPEND_ON: + case PM_SUSPEND_STANDBY: + case PM_SUSPEND_MEM: + return 1; + default: + return 0; + } +} + +static int nds32_pm_enter(suspend_state_t state) +{ + pr_debug("%s:state:%d\n", __func__, state); + switch (state) { + case PM_SUSPEND_STANDBY: + nds32_suspend_cpu(); + return 0; + case 
PM_SUSPEND_MEM: + nds32_suspend2ram(); + return 0; + default: + return -EINVAL; + } +} + +static const struct platform_suspend_ops nds32_pm_ops = { + .valid = nds32_pm_valid, + .enter = nds32_pm_enter, +}; + +static int __init nds32_pm_init(void) +{ + pr_debug("Enter %s\n", __func__); + suspend_set_ops(_pm_ops); + return 0; +} +late_initcall(nds32_pm_init); diff --git a/arch/nds32/kernel/sleep.S b/arch/nds32/kernel/sleep.S new
[PATCH 0/1] nds32: Power management
This patch adds power management support for nds32. Nickhu (1): nds32: Power management for nds32 arch/nds32/Kconfig | 10 +++ arch/nds32/include/asm/suspend.h | 11 +++ arch/nds32/kernel/Makefile | 2 +- arch/nds32/kernel/pm.c | 91 ++ arch/nds32/kernel/sleep.S| 129 +++ drivers/irqchip/irq-ativic32.c | 29 +++ 6 files changed, 271 insertions(+), 1 deletion(-) create mode 100644 arch/nds32/include/asm/suspend.h create mode 100644 arch/nds32/kernel/pm.c create mode 100644 arch/nds32/kernel/sleep.S -- 2.17.0
[PATCH 14/25] afs: Commit the status on a new file/dir/symlink [ver #2]
Call the function to commit the status on a new file, dir or symlink so that the access rights for the caller's key are cached for that object. Without this, the next access to the file will cause a FetchStatus operation to be emitted to retrieve the access rights. Signed-off-by: David Howells --- fs/afs/dir.c |1 + 1 file changed, 1 insertion(+) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 024b7cf7441c..8936731c59ff 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1089,6 +1089,7 @@ static void afs_vnode_new_inode(struct afs_fs_cursor *fc, vnode = AFS_FS_I(inode); set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + afs_vnode_commit_status(fc, vnode, 0); d_add(new_dentry, inode); }
[PATCH 24/25] afs: Fix callback handling [ver #2]
In some circumstances, the callback interest pointer is NULL, so in such a case we can't dereference it when checking to see if the callback is broken. This causes an oops in some circumstances. Fix this by replacing the function that worked out the aggregate break counter with one that actually does the comparison, and then make that return true (ie. broken) if there is no callback interest as yet (ie. the pointer is NULL). Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling") Signed-off-by: David Howells --- fs/afs/fsclient.c |2 +- fs/afs/internal.h |9 ++--- fs/afs/security.c |7 --- fs/afs/yfsclient.c |2 +- 4 files changed, 12 insertions(+), 8 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 3975969719de..7c75a1813321 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -269,7 +269,7 @@ static void xdr_decode_AFSCallBack(struct afs_call *call, write_seqlock(>cb_lock); - if (call->cb_break == afs_cb_break_sum(vnode, cbi)) { + if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) { vnode->cb_version = ntohl(*bp++); cb_expiry = ntohl(*bp++); vnode->cb_type = ntohl(*bp++); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index e5b596bd8acf..b60d15212975 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -776,10 +776,13 @@ static inline unsigned int afs_calc_vnode_cb_break(struct afs_vnode *vnode) return vnode->cb_break + vnode->cb_s_break + vnode->cb_v_break; } -static inline unsigned int afs_cb_break_sum(struct afs_vnode *vnode, - struct afs_cb_interest *cbi) +static inline bool afs_cb_is_broken(unsigned int cb_break, + const struct afs_vnode *vnode, + const struct afs_cb_interest *cbi) { - return vnode->cb_break + cbi->server->cb_s_break + vnode->volume->cb_v_break; + return !cbi || cb_break != (vnode->cb_break + + cbi->server->cb_s_break + + vnode->volume->cb_v_break); } /* diff --git a/fs/afs/security.c b/fs/afs/security.c index d1ae53fd3739..5f58a9a17e69 100644 --- a/fs/afs/security.c +++ b/fs/afs/security.c @@ -147,7 
+147,8 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key, break; } - if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest)) { + if (afs_cb_is_broken(cb_break, vnode, +vnode->cb_interest)) { changed = true; break; } @@ -177,7 +178,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key, } } - if (cb_break != afs_cb_break_sum(vnode, vnode->cb_interest)) + if (afs_cb_is_broken(cb_break, vnode, vnode->cb_interest)) goto someone_else_changed_it; /* We need a ref on any permits list we want to copy as we'll have to @@ -256,7 +257,7 @@ void afs_cache_permit(struct afs_vnode *vnode, struct key *key, spin_lock(>lock); zap = rcu_access_pointer(vnode->permit_cache); - if (cb_break == afs_cb_break_sum(vnode, vnode->cb_interest) && + if (!afs_cb_is_broken(cb_break, vnode, vnode->cb_interest) && zap == permits) rcu_assign_pointer(vnode->permit_cache, replacement); else diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index d5e3f0095040..12658c1363ae 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -324,7 +324,7 @@ static void xdr_decode_YFSCallBack(struct afs_call *call, write_seqlock(>cb_lock); - if (call->cb_break == afs_cb_break_sum(vnode, cbi)) { + if (!afs_cb_is_broken(call->cb_break, vnode, cbi)) { cb_expiry = xdr_to_u64(xdr->expiration_time); do_div(cb_expiry, 10 * 1000 * 1000); vnode->cb_version = ntohl(xdr->version);
[PATCH 19/25] afs: Get the target vnode in afs_rmdir() and get a callback on it [ver #2]
Get the target vnode in afs_rmdir() and validate it before we attempt the deletion. The vnode pointer will be passed through to the delivery function in a later patch so that the delivery function can mark it deleted. Signed-off-by: David Howells --- fs/afs/dir.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 8936731c59ff..f2dd48d4363f 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1174,7 +1174,7 @@ static void afs_dir_remove_subdir(struct dentry *dentry) static int afs_rmdir(struct inode *dir, struct dentry *dentry) { struct afs_fs_cursor fc; - struct afs_vnode *dvnode = AFS_FS_I(dir); + struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL; struct key *key; u64 data_version = dvnode->status.data_version; int ret; @@ -1188,6 +1188,14 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) goto error; } + /* Try to make sure we have a callback promise on the victim. */ + if (d_really_is_positive(dentry)) { + vnode = AFS_FS_I(d_inode(dentry)); + ret = afs_validate(vnode, key); + if (ret < 0) + goto error_key; + } + ret = -ERESTARTSYS; if (afs_begin_vnode_operation(&fc, dvnode, key)) { while (afs_select_fileserver(&fc)) { @@ -1206,6 +1214,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) } } +error_key: key_put(key); error: return ret;
[PATCH 22/25] afs: Allow dumping of server cursor on operation failure [ver #2]
Provide an option to allow the file or volume location server cursor to be dumped if the rotation routine falls off the end without managing to contact a server. Signed-off-by: David Howells --- fs/afs/Kconfig | 12 +++ fs/afs/addr_list.c |2 ++ fs/afs/internal.h |3 +++ fs/afs/rotate.c| 57 fs/afs/vl_rotate.c | 53 5 files changed, 127 insertions(+) diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig index ebba3b18e5da..701aaa9b1899 100644 --- a/fs/afs/Kconfig +++ b/fs/afs/Kconfig @@ -27,3 +27,15 @@ config AFS_FSCACHE help Say Y here if you want AFS data to be cached locally on disk through the generic filesystem cache manager + +config AFS_DEBUG_CURSOR + bool "AFS server cursor debugging" + depends on AFS_FS + help + Say Y here to cause the contents of a server cursor to be dumped to + the dmesg log if the server rotation algorithm fails to successfully + contact a server. + + See for more information. + + If unsure, say N. diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 3f60b4012587..bc5ce31a4ae4 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -358,6 +358,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) if (!ac->alist) return false; + ac->nr_iterations++; + if (ac->begun) { ac->index++; if (ac->index == ac->alist->nr_addrs) diff --git a/fs/afs/internal.h b/fs/afs/internal.h index ce79bd514331..ac9da1e4050e 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -660,6 +660,7 @@ struct afs_addr_cursor { short error; boolbegun; /* T if we've begun iteration */ boolresponded; /* T if the current address responded */ + unsigned short nr_iterations; /* Number of address iterations */ }; /* @@ -677,6 +678,7 @@ struct afs_vl_cursor { #define AFS_VL_CURSOR_STOP 0x0001 /* Set to cease iteration */ #define AFS_VL_CURSOR_RETRY0x0002 /* Set to do a retry */ #define AFS_VL_CURSOR_RETRIED 0x0004 /* Set if started a retry */ + unsigned short nr_iterations; /* Number of server iterations */ }; /* @@ -700,6 +702,7 @@ struct afs_fs_cursor { #define 
AFS_FS_CURSOR_VNOVOL 0x0008 /* Set if seen VNOVOL */ #define AFS_FS_CURSOR_CUR_ONLY 0x0010 /* Set if current server only (file lock held) */ #define AFS_FS_CURSOR_NO_VSLEEP0x0020 /* Set to prevent sleep on VBUSY, VOFFLINE, ... */ + unsigned short nr_iterations; /* Number of server iterations */ }; /* diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 41405dde0113..7c4487781637 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -156,6 +156,8 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) return false; } + fc->nr_iterations++; + /* Evaluate the result of the previous operation, if there was one. */ switch (error) { case SHRT_MAX: @@ -519,6 +521,56 @@ bool afs_select_current_fileserver(struct afs_fs_cursor *fc) return false; } +/* + * Dump cursor state in the case of the error being EDESTADDRREQ. + */ +static void afs_dump_edestaddrreq(const struct afs_fs_cursor *fc) +{ + static int count; + int i; + + if (!IS_ENABLED(CONFIG_AFS_DEBUG_CURSOR) || count > 3) + return; + count++; + + rcu_read_lock(); + + pr_notice("EDESTADDR occurred\n"); + pr_notice("FC: cbb=%x cbb2=%x fl=%hx err=%hd\n", + fc->cb_break, fc->cb_break_2, fc->flags, fc->error); + pr_notice("FC: st=%u ix=%u ni=%u\n", + fc->start, fc->index, fc->nr_iterations); + + if (fc->server_list) { + const struct afs_server_list *sl = fc->server_list; + pr_notice("FC: SL nr=%u ix=%u vnov=%hx\n", + sl->nr_servers, sl->index, sl->vnovol_mask); + for (i = 0; i < sl->nr_servers; i++) { + const struct afs_server *s = sl->servers[i].server; + pr_notice("FC: server fl=%lx av=%u %pU\n", + s->flags, s->addr_version, >uuid); + if (s->addresses) { + const struct afs_addr_list *a = + rcu_dereference(s->addresses); + pr_notice("FC: - av=%u nr=%u/%u/%u ax=%u\n", + a->version, + a->nr_ipv4, a->nr_addrs, a->max_addrs, + a->index); + pr_notice("FC: - pr=%lx yf=%lx\n", +
[PATCH 12/25] afs: Don't invoke the server to read data beyond EOF [ver #2]
When writing a new page, clear space in the page rather than attempting to load it from the server if the space is beyond the EOF. Signed-off-by: David Howells --- fs/afs/write.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/fs/afs/write.c b/fs/afs/write.c index fdb9d6024126..11066a3248ba 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -33,10 +33,21 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key, loff_t pos, unsigned int len, struct page *page) { struct afs_read *req; + size_t p; + void *data; int ret; _enter(",,%llu", (unsigned long long)pos); + if (pos >= vnode->vfs_inode.i_size) { + p = pos & ~PAGE_MASK; + ASSERTCMP(p + len, <=, PAGE_SIZE); + data = kmap(page); + memset(data + p, 0, len); + kunmap(page); + return 0; + } + req = kzalloc(sizeof(struct afs_read) + sizeof(struct page *), GFP_KERNEL); if (!req)
[PATCH 08/25] afs: Implement VL server rotation [ver #2]
Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by: (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list. (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that. (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers. (5) Show the address list through /proc/net/afs//vlservers. This also displays the source and status of the data as indicated by the upcall. Signed-off-by: David Howells --- fs/afs/Makefile|2 fs/afs/addr_list.c | 163 + fs/afs/cell.c | 39 +++--- fs/afs/dynroot.c |2 fs/afs/internal.h | 114 -- fs/afs/proc.c | 90 +++--- fs/afs/server.c| 42 ++- fs/afs/vl_list.c | 336 fs/afs/vl_rotate.c | 251 +++ fs/afs/vlclient.c | 32 ++--- fs/afs/volume.c| 52 ++-- 11 files changed, 905 insertions(+), 218 deletions(-) create mode 100644 fs/afs/vl_list.c create mode 100644 fs/afs/vl_rotate.c diff --git a/fs/afs/Makefile b/fs/afs/Makefile index 546874057bd3..03e9f7afea1b 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -29,6 +29,8 @@ kafs-y := \ super.o \ netdevices.o \ vlclient.o \ + vl_rotate.o \ + vl_list.o \ volume.o \ write.o \ xattr.o diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 7b34fad4f8f5..3f60b4012587 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -64,19 +64,25 @@ struct afs_addr_list *afs_alloc_addrlist(unsigned int nr, /* * Parse a text string consisting of delimited addresses. 
*/ -struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, - char delim, - unsigned short service, - unsigned short port) +struct afs_vlserver_list *afs_parse_text_addrs(struct afs_net *net, + const char *text, size_t len, + char delim, + unsigned short service, + unsigned short port) { + struct afs_vlserver_list *vllist; struct afs_addr_list *alist; const char *p, *end = text + len; + const char *problem; unsigned int nr = 0; + int ret = -ENOMEM; _enter("%*.*s,%c", (int)len, (int)len, text, delim); - if (!len) + if (!len) { + _leave(" = -EDESTADDRREQ [empty]"); return ERR_PTR(-EDESTADDRREQ); + } if (delim == ':' && (memchr(text, ',', len) || !memchr(text, '.', len))) delim = ','; @@ -84,18 +90,24 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, /* Count the addresses */ p = text; do { - if (!*p) - return ERR_PTR(-EINVAL); + if (!*p) { + problem = "nul"; + goto inval; + } if (*p == delim) continue; nr++; if (*p == '[') { p++; - if (p == end) - return ERR_PTR(-EINVAL); + if (p == end) { + problem = "brace1"; + goto inval; + } p = memchr(p, ']', end - p); - if (!p) - return ERR_PTR(-EINVAL); + if (!p) { + problem = "brace2"; + goto inval; + } p++; if (p >= end) break; @@ -109,10 +121,19 @@ struct afs_addr_list *afs_parse_text_addrs(const char *text, size_t len, _debug("%u/%u addresses", nr, AFS_MAX_ADDRESSES); - alist = afs_alloc_addrlist(nr, service, port); - if (!alist) + vllist = afs_alloc_vlserver_list(1); + if (!vllist) return ERR_PTR(-ENOMEM); + vllist->nr_servers = 1; + vllist->servers[0].server = afs_alloc_vlserver("", 7, AFS_VL_PORT); + if (!vllist->servers[0].server) + goto error_vl; + +
[PATCH 22/25] afs: Allow dumping of server cursor on operation failure [ver #2]
Provide an option to allow the file or volume location server cursor to be dumped if the rotation routine falls off the end without managing to contact a server. Signed-off-by: David Howells --- fs/afs/Kconfig | 12 +++ fs/afs/addr_list.c |2 ++ fs/afs/internal.h |3 +++ fs/afs/rotate.c| 57 fs/afs/vl_rotate.c | 53 5 files changed, 127 insertions(+) diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig index ebba3b18e5da..701aaa9b1899 100644 --- a/fs/afs/Kconfig +++ b/fs/afs/Kconfig @@ -27,3 +27,15 @@ config AFS_FSCACHE help Say Y here if you want AFS data to be cached locally on disk through the generic filesystem cache manager + +config AFS_DEBUG_CURSOR + bool "AFS server cursor debugging" + depends on AFS_FS + help + Say Y here to cause the contents of a server cursor to be dumped to + the dmesg log if the server rotation algorithm fails to successfully + contact a server. + + See for more information. + + If unsure, say N. diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 3f60b4012587..bc5ce31a4ae4 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -358,6 +358,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) if (!ac->alist) return false; + ac->nr_iterations++; + if (ac->begun) { ac->index++; if (ac->index == ac->alist->nr_addrs) diff --git a/fs/afs/internal.h b/fs/afs/internal.h index ce79bd514331..ac9da1e4050e 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -660,6 +660,7 @@ struct afs_addr_cursor { short error; boolbegun; /* T if we've begun iteration */ boolresponded; /* T if the current address responded */ + unsigned short nr_iterations; /* Number of address iterations */ }; /* @@ -677,6 +678,7 @@ struct afs_vl_cursor { #define AFS_VL_CURSOR_STOP 0x0001 /* Set to cease iteration */ #define AFS_VL_CURSOR_RETRY0x0002 /* Set to do a retry */ #define AFS_VL_CURSOR_RETRIED 0x0004 /* Set if started a retry */ + unsigned short nr_iterations; /* Number of server iterations */ }; /* @@ -700,6 +702,7 @@ struct afs_fs_cursor { #define 
AFS_FS_CURSOR_VNOVOL 0x0008 /* Set if seen VNOVOL */ #define AFS_FS_CURSOR_CUR_ONLY 0x0010 /* Set if current server only (file lock held) */ #define AFS_FS_CURSOR_NO_VSLEEP 0x0020 /* Set to prevent sleep on VBUSY, VOFFLINE, ... */ + unsigned short nr_iterations; /* Number of server iterations */ }; /* diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 41405dde0113..7c4487781637 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -156,6 +156,8 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) return false; } + fc->nr_iterations++; + /* Evaluate the result of the previous operation, if there was one. */ switch (error) { case SHRT_MAX: @@ -519,6 +521,56 @@ bool afs_select_current_fileserver(struct afs_fs_cursor *fc) return false; } +/* + * Dump cursor state in the case of the error being EDESTADDRREQ. + */ +static void afs_dump_edestaddrreq(const struct afs_fs_cursor *fc) +{ + static int count; + int i; + + if (!IS_ENABLED(CONFIG_AFS_DEBUG_CURSOR) || count > 3) + return; + count++; + + rcu_read_lock(); + + pr_notice("EDESTADDRREQ occurred\n"); + pr_notice("FC: cbb=%x cbb2=%x fl=%hx err=%hd\n", + fc->cb_break, fc->cb_break_2, fc->flags, fc->error); + pr_notice("FC: st=%u ix=%u ni=%u\n", + fc->start, fc->index, fc->nr_iterations); + + if (fc->server_list) { + const struct afs_server_list *sl = fc->server_list; + pr_notice("FC: SL nr=%u ix=%u vnov=%hx\n", + sl->nr_servers, sl->index, sl->vnovol_mask); + for (i = 0; i < sl->nr_servers; i++) { + const struct afs_server *s = sl->servers[i].server; + pr_notice("FC: server fl=%lx av=%u %pU\n", + s->flags, s->addr_version, &s->uuid); + if (s->addresses) { + const struct afs_addr_list *a = + rcu_dereference(s->addresses); + pr_notice("FC: - av=%u nr=%u/%u/%u ax=%u\n", + a->version, + a->nr_ipv4, a->nr_addrs, a->max_addrs, + a->index); + pr_notice("FC: - pr=%lx yf=%lx\n", +
[PATCH 16/25] afs: Implement the YFS cache manager service [ver #2]
Implement the YFS cache manager service which gives extra capabilities on top of AFS. This is done by listening for an additional service on the same port and indicating that anyone requesting an upgrade should be upgraded to the YFS port. Signed-off-by: David Howells --- fs/afs/cmservice.c| 103 + fs/afs/protocol_yfs.h | 57 +++ fs/afs/rxrpc.c| 15 +++ 3 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 fs/afs/protocol_yfs.h diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index fc0010d800a0..8cf8d10daa6c 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -16,6 +16,7 @@ #include #include "internal.h" #include "afs_cm.h" +#include "protocol_yfs.h" static int afs_deliver_cb_init_call_back_state(struct afs_call *); static int afs_deliver_cb_init_call_back_state3(struct afs_call *); @@ -30,6 +31,8 @@ static void SRXAFSCB_Probe(struct work_struct *); static void SRXAFSCB_ProbeUuid(struct work_struct *); static void SRXAFSCB_TellMeAboutYourself(struct work_struct *); +static int afs_deliver_yfs_cb_callback(struct afs_call *); + #define CM_NAME(name) \ const char afs_SRXCB##name##_name[] __tracepoint_string = \ "CB." 
#name @@ -100,13 +103,24 @@ static const struct afs_call_type afs_SRXCBTellMeAboutYourself = { .work = SRXAFSCB_TellMeAboutYourself, }; +/* + * YFS CB.CallBack operation type + */ +static CM_NAME(YFS_CallBack); +static const struct afs_call_type afs_SRXYFSCB_CallBack = { + .name = afs_SRXCBYFS_CallBack_name, + .deliver = afs_deliver_yfs_cb_callback, + .destructor = afs_cm_destructor, + .work = SRXAFSCB_CallBack, +}; + /* * route an incoming cache manager call * - return T if supported, F if not */ bool afs_cm_incoming_call(struct afs_call *call) { - _enter("{CB.OP %u}", call->operation_ID); + _enter("{%u, CB.OP %u}", call->service_id, call->operation_ID); switch (call->operation_ID) { case CBCallBack: @@ -127,6 +141,11 @@ bool afs_cm_incoming_call(struct afs_call *call) case CBTellMeAboutYourself: call->type = &afs_SRXCBTellMeAboutYourself; return true; + case YFSCBCallBack: + if (call->service_id != YFS_CM_SERVICE) + return false; + call->type = &afs_SRXYFSCB_CallBack; + return true; default: return false; } @@ -570,3 +589,85 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call) return afs_queue_call_work(call); } + +/* + * deliver request data to a YFS CB.CallBack call + */ +static int afs_deliver_yfs_cb_callback(struct afs_call *call) +{ + struct afs_callback_break *cb; + struct sockaddr_rxrpc srx; + struct yfs_xdr_YFSFid *bp; + size_t size; + int ret, loop; + + _enter("{%u}", call->unmarshall); + + switch (call->unmarshall) { + case 0: + afs_extract_to_tmp(call); + call->unmarshall++; + + /* extract the FID array and its count in two steps */ + case 1: + _debug("extract FID count"); + ret = afs_extract_data(call, true); + if (ret < 0) + return ret; + + call->count = ntohl(call->tmp); + _debug("FID count: %u", call->count); + if (call->count > YFSCBMAX) + return afs_protocol_error(call, -EBADMSG, + afs_eproto_cb_fid_count); + + size = array_size(call->count, sizeof(struct yfs_xdr_YFSFid)); + call->buffer = kmalloc(size, GFP_KERNEL); + if
(!call->buffer) + return -ENOMEM; + afs_extract_to_buf(call, size); + call->unmarshall++; + + case 2: + _debug("extract FID array"); + ret = afs_extract_data(call, false); + if (ret < 0) + return ret; + + _debug("unmarshall FID array"); + call->request = kcalloc(call->count, + sizeof(struct afs_callback_break), + GFP_KERNEL); + if (!call->request) + return -ENOMEM; + + cb = call->request; + bp = call->buffer; + for (loop = call->count; loop > 0; loop--, cb++) { + cb->fid.vid = xdr_to_u64(bp->volume); + cb->fid.vnode = xdr_to_u64(bp->vnode.lo); + cb->fid.vnode_hi = ntohl(bp->vnode.hi); + cb->fid.unique = ntohl(bp->vnode.unique); + bp++; + } + + afs_extract_to_tmp(call);
[PATCH 23/25] afs: Eliminate the address pointer from the address list cursor [ver #2]
Eliminate the address pointer from the address list cursor as it's redundant (ac->alist->addrs[ac->index] can be used to find the same address) and address lists must be replaced rather than being rearranged, so it is of limited value. Signed-off-by: David Howells --- fs/afs/addr_list.c |2 -- fs/afs/internal.h |1 - fs/afs/rxrpc.c |2 +- fs/afs/server.c|2 -- fs/afs/vl_rotate.c |2 +- fs/afs/volume.c|6 +++--- 6 files changed, 5 insertions(+), 10 deletions(-) diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index bc5ce31a4ae4..1536d1d21c33 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -371,7 +371,6 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) ac->begun = true; ac->responded = false; - ac->addr = &ac->alist->addrs[ac->index]; return true; } @@ -389,7 +388,6 @@ int afs_end_cursor(struct afs_addr_cursor *ac) afs_put_addrlist(alist); } - ac->addr = NULL; ac->alist = NULL; ac->begun = false; return ac->error; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index ac9da1e4050e..e5b596bd8acf 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -653,7 +653,6 @@ struct afs_interface { */ struct afs_addr_cursor { struct afs_addr_list *alist; /* Current address list (pins ref) */ - struct sockaddr_rxrpc *addr; u32 abort_code; unsigned short start; /* Starting point in alist->addrs[] */ unsigned short index; /* Wrapping offset from start to current addr */ diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 444ba0d511ef..42e1ea7372e9 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -359,7 +359,7 @@ static int afs_send_pages(struct afs_call *call, struct msghdr *msg) long afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp, bool async) { - struct sockaddr_rxrpc *srx = ac->addr; + struct sockaddr_rxrpc *srx = &ac->alist->addrs[ac->index]; struct rxrpc_call *rxcall; struct msghdr msg; struct kvec iov[1]; diff --git a/fs/afs/server.c b/fs/afs/server.c index aa35cfae5440..7c1be8b4dc9a 100644 --- a/fs/afs/server.c +++ b/fs/afs/server.c @@
-367,7 +367,6 @@ static void afs_destroy_server(struct afs_net *net, struct afs_server *server) .alist = alist, .start = alist->index, .index = 0, - .addr = &alist->addrs[alist->index], .error = 0, }; _enter("%p", server); @@ -518,7 +517,6 @@ static bool afs_do_probe_fileserver(struct afs_fs_cursor *fc) _enter(""); - fc->ac.addr = NULL; fc->ac.start = READ_ONCE(fc->ac.alist->index); fc->ac.index = fc->ac.start; fc->ac.error = 0; diff --git a/fs/afs/vl_rotate.c b/fs/afs/vl_rotate.c index 5b99ea7be194..ead6dedbb561 100644 --- a/fs/afs/vl_rotate.c +++ b/fs/afs/vl_rotate.c @@ -209,7 +209,7 @@ bool afs_select_vlserver(struct afs_vl_cursor *vc) if (!afs_iterate_addresses(&vc->ac)) goto next_server; - _leave(" = t %pISpc", &vc->ac.addr->transport); + _leave(" = t %pISpc", &vc->ac.alist->addrs[vc->ac.index].transport); return true; next_server: diff --git a/fs/afs/volume.c b/fs/afs/volume.c index f0020e35bf6f..7527c081726e 100644 --- a/fs/afs/volume.c +++ b/fs/afs/volume.c @@ -88,16 +88,16 @@ static struct afs_vldb_entry *afs_vl_lookup_vldb(struct afs_cell *cell, case VL_SERVICE: clear_bit(vc.ac.index, >yfs); set_bit(vc.ac.index, >probed); - vc.ac.addr->srx_service = ret; + vc.ac.alist->addrs[vc.ac.index].srx_service = ret; break; case YFS_VL_SERVICE: set_bit(vc.ac.index, >yfs); set_bit(vc.ac.index, >probed); - vc.ac.addr->srx_service = ret; + vc.ac.alist->addrs[vc.ac.index].srx_service = ret; break; } } - + vldb = afs_vl_get_entry_by_name_u(&vc, volname, volnamesz); }
[PATCH 25/25] afs: Probe multiple fileservers simultaneously [ver #2]
Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells --- fs/afs/Makefile|4 - fs/afs/addr_list.c | 40 -- fs/afs/cmservice.c | 129 +++-- fs/afs/fs_probe.c | 270 fs/afs/fsclient.c | 27 +++- fs/afs/internal.h | 98 +--- fs/afs/proc.c |6 - fs/afs/rotate.c| 174 ++-- fs/afs/rxrpc.c | 44 --- fs/afs/server.c| 109 +- fs/afs/server_list.c |6 - fs/afs/vl_list.c |6 + fs/afs/vl_probe.c | 273 fs/afs/vl_rotate.c | 159 +- fs/afs/vlclient.c | 35 +++--- fs/afs/volume.c| 16 --- include/trace/events/afs.h |4 - 17 files changed, 1050 insertions(+), 350 deletions(-) create mode 100644 fs/afs/fs_probe.c create mode 100644 fs/afs/vl_probe.c diff --git a/fs/afs/Makefile b/fs/afs/Makefile index cc942b790cff..0738e2bf5193 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -17,6 +17,7 @@ kafs-y := \ file.o \ flock.o \ fsclient.o \ + fs_probe.o \ inode.o \ main.o \ misc.o \ @@ -29,8 +30,9 @@ kafs-y := \ super.o \ netdevices.o \ vlclient.o \ - vl_rotate.o \ vl_list.o \ + vl_probe.o \ + vl_rotate.o \ volume.o \ write.o \ xattr.o \ diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 1536d1d21c33..967db336d11a 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -303,6 +303,8 @@ void afs_merge_fs_addr4(struct afs_addr_list *alist, __be32 xdr, u16 port) sizeof(alist->addrs[0]) * (alist->nr_addrs - i)); srx = &alist->addrs[i]; + srx->srx_family = AF_RXRPC; + srx->transport_type = SOCK_DGRAM; srx->transport_len = sizeof(srx->transport.sin); srx->transport.sin.sin_family = AF_INET; srx->transport.sin.sin_port = htons(port); @@ -341,6 +343,8 @@ void afs_merge_fs_addr6(struct afs_addr_list
*alist, __be32 *xdr, u16 port) sizeof(alist->addrs[0]) * (alist->nr_addrs - i)); srx = &alist->addrs[i]; + srx->srx_family = AF_RXRPC; + srx->transport_type = SOCK_DGRAM; srx->transport_len = sizeof(srx->transport.sin6); srx->transport.sin6.sin6_family = AF_INET6; srx->transport.sin6.sin6_port = htons(port); @@ -353,23 +357,32 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port) */ bool afs_iterate_addresses(struct afs_addr_cursor *ac) { - _enter("%hu+%hd", ac->start, (short)ac->index); + unsigned long set, failed; + int index; if (!ac->alist) return false; + set = ac->alist->responded; + failed = ac->alist->failed; + _enter("%lx-%lx-%lx,%d", set, failed, ac->tried, ac->index); + ac->nr_iterations++; - if (ac->begun) { - ac->index++; - if (ac->index == ac->alist->nr_addrs) - ac->index = 0; + set &= ~(failed | ac->tried); - if (ac->index == ac->start) - return false; - } + if (!set) + return false; + + index = READ_ONCE(ac->alist->preferred); + if (test_bit(index, &set)) + goto selected; + + index = __ffs(set); - ac->begun = true; +selected: + ac->index = index; + set_bit(index, &ac->tried); ac->responded = false; return true; } @@ -383,12 +396,13 @@ int afs_end_cursor(struct afs_addr_cursor *ac) alist = ac->alist; if (alist) { - if (ac->responded && ac->index != ac->start) - WRITE_ONCE(alist->index, ac->index); + if (ac->responded && + ac->index != alist->preferred && + test_bit(ac->alist->preferred, &ac->tried)) + WRITE_ONCE(alist->preferred, ac->index); afs_put_addrlist(alist); + ac->alist = NULL; } - ac->alist = NULL; - ac->begun = false; return ac->error; } diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 8cf8d10daa6c..8ee5972893ed 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -122,6 +122,8 @@ bool afs_cm_incoming_call(struct afs_call *call) { _enter("{%u, CB.OP %u}",
[PATCH 10/25] afs: Handle EIO from delivery function [ver #2]
Fix afs_deliver_to_call() to handle -EIO being returned by the operation delivery function, indicating that the call found itself in the wrong state, by printing an error and aborting the call. Currently, an assertion failure will occur. This can happen, say, if the delivery function falls off the end without calling afs_extract_data() with the want_more parameter set to false to collect the end of the Rx phase of a call. The assertion failure looks like: AFS: Assertion failed 4 == 7 is false 0x4 == 0x7 is false [ cut here ] kernel BUG at fs/afs/rxrpc.c:462! and is matched in the trace buffer by a line like: kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals") Reported-by: Marc Dionne Signed-off-by: David Howells --- fs/afs/rxrpc.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index a3904a8315de..947ae3ab389b 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -499,7 +499,6 @@ static void afs_deliver_to_call(struct afs_call *call) case -EINPROGRESS: case -EAGAIN: goto out; - case -EIO: case -ECONNABORTED: ASSERTCMP(state, ==, AFS_CALL_COMPLETE); goto done; @@ -508,6 +507,10 @@ static void afs_deliver_to_call(struct afs_call *call) rxrpc_kernel_abort_call(call->net->socket, call->rxcall, abort_code, ret, "KIV"); goto local_abort; + case -EIO: + pr_err("kAFS: Call %u in bad state %u\n", + call->debug_id, state); + /* Fall through */ case -ENODATA: case -EBADMSG: case -EMSGSIZE:
*alist, __be32 *xdr, u16 port) sizeof(alist->addrs[0]) * (alist->nr_addrs - i)); srx = >addrs[i]; + srx->srx_family = AF_RXRPC; + srx->transport_type = SOCK_DGRAM; srx->transport_len = sizeof(srx->transport.sin6); srx->transport.sin6.sin6_family = AF_INET6; srx->transport.sin6.sin6_port = htons(port); @@ -353,23 +357,32 @@ void afs_merge_fs_addr6(struct afs_addr_list *alist, __be32 *xdr, u16 port) */ bool afs_iterate_addresses(struct afs_addr_cursor *ac) { - _enter("%hu+%hd", ac->start, (short)ac->index); + unsigned long set, failed; + int index; if (!ac->alist) return false; + set = ac->alist->responded; + failed = ac->alist->failed; + _enter("%lx-%lx-%lx,%d", set, failed, ac->tried, ac->index); + ac->nr_iterations++; - if (ac->begun) { - ac->index++; - if (ac->index == ac->alist->nr_addrs) - ac->index = 0; + set &= ~(failed | ac->tried); - if (ac->index == ac->start) - return false; - } + if (!set) + return false; + + index = READ_ONCE(ac->alist->preferred); + if (test_bit(index, )) + goto selected; + + index = __ffs(set); - ac->begun = true; +selected: + ac->index = index; + set_bit(index, >tried); ac->responded = false; return true; } @@ -383,12 +396,13 @@ int afs_end_cursor(struct afs_addr_cursor *ac) alist = ac->alist; if (alist) { - if (ac->responded && ac->index != ac->start) - WRITE_ONCE(alist->index, ac->index); + if (ac->responded && + ac->index != alist->preferred && + test_bit(ac->alist->preferred, >tried)) + WRITE_ONCE(alist->preferred, ac->index); afs_put_addrlist(alist); + ac->alist = NULL; } - ac->alist = NULL; - ac->begun = false; return ac->error; } diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 8cf8d10daa6c..8ee5972893ed 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -122,6 +122,8 @@ bool afs_cm_incoming_call(struct afs_call *call) { _enter("{%u, CB.OP %u}",
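The rewritten afs_iterate_addresses() above stops round-robinning from a start index and instead picks from a bitmask of addresses that have responded to probes, excluding failed and already-tried ones, and preferring the remembered "preferred" slot. A cut-down user-space model of that selection (bitmap widths and the exact candidate set are simplifying assumptions):

```c
#include <assert.h>

/* Model of the new selection logic: choose the next untried,
 * non-failed responder, preferring the "preferred" slot; mark the
 * choice as tried so it is never picked twice.  Returns the chosen
 * index, or -1 when no candidate remains. */
static int pick_address(unsigned long responded, unsigned long failed,
			unsigned long *tried, int preferred)
{
	unsigned long set = responded & ~(failed | *tried);

	if (!set)
		return -1;		/* nothing viable left */

	int index = preferred;
	if (!(set & (1UL << preferred))) {
		index = 0;
		while (!(set & (1UL << index)))
			index++;	/* lowest set bit, like __ffs() */
	}

	*tried |= 1UL << index;
	return index;
}
```

Repeated calls walk every viable address exactly once, which is what bounds the rotation instead of the old wrap-until-start loop.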
[PATCH 10/25] afs: Handle EIO from delivery function [ver #2]
Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.

Currently, an assertion failure will occur.  This can happen, say, if the
delivery function falls off the end without calling afs_extract_data()
with the want_more parameter set to false to collect the end of the Rx
phase of a call.

The assertion failure looks like:

	AFS: Assertion failed
	4 == 7 is false
	0x4 == 0x7 is false
	------------[ cut here ]------------
	kernel BUG at fs/afs/rxrpc.c:462!

and is matched in the trace buffer by a line like:

	kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY

Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne
Signed-off-by: David Howells
---

 fs/afs/rxrpc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index a3904a8315de..947ae3ab389b 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -499,7 +499,6 @@ static void afs_deliver_to_call(struct afs_call *call)
 		case -EINPROGRESS:
 		case -EAGAIN:
 			goto out;
-		case -EIO:
 		case -ECONNABORTED:
 			ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
 			goto done;
@@ -508,6 +507,10 @@ static void afs_deliver_to_call(struct afs_call *call)
 			rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
 						abort_code, ret, "KIV");
 			goto local_abort;
+		case -EIO:
+			pr_err("kAFS: Call %u in bad state %u\n",
+			       call->debug_id, state);
+			/* Fall through */
 		case -ENODATA:
 		case -EBADMSG:
 		case -EMSGSIZE:
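The effect of the patch is to move -EIO from the "call must already be complete" bucket into the "log and locally abort" bucket. A small user-space model of the revised dispatch (an illustrative sketch, not the kernel function; the outcome names are invented):

```c
#include <assert.h>
#include <errno.h>

enum outcome { OUT_WAIT, OUT_DONE, OUT_LOCAL_ABORT };

/* Classify the delivery function's return code the way the patched
 * switch does: -EIO now falls through to the same local-abort path
 * as the unmarshalling errors, instead of asserting. */
static enum outcome classify_delivery_error(int err)
{
	switch (err) {
	case -EINPROGRESS:
	case -EAGAIN:
		return OUT_WAIT;	/* more data expected */
	case -ECONNABORTED:
		return OUT_DONE;	/* remote abort; call complete */
	case -EIO:
		/* the kernel logs "Call %u in bad state %u" here ... */
		/* fall through */
	case -ENODATA:
	case -EBADMSG:
	case -EMSGSIZE:
	default:
		return OUT_LOCAL_ABORT;	/* abort the call locally */
	}
}
```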
[PATCH 07/25] afs: Improve FS server rotation error handling [ver #2]
Improve the error handling in FS server rotation by: (1) Cache the latest useful error value for the fs operation as a whole in struct afs_fs_cursor separately from the error cached in the afs_addr_cursor struct. The one in the address cursor gets clobbered occasionally. Copy over the error to the fs operation only when it's something we'd be interested in passing to userspace. (2) Make it so that EDESTADDRREQ is the default that is seen only if no addresses are available to be accessed. (3) When calling utility functions, such as checking a volume status or probing a fileserver, don't let a successful result clobber the cached error in the cursor; instead, stash the result in a temporary variable until it has been assessed. (4) Don't return ETIMEDOUT or ETIME if a better error, such as ENETUNREACH, is already cached. (5) On leaving the rotation loop, turn any remote abort code into a more useful error than ECONNABORTED. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells --- fs/afs/addr_list.c |4 +- fs/afs/internal.h |1 + fs/afs/rotate.c| 95 +--- 3 files changed, 55 insertions(+), 45 deletions(-) diff --git a/fs/afs/addr_list.c b/fs/afs/addr_list.c index 55a756c60746..7b34fad4f8f5 100644 --- a/fs/afs/addr_list.c +++ b/fs/afs/addr_list.c @@ -318,10 +318,8 @@ bool afs_iterate_addresses(struct afs_addr_cursor *ac) if (ac->index == ac->alist->nr_addrs) ac->index = 0; - if (ac->index == ac->start) { - ac->error = -EDESTADDRREQ; + if (ac->index == ac->start) return false; - } } ac->begun = true; diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 36e9cc74ac11..81936a4d5035 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -629,6 +629,7 @@ struct afs_fs_cursor { unsigned intcb_break_2; /* cb_break + cb_s_break (2nd vnode) */ unsigned char start; /* Initial index in server list */ unsigned char index; /* Number of servers tried beyond start */ + short error; unsigned short flags; #define 
AFS_FS_CURSOR_STOP 0x0001 /* Set to cease iteration */ #define AFS_FS_CURSOR_VBUSY0x0002 /* Set if seen VBUSY */ diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 1faef56b12bd..d7cbc3c230ee 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -39,9 +39,10 @@ bool afs_begin_vnode_operation(struct afs_fs_cursor *fc, struct afs_vnode *vnode fc->vnode = vnode; fc->key = key; fc->ac.error = SHRT_MAX; + fc->error = -EDESTADDRREQ; if (mutex_lock_interruptible(>io_lock) < 0) { - fc->ac.error = -EINTR; + fc->error = -EINTR; fc->flags |= AFS_FS_CURSOR_STOP; return false; } @@ -80,7 +81,7 @@ static bool afs_start_fs_iteration(struct afs_fs_cursor *fc, * and have to return an error. */ if (fc->flags & AFS_FS_CURSOR_CUR_ONLY) { - fc->ac.error = -ESTALE; + fc->error = -ESTALE; return false; } @@ -127,7 +128,7 @@ static bool afs_sleep_and_retry(struct afs_fs_cursor *fc) { msleep_interruptible(1000); if (signal_pending(current)) { - fc->ac.error = -ERESTARTSYS; + fc->error = -ERESTARTSYS; return false; } @@ -143,11 +144,12 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) struct afs_addr_list *alist; struct afs_server *server; struct afs_vnode *vnode = fc->vnode; + int error = fc->ac.error; _enter("%u/%u,%u/%u,%d,%d", fc->index, fc->start, fc->ac.index, fc->ac.start, - fc->ac.error, fc->ac.abort_code); + error, fc->ac.abort_code); if (fc->flags & AFS_FS_CURSOR_STOP) { _leave(" = f [stopped]"); @@ -155,15 +157,16 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc) } /* Evaluate the result of the previous operation, if there was one. */ - switch (fc->ac.error) { + switch (error) { case SHRT_MAX: goto start; case 0: default: /* Success or local failure. Stop. */ + fc->error = error; fc->flags |= AFS_FS_CURSOR_STOP; - _leave(" = f [okay/local %d]", fc->ac.error); + _leave(" = f [okay/local %d]", error); return false; case -ECONNABORTED: @@ -178,7 +181,7 @@ bool afs_select_fileserver(struct afs_fs_cursor *fc)
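Point (4) above — a timeout must not displace a more informative cached error — can be sketched as a small priority rule. This is an illustrative user-space model (the exact set of "better" errors here is an assumption; the real logic lives in afs_select_fileserver()):

```c
#include <assert.h>
#include <errno.h>

/* Keep the most informative error seen during server rotation: a
 * timeout only replaces the cached error when nothing more specific
 * has been recorded already. */
static int prioritise_error(int cached, int new_error)
{
	switch (new_error) {
	case -ETIMEDOUT:
	case -ETIME:
		if (cached == -ENETUNREACH || cached == -EHOSTUNREACH ||
		    cached == -ECONNREFUSED)
			return cached;	/* better diagnosis already cached */
		return new_error;
	default:
		return new_error;	/* specific errors always win */
	}
}
```

This is why userspace ends up seeing ENETUNREACH rather than a less actionable ETIMEDOUT when one address timed out but another was positively unreachable.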
Re: [PATCH] scsi/pmcraid.c: Use dma_pool_zalloc
On Mon, Oct 8, 2018 at 9:58 PM Souptick Joarder wrote: > > On Tue, Oct 2, 2018 at 10:53 AM Souptick Joarder wrote: > > > > Replaced dma_pool_alloc + memset with dma_pool_zalloc. > > > > Signed-off-by: Sabyasachi Gupta > > Signed-off-by: Souptick Joarder > > Any comment on this patch ? Any comment on this patch ? > > > --- > > drivers/scsi/pmcraid.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c > > index 4e86994..84a2734 100644 > > --- a/drivers/scsi/pmcraid.c > > +++ b/drivers/scsi/pmcraid.c > > @@ -4681,7 +4681,7 @@ static int pmcraid_allocate_control_blocks(struct > > pmcraid_instance *pinstance) > > > > for (i = 0; i < PMCRAID_MAX_CMD; i++) { > > pinstance->cmd_list[i]->ioa_cb = > > - dma_pool_alloc( > > + dma_pool_zalloc( > > pinstance->control_pool, > > GFP_KERNEL, > > &(pinstance->cmd_list[i]->ioa_cb_bus_addr)); > > @@ -4690,8 +4690,6 @@ static int pmcraid_allocate_control_blocks(struct > > pmcraid_instance *pinstance) > > pmcraid_release_control_blocks(pinstance, i); > > return -ENOMEM; > > } > > - memset(pinstance->cmd_list[i]->ioa_cb, 0, > > - sizeof(struct pmcraid_control_block)); > > } > > return 0; > > } > > -- > > 1.9.1 > >
[PATCH 21/25] afs: Implement YFS support in the fs client [ver #2]
Implement support for talking to YFS-variant fileservers in the cache manager and the filesystem client. These implement upgraded services on the same port as their AFS services. YFS fileservers provide expanded capabilities over AFS. Signed-off-by: David Howells --- fs/afs/Makefile|3 fs/afs/callback.c |9 fs/afs/dir.c | 21 fs/afs/fsclient.c | 104 ++ fs/afs/internal.h | 35 + fs/afs/protocol_yfs.h | 106 ++ fs/afs/server.c|8 fs/afs/yfsclient.c | 2184 include/trace/events/afs.h | 58 + 9 files changed, 2500 insertions(+), 28 deletions(-) create mode 100644 fs/afs/yfsclient.c diff --git a/fs/afs/Makefile b/fs/afs/Makefile index 03e9f7afea1b..cc942b790cff 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -33,7 +33,8 @@ kafs-y := \ vl_list.o \ volume.o \ write.o \ - xattr.o + xattr.o \ + yfsclient.o kafs-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_AFS_FS) := kafs.o diff --git a/fs/afs/callback.c b/fs/afs/callback.c index df9bfee698ad..1c7955f5cdaf 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -210,12 +210,10 @@ void afs_init_callback_state(struct afs_server *server) /* * actually break a callback */ -void afs_break_callback(struct afs_vnode *vnode) +void __afs_break_callback(struct afs_vnode *vnode) { _enter(""); - write_seqlock(>cb_lock); - clear_bit(AFS_VNODE_NEW_CONTENT, >flags); if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, >flags)) { vnode->cb_break++; @@ -230,7 +228,12 @@ void afs_break_callback(struct afs_vnode *vnode) afs_lock_may_be_available(vnode); spin_unlock(>lock); } +} +void afs_break_callback(struct afs_vnode *vnode) +{ + write_seqlock(>cb_lock); + __afs_break_callback(vnode); write_sequnlock(>cb_lock); } diff --git a/fs/afs/dir.c b/fs/afs/dir.c index f2dd48d4363f..43dea3b00c29 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1200,7 +1200,7 @@ static int afs_rmdir(struct inode *dir, struct dentry *dentry) if (afs_begin_vnode_operation(, dvnode, key)) { while (afs_select_fileserver()) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); - 
afs_fs_remove(, dentry->d_name.name, true, + afs_fs_remove(, vnode, dentry->d_name.name, true, data_version); } @@ -1245,7 +1245,9 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key, if (d_really_is_positive(dentry)) { struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry)); - if (dir_valid) { + if (test_bit(AFS_VNODE_DELETED, >flags)) { + /* Already done */ + } else if (dir_valid) { drop_nlink(>vfs_inode); if (vnode->vfs_inode.i_nlink == 0) { set_bit(AFS_VNODE_DELETED, >flags); @@ -1274,7 +1276,7 @@ static int afs_dir_remove_link(struct dentry *dentry, struct key *key, static int afs_unlink(struct inode *dir, struct dentry *dentry) { struct afs_fs_cursor fc; - struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode; + struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode = NULL; struct key *key; unsigned long d_version = (unsigned long)dentry->d_fsdata; u64 data_version = dvnode->status.data_version; @@ -1304,7 +1306,18 @@ static int afs_unlink(struct inode *dir, struct dentry *dentry) if (afs_begin_vnode_operation(, dvnode, key)) { while (afs_select_fileserver()) { fc.cb_break = afs_calc_vnode_cb_break(dvnode); - afs_fs_remove(, dentry->d_name.name, false, + + if (test_bit(AFS_SERVER_FL_IS_YFS, >server->flags) && + !test_bit(AFS_SERVER_FL_NO_RM2, >server->flags)) { + yfs_fs_remove_file2(, vnode, dentry->d_name.name, + data_version); + if (fc.ac.error != -ECONNABORTED || + fc.ac.abort_code != RXGEN_OPCODE) + continue; + set_bit(AFS_SERVER_FL_NO_RM2, >server->flags); + } + + afs_fs_remove(, vnode, dentry->d_name.name, false, data_version); } diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 2da65309e0de..3975969719de 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -17,6 +17,7 @@ #include "internal.h" #include "afs_fs.h" #include "xdr_fs.h" +#include "protocol_yfs.h" static const struct afs_fid afs_zero_fid;
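The afs_unlink() change above shows the general YFS negotiation pattern: try the upgraded RPC first, and if the server aborts it as an unknown opcode, remember that on the server record and fall back to the plain AFS RPC permanently. A hedged user-space model (the abort-code value and the struct are illustrative; the kernel tracks this with the AFS_SERVER_FL_NO_RM2 bit):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RXGEN_OPCODE (-455)	/* Rx runtime's "unrecognised operation" */

struct server { bool no_rm2; };	/* models AFS_SERVER_FL_NO_RM2 */

/* Try YFS.RemoveFile2 first; on an unknown-opcode abort, latch the
 * flag and use plain AFS.RemoveFile from then on.  yfs_abort is the
 * simulated abort code from the YFS attempt (0 = success). */
static const char *remove_file(struct server *s, int yfs_abort)
{
	if (!s->no_rm2) {
		if (yfs_abort != RXGEN_OPCODE)
			return "YFS.RemoveFile2";	/* upgraded op worked */
		s->no_rm2 = true;			/* never retry it */
	}
	return "AFS.RemoveFile";			/* fall back */
}
```

Latching the flag means the extra round trip is paid at most once per server, not once per unlink.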
[PATCH 18/25] afs: Calc callback expiry in op reply delivery [ver #2]
Calculate the callback expiration time at the point of operation reply delivery, using the reply time queried from AF_RXRPC on that call as a base. Signed-off-by: David Howells --- fs/afs/afs.h |2 +- fs/afs/fsclient.c | 22 +- fs/afs/inode.c|4 ++-- fs/afs/internal.h |2 ++ fs/afs/rxrpc.c|6 ++ 5 files changed, 28 insertions(+), 8 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index fb9bcb8758ea..417cd23529c5 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -68,8 +68,8 @@ typedef enum { } afs_callback_type_t; struct afs_callback { + time64_texpires_at; /* Time at which expires */ unsignedversion;/* Callback version */ - unsignedexpiry; /* Time at which expires */ afs_callback_type_t type; /* Type of callback */ }; diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index f758750e81d8..6105cdb17163 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -287,13 +287,19 @@ static void xdr_decode_AFSCallBack(struct afs_call *call, *_bp = bp; } -static void xdr_decode_AFSCallBack_raw(const __be32 **_bp, +static ktime_t xdr_decode_expiry(struct afs_call *call, u32 expiry) +{ + return ktime_add_ns(call->reply_time, expiry * NSEC_PER_SEC); +} + +static void xdr_decode_AFSCallBack_raw(struct afs_call *call, + const __be32 **_bp, struct afs_callback *cb) { const __be32 *bp = *_bp; cb->version = ntohl(*bp++); - cb->expiry = ntohl(*bp++); + cb->expires_at = xdr_decode_expiry(call, ntohl(*bp++)); cb->type= ntohl(*bp++); *_bp = bp; } @@ -440,6 +446,7 @@ int afs_fs_fetch_file_status(struct afs_fs_cursor *fc, struct afs_volsync *volsy call->reply[0] = vnode; call->reply[1] = volsync; call->expected_version = new_inode ? 
1 : vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -627,6 +634,7 @@ static int afs_fs_fetch_data64(struct afs_fs_cursor *fc, struct afs_read *req) call->reply[1] = NULL; /* volsync */ call->reply[2] = req; call->expected_version = vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -672,6 +680,7 @@ int afs_fs_fetch_data(struct afs_fs_cursor *fc, struct afs_read *req) call->reply[1] = NULL; /* volsync */ call->reply[2] = req; call->expected_version = vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -714,7 +723,7 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call) >expected_version, NULL); if (ret < 0) return ret; - xdr_decode_AFSCallBack_raw(, call->reply[3]); + xdr_decode_AFSCallBack_raw(call, , call->reply[3]); /* xdr_decode_AFSVolSync(, call->reply[X]); */ _leave(" = 0 [done]"); @@ -773,6 +782,7 @@ int afs_fs_create(struct afs_fs_cursor *fc, call->reply[2] = newstatus; call->reply[3] = newcb; call->expected_version = current_data_version + 1; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -2042,7 +2052,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call) >expected_version, NULL); if (ret < 0) return ret; - xdr_decode_AFSCallBack_raw(, callback); + xdr_decode_AFSCallBack_raw(call, , callback); if (volsync) xdr_decode_AFSVolSync(, volsync); @@ -2088,6 +2098,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc, call->reply[2] = callback; call->reply[3] = volsync; call->expected_version = 1; /* vnode->status.data_version */ + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -2188,7 +2199,7 @@ static int afs_deliver_fs_inline_bulk_status(struct afs_call *call) bp = call->buffer; callbacks = call->reply[2]; callbacks[call->count].version = ntohl(bp[0]); - 
callbacks[call->count].expiry = ntohl(bp[1]); + callbacks[call->count].expires_at = xdr_decode_expiry(call, ntohl(bp[1])); callbacks[call->count].type = ntohl(bp[2]); statuses = call->reply[1]; if (call->count == 0 && vnode && statuses[0].abort_code == 0) @@ -2261,6 +2272,7 @@ int afs_fs_inline_bulk_status(struct afs_fs_cursor *fc, call->reply[2] = callbacks; call->reply[3] = volsync; call->count2 = nr_fids; + call->want_reply_time = true; /*
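The key computation is xdr_decode_expiry() in the patch above: the server sends a callback lifetime in seconds, and the client anchors it to the locally recorded reply time of the RPC rather than to the wall clock at decode time. A simplified model (nanosecond int64 stands in for ktime_t):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Model of xdr_decode_expiry(): absolute expiry = the call's reply
 * time plus the server-supplied lifetime in seconds.  Anchoring to
 * the reply time keeps the expiry correct even if decoding is
 * delayed after the reply arrived. */
static int64_t decode_expiry_ns(int64_t reply_time_ns, uint32_t expiry_s)
{
	return reply_time_ns + (int64_t)expiry_s * NSEC_PER_SEC;
}
```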
[PATCH 20/25] afs: Expand data structure fields to support YFS [ver #2]
Expand fields in various data structures to support the expanded information that YFS is capable of returning. Signed-off-by: David Howells --- fs/afs/afs.h | 35 ++- fs/afs/fsclient.c |9 + 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index 417cd23529c5..d12ffb457e47 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -130,19 +130,18 @@ typedef u32 afs_access_t; struct afs_file_status { u64 size; /* file size */ afs_dataversion_t data_version; /* current data version */ - time_t mtime_client; /* last time client changed data */ - time_t mtime_server; /* last time server changed data */ - unsignedabort_code; /* Abort if bulk-fetching this failed */ - - afs_file_type_t type; /* file type */ - unsignednlink; /* link count */ - u32 author; /* author ID */ - u32 owner; /* owner ID */ - u32 group; /* group ID */ + struct timespec64 mtime_client; /* Last time client changed data */ + struct timespec64 mtime_server; /* Last time server changed data */ + s64 author; /* author ID */ + s64 owner; /* owner ID */ + s64 group; /* group ID */ afs_access_tcaller_access; /* access rights for authenticated caller */ afs_access_tanon_access;/* access rights for unauthenticated caller */ umode_t mode; /* UNIX mode */ + afs_file_type_t type; /* file type */ + u32 nlink; /* link count */ s32 lock_count; /* file lock count (0=UNLK -1=WRLCK +ve=#RDLCK */ + u32 abort_code; /* Abort if bulk-fetching this failed */ }; /* @@ -159,25 +158,27 @@ struct afs_file_status { * AFS volume synchronisation information */ struct afs_volsync { - time_t creation; /* volume creation time */ + time64_tcreation; /* volume creation time */ }; /* * AFS volume status record */ struct afs_volume_status { - u32 vid;/* volume ID */ - u32 parent_id; /* parent volume ID */ + afs_volid_t vid;/* volume ID */ + afs_volid_t parent_id; /* parent volume ID */ u8 online; /* true if volume currently online and available */ u8 in_service; /* true if volume currently in service */ u8 
blessed;/* same as in_service */ u8 needs_salvage; /* true if consistency checking required */ u32 type; /* volume type (afs_voltype_t) */ - u32 min_quota; /* minimum space set aside (blocks) */ - u32 max_quota; /* maximum space this volume may occupy (blocks) */ - u32 blocks_in_use; /* space this volume currently occupies (blocks) */ - u32 part_blocks_avail; /* space available in volume's partition */ - u32 part_max_blocks; /* size of volume's partition */ + u64 min_quota; /* minimum space set aside (blocks) */ + u64 max_quota; /* maximum space this volume may occupy (blocks) */ + u64 blocks_in_use; /* space this volume currently occupies (blocks) */ + u64 part_blocks_avail; /* space available in volume's partition */ + u64 part_max_blocks; /* size of volume's partition */ + s64 vol_copy_date; + s64 vol_backup_date; }; #define AFS_BLOCK_SIZE 1024 diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 6105cdb17163..2da65309e0de 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -69,8 +69,7 @@ void afs_update_inode_from_status(struct afs_vnode *vnode, struct timespec64 t; umode_t mode; - t.tv_sec = status->mtime_client; - t.tv_nsec = 0; + t = status->mtime_client; vnode->vfs_inode.i_ctime = t; vnode->vfs_inode.i_mtime = t; vnode->vfs_inode.i_atime = t; @@ -194,8 +193,10 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, EXTRACT_M(mode); EXTRACT_M(group); - status->mtime_client = ntohl(xdr->mtime_client); - status->mtime_server =
[PATCH 11/25] afs: Add a couple of tracepoints to log I/O errors [ver #2]
[PATCH 18/25] afs: Calc callback expiry in op reply delivery [ver #2]
Calculate the callback expiration time at the point of operation reply delivery, using the reply time queried from AF_RXRPC on that call as a base. Signed-off-by: David Howells --- fs/afs/afs.h |2 +- fs/afs/fsclient.c | 22 +- fs/afs/inode.c|4 ++-- fs/afs/internal.h |2 ++ fs/afs/rxrpc.c|6 ++ 5 files changed, 28 insertions(+), 8 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index fb9bcb8758ea..417cd23529c5 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -68,8 +68,8 @@ typedef enum { } afs_callback_type_t; struct afs_callback { + time64_texpires_at; /* Time at which expires */ unsignedversion;/* Callback version */ - unsignedexpiry; /* Time at which expires */ afs_callback_type_t type; /* Type of callback */ }; diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index f758750e81d8..6105cdb17163 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -287,13 +287,19 @@ static void xdr_decode_AFSCallBack(struct afs_call *call, *_bp = bp; } -static void xdr_decode_AFSCallBack_raw(const __be32 **_bp, +static ktime_t xdr_decode_expiry(struct afs_call *call, u32 expiry) +{ + return ktime_add_ns(call->reply_time, expiry * NSEC_PER_SEC); +} + +static void xdr_decode_AFSCallBack_raw(struct afs_call *call, + const __be32 **_bp, struct afs_callback *cb) { const __be32 *bp = *_bp; cb->version = ntohl(*bp++); - cb->expiry = ntohl(*bp++); + cb->expires_at = xdr_decode_expiry(call, ntohl(*bp++)); cb->type= ntohl(*bp++); *_bp = bp; } @@ -440,6 +446,7 @@ int afs_fs_fetch_file_status(struct afs_fs_cursor *fc, struct afs_volsync *volsy call->reply[0] = vnode; call->reply[1] = volsync; call->expected_version = new_inode ? 
1 : vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -627,6 +634,7 @@ static int afs_fs_fetch_data64(struct afs_fs_cursor *fc, struct afs_read *req) call->reply[1] = NULL; /* volsync */ call->reply[2] = req; call->expected_version = vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -672,6 +680,7 @@ int afs_fs_fetch_data(struct afs_fs_cursor *fc, struct afs_read *req) call->reply[1] = NULL; /* volsync */ call->reply[2] = req; call->expected_version = vnode->status.data_version; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -714,7 +723,7 @@ static int afs_deliver_fs_create_vnode(struct afs_call *call) >expected_version, NULL); if (ret < 0) return ret; - xdr_decode_AFSCallBack_raw(, call->reply[3]); + xdr_decode_AFSCallBack_raw(call, , call->reply[3]); /* xdr_decode_AFSVolSync(, call->reply[X]); */ _leave(" = 0 [done]"); @@ -773,6 +782,7 @@ int afs_fs_create(struct afs_fs_cursor *fc, call->reply[2] = newstatus; call->reply[3] = newcb; call->expected_version = current_data_version + 1; + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -2042,7 +2052,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call) >expected_version, NULL); if (ret < 0) return ret; - xdr_decode_AFSCallBack_raw(, callback); + xdr_decode_AFSCallBack_raw(call, , callback); if (volsync) xdr_decode_AFSVolSync(, volsync); @@ -2088,6 +2098,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc, call->reply[2] = callback; call->reply[3] = volsync; call->expected_version = 1; /* vnode->status.data_version */ + call->want_reply_time = true; /* marshall the parameters */ bp = call->request; @@ -2188,7 +2199,7 @@ static int afs_deliver_fs_inline_bulk_status(struct afs_call *call) bp = call->buffer; callbacks = call->reply[2]; callbacks[call->count].version = ntohl(bp[0]); - 
callbacks[call->count].expiry = ntohl(bp[1]); + callbacks[call->count].expires_at = xdr_decode_expiry(call, ntohl(bp[1])); callbacks[call->count].type = ntohl(bp[2]); statuses = call->reply[1]; if (call->count == 0 && vnode && statuses[0].abort_code == 0) @@ -2261,6 +2272,7 @@ int afs_fs_inline_bulk_status(struct afs_fs_cursor *fc, call->reply[2] = callbacks; call->reply[3] = volsync; call->count2 = nr_fids; + call->want_reply_time = true; /*
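The conversion this patch performs can be illustrated in isolation. Below is a minimal user-space sketch of what xdr_decode_expiry() does: the server sends the callback lifetime as seconds relative to the call, and the client anchors it to the reply timestamp recorded by AF_RXRPC. The names ktime_sketch_t and expiry_from_reply() are hypothetical; ktime_t is modelled here as a plain signed 64-bit nanosecond count.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Toy model of the kernel's ktime_t: signed nanoseconds. */
typedef int64_t ktime_sketch_t;

/* Anchor a relative expiry (seconds, as sent on the wire) to the
 * absolute reply timestamp, as xdr_decode_expiry() does via
 * ktime_add_ns(call->reply_time, expiry * NSEC_PER_SEC). */
static ktime_sketch_t expiry_from_reply(ktime_sketch_t reply_time_ns,
                                        uint32_t expiry_secs)
{
	return reply_time_ns + (int64_t)expiry_secs * NSEC_PER_SEC;
}
```

A one-hour callback granted on a reply received at t=5s expires at t=3605s; the point of the patch is that the base is the reply time, not whenever the client later gets around to decoding.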
[PATCH 20/25] afs: Expand data structure fields to support YFS [ver #2]
Expand fields in various data structures to support the expanded information that YFS is capable of returning. Signed-off-by: David Howells --- fs/afs/afs.h | 35 ++- fs/afs/fsclient.c |9 + 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index 417cd23529c5..d12ffb457e47 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -130,19 +130,18 @@ typedef u32 afs_access_t; struct afs_file_status { u64 size; /* file size */ afs_dataversion_t data_version; /* current data version */ - time_t mtime_client; /* last time client changed data */ - time_t mtime_server; /* last time server changed data */ - unsignedabort_code; /* Abort if bulk-fetching this failed */ - - afs_file_type_t type; /* file type */ - unsignednlink; /* link count */ - u32 author; /* author ID */ - u32 owner; /* owner ID */ - u32 group; /* group ID */ + struct timespec64 mtime_client; /* Last time client changed data */ + struct timespec64 mtime_server; /* Last time server changed data */ + s64 author; /* author ID */ + s64 owner; /* owner ID */ + s64 group; /* group ID */ afs_access_tcaller_access; /* access rights for authenticated caller */ afs_access_tanon_access;/* access rights for unauthenticated caller */ umode_t mode; /* UNIX mode */ + afs_file_type_t type; /* file type */ + u32 nlink; /* link count */ s32 lock_count; /* file lock count (0=UNLK -1=WRLCK +ve=#RDLCK */ + u32 abort_code; /* Abort if bulk-fetching this failed */ }; /* @@ -159,25 +158,27 @@ struct afs_file_status { * AFS volume synchronisation information */ struct afs_volsync { - time_t creation; /* volume creation time */ + time64_tcreation; /* volume creation time */ }; /* * AFS volume status record */ struct afs_volume_status { - u32 vid;/* volume ID */ - u32 parent_id; /* parent volume ID */ + afs_volid_t vid;/* volume ID */ + afs_volid_t parent_id; /* parent volume ID */ u8 online; /* true if volume currently online and available */ u8 in_service; /* true if volume currently in service */ u8 
blessed;/* same as in_service */ u8 needs_salvage; /* true if consistency checking required */ u32 type; /* volume type (afs_voltype_t) */ - u32 min_quota; /* minimum space set aside (blocks) */ - u32 max_quota; /* maximum space this volume may occupy (blocks) */ - u32 blocks_in_use; /* space this volume currently occupies (blocks) */ - u32 part_blocks_avail; /* space available in volume's partition */ - u32 part_max_blocks; /* size of volume's partition */ + u64 min_quota; /* minimum space set aside (blocks) */ + u64 max_quota; /* maximum space this volume may occupy (blocks) */ + u64 blocks_in_use; /* space this volume currently occupies (blocks) */ + u64 part_blocks_avail; /* space available in volume's partition */ + u64 part_max_blocks; /* size of volume's partition */ + s64 vol_copy_date; + s64 vol_backup_date; }; #define AFS_BLOCK_SIZE 1024 diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 6105cdb17163..2da65309e0de 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -69,8 +69,7 @@ void afs_update_inode_from_status(struct afs_vnode *vnode, struct timespec64 t; umode_t mode; - t.tv_sec = status->mtime_client; - t.tv_nsec = 0; + t = status->mtime_client; vnode->vfs_inode.i_ctime = t; vnode->vfs_inode.i_mtime = t; vnode->vfs_inode.i_atime = t; @@ -194,8 +193,10 @@ static int xdr_decode_AFSFetchStatus(struct afs_call *call, EXTRACT_M(mode); EXTRACT_M(group); - status->mtime_client = ntohl(xdr->mtime_client); - status->mtime_server =
[PATCH 11/25] afs: Add a couple of tracepoints to log I/O errors [ver #2]
Add a couple of tracepoints to log the production of I/O errors within the AFS filesystem. Signed-off-by: David Howells --- fs/afs/cmservice.c | 10 +++-- fs/afs/dir.c | 18 ++ fs/afs/internal.h | 11 ++ fs/afs/mntpt.c |5 ++- fs/afs/rxrpc.c |2 + fs/afs/server.c|2 + fs/afs/write.c |1 + include/trace/events/afs.h | 81 8 files changed, 114 insertions(+), 16 deletions(-) diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 4db62ae8dc1a..186f621f8722 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -260,7 +260,7 @@ static int afs_deliver_cb_callback(struct afs_call *call) } if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING)) - return -EIO; + return afs_io_error(call, afs_io_error_cm_reply); /* we'll need the file server record as that tells us which set of * vnodes to operate upon */ @@ -368,7 +368,7 @@ static int afs_deliver_cb_init_call_back_state3(struct afs_call *call) } if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING)) - return -EIO; + return afs_io_error(call, afs_io_error_cm_reply); /* we'll need the file server record as that tells us which set of * vnodes to operate upon */ @@ -409,7 +409,7 @@ static int afs_deliver_cb_probe(struct afs_call *call) return ret; if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING)) - return -EIO; + return afs_io_error(call, afs_io_error_cm_reply); return afs_queue_call_work(call); } @@ -490,7 +490,7 @@ static int afs_deliver_cb_probe_uuid(struct afs_call *call) } if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING)) - return -EIO; + return afs_io_error(call, afs_io_error_cm_reply); return afs_queue_call_work(call); } @@ -573,7 +573,7 @@ static int afs_deliver_cb_tell_me_about_yourself(struct afs_call *call) return ret; if (!afs_check_call_state(call, AFS_CALL_SV_REPLYING)) - return -EIO; + return afs_io_error(call, afs_io_error_cm_reply); return afs_queue_call_work(call); } diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 855bf2b79fed..78f9754fd03d 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -138,6 +138,7 
@@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, ntohs(dbuf->blocks[tmp].hdr.magic)); trace_afs_dir_check_failed(dvnode, off, i_size); kunmap(page); + trace_afs_file_error(dvnode, -EIO, afs_file_error_dir_bad_magic); goto error; } @@ -190,9 +191,11 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) retry: i_size = i_size_read(&dvnode->vfs_inode); if (i_size < 2048) - return ERR_PTR(-EIO); - if (i_size > 2048 * 1024) + return ERR_PTR(afs_bad(dvnode, afs_file_error_dir_small)); + if (i_size > 2048 * 1024) { + trace_afs_file_error(dvnode, -EFBIG, afs_file_error_dir_big); return ERR_PTR(-EFBIG); + } _enter("%llu", i_size); @@ -315,7 +318,8 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) /* * deal with one block in an AFS directory */ -static int afs_dir_iterate_block(struct dir_context *ctx, +static int afs_dir_iterate_block(struct afs_vnode *dvnode, +struct dir_context *ctx, union afs_xdr_dir_block *block, unsigned blkoff) { @@ -365,7 +369,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx, " (len %u/%zu)", blkoff / sizeof(union afs_xdr_dir_block), offset, next, tmp, nlen); - return -EIO; + return afs_bad(dvnode, afs_file_error_dir_over_end); } if (!(block->hdr.bitmap[next / 8] & (1 << (next % 8)))) { @@ -373,7 +377,7 @@ static int afs_dir_iterate_block(struct dir_context *ctx, " %u unmarked extension (len %u/%zu)", blkoff / sizeof(union afs_xdr_dir_block), offset, next, tmp, nlen); - return -EIO; + return afs_bad(dvnode, afs_file_error_dir_unmarked_ext); } _debug("ENT[%zu.%u]: ext %u/%zu", @@ -442,7 +446,7 @@ static int afs_dir_iterate(struct
[PATCH 09/25] afs: Fix TTL on VL server and address lists [ver #2]
Currently the TTL on VL server and address lists isn't set in all circumstances and may be set to poor choices in others, since the TTL is derived from the SRV/AFSDB DNS record if and when available. Fix the TTL by limiting the range to a minimum and maximum from the current time. At some point these can be made into sysctl knobs. Further, use the TTL we obtained from the upcall to set the expiry on negative results too; in future a mechanism can be added to force reloading of such data. Signed-off-by: David Howells --- fs/afs/cell.c | 26 ++ fs/afs/proc.c | 14 +++--- 2 files changed, 33 insertions(+), 7 deletions(-) diff --git a/fs/afs/cell.c b/fs/afs/cell.c index 963b6fa51fdf..cf445dbd5f2e 100644 --- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -20,6 +20,8 @@ #include "internal.h" static unsigned __read_mostly afs_cell_gc_delay = 10; +static unsigned __read_mostly afs_cell_min_ttl = 10 * 60; +static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60; static void afs_manage_cell(struct work_struct *); @@ -171,6 +173,8 @@ static struct afs_cell *afs_alloc_cell(struct afs_net *net, rcu_assign_pointer(cell->vl_servers, vllist); cell->dns_expiry = TIME64_MAX; + } else { + cell->dns_expiry = ktime_get_real_seconds(); } _leave(" = %p", cell); @@ -358,25 +362,39 @@ int afs_cell_init(struct afs_net *net, const char *rootcell) static void afs_update_cell(struct afs_cell *cell) { struct afs_vlserver_list *vllist, *old; - time64_t now, expiry; + unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl); + unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl); + time64_t now, expiry = 0; _enter("%s", cell->name); vllist = afs_dns_query(cell, &expiry); + + now = ktime_get_real_seconds(); + if (min_ttl > max_ttl) + max_ttl = min_ttl; + if (expiry < now + min_ttl) + expiry = now + min_ttl; + else if (expiry > now + max_ttl) + expiry = now + max_ttl; + if (IS_ERR(vllist)) { switch (PTR_ERR(vllist)) { case -ENODATA: - /* The DNS said that the cell does not exist */ + case -EDESTADDRREQ: + /* The DNS said that the cell does not exist or there +* weren't any addresses to be had. +*/ set_bit(AFS_CELL_FL_NOT_FOUND, &cell->flags); clear_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags); - cell->dns_expiry = ktime_get_real_seconds() + 61; + cell->dns_expiry = expiry; break; case -EAGAIN: case -ECONNREFUSED: default: set_bit(AFS_CELL_FL_DNS_FAIL, &cell->flags); - cell->dns_expiry = ktime_get_real_seconds() + 10; + cell->dns_expiry = now + 10; break; } diff --git a/fs/afs/proc.c b/fs/afs/proc.c index 6585f4bec0d3..fc36c41641ab 100644 --- a/fs/afs/proc.c +++ b/fs/afs/proc.c @@ -37,16 +37,24 @@ static inline struct afs_net *afs_seq2net_single(struct seq_file *m) */ static int afs_proc_cells_show(struct seq_file *m, void *v) { - struct afs_cell *cell = list_entry(v, struct afs_cell, proc_link); + struct afs_vlserver_list *vllist; + struct afs_cell *cell; if (v == SEQ_START_TOKEN) { /* display header on line 1 */ - seq_puts(m, "USE NAME\n"); + seq_puts(m, "USE    TTL SV NAME\n"); return 0; } + cell = list_entry(v, struct afs_cell, proc_link); + vllist = rcu_dereference(cell->vl_servers); + /* display one cell per line on subsequent lines */ - seq_printf(m, "%3u %s\n", atomic_read(&cell->usage), cell->name); + seq_printf(m, "%3u %6lld %2u %s\n", + atomic_read(&cell->usage), + cell->dns_expiry - ktime_get_real_seconds(), + vllist ? vllist->nr_servers : 0, + cell->name); return 0; }
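The clamping rule the patch introduces in afs_update_cell() can be checked on its own. Below is a user-space sketch of that logic (the function name clamp_dns_expiry and the time64_sketch_t alias are hypothetical): whatever TTL the DNS upcall reports, the expiry is forced into the window [now + min_ttl, now + max_ttl], with min_ttl winning if the two knobs conflict.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_sketch_t;

/* Mirror of the expiry-clamping sequence added to afs_update_cell():
 * a zero (or otherwise too-small) upcall TTL is raised to now + min_ttl,
 * an excessive one is capped at now + max_ttl. */
static time64_sketch_t clamp_dns_expiry(time64_sketch_t expiry,
                                        time64_sketch_t now,
                                        unsigned int min_ttl,
                                        unsigned int max_ttl)
{
	if (min_ttl > max_ttl)
		max_ttl = min_ttl;	/* min_ttl takes precedence */
	if (expiry < now + min_ttl)
		expiry = now + min_ttl;
	else if (expiry > now + max_ttl)
		expiry = now + max_ttl;
	return expiry;
}
```

With the defaults in the patch (min 10 minutes, max 24 hours), a missing TTL (expiry still 0) becomes now + 600, so negative results also get a sane refresh time.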
[PATCH 17/25] afs: Fix FS.FetchStatus delivery from updating wrong vnode [ver #2]
The FS.FetchStatus reply delivery function was updating the inode of the directory in which a lookup had been done with the status of the looked-up file. This corrupts some of the directory state. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells --- fs/afs/fsclient.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 5e3027f21390..f758750e81d8 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -2026,7 +2026,7 @@ static int afs_deliver_fs_fetch_status(struct afs_call *call) struct afs_file_status *status = call->reply[1]; struct afs_callback *callback = call->reply[2]; struct afs_volsync *volsync = call->reply[3]; - struct afs_vnode *vnode = call->reply[0]; + struct afs_fid *fid = call->reply[0]; const __be32 *bp; int ret; @@ -2034,21 +2034,15 @@ if (ret < 0) return ret; - _enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode); + _enter("{%llx:%llu}", fid->vid, fid->vnode); /* unmarshall the reply once we've received all of it */ bp = call->buffer; - ret = afs_decode_status(call, &bp, status, vnode, + ret = afs_decode_status(call, &bp, status, NULL, &call->expected_version, NULL); if (ret < 0) return ret; - callback[call->count].version = ntohl(bp[0]); - callback[call->count].expiry = ntohl(bp[1]); - callback[call->count].type = ntohl(bp[2]); - if (vnode) - xdr_decode_AFSCallBack(call, vnode, &bp); - else - bp += 3; + xdr_decode_AFSCallBack_raw(&bp, callback); if (volsync) xdr_decode_AFSVolSync(&bp, volsync); @@ -2089,7 +2083,7 @@ int afs_fs_fetch_status(struct afs_fs_cursor *fc, } call->key = fc->key; - call->reply[0] = NULL; /* vnode for fid[0] */ + call->reply[0] = fid; call->reply[1] = status; call->reply[2] = callback; call->reply[3] = volsync;
[PATCH 13/25] afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS [ver #2]
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode number equivalent) to 96 bits to allow the support of YFS. This requires the iget comparator to check the vnode->fid rather than i_ino and i_generation as i_ino is not sufficiently capacious. It also requires this data to be placed into the vnode cache key for fscache. For the moment, just discard the top 32 bits of the vnode ID when returning it though stat. Signed-off-by: David Howells --- fs/afs/afs.h | 11 ++- fs/afs/cache.c |2 +- fs/afs/callback.c |2 +- fs/afs/dir.c | 24 fs/afs/dynroot.c |2 +- fs/afs/file.c |8 fs/afs/flock.c | 22 +++--- fs/afs/fsclient.c | 24 fs/afs/inode.c | 31 +-- fs/afs/proc.c |2 +- fs/afs/rotate.c|2 +- fs/afs/security.c |6 +++--- fs/afs/super.c |5 +++-- fs/afs/volume.c|2 +- fs/afs/write.c | 18 +- fs/afs/xattr.c |2 +- include/trace/events/afs.h |4 ++-- 17 files changed, 86 insertions(+), 81 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index b4ff1f7ae4ab..c23b31b742fa 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -23,9 +23,9 @@ #define AFSPATHMAX 1024/* Maximum length of a pathname plus NUL */ #define AFSOPAQUEMAX 1024/* Maximum length of an opaque field */ -typedef unsigned afs_volid_t; -typedef unsigned afs_vnodeid_t; -typedef unsigned long long afs_dataversion_t; +typedef u64afs_volid_t; +typedef u64afs_vnodeid_t; +typedef u64afs_dataversion_t; typedef enum { AFSVL_RWVOL,/* read/write volume */ @@ -52,8 +52,9 @@ typedef enum { */ struct afs_fid { afs_volid_t vid;/* volume ID */ - afs_vnodeid_t vnode; /* file index within volume */ - unsignedunique; /* unique ID number (file index version) */ + afs_vnodeid_t vnode; /* Lower 64-bits of file index within volume */ + u32 vnode_hi; /* Upper 32-bits of file index */ + u32 unique; /* unique ID number (file index version) */ }; /* diff --git a/fs/afs/cache.c b/fs/afs/cache.c index b1c31ec4523a..f6d0a21e8052 100644 --- a/fs/afs/cache.c +++ b/fs/afs/cache.c @@ -49,7 +49,7 @@ static enum fscache_checkaux 
afs_vnode_cache_check_aux(void *cookie_netfs_data, struct afs_vnode *vnode = cookie_netfs_data; struct afs_vnode_cache_aux aux; - _enter("{%x,%x,%llx},%p,%u", + _enter("{%llx,%x,%llx},%p,%u", vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version, buffer, buflen); diff --git a/fs/afs/callback.c b/fs/afs/callback.c index 5f261fbf2182..8698198ad427 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -310,7 +310,7 @@ void afs_break_callbacks(struct afs_server *server, size_t count, /* TODO: Sort the callback break list by volume ID */ for (; count > 0; callbacks++, count--) { - _debug("- Fid { vl=%08x n=%u u=%u } CB { v=%u x=%u t=%u }", + _debug("- Fid { vl=%08llx n=%llu u=%u } CB { v=%u x=%u t=%u }", callbacks->fid.vid, callbacks->fid.vnode, callbacks->fid.unique, diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 78f9754fd03d..024b7cf7441c 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -552,7 +552,7 @@ static int afs_do_lookup_one(struct inode *dir, struct dentry *dentry, } *fid = cookie.fid; - _leave(" = 0 { vn=%u u=%u }", fid->vnode, fid->unique); + _leave(" = 0 { vn=%llu u=%u }", fid->vnode, fid->unique); return 0; } @@ -830,7 +830,7 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry, struct key *key; int ret; - _enter("{%x:%u},%p{%pd},", + _enter("{%llx:%llu},%p{%pd},", dvnode->fid.vid, dvnode->fid.vnode, dentry, dentry); ASSERTCMP(d_inode(dentry), ==, NULL); @@ -900,7 +900,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags) if (d_really_is_positive(dentry)) { vnode = AFS_FS_I(d_inode(dentry)); - _enter("{v={%x:%u} n=%pd fl=%lx},", + _enter("{v={%llx:%llu} n=%pd fl=%lx},", vnode->fid.vid, vnode->fid.vnode, dentry, vnode->flags); } else { @@ -969,7 +969,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags) /* if the vnode ID has changed, then the dirent points to a
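The fid layout this patch introduces, and the "discard the top 32 bits for stat" behaviour the cover note mentions, can be sketched in user space. The struct and helper names below (afs_fid_sketch, sketch_ino) are hypothetical stand-ins for the kernel's struct afs_fid.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the expanded fid from this patch: a 96-bit vnode ID split
 * into a 64-bit low part and a 32-bit high part, plus 64-bit volume ID. */
struct afs_fid_sketch {
	uint64_t vid;      /* volume ID, now 64 bits */
	uint64_t vnode;    /* lower 64 bits of the 96-bit file index */
	uint32_t vnode_hi; /* upper 32 bits of the file index */
	uint32_t unique;   /* uniquifier (file index version) */
};

/* For stat, the cover note says the top 32 bits are discarded for now:
 * the inode number is just the 64-bit low part of the vnode ID. */
static uint64_t sketch_ino(const struct afs_fid_sketch *fid)
{
	return fid->vnode;	/* vnode_hi intentionally dropped */
}
```

This also shows why the iget comparator must check the whole fid rather than i_ino/i_generation: two fids differing only in vnode_hi would map to the same inode number.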
[PATCH v3 2/4] nds32: Perf porting
This commit ports perf to nds32.

1. Raw events:

   The raw events start with 'r'.
   Usage: perf stat -e rXYZ ./app
   X: the index of the performance counter.
   YZ: the index (in hexadecimal) of the event.
   Example: 'perf stat -e r101 ./app' means counter 1 will count the
   instruction event. The counter and event indices can be found in the
   "Andes System Privilege Architecture Version 3 Manual", or you can run
   'perf list' to see the symbolic names of the raw events.

2. Perf mmap2:

   Fix an unexpected perf mmap2() page fault. When mmap2() is called by a
   perf application, it fails with "failed to write." and a return value
   of -EFAULT. The page fault is caused by *reading* the buffer from the
   mapped, legal address region in order to write it to the descriptor,
   yet the page-fault handler returns VM_FAULT_SIGBUS, which should not
   happen for a read request. See kernel/events/core.c:perf_mmap_fault():
   if "(vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))" evaluates true,
   it returns VM_FAULT_SIGBUS. However, this is not a write request; the
   flags indicating why the page fault happened are wrong. Furthermore,
   NDS32 SPAv3 cannot tell a read from a write; it only knows whether an
   access was an instruction fetch or a data access. Removing the wrong
   flag assignment (the hardware cannot report the real reason) fixes
   this bug.

3. Multiple events mapping to the same counter:

   When multiple events map to the same counter, the counter counts
   inaccurately: each counter can only count one event at a time, so the
   events have to take turns in each context. There are two possible
   solutions:

   1. Print an error message when multiple events map to the same
      counter. But printing the error message makes the program hang in a
      loop, and LTP (the Linux Test Project) fails when that happens.
   2. Don't print the error message; LTP then passes, but the user needs
      to know not to count events that map to the same counter, or the
      results will be inaccurate.

   We chose method 2.

Signed-off-by: Nickhu
---
 arch/nds32/Kconfig                           |    1 +
 arch/nds32/boot/dts/ae3xx.dts                |    5 +
 arch/nds32/include/asm/Kbuild                |    1 +
 arch/nds32/include/asm/perf_event.h          |   16 +
 arch/nds32/include/asm/pmu.h                 |  386 ++
 arch/nds32/include/asm/stacktrace.h          |   39 +
 arch/nds32/kernel/Makefile                   |    3 +-
 arch/nds32/kernel/perf_event_cpu.c           | 1223 +
 arch/nds32/mm/fault.c                        |   13 +-
 tools/include/asm/barrier.h                  |    2 +
 tools/perf/arch/nds32/Build                  |    1 +
 tools/perf/arch/nds32/util/Build             |    1 +
 tools/perf/arch/nds32/util/header.c          |   29 +
 tools/perf/pmu-events/arch/nds32/mapfile.csv |   15 +
 .../pmu-events/arch/nds32/n13/atcpmu.json    |  290
 15 files changed, 2019 insertions(+), 6 deletions(-)
 create mode 100644 arch/nds32/include/asm/perf_event.h
 create mode 100644 arch/nds32/include/asm/pmu.h
 create mode 100644 arch/nds32/include/asm/stacktrace.h
 create mode 100644 arch/nds32/kernel/perf_event_cpu.c
 create mode 100644 tools/perf/arch/nds32/Build
 create mode 100644 tools/perf/arch/nds32/util/Build
 create mode 100644 tools/perf/arch/nds32/util/header.c
 create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv
 create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 7068f341133d..dd448d431f5a 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -31,6 +31,7 @@ config NDS32
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_MEMBLOCK
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HAVE_PERF_EVENTS
 	select IRQ_DOMAIN
 	select LOCKDEP_SUPPORT
 	select MODULES_USE_ELF_RELA
diff --git a/arch/nds32/boot/dts/ae3xx.dts b/arch/nds32/boot/dts/ae3xx.dts
index bb39749a6673..16a9f54a805e 100644
--- a/arch/nds32/boot/dts/ae3xx.dts
+++ b/arch/nds32/boot/dts/ae3xx.dts
@@ -82,4 +82,9 @@
 		interrupts = <18>;
 	};
 };
+
+	pmu {
+		compatible = 
"andestech,nds32v3-pmu"; + interrupts= <13>; + }; }; diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild index
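The rXYZ raw-event convention described in the commit message can be sketched with a small helper (the function names here are ours for illustration, not part of perf):

```python
def raw_event(counter: int, event: int) -> str:
    """Build a perf raw-event string: 'r', then the counter index,
    then the event index in hexadecimal.

    Per the commit message, counter 1 with event 0x01 (the instruction
    event) gives 'r101', as in 'perf stat -e r101 ./app'.
    """
    return "r%d%02x" % (counter, event)

def parse_raw_event(spec: str):
    """Split an rXYZ spec back into (counter index, event index)."""
    assert spec.startswith("r")
    return int(spec[1]), int(spec[2:], 16)

print(raw_event(1, 0x01))
print(parse_raw_event("r101"))
```

The actual counter/event index values come from the "Andes System Privilege Architecture Version 3 Manual" or `perf list`; this snippet only shows how the string is assembled.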
[PATCH 15/25] afs: Remove callback details from afs_callback_break struct [ver #2]
Remove unnecessary details of a broken callback, such as version, expiry and type, from the afs_callback_break struct, as they're not actually used and make the list take more memory.

Signed-off-by: David Howells
---
 fs/afs/afs.h       |    2 +-
 fs/afs/callback.c  |    8 ++--
 fs/afs/cmservice.c |   17 +
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index c23b31b742fa..fb9bcb8758ea 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -75,7 +75,7 @@ struct afs_callback {
 struct afs_callback_break {
 	struct afs_fid		fid;		/* File identifier */
-	struct afs_callback	cb;		/* Callback details */
+	//struct afs_callback	cb;		/* Callback details */
 };
 
 #define AFSCBMAX 50	/* maximum callbacks transferred per bulk op */
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 8698198ad427..df9bfee698ad 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -310,14 +310,10 @@ void afs_break_callbacks(struct afs_server *server, size_t count,
 	/* TODO: Sort the callback break list by volume ID */
 
 	for (; count > 0; callbacks++, count--) {
-		_debug("- Fid { vl=%08llx n=%llu u=%u } CB { v=%u x=%u t=%u }",
+		_debug("- Fid { vl=%08llx n=%llu u=%u }",
 		       callbacks->fid.vid,
 		       callbacks->fid.vnode,
-		       callbacks->fid.unique,
-		       callbacks->cb.version,
-		       callbacks->cb.expiry,
-		       callbacks->cb.type
-		       );
+		       callbacks->fid.unique);
 		afs_break_one_callback(server, &callbacks->fid);
 	}
 
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 186f621f8722..fc0010d800a0 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -218,7 +218,6 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 			cb->fid.vid	= ntohl(*bp++);
 			cb->fid.vnode	= ntohl(*bp++);
 			cb->fid.unique	= ntohl(*bp++);
-			cb->cb.type	= AFSCM_CB_UNTYPED;
 		}
 
 		afs_extract_to_tmp(call);
@@ -236,24 +235,18 @@ static int afs_deliver_cb_callback(struct afs_call *call)
 		if (call->count2 != call->count && call->count2 != 0)
 			return afs_protocol_error(call, -EBADMSG,
 						  afs_eproto_cb_count);
-		afs_extract_to_buf(call, call->count2 * 3 * 4);
+		call->_iter = &call->iter;
+		iov_iter_discard(&call->iter, READ, call->count2 * 3 * 4);
 		call->unmarshall++;
 
 	case 4:
-		_debug("extract CB array");
+		_debug("extract discard %zu/%u",
+		       iov_iter_count(&call->iter), call->count2 * 3 * 4);
+
 		ret = afs_extract_data(call, false);
 		if (ret < 0)
 			return ret;
 
-		_debug("unmarshall CB array");
-		cb = call->request;
-		bp = call->buffer;
-		for (loop = call->count2; loop > 0; loop--, cb++) {
-			cb->cb.version	= ntohl(*bp++);
-			cb->cb.expiry	= ntohl(*bp++);
-			cb->cb.type	= ntohl(*bp++);
-		}
-
 		call->unmarshall++;
 	case 5:
		break;
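The change in the cmservice.c hunk above replaces buffering-and-unmarshalling with a plain discard of the unused bytes. A user-space analog (hypothetical sketch, not kernel code) of skipping count2 entries of 3 x 4-byte words instead of parsing them:

```python
import io

ENTRY_WORDS = 3  # version, expiry, type
WORD_SIZE = 4    # each field is one 32-bit word on the wire

def discard_cb_details(stream: io.BufferedIOBase, count2: int) -> None:
    """Advance past count2 callback-detail entries without reading them,
    playing the role that iov_iter_discard() plays in the patch."""
    stream.seek(count2 * ENTRY_WORDS * WORD_SIZE, io.SEEK_CUR)

# Two zeroed detail entries followed by a trailing marker byte.
payload = io.BytesIO(bytes(2 * ENTRY_WORDS * WORD_SIZE) + b"\xff")
discard_cb_details(payload, 2)
print(payload.read())  # only the trailing marker remains
```

The point of the kernel change is the same: since the version/expiry/type fields are never used, there is no reason to copy them into a buffer at all.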
[RFC v2 10/14] kunit: add Python libraries for handling KUnit config and kernel
The ultimate goal is to create minimal isolated test binaries; in the meantime we are using UML to provide the infrastructure to run tests, so define an abstract way to configure and run tests that allows us to change the context in which tests are built without affecting the user. This also makes pretty and dynamic error reporting, and a lot of other nice features, easier.

kunit_config.py:
- parses .config and Kconfig files.

kunit_kernel.py: provides helper functions to:
- configure the kernel using kunitconfig.
- build the kernel with the appropriate configuration.
- invoke the kernel and stream the output back.

Signed-off-by: Felix Guo
Signed-off-by: Brendan Higgins
---
 tools/testing/kunit/.gitignore      |   3 +
 tools/testing/kunit/kunit_config.py |  60 ++
 tools/testing/kunit/kunit_kernel.py | 123
 3 files changed, 186 insertions(+)
 create mode 100644 tools/testing/kunit/.gitignore
 create mode 100644 tools/testing/kunit/kunit_config.py
 create mode 100644 tools/testing/kunit/kunit_kernel.py

diff --git a/tools/testing/kunit/.gitignore b/tools/testing/kunit/.gitignore
new file mode 100644
index 0..c791ff59a37a9
--- /dev/null
+++ b/tools/testing/kunit/.gitignore
@@ -0,0 +1,3 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
\ No newline at end of file
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
new file mode 100644
index 0..183bd5e758762
--- /dev/null
+++ b/tools/testing/kunit/kunit_config.py
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import collections
+import re
+
+CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
+CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'
+
+KconfigEntryBase = collections.namedtuple('KconfigEntry', ['raw_entry'])
+
+
+class KconfigEntry(KconfigEntryBase):
+
+	def __str__(self) -> str:
+		return self.raw_entry
+
+
+class KconfigParseError(Exception):
+	"""Error parsing Kconfig defconfig or .config."""
+
+
+class Kconfig(object):
+	"""Represents defconfig or .config specified using the Kconfig language."""
+
+	def __init__(self):
+		self._entries = []
+
+	def entries(self):
+		return set(self._entries)
+
+	def add_entry(self, entry: KconfigEntry) -> None:
+		self._entries.append(entry)
+
+	def is_subset_of(self, other: "Kconfig") -> bool:
+		return self.entries().issubset(other.entries())
+
+	def write_to_file(self, path: str) -> None:
+		with open(path, 'w') as f:
+			for entry in self.entries():
+				f.write(str(entry) + '\n')
+
+	def parse_from_string(self, blob: str) -> None:
+		"""Parses a string containing KconfigEntrys and populates this Kconfig."""
+		self._entries = []
+		is_not_set_matcher = re.compile(CONFIG_IS_NOT_SET_PATTERN)
+		config_matcher = re.compile(CONFIG_PATTERN)
+		for line in blob.split('\n'):
+			line = line.strip()
+			if not line:
+				continue
+			elif config_matcher.match(line) or is_not_set_matcher.match(line):
+				self._entries.append(KconfigEntry(line))
+			elif line[0] == '#':
+				continue
+			else:
+				raise KconfigParseError('Failed to parse: ' + line)
+
+	def read_from_file(self, path: str) -> None:
+		with open(path, 'r') as f:
+			self.parse_from_string(f.read())
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
new file mode 100644
index 0..87abaede50513
--- /dev/null
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -0,0 +1,123 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import logging
+import subprocess
+import os
+
+import kunit_config
+
+KCONFIG_PATH = '.config'
+
+class ConfigError(Exception):
+	"""Represents an error trying to configure the Linux kernel."""
+
+
+class BuildError(Exception):
+	"""Represents an error trying to build the Linux kernel."""
+
+
+class LinuxSourceTreeOperations(object):
+	"""An abstraction over command line operations performed on a source tree."""
+
+	def make_mrproper(self):
+		try:
+			subprocess.check_output(['make', 'mrproper'])
+		except OSError as e:
+			raise ConfigError('Could not call make command: ' + e)
+		except subprocess.CalledProcessError as e:
+			raise ConfigError(e.output)
+
+	def make_olddefconfig(self):
+		try:
+			subprocess.check_output(['make', 'ARCH=um', 'olddefconfig'])
+		except OSError as e:
+
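A quick standalone sketch of the parsing approach used by kunit_config.py above (the module itself is only added by this patch, so the logic is reproduced inline here): valid `CONFIG_...=...` and `# CONFIG_... is not set` lines are collected, other comments and blanks are skipped, and anything else is rejected; the subset check then tells whether a built .config satisfies a kunitconfig.

```python
import re

CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_\w+ is not set$'
CONFIG_PATTERN = r'^CONFIG_\w+=\S+$'

def parse_config(blob: str) -> set:
    """Mirror Kconfig.parse_from_string(): keep valid entries, skip
    blanks and comments, raise on anything unrecognized."""
    entries = set()
    for line in blob.split('\n'):
        line = line.strip()
        if not line:
            continue
        if re.match(CONFIG_PATTERN, line) or re.match(CONFIG_IS_NOT_SET_PATTERN, line):
            entries.add(line)
        elif line.startswith('#'):
            continue
        else:
            raise ValueError('Failed to parse: ' + line)
    return entries

kunitconfig = parse_config("CONFIG_KUNIT=y\n# CONFIG_MMU is not set\n# just a comment\n")
dotconfig = parse_config("CONFIG_KUNIT=y\nCONFIG_PRINTK=y\n# CONFIG_MMU is not set\n")
print(kunitconfig.issubset(dotconfig))  # the .config satisfies the kunitconfig
```

This is the same check `Kconfig.is_subset_of()` performs when deciding whether the kernel needs to be reconfigured.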
[PATCH v3 4/4] nds32: Add document for NDS32 PMU.
Add a document describing how to specify the NDS32 PMU in the devicetree.

Signed-off-by: Nickhu
---
 Documentation/devicetree/bindings/nds32/pmu.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nds32/pmu.txt

diff --git a/Documentation/devicetree/bindings/nds32/pmu.txt b/Documentation/devicetree/bindings/nds32/pmu.txt
new file mode 100644
index ..1bd15785b4ae
--- /dev/null
+++ b/Documentation/devicetree/bindings/nds32/pmu.txt
@@ -0,0 +1,17 @@
+* NDS32 Performance Monitor Units
+
+The NDS32 core has a PMU for counting CPU and cache events, such as cache misses.
+The NDS32 PMU representation in the device tree should be done as follows:
+
+Required properties:
+
+- compatible :
+	"andestech,nds32v3-pmu"
+
+- interrupts : The interrupt number for the NDS32 PMU is 13.
+
+Example:
+pmu {
+	compatible = "andestech,nds32v3-pmu";
+	interrupts = <13>;
+};
-- 
2.17.0
[PATCH v3 3/4] nds32: Add perf call-graph support.
The perf call-graph option can trace the call chain between functions. This commit adds perf callchain support for nds32. There are kernel callchains and user callchains. The kernel callchain traces functions in kernel space. There are two types of user callchain: one for when the 'optimize for size' config is set, and one for when it is not. The difference between the two is the position of the frame pointer in the user stack. For example:

With optimize for size:
User Stack:
	| lp |
	| gp |
	| fp |

Without optimize for size:
User Stack:
1. non-leaf function:
	| lp |
	| fp |
2. leaf function:
	| fp |

Signed-off-by: Nickhu
---
 arch/nds32/kernel/perf_event_cpu.c | 299 +
 1 file changed, 299 insertions(+)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a6e723d0fdbc..5e00ce54d0ff 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1193,6 +1193,305 @@ static int __init register_pmu_driver(void)
 device_initcall(register_pmu_driver);
 
+/*
+ * References: arch/nds32/kernel/traps.c:__dump()
+ * You will need to know the NDS ABI first.
+ */
+static int unwind_frame_kernel(struct stackframe *frame)
+{
+	int graph = 0;
+#ifdef CONFIG_FRAME_POINTER
+	/* 0x3 means misalignment */
+	if (!kstack_end((void *)frame->fp) &&
+	    !((unsigned long)frame->fp & 0x3) &&
+	    ((unsigned long)frame->fp >= TASK_SIZE)) {
+		/*
+		 * The array index is based on the ABI; the graph below
+		 * illustrates the reasons.
+		 * Function call procedure: "smw" and "lmw" will always
+		 * update SP and FP for you automatically.
+		 *
+		 *	Stack					Relative Address
+		 *	|  |					 0
+		 *	|LP| <-- SP(before smw) <-- FP(after smw)	-1
+		 *	|FP|					-2
+		 *	|  | <-- SP(after smw)			-3
+		 */
+		frame->lp = ((unsigned long *)frame->fp)[-1];
+		frame->fp = ((unsigned long *)frame->fp)[FP_OFFSET];
+		/* make sure CONFIG_FUNCTION_GRAPH_TRACER is turned on */
+		if (__kernel_text_address(frame->lp))
+			frame->lp = ftrace_graph_ret_addr
+					(NULL, &graph, frame->lp, NULL);
+
+		return 0;
+	} else {
+		return -EPERM;
+	}
+#else
+	/*
+	 * You can refer to arch/nds32/kernel/traps.c:__dump()
+	 * Treat "sp" as "fp", but the "sp" is one frame ahead of "fp".
+	 * And, the "sp" is not always correct.
+	 *
+	 *	Stack					Relative Address
+	 *	|  |					 0
+	 *	|LP| <-- SP(before smw)			-1
+	 *	|  | <-- SP(after smw)			-2
+	 */
+	if (!kstack_end((void *)frame->sp)) {
+		frame->lp = ((unsigned long *)frame->sp)[1];
+		/* TODO: How to deal with the case where the value in the
+		 * first "sp" is not correct?
+		 */
+		if (__kernel_text_address(frame->lp))
+			frame->lp = ftrace_graph_ret_addr
+					(tsk, &graph, frame->lp, NULL);
+
+		frame->sp = ((unsigned long *)frame->sp) + 1;
+
+		return 0;
+	} else {
+		return -EPERM;
+	}
+#endif
+}
+
+static void notrace
+walk_stackframe(struct stackframe *frame,
+		int (*fn_record)(struct stackframe *, void *),
+		void *data)
+{
+	while (1) {
+		int ret;
+
+		if (fn_record(frame, data))
+			break;
+
+		ret = unwind_frame_kernel(frame);
+		if (ret < 0)
+			break;
+	}
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whilst unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int callchain_trace(struct stackframe *fr, void *data)
+{
+	struct
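The frame-pointer walk in unwind_frame_kernel() above can be modeled in user space (a hypothetical sketch, not the kernel code): memory is a word array, each frame keeps the return address LP at word fp-1 and the saved frame pointer at word fp-2, matching the stack layout in the patch's comment.

```python
def walk_frames(mem, fp):
    """Walk a frame-pointer chain: per the layout in the patch,
    LP (the return address) lives at word fp-1 and the caller's
    saved FP at word fp-2. fp == 0 terminates the chain, standing
    in for the kernel's kstack_end() check."""
    chain = []
    while fp != 0:
        chain.append(mem[fp - 1])   # recorded return address
        fp = mem[fp - 2]            # follow the saved frame pointer
    return chain

# Three nested frames; the return addresses are made-up values.
mem = {9: 0x111, 8: 0,    # outermost frame (fp = 10), chain ends
       7: 0x222, 6: 10,   # middle frame    (fp = 8)
       5: 0x333, 4: 8}    # innermost frame (fp = 6)
print([hex(a) for a in walk_frames(mem, 6)])
```

This is the essence of what walk_stackframe() does: record the current LP via the callback, then step to the previous frame until the stack ends or the frame pointer fails validation.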
[PATCH v3 1/4] nds32: Fix bug in bitfield.h
There are two bitfield bugs for the performance counters in bitfield.h:

  PFM_CTL_offSEL1  21 --> 16
  PFM_CTL_offSEL2  27 --> 22

This commit fixes them.

Signed-off-by: Nickhu
---
 arch/nds32/include/asm/bitfield.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/nds32/include/asm/bitfield.h b/arch/nds32/include/asm/bitfield.h
index 8e84fc385b94..19b2841219ad 100644
--- a/arch/nds32/include/asm/bitfield.h
+++ b/arch/nds32/include/asm/bitfield.h
@@ -692,8 +692,8 @@
 #define PFM_CTL_offKU1		13	/* Enable user mode event counting for PFMC1 */
 #define PFM_CTL_offKU2		14	/* Enable user mode event counting for PFMC2 */
 #define PFM_CTL_offSEL0		15	/* The event selection for PFMC0 */
-#define PFM_CTL_offSEL1		21	/* The event selection for PFMC1 */
-#define PFM_CTL_offSEL2		27	/* The event selection for PFMC2 */
+#define PFM_CTL_offSEL1		16	/* The event selection for PFMC1 */
+#define PFM_CTL_offSEL2		22	/* The event selection for PFMC2 */
 /* bit 28:31 reserved */
 
 #define PFM_CTL_mskEN0		( 0x01 << PFM_CTL_offEN0 )
-- 
2.17.0
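A quick consistency check of the corrected offsets. Assuming SEL1 and SEL2 are 6-bit event-selection fields (that width is our reading of the SPAv3 layout, not stated in the patch), offsets 16 and 22 make SEL2 end at bit 27, which agrees with the "bit 28:31 reserved" comment right below, whereas the old offsets 21/27 would push SEL2 into the reserved bits:

```python
SEL_WIDTH = 6  # assumed width of the SEL1/SEL2 event-selection fields

def field_bits(off, width=SEL_WIDTH):
    """Return the (low_bit, high_bit) range occupied by a field
    placed at bit offset `off`."""
    return off, off + width - 1

# Corrected offsets from the patch.
PFM_CTL_offSEL1, PFM_CTL_offSEL2 = 16, 22
print(field_bits(PFM_CTL_offSEL1))   # SEL1 occupies bits 16..21
print(field_bits(PFM_CTL_offSEL2))   # SEL2 occupies bits 22..27

# The old (buggy) SEL2 offset of 27 would spill past bit 27
# into the range the header marks as reserved.
print(field_bits(27)[1] > 27)
```

Under that width assumption, the corrected values also pack SEL1 directly after the 1-bit SEL0 at offset 15 with no gap.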