Re: net/dst_cache.c: preemption bug in net/dst_cache.c

2019-09-18 Thread Bharath Vedartham
On Mon, Sep 09, 2019 at 05:48:25PM +0800, Xin Long wrote:
> On Fri, Aug 23, 2019 at 3:58 PM Bharath Vedartham wrote:
> >
> > Hi all,
> >
> > I just want to bring attention to the syzbot bug [1]
> >
> > Even though syzbot claims the bug to be in net/tipc, I feel it is in
> > net/dst_cache.c. Please correct me if I am wrong.
> >
> > This bug is being triggered a lot of times by syzbot since the day it
> > was reported. Also given that this is core networking code, I felt it
> > was important to bring this to attention.
> >
> > It looks like preemption needs to be disabled before using this_cpu_ptr,
> > or maybe we would be better off using a get_cpu_var and put_cpu_var combo
> > here.
> b->media->send_msg (tipc_udp_send_msg)
> -> tipc_udp_xmit() -> dst_cache_get()
> 
> send_msg() is always called under the protection of rcu_read_lock(), which
> already disabled preemption. If not, there must be some unbalanced calls of
> disable/enable preemption elsewhere.
> 
> Agree that this could be a serious issue, do you have any reproducer for this?
> 
> Thanks.
Hi Xin,

Sorry for the delayed response. I do not have a reproducer for this. You
can, however, submit a patch to syzbot and it will run the patch on the same
system on which it found the bug.

Thank you
Bharath
> >
> > [1] 
> > https://syzkaller.appspot.com/bug?id=dc6352b92862eb79373fe03fdf9af5928753e057
> >
> > Thank you
> > Bharath


Re: [PATCH] mm: Add callback for defining compaction completion

2019-09-12 Thread Bharath Vedartham
Hi Nitin,
On Wed, Sep 11, 2019 at 10:33:39PM +, Nitin Gupta wrote:
> On Wed, 2019-09-11 at 08:45 +0200, Michal Hocko wrote:
> > On Tue 10-09-19 22:27:53, Nitin Gupta wrote:
> > [...]
> > > > On Tue 10-09-19 13:07:32, Nitin Gupta wrote:
> > > > > For some applications we need to allocate almost all memory as
> > > > > hugepages.
> > > > > However, on a running system, higher order allocations can fail if the
> > > > > memory is fragmented. Linux kernel currently does on-demand
> > > > > compaction
> > > > > as we request more hugepages but this style of compaction incurs very
> > > > > high latency. Experiments with one-time full memory compaction
> > > > > (followed by hugepage allocations) shows that kernel is able to
> > > > > restore a highly fragmented memory state to a fairly compacted memory
> > > > > state within <1 sec for a 32G system. Such data suggests that a more
> > > > > proactive compaction can help us allocate a large fraction of memory
> > > > > as hugepages keeping allocation latencies low.
> > > > > 
> > > > > In general, compaction can introduce unexpected latencies for
> > > > > applications that don't even have strong requirements for contiguous
> > > > > allocations.
> > 
> > Could you expand on this a bit please? Gfp flags allow to express how
> > much the allocator try and compact for a high order allocations. Hugetlb
> > allocations tend to require retrying and heavy compaction to succeed and
> > the success rate tends to be pretty high from my experience.  Why that
> > is not case in your case?
> > 
The link to the driver you sent on GitLab is not working :(
> Yes, I have the same observation: with `GFP_TRANSHUGE |
> __GFP_RETRY_MAYFAIL` I get very good success rate (~90% of free RAM
> allocated as hugepages). However, what I'm trying to point out is that this
> high success rate comes with high allocation latencies (90th percentile
> latency of 2206us). On the same system, the same high-order allocations
> which hit the fast path have latency <5us.
> 
> > > > > It is also hard to efficiently determine if the current
> > > > > system state can be easily compacted due to mixing of unmovable
> > > > > memory. Due to these reasons, automatic background compaction by the
> > > > > kernel itself is hard to get right in a way which does not hurt
> > > > > unsuspecting
> > > > applications or waste CPU cycles.
> > > > 
> > > > We do trigger background compaction on a high order pressure from the
> > > > page allocator by waking up kcompactd. Why is that not sufficient?
> > > > 
> > > 
> > > Whenever kcompactd is woken up, it does just enough work to create
> > > one free page of the given order (compaction_control.order) or higher.
> > 
> > This is an implementation detail IMHO. I am pretty sure we can do a
> > better auto tuning when there is an indication of a constant flow of
> > high order requests. This is no different from the memory reclaim in
> > principle. Just because the kswapd autotuning not fitting with your
> > particular workload you wouldn't want to export direct reclaim
> > functionality and call it from a random module. That is just doomed to
> > fail because different subsystems in control just leads to decisions
> > going against each other.
> > 
> 
> I don't want to go the route of adding any auto-tuning/prediction code to
> control compaction in the kernel. I'm more inclined towards extending
> existing interfaces to allow compaction behavior to be controlled either
> from userspace or a kernel driver. Letting a random module control
> compaction or a root process pumping new tunables from sysfs is the same in
> principle.
Do you think a kernel module and a root user process have the same
privileges? There is only so much info you can export to sysfs. Also,
wouldn't this introduce more tunables, per-driver tunables to be more
specific?
> This patch is in the spirit of simple extension to existing
> compaction_zone_order() which allows either a kernel driver or userspace
> (through sysfs) to control compaction.
> 
> Also, we should avoid driving hard parallels between reclaim and
> compaction: the former is often necessary for forward progress while the
> latter is often an optimization. Since contiguous allocations are mostly
> optimizations it's good to expose hooks from the kernel that let user
> (through a driver or userspace) control it using its own heuristics.
How is compaction an optimization? If I have a memory zone with more free
pages than the zone's high watermark, and higher-order allocations are
failing because the memory is awfully fragmented, we need compaction to make
further progress. I have seen workloads where kswapd won't help
because memory is so awfully fragmented.
The workload I am quoting is the thpscale workload from Mel Gorman's mmtests
suite.
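
As an aside, the existing one-shot interface discussed below can be driven
from a program as well as from the shell; a minimal, hedged sketch (the
procfs path is real, the helper name is mine):

#include <fcntl.h>
#include <unistd.h>

/* Trigger a one-shot, system-wide compaction of all zones (needs root).
 * The value written is ignored; the write itself is the trigger. */
static int trigger_full_compaction(void)
{
        int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
        ssize_t n;

        if (fd < 0)
                return -1;
        n = write(fd, "1", 1);
        close(fd);
        return n == 1 ? 0 : -1;
}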
> 
> I thought hard about whats lacking in current userspace interface (sysfs):
>  - /proc/sys/vm/compact_memory: full system compaction is not an option as
>a viable 

Re: [PATCH 5.2 00/94] 5.2.14-stable review

2019-09-09 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regressions found.

Thank you
Bharath


Re: [PATCH 4.14 00/40] 4.14.143-stable review

2019-09-09 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regressions found.

Thank you
Bharath


Re: [PATCH 4.4 00/23] 4.4.192-stable review

2019-09-09 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regressions found.

Thank you
Bharath


Re: [PATCH 4.9 00/26] 4.9.192-stable review

2019-09-09 Thread Bharath Vedartham
Built and booted on my x86 arch machine. No dmesg regressions found.

Thank you
Bharath


Re: [RFC PATCH 0/2] Add predictive memory reclamation and compaction

2019-08-28 Thread Bharath Vedartham
Hi Michal, Thank you for spending your time on this.
On Tue, Aug 27, 2019 at 08:16:06AM +0200, Michal Hocko wrote:
> On Tue 27-08-19 02:14:20, Bharath Vedartham wrote:
> > Hi Michal,
> > 
> > Here are some of my thoughts,
> > On Wed, Aug 21, 2019 at 04:06:32PM +0200, Michal Hocko wrote:
> > > On Thu 15-08-19 14:51:04, Khalid Aziz wrote:
> > > > Hi Michal,
> > > > 
> > > > The smarts for tuning these knobs can be implemented in userspace and
> > > > more knobs added to allow for what is missing today, but we get back to
> > > > the same issue as before. That does nothing to make kernel self-tuning
> > > > and adds possibly even more knobs to userspace. Something so fundamental
> > > > to kernel memory management as making free pages available when they are
> > > > needed really should be taken care of in the kernel itself. Moving it to
> > > > userspace just means the kernel is hobbled unless one installs and tunes
> > > > a userspace package correctly.
> > > 
> > > From my past experience the existing autotuning works mostly ok for a
> > > vast variety of workloads. A more clever tuning is possible and people
> > > are doing that already. Especially for cases when the machine is heavily
> > > overcommited. There are different ways to achieve that. Your new
> > > in-kernel auto tuning would have to be tested on a large variety of
> > > workloads to be proven and riskless. So I am quite skeptical to be
> > > honest.
> > Could you give some references to such work on tuning the kernel?
> 
> Talk to Facebook guys and their usage of PSI to control the memory
> distribution and OOM situations.
Yup. Thanks for the pointer.
> > Essentially, our idea here is to foresee potential memory exhaustion.
> > This foreseeing is done by observing the workload and its memory
> > usage. Based on these observations, we make a prediction
> > whether or not memory exhaustion could occur.
> 
> I understand that and I am not disputing this can be useful. All I do
> argue here is that there is unlikely a good "crystal ball" for most/all
> workloads that would justify its inclusion into the kernel and that this
> is something better done in the userspace where you can experiment and
> tune the behavior for a particular workload of your interest.
> 
> Therefore I would like to shift the discussion towards existing APIs and
> whether they are suitable for such advanced auto-tuning. I haven't
> heard any arguments about missing pieces.
I understand your concern here. Just confirming: by APIs, you are
referring to sysctls, sysfs files, and the like, right?
> > If memory exhaustion is predicted, we reclaim some more memory. kswapd
> > stops reclaiming when the high watermark (hwmark) is reached. The high
> > watermark is usually set to a fairly low percentage of total memory; on my
> > system, for zone Normal, it is 13% of total pages. So there is scope for
> > reclaiming more pages to make sure the system does not suffer from a lack
> > of pages.
> 
> Yes and we have ways to control those watermarks that your monitoring
> tool can use to alter the reclaim behavior.
Just to confirm here, I am aware of one way, which is to alter the
min_free_kbytes value. What other ways are there to alter watermarks
from user space?
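
For what it's worth, a hedged sketch of the two knobs I believe are relevant
here: min_free_kbytes raises all watermarks, and watermark_scale_factor
(available since v4.6) widens the min->low->high gaps. The values are
illustrative only, and both writes need root:

#include <stdio.h>

static int set_sysctl(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        /* Raise the min watermark (low/high scale up with it); in KiB. */
        set_sysctl("/proc/sys/vm/min_free_kbytes", "131072");
        /* Widen the watermark gaps; units of 0.01% of zone memory,
         * default 10 (i.e. 0.1%). */
        set_sysctl("/proc/sys/vm/watermark_scale_factor", "200");
        return 0;
}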
> [...]
> > > Therefore I would really focus on discussing whether we have sufficient
> > > APIs to tune the kernel to do the right thing when needed. That requires
> > > to identify gaps in that area. 
> > One thing that comes to my mind is based on the issue Khalid mentioned
> > earlier, on how his desktop took more than 30 seconds to boot up because
> > the caches were using up a lot of memory.
> > Rather than allowing any unused memory to be the page cache, would it be
> > a good idea to fix a size for the caches and elastically change the size
> > based on the workload?
> 
> I do not think so. Limiting the pagecache is unlikely to help as it is
> really cheap to reclaim most of the time. In those cases when this is
> not the case (e.g. the underlying FS needs to flush and/or metadata)
> then the same would be possible in a restricted page cache situation
> and you could easily end up stalled waiting for pagecache (e.g. any
> executable/library) while there is a lot of memory.
That makes sense to me.
> I cannot comment on the Khalid's example because there were no details
> there but I would be really surprised if the primary source of stall was
> the pagecache.
Should have done more research before talking :) Sorry about that.
> -- 
> Michal Hocko
> SUSE Labs


Re: [RFC PATCH 0/2] Add predictive memory reclamation and compaction

2019-08-26 Thread Bharath Vedartham
Hi Michal,

Here are some of my thoughts,
On Wed, Aug 21, 2019 at 04:06:32PM +0200, Michal Hocko wrote:
> On Thu 15-08-19 14:51:04, Khalid Aziz wrote:
> > Hi Michal,
> > 
> > The smarts for tuning these knobs can be implemented in userspace and
> > more knobs added to allow for what is missing today, but we get back to
> > the same issue as before. That does nothing to make kernel self-tuning
> > and adds possibly even more knobs to userspace. Something so fundamental
> > to kernel memory management as making free pages available when they are
> > needed really should be taken care of in the kernel itself. Moving it to
> > userspace just means the kernel is hobbled unless one installs and tunes
> > a userspace package correctly.
> 
> From my past experience the existing autotuning works mostly ok for a
> vast variety of workloads. A more clever tuning is possible and people
> are doing that already. Especially for cases when the machine is heavily
> overcommited. There are different ways to achieve that. Your new
> in-kernel auto tuning would have to be tested on a large variety of
> workloads to be proven and riskless. So I am quite skeptical to be
> honest.
Could you give some references to such work on tuning the kernel?

Essentially, our idea here is to foresee potential memory exhaustion.
This foreseeing is done by observing the workload and its memory
usage. Based on these observations, we make a prediction
whether or not memory exhaustion could occur. If memory exhaustion
is predicted, we reclaim some more memory. kswapd stops reclaiming when
the high watermark (hwmark) is reached. The high watermark is usually set to
a fairly low percentage of total memory; on my system, for zone Normal, it is
13% of total pages. So there is scope for reclaiming more pages to make sure
the system does not suffer from a lack of pages.
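
(The 13% figure above can be checked from userspace. A rough sketch,
assuming the standard /proc/zoneinfo layout; the "high" and "managed" keys
are the real per-zone fields, and the parsing is deliberately minimal:)

#include <stdio.h>

int main(void)
{
        char line[256];
        long high = 0, managed = 0;
        FILE *f = fopen("/proc/zoneinfo", "r");

        if (!f)
                return 1;
        while (fgets(line, sizeof(line), f)) {
                sscanf(line, " high %ld", &high);       /* high wmark, in pages */
                sscanf(line, " managed %ld", &managed); /* buddy-managed pages */
                if (high && managed) {
                        printf("high wmark = %.1f%% of managed pages\n",
                               100.0 * high / managed);
                        high = managed = 0;     /* move on to the next zone */
                }
        }
        fclose(f);
        return 0;
}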

Since we are "predicting", there could be mistakes in our prediction.
The question is how bad are the mistakes? How much does a wrong
prediction cost? 

A right prediction would be a win: we correctly predict that there could be
exhaustion, and this leads us to reclaim more memory (beyond the high
watermark) and/or compact memory beforehand (unlike kcompactd, which does it
on demand).

A wrong prediction, on the other hand, can be categorized into 2
situations:
(i) We foresee memory exhaustion but there is no memory exhaustion in
the future. In this case, we would be reclaiming more memory for not a lot
of use. This situation is not entirely bad, but we definitely waste a few
clock cycles.
(ii) We don't foresee memory exhaustion but there is memory exhaustion
in the future. This is a bad case where we may end up going into direct
compaction/reclaim. But it could be the case that the memory exhaustion
is far in the future and, even though we didn't see it, kswapd could have
reclaimed that memory or a drop_caches could have occurred in the meantime.

How often we hit wrong predictions of type (ii) would really determine our
efficiency. 

Coming to your situation of provisioning VMs: a situation where our work
would pay off is a cloud burst. When the demand for VMs is very high, our
algorithm could adapt to the increase in demand and reclaim/compact more
memory to reduce allocation stalls and improve performance.
> Therefore I would really focus on discussing whether we have sufficient
> APIs to tune the kernel to do the right thing when needed. That requires
> to identify gaps in that area. 
One thing that comes to my mind is based on the issue Khalid mentioned
earlier, on how his desktop took more than 30 seconds to boot up because
the caches were using up a lot of memory.
Rather than allowing any unused memory to be the page cache, would it be
a good idea to fix a size for the caches and elastically change the size
based on the workload?

Thank you
Bharath

> -- 
> Michal Hocko
> SUSE Labs
> 


net/dst_cache.c: preemption bug in net/dst_cache.c

2019-08-22 Thread Bharath Vedartham
Hi all,

I just want to bring attention to the syzbot bug [1]

Even though syzbot claims the bug to be in net/tipc, I feel it is in
net/dst_cache.c. Please correct me if I am wrong.

This bug is being triggered a lot of times by syzbot since the day it
was reported. Also given that this is core networking code, I felt it
was important to bring this to attention.

It looks like preemption needs to be disabled before using this_cpu_ptr,
or maybe we would be better off using a get_cpu_var and put_cpu_var combo
here.
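
To make the two options concrete, here is a minimal sketch, not the actual
net/core/dst_cache.c fix; struct pcpu_entry and example_cache are
hypothetical stand-ins for the real per-CPU cache data:

#include <linux/percpu.h>
#include <linux/preempt.h>

struct pcpu_entry {
        unsigned long cookie;
};

/* Assume example_cache = alloc_percpu(struct pcpu_entry); at init time. */
static struct pcpu_entry __percpu *example_cache;

static void example_access(void)
{
        struct pcpu_entry *e;

        /* Option 1: explicitly disable preemption around this_cpu_ptr(). */
        preempt_disable();
        e = this_cpu_ptr(example_cache);
        e->cookie++;            /* safe: no CPU migration in this window */
        preempt_enable();

        /* Option 2: get_cpu_ptr()/put_cpu_ptr(), the dynamic-percpu
         * analogue of get_cpu_var()/put_cpu_var(); it disables
         * preemption internally. */
        e = get_cpu_ptr(example_cache);
        e->cookie++;
        put_cpu_ptr(example_cache);
}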

[1] 
https://syzkaller.appspot.com/bug?id=dc6352b92862eb79373fe03fdf9af5928753e057

Thank you
Bharath


Re: [Linux-kernel-mentees][PATCH v6 1/2] sgi-gru: Convert put_page() to put_user_page*()

2019-08-20 Thread Bharath Vedartham
On Mon, Aug 19, 2019 at 12:30:18PM -0700, John Hubbard wrote:
> On 8/19/19 12:06 PM, Bharath Vedartham wrote:
> >On Mon, Aug 19, 2019 at 07:56:11AM -0500, Dimitri Sivanich wrote:
> >>Reviewed-by: Dimitri Sivanich 
> >Thanks!
> >
> >John, would you like to take this patch into your miscellaneous
> >conversions patch set?
> >
> 
> (+Andrew and Michal, so they know where all this is going.)
> 
> Sure, although that conversion series [1] is on a brief hold, because
> there are additional conversions desired, and the API is still under
> discussion. Also, reading between the lines of Michal's response [2]
> about it, I think people would prefer that the next revision include
> the following, for each conversion site:
> 
> Conversion of gup/put_page sites:
> 
> Before:
> 
>   get_user_pages(...);
>   ...
>   for each page:
>   put_page();
> 
> After:
>   
>   gup_flags |= FOLL_PIN; (maybe FOLL_LONGTERM in some cases)
>   vaddr_pin_user_pages(...gup_flags...)
>   ...
>   vaddr_unpin_user_pages(); /* which invokes put_user_page() */
> 
> Fortunately, it's not harmful for the simpler conversion from put_page()
> to put_user_page() to happen first, and in fact those have usually led
> to simplifications, paving the way to make it easier to call
> vaddr_unpin_user_pages(), once it's ready. (And showing exactly what
> to convert, too.)
> 
> So for now, I'm going to just build on top of Ira's tree, and once the
> vaddr*() API settles down, I'll send out an updated series that attempts
> to include the reviews and ACKs so far (I'll have to review them, but
> make a note that review or ACK was done for part of the conversion),
> and adds the additional gup(FOLL_PIN), and uses vaddr*() wrappers instead of
> gup/pup.
> 
> [1] https://lore.kernel.org/r/20190807013340.9706-1-jhubb...@nvidia.com
> 
> [2] https://lore.kernel.org/r/20190809175210.gr18...@dhcp22.suse.cz
> 
Cc'ing lkml (I missed out the 'l' in the list address in this series).

Sounds good. It makes sense to keep the entire gup machinery inside the
kernel rather than expose it outside.

I'll make sure to check out the emails on the vaddr*() API and pace my work
on it accordingly.

Thank you
Bharath
> thanks,
> -- 
> John Hubbard
> NVIDIA


Re: [Question-kvm] Can hva_to_pfn_fast be executed in interrupt context?

2019-08-20 Thread Bharath Vedartham
On Thu, Aug 15, 2019 at 08:26:43PM +0200, Paolo Bonzini wrote:
> Oh, I see. Sorry I didn't understand the question. In the case of KVM,
> there's simply no code that runs in interrupt context and needs to use
> virtual addresses.
> 
> In fact, there's no code that runs in interrupt context at all. The only
> code that deals with host interrupts in a virtualization host is in VFIO,
> but all it needs to do is signal an eventfd.
> 
> Paolo
Great, answers my question. Thank you for your time.

Thank you
Bharath
> 
> On Thu, 15 Aug 2019 at 19:18, Bharath Vedartham wrote:
> 
> > On Tue, Aug 13, 2019 at 10:17:09PM +0200, Paolo Bonzini wrote:
> > > On 13/08/19 21:14, Bharath Vedartham wrote:
> > > > Hi all,
> > > >
> > > > I was looking at the function hva_to_pfn_fast (in virt/kvm/kvm_main),
> > > > which is executed in an atomic context (even in non-atomic context,
> > > > since hva_to_pfn_fast is much faster than hva_to_pfn_slow).
> > > >
> > > > My question is can this be executed in an interrupt context?
> > >
> > > No, it cannot for the reason you mention below.
> > >
> > > Paolo
> > Hmm, well, I expected the answer to be KVM-specific.
> > I asked because I observed a similar use case in a driver (sgi-gru) where
> > we want to retrieve the physical address of a virtual address. This is
> > done in atomic and non-atomic context, similar to hva_to_pfn_fast and
> > hva_to_pfn_slow. __get_user_pages_fast (for the atomic case)
> > would not work, as the driver could execute in interrupt context.
> >
> > The driver manually walked the page tables to handle this issue.
> >
> > Since kvm is a widely used piece of code, I asked this question to know
> > how kvm handled this issue.
> >
> > Thank you for your time.
> >
> > Thank you
> > Bharath
> > > > The motivation for this question is that in an interrupt context, we cannot
> > > > assume "current" to be the task_struct of the process of interest.
> > > > __get_user_pages_fast assumes current->mm when walking the process page
> > > > tables.
> > > >
> > > > So if this function hva_to_pfn_fast can be executed in an
> > > > interrupt context, it would not be safe to retrieve the pfn with
> > > > __get_user_pages_fast.
> > > >
> > > > Thoughts on this?
> > > >
> > > > Thank you
> > > > Bharath
> > > >
> > >
> >


Re: [Linux-kernel-mentees][PATCH v6 1/2] sgi-gru: Convert put_page() to put_user_page*()

2019-08-19 Thread Bharath Vedartham
On Mon, Aug 19, 2019 at 07:56:11AM -0500, Dimitri Sivanich wrote:
> Reviewed-by: Dimitri Sivanich 
Thanks!

John, would you like to take this patch into your miscellaneous
conversions patch set?

Thank you
Bharath
> On Mon, Aug 19, 2019 at 01:08:54AM +0530, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page() or
> > release_pages().
> > 
> > This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> > ("mm: introduce put_user_page*(), placeholder versions").
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: William Kucharski 
> > Cc: Christoph Hellwig 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Cc: linux-kernel-ment...@lists.linuxfoundation.org
> > Reviewed-by: Ira Weiny 
> > Reviewed-by: John Hubbard 
> > Reviewed-by: William Kucharski 
> > Signed-off-by: Bharath Vedartham 
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..61b3447 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > return -EFAULT;
> > *paddr = page_to_phys(page);
> > -   put_page(page);
> > +   put_user_page(page);
> > return 0;
> >  }
> >  
> > -- 
> > 2.7.4
> > 


Re: [Linux-kernel-mentees][PATCH 2/2] sgi-gru: Remove unnecessary ifdef for CONFIG_HUGETLB_PAGE

2019-08-19 Thread Bharath Vedartham
On Mon, Aug 19, 2019 at 08:00:57AM -0500, Dimitri Sivanich wrote:
> Reviewed-by: Dimitri Sivanich 
Thanks! 
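
For context on why the driver-side #ifdef is redundant: is_vm_hugetlb_page()
already carries a !CONFIG_HUGETLB_PAGE stub in the core headers. A sketch,
paraphrased from include/linux/hugetlb_inline.h:

#ifdef CONFIG_HUGETLB_PAGE
static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
{
        return !!(vma->vm_flags & VM_HUGETLB);
}
#else
static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
{
        return false;
}
#endif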
> On Mon, Aug 19, 2019 at 01:08:55AM +0530, Bharath Vedartham wrote:
> > is_vm_hugetlb_page will always return false if CONFIG_HUGETLB_PAGE is
> > not set.
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: William Kucharski 
> > Cc: Christoph Hellwig 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Cc: linux-kernel-ment...@lists.linuxfoundation.org
> > Reviewed-by: Ira Weiny 
> > Reviewed-by: John Hubbard 
> > Reviewed-by: William Kucharski 
> > Signed-off-by: Bharath Vedartham 
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 21 +++--
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 61b3447..bce47af 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -180,11 +180,11 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> >  {
> > struct page *page;
> >  
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -   *pageshift = PAGE_SHIFT;
> > -#endif
> > +   if (unlikely(is_vm_hugetlb_page(vma)))
> > +   *pageshift = HPAGE_SHIFT;
> > +   else
> > +   *pageshift = PAGE_SHIFT;
> > +
> > if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > return -EFAULT;
> > *paddr = page_to_phys(page);
> > @@ -238,11 +238,12 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> > return 1;
> >  
> > *paddr = pte_pfn(pte) << PAGE_SHIFT;
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -   *pageshift = PAGE_SHIFT;
> > -#endif
> > +
> > +   if (unlikely(is_vm_hugetlb_page(vma)))
> > +   *pageshift = HPAGE_SHIFT;
> > +   else
> > +   *pageshift = PAGE_SHIFT;
> > +
> > return 0;
> >  
> >  err:
> > -- 
> > 2.7.4
> > 


Re: [Linux-kernel-mentees][PATCH v6 0/2] get_user_pages changes

2019-08-18 Thread Bharath Vedartham
On Mon, Aug 19, 2019 at 01:08:53AM +0530, Bharath Vedartham wrote:
CC'ing lkml, the mail id was wrong.

> This version only converts put_page to put_user_page and removes
> an unnecessary ifdef.
> 
> It does not convert atomic_pte_lookup to __get_user_pages as
> gru_vtop could run in an interrupt context in which we can't assume
> current as __get_user_pages does.
> 
> Bharath Vedartham (2):
>   sgi-gru: Convert put_page() to put_user_page*()
>   sgi-gru: Remove unnecessary ifdef for CONFIG_HUGETLB_PAGE
> 
>  drivers/misc/sgi-gru/grufault.c | 23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> -- 
> 2.7.4
> 


Re: [Linux-kernel-mentees][PATCH v6 1/2] sgi-gru: Convert put_page() to put_user_page*()

2019-08-18 Thread Bharath Vedartham
CC'ing lkml.

On Mon, Aug 19, 2019 at 01:08:54AM +0530, Bharath Vedartham wrote:
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Ira Weiny 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Greg Kroah-Hartman 
> Cc: Dimitri Sivanich 
> Cc: Arnd Bergmann 
> Cc: William Kucharski 
> Cc: Christoph Hellwig 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel-ment...@lists.linuxfoundation.org
> Reviewed-by: Ira Weiny 
> Reviewed-by: John Hubbard 
> Reviewed-by: William Kucharski 
> Signed-off-by: Bharath Vedartham 
> ---
>  drivers/misc/sgi-gru/grufault.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 4b713a8..61b3447 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
>   return -EFAULT;
>   *paddr = page_to_phys(page);
> - put_page(page);
> + put_user_page(page);
>   return 0;
>  }
>  
> -- 
> 2.7.4
> 


Re: [Linux-kernel-mentees][PATCH 2/2] sgi-gru: Remove unnecessary ifdef for CONFIG_HUGETLB_PAGE

2019-08-18 Thread Bharath Vedartham
CC'ing lkml.

On Mon, Aug 19, 2019 at 01:08:55AM +0530, Bharath Vedartham wrote:
> is_vm_hugetlb_page will always return false if CONFIG_HUGETLB_PAGE is
> not set.
> 
> Cc: Ira Weiny 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Greg Kroah-Hartman 
> Cc: Dimitri Sivanich 
> Cc: Arnd Bergmann 
> Cc: William Kucharski 
> Cc: Christoph Hellwig 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel-ment...@lists.linuxfoundation.org
> Reviewed-by: Ira Weiny 
> Reviewed-by: John Hubbard 
> Reviewed-by: William Kucharski 
> Signed-off-by: Bharath Vedartham 
> ---
>  drivers/misc/sgi-gru/grufault.c | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 61b3447..bce47af 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -180,11 +180,11 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>  {
>   struct page *page;
>  
> -#ifdef CONFIG_HUGETLB_PAGE
> - *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> - *pageshift = PAGE_SHIFT;
> -#endif
> + if (unlikely(is_vm_hugetlb_page(vma)))
> + *pageshift = HPAGE_SHIFT;
> + else
> + *pageshift = PAGE_SHIFT;
> +
>   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
>   return -EFAULT;
>   *paddr = page_to_phys(page);
> @@ -238,11 +238,12 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
>   return 1;
>  
>   *paddr = pte_pfn(pte) << PAGE_SHIFT;
> -#ifdef CONFIG_HUGETLB_PAGE
> - *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> - *pageshift = PAGE_SHIFT;
> -#endif
> +
> + if (unlikely(is_vm_hugetlb_page(vma)))
> + *pageshift = HPAGE_SHIFT;
> + else
> + *pageshift = PAGE_SHIFT;
> +
>   return 0;
>  
>  err:
> -- 
> 2.7.4
> 


[Linux-kernel-mentees][PATCH 2/2] sgi-gru: Remove unnecessary ifdef for CONFIG_HUGETLB_PAGE

2019-08-18 Thread Bharath Vedartham
is_vm_hugetlb_page will always return false if CONFIG_HUGETLB_PAGE is
not set.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel-ment...@lists.linuxfoundation.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 61b3447..bce47af 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -180,11 +180,11 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
 {
struct page *page;
 
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+   if (unlikely(is_vm_hugetlb_page(vma)))
+   *pageshift = HPAGE_SHIFT;
+   else
+   *pageshift = PAGE_SHIFT;
+
if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
@@ -238,11 +238,12 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
return 1;
 
*paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+
+   if (unlikely(is_vm_hugetlb_page(vma)))
+   *pageshift = HPAGE_SHIFT;
+   else
+   *pageshift = PAGE_SHIFT;
+
return 0;
 
 err:
-- 
2.7.4



[Linux-kernel-mentees][PATCH v6 1/2] sgi-gru: Convert put_page() to put_user_page*()

2019-08-18 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel-ment...@lists.linuxfoundation.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..61b3447 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
-   put_page(page);
+   put_user_page(page);
return 0;
 }
 
-- 
2.7.4



Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()

2019-08-18 Thread Bharath Vedartham
Hi Dimitri,

Can you confirm that this driver will run gru_vtop() in interrupt
context?

If so, I'll send you another set of patches in which I don't change the
*pte_lookup functions but only convert put_page() to put_user_page() and
remove the ifdef for CONFIG_HUGETLB_PAGE.

Thank you for your time.

Thank you
Bharath

On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> As part of this conversion, the *pte_lookup functions can be removed and
> be easily replaced with get_user_pages_fast() functions. In the case of
> atomic lookup, __get_user_pages_fast() is used, because it does not fall
> back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> hand, first calls __get_user_pages_fast(), but then falls back to the
> slow path if __get_user_pages_fast() fails.
> 
> Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> 
> Cc: Ira Weiny 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Greg Kroah-Hartman 
> Cc: Dimitri Sivanich 
> Cc: Arnd Bergmann 
> Cc: William Kucharski 
> Cc: Christoph Hellwig 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel-ment...@lists.linuxfoundation.org
> Reviewed-by: Ira Weiny 
> Reviewed-by: John Hubbard 
> Reviewed-by: William Kucharski 
> Signed-off-by: Bharath Vedartham 
> ---
> This is a fold of the 3 patches in the v2 patch series.
> The review tags were given to the individual patches.
> 
> Changes since v3
>   - Used gup flags in get_user_pages_fast rather than
>   boolean flags.
> Changes since v4
>   - Updated changelog according to John Hubbard.
> ---
> >  drivers/misc/sgi-gru/grufault.c | 112 +---
>  1 file changed, 24 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 4b713a8..304e9c5 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
>  }
>  
>  /*
> - * Atomic (interrupt context) & non-atomic (user context) functions to
> - * convert a vaddr into a physical address. The size of the page
> - * is returned in pageshift.
> - *   returns:
> - * 0 - successful
> - *   < 0 - error code
> - * 1 - (atomic only) try again in non-atomic context
> - */
> -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> -  unsigned long vaddr, int write,
> -  unsigned long *paddr, int *pageshift)
> -{
> - struct page *page;
> -
> -#ifdef CONFIG_HUGETLB_PAGE
> - *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> -#else
> - *pageshift = PAGE_SHIFT;
> -#endif
> > - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> - return -EFAULT;
> - *paddr = page_to_phys(page);
> - put_page(page);
> - return 0;
> -}
> -
> -/*
> - * atomic_pte_lookup
> > + * mmap_sem is already held on entry to this function. This guarantees
> + * existence of the page tables.
>   *
> - * Convert a user virtual address to a physical address
>   * Only supports Intel large pages (2MB only) on x86_64.
> - *   ZZZ - hugepage support is incomplete
> - *
> - * NOTE: mmap_sem is already held on entry to this function. This
> - * guarantees existence of the page tables.
> + *   ZZZ - hugepage support is incomplete.
>   */
> -static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> - int write, unsigned long *paddr, int *pageshift)
> -{
> - pgd_t *pgdp;
> - p4d_t *p4dp;
> - pud_t *pudp;
> - pmd_t *pmdp;
> - pte_t pte;
> -
> - pgdp = pgd_offset(vma->vm_mm, vaddr);
> - if (unlikely(pgd_none(*pgdp)))
> - goto err;
> -
> - p4dp = p4d_offset(pgdp, vaddr);
> - if (unlikely(p4d_none(*p4dp)))
> - goto err;
> -
> - pudp = pud_offset(p4dp, vaddr);
> - if (unlikely(pud_none(*pudp)))
> - goto err;
> -
> - pmdp = pmd_offset(pudp, vaddr);
> - if (unlikely(pmd_none(*pmdp)))
> - goto err;
> -#ifdef CONFIG_X86_64
> - if (unlikely(pmd_large(*pmdp)))
> - pte = *(pte_t *) pmdp;
> - else
> -#endif
> - pte = *pte_offset_kernel(pmdp, vaddr);
>

Re: [Question-kvm] Can hva_to_pfn_fast be executed in interrupt context?

2019-08-15 Thread Bharath Vedartham
On Tue, Aug 13, 2019 at 10:17:09PM +0200, Paolo Bonzini wrote:
> On 13/08/19 21:14, Bharath Vedartham wrote:
> > Hi all,
> > 
> > I was looking at the function hva_to_pfn_fast (in virt/kvm/kvm_main),
> > which is executed in an atomic context (even in non-atomic context, since
> > hva_to_pfn_fast is much faster than hva_to_pfn_slow).
> > 
> > My question is can this be executed in an interrupt context? 
> 
> No, it cannot for the reason you mention below.
> 
> Paolo
Hmm, well, I expected the answer to be KVM-specific.
I asked because I observed a similar use case in a driver (sgi-gru) where
we want to retrieve the physical address of a virtual address. This is
done in atomic and non-atomic context, similar to hva_to_pfn_fast and
hva_to_pfn_slow. __get_user_pages_fast (for the atomic case)
would not work, as the driver could execute in interrupt context.

The driver manually walked the page tables to handle this issue.

Since kvm is a widely used piece of code, I asked this question to know
how kvm handled this issue. 

Thank you for your time.

Thank you
Bharath
> > The motivation for this question is that in an interrupt context, we cannot
> > assume "current" to be the task_struct of the process of interest.
> > __get_user_pages_fast assumes current->mm when walking the process page
> > tables. 
> > 
> > So if this function hva_to_pfn_fast can be executed in an
> > interrupt context, it would not be safe to retrieve the pfn with
> > __get_user_pages_fast. 
> > 
> > Thoughts on this?
> > 
> > Thank you
> > Bharath
> > 
> 


Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()

2019-08-14 Thread Bharath Vedartham
On Wed, Aug 14, 2019 at 02:38:30PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 14, 2019 at 11:00:34PM +0530, Bharath Vedartham wrote:
> > On Tue, Aug 13, 2019 at 01:19:38PM -0500, Dimitri Sivanich wrote:
> > > On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> > > > On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > > > > Bharath,
> > > > > 
> > > > > I do not believe that __get_user_pages_fast will work for the atomic 
> > > > > case, as
> > > > > there is no guarantee that the 'current->mm' will be the correct one 
> > > > > for the
> > > > > process in question, as the process might have moved away from the 
> > > > > cpu that is
> > > > > handling interrupts for it's context.
> > > > So what you're saying is, there may be cases where current->mm !=
> > > > gts->ts_mm, right? __get_user_pages_fast and get_user_pages do
> > > > assume current->mm.
> > > 
> > > Correct, in the case of atomic context.
> > > 
> > > > 
> > > > These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> > > > hva_to_pfn_fast uses __get_user_pages_fast. The comment above the
> > > > function states it runs in atomic context.
> > > > 
> > > > Just curious, get_user_pages also uses current->mm. Do you think that is
> > > > also an issue? 
> > > 
> > > Not in non-atomic context.  Notice that it is currently done that way.
> > > 
> > > > 
> > > > Do you feel using get_user_pages_remote would be a better idea? We can
> > > > specify the mm_struct in get_user_pages_remote?
> > > 
> > > From that standpoint maybe, but is it safe in interrupt context?
> > Hmm, the gup maintainers seemed fine with the code.
> > 
> > Now this is only an issue if gru_vtop can be executed in an interrupt
> > context. 
> > 
> > get_user_pages_remote is not valid in an interrupt context (if CONFIG_MMU
> > is set). If we follow the function: in __get_user_pages, cond_resched()
> > is called, which definitely confirms that we can't run this function in an
> > interrupt context.
> > 
> > I think we might need some advice from the gup maintainers here.
> > Note that the comment on the function __get_user_pages_fast states that
> > __get_user_pages_fast is IRQ-safe.
> 
> vhost is doing some approach where they switch current to the target
> then call __get_user_pages_fast in an IRQ context, that might be a
> reasonable pattern
> 
> If this is a regular occurance we should probably add a
> get_atomic_user_pages_remote() to make the pattern clear.
> 
> Jason

That makes sense. get_atomic_user_pages_remote() should not be hard to
write. AFAICS __get_user_pages_fast is special-cased for current; we
could probably just add an mm_struct parameter to the page-table-walking
code in gup.c.

But till then, I think we can approach this the way vhost approaches
the problem: by switching current's mm to the target.
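
A rough sketch of that vhost-style pattern, under assumptions: use_mm() and
unuse_mm() are the real <linux/mmu_context.h> helpers of this era, but they
are only valid from a kernel thread (which is how vhost's worker uses them),
not from hard-IRQ context; pin_one_page() is a hypothetical helper:

#include <linux/mm.h>
#include <linux/mmu_context.h>

static int pin_one_page(struct mm_struct *target_mm, unsigned long vaddr,
                        int write, struct page **page)
{
        int ret;

        use_mm(target_mm);      /* adopt target_mm so current->mm is right */
        ret = __get_user_pages_fast(vaddr, 1, write, page);
        unuse_mm(target_mm);
        return ret;             /* 1 if the page was pinned, 0 otherwise */
}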

Thank you
Bharath


Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()

2019-08-14 Thread Bharath Vedartham
On Tue, Aug 13, 2019 at 01:19:38PM -0500, Dimitri Sivanich wrote:
> On Tue, Aug 13, 2019 at 10:53:01PM +0530, Bharath Vedartham wrote:
> > On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> > > Bharath,
> > > 
> > > I do not believe that __get_user_pages_fast will work for the atomic 
> > > case, as
> > > there is no guarantee that the 'current->mm' will be the correct one for 
> > > the
> > > process in question, as the process might have moved away from the cpu 
> > > that is
> > > handling interrupts for it's context.
> > So what you're saying is, there may be cases where current->mm != gts->ts_mm,
> > right? __get_user_pages_fast and get_user_pages do assume current->mm.
> 
> Correct, in the case of atomic context.
> 
> > 
> > These changes were inspired a bit from kvm. In kvm/kvm_main.c,
> > hva_to_pfn_fast uses __get_user_pages_fast. The comment above the
> > function states it runs in atomic context.
> > 
> > Just curious, get_user_pages also uses current->mm. Do you think that is
> > also an issue? 
> 
> Not in non-atomic context.  Notice that it is currently done that way.
> 
> > 
> > Do you feel using get_user_pages_remote would be a better idea? We can
> > specify the mm_struct in get_user_pages_remote?
> 
> From that standpoint maybe, but is it safe in interrupt context?
Hmm, the gup maintainers seemed fine with the code.

Now this is only an issue if gru_vtop can be executed in an interrupt
context. 

get_user_pages_remote is not valid in an interrupt context (if CONFIG_MMU
is set). If we follow the function: in __get_user_pages, cond_resched()
is called, which definitely confirms that we can't run this function in an
interrupt context.

I think we might need some advice from the gup maintainers here.
Note that the comment on the function __get_user_pages_fast states that
__get_user_pages_fast is IRQ-safe.

Thank you
Bharath
> > 
> > Thank you
> > Bharath
> > > On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > > > For pages that were retained via get_user_pages*(), release those pages
> > > > via the new put_user_page*() routines, instead of via put_page() or
> > > > release_pages().
> > > > 
> > > > This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> > > > ("mm: introduce put_user_page*(), placeholder versions").
> > > > 
> > > > As part of this conversion, the *pte_lookup functions can be removed and
> > > > be easily replaced with get_user_pages_fast() functions. In the case of
> > > > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > > > back to the slow path: get_user_pages(). get_user_pages_fast(), on the 
> > > > other
> > > > hand, first calls __get_user_pages_fast(), but then falls back to the
> > > > slow path if __get_user_pages_fast() fails.
> > > > 
> > > > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > > > 
> > > > Cc: Ira Weiny 
> > > > Cc: John Hubbard 
> > > > Cc: Jérôme Glisse 
> > > > Cc: Greg Kroah-Hartman 
> > > > Cc: Dimitri Sivanich 
> > > > Cc: Arnd Bergmann 
> > > > Cc: William Kucharski 
> > > > Cc: Christoph Hellwig 
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Cc: linux...@kvack.org
> > > > Cc: linux-kernel-ment...@lists.linuxfoundation.org
> > > > Reviewed-by: Ira Weiny 
> > > > Reviewed-by: John Hubbard 
> > > > Reviewed-by: William Kucharski 
> > > > Signed-off-by: Bharath Vedartham 
> > > > ---
> > > > This is a fold of the 3 patches in the v2 patch series.
> > > > The review tags were given to the individual patches.
> > > > 
> > > > Changes since v3
> > > > - Used gup flags in get_user_pages_fast rather than
> > > > boolean flags.
> > > > Changes since v4
> > > > - Updated changelog according to John Hubbard.
> > > > ---
> > > >  drivers/misc/sgi-gru/grufault.c | 112 +---
> > > >  1 file changed, 24 insertions(+), 88 deletions(-)
> > > > 
> > > > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > > > index 4b713a8..304e9c5 100644
> > > > --- a/drivers/misc/sgi-gru/grufault.c
> > > > +++ b/drivers/misc/sgi-gru/grufault.c
> 

[Question-kvm] Can hva_to_pfn_fast be executed in interrupt context?

2019-08-13 Thread Bharath Vedartham
Hi all,

I was looking at the function hva_to_pfn_fast (in virt/kvm/kvm_main.c),
which is executed in atomic context (and even in non-atomic context, since
hva_to_pfn_fast is much faster than hva_to_pfn_slow).

My question is can this be executed in an interrupt context? 

The motivation for this question is that in an interrupt context, we cannot
assume "current" to be the task_struct of the process of interest.
__get_user_pages_fast assumes current->mm when walking the process page
tables.
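
For reference, the declaration of this era (the signature is the real one
from <linux/mm.h>; the comment is my paraphrase of the mm/gup.c behaviour):

/* Note there is no mm_struct parameter: the implementation in mm/gup.c
 * walks current->mm with local interrupts disabled. */
int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
                          struct page **pages);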

So if this function hva_to_pfn_fast can be executed in an
interrupt context, it would not be safe to retrieve the pfn with
__get_user_pages_fast.

Thoughts on this?

Thank you
Bharath


Re: [Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()

2019-08-13 Thread Bharath Vedartham
On Tue, Aug 13, 2019 at 09:50:29AM -0500, Dimitri Sivanich wrote:
> Bharath,
> 
> I do not believe that __get_user_pages_fast will work for the atomic case, as
> there is no guarantee that the 'current->mm' will be the correct one for the
> process in question, as the process might have moved away from the cpu that is
> handling interrupts for it's context.
So what you're saying is, there may be cases where current->mm != gts->ts_mm,
right? __get_user_pages_fast and get_user_pages do assume current->mm.

These changes were inspired a bit by KVM. In virt/kvm/kvm_main.c,
hva_to_pfn_fast uses __get_user_pages_fast. The comment above the
function states it runs in atomic context.

Just curious, get_user_pages also uses current->mm. Do you think that is
also an issue? 

Do you feel using get_user_pages_remote would be a better idea? We can
specify the mm_struct in get_user_pages_remote?

Thank you
Bharath
> On Sat, Aug 10, 2019 at 01:08:17AM +0530, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page() or
> > release_pages().
> > 
> > This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> > ("mm: introduce put_user_page*(), placeholder versions").
> > 
> > As part of this conversion, the *pte_lookup functions can be removed and
> > be easily replaced with get_user_pages_fast() functions. In the case of
> > atomic lookup, __get_user_pages_fast() is used, because it does not fall
> > back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> > hand, first calls __get_user_pages_fast(), but then falls back to the
> > slow path if __get_user_pages_fast() fails.
> > 
> > Also: remove unnecessary CONFIG_HUGETLB ifdefs.
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: William Kucharski 
> > Cc: Christoph Hellwig 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Cc: linux-kernel-ment...@lists.linuxfoundation.org
> > Reviewed-by: Ira Weiny 
> > Reviewed-by: John Hubbard 
> > Reviewed-by: William Kucharski 
> > Signed-off-by: Bharath Vedartham 
> > ---
> > This is a fold of the 3 patches in the v2 patch series.
> > The review tags were given to the individual patches.
> > 
> > Changes since v3
> > - Used gup flags in get_user_pages_fast rather than
> > boolean flags.
> > Changes since v4
> > - Updated changelog according to John Hubbard.
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 112 +---
> >  1 file changed, 24 insertions(+), 88 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..304e9c5 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
> >  }
> >  
> >  /*
> > - * Atomic (interrupt context) & non-atomic (user context) functions to
> > - * convert a vaddr into a physical address. The size of the page
> > - * is returned in pageshift.
> > - * returns:
> > - *   0 - successful
> > - * < 0 - error code
> > - *   1 - (atomic only) try again in non-atomic context
> > - */
> > -static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > -unsigned long vaddr, int write,
> > -unsigned long *paddr, int *pageshift)
> > -{
> > -   struct page *page;
> > -
> > -#ifdef CONFIG_HUGETLB_PAGE
> > -   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -   *pageshift = PAGE_SHIFT;
> > -#endif
> > -   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > -   return -EFAULT;
> > -   *paddr = page_to_phys(page);
> > -   put_page(page);
> > -   return 0;
> > -}
> > -
> > -/*
> > - * atomic_pte_lookup
> > + * mmap_sem is already held on entry to this function. This guarantees
> > + * existence of the page tables.
> >   *
> > - * Convert a user virtual address to a physical address
> >   * Only supports Intel large pages (2MB only) on x86_64.
> > - * ZZZ - hugepage support is incomplete
> > - *
> > - * NOTE: mmap_sem is already held on entry to this function. This
> >

[Linux-kernel-mentees][PATCH v5 0/1] get_user_pages changes

2019-08-09 Thread Bharath Vedartham
In this 5th version of the patch series, I have compressed the patches
of the v2 patch series into one patch, as suggested by Christoph Hellwig.
The suggestion was to remove the *pte_lookup functions and use the
get_user_pages* functions directly in their place.

There is nothing different in this series compared to the v2
series; it essentially compresses the 3 patches of the original series
into one patch.

This series survives a compile test.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

-- 
2.7.4



[Linux-kernel-mentees][PATCH v5 1/1] sgi-gru: Remove *pte_lookup functions, Convert to get_user_page*()

2019-08-09 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

As part of this conversion, the *pte_lookup functions can be removed and
be easily replaced with get_user_pages_fast() functions. In the case of
atomic lookup, __get_user_pages_fast() is used, because it does not fall
back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
hand, first calls __get_user_pages_fast(), but then falls back to the
slow path if __get_user_pages_fast() fails.

Also: remove unnecessary CONFIG_HUGETLB ifdefs.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel-ment...@lists.linuxfoundation.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
This is a fold of the 3 patches in the v2 patch series.
The review tags were given to the individual patches.

Changes since v3
- Used gup flags in get_user_pages_fast rather than
boolean flags.
Changes since v4
- Updated changelog according to John Hubbard.
---
 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..304e9c5 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
 }
 
 /*
- * Atomic (interrupt context) & non-atomic (user context) functions to
- * convert a vaddr into a physical address. The size of the page
- * is returned in pageshift.
- * returns:
- *   0 - successful
- * < 0 - error code
- *   1 - (atomic only) try again in non-atomic context
- */
-static int non_atomic_pte_lookup(struct vm_area_struct *vma,
-unsigned long vaddr, int write,
-unsigned long *paddr, int *pageshift)
-{
-   struct page *page;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
-   return -EFAULT;
-   *paddr = page_to_phys(page);
-   put_page(page);
-   return 0;
-}
-
-/*
- * atomic_pte_lookup
+ * mmap_sem is already held on entry to this function. This guarantees
+ * existence of the page tables.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
- * ZZZ - hugepage support is incomplete
- *
- * NOTE: mmap_sem is already held on entry to this function. This
- * guarantees existence of the page tables.
+ * ZZZ - hugepage support is incomplete.
  */
-static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
-   int write, unsigned long *paddr, int *pageshift)
-{
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
-
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
-
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)
-   return 1;
-
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   return 0;
-
-err:
-   return 1;
-}
-
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
int write, int atomic, unsigned long *gpa, int *pageshift)
 {
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
unsigned long paddr;
-   int ret, ps;
+   int ret;
+   struct page *page;
 
vma = find_vma(mm, vaddr);
if (!vma)
@@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
 
/*
 * Atomic lookup is faster &

Re: [Linux-kernel-mentees][PATCH v4 1/1] sgi-gru: Remove *pte_lookup functions

2019-08-09 Thread Bharath Vedartham
On Thu, Aug 08, 2019 at 04:21:44PM -0700, John Hubbard wrote:
> On 8/8/19 11:55 AM, Bharath Vedartham wrote:
> ...
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > int write, int atomic, unsigned long *gpa, int *pageshift)
> >  {
> > struct mm_struct *mm = gts->ts_mm;
> > struct vm_area_struct *vma;
> > unsigned long paddr;
> > -   int ret, ps;
> > +   int ret;
> > +   struct page *page;
> >  
> > vma = find_vma(mm, vaddr);
> > if (!vma)
> > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
> > unsigned long vaddr,
> >  
> > /*
> >  * Atomic lookup is faster & usually works even if called in non-atomic
> > -* context.
> > +* context. get_user_pages_fast does atomic lookup before falling back 
> > to
> > +* slow gup.
> >  */
> > rmb();  /* Must/check ms_range_active before loading PTEs */
> > -   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > -   if (ret) {
> > -   if (atomic)
> > +   if (atomic) {
> > +   ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > +   if (!ret)
> > goto upm;
> > -   if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > +   } else {
> > +   ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > +   if (!ret)
> > goto inval;
> > }
> > +
> > +   paddr = page_to_phys(page);
> > +   put_user_page(page);
> > +
> > +   if (unlikely(is_vm_hugetlb_page(vma)))
> > +   *pageshift = HPAGE_SHIFT;
> > +   else
> > +   *pageshift = PAGE_SHIFT;
> > +
> > if (is_gru_paddr(paddr))
> > goto inval;
> > -   paddr = paddr & ~((1UL << ps) - 1);
> > +   paddr = paddr & ~((1UL << *pageshift) - 1);
> > *gpa = uv_soc_phys_ram_to_gpa(paddr);
> > -   *pageshift = ps;
> 
> Why are you no longer setting *pageshift? There are a couple of callers
> that both use this variable.
I'll send v5 once you're convinced by my argument that ps is not necessary
for setting *pageshift and that *pageshift is indeed being set.
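
For reference, the hunk in question (copied from the diff above) sets
*pageshift directly where ps used to be:

	if (unlikely(is_vm_hugetlb_page(vma)))
		*pageshift = HPAGE_SHIFT;
	else
		*pageshift = PAGE_SHIFT;

	if (is_gru_paddr(paddr))
		goto inval;
	paddr = paddr & ~((1UL << *pageshift) - 1);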

Thank you
Bharath
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


Re: [Linux-kernel-mentees][PATCH v4 1/1] sgi-gru: Remove *pte_lookup functions

2019-08-09 Thread Bharath Vedartham
On Thu, Aug 08, 2019 at 04:30:48PM -0700, John Hubbard wrote:
> On 8/8/19 4:21 PM, John Hubbard wrote:
> > On 8/8/19 11:55 AM, Bharath Vedartham wrote:
> > ...
> >>if (is_gru_paddr(paddr))
> >>goto inval;
> >> -  paddr = paddr & ~((1UL << ps) - 1);
> >> +  paddr = paddr & ~((1UL << *pageshift) - 1);
> >>*gpa = uv_soc_phys_ram_to_gpa(paddr);
> >> -  *pageshift = ps;
> > 
> > Why are you no longer setting *pageshift? There are a couple of callers
> > that both use this variable.
> > 
> > 
> 
> ...and once that's figured out, I can fix it up here and send it up with 
> the next misc callsites series. I'm also inclined to make the commit
> log read more like this:
> 
> sgi-gru: Remove *pte_lookup functions, convert to put_user_page*()
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> As part of this conversion, the *pte_lookup functions can be removed and
> be easily replaced with get_user_pages_fast() functions. In the case of
> atomic lookup, __get_user_pages_fast() is used, because it does not fall
> back to the slow path: get_user_pages(). get_user_pages_fast(), on the other
> hand, first calls __get_user_pages_fast(), but then falls back to the
> slow path if __get_user_pages_fast() fails.
> 
> Also: remove unnecessary CONFIG_HUGETLB ifdefs.
Sounds great! I will send the next version with an updated changelog!

Thank you
Bharath
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


Re: [Linux-kernel-mentees][PATCH v4 1/1] sgi-gru: Remove *pte_lookup functions

2019-08-09 Thread Bharath Vedartham
On Thu, Aug 08, 2019 at 04:21:44PM -0700, John Hubbard wrote:
> On 8/8/19 11:55 AM, Bharath Vedartham wrote:
> ...
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > int write, int atomic, unsigned long *gpa, int *pageshift)
> >  {
> > struct mm_struct *mm = gts->ts_mm;
> > struct vm_area_struct *vma;
> > unsigned long paddr;
> > -   int ret, ps;
> > +   int ret;
> > +   struct page *page;
> >  
> > vma = find_vma(mm, vaddr);
> > if (!vma)
> > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
> > unsigned long vaddr,
> >  
> > /*
> >  * Atomic lookup is faster & usually works even if called in non-atomic
> > -* context.
> > +* context. get_user_pages_fast does atomic lookup before falling back 
> > to
> > +* slow gup.
> >  */
> > rmb();  /* Must/check ms_range_active before loading PTEs */
> > -   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > -   if (ret) {
> > -   if (atomic)
> > +   if (atomic) {
> > +   ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > +   if (!ret)
> > goto upm;
> > -   if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > +   } else {
> > +   ret = get_user_pages_fast(vaddr, 1, write ? FOLL_WRITE : 0, &page);
> > +   if (!ret)
> > goto inval;
> > }
> > +
> > +   paddr = page_to_phys(page);
> > +   put_user_page(page);
> > +
> > +   if (unlikely(is_vm_hugetlb_page(vma)))
> > +   *pageshift = HPAGE_SHIFT;
> > +   else
> > +   *pageshift = PAGE_SHIFT;
> > +
> > if (is_gru_paddr(paddr))
> > goto inval;
> > -   paddr = paddr & ~((1UL << ps) - 1);
> > +   paddr = paddr & ~((1UL << *pageshift) - 1);
> > *gpa = uv_soc_phys_ram_to_gpa(paddr);
> > -   *pageshift = ps;
> 
> Why are you no longer setting *pageshift? There are a couple of callers
> that both use this variable.
Hi John,

I did set *pageshift. The if statement above sets *pageshift. ps was
used to retrieve the pageshift value when the pte_lookup functions were
present. ps was passed by reference to those functions and set by them.
But here since we are trying to remove those functions, we don't need ps
and we directly set *pageshift to HPAGE_SHIFT or PAGE_SHIFT based on the
type of vma. 

Hope this clears things up?

Thank you
Bharath
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


[Linux-kernel-mentees][PATCH v4 1/1] sgi-gru: Remove *pte_lookup functions

2019-08-08 Thread Bharath Vedartham
The *pte_lookup functions can be removed and be easily replaced with
get_user_pages_fast functions. In the case of atomic lookup,
__get_user_pages_fast is used which does not fall back to slow
get_user_pages. get_user_pages_fast on the other hand tries to use
__get_user_pages_fast but fallbacks to slow get_user_pages if
__get_user_pages_fast fails.

Also unnecessary ifdefs to check for CONFIG_HUGETLB are removed as the
check is redundant.
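
Spelled out, the conversion boils down to the following shape (a minimal
sketch of the pattern the diff below implements, not the exact driver code;
the return-code convention here stands in for the driver's goto labels):

	static int vtop_sketch(unsigned long vaddr, int write, int atomic,
			       unsigned long *paddr)
	{
		struct page *page;
		int ret;

		if (atomic)
			/* IRQ-safe: never sleeps, no slow-gup fallback */
			ret = __get_user_pages_fast(vaddr, 1, write, &page);
		else
			/* may sleep: falls back to get_user_pages() internally */
			ret = get_user_pages_fast(vaddr, 1,
						  write ? FOLL_WRITE : 0, &page);
		if (ret != 1)	/* covers 0 and the negative errno case */
			return atomic ? 1 : -EFAULT;	/* 1 == retry non-atomically */

		*paddr = page_to_phys(page);
		put_user_page(page);
		return 0;
	}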

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel-ment...@lists.linuxfoundation.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
This is a fold of the 3 patches in the v2 patch series.
The review tags were given to the individual patches.

Changes since v3
- Used gup flags in get_user_pages_fast rather than
boolean flags.
---
 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..304e9c5 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
 }
 
 /*
- * Atomic (interrupt context) & non-atomic (user context) functions to
- * convert a vaddr into a physical address. The size of the page
- * is returned in pageshift.
- * returns:
- *   0 - successful
- * < 0 - error code
- *   1 - (atomic only) try again in non-atomic context
- */
-static int non_atomic_pte_lookup(struct vm_area_struct *vma,
-unsigned long vaddr, int write,
-unsigned long *paddr, int *pageshift)
-{
-   struct page *page;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
-   return -EFAULT;
-   *paddr = page_to_phys(page);
-   put_page(page);
-   return 0;
-}
-
-/*
- * atomic_pte_lookup
+ * mmap_sem is already held on entry to this function. This guarantees
+ * existence of the page tables.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
- * ZZZ - hugepage support is incomplete
- *
- * NOTE: mmap_sem is already held on entry to this function. This
- * guarantees existence of the page tables.
+ * ZZZ - hugepage support is incomplete.
  */
-static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
-   int write, unsigned long *paddr, int *pageshift)
-{
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
-
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
-
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)))))
-   return 1;
-
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   return 0;
-
-err:
-   return 1;
-}
-
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
int write, int atomic, unsigned long *gpa, int *pageshift)
 {
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
unsigned long paddr;
-   int ret, ps;
+   int ret;
+   struct page *page;
 
vma = find_vma(mm, vaddr);
if (!vma)
@@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
unsigned long vaddr,
 
/*
 * Atomic lookup is faster & usually works even if called in non-atomic
-* context.
+* context. get_user_pages_fast does atomic lookup before falling back 
to
+* slow gup.
 */
rmb();  /* Must/check ms_range_active before loading PTEs */
-   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
-   if (ret) {
-   if (atomic)
+   if (atomic) {
+  

[Linux-kernel-mentees][PATCH v4 0/1] get_user_pages changes

2019-08-08 Thread Bharath Vedartham
In this 4th version of the patch series, I have compressed the patches
of the v2 patch series into one patch. This was suggested by Christoph Hellwig.
The suggestion was to remove the pte_lookup functions and use the 
get_user_pages* functions directly instead of the pte_lookup functions.

There is nothing different in this series compared to the previous
series, It essentially compresses the 3 patches of the original series
into one patch.

This series survives a compile test.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

-- 
2.7.4



[Linux-kernel-mentees][PATCH v4 1/1] sgi-gru: Remove *pte_lookup functions

2019-07-30 Thread Bharath Vedartham
The *pte_lookup functions can be removed and be easily replaced with
get_user_pages_fast functions. In the case of atomic lookup,
__get_user_pages_fast is used which does not fall back to slow
get_user_pages. get_user_pages_fast on the other hand tries to use
__get_user_pages_fast but falls back to slow get_user_pages if
__get_user_pages_fast fails.

Also unnecessary ifdefs to check for CONFIG_HUGETLB are removed as the
check is redundant.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel-ment...@lists.linuxfoundation.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
This is a fold of the 3 patches in the v2 patch series.
The review tags were given to the individual patches.

Changes since v3
- Used gup flags in get_user_pages_fast rather than
boolean flags.
---
 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..304e9c5 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
 }
 
 /*
- * Atomic (interrupt context) & non-atomic (user context) functions to
- * convert a vaddr into a physical address. The size of the page
- * is returned in pageshift.
- * returns:
- *   0 - successful
- * < 0 - error code
- *   1 - (atomic only) try again in non-atomic context
- */
-static int non_atomic_pte_lookup(struct vm_area_struct *vma,
-unsigned long vaddr, int write,
-unsigned long *paddr, int *pageshift)
-{
-   struct page *page;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
-   return -EFAULT;
-   *paddr = page_to_phys(page);
-   put_page(page);
-   return 0;
-}
-
-/*
- * atomic_pte_lookup
+ * mmap_sem is already held on entry to this function. This guarantees
+ * existence of the page tables.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
- * ZZZ - hugepage support is incomplete
- *
- * NOTE: mmap_sem is already held on entry to this function. This
- * guarantees existence of the page tables.
+ * ZZZ - hugepage support is incomplete.
  */
-static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
-   int write, unsigned long *paddr, int *pageshift)
-{
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
-
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
-
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)))))
-   return 1;
-
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   return 0;
-
-err:
-   return 1;
-}
-
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
int write, int atomic, unsigned long *gpa, int *pageshift)
 {
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
unsigned long paddr;
-   int ret, ps;
+   int ret;
+   struct page *page;
 
vma = find_vma(mm, vaddr);
if (!vma)
@@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
unsigned long vaddr,
 
/*
 * Atomic lookup is faster & usually works even if called in non-atomic
-* context.
+* context. get_user_pages_fast does atomic lookup before falling back 
to
+* slow gup.
 */
rmb();  /* Must/check ms_range_active before loading PTEs */
-   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
-   if (ret) {
-   if (atomic)
+   if (atomic) {
+  

[Linux-kernel-mentees][PATCH v4 0/1] get_user_pages changes

2019-07-30 Thread Bharath Vedartham
In this 4th version of the patch series, I have compressed the patches
of the v2 patch series into one patch. This was suggested by Christoph Hellwig.
The suggestion was to remove the pte_lookup functions and use the 
get_user_pages* functions directly instead of the pte_lookup functions.

There is nothing different in this series compared to the previous
series, It essentially compresses the 3 patches of the original series
into one patch.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 112 +---
 1 file changed, 24 insertions(+), 88 deletions(-)

-- 
2.7.4



Re: [PATCH v3 1/1] sgi-gru: Remove *pte_lookup functions

2019-07-30 Thread Bharath Vedartham
On Mon, Jul 29, 2019 at 08:48:42AM +0200, Christoph Hellwig wrote:
> On Sat, Jul 27, 2019 at 01:12:00AM +0530, Bharath Vedartham wrote:
> > +   ret = get_user_pages_fast(vaddr, 1, write, &page);
> 
> I think you want to pass "write ? FOLL_WRITE : 0" here, as
> get_user_pages_fast takes a gup_flags argument, not a boolean
> write flag.

You are right there! I'll send another version correcting this.

Thank you
Bharath


Re: [Linux-kernel-mentees][PATCH v4] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-30 Thread Bharath Vedartham
On Tue, Jul 30, 2019 at 11:36:06AM +0200, Greg KH wrote:
> On Tue, Jul 30, 2019 at 02:58:44PM +0530, Bharath Vedartham wrote:
> > put_page() to put_user_page*()
> 
> What does this mean?

That must have been a mistake! I just wanted to forward this patch to
the Linux-kernel-mentees mailing list. This patch has already been taken
for staging-testing. I'll forward another patch cc'ing just the
mentees mailing list, so as not to disturb the other devs.

Thank you
Bharath


Re: [Linux-kernel-mentees][PATCH v4] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-30 Thread Bharath Vedartham
put_page() to put_user_page*()
In-Reply-To: <1564058658-3551-1-git-send-email-linux.b...@gmail.com>

On Thu, Jul 25, 2019 at 06:14:18PM +0530, Bharath Vedartham wrote:
[Forwarding patch to linux-kernel-mentees mailing list]
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Ira Weiny 
> Cc: John Hubbard 
> Cc: Jérôme Glisse 
> Cc: Greg Kroah-Hartman 
> Cc: Matt Sickler 
> Cc: de...@driverdev.osuosl.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@kvack.org
> Reviewed-by: John Hubbard 
> Signed-off-by: Bharath Vedartham 
> ---
> Changes since v1
> - Improved changelog by John's suggestion.
> - Moved logic to dirty pages below dma_unmap_sg
>  and removed PageReserved check.
> Changes since v2
> - Added back PageReserved check as
> suggested by John Hubbard.
> Changes since v3
> - Changed the changelog as suggested by John.
> - Added John's Reviewed-By tag.
> Changes since v4
> - Rebased the patch on the staging tree.
> - Improved commit log by fixing a line wrap.
> ---
>  drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
>  1 file changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
> b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index 48ca88b..f15e292 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -190,9 +190,7 @@ static int kpc_dma_transfer(struct dev_private_data *priv,
>   sg_free_table(&acd->sgt);
>   err_dma_map_sg:
>   err_alloc_sg_table:
> - for (i = 0 ; i < acd->page_count ; i++) {
> - put_page(acd->user_pages[i]);
> - }
> + put_user_pages(acd->user_pages, acd->page_count);
>   err_get_user_pages:
>   kfree(acd->user_pages);
>   err_alloc_userpages:
> @@ -211,16 +209,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> size_t xfr_count, u32 flags)
>   BUG_ON(acd->ldev == NULL);
>   BUG_ON(acd->ldev->pldev == NULL);
>  
> - for (i = 0 ; i < acd->page_count ; i++) {
> - if (!PageReserved(acd->user_pages[i])) {
> - set_page_dirty(acd->user_pages[i]);
> - }
> - }
> -
>   dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, 
> acd->ldev->dir);
>  
> - for (i = 0 ; i < acd->page_count ; i++) {
> - put_page(acd->user_pages[i]);
> + for (i = 0; i < acd->page_count; i++) {
> + if (!PageReserved(acd->user_pages[i]))
> +   put_user_pages_dirty(&acd->user_pages[i], 1);
> + else
> + put_user_page(acd->user_pages[i]);
>   }
>  
>   sg_free_table(&acd->sgt);
> -- 
> 2.7.4
> 


Re: [PATCH v3 1/1] sgi-gru: Remove *pte_lookup functions

2019-07-28 Thread Bharath Vedartham
On Sat, Jul 27, 2019 at 05:22:28PM +0800, Hillf Danton wrote:
> 
> On Fri, 26 Jul 2019 12:42:26 -0700 (PDT) Bharath Vedartham wrote:
> > 
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > int write, int atomic, unsigned long *gpa, int *pageshift)
> >  {
> > struct mm_struct *mm = gts->ts_mm;
> > struct vm_area_struct *vma;
> > unsigned long paddr;
> > -   int ret, ps;
> > +   int ret;
> > +   struct page *page;
> >  
> > vma = find_vma(mm, vaddr);
> > if (!vma)
> > @@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
> > unsigned long vaddr,
> >  
> > /*
> >  * Atomic lookup is faster & usually works even if called in non-atomic
> > -* context.
> > +* context. get_user_pages_fast does atomic lookup before falling back 
> > to
> > +* slow gup.
> >  */
> > rmb();  /* Must/check ms_range_active before loading PTEs */
> > -   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
> > -   if (ret) {
> > -   if (atomic)
> > +   if (atomic) {
> > +   ret = __get_user_pages_fast(vaddr, 1, write, &page);
> > +   if (!ret)
> > goto upm;
> > -   if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> > +   } else {
> > +   ret = get_user_pages_fast(vaddr, 1, write, &page);
> > +   if (!ret)
> > goto inval;
> > }
> > +
> > +   paddr = page_to_phys(page);
> 
> You may drop find_vma() above if PageHuge(page) makes sense here.
I don't think it does. Hugepage support is still incomplete for this
driver.

Thank you
Bharath
> > +   put_user_page(page);
> > +
> > +   if (unlikely(is_vm_hugetlb_page(vma)))
> > +   *pageshift = HPAGE_SHIFT;
> > +   else
> > +   *pageshift = PAGE_SHIFT;
> > +
> > if (is_gru_paddr(paddr))
> > goto inval;
> > -   paddr = paddr & ~((1UL << ps) - 1);
> > -   *gpa = uv_soc_phys_ram_to_gpa(paddr);
> > -   *pageshift = ps;
> > +   paddr = paddr & ~((1UL << *pageshift) - 1);
> > +   *gpa = uv_soc_phys_ram_to_gpa(paddr);
> > +
> > return VTOP_SUCCESS;
> >  
> >  inval:
> > -- 
> > 2.7.4
> 


[PATCH v3 0/1] get_user_pages changes

2019-07-26 Thread Bharath Vedartham
In this 3rd version of the patch series, I have compressed the patches
of the previous patch series into one patch. This was suggested by Christoph 
Hellwig.
The suggestion was to remove the pte_lookup functions and use the
get_user_pages* functions directly instead.

There is nothing different in this series compared to the previous 
series, It essentially compresses the 3 patches of the original series 
into one patch.

Bharath Vedartham (1):
  sgi-gru: Remove *pte_lookup functions

 drivers/misc/sgi-gru/grufault.c | 114 +---
 1 file changed, 25 insertions(+), 89 deletions(-)

-- 
2.7.4



[PATCH v3 1/1] sgi-gru: Remove *pte_lookup functions

2019-07-26 Thread Bharath Vedartham
The *pte_lookup functions can be removed and be easily replaced with
get_user_pages_fast functions. In the case of atomic lookup,
__get_user_pages_fast is used which does not fall back to slow
get_user_pages. get_user_pages_fast on the other hand tries to use
__get_user_pages_fast but falls back to slow get_user_pages if
__get_user_pages_fast fails.

Also unnecessary ifdefs to check for CONFIG_HUGETLB are removed as the
check is redundant.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: Christoph Hellwig 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Signed-off-by: Bharath Vedartham 
---
This is a fold of the 3 patches in the previous patch series.
The review tags were given to the individual patches.
---
 drivers/misc/sgi-gru/grufault.c | 114 +---
 1 file changed, 25 insertions(+), 89 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..c1258ea 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -166,96 +166,20 @@ static void get_clear_fault_map(struct gru_state *gru,
 }
 
 /*
- * Atomic (interrupt context) & non-atomic (user context) functions to
- * convert a vaddr into a physical address. The size of the page
- * is returned in pageshift.
- * returns:
- *   0 - successful
- * < 0 - error code
- *   1 - (atomic only) try again in non-atomic context
- */
-static int non_atomic_pte_lookup(struct vm_area_struct *vma,
-unsigned long vaddr, int write,
-unsigned long *paddr, int *pageshift)
-{
-   struct page *page;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
-   return -EFAULT;
-   *paddr = page_to_phys(page);
-   put_page(page);
-   return 0;
-}
-
-/*
- * atomic_pte_lookup
+ * mmap_sem is already held on entry to this function. This guarantees
+ * existence of the page tables.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
- * ZZZ - hugepage support is incomplete
- *
- * NOTE: mmap_sem is already held on entry to this function. This
- * guarantees existence of the page tables.
+ * ZZZ - hugepage support is incomplete.
  */
-static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
-   int write, unsigned long *paddr, int *pageshift)
-{
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
-
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
-
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)))))
-   return 1;
-
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
-   return 0;
-
-err:
-   return 1;
-}
-
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
int write, int atomic, unsigned long *gpa, int *pageshift)
 {
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
unsigned long paddr;
-   int ret, ps;
+   int ret;
+   struct page *page;
 
vma = find_vma(mm, vaddr);
if (!vma)
@@ -263,21 +187,33 @@ static int gru_vtop(struct gru_thread_state *gts, 
unsigned long vaddr,
 
/*
 * Atomic lookup is faster & usually works even if called in non-atomic
-* context.
+* context. get_user_pages_fast does atomic lookup before falling back 
to
+* slow gup.
 */
rmb();  /* Must/check ms_range_active before loading PTEs */
-   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
-   if (ret) {
-   if (atomic)
+   if (atomic) {
+   ret = __get_user_pages_fast(vaddr, 1, write, &page);
+   if (!ret)
goto upm;
-   if (non_a

Re: [PATCH 5.2 000/413] 5.2.3-stable review

2019-07-26 Thread Bharath Vedartham
Built and booted on my x86_64 test system. No dmesg regressions.

Thank you
Bharath


Re: [PATCH 5.1 000/371] 5.1.20-stable review

2019-07-26 Thread Bharath Vedartham
Built and booted on my x86_64 test system. No dmesg regressions.

Thank you
Bharath


Re: [PATCH 4.19 000/271] 4.19.61-stable review

2019-07-26 Thread Bharath Vedartham
Built and booted on my x86_64 system. No dmesg regressions.

Thank you
Bharath


[PATCH v4] staging: kpc2000: Convert put_page() to put_user_page*()

2019-07-25 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Matt Sickler 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: John Hubbard 
Signed-off-by: Bharath Vedartham 
---
Changes since v1
- Improved changelog by John's suggestion.
- Moved logic to dirty pages below dma_unmap_sg
 and removed PageReserved check.
Changes since v2
- Added back PageReserved check as
suggested by John Hubbard.
Changes since v3
- Changed the changelog as suggested by John.
- Added John's Reviewed-By tag.
Changes since v4
- Rebased the patch on the staging tree.
- Improved commit log by fixing a line wrap.
---
 drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 48ca88b..f15e292 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -190,9 +190,7 @@ static int kpc_dma_transfer(struct dev_private_data *priv,
sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-   for (i = 0 ; i < acd->page_count ; i++) {
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
kfree(acd->user_pages);
  err_alloc_userpages:
@@ -211,16 +209,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
size_t xfr_count, u32 flags)
BUG_ON(acd->ldev == NULL);
BUG_ON(acd->ldev->pldev == NULL);
 
-   for (i = 0 ; i < acd->page_count ; i++) {
-   if (!PageReserved(acd->user_pages[i])) {
-   set_page_dirty(acd->user_pages[i]);
-   }
-   }
-
dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, 
acd->ldev->dir);
 
-   for (i = 0 ; i < acd->page_count ; i++) {
-   put_page(acd->user_pages[i]);
+   for (i = 0; i < acd->page_count; i++) {
+   if (!PageReserved(acd->user_pages[i]))
+   put_user_pages_dirty(&acd->user_pages[i], 1);
+   else
+   put_user_page(acd->user_pages[i]);
}
 
sg_free_table(&acd->sgt);
-- 
2.7.4



Re: [PATCH v4] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-25 Thread Bharath Vedartham
On Thu, Jul 25, 2019 at 09:46:34AM +0200, Greg KH wrote:
> On Sat, Jul 20, 2019 at 11:02:14PM +0530, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page().
> > 
> > This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d 
> > ("mm: introduce put_user_page*(), placeholder versions").
> 
> Please line-wrap this line.
> 
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Matt Sickler 
> > Cc: de...@driverdev.osuosl.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Reviewed-by: John Hubbard 
> > Signed-off-by: Bharath Vedartham 
> > ---
> > Changes since v1
> >- Improved changelog by John's suggestion.
> >- Moved logic to dirty pages below sg_dma_unmap
> >and removed PageReserved check.
> > Changes since v2
> >- Added back PageReserved check as suggested by John Hubbard.
> > Changes since v3
> >- Changed the commit log as suggested by John.
> >- Added John's Reviewed-By tag
> > 
> > ---
> >  drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
> >  1 file changed, 6 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
> > b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > index 6166587..75ad263 100644
> > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, 
> > struct kiocb *kcb, unsigned
> > sg_free_table(&acd->sgt);
> >   err_dma_map_sg:
> >   err_alloc_sg_table:
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > -   }
> > +   put_user_pages(acd->user_pages, acd->page_count);
> >   err_get_user_pages:
> > kfree(acd->user_pages);
> >   err_alloc_userpages:
> > @@ -221,16 +219,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> > size_t xfr_count, u32 flags)
> > 
> > dev_dbg(&acd->ldev->pldev->dev, "transfer_complete_cb(acd = [%p])\n", 
> > acd);
> > 
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   if (!PageReserved(acd->user_pages[i])){
> > -   set_page_dirty(acd->user_pages[i]);
> > -   }
> > -   }
> > -   
> > dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, 
> > acd->ldev->dir);
> > 
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > +   for (i = 0; i < acd->page_count; i++) {
> > +   if (!PageReserved(acd->user_pages[i]))
> > +   put_user_pages_dirty(&acd->user_pages[i], 1);
> > +   else
> > +   put_user_page(acd->user_pages[i]);
> > }
> > 
> > sg_free_table(&acd->sgt);
> > -- 
> > 2.7.4
> 
> This patch can not be applied at all :(
> 
> Can you redo it against the latest staging-next branch and resend?
> 
> thanks,
Yup. Will do that!
> greg k-h


Re: [PATCH v2 3/3] sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

2019-07-24 Thread Bharath Vedartham
On Wed, Jul 24, 2019 at 09:09:29AM -0700, Christoph Hellwig wrote:
> I think the atomic_pte_lookup / non_atomic_pte_lookup helpers
> should simply go away.  Most of the setup code is common now and should
> be in the caller where it can be shared.  Then just do a:
> 
>   if (atomic) {
>   __get_user_pages_fast()
>   } else {
>   get_user_pages_fast();
>   }
> 
> and we actually have an easy to understand piece of code.

That makes sense. I'll do that and send v3. I'll probably cut it down by a
patch and fold all the changes into a single patch removing the
*pte_lookup helpers.


[PATCH v2 3/3] sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

2019-07-24 Thread Bharath Vedartham
The *pte_lookup functions get the physical address for a given virtual
address by pinning the backing page with gup and then calling
page_to_phys on that page.

Currently, atomic_pte_lookup manually walks the page tables. If this
function fails to get a physical page, the caller falls back to
non_atomic_pte_lookup, which uses the slow gup path to get the
physical page.

Instead of manually walking the page tables use __get_user_pages_fast
which does the same thing and it does not fall back to the slow gup
path.

Also, atomic_pte_lookup's return value has been changed to a boolean.
The callsites have been appropriately modified.

This is largely inspired from kvm code. kvm uses __get_user_pages_fast
in hva_to_pfn_fast function which can run in an atomic context.
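
The KVM pattern referred to looks roughly like this (a paraphrase of the
hva_to_pfn_fast idea rather than the exact KVM code; pin_page_fast is a
made-up name for illustration):

	/* IRQ-safe attempt; on failure the caller retries via slow gup */
	static bool pin_page_fast(unsigned long addr, int write,
				  unsigned long *pfn)
	{
		struct page *page;

		if (__get_user_pages_fast(addr, 1, write, &page) != 1)
			return false;

		*pfn = page_to_pfn(page);
		put_page(page);
		return true;
	}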

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: Ira Weiny 
Signed-off-by: Bharath Vedartham 
---
Changes since v2
- Modified the return value of atomic_pte_lookup
to use booleans rather than numeric values.
This was suggested by John Hubbard.
---
 drivers/misc/sgi-gru/grufault.c | 56 +++--
 1 file changed, 15 insertions(+), 41 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index bce47af..da2d2cc 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -193,9 +193,11 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
*vma,
 }
 
 /*
- * atomic_pte_lookup
+ * atomic_pte_lookup() - Convert a user virtual address 
+ * to a physical address.
+ * @Return: true for success, false for failure. Failure means that
+ * the page could not be pinned via gup fast.
  *
- * Convert a user virtual address to a physical address
  * Only supports Intel large pages (2MB only) on x86_64.
  * ZZZ - hugepage support is incomplete
  *
@@ -205,49 +207,20 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
*vma,
 static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
int write, unsigned long *paddr, int *pageshift)
 {
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
-
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
-
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)))))
-   return 1;
-
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
+   struct page *page;
 
if (unlikely(is_vm_hugetlb_page(vma)))
*pageshift = HPAGE_SHIFT;
else
*pageshift = PAGE_SHIFT;
 
-   return 0;
+   if (!__get_user_pages_fast(vaddr, 1, write, &page))
+   return false;
 
-err:
-   return 1;
+   *paddr = page_to_phys(page);
+   put_user_page(page);
+
+   return true;
 }
 
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
@@ -256,7 +229,8 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned 
long vaddr,
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
unsigned long paddr;
-   int ret, ps;
+   int ps;
+   bool success;
 
vma = find_vma(mm, vaddr);
if (!vma)
@@ -267,8 +241,8 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned 
long vaddr,
 * context.
 */
rmb();  /* Must/check ms_range_active before loading PTEs */
-   ret = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
-   if (ret) {
+   success = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
+   if (!success) {
if (atomic)
goto upm;
if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
-- 
2.7.4



[PATCH v2 2/3] sgi-gru: Remove CONFIG_HUGETLB_PAGE ifdef

2019-07-24 Thread Bharath Vedartham
is_vm_hugetlb_page has checks for whether CONFIG_HUGETLB_PAGE is defined
or not. If CONFIG_HUGETLB_PAGE is not defined, is_vm_hugetlb_page will
always return false. There is no need to have an unnecessary
CONFIG_HUGETLB_PAGE check in the code.
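
That redundancy is visible in the helper itself (paraphrasing
include/linux/hugetlb_inline.h; see the header for the exact definition):

	#ifdef CONFIG_HUGETLB_PAGE
	static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
	{
		return !!(vma->vm_flags & VM_HUGETLB);
	}
	#else
	static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
	{
		return false;
	}
	#endif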

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: John Hubbard 
Reviewed-by: William Kucharski 
Reviewed-by: Ira Weiny 
Signed-off-by: Bharath Vedartham 
---
Changes since v2
- Added an 'unlikely' if statement as suggested by William
  Kucharski. This is because of the fact that most systems
  using this driver won't have CONFIG_HUGETLB_PAGE enabled and we
  optimize the branch with an unlikely.

Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 61b3447..bce47af 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -180,11 +180,11 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
*vma,
 {
struct page *page;
 
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+   if (unlikely(is_vm_hugetlb_page(vma)))
+   *pageshift = HPAGE_SHIFT;
+   else
+   *pageshift = PAGE_SHIFT;
+
if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
@@ -238,11 +238,12 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, 
unsigned long vaddr,
return 1;
 
*paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+
+   if (unlikely(is_vm_hugetlb_page(vma)))
+   *pageshift = HPAGE_SHIFT;
+   else
+   *pageshift = PAGE_SHIFT;
+
return 0;
 
 err:
-- 
2.7.4



[PATCH v2 1/3] sgi-gru: Convert put_page() to get_user_page*()

2019-07-24 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: William Kucharski 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: Ira Weiny 
Reviewed-by: John Hubbard 
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..61b3447 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
-   put_page(page);
+   put_user_page(page);
return 0;
 }
 
-- 
2.7.4



[PATCH v2 0/3] sgi-gru: get_user_page changes

2019-07-24 Thread Bharath Vedartham
This is version 2 of the patch series with a few non-functional changes.
Changes are described in the individual changelog.

This patch series incorporates a few changes in the get_user_page usage
of sgi-gru.

The main change is the first patch, which is a trivial one line change to
convert put_page to put_user_page to enable tracking of get_user_pages.

The second patch removes an unnecessary ifdef of CONFIG_HUGETLB.

The third patch adds __get_user_pages_fast in atomic_pte_lookup to retrieve
a physical user page in an atomic context instead of manually walking
the page tables like the current code does. This patch should be subject to
more review from the gup people.

drivers/misc/sgi-gru/* builds after this patch series. But I do not have the
hardware to verify these changes.

The first patch implements gup tracking in the current code. This is to be
tested to check whether gup tracking works properly. Currently, in the
upstream kernels, put_user_page simply calls put_page. But that is to
change in the future.
Any suggestions as to how to test this code?

The implementation of gup tracking is in:
https://github.com/johnhubbard/linux/tree/gup_dma_core

We could test it by applying the first patch to the above tree and test it.

More details are in the individual changelogs.
Bharath Vedartham (3):
  sgi-gru: Convert put_page() to get_user_page*()
  sgi-gru: Remove CONFIG_HUGETLB_PAGE ifdef
  sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

 drivers/misc/sgi-gru/grufault.c | 73 ++---
 1 file changed, 24 insertions(+), 49 deletions(-)

-- 
2.7.4



Re: [PATCH 3/3] sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

2019-07-22 Thread Bharath Vedartham
On Sun, Jul 21, 2019 at 07:32:36PM -0700, John Hubbard wrote:
> On 7/21/19 8:58 AM, Bharath Vedartham wrote:
> > *pte_lookup functions get the physical address for a given virtual
> > address by getting a physical page using gup and use page_to_phys to get
> > the physical address.
> > 
> > Currently, atomic_pte_lookup manually walks the page tables. If this
> > function fails to get a physical page, it will fall back too
> > non_atomic_pte_lookup to get a physical page which uses the slow gup
> > path to get the physical page.
> > 
> > Instead of manually walking the page tables use __get_user_pages_fast
> > which does the same thing and it does not fall back to the slow gup
> > path.
> > 
> > This is largely inspired from kvm code. kvm uses __get_user_pages_fast
> > in hva_to_pfn_fast function which can run in an atomic context.
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Signed-off-by: Bharath Vedartham 
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 39 
> > +--
> >  1 file changed, 5 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c 
> > b/drivers/misc/sgi-gru/grufault.c
> > index 75108d2..121c9a4 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -202,46 +202,17 @@ static int non_atomic_pte_lookup(struct 
> > vm_area_struct *vma,
> >  static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long 
> > vaddr,
> > int write, unsigned long *paddr, int *pageshift)
> >  {
> > -   pgd_t *pgdp;
> > -   p4d_t *p4dp;
> > -   pud_t *pudp;
> > -   pmd_t *pmdp;
> > -   pte_t pte;
> > -
> > -   pgdp = pgd_offset(vma->vm_mm, vaddr);
> > -   if (unlikely(pgd_none(*pgdp)))
> > -   goto err;
> > -
> > -   p4dp = p4d_offset(pgdp, vaddr);
> > -   if (unlikely(p4d_none(*p4dp)))
> > -   goto err;
> > -
> > -   pudp = pud_offset(p4dp, vaddr);
> > -   if (unlikely(pud_none(*pudp)))
> > -   goto err;
> > +   struct page *page;
> >  
> > -   pmdp = pmd_offset(pudp, vaddr);
> > -   if (unlikely(pmd_none(*pmdp)))
> > -   goto err;
> > -#ifdef CONFIG_X86_64
> > -   if (unlikely(pmd_large(*pmdp)))
> > -   pte = *(pte_t *) pmdp;
> > -   else
> > -#endif
> > -   pte = *pte_offset_kernel(pmdp, vaddr);
> > +   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> >  
> > -   if (unlikely(!pte_present(pte) ||
> > -(write && (!pte_write(pte) || !pte_dirty(pte)))))
> > +   if (!__get_user_pages_fast(vaddr, 1, write, &page))
> > return 1;
> 
> Let's please use numeric, not boolean comparison, for the return value of 
> gup.
Alright then! I'll resubmit it!
> Also, optional: as long as you're there, atomic_pte_lookup() ought to
> either return a bool (true == success) or an errno, rather than a
> numeric zero or one.
That makes sense. But gru_vtop, which calls atomic_pte_lookup, treats a
return value of 0 as success and a non-zero value as failure. That's
why I did not mess with the return values in this code. It would require
some change in the driver functionality which I am not ready to do :(
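
Concretely, after the change to a boolean, the call site convention reads
like this (as the v2 3/3 diff elsewhere in this archive ends up writing it):

	success = atomic_pte_lookup(vma, vaddr, write, &paddr, &ps);
	if (!success) {
		if (atomic)
			goto upm;
		if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
			goto inval;
	}
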
> Other than that, this looks like a good cleanup, I wonder how many
> open-coded gup implementations are floating around like this. 
I'll be on the lookout!
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 
> >  
> > -   *paddr = pte_pfn(pte) << PAGE_SHIFT;
> > -
> > -   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > +   *paddr = page_to_phys(page);
> > +   put_user_page(page);
> >  
> > return 0;
> > -
> > -err:
> > -   return 1;
> >  }
> >  
> >  static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> > 


Re: [PATCH 2/3] sgi-gru: Remove CONFIG_HUGETLB_PAGE ifdef

2019-07-22 Thread Bharath Vedartham
On Sun, Jul 21, 2019 at 09:20:38PM -0600, William Kucharski wrote:
> I suspect I'm being massively pedantic here, but the comments for 
> atomic_pte_lookup() note:
> 
>  * Only supports Intel large pages (2MB only) on x86_64.
>  *ZZZ - hugepage support is incomplete
> 
> That makes me wonder how many systems using this hardware are actually 
> configured with CONFIG_HUGETLB_PAGE.
> 
> I ask as in the most common case, this is likely introducing a few extra 
> instructions and possibly an additional branch to a routine that is called 
> per-fault.
> 
> So the nit-picky questions are:
> 
> 1) Does the code really need to be cleaned up in this way?
> 
> 2) If it does, does it make more sense (given the way pmd_large() is handled 
> now in atomic_pte_lookup()) for this to be coded as:
> 
> if (unlikely(is_vm_hugetlb_page(vma)))
>   *pageshift = HPAGE_SHIFT;
> else
>   *pageshift = PAGE_SHIFT;
> 
> In all likelihood, these questions are no-ops, and the optimizer may even 
> make my questions completely moot, but I thought I might as well ask anyway.
> 
That sounds reasonable. I am not really sure how much of
an improvement it would be; the condition will be evaluated either way,
AFAIK. Either way, the ternary operator does not look good. I'll make a
version 2 of this.
> > On Jul 21, 2019, at 9:58 AM, Bharath Vedartham  wrote:
> > 
> > is_vm_hugetlb_page has checks for whether CONFIG_HUGETLB_PAGE is defined
> > or not. If CONFIG_HUGETLB_PAGE is not defined is_vm_hugetlb_page will
> > always return false. There is no need to have an unnecessary
> > CONFIG_HUGETLB_PAGE check in the code.
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Signed-off-by: Bharath Vedartham 
> > ---
> > drivers/misc/sgi-gru/grufault.c | 11 +++
> > 1 file changed, 3 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/misc/sgi-gru/grufault.c 
> > b/drivers/misc/sgi-gru/grufault.c
> > index 61b3447..75108d2 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -180,11 +180,8 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
> > *vma,
> > {
> > struct page *page;
> > 
> > -#ifdef CONFIG_HUGETLB_PAGE
> > *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -   *pageshift = PAGE_SHIFT;
> > -#endif
> > +
> > if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > return -EFAULT;
> > *paddr = page_to_phys(page);
> > @@ -238,11 +235,9 @@ static int atomic_pte_lookup(struct vm_area_struct 
> > *vma, unsigned long vaddr,
> > return 1;
> > 
> > *paddr = pte_pfn(pte) << PAGE_SHIFT;
> > -#ifdef CONFIG_HUGETLB_PAGE
> > +
> > *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
> > -#else
> > -   *pageshift = PAGE_SHIFT;
> > -#endif
> > +
> > return 0;
> > 
> > err:
> > -- 
> > 2.7.4
> > 
> 


Re: [PATCH 1/3] sgi-gru: Convert put_page() to get_user_page*()

2019-07-22 Thread Bharath Vedartham
On Sun, Jul 21, 2019 at 07:25:31PM -0700, John Hubbard wrote:
> On 7/21/19 8:58 AM, Bharath Vedartham wrote:
> > For pages that were retained via get_user_pages*(), release those pages
> > via the new put_user_page*() routines, instead of via put_page().
> > 
> > This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> > ("mm: introduce put_user_page*(), placeholder versions").
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Dimitri Sivanich 
> > Cc: Arnd Bergmann 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Signed-off-by: Bharath Vedartham 
> > ---
> >  drivers/misc/sgi-gru/grufault.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> 
> Reviewed-by: John Hubbard 
Thanks! 
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 
> > diff --git a/drivers/misc/sgi-gru/grufault.c 
> > b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..61b3447 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
> > *vma,
> > if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > return -EFAULT;
> > *paddr = page_to_phys(page);
> > -   put_page(page);
> > +   put_user_page(page);
> > return 0;
> >  }
> >  
> > 


[PATCH 3/3] sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

2019-07-21 Thread Bharath Vedartham
The *pte_lookup functions get the physical address for a given virtual
address by pinning the backing page with gup and then calling
page_to_phys on that page.

Currently, atomic_pte_lookup manually walks the page tables. If this
function fails to get a physical page, the caller falls back to
non_atomic_pte_lookup, which uses the slow gup path to get the
physical page.

Instead of manually walking the page tables use __get_user_pages_fast
which does the same thing and it does not fall back to the slow gup
path.

This is largely inspired from kvm code. kvm uses __get_user_pages_fast
in hva_to_pfn_fast function which can run in an atomic context.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 39 +--
 1 file changed, 5 insertions(+), 34 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 75108d2..121c9a4 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -202,46 +202,17 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
*vma,
 static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
int write, unsigned long *paddr, int *pageshift)
 {
-   pgd_t *pgdp;
-   p4d_t *p4dp;
-   pud_t *pudp;
-   pmd_t *pmdp;
-   pte_t pte;
-
-   pgdp = pgd_offset(vma->vm_mm, vaddr);
-   if (unlikely(pgd_none(*pgdp)))
-   goto err;
-
-   p4dp = p4d_offset(pgdp, vaddr);
-   if (unlikely(p4d_none(*p4dp)))
-   goto err;
-
-   pudp = pud_offset(p4dp, vaddr);
-   if (unlikely(pud_none(*pudp)))
-   goto err;
+   struct page *page;
 
-   pmdp = pmd_offset(pudp, vaddr);
-   if (unlikely(pmd_none(*pmdp)))
-   goto err;
-#ifdef CONFIG_X86_64
-   if (unlikely(pmd_large(*pmdp)))
-   pte = *(pte_t *) pmdp;
-   else
-#endif
-   pte = *pte_offset_kernel(pmdp, vaddr);
+   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
 
-   if (unlikely(!pte_present(pte) ||
-(write && (!pte_write(pte) || !pte_dirty(pte)))))
+   if (!__get_user_pages_fast(vaddr, 1, write, &page))
return 1;
 
-   *paddr = pte_pfn(pte) << PAGE_SHIFT;
-
-   *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
+   *paddr = page_to_phys(page);
+   put_user_page(page);
 
return 0;
-
-err:
-   return 1;
 }
 
 static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
-- 
2.7.4



[PATCH 0/3] sgi-gru: get_user_page changes

2019-07-21 Thread Bharath Vedartham
This patch series incorporates a few changes in the get_user_page usage 
of sgi-gru.

The main change is the first patch, which is a trivial one line change to 
convert put_page to put_user_page to enable tracking of get_user_pages.

The second patch removes an unnecessary ifdef of CONFIG_HUGETLB.

The third patch adds __get_user_pages_fast in atomic_pte_lookup to retrieve
a physical user page in an atomic context instead of manually walking
the page tables like the current code does. This patch should be subject to 
more review from the gup people.

drivers/misc/sgi-gru/* builds after this patch series. But I do not have the 
hardware to verify these changes. 

The first patch implements gup tracking in the current code. This is to be
tested to check whether gup tracking works properly. Currently, in the
upstream kernels, put_user_page simply calls put_page. But that is to
change in the future.
Any suggestions as to how to test this code?

The implementation of gup tracking is in:
https://github.com/johnhubbard/linux/tree/gup_dma_core

We could test it by applying the first patch to the above tree and test it.

More details are in the individual changelogs.

Bharath Vedartham (3):
  sgi-gru: Convert put_page() to get_user_page*()
  sgi-gru: Remove CONFIG_HUGETLB_PAGE ifdef
  sgi-gru: Use __get_user_pages_fast in atomic_pte_lookup

 drivers/misc/sgi-gru/grufault.c | 50 +++--
 1 file changed, 8 insertions(+), 42 deletions(-)

-- 
2.7.4



[PATCH 2/3] sgi-gru: Remove CONFIG_HUGETLB_PAGE ifdef

2019-07-21 Thread Bharath Vedartham
is_vm_hugetlb_page has checks for whether CONFIG_HUGETLB_PAGE is defined
or not. If CONFIG_HUGETLB_PAGE is not defined, is_vm_hugetlb_page will
always return false. There is no need to have an unnecessary
CONFIG_HUGETLB_PAGE check in the code.

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 61b3447..75108d2 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -180,11 +180,8 @@ static int non_atomic_pte_lookup(struct vm_area_struct 
*vma,
 {
struct page *page;
 
-#ifdef CONFIG_HUGETLB_PAGE
*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+
if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
@@ -238,11 +235,9 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, 
unsigned long vaddr,
return 1;
 
*paddr = pte_pfn(pte) << PAGE_SHIFT;
-#ifdef CONFIG_HUGETLB_PAGE
+
*pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
-#else
-   *pageshift = PAGE_SHIFT;
-#endif
+
return 0;
 
 err:
-- 
2.7.4



[PATCH 1/3] sgi-gru: Convert put_page() to get_user_page*()

2019-07-21 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Dimitri Sivanich 
Cc: Arnd Bergmann 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Bharath Vedartham 
---
 drivers/misc/sgi-gru/grufault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 4b713a8..61b3447 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
-   put_page(page);
+   put_user_page(page);
return 0;
 }
 
-- 
2.7.4



Re: [PATCH v3] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-20 Thread Bharath Vedartham
On Fri, Jul 19, 2019 at 08:59:02PM +, Matt Sickler wrote:
> >From: Bharath Vedartham 
> >Changes since v2
> >- Added back PageResevered check as suggested by John Hubbard.
> >
> >The PageReserved check needs a closer look and is not worth messing
> >around with for now.
> >
> >Matt, Could you give any suggestions for testing this patch?
> 
> Myself or someone else from Daktronics would have to do the testing since the
> hardware isn't really commercially available.  I've been toying with the idea
> of asking for a volunteer from the mailing list to help me out with this - I'd
> send them some hardware and they'd do all the development and testing. :)
> I still have to run that idea by Management though.
> 
> >If in-case, you are willing to pick this up to test. Could you
> >apply this patch to this tree and test it with your devices?
> 
> I've been meaning to get to testing the changes to the drivers since 
> upstreaming
> them, but I've been swamped with other development.  I'm keeping an eye on the
> mailing lists, so I'm at least aware of what is coming down the pipe.
> I'm not too worried about this specific change, even though I don't really 
> know
> if the reserved check and the dirtying are even necessary.
> It sounded like John's suggestion was to not do the PageReserved() check and 
> just
> use put_user_pages_dirty() all the time.  John, is that incorrect?
The change is fairly trivial in the upstream kernel and requires no
testing there. It would be great if you could test it on John's git tree,
which implements the gup tracking subsystem, and check that gup tracking
works correctly with your DMA driver. I think this patch will apply
cleanly to John's git tree.

Thanks!
Bharath


[PATCH v4] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-20 Thread Bharath Vedartham
For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d ("mm: 
introduce put_user_page*(), placeholder versions").

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Matt Sickler 
Cc: de...@driverdev.osuosl.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Reviewed-by: John Hubbard 
Signed-off-by: Bharath Vedartham 
---
Changes since v1
   - Improved changelog by John's suggestion.
   - Moved the page-dirtying logic below dma_unmap_sg
   and removed PageReserved check.
Changes since v2
   - Added back PageReserved check as suggested by John Hubbard.
Changes since v3
   - Changed the commit log as suggested by John.
   - Added John's Reviewed-By tag

---
 drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 6166587..75ad263 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct 
kiocb *kcb, unsigned
	sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
kfree(acd->user_pages);
  err_alloc_userpages:
@@ -221,16 +219,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
size_t xfr_count, u32 flags)

	dev_dbg(&acd->ldev->pldev->dev, "transfer_complete_cb(acd = [%p])\n", acd);

-   for (i = 0 ; i < acd->page_count ; i++){
-   if (!PageReserved(acd->user_pages[i])){
-   set_page_dirty(acd->user_pages[i]);
-   }
-   }
-   
	dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);

-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
+   for (i = 0; i < acd->page_count; i++) {
+   if (!PageReserved(acd->user_pages[i]))
+   put_user_pages_dirty(&acd->user_pages[i], 1);
+   else
+   put_user_page(acd->user_pages[i]);
}

	sg_free_table(&acd->sgt);
-- 
2.7.4



Re: [PATCH v3] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-20 Thread Bharath Vedartham
On Fri, Jul 19, 2019 at 02:28:39PM -0700, John Hubbard wrote:
> On 7/19/19 1:02 PM, Bharath Vedartham wrote:
> > There have been issues with coordination of various subsystems using
> > get_user_pages. These issues are better described in [1].
> > 
> > An implementation of tracking get_user_pages is currently underway.
> > The implementation requires the use of put_user_page*() variants to
> > release a reference rather than put_page(). put_user_page*() was
> > introduced in commit fc1d8e7cca2daa18d2fe56b94874848adf89d7f5 ("mm:
> > introduce put_user_page*(), placeholder versions").
> > 
> > put_user_page() currently simply calls put_page(). But in the future it
> > will change to add a mechanism for tracking get_user_pages. Once a
> > tracking mechanism is implemented, we can work on improving coordination
> > between the various subsystems using get_user_pages.
> > 
> > [1] https://lwn.net/Articles/753027/
> 
> Optional: I've been fussing about how to keep the change log reasonable,
> and finally came up with the following recommended template for these 
> conversion patches. This would replace the text you have above, because the 
> put_user_page placeholder commit has all the documentation (and then some) 
> that we need:
> 
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
Great then, I ll send another patch with the updated changelog.
> 
> For the change itself, you will need to rebase it onto the latest 
> linux.git, as it doesn't quite apply there. 
> 
> Testing is good if we can get it, but as far as I can tell this is
> correct, so you can also add:
> 
> Reviewed-by: John Hubbard 
Thanks! 
> thanks,
> -- 
> John Hubbard
> NVIDIA
>
> > 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Greg Kroah-Hartman 
> > Cc: Matt Sickler 
> > Cc: de...@driverdev.osuosl.org 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux...@kvack.org
> > Signed-off-by: Bharath Vedartham 
> > ---
> > Changes since v1
> > - Improved changelog by John's suggestion.
> > - Moved the page-dirtying logic below dma_unmap_sg
> > and removed PageReserved check.
> > Changes since v2
> > - Added back PageReserved check as suggested by John Hubbard.
> > 
> > The PageReserved check needs a closer look and is not worth messing
> > around with for now.
> > 
> > Matt, Could you give any suggestions for testing this patch?
> > 
> > If in-case, you are willing to pick this up to test. Could you
> > apply this patch to this tree
> > https://github.com/johnhubbard/linux/tree/gup_dma_core
> > and test it with your devices?
> > 
> > ---
> >  drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
> >  1 file changed, 6 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
> > b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > index 6166587..75ad263 100644
> > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, 
> > struct kiocb *kcb, unsigned
> > sg_free_table(>sgt);
> >   err_dma_map_sg:
> >   err_alloc_sg_table:
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > -   }
> > +   put_user_pages(acd->user_pages, acd->page_count);
> >   err_get_user_pages:
> > kfree(acd->user_pages);
> >   err_alloc_userpages:
> > @@ -221,16 +219,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> > size_t xfr_count, u32 flags)
> > 
> > dev_dbg(>ldev->pldev->dev, "transfer_complete_cb(acd = [%p])\n", 
> > acd);
> > 
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   if (!PageReserved(acd->user_pages[i])){
> > -   set_page_dirty(acd->user_pages[i]);
> > -   }
> > -   }
> > -   
> > dma_unmap_sg(>ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, 
> > acd->ldev->dir);
> > 
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > +   for (i = 0; i < acd->page_count; i++) {
> > +   if (!PageReserved(acd->user_pages[i]))
> > +   put_user_pages_dirty(>user_pages[i], 1);
> > +   else
> > +   put_user_page(acd->user_pages[i]);
> > }
> > 
> > sg_free_table(>sgt);
> > 


[PATCH v3] staging: kpc2000: Convert put_page to put_user_page*()

2019-07-19 Thread Bharath Vedartham
There have been issues with coordination of various subsystems using
get_user_pages. These issues are better described in [1].

An implementation of tracking get_user_pages is currently underway.
The implementation requires the use of put_user_page*() variants to release
a reference rather than put_page(). put_user_page*() was introduced in
commit fc1d8e7cca2daa18d2fe56b94874848adf89d7f5 ("mm: introduce
put_user_page*(), placeholder versions").

put_user_page() currently simply calls put_page(). But in the future it
will change to add a mechanism for tracking get_user_pages. Once a
tracking mechanism is implemented, we can work on improving coordination
between the various subsystems using get_user_pages.
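
For reference, the placeholder added by that commit is roughly
(paraphrasing include/linux/mm.h):

	/* For now a pass-through; later this will do pin accounting. */
	static inline void put_user_page(struct page *page)
	{
		put_page(page);
	}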

[1] https://lwn.net/Articles/753027/

Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Greg Kroah-Hartman 
Cc: Matt Sickler 
Cc: de...@driverdev.osuosl.org 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Bharath Vedartham 
---
Changes since v1
- Improved changelog by John's suggestion.
- Moved the page-dirtying logic below dma_unmap_sg
and removed PageReserved check.
Changes since v2
- Added back PageReserved check as suggested by John Hubbard.

The PageReserved check needs a closer look and is not worth messing
around with for now.

Matt, Could you give any suggestions for testing this patch?

If in-case, you are willing to pick this up to test. Could you
apply this patch to this tree
https://github.com/johnhubbard/linux/tree/gup_dma_core
and test it with your devices?

---
 drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 6166587..75ad263 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct 
kiocb *kcb, unsigned
	sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
kfree(acd->user_pages);
  err_alloc_userpages:
@@ -221,16 +219,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
size_t xfr_count, u32 flags)

	dev_dbg(&acd->ldev->pldev->dev, "transfer_complete_cb(acd = [%p])\n", acd);

-   for (i = 0 ; i < acd->page_count ; i++){
-   if (!PageReserved(acd->user_pages[i])){
-   set_page_dirty(acd->user_pages[i]);
-   }
-   }
-   
	dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);

-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
+   for (i = 0; i < acd->page_count; i++) {
+   if (!PageReserved(acd->user_pages[i]))
+   put_user_pages_dirty(&acd->user_pages[i], 1);
+   else
+   put_user_page(acd->user_pages[i]);
}

	sg_free_table(&acd->sgt);
-- 
2.7.4



Re: [PATCH 4.14 00/80] 4.14.134-stable review

2019-07-18 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regressions.

Thank you
Bharath


Re: [PATCH 4.4 00/40] 4.4.186-stable review

2019-07-18 Thread Bharath Vedartham
Built and booted on my x86 system. No dmesg regressions found.

Thank you
Bharath


Re: [PATCH 5.1 00/54] 5.1.19-stable review

2019-07-18 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regressions found.

Thank you
Bharath


Re: [PATCH 4.9 00/54] 4.9.186-stable review

2019-07-18 Thread Bharath Vedartham
Built and booted in my x86 test machine. No regressions found.

Thank you
Bharath


Re: [PATCH] staging: kpc2000: Convert put_page() to put_user_page*()

2019-07-16 Thread Bharath Vedartham
On Mon, Jul 15, 2019 at 03:01:43PM -0700, John Hubbard wrote:
> On 7/15/19 2:47 PM, Matt Sickler wrote:
> > It looks like Outlook is going to absolutely trash this email.  Hopefully 
> > it comes through okay.
> > 
> ...
> >>
> >> Because this is a common pattern, and because the code here doesn't likely
> >> need to set page dirty before the dma_unmap_sg call, I think the following
> >> would be better (it's untested), instead of the above diff hunk:
> >>
> >> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> b/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> index 48ca88bc6b0b..d486f9866449 100644
> >> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> @@ -211,16 +211,13 @@ void  transfer_complete_cb(struct aio_cb_data
> >> *acd, size_t xfr_count, u32 flags)
> >>BUG_ON(acd->ldev == NULL);
> >>BUG_ON(acd->ldev->pldev == NULL);
> >>
> >> -   for (i = 0 ; i < acd->page_count ; i++) {
> >> -   if (!PageReserved(acd->user_pages[i])) {
> >> -   set_page_dirty(acd->user_pages[i]);
> >> -   }
> >> -   }
> >> -
> >>dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> >>
> >>for (i = 0 ; i < acd->page_count ; i++) {
> >> -   put_page(acd->user_pages[i]);
> >> +   if (!PageReserved(acd->user_pages[i]))
> >> +   put_user_pages_dirty(&acd->user_pages[i], 1);
> >> +   else
> >> +   put_user_page(acd->user_pages[i]);
> >>}
> >>
> >>sg_free_table(&acd->sgt);
> > 
> > I don't think I ever really knew the right way to do this. 
> > 
> > The changes Bharath suggested look okay to me.  I'm not sure about the 
> > check for PageReserved(), though.  At first glance it appears to be 
> > equivalent to what was there before, but maybe I should learn what that 
> > Reserved page flag really means.
> > From [1], the only comment that seems applicable is
> > * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
> >  *   not marked PG_reserved (as they might be in use by somebody else who 
> > does
> >  *   not respect the caching strategy).
> > 
> > These pages should be coming from anonymous (RAM, not file backed) memory 
> > in userspace.  Sometimes it comes from hugepage backed memory, though I 
> > don't think that makes a difference.  I should note that 
> > transfer_complete_cb handles both RAM to device and device to RAM DMAs, if 
> > that matters.
Yes. file_operations->read passes a userspace buffer which AFAIK is
anonymous memory.
> > [1] 
> > https://elixir.bootlin.com/linux/v5.2/source/include/linux/page-flags.h#L17
> > 
> 
> I agree: the PageReserved check looks unnecessary here, from my 
> outside-the-kpc_2000-team
> perspective, anyway. Assuming that your analysis above is correct, you could 
> collapse that
> whole think into just:
Since the file_operations->read passes a userspace buffer, I doubt that
the pages of the userspace buffer will be reserved.
> @@ -211,17 +209,8 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>  
> -   for (i = 0 ; i < acd->page_count ; i++) {
> -   if (!PageReserved(acd->user_pages[i])) {
> -   set_page_dirty(acd->user_pages[i]);
> -   }
> -   }
> -
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> -
> -   for (i = 0 ; i < acd->page_count ; i++) {
> -   put_page(acd->user_pages[i]);
> -   }
> +   put_user_pages_dirty(&acd->user_pages[i], acd->page_count);
>  
> sg_free_table(&acd->sgt);
>  
> (Also, Matt, I failed to Cc: you on a semi-related cleanup that I just sent 
> out for this
> driver, as long as I have your attention:
> 
>https://lore.kernel.org/r/20190715212123.432-1-jhubb...@nvidia.com
> )
Matt, will you be willing to pick this up for testing, or do you want a
different patch?
> thanks,
> -- 
> John Hubbard
> NVIDIA
Thank you
Bharath


[PATCH v2] staging: kpc2000: Convert put_page() to put_user_page*()

2019-07-15 Thread Bharath Vedartham
There have been issues with get_user_pages and filesystem writeback.
The issues are better described in [1].

The solution being proposed wants to keep track of gup-pinned pages,
which will allow us to take further steps to coordinate between subsystems
using gup.

put_user_page() simply calls put_page inside for now. But the
implementation will change once all call sites of put_page() are
converted.

[1] https://lwn.net/Articles/753027/

Cc: Matt Sickler 
Cc: Greg Kroah-Hartman 
Cc: Jérôme Glisse 
Cc: Ira Weiny 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: de...@driverdev.osuosl.org

Reviewed-by: John Hubbard 
Signed-off-by: Bharath Vedartham 
---
Changes since v1
- Added John's reviewed-by tag
- Moved para talking about testing below
the '---'
- Moved logic of set_page_dirty below dma_unmap_sg
as per John's suggestion

I currently do not have the driver to test. Could I have some
suggestions to test this code? The solution is currently implemented
in https://github.com/johnhubbard/linux/tree/gup_dma_core and it would be great 
if we could apply the patch on top of the repo and run some 
tests to check if any regressions occur.
---
 drivers/staging/kpc2000/kpc_dma/fileops.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 48ca88b..3d1a00a 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -190,9 +190,7 @@ static int kpc_dma_transfer(struct dev_private_data *priv,
	sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-   for (i = 0 ; i < acd->page_count ; i++) {
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
kfree(acd->user_pages);
  err_alloc_userpages:
@@ -211,16 +209,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
size_t xfr_count, u32 flags)
BUG_ON(acd->ldev == NULL);
BUG_ON(acd->ldev->pldev == NULL);
 
-   for (i = 0 ; i < acd->page_count ; i++) {
-   if (!PageReserved(acd->user_pages[i])) {
-   set_page_dirty(acd->user_pages[i]);
-   }
-   }
-
	dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
 
-   for (i = 0 ; i < acd->page_count ; i++) {
-   put_page(acd->user_pages[i]);
+   for (i = 0; i < acd->page_count; i++) {
+   if (!PageReserved(acd->user_pages[i])) 
+   put_user_pages_dirty(&acd->user_pages[i], 1);
+   else
+   put_user_page(acd->user_pages[i]);
}
 
	sg_free_table(&acd->sgt);
-- 
2.7.4



Re: [PATCH] staging: kpc2000: Convert put_page() to put_user_page*()

2019-07-15 Thread Bharath Vedartham
On Mon, Jul 15, 2019 at 01:14:13PM -0700, John Hubbard wrote:
> On 7/15/19 12:52 PM, Bharath Vedartham wrote:
> > There have been issues with get_user_pages and filesystem writeback.
> > The issues are better described in [1].
> > 
> > The solution being proposed wants to keep track of gup-pinned pages which
> > will allow us to take further steps to coordinate between subsystems using gup.
> > 
> > put_user_page() simply calls put_page inside for now. But the 
> > implementation will change once all call sites of put_page() are converted.
> > 
> > I currently do not have the driver to test. Could I have some suggestions 
> > to test this code? The solution is currently implemented in [2] and
> > it would be great if we could apply the patch on top of [2] and run some 
> > tests to check if any regressions occur.
> 
> Hi Bharath,
> 
> Process point: the above paragraph, and other meta-questions (about the 
> patch, rather than part of the patch) should be placed either after the 
> "---", or in a cover letter (git-send-email --cover-letter). That way, the 
> patch itself is in a commit-able state.
> 
> One more below:
Will fix that in the next version. 
> > 
> > [1] https://lwn.net/Articles/753027/
> > [2] https://github.com/johnhubbard/linux/tree/gup_dma_core
> > 
> > Cc: Matt Sickler 
> > Cc: Greg Kroah-Hartman 
> > Cc: Jérôme Glisse 
> > Cc: Ira Weiny 
> > Cc: John Hubbard 
> > Cc: linux...@kvack.org
> > Cc: de...@driverdev.osuosl.org
> > 
> > Signed-off-by: Bharath Vedartham 
> > ---
> >  drivers/staging/kpc2000/kpc_dma/fileops.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
> > b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > index 6166587..82c70e6 100644
> > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, 
> > struct kiocb *kcb, unsigned
> > sg_free_table(&acd->sgt);
> >   err_dma_map_sg:
> >   err_alloc_sg_table:
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > -   }
> > +   put_user_pages(acd->user_pages, acd->page_count);
> >   err_get_user_pages:
> > kfree(acd->user_pages);
> >   err_alloc_userpages:
> > @@ -229,9 +227,7 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> > size_t xfr_count, u32 flags)
> > 
> > dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> > 
> > -   for (i = 0 ; i < acd->page_count ; i++){
> > -   put_page(acd->user_pages[i]);
> > -   }
> > +   put_user_pages(acd->user_pages, acd->page_count);
> > 
> > sg_free_table(&acd->sgt);
> > 
> > 
> 
> Because this is a common pattern, and because the code here doesn't likely 
> need to set page dirty before the dma_unmap_sg call, I think the following 
> would be better (it's untested), instead of the above diff hunk:
>
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
> b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index 48ca88bc6b0b..d486f9866449 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -211,16 +211,13 @@ void  transfer_complete_cb(struct aio_cb_data *acd, 
> size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>  
> -   for (i = 0 ; i < acd->page_count ; i++) {
> -   if (!PageReserved(acd->user_pages[i])) {
> -   set_page_dirty(acd->user_pages[i]);
> -   }
> -   }
> -
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
>  
> for (i = 0 ; i < acd->page_count ; i++) {
> -   put_page(acd->user_pages[i]);
> +   if (!PageReserved(acd->user_pages[i]))
> +   put_user_pages_dirty(&acd->user_pages[i], 1);
> +   else
> +   put_user_page(acd->user_pages[i]);
> }
>  
> sg_free_table(&acd->sgt);
I had my doubts on this. This definitley needs to be looked at by the
driver author. 
> Assuming that you make those two changes, you can add:
> 
> Reviewed-by: John Hubbard 
Great!
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA


[PATCH] staging: kpc2000: Convert put_page() to put_user_page*()

2019-07-15 Thread Bharath Vedartham
There have been issues with get_user_pages and filesystem writeback.
The issues are better described in [1].

The solution being proposed wants to keep track of gup-pinned pages which will
allow us to take further steps to coordinate between subsystems using gup.

put_user_page() simply calls put_page inside for now. But the implementation 
will change once all call sites of put_page() are converted.

I currently do not have the driver to test. Could I have some suggestions to 
test this code? The solution is currently implemented in [2] and
it would be great if we could apply the patch on top of [2] and run some tests 
to check if any regressions occur.

[1] https://lwn.net/Articles/753027/
[2] https://github.com/johnhubbard/linux/tree/gup_dma_core

Cc: Matt Sickler 
Cc: Greg Kroah-Hartman 
Cc: Jérôme Glisse 
Cc: Ira Weiny 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: de...@driverdev.osuosl.org

Signed-off-by: Bharath Vedartham 
---
 drivers/staging/kpc2000/kpc_dma/fileops.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c 
b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 6166587..82c70e6 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct 
kiocb *kcb, unsigned
	sg_free_table(&acd->sgt);
  err_dma_map_sg:
  err_alloc_sg_table:
-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);
  err_get_user_pages:
kfree(acd->user_pages);
  err_alloc_userpages:
@@ -229,9 +227,7 @@ void  transfer_complete_cb(struct aio_cb_data *acd, size_t 
xfr_count, u32 flags)

	dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);

-   for (i = 0 ; i < acd->page_count ; i++){
-   put_page(acd->user_pages[i]);
-   }
+   put_user_pages(acd->user_pages, acd->page_count);

	sg_free_table(&acd->sgt);

-- 
1.8.3.1



Re: [PATCH 4.19 00/72] 4.19.57-stable review

2019-07-03 Thread Bharath Vedartham
Tested and booted in my x86 system. No regressions.


Re: [PATCH 5.1 00/55] 5.1.16-stable review

2019-07-03 Thread Bharath Vedartham
Tested and booted on my x86 system. No regressions.


Re: [PATCH 4.14 00/43] 4.14.132-stable review

2019-07-03 Thread Bharath Vedartham
Booted and tested in my x86 systems. No regressions.


Re: [PATCH] mm: fix setting the high and low watermarks

2019-06-21 Thread Bharath Vedartham
On Fri, Jun 21, 2019 at 02:09:31PM +0200, Vlastimil Babka wrote:
> On 6/21/19 1:43 PM, Alan Jenkins wrote:
> > When setting the low and high watermarks we use min_wmark_pages(zone).
> > I guess this is to reduce the line length.  But we forgot that this macro
> > includes zone->watermark_boost.  We need to reset zone->watermark_boost
> > first.  Otherwise the watermarks will be set inconsistently.
> > 
> > E.g. this could cause inconsistent values if the watermarks have been
> > boosted, and then you change a sysctl which triggers
> > __setup_per_zone_wmarks().
> > 
> > I strongly suspect this explains why I have seen slightly high watermarks.
> > Suspicious-looking zoneinfo below - notice high-low != low-min.
> > 
> > Node 0, zone   Normal
> >   pages free 74597
> > min  9582
> > low  34505
> > high 36900
> > 
> > https://unix.stackexchange.com/questions/525674/my-low-and-high-watermarks-seem-higher-than-predicted-by-documentation-sysctl-vm/525687
> > 
> > Signed-off-by: Alan Jenkins 
> > Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
> >   fragmentation event occurs")
> > Cc: sta...@vger.kernel.org
> 
> Nice catch, thanks!
> 
> Acked-by: Vlastimil Babka 
> 
> Personally I would implement it a bit differently, see below. If you
> agree, it's fine if you keep the authorship of the whole patch.
> 
> > ---
> > 
> > Tested by compiler :-).
> > 
> > Ideally the commit message would be clear about what happens the
> > *first* time __setup_per_zone_watermarks() is called.  I guess that
> > zone->watermark_boost is *usually* zero, or we would have noticed
> > some wild problems :-).  However I am not familiar with how the zone
> > structures are allocated & initialized.  Maybe there is a case where
> > zone->watermark_boost could contain an arbitrary unitialized value
> > at this point.  Can we rule that out?
> 
> Dunno if there's some arch override, but generic_alloc_nodedata() uses
> kzalloc() so it's zeroed.
> 
> -8<-
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..3b2f0cedf78e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7624,6 +7624,7 @@ static void __setup_per_zone_wmarks(void)
>  
>   for_each_zone(zone) {
>   u64 tmp;
> + unsigned long wmark_min;
>  
>   spin_lock_irqsave(&zone->lock, flags);
>   tmp = (u64)pages_min * zone_managed_pages(zone);
> @@ -7642,13 +7643,13 @@ static void __setup_per_zone_wmarks(void)
>  
>   min_pages = zone_managed_pages(zone) / 1024;
>   min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
> - zone->_watermark[WMARK_MIN] = min_pages;
> + wmark_min = min_pages;
>   } else {
>   /*
>* If it's a lowmem zone, reserve a number of pages
>* proportionate to the zone's size.
>*/
> - zone->_watermark[WMARK_MIN] = tmp;
> + wmark_min = tmp;
>   }
>  
>   /*
> @@ -7660,8 +7661,9 @@ static void __setup_per_zone_wmarks(void)
>   mult_frac(zone_managed_pages(zone),
> watermark_scale_factor, 1));
>  
> - zone->_watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
> - zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
> + zone->_watermark[WMARK_MIN]  = wmark_min;
> + zone->_watermark[WMARK_LOW]  = wmark_min + tmp;
> + zone->_watermark[WMARK_HIGH] = wmark_min + tmp * 2;
>   zone->watermark_boost = 0;
Do you think this could cause a race condition between
__setup_per_zone_wmarks and pgdat_watermark_boosted, which checks whether
the watermark_boost of each zone is non-zero? pgdat_watermark_boosted is
not called with the zone lock held.
Here is a probable scenario:
watermarks are boosted in steal_suitable_fallback() (which happens under
the zone lock). After that, kswapd is woken up by
wakeup_kswapd(zone, 0, 0, zone_idx(zone)) in rmqueue() without holding the
zone lock. Let's say someone then modifies min_free_kbytes; this would lead
to all the zone->watermark_boost values being reset to 0. That may cause
pgdat_watermark_boosted to return false, which would not wake up kswapd
as intended by boosting the watermark. This behaviour is similar to waking
up kswapd for a balanced node.
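
For reference, the check in question looks roughly like this (paraphrasing
mm/vmscan.c from memory):

	static bool pgdat_watermark_boosted(pg_data_t *pgdat, int classzone_idx)
	{
		int i;
		struct zone *zone;

		/* check top-down: higher zones are more likely boosted */
		for (i = classzone_idx; i >= 0; i--) {
			zone = pgdat->node_zones + i;
			if (!managed_zone(zone))
				continue;

			if (zone->watermark_boost)
				return true;
		}

		return false;
	}

Nothing here serialises against __setup_per_zone_wmarks resetting
watermark_boost to 0.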

Also, suppose kswapd was woken up successfully because the watermarks were
boosted. In balance_pgdat, we use nr_boost_reclaim to count the number of
pages to reclaim because of boosting. nr_boost_reclaim is calculated as:
nr_boost_reclaim = 0;
for (i = 0; i <= classzone_idx; i++) {
zone = pgdat->node_zones + i;
if (!managed_zone(zone))
continue;

nr_boost_reclaim += zone->watermark_boost;
zone_boosts[i] = zone->watermark_boost;
}
boosted = nr_boost_reclaim;

Re: [PATCH] mm: Remove VM_BUG_ON in __alloc_pages_node

2019-06-05 Thread Bharath Vedartham
IMO the reason a lot of failures have not occurred in the past
might be that the callers use helpers like cpu_to_node or
have their own checks for nid.
If one day we do get a program which passes an invalid node id without
VM_BUG_ON enabled, it might get weird.



Re: [PATCH] mm: Remove VM_BUG_ON in __alloc_pages_node

2019-06-05 Thread Bharath Vedartham
[Not replying inline as my mail is bouncing back]

This patch is based on reading the code rather than a kernel crash. My
thought process was that if an invalid node id was passed to
__alloc_pages_node, it would be better to add a VM_WARN_ON and fail the
allocation rather than crashing the kernel. 
I feel it would be better to fail the allocation early in the hot path
if an invalid node id is passed. This is irrespective of whether the
VM_[BUG|WARN]_*s are enabled or not. I do not see any checks in the hot
path for the node id, which in turn may cause NODE_DATA(nid) to fail to
get the pglist_data pointer for the node id. 
We could optimise the branch by wrapping it in unlikely(), if
performance is the concern - see the sketch below.
What are your thoughts on this? 
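
Something like this untested sketch is what I have in mind:

	static inline struct page *
	__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
	{
		/* invalid nid: warn once in debug builds, fail in all builds */
		if (unlikely(nid < 0 || nid >= MAX_NUMNODES)) {
			VM_WARN_ON(1);
			return NULL;
		}

		VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));

		return __alloc_pages(gfp_mask, order, nid);
	}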

Thank you 
Bharath


[PATCH] mm: Remove VM_BUG_ON in __alloc_pages_node

2019-06-05 Thread Bharath Vedartham
In __alloc_pages_node, there is a VM_BUG_ON on the condition (nid < 0 ||
nid >= MAX_NUMNODES). Remove this VM_BUG_ON; instead, add a VM_WARN_ON and
fail the allocation if an invalid NUMA node id is passed to
__alloc_pages_node.

The check (nid < 0 || nid >= MAX_NUMNODES) also considers NUMA_NO_NODE
as an invalid nid, but the caller of __alloc_pages_node is assumed to
have checked for the case where nid == NUMA_NO_NODE.

Signed-off-by: Bharath Vedartham 
---
 include/linux/gfp.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5f5e25f..075bdaf 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -480,7 +480,11 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, int 
preferred_nid)
 static inline struct page *
 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
-   VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+   if (nid < 0 || nid >= MAX_NUMNODES) {
+   VM_WARN_ON(nid < 0 || nid >= MAX_NUMNODES);
+   return NULL; 
+   }
+
VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));
 
return __alloc_pages(gfp_mask, order, nid);
-- 
2.7.4



[PATCH] 9p/vfs_super.c: Remove unused parameter data in v9fs_fill_super

2019-05-23 Thread Bharath Vedartham
v9fs_fill_super has a param 'void *data' which is unused in the
function.

This patch removes the 'void *data' param of v9fs_fill_super and updates
all callers of v9fs_fill_super accordingly.

Signed-off-by: Bharath Vedartham 
---
 fs/9p/vfs_super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 67d1b96..00f2078 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -73,7 +73,7 @@ static int v9fs_set_super(struct super_block *s, void *data)
 
 static int
 v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
-   int flags, void *data)
+   int flags)
 {
int ret;
 
@@ -143,7 +143,7 @@ static struct dentry *v9fs_mount(struct file_system_type 
*fs_type, int flags,
retval = PTR_ERR(sb);
goto clunk_fid;
}
-   retval = v9fs_fill_super(sb, v9ses, flags, data);
+   retval = v9fs_fill_super(sb, v9ses, flags);
if (retval)
goto release_sb;
 
-- 
2.7.4



Re: [PATCH v2] 9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie

2019-05-22 Thread Bharath Vedartham
Cool! Thanks!

Thank you,
Bharath


[PATCH v2] 9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie

2019-05-22 Thread Bharath Vedartham
v9fs_cache_session_get_cookie assigns a random cachetag to v9ses->cachetag,
if the cachetag is not assigned previously.

v9fs_random_cachetag allocates memory to v9ses->cachetag with kmalloc and uses
scnprintf to fill it up with a cachetag.

But if scnprintf fails, v9ses->cachetag is not freed in the current
code, causing a memory leak.

Fix this by freeing v9ses->cachetag if v9fs_random_cachetag fails.
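
For context, the allocation happens in v9fs_random_cachetag(), roughly
(paraphrasing fs/9p/cache.c from memory):

	static int v9fs_random_cachetag(struct v9fs_session_info *v9ses)
	{
		/* on failure, the caller is expected to clean up cachetag */
		v9ses->cachetag = kmalloc(CACHETAG_LEN, GFP_KERNEL);
		if (!v9ses->cachetag)
			return -ENOMEM;

		return scnprintf(v9ses->cachetag, CACHETAG_LEN, "%lu", jiffies);
	}

Note that kfree(NULL) is a no-op, so freeing in the error path is safe
even when the kmalloc itself failed.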

This was reported by syzbot, the link to the report is below:
https://syzkaller.appspot.com/bug?id=f012bdf297a7a4c860c38a88b44fbee43fd9bbf3

Reported-by: syzbot+3a030a73b6c1e9833...@syzkaller.appspotmail.com
Signed-off-by: Bharath Vedartham 

---
Changes since v1
- Made v9ses->cachetag NULL after freeing to avoid any
  side effects.
---
 fs/9p/cache.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index 9eb3470..baf72da 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -66,6 +66,8 @@ void v9fs_cache_session_get_cookie(struct v9fs_session_info 
*v9ses)
if (!v9ses->cachetag) {
if (v9fs_random_cachetag(v9ses) < 0) {
v9ses->fscache = NULL;
+   kfree(v9ses->cachetag);
+   v9ses->cachetag = NULL; 
return;
}
}
-- 
2.7.4



[PATCH] 9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie

2019-05-22 Thread Bharath Vedartham
v9fs_cache_session_get_cookie assigns a random cachetag to
v9ses->cachetag, if the cachetag is not assigned previously.

v9fs_random_cachetag allocates memory to v9ses->cachetag with kmalloc
and uses scnprintf to fill it up with a cachetag.

But if scnprintf fails, v9ses->cachetag is not freed in the current code 
causing a memory leak.

Fix this by freeing v9ses->cachetag if v9fs_random_cachetag fails.

This was reported by syzbot, the link to the report is below:
https://syzkaller.appspot.com/bug?id=f012bdf297a7a4c860c38a88b44fbee43fd9bbf3

Reported-by: syzbot+3a030a73b6c1e9833...@syzkaller.appspotmail.com 
Signed-off-by: Bharath Vedartham 
---
 fs/9p/cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index 9eb3470..4463b91 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -66,6 +66,7 @@ void v9fs_cache_session_get_cookie(struct v9fs_session_info 
*v9ses)
if (!v9ses->cachetag) {
if (v9fs_random_cachetag(v9ses) < 0) {
v9ses->fscache = NULL;
+   kfree(v9ses->cachetag);
return;
}
}
-- 
2.7.4



[PATCH v2] message/fusion/mptbase.c: Use kmemdup instead of memcpy and kmalloc

2019-05-22 Thread Bharath Vedartham
Replace kmalloc + memcpy with kmemdup.

This was reported by coccinelle.
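
The general shape of the transformation (generic sketch, not specific to
this driver):

	/* before: open-coded duplication */
	mem = kmalloc(len, GFP_KERNEL);
	if (!mem)
		return -ENOMEM;
	memcpy(mem, src, len);

	/* after: one call with the same semantics */
	mem = kmemdup(src, len, GFP_KERNEL);
	if (!mem)
		return -ENOMEM;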

Signed-off-by: Bharath Vedartham 

---
Changes since v1:
Removed the cast from pIoc2.
---
 drivers/message/fusion/mptbase.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index d8882b0..37876a7 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -6001,13 +6001,12 @@ mpt_findImVolumes(MPT_ADAPTER *ioc)
	if (mpt_config(ioc, &cfg) != 0)
goto out;
 
-   mem = kmalloc(iocpage2sz, GFP_KERNEL);
+   mem = kmemdup(pIoc2, iocpage2sz, GFP_KERNEL);
if (!mem) {
rc = -ENOMEM;
goto out;
}
 
-   memcpy(mem, (u8 *)pIoc2, iocpage2sz);
ioc->raid_data.pIocPg2 = (IOCPage2_t *) mem;
 
mpt_read_ioc_pg_3(ioc);
-- 
2.7.4



Re: [PATCH] message/fusion/mptbase.c: Use kmemdup instead of memcpy and kmalloc

2019-05-22 Thread Bharath Vedartham
On Wed, May 22, 2019 at 04:48:33AM -0700, Joe Perches wrote:
> On Wed, 2019-05-22 at 15:23 +0530, Bharath Vedartham wrote:
> > Replace kmalloc + memcpy with kmemdup.
> > This was reported by coccinelle.
> []
> > diff --git a/drivers/message/fusion/mptbase.c 
> > b/drivers/message/fusion/mptbase.c
> []
> > @@ -6001,13 +6001,12 @@ mpt_findImVolumes(MPT_ADAPTER *ioc)
> > if (mpt_config(ioc, &cfg) != 0)
> > goto out;
> >  
> > -   mem = kmalloc(iocpage2sz, GFP_KERNEL);
> > +   mem = kmemdup((u8 *)pIoc2, iocpage2sz, GFP_KERNEL);
> 
> You should remove the unnecessary cast here.
> 
>
Yes! I will change this in v2.

Thanks
Bharath


[PATCH] message/fusion/mptbase.c: Use kmemdup instead of memcpy and kmalloc

2019-05-22 Thread Bharath Vedartham
Replace kmalloc + memcpy with kmemdup.

This was reported by coccinelle.

Signed-off-by: Bharath Vedartham 
---
 drivers/message/fusion/mptbase.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index d8882b0..37876a7 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -6001,13 +6001,12 @@ mpt_findImVolumes(MPT_ADAPTER *ioc)
	if (mpt_config(ioc, &cfg) != 0)
goto out;
 
-   mem = kmalloc(iocpage2sz, GFP_KERNEL);
+   mem = kmemdup((u8 *)pIoc2, iocpage2sz, GFP_KERNEL);
if (!mem) {
rc = -ENOMEM;
goto out;
}
 
-   memcpy(mem, (u8 *)pIoc2, iocpage2sz);
ioc->raid_data.pIocPg2 = (IOCPage2_t *) mem;
 
mpt_read_ioc_pg_3(ioc);
-- 
2.7.4



[PATCH] nfs/dns_resolve.c: Move redundant headers to the top

2019-05-11 Thread Bharath Vedartham
NFS uses either the kernel's DNS service or its own depending on whether
CONFIG_NFS_USE_KERNEL_DNS is enabled or not.

4 header files which are:
(i) #include <linux/module.h>
(ii) #include <linux/sunrpc/clnt.h>
(iii) #include <linux/sunrpc/addr.h>
(iv) #include "dns_resolve.h"

These 4 header files are used regardless of whether
CONFIG_NFS_USE_KERNEL_DNS is enabled or not. But they have been included
both under #ifdef CONFIG_NFS_USE_KERNEL_DNS and in the #else section (if
CONFIG_NFS_USE_KERNEL_DNS is not enabled).

Move these redundant headers to the top of the file, i.e. include them
before #ifdef CONFIG_NFS_USE_KERNEL_DNS.

Tested by compiling the kernel and also running
./scripts/checkincludes.pl on fs/nfs/dns_resolve.c

Signed-off-by: Bharath Vedartham 
---
 fs/nfs/dns_resolve.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index a7d3df8..b34e575 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -7,14 +7,15 @@
  * Resolves DNS hostnames into valid ip addresses
  */
 
-#ifdef CONFIG_NFS_USE_KERNEL_DNS
-
 #include <linux/module.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/addr.h>
-#include <linux/dns_resolver.h>
 #include "dns_resolve.h"
 
+#ifdef CONFIG_NFS_USE_KERNEL_DNS
+
+#include <linux/dns_resolver.h>
+
 ssize_t nfs_dns_resolve_name(struct net *net, char *name, size_t namelen,
struct sockaddr *sa, size_t salen)
 {
@@ -33,24 +34,19 @@ ssize_t nfs_dns_resolve_name(struct net *net, char *name, 
size_t namelen,
 
 #else
 
-#include <linux/module.h>
 #include <linux/hash.h>
 #include <linux/string.h>
 #include <linux/kmod.h>
 #include <linux/slab.h>
-#include <linux/module.h>
 #include <linux/socket.h>
 #include <linux/seq_file.h>
 #include <linux/inet.h>
-#include <linux/sunrpc/clnt.h>
-#include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/cache.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/sunrpc/rpc_pipe_fs.h>
 #include <linux/nfs_fs.h>
 
 #include "nfs4_fs.h"
-#include "dns_resolve.h"
 #include "cache_lib.h"
 #include "netns.h"
 
-- 
2.7.4



[PATCH] mm/gup.c: Make follow_page_mask static

2019-05-10 Thread Bharath Vedartham
follow_page_mask is only used in gup.c, make it static.

Tested by compiling and booting. Grepped the source for
"follow_page_mask" to be sure it is not used else where.

Signed-off-by: Bharath Vedartham 
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 91819b8..e6f3b7f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -409,7 +409,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct 
*vma,
  * an error pointer if there is a mapping to something not represented
  * by a page descriptor (see also vm_normal_page()).
  */
-struct page *follow_page_mask(struct vm_area_struct *vma,
+static struct page *follow_page_mask(struct vm_area_struct *vma,
  unsigned long address, unsigned int flags,
  struct follow_page_context *ctx)
 {
-- 
2.7.4



Re: [PATCH 4.19 00/23] 4.19.40-stable review

2019-05-05 Thread Bharath Vedartham
Built and booted on my x86_64 machine. No dmesg regressions observed.

Thanks
Bharath


Re: [PATCH] mm/huge_memory.c: Make __thp_get_unmapped_area static

2019-05-04 Thread Bharath Vedartham
On Sat, May 04, 2019 at 07:54:46AM -0400, Michal Hocko wrote:
Hi Michal,

Thanks for having a look at the patch. 
> On Sat 04-05-19 15:53:54, Bharath Vedartham wrote:
> > __thp_get_unmapped_area is only used in mm/huge_memory.c. Make it
> > static.
> 
> Makes sense. Looks like an omission.
> 
> > Tested by building and booting the kernel.
> 
> Testing by git grep __thp_get_unmapped_area would give you a better
> picture. Build test might not hit paths that are config specific and
> the static aspect of a function should not have any functionality related
> side effects AFAICS.
>  
I have made sure CONFIG_TRANSPARENT_HUGEPAGE was enabled before building
and booting the kernel. I have also grepped the entire kernel source for
__thp_get_unmapped_area to be very sure.
> > Signed-off-by: Bharath Vedartham 
> 
> Acked-by: Michal Hocko 
> 
> > ---
> >  mm/huge_memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 165ea46..75fe2b7 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -509,7 +509,7 @@ void prep_transhuge_page(struct page *page)
> > set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
> >  }
> >  
> > -unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> > +static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned 
> > long len,
> > loff_t off, unsigned long flags, unsigned long size)
> >  {
> > unsigned long addr;
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

Thanks
Bharath


[PATCH] mm/huge_memory.c: Make __thp_get_unmapped_area static

2019-05-04 Thread Bharath Vedartham
__thp_get_unmapped_area is only used in mm/huge_memory.c. Make it
static. Tested by building and booting the kernel.

Signed-off-by: Bharath Vedartham 
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46..75fe2b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -509,7 +509,7 @@ void prep_transhuge_page(struct page *page)
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
-unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
+static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long 
len,
loff_t off, unsigned long flags, unsigned long size)
 {
unsigned long addr;
-- 
2.7.4



Re: [PATCH 5.0 00/89] 5.0.11-stable review

2019-05-01 Thread Bharath Vedartham
Built and booted on my x86 machine. No dmesg regression.


Re: [PATCH 4.19 000/100] 4.19.38-stable review

2019-05-01 Thread Bharath Vedartham
Built and booted on my x86 machine with defconfig. No dmesg regressions.

Thank you
Bharath


[PATCH] drivers/char/random.c: Make add_hwgenerator_randomness static

2019-04-26 Thread Bharath Vedartham
Make add_hwgenerator_randomness static as it is only used in
drivers/char/random.c.

Signed-off-by: Bharath Vedartham 
---
 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38c6d1a..08fd00a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2342,7 +2342,7 @@ randomize_page(unsigned long start, unsigned long range)
  * Those devices may produce endless random bits and will be throttled
  * when our pool is full.
  */
-void add_hwgenerator_randomness(const char *buffer, size_t count,
+static void add_hwgenerator_randomness(const char *buffer, size_t count,
size_t entropy)
 {
struct entropy_store *poolp = _pool;
-- 
2.7.4



[PATCH] apparmor: Force type-casting of current->real_cred

2019-04-23 Thread Bharath Vedartham
This patch fixes the sparse warning:
warning: cast removes address space '<asn:4>' of expression.
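
For context, the field is RCU-annotated in include/linux/sched.h
(paraphrased):

	/* Objective and real subjective task credentials (COW): */
	const struct cred __rcu		*real_cred;

Casting that to a plain 'struct cred *' drops the __rcu address space,
which is what sparse warns about unless the cast is marked __force.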

Signed-off-by: Bharath Vedartham 
---
 security/apparmor/lsm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 87500bd..36478c4 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1529,7 +1529,7 @@ static int param_set_mode(const char *val, const struct 
kernel_param *kp)
  */
 static int __init set_init_ctx(void)
 {
-   struct cred *cred = (struct cred *)current->real_cred;
+   struct cred *cred = (__force struct cred *)current->real_cred;
 
set_cred_label(cred, aa_get_label(ns_unconfined(root_ns)));
 
-- 
2.7.4



Re: [PATCH] reiserfs: Add comment to explain endianness issue in xattr_hash

2019-04-23 Thread Bharath Vedartham
On Tue, Apr 23, 2019 at 09:48:31PM +0530, Bharath Vedartham wrote:
> csum_partial() gives different results for little-endian and big-endian
> hosts. This causes images created on little-endian hosts and mounted on
> big-endian hosts to see csum mismatches. This is an endianness bug.
> Sparse gives a warning as csum_partial returns a restricted integer type
> __wsum_t and xattr_hash expects __u32. This warning acts as a reminder
> for this bug and should not be suppressed.
> 
> This comment aims to convey these endianness issues.
> 
> Signed-off-by: Bharath Vedartham 
> ---
>  fs/reiserfs/xattr.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
> index 32d8986..0ea6654 100644
> --- a/fs/reiserfs/xattr.c
> +++ b/fs/reiserfs/xattr.c
> @@ -450,6 +450,16 @@ static struct page *reiserfs_get_page(struct inode *dir, 
> size_t n)
>  
>  static inline __u32 xattr_hash(const char *msg, int len)
>  {
> + /*
> +  * csum_partial() gives different results for little-endian and
> +  * big endian hosts. Images created on little-endian hosts and
> +  * mounted on big-endian hosts(and vice versa) will see csum mismatches
> +  * when trying to fetch xattrs. Treating the hash as __wsum_t would
> +  * lower the frequency of mismatch. This is an endianness bug in 
> reiserfs.
> +  * The return statement would result in a sparse warning. Do not fix 
> the sparse
> +  * warning so as to not hide the reminder of the bug.
> +  */
> +
>   return csum_partial(msg, len, 0);
>  }
>  
> -- 
> 2.7.4
>

Is this good or is it lacking any explanation?


[PATCH] reiserfs: Add comment to explain endianness issue in xattr_hash

2019-04-23 Thread Bharath Vedartham
csum_partial() gives different results for little-endian and big-endian
hosts. This causes images created on little-endian hosts and mounted on
big-endian hosts to see csum mismatches. This is an endianness bug.
Sparse gives a warning as csum_partial returns a restricted integer type
__wsum_t and xattr_hash expects __u32. This warning acts as a reminder
for this bug and should not be suppressed.

This comment aims to convey these endianness issues.

Signed-off-by: Bharath Vedartham 
---
 fs/reiserfs/xattr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
index 32d8986..0ea6654 100644
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -450,6 +450,16 @@ static struct page *reiserfs_get_page(struct inode *dir, 
size_t n)
 
 static inline __u32 xattr_hash(const char *msg, int len)
 {
+   /*
+* csum_partial() gives different results for little-endian and
+* big endian hosts. Images created on little-endian hosts and
+* mounted on big-endian hosts(and vice versa) will see csum mismatches
+* when trying to fetch xattrs. Treating the hash as __wsum_t would
+* lower the frequency of mismatch. This is an endianness bug in 
reiserfs.
+* The return statement would result in a sparse warning. Do not fix 
the sparse
+* warning so as to not hide the reminder of the bug.
+*/
+
return csum_partial(msg, len, 0);
 }
 
-- 
2.7.4



Re: [PATCH] reiserfs: Force type conversion in xattr_hash

2019-04-23 Thread Bharath Vedartham
On Mon, Apr 22, 2019 at 12:27:05PM -0700, Andrew Morton wrote:
> On Sun, 21 Apr 2019 18:02:35 +0100 Al Viro  wrote:
> 
> > IOW, what sparse has caught here is a genuine endianness bug; images
> > created on little-endian host and mounted on big-endian (or vice
> > versa) will see csum mismatches when trying to fetch xattrs.
> 
> OK, thanks.  I'll drop the patch - we shouldn't hide that reminder of a
> bug.
> 
> Perhaps someone could prepare a patch which adds a comment explaining
> these issues, so someone else doesn't try to "fix" the sparse warning.
>

Hi Andrew,

I will send a patch CCing Al to add a comment explaining these issues.

Thanks
Bharath


Re: [PATCH] reiserfs: Force type conversion in xattr_hash

2019-04-23 Thread Bharath Vedartham
On Sun, Apr 21, 2019 at 06:02:35PM +0100, Al Viro wrote:
> On Thu, Apr 18, 2019 at 03:50:19PM -0700, Andrew Morton wrote:
> > On Wed, 17 Apr 2019 17:22:00 +0530 Bharath Vedartham  
> > wrote:
> > 
> > > This patch fixes the sparse warning:
> > > 
> > > fs/reiserfs//xattr.c:453:28: warning: incorrect type in return
> > > expression (different base types)
> > > fs/reiserfs//xattr.c:453:28:expected unsigned int
> > > fs/reiserfs//xattr.c:453:28:got restricted __wsum
> > > fs/reiserfs//xattr.c:453:28: warning: incorrect type in return
> > > expression (different base types)
> > > fs/reiserfs//xattr.c:453:28:expected unsigned int
> > > fs/reiserfs//xattr.c:453:28:got restricted __wsum
> > > 
> > > csum_partial returns restricted integer __wsum whereas xattr_hash
> > > expects a return type of __u32.
> > > 
> > > ...
> > >
> > > --- a/fs/reiserfs/xattr.c
> > > +++ b/fs/reiserfs/xattr.c
> > > @@ -450,7 +450,7 @@ static struct page *reiserfs_get_page(struct inode 
> > > *dir, size_t n)
> > >  
> > >  static inline __u32 xattr_hash(const char *msg, int len)
> > >  {
> > > - return csum_partial(msg, len, 0);
> > > + return (__force __u32)csum_partial(msg, len, 0);
> > >  }
> > >  
> > >  int reiserfs_commit_write(struct file *f, struct page *page,
> > 
> > hm.  Conversion from int to __u32 should be OK - why is sparse being so
> > picky here?
> 
> Because csum_partial() returns __wsum_t, not int.
> 
> > Why is the __force needed, btw?
> 
> So that accidental mixing of those csums (both 16bit and 32bit) with
> host- or net-endian would be caught.
> 
> And I'm not at all sure reiserfs xattr_hash() doesn't bugger it up, actually.
> 
> Recall that 16bit inet csum is the sum of 16bit words (treated as host-endian)
> modulo 0xffff, i.e. the entire buffer interpreted as host-endian integer
> taken modulo 0xffff.  That has a lovely property - memory representation
> of that value is the same whether we'd done calculations on b-e or l-e
> host; the reason is that modulo 65535 byteswap is the same as multiplying
> by 256, so the sum of byteswapped 16bit values modulo 65535 is byteswapped
> sum of original values.
> 
> csum_partial() is sum of 32bit words (treated as host-endian) modulo
> 0xffffffff,
> i.e. the entire buffer treated as host-endian number modulo 0xffffffff.
> It is convenient when we want to calculate the 16bit csum - 0xffffffff is
> a multiple of 0xffff, so residue modulo 0xffffffff determines the residue
> modulo 0xffff; that's what csum_fold() is.
> 
> However, result of csum_partial() on big- and little-endian hosts
> does *not* have the same property.  Consider e.g. an array {0, 0, 0, 128,
> 0, 0, 0, 128}.  csum_partial of that on l-e will be (2^31 + 2^31)mod(2^32 - 
> 1),
> i.e. 1, with {1, 0, 0, 0} as memory representation.  16bit csum will
> again be 1, with {1, 0} as memory representation.  On big-endian we
> get (128 + 128)mod(2^32 - 1), i.e. 256, with {0, 0, 1, 0} as memory
> representation.  16bit csum is again 256, stored as {1, 0}, i.e.
> the same as if we'd done everything on l-e; however, raw csum_partial()
> values have different memory representations.  They certainly are
> different as host-endian (and so are 16bit csums).
> 
> Reiserfs takes csum_partial() on buffer, interprets it as host-endian
> and stores it little-endian on disk.  When fetching those it does
> the same calculation and fails on mismatch.  However, if the
> store had been done on little-endian host and load - on big-endian
> one we *will* get mismatch almost all the time.  Treating ->rx_hash
> as __wsum_t (and not doing that cpu_to_le32()) would lower the
> frequency of mismatches, but still would be broken.  Storing
> a 16bit csum (declared as __sum16_t, again, without cpu_to_le...())
> would be endian-safe, but that's not what reiserfs folks wanted
> (16 bits of csum instead of 32, for starters).
> 
> IOW, what sparse has caught here is a genuine endianness bug; images
> created on little-endian host and mounted on big-endian (or vice
> versa) will see csum mismatches when trying to fetch xattrs.
> Broken since
> commit 0b1a6a8ca8a78c2e068b04acf97479ee89a024ac
> Author: Andrew Morton 
> Date:   Sun May 9 23:59:13 2004 -0700
> 
> [PATCH] reiserfs: xattr support
> 
> From: Chris Mason 
> 
> From: je...@suse.com
> 
> reiserfs support for xattrs
> 
> ISTR some discussions of reiserfs layout endianness problems, but
> that had been many years ago and I could be wrong; I _think_
> the 

[PATCH] fs/ufs: Force type conversion from __fs16 to u16

2019-04-23 Thread Bharath Vedartham
This patch fixes the sparse warning:
warning: restricted __fs16 degrades to integer

inode->ui_u1.oldids.ui_suid is of type __fs16, a restricted integer.
0XFFFF is a 16-bit unsigned integer. Use __force to fix the sparse
warning.

Signed-off-by: Bharath Vedartham 
---
 fs/ufs/util.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ufs/util.h b/fs/ufs/util.h
index 1fd3011..b03143f 100644
--- a/fs/ufs/util.h
+++ b/fs/ufs/util.h
@@ -195,7 +195,7 @@ ufs_get_inode_uid(struct super_block *sb, struct ufs_inode 
*inode)
case UFS_UID_44BSD:
return fs32_to_cpu(sb, inode->ui_u3.ui_44.ui_uid);
case UFS_UID_EFT:
-   if (inode->ui_u1.oldids.ui_suid == 0XFFFF)
+   if ((__force u16)inode->ui_u1.oldids.ui_suid == 0XFFFF)
return fs32_to_cpu(sb, inode->ui_u3.ui_sun.ui_uid);
/* Fall through */
default:
@@ -229,7 +229,7 @@ ufs_get_inode_gid(struct super_block *sb, struct ufs_inode *inode)
case UFS_UID_44BSD:
return fs32_to_cpu(sb, inode->ui_u3.ui_44.ui_gid);
case UFS_UID_EFT:
-   if (inode->ui_u1.oldids.ui_suid == 0xffff)
+   if ((__force u16)inode->ui_u1.oldids.ui_suid == 0xffff)
return fs32_to_cpu(sb, inode->ui_u3.ui_sun.ui_gid);
/* Fall through */
default:
-- 
2.7.4
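
A side note on why the __force above is value-safe and not merely
warning-safe (a user-space sketch, not ufs code): 0xffff, like 0, is
its own byteswap, so comparing the raw, possibly byteswapped on-disk
halfword against it gives the same answer in either byte order.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint16_t swab16(uint16_t x)
{
	return (uint16_t)(x << 8 | x >> 8);
}

int main(void)
{
	/* symmetric sentinel: the test is byte-order independent */
	for (uint32_t v = 0; v <= 0xffff; v++)
		assert(((uint16_t)v == 0xffff) ==
		       (swab16((uint16_t)v) == 0xffff));

	/* an asymmetric constant would not be: prints 1 then 0 */
	printf("%d %d\n", 0x00ff == 0x00ff, swab16(0x00ff) == 0x00ff);
	return 0;
}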



Re: [PATCH 5.0 00/93] 5.0.9-stable review

2019-04-20 Thread Bharath Vedartham
Built and booted on my x86 machine.
Observed no dmesg regressions.


Re: [PATCH 4.9 00/50] 4.9.170-stable review

2019-04-20 Thread Bharath Vedartham
Built and booted the kernel with defconfig on my x86 machine.
Observed no dmesg regressions.


Re: [PATCH] reiserfs: Force type conversion in xattr_hash

2019-04-19 Thread Bharath Vedartham
On Thu, Apr 18, 2019 at 03:50:19PM -0700, Andrew Morton wrote:
> On Wed, 17 Apr 2019 17:22:00 +0530 Bharath Vedartham wrote:
> 
> > This patch fixes the sparse warning:
> > 
> > fs/reiserfs//xattr.c:453:28: warning: incorrect type in return
> > expression (different base types)
> > fs/reiserfs//xattr.c:453:28:    expected unsigned int
> > fs/reiserfs//xattr.c:453:28:    got restricted __wsum
> > fs/reiserfs//xattr.c:453:28: warning: incorrect type in return
> > expression (different base types)
> > fs/reiserfs//xattr.c:453:28:    expected unsigned int
> > fs/reiserfs//xattr.c:453:28:    got restricted __wsum
> > 
> > csum_partial returns restricted integer __wsum whereas xattr_hash
> > expects a return type of __u32.
> > 
> > ...
> >
> > --- a/fs/reiserfs/xattr.c
> > +++ b/fs/reiserfs/xattr.c
> > @@ -450,7 +450,7 @@ static struct page *reiserfs_get_page(struct inode *dir, size_t n)
> >  
> >  static inline __u32 xattr_hash(const char *msg, int len)
> >  {
> > -   return csum_partial(msg, len, 0);
> > +   return (__force __u32)csum_partial(msg, len, 0);
> >  }
> >  
> >  int reiserfs_commit_write(struct file *f, struct page *page,
> 
> hm.  Conversion from int to __u32 should be OK - why is sparse being so
> picky here?
> 
> Why is the __force needed, btw?
The return type of csum_partial is __wsum which is a restricted integer
type.

__wsum is defined as:
typedef __u32 __bitwise __wsum;

Because __wsum is a restricted integer type, sparse will complain
whenever we convert it to another type without __force.

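For reference, the mechanism is easy to reproduce outside the kernel
with a toy restricted type. This sketch assumes you have the sparse
tool itself; `sparse demo.c` warns about the first return, while a
regular compiler sees nothing because the attributes are only defined
under __CHECKER__ (which sparse sets):

#include <stdint.h>

#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef uint32_t __bitwise wsum_t;	/* toy stand-in for __wsum */

uint32_t leaky(wsum_t s)
{
	return s;	/* sparse: incorrect type in return expression */
}

uint32_t converted(wsum_t s)
{
	return (__force uint32_t)s;	/* explicit, no warning */
}
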
Interestingly enough if we look at the definition of csum_partial, more
specifically the first 2 lines:

__wsum csum_partial(const void *buff, int len, __wsum sum)
{
	unsigned long result = do_csum(buff, len);

	result += (__force u32)sum;

	...

sum is of type __wsum, result is of type unsigned long. __force is used
to suppress the sparse warning.

Thanks for your time!


[PATCH] f2fs: Force type conversion from __le32 to u32

2019-04-18 Thread Bharath Vedartham
This patch forces type conversion from __le32 to u32 to prevent sparse
warnings like:
warning: restricted __le32 degrades to integer

dir.c:
fscrypt_fname_disk_to_usr takes a hash of type u32 as its second argument,
but de->hash_code is of type __le32.

node.c
NULL_ADDR is of type u32 but block_addr is of type __le32.

Signed-off-by: Bharath Vedartham 
---
 fs/f2fs/dir.c  | 2 +-
 fs/f2fs/node.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 59bc460..4f39872 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -828,7 +828,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
int save_len = fstr->len;
 
err = fscrypt_fname_disk_to_usr(d->inode,
-   (u32)de->hash_code, 0,
+   (__force u32)de->hash_code, 0,
&de_name, fstr);
if (err)
goto out;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 3f99ab2..9e333a5 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2706,7 +2706,7 @@ static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
i = 1;
}
for (; i < NAT_ENTRY_PER_BLOCK; i++) {
-   if (nat_blk->entries[i].block_addr != NULL_ADDR)
+   if ((__force u32)nat_blk->entries[i].block_addr != NULL_ADDR)
valid++;
}
if (valid == 0) {
-- 
2.7.4


