date:20180206

Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Kirill Tkhai

On 07.02.2018 08:02, Paul E. McKenney wrote:
> On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
>> On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
>>> So it is OK to kvmalloc() something and pass it to either kfree() or
>>> kvfree(), and it had better be OK to kvmalloc() something and pass it
>>> to kvfree().
>>>
>>> Is it OK to kmalloc() something and pass it to kvfree()?
>>
>> Yes, it absolutely is.
>>
>> void kvfree(const void *addr)
>> {
>> 	if (is_vmalloc_addr(addr))
>> 		vfree(addr);
>> 	else
>> 		kfree(addr);
>> }
>>
>>> If so, is it really useful to have two different names here, that is,
>>> both kfree_rcu() and kvfree_rcu()?
>>
>> I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
>> vfree_rcu() available in the API for the symmetry of calling kmalloc()
>> / kfree_rcu().
>>
>> Personally, I would like us to rename kvfree() to just free(), and have
>> malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
>> fight yet.
> 
> But why not just have the existing kfree_rcu() API cover both kmalloc()
> and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
> anyone arguing that the RCU API has too few members.  ;-)

People who are far from RCU internals consider kfree_rcu() an extension
of kfree(). And it's not obvious that someone looking for a primitive to
free vmalloc'ed memory needs to dive into the kfree_rcu() comments.

Also, a construction like

obj = kvmalloc();
kfree_rcu(obj);

makes me think it's legitimate to use plain kfree() as the counterpart of
kvmalloc().

So a significant change in kfree_rcu() behavior will complicate stable
backporters' lives, because they will need to keep such differences between
kernel versions in mind.

It seems that if we are going to use a single primitive for both kmalloc()
and kvmalloc() memory, it has to have another name. But I don't see any
problem with having both kfree_rcu() and kvfree_rcu().
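
To make the pairing concern concrete, here is a minimal user-space sketch in C.
It is only a model: model_vmalloc_area, model_is_vmalloc_addr() and the other
model_* names are hypothetical stand-ins, not kernel code. It mirrors the
address-based dispatch of the kvfree() quoted above and shows why plain kfree()
is not a safe counterpart for memory that may have come from the vmalloc side
of kvmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's vmalloc address range. */
static unsigned char model_vmalloc_area[4096];

/* Which underlying free routine a call would end up in. */
enum model_free_path { MODEL_KFREE_PATH, MODEL_VFREE_PATH };

/* Model of is_vmalloc_addr(): a pure address-range check. */
static bool model_is_vmalloc_addr(const void *addr)
{
	uintptr_t p = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)model_vmalloc_area;

	assert(addr != NULL);
	return p >= start && p < start + sizeof(model_vmalloc_area);
}

/* Model of kvfree(): pick the free routine from the address alone. */
static enum model_free_path model_kvfree_path(const void *addr)
{
	return model_is_vmalloc_addr(addr) ? MODEL_VFREE_PATH : MODEL_KFREE_PATH;
}

/* Plain kfree() performs no such dispatch, so it is only valid for
 * addresses that lie outside the vmalloc range. */
static bool model_plain_kfree_is_valid(const void *addr)
{
	return !model_is_vmalloc_addr(addr);
}
```

In this model an address inside model_vmalloc_area is routed to the vfree path
by model_kvfree_path(), while model_plain_kfree_is_valid() rejects it. That is
the asymmetry the kvmalloc()/kfree_rcu() pairing would hide from a reader.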

Kirill

Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Josh Triplett

On Tue, Feb 06, 2018 at 09:02:00PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> > > So it is OK to kvmalloc() something and pass it to either kfree() or
> > > kvfree(), and it had better be OK to kvmalloc() something and pass it
> > > to kvfree().
> > > 
> > > Is it OK to kmalloc() something and pass it to kvfree()?
> > 
> > Yes, it absolutely is.
> > 
> > void kvfree(const void *addr)
> > {
> > 	if (is_vmalloc_addr(addr))
> > 		vfree(addr);
> > 	else
> > 		kfree(addr);
> > }
> > 
> > > If so, is it really useful to have two different names here, that is,
> > > both kfree_rcu() and kvfree_rcu()?
> > 
> > I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
> > vfree_rcu() available in the API for the symmetry of calling kmalloc()
> > / kfree_rcu().
> > 
> > Personally, I would like us to rename kvfree() to just free(), and have
> > malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
> > fight yet.
> 
> But why not just have the existing kfree_rcu() API cover both kmalloc()
> and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
> anyone arguing that the RCU API has too few members.  ;-)

I don't have any problem with having just `kvfree_rcu`, but having just
`kfree_rcu` seems confusingly asymmetric.

(Also, count me in favor of having just one "free" function, too.)

Re: [PATCH v3 1/2] drm/virtio: Add window server support

2018-02-06 Thread Tomeu Vizoso


On 02/07/2018 02:09 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 06, 2018 at 03:23:02PM +0100, Gerd Hoffmann wrote:
>>> Creation of shareable buffer by guest
>>> -------------------------------------
>>>
>>> 1. Client requests virtio driver to create a buffer suitable for sharing
>>> with host (DRM_VIRTGPU_RESOURCE_CREATE)
>>
>> client or guest proxy?
>>
>>> 4. QEMU maps that buffer to the guest's address space
>>> (KVM_SET_USER_MEMORY_REGION), passes the guest PFN to the virtio driver
>>
>> That part is problematic.  The host can't simply allocate something in
>> the physical address space, because most physical address space
>> management is done by the guest.  All pci bars are mapped by the guest
>> firmware for example (or by the guest OS in case of hotplug).
>>
>>> 4. QEMU pops data+buffers from the virtqueue, looks up shmem FD for each
>>> resource, sends data + FDs to the compositor with SCM_RIGHTS
>
> If you squint hard, this sounds a bit like a use-case for vhost-user-gpu,
> does it not?

Can you expand on what makes you think that?

As an aside, crosvm runs the virtio-gpu device in a separate, jailed
process, among other virtual devices.

https://chromium.googlesource.com/chromiumos/platform/crosvm/
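
For readers unfamiliar with the SCM_RIGHTS step quoted above, the following is
a minimal user-space sketch in C of passing one file descriptor over a Unix
domain socket. It is not taken from QEMU or crosvm; model_send_fd() and
model_recv_fd() are hypothetical names, and the one-byte payload stands in for
the "data" that would accompany the FDs.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union model_fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one payload byte plus one descriptor; returns 0 on success. */
static int model_send_fd(int sock, int fd)
{
	char payload = 'F';
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union model_fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* The descriptor travels as ancillary data of type SCM_RIGHTS. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the payload byte and return the passed descriptor, or -1. */
static int model_recv_fd(int sock)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union model_fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	assert(sock >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file,
which is what lets a compositor map a buffer that another process created.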

Regards,

Tomeu

Re: [PATCH v11 00/10] Application Data Integrity feature introduced by SPARC M7

2018-02-06 Thread Eric W. Biederman

Khalid Aziz  writes:

> On 02/01/2018 07:29 PM, ebied...@xmission.com wrote:
>> Khalid Aziz  writes:
>>
>>> V11 changes:
>>> This series is same as v10 and was simply rebased on 4.15 kernel. Can
>>> mm maintainers please review patches 2, 7, 8 and 9 which are arch
>>> independent, and include/linux/mm.h and mm/ksm.c changes in patch 10
>>> and ack these if everything looks good?
>>
>> I am a bit puzzled how this differs from the pkey's that other
>> architectures are implementing to achieve a similar result.
>>
>> I am a bit mystified why you don't store the tag in a vma
>> instead of inventing a new way to store data on page out.
>
> Hello Eric,
>
> As Steven pointed out, sparc sets tags per cacheline unlike pkey. This results
> in much finer granularity for tags than pkey and hence requires larger tag
> storage than what we can do in a vma.

*Nod*   I am a bit mystified where you keep the information in memory.
I would think the tags would need to be stored per cacheline or per
tlb entry, in some kind of cache that could overflow.  So I would be
surprised if swapping is the only time this information needs to be stored
in memory.  Which makes me wonder if you have the proper data
structures.

I would think an array per vma or something in the page tables would
tend to make sense.

But perhaps I am missing something.

>> Can you please use force_sig_fault to send these signals instead
>> of force_sig_info.  Empirically I have found that it is very
>> error prone to generate siginfo's by hand, especially on code
>> paths where several different si_codes may apply.  So it helps
>> to go through a helper function to ensure the fiddly bits are
>> all correct.  AKA the unused bits all need to be set to zero before
>> struct siginfo is copied to userspace.
>>
>
> What you say makes sense. I followed the same code as other fault handlers for
> sparc. I could change just the fault handlers for ADI related faults. Would it
> make more sense to change all the fault handlers in a separate patch and keep
> the code in arch/sparc/kernel/traps_64.c consistent? Dave M, do you have a
> preference?

It is my intention post -rc1 to start sending out patches to get the
rest of not just sparc but all of the architectures using the new
helpers.  I have the code; I just ran out of time before the merge
window opened to ensure everything had a good thorough review.

So if you can handle your new changes, I expect I will handle the
rest.
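
As an illustration of why a helper is less error prone than filling siginfo by
hand, here is a minimal user-space sketch in C. It is not the kernel's
force_sig_fault(); model_fill_fault_siginfo() is a hypothetical name, and
SEGV_ACCERR is used only as a stand-in si_code. The point it shows is the one
made above: zero the whole structure first, then set only the fields that
apply to this kind of fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Fill a fault-style siginfo in one place. Every byte that is not
 * explicitly set below is guaranteed to be zero, so no stale data
 * is left in the unused parts of the structure.
 */
static void model_fill_fault_siginfo(siginfo_t *info, int sig, int code,
				     void *addr)
{
	assert(info != NULL);
	memset(info, 0, sizeof(*info));
	info->si_signo = sig;
	info->si_errno = 0;
	info->si_code = code;
	info->si_addr = addr;
}
```

A caller that builds the structure by hand has to remember the memset() on
every code path; with the helper, the zeroing cannot be forgotten.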

Eric

Re: [PATCH 8/8] thermal/drivers/cpu_cooling: Add the combo cpu cooling device

2018-02-06 Thread Viresh Kumar

On 06-02-18, 11:48, Daniel Lezcano wrote:
> On 06/02/2018 05:28, Viresh Kumar wrote:

> > Surely we can do one thing at a time if that's the way we choose to do it.
> 
> Easy to say :)
> 
> The current code is to introduce the feature without impacting the DT
> bindings in order to keep focused on the thermal mitigation aspect.
> 
> There are still a lot of improvements to do after that. You are
> basically asking me to implement the copy-on-write before the memory
> management is complete.

Perhaps I wasn't clear. What I was trying to say is that we can do "one thing at
a time" if we choose to create a "combo device" (the way you proposed). I am not
trying to force you to solve all the problems in one go :)

> Can you give an example? Either your understanding is incorrect or I missed
> the point.

So I tried to write it down and realized I was assuming that different
cooling-maps can be provided for different cooling strategies
(cpufreq/cpuidle) and obviously that's not the case as it's per device.
Not sure if it would be correct to explore the possibility of doing
that.

-- 
viresh

Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang


On 02/07/2018 12:34 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:
>> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
>> support of reporting hints of guest free pages to host via virtio-balloon.
>>
>> Host requests the guest to report free page hints by sending a new cmd
>> id to the guest via the free_page_report_cmd_id configuration register.
>>
>> When the guest starts to report, the first element added to the free page
>> vq is the cmd id given by host. When the guest finishes the reporting
>> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
>> to the vq to tell host that the reporting is done. Host polls the free
>> page vq after sending the starting cmd id, so the guest doesn't need to
>> kick after filling an element to the vq.
>>
>> Host may also request the guest to stop the reporting in advance by
>> sending the stop cmd id to the guest via the configuration register.
>>
>> Signed-off-by: Wei Wang
>> Signed-off-by: Liang Li
>> Cc: Michael S. Tsirkin
>> Cc: Michal Hocko
>> ---
>>   drivers/virtio/virtio_balloon.c | 255 +++-
>>   include/uapi/linux/virtio_balloon.h |   7 +
>>   mm/page_poison.c|   6 +
>>   3 files changed, 232 insertions(+), 36 deletions(-)
>>
>> Resend Change:
>> - Expose page_poisoning_enabled to kernel modules
>
> RESEND tag is for reposting unchanged patches.
> You want to post a v27, and you want the mm patch
> as a separate one, so you can get an ack on it from
> someone on linux-mm.
>
> In fact, I would probably add reporting the poison value as
> a separate feature/couple of patches.



OK. I have made them separate patches in v27. Thanks a lot for reviewing
so many versions; I learned a lot from the comments and discussion.


Best,
Wei

[PATCH v27 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.
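
The reporting sequence described above can be sketched as a small user-space
model in C. This is not the driver code from this patch: struct model_vq,
model_report_free_pages() and MODEL_STOP_ID (assumed here to be 0) are
hypothetical, the vq is flattened to an array of 32-bit values, and emitting
the stop id after an early stop is a choice of this model rather than
something the description specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value standing in for VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID. */
#define MODEL_STOP_ID 0u
#define MODEL_VQ_CAP  16u

/* Hypothetical stand-in for the free page vq: a bounded element list. */
struct model_vq {
	uint32_t elems[MODEL_VQ_CAP];
	size_t len;
};

static bool model_vq_add(struct model_vq *vq, uint32_t elem)
{
	assert(vq != NULL);
	if (vq->len >= MODEL_VQ_CAP)
		return false;
	vq->elems[vq->len++] = elem;
	return true;
}

/*
 * Guest side of one reporting round. *config_cmd_id models the
 * free_page_report_cmd_id configuration register written by the host.
 * Returns the number of free page hints placed on the vq.
 */
static size_t model_report_free_pages(struct model_vq *vq,
				      const volatile uint32_t *config_cmd_id,
				      const uint32_t *free_pfns, size_t n)
{
	uint32_t cmd_id_use = *config_cmd_id;
	size_t reported = 0;

	/* The stop id in the register means no report was requested. */
	if (cmd_id_use == MODEL_STOP_ID)
		return 0;

	/* Need room for the leading cmd id and the closing stop id. */
	if (vq->len + 2 > MODEL_VQ_CAP)
		return 0;

	/* First element: the cmd id given by the host. */
	model_vq_add(vq, cmd_id_use);

	while (reported < n && vq->len + 1 < MODEL_VQ_CAP) {
		/* Host changed the register: stop reporting in advance. */
		if (*config_cmd_id != cmd_id_use)
			break;
		model_vq_add(vq, free_pfns[reported++]);
	}

	/* Last element: tell the host that the reporting is done. */
	model_vq_add(vq, MODEL_STOP_ID);
	return reported;
}
```

With a host cmd id of 7 and three free pages, the model vq ends up holding the
cmd id, the three hints and the stop id, in that order.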

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c | 245 ++--
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 213 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..39ecce3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+	VIRTIO_BALLOON_VQ_INFLATE,
+	VIRTIO_BALLOON_VQ_DEFLATE,
+	VIRTIO_BALLOON_VQ_STATS,
+	VIRTIO_BALLOON_VQ_FREE_PAGE,
+	VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+	/* Balloon's own wq for cpu-intensive work items */
+	struct workqueue_struct *balloon_wq;
+	/* The free page reporting work item submitted to the balloon wq */
+	struct work_struct report_free_page_work;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
 	spinlock_t stop_update_lock;
 	bool stop_update;
 
+	/* The new cmd id received from host */
+	uint32_t cmd_id_received;
+	/* The cmd id that is in use */
+	__virtio32 cmd_id_use;
+
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-	struct virtio_balloon *vb = vdev->priv;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vb->stop_update_lock, flags);
-	if (!vb->stop_update)
-		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
 	s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 	return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+	unsigned long flags;
+	s64 diff = towards_target(vb);
+
+	if (diff) {
+		spin_lock_irqsave(&vb->stop_update_lock, flags);
+		if (!vb->stop_update)
+			queue_work(system_freezable_wq,
+				   &vb->update_balloon_size_work);
+		spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+	}
+
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		virtio_cread(vdev, struct virtio_balloon_config,
+			     free_page_report_cmd_id, &vb->cmd_id_received);
+		if (vb->cmd_id_received !=
+		    VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+			spin_lock_irqsave(&vb->stop_update_lock, flags);
+			if (!vb->stop_update)
+				queue_work(vb->balloon_wq,
+					   &vb->report_free_page_work);
+			spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+		}
+	}
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
 	u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct *work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
-	int err, nvqs;
+	struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+	vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+	const char

[PATCH v27 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-02-06 Thread Wei Wang

The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value in use.

Signed-off-by: Wei Wang 
Suggested-by: Michael S. Tsirkin 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 10 ++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 39ecce3..76b4853 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -685,6 +685,7 @@ static struct file_system_type balloon_fs = {
 static int virtballoon_probe(struct virtio_device *vdev)
 {
struct virtio_balloon *vb;
+   __u32 poison_val;
int err;
 
if (!vdev->config->get) {
@@ -728,6 +729,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
goto out_del_vqs;
}
INIT_WORK(>report_free_page_work, report_free_page_func);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+   memset(_val, PAGE_POISON, sizeof(poison_val));
+   virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, _val);
+   }
}
 
vb->nb.notifier_call = virtballoon_oom_notify;
@@ -846,6 +852,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+   if (!page_poisoning_enabled())
+   __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
 }
@@ -855,6 +864,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+   VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index 0c654db..3f97067 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+   /* Stores PAGE_POISON if page poisoning is in use */
+   __u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4
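
As a rough user-space illustration of the value written to poison_val: memset() replicates a single byte across the whole 32-bit field, so a poison byte of 0xaa ends up as 0xaaaaaaaa. PAGE_POISON_BYTE is a stand-in for the kernel's PAGE_POISON and is assumed to be 0xaa here; make_poison_val() is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's PAGE_POISON; the value 0xaa is an assumption. */
#define PAGE_POISON_BYTE 0xaa

/* Hypothetical helper: build the 32-bit config value from one poison byte. */
static uint32_t make_poison_val(uint8_t poison_byte)
{
	uint32_t poison_val;

	/* memset() copies the byte into each of the four bytes of the field. */
	memset(&poison_val, poison_byte, sizeof(poison_val));
	return poison_val;
}
```

Because every byte is identical, the result does not depend on the endianness of the guest.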

[PATCH v27 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules

2018-02-06 Thread Wei Wang

In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.

Signed-off-by: Wei Wang 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
---
 mm/page_poison.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index e83fd44..c08d02a 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -30,6 +30,11 @@ bool page_poisoning_enabled(void)
debug_pagealloc_enabled()));
 }
 
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
 static void poison_page(struct page *page)
 {
void *addr = kmap_atomic(page);
@@ -37,6 +42,7 @@ static void poison_page(struct page *page)
memset(addr, PAGE_POISON, PAGE_SIZE);
kunmap_atomic(addr);
 }
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
 
 static void poison_pages(struct page *page, int n)
 {
-- 
2.7.4

[PATCH v27 0/4] Virtio-balloon: support free page reporting

2018-02-06 Thread Wei Wang

This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization of the 1st round memory transfer -
the hypervisor can skip the transfer of guest free pages in the 1st round.
It does not matter if the memory pages are used after they are given
to the hypervisor as a hint of the free pages, because they will be
tracked by the hypervisor and transferred in the next round if they are
used and written.
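
The round structure described above can be sketched as a small user-space simulation. This is only a toy model under stated assumptions: struct vm_state, first_round(), next_round() and guest_write() are hypothetical names, the VM has eight pages, and dirty tracking is assumed to be active before the free page hints are taken.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* Toy model of what the hypervisor knows about the guest's pages. */
struct vm_state {
	bool free_hint[NR_PAGES]; /* hypervisor's copy of the guest's hints */
	bool dirty[NR_PAGES];     /* written since dirty tracking started */
};

/* A guest write is always caught by dirty tracking, hinted or not. */
static void guest_write(struct vm_state *vm, size_t pfn)
{
	vm->dirty[pfn] = true;
}

/* Round 1: send every page except those hinted as free. */
static size_t first_round(const struct vm_state *vm, bool sent[NR_PAGES])
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		sent[i] = !vm->free_hint[i];
		n += sent[i];
	}
	return n;
}

/* Later rounds: send only the dirty pages, then clear their dirty bits. */
static size_t next_round(struct vm_state *vm, bool sent[NR_PAGES])
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		sent[i] = vm->dirty[i];
		n += sent[i];
		vm->dirty[i] = false;
	}
	return n;
}
```

A page that was hinted as free and then written is skipped in the first round but still picked up by the next one, which is why a stale hint is harmless in this model.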

* Tests
- Migration time improvement
Result:
Live migration time is reduced to 14% with this optimization.
Details:
Local live migration of 8GB idle guest, the legacy live migration takes
~1817ms. With this optimization, it takes ~254ms, which reduces the time
to 14%.
- Workload tests
Results:
Running this feature has no impact on the linux compilation workload
running inside the guest.
Details:
Set up a Ping-Pong local live migration, where the guest ceaselessly
migrates between the source and destination. Linux compilation,
i.e. make bzImage -j4, is performed during the Ping-Pong migration. The
legacy case takes 5min14s to finish the compilation. With this
optimization patched, it takes 5min12s.

ChangeLog:
v26->v27:
- add a new patch to expose page_poisoning_enabled to kernel modules
- virtio-balloon: set poison_val to 0xaaaaaaaa, instead of 0xaa
v25->v26: virtio-balloon changes only
- remove kicking free page vq since the host now polls the vq after
  initiating the reporting
- report_free_page_func: detach all the used buffers after sending
  the stop cmd id. This avoids leaving the detaching burden (i.e.
  overhead) to the next cmd id. Detaching here isn't considered
  overhead since the stop cmd id has been sent, and host has already
  moved forward.
v24->v25:
- mm: change walk_free_mem_block to return 0 (instead of true) on
  completing the report, and return a non-zero value from the
  callback, which stops the reporting.
- virtio-balloon:
- use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
- avoid __virtio_clear_bit when bailing out;
- a new method to avoid reporting the same cmd id to host twice
- destroy_workqueue can cancel free page work when the feature is
  negotiated;
- fail probe when the free page vq size is less than 2.
v23->v24:
- change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
  VIRTIO_BALLOON_F_FREE_PAGE_HINT
- kick when vq->num_free < half full, instead of "= half full"
- replace BUG_ON with bailing out
- check vb->balloon_wq in probe(), if null, bail out
- add a new feature bit for page poisoning
- solve the corner case that one cmd id being sent to host twice
v22->v23:
- change to kick the device when the vq is half-way full;
- open-code batch_free_page_sg into add_one_sg;
- change cmd_id from "uint32_t" to "__virtio32";
- reserve one entry in the vq for the driver to send cmd_id, instead
  of busywaiting for an available entry;
- add "stop_update" check before queue_work for prudence purpose for
  now, will have a separate patch to discuss this flag check later;
- init_vqs: change to put some variables on stack to have simpler
  implementation;
- add destroy_workqueue(vb->balloon_wq);

v21->v22:
- add_one_sg: some code and comment re-arrangement
- send_cmd_id: handle a corner case

For previous ChangeLog, please reference
https://lwn.net/Articles/743660/

Wei Wang (4):
  mm: support reporting free page blocks
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c | 255 +++-
 include/linux/mm.h  |   6 +
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_alloc.c |  96 ++
 mm/page_poison.c|   6 +
 5 files changed, 334 insertions(+), 36 deletions(-)

-- 
2.7.4

[PATCH v27 1/4] mm: support reporting free page blocks

2018-02-06 Thread Wei Wang

This patch adds support to walk through the free page blocks in the
system and report them via a callback function. Some page blocks may
leave the free list after zone->lock is released, so it is the caller's
responsibility to either detect or prevent the use of such pages.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are reported as free pages but are written after the report function
returns will be captured by the hypervisor, and they will be added to the
next round of memory transfer.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Acked-by: Michal Hocko 
---
 include/linux/mm.h |  6 
 mm/page_alloc.c| 96 ++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 173d248..1c77d88 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1944,6 +1944,12 @@ extern void free_area_init_node(int nid, unsigned long *zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+extern int walk_free_mem_block(void *opaque,
+  int min_order,
+  int (*report_pfn_range)(void *opaque,
+  unsigned long pfn,
+  unsigned long num));
+
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c7dd9c8..995ff01 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4906,6 +4906,102 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
show_swap_cache_info();
 }
 
+/*
+ * Walk through a free page list and report the found pfn range via the
+ * callback.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+static int walk_free_page_list(void *opaque,
+  struct zone *zone,
+  int order,
+  enum migratetype mt,
+  int (*report_pfn_range)(void *,
+  unsigned long,
+  unsigned long))
+{
+   struct page *page;
+   struct list_head *list;
+   unsigned long pfn, flags;
+   int ret = 0;
+
+   spin_lock_irqsave(&zone->lock, flags);
+   list = &zone->free_area[order].free_list[mt];
+   list_for_each_entry(page, list, lru) {
+   pfn = page_to_pfn(page);
+   ret = report_pfn_range(opaque, pfn, 1 << order);
+   if (ret)
+   break;
+   }
+   spin_unlock_irqrestore(&zone->lock, flags);
+
+   return ret;
+}
+
+/**
+ * walk_free_mem_block - Walk through the free page blocks in the system
+ * @opaque: the context passed from the caller
+ * @min_order: the minimum order of free lists to check
+ * @report_pfn_range: the callback to report the pfn range of the free pages
+ *
+ * If the callback returns a non-zero value, stop iterating the list of free
+ * page blocks. Otherwise, continue to report.
+ *
+ * Please note that there are no locking guarantees for the callback and
+ * that the reported pfn range might be freed or disappear after the
+ * callback returns so the caller has to be very careful how it is used.
+ *
+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.
+ *
+ * There is no guarantee that each free range will be reported only once
+ * during one walk_free_mem_block invocation.
+ *
+ * pfn_to_page on the given range is strongly discouraged and if there is
+ * an absolute need for that make sure to contact MM people to discuss
+ * potential problems.
+ *
+ * The function itself might sleep so it cannot be called from atomic
+ * contexts.
+ *
+ * In general low orders tend to be very volatile and so it makes more
+ * sense to query larger ones first for various optimizations which like
+ * ballooning etc... This will reduce the overhead as well.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+int walk_free_mem_block(void *opaque,
+   int min_order,
+   int (*report_pfn_range)(void *opaque,
+   unsigned
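
The archived patch is cut off here, so the body of walk_free_mem_block is not shown. The callback contract from the kernel-doc above (a non-zero return stops the walk and is passed back to the caller) can still be sketched in user space. Everything below is hypothetical: struct toy_zone, toy_walk_free_blocks() and count_pages() are illustrative names, and walking from the largest order down to min_order is a choice made for this sketch, following the remark that larger orders are worth querying first.

```c
#include <stddef.h>

#define NR_ORDERS  4 /* toy value, not the kernel's MAX_ORDER */
#define MAX_BLOCKS 4

/* Toy stand-in for the per-order free lists of a single zone. */
struct toy_zone {
	unsigned long pfn[NR_ORDERS][MAX_BLOCKS];
	size_t nr[NR_ORDERS];
};

typedef int (*report_fn)(void *opaque, unsigned long pfn, unsigned long num);

/* Report each free block; stop at the first non-zero callback return. */
static int toy_walk_free_blocks(const struct toy_zone *zone, void *opaque,
				int min_order, report_fn report)
{
	for (int order = NR_ORDERS - 1; order >= min_order; order--) {
		for (size_t i = 0; i < zone->nr[order]; i++) {
			int ret = report(opaque, zone->pfn[order][i],
					 1UL << order);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Example callback state: pages seen so far and a limit to stop at. */
struct budget {
	unsigned long pages;
	unsigned long limit;
};

/* Example callback: only counts pages, does no allocation or sleeping. */
static int count_pages(void *opaque, unsigned long pfn, unsigned long num)
{
	struct budget *b = opaque;

	(void)pfn;
	b->pages += num;
	return b->pages >= b->limit ? -1 : 0;
}
```

The callback only updates a counter, in line with the advice to keep it simple and defer heavy work to another context.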

Re: staging: ion: ION allocation fall back order depends on heap linkage order

2018-02-06 Thread Alexey Skidanov



> Yup, you've hit upon a key problem. Having fallbacks be stable
> was always a problem and the recommendation these days is to
> not rely on them. You can specify a heap at a time and fallback
> manually if you want that behavior.
> 
> If you have a proposal to make fallbacks work reliably without
> overly complicating the ABI I'm happy to review it.
> 
> Thanks,
> Laura
> 
I think it's possible to "automate" the "manual fallback" behavior. But
the real issue is using the heap id to specify a particular heap object.

The current API (the allocation IOCTL) requires the caller to specify a
particular heap object by its heap id. On the other hand, user space
doesn't control the heap creation order or the heap id assignment. So it
may be tricky, especially when more than one object of the same heap type
is created automatically.

Thanks,
Alexey
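
A minimal sketch of the "manual fallback" idea discussed above: try one heap at a time, in an order chosen by the caller, until an allocation succeeds. This does not use the real ION interface. alloc_from_heap_fn is a hypothetical hook that would wrap whatever allocation call the application uses, and toy_alloc() is a made-up allocator that exists only to make the sketch self-contained.

```c
#include <stddef.h>

/* Hypothetical hook: returns 0 on success, a negative value on failure. */
typedef int (*alloc_from_heap_fn)(unsigned int heap_id, size_t len);

/*
 * Try each heap id in the caller's preferred order.
 * Returns the index of the heap that succeeded, or -1 if none did.
 */
static int alloc_with_fallback(const unsigned int *heap_ids, size_t nr_heaps,
			       size_t len, alloc_from_heap_fn alloc)
{
	for (size_t i = 0; i < nr_heaps; i++) {
		if (alloc(heap_ids[i], len) == 0)
			return (int)i;
	}
	return -1;
}

/* Toy allocator: only heap id 2 has room, and only up to 4096 bytes. */
static int toy_alloc(unsigned int heap_id, size_t len)
{
	return (heap_id == 2 && len <= 4096) ? 0 : -1;
}
```

The fallback order is explicit in user space here, but the caller still has to know which heap id maps to which heap, which is the problem raised in the message above.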

[PATCH 6/6] s390: introduce execute-trampolines for branches

2018-02-06 Thread Martin Schwidefsky

Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction-return= compiler options to create a kernel fortified against
the spectre v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

basr    %r14,%r1

is replaced with a PC relative call to a new thunk

brasl   %r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
exrl    0,0f
j   .
0:  br  %r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour, the new kernel parameters "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If one of these parameters is
specified, the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig |  28 +
 arch/s390/Makefile|  12 
 arch/s390/include/asm/lowcore.h   |   6 +-
 arch/s390/include/asm/nospec-branch.h |  18 ++
 arch/s390/kernel/Makefile |   4 ++
 arch/s390/kernel/entry.S  | 113 ++
 arch/s390/kernel/module.c |  62 ---
 arch/s390/kernel/nospec-branch.c  | 100 ++
 arch/s390/kernel/setup.c  |   4 ++
 arch/s390/kernel/smp.c|   1 +
 arch/s390/kernel/vmlinux.lds.S|  14 +
 drivers/s390/char/Makefile|   2 +
 12 files changed, 329 insertions(+), 35 deletions(-)
 create mode 100644 arch/s390/include/asm/nospec-branch.h
 create mode 100644 arch/s390/kernel/nospec-branch.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index d514e25..d4a65bf 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -557,6 +557,34 @@ config KERNEL_NOBP
 
  If unsure, say N.
 
+config EXPOLINE
+   def_bool n
+   prompt "Avoid speculative indirect branches in the kernel"
+   help
+ Compile the kernel with the expoline compiler options to guard
+ against kernel-to-user data leaks by avoiding speculative indirect
+ branches.
+ Requires a compiler with -mindirect-branch=thunk support for full
+ protection. The kernel may run slower.
+
+ If unsure, say N.
+
+choice
+   prompt "Expoline default"
+   depends on EXPOLINE
+   default EXPOLINE_FULL
+
+config EXPOLINE_OFF
+   bool "spectre_v2=off"
+
+config EXPOLINE_MEDIUM
+   bool "spectre_v2=auto"
+
+config EXPOLINE_FULL
+   bool "spectre_v2=on"
+
+endchoice
+
 endmenu
 
 menu "Memory setup"
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index fd691c4..2f925ef 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -78,6 +78,18 @@ ifeq ($(call cc-option-yn,-mwarn-dynamicstack),y)
 cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack
 endif
 
+ifdef CONFIG_EXPOLINE
+  ifeq ($(call cc-option-yn,$(CC_FLAGS_MARCH) -mindirect-branch=thunk),y)
+CC_FLAGS_EXPOLINE := -mindirect-branch=thunk
+CC_FLAGS_EXPOLINE += -mfunction-return=thunk
+CC_FLAGS_EXPOLINE += -mindirect-branch-table
+export CC_FLAGS_EXPOLINE
+cflags-y += $(CC_FLAGS_EXPOLINE)
+  else
+$(warning "Your gcc lacks the -mindirect-branch= option")
+  endif
+endif
+
 ifdef CONFIG_FUNCTION_TRACER
 # make use of hotpatch feature if the compiler supports it
 cc_hotpatch:= -mhotpatch=0,3
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index c63986a..5bc 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -136,7 +136,11 @@ struct lowcore {
__u64   vdso_per_cpu_data;  /* 0x03b8 */
__u64   machine_flags;  /* 0x03c0 */
__u64   gmap;   /* 0x03c8 */
-   __u8    pad_0x03d0[0x0e00-0x03d0];  /* 0x03d0 */
+   __u8    pad_0x03d0[0x0400-0x03d0];  /* 0x03d0 */
+
+   /* br %r1 trampoline */
+   __u16   br_r1_trampoline;   /* 0x0400 */
+   __u8    pad_0x0402[0x0e00-0x0402];  /* 0x0402 */
 
/*
 * 0xe00 contains the address of the IPL Parameter Information
diff --git a/arch/s390/include/asm/nospec-branch.h b/arch/s390/include/asm/nospec-branch.h
new file mode 100644
index 0000000..7df48e5
--- /dev/null
+++ b/arch/s390/include/asm/nospec-branch.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_EXPOLINE_H
+#define _ASM_S390_EXPOLINE_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+extern int nospec_call_disable;
+extern int nospec_return_disable;
+
+void nospec_init_branches(void);
+void nospec_call_revert(s32 *start, s32 *end);
+void nospec_return_revert(s32 *start, s32 *end);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_S390_EXPOLINE_H */
diff --git a/arch/s390/kernel/Makefile
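
The lowcore hunk in this patch splits one padding array so that a 2-byte trampoline slot lands at offset 0x0400 while the field at 0x0e00 stays where it was. The arithmetic can be checked with a small user-space sketch. struct toy_lowcore is a hypothetical excerpt, not the real struct lowcore: the leading array stands in for everything before 0x03d0, and field_at_0x0e00 is a placeholder name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt that mirrors only the padding touched by the patch. */
struct toy_lowcore {
	uint8_t  before_0x03d0[0x03d0];          /* stand-in for earlier fields */
	uint8_t  pad_0x03d0[0x0400 - 0x03d0];    /* 0x03d0 */
	uint16_t br_r1_trampoline;               /* 0x0400 */
	uint8_t  pad_0x0402[0x0e00 - 0x0402];    /* 0x0402 */
	uint64_t field_at_0x0e00;                /* 0x0e00, placeholder name */
};

/* Compile-time checks that the split padding keeps the offsets intact. */
_Static_assert(offsetof(struct toy_lowcore, br_r1_trampoline) == 0x0400,
	       "trampoline slot must sit at 0x0400");
_Static_assert(offsetof(struct toy_lowcore, field_at_0x0e00) == 0x0e00,
	       "field after the padding must stay at 0x0e00");
```

The two new pads add up to 0x30 + 0x9fe bytes, which together with the 2-byte slot equals the 0xa30 bytes of the single pad they replace.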

[PATCH 6/6] s390: introduce execute-trampolines for branches

2018-02-06 Thread Martin Schwidefsky

Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

basr%r14,%r1

is replaced with a PC relative call to a new thunk

brasl   %r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
exrl0,0f
j   .
0:  br  %r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig |  28 +
 arch/s390/Makefile|  12 
 arch/s390/include/asm/lowcore.h   |   6 +-
 arch/s390/include/asm/nospec-branch.h |  18 ++
 arch/s390/kernel/Makefile |   4 ++
 arch/s390/kernel/entry.S  | 113 ++
 arch/s390/kernel/module.c |  62 ---
 arch/s390/kernel/nospec-branch.c  | 100 ++
 arch/s390/kernel/setup.c  |   4 ++
 arch/s390/kernel/smp.c|   1 +
 arch/s390/kernel/vmlinux.lds.S|  14 +
 drivers/s390/char/Makefile|   2 +
 12 files changed, 329 insertions(+), 35 deletions(-)
 create mode 100644 arch/s390/include/asm/nospec-branch.h
 create mode 100644 arch/s390/kernel/nospec-branch.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index d514e25..d4a65bf 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -557,6 +557,34 @@ config KERNEL_NOBP
 
  If unsure, say N.
 
+config EXPOLINE
+   def_bool n
+   prompt "Avoid speculative indirect branches in the kernel"
+   help
+ Compile the kernel with the expoline compiler options to guard
+ against kernel-to-user data leaks by avoiding speculative indirect
+ branches.
+ Requires a compiler with -mindirect-branch=thunk support for full
+ protection. The kernel may run slower.
+
+ If unsure, say N.
+
+choice
+   prompt "Expoline default"
+   depends on EXPOLINE
+   default EXPOLINE_FULL
+
+config EXPOLINE_OFF
+   bool "spectre_v2=off"
+
+config EXPOLINE_MEDIUM
+   bool "spectre_v2=auto"
+
+config EXPOLINE_FULL
+   bool "spectre_v2=on"
+
+endchoice
+
 endmenu
 
 menu "Memory setup"
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index fd691c4..2f925ef 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -78,6 +78,18 @@ ifeq ($(call cc-option-yn,-mwarn-dynamicstack),y)
 cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack
 endif
 
+ifdef CONFIG_EXPOLINE
+  ifeq ($(call cc-option-yn,$(CC_FLAGS_MARCH) -mindirect-branch=thunk),y)
+CC_FLAGS_EXPOLINE := -mindirect-branch=thunk
+CC_FLAGS_EXPOLINE += -mfunction-return=thunk
+CC_FLAGS_EXPOLINE += -mindirect-branch-table
+export CC_FLAGS_EXPOLINE
+cflags-y += $(CC_FLAGS_EXPOLINE)
+  else
+$(warning "Your gcc lacks the -mindirect-branch= option")
+  endif
+endif
+
 ifdef CONFIG_FUNCTION_TRACER
 # make use of hotpatch feature if the compiler supports it
 cc_hotpatch:= -mhotpatch=0,3
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index c63986a..5bc 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -136,7 +136,11 @@ struct lowcore {
__u64   vdso_per_cpu_data;  /* 0x03b8 */
__u64   machine_flags;  /* 0x03c0 */
__u64   gmap;   /* 0x03c8 */
-   __u8    pad_0x03d0[0x0e00-0x03d0];  /* 0x03d0 */
+   __u8    pad_0x03d0[0x0400-0x03d0];  /* 0x03d0 */
+
+   /* br %r1 trampoline */
+   __u16   br_r1_trampoline;   /* 0x0400 */
+   __u8    pad_0x0402[0x0e00-0x0402];  /* 0x0402 */
 
/*
 * 0xe00 contains the address of the IPL Parameter Information
diff --git a/arch/s390/include/asm/nospec-branch.h b/arch/s390/include/asm/nospec-branch.h
new file mode 100644
index 000..7df48e5
--- /dev/null
+++ b/arch/s390/include/asm/nospec-branch.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_S390_EXPOLINE_H
+#define _ASM_S390_EXPOLINE_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+extern int nospec_call_disable;
+extern int nospec_return_disable;
+
+void nospec_init_branches(void);
+void nospec_call_revert(s32 *start, s32 *end);
+void nospec_return_revert(s32 *start, s32 *end);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_S390_EXPOLINE_H */
diff --git a/arch/s390/kernel/Makefile

[PATCH 3/6] s390/alternative: use a copy of the facility bit mask

2018-02-06 Thread Martin Schwidefsky

To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.

Reviewed-by: David Hildenbrand 
Reviewed-by: Cornelia Huck 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/facility.h | 18 ++
 arch/s390/include/asm/lowcore.h  |  3 ++-
 arch/s390/kernel/alternative.c   |  3 ++-
 arch/s390/kernel/early.c |  3 +++
 arch/s390/kernel/setup.c |  4 +++-
 arch/s390/kernel/smp.c   |  4 +++-
 6 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h
index fbe0c4b..99c8ce3 100644
--- a/arch/s390/include/asm/facility.h
+++ b/arch/s390/include/asm/facility.h
@@ -15,6 +15,24 @@
 
 #define MAX_FACILITY_BIT (sizeof(((struct lowcore *)0)->stfle_fac_list) * 8)
 
+static inline void __set_facility(unsigned long nr, void *facilities)
+{
+   unsigned char *ptr = (unsigned char *) facilities;
+
+   if (nr >= MAX_FACILITY_BIT)
+   return;
+   ptr[nr >> 3] |= 0x80 >> (nr & 7);
+}
+
+static inline void __clear_facility(unsigned long nr, void *facilities)
+{
+   unsigned char *ptr = (unsigned char *) facilities;
+
+   if (nr >= MAX_FACILITY_BIT)
+   return;
+   ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
+}
+
 static inline int __test_facility(unsigned long nr, void *facilities)
 {
unsigned char *ptr;
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index ec6592e..c63986a 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -151,7 +151,8 @@ struct lowcore {
__u8    pad_0x0e20[0x0f00-0x0e20];  /* 0x0e20 */
 
/* Extended facility list */
-   __u64   stfle_fac_list[32]; /* 0x0f00 */
+   __u64   stfle_fac_list[16]; /* 0x0f00 */
+   __u64   alt_stfle_fac_list[16]; /* 0x0f80 */
__u8    pad_0x1000[0x11b0-0x1000];  /* 0x1000 */
 
/* Pointer to the machine check extended save area */
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 574e776..1abf4f3 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -75,7 +75,8 @@ static void __init_or_module __apply_alternatives(struct alt_instr *start,
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
 
-   if (!test_facility(a->facility))
+   if (!__test_facility(a->facility,
+S390_lowcore.alt_stfle_fac_list))
continue;
 
if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) {
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 497a920..510f218 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -193,6 +193,9 @@ static noinline __init void setup_facility_list(void)
 {
stfle(S390_lowcore.stfle_fac_list,
  ARRAY_SIZE(S390_lowcore.stfle_fac_list));
+   memcpy(S390_lowcore.alt_stfle_fac_list,
+  S390_lowcore.stfle_fac_list,
+  sizeof(S390_lowcore.alt_stfle_fac_list));
 }
 
 static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 793da97..bcd2a4a 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -340,7 +340,9 @@ static void __init setup_lowcore(void)
lc->preempt_count = S390_lowcore.preempt_count;
lc->stfl_fac_list = S390_lowcore.stfl_fac_list;
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
-  MAX_FACILITY_BIT/8);
+  sizeof(lc->stfle_fac_list));
+   memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+  sizeof(lc->alt_stfle_fac_list));
nmi_alloc_boot_cpu(lc);
vdso_alloc_boot_cpu(lc);
lc->sync_enter_timer = S390_lowcore.sync_enter_timer;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index a919b2f..fc28c95 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -266,7 +266,9 @@ static void pcpu_prepare_secondary(struct pcpu *pcpu, int cpu)
__ctl_store(lc->cregs_save_area, 0, 15);
save_access_regs((unsigned int *) lc->access_regs_save_area);
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
-  MAX_FACILITY_BIT/8);
+  sizeof(lc->stfle_fac_list));
+   memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+  sizeof(lc->alt_stfle_fac_list));
arch_spin_lock_setup(cpu);
 }
 
-- 
2.7.4
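
The bit numbering used by __set_facility()/__clear_facility() and the
effect of keeping a second copy of the mask can be tried out in user
space. The following is only a minimal sketch, not kernel code: struct
fac_lists, FAC_WORDS and alt_copy_is_independent() are made-up names for
illustration, and the facility number passed in is just an example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAC_WORDS 16	/* mirrors stfle_fac_list[16] from the patch */
#define MAX_FACILITY_BIT (FAC_WORDS * sizeof(uint64_t) * 8)

/* Hypothetical stand-in for the two lowcore fields at 0x0f00 and 0x0f80. */
struct fac_lists {
	uint64_t stfle_fac_list[FAC_WORDS];
	uint64_t alt_stfle_fac_list[FAC_WORDS];
};

/* Facility bits are numbered MSB first: bit 0 is 0x80 of byte 0. */
static void set_facility(unsigned long nr, void *facilities)
{
	unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return;
	ptr[nr >> 3] |= 0x80 >> (nr & 7);
}

static void clear_facility(unsigned long nr, void *facilities)
{
	unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return;
	ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
}

static int test_facility(unsigned long nr, const void *facilities)
{
	const unsigned char *ptr = facilities;

	if (nr >= MAX_FACILITY_BIT)
		return 0;
	return (ptr[nr >> 3] & (0x80 >> (nr & 7))) != 0;
}

/*
 * Set a bit in the "real" list, copy the list, clear the bit in the copy.
 * Returns 1 if the real list still has the bit and the copy does not.
 */
int alt_copy_is_independent(unsigned long nr)
{
	struct fac_lists f;

	/* 16 * 8 bytes = 0x80, hence the 0x0f00 -> 0x0f80 offsets. */
	assert(sizeof(f.stfle_fac_list) == 0x80);

	memset(&f, 0, sizeof(f));
	set_facility(nr, f.stfle_fac_list);
	memcpy(f.alt_stfle_fac_list, f.stfle_fac_list,
	       sizeof(f.alt_stfle_fac_list));
	clear_facility(nr, f.alt_stfle_fac_list);

	return test_facility(nr, f.stfle_fac_list) &&
	       !test_facility(nr, f.alt_stfle_fac_list);
}
```

Clearing a bit only in the copy leaves the STFLE-provided list untouched,
so code that tests the real facility list keeps seeing the hardware state
while the alternatives code sees the modified one.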

[PATCH 1/6] s390: scrub registers on kernel entry and KVM exit

2018-02-06 Thread Martin Schwidefsky

Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

Reviewed-by: Christian Borntraeger 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/kernel/entry.S | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 6cd444d..5d87eda 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -248,6 +248,12 @@ ENTRY(sie64a)
 sie_exit:
lg  %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg    %r0,%r13,0(%r14)    # save guest gprs 0-13
+   xgr %r0,%r0 # clear guest registers to
+   xgr %r1,%r1 # prevent speculative use
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
lmg %r6,%r14,__SF_GPRS(%r15)    # restore kernel registers
lg  %r2,__SF_EMPTY+16(%r15) # return exit reason code
br  %r14
@@ -282,6 +288,8 @@ ENTRY(system_call)
 .Lsysc_vtime:
UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
stmg    %r0,%r7,__PT_R0(%r11)
+   # clear user controlled register to prevent speculative use
+   xgr %r0,%r0
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
@@ -561,6 +569,15 @@ ENTRY(pgm_check_handler)
 4: lgr %r13,%r11
la  %r11,STACK_FRAME_OVERHEAD(%r15)
stmg    %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
stmg    %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC
@@ -626,6 +643,16 @@ ENTRY(io_int_handler)
lmg %r8,%r9,__LC_IO_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg    %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg    %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
@@ -839,6 +866,16 @@ ENTRY(ext_int_handler)
lmg %r8,%r9,__LC_EXT_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg    %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg    %r8,%r9,__PT_PSW(%r11)
lghi    %r1,__LC_EXT_PARAMS2
@@ -1046,6 +1083,16 @@ ENTRY(mcck_int_handler)
 .Lmcck_skip:
lghi    %r14,__LC_GPREGS_SAVE_AREA+64
stmg    %r0,%r7,__PT_R0(%r11)
+   # clear user controlled registers to prevent speculative use
+   xgr %r0,%r0
+   xgr %r1,%r1
+   xgr %r2,%r2
+   xgr %r3,%r3
+   xgr %r4,%r4
+   xgr %r5,%r5
+   xgr %r6,%r6
+   xgr %r7,%r7
+   xgr %r10,%r10
mvc __PT_R8(64,%r11),0(%r14)
stmg    %r8,%r9,__PT_PSW(%r11)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
-- 
2.7.4

[PATCH 0/6] s390: improve speculative execution handling v3

2018-02-06 Thread Martin Schwidefsky

Version 3 of the speculative execution improvements for s390.

Changes to v2:

* Dropped the prctl to introduce the PR_ISOLATE_BP control and simply
  added two exported functions s390_isolate_bp and s390_isolate_bp_guest.
  There is currently no caller for these functions, for now an out-of-tree
  module can be used until an acceptable upstream solution for the user
  space interface is found.

* Added an optimized version of the array_index_mask_nospec
  function based on subtract with borrow for the spectre v1 defense.

* Introduce "expoline", the s390 version of a retpoline. As s390 does
  not have a return instruction and the associated return stack we use
  an execute-type instruction on an indirect branch to get unpredictable
  branches. This requires gcc support for -mindirect-branch=thunk /
  -mfunction-return=thunk.  To be able to disable expolines there is
  another gcc option -mindirect-branch-table to keep a list of PC relative
  locations of calls to the execute thunks. With spectre_v2=off the call
  will be replaced with the original indirect branch and a nop.

Martin Schwidefsky (6):
  s390: scrub registers on kernel entry and KVM exit
  s390: add optimized array_index_mask_nospec
  s390/alternative: use a copy of the facility bit mask
  s390: add options to change branch prediction behaviour for the kernel
  s390: run user space and KVM guests with modified branch prediction
  s390: introduce execute-trampolines for branches

 arch/s390/Kconfig |  45 ++
 arch/s390/Makefile|  12 ++
 arch/s390/include/asm/barrier.h   |  24 
 arch/s390/include/asm/facility.h  |  18 +++
 arch/s390/include/asm/lowcore.h   |   9 +-
 arch/s390/include/asm/nospec-branch.h |  18 +++
 arch/s390/include/asm/processor.h |   4 +
 arch/s390/include/asm/thread_info.h   |   4 +
 arch/s390/kernel/Makefile |   4 +
 arch/s390/kernel/alternative.c|  26 +++-
 arch/s390/kernel/early.c  |   5 +
 arch/s390/kernel/entry.S  | 249 ++
 arch/s390/kernel/ipl.c|   1 +
 arch/s390/kernel/module.c |  62 +++--
 arch/s390/kernel/nospec-branch.c  | 100 ++
 arch/s390/kernel/processor.c  |  18 +++
 arch/s390/kernel/setup.c  |   8 +-
 arch/s390/kernel/smp.c|   7 +-
 arch/s390/kernel/vmlinux.lds.S|  14 ++
 drivers/s390/char/Makefile|   2 +
 20 files changed, 591 insertions(+), 39 deletions(-)
 create mode 100644 arch/s390/include/asm/nospec-branch.h
 create mode 100644 arch/s390/kernel/nospec-branch.c

-- 
2.7.4
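
For readers unfamiliar with array_index_mask_nospec(): the helper returns
an all-ones mask when index < size and zero otherwise, without using a
conditional branch. The sketch below uses the portable arithmetic form
rather than the s390 subtract-with-borrow assembly that this series adds.
index_mask_nospec() and load_clamped() are hypothetical names, and the
code assumes that both values are below LONG_MAX and that the compiler
performs an arithmetic right shift on signed long (gcc does).

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * ~0UL if index < size, 0 otherwise. If index >= size, then either index
 * or (size - 1 - index) has its top bit set, so the inverted value has a
 * clear sign bit and the arithmetic shift yields 0.
 */
static unsigned long index_mask_nospec(unsigned long index,
				       unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG - 1));
}

/*
 * Bounds-checked load. The mask forces the index to 0 on a path where the
 * bounds check was mispredicted, so no out-of-bounds address is formed.
 * Returns -1 for an out-of-range index.
 */
int load_clamped(const int *array, unsigned long size, unsigned long index)
{
	if (index >= size)
		return -1;
	index &= index_mask_nospec(index, size);
	return array[index];
}
```

The architectural result of load_clamped() is the same as that of a plain
bounds check. The mask only matters for what the CPU may do speculatively.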

[PATCH 5/6] s390: run user space and KVM guests with modified branch prediction

2018-02-06 Thread Martin Schwidefsky

Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.

To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be called, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/processor.h   |  3 +++
 arch/s390/include/asm/thread_info.h |  4 +++
 arch/s390/kernel/entry.S| 51 +
 arch/s390/kernel/processor.c| 18 +
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5f37f9c..7f2953c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -378,6 +378,9 @@ extern void memcpy_absolute(void *, void *, size_t);
memcpy_absolute(&(dest), &__tmp, sizeof(__tmp));\
 } while (0)
 
+extern int s390_isolate_bp(void);
+extern int s390_isolate_bp_guest(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_S390_PROCESSOR_H */
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 25d6ec3..83ba575 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -58,6 +58,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 #define TIF_GUARDED_STORAGE    4   /* load guarded storage control block */
 #define TIF_PATCH_PENDING  5   /* pending live patching update */
 #define TIF_PGSTE  6   /* New mm's will use 4K page tables */
+#define TIF_ISOLATE_BP 8   /* Run process with isolated BP */
+#define TIF_ISOLATE_BP_GUEST   9   /* Run KVM guests with isolated BP */
 
 #define TIF_31BIT  16  /* 32bit process */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
@@ -78,6 +80,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 #define _TIF_UPROBE            _BITUL(TIF_UPROBE)
 #define _TIF_GUARDED_STORAGE   _BITUL(TIF_GUARDED_STORAGE)
 #define _TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING)
+#define _TIF_ISOLATE_BP        _BITUL(TIF_ISOLATE_BP)
+#define _TIF_ISOLATE_BP_GUEST  _BITUL(TIF_ISOLATE_BP_GUEST)
 
 #define _TIF_31BIT _BITUL(TIF_31BIT)
 #define _TIF_SINGLE_STEP   _BITUL(TIF_SINGLE_STEP)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index e6d7550..53145b5 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -107,6 +107,7 @@ _PIF_WORK   = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
aghi    %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE)
j   3f
 1: UPDATE_VTIME %r14,%r15,\timer
+   BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
 2: lg  %r15,__LC_ASYNC_STACK   # load async stack
 3: la  %r11,STACK_FRAME_OVERHEAD(%r15)
.endm
@@ -187,6 +188,40 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
.popsection
.endm
 
+   .macro BPENTER tif_ptr,tif_mask
+   .pushsection .altinstr_replacement, "ax"
+662:   .word   0xc004, 0x0000, 0x0000  # 6 byte nop
+   .word   0xc004, 0x0000, 0x0000  # 6 byte nop
+   .popsection
+664:   TSTMSK  \tif_ptr,\tif_mask
+   jz  . + 8
+   .long   0xb2e8d000
+   .pushsection .altinstructions, "a"
+   .long 664b - .
+   .long 662b - .
+   .word 82
+   .byte 12
+   .byte 12
+   .popsection
+   .endm
+
+   .macro BPEXIT tif_ptr,tif_mask
+   TSTMSK  \tif_ptr,\tif_mask
+   .pushsection .altinstr_replacement, "ax"
+662:   jnz . + 8
+   .long   0xb2e8d000
+   .popsection
+664:   jz  . + 8
+   .long   0xb2e8c000
+   .pushsection .altinstructions, "a"
+   .long 664b - .
+   .long 662b - .
+   .word 82
+   .byte 8
+   .byte 8
+   .popsection
+   .endm
+
.section .kprobes.text, "ax"
 .Ldummy:
/*
@@ -240,9 +275,11 @@ ENTRY(__switch_to)
  */
 ENTRY(sie64a)
stmg    %r6,%r14,__SF_GPRS(%r15)    # save kernel registers
+   lg  %r12,__LC_CURRENT
stg %r2,__SF_EMPTY(%r15)    # save control block pointer
stg %r3,__SF_EMPTY+8(%r15)  # save guest register save area
xc  __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # reason code = 0
+   mvc __SF_EMPTY+24(8,%r15),__TI_flags(%r12) # copy thread flags
TSTMSK  __LC_CPU_FLAGS,_CIF_FPU # load guest fp/vx registers ?
jno .Lsie_load_guest_gprs
brasl   %r14,load_fpu_regs  # load guest fp/vx regs
@@ -259,11 +296,12 @@ ENTRY(sie64a)
jnz .Lsie_skip
TSTMSK  __LC_CPU_FLAGS,_CIF_FPU
jo  .Lsie_skip  # exit if fp/vx regs changed
-   BPON
+   BPEXIT

[PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled

2018-02-06 Thread Huang, Ying

From: Huang Ying 

It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, then when memory runs
low and swapping is triggered, segfaults and memory corruption occur
in random user space applications, as follows:

kernel: urxvt[338]: segfault at 20 ip 7fc08889ae0d sp 7ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x7fc08889ae0d _int_malloc (libc.so.6)
 #1  0x7fc08889c2f3 malloc (libc.so.6)
 #2  0x560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x560e6005e75c n/a (urxvt)
 #4  0x560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x560e6005cb55 ev_run (urxvt)
 #9  0x560e6003b9b9 main (urxvt)
 #10 0x7fc08883af4a __libc_start_main (libc.so.6)
 #11 0x560e6003f9da _start (urxvt)

Bisection found that the first bad commit is bd4c82c22c367e068
("mm, THP, swap: delay splitting THP after swapped out").

The root cause is as follows.

When pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) is tried first to compress the
pages and improve performance.  But zswap (frontswap) treats a THP as
a normal page, so only the head page is saved.  After swap-in, the
tail pages are not restored to their original contents, which causes
memory corruption in applications.

This is fixed by splitting the THP before writing the page to the swap
device if frontswap is enabled.  To handle the case where frontswap is
enabled at runtime, whether the page is a THP is also checked before
frontswap is used during swap-out.

Reported-and-tested-by: Sergey Senozhatsky 
Signed-off-by: "Huang, Ying" 
Cc: Konrad Rzeszutek Wilk 
Cc: Dan Streetman 
Cc: Seth Jennings 
Cc: Minchan Kim 
Cc: Tetsuo Handa 
Cc: Shaohua Li 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Shakeel Butt 
Cc: sta...@vger.kernel.org # 4.14
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")

Changelog:

v2:

- Move the frontswap check into swapfile.c to avoid making vmscan.c
  depend on frontswap.
---
 mm/page_io.c  | 2 +-
 mm/swapfile.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index b41cf9644585..6dca817ae7a0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
unlock_page(page);
goto out;
}
-   if (frontswap_store(page) == 0) {
+   if (!PageTransHuge(page) && frontswap_store(page) == 0) {
set_page_writeback(page);
unlock_page(page);
end_page_writeback(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 006047b16814..0b7c7883ce64 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
 
/* Only single cluster request supported */
WARN_ON_ONCE(n_goal > 1 && cluster);
+   /* Frontswap doesn't support THP */
+   if (frontswap_enabled() && cluster)
+   goto noswap;
 
avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
if (avail_pgs <= 0)
-- 
2.15.1

[PATCH 2/6] s390: add optimized array_index_mask_nospec

2018-02-06 Thread Martin Schwidefsky

Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.

Signed-off-by: Martin Schwidefsky 
---
 arch/s390/include/asm/barrier.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 1043260..f9eddbc 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -49,6 +49,30 @@ do { \
 #define __smp_mb__before_atomic()  barrier()
 #define __smp_mb__after_atomic()   barrier()
 
+/**
+ * array_index_mask_nospec - generate a mask for array_idx() that is
+ * ~0UL when the bounds check succeeds and 0 otherwise
+ * @index: array element index
+ * @size: number of elements in array
+ */
+#define array_index_mask_nospec array_index_mask_nospec
+static inline unsigned long array_index_mask_nospec(unsigned long index,
+   unsigned long size)
+{
+   unsigned long mask;
+
+   if (__builtin_constant_p(size) && size > 0) {
+   asm("   clgr    %2,%1\n"
+   "   slbgr   %0,%0\n"
+   :"=d" (mask) : "d" (size-1), "d" (index) :"cc");
+   return mask;
+   }
+   asm("   clgr    %1,%2\n"
+   "   slbgr   %0,%0\n"
+   :"=d" (mask) : "d" (size), "d" (index) :"cc");
+   return ~mask;
+}
+
 #include 
 
 #endif /* __ASM_BARRIER_H */
-- 
2.7.4

[PATCH 4/6] s390: add options to change branch prediction behaviour for the kernel

2018-02-06 Thread Martin Schwidefsky

Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Cornelia Huck 
Signed-off-by: Martin Schwidefsky 
---
 arch/s390/Kconfig | 17 ++
 arch/s390/include/asm/processor.h |  1 +
 arch/s390/kernel/alternative.c| 23 +++
 arch/s390/kernel/early.c  |  2 ++
 arch/s390/kernel/entry.S  | 48 +++
 arch/s390/kernel/ipl.c|  1 +
 arch/s390/kernel/smp.c|  2 ++
 7 files changed, 94 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 0105ce2..d514e25 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -540,6 +540,23 @@ config ARCH_RANDOM
 
  If unsure, say Y.
 
+config KERNEL_NOBP
+   def_bool n
+   prompt "Enable modified branch prediction for the kernel by default"
+   help
+ If this option is selected the kernel will switch to a modified
+ branch prediction mode if the firmware interface is available.
+ The modified branch prediction mode improves the behaviour in
+ regard to speculative execution.
+
+ With the option enabled the kernel parameter "nobp=0" or "nospec"
+ can be used to run the kernel in the normal branch prediction mode.
+
+ With the option disabled the modified branch prediction mode is
+ enabled with the "nobp=1" kernel parameter.
+
+ If unsure, say N.
+
 endmenu
 
 menu "Memory setup"
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index bfbfad4..5f37f9c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -91,6 +91,7 @@ void cpu_detect_mhz_feature(void);
 extern const struct seq_operations cpuinfo_op;
 extern int sysctl_ieee_emulation_warnings;
 extern void execve_tail(void);
+extern void __bpon(void);
 
 /*
  * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 1abf4f3..2247613 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -15,6 +15,29 @@ static int __init disable_alternative_instructions(char *str)
 
 early_param("noaltinstr", disable_alternative_instructions);
 
+static int __init nobp_setup_early(char *str)
+{
+   bool enabled;
+   int rc;
+
+   rc = kstrtobool(str, &enabled);
+   if (rc)
+   return rc;
+   if (enabled && test_facility(82))
+   __set_facility(82, S390_lowcore.alt_stfle_fac_list);
+   else
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+   return 0;
+}
+early_param("nobp", nobp_setup_early);
+
+static int __init nospec_setup_early(char *str)
+{
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+   return 0;
+}
+early_param("nospec", nospec_setup_early);
+
 struct brcl_insn {
u16 opc;
s32 disp;
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 510f218..ac707a9 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -196,6 +196,8 @@ static noinline __init void setup_facility_list(void)
memcpy(S390_lowcore.alt_stfle_fac_list,
   S390_lowcore.stfle_fac_list,
   sizeof(S390_lowcore.alt_stfle_fac_list));
+   if (!IS_ENABLED(CONFIG_KERNEL_NOBP))
+   __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
 }
 
 static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 5d87eda..e6d7550 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -159,6 +159,34 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
tm  off+\addr, \mask
.endm
 
+   .macro BPOFF
+   .pushsection .altinstr_replacement, "ax"
+660:   .long   0xb2e8c000
+   .popsection
+661:   .long   0x4700
+   .pushsection .altinstructions, "a"
+   .long 661b - .
+   .long 660b - .
+   .word 82
+   .byte 4
+   .byte 4
+   .popsection
+   .endm
+
+   .macro BPON
+   .pushsection .altinstr_replacement, "ax"
+662:   .long   0xb2e8d000
+   .popsection
+663:   .long   0x4700
+   .pushsection .altinstructions, "a"
+   .long 663b - .
+   .long 662b - .
+   .word 82
+   .byte 4
+   .byte 4
+   .popsection
+   .endm
+
.section .kprobes.text, "ax"
 .Ldummy:
/*
@@ -171,6 +199,11 @@ _PIF_WORK  = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
 */
nop 0
 
+ENTRY(__bpon)
+   .globl __bpon
+   BPON
+   br  %r14
+
 /*
  * Scheduler resume

linux/drivers/cpuidle: cpuidle_enter_state() issue

2018-02-06 Thread Li Wang

Hi Kernel-developers,

The following call trace was caught on kernel v4.15. Could anyone help
analyze the cpuidle problem?
If you need any more details, please let me know.

Test Env:
IBM KVM Guest on ibm-p8-kvm-03
POWER8E (raw), altivec supported
9216 MB memory, 107 GB disk space

8<
[15002.722413] swapper/15: page allocation failure: order:0,
mode:0x1080020(GFP_ATOMIC), nodemask=(null)
[15002.853793] swapper/15 cpuset=/ mems_allowed=0
[15002.853932] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.15.0 #1
[15002.854019] Call Trace:
[15002.854129] [c0023ff77650] [c0940b50]
.dump_stack+0xac/0xfc (unreliable)
[15002.854285] [c0023ff776e0] [c026c678] .warn_alloc+0xe8/0x180
[15002.854376] [c0023ff777a0] [c026d50c]
.__alloc_pages_nodemask+0xd6c/0xf90
[15002.854490] [c0023ff77980] [c02e9cc0]
.alloc_pages_current+0x90/0x120
[15002.854624] [c0023ff77a10] [c07990cc]
.skb_page_frag_refill+0x8c/0x120
[15002.854746] [c0023ff77aa0] [d3a561a8]
.try_fill_recv+0x368/0x620 [virtio_net]
[15003.422855] [c0023ff77ba0] [d3a568ec]
.virtnet_poll+0x25c/0x380 [virtio_net]
[15003.423864] [c0023ff77c70] [c07c18d0] .net_rx_action+0x330/0x4a0
[15003.424024] [c0023ff77d90] [c0960d50] .__do_softirq+0x150/0x3a8
[15003.424197] [c0023ff77e90] [c00ff608] .irq_exit+0x198/0x1b0
[15003.424342] [c0023ff77f10] [c0015504] .__do_irq+0x94/0x1f0
[15003.424485] [c0023ff77f90] [c0026d5c] .call_do_irq+0x14/0x24
[15003.424627] [c0023bc63820] [c00156ec] .do_IRQ+0x8c/0x100
[15003.424776] [c0023bc638c0] [c0008b34]
hardware_interrupt_common+0x114/0x120
[15003.424963] --- interrupt: 501 at .snooze_loop+0xa4/0x1c0
LR = .snooze_loop+0x60/0x1c0
[15003.425164] [c0023bc63bb0] [c0023bc63c50]
0xc0023bc63c50 (unreliable)
[15003.425346] [c0023bc63c30] [c075104c]
.cpuidle_enter_state+0xac/0x390
[15003.425534] [c0023bc63ce0] [c0157adc] .call_cpuidle+0x3c/0x70
[15003.425669] [c0023bc63d50] [c0157e90] .do_idle+0x2a0/0x300
[15003.425815] [c0023bc63e20] [c01580ac]
.cpu_startup_entry+0x2c/0x40
[15003.425995] [c0023bc63ea0] [c0045790]
.start_secondary+0x4d0/0x520
[15003.426170] [c0023bc63f90] [c000aa70]
start_secondary_prolog+0x10/0x14
-8<---

Any response will be appreciated!

-- 
Regards,
Li Wang
Email: wangli.a...@gmail.com

Re: WARNING: kmalloc bug in tun_device_event

2018-02-06 Thread Jason Wang




On 2018-02-07 06:58, syzbot wrote:

Hello,

syzbot hit the following crash on net-next commit
617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
Merge tag 'usercopy-v4.16-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux


So far this crash happened 5 times on net-next, upstream.
C reproducer is attached.
syzkaller reproducer is attached.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.

IMPORTANT: if you fix the bug, please add the following tag to the 
commit:

Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for 
details.

If you forward the report, please keep this part and the footer.

WARNING: CPU: 1 PID: 4134 at mm/slab_common.c:1012 
kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012

Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4134 Comm: syzkaller993072 Not tainted 4.15.0+ #221
Hardware name: Google Google Compute Engine/Google Compute Engine, 
BIOS Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x211/0x2d0 lib/bug.c:184
 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
RSP: 0018:8801ba7ceb20 EFLAGS: 00010246
RAX:  RBX:  RCX: 83b88bed
RDX:  RSI:  RDI: 00040008
RBP: 8801ba7ceb20 R08: 1100374f9cd7 R09: 
R10:  R11:  R12: 00040008
R13: dc00 R14: 014080c0 R15: 8801b5d52080
 __do_kmalloc mm/slab.c:3700 [inline]
 __kmalloc+0x25/0x760 mm/slab.c:3714
 kmalloc_array include/linux/slab.h:631 [inline]
 kcalloc include/linux/slab.h:642 [inline]
 __ptr_ring_init_queue_alloc include/linux/ptr_ring.h:469 [inline]
 ptr_ring_resize_multiple include/linux/ptr_ring.h:629 [inline]
 tun_queue_resize drivers/net/tun.c:3319 [inline]
 tun_device_event+0x471/0xec0 drivers/net/tun.c:3338
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
 call_netdevice_notifiers net/core/dev.c:1725 [inline]
 dev_change_tx_queue_len+0x117/0x220 net/core/dev.c:7065
 do_setlink+0xba7/0x3bb0 net/core/rtnetlink.c:2341
 rtnl_newlink+0xf1c/0x1a20 net/core/rtnetlink.c:2915
 rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4587
 netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2442
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4605
 netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
 netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
 sock_sendmsg_nosec net/socket.c:630 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:640
 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
 __sys_sendmsg+0xe5/0x210 net/socket.c:2080
 SYSC_sendmsg net/socket.c:2091 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2087
 entry_SYSCALL_64_fastpath+0x29/0xa0
RIP: 0033:0x4463c9
RSP: 002b:7ffe63916e68 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 004a7af2 RCX: 004463c9
RDX:  RSI: 20504000 RDI: 0004
RBP: 7ffe63916f08 R08:  R09: 004a7af2
R10:  R11: 0246 R12: 7ffe63916f08
R13: 00403890 R14:  R15: 
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is 
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug 
report.
Note: all commands must start from beginning of the line in the email 
body.


Looks like we need to cap the maximum size that ptr_ring can allocate.

Will post a patch soon.

Thanks

RE: [Patch v13 0/4] This patchset is to remove PPCisms for QEIC

2018-02-06 Thread Qiang Zhao

Hi all,

Are there any comments on this patchset?

Best Regards
Qiang Zhao

-Original Message-
From: Zhao Qiang [mailto:qiang.z...@nxp.com] 
Sent: 2017-11-10 11:31
To: t...@linutronix.de; marc.zyng...@arm.com; ja...@lakedaemon.net
Cc: linux-kernel@vger.kernel.org; Qiang Zhao 
Subject: [Patch v13 0/4] This patchset is to remove PPCisms for QEIC

QEIC is an interrupt controller for QE. It was placed under drivers/soc/fsl/qe
and now moves to drivers/irqchip.
QEIC is also used on boards other than powerpc, so remove the PPCisms.

changelog:
Changes for v8:
- use IRQCHIP_DECLARE() instead of subsys_initcall in qeic driver
- remove include/soc/fsl/qe/qe_ic.h
Changes for v9:
- rebase 
- fix the compile issue when applying the second patch; in fact, there was
  no compile issue when applying all the patches of this patchset
Changes for v10:
- simplify code, remove duplicated code
Changes for v11:
- rebase
Changes for v13:
- rewrite single-bit constants to BIT(x) to make the code more readable

Zhao Qiang (4):
  irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
Changes for v2:
- modify the subject and commit msg
Changes for v3:
- merge .h file to .c, rename it with irq-qeic.c
Changes for v4:
- modify comments
Changes for v5:
- disable rename detection
Changes for v6:
- rebase
Changes for v7:
- na

  irqchip/qeic: merge qeic init code from platforms to a common function
Changes for v2:
- modify subject and commit msg
- add check for qeic by type
Changes for v3:
- na
Changes for v4:
- na
Changes for v5:
- na
Changes for v6:
- rebase
Changes for v7:
- na
Changes for v8:
- use IRQCHIP_DECLARE() instead of subsys_initcall

  irqchip/qeic: merge qeic_of_init into qe_ic_init
Changes for v2:
- modify subject and commit msg
- return 0 and add put node when return in qe_ic_init
Changes for v3:
- na
Changes for v4:
- na
Changes for v5:
- na
Changes for v6:
- rebase
Changes for v7:
- na
Changes for v12:
- remove unused code

  irqchip/qeic: remove PPCisms for QEIC
Changes for v6:
- newly added
Changes for v7:
- fix warning
Changes for v8:
- remove include/soc/fsl/qe/qe_ic.h

Zhao Qiang (4):
  irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
  irqchip/qeic: merge qeic init code from platforms to a common function
  irqchip/qeic: merge qeic_of_init into qe_ic_init
  irqchip/qeic: remove PPCisms for QEIC

 MAINTAINERS                                        |   6 +
 arch/powerpc/platforms/83xx/km83xx.c               |   1 -
 arch/powerpc/platforms/83xx/misc.c                 |  16 -
 arch/powerpc/platforms/83xx/mpc832x_mds.c          |   1 -
 arch/powerpc/platforms/83xx/mpc832x_rdb.c          |   1 -
 arch/powerpc/platforms/83xx/mpc836x_mds.c          |   1 -
 arch/powerpc/platforms/83xx/mpc836x_rdk.c          |   1 -
 arch/powerpc/platforms/85xx/corenet_generic.c      |  10 -
 arch/powerpc/platforms/85xx/mpc85xx_mds.c          |  15 -
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c          |  17 -
 arch/powerpc/platforms/85xx/twr_p102x.c            |  15 -
 drivers/irqchip/Makefile                           |   1 +
 drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} | 423 +++--
 drivers/soc/fsl/qe/Makefile                        |   2 +-
 drivers/soc/fsl/qe/qe_ic.h                         | 103 -
 include/soc/fsl/qe/qe_ic.h                         | 139 ---
 16 files changed, 231 insertions(+), 521 deletions(-)
 rename drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} (53%)
 delete mode 100644 drivers/soc/fsl/qe/qe_ic.h
 delete mode 100644 include/soc/fsl/qe/qe_ic.h

--
2.14.1

Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-06 Thread Cong Wang

On Tue, Feb 6, 2018 at 6:27 AM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
> Merge tag 'usercopy-v4.16-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
>
> So far this crash happened 5 times on net-next, upstream.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+03218bcdba6aa7644...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> x_tables: ip_tables: osf match: only valid for protocol 6
> x_tables: ip_tables: osf match: only valid for protocol 6
> x_tables: ip_tables: osf match: only valid for protocol 6
> [ cut here ]
> proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370
> fs/proc/generic.c:329
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 4152 Comm: syzkaller851476 Not tainted 4.15.0+ #221
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
> RIP: 0010:proc_register+0x2a4/0x370 fs/proc/generic.c:329
> RSP: 0018:8801cbd6ee20 EFLAGS: 00010286
> RAX: dc08 RBX: 8801d2181038 RCX: 815a57ae
> RDX:  RSI: 1100397add74 RDI: 1100397add49
> RBP: 8801cbd6ee70 R08: 1100397add0b R09: 
> R10: 8801cbd6ecd8 R11:  R12: 8801b2bb1cc0
> R13: dc00 R14: 8801b0d8dbc8 R15: 8801b2bb1d81
>  proc_create_data+0xf8/0x180 fs/proc/generic.c:494
>  clusterip_config_init net/ipv4/netfilter/ipt_CLUSTERIP.c:250 [inline]

I think there is probably a race condition between clusterip_config_entry_put()
and clusterip_config_init(): after we release the spinlock, a new proc entry
with the same IP could be created, which triggers this warning.

I am not sure if it is enough to just move the proc_remove() under the
spinlock...


diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3a84a60f6b39..1ff72b87a066 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -107,12 +107,6 @@ clusterip_config_entry_put(struct net *net, struct clusterip_config *c)

local_bh_disable();
if (refcount_dec_and_lock(&c->entries, &cn->lock)) {
-   list_del_rcu(&c->list);
-   spin_unlock(&cn->lock);
-   local_bh_enable();
-
-   unregister_netdevice_notifier(&c->notifier);
-
/* In case anyone still accesses the file, the open/close
 * functions are also incrementing the refcount on their own,
 * so it's safe to remove the entry even if it's in use. */
@@ -120,6 +114,12 @@ clusterip_config_entry_put(struct net *net, struct clusterip_config *c)
if (cn->procdir)
proc_remove(c->pde);
 #endif
+   list_del_rcu(&c->list);
+   spin_unlock(&cn->lock);
+   local_bh_enable();
+
+   unregister_netdevice_notifier(&c->notifier);
+
+
return;
}
local_bh_enable();
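
To make the suspected window concrete, here is a small single-threaded model that replays the interleaving by hand. Everything in it (struct table, config_init(), put_unlink(), put_remove_proc(), put_locked()) is a hypothetical stand-in, not the ipt_CLUSTERIP code, and the comments only mark where the spinlock would be held. It illustrates the ordering problem; it does not settle whether the reordering above is sufficient in the kernel.

```c
#include <stdbool.h>

#define NKEYS 4 /* stand-in for distinct cluster IPs */

struct table {
	bool in_list[NKEYS];  /* config is on the lookup list */
	bool has_proc[NKEYS]; /* a proc entry with this name exists */
};

/*
 * Returns 0 on success, -1 if the config already exists, and -2 if the
 * proc name is still registered (the case that produces the WARNING).
 */
static int config_init(struct table *t, int key)
{
	/* spinlock held */
	if (t->in_list[key])
		return -1;
	t->in_list[key] = true;
	/* spinlock released */

	if (t->has_proc[key]) {
		t->in_list[key] = false; /* roll back the half-created config */
		return -2;
	}
	t->has_proc[key] = true;
	return 0;
}

/* Old order, step 1: unlink while holding the lock, then drop the lock. */
static void put_unlink(struct table *t, int key)
{
	t->in_list[key] = false;
}

/* Old order, step 2: remove the proc entry after the lock was dropped. */
static void put_remove_proc(struct table *t, int key)
{
	t->has_proc[key] = false;
}

/* Proposed order: remove the proc entry and unlink before unlocking. */
static void put_locked(struct table *t, int key)
{
	/* spinlock held */
	t->has_proc[key] = false;
	t->in_list[key] = false;
	/* spinlock released */
}
```

Calling config_init() between put_unlink() and put_remove_proc() returns -2, which corresponds to the "already registered" warning; after put_locked() the same call succeeds.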


>  clusterip_tg_check+0xf9c/0x16d0 net/ipv4/netfilter/ipt_CLUSTERIP.c:488
>  xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:850
>  check_target net/ipv4/netfilter/ip_tables.c:513 [inline]
>  find_check_entry.isra.8+0x8c8/0xcb0 net/ipv4/netfilter/ip_tables.c:554
>  translate_table+0xed1/0x1610 net/ipv4/netfilter/ip_tables.c:725
>  do_replace net/ipv4/netfilter/ip_tables.c:1141 [inline]
>  do_ipt_set_ctl+0x370/0x5f0 net/ipv4/netfilter/ip_tables.c:1675
>  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>  nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>  ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1259
>  sctp_setsockopt+0x2b6/0x61d0 net/sctp/socket.c:4104
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>  SYSC_setsockopt net/socket.c:1849 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1828
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x446839
> RSP: 002b:7f0309d0fdb8 EFLAGS: 0246 ORIG_RAX: 0036
> RAX: ffda RBX: 006dbc24 RCX:

Re: [PATCH] KVM: X86: Fix SMRAM accessing even if VM is shutdown

2018-02-06 Thread Dmitry Vyukov

On Wed, Feb 7, 2018 at 7:25 AM, Wanpeng Li  wrote:
> From: Wanpeng Li 
>
> Reported by syzkaller:
>
>WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
>CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
>RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
>Call Trace:
> vmx_handle_exit+0xbd/0xe20 [kvm_intel]
> kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
> kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
> do_vfs_ioctl+0xa4/0x6a0
> SyS_ioctl+0x79/0x90
> entry_SYSCALL_64_fastpath+0x25/0x9c
>
> syzkaller creates one thread that issues the KVM_SMI ioctl and then a second
> thread that mmaps and operates on the same vCPU. RSM emulation is never
> executed, because nothing like SeaBIOS provides an SMI handler when
> syzkaller runs directly. This triggers a race condition when the testcase runs
> with multiple threads: sometimes one thread exits with the SHUTDOWN reason
> while another thread mmaps and operates on the same vCPU, and it continues to
> use CS=0x3, IP=0x8000 to access the address of the SMI handler, which results
> in the above EPT misconfig. This patch fixes it by bailing out immediately if
> the vCPU is marked with the EXIT_SHUTDOWN reason.
>
> Reported-by: Dmitry Vyukov 

This was reported by syzbot:
https://groups.google.com/d/msg/syzkaller-bugs/6GrlY0UcDEk/aMShRKq3AwAJ

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4d...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.


> Cc: Dmitry Vyukov 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/kvm/x86.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 786cd00..445e702 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7458,6 +7458,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> goto out;
> }
>
> +   if (unlikely(vcpu->run->exit_reason == KVM_EXIT_SHUTDOWN)) {
> +   r = -EINVAL;
> +   goto out;
> +   }
> +
> if (vcpu->run->kvm_dirty_regs) {
> r = sync_regs(vcpu);
> if (r != 0)
> --
> 2.7.4
>

[PATCH] KVM: X86: Fix SMRAM accessing even if VM is shutdown

2018-02-06 Thread Wanpeng Li

From: Wanpeng Li 

Reported by syzkaller:

   WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 
handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
   RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   Call Trace:
vmx_handle_exit+0xbd/0xe20 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
do_vfs_ioctl+0xa4/0x6a0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x25/0x9c

syzkaller creates one thread that issues the KVM_SMI ioctl and then a second
thread that mmaps and operates on the same vCPU. RSM emulation is never
executed, because nothing like SeaBIOS provides an SMI handler when
syzkaller runs directly. This triggers a race condition when the testcase runs
with multiple threads: sometimes one thread exits with the SHUTDOWN reason
while another thread mmaps and operates on the same vCPU, and it continues to
use CS=0x3, IP=0x8000 to access the address of the SMI handler, which results
in the above EPT misconfig. This patch fixes it by bailing out immediately if
the vCPU is marked with the EXIT_SHUTDOWN reason.

Reported-by: Dmitry Vyukov 
Cc: Dmitry Vyukov 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/x86.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 786cd00..445e702 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7458,6 +7458,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
goto out;
}
 
+   if (unlikely(vcpu->run->exit_reason == KVM_EXIT_SHUTDOWN)) {
+   r = -EINVAL;
+   goto out;
+   }
+
if (vcpu->run->kvm_dirty_regs) {
r = sync_regs(vcpu);
if (r != 0)
-- 
2.7.4
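
The guard added by this patch can be modelled in a few lines of user-space C. This is only a sketch of the control flow: struct vcpu_model, its fields, and the enum values are invented for illustration and are not KVM types.

```c
#include <errno.h>

/* Hypothetical exit reasons; only the SHUTDOWN case matters here. */
enum exit_reason_model { EXIT_NONE = 0, EXIT_IO, EXIT_SHUTDOWN };

struct vcpu_model {
	enum exit_reason_model exit_reason; /* reason recorded by the last run */
	int guest_entries;                  /* how often the guest was entered */
};

/*
 * Refuse to run a vCPU whose last exit was a shutdown. Without the early
 * return, a second thread could re-enter the guest in a stale state.
 */
static int vcpu_run(struct vcpu_model *v)
{
	if (v->exit_reason == EXIT_SHUTDOWN)
		return -EINVAL;

	v->guest_entries++; /* stand-in for actually entering the guest */
	return 0;
}
```

Once exit_reason is EXIT_SHUTDOWN, every later vcpu_run() call returns -EINVAL and guest_entries stops changing.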

Re: [PATCH 2/2] usb: chipidea: imx: Fix ULPI on imx53

2018-02-06 Thread Peter Chen

On Tue, Feb 06, 2018 at 04:50:41PM +0100, Sebastian Reichel wrote:
> Hi Peter,
> 
> On Mon, Jan 29, 2018 at 11:33:15AM +0800, Peter Chen wrote:
> > On Wed, Jan 24, 2018 at 06:14:39PM +0100, Sebastian Reichel wrote:
> > > Traditionally, PORTSC should be set before initializing ULPI phys. But
> > > setting PORTSC before powering on the phy results in a kernel freeze
> > > on imx53 based GE PPD. As a workaround this initializes the phy early
> > > in the imx platform code and disables phy power management from the
> > > core.
> > > 
> > > Signed-off-by: Fabien Lahoudere 
> > > Signed-off-by: Sebastian Reichel 
> > > ---
> > >  drivers/usb/chipidea/ci_hdrc_imx.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > > diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > index de155c80eb70..e431c5aafe35 100644
> > > --- a/drivers/usb/chipidea/ci_hdrc_imx.c
> > > +++ b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > @@ -83,6 +83,7 @@ struct ci_hdrc_imx_data {
> > >   struct clk *clk;
> > >   struct imx_usbmisc_data *usbmisc_data;
> > >   bool supports_runtime_pm;
> > > + bool override_phy_control;
> > >   bool in_lpm;
> > >   /* SoC before i.mx6 (except imx23/imx28) needs three clks */
> > >   bool need_three_clks;
> > > @@ -254,6 +255,7 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   int ret;
> > >   const struct of_device_id *of_id;
> > >   const struct ci_hdrc_imx_platform_flag *imx_platform_flag;
> > > + struct device_node *np = pdev->dev.of_node;
> > >  
> > >   of_id = of_match_device(ci_hdrc_imx_dt_ids, >dev);
> > >   if (!of_id)
> > > @@ -288,6 +290,14 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   }
> > >  
> > >   pdata.usb_phy = data->phy;
> > > +
> > > + if (of_device_is_compatible(np, "fsl,imx53-usb") && pdata.usb_phy &&
> > > + of_usb_get_phy_mode(np) == USBPHY_INTERFACE_MODE_ULPI) {
> > > + pdata.flags |= CI_HDRC_OVERRIDE_PHY_CONTROL;
> > > + data->override_phy_control = true;
> > > + usb_phy_init(pdata.usb_phy);
> > > + }
> > > +
> > >   pdata.flags |= imx_platform_flag->flags;
> > >   if (pdata.flags & CI_HDRC_SUPPORTS_RUNTIME_PM)
> > >   data->supports_runtime_pm = true;
> > > @@ -341,6 +351,8 @@ static int ci_hdrc_imx_remove(struct platform_device 
> > > *pdev)
> > >   pm_runtime_put_noidle(>dev);
> > >   }
> > >   ci_hdrc_remove_device(data->ci_pdev);
> > > + if (data->override_phy_control)
> > > + usb_phy_shutdown(data->phy);
> > >   imx_disable_unprepare_clks(>dev);
> > >  
> > 
> > Sebastian, I have a question: do you have any USB or generic PHY drivers
> > for the ULPI bus, and are any power controls needed for your ULPI peripheral?
> 
> The devicetree for GE PPD is available in the mainline kernel:
> 
> $ grep -A9 "usbphy[23] {" arch/arm/boot/dts/imx53-ppd.dts
>   usbphy2: usbphy2 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = < 4 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
>   clock-frequency = <2400>;
>   clocks = < IMX5_CLK_CKO2>;
>   assigned-clocks = < IMX5_CLK_CKO2_SEL>, < 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = < IMX5_CLK_OSC>;
>   };
> 
>   usbphy3: usbphy3 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = < 19 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
> 
>   clock-frequency = <2400>;
>   clocks = < IMX5_CLK_CKO2>;
>   assigned-clocks = < IMX5_CLK_CKO2_SEL>, < 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = < IMX5_CLK_OSC>;
>   };
> 
> So currently the machine only uses drivers/usb/phy/phy-generic.c. Both
> USB phys are actually SMSC USB3315, which is also detected by the kernel:
> 
> root@csmon :~# cat /sys/bus/ulpi/devices/ci_hdrc.*.ulpi/uevent 
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> 
> So maybe drivers/usb/phy/phy-ulpi.c should be used, but I don't see
> a simple way to do so and using the generic PHY works.
> 

Using phy-generic.c is correct if it lets your design work, thanks.

-- 

Best Regards,
Peter Chen
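
The pattern in the quoted patch, where the platform code initializes the PHY early and records that it did so in order to undo it on remove, can be sketched as follows. The types and functions (struct phy_model, struct ctrl_model, ctrl_probe(), ctrl_remove()) are hypothetical stand-ins, not the chipidea driver; the non-override path, where the core would manage the PHY itself, is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct phy_model {
	int init_count; /* > 0 while the PHY is powered by the platform code */
};

struct ctrl_model {
	struct phy_model *phy;
	bool override_phy_control; /* platform code owns PHY init/shutdown */
};

/* Probe: take over PHY control only for the imx53 + ULPI combination. */
static void ctrl_probe(struct ctrl_model *c, struct phy_model *phy,
		       bool is_imx53, bool is_ulpi)
{
	c->phy = phy;
	c->override_phy_control = false;

	if (is_imx53 && is_ulpi && phy != NULL) {
		c->override_phy_control = true;
		phy->init_count++; /* stand-in for the early PHY init */
	}
}

/* Remove: shut the PHY down only if probe initialized it. */
static void ctrl_remove(struct ctrl_model *c)
{
	if (c->override_phy_control)
		c->phy->init_count--; /* stand-in for the PHY shutdown */
}
```

Keeping the flag in the per-device data lets remove undo exactly what probe did, without re-evaluating the device tree conditions.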

Re: [PATCH 2/2] usb: chipidea: imx: Fix ULPI on imx53

2018-02-06 Thread Peter Chen

On Tue, Feb 06, 2018 at 04:50:41PM +0100, Sebastian Reichel wrote:
> Hi Peter,
> 
> On Mon, Jan 29, 2018 at 11:33:15AM +0800, Peter Chen wrote:
> > On Wed, Jan 24, 2018 at 06:14:39PM +0100, Sebastian Reichel wrote:
> > > Traditionally, PORTSC should be set before initializing ULPI phys. But
> > > setting PORTSC before powering on the phy results in a kernel freeze
> > > on imx53 based GE PPD. As a workaround this initializes the phy early
> > > in the imx platform code and disables phy power management from the
> > > core.
> > > 
> > > Signed-off-by: Fabien Lahoudere 
> > > Signed-off-by: Sebastian Reichel 
> > > ---
> > >  drivers/usb/chipidea/ci_hdrc_imx.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c 
> > > b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > index de155c80eb70..e431c5aafe35 100644
> > > --- a/drivers/usb/chipidea/ci_hdrc_imx.c
> > > +++ b/drivers/usb/chipidea/ci_hdrc_imx.c
> > > @@ -83,6 +83,7 @@ struct ci_hdrc_imx_data {
> > >   struct clk *clk;
> > >   struct imx_usbmisc_data *usbmisc_data;
> > >   bool supports_runtime_pm;
> > > + bool override_phy_control;
> > >   bool in_lpm;
> > >   /* SoC before i.mx6 (except imx23/imx28) needs three clks */
> > >   bool need_three_clks;
> > > @@ -254,6 +255,7 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   int ret;
> > >   const struct of_device_id *of_id;
> > >   const struct ci_hdrc_imx_platform_flag *imx_platform_flag;
> > > + struct device_node *np = pdev->dev.of_node;
> > >  
> > >   of_id = of_match_device(ci_hdrc_imx_dt_ids, >dev);
> > >   if (!of_id)
> > > @@ -288,6 +290,14 @@ static int ci_hdrc_imx_probe(struct platform_device 
> > > *pdev)
> > >   }
> > >  
> > >   pdata.usb_phy = data->phy;
> > > +
> > > + if (of_device_is_compatible(np, "fsl,imx53-usb") && pdata.usb_phy &&
> > > + of_usb_get_phy_mode(np) == USBPHY_INTERFACE_MODE_ULPI) {
> > > + pdata.flags |= CI_HDRC_OVERRIDE_PHY_CONTROL;
> > > + data->override_phy_control = true;
> > > + usb_phy_init(pdata.usb_phy);
> > > + }
> > > +
> > >   pdata.flags |= imx_platform_flag->flags;
> > >   if (pdata.flags & CI_HDRC_SUPPORTS_RUNTIME_PM)
> > >   data->supports_runtime_pm = true;
> > > @@ -341,6 +351,8 @@ static int ci_hdrc_imx_remove(struct platform_device 
> > > *pdev)
> > >   pm_runtime_put_noidle(>dev);
> > >   }
> > >   ci_hdrc_remove_device(data->ci_pdev);
> > > + if (data->override_phy_control)
> > > + usb_phy_shutdown(data->phy);
> > >   imx_disable_unprepare_clks(>dev);
> > >  
> > 
> > Sebastian, I have a question, do you have any USB or generic PHY drivers
> > for ULPI bus, any power controls are needed for your ULPI peripheral?
> 
> The devicetree for GE PPD is available in the mainline kernel:
> 
> $ grep -A9 "usbphy[23] {" arch/arm/boot/dts/imx53-ppd.dts
>   usbphy2: usbphy2 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = < 4 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
>   clock-frequency = <2400>;
>   clocks = < IMX5_CLK_CKO2>;
>   assigned-clocks = < IMX5_CLK_CKO2_SEL>, < 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = < IMX5_CLK_OSC>;
>   };
> 
>   usbphy3: usbphy3 {
>   compatible = "usb-nop-xceiv";
>   reset-gpios = < 19 GPIO_ACTIVE_LOW>;
>   clock-names = "main_clk";
> 
>   clock-frequency = <2400>;
>   clocks = < IMX5_CLK_CKO2>;
>   assigned-clocks = < IMX5_CLK_CKO2_SEL>, < 
> IMX5_CLK_OSC>;
>   assigned-clock-parents = < IMX5_CLK_OSC>;
>   };
> 
> So currently the machine only uses drivers/usb/phy/phy-generic.c. Both
> USB phys are actually SMSC USB3315, which is also detected by the kernel:
> 
> root@csmon :~# cat /sys/bus/ulpi/devices/ci_hdrc.*.ulpi/uevent 
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> DEVTYPE=ulpi_device
> MODALIAS=ulpi:v0424p0006
> 
> So maybe drivers/usb/phy/phy-ulpi.c should be used, but I don't see
> a simple way to do so and using the generic PHY works.
> 

Using phy-generic.c is the right choice if it lets your design
work, thanks.

-- 

Best Regards,
Peter Chen
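
For readers following the uevent output quoted above: the MODALIAS string
"ulpi:v0424p0006" appears to encode a vendor ID and a product ID as two
four-digit hex fields. The following is only a minimal userspace sketch of
how such a string could be split; parse_ulpi_modalias() is a hypothetical
helper written for illustration, not a kernel API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): split a ULPI modalias string of
 * the form "ulpi:vVVVVpPPPP" into its vendor and product IDs.
 * Returns 0 on success, -1 if the string does not match that form.
 */
static int parse_ulpi_modalias(const char *alias, unsigned int *vendor,
			       unsigned int *product)
{
	assert(alias && vendor && product);

	/* Each %4x consumes at most four hex digits. */
	if (sscanf(alias, "ulpi:v%4xp%4x", vendor, product) != 2)
		return -1;
	return 0;
}
```

Called on "ulpi:v0424p0006", this yields vendor 0x0424 and product 0x0006,
the pair the quoted message associates with the SMSC USB3315.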

Re: [PATCH v2 06/16] arm64: dts: mt7622: add cpufreq related device nodes

2018-02-06 Thread Viresh Kumar

On 07-02-18, 14:16, Sean Wang wrote:
> On Wed, 2018-02-07 at 09:03 +0530, Viresh Kumar wrote:
> > On 06-02-18, 17:52, sean.w...@mediatek.com wrote:
> > >   cpus {
> > >   #address-cells = <2>;
> > >   #size-cells = <0>;
> > > @@ -26,6 +70,10 @@
> > >   device_type = "cpu";
> > >   compatible = "arm,cortex-a53", "arm,armv8";
> > >   reg = <0x0 0x0>;
> > > + clocks = < CLK_INFRA_MUX1_SEL>,
> > > +  < CLK_APMIXED_MAIN_CORE_EN>;
> > > + clock-names = "cpu", "intermediate";
> > > + operating-points-v2 = <_opp_table>;
> > >   enable-method = "psci";
> > >   clock-frequency = <13>;
> > >   };
> > > @@ -34,6 +82,7 @@
> > >   device_type = "cpu";
> > >   compatible = "arm,cortex-a53", "arm,armv8";
> > >   reg = <0x0 0x1>;
> > > + operating-points-v2 = <_opp_table>;
> > >   enable-method = "psci";
> > >   clock-frequency = <13>;
> > >   };
> > 
> > > Sorry for not picking this earlier, but you should probably add the same clock
> > > related properties for both cpu nodes here. Things will break if CPU1 is used by
> > > the cpufreq core to bring the cpufreq policy online.
> > > 
> > > This can happen if cpufreq driver is a module, CPU0 is hotplugged out and then
> > > the cpufreq driver is inserted.
> > > 
> > 
> > mt7622 cpu0 does not support hotplug. Do I still need to add the same clock
> > related properties for both cpu nodes here?

Normally we should always add these properties to all the CPUs, as that
reflects the real hardware configuration.

But I am not sure if something else will break if you don't provide clocks in
CPU1.

@Rob @Mark: What do you suggest?

-- 
viresh

arch/x86/tools/insn_decoder_test: warning: ffffffff810005de: 0f ff e8 ud0 %eax,%ebp

2018-02-06 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   ab2d92ad881da11331280aedf612d82e61cb6d41
commit: 10c91577d5e631773a6394e14cf60125389b71ae x86/tools: Standardize output format of insn_decode_test
date:   8 weeks ago
config: x86_64-randconfig-s3-02070914 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        git checkout 10c91577d5e631773a6394e14cf60125389b71ae
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
>> arch/x86/tools/insn_decoder_test: warning: 810005de: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810010d7: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001152: 0f ff bf 09 00 00 00 ud0 0x9(%rdi),%edi
   arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001275: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810019b2: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001afc: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81001c23: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002502: 0f ff eb ud0 %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 8100267e: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 810028a0: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002a94: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002b17: 0f ff e8 ud0 %eax,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002e20: 0f ff 83 cd 01 e8 17 ud0 0x17e801cd(%rbx),%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 81002eea: 0f ff e9 ud0 %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please
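
In every warning in this report, objdump counts 3 or 7 bytes for an
instruction starting with 0f ff, while insn_get_length() reports 2. That
pattern would be consistent with the decoder stopping after the two opcode
bytes instead of also consuming a ModRM byte and its displacement, though
the report itself does not state the cause. The sketch below is only an
illustration of how the ModRM byte determines the length in these cases;
modrm_insn_length() is a hypothetical helper, not the kernel decoder, and
it assumes 64-bit mode with no prefixes and no immediate operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's insn_get_length(): length in bytes
 * of an instruction made of the 0x0f escape, one opcode byte and a ModRM
 * byte. insn points at the 0x0f byte, avail is the number of readable
 * bytes. Returns 0 if the buffer is too short or does not start with 0x0f.
 */
static size_t modrm_insn_length(const uint8_t *insn, size_t avail)
{
	assert(insn != NULL);

	if (avail < 3 || insn[0] != 0x0f)
		return 0;

	uint8_t mod = insn[2] >> 6;	/* addressing mode */
	uint8_t rm = insn[2] & 7;	/* register/memory field */
	size_t len = 3;			/* 0x0f, opcode, ModRM */

	if (mod != 3) {			/* mod == 3 is register-direct */
		if (rm == 4) {		/* a SIB byte follows */
			if (avail < 4)
				return 0;
			len += 1;
			if (mod == 0 && (insn[3] & 7) == 5)
				len += 4;	/* disp32 with no base */
		} else if (mod == 0 && rm == 5) {
			len += 4;	/* RIP-relative disp32 */
		}
		if (mod == 1)
			len += 1;	/* disp8 */
		else if (mod == 2)
			len += 4;	/* disp32 */
	}

	return len <= avail ? len : 0;
}
```

Applied to the byte sequences in the report, 0f ff e8 has mod == 3 and comes
out at 3 bytes, while 0f ff bf 09 00 00 00 and 0f ff 83 cd 01 e8 17 both have
mod == 2 and come out at 7 bytes, matching the objdump column.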

Re: [PATCH v2 06/16] arm64: dts: mt7622: add cpufreq related device nodes

2018-02-06 Thread Sean Wang

On Wed, 2018-02-07 at 09:03 +0530, Viresh Kumar wrote:
> On 06-02-18, 17:52, sean.w...@mediatek.com wrote:
> > cpus {
> > #address-cells = <2>;
> > #size-cells = <0>;
> > @@ -26,6 +70,10 @@
> > device_type = "cpu";
> > compatible = "arm,cortex-a53", "arm,armv8";
> > reg = <0x0 0x0>;
> > +   clocks = < CLK_INFRA_MUX1_SEL>,
> > +< CLK_APMIXED_MAIN_CORE_EN>;
> > +   clock-names = "cpu", "intermediate";
> > +   operating-points-v2 = <_opp_table>;
> > enable-method = "psci";
> > clock-frequency = <13>;
> > };
> > @@ -34,6 +82,7 @@
> > device_type = "cpu";
> > compatible = "arm,cortex-a53", "arm,armv8";
> > reg = <0x0 0x1>;
> > +   operating-points-v2 = <_opp_table>;
> > enable-method = "psci";
> > clock-frequency = <13>;
> > };
> 
> Sorry for not picking this earlier, but you should probably add the same clock
> related properties for both cpu nodes here. Things will break if CPU1 is used by
> the cpufreq core to bring the cpufreq policy online.
> 
> This can happen if cpufreq driver is a module, CPU0 is hotplugged out and then
> the cpufreq driver is inserted.
> 

mt7622 cpu0 does not support hotplug. Do I still need to add the same clock
related properties for both cpu nodes here?

Re: [PATCH] selftests/android: Fix line continuation in Makefile

2018-02-06 Thread Pintu Kumar

On Wed, Feb 7, 2018 at 5:22 AM, Daniel Díaz  wrote:
> The Makefile lacks a couple of line continuation backslashes
> in an `if' clause, which can make the subsequent rsync
> command go awry over the whole filesystem (`rsync -a / /`).
>
>   /bin/sh: -c: line 5: syntax error: unexpected end of file
>   make[1]: [all] Error 1 (ignored)
>   TEST=$DIR"_test.sh"; \
>   if [ -e $DIR/$TEST ]; then
>   /bin/sh: -c: line 2: syntax error: unexpected end of file
>   make[1]: [all] Error 1 (ignored)
>   rsync -a $DIR/$TEST $BUILD_TARGET/;
>   [...a myriad of:]
>   [  rsync: readlink_stat("...") failed: Permission denied (13)]
>   [  skipping non-regular file "..."]
>   [  rsync: opendir "..." failed: Permission denied (13)]
>   [and many other errors...]
>   fi
>   make[1]: fi: Command not found
>   make[1]: [all] Error 127 (ignored)
>   done
>   make[1]: done: Command not found
>   make[1]: [all] Error 127 (ignored)
>
> Signed-off-by: Daniel Díaz 
> ---
>  tools/testing/selftests/android/Makefile | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/android/Makefile b/tools/testing/selftests/android/Makefile
> index 1a74922..f6304d2 100644
> --- a/tools/testing/selftests/android/Makefile
> +++ b/tools/testing/selftests/android/Makefile
> @@ -11,11 +11,11 @@ all:
> BUILD_TARGET=$(OUTPUT)/$$DIR;   \
> mkdir $$BUILD_TARGET  -p;   \
> make OUTPUT=$$BUILD_TARGET -C $$DIR $@;\
> -   #SUBDIR test prog name should be in the form: SUBDIR_test.sh
> +   #SUBDIR test prog name should be in the form: SUBDIR_test.sh \
> TEST=$$DIR"_test.sh"; \
> -   if [ -e $$DIR/$$TEST ]; then
> -   rsync -a $$DIR/$$TEST $$BUILD_TARGET/;
> -   fi
> +   if [ -e $$DIR/$$TEST ]; then \
> +   rsync -a $$DIR/$$TEST $$BUILD_TARGET/; \
> +   fi \
> done

Thanks for your patch.
However, I copied this Makefile from
tools/testing/selftests/futex/Makefile before modifying it.
If there is a problem with the backslashes, then the same problem must be
present in the futex Makefile as well.
Can you compare these two Makefiles and see if there is any problem?

Also, could this be caused by the make version?
Can you check your make version?

Thank You!
Pintu

>
>  override define RUN_TESTS
> --
> 2.7.4
>

Re: [PATCH] ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204

2018-02-06 Thread Takashi Iwai

On Sat, 03 Feb 2018 15:42:40 +0100,
Lassi Ylikojola wrote:
> 
> Add quirk to ensure a sync endpoint is properly configured.
> This patch fixes the same symptoms on the Behringer UFX1204 as the patch
> from Alberto Aguirre on Dec 8 2016 did for the Axe-Fx II.
> 
> Signed-off-by: Lassi Ylikojola 

The patch doesn't seem to apply cleanly to the latest tree.
Could you check it and repost a proper patch against the latest
Linus tree?

thanks,

Takashi

Re: [PATCH] ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute

2018-02-06 Thread Takashi Iwai

On Mon, 29 Jan 2018 06:37:55 +0100,
Kirill Marinushkin wrote:
> 
> The layout of the UAC2 Control request and response varies depending on
> the request type. With the current implementation, only the Layout 2
> Parameter Block (with the 2-byte sized RANGE attribute) is handled
> properly. For the Control requests with the 1-byte sized RANGE attribute
> (Bass Control, Mid Control, Treble Control), the response is parsed
> incorrectly.
> 
> This commit:
> * fixes the wLength field value in the request
> * fixes parsing the range values from the response
> 
> Fixes: 23caaf19b11e ("ALSA: usb-mixer: Add support for Audio Class v2.0")
> Signed-off-by: Kirill Marinushkin 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: Jaejoong Kim 
> Cc: Bhumika Goyal 
> Cc: Stephen Barber 
> Cc: Julian Scheel 
> Cc: alsa-de...@alsa-project.org
> Cc: linux-kernel@vger.kernel.org

Sorry for the late reply; I've been (and still am) off.

Does this bug actually hit on any real devices, or is it only a
logical error so far?  In the former case, a Cc to stable is
mandatory.

In any case, I'll review and merge it properly once I am back at
work.


thanks,

Takashi
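
To illustrate the layout issue described in the quoted commit message: in a
UAC2 RANGE parameter block, a 2-byte wNumSubRanges field is followed by
MIN, MAX and RES fields whose width depends on the control (1 byte for
Layout 1, 2 bytes for Layout 2). The sketch below is a standalone
illustration under that reading, not the code from the patch; struct
range_triplet, range_request_len() and parse_first_subrange() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the first subrange of a RANGE response. */
struct range_triplet {
	int32_t min, max, res;
};

/* Bytes to request: wNumSubRanges (2) plus MIN, MAX and RES of one subrange. */
static size_t range_request_len(size_t val_size)
{
	return 2 + 3 * val_size;
}

/* Read a little-endian signed value of 1, 2 or 4 bytes. */
static int32_t read_le_signed(const uint8_t *p, size_t size)
{
	uint32_t v = 0;

	for (size_t i = 0; i < size; i++)
		v |= (uint32_t)p[i] << (8 * i);
	/* Sign-extend narrow values; assumes two's complement conversion. */
	if (size < 4 && (v & (1u << (8 * size - 1))))
		v |= ~0u << (8 * size);
	return (int32_t)v;
}

/* Returns 0 on success, -1 on an unsupported width or a short buffer. */
static int parse_first_subrange(const uint8_t *buf, size_t len,
				size_t val_size, struct range_triplet *out)
{
	assert(buf && out);

	if (val_size != 1 && val_size != 2 && val_size != 4)
		return -1;
	if (len < range_request_len(val_size))
		return -1;

	const uint8_t *p = buf + 2;	/* skip wNumSubRanges */

	out->min = read_le_signed(p, val_size);
	out->max = read_le_signed(p + val_size, val_size);
	out->res = read_le_signed(p + 2 * val_size, val_size);
	return 0;
}
```

With val_size fixed at 2, a 1-byte response such as 01 00 f6 0a 01 would be
too short and its fields would be read at the wrong offsets; passing the
width explicitly gives a 5-byte request and MIN/MAX/RES of -10/10/1.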

Re: [PATCH v3 0/2] phy: rockchip-emmc: fixes emmc-phy power on failed with rk3399 SoCs

2018-02-06 Thread Kishon Vijay Abraham I



On Wednesday 07 February 2018 06:47 AM, Caesar Wang wrote:
> Kishon,
> 
> Can you help merge this in your  or next tree?  I'm hoping that we can land
> this somewhere.:-)

sure, I'll merge once -rc1 is tagged.

Thanks
Kishon

> 
> 
> Thanks,
> -Caesar
> On 2018-01-11 10:40, Caesar Wang wrote:
>> Hi Kishon,
>>
>> Since Shawn isn't available, I am taking over this patch series for now.
>>
>> The original bug is tracked at https://issuetracker.google.com/71561742.
>> In some cases, the mmc phy fails to power on during boot.
>> The log is as below:
>> ...
>> [    2.375333] rockchip_emmc_phy_power: caldone timeout.
>> [    2.377815] phy phy-ff77.syscon:phy@f780.4: phy poweron failed --> -110
>> ...
>> [    2.489295] mmc0: mmc_select_hs400es failed, error -110
>> [    2.489302] mmc0: error -110 whilst initialising MMC card
>> ..
>>
>> In practice, waiting 5us for calpad busy trimming is not enough.
>> We need to allow enough margin for it.
>>
>> Verified on url =
>> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-4.4
>> This patch series applies and boots with kernel-next on an rk3399
>> chromebook.
>>
>> -Caesar
>>
>>
>> Changes in v3:
>> - As Doug commented on both upstream and gerrit,
>>   change "5, 50" to "0, 50", and update the printed message.
>> - As Doug commented on https://patchwork.kernel.org/patch/10154797,
>>   change "1, 50" to "0, 50".
>>
>> Changes in v2:
>> - Print the return value when regmap_read_poll_timeout fails.
>> - As Brian commented on https://patchwork.kernel.org/patch/10139891/,
>>   changed the note and added printing of the error value with the
>>   regmap_read_poll_timeout API.
>>
>> Shawn Lin (2):
>>   phy: rockchip-emmc: retry calpad busy trimming
>>   phy: rockchip-emmc: use regmap_read_poll_timeout to poll dllrdy
>>
>>  drivers/phy/rockchip/phy-rockchip-emmc.c | 60 +++-
>>  1 file changed, 28 insertions(+), 32 deletions(-)
>>   1 file changed, 28 insertions(+), 32 deletions(-)
>>
> 
>
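
The -110 in the quoted log is -ETIMEDOUT on Linux, which is what a
poll-with-timeout loop returns when the awaited status bit never shows up
within the allowed time. The sketch below models that behaviour in plain
userspace C with a simulated clock. It is not the driver code and does not
call regmap_read_poll_timeout(); struct fake_phy, FAKE_CALDONE and
wait_caldone() are hypothetical, and reading the cover letter's "0, 50" as
a sleep interval and a timeout in microseconds is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_CALDONE 0x1u	/* hypothetical "calibration done" status bit */

/* Hypothetical device model: calibration completes at caldone_at_us. */
struct fake_phy {
	unsigned int now_us;		/* simulated clock */
	unsigned int caldone_at_us;
};

/* Each simulated register read costs 1us of simulated time. */
static uint32_t fake_read_status(struct fake_phy *phy)
{
	phy->now_us += 1;
	return phy->now_us >= phy->caldone_at_us ? FAKE_CALDONE : 0;
}

/*
 * Poll until FAKE_CALDONE is set, pausing sleep_us between reads.
 * Returns 0 on success or -ETIMEDOUT once timeout_us has elapsed.
 */
static int wait_caldone(struct fake_phy *phy, unsigned int sleep_us,
			unsigned int timeout_us)
{
	assert(phy != NULL);

	unsigned int deadline = phy->now_us + timeout_us;

	for (;;) {
		if (fake_read_status(phy) & FAKE_CALDONE)
			return 0;
		if (phy->now_us > deadline)
			return -ETIMEDOUT;
		phy->now_us += sleep_us;	/* simulated delay, no real sleep */
	}
}
```

In this model, a device that needs 20us fails with -ETIMEDOUT under a 5us
budget and succeeds under a 50us budget, which is the kind of margin the
cover letter argues for.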

Re: [PATCH v3] staging: android: ion: Add implementation of dma_buf_vmap and dma_buf_vunmap

2018-02-06 Thread Alexey Skidanov



On 02/07/2018 01:56 AM, Laura Abbott wrote:
> On 01/31/2018 10:10 PM, Alexey Skidanov wrote:
>>
>> On 01/31/2018 03:00 PM, Greg KH wrote:
>>> On Wed, Jan 31, 2018 at 02:03:42PM +0200, Alexey Skidanov wrote:
>>>> Any driver may access shared buffers, created by ion, using the
>>>> dma_buf_vmap and dma_buf_vunmap dma-buf APIs, which map/unmap
>>>> previously allocated buffers into the kernel virtual address space.
>>>> The implementation of these APIs is missing in the current ion
>>>> implementation.
>>>>
>>>> Signed-off-by: Alexey Skidanov
>>>> ---
>>>
>>> No review from any other Intel developers? :(
>> Will add.
>>>
>>> Anyway, what in-tree driver needs access to these functions?
>> I'm not sure that there are in-tree drivers using these functions
>> with ion as the buffer exporter, because they are not implemented in ion :)
>> But there are some in-tree drivers using these APIs (gpu drivers) with
>> other buffer exporters.
> 
> It's still not clear why you need to implement these APIs.
How else would the importing kernel module access the content of the buffer? :)
With the current ion implementation it's only possible via dma_buf_kmap,
mapping one page at a time. For fairly large buffers, this might have some
performance impact.
(Page-by-page mapping is probably the only way to access large
buffers on 32-bit systems, where the vmalloc range is very small. By the
way, the current ion dma_map_kmap doesn't really map only 1 page at a
time - it uses the result of vmap(), which might fail on 32-bit systems.)

> Are you planning to use Ion with GPU drivers? I'm especially
> interested in this if you have a non-Android use case.
Yes, my use case is a non-Android one, but not with GPU drivers.
> 
> Thanks,
> Laura

Thanks,
Alexey

Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Paul E. McKenney

On Tue, Feb 06, 2018 at 08:23:34PM -0800, Matthew Wilcox wrote:
> On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> > So it is OK to kvmalloc() something and pass it to either kfree() or
> > kvfree(), and it had better be OK to kvmalloc() something and pass it
> > to kvfree().
> > 
> > Is it OK to kmalloc() something and pass it to kvfree()?
> 
> Yes, it absolutely is.
> 
> void kvfree(const void *addr)
> {
> 	if (is_vmalloc_addr(addr))
> 		vfree(addr);
> 	else
> 		kfree(addr);
> }
> 
> > If so, is it really useful to have two different names here, that is,
> > both kfree_rcu() and kvfree_rcu()?
> 
> I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
> vfree_rcu() available in the API for the symmetry of calling kmalloc()
> / kfree_rcu().
> 
> Personally, I would like us to rename kvfree() to just free(), and have
> malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
> fight yet.

But why not just have the existing kfree_rcu() API cover both kmalloc()
and kvmalloc()?  Perhaps I am not in the right forums, but I am not hearing
anyone arguing that the RCU API has too few members.  ;-)

Thanx, Paul
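
For readers following the kvfree() question above, here is a minimal userspace sketch of the dispatch idea. It is not kernel code, and every name with a demo_ prefix is hypothetical: a static arena stands in for the vmalloc address range, malloc()/free() stand in for kmalloc()/kfree(), and demo_is_vmalloc_addr() is a plain range check playing the role of is_vmalloc_addr().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vmalloc address range: one static arena. */
#define DEMO_ARENA_SIZE 4096
static unsigned char demo_arena[DEMO_ARENA_SIZE];
static size_t demo_arena_used;

/* Counters so a caller can observe which path each free took. */
static int demo_vfree_calls;
static int demo_kfree_calls;

/* Bump allocator playing the role of vmalloc(); returns NULL when full. */
static void *demo_vmalloc(size_t size)
{
	void *p;

	if (size > DEMO_ARENA_SIZE - demo_arena_used)
		return NULL;
	p = &demo_arena[demo_arena_used];
	demo_arena_used += size;
	return p;
}

/* Range check playing the role of is_vmalloc_addr(). */
static int demo_is_vmalloc_addr(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)demo_arena;

	return a >= start && a < start + DEMO_ARENA_SIZE;
}

/* "vfree" side: the bump arena never reclaims, so only count the call. */
static void demo_vfree(const void *addr)
{
	assert(demo_is_vmalloc_addr(addr));
	demo_vfree_calls++;
}

/* "kfree" side: hand the pointer back to the C allocator. */
static void demo_kfree(const void *addr)
{
	free((void *)addr);
	demo_kfree_calls++;
}

/* Same shape as the kvfree() quoted above: dispatch on the address. */
static void demo_kvfree(const void *addr)
{
	if (demo_is_vmalloc_addr(addr))
		demo_vfree(addr);
	else
		demo_kfree(addr);
}
```

Passing a malloc() pointer to demo_kvfree() takes the demo_kfree() path, and a demo_vmalloc() pointer takes the demo_vfree() path, which mirrors why handing kmalloc() memory to kvfree() is fine in the quoted code.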

Re: [PATCH v3 14/21] fpga: dfl: add fpga manager platform driver for FME

2018-02-06 Thread Wu Hao

On Tue, Feb 06, 2018 at 12:53:44PM -0600, Alan Tull wrote:
> On Tue, Feb 6, 2018 at 12:47 AM, Wu Hao  wrote:
> > On Mon, Feb 05, 2018 at 10:25:54PM -0600, Alan Tull wrote:
> >> On Mon, Feb 5, 2018 at 7:47 PM, Wu Hao  wrote:
> >> > On Mon, Feb 05, 2018 at 10:36:45AM -0800, Luebbers, Enno wrote:
> >> >> Hi Hao,
> >> >>
> >> >> On Sun, Feb 04, 2018 at 05:37:06PM +0800, Wu Hao wrote:
> >> >> > On Fri, Feb 02, 2018 at 04:26:26PM -0800, Luebbers, Enno wrote:
> >> >> > > Hi Hao, Alan,
> >> >> > >
> >> >> > > On Fri, Feb 02, 2018 at 05:42:13PM +0800, Wu Hao wrote:
> >> >> > > > On Thu, Feb 01, 2018 at 04:00:36PM -0600, Alan Tull wrote:
> >> >> > > > > On Mon, Nov 27, 2017 at 12:42 AM, Wu Hao wrote:
> >> >> > > > >
> >> >> > > > > Hi Hao,
> >> >> > > > >
> >> >> > > > > A few comments below.   Besides that, looks good.
> >> >> > > > >
> >> >> > > > > > This patch adds fpga manager driver for FPGA Management Engine (FME). It
> >> >> > > > > > implements fpga_manager_ops for FPGA Partial Reconfiguration function.
> >> >> > > > > >
> >> >> > > > > > Signed-off-by: Tim Whisonant 
> >> >> > > > > > Signed-off-by: Enno Luebbers 
> >> >> > > > > > Signed-off-by: Shiva Rao 
> >> >> > > > > > Signed-off-by: Christopher Rauer 
> >> >> > > > > > Signed-off-by: Kang Luwei 
> >> >> > > > > > Signed-off-by: Xiao Guangrong 
> >> >> > > > > > Signed-off-by: Wu Hao 
> >> >> > > > > > 
> >> >> > > > > > v3: rename driver to dfl-fpga-fme-mgr
> >> >> > > > > > implemented status callback for fpga manager
> >> >> > > > > > rebased due to fpga api changes
> >> >> > > > > > ---
> >> >> > > > > >  .../ABI/testing/sysfs-platform-fpga-dfl-fme-mgr|   8 +
> >> >> > > > > >  drivers/fpga/Kconfig   |   6 +
> >> >> > > > > >  drivers/fpga/Makefile  |   1 +
> >> >> > > > > >  drivers/fpga/fpga-dfl-fme-mgr.c| 318 
> >> >> > > > > > +
> >> >> > > > > >  drivers/fpga/fpga-dfl.h|  39 ++-
> >> >> > > > > >  5 files changed, 371 insertions(+), 1 deletion(-)
> >> >> > > > > >  create mode 100644 Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > >  create mode 100644 drivers/fpga/fpga-dfl-fme-mgr.c
> >> >> > > > > >
> >> >> > > > > > diff --git a/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > > new file mode 100644
> >> >> > > > > > index 000..2d4f917
> >> >> > > > > > --- /dev/null
> >> >> > > > > > +++ b/Documentation/ABI/testing/sysfs-platform-fpga-dfl-fme-mgr
> >> >> > > > > > @@ -0,0 +1,8 @@
> >> >> > > > > > +What:  /sys/bus/platform/devices/fpga-dfl-fme-mgr.0/interface_id
> >> >> > > > > > +Date:  November 2017
> >> >> > > > > > +KernelVersion:  4.15
> >> >> > > > > > +Contact:   Wu Hao 
> >> >> > > > > > +Description:   Read-only. It returns interface id of partial reconfiguration
> >> >> > > > > > +   hardware. Userspace could use this information to check if
> >> >> > > > > > +   current hardware is compatible with given image before FPGA
> >> >> > > > > > +   programming.
> >> >> > > > >
> >> >> > > > > I'm a little confused by this.  I can understand that the PR bitstream
> >> >> > > > > has a dependency on the FPGA's static image, but I don't understand
> >> >> > > > > the dependency of the bitstream on the hardware that is used to program
> >> >> > > > > the bitstream to the FPGA.
> >> >> > > >
> >> >> > > > Sorry for the confusion, the interface_id is used to indicate the version of
> >> >> > > > the hardware for partial reconfiguration (it's part of the static image of
> >> >> > > > the FPGA device). Will improve the description on this.
> >> >> > > >
> >> >> > >
> >> >> > > The interface_id expresses the compatibility of the static region with PR
> >> >> > > bitstreams generated for it. It changes every time a new static region is
> >> >> > > generated.
> >> >> > >
> >> >> > > Would it make more sense to have the interface_id exposed as part of the FME
> >> >> > > device (which represents the static region)? I'm not sure - it kind of also
> >> >> > > makes sense here, where you would have all the information in one place (if the
> >> >> > > interface_id matches, I can use this component to program a bitstream).
> >> >> >
> >> >> > Hi Enno
> >> >> >
> >> >> > Yes, this interface is under fpga-dfl-fme-mgr.0, and fpga-dfl-fme-mgr.0 is
> >> >> > under fpga-dfl-fme.0. It's part of the FME device for sure. From another
> >> >> > point of view, it means if
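
The compatibility check discussed in this thread (read interface_id from sysfs, compare it with the id a PR bitstream was generated for, and only then program) could look roughly like the userspace sketch below. The helper names are hypothetical, and the attribute value is treated as an opaque string because the thread does not specify its format. Only the sysfs path quoted in the patch comes from the source.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style attribute file into buf and strip the
 * trailing newline. Returns 0 on success, -1 if the file cannot be read.
 */
static int demo_read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Compare the attribute value against the id the image was built for.
 * Returns 1 on match, 0 on mismatch, -1 if the attribute is unreadable.
 * The value is compared as an opaque string (an assumption of this sketch).
 */
static int demo_interface_id_matches(const char *path, const char *expected)
{
	char value[128];

	if (demo_read_attr(path, value, sizeof(value)) != 0)
		return -1;
	return strcmp(value, expected) == 0;
}
```

A caller would pass the path from the ABI entry, /sys/bus/platform/devices/fpga-dfl-fme-mgr.0/interface_id, together with the id recorded for the image, and skip programming unless the result is 1.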

[PATCH v3] Documentation/ABI: update infiniband sysfs interfaces

2018-02-06 Thread Aishwarya Pant

Add documentation for core and hardware specific infiniband interfaces.
The descriptions have been collected from git commit logs, reading
through code and data sheets. Some drivers have incomplete doc and are
annotated with the comment '[to be documented]'.

Signed-off-by: Aishwarya Pant 
---
Changes in v3:
-  outbound -> inbound in description of port_rcv_constraint_errors
v2:
- Move infiniband interface from testing to stable
- Fix typos
- Update description of cap_mask, port_xmit_constraint_errors and
  port_rcv_constraint_errors
- Add doc for hw_counters
- Remove old documentation

 Documentation/ABI/stable/sysfs-class-infiniband  | 818 +++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 -
 Documentation/infiniband/sysfs.txt   | 129 +---
 3 files changed, 820 insertions(+), 143 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-class-infiniband
 delete mode 100644 Documentation/ABI/testing/sysfs-class-infiniband

diff --git a/Documentation/ABI/stable/sysfs-class-infiniband b/Documentation/ABI/stable/sysfs-class-infiniband
new file mode 100644
index ..f3acf3713a91
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-class-infiniband
@@ -0,0 +1,818 @@
+sysfs interface common for all infiniband devices
+-
+
+What:  /sys/class/infiniband//node_type
+What:  /sys/class/infiniband//node_guid
+What:  /sys/class/infiniband//sys_image_guid
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   linux-r...@vger.kernel.org
+Description:
+   node_type:  (RO) Node type (CA, RNIC, usNIC, usNIC UDP,
+   switch or router)
+
+   node_guid:  (RO) Node GUID
+
+   sys_image_guid: (RO) System image GUID
+
+
+What:  /sys/class/infiniband//node_desc
+Date:  Feb, 2006
+KernelVersion: v2.6.17
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RW) Update the node description with information such as the
+   node's hostname, so that IB network management software can tie
+   its view to the real world.
+
+
+What:  /sys/class/infiniband//fw_ver
+Date:  Jun, 2016
+KernelVersion: v4.10
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RO) Display firmware version
+
+
+What:  /sys/class/infiniband//ports//lid
+What:  /sys/class/infiniband//ports//rate
+What:  /sys/class/infiniband//ports//lid_mask_count
+What:  /sys/class/infiniband//ports//sm_sl
+What:  /sys/class/infiniband//ports//sm_lid
+What:  /sys/class/infiniband//ports//state
+What:  /sys/class/infiniband//ports//phys_state
+What:  /sys/class/infiniband//ports//cap_mask
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   linux-r...@vger.kernel.org
+Description:
+
+   lid:(RO) Port LID
+
+   rate:   (RO) Port data rate (active width * active
+   speed)
+
+   lid_mask_count: (RO) Port LID mask count
+
+   sm_sl:  (RO) Subnet manager SL for port's subnet
+
+   sm_lid: (RO) Subnet manager LID for port's subnet
+
+   state:  (RO) Port state (DOWN, INIT, ARMED, ACTIVE or
+   ACTIVE_DEFER)
+
+   phys_state: (RO) Port physical state (Sleep, Polling,
+   LinkUp, etc)
+
+   cap_mask:   (RO) Port capability mask. 2 bits here are
+   settable- IsCommunicationManagementSupported
+   (set when CM module is loaded) and IsSM (set via
+   open of issmN file).
+
+
+What:  /sys/class/infiniband//ports//link_layer
+Date:  Oct, 2010
+KernelVersion: v2.6.37
+Contact:   linux-r...@vger.kernel.org
+Description:
+   (RO) Link layer type information (Infiniband or Ethernet type)
+
+
+What:  /sys/class/infiniband//ports//counters/symbol_error
+What:  /sys/class/infiniband//ports//counters/port_rcv_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_remote_physical_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_switch_relay_errors
+What:  /sys/class/infiniband//ports//counters/link_error_recovery
+What:  /sys/class/infiniband//ports//counters/port_xmit_constraint_errors
+What:  /sys/class/infiniband//ports//counters/port_rcv_constraint_errors
+What:  /sys/class/infiniband//ports//counters/local_link_integrity_errors
+What:  /sys/class/infiniband//ports//counters/excessive_buffer_overrun_errors
+What:  /sys/class/infiniband//ports//counters/port_xmit_data
+What:  /sys/class/infiniband//ports//counters/port_rcv_data
+What:

Re: WARNING in kmalloc_slab (3)

2018-02-06 Thread Dmitry Vyukov

On Tue, Dec 12, 2017 at 10:22 PM, Eric Biggers  wrote:
> On Mon, Dec 04, 2017 at 12:26:32PM +0300, Dan Carpenter wrote:
>> On Mon, Dec 04, 2017 at 09:18:05AM +0100, Dmitry Vyukov wrote:
>> > On Mon, Dec 4, 2017 at 9:14 AM, Dan Carpenter wrote:
>> > > On Sun, Dec 03, 2017 at 12:16:08PM -0800, Eric Biggers wrote:
>> > >> Looks like BLKTRACESETUP doesn't limit the '.buf_nr' parameter, allowing anyone
>> > >> who can open a block device to cause an extremely large kmalloc.  Here's a
>> > >> simplified reproducer:
>> > >>
>> > >
>> > > There are lots of places which allow people to allocate as much as they
>> > > want.  With syzkaller, you might want to just hard code a __GFP_NOWARN
>> > > in to disable it.
>> >
>> > Hi,
>> >
>> > Hard code it where?
>>
>> My idea was to just make warn_alloc() a no-op.
>>
>> >
>> > User-controllable allocation are supposed to use __GFP_NOWARN.
>>
>> No that's not right.  What we don't want is unprivileged users to use
>> all the memory and we don't want unprivileged users to spam
>> /var/log/messages.  But you have to have slightly elevated permissions
>> to open block devices right?  The warning is helpful.  Admins should
>> "don't do that" if they don't want the warning.
>
> WARN_ON() should only be used for kernel bugs.  printk can be a different story.
> If it's a "userspace shouldn't do this" kind of thing, then if there is any
> message at all it should be a rate-limited printk that actually explains what
> the problem is, not a random WARN_ON() that can only be interpreted by kernel
> developers.
>
> And yes, the fact that anyone with read access to any block device, even e.g. a
> loop device, can cause the kernel to do an unbounded kmalloc *is* a bug.  It
> needs to have a reasonable limit.  It is not a problem on all systems, but on
> some systems "the admin" might give users read access to some block devices.



#syz fix: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE

Re: WARNING: kmalloc bug in relay_open_buf

2018-02-06 Thread Dmitry Vyukov

On Wed, Feb 7, 2018 at 12:21 AM, Andrew Morton
 wrote:
> On Tue, 06 Feb 2018 14:58:02 -0800 syzbot wrote:
>
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> e237f98a9c134c3d600353f21e07db915516875b (Mon Feb 5 21:35:56 2018 +)
>> Merge tag 'xfs-4.16-merge-5' of
>> git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
>>
>> C reproducer is attached.
>> syzkaller reproducer is attached.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+7525b19f9531f76b8...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> audit: type=1400 audit(1517939984.452:7): avc:  denied  { map } for
>> pid=4159 comm="syzkaller032522" path="/root/syzkaller032522586" dev="sda1"
>> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
>> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
>> WARNING: CPU: 0 PID: 4159 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
>> mm/slab_common.c:1012
>> Kernel panic - not syncing: panic_on_warn set ...
>
>
> David sent a fix today which I believe will address this.

Thanks
Let's tell syzbot about the fix:
#syz fix: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE

> From: David Rientjes 
> Subject: kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
>
> chan->n_subbufs is set by the user and relay_create_buf() does a kmalloc()
> of chan->n_subbufs * sizeof(size_t *).
>
> kmalloc_slab() will generate a warning when this fails if
> chan->n_subbufs * sizeof(size_t *) > KMALLOC_MAX_SIZE.
>
> Limit chan->n_subbufs to the maximum allowed kmalloc() size.
>
> Link: http://lkml.kernel.org/r/alpine.deb.2.10.1802061216100.122...@chino.kir.corp.google.com
> Fixes: f6302f1bcd75 ("relay: prevent integer overflow in relay_open()")
> Signed-off-by: David Rientjes 
> Reviewed-by: Andrew Morton 
> Cc: Jens Axboe 
> Cc: Dave Jiang 
> Cc: Al Viro 
> Cc: Dan Carpenter 
> Signed-off-by: Andrew Morton 
> ---
>
>  kernel/relay.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN kernel/relay.c~kernel-relay-limit-kmalloc-size-to-kmalloc_max_size kernel/relay.c
> --- a/kernel/relay.c~kernel-relay-limit-kmalloc-size-to-kmalloc_max_size
> +++ a/kernel/relay.c
> @@ -163,7 +163,7 @@ static struct rchan_buf *relay_create_bu
>  {
>  	struct rchan_buf *buf;
>
> -	if (chan->n_subbufs > UINT_MAX / sizeof(size_t *))
> +	if (chan->n_subbufs > KMALLOC_MAX_SIZE / sizeof(size_t *))
>  		return NULL;
>
>  	buf = kzalloc(sizeof(struct rchan_buf), GFP_KERNEL);
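
The guard in the quoted patch follows a common pattern: compare the user-supplied count against the cap divided by the element size, so that the size computation can neither overflow nor exceed the allocator limit. Below is a minimal userspace sketch of that pattern. DEMO_ALLOC_MAX and demo_alloc_ptr_array() are hypothetical names, and the 4 MiB cap is only a placeholder for KMALLOC_MAX_SIZE, whose real value depends on the kernel configuration.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical cap standing in for KMALLOC_MAX_SIZE; the real value depends
 * on the kernel configuration and architecture.
 */
#define DEMO_ALLOC_MAX ((size_t)4 * 1024 * 1024)

/*
 * Allocate a zeroed array of n pointers, rejecting counts whose total size
 * would exceed DEMO_ALLOC_MAX. Dividing the cap by the element size keeps
 * the comparison itself free of overflow, as in the patched check.
 */
static size_t **demo_alloc_ptr_array(size_t n)
{
	if (n > DEMO_ALLOC_MAX / sizeof(size_t *))
		return NULL;
	return calloc(n, sizeof(size_t *));
}
```

With the earlier UINT_MAX bound, the multiplication could not overflow but could still ask for far more than the allocator supports, which is the case that triggered the warning.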

Re: [PATCH net 1/1 v2] rtnetlink: require unique netns identifier

2018-02-06 Thread kbuild test robot

Hi Christian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/rtnetlink-require-unique-netns-identifier/20180207-064207
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817de851: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817de85f: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817decc2: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817decf1: 0f ff c3       ud0    %ebx,%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817def6c: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817df332: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e1947: 0f ff 44 8b ad ud0    -0x53(%rbx,%rcx,4),%eax
   arch/x86/tools/insn_decoder_test: warning: objdump says 5 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2552: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2585: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e26d8: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2752: 0f ff 48 8d    ud0    -0x73(%rax),%ecx
   arch/x86/tools/insn_decoder_test: warning: objdump says 4 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e2801: 0f ff e9       ud0    %ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e305e: 0f ff eb       ud0    %ebx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but 
insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder 
bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e3559: 0f ff e9
ud0%ecx,%ebp
   arch/x86/tools/insn_decoder_test: warning: objdump says 3 bytes, but 
insn_get_length() says 2
   arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder 
bug, please report this.
   arch/x86/tools/insn_decoder_test: warning: 817e3fd8: 0f ff 48 8b 
ud0

Re: [PATCH v2 2/3] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak



On 02/07/2018 03:25 AM, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jan 31, 2018 at 8:19 AM, Rajendra Nayak  wrote:
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
>> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> new file mode 100644
>> index ..02520f19e4ca
>> --- /dev/null
>> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> @@ -0,0 +1,277 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2018, The Linux Foundation. All rights reserved.
>> + */
>> +
>> +#include 
>> +
>> +/ {
>> +   model = "Qualcomm Technologies, Inc. SDM845";
> 
> I'm fairly certain that "model" doesn't belong in the SoC .dtsi file.
> Only in the board .dts file.
> 
> 
>> +   clocks {
>> +   xo_board: xo_board {
> 
> Just to make it explicit: see my comments in patch 3/3 in this series
> about using "_" in node names.  I believe this should be:
> 
>   xo_board: xo-board {
> 
> 
>> +   spmi_bus: qcom,spmi@c44 {
> 
> Drop the qcom in the node name.  AKA, I believe this should be:
> 
> spmi_bus: spmi@c44 {
> 
> Specifically the node name is supposed to be a generic component name
> then with an address.  I see that Rob Herring said the same thing when
> he reviewed v1 of this patch just now (it seems like people are still
> commenting there, so make sure you collect the latest feedback from
> there when re-spinning).

Yes, I'll make sure I fix this up based on Rob's review of v1.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Re: [RFC PATCH] vfio/pci: Add ioeventfd support

2018-02-06 Thread Alexey Kardashevskiy

On 07/02/18 15:25, Alex Williamson wrote:
> On Wed, 7 Feb 2018 15:09:22 +1100
> Alexey Kardashevskiy  wrote:
>> On 07/02/18 11:08, Alex Williamson wrote:
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index e3301dbd27d4..07966a5f0832 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -503,6 +503,30 @@ struct vfio_pci_hot_reset {
>>>  
>>>  #define VFIO_DEVICE_PCI_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 13)
>>>  
>>> +/**
>>> + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 14,
>>> + *  struct vfio_device_ioeventfd)
>>> + *
>>> + * Perform a write to the device at the specified device fd offset, with
>>> + * the specified data and width when the provided eventfd is triggered.
>>> + *
>>> + * Return: 0 on success, -errno on failure.
>>> + */
>>> +struct vfio_device_ioeventfd {
>>> +   __u32   argsz;
>>> +   __u32   flags;
>>> +#define VFIO_DEVICE_IOEVENTFD_8(1 << 0) /* 1-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_16   (1 << 1) /* 2-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_32   (1 << 2) /* 4-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_64   (1 << 3) /* 8-byte write */
>>> +#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK(0xf)
>>> +   __u64   offset; /* device fd offset of write */
>>> +   __u64   data;   /* data to be written */
>>> +   __s32   fd; /* -1 for de-assignment */
>>> +};
>>> +
>>> +#define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 14)  
>>
>>
>> Is this the first ioctl with endianness fixed to little-endian? I'd suggest
>> commenting on that, as things like vfio_info_cap_header do use the host
>> endianness.
> 
> Look at our current read and write interface, we call leXX_to_cpu
> before calling iowriteXX there and I think a user would logically
> expect to use the same data format here as they would there.

If the data is "char data[8]" (i.e. bytestream), then it can be expected to
be device/bus endian (i.e. PCI == little endian), but if it is u64 - then I
am not so sure really, and this made me look around. It could be "__le64
data" too.
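
To make the distinction concrete, here is a minimal userspace sketch (not
from the patch; put_le64() and get_le64() are hypothetical helper names).
It treats the payload as an explicit little-endian byte stream, which is
what a "char data[8]" or "__le64 data" field would imply, so the bytes are
the same on any host. A plain u64 field, by contrast, has a host-dependent
in-memory layout.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store v as 8 little-endian bytes.
 * The result does not depend on the byte order of the host CPU.
 */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: read 8 little-endian bytes back into a host u64. */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

With helpers like these, the least significant byte always lands in
out[0], which is the ordering a PCI device expects. Whether the ioctl
should document its data field this way is exactly the open question
above.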

> Also note
> that iowriteXX does a cpu_to_leXX, so are we really defining the
> interface as little-endian or are we just trying to make ourselves
> endian neutral and counter that implicit conversion?  Thanks,

Defining it LE is fine, I just find it a bit confusing when
vfio_info_cap_header is host endian but vfio_device_ioeventfd is not.


-- 
Alexey

Re: [PATCH 1/2] arm64: dts: sdm845: Add minimal dts/dtsi files for sdm845 SoC and MTP

2018-02-06 Thread Rajendra Nayak

[]..

>> +
>> +#include 
>> +
>> +/ {
>> +   model = "Qualcomm Technologies, Inc. SDM845";
> 
> This should only be in the board level file.

thanks, will fix.

> 
>> +
>> +   interrupt-parent = <>;
>> +
>> +   #address-cells = <2>;
>> +   #size-cells = <2>;
>> +
>> +   chosen { };
>> +
>> +   memory {
>> +   device_type = "memory";
>> +   /* We expect the bootloader to fill in the reg */
> 
> The start address is variable? If not you should populate the base and
> have a unit-address.

sure, I'll check and update.

> 
>> +   reg = <0 0 0 0>;
>> +   };
>> +

[]..
>> +
>> +   soc: soc {
>> +   #address-cells = <1>;
>> +   #size-cells = <1>;
>> +   ranges = <0 0 0 0x>;
>> +   compatible = "simple-bus";
>> +
>> +   intc: interrupt-controller@17a0 {
>> +   compatible = "arm,gic-v3";
>> +   #interrupt-cells = <3>;
>> +   interrupt-controller;
>> +   #redistributor-regions = <1>;
>> +   redistributor-stride = <0x0 0x2>;
>> +   reg = <0x17a0 0x1>, /* GICD */
>> + <0x17a6 0x10>;/* GICR * 8 */
>> +   interrupts = ;
>> +   };
>> +
>> +   gcc: clock-controller@10 {
>> +   compatible = "qcom,gcc-sdm845";
> 
> sdm845-gcc is the preferred order.

This is still proposed as part of the GCC patch for sdm845 [1]
(which looks like it has neither you nor the DT list copied :/ ).
Also, looking at Documentation/devicetree/bindings/clock/qcom,gcc.txt,
we seem to have followed the gcc- convention for compatibles all along :(

"qcom,gcc-apq8064"
"qcom,gcc-apq8084"
"qcom,gcc-ipq8064"
"qcom,gcc-ipq4019"
"qcom,gcc-ipq8074"
"qcom,gcc-msm8660"
"qcom,gcc-msm8916"
"qcom,gcc-msm8960"
"qcom,gcc-msm8974"
"qcom,gcc-msm8974pro"
"qcom,gcc-msm8974pro-ac"
"qcom,gcc-msm8994"
"qcom,gcc-msm8996"
"qcom,gcc-mdm9615"

[1] https://patchwork.kernel.org/patch/10193895/ 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Michael S. Tsirkin

On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host polls the free
> page vq after sending the starting cmd id, so the guest doesn't need to
> kick after filling an element to the vq.
> 
> Host may also requests the guest to stop the reporting in advance by
> sending the stop cmd id to the guest via the configuration register.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> ---
>  drivers/virtio/virtio_balloon.c | 255 
> +++-
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c|   6 +
>  3 files changed, 232 insertions(+), 36 deletions(-)
> 
> Resend Change:
>   - Expose page_poisoning_enabled to kernel modules

The RESEND tag is for reposting unchanged patches.
You want to post a v27, and you want the mm patch
as a separate one, so you can get an ack on it from
someone on linux-mm.

In fact, I would probably add reporting the poison value as
a separate feature/couple of patches.

> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..5476725 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,11 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&vb->stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, &vb->update_balloon_size_work);
> - spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(&vb->stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +  &vb->update_balloon_size_work);
> + spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, &vb->cmd_id_received);
> + if (vb->cmd_id_received !=
> + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
> + spin_lock_irqsave(&vb->stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(vb->balloon_wq,
> +  &vb->report_free_page_work);
> + spin_unlock_irqrestore(&vb->stop_update_lock, flags);
> + }
> + }
> +}
> +
>  static void

Re: [PATCH 2/3] x86/tme: Detect if TME and MKTME is activated by BIOS

2018-02-06 Thread Kai Huang

On Wed, 2018-01-31 at 12:15 +0300, Kirill A. Shutemov wrote:
> IA32_TME_ACTIVATE MSR (0x982) can be used to check if BIOS has
> enabled
> TME and MKTME. It includes which encryption policy/algorithm is
> selected
> for TME or available for MKTME. For MKTME, the MSR also enumerates
> how
> many KeyIDs are available.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  arch/x86/kernel/cpu/intel.c | 83
> +
>  1 file changed, 83 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c
> b/arch/x86/kernel/cpu/intel.c
> index 6936d14d4c77..5b95fa484837 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -517,6 +517,86 @@ static void detect_vmx_virtcap(struct
> cpuinfo_x86 *c)
>   }
>  }
>  
> +#define MSR_IA32_TME_ACTIVATE0x982

Should this MSR go into msr-index.h?

> +
> +#define TME_ACTIVATE_LOCKED(x)   (x & 0x1)
> +#define TME_ACTIVATE_ENABLED(x)  (x & 0x2)
> +
> +#define TME_ACTIVATE_POLICY(x)   ((x >> 4) & 0xf) /* Bits 7:4 */
> +#define TME_ACTIVATE_POLICY_AES_XTS  0
> +
> +#define TME_ACTIVATE_KEYID_BITS(x)   ((x >> 32) & 0xf) /* Bits 35:32 */
> +
> +#define TME_ACTIVATE_CRYPTO_ALGS(x)  ((x >> 48) & 0xffff) /* Bits 63:48 */
> +
> +#define MKTME_ENABLED0
> +#define MKTME_DISABLED   1
> +#define MKTME_UNINITIALIZED  2
> +static int mktme_status = MKTME_UNINITIALIZED;
> +
> +static void detect_tme(struct cpuinfo_x86 *c)
> +{
> + u64 tme_activate, tme_policy, tme_crypto_algs;
> + int keyid_bits = 0, nr_keyids = 0;
> + static u64 tme_activate_cpu0 = 0;
> +
> + rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
> +
> + if (mktme_status != MKTME_UNINITIALIZED) {
> + /* Broken BIOS? */
> + if (tme_activate != tme_activate_cpu0) {
> + pr_err_once("TME: configuration is inconsistent between CPUs\n");
> + mktme_status = MKTME_DISABLED;
> + }
> + goto out;

Why goto out here? If something goes wrong, I think it is pointless to
read the keyID bits. IMHO if something goes wrong, you should set
mktme_status to disabled and clear tme_activate_cpu0?

> + }
> +
> + tme_activate_cpu0 = tme_activate;
> +
> + if (!TME_ACTIVATE_LOCKED(tme_activate) ||
> !TME_ACTIVATE_ENABLED(tme_activate)) {
> + pr_info("TME: not enabled by BIOS\n");
> + mktme_status = MKTME_DISABLED;
> + goto out;

I think it is pointless to read the keyID bits if TME is not even
enabled.

> + }
> +
> + pr_info("TME: enabled by BIOS\n");
> +
> + tme_policy = TME_ACTIVATE_POLICY(tme_activate);
> + if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS)
> + pr_warn("TME: Unknown policy is active: %#llx\n",
> tme_policy);
> +
> + tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
> + if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS)) {
> + pr_err("MKTME: No known encryption algorithm is
> supported: %#llx\n",
> + tme_crypto_algs);
> + mktme_status = MKTME_DISABLED;
> + }

To me the naming is a little confusing. tme_policy is the crypto_alg
used by the TME keyID (0), while tme_crypto_algs is a bitmap of the
crypto_algs supported for MK-TME. Probably better naming is needed?
The same goes for TME_ACTIVATE_POLICY(x) and TME_ACTIVATE_CRYPTO_ALGS(x)
above.
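
For illustration only, here is a userspace sketch of how the fields could
be decoded with names that keep the TME algorithm and the MKTME bitmap
apart. The struct, its member names and both functions are hypothetical
and not part of the patch. The bit positions are taken from the comments
in the quoted macros, and the 16-bit mask for bits 63:48 is my assumption
based on the "Bits 63:48" comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of IA32_TME_ACTIVATE, following the quoted macros. */
struct tme_activate_fields {
	bool locked;             /* bit 0 */
	bool enabled;            /* bit 1 */
	unsigned int tme_alg;    /* bits 7:4, algorithm used for TME (KeyID 0) */
	unsigned int keyid_bits; /* bits 35:32 */
	unsigned int mktme_algs; /* bits 63:48, bitmap of algorithms for MKTME */
};

/* Split a raw MSR value into its fields; no hardware access is done here. */
static struct tme_activate_fields tme_activate_decode(uint64_t msr)
{
	struct tme_activate_fields f = {
		.locked     = (msr >> 0) & 0x1,
		.enabled    = (msr >> 1) & 0x1,
		.tme_alg    = (msr >> 4) & 0xf,
		.keyid_bits = (msr >> 32) & 0xf,
		.mktme_algs = (msr >> 48) & 0xffff, /* assumed 16-bit field */
	};

	return f;
}

/* Same arithmetic as the patch: KeyID 0 is excluded from the count. */
static unsigned int tme_nr_keyids(const struct tme_activate_fields *f)
{
	return (1u << f->keyid_bits) - 1;
}
```

For example, a raw value with keyid_bits == 6 gives 63 usable KeyIDs, and
the same 6 bits are what the patch subtracts from c->x86_phys_bits.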

> +out:
> + keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
> + nr_keyids = (1UL << keyid_bits) - 1;
> + if (nr_keyids) {
> + pr_info_once("MKTME: enabled by BIOS\n");
> + pr_info_once("MKTME: %d KeyIDs available\n",
> nr_keyids);
> + } else {
> + pr_info_once("MKTME: disabled by BIOS\n");
> + }
> +
> + if (mktme_status == MKTME_UNINITIALIZED) {
> + /* MKTME is usable */
> + mktme_status = MKTME_ENABLED;
> + }
> +
> + /*
> +  * Exclude KeyID bits from physical address bits.
> +  *
> +  * We have to do this even if we are not going to use KeyID bits
> +  * ourself. VM guests still have to know that these bits are not usable
> +  * for physical address.
> +  */

Currently KVM gets this info directly from CPUID rather than consulting
c->x86_phys_bits. I think it may be reasonable for KVM to consult
c->x86_phys_bits for MK-TME, but IMHO the real reason we need to do this
is simply that it is a fact, and c->x86_phys_bits needs to reflect that
fact, so the comment can probably be refined.
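
To make the arithmetic under discussion concrete, here is a minimal
userspace sketch, not the patch itself. The field layout is taken from
the macros in the quoted patch; the sketch_* helpers are hypothetical
names, and treating the KeyID field as zero when TME is not locked and
enabled is only one reading of the comments above (the patch as posted
reads the field unconditionally).

```c
#include <assert.h>
#include <stdint.h>

/* Field accessors following the layout in the quoted patch. */
#define TME_ACTIVATE_LOCKED(x)		((x) & 0x1)
#define TME_ACTIVATE_ENABLED(x)		((x) & 0x2)
#define TME_ACTIVATE_KEYID_BITS(x)	(((x) >> 32) & 0xf)	/* Bits 35:32 */

/*
 * Hypothetical helper: KeyID bits to honour, treating the field as zero
 * unless TME is both locked and enabled (an assumption, see above).
 */
static inline unsigned int sketch_keyid_bits(uint64_t tme_activate)
{
	if (!TME_ACTIVATE_LOCKED(tme_activate) ||
	    !TME_ACTIVATE_ENABLED(tme_activate))
		return 0;
	return (unsigned int)TME_ACTIVATE_KEYID_BITS(tme_activate);
}

/* KeyID 0 is the TME key itself, so 2^bits - 1 KeyIDs remain for MKTME. */
static inline unsigned int sketch_nr_keyids(unsigned int keyid_bits)
{
	return (1u << keyid_bits) - 1;
}

/* Physical address width once the KeyID bits are carved out of it. */
static inline unsigned int sketch_phys_bits(unsigned int phys_bits,
					    unsigned int keyid_bits)
{
	assert(keyid_bits <= phys_bits);
	return phys_bits - keyid_bits;
}
```

For example, a locked and enabled MSR value with 6 in bits 35:32 gives
63 KeyIDs, and a 46-bit physical address width shrinks to 40 bits.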

Thanks,
-Kai

> + c->x86_phys_bits -= keyid_bits;
> +}
> +
>  static void init_intel_energy_perf(struct cpuinfo_x86 *c)
>  {
>   u64 epb;
> @@ -687,6 +767,9 @@ static void init_intel(struct cpuinfo_x86 *c)
>   if (cpu_has(c, X86_FEATURE_VMX))
>   detect_vmx_virtcap(c);
>  
> + if (cpu_has(c, X86_FEATURE_TME))

Re: linux-next: build failure after merge of the vhost tree

2018-02-06 Thread Stephen Rothwell

Hi Michael,

On Wed, 7 Feb 2018 04:57:42 +0200 "Michael S. Tsirkin" <m...@redhat.com> wrote:
>
> On Wed, Feb 07, 2018 at 01:54:41PM +1100, Stephen Rothwell wrote:
> > 
> > On Wed, 7 Feb 2018 13:04:23 +1100 Stephen Rothwell <s...@canb.auug.org.au> 
> > wrote:  
> > >
> > > I have used the vhost tree from next-20180206 for today.  
> 
> That's
> commit d25cc43c6775bff6b8e3dad97c747954b805e421
> vhost: don't hold onto file pointer for VHOST_SET_LOG_FD
> 
> Right?

Correct.

-- 
Cheers,
Stephen Rothwell

Re: [PATCH v2 3/3] arm64: dts: sdm845: Add serial console support

2018-02-06 Thread Rajendra Nayak

[]..

>> @@ -10,4 +10,46 @@
>>  / {
>> model = "Qualcomm Technologies, Inc. SDM845 MTP";
>> compatible = "qcom,sdm845-mtp";
>> +
>> +   aliases {
>> +   serial0 = _uart2;
>> +   };
>> +
>> +   chosen {
>> +   stdout-path = "serial0";
>> +   };
>> +
>> +   soc {
> 
> I don't know if there's an official position, but in general I've seen
> people use the actual "soc" alias here.  AKA at the top level of this
> dts, just do:
> 
> &soc {
>   ...
> };
> 
> Normally doing stuff like that is useful so you don't need to
> replicate the whole hierarchy.  In this case that's not a huge
> savings, but it can be nice to be consistent.  At the very least it
> saves you a level of indentation.
> 
> 
>> +   serial@a84000 {
>> +   status = "okay";
>> +   };
> 
> Similarly here you can use the alias from the sdm845.dtsi file to
> avoid replicating the hierarchy.  AKA at the top level do:
> 
> _uart2 {
>   status = "okay";
> };
> 
> In this case it saves you 2 levels of indentation.

Right. Andy/Bjorn, are there any preferences here?
I see we don't do this for the other board files, and I'm not sure
there's a specific reason for how it's currently done or whether we
need to stick to it.

> 
>> +
>> +   pinctrl@340 {
>> +   qup_uart2_default: qup_uart2_default {
> 
> I'm not sure how persnickety I should be, but according to
> :
> 
>   node names use dash "-" instead of underscore "_"
> 
> ...but, of course, labels can't use dashes (and the same page says to
> use underscore for labels).  This is why, in rk3288 for instance, you
> see:
> 
> i2c2_xfer: i2c2-xfer {
>   rockchip,pins = <6 9 RK_FUNC_1 &pcfg_pull_none>,
>   <6 10 RK_FUNC_1 &pcfg_pull_none>;
> };
> 
> AKA the label and the node name are the same but the label uses "_"
> and the node names use "-".

Sure, I'll fix these up.

[]
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
>> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> index 02520f19e4ca..c4ce70840acf 100644
>> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
>> @@ -4,6 +4,7 @@
>>   */
>>
>>  #include 
>> +#include 
>>
>>  / {
>> model = "Qualcomm Technologies, Inc. SDM845";
>> @@ -273,5 +274,25 @@
>> cell-index = <0>;
>> };
>>
>> +   qup_1: qcom,geni_se@ac {
>> +   compatible = "qcom,geni-se-qup";
>> +   reg = <0xac 0x6000>;
> 
> I think you may have mentioned this in another context, but this
> doesn't match the current bindings.  Some clocks should be here.
> ...and it looks like the uart should be a subnode.

Right, these were tested with the v1 set for serial. I will update them.

regards
Rajendra

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Re: [RFC PATCH] vfio/pci: Add ioeventfd support

2018-02-06 Thread Alex Williamson

On Wed, 7 Feb 2018 15:09:22 +1100
Alexey Kardashevskiy  wrote:
> On 07/02/18 11:08, Alex Williamson wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index e3301dbd27d4..07966a5f0832 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -503,6 +503,30 @@ struct vfio_pci_hot_reset {
> >  
> >  #define VFIO_DEVICE_PCI_HOT_RESET  _IO(VFIO_TYPE, VFIO_BASE + 13)
> >  
> > +/**
> > + * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 14,
> > + *  struct vfio_device_ioeventfd)
> > + *
> > + * Perform a write to the device at the specified device fd offset, with
> > + * the specified data and width when the provided eventfd is triggered.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_ioeventfd {
> > +   __u32   argsz;
> > +   __u32   flags;
> > +#define VFIO_DEVICE_IOEVENTFD_8    (1 << 0) /* 1-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_16   (1 << 1) /* 2-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_32   (1 << 2) /* 4-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_64   (1 << 3) /* 8-byte write */
> > +#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK    (0xf)
> > +   __u64   offset; /* device fd offset of write */
> > +   __u64   data;   /* data to be written */
> > +   __s32   fd; /* -1 for de-assignment */
> > +};
> > +
> > +#define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 14)  
> 
> 
> Is this a first ioctl with endianness fixed to little-endian? I'd suggest
> to comment on that as things like vfio_info_cap_header do use the host
> endianness.

Look at our current read and write interface: we call leXX_to_cpu
before calling iowriteXX there, and I think a user would logically
expect to use the same data format here as they would there.  Also note
that iowriteXX does a cpu_to_leXX, so are we really defining the
interface as little-endian, or are we just trying to make ourselves
endian neutral and counter that implicit conversion?  Thanks,

Alex

Re: [PATCH 0/2] rcu: Transform kfree_rcu() into kvfree_rcu()

2018-02-06 Thread Matthew Wilcox

On Tue, Feb 06, 2018 at 06:17:03PM -0800, Paul E. McKenney wrote:
> So it is OK to kvmalloc() something and pass it to either kfree() or
> kvfree(), and it had better be OK to kvmalloc() something and pass it
> to kvfree().
> 
> Is it OK to kmalloc() something and pass it to kvfree()?

Yes, it absolutely is.

void kvfree(const void *addr)
{
	if (is_vmalloc_addr(addr))
		vfree(addr);
	else
		kfree(addr);
}
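
The dispatch can be mimicked outside the kernel to show why either kind
of pointer is acceptable. This is a toy model under assumptions: the
address window and the model_* names are made up for illustration, and
the real is_vmalloc_addr() checks against the kernel's actual vmalloc
range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address window standing in for the vmalloc area. */
#define MODEL_VMALLOC_START	0x1000u
#define MODEL_VMALLOC_END	0x2000u

enum model_free_path { MODEL_VIA_KFREE, MODEL_VIA_VFREE };

/* True when addr falls inside the modelled vmalloc window. */
static bool model_is_vmalloc_addr(uintptr_t addr)
{
	return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

/* Same decision as kvfree(): choose the free routine by address range. */
static enum model_free_path model_kvfree_path(uintptr_t addr)
{
	return model_is_vmalloc_addr(addr) ? MODEL_VIA_VFREE : MODEL_VIA_KFREE;
}
```

An address outside the window takes the kfree() branch, so memory from
kmalloc() is handled correctly without the caller knowing which
allocator produced it.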

> If so, is it really useful to have two different names here, that is,
> both kfree_rcu() and kvfree_rcu()?

I think it's handy to have all three of kvfree_rcu(), kfree_rcu() and
vfree_rcu() available in the API for the symmetry of calling kmalloc()
/ kfree_rcu().

Personally, I would like us to rename kvfree() to just free(), and have
malloc(x) be an alias to kvmalloc(x, GFP_KERNEL), but I haven't won that
fight yet.
