Re: [PATCH net-next] qed: Set error code for allocation failures

2017-10-27 Thread Yunsheng Lin
Hi, Dan

On 2017/10/27 19:52, Dan Carpenter wrote:
> On Fri, Oct 27, 2017 at 05:32:42PM +0800, Yunsheng Lin wrote:
>> Hi, Dan
>>
>> On 2017/10/27 14:40, Dan Carpenter wrote:
>>> There are several places where we accidentally return success when
>>> kcalloc() fails.
>>>
>>> Fixes: fcb39f6c10b2 ("qed: Add mpa buffer descriptors for storing and 
>>> processing mpa fpdus")
>>> Signed-off-by: Dan Carpenter 
>>>
>>> diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c 
>>> b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
>>> index 409041eab189..6366f2ef82b7 100644
>>> --- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
>>> +++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
>>> @@ -2585,7 +2585,7 @@ qed_iwarp_ll2_start(struct qed_hwfn *p_hwfn,
>>> struct qed_ll2_cbs cbs;
>>> u32 mpa_buff_size;
>>> u16 n_ooo_bufs;
>>> -   int rc = 0;
>>> +   int rc;
>>> int i;
>>>  
>>> iwarp_info = &p_hwfn->p_rdma_info->iwarp;
>>> @@ -2696,6 +2696,7 @@ qed_iwarp_ll2_start(struct qed_hwfn *p_hwfn,
>>> if (rc)
>>> goto err;
>>>  
>>> +   rc = -ENOMEM;
>>> iwarp_info->partial_fpdus = kcalloc((u16)p_hwfn->p_rdma_info->num_qps,
>>> sizeof(*iwarp_info->partial_fpdus),
>>> GFP_KERNEL);
>>
>> Does the memory allocated here need to be freed when error happens below?
>>
> 
> Hm...  I think you're right that it leaks.  Also I'm confused by the
> qed_iwarp_ll2_alloc_buffers() allocation.  The comment in there says
> that /* buffers will be deallocated by qed_ll2 */ but qed_ll2 is not
> a function name or something which is useful to grep.

Yes, I am confused by it too.

Even in qed_iwarp_ll2_alloc_buffers(), if a kzalloc() call fails, it does not
clean up the memory allocated by the previous kzalloc() calls.
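
To make the two concerns in this thread concrete (returning success on a failed allocation, and leaking an earlier allocation on a later failure), here is a minimal userspace sketch. It is not the qed code: struct demo_info, demo_start() and the fail_second switch are hypothetical names used only for illustration, and calloc()/free() stand in for kcalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the real qed structures. */
struct demo_info {
	int *first;
	char *second;
};

/*
 * Allocate two buffers. fail_second simulates the second allocation
 * failing so that the unwind path can be exercised.
 */
int demo_start(struct demo_info *info, size_t n, int fail_second)
{
	int rc;

	rc = -ENOMEM;	/* set before allocating, so "goto err" reports failure */
	info->first = calloc(n, sizeof(*info->first));
	if (!info->first)
		goto err;

	info->second = fail_second ? NULL : calloc(n, 1);
	if (!info->second)
		goto err_free_first;

	return 0;

err_free_first:
	free(info->first);	/* undo the earlier allocation */
	info->first = NULL;
err:
	return rc;
}

/* Usage: a failure in the second step reports -ENOMEM and leaks nothing. */
void demo_usage(void)
{
	struct demo_info info = { 0 };

	assert(demo_start(&info, 4, 1) == -ENOMEM);
	assert(info.first == NULL);
	assert(demo_start(&info, 4, 0) == 0);
	free(info.second);
	free(info.first);
}
```

The unwind labels run in reverse order of allocation, so each failure point only releases what was already acquired.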

> 
> regards,
> dan carpenter
> 
> 
> .
> 



Re: [PATCH v2] KVM: arm/arm64: Fix external abort type matching

2017-10-27 Thread gengdongjiu

On 2017/10/28 2:28, Marc Zyngier wrote:
>> kvm_vcpu_trap_get_fault_type() will clear the low two bits to zero. So
>> I use FSC_SEA_TTW to represent "Synchronous external abort on translation
>> table walk".
> I understand that, and I'm certainly not keen on adding another "fault
> type" for this.
OK, thanks.

> 
>> As we can see, the translation fault type "FSC_FAULT" does not
>> define "FSC_FAULT0", "FSC_FAULT1", "FSC_FAULT2" and "FSC_FAULT3",
>> because the fault type "FSC_FAULT" includes all four cases.
> Indeed, and I'm still not convinced this is the best thing we have in
> the code.
Ok, thanks.


> 
>> static inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>> {
>>  return kvm_vcpu_get_hsr(vcpu) & 0x3f;
>> }
> Here you go. This should give you a pretty good idea of how to provide a
> proper fix.
OK, got it, thanks
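
As a rough illustration of the difference discussed above, the sketch below contrasts the full fault status code (the & 0x3f helper quoted in this mail) with the "fault type" that has the low two bits cleared. The DEMO_* names and demo_* helpers are hypothetical, the numeric encodings are my reading of the ARM fault status code layout and should be checked against the architecture manual, and this is not claimed to be the fix that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings; treat these values as illustrative. */
#define DEMO_FSC_MASK      0x3f /* full fault status code, as in the quoted helper */
#define DEMO_FSC_TYPE_MASK 0x3c /* same field with the low two (level) bits cleared */
#define DEMO_FSC_SEA       0x10 /* synchronous external abort, not on a table walk */
#define DEMO_FSC_SEA_TTW0  0x14 /* synchronous external abort on table walk, level 0 */

static uint8_t demo_fault(uint32_t hsr)
{
	return hsr & DEMO_FSC_MASK;
}

static uint8_t demo_fault_type(uint32_t hsr)
{
	return hsr & DEMO_FSC_TYPE_MASK;
}

/* Matches on the full code, so table-walk levels 0-3 are each recognised. */
int demo_is_sea(uint32_t hsr)
{
	uint8_t fsc = demo_fault(hsr);

	return fsc == DEMO_FSC_SEA ||
	       (fsc >= DEMO_FSC_SEA_TTW0 && fsc <= DEMO_FSC_SEA_TTW0 + 3);
}

void demo_usage(void)
{
	/* Level 2 table-walk abort: the type mask hides the level, the full code keeps it. */
	assert(demo_fault_type(0x16) == DEMO_FSC_SEA_TTW0);
	assert(demo_fault(0x16) == 0x16);
	assert(demo_is_sea(0x16));
	assert(!demo_is_sea(0x04));	/* a translation fault is not an external abort */
}
```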

> 
> Thanks,



Re: Query regarding srcu_funnel_exp_start()

2017-10-27 Thread Neeraj Upadhyay

On 10/28/2017 03:50 AM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 10:15:04PM +0530, Neeraj Upadhyay wrote:

On 10/27/2017 05:56 PM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 02:23:07PM +0530, Neeraj Upadhyay wrote:

Hi,

One query regarding srcu_funnel_exp_start() function in
kernel/rcu/srcutree.c.

static void srcu_funnel_exp_start(struct srcu_struct *sp, struct
srcu_node *snp,
  unsigned long s)
{

if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
sp->srcu_gp_seq_needed_exp = s;

}

Why is sp->srcu_gp_seq_needed_exp set to 's' if srcu_gp_seq_needed_exp is >=
's'? Shouldn't srcu_gp_seq_needed_exp be equal to the greater of the two?


Let's suppose that it is incorrect as currently written.  Can you
construct a test case demonstrating a failure of some sort, then provide
a fix?


Will check this. Might take some time to build a test case.


Fair enough!

I suggest checking to see if kernel/rcu/rcuperf.c can do what you need for
this test.  (Might not with a single test, but perhaps a before-and-after
comparison.  Or maybe you really do need to add some test code somewhere.)



Thanks for the suggestion, will try that out.


To start with, if it is currently incorrect, what would be the nature
of the failure?

Thanx, Paul



Hi Paul,

I see the scenario below, in which a new GP won't be expedited. Please
correct me if I am missing something here.

1. CPU0 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
   __synchronize_srcu()
 __call_srcu()
 s = rcu_seq_snap(&sp->srcu_gp_seq); // let's say srcu_gp_seq = 0;
 // s = 0x100


Looks like you have one hex digit and then two binary digits, but why not?
(RCU_SEQ_STATE_MASK is 3 rather than 0xff.)


Yeah, sorry, I confused myself while representing the values. 0x100 needs
to be replaced with b'100' and 0x200 with b'1000'.



 sdp->srcu_gp_seq_needed = s // 0x100
 needgp = true
 sdp->srcu_gp_seq_needed_exp = s // 0x100
 srcu_funnel_gp_start()
 sp->srcu_gp_seq_needed_exp = s;
 srcu_gp_start(sp);
 rcu_seq_start(&sp->srcu_gp_seq);

2. CPU1 calls normal synchronize_srcu()

synchronize_srcu()
 __synchronize_srcu(sp, true)
 __call_srcu()
 s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
 // s= 0x200
 sdp->srcu_gp_seq_needed = s; // 0x200
 srcu_funnel_gp_start()
 smp_store_release(&sp->srcu_gp_seq_needed, s); // 0x200

3. CPU3 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
   __synchronize_srcu()
 __call_srcu()
 s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
 // s = 0x200
 sdp->srcu_gp_seq_needed_exp = s // 0x200
 srcu_funnel_exp_start(sp, sdp->mynode, s);
 // sp->srcu_gp_seq_needed_exp = 0x100
 // s = 0x200 ; sp->srcu_gp_seq_needed_exp is not updated
 if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
 sp->srcu_gp_seq_needed_exp = s;
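
The arithmetic in the scenario above can be checked with a small userspace sketch. The demo_* names are hypothetical; the snapshot formula and the wrap-safe comparison are written from my recollection of the kernel's rcu_seq_snap() and ULONG_CMP_LT() and should be treated as assumptions. With a state mask of 3, the snapshots come out as b'100' and b'1000', matching the corrected values given later in this thread.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_SEQ_STATE_MASK 3UL	/* RCU_SEQ_STATE_MASK, per the reply in this thread */

/* Wrap-safe "a < b", as I recall the kernel's ULONG_CMP_LT(). */
#define DEMO_ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Sketch of rcu_seq_snap(): sequence value at which a full GP has elapsed. */
unsigned long demo_seq_snap(unsigned long seq)
{
	return (seq + 2 * DEMO_SEQ_STATE_MASK + 1) & ~DEMO_SEQ_STATE_MASK;
}

/* Update rule as quoted in the thread. */
unsigned long demo_update_quoted(unsigned long needed_exp, unsigned long s)
{
	if (!DEMO_ULONG_CMP_LT(needed_exp, s))
		needed_exp = s;
	return needed_exp;
}

/* Alternative rule that keeps the greater of the two values. */
unsigned long demo_update_max(unsigned long needed_exp, unsigned long s)
{
	if (DEMO_ULONG_CMP_LT(needed_exp, s))
		needed_exp = s;
	return needed_exp;
}

void demo_usage(void)
{
	unsigned long first = demo_seq_snap(0);		/* b'100' */
	unsigned long second = demo_seq_snap(1);	/* b'1000' */

	assert(first == 4 && second == 8);
	/* Quoted rule: the later expedited request is not recorded... */
	assert(demo_update_quoted(first, second) == first);
	/* ...and an older request can move the value backwards. */
	assert(demo_update_quoted(second, first) == first);
	/* Max rule: the value only ever moves forwards. */
	assert(demo_update_max(first, second) == second);
	assert(demo_update_max(second, first) == second);
}
```

This only models the comparison; whether the difference is observable in grace-period latency still needs the test Paul asks for.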


Seems plausible, but you should be able to show the difference in
grace-period duration with a test.



Ok sure, will attempt that.


While you are in srcu_funnel_exp_start(), should it be rechecking
rcu_seq_done(&sp->srcu_gp_seq, s) as well as the current
ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s) under the lock?
Why or why not?

Thanx, Paul



Hi Paul,

I don't see how it would have an impact. I have put markers in the code
snippet below to explain my points. My understanding is:

* The rcu_seq_done() check @a is a fastpath return, which avoids contention
on the snp lock if the GP has already elapsed.

* Checking it @b, inside the srcu_node lock, might not make any
difference, as the counter portion of sp->srcu_gp_seq is updated
under the srcu_struct lock. Also, we cannot lock srcu_struct at this
point, as it would cause lock contention among multiple CPUs.

* Checking rcu_seq_done() @c also has no impact, as we have already
done all the work of traversing the entire parent chain, and if
rcu_seq_done() is true, srcu_gp_seq_needed_exp will be greater
than or equal to 's'.

  srcu_gp_end()
raw_spin_lock_irq_rcu_node(sp);
rcu_seq_end(&sp->srcu_gp_seq);
gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq))
sp->srcu_gp_seq_needed_exp = gpseq;
raw_spin_unlock_irq_rcu_node(sp);

static void srcu_funnel_exp_start(...)
{


for (; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&sp->srcu_gp_seq, s) ||  /* a */
ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
return;
raw_spin_lock_irqsave_rcu_node(snp, flags);
/* b */
if 

Re: mptsas driver cannot detect hotplugging disk with the LSI SCSI SAS1068 controller in Ubuntu guest on VMware

2017-10-27 Thread Gavin Guo
On Fri, Oct 27, 2017 at 10:53 PM, Hannes Reinecke  wrote:
> On 10/27/2017 04:02 PM, Gavin Guo wrote:
>> Hi Hannes,
>>
>> Thank you for looking into the issue. Is there anything I can do to help
>> test the patch? I appreciate your help. Thank you.
>>
> If you had checked linux-scsi you would have found this patch:
> '[PATCH] mptsas: Fixup device hotplug for VMWare ESXi', which I guess is
> already scheduled for inclusion in 4.14.
> Anything else I could help you with?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke              Teamlead Storage & Networking
> h...@suse.de   +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

I really appreciate your help. I will proceed with the testing and keep you posted.


Re: [GIT PULL] tracing/samples: Fix creation and deletion of simple_thread_fn creation

2017-10-27 Thread Linus Torvalds
On Fri, Oct 27, 2017 at 6:15 PM, Steven Rostedt  wrote:
>
> I'll write up a fix on the plane home on Saturday. Expect something by
> Sunday.

I'll change the "bool" to "int" in the meanwhile, getting rid of the warning.

That may be what you meant in the first place.

  Linus


Re: [PATCH RFC] random: fix syzkaller fuzzer test int overflow

2017-10-27 Thread Chen Feng
Hi Ted,

On 2017/10/26 23:04, Theodore Ts'o wrote:
> On Thu, Oct 26, 2017 at 04:25:15PM +0800, Chen Feng wrote:
>>
>>
>> On 2017/10/25 16:49, Theodore Ts'o wrote:
>>> Other people who have sent me fuzzer test reproducers are able to
>>> reproduce syzkaller logs into a simple C program.  Can you explain to
>>> me what the heck:
>>>
>>>> r3 = syz_open_dev$urandom(&(0x7f00a000)="2f6465762f7572616e646f6d00", 0x0, 0x0)
>>>
>>> means?
>>
>> Take a look at this:
>>
>> https://github.com/google/syzkaller/blob/master/sys/linux/random.txt
> 
> Sorry, this *still* looks like gobbledygook.
> 
> What ioctls are you executing, and with what arguments?
> 
> *Please*, give me a C program I can compile.

I checked the ioctl. What is the purpose of exposing the RNDADDTOENTCNT ioctl to userspace?

We need to check the user input in credit_entropy_bits_safe().

+   if (INT_MAX - nbits < r->entropy_total)
+   return -EINVAL;
+
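
The check proposed above follows the usual pattern for detecting signed overflow before it happens: compare against INT_MAX minus the addend rather than computing the sum. A minimal userspace sketch of that pattern follows; demo_credit_bits() is a hypothetical helper, not the kernel function, and the negative-input check is an extra assumption of mine.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Add nbits to *total without overflowing int. Returns 0 on success, or
 * -EINVAL if nbits is negative or the sum would exceed INT_MAX.
 */
int demo_credit_bits(int *total, int nbits)
{
	if (nbits < 0)
		return -EINVAL;
	if (INT_MAX - nbits < *total)	/* same shape as the proposed check */
		return -EINVAL;
	*total += nbits;
	return 0;
}

void demo_usage(void)
{
	int total = INT_MAX - 10;

	assert(demo_credit_bits(&total, 11) == -EINVAL);	/* would overflow */
	assert(total == INT_MAX - 10);				/* left untouched */
	assert(demo_credit_bits(&total, 10) == 0);
	assert(total == INT_MAX);
}
```

Because nbits is known to be non-negative at the point of the subtraction, INT_MAX - nbits cannot itself overflow.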


The test code is below:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

void *random_ioctl_test(void *args)
{
	int fd = -1;
	int ret = -1;
	int test_arg = 0x7fff;

	fd = open("/dev/urandom", 0x0, 0x0);
	if (fd < 0) {
		printf("open /dev/urandom failed!\n");
		return NULL;
	}

	/* 0x40045201 is RNDADDTOENTCNT */
	ret = ioctl(fd, 0x40045201, &test_arg);

	printf("random_ioctl ret=%d\n", ret);
	close(fd);
	return NULL;
}

int main(int argc, char *argv[])
{
	int ret, i;
	pthread_t thread[100];

	for (i = 0; i < 100; i++) {
		/* the last argument was lost in the archive; the thread function ignores it */
		ret = pthread_create(&thread[i], NULL, random_ioctl_test, NULL);
		if (ret) {
			printf("create thread %d fail with ret=%d\n", i, ret);
			return -1;
		}
	}

	for (i = 0; i < 100; i++) {
		pthread_join(thread[i], NULL);
	}
	return 0;
}


> 
>-Ted
> 
> .
> 



[PATCH v2] workqueue: Fix NULL pointer dereference

2017-10-27 Thread Li Bin
When queue_work() is used in irq context (not in task context), there is
a potential case that triggers a NULL pointer dereference.

worker_thread()
|-spin_lock_irq()
|-process_one_work()
	|-worker->current_pwq = pwq
	|-spin_unlock_irq()
	|-worker->current_func(work)
	|-spin_lock_irq()
	|-worker->current_pwq = NULL
	|-spin_unlock_irq()

		//interrupt here
		|-irq_handler
			|-__queue_work()
				//assuming that the wq is draining
				|-is_chained_work(wq)
					|-current_wq_worker()
					//Here, 'current' is the interrupted worker!
						|-current->current_pwq is NULL here!
|-schedule()


Avoid it by checking for task context in current_wq_worker(), and
if not in task context, we shouldn't use the 'current' to check the
condition.
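
The effect of the added in_task() check can be modelled in userspace. Everything in the sketch below is hypothetical scaffolding: demo_current and demo_in_irq simulate 'current' and the interrupt state, the flag value is illustrative, and the worker pointer stands in for kthread_data(). It only shows why the unguarded lookup hands back the interrupted worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PF_WQ_WORKER 0x20	/* illustrative flag value */

struct demo_worker {
	void *current_pwq;
};

struct demo_task {
	unsigned int flags;
	struct demo_worker *worker;	/* stands in for kthread_data() */
};

/* Simulated CPU state: the running task, and whether an IRQ is being handled. */
static struct demo_task *demo_current;
static bool demo_in_irq;

static bool demo_in_task(void)
{
	return !demo_in_irq;
}

/* Unguarded lookup: trusts "current" even in interrupt context. */
struct demo_worker *demo_current_wq_worker_old(void)
{
	if (demo_current->flags & DEMO_PF_WQ_WORKER)
		return demo_current->worker;
	return NULL;
}

/* Guarded lookup: only trusts "current" in task context. */
struct demo_worker *demo_current_wq_worker_new(void)
{
	if (demo_in_task() && (demo_current->flags & DEMO_PF_WQ_WORKER))
		return demo_current->worker;
	return NULL;
}

void demo_usage(void)
{
	struct demo_worker w = { .current_pwq = NULL };	/* between work items */
	struct demo_task t = { .flags = DEMO_PF_WQ_WORKER, .worker = &w };

	demo_current = &t;
	demo_in_irq = true;	/* an interrupt arrives on top of the worker */

	assert(demo_current_wq_worker_old() == &w);	/* caller would read w.current_pwq */
	assert(demo_current_wq_worker_new() == NULL);	/* treated as "not a worker" */

	demo_in_irq = false;
	assert(demo_current_wq_worker_new() == &w);
	demo_current = NULL;
}
```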

Reported-by: Xiaofei Tan 
Signed-off-by: Li Bin 
---
 kernel/workqueue_internal.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 8635417..29fa81f 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 struct worker_pool;
 
@@ -59,7 +60,7 @@ struct worker {
  */
 static inline struct worker *current_wq_worker(void)
 {
-   if (current->flags & PF_WQ_WORKER)
+   if (in_task() && (current->flags & PF_WQ_WORKER))
return kthread_data(current);
return NULL;
 }
-- 
1.7.12.4



Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX

2017-10-27 Thread kbuild test robot
Hi Fan,

Thank you for the patch! Yet we hit a small issue.
[auto build test WARNING on linus/master]
[also build test WARNING on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Fan-Du/proc-mm-export-PTE-sizes-directly-in-smaps/20171027-233355
config: i386-randconfig-b0-10280854 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/processor.h:31:0,
from arch/x86/include/asm/cpufeature.h:4,
from arch/x86/include/asm/thread_info.h:52,
from include/linux/thread_info.h:37,
from arch/x86/include/asm/preempt.h:6,
from include/linux/preempt.h:80,
from include/linux/spinlock.h:50,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from fs/proc/task_mmu.c:1:
   fs/proc/task_mmu.c: In function 'smaps_pte_range':
>> include/linux/err.h:40:24: warning: 'page' may be used uninitialized in this function [-Wmaybe-uninitialized]
 return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
   ^
   fs/proc/task_mmu.c:586:15: note: 'page' was declared here
 struct page *page;
  ^
--
   In file included from arch/x86/include/asm/processor.h:31:0,
from arch/x86/include/asm/cpufeature.h:4,
from arch/x86/include/asm/thread_info.h:52,
from include/linux/thread_info.h:37,
from arch/x86/include/asm/preempt.h:6,
from include/linux/preempt.h:80,
from include/linux/spinlock.h:50,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from fs//proc/task_mmu.c:1:
   fs//proc/task_mmu.c: In function 'smaps_pte_range':
>> include/linux/err.h:40:24: warning: 'page' may be used uninitialized in this function [-Wmaybe-uninitialized]
 return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
   ^
   fs//proc/task_mmu.c:586:15: note: 'page' was declared here
 struct page *page;
  ^

vim +/page +40 include/linux/err.h

^1da177e Linus Torvalds 2005-04-16  37  
a5ed3cee Joe Perches    2014-04-03  38  static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
603c4ba9 Phil Carmody   2009-12-14  39  {
dfffa587 Viresh Kumar   2016-01-15 @40  	return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
603c4ba9 Phil Carmody   2009-12-14  41  }
603c4ba9 Phil Carmody   2009-12-14  42  

:: The code at line 40 was first introduced by commit
:: dfffa587a6bcd84f2087f88e11600b0e8b0aa1ee err.h: add (missing) unlikely() to IS_ERR_OR_NULL()
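
For readers unfamiliar with this class of warning, the sketch below shows the general shape that triggers it: a pointer assigned only on some branches and then passed to an IS_ERR_OR_NULL()-style check. It does not reproduce the patch under test; demo_count_page() and its 'kind' parameter are hypothetical, and demo_is_err_or_null() is a userspace approximation of the helper quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace model of IS_ERR_OR_NULL(): NULL or an errno-encoded pointer. */
static bool demo_is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical lookup: only some entry kinds yield a page. */
int demo_count_page(int kind, int *backing)
{
	int *page = NULL;	/* without this initializer, kind == 2 reads garbage */

	if (kind == 0)
		page = backing;
	else if (kind == 1)
		page = NULL;	/* e.g. entry not present */
	/* kind == 2: no branch assigns page */

	if (demo_is_err_or_null(page))
		return 0;
	return 1;
}

void demo_usage(void)
{
	int backing = 0;

	assert(demo_count_page(0, &backing) == 1);
	assert(demo_count_page(1, &backing) == 0);
	assert(demo_count_page(2, &backing) == 0);	/* well-defined thanks to the initializer */
}
```

As the robot notes, the real warning may be a false positive; initializing the pointer is one common way to silence it when the control flow cannot be simplified.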

:: TO: Viresh Kumar <viresh.ku...@linaro.org>
:: CC: Linus Torvalds <torva...@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/4] m68k/mac: More printk modernization

2017-10-27 Thread Finn Thain
On Sat, 28 Oct 2017, I wrote:

> > Any particular reason why you didn't use pr_debug() here? I'm guessing 
> > it's because this is not a known pointer value?
> > 
> 
> It's because the call to psc_debug_dump() is already conditional on 
> #ifdef DEBUG_PSC.
> 
> Having the printk conditional on both DEBUG and DEBUG_PSC would be 
> annoying. And I didn't want an unconditional call to psc_debug_dump() 
> because I think PSC_DEBUG could become more useful given that PSC 
> support is woefully incomplete.
> 

Perhaps DEBUG_PSC should be scrapped in favour of DEBUG. Presently 
DEBUG_PSC is set and I think that's useful as long as those drivers are 
incomplete. So we would end up with this:


#define DEBUG

#include ...
...

static void psc_debug_dump(void)
{
...
pr_debug(...);
...
}

void __init psc_init(void)
{
...
#ifdef DEBUG
psc_debug_dump();
#endif
...
}


In this version, the "#define DEBUG" at the top of the file has obscure 
side effects (not just in printk.h) considering all of the headers that 
get included, and their includes, and so on. I still prefer the patch that 
I sent.
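
A self-contained userspace approximation of the alternative sketched above is given here for comparison. demo_pr_debug() is a hypothetical stand-in for pr_debug() (the kernel's version has more behaviour, for example with dynamic debug), and demo_dump_calls exists only so the effect can be observed. The sketch uses #ifdef because an empty "#define DEBUG" has no value for #if to test.

```c
#define DEBUG	/* comment this out to compile the debug output away */

#include <assert.h>
#include <stdio.h>

#ifdef DEBUG
#define demo_pr_debug(...) printf(__VA_ARGS__)
#else
#define demo_pr_debug(...) ((void)0)
#endif

static int demo_dump_calls;

static void demo_debug_dump(void)
{
	demo_dump_calls++;
	demo_pr_debug("PSC register dump would go here\n");
}

void demo_init(void)
{
#ifdef DEBUG
	demo_debug_dump();
#endif
}

void demo_usage(void)
{
	int before = demo_dump_calls;

	demo_init();
	assert(demo_dump_calls == before + 1);	/* dump ran because DEBUG is defined */
}
```

Here the single DEBUG macro controls both the call and the output, which is the coupling the message above objects to.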

-- 


Re: [PATCH] f2fs: fix to keep backward compatibility of flexible inline xattr feature

2017-10-27 Thread Chao Yu
On 2017/10/27 19:39, Chao Yu wrote:
> On 2017/10/27 19:32, Jaegeuk Kim wrote:
>> On 10/27, Chao Yu wrote:
>>> On 2017/10/27 18:56, Jaegeuk Kim wrote:
 On 10/27, Chao Yu wrote:
> On 2017/10/26 22:05, Jaegeuk Kim wrote:
>> On 10/26, Chao Yu wrote:
>>> On 2017/10/26 19:52, Jaegeuk Kim wrote:
 On 10/26, Chao Yu wrote:
> Hi Jaegeuk,
>
> On 2017/10/26 18:02, Jaegeuk Kim wrote:
>> Hi Chao,
>>
>> On 10/26, Jaegeuk Kim wrote:
>>> On 10/26, Chao Yu wrote:
 Hi Jaegeuk,

 On 2017/10/26 16:42, Jaegeuk Kim wrote:
> Hi Chao,
>
> It seems this is a critical problem, so let me integrate this 
> patch with your
> initial patch "f2fs: support flexible inline xattr size".
> Let me know, if you have any other concern.

 Better. ;)

 Please add commit message of this patch into initial patch "f2fs: 
 support
 flexible inline xattr size".
>>
>> BTW, I read the patch again, and couldn't catch the problem actually.
>> We didn't assign inline_xattr all the time, instead do if 
>> inline_xattr
>
> But you can see, MAX_INLINE_DATA is calculated as below, in where we 
> will always
> reserve F2FS_INLINE_XATTR_ADDRS of 200 bytes, now we will change
> F2FS_INLINE_XATTR_ADDRS to F2FS_INLINE_XATTR_ADDRS(inode) which is an 
> flexible
> size, so MAX_INLINE_DATA could be calculated to expand 200 bytes than 
> before,

 That doesn't mean inline_addr is starting after 200 bytes. We're 
 getting the
 address only if inline_xattr is set, no? The below size seems 
 reserving the
>>>
>>> This isn't about inline_xattr_addr, it is about MAX_INLINE_DATA size 
>>> calculation.
>>>
>>> For example, in old image, inode is with inline_{data,dentry} flag, but 
>>> without
>>> inline_xattr flag, it will only use 3688 - 200 bytes for storing inline 
>>> data/dents,
>>> which means, it reserves 200 bytes.
>>>
>>> If we update kernel, get_inline_xattr_addrs will return 0 if 
>>> inline_xattr flag is not
>>> set, so MAX_INLINE_DATA will be 3688, then inline_dentry page layout 
>>> will change.
>>
>> Thanks. This makes much clear to me. It seems we need to handle 
>> directory only.
>> Could you please check the dev-test branches in f2fs and f2fs-tools?
>
> This is a little complicated, as for non-inline_{data,dentry} inode, we 
> will get
> addrs number in inode by:
>
> static inline unsigned int addrs_per_inode(struct inode *inode)
> {
>   if (f2fs_has_inline_xattr(inode))
>   return CUR_ADDRS_PER_INODE(inode) - F2FS_INLINE_XATTR_ADDRS;
>   return CUR_ADDRS_PER_INODE(inode);
> }
>
> It only cares about this inode is inline_xattr or not, so, if we reserve 
> 200
> bytes for dir inode, it will cause incompatibility.
>
> So we need add this logic into i_inline_xattr_size calculation? Such as:
>
> a. new_inode:
>
>   if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) {
>   f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode));
>   if (f2fs_has_inline_xattr)
>   xattr_size = sbi->inline_xattr_size;
>   } else if (f2fs_has_inline_xattr(inode) ||
>   (f2fs_has_inline_dentry(inode) && 
> S_ISDIR(inode->i_mode))) {

 We don't need to check both of them. It'd be enough to check
 f2fs_has_inline_dentry(inode).
>>>
>>> Agreed.
>>>

>   xattr = DEFAULT_INLINE_XATTR_ADDRS
>   }
>   F2FS_I(inode)->i_inline_xattr_size = xattr_size;
>
> b. do_read_inode
>
>   if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) {
>   f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode));
>   fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
>   } else if (f2fs_has_inline_xattr(inode) ||
>   (f2fs_has_inline_dentry(inode) && 
> S_ISDIR(inode->i_mode)) {

 Ditto.

 I merged the change in dev-test, so could you check that out again? ;)

 Thanks,

>   fi->i_inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
>   } else {
>   /*
>* Previous inline_data always reserved 200 bytes, even if
>* inline_xattr is disabled. But only inline_data is safe
>>>
>>> Minor thing, how about:
>>>
>>> /*
>>>  * Previous inline data or directory always reserved 200 bytes in
>>>  * inode layout, even if inline_xattr is disabled, in order to
>>>  * stablize inline dentry's structure for backward compatibility,
>>>  * we only get back reserved space for inline data.
>>>  */
>>
>> I slightly changed 
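
The compatibility rule discussed in this thread can be sketched in plain C. This is a simplified model under stated assumptions, not the f2fs code: the 3688 and 200 byte figures are the ones quoted above, while the struct, its field names and both helper functions are hypothetical names chosen for the illustration.

```c
#include <stdbool.h>

#define INLINE_AREA_BYTES	3688	/* inline area size quoted in the thread */
#define LEGACY_XATTR_BYTES	200	/* legacy reservation quoted in the thread */

/* Hypothetical summary of the inode state that matters for the layout. */
struct inode_layout {
	bool flexible_inline_xattr;	/* feature enabled in the superblock */
	bool inline_xattr;		/* inode has the inline_xattr flag */
	bool inline_dentry;		/* inode is an inline directory */
	unsigned int stored_xattr_bytes;	/* size recorded in the inode */
};

/* Bytes reserved for inline xattrs under the rule proposed above. */
unsigned int inline_xattr_bytes(const struct inode_layout *l)
{
	if (l->flexible_inline_xattr)
		return l->stored_xattr_bytes;
	/* Old images: inline directories keep the 200-byte hole. */
	if (l->inline_xattr || l->inline_dentry)
		return LEGACY_XATTR_BYTES;
	/* Plain inline data may reclaim the reserved space. */
	return 0;
}

unsigned int max_inline_bytes(const struct inode_layout *l)
{
	return INLINE_AREA_BYTES - inline_xattr_bytes(l);
}
```

In this model an old inline directory without the inline_xattr flag still gets 3688 - 200 = 3488 bytes, so its dentry layout does not move, while an inline-data file without xattrs gets the full 3688 bytes.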

[PATCH] ACPICA: Clean up whitespace of indentation

2017-10-27 Thread Baoquan He
Use tabs (not spaces) for indentation.

Signed-off-by: Baoquan He 
---
 include/acpi/actbl1.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 6b8714a428b6..d8a4fc066abe 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1438,9 +1438,9 @@ struct acpi_srat_mem_affinity {
u16 reserved;   /* Reserved, must be zero */
u64 base_address;
u64 length;
-   u32 reserved1;
+   u32 reserved1;
u32 flags;
-   u64 reserved2; /* Reserved, must be zero */
+   u64 reserved2; /* Reserved, must be zero */
 };
 
 /* Flags */
-- 
2.5.5



"Core temperature above threshold" on Fujitsu U757 with 2 core Kaby Lake (i7-7600U)

2017-10-27 Thread Christoph Anton Mitterer
Hey.

Perhaps someone can help me with this.


I got a brand new notebook from the university, a Fujitsu U757[0][1],
with a 2 core Kaby Lake (i7-7600U) and 32GB RAM.
It runs Debian unstable, that is as of now kernel 4.13.4.

Even with fairly light loads (some VM running and a bit more), the
CPUs seem to overheat (>100°C).
I brought the machine back to the university's vendor; they claimed
that they couldn't reproduce this with their (Windows-based) tests and
that it might be an OS issue (they did replace the heat paste at my request).


The kernel logs quite regularly give:
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature above threshold, cpu clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1207)
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1394)
Oct 28 03:15:19 heisenberg kernel: CPU0: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Core temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU3: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU1: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU2: Package temperature/speed normal
Oct 28 03:15:19 heisenberg kernel: CPU0: Package temperature/speed normal

I guess this happens every time it goes beyond 100°C.

Once so far I had a complete lockup of the machine (it still seemed to
write data to the HDD, but I could only hard power cycle it to get
it usable again).
Not sure if this is related to the temperature issue.
See the attached kernel log.

At around Oct 15 22:46:39 there seems to be first a crash of the Wifi
microcode; a bit later, beginning at about Oct 16 01:27:16, there are
numerous stack traces with "BUG: soft lockup - CPU".


Could this be some kernel issue? Especially the overheating... I mean
obviously not in the sense that it's the kernel's fault, but in the
sense that it should throttle the CPU earlier or so...?


Interestingly, when I run e.g. stress or stress-ng on all 4 logical
CPUs, sometimes I do get the overheating and sometimes not (in
which case the temperature stays above 90°C, but always below 100°C,
I assume).



Any help would be welcome, do not hesitate to ask if you need more data
(keep me CCed).

Thanks,
Chris.



[0] http://www.fujitsu.com/fts/products/computing/pc/notebooks/lifebook-u757/
[1] http://docs.ts.fujitsu.com/dl.aspx?id=addf5093-b73b-407b-ae78-90c5baf6456a

kern.log.xz
Description: application/xz


[PATCH v3] x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'

2017-10-27 Thread Baoquan He
register_page_bootmem_memmap()'s 3rd 'size' parameter is named
in a somewhat misleading fashion - rename it to 'nr_pages' which
makes the units of it much clearer.

Meanwhile rename the existing local variable 'nr_pages' to
'nr_pmd_pages', a more expressive name, to avoid conflict with
new function parameter 'nr_pages'.

And also clean up the extra parentheses in which get_order()
is called.

Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: a...@linux-foundation.org
Link: http://lkml.kernel.org/r/1508849249-18035-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
v1->v2:
  Code change in v1 is incomplete, caused build failure. Change
  it after Ingo pointed it out.

  And Ingo helped rewrite the change log of v1. I also add description
  about the local variable change.

v2->v3:
  Make changes according to Ingo's comment:
  - Change the old local variable 'nr_pages' to 'nr_pmd_pages'.
  - Remove the extra parentheses where get_order() is called.
  
 arch/x86/mm/init_64.c | 10 +-
 include/linux/mm.h|  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 048fbe8fc274..adcea90a2046 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
 void register_page_bootmem_memmap(unsigned long section_nr,
- struct page *start_page, unsigned long size)
+ struct page *start_page, unsigned long nr_pages)
 {
unsigned long addr = (unsigned long)start_page;
-   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long end = (unsigned long)(start_page + nr_pages);
unsigned long next;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
-   unsigned int nr_pages;
+   unsigned int nr_pmd_pages;
struct page *page;
 
for (; addr < end; addr = next) {
@@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsigned long section_nr,
if (pmd_none(*pmd))
continue;
 
-   nr_pages = 1 << (get_order(PMD_SIZE));
+   nr_pmd_pages = 1 << get_order(PMD_SIZE);
page = pmd_page(*pmd);
-   while (nr_pages--)
+   while (nr_pmd_pages--)
get_page_bootmem(section_nr, page++,
 SECTION_INFO);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065d99deb847..b2c7045e9604 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2495,7 +2495,7 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end);
 #endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
- unsigned long size);
+ unsigned long nr_pages);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
-- 
2.5.5
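
As a side note on the renamed local variable, the expression 1 << get_order(PMD_SIZE) can be worked through in userspace. The sketch below is an illustration under stated assumptions: it assumes 4 KiB pages and a 2 MiB PMD (common on x86_64), and get_order_sketch() is a hand-written re-implementation for this example, not the kernel's get_order().

```c
#define PAGE_SHIFT_SKETCH	12	/* assumed: 4 KiB pages */
#define PMD_SHIFT_SKETCH	21	/* assumed: 2 MiB per PMD entry */

/*
 * Smallest order such that (page size << order) covers 'size' bytes.
 * 'size' must be non-zero.
 */
int get_order_sketch(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT_SKETCH;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Number of base pages backing one PMD-sized mapping. */
unsigned long nr_pmd_pages_sketch(void)
{
	return 1UL << get_order_sketch(1UL << PMD_SHIFT_SKETCH);
}
```

Under these assumptions the order is 9, so the inner loop in register_page_bootmem_memmap() would run 512 times per populated PMD. That count is unrelated to the caller-supplied page count, which is why the two variables get distinct names.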



Re: [PATCH] kprobes, x86/alternatives: use text_mutex to protect smp_alt_modules

2017-10-27 Thread zhouchengming

On 2017/10/27 22:15, Peter Zijlstra wrote:
> On Fri, Oct 27, 2017 at 02:33:48PM +0200, Borislav Petkov wrote:
>> On Fri, Oct 27, 2017 at 07:42:45PM +0800, zhouchengming wrote:
>>> This is a real bug happened on one of our machines, below is the calltrace.
>>> We can see the trigger is at alternatives_text_reserved+0x20/0x80, and
>>> encounter a deleted (poisoned) list_head.
>>
>> Looks like some out-of-tree, old kernel thing. We don't have
>> mlx4_stats_sysfs_create() upstream and looking at the boot timestamps,
>> it could be that register_jprobe() is not ready yet.
>>
>> Looking at the Code, though:
>>
>>   20:   74 59                   je     0x7b
>>   22:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
>>   29:   00 00
>>   2b:*  48 3b 71 20             cmp    0x20(%rcx),%rsi    <-- trapping instruction
>>   2f:   72 3a                   jb     0x6b
>>   31:   48 3b 79 28             cmp    0x28(%rcx),%rdi
>>   35:   77 34                   ja     0x6b
>>
>> %rcx is 0xdead00d0 and that is POISON_POINTER_DELTA + 0xd0 so
>> that looks more like smp_alt_modules is not initialized yet but I
>> could very well be wrong because this is an old kernel. So trigger that
>> with the upstream kernel without out of tree modules.
>
> Not to mention that we're about (or just have) yanked jprobes out of the
> kernel entirely.


Well... but this is a bug in alternatives_text_reserved(): it traverses the
list without holding the smp_alt mutex. So all users of it, like kprobes,
will still have this problem. Maybe I could think of a way to get rid of
the mutex entirely.

Thanks.
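
To make the locking argument concrete, here is a small userspace sketch of the pattern being asked for: every walk over the list takes the same lock as the code that adds and removes entries, so a reader can never follow a pointer out of an entry that is being unlinked. It is only an analogy. A pthread mutex stands in for the kernel mutex, and the struct and function names are hypothetical rather than taken from alternative.c.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered module's text range. */
struct alt_entry {
	struct alt_entry *next;
	unsigned long start, end;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct alt_entry *entries;

void alt_entry_add(struct alt_entry *e)
{
	pthread_mutex_lock(&list_lock);
	e->next = entries;
	entries = e;
	pthread_mutex_unlock(&list_lock);
}

void alt_entry_del(struct alt_entry *e)
{
	struct alt_entry **pp;

	pthread_mutex_lock(&list_lock);
	for (pp = &entries; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
}

/* Returns 1 if [start, end] overlaps any registered range, else 0. */
int alt_text_reserved(unsigned long start, unsigned long end)
{
	struct alt_entry *e;
	int hit = 0;

	/* The reader takes the same lock as add/del. */
	pthread_mutex_lock(&list_lock);
	for (e = entries; e; e = e->next) {
		if (e->start > end || e->end < start)
			continue;
		hit = 1;
		break;
	}
	pthread_mutex_unlock(&list_lock);
	return hit;
}
```

Dropping the lock from alt_text_reserved() would reproduce the hazard described above: a concurrent alt_entry_del() could unlink an entry while the reader is still dereferencing it.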


Re: [GIT PULL] tracing/samples: Fix creation and deletion of simple_thread_fn creation

2017-10-27 Thread Steven Rostedt
On Fri, 27 Oct 2017 16:59:49 -0700
Linus Torvalds  wrote:

> I'm back home, which means I do full builds, and that in turn shows
> that this was garbage.

Unfortunately, I'm still in Prague, but I just finished my marathon of
presentations.

> 
> On Tue, Oct 17, 2017 at 2:21 PM, Steven Rostedt  wrote:
> >
> > Please pull the latest trace-v4.14-rc3 tree, which can be found at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> > trace-v4.14-rc3
> 
> This causes a compiler warning, for a very valid reason:
> 
> > +static bool simple_thread_cnt;
> >
> >  int foo_bar_reg(void)
> >  {
> > +   mutex_lock(_mutex);
> > +   if (simple_thread_cnt++)
> > +   goto out;
> 
> Yeah., take a closer look at that. It's complete and utter BS.
> 
> Please send a fix asap, or I'll just revert.

Yes, it is. Being sample code, it doesn't get tested by my test suite (I
should fix that), and this is the first pull request I sent you that
didn't go through the testing (because my test suite wouldn't even test
it).

I'll write up a fix on the plane home on Saturday. Expect something by
Sunday.

Sorry about that. :-/

-- Steve
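
For anyone wondering why a bool counter is a problem here, the following userspace sketch contrasts it with an int counter. It is an illustration, not the sample module's code and not necessarily the fix that was eventually sent: all names are hypothetical, and the bool variant is written with explicit "+ 1" and "- 1" arithmetic to mimic the quoted simple_thread_cnt++.

```c
#include <stdbool.h>

/* Buggy variant: a bool holds only 0 or 1, so it cannot count users. */
bool bool_cnt;

/* Returns true when the caller is the first user (should start the thread). */
bool bool_reg(void)
{
	bool first = !bool_cnt;

	bool_cnt = bool_cnt + 1;	/* converted back to bool: stays at 1 */
	return first;
}

/* Returns true when the caller believes it is the last user. */
bool bool_unreg(void)
{
	bool_cnt = bool_cnt - 1;	/* 1 - 1 == 0, whatever the real count */
	return !bool_cnt;
}

/* Working variant: an int keeps the real number of users. */
int int_cnt;

bool int_reg(void)
{
	return int_cnt++ == 0;
}

bool int_unreg(void)
{
	return --int_cnt == 0;
}
```

After two registrations and one unregistration, the bool version already reports "last user gone" and would stop the thread while one user remains. The int version only reports that after the second unregistration.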


Re: [PATCH 17/18] x86/asm/64: Remove thread_struct::sp0

2017-10-27 Thread Brian Gerst
On Thu, Oct 26, 2017 at 4:26 AM, Andy Lutomirski  wrote:
> On x86_64, we can easily calculate sp0 when needed instead of
> storing it in thread_struct.
>
> On x86_32, a similar cleanup would be possible, but it would require
> cleaning up the vm86 code first, and that can wait for a later
> cleanup series.
>
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/include/asm/compat.h|  1 +
>  arch/x86/include/asm/processor.h | 33 +++--
>  arch/x86/include/asm/switch_to.h |  6 ++
>  arch/x86/kernel/process_64.c |  1 -
>  4 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
> index 5343c19814b3..948b6d8ec46f 100644
> --- a/arch/x86/include/asm/compat.h
> +++ b/arch/x86/include/asm/compat.h
> @@ -6,6 +6,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/arch/x86/include/asm/processor.h 
> b/arch/x86/include/asm/processor.h
> index ad59cec14239..562c575d8bc3 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -430,7 +430,9 @@ typedef struct {
>  struct thread_struct {
> /* Cached TLS descriptors: */
> struct desc_struct  tls_array[GDT_ENTRY_TLS_ENTRIES];
> +#ifdef CONFIG_X86_32
> unsigned long   sp0;
> +#endif
> unsigned long   sp;
>  #ifdef CONFIG_X86_32
> unsigned long   sysenter_cs;
> @@ -797,6 +799,13 @@ static inline void spin_lock_prefetch(const void *x)
>
>  #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
>
> +#define task_pt_regs(task) \
> +({ \
> +   unsigned long __ptr = (unsigned long)task_stack_page(task); \
> +   __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
> +   ((struct pt_regs *)__ptr) - 1;  \
> +})
> +
>  #ifdef CONFIG_X86_32
>  /*
>   * User space process size: 3GB (default).
> @@ -816,23 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
> .addr_limit = KERNEL_DS,  \
>  }
>
> -/*
> - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
> - * This is necessary to guarantee that the entire "struct pt_regs"
> - * is accessible even if the CPU haven't stored the SS/ESP registers
> - * on the stack (interrupt gate does not save these registers
> - * when switching to the same priv ring).
> - * Therefore beware: accessing the ss/esp fields of the
> - * "struct pt_regs" is possible, but they may contain the
> - * completely wrong values.
> - */
> -#define task_pt_regs(task) \
> -({ \
> -   unsigned long __ptr = (unsigned long)task_stack_page(task); \
> -   __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
> -   ((struct pt_regs *)__ptr) - 1;  \
> -})
> -
>  #define KSTK_ESP(task) (task_pt_regs(task)->sp)
>
>  #else
> @@ -865,12 +857,17 @@ static inline void spin_lock_prefetch(const void *x)
>  #define STACK_TOP  TASK_SIZE_LOW
>  #define STACK_TOP_MAX  TASK_SIZE_MAX
>
> +#ifdef CONFIG_X86_32
>  #define INIT_THREAD  { \
> .sp0= TOP_OF_INIT_STACK,\
> .addr_limit = KERNEL_DS,\
>  }
> +#else
> +#define INIT_THREAD  { \
> +   .addr_limit = KERNEL_DS,\
> +}
> +#endif

There is already a separate INIT_THREAD for 32-bit.  Just delete the sp0 member.

--
Brian Gerst
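
The commit message's claim that sp0 can be calculated when needed follows from the task_pt_regs() arithmetic in the quoted hunk. Here is a userspace sketch of that arithmetic. The stack size, the zero padding and the 21-word register frame are assumptions made for this example, and the _sketch names are hypothetical stand-ins for the kernel macros.

```c
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE_SKETCH	(16 * 1024)	/* assumed kernel stack size */
#define STACK_PADDING_SKETCH	0		/* assumed top-of-stack padding */

/* Stand-in for struct pt_regs: only its size matters for the arithmetic. */
struct pt_regs_sketch {
	unsigned long regs[21];
};

/* Mirrors task_pt_regs(): the register frame sits just below the stack top. */
uintptr_t task_pt_regs_sketch(uintptr_t stack_page)
{
	uintptr_t ptr = stack_page + THREAD_SIZE_SKETCH - STACK_PADDING_SKETCH;

	return ptr - sizeof(struct pt_regs_sketch);
}

/* Mirrors task_top_of_stack(): one frame above the register frame. */
uintptr_t task_top_of_stack_sketch(uintptr_t stack_page)
{
	return task_pt_regs_sketch(stack_page) + sizeof(struct pt_regs_sketch);
}
```

Given only the base of the stack allocation, both the register frame and the top of the stack fall out of constant offsets, so nothing needs to be cached per thread for this in the 64-bit case.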


Re: [PATCH] rpmsg: glink: Initialize the "intent_req_comp" completion variable

2017-10-27 Thread Chris Lew


On 10/27/2017 3:45 AM, Arun Kumar Neelakantam wrote:
> The "intent_req_comp" variable is used without initialization which
> results in NULL pointer derefernce in qcom_glink_request_intent().


Typo on dereference.

Thanks,
Chris
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Ghazale Hosseinabadi



On 10/27/2017 04:19 PM, Hal Rosenstock wrote:
> On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
>>
>> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>>> When running ibstat (if transceiver is not connected in adapter):
>>>>
>>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>>> argument
>>> Any output before that ?
>> no, It only prints this line.
>
> and setting the width to 1x in the driver so the rate file is properly
> populated fixes this ?

Yes, because a value is written in
/sys/class/infiniband/mlx5_X/ports/1/rate

> I must be missing something as to what is going
> on in this scenario.

Without this bug fix, file /sys/class/infiniband/mlx5_X/ports/1/rate is
empty, which results in ibpanic.

-- Ghazale

> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
> setting that to SDR but does not pave over invalid width returning
> -EINVAL but this comment is in another "direction".
>
> -- Hal
>
>>
>> -- Ghazale
>>>   I'm trying to understand how far it gets. It
>>> looks to me that empty rate file would be parsed as 0 and ibstat would
>>> show that rate. ibpanic would occur if file was not found but I could be
>>> missing something.


Re: [PATCH 2/2] scsi: megaraid: Track the page allocations for struct fusion_context

2017-10-27 Thread Yisheng Xie
Hi Martin,

On 2017/10/25 20:36, Martin K. Petersen wrote:
> 
> Yisheng,
> 
>> I have gotten many kmemleak reports similar to commit 70c54e210ee9
>> (scsi: megaraid_sas: fix memleak in megasas_alloc_cmdlist_fusion)
>> on v4.14-rc6, however it seems to have a different story:
> 
> Do you still see leaks reported with the megaraid driver update recently
> merged into 4.15/scsi-queue?
> 
No, the related code has been reworked and __get_free_pages is no longer
used to allocate fusion_context. So please ignore this one, and sorry
for the noise.

BTW, what about patch 1/2, which is just a minor cleanup?

Thanks
Yisheng Xie



Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Ghazale Hosseinabadi



On 10/27/2017 05:17 PM, Hal Rosenstock wrote:
> On 10/27/2017 7:19 PM, Hal Rosenstock wrote:
>> On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
>>>
>>> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>>>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>>>> When running ibstat (if transceiver is not connected in adapter):
>>>>>
>>>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>>>> argument
>>>> Any output before that ?
>>> no, It only prints this line.
>>
>> and setting the width to 1x in the driver so the rate file is properly
>> populated fixes this ? I must be missing something as to what is going
>> on in this scenario.
>
> [off list...]
> Are you using libibumad or rdma-core package ?

rdma-core

> Which version ?

rdma-core-13-25

> What version of infiniband-diags are you using ?

infiniband-diags-1.6.7-1

> Can you build from sources ?
> I have patch to libibumad/rdma-core and another patch to ibstat
> (infiniband-diags) which I'd like you to try. Is that possible ?

I haven't built user-land packages myself, but I can definitely try it.
Please send me the patches and I will try to build.

Thanks,
Ghazale

> Thanks.
>
> -- Hal
>
>> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
>> setting that to SDR but does not pave over invalid width returning
>> -EINVAL but this comment is in another "direction".
>>
>> -- Hal
>>
>>>
>>> -- Ghazale
>>>>   I'm trying to understand how far it gets. It
>>>> looks to me that empty rate file would be parsed as 0 and ibstat would
>>>> show that rate. ibpanic would occur if file was not found but I could be
>>>> missing something.


Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:19 PM, Hal Rosenstock wrote:
> On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
>>
>>
>> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>>> When running ibstat (if transceiver is not connected in adapter):
>>>>
>>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>>> argument
>>> Any output before that ?
>> no, It only prints this line.
> 
> and setting the width to 1x in the driver so the rate file is properly
> populated fixes this ? I must be missing something as to what is going
> on in this scenario.

[off list...]
Are you using libibumad or rdma-core package ? Which version ?
What version of infiniband-diags are you using ?
Can you build from sources ?
I have patch to libibumad/rdma-core and another patch to ibstat
(infiniband-diags) which I'd like you to try. Is that possible ?

Thanks.

-- Hal

> sysfs.c:rate_show is inconsistent as it paves over an invalid speed
> setting that to SDR but does not pave over invalid width returning
> -EINVAL but this comment is in another "direction".
> 
> -- Hal
> 
>>
>> -- Ghazale
>>>   I'm trying to understand how far it gets. It
>>> looks to me that empty rate file would be parsed as 0 and ibstat would
>>> show that rate. ibpanic would occur if file was not found but I could be
>>> missing something.
>>>
>>
>>


[PATCH v1] x86/smpboot: broken calibration path during cpu bringup

2017-10-27 Thread Pavel Tatashin
While studying why it takes 0.06s to bring up every cpu, which amounts to
15.36s on a 256 cpu system, I determined that it is all because of the
calibrate_delay() call.

After studying the code further, I found two bugs in the current code.

If tsc is enabled, the cpu has the CONSTANT_TSC feature, and a cpu in the
same core has already been calibrated, we do not need to calibrate again.

This check is done here:

calibrate_delay()
calibrate_delay_is_known()

But calibrate_delay() is called before the topology for the new cpu is
updated, so we never actually take the optimized path.

The second bug is that inside calibrate_delay_is_known() there is a branch
like this:

if (!tsc_disabled && !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC))
	return 0;

But the logic is broken, it should be:

if (tsc_disabled || !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC))
	return 0;

Fixes: c25323c07345 ("x86/tsc: Use topology functions")

Signed-off-by: Pavel Tatashin 
---
 arch/x86/kernel/smpboot.c | 13 -
 arch/x86/kernel/tsc.c |  6 ++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ad59edd84de7..e7a3bab6818b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -193,6 +193,14 @@ static void smp_callin(void)
 */
smp_store_cpu_info(cpuid);
 
+   /*
+* This must be done before setting cpu_online_mask
+* or calling notify_cpu_starting.
+* And also before calibrate_delay(), as the information about topology
+* is used to determine if calibration is needed.
+*/
+   set_cpu_sibling_map(raw_smp_processor_id());
+
/*
 * Get our bogomips.
 * Update loops_per_jiffy in cpu_data. Previous call to
@@ -203,11 +211,6 @@ static void smp_callin(void)
cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
pr_debug("Stack at about %p\n", );
 
-   /*
-* This must be done before setting cpu_online_mask
-* or calling notify_cpu_starting.
-*/
-   set_cpu_sibling_map(raw_smp_processor_id());
wmb();
 
notify_cpu_starting(cpuid);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96bb0821..a99cde96201f 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1346,12 +1346,10 @@ void __init tsc_init(void)
 unsigned long calibrate_delay_is_known(void)
 {
int sibling, cpu = smp_processor_id();
+   int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
struct cpumask *mask = topology_core_cpumask(cpu);
 
-   if (!tsc_disabled && !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC))
-   return 0;
-
-   if (!mask)
+   if (tsc_disabled || !constant_tsc || !mask)
return 0;
 
sibling = cpumask_any_but(mask, cpu);
-- 
2.14.3



Re: [RFC PATCH v8 7/7] PCI / PM: Add support for the PCIe WAKE# signal for OF

2017-10-27 Thread Brian Norris
Hi Sinan,

I'm not sure I understand all your suggestions below.

On Thu, Oct 26, 2017 at 11:16:55AM -0400, Sinan Kaya wrote:
> On 10/26/2017 9:28 AM, Jeffy Chen wrote:
> >  drivers/pci/Makefile |   2 +-
> >  drivers/pci/pci-of.c | 136 
> > +++
> >  2 files changed, 137 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/pci/pci-of.c
> > 
> > diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> > index 66a21acad952..4f76dbdb024c 100644
> > --- a/drivers/pci/Makefile
> > +++ b/drivers/pci/Makefile
> > @@ -49,7 +49,7 @@ obj-$(CONFIG_PCI_ECAM) += ecam.o
> >  
> >  obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
> >  
> > -obj-$(CONFIG_OF) += of.o
> > +obj-$(CONFIG_OF) += of.o pci-of.o
> 
> If the intention is to push this to pci directory, this code needs to be made
> platform agnostic by splitting into two pieces.
> 
> I think you can make this code common by abstracting the IRQ number and
> have some generic code like pci-wake.c in pci directory without the of prefix
> in this file.

Why would 'pci-wake' be a good name for this? We're doing basically the
same things that pci-acpi.c is doing (it also can configure the WAKE#
signal, via ACPI firmware calls).

> Then, you can have some other OF specific code in the drivers/of directory
> that reads the IRQ from OF and calls the common code in PCI directory.

Why is drivers/of/ special? I thought in general, the DT maintainers
preferred to move domain-specific stuff into the respective subsystems.

Also, the extraction is a very tiny piece of code, and the logic around
walking a PCI tree is the more important part. And in the
meantime...Jeffy has sent two more revisions of this patch set already,
and he did the latter (I like his abstraction of PCI device trees,
shared between ACPI and OF code) but not the former (it's still all
'pci-of.c').

Feel free to comment further on here or on v10, but at the moment I'm
not sure I understand yet how your suggestions would improve things.

Brian


Re: [Part2 PATCH v6 13/38] crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support

2017-10-27 Thread Borislav Petkov
On Fri, Oct 27, 2017 at 05:59:23PM -0500, Brijesh Singh wrote:
> Yes it is typo. PEK_GEN wants FW to be in INIT state hence someone need
> to transition from UNINIT -> INIT.

Which, once you've done it once on driver init, is there.

> That's what I am doing except FACTORY_RESET.

Well, not really. Lemme pick a command at random...

PEK_CSR. For that, you do INIT -> PEK_CSR -> SHUTDOWN.

Doc says, platform needs to be in INIT or WORKING state. But nothing
says you should shut it down. Spec says, SHUTDOWN transitions platform
to UNINIT state. So when the next command comes in which needs the
platform to be in INIT state, you go and INIT it again. For no reason
*WHATSOEVER*!

I know, you're gonna say, but what if the next command needs a different
state than INIT. Well, *then* you transition it, in the command
function. When that function executes. But not before that and not in
preparation that *maybe* the next command will be it.

Now, if you did:

INIT once during driver init

PEK_CSR

(platform remains in INIT state)

<--- the next command here can execute directly if it is allowed in INIT
state.

Instead, the platform has been shutdown and you init it again. Do you
see now what I mean?

IOW, once you init the PSP master, you should keep it in the INIT state
- or the state in which most commands expect it to be and thus save
yourself all that unnecessary toggling. If a command needs it to be in a
different state, only *then* you transition it.

Instead, what you have now is that you call INIT and SHUTDOWN
around SEV_PEK_GEN, SEV_PDH_GEN, SEV_PEK_CSR, SEV_PEK_CERT_IMPORT,
SEV_PDH_CERT_EXPORT and for all those, the platform must be in INIT
(for some in WORKING state) but for all in INIT state and "The platform
remains be in the same state after completion." So the whole SHUTDOWN ->
INIT wankery in-between is a pure waste of electrons.

>  I see that we can do a small optimization -- since we already know
> the FW state hence we can avoid issuing PSP command when we know for
> sure that command will fail because we are not in correct state.

As I said before, you should do that regardless by recording the current
state of the PSP in a variable so that you can save yourself the status
querying.

> If command needs INIT state and FW is not in INIT state then its safe to
> transition from UNINIT -> INIT. But if command needs UNINIT state and FW
> is in INIT state then its not safe to transition -- in those case we
> simply return EBUSY and let the user retry the command.

Whatever - that doesn't contradict what I'm proposing.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [GIT PULL] tracing/samples: Fix creation and deletion of simple_thread_fn creation

2017-10-27 Thread Linus Torvalds
I'm back home, which means I do full builds, and that in turn shows
that this was garbage.

On Tue, Oct 17, 2017 at 2:21 PM, Steven Rostedt  wrote:
>
> Please pull the latest trace-v4.14-rc3 tree, which can be found at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> trace-v4.14-rc3

This causes a compiler warning, for a very valid reason:

> +static bool simple_thread_cnt;
>
>  int foo_bar_reg(void)
>  {
> +   mutex_lock(_mutex);
> +   if (simple_thread_cnt++)
> +   goto out;

Yeah, take a closer look at that. It's complete and utter BS.

Please send a fix asap, or I'll just revert.

 Linus


Re: [PATCH] coccinelle: fix verbose message about .cocci file being run

2017-10-27 Thread Jim Davis
On Wed, Oct 25, 2017 at 9:55 PM, Masahiro Yamada
 wrote:
> If you run coccicheck with V=1 and COCCI=, you will see a strange
> path to the semantic patch file.  For example, run the following:
>
> $ make V=1 COCCI=scripts/coccinelle/free/kfree.cocci coccicheck
>   [ snip ]
>  The semantic patch that makes this report is available
>  in scriptcoccinelle/free/kfree.cocci.
>
> Notice "s/" was dropped from "scripts/coccinelle/free/kfree.cocci".
>
> When running coccicheck without O=, $srctree is expanded to ".", which
> represents one arbitrary character in the regular expression.  Using
> sed is not a good choice here.  Strip $srctree/ simply without sed.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  scripts/coccicheck | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/coccicheck b/scripts/coccicheck
> index 1bfa2d2..9d18662 100755
> --- a/scripts/coccicheck
> +++ b/scripts/coccicheck
> @@ -186,7 +186,7 @@ coccinelle () {
>
>  if [ $VERBOSE -ne 0 -a $ONLINE -eq 0 ] ; then
>
> -   FILE=`echo $COCCI | sed "s|$srctree/||"`
> +   FILE=${COCCI#$srctree/}

[jim@krebstar linux-rc]$ make CONFIG_SHELL=dash V=1
COCCI=scripts/coccinelle/free/kfree.cocci coccicheck
dash ./scripts/coccicheck
./scripts/coccicheck: 63: ./scripts/coccicheck: Bad substitution
make: *** [Makefile:1585: coccicheck] Error 2

-- 
Jim


[PATCH v10 02/13] x86/insn-eval: Compute linear address in several utility functions

2017-10-27 Thread Ricardo Neri
Computing a linear address involves several steps. The first step is to
compute the effective address. This involves determining the addressing
mode in use and performing arithmetic operations on the operands. Plus, each
addressing mode has special cases that must be handled.

Once the effective address is known, the base address of the applicable
segment is added to obtain the linear address.

Clearly, this is too much work for a single function. Instead, handle each
addressing mode in a separate utility function. This improves readability
and gives us the opportunity to handle errors better.

At the moment, arithmetic to compute the effective address uses 8-byte
variables. Thus, limit support to 64-bit addresses.
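
As a rough illustration of the two steps described above (not the kernel
code in this patch), the arithmetic can be sketched in plain C. The struct
and function names below are hypothetical, and the sketch assumes the
general base + index * scale + displacement form without any of the special
cases the patch handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified operands of a ModRM/SIB memory reference. */
struct mem_operand {
	long base;			/* value of the base register, 0 if none */
	long index;			/* value of the index register, 0 if none */
	unsigned int scale_log2;	/* SIB.scale: 0..3 for *1, *2, *4, *8 */
	long disp;			/* sign-extended displacement */
};

/* Step 1: effective address = base + index * scale + displacement. */
long effective_addr(const struct mem_operand *op)
{
	assert(op != NULL && op->scale_log2 <= 3);
	return op->base + op->index * (1L << op->scale_log2) + op->disp;
}

/* Step 2: linear address = segment base + effective address. */
unsigned long linear_addr(unsigned long seg_base, long eff_addr)
{
	return seg_base + (unsigned long)eff_addr;
}
```

For example, base 0x1000, index 0x10 scaled by 4 and displacement -8 give an
effective address of 0x1038; with a segment base of 0x7000 the linear
address is 0x8038.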

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 243 ---
 1 file changed, 186 insertions(+), 57 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 91f08aa..4aa3c48 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -776,6 +776,182 @@ static int get_seg_base_limit(struct insn *insn, struct pt_regs *regs,
return 0;
 }
 
+/**
+ * get_eff_addr_reg() - Obtain effective address from register operand
+ * @insn:  Instruction. Must be valid.
+ * @regs:  Register values as seen when entering kernel mode
+ * @regoff:Obtained operand offset, in pt_regs, with the effective address
+ * @eff_addr:  Obtained effective address
+ *
+ * Obtain the effective address stored in the register operand as indicated by
+ * the ModRM byte. This function is to be used only with register addressing
+ * (i.e.,  ModRM.mod is 3). The effective address is saved in @eff_addr. The
+ * register operand, as an offset from the base of pt_regs, is saved in @regoff;
+ * such offset can then be used to resolve the segment associated with the
+ * operand. This function can be used with any of the supported address sizes
+ * in x86.
+ *
+ * Returns:
+ *
+ * 0 on success. @eff_addr will have the effective address stored in the
+ * operand indicated by ModRM. @regoff will have such operand as an offset from
+ * the base of pt_regs.
+ *
+ * -EINVAL on error.
+ */
+static int get_eff_addr_reg(struct insn *insn, struct pt_regs *regs,
+   int *regoff, long *eff_addr)
+{
+   insn_get_modrm(insn);
+
+   if (!insn->modrm.nbytes)
+   return -EINVAL;
+
+   if (X86_MODRM_MOD(insn->modrm.value) != 3)
+   return -EINVAL;
+
+   *regoff = get_reg_offset(insn, regs, REG_TYPE_RM);
+   if (*regoff < 0)
+   return -EINVAL;
+
+   *eff_addr = (long)regs_get_register(regs, *regoff);
+
+   return 0;
+}
+
+/**
+ * get_eff_addr_modrm() - Obtain referenced effective address via ModRM
+ * @insn:  Instruction. Must be valid.
+ * @regs:  Register values as seen when entering kernel mode
+ * @regoff:Obtained operand offset, in pt_regs, associated with segment
+ * @eff_addr:  Obtained effective address
+ *
+ * Obtain the effective address referenced by the ModRM byte of @insn. After
+ * identifying the registers involved in the register-indirect memory reference,
+ * its value is obtained from the operands in @regs. The computed address is
+ * stored in @eff_addr. Also, the register operand that indicates the associated
+ * segment is stored in @regoff; this parameter can later be used to determine
+ * such segment. This function can be used for both 32-bit and 64-bit effective
+ * addresses.
+ *
+ * Returns:
+ *
+ * 0 on success. @eff_addr will have the referenced effective address. @regoff
+ * will have a register, as an offset from the base of pt_regs, that can be used
+ * to resolve the associated segment.
+ *
+ * -EINVAL on error.
+ */
+static int get_eff_addr_modrm(struct insn *insn, struct pt_regs *regs,
+ int *regoff, long *eff_addr)
+{
+   long tmp;
+
+   if (insn->addr_bytes != 8)
+   return -EINVAL;
+
+   insn_get_modrm(insn);
+
+   if (!insn->modrm.nbytes)
+   return -EINVAL;
+
+   if (X86_MODRM_MOD(insn->modrm.value) > 2)
+   return -EINVAL;
+
+   *regoff = get_reg_offset(insn, regs, REG_TYPE_RM);
+   /*
+* -EDOM means that we must ignore the address_offset. In such a case,
+* in 64-bit mode the effective address is relative to the RIP of the
+* following instruction.
+*/
+   if (*regoff == -EDOM) {
+   if (user_64bit_mode(regs))
+   tmp = (long)regs->ip + insn->length;
+   else
+   tmp = 0;
+   } else if 

[PATCH v10 05/13] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode

2017-10-27 Thread Ricardo Neri
It is possible to utilize 32-bit address encodings in virtual-8086 mode via
an address override instruction prefix. However, the range of the
effective address is still limited to [0x0000-0xffff]. If the effective
address falls outside this range, return an error.

Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
such limit by truncating the most significant bits of the computed linear
address.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index d5618ee..66d597d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1043,12 +1043,23 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
goto out;
 
/*
+* Even though 32-bit address encodings are allowed in virtual-8086
+* mode, the address range is still limited to [0x0000-0xffff].
+*/
+   if (v8086_mode(regs) && (eff_addr & ~0xffff))
+   goto out;
+
+   /*
 * Data type long could be 64 bits in size. Ensure that our 32-bit
 * effective address is not sign-extended when computing the linear
 * address.
 */
linear_addr = (unsigned long)(eff_addr & 0xffffffff) + seg_base;
 
+   /* Limit linear address to 20 bits */
+   if (v8086_mode(regs))
+   linear_addr &= 0xfffff;
+
 out:
return (void __user *)linear_addr;
 }
-- 
2.7.4



[PATCH v10 07/13] x86/cpufeature: Add User-Mode Instruction Prevention definitions

2017-10-27 Thread Ricardo Neri
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
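
The feature numbering used in the diff below packs a 32-bit word index and a
bit position into one integer. A small userspace sketch of that arithmetic
follows; the SKETCH_* macros and function names are hypothetical and only
mirror the values that appear in the patch (word 16, bit 2, CR4 bit 11).

```c
#include <assert.h>

/* Same numbering scheme as the patch: feature = word * 32 + bit. */
#define SKETCH_FEATURE(word, bit)	((word) * 32 + (bit))
#define SKETCH_FEATURE_UMIP		SKETCH_FEATURE(16, 2)
#define SKETCH_CR4_UMIP_BIT		11

/* Index of the 32-bit capability word that holds @feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Mask of @feature within its word, as in (1 << (feature & 31)). */
unsigned long feature_mask(unsigned int feature)
{
	return 1UL << (feature & 31);
}

/* Mask of the CR4 bit that enables UMIP. */
unsigned long cr4_umip_mask(void)
{
	return 1UL << SKETCH_CR4_UMIP_BIT;
}

/* Nonzero if @feature is set in @disabled, an array of @nwords masks. */
int feature_disabled(unsigned int feature, const unsigned long *disabled,
		     unsigned int nwords)
{
	assert(feature_word(feature) < nwords);
	return (disabled[feature_word(feature)] & feature_mask(feature)) != 0;
}
```

Under these assumptions UMIP lives in word 16 with mask 0x4, and the CR4
enable bit has mask 0x800.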

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/include/asm/disabled-features.h| 8 +++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 401a709..ca0cc2d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -297,6 +297,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU    (16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c10c912..14d6d50 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX   (1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP  0
+#else
+# define DISABLE_UMIP  (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME   (1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -63,7 +69,7 @@
 #define DISABLED_MASK13 0
 #define DISABLED_MASK14 0
 #define DISABLED_MASK15 0
-#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
+#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 39946d0..cf4c876 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT   11 /* enable UMIP support */
+#define X86_CR4_UMIP   _BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_LA57_BIT   12 /* enable 5-level page tables */
 #define X86_CR4_LA57   _BITUL(X86_CR4_LA57_BIT)
 #define X86_CR4_VMXE_BIT   13 /* enable VMX virtualization */
-- 
2.7.4



[PATCH v10 04/13] x86/insn-eval: Add wrapper function for 32 and 64-bit addresses

2017-10-27 Thread Ricardo Neri
The function insn_get_addr_ref() is capable of handling only 64-bit
addresses. A previous commit introduced a function to handle 32-bit
addresses. Invoke these two functions from a third wrapper function that
calls the appropriate routine based on the address size specified in the
instruction structure (obtained by looking at the code segment default
address size and the address override prefix, if present).

While doing this, rename the original function insn_get_addr_ref() to
the more appropriate name get_addr_ref_64() and ensure it is only used
for 64-bit addresses.

Also, since 64-bit addresses are not possible in 32-bit builds, provide
a dummy function for such a case.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 60 
 1 file changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 7133a47..d5618ee 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1053,17 +1053,36 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
return (void __user *)linear_addr;
 }
 
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * get_addr_ref_64() - Obtain a 64-bit linear address
+ * @insn:  Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 64-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable.
+ *
+ * Returns:
+ *
+ * Linear address referenced by instruction and registers on success.
+ *
+ * -1L on error.
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+#ifndef CONFIG_X86_64
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
+{
+   return (void __user *)-1L;
+}
+#else
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
 {
unsigned long linear_addr = -1L, seg_base;
int addr_offset, ret;
long eff_addr;
 
+   if (insn->addr_bytes != 8)
+   goto out;
+
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
ret = get_eff_addr_reg(insn, regs, &addr_offset, &eff_addr);
if (ret)
@@ -1093,3 +1112,34 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 out:
return (void __user *)linear_addr;
 }
+#endif /* CONFIG_X86_64 */
+
+/**
+ * insn_get_addr_ref() - Obtain the linear address referred by instruction
+ * @insn:  Instruction structure containing ModRM byte and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the linear address referred by the instruction's ModRM, SIB and
+ * displacement bytes, and segment base, as applicable. In protected mode,
+ * segment limits are enforced.
+ *
+ * Returns:
+ *
+ * Linear address referenced by instruction and registers on success.
+ *
+ * -1L on error.
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+   if (!insn || !regs)
+   return (void __user *)-1L;
+
+   switch (insn->addr_bytes) {
+   case 4:
+   return get_addr_ref_32(insn, regs);
+   case 8:
+   return get_addr_ref_64(insn, regs);
+   default:
+   return (void __user *)-1L;
+   }
+}
-- 
2.7.4



[PATCH v10 06/13] x86/insn-eval: Add support to resolve 16-bit address encodings

2017-10-27 Thread Ricardo Neri
Tasks running in virtual-8086 mode, in protected mode with code segment
descriptors that specify 16-bit default address sizes via the
D bit, or via an address override prefix will use 16-bit addressing form
encodings as described in the Intel 64 and IA-32 Architecture Software
Developer's Manual Volume 2A Section 2.1.5, Table 2-1.

16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.

Three functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_eff_addr_modrm_16() computes the
effective address from the value of the register operands.
get_addr_ref_16() computes the linear address using the obtained effective
address and the base address of the segment.

Segment limits are enforced when running in protected mode.
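
For reference, the 16-bit addressing forms from Table 2-1 can be sketched in
userspace C as below. This is a simplified illustration, not the functions
added by this patch: struct regs16 and eff_addr_16() are hypothetical names,
and the caller is assumed to pass the already sign-extended displacement (0
when the encoding carries none).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers usable in 16-bit addressing. */
struct regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Effective address for a 16-bit ModRM memory operand (mod 0..2).
 * The sum wraps around at 16 bits.
 */
uint16_t eff_addr_16(unsigned int mod, unsigned int rm,
		     const struct regs16 *r, int16_t disp)
{
	unsigned int sum;

	assert(r != NULL && mod <= 2 && rm <= 7);

	switch (rm) {
	case 0: sum = r->bx + r->si; break;
	case 1: sum = r->bx + r->di; break;
	case 2: sum = r->bp + r->si; break;
	case 3: sum = r->bp + r->di; break;
	case 4: sum = r->si; break;
	case 5: sum = r->di; break;
	case 6: sum = (mod == 0) ? 0 : r->bp; break;	/* mod 0: disp16 only */
	default: sum = r->bx; break;
	}

	return (uint16_t)(sum + (uint16_t)disp);
}
```

For example, with BX = 0x1000 and SI = 0x0010, mod 0 / rm 0 yields 0x1010,
while mod 0 / rm 6 ignores the registers and yields the displacement alone.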

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 213 ++-
 1 file changed, 212 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 66d597d..cee168b 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -481,6 +481,80 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16() - Obtain offset of register indicated by instruction
+ * @insn:  Instruction containing ModRM byte
+ * @regs:  Register values as seen when entering kernel mode
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * in @insn. This function is to be used with 16-bit address encodings. The
+ * @offs1 and @offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Returns:
+ *
+ * 0 on success, -EINVAL on error.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+int *offs1, int *offs2)
+{
+   /*
+* 16-bit addressing can use one or two registers. Specifics of
+* encodings are given in Table 2-1. "16-Bit Addressing Forms with the
+* ModR/M Byte" of the Intel Software Development Manual.
+*/
+   static const int regoff1[] = {
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bx),
+   };
+
+   static const int regoff2[] = {
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   };
+
+   if (!offs1 || !offs2)
+   return -EINVAL;
+
+   /* Operand is a register, use the generic function. */
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   *offs1 = insn_get_modrm_rm_off(insn, regs);
+   *offs2 = -EDOM;
+   return 0;
+   }
+
+   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+   /*
+* If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement-
+* only addressing. This means that no registers are involved in
+* computing the effective address. Thus, ensure that the first
+* register offset is invalid. The second register offset is already
+* invalid under the aforementioned conditions.
+*/
+   if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+   (X86_MODRM_RM(insn->modrm.value) == 6))
+

[PATCH v10 07/13] x86/cpufeature: Add User-Mode Instruction Prevention definitions

2017-10-27 Thread Ricardo Neri
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/include/asm/disabled-features.h| 8 +++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 401a709..ca0cc2d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -297,6 +297,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation 
instructions*/
+#define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction Protection 
*/
 #define X86_FEATURE_PKU(16*32+ 3) /* Protection Keys for 
Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW 
*/
diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index c10c912..14d6d50 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX   (1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP  0
+#else
+# define DISABLE_UMIP  (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME   (1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -63,7 +69,7 @@
 #define DISABLED_MASK130
 #define DISABLED_MASK140
 #define DISABLED_MASK150
-#define DISABLED_MASK16(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
+#define DISABLED_MASK16
(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
 #define DISABLED_MASK170
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h 
b/arch/x86/include/uapi/asm/processor-flags.h
index 39946d0..cf4c876 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT   11 /* enable UMIP support */
+#define X86_CR4_UMIP   _BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_LA57_BIT   12 /* enable 5-level page tables */
 #define X86_CR4_LA57   _BITUL(X86_CR4_LA57_BIT)
 #define X86_CR4_VMXE_BIT   13 /* enable VMX virtualization */
-- 
2.7.4



[PATCH v10 04/13] x86/insn-eval: Add wrapper function for 32 and 64-bit addresses

2017-10-27 Thread Ricardo Neri
The function insn_get_addr_ref() is capable of handling only 64-bit
addresses. A previous commit introduced a function to handle 32-bit
addresses. Invoke these two functions from a third wrapper function that
calls the appropriate routine based on the address size specified in the
instruction structure (obtained by looking at the code segment default
address size and the address override prefix, if present).

While doing this, rename the original function insn_get_addr_ref() with
the more appropriate name get_addr_ref_64(), ensure it is only used
for 64-bit addresses.

Also, since 64-bit addresses are not possible in 32-bit builds, provide
a dummy function such case.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 60 
 1 file changed, 55 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 7133a47..d5618ee 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1053,17 +1053,36 @@ static void __user *get_addr_ref_32(struct insn *insn, 
struct pt_regs *regs)
return (void __user *)linear_addr;
 }
 
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * get_addr_ref_64() - Obtain a 64-bit linear address
+ * @insn:  Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 64-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable.
+ *
+ * Returns:
+ *
+ * Linear address referenced by instruction and registers on success.
+ *
+ * -1L on error.
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+#ifndef CONFIG_X86_64
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
+{
+   return (void __user *)-1L;
+}
+#else
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
 {
unsigned long linear_addr = -1L, seg_base;
int addr_offset, ret;
long eff_addr;
 
+   if (insn->addr_bytes != 8)
+   goto out;
+
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
ret = get_eff_addr_reg(insn, regs, &addr_offset, &eff_addr);
if (ret)
@@ -1093,3 +1112,34 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 out:
return (void __user *)linear_addr;
 }
+#endif /* CONFIG_X86_64 */
+
+/**
+ * insn_get_addr_ref() - Obtain the linear address referred by instruction
+ * @insn:  Instruction structure containing ModRM byte and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the linear address referred by the instruction's ModRM, SIB and
+ * displacement bytes, and segment base, as applicable. In protected mode,
+ * segment limits are enforced.
+ *
+ * Returns:
+ *
+ * Linear address referenced by instruction and registers on success.
+ *
+ * -1L on error.
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+   if (!insn || !regs)
+   return (void __user *)-1L;
+
+   switch (insn->addr_bytes) {
+   case 4:
+   return get_addr_ref_32(insn, regs);
+   case 8:
+   return get_addr_ref_64(insn, regs);
+   default:
+   return (void __user *)-1L;
+   }
+}
-- 
2.7.4



[PATCH v10 06/13] x86/insn-eval: Add support to resolve 16-bit address encodings

2017-10-27 Thread Ricardo Neri
Tasks use 16-bit addressing form encodings when running in virtual-8086
mode, when running in protected mode with a code segment descriptor that
specifies a 16-bit default address size via the D bit, or when an address
override prefix selects them. These encodings are described in the Intel 64
and IA-32 Architectures Software Developer's Manual Volume 2A Section 2.1.5,
Table 2-1.

16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.

Three functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_eff_addr_modrm_16() computes the
effective address from the value of the register operands.
get_addr_ref_16() computes the linear address using the obtained effective
address and the base address of the segment.
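
To make the register pairing concrete, here is a minimal user-space sketch of
the effective-address step. It is not the code from this patch: struct regs16
and eff_addr_16() are hypothetical names, and the caller is assumed to pass
the displacement already sign-extended (or zero when the encoding has none).

```c
#include <stdint.h>

/* Hypothetical register file for this sketch; only 16-bit values matter. */
struct regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Compute a 16-bit effective address from ModRM.mod, ModRM.rm and the
 * displacement. ModRM.mod == 3 (register operand) is not a memory
 * reference and is not handled here.
 */
static uint16_t eff_addr_16(const struct regs16 *r, unsigned int mod,
			    unsigned int rm, int16_t disp)
{
	/* First and second register selected by ModRM.rm (0 = unused). */
	const uint16_t first[8] = { r->bx, r->bx, r->bp, r->bp,
				    r->si, r->di, r->bp, r->bx };
	const uint16_t second[8] = { r->si, r->di, r->si, r->di,
				     0, 0, 0, 0 };

	/* mod == 0 and rm == 110b: displacement-only addressing. */
	if (mod == 0 && rm == 6)
		return (uint16_t)disp;

	/* The sum wraps around at 64 KiB. */
	return (uint16_t)(first[rm & 7] + second[rm & 7] + disp);
}
```

With bx = 0x1000 and si = 0x0010, an encoding with mod = 1, rm = 0 and a
displacement of 4 resolves to 0x1014. The linear address would then be
obtained by adding the segment base, which this sketch leaves out.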

Segment limits are enforced when running in protected mode.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 213 ++-
 1 file changed, 212 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 66d597d..cee168b 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -481,6 +481,80 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16() - Obtain offset of register indicated by instruction
+ * @insn:  Instruction containing ModRM byte
+ * @regs:  Register values as seen when entering kernel mode
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * in @insn. This function is to be used with 16-bit address encodings. The
+ * @offs1 and @offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Returns:
+ *
+ * 0 on success, -EINVAL on error.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+int *offs1, int *offs2)
+{
+   /*
+* 16-bit addressing can use one or two registers. Specifics of
+* encodings are given in Table 2-1. "16-Bit Addressing Forms with the
+* ModR/M Byte" of the Intel Software Development Manual.
+*/
+   static const int regoff1[] = {
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bx),
+   };
+
+   static const int regoff2[] = {
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   };
+
+   if (!offs1 || !offs2)
+   return -EINVAL;
+
+   /* Operand is a register, use the generic function. */
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   *offs1 = insn_get_modrm_rm_off(insn, regs);
+   *offs2 = -EDOM;
+   return 0;
+   }
+
+   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+   /*
+* If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement-
+* only addressing. This means that no registers are involved in
+* computing the effective address. Thus, ensure that the first
+* register offset is invalid. The second register offset is already
+* invalid under the aforementioned conditions.
+*/
+   if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+   (X86_MODRM_RM(insn->modrm.value) == 6))
+   *offs1 = -EDOM;
+
+   return 0;
+}
+
+/**
  * get_desc() - Obtain pointer to a segment descriptor
  * @sel:   Segment selector
  *
@@ -815,7 +889,9 @@ static int get_eff_addr_reg(struct insn *insn, struct pt_regs *regs,
return -EINVAL;
 
/* Ignore bytes that are outside the address size */
-   if 

[PATCH v10 03/13] x86/insn-eval: Add support to resolve 32-bit address encodings

2017-10-27 Thread Ricardo Neri
32-bit and 64-bit address encodings are identical. Thus, the same logic
could be used to resolve the effective address. However, there are two key
differences: address size and enforcement of segment limits.

If running a 32-bit process on a 64-bit kernel, it is best to perform
the address calculation using 32-bit data types. In this manner hardware
is used for the arithmetic, including handling of signs and overflows.

32-bit addresses are generally used in protected mode; segment limits are
enforced in this mode. This implementation obtains the limit of the
segment associated with the instruction operands and prefixes. If the
computed address is outside the segment limits, an error is returned. It
is also possible to use 32-bit addresses in long mode and virtual-8086 mode
by using an address override prefix. In such cases, segment limits are not
enforced.

Add support for 32-bit arithmetic to the utility functions that compute
effective addresses. Once the effective address is computed, ignore
the bytes that are outside the address size as given by the instruction
structure.
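
The effect of doing the arithmetic in 32 bits can be sketched in user space
as follows. This is only an illustration under stated assumptions: the
function name eff_addr_sib_32() is hypothetical, and unsigned types are used
instead of the int variables in the patch so that wraparound is well defined
in standard C.

```c
#include <stdint.h>

/*
 * Sketch: effective address for a 32-bit SIB encoding,
 * base + index * scale + displacement, computed modulo 2^32.
 * scale_bits is the SIB scale field (0..3).
 */
static uint64_t eff_addr_sib_32(uint64_t base, uint64_t index,
				unsigned int scale_bits, int32_t disp)
{
	uint32_t base32 = (uint32_t)base;	/* drop bits 63:32 */
	uint32_t index32 = (uint32_t)index;
	uint32_t addr32;

	addr32 = base32 + (index32 << scale_bits) + (uint32_t)disp;

	/* Zero-extend: bytes outside the address size are ignored. */
	return addr32;
}
```

A base register holding 0xffffffff00001000 with index 4, scale field 2 and
displacement -8 resolves to 0x1008, because the upper half of the register
never takes part in the sum.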

The new function get_addr_ref_32() is almost identical to the existing
function insn_get_addr_ref() (used for 64-bit addresses). The only
difference is that it verifies that the effective address is within the
limits of the segment.
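
A minimal sketch of that extra check, assuming a plain expand-up segment and
a linear address that wraps at 32 bits (both are simplifications of mine, and
linear_addr_32() is a hypothetical name rather than a function from this
series):

```c
#include <stdint.h>

/*
 * Sketch: turn an effective address into a linear address.
 * seg_limit is the highest valid offset in the segment; pass
 * check_limit = 0 for modes in which limits are not enforced.
 * Returns (uint64_t)-1 on error, mirroring the -1L convention.
 */
static uint64_t linear_addr_32(uint32_t eff_addr, uint32_t seg_base,
			       uint32_t seg_limit, int check_limit)
{
	if (check_limit && eff_addr > seg_limit)
		return (uint64_t)-1;

	/* Assumption: the sum is truncated to 32 bits. */
	return (uint32_t)(seg_base + eff_addr);
}
```

With a base of 0x1000 and a limit of 0xfff, offset 0x100 resolves to 0x1100,
while offset 0x1000 is rejected when limits are enforced.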

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 113 ---
 1 file changed, 107 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4aa3c48..7133a47 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -814,7 +814,12 @@ static int get_eff_addr_reg(struct insn *insn, struct pt_regs *regs,
if (*regoff < 0)
return -EINVAL;
 
-   *eff_addr = (long)regs_get_register(regs, *regoff);
+   /* Ignore bytes that are outside the address size */
+   if (insn->addr_bytes == 4)
+   *eff_addr = (long)(regs_get_register(regs, *regoff) &
+  0xffffffff);
+   else /* 64-bit address */
+   *eff_addr = (long)regs_get_register(regs, *regoff);
 
return 0;
 }
@@ -847,7 +852,7 @@ static int get_eff_addr_modrm(struct insn *insn, struct pt_regs *regs,
 {
long tmp;
 
-   if (insn->addr_bytes != 8)
+   if (insn->addr_bytes != 8 && insn->addr_bytes != 4)
return -EINVAL;
 
insn_get_modrm(insn);
@@ -875,7 +880,13 @@ static int get_eff_addr_modrm(struct insn *insn, struct pt_regs *regs,
tmp = (long)regs_get_register(regs, *regoff);
}
 
-   *eff_addr = tmp + insn->displacement.value;
+   if (insn->addr_bytes == 4) {
+   int addr32 = (int)(tmp & 0xffffffff) + insn->displacement.value;
+
+   *eff_addr = (long)(addr32 & 0xffffffff);
+   } else {
+   *eff_addr = tmp + insn->displacement.value;
+   }
 
return 0;
 }
@@ -909,7 +920,7 @@ static int get_eff_addr_sib(struct insn *insn, struct pt_regs *regs,
long base, indx;
int indx_offset;
 
-   if (insn->addr_bytes != 8)
+   if (insn->addr_bytes != 8 && insn->addr_bytes != 4)
return -EINVAL;
 
insn_get_modrm(insn);
@@ -946,12 +957,102 @@ static int get_eff_addr_sib(struct insn *insn, struct pt_regs *regs,
else
indx = (long)regs_get_register(regs, indx_offset);
 
-   *eff_addr = base + indx * (1 << X86_SIB_SCALE(insn->sib.value));
+   if (insn->addr_bytes == 4) {
+   int addr32, base32, idx32;
+
+   base32 = (int)(base & 0xffffffff);
+   idx32 = (int)(indx & 0xffffffff);
 
-   *eff_addr += insn->displacement.value;
+   addr32 = base32 + idx32 * (1 << X86_SIB_SCALE(insn->sib.value));
+   addr32 += insn->displacement.value;
+
+   *eff_addr = (long)(addr32 & 0xffffffff);
+   } else {
+   *eff_addr = base + indx * (1 << X86_SIB_SCALE(insn->sib.value));
+   *eff_addr += insn->displacement.value;
+   }
 
return 0;
 }
+
+/**
+ * get_addr_ref_32() - Obtain a 32-bit linear address
+ * @insn:  Instruction with ModRM, SIB bytes and displacement
+ * @regs:  Register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable. If in protected
+ * mode, segment limits are enforced.
+ *
+ * Returns:
+ *
+ * Linear address referenced by instruction and registers on success.
+ 

[PATCH v10 10/13] x86: Enable User-Mode Instruction Prevention

2017-10-27 Thread Ricardo Neri
User-Mode Instruction Prevention (UMIP) is enabled or disabled by setting
or clearing a bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
space comes up. As with SMAP and SMEP, it is not critical to have it enabled
very early during boot, because UMIP is relevant only once there is a
user space to protect the kernel from. Given the similarities in relevance,
it makes sense to enable UMIP along with SMAP and SMEP.

At the moment, UMIP is disabled by default. It can be enabled at build
time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time, it can
be disabled at run time by adding clearcpuid=514 to the kernel parameters.
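
The number 514 is easier to read once it is split into a capability word and
a bit. The sketch below assumes the kernel's usual "word * 32 + bit"
numbering of CPU feature flags; struct cpu_feature_pos and feature_pos() are
names made up for this illustration.

```c
/* Position of a CPU feature flag: capability word and bit within it. */
struct cpu_feature_pos {
	unsigned int word;
	unsigned int bit;
};

/*
 * Sketch: split a clearcpuid= feature number into word and bit,
 * assuming feature = word * 32 + bit.
 */
static struct cpu_feature_pos feature_pos(unsigned int feature)
{
	struct cpu_feature_pos pos = { feature / 32, feature % 32 };

	return pos;
}
```

Under that assumption, feature_pos(514) gives word 16, bit 2, which is
consistent with UMIP being reported in bit 2 of CPUID.(EAX=7,ECX=0):ECX if
word 16 is the one that mirrors that register.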

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 ++
 arch/x86/kernel/cpu/common.c | 25 -
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ecf2cf3..1579a71 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1803,6 +1803,16 @@ config X86_SMAP
 
  If unsure, say Y.
 
+config X86_INTEL_UMIP
+   def_bool n
+   depends on CPU_SUP_INTEL
+   prompt "Intel User Mode Instruction Prevention" if EXPERT
+   ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 09b8a99..3e6b9ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -329,6 +329,28 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+   /* Check the boot processor, plus build option for UMIP. */
+   if (!cpu_feature_enabled(X86_FEATURE_UMIP))
+   goto out;
+
+   /* Check the current processor's cpuid bits. */
+   if (!cpu_has(c, X86_FEATURE_UMIP))
+   goto out;
+
+   cr4_set_bits(X86_CR4_UMIP);
+
+   return;
+
+out:
+   /*
+* Make sure UMIP is disabled in case it was enabled in a
+* previous boot (e.g., via kexec).
+*/
+   cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1147,9 +1169,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP */
+   /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+   setup_umip(c);
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.7.4



[PATCH v10 08/13] x86: Add emulation code for UMIP instructions

2017-10-27 Thread Ricardo Neri
The User-Mode Instruction Prevention feature present in recent Intel
processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. If one of them is attempted at such
a privilege level, a general protection fault is issued.

Rather than relaying to user space the general protection fault caused
by the UMIP-protected instructions (in the form of a SIGSEGV signal), the
fault can be trapped and the result of such instructions emulated to provide
dummy values. This makes it possible both to preserve the current kernel
behavior and to avoid revealing the system resources that UMIP intends to
protect (i.e., the locations of the global descriptor and interrupt
descriptor tables, the segment selectors of the local descriptor table, the
value of the task state register and the contents of the CR0 register).

This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function. Given that sldt
and str are not commonly used in programs that run on WineHQ or DOSEMU2,
they are not emulated. Also, emulation is provided only for 32-bit
processes; 64-bit processes that attempt to use the instructions that UMIP
protects will receive the SIGSEGV signal issued as a consequence of the
general protection fault.

The instructions protected by UMIP can be split into two groups: those
which return a kernel memory address (sgdt and sidt) and those which return
a value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications such
as WineHQ rely on the result being located in the kernel memory space, not
the actual location of the table. The result is emulated as a hard-coded
value that lies close to the top of the kernel memory. The limits for the
GDT and the IDT are set to zero.
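
As a sketch of what such an emulated result could look like for a 32-bit
process, the function below fills the 6-byte operand that sgdt and sidt store:
a 16-bit limit followed by a 32-bit base. The function name and
SKETCH_DUMMY_BASE are placeholders for illustration only; the base value
actually used by the patch is not shown in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder value for this sketch only; the patch defines its own. */
#define SKETCH_DUMMY_BASE	0xfffe0000u

/*
 * Build the 6-byte sgdt/sidt result for a 32-bit process: a 16-bit
 * limit followed by a 32-bit base, both stored least significant
 * byte first. Returns the number of bytes written.
 */
static size_t build_dummy_table_result(uint8_t out[6])
{
	uint16_t limit = 0;			/* limit is reported as zero */
	uint32_t base = SKETCH_DUMMY_BASE;	/* near the top of memory */

	out[0] = limit & 0xff;
	out[1] = limit >> 8;
	out[2] = base & 0xff;
	out[3] = (base >> 8) & 0xff;
	out[4] = (base >> 16) & 0xff;
	out[5] = (base >> 24) & 0xff;

	return 6;
}
```

The emulation code would then copy these bytes to the user-space address
resolved from the instruction operand.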

The instruction smsw is emulated to return the value that the register CR0
has at boot time, as set in head_32.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed to by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/umip.h |  12 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/umip.c  | 312 
 3 files changed, 325 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 000..db43f2a
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d449c5a..4855dc4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -126,6 +126,7 @@ obj-$(CONFIG_EFI)   += sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)   += umip.o
 
 obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)   += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 000..31cf9e9
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,312 @@
+/*
+ * umip.c Emulation for instruction protected by the Intel User-Mode
+ * Instruction Prevention feature
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/** DOC: Emulation for User-Mode Instruction Prevention (UMIP)
+ *
+ * The feature User-Mode Instruction Prevention present in recent Intel
+ * processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and str)
+ * from being executed with CPL > 0. Otherwise, a general protection fault is
+ * issued.
+ *
+ * Rather than relaying to the user space the general protection fault caused by
+ * the UMIP-protected instructions (in the form of a SIGSEGV signal), it can be
+ * trapped and emulate the result of such instructions to provide 

[PATCH] thermal: tegra: allow sensor registeration to fail

2017-10-27 Thread Nicolin Chen
Not all sensors may be used on a platform, so some thermal zones
could be missing from the Device Tree. However, the driver
currently errors out whenever a sensor fails to register its
thermal zone.

Since the driver can live with the remaining sensors, this change
allows the other sensors to continue their registration even if
one sensor fails to register.
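
The intended control flow can be sketched outside the driver as below. This
is a simplified stand-in, not the driver code: register_fn,
register_sensors() and the two example callbacks are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-sensor registration routine. */
typedef int (*register_fn)(size_t idx);

/*
 * Try to register every sensor. A failure of one sensor is tolerated;
 * an error is returned only when no sensor could be registered.
 */
static int register_sensors(register_fn reg, size_t num)
{
	size_t i, failed = 0;

	for (i = 0; i < num; i++) {
		int err = reg(i);

		if (err) {
			if (++failed == num)
				return err;	/* every sensor failed */
			continue;
		}
		/* The sensor registered; per-zone setup would follow. */
	}

	return 0;
}

/* Example callbacks: odd-numbered sensors are missing, or all are. */
static int fail_odd(size_t idx)
{
	return (idx & 1) ? -ENODEV : 0;
}

static int fail_all(size_t idx)
{
	(void)idx;
	return -ENODEV;
}
```

Here register_sensors(fail_odd, 4) succeeds with two zones registered, while
register_sensors(fail_all, 4) reports -ENODEV. Code that later walks the
zones has to skip the entries that were never registered, which is what the
NULL check added to soctherm_resume() does.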

Signed-off-by: Nicolin Chen 
---
 drivers/thermal/tegra/soctherm.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c
index 455b58c..73fcd48 100644
--- a/drivers/thermal/tegra/soctherm.c
+++ b/drivers/thermal/tegra/soctherm.c
@@ -1280,7 +1280,7 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
struct tsensor_shared_calib shared_calib;
struct resource *res;
struct tegra_soctherm_soc *soc;
-   unsigned int i;
+   unsigned int i, e;
int err;
 
match = of_match_node(tegra_soctherm_of_match, pdev->dev.of_node);
@@ -1377,7 +1377,7 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
 
soctherm_init(pdev);
 
-   for (i = 0; i < soc->num_ttgs; ++i) {
+   for (i = 0, e = 0; i < soc->num_ttgs; ++i) {
struct tegra_thermctl_zone *zone =
devm_kzalloc(&pdev->dev, sizeof(*zone), GFP_KERNEL);
if (!zone) {
@@ -1397,7 +1397,10 @@ static int tegra_soctherm_probe(struct platform_device *pdev)
err = PTR_ERR(z);
dev_err(&pdev->dev, "failed to register sensor: %d\n",
err);
-   goto disable_clocks;
+   /* Check if all sensors failed to register */
+   if (++e == soc->num_ttgs)
+   goto disable_clocks;
+   continue;
}
 
zone->tz = z;
@@ -1459,6 +1462,9 @@ static int __maybe_unused soctherm_resume(struct device *dev)
struct thermal_zone_device *tz;
 
tz = tegra->thermctl_tzs[soc->ttgs[i]->id];
+   if (!tz)
+   continue;
+
err = tegra_soctherm_set_hwtrips(dev, soc->ttgs[i], tz);
if (err) {
dev_err(&pdev->dev,
-- 
2.1.4
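
For readers who want to see the control flow outside the driver, here is a
minimal user-space sketch of the same idea: a failed sensor is logged and
skipped, the whole operation fails only when every sensor fails, and the
resume path skips sensors that never registered. struct sensor,
register_sensors() and resume_sensors() are hypothetical names for
illustration and are not part of soctherm.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one sensor. "present" models whether its
 * thermal zone exists in the Device Tree. */
struct sensor {
	const char *name;
	bool present;
	bool registered;
};

/*
 * Try to register every sensor. A single failure is logged and skipped.
 * Returns the number of registered sensors, or -1 once all of them
 * have failed.
 */
static int register_sensors(struct sensor *sensors, size_t count)
{
	size_t i, failed = 0;

	for (i = 0; i < count; ++i) {
		if (!sensors[i].present) {
			fprintf(stderr, "failed to register sensor: %s\n",
				sensors[i].name);
			/* Give up only if all sensors failed to register */
			if (++failed == count)
				return -1;
			continue;
		}
		sensors[i].registered = true;
	}

	return (int)(count - failed);
}

/* Resume path: skip sensors that were never registered. */
static int resume_sensors(const struct sensor *sensors, size_t count)
{
	size_t i;
	int resumed = 0;

	for (i = 0; i < count; ++i) {
		if (!sensors[i].registered)
			continue;
		++resumed;
	}

	return resumed;
}
```

The failure counter plays the role of "e" in the patch, and the
registered flag plays the role of the NULL check on the thermal zone
pointer in soctherm_resume().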



[PATCH v10 09/13] x86/umip: Force a page fault when unable to copy emulated result to user

2017-10-27 Thread Ricardo Neri
fixup_umip_exception() will be called from do_general_protection(). If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR and the offending address. A new function, inspired by
force_sig_info_fault(), is introduced to model the page fault.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 31cf9e9..12da83a 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -187,6 +187,41 @@ static int emulate_umip_insn(struct insn *insn, int umip_inst,
 }
 
 /**
+ * force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
+ * @addr:  Address that caused the signal
+ * @regs:  Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Returns: none
+ */
+static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs)
+{
+   siginfo_t info;
+   struct task_struct *tsk = current;
+
+   tsk->thread.cr2 = (unsigned long)addr;
+   tsk->thread.error_code  = X86_PF_USER | X86_PF_WRITE;
+   tsk->thread.trap_nr = X86_TRAP_PF;
+
+   info.si_signo   = SIGSEGV;
+   info.si_errno   = 0;
+   info.si_code= SEGV_MAPERR;
+   info.si_addr= addr;
+   force_sig_info(SIGSEGV, &info, tsk);
+
+   if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
+   return;
+
+   pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+  tsk->comm, task_pid_nr(tsk), regs->ip,
+  regs->sp, X86_PF_USER | X86_PF_WRITE,
+  regs->ip);
+}
+
+/**
  * fixup_umip_exception() - Fixup #GP faults caused by UMIP
  * @regs:  Registers as saved when entering the #GP trap
  *
@@ -302,8 +337,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
return false;
 
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-   if (nr_copied  > 0)
-   return false;
+   if (nr_copied  > 0) {
+   /*
+* If copy fails, send a signal and tell caller that
+* fault was fixed up.
+*/
+   force_sig_info_umip_fault(uaddr, regs);
+   return true;
+   }
}
 
/* increase IP to let the program keep going */
-- 
2.7.4
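
The change in return value is the subtle part of this patch, so a small
user-space model may help. It assumes a copy helper that, like
copy_to_user(), returns the number of bytes it could not copy.
model_copy_out(), model_fixup() and struct fault_record are hypothetical
names used only for this sketch; nothing here is kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Records whether the model raised the specific (page-fault-like) signal. */
struct fault_record {
	bool raised;
	const void *addr;
};

/* Hypothetical stand-in for copy_to_user(): returns bytes NOT copied.
 * "writable" models how much of the destination is actually mapped. */
static size_t model_copy_out(void *dst, const void *src, size_t len,
			     size_t writable)
{
	size_t n = len < writable ? len : writable;

	if (n)
		memcpy(dst, src, n);
	return len - n;
}

/*
 * Returns false when the fault was not handled, so the caller falls back
 * to its generic SIGSEGV. Returns true when the fault was handled, either
 * because the result was copied out or because the more specific fault at
 * the destination address has already been raised here.
 */
static bool model_fixup(bool emulated, void *dst, size_t writable,
			struct fault_record *fault)
{
	static const unsigned char dummy[6];

	fault->raised = false;
	fault->addr = NULL;

	if (!emulated)
		return false;

	if (model_copy_out(dst, dummy, sizeof(dummy), writable) > 0) {
		/* Copy failed: raise the specific fault, report "handled" */
		fault->raised = true;
		fault->addr = dst;
		return true;
	}

	return true;
}
```

Returning true in the failed-copy case is what keeps the caller from
sending a second, less precise signal.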



[PATCH v10 01/13] x86/insn-eval: Extend get_seg_base_addr() to also obtain segment limit

2017-10-27 Thread Ricardo Neri
In protected mode, it is common to want to obtain the limit of a segment
along with its base address. This is useful, for instance, to verify that
an effective address lies within a segment before computing a linear
address.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
Up to this point, this library only computes linear addresses in long
mode. Subsequent patches will include support for protected mode. Support
to verify the segment limit will be needed.
---
 arch/x86/lib/insn-eval.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 1c23ec0..91f08aa 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -729,25 +729,29 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 }
 
 /**
- * get_seg_base_addr() - obtain base address of a segment
+ * get_seg_base_limit() - obtain base address and limit of a segment
  * @insn:  Instruction. Must be valid.
  * @regs:  Register values as seen when entering kernel mode
  * @regoff:Operand offset, in pt_regs, used to resolve segment descriptor
  * @base:  Obtained segment base
+ * @limit: Obtained segment limit
  *
- * Obtain the base address of the segment associated with the operand @regoff
- * and, if any or allowed, override prefixes in @insn. This function is
+ * Obtain the base address and limit of the segment associated with the operand
+ * @regoff and, if any or allowed, override prefixes in @insn. This function is
  * different from insn_get_seg_base() as the latter does not resolve the segment
- * associated with the instruction operand.
+ * associated with the instruction operand. If a limit is not needed (e.g.,
+ * when running in long mode), @limit can be NULL.
  *
  * Returns:
  *
- * 0 on success. @base will contain the base address of the resolved segment.
+ * 0 on success. @base and @limit will contain the base address and limit of
+ * the resolved segment, respectively.
  *
  * -EINVAL on error.
  */
-static int get_seg_base_addr(struct insn *insn, struct pt_regs *regs,
-int regoff, unsigned long *base)
+static int get_seg_base_limit(struct insn *insn, struct pt_regs *regs,
+ int regoff, unsigned long *base,
+ unsigned long *limit)
 {
int seg_reg_idx;
 
@@ -762,6 +766,13 @@ static int get_seg_base_addr(struct insn *insn, struct pt_regs *regs,
if (*base == -1L)
return -EINVAL;
 
+   if (!limit)
+   return 0;
+
+   *limit = get_seg_limit(regs, seg_reg_idx);
+   if (!(*limit))
+   return -EINVAL;
+
return 0;
 }
 
@@ -843,7 +854,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr += insn->displacement.value;
}
 
-   ret = get_seg_base_addr(insn, regs, addr_offset, &seg_base);
+   ret = get_seg_base_limit(insn, regs, addr_offset, &seg_base, NULL);
if (ret)
goto out;
 
-- 
2.7.4
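
As an illustration of the interface shape, the sketch below models an
optional limit out-parameter and the check that the commit message
mentions: verify that an effective address lies within the segment before
forming a linear address. struct segment, seg_base_limit() and
linear_address() are hypothetical, and the "eff_addr > limit" comparison
is an assumption about how a caller might use the limit, not code from
this patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, already-resolved segment descriptor. */
struct segment {
	unsigned long base;
	unsigned long limit;
};

/*
 * Return the base and, if requested, the limit of a segment.
 * @limit may be NULL when the caller does not need it. As in the patch,
 * a limit of zero is treated as an error when a limit was requested.
 */
static int seg_base_limit(const struct segment *seg, unsigned long *base,
			  unsigned long *limit)
{
	if (!seg || !base)
		return -EINVAL;

	*base = seg->base;

	if (!limit)
		return 0;

	*limit = seg->limit;
	if (!(*limit))
		return -EINVAL;

	return 0;
}

/* Compute base + effective address only if the address is within limit. */
static int linear_address(const struct segment *seg, unsigned long eff_addr,
			  unsigned long *linear)
{
	unsigned long base, limit;
	int ret;

	ret = seg_base_limit(seg, &base, &limit);
	if (ret)
		return ret;

	if (eff_addr > limit)
		return -EINVAL;

	*linear = base + eff_addr;
	return 0;
}
```

Passing NULL for the limit mirrors the updated call in
insn_get_addr_ref(), which does not need a limit at this point in the
series.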



[PATCH v10 13/13] selftests/x86: Add tests for instruction str and sldt

2017-10-27 Thread Ricardo Neri
The instructions str and sldt are not recognized when running in
virtual-8086 mode and generate an invalid operand exception. These two
instructions are protected by the Intel User-Mode Instruction Prevention
(UMIP) security feature. In protected mode, if UMIP is enabled, these
instructions generate a general protection fault if called from CPL > 0.
Linux traps the general protection fault and emulates the instructions
sgdt, sidt and smsw, but not str and sldt.

These tests are added to verify that the emulation code does not emulate
these two instructions and that the expected invalid operand exception is
seen instead.

The tests fall back to exiting with int3 in case emulation does happen.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index f7d9cea..361466a 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
"smsw %ax\n\t"
"mov %ax, (2080)\n\t"
"int3\n\t"
+   "vmcode_umip_str:\n\t"
+   "str %eax\n\t"
+   "vmcode_umip_sldt:\n\t"
+   "sldt %eax\n\t"
+   "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -119,7 +124,8 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+   vmcode_umip_str[], vmcode_umip_sldt[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
printf("[FAIL]\tAll the results of SIDT should be the same.\n");
else
printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+   sethandler(SIGILL, sighandler, 0);
+   do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+   "STR instruction");
+   clearhandler(SIGILL);
+
+   sethandler(SIGILL, sighandler, 0);
+   do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+   "SLDT instruction");
+   clearhandler(SIGILL);
 }
 
 int main(void)
-- 
2.7.4



[PATCH v10 12/13] selftests/x86: Add tests for User-Mode Instruction Prevention

2017-10-27 Thread Ricardo Neri
Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel traps it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed
without causing such #GP. If no #GP exceptions occur, we expect to exit
virtual-8086 mode from INT3.

The instructions protected by UMIP are executed in representative use
cases:
 a) displacement-only memory addressing
 b) register-indirect memory addressing
 c) results stored directly in operands

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not
have the UMIP feature. Instead, results are printed for verification. A
simple verification is done to ensure that results of all tests are
identical.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 73 ++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..f7d9cea 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+   "vmcode_umip:\n\t"
+   /* addressing via displacements */
+   "smsw (2052)\n\t"
+   "sidt (2054)\n\t"
+   "sgdt (2060)\n\t"
+   /* addressing via registers */
+   "mov $2066, %bx\n\t"
+   "smsw (%bx)\n\t"
+   "mov $2068, %bx\n\t"
+   "sidt (%bx)\n\t"
+   "mov $2074, %bx\n\t"
+   "sgdt (%bx)\n\t"
+   /* register operands, only for smsw */
+   "smsw %ax\n\t"
+   "mov %ax, (2080)\n\t"
+   "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -160,6 +176,58 @@ static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
return true;
 }
 
+void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
+{
+   struct table_desc {
+   unsigned short limit;
+   unsigned long base;
+   } __attribute__((packed));
+
+   /* Initialize variables with arbitrary values */
+   struct table_desc gdt1 = { .base = 0x3c3c3c3c, .limit = 0x };
+   struct table_desc gdt2 = { .base = 0x1a1a1a1a, .limit = 0xaeae };
+   struct table_desc idt1 = { .base = 0x7b7b7b7b, .limit = 0xf1f1 };
+   struct table_desc idt2 = { .base = 0x89898989, .limit = 0x1313 };
+   unsigned short msw1 = 0x1414, msw2 = 0x2525, msw3 = 3737;
+
+   /* UMIP -- exit with INT3 unless kernel emulation did not trap #GP */
+   do_test(vm86, vmcode_umip - vmcode, VM86_TRAP, 3, "UMIP tests");
+
+   /* Results from displacement-only addressing */
+   msw1 = *(unsigned short *)(test_mem + 2052);
+   memcpy(&idt1, test_mem + 2054, sizeof(idt1));
+   memcpy(&gdt1, test_mem + 2060, sizeof(gdt1));
+
+   /* Results from register-indirect addressing */
+   msw2 = *(unsigned short *)(test_mem + 2066);
+   memcpy(&idt2, test_mem + 2068, sizeof(idt2));
+   memcpy(&gdt2, test_mem + 2074, sizeof(gdt2));
+
+   /* Results when using register operands */
+   msw3 = *(unsigned short *)(test_mem + 2080);
+
+   printf("[INFO]\tResult from SMSW:[0x%04x]\n", msw1);
+   printf("[INFO]\tResult from SIDT: limit[0x%04x]base[0x%08lx]\n",
+  idt1.limit, idt1.base);
+   printf("[INFO]\tResult from SGDT: limit[0x%04x]base[0x%08lx]\n",
+  gdt1.limit, gdt1.base);
+
+   if (msw1 != msw2 || msw1 != msw3)
+   printf("[FAIL]\tAll the results of SMSW should be the same.\n");
+   else
+   printf("[PASS]\tAll the results from SMSW are identical.\n");
+
+   if (memcmp(&gdt1, &gdt2, sizeof(gdt1)))
+   printf("[FAIL]\tAll the results of SGDT should be the same.\n");
+   else
+   printf("[PASS]\tAll the results from SGDT are identical.\n");
+
+
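
The offsets used by the test (2054, 2060, 2068, 2074) are spaced six
bytes apart, which matches a 2-byte limit followed by a 4-byte base. The
stand-alone sketch below reads such a descriptor out of a byte buffer and
compares two copies, which is the "all results are identical" check in
miniature. It uses fixed-width types so the layout stays at six bytes
regardless of how it is built; read_desc() and desc_equal() are
hypothetical helpers, not part of entry_from_vm86.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 6-byte layout: 16-bit limit followed by 32-bit base, no padding. */
struct table_desc {
	uint16_t limit;
	uint32_t base;
} __attribute__((packed));

/* Copy a descriptor out of raw test memory at the given byte offset.
 * memcpy() is used because the offset need not be aligned. */
static struct table_desc read_desc(const unsigned char *mem, size_t off)
{
	struct table_desc d;

	memcpy(&d, mem + off, sizeof(d));
	return d;
}

/* Two results are considered identical when all six bytes match. */
static bool desc_equal(const struct table_desc *a, const struct table_desc *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

A caller would read the displacement-only result and the
register-indirect result with read_desc() and report a failure when
desc_equal() returns false.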

[PATCH v10 11/13] x86/traps: Fixup general protection faults caused by UMIP

2017-10-27 Thread Ricardo Neri
If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception() will emulate dummy results for these
instructions as follows: if running a 32-bit process, sgdt, sidt and smsw
are emulated; str and sldt are not emulated. No emulation is done for
64-bit processes.

If emulation is successful, the result is passed to the user space program
and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception() also caters for the case when
the fault originated while running in virtual-8086 mode.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: x...@kernel.org
Reviewed-by: Andy Lutomirski 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/traps.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a5791f3..4c0aa6c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -514,6 +515,10 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
 
+   if (static_cpu_has(X86_FEATURE_UMIP))
+   if (user_mode(regs) && fixup_umip_exception(regs))
+   return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.7.4
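
The policy described in the commit message can be written down as a small
truth table. The sketch below is a user-space model only: enum umip_insn,
umip_emulated() and gp_handled_by_umip() are hypothetical names, and the
booleans stand in for static_cpu_has(X86_FEATURE_UMIP) and user_mode(regs).

```c
#include <stdbool.h>

/* The five instructions covered by UMIP. */
enum umip_insn { UMIP_SGDT, UMIP_SIDT, UMIP_SLDT, UMIP_SMSW, UMIP_STR };

/*
 * Emulation policy from the commit message: 32-bit processes get sgdt,
 * sidt and smsw emulated; str and sldt are never emulated; nothing is
 * emulated for 64-bit processes.
 */
static bool umip_emulated(enum umip_insn insn, bool is_64bit_process)
{
	if (is_64bit_process)
		return false;

	switch (insn) {
	case UMIP_SGDT:
	case UMIP_SIDT:
	case UMIP_SMSW:
		return true;
	default:
		return false;
	}
}

/*
 * Models the new early return in the #GP handler: the fault is fully
 * handled (no SIGSEGV) only when the CPU has UMIP, the fault came from
 * user mode, and the instruction is one that gets emulated.
 */
static bool gp_handled_by_umip(bool cpu_has_umip, bool from_user,
			       enum umip_insn insn, bool is_64bit_process)
{
	return cpu_has_umip && from_user &&
	       umip_emulated(insn, is_64bit_process);
}
```

Every combination that yields false falls through to the existing
general protection handling.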



[PATCH v10 00/13] x86: Enable User-Mode Instruction Prevention

2017-10-27 Thread Ricardo Neri
This is the 10th revision of a patchset to enable User-Mode Instruction
Prevention (UMIP) in Linux. This is the second part of two series; the work
was split rather than submitted as a single huge series to make review and
handling easier. The first part of the series can be found here [1]. In this
series, support is added to handle 32-bit and 16-bit addresses in protected
and virtual-8086 modes, and UMIP is effectively enabled. See below for a
description of why this support is needed.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table. Hiding these system resources reduces the tools
available to craft privilege escalation attacks such as [2].

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled. This means that any process that
attempts to use the aforementioned instructions would see a SIGSEGV signal.

=== How does it impact applications?

When enabled, however, UMIP will change the behavior that certain
applications expect from the operating system. For instance, programs
running on WineHQ and DOSEMU2 rely on some of these instructions to
function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the
instruction SMSW when running in virtual-8086 mode [3]. SGDT and SIDT can
also be used in virtual-8086 mode.

In order to not change the behavior of the system (i.e., a SIGSEGV signal
should not be generated when using these instructions), this implementation
traps the #GP fault generated by the CPU and emulates SGDT, SIDT and SMSW
with dummy returned values. This should be sufficient to not break the
applications mentioned above. Regarding the two remaining instructions,
STR and SLDT, the WineHQ team has shown interest in catching the general
protection fault and using it as a vehicle to fix broken applications [4].
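
To make the "dummy returned values" concrete, here is a minimal user-space
sketch of the memory operand that SGDT and SIDT store outside 64-bit mode:
a 16-bit limit followed by a 32-bit base, little-endian. The helper name
store_dummy_table_reg() and the DUMMY_* constants are hypothetical and only
illustrate the layout; the values actually returned are defined by the
emulation patch, not by this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder values, chosen for illustration only. */
#define DUMMY_TABLE_LIMIT	0x0000u
#define DUMMY_TABLE_BASE	0xfffe0000u

/*
 * Write the 6-byte SGDT/SIDT result (16-bit limit, then 32-bit base,
 * both little-endian) into buf. Returns the number of bytes written,
 * or 0 if the buffer is too small to hold the result.
 */
static size_t store_dummy_table_reg(uint8_t *buf, size_t len,
				    uint16_t limit, uint32_t base)
{
	if (len < 6)
		return 0;

	buf[0] = (uint8_t)(limit & 0xff);
	buf[1] = (uint8_t)(limit >> 8);
	buf[2] = (uint8_t)(base & 0xff);
	buf[3] = (uint8_t)((base >> 8) & 0xff);
	buf[4] = (uint8_t)((base >> 16) & 0xff);
	buf[5] = (uint8_t)((base >> 24) & 0xff);

	return 6;
}
```

A caller would pass DUMMY_TABLE_LIMIT and DUMMY_TABLE_BASE and then copy the
six bytes to the address the faulting instruction named as its operand.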

Thus, emulation is only provided for protected and virtual-8086 modes. No
emulation is implemented for processes running in long mode. Also the
instructions SLDT and STR are not emulated in any case.

DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken
unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support
it. Also, this should not pose a security risk as no system resources would
be revealed. Instead, code running inside the KVM would only see the KVM's
GDT, IDT and MSW.

=== How is this series laid out?

++ Extend the insn-eval library
This library is extended to also support 32-bit and 16-bit addresses. This
also requires adding functionality to enforce segment limits in protected
mode and linear address sizes in virtual-8086 mode, as described in
section 20.1.1 of the Intel 64 and IA-32 Architectures Software Development
Manual Vol. 3.
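
As a rough illustration of the virtual-8086 case, the sketch below computes
a linear address the way 8086-style translation does: the segment selector
is shifted left by four bits and added to the 16-bit offset. The function
names are hypothetical, and masking the result to 20 bits is shown only as
one possible way to enforce the linear address size; it is an assumption of
this sketch, not a statement of what the series does.

```c
#include <stdint.h>

/*
 * 8086-style translation: (selector << 4) + 16-bit offset.
 * The sum can need 21 bits (at most 0x10ffef).
 */
static uint32_t v8086_linear_address(uint16_t selector, uint16_t offset)
{
	return ((uint32_t)selector << 4) + offset;
}

/* Same computation, truncated to a 20-bit linear address (assumption). */
static uint32_t v8086_linear_address_20bit(uint16_t selector, uint16_t offset)
{
	return v8086_linear_address(selector, offset) & 0xfffffu;
}
```

For example, selector 0x1234 with offset 0x0010 gives 0x12350 in both
variants, while selector 0xffff with offset 0xffff gives 0x10ffef unmasked
and 0x0ffef when truncated to 20 bits.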

++ Emulate UMIP instructions
A new fixup_umip_exception() function inspects the instruction at the
instruction pointer. If it is a UMIP-protected instruction, it executes
the emulation code.
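
The inspection step can be pictured with the following stand-alone sketch.
It classifies an instruction from its two opcode bytes and the reg field of
the ModRM byte, using the architectural encodings 0F 01 /0 (SGDT), 0F 01 /1
(SIDT), 0F 01 /4 (SMSW), 0F 00 /0 (SLDT) and 0F 00 /1 (STR). The enum and
function names are hypothetical. The sketch also assumes prefixes were
already skipped and ignores the register forms (mod == 3) of 0F 01 /0 and
/1, which encode other instructions, so it is not the decoder used by the
patches.

```c
#include <stdint.h>

enum umip_insn_kind {
	UMIP_KIND_NONE,
	UMIP_KIND_SGDT,
	UMIP_KIND_SIDT,
	UMIP_KIND_SLDT,
	UMIP_KIND_SMSW,
	UMIP_KIND_STR,
};

/* Classify by opcode bytes and ModRM.reg (bits 5:3 of the ModRM byte). */
static enum umip_insn_kind classify_umip_insn(uint8_t op1, uint8_t op2,
					      uint8_t modrm)
{
	uint8_t reg = (modrm >> 3) & 0x7;

	if (op1 != 0x0f)
		return UMIP_KIND_NONE;

	if (op2 == 0x01) {
		switch (reg) {
		case 0:
			return UMIP_KIND_SGDT;
		case 1:
			return UMIP_KIND_SIDT;
		case 4:
			return UMIP_KIND_SMSW;
		default:
			return UMIP_KIND_NONE;
		}
	}

	if (op2 == 0x00) {
		switch (reg) {
		case 0:
			return UMIP_KIND_SLDT;
		case 1:
			return UMIP_KIND_STR;
		default:
			return UMIP_KIND_NONE;
		}
	}

	return UMIP_KIND_NONE;
}
```

With this classification, a caller following the policy described above
would emulate only the SGDT, SIDT and SMSW results and let SLDT and STR
fall through to the normal #GP handling.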

++ Add self-tests
Lastly, self-tests are added to entry_from_v86.c to exercise the most
typical use cases of UMIP-protected instructions in virtual-8086 mode.

++ Extensive tests
Extensive tests were performed to test all the combinations of ModRM,
SIB and displacements for 16-bit and 32-bit encodings for the SS, DS,
ES, FS and GS segments. Tests also include a 64-bit program that uses
segmentation via FS and GS. For this purpose, I temporarily enabled UMIP
support for 64-bit processes. This change is not part of this patchset.
The intention is to test the computations of linear addresses in 64-bit
mode, including the extra R8-R15 registers. Extensive tests are also
implemented for virtual-8086 tasks. The code of these tests can be found
here [5] and here [6].

Thanks and BR,
Ricardo

[1]. https://www.spinics.net/lists/kernel/msg2635138.html
[2]. http://timetobleed.com/a-closer-look-at-a-recent-privilege-escalation-bug-in-linux-cve-2013-2094/
[3]. https://www.winehq.org/pipermail/wine-devel/2017-April/117159.html
[4]. https://marc.info/?l=linux-kernel&m=147876798717927&w=2
[5]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
[6]. https://github.com/01org/luv-yocto/commit/a72a7fe7d68693c0f4100ad86de6ecabde57334f#diff-3860c136a63add269bce4ea50222c248R1

Changes since V9:
*All the changes described in [1], plus:
*Created new utility functions to handle 

Re: [RFC PATCH v10 6/7] PCI / PM: Move acpi wakeup code to pci core

2017-10-27 Thread Brian Norris
Hi Jeffy,

On Fri, Oct 27, 2017 at 03:26:11PM +0800, Jeffy Chen wrote:
> Move acpi wakeup code to pci core as pci_set_wakeup(), so that other
> platforms could reuse it.

I think you need to double check your refactoring. I believe you may
have changed some behavior here.

> Also add .setup_dev() / .setup_host_bridge() / .cleanup() platform pm
> ops's callbacks to setup and cleanup pci devices and host bridge for
> wakeup.
> 
> Signed-off-by: Jeffy Chen 
> ---
> 
> Changes in v10: None
> Changes in v9: None
> Changes in v8: None
> Changes in v7: None
> Changes in v6: None
> Changes in v5: None
> Changes in v3: None
> Changes in v2: None
> 
>  drivers/pci/pci-acpi.c   | 121 +++
>  drivers/pci/pci-driver.c |   9 
>  drivers/pci/pci.c|  84 
>  drivers/pci/pci.h|  28 +--
>  drivers/pci/probe.c  |  12 -
>  drivers/pci/remove.c |   2 +
>  include/linux/pci.h  |   2 +
>  7 files changed, 180 insertions(+), 78 deletions(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 4708eb9df71b..ee96e7afe1ac 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -569,31 +569,6 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
>   return state_conv[state];
>  }
>  
> -static int acpi_pci_propagate_wakeup(struct pci_bus *bus, bool enable)
> -{
> - while (bus->parent) {
> - if (acpi_pm_device_can_wakeup(&bus->self->dev))
> - return acpi_pm_set_bridge_wakeup(&bus->self->dev, enable);
> -
> - bus = bus->parent;
> - }
> -
> - /* We have reached the root bus. */
> - if (bus->bridge) {
> - if (acpi_pm_device_can_wakeup(bus->bridge))
> - return acpi_pm_set_bridge_wakeup(bus->bridge, enable);
> - }
> - return 0;
> -}
> -
> -static int acpi_pci_wakeup(struct pci_dev *dev, bool enable)
> -{
> - if (acpi_pm_device_can_wakeup(&dev->dev))
> - return acpi_pm_set_device_wakeup(&dev->dev, enable);
> -
> - return acpi_pci_propagate_wakeup(dev->bus, enable);
> -}
> -
>  static bool acpi_pci_need_resume(struct pci_dev *dev)
>  {
>   struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
> @@ -610,14 +585,29 @@ static bool acpi_pci_need_resume(struct pci_dev *dev)
>   return !!adev->power.flags.dsw_present;
>  }
>  
> -static const struct pci_platform_pm_ops acpi_pci_platform_pm = {
> - .is_manageable = acpi_pci_power_manageable,
> - .set_state = acpi_pci_set_power_state,
> - .get_state = acpi_pci_get_power_state,
> - .choose_state = acpi_pci_choose_state,
> - .set_wakeup = acpi_pci_wakeup,
> - .need_resume = acpi_pci_need_resume,
> -};
> +static bool acpi_pci_can_wakeup(void *pmdata)
> +{
> + struct device *dev = pmdata;
> +
> + if (!dev)
> + return false;
> +
> + return acpi_pm_device_can_wakeup(dev);
> +}
> +
> +static int acpi_pci_dev_wakeup(void *pmdata, bool enable)
> +{
> + struct device *dev = pmdata;
> +
> + return acpi_pm_set_device_wakeup(dev, enable);
> +}
> +
> +static int acpi_pci_bridge_wakeup(void *pmdata, bool enable)
> +{
> + struct device *dev = pmdata;
> +
> + return acpi_pm_set_bridge_wakeup(dev, enable);
> +}
>  
>  void acpi_pci_add_bus(struct pci_bus *bus)
>  {
> @@ -658,20 +648,6 @@ void acpi_pci_remove_bus(struct pci_bus *bus)
>   acpi_pci_slot_remove(bus);
>  }
>  
> -/* ACPI bus type */
> -static struct acpi_device *acpi_pci_find_companion(struct device *dev)
> -{
> - struct pci_dev *pci_dev = to_pci_dev(dev);
> - bool check_children;
> - u64 addr;
> -
> - check_children = pci_is_bridge(pci_dev);
> - /* Please ref to ACPI spec for the syntax of _ADR */
> - addr = (PCI_SLOT(pci_dev->devfn) << 16) | PCI_FUNC(pci_dev->devfn);
> - return acpi_find_child_device(ACPI_COMPANION(dev->parent), addr,
> -   check_children);
> -}
> -
>  /**
>   * pci_acpi_optimize_delay - optimize PCI D3 and D3cold delay from ACPI
>   * @pdev: the PCI device whose delay is to be updated
> @@ -723,34 +699,55 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>   ACPI_FREE(obj);
>  }
>  
> -static void pci_acpi_setup(struct device *dev)
> +static void *acpi_pci_setup_dev(struct pci_dev *pci_dev)
>  {
> - struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct acpi_device *adev = ACPI_COMPANION(dev);
> + struct acpi_device *adev = ACPI_COMPANION(&pci_dev->dev);
>  
>   if (!adev)
> - return;
> + return NULL;
>  
>   pci_acpi_optimize_delay(pci_dev, adev->handle);
>  
>   pci_acpi_add_pm_notifier(adev, pci_dev);
>   if (!adev->wakeup.flags.valid)
> - return;
> + return NULL;
> +
> + device_set_wakeup_capable(&pci_dev->dev, true);
> + acpi_pm_set_device_wakeup(&pci_dev->dev, false);
>  
> - 

Re: [PATCH] x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'

2017-10-27 Thread Baoquan He
Hi,

On 10/27/17 at 11:51pm, kbuild test robot wrote:
> Hi Ingo,
> 
> Thank you for the patch! Yet we hit a small issue.
> [auto build test ERROR on tip/x86/core]
> [also build test ERROR on v4.14-rc6 next-20171018]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Ingo-Molnar/x86-mm-64-Rename-the-register_page_bootmem_memmap-size-parameter-to-nr_pages/20171027-181456
> config: x86_64-kexec (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/mm/init_64.c: In function 'register_page_bootmem_memmap':
> >> arch/x86/mm/init_64.c:1341:15: error: 'nr_pages' redeclared as different 
> >> kind of symbol
>  unsigned int nr_pages;
>   ^~~~
>arch/x86/mm/init_64.c:1332:46: note: previous definition of 'nr_pages' was 
> here
>   struct page *start_page, unsigned long nr_pages)
>  ^~~~
The code change in this thread is incomplete; Ingo helped rewrite the
patch log in the suggested patch. I caught this error too when I rebuilt,
and will post v3. Please drop this one.

Thanks
Baoquan

> 
> vim +/nr_pages +1341 arch/x86/mm/init_64.c
> 
> c2b91e2ee Yinghai Lu 2008-04-12  1329  
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1330  #if 
> defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && 
> defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1331  void 
> register_page_bootmem_memmap(unsigned long section_nr,
> 56f1692b6 Ingo Molnar2017-10-24  1332 
>   struct page *start_page, unsigned long nr_pages)
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1333  {
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1334 unsigned long addr = 
> (unsigned long)start_page;
> 56f1692b6 Ingo Molnar2017-10-24  1335 unsigned long end = 
> (unsigned long)(start_page + nr_pages);
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1336 unsigned long next;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1337 pgd_t *pgd;
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1338 p4d_t *p4d;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1339 pud_t *pud;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1340 pmd_t *pmd;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22 @1341 unsigned int nr_pages;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1342 struct page *page;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1343  
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1344 for (; addr < end; addr 
> = next) {
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1345 pte_t *pte = 
> NULL;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1346  
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1347 pgd = 
> pgd_offset_k(addr);
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1348 if 
> (pgd_none(*pgd)) {
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1349 next = 
> (addr + PAGE_SIZE) & PAGE_MASK;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1350 
> continue;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1351 }
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1352 
> get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1353  
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1354 p4d = 
> p4d_offset(pgd, addr);
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1355 if 
> (p4d_none(*p4d)) {
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1356 next = 
> (addr + PAGE_SIZE) & PAGE_MASK;
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1357 
> continue;
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1358 }
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1359 
> get_page_bootmem(section_nr, p4d_page(*p4d), MIX_SECTION_INFO);
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1360  
> f2a6a7050 Kirill A. Shutemov 2017-03-17  1361 pud = 
> pud_offset(p4d, addr);
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1362 if 
> (pud_none(*pud)) {
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1363 next = 
> (addr + PAGE_SIZE) & PAGE_MASK;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1364 
> continue;
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1365 }
> 46723bfa5 Yasuaki Ishimatsu  2013-02-22  1366 
> get_

[PATCH 2/2] perf tools: arm-spe: add customized strerror function

2017-10-27 Thread Kim Phillips
Add a routine to try to help the user determine how they're
supposed to use the SPE driver.

Example #1:  Trouble setting sample rate:

$ sudo ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/ -F 1 true
Error:
required sample period missing.  Use '--count='
$ ./perf record -e arm_spe_0/ts_enable=1/ -c 0 true
Error:
required sample period missing.  Use '--count='
$ ./perf record -e arm_spe_0/ts_enable=1/ -c 1 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.009 MB perf.data ]
$

Example #2:  Non-privileged user tries to obtain physical address data:

$ ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/ -c 1 true
Error:
arm_spe_0/ts_enable=1,pa_enable=1/:u: physical address and time, and EL1 context ID data collection
require admin privileges
$ sudo ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/ -c 1 true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.054 MB perf.data ]
$

Example #3:  Trying to exclude idle profiling:

$ sudo ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/I -c 1 true
Error:
arm_spe_0/ts_enable=1,pa_enable=1/I: Cannot exclude profiling when idle, try without //I
$

Signed-off-by: Kim Phillips 
---
DO NOT APPLY:  This should really be an RFC, since it depends on this RFC:

https://www.spinics.net/lists/arm-kernel/msg613725.html

but providing as part of SPE tool patch anyway in case it helps
resolve the RFC.

 tools/perf/arch/arm64/util/evsel.c | 43 ++
 1 file changed, 43 insertions(+)

diff --git a/tools/perf/arch/arm64/util/evsel.c b/tools/perf/arch/arm64/util/evsel.c
index e09cbb5d1518..222bf761d11b 100644
--- a/tools/perf/arch/arm64/util/evsel.c
+++ b/tools/perf/arch/arm64/util/evsel.c
@@ -70,6 +70,44 @@ target__has_task(target), target__has_cpu(target), target__none(target)
return 0;
 }
 
+#ifdef HAVE_AUXTRACE_SUPPORT
+static int strerror_arm_spe(struct perf_evsel *evsel,
+   struct target *target __maybe_unused,
+   int err, char *msg, size_t size)
+{
+   const char *evname = perf_evsel__name(evsel);
+   struct perf_event_attr *attr = &evsel->attr;
+
+   switch (err) {
+   case EOPNOTSUPP:
+   if (attr->exclude_idle)
+   return scnprintf(msg, size,
+   "%s: Cannot exclude profiling when idle, try without //I\n", evname);
+   return scnprintf(msg, size, "%s: unsupported error code:\n"
+   "EITHER this driver may not support a possibly h/w-implementation\n"
+   "\tdefined event filter bit that has been set in the PMSEVFR register\n"
+   "OR h/w doesn't support filtering by one or more of: latency,\n"
+   "\toperation type, or events\n", evname);
+   break;
+   case EACCES:
+   if (strstr(evname, "pa_enable") || strstr(evname, "pct_enable"))
+   return scnprintf(msg, size,
+   "%s: physical address and time, and EL1 context ID data collection\n"
+   "\trequire admin privileges\n", evname);
+   break;
+   case EINVAL:
+   if (attr->freq || !attr->sample_period)
+   return scnprintf(msg, size,
+   "required sample period missing.  Use '--count='\n");
+   break;
+   default:
+   break;
+   }
+
+   return 0;
+}
+#endif
+
 int perf_evsel__open_strerror_arch(struct perf_evsel *evsel,
   struct target *target,
   int err, char *msg, size_t size)
@@ -80,5 +118,10 @@ int perf_evsel__open_strerror_arch(struct perf_evsel *evsel,
 	if (strstarts(evname, "ccn"))
 		return ccn_strerror(evsel, target, err, msg, size);
 
+#ifdef HAVE_AUXTRACE_SUPPORT
+	if (strstarts(evname, "arm_spe"))
+		return strerror_arm_spe(evsel, target, err, msg, size);
+#endif
+
 	return 0;
 }
-- 
2.14.2



[PATCH v3 1/2] perf tools: Add ARM Statistical Profiling Extensions (SPE) support

2017-10-27 Thread Kim Phillips
'perf record' and 'perf report --dump-raw-trace' supported in this
release.

Example usage:

$ ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/ \
dd if=/dev/zero of=/dev/null count=1

perf report --dump-raw-trace

Note that the perf.data file is portable, so the report can be run on
another architecture host if necessary.

Output will contain raw SPE data and its textual representation, such
as:

0x550 [0x30]: PERF_RECORD_AUXTRACE size: 0xc408  offset: 0  ref: 0x30005619  idx: 3  tid: 2109  cpu: 3
.
. ... ARM SPE data: size 50184 bytes
.  00000000:  49 00                         LD
.  00000002:  b2 00 9c 7b 7a 00 80 ff ff    VA 0xffff80007a7b9c00
.  0000000b:  9a 00 00                      LAT 0 XLAT
.  0000000e:  42 16                         EV RETIRED L1D-ACCESS TLB-ACCESS
.  00000010:  b0 b0 c9 15 08 00 00 ff ff    PC 0xff00000815c9b0 el3 ns=1
.  00000019:  98 00 00                      LAT 0 TOT
.  0000001c:  71 00 20 fa fd 16 00 00 00    TS 98750308352
.  00000025:  49 01                         ST
.  00000027:  b2 60 bc 0c 0f 00 00 ff ff    VA 0xffff00000f0cbc60
.  00000030:  9a 00 00                      LAT 0 XLAT
.  00000033:  42 16                         EV RETIRED L1D-ACCESS TLB-ACCESS
.  00000035:  b0 48 cc 15 08 00 00 ff ff    PC 0xff00000815cc48 el3 ns=1
.  0000003e:  98 00 00                      LAT 0 TOT
.  00000041:  71 00 20 fa fd 16 00 00 00    TS 98750308352
.  0000004a:  48 00                         INSN-OTHER
.  0000004c:  42 02                         EV RETIRED
.  0000004e:  b0 ac 47 0c 08 00 00 ff ff    PC 0xff0000080c47ac el3 ns=1
.  00000057:  98 00 00                      LAT 0 TOT
.  0000005a:  71 00 20 fa fd 16 00 00 00    TS 98750308352
.  00000063:  49 00                         LD
.  00000065:  b2 18 48 e5 7a 00 80 ff ff    VA 0xffff80007ae54818
.  0000006e:  9a 00 00                      LAT 0 XLAT
.  00000071:  42 16                         EV RETIRED L1D-ACCESS TLB-ACCESS
.  00000073:  b0 08 f8 15 08 00 00 ff ff    PC 0xff00000815f808 el3 ns=1
.  0000007c:  98 00 00                      LAT 0 TOT
.  0000007f:  71 00 20 fa fd 16 00 00 00    TS 98750308352
...
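
To make the raw bytes in the dump easier to follow, here is a small sketch of
how a payload can be pulled out of a packet.  It is not the decoder from this
patch: spe_payload_len() and spe_payload_le() are hypothetical helpers, and
the size rule (payload bytes = 1 << bits [5:4] of the header byte) is an
assumption that happens to agree with every packet shown in the dump
(0x49/0x42/0x48 carry 1 byte, 0x9a/0x98 carry 2, 0xb0/0xb2/0x71 carry 8).

```c
#include <stddef.h>
#include <stdint.h>

/* Payload size in bytes, taken from bits [5:4] of the header byte. */
static size_t spe_payload_len(uint8_t hdr)
{
	return (size_t)1 << ((hdr >> 4) & 0x3);
}

/* Read a little-endian payload of 'len' bytes (len <= 8). */
static uint64_t spe_payload_le(const uint8_t *p, size_t len)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < len; i++)
		v |= (uint64_t)p[i] << (8 * i);

	return v;
}
```

Feeding it the TS packet from the dump (header 0x71 followed by
00 20 fa fd 16 00 00 00) gives 0x16fdfa2000, i.e. the 98750308352 printed in
the textual column.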

Other release notes:

- applies to acme's perf/{core,urgent} branches, likely elsewhere

- Report is self-contained within the tool.  Record requires Will's SPE
  driver:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/536221.html

- the intel-bts implementation was used as a starting point; its
  min/default/max buffer sizes and power of 2 pages granularity need to be
  revisited for ARM SPE

- multiple SPE clusters/domains not supported

- snapshot support (record -S), and conversion to native perf events
  (e.g., via 'perf inject --itrace'), are also not supported

- technically both cs-etm and spe can be used simultaneously, however
  disabled for simplicity in this release

Signed-off-by: Kim Phillips 
---
v3: trying to address comments from v2:

- despite adding a find_all_arm_spe_pmus() function to scan for all
  arm_spe_ device instances, in order to ensure auxtrace_record__init
  successfully matches the evsel type with the correct arm_spe_pmu type,
  I am still having trouble running in multi-SPE PPI (heterogeneous)
  environments (mmap fails with EOPNOTSUPP, as does running with
  --per-thread on homogeneous systems).

- arm_spe_reference: use gettime instead of direct cntvct register access

- spe-decoder: add a comment for why SPE_EVENTS code sets packet->index.

- added arm_spe_pmu_default_config that accesses the driver
  caps/min_interval and sets the default sampling period to it.  This way
  users don't have to specify -c explicitly.  Also set is_uncore to false.

- set more sampling bits in the arm_spe and its tracking evsel.  Still
  unsure if too liberal, and not sure whether it needs another context
  switch tracking evsel.  Comments welcome!

v2: mostly addressing Mark Rutland's comments as much as possible without his
feedback to my feedback:

- decoder refactored with a get_payload, not extended to with-ext_len ones like
  get_addr,  named the constants

- 0x-ified %x output formats, but decided to not sign extend the addresses in
  the raw dump, rather do so if necessary in the synthesis stage:
  SPE implementations differ in this area, and raw dump should reflect that.

- CPU mask / new record behaviour bisected to commit e3ba76deef23064 "perf
  tools: Force uncore events to system wide monitoring".  Waiting to hear back
  on why driver can't do system wide monitoring, 

Re: [PATCH] Enable SR-IOV instantiation through /sys file

2017-10-27 Thread Duyck, Alexander H
On Sat, 2017-10-28 at 00:19 +0200, Alex Williamson wrote:
> On Fri, 27 Oct 2017 21:50:43 +0000
> "Wang, Liang-min"  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Tuesday, October 24, 2017 6:07 PM
> > > To: Wang, Liang-min 
> > > Cc: Kirsher, Jeffrey T ; 
> > > k...@vger.kernel.org;
> > > linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > bhelg...@google.com; Duyck, Alexander H 
> > > Subject: Re: [PATCH] Enable SR-IOV instantiation through /sys file
> > > 
> > > > On Tue, 24 Oct 2017 21:49:15 +0000
> > > "Wang, Liang-min"  wrote:
> > >   
> > > > Just like any PCIe devices that supports SR-IOV. There are restrictions 
> > > > set for  
> > > 
> > > VF. Also, there is a concept of trust VF now available for PF to manage 
> > > certain
> > > features that only selected VF could exercise. Are you saying all the 
> > > devices
> > > supporting SR-IOV all have security issue?
> > > 
> > > Here's a simple example, most SR-IOV capable NICs, including those from
> > > Intel, require the PF interface to be up in order to route traffic from
> > > the VF.  If the user controls the PF interface and VFs are used
> > > elsewhere in the host, the PF driver in userspace can induce a denial
> > > of service on the VFs.  That doesn't even take into account that VFs
> > > might be in separate IOMMU groups from the PF and therefore not
> > > isolated from the host like the PF and that the PF driver can
> > > potentially manipulate the VF, possibly performing DMA on behalf of the
> > > PF.  VFs are only considered secure today because the PF is managed by
> > > a driver in the host kernel.  Allowing simple enablement of VFs for a
> > > user owned PF seems inherently insecure to me.  Thanks,
> > > 
> > > Alex  
> > 
> > Firstly, the concern is on user-space PF driver based upon vfio-pci, this 
> > patch doesn't
> > change PF behavior so with/without this patch, the concern remains the same.
> 
> This patch enables SR-IOV to be enabled via the host on a user-owned
> PF, how is this not a change in behavior?
> 
> > Secondly, the security concern (including denial of service) in general is 
> > to ensure trust
> > entity to be trust-worthy. No matter the PF driver is in kernel-space or in 
> > user- space,
> > necessary mechanism needs to be enforced on the device driver to ensure it's
> > trusted worthy. For example, ixgbe kernel driver introduces a Tx hang 
> > detection
> > to avoid driver stays in a bad state. Therefore, it's the responsibility of 
> > user-space
> > driver function, which based upon vfio-pci, to enforce necessary mechanism 
> > to ensure
> > its trust-ness. That's a given.
> 
> Userspace is not trustworthy, therefore the host kernel cannot place
> responsibility on a userspace driver for anything, including the
> behavior of VFs.  I'm sorry, but it's a NAK unless you intend to
> follow-up with some proposal to quarantine the VFs enabled by the
> userspace PF driver.  Thanks,
> 
> Alex

I don't see this so much as a security problem per se. It all depends
on the hardware setup. If I recall correctly, there are devices where
the PF function doesn't really do much other than act as a bit more
heavy-weight VF, and the actual logic is handled by a firmware engine
on the device. The only real issue is that for devices like the Intel
NICs instead of trusting a firmware engine we have historically used a
kernel driver and now we are wanting to trust a user-space agent
instead.

I do think that we probably need to have some sort of signaling between
user-space and vfio-pci that would allow for notifying the user-space
of the change and for user-space to notify vfio-pci that it is capable
of handling the notification. This is something that can be toggled at
any time after all and not all devices have a means of notifying the PF
that this has been changed.

Beyond that, once the root user enables the VFs I would kind of think
they know what driver they have running them. Enabling VFs implies the
root user trusts the application running on top of vfio-pci to handle
the PF responsibly. At least that is how it works in my mind.

Thanks.

- Alexander
  (using full name since 2 Alexs in one thread can be confusing)
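
For anyone following along who has not used the interface: enabling VFs
"through /sys" normally means the root user writing a count into the PF's
sriov_numvfs attribute.  A minimal sketch of that step, assuming the standard
/sys/bus/pci/devices/<BDF>/sriov_numvfs path; the helper names are
hypothetical and nothing here is taken from the patch under discussion:

```c
#include <stdio.h>

/* Build "/sys/bus/pci/devices/<bdf>/sriov_numvfs"; returns snprintf()'s result. */
static int sriov_numvfs_path(const char *bdf, char *buf, size_t size)
{
	return snprintf(buf, size, "/sys/bus/pci/devices/%s/sriov_numvfs", bdf);
}

/* Write the requested VF count; returns 0 on success, -1 on failure. */
static int sriov_set_numvfs(const char *bdf, unsigned int num_vfs)
{
	char path[128];
	FILE *f;
	int ok;

	if (sriov_numvfs_path(bdf, path, sizeof(path)) >= (int)sizeof(path))
		return -1;	/* path did not fit */

	f = fopen(path, "w");
	if (!f)
		return -1;	/* no such device, or not permitted */

	ok = fprintf(f, "%u\n", num_vfs) > 0;
	if (fclose(f) != 0)
		ok = 0;		/* the write is only reported at close time */

	return ok ? 0 : -1;
}
```

Note that nothing in this write path tells a user-space PF driver that the VF
count changed, which is the signaling gap raised above.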



Re: [PATCH v4] Input: add support for the Samsung S6SY761 touchscreen

2017-10-27 Thread Dmitry Torokhov
Hi Andi,

On Tue, Sep 26, 2017 at 03:31:35PM +0900, Andi Shyti wrote:
> The S6SY761 touchscreen is a capacitive multi-touch controller
> for mobile use. It's connected with i2c at the address 0x48.
> 
> This commit provides a basic version of the driver which can
> handle only initialization, touch events and power states.
> 
> The controller is controlled by a firmware which, in the version
> I currently have, doesn't provide all the possible
> functionalities mentioned in the datasheet.
> 
> Signed-off-by: Andi Shyti 
> ---
> Hi,
> 
> sorry for the mix-up of the previous patch. This one should be
> fine. Here's the changelog:
> 
> v3 - v4
>  - fixed a mismatch on the module name
> 
> v2 - v3
>  - added security check on an unsigned value which can (unlikely)
>get a "negative" value
> 
>  - in the probe function the interrupt is requested after the
>input device registration in order to avoid checking in the
>interrupt handler whether the input device has been registered
> 
>  - removed the 'prev_pm_state' variable. Its original meaning
>was to restore the state of the device when coming back from
>sleep mode, but because I removed in patch v2 the low power
>mode, now the device works only in two modes and therefore
>'prev_pm_state' is not required any longer.
> 
> v1 - v2
>  - remove the low power functionality as it doesn't bring any
>benefit
>  - use get_unaligned_be16 instead of the form 'a << 8 | b'
>  - use max_t instead of '? :'
>  - use managed 'devm_device_add_group()'
> 
> Thanks,
> Andi
> 
>  .../bindings/input/touchscreen/samsung,s6sy761.txt |  34 ++
>  drivers/input/touchscreen/Kconfig                  |  11 +
>  drivers/input/touchscreen/Makefile                 |   1 +
>  drivers/input/touchscreen/s6sy761.c                | 556 +
>  4 files changed, 602 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/input/touchscreen/samsung,s6sy761.txt
>  create mode 100644 drivers/input/touchscreen/s6sy761.c
> 
> diff --git a/Documentation/devicetree/bindings/input/touchscreen/samsung,s6sy761.txt b/Documentation/devicetree/bindings/input/touchscreen/samsung,s6sy761.txt
> new file mode 100644
> index 000000000000..d9b7c2ff611e
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/input/touchscreen/samsung,s6sy761.txt
> @@ -0,0 +1,34 @@
> +* Samsung S6SY761 touchscreen controller
> +
> +Required properties:
> +- compatible : must be "samsung,s6sy761"
> +- reg: I2C slave address, (e.g. 0x48)
> +- interrupt-parent   : the phandle to the interrupt controller which provides
> +   the interrupt
> +- interrupts : interrupt specification
> +- avdd-supply: analogic power supply
> +- vdd-supply : power supply
> +
> +Optional properties:
> +- touchscreen-size-x : see touchscreen.txt. This property is embedded in the
> +   device. If defined it forces a different x resolution.
> +- touchscreen-size-y : see touchscreen.txt. This property is embedded in the
> +   device. If defined it forces a different y resolution.
> +
> +Example:
> +
> +i2c@ {
> +
> + /* ... */
> +
> + touchscreen@48 {
> + compatible = "samsung,s6sy761";
> + reg = <0x48>;
> + interrupt-parent = <>;
> + interrupts = <1 IRQ_TYPE_NONE>;
> + avdd-supply = <_reg>;
> + vdd-supply = <_reg>;
> + touchscreen-size-x = <4096>;
> + touchscreen-size-y = <4096>;
> + };
> +};
> diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
> index 176b1a74b2b7..c903db4cf7b2 100644
> --- a/drivers/input/touchscreen/Kconfig
> +++ b/drivers/input/touchscreen/Kconfig
> @@ -383,6 +383,17 @@ config TOUCHSCREEN_S3C2410
> To compile this driver as a module, choose M here: the
> module will be called s3c2410_ts.
>  
> +config TOUCHSCREEN_S6SY761
> + tristate "Samsung S6SY761 Touchscreen driver"
> + depends on I2C
> + help
> +   Say Y if you have the Samsung S6SY761 driver
> +
> +   If unsure, say N
> +
> +   To compile this driver as module, choose M here: the
> +   module will be called s6sy761.
> +
>  config TOUCHSCREEN_GUNZE
>   tristate "Gunze AHL-51S touchscreen"
>   select SERIO
> diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
> index 6badce87037b..4f63439211fd 100644
> --- a/drivers/input/touchscreen/Makefile
> +++ b/drivers/input/touchscreen/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_TOUCHSCREEN_PENMOUNT)  += penmount.o
>  obj-$(CONFIG_TOUCHSCREEN_PIXCIR) += pixcir_i2c_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_RM_TS)  += raydium_i2c_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_S3C2410)+= s3c2410_ts.o
> +obj-$(CONFIG_TOUCHSCREEN_S6SY761)+= s6sy761.o
>  obj-$(CONFIG_TOUCHSCREEN_SILEAD) += silead.o
> 
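
As a side note on the v1 - v2 changelog item "use get_unaligned_be16 instead
of the form 'a << 8 | b'": the two are meant to produce the same value, the
helper just makes the byte order explicit.  A user-space sketch of the open
coded form, with read_be16() as a hypothetical stand-in for the kernel
helper:

```c
#include <stdint.h>

/* Assemble a big-endian 16-bit value from two consecutive bytes. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

For example, the bytes 0x10 0x00 read this way give 4096, the resolution used
in the binding example.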

Re: [PATCH] IB/mlx5: give back valid speed/width even without plugged in SFP module

2017-10-27 Thread Hal Rosenstock
On 10/27/2017 7:04 PM, Ghazale Hosseinabadi wrote:
> 
> 
> On 10/27/2017 03:52 PM, Hal Rosenstock wrote:
>> On 10/27/2017 5:54 PM, Ghazale Hosseinabadi wrote:
>>> When running ibstat (if transceiver is not connected in adapter):
>>>
>>> ibpanic: [7851] main: stat of IB device 'mlx5_1' failed: Invalid
>>> argument
>> Any output before that ?
> no, It only prints this line.

and setting the width to 1x in the driver so the rate file is properly
populated fixes this ? I must be missing something as to what is going
on in this scenario.

sysfs.c:rate_show is inconsistent: it paves over an invalid speed,
setting it to SDR, but does not pave over an invalid width, returning
-EINVAL instead. But this comment is in another "direction".

-- Hal

> 
> -- Ghazale
>>   I'm trying to understand how far it gets. It
>> looks to me that empty rate file would be parsed as 0 and ibstat would
>> show that rate. ibpanic would occur if file was not found but I could be
>> missing something.
>>
> 
> 
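
To illustrate the inconsistency described above, here is a stripped-down
sketch of the pattern: an unrecognised speed silently falls back to SDR,
while an unrecognised width is reported as -EINVAL.  This is not the code
from sysfs.c; the enums, their values and the function names are
hypothetical, and rates are expressed in tenths of a Gb/s.

```c
#include <errno.h>

/* Hypothetical, simplified encodings; the kernel's enums differ. */
enum sk_speed { SK_SPEED_SDR = 1, SK_SPEED_DDR = 2, SK_SPEED_QDR = 4 };
enum sk_width { SK_WIDTH_1X = 1, SK_WIDTH_4X = 2, SK_WIDTH_8X = 4, SK_WIDTH_12X = 8 };

/* Lane count for a width code, or -1 if the code is not recognised. */
static int sk_width_to_lanes(int width)
{
	switch (width) {
	case SK_WIDTH_1X:  return 1;
	case SK_WIDTH_4X:  return 4;
	case SK_WIDTH_8X:  return 8;
	case SK_WIDTH_12X: return 12;
	default:           return -1;
	}
}

/* Rate in units of 0.1 Gb/s, or -EINVAL for an unrecognised width. */
static int sk_rate_tenths_gbps(int speed, int width)
{
	int lanes = sk_width_to_lanes(width);
	int per_lane;

	switch (speed) {
	case SK_SPEED_DDR: per_lane = 50;  break;
	case SK_SPEED_QDR: per_lane = 100; break;
	case SK_SPEED_SDR:
	default:           per_lane = 25;  break; /* invalid speed treated as SDR */
	}

	if (lanes < 0)
		return -EINVAL;	/* invalid width is not paved over */

	return per_lane * lanes;
}
```

With a speed of 0 and a 1x width this yields 25 (2.5 Gb/s), whereas a valid
speed with a width of 0 yields -EINVAL, which would match the "Invalid
argument" seen from ibstat.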


Re: [RFC PATCH v10 7/7] PCI / PM: Add support for the PCIe WAKE# signal for OF

2017-10-27 Thread Brian Norris
Hi Jeffy,

On Fri, Oct 27, 2017 at 03:26:12PM +0800, Jeffy Chen wrote:
> Add pci-of.c to handle the PCIe WAKE# interrupt.
> 
> Also use the dedicated wakeirq infrastructure to simplify it.
> 
> Signed-off-by: Jeffy Chen 
> ---
> 
> Changes in v10:
> Use device_set_wakeup_capable() instead of device_set_wakeup_enable(),
> since dedicated wakeirq will be lost in device_set_wakeup_enable(false).
> 
> Changes in v9:
> Fix check error in .cleanup().
> Move dedicated wakeirq setup to setup() callback and use
> device_set_wakeup_enable() to enable/disable.
> 
> Changes in v8:
> Add pci-of.c and use platform_pm_ops to handle the PCIe WAKE# signal.
> 
> Changes in v7:
> Move PCIE_WAKE handling into pci core.
> 
> Changes in v6:
> Fix device_init_wake error handling, and add some comments.
> 
> Changes in v5:
> Rebase.
> 
> Changes in v3:
> Fix error handling.
> 
> Changes in v2:
> Use dev_pm_set_dedicated_wake_irq.
> 
>  drivers/pci/Makefile |   2 +-
>  drivers/pci/pci-of.c | 127 
> +++
>  2 files changed, 128 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/pci/pci-of.c
> 
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 66a21acad952..4f76dbdb024c 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -49,7 +49,7 @@ obj-$(CONFIG_PCI_ECAM) += ecam.o
>  
>  obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>  
> -obj-$(CONFIG_OF) += of.o
> +obj-$(CONFIG_OF) += of.o pci-of.o
>  
>  ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
>  
> diff --git a/drivers/pci/pci-of.c b/drivers/pci/pci-of.c
> new file mode 100644
> index ..28f3c4a0eec8
> --- /dev/null
> +++ b/drivers/pci/pci-of.c
> @@ -0,0 +1,127 @@
> +/*
> + * OF PCI PM support
> + *
> + * Copyright (c) 2017 Rockchip, Inc.
> + *
> + * Author: Jeffy Chen 
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "pci.h"
> +
> +struct of_pci_pm_data {
> + struct device   *dev;
> + unsigned int    wakeup_irq;
> + atomic_t        wakeup_cnt;
> +};
> +
> +static void *of_pci_setup(struct device *dev)
> +{
> + struct of_pci_pm_data *data;
> + int irq, ret;
> +
> + if (!dev->of_node)
> + return NULL;
> +
> + data = devm_kzalloc(dev, sizeof(struct of_pci_pm_data), GFP_KERNEL);
> + if (!data)
> + return ERR_PTR(-ENOMEM);
> +
> + irq = of_irq_get_byname(dev->of_node, "wakeup");
> + if (irq < 0) {
> + if (irq == -EPROBE_DEFER)
> + return ERR_PTR(irq);
> +
> + return NULL;
> + }
> +
> + data->wakeup_irq = irq;
> + data->dev = dev;
> +
> + device_init_wakeup(dev, true);
> + ret = dev_pm_set_dedicated_wake_irq(dev, irq);

I'm seeing this WARNING during system resume when I enable wake-on-Wifi
with this series:

[  135.259920] Unbalanced IRQ 64 wake disable
[  135.259929] [ cut here ]
[  135.259942] WARNING: CPU: 0 PID: 3233 at kernel/irq/manage.c:606 irq_set_irq_wake+0xac/0x12c
[  135.259944] Modules linked in: btusb btrtl btbcm btintel bluetooth ecdh_generic cdc_ether usbnet uvcvideo mwifiex_pcie videobuf2_vmalloc videobuf2_memops mwifiex videobuf2_v4l2 videobuf2_core cfg80211 ip6table_filter r8152 mii joydev
[  135.259986] CPU: 0 PID: 3233 Comm: bash Not tainted 4.14.0-rc3+ #40
[  135.259988] Hardware name: Google Kevin (DT)
[  135.259991] task: ffc0f133c880 task.stack: ff801052
[  135.259994] PC is at irq_set_irq_wake+0xac/0x12c
[  135.259998] LR is at irq_set_irq_wake+0xac/0x12c
...
[  135.260121] [] irq_set_irq_wake+0xac/0x12c
[  135.260127] [] dev_pm_disarm_wake_irq+0x3c/0x58
[  135.260133] [] device_wakeup_disarm_wake_irqs+0x50/0x78
[  135.260137] [] dpm_noirq_end+0x18/0x24
[  135.260140] [] dpm_resume_noirq+0x24/0x30
[  135.260146] [] suspend_devices_and_enter+0x474/0x970
[  135.260150] [] pm_suspend+0x688/0x6cc
[  135.260154] [] state_store+0xd4/0xf8
[  135.260160] [] kobj_attr_store+0x18/0x28
[  135.260165] [] sysfs_kf_write+0x5c/0x68
[  135.260169] [] kernfs_fop_write+0x15c/0x198
[  135.260174] [] __vfs_write+0x58/0x160
[  135.260178] [] vfs_write+0xc4/0x15c
[  135.260181] [] SyS_write+0x64/0xb4

I'm not yet sure if this is your series' fault, or if the dedicated wake IRQ
infrastructure did something wrong.
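
For reference, "Unbalanced IRQ ... wake disable" means a wake disable was
requested while the IRQ's wake depth was already zero. The sketch below is a
simplified user-space model of that counting, not the kernel's implementation;
struct irq_model and model_set_irq_wake() are hypothetical names. It does not
say which caller issued the extra disable in the trace above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of a per-IRQ wake reference count. */
struct irq_model {
	unsigned int irq;
	unsigned int wake_depth;	/* outstanding wake enables */
	unsigned int unbalanced;	/* unbalanced disables seen so far */
};

/* Enable (on = true) or disable (on = false) wakeup for the modelled IRQ. */
static void model_set_irq_wake(struct irq_model *d, bool on)
{
	if (on) {
		d->wake_depth++;
	} else if (d->wake_depth == 0) {
		/* A disable with no matching enable: the warned-about case. */
		fprintf(stderr, "Unbalanced IRQ %u wake disable\n", d->irq);
		d->unbalanced++;
	} else {
		d->wake_depth--;
	}
}
```

With this model, one enable followed by two disables trips the message on the
second disable, so any path that disarms the wake IRQ more often than it arms
it would produce this kind of warning.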

> + if (ret < 0) {
> + device_init_wakeup(dev, false);
> + return NULL;
> + }
> + device_set_wakeup_capable(dev, false);
> +
> + dev_info(dev, "Wakeup IRQ %d\n", irq);

Do you actually need to print this out? It'll be available in things
like /proc/interrupts now, so this seems overkill.

> + return data;
> +}
> +
> +static void *of_pci_setup_dev(struct pci_dev *pci_dev)
> +{
> +  
