[PATCH] x86/vdso: avoid warning when building vdso2c on 32-bit host

2021-03-10 Thread Jan Beulich
size_t arguments can't be passed compatibly to l-modifier format
specifiers. Use the z modifier instead.

Fixes: 8382c668ce4f ("x86/vdso: Add support for exception fixup in vDSO 
functions")
Signed-off-by: Jan Beulich 

--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -35,7 +35,7 @@ static void BITSFUNC(extract)(const unsi
if (offset + len > data_len)
fail("section to extract overruns input data");
 
-   fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+   fprintf(outfile, "static const unsigned char %s[%zu] = {", name, len);
BITSFUNC(copy)(outfile, data + offset, len);
fprintf(outfile, "\n};\n\n");
 }
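
For illustration, a minimal user-space sketch (not part of the patch) of why
%zu is the portable choice for size_t:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	size_t len = 16;

	/*
	 * On typical 32-bit hosts size_t is "unsigned int", not
	 * "unsigned long", so "%lu" triggers a -Wformat warning there,
	 * while "%zu" matches size_t on every host.
	 */
	printf("len = %zu\n", len);
	return 0;
}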


Re: [PATCH v2 4/8] xen/netback: fix spurious event detection for common event case

2021-02-11 Thread Jan Beulich
On 11.02.2021 11:16, Juergen Gross wrote:
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -162,13 +162,15 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id)
>  {
>   struct xenvif_queue *queue = dev_id;
>   int old;
> + bool has_rx, has_tx;
>  
>   old = atomic_fetch_or(NETBK_COMMON_EOI, &queue->eoi_pending);
>   WARN(old, "Interrupt while EOI pending\n");
>  
> - /* Use bitwise or as we need to call both functions. */
> - if ((!xenvif_handle_tx_interrupt(queue) |
> -  !xenvif_handle_rx_interrupt(queue))) {
> + has_tx = xenvif_handle_tx_interrupt(queue);
> + has_rx = xenvif_handle_rx_interrupt(queue);
> +
> + if (!has_rx && !has_tx) {
>   atomic_andnot(NETBK_COMMON_EOI, &queue->eoi_pending);
>   xen_irq_lateeoi(irq, XEN_EOI_FLAG_SPURIOUS);
>   }
> 

Ah yes, what was originally meant really was

if (!(xenvif_handle_tx_interrupt(queue) |
  xenvif_handle_rx_interrupt(queue))) {

(also hinted at by the otherwise pointless inner parentheses),
which you simply write in an alternative way.
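
A contrived sketch (not taken from the driver) of the difference being
discussed - bitwise '|' evaluates both handler calls, while short-circuit
'&&' on the calls themselves would skip the second one:

#include <stdbool.h>
#include <stdio.h>

static bool handle_tx(void) { puts("tx handled"); return true; }
static bool handle_rx(void) { puts("rx handled"); return true; }

int main(void)
{
	/* Bitwise '|': both handlers always run. */
	if (!(handle_tx() | handle_rx()))
		puts("spurious");

	/*
	 * Writing "if (!handle_tx() && !handle_rx())" instead would skip
	 * handle_rx() whenever handle_tx() reports work - hence the
	 * separate has_tx/has_rx locals in the patch above.
	 */
	return 0;
}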

Reviewed-by: Jan Beulich 

Jan


Re: [PATCH 7/7] xen/evtchn: read producer index only once

2021-02-08 Thread Jan Beulich
On 08.02.2021 13:15, Jürgen Groß wrote:
> On 08.02.21 12:54, Jan Beulich wrote:
>> On 08.02.2021 11:59, Jürgen Groß wrote:
>>> On 08.02.21 11:51, Jan Beulich wrote:
>>>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>>>> order to avoid the compiler generating multiple accesses.
>>>>>>>
>>>>>>> Signed-off-by: Juergen Gross 
>>>>>>> ---
>>>>>>> drivers/xen/evtchn.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>>>> --- a/drivers/xen/evtchn.c
>>>>>>> +++ b/drivers/xen/evtchn.c
>>>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char 
>>>>>>> __user *buf,
>>>>>>> goto unlock_out;
>>>>>>> 
>>>>>>> c = u->ring_cons;
>>>>>>> -   p = u->ring_prod;
>>>>>>> +   p = READ_ONCE(u->ring_prod);
>>>>>>> if (c != p)
>>>>>>> break;
>>>>>>
>>>>>> Why only here and not also in
>>>>>>
>>>>>>  rc = wait_event_interruptible(u->evtchn_wait,
>>>>>>u->ring_cons != u->ring_prod);
>>>>>>
>>>>>> or in evtchn_poll()? I understand it's not needed when
>>>>>> ring_prod_lock is held, but that's not the case in the two
>>>>>> afaics named places. Plus isn't the same then true for
>>>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>>>> for ring_cons?
>>>>>
>>>>> The problem solved here is the further processing using "p" multiple
>>>>> times. p must not be silently replaced with u->ring_prod by the
>>>>> compiler, so I probably should reword the commit message to say:
>>>>>
>>>>> ... in order to not allow the compiler to refetch p.
>>>>
>>>> I still wouldn't understand the change (and the lack of
>>>> further changes) then: The first further use of p is
>>>> outside the loop, alongside one of c. IOW why would c
>>>> then not need treating the same as p?
>>>
>>> Its value wouldn't change, as ring_cons is being modified only at
>>> the bottom of this function, and nowhere else (apart from the reset
>>> case, but this can't run concurrently due to ring_cons_mutex).
>>>
>>>> I also still don't see the difference between latching a
>>>> value into a local variable vs a "freestanding" access -
>>>> neither are guaranteed to result in exactly one memory
>>>> access afaict.
>>>
>>> READ_ONCE() is using a pointer to volatile, so any refetching by
>>> the compiler would be a bug.
>>
>> Of course, but this wasn't my point. I was contrasting
>>
>>  c = u->ring_cons;
>>  p = u->ring_prod;
>>
>> which you change with
>>
>>  rc = wait_event_interruptible(u->evtchn_wait,
>>u->ring_cons != u->ring_prod);
>>
>> which you leave alone.
> 
> Can you point out which problem might arise from that?

Not any particular active one. Yet enhancing some accesses
but not others seems to me like a recipe for new problems
down the road.

Jan


Re: [PATCH 7/7] xen/evtchn: read producer index only once

2021-02-08 Thread Jan Beulich
On 08.02.2021 11:59, Jürgen Groß wrote:
> On 08.02.21 11:51, Jan Beulich wrote:
>> On 08.02.2021 11:41, Jürgen Groß wrote:
>>> On 08.02.21 10:48, Jan Beulich wrote:
>>>> On 06.02.2021 11:49, Juergen Gross wrote:
>>>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>>>> order to avoid the compiler generating multiple accesses.
>>>>>
>>>>> Signed-off-by: Juergen Gross 
>>>>> ---
>>>>>drivers/xen/evtchn.c | 2 +-
>>>>>1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>>>> index 421382c73d88..f6b199b597bf 100644
>>>>> --- a/drivers/xen/evtchn.c
>>>>> +++ b/drivers/xen/evtchn.c
>>>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char 
>>>>> __user *buf,
>>>>>   goto unlock_out;
>>>>>
>>>>>   c = u->ring_cons;
>>>>> - p = u->ring_prod;
>>>>> + p = READ_ONCE(u->ring_prod);
>>>>>   if (c != p)
>>>>>   break;
>>>>
>>>> Why only here and not also in
>>>>
>>>>rc = wait_event_interruptible(u->evtchn_wait,
>>>>  u->ring_cons != u->ring_prod);
>>>>
>>>> or in evtchn_poll()? I understand it's not needed when
>>>> ring_prod_lock is held, but that's not the case in the two
>>>> afaics named places. Plus isn't the same then true for
>>>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>>>> places plus evtchn_interrupt() also in need of READ_ONCE()
>>>> for ring_cons?
>>>
>>> The problem solved here is the further processing using "p" multiple
>>> times. p must not be silently replaced with u->ring_prod by the
>>> compiler, so I probably should reword the commit message to say:
>>>
>>> ... in order to not allow the compiler to refetch p.
>>
>> I still wouldn't understand the change (and the lack of
>> further changes) then: The first further use of p is
>> outside the loop, alongside one of c. IOW why would c
>> then not need treating the same as p?
> 
> Its value wouldn't change, as ring_cons is being modified only at
> the bottom of this function, and nowhere else (apart from the reset
> case, but this can't run concurrently due to ring_cons_mutex).
> 
>> I also still don't see the difference between latching a
>> value into a local variable vs a "freestanding" access -
>> neither are guaranteed to result in exactly one memory
>> access afaict.
> 
> READ_ONCE() is using a pointer to volatile, so any refetching by
> the compiler would be a bug.

Of course, but this wasn't my point. I was contrasting

c = u->ring_cons;
p = u->ring_prod;

which you change with

rc = wait_event_interruptible(u->evtchn_wait,
  u->ring_cons != u->ring_prod);

which you leave alone.

Jan


Re: [PATCH 7/7] xen/evtchn: read producer index only once

2021-02-08 Thread Jan Beulich
On 08.02.2021 11:41, Jürgen Groß wrote:
> On 08.02.21 10:48, Jan Beulich wrote:
>> On 06.02.2021 11:49, Juergen Gross wrote:
>>> In evtchn_read() use READ_ONCE() for reading the producer index in
>>> order to avoid the compiler generating multiple accesses.
>>>
>>> Signed-off-by: Juergen Gross 
>>> ---
>>>   drivers/xen/evtchn.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
>>> index 421382c73d88..f6b199b597bf 100644
>>> --- a/drivers/xen/evtchn.c
>>> +++ b/drivers/xen/evtchn.c
>>> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char 
>>> __user *buf,
>>> goto unlock_out;
>>>   
>>> c = u->ring_cons;
>>> -   p = u->ring_prod;
>>> +   p = READ_ONCE(u->ring_prod);
>>> if (c != p)
>>> break;
>>
>> Why only here and not also in
>>
>>  rc = wait_event_interruptible(u->evtchn_wait,
>>u->ring_cons != u->ring_prod);
>>
>> or in evtchn_poll()? I understand it's not needed when
>> ring_prod_lock is held, but that's not the case in the two
>> afaics named places. Plus isn't the same then true for
>> ring_cons and ring_cons_mutex, i.e. aren't the two named
>> places plus evtchn_interrupt() also in need of READ_ONCE()
>> for ring_cons?
> 
> The problem solved here is the further processing using "p" multiple
> times. p must not be silently replaced with u->ring_prod by the
> compiler, so I probably should reword the commit message to say:
> 
> ... in order to not allow the compiler to refetch p.

I still wouldn't understand the change (and the lack of
further changes) then: The first further use of p is
outside the loop, alongside one of c. IOW why would c
then not need treating the same as p?

I also still don't see the difference between latching a
value into a local variable vs a "freestanding" access -
neither are guaranteed to result in exactly one memory
access afaict.

And of course there's also our beloved topic of access
tearing here: READ_ONCE() also excludes that (at least as
per its intentions aiui).
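
For illustration, a stand-alone sketch (not the driver code) of the two
properties under discussion - no refetching and, for machine-word sizes, no
tearing:

#include <stdio.h>

/*
 * Simplified stand-in for the kernel's READ_ONCE(): a single volatile
 * access, which the compiler must not refetch and which, for naturally
 * aligned machine-word sizes, is expected not to be torn.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static unsigned int ring_prod;	/* stand-in for u->ring_prod */

int main(void)
{
	/*
	 * With a plain read, the compiler is free to replace later uses
	 * of 'p' with fresh loads of ring_prod; with READ_ONCE(), 'p' is
	 * latched by exactly one load.
	 */
	unsigned int p = READ_ONCE(ring_prod);

	printf("%u %u\n", p, p);
	return 0;
}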

Jan


Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring

2021-02-08 Thread Jan Beulich
On 08.02.2021 11:23, Andrew Cooper wrote:
> On 08/02/2021 09:50, Jan Beulich wrote:
>> On 08.02.2021 10:44, Andrew Cooper wrote:
>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>> The ring buffer for user events is used in the local system only, so
>>>> smp barriers are fine for ensuring consistency.
>>>>
>>>> Reported-by: Andrew Cooper 
>>>> Signed-off-by: Juergen Gross 
>>> These need to be virt_* to not break in UP builds (on non-x86).
>> Initially I thought so, too, but isn't the sole vCPU of such a
>> VM getting re-scheduled to a different pCPU in the hypervisor
>> an implied barrier anyway?
> 
> Yes, but that isn't relevant to why UP builds break.
> 
> smp_*() degrade to compiler barriers in UP builds, and while that's
> mostly fine for x86 read/write, it's not fine for ARM barriers.

Hmm, I may not know enough of Arm's memory model - are you saying
Arm CPUs aren't even self-coherent, i.e. later operations (e.g.
the consuming of ring contents) won't observe earlier ones (the
updating of ring contents) when only a single physical CPU is
involved in all of this? (I did mention the hypervisor level
context switch simply because that's the only way multiple CPUs
can get involved.)
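
For reference, a sketch of the distinction (the ring structure and names
below are made up for illustration, and the kernel's virt_wmb()/virt_rmb()/
READ_ONCE()/WRITE_ONCE() helpers are assumed to be in scope): smp_*()
compile down to bare compiler barriers on CONFIG_SMP=n, while virt_*() keep
the real barrier because the other side of the ring still runs concurrently.

#define RING_SIZE	16
#define RING_MASK	(RING_SIZE - 1)

struct demo_ring {
	unsigned int prod, cons;
	unsigned int data[RING_SIZE];
};

static void demo_produce(struct demo_ring *r, unsigned int item)
{
	r->data[r->prod & RING_MASK] = item;
	virt_wmb();	/* publish the data before the new index */
	WRITE_ONCE(r->prod, r->prod + 1);
}

static bool demo_consume(struct demo_ring *r, unsigned int *item)
{
	if (r->cons == READ_ONCE(r->prod))
		return false;
	virt_rmb();	/* read the index before the data it covers */
	*item = r->data[r->cons++ & RING_MASK];
	return true;
}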

Jan


Re: [PATCH 2/7] xen/events: don't unmask an event channel when an eoi is pending

2021-02-08 Thread Jan Beulich
On 06.02.2021 11:49, Juergen Gross wrote:
> @@ -1798,6 +1818,29 @@ static void mask_ack_dynirq(struct irq_data *data)
>   ack_dynirq(data);
>  }
>  
> +static void lateeoi_ack_dynirq(struct irq_data *data)
> +{
> + struct irq_info *info = info_for_irq(data->irq);
> + evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> + if (VALID_EVTCHN(evtchn)) {
> + info->eoi_pending = true;
> + mask_evtchn(evtchn);
> + }
> +}
> +
> +static void lateeoi_mask_ack_dynirq(struct irq_data *data)
> +{
> + struct irq_info *info = info_for_irq(data->irq);
> + evtchn_port_t evtchn = info ? info->evtchn : 0;
> +
> + if (VALID_EVTCHN(evtchn)) {
> + info->masked = true;
> + info->eoi_pending = true;
> + mask_evtchn(evtchn);
> + }
> +}
> +
>  static int retrigger_dynirq(struct irq_data *data)
>  {
>   evtchn_port_t evtchn = evtchn_from_irq(data->irq);
> @@ -2023,8 +2066,8 @@ static struct irq_chip xen_lateeoi_chip __read_mostly = 
> {
>   .irq_mask   = disable_dynirq,
>   .irq_unmask = enable_dynirq,
>  
> - .irq_ack= mask_ack_dynirq,
> - .irq_mask_ack   = mask_ack_dynirq,
> + .irq_ack= lateeoi_ack_dynirq,
> + .irq_mask_ack   = lateeoi_mask_ack_dynirq,
>  
>   .irq_set_affinity   = set_affinity_irq,
>   .irq_retrigger  = retrigger_dynirq,
> 

Unlike the prior handler the two new ones don't call ack_dynirq()
anymore, and the description doesn't give a hint towards this
difference. As a consequence, clear_evtchn() also doesn't get
called anymore - patch 3 adds the calls, but claims an older
commit to have been at fault. _If_ ack_dynirq() indeed isn't to
be called here, shouldn't the clear_evtchn() calls get added
right here?
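
To make the question concrete, a sketch of what adding the call here might
look like (illustrative only, not a proposed patch, and assuming
clear_evtchn() is indeed what's wanted):

static void lateeoi_ack_dynirq(struct irq_data *data)
{
	struct irq_info *info = info_for_irq(data->irq);
	evtchn_port_t evtchn = info ? info->evtchn : 0;

	if (VALID_EVTCHN(evtchn)) {
		info->eoi_pending = true;
		mask_evtchn(evtchn);
		clear_evtchn(evtchn);	/* the part ack_dynirq() provided */
	}
}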

Jan


Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring

2021-02-08 Thread Jan Beulich
On 08.02.2021 10:44, Andrew Cooper wrote:
> On 06/02/2021 10:49, Juergen Gross wrote:
>> The ring buffer for user events is used in the local system only, so
>> smp barriers are fine for ensuring consistency.
>>
>> Reported-by: Andrew Cooper 
>> Signed-off-by: Juergen Gross 
> 
> These need to be virt_* to not break in UP builds (on non-x86).

Initially I thought so, too, but isn't the sole vCPU of such a
VM getting re-scheduled to a different pCPU in the hypervisor
an implied barrier anyway?

Jan


Re: [PATCH 7/7] xen/evtchn: read producer index only once

2021-02-08 Thread Jan Beulich
On 06.02.2021 11:49, Juergen Gross wrote:
> In evtchn_read() use READ_ONCE() for reading the producer index in
> order to avoid the compiler generating multiple accesses.
> 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/xen/evtchn.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 421382c73d88..f6b199b597bf 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -211,7 +211,7 @@ static ssize_t evtchn_read(struct file *file, char __user 
> *buf,
>   goto unlock_out;
>  
>   c = u->ring_cons;
> - p = u->ring_prod;
> + p = READ_ONCE(u->ring_prod);
>   if (c != p)
>   break;

Why only here and not also in

rc = wait_event_interruptible(u->evtchn_wait,
  u->ring_cons != u->ring_prod);

or in evtchn_poll()? I understand it's not needed when
ring_prod_lock is held, but that's not the case in the two
afaics named places. Plus isn't the same then true for
ring_cons and ring_cons_mutex, i.e. aren't the two named
places plus evtchn_interrupt() also in need of READ_ONCE()
for ring_cons?

Jan


Re: [PATCH 6/7] xen/evtch: use smp barriers for user event ring

2021-02-08 Thread Jan Beulich
On 06.02.2021 11:49, Juergen Gross wrote:
> The ring buffer for user events is used in the local system only, so
> smp barriers are fine for ensuring consistency.
> 
> Reported-by: Andrew Cooper 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 

Albeit I think "local system" is at least ambiguous (physical
machine? VM?). How about something like "is local to the given
kernel instance"?

Jan


Re: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Jan Beulich
On 27.01.2021 12:33, Paul Durrant wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: 27 January 2021 11:21
>> To: p...@xen.org
>> Cc: 'Paul Durrant' ; 'Konrad Rzeszutek Wilk' 
>> ; 'Roger Pau
>> Monné' ; 'Jens Axboe' ; 'Dongli Zhang'
>> ; linux-kernel@vger.kernel.org; 
>> linux-bl...@vger.kernel.org; xen-
>> de...@lists.xenproject.org
>> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page 
>> rings
>>
>> On 27.01.2021 12:09, Paul Durrant wrote:
>>>> -Original Message-
>>>> From: Jan Beulich 
>>>> Sent: 27 January 2021 10:57
>>>> To: Paul Durrant 
>>>> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
>>>> ; Roger Pau
>>>> Monné ; Jens Axboe ; Dongli Zhang 
>>>> ;
>>>> linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; 
>>>> xen-de...@lists.xenproject.org
>>>> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page 
>>>> rings
>>>>
>>>> On 27.01.2021 11:30, Paul Durrant wrote:
>>>>> From: Paul Durrant 
>>>>>
>>>>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
>>>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
>>>>> behaviour of xen-blkback when connecting to a frontend was:
>>>>>
>>>>> - read 'ring-page-order'
>>>>> - if not present then expect a single page ring specified by 'ring-ref'
>>>>> - else expect a ring specified by 'ring-refX' where X is between 0 and
>>>>>   1 << ring-page-order
>>>>>
>>>>> This was correct behaviour, but was broken by the aforementioned commit
>>>>> to
>>>>> become:
>>>>>
>>>>> - read 'ring-page-order'
>>>>> - if not present then expect a single page ring
>>>>> - expect a ring specified by 'ring-refX' where X is between 0 and
>>>>>   1 << ring-page-order
>>>>> - if that didn't work then see if there's a single page ring specified by
>>>>>   'ring-ref'
>>>>>
>>>>> This incorrect behaviour works most of the time but fails when a frontend
>>>>> that sets 'ring-page-order' is unloaded and replaced by one that does not
>>>>> because, instead of reading 'ring-ref', xen-blkback will read the stale
>>>>> 'ring-ref0' left around by the previous frontend and will try to map the wrong
>>>>> grant reference.
>>>>>
>>>>> This patch restores the original behaviour.
>>>>
>>>> Isn't this only the 2nd of a pair of fixes that's needed, the
>>>> first being the driver, upon being unloaded, to fully clean up
>>>> after itself? Any stale key left may lead to confusion upon
>>>> re-use of the containing directory.
>>>
>>> In a backend we shouldn't be relying on, nor really expect IMO, a frontend 
>>> to clean up after itself.
>> Any backend should know *exactly* what xenstore nodes it’s looking for from 
>> a frontend.
>>
>> But the backend can't know whether a node exists because the present
>> frontend has written it, or because an earlier instance forgot to
>> delete it. It can only honor what's there. (In fact the other day I
>> was wondering whether some of the writes of boolean "false" nodes
>> wouldn't better be xenbus_rm() instead.)
> 
> In the particular case this patch is fixing for me, the frontends are the 
> Windows XENVBD driver and the Windows crash version of the same driver 
> (actually built from different code). The 'normal' instance is multi-page 
> aware and the crash instance is not quite, i.e. it uses the old ring-ref but 
> knows to clean up 'ring-page-order'.
> Clearly, in a crash situation, we cannot rely on frontend to clean up

Ah, I see (and agree).

> so what you say does highlight that there indeed needs to be a second patch 
> to xen-blkback to make sure it removes 'ring-page-order' itself as 'state' 
> cycles through Closed and back to InitWait.

And not just this one node then, I suppose?

> I think this patch does still stand on its own though.

Perhaps, yes.

Jan


Re: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Jan Beulich
On 27.01.2021 12:09, Paul Durrant wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: 27 January 2021 10:57
>> To: Paul Durrant 
>> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
>> ; Roger Pau
>> Monné ; Jens Axboe ; Dongli Zhang 
>> ;
>> linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; 
>> xen-de...@lists.xenproject.org
>> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page 
>> rings
>>
>> On 27.01.2021 11:30, Paul Durrant wrote:
>>> From: Paul Durrant 
>>>
>>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
>>> behaviour of xen-blkback when connecting to a frontend was:
>>>
>>> - read 'ring-page-order'
>>> - if not present then expect a single page ring specified by 'ring-ref'
>>> - else expect a ring specified by 'ring-refX' where X is between 0 and
>>>   1 << ring-page-order
>>>
>>> This was correct behaviour, but was broken by the aforementioned commit to
>>> become:
>>>
>>> - read 'ring-page-order'
>>> - if not present then expect a single page ring
>>> - expect a ring specified by 'ring-refX' where X is between 0 and
>>>   1 << ring-page-order
>>> - if that didn't work then see if there's a single page ring specified by
>>>   'ring-ref'
>>>
>>> This incorrect behaviour works most of the time but fails when a frontend
>>> that sets 'ring-page-order' is unloaded and replaced by one that does not
>>> because, instead of reading 'ring-ref', xen-blkback will read the stale
>>> 'ring-ref0' left around by the previous frontend and will try to map the wrong
>>> grant reference.
>>>
>>> This patch restores the original behaviour.
>>
>> Isn't this only the 2nd of a pair of fixes that's needed, the
>> first being the driver, upon being unloaded, to fully clean up
>> after itself? Any stale key left may lead to confusion upon
>> re-use of the containing directory.
> 
> In a backend we shouldn't be relying on, nor really expect IMO, a frontend to 
> clean up after itself. Any backend should know *exactly* what xenstore nodes 
> it’s looking for from a frontend.

But the backend can't know whether a node exists because the present
frontend has written it, or because an earlier instance forgot to
delete it. It can only honor what's there. (In fact the other day I
was wondering whether some of the writes of boolean "false" nodes
wouldn't better be xenbus_rm() instead.)
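
For illustration, the two alternatives mentioned (a hedged sketch; the
"feature-foo" node and the surrounding function are made up, and whether
removal is actually the right thing to do is exactly the open question):

#include <xen/xenbus.h>

static void example_clear_feature(struct xenbus_device *dev)
{
	/* Writing a boolean "false" node, as done today: */
	xenbus_printf(XBT_NIL, dev->nodename, "feature-foo", "%u", 0);

	/*
	 * ... versus removing the node, so a later reader sees "absent"
	 * rather than a stale key:
	 */
	xenbus_rm(XBT_NIL, dev->nodename, "feature-foo");
}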

Jan


Re: [PATCH v2] x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled

2021-01-27 Thread Jan Beulich
On 27.01.2021 11:26, Jürgen Groß wrote:
> On 27.01.21 10:43, Andrew Cooper wrote:
>> On 27/01/2021 09:38, Juergen Gross wrote:
>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>> index 4409306364dc..ca5ac10fcbf7 100644
>>> --- a/arch/x86/xen/enlighten_pv.c
>>> +++ b/arch/x86/xen/enlighten_pv.c
>>> @@ -583,6 +583,12 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
>>> exc_debug(regs);
>>>   }
>>>   
>>> +DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap)
>>> +{
>>> +   /* This should never happen and there is no way to handle it. */
>>> +   panic("Unknown trap in Xen PV mode.");
>>
>> Looks much better.  How about including regs->entry_vector here, just to
>> short circuit the inevitable swearing which will accompany encountering
>> this panic() ?
> 
> You are aware the regs parameter is struct pt_regs *, not the Xen
> struct cpu_user_regs *?
> 
> So I have no idea how I should get this information without creating
> a per-vector handler.

Maybe log _RET_IP_ then (ideally decoded to a symbol), to give at
least some hint as to where this was coming from?
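
A sketch of one way to take that suggestion ('%pS' decodes an address to a
symbol; whether _RET_IP_ is the most useful value to log here is part of the
question):

DEFINE_IDTENTRY_RAW(exc_xen_unknown_trap)
{
	/* This should never happen and there is no way to handle it. */
	panic("Unknown trap in Xen PV mode (from %pS).\n",
	      (void *)_RET_IP_);
}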

Jan


Re: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Jan Beulich
On 27.01.2021 11:30, Paul Durrant wrote:
> From: Paul Durrant 
> 
> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> behaviour of xen-blkback when connecting to a frontend was:
> 
> - read 'ring-page-order'
> - if not present then expect a single page ring specified by 'ring-ref'
> - else expect a ring specified by 'ring-refX' where X is between 0 and
>   1 << ring-page-order
> 
> This was correct behaviour, but was broken by the aforementioned commit to
> become:
> 
> - read 'ring-page-order'
> - if not present then expect a single page ring
> - expect a ring specified by 'ring-refX' where X is between 0 and
>   1 << ring-page-order
> - if that didn't work then see if there's a single page ring specified by
>   'ring-ref'
> 
> This incorrect behaviour works most of the time but fails when a frontend
> that sets 'ring-page-order' is unloaded and replaced by one that does not
> because, instead of reading 'ring-ref', xen-blkback will read the stale
> 'ring-ref0' left around by the previous frontend and will try to map the wrong
> grant reference.
> 
> This patch restores the original behaviour.

Isn't this only the 2nd of a pair of fixes that's needed, the
first being the driver, upon being unloaded, to fully clean up
after itself? Any stale key left may lead to confusion upon
re-use of the containing directory.

Jan


Re: [PATCH v2] xen-blkfront: allow discard-* nodes to be optional

2021-01-20 Thread Jan Beulich
On 20.01.2021 15:35, Roger Pau Monné wrote:
> On Wed, Jan 20, 2021 at 03:23:30PM +0100, Arthur Borsboom wrote:
>> Hi Roger,
>>
>> I have set up a test environment based on Linux 5.11.0-rc4.
>> The patch did not apply clean, so I copied/pasted the patch manually.
>>
>> Without the patch the call trace (as reported) is visible in dmesg.
>> With the patch the call trace in dmesg is gone, but ... (there is always a
>> but) ...
>>
>> Now the discard action returns the following.
>>
>> [arthur@test-arch ~]$ sudo fstrim -v /
>> fstrim: /: the discard operation is not supported
>>
>> It might be correct, but of course I was hoping the Xen VM guest would pass
>> on the discard request to the block device in the Xen VM host, which is a
>> disk partition.
>> Any suggestions?
> 
> Hm, that's not what I did see on my testing, the operation worked OK,
> and that's what I would expect to happen in your case also, since I
> know the xenstore keys.

Does discard work on partitions (and not just on entire disks)?
It's been a while since I last had anything to do with this code,
so I may well misremember.

Jan


Re: [PATCH v2 3/5] xen/events: only register debug interrupt for 2-level events

2020-10-22 Thread Jan Beulich
On 22.10.2020 15:09, Jürgen Groß wrote:
> On 22.10.20 12:35, Jan Beulich wrote:
>> On 22.10.2020 11:49, Juergen Gross wrote:
>>> @@ -2080,10 +2080,12 @@ void __init xen_init_IRQ(void)
>>> int ret = -EINVAL;
>>> evtchn_port_t evtchn;
>>>   
>>> -   if (fifo_events)
>>> +   if (xen_fifo_events)
>>> ret = xen_evtchn_fifo_init();
>>> -   if (ret < 0)
>>> +   if (ret < 0) {
>>> xen_evtchn_2l_init();
>>> +   xen_fifo_events = false;
>>> +   }
>>
>> Another note: While it may not matter right here, maybe better
>> first set the variable and the call the function?
> 
> I don't think this is really important, TBH.
> 
> This code is executed before other cpus are up and we'd have major
> other problems in case the sequence would really matter here.

Fair enough; I was thinking in particular that it ought to be
legitimate for xen_evtchn_2l_init() to BUG_ON(xen_fifo_events).
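
In other words, a sketch of the ordering described above (not a requested
change):

	if (xen_fifo_events)
		ret = xen_evtchn_fifo_init();
	if (ret < 0) {
		/* Flip the flag first ... */
		xen_fifo_events = false;
		/* ... so this could even BUG_ON(xen_fifo_events). */
		xen_evtchn_2l_init();
	}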

Jan


Re: [PATCH v2 3/5] xen/events: only register debug interrupt for 2-level events

2020-10-22 Thread Jan Beulich
On 22.10.2020 11:49, Juergen Gross wrote:
> @@ -2080,10 +2080,12 @@ void __init xen_init_IRQ(void)
>   int ret = -EINVAL;
>   evtchn_port_t evtchn;
>  
> - if (fifo_events)
> + if (xen_fifo_events)
>   ret = xen_evtchn_fifo_init();
> - if (ret < 0)
> + if (ret < 0) {
>   xen_evtchn_2l_init();
> + xen_fifo_events = false;
> + }

Another note: While it may not matter right here, maybe better
first set the variable and the call the function?

Jan


Re: [PATCH 0/5] xen: event handling cleanup

2020-10-22 Thread Jan Beulich
On 22.10.2020 09:42, Juergen Gross wrote:
> Do some cleanups in Xen event handling code.
> 
> Juergen Gross (5):
>   xen: remove no longer used functions
>   xen/events: make struct irq_info private to events_base.c
>   xen/events: only register debug interrupt for 2-level events
>   xen/events: unmask a fifo event channel only if it was masked
>   Documentation: add xen.fifo_events kernel parameter description

With the two remarks to individual patches suitably taken care of
one way or another
Reviewed-by: Jan Beulich 

Jan


Re: [PATCH 4/5] xen/events: unmask a fifo event channel only if it was masked

2020-10-22 Thread Jan Beulich
On 22.10.2020 09:42, Juergen Gross wrote:
> --- a/drivers/xen/events/events_fifo.c
> +++ b/drivers/xen/events/events_fifo.c
> @@ -236,6 +236,9 @@ static bool clear_masked_cond(volatile event_word_t *word)
>  
>   w = *word;
>  
> + if (!(w & (1 << EVTCHN_FIFO_MASKED)))
> + return true;

Maybe better move this ...

>   do {
>   if (w & (1 << EVTCHN_FIFO_PENDING))
>   return false;
> 

... into the loop, above this check?
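
I.e. something along these lines (sketch only; the retry logic is the
pre-existing code, elided here):

	do {
		if (!(w & (1 << EVTCHN_FIFO_MASKED)))
			return true;

		if (w & (1 << EVTCHN_FIFO_PENDING))
			return false;

		/* ... existing cmpxchg retry logic unchanged ... */
	} while (w != old);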

Jan


Re: [PATCH 3/5] xen/events: only register debug interrupt for 2-level events

2020-10-22 Thread Jan Beulich
On 22.10.2020 09:42, Juergen Gross wrote:
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -2050,7 +2050,7 @@ void xen_setup_callback_vector(void) {}
>  static inline void xen_alloc_callback_vector(void) {}
>  #endif
>  
> -static bool fifo_events = true;
> +bool fifo_events = true;

When making this non-static, perhaps better to also prefix it with
xen_?

Jan


Re: [PATCH 1/2] xen: Remove Xen PVH/PVHVM dependency on PCI

2020-10-15 Thread Jan Beulich
On 15.10.2020 16:59, Jason Andryuk wrote:
> On Thu, Oct 15, 2020 at 4:10 AM Jan Beulich  wrote:
>>
>> On 14.10.2020 19:53, Jason Andryuk wrote:
>>> @@ -76,7 +80,9 @@ config XEN_DEBUG_FS
>>> Enabling this option may incur a significant performance overhead.
>>>
>>>  config XEN_PVH
>>> - bool "Support for running as a Xen PVH guest"
>>> + bool "Xen PVH guest support"
>>
>> Tangential question: Is "guest" here still appropriate, i.e.
>> isn't this option also controlling whether the kernel can be
>> used in a PVH Dom0?
> 
> Would you like something more generic like "Xen PVH support" and
> "Support for running in Xen PVH mode"?

Yeah, just dropping "guest" would be fine with me. No idea how
to reflect that PVH Dom0 isn't supported, yet.

Jan


Re: [PATCH 1/2] xen: Remove Xen PVH/PVHVM dependency on PCI

2020-10-15 Thread Jan Beulich
On 14.10.2020 19:53, Jason Andryuk wrote:
> @@ -76,7 +80,9 @@ config XEN_DEBUG_FS
> Enabling this option may incur a significant performance overhead.
>  
>  config XEN_PVH
> - bool "Support for running as a Xen PVH guest"
> + bool "Xen PVH guest support"

Tangential question: Is "guest" here still appropriate, i.e.
isn't this option also controlling whether the kernel can be
used in a PVH Dom0?

>   def_bool n

And is this default still appropriate?

Jan


Re: [PATCH] x86/xen: disable Firmware First mode for correctable memory errors

2020-09-25 Thread Jan Beulich
On 25.09.2020 15:45, boris.ostrov...@oracle.com wrote:
> On 9/25/20 6:11 AM, Juergen Gross wrote:
>> @@ -1296,6 +1296,14 @@ asmlinkage __visible void __init 
>> xen_start_kernel(void)
>>  
>>  xen_smp_init();
>>  
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Disable selecting "Firmware First mode" for correctable memory
>> + * errors, as this is the duty of the hypervisor to decide.
>> + */
>> +acpi_disable_cmcff = 1;
>> +#endif
> 
> 
> Not that it matters greatly but should this go under if (xen_initial_domain())
> clause a bit further down?

Yes - DomU-s are supposed to be in charge of their (virtual) firmware,
no matter that right now APEI for guests is completely out of sight as
far as I'm aware.
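
I.e. a sketch of the placement being suggested (illustrative only):

	if (xen_initial_domain()) {
#ifdef CONFIG_ACPI
		/*
		 * Disable selecting "Firmware First mode" for correctable
		 * memory errors, as this is the duty of the hypervisor to
		 * decide.
		 */
		acpi_disable_cmcff = 1;
#endif
		/* ... existing Dom0-only setup ... */
	}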

Jan


Re: [PATCH] efi: discover ESRT table on Xen PV too

2020-08-19 Thread Jan Beulich
On 18.08.2020 14:47, Roger Pau Monné wrote:
> On Tue, Aug 18, 2020 at 02:01:35PM +0200, Marek Marczykowski-Górecki wrote:
>> On Mon, Aug 17, 2020 at 11:00:13AM +0200, Roger Pau Monné wrote:
>>> On Sun, Aug 16, 2020 at 02:19:49AM +0200, Marek Marczykowski-Górecki wrote:
 In case of Xen PV dom0, Xen passes along info about system tables (see
 arch/x86/xen/efi.c), but not the memory map from EFI.
>>>
>>> I think that's because the memory map returned by
>>> XENMEM_machine_memory_map is in e820 form, and doesn't contain the
>>> required information about the EFI regions due to the translation done
>>> by efi_arch_process_memory_map in Xen?
>>
>> Yes, I think so.
>>
 This makes sense
 as it is Xen that is responsible for managing the physical memory address
 space. In this case, it doesn't make sense to make use of the ESRT table
 conditional on the availability of the EFI memory map, as it isn't the
 Linux kernel that is responsible for it.
>>>
>>> PV dom0 is kind of special in that regard as it can create mappings to
>>> (almost) any MMIO regions, and hence can change it's memory map
>>> substantially.
>>
>> Do you mean PV dom0 should receive full EFI memory map? Jan already
>> objected this as it would be a layering violation.
> 
> dom0 is already capable of getting the native e820 memory map using
> XENMEM_machine_memory_map, I'm not sure why allowing to return the
> memory map in EFI form would be any different (or a layering
> violation in the PV dom0 case).

The EFI memory map exposes more information than the E820 one, and
this extra information should remain private to Xen if at all
possible. I actually think that exposing the raw E820 map was a
layering violation, too. Instead hypercalls should have been added
for the specific legitimate uses that Dom0 may have for the memmap.

Jan


[PATCH] x86/mm: keep __default_kernel_pte_mask in sync with __supported_pte_mask

2020-05-27 Thread Jan Beulich
Both masks get applied in the process of e.g. set_fixmap() - the former
through use of PAGE_KERNEL, the latter by use of massage_pgprot(). Hence
forever since the introduction of the former there was a time window
(between x86_configure_nx() and the syncing of the two in
probe_page_size_mask(), as called from init_mem_mapping()) where fixmap
mappings would get established without NX set. For a 32-bit kernel
running in PV mode under Xen this meant a W+X mapping (and associated
warning) for its shared info page mapping established in
xen_pv_init_platform().

Signed-off-by: Jan Beulich 

--- a/arch/x86/mm/setup_nx.c
+++ b/arch/x86/mm/setup_nx.c
@@ -33,10 +33,13 @@ early_param("noexec", noexec_setup);
 
 void x86_configure_nx(void)
 {
-   if (boot_cpu_has(X86_FEATURE_NX) && !disable_nx)
+   if (boot_cpu_has(X86_FEATURE_NX) && !disable_nx) {
__supported_pte_mask |= _PAGE_NX;
-   else
+   __default_kernel_pte_mask |= _PAGE_NX;
+   } else {
__supported_pte_mask &= ~_PAGE_NX;
+   __default_kernel_pte_mask &= ~_PAGE_NX;
+   }
 }
 
 void __init x86_report_nx(void)


Re: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

2020-05-27 Thread Jan Beulich
On 21.05.2020 17:23, Steven Price wrote:
> By switching the x86 page table dump code to use the generic code the
> effective permissions are no longer calculated correctly because the
> note_page() function is only called for *leaf* entries. To calculate the
> actual effective permissions it is necessary to observe the full
> hierarchy of the page tree.
> 
> Introduce a new callback for ptdump which is called for every entry and
> can therefore update the prot_levels array correctly. note_page() can
> then simply access the appropriate element in the array.
> 
> Reported-by: Jan Beulich 
> Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use 
> walk_page_range")
> Signed-off-by: Steven Price 

This (with the later correction) and the 2nd patch
Tested-by: Jan Beulich 

It allowed me to go and finally find why under Xen there was still
a single W+X mapping left - another bug, another patch.

Thanks, Jan


[PATCH] ix86: keep page table dumping off hypervisor area

2020-05-19 Thread Jan Beulich
Commit bba42affa732 ("x86/mm: Fix dump_pagetables with Xen PV") took
care of only the 64-bit case, albeit I think the 32-bit issue predates
the commit mentioned there in the Fixes: tag. For the 32-bit case, as
long as this space is mapped (by the hypervisor) with large pages, no
(noticable) harm results from walking even that area. If there are 4k
mappings, though, HIGHPTE=y configurations wanting to also dump that
range will find a need to map page tables owned by the hypervisor (which
hence doesn't allow them to be mapped). With DEBUG_WX this in turn
causes booting to fail.

Signed-off-by: Jan Beulich 

--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -366,7 +366,7 @@ static void ptdump_walk_pgd_level_core(s
{0, PTRS_PER_PGD * PGD_LEVEL_MULT / 2},
{GUARD_HOLE_END_ADDR, ~0UL},
 #else
-   {0, ~0UL},
+   {0, __FIXADDR_TOP + PAGE_SIZE ?: ~0UL},
 #endif
{0, 0}
 };


Re: your "x86: mm: convert dump_pagetables to use walk_page_range" change

2020-05-12 Thread Jan Beulich
On 12.05.2020 15:02, Steven Price wrote:
> On 12/05/2020 10:39, Jan Beulich wrote:
>> Additionally I'd like to note that note_page()'s "unsigned long val"
>> parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
>> flag will always be seen as clear in new_prot in such configs.
> 
> Ah, interesting. I'm not sure what type is actually guaranteed to be 
> correct. pgprotval_t is x86 specific, but it might be necessary to 
> extend it to other architectures. I think I got the "unsigned long" from 
> the generic page.h (and because it happens to work on most 
> architectures) - but hadn't noticed that that file was specifically only 
> for NOMMU architectures.

Well, it's pteval_t (still x86-specific I guess) unless the call
sites could/would be switched to use pte_flags(). As per
https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg00549.html
the use of pte_val() has another bad effect, but as described
there it might also be Xen specific code which is broken here.

> I'll see if I can come up with fixes, but if you've got anything ready 
> already then please jump in.

No, I don't. I lack the time right now to make a change potentially
affecting multiple architectures.

Jan


your "x86: mm: convert dump_pagetables to use walk_page_range" change

2020-05-12 Thread Jan Beulich
Steven,

in the description of this change you say:

"The effective permissions are passed down the chain using new fields in
 struct pg_state."

I don't see how this works, and I suppose this part of the change is
(part of) the reason why a W+X warning has magically disappeared in
5.6.x (compared to 5.5.x) when running a 32-bit kernel under Xen.

Quoting the relevant piece of code:

if (level > 0) {
new_eff = effective_prot(st->prot_levels[level - 1],
 new_prot);
} else {
new_eff = new_prot;
}

if (level >= 0)
st->prot_levels[level] = new_eff;

The generic framework calls note_page() only for leaf pages or holes
afaics. The protections for a leaf page found at a level other than
the numerically highest one have no meaning at all for a mapping at
a later address mapped with a numerically higher level mapping.
Instead it's the non-leaf page tables for that specific address
which determine the effective protection for any particular mapping.

To take an example, suppose the first present leaf page is found
at level 4. st->prot_levels[] will be all zero at this time, from
which it follows that new_eff will be zero then, too.

I don't think the intended effect can be achieved without either
retaining the original behavior of passing the effective protection
into note_page(), or calling note_page() also for non-leaf pages
(indicating to it which case it is, and adjusting it accordingly).

Am I overlooking something?
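
For reference, the combination rule itself (a sketch of x86's
effective_prot(), reproduced from memory, so take the details with a grain
of salt) shows why its inputs have to be the ancestors of the address at
hand: user/write permissions are an AND across levels, while NX is an OR:

static pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
{
	return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
	       ((prot1 | prot2) & _PAGE_NX);
}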

Additionally I'd like to note that note_page()'s "unsigned long val"
parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
flag will always be seen as clear in new_prot in such configs.

Jan


[PATCH] x86/stackframe/32: repair 32-bit Xen PV

2019-10-07 Thread Jan Beulich
Once again RPL checks have been introduced which don't account for a
32-bit kernel living in ring 1 when running in a PV Xen domain. The
case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3
as well just in case.

Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich 

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -48,6 +48,17 @@
 
 #include "calling.h"
 
+#ifndef CONFIG_XEN_PV
+# define USER_SEGMENT_RPL_MASK SEGMENT_RPL_MASK
+#else
+/*
+ * When running paravirtualized on Xen the kernel runs in ring 1, and hence
+ * simple mask based tests (i.e. ones not comparing against USER_RPL) have to
+ * ignore bit 0. See also the C-level get_kernel_rpl().
+ */
+# define USER_SEGMENT_RPL_MASK (SEGMENT_RPL_MASK & ~1)
+#endif
+
.section .entry.text, "ax"
 
 /*
@@ -172,7 +183,7 @@
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
.if \no_user_check == 0
/* coming from usermode? */
-   testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
+   testl   $USER_SEGMENT_RPL_MASK, PT_CS(%esp)
jz  .Lend_\@
.endif
/* On user-cr3? */
@@ -217,7 +228,7 @@
testl   $X86_EFLAGS_VM, 4*4(%esp)
jnz .Lfrom_usermode_no_fixup_\@
 #endif
-   testl   $SEGMENT_RPL_MASK, 3*4(%esp)
+   testl   $USER_SEGMENT_RPL_MASK, 3*4(%esp)
jnz .Lfrom_usermode_no_fixup_\@
 
orl $CS_FROM_KERNEL, 3*4(%esp)


Re: [Xen-devel] [PATCH] x86/xen: Return from panic notifier

2019-10-02 Thread Jan Beulich
On 02.10.2019 16:14, Boris Ostrovsky wrote:
> On 10/2/19 9:42 AM, Jan Beulich wrote:
>>
>> I can only guess that the thinking probably was that e.g. external
>> dumping (by the tool stack) would be more reliable (including but
>> not limited to this meaning less change of state from when the
>> original crash reason was detected) than having the domain dump
>> itself.
> 
> 
> We could register an external dumper (controlled by a boot option
> perhaps, off by default) that will call directly into hypervisor with
> SHUTDOWN_crash. That will guarantee that we will complete the notifier
> chain without relying on priorities. (Of course this still won't address
> a possible new feature in panic() that might be called post-dumping)
> 
> If you think it's worth doing this can be easily added.

Well, I think this is the better option than potentially
pingponging between the two extremes, because one wants it the
way it currently is and another wants it the way this patch
changes things to.
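
For reference, a minimal sketch of the kind of dumper being talked about
(names made up; kmsg_dump() runs after the whole panic notifier chain, which
is what makes this attractive):

static void xen_panic_dump(struct kmsg_dumper *dumper,
			   enum kmsg_dump_reason reason)
{
	struct sched_shutdown r = { .reason = SHUTDOWN_crash };

	if (reason == KMSG_DUMP_PANIC)
		HYPERVISOR_sched_op(SCHEDOP_shutdown, &r);
}

static struct kmsg_dumper xen_panic_dumper = {
	.dump = xen_panic_dump,
};

/* ... kmsg_dump_register(&xen_panic_dumper) behind a boot option ... */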

Jan


Re: [Xen-devel] [PATCH] x86/xen: Return from panic notifier

2019-10-02 Thread Jan Beulich
On 02.10.2019 15:24, Boris Ostrovsky wrote:
> On 10/2/19 3:40 AM, Jan Beulich wrote:
>> On 01.10.2019 17:16, Boris Ostrovsky wrote:
>>> Currently execution of panic() continues until Xen's panic notifier
>>> (xen_panic_event()) is called at which point we make a hypercall that
>>> never returns.
>>>
>>> This means that any notifier that is supposed to be called later as
>>> well as significant part of panic() code (such as pstore writes from
>>> kmsg_dump()) is never executed.
>> Back at the time when this was introduced into the XenoLinux tree,
>> I think this behavior was intentional for at least DomU-s. I wonder
>> whether you wouldn't want your change to further distinguish Dom0
>> and DomU behavior.
> 
> Do you remember what the reason for that was?

I can only guess that the thinking probably was that e.g. external
dumping (by the tool stack) would be more reliable (including but
not limited to this meaning less change of state from when the
original crash reason was detected) than having the domain dump
itself.

> I think having ability to call kmsg_dump() on a panic is a useful thing
> to have for domUs as well. Besides, there may be other functionality
> added post-notifiers in panic() in the future. Or another notifier may
> be registered later with the same lowest priority.
> 
> Is there a downside in allowing domUs to fall through panic() all the
> way to emergency_restart()?

See above.

>>> There is no reason for xen_panic_event() to be this last point in
>>> execution since panic()'s emergency_restart() will call into
>>> xen_emergency_restart() from where we can perform our hypercall.
>> Did you consider, as an alternative, to lower the notifier's
>> priority?
> 
> I didn't but that wouldn't help with the original problem that James
> reported --- we'd still not get to kmsg_dump() call.

True. I guess more control over the behavior needs to be given to
the admin, as either approach has its up- and downsides.

Jan


Re: [Xen-devel] [PATCH] x86/xen: Return from panic notifier

2019-10-02 Thread Jan Beulich
On 01.10.2019 17:16, Boris Ostrovsky wrote:
> Currently execution of panic() continues until Xen's panic notifier
> (xen_panic_event()) is called at which point we make a hypercall that
> never returns.
> 
> This means that any notifier that is supposed to be called later as
> well as significant part of panic() code (such as pstore writes from
> kmsg_dump()) is never executed.

Back at the time when this was introduced into the XenoLinux tree,
I think this behavior was intentional for at least DomU-s. I wonder
whether you wouldn't want your change to further distinguish Dom0
and DomU behavior.

> There is no reason for xen_panic_event() to be this last point in
> execution since panic()'s emergency_restart() will call into
> xen_emergency_restart() from where we can perform our hypercall.

Did you consider, as an alternative, to lower the notifier's
priority?

Jan


Re: [Xen-devel] [PATCH] xen/efi: have a common runtime setup function

2019-10-01 Thread Jan Beulich
On 01.10.2019 10:25, Juergen Gross wrote:
> @@ -281,4 +270,26 @@ void xen_efi_reset_system(int reset_type, efi_status_t 
> status,
>   BUG();
>   }
>  }
> -EXPORT_SYMBOL_GPL(xen_efi_reset_system);
> +
> +/*
> + * Set XEN EFI runtime services function pointers. Other fields of struct 
> efi,
> + * e.g. efi.systab, will be set like normal EFI.
> + */
> +void __init xen_efi_runtime_setup(void)
> +{
> + efi.get_time= xen_efi_get_time;
> + efi.set_time= xen_efi_set_time;
> + efi.get_wakeup_time = xen_efi_get_wakeup_time;
> + efi.set_wakeup_time = xen_efi_set_wakeup_time;
> + efi.get_variable= xen_efi_get_variable;
> + efi.get_next_variable   = xen_efi_get_next_variable;
> + efi.set_variable= xen_efi_set_variable;
> + efi.set_variable_nonblocking= xen_efi_set_variable;
> + efi.query_variable_info = xen_efi_query_variable_info;
> + efi.query_variable_info_nonblocking = xen_efi_query_variable_info;
> + efi.update_capsule  = xen_efi_update_capsule;
> + efi.query_capsule_caps  = xen_efi_query_capsule_caps;
> + efi.get_next_high_mono_count= xen_efi_get_next_high_mono_count;
> + efi.reset_system= xen_efi_reset_system;
> +}
> +EXPORT_SYMBOL_GPL(xen_efi_runtime_setup);

I don't think exporting an __init function is a good idea, and I also
don't see why the function would need exporting had it had the __init
dropped. With the line dropped
Reviewed-by: Jan Beulich 

Jan



Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-11 Thread Jan Beulich
On 11.09.2019 03:15, Igor Druzhinin wrote:
> Another thing that I implied by "not supporting" but want to explicitly
> call out is that currently Xen will refuse reserving any MCFG area
> unless it actually existed in MCFG table at boot. I don't clearly
> understand reasoning behind it but it might be worth relaxing at least
> size matching restriction on Xen side now with this change.

I guess it's because no-one had a system were it would be needed,
and hence could be tested.

Jan


Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Jan Beulich
On 10.09.2019 11:46, Igor Druzhinin wrote:
> On 10/09/2019 02:47, Boris Ostrovsky wrote:
>> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>>> Actually, pci_mmcfg_late_init() that's called out of acpi_init() -
>>> that's where MCFG areas are properly sized. 
>>
>> pci_mmcfg_late_init() reads the (static) MCFG, which doesn't need DSDT 
>> parsing, does it? setup_mcfg_map() OTOH does need it as it uses data from 
>> _CBA (or is it _CRS?), and I think that's why we can't parse MCFG prior to 
>> acpi_init(). So what I said above indeed won't work.
>>
> 
> No, it uses is_acpi_reserved() (it's called indirectly so might be well
> hidden) to parse DSDT to find a reserved resource in it and size MCFG
> area accordingly. setup_mcfg_map() is called for every root bus
> discovered and indeed tries to evaluate _CBA but at this point
> pci_mmcfg_late_init() has already finished MCFG registration for every
> cold-plugged bus (which information is described in MCFG table) so those
> calls are dummy.

I don't think they're strictly dummy. Even for boot time available devices
iirc there's no strict requirement for there to be respective data in MCFG.
Such a requirement exists only for devices which are actually needed to
start the OS (disk or network, perhaps video or alike), or maybe even just
its loader.

Jan


Re: [Xen-devel] [PATCH] ARM: xen: unexport HYPERVISOR_platform_op function

2019-09-06 Thread Jan Beulich
On 06.09.2019 17:55, Andrew Cooper wrote:
> On 06/09/2019 16:39, Arnd Bergmann wrote:
>> HYPERVISOR_platform_op() is an inline function and should not
>> be exported. Since commit 15bfc2348d54 ("modpost: check for
>> static EXPORT_SYMBOL* functions"), this causes a warning:
>>
>> WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL
>>
>> Remove the extraneous export.
>>
>> Fixes: 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions")
>> Signed-off-by: Arnd Bergmann 
> 
> Something is wonky.  That symbol is (/ really ought to be) in the
> hypercall page and most definitely not inline.

It's HYPERVISOR_platform_op_raw that's in the hypercall page afaics.

Jan


Re: [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-04 Thread Jan Beulich
On 04.09.2019 13:36, Igor Druzhinin wrote:
> On 04/09/2019 10:08, Jan Beulich wrote:
>> On 04.09.2019 02:20, Igor Druzhinin wrote:
>>> If MCFG area is not reserved in E820, Xen by default will defer its usage
>>> until Dom0 registers it explicitly after ACPI parser recognizes it as
>>> a reserved resource in DSDT. Having it reserved in E820 is not
>>> mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
>>> and firmware is free to keep a hole E820 in that place. Xen doesn't know
>>> what exactly is inside this hole since it lacks full ACPI view of the
>>> platform therefore it's potentially harmful to access MCFG region
>>> without additional checks as some machines are known to provide
>>> inconsistent information on the size of the region.
>>
>> Irrespective of this being a good change, I've had another thought
>> while reading this paragraph, for a hypervisor side control: Linux
>> has a "memopt=" command line option allowing fine grained control
>> over the E820 map. We could have something similar to allow
>> inserting an E820_RESERVED region into a hole (it would be the
>> responsibility of the admin to guarantee no other conflicts, i.e.
>> it should generally be used only if e.g. the MCFG is indeed known
>> to live at the specified place, and being properly represented in
>> the ACPI tables). Thoughts?
> 
> What other use cases can you think of in case we'd have this option?
> From the top of my head, it might be providing a memmap for a second Xen
> after doing kexec from Xen to Xen.
> 
> What benefits do you think it might have over just accepting a hole
> using "mcfg=relaxed" option from admin perspective?

It wouldn't be MCFG-specific, i.e. it could also be used to e.g.
convert holes to E820_RESERVED to silence VT-d's respective RMRR
warning. Plus by inserting the entry into our own E820 we'd also
propagate it to users of XENMEM_machine_memory_map.

Jan


Re: [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-04 Thread Jan Beulich
On 04.09.2019 02:20, Igor Druzhinin wrote:
> If MCFG area is not reserved in E820, Xen by default will defer its usage
> until Dom0 registers it explicitly after ACPI parser recognizes it as
> a reserved resource in DSDT. Having it reserved in E820 is not
> mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
> and firmware is free to keep a hole E820 in that place. Xen doesn't know
> what exactly is inside this hole since it lacks full ACPI view of the
> platform therefore it's potentially harmful to access MCFG region
> without additional checks as some machines are known to provide
> inconsistent information on the size of the region.

Irrespective of this being a good change, I've had another thought
while reading this paragraph, for a hypervisor side control: Linux
has a "memopt=" command line option allowing fine grained control
over the E820 map. We could have something similar to allow
inserting an E820_RESERVED region into a hole (it would be the
responsibility of the admin to guarantee no other conflicts, i.e.
it should generally be used only if e.g. the MCFG is indeed known
to live at the specified place, and being properly represented in
the ACPI tables). Thoughts?

Jan


Re: [Xen-devel] [PATCH] x86/xen/efi: Fix EFI variable 'name' type conversion

2019-09-02 Thread Jan Beulich
On 01.09.2019 08:58, Adam Zerella wrote:
> This resolves a type conversion from 'char *' to 'unsigned short'.

Could you explain this? There's no ...

> --- a/arch/x86/xen/efi.c
> +++ b/arch/x86/xen/efi.c
> @@ -118,8 +118,8 @@ static enum efi_secureboot_mode 
> xen_efi_get_secureboot(void)
>   unsigned long size;
>  
>   size = sizeof(secboot);
> - status = efi.get_variable(L"SecureBoot", &efi_variable_guid,
> -   NULL, &size, &secboot);
> + status = efi.get_variable((efi_char16_t *)L"SecureBoot",
> +   &efi_variable_guid, NULL, &size, &secboot);

... "char *" resulting as type for L"" type strings, hence there
should be no need for a cast: In fact I consider such casts
dangerous, as they may hide actual problems. To me this looks
more like something that wants fixing in sparse; the compilers,
after all, have no issue with such wide character string literals.

> @@ -158,7 +158,7 @@ static enum efi_secureboot_mode 
> xen_efi_get_secureboot(void)
>   return efi_secureboot_mode_unknown;
>  }
>  
> -void __init xen_efi_init(struct boot_params *boot_params)
> +static void __init xen_efi_init(struct boot_params *boot_params)
>  {
>   efi_system_table_t *efi_systab_xen;

If I was a maintainer of this code, I'd request this not be part
of a patch with a title being entirely unrelated to the change.

Jan


Re: [PATCH] vsyscall: use __iter_div_u64_rem()

2019-07-22 Thread Jan Beulich
On 22.07.2019 12:10, Thomas Gleixner wrote:
> On Thu, 11 Jul 2019, Arnd Bergmann wrote:
> 
> Trimmed CC list and added Jan
> 
>> See below for the patch I am using locally to work around this.
>> That patch is probably wrong, so I have not submitted it yet, but it
>> gives you a clean build ;-)
>>
>>   Arnd
>> 8<---
>> Subject: [PATCH] x86: percpu: fix clang 32-bit build
>>
>> clang does not like an inline assembly with a "=q" contraint for
>> a 64-bit output:
>>
>> arch/x86/events/perf_event.h:824:21: error: invalid output size for
>> constraint '=q'
>>  u64 disable_mask = 
>> __this_cpu_read(cpu_hw_events.perf_ctr_virt_mask);
>> ^
>> include/linux/percpu-defs.h:447:2: note: expanded from macro 
>> '__this_cpu_read'
>>  raw_cpu_read(pcp);  \
>>  ^
>> include/linux/percpu-defs.h:421:28: note: expanded from macro 'raw_cpu_read'
>>   #define raw_cpu_read(pcp)
>> __pcpu_size_call_return(raw_cpu_read_, pcp)
>>  ^
>> include/linux/percpu-defs.h:322:23: note: expanded from macro
>> '__pcpu_size_call_return'
>>  case 1: pscr_ret__ = stem##1(variable); break;  \
>>   ^
>> :357:1: note: expanded from here
>> raw_cpu_read_1
>> ^
>> arch/x86/include/asm/percpu.h:394:30: note: expanded from macro 
>> 'raw_cpu_read_1'
>>   #define raw_cpu_read_1(pcp) percpu_from_op(, "mov", pcp)
>>  ^
>> arch/x86/include/asm/percpu.h:189:15: note: expanded from macro 
>> 'percpu_from_op'
>>  : "=q" (pfo_ret__)  \
>>  ^
>>
>> According to the commit that introduced the "q" constraint, this was
>> needed to fix miscompilation, but it gives no further detail.
> 
> Jan, do you have any memory why you added those 'q' constraints? The
> changelog of 3c598766a2ba is not really helpful.

"q" was used in that commit exclusively for byte sized operands, simply
because that _is_ the constraint to use in such cases. Using "r" is
wrong on 32-bit, as it would include inaccessible byte portions of %sp,
%bp, %si, and %di. This is how it's described in gcc sources / docs:

  "Any register accessible as @code{@var{r}l}.  In 32-bit mode, @code{a},
   @code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register."
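
A small user-space illustration (not kernel code) of the difference:

#include <stdio.h>

int main(void)
{
	unsigned int x = 0x1234;
	unsigned char lo;

	/*
	 * "%b1" asks for the low-byte alias of operand 1. With the "q"
	 * constraint, 32-bit gcc only picks %eax/%ebx/%ecx/%edx, which
	 * have such an alias; with "r" it could pick %esi/%edi/%ebp,
	 * which don't, and the assembly would be invalid.
	 */
	asm("movb %b1, %0" : "=q" (lo) : "q" (x));

	printf("%#x\n", lo);	/* prints 0x34 */
	return 0;
}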

What I'm struggling with is why clang would evaluate that asm() in the
first place when a 64-bit field (perf_ctr_virt_mask) is being accessed.

Jan


Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-16 Thread Jan Beulich
On 15.07.2019 19:28, Andy Lutomirski wrote:
> On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:
>>
>> Juergen Gross  writes:
>>
>>> The long term plan has been to replace Xen PV guests by PVH. The first
>>> victim of that plan are now 32-bit PV guests, as those are used only
>>> rather seldom these days. Xen on x86 requires 64-bit support and with
>>> Grub2 now supporting PVH officially since version 2.04 there is no
>>> need to keep 32-bit PV guest support alive in the Linux kernel.
>>> Additionally Meltdown mitigation is not available in the kernel running
>>> as 32-bit PV guest, so dropping this mode makes sense from security
>>> point of view, too.
>>
>> Normally we have a deprecation period for feature removals like this.
>> You would make the kernel print a warning for some releases, and when
>> no user complains you can then remove. If a user complains you can't.
>>
> 
> As I understand it, the kernel rules do allow changes like this even
> if there's a complaint: this is a patch that removes what is
> effectively hardware support.  If the maintenance cost exceeds the
> value, then removal is fair game.  (Obviously we weight the value to
> preserving compatibility quite highly, but in this case, Xen dropped
> 32-bit hardware support a long time ago.  If the Xen hypervisor says
> that 32-bit PV guest support is deprecated, it's deprecated.)

Since it was implied but not explicit from Andrew's reply, just to
make it explicit: So far 32-bit PV guest support has not been
deprecated in Xen itself.

Jan


Re: [Xen-devel] [PATCH v2] xen/pv: Fix a boot up hang revealed by int3 self test

2019-07-15 Thread Jan Beulich
On 15.07.2019 07:05, Zhenzhong Duan wrote:
> 
> On 2019/7/12 22:06, Andrew Cooper wrote:
>> On 11/07/2019 03:15, Zhenzhong Duan wrote:
>>> Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
>>> selftest") is used to ensure there is a gap setup in exception stack
>>> which could be used for inserting call return address.
>>>
>>> This gap is missed in XEN PV int3 exception entry path, then below panic
>>> triggered:
>>>
>>> [    0.772876] general protection fault:  [#1] SMP NOPTI
>>> [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
>>> [    0.772893] RIP: e030:int3_magic+0x0/0x7
>>> [    0.772905] RSP: 3507:82203e98 EFLAGS: 0246
>>> [    0.773334] Call Trace:
>>> [    0.773334]  alternative_instructions+0x3d/0x12e
>>> [    0.773334]  check_bugs+0x7c9/0x887
>>> [    0.773334]  ? __get_locked_pte+0x178/0x1f0
>>> [    0.773334]  start_kernel+0x4ff/0x535
>>> [    0.773334]  ? set_init_arg+0x55/0x55
>>> [    0.773334]  xen_start_kernel+0x571/0x57a
>>>
>>> As xenint3 and int3 entry code are same except xenint3 doesn't generate
>>> a gap, we can fix it by using int3 and drop useless xenint3.
>> For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
>> %rcx/%r11 on the stack.
>>
>> To convert back to "normal" looking exceptions, the xen thunks do `pop
>> %rcx; pop %r11; jmp do_*`...
> I will add this to commit message.
>>
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index 0ea4831..35a66fc 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -1176,7 +1176,6 @@ idtentry stack_segment    do_stack_segment    
>>> has_error_code=1
>>>   #ifdef CONFIG_XEN_PV
>>>   idtentry xennmi    do_nmi    has_error_code=0
>>>   idtentry xendebug    do_debug    has_error_code=0
>>> -idtentry xenint3    do_int3    has_error_code=0
>>>   #endif
>> What is confusing is why there are 3 extra magic versions here.  At a
>> guess, I'd say something to do with IST settings (given the vectors),
>> but I don't see why #DB/#BP would need to be different in the first
>> place.  (NMI sure, but that is more to do with the crazy hoops needing
>> to be jumped through to keep native functioning safely.)
> 
> xenint3 will be removed in this patch safely as it don't use IST now.
> 
> But debug and nmi need paranoid_entry which will read MSR_GS_BASE to determine
> 
> if swapgs is needed. I guess PV guesting running in ring3 will #GP with 
> swapgs?

Not only that (Xen could trap and emulate swapgs if that was needed) - 64-bit
PV kernel mode always gets entered with kernel GS base already set. Hence
finding out whether to switch the GS base is specifically not something that
any exception entry point would need to do (and it should actively try to
avoid it, for performance reasons).

Jan


Re: [Xen-devel] [PATCH] x86/xen: disable nosmt in Xen guests

2019-06-12 Thread Jan Beulich
>>> On 12.06.19 at 12:12,  wrote:
> When running as a Xen guest selecting "nosmt" either via command line
> or implicitly via default settings makes no sense, as the guest has no
> clue about the real system topology it is running on. With Xen it is
> the hypervisor's job to ensure the proper bug mitigations are active
> regarding smt settings.

I don't agree with the second sentence: It is in principle fine for the
hypervisor to expose HT (i.e. not disable it as bug mitigation), and
leave it to the guest kernels to protect themselves. We're just not
at the point yet where Xen offers sufficient / reliable data to guest
kernels to do so, so _for the time being_ what you say is correct.

Jan




Re: [PATCH RFC] sscanf: Fix integer overflow with sscanf field width

2019-05-24 Thread Jan Beulich
>>> On 23.05.19 at 19:27,  wrote:
> This fixes 53809751ac23 ("sscanf: don't ignore field widths for numeric
> conversions"). sscanf overflows integers with simple strings such as dates.
> As an example, consider
> 
>   r = sscanf("20190523123456", "%4d%2d%2d%2d%2d%2d",
>   , , ,
>   , , );
> 
>   pr_info("%d %04d-%02d-%2d %02d:%02d:%02d\n",
>   r,
>   year, month, day,
>   hour, minute, second);
> 
> On a 32-bit machine this prints
> 
>   6 0000-05-23 12:34:56
> 
> where the year is zero, and not 2019 as expected. The reason is that sscanf
> attempts to read 20190523123456 as a whole integer and then divide it with
> 10^10 to obtain 2019, which obviously fails. Of course, 64-bit machines fail
> similarly on longer numerical strings.

Right, and that's also what that commit's description says remains as
non-conforming behavior.

> I'm offering a simple patch to correct this below. The idea is to have a
> variant of _parse_integer() called _parse_integer_end(), with the ability
> to stop consuming digits. The functions
> 
>   simple_{strtol,strtoll,strtoul,strtoull}()
> 
> now have the corresponding
> 
>   sscanf_{strtol,strtoll,strtoul,strtoull}()
> 
> taking a field width into account. There are some code duplication issues
> etc. so one might consider making more extensive changes than these.

I'm not the maintainer here, but to me it looks mostly okay.

> +static long sscanf_strtol(const char *cp, int field_width,
> + char **endp, unsigned int base)
> +{
> + if (*cp == '-')
> + return -sscanf_strtoul(cp + 1, field_width - 1, endp, base);

I'm afraid you may neither convert a field width of zero to -1 here,
nor convert a field width of 1 to zero (unlimited).
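
For illustration only, a sketch of sign handling that avoids producing
those special values (the meaning of non-positive widths and the
sscanf_strtoul() helper are assumptions here, not taken from the patch):

static long sscanf_strtol(const char *cp, int field_width,
			  char **endp, unsigned int base)
{
	if (*cp != '-')
		return sscanf_strtoul(cp, field_width, endp, base);

	if (field_width < 0)	/* no explicit width given */
		return -sscanf_strtoul(cp + 1, field_width, endp, base);

	if (field_width > 1)	/* the sign consumes one position */
		return -sscanf_strtoul(cp + 1, field_width - 1, endp, base);

	/* width 0 or 1: no room left for any digit -> matching failure */
	if (endp)
		*endp = (char *)cp;
	return 0;
}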

I'd also like to note that the 'u' and 'x' format characters also accept
a sign as per the standard, but that's an orthogonal issue which you
may or may not want to address at the same time.

Jan




[tip:x86/asm] x86/asm: Modernize sync_bitops.h

2019-04-10 Thread tip-bot for Jan Beulich
Commit-ID:  547571b5abe61bb33c6005d8981e86e3c61fedcc
Gitweb: https://git.kernel.org/tip/547571b5abe61bb33c6005d8981e86e3c61fedcc
Author: Jan Beulich 
AuthorDate: Wed, 27 Mar 2019 09:15:19 -0600
Committer:  Ingo Molnar 
CommitDate: Wed, 10 Apr 2019 09:53:31 +0200

x86/asm: Modernize sync_bitops.h

Add missing instruction suffixes and use rmwcc.h just like was (more or less)
recently done for bitops.h as well, see:

  22636f8c9511: x86/asm: Add instruction suffixes to bitops
  288e4521f0f6: x86/asm: 'Simplify' GEN_*_RMWcc() macros

No change in functionality intended.

Signed-off-by: Jan Beulich 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/5c9b9387027800222...@prv1-mh.provo.novell.com
[ Cleaned up the changelog a bit. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/sync_bitops.h | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/sync_bitops.h 
b/arch/x86/include/asm/sync_bitops.h
index 2fe745356fb1..6d8d6bc183b7 100644
--- a/arch/x86/include/asm/sync_bitops.h
+++ b/arch/x86/include/asm/sync_bitops.h
@@ -14,6 +14,8 @@
  * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
  */
 
+#include 
+
 #define ADDR (*(volatile long *)addr)
 
 /**
@@ -29,7 +31,7 @@
  */
 static inline void sync_set_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; bts %1,%0"
+   asm volatile("lock; " __ASM_SIZE(bts) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -47,7 +49,7 @@ static inline void sync_set_bit(long nr, volatile unsigned 
long *addr)
  */
 static inline void sync_clear_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btr %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btr) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -64,7 +66,7 @@ static inline void sync_clear_bit(long nr, volatile unsigned 
long *addr)
  */
 static inline void sync_change_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btc %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btc) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -78,14 +80,9 @@ static inline void sync_change_bit(long nr, volatile 
unsigned long *addr)
  * This operation is atomic and cannot be reordered.
  * It also implies a memory barrier.
  */
-static inline int sync_test_and_set_bit(long nr, volatile unsigned long *addr)
+static inline bool sync_test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; bts %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(bts), *addr, c, "Ir", nr);
 }
 
 /**
@@ -98,12 +95,7 @@ static inline int sync_test_and_set_bit(long nr, volatile 
unsigned long *addr)
  */
 static inline int sync_test_and_clear_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btr %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btr), *addr, c, "Ir", nr);
 }
 
 /**
@@ -116,12 +108,7 @@ static inline int sync_test_and_clear_bit(long nr, 
volatile unsigned long *addr)
  */
 static inline int sync_test_and_change_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btc %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btc), *addr, c, "Ir", nr);
 }
 
 #define sync_test_bit(nr, addr) test_bit(nr, addr)


[PATCH v2 RESEND] x86: modernize sync_bitops.h

2019-03-27 Thread Jan Beulich
Add missing insn suffixes and use rmwcc.h just like was (more or less)
recently done for bitops.h as well.

Signed-off-by: Jan Beulich 
---
v2: Re-base over rmwcc.h changes.
---
 arch/x86/include/asm/sync_bitops.h |   31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

--- a/arch/x86/include/asm/sync_bitops.h
+++ b/arch/x86/include/asm/sync_bitops.h
@@ -14,6 +14,8 @@
  * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
  */
 
+#include 
+
 #define ADDR (*(volatile long *)addr)
 
 /**
@@ -29,7 +31,7 @@
  */
 static inline void sync_set_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; bts %1,%0"
+   asm volatile("lock; " __ASM_SIZE(bts) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -47,7 +49,7 @@ static inline void sync_set_bit(long nr,
  */
 static inline void sync_clear_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btr %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btr) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -64,7 +66,7 @@ static inline void sync_clear_bit(long n
  */
 static inline void sync_change_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btc %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btc) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -78,14 +80,9 @@ static inline void sync_change_bit(long
  * This operation is atomic and cannot be reordered.
  * It also implies a memory barrier.
  */
-static inline int sync_test_and_set_bit(long nr, volatile unsigned long *addr)
+static inline bool sync_test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; bts %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(bts), *addr, c, "Ir", nr);
 }
 
 /**
@@ -98,12 +95,7 @@ static inline int sync_test_and_set_bit(
  */
 static inline int sync_test_and_clear_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btr %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btr), *addr, c, "Ir", nr);
 }
 
 /**
@@ -116,12 +108,7 @@ static inline int sync_test_and_clear_bi
  */
 static inline int sync_test_and_change_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btc %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btc), *addr, c, "Ir", nr);
 }
 
 #define sync_test_bit(nr, addr) test_bit(nr, addr)






Re: [Xen-devel] [PATCH] xen: fix dom0 boot on huge systems

2019-03-07 Thread Jan Beulich
>>> On 07.03.19 at 10:11,  wrote:
> Commit f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit
> PV guests") introduced a regression for booting dom0 on huge systems
> with lots of RAM (in the TB range).
> 
> Reason is that on those hosts the p2m list needs to be moved early in
> the boot process and this requires temporary page tables to be created.
> Said commit modified xen_set_pte_init() to use a hypercall for writing
> a PTE, but this requires the page table being in the direct mapped
> area, which is not the case for the temporary page tables used in
> xen_relocate_p2m().
> 
> As the page tables are completely written before being linked to the
> actual address space instead of set_pte() a plain write to memory can
> be used in xen_relocate_p2m().
> 
> Fixes: f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV 
> guests")
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 




Re: xen/evtchn and forced threaded irq

2019-02-22 Thread Jan Beulich
>>> On 20.02.19 at 23:03,  wrote:
> On 2/20/19 9:46 PM, Boris Ostrovsky wrote:
>> On 2/20/19 3:46 PM, Julien Grall wrote:
>>> On 2/20/19 8:04 PM, Boris Ostrovsky wrote:
 On 2/20/19 1:05 PM, Julien Grall wrote:
 Some sort of a FIFO that stores {irq, data} tuple. It could obviously be
 implemented as a ring but not necessarily as Xen shared ring (if that's
 what you were referring to).

I'm sort of lost here with the mixed references to "interrupts" and
"events". Interrupts can be shared (and have a per-instance
(irq,data) tuple), but there should be at most one interrupt
underlying the event channel systems, shouldn't there? Event
channels otoh can't be shared, and hence there's only one
"data" item to be associated with each channel, at which point a
plain counter ought to do.
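
For illustration only, here is a minimal sketch of such a per-channel
counter (names made up, linux/atomic.h assumed):

struct evtchn_counter {
	atomic_t pending;
};

static void evtchn_note_pending(struct evtchn_counter *c)
{
	atomic_inc(&c->pending);		/* producer: irq handler */
}

static unsigned int evtchn_drain_pending(struct evtchn_counter *c)
{
	return atomic_xchg(&c->pending, 0);	/* single consumer drains */
}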

>>> The underlying question is what happen if you miss an interrupt. Is it
>>> going to be ok?
>> 
>> This I am not sure about. I thought yes since we are signaling the
>> process only once.
> 
> I have CCed Andrew and Jan to see if they can help here.

What meaning of "miss" do you use here? As long as it only means
delay (i.e. miss an instance, but get notified as soon as feasible
even if there is no further event coming from the source) - that
would be okay, I think. But if you truly mean "miss" (i.e. event lost
altogether, with silence resulting if there's no further event), then
no, this would not be tolerable.

Jan




Re: [Xen-devel] [PATCH net-next] xen-netback: mark expected switch fall-through

2019-02-11 Thread Jan Beulich
>>> On 08.02.19 at 20:58,  wrote:
> In preparation to enabling -Wimplicit-fallthrough, mark switch
> cases where we are expecting to fall through.
> 
> Warning level 3 was used: -Wimplicit-fallthrough=3
> 
> This patch is part of the ongoing efforts to enabling
> -Wimplicit-fallthrough.
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/net/xen-netback/xenbus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netback/xenbus.c 
> b/drivers/net/xen-netback/xenbus.c
> index 2625740bdc4a..330ddb64930f 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -655,7 +655,7 @@ static void frontend_changed(struct xenbus_device *dev,
>   set_backend_state(be, XenbusStateClosed);
>   if (xenbus_dev_is_online(dev))
>   break;
> - /* fall through if not online */
> + /* fall through - if not online */
>   case XenbusStateUnknown:

Considering the fall-through was already annotated, I don't think
title and description really justify the change. Is the compiler after
a particular wording here?

Jan




Re: [Xen-devel] [PATCH 2/2] x86/xen: dont add memory above max allowed allocation

2019-01-22 Thread Jan Beulich
>>> On 22.01.19 at 09:06,  wrote:
> Don't allow memory to be added above the allowed maximum allocation
> limit set by Xen.

This reads as if the hypervisor was imposing a limit here, but looking at
xen_get_max_pages(), xen_foreach_remap_area(), and
xen_count_remap_pages() I take it that it's a restriction enforced by
the Xen subsystem in Linux. Furthermore from the cover letter I imply
that the observed issue was on a Dom0, yet xen_get_max_pages()'s
use of XENMEM_maximum_reservation wouldn't impose any limit there
at all (without use of the hypervisor option "dom0_mem=max:..."),
would it?

Jan




[tip:x86/urgent] x86/entry/64/compat: Fix stack switching for XEN PV

2019-01-17 Thread tip-bot for Jan Beulich
Commit-ID:  fc24d75a7f91837d7918e40719575951820b2b8f
Gitweb: https://git.kernel.org/tip/fc24d75a7f91837d7918e40719575951820b2b8f
Author: Jan Beulich 
AuthorDate: Tue, 15 Jan 2019 09:58:16 -0700
Committer:  Thomas Gleixner 
CommitDate: Fri, 18 Jan 2019 00:39:33 +0100

x86/entry/64/compat: Fix stack switching for XEN PV

While in the native case entry into the kernel happens on the trampoline
stack, PV Xen kernels get entered with the current thread stack right
away. Hence source and destination stacks are identical in that case,
and special care is needed.

Other than in sync_regs() the copying done on the INT80 path isn't
NMI / #MC safe, as either of these events occurring in the middle of the
stack copying would clobber data on the (source) stack.

There is similar code in interrupt_entry() and nmi(), but there is no fixup
required because those code paths are unreachable in XEN PV guests.

[ tglx: Sanitized subject, changelog, Fixes tag and stable mail address. Sigh ]

Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT 
entries")
Signed-off-by: Jan Beulich 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Juergen Gross 
Acked-by: Andy Lutomirski 
Cc: Peter Anvin 
Cc: xen-de...@lists.xenproject.org
Cc: Boris Ostrovsky 
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/5c3e112802780020d...@prv1-mh.provo.novell.com

---
 arch/x86/entry/entry_64_compat.S | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 8eaf8952c408..39913770a44d 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -361,7 +361,8 @@ ENTRY(entry_INT80_compat)
 
/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-   movq%rsp, %rdi
+   /* In the Xen PV case we already run on the thread stack. */
+   ALTERNATIVE "movq %rsp, %rdi", "jmp .Lint80_keep_stack", 
X86_FEATURE_XENPV
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
pushq   6*8(%rdi)   /* regs->ss */
@@ -370,8 +371,9 @@ ENTRY(entry_INT80_compat)
pushq   3*8(%rdi)   /* regs->cs */
pushq   2*8(%rdi)   /* regs->ip */
pushq   1*8(%rdi)   /* regs->orig_ax */
-
pushq   (%rdi)  /* pt_regs->di */
+.Lint80_keep_stack:
+
pushq   %rsi/* pt_regs->si */
xorl%esi, %esi  /* nospec   si */
pushq   %rdx/* pt_regs->dx */


[PATCH v3] x86-64/Xen: fix stack switching

2019-01-15 Thread Jan Beulich
While in the native case entry into the kernel happens on the trampoline
stack, PV Xen kernels get entered with the current thread stack right
away. Hence source and destination stacks are identical in that case,
and special care is needed.

Other than in sync_regs() the copying done on the INT80 path isn't
NMI / #MC safe, as either of these events occurring in the middle of the
stack copying would clobber data on the (source) stack.

I'm not altering the similar code in interrupt_entry() and nmi(), as
those code paths are unreachable afaict when running PV Xen guests.

Fixes: 7f2590a110b837af5679d08fc25c6227c5a8c497
Signed-off-by: Jan Beulich 
Cc: sta...@kernel.org 
---
v3: Drop NMI path change. Use ALTERNATIVE.
v2: Correct placement of .Lint80_keep_stack label.
---
 arch/x86/entry/entry_64_compat.S |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- 5.0-rc2/arch/x86/entry/entry_64_compat.S
+++ 5.0-rc2-x86_64-stack-switch-Xen/arch/x86/entry/entry_64_compat.S
@@ -361,7 +361,8 @@ ENTRY(entry_INT80_compat)
 
/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-   movq%rsp, %rdi
+   /* In the Xen PV case we already run on the thread stack. */
+   ALTERNATIVE "movq %rsp, %rdi", "jmp .Lint80_keep_stack", 
X86_FEATURE_XENPV
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
pushq   6*8(%rdi)   /* regs->ss */
@@ -370,8 +371,9 @@ ENTRY(entry_INT80_compat)
pushq   3*8(%rdi)   /* regs->cs */
pushq   2*8(%rdi)   /* regs->ip */
pushq   1*8(%rdi)   /* regs->orig_ax */
-
pushq   (%rdi)  /* pt_regs->di */
+.Lint80_keep_stack:
+
pushq   %rsi/* pt_regs->si */
xorl%esi, %esi  /* nospec   si */
pushq   %rdx/* pt_regs->dx */






Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Jan Beulich
>>> On 13.12.18 at 04:46,  wrote:
> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>>>>> On 12.12.18 at 16:18,  wrote:
>>> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>>>> On 12.12.18 at 08:06,  wrote:
>>>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>>>> found in [2].
>>>>>>>>
>>>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>>>> MSI-X message control register.
>>>>>>>>
>>>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>>>> introducing another dedicated sub-hypercall.
>>>>>>>>
>>>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>>>> us calling this function when detaching a device from a guest during
>>>>>>>> guest shutdown. Thus it is called right before calling
>>>>>>>> PHYSDEVOPS_prepare_msix().
>>>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>>>> () at the end of the hypercall name since it's not a function.
>>>>>>>
>>>>>>> I'm also wondering why the release can't be done when the device is
>>>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>>>> a state where pciback assumes the device has been detached from the
>>>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>>>> another guest in such state will fail.
>>>>>>
>>>>>>I wonder whether this additional reset functionality could be done out
>>>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>>>and then do the extra things that are not properly done there.
>>>>> 
>>>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>>>> at last minute, which happens after device reset in 
>>>>> xen_pcibk_xenbus_remove().
>>>>
>>>>But that may need taking care of: I don't think it is a good idea to have
>>>>anything left from the prior owning domain when the device gets reset.
>>>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>>>invoking the reset;
>>> 
>>> Agree. How about pciback to track the established IRQ bindings? Then
>>> pciback can clear irq binding before invoking the reset.
>>
>>How would pciback even know of those mappings, when it's qemu
>>who establishes (and manages) them?
> 
> I meant to expose some interfaces from pciback. And pciback serves
> as the proxy of IRQ (un)binding APIs.

If at all

Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 16:18,  wrote:
> On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
>>>>> On 12.12.18 at 08:06,  wrote:
>>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
>>>>>> I find some pass-thru devices don't work any more across guest reboot.
>>>>>> Assigning it to another guest also meets the same issue. And the only
>>>>>> way to make it work again is un-binding and binding it to pciback.
>>>>>> Someone reported this issue one year ago [1]. More detail also can be
>>>>>> found in [2].
>>>>>>
>>>>>> The root-cause is Xen's internal MSI-X state isn't reset properly
>>>>>> during reboot or re-assignment. In the above case, Xen set maskall bit
>>>>>> to mask all MSI interrupts after it detected a potential security
>>>>>> issue. Even after device reset, Xen didn't reset its internal maskall
>>>>>> bit. As a result, maskall bit would be set again in next write to
>>>>>> MSI-X message control register.
>>>>>>
>>>>>> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
>>>>>> internal state of a device, we employ it to fix this issue rather than
>>>>>> introducing another dedicated sub-hypercall.
>>>>>>
>>>>>> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
>>>>>> the device's msix and pirq has been created. This limitation prevents
>>>>>> us calling this function when detaching a device from a guest during
>>>>>> guest shutdown. Thus it is called right before calling
>>>>>> PHYSDEVOPS_prepare_msix().
>>>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>>>> () at the end of the hypercall name since it's not a function.
>>>>>
>>>>> I'm also wondering why the release can't be done when the device is
>>>>> detached from the guest (or the guest has been shut down). This makes
>>>>> me worry about the raciness of the attach/detach procedure: if there's
>>>>> a state where pciback assumes the device has been detached from the
>>>>> guest, but there are still pirqs bound, an attempt to attach to
>>>>> another guest in such state will fail.
>>>>
>>>>I wonder whether this additional reset functionality could be done out
>>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>>and then do the extra things that are not properly done there.
>>> 
>>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>>> without error. But ATM, xen expects that no msi is bound to pirq when
>>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>>> at last minute, which happens after device reset in 
>>> xen_pcibk_xenbus_remove().
>>
>>But that may need taking care of: I don't think it is a good idea to have
>>anything left from the prior owning domain when the device gets reset.
>>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>>invoking the reset;
> 
> Agree. How about pciback to track the established IRQ bindings? Then
> pciback can clear irq binding before invoking the reset.

How would pciback even know of those mappings, when it's qemu
who establishes (and manages) them?

>>in fact I'd expect this to happen in the course of
>>domain destruction, and I'd expect the device reset to come after the
>>domain was cleaned up. Perhaps simply an ordering issue in the tool
>>stack?
> 
> I don't think reversing the sequences of device reset and domain
> destruction would be simple. Furthermore, during device hot-unplug,
> device reset is done when the owner is alive. So if we use domain
> destruction to enforce all irq binding cleared, in theory, it won't be
> applicable to hot-unplug case (if qemu's hot-unplug logic is
> compromised).

Even in the hot-unplug case the tool stack could issue unbind
requests, behind the back of the possibly compromised qemu,
once neither the guest nor qemu have access to the device
anymore.

Jan



Re: [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 08:06,  wrote:
> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
>>> On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
 I find some pass-thru devices don't work any more across guest reboot.
 Assigning it to another guest also meets the same issue. And the only
 way to make it work again is un-binding and binding it to pciback.
 Someone reported this issue one year ago [1]. More detail also can be
 found in [2].

 The root-cause is Xen's internal MSI-X state isn't reset properly
 during reboot or re-assignment. In the above case, Xen set maskall bit
 to mask all MSI interrupts after it detected a potential security
 issue. Even after device reset, Xen didn't reset its internal maskall
 bit. As a result, maskall bit would be set again in next write to
 MSI-X message control register.

 Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
 internal state of a device, we employ it to fix this issue rather than
 introducing another dedicated sub-hypercall.

 Note that PHYSDEVOPS_release_msix() will fail if the mapping between
 the device's msix and pirq has been created. This limitation prevents
 us calling this function when detaching a device from a guest during
 guest shutdown. Thus it is called right before calling
 PHYSDEVOPS_prepare_msix().
>>> s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
>>> () at the end of the hypercall name since it's not a function.
>>>
>>> I'm also wondering why the release can't be done when the device is
>>> detached from the guest (or the guest has been shut down). This makes
>>> me worry about the raciness of the attach/detach procedure: if there's
>>> a state where pciback assumes the device has been detached from the
>>> guest, but there are still pirqs bound, an attempt to attach to
>>> another guest in such state will fail.
>>
>>I wonder whether this additional reset functionality could be done out
>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>and then do the extra things that are not properly done there.
> 
> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
> without error. But ATM, xen expects that no msi is bound to pirq when
> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
> at last minute, which happens after device reset in 
> xen_pcibk_xenbus_remove().

But that may need taking care of: I don't think it is a good idea to have
anything left from the prior owning domain when the device gets reset.
I.e. left over IRQ bindings should perhaps be forcibly cleared before
invoking the reset; in fact I'd expect this to happen in the course of
domain destruction, and I'd expect the device reset to come after the
domain was cleaned up. Perhaps simply an ordering issue in the tool
stack?

Jan



RE: [PATCH v2] x86: modernize sync_bitops.h

2018-11-21 Thread Jan Beulich
>>> On 21.11.18 at 14:49,  wrote:
> From: Jan Beulich
>> Sent: 21 November 2018 13:03
>> 
>> >>> On 21.11.18 at 12:55,  wrote:
>> > From: Jan Beulich
>> >> Sent: 21 November 2018 10:11
>> >>
>> >> Add missing insn suffixes and use rmwcc.h just like was (more or less)
>> >> recently done for bitops.h as well.
>> >
>> > Why? bts (etc) on memory don't really have an 'operand size'.
>> 
>> Of course they do - depending on operand size they operate on
>> 2-, 4-, or 8-byte quantities. When the second operand is a
>> register, the suffix is redundant (but doesn't hurt), but when
>> the second operand is an immediate, the assembler (in AT
>> syntax) has no way of knowing what operand size you mean.
> 
> You need to RTFM.

Excuse me? How about you look at this table from the SDM
(format of course comes out better in the .pdf):

0F AB /r            BTS r/m16, r16   MR  Valid  Valid  Store selected bit in CF flag and set.
0F AB /r            BTS r/m32, r32   MR  Valid  Valid  Store selected bit in CF flag and set.
REX.W + 0F AB /r    BTS r/m64, r64   MR  Valid  N.E.   Store selected bit in CF flag and set.
0F BA /5 ib         BTS r/m16, imm8  MI  Valid  Valid  Store selected bit in CF flag and set.
0F BA /5 ib         BTS r/m32, imm8  MI  Valid  Valid  Store selected bit in CF flag and set.
REX.W + 0F BA /5 ib BTS r/m64, imm8  MI  Valid  N.E.   Store selected bit in CF flag and set.

Please read manuals yourself before making such statements.

> Regardless of the 'operand size' the 'bit' instructions do a 32 bit aligned
> 32 bit wide read/modify/write cycle.
> 
> The 'operand size' does affect whether the bit number (which is signed)
> comes from %cl (8 bits), %cx (16 bits), %rcx (32 bits) or (%ecx) 64 bits.
> But that is implicit in the register name used.

There is no form with %cl as operand. Instead there are forms with
an immediate operand. 

Jan




RE: [PATCH v2] x86: modernize sync_bitops.h

2018-11-21 Thread Jan Beulich
>>> On 21.11.18 at 12:55,  wrote:
> From: Jan Beulich
>> Sent: 21 November 2018 10:11
>> 
>> Add missing insn suffixes and use rmwcc.h just like was (more or less)
>> recently done for bitops.h as well.
> 
> Why? bts (etc) on memory don't really have an 'operand size'.

Of course they do - depending on operand size they operate on
2-, 4-, or 8-byte quantities. When the second operand is a
register, the suffix is redundant (but doesn't hurt), but when
the second operand is an immediate, the assembler (in AT&T
syntax) has no way of knowing what operand size you mean.
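
A small sketch (not part of the patch) of the two situations:

static inline void set_bit_reg(long nr, volatile unsigned long *addr)
{
	/* bit number in a 64-bit register: the register implies btsq */
	asm volatile("lock; bts %1, %0"
		     : "+m" (*addr) : "r" (nr) : "memory");
}

static inline void set_bit9(volatile unsigned long *addr)
{
	/* immediate bit number: only the "l" suffix sizes the insn */
	asm volatile("lock; btsl $9, %0"
		     : "+m" (*addr) : : "memory");
}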

Jan




[PATCH v2] x86: modernize sync_bitops.h

2018-11-21 Thread Jan Beulich
Add missing insn suffixes and use rmwcc.h just like was (more or less)
recently done for bitops.h as well.

Signed-off-by: Jan Beulich 
---
v2: Re-base over rmwcc.h changes.
---
 arch/x86/include/asm/sync_bitops.h |   31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

--- 4.20-rc3/arch/x86/include/asm/sync_bitops.h
+++ 4.20-rc3-x86-sync-bitops-insn-suffixes/arch/x86/include/asm/sync_bitops.h
@@ -14,6 +14,8 @@
  * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
  */
 
+#include 
+
 #define ADDR (*(volatile long *)addr)
 
 /**
@@ -29,7 +31,7 @@
  */
 static inline void sync_set_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; bts %1,%0"
+   asm volatile("lock; " __ASM_SIZE(bts) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -47,7 +49,7 @@ static inline void sync_set_bit(long nr,
  */
 static inline void sync_clear_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btr %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btr) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -64,7 +66,7 @@ static inline void sync_clear_bit(long n
  */
 static inline void sync_change_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btc %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btc) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -78,14 +80,9 @@ static inline void sync_change_bit(long
  * This operation is atomic and cannot be reordered.
  * It also implies a memory barrier.
  */
-static inline int sync_test_and_set_bit(long nr, volatile unsigned long *addr)
+static inline bool sync_test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; bts %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(bts), *addr, c, "Ir", nr);
 }
 
 /**
@@ -98,12 +95,7 @@ static inline int sync_test_and_set_bit(
  */
 static inline int sync_test_and_clear_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btr %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btr), *addr, c, "Ir", nr);
 }
 
 /**
@@ -116,12 +108,7 @@ static inline int sync_test_and_clear_bi
  */
 static inline int sync_test_and_change_bit(long nr, volatile unsigned long 
*addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btc %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   return GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btc), *addr, c, "Ir", nr);
 }
 
 #define sync_test_bit(nr, addr) test_bit(nr, addr)





Re: [PATCH v2] x86-64: use 32-bit XOR to zero registers

2018-07-09 Thread Jan Beulich
>>> On 09.07.18 at 10:33,  wrote:
> Anyway, normally assembler is the one who chooses instruction
> encoding.

There are different possible views here, and I personally think that
while it is a compiler's job to choose optimal encodings, assemblers
shouldn't by default alter what the programmer (or compiler) has
specified, unless there are multiple encodings that are _entirely_
identical in effect, but which are _truly_ different encodings (and
presence/absence of instruction prefixes to me doesn't count as
"truly different"). This view is particularly on the basis that
assembly programmers often imply a certain encoding to be chosen,
be it just to imply size, or to go as far as meaning to run-time patch
code.

> xor %rax, %rax is equivalent to xor %eax, %eax; first variant is
> slower on some CPUs, second variant will take one extra byte due to
> operand size prefix IIRC...

The second variant is actually shorter by one byte.
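
A quick sketch for reference (encodings as emitted by gas; the %k
operand modifier below forces the 32-bit register name):

static inline unsigned long zero_reg(void)
{
	unsigned long v;

	/* with %eax chosen this emits "xor %eax,%eax" (31 c0, 2 bytes)
	 * rather than "xor %rax,%rax" (48 31 c0, 3 bytes, REX.W prefix) */
	asm("xor %k0, %k0" : "=r" (v));
	return v;
}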

> Should the assembler generate right
> variant according to the CPU type?

An "optimization" mode has recently been added to gas, but in general
as well as for the sake of older gas I don't think we should rely on this
new behavior (which is also off by default) except in cases where the
impact on the source code would be undesirable (as can e.g. be the
case when macro-izing things).

Jan




RE: [tip:x86/asm] x86/entry/64: Add two more instruction suffixes

2018-07-03 Thread Jan Beulich
>>> On 03.07.18 at 10:46,  wrote:
> From: Jan Beulich
>> Sent: 03 July 2018 09:36
> ...
>> As said there, omitting suffixes from instructions in AT&T mode is bad
>> practice when operand size cannot be determined by the assembler from
>> register operands, and is likely going to be warned about by upstream
>> gas in the future (mine does already).
> ...
>> -bt  $9, EFLAGS(%rsp)/* interrupts off? */
>> +btl $9, EFLAGS(%rsp)/* interrupts off? */
> 
> Hmmm
> Does the operand size make any difference at all for the bit instructions?
> I'm pretty sure that the cpus (386 onwards) have always done aligned 32bit
> transfers (the docs never actually said aligned).
> I can't remember whether 64bit mode allows immediates above 31.
> 
> So gas accepting 'btb $n,memory' is giving a false impression of
> what actually happens.

BTB does not exist at all. BTW and (on 64-bit) BTQ do exist though,
and they have behavior differing from BTL. The only AT&T syntax doc
I have says that L is the default suffix to be used, but there are cases
where this wasn't (and maybe still isn't) the case, so omitting a suffix
when register operands aren't available to size instructions has always
been a risky game.
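
For illustration, one visible difference between the w and l forms,
assuming the SDM's rule that an immediate bit offset is reduced modulo
the operand size (sketch, not kernel code):

static inline int test_bit20_w(const volatile unsigned long *addr)
{
	unsigned char cf;

	/* btw: offset taken modulo 16, so this actually tests bit 4 */
	asm volatile("btw $20, %1\n\tsetc %0" : "=q" (cf) : "m" (*addr));
	return cf;
}

static inline int test_bit20_l(const volatile unsigned long *addr)
{
	unsigned char cf;

	/* btl: tests bit 20 as written */
	asm volatile("btl $20, %1\n\tsetc %0" : "=q" (cf) : "m" (*addr));
	return cf;
}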

Jan




[tip:x86/asm] x86/asm/64: Use 32-bit XOR to zero registers

2018-07-03 Thread tip-bot for Jan Beulich
Commit-ID:  a7bea8308933aaeea76dad7d42a6e51000417626
Gitweb: https://git.kernel.org/tip/a7bea8308933aaeea76dad7d42a6e51000417626
Author: Jan Beulich 
AuthorDate: Mon, 2 Jul 2018 04:31:54 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 3 Jul 2018 09:59:29 +0200

x86/asm/64: Use 32-bit XOR to zero registers

Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
idioms don't require execution bandwidth, as they're being taken care
of in the frontend (through register renaming). Use 32-bit XORs instead.

Signed-off-by: Jan Beulich 
Cc: Alok Kataria 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: da...@davemloft.net
Cc: herb...@gondor.apana.org.au
Cc: pa...@ucw.cz
Cc: r...@rjwysocki.net
Link: http://lkml.kernel.org/r/5b39ff1a0278001cf...@prv1-mh.provo.novell.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/crypto/aegis128-aesni-asm.S | 2 +-
 arch/x86/crypto/aegis128l-aesni-asm.S| 2 +-
 arch/x86/crypto/aegis256-aesni-asm.S | 2 +-
 arch/x86/crypto/aesni-intel_asm.S| 8 
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 4 ++--
 arch/x86/crypto/morus1280-avx2-asm.S | 2 +-
 arch/x86/crypto/morus1280-sse2-asm.S | 2 +-
 arch/x86/crypto/morus640-sse2-asm.S  | 2 +-
 arch/x86/crypto/sha1_ssse3_asm.S | 2 +-
 arch/x86/kernel/head_64.S| 2 +-
 arch/x86/kernel/paravirt_patch_64.c  | 2 +-
 arch/x86/lib/memcpy_64.S | 2 +-
 arch/x86/power/hibernate_asm_64.S| 2 +-
 13 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-asm.S 
b/arch/x86/crypto/aegis128-aesni-asm.S
index 9254e0b6cc06..d5c5e2082ae7 100644
--- a/arch/x86/crypto/aegis128-aesni-asm.S
+++ b/arch/x86/crypto/aegis128-aesni-asm.S
@@ -75,7 +75,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
diff --git a/arch/x86/crypto/aegis128l-aesni-asm.S 
b/arch/x86/crypto/aegis128l-aesni-asm.S
index 9263c344f2c7..0fbdf5f00bda 100644
--- a/arch/x86/crypto/aegis128l-aesni-asm.S
+++ b/arch/x86/crypto/aegis128l-aesni-asm.S
@@ -66,7 +66,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG0, MSG0
pxor MSG1, MSG1
 
diff --git a/arch/x86/crypto/aegis256-aesni-asm.S 
b/arch/x86/crypto/aegis256-aesni-asm.S
index 1d977d515bf9..a49f58e2a5dd 100644
--- a/arch/x86/crypto/aegis256-aesni-asm.S
+++ b/arch/x86/crypto/aegis256-aesni-asm.S
@@ -59,7 +59,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index e762ef417562..9bd139569b41 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -258,7 +258,7 @@ ALL_F:  .octa 0x
 .macro GCM_INIT Iv SUBKEY AAD AADLEN
mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
-   xor %r11, %r11
+   xor %r11d, %r11d
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
@@ -286,7 +286,7 @@ ALL_F:  .octa 0x
movdqu HashKey(%arg2), %xmm13
add %arg5, InLen(%arg2)
 
-   xor %r11, %r11 # initialise the data pointer offset as zero
+   xor %r11d, %r11d # initialise the data pointer offset as zero
PARTIAL_BLOCK %arg3 %arg4 %arg5 %r11 %xmm8 \operation
 
sub %r11, %arg5 # sub partial block data used
@@ -702,7 +702,7 @@ _no_extra_mask_1_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _dec_done_\@
@@ -737,7 +737,7 @@ _no_extra_mask_2_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _encode_done_\@
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S 
b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..1985ea0b551b 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -463,7 +463,7 @@ _get_AAD_rest_final\@:
 
 _get_AAD_done\@:
# initialize the data pointer offset as zero
-   xor %r11, %r11
+   xor %r11d, %r11d
 
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
@@ -1770,7 +1770,7 @@ _get_AAD_rest_final\@:
 
 _get_AA

[tip:x86/asm] x86/entry/64: Add two more instruction suffixes

2018-07-03 Thread tip-bot for Jan Beulich
Commit-ID:  6709812f094d96543b443645c68daaa32d3d3e77
Gitweb: https://git.kernel.org/tip/6709812f094d96543b443645c68daaa32d3d3e77
Author: Jan Beulich 
AuthorDate: Mon, 2 Jul 2018 04:47:57 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 3 Jul 2018 09:59:29 +0200

x86/entry/64: Add two more instruction suffixes

Sadly, other than claimed in:

  a368d7fd2a ("x86/entry/64: Add instruction suffix")

... there are two more instances which want to be adjusted.

As said there, omitting suffixes from instructions in AT&T mode is bad
practice when operand size cannot be determined by the assembler from
register operands, and is likely going to be warned about by upstream
gas in the future (mine does already).

Add the other missing suffixes here as well.

Signed-off-by: Jan Beulich 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/5b3a02dd0278001cf...@prv1-mh.provo.novell.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/entry_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c6f3677e6105..65aa16d845f6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -92,7 +92,7 @@ END(native_usergs_sysret64)
 .endm
 
 .macro TRACE_IRQS_IRETQ_DEBUG
-   bt  $9, EFLAGS(%rsp)/* interrupts off? */
+   btl $9, EFLAGS(%rsp)/* interrupts off? */
jnc 1f
TRACE_IRQS_ON_DEBUG
 1:
@@ -702,7 +702,7 @@ retint_kernel:
 #ifdef CONFIG_PREEMPT
/* Interrupts are off */
/* Check if we need preemption */
-   bt  $9, EFLAGS(%rsp)/* were interrupts off? */
+   btl $9, EFLAGS(%rsp)/* were interrupts off? */
jnc 1f
 0: cmpl$0, PER_CPU_VAR(__preempt_count)
jnz 1f


[PATCH] x86/entry/64: add two more instruction suffixes

2018-07-02 Thread Jan Beulich
Sadly, contrary to what was claimed in a368d7fd2a ("x86/entry/64: Add instruction
suffix"), there are two more instances which want to be adjusted. As
said there, omitting suffixes from instructions in AT&T mode is bad
practice when operand size cannot be determined by the assembler from
register operands, and is likely going to be warned about by upstream
gas in the future (mine does already). Add the other missing suffixes
here as well.

Signed-off-by: Jan Beulich 
---
 arch/x86/entry/entry_64.S |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- 4.18-rc3/arch/x86/entry/entry_64.S
+++ 4.18-rc3-x86_64-entry-insn-suffix/arch/x86/entry/entry_64.S
@@ -92,7 +92,7 @@ END(native_usergs_sysret64)
 .endm
 
 .macro TRACE_IRQS_IRETQ_DEBUG
-   bt  $9, EFLAGS(%rsp)/* interrupts off? */
+   btl $9, EFLAGS(%rsp)/* interrupts off? */
jnc 1f
TRACE_IRQS_ON_DEBUG
 1:
@@ -701,7 +701,7 @@ retint_kernel:
 #ifdef CONFIG_PREEMPT
/* Interrupts are off */
/* Check if we need preemption */
-   bt  $9, EFLAGS(%rsp)/* were interrupts off? */
+   btl $9, EFLAGS(%rsp)/* were interrupts off? */
jnc 1f
 0: cmpl$0, PER_CPU_VAR(__preempt_count)
jnz 1f






[PATCH v2] x86-64: use 32-bit XOR to zero registers

2018-07-02 Thread Jan Beulich
Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing
idioms don't require execution bandwidth, as they're being taken care
of in the frontend (through register renaming). Use 32-bit XORs instead.

Signed-off-by: Jan Beulich 
---
v2: Explain what zeroing idioms are/do in the description.
---
 arch/x86/crypto/aegis128-aesni-asm.S |2 +-
 arch/x86/crypto/aegis128l-aesni-asm.S|2 +-
 arch/x86/crypto/aegis256-aesni-asm.S |2 +-
 arch/x86/crypto/aesni-intel_asm.S|8 
 arch/x86/crypto/aesni-intel_avx-x86_64.S |4 ++--
 arch/x86/crypto/morus1280-avx2-asm.S |2 +-
 arch/x86/crypto/morus1280-sse2-asm.S |2 +-
 arch/x86/crypto/morus640-sse2-asm.S  |2 +-
 arch/x86/crypto/sha1_ssse3_asm.S |2 +-
 arch/x86/kernel/head_64.S|2 +-
 arch/x86/kernel/paravirt_patch_64.c  |2 +-
 arch/x86/lib/memcpy_64.S |2 +-
 arch/x86/power/hibernate_asm_64.S|2 +-
 13 files changed, 17 insertions(+), 17 deletions(-)

--- 4.18-rc3/arch/x86/crypto/aegis128-aesni-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/aegis128-aesni-asm.S
@@ -75,7 +75,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
--- 4.18-rc3/arch/x86/crypto/aegis128l-aesni-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/aegis128l-aesni-asm.S
@@ -66,7 +66,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG0, MSG0
pxor MSG1, MSG1
 
--- 4.18-rc3/arch/x86/crypto/aegis256-aesni-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/aegis256-aesni-asm.S
@@ -59,7 +59,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
--- 4.18-rc3/arch/x86/crypto/aesni-intel_asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/aesni-intel_asm.S
@@ -258,7 +258,7 @@ ALL_F:  .octa 0x
 .macro GCM_INIT Iv SUBKEY AAD AADLEN
mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
-   xor %r11, %r11
+   xor %r11d, %r11d
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
@@ -286,7 +286,7 @@ ALL_F:  .octa 0x
movdqu HashKey(%arg2), %xmm13
add %arg5, InLen(%arg2)
 
-   xor %r11, %r11 # initialise the data pointer offset as zero
+   xor %r11d, %r11d # initialise the data pointer offset as zero
PARTIAL_BLOCK %arg3 %arg4 %arg5 %r11 %xmm8 \operation
 
sub %r11, %arg5 # sub partial block data used
@@ -702,7 +702,7 @@ _no_extra_mask_1_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _dec_done_\@
@@ -737,7 +737,7 @@ _no_extra_mask_2_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _encode_done_\@
--- 4.18-rc3/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -463,7 +463,7 @@ _get_AAD_rest_final\@:
 
 _get_AAD_done\@:
# initialize the data pointer offset as zero
-   xor %r11, %r11
+   xor %r11d, %r11d
 
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
@@ -1770,7 +1770,7 @@ _get_AAD_rest_final\@:
 
 _get_AAD_done\@:
# initialize the data pointer offset as zero
-   xor %r11, %r11
+   xor %r11d, %r11d
 
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
--- 4.18-rc3/arch/x86/crypto/morus1280-avx2-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/morus1280-avx2-asm.S
@@ -113,7 +113,7 @@ ENDPROC(__morus1280_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
vpxor MSG, MSG, MSG
 
mov %rcx, %r8
--- 4.18-rc3/arch/x86/crypto/morus1280-sse2-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/morus1280-sse2-asm.S
@@ -235,7 +235,7 @@ ENDPROC(__morus1280_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG_LO, MSG_LO
pxor MSG_HI, MSG_HI
 
--- 4.18-rc3/arch/x86/crypto/morus640-sse2-asm.S
+++ 4.18-rc3-x86_64-32bit-XOR/arch/x86/crypto/morus640-sse2-asm.S
@@ -113,7 +113,7 @@ ENDPROC(__morus640_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov %rcx, %r8
--- 4.18-rc3/arch/x86/crypto/sha1_ssse3

[tip:x86/urgent] x86/entry/32: Add explicit 'l' instruction suffix

2018-06-26 Thread tip-bot for Jan Beulich
Commit-ID:  236f0cd286b67291796362feeac4f342900cfa82
Gitweb: https://git.kernel.org/tip/236f0cd286b67291796362feeac4f342900cfa82
Author: Jan Beulich 
AuthorDate: Mon, 25 Jun 2018 04:21:59 -0600
Committer:  Ingo Molnar 
CommitDate: Tue, 26 Jun 2018 09:20:31 +0200

x86/entry/32: Add explicit 'l' instruction suffix

Omitting suffixes from instructions in AT&T mode is bad practice when
operand size cannot be determined by the assembler from register
operands, and is likely going to be warned about by upstream GAS in the
future (mine does already).

Add the single missing 'l' suffix here.

Signed-off-by: Jan Beulich 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Andy Lutomirski 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/5b30c2470278001cd...@prv1-mh.provo.novell.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/entry_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2582881d19ce..c371bfee137a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -477,7 +477,7 @@ ENTRY(entry_SYSENTER_32)
 * whereas POPF does not.)
 */
addl$PT_EFLAGS-PT_DS, %esp  /* point esp at pt_regs->flags */
-   btr $X86_EFLAGS_IF_BIT, (%esp)
+   btrl$X86_EFLAGS_IF_BIT, (%esp)
popfl
 
/*


Re: [PATCH] x86: modernize sync_bitops.h

2018-06-26 Thread Jan Beulich
>>> On 26.06.18 at 09:18,  wrote:
> * Jan Beulich  wrote:
> 
>> Add missing insn suffixes and use rmwcc.h just like was (more or less)
>> recently done for bitops.h as well.
>> 
>> Signed-off-by: Jan Beulich 
>> ---
>>  arch/x86/include/asm/sync_bitops.h |   34 --
>>  1 file changed, 12 insertions(+), 22 deletions(-)
> 
> Have you verified that a typical x86 vmlinux (with defconfig for example) is the
> same before/after?

It works the same, but the binary is unlikely to be the same with the switch
to condition code asm() outputs.

Jan
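
To make the "condition code asm() outputs" point concrete, a hand-written sketch (not
actual compiler output) of the two code shapes; the function names and argument order
(bit number in %edi, address in %rsi) are made up for the example:

	# rmwcc-sketch.s: assemble with "as -o rmwcc-sketch.o rmwcc-sketch.s"
	.text
	.globl	tas_old_pattern
tas_old_pattern:			# old helpers: CF is materialized via setc, then tested
	lock btsl %edi, (%rsi)		# atomically set bit %edi of the bitmap at (%rsi)
	setc	%al
	movzbl	%al, %eax		# return the old bit value for the caller to test
	ret

	.globl	tas_rmwcc_pattern
tas_rmwcc_pattern:			# GEN_BINARY_RMWcc / flag outputs: branch on CF directly
	lock btsl %edi, (%rsi)
	jc	1f			# the setc/test pair disappears
	xorl	%eax, %eax		# bit was previously clear
	ret
1:	movl	$1, %eax		# bit was previously set
	ret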




Re: [PATCH] x86-64: use 32-bit XOR to zero registers

2018-06-26 Thread Jan Beulich
>>> On 25.06.18 at 18:33,  wrote:
> On 06/25/2018 03:25 AM, Jan Beulich wrote:
>> Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms - use
>> 32-bit ones instead.
> 
> Hmph.  Is that considered a bug (errata)?

No.

> URL/references?

Intel's Optimization Reference Manual says so (in rev 040 this is in section
16.2.2.5 "Zeroing Idioms" as a subsection of the Goldmont/Silvermont
descriptions).

> Are these changes really only zeroing the lower 32 bits of the register?
> and that's all that the code cares about?

No - like all operations targeting a 32-bit register, the result is zero
extended to the entire 64-bit destination register.

Jan
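
A minimal standalone sketch of both points above (zero extension and the zeroing idiom):

	# zeroing-demo.s: both functions return 0 in the full 64-bit RAX
	.text
	.globl	zero_rax_32
zero_rax_32:
	xorl	%eax, %eax		# writing EAX zero-extends into RAX, and this form is
					# the one recognized as a zeroing idiom (handled in the
					# frontend via register renaming, per the text above)
	ret

	.globl	zero_rax_64
zero_rax_64:
	xorq	%rax, %rax		# architecturally the same result, but the 64-bit form
					# is what some CPUs do not special-case
	ret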




[PATCH] x86-64: use 32-bit XOR to zero registers

2018-06-25 Thread Jan Beulich
Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms - use
32-bit ones instead.

Signed-off-by: Jan Beulich 
---
 arch/x86/crypto/aegis128-aesni-asm.S |2 +-
 arch/x86/crypto/aegis128l-aesni-asm.S|2 +-
 arch/x86/crypto/aegis256-aesni-asm.S |2 +-
 arch/x86/crypto/aesni-intel_asm.S|8 
 arch/x86/crypto/aesni-intel_avx-x86_64.S |4 ++--
 arch/x86/crypto/morus1280-avx2-asm.S |2 +-
 arch/x86/crypto/morus1280-sse2-asm.S |2 +-
 arch/x86/crypto/morus640-sse2-asm.S  |2 +-
 arch/x86/crypto/sha1_ssse3_asm.S |2 +-
 arch/x86/kernel/head_64.S|2 +-
 arch/x86/kernel/paravirt_patch_64.c  |2 +-
 arch/x86/lib/memcpy_64.S |2 +-
 arch/x86/power/hibernate_asm_64.S|2 +-
 13 files changed, 17 insertions(+), 17 deletions(-)

--- 4.18-rc2/arch/x86/crypto/aegis128-aesni-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/aegis128-aesni-asm.S
@@ -75,7 +75,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
--- 4.18-rc2/arch/x86/crypto/aegis128l-aesni-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/aegis128l-aesni-asm.S
@@ -66,7 +66,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG0, MSG0
pxor MSG1, MSG1
 
--- 4.18-rc2/arch/x86/crypto/aegis256-aesni-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/aegis256-aesni-asm.S
@@ -59,7 +59,7 @@
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov LEN, %r8
--- 4.18-rc2/arch/x86/crypto/aesni-intel_asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/aesni-intel_asm.S
@@ -258,7 +258,7 @@ ALL_F:  .octa 0x
 .macro GCM_INIT Iv SUBKEY AAD AADLEN
mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
-   xor %r11, %r11
+   xor %r11d, %r11d
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
@@ -286,7 +286,7 @@ ALL_F:  .octa 0x
movdqu HashKey(%arg2), %xmm13
add %arg5, InLen(%arg2)
 
-   xor %r11, %r11 # initialise the data pointer offset as zero
+   xor %r11d, %r11d # initialise the data pointer offset as zero
PARTIAL_BLOCK %arg3 %arg4 %arg5 %r11 %xmm8 \operation
 
sub %r11, %arg5 # sub partial block data used
@@ -702,7 +702,7 @@ _no_extra_mask_1_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _dec_done_\@
@@ -737,7 +737,7 @@ _no_extra_mask_2_\@:
 
# GHASH computation for the last <16 Byte block
GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
-   xor %rax,%rax
+   xor %eax, %eax
 
mov %rax, PBlockLen(%arg2)
jmp _encode_done_\@
--- 4.18-rc2/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -463,7 +463,7 @@ _get_AAD_rest_final\@:
 
 _get_AAD_done\@:
# initialize the data pointer offset as zero
-   xor %r11, %r11
+   xor %r11d, %r11d
 
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
@@ -1770,7 +1770,7 @@ _get_AAD_rest_final\@:
 
 _get_AAD_done\@:
# initialize the data pointer offset as zero
-   xor %r11, %r11
+   xor %r11d, %r11d
 
# start AES for num_initial_blocks blocks
mov arg5, %rax # rax = *Y0
--- 4.18-rc2/arch/x86/crypto/morus1280-avx2-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/morus1280-avx2-asm.S
@@ -113,7 +113,7 @@ ENDPROC(__morus1280_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
vpxor MSG, MSG, MSG
 
mov %rcx, %r8
--- 4.18-rc2/arch/x86/crypto/morus1280-sse2-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/morus1280-sse2-asm.S
@@ -235,7 +235,7 @@ ENDPROC(__morus1280_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG_LO, MSG_LO
pxor MSG_HI, MSG_HI
 
--- 4.18-rc2/arch/x86/crypto/morus640-sse2-asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/morus640-sse2-asm.S
@@ -113,7 +113,7 @@ ENDPROC(__morus640_update_zero)
  *   %r9
  */
 __load_partial:
-   xor %r9, %r9
+   xor %r9d, %r9d
pxor MSG, MSG
 
mov %rcx, %r8
--- 4.18-rc2/arch/x86/crypto/sha1_ssse3_asm.S
+++ 4.18-rc2-x86_64-32bit-XOR/arch/x86/crypto/sha1_ssse3_asm.S
@@ -96,7 +96,7 @@
# cleanup workspace
mov $8, %ecx
mov %rsp, %rdi
-   xor %rax

[PATCH] x86: modernize sync_bitops.h

2018-06-25 Thread Jan Beulich
Add missing insn suffixes and use rmwcc.h just like was (more or less)
recently done for bitops.h as well.

Signed-off-by: Jan Beulich 
---
 arch/x86/include/asm/sync_bitops.h |   34 --
 1 file changed, 12 insertions(+), 22 deletions(-)

--- 4.18-rc2/arch/x86/include/asm/sync_bitops.h
+++ 4.18-rc2-x86-sync-bitops-insn-suffixes/arch/x86/include/asm/sync_bitops.h
@@ -14,6 +14,8 @@
  * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
  */
 
+#include 
+
 #define ADDR (*(volatile long *)addr)
 
 /**
@@ -29,7 +31,7 @@
  */
 static inline void sync_set_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; bts %1,%0"
+   asm volatile("lock; " __ASM_SIZE(bts) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -47,7 +49,7 @@ static inline void sync_set_bit(long nr,
  */
 static inline void sync_clear_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btr %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btr) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -64,7 +66,7 @@ static inline void sync_clear_bit(long n
  */
 static inline void sync_change_bit(long nr, volatile unsigned long *addr)
 {
-   asm volatile("lock; btc %1,%0"
+   asm volatile("lock; " __ASM_SIZE(btc) " %1,%0"
 : "+m" (ADDR)
 : "Ir" (nr)
 : "memory");
@@ -78,14 +80,10 @@ static inline void sync_change_bit(long
  * This operation is atomic and cannot be reordered.
  * It also implies a memory barrier.
  */
-static inline int sync_test_and_set_bit(long nr, volatile unsigned long *addr)
+static inline bool sync_test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; bts %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   GEN_BINARY_RMWcc("lock; " __ASM_SIZE(bts),
+*addr, "Ir", nr, "%0", c);
 }
 
 /**
@@ -98,12 +96,8 @@ static inline int sync_test_and_set_bit(
  */
 static inline int sync_test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btr %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btr),
+*addr, "Ir", nr, "%0", c);
 }
 
 /**
@@ -116,12 +110,8 @@ static inline int sync_test_and_clear_bi
  */
 static inline int sync_test_and_change_bit(long nr, volatile unsigned long *addr)
 {
-   unsigned char oldbit;
-
-   asm volatile("lock; btc %2,%1\n\tsetc %0"
-: "=qm" (oldbit), "+m" (ADDR)
-: "Ir" (nr) : "memory");
-   return oldbit;
+   GEN_BINARY_RMWcc("lock; " __ASM_SIZE(btc),
+*addr, "Ir", nr, "%0", c);
 }
 
 #define sync_test_bit(nr, addr) test_bit(nr, addr)






[PATCH] ix86/entry: add instruction suffix

2018-06-25 Thread Jan Beulich
Omitting suffixes from instructions in AT&T mode is bad practice when
operand size cannot be determined by the assembler from register
operands, and is likely going to be warned about by upstream gas in the
future (mine does already). Add the single missing suffix here.

Signed-off-by: Jan Beulich 
---
 arch/x86/entry/entry_32.S |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 4.18-rc2/arch/x86/entry/entry_32.S
+++ 4.18-rc2-ix86-entry-insn-suffix/arch/x86/entry/entry_32.S
@@ -477,7 +477,7 @@ ENTRY(entry_SYSENTER_32)
 * whereas POPF does not.)
 */
addl$PT_EFLAGS-PT_DS, %esp  /* point esp at pt_regs->flags */
-   btr $X86_EFLAGS_IF_BIT, (%esp)
+   btrl$X86_EFLAGS_IF_BIT, (%esp)
popfl
 
/*




Re: [PATCH v5 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-23 Thread Jan Beulich
>>> On 23.05.18 at 16:30,  wrote:
> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
>   /* 64-bit entry point. */
>   .code64
>  1:
> + /* Set base address in stack canary descriptor. */
> + mov $MSR_GS_BASE,%ecx
> + mov $_pa(canary), %rax
> + xor %rdx, %rdx

Why rax and rdx instead of eax and edx? In the former case, the
relocation produced might confuse whatever entity processing it
(it'll have a sign-extended 32-bit quantity to deal with, which
wouldn't allow representing an address in the [2Gb, 4Gb) range).
In the latter case, while surely neither performance nor code size
matter much here, it's still a bad precedent (people copy-and-paste
code all the time): Zero-ing of registers should generally use the
32-bit forms of the insn. Gas has actually gained an optimization
mode recently (upon request from Linus and the x86 maintainers)
to silently "repair" such inefficiencies.

Jan
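
A sketch of the form being suggested here, reusing the MSR_GS_BASE and _pa(canary)
names from the patch under discussion (so it only assembles in that file's context):

	/* Set the stack-canary GS base via the MSR.  Both constants fit in
	 * 32 bits at this point (the code still runs from the low 1:1 mapping,
	 * loaded below 4G), so 32-bit moves avoid a sign-extended 64-bit
	 * relocation and the 32-bit xor still clears all of RDX. */
	mov	$MSR_GS_BASE, %ecx
	mov	$_pa(canary), %eax
	xor	%edx, %edx
	wrmsr				/* writes EDX:EAX to the MSR indexed by ECX */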




Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-23 Thread Jan Beulich
>>> On 22.05.18 at 19:10, <boris.ostrov...@oracle.com> wrote:
> On 05/22/2018 12:32 PM, Jan Beulich wrote:
>>>>> On 22.05.18 at 18:20, <boris.ostrov...@oracle.com> wrote:
>>> We are loading virtual address for $canary so we will always have EDX
>>> set to 0xffffffff. Isn't that what we want?
>> Oh, that's rather confusing - we're still running on the low 1:1
>> mapping when we're here. But yes, by the time we enter C code
>> (where the GS base starts to matter) we ought to be on the high
>> mappings - if only there wasn't xen_prepare_pvh().
> 
> xen_prepare_pvh() (and whatever it might call) is the only reason for
> this patch to exist. It's the only C call that we are making before
> jumping to startup_64, which I assume will have to set up GS itself
> before calling into C.
> 
> I didn't realize we are still on identity mapping. I'll clear EDX (and
> load $_pa(canary)) then.
> 
> BTW, don't we have the same issue in startup_xen()?

I don't think so, no - there we're on the high mappings already (the
ELF note specifies the virtual address of the entry point, after all).

Jan




Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Jan Beulich
>>> On 22.05.18 at 18:20, <boris.ostrov...@oracle.com> wrote:
> On 05/22/2018 12:10 PM, Jan Beulich wrote:
>>>>> On 22.05.18 at 17:15, <brge...@gmail.com> wrote:
>>> On Tue, May 22, 2018 at 9:57 AM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>> On 22.05.18 at 15:45, <brge...@gmail.com> wrote:
>>>>> On Mon, May 21, 2018 at 11:54 PM, Boris Ostrovsky 
>>>>> <boris.ostrov...@oracle.com> 
> wrote:
>>>>>> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
>>>>>> /* 64-bit entry point. */
>>>>>> .code64
>>>>>>  1:
>>>>>> +   /* Set base address in stack canary descriptor. */
>>>>>> +   mov $MSR_GS_BASE,%ecx
>>>>>> +   mov $canary, %rax
>>>>>> +   cdq
>>>>>> +   wrmsr
>>>>> CDQ only sign-extends EAX to RAX.  What you really want is to move the
>>>>> high 32-bits to EDX (or zero EDX if we can guarantee it is loaded
>>>>> below 4G).
>>>> What you describe is CDQE (AT&T name: CLTQ); CDQ (AT&T: CLTD)
>>>> sign-extends EAX to EDX:EAX.
>>> But that would still be wrong, as it would set EDX to 0xffffffff if
>>> the kernel was loaded between 2G and 4G.  Looking closer at the code,
>>> we just left 32-bit mode, so we must have been loaded below 4G,
>>> therefore EDX must be zero.
>> Ah, yes, indeed.
> 
> We are loading virtual address for $canary so we will always have EDX
> set to 0xffffffff. Isn't that what we want?

Oh, that's rather confusing - we're still running on the low 1:1
mapping when we're here. But yes, by the time we enter C code
(where the GS base starts to matter) we ought to be on the high
mappings - if only there wasn't xen_prepare_pvh().

Jan




Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Jan Beulich
>>> On 22.05.18 at 17:15, <brge...@gmail.com> wrote:
> On Tue, May 22, 2018 at 9:57 AM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>> On 22.05.18 at 15:45, <brge...@gmail.com> wrote:
>>> On Mon, May 21, 2018 at 11:54 PM, Boris Ostrovsky 
>>> <boris.ostrov...@oracle.com> wrote:
>>>> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
>>>> /* 64-bit entry point. */
>>>> .code64
>>>>  1:
>>>> +   /* Set base address in stack canary descriptor. */
>>>> +   mov $MSR_GS_BASE,%ecx
>>>> +   mov $canary, %rax
>>>> +   cdq
>>>> +   wrmsr
>>>
>>> CDQ only sign-extends EAX to RAX.  What you really want is to move the
>>> high 32-bits to EDX (or zero EDX if we can guarantee it is loaded
>>> below 4G).
>>
>> What you describe is CDQE (AT&T name: CLTQ); CDQ (AT&T: CLTD)
>> sign-extends EAX to EDX:EAX.
> 
> But that would still be wrong, as it would set EDX to 0xffffffff if
> the kernel was loaded between 2G and 4G.  Looking closer at the code,
> we just left 32-bit mode, so we must have been loaded below 4G,
> therefore EDX must be zero.

Ah, yes, indeed.

Jan




Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Jan Beulich
>>> On 22.05.18 at 15:45,  wrote:
> On Mon, May 21, 2018 at 11:54 PM, Boris Ostrovsky 
>  wrote:
>> @@ -98,6 +101,12 @@ ENTRY(pvh_start_xen)
>> /* 64-bit entry point. */
>> .code64
>>  1:
>> +   /* Set base address in stack canary descriptor. */
>> +   mov $MSR_GS_BASE,%ecx
>> +   mov $canary, %rax
>> +   cdq
>> +   wrmsr
> 
> CDQ only sign-extends EAX to RAX.  What you really want is to move the
> high 32-bits to EDX (or zero EDX if we can guarantee it is loaded
> below 4G).

What you describe is CDQE (AT&T name: CLTQ); CDQ (AT&T: CLTD)
sign-extends EAX to EDX:EAX.

Jan
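
For reference, a standalone sketch of the two conversions being distinguished above
(Intel mnemonic in the comments, AT&T mnemonic in the code):

	# cvt-demo.s
	.text
	.globl	sign_extend_demo
sign_extend_demo:
	movl	$0x80000000, %eax	# a 32-bit value with the sign bit set
	cltd				# Intel CDQ: sign-extend EAX into EDX:EAX, EDX = 0xffffffff
	cltq				# Intel CDQE: sign-extend EAX into RAX, RAX = 0xffffffff80000000
	ret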




Re: [PATCH v4 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-22 Thread Jan Beulich
>>> On 22.05.18 at 05:54, <boris.ostrov...@oracle.com> wrote:
> We are making calls to C code (e.g. xen_prepare_pvh()) which may use
> stack canary (stored in GS segment).
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>

Reviewed-by: Jan Beulich <jbeul...@suse.com>




Re: [PATCH v3 1/2] xen/PVH: Set up GS segment for stack canary

2018-05-18 Thread Jan Beulich
>>> On 17.05.18 at 19:47, <boris.ostrov...@oracle.com> wrote:
> On 05/17/2018 11:02 AM, Jan Beulich wrote:
>>>>> On 17.05.18 at 16:47, <boris.ostrov...@oracle.com> wrote:
>>> @@ -64,6 +67,9 @@ ENTRY(pvh_start_xen)
>>> mov %eax,%es
>>> mov %eax,%ss
>>>  
>>> +   mov $PVH_CANARY_SEL,%eax
>>> +   mov %eax,%gs
>> I doubt this is needed for 64-bit (you could equally well load zero or leave
>> in place what's there in that case),
> 
> I don't understand this.

The actual selector value doesn't matter on 64-bit. Hence you could
omit the load altogether, or you could use plain zero. No need for the
(non-zero) selector, or (by implication) the GDT descriptor.

>>  and loading the selector before setting
>> the base address in the descriptor won't have the intended effect.
> 
> 
> I wasn't sure about this either but then I noticed that
> secondary_startup_64() does it in the same order (although not using the
> MSR).

Well, for one they load a null selector, which is independent of setting up
any GDT descriptors. I also don't understand why you say "although not
using the MSR" when they clearly do. And then, as said above (and also
in a comment in secondary_startup_64()), the actual selector value (and
when / if at all it is loaded) doesn't matter on 64-bit. The ordering does
matter on 32-bit though.

Jan



