Re: [Xen-devel] [PATCH v10 4/6] xen/x86: use DEFINE_SYMBOL as required

2019-02-26 Thread Ian Jackson
Stefano Stabellini writes ("[PATCH v10 4/6] xen/x86: use DEFINE_SYMBOL as 
required"):
> Use SYMBOLS_SUBTRACT and SYMBOLS_COMPARE in cases of comparisons and
> subtractions of:

Oh and the commit message still mentions old macro names :-).


FTR I think your definition macro should be called SYMBOLS_DEFINE
since it defines two symbols rather than one, but I think this is a
very minor point and if you don't feel like changing it I certainly
won't object.


Regards,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v10 4/6] xen/x86: use DEFINE_SYMBOL as required

2019-02-26 Thread Ian Jackson
Stefano Stabellini writes ("[PATCH v10 4/6] xen/x86: use DEFINE_SYMBOL as 
required"):
> Use SYMBOLS_SUBTRACT and SYMBOLS_COMPARE in cases of comparisons and
> subtractions of:
...
> Use explicit casts to uintptr_t when it is not possible to use the
> provided static inline functions.

Why is it not possible ?  You write:

> +/*
> + * Cannot use DEFINE_SYMBOL because of the way they are passed to
> + * apply_alternatives.
> + */
>  extern struct alt_instr __alt_instructions[], __alt_instructions_end[];

But I don't know why you can't pass a `struct abstract_alt_instr*' to
apply_alternatives.

IMO it should be strictly forbidden to ever write this formulation, as
you have above.  See my proposed rule comment for DEFINE_SYMBOL.

Even if you can't use the macros at some particular calculation site,
you should still ensure that ..._end has a different type, to make
sure that no unsafe uses escape.


FTR, I have not reviewed the rest of this patch yet.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v10 2/6] xen: introduce DEFINE_SYMBOL

2019-02-26 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v10 2/6] xen: introduce DEFINE_SYMBOL"):
> On 26.02.19 at 17:46,  wrote:
> > I am not aware of a standard C type which could be used instead of
> > this struct.  But I think you can use the `packed' attribute to get
> > the right behaviour.  The GCC manual says:
> > 
> >  | Alignment can be decreased by specifying the 'packed' attribute.
> >  | See below.
...
> Until I've looked at this (again) now, I wasn't even aware that
> one can combine packed and aligned attributes in a sensible
> way. May I suggest that, because of this being a theoretical
> issue only at this point, we limit ourselves to the build time
> assertion you suggest?

I am not suggesting combining `packed' and `aligned'.  I am suggesting
only `packed' (but based on text which is in the manual section for
`aligned').  But I am happy with a build-time assertion if you don't
want to add `packed'.  That is just as safe.

> > This is wrong.  The conversion to ptrdiff_t (currently done implicitly
> > by return) must come before the division.  Otherwise it will give the
> > wrong answer when s1 > s2.
> > 
> > Suppose 32-bit, s1=0x0040 s2=0x0020 sizeof=0x10, Then
> > s2-s1=0xffe0, and unsigned division gives
> > (s2-s1)/sizeof=0x0ffe.  Converstion to ptrdiff_t does not change
> > the bit pattern.  But we wanted 0xffe.
> > 
> > Signed integer division by a positive divisor is always defined (and
> > always fits) so doing the conversion first is fine.
> 
> Well, this would come as a side effect if there first was a function
> producing the byte delta, and then the function here would call
> that other function, doing the division on the result.

I don't mind if someone wants to provide a byte delta function.  It
ought to have a different name to `blah_diff' though.  `blah_bytediff'
maybe.

> There's another caveat here though: Even by doing the cast first,
> the division will still be unsigned as long as the sizeof() doesn't also
> get cast to ptrdiff_t.

Yes, the sizeof would have to be cast to ptrdiff_t too.

> One question though is whether we actually care about the case
> when s1 > s2 in the first place. But perhaps it's better to consider
> it right away than getting bitten later on.

Having a thing which silently goes wrong giving bizarre and large
answers is clearly not acceptable...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH] jessie: Disable use of security.debian.org

2019-02-25 Thread Ian Jackson
We have about a 10% failure rate of a problem where the symptoms are
that the test box fails to get some thing from security.debian.org.

The apt-cacher-ng logs show the relevant test box's ip address
fetching the file that it is supposed to.  But, it is possible that
there are different timeouts, so that does not mean the problem is
inside the colo.

Fetching from other apt sources, notably the main Debian archive and
snapshot.d.o, do not seem to be affected.

Specifically, I searched the logs for the last 1000 host install steps,
and looked for the failures, with the following rune:

 select flight,job,logfile,started from (select *, (select val from runvars r 
where r.job=steps.job and r.flight=steps.flight and r.name='host')  from steps 
where testid like 'host-install%' and flight>13 order by finished desc 
limit 1000) sub where status='broken';

I then used these runes to correlate that with the syslogs from the
installer:

 perl <~/t -ne 'use strict; s/^ *//; my ($flight,$job,$logf) = split / +\| +/; 
next unless $flight =~ m/^\d+$/; my $f = "$flight/$job/?.ts-syslog-server.log"; 
my @y = glob $f; print $_, "\n" for @y;' >~/u

 xargs <~/junk/u egrep -L 'Failed to fetch 
http://security\.debian\.org.*Connection failed'

The only logs which did not mention that error message were three
failed jobs on the same host, joubertin1, which seems not to be
rebooting reliably.

So I think this is a problem with the security.debian.org CDN.

For now, disable security updates entirely.  We don't really care
about the security patch status of test boxes anyway.  Hopefully this
will cause the system to become reliable again.

CC: Juergen Gross 
Signed-off-by: Ian Jackson 
---
 Osstest/Debian.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 17dfef54..d991bf48 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -932,7 +932,8 @@ END
 $preseed .= <<'END'
 d-i apt-setup/services-select multiselect updates
 END
-if $suite =~ m/jessie/ && $r{arch} eq 'arm64';
+#   if $suite =~ m/jessie/;
+# security.d.o CDN seems unreliable right now
 
 $preseed .= <<"END";
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-07 Thread Ian Jackson
Stefano Stabellini writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> This is problematic. We have also the following instances in this series
> to deal with:
> 
> xen/arch/arm/percpu.c:_free_percpu_area
>   char *p = (char *)__per_cpu_start + __per_cpu_offset[cpu];
> 
> xen/arch/x86/percpu.c:_free_percpu_area
>   char *p = (char *)__per_cpu_start + __per_cpu_offset[cpu];
> 
> xen/arch/x86/setup.c:init_done
>   for ( va = (char *)__init_begin; init_lt(va, __init_end); va += PAGE_SIZE )
> 
> xen/arch/x86/alternative.c:apply_alternatives
>   for ( a = base = (struct alt_instr *)start; alt_instr_lt(a, end); a++ )

Presumably you will be writing some explanation as to why each of
these is OK ?

> In all these cases we actually end up modifying the object. I suggest
> we remove the const from either __DECLARE_BOUNDS (so from everywhere),
> or just for per_cpu, init, and alt_instr by introducing another MACRO.

My personal opinion is that you should:

 * Introduce a new macro DECLARE_BOUNDS_NONCONST.
 * Write a clear explanation of when DECLARE_BOUNDS_NONCONST is
   permitted and when DECLARE_BOUNDS must be used.
 * Implement both macros in terms of a common internal macro
   which takes an argument CONST which is empty or `const'.

Ian.

PS in prose, `macro' is written thus, not in all caps.  It is not an
acroynom; the etymology is from ancient Greek.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 3/9] xen: introduce DECLARE_BOUNDS

2019-03-07 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 3/9] xen: introduce DECLARE_BOUNDS"):
> On 06.03.19 at 21:55,  wrote:
> > On Wed, 6 Mar 2019, Jan Beulich wrote:
> > uintptr_t is the integer type corresponding to a pointer, so we should
> > use uintptr_t first. ptrdiff_t is the difference type so we should cast
> > to it afterwards. Specifically, uintptr_t is unsigned and ptrdiff_t is
> > signed. So I don't think it would be correct to do:
> > 
> >   return (ptrdiff_t)((ptrdiff_t)s2 - (ptrdiff_t)s1);
> > 
> > Or am I missing your point?
> 
> Well, I really mean
> 
>return (ptrdiff_t)s2 - (ptrdiff_t)s1;
> 
> But that aside - let's consider all three cases:
> 
> 1) sizeof(ptrdiff_t) == sizeof(void *)
> 
> All fine. And you'll have some difficulty finding a platform where this
> isn't the case.

No.  This is not fine.  Signed integer subtraction sometimes has UB.

To use signed integer subtraction safely in C it is necessary to prove
that it cannot overflow.  I doubt that such a proof would be a
tractable thing to try to write for a general macro like this.  It
would imply some complicated constraint on the arguments. [1]

OTOH, unsigned integer subtraction is always defined; and the
subsequent conversion to signed is also always defined.


Jan, I appreciate you looking at this stuff in detail.

But it is important tht these kind of review comments are actually
correct.  This is not the first occasion in the discussion of this
series where you have advocated a construct which contains a lurking
possibility of UB, and argued against the corresponding construct
which has no UB.


Conversely, I think the discussion of the sizes of these types is not
really relevant.  To port Xen it is necessary to have an environment
where
sizeof(ptrdiff_t) == sizeof(uintptr_t)
== sizeof(void*) == sizeof(struct maxalign*)
and I think there is little harm in further baking in those
assumptions.


Ian.

[1] In particular ISTM that on systems with unsigned pointers,
pointers around the 0x8bazillion boundary cause trouble.

Say, for any reason someone were to use the linker machinery to
produce two pointer constants 0x7f00 and 0x8100 .  Then that's
the two signed numbers 7f00.._16 and -7f00.._16 and the difference is
fe00.._16 which is not representable in the same sized signed integer.
But of course what the programmer wanted was to treat that mod 2^n and
get 200_16 ie 0x0200.. and that is what using unsigned arithmetic,
with last-minute conversion to signed, does.

To use signed arithmetic it would be necessary to somehow exclude such
uses.  Maybe they are fanciful because the two linker symbols would
often be in different notional segments but ISTM that this argument
depends on assumptions about the addressing layout and is in general
rather weak.  So a BUILD_BUG_ON would be needed.

All of this seems much more easily avoided by using uintptr_t.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-07 Thread Ian Jackson
Stefano Stabellini writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> On Wed, 6 Mar 2019, Jan Beulich wrote:
> > Is the line wrapping really needed here?
> 
> It would end at 80 characters exactly otherwise. I am happy to do as you
> prefer.

Certainly I prefer lines to end strictly less than 80 characters and
preferably even shorter.  My mailer/editor produces wrap damage for
exactly-80-character lines.

I think this wrapping was introduced by Stefano after a prompt from
me.

Jan, it is quite unfortunate that you are replying to Stefano to
disagree with things that Stefano did because I suggested them, rather
than replying to my patch comments.  We must not put Stefano in the
middle of a disagreement between different committers.

On this style question, while I have an opinion, I don't consider
myself a maintainer, so the hypervisor maintainers' answer is
definitive.


Nevertheless, I will have one go at trying to convince Jan:

Note that:

 - When code is turned into a patch, an extra character is added for
   the diff +/-.  That means that 80-column code becomes 81 columns
   wide.

 - When a patch is quoted for review in email, two (usually) extra
   quoting characters are added ('> ') for each level of reply,
   so 80-column code becomes 83 or 85 (or more) columns wide.

 - One purpose of the line length limit is to fit within a
   conventional 80-column text terminal window (or at least, to
   minimise the number of lines which overflow such a window)

 - In an 80 column ssh session, simple representations are only
   capable of unambiguously displaying lines of up to 79 characters.

 - Therefore the total available code width which can be displayed
   unambiguously in an 80-column ssh session, within a singly quoted
   patch, is 76 characters.  Longer lines produce wrap damage.

To me would seem to imply that a *code* line length limit of 76 or 74
characters should be usual.  Certainly it seems churlish to object to
patches where the new code is wrapped to avoid lines >76.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-07 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> I'd like to note though that in the first two cases we don't alter the
> declared object anyway, but instead a derived one; the declaration
> should not use const for other reasons though. And the 3rd case is
> fiddling with something that has lost its meaning as an object. In fact
> we're about to free the underlying memory. In this case I'd prefer to
> retain the const, but I won't insist as the symbol is non-const right
> now as well.

I think if you do this

  extern const struct blah blah_start[];

it is not safe to cast the const away later.

From C99+TC1+TC2, 6.7.3 5:

  5  If an attempt is made to modify an object defned with a
 const-qualifed type through use of an lvalue with
 non-const-qualifed type, the behavior is undefned. ...

Of course `extern const struct blah blah_start[]' is only a
declaration, not a definition.

But all the declarations/definitions even in different translation
units must be `compatible' (or UB, 6.2.7 2) and for types to be
`compatible' they must be identically qualified (6.7.3 9).

So the compiler authors will say that

  `extern const struct blah blah_start[]'

implies a definition somewhere of an array 

  `extern const struct blah blah_start[ somesize ]

and therefore any accesses `to blah_start' (which in their mind means
any accesses via a pointer whose provenance is blah_start) via
non-const lvalues (even for reading!) is UB.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 1/9] xen: use __UINTPTR_TYPE__ for uintptr_t

2019-03-08 Thread Ian Jackson
Andrew Cooper writes ("Re: [PATCH v11 1/9] xen: use __UINTPTR_TYPE__ for 
uintptr_t"):
> NACK.
> 
> Stop messing around with GCC-isms and use the spec-compliant way of
> getting uintptr_t (and others).
> 
> #include 

If everything is working correctly, stdint.h is provided by the
compiler (eg by libgcc) and this will DTRT.

However, if things are not working correctly, we will pick up the host
operating system .  I consider it unacceptable that a build
system issue would be able to cause us to compile a hypervisor with a
busted uintptr_t.  (This is particularly bad given that the hypervisor
currently hardly uses uintptr_t at all, which means such a
miscompilation might even go largely undetected.)

I would tolerate this approach if it were accompanied by an
appropriate set of BUILD_BUG_ON which will detect if this has
occurred, for example by checking that sizeof(void*) ==
sizeof(uintptr_t).


> That way, *any* compiler will give you a working uintptr_t, not just
> those which are emulating GCC's internals.

It has been conventional for many years in embedded programming to,
effectively, briefly put on the hat of the C language implementor and
provide one's own uintptr_t et al.

For this reason all compilers have always provided compiler-defined
types like __UINTPTR_TYPE__.

It is not conceivable that we would succeed in porting Xen to a
compiler which was so strange that it did not provide any internal
type aliases for these standard types.  (Indeed such a compiler
authors would have to do more work to implement !)  It is
highly unlikely that we would want to try porting Xen to a compiler
which didn't provide these with these now-conventional names, given
that any contemporary compiler author can trivially provide them.

And the (vanishingly unlikely) failure mode is a completely obvious
missing type error.

So __UINTPTR_TYPE__ is strictly superior: it is de jure correct
according to the manual for the compiler that invented the name; it is
currently implemented everywhere we care about (including, right now,
llvm), and will almost certainly remain available, and unlike
 there is absolutely no risk of it meaning anything else.


To put it another way,  is a layer of indirection to
__UINTPTR_TYPE__.  It is provided by libgcc (and other compilers'
equivalents) for the benefit of programs which need to run on much
wider variety of platforms than Xen, and to provide a set of more
convenient names which are not available by default (for compatibility
reasons).

We do need the convenient and standard names, but we can provide them
ourselves.  Relying on an external layer of indirection, however,
exposes us to needless additional risk that the indirection ends up
pointing to the wrong place.


> This isn't rocket science, and I know all of us have better things to be
> doing that wasting time arguing over aspects which are unrelated to
> actually fixing the problem.

I suggest we compromise on  + BUILD_BUG_ON since at least
two of us seem to be able to tolerate that.


Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 v2] libxl: prepare environment for domcreate_stream_done

2019-03-08 Thread Ian Jackson
Olaf Hering writes ("Re: [PATCH v2] libxl: prepare environment for 
domcreate_stream_done"):
> Am Fri, 8 Mar 2019 14:08:10 +0000
> schrieb Ian Jackson :
> 
> > In fact this is OK because domcreate_stream_done only reads srs->dcs
> > and then does everything with the obtained dcs.  But there is nothing
> > there to indicate that srs might be mostly uninitialised.  Maybe we
> > could add a comment there, something like:
> > 
> >   /* NB perhaps only srs->dcs is valid; eg in the case of an
> >* early branch to domcreate_bootloader_done's `out' block */
> 
> I'm find with that. Can this comment be adjusted at commit time?

Sure, although if it's me that commits it I'd do it as a followup
patch.

We need a release ack I think for 4.12.  Subject adjusted, CCing
Juergen.  Juergen, will bounce the original patch to you too.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxl: prepare environment for domcreate_stream_done

2019-03-08 Thread Ian Jackson
Olaf Hering writes ("[PATCH v2] libxl: prepare environment for 
domcreate_stream_done"):
> The function domcreate_bootloader_done may branch early to
> domcreate_stream_done, in case some error occoured. Here srs->dcs will be
> NULL, which leads to a crash.

Thanks.  I think this is OK as far as it goes.  But:

> +/* Prepare environment for domcreate_stream_done */
> +dcs->srs.dcs = dcs;

The need for this comment is telling us something about the weird code
structure here.  We initialise all of dcs->srs much later, so even
with this change we call domcreate_stream_done with a
mostly-uninitialised srs.

In fact this is OK because domcreate_stream_done only reads srs->dcs
and then does everything with the obtained dcs.  But there is nothing
there to indicate that srs might be mostly uninitialised.  Maybe we
could add a comment there, something like:

  /* NB perhaps only srs->dcs is valid; eg in the case of an
   * early branch to domcreate_bootloader_done's `out' block */

?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-08 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> Ian Jackson  03/07/19 3:44 PM >>>
> >Jan Beulich writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
> >required"):
> >> I'd like to note though that in the first two cases we don't alter the
> >> declared object anyway, but instead a derived one; the declaration
> >> should not use const for other reasons though. And the 3rd case is
> >> fiddling with something that has lost its meaning as an object. In fact
> >> we're about to free the underlying memory. In this case I'd prefer to
> >> retain the const, but I won't insist as the symbol is non-const right
> >> now as well.
> >
> >I think if you do this
> >extern const struct blah blah_start[];
> >it is not safe to cast the const away later.
...
> It looks to me as if we were in agreement then.

Perhaps so.

> I was talking about freeing an object that was const-qualified, not
> modifying it.

I'm not sure what you mean, precisely.  Do you mean this:

   extern const struct blah blah_start[];
   ...
   free(blah_start);

?  But I'm sure the hypervisor doesn't have a function free().

> The scrubbing of the memory could be considered part
> of the freeing, it just ought to occur up front because of otherwise
> possible races, and because we have no means to tell the freeing
> function to do the scrubbing (in fact I should say "we used to have
> no means, as I think right now the memory would be scrubbed
> implicitly, so the explicit scrubbing could probably be dropped).

Maybe you mean this:

   extern const struct blah blah_start[];
   ...
   memset(blah_start, 0, size of blah array);

?  That is clearly forbidden, regardless of whether the cost
is cast away.  But I think this:

   extern struct blah blah_start[];
   ...
   memset(blah_start, 0, size of blah array);

is fine.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] SRSL People... [PATCH v11 0/9] misc safety certification fixes

2019-03-08 Thread Ian Jackson
Andrew Cooper writes ("Re: SRSL People... [PATCH v11 0/9] misc safety 
certification fixes"):
> The rational for this series is to satisfy MISRA.  MISRA have said in no
> uncertain terms that all of these tricks are unacceptable, and have
> identified the one acceptable option.  By not doing what MISRA said,
> this series doesn't move Xen any closer to passing certification.

Can you please point me to the message from the MISRA folks ?  I seem
to have missed it.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-08 Thread Ian Jackson
Stefano Stabellini writes ("[PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> Use DECLARE_BOUNDS and the two static inline functions that come with it
> for comparisons and subtractions of:
> 
> __2M_rwdata_start, __2M_rwdata_end, __end_vpci_array,
> __start_vpci_array, _stextentry, _etextentry, __trampoline_rel_start,
> __trampoline_rel_stop, __trampoline_seg_start, __trampoline_seg_stop
> __per_cpu_start, __per_cpu_data_end
> 
> possible to use the provided static inline functions.
> M3CM: Rule-18.2: Subtraction between pointers shall only be applied to
> pointers that address elements of the same array.

Hi.  I picked one of these to briefly look at the detail of and I
spotted this:

> @@ -49,7 +50,7 @@ static void _free_percpu_area(struct rcu_head *head)
>  {
>  struct free_info *info = container_of(head, struct free_info, rcu);
>  unsigned int cpu = info->cpu;
> -char *p = __per_cpu_start + __per_cpu_offset[cpu];
> +char *p = (char *)__per_cpu_start + __per_cpu_offset[cpu];

This is dangerous because it turns __per_cpu_start which is a magic
linker symbol pointer into one which you might try to compare with
other pointers.

I can't see from your patch what else is done to `p'.  Something ought
to be done to prevent future programmers from comparing p to other
pointers.

Some extended comment, or something, is likely to be needed every time
you introduce a cast.

>  free_xenheap_pages(p, PERCPU_ORDER);

JOOI, why does free_xenheap_pages not take a void* ?

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as required

2019-03-08 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 5/9] xen/x86: use DECLARE_BOUNDS as 
required"):
> Ian Jackson  03/07/19 3:02 PM
> >Jan, it is quite unfortunate that you are replying to Stefano to
> >disagree with things that Stefano did because I suggested them, rather
> >than replying to my patch comments.  We must not put Stefano in the
> >middle of a disagreement between different committers.
> 
> I'm sorry, but I may have easily overlooked this earlier comment of yours.
> Anyway, a later reply by Stefano suggests that the line would end up
> being 79 chars, which is in line with ./CODING_STYLE.

Right.  Given others' replies, I won't press this point further.

Of course `less than 80 characters' would mean that the maximum was 79
but if you all decide to increase it by one then grump but whatever.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 9/9] xen: explicit casts when DECLARE_BOUNDS cannot be used [and 1 more messages] [and 1 more messages]

2019-03-08 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 9/9] xen: explicit casts when 
DECLARE_BOUNDS cannot be used [and 1 more messages]"):
> Ian Jackson  03/07/19 4:26 PM >>>
> >Jan, I'm not sure exactly what you are suggesting.  Currently the
> >array has one pointer per element.  Are you suggesting it should have
> >two pointers (start and end), with different notional types ?
> 
> Yes.

Right.

> >If that is OK from a perf point of view then it is an easy answer
> >(although a bit tiresome since more linker symbols will have to be
> >generated).
> 
> This is init-time code and init-time data, so to me neither the performance
> aspect nor the doubled storage requirements would really matter.

Jolly good.

> Both could perhaps even be eliminated by using an array of unions.

If there are no compelling perf reassons to do otherwise, let us do
something which is obviously correct, rather than complicating
matters.


Stefano Stabellini writes ("Re: [PATCH v11 9/9] xen: explicit casts when 
DECLARE_BOUNDS cannot be used [and 1 more messages]"):
> On Thu, 7 Mar 2019, Ian Jackson wrote:
> > Jan has one suggestion, which I don't fully understand (see below).
> > Alternatively I suggest the following ad-hoc approach:

Probably, we should follow Jan's suggestion and not mine.


> > I also disagree with the wording of the comment.  It is seriously
> > misleading.  These symbols do in fact refer to the same object!
> > The problem is that the compiler thinks otherwise.  You need wording
> > like that in DECLARE_BOUNDS.  (Or a reference to it.)
> > 
> > > and if you think it is correct, then no
> > > matter what you do the behavior is undefined. Instead I view the
> > > entirety of the .bug_frames.* sections as a single array, with
> > > labels placed not only at start and end, but also in the middle. I
> > > think the code here would better also be taken care of by the
> > > DECLARE_BOUNDS() machinery, dividing the single array into
> > > multiple smaller ones.
> > 
> > Jan, I'm not sure exactly what you are suggesting.  Currently the
> > array has one pointer per element.  Are you suggesting it should have
> > two pointers (start and end), with different notional types ?
> 
> I am not fully understanding the suggestion either.
> 
> Even if we split it into multiple start-end pairs, still we won't be
> able to compare start/end of a pair with start/end of different pair
> without casts.

I don't think this bug_frames code does that ?  Maybe I have misread
it.


> This makes sense. Also, I understand _stext and __2M_rwdata_end better
> than the bug_frames and I should be able to write an half-decent
> explanation here. FYI _stext is already converted to DEFINE_BOUNDS in
> this series, but you are right that __2M_rwdata_end is not. I'll fix
> that.

Thanks.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 3/9] xen: introduce DECLARE_BOUNDS

2019-03-08 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 3/9] xen: introduce DECLARE_BOUNDS"):
> >No.  This is not fine.  Signed integer subtraction sometimes has UB.
...
> I've spent an hour trying to find the relevant parts of the spec, but I'm
> afraid I've once again failed at finding all necessary pieces.

The spec is obtuse indeed.

> This does not make any provisions for unsigned types being special.

6.2.5 9
    A computation involving unsigned operands can never overfow,
   because a result that cannot be represented by the resulting
   unsigned integer type is reduced modulo the number that is one
   greater than the largest value that can be represented by the
   resulting type.

> As to the unsigned -> signed conversion, according to, again in the
> Conversions section,
> 
> "Otherwise, the new type is signed and the value cannot be represented
>  in it; either the result is implementation-defined or an implementation-
>  defined signal is raised."
> 
> this is implementation defined according to my understanding (but I take
> it that we're fine with this).

Yes.  On the platforms currently supported by Xen, the
`implementation' `defines' that the result is simply the same bit
pattern.  I say `implementation' in quotes because really we mean
compiler, and `defines' because while no doubt there is a formal
conformance statement somewhere, what we are relying on is instead
something much more general:

The community of C programmers for `normal' architectures effectively
require that conversion to signed integer work this way.  A platform
where that didn't work would be strange and would produce a lot of
other porting difficulties too.  And because the result is *defined*,
the compiler authors can't use this as an optimisation opportunity so
aren't motivated to sophistic justifications for misbehaviour.

> >Conversely, I think the discussion of the sizes of these types is not
> >really relevant.  To port Xen it is necessary to have an environment
> >where
> >sizeof(ptrdiff_t) == sizeof(uintptr_t)
> >== sizeof(void*) == sizeof(struct maxalign*)
> >and I think there is little harm in further baking in those
> >assumptions.
> 
> Good (except that I don't understand the struct maxalign* part - neither
> why you use a pointer there, nor - assuming there was no pointer - how
> that would be in line with __{,u}int128_t or with vector types).

On some machines, pointers to different types have different
representations and even different sizes.  Xen cannot be ported to
such machines - at least, not without an awful lot of work.

Xen assumes that all pointers have the same representation.

`struct maxalign*' was a somewhat-informal way of referring to the
type of a pointer to an object type with the maximal alignment.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 1/9] xen: use __UINTPTR_TYPE__ for uintptr_t

2019-03-07 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v11 1/9] xen: use __UINTPTR_TYPE__ for 
uintptr_t"):
> On 06.03.19 at 22:16,  wrote:
> > Also, it is not a good idea to use __mode__(__pointer__) to define
> > uintptr_t, because it relies on an obscurely defined GCC feature whose
> > semantics might be taken to imply that the things are really pointers.
> 
> "Obscurely defined" is rather subjective, I'd say. The "pointer" mode
> is precisely for this according to my understanding,

Unfortunately it is not your understanding that is relevant, but the
possible future understanding of compiler authors.

> and personally I find "You may also specify a mode of byte or
> __byte__ to indicate the mode corresponding to a one-byte integer,
> word or __word__ for the mode of a one-word integer, and pointer or
> __pointer__ for the mode used to represent pointers" sufficiently
> clear, and not leaving room for any (hidden) interpretation like "to
> imply that the things are really pointers".

The problem is the word `mode'.  I cannot find a definition of that in
this context in the GCC manual.  The only thing is the extremely
informal description you quote.  Normally in English `mode' would mean
something other than merely `size'.

I think in fact the word `mode' here is a reference to a concept in
the GCC compiler internals.  This is really not a situation we should
be optimistic about, particularly given that we are in serious dispute
with compiler authors about the meanings of specifications and that
compiler authors seem quite willing to engage in rank sophistry.
Relying on our good faith interpretation of some vague text is a
seriously bad idea.

On the other hand, the cpp documentation has this:

  __UINTPTR_TYPE__

 These macros are defined to the correct underlying types for the
 'size_t', 'ptrdiff_t', 'wchar_t', 'wint_t', 'intmax_t', ...

This is completely clear and unambiguous.

If I were Stefano I would have quoted these manual sections in my
commit message or a code comment.  I would have hoped that to be
conclusive.

> ... in line with
> 
> void
> c_stddef_cpp_builtins(void)
> {
>   builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
>   builtin_define_with_value ("__PTRDIFF_TYPE__", PTRDIFF_TYPE, 0);
> [...]
>   if (INTPTR_TYPE)
> builtin_define_with_value ("__INTPTR_TYPE__", INTPTR_TYPE, 0);
>   if (UINTPTR_TYPE)
> builtin_define_with_value ("__UINTPTR_TYPE__", UINTPTR_TYPE, 0);
> }

I have no idea what this is.  Is this some piece of the cpp source
code ?  What is your point in quoting it ?

> > [1] https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html 
> 
> Note that this also says "Some of these macros may not be defined on
> particular systems if GCC does not provide a ‘stdint.h’ header on those
> systems." I have absolutely no idea under what conditions gcc may not
> (want to) provide stdint.h.

We do not need to worry about this.

What this means is that if the underlying gcc platform does not have a
uintptr_t then we don't get __UINTPTR_T__.  If that is the case then
we need to fix GCC, not use some random other ad-hoc compiler feature
which we hope, without clear justification, will DTRT.

Furthermore, the possible failure case with __UINTPTR_T__is an
undefined type error.  Failures due to use of the mode attribute are
unknowable because there is no real specification for it.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 9/9] xen: explicit casts when DECLARE_BOUNDS cannot be used [and 1 more messages]

2019-03-07 Thread Ian Jackson
Stefano Stabellini writes ("[PATCH v11 9/9] xen: explicit casts when 
DECLARE_BOUNDS cannot be used"):
> Sometimes the static inline functions provided by DECLARE_BOUNDS cannot
> be used. This patch uses explicit casts to uintptr_t in those cases.
> 
> M3CM: Rule-18.2: Subtraction between pointers shall only be applied to
> pointers that address elements of the same array

IMO these ad-hoc workarounds *must* be accompanied by a clear
explanation of why the DECLARE_BOUNDS arragement is not suitable.


In general I am very strongly of the opinion that:

 * Code should be written so that it does not introduce the risk of
   accidentally inducing UB in correct-looking constructs elsewhere.

 * In particular, declarations of linker symbols should always be
   written in a way that means that plain correct-looking code will
   either *be* correct or produce a compile error.

 * Exceptions of whatever kind should be fully justified, in the
   code.  Usually this will mean a clear and detailed and rigorous
   comment which contains a proof (or similar argument) that
   the approach taken is sound.


Considering these two in reverse order.

bug_frames
--

What appears to be going on is this:

setup_virtual_regions contains a static const array of pointers to
struct bug_frame.  These struct bug_frame* values are themselves
linker symbol values.

Because the compiler has visibility of all of this, it is allowed to
assume that these values do not change.  It will therefore wrongly
assume that they are incomparable.

(All of this ought to have been explained in comments in the code.
Given that it wasn't, a comment ought to have been in your patch.
I shouldn't have had to go and dig into the code.)

And I'm afraid I do not like your patch.  The declaration in eg
xen/include/asm-arm/bug.h

  extern const struct bug_frame __start_bug_frames[],
__stop_bug_frames_0[],
__stop_bug_frames_1[],
__stop_bug_frames_2[];

is a clear violation of the rule in the DECLARE_BOUNDS comment:

+  * These macros, or an alternative technique, MUST be used any time
+  * linker symbols are imported into C via the `extern []' idiom.

I think an `alternative technique' should be applied *at the time of
the declaration* and it should suffice to *mostly avoid the risk of
dangerous uses*.

Jan has one suggestion, which I don't fully understand (see below).
Alternatively I suggest the following ad-hoc approach:

  * Rename __start_bug_frames etc. to
  __UBDANGER_*bug_frame*
(and provide an accompanying comment saying what the UB danger is
and how UB must be avoided).

  * Change the definition of bug_frames to
  static const uintptr_t __initconstrel bug_frames[] ...

  * Provide an arch-independent helper macro to both declare
__UBDANGER_*bug_frame*, converting the pointers to
uintptr_t, and initialise the array.

  * In setup_virtual_regions, cast the uintptr_t to a pointer.  This
needs to be accompanied by a substantial comment explaining why
this is safe.  Elements of the proof are
  - stating that the thing came from a pointer which came
from a linker symbol, so it is a valid pointer value
  - some kind of argument that no code elsewhere will
compare pointers from different core_init.frame
and core_init.ex entries.
I observe that with the current code I think making such an
argument may involve doing something about core_init.ex and
core_init.ex_end.  Not sure.

An alternative would be to construct this table in a different
translation unit, in which case the compiler compiling
setup_virtual_regions cannot `prove' the falsehood .  (Note that we
must be disabling whole-program optimisation.  I hope we are!)


Jan writes:

> I disagree with the comment,

I also disagree with the wording of the comment.  It is seriously
misleading.  These symbols do in fact refer to the same object!
The problem is that the compiler thinks otherwise.  You need wording
like that in DECLARE_BOUNDS.  (Or a reference to it.)

> and if you think it is correct, then no
> matter what you do the behavior is undefined. Instead I view the
> entirety of the .bug_frames.* sections as a single array, with
> labels placed not only at start and end, but also in the middle. I
> think the code here would better also be taken care of by the
> DECLARE_BOUNDS() machinery, dividing the single array into
> multiple smaller ones.

Jan, I'm not sure exactly what you are suggesting.  Currently the
array has one pointer per element.  Are you suggesting it should have
two pointers (start and end), with different notional types ?

If that is OK from a perf point of view then it is an easy answer
(although a bit tiresome since more linker symbols will have to be
generated).


__start_xen
---

> @@ -976,7 +976,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>   * respective 

Re: [Xen-devel] [PATCH 0/3] docs: User oriented documentation

2019-03-20 Thread Ian Jackson
Andrew Cooper writes ("Re: [PATCH 0/3] docs: User oriented documentation"):
> The point of using virtualenv is to get a known-compatible set of
> dependencies. See docs/sphinx/requirements.txt in patch 1.

I don't think we need to have more curlbashware in the Xen build tree.
Sphinx 1.4.9 is what is needed according to the metadata in your patch
1.  It is in Debian stable.  Likewise your metadata says docutils 0.12
is needed and Debian stable has 0.13.1.

> Use of the distro-packaged versions of sphinx/rtd-theme/docutils may
> work, but can be very hit-and-miss.

On irc you wrote:

 11:39  Why on earth do we need [virtualenv] ?
 11:44  in this case, because stretch has a known-buggy
 combination of docutils and sphinx

That does not seem to be documented in your
docs/sphinx/requirements.txt.

Is the fix available in stretch-backports ?

I really don't want to install virtualenv on xenbits.  Nor do I want
anyone doing so ad-hoc in their filespace.  This is not an appropriate
approach to software distribution and management.  And yes, I know
that I am rejecting the underlying design principle of things like
virtualenv, NPM, etc.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/3] docs: User oriented documentation

2019-03-20 Thread Ian Jackson
Andrew Cooper writes ("Re: [PATCH 0/3] docs: User oriented documentation"):
> It turns out that I mis-interpreted the compatibility note in
> https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html# and
> stretch is probably fine.

OK, good.

Well, I think we could say that to build this documentation you have
to have the relevant tools.  sphinx is certainly well-regarded and
this is probably a worthwhile upgrade.

What do the earlier tools do ?  Should we print some warning and
disable the build, or will they generate slightly-mangled output, or
what ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxc: fix HVM core dump

2019-03-20 Thread Ian Jackson
Wei Liu writes ("[PATCH v2] libxc: fix HVM core dump"):
> f969bc9fc96 forbid get_address_size call on HVM guests, because that
> didn't make sense. It broke core dump functionality on HVM because
> libxc unconditionally asked for guest width.
> 
> Force guest_width to a sensible value.
> 
> Reported-by: Igor Druzhinin 
> Signed-off-by: Wei Liu 

LGTM.  Thanks for the copious comment.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxc: fix HVM core dump

2019-03-20 Thread Ian Jackson
Wei Liu writes ("[PATCH] libxc: fix HVM core dump"):
> f969bc9fc96 forbid get_address_size call on HVM guests, because that
> didn't make sense. It broke core dump functionality on HVM because
> libxc unconditionally asked for guest width.
...
> +if ( !auto_translated_physmap &&
> + xc_domain_get_guest_width(xch, domid, >guest_width) != 0 )
> +{
> +PERROR("Could not get address size for domain");
> +goto out;
> +}

Did you consider and reject the option of writing:

if ( auto_translated_physmap ) {
 ... xc_domain_get_guest_width(,,>guest_width,,) ...
} else {
 dinfo->guest_width = sizeof(unsigned long)
}

This is perhaps less likely to leave dinfo->guest_width as an
uninitialised hazard.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 5/5] release technician checklist: More explicit XEN_EXTRAVERSION

2019-03-11 Thread Ian Jackson
In particular, say clearly that X.Y-unstable should be thus, not
X.Y.0-unstable.

CC: Jan Beulich 
Signed-off-by: Ian Jackson 
---
 docs/process/release-technician-checklist.txt | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/process/release-technician-checklist.txt 
b/docs/process/release-technician-checklist.txt
index ebf338c9d4..f4bee4ee13 100644
--- a/docs/process/release-technician-checklist.txt
+++ b/docs/process/release-technician-checklist.txt
@@ -59,7 +59,11 @@ t=RELEASE-$r
 * change xen-unstable xen/Makefile XEN_EXTRAVERSION
 # if main version number has changed (eg 4.7 -> 4.8) rerun ./autogen.sh
 * rerun ./autogen.sh to update version number in configure
-#- XEN_EXTRAVERSION should be `.0-rc$(XEN_VENDORVERSION)'
+#- XEN_EXTRAVERSION should be as follows
+#  `.0-rc$(XEN_VENDORVERSION)'   during freeze, first rc onwards 
(including staging, before branching)
+#  `-unstable$(XEN_VENDORVERSION)'   unstable aka unfrozen staging (or 
unstable branch, after branching)
+#  `.0$(XEN_VENDORVERSION)'  actual release of Xen X.Y.0 (aka 
first actual release of Xen X.Y)
+#  `.Z$(XEN_VENDORVERSION)'  actual release of Xen X.Y.Z (stable 
point realase)
 #
 #- turn off debug on stable branches, if not already done
 #   - tools/Rules.mk
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 4/5] release technician checklist: Reformat Config.mk changes

2019-03-11 Thread Ian Jackson
One per line is a lot easier to read.

Signed-off-by: Ian Jackson 
---
 docs/process/release-technician-checklist.txt | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/process/release-technician-checklist.txt 
b/docs/process/release-technician-checklist.txt
index 5dd85dbc40..ebf338c9d4 100644
--- a/docs/process/release-technician-checklist.txt
+++ b/docs/process/release-technician-checklist.txt
@@ -49,7 +49,10 @@ t=RELEASE-$r
 * consider bumping sonames of shlibs
 
 * change xen-unstable README (should say "Xen 4.5" in releases and on stable 
branches, "Xen 4.5-unstable" on unstable)
-* change xen-unstable Config.mk (QEMU_UPSTREAM_REVISION, 
QEMU_TRADITIONAL_REVISION, MINIOS_UPSTREAM_REVISION)
+* change xen-unstable Config.mk
+#   QEMU_UPSTREAM_REVISION,
+#   QEMU_TRADITIONAL_REVISION
+#   MINIOS_UPSTREAM_REVISION
 * change SUPPORT.md heading version number; -unstable or -rc tag
 * (empty in stable branches after .0 release).
 * insert correct version number in release-notes link
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/5] branching checklist: drop some hg tag runes

2019-03-11 Thread Ian Jackson
We no longer use hg

Signed-off-by: Ian Jackson 
---
 docs/process/branching-checklist.txt | 4 
 1 file changed, 4 deletions(-)

diff --git a/docs/process/branching-checklist.txt 
b/docs/process/branching-checklist.txt
index 5a02d21968..4cda33656d 100644
--- a/docs/process/branching-checklist.txt
+++ b/docs/process/branching-checklist.txt
@@ -1,10 +1,6 @@
 v=4.1
 ov=4.0
 
-##* tag branchpoint
-##hg tag $v.0-branched
-##hg sign -k 'Xen tree' 4.1.0-branched
-
 * make branch in qemu-iwj.git
 git-branch $v-testing master
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/5] branching checklist: Say perhaps no Config.mk changes needed

2019-03-11 Thread Ian Jackson
It is only necessary to change Config.mk if it refers to unstable
branches anywhere.  This time, for example, it didn't.

Signed-off-by: Ian Jackson 
---
 docs/process/branching-checklist.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/process/branching-checklist.txt 
b/docs/process/branching-checklist.txt
index 4cda33656d..8437787446 100644
--- a/docs/process/branching-checklist.txt
+++ b/docs/process/branching-checklist.txt
@@ -75,6 +75,7 @@ Ensure references to qemu trees in xen.git's Config.mk are 
updated.
 Check this with
 grep unstable Config.mk 
 which should produce no output.  Replace as necessary.
+(There may well be none.)
 
 Update newly diverging staging (unstable) according to
 release-technician-checklist.txt section re README etc.
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/5] branching checklist: More detailed instructions re MAINTAINERS

2019-03-11 Thread Ian Jackson
Provide a rune, following which a magit selective git add
(or git add -p) can be used to commit the appropriate changes.

Signed-off-by: Ian Jackson 
---
 docs/process/branching-checklist.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/process/branching-checklist.txt 
b/docs/process/branching-checklist.txt
index 8437787446..9e79f64157 100644
--- a/docs/process/branching-checklist.txt
+++ b/docs/process/branching-checklist.txt
@@ -88,6 +88,8 @@ Update new stable tree's MAINTAINERS to contain correct info 
for this
 stable branch: usually, copy text from previous
 staging-$ov:MAINTAINERS section "Stable Release Maintenance"
 into new staging-$v, deleting what's there.
+  git cat-file blob origin/staging-$ov:MAINTAINERS >MAINTAINERS
+and review the changes, commiting ONLY THE RELEVANT ONES
 
 Set off a manual osstest run, since the osstest cr-for-branches change
 will take a while to take effect:
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Commit moratorium for branching off 4.12

2019-03-11 Thread Ian Jackson
Juergen Gross writes ("Commit moratorium for branching off 4.12"):
> Committers,
> 
> Please don't push any new patch to staging because we are going to
> branch off 4.12 from the main trunk.

4.12 is now branched off from the main trunk (ie, staging-4.12 is
branched from staging).

The commit moratorium for staging-4.12 is still in force as we are
preparing an RC; in any case all pushes to 4.12 need the RM ack.
For 4.12 only very few selected patches will be accepted until the
release.

staging (aka xen-unstable) is open for commits.  But:

Please wait to commit any larger series to the main branch until 4.12
has been released, in case some security patches or late fixes need to
be applied (backporting efforts should be as low as possible).

Thanks,
ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: libfsimage path/file name changes

2019-02-06 Thread Ian Jackson
Jan Beulich writes ("Ping: libfsimage path/file name changes"):
> On 28.01.19 at 08:55,  wrote:
> > back in October you've added quite a number of "xen" prefixes to
> > various pieces there. Now that I've finally had time to connect this
> > change of yours with PV domain creation failures that I've since
> > been observing (not a bug in any way, merely resulting from the
> > fact that I'm running everything straight from the build tree) I
> > started wondering whether the environment variable inspected
> > by common/fsimage_plugin.c:load_plugins() shouldn't then also
> > gain a "XEN" prefix.
> 
> I think this wants a resolution one way or the other for 4.12, i.e.
> either we stick to what is there also after 4.12, or we do the
> additional change for the same release as where all your other
> changes have been made.

Yes, IMO it should.  I will send a patch.

Also I have had reports of trouble with pygrub in the Debian Xen
packages, which I need to investigate and this may generate a further
patch or two.

Juergen, would that be OK with you in principle ?  (I will ask again
for a formal release ack on concrete patch(es).)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL

2019-02-06 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL"):
> As per my earlier reply, I've yet to see proof of a "code-breaking
> optimization" that actually matches our case(s).

I have personally experienced a program being miscompiled because of
the mistaken belief by the compiler that _start and _end were
different objects.  I have read haven't read back the whole history of
this but this is definitely a real bug.

I agree with George that this is a compiler bug.  However, this bug is
not going to be fixed because the compiler community's behaviour is as
unreasonable as George paints :-(.

Our only option is some kind of bodge.

I don't believe in the asm fragment bodge because we don't have a
promise anywhere from the compiler authors that the asm hides pointer
provenance.  I am not aware of any C technique which can be reliably
used to obscure pointer provenance and prevent this kind of
misoptimisation.  Linux is skating on thin ice here.

That leaves:

(i) define indirection variables eg end_ in an assembly language file.
(ii) convert to uintptr_t before comparing

(i) is IMO wholly safe but it is a bit ugly and slightly less
performant.

I think (ii) is fairly safe.  I doubt that we will find (ii) broken.
In particular, because there is little motivation for compiler authors
to try to `optimise it'.  The difficulty with it providing automatic
way of detecting when we accidentallyf fail to cast.  I suggest the
following:

 extern const struct wombat _wombats_start[];
 extern const struct abstract_symbol _wombats_end[];

and providing a macro which compares any pointer with a struct
abstract_symbol* by converting the latter to a uintptr_t.  Direct
comparisons of _wombats_start with _wombats_end will result in a
compilation error due to type mismatch.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 132932: regressions - FAIL

2019-02-06 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [xen-unstable test] 132932: regressions - 
FAIL"):
> >>> On 06.02.19 at 17:10,  wrote:
> > flight 132932 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/132932/ 
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-amd64-pvgrub  7 xen-bootfail REGR. vs. 
> > 132820
> 
> Hmm, the log is full of
> 
> (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = ...
> 
> This can hardly be a result of the three commits in range here.
> Assuming this is recurring / reproducible, is there a way to set
> up a bisect starting from an older flight? Otoh looking at a
> successful test in an older flight I find in
> http://logs.test-lab.xenproject.org/osstest/logs/132575/test-amd64-amd64-xl-credit2/serial-joubertin0.log
> much the same, just that there the box came up.

This failure occured on joubertin1.  The joubertins were put into
service again yesterday after a long absence.  (The absence was due to
hardware trouble with the PDU etc. - not anything that would cause
this crash.)

So I think this is probably a host-specific crash.


The bisector has found a basis pass in 127489 and is working on it.
It has only just started.  In particular it has not yet repro'd either
the basis pass or the new failure, and right now it is doing some
rebuilds of old (expired) build artifacts to repro the basis pass.

Results will appear here:

  
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-amd64-amd64-amd64-pvgrub.xen-boot.html

and interim reports, or a report of a failure to bisect, will go to
the osstest-output mailing list under subjects like this:

  Subject: [xen-unstable bisection] 132970: testing 
test-amd64-amd64-amd64-pvgrub

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL

2019-02-06 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL"):
> On 06.02.19 at 17:37,  wrote:
> > Jan Beulich writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL"):
> >> - it allows the end-of-whatever symbols to also be handed to
> >>   functions in a type-safe manner
> > 
> > Yes.  However...
> > 
> >>   (we could even have per-instance structure flavors, such that
> >>   unrelated "end" symbols can't be mixed up; for example the type
> >>   could simply be a structure wrapping a field of the original base
> >>   type).
> > 
> > I think this would be difficult to achieve without writing a forbidden
> > pointer comparison in the macro.  It could perhaps be achieved with
> > typeof() if that is available in hypervisor code.
> 
> I'm afraid I don't understand - you want to cast to an integer
> type anyway for doing the comparison.

The usual approach to haking a macro perform an explicit typecheck
(ie, to have the macro check that the types of its arguments are
right) is to have the macro expansion contain a `spurious' comparison
whose result is ignored but which will yield a compile-type type
mismatch if the argument types were wrong.  But this is only legal if
the provenance of the compared pointers is legal for a pointer
comparison.  The bad effects of evaluating an UB expression are not
limited by within-program causality.

> As to typeof() - this being a compiler construct, it is available
> whenever the compiler supports it. We certainly use it
> elsewhere in hypervisor code.

I think then that
   (typeof(X))0 == (typeof(Y))0
is the correct formulation of the type check - because it is legal no
matter the provenance of X and Y.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL

2019-02-06 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL"):
> On 06.02.19 at 16:41,  wrote:
> > (i) define indirection variables eg end_ in an assembly language file.
> > (ii) convert to uintptr_t before comparing
> > 
> > (i) is IMO wholly safe but it is a bit ugly and slightly less
> > performant.
> > 
> > I think (ii) is fairly safe.  I doubt that we will find (ii) broken.
> > In particular, because there is little motivation for compiler authors
> > to try to `optimise it'.
> 
> If you want to be "prepared" for them taking apart asm()-s, how
> would you expect them to never "look through" casts?

I'm not sure what you mean by `look through casts'.  I expect the
compiler to `look through casts' for both value and provenance
purposes.

But comparing uintptr_t's is never UB, no matter their value or
provenance.  So there is simply unarguably no UB if the comparison is
done with uintptr_t's.  The conversions themselves are ID, not UB.

Whether comparing pointer values is UB depends on their value and
provenance, and in general the compiler is able to look through most
transformations (including casts) to determine the ultimate provenance
- ie the provenance rules are not defeated by casts.

So that's de jure.

De facto, there is little incentive for a compiler to misoptimise
uintptr_t comparisons, and much incentive for it to get `better' at
disentangling pointer provenance for the purpose of (mis)optimising
pointer comparisons.

> >  extern const struct wombat _wombats_start[];
> >  extern const struct abstract_symbol _wombats_end[];
...
> Hmm, that's certainly an interesting approach, and requires
> care only when introducing a new pair of symbols of this kind.
> But of course the macro you suggest to carry out the
> comparison will have the same weakness as any open coded
> cast to a suitable integer type. But there are benefits:

libxl already relies on casting to uintptr_t as a way of avoiding the
rules restricting pointer comparisons.  See these comments:
  libxl_event.c l.476  libxl__watch_slot_contents
  libxl_internal.h  l.322  typedef struct libxl__ev_watch_slot

> - it marks problem sites clearly (one of Stefano's goals),
> - it isolates future changes to how exactly the comparisons
>   are to be done to the comparison macro(s)
> - it doesn't undermine type safety of the main (start-of-
>   whatever) symbols (one of my goals),
> - it allows the end-of-whatever symbols to also be handed to
>   functions in a type-safe manner

Yes.  However...

>   (we could even have per-instance structure flavors, such that
>   unrelated "end" symbols can't be mixed up; for example the type
>   could simply be a structure wrapping a field of the original base
>   type).

I think this would be difficult to achieve without writing a forbidden
pointer comparison in the macro.  It could perhaps be achieved with
typeof() if that is available in hypervisor code.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Help commissioning x86 boxes intended for builds [himrod[012]]

2019-02-18 Thread Ian Jackson
Sorry for the rather random CC list.

Last year we bought a variety of test boxes.  Amongst them were three
biggish Intel machines which I had primarily intended for use as
dedicated build servers.  These are himrod[012].

Unfortunately I have not been able to commission them because they
have been failing their commissioning tests.  Investigations have not
found the problem.

The symptom is that, occasionally, the network stops working for a
while.  It then comes back, spontaneously.  There are no log messages
recorded on the box itself in /var/log for this; no messages on the
serial console.

The failure probability is about 10% for any one individual test job.

It seems to do it only under Xen with our own kernels (4.14.x).
For initial installation and for for builds we use stock Debian
kernels (currently, jessie, so 3.16.56-1 for the installer and
3.16.57-2 for the installed system); and I haven't seen failures
there.  I have not tried other combinations (yet).

They have Intel I350 NICs.  We have the same NICs in another pair of
boxes, debina[01], which work fine.  (The himrods are set to
use UEFI; the debinas BIOS.)

I have already had the machines' firmware updated.


I know that it isn't the whole machine freezing because here is an
example where the test box itself experiences a timeout trying to talk
to the network:

  
http://logs.test-lab.xenproject.org/osstest/logs/133269/test-amd64-amd64-xl-multivcpu/10.ts-debian-install.log
  
http://logs.test-lab.xenproject.org/osstest/logs/133269/test-amd64-amd64-xl-multivcpu/info.html

Here when the test box's network connection starts working again, the
TCP carrying the ssh session eventually retransmits and then the TCP
connection is unblocked, so the test box ends up sending the whole lot
of buffered up error messages to the controller VM which duly logs
them.


The most recent failed commissioning attempt was for himrods 0 and 2
only.  himrod1 has an unrelated problem with its serial cable.
However, my notes indicate that I had previous problems with himrod1
too.  So I think we need to take this as applying to these three
machines.

Just in case, I have asked Credativ to give the machines fresh cables
to different swtich ports.



I don't have a good model of what to do (or try) next.  Suggestions
welcome.  The full report from my most recent commissioning test
attempt is here:
  http://logs.test-lab.xenproject.org/osstest/logs/133269/


Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair [and 1 more messages]

2019-02-12 Thread Ian Jackson
Summary:

 7b8052e19304 which is a backport to linux-3.18 of be06998f96ec has
 been found by the Xen CI auto-bisector to be responsible for a
 regression booting under Xen.

Jan Beulich writes ("Re: [linux-3.18 bisection] complete 
test-amd64-amd64-pair"):
> No, I'm not. I've said what I can say in a reply to an earlier bisection
> report (from Dec 12th), attached again here for reference.

I missed that.  Thanks.

Jan Beulich writes ("Re: [Xen-devel] [linux-3.18 bisection] complete 
test-amd64-amd64-pair"):
> On 12.12.18 at 22:41,  wrote:
> >   Bug is in tree:  linux 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >   Bug introduced:  7b8052e19304865477e03a0047062d977309a22f
> >   Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1
> >   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/ 
> > 
> > 
> >   commit 7b8052e19304865477e03a0047062d977309a22f
> >   Author: Jan Beulich 
> >   Date:   Mon Oct 19 04:23:29 2015 -0600
> >   
> >   igb: fix NULL derefs due to skipped SR-IOV enabling
> 
> _Very_ interesting. An over three years old commit was determined
> to cause whatever regression it is. But wait - that's the date of the
> mainline commit, not that of the backport (which was done a month
> ago). I notice that of the two original commits the combination of
> which the one here is supposed to fix, only one actually got
> backported. Hence I wonder whether backporting the one here
> was actually appropriate.

The mainline commit be06998f96ecb93938ad2cce46c4289bf7cf45bc from
October 2015 was backported as
7b8052e19304865477e03a0047062d977309a22f.  You need to look at the
commit date as well as the author date:

   commit 7b8052e19304865477e03a0047062d977309a22f
   Author: Jan Beulich 
   AuthorDate: Mon Oct 19 04:23:29 2015 -0600
   Commit: Greg Kroah-Hartman 
   CommitDate: Sat Nov 10 07:39:21 2018 -0800

CC'ing Greg K-H.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 2/5] xen: introduce SYMBOLS_SUBTRACT and SYMBOLS_COMPARE

2019-02-12 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v9 2/5] xen: introduce 
SYMBOLS_SUBTRACT and SYMBOLS_COMPARE"):
> On 12.02.19 at 15:47,  wrote:
> > I didn't see your proposed inline function, but don't think it can
> > work correctly because it won't be type-generic.  Ie, the requirement
> > is to use the same `SYMBOLS_DIFFERENCE(p, _foo_end)' for various
> > different `struct foo *p'.  p is perhaps of different types in the
> > different invocations and the return value needs to have been divided
> > by sizeof(*p).
> 
> Well, it wasn't a single inline function, but a macro defining an
> inline function suitable for a certain (start,end) tuple, at the
> same time giving both start and end suitable types which then
> also can't be used in the wrong context:
> https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00440.html

Ah.  Right.  Yes, that macro looks OK to me.

Although I really think it should use uintpr_t, not `unsigned long'.
IIRC there are special C language spec rules for conversion of
pointers to and from uintptr_t's.  (They are probably applied anyway
for all pointer-to-sufficiently-large-integer conversions but I would
prefer not to bet on that.)

> >  #define ALTER_PROVENANCE(value,provenance) ({
...
> I'm having trouble seeing how this would help with the issues
> at hand. Would you mind giving an example of how you'd see
> this used?

It may be a red herring.  But you could, perhaps, have your WHATEVER
macro do something like this:

  extern const type pfx##_end__wrong_provenance[];
  static const type *const pfx##_end =
 ALTER_PROVENANCE(pfx##_end__wrong_provenance, pfx##_start);

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 2/5] xen: introduce SYMBOLS_SUBTRACT and SYMBOLS_COMPARE

2019-02-12 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v9 2/5] xen: introduce 
SYMBOLS_SUBTRACT and SYMBOLS_COMPARE"):
> On 12.02.19 at 13:01,  wrote:
> > I would particularly welcome the opinion of hypervisor maintainers on
> > my type safety point, below.
> 
> I agree with the requirements you put forward; I think I'd
> prefer the inline function versions I had suggested (or
> something similar) over macros though, not the least because
> they come with "built-in" type safety, rather than grafted one
> (by adding "pseudo" comparisons).

I didn't see your proposed inline function, but don't think it can
work correctly because it won't be type-generic.  Ie, the requirement
is to use the same `SYMBOLS_DIFFERENCE(p, _foo_end)' for various
different `struct foo *p'.  p is perhaps of different types in the
different invocations and the return value needs to have been divided
by sizeof(*p).


I hate to suggest this now, but as an alternative it would be possible
to do this:

 /*
  *T *ALTER_PROVENANCE(T *value, T *provenance);
  *
  *Returns a pointer with the value of `value' but the provenance
  *of `provenance'; that is, which is considered by the compiler
  *to be part of same object as `provenance' and not (necessarily)
  *part of the same object as `value'.
  *
  *`value' MUST be within whatever the compiler thinks the size
  *of `provenance' is (if the compiler has any way to know the
  *size of `provenance').  That is, `value' must be one that can
  *legally be constructed by indexing within `provenance'.
  *
  *`value' and `provenance' are evaluated only once each.
  */
 #define ALTER_PROVENANCE(value,provenance) ({
  (void)( typeof((value))0  == (void*)0   );
  (void)( typeof((provenance))0 == (void*)0   );
  (void)( typeof((provenance))0 == typeof((value))0   );
  typeof((provenance)) const alter_provenance__copy = (provenance);
  (typeof((provenance)))(
 (char*)(alter_provenance__copy) +
 ( (uintptr_t)(char*)(value) -
   (uintptr_t)(char*)(alter_provenance__copy) )
  )
 })

(\ omitted for clarity).

But you would still want _foo_end to have a different type to
_foo_start so you would have to wrap ALTER_PROVENANCE in another
macro.


> > Secondly, I find the argument ordering extremely confusing.  With your
> > macros DIFFERENCE(start,end) is negative but COMPARE(start,end) is
> > positive.  I suggest that you call the macro DIFFERENCE and have
> > DIFFERENCE(start,end) be positive.
> 
> Indeed having to put end before start in either macro invocation
> is prone to be got wrong. In the common case this will be noticed
> quickly, but even then it's likely one extra compile and test run to
> notice that there's something wrong.
> 
> However, I realize this is to keep use sites look more like
> "end - start", which has its merits as well.

Yes.  The problem is that with the `-' disappearing, the need to swap
the arguments (as is needed with `-') is much less evident.


> Furthermore - do we really need both a subtract and a compare
> construct? The result subtract produces can be used for
> comparison purposes as well, after all (just like all CPUs I know
> details of implement [integer] compare as a subtract discarding
> its numeric result, instead [or only] updating certain status flags).

I think we only need one macro construct to deal with comparisons with
_foo_end.

The other cases seem very few and don't fall into a convenient pattern
anyway.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 1/5] xen: introduce ptrdiff_t, fix uintptr_t

2019-02-12 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v9 1/5] xen: introduce ptrdiff_t, fix 
uintptr_t"):
> On 12.02.19 at 02:13,  wrote:
> > -typedef unsigned int __attribute__((__mode__(__pointer__))) uintptr_t;
> > +typedef unsigned long __attribute__((__mode__(__pointer__))) uintptr_t;
> 
> I don't understand this change: The mode attribute discards any width
> association derived from the specified base type.

I looked for the reference documentation for the semantics of
__attribute__((__mode__(__pointer__))).  I found this in my GCC 6
manual:

  'mode (MODE)'
   This attribute specifies the data type for the
   declaration--whichever type corresponds to the mode MODE.  This in
   effect lets you request an integer or floating-point type according
   to its width.

   You may also specify a mode of 'byte' or '__byte__' to indicate the
   mode corresponding to a one-byte integer, 'word' or '__word__' for
   the mode of a one-word integer, and 'pointer' or '__pointer__' for
   the mode used to represent pointers.

This wording is extremely obscure, especially when read in the context
of C's insane pointer provenance rules and compiler authors' insane
interpretations of those rules.

If our goal is to avoid malicious compiler optimisation nonsense we
should not use a feature like this whose semantics might be taken to
imply that the things are really pointers.


> > +typedef long ptrdiff_t;
> 
> FTR I'm unconvinced of this, or the need to use uintptr_t in the first
> place in the macros you introduce: The entire UNIX world implies
> sizeof(long) == sizeof(void*) afaik.

Please provide a reference for this assertion about sizeof(long).

Looking at http://www.unix.org/whitepapers/64bit.html LLP64 (32-bit
long, 64-bit long long, 64-bit void*) was at least contemplated in
some circles, although I haven't been able to find any Unix-like
examples besides Minix, which runs on machines with 16-bit pointers
(and its longs have to be >=32-bit).

> But if we really want to have ptrdiff_t, then I think we should either
> follow the uintptr_t model and use attribute((mode())), or we should
> leverage the compiler's __PTRDIFF_TYPE__ (and then also
> __UINTPTR_TYPE__ for uintptr_t, at least if available - not sure what
> its availability depends on, but it's conditional in gcc's
> c_stddef_cpp_builtins()).

It is not unusual for porting something like Xen to a new architecture
to involve writing a short header file with these kind of type
definitions.  I don't know why we couldn't take that approach.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)

2019-02-12 Thread Ian Jackson
Ian Jackson writes ("Re: [Xen-devel] arm64, laxton[01] (was Re: 
[xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)"):
> And that part works.  It runs through d-i and thinks it has succeeded.
> But then when the host reboots it reboots into 3.16, not the backports
> kernel.
> 
> Looking at the installer log:
> 
>   2019-02-07 15:18:34 Z 172.16.144.52:39843 <13>Feb  7 15:18:34
>   base-installer: info: kernel linux-image-arm64 usable on arm64 
> 
> which is kind of weird.
> 
> > So I think we can rule out a firmware bug.  Now, I am struggling to
> > understand why osstest is suddenly using 3.16.
> > Was there any change in osstest or the configuration?
> 
> No.  I think something may be wrong with the way Debian is publishing
> the backports kernels.  I will ask...

Thanks, Julien, you have found enough for me to identify the problem.

According to snapshot.debian.org, at 2019-02-06T21:13:14, the file
  
http://ftp.debian.org/debian/dists/jessie-backports/main/binary-arm64/Packages.xz
mentioned linux-base version 4.3~bpo8+1.  Now, it doesn't mention
it at all.

This is presumably part of cruft removal, since jessie-security does
contain an even newer version, 4.5~deb8u1.

linux-image-4.9.* require newer linux-support than is available in
plain jessie.  As a result of all this, the installer removes the
4.9.x kernel because its dependencies can't be satisfied.  The machine
then tries to reboot into some much older kernel.

The osstest installer system does not use jessie-security and I think
there was some reason for this which I have forgotten right now.
Maybe the security repositories were awkward to cache - but nowadays
we have a different approach to this caching.

I will investigate putting -security back.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)

2019-02-12 Thread Ian Jackson
Julien Grall writes ("Re: [Xen-devel] arm64, laxton[01] (was Re: 
[xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)"):
> On Tue, Feb 12, 2019 at 12:26 PM Ian Jackson  wrote:
> > I don't have a good plan about what to do next.  I guess one thing we
> > could do would be to ask Yogesh to reflash the firmware on laxton0 and
> > see if that helps.
> >
> > I think this issue is likely to be a protracted problem.  Any helpful
> > suggestions from people with more experience of these particular boxes
> > would be very welcome...
> 
> Looking at the first failure log [1], it seems we are trying to boot
> Linux 3.16.  So the failure seems legit to me as I don't think 3.16
> supports Seattle.

This is odd.  You have definitely spotted something wrong.

In
  
http://logs.test-lab.xenproject.org/osstest/logs/132973/test-arm64-arm64-xl/info.html
we see that the failing step started at 15:13:32.  In the log
  > [1] 
http://logs.test-lab.xenproject.org/osstest/logs/132973/test-arm64-arm64-xl/serial-laxton0.log
I see this:

  Feb 7 15:14:49.374771 [ 0.00] Linux version 4.9.0-0.bpo.2-arm64
  (debian-ker...@lists.debian.org) (gcc version 4.9.2 (Debian/Linaro
  4.9.2-10) ) #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10)

And that part works.  It runs through d-i and thinks it has succeeded.
But then when the host reboots it reboots into 3.16, not the backports
kernel.

Looking at the installer log:

  2019-02-07 15:18:34 Z 172.16.144.52:39843 <13>Feb  7 15:18:34
  base-installer: info: kernel linux-image-arm64 usable on arm64 

which is kind of weird.

> So I think we can rule out a firmware bug.  Now, I am struggling to
> understand why osstest is suddenly using 3.16.
> Was there any change in osstest or the configuration?

No.  I think something may be wrong with the way Debian is publishing
the backports kernels.  I will ask...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL

2019-02-12 Thread Ian Jackson
Stefano Stabellini writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce 
SYMBOL"):
> On Thu, 7 Feb 2019, Ian Jackson wrote:
> > FAOD, I think you should expect people to declare the linker symbols
> > either as I suggested:
> > 
> >  extern const struct wombat _wombats_start[];
> >  extern const struct abstract_symbol _wombats_end[];
> > 
> > (or along the lines of Jan's suggestion, but frankly I think that is
> > going to be too hard to sort out now.)
> 
> Yes, they are already declared this way, I would prefer to avoid
> changing the declaration as part of this series.

I'm not sure why you didn't CC me on your revised version(s) ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 2/5] xen: introduce SYMBOLS_SUBTRACT and SYMBOLS_COMPARE

2019-02-12 Thread Ian Jackson
I would particularly welcome the opinion of hypervisor maintainers on
my type safety point, below.


Stefano Stabellini writes ("[Xen-devel] [PATCH v9 2/5] xen: introduce 
SYMBOLS_SUBTRACT and SYMBOLS_COMPARE"):
> +/*
> + * Calculate (end - start), where start and/or end are linker symbols,
> + * returning a ptrdiff_t. The size is in units of start's referent.
> + */
> +#define SYMBOLS_SUBTRACT(end, start) ({  
>  \
> +(ptrdiff_t)((uintptr_t)(end) - (uintptr_t)(start)) / sizeof(*start); 
>  \
> +})

I'm afraid I have several problems with the details of these macros.

Firstly, I really don't like the lack a type check, which was present
in my proposal.  For example, given:
 extern struct wombat _foos_start[];
 extern char _end[];
and your macro definition, the two expressions
  SYMBOLS_SUBTRACT(_foos_start, _end)
 -SYMBOLS_SUBTRACT(_end, _foos_start)
produce different values because one is divided by sizeof(struct
wombat) and the other by sizeof(char).  This is hardly desirable.


Secondly, I find the argument ordering extremely confusing.  With your
macros DIFFERENCE(start,end) is negative but COMPARE(start,end) is
positive.  I suggest that you call the macro DIFFERENCE and have
DIFFERENCE(start,end) be positive.


Thirdly, in an earlier exchange:

> On Thu, 7 Feb 2019, Ian Jackson wrote:
> > FAOD, I think you should expect people to declare the linker symbols
> > either as I suggested:
> > 
> >  extern const struct wombat _wombats_start[];
> >  extern const struct abstract_symbol _wombats_end[];
> > 
> > (or along the lines of Jan's suggestion, but frankly I think that is
> > going to be too hard to sort out now.)
> 
> Yes, they are already declared this way, I would prefer to avoid
> changing the declaration as part of this series.

Are you sure things are not currently declared like this
 extern const struct wombat _wombats_start[];
 extern const struct wombat _wombats_end[];
?  Looking at your v9 series, that seems to be the case.

I think using the `abstract_symbol' version (or Jan's refinement of
it) is important because it arranges that accidental raw pointer
comparisons, which are UB, are a type error.

IOW this approach moves and centralises the burden of remembering (on
pain of miscompilation) that something is special about these
pointers, to the point where the linker symbols are introduced into
the C environment.  That is one place, and one where it is easier to
remember because something odd is already going on.

Unless you want to propose some other approach which will reliably
find these kind of linker symbol pointer comparison rule violations
and ensure that they are not deployed in production code ?


This type difference is why my proposal had two macros.

If both start and end have the same type, only one macro is necessary.
It should be one which checks the types are identical and returns a
signed value in units of the type.  That can be used for comparisons.
If desired, we could enhance the macro so that the compiler can prove
that the division can be removed when the result is compared for
inequality with 0.

But even with two types, it may be that there is only one macro needed
because the vast majority of use sites are formulaic.  You said
earlier:

> Most of the call sites only do things like (_end - _start) or (p >
> _end). I wanted to bring up cases that are not trivial.

When designing a general scheme for macros like this it is best to
consider the usual case and make it straightforward to use, and
bulletproof.  Ie presenting unusual cases as your examples is not
helpful to the design process for a macro.

IMO the situation you describe in the snipped I quote is what the
purpose of SYMBOL_DIFFERENCE is.  For the two examples you give,
always one of the arguments is _end.  So if we make the type of _end
be struct abstract_symbol[], SYMBOL_DIFFERENCE can (i) check that _end
is of that type (ii) return an answer in units of its other (first)
argument.

For pointers derived from _start, it is actually straightforwardly
legal to compare them with _start, or subtract _start from them.  So
no macrology is needed in that case.


Stepping back a bit, it is indeed the second symbol referrring to the
same memory object that triggers the compiler bug: the compiler
wrongly assumes that two extern declarations must refer to two
different objects.  Making the two declarations have different types
will simply prevent *all* triggers for this bug, because raw
comparisons or arithmetic between differently typed pointers is a
compile error.  Then all that is needed is to embed the correct usage
pattern into a macro (or, a sufficiently general set of correct usage
patterns that ad-hoc approaches are rare).


You also write:

> We have a couple of cases where we are "punning wildly between pointers
> a

[Xen-devel] [OSSTEST PATCH 3/4] backports snapshot: Disable apt timestamp checking (sometimes)

2019-02-13 Thread Ian Jackson
In jessie and earlier, this has to be done with a global option.

In later releases, it can be done by putting some options in [ ]
in the relevant sources list entry.

Signed-off-by: Ian Jackson 
---
 Osstest/Debian.pm | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 5e74e86e..c1e75020 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -951,6 +951,19 @@ sub preseed_backports_packages (@) {
 
my $apt_insert='';
my $extra_rune='';
+   if ($suite =~ m/wheezy|jessie/) {
+   # this has global effect, unfortunately
+   $extra_rune = <\$d/50osstestsnapshot <https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 4/4] backports snapshot: Use 20190206T211314Z for jessie-backports

2019-02-13 Thread Ian Jackson
Some time on 2019-02-07, Debian removed linux-base from
jessie-backports.  This caused everything to break: apt wasn't happy
to get linux-base from jessie-security (because of our -t
jessie-backports, probably) and that meant there was no linux-base
suitable for linux-image-4.9.x on arm64.  We ended up trying to
boot the installed system with 3.16, which does not work on our two
SoftIron arm64 test boxes.

Also, jessie-backports about to be completely removed.

Signed-off-by: Ian Jackson 
---
 production-config   | 2 ++
 production-config-cambridge | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/production-config b/production-config
index 6b743d4f..59c74cca 100644
--- a/production-config
+++ b/production-config
@@ -91,6 +91,8 @@ TftpNetbootGroup osstest
 TftpDiVersion_wheezy 2016-06-08
 TftpDiVersion_jessie 2018-06-26
 
+DebianSnapshotBackports_jessie 
http://snapshot.debian.org/archive/debian/20190206T211314Z/
+
 # For ISO installs
 DebianImageVersion_wheezy 7.2.0
 DebianImageVersion_jessie 8.2.0
diff --git a/production-config-cambridge b/production-config-cambridge
index 8e2eadd2..ce6239bd 100644
--- a/production-config-cambridge
+++ b/production-config-cambridge
@@ -78,6 +78,8 @@ TftpNetbootGroup osstest
 TftpDiVersion_wheezy 2016-06-08
 TftpDiVersion_jessie 2018-06-26
 
+DebianSnapshotBackports_jessie 
http://snapshot.debian.org/archive/debian/20190206T211314Z/
+
 # For ISO installs
 DebianImageVersion_wheezy 7.2.0
 DebianImageVersion_jessie 8.2.0
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 1/4] backports snapshot: Honour DebianSnapshotBackports_ config var

2019-02-13 Thread Ian Jackson
If this is set, use it instead of the usual DebianMirrorHost and
Subpath.  No functional change with configs that don't set it.

This is not sufficient to work right yet, because snapshots
repositories have out-of-date signatures...

Signed-off-by: Ian Jackson 
---
 Osstest/Debian.pm | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index c6858253..ee7e03cf 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -946,6 +946,9 @@ sub preseed_backports_packages (@) {
 my ($ho, $sfx, $xopts, $suite, @pkgs) = @_;
 
 if (! $xopts->{BackportsSourcesAlreadyAdded}++) {
+   my $bp_url = $c{"DebianSnapshotBackports_$suite"};
+   $bp_url ||= "http://$c{DebianMirrorHost}/$c{DebianMirrorSubpath};;
+
preseed_hook_command($ho, 'late_command', $sfx, <>/target/etc/apt/sources.list
 
 # $suite backports
-deb http://$c{DebianMirrorHost}/$c{DebianMirrorSubpath} $suite-backports main
+deb $bp_url $suite-backports main
 EOF
 in-target apt-get update
 END
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 2/4] backports snapshot: Provide for $apt_insert and $extra_rune

2019-02-13 Thread Ian Jackson
No functional change.

Signed-off-by: Ian Jackson 
---
 Osstest/Debian.pm | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index ee7e03cf..5e74e86e 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -949,6 +949,8 @@ sub preseed_backports_packages (@) {
my $bp_url = $c{"DebianSnapshotBackports_$suite"};
$bp_url ||= "http://$c{DebianMirrorHost}/$c{DebianMirrorSubpath};;
 
+   my $apt_insert='';
+   my $extra_rune='';
preseed_hook_command($ho, 'late_command', $sfx, <>/target/etc/apt/sources.list
 
 # $suite backports
-deb $bp_url $suite-backports main
+deb $apt_insert $bp_url $suite-backports main
 EOF
+$extra_rune
 in-target apt-get update
 END
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 0/4] Use snapshot for jessie-backports

2019-02-13 Thread Ian Jackson
Ian Jackson writes ("[OSSTEST PATCH 0/4] Use snapshot for jessie-backports"):
> Things are being removed from jessie-backports, which we rely on for
> arm64 tests.  Switch to using snapshot.debian.org.
> 
> In my moderately-formal test, this has got as far as booting one of
> the machines into Xen.  It's currently doing the guest install, which
> I expect to be fine.
...
> Accordingly, I have convinced myself that this code will only affect
> arm64 in the Xen Project CI lab.

I have pushed this to osstest pretest.

Juergen, I wasn't sure if this met the formal conditions for putting
your release ack on it, but I felt you would want me to go ahead.

If you disagree, I can rewind pretest and kill the self-test flight,
putting us back in the same situation (although having maybe wasted
some machine effort).

BTW, in my semiformal test this has got as far as the guest repeat
stop/start test so it is looking good.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 0/4] Use snapshot for jessie-backports

2019-02-13 Thread Ian Jackson
Things are being removed from jessie-backports, which we rely on for
arm64 tests.  Switch to using snapshot.debian.org.

In my moderately-formal test, this has got as far as booting one of
the machines into Xen.  It's currently doing the guest install, which
I expect to be fine.


Impact on the Xen Project Massachusetts CI, and the Xen release:

This enables the new code in preseed_backports_packages when $suite is
jessie.  preseed_backports_packages has three call sites; two in
preseed_create_guest but only where $suite is wheezy or stretch, and
one in preseed_create in a call to di_special_kernel.

di_special_kernel looks for host flags like `need-kernel-deb-SUITE'
(and a compat case for wheezy).  In the Massachusetts osstest
instance, such flags are need-kernel-deb-jessie-backports (for the
laxtons) and need-kernel-deb-jessie-special (for a handful of x86
boxes).  But the call to preseed_backports_packages only triggers if
$kp eq 'backports', so only for the laxtons.

Accordingly, I have convinced myself that this code will only affect
arm64 in the Xen Project CI lab.


Ian Jackson (4):
  backports snapshot: Honour DebianSnapshotBackports_ config var
  backports snapshot: Provide for $apt_insert and $extra_rune
  backports snapshot: Disable apt timestamp checking (sometimes)
  backports snapshot: Use 20190206T211314Z for jessie-backports

 Osstest/Debian.pm | 21 -
 production-config |  2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 5/4] backports snapshot: Disable apt timestamp checking in right place

2019-02-14 Thread Ian Jackson
We need to put this in /target or it does not take effect.

Yesterday's 4 patches worked yesterday but not today, because the
snapshot in question in fact expired in between.  With this additional
patch they work today too.

Signed-off-by: Ian Jackson 
---
 Osstest/Debian.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index c1e75020..17dfef54 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -954,7 +954,7 @@ sub preseed_backports_packages (@) {
if ($suite =~ m/wheezy|jessie/) {
# this has global effect, unfortunately
$extra_rune = <\$d/50osstestsnapshot <https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 3/4] backports snapshot: Disable apt timestamp checking (sometimes)

2019-02-14 Thread Ian Jackson
Ian Jackson writes ("[OSSTEST PATCH 3/4] backports snapshot: Disable apt 
timestamp checking (sometimes)"):
> In jessie and earlier, this has to be done with a global option.
> 
> In later releases, it can be done by putting some options in [ ]
> in the relevant sources list entry.

I discover that:

 1. This was not effective.
 2. It worked in my test yesterday because 20190206T211314Z
was less than 14 days ago.  It does not work today.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)

2019-02-12 Thread Ian Jackson
Ian Jackson writes ("arm64, laxton[01] (was Re: [Xen-devel] [xen-unstable-smoke 
test] 133030: trouble: blocked/broken/pass)"):
> I have checked and there was no update to the osstest code.  I don't
> think there have been updates to the infrastructure config but I
> haven't searched all the infrastructure boxes etckeepers etc.

I have now checked the etckeeper and dpkg logs on the osstest (test
controller, tftp server) and infra (dhcp and dns server etc.) VMs and
there were no changes that could have caused this.

Credativ have checked the boot files being used against the backups
and they are file.

> So something has caused both machines to fail simultaneously.
> 
> Possible explanations to me seem to be:
> 
>   * Some kind of common physical cause (power surge corrupting the
> firmware or something)
> 
>   * Some kind of common nonphysical cause external to the hosts or the
> tests: bad files on the infrastructure hosts; a change to the
> behaviour of the bootp/tftp servers; a new kind of broadcast
> network packet (perhaps from other tests) which causes the laxton
> firmware to malfunction; etc.
> 
>   * The new linux-4.9 kernel does something which has the effect of
> often (but not always) corrupting the laxtons' firmware.
> 
>   * A firmware bug triggered by the passage of time (eg
> clock-dependent)
> 
>   * Misunderstanding by me in my analysis of what ingredients are used
> by the host installation failures.
> 
>   * Some other common-mode failure that I haven't thought of.
> 
> In the meantime I have unblessed the laxtons to avoid osstest
> repeatedly power cycling them to try to get them to work.  FTR this
> will still not allow the push gate to pass.

I don't have a good plan about what to do next.  I guess one thing we
could do would be to ask Yogesh to reflash the firmware on laxton0 and
see if that helps.

I think this issue is likely to be a protracted problem.  Any helpful
suggestions from people with more experience of these particular boxes
would be very welcome...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair

2019-02-12 Thread Ian Jackson
Jan, are you investigating this regression ?

osstest service owner writes ("[linux-3.18 bisection] complete 
test-amd64-amd64-pair"):
> branch xen-unstable
> xenbranch xen-unstable
> job test-amd64-amd64-pair
> testid xen-boot/dst_host
> 
> Tree: linux 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> Tree: xen git://xenbits.xen.org/xen.git
> 
> *** Found and reproduced problem changeset ***
> 
>   Bug is in tree:  linux 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>   Bug introduced:  7b8052e19304865477e03a0047062d977309a22f
>   Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1
>   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/
> 
> 
>   commit 7b8052e19304865477e03a0047062d977309a22f
>   Author: Jan Beulich 
>   Date:   Mon Oct 19 04:23:29 2015 -0600
>   
>   igb: fix NULL derefs due to skipped SR-IOV enabling
>   
>   [ Upstream commit be06998f96ecb93938ad2cce46c4289bf7cf45bc ]
>   
>   The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
>   during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
>   right number of queues") causes VFs no longer getting set up, leading
>   to NULL pointer dereferences due to the adapter's ->vf_data being NULL
>   while ->vfs_allocated_count is non-zero. The first commit not only
>   neglected the side effect of igb_sriov_reinit() that the second commit
>   tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
>   without which igb_enable_sriov() is effectively a no-op. Calling
>   igb_{,re}set_interrupt_capability() as done here seems to address this,
>   but I'm not sure whether this is better than sinply reverting the other
>   two commits.
>   
>   Signed-off-by: Jan Beulich 
>   Tested-by: Aaron Brown 
>   Signed-off-by: Jeff Kirsher 
>   Signed-off-by: Sasha Levin 
> 
> 
> For bisection revision-tuple graph see:
>
> http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-3.18/test-amd64-amd64-pair.xen-boot--dst_host.html
> Revision IDs in each graph node refer, respectively, to the Trees above.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]

2019-02-15 Thread Ian Jackson
Lars Kurth writes ("Re: [OSSTEST PATCH] README.hardware-acquisition [and 1 more 
messages]"):
> as mentioned earlier, I think we need a legal/admin item for hardware and 
> donations. See proposed addition below.
...
> LEGAL
> =

I would prefer not to bake the Linux Foundation address into this
document, nor to have it be the primary repository for legal details,
if that's OK.

How about this:

  LEGAL - XEN PROJECT
  ===

  The Xen Project Test Lab runs public tests, and provides access to its
  hardware to Xen community members so that they can debug the code.
  Accordingly there must not be any restrictions on publication of
  information such as test results, nor any restrictions on access to
  the hardware.  Ie, the Xen Project is not able to agree to any
  nondisclosure agreement (NDA).

  The Xen Project Test Lab does not accept hardware on loan.  We
  purchase it, or, following prior consultation and approval, can accept
  donations.  For donations, the Linux Foundation requires a letter
  confirming some legal details.  The Community Manager will provide
  a list of specific requirements for that `donation form' letter.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]

2019-02-15 Thread Ian Jackson
Lars Kurth writes ("Re: [OSSTEST PATCH] README.hardware-acquisition [and 1 more 
messages]"):
> This is fine by me

Thanks, pushed now.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 5/4] backports snapshot: Disable apt timestamp checking in right place

2019-02-15 Thread Ian Jackson
Juergen Gross writes ("Re: [Xen-devel] [OSSTEST PATCH 5/4] backports snapshot: 
Disable apt timestamp checking in right place"):
> Release-acked-by: Juergen Gross 
> 
> in case my implicit ack from yesterday expired as well ;-)

Thanks :-).  I pushed it without waiting.  It is now in production (as
of 07mumble UTC this morning).  I think we will get a push of
xen-unstable-smoke shortly.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/libxendevicemodel: add xendevicemodel_modified_memory_bulk to map

2019-02-15 Thread Ian Jackson
Paul Durrant writes ("RE: [PATCH] tools/libxendevicemodel: add 
xendevicemodel_modified_memory_bulk to map"):
> Cc-ing Juergen, in case this can make 4.12...

I think this should go into 4.12.  It is very low risk.

In another case there would possibly be questions about whether we
wanted to add an additional ABI/API stability promise, but in this
case this is not an issue.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/libxendevicemodel: add xendevicemodel_modified_memory_bulk to map

2019-02-15 Thread Ian Jackson
Paul Durrant writes ("RE: [PATCH] tools/libxendevicemodel: add 
xendevicemodel_modified_memory_bulk to map"):
> Perhaps it would be possible to build some sort of cross-check that all 
> functions listed in the public header of a library are actually *somewhere* 
> in the map file, so this doesn't happen in future.

I wouldn't stand in the way of that, but presumably functions are
usually added because someone wants to call them so it would often be
spotted anyway, and the consequences of a mistake here are fairly
minor.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]

2019-02-15 Thread Ian Jackson
Ian Jackson writes ("Re: [OSSTEST PATCH] README.hardware-acquisition [and 1 
more messages]"):
> So overall, for the reasons I explain, I'm going to commit this
> document (subject to the other comments etc.) *with* the requirement
> that hardware must be supported by Debian (at least, in -backports).

This didn't happen.  THere was considerable further discussion.  The
fact that various kinds of uncertainty meant this document didn't get
committed is now blocking us giving the go-ahead for some new hardware
acquisition:

Ie, I can't answer the question "should we accept hardware XYZ"
without reference to at least an implied a checklist like this.
Having written it down I ought to use the one I've written down,
because to do otherwise is simply to pointlessly invite mistakes.  And
if I'm to use a written-down checklist it should be one which is
actually official.

Accordingly, I intend to commit this to osstest now.  Juergen, this is
just a document: can I have your release ack for it ?

I will then reply separately about the specific new hardware, using
the checklist as a guide.  Obviously a checklist is always a
guidelines document: if we find that a point is best answered a
different way than the checklist expects, or that the checklist ought
to be changed, then changes to the checklist are a reasonable part of
the outcome of such a process; that would be in the form of further
patches to this document in osstest.

Ian.

From fae48bd584a0b58934a2df97b6db1d06eacf1724 Mon Sep 17 00:00:00 2001
From: Ian Jackson 
Date: Tue, 30 Oct 2018 16:12:27 +
Subject: [OSSTEST PATCH] README.hardware-acquisition

New document-cum-checklist, for helping with hardware procurement.

Signed-off-by: Ian Jackson 
CC: in...@xenproject.org
CC: George Dunlap 
CC: Stefano Stabellini 
CC: Julien Grall https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL

2019-02-07 Thread Ian Jackson
Stefano Stabellini writes ("Re: [Xen-devel] [PATCH v6 1/4] xen: introduce 
SYMBOL"):
> I am OK with this approach. Maybe not the best IMO, but good enough. It
> should also satisfy the MISRAC guys, as they wrote "ideally cast to
> uintptr_t only once": here we wouldn't be casting only once, but at
> least we would do it inside a single well-defined macro.

Right.  I think it meets the goals of MISRA-C, probably better than
most other approaches.

FAOD, I think you should expect people to declare the linker symbols
either as I suggested:

 extern const struct wombat _wombats_start[];
 extern const struct abstract_symbol _wombats_end[];

(or along the lines of Jan's suggestion, but frankly I think that is
going to be too hard to sort out now.)

> +/*
> + * Performs x - y, returns the original pointer type. To be used when
> + * either x or y or both are linker symbols.
> + */
> +#define SYMBOLS_SUBTRACT(x, y) ({
>  \
> +__typeof__(*(y)) *ptr_;  
>  \
> +ptr_ = (typeof(ptr_)) (((uintptr_t)(x) - (uintptr_t)(y)) / 
> sizeof(*(y))); \
> +ptr_;
>  \
> +})

This is type-incoherent.  The difference between two pointers is a
scalar, not another pointer.  Also "the original pointer type" is
ambiguous.  It should refer explicitly to y.  IMO this function should
contain a typecheck which assures that x is of the right type.

How about something like this:

  /*
   * Calculate (end - start), where start and end are linker symbols,
   * giving a ptrdiff_t.  The size is in units of start's referent.
   * end must be a `struct abstract_symbol*'.
   */
  #define SYMBOLS_ARRAY_LEN(start,end) ({
 ((end) == (struct abstract_symbol*)0);   
 (ptrdiff_t)((uintptr_t)(end) - (uintptr_t)(start)) / sizeof(*start);
  })

  /*
   * Given two pointers A,B of arbitrary types, gives the difference
   * B-A in bytes.  Can be used for comparisons:
   *   If AB, gives a positive number
   * Legal even if the pointers are to different objects.
   */
  #define POINTER_CMP(a,b) ({
 ((a) == (void*)0);
 ((b) == (void*)0);
 (ptrdiff_t)((uintptr_t)(end) - (uintptr_t)(start));
  })

The application of these two your two examples is complex because your
examples seem wrong to me.

> +/*
> + * Performs x - y, returns uintptr_t. To be used when either x or y or

This is wrong.  Comparisons should give a signed output.

> + * both are linker symbols.

In neither of your example below are the things in question linker
symbols so your examples violate your own preconditions...


> Examples:
> 
> +new_ptr = SYMBOLS_SUBTRACT(func->old_addr, _start) + vmap_of_xen_text;

This is punning wildly between pointers and integers.  I infer that
old_addr is a pointer of some kind and vmap_of_xen_text is an integer.
I also infer that sizeof(*old_addr) is 1 because otherwise you
multiply vmap_of_xen_text by the size which is clearly entirely wrong.
Ie this code is just entirely wrong.

This is presumably some kind of relocation.  I don't think it makes
much sense to macro this.  Instead, it is better to make
vmap_of_xen_text a pointer and do this:

  +/* Relocation.  We need to calculate the offset of the address
  + * from _start, and apply that to our own map, to find where we
  + * have this mapped.  Doing these kind of games directly with
  + * pointers is contrary to the C rules for what pointers may be
  + * compared and computed.  So we do the offset calculation with
  + * integers, which is always legal.  The subsequent addition of
  + * the offset to the vmap_of_xen_text pointer is legal because
  + * the computed pointer is indeed a valid part of the object
  + * referred to by vmap_of_xen_text - namely, the byte array
  + * of our mapping of the Xen text. */
  +new_ptr = ((uintptr_t)func->old_addr - (uintptr_t)_start) + 
vmap_of_xen_text;

Note that, unfortunately, any deviation from completely normal pointer
handling *must* be accompanied by this kind of a proof, to explain why
it is OK.

> and:
> 
> +for ( alt = region->begin;
> +  SYMBOLS_COMPARE(alt, region->end) < 0;
> +  alt++ )

region->begin and ->end aren't linker symbols, are they ?  So the
wrong assumption by the compiler (which is at the root of this thread)
that different linker symbols are necessarily different objects
(resulting from the need to declare them in C as if they were) does
not arise.  I think you mean maybe something like _region_start and
_region_end.  So with my proposed macro:

> We could also define a third macro such as:
>   #define SYMBOLS_SUBTRACT_INT(x, y)  SYMBOLS_COMPARE((x), (y))
> because we have many places where we need the result of SYMBOLS_SUBTRACT
> converted to an integer type. For instance:
>   paddr_t xen_size = (uintptr_t)SYMBOLS_SUBTRAC(_end, _start);

This need arises 

[Xen-devel] [PATCH 2/3] tools: init scripts: xencommons: Fixes to Description

2019-02-07 Thread Ian Jackson
`neeeded' is a typo.  And xend is long gone.

No functional change.

Signed-off-by: Ian Jackson 
---
 tools/hotplug/Linux/init.d/xencommons.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/hotplug/Linux/init.d/xencommons.in 
b/tools/hotplug/Linux/init.d/xencommons.in
index 581b02c27a..aa62e4c92f 100644
--- a/tools/hotplug/Linux/init.d/xencommons.in
+++ b/tools/hotplug/Linux/init.d/xencommons.in
@@ -15,7 +15,7 @@
 # Default-Start: 2 3 5
 # Default-Stop:  0 1 6
 # Short-Description: Start/stop xenstored and xenconsoled
-# Description:   Starts and stops the daemons neeeded for xl/xend
+# Description:   Starts and stops the daemons needed for xl
 ### END INIT INFO
 
 BACKEND_MODULES="@LINUX_BACKEND_MODULES@"
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/3] tools: init scripts: xencommons: Provides `xen'

2019-02-07 Thread Ian Jackson
It is useful to have a single `xen' facility (in the LSB Provides
namespace).  That allows other facilities to specify that they should
go after `xen' without needing to know the implementation details.

This service name is already Provide'd by the (fairly different) init
scripts used in Debian.

Signed-off-by: Ian Jackson 
---
 tools/hotplug/Linux/init.d/xencommons.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/hotplug/Linux/init.d/xencommons.in 
b/tools/hotplug/Linux/init.d/xencommons.in
index a33058ed44..581b02c27a 100644
--- a/tools/hotplug/Linux/init.d/xencommons.in
+++ b/tools/hotplug/Linux/init.d/xencommons.in
@@ -7,7 +7,7 @@
 # chkconfig: 2345 70 10
 # description: Starts and stops xenstored and xenconsoled
 ### BEGIN INIT INFO
-# Provides:  xenstored xenconsoled
+# Provides:  xenstored xenconsoled xen
 # Required-Start:$syslog $remote_fs
 # Should-Start:
 # Required-Stop: $syslog $remote_fs
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/3] tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700

2019-02-07 Thread Ian Jackson
These directories ought not to be even world-readable.  If this script
for some reason runs with a lax umask they might be created
overly-writeable.  Avoid any such bug by setting the mode explicitly.

Signed-off-by: Ian Jackson 
---
 tools/hotplug/Linux/init.d/xencommons.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/hotplug/Linux/init.d/xencommons.in 
b/tools/hotplug/Linux/init.d/xencommons.in
index aa62e4c92f..7fd6903b98 100644
--- a/tools/hotplug/Linux/init.d/xencommons.in
+++ b/tools/hotplug/Linux/init.d/xencommons.in
@@ -56,8 +56,8 @@ do_start () {
 
for mod in $BACKEND_MODULES ; do modprobe "$mod" &>/dev/null ; done
 
-   mkdir -p ${XEN_RUN_DIR}
-   mkdir -p ${XEN_LOCK_DIR}
+   mkdir -m700 -p ${XEN_RUN_DIR}
+   mkdir -m700 -p ${XEN_LOCK_DIR}
mkdir -p ${XEN_LOG_DIR}
 
@XEN_SCRIPT_DIR@/launch-xenstore || exit 1
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 0/3] tools: Init scripts minor fixes

2019-02-07 Thread Ian Jackson
The Debian Xen maintainers had a review of the init scripts.  Debian
has its completely own set of scripts.  This is something that needs
to be tackled in the medium term but not now.

However, we did identify some things which could be remedied upstream
right away.

Ian Jackson (3):
  tools: init scripts: xencommons: Provides `xen'
  tools: init scripts: xencommons: Fixes to Description
  tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700

 tools/hotplug/Linux/init.d/xencommons.in | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] arm64, laxton[01] (was Re: [xen-unstable-smoke test] 133030: trouble: blocked/broken/pass)

2019-02-11 Thread Ian Jackson
On Fri, Feb 08, 2019 at 01:14:16PM +, Wei Liu wrote:
> On Fri, Feb 08, 2019 at 05:21:44AM +, osstest service owner wrote:
> > flight 133030 xen-unstable-smoke real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/133030/
> > 
> > Failures and problems with tests :-(

Thanks for looking at this.

> After some investigation, I think something is wrong with the linux-4.9
> branch.
> 
> The issue to hand is:
> 
> Feb  8 04:12:54.790904 
> Loading initial ramdisk ...
> Feb  8 04:12:55.114864 
> EFI stub: Booting Linux Kernel...
> Feb  8 04:12:55.354885 EFI stub: ERROR: Failed to alloc kernel memory
> Feb  8 04:12:55.354946 EFI stub: ERROR: Failed to relocate kernel
> Feb  8 04:12:55.354993 Feb  8 04:12:55.355016 
>   Failed to boot both default and fallback entries.
> 
> The new 4.9 kernel can't be loaded _natively_ anymore.

This is not using our own-built kernel.

It is using (our copy of) the jessie arm64 debian-installer kernel
which has not changed since June.  I guess it is just about possible
that the file in /home/tftp has been corrupted somehow.  I have asked
Credativ to check them against our backups.

Looking at
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton0.html
  http://logs.test-lab.xenproject.org/osstest/results/host/laxton1.html
it is evident that both boxes started failing at roughly the same
time.

On both boxes the first failing job was a test in flight 132973, the
linux-4.9 one.  But I think this is a red herring because (i) the
failure occurs during host installation, before the flight has
actually touched the kernel to be tested (ii) on laxton1 there were
two test jobs in 132973 which passed (iii) on both machines there were
earlier build jobs in 132973 which passed (which would have had a very
similar host installation step).

I have checked and there was no update to the osstest code.  I don't
think there have been updates to the infrastructure config but I
haven't searched all the infrastructure boxes etckeepers etc.

So something has caused both machines to fail simultaneously.

Possible explanations to me seem to be:

  * Some kind of common physical cause (power surge corrupting the
firmware or something)

  * Some kind of common nonphysical cause external to the hosts or the
tests: bad files on the infrastructure hosts; a change to the
behaviour of the bootp/tftp servers; a new kind of broadcast
network packet (perhaps from other tests) which causes the laxton
firmware to malfunction; etc.

  * The new linux-4.9 kernel does something which has the effect of
often (but not always) corrupting the laxtons' firmware.

  * A firmware bug triggered by the passage of time (eg
clock-dependent)

  * Misunderstanding by me in my analysis of what ingredients are used
by the host installation failures.

  * Some other common-mode failure that I haven't thought of.

In the meantime I have unblessed the laxtons to avoid osstest
repeatedly power cycling them to try to get them to work.  FTR this
will still not allow the push gate to pass.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 3/5] power: ssh: Reduce timeout for script fragment

2019-01-25 Thread Ian Jackson
This is really not going to take a minute.  Probably, much less.
Waiting less long will save time when we fall back.

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index cfcf8f85..16410937 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -47,7 +47,7 @@ sub pdu_power_state {
 
 my $delay = 5;
 
-target_cmd_root($mo->{Host}, <{Host}, <>/var/log/osstest-reboot.log
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down

2019-01-25 Thread Ian Jackson
When we `power on' with the ssh method, we actually run ssh reboot.

On some systems (notably, FreeBSD) the kernel does not simply reboot
immediately even with the runes we provide here, ie for FreeBSD
  reboot -nq
Eg, I have seen reboots with several messages like this:
  Jan 25 14:17:59.100044 Waiting (max 60 seconds) for system thread 
`bufspacedaemon-2' to stop... done

This can result in the ssh method failing spuriously, because the
`power on' appears to complete while the host is still up in the
previous environment.  In one of my test runs I saw an ssh to the host
succeed, and print the uptime (of the existing environment), between
the reboot command being issued and the host actually rebooting.

So, wait (up to just over a minute) until the host does not respond to
ping.  (target_await_down runs ping -c 5.)

CC: Roger Pau Monné 
Signed-off-by: Ian Jackson 
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index 16410937..d68d3880 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -62,7 +62,7 @@ sub pdu_power_state {
  )&
 END
 
-sleep($delay);
+target_await_down($mo->{Host}, $delay + 70);
 }
 
 sub instantaneous {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 1/5] flight_otherjob: Use confess rather than die

2019-01-25 Thread Ian Jackson
When this error trips it is usually because the call site looked up an
unset runvar and it can be hard to tell what that runvar was.

If we use confess we will at least find out the calling line number...

Signed-off-by: Ian Jackson 
---
 Osstest.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Osstest.pm b/Osstest.pm
index 85a6e78b..92b1a0ea 100644
--- a/Osstest.pm
+++ b/Osstest.pm
@@ -22,6 +22,7 @@ use warnings;
 use POSIX;
 use File::Basename;
 use IO::File;
+use Carp;
 
 BEGIN {
 use Exporter ();
@@ -370,7 +371,7 @@ sub flight_otherjob ($$) {
 my ($thisflight, $otherflightjob) = @_;
 return $otherflightjob =~ m/^([^.]+)\.([^.]+)$/ ? ($1,$2) :
$otherflightjob =~ m/^\.?([^.]+)$/ ? ($thisflight,$1) :
-   die "$otherflightjob ?";
+   confess "$otherflightjob ?";
 }
 
 sub other_revision_job_suffix ($$) {
@@ -444,3 +445,4 @@ sub show_abs_time ($) {
 }
 
 1;
+
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 2/5] power: ssh: Fix handling of $delay

2019-01-25 Thread Ian Jackson
The script fragment contains a reference to $delay which is a perl
variable, not a variable in the script fragment.  We therefore need to
not ''-quote the script.

Without this, the ssh method will often fail spuriously: the exiting
parent (which will signal success back to the osstest controller)
races with the attempt to reboot.

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/ssh.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
index ac1eb919..cfcf8f85 100644
--- a/Osstest/PDU/ssh.pm
+++ b/Osstest/PDU/ssh.pm
@@ -47,7 +47,7 @@ sub pdu_power_state {
 
 my $delay = 5;
 
-target_cmd_root($mo->{Host}, <<'END', 60);
+target_cmd_root($mo->{Host}, <>/var/log/osstest-reboot.log
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 0/5] Reduce hard power cycles, part 2 (FreeBSD, fixes)

2019-01-25 Thread Ian Jackson
This replaces the `DO NOT APPLY' patch 26/26 from the part 1 series.
It also contains some other fixes/improvements.

Ian Jackson (5):
  flight_otherjob: Use confess rather than die
  power: ssh: Fix handling of $delay
  power: ssh: Reduce timeout for script fragment
  power: ts-freebsd-host-install: Use power_reboot_attempts
  power: ssh: Wait for the target to appear to go down

 Osstest.pm  |  4 +++-
 Osstest/PDU/ssh.pm  |  4 ++--
 ts-freebsd-host-install | 33 +
 3 files changed, 30 insertions(+), 11 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 4/5] power: ts-freebsd-host-install: Use power_reboot_attempts

2019-01-25 Thread Ian Jackson
We look at the installer environment uptime, to
 | check that this is the installer environment
as requested by the comment
 | in particular $await must only succeed if the host really did
 | reboot into the boot environment that $await expects.
near the top of power_reboot_attempts

CC: Roger Pau Monné 
Signed-off-by: Ian Jackson 
---
 ts-freebsd-host-install | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/ts-freebsd-host-install b/ts-freebsd-host-install
index 53daeefc..3c3e9c34 100755
--- a/ts-freebsd-host-install
+++ b/ts-freebsd-host-install
@@ -259,14 +259,31 @@ END
 }
 
 # Switch off, setup PXE and switch on to the installer
-power_state($ho, 0);
-setup_netboot_installer();
-power_cycle_sleep($ho);
-power_state($ho, 1);
-
-# Wait for the host to finish booting
-logm("Waiting for the installer to boot");
-await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+power_reboot_attempts($ho, sub {
+setup_netboot_installer();
+}, sub {
+# Wait for the host to finish booting
+logm("Waiting for the installer to boot");
+my $wait_start = time;
+await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+
+# We want to check that we actually rebooted.  We do this by
+# comparing the (putative) installer environment's uptime,
+# with the time we spent waiting for it to appear.
+my $timeoutput = target_cmd_output_root($ho,
+'date +%s; sysctl -n kern.boottime');
+logm("got:\n$timeoutput");
+$timeoutput =~ s{^(\d+)\n}{} or die "date: $timeoutput ?";
+my $target_now = $1;
+$timeoutput =~ m{\ssec\s?=\s?(\d+)\b} or die "sysctl: $timeoutput ?";
+my $target_boottime = $1;
+
+my $uptime = $target_now - $target_boottime;
+my $elapsed = time - $wait_start;
+logm("uptime=$uptime elapsed=$elapsed");
+$uptime < $elapsed or die "uptime >= elapsed";
+
+}, undef, 'install');
 
 if ($bootonly) {
 hostprop_putative_record($ho, "MemdiskAppend", $memdisk_append)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 5/5] power: ssh: Wait for the target to appear to go down

2019-01-25 Thread Ian Jackson
Roger Pau Monne writes ("Re: [OSSTEST PATCH 5/5] power: ssh: Wait for the 
target to appear to go down"):
> On Fri, Jan 25, 2019 at 02:50:46PM +, Ian Jackson wrote:
> > On some systems (notably, FreeBSD) the kernel does not simply reboot
> > immediately even with the runes we provide here, ie for FreeBSD
> >   reboot -nq
> > Eg, I have seen reboots with several messages like this:
> >   Jan 25 14:17:59.100044 Waiting (max 60 seconds) for system thread 
> > `bufspacedaemon-2' to stop... done
> 
> So it seems like reboot -nq doesn't behave as expected...

Mmm.  Ah well.

Thanks for the reviews.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] libxl: When restricted, start QEMU paused

2019-01-30 Thread Ian Jackson
Anthony PERARD writes ("[PATCH for-4.12] libxl: When restricted, start QEMU 
paused"):
> [stuff]

Thanks for this.  I think the code looks right but to make it easier
to understand what was going on I have taken the liberty of trying to
reword your commit message for English grammar.

Can you please check that this is true ?

  libxl runs the command "cont" later during guest creation; i.e. it
  is expecting that QEMU would not do any emulation.  Use the "-S"
  command option to achieve this.

  Unfortunately, when QEMU is started with "-S", it won't write QEMU's
  readiness into xenstore. So only activate this option when we have a
  QEMU startup notification via QMP available, i.e. when dm_restrict
  is activated.

  The -S option has the side-effect of suppressing the startup
  notification via xenstore: libxl will only get the notification via
  QMP.

  It is important to rely only on QMP for notification when we have
  QMP available, as (due to a qemu bug) not waiting for that QMP
  notification may result in the QMP socket becoming blocked, so that
  QEMU stops responding to new connections even if no existing ones
  are active.

  The QEMU bug that this patch works around is as follows:
  - libxl connects and hand-checks [xxx???] with QEMU, then sends the
cmd "query-status".
  - QEMU prepares and maybe tries to send the response,
while also writing "running" into xenstore.
  - libxl sees via xenstore that QEMU is running and disconnects from the
QMP socket before receiving the response from the cmd.
  => The QMP socket (monitor) is thereby blocked and will never reply
to commands on new connections.

  This is due to QEMU only responding to one command at a time, and
  suspending its monitor (QMP) until the command as been processed and
  sent. Disconnecting from the socket doesn't unsuspend the monitor. The
  race described here is very likely to happen with QEMU 3.1.50 (during
  3.2 development), but can be reproduced with QEMU 3.1.

  [xxx??? So, require, therefore, that when we get the QMP readiness
  notification, the qemu state is xenstore.]

I added a couple of xxx where I was particularly unsure.

Regards,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] libxl: When restricted, start QEMU paused

2019-01-30 Thread Ian Jackson
Anthony PERARD writes ("Re: [PATCH for-4.12] libxl: When restricted, start QEMU 
paused"):
> On Wed, Jan 30, 2019 at 03:09:45PM +, Ian Jackson wrote:
> >   - libxl connects and hand-checks [xxx???] with QEMU, then sends the
> > cmd "query-status".
> >   - QEMU prepares and maybe tries to send the response,
> > while also writing "running" into xenstore.
> >   - libxl sees via xenstore that QEMU is running and disconnects from the
> > QMP socket before receiving the response from the cmd.
> >   => The QMP socket (monitor) is thereby blocked and will never reply
> > to commands on new connections.
> > 
> >   This is due to QEMU only responding to one command at a time, and
> >   suspending its monitor (QMP) until the command as been processed and
> >   sent. Disconnecting from the socket doesn't unsuspend the monitor. The
> >   race described here is very likely to happen with QEMU 3.1.50 (during
> >   3.2 development), but can be reproduced with QEMU 3.1.
> > 
> >   [xxx??? So, require, therefore, that when we get the QMP readiness
> >   notification, the qemu state is xenstore.]
> 
> Sorry for this been vague. I'm not 100% sure of what QEMU do and when.
> The listing of QEMU's action are probably not accurate.

Hrm.  Maybe you want to write `roughly' or `I think' or something,
then.  I think the description of the QEMU behaviour is useful even if
not 100% accurate.

> Or maybe the issue here is: s/hand-checks/handshake/, I was trying to
> describe the context with as little words as possible before
> getting to the point where things fails.

Yes, `handshake' makes much more sense.  `Hand-check' would mean
`check by hand'.
https://en.wiktionary.org/wiki/handshake
https://en.wiktionary.org/wiki/by_hand

> Is that better to understand the context? How much of this would
> actally be useful for the patch description?

Probably what you wrote originally and I edited, with the xxx's fixed.
Assuming it's actually true...

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 for-4.12] libxl: When restricted, start QEMU paused

2019-01-31 Thread Ian Jackson
Anthony PERARD writes ("[PATCH v2 for-4.12] libxl: When restricted, start QEMU 
paused"):
> Signed-off-by: Anthony PERARD 
> Release-acked-by: Juergen Gross 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.10/4.11] libxl: don't set gnttab limits in soft reset case

2019-02-05 Thread Ian Jackson
Wei Liu writes ("Re: [PATCH-for-4.10/4.11] libxl: don't set gnttab limits in 
soft reset case"):
> On Thu, Jan 17, 2019 at 05:40:59PM +0100, Juergen Gross wrote:
> > In case of soft reset the gnttab limit setting will fail, so omit it.
> > Setting of max vcpu count is pointless in this case, too, so we can
> > drop that as well.
> > 
> > Without this patch soft reset will fail with:
> > 
> > libxl: error: libxl_dom.c:363:libxl__build_pre: Couldn't set grant table 
> > limits
> > 
> > Reported-by: Jim Fehlig 
> > Signed-off-by: Juergen Gross 
> > Tested-by: Jim Fehlig 
> 
> Reviewed-by: Wei Liu 

Pushed to 4.10 and 4.11, thanks.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] preparations for 4.10.3 and 4.9.4

2019-02-05 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] preparations for 4.10.3 and 4.9.4"):
> On 01.02.19 at 12:23,  wrote:
> > For 4.10.3:
> > 
> > https://lists.xen.org/archives/html/xen-devel/2019-01/msg01451.html 
> 
> Ian, this looks to be one for you.

Indeed, thanks.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] libxl: correctly dispose of dominfo list in libxl_name_to_domid

2019-02-05 Thread Ian Jackson
Ian Jackson writes ("Re: [PATCH for-4.12] libxl: correctly dispose of dominfo 
list in libxl_name_to_domid"):
> Wei Liu writes ("[PATCH for-4.12] libxl: correctly dispose of dominfo list in 
> libxl_name_to_domid"):
> > Tamas reported ssid_label was leaked. Use the designated function to
> > free dominfo list to fix the leakage.
> > 
> > Reported-by: Tamas K Lengyel 
> > Signed-off-by: Wei Liu 
> > Tested-by: Tamas K Lengyel 
> 
> Acked-by: Ian Jackson 
> 
> > Backport candidate.
> 
> Noted.

For reference:

This bug was first introduced in

  35c1498503216510095745d06d09f0f01ddf800e
  libxenlight: fix name to domid conversion

where the call to libxl_domain_list was introduced.  A fix was
attempted in

  cbc97d570c09ff9a1db5af18f5e5db5565ab3506
  libxenlight: fix memory leaks

which is where the `free(dominfo)' call was introduced, but that was
not right.  That latter commit is in staging-3.4.  So all supported
(and security-supported) versions are affected.  I have backported
this to 4.10 and 4.11.  (As it doesn't seem to me to be a security
issue I have not fixed it on earlier branches.)

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] production-config: Temporarily drop arm64

2019-01-29 Thread Ian Jackson
Following reflashing of the two machines, and recent work to change
the PDU usage, which I think will help with the reliability, I am
intending to revert this as soon as I have rerun a commissioning test
on both laxtons.

Ian.

From a7491b211f06daa8fa6e0f646bf717958cf91a2b Mon Sep 17 00:00:00 2001
From: Ian Jackson 
Date: Tue, 29 Jan 2019 13:26:29 +
Subject: [OSSTEST PATCH] Revert "production-config: Temporarily drop arm64"

This reverts commit c65d7eb3f6c424d6c1fe69c5ecfca9c0b6cf4302.
---
 production-config | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/production-config b/production-config
index 980377c4..6b743d4f 100644
--- a/production-config
+++ b/production-config
@@ -45,9 +45,6 @@ LogsMinExpireAge= 86400*4
 LogsPublishMinSpaceMby= 20*1e3
 LogsPublishMinExpireAge= 86400*7
 
-BuildArches i386 amd64 armhf
-TestArches i386 amd64 armhf
-
 TestHostKeypairPath /home/osstest/keys/id_rsa_osstest
 
 GitCacheProxy git://cache:9419/
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] libxl: correctly dispose of dominfo list in libxl_name_to_domid

2019-01-29 Thread Ian Jackson
Wei Liu writes ("[PATCH for-4.12] libxl: correctly dispose of dominfo list in 
libxl_name_to_domid"):
> Tamas reported ssid_label was leaked. Use the designated function to
> free dominfo list to fix the leakage.
> 
> Reported-by: Tamas K Lengyel 
> Signed-off-by: Wei Liu 
> Tested-by: Tamas K Lengyel 

Acked-by: Ian Jackson 

> Backport candidate.

Noted.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] osstest commits and Xen releases

2019-01-29 Thread Ian Jackson
Juergen Gross writes ("OSStest commits and Xen releases"):
> I have found an alarming tendency regarding changes in the OSStest
> repository: over the last 2 years (or 3 Xen versions) there has been
> a pattern of OSStest commits being more frequent during the RC phase
> of a Xen release. On average there were about 4 commits to osstest.git
> per week. The numbers were significantly higher during RC-phases:
> 
> Version   RC-phase OSStest commits per week
> 4.12  2019/01/16 - 19
> 4.11  2018/04/17 - 2018/07/09  10
> 4.10  2017/10/16 - 2017/12/13  6
> 
> I have looked at this as I would have liked to cut 4.12-RC2 this
> Monday, but OSStests for xen-unstable failed over the weekend. Ian
> suspected a change in OSStest to be blamed (needs to be verified).
> 
> As the release manager I don't like RCs being delayed due to changes
> in our infrastructure. For Xen we have code freeze and patches to go in
> need the release manager's ack. Shouldn't the same apply to OSStest?
> 
> I like OSStest very much as it helps catching bugs early. But I believe
> the main development should not be done in the time when we need it's
> results to be most reliable.
> 
> Thoughts?


Thanks for raising this.  I have three lines of response.


Firstly, in the most general case: I think you have a point.

(I think this effect is probably due to changes which had been starved
of effort due to the impending Xen freeze being unblocked, but I would
have to do a full chart to be sure.)

I suggest we improve this by adopting a release ack system for pushes
to osstest pretest after the Xen codefreeze date.  In practice it will
sometimes be necessary to make changes quickly (eg debian-installer
kernel updates) so I think I (as ossteset maintainer) would need some
discretion to waive the need for a release ack or to make one myself,
but that would certainly involve informing you, and asking your
opinion if you are available.

Another possibility would be to arrange for xen-unstable to have its
own separate branch of osstest, so that xen-unstable's runs can be
detached from the rest.  I think while this is technically possible it
is not worth the additional complexity (admin hassle, risk of
confusion, work to reconcile branches, etc. etc.)

Do you think a release ack should be needed for commissioning new
hardware ?


Secondly, on this specific set of changes, looking at it from the
point of view of whether such a release ack ought to have been
forthcoming:

We have been having hardware failures.  Particularly, we have been
having PDU port failures which I am fairly sure are due to the high
frequency with which we use the PDU relays to hard power cycle the
machines.  We have also had a higher rate of other hardware problems
than I think would be to be expected, which might be related.

These PDU relay problems themselves lead to osstest unreliability and
of course the longer the situation goes on the more stuff breaks.

So I think that for these changes a release ack should probably have
been granted although perhaps additional formal testing (or some other
assurance) would have or should been done - see below.


Thirdly, in this case these recent changes were in fact not anything
to do with the fact that we didn't get a push over the weekend.
Looking at the recent flights, the first of the changes I made at the
end of last week took effect in 132504 (which reported late on
Monday).

The osstest changes were:

 * Substantial changes to host (and L1 host/guest) power
   on/off/reboot machinery.  In particular hosts are now normally
   soft-rebooted via ssh at the start of a test, rather than
   hard power cycled.

 * Small changes to reporting functions.
 * One tiny change to improve some error messages.


These changes *did* cause a regression in 132504:
 test-amd64-amd64-examine  4 memdisk-try-append fail pass in 132478

This was not considered blocking by osstest because from the
archaeologist's point of view it is intermittent (the archaeologist is
right but for the wrong reason).  But, to justify that osstest had to
look at 132478 which has other failures, so this osstest regression
was part of the reason for not getting a push on Monday night.


The bug was effectively introduced by dropping, late in development,
the power management changes for the FreeBSD tests.  Those changes
were dropped late due to me realising as I was writing more
comprehensive design comments that my intended scheme was not 100%
sound.

This problem was not detected by osstest's formal self-test because
the formal self-test did not encounter the triggering condition (The
bug triggers when the FreeBSD test runs on a box which for some reason
was left in a state, by the previous test, where it could not be
rebooted with ssh; the latter is quite rare.)

This risk would have been obvious to me if I had been asked (or asked
myself) how thoroughly the changes ought to have been tested.  For
example, in the context 

[Xen-devel] [OSSTEST PATCH 09/26] power: Provide `try_off' pdu method; deprecate ipmi_try

2019-01-24 Thread Ian Jackson
We are going to want to use this magically, in our new approach.  Make
a general version, and deprecate ipmi_try (which will be obsoleted by
the new approach and which has probably not been used very much).

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/ipmi_try.pm |  2 ++
 Osstest/PDU/try_off.pm  | 65 +
 2 files changed, 67 insertions(+)
 create mode 100644 Osstest/PDU/try_off.pm

diff --git a/Osstest/PDU/ipmi_try.pm b/Osstest/PDU/ipmi_try.pm
index cf851d21..17eb504e 100644
--- a/Osstest/PDU/ipmi_try.pm
+++ b/Osstest/PDU/ipmi_try.pm
@@ -14,6 +14,8 @@
 # You should have received a copy of the GNU Affero General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
+# NOTE: this method is deprecated.  Use try_off instead.
+
 package Osstest::PDU::ipmi_try;
 
 use strict;
diff --git a/Osstest/PDU/try_off.pm b/Osstest/PDU/try_off.pm
new file mode 100644
index ..aa73c854
--- /dev/null
+++ b/Osstest/PDU/try_off.pm
@@ -0,0 +1,65 @@
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2014 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+package Osstest::PDU::try_off;
+
+use strict;
+use warnings;
+
+use Osstest;
+use Osstest::TestSupport;
+use IO::File;
+
+use parent qw(Osstest::PDU::unsupported);
+
+our $default_attempts = 3;
+
+sub new {
+my ($class, $ho, $methname, @opts) = @_;
+my $attempts;
+if ($opts[0] =~ m/^\d+$/) {
+   $attempts = shift @opts;
+}
+return bless { Attempts => $attempts,
+  Then => get_host_method_object($ho, 'PDU', "@opts"),
+}, $class;
+}
+
+sub new_from_mo {
+my ($class, $mo, $attempts) = @_;
+return bless { Attempts => $attempts,
+  Then => $mo
+}, $class;
+}
+
+sub pdu_power_state {
+my ($mo, $on) = @_;
+
+if ($on) {
+   $mo->{Then}->pdu_power_state($on);
+} else {
+   my $attempts = ($mo->{Attempts} // $default_attempts);
+   foreach my $attempt (1..$attempts) {
+   eval {
+   $mo->{Then}->pdu_power_state($on);
+   };
+   last unless $@;
+   warn "(attempt $attempt/$attempts) $@";
+   }
+}
+}
+
+1;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 05/26] power handling: Break out power_cycle_parse_method

2019-01-24 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 29108423..68b51728 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -917,10 +917,10 @@ sub serial_fetch_logs ($) {
 
 #-- power cycling --
 
-sub power_cycle_host_setup ($) {
-my ($ho) = @_;
+sub power_cycle_parse_method ($$) {
+my ($ho, $spec) = @_;
 my $methobjs = [ ];
-foreach my $meth (split /\;\s*/, ($ho->{Power} // 'unsupported')) {
+foreach my $meth (split /\;\s*/, $spec) {
 if ($meth eq 'nest') {
 push @$methobjs, $meth;
 } elsif ($meth =~ m{^(\d+)(?:/(\d+))$}) {
@@ -930,7 +930,13 @@ sub power_cycle_host_setup ($) {
 push @$methobjs, get_host_method_object($ho,'PDU',$meth);
 }
 }
-$ho->{PowerMethobjs} = $methobjs;
+return $methobjs;
+}
+
+sub power_cycle_host_setup ($) {
+my ($ho) = @_;
+$spec = ($ho->{Power} // 'unsupported');
+$ho->{PowerMethobjs} = power_cycle_parse_method($ho,$spec);
 }
 
 sub power_cycle_sleep ($) {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 16/26] ts-host-powercycle: Use a lighter-weight method if available

2019-01-24 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 ts-host-powercycle | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ts-host-powercycle b/ts-host-powercycle
index 5c1698c7..79a3b711 100755
--- a/ts-host-powercycle
+++ b/ts-host-powercycle
@@ -33,8 +33,10 @@ our ($whhost) = @ARGV;
 $whhost ||= 'host';
 our $ho= selecthost($whhost);
 
+my $approach_re = qr{(?!.*SSH)};
+
 if (defined $only) {
-power_state($ho, $only);
+power_state($ho, $only, $approach_re);
 } else {
-power_cycle($ho);
+power_cycle($ho, $approach_re);
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 20/26] power: Use `Combined' as Name for PDU+ILOM approach

2019-01-24 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 11 +++
 mg-hosts   |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 886e95d4..fb14c4b3 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -938,9 +938,10 @@ sub power_cycle_host_setup ($) {
 # $ho->{PowerApproaches}[]{Name}see below
 # $ho->{PowerApproaches}[]{MethObjs}[]  each has ->pdu_power_state($on)
 # `Name's are:
-#Only   Host only supports one method and this is it
-#ILOM   Try to use just the ILOM
-#PDUTry to use the PDU (but also turn off/on ILOM if provided)
+#Only Host only supports one method and this is it
+#ILOM Try to use just the ILOM
+#PDU  Try to use the PDU (no ILOM provided)
+#Combined Try to use the PDU and ILOM tohether
 my @approaches;
 my $pdu_s = get_host_property($ho,'PowerPDU');
 my $ilom_s = get_host_property($ho,'PowerILOM');
@@ -955,8 +956,10 @@ sub power_cycle_host_setup ($) {
}
if ($pdu_s) {
my $pdu_m = power_cycle_parse_method($ho, $pdu_s);
+   my $pdu_name = 'PDU';
if ($ilom_m) {
require Osstest::PDU::try_off;
+   $pdu_name = 'Combined';
$pdu_m = [
@$pdu_m,
get_host_method_object($ho, 'PDU', 'pause'),
@@ -964,7 +967,7 @@ sub power_cycle_host_setup ($) {
];
}
push @approaches, {
-   Name => 'PDU',
+   Name => $pdu_name,
MethObjs => ['nest', @$pdu_m ],
};
}
diff --git a/mg-hosts b/mg-hosts
index 2edf1216..58b4acc3 100755
--- a/mg-hosts
+++ b/mg-hosts
@@ -31,7 +31,7 @@
 #"1" or "off"   : power off
 #"c" or "r" : reboot
 #   APPROACH_RE is a regexp specifying which method to
-#   use.  Methods are:  Only ILOM PDU SSH
+#   use.  Methods are:  Only Combined ILOM PDU SSH
 #   and the lightest method which matches will be used.
 #   If not specified, the most reliable method will be used.
 #
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 18/26] power: power_reboot_attempts: Honour an $approach_re

2019-01-24 Thread Ian Jackson
The semantics are slightly different here: not specifying it means to
try everything rather than only the hardest.  But the effect is
similar: not specifying $approach_re means we must succeed.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 96ddbc3c..4e1192d4 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -978,8 +978,8 @@ sub power_cycle_host_setup ($) {
 $ho->{PowerApproaches} = \@approaches;
 }
 
-sub power_reboot_attempts ($$$) {
-my ($ho, $setup, $await) = @_;
+sub power_reboot_attempts ($$$;$) {
+my ($ho, $setup, $await, $approach_re) = @_;
 # Power cycles $ho, calling $setup->() while it is (supposedly) off.
 # Then, just after turning $ho on, calls $await->().
 # If $await->() succeeds, great: returns.
@@ -997,6 +997,7 @@ sub power_reboot_attempts ($$$) {
MethObjs => power_cycle_parse_method($ho, 'ssh'),
 };
 foreach my $approach ($ssh, @{ $ho->{PowerApproaches} }) {
+   next if defined $approach_re && $approach->{Name} !~ qr{$approach_re};
logm("power: trying to reboot $ho->{Name} (using $approach->{Name})");
if (eval {
power_approach_invoke($ho, $approach, 0);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 08/26] power: power_reboot_attempts: Try multiple approaches

2019-01-24 Thread Ian Jackson
Actually iterate over PowerApproaches, rather than calling power_state
with no approach selector regexp.

No overall functional change right now because nothing puts more
than one entry in PowerApproaches.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index fa7c36e1..d1b7ad66 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -960,11 +960,22 @@ sub power_reboot_attempts ($$$) {
 # to work.  $setup and $await should tolerate this situation; in
 # particular $await must only succeed if the host really did reboot
 # into the boot environment that $await expects.
-power_state($ho, 0);
-$setup->();
-power_cycle_sleep($ho);
-power_state($ho, 1);
-$await->();
+foreach my $approach (@{ $ho->{PowerApproaches} }) {
+   logm("power: rebooting $ho->{Name} (using $approach->{Name})");
+   if (eval {
+   power_approach_invoke($ho, $approach, 0);
+   $setup->();
+   power_cycle_sleep($ho);
+   power_approach_invoke($ho, $approach, 1);
+   $await->();
+   1;
+   }) {
+   logm("power: rebooted $ho->{Name} (using $approach->{Name})");
+   return;
+   }
+   logm("power: failed to reboot (using $approach->{Name}): $@");
+}
+die "power: all approaches to rebooting $ho->{Name} failed!\n";
 }
 
 sub power_cycle_sleep ($) {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 26/26] DO NOT APPLY power: ts-freebsd-host-install: Use power_reboot_attempts

2019-01-24 Thread Ian Jackson
RFC.  Needs input on how to
 | check that this is the installer environment
as requested by the comment
 | in particular $await must only succeed if the host really did
 | reboot into the boot environment that $await expects.
near the top of power_reboot_attempts

CC: Roger Pau Monné 
Signed-off-by: Ian Jackson 
---
 ts-freebsd-host-install | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/ts-freebsd-host-install b/ts-freebsd-host-install
index 53daeefc..21f3b5a2 100755
--- a/ts-freebsd-host-install
+++ b/ts-freebsd-host-install
@@ -259,14 +259,14 @@ END
 }
 
 # Switch off, setup PXE and switch on to the installer
-power_state($ho, 0);
-setup_netboot_installer();
-power_cycle_sleep($ho);
-power_state($ho, 1);
-
-# Wait for the host to finish booting
-logm("Waiting for the installer to boot");
-await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+power_reboot_attempts($ho, sub {
+setup_netboot_installer();
+}, sub {
+# Wait for the host to finish booting
+logm("Waiting for the installer to boot");
+await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
+xxx check that this is the installer environment
+}, undef, 'install');
 
 if ($bootonly) {
 hostprop_putative_record($ho, "MemdiskAppend", $memdisk_append)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 25/26] sg-report-host-history: Show used power approach(es)

2019-01-24 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 sg-report-host-history | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/sg-report-host-history b/sg-report-host-history
index 51814534..18b538e9 100755
--- a/sg-report-host-history
+++ b/sg-report-host-history
@@ -161,6 +161,7 @@ sub reporthost ($) {
 print H "branchintendedblessing\n";
 
 print H "jobfailure\n";
+print H "power\n";
 
 print H "\n";
 
@@ -188,6 +189,16 @@ END
 LIMIT 1
 END
 
+# the final AND is just to reduce the data flow
+our $jrunvarq //= db_prepare(<{name};
+   $jrunvarq->execute($jr->{flight}, $jr->{job}, $ident);
+my %runvars;
+while (my ($n, $v) = $jrunvarq->fetchrow_array()) {
+$runvars{$n} = $v;
+}
 
my $altcolour = report_altcolour($alternate);
print H "";
@@ -252,6 +269,45 @@ END
my $ri = report_run_getinfo({ %$jr, %$ir });
print H "{ColourAttr}>$ri->{Content}\n";
 
+   my %powers;
+   foreach my $r (sort keys %runvars) {
+   next unless $r =~ m{^\Q${ident}\E_power_};
+   $powers{$'} = $runvars{$r};
+   }
+   my $skipped = 0;
+my $any_power = 0;
+my $pr_power_colour = sub {
+my ($colour, $contents) = @_;
+if ($any_power++) {
+print H span_colour($colour), $contents, '';
+} else {
+print H "", $contents;
+}
+};
+   my $pr_power = sub {
+   my ($wh) = @_;
+   for (; $skipped; $skipped--) {
+   $pr_power_colour->($grey_mid, " - ");
+   }
+   my $how = $powers{$wh};
+   my $colour =
+   $how =~ /PDU/  ? $yellow :
+   $how =~ /Combined/ ? $yellow :
+   $how eq 'SSH'  ? $green  :
+$grey_pale ;
+$pr_power_colour->($colour, " $how ");
+   };
+   foreach my $wh (qw(install recover)) {
+   $skipped++, next unless exists $powers{$wh};
+   $pr_power->($wh);
+   delete $powers{$wh};
+   }
+   foreach my $wh (sort keys %powers) {
+   $pr_power->($wh);
+   }
+print H "" if !$any_power;
+   print H "\n";
+
print H "\n\n";
$alternate ^= 1;
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 13/26] power: Try rebooting hosts with ssh first

2019-01-24 Thread Ian Jackson
Before we try anything with IPMI (if provided) or the PDU, try
rebooting with ssh.

I think this will dramatically reduce the rate at which we power cycle
our test hosts.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index c3b2fb7b..8101b739 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -992,8 +992,12 @@ sub power_reboot_attempts ($$$) {
 # to work.  $setup and $await should tolerate this situation; in
 # particular $await must only succeed if the host really did reboot
 # into the boot environment that $await expects.
-foreach my $approach (@{ $ho->{PowerApproaches} }) {
-   logm("power: rebooting $ho->{Name} (using $approach->{Name})");
+my $ssh = {
+   Name => 'SSH',
+   MethObjs => power_cycle_parse_method($ho, 'ssh'),
+};
+foreach my $approach ($ssh, @{ $ho->{PowerApproaches} }) {
+   logm("power: trying to reboot $ho->{Name} (using $approach->{Name})");
if (eval {
power_approach_invoke($ho, $approach, 0);
$setup->();
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 23/26] Executive: Export $grey_* with :colours

2019-01-24 Thread Ian Jackson
No functional change with existing callers.

Signed-off-by: Ian Jackson 
---
 Osstest/Executive.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 1b1cdc36..0d8502b5 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -60,6 +60,7 @@ BEGIN {
   db_schema_updates_applied db_schema_updates_intree
   );
 %EXPORT_TAGS = ( colours => [qw($green $red $yellow $purple $blue
+$grey_pale $grey_mid $grey_dark
span_colour)] );
 
 @EXPORT_OK   = @{ $EXPORT_TAGS{colours} };
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 00/26] Reduce hard power cycles

2019-01-24 Thread Ian Jackson
We have been having trouble with PDU port relays sticking.  We vaguely
suspect that this is happening in Massachusetts due to the
lower-voltage (120V, and therefore higher-current) electricity
(compared to Cambridge, 240V, where the PDUs remain reliable).

To mitigate this issue (and generally prolong hardware life), try to
avoid hard power cycling test boxes when we can.  Specifically, try a
reboot with ssh first, and if that fails try the IMPI if provided.  If
the IMPI doesn't work either or is not provided, hard power cycle.

Patches 01-25 are ready I think, and have been tested.  There is a
difficulty with the FreeBSD host install, which is discussed in patch
26 which is definitely not ready in this form.

Ian Jackson (26):
  power: Osstest::PDU::*: drop Exporter blocks
  power: Osstest::PDU::*: use parent ::unsupported
  power: ipmi_try: Use `use parent' instead of Exporter
  power: Osstest::PDU::*: reuse default new method where applicable
  power handling: Break out power_cycle_parse_method
  power handling: Introduce power_reboot_attempts
  power: PowerApproaches replaces $ho->{PowerMethobjs}
  power: power_reboot_attempts: Try multiple approaches
  power: Provide `try_off' pdu method; deprecate ipmi_try
  power: New ILOM/PDU arrangements - try just IPMI
  power: Do not sleep between power off and power on if not needed
  power: Provide `ssh' power method
  power: Try rebooting hosts with ssh first
  power: Document Power* host properties and power methods
  power: Honour approach_re in power_cycle and mg-hosts power
  ts-host-powercycle: Use a lighter-weight method if available
  power: Rename target_reboot_force from target_reboot_hard
  power: power_reboot_attempts: Honour an $approach_re
  ts-logs-capture: power: try ILOM first
  power: Use `Combined' as Name for PDU+ILOM approach
  power: Record approach used for power cycles in runvars
  Executive: Break out span_colour
  Executive: Export $grey_* with :colours
  sg-report-host-history: Move SET LOCAL into transaction
  sg-report-host-history: Show used power approach(es)
  DO NOT APPLY power: ts-freebsd-host-install: Use power_reboot_attempts

 Osstest/Executive.pm   |  11 +++-
 Osstest/PDU/eth008.pm  |  11 +---
 Osstest/PDU/guest.pm   |  16 ++---
 Osstest/PDU/ipmi.pm|  11 +---
 Osstest/PDU/ipmi_try.pm|  14 +
 Osstest/PDU/ipmiextra.pm   |  11 +---
 Osstest/PDU/manual.pm  |  16 +
 Osstest/PDU/msw.pm |  11 +---
 Osstest/PDU/pause.pm   |  11 +---
 Osstest/PDU/ssh.pm |  74 ++
 Osstest/PDU/statedb.pm |   2 +
 Osstest/PDU/try_off.pm |  70 +
 Osstest/PDU/unsupported.pm |  16 ++---
 Osstest/PDU/xenuse.pm  |  16 +
 Osstest/TestSupport.pm | 154 +++--
 README |  77 +++
 README.dev |   4 +-
 mg-hosts   |  14 +++--
 sg-report-host-history |  67 ++--
 ts-freebsd-host-install|  16 ++---
 ts-host-install|  19 +++---
 ts-host-powercycle |   6 +-
 ts-logs-capture|   2 +-
 23 files changed, 482 insertions(+), 167 deletions(-)
 create mode 100644 Osstest/PDU/ssh.pm
 create mode 100644 Osstest/PDU/try_off.pm

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 04/26] power: Osstest::PDU::*: reuse default new method where applicable

2019-01-24 Thread Ian Jackson
These two modules both had the same `new' as `unsupported'.  Now that
we have `use parent' they can be removed.

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/manual.pm | 5 -
 Osstest/PDU/xenuse.pm | 5 -
 2 files changed, 10 deletions(-)

diff --git a/Osstest/PDU/manual.pm b/Osstest/PDU/manual.pm
index 78f24db9..7f955b90 100644
--- a/Osstest/PDU/manual.pm
+++ b/Osstest/PDU/manual.pm
@@ -27,11 +27,6 @@ use parent qw(Osstest::PDU::unsupported);
 
 our $tty;
 
-sub new {
-my ($class, $ho) = @_;
-return bless { Host => $ho }, $class;
-}
-
 sub pdu_power_state {
 my ($mo, $on) = @_;
 my $onoff= $on ? "on" : "off";
diff --git a/Osstest/PDU/xenuse.pm b/Osstest/PDU/xenuse.pm
index 2ce0dd22..981b6cd9 100644
--- a/Osstest/PDU/xenuse.pm
+++ b/Osstest/PDU/xenuse.pm
@@ -26,11 +26,6 @@ use IO::File;
 
 use parent qw(Osstest::PDU::unsupported);
 
-sub new {
-my ($class, $ho) = @_;
-return bless { Host => $ho }, $class;
-}
-
 sub pdu_power_state {
 my ($mo, $on) = @_;
 my $onoff= $on ? "on" : "off";
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 15/26] power: Honour approach_re in power_cycle and mg-hosts power

2019-01-24 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm |  8 
 mg-hosts   | 14 +-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 8101b739..e45f54b2 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -1028,15 +1028,15 @@ sub power_cycle_sleep ($) {
 sleep($to);
 }
 
-sub power_cycle ($) {
-my ($ho) = @_;
+sub power_cycle ($;$) {
+my ($ho, $approach_re) = @_;
 $mjobdb->host_check_allocated($ho);
 die "refusing to set power state for host $ho->{Name}".
" possibly shared with other jobs\n"
if $ho->{SharedMaybeOthers};
-power_state($ho, 0);
+power_state($ho, 0, $approach_re);
 power_cycle_sleep($ho);
-power_state($ho, 1);
+power_state($ho, 1, $approach_re);
 }
 
 sub power_approach_invoke ($$$) {
diff --git a/mg-hosts b/mg-hosts
index 5361eb63..2edf1216 100755
--- a/mg-hosts
+++ b/mg-hosts
@@ -24,12 +24,16 @@
 #   of osstest.  Will use "sudo". The HOST(s) must be
 #   allocated (via mg-allocate HOST).
 #
-#  ./mg-hosts power HOST ACTION
+#  ./mg-hosts power HOST ACTION [APPROACH_RE]
 #   Power cycles the host. Host must be allocated to the current
 #   user.  Actions are:
 #"0" or "on": power on
 #"1" or "off"   : power off
 #"c" or "r" : reboot
+#   APPROACH_RE is a regexp specifying which method to
+#   use.  Methods are:  Only ILOM PDU SSH
+#   and the lightest method which matches will be used.
+#   If not specified, the most reliable method will be used.
 #
 #  ./mg-hosts create-like SOURCE-HOST NEW-HOST[,NEW-HOST...]
 #   Create new host(s).  This does NOT copy the
@@ -151,15 +155,15 @@ END
 }
 
 sub cmd_power () {
-die unless @ARGV==2;
-my ($host,$power) = @ARGV;
+die unless @ARGV==2 || @ARGV==3;
+my ($host,$power,$approach_re) = @ARGV;
 $_ = $power;
 $power = m/^1|^on/ ? 1 : m/^0|^off/ ? 0 : m/^r|^c/ ? -1 : die;
 my $ho= selecthost("host=$host");
 if ($power >= 0) {
-   power_state($ho, $power);
+   power_state($ho, $power, $approach_re);
 } else {
-   power_cycle($ho);
+   power_cycle($ho, $approach_re);
 }
 }
 
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 19/26] ts-logs-capture: power: try ILOM first

2019-01-24 Thread Ian Jackson
If ILOM is provided, and the host is not responding when we try logs
capture, try the ILOM before hard power cycling the host.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 4e1192d4..886e95d4 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -1440,8 +1440,9 @@ sub target_reboot ($) {
 
 sub target_reboot_force ($) {
 my ($ho) = @_;
-power_cycle($ho);
-await_tcp(get_timeout($ho,'reboot',$timeout{HardRebootUp}), 5, $ho);
+power_reboot_attempts($ho, sub { }, sub {
+await_tcp(get_timeout($ho,'reboot',$timeout{HardRebootUp}), 5, $ho);
+}, qr{(?!.*SSH)});
 }
 
 sub tcpconnect ($$) {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 22/26] Executive: Break out span_colour

2019-01-24 Thread Ian Jackson
No functional change.

Signed-off-by: Ian Jackson 
---
 Osstest/Executive.pm | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm
index 0be27b64..1b1cdc36 100644
--- a/Osstest/Executive.pm
+++ b/Osstest/Executive.pm
@@ -59,7 +59,8 @@ BEGIN {
   db_pg_dsn opendb opendb_state
   db_schema_updates_applied db_schema_updates_intree
   );
-%EXPORT_TAGS = ( colours => [qw($green $red $yellow $purple $blue)] );
+%EXPORT_TAGS = ( colours => [qw($green $red $yellow $purple $blue
+   span_colour)] );
 
 @EXPORT_OK   = @{ $EXPORT_TAGS{colours} };
 }
@@ -306,6 +307,11 @@ our $grey_pale= '#66';
 our $grey_mid=  '#88';
 our $grey_dark= '#cc';
 
+sub span_colour ($) {
+my ($colour) = @_;
+""
+}
+
 sub report_run_getinfo ($) {
 # $f is a joined flight/job row, must contain at least
 #flight job status
@@ -345,7 +351,7 @@ END
$summary .= " $fs->{status}";
$colour = $failcolour;
}
-   push @content, "".
+   push @content, span_colour($colour).
encode_entities($summary)."";
 }
if (!@content) {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 07/26] power: PowerApproaches replaces $ho->{PowerMethobjs}

2019-01-24 Thread Ian Jackson
This new variable contains a list of different approaches to try.

* Move the meat of power_state into power_approach_invoke.
* power_state now looks for a single approach to try.
* The default for power_state is to pick the last approach in
  the list, which by definition is supposed to be the most reliable.
* Currently there will only be one approach, `Only'.

No overall functional change other than to log messages.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 35 +--
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index fb6a407c..fa7c36e1 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -935,8 +935,15 @@ sub power_cycle_parse_method ($$) {
 
 sub power_cycle_host_setup ($) {
 my ($ho) = @_;
-$spec = ($ho->{Power} // 'unsupported');
-$ho->{PowerMethobjs} = power_cycle_parse_method($ho,$spec);
+my $spec = ($ho->{Power} // 'unsupported');
+# $ho->{PowerApproaches}[]{Name}see below
+# $ho->{PowerApproaches}[]{MethObjs}[]  each has ->pdu_power_state($on)
+# `Name's are:
+#Only   Host only supports one method and this is it
+$ho->{PowerApproaches} = {
+Name => 'Only',
+MethObjs => power_cycle_parse_method($ho, $spec),
+};
 }
 
 sub power_reboot_attempts ($$$) {
@@ -978,10 +985,10 @@ sub power_cycle ($) {
 power_state($ho, 1);
 }
 
-sub power_state ($$) {
-my ($ho, $on) = @_;
-logm("power: setting $on for $ho->{Name}");
-my @methobjs = @{ $ho->{PowerMethobjs} };
+sub power_approach_invoke ($$$) {
+my ($ho, $approach, $on) = @_;
+logm("power: setting $on (using $approach->{Name}) for $ho->{Name}");
+my @methobjs = @{ $approach->{MethObjs} };
 if ($methobjs[0] eq 'nest') {
 shift @methobjs;
 @methobjs = reverse @methobjs if !$on;
@@ -991,6 +998,22 @@ sub power_state ($$) {
 }
 }
 
+sub power_state ($$;$) {
+my ($ho, $on, $approach_re) = @_;
+my @approaches = @{ $ho->{PowerApproaches} };
+my $approach;
+if (defined $approach_re) {
+   ($approach) = grep { $_->{Name} =~ qr{$approach_re} } @approaches
+   or die ("No matching power approach for $ho->{Name}".
+   " (wanted $approach_re) (available: ".
+   join(' ', map { $_->{Name} } @approaches).
+   ")\n");
+} else {
+   $approach = $approaches[-1];
+}
+power_approach_invoke($ho,$approach,$on);
+}
+
 #-- host selection and properties --
 
 sub selecthost ($);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 10/26] power: New ILOM/PDU arrangements - try just IPMI

2019-01-24 Thread Ian Jackson
We honour two new host properties PowerPDU and PowerILOM, in
preference to PowerMethod.  The semantics are going to be properly
documented in a later patch, but, briefly:

If only one of these is supplied, it works like PowerMethod, except
that `nest' is applied by default.

If both are supplied, we make two approaches: one is just ILOM.  The
other is to use ILOM and PDU together, with pause in between, and with
try_off applied to ILOM.

The current configuration in Massachusetts is, for hosts with IPMI, to
provide a PowerMethod specifying ad hoc to use PDU and then IPMI, and
also to provide both PowerPDU and PowerILOM.

The overall result of this patch, with that configuration, is to avoid
using the PDU at all if an IPMI-requested reboot is successful.  This
should significantly reduce the number of hard power cycles for hosts
with IMPI.

Signed-off-by: Ian Jackson 
---
 Osstest/TestSupport.pm | 42 +-
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index d1b7ad66..5e2fb488 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -935,15 +935,47 @@ sub power_cycle_parse_method ($$) {
 
 sub power_cycle_host_setup ($) {
 my ($ho) = @_;
-my $spec = ($ho->{Power} // 'unsupported');
 # $ho->{PowerApproaches}[]{Name}see below
 # $ho->{PowerApproaches}[]{MethObjs}[]  each has ->pdu_power_state($on)
 # `Name's are:
 #Only   Host only supports one method and this is it
-$ho->{PowerApproaches} = {
-Name => 'Only',
-MethObjs => power_cycle_parse_method($ho, $spec),
-};
+#ILOM   Try to use just the ILOM
+#PDUTry to use the PDU (but also turn off/on ILOM if provided)
+my @approaches;
+my $pdu_s = get_host_property($ho,'PowerPDU');
+my $ilom_s = get_host_property($ho,'PowerILOM');
+if ($pdu_s || $ilom_s) {
+   my $ilom_m;
+   if ($ilom_s) {
+   $ilom_m = power_cycle_parse_method($ho, $ilom_s);
+   push @approaches, {
+   Name => 'ILOM',
+   MethObjs => ['nest', @$ilom_m ],
+   };
+   }
+   if ($pdu_s) {
+   my $pdu_m = power_cycle_parse_method($ho, $pdu_s);
+   if ($ilom_m) {
+   require Osstest::PDU::try_off;
+   $pdu_m = [
+   @$pdu_m,
+   get_host_method_object($ho, 'PDU', 'pause'),
+   map { Osstest::PDU::try_off->new_from_mo($_) } @$ilom_m,
+   ];
+   }
+   push @approaches, {
+   Name => 'PDU',
+   MethObjs => ['nest', @$pdu_m ],
+   };
+   }
+} else {   
+   my $spec = ($ho->{Power} // 'unsupported');
+   push @approaches, {
+   Name => 'Only',
+   MethObjs => power_cycle_parse_method($ho, $spec),
+};
+}
+$ho->{PowerApproaches} = \@approaches;
 }
 
 sub power_reboot_attempts ($$$) {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 11/26] power: Do not sleep between power off and power on if not needed

2019-01-24 Thread Ian Jackson
This is controlled by a new query method on power method objects.  If
every power method says it is instananeous then we do not sleep.

This is going to be most useful when we introduce the new SSH
approach, which should not sleep for the power off time.

But we do it for guests (ie, L1 nested hosts) too.

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/guest.pm   | 5 +
 Osstest/PDU/try_off.pm | 5 +
 Osstest/PDU/unsupported.pm | 5 +
 Osstest/TestSupport.pm | 9 -
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Osstest/PDU/guest.pm b/Osstest/PDU/guest.pm
index 779ba6c2..7bed724c 100755
--- a/Osstest/PDU/guest.pm
+++ b/Osstest/PDU/guest.pm
@@ -56,4 +56,9 @@ sub pdu_power_state {
 }
 }
 
+sub instantaneous {
+my ($mo) = @_;
+return 1;
+}
+
 1;
diff --git a/Osstest/PDU/try_off.pm b/Osstest/PDU/try_off.pm
index aa73c854..37e34301 100644
--- a/Osstest/PDU/try_off.pm
+++ b/Osstest/PDU/try_off.pm
@@ -62,4 +62,9 @@ sub pdu_power_state {
 }
 }
 
+sub instantaneous {
+my ($mo) = @_;
+return $mo->{Then}->instantaneous();
+}
+
 1;
diff --git a/Osstest/PDU/unsupported.pm b/Osstest/PDU/unsupported.pm
index 8627f452..c914defd 100644
--- a/Osstest/PDU/unsupported.pm
+++ b/Osstest/PDU/unsupported.pm
@@ -35,4 +35,9 @@ sub pdu_power_state {
 die "power switch request for $mo->{Host}{Name} ($on) not supported\n";
 }
 
+sub instantaneous {
+my ($mo) = @_;
+return 0;
+}
+
 1;
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 5e2fb488..c3b2fb7b 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -997,7 +997,14 @@ sub power_reboot_attempts ($$$) {
if (eval {
power_approach_invoke($ho, $approach, 0);
$setup->();
-   power_cycle_sleep($ho);
+   my $need_sleep = 0;
+   foreach my $mo (@{ $approach->{MethObjs} }) {
+   next if $mo eq 'nest';
+   next if $mo->instantaneous();
+   $need_sleep = 1;
+   last;
+   }
+   power_cycle_sleep($ho) if $need_sleep;
power_approach_invoke($ho, $approach, 1);
$await->();
1;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [OSSTEST PATCH 12/26] power: Provide `ssh' power method

2019-01-24 Thread Ian Jackson
This is not really a power method but it can pretend to be one.  On
power off, it does nothing.  On power on it logs into the host to ask
it to do a hard reboot.

This is rather best effort, but it is eminently suitable for our new
approach/attempts arrangements because those will try another approach
if ssh didn't work.

Signed-off-by: Ian Jackson 
---
 Osstest/PDU/ssh.pm | 74 ++
 1 file changed, 74 insertions(+)
 create mode 100644 Osstest/PDU/ssh.pm

diff --git a/Osstest/PDU/ssh.pm b/Osstest/PDU/ssh.pm
new file mode 100644
index ..ac1eb919
--- /dev/null
+++ b/Osstest/PDU/ssh.pm
@@ -0,0 +1,74 @@
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2009-2013 Citrix Inc.
+# 
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+# 
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+package Osstest::PDU::ssh;
+
+# This power method is used automatically, even when not configured.
+
+use strict;
+use warnings;
+
+use Osstest;
+use Osstest::TestSupport;
+
+use parent qw(Osstest::PDU::unsupported);
+
+our $tty;
+
+sub pdu_sleep_required {
+return 0;
+}
+
+sub pdu_power_state {
+my ($mo, $on) = @_;
+
+if (!$on) {
+   logm("power: request to turn off via SSH method, ignored");
+   return;
+}
+
+# These games with ( )& are needed because the command to request
+# a hard reboot request will not return, so the tcp connection
+# carrying our ssh command request would just hang.
+
+my $delay = 5;
+
+target_cmd_root($mo->{Host}, <<'END', 60);
+ set -e
+ type reboot
+ exec >>/var/log/osstest-reboot.log
+ date
+ exec &1
+ set -x
+ (
+set +e
+sleep $delay
+reboot -f -n -d   # Linux sysvinit/systemd
+reboot -nq# FreeBSD (rejects above due to -f)
+ )&
+END
+
+sleep($delay);
+}
+
+sub instantaneous {
+my ($mo) = @_;
+# This does not need any more sleep, if it worked.
+return 1;
+}
+
+1;
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

<    7   8   9   10   11   12   13   14   15   16   >