On 09/05/2019 14:25, Jan Beulich wrote: >>>> On 09.05.19 at 14:50, <andrew.coop...@citrix.com> wrote: >> On 09/05/2019 13:35, Jan Beulich wrote: >>> Right now this goes unnoticed until some subsequent page allocator >>> operation stumbles across the thus corrupted list. We can do better: >>> Only PGC_state_inuse and PGC_state_offlining pages can legitimately be >>> passed to free_heap_pages(). >>> >>> Take the opportunity and also restrict the PGC_broken check to the >>> PGC_state_offlining case, as only pages of that type or >>> PGC_state_offlined may have this flag set on them. Similarly, since >>> PGC_state_offlined is not a valid input state, the setting of "tainted" >>> can be restricted to just this case. >>> >>> Signed-off-by: Jan Beulich <jbeul...@suse.com> >> Acked-by: Andrew Cooper <andrew.coop...@citrix.com>, with a suggestion. > Thanks. > >>> --- a/xen/common/page_alloc.c >>> +++ b/xen/common/page_alloc.c >>> @@ -1409,13 +1409,22 @@ static void free_heap_pages( >>> * in its pseudophysical address space). >>> * In all the above cases there can be no guest mappings of this >>> page. >>> */ >>> - ASSERT(!page_state_is(&pg[i], offlined)); >>> - pg[i].count_info = >>> - ((pg[i].count_info & PGC_broken) | >>> - (page_state_is(&pg[i], offlining) >>> - ? PGC_state_offlined : PGC_state_free)); >>> - if ( page_state_is(&pg[i], offlined) ) >>> + switch ( pg[i].count_info & PGC_state ) >>> + { >>> + case PGC_state_inuse: >>> + BUG_ON(pg[i].count_info & PGC_broken); >>> + pg[i].count_info = PGC_state_free; >>> + break; >>> + >>> + case PGC_state_offlining: >>> + pg[i].count_info = (pg[i].count_info & PGC_broken) | >>> + PGC_state_offlined; >>> tainted = 1; >>> + break; >>> + >>> + default: >> Given that this is a fully fatal condition, it would be helpful to at >> least print the state we found here. For cases other than >> PGC_state_free, it would probably be a very useful piece of information >> for diagnosing what went wrong. > Funny you should say this - I have the debugging patch below on top > in my tree. I could easily submit this as a standalone follow-on patch.
TBH, I think it would be fine folded in, although with... > > Jan > > --- unstable.orig/xen/common/page_alloc.c > +++ unstable/xen/common/page_alloc.c > @@ -1014,7 +1014,14 @@ static struct page_info *alloc_heap_page > for ( i = 0; i < (1 << order); i++ ) > { > /* Reference count must continuously be zero for free pages. */ > - BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free); > + if ( (pg[i].count_info & ~PGC_need_scrub) != PGC_state_free ) > + { > + printk(XENLOG_ERR "pg[%x] m=%lx c=%lx o=%x v=%lx t=%x\n", "pg[%u] mfn %"PRImfn" c=%#lx o=%u v=%#lx t=%#x\n" so we don't end up printing numbers which are ambiguous between hex/dec. With at least the ambiguity removed, my ack stands. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel