Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-06 Thread Matthew D. Fuller


> See if you can work out how to reproduce it at will and try it
> against the new patch.

   (... and now that I carefully planned to add the following before
   I sent the mail, and then completely didn't...)

The MonitorLayout config var I just landed may be helpful in this; a
hardcoded variant was the only way I managed to track down my crash in
the first place.  By using it to turn a Xnest into two side-by-side
monitors, I could easily "fullscreen" an app, and have the other half
of the nest easily visible to focus into/out of.  Being able to flit
the focus around like that, to the desktop or to other visible
windows, makes it a lot easier to track down.

As did being able to do it in a controlled environment like that, not
by trying to crash the WM I'm actually having to _use_.  Not just in
not disrupting my work, but also having the debug output somewhere
easy to fiddle with after it bombs out.


(I also have this dream, that MonitorLayout can give us all the excuse
we need to take VirtualScreens out back of the woodshed and shoot it
in the head...)


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-06 Thread Matthew D. Fuller
On Mon, Jul 01, 2019 at 07:36:23PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> Intersting! My theory so far was that *during* the change of focus,
> the window depths might be inconsistent. But after
> OtpRestackWindow()ing the newly focused window, it should be
> finished, and all consistent and happy. That was consistent with my
> crash between the two calls to OtpRestackWindow(). My change was
> also based on that idea.

So, the crash isn't necessarily about where the window is in the X
stacking order, it's about where it is in the OtpWinList chain.  (I'm
going on the assumption here that your crash is the same situation as
mine, just triggered differently.)  And what causes the actual blowup
is that the transient has the new level's priority, but hasn't been
moved from its previous position in the OWL list, and so the
consistency check craps itself.

Of course, the crash isn't "between" the Restack calls, since the
Restack is what calls the consistency check that crashes; it's _in_
one or the other.  Mine is always in the second one, because my
trigger is on focusing _from_ a non-FS-focus window _into_ one that
is; my current theory is that your crash is in the first one, because
you're hitting it when focusing _away_ from one that is, into one
that's not.  In those cases, I never make the first call (since the
window I'm focusing away from isn't OtpIsFocusDependent()), and you'd
never make the second for the same reason, so there's only one place
either of us _could_ crash.

On that theory, my guess is that your patch could fix your case, since
you'd be crashing on a mis-placement of the old focus'd win's
transient in the list, it being lowered down to normal since the focus
has moved away.  And whether ->Focus is NULL or the new window won't
make any difference to that.


Now, I _can_ cause a crash on leaving the old.  I do that by having a
transient up of the focus/fullscreen, _and_ having another window
that's in a + OTP layer (e.g., an xterm in +2), then moving out of the
fullscreener.  Great.  And with my patch, that...  also crashes.  In a
new way.  Crap.

Sigh.  The whole way we look for the transients is wonky, because of
the assumptions it makes about where they already are.  So we just
have to expand it and search the whole thing.  So, new iteration: the
attached.  With this, I can't provoke a crash in my old or new tests
anymore.  (it still has some slightly subideal cases, but they aren't
correctness related, so I'm ignoring them for now).


> So far I haven't turned on debugging to print all restacking
> operations, but maybe it is time for that... and we might find out
> that everything is different...

It's plenty verbose   :p

I actually found it mostly less helpful than I'd hoped, due to details
of how the crashes happened and were provoked.  I wound up having to
add some of my own fprintf's to dump out the important details.  Now I
sorta wanna spent some time reworking how all that debug goes together
and outputs stuff.  Well, on the list...

See if you can work out how to reproduce it at will and try it against the
new patch.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.
=== modified file 'otp.c'
--- otp.c	2018-08-19 22:50:13 +
+++ otp.c	2019-07-07 04:23:24 +
@@ -1579,6 +1579,99 @@
 }
 
 
+
+/**
+ * Focus/unfocus backend.  This is used on windows whose stacking is
+ * focus-dependent (e.g., EWMH fullscreen), to move them and their
+ * transients around.  For these windows, getting/losing focus is
+ * practically the same as a f.setpriority, except it's on the calculated
+ * rather than the base parts.  And it's hard to re-use our existing
+ * functions to do it because we have to move Scr->Focus before the main
+ * window changes, but then it's too late to see where all the transients
+ * were.
+ *
+ * There are a number of unpleasant assumptions in here relating to where
+ * the transients are, and IWBNI we could be smarter and quicker about
+ * dealing with them.  But this gets us past the simple to cause
+ * assertion failures, anyway...
+ */
+static void
+OtpFocusWindowBE(TwmWindow *twm_win, int oldprio)
+{
+	OtpWinList *owl = twm_win->otp;
+
+	// This one comes off the list, and goes back in its new place.
+	RemoveOwl(owl);
+	InsertOwl(owl, Above);
+
+	// Now root around for any transients of it, and
+	// nudge them into the new location.  The whole Above/Below thing is
+	// kinda a heavy-handed guess, but...
+	//
+	// This is nearly a reimplementation of TryToMoveTransientsOfTo(),
+	// but the assumption that we can find the transients by starting
+	// from where the old priority was in the list turns out to be deeply
+	// broken.  So just walk the whole thing.  Which isn't ideal, but...
+	//
+	// XXX It should not be this freakin' hard to find a window's
+	// transients.  We should fix that more globally.