Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-15 Thread Stefan Monnier


> of the list...  whoops.  Totally not my fault.  The mailing list is
> obviously inserting bugs in patches as they pass through it.

Luckily, the relevant info is in the headers:

X-List-Server: Minimalist v2.5(3) (Good Bowl) 

Do you want me to file the bug report?


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-14 Thread Matthew D. Fuller


On Thu, Jul 11, 2019 at 08:15:36PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> I do notice that SetFocus() duplicates a lot of work that the
> FocusIn and FocusOut event handlers do.

Yeah.  Almost-duplicated code makes me a bit grumpy.  I spent a minute
or two thinking about how to collapse them together the other way
around and didn't come up with anything I loved.  Should probably go
on the "cleanup-someday" list.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-14 Thread Matthew D. Fuller


On Thu, Jul 11, 2019 at 11:16:31PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> but I did find some other scenario that does weird things :-(

[...]

> so maybe that is a sign of a loop in ctwm indeed. It might be
> getting confused while getting the OTP stack in order with now 3
> windows that have the wrong priority. (Maybe restacking needs to
> happen by first removing all transients, then re-adding them all,
> instead of one by one??)

I think it turns out to be more banal than that.  xine helpfully
positions the panel and nav controls bumped right up against each
other, which means with the borders we add there's actually a
single-pixel overlap.  I can trigger it just to by kicking to
fullscreen with things positioned right, and then there's this weird
visible flickering right at the intersection between the two
transients.  Ahah.

We start from the bottom of the OWL stack.  When we find a transient,
we move it up to the top of the stack, then move on to the window that
was previously right above it.  Which actually meant that we moved the
transient up _twice_; once when we found it in the middle of the list,
then again when we found it at the top.  But the second time, the
window previously right above it was NULL, so, fine, we're done.  Now,
if we have *two* (or presumably >2) transients that we move to the top
of the list...  whoops.  Totally not my fault.  The mailing list is
obviously inserting bugs in patches as they pass through it.

So we have to just build up the list of transients during the pass
through the list, then move them after we've completed the search.

Fixed up, and landed the stack of changes in trunk.  If we don't hit
any new surprises with it, I'll plan to try backporting it to before
the RLayout/RANDR changes and make a 4.0.3 in a week or so.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-11 Thread Rhialto
On Thu 11 Jul 2019 at 20:15:36 +0200, Rhialto wrote:
> Indeed, I can't reproduce it anymore now with this scenario. I'm pretty
> sure it is the same bug that happened "randomly" before.

but I did find some other scenario that does weird things :-(

It's not an OTP assertion failure but some sort of loop or deadlock or
something.

Scenario: start with fullscreen Xine ("F"), with open on top of that its
Panel window ("P) with its Navigator side-window ("N") with even more
DVD controls. In Xine's Panel you get that window if you click the N in
its bottom right. The problem does not occur without N.

Now, maybe first some focussing on N and P might be needed, or maybe it
is not required. Then with the mouse on F, hit the "f" key to switch
from fullscreen to a normal window.

Now ctwm freezes. F and N get un-painted, but F doesn't change and in
fact no other events seem to have any effect.

I have ctwm running in gdb and that shows nothing weird. But if I
interrupt it and get a stack trace, it is in a location where it ought
not be very long ever (NetBSD now ships with debugging symbols for all
system libraries and programs, so even within X libs there is useful
info):

(gdb) run --replace
Starting program: /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/build/ctwm 
--replace
load: 0.21  cmd: ctwm 1719 [select] 0.18u 0.22s 0% 3580k

Program received signal SIGINFO, Information request.
0x76704223e28a in poll () from /usr/lib/libc.so.12
(gdb) bt
#0  0x76704223e28a in poll () from /usr/lib/libc.so.12
#1  0x767041e18b91 in _xcb_conn_wait (c=c@entry=0x76704451f000, 
cond=cond@entry=0x767044520128, vector=vector@entry=0x7f7fff5f9c78, 
count=count@entry=0x7f7fff5f9c74)
at /usr/xsrc/external/mit/libxcb/dist/src/xcb_conn.c:479
#2  0x767041e16517 in _xcb_out_send (c=c@entry=0x76704451f000, 
vector=vector@entry=0x7f7fff5f9cf0, count=count@entry=3)
at /usr/xsrc/external/mit/libxcb/dist/src/xcb_out.c:458
#3  0x767041e16588 in xcb_writev (c=c@entry=0x76704451f000, 
vector=vector@entry=0x7f7fff5f9cf0, count=count@entry=3, 
requests=requests@entry=819)
at /usr/xsrc/external/mit/libxcb/dist/src/xcb_out.c:406
#4  0x767043e48f76 in _XSend (dpy=dpy@entry=0x76704451d000, 
data=data@entry=0x0, 
size=size@entry=0) at /usr/xsrc/external/mit/libX11/dist/src/xcb_io.c:486
#5  0x767043e49272 in _XFlush (dpy=0x76704451d000)
at /usr/xsrc/external/mit/libX11/dist/src/xcb_io.c:503
#6  0x767043e6da55 in _XGetRequest (dpy=dpy@entry=0x76704451d000, 
type=type@entry=12 '\f', len=len@entry=12)
at /usr/xsrc/external/mit/libX11/dist/src/XlibInt.c:1707
#7  0x767043e290b5 in XConfigureWindow (dpy=0x76704451d000, w=16777841, 
mask=mask@entry=96, changes=changes@entry=0x7f7fff5f9e00)
at /usr/xsrc/external/mit/libX11/dist/src/ReconfWin.c:48
#8  0x004220d3 in InsertOwlAbove (owl=0x767044583a40, 
other_owl=0x767044583a10)
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/otp.c:483
#9  0x0042420d in OtpFocusWindowBE (oldprio=, 
twm_win=0x767044540c00, twm_win=0x767044540c00)
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/otp.c:1675
#10 0x004242b3 in OtpFocusWindow (twm_win=twm_win@entry=0x767044540c00)
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/otp.c:1718
#11 0x0042f8de in SetFocus (tmp_win=0x767044540c00, tim=)
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/win_ops.c:171
#12 0x004096b0 in HandleEnterNotify ()
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/event_handlers.c:3304
#13 0x00408804 in DispatchEvent ()
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/event_core.c:331
#14 0x00408aa8 in HandleEvents ()
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/event_core.c:203
#15 0x00407eff in ctwm_main (argc=, argv=)
at /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/ctwm_main.c:1055
#16 0x004061ab in ___start ()
#17 0x7670449ed000 in ?? ()
#18 0x0002 in ?? ()
#19 0x7f7fff5fa838 in ?? ()
#20 0x7f7fff5fa86e in ?? ()
#21 0x in ?? ()
(gdb)

Maybe the XConfigureWindow() call has some impossible arguments that are
not caught in Xlib? It looks like a reply never arrives, or maybe there
is a loop of some kind. 

One time I caught it at

(gdb) run
Starting program: /mnt/vol1/rhialto/cvs/other/ctwm/bzr/trunk/build/ctwm 
load: 0.86  cmd: ctwm 2269 [0x421e16/0] 0.13u 0.09s 0% 3588k

Program received signal SIGINFO, Information request.
XFindContext (display=0x768b6991d000, rid=27264372, context=-1, 
data=data@entry=0x7f7fff372c38)
at /usr/xsrc/external/mit/libX11/dist/src/Context.c:252
252 /usr/xsrc/external/mit/libX11/dist/src/Context.c: No such file or 
directory.
(gdb) bt
#0  XFindContext (display=0x768b6991d000, rid=27264372, context=-1, 
data=data@entry=0x7f7fff372c38)
at /usr/xsrc/external/mit/libX11/dist/src/Context.c:252
#1  0x00432cdc in GetTwmWindow (w=)
at 

Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Stefan Monnier


> I mean, I'm actually assuming an _array_ would work better than
> either.  But if I can't dream up insane anti-solutions that would
> _technically_ work, what's the point of weekends anyway?

Good point.  This said, the current way windows and turned into
TwmWindow structs has been "efficient enough" for several decades (for
me anyway, and I routinely have 200-300 windows), and AFAIK hardware
resources are not depleting yet (even though single thread performance
has stopped improving), so I think there's no hurry to improve this ;-)


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Stefan Monnier


>> Hmm... so is it because we changed PRI(owl) without changing the
>> OtpWinList accordingly, [...]
>
> Yes[0].  The PRI() of the transient changed when the main window
> gained focus, but it's still down among the normal windows where it
> was prior.

Looking at OwlEffectivePriority which in my code was just a access to
the owl->priority field, I see that the priority is now a lot more
dynamic, so there's indeed a lot more ways to change the priority and
forget to adjust the stacking accordingly.

>> I missed this change.  I'm not very familiar with FULLSCREEN.  Why
>> do we need to change the OTP depending on focus?
> https://specifications.freedesktop.org/wm-spec/wm-spec-1.3.html#STACKINGORDER
>
>* [on the top of everything] focused windows having state
>  _NET_WM_STATE_FULLSCREEN

I don't understand how that spec says that focus changes should change
stacking order.  It doesn't seem to say anything about what we should
with FULLSCREEN windows that don't have focus, so it seems perfectly
acceptable to keep them on top.

To my naive reading, I'd argue that we should raise windows to the
foreground when they get the FULLSCREEN prop and disregard focus.

Of course, that shouldn't make any significant change to the code
anyway: when we get a property change to/from FULLSCREEN we need to
restack the window (and potentially its transients), and if we decide to
pay attention to focus, then it just means we also need to restack
similarly when the focus of a FULLSCREEN window changes.  And indeed,
that's what I see.

So, yes, the code has changed a fair bit since when I wrote it, and the
fact that priority can be "inherited" by transients means that several
windows can change priority at the same time, so restacking needs to
look for the transients and restack them as well (and be careful not to
run the consistency check before they've all been restacked).


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Matthew D. Fuller


On Mon, Jul 08, 2019 at 11:08:25AM -0400 I heard the voice of
Stefan Monnier, and lo! it spake thus:
> > Actually, what we obviously need to do is get rid of TwmWindow
> > structs wholesale, just embed SQLite, and store our runtime data
> > there; then the indexes will find them for us!
> 
> I don't follow.  Wouldn't a hash-table work just as well?  SQLite
> seems like a rather expensive way to implement a hash-table.

I mean, I'm actually assuming an _array_ would work better than
either.  But if I can't dream up insane anti-solutions that would
_technically_ work, what's the point of weekends anyway?


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Matthew D. Fuller


On Mon, Jul 08, 2019 at 11:04:10AM -0400 I heard the voice of
Stefan Monnier, and lo! it spake thus:
> 
> Hmm... so is it because we changed PRI(owl) without changing the
> OtpWinList accordingly, [...]

Yes[0].  The PRI() of the transient changed when the main window
gained focus, but it's still down among the normal windows where it
was prior.

We don't _move_ the transient to a new priority; it's just THERE as
soon as its master has focus.  's why the solution I wound up with was
that we just needed OTP entry points that explictly say "do {un,}focus
stuff", rather than the previous "restack this window", so it knows it
needs to explicitly handle already-unsorted stuff as part of the
action.


> IIRC we already had code doing something along those lines, more
> specifically code that raises its transients when we raise a normal
> window (see RaiseSmallTransientsOfAbove).

Yes, but that has two issues.  One, it only does _Small_ transients,
so misses larger one.  And the other (and the reason I couldn't just
use TryToMoveTransientsOfTo()) is that it embeds assumptions about how
to find the transients, which aren't effective since it's not where
its priority would indicate in the list (heck, if it were, we wouldn't
be eating the assertion fail, either...).


> I missed this change.  I'm not very familiar with FULLSCREEN.  Why
> do we need to change the OTP depending on focus?

https://specifications.freedesktop.org/wm-spec/wm-spec-1.3.html#STACKINGORDER

   * [on the top of everything] focused windows having state
 _NET_WM_STATE_FULLSCREEN


> Can you point me to the code which changes the "OTP" in response to
> focus change?

That happens in OwlEffectivePriority() (which PRI() is just a call
to).

https://bazaar.launchpad.net/~ctwm/ctwm/trunk/view/head:/otp.c#L1654




[0] In my case.  I'm still going with the assumption that Olaf's is
closely related, except his seems to happen on _un_focus rather
than focus, but we haven't managed to verify it yet.



-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Stefan Monnier


> Actually, what we obviously need to do is get rid of TwmWindow structs
> wholesale, just embed SQLite, and store our runtime data there; then
> the indexes will find them for us!

I don't follow.  Wouldn't a hash-table work just as well?
SQLite seems like a rather expensive way to implement a hash-table.


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-08 Thread Stefan Monnier


>> From what I remember of the time I wrote and debugged OTP, these
>> problems were invariably due to things like calling RaiseWindow
>> instead of OtpRaiseWindow.
>
> That would lead to assertion failures where it compares against the X
> window stack.  These troubles are triggering a bit earlier, on the
>
> assert(PRI(owl) >= priority);
>
> e.g., the priority isn't monotonically increasing from the bottom->top
> of the OWL.

Hmm... so is it because we changed PRI(owl) without changing the
OtpWinList accordingly, or is it because we changed OtpWinList without
changing PRI(owl)?

> This got trickier with the _FULLSCREEN stuff, where the
> effective OTP of a window now depends on whether it has focus or not,

I missed this change.  I'm not very familiar with FULLSCREEN.  Why do we
need to change the OTP depending on focus?

Can you point me to the code which changes the "OTP" in response to
focus change?

> And, in this case, DOUBLE tricky, because its transients have to move
> with it;

IIRC we already had code doing something along those lines, more
specifically code that raises its transients when we raise a normal
window (see RaiseSmallTransientsOfAbove).


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-07 Thread Matthew D. Fuller


On Sun, Jul 07, 2019 at 10:35:01PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> An obvious change here would to link all transients for a window to
> that window.

Well, we have a link that direction, via TwmWindow->transientfor
(though that's more expensive than ideal, since it has the Window, not
the TwmWindow).  It's going the other way we lack, short of checking
everything.  I'm of a mind to remedy that...  though, that'd be more
of a Step 2 sort of thing, so that a fix for our current bugaboo is
easily backportable.

Actually, what we obviously need to do is get rid of TwmWindow structs
wholesale, just embed SQLite, and store our runtime data there; then
the indexes will find them for us!


> The code for OtpFocusWindowBE() could avoid a copy of
> TryToMoveTransientsOfTo() [...]

There are enough differences that we'd probably have to chop it up
pretty bad.  Especially now that it's searching in a whole different
way, through the whole OWL, to be able to actually find the
transients.  Or alternately, we could easily just use it, all we'd
have to do is rewrite it.  That's more like Step 3...

(also, that whole OwlRightBelow()->above idiom to find "bottom of
layer X" gives me a headache every time I read it.  I wonder if we
should just have "struct OtpWinList *bottomowl[17]" in Screen instead
and just treat each layer individually...)


> Since it calls PRI(owl) several times, and that is more expensive
> than it looks,

Eh, it's not too bad.  The function call itself probably costs more
than the couple dozen simple ops it does.  Except for when it's run on
a transient window and has to go find the master, but...   well, now
we've looped back up to the top of the mail:)


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-07 Thread Rhialto
On Sat 06 Jul 2019 at 23:34:07 -0500, Matthew D. Fuller wrote:
> Sigh.  The whole way we look for the transients is wonky, because of
> the assumptions it makes about where they already are.  So we just
> have to expand it and search the whole thing.  So, new iteration: the

An obvious change here would to link all transients for a window to that
window. Although transients for transients (if those would occur) should
probably be linked to the "master intransient" (or whatshallwecallit).

A less drastic way might be to just add a boolean to each window,
indicating if it is a "master intransient" window or not. That would
save searching in case it isn't.

> === modified file 'otp.c'
> --- otp.c 2018-08-19 22:50:13 +
> +++ otp.c 2019-07-07 04:23:24 +
> @@ -1579,6 +1579,99 @@

The code for OtpFocusWindowBE() could avoid a copy of
TryToMoveTransientsOfTo() if that function got an extra parameter: the
old_priority, equal to PRI(owl) (or in case of OtpFocusWindowBE(), equal
to oldprio). Since it calls PRI(owl) several times, and that is more
expensive than it looks, it is even a good thing to turn that into a
parameter. There is also less risk that PRI(owl) would change somehow
through the list iteration.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl  -- if you're unable...to Speak." - Agent Elrond


signature.asc
Description: PGP signature


Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-07 Thread Matthew D. Fuller


On Sun, Jul 07, 2019 at 09:05:02AM -0400 I heard the voice of
Stefan Monnier, and lo! it spake thus:
> 
> From what I remember of the time I wrote and debugged OTP, these
> problems were invariably due to things like calling RaiseWindow
> instead of OtpRaiseWindow.

That would lead to assertion failures where it compares against the X
window stack.  These troubles are triggering a bit earlier, on the

assert(PRI(owl) >= priority);

e.g., the priority isn't monotonically increasing from the bottom->top
of the OWL.  This got trickier with the _FULLSCREEN stuff, where the
effective OTP of a window now depends on whether it has focus or not,
so it has to get reshuffled as a side effect of the focus change,
rather than an explicit raise/lower or OTP change.

And, in this case, DOUBLE tricky, because its transients have to move
with it; when a _FULLSCREEN window is focused, it transients are
jumped up with it, and when it's unfocused, they have to move down
too.  So any given window's OTP isn't even dependent solely on whether
_it_ has focus, we get to worry about its "family tree", too.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-07 Thread Stefan Monnier


> So, the crash isn't necessarily about where the window is in the X
> stacking order, it's about where it is in the OtpWinList chain.

>From what I remember of the time I wrote and debugged OTP, these
problems were invariably due to things like calling RaiseWindow instead
of OtpRaiseWindow.

IOW every single one of those bugs was fixed by hunting for the place
where ctwm changed the X stacking order without correspondingly updating
the OtpWinList chain.


Stefan



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-06 Thread Matthew D. Fuller


> See if you can work out how to reproduce it at will and try it
> against the new patch.

   (... and now that I carefully planned to add the following before
   I sent the mail, and then completely didn't...)

The MonitorLayout config var I just landed may be helpful in this; a
hardcoded variant was the only way I managed to track down my crash in
the first place.  By using it to turn a Xnest into two side-by-side
monitors, I could easily "fullscreen" an app, and have the other half
of the nest easily visible to focus into/out of.  Being able to flit
the focus around like that, to the desktop or to other visible
windows, makes it a lot easier to track down.

As did being able to do it in a controlled environment like that, not
by trying to crash the WM I'm actually having to _use_.  Not just in
not disrupting my work, but also having the debug output somewhere
easy to fiddle with after it bombs out.


(I also have this dream, that MonitorLayout can give us all the excuse
we need to take VirtualScreens out back of the woodshed and shoot it
in the head...)


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-06 Thread Matthew D. Fuller
On Mon, Jul 01, 2019 at 07:36:23PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> Intersting! My theory so far was that *during* the change of focus,
> the window depths might be inconsistent. But after
> OtpRestackWindow()ing the newly focused window, it should be
> finished, and all consistent and happy. That was consistent with my
> crash between the two calls to OtpRestackWindow(). My change was
> also based on that idea.

So, the crash isn't necessarily about where the window is in the X
stacking order, it's about where it is in the OtpWinList chain.  (I'm
going on the assumption here that your crash is the same situation as
mine, just triggered differently.)  And what causes the actual blowup
is that the transient has the new level's priority, but hasn't been
moved from its previous position in the OWL list, and so the
consistency check craps itself.

Of course, the crash isn't "between" the Restack calls, since the
Restack is what calls the consistency check that crashes; it's _in_
one or the other.  Mine is always in the second one, because my
trigger is on focusing _from_ a non-FS-focus window _into_ one that
is; my current theory is that your crash is in the first one, because
you're hitting it when focusing _away_ from one that is, into one
that's not.  In those cases, I never make the first call (since the
window I'm focusing away from isn't OtpIsFocusDependent()), and you'd
never make the second for the same reason, so there's only one place
either of us _could_ crash.

On that theory, my guess is that your patch could fix your case, since
you'd be crashing on a mis-placement of the old focus'd win's
transient in the list, it being lowered down to normal since the focus
has moved away.  And whether ->Focus is NULL or the new window won't
make any difference to that.


Now, I _can_ cause a crash on leaving the old.  I do that by having a
transient up of the focus/fullscreen, _and_ having another window
that's in a + OTP layer (e.g., an xterm in +2), then moving out of the
fullscreener.  Great.  And with my patch, that...  also crashes.  In a
new way.  Crap.

Sigh.  The whole way we look for the transients is wonky, because of
the assumptions it makes about where they already are.  So we just
have to expand it and search the whole thing.  So, new iteration: the
attached.  With this, I can't provoke a crash in my old or new tests
anymore.  (it still has some slightly subideal cases, but they aren't
correctness related, so I'm ignoring them for now).


> So far I haven't turned on debugging to print all restacking
> operations, but maybe it is time for that... and we might find out
> that everything is different...

It's plenty verbose   :p

I actually found it mostly less helpful than I'd hoped, due to details
of how the crashes happened and were provoked.  I wound up having to
add some of my own fprintf's to dump out the important details.  Now I
sorta wanna spent some time reworking how all that debug goes together
and outputs stuff.  Well, on the list...

See if you can work out how to reproduce it at will and try it against the
new patch.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.
=== modified file 'otp.c'
--- otp.c	2018-08-19 22:50:13 +
+++ otp.c	2019-07-07 04:23:24 +
@@ -1579,6 +1579,99 @@
 }
 
 
+
+/**
+ * Focus/unfocus backend.  This is used on windows whose stacking is
+ * focus-dependent (e.g., EWMH fullscreen), to move them and their
+ * transients around.  For these windows, getting/losing focus is
+ * practically the same as a f.setpriority, except it's on the calculated
+ * rather than the base parts.  And it's hard to re-use our existing
+ * functions to do it because we have to move Scr->Focus before the main
+ * window changes, but then it's too late to see where all the transients
+ * were.
+ *
+ * There are a number of unpleasant assumptions in here relating to where
+ * the transients are, and IWBNI we could be smarter and quicker about
+ * dealing with them.  But this gets us past the simple to cause
+ * assertion failures, anyway...
+ */
+static void
+OtpFocusWindowBE(TwmWindow *twm_win, int oldprio)
+{
+	OtpWinList *owl = twm_win->otp;
+
+	// This one comes off the list, and goes back in its new place.
+	RemoveOwl(owl);
+	InsertOwl(owl, Above);
+
+	// Now root around for any transients of it, and
+	// nudge them into the new location.  The whole Above/Below thing is
+	// kinda a heavy-handed guess, but...
+	//
+	// This is nearly a reimplementation of TryToMoveTransientsOfTo(),
+	// but the assumption that we can find the transients by starting
+	// from where the old priority was in the list turns out to be deeply
+	// broken.  So just walk the whole thing.  Which isn't ideal, but...
+	//
+	// XXX It should not be this freakin' hard to find a window's
+	// transients.  We should fix that more globally.

Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-07-01 Thread Rhialto
On Sun 30 Jun 2019 at 18:36:20 -0500, Matthew D. Fuller wrote:
> > The change (or lack thereof) in SetFocus() makes no difference at
> > all in either case for me.  Which seems a strange; a fairly small
> > amount of thinking on the matter _does_ suggest it's a thing that
> > should be needed, but somehow...  isn't?  I'll have to think more on
> > that.
> 
> FWIW, I figured this out; a little experimentation showed that my
> crash happened on the OtpRestack on the _new_ focused window, not the
> _old_ being-unfocused window.  So it was because of the transient that
> wasn't pre-raised to where it was expected.  So maybe if your crash
> was on the way down, you may also have crashed in that second (when it
> wasn't lowered right), in which case the Focus NULL'ing wouldn't make
> a different.  Or maybe you crashed in the first one somehow, in which
> case it would have?  Mmph.  Lot of moving parts, to keep in the head,
> trying to guess...

Intersting! My theory so far was that *during* the change of focus, the
window depths might be inconsistent. But after OtpRestackWindow()ing the
newly focused window, it should be finished, and all consistent and
happy. That was consistent with my crash between the two calls to
OtpRestackWindow(). My change was also based on that idea.

So far I haven't turned on debugging to print all restacking operations,
but maybe it is time for that... and we might find out that everything
is different...

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl  -- if you're unable...to Speak." - Agent Elrond


signature.asc
Description: PGP signature


Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-06-30 Thread Matthew D. Fuller


> The change (or lack thereof) in SetFocus() makes no difference at
> all in either case for me.  Which seems a strange; a fairly small
> amount of thinking on the matter _does_ suggest it's a thing that
> should be needed, but somehow...  isn't?  I'll have to think more on
> that.

FWIW, I figured this out; a little experimentation showed that my
crash happened on the OtpRestack on the _new_ focused window, not the
_old_ being-unfocused window.  So it was because of the transient that
wasn't pre-raised to where it was expected.  So maybe if your crash
was on the way down, you may also have crashed in that second (when it
wasn't lowered right), in which case the Focus NULL'ing wouldn't make
a different.  Or maybe you crashed in the first one somehow, in which
case it would have?  Mmph.  Lot of moving parts, to keep in the head,
trying to guess...


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-06-30 Thread Matthew D. Fuller
On Sat, Jun 29, 2019 at 03:02:30PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> 
> Looking at my test case (Xine), it has indeed a small window with
> playback controls, and it is also a transient of the main window
> (fullscreenable).

Ah, well, see?  If you just stuck with mpv or mplayer like me, you
wouldn't have a transient, and wouldn't have these problems   :p


> _NET_WM_WINDOW_TYPE(ATOM) = _NET_WM_WINDOW_TYPE_TOOLBAR
> 
> which sounds weird to be, but it definitely complicates window
> stacking!

Well, luckily, we don't know or care anything about TOOLBAR, so it
doesn't hurt anything itself.


> I didn't get a reliable way to get the crash. I just noticed it mostly
> seemed to occur if I was switching to another workspace while the
> main window fullscreened, or when switching away from fullscreen mode.
> I didn't take notice of any other windows. I guess I'll experiment a bit
> more.

Interesting.  In both cases, that sounds like you're finding the crash
when the window is on its way _down_, whereas my method hits it on the
way _up_.  Interesting.  Now, apart from the fullscreen thing, I don't
think I ever have any windows with OTP != 0; maybe if you've got
something at e.g. +2, that the FS window and its transient have to
pass in both directions...?


> My first impression, after reading your description, was that we have
> found different problems. But now that my Xine Panel window turns out to
> be transient as well, maybe it's the same one, or at least more related
> than I thought. It is a fairly small window. But when taking focus away
> from the main window, it also changes the calculated proper position of
> the panel window.

Yeah.  The trickiness apparently is that a window and its transients
really need to move around as a group.  I mean, conceptually they
don't, but with our OTP implementation, I think it's probably way more
troublesome if they don't.  And that winds up mostly being the case,
but there are a few edge cases where I think you can probably wind up
with them in different places (f.setpriority on a transient itself,
frex, or _ABOVE/_BELOW flags, etc).  All the OTP code dealing with
transients seems to assume they're always on the same effective level
as the main window, but that's not actually enforced anywhere.  I'll
bet you'll cause a _lot_ of trouble, if you arrange those
situations...

The focus handling has been calling OtpRestackWindows(), which calls
InsertOwl().  Which seemingly assumes that all of the window's
transients are on the top of the current OTP layer, copies in the
current base priority, and then...   raises all the small transients
above all the windows below the last one that loop found?  And then
puts this window right below them.  The logic is kinda odd, but
apparently that's purely handling TransientOnTop stuff (which is why
cranking TOT lets me hack around my crash).

But for our case, we want to act more like f.setpriority, in that we
need to assume all our transients are shifting to the new layer with
us.  I did a little noodling around the tree below
OtpRestackWindows(), but I couldn't come up with any good ideas that
didn't have obviously wacky side effects.  So I think we just need to
adjust what the focus handlers call.

I came up with the attached patch, which does fix my crash.  The major
side effect is that the window's transients all always wind up back on
top of it when it moves up/down.  That's not ideal, but it's better
than a crash, and putting them all below it doesn't sound any better.

Really, the whole interaction of the transients + fullscreen/focus is
wonky and needs some rethinking.  Actually, one reason I haven't
already added a StopDoingOTPCrapOnFullscreenFocused setting is 'cuz
I'd probably just set it and never think about the things that really
need fixing in it...


So, give the attached a whirl.  Especially if you can find a way to
trigger the crash at will, so we can actually know whether it's
addressing the matter.  If it succeeds, I'd like to go back to just
before the RLayout/RANDR changes, apply it there, and cut a 4.0.3 with
that and the few other little things that happened early enough; 'd be
good to get that fix out in a point release.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.
=== modified file 'otp.c'
--- otp.c	2018-08-19 22:50:13 +
+++ otp.c	2019-06-30 22:47:26 +
@@ -1579,6 +1579,104 @@
 }
 
 
+
+/**
+ * Focus/unfocus backend.  This is used on windows whose stacking is
+ * focus-dependent (e.g., EWMH fullscreen), to move them and their
+ * transients around.  For these windows, getting/losing focus is
+ * practically the same as a f.setpriority, except it's on the calculated
+ * rather than the base parts.  And it's hard to re-use our existing
+ * functions to do it because we have to move Scr->Focus before the main
+ * window changes, but 

Re: assertion error "PRI(owl) >= priority" otp.c line 248

2019-06-29 Thread Rhialto
On Wed 26 Jun 2019 at 21:56:07 -0500, Matthew D. Fuller wrote:
> So, the solution to _a_ problem (maybe not yours) is that we need to

Looking at my test case (Xine), it has indeed a small window with
playback controls, and it is also a transient of the main window
(fullscreenable).

WM_NAME(STRING) = "xine Panel"
WM_TRANSIENT_FOR(WINDOW): window id # 0x52002ff

With xprop(1) I noticed that it ALSO has

_NET_WM_WINDOW_TYPE(ATOM) = _NET_WM_WINDOW_TYPE_TOOLBAR

which sounds weird to be, but it definitely complicates window stacking!

I didn't get a reliable way to get the crash. I just noticed it mostly
seemed to occur if I was switching to another workspace while the
main window fullscreened, or when switching away from fullscreen mode.
I didn't take notice of any other windows. I guess I'll experiment a bit
more.

> reposition even the 'small' transients up to the top of the stack,
> letting the 'small' just determine whether they get jammed on top of
> the main window or not.  Hopefully I can experiment more in the next
> few days...

My first impression, after reading your description, was that we have
found different problems. But now that my Xine Panel window turns out to
be transient as well, maybe it's the same one, or at least more related
than I thought. It is a fairly small window. But when taking focus away
from the main window, it also changes the calculated proper position of
the panel window.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl  -- if you're unable...to Speak." - Agent Elrond


signature.asc
Description: PGP signature