Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
Warren Young: > You can open a file called NUL on Windows to get the same effect as > /dev/null on POSIX type platforms. A very interesting topic! NULL (void*)0, or nullptr (was used in a confusing manner, in my post) \Device\Null Windows NT (object manager) equivalent to /dev/null \??\NUL Windows NT global symlink to \Device\Null Not accepted by Win32 APIs \\.\NUL \\?\NUL Local Device and Root Local Device forms of \??\NUL Works with Win32 APIs (\\.\ and \\?\ both escape to \DosDevices, the latter without any path name preprocessing, and on current versions of Windows NT, \DosDevices is a is a symlink to \??, which is a virtual directory containing both objects from the \GLOBAL?? and from the user session directories) NUL Win32 API magic file name mapped to \\.\NUL \\?\C:\dev\NUL Path to file C:\dev\NUL (must be specified as an absolute path) Works with Win32 APIs (\\?\ escapes to \DosDevices, which links to \??, where the symlink named C: is found, pointing to \Device\HarddiskVolumeN, and finally resolving to \Device\HarddiskVolumeN\dev\NUL) > That’s no concern here, since the Fossil code will likely just say > "NUL". Yes, I think the Win32 variant should be: (a) g.httpOut = fossil_fopen("NUL", "wb"); instead of: (b) g.httpOut = fossil_fopen("/dev/null", "wb"); Code snippet (a) returns a valid (i.e. non-zero) FILE*, on Windows. If the root drive of the fossil.exe current (working) directory has a directory named "dev", say "C:\dev", code snippet (b) creates the file "C:\dev\null", on Windows. Just tested with fossil_open(). --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On Jul 22, 2018, at 4:21 AM, Florian Balmer wrote: > > writing to a FILE* pointing to NULL may have > exactly the same effect as writing to a FILE* pointing to "/dev/null”? You can open a file called NUL on Windows to get the same effect as /dev/null on POSIX type platforms. Unlike on POSIX, NUL is just a special-cased file name, not the name of an actual device, so it’s possible to get around it with NTFS file path trickery: https://superuser.com/questions/282194/ That’s no concern here, since the Fossil code will likely just say "NUL". ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On 7/22/18, Florian Balmer wrote: > > Just a thought: would it make sense to have --nodelay as a global > option recognized by all commands, and maybe also through an > environment variable, and a CGI control option? So that any backoffice > processing could be temporarily disabled? Everything is in flux right now. Let's iron out these details once I get everything actually working. There is a lot more to be done before we reach that point. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
Richard Hipp: > Please test the latest trunk version and let me know if you are still > seeing issues. Thank you very much for the fix! I see no more delays with "fossil ui" on Windows. Just a thought: would it make sense to have --nodelay as a global option recognized by all commands, and maybe also through an environment variable, and a CGI control option? So that any backoffice processing could be temporarily disabled? This would probably require an explicit "backoffice" command, which could then also be set up to run as a task or cron job every few hours. This may be handy for fatal incidents that require "repository surgery", or restoring from backups, where you don't want any notifications to be sent before things are fixed. But maybe this is overengineering. --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On 7/21/18, Florian Balmer wrote: > > I see random delays for the `fossil ui' command, Please test the latest trunk version and let me know if you are still seeing issues. Anybody who has the capability, please also verify that "fossil server --scgi" is now working again. I do not have a SCGI webserver set up on windows with which to test that command so I implemented the fixes there without actually checking them. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
Richard Hipp: > Hence, if the subprocess gets involves in backoffice work, that can > delay the return of content to the user. > The only comes up on Windows. I do not yet have a good work-around. I see, thank you very much for the detailed explanations. I only have the obvious (inefficient?) ideas, like using a command-line switch to direct the "HTTP renderer" to offload backoffice processing to another separate process, and exit immediately once the HTTP response is ready. Looking forward to see what will be your solution. I can always learn a lot of interesting things from the Fossil source code. --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On 7/22/18, Florian Balmer wrote: > > The Windows HTTP server (and thus `fossil ui') works by spawning > sub-processes for each individual HTTP request, right? This results in > multiple sub-processes per web page (one for the HTML page, one for > CSS, another one for JavaScript, etc.)? > Every Fossil webpage is generated by a separate subprocess. This is true for Windows, Mac, Linux, BSD, whatever. And it is true regardless of how Fossil is launched. The backoffice is a facility to do background processing - things like sending email notifications, syncing with peers, maybe doing aggressive compaction of the repository - anything that is not directly interacting with the user. Backoffice runs after a webpage is generated, in the same subprocess. Once the entire webpage has been generated, Fossil closes the socket or file into which the webpage is being written, and then attempts to do backoffice processing. There can only be one backoffice process at a time, and another backoffice that is "on deck" - ready to run when the current one finishes. In most systems, the webserver returns results back to the user as soon as the output socket closes. So the user does not realize that the process that generated the webpage continues running for up to 60 seconds of backoffice work. But on Windows for "fossil ui" and "fossil server", the webpage is not returned to the user until the subprocess actually terminates. Hence, if the subprocess gets involves in backoffice work, that can delay the return of content to the user. The only comes up on Windows. I do not yet have a good work-around. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
> ... writing to a FILE* pointing to NULL may have exactly the same > effect as writing to a FILE* pointing to "/dev/null"? Well, unless there's a directory named "dev" in the root of the current drive, which would result in creation (due to the passed "wb" mode) of a file named "null", on Windows. --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
Warren Young: > That’s brand new code, less than a week old. It’s not surprising it’s > not rock-solid yet. Yes, I agree, I no not expect complex new features to be flawless from the beginning. But I've stumbled into the issue while testing a fix from the current tip, and decided to report it early. > I expect drh is now on the case. :) The Windows HTTP server (and thus `fossil ui') works by spawning sub-processes for each individual HTTP request, right? This results in multiple sub-processes per web page (one for the HTML page, one for CSS, another one for JavaScript, etc.)? Maybe that these sub-processes concurrently try to "get the backoffice lease" after successfully serving their HTTP request, resulting in a race, and some of the sub-processes sleeping and waiting for 60 seconds for the lease to expire? I noticed the following commit: [a32a92d2] At the end of CGI processing, close the output pipe before starting backoffice processing, in order to let higher levels know that the CGI is finished. http://fossil-scm.org/index.html/info/a32a92d2 Maybe this fixed the same (or a similar) problem with the HTTP server on *nix, so the web browser quickly receives all the HTTP output, before the HTTP serving process is turning to backoffice work? But on Windows, g.httpOut seems to be a temporary file, and not a pipe, and the `fossil ui' main process may not be waiting for the file to become signaled when (any) I/O completes (which is a discouraged technique anyway), but for the sub-process that generated the file to exit? And this one sub-process sleeping is causing the delay in the `fossil ui' main process? Sorry for my prosaic speculation and no hard facts, but I find it all very interesting! Another minor point: The following line of source code from the aforementioned commit: g.httpOut = fossil_fopen("/dev/null", "wb"); always seems to result in g.httpOut==NULL on Windows, as "/dev/null" is not a valid Windows device. This probably doesn't matter from the logic of the code, as writing to a FILE* pointing to NULL may have exactly the same effect as writing to a FILE* pointing to "/dev/null"? --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On Jul 22, 2018, at 1:18 AM, Florian Balmer wrote: > > [5544931c] src/backoffice.c:240 That’s brand new code, less than a week old. It’s not surprising it’s not rock-solid yet. I expect drh is now on the case. :) ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
Thank very much for your feedback! Warren Young: > Which is it? With the random appearance of the delay, and the long duration of 60 seconds, I eventually gave up on "bisecting". But with a more systematic approach, I've now got: Last good: [06507038] First bad: [947081aa] > If the problem first appeared with [76800769] ... Backing out [76800769] (and the dependent [d216ea9a]) does not fix the delay. > Also, does this behavior happen on only one machine? I can also see it on a Windows 7 (32-bit) VM. Here's what I've found out in the meantime: Waiting for 60 seconds for the delay to time out, until the default /timeline web page is loaded in the browser, instead of terminating the server with Ctrl+C, will cause another 60 seconds delay for any links clicked in the web ui. Reloading the delayed /timeline web page with F5, before the 60 seconds have elapsed, gives a speedy refresh, with no delay, so multiple requests within the 60 seconds period are not delayed. Waiting for 60 seconds before running `fossil ui' after the last server was terminated, causes the delay to always happen. The observed delay of 60 seconds seems somehow related to: [5544931c] src/backoffice.c:59 #define BKOFCE_LEASE_TIME 60 /* Length of lease validity */ On Windows, the DbgView utility is useful to trace messages from OutputDebugStringA(), handy if there's no attached console, and easier than attaching WinDbg to spawned sub-processes: https://live.sysinternals.com/Dbgview.exe By adding calls to OutputDebugStringA() before and after the following line of source code, I found that one of the fossil.exe processes seems to be stuck here during the delay: [5544931c] src/backoffice.c:240 sqlite3_sleep(1000*(x.tmCurrent - tmNow + 1)); This code path is only entered if the delay becomes manifest. I don't understand all the details of the backoffice processing code, yet, in particular not the condition to trigger the sleep. But the sleep seems to block the web ui, somehow? --Florian ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On Jul 21, 2018, at 9:43 PM, Warren Young wrote: > > Try backing that checkin out to see if the prior problem (dangling journal > files) reappears, while the delay also goes away: Also, does this behavior happen on only one machine? If it happens on many different Windows boxes within your reach, that has different implications than if it only happens on the one you happen to be using most often. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Delay with `fossil ui' (related to backoffice processing?)
On Jul 21, 2018, at 11:48 AM, Florian Balmer wrote: > > I've noticed something with the current tip version of Fossil… > I tried to find a "last good" version by means of "bisecting", but…the delay > never appears with Fossil 2.6. I can see only two ways for those two sentences to be true at once, but it’s unclear which is the case from the way you phrased it: 1. The tip of trunk and *only* the tip of trunk exhibits the problem behavior. 2. Every checkin after 2.6 exhibits it. Which is it? If the problem first appeared with [76800769], the fix drh checked in to clean up the dangling journal files you were reporting, then I wonder if the delay is waiting out some SQLite lock. Windows’ native locking behavior is much more restrictive than the default on pretty much every other platform, so I wonder if the new shutdown code path created a temporary deadlock that gets broken by a SQLite timeout. Try backing that checkin out to see if the prior problem (dangling journal files) reappears, while the delay also goes away: fossil up trunk # back out from the bisect fossil revert # just to be sure fossil merge --backout 76800769 Then build and test. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users