[PATCH] A working implementation of fork_to_background() under Windows – please test

David Fritz Fri, 19 Mar 2004 17:39:26 -0800

Attached is an implementation of fork_to_background() for Windows that (I hope) has the desired effect under both 9x and NT.

_This is a preliminary patch and needs to be tested._

The patch depends on the fact that fork_to_background() is only ever called at start-up, when -b is specified.

Windows of course does not support the fork() call, so it must be simulated. This can be done by creating a new process and using some form of inter-process communication to transfer the state of the old process to the new one. This requires the parent and child to cooperate and when done in a general way (such as by Cygwin) requires a lot of work.

However, since with Wget we know a priori what could have changed in the parent by the time we call fork(), we could implement a special-purpose fork() that passes to the child only the things that we know could have changed. (The initialization done by the C run-time library, etc. would be performed anew in the child, but hold on a minute.)

The only real work done by Wget before calling fork() is the reading of wgetrc files and the processing of command-line arguments. Passing this information directly to the child would be possible, but the implementation would be complex and fragile. It would need to be updated as changes are made to the main code.

It would be much simpler to perform the initialization (reading of config files, processing of args, etc.) again in the child. This would have a small performance impact and introduce some race conditions, but I think the advantages (having -b work) outweigh the disadvantages.

The implementation is, I hope, fairly straightforward. I have attempted to explain it in moderate detail in an attached README.

I'm hoping others can test it with various operating systems and compilers. Also, any feedback regarding the design or implementation would be welcome. Do you feel this is the right way to go about this?

Cheers,
David Fritz

2004-03-19 David Fritz <[EMAIL PROTECTED]>

        * mswindows.c (make_section_name, fake_fork, fake_fork_child): New
        functions.
        (fork_to_background): Replace with new implementation.

Index: src/mswindows.c
===================================================================
RCS file: /pack/anoncvs/wget/src/mswindows.c,v
retrieving revision 1.29
diff -u -r1.29 mswindows.c
--- src/mswindows.c     2004/03/19 23:54:27     1.29
+++ src/mswindows.c     2004/03/20 01:34:15
@@ -131,10 +131,240 @@
   FreeConsole ();
 }
 
+/* Construct the name for a named section (a.k.a `file mapping') object.
+   The returned string is dynamically allocated and needs to be xfree()'d.  */
+static char *
+make_section_name (DWORD pid)
+{
+    return aprintf("gnu_wget_fake_fork_%lu", pid);
+}
+
+/* This structure is used to hold all the data that is exchanged between
+   parent and child.  */
+struct fake_fork_info
+{
+  HANDLE event;
+  int changedp;
+  char lfilename[MAX_PATH + 1];
+};
+
+/* Determines if we are the child and if so performs the child logic.
+   Return values:
+     < 0  error
+     0    parent
+     > 0  child
+*/
+static int
+fake_fork_child (void)
+{
+  HANDLE section, event;
+  struct fake_fork_info *info;
+  char *name;
+  DWORD le;
+
+  name = make_section_name (GetCurrentProcessId ());
+  section = OpenFileMapping (FILE_MAP_WRITE, FALSE, name);
+  le = GetLastError ();
+  xfree (name);
+  if (!section)
+    {
+      if (le == ERROR_FILE_NOT_FOUND)
+        return 0;   /* Section object does not exist; we are the parent.  */
+      else
+        return -1;
+    }
+
+  info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0);
+  if (!info)
+    {
+      CloseHandle (section);
+      return -1;
+    }
+
+  event = info->event;
+
+  if (!opt.lfilename)
+    {
+      opt.lfilename = unique_name (DEFAULT_LOGFILE, 0);
+      info->changedp = 1;
+      strncpy (info->lfilename, opt.lfilename, sizeof (info->lfilename));
+      info->lfilename[sizeof (info->lfilename) - 1] = '\0';
+    }
+  else
+    info->changedp = 0;
+
+  UnmapViewOfFile (info);
+  CloseHandle (section);
+
+  /* Inform the parent that we've done our part.  */
+  if (!SetEvent (event))
+      return -1;
+
+  CloseHandle (event);
+  return 1;                     /* We are the child.  */
+}
+
+
+static void
+fake_fork (void)
+{
+  char *cmdline, *args;
+  char exe[MAX_PATH + 1];
+  DWORD exe_len, le;
+  SECURITY_ATTRIBUTES sa;
+  HANDLE section, event, h[2];
+  STARTUPINFO si;
+  PROCESS_INFORMATION pi;
+  struct fake_fork_info *info;
+  char *name;
+  BOOL rv;
+
+  event = section = pi.hProcess = pi.hThread = NULL;
+
+  /* Get command line arguments to pass to the child process.
+     We need to skip the name of the command (what amounts to argv[0]).  */
+  cmdline = GetCommandLine ();
+  if (*cmdline == '"')
+    {
+      args = strchr (cmdline + 1, '"');
+      if (args)
+        ++args;
+    }
+  else
+    args = strchr (cmdline, ' ');
+
+  /* It's ok if args is NULL, that would mean there were no arguments
+     after the command name.  As it is now though, we would never get here
+     if that were true.  */
+
+  /* Get the fully qualified name of our executable.  This is more reliable
+     than using argv[0].  */
+  exe_len = GetModuleFileName (GetModuleHandle (NULL), exe, sizeof (exe));
+  if (!exe_len || (exe_len >= sizeof (exe)))
+      return;
+
+  sa.nLength = sizeof (sa);
+  sa.lpSecurityDescriptor = NULL;
+  sa.bInheritHandle = TRUE;
+
+  /* Create an anonymous inheritable event object that starts out
+     non-signaled.  */
+  event = CreateEvent (&sa, FALSE, FALSE, NULL);
+  if (!event)
+      return;
+
+  /* Create the child process detached from the current console and in a
+     suspended state.  */
+  memset (&si, 0, sizeof (si));
+  si.cb = sizeof (si);
+  rv = CreateProcess (exe, args, NULL, NULL, TRUE, CREATE_SUSPENDED |
+                      DETACHED_PROCESS, NULL, NULL, &si, &pi);
+  if (!rv)
+    goto cleanup;
+
+  /* Create a named section object with a name based on the process id of
+     the child.  */
+  name = make_section_name (pi.dwProcessId);
+  section =
+      CreateFileMapping (INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0,
+                         sizeof (struct fake_fork_info), name);
+  le = GetLastError();
+  xfree (name);
+  /* Fail if the section object already exists (should not happen).  */
+  if (!section || (le == ERROR_ALREADY_EXISTS))
+    {
+      rv = FALSE;
+      goto cleanup;
+    }
+
+  /* Copy the event handle into the section object.  */
+  info = MapViewOfFile (section, FILE_MAP_WRITE, 0, 0, 0);
+  if (!info)
+    {
+      rv = FALSE;
+      goto cleanup;
+    }
+
+  info->event = event;
+
+  UnmapViewOfFile (info);
+
+  /* Start the child process.  ResumeThread() returns (DWORD) -1 on failure.  */
+  rv = ResumeThread (pi.hThread) != (DWORD) -1;
+  if (!rv)
+    {
+      TerminateProcess (pi.hProcess, (DWORD) -1);
+      goto cleanup;
+    }
+
+  /* Wait for the child to signal to us that it has done its part.  If it
+     terminates before signaling us it's an error.  */
+
+  h[0] = event;
+  h[1] = pi.hProcess;
+  rv = WAIT_OBJECT_0 == WaitForMultipleObjects (2, h, FALSE, 5 * 60 * 1000);
+  if (!rv)
+    goto cleanup;
+
+  info = MapViewOfFile (section, FILE_MAP_READ, 0, 0, 0);
+  if (!info)
+    {
+      rv = FALSE;
+      goto cleanup;
+    }
+
+  /* Ensure string is properly terminated.  */
+  if (info->changedp &&
+      !memchr (info->lfilename, '\0', sizeof (info->lfilename)))
+    {
+      rv = FALSE;
+      goto cleanup;
+    }
+
+  printf (_("Continuing in background, pid %lu.\n"), pi.dwProcessId);
+  if (info->changedp)
+    printf (_("Output will be written to `%s'.\n"), info->lfilename);
+
+  UnmapViewOfFile (info);
+
+cleanup:
+
+  if (event)
+    CloseHandle (event);
+  if (section)
+    CloseHandle (section);
+  if (pi.hThread)
+    CloseHandle (pi.hThread);
+  if (pi.hProcess)
+    CloseHandle (pi.hProcess);
+
+  /* We're the parent.  If all is well, terminate.  */
+  if (rv)
+    exit (0);
+
+  /* We failed, return.  */
+}
+
 void
 fork_to_background (void)
 {
-  ws_hangup ("fork");
+  int rv;
+
+  rv = fake_fork_child ();
+  if (rv < 0)
+    {
+      fprintf (stderr, "fake_fork_child() failed\n");
+      abort ();
+    }
+  else if (rv == 0)
+    {
+      /* We're the parent.  */
+      fake_fork ();
+      /* If fake_fork() returns, it failed.  */
+      fprintf (stderr, "fake_fork() failed\n");
+      abort ();
+    }
+  /* If we get here, we're the child.  */
 }
 
 static BOOL WINAPI

The implementation of fork_to_background() on Windows
-----------------------------------------------------

This is an attempt to describe the implementation of fork_to_background()
under Windows.

First, there is one very important assumption that this code makes:
The *only* time fork_to_background() is called is in response to
the -b|--background option at start-up.  If this assumption were ever
invalidated, the code would need to be changed, probably by naming the
function something other than fork_to_background() and special-casing
-b for Windows.

This code fakes fork() by invoking another copy of Wget (hereafter
referred to as the child) with the exact same arguments with which the
first copy (hereafter, the parent) was invoked.

The real (Unix) fork() creates a duplicate of the calling process and the
return value indicates to each process whether it is the parent or the
child.  Instead of copying the parent process, this code invokes another
copy of Wget that will (should) perform the exact same initialization
sequence.  So by the time it gets to our code, we have two processes that
are essentially identical.  This method has a slight performance penalty
as the initialization must be performed twice.  There's a memory overhead
as well, as this is almost definitely not as efficient as a real fork().
It also introduces some race conditions; for instance, the wgetrc file(s)
might have changed in the split second between the invocations.  But I
think the likelihood of these issues causing any real problem is small.

Now all we really need to do is distinguish between the parent and the
child.  There are many ways of doing this.  We could use an environment
variable or a special command line option passed to the child.  But either
of those solutions could bump up against the limitations imposed by
Windows (especially 9x) in edge cases.  Using a command line option would
also require making changes to the main code and so is doubly undesirable.

The method used by this implementation is to check for the existence
or non-existence of a named kernel object.  The object name includes
the process id of the child; this allows multiple instances of Wget to
avoid stepping on each other.

The type of object used is a section object backed by the page file,
which is Windows' way of implementing shared memory.  (Under NT,
these are natively referred to as `section' objects but the Windows API
documentation mostly refers to them as `file-mapping' objects; for the
remainder of this document I will use the NT nomenclature and refer to
them as section objects.)  The section object is then used to exchange
information between the parent and child.  This isn't really necessary;
the only information currently exchanged is a flag indicating whether
the child generated a new log file name and if so, the generated name.
The parent will then print the process id of the child and the new log file
name (if there is one) to the console before terminating.  We could
simply not print the name of the log file or we could guess the filename
the child would choose (and introduce another race condition).  Doing it
this way avoids a race condition and gives -b the same look-and-feel as
on Unix.  I think it's worth the modicum of added complexity.  (We could
also generate the log filename in the parent and pass it to the child,
but this wouldn't reduce the complexity any.)  This also provides a means
of passing information between the parent and child that could be easily
extended in the future.
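
As a rough, portable illustration of the exchange described above, the
sketch below models only the log-file part of the shared structure (the
event handle is left out).  The names log_info, publish_log_name,
read_log_name and SKETCH_MAX_PATH are made up for this example and are
not part of the patch; the real code works on a mapped view of the
section object rather than on an ordinary variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PATH 260   /* stand-in for MAX_PATH; an assumption */

/* Hypothetical stand-in for the shared structure: a flag plus a
   fixed-size buffer, so it can live in shared memory without pointers.  */
struct log_info
{
  int changedp;
  char lfilename[SKETCH_MAX_PATH + 1];
};

/* Child side: record the generated log file name, or record that none
   was generated when NAME is NULL.  The copy is always NUL-terminated.  */
void
publish_log_name (struct log_info *info, const char *name)
{
  assert (info != NULL);
  if (name)
    {
      info->changedp = 1;
      strncpy (info->lfilename, name, sizeof (info->lfilename));
      info->lfilename[sizeof (info->lfilename) - 1] = '\0';
    }
  else
    info->changedp = 0;
}

/* Parent side: return the name to print, or NULL if there is nothing
   to print or the buffer lacks a terminating NUL.  */
const char *
read_log_name (const struct log_info *info)
{
  assert (info != NULL);
  if (!info->changedp)
    return NULL;
  if (!memchr (info->lfilename, '\0', sizeof (info->lfilename)))
    return NULL;
  return info->lfilename;
}
```

The parent re-checks the terminator because it should not blindly trust
what another process wrote into shared memory.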

The code starts by checking to see if we are the child process.  It does
this by attempting to open a named section object, constructing the object
name using the current process id.  For example, if we are process 80,
we try to open a section object named `gnu_wget_fake_fork_80'.  If it
exists then we are the child process.  If not then we are the parent.
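
The naming scheme can be sketched in portable C as follows.  The patch
itself uses Wget's aprintf() and a DWORD; here format_section_name is a
hypothetical helper that writes into a caller-supplied buffer, and
`unsigned long' stands in for DWORD.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the per-child object name into BUF.  Returns 0 on success,
   -1 if BUF is too small to hold the whole name.  */
int
format_section_name (char *buf, size_t size, unsigned long pid)
{
  int n;

  assert (buf != NULL && size > 0);
  n = snprintf (buf, size, "gnu_wget_fake_fork_%lu", pid);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Both sides derive the same name independently: the parent from the
process id returned when it creates the child, the child from its own
process id.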

Let's say we're the parent.  What we need to do is create a process
that is another copy of Wget invoked with the same arguments we were
invoked with.  To do this we use GetCommandLine() to get the unprocessed
command line we were invoked with and GetModuleFileName() to get the
fully qualified name of our executable.
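
Since the executable name is obtained separately, the command name at
the front of the raw command line has to be skipped.  A minimal sketch
of that step, mirroring the logic in the patch but as a standalone
function (skip_program_name is a name invented for this example):

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to whatever follows the program name in CMDLINE.
   A quoted program name ends at its closing quote; an unquoted one
   ends at the first space.  Returns NULL if an unquoted name is not
   followed by a space or a quoted name has no closing quote.  The
   result, when non-NULL, still includes any leading space.  */
const char *
skip_program_name (const char *cmdline)
{
  const char *args;

  assert (cmdline != NULL);
  if (*cmdline == '"')
    {
      args = strchr (cmdline + 1, '"');
      if (args)
        ++args;                 /* step past the closing quote */
    }
  else
    args = strchr (cmdline, ' ');
  return args;
}
```

For example, given `"C:\Program Files\wget.exe" -b URL' the function
returns a pointer to ` -b URL'.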

We will need a means of synchronizing the interaction of parent and child.
For this we create an anonymous event object with an inheritable handle.

Since the point is for the child to run in the background, we create the
child detached from the current console.  We also create the child in
a suspended state to allow us to create the named section object before
the child starts executing.

Once we have created the child process, we know its process id and can
use it to construct the name of the section object.  We then create a
section object of appropriate size to hold all the information we intend
to exchange between parent and child.  For convenience, we put all such
information in a single data structure.

We then map the section object into our address space and write our event
handle into it.  Because the event handle is inheritable and we created it
before creating the child process, it will be valid in the child process.

We then start the child executing and wait for it to signal the event.
We also want to resume right away if the child terminates unexpectedly
before signaling the event; such a case would be an error condition.
We accomplish this by using WaitForMultipleObjects().
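
When waiting for any one of several handles, WaitForMultipleObjects()
returns WAIT_OBJECT_0 plus the array index of the handle that ended the
wait.  The patch only tests for WAIT_OBJECT_0 (the event) and treats
everything else as failure.  The sketch below spells the cases out;
signaled_index is a hypothetical helper, and the WAIT_* values are
repeated from the Windows SDK headers only so that the sketch also
compiles on other systems.

```c
#include <assert.h>

#ifdef _WIN32
# include <windows.h>
#else
/* Values as defined by the Windows SDK, repeated for portability.  */
# define WAIT_OBJECT_0 0x00000000UL
# define WAIT_TIMEOUT  0x00000102UL
# define WAIT_FAILED   0xFFFFFFFFUL
#endif

/* Map the result of a wait-for-any on COUNT handles to the index of
   the handle that ended the wait, or -1 for a timeout or failure.  */
int
signaled_index (unsigned long result, unsigned long count)
{
  assert (count > 0);
  if (result - WAIT_OBJECT_0 < count)
    return (int) (result - WAIT_OBJECT_0);
  return -1;
}
```

With the handle array ordered as { event, child process }, index 0
means the child signaled that it is ready, index 1 means the child
terminated first, and -1 covers the timeout and failure cases.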

While the parent is waiting, the child starts running.  The child
attempts to open a section object with a name based on its process id.
This should succeed and the child now knows that it is the child and has
a handle to the section object created by the parent.  The child maps
the section object into its address space, retrieves the event handle
from it and writes the log file name info to it.  It then signals the
event and returns so it can continue executing in the background.

Back in the parent, execution resumes when the event becomes signaled.
The parent now prints the process id and (if necessary) the name of
the log file (retrieved from the section object) to the console and
terminates.  Mission accomplished.
