D8023: chg: read CHGORIG_ values from env and handle these appropriately

2020-02-10 Thread spectral (Kyle Lippincott)
spectral added a comment.
spectral abandoned this revision.


  We decided to use a different method to resolve this issue.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8023/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8023

To: spectral, #hg-reviewers
Cc: mjpieters, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D8023: chg: read CHGORIG_ values from env and handle these appropriately

2020-02-04 Thread spectral (Kyle Lippincott)
spectral updated this revision to Diff 19875.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D8023?vs=19652=19875

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8023/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8023

AFFECTED FILES
  mercurial/chgserver.py
  tests/test-chg.t

CHANGE DETAILS

diff --git a/tests/test-chg.t b/tests/test-chg.t
--- a/tests/test-chg.t
+++ b/tests/test-chg.t
@@ -332,8 +332,7 @@
   /MM/DD HH:MM:SS (PID)> log -R cached
   /MM/DD HH:MM:SS (PID)> loaded repo into cache: $TESTTMP/cached (in  ...s)
 
-Test that chg works even when python "coerces" the locale (py3.7+, which is 
done
-by default if none of LC_ALL, LC_CTYPE, or LANG are set in the environment)
+Test that chg works even when python "coerces" the locale (py3.7+)
 
   $ cat > $TESTTMP/debugenv.py < from mercurial import encoding
@@ -345,11 +344,31 @@
   > for k in [b'LC_ALL', b'LC_CTYPE', b'LANG']:
   > v = encoding.environ.get(k)
   > if v is not None:
-  > ui.write(b'%s=%s\n' % (k, encoding.environ[k]))
+  > ui.write(b'debugenv: %s=%s\n' % (k, encoding.environ[k]))
   > EOF
-  $ LANG= LC_ALL= LC_CTYPE= chg \
-  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv
-  LC_ALL=
-  LC_CTYPE=C.UTF-8 (py37 !)
-  LC_CTYPE= (no-py37 !)
-  LANG=
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE= chg \
+  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >| egrep 'debugenv|start'
+  chg: debug: * start cmdserver at * (glob)
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE= (no-py37 !)
+  debugenv: LANG=
+(Should not trigger a command server restart)
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE= chg \
+  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >| egrep 'debugenv|start'
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE= (no-py37 !)
+  debugenv: LANG=
+(Should trigger a command server restart since the LC_CTYPE=C value should be
+used, not the coerced one)
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE=C chg \
+  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >| egrep 'debugenv|start'
+  chg: debug: * start cmdserver at * (glob)
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE=C (no-py37 !)
+  debugenv: LANG=
diff --git a/mercurial/chgserver.py b/mercurial/chgserver.py
--- a/mercurial/chgserver.py
+++ b/mercurial/chgserver.py
@@ -129,12 +129,37 @@
 ignored = {b'HG'}
 else:
 ignored = set()
-envitems = [
-(k, v)
+envitems = {
+k: v
 for k, v in pycompat.iteritems(encoding.environ)
 if _envre.match(k) and k not in ignored
-]
-envhash = _hashlist(sorted(envitems))
+}
+# Py3 may modify environment variables before `hg serve` starts. When chg
+# connects to us, it passes in its environment (without the modifications),
+# and we'll generate a different config hash.
+#
+# To handle this situation, chg will pass the variables through using
+# CHGORIG_=<0 if not in env, 1 if in env>, and we use
+# *that* value only when calculating the hash. Examples:
+#   CHGORIG_LC_CTYPE=0   <-- not in chg's env, delete for hash
+#   CHGORIG_LC_CTYPE=1   <-- in chg's env, but empty. Set to empty for 
hash.
+#   CHGORIG_LC_CTYPE=1UTF-8  <-- set to UTF-8 for hash.
+#
+# This does NOT cause us to lose Py3's modifications to these variables (we
+# aren't copying this into our environment), so chg and hg should continue
+# to behave the same - both will operate with the modifications Py3 made.
+for k, v in pycompat.iteritems(encoding.environ):
+if k.startswith(b'CHGORIG_'):
+origname = k[len(b'CHGORIG_') :]
+if not _envre.match(origname) or origname in ignored:
+continue
+
+if v.startswith(b'0') and origname in envitems:
+del envitems[origname]
+else:
+envitems[origname] = v[1:]
+
+envhash = _hashlist(sorted(pycompat.iteritems(envitems)))
 return sectionhash[:6] + envhash[:6]
 
 
@@ -550,39 +575,18 @@
 raise ValueError(b'unexpected value in setenv request')
 self.ui.log(b'chgserver', b'setenv: %r\n', sorted(newenv.keys()))
 
-# Python3 has some logic to "coerce" the C locale to a UTF-8 capable
-# one, and it sets LC_CTYPE in the environment to C.UTF-8 if none of
-# 'LC_CTYPE', 'LC_ALL' or 'LANG' are set (to any value). This can be
-# disabled with PYTHONCOERCECLOCALE=0 in the environment.
-#
-# When fromui is called via _inithashstate, python has already set
-# this, so that's in the environment right when we start up the hg
-# process. Then chg will call us and tell us to set the environment to
-# the one it has; this might 

D8023: chg: read CHGORIG_ values from env and handle these appropriately

2020-01-27 Thread spectral (Kyle Lippincott)
spectral created this revision.
Herald added subscribers: mercurial-devel, mjpieters.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Python3.7+ will "coerce" the locale (specifically LC_CTYPE) in many 
situations,
  and this can cause chg to not start. My previous fix in D7550 
 did not cover all
  situations correctly, but hopefully this fix does.
  
  The C side of chg will set CHGORIG_LC_CTYPE in its environment before starting
  the command server and before calling setenv on the command server.
  
  When calculating the environment hash, we use the value from CHGORIG_LC_CTYPE 
to
  calculate the hash - intentionally ignoring the modifications that Python may
  have done during command server startup.
  
  When chg calls setenv on the command server, the command server will see
  CHGORIG_LC_CTYPE in the environment-to-set, and NOT modify LC_CTYPE to be the
  same as in the environment-to-set. This preserves the modifications that 
Python
  has done during startup. We'll still calculate the hash using the
  CHGORIG_LC_CTYPE variables appropriately, so we'll detect environment changes
  (even if they don't cause a change in the actual value). Example:
  
  - LC_CTYPE=invalid_1 chg cmd
- Py3.7 sets LC_CTYPE=C.UTF-8 on Linux
- CHGORIG_LC_CTYPE=1invalid_1
- Environment hash is as-if 'LC_CTYPE=invalid_1', even though it really is 
LC_CTYPE=C.UTF-8
  - LC_CTYPE=invalid_2 chg cmd
- Connect to the existing server, call setenv
- Calculate hash as-if 'LC_CTYPE=invalid_2', even though it is identical to 
the other command server (C.UTF-8)
  
  This isn't a huge issue in practice. It can cause two separate command servers
  that are functionally identical to be executed. This should not be considered 
an
  observable/intentional effect, and is something that may change in the future.
  
  This is hopefully a more future-proof fix than the original one in D7550 
: we
  won't have to worry about behavior changes (or incorrect readings of the 
current
  behavior) in future versions of Python. If more environment variables end up
  being modified, it's a simple one line fix in chg.c to also preserve those.
  
  Important Caveat: if something causes one of these variables to change 
*inside*
  the hg serve process, we're going to end up persisting that value. Example:
  
  - Command server starts up, Python sets LC_CTYPE=C.UTF-8
  - Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  - The next invocation of chg will call setenv, saying via CHGORIG_LC_CTYPE 
that the variable should not be in the environment
  - chgserver.py will preserve LC_CTYPE=en_US.UTF-8
  
  This is quite unlikely and would previously have caused a different problem:
  
  - Command server starts up, let's assume py2 and so LC_CTYPE is unmodified
  - Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  - The next invocation of chg will call setenv, saying LC_CTYPE shouldn't be 
in the environment
  - chgserver.py will say that the environment hash doesn't match and redirect 
chg to a new server
  - chg will create that server and use that, but it'll have an identical hash 
to the previous one (since at startup LC_CTYPE isn't modified by the extension 
yet). This should be fine, it'll then run the command like normal.
  - Every time chg is run, it restarts the command server due to this issue, 
slowing everything down :)

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8023

AFFECTED FILES
  mercurial/chgserver.py
  tests/test-chg.t

CHANGE DETAILS

diff --git a/tests/test-chg.t b/tests/test-chg.t
--- a/tests/test-chg.t
+++ b/tests/test-chg.t
@@ -332,8 +332,7 @@
   /MM/DD HH:MM:SS (PID)> log -R cached
   /MM/DD HH:MM:SS (PID)> loaded repo into cache: $TESTTMP/cached (in  ...s)
 
-Test that chg works even when python "coerces" the locale (py3.7+, which is 
done
-by default if none of LC_ALL, LC_CTYPE, or LANG are set in the environment)
+Test that chg works even when python "coerces" the locale (py3.7+)
 
   $ cat > $TESTTMP/debugenv.py < from mercurial import encoding
@@ -345,11 +344,31 @@
   > for k in [b'LC_ALL', b'LC_CTYPE', b'LANG']:
   > v = encoding.environ.get(k)
   > if v is not None:
-  > ui.write(b'%s=%s\n' % (k, encoding.environ[k]))
+  > ui.write(b'debugenv: %s=%s\n' % (k, encoding.environ[k]))
   > EOF
-  $ LANG= LC_ALL= LC_CTYPE= chg \
-  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv
-  LC_ALL=
-  LC_CTYPE=C.UTF-8 (py37 !)
-  LC_CTYPE= (no-py37 !)
-  LANG=
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE= chg \
+  >--config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >| egrep 'debugenv|start'
+  chg: debug: * start cmdserver at * (glob)
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE= (no-py37 !)
+  debugenv: LANG=
+(Should not trigger a command