D2819: hgweb: refactor repository name URL parsing

2018-04-05 Thread mharbison72 (Matt Harbison)
mharbison72 added a comment.


  In https://phab.mercurial-scm.org/D2819#49519, @mharbison72 wrote:
  
  > I installed the latest default branch with SCM Manager, and it 404s even 
simple things like `hg id https://...`.  I bisected back to this.  The paths in 
the access log looks unchanged:
  >
  > With this commit:
  >
  > 127.0.0.1 - - [04/Apr/2018:12:44:12 -0400] "GET /hook/hg/?ping=true 
HTTP/1.1" 204 -
  >  10.10.1.36 - - [04/Apr/2018:12:44:12 -0400] "GET 
/hg/eng/devsetup?cmd=capabilities HTTP/1.1" 404 949
  >
  > Parent of this commit:
  >
  > 127.0.0.1 - - [04/Apr/2018:12:47:18 -0400] "GET /hook/hg/?ping=true 
HTTP/1.1" 204 -
  >  10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET 
/hg/eng/devsetup?cmd=capabilities HTTP/1.1" 200 422
  >  10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET 
/hg/eng/devsetup?cmd=lookup HTTP/1.1" 200 43
  >  10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET 
/hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 30
  >  10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET 
/hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 -
  >
  > I'm going to try to add print statements, but if you have any darts you'd 
like to throw, I'd be happy to try it.
  
  
  Finally figured this out.  SCM Manager must be generating this stub hgweb.py 
(it gets rewritten every time tomcat is restarted):
  
import os
from mercurial import demandimport
from mercurial.hgweb import hgweb, wsgicgi

repositoryPath = os.environ['SCM_REPOSITORY_PATH']

demandimport.enable()

application = hgweb(repositoryPath)
wsgicgi.launch(application)
  
  Note that it's not using hgweb**dir**.  (I have no idea why not, but it does 
let you organize repos into virtual directories under '/hg', and git repos 
under '/git'.)  If we simply set the reponame value in 
request.parserequestfromenv() from env if it wasn't passed in, then everything 
works.  Not real nice, but I assume that we don't want to break existing 
hosting packages when upgrading just Mercurial?  I'll send a patch if this is 
acceptable.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2819

To: indygreg, #hg-reviewers, durin42
Cc: mharbison72, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2819: hgweb: refactor repository name URL parsing

2018-04-04 Thread mharbison72 (Matt Harbison)
mharbison72 added a comment.


  I installed the latest default branch with SCM Manager, and it 404s even 
simple things like `hg id https://...`.  I bisected back to this.  The paths in 
the access log looks unchanged:
  
  With this commit:
  
  127.0.0.1 - - [04/Apr/2018:12:44:12 -0400] "GET /hook/hg/?ping=true HTTP/1.1" 
204 -
  10.10.1.36 - - [04/Apr/2018:12:44:12 -0400] "GET 
/hg/eng/devsetup?cmd=capabilities HTTP/1.1" 404 949
  
  Parent of this commit:
  
  127.0.0.1 - - [04/Apr/2018:12:47:18 -0400] "GET /hook/hg/?ping=true HTTP/1.1" 
204 -
  10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET 
/hg/eng/devsetup?cmd=capabilities HTTP/1.1" 200 422
  10.10.1.36 - - [04/Apr/2018:12:47:19 -0400] "GET /hg/eng/devsetup?cmd=lookup 
HTTP/1.1" 200 43
  10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET 
/hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 30
  10.10.1.36 - - [04/Apr/2018:12:47:20 -0400] "GET 
/hg/eng/devsetup?cmd=listkeys HTTP/1.1" 200 -
  
  I'm going to try to add print statements, but if you have any darts you'd 
like to throw, I'd be happy to try it.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2819

To: indygreg, #hg-reviewers, durin42
Cc: mharbison72, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2819: hgweb: refactor repository name URL parsing

2018-03-12 Thread indygreg (Gregory Szorc)
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The hgwebdir WSGI application detects when a requested URL is for
  a known repository and it effectively forwards the request to the
  hgweb WSGI application.
  
  The hgweb WSGI application needs to route the request based on the
  base URL for the repository. The way this normally works is
  SCRIPT_NAME is used to resolve the base URL and PATH_INFO
  contains the path after the script.
  
  But with hgwebdir, SCRIPT_NAME refers to hgwebdir, not the base
  URL for the repository. So, there was a hacky REPO_NAME environment
  variable being set to convey the part of the URL that represented
  the repository so hgweb could ignore this path component for
  routing purposes.
  
  The use of the environment variable for passing internal state
  is pretty hacky. Plus, it wasn't clear from the perspective of
  the URL parsing code what was going on.
  
  This commit improves matters by making the repository name an
  explicit argument to the request parser. The logic around
  handling of this value has been shored up. We add various checks
  that the argument is used properly - that the repository name
  does represent the prefix of the PATH_INFO.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2819

AFFECTED FILES
  mercurial/hgweb/hgwebdir_mod.py
  mercurial/hgweb/request.py
  tests/test-wsgirequest.py

CHANGE DETAILS

diff --git a/tests/test-wsgirequest.py b/tests/test-wsgirequest.py
--- a/tests/test-wsgirequest.py
+++ b/tests/test-wsgirequest.py
@@ -5,6 +5,9 @@
 from mercurial.hgweb import (
 request as requestmod,
 )
+from mercurial import (
+error,
+)
 
 DEFAULT_ENV = {
 r'REQUEST_METHOD': r'GET',
@@ -20,11 +23,11 @@
 r'wsgi.run_once': False,
 }
 
-def parse(env, bodyfh=None, extra=None):
+def parse(env, bodyfh=None, reponame=None, extra=None):
 env = dict(env)
 env.update(extra or {})
 
-return requestmod.parserequestfromenv(env, bodyfh)
+return requestmod.parserequestfromenv(env, bodyfh, reponame=reponame)
 
 class ParseRequestTests(unittest.TestCase):
 def testdefault(self):
@@ -203,24 +206,26 @@
 self.assertTrue(r.havepathinfo)
 
 def testreponame(self):
-"""REPO_NAME path components get stripped from URL."""
-r = parse(DEFAULT_ENV, extra={
-r'REPO_NAME': r'repo',
-r'PATH_INFO': r'/path1/path2'
-})
+"""repository path components get stripped from URL."""
+
+with self.assertRaisesRegexp(error.ProgrammingError,
+ b'reponame requires PATH_INFO'):
+parse(DEFAULT_ENV, reponame=b'repo')
 
-self.assertEqual(r.url, b'http://testserver/path1/path2')
-self.assertEqual(r.baseurl, b'http://testserver')
-self.assertEqual(r.advertisedurl, b'http://testserver/path1/path2')
-self.assertEqual(r.advertisedbaseurl, b'http://testserver')
-self.assertEqual(r.apppath, b'/repo')
-self.assertEqual(r.dispatchparts, [b'path1', b'path2'])
-self.assertEqual(r.dispatchpath, b'path1/path2')
-self.assertTrue(r.havepathinfo)
-self.assertEqual(r.reponame, b'repo')
+with self.assertRaisesRegexp(error.ProgrammingError,
+ b'PATH_INFO does not begin with repo '
+ b'name'):
+parse(DEFAULT_ENV, reponame=b'repo', extra={
+r'PATH_INFO': r'/pathinfo',
+})
 
-r = parse(DEFAULT_ENV, extra={
-r'REPO_NAME': r'repo',
+with self.assertRaisesRegexp(error.ProgrammingError,
+ b'reponame prefix of PATH_INFO'):
+parse(DEFAULT_ENV, reponame=b'repo', extra={
+r'PATH_INFO': r'/repoextra/path',
+})
+
+r = parse(DEFAULT_ENV, reponame=b'repo', extra={
 r'PATH_INFO': r'/repo/path1/path2',
 })
 
@@ -234,8 +239,7 @@
 self.assertTrue(r.havepathinfo)
 self.assertEqual(r.reponame, b'repo')
 
-r = parse(DEFAULT_ENV, extra={
-r'REPO_NAME': r'prefix/repo',
+r = parse(DEFAULT_ENV, reponame=b'prefix/repo', extra={
 r'PATH_INFO': r'/prefix/repo/path1/path2',
 })
 
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -155,11 +155,16 @@
 # Request body input stream.
 bodyfh = attr.ib()
 
-def parserequestfromenv(env, bodyfh):
+def parserequestfromenv(env, bodyfh, reponame=None):
 """Parse URL components from environment variables.
 
 WSGI defines request attributes via environment variables. This function
 parses the environment variables into a data structure.
+
+If ``reponame`` is defined, the leading path components matching that
+string are effectively