D2776: hgweb: use a multidict for holding query string parameters

2018-03-15 Thread indygreg (Gregory Szorc)
indygreg added a comment.


  I may have a shot at rewriting this class today. I do want to prioritize on 
adding missing components to the new wire protocol so reviewers have better 
context, however. So if someone else wants to do the work, I'd happily review 
it. Otherwise, I should get around to it sometime in the next few days.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers, durin42
Cc: yuja, av6, martinvonz, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-15 Thread yuja (Yuya Nishihara)
yuja added inline comments.

INLINE COMMENTS

> indygreg wrote in request.py:31
> I agree with the sentiment about this being a list in disguise. One reason I 
> didn't bother to optimize it is because I don't think we do any `qsparams` 
> lookups in loops and I don't believe we have any more than ~10 arguments to 
> any single request. So even if we have an `O(n^2)` situation, n is so small 
> that it doesn't matter.
> 
> I can add a follow-up comment easily enough. Or we could just index the 
> fields by key. That doesn't seem too difficult.

Isn't the `n` controllable by a malicious user?

I agree with @martinvonz in that it would be probably easier to
write as a dict of lists (or dict + lists).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers, durin42
Cc: yuja, av6, martinvonz, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-14 Thread indygreg (Gregory Szorc)
indygreg added inline comments.

INLINE COMMENTS

> durin42 wrote in request.py:31
> I'm a little uncomfortable with this not advertising that it's not a hash 
> table, and htat lookups are O(N). Maybe send a docstring followup?

I agree with the sentiment about this being a list in disguise. One reason I 
didn't bother to optimize it is because I don't think we do any `qsparams` 
lookups in loops and I don't believe we have any more than ~10 arguments to any 
single request. So even if we have an `O(n^2)` situation, n is so small that it 
doesn't matter.

I can add a follow-up comment easily enough. Or we could just index the fields 
by key. That doesn't seem too difficult.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers, durin42
Cc: av6, martinvonz, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-14 Thread av6 (Anton Shestakov)
av6 added inline comments.

INLINE COMMENTS

> martinvonz wrote in request.py:45
> Would it make sense to make this an error if there isn't exactly one value?

I don't think it would. AIUI, that would make handling query strings like 
?style=foo=bar just more difficult. I'm not aware of any web tools where 
default access method wouldn't simply return that last value.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers, durin42
Cc: av6, martinvonz, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-13 Thread martinvonz (Martin von Zweigbergk)
martinvonz added inline comments.

INLINE COMMENTS

> request.py:39-41
> +# Stores (key, value) 2-tuples. This isn't the most efficient. But we
> +# don't rely on parameters that much, so it shouldn't be a perf 
> issue.
> +# we can always add dict for fast lookups.

Sure, that's probably fine, but why? Wouldn't it be easier to write it as dict 
of lists anyway?

> request.py:45
> +def __getitem__(self, key):
> +"""Returns the last set value for a key."""
> +for k, v in reversed(self._items):

Would it make sense to make this an error if there isn't exactly one value?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers, durin42
Cc: martinvonz, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-12 Thread durin42 (Augie Fackler)
durin42 added inline comments.

INLINE COMMENTS

> request.py:31
>  
> +class multidict(object):
> +"""A dict like object that can store multiple values for a key.

I'm a little uncomfortable with this not advertising that it's not a hash 
table, and htat lookups are O(N). Maybe send a docstring followup?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

To: indygreg, #hg-reviewers
Cc: durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D2776: hgweb: use a multidict for holding query string parameters

2018-03-10 Thread indygreg (Gregory Szorc)
indygreg updated this revision to Diff 6840.
indygreg edited the summary of this revision.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D2776?vs=6838=6840

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

AFFECTED FILES
  mercurial/hgweb/request.py
  mercurial/wireprotoserver.py

CHANGE DETAILS

diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -79,7 +79,7 @@
 return [data[k] for k in keys]
 
 def _args(self):
-args = util.rapply(pycompat.bytesurl, self._wsgireq.form.copy())
+args = self._req.qsparams.asdictoflists()
 postlen = int(self._req.headers.get(b'X-HgArgs-Post', 0))
 if postlen:
 args.update(urlreq.parseqs(
@@ -170,10 +170,10 @@
 # HTTP version 1 wire protocol requests are denoted by a "cmd" query
 # string parameter. If it isn't present, this isn't a wire protocol
 # request.
-if 'cmd' not in req.querystringdict:
+if 'cmd' not in req.qsparams:
 return False
 
-cmd = req.querystringdict['cmd'][0]
+cmd = req.qsparams['cmd']
 
 # The "cmd" request parameter is used by both the wire protocol and hgweb.
 # While not all wire protocol commands are available for all transports,
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -28,6 +28,90 @@
 util,
 )
 
+class multidict(object):
+"""A dict like object that can store multiple values for a key.
+
+Used to store parsed request parameters.
+
+This is inspired by WebOb's class of the same name.
+"""
+def __init__(self):
+# Stores (key, value) 2-tuples. This isn't the most efficient. But we
+# don't rely on parameters that much, so it shouldn't be a perf issue.
+# we can always add dict for fast lookups.
+self._items = []
+
+def __getitem__(self, key):
+"""Returns the last set value for a key."""
+for k, v in reversed(self._items):
+if k == key:
+return v
+
+raise KeyError(key)
+
+def __setitem__(self, key, value):
+"""Replace a values for a key with a new value."""
+try:
+del self[key]
+except KeyError:
+pass
+
+self._items.append((key, value))
+
+def __delitem__(self, key):
+"""Delete all values for a key."""
+oldlen = len(self._items)
+
+self._items[:] = [(k, v) for k, v in self._items if k != key]
+
+if oldlen == len(self._items):
+raise KeyError(key)
+
+def __contains__(self, key):
+return any(k == key for k, v in self._items)
+
+def __len__(self):
+return len(self._items)
+
+def get(self, key, default=None):
+try:
+return self.__getitem__(key)
+except KeyError:
+return default
+
+def add(self, key, value):
+"""Add a new value for a key. Does not replace existing values."""
+self._items.append((key, value))
+
+def getall(self, key):
+"""Obtains all values for a key."""
+return [v for k, v in self._items if k == key]
+
+def getone(self, key):
+"""Obtain a single value for a key.
+
+Raises KeyError if key not defined or it has multiple values set.
+"""
+vals = self.getall(key)
+
+if not vals:
+raise KeyError(key)
+
+if len(vals) > 1:
+raise KeyError('multiple values for %r' % key)
+
+return vals[0]
+
+def asdictoflists(self):
+d = {}
+for k, v in self._items:
+if k in d:
+d[k].append(v)
+else:
+d[k] = [v]
+
+return d
+
 @attr.s(frozen=True)
 class parsedrequest(object):
 """Represents a parsed WSGI request.
@@ -56,10 +140,8 @@
 havepathinfo = attr.ib()
 # Raw query string (part after "?" in URL).
 querystring = attr.ib()
-# List of 2-tuples of query string arguments.
-querystringlist = attr.ib()
-# Dict of query string arguments. Values are lists with at least 1 item.
-querystringdict = attr.ib()
+# multidict of query string parameters.
+qsparams = attr.ib()
 # wsgiref.headers.Headers instance. Operates like a dict with case
 # insensitive keys.
 headers = attr.ib()
@@ -157,14 +239,9 @@
 
 # We store as a list so we have ordering information. We also store as
 # a dict to facilitate fast lookup.
-querystringlist = util.urlreq.parseqsl(querystring, keep_blank_values=True)
-
-querystringdict = {}
-for k, v in querystringlist:
-if k in querystringdict:
-querystringdict[k].append(v)
-else:
-querystringdict[k] = [v]
+qsparams = multidict()
+for k, v in util.urlreq.parseqsl(querystring, keep_blank_values=True):
+

D2776: hgweb: use a multidict for holding query string parameters

2018-03-10 Thread indygreg (Gregory Szorc)
indygreg updated this revision to Diff 6838.
indygreg edited the summary of this revision.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D2776?vs=6822=6838

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

AFFECTED FILES
  mercurial/hgweb/request.py
  mercurial/wireprotoserver.py

CHANGE DETAILS

diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -170,10 +170,10 @@
 # HTTP version 1 wire protocol requests are denoted by a "cmd" query
 # string parameter. If it isn't present, this isn't a wire protocol
 # request.
-if 'cmd' not in req.querystringdict:
+if 'cmd' not in req.qsparams:
 return False
 
-cmd = req.querystringdict['cmd'][0]
+cmd = req.qsparams['cmd']
 
 # The "cmd" request parameter is used by both the wire protocol and hgweb.
 # While not all wire protocol commands are available for all transports,
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -28,6 +28,90 @@
 util,
 )
 
+class multidict(object):
+"""A dict like object that can store multiple values for a key.
+
+Used to store parsed request parameters.
+
+This is inspired by WebOb's class of the same name.
+"""
+def __init__(self):
+# Stores (key, value) 2-tuples. This isn't the most efficient. But we
+# don't rely on parameters that much, so it shouldn't be a perf issue.
+# we can always add dict for fast lookups.
+self._items = []
+
+def __getitem__(self, key):
+"""Returns the last set value for a key."""
+for k, v in reversed(self._items):
+if k == key:
+return v
+
+raise KeyError(key)
+
+def __setitem__(self, key, value):
+"""Replace a values for a key with a new value."""
+try:
+del self[key]
+except KeyError:
+pass
+
+self._items.append((key, value))
+
+def __delitem__(self, key):
+"""Delete all values for a key."""
+oldlen = len(self._items)
+
+self._items[:] = [(k, v) for k, v in self._items if k != key]
+
+if oldlen == len(self._items):
+raise KeyError(key)
+
+def __contains__(self, key):
+return any(k == key for k, v in self._items)
+
+def __len__(self):
+return len(self._items)
+
+def get(self, key, default=None):
+try:
+return self.__getitem__(key)
+except KeyError:
+return default
+
+def add(self, key, value):
+"""Add a new value for a key. Does not replace existing values."""
+self._items.append((key, value))
+
+def getall(self, key):
+"""Obtains all values for a key."""
+return [v for k, v in self._items if k == key]
+
+def getone(self, key):
+"""Obtain a single value for a key.
+
+Raises KeyError if key not defined or it has multiple values set.
+"""
+vals = self.getall(key)
+
+if not vals:
+raise KeyError(key)
+
+if len(vals) > 1:
+raise KeyError('multiple values for %r' % key)
+
+return vals[0]
+
+def asdictoflists(self):
+d = {}
+for k, v in self._items:
+if k in d:
+d[k].append(v)
+else:
+d[k] = [v]
+
+return d
+
 @attr.s(frozen=True)
 class parsedrequest(object):
 """Represents a parsed WSGI request.
@@ -56,10 +140,8 @@
 havepathinfo = attr.ib()
 # Raw query string (part after "?" in URL).
 querystring = attr.ib()
-# List of 2-tuples of query string arguments.
-querystringlist = attr.ib()
-# Dict of query string arguments. Values are lists with at least 1 item.
-querystringdict = attr.ib()
+# multidict of query string parameters.
+qsparams = attr.ib()
 # wsgiref.headers.Headers instance. Operates like a dict with case
 # insensitive keys.
 headers = attr.ib()
@@ -157,14 +239,9 @@
 
 # We store as a list so we have ordering information. We also store as
 # a dict to facilitate fast lookup.
-querystringlist = util.urlreq.parseqsl(querystring, keep_blank_values=True)
-
-querystringdict = {}
-for k, v in querystringlist:
-if k in querystringdict:
-querystringdict[k].append(v)
-else:
-querystringdict[k] = [v]
+qsparams = multidict()
+for k, v in util.urlreq.parseqsl(querystring, keep_blank_values=True):
+qsparams.add(k, v)
 
 # HTTP_* keys contain HTTP request headers. The Headers structure should
 # perform case normalization for us. We just rewrite underscore to dash
@@ -197,8 +274,7 @@
  dispatchparts=dispatchparts, 
dispatchpath=dispatchpath,
  havepathinfo='PATH_INFO' in env,
   

D2776: hgweb: use a multidict for holding query string parameters

2018-03-09 Thread indygreg (Gregory Szorc)
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  My intention with refactoring the WSGI code was to make it easier
  to read. I initially wanted to vendor and use WebOb, because it seems
  to be a pretty reasonable abstraction layer for WSGI. However, it isn't
  using relative imports and I didn't want to deal with the hassle of
  patching it. But that doesn't mean we can't use good ideas from WebOb.
  
  WebOb has a "multidict" data structure for holding parsed query string
  and POST form data. It quacks like a dict but allows you to store
  multiple values for each key. It offers mechanisms to return just one
  value, all values, or return 1 value asserting that only 1 value is
  set. I quite like its API.
  
  This commit implements a read-only "multidict" in the spirit of
  WebOb's multidict.
  
  We replace the query string attributes of our parsed request with
  an instance of it.
  
  For the record, I'm not a huge fan of the method to convert instances
  to a dict of lists. But this is needed to support the wsgirequest.form
  API.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2776

AFFECTED FILES
  mercurial/hgweb/request.py
  mercurial/wireprotoserver.py

CHANGE DETAILS

diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -170,10 +170,10 @@
 # HTTP version 1 wire protocol requests are denoted by a "cmd" query
 # string parameter. If it isn't present, this isn't a wire protocol
 # request.
-if 'cmd' not in req.querystringdict:
+if 'cmd' not in req.qsparams:
 return False
 
-cmd = req.querystringdict['cmd'][0]
+cmd = req.qsparams['cmd']
 
 # The "cmd" request parameter is used by both the wire protocol and hgweb.
 # While not all wire protocol commands are available for all transports,
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -28,6 +28,84 @@
 util,
 )
 
+class multidict(object):
+"""A dict like object that can store multiple values for a key.
+
+Used to store parsed request parameters.
+
+This is inspired by WebOb's class of the same name.
+"""
+def __init__(self):
+# Stores (key, value) 2-tuples. This isn't the most efficient. But we
+# don't rely on parameters that much, so it shouldn't be a perf issue.
+# we can always add dict for fast lookups.
+self._items = []
+
+def __getitem__(self, key):
+"""Returns the last set value for a key."""
+for k, v in reversed(self._items):
+if k == key:
+return v
+
+raise KeyError(key)
+
+def __setitem__(self, key, value):
+"""Replace a values for a key with a new value."""
+try:
+del self[key]
+except KeyError:
+pass
+
+self._items.append((key, value))
+
+def __delitem__(self, key):
+"""Delete all values for a key."""
+oldlen = len(self._items)
+
+self._items[:] = [(k, v) for k, v in self._items if k != key]
+
+if oldlen == len(self._items):
+raise KeyError(key)
+
+def __contains__(self, key):
+return any(k == key for k, v in self._items)
+
+def __len__(self):
+return len(self._items)
+
+def add(self, key, value):
+"""Add a new value for a key. Does not replace existing values."""
+self._items.append((key, value))
+
+def getall(self, key):
+"""Obtains all values for a key."""
+return [v for k, v in self._items if k == key]
+
+def getone(self, key):
+"""Obtain a single value for a key.
+
+Raises KeyError if key not defined or it has multiple values set.
+"""
+vals = self.getall(key)
+
+if not vals:
+raise KeyError(key)
+
+if len(vals) > 1:
+raise KeyError('multiple values for %r' % key)
+
+return vals[0]
+
+def asdictoflists(self):
+d = {}
+for k, v in self._items:
+if k in d:
+d[k].append(v)
+else:
+d[k] = [v]
+
+return d
+
 @attr.s(frozen=True)
 class parsedrequest(object):
 """Represents a parsed WSGI request.
@@ -56,10 +134,8 @@
 havepathinfo = attr.ib()
 # Raw query string (part after "?" in URL).
 querystring = attr.ib()
-# List of 2-tuples of query string arguments.
-querystringlist = attr.ib()
-# Dict of query string arguments. Values are lists with at least 1 item.
-querystringdict = attr.ib()
+# multidict of query string parameters.
+qsparams = attr.ib()
 # wsgiref.headers.Headers instance. Operates like a dict with case
 # insensitive keys.
 headers = attr.ib()
@@ -157,14 +233,9 @@
 
 # We store as a list