Hi all,

This mail is a request for comments on changes to the urlparse module. Currently, urlparse returns the complete query string as the query component and provides no facility to split it into its individual fields. Users have to use the cgi module (cgi.parse_qs) to get the query parsed. There has been discussion in the past about making a query-string parsing method available from the urlparse module itself. [1]
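To make the current behaviour concrete, here is a minimal sketch using the same Google search URL as the test session. urlparse hands back the query as one undivided string, so splitting and decoding the fields is left to the caller. The fallback import is an addition of mine, there only so the snippet also runs on Python 3, where the same function lives in urllib.parse.

```python
try:
    from urlparse import urlparse  # Python 2 module discussed in this mail
except ImportError:
    from urllib.parse import urlparse  # Python 3 home of the same function

URL = ('http://www.google.com/search'
       '?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town')

# Index 4 of the 6-tuple (scheme, netloc, path, params, query, fragment)
# is the query component: a single undivided string.
query = urlparse(URL)[4]
print(query)

# Splitting into name/value fields is left to the caller, and the values
# are still encoded ('+' stands for a space), so they need decoding too.
fields = [field.split('=', 1) for field in query.split('&')]
print(fields)
```

The blank `lr=` field and the still-encoded `q` value are the two cases that the keep_blank_values flag and the unquoting step in the proposed code deal with.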
To implement query parsing in the urlparse module, we can either:

a) import cgi and call the cgi module's parse_qs. This approach has problems:
   i) urlparse would have to import cgi, and
   ii) cgi in turn imports urllib and urlparse, which would create a circular dependency.

b) implement a standalone query-parsing facility in urlparse, modelled on the one in the cgi module.

The method below implements urlparse_qs(url, keep_blank_values, strict_parsing), which parses the query component of a URL. It behaves the same as cgi.parse_qs, except that it takes a complete URL and extracts the query component itself. Please let me know your comments on the code below.

----------------------------------------------------------------------
# Lookup table for two-digit hex escapes (either case), mirroring the one
# urllib's unquote() relies on; urlparse has no such table of its own, so
# unquote() below would otherwise raise NameError on any '%' in the input.
_hexdig = '0123456789ABCDEFabcdef'
_hextochr = dict((a + b, chr(int(a + b, 16)))
                 for a in _hexdig for b in _hexdig)

def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    for i in xrange(1, len(res)):
        item = res[i]
        try:
            # Decode the two hex digits that follow each '%'.
            res[i] = _hextochr[item[:2]] + item[2:]
        except KeyError:
            # Not a valid escape: keep the '%' literally.
            res[i] = '%' + item
        except UnicodeDecodeError:
            # s is unicode and the decoded byte is non-ASCII: use unichr().
            res[i] = unichr(int(item[:2], 16)) + item[2:]
    return "".join(res)

def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
    """Parse the query string of a URL and return its components as a dictionary.

    Based on the cgi.parse_qs method. This is a utility function provided
    with urlparse so that users need not use the cgi module for parsing the
    URL query string.

    Arguments:

    url: URL whose query string is to be parsed

    keep_blank_values: flag indicating whether blank values in
        URL encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings. The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.

    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.
    """
    scheme, netloc, path, params, querystring, fragment = urlparse(url)
    # Fields may be separated by either '&' or ';'.
    pairs = [s2 for s1 in querystring.split('&') for s2 in s1.split(';')]
    query = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError("bad query field: %r" % (name_value,))
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            # '+' encodes a space; %XX escapes are decoded by unquote().
            name = unquote(nv[0].replace('+', ' '))
            value = unquote(nv[1].replace('+', ' '))
            query.append((name, value))

    # Collect values for repeated names into lists, as cgi.parse_qs does.
    result = {}
    for name, value in query:
        if name in result:
            result[name].append(value)
        else:
            result[name] = [value]
    return result
----------------------------------------------------------------------

Testing:

$ python
Python 2.6a0 (trunk, Jun 10 2007, 12:04:03)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> dir(urlparse)
['BaseResult', 'MAX_CACHE_SIZE', 'ParseResult', 'SplitResult', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '_parse_cache', '_splitnetloc', '_splitparams', 'clear_cache', 'non_hierarchical', 'scheme_chars', 'test', 'test_input', 'unquote', 'urldefrag', 'urljoin', 'urlparse', 'urlparse_qs', 'urlsplit', 'urlunparse', 'urlunsplit', 'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
>>> URL = 'http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town'
>>> print urlparse.urlparse_qs(URL)
{'q': ['south africa travel cape town'], 'oe': ['utf-8'], 'ie': ['UTF-8'], 'hl': ['en']}
>>> print urlparse.urlparse_qs(URL,keep_blank_values=1)
{'q': ['south africa travel cape town'], 'ie': ['UTF-8'], 'oe': ['utf-8'], 'lr': [''], 'hl': ['en']}
>>>

Thanks,
Senthil

[1] http://mail.python.org/pipermail/tutor/2002-August/016823.html

-- 
O.R.Senthil Kumaran
http://phoe6.livejournal.com
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com