Re: [Python-Dev] [RFC] urlparse - parse query facility

2007-06-16 Thread O.R.Senthil Kumaran
* Fred L. Drake, Jr. [EMAIL PROTECTED] [2007-06-16 01:06:59]:

   * Coding question: Without retyping the bunch of code again in the
   BaseResult, would it be possible to call the parse_qs/parse_qsl function on
   self.query and provide the result? Basically, what would be a good way of
   doing it?
 
 That's what I was thinking.  Just add something like this to BaseResult 
 (untested):
 
 def parsedQuery(self, keep_blank_values=False, strict_parsing=False):
     return parse_qs(
         self.query,
         keep_blank_values=keep_blank_values,
         strict_parsing=strict_parsing)
 
 def parsedQueryList(self, keep_blank_values=False, strict_parsing=False):
     return parse_qsl(
         self.query,
         keep_blank_values=keep_blank_values,
         strict_parsing=strict_parsing)

Thanks Fred. That really helped. :-)

I have updated the urlparse.py and cgi.py modules and added tests to
test_urlparse.py for the new functionality.
The test run passed for all the valid queries except these:

#("=", {}),
#("=&=", {}),
#("=;=", {}),

The test cases are taken from the test_cgi.py module, where a comment
questions the validity of these 3 tests for query values.
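
For reference, two of these inputs can be checked against the parse_qs that
ships in Python 3's urllib.parse. This is only a sketch under stated
assumptions: it exercises today's standard library, not the 2007 patch, and the
helper name parse_strict is hypothetical. The "=;=" case is left out because
recent Python 3 releases split only on "&" by default.

```python
from urllib.parse import parse_qs

# Inputs taken from the excluded test cases (the "=;=" case is skipped).
questionable_queries = ["=", "=&="]


def parse_strict(query):
    """Parse with strict_parsing=True; return the dict, or the ValueError raised."""
    try:
        return parse_qs(query, strict_parsing=True)
    except ValueError as exc:
        return exc


results = {q: parse_strict(q) for q in questionable_queries}
for q, r in results.items():
    print(repr(q), "->", repr(r))

# By contrast, a field with no "=" at all is what strict parsing rejects.
no_equals = parse_strict("name")
print(repr("name"), "->", repr(no_equals))
```

An empty name with an empty value is skipped rather than rejected, which is
why the open question is whether such input should count as valid at all.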

Updating the documentation is still pending.

I maintained all the files temporarily at:

http://cvs.sarovar.org/cgi-bin/cvsweb.cgi/python/?cvsroot=uthcode

I had requested commit access to the Summer of Code branch in my previous mail,
but I guess it has not been noticed yet. I shall update the files later or
send them in as patches to be applied.


 Whether there's a real win with this is unclear.  I generally prefer having
 an object that represents the URL and lets me get what I want from it, rather
 than having to pass the bits around to separate parsing functions.  The

I agree. This is really convenient when one comes to know about it.

Thanks,
Senthil

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [RFC] urlparse - parse query facility

2007-06-15 Thread O.R.Senthil Kumaran
* Fred L. Drake, Jr. [EMAIL PROTECTED] [2007-06-13 22:42:21]:

 I see no reason to incorporate the URL splitting into the function; the 
 existing function signatures for cgi.parse_qs and cgi.parse_qsl are 
 sufficient.

Thanks for the comments, Fred. I understand that keeping the existing
signatures of parse_qs and parse_qsl in the urlparse module is sufficient, and
that invoking them from the cgi module will be correct.

The urlparse module will contain parse_qs and parse_qsl, which take the query
string (not url) along with the optional arguments keep_blank_values and
strict_parsing (same as cgi).

http://deadbeefbabe.org/paste/5154
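
A minimal sketch of how those two signatures behave, assuming Python 3, where
the functions live in urllib.parse (this is not the code from the paste, and
the example URL is made up):

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/search?hl=en&lr=&q=cape+town&q=travel"

# The URL is split first; only the query component is handed to the parsers.
query = urlsplit(url).query

# parse_qs groups repeated names into lists; the blank "lr" is dropped by default.
as_dict = parse_qs(query)

# keep_blank_values=True retains "lr" with an empty string value.
as_dict_blank = parse_qs(query, keep_blank_values=True)

# parse_qsl keeps the original order as a list of (name, value) pairs.
as_list = parse_qsl(query, keep_blank_values=True)

print(as_dict)
print(as_dict_blank)
print(as_list)
```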

 It may be convenient to add methods to the urlparse.BaseResult class
 providing access to the parsed version of the query on the instance.
 

This is where I spent a little bit of time, and I have not been able to work
out conclusively how it can be done.

Could someone on the list please help me?

* parse_qs or parse_qsl will be invoked on the query component separately by
the user.
* If the parsed query needs to be available on the instance as a convenience
function, then we will have to assume the keep_blank_values and strict_parsing
values.
* Coding question: Without retyping the bunch of code again in the BaseResult,
would it be possible to call the parse_qs/parse_qsl function on self.query and
provide the result? Basically, what would be a good way of doing it?


Thanks,
Senthil

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org


Re: [Python-Dev] [RFC] urlparse - parse query facility

2007-06-15 Thread Fred L. Drake, Jr.
On Saturday 16 June 2007, O.R.Senthil Kumaran wrote:
  The urlparse module will contain parse_qs and parse_qsl, which take the
  query string (not url) along with the optional arguments keep_blank_values
  and strict_parsing (same as cgi).
 
  http://deadbeefbabe.org/paste/5154

Looks good.

   It may be convenient to add methods to the urlparse.BaseResult class
   providing access to the parsed version of the query on the instance.
...
  * parse_qs or parse_qsl will be invoked on the query component separately
  by the user.

Yes; this doesn't change, really.  Methods would still need to be invoked 
separately, but the query string doesn't need to be passed in; it's part of 
the data object.

  * If parsed query needs to be available at the instance as a convenience
  function, then we will have to assume the keep_blank_values and
  strict_parsing values.

If it were a property, yes, but I think a method on the result object makes 
more sense because we don't want to assume values for these arguments.

  * Coding question: Without retyping the bunch of code again in the
  BaseResult, would it be possible to call the parse_qs/parse_qsl function on
  self.query and provide the result? Basically, what would be a good way of
  doing it?

That's what I was thinking.  Just add something like this to BaseResult 
(untested):

def parsedQuery(self, keep_blank_values=False, strict_parsing=False):
    return parse_qs(
        self.query,
        keep_blank_values=keep_blank_values,
        strict_parsing=strict_parsing)

def parsedQueryList(self, keep_blank_values=False, strict_parsing=False):
    return parse_qsl(
        self.query,
        keep_blank_values=keep_blank_values,
        strict_parsing=strict_parsing)

Whether there's a real win with this is unclear.  I generally prefer having an 
object that represents the URL and lets me get what I want from it, rather 
than having to pass the bits around to separate parsing functions.  The 
result objects were added in 2.5, though, and I've no real idea how widely 
they've been adopted.
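
For readers on Python 3, the method-on-the-result-object idea can be sketched
roughly as follows. It is an illustration under assumptions, not the patch
discussed in this thread: QuerySplitResult, parsed_query, parsed_query_list and
split_with_query are hypothetical names, and the base class is
urllib.parse.SplitResult rather than the 2.5-era BaseResult.

```python
from urllib.parse import SplitResult, urlsplit, parse_qs, parse_qsl


class QuerySplitResult(SplitResult):
    """Hypothetical result type; not part of the standard library."""

    __slots__ = ()

    def parsed_query(self, keep_blank_values=False, strict_parsing=False):
        # Delegates to parse_qs; the query string comes from the instance.
        return parse_qs(
            self.query,
            keep_blank_values=keep_blank_values,
            strict_parsing=strict_parsing)

    def parsed_query_list(self, keep_blank_values=False, strict_parsing=False):
        # Same idea, but keeps the order as a list of (name, value) pairs.
        return parse_qsl(
            self.query,
            keep_blank_values=keep_blank_values,
            strict_parsing=strict_parsing)


def split_with_query(url):
    """Split a URL and wrap the five components in the richer result type."""
    return QuerySplitResult(*urlsplit(url))


# Hypothetical URL, used only for illustration.
result = split_with_query("http://www.example.com/search?hl=en&lr=&q=cape+town")
print(result.netloc)
print(result.parsed_query())
print(result.parsed_query(keep_blank_values=True))
print(result.parsed_query_list())
```

Because these are methods rather than properties, the caller still chooses
keep_blank_values and strict_parsing on each call, so no defaults have to be
assumed on the caller's behalf.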


  -Fred

-- 
Fred L. Drake, Jr.   fdrake at acm.org
Chaos is the score upon which reality is written. --Henry Miller


[Python-Dev] [RFC] urlparse - parse query facility

2007-06-13 Thread Jim Jewett
 a) import cgi and call cgi module's parse_qs.  [circular imports]

or

 b) Implement a stand alone query parsing facility in urlparse *AS IN*
 cgi module.

Assuming (b), please remove the (code for the) parsing from the cgi
module, and just import it back from urlparse (or urllib).  Since cgi
already imports urllib (which imports urlparse), this isn't adding any
dependencies -- but it keeps the code in a single location.

-jJ


Re: [Python-Dev] [RFC] urlparse - parse query facility

2007-06-13 Thread O.R.Senthil Kumaran
* Jim Jewett [EMAIL PROTECTED] [2007-06-13 19:27:24]:

  a) import cgi and call cgi module's parse_qs.  [circular imports]
 
  or
 
  b) Implement a stand alone query parsing facility in urlparse *AS IN*
  cgi module.
 
  Assuming (b), please remove the (code for the) parsing from the cgi
  module, and just import it back from urlparse (or urllib).  Since cgi
  already imports urllib (which imports urlparse), this isn't adding any
  dependencies -- but it keeps the code in a single location.

Sure, that's a good idea as I see it. It won't break anything either.

Thanks,

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org


Re: [Python-Dev] [RFC] urlparse - parse query facility

2007-06-13 Thread Fred L. Drake, Jr.
On Tuesday 12 June 2007, Senthil Kumaran wrote:
  This mail is a request for comments on changes to urlparse module. We
  understand that urlparse returns the 'complete query' value as the query
  component and does not provide the facilities to separate the query
  components. User will have to use the cgi module (cgi.parse_qs) to get the
  query parsed.

I agree with the comments Jim provided.

  Below method implements the urlparse_qs(url,
  keep_blank_values,strict_parsing) that will help in parsing the query
  component of the url. It behaves same as the cgi.parse_qs.

Except that it takes a URL, not only a query string.

  def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
...
  scheme, netloc, url, params, querystring, fragment = urlparse(url)

I see no reason to incorporate the URL splitting into the function; the 
existing function signatures for cgi.parse_qs and cgi.parse_qsl are 
sufficient.

It may be convenient to add methods to the urlparse.BaseResult class providing 
access to the parsed version of the query on the instance.


  -Fred

-- 
Fred L. Drake, Jr.   fdrake at acm.org


[Python-Dev] [RFC] urlparse - parse query facility

2007-06-12 Thread Senthil Kumaran
Hi all,
This mail is a request for comments on changes to the urlparse module. We
understand that urlparse returns the 'complete query' value as the query
component and does not provide the facilities to separate the query
components. Users have to use the cgi module (cgi.parse_qs) to get the query
parsed. There has been a discussion in the past about making a query string
parsing method available from the urlparse module itself. [1]

To implement the query parsing feature in the urlparse module, we can:
a) import cgi and call cgi module's parse_qs.
   This approach will have problems, as
   i) urlparse would import the cgi module, and
   ii) the cgi module in turn imports urllib and urlparse.

b) Implement a stand alone query parsing facility in urlparse *AS IN*
cgi module.

The function below implements urlparse_qs(url, keep_blank_values,
strict_parsing), which parses the query component of the url. It behaves the
same as cgi.parse_qs.

Please let me know your comments on the code below.

--

def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    for i in xrange(1, len(res)):
        item = res[i]
        try:
            res[i] = _hextochr[item[:2]] + item[2:]
        except KeyError:
            res[i] = '%' + item
        except UnicodeDecodeError:
            res[i] = unichr(int(item[:2], 16)) + item[2:]
    return "".join(res)

def urlparse_qs(url, keep_blank_values=0, strict_parsing=0):
    """Parse a URL query string and return the components as a dictionary.

    Based on the cgi.parse_qs method. This is a utility function provided
    with urlparse so that users need not use cgi module for
    parsing the url query string.

    Arguments:

    url: URL with query string to be parsed

    keep_blank_values: flag indicating whether blank values in
        URL encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings.  The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.

    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.
    """

    scheme, netloc, url, params, querystring, fragment = urlparse(url)

    pairs = [s2 for s1 in querystring.split('&') for s2 in s1.split(';')]
    query = []
    for name_value in pairs:
        if not name_value and not strict_parsing:
            continue
        nv = name_value.split('=', 1)
        if len(nv) != 2:
            if strict_parsing:
                raise ValueError, "bad query field: %r" % (name_value,)
            # Handle case of a control-name with no equal sign
            if keep_blank_values:
                nv.append('')
            else:
                continue
        if len(nv[1]) or keep_blank_values:
            name = unquote(nv[0].replace('+', ' '))
            value = unquote(nv[1].replace('+', ' '))
            query.append((name, value))

    dict = {}
    for name, value in query:
        if name in dict:
            dict[name].append(value)
        else:
            dict[name] = [value]
    return dict

--

Testing:

$ python
Python 2.6a0 (trunk, Jun 10 2007, 12:04:03)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> dir(urlparse)
['BaseResult', 'MAX_CACHE_SIZE', 'ParseResult', 'SplitResult', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '_parse_cache',
'_splitnetloc', '_splitparams', 'clear_cache', 'non_hierarchical',
'scheme_chars', 'test', 'test_input', 'unquote', 'urldefrag', 'urljoin',
'urlparse', 'urlparse_qs', 'urlsplit', 'urlunparse', 'urlunsplit',
'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
>>> URL = 'http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=south+africa+travel+cape+town'
>>> print urlparse.urlparse_qs(URL)
{'q': ['south africa travel cape town'], 'oe': ['utf-8'], 'ie': ['UTF-8'],
'hl': ['en']}
>>> print urlparse.urlparse_qs(URL,keep_blank_values=1)
{'q': ['south africa travel cape town'], 'ie': ['UTF-8'], 'oe': ['utf-8'],
'lr': [''], 'hl': ['en']}



Thanks,
Senthil

[1] http://mail.python.org/pipermail/tutor/2002-August/016823.html



-- 
O.R.Senthil Kumaran
http://phoe6.livejournal.com