[jira] Commented: (MODPYTHON-94) Calling APR optional functions provided by mod_ssl

2005-11-29 Thread Graham Dumpleton (JIRA)
[ 
http://issues.apache.org/jira/browse/MODPYTHON-94?page=comments#action_12358781 
] 

Graham Dumpleton commented on MODPYTHON-94:
---

I thought about the ctypes approach when I proposed the first code I 
referenced. The problem was how you dealt with the complexity of setting up a 
call which had to refer to data within the internal mod_python request object C 
structure which wasn't otherwise visible at the Python layer through the 
mod_python Python request object. Eg.

  variable_value = ssl_var_lookup(
   request_object->request_rec->pool,
   request_object->request_rec->server,
   request_object->request_rec->connection,
   request_object->request_rec,
   variable_name);

It may well be possible, but it seemed to me to be quite messy and quite prone 
to bugs/problems due to the disconnect between the C code and Python code. Ie., 
you change the C structure to add/change stuff and you have to remember to 
change any Python code which is somehow using ctypes and defining offsets 
within the structure. Seemed more trouble than it was worth.
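
To make the maintenance concern concrete, here is a minimal ctypes sketch. The
structures RequestV1 and RequestV2 and their fields are hypothetical stand-ins,
not mod_python's real request object layout. The point is only that the Python
field list has to mirror the C declaration by hand, so a member added on the C
side shifts the offset of everything after it.

```python
import ctypes

# Hypothetical mirror of a C struct; NOT mod_python's real layout.
class RequestV1(ctypes.Structure):
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("request_rec", ctypes.c_void_p),
    ]

# The same struct after a member has been inserted on the C side.
class RequestV2(ctypes.Structure):
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("new_field", ctypes.c_void_p),  # newly inserted member
        ("request_rec", ctypes.c_void_p),
    ]

old_offset = RequestV1.request_rec.offset
new_offset = RequestV2.request_rec.offset

# Python code still using the V1 definition against a V2 binary would
# read request_rec from the wrong place, with no error raised.
print("request_rec offset before:", old_offset, "after:", new_offset)
```

Nothing in Python notices the mismatch; the stale definition simply points at
the wrong bytes until someone remembers to update it.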

 Calling APR optional functions provided by mod_ssl
 --

  Key: MODPYTHON-94
  URL: http://issues.apache.org/jira/browse/MODPYTHON-94
  Project: mod_python
 Type: New Feature
   Components: core
 Versions: 3.2
  Environment: Apache 2
 Reporter: Deron Meranda
  Attachments: modpython4.tex.patch, requestobject.c.patch

 mod_python is not able to invoke APR Optional Functions.  There are
 some cases however where this could be of great benefit.
 For example, consider writing an authentication or authorization handler
 which needs to determine SSL properties (even if to just answer the
 simple question: is the connection SSL encrypted).  The normal way of
 looking in the subprocess_env for SSL_* variables does not work in those
 early handler phases because those variables are not set until the fixup 
 phase.
 The mod_ssl module though does provide both a ssl_is_https() and
 ssl_var_lookup() optional functions which can be used in earlier
 phases.  For example look at how mod_rewrite calls those; using
 the APR_DECLARE_OPTIONAL_FN and APR_RETRIEVE_OPTIONAL_FN
 macros.
 I can see how it might be very hard to support optional functions in
 general because of the C type linkage issue, but perhaps a select few
 could be coded directly into mod_python.
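
 For readers unfamiliar with the mechanism, the idea behind optional functions
 can be sketched in Python as a registry keyed by name: a provider registers a
 function under a string, and a consumer looks it up at run time and has to
 cope with it being absent. This is an analogy only. The helper names
 register_optional_fn and retrieve_optional_fn, and the body of
 fake_ssl_is_https, are hypothetical and are not an APR or mod_python API.

```python
# Registry standing in for the table of optional functions.
_optional_fns = {}

def register_optional_fn(name, fn):
    """Provider side: publish a function under a well-known name."""
    _optional_fns[name] = fn

def retrieve_optional_fn(name):
    """Consumer side: returns None when no provider is loaded."""
    return _optional_fns.get(name)

def fake_ssl_is_https(conn):
    # Stand-in for what an SSL module would answer; conn is a plain dict here.
    return conn.get("ssl", False)

register_optional_fn("ssl_is_https", fake_ssl_is_https)

def is_https(conn):
    # The caller must handle the provider being absent.
    fn = retrieve_optional_fn("ssl_is_https")
    return bool(fn(conn)) if fn is not None else False
```

 The lookup by name is what lets the consumer work whether or not the provider
 module is loaded.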

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Jim Gallacher

Nicolas Lehuen wrote:

Hi,

Is it me or is it quite tiresome to get the URL that called us, or get 
the complete URL that would call another function ?


When performing an external redirect (using mod_python.util.redirect for 
example), we MUST (as per RFC) provide a full URL, not a relative one. 
Instead of util.redirect(req,'/foo/bar.py'), we should write 
util.redirect(req,'https://whatever:8443/foo/bar.py').


The problem is, writing this is always tiresome, as it means building a 
string like this :


def current_url(req):
    req.add_common_vars()
    current_url = []

    # protocol
    if req.subprocess_env.get('HTTPS') == 'on':
        current_url.append('https')
        default_port = 443
    else:
        current_url.append('http')
        default_port = 80
    current_url.append('://')

    # host
    current_url.append(req.hostname)

    # port
    port = req.connection.local_addr[1]
    if port != default_port:
        current_url.append(':')
        current_url.append(str(port))

    # URI
    current_url.append(req.uri)

    return ''.join(current_url)


So I have two questions :

First question, is there a simpler way to do this ? Ironically, when 
using mod_rewrite, you get an environment variable named SCRIPT_URI 
which is precisely what I need (SCRIPT_URL, also added by mod_rewrite, 
is equivalent to req.uri... Don't ask me why). But relying on it isn't 
safe since mod_rewrite isn't always used.


I guess you could just assemble the parts from the req.parsed_uri tuple, 
except that apache doesn't actually fill in parsed_uri. :(


Second question, if there isn't any simpler way to do this, should we 
add it to mod_python ? Either as a function like above in 
mod_python.util, or as a member of the request object (named something 
like url to match the other member named uri, but that's just teasing).


I'm not against it, but for my purposes I think it would be more useful 
for parsed_uri to just work properly.


And third question (in pure Spanish inquisition style) : why is 
req.parsed_uri returning me a tuple full of Nones except for the uri and 
path_info part ?


It comes from apache that way. I sure don't know why though. Maybe we're 
missing some magic apache call that would fill it in?


Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI 
environment variables) using the terms URI and URL to distinguish 
between a full, absolute resource path and a path relative to the 
server, whereas the definition of URLs and URIs is very vague and 
nothing close to this 
(http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we 
save our souls and a lot of saliva by choosing better names ?


Strangely I was reading the cited page just last week, for perhaps the 
100th time. I keep hoping I'll find enlightenment but alas no. The danger 
of choosing new names (ie absolute_thingy or relative_thingy) is that 
we also add another layer of confusion. I'm not saying new names are a 
bad idea, just that we need to be very careful.


OK, OK, fifth question : we made req.filename and other members 
writable. But when those attributes are changed, as Graham noted a while 
ago, the other dependent ones aren't, leading to inconsistencies (for 
example, if you change req.filename, req.canonical_filename isn't 
changed). Should we try to solve this and provide clear definition of 
the various parts of a request for mod_python 3.3 ?


That would make sense. I'm wondering how often people make use of 
req.canonical_filename (CFN*)?  Also, just how would the CFN  be 
adjusted if the user code changes req.filename, since the user is free 
to put any string in there they want? Maybe CFN just gets changed to the 
same string. Hopefully Graham will shed some light on this, since it was 
his use case.


Regards,
Jim

* Because I can't type canonical_filename the same way twice. Stupid 
fingers.


Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Gregory (Grisha) Trubetskoy


On Tue, 29 Nov 2005, Nicolas Lehuen wrote:


def current_url(req):


[snip]



   # host
   current_url.append(req.hostname)


[snip]

This part isn't going to work reliably if you are not using virtual hosts 
and just bind to an IP number. Deciphering the URL is an impossible task - 
I used to have similar code in my applications, but lately I realized that 
it does not work reliably and it is much simpler to just treat it as a 
configuration item...
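
A minimal sketch of the configuration approach, assuming the site's public
base URL is stored under a hypothetical option name BaseURL. In mod_python
such a value could be supplied with a PythonOption directive and read through
req.get_options(); a plain dict stands in for that table here so the example
runs on its own. The function name absolute_url is also made up.

```python
from urllib.parse import urljoin

def absolute_url(options, path):
    """Join a configured base URL with a path, without inspecting the request."""
    base = options["BaseURL"]  # hypothetical option name
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, path)

# Stand-in for the table returned by req.get_options().
options = {"BaseURL": "https://whatever:8443"}
redirect_target = absolute_url(options, "/foo/bar.py")
```

The scheme, host and port then come from the administrator, who knows how the
site is reached from outside, rather than from guesses made per request.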



First question, is there a simpler way to do this ? Ironically, when using
mod_rewrite, you get an environment variable named SCRIPT_URI which is
precisely what I need (SCRIPT_URL, also added by mod_rewrite, is equivalent
to req.uri... Don't ask me why). But relying on it isn't safe since
mod_rewrite isn't always used.


well - here's how it does it.

/*
 *  create the SCRIPT_URI variable for the env
 */

/* add the canonical URI of this URL */
thisserver = ap_get_server_name(r);
port = ap_get_server_port(r);
if (ap_is_default_port(port, r)) {
    thisport = "";
}
else {
    apr_snprintf(buf, sizeof(buf), ":%u", port);
    thisport = buf;
}
thisurl = apr_table_get(r->subprocess_env, ENVVAR_SCRIPT_URL);

/* set the variable */
var = apr_pstrcat(r->pool, ap_http_method(r), "://", thisserver, thisport,
                  thisurl, NULL);
apr_table_setn(r->subprocess_env, ENVVAR_SCRIPT_URI, var);

/* if filename was not initially set,
 * we start with the requested URI
 */
if (r->filename == NULL) {
    r->filename = apr_pstrdup(r->pool, r->uri);
    rewritelog(r, 2, "init rewrite engine with requested uri %s",
               r->filename);
}
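
For comparison with the Python function earlier in the thread, the same
assembly can be sketched in Python. This is a loose transliteration, not
mod_rewrite's code: script_uri and its arguments are hypothetical, the caller
passes in the values that the ap_* calls would supply, and the default-port
table only covers http and https.

```python
# Default ports for the two schemes considered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def script_uri(scheme, server_name, port, url):
    """Build scheme://host[:port]/path, omitting the port when it is the default."""
    if DEFAULT_PORTS.get(scheme) == port:
        thisport = ""
    else:
        thisport = ":%d" % port
    return "".join([scheme, "://", server_name, thisport, url])

plain = script_uri("http", "example.com", 80, "/tests/mptest")
custom = script_uri("https", "whatever", 8443, "/foo/bar.py")
```

The structure is the same as the Python version posted by Nicolas; the
difference lies in where the server name and port come from.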


Second question, if there isn't any simpler way to do this, should we add it
to mod_python ? Either as a function like above in mod_python.util, or as a
member of the request object (named something like url to match the other
member named uri, but that's just teasing).


I don't know... Since the result is going to be half-baked... I think a 
more interesting and mod_python-ish thing to do would be to expose all the 
API's used in the above code (e.g. ap_get_server_name, ap_is_default_port, 
ap_http_method) FIRST, then think about this.



And third question (in pure Spanish inquisition style) : why is
req.parsed_uri returning me a tuple full of Nones except for the uri and
path_info part ?


This is an httpd question most likely...


Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
environment variables) using the terms URI and URL to distinguish
between a full, absolute resource path and a path relative to the server,
whereas the definition of URLs and URIs is very vague and nothing close to
this (http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we
save our souls and a lot of saliva by choosing better names ?


No, we (mod_python) should just use the exact same name that httpd uses. 
If we come up with better names, then it's just going to make it even more 
confusing.



OK, OK, fifth question : we made req.filename and other members writable.
But when those attributes are changed, as Graham noted a while ago, the
other dependent ones aren't, leading to inconsitencies (for example, if you
change req.filename, req.canonical_filename isn't changed). Should we try to
solve this


The solution is to make req.canonical_filename writable too and document 
that if you change req.filename, you may consider changing 
canonical_filename as well and what will happen if you do not.



and provide clear definition of the various parts of a request
for mod_python 3.3 ?


Yes, that'd be good :)

Grisha


Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Gregory (Grisha) Trubetskoy


On Tue, 29 Nov 2005, Jim Gallacher wrote:

I still think the correct place to create the index dictionary is in the 
__init__ phase. Using the dictionary-on-demand idea only improves performance 
on the second access to a form field. For the first access you are still 
iterating through the whole list for each field name.
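
The two lookup strategies under discussion can be sketched as follows. The
classes Field and IndexedFields and the function getlist_by_scan are toy
stand-ins written for this illustration, not the util.FieldStorage code or the
patch attached to the issue.

```python
class Field:
    def __init__(self, name, value):
        self.name = name
        self.value = value

class IndexedFields:
    """Builds a name -> [fields] dictionary once, while the list is filled."""
    def __init__(self, fields):
        self.list = list(fields)
        self.index = {}
        for field in self.list:
            self.index.setdefault(field.name, []).append(field)

    def getlist(self, name):
        # One dictionary lookup, regardless of how many fields exist.
        return [f.value for f in self.index.get(name, [])]

def getlist_by_scan(fields, name):
    # Walks the whole list on every call.
    return [f.value for f in fields if f.name == name]

fields = [Field("a", "1"), Field("b", "2"), Field("a", "3")]
store = IndexedFields(fields)
```

Both give the same answers; the question in the thread is only whether the
cost of the scan is ever large enough to matter.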


I am still not convinced we need an index. I'd like to see some concrete 
proof that we're not engaging in overoptimization here - is this really 
a bottleneck for anyone?


If we're concerned (and I'm not at this point) that FieldStorage is too 
slow, we should just rewrite the whole thing in C :-)


Grisha



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Nick

Jim Gallacher wrote:

Nick wrote:

Just one comment.  It seems like it would be better just to make 
add_method inline, since everything else in __init__ is, and it never 
gets called from anywhere else.


add_method?


Haha, thanks, I haven't had coffee yet.  The add_item method, that is. :)

I also like properties, but doesn't that cause a problem if someone 
chooses to subclass FieldStorage?


It could if you didn't realize it was a property.  But you can always 
override a property with another property.
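
A small sketch of both halves of that answer. The classes Base, Upper and
Naive are hypothetical and have nothing to do with FieldStorage itself: a
subclass can replace a property with its own property, but a subclass that
treats the name as an ordinary attribute and assigns to it gets an
AttributeError, because the property has no setter.

```python
class Base:
    def __init__(self):
        self._items = ["a", "b"]

    @property
    def items(self):
        return list(self._items)

class Upper(Base):
    # Overriding a property with another property works as expected.
    @property
    def items(self):
        return [s.upper() for s in super().items]

class Naive(Base):
    def __init__(self):
        super().__init__()
        # Fails: items is a read-only property on the base class.
        self.items = []

try:
    Naive()
    naive_failed = False
except AttributeError:
    naive_failed = True
```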


Nick


Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Jim Gallacher

Daniel J. Popowich wrote:

Jim Gallacher writes:


Nicolas Lehuen wrote:

Second question, if there isn't any simpler way to do this, should we 
add it to mod_python ? Either as a function like above in 
mod_python.util, or as a member of the request object (named something 
like url to match the other member named uri, but that's just teasing).


I'm not against it, but for my purposes I think it would be more useful 
for parsed_uri to just work properly.



Hear, hear!!  I've wanted parsed_uri to work as expected for quite
some time...I'm actually in a position where I could devote some time
to tracking this down.  If apache doesn't provide it, I think
mod_python should at least fill it in, right? 


+1


Can someone nudge me
in the right direction to start?  Haven't poked around apache source
and/or developer docs in years.


All I can say is grep is your friend. :)

I've found http://docx.webperf.org to be useful. Unfortunately you can
only drill down into the header files, not c files (unless I'm missing
something). I might even be tempted to generate my own local copy of the
apache docs using doxygen so that the c-files get included. I've been
playing with doxygen + mod_python and it's pretty cool.

Searching docx for parse_uri turns up ap_parse_uri.
http://docx.webperf.org/group__APACHE__CORE__PROTO.html#ga44

Grab the src and put grep to work. I'll dig in and help any way I can.

Jim





Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Jim Gallacher

Gregory (Grisha) Trubetskoy wrote:


On Tue, 29 Nov 2005, Jim Gallacher wrote:


Daniel J. Popowich wrote:



Hear, hear!!  I've wanted parsed_uri to work as expected for quite
some time...I'm actually in a position where I could devote some time
to tracking this down.  If apache doesn't provide it, I think
mod_python should at least fill it in, right? 



+1



I don't know what the specific issue is with parsed_uri, if this is a 
mod_python bug it should just be fixed BUT if this is an issue with 
httpd, I don't think we should cover the problem up by having mod_python 
fix it. Since we are part of the HTTP Server project, we should just 
fix it in httpd.


Either way, it should be fixed.

In case anyone is not familiar with the issue, a request for 
http://example.com/tests/mptest?view=form currently gives a tuple that 
looks something like this:


(None, None, None, None, None, None, '/tests/mptest', 'view=form', None)

which is not what we expect. This is what the mod_python docs have to say:

parsed_uri
Tuple. The URI broken down into pieces. (scheme, hostinfo, user, 
password, hostname, port, path, query, fragment). The apache module 
defines a set of URI_* constants that should be used to access elements 
of this tuple. Example:


fname = req.parsed_uri[apache.URI_PATH]

(Read-Only)
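
As a small illustration of indexing by name rather than by position, the
snippet below works on the tuple shown above. The index constants are local
stand-ins defined in the order the documentation lists the elements; only
URI_PATH is named in the quoted docs, so the other names are assumptions, and
real code would take them from the apache module.

```python
# Local stand-ins, in the documented element order.
(URI_SCHEME, URI_HOSTINFO, URI_USER, URI_PASSWORD, URI_HOSTNAME,
 URI_PORT, URI_PATH, URI_QUERY, URI_FRAGMENT) = range(9)

# The tuple reported for http://example.com/tests/mptest?view=form
parsed_uri = (None, None, None, None, None, None,
              '/tests/mptest', 'view=form', None)

path = parsed_uri[URI_PATH]
query = parsed_uri[URI_QUERY]

# Positions that came back empty.
empty = [i for i, part in enumerate(parsed_uri) if part is None]
```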

Jim




Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Daniel J. Popowich

Gregory (Grisha) Trubetskoy writes:

 On Tue, 29 Nov 2005, Nicolas Lehuen wrote:

  If I understand you correctly, req.hostname is not reliable in case where
  virtual hosting is not used. What about server.server_hostname, which seems
  to be used by the code from mod_rewrite you posted below ? Can it be used
  reliably ?

 I don't think so.

 if I do this:

 telnet some.host.com 80

 GET /index.html

 How would apache know what the hostname is?

By the Host header. I've been looking into this issue tonight and
think I have the answers (but it's really late, so I'll save the gory
details for tomorrow). In brief: typically, req.hostname is set from
the Host header and, in fact, when I telnet to apache and issue a GET
by hand, if I don't send the Host header, apache barfs with a 400, Bad
Request, response. (apache 2.0.54, debian testing)

As for the larger issue at hand: the reason req.parsed_uri is not
filled in is because browsers don't send the info in the GET, e.g.,
browsers send this:

GET /index.py?a=b&c=d HTTP/1.1

not

GET http://user:[EMAIL PROTECTED]:80/index.py?a=b&c=d#here HTTP/1.1

if they did, parsed_uri would be filled in. req.unparsed_uri is
whatever word follows GET in the HTTP protocol exchange;
req.parsed_uri is the parsing of that word.

Given the full URI spec:

SCHEME://[USER[:PASSWORD]@]HOSTNAME[:PORT]/PATH?QUERY#FRAGMENT

you can see where eight of the nine elements of the parsed_uri tuple
come from; the ninth, hostinfo, is the combination of
[USER[:PASSWORD]@]HOSTNAME[:PORT] (everything between // and /).

Unfortunately, browsers only send:

/PATH?QUERY

and that's why the path and query are all we ever see in unparsed_uri
and parsed_uri.
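
The same effect can be reproduced outside Apache with Python's own URL
parser. This is only an analogy, since urllib is not the parser httpd uses,
and the user, password and host in the absolute URL are made-up placeholders.
Parsing what a browser typically sends yields a path and a query and nothing
else, while parsing a full URL fills in every part.

```python
from urllib.parse import urlsplit

# What a browser normally puts after GET: path and query only.
origin_form = urlsplit("/index.py?a=b&c=d")

# A full URL, as it would have to appear for every part to be known.
absolute_form = urlsplit(
    "http://user:password@hostname:80/index.py?a=b&c=d#here")

for name, parts in (("origin", origin_form), ("absolute", absolute_form)):
    print(name, parts.scheme, parts.netloc, parts.path,
          parts.query, parts.fragment)
```

The parser cannot invent a scheme or a host that was never in the string it
was handed.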



Again, lots more to share...in the morrow...



Daniel Popowich
---
http://home.comcast.net/~d.popowich/mpservlets/



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Mike Looijmans
Just one comment.  It seems like it would be better just to make 
add_method inline, since everything else in __init__ is, and it 
never gets called from anywhere else.


s/_method/_field/g

The thing I had in mind when I built the add_field method was that I 
could subclass FieldStorage and give the page handler more control over 
what is happening. This was convenient for the thing that got this all 
started: Posting large files. In fact, I intended add_field to replace 
the make_file function.


What I had in mind is that you override the add_field function to get 
context information (e.g. user name, upload directory and such) before 
the file upload comes in, so that you can put the upload where it belongs.
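
A rough sketch of that idea, using a toy class rather than the real
FieldStorage. ToyFieldStorage, UploadAwareStorage, the field names upload_dir
and file, and the destinations attribute are all hypothetical. The base class
calls add_field for each incoming field, and the subclass uses the hook to
remember context that arrived earlier in the form.

```python
class ToyFieldStorage:
    """Toy stand-in: calls add_field once per (name, value) pair, in order."""
    def __init__(self, pairs):
        self.list = []
        for name, value in pairs:
            self.add_field(name, value)

    def add_field(self, name, value):
        self.list.append((name, value))

class UploadAwareStorage(ToyFieldStorage):
    def __init__(self, pairs):
        self.upload_dir = None
        self.destinations = []
        super().__init__(pairs)

    def add_field(self, name, value):
        if name == "upload_dir":
            # Context field seen before the file arrives.
            self.upload_dir = value
        elif name == "file":
            # By now the target directory is already known.
            self.destinations.append((self.upload_dir, value))
        super().add_field(name, value)

storage = UploadAwareStorage([("upload_dir", "/tmp/u"), ("file", "a.bin")])
```

This only works if the context fields precede the file in the submitted form,
which the handler has to arrange.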


With the two callbacks, you can create the same effect, so it was no 
longer necessary.


Also, add_field is referred to at least twice and I don't like 
duplicating code.


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Mike Looijmans

Daniel J. Popowich wrote:


By the Host header. I've been looking into this issue tonight and
think I have the answers (but it's really late, so I'll save the gory
details for tomorrow). In brief: typically, req.hostname is set from
the Host header and, in fact, when I telnet to apache and issue a GET
by hand, if I don't send the Host header, apache barfs with a 400, Bad
Request, response. (apache 2.0.54, debian testing)


It will only do that if you claim to be an HTTP/1.1 client. If you send 
GET / HTTP/1.0

it will not complain about the host header. Sending:
GET / HTTP/1.1
will get you a 400 response, because you MUST supply it (says RFC 2068, 
and whatever superseded that one). There is more you must do to be able 
to call yourself HTTP/1.1 by the way, such as keep-alive connections and 
chunked encoding.
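
The rule can be written down as a small check. This is a sketch of the rule
as described above, not of Apache's request parsing; the function name
missing_required_host is made up, and it looks at nothing except the protocol
version and the presence of a Host header.

```python
def missing_required_host(request_text):
    """True when the request claims HTTP/1.1 but carries no Host header."""
    lines = request_text.split("\r\n")
    parts = lines[0].split()
    # A request line without a version is treated as pre-1.0 here.
    version = parts[2] if len(parts) >= 3 else "HTTP/0.9"

    header_names = set()
    for line in lines[1:]:
        if not line:  # blank line ends the header block
            break
        name, _, _ = line.partition(":")
        header_names.add(name.strip().lower())

    return version == "HTTP/1.1" and "host" not in header_names

http10 = missing_required_host("GET / HTTP/1.0\r\n\r\n")
http11_bare = missing_required_host("GET / HTTP/1.1\r\n\r\n")
http11_host = missing_required_host(
    "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

Only the second request is the case that draws the 400 response.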


--
Mike Looijmans
Philips Natlab / Topic Automation