[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Toshio Kuratomi



On Sep 29, 3:24 pm, Brian Smith [EMAIL PROTECTED] wrote:
 Toshio Kuratomi wrote:
  Graham Dumpleton wrote:
   As to the HTTP request headers, the RFCs say they are effectively
   latin-1. Thus, all HTTP_? variables in WSGI environ can only be
   processed as latin-1 when converting to Unicode.

  Converting these headers to unicode will lead to mangled data
  at times.  Let's say that some web app needs to keep track of
  the referer information for some reason.  If the app is
  referred to from http://localhost/€.html (Euro symbol .html)
  and it is encoded as utf-8 on the server then the server will
  send a header with this sequence of bytes::

    Referer: http://localhost/%e2%82%ac.html

  If mod_wsgi assumes latin-1 and converts that into unicode
  before it hits the app, the app will see this::

    Referer: http://localhost/â%82¬.html

 No, it will leave it as http://localhost/%e2%82%ac.html. It does (or should
 do) the Latin-1-to-Unicode conversion before it decodes URL encoding.

uhm... you're wrong here.  url encoding and decoding operates on
bytes.  unicode is not bytes.  so you can't go from byte string to
unicode and then pass it through url decode.  Or I suppose you can,
but it isn't by any means the opposite of what you did to get the url
escaped bytes so it's pretty senseless.
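A minimal Python 3 sketch of the steps being argued about, assuming the Referer value arrives as the ASCII bytes shown above. Decoding those bytes as latin-1 leaves the %-escapes untouched; percent-decoding then yields bytes, and only the last step needs a charset choice (UTF-8 here is an assumption about how the server encoded the name).

```python
from urllib.parse import unquote, unquote_to_bytes

# The header value as received: the %-escapes are plain ASCII bytes.
raw_referer = b'http://localhost/%e2%82%ac.html'

# Step 1: decode as latin-1. The escapes pass through unchanged.
as_text = raw_referer.decode('latin-1')

# Step 2: undo the %-escapes. The result is bytes, not text.
path_bytes = unquote_to_bytes(as_text)

# Step 3: only now choose a charset (UTF-8 is an assumption here).
correct = path_bytes.decode('utf-8')

# Percent-decoding straight to str with the wrong charset mangles the name.
mangled = unquote(as_text, encoding='latin-1')

print(as_text)           # http://localhost/%e2%82%ac.html
print(path_bytes)        # b'http://localhost/\xe2\x82\xac.html'
print(ascii(correct))    # 'http://localhost/\u20ac.html'
print(ascii(mangled))    # 'http://localhost/\xe2\x82\xac.html'
```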

  Unlike wsgi.input where the application *must* decide how to
  decode the data, you are trying to do automatic encoding of
  data in the wsgi server here.  This will cause tracebacks on
  some unicode string input but not others (which is one of the
  reasons that people hate unicode handling in python-2).  The
  tracebacks occur because latin-1 characters are a subset of
  Unicode characters (note that we're not dealing with
  code-point to byte mapping here, we're dealing with character
  mapping).  So you can always convert latin-1 to unicode.
  But you can't always convert Unicode to latin-1 (which is
  what this automatic conversion would attempt). It's much
  better for the application layer to always hand mod_wsgi byte
  types, never unicode.
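The asymmetry described above can be checked directly. This sketch (the helper name `try_latin1` is hypothetical) shows that any byte string survives a latin-1 round trip, while a str containing a character outside latin-1, such as the euro sign, cannot be encoded.

```python
# Every byte value 0-255 maps to exactly one latin-1 code point, so
# bytes -> str -> bytes through latin-1 is always lossless.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode('latin-1').encode('latin-1')


def try_latin1(text):
    """Return text encoded as latin-1, or None if it has other characters."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        return None


print(round_trip == all_bytes)     # True
print(try_latin1('caf\u00e9'))     # b'caf\xe9'
print(try_latin1('\u20ac'))        # None: the euro sign is outside latin-1
```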

 The HTTP standards mandate Latin-1. Python 3.0 says all strings are Unicode.
 The encoding/decoding is needed to bridge the gap. Treating the HTTP headers 
 as raw sequences of bytes and requiring Python applications to do their own 
 manual decoding/encoding would not be Pythonic and the Python community 
 wouldn't accept it.

I disagree.  You are dealing with byte sequences here so you need to
call them bytes.  This *is* pythonic (as much as you can define that
for a type that hasn't existed before :-).  Look at the WSGI
specification for python-2.  It specifies storing the values in str
type and not in unicode type and that's accepted by the Python
community as Pythonic.

  This takes care of the problem but is somewhat silly.  We're
  basically using latin-1 as a marshalling format for passing
  bytes over the wire.  So we have to convert the unicode to
  bytes as the first step in changing unicode characters
  outside the latin-1 range into bytes that can go over the
  wire.  At that point converting the bytes back to unicode
  pretending they're latin-1 instead of utf-8 is just an extra
  step for no reason.
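A sketch of the marshalling round trip described above, assuming a server that only accepts str and encodes it with latin-1 on output; `smuggle_utf8` is a hypothetical helper name.

```python
def smuggle_utf8(text):
    """Hypothetical helper: encode text as UTF-8, then dress the bytes up
    as a latin-1 str so a latin-1-encoding server passes them through."""
    return text.encode('utf-8').decode('latin-1')


value = smuggle_utf8('\u20ac.html')    # a str holding '\xe2\x82\xac.html'
on_the_wire = value.encode('latin-1')  # what the server would then send

print(on_the_wire)                                   # b'\xe2\x82\xac.html'
print(on_the_wire == '\u20ac.html'.encode('utf-8'))  # True
```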

 Again, I think you are misunderstanding the interaction between URL encoding 
 and character encoding conversion. Mod_wsgi will (should) never do or undo 
 URL-encoding itself for non-ASCII (%80-%FF) sequences.

I think that you are misunderstanding the interaction.  And I think
that % sequences should definitely be decoded by mod_wsgi.  Ending up
with a unicode string containing %encoded sequences is even worse than
the other scenarios I described as the application then has to convert
from unicode to byte string, unquote the url quoting, and then convert
back to unicode.  (Although this is alleviated in python3 by the fact
that urllib.parse.quote()/unquote() take an encoding argument.  So the
extra steps are taken care of by the function).

It would be much better for mod_wsgi to do the url quoting for the
user as converting between bytes and %escape sequences is 100%
automatable.  This is unlike converting between unicode and a sequence
of bytes where something has to decide what the character encoding
is.  So -- WSGI should take care of %encoding because that's a job for
a computer anyway.  WSGI should not take care of the bytes-to-unicode
conversion because it doesn't know what encoding the bytes are in.
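The division of labour proposed above might look like this. Both function names are hypothetical, and UTF-8 as the application's default charset is an assumption. `urllib.parse.unquote_to_bytes()` does the mechanical part; as noted above, Python 3's `urllib.parse.unquote()` also accepts an `encoding` argument if an application prefers to do both steps at once.

```python
from urllib.parse import unquote_to_bytes


def server_side_path(raw_path):
    """Mechanical step a server can always do: undo %-escapes, keep bytes."""
    return unquote_to_bytes(raw_path)


def app_side_path(path_bytes, encoding='utf-8'):
    """Charset step: only the application knows which encoding it used."""
    return path_bytes.decode(encoding)


path = server_side_path(b'/%e2%82%ac.html')
print(path)                        # b'/\xe2\x82\xac.html'
print(ascii(app_side_path(path)))  # '/\u20ac.html'
```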

  I have two files there.  Both are named ½ñ.html (one-half,
  tilde-lowercase-n, .html).  However, one of the filenames is
  encoded with latin-1 and the other with utf-8.  If you switch
  between character encodings for the web page (firefox3:
  View::Character Encoding::UTF-8 vs View::Character
  Encoding::Western (iso 8859-1)) you'll see that you can make
  one or the other show its name correctly.  Why isn't apache
  able to display both correctly at the same time?  It's
  because apache doesn't know what the encoding of the
  filenames is.  The filesystem is
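To make the filename example concrete: the same name produces different byte sequences under the two encodings, and nothing in the bytes themselves records which encoding was used. This sketch only shows the encodings; it does not touch the filesystem.

```python
name = '\u00bd\u00f1.html'   # the filename ½ñ.html

latin1_name = name.encode('latin-1')
utf8_name = name.encode('utf-8')

print(latin1_name)   # b'\xbd\xf1.html'
print(utf8_name)     # b'\xc2\xbd\xc3\xb1.html'

# Showing the UTF-8 name as if it were latin-1 gives mojibake, and the
# latin-1 name is not even valid UTF-8 (undecodable bytes become U+FFFD).
print(ascii(utf8_name.decode('latin-1')))                    # '\xc2\xbd\xc3\xb1.html'
print(ascii(latin1_name.decode('utf-8', errors='replace')))  # '\ufffd\ufffd.html'
```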

[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Toshio Kuratomi



On Sep 29, 4:33 pm, Graham Dumpleton [EMAIL PROTECTED]
wrote:
 2008/9/30 Toshio Kuratomi [EMAIL PROTECTED]:



  For response headers and content, the application can either generate
  bytes and thus control the encoding, or it will fall back to trying to
  convert it as latin-1 if Unicode is supplied, so like wsgi.input, no
  problem there.

  Unlike wsgi.input where the application *must* decide how to decode
  the data, you are trying to do automatic encoding of data in the wsgi
  server here.  This will cause tracebacks on some unicode string input
  but not others (which is one of the reasons that people hate unicode
  handling in python-2).  The tracebacks occur because latin-1
  characters are a subset of Unicode characters (note that we're not
  dealing with code-point to byte mapping here, we're dealing with
  character mapping).  So you can always convert latin-1 to unicode.
  But you can't always convert Unicode to latin-1 (which is what this
  automatic conversion would attempt). It's much better for the
  application layer to always hand mod_wsgi byte types, never unicode.

 The amendment page says:

   When running under Python 3, applications SHOULD produce bytes
   output and headers

 So, the ideal situation is that the application would always produce
 bytes and so it is the application which is supposed to deal with it.

 That mod_wsgi falls back to converting any Unicode strings to bytes is
 a fail-safe, as dictated by:

   When running under Python 3, servers and gateways MUST accept
   strings as application output or headers, under the existing rules (i.e.,
   s.encode('latin-1') must convert the string to bytes without an
   exception)

 and is more to protect lazy programmers, plus to make it easier to port
 WSGI applications from Python 2.X.
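A sketch of what the quoted fail-safe rule implies for a server, assuming a hypothetical helper `to_wire_bytes` that the server applies to each output chunk or header value. Bytes pass through unchanged; a str is accepted only if `s.encode('latin-1')` succeeds.

```python
def to_wire_bytes(value):
    """Hypothetical server-side helper following the quoted rule:
    bytes pass through; str is accepted only if it encodes as latin-1."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Raises UnicodeEncodeError for any character outside latin-1.
        return value.encode('latin-1')
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


print(to_wire_bytes(b'Hello'))       # b'Hello'
print(to_wire_bytes('text/plain'))   # b'text/plain'
```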

So there are two things here:
1) Maybe I'm misunderstanding some code but I thought mod_wsgi was
decoding bytes going out to the app.  If that's not the case and
mod_wsgi is only handing byte strings to the apps then that's fine.
(I note that this interaction isn't specified in the Amendment which
goes along with your general feeling on the problems with the WSGI-
spec writing process.)

2) pje said that accepting unicode str here would make it easier to
port WSGI applications but that's actually not true.  In python-2.x,
you are only supposed to pass byte strings (py-2.x str) so there's no
problems.  When those str's are converted to unicode str in py3.x, you
have to rewrite your code so you aren't passing non-latin-1
characters.  At that point, there's zero incentive to pass a sanitized
unicode string to the wsgi server as you had to go through the byte
type in order to get there (unless you misunderstand the WSGI spec and
think it wants you to send py-3.x str type.)

As for protecting lazy programmers... I'd argue that it's much better
to throw an exception immediately upon receiving a unicode type rather
than waiting until your app starts getting popular and you suddenly
have transient errors due to people occasionally submitting data with
non-latin-1 characters.
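For comparison, the stricter policy argued for here could be sketched as follows. This is again a hypothetical helper, not what the amendment specifies: anything other than bytes is rejected on the first request, whatever characters it contains.

```python
def to_wire_bytes_strict(value):
    """Hypothetical stricter policy: accept only bytes, so a str shows up as
    an error on the first request instead of on the first non-latin-1 one."""
    if not isinstance(value, bytes):
        raise TypeError('WSGI output must be bytes, got %s' % type(value).__name__)
    return value


try:
    to_wire_bytes_strict('Hello')
except TypeError as exc:
    print(exc)   # WSGI output must be bytes, got str
```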

 In other words, your application is the one who should be dealing with
 it in the first place if you want to be sure about what is being
 produced.

+100

 It only becomes an issue where the WSGI application hasn't
 done what it really should have done.

As long as mod_wsgi is only converting unicode to bytes and not
converting bytes to unicode, this is true.

-Toshio
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
modwsgi group.
To post to this group, send email to modwsgi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en
-~--~~~~--~~--~--~---



[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Toshio Kuratomi



On Sep 29, 4:38 pm, Graham Dumpleton [EMAIL PROTECTED]
wrote:

 As to this whole discussion, as interesting as it is, there is
 nothing I can do about it. It really needs to be brought up on the
 Python WEB-SIG, where I originally raised the issue of Python 3.0
 support for WSGI. I can only implement whatever consensus comes out of
 discussion on the Python WEB-SIG, given that they don't want to come out
 with an official revised specification for WSGI.

So I have a couple questions:

Do you agree with or disagree with my analysis that byte type is the
ideal going in and out of WSGI?

Do you agree that pje's argument as to why unicode strings should be
accepted is specious?

If you agree on those, I'll start a new argument on python-web-sig and
see if I can get this changed.  There's a high probability that it'll
just end with pje and I disagreeing with each other but I'll try my
hand as long as someone else who's been implementing WSGI servers
thinks that it's the correct approach.

Thanks!
-Toshio



[modwsgi] Re: Running roundup directly under mod_wsgi possible?

2008-09-30 Thread Van Gale

Graham Dumpleton wrote:
 
 Just make sure mod_wsgi is working first by following instructions in:
 
   http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide
 
 Then just substitute that snippet for the hello world script.

Yeah, I'm already running about a dozen vhosts under apache, mod_wsgi,
and django... some with pretty complex conf files.

Anyway, the tracker name needing to be full path was the problem, so
it's working now.  Thanks so much for the tip!




[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Graham Dumpleton

2008/9/30 Toshio Kuratomi [EMAIL PROTECTED]:



 On Sep 29, 4:33 pm, Graham Dumpleton [EMAIL PROTECTED]
 wrote:
 2008/9/30 Toshio Kuratomi [EMAIL PROTECTED]:



  For response headers and content, the application can either generate
  bytes and thus control the encoding, or it will fall back to trying to
  convert it as latin-1 if Unicode is supplied, so like wsgi.input, no
  problem there.

  Unlike wsgi.input where the application *must* decide how to decode
  the data, you are trying to do automatic encoding of data in the wsgi
  server here.  This will cause tracebacks on some unicode string input
  but not others (which is one of the reasons that people hate unicode
  handling in python-2).  The tracebacks occur because latin-1
  characters are a subset of Unicode characters (note that we're not
  dealing with code-point to byte mapping here, we're dealing with
  character mapping).  So you can always convert latin-1 to unicode.
  But you can't always convert Unicode to latin-1 (which is what this
  automatic conversion would attempt). It's much better for the
  application layer to always hand mod_wsgi byte types, never unicode.

 The amendment page says:

   When running under Python 3, applications SHOULD produce bytes
   output and headers

 So, the ideal situation is that the application would always produce
 bytes and so it is the application which is supposed to deal with it.

 That mod_wsgi falls back to converting any Unicode strings to bytes is
 a fail-safe, as dictated by:

   When running under Python 3, servers and gateways MUST accept
   strings as application output or headers, under the existing rules (i.e.,
   s.encode('latin-1') must convert the string to bytes without an
   exception)

 and is more to protect lazy programmers, plus to make it easier to port
 WSGI applications from Python 2.X.

 So there are two things here:
 1) Maybe I'm misunderstanding some code but I thought mod_wsgi was
 decoding bytes going out to the app.  If that's not the case and
 mod_wsgi is only handing byte strings to the apps then that's fine.
 (I note that this interaction isn't specified in the Amendment which
 goes along with your general feeling on the problems with the WSGI-
 spec writing process.)

I thought I had made it clear enough and that the proposed amendments
were also clear on this.

The wsgi.input stream which contains the request content is 'bytes'.
Thus it is not touched by mod_wsgi. The amendments say:

  When running under Python 3, servers MUST make wsgi.input a
  binary (byte) stream

The amendments do, though, also say:

  When running under Python 3, servers MUST provide CGI HTTP variables
  as strings, decoded from the headers using HTTP standard encodings
  (i.e. latin-1 + RFC 2047) (Open question: are there any CGI or WSGI
  variables that should NOT be strings?)

Thus, mod_wsgi does convert the CGI variables (i.e., the translated
HTTP headers) in the WSGI environment dictionary into Unicode strings
using latin-1 encoding.
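A minimal sketch of that conversion, assuming raw header name/value pairs arrive as bytes. The helper name is hypothetical, and RFC 2047 decoding and joining of repeated headers are deliberately left out. Because latin-1 maps every byte to one character, an application that knows better can always recover the original bytes with `.encode('latin-1')`.

```python
def cgi_environ_from_headers(raw_headers):
    """Hypothetical sketch: turn (name, value) byte pairs into HTTP_* CGI
    variables, decoding every value as latin-1 (never fails, loses nothing).
    RFC 2047 decoding and joining of repeated headers are left out; a
    repeated header simply overwrites the earlier one here."""
    environ = {}
    for name, value in raw_headers:
        key = 'HTTP_' + name.decode('latin-1').upper().replace('-', '_')
        environ[key] = value.decode('latin-1')
    return environ


environ = cgi_environ_from_headers(
    [(b'Referer', b'http://localhost/\xe2\x82\xac.html')])

# An application that knows the value was UTF-8 can undo the latin-1 step.
referer = environ['HTTP_REFERER'].encode('latin-1').decode('utf-8')
print(ascii(referer))   # 'http://localhost/\u20ac.html'
```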

As I pointed out, there were only a few variables in there which were
of concern. Brian has pointed out that the request URI has to be ASCII
characters, but there possibly still is an open question about how
encoding of non-ASCII characters works in practice. We just need to do
some actual tests to see what happens and whether there is a problem.

Thus we are possibly down to SCRIPT_FILENAME, given that it reflects
a file system path. Again, we just need to do some actual tests to see
what happens, remembering that Apache is going to dictate, in the main,
how things work.

 2) pje said that accepting unicode str here would make it easier to
 port WSGI applications but that's actually not true.  In python-2.x,
 you are only supposed to pass byte strings (py-2.x str) so there's no
 problems.  When those str's are converted to unicode str in py3.x, you
 have to rewrite your code so you aren't passing non-latin-1
 characters.  At that point, there's zero incentive to pass a sanitized
 unicode string to the wsgi server as you had to go through the byte
 type in order to get there (unless you misunderstand the WSGI spec and
 think it wants you to send py-3.x str type.)

 As for protecting lazy programmers... I'd argue that it's much better
 to throw an exception immediately upon receiving a unicode type rather
 than waiting until your app starts getting popular and you suddenly
 have transient errors due to people occasionally submitting data with
 non-latin-1 characters.

My feeling was that the fallback to converting to bytes using latin-1
was there so that simple applications would still work. For example,
the hello world application:

  def application(environ, start_response):
      status = '200 OK'
      output = 'Hello World!'

      response_headers = [('Content-type', 'text/plain'),
                          ('Content-Length', str(len(output)))]
      start_response(status, response_headers)

      return [output]

works in both Python 2.X and 3.0 without change.
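For comparison, a variant of the same application that follows the amendment's SHOULD and produces a bytes body. This is a sketch: only the body is changed, while the status and header values stay as plain str, which the MUST-accept rule quoted earlier covers since they are pure ASCII. The `b''` literal is also accepted by Python 2.6, where it is simply a str.

```python
def application(environ, start_response):
    status = '200 OK'
    # The body is produced as bytes, so the server never has to guess.
    output = b'Hello World!'

    # Content-Length counts bytes, which len() of a bytes object gives directly.
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]
```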

Larger applications such as Django 

[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Graham Dumpleton

Can we stop with the "mod_wsgi should do this" or "mod_wsgi should do
that" arguments? The Apache/mod_wsgi module is just one implementation
of the WSGI specification. When talking about this, you need to look at
the bigger picture: what other implementations exist, and how they all
work and interact with the web server they use.

Take CGI for example. If you are using a CGI-WSGI adapter, the WSGI
environment will come in through os.environ. If you run Python 3.0 and
look at os.environ you will get:

Python 3.0rc1 (r30rc1:66499, Sep 18 2008, 21:39:06)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['PATH']
'/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/ose/bin:/usr/local/bin:/Users/grahamd/bin'
>>> type(os.environ['PATH'])
<class 'str'>

So, os.environ already holds values as Unicode string objects and not
bytes. Thus there is no chance of them being passed to the application
as bytes.

How they get to become Unicode strings depends on the platform. For
Windows it uses:

  PyUnicode_FromWideChar()

So, input is Unicode to begin with.

On UNIX boxes it uses:

  PyUnicode_FromString()

which decodes the value as UTF-8, since that is the encoding PyUnicode_FromString() expects.

Anyway, you are already stopped from communicating bytes to the WSGI
application. One could say that the proposed amendments to the
specification for Python 3.0 don't even consider this case, where the
conversion has already been done for you.
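Later Python 3 releases (3.2 and up, not the 3.0rc1 shown above) added `os.environb` on POSIX systems and `os.fsencode()`, which would let a CGI-WSGI adapter get back at the raw bytes. A sketch, with a hypothetical helper name:

```python
import os


def environ_value_as_bytes(name):
    """Hypothetical helper: raw bytes of an environment variable, or None."""
    if hasattr(os, 'environb'):   # POSIX, Python 3.2+
        return os.environb.get(os.fsencode(name))
    value = os.environ.get(name)  # e.g. Windows, where environb is absent
    return None if value is None else os.fsencode(value)


path_bytes = environ_value_as_bytes('PATH')
```

On POSIX `os.environ` and `os.environb` share the same underlying data, so either view reflects changes made through the other.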

Anyway, I have to leave off for now as I have to go home. As I sort of
suggest above, keep in mind that the proposed amendments are trying to
find a compromise that works for many hosting environments. Thus,
although you may ideally want bytes everywhere, that may not work in
practice.

Graham



[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Pigletto

Today I was not able to start my application as I got segmentation
faults constantly.
I've attached gdb and that is the result:

(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212216416 (LWP 29850)]
PyErr_Occurred () at Python/errors.c:80
80  Python/errors.c: No such file or directory.
in Python/errors.c
(gdb) bt
#0  PyErr_Occurred () at Python/errors.c:80
#1  0x002ce167 in _PyObject_GC_Malloc (basicsize=40) at Modules/gcmodule.c:1326
#2  0x002ce21c in _PyObject_GC_NewVar (tp=0x3083c0, nitems=7) at Modules/gcmodule.c:1352
#3  0x00267c33 in PyTuple_New (size=7) at Objects/tupleobject.c:68
#4  0x0041cdc0 in ?? ()
#5  0x0007 in ?? ()
#6  0x001c in ?? ()
#7  0xb7beab18 in ?? ()
#8  0x0041cd4e in ?? ()
#9  0xb7be9af8 in ?? ()
#10 0xb758b22c in ?? ()
#11 0xb7be9b80 in ?? ()
#12 0x0042fed4 in ?? ()
#13 0x0042fed4 in ?? ()
#14 0xb7be9acc in ?? ()
#15 0xb7be9a2c in ?? ()
#16 0xfbad8001 in ?? ()
#17 0xb7be9ca0 in ?? ()
#18 0xb7be9ca0 in ?? ()
#19 0xb7be9ca0 in ?? ()
#20 0xb7be9ca0 in ?? ()
#21 0x0042fed4 in ?? ()
#22 0x08098608 in apr_bucket_type_eos ()
#23 0x09ba7920 in ?? ()
#24 0x002c in ?? ()
#25 0x in ?? ()
#26 0x0413 in ?? ()
#27 0x in ?? ()
(gdb) thread apply all bt

Thread 4 (Thread -1211159648 (LWP 29848)):
#0  0x00ad57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00bb33b1 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2  0x0097826b in apr_sleep (t=29953) at time/unix/time.c:246
#3  0x00a76110 in wsgi_monitor_thread (thd=0x9a93420, data=0x9a92dd0) at mod_wsgi.c:8367
#4  0x0097783c in dummy_worker (opaque=0xfdfe) at threadproc/unix/thread.c:142
#5  0x00c723cc in start_thread () from /lib/tls/libpthread.so.0
#6  0x00bba96e in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -1211688032 (LWP 29849)):
#0  0x00ad57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00bb33b1 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2  0x0097826b in apr_sleep (t=100) at time/unix/time.c:246
#3  0x00a75f6a in wsgi_deadlock_thread (thd=0x9a93440, data=0x9a92dd0) at mod_wsgi.c:8279
#4  0x0097783c in dummy_worker (opaque=0xfdfe) at threadproc/unix/thread.c:142
#5  0x00c723cc in start_thread () from /lib/tls/libpthread.so.0
#6  0x00bba96e in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -1212216416 (LWP 29850)):
#0  PyErr_Occurred () at Python/errors.c:80
#1  0x002ce167 in _PyObject_GC_Malloc (basicsize=40) at Modules/gcmodule.c:1326
#2  0x002ce21c in _PyObject_GC_NewVar (tp=0x3083c0, nitems=7) at Modules/gcmodule.c:1352
#3  0x00267c33 in PyTuple_New (size=7) at Objects/tupleobject.c:68
#4  0x0041cdc0 in ?? ()
#5  0x0007 in ?? ()
#6  0x001c in ?? ()
#7  0xb7beab18 in ?? ()
#8  0x0041cd4e in ?? ()
#9  0xb7be9af8 in ?? ()
#10 0xb758b22c in ?? ()
#11 0xb7be9b80 in ?? ()
#12 0x0042fed4 in ?? ()
#13 0x0042fed4 in ?? ()
#14 0xb7be9acc in ?? ()
#15 0xb7be9a2c in ?? ()
#16 0xfbad8001 in ?? ()
#17 0xb7be9ca0 in ?? ()
#18 0xb7be9ca0 in ?? ()
#19 0xb7be9ca0 in ?? ()
#20 0xb7be9ca0 in ?? ()
#21 0x0042fed4 in ?? ()
#22 0x08098608 in apr_bucket_type_eos ()
#23 0x09ba7920 in ?? ()
#24 0x002c in ?? ()
#25 0x in ?? ()
#26 0x0413 in ?? ()
#27 0x in ?? ()

---Type <return> to continue, or q <return> to quit---
Thread 1 (Thread -1208453440 (LWP 29847)):
#0  0x00ad57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00c787c7 in do_sigwait () from /lib/tls/libpthread.so.0
#2  0x00c7888f in sigwait () from /lib/tls/libpthread.so.0
#3  0x009775ea in apr_signal_thread (signal_handler=0xa75e30 <wsgi_check_signal>) at threadproc/unix/signals.c:383
#4  0x00a76b61 in wsgi_start_process (p=0x9a0d0a8, daemon=0x9a92dd0) at mod_wsgi.c:8483
#5  0x00a7707a in wsgi_manage_process (reason=0, data=0x9a92dd0, status=11) at mod_wsgi.c:7708
#6  0x009703c8 in apr_proc_other_child_alert (proc=0xbfea8f80, reason=0, status=11) at misc/unix/otherchild.c:115
#7  0x080817ad in ap_mpm_run (_pconf=0x9a0d0a8, plog=0x9a3b160, s=0x9a0ef48) at worker.c:1611
#8  0x08061d9c in main (argc=3, argv=0xbfea90e4) at main.c:730
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
PyErr_Occurred () at Python/errors.c:80
80  in Python/errors.c
(gdb) cont
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit


After switching to WSGIApplicationGroup %{GLOBAL} my application
started, but I have a few more applications on this apache instance so I
can't use this kind of setup.
Is there anything interesting in the above gdb log? Any other commands
that I can use next time?

--
Maciej Wisniowski

[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Graham Dumpleton

Not particularly useful unfortunately.

The next thing would be to determine whether the crash happens as a result
of importing the WSGI script file itself, or due to the call of the WSGI
application.

Thus, at the head of the WSGI script file add:

  import sys
  print >> sys.stderr, "START OF WSGI SCRIPT FILE"

and at the end of the WSGI script file add:

  print >> sys.stderr, "END OF WSGI SCRIPT FILE"

If it isn't crashing at load of the WSGI script file, both should appear
in the Apache error log.

If it does crash, add more debug output like that to ascertain which
module being imported causes it to crash.

If that is a big module, then you need to recursively work out what
modules that module imports, do those imports at the start of the WSGI
script file, and try to narrow down which module causes the crash.

I can't remember, but will test later, whether one can set an
environment variable to force Python to log all imports. That would
help narrow it down quicker.
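As a stopgap for logging imports, one option is to wrap `builtins.__import__` from the top of the WSGI script file. This sketch is written for Python 3 (in Python 2 the module is called `__builtin__`), and the helper name is hypothetical. Python's own `-v` flag and `PYTHONVERBOSE` environment variable also trace imports, but this sketch does not assume they take effect in an interpreter embedded by mod_wsgi.

```python
import builtins
import sys


def install_import_logger(log):
    """Wrap builtins.__import__ so every import is appended to log and
    written to stderr. Returns a function that restores the original."""
    original_import = builtins.__import__

    def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
        log.append(name)
        sys.stderr.write('IMPORT %s\n' % name)
        return original_import(name, globals, locals, fromlist, level)

    builtins.__import__ = logging_import

    def restore():
        builtins.__import__ = original_import

    return restore
```

The last name written before the crash points at the import to investigate.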

The other option, since it works in %{GLOBAL}, is that once everything is
imported, iterate over the modules in sys.modules, find all that have
__file__ referencing a .so file, and print them out. That will tell you
which C extension modules are being used. Standard ones should be okay,
but third-party ones would be worth a closer look.
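The sys.modules scan described above might look like this sketch. The helper name is hypothetical, and treating any `__file__` ending in `.so` or `.pyd` as a C extension is an approximation. Run it from the WSGI script file after the application's imports have completed, so the output lands in the Apache error log.

```python
import sys


def extension_module_files():
    """Return sorted (module name, file) pairs for loaded modules whose
    __file__ looks like a compiled extension (.so or .pyd)."""
    found = []
    for name, module in list(sys.modules.items()):
        # Some entries can be None or lack __file__ (e.g. built-in modules).
        filename = getattr(module, '__file__', None)
        if filename and filename.endswith(('.so', '.pyd')):
            found.append((name, filename))
    return sorted(found)


for name, filename in extension_module_files():
    sys.stderr.write('%s %s\n' % (name, filename))
```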

More later.

Graham


[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Graham Dumpleton

2008/9/30 Pigletto [EMAIL PROTECTED]:
 After switching to WSGIApplicationGroup %{GLOBAL} my application
 started, but I have a few more applications on this apache instance so I
 can't use this kind of setup.

Can you explain to me how WebFaction process/memory limits work?

If you don't have issues with the number of processes, only with overall
memory usage, then create a separate daemon process group for each
application, with each one forced to run in the main interpreter of its
own process. Thus:

<VirtualHost *:2867>

 ServerName my-domain.xyz

 WSGIDaemonProcess rek-prod-app-1 user=xyz group=xyz processes=2 threads=1 \
   maximum-requests=500 inactivity-timeout=7200 stack-size=524288 \
   display-name=%{GROUP}

 WSGIScriptAlias / /home2/(...)/rek_project-1.wsgi

 <Directory /home2/(...)/rek_project-1/>
   WSGIProcessGroup rek-prod-app-1
   WSGIApplicationGroup %{GLOBAL}
   Order deny,allow
   Allow from all
 </Directory>

 WSGIDaemonProcess rek-prod-app-2 user=xyz group=xyz processes=2 threads=1 \
   maximum-requests=500 inactivity-timeout=7200 stack-size=524288 \
   display-name=%{GROUP}

 WSGIScriptAlias /suburl /home2/(...)/rek_project-2.wsgi

 <Directory /home2/(...)/rek_project-2/>
   WSGIProcessGroup rek-prod-app-2
   WSGIApplicationGroup %{GLOBAL}
   Order deny,allow
   Allow from all
 </Directory>

</VirtualHost>

This would end up with similar memory usage, the difference being that
the application instances are in separate processes rather than
separate sub-interpreters of the same process.
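To check that the split actually took effect, a throwaway WSGI script can report which daemon process and application group served each request. Below is a minimal Python 3 sketch. It assumes the `mod_wsgi.process_group` and `mod_wsgi.application_group` environ keys that mod_wsgi provides, and it assumes that an empty application group is how the main interpreter (%{GLOBAL}) gets reported. The two applications should then show different process groups and never share a pid.

```python
import os


def application(environ, start_response):
    # mod_wsgi adds these keys to the environ; fall back to '<unknown>' when
    # the script runs under some other WSGI server.
    process_group = environ.get('mod_wsgi.process_group', '<unknown>')
    # Assumption: '' here means the main interpreter (%{GLOBAL}).
    application_group = environ.get('mod_wsgi.application_group', '<unknown>')
    body = 'pid=%d process_group=%r application_group=%r\n' % (
        os.getpid(), process_group, application_group)
    output = body.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(output)))])
    return [output]
```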

Graham

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
modwsgi group.
To post to this group, send email to modwsgi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en
-~--~~~~--~~--~--~---



[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Graham Dumpleton

The BaseHTTPRequestHandler in http.server of Python 3.0 also only
makes headers available as Unicode (latin-1).

    headers = []
    while True:
        line = self.rfile.readline()
        headers.append(line)
        if line in (b'\r\n', b'\n', b''):
            break
    hfile = io.StringIO(b''.join(headers).decode('iso-8859-1'))
    self.headers = email.parser.Parser(_class=self.MessageClass).parse(hfile)

Thus, any WSGI server based on that would have no chance of getting
access to headers in byte form.
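The excerpt decodes the header block with ISO-8859-1. That codec maps each of the 256 byte values to the code point with the same number, so the decode can never fail and never loses information. An application handed such a str can re-encode it to get the exact original bytes back, which is why latin-1 can serve as a marshalling format. Below is a minimal Python 3 sketch; the function names are hypothetical.

```python
def decode_header_bytes(raw: bytes) -> str:
    # ISO-8859-1 maps every byte 0x00-0xFF to the code point U+0000-U+00FF
    # with the same value, so this decode can never raise.
    return raw.decode('iso-8859-1')


def recover_header_bytes(text: str) -> bytes:
    # Inverse of decode_header_bytes(): any str it produced re-encodes exactly.
    return text.encode('iso-8859-1')


# A raw (not %-escaped) UTF-8 Referer value, as in the example in this thread.
raw = b'http://localhost/\xe2\x82\xac.html'
seen_by_app = decode_header_bytes(raw)        # looks mangled to a human
original = recover_header_bytes(seen_by_app)  # the exact bytes again
url = original.decode('utf-8')                # 'http://localhost/€.html'
```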

Graham

2008/9/30 Graham Dumpleton [EMAIL PROTECTED]:
 Can we stop with the "mod_wsgi should do this" or "mod_wsgi should do
 that"? The Apache/mod_wsgi module is just one implementation of the
 WSGI specification. When talking about this, you need to look at the
 bigger picture: what other implementations exist, and how they all
 work and interact with the web servers they use.

 Take CGI for example. If you are using a CGI-WSGI adapter, the WSGI
 environment will come in through os.environ. If you run Python 3.0 and
 look at os.environ you will get:

 Python 3.0rc1 (r30rc1:66499, Sep 18 2008, 21:39:06)
 [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.environ['PATH']
 '/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/ose/bin:/usr/local/bin:/Users/grahamd/bin'
 >>> type(os.environ['PATH'])
 <class 'str'>

 So, os.environ already holds values as Unicode string objects and not
 bytes. Thus there is no chance of them being passed to the application
 as bytes.

 How they become Unicode strings depends on the platform. For
 Windows it uses:

  PyUnicode_FromWideChar()

 So, input is Unicode to begin with.

 On UNIX boxes it uses:

  PyUnicode_FromString()

 which presumably means it uses the default system encoding, whatever that might be.

 Anyway, you are already prevented from communicating bytes to the WSGI
 application. One could say that the proposed amendments to the
 specification for Python 3.0 don't even consider this case, where the
 conversion has already been done for you.

 Anyway, I have to leave off for now as I have to go home. As I sort of
 suggest above, keep in mind that the proposed amendments are trying to
 find a compromise that works for many hosting environments. Thus,
 although you may ideally want bytes everywhere, that may not work in
 practice.

 Graham



[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Clodoaldo Pinto Neto
2008/9/30 Toshio Kuratomi [EMAIL PROTECTED]:



 On Sep 29, 3:24 pm, Brian Smith [EMAIL PROTECTED] wrote:
 Toshio Kuratomi wrote:
  Graham Dumpleton wrote:
   As to the HTTP request headers, the RFCs say they are effectively
   latin-1. Thus, all HTTP_? variables in WSGI environ can only be
   processed as latin-1 when converting to Unicode.

  Converting these headers to unicode will lead to mangled data
  at times.  Let's say that some web app needs to keep track of
  the referer information for some reason.  If the app is
  referred to from http://localhost/€.html (Euro symbol.html)
  and it is encoded as utf-8 on the server then the server will
  send a header with this sequence of bytes::

    Referer: http://localhost/%e2%82%ac.html

  If mod_wsgi assumes latin-1 and converts that into unicode
  before it hits the app, the app will see this::

    Referer: http://localhost/â%82¬.html

 No, it will leave it as http://localhost/%e2%82%ac.html. It does (or should 
 do) the Latin-1-to-Unicode conversion before it decodes URL encoding.

 uhm... you're wrong here.  url encoding and decoding operates on
 bytes.  unicode is not bytes.  so you can't go from byte string to
 unicode and then pass it through url decode.  Or I suppose you can,
 but it isn't by any means the opposite of what you did to get the url
 escaped bytes so it's pretty senseless.

I tested that URL with Firefox and Opera on Linux (UTF-8), and what
happens is that Firefox does what Brian says. But when testing Firefox on
Windows XP, it replaces € with %80, while IE6 changes € to %e2%82%ac.
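These results show that the same character can reach the server as different byte sequences: %80 is € in Windows-1252, and %e2%82%ac is € in UTF-8. Percent-decoding to bytes is mechanical, but choosing a charset for those bytes is a guess. Below is one possible heuristic, sketched in Python 3. It is not something mod_wsgi or WSGI prescribes, and the function name and the order of charsets are assumptions.

```python
from urllib.parse import unquote_to_bytes


def decode_path(quoted, charsets=('utf-8', 'cp1252')):
    """Percent-decode `quoted`, then guess a charset for the resulting bytes."""
    # Step 1 is mechanical: %XX escapes become raw bytes, no charset involved.
    raw = unquote_to_bytes(quoted)
    # Step 2 is a guess: try each candidate charset in order.
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte, so it works as a lossless last resort.
    return raw.decode('latin-1')
```

Trying UTF-8 first is the usual choice because Windows-1252 text is rarely valid UTF-8 by accident, but it remains a heuristic.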

  Unlike wsgi.input where the application *must* decide how to
  decode the data, you are trying to do automatic encoding of
  data in the wsgi server here.  This will cause tracebacks on
  some unicode string input but not others (which is one of the
  reasons that people hate unicode handling in python-2).  The
  tracebacks occur because latin-1 characters are a subset of
  Unicode characters (note that we're not dealing with
  code-point to byte mapping here, we're dealing with character
  mapping).  So you can always convert latin-1 to unicode.
  But you can't always convert Unicode to latin-1 (which is
  what this automatic conversion would attempt). It's much
  better for the application layer to always hand mod_wsgi byte
  types, never unicode.

 The HTTP standards mandate Latin-1. Python 3.0 says all strings are Unicode. 
 The encoding/decoding is needed to bridge the gap. Treating the HTTP headers 
 as raw sequences of bytes and requiring Python applications to do their own 
 manual decoding/encoding would not be Pythonic and the Python community 
 wouldn't accept it.

 I disagree.  You are dealing with byte sequences here so you need to
 call them bytes.  This *is* pythonic (as much as you can define that
 for a type that hasn't existed before :-).  Look at the WSGI
 specification for python-2.  It specifies storing the values in str
 type and not in unicode type and that's accepted by the Python
 community as Pythonic.

  This takes care of the problem but is somewhat silly.  We're
  basically using latin-1 as a marshalling format for passing
  bytes over the wire.  So we have to convert the unicode to
  bytes as the first step in changing unicode characters
  outside the latin-1 range into bytes that can go over the
  wire.  At that point converting the bytes back to unicode
  pretending they're latin-1 instead of utf-8 is just an extra
  step for no reason.

 Again, I think you are misunderstanding the interaction between URL encoding 
 and character encoding conversion. Mod_wsgi will (should) never do or undo 
 URL-encoding itself for non-ASCII (%80-%FF) sequences.

 I think that you are misunderstanding the interaction.  And I think
 that % sequences should definitely be done by mod_wsgi.  Ending up
 with a unicode string containing %encoded sequences is even worse than
 the other scenarios I described as the application then has to convert
 from unicode to byte string, unquote the url quoting, and then convert
 back to unicode.  (Although this is alleviated in python3 by the fact
 that urllib.parse.quote()/unquote() take an encoding argument.  So the
 extra steps are taken care of by the function).

 It would be much better for mod_wsgi to do the url quoting for the
 user as converting between bytes and %escape sequences is 100%
 automatable.  This is unlike converting between unicode and a sequence
 of bytes where something has to decide what the character encoding
 is.  So -- WSGI should take care of %encoding because that's a job for
 a computer anyway.  WSGI should not take care of the byte/unicode
 conversion because it doesn't know what encoding the bytes are in.

  I have two files there.  Both are named  ½ñ.html. (one-half
  tilde- lowercase-n .html).  However one of the filenames is
  encoded with
  latin-1 and the other with utf-8.  If you switch between
  character encodings for the web page (firefox3:
  View::Character Encoding::UTF-8 vs View::Character

[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Pigletto

Now, again, my application is working with the same setup as before
(without GLOBAL). I don't know why it started without a segfault this
time. Nothing has changed.
I should mention that the reason I was not able to start my application
this morning was that my memory usage was over the limit (before this I
was disconnected while gdb'ing my app on another Apache instance, and the
gdb process hung using too much memory), so WebFaction killed my
processes. After my processes were killed I had to start everything
again, and I was not able to get one of my apps running (as you have
seen already).
So, the important thing is that there were no changes in the application
code and no changes in the Apache configuration. Currently it works again
and I can't do more debugging: it doesn't want to segfault.


I've added some print statements as you suggested, but I think that the
WSGI script was imported properly when the segmentation fault occurred,
because LoggingMiddleware had written empty oheaders.. and ocontent..
files.


 Can you explain to me how WebFaction process/memory limits work?
There are no limits on the number of processes, only on memory usage.

 If you don't have issues with number of processes and only overall
 memory usage, then create a separate daemon process group for each
 application with it being forced to run in main interpreter of its own
 process. Thus:
  <Directory /home2/(...)/rek_project-2/>
    WSGIProcessGroup rek-prod-app-2
    WSGIApplicationGroup %{GLOBAL}
    Order deny,allow
    Allow from all
  </Directory>

 </VirtualHost>

 This would end up with similar memory usage, the difference being that
 the application instances are in separate processes rather than
 separate sub interpreters of same process.
OK, I'll try this.

The strange thing is that I had no segmentation faults for two days (since
my previous post), and this morning I saw them one after another.
I'm considering things like: the maximum requests per child setting in
Apache, something with threading in Apache, memcached (it was not
started while I was trying to start my application, but when I
switched to %{GLOBAL}, memcached was still down and it worked...).
I had segmentation faults before (with locmem caching, so it is not an
issue with memcached). AFAIR I saw some segfaults before using
django-compress. Maybe this is something nasty in psycopg2. I'm thinking
about adding print statements to all my middlewares and functions. This
thing is really hard to debug, especially on a server that is used by
real users.

Thank you very much for your help so far.

--
Maciej Wisniowski



[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Graham Dumpleton

What do you get if you run:

   ulimit -a

Maybe they have some sort of hard memory limits in place and you are
hitting that.
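`ulimit -a` in a login shell reports that shell's limits. The Apache daemon processes may have been started with different ones, so it can also help to read the limits from inside the WSGI process itself. Below is a minimal sketch using the Unix-only `resource` module; the helper names are hypothetical. In mod_wsgi, text written to sys.stderr ends up in the Apache error log.

```python
import resource
import sys

# Limits that correspond to rows of `ulimit -a`; getrlimit() reports bytes,
# whereas ulimit prints kbytes.
LIMITS = {
    'data seg size': resource.RLIMIT_DATA,
    'stack size': resource.RLIMIT_STACK,
    'virtual memory': resource.RLIMIT_AS,
}


def describe_limits():
    """Return {name: (soft, hard)} with RLIM_INFINITY shown as 'unlimited'."""
    result = {}
    for name, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        result[name] = tuple(
            'unlimited' if value == resource.RLIM_INFINITY else value
            for value in (soft, hard))
    return result


def log_limits(stream=sys.stderr):
    # Write one line per limit and flush so nothing is lost on a crash.
    for name, (soft, hard) in sorted(describe_limits().items()):
        stream.write('%s: soft=%s hard=%s\n' % (name, soft, hard))
    stream.flush()
```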

Graham






[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Brian Smith

Toshio Kuratomi wrote:
 On Sep 29, 3:24 pm, Brian Smith [EMAIL PROTECTED] wrote:
  Toshio Kuratomi wrote:
   If mod_wsgi assumes latin-1 and converts that into unicode 
   before it hits the app, the app will see this::
 
     Referer: http://localhost/â%82¬.html
 
  No, it will leave it as http://localhost/%e2%82%ac.html. It 
  does (or should do) the Latin-1-to-Unicode conversion before 
  it decodes URL encoding.
 
 uhm... you're wrong here.  url encoding and decoding operates 
 on bytes.  unicode is not bytes.  so you can't go from byte 
 string to unicode and then pass it through url decode.

Original string in Latin-1:   http://localhost/%e2%82%ac.html
Latin-1 to Unicode:   http://localhost/%e2%82%ac.html

Since the original Latin-1 string did not contain any non-Latin characters, no 
codepoint conversions are performed.
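This point can be checked directly in Python 3. A %-escaped URI is pure ASCII, and ASCII bytes decode to the same characters under latin-1, so the server-side conversion leaves the escapes untouched. Only the application, which is assumed here to know that the URI's bytes are UTF-8, undoes them.

```python
from urllib.parse import unquote

wire_value = b'http://localhost/%e2%82%ac.html'  # %-escaped, so pure ASCII
server_str = wire_value.decode('latin-1')        # the server-side latin-1 step
# ASCII bytes map to the same characters, so nothing was changed:
unchanged = server_str == 'http://localhost/%e2%82%ac.html'
# The application undoes the escapes with the charset it knows applies.
iri = unquote(server_str, encoding='utf-8')
```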

 Or I suppose you can, but it isn't by any means the opposite of 
 what you did to get the url escaped bytes so it's pretty senseless.

I made a mistake about the *encoding* (not decoding) order in my previous 
email. I will correct it below.

  Again, I think you are misunderstanding the interaction 
  between URL encoding and character encoding conversion. 
  Mod_wsgi will (should) never do or undo URL-encoding itself 
  for non-ASCII (%80-%FF) sequences.

 I think that you are misunderstanding the interaction.  And I 
 think that % sequences should definitely be done by mod_wsgi. 

  Ending up with a unicode string containing %encoded 
 sequences is even worse than the other scenarios I described 
 as the application then has to convert from unicode to byte 
 string, unquote the url quoting, and then convert back to 
 unicode.  

mod_wsgi cannot decode all the % sequences in headers because it doesn't know 
which headers contain URIs and which ones don't; many headers can contain % 
sequences that don't mean the same thing they mean in URIs. Plus, sometimes 
(many times) the application needs the encoded URI instead of the IRI form. If 
you are talking about things like PATH_INFO, SCRIPT_NAME, and REQUEST_URI, 
doing URI-to-IRI conversion on them will break applications like mine that 
already do their own URI-to-IRI conversion. I should test to see what WSGI 
gateways actually do there.

 It would be much better for mod_wsgi to do the url quoting 
 for the user as converting between bytes and %escape 
 sequences is 100% automatable.  This is unlike converting 
 between unicode and a sequence of bytes where something has 
 to decide what the character encoding is.  So -- WSGI should 
 take care of %encoding because that's a job for a computer 
 anyway.  WSGI should not take care of the byte/unicode 
 conversion because it doesn't know what encoding the bytes are in.

mod_wsgi already mangles the URI components too much in SCRIPT_NAME and 
PATH_INFO (in its defense, it does so because CGI/WSGI require it to for the 
most part, except for // munging). That is why I fall back to parsing 
REQUEST_URI myself.
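Parsing REQUEST_URI by hand, as described here, might look roughly like the following Python 3 sketch. The helper name is hypothetical, and how the byte segments are decoded afterwards is left to the application. Splitting on '/' before percent-decoding keeps an escaped %2F inside its segment and preserves the empty segments produced by '//'.

```python
from urllib.parse import unquote_to_bytes


def split_request_uri(environ):
    """Return (path segments as bytes, raw query string) from REQUEST_URI."""
    # REQUEST_URI is an Apache/CGI extension, not a PEP 333 key; default to ''.
    request_uri = environ.get('REQUEST_URI', '')
    path, _, query = request_uri.partition('?')
    # Split on '/' *before* %-decoding, so %2F stays inside its segment and
    # empty segments from '//' survive instead of being collapsed.
    segments = [unquote_to_bytes(segment) for segment in path.split('/')]
    return segments, query
```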


   Now let's look at the reverse case:  Let's say that the application
   wants to redirect the user to €.html (Euro symbol.html).  For that,
   they have to enter this into the location header::

     real_url = '€.html'
     byte_sequence = real_url.encode('utf-8')
     marshalled_form = str(byte_sequence, 'latin-1')
     headers = [('location', marshalled_form)]
 
  No, they have to URL-encode marshalled_form into ASCII 
 first, because the Location header holds a URI, and URIs are 
 always ASCII-only.
 
 Well... between marshalled_form and HTTP HEADER, there needs 
 to be a url escaping sequence.  but whether that needs to 
 happen outside of mod_wsgi or inside is part of what you and 
 I are debating.  You do see from your example above why your 
 initial sequence for decoding at the top of the post is 
 wrong, though?  Your decoding sequence at the top placed the 
 ASCII escaping between byte_sequence and real_url instead of 
 between marshalled_form and headers.

Right, I made two mistakes here. First, it doesn't make sense to URL-encode the 
string AFTER converting it to Latin-1. Instead, you need to URL-encode the 
string BEFORE converting it to Latin-1. Then, the string will only have ASCII 
characters. Secondly, you can encode/decode it using whatever encodings you 
please before you URL-encode it, because the URI and IRI specifications do not 
require every %XX sequence to decode to a valid UTF-8 sequence. mod_wsgi's own 
view of the filesystem encoding doesn't matter in this case.
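The corrected order, URL-encode first and only then treat the result as Latin-1, can be sketched in Python 3. `location_header` is a hypothetical helper. Its `charset` argument illustrates the point that any encoding may be chosen before %-escaping, because the escaped result is pure ASCII either way.

```python
from urllib.parse import quote


def location_header(target: str, charset: str = 'utf-8') -> tuple:
    # URL-encode BEFORE any Latin-1 step: quote() first encodes `target`
    # with `charset`, then %-escapes every byte outside the safe set,
    # so the returned value contains only ASCII characters.
    return ('Location', quote(target, safe='/', encoding=charset))


headers = [location_header('€.html')]
```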

Regards,
Brian






[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Graham Dumpleton

2008/9/30 Brian Smith [EMAIL PROTECTED]:
 mod_wsgi receives a sequence of bytes from apache.
 It transforms those into unicode by pretending that those bytes are
 latin-1 and sticks them into SCRIPT_NAME.

 IMO, mod_wsgi should just drop SCRIPT_NAME and all other non-WSGI environ 
 keys except REQUEST_URI (REQUEST_URI is needed to get the raw, un-decoded 
 URI).

Did you perhaps mean SCRIPT_FILENAME? The WSGI specification requires
SCRIPT_NAME.
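One reason SCRIPT_NAME is required is that PEP 333 rebuilds the request URL from SCRIPT_NAME plus PATH_INFO. Below is a Python 3 adaptation of the PEP's URL-reconstruction recipe. The latin-1 re-encoding is an assumption that matches the convention discussed in this thread; it is not part of PEP 333 itself.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    # Adapted from the URL reconstruction recipe in PEP 333.
    # Assumption: SCRIPT_NAME and PATH_INFO were decoded as latin-1 by the
    # server, so quote() re-encodes them as latin-1 to recover the raw bytes.
    url = environ['wsgi.url_scheme'] + '://'
    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if environ['wsgi.url_scheme'] == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += quote(environ.get('SCRIPT_NAME', ''), encoding='latin-1')
    url += quote(environ.get('PATH_INFO', ''), encoding='latin-1')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url
```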

As interesting as this whole discussion is, there is nothing I can do
about it. It really needs to be brought up on the Python WEB-SIG, where I
originally raised the issue of Python 3.0 support for WSGI. I can only
implement whatever consensus comes out of the discussion on the Python
WEB-SIG, given that they don't want to come out with an official revised
specification for WSGI.

Graham




[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Pigletto

On 30 Sep, 14:41, Graham Dumpleton [EMAIL PROTECTED] wrote:
 What do you get if you run:

ulimit -a

 Maybe they have some sort of hard memory limits in place and you are
 hitting that.
Output of ulimit -a is:
---
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 200
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
---

AFAIK there is no hard limit at WebFaction. I have a 160 MB memory limit,
but my processes were killed when memory usage was above 220 MB
(oops..). Additionally, after every such incident I'm notified by
WebFaction about it. So the other segmentation faults I've seen
before are not connected with processes being killed due to memory problems.

One more question, as I'm a bit confused about the WSGIApplicationGroup
directive. So far I was not using it at all. Does this mean that
%{GLOBAL} was used implicitly, by default? I only had WSGIProcessGroup
directives in use.

I've added a lot of print >> sys.stderr statements to my application
and I will try to trigger the segmentation fault somehow...
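A crash inside C code leaves no Python traceback, so bracketing calls with flushed writes to sys.stderr is one way to narrow down where it happens. Below is a Python 3 sketch of such a wrapper; the class name is hypothetical. On the Python 2.5 used here the same writes work, but `threading.current_thread` would have to be `threading.currentThread`.

```python
import os
import sys
import threading


class StderrTraceMiddleware:
    """Hypothetical debugging wrapper: logs each request to sys.stderr."""

    def __init__(self, app):
        self.app = app

    def _log(self, message):
        sys.stderr.write('[pid %d %s] %s\n' % (
            os.getpid(), threading.current_thread().name, message))
        # Flush immediately so the line is not lost if the process crashes.
        sys.stderr.flush()

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        self._log('entering app for %s' % path)
        try:
            return self.app(environ, start_response)
        finally:
            # Runs when the app *returns* its iterable; a lazy iterable may
            # still do work (e.g. run SQL) while the server iterates over it.
            self._log('app returned for %s' % path)
```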

--
Maciej Wisniowski



[modwsgi] OS X compile problem

2008-09-30 Thread Arash Arfaee

Hi All,

I am trying to install mod_wsgi on my Mac. It gave me some errors, so I
updated Xcode to the latest version; however, it still doesn't
compile. It seems to have a problem with 64-bit.
Here is the result of ./configure:

checking for apxs2... no
checking for apxs... /usr/sbin/apxs
checking Apache version... 2.2.8
checking for python... /Library/Frameworks/Python.framework/Versions/Current/bin/python
configure: creating ./config.status
config.status: creating Makefile


and here is the error:

/usr/sbin/apxs -c -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG  -Wc,'-arch ppc7400' -Wc,'-arch ppc64' -Wc,'-arch i386' -Wc,'-arch x86_64' mod_wsgi.c -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/Library/Frameworks -framework Python -u _PyMac_Error  -ldl
/usr/share/apr-1/build-1/libtool --silent --mode=compile gcc -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp  -I/usr/include/apache2  -I/usr/include/apr-1   -I/usr/include/apr-1  -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG  -c -o mod_wsgi.lo mod_wsgi.c && touch mod_wsgi.slo
In file included from /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/Python.h:57,
 from mod_wsgi.c:113:
/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h:761:2:In file included from /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/Python.h:57 ,
 from mod_wsgi.c:113:
/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h:761:2: error: error: #error #error LONG_BIT definition appears wrong for platform (bad gcc/glibc config?).LONG_BIT definition appears wrong for platform (bad gcc/glibc config?).

lipo: can't figure out the architecture type of: /var/folders/Oz/OzJCk42BHE4eZJoJ1zy2PTI/-Tmp-//cccGDuqg.out
apxs:Error: Command failed with rc=65536

Any idea how to solve this problem?
Tnx,
Arash




[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Pigletto

I've managed to trigger the segmentation fault (I was just clicking around
my application and forced a few reloads of mod_wsgi by changing the WSGI
script, etc.), and I was able to reproduce it a few times.
Again, I connected to it with gdb, but this time I issued the command
'share' before 'bt'. Thanks to this I was able to see much more
interesting things.

The WSGI script is executed, processing reaches my function (a view in
Django) and an exception is raised inside the view. Below is the long
output of gdb. It seems to me that it is a psycopg2 issue...?
In my code it looks like this:

import sys

from django.core.cache import cache
from django.db import models


class OrManager(models.Manager):
    def latest(self, count=5):
        latest = cache.get('latest-offers')
        if latest is None:
            latest = self.filter(is_active=True).order_by('-date_added')[:count]
            print >> sys.stderr, latest  # THIS LINE FAILS - real execution of the SQL

I wonder whether this issue might be solved by using %{GLOBAL}?


GDB session:

(...)
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1212707936 (LWP 9463)]
PyErr_Occurred () at Python/errors.c:80
80  Python/errors.c: No such file or directory.
in Python/errors.c
(gdb) bt
#0  PyErr_Occurred () at Python/errors.c:80
#1  0x00d65167 in _PyObject_GC_Malloc (basicsize=40) at Modules/gcmodule.c:1326
#2  0x00d6521c in _PyObject_GC_NewVar (tp=0xd9f3c0, nitems=7) at Modules/gcmodule.c:1352
#3  0x00cfec33 in PyTuple_New (size=7) at Objects/tupleobject.c:68
#4  0x00400dc0 in ?? ()
#5  0x0007 in ?? ()
#6  0x0009 in ?? ()
#7  0xb7b74aa8 in ?? ()
#8  0x00400d4e in ?? ()
#9  0x00d95980 in PyExc_IndexError () from /usr/lib/libpython2.5.so.1.0
#10 0x in ?? ()
(gdb) share
Symbols already loaded for /lib/tls/libm.so.6
Symbols already loaded for /home2/(...)/apache2.2//lib/libaprutil-1.so.0
Symbols already loaded for /usr/lib/libsqlite3.so.0
Symbols already loaded for /usr/lib/libexpat.so.0
Symbols already loaded for /home2/(...)/apache2.2//lib/libapr-1.so.0
Symbols already loaded for /lib/libuuid.so.1
Symbols already loaded for /lib/tls/librt.so.1
Symbols already loaded for /lib/libcrypt.so.1
Symbols already loaded for /lib/tls/libpthread.so.0
Symbols already loaded for /lib/libdl.so.2
Symbols already loaded for /lib/tls/libc.so.6
Symbols already loaded for /lib/ld-linux.so.2
Symbols already loaded for /lib/libnss_files.so.2
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_wsgi.so
Symbols already loaded for /usr/lib/libpython2.5.so.1.0
Symbols already loaded for /lib/libutil.so.1
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_log_config.so
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_auth_basic.so
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_authz_user.so
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_authz_host.so
Symbols already loaded for /home2/(...)/apache2.2/modules/mod_env.so
Symbols already loaded for /home2/(...)/modules/mod_alias.so
Symbols already loaded for /home2/(...)/modules/mod_auth_tkt.so
Symbols already loaded for /home2/(...)/modules/mod_rewrite.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/time.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/time.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/collections.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/collections.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/cStringIO.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/cStringIO.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/strop.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/strop.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/cPickle.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/cPickle.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/_socket.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/_socket.so
Reading symbols from /usr/local/lib/python2.5/lib-dynload/_ssl.so...done.
Loaded symbols for /usr/local/lib/python2.5/lib-dynload/_ssl.so
Reading symbols from /lib/libssl.so.4...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/libcrypto.so.4...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /usr/lib/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/local/lib/python2.5/lib-dynload/operator.so...done.
Loaded symbols for 

[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Pigletto

Currently I use psycopg2 from SVN, a version dated January 2008.
I've just looked at initd.org's SVN and I see there is psycopg2-2.0.8,
and in the change log from March I found:

2008-03-07  James Henstridge  [EMAIL PROTECTED]

* psycopg/pqpath.c (_pq_fetch_tuples): Don't call Python APIs
without holding the GIL.

Maybe that is the problem? I'll give the newest psycopg2 a try.

--
Maciej Wisniowski
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
modwsgi group.
To post to this group, send email to modwsgi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en
-~--~~~~--~~--~--~---



[modwsgi] Re: Segmentation fault - premature end of script headers

2008-09-30 Thread Brett Hoerner

On Tue, Sep 30, 2008 at 4:22 PM, Pigletto [EMAIL PROTECTED] wrote:
 Maybe that is the problem? I'll give a try to newest psycopg2

2.0.8 definitely fixed some segfaults on my end.
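To confirm which psycopg2 release a mod_wsgi process has actually loaded, the version string can be compared numerically. This is a minimal sketch, assuming `psycopg2.__version__` has the form `'2.0.8 (dt dec ext pq3)'`. The helper names `psycopg2_version_tuple` and `at_least` are hypothetical, and the `(2, 0, 8)` minimum only reflects the report above, not a confirmed changelog boundary for the GIL fix.

```python
import re


def psycopg2_version_tuple(version_string):
    """Parse the leading 'X.Y.Z' of a psycopg2-style version string.

    Assumes strings such as '2.0.8 (dt dec ext pq3)'; only the numeric
    prefix is used, the build flags in parentheses are ignored.
    """
    match = re.match(r'(\d+(?:\.\d+)*)', version_string)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version_string)
    return tuple(int(part) for part in match.group(1).split('.'))


def at_least(version_string, minimum=(2, 0, 8)):
    """Return True if version_string is numerically >= minimum.

    Hypothetical helper; the default minimum is an assumption taken
    from the mailing-list report, not from psycopg2's changelog.
    """
    return psycopg2_version_tuple(version_string) >= minimum
```

Inside the WSGI script one might call `at_least(psycopg2.__version__)` after `import psycopg2`. Comparing integer tuples rather than raw strings matters here: as strings, `'2.0.10' < '2.0.8'`.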

Brett




[modwsgi] Re: mod_wsgi on Python 3.0 (was Re: Python 2.6 and migration warnings flag for Python 3.0.)

2008-09-30 Thread Graham Dumpleton

2008/10/1 Brian Smith [EMAIL PROTECTED]:
 mod_wsgi already mangles the URI components too much in SCRIPT_NAME and 
 PATH_INFO (in its defense, it does so because CGI/WSGI require it to for the 
 most part, except for // munging). That is why I fall back to parsing 
 REQUEST_URI myself.

In my defence, I do the leading duplicate slash removal in SCRIPT_NAME
because otherwise different major versions of Apache would behave
differently. Any other duplicate slashes within SCRIPT_NAME and
PATH_INFO are, from memory, eliminated by Apache itself and not by
mod_wsgi.
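The two behaviours discussed here can be sketched in Python 3. `collapse_leading_slashes` illustrates the leading duplicate slash removal described for SCRIPT_NAME; it is not mod_wsgi's actual C code. `raw_path_segments` shows one way an application might parse REQUEST_URI itself, splitting on `/` before percent-decoding each segment to bytes, so an encoded `%2F` stays inside its segment and no character encoding is assumed. Both function names are hypothetical.

```python
import re
from urllib.parse import unquote_to_bytes


def collapse_leading_slashes(script_name):
    """Reduce a run of leading slashes in SCRIPT_NAME to a single '/'.

    Illustrative only: slashes later in the path are left untouched,
    matching the description that only the leading run is handled here.
    """
    return re.sub(r'^/{2,}', '/', script_name)


def raw_path_segments(environ):
    """Split REQUEST_URI into percent-decoded byte segments.

    The query string is dropped, the path is split on '/' first, and
    each segment is then decoded to bytes so the application can pick
    the character encoding (e.g. UTF-8) itself.
    """
    request_uri = environ.get('REQUEST_URI', '')
    path = request_uri.split('?', 1)[0]
    # path.split('/') yields a leading '' for the initial slash; skip it.
    return [unquote_to_bytes(segment) for segment in path.split('/')[1:]]
```

For example, `raw_path_segments({'REQUEST_URI': '/app/%e2%82%ac.html?x=1'})` returns `[b'app', b'\xe2\x82\xac.html']`, and the application can then decode the second segment as UTF-8 to get `'€.html'`.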

Graham




[modwsgi] Re: OS X compile problem

2008-09-30 Thread Graham Dumpleton

2008/10/1 Arash Arfaee [EMAIL PROTECTED]:

 Hi All,

 I am trying to install modwsgi over my mac. It gave me some error so I
 updated my Xcode to the latest version however it still doesn't
 compile. It seems it has a problem with 64-bit:
 here is the result of ./configure :

 checking for apxs2... no
 checking for apxs... /usr/sbin/apxs
 checking Apache version... 2.2.8
 checking for python... /Library/Frameworks/Python.framework/Versions/Current/bin/python
 configure: creating ./config.status
 config.status: creating Makefile


 and here is the error:

 /usr/sbin/apxs -c -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG  -Wc,'-arch ppc7400' -Wc,'-arch ppc64' -Wc,'-arch i386' -Wc,'-arch x86_64' mod_wsgi.c -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/Library/Frameworks -framework Python -u _PyMac_Error  -ldl
 /usr/share/apr-1/build-1/libtool --silent --mode=compile gcc -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp  -I/usr/include/apache2  -I/usr/include/apr-1   -I/usr/include/apr-1  -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG  -c -o mod_wsgi.lo mod_wsgi.c && touch mod_wsgi.slo
 In file included from /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/Python.h:57,
 from mod_wsgi.c:113:
 /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h:761:2:In file included from /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/Python.h:57 ,
 from mod_wsgi.c:113:
 /Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h:761:2: error: error: #error #error LONG_BIT definition appears wrong for platform (bad gcc/glibc config?).LONG_BIT definition appears wrong for platform (bad gcc/glibc config?).

 lipo: can't figure out the architecture type of: /var/folders/Oz/OzJCk42BHE4eZJoJ1zy2PTI/-Tmp-//cccGDuqg.out
 apxs:Error: Command failed with rc=65536

 Any idea how to solve this problem?

See:

  http://code.google.com/p/modwsgi/wiki/InstallationOnMacOSX#Non_Universal_Developer_Tools

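In short, you are using MacPorts Python and it isn't a fully fat
version, that is, it doesn't contain all four architectures (ppc7400,
ppc64, i386 and x86_64) that apxs is trying to build for. Alternatively,
MacPorts gcc is being used and it isn't fully fat.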

What hardware are you running on, PPC or Intel and 32 or 64 bit chip?
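The `#error` comes from a check in pyport.h that LONG_BIT equals 8 * SIZEOF_LONG, where SIZEOF_LONG is recorded in pyconfig.h when Python itself was built; compiling one of the 64-bit slices against headers from a 32-bit-only Python is a likely way to trip it. As a rough diagnostic for the hardware question, the sketch below reports the machine type and the word sizes of the running interpreter. It assumes a modern Python with the `sysconfig` module (not available in the Python 2.5 used here), and the function name `interpreter_word_sizes` is hypothetical.

```python
import platform
import struct
import sysconfig


def interpreter_word_sizes():
    """Report hardware type and word sizes for the running Python.

    'configured_long_bits' comes from SIZEOF_LONG in the build
    configuration (the value pyport.h compares against LONG_BIT);
    it is None where the build does not record it.
    """
    configured = sysconfig.get_config_var('SIZEOF_LONG')
    return {
        'machine': platform.machine(),            # e.g. 'i386', 'x86_64', 'Power Macintosh'
        'pointer_bits': struct.calcsize('P') * 8,  # 32 or 64 for this interpreter
        'long_bits': struct.calcsize('l') * 8,     # size of a C long in this process
        'configured_long_bits': configured * 8 if configured else None,
    }


if __name__ == '__main__':
    for key, value in interpreter_word_sizes().items():
        print('%s: %s' % (key, value))
```

On OS X, running `lipo -info` on the framework's Python binary lists the architectures it actually contains, which can be compared against the four `-arch` flags apxs passes above.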

Graham
