Re: bug: calling setlocale(LC_ALL, name of any locale with utf8 charset) more than once crashes httpd with mod_perl
Vlad Harchev wrote:
>>> When using the following script under mod_perl, each httpd process crashes on the 2nd request to execute this script.
>>>
>>>     #!/usr/bin/perl
>>>     use strict;
>>>     use POSIX qw(locale_h);
>>>     setlocale(LC_ALL, 'en_US.utf8');
>>>     print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";
>>>
>>> It also crashes if any other utf8 locale available on the system is used instead of 'en_US.utf8'. If a locale with a single-byte encoding is used (e.g. 'en_US.ascii' or 'ru_RU.koi8r'), httpd doesn't crash. (I didn't try multibyte encodings other than utf8.) If LC_COLLATE is used instead of LC_ALL, it doesn't crash for any locale (I didn't try the other locale categories). The httpd server is started under the locale 'ru_RU.koi8r', a single-byte locale.
>>>
>>> I'm using mod_perl on RH72 for x86; the versions of the relevant software are:
>>>
>>>     perl-5.6.0-17
>>>     mod_perl-1.24_01-3
>>>     glibc-2.2.4-19.3
>>>     apache-1.3.20-16
>>
>> I doubt it's a bug in mod_perl. Setting the locale affects a lot of core things, so a simple test may not trigger the problem. BTW, if I remember correctly, Perl 5.6.0 is not utf8-safe (or Unicode-safe in general); correct me if I'm wrong. Can you try the same with the latest bleadperl? (Skip 'make test'; there is a problem in perl that I'm fixing now.) 5.8.0 should be out in a month or so, and it should work well with Unicode.
>
> Unfortunately, I don't have time to compile and install it. I hope somebody on this list who already has a recent perl installed will test the problem.

Confirmed as working with bleadperl (i.e. the latest perl-5.7.3). Also confirmed as working with the stock 5.6.1 perl on Linux. So I'd suggest upgrading to 5.6.1 if possible, as the safest step before 5.8.0 is released. All tested with Apache/1.3.25-dev (Unix) mod_perl/1.26_01-dev, but the non-CVS version should probably work the same way.
__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
bug: calling setlocale(LC_ALL, name of any locale with utf8 charset) more than once crashes httpd with mod_perl
Hi,

When using the following script under mod_perl, each httpd process crashes on the 2nd request to execute this script.

    #!/usr/bin/perl
    use strict;
    use POSIX qw(locale_h);
    setlocale(LC_ALL, 'en_US.utf8');
    print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";

It also crashes if any other utf8 locale available on the system is used instead of 'en_US.utf8'. If a locale with a single-byte encoding is used (e.g. 'en_US.ascii' or 'ru_RU.koi8r'), httpd doesn't crash. (I didn't try multibyte encodings other than utf8.) If LC_COLLATE is used instead of LC_ALL, it doesn't crash for any locale (I didn't try the other locale categories). The httpd server is started under the locale 'ru_RU.koi8r', a single-byte locale.

I'm using mod_perl on RH72 for x86; the versions of the relevant software are:

    perl-5.6.0-17
    mod_perl-1.24_01-3
    glibc-2.2.4-19.3
    apache-1.3.20-16

It seems to be a bug in mod_perl, since the following C program works without crashing:

    #include <locale.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* set the same UTF-8 locale three times in one process */
        setlocale(LC_ALL, "en_US.utf8");
        printf("try 1\n");
        setlocale(LC_ALL, "en_US.utf8");
        printf("try 2\n");
        setlocale(LC_ALL, "en_US.utf8");
        printf("try 3\n");
        return 0;
    }

The following perl script also works without crashing:

    #!/usr/bin/perl
    use strict;
    use POSIX qw(locale_h);
    for (my $i = 0; $i < 10; ++$i) {
        setlocale(LC_ALL, 'en_US.utf8');
    }
    print "done\n";

Granted, LC_COLLATE is enough for my scripts, so this bug is not fatal to me.

Best regards,
 -Vlad
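Vlad's standalone C program shows that calling setlocale() repeatedly is harmless in an ordinary process. Under mod_perl, however, one interpreter serves many requests inside the same httpd child, and the locale is process-wide state, so a setlocale() made while handling one request is still in effect for the next. The C sketch below shows one defensive pattern: copy the current LC_ALL setting, switch, and restore it afterwards. The helper names switch_locale() and restore_locale() are hypothetical, and the pattern does not by itself explain or fix the crash reported here.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Switch every locale category to `name`.  On success, return a heap copy
 * of the previous LC_ALL setting (pass it to restore_locale() later).
 * On failure, return NULL; the current locale is then left unchanged. */
char *switch_locale(const char *name)
{
    /* A NULL second argument queries the locale without changing it.
     * The returned string may be overwritten by the next setlocale()
     * call, so copy it before switching. */
    const char *cur = setlocale(LC_ALL, NULL);
    if (cur == NULL)
        return NULL;
    size_t n = strlen(cur) + 1;
    char *saved = malloc(n);
    if (saved == NULL)
        return NULL;
    memcpy(saved, cur, n);

    if (setlocale(LC_ALL, name) == NULL) {  /* e.g. locale not installed */
        free(saved);
        return NULL;
    }
    return saved;
}

/* Re-apply a setting saved by switch_locale() and free it.
 * Returns 0 on success, -1 if setlocale() rejects the saved value. */
int restore_locale(char *saved)
{
    int rc = (setlocale(LC_ALL, saved) != NULL) ? 0 : -1;
    free(saved);
    return rc;
}
```

A caller would write char *saved = switch_locale("en_US.utf8");, do its locale-dependent work only if saved is non-NULL, and then call restore_locale(saved) before the request ends.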
Re: bug: calling setlocale(LC_ALL, name of any locale with utf8 charset) more than once crashes httpd with mod_perl
Vlad Harchev wrote:
> Hi,
>
> When using the following script under mod_perl, each httpd process crashes on the 2nd request to execute this script.
>
>     #!/usr/bin/perl
>     use strict;
>     use POSIX qw(locale_h);
>     setlocale(LC_ALL, 'en_US.utf8');
>     print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";
>
> It also crashes if any other utf8 locale available on the system is used instead of 'en_US.utf8'. If a locale with a single-byte encoding is used (e.g. 'en_US.ascii' or 'ru_RU.koi8r'), httpd doesn't crash. (I didn't try multibyte encodings other than utf8.) If LC_COLLATE is used instead of LC_ALL, it doesn't crash for any locale (I didn't try the other locale categories). The httpd server is started under the locale 'ru_RU.koi8r', a single-byte locale.
>
> I'm using mod_perl on RH72 for x86; the versions of the relevant software are:
>
>     perl-5.6.0-17
>     mod_perl-1.24_01-3
>     glibc-2.2.4-19.3
>     apache-1.3.20-16

I doubt it's a bug in mod_perl. Setting the locale affects a lot of core things, so a simple test may not trigger the problem. BTW, if I remember correctly, Perl 5.6.0 is not utf8-safe (or Unicode-safe in general); correct me if I'm wrong. Can you try the same with the latest bleadperl? (Skip 'make test'; there is a problem in perl that I'm fixing now.) 5.8.0 should be out in a month or so, and it should work well with Unicode.

In any case, you should send a backtrace of the core dump, as described in the SUPPORT file in the mod_perl source distro.
Re: Charset woes
On Thursday 14 June 2001 23:40, Ged Haywood wrote:
> On Thu, 14 Jun 2001, Robin Berjon wrote:
>> The problem is simply that I need to mix that data with other data in another encoding, which means I have to convert it.
>
> Do you send a charset specification to the client? This was often overlooked until the cross-site scripting thing blew up early last year. I wonder if browsers seeing that might be more forthcoming if they want to use something different.

Yes I am; AxKit doesn't give you much of a choice there (rightfully so) :)

After doing a number of tests, I've found that browsers (even totally non-compliant ones) tend to POST back in the same charset you used to send the page to them, unless the user types characters that don't fit into that charset (people usually post in the language in which the page is written, but sometimes their names will not fit into the charset). The spec says they _may_ do that if accept-charset is set to UNKNOWN (its default value), but then the spec is moot when it comes to browsers.

So now if I could find a way to send UTF-8 to Netscape 4 without it blowing up, I might have found a workable solution to this problem :-)

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency
www.knowscape.com
---
Chance is irrelevant. We will succeed. -- 7o9
Re: Charset woes
Hi Robin,

On Wed, 13 Jun 2001, Robin Berjon wrote:
> I'm running into trouble with browsers submitting data using various charsets and not telling me which charset they're using. This results in all sorts of breakages and unusable text. I can't be the only one dealing with this problem (if I am, then I'm really out of luck), so I was wondering if anyone here knows of a good way to reliably detect the charset that the browser is using to post its data.

It will be very difficult to guess reliably what charset is in use from a random sample of characters taken from it. I think you just have to be able to handle the data. You need sixteen bits per character.

73,
Ged.
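As Ged says, guessing the charset from a sample is unreliable. One widely used partial heuristic is to check whether the submitted bytes are well-formed UTF-8: text in a single-byte charset that contains accented or Cyrillic letters rarely happens to be valid UTF-8. The C sketch below is a strict validator following RFC 3629; the function name is hypothetical. It can only answer "plausibly UTF-8" or "not UTF-8": it cannot tell ISO-8859-1 from KOI8-R, and pure ASCII input passes trivially.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8 as defined by RFC 3629:
 * no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* continuation bytes expected after c */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {                    /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;  /* 0x80-0xC1 and 0xF5-0xFF never start a character */
        }
        if (len - i <= n)                  /* sequence truncated at the end */
            return false;
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)       /* not a continuation byte */
                return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)  /* overlong */
            || (cp >= 0xD800 && cp <= 0xDFFF)                   /* surrogate */
            || cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}
```

For example, "café" sent as UTF-8 (63 61 66 C3 A9) passes, while the same word in ISO-8859-1 (63 61 66 E9) fails because E9 announces a three-byte sequence that the buffer does not contain.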
Re: Charset woes
On Thursday 14 June 2001 13:18, Ged Haywood wrote:
> On Wed, 13 Jun 2001, Robin Berjon wrote:
>> I'm running into trouble with browsers submitting data using various charsets and not telling me which charset they're using. This results in all sorts of breakages and unusable text. I can't be the only one dealing with this problem (if I am, then I'm really out of luck), so I was wondering if anyone here knows of a good way to reliably detect the charset that the browser is using to post its data.
>
> It will be very difficult to guess reliably what charset is in use from a random sample of characters taken from it. I think you just have to be able to handle the data. You need sixteen bits per character.

I'm able to handle the data :) The problem is simply that I need to mix that data with other data in another encoding, which means I have to convert it. And in order to convert it, I need to know the original encoding; otherwise either the converter will blow up, or I'll corrupt the content.

Thanks Ged :)

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency
www.knowscape.com
---
There are trivial truths and there are great Truths. The opposite of a trivial truth is obviously false. The opposite of a great Truth is also true. -- Niels Bohr
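Once the source encoding is known, the conversion itself is mechanical; on POSIX systems iconv(3) handles the general case. For ISO-8859-1 specifically the mapping is simple enough to write by hand, because each Latin-1 byte value equals its Unicode code point (U+0000 to U+00FF). The C sketch below assumes the input really is ISO-8859-1, which is exactly the knowledge Robin says he is missing; the helper name is hypothetical.

```c
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8.  Each Latin-1 byte equals its
 * Unicode code point, so bytes below 0x80 are copied and bytes from
 * 0x80 up become a two-byte UTF-8 sequence.
 * `out` must have room for at least 2 * len bytes.
 * Returns the number of bytes written to `out`. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
    return o;
}
```

Guessing wrong corrupts the text in a recognizable way: feeding this function bytes that are already UTF-8 double-encodes them, so "café" comes out as "cafÃ©".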
Charset woes
Hi,

I'm running into trouble with browsers submitting data using various charsets and not telling me which charset they're using. This results in all sorts of breakages and unusable text. I can't be the only one dealing with this problem (if I am, then I'm really out of luck), so I was wondering if anyone here knows of a good way to reliably detect the charset that the browser is using to post its data.

Thanks,

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency
www.knowscape.com
---
Always remember you're unique, just like everyone else.
Re: Charset woes
On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
> Hi,
> I'm running into trouble with browsers submitting data using various charsets and not telling me which charset they're using. This results in all sorts of breakages and unusable text. I can't be the only one dealing with this problem (if I am, then I'm really out of luck), so I was wondering if anyone here knows of a good way to reliably detect the charset that the browser is using to post its data.

Make sure your page or HTTP header declares the charset, and add a hidden input field containing a known string that you can examine when the form is submitted back.

-- 
☻ Ričardas Čepas ☺
~~ ~
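Ričardas's suggestion can be sketched in C as follows, assuming the form page is sent with a declared charset and carries a hidden field whose value is the single character "é" (U+00E9); the field and the helper names are hypothetical. The browser encodes the submitted value in the charset it chose and percent-encodes the non-ASCII bytes, so the server decodes the field and compares the raw bytes against the known encodings of that character.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode an application/x-www-form-urlencoded value in place:
 * '+' becomes a space and %XX becomes the byte 0xXX.
 * Returns the number of decoded bytes (the result may contain NULs). */
static size_t form_decode(char *s)
{
    char *r = s, *w = s;
    while (*r != '\0') {
        if (*r == '+') {
            *w++ = ' ';
            r++;
        } else if (r[0] == '%' && isxdigit((unsigned char)r[1])
                   && isxdigit((unsigned char)r[2])) {
            char hex[3] = { r[1], r[2], '\0' };
            *w++ = (char)strtol(hex, NULL, 16);
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}

/* Guess the submission charset from the raw (still percent-encoded) value
 * of the hypothetical hidden sentinel field, whose value on the page was
 * U+00E9.  Decodes `value` in place.  Returns NULL when the bytes match
 * neither expected encoding. */
const char *guess_charset(char *value)
{
    size_t n = form_decode(value);
    if (n == 2 && memcmp(value, "\xC3\xA9", 2) == 0)
        return "UTF-8";
    if (n == 1 && (unsigned char)value[0] == 0xE9)
        return "ISO-8859-1";  /* or ISO-8859-15 / windows-1252: same byte */
    return NULL;
}
```

A single "é" separates UTF-8 from the 8-bit Western charsets, but it cannot tell ISO-8859-1, ISO-8859-15 and windows-1252 apart, since all three encode it as the byte 0xE9.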
Re: Charset woes
On Wednesday 13 June 2001 20:15, Ričardas Čepas wrote:
> On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
>> Hi,
>> I'm running into trouble with browsers submitting data using various charsets and not telling me which charset they're using. This results in all sorts of breakages and unusable text. I can't be the only one dealing with this problem (if I am, then I'm really out of luck), so I was wondering if anyone here knows of a good way to reliably detect the charset that the browser is using to post its data.
>
> Make sure your page or HTTP header declares the charset, and add a hidden input field containing a known string that you can examine when the form is submitted back.

Hmm, that's an interesting approach. Have you used it before? Do you know of a string that could potentially detect any encoding? I'm really facing pretty much anything, depending on the browser's whim.

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency
www.knowscape.com
---
Lavish spending can be disastrous. Don't buy lavishes for a while.
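On Robin's question: no single string can identify every encoding, because a sentinel can only recognize charsets that can represent all of its characters. A multi-character sentinel does, however, separate candidates that happen to agree on some characters. The C sketch below assumes a hypothetical sentinel "é€" (U+00E9 U+20AC), pre-encoded for three candidate charsets, and matches the already percent-decoded submitted bytes against that table; both the sentinel and the candidate list are illustrative choices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel: the two characters U+00E9 U+20AC ("é€"),
 * pre-encoded in each candidate charset.  Only charsets that can
 * represent every character of the sentinel belong in this table. */
struct candidate {
    const char *charset;
    const char *bytes;   /* the sentinel encoded in that charset */
    size_t len;          /* number of bytes in `bytes` */
};

static const struct candidate candidates[] = {
    { "UTF-8",        "\xC3\xA9\xE2\x82\xAC", 5 },
    { "windows-1252", "\xE9\x80",             2 },
    { "ISO-8859-15",  "\xE9\xA4",             2 },
};

/* Return the charset whose encoding of the sentinel equals the submitted,
 * already percent-decoded bytes, or NULL if no candidate matches. */
const char *match_sentinel(const unsigned char *got, size_t len)
{
    size_t count = sizeof candidates / sizeof candidates[0];
    for (size_t i = 0; i < count; i++) {
        if (candidates[i].len == len
            && memcmp(candidates[i].bytes, got, len) == 0)
            return candidates[i].charset;
    }
    return NULL;
}
```

Here "é" alone is 0xE9 in both windows-1252 and ISO-8859-15; it is the euro sign (0x80 versus 0xA4) that tells them apart. ISO-8859-1 is deliberately absent because it has no euro sign, so a browser using it would have to substitute something for that character.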
Charset?
Is there a mod_perl way to set the character set besides doing:

    $r->content_type('text/html; charset=foo');

??? That'd be handy for a future version. I can't find anything in the Apache.pm docs (1.24) or the guide for this.

-dave

/*==
www.urth.org
We await the New Sun
==*/
Re: Charset?
On Fri, 23 Mar 2001, Dave Rolsky wrote:
> Is there a mod_perl way to set the character set besides doing:
>
>     $r->content_type('text/html; charset=foo');
>
> ???

No, that's the way you have to do it.

-- 
<Matt/>

    /||    ** Founder and CTO ** **   http://axkit.com/    **
   //||    **  AxKit.com Ltd   ** ** XML Application Serving **
  // ||    ** http://axkit.org ** ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
     \\//
     //\\
    //  \\
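Matt confirms that the charset is set as part of the content type, as in $r->content_type('text/html; charset=foo'). If the charset name is ever derived from request data, it is worth validating before it goes into a header. The C sketch below builds that header value and accepts only a conservative whitelist of characters (letters, digits, '-' and '_'), an assumption that covers common names such as UTF-8, ISO-8859-1 and KOI8-R; the function name is hypothetical and the media type is assumed to be a trusted constant.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Characters accepted in a charset name: a deliberately conservative
 * whitelist (an assumption), enough for names such as UTF-8,
 * ISO-8859-1 or KOI8-R. */
static const char charset_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_";

/* Write "<type>; charset=<cs>" into buf, e.g. "text/html; charset=UTF-8".
 * `type` is assumed to be a trusted constant; `cs` may come from request
 * data, so anything outside the whitelist (including CR/LF, which could
 * smuggle extra header lines) is rejected.
 * Returns 0 on success, -1 on an invalid charset or a too-small buffer. */
int format_content_type(char *buf, size_t size, const char *type, const char *cs)
{
    size_t len = strlen(cs);
    if (len == 0 || strspn(cs, charset_chars) != len)
        return -1;
    int n = snprintf(buf, size, "%s; charset=%s", type, cs);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Rejecting the name outright, rather than stripping bad characters, keeps the behavior predictable: a response either carries a charset the server chose deliberately or the caller falls back to its default.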