Re: bug: calling setlocale(LC_ALL,name of any locale with utf8 charset)more than once crashes httpd with mod_perl

2002-04-15 Thread Stas Bekman

Vlad Harchev wrote:

>>> When using the following script under mod_perl, each httpd process crashes on
>>> the 2nd request to execute this script.
>>>
>>> #!/usr/bin/perl
>>> use strict; use POSIX qw(locale_h);
>>> setlocale(LC_ALL,'en_US.utf8');
>>> print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";
>>> -
>>> This crashes if instead of 'en_US.utf8' one uses any other utf8 locale that
>>> is available in the system. If one uses a locale with a single-byte encoding (e.g.
>>> 'en_US.ascii' or 'ru_RU.koi8r') it doesn't crash httpd. (I didn't try
>>> other multibyte encodings besides utf8.)
>>>
>>> If one uses LC_COLLATE instead of LC_ALL, it doesn't crash for any locale (I
>>> didn't try other locale categories).  (The httpd server is started under
>>> locale 'ru_RU.koi8r' - a single-byte locale).
>>>
>>> I'm using mod_perl on RH72 for x86, versions of the relevant software:
>>> perl-5.6.0-17
>>> mod_perl-1.24_01-3
>>> glibc-2.2.4-19.3
>>> apache-1.3.20-16
>>
>> I doubt it's a bug in mod_perl. Setting the locale affects lots of core
>> things, so a simple test may not trigger the problem.
>>
>> BTW, if I remember correctly Perl 5.6.0 is not utf8-safe (or unicode-safe in
>> general), correct me if I'm wrong. Can you try the same with the latest
>> bleadperl? (Skip 'make test'; there is some problem in perl that I'm
>> fixing now.) 5.8.0 should be out in a month or so and it should work
>> well with unicode.
>
> Unfortunately, I don't have time for compiling and installing it..
> I hope somebody on this list who has already installed a recent version of perl
> will test the problem..

Confirmed as working with bleadperl (i.e. the latest perl-5.7.3).

Also confirmed as working with the stock 5.6.1 perl on Linux.

So I'd suggest upgrading to 5.6.1 if possible, as the safest step
until 5.8.0 is released.

All tested with Apache/1.3.25-dev (Unix) mod_perl/1.26_01-dev but the 
non-cvs version should probably work the same way.

__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com




bug: calling setlocale(LC_ALL,name of any locale with utf8 charset)more than once crashes httpd with mod_perl

2002-04-14 Thread Vlad Harchev

 Hi, 

 When using the following script under mod_perl, each httpd process crashes on
the 2nd request to execute this script.
 
#!/usr/bin/perl 
use strict; use POSIX qw(locale_h); 
setlocale(LC_ALL,'en_US.utf8'); 
print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";
-
 This crashes if instead of 'en_US.utf8' one uses any other utf8 locale that
is available in the system. If one uses a locale with a single-byte encoding (e.g.
'en_US.ascii' or 'ru_RU.koi8r') it doesn't crash httpd. (I didn't try
other multibyte encodings besides utf8.)

 If one uses LC_COLLATE instead of LC_ALL, it doesn't crash for any locale (I
didn't try other locale categories).  (The httpd server is started under
locale 'ru_RU.koi8r' - a single-byte locale).

 I'm using mod_perl on RH72 for x86, versions of the relevant software:
perl-5.6.0-17
mod_perl-1.24_01-3
glibc-2.2.4-19.3
apache-1.3.20-16


 It seems it's a bug in mod_perl, since the following C program
--
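#include <locale.h>
#include <stdio.h>   /* needed for printf */
#include <stdlib.h>

int main(int argc, char** argv)
{
    /* Set the same UTF-8 locale three times in a row. */
    setlocale(LC_ALL, "en_US.utf8");
    printf("try 1\n");

    setlocale(LC_ALL, "en_US.utf8");
    printf("try 2\n");

    setlocale(LC_ALL, "en_US.utf8");
    printf("try 3\n");
    return 0;
}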

 Works without crashing.

 Also, the following perl script:

#!/usr/bin/perl
use strict;
use POSIX qw(locale_h);
for(my $i=0; $i<10; ++$i)
{
    setlocale(LC_ALL,'en_US.utf8');
}
print "done\n";

 Works without crashing too.

 Granted, LC_COLLATE is enough for my scripts, so this bug is not fatal to
me.

 Best regards,
  -Vlad
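
A minimal sketch of the LC_COLLATE-only workaround described in the message above, with a check on the return value of setlocale(). The locale name is the one from the report and may not be installed on a given system; restoring the previous setting afterwards is an added precaution, not something the report says is required, and whether this avoids the crash in other setups is untested here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Locale name taken from the report above; it may not be installed on
# every system, in which case setlocale() returns undef.
my $wanted = 'en_US.utf8';

# With a single argument, setlocale() only queries the current setting.
my $previous = setlocale(LC_COLLATE);

# Change the collation category only, leaving LC_CTYPE and the rest alone.
my $result = setlocale(LC_COLLATE, $wanted);

if (defined $result) {
    print "LC_COLLATE switched to $result\n";
} else {
    print "locale $wanted is not available; LC_COLLATE unchanged\n";
}

# Restore the earlier setting so later code in the same process is unaffected.
setlocale(LC_COLLATE, $previous) if defined $previous;
```

Under mod_perl the interpreter persists between requests, so restoring the earlier value keeps one request's locale change from leaking into the next.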




Re: bug: calling setlocale(LC_ALL,name of any locale with utf8 charset)more than once crashes httpd with mod_perl

2002-04-14 Thread Stas Bekman

Vlad Harchev wrote:
> Hi,
>
> When using the following script under mod_perl, each httpd process crashes on
> the 2nd request to execute this script.
>
> #!/usr/bin/perl
> use strict; use POSIX qw(locale_h);
> setlocale(LC_ALL,'en_US.utf8');
> print "Expires: 1 Jan 1970\nContent-Type: text/html\n\nHi";
> -
> This crashes if instead of 'en_US.utf8' one uses any other utf8 locale that
> is available in the system. If one uses a locale with a single-byte encoding (e.g.
> 'en_US.ascii' or 'ru_RU.koi8r') it doesn't crash httpd. (I didn't try
> other multibyte encodings besides utf8.)
>
> If one uses LC_COLLATE instead of LC_ALL, it doesn't crash for any locale (I
> didn't try other locale categories).  (The httpd server is started under
> locale 'ru_RU.koi8r' - a single-byte locale).
>
> I'm using mod_perl on RH72 for x86, versions of the relevant software:
> perl-5.6.0-17
> mod_perl-1.24_01-3
> glibc-2.2.4-19.3
> apache-1.3.20-16

I doubt it's a bug in mod_perl. Setting the locale affects lots of core
things, so a simple test may not trigger the problem.

BTW, if I remember correctly Perl 5.6.0 is not utf8-safe (or unicode-safe in
general), correct me if I'm wrong. Can you try the same with the latest
bleadperl? (Skip 'make test'; there is some problem in perl that I'm
fixing now.) 5.8.0 should be out in a month or so and it should work
well with unicode.

In any case, you should send a backtrace of the coredump according to 
the SUPPORT file found in the mod_perl source distro.





Re: Charset woes

2001-06-15 Thread Robin Berjon

On Thursday 14 June 2001 23:40, Ged Haywood wrote:
> On Thu, 14 Jun 2001, Robin Berjon wrote:
> > The problem is simply that I need to mix that data with other data
> > in another encoding, which means I have to convert it.
>
> Do you send a charset specification to the client?  This was often
> overlooked until the cross-site scripting thing blew up early last
> year.  I wonder if browsers seeing that might be more forthcoming if
> they want to use something different.

Yes I am, AxKit doesn't give you much of a choice there (rightfully so) :) 
After doing a number of tests, I've found that browsers (even totally 
non-compliant ones) tend to POST back in the same charset you used to send 
the page to them, unless the user types characters that don't fit into that 
charset (people usually post in the language in which the page is written, 
but sometimes their names will not fit into the charset). The spec says they 
_may_ do that if accept-charset is set to UNKNOWN (its default value), but 
then the spec is moot when it comes to browsers.

So now if I could find a way to send UTF-8 to Netscape 4 without it blowing 
up, I might have found a workable solution to this problem :-)

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
Chance is irrelevant. We will succeed. 
-- 7o9




Re: Charset woes

2001-06-14 Thread Ged Haywood

Hi Robin,

On Wed, 13 Jun 2001, Robin Berjon wrote:

> I'm running into trouble with browsers submitting data using various charsets
> and not telling me which charset they're using. This results in all sorts of
> breakages and unusable text. I can't be the only one dealing with this
> problem (if I am, then I'm really out of luck) so I was wondering if anyone
> here knows of a good way to reliably detect the charset that the browser is
> using to post its data.

It will be very difficult to guess reliably what charset is in use from
a random sample of characters taken from it.  I think you just have to
be able to handle the data.  You need sixteen bits per character.

73,
Ged.





Re: Charset woes

2001-06-14 Thread Robin Berjon

On Thursday 14 June 2001 13:18, Ged Haywood wrote:
> On Wed, 13 Jun 2001, Robin Berjon wrote:
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
> It will be very difficult to guess reliably what charset is in use from
> a random sample of characters taken from it.  I think you just have to
> be able to handle the data.  You need sixteen bits per character.

I'm able to handle the data :) The problem is simply that I need to mix that 
data with other data in another encoding, which means I have to convert it. 
And in order to convert it, I need to know the original encoding... otherwise 
either the converter will blow up, or I'll corrupt the content.

Thanks Ged :)

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
There are trivial truths and there are great Truths. The opposite of 
a trivial truth is obviously false. The opposite of a great Truth is 
also true.  
-- Niels Bohr 
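
To illustrate the point in the message above that a conversion needs the correct source encoding, here is a minimal sketch using the Encode module. Encode ships with Perl from 5.8 onwards, so its availability is an assumption relative to the Perl versions discussed in this thread; convert_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert a byte string from one named encoding to another.
# Returns undef when the bytes are not valid in the claimed source encoding.
sub convert_bytes {
    my ($bytes, $from, $to) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $text = eval { decode($from, $copy, Encode::FB_CROAK) };
    return unless defined $text;
    return encode($to, $text);
}

# "cafe" with an e-acute, encoded as ISO-8859-1: the accent is the byte 0xE9.
my $latin1 = "caf\xE9";

# Correct guess: the accent becomes the two-byte UTF-8 sequence 0xC3 0xA9.
my $ok = convert_bytes($latin1, 'iso-8859-1', 'UTF-8');

# Wrong guess: a lone 0xE9 is not valid UTF-8, so the strict decode fails.
my $bad = convert_bytes($latin1, 'UTF-8', 'UTF-8');

printf "right guess: %s\n", defined $ok  ? unpack('H*', $ok)  : 'failed';
printf "wrong guess: %s\n", defined $bad ? unpack('H*', $bad) : 'failed';
```

The strict decode only covers the "converter blows up" case. Guessing a single-byte encoding for data that is really UTF-8 never fails, because every byte is valid there, and that is how content gets silently corrupted.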




Charset woes

2001-06-13 Thread Robin Berjon

Hi,

I'm running into trouble with browsers submitting data using various charsets 
and not telling me which charset they're using. This results in all sorts of 
breakages and unusable text. I can't be the only one dealing with this 
problem (if I am, then I'm really out of luck) so I was wondering if anyone 
here knows of a good way to reliably detect the charset that the browser is 
using to post its data.

Thanks,

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
Always remember you're unique just like everyone else. 




Re: Charset woes

2001-06-13 Thread Ričardas Čepas

On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:

> Hi,
>
> I'm running into trouble with browsers submitting data using various charsets
> and not telling me which charset they're using. This results in all sorts of
> breakages and unusable text. I can't be the only one dealing with this
> problem (if I am, then I'm really out of luck) so I was wondering if anyone
> here knows of a good way to reliably detect the charset that the browser is
> using to post its data.

Make sure your page or HTTP header has the charset declared, and add a hidden input
field with a known string that you can examine when it is submitted back.
-- 
  ☻ Ričardas Čepas ☺
~~
~
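
The hidden-field suggestion above could be sketched as follows. The sentinel string, the list of candidate encodings, and the helper name guess_charset are all assumptions chosen for illustration, and the Encode module (bundled with Perl 5.8 and later) is assumed to be available. The submitted value must already be URL-decoded to raw bytes before it is compared.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sentinel: e-acute followed by the euro sign. The euro sign
# has a different byte representation in each candidate encoding below.
my $SENTINEL = "\x{E9}\x{20AC}";

# Candidate encodings are an assumption; list whichever ones a site expects.
my @CANDIDATES = ('UTF-8', 'iso-8859-15', 'cp1252');

# Map each encoded form of the sentinel back to the encoding that produced it.
my %charset_for_bytes;
for my $charset (@CANDIDATES) {
    $charset_for_bytes{ encode($charset, $SENTINEL) } = $charset;
}

# Return the matching charset name, or undef if the bytes match none of them.
sub guess_charset {
    my ($submitted_bytes) = @_;
    return $charset_for_bytes{$submitted_bytes};
}

# Try the three known byte forms plus one that matches nothing.
for my $bytes ("\xC3\xA9\xE2\x82\xAC", "\xE9\xA4", "\xE9\x80", "\xE9?") {
    my $guess = guess_charset($bytes);
    printf "%s => %s\n", unpack('H*', $bytes),
        defined $guess ? $guess : 'unknown';
}
```

A sentinel can only separate encodings that represent its characters differently, so no single string detects every encoding. A browser may also substitute something else for a character it cannot encode, which shows up here as 'unknown'.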



Re: Charset woes

2001-06-13 Thread Robin Berjon

On Wednesday 13 June 2001 20:15, Ričardas Čepas wrote:
> On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
> > Hi,
> >
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
> Make sure your page or HTTP header has the charset declared, and add a
> hidden input field with a known string that you can examine when it is
> submitted back.

Hmm, that's an interesting approach. Have you used it before? Do you know of 
a string that could potentially detect any encoding? I'm really facing 
pretty much anything, depending on the browser's whim.

-- 
___
Robin Berjon [EMAIL PROTECTED] -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
---
Lavish spending can be disastrous. Don't buy lavishes for a while.




Charset?

2001-03-23 Thread Dave Rolsky

Is there a mod_perl way to set the character set besides doing:

$r->content_type('text/html; charset=foo');

???

That'd be handy for a future version.  I can't find anything in the
Apache.pm docs (1.24) or the guide for this.


-dave

/*==
www.urth.org
We await the New Sun
==*/




Re: Charset?

2001-03-23 Thread Matt Sergeant

On Fri, 23 Mar 2001, Dave Rolsky wrote:

> Is there a mod_perl way to set the character set besides doing:
>
> $r->content_type('text/html; charset=foo');
>
> ???

No, that's the way you have to do it.

-- 
Matt/

/||** Founder and CTO  **  **   http://axkit.com/ **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
 \\//
 //\\
//  \\
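
Since the answer above is that the charset has to be passed inside the content-type string, a small helper can keep that string in one place. This is only a sketch: content_type_with_charset is a hypothetical helper, and FakeRequest is a stand-in object so the example runs outside httpd. Under mod_perl the real request object $r would be used instead.

```perl
use strict;
use warnings;

# Build the value passed to $r->content_type(), e.g. "text/html; charset=foo".
sub content_type_with_charset {
    my ($media_type, $charset) = @_;
    return "$media_type; charset=$charset";
}

# Stand-in for the Apache request object so the sketch runs outside httpd.
# Under mod_perl, $r is supplied by the server instead.
package FakeRequest;

sub new { return bless { content_type => undef }, shift }

# Setter when called with a value, getter otherwise.
sub content_type {
    my ($self, $value) = @_;
    $self->{content_type} = $value if defined $value;
    return $self->{content_type};
}

package main;

my $r = FakeRequest->new;
$r->content_type(content_type_with_charset('text/html', 'iso-8859-1'));
print $r->content_type, "\n";
```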