Re: multilanguage site

2000-09-05 Thread Paul Lindner

On Fri, Sep 01, 2000 at 10:44:10PM -0400, Greg Stark wrote:
>
> > > Can someone suggest the best way to build a multilanguage web site
> > > (English, French, ...)?
> > > I'm using Apache + mod_perl + Apache::ASP (for applications).
>
> I'm really interested in what other people are doing here. We've just released
> our first cut at i18n and it's going fairly well. But so far we haven't dealt
> with the big bugaboo, character encoding.
>
> One major problem I anticipate is what to do when individual include files are
> not available in the local language. For iso-8859-1 encoded languages that's
> not a major hurdle as we can simply use the English text until it's
> translated. But for other encodings does it make sense to include English
> text?
>
> If we use UTF-8 all the ASCII characters would display properly, but do most
> browsers support UTF-8 now? Or do people still use Big5, EUC, etc.?
>
> As far as I can tell there's no way in HTML to indicate to the browser that a
> chunk of content is in some encoding other than what was specified in
> the headers or meta tag. There's no <span charset=...> attribute or anything
> like that. This seems to make truly multilingual pages really awkward. You
> basically must use an encoding like UTF-8 which can reach the entire Unicode
> character set or else you cannot mix languages.

It's a mess, but you're just going to have to assume multiple
character sets for the foreseeable future.  We try to use UTF-8 for all
data sources.  XML defaults to this, Oracle can easily be set up this way,
and you can use UTF-8 in your HTML sources too.  You just have to be
careful; for example, in our message catalogs we convert translations
to UTF-8 when we import them.

Anyway, here's what's in my global.asa to take care of this character
set conversion mess..  Full details available to those that are
interested..


In Script_OnStart we convert submitted data to utf8

  ...
  #set $Apps::Param to form data or querystring.

  # decide on character set based on submitted form data element
  # 'asp_charset', or based on user's language.

  my $charset = $Apps::Param->{'asp_charset'};
  $charset = 'x-euc-jp' if (!$charset && $Session->{"Lang"} eq 'ja');

  $charset ||= 'iso-8859-1';

  # Convert japanese to UTF8
  ... messy Jcode stuff removed..
  # convert utf8
   ; # no-op
  # convert iso8859-1 to utf8
  ... messy Unicode::String code..

  $Response->{Charset} = $charset;
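
For readers who want something runnable in place of the elided Jcode and Unicode::String parts, here is a minimal sketch of the input side. It assumes the core Encode module (which postdates this post and is not what the global.asa above uses), and the names decode_param and %DECODE_NAME are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mapping from the charset labels used in this post to
# encoding names that the Encode module understands.
my %DECODE_NAME = (
    'x-euc-jp'   => 'euc-jp',
    'iso-8859-1' => 'iso-8859-1',
    'utf-8'      => 'UTF-8',
);

# Decode one submitted value into a Perl character string.
# $charset comes from the hidden 'asp_charset' field, $lang from the session.
sub decode_param {
    my ($bytes, $charset, $lang) = @_;
    $charset = 'x-euc-jp' if !$charset && defined $lang && $lang eq 'ja';
    $charset ||= 'iso-8859-1';
    # Unknown labels are treated as iso-8859-1 rather than raising an error.
    my $name = $DECODE_NAME{ lc $charset } || 'iso-8859-1';
    return decode($name, $bytes);
}

my $cafe  = decode_param("caf\xE9", 'iso-8859-1');          # Latin-1 bytes
my $nihon = decode_param("\xC6\xFC\xCB\xDC", undef, 'ja');  # EUC-JP bytes
```

Once every input has been decoded this way, the rest of the application only ever sees character strings, which is the same idea as keeping everything in UTF-8 internally.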


In Script_OnFlush we convert the internal utf8 data to the target charset

  my $charset = $Response->{Charset};

  # do character set conversion..
  if ($charset eq 'x-euc-jp') {
    ... messy Jcode stuff
  } elsif ($charset eq 'iso-8859-1') {
    ... Unicode::String stuff here.
  }

  # here's the tricky part:
  # Automatically add hidden charset fields to forms?
  # (non-greedy match, so that each form on the page is handled separately)
  $$data =~ s,(<form.*?</form>),formfixer($1),sige;
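
The output side can be sketched the same way. This is only an illustration under the assumption that the page is held as a Perl character string and that the core Encode module stands in for the Jcode and Unicode::String code that was removed above; encode_output and %ENCODE_NAME are hypothetical names.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical mapping from the charset labels used in this post to
# encoding names that the Encode module understands.
my %ENCODE_NAME = (
    'x-euc-jp'   => 'euc-jp',
    'iso-8859-1' => 'iso-8859-1',
    'utf-8'      => 'UTF-8',
);

# Convert the internal character string to bytes in the target charset.
# Unknown labels fall back to UTF-8.
sub encode_output {
    my ($text, $charset) = @_;
    my $name = $ENCODE_NAME{ lc $charset } || 'UTF-8';
    return encode($name, $text);
}

my $latin1_bytes = encode_output("caf\x{E9}", 'iso-8859-1');
my $utf8_bytes   = encode_output("caf\x{E9}", 'utf-8');
```

The same text comes out as four bytes in iso-8859-1 and five bytes in UTF-8, which is why the charset label sent to the browser has to match the conversion that was actually applied.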


Here's the formfixer thing, it adds hidden charset values to the form:

sub formfixer {
    my $form = shift;
    # Leave forms that post to an absolute (external) URL untouched.
    return($form) if ($form =~ /action="?http/);
    $form =~ s,</form>,<input type="hidden" name="asp_charset" value="$Response->{Charset}"></form>,si;
    return($form);
}



-- 
Paul Lindner
[EMAIL PROTECTED]
Red Hat Inc.



Re: multilanguage site

2000-09-05 Thread Matt Sergeant

On Tue, 5 Sep 2000, Paul Lindner wrote:

> Anyway, here's what's in my global.asa to take care of this character
> set conversion mess..  Full details available to those that are
> interested..

[snip]

Yikes, you Red Hat guys really need to look at AxKit:

# in .htaccess
AxOutputCharset ISO-8859-1

And that's it. :-)

-- 
Matt/

Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org | AxKit: http://axkit.org




Re: multilanguage site

2000-09-05 Thread Matt Sergeant

On Tue, 5 Sep 2000, Paul Lindner wrote:

> On Tue, Sep 05, 2000 at 10:23:45AM +0100, Matt Sergeant wrote:
> > On Tue, 5 Sep 2000, Paul Lindner wrote:
> >
> > > Anyway, here's what's in my global.asa to take care of this character
> > > set conversion mess..  Full details available to those that are
> > > interested..
> >
> > [snip]
> >
> > Yikes, you Red Hat guys really need to look at AxKit:
>
> We have.
>
> > # in .htaccess
> > AxOutputCharset ISO-8859-1
> >
> > And that's it. :-)
>
> But that doesn't provide me dynamic switching between character sets
> based on user preferences.  Based on HTTP_ACCEPT_CHARSET we can choose
> to use iso-8859-1 or utf8, plus we need to force Japanese to use
> x-euc-jp on certain platforms, sjis on others.

That's why everything is a plugin in AxKit. You're free to do that. However,
you've reminded me that I do need to implement ACCEPT_CHARSET directly.
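
As an illustration of what choosing a charset from Accept-Charset might involve, here is a minimal sketch. It is not AxKit code; choose_charset is a hypothetical helper, and it simplifies the HTTP rules (for instance, it gives no special default weight to iso-8859-1 when the header omits it).

```perl
use strict;
use warnings;

# Hypothetical helper: pick the charset with the highest q-value among the
# ones we can serve. The first supported charset is the fallback, and it is
# also used when the client assigns q=0 to everything we offer.
sub choose_charset {
    my ($header, @supported) = @_;
    my $default = $supported[0];
    return $default unless defined $header && $header =~ /\S/;

    my %q;
    for my $item (split /,/, $header) {
        # Each item looks like "utf-8" or "utf-8;q=0.7".
        next unless $item =~ /^\s*([^\s;]+)\s*(?:;\s*q\s*=\s*([01](?:\.\d*)?))?\s*$/i;
        $q{ lc $1 } = defined $2 ? $2 : 1;
    }

    my ($best, $best_q) = (undef, 0);
    for my $cs (@supported) {
        my $w = exists $q{ lc $cs } ? $q{ lc $cs }
              : exists $q{'*'}      ? $q{'*'}
              :                       0;
        ($best, $best_q) = ($cs, $w) if $w > $best_q;
    }
    return defined $best ? $best : $default;
}

my $picked = choose_charset('iso-8859-1, utf-8;q=0.7', 'utf-8', 'iso-8859-1');
```

Platform-specific overrides such as forcing x-euc-jp or sjis for Japanese would sit on top of this, since they depend on information the header does not carry.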

> Tell us, how do you do the character set conversion behind the scenes
> for various data sources?

Well, all data sources are XML at some point, so XML::Parser converts to
UTF-8 for us. The output is then converted to the outgoing charset via
Unicode::String and Unicode::Map8. This certainly ignores databases and
random files whose format we don't know - but how could we expect to cope
with that? I'll consider adding an incoming-charset attribute to the SQL
taglib though - that sounds like a good idea. And maybe some day Perl will
get input filters or something to control that for ordinary files...

-- 
Matt/





Re: multilanguage site

2000-09-03 Thread Matt Sergeant

On Sat, 2 Sep 2000, Eric L. Brine wrote:

 
> > > As far as I can tell there's no way in HTML to indicate to the
> > > browser that a chunk of content is in some encoding other
> > > than what was specified in the headers or meta tag. There's no
> > > <span charset=...> attribute or anything like that.
> >
> > Yes, there is.
>
> None exists in the standard, as seen below, and I don't see anything in
> CSS either.

My bad. I was thinking of the HTML form element's accept-charset attribute.

-- 
Matt/





Re: multilanguage site

2000-09-03 Thread Ričardas Čepas

On Fri Sep  1 23:18:13 2000 -0400 Eric L. Brine wrote:

 
> > You basically must use an encoding like UTF-8 which can reach the
> > entire Unicode character set or else you cannot mix languages.
>
> Not quite. To display characters not in the current character set, use
> "&...;" encodings, such as "&eacute;" and "&#N;" (where N is the
> Unicode code point).

This would require a Unicode-capable browser anyway.  What's more,
Netscape v4 doesn't show these escapes unless you set the encoding to UTF-8.

-- 
  ☻ Ričardas Čepas ☺



Re: [OT] multilanguage site

2000-09-03 Thread G.W. Haywood

Hi all,

On Sun, 3 Sep 2000, [UTF-8] Ričardas Čepas wrote:

> On Fri Sep  1 23:18:13 2000 -0400 Eric L. Brine wrote:
>
> This would require a Unicode-capable browser anyway.  What's more,
> Netscape v4 doesn't show these escapes unless you set the encoding to UTF-8.


There's a rather good document about character set encoding at

http://www.physics.gla.ac.uk/r2h-extras/rtfunicode.html

and some useful background stuff at

http://ppewww.ph.gla.ac.uk/~flavell/charset/

Flavell has done a lot of good work on browser response too; if you
browse around those sites you'll find a table there somewhere
that shows how many different browser versions respond to what I'd
call `funny characters'.

See also 'man unicode', 'man utf-8' (even 'man latin-1') on Linux.

73,
Ged.

(And what's all this \342\230\273 stuff?  Looks funny in Pine...:)
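
For the curious, those three octal escapes are the raw UTF-8 bytes of the first smiley in the signature above. A minimal sketch that checks this, assuming the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\342\230\273";            # the three octets Pine displayed
my $char  = decode('UTF-8', $bytes);   # a single character once decoded

printf "U+%04X\n", ord $char;          # prints U+263B
```

U+263B is BLACK SMILING FACE, so a mail reader that does not decode UTF-8 shows the three bytes as escapes instead of the glyph.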




Re: multilanguage site

2000-09-02 Thread Matt Sergeant

On 1 Sep 2000, Greg Stark wrote:

 
> > > Can someone suggest the best way to build a multilanguage web site
> > > (English, French, ...)?
> > > I'm using Apache + mod_perl + Apache::ASP (for applications).
>
> I'm really interested in what other people are doing here. We've just released
> our first cut at i18n and it's going fairly well. But so far we haven't dealt
> with the big bugaboo, character encoding.
>
> One major problem I anticipate is what to do when individual include files are
> not available in the local language. For iso-8859-1 encoded languages that's
> not a major hurdle as we can simply use the English text until it's
> translated. But for other encodings does it make sense to include English
> text?
>
> If we use UTF-8 all the ASCII characters would display properly, but do most
> browsers support UTF-8 now? Or do people still use Big5, EUC, etc.?

My experience has been really good. With 4.x+ browsers UTF8 displays just
fine, with the obvious caveat that you have to be using the right
fonts. Generally the people you are displaying to have the right fonts
(otherwise they wouldn't be able to use their computers!).

My only problems were two things: 1. Title bars in Linux just displayed
junk. This was probably both an encoding/window manager issue and a font
issue. 2. People don't want their content in UTF8 - they want it in the
character set they are used to, like ISO-8859-2. So I added support in
AxKit for alternate output encodings.

Of course, being XML, AxKit handles different character sets in included
files just fine - everything is UTF-8 to AxKit.

> As far as I can tell there's no way in HTML to indicate to the browser that a
> chunk of content is in some encoding other than what was specified in
> the headers or meta tag. There's no <span charset=...> attribute or anything
> like that.

Yes, there is.

> This seems to make truly multilingual pages really awkward. You
> basically must use an encoding like UTF-8 which can reach the entire Unicode
> character set or else you cannot mix languages.

Or use AxKit ;-)

-- 
Matt/





Re: multilanguage site

2000-09-02 Thread Eric L. Brine


> > As far as I can tell there's no way in HTML to indicate to the
> > browser that a chunk of content is in some encoding other
> > than what was specified in the headers or meta tag. There's no
> > <span charset=...> attribute or anything like that.
>
> Yes, there is.

None exists in the standard, as seen below, and I don't see anything in
CSS either.

<!ELEMENT SPAN - - (%inline;)*         -- generic language/style container -->
<!ATTLIST SPAN
  %attrs;                              -- %coreattrs, %i18n, %events --
  %reserved;                           -- reserved for possible future use --
  >

<!ENTITY % attrs "%coreattrs; %i18n; %events;">

<!ENTITY % coreattrs
 "id          ID             #IMPLIED  -- document-wide unique id --
  class       CDATA          #IMPLIED  -- space-separated list of classes --
  style       %StyleSheet;   #IMPLIED  -- associated style info --
  title       %Text;         #IMPLIED  -- advisory title --"
  >

<!ENTITY % i18n
 "lang        %LanguageCode; #IMPLIED  -- language code --
  dir         (ltr|rtl)      #IMPLIED  -- direction for weak/neutral text --"
  >

<!ENTITY % events
 "onclick     %Script;       #IMPLIED  -- a pointer button was clicked --
  ondblclick  %Script;       #IMPLIED  -- a pointer button was double clicked--
  onmousedown %Script;       #IMPLIED  -- a pointer button was pressed down --
  onmouseup   %Script;       #IMPLIED  -- a pointer button was released --
  onmouseover %Script;       #IMPLIED  -- a pointer was moved onto --
  onmousemove %Script;       #IMPLIED  -- a pointer was moved within --
  onmouseout  %Script;       #IMPLIED  -- a pointer was moved away --
  onkeypress  %Script;       #IMPLIED  -- a key was pressed and released --
  onkeydown   %Script;       #IMPLIED  -- a key was pressed down --
  onkeyup     %Script;       #IMPLIED  -- a key was released --"
  >

ELB



Re: multilanguage site

2000-09-01 Thread Greg Stark


> > Can someone suggest the best way to build a multilanguage web site
> > (English, French, ...)?
> > I'm using Apache + mod_perl + Apache::ASP (for applications).

I'm really interested in what other people are doing here. We've just released
our first cut at i18n and it's going fairly well. But so far we haven't dealt
with the big bugaboo, character encoding. 

One major problem I anticipate is what to do when individual include files are
not available in the local language. For iso-8859-1 encoded languages that's
not a major hurdle as we can simply use the english text until it's
translated. But for other encodings does it make sense to include english
text? 

If we use UTF-8 all the ASCII characters would display properly, but do most
browsers support UTF-8 now? Or do people still use Big5, EUC, etc.?

As far as I can tell there's no way in HTML to indicate to the browser that a
chunk of content is in some encoding other than what was specified in
the headers or meta tag. There's no <span charset=...> attribute or anything
like that. This seems to make truly multilingual pages really awkward. You
basically must use an encoding like UTF-8 which can reach the entire Unicode
character set or else you cannot mix languages.

-- 
greg




Re: multilanguage site

2000-09-01 Thread Eric L. Brine

> As far as I can tell there's no way in HTML to indicate to the browser
> that a chunk of content is in some encoding other than what was
> specified in the headers or meta tag. There's no <span charset=...>
> attribute or anything like that. This seems to make truly multilingual
> pages really awkward.
>
> You basically must use an encoding like UTF-8 which can reach the
> entire Unicode character set or else you cannot mix languages.

Not quite. To display characters not in the current character set, use
"&...;" encodings, such as "&eacute;" and "&#N;" (where N is the
Unicode code point).
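
A minimal sketch of producing such references automatically, assuming the core Encode module (which postdates this thread). Its FB_HTMLCREF fallback replaces any character the target charset cannot represent with a decimal numeric reference; encode_with_ncr is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode text for a legacy charset, turning anything
# the charset cannot represent into a decimal numeric character reference.
sub encode_with_ncr {
    my ($text, $charset) = @_;    # $text is a copy, so the caller's string is safe
    return encode($charset, $text, Encode::FB_HTMLCREF);
}

# e-acute exists in iso-8859-1; the smiley (U+263B) does not.
my $out = encode_with_ncr("caf\x{E9} \x{263B}", 'iso-8859-1');
# $out now holds the bytes "caf\xE9 &#9787;"
```

This only makes sense for HTML text content, and whether the reference is actually rendered still depends on the browser and its fonts.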

ELB



multilanguage site

2000-08-29 Thread Francesco Pasqualini

Can someone suggest the best way to build a multilanguage web site
(English, French, ...)?
I'm using Apache + mod_perl + Apache::ASP (for applications).

Would XML/XSL with AxKit be useful?
Are there any examples or guidelines?

Thanks
Francesco Pasqualini




Re: multilanguage site

2000-08-29 Thread Matt Sergeant

On Tue, 29 Aug 2000, Francesco Pasqualini wrote:

> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
>
> Would XML/XSL with AxKit be useful?
> Are there any examples or guidelines?

This month's Web Techniques is all about this (albeit in a framework
independent manner). I suggest you try as hard as you can to get a copy as
it covers way more than I could possibly type here.

Also look up content negotiation in the Apache docs.
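
Apache's content negotiation handles this for static files. For dynamically generated pages, the same idea can be sketched by hand; the following is only an illustration, and choose_language is a hypothetical helper rather than part of Apache or Apache::ASP.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a site language from an Accept-Language
# header such as "fr-CA, fr;q=0.8, en;q=0.5".
sub choose_language {
    my ($header, $default, @available) = @_;
    my %have = map { (lc($_) => $_) } @available;

    my @wanted;
    my $order = 0;
    for my $item (split /,/, defined $header ? $header : '') {
        next unless $item =~ /^\s*([A-Za-z0-9*-]+)\s*(?:;\s*q\s*=\s*([01](?:\.\d*)?))?\s*$/i;
        push @wanted, { tag => lc $1, weight => defined $2 ? $2 : 1, order => $order++ };
    }

    # Highest q-value first; ties keep the order given in the header.
    for my $want (sort { $b->{weight} <=> $a->{weight} || $a->{order} <=> $b->{order} } @wanted) {
        next if $want->{weight} <= 0;
        my $tag = $want->{tag};
        return $have{$tag} if exists $have{$tag};
        (my $primary = $tag) =~ s/-.*//;      # "fr-ca" -> "fr"
        return $have{$primary} if exists $have{$primary};
    }
    return $default;
}

my $lang = choose_language('fr-CA, fr;q=0.8, en;q=0.5', 'en', qw(en fr));
```

Falling back from a regional tag to its primary language is a design choice of this sketch; a stricter site might prefer to fall through to the next entry instead.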

-- 
Matt/





RE: multilanguage site

2000-08-29 Thread Jerrad Pierce

Try this:
http://webtechniques.com/archives/2000/09/yunker/
and perhaps this:
http://webtechniques.com/archives/2000/09/lagon/





Re: multilanguage site

2000-08-29 Thread Stas Bekman

On Tue, 29 Aug 2000, Matt Sergeant wrote:

> On Tue, 29 Aug 2000, Francesco Pasqualini wrote:
>
> > Can someone suggest the best way to build a multilanguage web site
> > (English, French, ...)?
> > I'm using Apache + mod_perl + Apache::ASP (for applications).
> >
> > Would XML/XSL with AxKit be useful?
> > Are there any examples or guidelines?
>
> This month's Web Techniques is all about this (albeit in a framework
> independent manner). I suggest you try as hard as you can to get a copy as
> it covers way more than I could possibly type here.

You can get as many copies as you want :) it's online:
http://www.webtechniques.com/

> Also look up content negotiation in the Apache docs.
 



_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://jazzvalley.com
http://singlesheaven.com http://perlmonth.com   perl.org   apache.org





Re: multilanguage site

2000-08-29 Thread David Hodgkinson


"Francesco Pasqualini" [EMAIL PROTECTED] writes:

> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
>
> Would XML/XSL with AxKit be useful?
> Are there any examples or guidelines?

I'm interested in this too :-) The Deep Purple site just went vaguely
multilingual, but I'm doing this with straight Apache MultiViews
(which _are_ honoured by SSI, which is nice) and I can see this
becoming a huge headache.

I'd like to do it with the Template Toolkit if at all possible.

Dave

-- 
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star   http://www.deep-purple.com
  Apache, mod_perl, MySQL, Sybase hired gun for, well, hire
  -



Re: multilanguage site

2000-08-29 Thread Matt Sergeant

On Tue, 29 Aug 2000, Stas Bekman wrote:

> On Tue, 29 Aug 2000, Matt Sergeant wrote:
>
> > On Tue, 29 Aug 2000, Francesco Pasqualini wrote:
> >
> > > Can someone suggest the best way to build a multilanguage web site
> > > (English, French, ...)?
> > > I'm using Apache + mod_perl + Apache::ASP (for applications).
> > >
> > > Would XML/XSL with AxKit be useful?
> > > Are there any examples or guidelines?
> >
> > This month's Web Techniques is all about this (albeit in a framework
> > independent manner). I suggest you try as hard as you can to get a copy as
> > it covers way more than I could possibly type here.
>
> You can get as many copies as you want :) it's online:
> http://www.webtechniques.com/

Ah - I thought they had a lead time before it went online - guess
not! They also didn't use to include all articles online, but I guess
that has changed. Maybe I won't buy a subscription again (especially since
it's free to US readers!)...

-- 
Matt/






Re: multilanguage site

2000-08-29 Thread Joshua Chamas

Francesco Pasqualini wrote:
 
> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
>
> Would XML/XSL with AxKit be useful?
> Are there any examples or guidelines?
 

The approach used by Paul at Red Hat seems to have been
to wrap internationalized messages with <tag>message</tag>
where tag is an XMLSub, which would do a lookup at runtime
into a message catalog for the right message, based on what
language the client was set to.  I'm sure it's much more
complicated than that, but that was the gist of it.
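
A minimal sketch of the lookup half of that idea, with an English fallback for messages that have not been translated yet. This is not Paul's code; the %CATALOG contents and the message function are hypothetical, and the XMLSubs wiring that would call it is left out.

```perl
use strict;
use warnings;

# Hypothetical in-memory catalog; a real one would be loaded from files
# or a database and keyed the same way.
my %CATALOG = (
    en => { greeting => 'Welcome',   logout => 'Log out' },
    fr => { greeting => 'Bienvenue' },
);

# Look up a message key for a language, falling back to English and
# finally to the key itself so that a missing entry is easy to spot.
sub message {
    my ($key, $lang) = @_;
    for my $try ($lang, 'en') {
        next unless defined $try;
        my $messages = $CATALOG{$try} or next;
        return $messages->{$key} if defined $messages->{$key};
    }
    return $key;
}

my $greeting = message('greeting', 'fr');   # translated
my $logout   = message('logout',   'fr');   # falls back to English
```

Returning the key itself as the last resort is a design choice of this sketch; logging the miss would be a natural addition.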

-- Joshua
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks  free web link monitoring   Huntington Beach, CA  USA 
http://www.nodeworks.com1-714-625-4051