Re: [Resin-interest] java unicode problem in cross-plateform situation

2010-01-18 Thread Riccardo Cohen
Knut Forkalsrud wrote:
 > You most likely will have to know which characters you want to replace,
 > I don't see any way around that.  Anyway the more interesting change
 > is probably to replace a character at a time instead of a byte at a time.
 > How to specify each character literal in your source file is a separate
 > problem.

When I call sb_replace(ret,lit,urlset3b[idx]) I replace strings, not 
bytes. This is probably not very efficient, but it does not operate at the byte level.

 > Try "man locale" in your terminal window.

This lists many available locales, including fr_FR.UTF-8.
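The shell's locale list does not necessarily tell you what the JVM itself picked up. A minimal sketch for checking that directly (the class name CharsetCheck and method describe are hypothetical): it prints the file.encoding property, the JVM's default charset and the LANG variable. String.getBytes() and new String(byte[]) with no charset argument, as used in the original loop, rely on exactly this default. Run it on each machine with the same user and command line that start the server.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports the encoding this JVM uses when no charset
// is passed explicitly (getBytes(), new String(byte[]), readers/writers).
public class CharsetCheck {

    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", LANG=" + System.getenv("LANG"); // may be null if unset
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two machines print different default charsets despite both having LANG=fr_FR.UTF-8, that alone is enough to make byte-level conversions behave differently.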

 > Good luck,

Thanks for your help, I'll try.

 > -Knut
 > PS: If you get all the character set issues under control you probably don't
 > even need to replace characters in the URLs.  See for example Wikipedia
 > with URLs like http://en.wikipedia.org/wiki/Sebastián_Piñera
 > 

The problem with full Unicode URLs is that they do not work with old 
browsers like IE6. If a page renders badly in IE6 I don't mind, 
but if it returns a 404 error, that is a problem.

Should I consider Unicode URLs to be widely supported?

In the end my application will need Unicode URLs, since it has to work 
in Russian. For the moment, though, I am just trying to understand the 
runtime problem.
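One common way to avoid depending on how a browser treats raw non-ASCII characters is to publish the percent-encoded form of the URL, where each non-ASCII character is written as its UTF-8 bytes in %XX form, so the string is plain ASCII. Whether a particular old browser then behaves as hoped still needs testing; the sketch only shows how to produce that form with the standard java.net.URI class. The class name AsciiUrl and method toAscii are hypothetical, and the path is written with Unicode escapes so the result does not depend on the encoding javac assumes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds an http URL from a Unicode path and returns
// its ASCII-only (percent-encoded) form.
public class AsciiUrl {

    static String toAscii(String host, String path) {
        try {
            // The multi-argument constructor quotes characters that are illegal
            // in a URI; toASCIIString() then writes each non-ASCII character as
            // its UTF-8 bytes in %XX form.
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String path = "/wiki/Sebasti\u00e1n_Pi\u00f1era";
        // prints http://en.wikipedia.org/wiki/Sebasti%C3%A1n_Pi%C3%B1era
        System.out.println(toAscii("en.wikipedia.org", path));
    }
}
```

Cyrillic letters, which the Russian version will need, take two bytes each in UTF-8 and therefore become two %XX escapes each.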

Thanks


-- 
Riccardo Cohen
Architecte du Logiciel
http://www.architectedulogiciel.fr
+33 (0)6.09.83.64.49
Member of the network http://www.reflexe-conseil-centre.org




___
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest


Re: [Resin-interest] java unicode problem in cross-plateform situation

2010-01-18 Thread Knut Forkalsrud
On Sun, Jan 17, 2010 at 22:47, Riccardo Cohen wrote:

> Using numeric literals is more difficult because I would have to write a
> program to generate these values, since I don't know them,


You most likely will have to know which characters you want to replace,
I don't see any way around that.  Anyway the more interesting change
is probably to replace a character at a time instead of a byte at a time.
How to specify each character literal in your source file is a separate
problem.


> But
> at the same time, the source code is not used at run time, and the problem
> is at run time.
>

In general, the problem could be at compile time even if you don't see
the symptoms until runtime.
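To make the compile-time point concrete: unless javac is given -encoding, it reads the source file with the platform default encoding, and the decoded string constants are what ends up in the .class file. The sketch below (class name hypothetical, Java 7+ for StandardCharsets) simulates that step by decoding the UTF-8 bytes of an A-grave once as UTF-8 and once as ISO-8859-1, the latter standing in for whatever single-byte default a machine might have. The misread constant is a different two-character string, so at run time it can never match the real character in the input. Compiling with `javac -encoding UTF-8` everywhere, or writing the characters as Unicode escapes, removes that dependence.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: simulates javac reading the same source bytes with two
// different default encodings.
public class SourceEncodingDemo {

    // The two bytes (0xC3 0x80) a UTF-8 editor writes to disk for A-grave (U+00C0).
    static final byte[] SOURCE_BYTES = "\u00c0".getBytes(StandardCharsets.UTF_8);

    // Decode the "source file" bytes the way javac would with the given encoding.
    static String readAs(Charset cs) {
        return new String(SOURCE_BYTES, cs);
    }

    public static void main(String[] args) {
        String asUtf8 = readAs(StandardCharsets.UTF_8);         // one char: U+00C0
        String asLatin1 = readAs(StandardCharsets.ISO_8859_1);  // two chars: U+00C3, U+0080
        System.out.println(asUtf8.length() + " vs " + asLatin1.length()); // prints "1 vs 2"
        // The misread constant never matches a real A-grave in the input.
        System.out.println("a\u00c0b".contains(asLatin1)); // prints "false"
    }
}
```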


> I checked my locales on Unix and no locale other than UTF-8 is installed.
> On the Mac it is more difficult to check.


Try "man locale" in your terminal window.

Good luck,

-Knut

PS: If you get all the character set issues under control you probably don't
even need to replace characters in the URLs.  See for example Wikipedia
with URLs like 
http://en.wikipedia.org/wiki/Sebastián_Piñera


Re: [Resin-interest] java unicode problem in cross-plateform situation

2010-01-17 Thread Riccardo Cohen
Thanks Knut,

Using numeric literals is more difficult because I would have to write a 
program to generate these values, since I don't know them, and afterwards 
I could no longer check my source code by reading it... I'll try it anyway 
to see if it changes anything. But at the same time, the source code is not 
used at run time, and the problem is at run time.
I checked my locales on Unix and no locale other than UTF-8 is installed. 
On the Mac it is more difficult to check.
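Generating the numeric values does not have to be done by hand. A minimal sketch of a one-off generator (the class name EscapeGenerator and its methods are hypothetical): it prints each character of its argument as a Java char literal with a Unicode escape, followed by the character itself in a comment, so the output can be pasted into an array like urlset3a and the source stays checkable by eye. The command-line argument is itself decoded with the platform encoding, so run it where that encoding is known to match the terminal.

```java
// Hypothetical one-off generator: prints each character of the input as a Java
// char literal using a Unicode escape, followed by the character in a comment.
public class EscapeGenerator {

    static String toEscape(char c) {
        // %04x gives four lowercase hex digits, e.g. 00c0 for A-grave.
        return String.format("'\\u%04x'", (int) c);
    }

    static String toArrayBody(String chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            sb.append(toEscape(c)).append(", // ").append(c).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Falls back to A-grave and A-acute when no argument is given.
        System.out.print(toArrayBody(args.length > 0 ? args[0] : "\u00c0\u00c1"));
    }
}
```

Java allows a trailing comma in array initializers, so the output can be pasted as-is. It emits one literal per UTF-16 unit, which is fine for accented Latin letters and for Cyrillic.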

Knut Forkalsrud wrote:
> You may want to double check if you really have UTF-8 in your 
> environment in all circumstances.
> Here you also depend on the character set of the source file, which is 
> unnecessary.
> And your code could be simpler.
> 
> char[] urlset3a = {0xc0, '\u00c1' /* , ... */};    // either numeric literals or unicode escapes
> char[] urlset3b = {'a', 'a' /* , ... */};
> 
> String s = ...;    // the input string
> for (int i = 0; i < urlset3a.length; i++) {
>     s = s.replace(urlset3a[i], urlset3b[i]);
> }
> return s;
> 
> 
> 
> On Fri, Jan 15, 2010 at 00:40, Riccardo Cohen <r...@architectedulogiciel.fr> wrote:
> 
> Hello
> I wrote a piece of code to remove diacritics:
> 
> String urlset3a[]={"À","Á" /* , ... */};
> String urlset3b[]={"a","a" /* , ... */};
> for (idx=0;idx<urlset3a.length;idx++) {
>  String lit=null;
>  try{lit=new String(urlset3a[idx].getBytes(),"UTF-8");}catch(Exception ex){}
>  if (lit!=null)
>   sb_replace(ret,lit,urlset3b[idx]);
> }
> 
> This code works perfectly on Mac OS X and on Linux when compiled on the
> platform it is run on.
> 
> javac com/adl/java/utils/Stringutils.java
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> URL:aaae-cx
> 
> But when I compile this class on Mac OS X and copy the .class file to
> Linux, the accents and special characters are no longer replaced
> correctly (the same happens the other way round):
> 
> Mac class used on Linux:
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> aäaé°-çx
> 
> Linux class used on Mac:
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> URL:a?a??-?x
> 
> My LANG environment variable is fr_FR.UTF-8 on both platforms. The Java
> version is "1.5.0_17" on Linux and "1.5.0_22" on the Mac.
> 
> 
> Does anybody know why this happens and how to fix it? (This class is in a jar
> that is copied to the server, not compiled by Resin.)
> Thanks



Re: [Resin-interest] java unicode problem in cross-plateform situation

2010-01-15 Thread Knut Forkalsrud
You may want to double check if you really have UTF-8 in your environment in
all circumstances.
Here you also depend on the character set of the source file, which is
unnecessary.
And your code could be simpler.

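char[] urlset3a = {0xc0, '\u00c1' /* , ... */};    // either numeric literals or unicode escapes
char[] urlset3b = {'a', 'a' /* , ... */};

String s = ...;    // the input string
for (int i = 0; i < urlset3a.length; i++) {
    s = s.replace(urlset3a[i], urlset3b[i]);
}
return s;

On Fri, Jan 15, 2010 at 00:40, Riccardo Cohen wrote: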

> Hello
> I wrote a piece of code to remove diacritics:
>
> String urlset3a[]={"À","Á" /* , ... */};
> String urlset3b[]={"a","a" /* , ... */};
> for (idx=0;idx<urlset3a.length;idx++) {
>  String lit=null;
>  try{lit=new String(urlset3a[idx].getBytes(),"UTF-8");}catch(Exception ex){}
>  if (lit!=null)
>   sb_replace(ret,lit,urlset3b[idx]);
> }
>
> This code works perfectly on Mac OS X and on Linux when compiled on the
> platform it is run on.
>
> javac com/adl/java/utils/Stringutils.java
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> URL:aaae-cx
>
> But when I compile this class on Mac OS X and copy the .class file to
> Linux, the accents and special characters are no longer replaced
> correctly (the same happens the other way round):
>
> Mac class used on Linux:
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> aäaé°-çx
>
> Linux class used on Mac:
> java com.adl.java.utils.Stringutils "aÄaé° ç%Щx"
> URL:a?a??-?x
>
> My LANG environment variable is fr_FR.UTF-8 on both platforms. The Java
> version is "1.5.0_17" on Linux and "1.5.0_22" on the Mac.
>
>
> Does anybody know why this happens and how to fix it? (This class is in a jar
> that is copied to the server, not compiled by Resin.)
> Thanks
>
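Assembled into a complete class, the char-at-a-time approach suggested above might look like the following minimal sketch. The class name Diacritics, the method strip and the short character table are hypothetical; extend the table as needed. Every non-ASCII character is written as a Unicode escape, so the file is pure ASCII and javac produces the same constants whatever its default encoding, and since nothing calls getBytes() or new String(byte[], ...), the result does not depend on the runtime default charset either. By contrast, the original loop encodes each literal with getBytes() (platform default) and then decodes the bytes as UTF-8. If, for example, javac and the JVM on the Mac both used a single-byte default such as MacRoman, that round trip would undo the misreading of the source file only on the machine that compiled it, which would be consistent with the symptoms described.

```java
// Hypothetical sketch of char-at-a-time replacement. Every non-ASCII character
// is written as a Unicode escape, so this source file is pure ASCII.
public class Diacritics {

    // Characters to replace (only a few listed; extend as needed).
    private static final char[] FROM = {
        '\u00c0', '\u00c1', '\u00c4', // A-grave, A-acute, A-umlaut
        '\u00e0', '\u00e1', '\u00e4', // a-grave, a-acute, a-umlaut
        '\u00e9', '\u00e8', '\u00e7'  // e-acute, e-grave, c-cedilla
    };
    // Replacement for the character at the same index in FROM.
    private static final char[] TO = {
        'a', 'a', 'a',
        'a', 'a', 'a',
        'e', 'e', 'c'
    };

    static String strip(String s) {
        for (int i = 0; i < FROM.length; i++) {
            s = s.replace(FROM[i], TO[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        // Default input: "a", A-umlaut, "a", e-acute, c-cedilla, "x"
        System.out.println(strip(args.length > 0 ? args[0] : "a\u00c4a\u00e9\u00e7x")); // prints "aaaecx"
    }
}
```

String.replace(char, char) walks the whole string once per table entry, which is fine for a short table; a single pass over the input with a lookup per char would scale better for a long one.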