Re: UTF-8 parsing faster than US-ASCII

Glenn Marcy Wed, 01 Aug 2001 14:43:42 -0700

Xerces has hard-wired encoding support for UTF-8.  Other encodings
(US-ASCII, ISO-8859-1, ISO-Latin-1, etc.) are passed to the Java JDK,
so the results may differ between environments and platforms.

-Glenn



"Sandeep Randhawa" <[EMAIL PROTECTED]t.in>
08/01/2001 12:07 PM
Please respond to xerces-j-dev

To:      <[EMAIL PROTECTED]>
cc:
Subject: UTF-8 parsing faster than US-ASCII



Hi,
    Somebody noticed this on NetBeans. I ran a few tests of my own and
found similar results. Is this a known issue? It seems contrary to the docs.

Sandeep Randhawa

Sandeep Randhawa wrote:
>
> <?xml encoding="UTF-8" ?>
>
> If there is no specific reason to use "utf-8", stick with "us-ascii".
> Parsing is faster. Also, I noticed all of Netbeans Settings are stored
> without encoding attribute in the prolog. Xerces defaults to "utf-8" if no
> encoding attribute is present. So for Petr Nejedly, add the attribute in the
> prolog, we might catch a few more milliseconds.

I tried it, but with the opposite result.
I made a simple test that created a filesystem over all the module
layers (this is part of the IDE startup sequence) and measured the time.
Then I replaced every encoding="UTF-8" with us-ascii and added the
attribute where it was missing. Parsing was then slower, though not by
much, so I guess we can stick with utf-8.
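
For anyone who wants to repeat this kind of comparison outside the IDE,
here is a rough sketch of a timing test. It is not the test described
above: the class name EncodingParseTiming, the generated document and the
round counts are all made up for illustration, and it uses the standard
JAXP DocumentBuilder, so it measures whichever parser the JDK provides
rather than a particular Xerces release. Treat the numbers as indicative
only, since JIT warm-up and the platform's converters can easily dominate
small differences.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, not part of Xerces or NetBeans.
public class EncodingParseTiming {

    // Builds an ASCII-only document, so the content bytes are identical
    // whichever encoding name is declared in the prolog.
    public static byte[] buildDocument(String encoding, int elements) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"")
          .append(encoding).append("\"?>\n<root>\n");
        for (int i = 0; i < elements; i++) {
            sb.append("  <item id=\"").append(i).append("\">value ")
              .append(i).append("</item>\n");
        }
        sb.append("</root>\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Parses the same bytes repeatedly and returns the elapsed nanoseconds.
    public static long timeParse(byte[] doc, int rounds) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            builder.parse(new ByteArrayInputStream(doc));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String[] encodings = {"UTF-8", "US-ASCII"};
        // Warm-up pass so JIT compilation does not skew the first result.
        for (String enc : encodings) {
            timeParse(buildDocument(enc, 2000), 50);
        }
        for (String enc : encodings) {
            long nanos = timeParse(buildDocument(enc, 2000), 200);
            System.out.println(enc + ": " + (nanos / 1_000_000) + " ms");
        }
    }
}
```

Running the two encodings in the opposite order as well is a cheap way
to check that the ordering of the results is not just a warm-up effect.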

--
Petr Nejedly
NetBeans/Sun Microsystems
http://www.netbeans.org




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]