[Sab] Some problems

Jan Janousek Thu, 28 Dec 2000 09:05:10 -0800
Hallo.

We (GA) had some Internet connectivity problems, and our site and mailboxes
were partly unreachable for some time. We are really sorry for this
problem. It looks like everything is OK now.

One new message made it to the Sablotron mailing list but did not reach
most of the subscribers, so here it is again.

Jan

--------------------------------------------------------------------------

           From: Koscheev Andrey <[EMAIL PROTECTED]>
           Subject: [Sab] Encodings... 
           Date: Wed, 27 Dec 2000 09:10:32 -0800 

        
       Merry Christmas 
        
       This is quite deep technical stuff, so sysadmins and Perl
       programmers may skip it.
        
        
       A few days ago I posted a note about encoding problems in Sablotron
       0.44. I did not try to solve the problem because I was waiting for
       the new version to come out. However, nothing has changed, so today I
       spent some time investigating the sources and found some interesting
       things.
        
       To recap the actual problem:
       Assume Sablotron is compiled with libiconv on a Linux machine (I was
       testing 0.44, because 0.5 doesn't compile at all). After setting the
       XML file encoding to "windows-1250", setting the XSL file encoding to
       "windows-1250", and adding an <xsl:output> element with encoding
       "windows-1250", the following happens: the result has an HTML META
       tag with the encoding set to "windows-1250" but is actually encoded in
       UTF-8. Of course, browsers trust the META tag, so the page looks very
       bizarre.
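To see why the page looks bizarre, consider the Czech letter "č" (U+010D). The minimal sketch below assumes a Linux system whose iconv(3) (glibc or libiconv) recognizes the name "WINDOWS-1250"; the helper `recode` is hypothetical and not part of Sablotron. windows-1250 stores "č" as the single byte 0xE8, while UTF-8 uses the two bytes 0xC4 0x8D, and a browser that trusts the META tag decodes those two bytes as the windows-1250 characters "Ä" and "Ť".

```cpp
#include <iconv.h>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `in` from encoding `from` to encoding `to`
// with the POSIX iconv(3) API (glibc signature, char** input buffer).
// Throws if iconv does not know an encoding or the input cannot be converted.
std::string recode(const std::string& in, const char* to, const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) throw std::runtime_error("unsupported encoding");
    // 4 output bytes per input byte covers single-byte <-> UTF-8 conversions.
    std::string out(in.size() * 4 + 16, '\0');
    char* src = const_cast<char*>(in.data());  // iconv does not modify input
    size_t srcLeft = in.size();
    char* dst = &out[0];
    size_t dstLeft = out.size();
    size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
    iconv_close(cd);
    if (rc == (size_t)-1) throw std::runtime_error("conversion failed");
    out.resize(out.size() - dstLeft);  // keep only the bytes actually written
    return out;
}

// "č" is 0xC4 0x8D in UTF-8 but the single byte 0xE8 in windows-1250:
//   recode("\xC4\x8D", "WINDOWS-1250", "UTF-8")   yields "\xE8"
// Reading the UTF-8 bytes as windows-1250 gives "Ä" (0xC4) then "Ť" (0x8D):
//   recode("\xC4\x8D", "UTF-8", "WINDOWS-1250")   yields "ÄŤ" (as UTF-8)
```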
        
       Sablotron checks whether the encoding is valid using libiconv
       functions, but it does not use any encoding that is not listed in the
       iconv_encoding array (utf8.cpp). The function
       OutputDefinition::getEncoding() defaults to UTF-8, and when the
       encoding is not found in the static table, utf8Recode() is not called
       at all.
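The fallback described above can be illustrated with a simplified, hypothetical sketch. The table contents and the names `kKnownEncodings`, `lookupEncoding` and `iconvSupports` are invented for illustration and are not Sablotron's code; the sketch assumes a Linux system where iconv_open(3) comes from glibc or libiconv. A fixed-table lookup silently returns UTF-8 for any name it does not list, whereas asking iconv_open() directly accepts every encoding the library knows.

```cpp
#include <iconv.h>
#include <strings.h>
#include <cassert>
#include <string>

// Hypothetical static table in the spirit of an iconv_encoding array:
// only the names listed here are ever recoded.
static const char* const kKnownEncodings[] = {"UTF-8", "ISO-8859-1", "ISO-8859-2"};

// Returns the canonical table entry if `requested` is listed (case-insensitive),
// otherwise UTF-8. An unlisted name such as "windows-1250" falls back to UTF-8
// even though the caller may still write the original name into the META tag.
std::string lookupEncoding(const std::string& requested) {
    for (const char* name : kKnownEncodings)
        if (strcasecmp(name, requested.c_str()) == 0) return name;
    return "UTF-8";
}

// Alternative check: ask iconv itself whether it can convert UTF-8 to `name`.
bool iconvSupports(const std::string& name) {
    iconv_t cd = iconv_open(name.c_str(), "UTF-8");
    if (cd == (iconv_t)-1) return false;  // iconv does not know this encoding
    iconv_close(cd);
    return true;
}
```

With this split, the validity check and the table disagree: `iconvSupports("windows-1250")` succeeds while `lookupEncoding("windows-1250")` still returns "UTF-8", which matches the mismatch reported above.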
        
       I added a new record to the table manually and changed some other
       functions so that it works now, but that is not the best we can do.
        
       Conclusions:
       1. Encodings in Sablotron are handled differently in different
       functions. This is caused by improper abstraction of the encoding
       functions. If all functions, constants and other encoding-related code
       were placed in one file and one class were used for all conversions,
       it would be much easier to handle errors and to add new encodings and
       conversion libraries.
       2. The ability of libiconv to convert between dozens of encodings is
       not used. The number of encodings is limited to those listed in the
       constant table.
       3. I understand that such a large project has lots of other things to
       do and that the problem I describe is not the most important one. If I
       get some time next year (I mean 2001), I might try to rearrange the
       encoding handling and offer you a more complete suggestion on how to
       do it.
       4. I saw many pieces of code where encodings other than UTF-8 were
       treated as exceptions. That means that after processing every entity,
       we check many times whether the encoding in use is "the one we like or
       the one we dislike". This decreases performance and adds deeply nested
       C++ blocks to many functions. To make things behave predictably, we
       could move these cases into dedicated classes.
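The single-class approach proposed in the conclusions might look roughly like the following minimal sketch, assuming the POSIX iconv(3) API on Linux (glibc signature). The class name `Recoder` is hypothetical and the code is not taken from Sablotron. Every output encoding, UTF-8 included, goes through the same object, so no caller needs its own "is this UTF-8?" branch, and any encoding iconv knows is accepted without a static table.

```cpp
#include <iconv.h>
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical converter: one object per output encoding; every conversion
// goes through it, so callers never special-case UTF-8.
class Recoder {
public:
    // Throws if iconv cannot convert from UTF-8 to `target`.
    explicit Recoder(const std::string& target)
        : cd_(iconv_open(target.c_str(), "UTF-8")) {
        if (cd_ == (iconv_t)-1)
            throw std::invalid_argument("unsupported encoding: " + target);
    }
    ~Recoder() { iconv_close(cd_); }
    Recoder(const Recoder&) = delete;
    Recoder& operator=(const Recoder&) = delete;

    // Convert a UTF-8 string to the target encoding, in fixed-size chunks.
    std::string fromUtf8(const std::string& in) {
        iconv(cd_, nullptr, nullptr, nullptr, nullptr);  // reset shift state
        std::string out;
        char buf[256];
        char* src = const_cast<char*>(in.data());  // iconv does not modify input
        size_t srcLeft = in.size();
        while (true) {
            char* dst = buf;
            size_t dstLeft = sizeof buf;
            size_t rc = iconv(cd_, &src, &srcLeft, &dst, &dstLeft);
            out.append(buf, sizeof buf - dstLeft);
            if (rc != (size_t)-1) break;   // all input consumed
            if (errno != E2BIG)            // invalid or unrepresentable input
                throw std::runtime_error("conversion failed");
        }
        // Flush any pending shift sequence (only stateful encodings emit one).
        char* dst = buf;
        size_t dstLeft = sizeof buf;
        iconv(cd_, nullptr, nullptr, &dst, &dstLeft);
        out.append(buf, sizeof buf - dstLeft);
        return out;
    }

private:
    iconv_t cd_;
};
```

Because the descriptor is opened once per output encoding and reused, the encoding decision is made in the constructor rather than re-checked after every entity.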
        
        
       Please don't consider this a bug report. It's just an idea of how to
       push things toward perfection.
        
       Sincerely Yours
        
       Koscheev Andrey
       Eller s.r.o.