Re: [Pharo-project] XML Parser, Monticello and unicode?

Norbert Hartl Sat, 07 Aug 2010 09:33:39 -0700

On 07.08.2010, at 13:19, Stéphane Ducasse wrote:

>> Has this been reported before? If not why? This is really important. I don't 
>> think we can wait until Monticello is replaced by something different that 
>> will fix this :)
> 
> 
> the answer is:
>       - check bug entries
>       - add new one if necessary
>       - propose a fix if possible
>       - else wait.
>       
I replied to Henrik's mail. It was meant as a question to him because he knows 
about the topic. I see I need to be clearer next time. And by the way, this was 
_not_ an answer to my question.


>> 
>> There are a few things here that work together in 98% of all cases. I don't 
>> fully understand what is going on, but
>> 
>> ZipArchiveMember>>contentStream does
>> ...
>> s := MultiByteBinaryOrTextStream on: (String new: self uncompressedSize).
>> s converter: Latin1TextConverter new.
>> ...
>> 
>> and
>> 
>> MultiByteBinaryOrTextStream>>defaultConverter
>>      ^ Latin1TextConverter new.
>> 
>> These two are being used when a monticello package is being read. So we have 
>> an assumption about an encoding here. On the other hand, something else in 
>> the system seems to make a similar assumption. I don't know InputEvents or 
>> how to debug them, but if I create a method
>> 
>> EncTest>>encTest
>>      ^ 'ö'
>> 
>> I can see that
>> 
>> ((EncTest>>#encTest literalAt: 1) at: 1) asciiValue
>> 
>> is 246, which matches the latin1 code for 'ö'.
>> 
>> So there seems to be a conversion to latin1 (I think at the time I press 
>> the key). When a monticello package is written I didn't find any 
>> conversion, so this might be the reason that the files end up as latin1 on 
>> disk and can be read back using an explicit conversion from latin1.
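
To make the 246 observation concrete, here is a small sketch in Python (not Pharo, and not what the image does internally). It only shows how the single character 'ö' maps to bytes under Latin-1 and under UTF-8, and what happens when UTF-8 bytes are read through a Latin-1 decoder. The variable names are just for illustration.

```python
# Illustration only: Python, not Pharo. One character, two byte encodings.
ch = "\u00f6"  # the character 'ö'

code_point = ord(ch)                 # same number the asciiValue expression gave
latin1_bytes = ch.encode("latin-1")  # one byte, numerically equal to the code point
utf8_bytes = ch.encode("utf-8")      # two bytes

# A Latin-1 decoder accepts any byte, so reading UTF-8 bytes as Latin-1
# does not fail; it quietly produces two characters instead of one.
misread = utf8_bytes.decode("latin-1")

print(code_point, latin1_bytes.hex(), utf8_bytes.hex(), repr(misread))
```

If a file were written as UTF-8 but read with a fixed Latin-1 converter, this kind of silent two-character result is what one would expect to see, rather than an error.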
>> 
>> But this does not explain how it works with WideString. I would need to 
>> dig deeper, but maybe one of you has an idea.
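
The WideString question comes down to characters whose code point is above 255, which a one-byte-per-character encoding such as Latin-1 cannot represent. A minimal Python sketch of that limit follows. How Pharo's Latin1TextConverter actually treats such characters is not shown in this thread, so this says nothing about the image's behaviour; `fits_latin1` is a hypothetical helper name.

```python
# Illustration only: Python, not Pharo.
wide = "\u03a9"  # GREEK CAPITAL LETTER OMEGA, code point 937 (above 255)

def fits_latin1(text):
    """Return True if every character of text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("\u00f6"), fits_latin1(wide), wide.encode("utf-8").hex())
```

Any source containing such a character needs either a wider encoding on disk or some escaping scheme, which is why packages with WideString content are the interesting case.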
>> 
>> I think we should fix this. To estimate how feasible a change is, I 
>> scanned all of my cached monticello packages. Most of them are 7bit clean.
> 
> how do you do that?

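# Print each package name, then let enca guess the encoding of its source.
# Reading names line by line keeps paths that contain spaces intact.
find . -name "*.mcz" | while IFS= read -r i; do
   echo "$i"
   unzip -qc "$i" snapshot/source.st | enca -L none
done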
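
For anyone without enca installed, a rough equivalent of the loop above can be sketched in Python using only the standard library. It is a coarser check than enca: it only distinguishes 7-bit clean, valid UTF-8, and "some other 8-bit encoding" (which for these packages would most likely be latin1, but the script cannot prove that). The function names `classify` and `scan` are made up for this sketch.

```python
import zipfile
from pathlib import Path

def classify(data):
    """Coarsely classify raw bytes as '7bit', 'utf-8' or 'other-8bit'."""
    if data.isascii():
        return "7bit"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other-8bit"

def scan(root):
    """Map every .mcz file below root to the class of its snapshot/source.st."""
    results = {}
    for path in sorted(Path(root).rglob("*.mcz")):
        try:
            with zipfile.ZipFile(path) as archive:
                data = archive.read("snapshot/source.st")
        except (zipfile.BadZipFile, KeyError):
            # Not a zip file, or the archive has no snapshot/source.st member.
            results[str(path)] = "unreadable"
            continue
        results[str(path)] = classify(data)
    return results
```

Calling `scan(".")` in the package cache directory and printing the resulting dictionary gives one line of information per package, comparable to the shell output.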

> 
>> No problem for them if we change the encoding. Besides XML Parser I didn't 
>> find any that contain WideString, so no problem there. Some of them are 
>> latin1 encoded (like Seaside 2.8 or Seaside-InternetExplorer from 3.0). 
>> That is the biggest problem because there is no fallback and monticello 
>> does not have a version number in its file format, right?
>> I think it is still feasible to change this in monticello, as the fix for 
>> users of older images will probably be only a few lines that you can apply 
>> to any version of monticello, if I'm not wrong. But the change is not that 
>> easy.
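
Since the file format carries no version number, one conceivable fallback is to guess from the bytes: try a strict UTF-8 decode first and fall back to latin1 only if that fails. The sketch below shows that idea in Python. It is an assumption about how a reader could behave, not how Monticello works or is planned to work, and `decode_source` is a hypothetical name.

```python
def decode_source(data):
    """Decode bytes as UTF-8 if they are valid UTF-8, otherwise as Latin-1.

    Returns a (text, encoding_used) pair.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return data.decode("latin-1"), "latin-1"

print(decode_source(b"\xf6"))      # a lone 0xF6 is not valid UTF-8
print(decode_source(b"\xc3\xb6"))  # valid UTF-8 for the same character
```

The weak spot of this heuristic is an old latin1 file whose bytes happen to form valid UTF-8 sequences. That is uncommon for real text, but it would be decoded differently than before, so an explicit marker in the file would be more reliable if the format can be extended.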
> 
> 
> It is not clear to me what the problem and the potential solution are. I 
> have not been focused enough on pharo :(

It is not clear to me, either. That's why I am asking. I'm willing to track 
this down further. At the least I want to open a substantial ticket. But a 
little bit more information would be helpful.

Norbert

