Exactly what happened is very hard to trace because these FileStreams
are a can of worms...
Here are some of my peregrinations:

FIRST POSSIBLE TRACK:

All methods were changed in 10305.
Monticello snapshot/source.st is not UTF-8.
If the file is opened as UTF-8, we get decompiled code; I don't know why yet...
But the changes still go into the change log in correct UTF-8 form, so
that's just another bug, but not the real source of the problem.
To get some worms out of the can, just browse the inst var defs of
converter in MultiByteFileStream:
The accessor #converter initializes converter with TextConverter
defaultSystemConverter, which depends on LanguageEnvironment.
That is a Latin1TextConverter in my latin image.
Unless #reset is called first, in which case it will be initialized with a
UTF8TextConverter.
Yes, but open: fileName forWrite: writeMode does the job too, with a
UTF8TextConverter.
Still following? Me neither.
Better behaved is #setConverterForCode, which should let non-UTF-8
.mcz files work in a UTF-8 environment, but I'm not sure it is called
where required...
I think Yoshiki's changes are necessary only for writing source code
with character codes > 255.
That was not the case for the incriminated methods.
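
The order dependence between #converter and #reset described above can
be modelled outside the image. This is only a Python sketch, not the
Squeak/Pharo code: ToyStream, its attribute names and SYSTEM_DEFAULT are
hypothetical stand-ins for MultiByteFileStream, its converter inst var
and TextConverter defaultSystemConverter in a latin image. It also
assumes that #reset only picks UTF-8 when no converter was chosen yet.

```python
# Toy model (hypothetical names) of a lazily initialized converter.
SYSTEM_DEFAULT = "latin-1"   # stands in for the default converter of a latin image


class ToyStream:
    def __init__(self):
        self._converter = None          # nothing chosen yet

    @property
    def converter(self):
        # Like the accessor: fall back to the system default on first use.
        if self._converter is None:
            self._converter = SYSTEM_DEFAULT
        return self._converter

    def reset(self):
        # Assumption: picks UTF-8 only if no converter was chosen before.
        if self._converter is None:
            self._converter = "utf-8"

    def encode(self, text):
        return text.encode(self.converter)


a = ToyStream()
accessor_first = a.encode("é")      # accessor used first: Latin-1 bytes

b = ToyStream()
b.reset()
reset_first = b.encode("é")         # reset called first: UTF-8 bytes
```

The same character ends up as one byte or as two, depending only on
which message reached the stream first.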

SECOND POSSIBLE TRACK:

Everything going to the change log passes through the MultiByteFileStream,
so how did non-UTF-8 characters get in?
I tried to follow two other clues:
1) There are senders of #primWrite:from:startingAt:count: that are not
redefined in MultiByteFileStream...
  for example, using #next:putAll:startingAt: will bypass the converter.
2) Using nextPutAll: with a ByteArray argument also bypasses the
converter (see MultiByteFileStream>>#nextPutAll:).
I did not find the senders (do you really believe senders of nextPutAll:
can be analyzed?).
I tried to instrument the code with Notification, but I'm unable to
reproduce the problem, so that was in vain...
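
To make the second clue concrete, here is a small Python sketch of a
converting stream that lets byte input through untouched. ConvertingStream
and next_put_all are hypothetical names chosen for illustration; this is
not the MultiByteFileStream implementation, only the same kind of bypass.

```python
import io


class ConvertingStream:
    """Hypothetical stand-in for a stream that encodes text on write."""

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw
        self.encoding = encoding

    def next_put_all(self, data):
        if isinstance(data, (bytes, bytearray)):
            # Byte input is written as-is: the converter is bypassed.
            self.raw.write(bytes(data))
        else:
            self.raw.write(data.encode(self.encoding))


raw = io.BytesIO()
stream = ConvertingStream(raw)
stream.next_put_all("café ")                    # converted: é becomes C3 A9
stream.next_put_all("café".encode("latin-1"))   # bypass: é stays a lone E9
contents = raw.getvalue()

# Check whether the resulting file is still valid UTF-8.
try:
    contents.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

A single bypassing sender is enough to leave a byte in the file that a
UTF-8 reader will reject later.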

THIRD POSSIBLE TRACK:

http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
already has the invalid UTF-8 problem, just before the 10305 changes
that introduced decompiled code...
So we might attack the problem with another code snippet:

(SystemNavigation default browseAllCallsOn: (Smalltalk associationAt:
#SourceFiles))...

Hmm, I might have a better clue now.
The problem might come from the condenseChanges in update10298.
What happens in a condenseChanges?
Changes are copied to this file:

f := FileStream fileNamed: 'ST80.temp'.

So far, so good, because the concreteStream is a MultiByteFileStream.

But it finishes with:

        SourceFiles
                at: 2
                put: (StandardFileStream oldFileNamed: oldChanges name)

Whoa, no MultiByteFileStream here, so no more UTF-8.
But hey, that would be the inverse problem: reading UTF-8 text with a
latin1 reader. I can't get an error doing this, only some strange
sequences of characters (the UTF-8 encoding, byte by byte)...
Unless the incriminated methods are changed again in #script376 or any
other method... in which case they are written in latin1 in the
change log...
Hmm... That could possibly be the case. We must restart the update
process from
http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip
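
The inverse case mentioned above (UTF-8 bytes read by a latin1 reader)
can be checked at the byte level. This is a Python sketch with a made-up
sample string, not code from the image:

```python
source = "naïve café"
utf8_bytes = source.encode("utf-8")

# Latin-1 maps every byte 0x00-0xFF to a character, so decoding cannot fail.
misread = utf8_bytes.decode("latin-1")

# Each non-ASCII character shows up as two characters instead of one.
extra_chars = len(misread) - len(source)

# The damage is reversible as long as nothing rewrites the text.
recovered = misread.encode("latin-1").decode("utf-8")
```

So this direction only produces odd-looking text, never an "invalid
utf8" error; the error needs latin1 bytes read by a UTF-8 reader.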

One thing is sure: at the next returnFromSnapshot, FileDirectory
class>>startup will reopen the changes as UTF-8.
So saving the image will reopen them as UTF-8...

But wait... Maybe we have enough pieces of the puzzle:
Analyzing Pharo0.1Core-10304cl.changes shows that Stephane applied
several updates before snapshotting the image. So if Kernel and
System-Support were changed between 10298 and 10304, then we get the
explanation:
- condense changes puts everything in the .changes in UTF-8 but reopens
the changes in latin1
- further updates up to 10304 write changes in latin1
- the image snapshot reopens the changes in UTF-8 and thus we get
invalid UTF-8 from then on...
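
The three steps above can be replayed on a byte buffer. This is a Python
sketch under stated assumptions: the buffer stands in for the .changes
file, the sample strings are invented, and "latin1" here means one byte
per character as a stream without a converter would write it.

```python
changes = bytearray()

# Step 1: condenseChanges writes everything as UTF-8.
changes += "condensed: déjà vu\n".encode("utf-8")
latin1_start = len(changes)

# Step 2: the file is reopened without a converter, so later updates
# are appended one byte per character (Latin-1).
changes += "update: déjà vu\n".encode("latin-1")

# Step 3: after the snapshot the whole file is read back as UTF-8.
try:
    bytes(changes).decode("utf-8")
    error_offset = None
except UnicodeDecodeError as err:
    error_offset = err.start        # first byte that is not valid UTF-8

# The condensed part on its own still decodes fine.
first_part_ok = bytes(changes[:latin1_start]).decode("utf-8")
```

The decode error lands inside the part written in step 2, which matches
the idea that only methods changed after the condense are affected.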

That's easy to reproduce. Stef, can you confirm?

That also explains why I did not get the problem at home: I update
early and always save my image afterwards.
After that, we still have to find out and fix why Monticello sources
are interpreted as UTF-8 when they should not be (FIRST TRACK), and
eventually make source code go UTF-8 in Monticello, so that non-Latin
programmers can use their favourite language...

Nicolas

2009/5/23 Stéphane Ducasse <[email protected]>:
> No problem I never interpreted it like that.
> Me too I want a system that is working
>
> Adrian I will publish a fix for DNU now
> and I will try later to check the fixes proposed by yoshiki
>
> stef
>
> On May 23, 2009, at 1:29 PM, Tudor Girba wrote:
>
>> Actually, the fix is even simpler: if you find a method that raises
>> "invalid utf8 input detected", just browse to it with a class browser,
>> and re-accept it :).
>>
>> With my previous mail, I was not implying that someone should fix it
>> for me, I was merely asking for what could a quick solution be,
>> because I was a bit lost (scared) :). Now, I am happy. Thanks for
>> discussing it.
>>
>> Cheers,
>> Doru
>>
>> On 23 May 2009, at 13:07, Tudor Girba wrote:
>>
>>> Hi,
>>>
>>> I attached here a DNU implementation I took from an older image.
>>> After filing this one in, I can debug DNU problems.
>>>
>>> Cheers,
>>> Doru
>>>
>>> <Object-doesNotUnderstand.st>
>>>
>>>
>>>
>>> On 23 May 2009, at 13:04, Stéphane Ducasse wrote:
>>>
>>>> I did the following
>>>>
>>>> (Object>>#doesNotUNderstand) getSourceFromFile and I get an
>>>> invalid....
>>>>
>>>> Now when I take another method
>>>>
>>>> (BalloonFontTest>>#testDefaultFont) I do not get problem.
>>>>
>>>> I will reread carefully the mails of nicolas to try to understand,
>>>> I do not know if the fixes of yoh
>>>>
>>>>     http://bugs.squeak.org/view.php?id=5996
>>>> is related.
>>>>
>>>> Nicolas
>>>>
>>>>>> {Object>>#doesNotUnderstand:.
>>>>>> SystemNavigation>>#browseMethodsWhoseNamesContain:.
>>>>>> Utilities class>>#changeStampPerSe.
>>>>>> Utilities class>>#methodsWithInitials:} collect: [:e | (e
>>>>>> getSourceFromFile select: [:s | s charCode > 127]) asArray
>>>>>> collect:
>>>>>> [:c | c charCode]]
>>>>
>>>> I cannot get that code running it break before with me.
>>>>
>>>> Stef
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [email protected]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>> --
>>> www.tudorgirba.com
>>>
>>> "Not knowing how to do something is not an argument for how it
>>> cannot be done."
>>>
>>
>> --
>> www.tudorgirba.com
>>
>> "Problem solving efficiency grows with the abstractness level of
>> problem understanding."
>>
>>
>>
>>
>>
>
>
>
