What happened exactly is very hard to trace because these FileStream classes are a can of worms... Here are some of my peregrinations:

FIRST POSSIBLE TRACK:
All methods were changed in 10305. Monticello snapshot/source.st is not UTF-8. If the file is opened as UTF-8, then we get decompiled code, I don't know why yet... But the changes still go into the change log in correct UTF-8 form, so that's just another bug, not the real source of the problem.

To get some worms out of the can, just browse the inst var defs of converter in MultiByteFileStream:
The accessor #converter initializes converter with TextConverter defaultSystemConverter, which depends on LanguageEnvironment. That is a Latin1TextConverter in my latin image.
Unless #reset is called first, in which case it will initialize with a UTF8TextConverter.
Yes, but open: fileName forWrite: writeMode does the job too, with a UTF8TextConverter.
You still follow? Me neither.
A better behaved one is #setConverterForCode, which should let non UTF-8 .mcz files work in a UTF-8 environment, but I'm not sure it is called where required...

I think Yoshiki's changes are necessary only for writing source code with character codes > 255. This was not the case for the incriminated methods.

SECOND POSSIBLE TRACK:
Everything going to the change log passes through the MultiByteFileStream, so how did non UTF-8 characters get in?
I tried to follow two other clues:
1) There are senders of #primWrite:from:startingAt:count: not redefined in MultiByteFileStream... for example, using #next:putAll:startingAt: will bypass the converter.
2) Using nextPutAll: with a ByteArray argument also bypasses the converter (see MultiByteFileStream>>#nextPutAll:).
I did not find the senders (you really believe senders of nextPutAll: can be analyzed?).
I tried to instrument the code with Notification, but I'm unable to reproduce the problem, so that was in vain...

THIRD POSSIBLE TRACK:
http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip has the invalid UTF-8 problem, just before the 10305 changes that introduced decompiled code...
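
To make the second track a bit more concrete, here is a rough model of the converter bypass, written in Python rather than Smalltalk so it can be run outside the image. ConvertingStream, next_put_all and is_valid_utf8 are made-up names for illustration only; this is not the MultiByteFileStream code, just a sketch of the idea that text goes through the converter while raw bytes do not.

```python
class ConvertingStream:
    """Toy stand-in (hypothetical) for a stream that encodes text on write."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self.buffer = bytearray()

    def next_put_all(self, data):
        # Raw bytes are appended untouched: this is the bypass.
        if isinstance(data, (bytes, bytearray)):
            self.buffer.extend(data)
        else:
            # Text goes through the converter (here: a plain encode).
            self.buffer.extend(data.encode(self.encoding))


def is_valid_utf8(data):
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        bytes(data).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Same text, written once as a string and once as Latin-1 bytes.
converted = ConvertingStream()
converted.next_put_all("Stéphane")

bypassed = ConvertingStream()
bypassed.next_put_all("Stéphane".encode("latin-1"))

print(is_valid_utf8(converted.buffer))
print(is_valid_utf8(bypassed.buffer))
```

If the model is close to what happens in the image, a single sender handing over Latin-1 bytes would be enough to leave an invalid UTF-8 sequence in an otherwise correct change log.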

So we might attack the problem with another code snippet:
(SystemNavigation default browseAllCallsOn: (Smalltalk associationAt: #SourceFiles))...

Hmm, I might have a better clue now. The problem might possibly come from the condenseChanges in update10298.
What happens in a condenseChanges? Changes are copied to this file:
    f := FileStream fileNamed: 'ST80.temp'.
So far, so good, because the concreteStream is a MultiByteFileStream. But the end finishes with:
    SourceFiles at: 2 put: (StandardFileStream oldFileNamed: oldChanges name)
Waouh, no MultiByteFileStream here, so no more UTF-8.

But hey, that would be the inverse problem: reading UTF-8 text with a latin1 reader. I can't get an error doing this, only some strange sequence of characters (the UTF-8 encoding)...
Unless the incriminated methods are further changed in #script376 or any other method... In which case they are written in latin1 in the change log... Hmm... That could possibly be the case.
We must restart the update process from http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip

One thing is sure: at the next returnFromSnapshot, FileDirectory class>>startup will reopen the changes as UTF-8. So saving the image will reopen them as UTF-8...

But wait... Maybe we have enough pieces of the puzzle:
Analyzing Pharo0.1Core-10304cl.changes tells us that Stephane applied several updates before snapshotting the image. So if Kernel and System-Support are changed between 10298 and 10304, then we get the explanation:
- condense changes puts everything in the .changes in UTF-8 but reopens the changes in latin1
- further updates up to 10304 write changes in latin1
- the image snapshot reopens the changes in UTF-8, and thus we get invalid UTF-8 later on...

That's easy to reproduce. Stef, can you confirm?
That also explains why I did not get the problem at home: I update early and always save my image afterwards.
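
The three steps above can be replayed at the byte level. This is only a sketch in Python, assuming the change log behaves like a plain byte file; append_change and read_as_utf8 are hypothetical helpers and the sample strings are invented, not taken from the real .changes file.

```python
def append_change(log, text, encoding):
    """Append text to the log bytes using whatever encoding the writer has."""
    return log + text.encode(encoding)


def read_as_utf8(data):
    """Decode as UTF-8, reporting a failure instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as error:
        return "invalid utf8 input detected: " + error.reason


# Step 1: condenseChanges writes everything through a UTF-8 converter.
condensed = append_change(b"", "condensed: déjà\n", "utf-8")

# Step 2: the file is reopened without a converter, so later updates
# end up in the log as latin1 bytes.
changes = append_change(condensed, "later update: déjà\n", "latin-1")

# Step 3: after the snapshot the log is read as UTF-8 again.
print(read_as_utf8(condensed))
print(read_as_utf8(changes))

# The inverse problem: UTF-8 bytes read by a latin1 reader give no error,
# only a strange sequence of characters.
inverse = "déjà".encode("utf-8").decode("latin-1")
print(repr(inverse))
```

The part written in step 1 still decodes fine on its own; only the entries appended in step 2 break the read, which would match the observation that just a few methods raise the error.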

After that we still have to detect and clean the places where Monticello sources are interpreted as UTF-8 when they should not be (FIRST TRACK), and eventually make source code go UTF-8 in Monticello, so that non latin programmers can use their favourite language...

Nicolas

2009/5/23 Stéphane Ducasse <[email protected]>:
> No problem I never interpreted it like that.
> Me too I want a system that is working
>
> Adrian I will publish a fix for DNU now
> and I will try later to check the fixes proposed by yoshiki
>
> stef
>
> On May 23, 2009, at 1:29 PM, Tudor Girba wrote:
>
>> Actually, the fix is even simpler: if you find a method that raises
>> "invalid utf8 input detected", just browse to it with a class browser,
>> and re-accept it :).
>>
>> With my previous mail, I was not implying that someone should fix it
>> for me, I was merely asking for what could a quick solution be,
>> because I was a bit lost (scared) :). Now, I am happy. Thanks for
>> discussing it.
>>
>> Cheers,
>> Doru
>>
>> On 23 May 2009, at 13:07, Tudor Girba wrote:
>>
>>> Hi,
>>>
>>> I attached here a DNU implementation I took from an older image.
>>> After filing this one in, I can debug DNU problems.
>>>
>>> Cheers,
>>> Doru
>>>
>>> <Object-doesNotUnderstand.st>
>>>
>>> On 23 May 2009, at 13:04, Stéphane Ducasse wrote:
>>>
>>>> I did the following
>>>>
>>>> (Object>>#doesNotUnderstand:) getSourceFromFile and I get an
>>>> invalid....
>>>>
>>>> Now when I take another method
>>>>
>>>> (BalloonFontTest>>#testDefaultFont) I do not get problem.
>>>>
>>>> I will reread carefully the mails of nicolas to try to understand,
>>>> I do not know if the fixes of yoh
>>>>
>>>> http://bugs.squeak.org/view.php?id=5996
>>>> is related.
>>>>
>>>> Nicolas
>>>>
>>>>>> {Object>>#doesNotUnderstand:.
>>>>>> SystemNavigation>>#browseMethodsWhoseNamesContain:.
>>>>>> Utilities class>>#changeStampPerSe.
>>>>>> Utilities class>>#methodsWithInitials:} collect: [:e |
>>>>>>     (e getSourceFromFile select: [:s | s charCode > 127]) asArray
>>>>>>         collect: [:c | c charCode]]
>>>>
>>>> I cannot get that code running, it breaks before for me.
>>>>
>>>> Stef
>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [email protected]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>> --
>>> www.tudorgirba.com
>>>
>>> "Not knowing how to do something is not an argument for how it
>>> cannot be done."
>>
>> --
>> www.tudorgirba.com
>>
>> "Problem solving efficiency grows with the abstractness level of
>> problem understanding."
