I confirm the scenario:
1) update10298 runs a condenseChanges that leaves
   (SourceFiles at: 2) class = StandardFileStream.
   This is the seed of further problems, because further changes will be
   encoded in latin1 (or MacRoman, I don't really want to know).
2) update10302 changes the methods with non-ASCII characters.
3) Stef saves the image after update10304, which does reopen
   (SourceFiles at: 2) in UTF-8, but that's too late, the worm is in the apple.
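
To make the three steps above concrete, here is a minimal sketch in Python (not Pharo code). It only assumes that the changes log is a byte file that first receives UTF-8 bytes and later receives Latin-1 bytes; the function names and the two method sources are hypothetical, made up for illustration.

```python
# Illustration only: a plain byte buffer stands in for the .changes file.
# Nothing here is Pharo API; names and method sources are hypothetical.

def simulate_changes_log() -> bytes:
    """Build a log whose first entry is UTF-8 and whose second is Latin-1."""
    log = bytearray()
    # Before the condenseChanges: the stream converts text to UTF-8.
    log += "method1: 'déjà'\n".encode("utf-8")
    # After the condenseChanges: no UTF-8 conversion, Latin-1 bytes go in.
    log += "method2: 'déjà'\n".encode("latin-1")
    return bytes(log)


def read_as_utf8(data: bytes) -> str:
    """Decode the whole log as UTF-8, as done once the file is reopened."""
    return data.decode("utf-8")


log = simulate_changes_log()
try:
    read_as_utf8(log)
    utf8_read_failed = False
except UnicodeDecodeError:
    # The Latin-1 byte for 'é' is not a valid UTF-8 sequence here.
    utf8_read_failed = True

# The inverse mistake (UTF-8 bytes read with a Latin-1 reader) raises no
# error; it only yields a longer, strange-looking string.
mojibake = "déjà".encode("utf-8").decode("latin-1")

print("UTF-8 read failed:", utf8_read_failed)
print("Latin-1 read of UTF-8 bytes has", len(mojibake), "characters")
```

The point of the sketch is only the asymmetry: Latin-1 bytes inside a file later read as UTF-8 produce a hard decoding error, while UTF-8 bytes read as Latin-1 go through silently.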

If you save the image just after the condenseChanges, there is no problem,
because (SourceFiles at: 2) is opened in Latin1 AFTER all the changes have
gotten into it, and reopened in UTF-8 before any further changes get into it.

We must track down undue usages of StandardFileStream such as the one in
#condenseChanges.

2009/5/23 Nicolas Cellier <[email protected]>:
> What happened exactly is very hard to trace because these FileStreams
> are a can of worms...
> Here are some of my peregrinations:
>
> FIRST POSSIBLE TRACK:
>
> All methods were changed in 10305.
> Monticello snapshot/source.st is not UTF-8.
> If the file is opened UTF-8, then we get decompiledCode, I don't know why
> yet...
> But the changes still go into the change log in correct UTF-8 form, so
> that's just another bug, but not the real source of the problem.
> For getting some worms out of the can just browse inst var defs of
> converter in MultiByteFileStream:
> The accessor #converter initializes converter with TextConverter
> defaultSystemConverter, which depends on LanguageEnvironment.
> That is a Latin1TextConverter in my latin image.
> Unless #reset is called first, in which case it will initialize with a
> UTF8TextConverter.
> Yes, but open: fileName forWrite: writeMode does the job too, with a
> UTF8TextConverter.
> You still follow? Me neither.
> A better behaved one is #setConverterForCode, which should let non UTF-8
> .mcz work in a UTF-8 environment, but I am not sure it is called where
> required...
> I think Yoshiki's changes are necessary only for writing source code
> with character code > 255.
> This was not the case for the incriminated methods.
>
> SECOND POSSIBLE TRACK:
>
> Everything going to the change log passes through the MultiByteFileStream,
> so how did non UTF-8 characters get in?
> I tried to follow two other clues:
> 1) There are senders of #primWrite:from:startingAt:count: not
> redefined in MultiByteFileStream...
> for example, using #next:putAll:startingAt: will bypass the converter.
> 2) using nextPutAll: with a ByteArray argument also bypasses the
> converter (see MultiByteFileStream>>#nextPutAll:)
> I did not find the senders (you really believe senders of nextPutAll:
> can be analyzed?).
> I tried to instrument code with Notification, but I'm unable to
> reproduce the problem, so that was in vain...
>
> THIRD POSSIBLE TRACK:
>
> http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
> has the invalid UTF-8 problem, just before the 10305 changes that
> introduced decompiled code...
> So we might attack the problem with another code snippet:
>
>     (SystemNavigation default browseAllCallsOn: (Smalltalk associationAt:
>     #SourceFiles))...
>
> Hmm, I might have a better clue now.
> The problem might possibly come from the condenseChanges in update10298.
> What happens in a condenseChanges?
> Changes are copied to this file:
>
>     f := FileStream fileNamed: 'ST80.temp'.
>
> So far, so good, because the concreteStream is a MultiByteFileStream.
>
> But the end finishes with:
>
>     SourceFiles
>         at: 2
>         put: (StandardFileStream oldFileNamed: oldChanges name)
>
> Waouh, no MultiByteFileStream here, so no more UTF-8.
> But hey, that would be the inverse problem: reading UTF-8 text with a
> latin1 reader: I can't get an error doing this, only some strange
> sequence of characters... (The UTF-8 encoding)...
> Unless the incriminated methods are further changed in #script376 or any
> other method... In which case they are written in latin1 in the
> changeLog...
> Hmm... That could be the case eventually. We must restart the update
> process from
> http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip
>
> One thing is sure, at next returnFromSnapshot, FileDirectory
> class>>startup will reopen changes UTF-8.
> So saving the image will reopen UTF-8...
>
> But wait... Maybe we have enough pieces of the puzzle:
> Analyzing the Pharo0.1Core-10304cl.changes tells that Stephane applied
> several updates before snapshotting the image.
> So if Kernel and
> System-Support are changed between 10298 and 10304, then we get the
> explanation:
> - condense changes puts everything in the .changes in UTF-8 but reopens
>   the changes in latin1
> - further updates up to 10304 write changes in latin1
> - the image snapshot reopens changes in UTF-8 and thus we get further
>   invalid UTF-8...
>
> That's easy to reproduce. Stef, can you confirm?
>
> That also explains why I did not get the problem at home: I update
> early and always save my image after.
> After that we still have to detect and clean up the cases where Monticello
> sources are interpreted as UTF-8 when they should not be (FIRST TRACK), and
> eventually make source code go UTF-8 in Monticello, so that non latin
> programmers can use their favourite language eventually...
>
> Nicolas
>
> 2009/5/23 Stéphane Ducasse <[email protected]>:
>> No problem I never interpreted it like that.
>> Me too I want a system that is working
>>
>> Adrian I will publish a fix for DNU now
>> and I will try later to check the fixes proposed by yoshiki
>>
>> stef
>>
>> On May 23, 2009, at 1:29 PM, Tudor Girba wrote:
>>
>>> Actually, the fix is even simpler: if you find a method that raises
>>> "invalid utf8 input detected", just browse to it with a class browser,
>>> and re-accept it :).
>>>
>>> With my previous mail, I was not implying that someone should fix it
>>> for me, I was merely asking what a quick solution could be,
>>> because I was a bit lost (scared) :). Now, I am happy. Thanks for
>>> discussing it.
>>>
>>> Cheers,
>>> Doru
>>>
>>> On 23 May 2009, at 13:07, Tudor Girba wrote:
>>>
>>>> Hi,
>>>>
>>>> I attached here a DNU implementation I took from an older image.
>>>> After filing this one in, I can debug DNU problems.
>>>>
>>>> Cheers,
>>>> Doru
>>>>
>>>> <Object-doesNotUnderstand.st>
>>>>
>>>>
>>>>
>>>> On 23 May 2009, at 13:04, Stéphane Ducasse wrote:
>>>>
>>>>> I did the following
>>>>>
>>>>> (Object>>#doesNotUnderstand:) getSourceFromFile and I get an
>>>>> invalid....
>>>>>
>>>>> Now when I take another method
>>>>>
>>>>> (BalloonFontTest>>#testDefaultFont) I do not get a problem.
>>>>>
>>>>> I will carefully reread the mails of nicolas to try to understand,
>>>>> I do not know if the fixes of yoh
>>>>>
>>>>> http://bugs.squeak.org/view.php?id=5996
>>>>> are related.
>>>>>
>>>>> Nicolas
>>>>>
>>>>>>> {Object>>#doesNotUnderstand:.
>>>>>>> SystemNavigation>>#browseMethodsWhoseNamesContain:.
>>>>>>> Utilities class>>#changeStampPerSe.
>>>>>>> Utilities class>>#methodsWithInitials:} collect: [:e | (e
>>>>>>> getSourceFromFile select: [:s | s charCode > 127]) asArray
>>>>>>> collect:
>>>>>>> [:c | c charCode]]
>>>>>
>>>>> I cannot get that code running; it breaks before that for me.
>>>>>
>>>>> Stef
>>>>>
>>>>> _______________________________________________
>>>>> Pharo-project mailing list
>>>>> [email protected]
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>
>>>> --
>>>> www.tudorgirba.com
>>>>
>>>> "Not knowing how to do something is not an argument for how it
>>>> cannot be done."
>>>
>>> --
>>> www.tudorgirba.com
>>>
>>> "Problem solving efficiency grows with the abstractness level of
>>> problem understanding."
