Excellent! Thanks guys. I'm preparing lectures for Torino and I will experiment with the umejava mcz fixes.
Stef

On May 23, 2009, at 8:49 PM, Adrian Lienhard wrote:

> Wow, great analysis, Nicolas!
>
> I have been trying to find the cause for several hours now. Your third
> track exactly matches my findings.
>
> For example in Object>>#doesNotUnderstand: prior to the condensing,
> the source contained a non-ASCII character (UTF8 encoded as the two
> bytes: 194 160). This gets correctly transferred during the condensing
> into the new changes file. When you don't save the image (and hence
> have the standard stream without UTF8 encoder) what you see in the
> source is the character Â (this is 194). That is, we suddenly have two
> characters, 194 and 160, where before there was just one. If you load a
> package, MC will compare methods and think this is a change. When
> loading the method from the MC file, the source is UTF8 encoded,
> producing a unicode character 160. When storing this source to the
> file (still without the encoder), it will just directly put 160 there.
> At this point we have lost the leading byte 194. Next time we start
> or save the image and have the right encoder again, it will choke
> because 160 is an invalid first byte in UTF8.
>
> I think it's safe to fix the invalid methods by overriding their
> source. So we don't have to backtrack to version 10297.
>
> Thanks,
> Adrian
>
>
> On May 23, 2009, at 19:57 , Nicolas Cellier wrote:
>
>> I confirm the scenario:
>> 1) update10298 condenseChanges, which leaves (SourceFiles at: 2) class =
>> StandardFileStream
>> This is the seed of further problems, because further changes will
>> be encoded in latin1 (or MacRoman, I don't really want to know)
>> 2) update10302 changes the methods with non-ASCII characters
>> 3) Stef saves the image after update10304, which does reopen
>> (SourceFiles at: 2) in UTF-8, but that's too late, the worm is in the
>> apple.
>>
>> If you save the image just after the condenseChanges, no problem
>> because (SourceFiles at: 2) is opened in Latin1 AFTER all the changes
>> have gotten into it, and reopened UTF-8 before any changes got into
>> it.
>> We must track undue usage of StandardFileStream such as
>> #condenseChanges.
>>
>> 2009/5/23 Nicolas Cellier <[email protected]>:
>>> What happened exactly is very hard to trace because these FileStreams
>>> are a can of worms...
>>> Here are some of my peregrinations:
>>>
>>> FIRST POSSIBLE TRACK:
>>>
>>> All methods were changed in 10305.
>>> Monticello snapshot/source.st is not UTF-8.
>>> If the file is opened UTF-8, then we get decompiledCode, I don't
>>> know why yet...
>>> But the changes still go into the change log in correct UTF-8 form, so
>>> that's just another bug, but not the real source of the problem.
>>> For getting some worms out of the can just browse inst var defs of
>>> converter in MultiByteFileStream:
>>> The accessor #converter initializes the converter with TextConverter
>>> defaultSystemConverter, which depends on LanguageEnvironment.
>>> That is a Latin1TextConverter in my latin image.
>>> Unless #reset is called first, in which case it will initialize with a
>>> UTF8TextConverter.
>>> Yes, but open: fileName forWrite: writeMode does the job too with a
>>> UTF8TextConverter.
>>> Do you still follow? Me neither.
>>> A better-behaved one is #setConverterForCode, which should let non-UTF-8
>>> .mcz work in a UTF-8 environment, but I'm not sure it is called where
>>> required...
>>> I think Yoshiki's changes are necessary only for writing source code
>>> with character code > 255.
>>> This was not the case for the incriminated methods.
>>>
>>> SECOND POSSIBLE TRACK:
>>>
>>> Everything going to the change log passes through the
>>> MultiByteFileStream, so how did non-UTF-8 characters get in?
>>> I tried to follow two other clues:
>>> 1) There are senders of #primWrite:from:startingAt:count: not
>>> redefined in MultiByteFileStream...
>>> for example, using #next:putAll:startingAt: will bypass the
>>> converter.
>>> 2) using nextPutAll: with a ByteArray argument also bypasses the
>>> converter (See MultiByteFileStream>>#nextPutAll:)
>>> I did not find the senders (you really believe senders of
>>> nextPutAll: can be analyzed?).
>>> I tried to instrument code with Notification, but I'm unable to
>>> reproduce the problem, so that was in vain...
>>>
>>> THIRD POSSIBLE TRACK:
>>>
>>> http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
>>> has the invalid UTF-8 problem, just before the 10305 changes that
>>> introduced decompiled code...
>>> So we might attack the problem with another code snippet:
>>>
>>> (SystemNavigation default browseAllCallsOn: (Smalltalk
>>> associationAt: #SourceFiles))...
>>>
>>> Hmm, I might have a better clue now.
>>> The problem might possibly come from the condenseChanges in
>>> update10298.
>>> What happens in a condenseChanges?
>>> Changes are copied to this file:
>>>
>>> f := FileStream fileNamed: 'ST80.temp'.
>>>
>>> So far, so good, because the concreteStream is a
>>> MultiByteFileStream.
>>>
>>> But the end finishes with:
>>>
>>> SourceFiles
>>>     at: 2
>>>     put: (StandardFileStream oldFileNamed: oldChanges name)
>>>
>>> Waouh, no MultiByteFileStream here, so no more UTF-8.
>>> But hey, that would be the inverse problem: reading UTF-8 text with a
>>> latin1 reader: I can't get an error doing this, only some strange
>>> sequence of characters... (The UTF-8 encoding)...
>>> Unless the incriminated methods are further changed in #script376 or any
>>> other method... In which case they are written in latin1 in the
>>> changeLog...
>>> Hmm... That could be the case eventually. We must restart the update
>>> process from
>>> http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip
>>>
>>> One thing is sure, at next returnFromSnapshot, FileDirectory
>>> class>>startup will reopen changes UTF-8.
>>> So saving the image will reopen UTF-8...
>>>
>>> But wait... Maybe we have enough pieces of the puzzle:
>>> Analyzing the Pharo0.1Core-10304cl.changes tells that Stephane
>>> applied several updates before snapshotting the image. So if Kernel and
>>> System-Support are changed between 10298 and 10304, then we get the
>>> explanation:
>>> - condense changes puts everything in the .changes in UTF-8 but reopens
>>> the changes in latin1
>>> - further updates up to 10304 write changes in latin1
>>> - image snapshot reopens changes in UTF-8 and thus we get further
>>> invalid UTF-8...
>>>
>>> That's easy to reproduce. Stef, can you confirm?
>>>
>>> That also explains why I did not get the problem at home: I update
>>> early and always save my image after.
>>> After that we still have to detect and clean up the places where
>>> Monticello sources are interpreted as UTF-8 when they should not be
>>> (FIRST TRACK), and eventually make source code go UTF-8 in Monticello,
>>> so that non-latin programmers can use their favourite language
>>> eventually...
>>>
>>> Nicolas
>>>
>>> 2009/5/23 Stéphane Ducasse <[email protected]>:
>>>> No problem, I never interpreted it like that.
>>>> Me too, I want a system that is working
>>>>
>>>> Adrian, I will publish a fix for DNU now
>>>> and I will try later to check the fixes proposed by Yoshiki
>>>>
>>>> stef
>>>>
>>>> On May 23, 2009, at 1:29 PM, Tudor Girba wrote:
>>>>
>>>>> Actually, the fix is even simpler: if you find a method that raises
>>>>> "invalid utf8 input detected", just browse to it with a class
>>>>> browser, and re-accept it :).
>>>>>
>>>>> With my previous mail, I was not implying that someone should fix it
>>>>> for me, I was merely asking what a quick solution could be,
>>>>> because I was a bit lost (scared) :). Now, I am happy. Thanks for
>>>>> discussing it.
>>>>>
>>>>> Cheers,
>>>>> Doru
>>>>>
>>>>> On 23 May 2009, at 13:07, Tudor Girba wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I attached here a DNU implementation I took from an older image.
>>>>>> After filing this one in, I can debug DNU problems.
>>>>>>
>>>>>> Cheers,
>>>>>> Doru
>>>>>>
>>>>>> <Object-doesNotUnderstand.st>
>>>>>>
>>>>>> On 23 May 2009, at 13:04, Stéphane Ducasse wrote:
>>>>>>
>>>>>>> I did the following
>>>>>>>
>>>>>>> (Object>>#doesNotUnderstand:) getSourceFromFile and I get an
>>>>>>> invalid....
>>>>>>>
>>>>>>> Now when I take another method
>>>>>>>
>>>>>>> (BalloonFontTest>>#testDefaultFont) I do not get the problem.
>>>>>>>
>>>>>>> I will carefully reread the mails of Nicolas to try to understand.
>>>>>>> I do not know if the fixes of yoh
>>>>>>>
>>>>>>> http://bugs.squeak.org/view.php?id=5996
>>>>>>> are related.
>>>>>>>
>>>>>>> Nicolas
>>>>>>>
>>>>>>>>> {Object>>#doesNotUnderstand:.
>>>>>>>>> SystemNavigation>>#browseMethodsWhoseNamesContain:.
>>>>>>>>> Utilities class>>#changeStampPerSe.
>>>>>>>>> Utilities class>>#methodsWithInitials:} collect: [:e | (e
>>>>>>>>> getSourceFromFile select: [:s | s charCode > 127]) asArray
>>>>>>>>> collect: [:c | c charCode]]
>>>>>>>
>>>>>>> I cannot get that code running; it breaks before that for me.
>>>>>>>
>>>>>>> Stef
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pharo-project mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>
>>>>>> --
>>>>>> www.tudorgirba.com
>>>>>>
>>>>>> "Not knowing how to do something is not an argument for how it
>>>>>> cannot be done."
>>>>>
>>>>> --
>>>>> www.tudorgirba.com
>>>>>
>>>>> "Problem solving efficiency grows with the abstractness level of
>>>>> problem understanding."
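
The sequence Adrian describes can be replayed at the byte level. What follows is only a minimal sketch in Python, not Pharo code: it assumes that a stream without a converter behaves like Latin-1 (one byte per character, which Nicolas notes might equally be MacRoman), and the function names are hypothetical stand-ins for the MultiByteFileStream and StandardFileStream behaviour discussed above. In standard UTF-8, character 160 is written as the byte pair 194 160.

```python
# Byte-level replay of the corruption scenario discussed in the thread.
# Assumptions (not from the original mails): a stream without a converter
# is modelled as Latin-1, and all function names here are hypothetical.

NBSP = "\u00a0"  # the non-ASCII character with code 160


def write_with_utf8_converter(source: str) -> bytes:
    """Model of writing through a UTF-8 converter."""
    return source.encode("utf-8")


def write_without_converter(source: str) -> bytes:
    """Model of writing raw characters, one byte each (codes < 256 only)."""
    return source.encode("latin-1")


def read_without_converter(data: bytes) -> str:
    """Model of reading raw bytes, one character per byte."""
    return data.decode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


method_source = "foo" + NBSP + "bar"

# 1) Condensing copies the source through the UTF-8 converter.
condensed = write_with_utf8_converter(method_source)
condensed_codes = list(condensed[3:5])

# 2) The changes file is then read without a converter:
#    one character has become two.
seen = read_without_converter(condensed)
seen_codes = [ord(c) for c in seen[3:5]]

# 3) The method is reloaded with the single character 160 and stored
#    without a converter: only the byte 160 reaches the file.
rewritten = write_without_converter(method_source)
rewritten_codes = list(rewritten[3:4])

# 4) Once the UTF-8 converter is back, the lone byte 160 is rejected.
rewritten_is_valid = is_valid_utf8(rewritten)

# Re-accepting the method under the right converter restores valid bytes.
repaired_is_valid = is_valid_utf8(write_with_utf8_converter(method_source))

print(condensed_codes, seen_codes, rewritten_codes,
      rewritten_is_valid, repaired_is_valid)
```

In this model, Doru's fix of re-accepting the method corresponds to the last step: the source is written again through the UTF-8 converter, so the file contains the two-byte sequence once more.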
