I am not shy, the mail of Guille is perfect, I cannot do it! Thanks a lot for all the explanations!
On Thu, Apr 11, 2019 at 3:25 PM Guillermo Polito <guillermopol...@gmail.com> wrote: > So, this morning we have been working on validating this hypothesis and > making a meaningful and reproducible test for it. > I'm just the messenger here because Pablo who has done most of the job is > shy. > > ## The main story > > Since we are working with concurrency and non-determinism, we have > designed a test that reproduces the case with an accuracy of ~94%. > Basically the test renders a Rubric morph in concurrent green threads on a > big string, and then we compare that the drawn morphs should be bit > identical. > We have made some measurements and observed that putting three concurrent > processes made the chance of hitting the bug to almost 94%. > So running it more than ~5 times makes it highly reproducible. > > Good thing: this test inside a loop of 10 iterations (to increase the > probability of hitting the bug to >~99%) detects the bug in the stock image > and not in the one with our patch. > Bad thing: this test relies on rubric, we have tried to simplify it (like > just doing a collect on the string to fetch the glyph for each character), > but simpler code produces a tighter loop which makes it far less > reproducible (down from 94 to 33%). > > Still, there are several questions open: > - can we simplify the test? :) > - we have to think about a smart way to serialize Freetype access in the > image (there are several users) > - I'll paste below our math, so if somebody would like to review it would > be good too, our probability is a bit rusty. > - we have been doing benchmarks with Pablo this morning and adding a > mutex does not seem to have any negative impact in rendering. > > Help is super welcome of course!! > The patch that Pablo has done in the PR will just work for the most common > cases (at least for the calypso tabs), but it shows that FT* needs some > love. > Still we would like to be able to backport something like this in Pharo7, > but backporting a big Freetype refactor would not be so good. > Maybe the mutex is a good enough change for Pharo7? > > Any more ideas? input? > > Cheers, > Guille and Pablo > > ## The code > > The code of the test is the following: > > ``` > | sem text canvases blocky | > sem := Semaphore new. > text := (String loremIpsum: 25*1024). > FreeTypeCache current removeAll. > canvases := OrderedCollection new. > > blocky := [ | canvas | > canvas := FormCanvas extent: 1000@1000. > canvases add: canvas. > (RubScrolledTextMorph new) > setText: text; > font: StandardFonts codeFont; > bounds: (0@0 corner: canvas form extent); > fullDrawOn: canvas. > sem signal ]. > blocky forkAt: 39. > blocky forkAt: 39. > blocky forkAt: 39. > sem > wait; > wait; > wait. > > self assert: (((canvases at:1) form bits = (canvases at:2) form bits) and: > [ ((canvases at:2) form bits = (canvases at:3) form bits) ]) > ``` > > ## The math > p := 0.94 asScaledDecimal. > > hittingTheBug := 0. > notHittingTheBug := 1 - hittingTheBug. > 5 timesRepeat: [ > hittingTheBug := hittingTheBug + (notHittingTheBug * p). > notHittingTheBug := 1 - hittingTheBug ]. > hittingTheBug. "after 5 iterations" > "0.99999969s8" > > On Wed, Apr 10, 2019 at 8:04 PM Sven Van Caekenberghe <s...@stfx.eu> > wrote: > >> Thanks a lot for actually diving into this, this mysterious issue has >> been bugging us for a very long time. I do hope you can finally solve this. >> I am sure many people will be grateful. >> >> Sven >> >> > On 10 Apr 2019, at 15:38, teso...@gmail.com wrote: >> > >> > Hello, >> > >> > After checking the problem with Guille, we have the hypothesis of the >> > source of the problem. >> > We have seen that accessing Free Type is not thread-safe. >> > Basically, the FTFace holds a structure that is filled up with the >> > glyph and its information. As this structure is part of the Font Face >> > (a font face is a font plus size and attributes), only one request to >> > a glyph can be executed at the time. As we are sharing the same font >> > in many places, you will be starting to see the problem. >> > >> > Also, we have seen that there many accesses to the glyphs outside the >> > UI process. >> > This problem started to appear as we starting to use Calypso (but it >> > is not limited to it), as Calypso uses lazy tabs. The creation of >> > these tabs is done outside the UI process. >> > >> > This is only a hack to test our hypothesis. >> > To fix it correctly we will have to rewrite the drawing of the glyphs >> > and synchronize the access to the glyph data information. Also, we >> > want to do it in a way that does not penalize the processing in the UI >> > process. >> > >> > I have only patched the code that is used by the rendering of morphic, >> > other renderings like Athens or any other user using FT should be >> > correctly rewritten. >> > >> > Once we are sure that synchronizing the access to FT fixes the >> > problem, we will do a real fix not a dirty hack like this. >> > >> > I will be testing this new Image to see if there is an improvement. >> > >> > I will really love you try to use this image and tell me if you still >> > find the problem. >> > >> > There is a PR generating a Pharo8 image (it is called wrong, but... it >> > is a Pharo8) >> > >> > 32 bits >> > ===== >> > >> > >> https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip >> > >> > 64 bits >> > ===== >> > >> > >> https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip >> > >> > If somebody is willing to use it in Pharo 7, I can create a PR against >> > Pharo7 to generate patched P7 images. >> > >> > Thanks! >> > >> > -- >> > Pablo Tesone. >> > teso...@gmail.com >> > >> >> >> > > -- > > > > Guille Polito > > Research Engineer > > Centre de Recherche en Informatique, Signal et Automatique de Lille > > CRIStAL - UMR 9189 > > French National Center for Scientific Research - *http://www.cnrs.fr > <http://www.cnrs.fr>* > > > *Web:* *http://guillep.github.io* <http://guillep.github.io> > > *Phone: *+33 06 52 70 66 13 > -- Pablo Tesone. teso...@gmail.com