I am not shy, the mail of Guille is perfect, I cannot do it!

Thanks a lot for all the explanations!

On Thu, Apr 11, 2019 at 3:25 PM Guillermo Polito <guillermopol...@gmail.com>
wrote:

> So, this morning we have been working on validating this hypothesis and
> making a meaningful and reproducible test for it.
> I'm just the messenger here because Pablo who has done most of the job is
> shy.
>
> ## The main story
>
> Since we are working with concurrency and non-determinism, we have
> designed a test that reproduces the case with an accuracy of ~94%.
> Basically the test renders a Rubric morph in concurrent green threads on a
> big string, and then we compare that the drawn morphs should be bit
> identical.
> We have made some measurements and observed that putting three concurrent
> processes made the chance of hitting the bug to almost 94%.
> So running it more than ~5 times makes it highly reproducible.
>
> Good thing: this test inside a loop of 10 iterations (to increase the
> probability of hitting the bug to >~99%) detects the bug in the stock image
> and not in the one with our patch.
> Bad thing: this test relies on rubric, we have tried to simplify it (like
> just doing a collect on the string to fetch the glyph for each character),
> but simpler code produces a tighter loop which makes it far less
> reproducible (down from 94 to 33%).
>
> Still, there are several questions open:
>  - can we simplify the test? :)
>  - we have to think about a smart way to serialize Freetype access in the
> image (there are several users)
>  - I'll paste below our math, so if somebody would like to review it would
> be good too, our probability is a bit rusty.
>  - we have been doing benchmarks with Pablo this morning and adding a
> mutex does not seem to have any negative impact in rendering.
>
> Help is super welcome of course!!
> The patch that Pablo has done in the PR will just work for the most common
> cases (at least for the calypso tabs), but it shows that FT* needs some
> love.
> Still we would like to be able to backport something like this in Pharo7,
> but backporting a big Freetype refactor would not be so good.
> Maybe the mutex is a good enough change for Pharo7?
>
> Any more ideas? input?
>
> Cheers,
> Guille and Pablo
>
> ## The code
>
> The code of the test is the following:
>
> ```
>  | sem text canvases blocky |
> sem := Semaphore new.
> text := (String loremIpsum: 25*1024).
> FreeTypeCache current removeAll.
> canvases := OrderedCollection new.
>
> blocky := [  | canvas |
>     canvas := FormCanvas extent: 1000@1000.
>     canvases add: canvas.
>     (RubScrolledTextMorph new)
>         setText: text;
>         font: StandardFonts codeFont;
>         bounds: (0@0 corner: canvas form extent);
>         fullDrawOn: canvas.
>         sem signal ].
> blocky forkAt: 39.
> blocky forkAt: 39.
> blocky forkAt: 39.
> sem
>     wait;
>     wait;
>     wait.
>
> self assert: (((canvases at:1) form bits = (canvases at:2) form bits) and:
> [ ((canvases at:2) form bits = (canvases at:3) form bits) ])
> ```
>
> ## The math
> p := 0.94 asScaledDecimal.
>
> hittingTheBug := 0.
> notHittingTheBug := 1 - hittingTheBug.
> 5 timesRepeat: [
> hittingTheBug := hittingTheBug + (notHittingTheBug * p).
> notHittingTheBug := 1 - hittingTheBug ].
> hittingTheBug. "after 5 iterations"
> "0.99999969s8"
>
> On Wed, Apr 10, 2019 at 8:04 PM Sven Van Caekenberghe <s...@stfx.eu>
> wrote:
>
>> Thanks a lot for actually diving into this, this mysterious issue has
>> been bugging us for a very long time. I do hope you can finally solve this.
>> I am sure many people will be grateful.
>>
>> Sven
>>
>> > On 10 Apr 2019, at 15:38, teso...@gmail.com wrote:
>> >
>> > Hello,
>> >
>> > After checking the problem with Guille, we have the hypothesis of the
>> > source of the problem.
>> > We have seen that accessing Free Type is not thread-safe.
>> > Basically, the FTFace holds a structure that is filled up with the
>> > glyph and its information. As this structure is part of the Font Face
>> > (a font face is a font plus size and attributes), only one request to
>> > a glyph can be executed at the time. As we are sharing the same font
>> > in many places, you will be starting to see the problem.
>> >
>> > Also, we have seen that there many accesses to the glyphs outside the
>> > UI process.
>> > This problem started to appear as we starting to use Calypso (but it
>> > is not limited to it), as Calypso uses lazy tabs. The creation of
>> > these tabs is done outside the UI process.
>> >
>> > This is only a hack to test our hypothesis.
>> > To fix it correctly we will have to rewrite the drawing of the glyphs
>> > and synchronize the access to the glyph data information. Also, we
>> > want to do it in a way that does not penalize the processing in the UI
>> > process.
>> >
>> > I have only patched the code that is used by the rendering of morphic,
>> > other renderings like Athens or any other user using FT should be
>> > correctly rewritten.
>> >
>> > Once we are sure that synchronizing the access to FT fixes the
>> > problem, we will do a real fix not a dirty hack like this.
>> >
>> > I will be testing this new Image to see if there is an improvement.
>> >
>> > I will really love you try to use this image and tell me if you still
>> > find the problem.
>> >
>> > There is a PR generating a Pharo8 image (it is called wrong, but... it
>> > is a Pharo8)
>> >
>> > 32 bits
>> > =====
>> >
>> >
>> https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-32bit-f8c6957.zip
>> >
>> > 64 bits
>> > =====
>> >
>> >
>> https://ci.inria.fr/pharo-ci-jenkins2/job/Test%20pending%20pull%20request%20and%20branch%20Pipeline/job/PR-3157/2/artifact/bootstrap-cache/Pharo7.0-PR-64bit-f8c6957.zip
>> >
>> > If somebody is willing to use it in Pharo 7, I can create a PR against
>> > Pharo7 to generate patched P7 images.
>> >
>> > Thanks!
>> >
>> > --
>> > Pablo Tesone.
>> > teso...@gmail.com
>> >
>>
>>
>>
>
> --
>
>
>
> Guille Polito
>
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
>
> CRIStAL - UMR 9189
>
> French National Center for Scientific Research - *http://www.cnrs.fr
> <http://www.cnrs.fr>*
>
>
> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>
> *Phone: *+33 06 52 70 66 13
>


-- 
Pablo Tesone.
teso...@gmail.com

Reply via email to