Re: Cmm Memory Model (Understanding #15449)

2018-12-06 Thread Carter Schonwald
Awesome summary by Ben!

There's also some crazy hijinks currently needed to do compare-and-swap
style stuff

see
https://hackage.haskell.org/package/atomic-primops-0.8.2/docs/Data-Atomics.html
for some discussion. (i'm not sure what the current state of the world is
on CAS tech)
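
For readers who have not run into those hijinks, here is a minimal sketch using only base. It is not the atomic-primops API: `casByValue` is a hypothetical helper that compares by `Eq` inside `atomicModifyIORef'`, whereas a real CAS compares pointers. As far as I understand it, that difference is why Data.Atomics hands out tickets, so that GHC cannot change the pointer between the read and the swap.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- Hypothetical stand-in for a CAS: swap in `new` only if the current
-- contents equal `expected` (value equality, not pointer equality).
casByValue :: Eq a => IORef a -> a -> a -> IO Bool
casByValue ref expected new =
  atomicModifyIORef' ref $ \cur ->
    if cur == expected then (new, True) else (cur, False)

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  ok1 <- casByValue ref 0 1   -- succeeds: contents were 0
  ok2 <- casByValue ref 0 2   -- fails: contents are now 1
  final <- readIORef ref
  print (ok1, ok2, final)     -- prints (True,False,1)
```

This sidesteps the pointer-identity problem entirely, at the cost of requiring an `Eq` instance and going through `atomicModifyIORef'` rather than a single hardware CAS.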

On Thu, Nov 29, 2018 at 9:45 AM Ben Gamari  wrote:

> Travis Whitaker  writes:
>
> > Hello GHC Devs,
> >
> > I'm trying to get my head around ticket #15449 (
> > https://ghc.haskell.org/trac/ghc/ticket/15449). The gist of things is that
> > GHC generates incorrect aarch64 code that causes memory corruption in
> > multithreaded programs run on out-of-order machines. User trommler
> > discovered that similar issues are present on PowerPC, and indeed ARMv7
> and
> > PowerPC support the same types of load/store reorderings. The LLVM code
> > emitted by GHC may be incorrect with respect to LLVM's memory model, but
> > this isn't a problem on architectures with minimal reordering like x86.
> >
> Thank you for picking this up!
>
> > I had initially thought that GHC simply wasn't emitting the appropriate
> > LLVM fences; there's an elephant-gun-approach here (
> > https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that
> > guards each atomic operation with a full barrier. I still believe that
> GHC
> > is omitting necessary LLVM fences, but this change is insufficient to fix
> > the behavior of the test case (which is simply GHC itself compiling a
> test
> > package with '-jN', N > 1).
> >
> > It seems there's a long and foggy history of the Cmm memory model. Edward
> > Yang discusses this a bit in his post here (
> >
> http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/
> )
> > and issues similar to #15449 have plagued GHC in the past, like #12469 (
> > https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has
> > MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and
> > full memory barriers. On ARM an instruction memory barrier might be
> > required as well, but I don't know enough about STG/Cmm to say for sure,
> > and it'd likely be LLVM's responsibility to emit that anyway.
> >
> In my opinion GHC's current memory model story is quite unsustainable. As
> you point out, we currently have a limited selection of plain barriers
> and a few atomic operations, none of which are adequately documented
> given the subtlety they involve.
>
> I think we would at this point be foolish not to take advantage of the
> research that has been done in this area by moving to C11-style
> acquire/release semantics wherever possible. While following through on
> this is not a small task, these semantics are both well defined and
> easier to reason about intuitively. We might also be able to benefit
> from the wealth of model checking tools that are now available for this
> formalism.
>
> Ulan Degenbaev (CC'd) and I discussed him possibly picking up the task
> of moving to C11 atomics a few weeks ago.
>
> > I'm hoping that someone with more tribal knowledge than I might be able
> to
> > give me some pointers with regards to the following areas:
> >
> >
> >- Does STG itself have anything like a memory model? My intuition says
> >'definitely not', but given that STG expressions may contain Cmm
> operations
> >(via StgCmmPrim), there may be STG-to-STG transformations that need
> to care
> >about the target machine's memory model.
>
> As far as I know, it has nothing formally, or even informally, defined.
> That being said, relatively few of our primops make any guarantees about
> their operation in a concurrent setting. The cases that I can think of
> include `casArray#`, `atomicModifyIORef#`, `atomic{Read,Write}*Array#`
> and the STM operations.
>
> >- With respect to Cmm, what reorderings does GHC perform? What are the
> >relevant parts of the compiler to begin studying?
>
> As far as I know, very few. We only perform a handful of optimizations
> on C--. These include,
>
>  * Sinking of assignments (see compiler/cmm/CmmSink.hs); see
>    CmmSink.conflicts for what commutation we allow. We notably don't
>    allow sinking of any memory assignments past calls (including
>    primops)
>
>  * Simple constant folding (see CmmOpt.cmmMachOpFoldM)
>
>  * Common block elimination (see CmmCommonBlockElim)
>
>  * Some simple control-flow optimisation (see CmmContFlowOpt)
>
> >- Are the LLVM atomics that GHC emits correct with respect to the LLVM
> >memory model? As it stands now LLVM fences are only emitted for
> >MO_WriteBarrier. Without fences accompanying the atomics, it seems
> the LLVM
> >compiler could float dependent loads/stores past atomic operations.
>
> Frankly, I would be surprised if they are correct. Few people have
> really looked at GHC's memory ordering properties and even fewer have
> looked at those of the LLVM backend.
>
> >- Why is MO_WriteBarrier the only provided memory barrier? My hunch is
> >that it's because this

Re: Windows test failures

2018-12-06 Thread Phyx
forgot to copy in ghc-devs.

On Fri, Dec 7, 2018 at 12:05 AM Phyx  wrote:

> Ah great,
>
> Normally an msys2 .profile will contain the following line
>
> # Set user-defined locale
> export LANG=$(locale -uU)
>
> which would attach ".UTF-8" to the user's current locale.
> I'm guessing when you run it from emacs the profile isn't loaded (probably
> because bash isn't being called with --login) or something in bashrc or
> profile is overwriting this.
>
> Setting this should bring the test failures down more.
>
> Which leaves only these tests
>
>    /c/Users/simonpj/AppData/Local/Temp/ghctest-i30xogs3/test spaces/plugins/plugin-recomp-pure.run  plugin-recomp-pure [bad exit code] (normal)
>    /c/Users/simonpj/AppData/Local/Temp/ghctest-i30xogs3/test spaces/plugins/plugin-recomp-impure.run  plugin-recomp-impure [bad exit code] (normal)
>    /c/Users/simonpj/AppData/Local/Temp/ghctest-i30xogs3/test spaces/plugins/plugin-recomp-flags.run  plugin-recomp-flags [bad exit code] (normal)
>
> as the remaining unexplained ones.
>
> Do these still fail for you?
>
> Cheers,
> Tamar
>
>
> On Thu, Dec 6, 2018 at 11:50 PM Simon Peyton Jones 
> wrote:
>
>> Aha!   Yes, in libraries/base/tests, I find that make TEST=T3307 fails;
>> but succeeds with
>>
>>
>>
>> export LANG=en_GB.UTF-8
>>
>>
>>
>> Progress!
>>
>>
>>
>> Simon
>>
>>
>>
>> *From:* Phyx 
>> *Sent:* 06 December 2018 23:43
>> *To:* Simon Peyton Jones 
>> *Cc:* ghc-devs@haskell.org Devs 
>> *Subject:* Re: Windows test failures
>>
>>
>>
>> Hi Simon,
>>
>>
>>
>> > Does that help at all?
>>
>> >
>>
>> > I can try your “export LANG=en_GB.UTF-8” … shall I do that? Can I test
>> its efficacy by running one test rather than 6,000 of them?  In which case,
>> which one?
>>
>>
>>
>> Yes, "en_GB" doesn't seem to be a Unicode locale. One test you can try to
>> check is T3307, so make TEST="T3307" -C testsuite/tests/
>>
>>
>>
>> I tried using your "en_GB" locale and the test failed for me too then.
>>
>>
>>
>> Kind regards,
>>
>> Tamar
>>
>>
>>
>> On Thu, Dec 6, 2018 at 2:17 PM Simon Peyton Jones 
>> wrote:
>>
>> Hi Tamar
>>
>>
>>
>> Thanks for working on this.
>>
>>
>>
>> if it's an msys2 shell, what does "locale" return?
>>
>>
>>
>> It’s a shell running inside emacs.  Here’s what locale returns
>>
>> /c/tmp$ locale
>> LANG=en_GB
>> LC_CTYPE="en_GB"
>> LC_NUMERIC="en_GB"
>> LC_TIME="en_GB"
>> LC_COLLATE="en_GB"
>> LC_MONETARY="en_GB"
>> LC_MESSAGES="en_GB"
>> LC_ALL=
>>
>>
>>
>> Does that help at all?
>>
>>
>>
>> I can try your “export LANG=en_GB.UTF-8” … shall I do that? Can I test
>> its efficacy by running one test rather than 6,000 of them?  In which case,
>> which one?
>>
>>
>> Thanks
>>
>>
>>
>> Simon
>>
>>
>>
>>
>>
>> *From:* Phyx 
>> *Sent:* 02 December 2018 20:43
>> *To:* Simon Peyton Jones 
>> *Cc:* ghc-devs@haskell.org Devs 
>> *Subject:* Re: Windows test failures
>>
>>
>>
>> Hi Simon,
>>
>>
>>
>> That's a bit better (still need to figure out why the recent threading
>> issues, but one problem at a time :) )
>>
>>
>>
>> From that list T10672_x64 is one I'm looking at already, seems to have
>> something to do with the libstdc++ destructors.
>>
>> Plugins 09 and 10 are the other two I know about, but haven't had time to
>> look at them yet. Frankly I know too little about plugins to make an
>> accurate determination here, but the input files are empty
>>
>> yet it expects output, so I don't know what it's supposed to do here.  If
>> someone who knows more about plugins can chime in that would save some time.
>>
>>
>>
>> The segfaulting plugin I haven't triaged yet. Now the remaining failures
>> aside from T14452 that Roland is taking care of, seem to have to do with
>> your locale in your console.  You seem to be running the
>>
>> tests in a console that has latin-1 locale? So some unicode characters
>> fail encoding/decoding.
>>
>>
>>
>> If it's a Windows shell you can change it to utf-8 using "chcp 65001", if
>> it's an msys2 shell, what does "locale" return?
>>
>>
>>
>> For reference mine is
>>
>>
>>
>> $ locale
>> LANG=en_GB.UTF-8
>> LC_CTYPE="en_GB.UTF-8"
>> LC_NUMERIC="en_GB.UTF-8"
>> LC_TIME="en_GB.UTF-8"
>> LC_COLLATE="en_GB.UTF-8"
>> LC_MONETARY="en_GB.UTF-8"
>> LC_MESSAGES="en_GB.UTF-8"
>> LC_ALL=
>>
>>
>>
>> If it does say latin1 you can change it with
>>
>>
>>
>> export LANG=en_GB.UTF-8
>>
>>
>>
>> This should fix more of the tests.
>>
>>
>>
>> The reason I don't mark the remaining tests as expect fail yet is because
>> I haven't had the time to triage them, so I don't know their severity and
>>
>> last time there were a few nasty issues hidden in them.
>>
>>
>>
>> Unfortunately I won't have time to look at them till next weekend.
>>
>>
>>
>> Thanks,
>>
>> Tamar
>>
>>
>>
>> On Fri, Nov 30, 2018 at 9:49 PM Simon Peyton Jones 
>> wrote:
>>
>> At the end of the first test run it would have given a list of tests that
>> failed and a line saying TEST=" List of tests..."
>>
>>
>>
>> Copy tha

Re: Windows test failures

2018-12-06 Thread Phyx
Hi Roland,

Thanks for looking into this,

> To fix the issue on Windows, the compiler and the plugin should use the
same buffer for stdout.

I'm not convinced that they're not. I'm guessing the answer lies in why
your messages have a different
ordering, but I don't know why off the top of my head.

If the plugins use the stdout CAF then they should be using the same
CharBuffer.

You can verify this by doing something like

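-- needs withHandle_ (GHC.IO.Handle.Internals), Handle__(..) (GHC.IO.Handle.Types),
-- summaryBuffer (GHC.IO.Buffer) and readIORef (Data.IORef)
summary <- withHandle_ "bufferCheck" stdout $ \h ->
  summaryBuffer <$> readIORef (haCharBuffer h)
putStrLn $ "buffer: " ++ summary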

at the places you check the buffer mode.
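
For anyone wanting to reproduce the buffer-mode half of this outside GHC, here is a minimal standalone sketch using only System.IO. `describeHandle` is a made-up helper name; it just reports what `hGetBuffering` and `hIsTerminalDevice` return, so it says nothing about whether two copies of base share a buffer.

```haskell
import System.IO

-- Hypothetical helper: render a handle's buffering mode together with
-- whether the handle is attached to a terminal.
describeHandle :: Handle -> IO String
describeHandle h = do
  mode  <- hGetBuffering h
  isTty <- hIsTerminalDevice h
  return ("mode=" ++ show mode ++ ", terminal=" ++ show isTty)

main :: IO ()
main = do
  before <- describeHandle stdout
  hSetBuffering stdout LineBuffering
  after <- describeHandle stdout
  putStrLn ("before: " ++ before)
  putStrLn ("after:  " ++ after)
  hFlush stdout
```

Running it once in a console and once with stdout redirected to a file should show the terminal versus non-terminal default for the "before" line.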

> However I don't know whether this is possible / difficult / easy?
> What's your opinion?

On Tue, Dec 4, 2018 at 11:52 AM Roland Senn  wrote:

> Hi Tamar,
>
> WINDOWS
> ===
> On Windows I did the following changes before running the 'plugin09' test:
> 1.) In the compiler (HscMain.hs), just before calling the plugin function
> 'parsedPlugin', I set BufferMode of the file stdout to LineBuffering.
> 2.) At the same place, I write a message to stdout with the text "COMPILER
> About to call plugin parse" and the result of the buffer-mode query.
> 3.) In the plugin, in the function parsedPlugin, I query and print (to
> stdout) the buffer mode.
> 4.) I added the heading PLUGIN to the normal parse message issued by the
> parsedPlugin function
> 5.) In the compiler (HscMain.hs) just after returning from the plugin, I
> print the line "COMPILER Returning from plugin parse" to stdout.
> 6.) In the  plugin function interfaceLoadPlugin' that is called much
> later, I flush stdout, and add the heading "PLUGIN".
>
> This gives the following interesting result:
>
>  COMPILER About to call plugin parse: LineBuffering
>  COMPILER Returning from plugin parse
>  PLUGIN Buffermode: BlockBuffering Nothing
>  PLUGIN parsePlugin(a,b)
>  PLUGIN interfacePlugin: Prelude
>  ...
>
> The output lines do not appear in the sequence they were produced!!
> The plugin doesn't see/inherit the buffer mode (LineBuffering) set by
> the compiler!!
>
> This is a strong indication, that *there are two different buffers for
> stdout*. One in the compiler and another one in the plugin.
> At the end of the processing, the buffer in the compiler is automatically
> flushed, however the buffer in the plugin never gets flushed!
>
> LINUX
> =
> I did a similar test in Linux, however, here I set the buffer mode to
> 'BlockBuffering Nothing' and I didn't do a manual flush in the plugin. I got the
> following result:
>
>  COMPILER About to call plugin parse: Buffering mode: BlockBuffering Nothing
>  PLUGIN Buffering: BlockBuffering Nothing
>  PLUGIN parsePlugin(a,b)
>  COMPILER Returning from plugin parse
>  PLUGIN interfacePlugin: Prelude
>  ...
>
> Here the lines are in the same order as they were produced.
> The setting of the Buffering mode is inherited by the plugin.
>
> I think, on Linux the compiler and the plugin share the same buffer.
>
> To fix the issue on Windows, the compiler and the plugin should use the
> same buffer for stdout.
> However I don't know whether this is possible / difficult / easy?
> What's your opinion?
>
> Many thanks and kind regards
>Roland
>
> Here are my changes for Windows in code:
>
> Change in HscMain the line "import System.IO (fixIO)" to "import System.IO"
>
> Last lines of function HscMain.hs:hscParse'
>
>     -- apply parse transformation of plugins
>     let applyPluginAction p opts
>           = parsedResultAction p opts mod_summary
>     liftIO $ hSetBuffering stdout LineBuffering
>     mode <- liftIO $ hGetBuffering stdout
>     liftIO $ putStrLn ("COMPILER About to call plugin parse: " ++ show mode)
>     rsxresult <- withPlugins dflags applyPluginAction res
>     liftIO $ putStrLn "COMPILER Returning from plugin parse"
>     return rsxresult
>
> New code for function SourcePlugin.hs:parsedPlugin
>
> parsedPlugin opts _ pm
>   = do
>       mode <- liftIO $ hGetBuffering stdout
>       liftIO $ putStrLn $ "PLUGIN Buffermode: " ++ show mode
>       liftIO $ putStrLn $ "PLUGIN parsePlugin(" ++ intercalate "," opts ++ ")"
>       return pm
>
> New code for function SourcePlugin.hs:interfaceLoadPlugin'
>
> interfaceLoadPlugin' :: [CommandLineOption] -> ModIface -> IfM lcl ModIface
> interfaceLoadPlugin' _ iface
>   = do liftIO $ putStrLn $ "PLUGIN interfacePlugin: "
>          ++ (showSDocUnsafe $ ppr $ mi_module iface)
>        liftIO $ hFlush stdout
>        return iface
>
>
>
> On Tuesday, 04.12.2018 at 00:02 +, Phyx wrote:
>
> Hi Roland,
>
> Thanks for looking into these.
>
> > I looked into the testcases 'plugins09', 'plugins10' and 'plugins11' and
> found the following: GHC-Windows uses BufferMode 'BlockBuffering Nothing',
> however, GHC-Linux uses 'LineBuffering'.
>
> Ah, yes, this isn't technically a Linux vs Windows thing, GHC will always
> default to LineBuffering for terminals and BlockBuffering for anything
> else. The issue is

Re: Windows test failures

2018-12-06 Thread Phyx
Hi Simon,

> Does that help at all?

>

> I can try your “export LANG=en_GB.UTF-8” … shall I do that? Can I test
its efficacy by running one test rather than 6,000 of them?  In which case,
which one?


Yes, "en_GB" doesn't seem to be a Unicode locale. One test you can try to
check is T3307, so make TEST="T3307" -C testsuite/tests/


I tried using your "en_GB" locale and the test failed for me too then.


Kind regards,

Tamar

On Thu, Dec 6, 2018 at 2:17 PM Simon Peyton Jones 
wrote:

> Hi Tamar
>
>
>
> Thanks for working on this.
>
>
>
> if it's an msys2 shell, what does "locale" return?
>
>
>
> It’s a shell running inside emacs.  Here’s what locale returns
>
> /c/tmp$ locale
> LANG=en_GB
> LC_CTYPE="en_GB"
> LC_NUMERIC="en_GB"
> LC_TIME="en_GB"
> LC_COLLATE="en_GB"
> LC_MONETARY="en_GB"
> LC_MESSAGES="en_GB"
> LC_ALL=
>
>
>
> Does that help at all?
>
>
>
> I can try your “export LANG=en_GB.UTF-8” … shall I do that? Can I test
> its efficacy by running one test rather than 6,000 of them?  In which case,
> which one?
>
>
> Thanks
>
>
>
> Simon
>
>
>
>
>
> *From:* Phyx 
> *Sent:* 02 December 2018 20:43
> *To:* Simon Peyton Jones 
> *Cc:* ghc-devs@haskell.org Devs 
> *Subject:* Re: Windows test failures
>
>
>
> Hi Simon,
>
>
>
> That's a bit better (still need to figure out why the recent threading
> issues, but one problem at a time :) )
>
>
>
> From that list T10672_x64 is one I'm looking at already, seems to have
> something to do with the libstdc++ destructors.
>
> Plugins 09 and 10 are the other two I know about, but haven't had time to
> look at them yet. Frankly I know too little about plugins to make an
> accurate determination here, but the input files are empty
>
> yet it expects output, so I don't know what it's supposed to do here.  If
> someone who knows more about plugins can chime in that would save some time.
>
>
>
> The segfaulting plugin I haven't triaged yet. Now the remaining failures
> aside from T14452 that Roland is taking care of, seem to have to do with
> your locale in your console.  You seem to be running the
>
> tests in a console that has latin-1 locale? So some unicode characters
> fail encoding/decoding.
>
>
>
> If it's a Windows shell you can change it to utf-8 using "chcp 65001", if
> it's an msys2 shell, what does "locale" return?
>
>
>
> For reference mine is
>
>
>
> $ locale
> LANG=en_GB.UTF-8
> LC_CTYPE="en_GB.UTF-8"
> LC_NUMERIC="en_GB.UTF-8"
> LC_TIME="en_GB.UTF-8"
> LC_COLLATE="en_GB.UTF-8"
> LC_MONETARY="en_GB.UTF-8"
> LC_MESSAGES="en_GB.UTF-8"
> LC_ALL=
>
>
>
> If it does say latin1 you can change it with
>
>
>
> export LANG=en_GB.UTF-8
>
>
>
> This should fix more of the tests.
>
>
>
> The reason I don't mark the remaining tests as expect fail yet is because
> I haven't had the time to triage them, so I don't know their severity and
>
> last time there were a few nasty issues hidden in them.
>
>
>
> Unfortunately I won't have time to look at them till next weekend.
>
>
>
> Thanks,
>
> Tamar
>
>
>
> On Fri, Nov 30, 2018 at 9:49 PM Simon Peyton Jones 
> wrote:
>
> At the end of the first test run it would have given a list of tests that
> failed and a line saying TEST=" List of tests..."
>
>
>
> Copy that line and at the root of the checkout do
>
>
>
> make TEST=" List of tests..."  test -C testsuite/tests
>
>
>
> (that's uppercase C). This will run everything using one thread. :)
>
>
>
> OK, done.  Results below.
>
>
>
> Simon
>
>
>
>
>
> /c/code/HEAD$ make TEST="T10420 T10672_x64 T13385 T14452 T15815 T3307
> T3319 T4006 TH_scopedTvs environment001 plugin-recomp-change
> plugin-recomp-flags plugin-recomp-impure plugin-recomp-pure plugins07
> plugins09 plugins10 plugins11 plugins13 plugins14 print017"  test -C
> testsuite/tests
>
> make: Entering directory '/c/code/HEAD/testsuite/tests'
>
> PYTHON="python3" "python3" ../driver/runtests.py  -e
> "ghc_compiler_always_flags='-dcore-lint -dcmm-lint -no-user-package-db
> -rtsopts  -fno-warn-missed-specialisations -fshow-warning-groups
> -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat
> -dno-debug-output'" -e config.compiler_debugged=False -e
> ghc_with_native_codegen=True -e config.have_vanilla=True -e
> config.have_dynamic=False -e config.have_profiling=False -e
> ghc_with_threaded_rts=True -e ghc_with_dynamic_rts=False -e
> config.have_interp=True -e config.unregisterised=False -e
> config.have_gdb=False -e config.have_readelf=True -e
> config.ghc_dynamic_by_default=False -e config.ghc_dynamic=False -e
> ghc_with_smp=True -e ghc_with_llvm=False -e windows=True -e darwin=False -e
> config.in_tree_compiler=True -e config.cleanup=True -e config.local=True
> --rootdir=. --config-file=../config/ghc -e
> 'config.platform="x86_64-unknown-mingw32"' -e 'config.os="mingw32"' -e
> 'config.arch="x86_64"' -e 'config.wordsize="64"' -e 'config.timeout=int()
> or config.timeout' -e 'config.exeext=".exe"' -e
> 'config.top="/c/code/HEAD/testsuite"' --config
> 'compiler="/c/code/HEAD/inplace/bin/g

Re: Residency profiles

2018-12-06 Thread Simon Marlow
It is documented!
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag--F%20%E2%9F%A8factor%E2%9F%A9

On Thu, 6 Dec 2018 at 16:21, Sebastian Graf  wrote:

> Hey,
>
> thanks, all! Measuring with `-A1M -F1` delivers much more reliable
> residency numbers.
> `-F` doesn't seem to be documented. From reading `rts/RtsFlags.c` and
> `rts/sm/GC.c` I gather that it's the factor by which the number of live
> bytes is multiplied to get the new old gen size?
> So effectively, the old gen will 'overflow' on every minor GC, neat!
>
> Greetings
> Sebastian
>
> On Thu, 6 Dec 2018 at 12:52, Simon Peyton Jones via ghc-devs <
> ghc-devs@haskell.org> wrote:
>
>> |  Right. A parameter for fixing the nursery size would be easy to
>> implement,
>> |  I think. Just a new flag, then in GC.c:resize_nursery() use the flag
>> as the
>> |  nursery size.
>>
>> Super!  That would be v useful.
>>
>> |  "Max. residency" is really hard to measure (need to do very frequent
>> GCs),
>> |  perhaps a better question to ask is "residency when the program is in
>> state
>> |  S".
>>
>> Actually, Sebastian simply wants to see an accurate, reproducible
>> residency profile, and doing frequent GCs might well be an acceptable
>> cost.
>>
>> Simon
>> ___
>> ghc-devs mailing list
>> ghc-devs@haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
>


Re: Residency profiles

2018-12-06 Thread Sebastian Graf
Hey,

thanks, all! Measuring with `-A1M -F1` delivers much more reliable
residency numbers.
`-F` doesn't seem to be documented. From reading `rts/RtsFlags.c` and
`rts/sm/GC.c` I gather that it's the factor by which the number of live
bytes is multiplied to get the new old gen size?
So effectively, the old gen will 'overflow' on every minor GC, neat!
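
To make the arithmetic concrete, here is a small sketch under the assumption that this reading is right (it agrees with the users guide entry for -F linked in the reply): the old generation may grow to the factor times the live data found at the last major collection before it is collected again. `oldGenLimit` is a made-up name for illustration, not anything in the RTS.

```haskell
-- Hypothetical helper, not RTS code: size the old generation may reach
-- before the next major GC, given the -F factor and the live bytes
-- found at the last major GC.
oldGenLimit :: Double -> Integer -> Integer
oldGenLimit factor liveBytes = ceiling (factor * fromIntegral liveBytes)

main :: IO ()
main = do
  let live = 2 * 1024 * 1024                        -- 2M of live data
  putStrLn ("-F2: " ++ show (oldGenLimit 2 live))   -- prints -F2: 4194304
  putStrLn ("-F1: " ++ show (oldGenLimit 1 live))   -- prints -F1: 2097152
```

With -F1 the limit equals the live data itself, so anything promoted out of the nursery pushes the old generation over it.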

Greetings
Sebastian

On Thu, 6 Dec 2018 at 12:52, Simon Peyton Jones via ghc-devs <
ghc-devs@haskell.org> wrote:

> |  Right. A parameter for fixing the nursery size would be easy to
> implement,
> |  I think. Just a new flag, then in GC.c:resize_nursery() use the flag as
> the
> |  nursery size.
>
> Super!  That would be v useful.
>
> |  "Max. residency" is really hard to measure (need to do very frequent
> GCs),
> |  perhaps a better question to ask is "residency when the program is in
> state
> |  S".
>
> Actually, Sebastian simply wants to see an accurate, reproducible
> residency profile, and doing frequent GCs might well be an acceptable
> cost.
>
> Simon


Re: References for GHC usage of multiple capabilities

2018-12-06 Thread Ben Gamari
Artem Pelenitsyn  writes:

> Hello devs,
>
> I've been working on a short survey devoted to a topic of multithreading
> inside the GHC compiler and runtime. So far I was mostly looking at the
> following three papers
>
> [1] P. W. Trinder, K. Hammond, J. S. Mattson, Jr., A. S. Partridge, and S.
> L. Peyton Jones. Gum: A portable parallel implementation of Haskell.  PLDI
> ’96
>
> [2] Tim Harris, Simon Marlow, and Simon Peyton Jones. Haskell on a
> shared-memory multiprocessor. Haskell ’05
>
> [3] Simon Marlow, Simon Peyton Jones, and Satnam Singh. Runtime support for
> multicore Haskell. ICFP ’09
>
> Can you suggest any other papers adding insights on how GHC uses multiple
> capabilities for anything from GC to the implementation of
> Parallel/Concurrent Haskell? Perhaps, something more recent than the above,
> but preferably published in academic venues.
>
Here are a few others but I may have missed a few:

 * Parallel Generational-Copying Garbage Collection with a Block-Structured 
Heap (Simon Marlow, Tim Harris, Roshan P. James, Simon Peyton Jones) In ISMM 
'08: Proceedings of the 7th international symposium on Memory management, 
Tucson, Arizona, ACM, June 2008 
 * Concurrent Haskell, Simon Peyton Jones, Andrew Gordon, Sigbjorn Finne.
 * Composable Memory Transactions, Tim Harris, Simon Marlow, Simon 
Peyton-Jones, and Maurice Herlihy. In Proceedings of the tenth ACM SIGPLAN 
symposium on Principles and practice of parallel programming (PPoPP '05) 
 * Transactional Memory with Data Invariants, Tim Harris and Simon Peyton 
Jones. In TRANSACT '06 

Cheers,

- Ben




References for GHC usage of multiple capabilities

2018-12-06 Thread Artem Pelenitsyn
Hello devs,

I've been working on a short survey devoted to a topic of multithreading
inside the GHC compiler and runtime. So far I was mostly looking at the
following three papers

[1] P. W. Trinder, K. Hammond, J. S. Mattson, Jr., A. S. Partridge, and S.
L. Peyton Jones. Gum: A portable parallel implementation of Haskell.  PLDI
’96

[2] Tim Harris, Simon Marlow, and Simon Peyton Jones. Haskell on a
shared-memory multiprocessor. Haskell ’05

[3] Simon Marlow, Simon Peyton Jones, and Satnam Singh. Runtime support for
multicore Haskell. ICFP ’09

Can you suggest any other papers adding insights on how GHC uses multiple
capabilities for anything from GC to the implementation of
Parallel/Concurrent Haskell? Perhaps, something more recent than the above,
but preferably published in academic venues.

The survey is meant to be of interest for systems folks. Therefore, I'm not
paying so much attention to the programming model and how it is used in
programs.

--
Best wishes,
Artem


RE: Windows test failures

2018-12-06 Thread Simon Peyton Jones via ghc-devs
Hi Tamar

Thanks for working on this.

if it's an msys2 shell, what does "locale" return?

It’s a shell running inside emacs.  Here’s what locale returns

/c/tmp$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_ALL=

Does that help at all?

I can try your “export LANG=en_GB.UTF-8” … shall I do that? Can I test its 
efficacy by running one test rather than 6,000 of them?  In which case, which 
one?

Thanks

Simon


From: Phyx 
Sent: 02 December 2018 20:43
To: Simon Peyton Jones 
Cc: ghc-devs@haskell.org Devs 
Subject: Re: Windows test failures

Hi Simon,

That's a bit better (still need to figure out why the recent threading issues, 
but one problem at a time :) )

From that list T10672_x64 is one I'm looking at already, seems to have 
something to do with the libstdc++ destructors.
Plugins 09 and 10 are the other two I know about, but haven't had time to look 
at them yet. Frankly I know too little about plugins to make an accurate 
determination here, but the input files are empty
yet it expects output, so I don't know what it's supposed to do here.  If 
someone who knows more about plugins can chime in that would save some time.

The segfaulting plugin I haven't triaged yet. Now the remaining failures aside 
from T14452 that Roland is taking care of, seem to have to do with your locale 
in your console.  You seem to be running the
tests in a console that has latin-1 locale? So some unicode characters fail 
encoding/decoding.

If it's a Windows shell you can change it to utf-8 using "chcp 65001", if it's 
an msys2 shell, what does "locale" return?

For reference mine is

$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

If it does say latin1 you can change it with

export LANG=en_GB.UTF-8

This should fix more of the tests.

The reason I don't mark the remaining tests as expect fail yet is because I 
haven't had the time to triage them, so I don't know their severity and
last time there were a few nasty issues hidden in them.

Unfortunately I won't have time to look at them till next weekend.

Thanks,
Tamar

On Fri, Nov 30, 2018 at 9:49 PM Simon Peyton Jones wrote:
At the end of the first test run it would have given a list of tests that 
failed and a line saying TEST=" List of tests..."

Copy that line and at the root of the checkout do

make TEST=" List of tests..."  test -C testsuite/tests

(that's uppercase C). This will run everything using one thread. :)

OK, done.  Results below.

Simon



/c/code/HEAD$ make TEST="T10420 T10672_x64 T13385 T14452 T15815 T3307 T3319 
T4006 TH_scopedTvs environment001 plugin-recomp-change plugin-recomp-flags 
plugin-recomp-impure plugin-recomp-pure plugins07 plugins09 plugins10 plugins11 
plugins13 plugins14 print017"  test -C testsuite/tests

make: Entering directory '/c/code/HEAD/testsuite/tests'

PYTHON="python3" "python3" ../driver/runtests.py  -e 
"ghc_compiler_always_flags='-dcore-lint -dcmm-lint -no-user-package-db -rtsopts 
 -fno-warn-missed-specialisations -fshow-warning-groups 
-fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat 
-dno-debug-output'" -e config.compiler_debugged=False -e 
ghc_with_native_codegen=True -e config.have_vanilla=True -e 
config.have_dynamic=False -e config.have_profiling=False -e 
ghc_with_threaded_rts=True -e ghc_with_dynamic_rts=False -e 
config.have_interp=True -e config.unregisterised=False -e config.have_gdb=False 
-e config.have_readelf=True -e config.ghc_dynamic_by_default=False -e 
config.ghc_dynamic=False -e ghc_with_smp=True -e ghc_with_llvm=False -e 
windows=True -e darwin=False -e config.in_tree_compiler=True -e 
config.cleanup=True -e config.local=True --rootdir=. 
--config-file=../config/ghc -e 'config.platform="x86_64-unknown-mingw32"' -e 
'config.os="mingw32"' -e 'config.arch="x86_64"' -e 'config.wordsize="64"' -e 
'config.timeout=int() or config.timeout' -e 'config.exeext=".exe"' -e 
'config.top="/c/code/HEAD/testsuite"' --config 
'compiler="/c/code/HEAD/inplace/bin/ghc-stage2.exe"' --config 
'ghc_pkg="/c/code/HEAD/inplace/bin/ghc-pkg.exe"' --config 'haddock=' --config 
'hp2ps="/c/code/HEAD/inplace/bin/hp2ps.exe"' --config 
'hpc="/c/code/HEAD/inplace/bin/hpc.exe"' --config 'gs="gs"' --config 
'timeout_prog="../timeout/install-inplace/bin/timeout.exe"' -e "config.stage=2" 
  --rootdir=../../libraries/Win32/tests  --rootdir=../../libraries/array/tests  
--rootdir=../../libraries/base/tests  --rootdir=../../libraries/binary/tests  
--rootdir=../../libraries/bytestring/tests  
--rootdir=../../libraries/containers/tests  
--rootdir=../../libraries/deepseq/tests  
--rootdir=../../libraries/directory/tests  
--rootdir=../../libraries/filepath/tests  
--rootdir=../../libraries/ghc-compact/tests  
--rootdir=../../libraries/ghc-heap/tests  
--rootdir

RE: Residency profiles

2018-12-06 Thread Simon Peyton Jones via ghc-devs
|  Right. A parameter for fixing the nursery size would be easy to implement,
|  I think. Just a new flag, then in GC.c:resize_nursery() use the flag as the
|  nursery size.

Super!  That would be v useful.

|  "Max. residency" is really hard to measure (need to do very frequent GCs),
|  perhaps a better question to ask is "residency when the program is in state
|  S".

Actually, Sebastian simply wants to see an accurate, reproducible residency 
profile, and doing frequent GCs might well be an acceptable cost.  

Simon


Re: Residency profiles

2018-12-06 Thread Simon Marlow
On Thu, 6 Dec 2018 at 11:15, Ömer Sinan Ağacan  wrote:

> Hi,
>
> > I think what we want is a way to trigger GC at very regular intervals,
> > after (say) each 10kbytes or 100kbytes or 1Mbyte of allocation.  That
> > might be expensive, but we’d get reproducible results.
>
> If we could fix the nursery size to 10kb that'd trigger a GC every 10kb
> of allocation (you could still allocate large objects as those are not
> allocated in the nursery, but perhaps that's not a problem in your
> benchmarks). Then by setting -G1 you could turn all GCs into major GCs
> (because the first generation is always collected). Note that because each
> capability has its own nursery you may want to set the nursery size to
> alloc_per_gc / num_of_caps if you need more than one capability.
>
> > I don’t think that is possible right now – see the ticket – but it
> > would be easy enough to do wouldn’t it?  Just give only 10k or 100k or
> > 1M to the allocator when setting it running again.
>
> Right. A parameter for fixing the nursery size would be easy to implement,
> I think. Just a new flag, then in GC.c:resize_nursery() use the flag as the
> nursery size.
>

It's possible that +RTS -A100k -F1 would do what you want. It would keep
the 2-generation setup, with a fixed nursery size, and -F1 would ensure
that every collection is a major one. I haven't tested this, but if it
doesn't work, it should!
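
One way to check whether these flags behave as hoped is to compare the total and major GC counts reported by GHC.Stats at the end of a run. The following is only a minimal sketch: `workload` is a made-up stand-in for the real program, and the claim that every collection becomes a major one is exactly what remains untested here.

```haskell
import Control.Exception (evaluate)
import Control.Monad (unless)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Exit (die)

-- Hypothetical workload: 'reverse' materialises the whole list, so this
-- really allocates and keeps up to n cells live at once.
workload :: Int -> Int
workload n = sum (reverse [1 .. n])

main :: IO ()
main = do
  -- GHC.Stats only works when the program is run with +RTS -T (or -s etc.).
  enabled <- getRTSStatsEnabled
  unless enabled (die "run with +RTS -T so that GHC.Stats is available")
  _ <- evaluate (workload 1000000)
  s <- getRTSStats
  -- If every collection is a major one, these two counts should agree.
  putStrLn ("total GCs: " ++ show (gcs s))
  putStrLn ("major GCs: " ++ show (major_gcs s))
  putStrLn ("max live bytes seen at a major GC: " ++ show (max_live_bytes s))
```

Built with -rtsopts and run with `+RTS -T -A100k -F1`, the two counts can be compared directly; if they differ, some collections were still minor ones.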

Cheers
Simon



>
> "Max. residency" is really hard to measure (need to do very frequent GCs),
> perhaps a better question to ask is "residency when the program is in
> state S". This is also hard to measure if your program is threaded or has
> other non-determinism, but this lets you decide when to measure residency.
> Currently we can't tell the GC to print residency stats, but perhaps we
> could implement a variant of `performGC` that prints residency after the
> GC. So in your program you could add `performGCPrintStats` after every
> iteration or step etc. Not sure how useful this would be, but just an idea.
>
> On 6.12.2018 13:09, Simon Peyton Jones wrote:
> > Simon, Ben, Omer
> >
> > As you’ll see in comments 55-72 of
> > https://ghc.haskell.org/trac/ghc/ticket/9476, Sebastian has been a bit
> > flummoxed by the task of measuring residency profiles; that is, how much
> > data is truly live during execution.
> >
> > A major GC measures that, but we are vulnerable to exactly when it
> > happens (even with -G1) and that can lead to irreproducible results.
> >
> > I think what we want is a way to trigger GC at very regular intervals,
> > after (say) each 10kbytes or 100kbytes or 1Mbyte of allocation.  That
> > might be expensive, but we’d get reproducible results.
> >
> > I don’t think that is possible right now – see the ticket – but it would
> > be easy enough to do wouldn’t it?  Just give only 10k or 100k or 1M to
> > the allocator when setting it running again.
> >
> > Would you consider this?  Or are we just missing something obvious?
> >
> > Needless to say, we want to do all this with full optimisation on, no
> > cost-centre profiling.
> >
> > Thanks
> >
> > Simon
> >
>
> --
> Ömer Sinan Ağacan, Haskell Consultant
> Well-Typed LLP, http://www.well-typed.com
>
> Registered in England & Wales, OC335890
> 118 Wymering Mansions, Wymering Road, London W9 2NF, England
>


Re: Residency profiles

2018-12-06 Thread Ömer Sinan Ağacan

Hi,

> I think what we want is a way to trigger GC at very regular intervals, after
> (say) each 10kbytes or 100kbytes or 1Mbyte  of allocation.  That might be
> expensive, but we’d get reproducible results.

If we could fix the nursery size to 10kb that'd trigger a GC every 10kb of
allocation (you could still allocate large objects as those are not allocated in
the nursery, but perhaps that's not a problem in your benchmarks). Then by
setting -G1 you could turn all GCs into major GCs (because the first generation
is always collected). Note that because each capability has its own nursery you
may want to set the nursery size to alloc_per_gc / num_of_caps if you need more
than one capability.
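
The per-capability arithmetic can be sketched as follows. This is only an illustration: `nurseryPerCapability` and the 100k budget are made-up names and values, and how the result would be handed to the RTS is left open, since the flag for a fixed nursery size is still hypothetical at this point in the thread.

```haskell
import Control.Concurrent (getNumCapabilities)

-- Hypothetical helper: split the desired "allocation between GCs" budget
-- evenly across capabilities, since each capability has its own nursery.
-- 'max 1' guards against a zero or negative capability count.
nurseryPerCapability :: Int -> Int -> Int
nurseryPerCapability allocPerGc numCaps = allocPerGc `div` max 1 numCaps

main :: IO ()
main = do
  caps <- getNumCapabilities
  let budget = 100 * 1024 -- assumed target: roughly 100k of allocation per GC
      perCap = nurseryPerCapability budget caps
  putStrLn ("capabilities: " ++ show caps)
  putStrLn ("nursery bytes per capability: " ++ show perCap)
```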

> I don’t think that is possible right now – see the ticket – but it would be
> easy enough to do wouldn’t it?  Just give only 10k or 100k or 1M to the
> allocator when setting it running again.

Right. A parameter for fixing the nursery size would be easy to implement, I
think. Just a new flag, then in GC.c:resize_nursery() use the flag as the
nursery size.

"Max. residency" is really hard to measure (need to do very frequent GCs),
perhaps a better question to ask is "residency when the program is in state S".
This is also hard to measure if your program is threaded or has other
non-determinism, but this lets you decide when to measure residency. Currently
we can't tell the GC to print residency stats, but perhaps we could implement a
variant of `performGC` that prints residency after the GC. So in your program
you could add `performGCPrintStats` after every iteration or step etc. Not sure
how useful this would be, but just an idea.
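
Something close to this idea can be approximated in user code today, without RTS changes. The following is a rough sketch, not a proposed implementation: `performGCPrintStats` and `formatResidency` are defined here for illustration (they are not in base), and the sketch assumes the program is run with `+RTS -T` so that GHC.Stats is enabled.

```haskell
import Control.Monad (forM_)
import Data.Word (Word64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.IO (hPutStrLn, stderr)
import System.Mem (performMajorGC)

-- Hypothetical formatter for one residency sample.
formatResidency :: String -> Word64 -> String
formatResidency label bytes = label ++ ": live bytes = " ++ show bytes

-- Hypothetical helper named after the idea above: force a major GC, then
-- report the live bytes recorded for that most recent collection.
performGCPrintStats :: String -> IO ()
performGCPrintStats label = do
  performMajorGC
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      hPutStrLn stderr (formatResidency label (gcdetails_live_bytes (gc s)))
    else hPutStrLn stderr (label ++ ": RTS stats disabled; run with +RTS -T")

main :: IO ()
main = forM_ [1 :: Int .. 3] $ \i -> do
  -- Stand-in for one iteration or step of the real program.
  let xs = [1 .. i * 100000] :: [Int]
  print (sum xs)
  performGCPrintStats ("after step " ++ show i)
```

Because the measurement points are chosen by the program rather than by the allocator, the samples line up with program states, though threaded or otherwise non-deterministic programs may still vary between runs.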

On 6.12.2018 13:09, Simon Peyton Jones wrote:

> Simon, Ben, Omer
>
> As you’ll see in comments 55-72 of
> https://ghc.haskell.org/trac/ghc/ticket/9476, Sebastian has been a bit
> flummoxed by the task of measuring residency profiles; that is, how much
> data is truly live during execution.
>
> A major GC measures that, but we are vulnerable to exactly when it happens
> (even with -G1) and that can lead to irreproducible results.
>
> I think what we want is a way to trigger GC at very regular intervals, after
> (say) each 10kbytes or 100kbytes or 1Mbyte of allocation.  That might be
> expensive, but we’d get reproducible results.
>
> I don’t think that is possible right now – see the ticket – but it would be
> easy enough to do wouldn’t it?  Just give only 10k or 100k or 1M to the
> allocator when setting it running again.
>
> Would you consider this?  Or are we just missing something obvious?
>
> Needless to say, we want to do all this with full optimisation on, no
> cost-centre profiling.
>
> Thanks
>
> Simon

--
Ömer Sinan Ağacan, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com

Registered in England & Wales, OC335890
118 Wymering Mansions, Wymering Road, London W9 2NF, England


Guidelines for respectful communication

2018-12-06 Thread Simon Peyton Jones via ghc-devs
Friends

As many of you will know, I have been concerned for several years about the
standards of discourse in the Haskell community.  I think things have improved
since the period that drove me to write my Respect email, but it's far from
secure.

We discussed this at a meeting of the GHC Steering Committee at ICFP in
September, and many of us have had related discussions since.  Arising out of
that conversation, the GHC Steering Committee has decided to adopt these

  Guidelines for respectful communication

We are not trying to impose these guidelines on members of the Haskell
community generally. Rather, we are adopting them for ourselves, as a signal
that we seek high standards of discourse in the Haskell community, and are
willing to publicly hold ourselves to that standard, in the hope that others
may choose to follow suit.

We are calling them "guidelines for respectful communication" rather than a
"code of conduct", because we want to encourage good communication, rather than
focus on bad behaviour.  Richard Stallman's recent post about the new GNU Kind
Communication Guidelines expresses the same idea.

Meanwhile, the Stack community is taking a similar approach.

Our guidelines are not set in stone; you can comment here.  Perhaps they can
evolve so that other Haskell committees (or even individuals) feel able to
adopt them.

The Haskell community is such a rich collection of intelligent, passionate, and
committed people. Thank you -- I love you all!

Simon





Residency profiles

2018-12-06 Thread Simon Peyton Jones via ghc-devs
Simon, Ben, Omer

As you'll see in comments 55-72 of
https://ghc.haskell.org/trac/ghc/ticket/9476, Sebastian has been a bit
flummoxed by the task of measuring residency profiles; that is, how much data
is truly live during execution.

A major GC measures that, but we are vulnerable to exactly when it happens
(even with -G1) and that can lead to irreproducible results.

I think what we want is a way to trigger GC at very regular intervals, after
(say) each 10kbytes or 100kbytes or 1Mbyte of allocation.  That might be
expensive, but we'd get reproducible results.

I don't think that is possible right now - see the ticket - but it would be
easy enough to do wouldn't it?  Just give only 10k or 100k or 1M to the
allocator when setting it running again.

Would you consider this?  Or are we just missing something obvious?

Needless to say, we want to do all this with full optimisation on, no
cost-centre profiling.

Thanks

Simon