Re: New Windows I/O manager in GHC 8.12

2020-07-20 Thread Phyx
Thanks Simon, cheers :)

Sent from my Mobile


RE: New Windows I/O manager in GHC 8.12

2020-07-20 Thread Simon Peyton Jones via ghc-devs
Tamar, I salute you!  This is a big piece of work – thank you!

Simon

From: ghc-devs  On Behalf Of Phyx
Sent: 17 July 2020 16:04
To: ghc-devs@haskell.org Devs 
Subject: New Windows I/O manager in GHC 8.12

Hi All,

In case you missed it, about 150 commits were committed to master
yesterday.  These commits add WinIO (Windows I/O) to GHC.  This is a new I/O
manager designed for the native Windows I/O subsystem, instead of
relying on the broken POSIX-ish compatibility layer that MIO used.

This is one of 3 big patches I have been working on for years now.

So before I explain why WinIO was made, here's a TL;DR:

WinIO introduces an internal API break compared to previous GHC releases: the
internal code was modified to support a completely asynchronous I/O system.

This means that we now have to keep track of the file pointer offset ourselves,
which was previously done by the C runtime.  In async I/O you cannot assume the
offset to be at any given location.
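Here is a minimal POSIX sketch of the idea. It is an analogy, not WinIO's actual code. A hypothetical `tracked_handle` keeps its own offset and passes it to `pread()`. That call reads at an explicit position and leaves the OS-maintained file position untouched. On Windows, an overlapped request likewise carries its offset in the `OVERLAPPED` structure instead of relying on a shared file pointer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle type: the offset lives in user space, not in the OS. */
struct tracked_handle {
    int   fd;      /* underlying OS file descriptor */
    off_t offset;  /* next position to read from, maintained by us */
};

/* Open a file read-only and start our own offset at zero. */
int tracked_open(struct tracked_handle *h, const char *path)
{
    assert(h != NULL && path != NULL);
    h->fd = open(path, O_RDONLY);
    h->offset = 0;
    return h->fd;
}

/* Read up to len bytes at the handle's own offset, then advance it.
 * pread() takes the offset as an argument and does not move the OS file
 * position, so independent requests cannot disturb each other's position. */
ssize_t tracked_read(struct tracked_handle *h, void *buf, size_t len)
{
    assert(h != NULL);
    ssize_t n = pread(h->fd, buf, len, h->offset);
    if (n > 0)
        h->offset += n;
    return n;
}
```

Because the offset is explicit in every request, many reads can be in flight at once without anyone needing a shared "current position".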

What does this mean for you? Very little, if you did not use internal GHC I/O
code, in particular Buffer, BufferIO and RawIO. If you did, you will need to
explicitly add support for GHC 8.12+.

Because FDs are a Unix concept and don't behave as you would expect on Windows,
the new I/O manager also uses HANDLE instead of FD. This means that any library
that has used the internal GHC Fd type won't work with WinIO. Luckily the number
of libraries that have done so seems quite low. If you can, please stick to the
external Handle interface for I/O functions.

The boot libraries have been updated, and in particular process *requires* the
version that is shipped with GHC.  Please respect the version bounds here!  I
will be writing a migration guide for those who need to migrate code.  The
amount of work is usually trivial, as base provides shims for most of the
common things you would have used Fd for.

Also, if I may make a plea to GHC developers: do not add non-trivial
implementations to the externally exposed modules (e.g. System.xxx, Data.xxx);
instead add them to internal modules (GHC.xxx) and re-export them from the
external modules.  This allows us to avoid import cycles inside the internal
modules :)

--

So why WinIO? Over the years a number of hard-to-fix issues popped up on
Windows, including proper Unicode console I/O, cooked input, and the ability to
cancel I/O requests. WinIO also allows libraries like Brick to work on Windows
without reinventing the wheel or having to hide their I/O from the I/O manager.

In attempting to fix some of these with MIO, layer upon layer of hacks was
added.  This meant that things sometimes worked, but when they didn't was
rather unpredictable.  Some of the issues were simply unfixable with MIO.  I
will be making some posts about how WinIO works (and also archiving them on the
wiki, don't worry :)), but for now some highlights:

WinIO is 3 years of work, first started by Joey Hess, then picked up by Mikhail
Glushenkov before landing at my feet.  While the majority has been rewritten,
their work provided a great jumping-off point, so thanks!  Also thanks to Ben
and AndreasK for helping me get it over the line. As you can imagine I was
exhausted by this point :).

Some stats: ~8000 new lines and ~1100 removed ones, spread over 130+ commits
(sorry, this was the smallest we could get it while not losing some historical
context), with over 153 files changed, not counting the changes to boot
libraries.

It fixes #18307, #17035, #16917, #15366, #14530, #13516, #13396, #13359,
#12873, #12869, #11394, #10542, #10484, #10477, #9940, #7593, #7353, #5797,
#5305, #4471, #3937, #3081, #12117, #2408, #10956, #2189
(but only on native Windows consoles, so no msys shells) and #806, which is 14
years old!

WinIO is a dynamic choice, so you can switch between I/O managers using the RTS
flag --io-manager=[native|posix].

On non-Windows platforms, native is the same as posix.
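The following sketch shows one way the flag's semantics could be expressed. It is hypothetical and is not the RTS's actual option parser. It maps `--io-manager=native|posix` to a choice, and native collapses to posix on non-Windows hosts. The enum and function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; the real RTS uses its own internal types. */
enum io_manager { IO_MANAGER_POSIX, IO_MANAGER_NATIVE, IO_MANAGER_INVALID };

/* Parse an argument of the form --io-manager=native or --io-manager=posix.
 * is_windows says whether the program is running on a Windows host. */
enum io_manager parse_io_manager_flag(const char *arg, int is_windows)
{
    static const char prefix[] = "--io-manager=";
    assert(arg != NULL);
    if (strncmp(arg, prefix, sizeof prefix - 1) != 0)
        return IO_MANAGER_INVALID;

    const char *value = arg + sizeof prefix - 1;
    if (strcmp(value, "posix") == 0)
        return IO_MANAGER_POSIX;
    if (strcmp(value, "native") == 0)
        /* On non-Windows platforms, native is the same as posix. */
        return is_windows ? IO_MANAGER_NATIVE : IO_MANAGER_POSIX;
    return IO_MANAGER_INVALID;
}
```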

The asynchronous interface chosen for this implementation is I/O Completion Ports.

The I/O manager uses GetQueuedCompletionStatusEx, an interface added in Windows
Vista, which allows us to service multiple completed requests in one go.
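The benefit of dequeuing completions in batches can be shown without Windows APIs. Below is a portable simulation; all names in it are hypothetical. On Windows the real call is `GetQueuedCompletionStatusEx`, which fills an array of `OVERLAPPED_ENTRY` records. The simulation removes up to `max` finished requests from a queue in a single call, so one wake-up of the I/O manager can service many requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OVERLAPPED_ENTRY: which request finished
 * and how many bytes it transferred. */
struct completion {
    unsigned long request_id;
    unsigned long bytes_transferred;
};

/* A fixed-size ring buffer standing in for the completion port's queue. */
#define QUEUE_CAP 64
struct completion_queue {
    struct completion items[QUEUE_CAP];
    size_t head, count;
};

/* Post a finished request; returns 0 on success, -1 if the queue is full. */
int post_completion(struct completion_queue *q, struct completion c)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = c;
    q->count++;
    return 0;
}

/* Remove up to max completions in one call and return how many were taken,
 * mirroring how one GetQueuedCompletionStatusEx call can return many entries. */
size_t dequeue_batch(struct completion_queue *q, struct completion *out, size_t max)
{
    size_t n = 0;
    while (n < max && q->count > 0) {
        out[n++] = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    return n;
}
```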

Some highlights:

* Drops Windows Vista support
  Vista has been out of extended support since 2017. The new minimum is
  Windows 7.  This allows us to use much more efficient OS-provided
  abstractions.

* Replaces Events and Monitor locks with much faster and more efficient
  Condition Variables and Slim Reader/Writer (SRW) Locks.
* Changes GHC's Buffer and I/O structs to support asynchronous operation by
  not relying on the OS to manage the file offset.
* Implements a new command line flag +RTS --io-manager=[native|posix] to
  control which I/O manager is used.
* Implements a new Console I/O interface supporting much faster reads/writes
  and correct Unicode output.  It also supports things like cooked input.
* In the new I/O manager, if the user still has their code page set to OEM,
  then we use UTF-8

Re: HEAD doesn't build. Totally stalled.

2020-07-20 Thread Moritz Angermann
The revert MR is here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3714
It's kind of ironic that it's stuck in CI limbo, whereas the initial MR wasn't.

> I'm surprised gitlab presubmit merge did not detect the build breakage.
So am I!

As laid out, I believe a better solution is to have a mapping from symbols to
the libraries that potentially carry them, and have GHC consult it when the
linker, while linking arbitrary objects, encounters those symbols. Another
strategy that Tamar employed to great success on the Windows side is to simply
increase the set of libraries GHC tries to load by default, and thus get rid of
the annoying list of symbols in the RTS.
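Here is a minimal sketch of the lookup step of that mapping. It assumes a hypothetical in-memory form of the Symbol -> [Library] table; the actual proposal keeps the table in a file in the lib folder. The single entry mirrors the `[("__stack_chk_guard", ["ssp"])]` example from this thread. Loading each candidate, and bailing if none provides the symbol, is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the hypothetical Symbol -> [Library] table. */
struct symbol_libs {
    const char *symbol;
    const char *libraries[4];   /* NULL-terminated list of candidate libraries */
};

static const struct symbol_libs symbol_table[] = {
    { "__stack_chk_guard", { "ssp", NULL } },
};

/* Return the NULL-terminated list of libraries that might provide symbol,
 * or NULL if the table has no entry (in which case the linker bails). */
const char *const *candidate_libraries(const char *symbol)
{
    assert(symbol != NULL);
    size_t rows = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < rows; i++)
        if (strcmp(symbol_table[i].symbol, symbol) == 0)
            return symbol_table[i].libraries;
    return NULL;
}
```

Keeping the table as data rather than as RTS code means new symbol/library pairs could be added without rebuilding the RTS.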

I hope the above MR will pass now (after another rebase), and that I can find
some time to implement a better solution soon.

Cheers,
 Moritz


Re: HEAD doesn't build. Totally stalled.

2020-07-20 Thread Sergei Trofimovich
On Fri, 17 Jul 2020 10:45:37 +0800
Moritz Angermann  wrote:

> Well, we actually *do* test for __SSP__ in HEAD:
> https://github.com/ghc/ghc/blob/master/rts/RtsSymbols.c#L1170
> Which currently lists:
> #if !defined(mingw32_HOST_OS) && !defined(DYNAMIC) &&
> (defined(_FORTIFY_SOURCE) || defined(__SSP__))

I believe it's a https://gitlab.haskell.org/ghc/ghc/-/issues/18442

It breaks for me as well.

It triggers if one has a gcc compiler with either of 2 properties:

1. gcc is built with --enable-default-ssp (sets __SSP__ for all compilations)
2. gcc defaults to _FORTIFY_SOURCE

Note that the presence or absence of __stack_chk_guard is indicated by neither
of these; instead it is present when gcc is built with --enable-libssp (which
makes gcc use libssp's __stack_* functions instead of gcc's direct TLS
instructions plus a single glibc fallback).

Gentoo does both [1.] and [2.] by default. I believe Debian does at least
[2.] by default. I'm surprised gitlab presubmit merge did not detect the
build breakage.

What do macros [1.] and [2.] mean for glibc-linux:

- _FORTIFY_SOURCE only affects glibc headers, changing memcpy() calls to
  __memcpy_chk() to add overflow checks. It does not affect the symbol exports
  available in libc; __stack_* symbols are always present. Parts of libc or
  other libraries we link ghc with could already call __stack_* functions, as
  they could already be built with _FORTIFY_SOURCE, regardless of how ghc
  itself is built: with _FORTIFY_SOURCE or without.

- __SSP__ indicates code generation of stack canary placement by gcc
  (-fstack-protector-* options, or the default override with gcc's
  --enable-default-ssp).

  If the target is not a gcc libssp target (a.k.a. --disable-libssp, the
  default for all linux-glibc targets), then gcc never uses -lssp and uses
  gcc's builtin instructions instead of __stack_chk_guard helpers. In this mode
  __stack_chk_guard is not present in any libraries installed by gcc or glibc.
  The only symbol provided by glibc is __stack_chk_fail (which arguably should
  not be exposed at all, as it's an unusual contract between glibc and gcc:
  https://gcc.gnu.org/PR93509)

--enable-libssp for gcc does bring in __stack_chk_guard. libssp is then present,
and libraries ghc depends on could use __stack_chk_guard regardless of the
-fstack-protector-* options used to build ghc. I believe --enable-libssp is
used only on mingw.

What I'm trying to say is that the presence of __stack_chk_guard is orthogonal
to both the __SSP__ define and the _FORTIFY_SOURCE define that ghc checks today.

It's rather a function of how the gcc toolchain was built: with --enable-libssp
or not.
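The point can be made concrete with a small compile-time probe. This is a hypothetical helper, not part of GHC. It reports which of the relevant macros the current compilation defines. It also reports whether the guard quoted from rts/RtsSymbols.c would compile its guarded block. Nothing in it can tell whether `__stack_chk_guard` actually exists, because that depends on whether the gcc toolchain was built with --enable-libssp.

```c
#include <assert.h>

/* Bit flags describing the macros the RTS condition looks at. */
enum {
    HAS_FORTIFY_SOURCE = 1 << 0,
    HAS_SSP            = 1 << 1,
    IS_MINGW           = 1 << 2,
    IS_DYNAMIC         = 1 << 3,
};

/* Record which of the relevant macros the current compilation defines. */
int probe_macros(void)
{
    int mask = 0;
#if defined(_FORTIFY_SOURCE)
    mask |= HAS_FORTIFY_SOURCE;
#endif
#if defined(__SSP__)
    mask |= HAS_SSP;
#endif
#if defined(mingw32_HOST_OS)
    mask |= IS_MINGW;
#endif
#if defined(DYNAMIC)
    mask |= IS_DYNAMIC;
#endif
    return mask;
}

/* Same test as the quoted #if: 1 if the guarded block would be compiled.
 * Note it says nothing about whether libssp (and __stack_chk_guard) exists. */
int rts_guard_active(void)
{
#if !defined(mingw32_HOST_OS) && !defined(DYNAMIC) && \
    (defined(_FORTIFY_SOURCE) || defined(__SSP__))
    return 1;
#else
    return 0;
#endif
}
```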

> But this seems to still be ill conceived.  And while Simon is the only
> one I'm aware of, for whom this breaks we need to find a better
> solution. As such, we will revert the commits.
> 
> Why do we do all this symbol nonsense in the RTS to begin with?  It
> has to do with the static linker we have in GHC. Loading arbitrary
> archives means we need to be able to resolve all kinds of symbols
> that objects might refer to. For regular dependencies this works:
> if the dependencies are listed in the package configuration file, the
> linker will know which dependencies to link. This gets a bit annoying
> for libraries that the compiler will automagically provide: libgcc,
> libssp, librt, ...
> 
> The solution so far was simply to have the RTS depend on these
> symbols, and keep a list of them around. That way when the linker
> built the RTS we'd get it to link all these symbols into the RTS, and
> we could refer to them in the linker. Essentially looking them up in
> the linked binary (ghc, or iserv).
> 
> This is a rather tricky problem, and almost all solutions we came up
> with are annoying in one or more dimensions.  After some discussion on
> IRC last night, we'll go forward trying the following solution:
> 
> We'll keep a file in the lib folder (similar to the settings,
> llvm-targets, ...) that is essentially a lookup table of Symbol ->
> [Library]. If we encounter an unknown symbol, and we have it in our
> lookup table, we will try to load the named libraries, hoping for them
> to contain the symbol we are looking for. If everything fails we'll
> bail.
> 
> For the example symbols that prompted this issue: (which are emitted
> when stack smashing protector hardening is enabled, which seems to be
> the default on most linux distributions today, which is likely why I
> couldn't reproduce this easily.)
> 
> [("__stack_chk_guard", ["ssp"])]
> 
> would tell the compiler to try to locate (through the usual library
> location means) the library called "ssp", if it encounters the symbol
> "__stack_chk_guard".
> 
> Isn't this what the dynamic linker is supposed to solve? Why do we
> have to do all this on our own? Can't we just use the dynamic linker?
> Yes, and no. Yes we can use the dynamic linker, and we even do. But
> not all platforms have a working, or usable linker. iOS for example
> has a working dynamic linker, but user programs can't use it. muslc
> reports "Dynamic loading n

RE: Unmerged Patch: 3358

2020-07-20 Thread Simon Peyton Jones via ghc-devs
Matthew

It looks from 
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3358
as if it was blocked on something to do with 'text'.  Is that unblocked now?

The MR also says "Fast forward merge is not possible".

So it sounds as if the steps are:

* Check that the change to text, whatever that is, has been done,
  and fix the text submodule commit on the MR
* Rebase
* Assign to Marge.

If you get stuck with that, do yell.

Simon


___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Unmerged Patch: 3358

2020-07-20 Thread Matthew Pickering
Hi,

My patch 3358 needs to get merged before the 8.12 fork.

When I finished it (in May), it passed CI and after this point I
lacked time to work on it further. Now I have asked 4 times for this
patch to get merged and it is still open.

The GHC proposal for this patch already took an extortionate amount of
time to get accepted. Please can we close this chapter by merging the
patch.

Cheers,

Matt