Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 15:43:12 UTC, Steven Schveighoffer wrote:

On 10/25/17 11:27 AM, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

[...]


Maybe there was a change in the OS(es) being used that 
affected the limit?


- Jonathan M Davis


Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


I'm not sure that would allow us to mprotect the page guards, 
though.


I think the easiest way to proceed from here is to default the stack
protection page size to 0, so we avoid this regression. Once that's in,
we can think about a different allocation strategy for the fibers'
stacks, or about throwing an exception when there are too many fibers
(too bad mmap doesn't return an error code, but you get SIGABRT
instead :( )


mmap does return an error. And onOutOfMemoryError is called when it
fails.


https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518

Which should throw an error:

https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542

however, when mprotect fails, it calls abort():

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540
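
For illustration, a minimal sketch of the alternative being discussed -
not the actual druntime code - where an mprotect failure is surfaced
through the same error path as a failed mmap instead of abort():

import core.exception : onOutOfMemoryError;
import core.sys.posix.sys.mman : PROT_NONE, mprotect;

void protectGuardPage(void* page, size_t pageSize)
{
    // On Linux this fails with ENOMEM when splitting the mapping would
    // exceed vm.max_map_count, among other reasons.
    if (mprotect(page, pageSize, PROT_NONE) != 0)
        onOutOfMemoryError();  // throw instead of calling abort()
}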


What do you think?


I think we should reverse the mprotect default, and try and 
determine a better way to opt-in to the limit.


Is this a Linux-specific problem? Are there issues with 
mprotect on other OSes? Or is Linux the only OS that supports 
mprotect?


-Steve



Reading the FreeBSD man pages, it looks like at least FreeBSD has the
same limitation in mmap, but the man page for mprotect doesn't mention
it. However, I believe that's just an omission in the man pages, as it
would really be the same situation as on Linux. Linux's man page for
mprotect specifically says this:


   ENOMEM Changing the protection of a memory region would result in the
          total number of mappings with distinct attributes (e.g., read
          versus read/write protection) exceeding the allowed maximum.
          (For example, making the protection of a range PROT_READ in
          the middle of a region currently protected as
          PROT_READ|PROT_WRITE would result in three mappings: two
          read/write mappings at each end and a read-only mapping in the
          middle.)

So I was right about the doubling of the mappings.
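
For anyone who wants to verify this, here is a minimal sketch (assuming
Linux's /proc interface and druntime's default guard page) that counts
the lines of /proc/self/maps before and after creating a batch of
fibers:

import core.thread : Fiber;
import std.algorithm.searching : count;
import std.file : readText;
import std.stdio : writeln;

// Each line of /proc/self/maps is one mapping with uniform attributes.
size_t mappings() { return readText("/proc/self/maps").count('\n'); }

void main()
{
    auto before = mappings();
    Fiber[] fibers;
    foreach (i; 0 .. 100)
        fibers ~= new Fiber(() {});  // default: mprotect'ed guard page
    // With a guard page, each stack should appear as two regions
    // (a PROT_NONE guard plus the read/write stack), i.e. ~200 here.
    writeln("new mappings: ", mappings() - before);
}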


> I think we should reverse the mprotect default, and try and
> determine a better way to opt-in to the limit.


I agree: https://github.com/dlang/druntime/pull/1956


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Steven Schveighoffer via Digitalmars-d

On 10/25/17 11:27 AM, Nemanja Boric wrote:

On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:

On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:

On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

[...]


Maybe there was a change in the OS(es) being used that affected the 
limit?


- Jonathan M Davis


Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


I'm not sure that would allow us to mprotect the page guards, though.


I think the easiest way to proceed from here is to default the stack
protection page size to 0, so we avoid this regression. Once that's in,
we can think about a different allocation strategy for the fibers'
stacks, or about throwing an exception when there are too many fibers
(too bad mmap doesn't return an error code, but you get SIGABRT
instead :( )


mmap does return an error. And onOutOfMemoryError is called when it fails.

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518

Which should throw an error:

https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542

however, when mprotect fails, it calls abort():

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540


What do you think?


I think we should reverse the mprotect default, and try and determine a 
better way to opt-in to the limit.


Is this a Linux-specific problem? Are there issues with mprotect on 
other OSes? Or is Linux the only OS that supports mprotect?


-Steve


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 15:32:36 UTC, Steven Schveighoffer wrote:

On 10/25/17 11:12 AM, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

[...]


Maybe there was a change in the OS(es) being used that 
affected the limit?




Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


Hm... the mprotect docs specifically state that calling 
mprotect on something that's not allocated via mmap is 
undefined. So if you use GC to allocate Fiber stacks, you can't 
mprotect it.


I think what we need is a more configurable way to allocate 
stacks. There is a tradeoff to mprotect vs. simple allocation, 
and it's not obvious to choose one over the other.


I still am baffled as to why this is now showing up. Perhaps if 
you are using mmap as an allocator (as Fiber seems to be 
doing), it doesn't count towards the limit? Maybe it's just 
glommed into the standard allocator's space?


-Steve


I'm sorry I wrote several messages in a row, as the thoughts were
coming to me. I think the reason is that mprotect creates a new range,
since it needs to have distinct protection attributes, hence doubling
the number of mappings.



> Maybe it's just glommed into the standard allocator's space?



No, you can see each fiber's stack allocated separately when you cat
/proc/<pid>/maps.


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Steven Schveighoffer via Digitalmars-d

On 10/25/17 11:12 AM, Nemanja Boric wrote:

On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

On 10/23/17 12:56 PM, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>
> I need to get this issue resolved as soon as possible so that the
> fix makes it into the next compiler release. Because it involves
> cleanup code in a class destructor a design change may be
> necessary. Who should I contact to determine the best way to fix
> this bug?


It appears that the limitation applies to mmap calls as well, and the
mmap call to allocate the stack has been in Fiber since, as far as I
can tell, the beginning. How has this not shown up before?


Maybe there was a change in the OS(es) being used that affected the 
limit?




Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


Hm... the mprotect docs specifically state that calling mprotect on 
something that's not allocated via mmap is undefined. So if you use GC 
to allocate Fiber stacks, you can't mprotect it.


I think what we need is a more configurable way to allocate stacks. 
There is a tradeoff to mprotect vs. simple allocation, and it's not 
obvious to choose one over the other.
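
To illustrate the idea, a rough sketch of what such a configurable hook
might look like - a hypothetical interface, not an existing druntime
API:

// Hypothetical: Fiber would delegate stack management to a policy
// object chosen by the application.
interface FiberStackAllocator
{
    // guardPageSize == 0 would mean no mprotect'ed guard page, so the
    // stack costs one mapping instead of two.
    void[] allocate(size_t stackSize, size_t guardPageSize);
    void deallocate(void[] stack);
}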


I still am baffled as to why this is now showing up. Perhaps if you are 
using mmap as an allocator (as Fiber seems to be doing), it doesn't 
count towards the limit? Maybe it's just glommed into the standard 
allocator's space?


-Steve


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

[...]


Maybe there was a change in the OS(es) being used that 
affected the limit?


- Jonathan M Davis


Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


I'm not sure that would allow us to mprotect the page guards, 
though.


I think the easiest way to proceed from here is to default the stack
protection page size to 0, so we avoid this regression. Once that's in,
we can think about a different allocation strategy for the fibers'
stacks, or about throwing an exception when there are too many fibers
(too bad mmap doesn't return an error code, but you get SIGABRT
instead :( )


What do you think?


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

[...]


Maybe there was a change in the OS(es) being used that 
affected the limit?


- Jonathan M Davis


Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


I'm not sure that would allow us to mprotect the page guards, 
though.


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 13:26:26 UTC, Steven Schveighoffer wrote:

On 10/23/17 12:56 PM, Brian Schott wrote:

Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that 
the fix makes it into the next compiler release. Because it 
involves cleanup code in a class destructor a design change 
may be necessary. Who should I contact to determine the best 
way to fix this bug?


It appears that the limitation applies to mmap calls as well, and the
mmap call to allocate the stack has been in Fiber since, as far as I
can tell, the beginning. How has this not shown up before?


Although the number of mmap calls is the same after the stack overflow
protection was added for fibers, _I think_ mprotect actually splits the
single mmapped region into two (as the parts have different protection
bits). This effectively doubles the number of regions. As a workaround
(if you have lots of fibers and don't care about stack overflow
protection) you can just pass zero for the guardPageSize:
https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4037
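
For example, a minimal sketch, assuming the constructor signature at
the linked line (the callable, then the stack size, then guardPageSize
as the third parameter):

import core.thread : Fiber;

void main()
{
    enum pageSize = 4096;
    // Trading away stack-overflow protection keeps each fiber's stack
    // a single mapping instead of two (guard page + stack).
    auto fib = new Fiber(() {}, 4 * pageSize, 0 /* guardPageSize */);
    fib.call();
}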


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

On 10/23/17 12:56 PM, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>
> I need to get this issue resolved as soon as possible so 
> that the fix makes it into the next compiler release. 
> Because it involves cleanup code in a class destructor a 
> design change may be necessary. Who should I contact to 
> determine the best way to fix this bug?


It appears that the limitation applies to mmap calls as well, and the
mmap call to allocate the stack has been in Fiber since, as far as I
can tell, the beginning. How has this not shown up before?


Maybe there was a change in the OS(es) being used that affected 
the limit?


- Jonathan M Davis


Yes, the stack is not immediately unmapped because it's very common
just to reset the fiber and reuse it for handling the new connection -
creating new fibers (and doing unmap on termination) is a problem in
real life (as this is as well).

At Sociomantic we already had this issue:
https://github.com/sociomantic-tsunami/tangort/issues/2
Maybe this is the way to go - I don't see a reason why every stack
should be mmapped separately.


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Nemanja Boric via Digitalmars-d
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:

On 10/23/17 12:56 PM, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>
> I need to get this issue resolved as soon as possible so 
> that the fix makes it into the next compiler release. 
> Because it involves cleanup code in a class destructor a 
> design change may be necessary. Who should I contact to 
> determine the best way to fix this bug?


It appears that the limitation applies to mmap calls as well, and the
mmap call to allocate the stack has been in Fiber since, as far as I
can tell, the beginning. How has this not shown up before?


Maybe there was a change in the OS(es) being used that affected 
the limit?


- Jonathan M Davis


On Linux it's controllable by `sysctl vm.max_map_count`.


Re: Advice requested for fixing issue 17914

2017-10-25 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
> On 10/23/17 12:56 PM, Brian Schott wrote:
> > Context: https://issues.dlang.org/show_bug.cgi?id=17914
> >
> > I need to get this issue resolved as soon as possible so that the fix
> > makes it into the next compiler release. Because it involves cleanup
> > code in a class destructor a design change may be necessary. Who should
> > I contact to determine the best way to fix this bug?
>
> It appears that the limitation applies to mmap calls as well, and the
> mmap call to allocate the stack has been in Fiber since, as far as I
> can tell, the beginning. How has this not shown up before?

Maybe there was a change in the OS(es) being used that affected the limit?

- Jonathan M Davis



Re: Advice requested for fixing issue 17914

2017-10-25 Thread Steven Schveighoffer via Digitalmars-d

On 10/23/17 12:56 PM, Brian Schott wrote:

Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that the fix 
makes it into the next compiler release. Because it involves cleanup 
code in a class destructor a design change may be necessary. Who should 
I contact to determine the best way to fix this bug?


It appears that the limitation applies to mmap calls as well, and the
mmap call to allocate the stack has been in Fiber since, as far as I
can tell, the beginning. How has this not shown up before?


Regardless of the cause, this puts a limit on the number of
simultaneous Fibers one can have. In other words, this is not just a
problem of Fibers not being cleaned up properly; one may legitimately
need more than 65k fibers running simultaneously, and we should try to
avoid imposing that limitation.


For example, I would think even the following code is something we
should support:


void main()
{
    import std.concurrency : Generator, yield;
    import std.stdio : File, writeln;

    auto f = File("/proc/sys/vm/max_map_count", "r");
    ulong n;
    f.readf("%d", &n);
    writeln("/proc/sys/vm/max_map_count = ", n);
    Generator!int[] gens; // retain pointers to all the generators
    foreach (i; 0 .. n + 1000)
    {
        if (i % 1000 == 0)
            writeln("i = ", i);
        gens ~= new Generator!int({ yield(1); });
    }
}

If we *can't* do this, then we should provide a way to manage the limits.

I.e. there should be a way to create more fibers than the limit allows,
but only allocate stacks when we can (and have a way to tell the user
what's going on).
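
One rough sketch of that idea - a hypothetical wrapper, not a druntime
API - defers the stack mmap until the fiber is first called, so
constructing fibers can't hit the mapping limit and a failure becomes
catchable at call time:

import core.thread : Fiber;

// Hypothetical wrapper: the stack is allocated lazily on first call().
class LazyFiber
{
    private void delegate() job;
    private Fiber impl;

    this(void delegate() dg) { job = dg; }  // no mmap yet

    void call()
    {
        if (impl is null)
            impl = new Fiber(job);  // mmap happens here; may throw
        impl.call();
    }
}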


-Steve


Re: Advice requested for fixing issue 17914

2017-10-24 Thread safety0ff via Digitalmars-d

On Wednesday, 25 October 2017 at 01:26:10 UTC, Brian Schott wrote:


I've been reading the Fiber code and (so far) that seems to be
reasonable. Can anybody think of a reason that this would be a bad
idea? I'd rather not create a pull request for a design that's not
going to work because of a detail I've overlooked.


Just skimming the Fiber code I found the reset(...) API functions 
whose purpose is to re-use Fibers once they've terminated.


Eager stack deallocation would have to coexist with the Fiber 
reuse API.


Perhaps the Fiber reuse API could simply be polished & made easy 
to integrate so that your original use case no longer hits system 
limits.


I.e. Perhaps an optional delegate could be called upon 
termination, making it easier to hook in Fiber recycling.
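
For instance, a sketch of such recycling built on the existing reset()
API - a hypothetical pool, not part of druntime:

import core.thread : Fiber;

// Terminated fibers keep their stacks mapped and get re-armed with
// new work instead of being unmapped and remapped.
class FiberPool
{
    private Fiber[] free;

    Fiber acquire(void delegate() job)
    {
        if (free.length == 0)
            return new Fiber(job);
        auto f = free[$ - 1];
        free = free[0 .. $ - 1];
        f.reset(job);  // reuse the existing stack mapping
        return f;
    }

    void release(Fiber f)
    {
        assert(f.state == Fiber.State.TERM);
        free ~= f;     // no munmap: keep the stack for the next job
    }
}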


The reason my thoughts head in that direction is that I've read that
frequent mmap/munmap'ing isn't recommended in performance-conscious
programs.


Re: Advice requested for fixing issue 17914

2017-10-24 Thread Brian Schott via Digitalmars-d

On Tuesday, 24 October 2017 at 21:49:10 UTC, qznc wrote:
My question wrt the bug: Why is munmap/freeStack called in the
destructor? Could it be done right after termination?


I've been reading the Fiber code and (so far) that seems to be
reasonable. Can anybody think of a reason that this would be a bad
idea? I'd rather not create a pull request for a design that's not
going to work because of a detail I've overlooked.


Re: Advice requested for fixing issue 17914

2017-10-24 Thread Brian Schott via Digitalmars-d
On Tuesday, 24 October 2017 at 14:28:01 UTC, Steven Schveighoffer wrote:
A failing use case would help. Fixing a bug when you can't 
reproduce is difficult.


-Steve


I've attached one to the bug report.



Re: Advice requested for fixing issue 17914

2017-10-24 Thread qznc via Digitalmars-d

On Monday, 23 October 2017 at 16:56:32 UTC, Brian Schott wrote:

Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that 
the fix makes it into the next compiler release. Because it 
involves cleanup code in a class destructor a design change may 
be necessary. Who should I contact to determine the best way to 
fix this bug?


Looking at git blame [0], I guess Martin Nowak and Nemanja Boric 
seem to be pretty involved. Not sure how deep Petar Kirov and 
Sean Kelly are into Fibers.


My question wrt the bug: Why is munmap/freeStack called in the
destructor? Could it be done right after termination?


[0] 
https://github.com/dlang/druntime/blame/ec9a79e15d446863191308fd5e20febce2053546/src/core/thread.d#L4077


Re: Advice requested for fixing issue 17914

2017-10-24 Thread Steven Schveighoffer via Digitalmars-d

On 10/23/17 12:56 PM, Brian Schott wrote:

Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that the fix 
makes it into the next compiler release. Because it involves cleanup 
code in a class destructor a design change may be necessary. Who should 
I contact to determine the best way to fix this bug?


A failing use case would help. Fixing a bug when you can't reproduce is 
difficult.


-Steve


Re: Advice requested for fixing issue 17914

2017-10-24 Thread Kagamin via Digitalmars-d
Call destroy(fiber) when it has completed execution? Who manages the
fibers?
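
A minimal sketch of that suggestion: destroy() runs the Fiber
destructor eagerly, so the stack is unmapped right away instead of
whenever the GC happens to finalize the object:

import core.thread : Fiber;

void main()
{
    auto f = new Fiber(() {});
    f.call();                             // run the fiber to completion
    assert(f.state == Fiber.State.TERM);
    destroy(f);                           // ~this() frees the stack now
}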


Advice requested for fixing issue 17914

2017-10-23 Thread Brian Schott via Digitalmars-d

Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that the 
fix makes it into the next compiler release. Because it involves 
cleanup code in a class destructor a design change may be 
necessary. Who should I contact to determine the best way to fix 
this bug?