Re: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-02 Thread Ken Brown via Cygwin

On 4/2/2020 4:05 AM, sten.kristian.ivars...@gmail.com wrote:

It is great that you're looking into a totally dynamic solution


I've just pushed a first attempt at this.  I tested it by running your test 
program with parameters 50/50.  It looks like there are about 2000 writers open 
at once during this run.
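
For readers following along, here is a minimal sketch of what "letting the number of writers increase dynamically" could look like in general: a list that doubles its capacity instead of stopping at a fixed MAX_CLIENTS. This is not the code that was pushed to Cygwin; `struct client`, `struct client_list` and both functions are hypothetical names invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-writer record; not Cygwin's type. */
struct client {
    int id;
};

/* Growable list of clients, replacing a fixed-size array. */
struct client_list {
    struct client *items;
    size_t count;
    size_t capacity;
};

/* Append one client, doubling the capacity whenever the array is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int client_list_add(struct client_list *list, int id)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 64;
        struct client *p = realloc(list->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;          /* the old array is still valid */
        list->items = p;
        list->capacity = new_cap;
    }
    list->items[list->count++].id = id;
    return 0;
}

/* Release the array and reset the list to its empty state. */
void client_list_free(struct client_list *list)
{
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Starting from an empty list (`{ NULL, 0, 0 }`), 2000 additions end with a capacity of 2048, so only a handful of reallocations are needed for a run of this size.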


Ken
--
Problem reports:  https://cygwin.com/problems.html
FAQ:  https://cygwin.com/faq/
Documentation:https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple


Sv: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-02 Thread Kristian Ivarsson via Cygwin
> On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> > On 4/1/2020 1:14 PM, sten.kristian.ivars...@gmail.com wrote:
> >>> On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:
> > On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
> >>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>  On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> > On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
> >>> On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
> > On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >>> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
>  The ENXIO occurs when parallel child processes
>  simultaneously open the descriptor using O_NONBLOCK.
> >>>
> >>> This is consistent with my guess that the error is
> >>> generated by fhandler_fifo::wait.  I have a feeling that
> >>> read_ready should have been created as a manual-reset
> >>> event, and that more care is needed to make sure it's
> >>> set when it should be.
> >>
> >> [snip]
> >>
>  Never mind.  I was able to reproduce the problem and find the
>  cause.  What happens is that when the first subprocess exits,
>  fhandler_fifo::close resets read_ready.  That causes the second
>  and subsequent subprocesses to think that there's no reader
>  open, so their attempts to open a writer with O_NONBLOCK fail
>  with ENXIO.
> >>
> >> [snip]
> >>
>  I wrote in a previous mail in this topic that it seemed to work
>  fine for me as well, but when I bumped up the numbers of writers
>  and/or the number of messages (e.g. 25/25) it starts to fail again
> >>
> >> [snip]
> >>
> >>> Yes, it is a resource issue.  There is a limit on the number of
> >>> writers that can be open at one time, currently 64.  I chose that
> >>> number arbitrarily, with no idea what might actually be needed in
> >>> practice, and it can easily be changed.
> >>
> >> Does there have to be a limit at all?  We would rather let the
> >> application decide how many resources it wants to use.  In our
> >> particular case there will be a process manager with an incoming pipe
> >> that possibly several thousand processes will write to.
> >
> > I agree.
> >
> >> Just for fiddling around (to figure out whether this is the limit
> >> that makes other things work a bit oddly), where is this limit of 64
> >> defined now?
> >
> > It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other
> > resource issues also; simply increasing MAX_CLIENTS doesn't solve the
> > problem.  I think there are also problems with the number of threads,
> > for example.  Each time your program forks, the subprocess inherits
> > the rfd file descriptor and its "fifo_reader_thread" starts up.  This
> > is unnecessary for your application, so I tried disabling it (in
> > fhandler_fifo::fixup_after_fork), just as an experiment.
> >
> > But then I ran into some deadlocks, suggesting that one of the locks
> > I'm using isn't robust enough.  So I've got a lot of things to work on.
> >
> >>> In addition, a writer isn't recognized as closed until a reader
> >>> tries to read and gets an error.  In your example with 25/25, the
> >>> list of writers quickly gets to 64 before the parent ever tries to
> >>> read.
> >>
> >> That explains the behaviour, but should some error be returned from
> >> open/write (maybe one is and I'm missing it)?
> >
> > The error is discovered in add_client_handler, called from
> > thread_func.  I think you'll only see it if you run the program under
> > strace.  I'll see if I can find a way to report it.  Currently,
> > there's a retry loop in fhandler_fifo::open when a writer tries to
> > open, and I think I need to limit the number of retries and then
> > error out.
> 
> I pushed a few improvements and bug fixes, and your 25/25 example now
> runs without a problem.  I increased MAX_CLIENTS to 1024 just for the
> sake of this example, but I'll work on letting the number of writers
> increase dynamically as needed.

I pulled it and tried it out and yes, the sample test program with 25/25
worked well, and a whole bunch of our unit tests now pass.

We still have some issues, but I cannot yet tell whether they are related to
named pipes or not.

It is great that you're looking into a totally dynamic solution.

Kristian

> Ken



Re: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Ken Brown via Cygwin

On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:

On 4/1/2020 1:14 PM, sten.kristian.ivars...@gmail.com wrote:

On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes
simultaneously open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is
generated by fhandler_fifo::wait.  I have a feeling that
read_ready should have been created as a manual-reset
event, and that more care is needed to make sure it's set
when it should be.

[snip]


Never mind.  I was able to reproduce the problem and find the cause.
What happens is that when the first subprocess exits,
fhandler_fifo::close resets read_ready.  That causes the second
and subsequent subprocesses to think that there's no reader open,
so their attempts to open a writer with O_NONBLOCK fail with ENXIO.


[snip]


I wrote in a previous mail in this topic that it seemed to work fine
for me as well, but when I bumped up the numbers of writers and/or the
number of messages (e.g. 25/25) it starts to fail again


[snip]


Yes, it is a resource issue.  There is a limit on the number of writers that
can be open at one time, currently 64.  I chose that number arbitrarily, with
no idea what might actually be needed in practice, and it can easily be
changed.


Does there have to be a limit at all?  We would rather let the application
decide how many resources it wants to use.  In our particular case there will
be a process manager with an incoming pipe that possibly several thousand
processes will write to.


I agree.


Just for fiddling around (to figure out whether this is the limit that makes
other things work a bit oddly), where is this limit of 64 defined now?


It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other resource 
issues also; simply increasing MAX_CLIENTS doesn't solve the problem.  I think 
there are also problems with the number of threads, for example.  Each time your 
program forks, the subprocess inherits the rfd file descriptor and its 
"fifo_reader_thread" starts up.  This is unnecessary for your application, so I 
tried disabling it (in fhandler_fifo::fixup_after_fork), just as an experiment.


But then I ran into some deadlocks, suggesting that one of the locks I'm using 
isn't robust enough.  So I've got a lot of things to work on.



In addition, a writer isn't recognized as closed until a reader tries to read
and gets an error.  In your example with 25/25, the list of writers quickly
gets to 64 before the parent ever tries to read.


That explains the behaviour, but should some error be returned from
open/write (maybe one is and I'm missing it)?


The error is discovered in add_client_handler, called from thread_func.  I think 
you'll only see it if you run the program under strace.  I'll see if I can find 
a way to report it.  Currently, there's a retry loop in fhandler_fifo::open when 
a writer tries to open, and I think I need to limit the number of retries and 
then error out.
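
The idea of limiting the retries and then erroring out can be sketched at the application level as follows. This is only an analogy: the retry loop discussed above lives inside fhandler_fifo::open in Cygwin, and nothing here is taken from that code. The function name `open_writer_bounded` and its parameters are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Open `path` for non-blocking writing, retrying while no reader is
   present (ENXIO).  After `max_retries` retries the function gives up,
   so the caller gets an error instead of looping forever.
   Returns the file descriptor, or -1 with errno set by open(). */
int open_writer_bounded(const char *path, int max_retries, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L
    };

    for (int attempt = 0; ; attempt++) {
        int fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd >= 0)
            return fd;
        /* Any error other than ENXIO is not worth retrying. */
        if (errno != ENXIO || attempt >= max_retries)
            return -1;
        nanosleep(&delay, NULL);
    }
}
```

With no reader open on the FIFO, a call such as `open_writer_bounded(path, 3, 1)` makes four attempts in total and then returns -1 with errno set to ENXIO.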


I pushed a few improvements and bug fixes, and your 25/25 example now runs 
without a problem.  I increased MAX_CLIENTS to 1024 just for the sake of this 
example, but I'll work on letting the number of writers increase dynamically as 
needed.


Ken


Re: Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Ken Brown via Cygwin

On 4/1/2020 1:14 PM, sten.kristian.ivars...@gmail.com wrote:

On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes
simultaneously open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is
generated by fhandler_fifo::wait.  I have a feeling that
read_ready should have been created as a manual-reset
event, and that more care is needed to make sure it's set
when it should be.

[snip]


Never mind.  I was able to reproduce the problem and find the cause.
What happens is that when the first subprocess exits,
fhandler_fifo::close resets read_ready.  That causes the second
and subsequent subprocesses to think that there's no reader open,
so their attempts to open a writer with O_NONBLOCK fail with ENXIO.


[snip]


I wrote in a previous mail in this topic that it seemed to work fine
for me as well, but when I bumped up the numbers of writers and/or the
number of messages (e.g. 25/25) it starts to fail again


[snip]


Yes, it is a resource issue.  There is a limit on the number of writers that
can be open at one time, currently 64.  I chose that number arbitrarily, with
no idea what might actually be needed in practice, and it can easily be
changed.


Does there have to be a limit at all?  We would rather let the application
decide how many resources it wants to use.  In our particular case there will
be a process manager with an incoming pipe that possibly several thousand
processes will write to.


I agree.


Just for fiddling around (to figure out whether this is the limit that makes
other things work a bit oddly), where is this limit of 64 defined now?


It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other resource 
issues also; simply increasing MAX_CLIENTS doesn't solve the problem.  I think 
there are also problems with the number of threads, for example.  Each time your 
program forks, the subprocess inherits the rfd file descriptor and its 
"fifo_reader_thread" starts up.  This is unnecessary for your application, so I 
tried disabling it (in fhandler_fifo::fixup_after_fork), just as an experiment.


But then I ran into some deadlocks, suggesting that one of the locks I'm using 
isn't robust enough.  So I've got a lot of things to work on.



In addition, a writer isn't recognized as closed until a reader tries to read
and gets an error.  In your example with 25/25, the list of writers quickly
gets to 64 before the parent ever tries to read.


That explains the behaviour, but should some error be returned from
open/write (maybe one is and I'm missing it)?


The error is discovered in add_client_handler, called from thread_func.  I think 
you'll only see it if you run the program under strace.  I'll see if I can find 
a way to report it.  Currently, there's a retry loop in fhandler_fifo::open when 
a writer tries to open, and I think I need to limit the number of retries and 
then error out.


Ken


Sv: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Kristian Ivarsson via Cygwin
> On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
>  On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> > On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> >> On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
>  On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>  On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
> > The ENXIO occurs when parallel child processes
> > simultaneously open the descriptor using O_NONBLOCK.
> 
>  This is consistent with my guess that the error is
>  generated by fhandler_fifo::wait.  I have a feeling that
>  read_ready should have been created as a manual-reset
>  event, and that more care is needed to make sure it's set
>  when it should be.

[snip] 

> > Never mind.  I was able to reproduce the problem and find the cause.
> > What happens is that when the first subprocess exits,
> > fhandler_fifo::close resets read_ready.  That causes the second
> > and subsequent subprocesses to think that there's no reader open,
> > so their attempts to open a writer with O_NONBLOCK fail with ENXIO.

[snip] 

> > I wrote in a previous mail in this topic that it seemed to work fine
> > for me as well, but when I bumped up the numbers of writers and/or the
> > number of messages (e.g. 25/25) it starts to fail again

[snip] 

> Yes, it is a resource issue.  There is a limit on the number of writers
> that can be open at one time, currently 64.  I chose that number
> arbitrarily, with no idea what might actually be needed in practice, and
> it can easily be changed.

Does there have to be a limit at all?  We would rather let the application
decide how many resources it wants to use.  In our particular case there will
be a process manager with an incoming pipe that possibly several thousand
processes will write to.

Just for fiddling around (to figure out whether this is the limit that makes
other things work a bit oddly), where is this limit of 64 defined now?

> In addition, a writer isn't recognized as closed until a reader tries to
> read and gets an error.  In your example with 25/25, the list of writers
> quickly gets to 64 before the parent ever tries to read.

That explains the behaviour, but should some error be returned from
open/write (maybe one is and I'm missing it)?

> I'll see if I can find a better way to manage this.
> 
> Ken

Kristian



Re: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Ken Brown via Cygwin

On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes
simultaneously open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated
by fhandler_fifo::wait.  I have a feeling that read_ready
should have been created as a manual-reset event, and that
more care is needed to make sure it's set when it should be.


I could provide a code-snippet to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git
repo master branch, please try the attached patch.



Here's a better patch.



I finally succeeded in building the latest master (make is not my favourite
tool) and applied the patch, but still no success in my little test program
(see attachment) when creating a write file descriptor with O_NONBLOCK



Your test program fails for me on Linux too.  Here's the output from one run:

You're right. That was extremely careless of me to not test this
in Linux first :-)


No problem.


I can assure you that we have a use case that works on Linux but not
in Cygwin, but it seems that I narrowed it down in the wrong way

I'll try to rearrange my code (that works in Linux) to mimic our
application but in a simple way (I'll be back)


OK, I'll be waiting for you.  BTW, if it's not too hard to write
your test case in plain C, or at least less modern C++, that would
simplify things for me.  For example, your pipe.cpp failed to
compile on one Linux machine I wanted to test it on, presumably
because that machine had an older C++ compiler.


Never mind.  I was able to reproduce the problem and find the cause.
What happens is that when the first subprocess exits,
fhandler_fifo::close resets read_ready.  That causes the second and
subsequent subprocesses to think that there's no reader open, so
their attempts to open a writer with O_NONBLOCK fail with ENXIO.

I should be able to fix this tomorrow.



I've pushed what I think is a fix to the topic/fifo branch.  I tested it
with the attached program, which is a variant of the test case you
sent last week.

Please test it in your use case.



Note: If you've previously pulled the topic/fifo branch, then you will
probably get a lot of conflicts when you pull again, because I did a
forced push a few days ago.  If that happens, just do


git reset --hard origin/topic/fifo



It turned out that the fix required some of the ideas that I've been
working on in connection with allowing multiple readers.  Even though
the code allows a FIFO to be *explicitly* opened for reading only
once, there can still be several open file descriptors for readers
because of dup and fork.  The existing code on git master doesn't
handle those situations properly.

The code on topic/fifo doesn't completely fix that yet, but I think it
should work under the following assumptions:


1. The FIFO is opened only once for reading.



2. The file descriptor obtained from this is the only one on which a
   read is attempted.


I'm working on removing both of these restrictions.



Ken


We finally took the time to make some kind of simplified "hack" that
works on Ubuntu and BSD/OSX.  With the latest newlib-cygwin master it gave
"ENXIO" now and then; with your previous patch applied there was no ENXIO,
but ::read returns EAGAIN (until exhausted) on Cygwin almost every run

I will try your newest things tomorrow

See the latest attached test program (it is starting to get bloated, but this
time it is more C-compatible :-)


Thanks.  This runs fine with the current HEAD of topic/fifo.


I wrote in a previous mail in this topic that it seemed to work fine for me
as well, but when I bumped up the numbers of writers and/or the number of
messages (e.g. 25/25) it starts to fail again

The initial thought is that we're bumping into some kind of system resource
limit, but I haven't had the time to dig into details (yet) (I'm sorry for
that)
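
The test program attached to this thread is not reproduced in the archive. As a rough sketch of what a "writers/messages" test such as 25/25 might look like, the code below forks a number of writers that each open the FIFO with O_NONBLOCK and send fixed-size records, while the parent reads them back. Everything here (the name `run_fifo_test`, the record size, the 5-second timeout, the keep-alive writer in the parent) is an assumption made for illustration, not the original test.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 16  /* far below PIPE_BUF, so each write is atomic */

/* Child side: open the FIFO without blocking and send `messages` records. */
static void writer_child(const char *path, int messages)
{
    char msg[MSG_SIZE];
    memset(msg, 'x', sizeof msg);

    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd < 0)
        _exit(1);               /* e.g. ENXIO if no reader is seen */

    for (int m = 0; m < messages; m++) {
        while (write(wfd, msg, sizeof msg) < 0) {
            if (errno != EAGAIN)
                _exit(1);
            struct pollfd out = { .fd = wfd, .events = POLLOUT };
            poll(&out, 1, -1);  /* FIFO full: wait for room */
        }
    }
    _exit(0);
}

/* Fork `writers` children and read back what they send.
   Returns the number of complete records read, or -1 on setup failure. */
long run_fifo_test(const char *path, int writers, int messages)
{
    unlink(path);
    if (mkfifo(path, 0600) != 0)
        return -1;

    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    /* Keep one writer open in the parent so the reader never sees
       end-of-file in the gaps between children. */
    int keep = rfd >= 0 ? open(path, O_WRONLY | O_NONBLOCK) : -1;
    if (rfd < 0 || keep < 0) {
        if (rfd >= 0)
            close(rfd);
        unlink(path);
        return -1;
    }

    int started = 0;
    for (int w = 0; w < writers; w++) {
        pid_t pid = fork();
        if (pid == 0)
            writer_child(path, messages);   /* never returns */
        if (pid > 0)
            started++;
    }

    long expected = (long) started * messages * MSG_SIZE;
    long total = 0;
    char buf[4096];
    struct pollfd in = { .fd = rfd, .events = POLLIN };
    while (total < expected) {
        if (poll(&in, 1, 5000) <= 0)
            break;              /* timeout or error: give up */
        ssize_t n = read(rfd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0 || (errno != EAGAIN && errno != EINTR))
            break;
    }

    close(keep);
    close(rfd);
    for (int i = 0; i < started; i++)
        wait(NULL);             /* reap the writers */
    unlink(path);
    return total / MSG_SIZE;
}
```

On a system where all writers succeed, `run_fifo_test(path, 25, 25)` should return 625; a smaller result means that some writers failed to open the FIFO or to deliver their records.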


Yes, it is a resource issue.  There is a limit on the number of writers that can 
be open at one time, currently 64.  I chose that number arbitrarily, with no 
idea what might actually be needed in practice, and it can easily be changed.


In addition, a writer isn't recognized as closed until a reader tries to read 
and gets an error.  In your example with 25/25, the list of writers quickly gets 
to 64 before the parent ever tries to read.


I'll see if I can find a better way to manage this.

Ken

Re: Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Ken Brown via Cygwin

On 4/1/2020 3:45 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes
simultaneously open the descriptor using O_NONBLOCK.


[snip]


Thanks.  This runs fine with the current HEAD of topic/fifo.


We are very grateful for your efforts

The test-program works fine now with the latest commits in topic/fifo

There are some (possibly other) issues that make our application not work as
it does on Linux, and we haven't had the time to dig into what the problems
are yet.  A quick guess is that it is related to signals and possibly to
pselect, but we need to dig deeper into the logic to narrow down the
problems.  Shall we "close" this issue even if we find out later that there
are still problems with named pipes?


You can write again any time that you discover further problems.  The Cygwin 
project doesn't have any mechanism for closing issues.



The application server (the open source project) that we're trying to make
work in Windows can be found here https://bitbucket.org/casualcore/ (if
interested)


Thanks, I'll take a look at some point.


Once again, tnx Ken


Thank you for reporting the problem.  The code probably hasn't been exercised 
very much up to now.


Ken


Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Kristian Ivarsson via Cygwin
> On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>  On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
>  On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> > On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
> >>> The ENXIO occurs when parallel child processes
> >>> simultaneously open the descriptor using O_NONBLOCK.
> >>
> >> This is consistent with my guess that the error is generated
> >> by fhandler_fifo::wait.  I have a feeling that read_ready
> >> should have been created as a manual-reset event, and that
> >> more care is needed to make sure it's set when it should be.
> >>
> >>> I could provide a code-snippet to reproduce it if wanted ?
> >>
> >> Yes, please!
> >
> > That might not be necessary.  If you're able to build the git
> > repo master branch, please try the attached patch.
> >>>
>  Here's a better patch.
> >>>
> >>>
> >>> I finally succeeded in building the latest master (make is not my
> >>> favourite tool) and applied the patch, but still no success in my
> >>> little test program (see attachment) when creating a write file
> >>> descriptor with O_NONBLOCK
> >
> >> Your test program fails for me on Linux too.  Here's the output
> >> from one run:
> >
> > You're right. That was extremely careless of me to not test this
> > in Linux first :-)
> 
>  No problem.
> 
> > I can assure you that we have a use case that works on Linux but not
> > in Cygwin, but it seems that I narrowed it down in the wrong way
> >
> > I'll try to rearrange my code (that works in Linux) to mimic our
> > application but in a simple way (I'll be back)
> 
>  OK, I'll be waiting for you.  BTW, if it's not too hard to write
>  your test case in plain C, or at least less modern C++, that would
>  simplify things for me.  For example, your pipe.cpp failed to
>  compile on one Linux machine I wanted to test it on, presumably
>  because that machine had an older C++ compiler.
> >>>
> >>> Never mind.  I was able to reproduce the problem and find the cause.
> >>> What happens is that when the first subprocess exits,
> >>> fhandler_fifo::close resets read_ready.  That causes the second and
> >>> subsequent subprocesses to think that there's no reader open, so
> >>> their attempts to open a writer with O_NONBLOCK fail with ENXIO.
> >>>
> >>> I should be able to fix this tomorrow.
> >
> >> I've pushed what I think is a fix to the topic/fifo branch.  I tested
> >> it with the attached program, which is a variant of the test case you
> >> sent last week.  Please test it in your use case.
> >>
> >> Note: If you've previously pulled the topic/fifo branch, then you
> >> will probably get a lot of conflicts when you pull again, because I
> >> did a forced push a few days ago.  If that happens, just do
> >>
> >>    git reset --hard origin/topic/fifo
> >>
> >> It turned out that the fix required some of the ideas that I've been
> >> working on in connection with allowing multiple readers.  Even though
> >> the code allows a FIFO to be *explicitly* opened for reading only
> >> once, there can still be several open file descriptors for readers
> >> because of dup and fork.  The existing code on git master doesn't
> >> handle those situations properly.
> >>
> >> The code on topic/fifo doesn't completely fix that yet, but I think
> >> it should work under the following assumptions:
> >>
> >> 1. The FIFO is opened only once for reading.
> >>
> >> 2. The file descriptor obtained from this is the only one on which a
> >>    read is attempted.
> >>
> >> I'm working on removing both of these restrictions.
> >>
> >> Ken
> >
> > We finally took the time to make some kind of simplified "hack" that
> > works on Ubuntu and BSD/OSX.  With the latest newlib-cygwin master it
> > gave "ENXIO" now and then; with your previous patch applied there was
> > no ENXIO, but ::read returns EAGAIN (until exhausted) on Cygwin almost
> > every run
> >
> > I will try your newest things tomorrow
> >
> > See the latest attached test program (it is starting to get bloated,
> > but this time it is more C-compatible :-)
> 
> Thanks.  This runs fine with the current HEAD of topic/fifo.

I wrote in a previous mail in this topic that it seemed to work fine for me
as well, but when I bumped up the numbers of writers and/or the number of
messages (e.g. 25/25) it starts to fail again

The initial thought is that we're bumping into some kind of system resource
limit, but I haven't had the time to dig into details 

Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-04-01 Thread Kristian Ivarsson via Cygwin
> On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>  On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
> >> On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
>  On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> > On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
> >>> The ENXIO occurs when parallel child processes
> >>> simultaneously open the descriptor using O_NONBLOCK.

[snip] 

> Thanks.  This runs fine with the current HEAD of topic/fifo.

We are very grateful for your efforts 

The test-program works fine now with the latest commits in topic/fifo

There are some (possibly other) issues that make our application not work as
it does on Linux, and we haven't had the time to dig into what the problems
are yet.  A quick guess is that it is related to signals and possibly to
pselect, but we need to dig deeper into the logic to narrow down the
problems.  Shall we "close" this issue even if we find out later that there
are still problems with named pipes?

The application server (the open source project) that we're trying to make
work in Windows can be found here https://bitbucket.org/casualcore/ (if
interested)

Once again, tnx Ken

Kristian

> Ken



Re: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-31 Thread Ken Brown via Cygwin

On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously
open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should
have been created as a manual-reset event, and that more care
is needed to make sure it's set when it should be.


I could provide a code-snippet to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git
repo master branch, please try the attached patch.



Here's a better patch.



I finally succeeded in building the latest master (make is not my favourite
tool) and applied the patch, but still no success in my little test program
(see attachment) when creating a write file descriptor with O_NONBLOCK



Your test program fails for me on Linux too.  Here's the output from one run:

You're right. That was extremely careless of me to not test this in
Linux first :-)


No problem.


I can assure you that we have a use case that works on Linux but not in
Cygwin, but it seems that I narrowed it down in the wrong way

I'll try to rearrange my code (that works in Linux) to mimic our
application but in a simple way (I'll be back)


OK, I'll be waiting for you.  BTW, if it's not too hard to write your
test case in plain C, or at least less modern C++, that would
simplify things for me.  For example, your pipe.cpp failed to compile
on one Linux machine I wanted to test it on, presumably because that
machine had an older C++ compiler.


Never mind.  I was able to reproduce the problem and find the cause.
What happens is that when the first subprocess exits,
fhandler_fifo::close resets read_ready.  That causes the second and
subsequent subprocesses to think that there's no reader open, so their
attempts to open a writer with O_NONBLOCK fail with ENXIO.

I should be able to fix this tomorrow.
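
For reference, the POSIX behaviour at the centre of this bug can be shown with a small sketch: opening a FIFO for writing with O_NONBLOCK fails with ENXIO when no process has it open for reading, and succeeds once a reader exists. The helper names `try_nonblocking_writer` and `demo_enxio` are made up for this illustration and have nothing to do with the Cygwin sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to open `path` for writing without blocking.
   Returns 0 on success, or the errno value from open() on failure. */
int try_nonblocking_writer(const char *path)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Create a FIFO at `path` and attempt a non-blocking writer open twice:
   once with no reader, once while a reader is open.  The two results are
   stored in *without_reader and *with_reader.
   Returns 0 if the setup itself worked, -1 otherwise. */
int demo_enxio(const char *path, int *without_reader, int *with_reader)
{
    unlink(path);
    if (mkfifo(path, 0600) != 0)
        return -1;

    *without_reader = try_nonblocking_writer(path);

    /* A non-blocking read-only open of a FIFO returns immediately. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) {
        unlink(path);
        return -1;
    }
    *with_reader = try_nonblocking_writer(path);

    close(rfd);
    unlink(path);
    return 0;
}
```

On a conforming system the first result is ENXIO and the second is 0.  The bug described above made later writers see the first case even though the parent still had its reader open.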



I've pushed what I think is a fix to the topic/fifo branch.  I tested it
with the attached program, which is a variant of the test case you sent last
week.

Please test it in your use case.

Note: If you've previously pulled the topic/fifo branch, then you will
probably get a lot of conflicts when you pull again, because I did a forced
push a few days ago.  If that happens, just do


   git reset --hard origin/topic/fifo



It turned out that the fix required some of the ideas that I've been
working on in connection with allowing multiple readers.  Even though the
code allows a FIFO to be *explicitly* opened for reading only once, there
can still be several open file descriptors for readers because of dup and
fork.  The existing code on git master doesn't handle those situations
properly.

The code on topic/fifo doesn't completely fix that yet, but I think it
should work under the following assumptions:

1. The FIFO is opened only once for reading.

2. The file descriptor obtained from this is the only one on which a read
is attempted.


I'm working on removing both of these restrictions.



Ken


We finally took the time to make a simplified "hack" that works on Ubuntu
and BSD/OSX.  With the latest newlib-cygwin master it gave ENXIO now and
then; with your previous patch applied there was no ENXIO, but ::read
returned EAGAIN (until exhausted) on Cygwin in almost every run.

I will try your newest changes tomorrow.

See the latest attached test program (it is starting to get bloated, but
this time it is more C-compatible :-)


Thanks.  This runs fine with the current HEAD of topic/fifo.

Ken


Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-31 Thread Kristian Ivarsson via Cygwin
>On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>>> On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
> On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
 On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
>> The ENXIO occurs when parallel child processes simultaneously
>> open the descriptor using O_NONBLOCK.
>
> This is consistent with my guess that the error is generated by 
> fhandler_fifo::wait.  I have a feeling that read_ready should 
> have been created as a manual-reset event, and that more care 
> is needed to make sure it's set when it should be.
>
>> I could provide a code-snippet to reproduce it if wanted ?
>
> Yes, please!

 That might not be necessary.  If you're able to build the git 
 repo master branch, please try the attached patch.
>>
>>> Here's a better patch.
>>
>>
>> I finally succeeded to build latest master (make is not my 
>> favourite
>> tool) and added the patch, but still no success in my little 
>> test-program (see
>> attachment) when creating a write-file-descriptor with O_NONBLOCK

> Your test program fails for me on Linux too.  Here's the output 
> from one
 run:

 You're right. That was extremely careless of me to not test this in 
 Linux first :-)
>>>
>>> No problem.
>>>
 I can assure that we have a use case that works on Linux but not in 
 Cygwin, but it seems like I failed to narrow it down in the wrong 
 way

 I'll try to rearrange my code (that works in Linux) to mimic our 
 application but in a simple way (I'll be back)
>>>
>>> OK, I'll be waiting for you.  BTW, if it's not too hard to write your 
>>> test case in plain C, or at least less modern C++, that would 
>>> simplify things for me.  For example, your pipe.cpp failed to compile 
>>> on one Linux machine I wanted to test it on, presumably because that
machine had an older C++ compiler.
>> 
>> Never mind.  I was able to reproduce the problem and find the cause.  
>> What happens is that when the first subprocess exits, 
>> fhandler_fifo::close resets read_ready.  That causes the second and 
>> subsequent subprocesses to think that there's no reader open, so their 
>> attempts to open a writer with O_NONBLOCK fail with ENXIO.
>> 
>> I should be able to fix this tomorrow.

>I've pushed what I think is a fix to the topic/fifo branch.  I tested it
with the attached program, which is a variant of the test case you sent last
week. 
>Please test it in your use case.

>Note: If you've previously pulled the topic/fifo branch, then you will
probably get a lot of conflicts when you pull again, because I did a forced
push a few days ago.  If that happens, just do

>   git reset --hard origin/topic/fifo

>It turned out that the fix required some of the ideas that I've been
working on in connection with allowing multiple readers.  Even though the
code allows a FIFO to be *explicitly* opened for reading only once, there
can still be several open file descriptors for readers because of dup and
fork.  The existing code on git master doesn't handle those situations
properly.

>The code on topic/fifo doesn't completely fix that yet, but I think it
should work under the following assumptions:

>1. The FIFO is opened only once for reading.

>2. The file descriptor obtained from this is the only one on which a read
is attempted.

>I'm working on removing both of these restrictions.

>Ken 

We finally took the time to make a simplified "hack" that works on Ubuntu
and BSD/OSX.  With the latest newlib-cygwin master it gave ENXIO now and
then; with your previous patch applied there was no ENXIO, but ::read
returned EAGAIN (until exhausted) on Cygwin in almost every run.

I will try your newest changes tomorrow.

See the latest attached test program (it is starting to get bloated, but
this time it is more C-compatible :-)

Kristian

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>


int print_error(const int error)
{
   printf("%s\n", strerror(error));
   return error;
}

struct Message
{
   int pid;
   int index;
};

int main()
{
   const char* name = "/tmp/my_pipe";
   const int result = mkfifo(name, 0666);
   if (result)
      return print_error(errno);
   const int writers{5};
   const int messages{5};
   printf("open parent pipe\n");
   const int rfd = open(name, O_RDONLY | O_NONBLOCK);
   const int wfd = open(name, O_WRONLY);
   if (rfd < 0)
      return print_error(errno);
   const int block_alternation[] = {0, O_NONBLOCK};
   int pids[writers];
   for (auto idx = 0; idx < writers; ++idx)
   {
      const auto pid = fork();
      if (pid < 0)
         return print_error(errno);

Re: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-30 Thread Ken Brown via Cygwin

On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously
open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should have
been created as a manual-reset event, and that more care is needed
to make sure it's set when it should be.


I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo
master branch, please try the attached patch.



Here's a better patch.



I finally succeeded to build latest master (make is not my favourite
tool) and added the patch, but still no success in my little
test-program (see
attachment) when creating a write-file-descriptor with O_NONBLOCK



Your test program fails for me on Linux too.  Here's the output from one

run:

You're right. That was extremely careless of me to not test this in Linux
first :-)


No problem.


I can assure that we have a use case that works on Linux but not in Cygwin,
but it seems like I failed to narrow it down in the wrong way

I'll try to rearrange my code (that works in Linux) to mimic our application
but in a simple way (I'll be back)


OK, I'll be waiting for you.  BTW, if it's not too hard to write your test 
case in plain C, or at least less modern C++, that would simplify things for 
me.  For example, your pipe.cpp failed to compile on one Linux machine I 
wanted to test it on, presumably because that machine had an older C++ compiler.


Never mind.  I was able to reproduce the problem and find the cause.  What 
happens is that when the first subprocess exits, fhandler_fifo::close resets 
read_ready.  That causes the second and subsequent subprocesses to think that 
there's no reader open, so their attempts to open a writer with O_NONBLOCK fail 
with ENXIO.


I should be able to fix this tomorrow.


I've pushed what I think is a fix to the topic/fifo branch.  I tested it with 
the attached program, which is a variant of the test case you sent last week. 
Please test it in your use case.


Note: If you've previously pulled the topic/fifo branch, then you will probably 
get a lot of conflicts when you pull again, because I did a forced push a few 
days ago.  If that happens, just do


  git reset --hard origin/topic/fifo

It turned out that the fix required some of the ideas that I've been working on 
in connection with allowing multiple readers.  Even though the code allows a 
FIFO to be *explicitly* opened for reading only once, there can still be several 
open file descriptors for readers because of dup and fork.  The existing code on 
git master doesn't handle those situations properly.


The code on topic/fifo doesn't completely fix that yet, but I think it should 
work under the following assumptions:


1. The FIFO is opened only once for reading.

2. The file descriptor obtained from this is the only one on which a read is 
attempted.


I'm working on removing both of these restrictions.
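
The point about dup and fork can be illustrated with a minimal sketch.  It
assumes a Linux-like system where the behaviour is already correct; the
helper names probe_writer and demo_dup_reader and the FIFO path are
hypothetical, and this is not code from the topic/fifo branch.  The FIFO is
explicitly opened for reading once and the descriptor is then duplicated.
Closing one of the two descriptors should not make the reader disappear, so
a nonblocking writer can still open; only closing the last one should lead
to ENXIO.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 0 if a nonblocking writer can be opened on PATH, else errno. */
static int
probe_writer (const char *path)
{
  int wfd = open (path, O_WRONLY | O_NONBLOCK);
  if (wfd < 0)
    return errno;
  close (wfd);
  return 0;
}

/* Open the FIFO for reading once, dup the descriptor, then close the two
   descriptors one at a time, probing for a writer after each close.
   Returns 0, or -1 on a setup error. */
int
demo_dup_reader (const char *path, int *after_first_close,
                 int *after_last_close)
{
  unlink (path);                /* ignore failure: FIFO may not exist */
  if (mkfifo (path, S_IRUSR | S_IWUSR) < 0)
    return -1;

  int rfd = open (path, O_RDONLY | O_NONBLOCK);  /* the one explicit open */
  if (rfd < 0)
    {
      unlink (path);
      return -1;
    }
  int rfd2 = dup (rfd);         /* second descriptor, same reader */
  if (rfd2 < 0)
    {
      close (rfd);
      unlink (path);
      return -1;
    }

  close (rfd);
  *after_first_close = probe_writer (path);  /* reader still open via rfd2 */
  close (rfd2);
  *after_last_close = probe_writer (path);   /* no reader left */

  unlink (path);
  return 0;
}
```

A descriptor inherited by a child through fork behaves like rfd2 here: the
child closing its copy on exit should leave the parent's reader intact.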

Ken
/* Adapted from
   https://sourceware.org/pipermail/cygwin/2020-March/244219.html */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#define FIFO "/tmp/myfifo"
#define nsubproc  5
#define nmessages 4
#define pid_len   4

int
error (const int n, const char *name)
{
  fprintf (stderr, "\n%d\t%s:\t%d\t%s\n", getpid (), name, n, strerror (n));
  return n;
}

int
main ()
{
  if (mkfifo (FIFO, S_IRUSR | S_IWUSR | S_IWGRP) < 0
      && errno != EEXIST)
    return error (errno, "mkfifo");

  int rfd = open (FIFO, O_RDWR);
  if (rfd < 0)
    return error (errno, "open reader");

  for (int i = 0; i < nsubproc; i++)
    {
      pid_t pid = fork ();

      if (pid < 0)
        return error (errno, "fork");

      if (pid == 0)
        {
          printf ("child %d\n", getpid ());
          for (int j = 0; j < nmessages; j++)
            {
              char buf[pid_len + 2]; /* +1 for newline, +1 for nul */

              int wfd = open (FIFO, O_WRONLY | O_NONBLOCK);
              if (wfd < 0)
                _exit (error (errno, "open writer"));
              sprintf (buf, "%d\n", getpid ());
              ssize_t nwritten = write (wfd, buf, strlen (buf));
              if (nwritten < 0)
                error (errno, "write");
              /* printf ("i = %d, j = %d, nwritten = %zd\n", i, j, nwritten); */
              close (wfd);
            }
          _exit (0);
        }
    }

  printf ("parent\n");
  char buf[pid_len + 2];
  for (int i = 0; i < nsubproc; i++)
    for (int j = 0; j < nmessages; j++)
      {
        if (read (rfd, 

Re: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-28 Thread Ken Brown via Cygwin

On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously
open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should have
been created as a manual-reset event, and that more care is needed
to make sure it's set when it should be.


I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo
master branch, please try the attached patch.



Here's a better patch.



I finally succeeded to build latest master (make is not my favourite
tool) and added the patch, but still no success in my little
test-program (see
attachment) when creating a write-file-descriptor with O_NONBLOCK



Your test program fails for me on Linux too.  Here's the output from one

run:

You're right. That was extremely careless of me to not test this in Linux
first :-)


No problem.


I can assure that we have a use case that works on Linux but not in Cygwin,
but it seems like I failed to narrow it down in the wrong way

I'll try to rearrange my code (that works in Linux) to mimic our application
but in a simple way (I'll be back)


OK, I'll be waiting for you.  BTW, if it's not too hard to write your test case 
in plain C, or at least less modern C++, that would simplify things for me.  For 
example, your pipe.cpp failed to compile on one Linux machine I wanted to test 
it on, presumably because that machine had an older C++ compiler.


Never mind.  I was able to reproduce the problem and find the cause.  What 
happens is that when the first subprocess exits, fhandler_fifo::close resets 
read_ready.  That causes the second and subsequent subprocesses to think that 
there's no reader open, so their attempts to open a writer with O_NONBLOCK fail 
with ENXIO.


I should be able to fix this tomorrow.

Ken


Re: Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-28 Thread Ken Brown via Cygwin

On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously
open the descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should have
been created as a manual-reset event, and that more care is needed
to make sure it's set when it should be.


I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo
master branch, please try the attached patch.



Here's a better patch.



I finally succeeded to build latest master (make is not my favourite
tool) and added the patch, but still no success in my little
test-program (see
attachment) when creating a write-file-descriptor with O_NONBLOCK



Your test program fails for me on Linux too.  Here's the output from one

run:

You're right. That was extremely careless of me to not test this in Linux
first :-)


No problem.


I can assure that we have a use case that works on Linux but not in Cygwin,
but it seems like I failed to narrow it down in the wrong way

I'll try to rearrange my code (that works in Linux) to mimic our application
but in a simple way (I'll be back)


OK, I'll be waiting for you.  BTW, if it's not too hard to write your test case 
in plain C, or at least less modern C++, that would simplify things for me.  For 
example, your pipe.cpp failed to compile on one Linux machine I wanted to test 
it on, presumably because that machine had an older C++ compiler.


Ken


Sv: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-28 Thread Kristian Ivarsson via Cygwin
>On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
 On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
>> The ENXIO occurs when parallel child processes simultaneously
>> open the descriptor using O_NONBLOCK.
>
> This is consistent with my guess that the error is generated by 
> fhandler_fifo::wait.  I have a feeling that read_ready should have 
> been created as a manual-reset event, and that more care is needed 
> to make sure it's set when it should be.
>
>> I could provide a code-snippet
>> to reproduce it if wanted ?
>
> Yes, please!

 That might not be necessary.  If you're able to build the git repo 
 master branch, please try the attached patch.
>> 
>>> Here's a better patch.
>> 
>> 
>> I finally succeeded to build latest master (make is not my favourite 
>> tool) and added the patch, but still no success in my little 
>> test-program (see
>> attachment) when creating a write-file-descriptor with O_NONBLOCK

>Your test program fails for me on Linux too.  Here's the output from one
run:

You're right. It was extremely careless of me not to test this on Linux
first :-)

I can assure you that we have a use case that works on Linux but not in
Cygwin, but it seems I narrowed it down in the wrong way.

I'll try to rearrange my code (that works on Linux) to mimic our application
in a simple way (I'll be back).

[snip]

>Ken



Re: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-27 Thread Ken Brown via Cygwin

On 3/27/2020 6:56 PM, Ken Brown via Cygwin wrote:

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should have
been created as a manual-reset event, and that more care is needed to
make sure it's set when it should be.


I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo
master branch, please try the attached patch.



Here's a better patch.



I finally succeeded to build latest master (make is not my favourite tool)
and added the patch, but still no success in my little test-program (see
attachment) when creating a write-file-descriptor with O_NONBLOCK


Your test program fails for me on Linux too.  Here's the output from one run:

child 657
657 error:  6   No such device or address
child 658
child 659
658659  error:  child 660
parent
child 661
     error:  66606661 661 661
     error:  661
No such device or address6No such device or address

No such device or address

[I then killed it with control-C; the parent was blocked trying to open the 
FIFO.]

There's a race condition in your code.  The parent is trying to open the FIFO 
for reading (without O_NONBLOCK) while the child is trying to open it for 
writing (with O_NONBLOCK).  The parent is blocked waiting for the child, and the 
child's open fails with ENXIO; see


   
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357

I think you need to rearrange things so that the FIFO is open for reading before 
you try a nonblocking open for writing.


For example, you could open it with O_RDWR instead of O_RDONLY.
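
The behaviour described above can be checked with a minimal sketch.  It
assumes a Linux-like system; the helper names try_nonblocking_writer and
demo_enxio and the FIFO path passed in are hypothetical.  The sketch probes
a fresh FIFO with a nonblocking open for writing, first with no reader
(POSIX specifies ENXIO in that case) and then after the same process has
opened it with O_RDWR, which counts as having a reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try a nonblocking open for writing.  Return 0 on success or the errno
   value on failure.  The descriptor is closed again straight away. */
static int
try_nonblocking_writer (const char *path)
{
  int wfd = open (path, O_WRONLY | O_NONBLOCK);
  if (wfd < 0)
    return errno;
  close (wfd);
  return 0;
}

/* Create a FIFO at PATH, probe it once with no reader and once after
   opening it O_RDWR.  The two results go to *BEFORE and *AFTER.
   Returns 0, or -1 on a setup error. */
int
demo_enxio (const char *path, int *before, int *after)
{
  unlink (path);                /* ignore failure: FIFO may not exist */
  if (mkfifo (path, S_IRUSR | S_IWUSR) < 0)
    return -1;

  *before = try_nonblocking_writer (path);  /* no reader yet */

  /* O_RDWR on a FIFO is left unspecified by POSIX; Linux permits it. */
  int rfd = open (path, O_RDWR);
  if (rfd < 0)
    {
      unlink (path);
      return -1;
    }

  *after = try_nonblocking_writer (path);   /* a reader now exists */

  close (rfd);
  unlink (path);
  return 0;
}
```

Because the O_RDWR open never blocks, the parent can do it before forking,
so every child finds a reader in place when it opens its writer.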


Re: Sv: Sv: Sv: Named pipes and multiple writers

2020-03-27 Thread Ken Brown via Cygwin

On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by
fhandler_fifo::wait.  I have a feeling that read_ready should have
been created as a manual-reset event, and that more care is needed to
make sure it's set when it should be.


I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo
master branch, please try the attached patch.



Here's a better patch.



I finally succeeded to build latest master (make is not my favourite tool)
and added the patch, but still no success in my little test-program (see
attachment) when creating a write-file-descriptor with O_NONBLOCK


Your test program fails for me on Linux too.  Here's the output from one run:

child 657
657 error:  6   No such device or address
child 658
child 659
658659  error:  child 660
parent
child 661
error:  66606661 661 661
error:  661
No such device or address6No such device or address

No such device or address

[I then killed it with control-C; the parent was blocked trying to open the 
FIFO.]

There's a race condition in your code.  The parent is trying to open the FIFO 
for reading (without O_NONBLOCK) while the child is trying to open it for 
writing (with O_NONBLOCK).  The parent is blocked waiting for the child, and the 
child's open fails with ENXIO; see


  
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357

I think you need to rearrange things so that the FIFO is open for reading before 
you try a nonblocking open for writing.


I can work around the race by using a small positive 'wait' in 
fhandler_fifo::wait(), but I'm not sure this is the right thing to do, since 
Cygwin aims to emulate Linux.  Can you find a test case that works on Linux but 
fails on Cygwin?
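
On the application side, a different technique from the small positive wait
inside fhandler_fifo::wait() would be for the writer itself to retry the
nonblocking open for a short time when it fails with ENXIO.  The following
is only a sketch under that assumption; open_writer_retry and demo_retry
are hypothetical helpers, and the attempt count and delay are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Open PATH for writing with O_NONBLOCK, retrying up to ATTEMPTS times
   while the failure is ENXIO (no reader yet).  Returns the descriptor,
   or -1 with errno set. */
int
open_writer_retry (const char *path, int attempts, long delay_ms)
{
  struct timespec delay = { .tv_sec = delay_ms / 1000,
                            .tv_nsec = (delay_ms % 1000) * 1000000L };

  for (int i = 0; i < attempts; i++)
    {
      int wfd = open (path, O_WRONLY | O_NONBLOCK);
      if (wfd >= 0 || errno != ENXIO)
        return wfd;             /* success, or an unrelated error */
      if (i + 1 < attempts)
        nanosleep (&delay, NULL);
    }
  errno = ENXIO;
  return -1;
}

/* Exercise the helper on a fresh FIFO: once with no reader, once after
   the FIFO has been opened O_RDWR.  Returns 0, or -1 on a setup error. */
int
demo_retry (const char *path, int *no_reader_errno, int *with_reader_ok)
{
  unlink (path);                /* ignore failure: FIFO may not exist */
  if (mkfifo (path, S_IRUSR | S_IWUSR) < 0)
    return -1;

  int wfd = open_writer_retry (path, 3, 1);
  *no_reader_errno = wfd < 0 ? errno : 0;
  if (wfd >= 0)
    close (wfd);

  int rfd = open (path, O_RDWR);  /* unspecified by POSIX, works on Linux */
  if (rfd < 0)
    {
      unlink (path);
      return -1;
    }
  wfd = open_writer_retry (path, 3, 1);
  *with_reader_ok = wfd >= 0;
  if (wfd >= 0)
    close (wfd);

  close (rfd);
  unlink (path);
  return 0;
}
```

A retry loop only narrows the window; making sure the FIFO is open for
reading before any writer starts remains the more robust arrangement.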


Ken


Sv: Sv: Sv: Named pipes and multiple writers

2020-03-27 Thread Kristian Ivarsson via Cygwin
>On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>>> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
 The ENXIO occurs when parallel child processes simultaneously open
 the descriptor using O_NONBLOCK.
>>>
>>> This is consistent with my guess that the error is generated by 
>>> fhandler_fifo::wait.  I have a feeling that read_ready should have 
>>> been created as a manual-reset event, and that more care is needed to 
>>> make sure it's set when it should be.
>>>
 I could provide a code-snippet
 to reproduce it if wanted ?
>>>
>>> Yes, please!
>> 
>> That might not be necessary.  If you're able to build the git repo 
>> master branch, please try the attached patch.

>Here's a better patch.


I finally managed to build the latest master (make is not my favourite tool)
and applied the patch, but my little test program (see attachment) still
fails when creating a write file descriptor with O_NONBLOCK.

 
>Ken
#include <iostream>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

namespace
{
   auto error(const int error)
   {
      std::cerr << getpid() << "\terror:\t" << error << '\t'
                << std::strerror(error) << std::endl;
      return error;
   }
}

int main()
{
   const auto name{"/tmp/my_name"};

   const auto result = mkfifo(name, 0666);

   if(result) return error(errno);

   constexpr auto writers{5};
   constexpr auto messages{4};

   for(auto idx = 0; idx < writers; ++idx)
   {
      const auto pid = fork();

      if(pid < 0) return error(errno);

      if(pid == 0)
      {
         std::cout << "child " << getpid() << std::endl;
         for(auto idx = 0; idx < messages; ++idx)
         {
            const auto wfd = open(name, O_WRONLY | O_NONBLOCK);
            if(wfd < 0) return error(errno);
            const auto msg{std::to_string(getpid())};
            if(write(wfd, msg.data(), msg.size() + 1) < 0) error(errno);
            close(wfd);
         }
         return 0;
      }
   }

   {
      std::cout << "parent" << std::endl;

      const auto rfd = open(name, O_RDONLY);
      const auto wfd = open(name, O_WRONLY);
      if(rfd < 0) return error(errno);
      for(auto idx = 0; idx < writers * messages; ++idx)
      {
         std::string buffer;
         buffer.resize(80);
         if(read(rfd, &buffer[0], buffer.size()) < 0) error(errno);
         std::cout << buffer << std::endl;
      }
      close(wfd);
      close(rfd);
   }

   if(unlink(name) < 0) return error(errno);

   return 0;
}


Re: Sv: Sv: Named pipes and multiple writers

2020-03-27 Thread Ken Brown via Cygwin

On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by 
fhandler_fifo::wait.  I have a feeling that read_ready should have been 
created as a manual-reset event, and that more care is needed to make sure 
it's set when it should be.



I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo master 
branch, please try the attached patch.


Here's a better patch.

Ken
From 3efd5a8cbff8d48b8cf9807070134bb79f591b7d Mon Sep 17 00:00:00 2001
From: Ken Brown 
Date: Thu, 26 Mar 2020 19:02:16 -0400
Subject: [PATCH] Cygwin: FIFO: fix a problem opening nonblocking writers

Make read_ready a manual-reset event.  Previously, when it was an
auto-reset event, there was a brief period when read_ready was not set
after a writer opened.  An attempt to open a second writer during this
period would fail with ENXIO if O_NONBLOCK was set, even if a reader
was open.

For the same reason, move ResetEvent(read_ready) from
listen_client_thread() to close().

Addresses: https://sourceware.org/pipermail/cygwin/2020-March/244201.html
---
 winsup/cygwin/fhandler_fifo.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/winsup/cygwin/fhandler_fifo.cc b/winsup/cygwin/fhandler_fifo.cc
index 19cd0e507..c7e27e883 100644
--- a/winsup/cygwin/fhandler_fifo.cc
+++ b/winsup/cygwin/fhandler_fifo.cc
@@ -463,7 +463,6 @@ fhandler_fifo::listen_client_thread ()
 out:
   if (evt)
 CloseHandle (evt);
-  ResetEvent (read_ready);
   if (ret < 0)
 debug_printf ("exiting with error, %E");
   else
@@ -516,7 +515,7 @@ fhandler_fifo::open (int flags, mode_t)
 
   char npbuf[MAX_PATH];
   __small_sprintf (npbuf, "r-event.%08x.%016X", get_dev (), get_ino ());
-  if (!(read_ready = CreateEvent (sa_buf, false, false, npbuf)))
+  if (!(read_ready = CreateEvent (sa_buf, true, false, npbuf)))
 {
   debug_printf ("CreateEvent for %s failed, %E", npbuf);
   res = error_set_errno;
@@ -1016,6 +1015,8 @@ fhandler_fifo::close ()
  handler or another thread. */
   fifo_client_unlock ();
   int ret = stop_listen_client ();
+  if (reader && read_ready)
+ResetEvent (read_ready);
   if (read_ready)
 CloseHandle (read_ready);
   if (write_ready)
-- 
2.21.0



Re: Sv: Sv: Named pipes and multiple writers

2020-03-26 Thread Ken Brown via Cygwin

On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by 
fhandler_fifo::wait.  I have a feeling that read_ready should have been created 
as a manual-reset event, and that more care is needed to make sure it's set when 
it should be.



I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!


That might not be necessary.  If you're able to build the git repo master 
branch, please try the attached patch.


Ken
From 279591d91a13616957964256e02344a627b6f558 Mon Sep 17 00:00:00 2001
From: Ken Brown 
Date: Thu, 26 Mar 2020 19:02:16 -0400
Subject: [PATCH] Cygwin: FIFO: make read_ready a manual-reset event

If a FIFO is open for reading and an attempt is made to open it for
writing with O_NONBLOCK, read_ready must be set in order for open to
succeed.  When read_ready was an auto-reset event, there was a brief
period when read_ready was not set after a writer opened.  If a
second writer attempted to open the FIFO with O_NONBLOCK during this
period, the attempt would fail.

Addresses: https://sourceware.org/pipermail/cygwin/2020-March/244201.html
---
 winsup/cygwin/fhandler_fifo.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_fifo.cc b/winsup/cygwin/fhandler_fifo.cc
index 19cd0e507..c05161099 100644
--- a/winsup/cygwin/fhandler_fifo.cc
+++ b/winsup/cygwin/fhandler_fifo.cc
@@ -516,7 +516,7 @@ fhandler_fifo::open (int flags, mode_t)
 
   char npbuf[MAX_PATH];
   __small_sprintf (npbuf, "r-event.%08x.%016X", get_dev (), get_ino ());
-  if (!(read_ready = CreateEvent (sa_buf, false, false, npbuf)))
+  if (!(read_ready = CreateEvent (sa_buf, true, false, npbuf)))
 {
   debug_printf ("CreateEvent for %s failed, %E", npbuf);
   res = error_set_errno;
-- 
2.21.0
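
The difference between the two event types can be illustrated with a small
model.  This is not the Cygwin code and does not call the Windows API; the
struct model_event and both functions are hypothetical.  It assumes that
each nonblocking writer open performs a zero-timeout wait on read_ready, and
that the reader does not get a chance to set the event again between two
opens, which corresponds to the brief period mentioned in the commit
message.  A successful wait clears an auto-reset event but leaves a
manual-reset event set.

```c
#include <stdbool.h>

/* Toy stand-in for a Windows event object. */
struct model_event
{
  bool signaled;
  bool manual_reset;
};

/* Zero-timeout wait: report whether the event is signaled.  A successful
   wait on an auto-reset event clears it; a manual-reset event stays set. */
static bool
model_wait (struct model_event *e)
{
  bool was_signaled = e->signaled;
  if (was_signaled && !e->manual_reset)
    e->signaled = false;
  return was_signaled;
}

/* Count how many of NWRITERS back-to-back nonblocking opens would find
   read_ready set, given that a reader set it once beforehand. */
int
writers_that_see_reader (bool manual_reset, int nwriters)
{
  struct model_event read_ready = { .signaled = true,
                                    .manual_reset = manual_reset };
  int ok = 0;

  for (int i = 0; i < nwriters; i++)
    if (model_wait (&read_ready))
      ok++;
  return ok;
}
```

In this model, writers_that_see_reader (false, 3) returns 1 and
writers_that_see_reader (true, 3) returns 3: with an auto-reset event only
the first writer gets through, and the others would fail with ENXIO.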



Re: Sv: Sv: Named pipes and multiple writers

2020-03-26 Thread Ken Brown via Cygwin

On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK.


This is consistent with my guess that the error is generated by 
fhandler_fifo::wait.  I have a feeling that read_ready should have been created 
as a manual-reset event, and that more care is needed to make sure it's set when 
it should be.



I could provide a code-snippet
to reproduce it if wanted ?


Yes, please!

Ken


Sv: Sv: Named pipes and multiple writers

2020-03-26 Thread Kristian Ivarsson via Cygwin
>> [snip]
 As far as I can see, reading through history, this have been a known 
 issue for quite some time, but it seems like there have been some 
 attempts to solve it, e.g. in the branch topic/fifo (by Ken Brown)
>> 
>> [snip]
 Does anyone have any knowledge about if this (topic/fifo branch) is 
 working and/or if it is somehow planned to make it into the master 
 branch and end up in a future release ?
>> 
>>> That branch is obsolete.  Support for multiple writers was added to 
>>> Cygwin
>> as of release 3.1.0.
>> 
>> Ok, thanks, but we're running 3.1.4 (and tested 3.1.5) but do still 
>> have problems (experiencing ENXIO (No such device or address)) but 
>> actually (as far as we see) with the 3:rd writer ?
>> 
>> We need to investigate the issue more thoroughly and might get back 
>> when we have more knowledge

>Does the ENXIO come from fhandler_fifo::wait?  If so, it's quite possible
that there's a bug involving read_ready in my code.

Our application is a bit complex, but I have now narrowed it down.

The ENXIO occurs when parallel child processes simultaneously open the
descriptor using O_NONBLOCK. We sometimes open it blocking and sometimes
non-blocking, and it seems that when the 2nd non-blocking process tries to
open it, it gets ENXIO. Each child process opens and closes the FIFO
descriptor for writing multiple times. I could provide a code snippet to
reproduce it if wanted.
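
A reproduction along these lines might look like the minimal sketch below. It is not the test program from this thread: the names writer_child and count_enxio_writers, the temporary path and the exit-code convention are all assumptions made for illustration. One reader keeps the FIFO open while several children open and close it for writing with O_NONBLOCK.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body: open and close the FIFO for writing `rounds` times.
   Exits with 0 on success, 1 if open() failed with ENXIO, 2 otherwise. */
static void writer_child(const char *path, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        int fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd < 0)
            _exit(errno == ENXIO ? 1 : 2);
        close(fd);
    }
    _exit(0);
}

/* Keep one reader open, run `writers` children in parallel, and return
   how many of them saw ENXIO (or -1 if the setup failed). */
static int count_enxio_writers(int writers, int rounds)
{
    char dir[] = "/tmp/fifo-repro-XXXXXX";
    char path[64];
    int started = 0, failures = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    /* O_RDONLY | O_NONBLOCK returns at once even with no writer yet. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) {
        unlink(path);
        rmdir(dir);
        return -1;
    }

    for (int i = 0; i < writers; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            close(rfd);               /* only the parent acts as reader */
            writer_child(path, rounds);
        }
        if (pid > 0)
            started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1)
            failures++;
    }

    close(rfd);
    unlink(path);
    rmdir(dir);
    return failures;
}
```

Because a reader is open for the whole run, POSIX gives the writers no reason to fail with ENXIO, so a non-zero result from something like count_enxio_writers(4, 20) would point at the kind of problem described here.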

Thanks for showing interest, by the way.

Kristian

>Ken



Re: Sv: Named pipes and multiple writers

2020-03-26 Thread Norton Allen

On 3/26/2020 12:44 PM, Ken Brown via Cygwin wrote:
> On 3/26/2020 12:03 PM, Norton Allen wrote:
>> On 3/26/2020 11:11 AM, Ken Brown via Cygwin wrote:
>>> BTW, I've been working on adding support for multiple readers.  I
>>> expect to have a first cut ready within a week or two.  Would you
>>> have any use for that?  If so, I could revive the topic/fifo branch
>>> and push my patches there for you to test.
>>
>> Ken, what are the semantics for multiple readers? Do all readers see
>> the same data, or is it first come first served or something else?
>
> It's first come, first served.  If two readers attempt to read
> simultaneously, it's possible that one will get some of the available
> input and the other will get some more.
>
> The only use case for multiple readers that I've come across is
> Midnight Commander running under tcsh.  I didn't dig into the code
> enough to know why they do it, or why only under tcsh.  See
>
> https://sourceware.org/pipermail/cygwin/2019-December/243317.html
>
> and
>
> https://cygwin.com/pipermail/cygwin-apps/2019-December/039777.html
>
> That's what got me interested in this.  It would be nice to know if
> there are other use cases.


I suppose it could be used as a simple approach to deploying jobs to 
worker processes, provided a process could guarantee that it received 
enough information to define a job and not more than one. I guess if the 
job definition were fixed length that could work.
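
A minimal sketch of that fixed-length idea, in C. The struct job layout and the helper names send_job and receive_job are hypothetical. The sketch relies on POSIX making writes of at most PIPE_BUF bytes atomic. It also assumes that a read() of exactly one record is not split between two readers; that matches the first come, first served description above, but POSIX does not promise it.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-length job record; the layout is an assumption. */
struct job {
    uint32_t id;
    char     command[60];
};

/* Writes of at most PIPE_BUF bytes to a pipe or FIFO are atomic, and
   _POSIX_PIPE_BUF (512) is the smallest PIPE_BUF that POSIX allows. */
_Static_assert(sizeof(struct job) <= _POSIX_PIPE_BUF,
               "job record must fit in one atomic write");

/* Send one record with a single write(). Returns 0 on success, -1 otherwise. */
static int send_job(int fd, uint32_t id, const char *command)
{
    struct job j;

    memset(&j, 0, sizeof j);
    j.id = id;
    strncpy(j.command, command, sizeof j.command - 1);
    return write(fd, &j, sizeof j) == (ssize_t)sizeof j ? 0 : -1;
}

/* Receive one record with a single read().
   Returns 1 for a job, 0 at end of file, -1 on error or a short read. */
static int receive_job(int fd, struct job *out)
{
    ssize_t n = read(fd, out, sizeof *out);

    if (n == 0)
        return 0;
    return n == (ssize_t)sizeof *out ? 1 : -1;
}
```

A short read is reported as an error rather than retried, because with several readers the rest of the record may already have gone to another worker.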





Re: Sv: Named pipes and multiple writers

2020-03-26 Thread Ken Brown via Cygwin

On 3/26/2020 12:03 PM, Norton Allen wrote:
> On 3/26/2020 11:11 AM, Ken Brown via Cygwin wrote:
>> BTW, I've been working on adding support for multiple readers.  I expect to
>> have a first cut ready within a week or two.  Would you have any use for
>> that?  If so, I could revive the topic/fifo branch and push my patches there
>> for you to test.
>
> Ken, what are the semantics for multiple readers? Do all readers see the same
> data, or is it first come first served or something else?


It's first come, first served.  If two readers attempt to read simultaneously, 
it's possible that one will get some of the available input and the other will 
get some more.


The only use case for multiple readers that I've come across is Midnight 
Commander running under tcsh.  I didn't dig into the code enough to know why 
they do it, or why only under tcsh.  See


  https://sourceware.org/pipermail/cygwin/2019-December/243317.html

and

  https://cygwin.com/pipermail/cygwin-apps/2019-December/039777.html

That's what got me interested in this.  It would be nice to know if there are 
other use cases.


Ken


Re: Sv: Named pipes and multiple writers

2020-03-26 Thread Norton Allen

On 3/26/2020 11:11 AM, Ken Brown via Cygwin wrote:
> BTW, I've been working on adding support for multiple readers.  I
> expect to have a first cut ready within a week or two.  Would you have
> any use for that?  If so, I could revive the topic/fifo branch and
> push my patches there for you to test.




Ken, what are the semantics for multiple readers? Do all readers see the 
same data,  or is it first come first served or something else?


--
=
Norton Allen (he/him/his)
Software Engineer
Harvard University School of Engineering and Applied Sciences
12 Oxford St., Link Bldg. (Office 282)
Cambridge, MA  02138
Phone: (617) 998-5553
=



Re: Sv: Named pipes and multiple writers

2020-03-26 Thread Ken Brown via Cygwin

On 3/26/2020 10:06 AM, Ken Brown via Cygwin wrote:
> [Let's keep the discussion on the list in case others have suggestions.]
>
> On 3/25/2020 9:41 AM, sten.kristian.ivars...@gmail.com wrote:
>> [snip]
>>>> As far as I can see, reading through history, this has been a known
>>>> issue for quite some time, but it seems like there have been some
>>>> attempts to solve it, e.g. in the branch topic/fifo (by Ken Brown)
>>
>> [snip]
>>>> Does anyone have any knowledge about if this (topic/fifo branch) is
>>>> working and/or if it is somehow planned to make it into the master
>>>> branch and end up in a future release?
>>
>>> That branch is obsolete.  Support for multiple writers was added to
>>> Cygwin as of release 3.1.0.
>>
>> Ok, thanks, but we're running 3.1.4 (and tested 3.1.5) but do still have
>> problems (experiencing ENXIO (No such device or address)) but actually (as
>> far as we see) with the 3rd writer?
>>
>> We need to investigate the issue more thoroughly and might get back when we
>> have more knowledge
>
> Does the ENXIO come from fhandler_fifo::wait?  If so, it's quite possible that
> there's a bug involving read_ready in my code.


BTW, I've been working on adding support for multiple readers.  I expect to have 
a first cut ready within a week or two.  Would you have any use for that?  If 
so, I could revive the topic/fifo branch and push my patches there for you to test.


Ken


Re: Sv: Named pipes and multiple writers

2020-03-26 Thread Ken Brown via Cygwin

[Let's keep the discussion on the list in case others have suggestions.]

On 3/25/2020 9:41 AM, sten.kristian.ivars...@gmail.com wrote:
> [snip]
>>> As far as I can see, reading through history, this has been a known
>>> issue for quite some time, but it seems like there have been some
>>> attempts to solve it, e.g. in the branch topic/fifo (by Ken Brown)
>
> [snip]
>>> Does anyone have any knowledge about if this (topic/fifo branch) is
>>> working and/or if it is somehow planned to make it into the master
>>> branch and end up in a future release?
>
>> That branch is obsolete.  Support for multiple writers was added to
>> Cygwin as of release 3.1.0.
>
> Ok, thanks, but we're running 3.1.4 (and tested 3.1.5) but do still have
> problems (experiencing ENXIO (No such device or address)) but actually (as
> far as we see) with the 3rd writer?
>
> We need to investigate the issue more thoroughly and might get back when we
> have more knowledge


Does the ENXIO come from fhandler_fifo::wait?  If so, it's quite possible that 
there's a bug involving read_ready in my code.
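
For reference, the one case where POSIX does specify ENXIO for a FIFO is opening it with O_WRONLY | O_NONBLOCK while no process has it open for reading. The following minimal C sketch shows that case next to the one where a reader is present. The helper names try_nonblocking_writer and probe_fifo and the temporary path are assumptions made for illustration, and nothing here touches Cygwin internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to open `path` for writing without blocking.
   Returns 0 on success, or the errno value from open() on failure. */
static int try_nonblocking_writer(const char *path)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Create a fresh FIFO in a private temp directory and probe it twice:
   once with no reader, once while a reader holds it open.
   Stores both results and returns 0, or returns -1 if the setup failed. */
static int probe_fifo(int *without_reader, int *with_reader)
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";
    char path[64];

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    *without_reader = try_nonblocking_writer(path);

    /* O_RDONLY | O_NONBLOCK returns at once even with no writer yet. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    *with_reader = (rfd < 0) ? errno : try_nonblocking_writer(path);
    if (rfd >= 0)
        close(rfd);

    unlink(path);
    rmdir(dir);
    return 0;
}
```

On a POSIX-conforming system the first probe should yield ENXIO and the second should succeed. An ENXIO in the second situation, with a reader open, is what makes the report in this thread look like a bug rather than expected behavior.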


Ken