On 02/10/13 16:26, Rob Vesse wrote:
OK so I think I have solved this problem as far as is possible

The iterator now has a configurable maximum number of poll attempts; if
it fails to read any data after that many attempts (default 10) it will
declare the producer to be dead and error out.  I've also added in the
fix for the close() issue that you highlighted.
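The max-poll-attempts behaviour described above can be sketched roughly as follows. This is a minimal illustration with made-up names and values (PollLoopSketch, nextElement), not the actual PipedRDFIterator code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollLoopSketch {
    // Hypothetical names and values; not the actual Jena implementation.
    static final int MAX_POLL_ATTEMPTS = 10;   // default mentioned in the mail
    static final long POLL_TIMEOUT_MS = 100;

    // Poll the queue repeatedly; give up after MAX_POLL_ATTEMPTS empty polls.
    static Object nextElement(BlockingQueue<Object> queue) throws InterruptedException {
        for (int attempt = 0; attempt < MAX_POLL_ATTEMPTS; attempt++) {
            Object item = queue.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
            if (item != null) {
                return item;
            }
            // No data this round; retry up to the limit.
        }
        // Repeated empty polls: presume the producer died without closing the pipe.
        throw new IllegalStateException(
                "Producer presumed dead after " + MAX_POLL_ATTEMPTS + " failed polls");
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        queue.put("triple-1");
        System.out.println(nextElement(queue));
    }
}
```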

Any reason the producer can't explicitly say the pipe is dead?

(worry about timeouts and polling because thread scheduling can be really bad under load - 10s is unlikely but possible)
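One way the producer could explicitly say the pipe is dead, as Andy suggests, is a sentinel ("poison pill") enqueued in a finally block so the consumer never has to guess from timeouts. A sketch with hypothetical names (PoisonPillSketch), not the actual Jena API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillSketch {
    // Sentinel object the consumer recognises as "producer is done or dead".
    static final Object END = new Object();

    static void produce(BlockingQueue<Object> queue) {
        try {
            queue.put("triple-1");
            queue.put("triple-2");
            // ... parser work that may throw ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Always signal, even on failure, so the consumer never
            // has to poll-and-guess whether the producer is alive.
            queue.offer(END);
        }
    }

    static int consume(BlockingQueue<Object> queue) throws InterruptedException {
        int count = 0;
        for (Object item = queue.take(); item != END; item = queue.take()) {
            count++;  // process item
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> produce(queue));
        producer.start();
        System.out.println(consume(queue));
        producer.join();
    }
}
```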

        Andy


I kicked off a deploy build so if you try with 2.11.1-SNAPSHOT you should
no longer have problems since the consumer thread should now more reliably
detect a dead producer and act accordingly.

Rob

On 10/2/13 12:48 PM, "Rob Vesse" <[email protected]> wrote:

So I mullered my terminology in my previous email:

- reader -> producer
- writer -> consumer

I produced a test case that reproduced the exact scenario I outlined and
was able to make a specific fix for that case

I have also produced a variation on your test case which I think shows
some part of the issue.  The problem of the writeSide variable never
being set can be easily solved by setting it when the producer (writer)
thread calls start() on the PipedRDFStream.  BUT this only solves the
problem when the producer thread actually dies and gets far enough to
call start()

Where thread pools are involved it is possible for the task to die but
the thread to remain.  So even though writeSide is set, that thread may
still be alive, and the case of a broken write end may never get
detected.
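This caveat is easy to demonstrate with plain java.util.concurrent and no Jena classes: an ExecutorService worker thread survives the death of its task, so a writeSide-style isAlive() check keeps reporting true. A minimal, self-contained demonstration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class PooledThreadLiveness {
    // Returns whether the worker thread is still alive after its task has died.
    static boolean workerAliveAfterTaskDeath() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<Thread> writeSide = new AtomicReference<>();
        Runnable failingTask = () -> {
            writeSide.set(Thread.currentThread());  // what start() would record
            throw new RuntimeException("producer task dies");
        };
        pool.submit(failingTask);
        pool.submit(() -> { }).get();  // wait until the failed task has unwound
        boolean alive = writeSide.get().isAlive();
        pool.shutdown();
        return alive;
    }

    public static void main(String[] args) throws Exception {
        // The task is long dead, but writeSide.isAlive() still reports true.
        System.out.println(workerAliveAfterTaskDeath());
    }
}
```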

However, in the type of scenario you show where a parser is used, the
parser should always call finish() as part of its clean up.  As Andy
points out, though, it looks like there is a case where the parser can
fail prematurely so neither start() nor finish() is ever actually
called.  It should be possible to set up a check so that an error is
thrown if the producer thread never calls start(), though you have to
be careful on this one because there is a race condition: both threads
are fired off, and the consumer may make its first check before the
producer ever gets as far as calling start()
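One hedge against that race is for the consumer to wait a bounded grace period for start() rather than failing on its very first check. This is only a sketch of the idea with hypothetical names (StartCheckSketch, producerStarted), not how PipedRDFIterator does it:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StartCheckSketch {
    // Hypothetical: latch counted down when the producer calls start().
    private final CountDownLatch started = new CountDownLatch(1);

    public void start() {
        started.countDown();
    }

    // Consumer side: wait a bounded time for start() instead of erroring
    // out immediately, which would lose the race described above.
    public boolean producerStarted(long graceMs) throws InterruptedException {
        return started.await(graceMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        StartCheckSketch pipe = new StartCheckSketch();
        Thread producer = new Thread(() -> pipe.start());
        producer.start();
        System.out.println(pipe.producerStarted(1000));
        producer.join();
    }
}
```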

Your fix for close() seems to be correct, though the error message is
not: we merely need to allow hasNext() to bail out of the while (true)
loop if close() has been called, so the correct error is the same as
the one thrown earlier in the hasNext() method, i.e. "Pipe is closed"
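The shape of that fix might look roughly like this. An illustrative sketch with made-up names (ClosablePipeSketch), not the actual PipedRDFIterator source:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ClosablePipeSketch {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private volatile boolean closed = false;
    private Object next = null;

    public void close() {
        closed = true;
    }

    public void offer(Object o) {
        queue.offer(o);
    }

    public boolean hasNext() {
        if (closed) throw new IllegalStateException("Pipe is closed");
        while (true) {
            // Re-check on every iteration so close() lets us bail out of
            // the loop instead of polling forever, with the same error as
            // the check at the top of the method.
            if (closed) throw new IllegalStateException("Pipe is closed");
            try {
                next = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (next != null) return true;
        }
    }

    public Object next() {
        Object n = next;
        next = null;
        return n;
    }
}
```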

I'll finish up my tweaks to the PipedRDFIterator classes (plus unit tests)
and get those committed later today.

Rob


On 10/2/13 11:35 AM, "Rob Vesse" <[email protected]> wrote:

Ok

Having looked over the code again the only obvious possible issue I can
see is the following scenario:

1 - App launches a reader thread using a thread pool - reader is
assigned to thread X
2 - Reader thread fails and releases thread X to the pool
3 - App launches a writer thread using a thread pool - writer is
assigned to thread X
4 - Writer never detects the reader failure because it is on the same
thread as the reader originally was, and so the writer thread is
considered alive

I.e. it may be a timing issue combined with a thread scheduling issue.
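This scenario can be reproduced with a single-thread executor: the failed reader task and the subsequent writer task get the very same Thread object, so a liveness check on the reader's thread passes wrongly. A self-contained demonstration with no Jena classes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class ThreadReuseDemo {
    static boolean readerLooksAlive() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<Thread> readerThread = new AtomicReference<>();
        AtomicReference<Thread> writerThread = new AtomicReference<>();

        // Steps 1-2: reader records thread X, then fails and releases it.
        Runnable reader = () -> {
            readerThread.set(Thread.currentThread());
            throw new RuntimeException("reader fails");
        };
        pool.submit(reader);

        // Step 3: writer is assigned the very same thread X.
        pool.submit(() -> writerThread.set(Thread.currentThread())).get();

        // Step 4: checking the reader's thread wrongly says it is alive,
        // because thread X is now running (or idling after) the writer.
        boolean sameThread = readerThread.get() == writerThread.get();
        boolean alive = readerThread.get().isAlive();
        pool.shutdown();
        return sameThread && alive;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readerLooksAlive());
    }
}
```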

I haven't attempted to construct a test case to test this theory - I
will do shortly - but this is the only obvious failure condition I can
see from the code.  Of course there may be others; stack traces,
profiler output, etc. from when this occurs in your production scenario
might help here.  If this is the case then scheduling the reader and
writer threads on different thread pools would avoid any possibility of
this happening.  I'll reply again to the thread once I've had a chance
to concoct a test case for this scenario.

2.11.0 is the latest stable release version but the PipedRDFIterator code
has not changed for the last couple of releases AFAIK so I doubt bumping
the version will make much difference.

Rob

On 10/2/13 10:50 AM, "Norman Walsh" <[email protected]> wrote:

Rob Vesse <[email protected]> writes:
I've never seen the email you quote before, nor does it appear to be in
the mailing list archives - are you sure you actually sent it to the
list?

Indeed, there it is, stuck in my outbound mail queue for seven days.
Alas.

The PipedRDFIterator is supposed to handle errors sensibly, a minimal
reproducible test case with sample data where it does not would be
helpful

Naturally, all my attempts to write such a case have failed. But it
consistently goes off into the weeds in the big, multi-threaded
production app. *sigh*

I'll keep hacking at it.

Also what version of ARQ are you using?

2.10.0 according to the POM.

                                        Be seeing you,
                                          norm

--
Norman Walsh <[email protected]> | It would not be better if things
http://nwalsh.com/            | happened to men just as they wished.--
                              | Heraclitus


