OK, so I think I have solved this problem as far as is possible. The iterator now has a configurable maximum number of poll attempts; if it fails to read any data after that many attempts (default 10) it will declare the producer to be dead and error out. I've also added the fix for the close() issue that you highlighted.
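To illustrate the idea, here's a rough sketch of the bounded-poll pattern (illustrative only - the class name, defaults, and queue-based pipe here are mine, not the actual PipedRDFIterator internals):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a bounded poll loop: give up and error out
// once maxPolls consecutive polls return nothing, rather than
// blocking forever on a producer that may have died.
class BoundedPollIterator<T> {
    private final BlockingQueue<T> queue;
    private final int maxPolls;        // e.g. 10 by default
    private final long pollTimeoutMs;  // per-attempt wait

    BoundedPollIterator(BlockingQueue<T> queue, int maxPolls, long pollTimeoutMs) {
        this.queue = queue;
        this.maxPolls = maxPolls;
        this.pollTimeoutMs = pollTimeoutMs;
    }

    T next() throws InterruptedException {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            T item = queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (item != null)
                return item;
        }
        throw new IllegalStateException("No data after " + maxPolls
                + " poll attempts - producer appears to be dead");
    }
}
```

The point being that each poll still has a timeout, so a slow producer gets maxPolls chances, but a dead one can only stall the consumer for a bounded total time.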
I kicked off a deploy build, so if you try with 2.11.1-SNAPSHOT you should no longer have problems, since the consumer thread should now more reliably detect a dead producer and act accordingly.

Rob

On 10/2/13 12:48 PM, "Rob Vesse" <[email protected]> wrote:

>So I mullered my terminology in my previous email:
>
>- reader -> producer
>- writer -> consumer
>
>I produced a test case that reproduced the exact scenario I outlined and
>was able to make a specific fix for that case.
>
>I have also produced a variation on your test case which I think shows
>some part of the issue. The problem of the writeSide variable never
>being set can be easily solved by setting it when the producer (writer)
>thread calls start() on the PipedRDFStream. BUT this only solves the
>problem in the case where the producer thread actually dies and gets far
>enough to call start().
>
>Where thread pools are involved it is possible for the task to die but
>the thread to remain, so even though writeSide is set, that thread may
>still be alive and the case of a broken write end may never be detected.
>
>However, in the type of scenario you show, where a parser is used, the
>parser should always call finish() as part of its clean-up. As Andy
>points out, though, it looks like there is a case where the parser can
>fail prematurely so that neither start() nor finish() is ever actually
>called. It should be possible to set up a check so that an error is
>thrown if the producer thread never calls start(), though you have to be
>careful here because there is a race condition: both threads are fired
>off and the consumer may make its first check before the producer ever
>gets as far as calling start().
>
>Your fix for close() seems to be correct, though the error message is
>not. We merely need to allow hasNext() to bail out of the while (true)
>loop if close() has been called, so the correct error is the same as the
>one thrown earlier in the hasNext() method, i.e. "Pipe is closed".
>
>I'll finish up my tweaks to the PipedRDFIterator classes (plus unit
>tests) and get those committed later today.
>
>Rob
>
>
>On 10/2/13 11:35 AM, "Rob Vesse" <[email protected]> wrote:
>
>>Ok
>>
>>Having looked over the code again, the only obvious possible issue I
>>can see is the following scenario:
>>
>>1 - App launches a reader thread using a thread pool - reader is
>>assigned to thread X
>>2 - Reader thread fails and releases thread X to the pool
>>3 - App launches a writer thread using a thread pool - writer is
>>assigned to thread X
>>4 - Writer never detects the reader failure because it is on the same
>>thread as the reader originally was, and so the writer thread is
>>considered alive
>>
>>I.e. it may be a timing issue combined with a thread scheduling issue.
>>
>>I haven't attempted to construct a test case to test this theory - I
>>will do shortly - but this is the only obvious failure condition I can
>>see from the code. Of course there may be others; stack traces,
>>profiler output, etc. from when this occurs in your production scenario
>>might help here. If this is the case, then scheduling the reader and
>>writer threads on different thread pools would avoid any possibility of
>>this happening. I'll reply again to the thread once I've had a chance
>>to concoct a test case for this scenario.
>>
>>2.11.0 is the latest stable release version, but the PipedRDFIterator
>>code has not changed for the last couple of releases AFAIK, so I doubt
>>bumping the version will make much difference.
>>
>>Rob
>>
>>On 10/2/13 10:50 AM, "Norman Walsh" <[email protected]> wrote:
>>
>>>Rob Vesse <[email protected]> writes:
>>>> I've never seen the email you quote before, nor does it appear to be
>>>> in the mailing list archives - are you sure you actually sent it to
>>>> the list?
>>>
>>>Indeed, there it is, stuck in my outbound mail queue for seven days.
>>>Alas.
>>>
>>>> The PipedRDFIterator is supposed to handle errors sensibly; a
>>>> minimal reproducible test case with sample data where it does not
>>>> would be helpful.
>>>
>>>Naturally, all my attempts to write such a case have failed. But it
>>>consistently goes off into the weeds in the big, multi-threaded
>>>production app. *sigh*
>>>
>>>I'll keep hacking at it.
>>>
>>>> Also, what version of ARQ are you using?
>>>
>>>2.10.0 according to the POM.
>>>
>>> Be seeing you,
>>> norm
>>>
>>>--
>>>Norman Walsh <[email protected]> | It would not be better if things
>>>http://nwalsh.com/                | happened to men just as they
>>>                                  | wished. -- Heraclitus
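PS - the thread-pool liveness problem described above is easy to demonstrate in isolation. A minimal sketch, using plain java.util.concurrent and nothing Jena-specific:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Demonstrates why a writeSide.isAlive() style check is unreliable with
// thread pools: the producer task dies, but its worker thread survives
// and goes back to the pool, so it still reports itself as alive.
public class PoolLivenessDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Thread> worker = new CompletableFuture<>();

        // The "producer": records which thread it ran on, then dies.
        Runnable producer = () -> {
            worker.complete(Thread.currentThread());
            throw new RuntimeException("producer died");
        };
        Future<?> task = pool.submit(producer);

        Thread workerThread = worker.get();
        try {
            task.get();  // surfaces the producer's failure
        } catch (ExecutionException expected) {
            // the task is definitely dead at this point
        }

        // ...yet the pooled thread that ran it is still alive, ready to
        // be handed the next task, so a liveness check is fooled.
        System.out.println("worker thread alive after task death: "
                + workerThread.isAlive());
        pool.shutdown();
    }
}
```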

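For completeness, a sketch of the separate-pools workaround suggested above. The String queue and "DONE" sentinel here are illustrative stand-ins for the real RDF pipe, not the actual PipedRDFIterator API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the workaround: producer and consumer on *separate*
// single-thread executors, so a dead producer's thread can never be
// recycled to run the consumer and fool a liveness check.
public class SeparatePoolsDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService producerPool = Executors.newSingleThreadExecutor();
        ExecutorService consumerPool = Executors.newSingleThreadExecutor();
        BlockingQueue<String> pipe = new LinkedBlockingQueue<>();

        // Producer: writes two items then signals end-of-stream.
        producerPool.submit(() -> {
            pipe.add("triple-1");
            pipe.add("triple-2");
            pipe.add("DONE");
        });

        // Consumer: drains the pipe until it sees the sentinel.
        Future<Integer> consumed = consumerPool.submit(() -> {
            int count = 0;
            while (!pipe.take().equals("DONE"))
                count++;
            return count;
        });

        System.out.println("consumed " + consumed.get() + " items");
        producerPool.shutdown();
        consumerPool.shutdown();
    }
}
```

Because the two executors never share worker threads, the scenario in steps 1-4 above (reader fails, releases thread X, writer lands on thread X) cannot occur.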