On 02/10/13 16:26, Rob Vesse wrote:
OK so I think I have solved this problem as far as is possible.

The iterator now has a configurable maximum number of poll attempts; if it fails to read any data after that many attempts (default 10) it will declare the producer to be dead and error out. I've also added in the fix for the close() issue that you highlighted.
Any reason the producer can't explicitly say the pipe is dead? (I worry about timeouts and polling because thread scheduling can be really bad under load - 10s is unlikely but possible.)
Andy
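The bounded-polling strategy Rob describes (and the worst case Andy worries about) can be illustrated with plain JDK concurrency primitives. This is a minimal sketch of the technique, not Jena's PipedRDFIterator code; the class, method, and parameter names here are invented for the example:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the dead-producer detection strategy described above: the
 * consumer polls a shared queue with a timeout and, after a configurable
 * number of consecutive empty polls, assumes the producer has died.
 * (Hypothetical names, not the real Jena API.)
 */
public class BoundedPollConsumer {
    private final BlockingQueue<String> queue;
    private final int maxPolls;
    private final long pollTimeoutMs;

    public BoundedPollConsumer(BlockingQueue<String> queue, int maxPolls, long pollTimeoutMs) {
        this.queue = queue;
        this.maxPolls = maxPolls;
        this.pollTimeoutMs = pollTimeoutMs;
    }

    /** Returns the next item, or throws if the producer appears dead. */
    public String next() throws InterruptedException {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            String item = queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (item != null) {
                return item; // got data; the attempt counter is per-call, so it resets next call
            }
        }
        throw new IllegalStateException(
            "No data after " + maxPolls + " poll attempts - producer assumed dead");
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(10);
        BoundedPollConsumer consumer = new BoundedPollConsumer(q, 5, 10);
        q.put("triple-1");
        System.out.println(consumer.next()); // prints "triple-1"
        try {
            consumer.next(); // producer never writes again -> error after 5 empty polls
        } catch (IllegalStateException e) {
            System.out.println("detected dead producer");
        }
    }
}
```

Note the trade-off Andy raises: the total wait is maxPolls * pollTimeout, so under heavy load a slow-but-alive producer can be misdiagnosed as dead.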
I kicked off a deploy build, so if you try with 2.11.1-SNAPSHOT you should no longer have problems, since the consumer thread should now more reliably detect a dead producer and act accordingly.

Rob

On 10/2/13 12:48 PM, "Rob Vesse" <[email protected]> wrote:

So I mullered my terminology in my previous email:

- reader -> producer
- writer -> consumer

I produced a test case that reproduced the exact scenario I outlined and was able to make a specific fix for that case. I have also produced a variation on your test case which I think shows some part of the issue. The problem of the writeSide variable never being set can easily be solved by setting it when the producer (writer) thread calls start() on the PipedRDFStream. BUT this only solves the problem in the case where the producer thread actually dies and got far enough to call start().

Where thread pools are involved it is possible for the task to die but the thread to remain, so even though writeSide is set, that thread may still be alive and the case of a broken write end may never be detected.

However, in the type of scenario you show where a parser is used, the parser should always be calling finish() as part of its clean up. As Andy points out, though, it looks like there is a case where the parser can fail prematurely so that neither start() nor finish() is ever actually called. It should be possible to set up a check so that an error is thrown if the producer thread never calls start(), though you have to be careful with this one because there is a race condition: both threads are fired off, and the consumer may make its first check before the producer ever gets as far as calling start().

Your fix for close() seems to be correct, though the error message is not; we merely need to allow hasNext() to bail out of the while (true) loop if close() has been called, so the correct error is the same as the one thrown earlier in the hasNext() method, i.e.
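The start()/finish() lifecycle and the start-up race described above can be sketched as follows. This is a hypothetical standalone sketch (class and method names invented for illustration, not the real PipedRDFStream API), showing how a bounded wait for start() lets the consumer tolerate slow thread scheduling while still detecting a producer that dies before ever starting:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the producer lifecycle discussed above (invented names).
 * The producer records its thread via start() and signals completion via
 * finish(); the consumer waits a bounded time for start(), which avoids
 * the race where it checks before the producer has even been scheduled.
 */
public class PipeLifecycle {
    private final CountDownLatch started = new CountDownLatch(1);
    volatile Thread writeSide;          // set when the producer calls start()
    volatile boolean finished;          // set when the producer calls finish()

    public void start() {               // producer calls this first
        writeSide = Thread.currentThread();
        started.countDown();
    }

    public void finish() {              // producer calls this in its clean up
        finished = true;
    }

    /** Consumer-side check: did the producer call start() within the timeout? */
    public boolean awaitStart(long timeoutMs) throws InterruptedException {
        return started.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        PipeLifecycle pipe = new PipeLifecycle();
        Thread producer = new Thread(() -> { pipe.start(); pipe.finish(); });
        producer.start();
        // Without the bounded wait, the consumer could observe writeSide == null
        // purely because the producer thread has not been scheduled yet.
        if (!pipe.awaitStart(1000)) {
            throw new IllegalStateException("Producer never called start()");
        }
        producer.join();
        System.out.println("producer started and finished: " + pipe.finished);
    }
}
```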
"Pipe is closed".

I'll finish up my tweaks to the PipedRDFIterator classes (plus unit tests) and get those committed later today.

Rob

On 10/2/13 11:35 AM, "Rob Vesse" <[email protected]> wrote:

OK. Having looked over the code again, the only obvious possible issue I can see is the following scenario:

1. App launches a reader thread using a thread pool - the reader is assigned to thread X
2. The reader thread fails and releases thread X back to the pool
3. App launches a writer thread using a thread pool - the writer is assigned to thread X
4. The writer never detects the reader failure because it is on the same thread the reader originally was, and so the writer thread is considered alive

I.e. it may be a timing issue combined with a thread scheduling issue. I haven't attempted to construct a test case to test this theory - I will do shortly - but this is the only obvious failure condition I can see from the code. Of course there may be others; stack traces/profiler output/etc. from when this occurs in your production scenario might help here. If this is the case, then scheduling the reader and writer threads on different thread pools would avoid any possibility of this happening. I'll reply again to the thread once I've had a chance to concoct a test case for this scenario.

2.11.0 is the latest stable release version, but the PipedRDFIterator code has not changed for the last couple of releases AFAIK, so I doubt bumping the version will make much difference.

Rob

On 10/2/13 10:50 AM, "Norman Walsh" <[email protected]> wrote:

Rob Vesse <[email protected]> writes:

> I've never seen the email you quote before, nor does it appear to be in
> the mailing list archives. Are you sure you actually sent it to the list?

Indeed, there it is, stuck in my outbound mail queue for seven days. Alas.

> The PipedRDFIterator is supposed to handle errors sensibly; a minimal
> reproducible test case with sample data where it does not would be helpful

Naturally, all my attempts to write such a case have failed.
But it consistently goes off into the weeds in the big, multi-threaded production app. *sigh* I'll keep hacking at it.

> Also what version of ARQ are you using?

2.10.0 according to the POM.

                                        Be seeing you,
                                          norm

--
Norman Walsh <[email protected]> | It would not be better if things
http://nwalsh.com/            | happened to men just as they wished.
                              | -- Heraclitus
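The thread-pool scenario Rob outlines earlier in the thread (reader and writer tasks both landing on the same pooled thread X) can be demonstrated directly with the JDK: a task can fail while its pooled thread survives and is reused, so checking Thread.isAlive() on the reader's thread never reports the failure. A minimal sketch, with names invented for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Demonstrates the failure mode in the scenario above: with a thread pool,
 * a task can die while its Thread lives on, so Thread.isAlive() on the
 * reader's thread is not a reliable "reader is dead" signal.
 */
public class PoolReuseDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<Thread> readerThread = new AtomicReference<>();

        // "Reader" task: records its thread, then dies with an exception.
        // (The pool catches the exception internally; the thread survives.)
        pool.submit((Runnable) () -> {
            readerThread.set(Thread.currentThread());
            throw new RuntimeException("reader failed");
        });

        // "Writer" task: is assigned the very same pooled thread.
        Thread writerThread = pool.submit(Thread::currentThread).get();

        // The reader task is long dead, yet its thread is alive - and is
        // exactly the thread now running the writer.
        System.out.println("same thread: " + (writerThread == readerThread.get()));
        System.out.println("reader thread alive: " + readerThread.get().isAlive());

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

This is why Rob suggests separate thread pools for reader and writer: with distinct pools the writer's liveness check can never be fooled by its own thread having previously hosted the reader.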
