Yes, there are two possibilities:

1 - We never see anything from the producer, so we have no way of
tracking its thread and knowing whether it is dead or alive
2 - The producer dies on a thread pool thread, meaning that in many
cases the actual JVM thread itself is still alive, so we cannot tell
that the producer died
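
Possibility 2 is easy to demonstrate in isolation. The sketch below is
plain java.util.concurrent code, not Jena code, and the names are
illustrative; it shows a pool thread remaining alive after the task it
ran has died:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolThreadLiveness {
    // Runs a failing task on a single-thread pool and reports whether
    // the pool thread that ran it is still alive afterwards.
    public static boolean poolThreadAliveAfterTaskDeath() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            final Thread[] worker = new Thread[1];
            Future<?> f = pool.submit(() -> {
                // Capture the pool thread, then die like a failing producer
                worker[0] = Thread.currentThread();
                throw new RuntimeException("producer died");
            });
            try {
                f.get();
            } catch (ExecutionException e) {
                // The task failed, but the pool thread goes back to the pool
            }
            return worker[0].isAlive();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(poolThreadAliveAfterTaskDeath()); // prints true
    }
}
```

So checking Thread.isAlive() on the producer's thread tells us nothing
once a pool is involved.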

Either way, we need some way to avoid deadlocking indefinitely. As part
of these changes both the poll timeout and the max poll attempts are now
configurable, so users can tweak these settings for systems under higher
load.
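
Roughly, the bounded polling behaves like the sketch below (the class,
method, and parameter names are illustrative, not the actual
PipedRDFIterator API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedPoll {
    // Give up after maxPollAttempts timed polls rather than blocking
    // forever on a producer that may be dead.
    public static String pollOrFail(BlockingQueue<String> queue,
                                    int maxPollAttempts,
                                    long pollTimeoutMs) throws InterruptedException {
        for (int attempt = 0; attempt < maxPollAttempts; attempt++) {
            String item = queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (item != null) {
                return item;
            }
            // Nothing arrived this round; retry up to the limit
        }
        throw new IllegalStateException(
            "Producer appears dead: no data after " + maxPollAttempts
            + " poll attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.put("triple");
        System.out.println(pollOrFail(q, 10, 50)); // prints triple
    }
}
```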

When the code bails out it can and does distinguish between the two
possibilities and throws an appropriate error, but that is as much as we
can do.

Rob

On 10/2/13 5:17 PM, "Andy Seaborne" <[email protected]> wrote:

>On 02/10/13 16:26, Rob Vesse wrote:
>> OK so I think I have solved this problem as far as is possible
>>
>> The iterator now has a configurable max poll attempts; if it fails to
>> read any data after some number of attempts (default 10) it will
>> declare the producer to be dead and error out.  I've also added in the
>> fix for the close() issue that you highlighted.
>
>Any reason the producer can't explicitly say the pipe is dead?
>
>(worry about timeouts and polling because thread scheduling can be
>really bad under load - 10s is unlikely but possible)
>
>       Andy
>
>>
>> I kicked off a deploy build, so if you try with 2.11.1-SNAPSHOT you
>> should no longer have problems, since the consumer thread should now
>> more reliably detect a dead producer and act accordingly.
>>
>> Rob
>>
>> On 10/2/13 12:48 PM, "Rob Vesse" <[email protected]> wrote:
>>
>>> So I mullered my terminology in my previous email:
>>>
>>> - reader -> producer
>>> - writer -> consumer
>>>
>>> I produced a test case that reproduced the exact scenario I outlined
>>> and was able to make a specific fix for that case.
>>>
>>> I have also produced a variation on your test case which I think
>>> shows some part of the issue. The problem of the writeSide variable
>>> never being set can easily be solved by setting it when the producer
>>> (writer) thread calls start() on the PipedRDFStream. BUT this only
>>> solves the problem in the case where the producer thread actually
>>> dies and gets far enough to call start().
>>>
>>> Where thread pools are involved it is possible for the task to die
>>> but the thread to remain, so even though writeSide is set that
>>> thread may still be alive, and the case of a broken write end may
>>> never be detected.
>>>
>>> However, in the type of scenario you show where a parser is used,
>>> the parser should always call finish() as part of its clean-up. As
>>> Andy points out, it looks like there is a case where the parser can
>>> fail prematurely so that neither start() nor finish() is ever
>>> actually called. It should be possible to set up a check so that an
>>> error is thrown if the producer thread never calls start(), though
>>> you have to be careful here because there is a race condition where
>>> both threads are fired off and the consumer makes its first check
>>> before the producer ever gets as far as calling start().
>>>
>>> Your fix for close() seems to be correct, though the error message
>>> is not. We merely need to allow hasNext() to bail out of the while
>>> (true) loop if close() has been called, so the correct error is the
>>> same as the one thrown earlier in the hasNext() method, i.e. "Pipe
>>> is closed".
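
[Editorial note: the hasNext()/close() behaviour being described might
be sketched with a toy iterator along these lines; the class and field
names are illustrative, not the real PipedRDFIterator internals.]

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ToyPipeIterator {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private volatile boolean closed = false;
    private String next = null;

    public void offer(String item) { queue.add(item); }

    public void close() { closed = true; }

    public boolean hasNext() {
        // Up-front check, as in the existing code
        if (closed) throw new IllegalStateException("Pipe is closed");
        if (next != null) return true;
        while (true) {
            // The fix: bail out of the wait loop with the same error
            // if close() was called while we were waiting for data
            if (closed) throw new IllegalStateException("Pipe is closed");
            try {
                next = queue.poll(10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            if (next != null) return true;
        }
    }
}
```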
>>>
>>> I'll finish up my tweaks to the PipedRDFIterator classes (plus unit
>>> tests) and get those committed later today.
>>>
>>> Rob
>>>
>>>
>>> On 10/2/13 11:35 AM, "Rob Vesse" <[email protected]> wrote:
>>>
>>>> Ok
>>>>
>>>> Having looked over the code again, the only obvious possible issue
>>>> I can see is the following scenario:
>>>>
>>>> 1 - App launches a reader thread using a thread pool - reader is
>>>> assigned to thread X
>>>> 2 - Reader thread fails and releases thread X to the pool
>>>> 3 - App launches a writer thread using a thread pool - writer is
>>>> assigned to thread X
>>>> 4 - Writer never detects the reader failure because it is on the
>>>> same thread as the reader originally was, and so the writer thread
>>>> is considered alive
>>>>
>>>> I.e. it may be a timing issue combined with a thread scheduling issue.
>>>>
>>>> I haven't attempted to construct a test case to test this theory -
>>>> I will do shortly - but this is the only obvious failure condition
>>>> I can see from the code. Of course there may be others; stack
>>>> traces/profiler output/etc. from when this occurs in your
>>>> production scenario might help here. If this is the case then
>>>> scheduling the reader and writer threads on different thread pools
>>>> would avoid any possibility of this happening. I'll reply again to
>>>> the thread once I've had a chance to concoct a test case for this
>>>> scenario.
>>>>
>>>> 2.11.0 is the latest stable release version, but the
>>>> PipedRDFIterator code has not changed for the last couple of
>>>> releases AFAIK, so I doubt bumping the version will make much
>>>> difference.
>>>>
>>>> Rob
>>>>
>>>> On 10/2/13 10:50 AM, "Norman Walsh" <[email protected]> wrote:
>>>>
>>>>> Rob Vesse <[email protected]> writes:
>>>>>> I've never seen the email you quote before, nor does it appear
>>>>>> to be in the mailing list archives. Are you sure you actually
>>>>>> sent it to the list?
>>>>>
>>>>> Indeed, there it is, stuck in my outbound mail queue for seven days.
>>>>> Alas.
>>>>>
>>>>>> The PipedRDFIterator is supposed to handle errors sensibly; a
>>>>>> minimal reproducible test case with sample data where it does
>>>>>> not would be helpful.
>>>>>
>>>>> Naturally, all my attempts to write such a case have failed. But it
>>>>> consistently goes off into the weeds in the big, multi-threaded
>>>>> production app. *sigh*
>>>>>
>>>>> I'll keep hacking at it.
>>>>>
>>>>>> Also what version of ARQ are you using?
>>>>>
>>>>> 2.10.0 according to the POM.
>>>>>
>>>>>                                         Be seeing you,
>>>>>                                           norm
>>>>>
>>>>> --
>>>>> Norman Walsh <[email protected]> | It would not be better if things
>>>>> http://nwalsh.com/            | happened to men just as they
>>>>>                               | wished. -- Heraclitus