On 2/10/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
Yes, good catch! If there are multiple threads trying to dequeue, they
all wait, and
they all are notified, and wake up, but only the first one gets the
element. The rest
will return null without having waited the full time of their timeout,
perhaps.
Another problem: the notifyAll wakes up all the threads before the
timeout. So,
the next time around in the "while" loop, the timeout value needs to be
adjusted to
compensate for the time already elapsed.
This kind of code occurs in other places, too. Maybe it could be
refactored so
that fixes in one spot would fix all uses.
Another problem I see: the enqueue lets things get added beyond the max
size of the
queue, in several cases (e.g. if enqueuing an EOF token, or if the cpm
is marked not-running).
Once that happens, the enqueue test for queue full fails, and the queue
can grow without bounds. Maybe this never occurs currently, but it's a
risky design.
Maybe it could occur if the CPM is paused? (Not sure about the
semantics of
"not-running")?
I think in fact there is no need for a bounded-size work queue, since
we have a limited CAS pool size. IIRC the queue sizes used to be
settable in the CPE descriptor but no longer are; now they are all set
to the CAS pool size. Thus we never hit the condition where a thread
is trying to add something to a queue and the queue is full.
However, we may hit the case where multiple threads are trying to
dequeue something but the queue is empty. That seems like it would
happen if the CAS Consumers are slow, so that the queue feeding
the Analysis Engine pipelines would be empty and all of the pipelines
would be trying to dequeue from it simultaneously. So it seems like
the dequeue() returning null issue could arise in pratice, unless
there are further checks for null by the caller, which I did not look
for. My guess is those checks are there, that's the only explanation
I can think of for why this didn't result in a serious bug.
-Adam