[ http://issues.apache.org/jira/browse/NUTCH-306?page=comments#action_12416673 ]
Sami Siren commented on NUTCH-306:
--
This patch no longer seems to apply. Can you please attach a patch against
current svn trunk?
DistributedSearch.Client liveAddresses concurrency problem
--
Key: NUTCH-306
URL: http://issues.apache.org/jira/browse/NUTCH-306
Project: Nutch
Type: Bug
Components: searcher
Versions: 0.7, 0.8-dev
Reporter: Grant Glouser
Assignee: Sami Siren
Priority: Critical
Attachments: DistributedSearch.java-patch
Under heavy load, hits returned by DistributedSearch.Client can become out of
sync with the Client's live server list.
DistributedSearch.Client maintains an array of live search servers
(liveAddresses). This array is updated at intervals by a watchdog thread.
When the Client returns hits from a search, it tracks which hits came from
which server by saving an index into the liveAddresses array (as Hit.indexNo).
The problem occurs when the search servers cannot service some remote
procedure calls before the client times out (due to heavy load, for example).
If the Client returns some Hits from a search, and then the array of
liveAddresses changes while the Hits are still being used, the indexNos for
those Hits can become invalid, referring to different servers than the Hit
originated from (or no server at all!).
Symptoms of this problem include:
- ArrayIndexOutOfBoundsException (when the array of liveAddresses shrinks, a
Hit from the last server in liveAddresses in the previous update cycle now
has an indexNo past the end of the array)
- IOException: read past EOF (suppose a hit comes back from server A with a
doc number of 1000. The watchdog thread then updates liveAddresses, and now
the Hit looks like it came from server B, but server B has only 900
documents. Fetching details for the hit will read past EOF in server B's
index.)
- Of course, you could also get a silent failure in which you find a hit on
server A, but the details/summary are fetched from server B. To the user, it
would simply look like an incorrect or nonsense hit.
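The first and third symptoms can be reproduced in a few lines. This is only a
minimal sketch, not the actual Nutch code: the class StaleIndexDemo, the
simplified Hit, and the serverFor() helper are hypothetical stand-ins, and the
watchdog update is simulated by a plain assignment. Only the names
liveAddresses and indexNo come from the description above.

```java
// Hypothetical, simplified model of the bug; not the real DistributedSearch.Client.
class StaleIndexDemo {
    // Simplified stand-in for Hit: it remembers only a position in liveAddresses.
    static final class Hit {
        final int indexNo; // index into liveAddresses at the time of the search
        final int docNo;   // document number on the originating server
        Hit(int indexNo, int docNo) {
            this.indexNo = indexNo;
            this.docNo = docNo;
        }
    }

    // In the scenario described, a watchdog thread replaces this array at intervals.
    static volatile String[] liveAddresses = {"serverA", "serverB", "serverC"};

    // Resolves a hit back to a server using whatever liveAddresses holds right now.
    static String serverFor(Hit hit) {
        return liveAddresses[hit.indexNo];
    }

    public static void main(String[] args) {
        Hit fromA = new Hit(0, 1000);
        Hit fromC = new Hit(2, 17);
        System.out.println("before update: " + serverFor(fromA));

        // Simulated watchdog update: serverA timed out, so the array shrinks
        // and every remaining server shifts down by one position.
        liveAddresses = new String[] {"serverB", "serverC"};

        // Silent failure: the hit from serverA now resolves to serverB.
        System.out.println("after update:  " + serverFor(fromA));

        // The hit from the former last server now points past the end of the array.
        try {
            serverFor(fromC);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("stale indexNo " + fromC.indexNo + " is out of bounds");
        }
    }
}
```

In the real Client the update happens on another thread, so the window between
the search and the later lookup is what makes the failure load-dependent.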
We have solved this locally by removing the liveAddresses array. Instead,
the watchdog thread updates an array of booleans (the same size as the array
of defaultAddresses) that indicates whether each address responded to the
latest call from the watchdog thread. Hit.indexNo is then always an index
into the complete array of defaultAddresses, so it is stable and always valid.
Callers of getDetails()/getSummary()/etc. must still be aware that these
methods may return null when the corresponding server is unable to respond.
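The approach described above might look roughly like the following. This is a
sketch under assumptions, not the attached patch: the class name
StableIndexSketch, the live field, and the updateLiveness() and addressFor()
methods are hypothetical names chosen for illustration. Only defaultAddresses
and the idea of a same-sized boolean array come from the description.

```java
import java.util.Arrays;

// Hypothetical sketch of the stable-index approach; not the actual patch.
class StableIndexSketch {
    // Fixed for the lifetime of the client, so an index into it never goes stale.
    private final String[] defaultAddresses;

    // live[i] is true if defaultAddresses[i] answered the latest watchdog call.
    // The watchdog publishes a fresh array rather than mutating this one in place.
    private volatile boolean[] live;

    StableIndexSketch(String... defaultAddresses) {
        this.defaultAddresses = defaultAddresses.clone();
        boolean[] initial = new boolean[defaultAddresses.length];
        Arrays.fill(initial, true); // assumption: treat every server as live at startup
        this.live = initial;
    }

    // Called by the (hypothetical) watchdog after each round of pings.
    void updateLiveness(boolean[] responded) {
        if (responded.length != defaultAddresses.length) {
            throw new IllegalArgumentException("one flag per default address expected");
        }
        live = responded.clone();
    }

    // Resolves an indexNo to its server, or returns null if that server did not
    // respond to the latest watchdog call. Callers must handle the null case.
    String addressFor(int indexNo) {
        return live[indexNo] ? defaultAddresses[indexNo] : null;
    }

    public static void main(String[] args) {
        StableIndexSketch client = new StableIndexSketch("serverA", "serverB", "serverC");
        client.updateLiveness(new boolean[] {false, true, true});
        System.out.println(client.addressFor(0)); // serverA is down
        System.out.println(client.addressFor(2)); // index 2 still means serverC
    }
}
```

Because the index space never changes, a server dropping out can only turn a
lookup into null; it cannot redirect the lookup to a different server.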