[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ]
Sami Siren reassigned NUTCH-306:
--------------------------------
Assign To: Sami Siren
> DistributedSearch.Client liveAddresses concurrency problem
> ----------------------------------------------------------
>
> Key: NUTCH-306
> URL: http://issues.apache.org/jira/browse/NUTCH-306
> Project: Nutch
> Type: Bug
> Components: searcher
> Versions: 0.7, 0.8-dev
> Reporter: Grant Glouser
> Assignee: Sami Siren
> Priority: Critical
> Attachments: DistributedSearch.java-patch
>
> Under heavy load, hits returned by DistributedSearch.Client can become out of
> sync with the Client's live server list.
> DistributedSearch.Client maintains an array of live search servers
> (liveAddresses). This array is updated at intervals by a watchdog thread.
> When the Client returns hits from a search, it tracks which hits came from
> which server by saving an index into the liveAddresses array (as Hit.indexNo).
> The problem occurs when the search servers cannot service some remote
> procedure calls before the client times out (due to heavy load, for example).
> If the Client returns some Hits from a search, and the liveAddresses array
> then changes while those Hits are still in use, their indexNos can become
> invalid: they may refer to a different server than the one the Hit came
> from, or to no server at all.
> Symptoms of this problem include:
> - ArrayIndexOutOfBoundsException (when the array of liveAddresses shrinks, a
> Hit from the last server in liveAddresses in the previous update cycle now
> has an indexNo past the end of the array)
> - IOException: read past EOF (suppose a hit comes back from server A with a
> doc number of 1000. Then the watchdog thread updates liveAddresses and now
> the Hit looks like it came from server B, but server B only has 900
> documents. Trying to get details for the hit will read past EOF in server
> B's index.)
> - Of course, you could also get a "silent" failure in which you find a hit on
> server A, but the details/summary are fetched from server B. To the user, it
> would simply look like an incorrect or nonsense hit.
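The first and third symptoms above can be reproduced with a small model. This is only a sketch: StaleIndexDemo, its Hit class, and resolve() are hypothetical stand-ins, not the actual Nutch classes, and the watchdog update is simulated by a plain assignment rather than a separate thread.

```java
// Hypothetical, simplified model of the problem; not the real DistributedSearch.Client.
class StaleIndexDemo {
    // Stand-in for the live server list, replaced wholesale on each watchdog cycle.
    static volatile String[] liveAddresses = {"serverA", "serverB", "serverC"};

    // Stand-in for a Hit: it remembers only an index into liveAddresses.
    static final class Hit {
        final int indexNo;
        final int docNo;

        Hit(int indexNo, int docNo) {
            this.indexNo = indexNo;
            this.docNo = docNo;
        }
    }

    // Resolves a hit to a server by indexing into whatever the array holds right now.
    static String resolve(Hit hit) {
        return liveAddresses[hit.indexNo];
    }

    public static void main(String[] args) {
        Hit fromA = new Hit(0, 1000); // found on serverA
        Hit fromC = new Hit(2, 5);    // found on serverC

        // Simulated watchdog cycle: serverA timed out and was dropped from the list.
        liveAddresses = new String[] {"serverB", "serverC"};

        // Silent failure: the hit from serverA now resolves to serverB.
        System.out.println("fromA resolves to " + resolve(fromA));

        // Index 2 is now past the end of the shrunken array.
        try {
            resolve(fromC);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("fromC: index past end of liveAddresses");
        }
    }
}
```

In the real client the update happens on another thread, so the same drift occurs without any visible assignment in the calling code.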
> We have solved this locally by removing the liveAddresses array. Instead,
> the watchdog thread updates an array of booleans (the same size as the
> defaultAddresses array), each of which indicates whether the corresponding
> address responded to the latest call from the watchdog thread. Hit.indexNo
> is then always an index into the complete defaultAddresses array, so it is
> stable and always valid.
> Callers of getDetails()/getSummary()/etc. must still be aware that these
> methods may return null when the corresponding server is unable to respond.
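The approach described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the attached patch: StableIndexClient, updateLiveness(), and addressFor() are hypothetical names, and publishing a fresh boolean array through a volatile field is one possible way to make the watchdog's update visible to search threads.

```java
// Hypothetical sketch of the described fix; names are illustrative, not from the patch.
class StableIndexClient {
    // Fixed for the lifetime of the client, so an index into it never changes meaning.
    private final String[] defaultAddresses;

    // One flag per default address; the whole array is replaced on each watchdog cycle.
    private volatile boolean[] live;

    StableIndexClient(String... defaultAddresses) {
        this.defaultAddresses = defaultAddresses.clone();
        // Every server is treated as down until the first watchdog pass reports otherwise.
        this.live = new boolean[defaultAddresses.length];
    }

    // Called by the watchdog thread with one "responded" flag per default address.
    void updateLiveness(boolean[] responded) {
        if (responded.length != defaultAddresses.length) {
            throw new IllegalArgumentException("expected one flag per default address");
        }
        live = responded.clone();
    }

    // Returns the server for a hit's indexNo, or null if that server is currently down.
    String addressFor(int indexNo) {
        return live[indexNo] ? defaultAddresses[indexNo] : null;
    }
}
```

With this shape, a hit recorded with indexNo 0 either resolves to the first default address or to null. It never resolves to a different server, which is why callers still need the null check mentioned above.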