[ http://issues.apache.org/jira/browse/NUTCH-306?page=comments#action_12416673 ]
Sami Siren commented on NUTCH-306:
----------------------------------

This patch no longer seems to apply. Could you please attach a patch against the current svn trunk?

> DistributedSearch.Client liveAddresses concurrency problem
> ----------------------------------------------------------
>
>          Key: NUTCH-306
>          URL: http://issues.apache.org/jira/browse/NUTCH-306
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Versions: 0.7, 0.8-dev
>     Reporter: Grant Glouser
>     Assignee: Sami Siren
>     Priority: Critical
>  Attachments: DistributedSearch.java-patch
>
> Under heavy load, hits returned by DistributedSearch.Client can become out of
> sync with the Client's live server list.
>
> DistributedSearch.Client maintains an array of live search servers
> (liveAddresses). This array is updated at intervals by a watchdog thread.
> When the Client returns hits from a search, it tracks which hits came from
> which server by saving an index into the liveAddresses array (as Hit.indexNo).
>
> The problem occurs when the search servers cannot service some remote
> procedure calls before the client times out (due to heavy load, for example).
> If the Client returns some Hits from a search, and then the array of
> liveAddresses changes while the Hits are still being used, the indexNos for
> those Hits can become invalid, referring to different servers than the Hit
> originated from (or no server at all!).
>
> Symptoms of this problem include:
>
> - ArrayIndexOutOfBoundsException (when the array of liveAddresses shrinks, a
>   Hit from the last server in liveAddresses in the previous update cycle now
>   has an indexNo past the end of the array)
>
> - IOException: read past EOF (suppose a hit comes back from server A with a
>   doc number of 1000. Then the watchdog thread updates liveAddresses and now
>   the Hit looks like it came from server B, but server B only has 900
>   documents. Trying to get details for the hit will read past EOF in server
>   B's index.)
>
> - Of course, you could also get a "silent" failure in which you find a hit on
>   server A, but the details/summary are fetched from server B. To the user, it
>   would simply look like an incorrect or nonsense hit.
>
> We have solved this locally by removing the liveAddresses array. Instead,
> the watchdog thread updates an array of booleans (same size as the array of
> defaultAddresses) that indicate whether that address responded to the latest
> call from the watchdog thread. Hit.indexNo is then always an index into the
> complete array of defaultAddresses, so it is stable and always valid.
>
> Callers of getDetails()/getSummary()/etc. must still be aware that these
> methods may return null when the corresponding server is unable to respond.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
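The fix described in the issue (a fixed-size boolean array parallel to defaultAddresses, with Hit.indexNo always indexing defaultAddresses) can be sketched roughly as follows. This is a minimal illustration, not the attached DistributedSearch.java-patch: the class and method names (StableServerIndexSketch, updateLiveServers, addressOf) are hypothetical, addresses are plain strings instead of socket addresses, and the issue does not say how the boolean array is shared between threads, so this sketch assumes a copy-on-write array held in a volatile field.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the NUTCH-306 approach: keep a boolean "live" flag per
 * default address instead of a shrinking liveAddresses array, so a Hit's
 * indexNo always refers to the same server. Names are illustrative only.
 */
public class StableServerIndexSketch {

    /** Illustrative hit: indexNo indexes defaultAddresses, never a live subset. */
    static final class Hit {
        final int indexNo;
        final int docNo;

        Hit(int indexNo, int docNo) {
            this.indexNo = indexNo;
            this.docNo = docNo;
        }
    }

    static final class Client {
        private final String[] defaultAddresses;
        // Assumption: the watchdog replaces the whole array (copy-on-write) and the
        // field is volatile, so readers always see a complete, consistent snapshot.
        private volatile boolean[] liveServer;

        Client(String[] defaultAddresses) {
            this.defaultAddresses = defaultAddresses.clone();
            this.liveServer = new boolean[defaultAddresses.length];
            Arrays.fill(this.liveServer, true);
        }

        /** Called by the watchdog thread with one flag per default address. */
        void updateLiveServers(boolean[] responded) {
            if (responded.length != defaultAddresses.length) {
                throw new IllegalArgumentException("expected one flag per default address");
            }
            liveServer = responded.clone();
        }

        /** The originating server of a hit; unaffected by watchdog updates. */
        String addressOf(Hit hit) {
            return defaultAddresses[hit.indexNo];
        }

        /**
         * Stand-in for getDetails()/getSummary(): returns null when the hit's
         * server did not respond to the latest watchdog call.
         */
        String getDetails(Hit hit) {
            if (!liveServer[hit.indexNo]) {
                return null;
            }
            // A real client would make a remote call to defaultAddresses[hit.indexNo] here.
            return "details for doc " + hit.docNo + " from " + addressOf(hit);
        }
    }

    public static void main(String[] args) {
        Client client = new Client(new String[] {"serverA:9999", "serverB:9999", "serverC:9999"});
        Hit hit = new Hit(2, 1000); // a hit returned by serverC

        System.out.println(client.getDetails(hit));

        // serverB stops responding. With a shrinking liveAddresses array, index 2
        // would now be out of bounds; here it still means serverC.
        client.updateLiveServers(new boolean[] {true, false, true});
        System.out.println(client.getDetails(hit));

        // serverC stops responding: the caller gets null, not another server's data.
        client.updateLiveServers(new boolean[] {true, true, false});
        System.out.println(client.getDetails(hit));
    }
}
```

Running main prints the serverC details twice and then null. The trade-off is the one the issue points out: indexNo can no longer be wrong, but callers must handle a null result when a hit's server has gone away.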
