Re: [Nutch-cvs] svn commit: r414681 - /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolFactory.java

2006-06-18 Thread Sami Siren



> +
> +  if (conf.getObject(protocolName) != null) {
> +    return (Protocol) conf.getObject(protocolName);
> +  } else {
> +    Extension extension = findExtension(protocolName);
> +    if (extension == null) {
> +      throw new ProtocolNotFound(protocolName);
> +    }



> I'm somewhat worried about a possible clash in the conf name-space.
> Usually, when we store Objects in a Configuration instance, we use
> their full class name, or at least a long and most probably unique
> string. In this case, we use just http, https, ftp, file and
> so on ... Would it make sense, in this special case, to use
> X_POINT + protocolName as the unique string?
>
> Perhaps I'm worrying too much ... ;)

I changed the code as you proposed. In the long run I would like to see
this kind of caching refactored into either Configuration or the plugin
system, or, if we start using some kind of component container, into
that.
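A minimal sketch of the namespaced lookup discussed above. It assumes a plain Map as a stand-in for the Configuration object store; ObjectCache, getProtocol, the factory parameter and the value given to X_POINT are all hypothetical, not the Nutch API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ProtocolCacheSketch {
    // Hypothetical extension-point prefix; the real constant's name and value may differ.
    static final String X_POINT = "org.apache.nutch.protocol.Protocol";

    /** Cache key = extension point + protocol name, so a bare "http" is never used. */
    static String cacheKey(String protocolName) {
        return X_POINT + protocolName;
    }

    /**
     * Returns the cached object for protocolName, creating and caching it on a miss.
     * The factory stands in for findExtension(...) plus instantiation.
     * Note: check-then-store is not atomic; concurrent misses may each create an instance.
     */
    static Object getProtocol(ObjectCache conf, String protocolName,
                              Function<String, Object> factory) {
        String key = cacheKey(protocolName);
        Object cached = conf.getObject(key);
        if (cached != null) {
            return cached;
        }
        Object created = factory.apply(protocolName);
        conf.setObject(key, created);
        return created;
    }

    public static void main(String[] args) {
        ObjectCache conf = new ObjectCache();
        conf.setObject("http", "unrelated object stored by another component");
        Object protocol = getProtocol(conf, "http", name -> "protocol:" + name);
        System.out.println(protocol);               // protocol:http
        System.out.println(conf.getObject("http")); // unrelated object stored by another component
    }
}

/** Hypothetical stand-in for Configuration's flat, String-keyed object store. */
class ObjectCache {
    private final Map<String, Object> objects = new HashMap<>();

    synchronized Object getObject(String key) { return objects.get(key); }

    synchronized void setObject(String key, Object value) { objects.put(key, value); }
}
```

With a bare "http" entry already present, the protocol instance goes under X_POINT + "http" and the other component's entry is left untouched.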


--
Sami Siren


[jira] Commented: (NUTCH-306) DistributedSearch.Client liveAddresses concurrency problem

2006-06-18 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-306?page=comments#action_12416673 ]

Sami Siren commented on NUTCH-306:
--

This patch no longer seems to apply. Can you please attach a patch against
the current svn trunk?

 DistributedSearch.Client liveAddresses concurrency problem
 --

         Key: NUTCH-306
         URL: http://issues.apache.org/jira/browse/NUTCH-306
     Project: Nutch
        Type: Bug
  Components: searcher
    Versions: 0.7, 0.8-dev
    Reporter: Grant Glouser
    Assignee: Sami Siren
    Priority: Critical
 Attachments: DistributedSearch.java-patch

 Under heavy load, hits returned by DistributedSearch.Client can fall out of
 sync with the Client's live server list.

 DistributedSearch.Client maintains an array of live search servers
 (liveAddresses), which a watchdog thread updates at intervals. When the
 Client returns hits from a search, it records which server each hit came
 from by saving an index into the liveAddresses array (as Hit.indexNo).

 The problem occurs when the search servers cannot service some remote
 procedure calls before the client times out (due to heavy load, for
 example). If the Client returns some Hits from a search and the
 liveAddresses array then changes while those Hits are still in use, their
 indexNos can become invalid, referring to a different server than the one
 the Hit originated from (or to no server at all!).

 Symptoms of this problem include:
 - ArrayIndexOutOfBoundsException (when the liveAddresses array shrinks, a Hit
   from the last server in liveAddresses in the previous update cycle now has
   an indexNo past the end of the array)
 - IOException: read past EOF (suppose a hit comes back from server A with a
   doc number of 1000. Then the watchdog thread updates liveAddresses, and now
   the Hit looks like it came from server B, but server B has only 900
   documents. Trying to get details for the hit will read past EOF in server
   B's index.)
 - Of course, you could also get a silent failure in which you find a hit on
   server A, but the details/summary are fetched from server B. To the user,
   it would simply look like an incorrect or nonsense hit.
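A single-threaded sketch of the first and third symptoms above. The watchdog update is simulated by reassigning the array; LiveAddressesHazard, SketchHit and serverFor are hypothetical simplifications, not the real DistributedSearch code:

```java
class LiveAddressesHazard {
    // Simplified stand-in for Client.liveAddresses; the "watchdog" replaces it wholesale.
    static String[] liveAddresses = {"serverA", "serverB", "serverC"};

    /** Resolves a hit's server through the *current* array, as an indexNo lookup would. */
    static String serverFor(SketchHit hit) {
        return liveAddresses[hit.indexNo];
    }

    public static void main(String[] args) {
        SketchHit fromB = new SketchHit(1, 1000); // returned by serverB
        SketchHit fromC = new SketchHit(2, 42);   // returned by serverC

        // Watchdog cycle: serverA misses the call and is dropped from the array.
        liveAddresses = new String[] {"serverB", "serverC"};

        // Silent misattribution: the hit from serverB now resolves to serverC.
        System.out.println(serverFor(fromB)); // serverC

        // Shrunken array: the hit from the last server now points past the end.
        try {
            serverFor(fromC);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException");
        }
    }
}

/** Hypothetical, simplified hit: a server position plus a document number. */
class SketchHit {
    final int indexNo;
    final int doc;

    SketchHit(int indexNo, int doc) {
        this.indexNo = indexNo;
        this.doc = doc;
    }
}
```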

 We have solved this locally by removing the liveAddresses array. Instead,
 the watchdog thread updates an array of booleans (the same size as the
 defaultAddresses array) indicating whether each address responded to the
 watchdog thread's latest call. Hit.indexNo is then always an index into the
 complete defaultAddresses array, so it is stable and always valid. Callers
 of getDetails()/getSummary()/etc. must still be aware that these methods may
 return null when the corresponding server is unable to respond.
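A minimal sketch of the fix described above. The report only says an array of booleans is updated; publishing a fresh boolean[] through a volatile field is an assumption made here for cross-thread visibility, and StableIndexClient, updateLiveness and serverFor are hypothetical names:

```java
import java.util.Arrays;

class StableIndexClient {
    // Configured servers; never reassigned, so an indexNo stays valid.
    private final String[] defaultAddresses;
    // One flag per defaultAddresses entry. The watchdog publishes a fresh array
    // through this volatile field instead of mutating the old one in place.
    private volatile boolean[] live;

    StableIndexClient(String... defaultAddresses) {
        this.defaultAddresses = defaultAddresses.clone();
        boolean[] initial = new boolean[defaultAddresses.length];
        Arrays.fill(initial, true);
        this.live = initial;
    }

    /** Called by the watchdog thread with one result per default address. */
    void updateLiveness(boolean[] responded) {
        if (responded.length != defaultAddresses.length) {
            throw new IllegalArgumentException("need one flag per default address");
        }
        live = responded.clone();
    }

    /** Server for a hit's indexNo, or null if it did not answer the last check. */
    String serverFor(int indexNo) {
        return live[indexNo] ? defaultAddresses[indexNo] : null;
    }

    public static void main(String[] args) {
        StableIndexClient client = new StableIndexClient("serverA", "serverB", "serverC");
        int indexNo = 1; // hit returned by serverB

        client.updateLiveness(new boolean[] {false, true, true}); // serverA down
        System.out.println(client.serverFor(indexNo)); // serverB: index still valid

        client.updateLiveness(new boolean[] {true, false, true}); // serverB down
        System.out.println(client.serverFor(indexNo)); // null: caller must handle it
    }
}
```

Because indexNo always refers to defaultAddresses, another server dropping out no longer shifts existing hits; the hit's own server going down only makes the lookup return null, matching the caveat about getDetails()/getSummary().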

-- 
This message is automatically generated by JIRA.