Antonio,

Did this patch work for you?

I think cleanExpiredServerBlocks() could be coded a bit more efficiently, but I doubt it will make a significant difference.

If this fixes things for you then I will commit it, with a rewritten cleanExpiredServerBlocks().
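
For reference, one way the cleanup could be made cheaper is sketched below. In the posted patch, HOST_BLOCK_LIST.get(i) and HOST_BLOCK_LIST.remove(addr) each walk the LinkedList, so a single cleanup pass can cost up to quadratic time in the list length. This is only a sketch under assumptions, not the version that was committed: the class name ServerBlocks, its methods, and the generic key type K (InetAddress in the patch) are hypothetical, and it uses generics that the 2004 code base did not have. It also assumes that the value 0 marks an address that is currently in use and must be kept.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch, not the committed Nutch code.
// K stands in for the InetAddress keys used in the patch.
class ServerBlocks<K> {
  // Assumption: 0 is the marker stored while a fetch is running.
  private static final long IN_USE = 0L;

  // key -> earliest time (ms) the server may be contacted again, or IN_USE
  private final Map<K, Long> nextAccess = new HashMap<>();

  synchronized void markInUse(K key) {
    nextAccess.put(key, IN_USE);
  }

  synchronized void unblock(K key, long nextAccessTime) {
    nextAccess.put(key, nextAccessTime);
  }

  // One pass over the map; Iterator.remove() drops the current entry in
  // place, so no second list and no per-element search is needed.
  synchronized int cleanExpired(long now) {
    int removed = 0;
    Iterator<Map.Entry<K, Long>> it = nextAccess.entrySet().iterator();
    while (it.hasNext()) {
      long t = it.next().getValue();
      if (t != IN_USE && t <= now) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  synchronized int size() {
    return nextAccess.size();
  }
}
```

With a single map there is nothing to keep in step with a second list, which is the design choice the sketch is meant to illustrate.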

Cheers,

Doug

Stefan Groschupf wrote:

I'm very sorry, guys, I was a bit too fast. ;-(
I forgot a (nextAccessTime!=null &&) check, and that broke the V2 patch.
Well, the good thing is that most people on this list do not work over the weekend. ;-S
I have tried this patch on my server, and it looks like it is now limited only by the bandwidth. It would be great if someone could test it with more bandwidth.
I'm sorry, that's the price of trying to fix things too fast. I will take more care next time, and I hope people see my good will to help and the traffic as just a necessary side effect.
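
The reason the guard matters can be shown in a few lines: Map.get() returns null for an address that has no entry, and calling longValue() on that null throws a NullPointerException. The helper below is a hypothetical illustration, not code from the patch; the name isExpired and the String keys are assumptions made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

class NullGuardDemo {
  // Hypothetical helper: true only when an entry exists and its time has passed.
  static boolean isExpired(Map<String, Long> blocked, String host, long now) {
    Long nextAccessTime = blocked.get(host); // null when the host has no entry
    // The null test must come first; && short-circuits before longValue() runs.
    return nextAccessTime != null && nextAccessTime.longValue() <= now;
  }
}
```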



------------------------------------------------------------------------

Index: src/plugin/protocol-http/src/java/net/nutch/protocol/http/Http.java
===================================================================
RCS file: /cvsroot/nutch/nutch/src/plugin/protocol-http/src/java/net/nutch/protocol/http/Http.java,v
retrieving revision 1.2
diff -u -r1.2 Http.java
--- src/plugin/protocol-http/src/java/net/nutch/protocol/http/Http.java 3 Jun 2004 18:21:12 -0000 1.2
+++ src/plugin/protocol-http/src/java/net/nutch/protocol/http/Http.java 12 Jun 2004 14:23:29 -0000
@@ -8,9 +8,12 @@
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
+import java.util.LinkedList;
import java.util.logging.Level;
import java.util.logging.Logger;
+
+
import net.nutch.util.LogFormatter;
import net.nutch.util.NutchConf;
@@ -43,10 +46,11 @@
NutchConf.getInt("fetcher.server.delay", 1) * 1000;
private static HashMap BLOCKED_ADDRS = new HashMap();
-
+ private static LinkedList HOST_BLOCK_LIST = new LinkedList();
private RobotRulesParser robotRules = new RobotRulesParser();
private static InetAddress blockAddr(URL url) throws ProtocolException {
+ cleanExpiredServerBlocks();
InetAddress addr;
try {
addr = InetAddress.getByName(url.getHost());
@@ -54,35 +58,61 @@
throw new HttpException(e);
}
while (true) {
- InetAddress blockedAddress;
+ Long nextAccessTime;
synchronized (BLOCKED_ADDRS) {
- blockedAddress = (InetAddress)BLOCKED_ADDRS.get(addr);
- if (blockedAddress == null) { // addr is unblocked
- BLOCKED_ADDRS.put(addr, addr); // block it
- return addr; // and return
+ nextAccessTime = (Long) BLOCKED_ADDRS.get(addr);
+ if (nextAccessTime == null
+ || nextAccessTime.longValue() < System.currentTimeMillis()) {
+ BLOCKED_ADDRS.put(addr, new Long(0));
+ return addr;
+ } else if (nextAccessTime.longValue() == 0) {
+ try {
+ Thread.sleep(SERVER_DELAY);
+ continue;
+ } catch (InterruptedException e1) {
+ // do nothing
+ }
+ } else if (nextAccessTime.longValue() < System.currentTimeMillis()) {
+ try {
+ Thread.sleep(System.currentTimeMillis()
+ - nextAccessTime.longValue());
+ continue;
+ } catch (InterruptedException e1) {
+ // do nothing
+ }
}
}
- synchronized (blockedAddress) {
- try {
- blockedAddress.wait(SERVER_DELAY); // wait for it
- } catch (InterruptedException e) {}
- }
+
}
}
- private static void unblockAddr(InetAddress addr) {
- InetAddress blockedAddress;
+
+
+ private static void cleanExpiredServerBlocks() {
synchronized (BLOCKED_ADDRS) {
- blockedAddress = (InetAddress)BLOCKED_ADDRS.remove(addr);
+ int count = HOST_BLOCK_LIST.size();
+ Long nextAccessTime;
+ for (int i = count - 1; i >= 0; i--) {
+ InetAddress addr = (InetAddress) HOST_BLOCK_LIST.get(i);
+ nextAccessTime = (Long) BLOCKED_ADDRS.get(addr);
+ if (nextAccessTime != null
+ && nextAccessTime.longValue() <= System.currentTimeMillis()) {
+ BLOCKED_ADDRS.remove(addr);
+ HOST_BLOCK_LIST.remove(addr);
+ } else if(nextAccessTime == null){
+ HOST_BLOCK_LIST.remove(addr);
+ }
+ }
}
- if (blockedAddress == null)
- throw new RuntimeException("addr must be blocked!");
- synchronized (blockedAddress) {
- try {
- Thread.sleep(SERVER_DELAY); // delay a bit
- } catch (InterruptedException e) {}
+ }
- blockedAddress.notify(); // then wake waiting threads
+
+
+ private static void unblockAddr(InetAddress addr) {
+ synchronized (BLOCKED_ADDRS) {
+ BLOCKED_ADDRS.put(addr, new Long(System.currentTimeMillis()+SERVER_DELAY));
+ HOST_BLOCK_LIST.add(addr);
}
}



------------------------------------------------------------------------
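
A note on the hunk as posted, with a sketch of one alternative. In blockAddr() the first condition tests nextAccessTime.longValue() < System.currentTimeMillis(), and the in-use marker 0 always satisfies that test, so the two later branches look unreachable and an address that is in use would be handed out again immediately. The patch also calls Thread.sleep() while holding the BLOCKED_ADDRS monitor, which keeps other threads from calling unblockAddr() in the meantime. The sketch below is an assumption about how the same idea could be written, not the Nutch implementation: the class PoliteGate, its methods, and the generic key type K are hypothetical. It checks the in-use marker before the time comparison and uses wait(), which releases the monitor while waiting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Nutch implementation.
class PoliteGate<K> {
  private static final long IN_USE = 0L; // same marker value the patch stores
  private final Map<K, Long> nextAccess = new HashMap<>();
  private final long delayMs;

  PoliteGate(long delayMs) {
    this.delayMs = delayMs;
  }

  // Blocks until no other thread holds the key and its delay has elapsed.
  synchronized void acquire(K key) throws InterruptedException {
    while (true) {
      Long t = nextAccess.get(key);
      long now = System.currentTimeMillis();
      // Test the marker first: 0 would otherwise always look "expired".
      if (t == null || (t != IN_USE && t <= now)) {
        nextAccess.put(key, IN_USE);
        return;
      }
      long waitMs = (t == IN_USE) ? delayMs : t - now;
      // wait() releases the monitor; wait(0) would wait forever, so use >= 1.
      wait(Math.max(1L, waitMs));
    }
  }

  // Records when the key may be used again and wakes any waiting threads.
  synchronized void release(K key) {
    nextAccess.put(key, System.currentTimeMillis() + delayMs);
    notifyAll();
  }
}
```

A fetcher thread would call acquire(addr) before the request and release(addr) in a finally block afterwards, so the delay is counted from the end of the previous fetch.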


Stefan

---------------------------------------------------------------
enterprise information technology consulting
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers