[ http://issues.apache.org/jira/browse/NUTCH-344?page=all ]
Greg Kim updated NUTCH-344:
---------------------------
Affects Version/s: 0.8
> Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks
> -------------------------------------------------------------------------
>
> Key: NUTCH-344
> URL: http://issues.apache.org/jira/browse/NUTCH-344
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8, 0.9.0, 0.8.1
> Environment: All
> Reporter: Greg Kim
> Attachments: cleanExpiredServerBlocks.patch
>
>
> A recent change to the following code in HttpBase.java tends to block
> fetcher threads while one thread busy-waits:
>   private static void cleanExpiredServerBlocks() {
>     synchronized (BLOCKED_ADDR_TO_TIME) {
>       while (!BLOCKED_ADDR_QUEUE.isEmpty()) {   <===== LINE 3
>         String host = (String) BLOCKED_ADDR_QUEUE.getLast();
>         long time = ((Long) BLOCKED_ADDR_TO_TIME.get(host)).longValue();
>         if (time <= System.currentTimeMillis()) {
>           BLOCKED_ADDR_TO_TIME.remove(host);
>           BLOCKED_ADDR_QUEUE.removeLast();
>         }
>       }
>     }
>   }
> LINE 3: As long as there are *any* entries in BLOCKED_ADDR_QUEUE, the
> first thread to enter this block busy-waits until the queue is empty,
> because the loop has no exit path when the last entry has not yet expired.
> Meanwhile all other threads block on the synchronized block. This leads to
> extremely poor fetcher performance.
> Since the check-in that respects crawlDelay in robots.txt, we are no longer
> guaranteed that BLOCKED_ADDR_QUEUE is a FIFO list. The simple fix is to
> iterate over the queue once rather than busy-waiting...
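
For illustration, a single-pass cleanup along the lines described above
might look like the sketch below. This is not the attached patch, which is
not shown here. The class name, the block() helper, the generic field types,
and the explicit `now` parameter are assumptions made so the example runs on
its own; only the two field names and the method name come from the report.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical stand-alone class; in Nutch these members live in HttpBase.java.
class ExpiredBlockSweep {
    // Stand-ins for the static fields named in the report (types are assumed).
    static final Map<String, Long> BLOCKED_ADDR_TO_TIME = new HashMap<>();
    static final LinkedList<String> BLOCKED_ADDR_QUEUE = new LinkedList<>();

    // Hypothetical helper: record that a host is blocked until the given time.
    static void block(String host, long until) {
        synchronized (BLOCKED_ADDR_TO_TIME) {
            BLOCKED_ADDR_TO_TIME.put(host, until);
            BLOCKED_ADDR_QUEUE.addFirst(host);
        }
    }

    // Walks the queue exactly once and drops every expired entry.
    // Entries that have not expired are skipped instead of being waited on,
    // so the lock is held only for the duration of one pass.
    static int cleanExpiredServerBlocks(long now) {
        int removed = 0;
        synchronized (BLOCKED_ADDR_TO_TIME) {
            Iterator<String> it = BLOCKED_ADDR_QUEUE.iterator();
            while (it.hasNext()) {
                String host = it.next();
                Long time = BLOCKED_ADDR_TO_TIME.get(host);
                // A missing map entry is treated as expired (an assumption).
                if (time == null || time <= now) {
                    BLOCKED_ADDR_TO_TIME.remove(host);
                    it.remove(); // safe removal during iteration
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Expiry times are deliberately out of order, as with crawlDelay.
        block("a.example", 100L);
        block("b.example", 5000L);
        block("c.example", 200L);
        int removed = cleanExpiredServerBlocks(1000L);
        System.out.println(removed + " removed, still blocked: " + BLOCKED_ADDR_QUEUE);
    }
}
```

Because the pass does not assume the queue is ordered by expiry time, an
unexpired entry at the tail no longer keeps the lock held. The cost is a
full scan of the queue on each call instead of stopping at the first
unexpired entry.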
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers