After further debugging, I still couldn't identify the exact problem, but I
noticed that if I comment out my code in the filter, it does not hang.
My code does the following:
- Obtain the list of outlinks from the JSParseFilter (bundled with Nutch; it
only needs to be configured)
- Iterate over the links one by one and try to fetch the dimensions of each,
using the Java ImageIO/ImageReader classes (in case one of them is an image)
- Remember the largest one
This is the code:
private Dimension readImageDimension(String urlString) {
    //return new Dimension(500, 500);
    long start = System.currentTimeMillis();
    URL url = null;
    ImageInputStream imageStream = null;
    try {
        url = new URL(urlString);
        imageStream = ImageIO.createImageInputStream(url.openStream());
        java.util.Iterator<ImageReader> readers =
                ImageIO.getImageReaders(imageStream);
        ImageReader reader = null;
        if (readers.hasNext()) {
            reader = readers.next();
            reader.setInput(imageStream, true, true);
            int imageWidth = reader.getWidth(0);
            int imageHeight = reader.getHeight(0);
            reader.dispose();
            imageStream.close();
            return new Dimension(imageWidth, imageHeight);
        } else {
            imageStream.close();
            //can't read image format... what do you want to do about it,
            //throw an exception, return ?
        }
    } catch (Throwable e) {
        e.printStackTrace();
        try {
            imageStream.close();
        } catch (IOException e1) {
        }
    } finally {
        long end = System.currentTimeMillis();
        System.out.println("calculate dimension takes: "
                + (end - start) + "ms");
    }
    return new Dimension(0, 0);
}
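
For context, the loop around this method (iterate the outlinks, measure each
one, remember the largest) can be sketched roughly as below. This is only an
illustration, not the actual filter code: LargestImagePicker, pickLargest and
the injected reader function are hypothetical names, "largest" is assumed to
mean largest pixel area, and the lambda syntax assumes Java 8 or later.

```java
import java.awt.Dimension;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Nutch.
final class LargestImagePicker {

    /**
     * Returns the link whose reported dimension has the largest area,
     * or null if every link reports a zero area (for example, no images).
     * The reader function stands in for readImageDimension.
     */
    static String pickLargest(List<String> outlinks,
                              Function<String, Dimension> reader) {
        String best = null;
        long bestArea = 0;
        for (String link : outlinks) {
            Dimension d = reader.apply(link);
            // Widen to long before multiplying to avoid int overflow.
            long area = (long) d.width * d.height;
            if (area > bestArea) {
                bestArea = area;
                best = link;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stubbed dimensions so the sketch runs without network access.
        Map<String, Dimension> fake = new HashMap<>();
        fake.put("http://example.com/a.png", new Dimension(10, 10));
        fake.put("http://example.com/b.png", new Dimension(200, 100));
        List<String> links = Arrays.asList(
                "http://example.com/a.png", "http://example.com/b.png");
        System.out.println(pickLargest(links, fake::get));
    }
}
```

Passing the reader in as a function is just a convenience so the loop can be
exercised with stubbed dimensions instead of real HTTP requests.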
Note that I am consuming all exceptions that could occur in
readImageDimension.
Whenever I comment out this code, which uses ImageReader/ImageIO to fetch each
link, the hang does not happen. I do notice that each readImageDimension call
takes somewhere between 50 and 200 ms. I wonder if this delay in the filter
can cause timing issues in Nutch?
Also, the hang ONLY occurs at the beginning of a new depth,
nondeterministically (at depth 2 of 10, depth 8 of 10, etc.)
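
One possibility that may be worth ruling out: url.openStream() applies no
connect or read timeout by default, so a single stalled server could block the
calling thread for much longer than the 50-200 ms measured on successful
calls. Whether that is what makes the fetcher report hung threads is only a
guess. Below is a minimal sketch of the same dimension lookup with explicit
timeouts. TimedImageDimension, fromUrl, fromStream and the 5000 ms value are
assumptions made for illustration, and try-with-resources assumes Java 7 or
later.

```java
import java.awt.Dimension;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper, not part of Nutch.
final class TimedImageDimension {

    // Assumed value; tune to whatever the crawl can tolerate.
    static final int TIMEOUT_MS = 5000;

    /** Returns the image dimension, or 0x0 on any failure or timeout. */
    static Dimension fromUrl(String urlString) {
        try {
            URLConnection conn = new URL(urlString).openConnection();
            conn.setConnectTimeout(TIMEOUT_MS); // limit on connecting
            conn.setReadTimeout(TIMEOUT_MS);    // limit on each blocking read
            try (InputStream in = conn.getInputStream()) {
                return fromStream(in);
            }
        } catch (IOException | RuntimeException e) {
            // SocketTimeoutException is an IOException and lands here too.
            return new Dimension(0, 0);
        }
    }

    /** Reads only the image header from an already opened stream. */
    static Dimension fromStream(InputStream in) throws IOException {
        // createImageInputStream may return null; try-with-resources
        // skips close() for a null resource.
        try (ImageInputStream stream = ImageIO.createImageInputStream(in)) {
            if (stream == null) {
                return new Dimension(0, 0);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
            if (!readers.hasNext()) {
                return new Dimension(0, 0); // not a recognized image format
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(stream, true, true);
                return new Dimension(reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The null check and try-with-resources also avoid calling close() on a stream
that was never opened, which the catch block in the method above would do if
url.openStream() itself failed.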
Any help would be appreciated!
gary
--
View this message in context:
http://lucene.472066.n3.nabble.com/Nutch-1-2-fetcher-aborting-with-N-hung-threads-tp2411724p3002598.html
Sent from the Nutch - User mailing list archive at Nabble.com.