After further debugging, I still couldn't identify the exact problem, but I
noticed that if I comment out my code in the filter, it does not hang.

My code does the following:
- Obtain the list of outlinks from the JSParseFilter (pre-packaged in Nutch;
it just needs to be configured)
- Iterate over the links one by one and try to fetch the dimensions of each,
using the Java ImageIO/ImageReader classes (in case one of them is an image)
- Remember the largest one
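
The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual filter: the class name LargestImageFinder, the findLargest helper, and the choice of pixel area as the measure of "largest" are assumptions made for the example, and the per-link lookup is passed in as a function so the sketch runs without network access.

```java
import java.awt.Dimension;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class LargestImageFinder {

    /**
     * Returns the largest dimension (by pixel area, an assumption of this
     * sketch) reported for any link, or (0, 0) if no link has a positive size.
     */
    static Dimension findLargest(List<String> outlinks,
                                 Function<String, Dimension> sizeOf) {
        Dimension largest = new Dimension(0, 0);
        long largestArea = 0;
        for (String link : outlinks) {
            Dimension d = sizeOf.apply(link);
            // Use long arithmetic so very large images cannot overflow int.
            long area = (long) d.width * d.height;
            if (area > largestArea) {
                largestArea = area;
                largest = d;
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        // Stand-in for readImageDimension: hard-coded sizes, no network access.
        Function<String, Dimension> fakeSizeOf = link ->
                link.endsWith("big.png") ? new Dimension(800, 600)
              : link.endsWith("small.gif") ? new Dimension(16, 16)
              : new Dimension(0, 0); // not an image
        List<String> links = Arrays.asList(
                "http://example.com/page.html",
                "http://example.com/small.gif",
                "http://example.com/big.png");
        Dimension largest = findLargest(links, fakeSizeOf);
        System.out.println(largest.width + "x" + largest.height); // prints 800x600
    }
}
```

In the real filter, the function argument would be the readImageDimension method, which returns (0, 0) for links that are not images.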


This is the code:
// Imports needed: java.awt.Dimension, java.io.IOException, java.net.URL,
// java.util.Iterator, javax.imageio.ImageIO, javax.imageio.ImageReader,
// javax.imageio.stream.ImageInputStream
private Dimension readImageDimension(String urlString) {
        //return new Dimension(500, 500);
        long start = System.currentTimeMillis();
        ImageInputStream imageStream = null;
        try {
                URL url = new URL(urlString);
                imageStream = ImageIO.createImageInputStream(url.openStream());

                Iterator<ImageReader> readers = ImageIO.getImageReaders(imageStream);
                if (readers.hasNext()) {
                        ImageReader reader = readers.next();
                        // seekForwardOnly = true, ignoreMetadata = true
                        reader.setInput(imageStream, true, true);
                        int imageWidth = reader.getWidth(0);
                        int imageHeight = reader.getHeight(0);
                        reader.dispose();
                        return new Dimension(imageWidth, imageHeight);
                }
                // No reader for this format: fall through and return (0, 0).
        } catch (Throwable e) {
                e.printStackTrace();
        } finally {
                // Close on every path. The null check avoids a
                // NullPointerException when the stream was never opened.
                if (imageStream != null) {
                        try {
                                imageStream.close();
                        } catch (IOException e1) {
                                // ignored
                        }
                }
                long end = System.currentTimeMillis();
                System.out.println("calculate dimension takes: " + (end - start) + "ms");
        }
        return new Dimension(0, 0);
}

Note that I am consuming all exceptions that could occur here.
Whenever I comment out this code, which uses ImageIO/ImageReader to fetch the
link, the hang does not happen. I do notice that each readImageDimension call
takes somewhere between 50 and 200 ms. I wonder whether this delay in the
filter can cause timing issues in Nutch?
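
One thing that may be worth ruling out is a lookup that never returns. As far as I know, URL.openStream() applies no connect or read timeout unless one is configured, and closing an ImageInputStream that was created from an InputStream does not close that source stream. Below is a minimal sketch of a bounded variant, not a confirmed fix for the hang: the class name BoundedImageProbe and the two timeout values are hypothetical, and the demo in main reads a temporary local file so that it runs without network access.

```java
import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class BoundedImageProbe {

    // Hypothetical limits; tune them for the crawl.
    static final int CONNECT_TIMEOUT_MS = 2000;
    static final int READ_TIMEOUT_MS = 2000;

    /** Returns the image size, or (0, 0) if the URL is not a readable image. */
    static Dimension readImageDimension(String urlString) {
        try {
            URLConnection conn = new URL(urlString).openConnection();
            conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
            conn.setReadTimeout(READ_TIMEOUT_MS);
            // try-with-resources closes the image stream first, then the
            // underlying network stream, on every exit path.
            try (InputStream in = conn.getInputStream();
                 ImageInputStream imageStream = ImageIO.createImageInputStream(in)) {
                if (imageStream == null) {
                    return new Dimension(0, 0);
                }
                Iterator<ImageReader> readers = ImageIO.getImageReaders(imageStream);
                if (!readers.hasNext()) {
                    return new Dimension(0, 0); // unknown image format
                }
                ImageReader reader = readers.next();
                try {
                    reader.setInput(imageStream, true, true);
                    return new Dimension(reader.getWidth(0), reader.getHeight(0));
                } finally {
                    reader.dispose();
                }
            }
        } catch (IOException | RuntimeException e) {
            // Includes malformed URLs and timeouts (SocketTimeoutException).
            return new Dimension(0, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo against a temporary local PNG instead of a remote URL.
        File tmp = File.createTempFile("probe", ".png");
        try {
            ImageIO.write(new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB), "png", tmp);
            Dimension d = readImageDimension(tmp.toURI().toURL().toString());
            System.out.println(d.width + "x" + d.height); // prints 3x2
        } finally {
            tmp.delete();
        }
    }
}
```

With the timeouts set, a slow or unresponsive server should surface as a SocketTimeoutException and a (0, 0) result rather than a call that blocks indefinitely.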

Also, the hang ONLY occurs at the beginning of a new depth,
nondeterministically (at depth 2 of 10, depth 8 or 10, etc.)

Any help would be appreciated!
gary





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-2-fetcher-aborting-with-N-hung-threads-tp2411724p3002598.html
Sent from the Nutch - User mailing list archive at Nabble.com.
