I've been working for a couple of weeks on a simple focused crawler based on
Nutch.
I use the score field to assign a priority to each URL to be crawled, by
means of a particular Prioritizer implementation (which could of course
also be the current Nutch link analysis algorithm).
I iterate the basic cycle (generate segment, fetch, updatedb), but in
place of the analyzer I call the ad hoc prioritizer. Each iteration
produces a new segment.
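
The cycle described above can be sketched roughly as follows. This is only
an illustration: CrawlSteps, FocusedCrawlLoop and their methods are
hypothetical names, not Nutch classes, and running the prioritizer right
after updatedb is an assumption about where the analyzer step would sit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: these are NOT Nutch API calls, only stand-ins
// for the four steps named in the message.
interface CrawlSteps {
    String generateSegment();      // returns the name of the new segment
    void fetch(String segment);
    void updateDb(String segment);
    void prioritize();             // takes the place of the link analyzer
}

class FocusedCrawlLoop {
    // Runs the cycle `iterations` times; each pass creates one new segment.
    // Returns the segment names in the order they were created.
    static List<String> run(CrawlSteps steps, int iterations) {
        List<String> segments = new ArrayList<String>();
        for (int i = 0; i < iterations; i++) {
            String segment = steps.generateSegment();
            steps.fetch(segment);
            steps.updateDb(segment);
            steps.prioritize();    // assumed position: after updatedb
            segments.add(segment);
        }
        return segments;
    }
}
```

With this shape, 20 iterations leave 20 segments on disk, which matters for
any component that later opens all of them at once.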
However, when I instantiate the MultiSearcher inside the cycle to run a
query, for example to show some statistics, I get the "Too many open files"
message after nearly 20 iterations (fewer than 1000 URLs), that is, after
20 Searcher calls. I take care to close the Searcher when I have finished
with it, and I have also raised the maximum open-files limit, but the
problem persists.
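
One way to check whether descriptors are really released after each
Searcher is closed is to count the entries in /proc/self/fd before and
after every iteration. The sketch below is a generic Linux diagnostic, not
part of Nutch or Lucene; OpenFileCount and countOpenDescriptors are
hypothetical names chosen for illustration.

```java
import java.io.File;

class OpenFileCount {
    // Returns the number of file descriptors currently held by this JVM,
    // or -1 if /proc/self/fd cannot be listed (for example on non-Linux
    // systems). Listing the directory briefly opens one descriptor itself,
    // so the value carries a small constant offset; only the trend across
    // iterations is meaningful.
    static int countOpenDescriptors() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }
}
```

If the value printed after each close keeps growing by roughly the same
amount per iteration, something opened during the query is probably not
being released; if it stays flat, the limit is more likely being reached by
the number of segment files opened at the same time.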
Any suggestions?
Thanks

Fabio Gasparetti



Nutch: 0.4
Java: 1.4.2_01
OS: Red Hat Linux 7.1
RAM: 1 GB


040628 150916 10 SEVERE Exception in CrawlerStat call:java.io.FileNotFoundException: pluto/segments/20040628150731/fetcher_text/index (Too many open files)
040628 150916 10 indexing segment: pluto/segments/20040628150903
java.lang.NullPointerException
        at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:141)
        at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:128)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:193)
        at net.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:49)
        at net.nutch.indexer.IndexSegment.main(IndexSegment.java:182)
        at com.parc.search.focusedcrawler.FocusedCrawlTool.run(FocusedCrawlTool.java:173)
        at com.parc.search.focusedcrawler.FocusedCrawlTool.main(FocusedCrawlTool.java:388)
040628 150916 10 SEVERE java.lang.NullPointerException





_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
