Hello,
I am trying to get a Plugin together. So far it looks quite ok. The
Plugin contains one IndexingFilter Extension.
To test the integration within Nutch, I run
org.apache.nutch.tools.CrawlTool with several arguments.
The strange thing is that the shutDown() method of my Plugin is never
called.
Since I couldn't see my Plugin.shutDown() method being called, I put some
additional log output in PluginRepository.java:
one at the beginning of method installExtensions():
LOG.info("**********installExtensions");
one at the beginning of method shotDownActivatedPlugins():
LOG.info("**********shotDownActivatedPlugins");
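For illustration, here is a minimal, self-contained sketch of that kind of entry-point tracing. The class RepositoryTraceSketch and its calls list are hypothetical stand-ins and not the real PluginRepository; only the two method names and the log messages are taken from the change described above, and the use of java.util.logging is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for PluginRepository; NOT the Nutch class.
public class RepositoryTraceSketch {
    private static final Logger LOG =
        Logger.getLogger(RepositoryTraceSketch.class.getName());

    // Records the order of calls so the trace can also be checked in code.
    static final List<String> calls = new ArrayList<String>();

    static void installExtensions() {
        // Trace line at the very beginning of the method.
        LOG.info("**********installExtensions");
        calls.add("installExtensions");
        // ... the real method body would follow here ...
    }

    static void shotDownActivatedPlugins() {
        // Trace line at the very beginning of the method.
        LOG.info("**********shotDownActivatedPlugins");
        calls.add("shotDownActivatedPlugins");
        // ... the real method body would follow here ...
    }

    public static void main(String[] args) {
        installExtensions();
        shotDownActivatedPlugins();
        System.out.println(calls);
    }
}
```

The trace lines only show that the two repository methods were entered. They do not show whether a particular plugin's shutDown() was reached from there.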
When I now look at the output of CrawlTool, I see the following:
...
051108 135523 impl: point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.httpclient.Http
051108 135523 impl: point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.httpclient.Http
*051108 135523 **********installExtensions
*051108 135523 found resource regex-urlfilter.txt at
file:/home/rude/workspace/nutch-0.7/conf/regex-urlfilter.txt
051108 135523 RegexURLFIlter rules have been reloaded
....
051108 135523 Processing pagesByURL: Merged Infinity records/second
051108 135523 Processing pagesByMD5: Sorted 1 instructions in 0.0020
seconds.
051108 135523 Processing pagesByMD5: Sorted 500.0 instructions/second
051108 135523 Processing pagesByMD5: Merged to new DB containing 1
records in 0.0 seconds
....
051108 135523 FetchListTool completed
051108 135523 logging at INFO
051108 135523 fetching http://........................
051108 135523 http.proxy.host = null
051108 135523 http.proxy.port = 8080
051108 135523 http.timeout = 10000
051108 135523 http.content.limit = 65536
051108 135523 http.agent = NutchCVS/0.7 (Nutch;
http://lucene.apache.org/nutch/bot.html; [email protected])
051108 135523 http.auth.ntlm.username =
051108 135523 fetcher.server.delay = 1000
051108 135523 http.max.delays = 100
051108 135524 Configured Client
*051108 135525 **********shotDownActivatedPlugins
*051108 135526 status: segment 20051108135523, 1 pages, 0 errors, 27656
bytes, 2234 ms
051108 135526 status: 0.44762757 pages/s, 96.71553 kb/s, 27656.0 bytes/page
051108 135527 Updating /tmp/crawlx/db
051108 135527 Updating for /tmp/crawlx/segments/20051108135523
051108 135527 Processing document 0
.....
051108 135527 Finishing update
051108 135527 Processing pagesByURL: Sorted 91 instructions in 0.0070
seconds.
051108 135527 Processing pagesByURL: Sorted 13000.0 instructions/second
....
051108 135527 Processing linksByMD5: Merged 11666.666666666666
records/second
051108 135527 Update finished
051108 135527 Updating /tmp/crawlx/segments from /tmp/crawlx/db
051108 135527 reading /tmp/crawlx/segments/20051108135523
051108 135527 Sorting pages by url...
051108 135527 Getting updated scores and anchors from db...
051108 135527 Sorting updates by segment...
051108 135527 Updating segments...
051108 135527 updating /tmp/crawlx/segments/20051108135523
051108 135527 Done updating /tmp/crawlx/segments from /tmp/crawlx/db
051108 135527 indexing segment: /tmp/crawlx/segments/20051108135523
051108 135527 * Opening segment 20051108135523
051108 135527 * Indexing segment 20051108135523
*..........Custom Indexing filter start..........
this is my IndexingFilter Extension
.......... Custom Indexing filter end ..........
*051108 135527 found resource common-terms.utf8 at
file:/home/rude/workspace/nutch-0.7/conf/common-terms.utf8
051108 135527 * Optimizing index...
051108 135527 * Moving index to NFS if needed...
051108 135527 DONE indexing segment 20051108135523: total 1 records in
0.25 s (Infinity rec/s).
051108 135527 done indexing
051108 135527 Reading url hashes...
051108 135527 Sorting url hashes...
051108 135527 Deleting url duplicates...
051108 135527 Deleted 0 url duplicates.
051108 135527 Reading content hashes...
051108 135527 Sorting content hashes...
051108 135527 Deleting content duplicates...
051108 135527 Deleted 0 content duplicates.
051108 135527 Duplicate deletion complete locally. Now returning to NFS...
051108 135527 DeleteDuplicates complete
051108 135527 Merging segment indexes...
Any ideas what I am doing wrong, or what else might be wrong?
Kind regards,
ud