>
> # De-duplicate indexes
> # "bogus" argument is ignored but needed due to
> # a bug in the number of args expected
> bin/nutch dedup crawl/segments bogus
>
The dedup command works only on indexes (one or more), not on segments. The directory structure of an index looks like this:

index/part-00000/SOME_LUCENE_FILES

Here is an example of the structure of a crawl:

crawl/segments/20060702232437
crawl/segments/20060702233040
crawl/linkdb
crawl/indexes    //this is the index of the two segments

Now you can run dedup:

bin/nutch dedup crawl/indexes

If you run dedup on a folder that contains segments, an exception should be thrown. Look at your log files and verify that the dedup process runs without exceptions.

Marko

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
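
P.S. To avoid pointing dedup at a segments folder by accident, a small guard along these lines could help. This is only a sketch: the function names looks_like_indexes and safe_dedup are made up for illustration, and the check assumes that an indexes folder always contains part-* subdirectories directly beneath it, as in the layout described above. It is not part of Nutch itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeeds if the directory
# contains at least one part-* subdirectory, which is the index layout
# described above (index/part-00000/SOME_LUCENE_FILES).
looks_like_indexes() {
    dir=$1
    for part in "$dir"/part-*; do
        # If the glob matched nothing, $part is the literal pattern
        # and the -d test fails, so we fall through to "return 1".
        [ -d "$part" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: only calls dedup when the layout matches.
safe_dedup() {
    dir=$1
    if looks_like_indexes "$dir"; then
        bin/nutch dedup "$dir"
    else
        echo "refusing to dedup: $dir has no part-* subdirectories" >&2
        return 1
    fi
}
```

With these functions loaded, "safe_dedup crawl/indexes" would run the dedup command shown above, while "safe_dedup crawl/segments" would stop with a message instead of invoking Nutch. The log files still need to be checked afterwards.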
