You will want to take a look at the index-basic and index-more plugins, as well as the org.apache.nutch.indexer.Indexer class. If you just want to score documents differently, as opposed to indexing them differently, take a look at the scoring-opic plugin for an implementation of the scoring algorithm.
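
To make the idea of indexing plugins a little more concrete, here is a minimal sketch of a chain of filters that each add fields to a document before it is handed to the index writer. Everything in it is a hypothetical stand-in written for illustration: FieldFilter, BasicFields, TokenCountField and buildDocument are not the real Nutch or Lucene classes, and a document is modeled as a plain map. The actual extension point and its signatures should be checked in the index-basic sources.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: these types are hypothetical stand-ins,
// NOT the real Nutch or Lucene API.
public class IndexingFilterSketch {

    /** One step in the chain; a document is modeled as field name -> value. */
    interface FieldFilter {
        Map<String, String> filter(Map<String, String> doc, String url, String parsedText);
    }

    /** Adds basic fields, loosely analogous to the role of index-basic. */
    static class BasicFields implements FieldFilter {
        public Map<String, String> filter(Map<String, String> doc, String url, String parsedText) {
            doc.put("url", url);
            doc.put("content", parsedText);
            return doc;
        }
    }

    /** A custom step: the place where a new indexing method would add its own fields. */
    static class TokenCountField implements FieldFilter {
        public Map<String, String> filter(Map<String, String> doc, String url, String parsedText) {
            String trimmed = parsedText.trim();
            // Count whitespace-separated tokens; an empty page has zero tokens.
            int tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            doc.put("tokenCount", Integer.toString(tokens));
            return doc;
        }
    }

    /** Runs every filter in order and returns the document that would be indexed. */
    static Map<String, String> buildDocument(List<FieldFilter> filters, String url, String parsedText) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (FieldFilter f : filters) {
            doc = f.filter(doc, url, parsedText);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<FieldFilter> filters = new ArrayList<>();
        filters.add(new BasicFields());
        filters.add(new TokenCountField());
        System.out.println(buildDocument(filters, "http://example.com/", "hello nutch indexing"));
    }
}
```

In this sketch, changing how pages are indexed means adding or replacing a filter in the list rather than touching the code that writes the index, which is roughly the separation the plugin approach is meant to give you.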

Dennis Kubes

Otis Gospodnetic wrote:
> Stephane,
>
> Nutch uses Lucene for indexing, and Lucene has a class called IndexWriter
> that is used for indexing Lucene Documents. Here is a quick grep in Nutch's
> *java files:
>
> $ ffjg -l IndexWriter
> ./src/test/org/apache/nutch/indexer/TestDeleteDuplicates.java
> ./src/java/org/apache/nutch/indexer/IndexMerger.java
> ./src/java/org/apache/nutch/indexer/IndexSorter.java
> ./src/java/org/apache/nutch/indexer/Indexer.java
> ./contrib/web2/plugins/web-query-propose-spellcheck/src/java/org/apache/nutch/spell/NGramSpeller.java
>
> Otis
>
>
> ----- Original Message ----
> From: Stephane Gamard <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, October 25, 2006 7:22:25 AM
> Subject: Nutch Indexing
>
> Hi all,
>
> I am a researcher in Semantic Analysis and I am very interested in
> the Nutch project as a test bed for new indexing methods. As I
> understand (and from the documentation online) Nutch allows for
> plugin development and manipulation. It looks promising then to be
> able to substitute my indexing method to the default Nutch one. Yet I
> would love some clarification regarding "which" plugin are
> responsible for the indexing of fetched web-pages, as they are the
> ones I shall be replacing.
>
> Does this make any sense, am I going the right way?
>
> Thank you,
>
> _Stephane

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
