Thanks for the link, it was an interesting read. It seems like they're overcomplicating things a bit. To me it's just a matter of counting how long a sentence is: on most web pages, the sentences in the side columns are usually short filler, while the sentences in the main content area are longer.
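To make the idea a bit more concrete, here is a rough sketch in Python (not Java, and not anything Nutch actually does). The function names, the 8-word threshold, and the naive sentence splitting are all my own assumptions, just to illustrate counting sentence length:

```python
import re

# Hypothetical threshold: sentences with fewer words than this count as filler.
MIN_WORDS = 8

def split_sentences(text):
    """Naively split text on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def looks_like_content(sentence, min_words=MIN_WORDS):
    """Keep a sentence only if it is long enough and ends like a sentence."""
    words = sentence.split()
    ends_properly = sentence.rstrip().endswith((".", "!", "?"))
    return len(words) >= min_words and ends_properly

def filter_blocks(blocks, min_words=MIN_WORDS):
    """Return the sentences worth indexing from a list of text blocks."""
    kept = []
    for block in blocks:
        for sentence in split_sentences(block):
            if looks_like_content(sentence, min_words):
                kept.append(sentence)
    return kept

if __name__ == "__main__":
    page_blocks = [
        "Home",
        "About us",
        "The main article usually contains long sentences that carry "
        "the actual content of the page. Read more",
    ]
    # Only the long, complete sentence survives; the menu items
    # and "Read more" are dropped.
    for sentence in filter_blocks(page_blocks):
        print(sentence)
```

The threshold would obviously need tuning against real pages, and a real sentence splitter would have to cope with abbreviations and the like, which is probably part of why the problem is harder than it looks.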
Anyway, I'll leave it to the Java pros. Thanks again for the link.

> > Sorry I can't give more than an idea; I'm not a Java developer, but I think
> > the idea could prove useful.
> >
> > The idea is to limit the length of sentences that get entered into the
> > index. So, after parsing a page, any words that don't make what appears to
> > be a complete sentence get ignored.
>
> Douglas,
>
> Here is a previous discussion about this subject on the list:
> http://www.mail-archive.com/[email protected]/msg03070.html
> Take a look at this thread... this problem is not so easy.
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
