Thanks for the link, it was an interesting read. It seems like they're
overcomplicating things a bit. To me it's just a matter of counting how long a
sentence is. If you look at most web pages, the sentences in their side columns
are usually short filler, while the sentences in the main content area are
longer.
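
Roughly what I have in mind, as a minimal sketch in Python rather than Java.
The 8-word threshold, the function names, and the naive punctuation-based
sentence splitting are all assumptions for illustration, not anything taken
from Nutch:

```python
import re

# Hypothetical threshold: fewest words a sentence needs to be kept.
MIN_WORDS = 8

def split_sentences(text):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def keep_long_sentences(text, min_words=MIN_WORDS):
    """Return only the sentences that have at least min_words words."""
    return [s for s in split_sentences(text) if len(s.split()) >= min_words]

# Short navigation-style fragments mixed with one real content sentence.
page_text = (
    "Home. About us. Contact. "
    "The crawler fetches each page and hands the parsed text to the indexer for analysis. "
    "Click here!"
)
print(keep_long_sentences(page_text))
```

Only the long sentence survives here; the short menu-style fragments are
dropped. A real page would need a better sentence splitter and probably a
tuned threshold, which may be part of why that thread says it's not so easy.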

Anyway, I'll leave it to the Java pros. Thanks again for the link.

> > Sorry I can't give more than an idea; I'm not a Java developer, but I think
> > the idea could prove useful.
> > The idea is to limit the length of sentences that get entered into the
> > index. So, after parsing a page, any words that don't make what appears to
> > be a complete sentence get ignored.
> 
> Douglas,
> 
> Here is a previous discussion about this subject on the list:
> http://www.mail-archive.com/[email protected]/msg03070.html
> Take a look at this thread. This problem is not so easy.
> 
> Regards
> 
> Jérôme
> 
> --
> http://motrech.free.fr/
> http://www.frutch.org/






_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
