On May 11, 2006, at 3:36 AM, Jérôme Charron wrote:
> Actually, the clustering uses the summaries as input. I assume it
> would give better results if it took the whole document content, no?
> I assume that clustering uses the summaries instead of the document
> content for performance reasons.
> But there is a (bad) side effect: since the size of the summaries is
> configurable, the clustering "quality" will vary with the configured
> summary size. I find this very confusing: when folks adjust this
> parameter, it is only for front-end reasons (they want to display a
> long or a short summary), and certainly not for clustering reasons.
> What do you and others think about this?

Bob Carpenter of alias-i had this to say when I brought up this very idea:
http://article.gmane.org/gmane.comp.jakarta.lucene.devel/12599
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers