Andrzej said:

> Nutch 0.7 uses a variant of PageRank link analysis, and the analyze tool
> would perform a couple iterations to propagate the scores along links.
> However, it was a slow and very resource-hungry process, so sometimes it
> was even impossible to go through the analysis step even for
> moderately-sized collections.

Interesting. When invoked as "bin/nutch analyze db_dir 3" (three rounds of
analysis), it took about 35 minutes for some 300,000 pages on a dual Xeon
machine with 3 GB of RAM. That is a small share of the time spent fetching,
generating segments, etc.

> 0.7 offers also an option to use a static ranking method, which doesn't
> require running the analysis step, and which is based on the number of
> outlinks and inlinks.

Um, it isn't clear how to do this. I don't see anything about it in
http://wiki.apache.org/nutch/CommandLineOptions or in nutch-default.xml.

> Nutch 0.8 uses scoring plugins, which can implement different scoring
> algorithms. The default one is based on OPIC, which is again a variant
> of link-based quality metrics - please see OPICScoringFilter for more
> details.

That sounds useful. The referenced paper sure makes it sound more efficient.

Thanks and best wishes,

Bill

P.S. Any thoughts on how to downplay repeated instances of a word on a page?

-- 
Bill Goffe [EMAIL PROTECTED]
Department of Economics, SUNY Oswego
416 Mahar Hall, Oswego, NY 13126
voice: (315) 312-3444, fax: (315) 312-5444
<http://cook.rfe.org>
