Andrzej said:

> Nutch 0.7 uses a variant of PageRank link analysis, and the analyze tool 
> would perform a couple iterations to propagate the scores along links. 
> However, it was a slow and very resource-hungry process, so sometimes it 
> was even impossible to go through the analysis step even for 
> moderately-sized collections.

Interesting. When I invoked this with "bin/nutch analyze db_dir 3" (three
rounds of analysis), it took about 35 minutes for some 300,000 pages on a
dual Xeon machine with 3 GB of RAM. That is small compared with the time
spent fetching, generating segments, etc.
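
For anyone following along, here is roughly what I understand by "a couple
iterations to propagate the scores along links". This is a minimal sketch of
a generic PageRank-style iteration in Python, not Nutch's actual analyze
code. The function name, the toy link graph, the damping factor of 0.85, and
the handling of pages without outlinks are all my own assumptions.

```python
def propagate_scores(links, rounds=3, damping=0.85):
    """Run a few PageRank-style rounds over a link graph.

    links maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the same score everywhere.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(rounds):
        # Every page gets a small baseline score each round.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among its outlinks.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a page with no outlinks spreads its
                # score evenly over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(propagate_scores(toy_graph).items()):
        print(page, round(score, 4))
```

Each round touches every link once, so the cost grows with the number of
links times the number of rounds, which fits with the step getting expensive
on large collections.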

> 0.7 also offers an option to use a static ranking method, which doesn't
> require running the analysis step, and which is based on the number of
> outlinks and inlinks.

Um, it isn't clear how to do this. I don't see anything about it in
http://wiki.apache.org/nutch/CommandLineOptions or in nutch-default.xml.

> Nutch 0.8 uses scoring plugins, which can implement different scoring 
> algorithms. The default one is based on OPIC, which is again a variant 
> of link-based quality metrics - please see OPICScoringFilter for more 
> details.

That sounds useful. The referenced paper sure makes it sound more
efficient.
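
As far as I can tell, the basic OPIC idea is that every page holds some
"cash", and each time a page is visited its cash is recorded in a running
history and then handed out equally to the pages it links to. Below is a
minimal Python sketch of that idea only. It is not the OPICScoringFilter
code, and the function name, the fixed visit order, and the treatment of
pages without outlinks are my own assumptions.

```python
def opic_importance(links, visits):
    """Estimate page importance with a cash/history scheme.

    links maps each page to the list of pages it links to.
    visits is the sequence of pages in the order they are processed.
    All names here are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    cash = {page: 1.0 / n for page in pages}   # total cash stays at 1.0
    history = {page: 0.0 for page in pages}    # cash seen on past visits

    for page in visits:
        amount = cash[page]
        history[page] += amount
        cash[page] = 0.0
        # Assumption: a page with no outlinks hands its cash to all pages.
        targets = links.get(page) or pages
        for target in targets:
            cash[target] += amount / len(targets)

    # Importance estimate: history plus current cash, normalized to sum to 1.
    total = sum(history.values()) + sum(cash.values())
    return {page: (history[page] + cash[page]) / total for page in pages}


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    order = ["a", "b", "c"] * 50
    for page, value in sorted(opic_importance(toy_graph, order).items()):
        print(page, round(value, 3))
```

Unlike the iterative approach, nothing here needs a separate pass over the
whole link graph: the bookkeeping happens one page at a time, as pages are
visited, which I take to be where the efficiency comes from.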

Thanks and best wishes,

           Bill

P.S. Any thoughts on how to downplay repeated instances of a word on 
     a page?

-- 
         *------------------------------------------------------*
         | Bill Goffe                 [EMAIL PROTECTED]          |
         | Department of Economics    voice: (315) 312-3444     |
         | SUNY Oswego                fax:   (315) 312-5444     |
         | 416 Mahar Hall             <http://cook.rfe.org>     |          
         | Oswego, NY  13126                                    |
*--------*------------------------------------------------------*-----------*
| "I have been informed by the senior neurosurgical society to discontinue  |
| expert testimony for plaintiffs or risk membership. Therefore I am        |
| withdrawing as your expert."                                              |
|  --  Dr. Robert W. Rand, a neurosurgeon, on why he couldn't testify       |
|      against another neurosurgeon, Dr. Edgar Housepian. Dr. Housepian was |
|      alleged to have accidentally cut a major artery in the brain of a 3  |
|      year old who ended up with permanent disabilities. "Making           |
|      Malpractice Harder to Prove," Michelle Andrews, New York Times,      |
|      12/21/03.                                                            |
*---------------------------------------------------------------------------*
