Hi there Jay,

 Here are some numbers that a colleague and I presented in my graduate
computer science seminar class on search engines in the Spring '05 semester
at USC. The numbers measure the efficiency and scalability of several of the
plugin content extractors for Nutch (PDF, Word, RSS, etc.). The tests were
performed on a Red Hat Linux 7.3 box with 1.3 GB of RAM, a 10 GB hard drive,
and a 500 MHz Pentium III processor.

 The presentation is geared towards the parse-rss plugin that I wrote,
although the numbers should give you an idea of the other content extractors
too.

Hope they help. Here's the link to the presentation:

http://baron.pagemewhen.com:8080/~chris/RSS-Nutch-Eval.ppt

Cheers,
  Chris


______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED] 
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory               Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: webmaster [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 20, 2005 8:02 PM
> To: [email protected]
> Subject: benchmarking
> 
> hey could some of you post your speeds (sorting,indexing, pages a
> sec/documents a sec) and system specs I'm trying to compile a database of
> which of nutches functions are better suited to run on what hardware. also
> if any of you have a sun box could you post its specs and some of the info
> for database sorting speeds and indexing speed, anything that uses full cpu.
> whats everyones pages a sec top score??? e-mail me @
> [EMAIL PROTECTED]
> I'll post a webpage with the results
> Thanks,
> -Jay Pound
