(The message below was posted to nutch-dev a few days ago.) Can anyone (anonymous or otherwise) confirm whether it's possible to use Nutch 0.7 for a "4-6 billion page search engine"? Is this a typo or for real? Just curious, and if it's true, what were the major issues, e.g. time, RAM, and (presumably) storage? My understanding was that the practical limit on 0.7 was about 100 million pages, whatever hardware you have.
-Ed

On 1/3/07, Nutch User <[EMAIL PROTECTED]> wrote:
Hello Nutch Developers,

I hope this post is appropriate for the list, and apologize if it is not.

Our company is currently using Nutch 0.7 for a 4-6 billion page search engine. This engine is used by both internal staff and external users to search internet content. As you well know, there are many issues associated with an index this large. We were hoping some of these issues would be addressed in 0.8, but we don't think 0.8 is quite ready for prime time yet.

Therefore, we would like to hire a Nutch programmer to help us make Nutch a more viable solution for large indexes such as ours. We prefer a full-time person to work on-site with us in the US, but will consider remote work as well.

If you are interested, please reply to this e-mail address ([EMAIL PROTECTED]) with your resume and salary requirements. Please include any Java experience, Nutch-specific experience, and any experience with large data sets (particularly large URL databases).

We (the company) prefer to remain confidential for now, but will discuss details with candidates.

Thank you for your time,
Nutch User
