We are integrating the dmoz.org RDF dump into our site as a directory.  We
seeded our database with the dmoz.org data and will look at extending that
so the directory categories show up in our search results whenever a URL
was seeded from, or already exists in, the ODP content.

Right now I'm just researching the best way to do this without adding much
more overhead to an already slow indexing process :)

Once we make some headway I'll upload our code to JIRA, if this is what you
are looking to accomplish as well.

You may want to look at the creativecommons plugin to find a good starting
point for doing something like this; a rough sketch of the indexing side is
below.
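
Very roughly, the indexing side could look like the following, modeled
loosely on how the creativecommons plugin adds its own field.  The exact
IndexingFilter signature differs between Nutch releases, so filter() here
is only a placeholder, the Lucene calls assume the 1.4-era Field factories
Nutch ships with, and "dmozCategory" is a made-up field name.

import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class OdpIndexingFilter {

  // In practice this would be backed by the seeded database rather than
  // an in-memory map built from the RDF dump.
  private final Map urlToCategory;

  public OdpIndexingFilter(Map urlToCategory) {
    this.urlToCategory = urlToCategory;
  }

  // Placeholder for the real IndexingFilter hook: given the Lucene doc
  // being built for a page and that page's URL, attach its ODP category.
  public Document filter(Document doc, String url) {
    String category = (String) urlToCategory.get(url);
    if (category != null) {
      // Keyword = stored and indexed as a single untokenized term, so the
      // category can be shown in results and filtered on at query time.
      doc.add(Field.Keyword("dmozCategory", category));
    }
    return doc;
  }
}

As with the creativecommons plugin, you would pair this with a query-side
filter so the new field is actually searchable.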

thanks,
-byron


-----Original Message-----
From: Philippe LE NAOUR <[EMAIL PROTECTED]>
To: [email protected]
Date: Sat, 21 May 2005 12:47:38 +0200
Subject: Re: Hardware requirements and some other questions about Nutch

> 
> Thanks for responding.
> 
> Byron Miller wrote:
> 
> >Actually at mozdex we have consolidated a bit and we are rebuilding
> >under the latest release.  For 50 million URLs a 200 GB disk is all you
> >need.  That leaves you enough room for your segments, db and the space
> >needed to process (about double your db size).
> >
> Thanks, I think it's a good starting point for building the test
> platform.  I will see afterwards if I need to upgrade.
> 
> >The biggest boost you can give your query servers is tons of memory.
> >SATA 150 or SCSI drives at 10k RPM are also a bonus.
> >
> I've seen that the mozdex query servers use 2 GB of memory; do you think
> 1 GB is enough for light traffic?
> 
> 
> As for the categorizer, do you think it's possible?  I took a look at
> some Nutch plugins like GeoPosition and Language Identifier, but they
> didn't help me.
> If it is possible, I will study the plugins more deeply.
> 


