Hi, Gal,

Yes, I am interested. You can post the tarball to
http://issues.apache.org/jira/browse/Nutch

Thanks,

John

On Thu, Sep 29, 2005 at 09:53:42PM +0200, Gal Nitzan wrote:
> Hi,
> 
> I have written a small new plugin, urlfilter-db, based on the URLFilter
> interface.
> 
> The purpose of this plugin is to filter by domain, i.e. I would like to
> crawl the whole web but fetch only from certain domains.
> 
> The plugin uses a caching system (SwarmCache, easier to deploy than JCS)
> with a database on the back end.
> 
> for each url
>   call filter
> end for
>
> filter
>   get the domain name from the url
>   look up the domain in the cache
>   if not in the cache, try the database
>   if in the database, cache it and return it
>   otherwise return null
> end filter
> 
> 
> The plugin reads the cache size, JDBC driver, connection string, table
> to use, and domain field from nutch-site.xml.
> 
> Since I am not set up to commit it to svn, if anyone is interested,
> let me know and I can mail it.
> 
> Regards,
> 
> Gal
> 
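
For anyone following the thread, the lookup order in the quoted pseudocode (cache first, then database, cache only on a database hit) could be sketched roughly as below. This is not Gal's code: the class name DomainUrlFilter, the hostOf helper, the LinkedHashMap standing in for SwarmCache, and the Predicate standing in for the JDBC lookup are all assumptions made for illustration, and the class does not implement the real Nutch URLFilter interface.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: accept a URL only if its host is a known domain. */
class DomainUrlFilter {
    // Stand-in for SwarmCache: a small LRU map of domains known to be allowed.
    private final Map<String, Boolean> cache;
    // Stand-in for the database: returns true if the domain is in the table.
    private final Predicate<String> database;

    DomainUrlFilter(final int cacheSize, Predicate<String> database) {
        this.database = database;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Returns the URL unchanged if its domain is allowed, otherwise null. */
    String filter(String url) {
        String domain = hostOf(url);
        if (domain == null) {
            return null; // unparseable URL or no host part
        }
        if (cache.containsKey(domain)) {
            return url; // cache hit
        }
        if (database.test(domain)) {
            cache.put(domain, Boolean.TRUE); // cache only database hits
            return url;
        }
        return null; // domain not found anywhere
    }

    /** Extracts the lower-cased host from a URL, or null if there is none. */
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

As in the pseudocode, misses are not cached, so a rejected domain goes to the database every time it is seen; caching negative results as well would be a possible variation. The sketch is also not thread-safe, which a real plugin would probably need to address.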