Hi Gal,

Yes, I am interested. You can post the tarball to http://issues.apache.org/jira/browse/Nutch

Thanks,
John

On Thu, Sep 29, 2005 at 09:53:42PM +0200, Gal Nitzan wrote:
> Hi,
>
> I have written (not much) a new plugin, based on the URLFilter
> interface: urlfilter-db .
>
> The purpose of this plugin is to filter domains, i.e. I would like to
> crawl the world but to fetch only certain domains.
>
> The plugin uses a caching system (SwarmCache, easier to deploy than JCS)
> and on the back-end a database.
>
> For each url
>   filter is called
> end for
>
> filter
>   get the domain name from url
>   call cache.get domain
>   if not in cache try the database
>   if in database cache it and return it
>   return null
> end filter
>
>
> The plugin reads the cache size, jdbc driver, connection string, table
> to use and domain field from nutch-site.xml
>
> Since I do not have the tools to add it to the svn and all, If someone
> is interested let me know and I can mail it.
>
> Regards,
>
> Gal
> __________________________________________
> http://www.neasys.com - A Good Place to Be
> Come to visit us today!
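For anyone following the thread, the filter logic above could look roughly like the Java below. It is only a sketch under assumptions, not Gal's plugin: it does not implement Nutch's actual URLFilter interface, it swaps SwarmCache for a plain LRU map built on `java.util.LinkedHashMap`, and it replaces the JDBC table with a hypothetical `DomainStore` interface. The names `DomainUrlFilterSketch` and `DomainStore` are made up for illustration. Following the pseudocode, only domains found in the database are cached, and the URL's host name is treated as "the domain" (no folding of subdomains).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DomainUrlFilterSketch {

    /** Hypothetical back-end lookup; the real plugin queries a database table via JDBC. */
    public interface DomainStore {
        boolean contains(String domain);
    }

    /** Small LRU cache standing in for SwarmCache (an assumption, not the plugin's cache). */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict once the configured cache size is exceeded
        }
    }

    private final LruCache<String, Boolean> cache;
    private final DomainStore store;

    public DomainUrlFilterSketch(int cacheSize, DomainStore store) {
        this.cache = new LruCache<>(cacheSize);
        this.store = store;
    }

    /** Returns the URL unchanged if its host is an allowed domain, otherwise null. */
    public String filter(String urlString) {
        String host;
        try {
            host = new URI(urlString).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (host == null) {
            return null; // e.g. "mailto:" URIs have no host
        }
        String domain = host.toLowerCase(Locale.ROOT);

        // 1. Try the cache first.
        if (cache.get(domain) != null) {
            return urlString;
        }
        // 2. Cache miss: ask the database; cache the domain only if it is found.
        if (store.contains(domain)) {
            cache.put(domain, Boolean.TRUE);
            return urlString;
        }
        // 3. Unknown domain: filter the URL out.
        return null;
    }
}
```

With a store that only knows `example.org`, `filter("http://example.org/a")` returns the URL and queries the store once; a second URL on the same host is then answered from the cache, while `filter("http://other.net/")` returns null. Not caching misses keeps the sketch faithful to the pseudocode, at the cost of a database query for every URL on a rejected domain.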
