I've made some progress on building a nutchdb on top of a SQL database (using Mckoi right now, but any RDBMS could be used).
Some of the nice features you get from a SQL backend are:

1. Quick updates - only update the records that changed, not the entire db. I'm working on a scale of hundreds of gigs for my goals, and tmp space/sort space and duplicate files cost me big money in storage.

2. Manageability - using an RDBMS lets you store data in segments, and with something like Oracle 10g you can do grid computing/clustering/shared memory/quick RDBMS sorts and live querying.

3. Templates/stored procedures - the ability to create fetchlists from a SQL template and add database fields to manage your fetchlists accordingly.

I have already ported the existing functionality and also implemented "partitioned" fetchlists, which let you group elements and fetch them based on whatever values you wish (last fetch date, score, and query server location).

Part of my goal is to build a managed search system: to identify where the segments are and how to update them, and to use SQL to manage where the segments are without needing a central repository to do index de-duping and such.

The coolest thing is the ability to refactor and test different algorithms without having to rebuild your database. Upgrades could be as simple as an export, drop, recreate schema, import (map as necessary), and voila, you don't have to restart or write a custom conversion process.

I've also attempted to create an ODBC bridge to the query servers, so you could implement an ODBC interface to query your segments and do joins across your webdb.

I haven't worked through everything yet. I'm still dealing with the learning curve, best practices and performance, but it's fun :) Not sure if it will scale out as well or as affordably, but I come from an RDBMS background and it works for me!

BTW, is there an open source "OODBMS" (Object Oriented DBMS)? It may work better for seamless integration at the object level.
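To make the "partitioned" fetchlist idea from point 3 a bit more concrete, here is a rough sketch of what a SQL fetchlist template could look like. This is not my actual code: it uses Python's built-in sqlite3 as a stand-in for Mckoi, and the table name, column names and function names (page, score, last_fetch, query_server, build_fetchlist, mark_fetched) are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the real nutchdb layout.
SCHEMA = """
CREATE TABLE page (
    url          TEXT PRIMARY KEY,
    score        REAL NOT NULL,
    last_fetch   TEXT,             -- ISO date string, NULL if never fetched
    query_server TEXT NOT NULL     -- query server that holds the segment
)
"""

# The fetchlist "template": one parameterized query per partition.
# The partition is chosen by query server, minimum score and staleness.
FETCHLIST_TEMPLATE = """
SELECT url FROM page
WHERE query_server = :server
  AND score >= :min_score
  AND (last_fetch IS NULL OR last_fetch < :cutoff)
ORDER BY score DESC, url
"""


def build_fetchlist(conn, server, min_score, cutoff):
    """Return the URLs of one partition, highest score first."""
    params = {"server": server, "min_score": min_score, "cutoff": cutoff}
    return [row[0] for row in conn.execute(FETCHLIST_TEMPLATE, params)]


def mark_fetched(conn, url, fetch_date):
    """Update a single record in place instead of rewriting the whole db."""
    conn.execute("UPDATE page SET last_fetch = ? WHERE url = ?",
                 (fetch_date, url))


def demo():
    """Build a tiny in-memory database with a few sample pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        ("http://a.example/", 2.0, None, "qs1"),
        ("http://b.example/", 1.5, "2004-01-01", "qs1"),
        ("http://c.example/", 0.2, None, "qs1"),
        ("http://d.example/", 3.0, None, "qs2"),
    ])
    return conn
```

Calling build_fetchlist(demo(), "qs1", 1.0, "2004-06-01") picks only the stale, well-scored pages that live on qs1, and mark_fetched touches exactly one row afterwards, which is the "quick updates" point from item 1. Changing the grouping rule means editing the template, not rebuilding the database.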
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
