I've made some progress on building a nutchdb on top
of a SQL database (using Mckoi right now, though any
RDBMS could be used).

Some of the nice features you get from using a SQL
backend are:

1. Quick updates - only update the records that
changed, not the entire db. I'm working at a scale of
hundreds of gigs for my goals, and tmp space/sort space
and duplicate files cost me big money in storage.
2. Manageability - using an RDBMS lets you store data
in segments, and with something like Oracle 10g you
get grid computing, clustering, shared memory, quick
RDBMS sorts and live querying.
3. Templates/stored procedures - the ability to create
fetchlists from a SQL template and add database fields
to manage your fetchlists accordingly.  I have already
ported the existing functionality as well as implemented
"partitioned" fetchlists, which let you group elements
and fetch them based on whatever values you wish (last
fetch date, score, and query server location).
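
A minimal sketch of points 1 and 3, using Python's
built-in sqlite3 module as a stand-in for Mckoi. The
"page" table, its columns, and the function names are
hypothetical and only for illustration; they are not
taken from the nutchdb code described here.

```python
import sqlite3

# Hypothetical schema: one row per known URL, plus the fields used to
# partition fetchlists (score, last fetch time, query server).
SCHEMA = """
CREATE TABLE page (
    url          TEXT PRIMARY KEY,
    content_hash TEXT,
    score        REAL    NOT NULL DEFAULT 1.0,
    last_fetch   INTEGER NOT NULL DEFAULT 0,  -- unix seconds, 0 = never
    server       TEXT    NOT NULL             -- query server for the segment
)
"""

# Fetchlist "template": the partitioning values are bound at run time.
FETCHLIST_SQL = """
SELECT url FROM page
WHERE server = ? AND score >= ? AND last_fetch < ?
ORDER BY score DESC, url
LIMIT ?
"""


def open_db():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_page(conn, url, server, score=1.0):
    """Register a URL that has not been fetched yet."""
    conn.execute(
        "INSERT INTO page (url, server, score) VALUES (?, ?, ?)",
        (url, server, score),
    )


def record_fetch(conn, url, content_hash, now):
    """Touch only this URL's row; return True if its content changed."""
    cur = conn.execute(
        "UPDATE page SET content_hash = ?, last_fetch = ? "
        "WHERE url = ? AND (content_hash IS NULL OR content_hash <> ?)",
        (content_hash, now, url, content_hash),
    )
    changed = cur.rowcount == 1
    if not changed:
        # Content is the same (or the URL is unknown): only bump the timestamp.
        conn.execute("UPDATE page SET last_fetch = ? WHERE url = ?", (now, url))
    return changed


def fetchlist(conn, server, min_score, fetched_before, limit=100):
    """Build one fetchlist partition from the SQL template."""
    rows = conn.execute(FETCHLIST_SQL, (server, min_score, fetched_before, limit))
    return [row[0] for row in rows]
```

Calling fetchlist(conn, "qs1", 1.0, cutoff) would return
the URLs on query server "qs1" with a score of at least
1.0 that were last fetched before cutoff. A different
partitioning only needs a different WHERE clause in the
template.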

Part of my goal is to build a managed search system:
to identify where the segments are and how to update
them, and to use SQL to manage where the segments live,
without needing a central repository to do index
de-duping and such.

The coolest thing is the ability to refactor and test
different algorithms without having to rebuild your
database. Upgrades could be as simple as an export,
drop, recreate schema, import (map as necessary) and
voila, you don't have to restart or write a custom
conversion process.
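
The export / drop / recreate / import cycle could look
roughly like this. Again this is only a sketch against
sqlite3; the migrate() helper and its column_map
argument are made up for illustration, and a real RDBMS
would more likely use its own export/import tools.

```python
import sqlite3


def migrate(conn, table, new_schema_sql, column_map):
    """Export rows, drop and recreate `table`, then re-import them.

    column_map maps old column names to new column names. Old columns
    left out of the map are discarded; new columns that nothing maps to
    get their schema defaults. Names are trusted, not sanitized.
    """
    old_cols = list(column_map)
    new_cols = [column_map[c] for c in old_cols]

    # Export: pull the rows we want to keep into memory.
    rows = conn.execute(
        f"SELECT {', '.join(old_cols)} FROM {table}"
    ).fetchall()

    # Drop and recreate with the new schema.
    conn.execute(f"DROP TABLE {table}")
    conn.execute(new_schema_sql)

    # Import, mapping old columns onto new ones.
    placeholders = ", ".join("?" for _ in new_cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(new_cols)}) VALUES ({placeholders})",
        rows,
    )
    conn.commit()
    return len(rows)
```

This holds every row in memory, so at hundreds of gigs
the export step would have to stream to a file or a
staging table instead.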

I've also attempted to create an ODBC bridge to the
query servers, so you could use an ODBC interface to
query your segments and do joins across your webdb.

I haven't worked through everything yet. I'm still
dealing with the learning curve, best practices and
performance, but it's fun :)  Not sure if it will scale
out as well or as affordably, but I come from an
RDBMS background and it works for me!

BTW, is there an open source "OODBMS" (Object Oriented
DBMS)? It may work better for seamless integration at
the object level.

