Is anyone using this right now? Any measure of performance/overhead when distributing across multiple systems?
I still have a "BIG BOX" server doing all of the webdb and it can take 2-3 days to analyze a single iteration of my webdb - would the noted distributed webdb offer any gain?
I see a lot of the NDFS switches/command-line args enabled across the board, but the docs don't reference distributed webdb as being fully integrated.
First of all, welcome back! It's nice to see that Mozdex is up and running again.
AFAIK, NDFS has been integrated into all tools that deal with segments and WebDB, as an abstraction layer above the real filesystem. So, you can use NDFS to distribute the processing of any tool, with the notable exception of tools that use Lucene indexes - because there is no NDFS-aware version of Lucene Directory (yet).
Regarding the DistributedWebDB... I haven't tried it yet, but from my reading of the code it looks like it will happily use NDFS, too.
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
