Re: About HBase Integration

2010-02-09 Thread Andrzej Bialecki
On 2010-02-09 03:08, Hua Su wrote: Thanks. But heritrix is another project, right? Please see this Git repository, it contains the latest work in progress on Nutch+HBase: git://github.com/dogacan/nutchbase.git -- Best regards, Andrzej Bialecki

Re: About HBase Integration

2010-02-09 Thread Hua Su
Hi, I notice the repository has not been updated since last Christmas. Is that work still in progress? Best, Hua On Tue, Feb 9, 2010 at 4:23 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-02-09 03:08, Hua Su wrote: Thanks. But heritrix is another project, right? Please see this Git

Re: Nutch + Solr: filtering URL while indexing

2010-02-09 Thread Stefano Cherchi
Thanks for your hints, Julien. I'm going to run some tests and let you know. If I understand correctly, at the moment I have to perform a mergesegments cycle before the final indexing to filter out the undesired URLs? Speaking of adding filtering to the map method, I need to take some time to

Re: repeat fetch of same page without error

2010-02-09 Thread Sunnyvale Fl
seems to work!! thanks a lot for the help!!! On Tue, Feb 2, 2010 at 5:07 PM, reinhard schwab reinhard.sch...@aon.at wrote: i have never used and tested 0.9. i have looked into the code, it is quite different to 1.0 in regard to CrawlDbReducer and scheduling. i propose to change the method