On 2010-02-09 03:08, Hua Su wrote:
Thanks. But heritrix is another project, right?
Please see this Git repository, it contains the latest work in progress
on Nutch+HBase:
git://github.com/dogacan/nutchbase.git
--
Best regards,
Andrzej Bialecki
Hi,
I notice the repository has not been updated since last Christmas. Is that
work still in progress?
Best,
Hua
On Tue, Feb 9, 2010 at 4:23 PM, Andrzej Bialecki a...@getopt.org wrote:
On 2010-02-09 03:08, Hua Su wrote:
Thanks. But heritrix is another project, right?
Please see this Git
Thanks for your hints, Julien. I'm going to run some tests and let you know.
If I understand correctly, at the moment I have to perform a mergesegments cycle
before the final indexing to filter out the undesired URLs?
Talking of adding filtering to the map method, I need to take some time to
seems to work!! thanks a lot for the help!!!
On Tue, Feb 2, 2010 at 5:07 PM, reinhard schwab reinhard.sch...@aon.at wrote:
i have never used and tested 0.9.
i have looked into the code, it is quite different to 1.0 in regard to
CrawlDbReducer and scheduling.
i propose to change the method