I think I understand this problem now: the extra URLs come from the
prior fetch(es) still stored in the database.
I need to find some way to reset the database if I want to execute a
fresh crawl, right?
Sorry if this is too basic a question. This is only my 4th day with
Nutch/Hadoop/HBase, though I have been a Java programmer for a while.
Thanks
-Weilei


On Wed, Jan 30, 2013 at 11:52 AM, Weilei Zhang <[email protected]> wrote:
> Hi
> I am trying to use Nutch 2.x and have one question regarding Generator
> and Injector:
> Basically, I only have one link as the root to crawl and I see (by
> instrumenting the code) that this one link was written to Context in
> the last step of InjectorJob and that is the only link written to
> Context from GeneratorJob. However, I saw multiple links sent to the map
> function in the first steps of GeneratorJob (I instrumented the setup
> function). Those links seem to include all URLs referenced from the
> original link. My question is where does fetch/parse happen? From the
> Crawler code, it is straightforward to me that Injector is immediately
> followed by Generator; I tried to scrub the code down to do the job
> but failed.
>
> I ran the crawl in the following way:
>>/nutch  crawl urlsDir
>
> There is only one link, in a single file under urlsDir.
>>cat urlsDir/*
> http://www.bmw.com
>
> The following is an excerpt from the Generator map function
> instrumentation output. These are reversed URLs.
> al.com.bmw.www:http/
> al.com.bmw.www:http/al/en
> am.bmw.www:http/
> am.bmw.www:http/am/en
> ao.co.bmw:http/
> ao.co.bmw:http/ao/pt
> ar.com.bmw.www:http/
> ar.com.bmw.www:http/ar/es/
> at.bmw.www:http/
> at.bmw.www:http/at/de/general/configurations_center/configure.html
> at.bmw.www:http/de/index.html
> at.bmw.www:http/de/topics/services-angebote/connecteddrivedienste/connecteddrive-antrag/ueberblick.html
> au.com.bmw.www:http/
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/compare.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/configurator.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/driveawayprice.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/financecalculator.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/highlights/
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/introduction.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/requestebrochure.html
> au.com.bmw.www:http/com/en/newvehicles/1series/5door/2011/showroom/requesttestdrive.html
> au.com.bmw.www:http/com/en/newvehicles/1series/convertible/2011/showroom/compare.html
> au.com.bmw.www:http/com/en/newvehicles/1series/convertible/2011/showroom/configurator.html
> au.com.bmw.www:http/com/en/newvehicles/1series/convertible/2011/showroom/driveawayprice.html
> au.com.bmw.www:http/com/en/newvehicles/1series/convertible/2011/showroom/financecalculator.html
> au.com.bmw.www:http/com/en/newvehicles/1series/convertible/2011/showroom/highlights/
>
>
> Thanks for any hints!
> --
> Best Regards
> -Weilei



-- 
Best Regards
-Weilei
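
For anyone reading the instrumentation output above: the keys are URLs with
the host labels reversed, followed by the protocol and then the path. Below is
a minimal sketch of that transformation, written only to match the format seen
in the excerpt. The class name ReverseUrlSketch and the method reverseUrl are
hypothetical names chosen for illustration; this is not Nutch's own code, and
Nutch's actual handling of ports, queries and normalization may differ.

```java
import java.net.URI;

public class ReverseUrlSketch {

    // Hypothetical helper: builds "reversed.host:protocol[:port]/path"
    // from a URL, mirroring the key format seen in the excerpt.
    static String reverseUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }

        // Reverse the dot-separated host labels: www.bmw.at -> at.bmw.www
        String[] labels = host.split("\\.");
        StringBuilder key = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            key.append(labels[i]);
            if (i > 0) {
                key.append('.');
            }
        }

        // Append the protocol, and the port only when one is given explicitly.
        key.append(':').append(uri.getScheme());
        if (uri.getPort() != -1) {
            key.append(':').append(uri.getPort());
        }

        // Append path and query unchanged (assumption: no normalization here).
        if (uri.getRawPath() != null) {
            key.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            key.append('?').append(uri.getRawQuery());
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseUrl("http://www.bmw.com.al/al/en"));
        System.out.println(reverseUrl("http://www.bmw.at/de/index.html"));
    }
}
```

With keys built this way, pages from the same domain sort next to each other,
which is presumably why the excerpt appears grouped by country domain.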
