Fetch, then parse. In Nutch 1.3 the fetcher no longer parses pages by default (fetcher.parse is false), so after bin/nutch fetch you need to run bin/nutch parse on the segment before updating the crawldb; without it the segment has no crawl_parse data.

The "skipping invalid segment" lines also point at a second problem: -dir expects the segments directory and treats each subdirectory inside it as a segment, so pointing -dir at a single segment makes updatedb try crawl_fetch, crawl_generate and content as segments. Either pass the segment path directly (without -dir), or use -dir /home/llist/nutchData/crawl/segments.
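With the paths from your log, one full cycle would look something like this (a sketch, not tested against your setup; the segment is picked up dynamically since the timestamp will differ on each run):

```shell
NUTCH=/usr/share/nutch/runtime/local/bin/nutch
DATA=/home/llist/nutchData

$NUTCH inject   $DATA/crawl/crawldb $DATA/seed
$NUTCH generate $DATA/crawl/crawldb $DATA/crawl/segments

# pick up the segment generate just created (newest directory)
SEGMENT=$DATA/crawl/segments/$(ls -t $DATA/crawl/segments | head -1)

$NUTCH fetch $SEGMENT
$NUTCH parse $SEGMENT                          # the step that was missing
$NUTCH updatedb $DATA/crawl/crawldb $SEGMENT   # segment passed directly, no -dir
```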

> I'm running nutch 1.3 on 64 bit Ubuntu, following are the commands and
> relevant output.
> 
> ----------------------------------
> llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
> Injector: starting at 2011-07-15 18:32:10
> Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> Injector: urlDir: /home/llist/nutchData/seed
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-15 18:32:13, elapsed: 00:00:02
> =================
> llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> generate /home/llist/nutchData/crawl/crawldb
> /home/llist/nutchData/crawl/segments Generator: starting at 2011-07-15
> 18:32:41
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: /home/llist/nutchData/crawl/segments/20110715183244
> Generator: finished at 2011-07-15 18:32:45, elapsed: 00:00:03
> ==================
> llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> fetch /home/llist/nutchData/crawl/segments/20110715183244
> Fetcher: Your 'http.agent.name' value should be listed first in
> 'http.robots.agents' property.
> Fetcher: starting at 2011-07-15 18:34:55
> Fetcher: segment: /home/llist/nutchData/crawl/segments/20110715183244
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records + hit by time limit :0
> fetching http://www.seek.com.au/
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2011-07-15 18:34:59, elapsed: 00:00:03
> =================
> llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> updatedb /home/llist/nutchData/crawl/crawldb
> -dir /home/llist/nutchData/crawl/segments/20110715183244
> CrawlDb update: starting at 2011-07-15 18:36:00
> CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> CrawlDb update: segments:
> [file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch,
> file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate,
> file:/home/llist/nutchData/crawl/segments/20110715183244/content]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: false
> CrawlDb update: URL filtering: false
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate
> - skipping invalid segment
> file:/home/llist/nutchData/crawl/segments/20110715183244/content
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-07-15 18:36:01, elapsed: 00:00:01
> -----------------------------------
> 
> Appreciate any hints on what I'm missing.
