Done, but now I get additional errors:

-------------------
llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch
updatedb /home/llist/nutchData/crawl/crawldb
-dir /home/llist/nutchData/crawl/segments/20110716105826
CrawlDb update: starting at 2011-07-16 11:03:56
CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
CrawlDb update: segments:
[file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch,
file:/home/llist/nutchData/crawl/segments/20110716105826/content,
file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse,
file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data,
file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate,
file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/content
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate
 - skipping invalid segment
file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-16 11:03:57, elapsed: 00:00:01
-------------------------------------------
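For what it's worth, the "skipping invalid segment" lines above look consistent with `-dir` being pointed at a single segment rather than at the parent segments directory: with `-dir`, Nutch scans each subdirectory (crawl_fetch, content, crawl_parse, ...) as if it were a segment, and rejects them all. A possible fix (untested here, paths taken from the log above) is to pass the segment itself as a positional argument, or to point `-dir` at the parent directory:

```shell
# Pass the segment directory itself as a positional argument (no -dir):
/usr/share/nutch/runtime/local/bin/nutch updatedb \
    /home/llist/nutchData/crawl/crawldb \
    /home/llist/nutchData/crawl/segments/20110716105826

# Or let -dir scan the *parent* segments directory for segments:
/usr/share/nutch/runtime/local/bin/nutch updatedb \
    /home/llist/nutchData/crawl/crawldb \
    -dir /home/llist/nutchData/crawl/segments
```

Either form should make updatedb see 20110716105826 as one segment instead of treating its subdirectories as six broken ones.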

On Sat, 2011-07-16 at 02:36 +0200, Markus Jelsma wrote:

> fetch, then parse.
> 
> > I'm running nutch 1.3 on 64 bit Ubuntu, following are the commands and
> > relevant output.
> > 
> > ----------------------------------
> > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
> > Injector: starting at 2011-07-15 18:32:10
> > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > Injector: urlDir: /home/llist/nutchData/seed
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2011-07-15 18:32:13, elapsed: 00:00:02
> > =================
> > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > generate /home/llist/nutchData/crawl/crawldb
> > /home/llist/nutchData/crawl/segments Generator: starting at 2011-07-15
> > 18:32:41
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > Generator: finished at 2011-07-15 18:32:45, elapsed: 00:00:03
> > ==================
> > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > fetch /home/llist/nutchData/crawl/segments/20110715183244
> > Fetcher: Your 'http.agent.name' value should be listed first in
> > 'http.robots.agents' property.
> > Fetcher: starting at 2011-07-15 18:34:55
> > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > Fetcher: threads: 10
> > QueueFeeder finished: total 1 records + hit by time limit :0
> > fetching http://www.seek.com.au/
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=2
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -finishing thread FetcherThread, activeThreads=0
> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=0
> > Fetcher: finished at 2011-07-15 18:34:59, elapsed: 00:00:03
> > =================
> > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch
> > updatedb /home/llist/nutchData/crawl/crawldb
> > -dir /home/llist/nutchData/crawl/segments/20110715183244
> > CrawlDb update: starting at 2011-07-15 18:36:00
> > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > CrawlDb update: segments:
> > [file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch,
> > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate,
> > file:/home/llist/nutchData/crawl/segments/20110715183244/content]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: false
> > CrawlDb update: URL filtering: false
> > - skipping invalid segment
> > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch
> > - skipping invalid segment
> > file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate
> > - skipping invalid segment
> > file:/home/llist/nutchData/crawl/segments/20110715183244/content
> > CrawlDb update: Merging segment data into db.
> > CrawlDb update: finished at 2011-07-15 18:36:01, elapsed: 00:00:01
> > -----------------------------------
> > 
> > Appreciate any hints on what I'm missing.
