Hi Lewis,

Will try your suggestion shortly, but am still puzzled why the crawl command works. Isn't it using the same filters, etc.?
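As an aside, here is my understanding of the steps crawl chains together (inject once, then generate/fetch/parse/updatedb per depth iteration), pieced together from its console output. I haven't read the Crawl class itself, and I'm leaving out the linkdb/indexing steps since solrUrl isn't set, so treat this as a sketch:

/usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed/urls
/usr/share/nutch/runtime/local/bin/nutch generate /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments -topN 5
/usr/share/nutch/runtime/local/bin/nutch fetch /home/llist/nutchData/crawl/segments/<segment>
/usr/share/nutch/runtime/local/bin/nutch parse /home/llist/nutchData/crawl/segments/<segment>
/usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments/<segment>

If that is right, one visible difference from my manual run is the updatedb step: crawl's log shows the segment passed directly, with "URL normalizing: true" and "URL filtering: true", whereas my -dir invocation reported both as false.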
Cheers,
Leo

On Thu, 2011-07-21 at 20:55 +0100, lewis john mcgibbney wrote:
> Hi Leo,
>
> From the times both the fetching and parsing took, I suspect that maybe Nutch didn't actually fetch the URL; however, this may not be the case as I have nothing to benchmark it on. Unfortunately, on this occasion the URL http://wiki.apache.org actually redirects to http://wiki.apache.org/general/, so I'm going to post my log output from the last URL you specified in an attempt to clear this one up. The following confirms that your observations are accurate: not only does this produce invalid segments, but nothing is fetched in the process.
>
> Therefore the reason that we are getting the "- skipping invalid segment" message is that we are not actually fetching any content. My initial thought was that your urlfilters were not set properly, and I think that this is part of the problem.
>
> Please follow the syntax very carefully and it will work for you, as follows:
>
> regex-urlfilter.txt
> --------------------------
>
> # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
> -.*(/[^/]+)/[^/]+\1/[^/]+\1/
>
> # crawl URLs in the following domains.
> +^http://([a-z0-9]*\.)*seek.com.au/
>
> # accept anything else
> #+.
>
> seed file
> ----------------------
> http://www.seek.com.au
>
> It sounds really trivial, but I think that the trailing '/' in your seed file may have been making all of the difference.
>
> Please try, test with readdb and readseg, and comment back.
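>
> A quick way to sanity-check the filter rules before re-crawling is to pipe the seed through the configured URL filters; something like the line below should do it (I'm quoting the checker class name and flag from memory, so verify them against your build):
>
> echo "http://www.seek.com.au/" | bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
>
> A leading '+' in the output means the URL was accepted by all filters, a leading '-' that it was rejected.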
>
> Sorry for the delayed posts on this one; I have not had much time to get to it. Hope all goes to plan. Evidence can be seen below.
>
> lewis@lewis-01:~/ASF/branch-1.4/runtime/local$ bin/nutch readdb crawldb -stats
> CrawlDb statistics start: crawldb
> Statistics for CrawlDb: crawldb
> TOTAL urls: 48
> retry 0: 48
> min score: 0.017
> avg score: 0.041125
> max score: 1.175
> status 1 (db_unfetched): 47
> status 2 (db_fetched): 1
> CrawlDb statistics: done
>
> On Thu, Jul 21, 2011 at 3:30 AM, Leo Subscriptions <[email protected]> wrote:
> > Following are the suggested commands and the results. I left the redirect setting at 0, as 'crawl' works without any issues. The problem only occurs when running the individual commands.
> >
> > ------- nutch-site.xml -------------------------------
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put site-specific property overrides in this file. -->
> >
> > <configuration>
> >
> > <property>
> >   <name>http.agent.name</name>
> >   <value>listers spider</value>
> > </property>
> >
> > <property>
> >   <name>fetcher.verbose</name>
> >   <value>true</value>
> >   <description>If true, fetcher will log more verbosely.</description>
> > </property>
> >
> > <property>
> >   <name>http.verbose</name>
> >   <value>true</value>
> >   <description>If true, HTTP will log more verbosely.</description>
> > </property>
> >
> > </configuration>
> > ---------------------------------------------------------------
> >
> > ------ Individual commands and results -------------------------
> >
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed/urls
> > Injector: starting at 2011-07-21 12:24:52
> > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > Injector: urlDir: /home/llist/nutchData/seed/urls
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2011-07-21 12:24:55, elapsed: 00:00:02
> >
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch generate /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments -topN 100
> > Generator: starting at 2011-07-21 12:25:16
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 100
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment: /home/llist/nutchData/crawl/segments/20110721122519
> > Generator: finished at 2011-07-21 12:25:20, elapsed: 00:00:03
> >
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch fetch /home/llist/nutchData/crawl/segments/20110721122519
> > Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> > Fetcher: starting at 2011-07-21 12:26:36
> > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110721122519
> > Fetcher: threads: 10
> > QueueFeeder finished: total 1 records + hit by time limit :0
> > -finishing thread FetcherThread, activeThreads=1
> > fetching http://wiki.apache.org/
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -finishing thread FetcherThread, activeThreads=0
> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=0
> > Fetcher: finished at 2011-07-21 12:26:40, elapsed: 00:00:04
> >
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch parse /home/llist/nutchData/crawl/segments/20110721122519
> > ParseSegment: starting at 2011-07-21 12:27:22
> > ParseSegment: segment: /home/llist/nutchData/crawl/segments/20110721122519
> > ParseSegment: finished at 2011-07-21 12:27:24, elapsed: 00:00:01
> >
> > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments/20110721122519
> > CrawlDb update: starting at 2011-07-21 12:28:03
> > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > CrawlDb update: segments: [file:/home/llist/nutchData/crawl/segments/20110721122519/parse_text, file:/home/llist/nutchData/crawl/segments/20110721122519/content, file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_parse, file:/home/llist/nutchData/crawl/segments/20110721122519/parse_data, file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_fetch, file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_generate]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: false
> > CrawlDb update: URL filtering: false
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/parse_text
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/content
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_parse
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/parse_data
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_fetch
> > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110721122519/crawl_generate
> > CrawlDb update: Merging segment data into db.
> > CrawlDb update: finished at 2011-07-21 12:28:04, elapsed: 00:00:01
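> >
> > (One thing I notice in the output above: with -dir pointing at a single segment, updatedb has listed the segment's subdirectories (parse_text, content, crawl_parse, and so on) as though each were a segment in its own right, which would explain why every one of them is reported invalid. I suspect -dir expects the parent segments directory, so one of the two calls below is probably what I should have run; I haven't verified this yet:
> >
> > /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments/20110721122519
> > /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments
> >
> > The first passes the single segment positionally, the second lets -dir enumerate every segment under segments/.)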
> > ------------------------------------------------------------------------------------
> >
> > On Wed, 2011-07-20 at 21:58 +0100, lewis john mcgibbney wrote:
> > > There is no documentation for the individual commands used to run a Nutch 1.3 crawl, so I'm not sure where the misleading information is. In the instance that this was required, I would direct newer users to the legacy documentation for the time being.
> > >
> > > My comment to Leo was to understand whether he managed to correct the invalid segments problem.
> > >
> > > Leo, if this still persists may I ask you to try again? I will do the same and will be happy to provide feedback.
> > >
> > > May I suggest the following: use the following commands, in this order,
> > >
> > > inject
> > > generate
> > > fetch
> > > parse
> > > updatedb
> > >
> > > At this stage we should be able to ascertain if something is incorrect and hopefully debug. May I add the following... please make these additions to nutch-site (see the sketch below for the redirect property):
> > >
> > > fetcher verbose - true
> > > http verbose - true
> > > check for redirects and set accordingly
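> > >
> > > For the redirect check, the property I have in mind is http.redirect.max; if I remember its description in nutch-default.xml correctly, a value of 0 means the fetcher records redirects for a later fetch rather than following them immediately, so something like the following in nutch-site.xml would make it follow a couple of hops straight away (treat the value as an example, not a recommendation):
> > >
> > > <property>
> > >   <name>http.redirect.max</name>
> > >   <value>2</value>
> > > </property>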
> > >
> > > On Wed, Jul 20, 2011 at 1:39 PM, Julien Nioche <[email protected]> wrote:
> > > > The wiki can be edited and you are welcome to suggest improvements if there is something missing.
> > > >
> > > > On 20 July 2011 13:31, Cam Bazz <[email protected]> wrote:
> > > > > Hello,
> > > > >
> > > > > I think there is something misleading in the documentation: it does not tell us that we have to parse.
> > > > >
> > > > > On Wed, Jul 20, 2011 at 11:42 AM, Julien Nioche <[email protected]> wrote:
> > > > > > Haven't you forgotten to call parse?
> > > > > >
> > > > > > On 19 July 2011 23:40, Leo Subscriptions <[email protected]> wrote:
> > > > > > > Hi Lewis,
> > > > > > >
> > > > > > > You are correct about the last post not showing any errors. I just wanted to show that I don't get any errors if I use 'crawl', and to prove that I do not have any faults in the conf files or the directories.
> > > > > > >
> > > > > > > I still get the errors if I use the individual commands inject, generate, fetch....
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Leo
> > > > > > >
> > > > > > > On Tue, 2011-07-19 at 22:09 +0100, lewis john mcgibbney wrote:
> > > > > > > > Hi Leo,
> > > > > > > >
> > > > > > > > Did you resolve?
> > > > > > > >
> > > > > > > > Your second log data doesn't appear to show any errors; however, the problem you specify is one I have witnessed myself a while ago. Since you posted, have you been able to replicate... or resolve?
> > > > > > > >
> > > > > > > > On Sun, Jul 17, 2011 at 1:03 AM, Leo Subscriptions <[email protected]> wrote:
> > > > > > > > > I've used crawl to ensure the config is correct and I don't get any errors, so I must be doing something wrong with the individual steps, but can't see what.
> > > > > > > > >
> > > > > > > > > --------------------------------------------------------------------------------------------------------------------
> > > > > > > > > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch crawl /home/llist/nutchData/seed/urls -dir /home/llist/nutchData/crawl -depth 3 -topN 5
> > > > > > > > > solrUrl is not set, indexing will be skipped...
> > > > > > > > > crawl started in: /home/llist/nutchData/crawl
> > > > > > > > > rootUrlDir = /home/llist/nutchData/seed/urls
> > > > > > > > > threads = 10
> > > > > > > > > depth = 3
> > > > > > > > > solrUrl=null
> > > > > > > > > topN = 5
> > > > > > > > > Injector: starting at 2011-07-17 09:31:19
> > > > > > > > > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > > > > > > > > Injector: urlDir: /home/llist/nutchData/seed/urls
> > > > > > > > > Injector: Converting injected urls to crawl db entries.
> > > > > > > > > Injector: Merging injected urls into crawl db.
> > > > > > > > > Injector: finished at 2011-07-17 09:31:22, elapsed: 00:00:02
> > > > > > > > > Generator: starting at 2011-07-17 09:31:22
> > > > > > > > > Generator: Selecting best-scoring urls due for fetch.
> > > > > > > > > Generator: filtering: true
> > > > > > > > > Generator: normalizing: true
> > > > > > > > > Generator: topN: 5
> > > > > > > > > Generator: jobtracker is 'local', generating exactly one partition.
> > > > > > > > > Generator: Partitioning selected urls for politeness.
> > > > > > > > > Generator: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > > > > > > Generator: finished at 2011-07-17 09:31:26, elapsed: 00:00:04
> > > > > > > > > Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> > > > > > > > > Fetcher: starting at 2011-07-17 09:31:26
> > > > > > > > > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > > > > > > Fetcher: threads: 10
> > > > > > > > > QueueFeeder finished: total 1 records + hit by time limit :0
> > > > > > > > > fetching http://www.seek.com.au/
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > > > > -finishing thread FetcherThread, activeThreads=0
> > > > > > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > > > > -activeThreads=0
> > > > > > > > > Fetcher: finished at 2011-07-17 09:31:29, elapsed: 00:00:03
> > > > > > > > > ParseSegment: starting at 2011-07-17 09:31:29
> > > > > > > > > ParseSegment: segment: /home/llist/nutchData/crawl/segments/20110717093124
> > > > > > > > > ParseSegment: finished at 2011-07-17 09:31:32, elapsed: 00:00:02
> > > > > > > > > CrawlDb update: starting at 2011-07-17 09:31:32
> > > > > > > > > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > > > > > > CrawlDb update: segments: [/home/llist/nutchData/crawl/segments/20110717093124]
> > > > > > > > > CrawlDb update: additions allowed: true
> > > > > > > > > CrawlDb update: URL normalizing: true
> > > > > > > > > CrawlDb update: URL filtering: true
> > > > > > > > > CrawlDb update: Merging segment data into db.
> > > > > > > > > CrawlDb update: finished at 2011-07-17 09:31:34, elapsed: 00:00:02
> > > > > > > > > :
> > > > > > > > > :
> > > > > > > > > :
> > > > > > > > > :
> > > > > > > > > -----------------------------------------------------------------------------------------------
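> > > > > > > > >
> > > > > > > > > (For completeness, the crawldb state after this successful run can be checked with the stats report, e.g.
> > > > > > > > >
> > > > > > > > > /usr/share/nutch/runtime/local/bin/nutch readdb /home/llist/nutchData/crawl/crawldb -stats
> > > > > > > > >
> > > > > > > > > which should report db_fetched entries if the fetches really happened. I haven't pasted that output here.)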
> > > > > > > > >
> > > > > > > > > On Sat, 2011-07-16 at 12:14 +1000, Leo Subscriptions wrote:
> > > > > > > > > > Done, but now get additional errors:
> > > > > > > > > >
> > > > > > > > > > -------------------
> > > > > > > > > > llist@LeosLinux:~/nutchData$ /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments/20110716105826
> > > > > > > > > > CrawlDb update: starting at 2011-07-16 11:03:56
> > > > > > > > > > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > > > > > > > CrawlDb update: segments: [file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch, file:/home/llist/nutchData/crawl/segments/20110716105826/content, file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse, file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data, file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate, file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text]
> > > > > > > > > > CrawlDb update: additions allowed: true
> > > > > > > > > > CrawlDb update: URL normalizing: false
> > > > > > > > > > CrawlDb update: URL filtering: false
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_fetch
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/content
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_parse
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/parse_data
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/crawl_generate
> > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110716105826/parse_text
> > > > > > > > > > CrawlDb update: Merging segment data into db.
> > > > > > > > > > CrawlDb update: finished at 2011-07-16 11:03:57, elapsed: 00:00:01
> > > > > > > > > > -------------------------------------------
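> > > > > > > > > >
> > > > > > > > > > (I can also list what the segment actually contains before updating; if I have the readseg syntax right, this prints the GENERATED/FETCHED/PARSED counts for the segment, which should make an incomplete segment obvious:
> > > > > > > > > >
> > > > > > > > > > /usr/share/nutch/runtime/local/bin/nutch readseg -list /home/llist/nutchData/crawl/segments/20110716105826
> > > > > > > > > > )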
> > > > > > > > > >
> > > > > > > > > > On Sat, 2011-07-16 at 02:36 +0200, Markus Jelsma wrote:
> > > > > > > > > > > fetch, then parse.
> > > > > > > > > > >
> > > > > > > > > > > > I'm running nutch 1.3 on 64-bit Ubuntu; following are the commands and relevant output.
> > > > > > > > > > > >
> > > > > > > > > > > > ----------------------------------
> > > > > > > > > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
> > > > > > > > > > > > Injector: starting at 2011-07-15 18:32:10
> > > > > > > > > > > > Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
> > > > > > > > > > > > Injector: urlDir: /home/llist/nutchData/seed
> > > > > > > > > > > > Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > > Injector: Merging injected urls into crawl db.
> > > > > > > > > > > > Injector: finished at 2011-07-15 18:32:13, elapsed: 00:00:02
> > > > > > > > > > > > =================
> > > > > > > > > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch generate /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments
> > > > > > > > > > > > Generator: starting at 2011-07-15 18:32:41
> > > > > > > > > > > > Generator: Selecting best-scoring urls due for fetch.
> > > > > > > > > > > > Generator: filtering: true
> > > > > > > > > > > > Generator: normalizing: true
> > > > > > > > > > > > Generator: jobtracker is 'local', generating exactly one partition.
> > > > > > > > > > > > Generator: Partitioning selected urls for politeness.
> > > > > > > > > > > > Generator: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > > > > > > > > > > > Generator: finished at 2011-07-15 18:32:45, elapsed: 00:00:03
> > > > > > > > > > > > ==================
> > > > > > > > > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch fetch /home/llist/nutchData/crawl/segments/20110715183244
> > > > > > > > > > > > Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> > > > > > > > > > > > Fetcher: starting at 2011-07-15 18:34:55
> > > > > > > > > > > > Fetcher: segment: /home/llist/nutchData/crawl/segments/20110715183244
> > > > > > > > > > > > Fetcher: threads: 10
> > > > > > > > > > > > QueueFeeder finished: total 1 records + hit by time limit :0
> > > > > > > > > > > > fetching http://www.seek.com.au/
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=2
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=1
> > > > > > > > > > > > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > > > > > > > -finishing thread FetcherThread, activeThreads=0
> > > > > > > > > > > > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > > > > > > > > > -activeThreads=0
> > > > > > > > > > > > Fetcher: finished at 2011-07-15 18:34:59, elapsed: 00:00:03
> > > > > > > > > > > > =================
> > > > > > > > > > > > llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments/20110715183244
> > > > > > > > > > > > CrawlDb update: starting at 2011-07-15 18:36:00
> > > > > > > > > > > > CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
> > > > > > > > > > > > CrawlDb update: segments: [file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch, file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate, file:/home/llist/nutchData/crawl/segments/20110715183244/content]
> > > > > > > > > > > > CrawlDb update: additions allowed: true
> > > > > > > > > > > > CrawlDb update: URL normalizing: false
> > > > > > > > > > > > CrawlDb update: URL filtering: false
> > > > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch
> > > > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate
> > > > > > > > > > > > - skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/content
> > > > > > > > > > > > CrawlDb update: Merging segment data into db.
> > > > > > > > > > > > CrawlDb update: finished at 2011-07-15 18:36:01, elapsed: 00:00:01
> > > > > > > > > > > > -----------------------------------
> > > > > > > > > > > >
> > > > > > > > > > > > Appreciate any hints on what I'm missing.
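> > > > > > > > > > >
> > > > > > > > > > > (To expand slightly, and as far as I know, since I haven't checked the 1.3 source: updatedb treats a segment as valid only once the fetch and parse outputs (crawl_fetch, crawl_parse) exist inside it, so per segment the order has to be
> > > > > > > > > > >
> > > > > > > > > > > bin/nutch fetch <segment>
> > > > > > > > > > > bin/nutch parse <segment>
> > > > > > > > > > > bin/nutch updatedb <crawldb> <segment>
> > > > > > > > > > >
> > > > > > > > > > > which is why the run quoted above, with no parse step, leaves the segment invalid.)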
> > > > > > > >
> > > > > > > > --
> > > > > > > > Lewis
> > > > > >
> > > > > > --
> > > > > > Open Source Solutions for Text Engineering
> > > > > > http://digitalpebble.blogspot.com/
> > > > > > http://www.digitalpebble.com
> > > >
> > > > --
> > > > Open Source Solutions for Text Engineering
> > > > http://digitalpebble.blogspot.com/
> > > > http://www.digitalpebble.com
>
> --
> Lewis

