Hello, Dennis,

Thanks again for your response. I am really amazed that things can't go right. I have verified my configuration in nutch-site.xml, and I have already filled in all the fields mentioned in your e-mail. I have even copied the file nutch-site.xml to a sub-folder under the ROOT folder in Tomcat. Still no results, although the log does not show any problems. Just for your information, I will reproduce two sections of the log.
The first one, just when starting the crawl:

2006-09-28 17:15:43,930 INFO http.Http - http.agent = qualidade/0.8.1(qualidade e meio ambiente; http://www.qualidade.eng.br; [EMAIL PROTECTED])

and the final section, after all the indexing and optimization:

2006-09-28 17:25:58,551 INFO indexer.Indexer - Indexer: done
2006-09-28 17:25:58,556 INFO indexer.DeleteDuplicates - Dedup: starting
2006-09-28 17:25:58,593 INFO indexer.DeleteDuplicates - Dedup: adding indexes in: teste/indexes
2006-09-28 17:26:01,356 INFO indexer.DeleteDuplicates - Dedup: done
2006-09-28 17:26:01,358 INFO indexer.IndexMerger - Adding teste/indexes/part-00000
2006-09-28 17:26:02,377 INFO crawl.Crawl - crawl finished: teste

Then I go to the "teste" folder and start Tomcat from there, as in Nutch 0.7.2, get that nice search page, try something and... zero results! Any new ideas?

Thanks,
W. Melo

----- Original Message -----
From: "Dennis Kubes" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, September 28, 2006 6:19 PM
Subject: Re: no results in nutch 0.8.1

> This is what we have, hope this clears up some confusion. It will show up
> in log files of the sites that you crawl like this. I don't know if the
> configuration is what is causing your problem but I have talked to other
> people on the list with similar problems where their configuration was
> incorrect. I think the only thing that is "required" is for the
> http.agent.name not to be blank but I would set all of the other options
> as well, just for politeness.
>
> Dennis
>
> Log file will record a crawler similar to this:
> NameOfAgent/1.0_(Yourwebsite.com;_http://www.yoururl.com/bot.html;[EMAIL PROTECTED])
>
> <!-- HTTP properties -->
> <property>
>   <name>http.agent.name</name>
>   <value>NameOfAgent</value>
>   <description>Our HTTP 'User-Agent' request header.</description>
> </property>
>
> <property>
>   <name>http.robots.agents</name>
>   <value>NutchCVS,Nutch,NameOfAgent,*</value>
>   <description>The agent strings we'll look for in robots.txt files,
>   comma-separated, in decreasing order of precedence.</description>
> </property>
>
> <property>
>   <name>http.robots.403.allow</name>
>   <value>true</value>
>   <description>Some servers return HTTP status 403 (Forbidden) if
>   /robots.txt doesn't exist. This should probably mean that we are
>   allowed to crawl the site nonetheless. If this is set to false,
>   then such sites will be treated as forbidden.</description>
> </property>
>
> <property>
>   <name>http.agent.description</name>
>   <value>Yourwebsite.com</value>
>   <description>Further description of our bot- this text is used in
>   the User-Agent header. It appears in parenthesis after the agent name.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.url</name>
>   <value>http://yoururl.com</value>
>   <description>A URL to advertise in the User-Agent header. This will
>   appear in parenthesis after the agent name.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.email</name>
>   <value>[EMAIL PROTECTED]</value>
>   <description>An email address to advertise in the HTTP 'From' request
>   header and User-Agent header.</description>
> </property>
>
> <property>
>   <name>http.agent.version</name>
>   <value>1.0</value>
>   <description>A version string to advertise in the User-Agent
>   header.</description>
> </property>
>
> carmmello wrote:
>> Thanks for your answer Dennis, but, yes, I did.
>> The only thing I did not
>> (and I have some doubt about it) is that in the http.agent.version I only
>> used the Nutch-0.8.1 name, but not the name I used in http.robots.agents,
>> although in this configuration I have kept the *. Also, in the log
>> file, I cannot find any error regarding this.
>>
>> ----- Original Message -----
>> From: "Dennis Kubes" <[EMAIL PROTECTED]>
>> To: <[email protected]>
>> Sent: Wednesday, September 27, 2006 7:59 PM
>> Subject: Re: no results in nutch 0.8.1
>>
>>> Did you set up the user agent name in the nutch-site.xml file or
>>> nutch-default.xml file?
>>>
>>> Dennis
>>>
>>> carmmello wrote:
>>>> I have followed the steps in the 0.8.1 tutorial and, also, I have been
>>>> using Nutch for some time now, without seeing the kind of problem I am
>>>> encountering now.
>>>> After I have finished the crawl process (intranet crawling), I go to
>>>> localhost:8080, try a search and get, no matter what, 0 results.
>>>> Looking at the logs, everything seems ok. Also, if I use the command
>>>> bin/nutch readdb "crawl/crawldb" I find more than 6000 URLs.
>>>> So, why can't I get any results?
>>>> Thanks
>>>>
>>>
>>> --
>>> No virus found in this incoming message.
>>> Checked by AVG Free Edition.
>>> Version: 7.1.405 / Virus Database: 268.12.9/458 - Release Date:
>>> 27/9/2006
>>>
>>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
