Cheers for your response, Alvaro. I now understand better why Nutch uses multiple indexes (and the hats).
I hope the following links help.

http://www.heypatty.com/nutch-site.xml is the config I am actually using (I didn't want to paste it due to its size).

The directory of the index looks like this image: http://www.heypatty.com/nutch_dir.jpg

-----Original Message-----
From: Alvaro Cabrerizo [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 13 February 2007 9:53 PM
To: [email protected]
Subject: Re: n00b question follow up

It looks like the query is well formed. The query means (the scoring part is a little more complicated): "please index, give me all the documents you have that contain the word test in the field url OR anchor OR content OR host. If you find it in the field url, score the document with a boost factor of 4; if you find it in the field anchor, score it with a boost of 2..."

Maybe it would be helpful to see your nutch-site.xml (e.g. the one you are using when you deploy nutch.war in Tomcat, nutch/WEB-INF/classes/nutch-site).

2007/2/12, Patrick Simon <[EMAIL PROTECTED]>:
>
> All of my config stuff sits in nutch-default.xml (nutch-site.xml
> doesn't change anything but I assume this should be fine)
>
> The output to my log file of what you advise below is
>
> Query->+(url:test^4.0 anchor:test^2.0 content:test title:test^1.5 host:test^2.0)
>
> I'm not too sure why all the carets are appearing?
>
>
> -----Original Message-----
> From: Alvaro Cabrerizo [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 8 February 2007 1:25 AM
> To: [email protected]
> Subject: Re: n00b question follow up
>
> Hi:
>
> First you can check that the query plugins (query-basic, more, etc) appear
> in your nutch-site.xml. If everything is ok, you can add a LOG line in
> the method "search" of the class org.apache.nutch.searcher.IndexSearcher
> in order to see how the Lucene query is built. If I'm not wrong you
> have to add in line 99 LOG.info("query ->"+luceneQuery.toString());
> This method should look like this:
>
> public Hits search(Query query...)
> ...
> try {
>   org.apache.lucene.search.BooleanQuery luceneQuery = this.queryFilters.filter(query);
>   LOG.info("query -> " + luceneQuery.toString());
>   return ..
>
> Recompile, and make a new query.
>
> Hope it helps.
>
> 2007/2/7, Patrick Simon <[EMAIL PROTECTED]>:
> >
> > Hi All,
> >
> > This is an older post I made, with more details from the logs, which
> > will hopefully make it painfully obvious to someone out there why it's
> > not working.
> >
> > It appears that I have successfully created a Nutch index via the
> > command "nutch/bin :>./nutch crawl ../urls -dir ../crawl.test -depth 5".
> >
> > I say it is successful because when I use Luke (a Lucene GUI tool that
> > interrogates Lucene indexes) to view the index, a valid index and
> > search results come up.
> >
> > The directory I point Luke to is
> > /home/simonp/nutch-0.8/crawl.test/indexes/part-00000 (the value I give
> > for searcher.dir in nutch-default.xml is
> > "/home/simonp/nutch-0.8/crawl.test")
> >
> > The problem is that I cannot see any results via the command
> > "bin/nutch org.apache.nutch.searcher.NutchBean apache" or when I
> > search for the string apache within the nutch servlet.
> >
> > I don't run any fetching or indexing as the tutorial says not to for
> > simple intranet searching.
> >
> > I am using Tomcat 5.5 and Nutch 0.8.
> >
> > Can anybody help with this one please?
> >
> > The output from catalina.out is
> >
> > 2007-02-06 09:01:27,990 INFO NutchBean - opening indexes in /home/simonp/nutch-8.0/crawl.test/indexes
> > 2007-02-06 09:01:28,032 INFO Configuration - found resource common-terms.utf8 at file:/usr/local/tomcat/webapps/nutch-0.8/WEB-INF/classes/common-terms.utf8
> > 2007-02-06 09:01:28,037 INFO NutchBean - opening segments in /home/simonp/nutch-8.0/crawl.test/segments
> > 2007-02-06 09:01:28,056 INFO SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
> > 2007-02-06 09:01:28,056 INFO NutchBean - opening linkdb in /home/simonp/nutch-8.0/crawl.test/linkdb
> > 2007-02-06 09:01:28,062 INFO NutchBean - query request from 192.168.5.173
> > 2007-02-06 09:01:28,072 INFO NutchBean - query: ubuntu
> > 2007-02-06 09:01:28,072 INFO NutchBean - lang: en
> > 2007-02-06 09:01:28,101 INFO NutchBean - searching for 20 raw hits
> > 2007-02-06 09:01:28,142 INFO NutchBean - total hits: 0
> > 2007-02-06 09:01:30,506 INFO NutchBean - query request from 192.168.5.173
> > 2007-02-06 09:01:30,506 INFO NutchBean - query: apache
> > 2007-02-06 09:01:30,506 INFO NutchBean - lang: en
> > 2007-02-06 09:01:30,507 INFO NutchBean - searching for 20 raw hits
> > 2007-02-06 09:01:30,507 INFO NutchBean - total hits: 0
> > 2007-02-06 09:01:51,191 INFO NutchBean - query request from 192.168.5.173
> > 2007-02-06 09:01:51,191 INFO NutchBean - query: test
> > 2007-02-06 09:01:51,191 INFO NutchBean - lang: en
> > 2007-02-06 09:01:51,193 INFO NutchBean - searching for 20 raw hits
> > 2007-02-06 09:01:51,193 INFO NutchBean - total hits: 0
> > 2007-02-06 10:22:51,068 INFO NutchBean - query request from 192.168.5.173
> > 2007-02-06 10:22:51,070 INFO NutchBean - query: test
> > 2007-02-06 10:22:51,070 INFO NutchBean - lang: en
> > 2007-02-06 10:22:51,073 INFO NutchBean - searching for 20 raw hits
> > 2007-02-06 10:22:51,076 INFO NutchBean - total hits: 0

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
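
For anyone reading this thread later: the carets ("hats") in the logged query are Lucene's boost syntax, as Alvaro's explanation describes. `url:test^4.0` weights a match in the url field by 4, and a clause with no caret (here `content:test`) keeps the default boost of 1.0. The following is only a minimal sketch in plain Java, with no Nutch or Lucene classes involved; `BoostDemo` and `parseBoosts` are hypothetical names made up for illustration, and the string handling is not Lucene's real query parser. It simply splits the logged query string into field/boost pairs so the weighting is easier to see.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostDemo {

    // Splits a logged query such as "+(url:test^4.0 content:test)" into
    // field -> boost pairs. A clause without "^" is given 1.0, which is
    // Lucene's default boost. Illustration only, not Lucene's parser.
    public static Map<String, Double> parseBoosts(String logged) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        int open = logged.indexOf('(');
        int close = logged.lastIndexOf(')');
        // Fall back to the whole string if there is no parenthesised group.
        String inner = (open >= 0 && close > open)
                ? logged.substring(open + 1, close)
                : logged;
        for (String clause : inner.trim().split("\\s+")) {
            int colon = clause.indexOf(':');
            if (colon < 0) {
                continue; // not a field:term clause, skip it
            }
            int caret = clause.indexOf('^', colon);
            String field = clause.substring(0, colon);
            double boost = caret < 0
                    ? 1.0
                    : Double.parseDouble(clause.substring(caret + 1));
            boosts.put(field, boost);
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Query string copied from the log output quoted in the thread.
        String logged =
            "+(url:test^4.0 anchor:test^2.0 content:test title:test^1.5 host:test^2.0)";
        for (Map.Entry<String, Double> e : parseBoosts(logged).entrySet()) {
            System.out.println(e.getKey() + " boost " + e.getValue());
        }
    }
}
```

Running `main` lists each field next to its boost, which matches the reading given in the reply: the same word is searched in several fields, and the caret only changes how much a hit in that field counts towards the score.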

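
Since the thread turns on which directory the searcher opens, here is a sketch of overriding searcher.dir in nutch-site.xml instead of editing nutch-default.xml. It assumes the Hadoop-style `<configuration>` layout used by Nutch 0.8, and that the file read by the webapp is the copy under WEB-INF/classes that Alvaro mentions. The path is the one quoted in the original post. The catalina.out excerpt shows NutchBean opening /home/simonp/nutch-8.0/crawl.test rather than /home/simonp/nutch-0.8/crawl.test, so it may be worth comparing the configured value with the real directory character by character. Whether that difference explains the zero hits is only a guess from the quoted text.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Path as quoted in the original post; replace with your own crawl directory. -->
    <value>/home/simonp/nutch-0.8/crawl.test</value>
  </property>
</configuration>
```

After changing the value, Tomcat would need to reload the webapp before the "opening indexes in ..." line in catalina.out reflects the new path.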