I'm trying to get Nutch working for my large web site, but I can't find answers to some basic questions, even after looking all over the Nutch site and searching Google.
1) Why doesn't 0.7.2 let me search by "title:"? Luke shows 15 different fields, but I can only search two of them, "url:" and "site:". Is that it?
2) How would I add an additional searchable field, such as "author:"?
3) Is there a way to search in "anchor:"?
4) Can't you do wildcard searches, like "d?g" or "t*est"?
5) Why does Nutch not seem to support Lucene's full set of query types?
6) I'm using this mostly for site search, and I have access to the database. Would it be better to just use Lucene and index my database instead of using Nutch? Is there an application built on Lucene that's better suited to indexing a database, preferably one that outputs OpenSearch XML?

Also, what IRC channel do you guys use?

-----Original Message-----
From: Abdelhakim Diab [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 27, 2006 2:53 AM
To: [email protected]; Dima Mazmanov
Subject: Re: urls list crawling

Thanks for your reply. I solved the problem. I am using nutch 7.0.2; the problem was in the filter. Thanks very much.

----- Original Message -----
From: "Dima Mazmanov" <[EMAIL PROTECTED]>
To: "Abdelhakim Diab" <[email protected]>
Sent: Monday, June 26, 2006 4:24 PM
Subject: Re: urls list crawling

Hi, Abdelhakim. What is the version of nutch you are using?

You wrote on June 26, 2006, 17:04:12:

> I want to crawl a list of sites, but when I put the urls in the urls.txt
> file, the crawler fetches only the first url
> and does not fetch the other urls.
> How can I solve this problem?
> The urls:
> http://lucene.apache.org/nutch/
> http://www.spacetoon.com

--
Regards, Dima
mailto:[EMAIL PROTECTED]

______________________________________
Tonal web design and hosting
http://tonalweb.com
eCommerce development & marketing
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
