I didn't see query-basic/query-more in your list of included plugins.
These are what usually handle most queries. query-url only handles the
parts of a query that look like url:http://www.google.com, and
query-site only handles site:www.google.com. Nothing seems to be
handling plain text terms in the content.
Is query-basic or query-more included in your nutch-default.xml?
I'm not sure why you don't see anything in Luke though.
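For reference, here is a rough sketch of the kind of override I would expect in conf/nutch-site.xml. It assumes the plugin list is controlled by the plugin.includes property, whose value is a regular expression matched against plugin directory names. The value below is only illustrative and is not copied from your setup, so keep whatever protocol, parse and index plugins you already use and just make sure query-basic appears in the query group.

```xml
<!-- Fragment for conf/nutch-site.xml, which overrides nutch-default.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- Illustrative value (an assumption, not your actual config):
       the important part is that query-(...) includes "basic". -->
  <value>nutch-extensionpoints|protocol-httpclient|parse-(text|html)|index-(basic|more)|query-(basic|more|site|url)</value>
</property>
```

After changing it, rerun the NutchBean command and check that the log prints a "parsing:" line for query-basic/plugin.xml rather than "not including:".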
Howie
From: "Hasan Diwan" <[EMAIL PROTECTED]>
Mr Tang:
> Crawling seems ok. Can you pls try org.apache.nutch.searcher.NutchBean
> [your-query-string] in shell/cmd?
server: 7:20pm % ./bin/nutch org.apache.nutch.searcher.NutchBean hasan
060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060305 192042 10 opening merged index in /home/hdiwan/SpectraSearch/crawl/index
060305 192042 10 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060305 192042 10 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/nutch-extensionpoints/plugin.xml
060305 192042 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060305 192042 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060305 192042 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-http
060305 192042 10 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060305 192042 10 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060305 192042 10 impl: point=org.apache.nutch.parse.Parser
class=org.apache.nutc
che.nutch.searcher.more.TypeQueryFilter
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
060305 192043 10 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-site/plugin.xml
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060305 192043 10 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
Total hits: 0
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>