Fred, I must say I am happy to see that I am not the only one!
You are right: using Luke and the org.apache.lucene.analysis.KeywordAnalyzer, I can search on my added field (scope). An example:

    +content:"cancer" +scope:"aScope"

What I understand is that with this analyzer you can filter your query using any of the stored fields.

When executing a query through Nutch, the analyzer used is org.apache.nutch.analysis.NutchAnalyzer. I guess it might perform similar tasks... The class called by my query plugin is org.apache.nutch.searcher.RawFieldQueryFilter. I'll look into that as well.

I'll plunge into the details and let you know if I find something.

David

-----------------------------------------
David Poirier
E-business Consultant - Software Engineer

-----Original Message-----
From: Fred Gilmore [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 26 March 2008 18:34
To: [email protected]
Subject: Re: nutch: creating new plugins: query plugin

I'm watching this thread with interest as I'm stuck in the same place. From reading three years of list archives, people seem to get over the hump of indexing custom fields and then get mired in the query side.

My index is fine. Luke shows me the fields and the values. I can change my index plugin code to not split on commas and it obeys. I can search it with Luke and it pulls data. I realize that only means so much, since Luke is parsing those queries with a Lucene class.

But I can't get past the query plugin, no matter how closely I follow the example on the wiki. I can look at query-url or query-more; it doesn't seem to matter. In fact, right now, if I load the query plugin listed below (in addition to query-basic), it breaks all searching: keyword, fielded, whatever.

<plugin id="query-placename"
        name="Placename Query Filter"
        version="1.0.0"
        provider-name="utexas.edu">

  <runtime>
    <library name="query-placename.jar">
      <export name="*"/>
    </library>
  </runtime>

  <requires>
    <import plugin="nutch-extensionpoints"/>
  </requires>

  <extension id="org.apache.nutch.searcher.placename.PlacenameQueryFilter"
             name="Placename Query Filter"
             point="org.apache.nutch.searcher.QueryFilter">
    <implementation id="PlacenameQueryFilter"
                    class="org.apache.nutch.searcher.placename.PlacenameQueryFilter">
      <parameter name="fields" value="placename"/>
    </implementation>
  </extension>

</plugin>

===============

[search1]:nutch> pg PlacenameQueryFilter.java

package org.apache.nutch.searcher.placename;

import org.apache.nutch.searcher.FieldQueryFilter;
import org.apache.hadoop.conf.Configuration;

public class PlacenameQueryFilter extends FieldQueryFilter {

  public PlacenameQueryFilter() {
    // Field name "placename", boost 5
    super("placename", 5f);
  }

  public void setConf(Configuration conf) {
    super.setConf(conf);
  }
}

The wiki plugin example omits the setConf override shown above; the query-url code includes it, as does query-more. Some use rawfieldqueryfilter, some use queryfilter; it doesn't seem that should matter.

The plugin gets shuttled over to the tomcat side, nutch-site.xml gets updated with a new plugin.includes stanza, and the webapp is redeployed. I've tried loading nutch-extensionpoints first here as well; it doesn't seem to matter. Maybe the boost is messing things up. It's set high, but that's because previous threads have indicated it was the only way to get field-only searches like placename:london working.

<property>
  <name>searcher.dir</name>
  <value>/usr/local/db/nutch/search1/crawls/missions-test</value>
</property>

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|meta)|index-(basic|more|meta)|query-(basic|more|placename|creator|url)|summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

No other nutch-default.xml or nutch-site.xml settings are altered. But there must be something obvious I'm leaving unset, or something conflicting on the tomcat side, that is breaking this. When I remove the query-placename and query-creator plugins, keyword searching resumes. url: works, so the syntax is accepted.

After several weeks of trying different things, I'm all out. But there must be something I'm missing. Any ideas at all?

thanks,

Fred Gilmore
University of Texas Austin Libraries
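
One thing that can be checked in isolation is whether the plugin.includes value really selects the plugin ids you expect. As far as I know, Nutch compiles that value as a regular expression and requires each plugin id to match it in full, so a stray space or line break inside the value (for example from an editor or mail client wrapping the line) would silently drop a plugin. The sketch below uses only java.util.regex; the class PluginIncludesCheck and its methods are hypothetical helpers for illustration and are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Nutch: checks which plugin ids a
// plugin.includes value matches when it is treated as a regular expression.
// Assumption: Nutch requires the whole plugin id to match the expression.
class PluginIncludesCheck {

    // True if the entire plugin id matches the plugin.includes expression.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    // Returns the ids from pluginIds that the expression does not match.
    static List<String> excluded(String includes, List<String> pluginIds) {
        Pattern pattern = Pattern.compile(includes);
        List<String> missing = new ArrayList<String>();
        for (String id : pluginIds) {
            if (!pattern.matcher(id).matches()) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The value as intended, on a single line.
        String clean = "protocol-http|urlfilter-regex|parse-(text|html|meta)"
            + "|index-(basic|more|meta)|query-(basic|more|placename|creator|url)"
            + "|summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)";
        // The same value with spaces where a 72-column wrap would fall.
        String wrapped = "protocol-http|urlfilter-regex|parse-(text|html|meta)"
            + "|index-(basic |more|meta)|query-(basic|more|placename|creator|url)"
            + "|summary-lucene|scor ing-opic|urlnormalizer-(pass|regex|basic)";

        List<String> ids = Arrays.asList(
            "index-basic", "query-basic", "query-placename", "scoring-opic");

        System.out.println("clean value misses:   " + excluded(clean, ids));
        System.out.println("wrapped value misses: " + excluded(wrapped, ids));
    }
}
```

With the space-containing value, index-basic and scoring-opic no longer match, while query-placename still does. This only tests the expression itself; it says nothing about whether the plugin directory, jar name, or plugin.xml on the tomcat side are correct.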
