I didn't try the date range queries, but I did try the type queries, which are also part of the query-more functionality, and they work.
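
One thing that may be worth checking, given that type queries work and scope queries do not: the type values shown in the hit details quoted in this thread (text, html) are all lowercase, whereas aScope is mixed case and is indexed with Field.Index.UN_TOKENIZED, so it is stored as a single term that only matches exactly. The sketch below is plain Java with no Nutch or Lucene dependency; the class ScopeTermMatchSketch and its methods are hypothetical names used only for illustration. It assumes, without confirmation from this thread, that the query side lowercases the term before matching, and shows why such a term would miss the mixed-case value.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only: no Nutch or Lucene classes are used.
class ScopeTermMatchSketch {

    // Stand-in for one indexed document: field name -> value kept as a
    // single, untokenized term (values taken from the hit details in this thread).
    static Map<String, String> indexedDoc() {
        Map<String, String> doc = new HashMap<>();
        doc.put("subType", "html");  // all-lowercase value
        doc.put("scope", "aScope");  // mixed-case value
        return doc;
    }

    // Exact, case-sensitive comparison against the stored term.
    static boolean matches(Map<String, String> doc, String field, String queryTerm) {
        return queryTerm.equals(doc.get(field));
    }

    // ASSUMPTION (not confirmed in this thread): the query term is
    // lowercased before it is compared with the indexed term.
    static String lowercased(String queryTerm) {
        return queryTerm.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> doc = indexedDoc();
        System.out.println(matches(doc, "subType", lowercased("html")));  // true
        System.out.println(matches(doc, "scope", lowercased("aScope"))); // false
        System.out.println(matches(doc, "scope", "aScope"));             // true
    }
}
```

If that assumption holds, indexing the test value in lowercase (for example ascope) and re-running the same scope: query would be a quick way to confirm or rule it out.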

David

-----------------------------------------
David Poirier
E-business Consultant - Software Engineer

Direct: +41 (0)22 596 10 35

Cross Systems - Groupe Micropole Univers
Route des Acacias 45 B
1227 Carouge / Genève
Tel: +41 (0)22 308 48 60
Fax: +41 (0)22 308 48 68


-----Original Message-----
From: Brian Ulicny [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 26 March 2008 16:06
To: [email protected]; [email protected]
Subject: RE: nutch: creating new plugins: query plugin

Date range queries are part of the query-more functionality, right? Do
they work?

Brian

On Wed, 26 Mar 2008 15:57:44 +0100, "POIRIER David"
<[EMAIL PROTECTED]> said:
> Brian,
>
> Thank you for your answer.
>
> Q: What happens when you do the keyword query only?
> A: It works. Example:
>    query: cancer
>    results: yes
>
> Q: Where are you executing the query from? Using the NutchBean?
> A: From the NutchBean. Here are a few tests I made:
>    query: cancer scope:aScope
>    results: no
>
>    query: cancer scope:"aScope"
>    results: no
>
>    query: cancer "scope:aScope"
>    results: no
>
>    query: "cancer scope:aScope"
>    results: no
>
> Q: Why don't you try searching for the urls directly that you think
> should be returned using the url: syntax to make sure they got indexed
> and you are pointing at the right index?
> A: Thanks for the tip. I am indeed 100% certain that an index metadata
> field named scope with the value aScope exists for ALL the references
> in my index.
>
> Example:
> Query: cancer
> Results:
>  * segment = 20080326104113
>  * digest = 678e47f1a52ce036b89e2dc4c6f3571c
>  * url = http://www.aWebsite.com/article/511833.aspx
>  * title = Arimidex with Tamoxifen efficacy and safety trial for
>    advanced breast cancer (1033IL/0027)
>  * tstamp = 20080326094143023
>  * contentLength = 45167
>  * primaryType = text
>  * subType = html
>  * scope = aScope
>  * boost = 0.028375218
>
> I am going in circles (if we can say that in English)...
> I went back to my first plugin, which is a modification of the
> query-site plugin, without success.
>
> If you, or anybody, think of something else, please let me know.
>
> David
>
>
> -----Original Message-----
> From: Brian Ulicny [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 26 March 2008 15:30
> To: [email protected]; [email protected]
> Subject: RE: nutch: creating new plugins: query plugin
>
> What happens when you do the keyword query only?
>
> Where are you executing the query from? Using the NutchBean? If so,
> then the double-quotes would be necessary.
>
> Why don't you try searching for the urls directly that you think should
> be returned using the url: syntax to make sure they got indexed and you
> are pointing at the right index?
>
> Brian Ulicny
>
>
> On Wed, 26 Mar 2008 11:50:43 +0100, "POIRIER David"
> <[EMAIL PROTECTED]> said:
> > Hello,
> >
> > I really need your help here please. I tried a few more things; I
> > deleted my two plugins and, instead of creating new ones, I modified
> > the existing index-more and query-more plugins.
> >
> > The index-more modification is working. Here's what I added:
> > private Document addScope(Document doc, ParseData data, String url) {
> >   doc.add(new Field("scope", "aScope", Field.Store.YES,
> >       Field.Index.UN_TOKENIZED));
> >   return doc;
> > }
> >
> > I made sure that the method is called by adding this in the filter
> > method:
> > addScope(doc, parse.getData(), url_s);
> >
> > Using the Nutch API, when I check the details of a hit, I look for:
> > String scope = detail.getValue("scope");
> >
> > And as expected it always returns "aScope".
> >
> > The problem is when I try to filter a query using my modified
> > query-more plugin. When executing the query "aKeyword scope:aScope"
> > (the double quotes are there only for readability in this email), the
> > index always returns 0 results.
> >
> > Here's the class I added to the org.apache.nutch.indexer.more
> > package:
> > import org.apache.nutch.searcher.RawFieldQueryFilter;
> > import org.apache.hadoop.conf.Configuration;
> >
> > /**
> >  * Handles "scope:" query clauses, causing them to search the field
> >  * indexed by MoreIndexingFilter.
> >  *
> >  * @author John Xing / David Poirier
> >  */
> > public class ScopeQueryFilter extends RawFieldQueryFilter {
> >   private Configuration conf;
> >
> >   public ScopeQueryFilter() {
> >     super("scope");
> >   }
> >
> >   public void setConf(Configuration conf) {
> >     this.conf = conf;
> >     setBoost(conf.getFloat("query.scope.boost", 0.0f));
> >   }
> >
> >   public Configuration getConf() {
> >     return this.conf;
> >   }
> > }
> >
> > And the plugin.xml file associated with it:
> > <plugin
> >    id="query-more"
> >    name="More Query Filter"
> >    version="1.0.0"
> >    provider-name="nutch.org">
> >
> >   <runtime>
> >     <library name="query-more.jar">
> >       <export name="*"/>
> >     </library>
> >   </runtime>
> >
> >   <requires>
> >     <import plugin="nutch-extensionpoints"/>
> >   </requires>
> >
> >   <extension id="org.apache.nutch.searcher.more"
> >              name="Nutch More Query Filter"
> >              point="org.apache.nutch.searcher.QueryFilter">
> >     <implementation id="TypeQueryFilter"
> >                     class="org.apache.nutch.searcher.more.TypeQueryFilter">
> >       <parameter name="raw-fields" value="type"/>
> >     </implementation>
> >   </extension>
> >
> >   <extension id="org.apache.nutch.searcher.more"
> >              name="Nutch More Query Filter"
> >              point="org.apache.nutch.searcher.QueryFilter">
> >     <implementation id="DateQueryFilter"
> >                     class="org.apache.nutch.searcher.more.DateQueryFilter">
> >       <parameter name="raw-fields" value="date"/>
> >     </implementation>
> >   </extension>
> >
> >   <extension id="org.apache.nutch.searcher.more"
> >              name="Nutch More Query Filter"
> >              point="org.apache.nutch.searcher.QueryFilter">
> >     <implementation id="ScopeQueryFilter"
> >                     class="org.apache.nutch.searcher.more.ScopeQueryFilter">
> >       <parameter name="raw-fields" value="scope"/>
> >     </implementation>
> >   </extension>
> >
> > </plugin>
> >
> > If this means anything to anybody, please let me know.
> >
> > Thank you in advance,
> >
> > David
> >
> >
> > -----Original Message-----
> > From: POIRIER David [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, 25 March 2008 18:09
> > To: [email protected]
> > Subject: nutch: creating new plugins: query plugin
> >
> > Hello,
> >
> > Following the info available on the wiki
> > (http://wiki.apache.org/nutch/CreateNewFilter), I have created two new
> > plugins:
> > - index-scope (based on index-more)
> > - query-scope (based on query-site)
> >
> > As you can guess, the first plugin simply adds the "scope" metadata to
> > every parsed document, giving it, as a test, a fixed value, while the
> > second plugin adds the possibility to search for a "scope" using the
> > Lucene syntax.
> >
> > I have deployed the two new plugins, as JARs, in my plugins repository
> > and modified my nutch-site.xml file to look for them. To be sure of
> > everything I have performed a complete crawl of a "virgin" source. I
> > have also modified both plugin.xml files so that the system can find
> > the right Java classes.
> >
> > Looking at a result set, everything looks fine: every hit in the set
> > has the metadata scope=aScope, which is exactly what I am looking
> > for. Things stop working though when I try to search for the metadata
> > using the Lucene syntax. The query "aWord scope:aScope" returns
> > nothing...
> >
> > When I check my log files I can see that the query-scope plugin is
> > available:
> > [...]
> > 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository - Scope Query Filter (query-scope)
> > [...]
> > And that the proper extension point is registered:
> > [...]
> > 2008-03-25 16:02:55,015 [http-8080-Processor23] INFO org.apache.nutch.plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> > [...]
> >
> >
> > Here is the plugin.xml file associated with the plugin:
> >
> > <plugin
> >    id="query-scope"
> >    name="a description"
> >    version="1.0.0"
> >    provider-name="myName.xyz">
> >
> >   <runtime>
> >     <library name="query-scope.jar">
> >       <export name="*"/>
> >     </library>
> >   </runtime>
> >
> >   <requires>
> >     <import plugin="nutch-extensionpoints"/>
> >   </requires>
> >
> >   <extension
> >      id="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified"
> >      name="Scope Query Filter"
> >      point="org.apache.nutch.searcher.QueryFilter">
> >     <implementation id="SiteQueryFilterModified"
> >                     class="org.apache.nutch.searcher.site.modified.SiteQueryFilterModified">
> >       <parameter name="raw-fields" value="scope"/>
> >     </implementation>
> >   </extension>
> > </plugin>
> >
> >
> > If somebody has any idea... please let me know! Thank you in advance!
> >
> > David
>
> --
> Brian Ulicny
> bulicny at alum dot mit dot edu
> home: 781-721-5746
> fax: 360-361-5746
