I guess you are talking about the language-identifier plugin. Strange as it
may seem, even that one is not working.
I tried using Nutch 0.8 in order to be on the same page, but I am still
facing the problem. I see a field called 'lang' in the index, but cannot
query on it.
So far, the only field I have been able to query on is 'url'.
Here are the steps I followed to use the language-identifier plugin:
* I enabled the plugin in conf/nutch-site.xml through the plugin.includes property:
<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(html|xml|text|js|pdf)|index-(basic)|query-(basic|site|url|more)|scoring-opic|language-identifier</value>
</property>
* I run the crawl using bin/nutch crawl.
* I verify that the field has been added by using Luke and see that all
documents have the 'lang' field in them.
* I run a query on the url field and verify that there are results.
* I run a query on the 'lang' field. No results. :(
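
One thing that can be checked independently of the index is whether a given plugin id is matched by the plugin.includes value above. This is a minimal sketch, assuming Nutch treats plugin.includes as a regular expression that must match the whole plugin id. The class name PluginIncludesCheck and the id query-author are only illustrative and not part of Nutch:

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {
    // Value copied from the plugin.includes property shown above.
    static final Pattern PLUGIN_INCLUDES = Pattern.compile(
        "nutch-extensionpoints|protocol-http|urlfilter-regex"
        + "|parse-(html|xml|text|js|pdf)|index-(basic)"
        + "|query-(basic|site|url|more)|scoring-opic|language-identifier");

    // Assumption: a plugin is loaded only if its whole id matches the pattern.
    static boolean isIncluded(String pluginId) {
        return PLUGIN_INCLUDES.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // "query-author" is a hypothetical id for a custom query plugin.
        String[] ids = {"language-identifier", "query-basic", "index-more", "query-author"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

With this value, language-identifier and query-basic match, while an id that is not listed (such as the hypothetical query-author) does not, so a custom query plugin would have to be added to the property explicitly.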
Thanks for helping out.
Gautham.
Milan Krendzelak wrote:
>
> Hi Gautham,
>
> I am using Nutch 0.8 and implemented a new searchable field by following
> the query-lang plugin.
> Try doing the same as query-lang, say, just for testing.
> Also, don't forget to create a new plugin.xml and define the fields parameter.
> It works for me, and I think it should work for you too.
>
> BasicQueryFilter is used to query the index on different fields, but with
> the same term,
> for example +(url:java anchor:java content:java title:java ...)
> In your case, as I understand it, you want to query the index with different
> terms, like: +author:Gautham +title:Nutch +description:Java
> In this case you have to build your own query and then pass it as a
> parameter to the search function (for example in NutchBean).
>
> Actually, you are right about the tutorial and the documentation.
> Compared to other Apache products, Nutch is really poorly documented.
> Thank God we have this mailing list, otherwise I would be lost :-)
>
> Regards,
> M
>
> Milan Krendzelak
> Senior Software Developer
>
> ________________________________
>
> From: Gautham Pai [mailto:[EMAIL PROTECTED]]
> Sent: Wed 10/10/2007 16:24
> To: [email protected]
> Subject: Re: Custom field query
>
>
>
>
> Still no luck. I am not able to search on a single field, let alone
> multiple fields per class.
>
> I tried debugging the code and this is what I found:
>
> * I see the field listed in the FIELD_NAMES HashSet in QueryFilters.java.
> * LuceneQueryOptimizer's optimize method calls searcher.search, and this
>   returns no TopDocs in the case of author. If I do a search on "url",
>   it works fine and I see results.
> * I tried changing the boost value. No effect.
>
> The fields that I am searching on are not tokenized. I don't have any
> analyzers defined. Is this a problem?
>
> What else could be wrong?
>
> Could this be a problem with Lucene or am I missing some configuration?
>
> Thanks,
> Gautham
>
> Sagar Naik-2 wrote:
>>
>> Hey,
>> Please see the answers to the questions below.
>> Gautham Pai wrote:
>>> I have seen this question asked multiple times in this forum. However,
>>> that has confused me more, because each answer takes its own approach to
>>> solving the issue and no one has outlined the steps in one place. The
>>> tutorials seem to be a bit outdated too.
>>>
>>> The version of Nutch I am using is 0.9.
>>>
>>> I have 3 custom fields that I have added via an IndexingFilter. The
>>> fields are: author, title and description. I now intend to provide
>>> support for querying these fields as:
>>> author:Gautham
>>> title:Nutch
>>> etc.
>>>
>>> I added an Author class as follows:
>>>
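>>> package query;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.nutch.searcher.RawFieldQueryFilter;
>>>
>>> // Query filter for the "author" field, with a boost of 5.
>>> public class Author extends RawFieldQueryFilter {
>>>   private Configuration conf;
>>>
>>>   public Author() {
>>>     super("author", 5f);
>>>   }
>>>
>>>   public void setConf(Configuration conf) {
>>>     this.conf = conf;
>>>   }
>>>
>>>   public Configuration getConf() {
>>>     return this.conf;
>>>   }
>>> }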
>>>
>>> and made an entry in plugin.xml as:
>>>
>>> <extension id="query.Author"
>>>            name="Author"
>>>            point="org.apache.nutch.searcher.QueryFilter">
>>>   <implementation id="Author"
>>>                   class="query.Author">
>>>     <parameter name="fields" value="author"/>
>>>   </implementation>
>>> </extension>
>>>
>>> When I use NutchBean to perform the query, I see no results. I also
>>> tried changing the RawFieldQueryFilter to QueryFilter and following the
>>> approach used in the query-more plugin. It does not seem to work either.
>>>
>>> The questions I have specifically are:
>>> * Do I need to create one class per custom field that I intend to
>>>   support in queries?
>>>
>> Generally, one class for all the custom fields is sufficient. In your
>> case too, you should be able to make do with one class.
>>> * Should I use RawFieldQueryFilter or QueryFilter?
>>>
>> RawFieldQueryFilter implements QueryFilter, so I would use
>> RawFieldQueryFilter.
>>> * Should I make an entry as: <parameter name="fields" value="author"/>
>>>   or <parameter name="fields" value="DEFAULT"/> in plugin.xml?
>>>
>>>
>> In your case,
>>
>> <parameter name="fields" value="author, title, description"/>
>>
>> should solve the problem.
>> Check out the constructor of the org.apache.nutch.searcher.QueryFilters
>> class.
>>
>>> Any help or pointers are greatly appreciated.
>>>
>>> Thanks,
>>> Gautham.
>>>
>
> --
> View this message in context:
> http://www.nabble.com/Custom-field-query-tf4596454.html#a13138143
> Sent from the Nutch - User mailing list archive at Nabble.com.
--
View this message in context:
http://www.nabble.com/Custom-field-query-tf4596454.html#a13144511
Sent from the Nutch - User mailing list archive at Nabble.com.