I indexed my data using the index-more plugin and added the field I need
(content_type) to schema.xml.
Now, how can I search only PDF files (one of the content types) using this new
index? What query should I enter to restrict a search to PDF files?
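
For illustration, here is a minimal sketch of the kind of request in question,
assuming the content_type field holds the full MIME type (for example
application/pdf) and that Solr answers at the hypothetical URL below. Both are
assumptions, not something confirmed in this thread; the field name and the
stored value depend on your schema.xml and on how index-more is configured.
The filter goes in Solr's fq parameter so it restricts results without
affecting relevance scoring.

```python
from urllib.parse import urlencode

# Assumed values: adjust both to your own Solr install and schema.xml.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"  # hypothetical endpoint
TYPE_FIELD = "content_type"  # field name as given in the message above


def build_type_query(terms, mime_types, rows=10):
    """Build a Solr select URL that searches for `terms` and keeps
    only documents whose type field matches one of `mime_types`."""
    # Quote each MIME type so the slash is matched literally.
    quoted = " OR ".join('"{}"'.format(m) for m in mime_types)
    params = {
        "q": terms,                                   # the user's search terms
        "fq": "{}:({})".format(TYPE_FIELD, quoted),   # filter by file type
        "rows": rows,                                 # number of results
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # PDF files only.
    print(build_type_query("solr", ["application/pdf"]))
    # PDF and Word documents together.
    print(build_type_query("solr", ["application/pdf", "application/msword"]))
```

Typed into the Solr admin query form, the same filter would read
content_type:"application/pdf". If the field stores something else (a bare
extension such as pdf, say), the filter value has to change to match.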


On Thu, Sep 29, 2011 at 9:33 AM, ahmad ajiloo <[email protected]> wrote:

> How can I use the index-more plugin? I'm new to Nutch and need detailed
> help.
> Thanks
>
>
> On Wed, Sep 14, 2011 at 12:54 PM, Markus Jelsma <
> [email protected]> wrote:
>
>> As I just wrote on the Solr list: use the index-more plugin, or copyField the
>> url to an extension field on which you can use a char pattern replace filter
>> to skip everything up to the first dot.
>>
>> > Hello,
>> > I want to search articles via Solr, so I need to find only specific file
>> > types such as doc, docx, and pdf.
>> > I don't need any HTML pages, so the search results should consist only of
>> > doc, docx, and pdf files.
>> >
>> > I'm using Nutch to crawl web pages and send the crawled data to Solr for
>> > indexing. One approach to searching specific file types is to put the
>> > file extension into the index. However, I don't know what schema Nutch
>> > uses when indexing into Solr, whether it creates a specific field for the
>> > file extension, or how we can modify the Nutch indexer to create such a
>> > field ourselves.
>>
>
>
