Hi,
The "type" query filter will only handle the parts of the query that are
specified with "type:". The other parts of the query string will be passed
to the other query filters. By default, the basic query filter searches
content, url, host, title and anchor text.
So, for example, if you have a query string that says "type:movies harry
potter", the "type:movies" will use the "type" query filter you wrote, and
the "harry potter" will probably go to the basic query filter, which will
find matches within content, url, host, anchor text, and title. The results
are just the intersection of the two, which is what you want.
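
To make the split concrete, here is a toy sketch of how a query string
separates into fielded clauses and plain terms. This is NOT Nutch's actual
query parser; the class name QuerySplitSketch and its fields are made up
for illustration, and it only handles whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only (assumption): not Nutch's query parser.
class QuerySplitSketch {
    // Clauses of the form field:value, e.g. type:movies
    final Map<String, String> fieldClauses = new LinkedHashMap<>();
    // Everything else, which would go to the default (basic) filter
    final List<String> defaultTerms = new ArrayList<>();

    static QuerySplitSketch split(String query) {
        QuerySplitSketch result = new QuerySplitSketch();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields one empty token
            }
            int colon = token.indexOf(':');
            // Treat "name:value" as a fielded clause only when both sides are non-empty
            if (colon > 0 && colon < token.length() - 1) {
                result.fieldClauses.put(token.substring(0, colon),
                                        token.substring(colon + 1));
            } else {
                result.defaultTerms.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        QuerySplitSketch r = split("type:movies harry potter");
        System.out.println(r.fieldClauses); // {type=movies}
        System.out.println(r.defaultTerms); // [harry, potter]
    }
}
```

A document has to satisfy both parts to show up in the results, which is
the intersection described above.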
You'll have to edit your search form handler to add "type:movies" or
"type:music" to the query string before it passes it to Nutch, but that's
pretty easy to do.
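
Something along these lines would do for the form handler. It is only a
sketch: TypeQueryBuilder and withType are hypothetical names, not part of
Nutch, and how you read the category from your form and hand the string to
Nutch depends on your own setup.

```java
// Hypothetical helper (assumption): prepends a "type:" clause to the user's query.
class TypeQueryBuilder {

    // Returns e.g. "type:movies harry potter" for ("harry potter", "movies").
    // If no type is given, the user's query is passed through unchanged.
    static String withType(String userQuery, String type) {
        String query = (userQuery == null) ? "" : userQuery.trim();
        if (type == null || type.trim().isEmpty()) {
            return query;
        }
        String clause = "type:" + type.trim();
        return query.isEmpty() ? clause : clause + " " + query;
    }

    public static void main(String[] args) {
        System.out.println(withType("harry potter", "movies")); // type:movies harry potter
        System.out.println(withType("harry potter", null));     // harry potter
    }
}
```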
Howie
Hi Wang,
But I thought that when you include a query plugin for a field called
type, it will search only the content of that field.
So are you asking me to make all the content a subset of this one?
For example, query-url basically searches only the url field of the documents.
So how can this be a solution?
Rgds,
Prabhu
On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
>
> To do what I mentioned, you basically have to write two plugins,
> an IndexFilter plugin and a QueryFilter plugin. I think this page has
> some info on writing plugins:
>
> http://wiki.apache.org/nutch/WritingPlugins
>
> It will probably be easiest if you copy the src/plugins/index-basic
> directory, and just change all the build files and filenames as needed.
> If you look at the BasicIndexingFilter.java file, you'll see that the
> modifications needed aren't bad at all. There are a whole bunch of lines
> that do something like:
>
> doc.add(Field.Text("myfield", "somevalue"));
>
> You should figure out if the url is from a movie page and then
> add your field:
>
> if (isFromMovieSite(url)) {
>   doc.add(Field.Text("type", "movies"));
> } else if (isFromMusicSite(url)) {
>   doc.add(Field.Text("type", "music"));
> } else {
>   // Need to make sure all docs have the field,
>   // otherwise it will crash when you search
>   doc.add(Field.Text("type", "miscellaneous"));
> }
>
> Doing the query filter is even easier: just copy the
> src/plugins/query-site directory, and change filenames and build files
> as needed. Then change the line that says:
>
> super("site");
>
> to:
>
> super("type");
>
> That's pretty much it. You'll have to edit your conf/nutch-*.xml files
> to include your new plugins.
>
>
> >Can you explain what exactly you have in mind?
> >
> >Say that I have fetched sites under a movie category (from a list of
> >websites which I have). How do I add a field to them? I have also
> >fetched sites for songs.
> >How do I specifically add a field to the first set of pages (i.e. the
> >movies) and a separate field to the second (i.e. the songs)?
> >
> >And for field search, how can I search by this field?
> >
> >How will Nutch understand this query:
> >newfield:uniquename
> >
> >I thought you needed to create a query plugin for each field you create
> >(like query-url).
> >
> >I still did not get what you meant. If you can explain it clearly, it
> >will be helpful.
> >
> >Thanks.
> >Raghavendra Prabhu R
>
>
>