How about refetching only the movie pages?

During a refetch, if we want to fetch only movie pages, what do we do?
When we generate segments from the db, can we specify that only the URLs
matching a certain query should be picked?

Won't this create difficulties in refetching those pages?

Rgds
Prabhu

On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> The "type" query filter will only handle parts of the query that are
> specified with "type:". The other parts of the query string will be passed
> to the other query filters. By default the basic query filter will search
> for things in content, url, host, title and anchor text.
>
> So, for example, if you have a query string that says "type:movies harry
> potter", the "type:movies" will use the "type" query filter you wrote, and
> the "harry potter" will probably go to the basic query filter, which will
> find matches within content, url, host, anchor text, and title. The results
> are just the intersection of the two, which is what you want.
>
> You'll have to edit your search form handler to add "type:movies" or
> "type:music" to the query string before it passes it to Nutch, but that's
> pretty easy to do.
>
> Howie
>
> > Hi Wang
> >
> > But I thought when you include a query plugin and you have a field called
> >
> > type:
> >
> > it will search content only in that field.
> >
> > So are you asking me to make all the content a subset of this one?
> >
> > For example, query-url will basically search in the url field in the
> > documents.
> >
> > So how can this be a solution?
> >
> > Rgds
> > Prabhu
> >
> > On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > To do what I mentioned, you basically have to write two plugins,
> > > an IndexFilter plugin and a QueryFilter plugin. I think this page has
> > > some info on writing plugins:
> > >
> > > http://wiki.apache.org/nutch/WritingPlugins
> > >
> > > It will probably be easiest if you copy the src/plugins/index-basic
> > > directory, and just change all the build files and filenames as needed.
> > > If you look at the BasicIndexingFilter.java file, you'll see that the
> > > modifications needed aren't bad at all. There are a whole bunch of
> > > lines that do something like:
> > >
> > >   doc.add(Field.Text("myfield", "somevalue"));
> > >
> > > You should figure out if the url is from a movie page and then
> > > add your field:
> > >
> > >   if (isFromMovieSite(url)) {
> > >     doc.add(Field.Text("type", "movies"));
> > >   } else if (isFromMusicSite(url)) {
> > >     doc.add(Field.Text("type", "music"));
> > >   } else {
> > >     // Need to make sure all docs have the field,
> > >     // otherwise it will crash when you search
> > >     doc.add(Field.Text("type", "miscellaneous"));
> > >   }
> > >
> > > Doing the query filter is even easier: just copy the
> > > src/plugins/query-site directory, change filenames and build files
> > > as needed. And change the line that says:
> > >
> > >   super("site");
> > >
> > > to:
> > >
> > >   super("type");
> > >
> > > That's pretty much it. You'll have to edit your conf/nutch-*.xml files
> > > to include your new plugins.
> > >
> > > > Can you explain what exactly you have in mind?
> > > >
> > > > Say that I have fetched sites under the movie category (a list of
> > > > websites which I have) and have also fetched sites for songs.
> > > > How do I specifically add a field to the first set of pages (i.e. that
> > > > of movies) and a separate field to the second (i.e. that of songs)?
> > > >
> > > > And field search: how can I search by this field?
> > > >
> > > > How will Nutch understand this query:
> > > > newfield:uniquename
> > > >
> > > > I thought you needed to create a query plugin for each field you
> > > > create (like query-url).
> > > >
> > > > I still did not get what you meant. If you can explain it clearly, it
> > > > will be helpful.
> > > >
> > > > Thanks.
> > > > Raghavendra Prabhu R
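
For reference, the two pieces of logic described in the thread (deciding which "type" value a page gets, and adding the "type:" clause to the query string in the search form handler) can be sketched in plain Java with no Nutch or Lucene classes. This is only a rough sketch under assumptions: the class name TypeTagSketch, the method names, and the example.com host lists are all made up for illustration, and isFromMovieSite/isFromMusicSite from the quoted code are approximated here by a simple host lookup.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

final class TypeTagSketch {
    // Hypothetical host lists; substitute your own lists of movie and music sites.
    private static final Set<String> MOVIE_HOSTS = Set.of("movies.example.com");
    private static final Set<String> MUSIC_HOSTS = Set.of("music.example.com");

    /** Returns the value for the "type" field; never null, so every document gets one. */
    static String typeForUrl(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            host = null; // unparsable URL falls through to the default below
        }
        if (host == null) {
            return "miscellaneous";
        }
        host = host.toLowerCase(Locale.ROOT);
        if (MOVIE_HOSTS.contains(host)) {
            return "movies";
        }
        if (MUSIC_HOSTS.contains(host)) {
            return "music";
        }
        return "miscellaneous";
    }

    /** Builds the query string a search form handler would pass on, e.g. "type:movies harry potter". */
    static String withTypeClause(String type, String userQuery) {
        return "type:" + type + " " + userQuery.trim();
    }

    public static void main(String[] args) {
        System.out.println(typeForUrl("http://movies.example.com/title/1"));
        System.out.println(withTypeClause("movies", "harry potter"));
    }
}
```

In an indexing filter, the result of typeForUrl(url) would be the value passed to the doc.add(Field.Text("type", ...)) call shown in the quoted message. The "miscellaneous" default is there so that no document is left without the field.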
