How about refetching only the movie pages?

During a refetch, what do we do if we want to fetch only the movie pages?

When we generate segments from the db, can we specify that only the URLs
matching a certain query should be picked?

Won't this create difficulties in refetching those pages?

Rgds

Prabhu


On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> The "type" query filter will only handle parts of the query that are
> specified with "type:". The other parts of the query string will be passed
> to the other query filters. By default the basic query filter will search
> for things in content, url, host, title and anchor text.
>
> So, for example, if you have a query string that says "type:movies harry
> potter", the "type:movies" will use the "type" query filter you wrote, and
> the "harry potter" will probably go to the basic query filter, which will
> find matches within content, url, host, anchor text, and title. The results
> are just the intersection of the two, which is what you want.
>
> You'll have to edit your search form handler to add "type:movies" or
> "type:music" to the query string before it passes it to Nutch, but that's
> pretty easy to do.
>
> Howie
>
> >Hi Wang
> >
> >But I thought that when you include a query plugin and you have a field
> >called
> >
> >type:
> >
> >it will search content only in that field.
> >
> >So are you asking me to make all the content a subset of this one?
> >
> >For example, query-url will basically search in the url field of the
> >documents.
> >
> >So how can this be a solution?
> >
> >
> >
> >Rgds
> >Prabhu
> >
> >
> >On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > To do what I mentioned, you basically have to write two plugins,
> > > an IndexFilter plugin and a QueryFilter plugin. I think this page has
> > > some info on writing plugins:
> > >
> > > http://wiki.apache.org/nutch/WritingPlugins
> > >
> > > It will probably be easiest if you copy the src/plugins/index-basic
> > > directory, and just change all the build files and filenames as needed.
> > > If you look at the BasicIndexingFilter.java file, you'll see that the
> > > modifications needed aren't bad at all. There are a whole bunch of lines
> > > that do something like:
> > >
> > >    doc.add(Field.Text("myfield", "somevalue"));
> > >
> > > You should figure out if the url is from a movie page and then
> > > add your field:
> > >
> > >    if (isFromMovieSite(url)) {
> > >        doc.add(Field.Text("type", "movies"));
> > >    } else if (isFromMusicSite(url)) {
> > >        doc.add(Field.Text("type", "music"));
> > >    } else {
> > >        // Need to make sure all docs have the field,
> > >        // otherwise it will crash when you search
> > >        doc.add(Field.Text("type", "miscellaneous"));
> > >    }
> > >
> > > Doing the query filter is even easier, just copy the
> > > src/plugins/query-site
> > > directory, change filenames and build files as needed. And change the
> > > line that says:
> > >
> > >    super("site");
> > >
> > > to:
> > >
> > >    super("type");
> > >
> > > That's pretty much it. You'll have to edit your conf/nutch-*.xml files
> > > to include your new plugins.
> > >
> > >
> > > >Can you explain what exactly you have in mind?
> > > >
> > > >Say that I have fetched sites under the movie category (a list of
> > > >websites which I have) and have also fetched sites for songs. How do I
> > > >add a field to them? How do I specifically add a field to the first set
> > > >of pages (i.e. that of movies) and a separate field to the second
> > > >(i.e. that of songs)?
> > > >
> > > >And field search: how can I search by this field?
> > > >
> > > >How will Nutch understand this query?
> > > >newfield:uniquename
> > > >
> > > >I thought you needed to create a query plugin for each field you create
> > > >(like query-url).
> > > >
> > > >I still did not get what you meant. If you can explain it clearly, it
> > > >will be helpful.
> > > >
> > > >Thanks .
> > > >Raghavendra Prabhu R
> > >
> > >
> > >
>
>
>

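For reference, the two pure-logic pieces described in the quoted thread (deciding which "type" value a URL gets, and adding a "type:" clause to the query string in the search form handler) can be sketched without any Nutch or Lucene dependency. This is only a rough sketch: the class name TypeTagging, the method names typeFor and withType, and the example hosts are all hypothetical and not part of Nutch. The isFromMovieSite/isFromMusicSite checks from the thread are assumed here to be simple host lookups.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch. Names and host lists are assumptions.
final class TypeTagging {
    // Assumed example hosts; replace with your own lists of sites.
    static final Set<String> MOVIE_HOSTS = Set.of("movies.example.com");
    static final Set<String> MUSIC_HOSTS = Set.of("music.example.com");

    // Returns the value to store in the "type" field for a URL.
    // Every URL gets some value, so every document carries the field.
    static String typeFor(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            host = null; // unparsable URL: fall through to the default
        }
        if (host == null) {
            return "miscellaneous";
        }
        host = host.toLowerCase(Locale.ROOT);
        if (MOVIE_HOSTS.contains(host)) {
            return "movies";
        }
        if (MUSIC_HOSTS.contains(host)) {
            return "music";
        }
        return "miscellaneous";
    }

    // Prepends a "type:" clause to the user's query before it is passed on
    // to the search backend, as the search form handler would do.
    static String withType(String type, String userQuery) {
        return "type:" + type + " " + userQuery.trim();
    }
}
```

With these assumptions, typeFor would supply the value passed to the "type" field in the indexing filter, and withType("movies", "harry potter") would build the "type:movies harry potter" query string from the example in the thread.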