Someone correct me if I'm wrong, but I believe you can run
two different crawls using two different webdbs, then
copy the indexed segments into one central directory where
the searcher can pick them up as a single index.
If your crawling habits are different for the movies and music
stuff, it might be easier to do it this way. If you have overlap
between the two sites and want to use the same webdb, you
might have to do what you're saying, but I think you'll have to
write your own utility to set the fetch times the way you want.
Howie
How about refetching only the movie pages?
During a refetch, if we want to fetch only the movie pages,
what do we do?
When we generate segments from the db,
can we tell it to pick only the urls matching a certain query?
Won't this create difficulties in refetching those pages?
Rgds
Prabhu
On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> The "type" query filter will only handle parts of the query that are
> specified with "type:". The other parts of the query string will be
> passed to the other query filters. By default the basic query filter
> will search for things in content, url, host, title and anchor text.
>
> So, for example, if you have a query string that says
> "type:movies harry potter", the "type:movies" will use the "type"
> query filter you wrote, and the "harry potter" will probably go to
> the basic query filter, which will find matches within content, url,
> host, anchor text, and title. The results are just the intersection
> of the two, which is what you want.
>
> You'll have to edit your search form handler to add "type:movies" or
> "type:music" to the query string before it passes it to Nutch, but
> that's pretty easy to do.
>
> Howie
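
To make the form handler step concrete, here is a minimal sketch of
prepending the "type:" clause before the query string goes to Nutch.
The class name TypedQuery and the method withType are made up for
illustration and are not part of Nutch; it is plain string handling,
and how you hand the result to the searcher depends on your own
handler.

```java
// Hypothetical helper for a search form handler; not part of Nutch.
final class TypedQuery {

    // Prepends a "type:" clause unless the user already supplied one.
    static String withType(String userQuery, String type) {
        String q = userQuery == null ? "" : userQuery.trim();
        if (q.startsWith("type:") || q.contains(" type:")) {
            return q; // respect an explicit type: clause typed by the user
        }
        return q.isEmpty() ? "type:" + type : "type:" + type + " " + q;
    }

    public static void main(String[] args) {
        // prints "type:movies harry potter"
        System.out.println(withType("harry potter", "movies"));
    }
}
```

The movies search form would call withType(query, "movies") and the
music form withType(query, "music"), so the user never has to type
the field name.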
>
> >Hi Wang
> >
> >But I thought that when you include a query plugin for a field
> >called "type:", it will search content only in that field.
> >
> >So are you asking me to make all the content a subset of this one?
> >
> >For example, query-url will basically search only in the url field
> >of the documents.
> >
> >So how can this be a solution?
> >
> >
> >
> >Rgds
> >Prabhu
> >
> >
> >On 1/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > To do what I mentioned, you basically have to write two plugins,
> > > an IndexFilter plugin and a QueryFilter plugin. I think this page
> > > has some info on writing plugins:
> > >
> > > http://wiki.apache.org/nutch/WritingPlugins
> > >
> > > It will probably be easiest if you copy the src/plugins/index-basic
> > > directory, and just change all the build files and filenames as
> > > needed. If you look at the BasicIndexingFilter.java file, you'll see
> > > that the modifications needed aren't bad at all. There are a whole
> > > bunch of lines that do something like:
> > >
> > > doc.add(Field.Text("myfield", "somevalue"));
> > >
> > > You should figure out if the url is from a movie page and then
> > > add your field:
> > >
> > > if (isFromMovieSite(url)) {
> > >     doc.add(Field.Text("type", "movies"));
> > > } else if (isFromMusicSite(url)) {
> > >     doc.add(Field.Text("type", "music"));
> > > } else {
> > >     // Need to make sure all docs have the field,
> > >     // otherwise it will crash when you search
> > >     doc.add(Field.Text("type", "miscellaneous"));
> > > }
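
The isFromMovieSite and isFromMusicSite checks above are left for you
to write. One possible sketch, assuming you can tell the two kinds of
site apart by host name: the class name SiteTypes and the example.com
hosts are placeholders, and you would substitute your own list of
sites. It uses only java.net.URI, nothing from Nutch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the host lists are made-up placeholders.
final class SiteTypes {
    private static final Set<String> MOVIE_HOSTS =
        new HashSet<>(Arrays.asList("movies.example.com"));
    private static final Set<String> MUSIC_HOSTS =
        new HashSet<>(Arrays.asList("music.example.com"));

    // Returns the lower-cased host of the url, or "" if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return "";
        }
    }

    static boolean isFromMovieSite(String url) {
        return MOVIE_HOSTS.contains(hostOf(url));
    }

    static boolean isFromMusicSite(String url) {
        return MUSIC_HOSTS.contains(hostOf(url));
    }

    // Every url maps to some value, so every document gets the field.
    static String typeFor(String url) {
        if (isFromMovieSite(url)) return "movies";
        if (isFromMusicSite(url)) return "music";
        return "miscellaneous";
    }
}
```

With something like typeFor(url) the if/else chain collapses to a
single doc.add call, and unparseable urls fall through to
"miscellaneous" instead of leaving the field out.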
> > >
> > > Doing the query filter is even easier: just copy the
> > > src/plugins/query-site directory, and change filenames and build
> > > files as needed. Then change the line that says:
> > >
> > > super("site");
> > >
> > > to:
> > >
> > > super("type");
> > >
> > > That's pretty much it. You'll have to edit your conf/nutch-*.xml
> > > files to include your new plugins.
> > >
> > >
> > > > >Can you explain what exactly you have in mind?
> > > > >
> > > > >Say that I have fetched sites under the movie category (from a
> > > > >list of websites which I have) and have also fetched sites for
> > > > >songs. How do I specifically add a field to the first set of
> > > > >pages (i.e. the movies) and a separate field to the second
> > > > >(i.e. the songs)?
> > > > >
> > > > >And as for field search, how can I search by this field?
> > > > >
> > > > >How will Nutch understand this query:
> > > > >newfield:uniquename
> > > > >
> > > > >I thought you needed to create a query plugin for each field you
> > > > >create (like query-url).
> > > > >
> > > > >I still did not get what you meant. If you can explain it
> > > > >clearly, it will be helpful.
> > > > >
> > > > >Thanks.
> > > > >Raghavendra Prabhu R
> > >
> > >
> > >
>
>
>
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general