prashant_nutch wrote:
> Is subcollection useful for searching specific URLs?
> How do we activate subcollection at indexing and search time?
>
> In conf/subcollection, if we include our URLs in the whitelist,
> is the search then limited to those URLs?
> What is the command for searching a subcollection? Something like:
>
> subcollection:<name of subcollection> <word for specific URL>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <subcollections>
>   <subcollection>
>     <name>nutch</name>
>     <id>nutch</id>
>     <whitelist>
>       http://lucene.apache.org/nutch/
>       http://wiki.apache.org/nutch/
>     </whitelist>
>     <blacklist />
>   </subcollection>
> </subcollections>
>
> Can anybody explain how the whole thing should work?
> Is it useful for searching specific URLs? (We are using Nutch 0.8.1.)
>
>   
Subcollection is a useful way to group a set of URLs under a label. You
can then use that label to limit searches to those URLs.

First, enable the subcollection plugin in the nutch-site.xml file.
Then define your collections in the conf/subcollections.xml file.
After indexing, documents whose URLs match a collection should have the
subcollection field in the index.
The plugin also includes a query plugin, so you can then run searches
like

      java subcollection:nutch

which searches for "java" only within the nutch collection. For details,
see the README file in the plugin's directory.
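
As a rough sketch of the first step: plugins are normally switched on
through the plugin.includes property, so enabling subcollection would
mean adding it to that list in conf/nutch-site.xml. The other plugin
names below are only an assumed example list, not taken from your setup;
keep whatever your installation already lists and append subcollection.

      <property>
        <name>plugin.includes</name>
        <!-- assumed example list; only "subcollection" is the addition -->
        <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|subcollection</value>
      </property>

Since the field is added at indexing time, pages indexed before the
plugin was enabled would presumably need to be indexed again before
subcollection: queries return them.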





