Hasan:

The first place to look is the wiki. There is documentation on plugins there:
http://www.nutch.org/cgi-bin/twiki/view/Main/TheNutchPluginSystem
Though honestly, I feel it's very theoretical: while it gives you a good overview, it does not put you on the path to writing one. Combine that knowledge with the excellent post by Matt Kangas, "Dissecting the Nutch 0.5 Crawler", and you're on your way:
http://www.nutch.org/cgi-bin/twiki/view/Main/DissectingTheNutchCrawler
Look at the part where it talks about the ParseFactory and ProtocolFactory, which explains how the plugins get invoked.

Given that I've just written my first "buggy" plugin, here's a very basic overview:

- All plugins are located in the "plugins" directory as specified in the conf file. NOTE: I've found that at times the plugins directory is not found using a relative path/classpath, so run the programs that test your plugin from NUTCH_HOME.
- Each plugin has an XML file which defines the "extension point", i.e. the function within Nutch where the plugin will get called. Look at the sample XML file to see the other properties.
- Each plugin defines some property (depending on what type of plugin it is) that will be checked for a matching value. Example: for protocol plugins, the "protocolName" value specifies which plugin to call; for http:// URLs, the protocol-http plugin is invoked. (See the second link above for an explanation.)
- The best way to learn is to expand on the "parse-ext" plugin in some way. Once you get the hang of that, you should have no trouble understanding the other plugins that are there.

Feel free to shoot me questions if this is not clear. I'm trying to make it an early night tonight (well, at least I'm trying)!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hasan Diwan
Sent: Friday, January 07, 2005 8:14 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Results Syndication Feed

Doug:

On Tue, 28 Dec 2004 08:35:28 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote:
> A servlet that uses NutchBean should do most of what's required.
> If you need results date-ordered then you'd need an indexing plugin that
> indexes a date for each page, and, finally, a query plugin that causes
> a Lucene Sort to be used. This last part is the hardest, as
> query plugins can currently only modify the query and don't get to
> specify a Sort. So we'll need to revise this API a bit.

I haven't heard back from you regarding my questions, resent below:

1. index-more already handles sorting by date. Can I leverage this instead of writing an indexing plugin?
2. I have no idea how to write a plugin for Nutch. Care to provide me with some pointers?

--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>

-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
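The dispatch described in the plugin overview above (a plugin declares a value such as "protocolName", and the factory picks the plugin whose value matches the URL) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Nutch's ProtocolFactory: the Protocol interface, the ProtocolRegistry class, and the registered implementations are all hypothetical, and in Nutch the attribute values come from each plugin's XML descriptor rather than from register() calls.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a protocol plugin's implementation class.
interface Protocol {
    String fetch(String url);
}

// Hypothetical registry mapping each plugin's declared "protocolName"
// value to its implementation. It mimics the matching idea only; it is
// not Nutch's actual ProtocolFactory code.
class ProtocolRegistry {
    private final Map<String, Protocol> byName = new HashMap<>();

    // In Nutch the protocolName would be read from the plugin's XML
    // descriptor; here it is passed in directly for illustration.
    void register(String protocolName, Protocol impl) {
        byName.put(protocolName, impl);
    }

    // Pick the plugin whose declared protocolName equals the URL's scheme.
    Protocol getProtocol(String url) {
        String scheme = URI.create(url).getScheme();
        Protocol p = byName.get(scheme);
        if (p == null) {
            throw new IllegalArgumentException("no protocol plugin for: " + scheme);
        }
        return p;
    }
}

public class ProtocolDispatchDemo {
    public static void main(String[] args) {
        ProtocolRegistry registry = new ProtocolRegistry();
        // Hypothetical implementations standing in for real plugins.
        registry.register("http", url -> "http plugin fetched " + url);
        registry.register("file", url -> "file plugin fetched " + url);

        String url = "http://www.nutch.org/";
        System.out.println(registry.getProtocol(url).fetch(url));
    }
}
```

Keying the lookup on the declared value is what lets a new protocol be supported by dropping in a plugin that declares a new name, without changing the calling code.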
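Doug's point about date-ordered results comes down to ordering hits by a per-page date instead of by relevance score. The sketch below shows only that ordering, in plain Java over an in-memory list; it does not use Lucene's Sort or any Nutch API, and the Hit class and its fields are hypothetical stand-ins for what a date-indexing plugin would store. Re-sorting in application code like this only reorders the hits already retrieved, which is why the fix Doug describes is letting a query plugin hand a Sort to Lucene so the whole result set is ordered by date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical search hit: a URL, its relevance score, and the date an
// indexing plugin would have stored for the page (epoch milliseconds).
class Hit {
    final String url;
    final float score;
    final long date;

    Hit(String url, float score, long date) {
        this.url = url;
        this.score = score;
        this.date = date;
    }
}

public class DateOrderDemo {
    // Return a new list of the hits ordered newest first, ignoring score.
    // The input list is left unchanged.
    static List<Hit> newestFirst(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Hit h) -> h.date).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("http://a.example/", 0.9f, 1000L));
        hits.add(new Hit("http://b.example/", 0.5f, 3000L));
        hits.add(new Hit("http://c.example/", 0.7f, 2000L));
        for (Hit h : newestFirst(hits)) {
            System.out.println(h.url + " " + h.date);
        }
    }
}
```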
