Hasan:

The first place to look is the wiki. There is documentation on plugins
there.
http://www.nutch.org/cgi-bin/twiki/view/Main/TheNutchPluginSystem

Though honestly, I feel it's very theoretical, and while it gives you a good
overview, it does not put you on the path to writing one.

Combine that knowledge with the excellent post by Matt Kangas "Dissecting
the Nutch 0.5 Crawler" and you're on your way.
http://www.nutch.org/cgi-bin/twiki/view/Main/DissectingTheNutchCrawler

Look at the part where it talks about the ParseFactory and ProtocolFactory;
it explains how the plugins get invoked.


Given that I've just written my first "buggy" plugin, here's a very basic
overview:

- All plugins are located in the "plugins" directory as specified in the
conf file. NOTE: I've found that at times the plugins directory is not found
via a relative path/classpath, so please run the programs that test your
plugin from NUTCH_HOME.

- Each plugin has an XML file which defines the "extension point" -- i.e. the
function within Nutch where the plugin will get called. Look at the sample
XML file to see the other properties.

- Each plugin defines some property (which one depends on the type of
plugin) that will be checked for a matching value.
Example: for a protocol plugin, the "protocolName" value specifies which
plugin to call -- for http:// the protocol-http plugin is invoked (look at
the second link above for an explanation).

- The best way to learn is to expand on the "parse-ext" plugin in some way
-- once you get the hang of that, you should have no trouble understanding
the other plugins that are there.
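
To make the "protocolName" matching in the third point a bit more concrete,
here is a minimal Java sketch of the general idea. This is NOT Nutch's actual
code: PluginLookupSketch, ProtocolHandler, REGISTRY and handlerFor are names
I made up for illustration, and the real lookup goes through the plugin
descriptors as explained in the second link. It only shows the pattern of
picking an implementation by matching a declared value against the URL scheme.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only: none of these names are Nutch classes.
class PluginLookupSketch {

    /** Stand-in for an extension point: the interface a plugin implements. */
    interface ProtocolHandler {
        String fetch(String url);
    }

    /** Stand-in for a plugin registry, keyed by each plugin's declared protocol name. */
    static final Map<String, ProtocolHandler> REGISTRY = new HashMap<>();

    static {
        // In a real system these entries would come from plugin descriptor files.
        REGISTRY.put("http", url -> "fetched over http: " + url);
        REGISTRY.put("ftp", url -> "fetched over ftp: " + url);
    }

    /** Picks the handler whose declared protocol name matches the URL's scheme. */
    static ProtocolHandler handlerFor(String url) {
        String scheme = URI.create(url).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("no scheme in URL: " + url);
        }
        ProtocolHandler handler = REGISTRY.get(scheme.toLowerCase(Locale.ROOT));
        if (handler == null) {
            throw new IllegalArgumentException("no plugin registered for protocol: " + scheme);
        }
        return handler;
    }

    public static void main(String[] args) {
        String url = "http://www.nutch.org/";
        // The "http" entry is selected because the URL's scheme is "http".
        System.out.println(handlerFor(url).fetch(url));
    }
}
```

The lookup is just a map from the declared value to an implementation, which
is why a mismatch between that value and what the caller asks for tends to
show up as "no plugin found" rather than as a compile error.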


Feel free to shoot me questions if this is not clear. I'm trying to make it
an early night tonight (well, at least I'm trying)!


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Hasan Diwan
Sent: Friday, January 07, 2005 8:14 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Results Syndication Feed

Doug:


On Tue, 28 Dec 2004 08:35:28 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote:
> A servlet that uses NutchBean should do most of what's required.  If 
> you need results date-ordered then you'd need an indexing plugin that 
> indexes a date for each page, and, finally, a query plugin that causes 
> a Lucene Sort to be used.  This last part is the hardest, as 
> query plugins can currently only modify the query and don't get to 
> specify a Sort.  So we'll need to revise this API a bit.

I haven't heard back from you regarding my questions, resent below:
1. index-more already handles sorting by date. Can I leverage this instead
of writing an indexing plugin?
2. I have no idea how to write a plugin for nutch, care to provide me with
some pointers?
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers



