Hello all. This time following up on my own post...
>>> When you look at the protocol-smb hook it comes with this static
>>> hook, but as it is never executed does not help.
>>
>>Yes, it has to be called.
>
>So when would Nutch call this static hook? In practice this does not happen
>before the plugin is required, but then it is too late as the
>MalformedURLException is thrown already.
>And this approach cannot cover the classpath issue.

It seems Nutch would never call this static hook. That is why I patched the
PluginRepository class.

>>> - create a tutorial to add some arbitrary protocol (e.g. the
>>> foo://bar/baz url)
>>> - modify the protocol-smb plugin to make use of the smbclient binary.
>>>
>>> I'd be willing to do the latter but would like to see a less clumsy
>>> behaviour for plugins.
>>
>>Great! Nutch could not exist without voluntary work. Thanks!
>>
>>Sorry, that integration will not be that easy. The problem was indeed already
>>known since long and should have been better tested, see also [1] and [2] -
>>the class
>>org.apache.nutch.protocol.sftp.Handler (a dummy handler) has been lost,
>>you'll find it in the zip file attached to NUTCH-714.
>>
>>However, encapsulation and lazy instantiation I would not call "clumsy
>>behavior", it's useful for heavy-weight plugins (e.g., parse-tika which
>>brings 50 MB dependencies).
>
>Both concepts, encapsulation and lazy instantiation are great. What I call
>clumsy is that the encapsulation does not work. Look at it from a user
>perspective of the protocol-smb plugin.
>It comes as a (set of) jars, together with an XML descriptor. This could be
>nicely wrapped in a zip file and thus is one artifact that can easily be
>versioned and distributed.
>
>But as soon as I want to install it, I have to
>1 - put the artifact into the plugins directory
>2 - modify Nutch configuration files to allow smb:// urls plus include the
>plugin to the loaded list
>3 - extract jcifs.jar and place it on the system classpath
>4 - run nutch with the correct system property
>
>While items 1 and 2 can be understood easily and maybe one day come with a
>nice management interface, items 3 and 4 require knowledge about the internals
>of the plugin.
>Where did the encapsulation go? This is where I'd like to improve, and I have
>an idea how that could be established. Need to test it though.

I have a solution that makes steps 3 and 4 obsolete.

>I would need the first to test modifications to the plugin system.
>Then with the second I would create a smb plugin that would suffer other
>limitations than the LGPL. ;-)

So here is the solution to the first step - the modified plugin system. It is
available here, however I am not sure how to create the pull request...

https://github.com/HiranChaudhuri/nutch/commit/dc9cbeb3da7ca021e2cce322482d2eaa1ec15b28

Next will be one example plugin and the mentioned protocol-smb.

Hiran
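
For readers following the thread, here is a minimal sketch of the timing
problem discussed above: the JVM has to know a handler for a protocol before
the first URL of that protocol is constructed, otherwise
MalformedURLException is thrown. It uses only plain JDK classes and the
foo://bar/baz example URL from the quoted text. The class name
ProtocolHookSketch, the DummyHandler and the registerFooProtocol() method are
hypothetical and only for illustration. This is not the code of the
protocol-smb hook and not the PluginRepository patch from the commit linked
above.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class ProtocolHookSketch {

    // Hypothetical dummy handler: it only lets "foo:" URLs be parsed,
    // it does not implement any real connection.
    public static class DummyHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("dummy handler cannot connect to " + u);
        }
    }

    // URL.setURLStreamHandlerFactory may only be called once per JVM,
    // so remember whether this sketch has already done it.
    private static boolean registered = false;

    // Stand-in for "the hook that has to be called" (an assumption, see above).
    public static synchronized void registerFooProtocol() {
        if (registered) {
            return;
        }
        URL.setURLStreamHandlerFactory(
                protocol -> "foo".equals(protocol) ? new DummyHandler() : null);
        registered = true;
    }

    // True if the JVM can construct a URL from the given string right now.
    public static boolean canParse(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean before = canParse("foo://bar/baz"); // no handler known yet
        registerFooProtocol();                      // the hook runs
        boolean after = canParse("foo://bar/baz");  // handler is known now
        System.out.println("before=" + before + ", after=" + after);
    }
}
```

On a stock JVM without any extra protocol handlers this should report the URL
as unparseable before the registration and parseable afterwards. Whatever
creates the URL first decides the outcome, which is why a hook that only runs
once the plugin is required comes too late.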