Hello all.

This time following up on my own post...

>>> When you look at the protocol-smb plugin, it comes with this static 
>>> hook, but as the hook is never executed it does not help.
>>
>>Yes, it has to be called.
>
>So when would Nutch call this static hook? In practice this does not happen 
>before the plugin is required, but then it is too late as the 
>MalformedURLException is thrown already.
>And this approach cannot cover the classpath issue.

It seems Nutch would never call this static hook. That is why I patched the 
PluginRepository class.
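
To illustrate the timing problem, here is a minimal sketch in plain Java. It is
not the Nutch or protocol-smb code; the names FooProtocolDemo, DummyHandler,
registerFoo and canParse are made up for this example, and foo:// stands in for
smb://. It only shows that java.net.URL rejects an unknown scheme until some
handler has been registered, so a hook that runs after the first URL is parsed
comes too late.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical demo class, not part of Nutch or protocol-smb.
class FooProtocolDemo {

    private static boolean registered = false;

    /** Returns true if java.net.URL accepts the given URL string. */
    static boolean canParse(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for "unknown protocol" when no handler is known.
            return false;
        }
    }

    /** Dummy handler: enough to make URL parsing work, nothing more. */
    static class DummyHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("openConnection is not implemented in this sketch");
        }
    }

    /**
     * Plays the role of the "static hook". The JVM accepts only one factory
     * per process, so the call is guarded to make it safe to repeat.
     */
    static synchronized void registerFoo() {
        if (registered) {
            return;
        }
        // Returning null for other schemes lets the JVM fall back to its
        // built-in handlers (http, file, ...).
        URL.setURLStreamHandlerFactory(
                protocol -> "foo".equals(protocol) ? new DummyHandler() : null);
        registered = true;
    }

    public static void main(String[] args) {
        // Before the hook has run, the scheme is unknown.
        System.out.println("before: " + canParse("foo://bar/baz"));
        registerFoo();
        // After the hook has run, the same URL parses.
        System.out.println("after:  " + canParse("foo://bar/baz"));
    }
}
```

So whatever registers the handler has to run before the first smb:// URL is
parsed anywhere in the process, which is presumably why the plugin system
itself has to trigger it.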

>>> - create a tutorial to add some arbitrary protocol (e.g. the  
>>> foo://bar/baz url)
>>> - modify the protocol-smb plugin to make use of the smbclient binary.
>>>
>>> I'd be willing to do the latter but would like to see a less clumsy 
>>> behaviour for plugins.
>>
>>Great! Nutch could not exist without voluntary work. Thanks!
>>
>>Sorry, that integration will not be that easy. The problem has indeed been 
>>known for a long time and should have been better tested, see also [1] and [2] - 
>>the class 
>>org.apache.nutch.protocol.sftp.Handler (a dummy handler) has been lost, 
>>you'll find it in the zip file attached to NUTCH-714.
>>
>>However, encapsulation and lazy instantiation I would not call "clumsy 
>>behavior", it's useful for heavy-weight plugins (e.g., parse-tika which 
>>brings 50 MB dependencies).
>
>Both concepts, encapsulation and lazy instantiation, are great. What I call 
>clumsy is that the encapsulation does not work. Look at it from the 
>perspective of a user of the protocol-smb plugin.
>It comes as a (set of) jars, together with an XML descriptor. This could be 
>nicely wrapped in a zip file and thus is one artifact that can easily be 
>versioned and distributed.
>
>But as soon as I want to install it, I have to
>1 - put the artifact into the plugins directory
>2 - modify Nutch configuration files to allow smb:// urls and add the 
>plugin to the list of loaded plugins
>3 - extract jcifs.jar and place it on the system classpath
>4 - run nutch with the correct system property
>
>While items 1 and 2 can be understood easily and maybe one day come with a 
>nice management interface, items 3 and 4 require knowledge about the internals 
>of the plugin. 
>Where did the encapsulation go? This is where I'd like to improve, and I have 
>an idea how that could be established. Need to test it though.

I have a solution that makes steps 3 and 4 obsolete.

>I would need the first to test modifications to the plugin system.
>Then with the second I would create a smb plugin that would suffer other 
>limitations than the LGPL. ;-)

So here is the solution to the first step: the modified plugin system. It is 
available in the commit below; I am not sure yet how to create the pull request.
https://github.com/HiranChaudhuri/nutch/commit/dc9cbeb3da7ca021e2cce322482d2eaa1ec15b28

Next will be one example plugin and the mentioned protocol-smb.

Hiran
