Hi,

Has anyone been able to use SFTP with Nutch 2.0?

* I have enabled the out-of-the-box SFTP plugin (protocol-sftp) through the 
plugin.includes property in nutch-site.xml
* I have added the appropriate line to prefix-urlfilter.txt
* I configured Nutch to accept everything in regex-urlfilter.txt
* I am trying to inject a single sftp:// URL into a clean HBase / Nutch / Solr 
setup

Otherwise my setup appears to work properly: I am able to inject, generate, 
fetch and parse a sample of 1,000 URLs from the DMOZ Open Directory (similar 
to the Nutch 1.x tutorial).

Here is the output of the inject command:

InjectorJob: starting
InjectorJob: urlDir: ***censored***
Skipping sftp://***censored***/:java.net.MalformedURLException: unknown protocol: sftp
InjectorJob: finished
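
For what it's worth, the exception text matches what plain java.net.URL produces for a scheme that has no registered stream handler. Below is a minimal sketch, independent of Nutch, assuming a stock JVM with no extra URL stream handlers installed. The class name SftpUrlCheck, the helper tryParse and the example.com host are placeholders made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SftpUrlCheck {

    /**
     * Tries to construct a java.net.URL from the given string.
     * Returns null if the URL is accepted, otherwise the exception message.
     */
    public static String tryParse(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // http has a built-in handler in the JDK, so this one is accepted.
        System.out.println("http: " + tryParse("http://example.com/"));
        // sftp has no built-in handler, so the constructor is expected to fail.
        System.out.println("sftp: " + tryParse("sftp://example.com/"));
    }
}
```

If this is what is happening during injection, the URL would be rejected while it is being parsed, before the protocol-sftp plugin is ever asked to fetch anything. I have not confirmed where in InjectorJob that parsing takes place.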

Here is the relevant snippet from the log file, with logging at TRACE level:

2012-09-27 11:21:50,874 DEBUG plugin.PluginRepository - parsing: /home/totha/development/apache-nutch-2.0/plugins/protocol-sftp/plugin.xml
2012-09-27 11:21:50,875 DEBUG plugin.PluginRepository - plugin: id=protocol-sftp name=Sftp Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2012-09-27 11:21:50,875 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.sftp.Sftp
...
2012-09-27 11:21:50,880 INFO  plugin.PluginRepository - Registered Plugins:
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Sftp Protocol Plug-in (protocol-sftp)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2012-09-27 11:21:50,881 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)

Thanks.