While running the example protocol-foo plugin (which I am currently writing),
I got through the injector and generator phases, but the fetch phase appears
to fail.
The log below suggests that the fetcher tries to parse URLs before the
PluginRepository is initialized: the MalformedURLException for
foo://example.com is logged at 08:13:06,845, while the plugins are only
registered at 08:13:07,508. Such behaviour would of course render protocol
plugins useless...
So yes, the whole construct still needs to be tested carefully.
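The exception itself can be reproduced outside Nutch, which may help to narrow things down. Below is a minimal sketch using only the JDK. The class FooUrlDemo and the handler FooHandler are hypothetical names made up for this illustration and are not part of Nutch or of my plugin. It only shows that java.net.URL rejects a scheme for which no URLStreamHandler is known at construction time, and accepts the same string once a handler is supplied. It says nothing about how or when Nutch is supposed to register its handlers.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class FooUrlDemo {

    /** Returns true if java.net.URL accepts the string with the default handlers. */
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // e.g. "unknown protocol: foo"
            return false;
        }
    }

    /** Hypothetical handler: enough to make the scheme parseable, nothing more. */
    static class FooHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            throw new UnsupportedOperationException("demo only, cannot connect");
        }
    }

    /** Returns true if the string parses when a handler is passed in explicitly. */
    static boolean parsesWithHandler(String spec) {
        try {
            new URL(null, spec, new FooHandler());
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("http://example.com"));           // true
        System.out.println(parses("foo://example.com"));            // false
        System.out.println(parsesWithHandler("foo://example.com")); // true
    }
}
```

If this matches what happens inside the fetcher, the order of events matters: whatever makes the foo scheme known to java.net.URL has to be in place before FetchItem.create builds the URL.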
2017-09-23 08:13:06,783 INFO fetcher.FetchItemQueues - Using queue mode : byHost
2017-09-23 08:13:06,785 INFO fetcher.Fetcher - Fetcher: threads: 50
2017-09-23 08:13:06,785 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2017-09-23 08:13:06,836 INFO plugin.PluginRepository - Plugins: looking in: /home/hiran/dev/nutch/runtime/local/plugins
2017-09-23 08:13:06,845 WARN fetcher.FetchItem - Cannot parse url: foo://example.com
java.net.MalformedURLException: unknown protocol: foo
    at java.net.URL.<init>(URL.java:600)
    at java.net.URL.<init>(URL.java:490)
    at java.net.URL.<init>(URL.java:439)
    at org.apache.nutch.fetcher.FetchItem.create(FetchItem.java:71)
    at org.apache.nutch.fetcher.FetchItem.create(FetchItem.java:63)
    at org.apache.nutch.fetcher.FetchItemQueues.addFetchItem(FetchItemQueues.java:87)
    at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:91)
2017-09-23 08:13:06,899 INFO fetcher.QueueFeeder - QueueFeeder finished: total 2 records + hit by time limit :0
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Registered Plugins:
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - Foo Protocol Example Plug-in (protocol-foo)
2017-09-23 08:13:07,508 INFO plugin.PluginRepository - SolrIndexWriter (indexer-solr)
2