Well, this is certainly not an indexing metatags problem. You need to either
use protocol-httpclient for https, or edit protocol-http's plugin.xml so that
it also registers https, the same way protocol-httpclient's plugin.xml does.
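
For the first option, a minimal sketch of the override in nutch-site.xml. Only
the swap from protocol-http to protocol-httpclient is the point here; the rest
of the plugin list is an assumption for illustration, so keep whatever your
nutch-site_v2 already lists for the metatags setup:

```xml
<!-- nutch-site.xml: sketch only. Everything after the first entry is an
     assumed plugin list; keep your own and change just the protocol plugin. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

If you deploy to a runtime directory, the job probably needs to be rebuilt or
the config copied there before the change takes effect.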

On the other hand, when we added support for https to protocol-http, did we 
forget to add it to the plugin.xml?

-----Original message-----
> From:KRIS MUSSHORN <[email protected]>
> Sent: Tuesday 6th September 2016 19:29
> To: [email protected]
> Subject: indexing metatags with Nutch 1.12
> 
> https://wiki.apache.org/nutch/IndexMetatags 
> 
> As soon as I switch to nutch-site_v2, Nutch throws protocol-missing errors
> during the crawl.
> 
> 2016-09-06 12:23:53,102 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
>     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
> 
> How can I fix this?
> 
> Kris
> 
