Markus, 
I'm not sure how to answer your question. 
Here are two XML files for your consideration.

Kris 

----- Original Message -----

From: "Markus Jelsma" <[email protected]> 
To: [email protected] 
Sent: Tuesday, September 6, 2016 2:30:39 PM 
Subject: RE: indexing metatags with Nutch 1.12 

Well, this is certainly not an indexing metatags problem. You need to use 
protocol-httpclient for https, or configure protocol-http's plugin.xml to 
support https. That's identical to protocol-httpclient's plugin.xml. 
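
A rough sketch of the first option, assuming plugins are selected through the 
plugin.includes property in conf/nutch-site.xml: swap protocol-http for 
protocol-httpclient in the value. Only the protocol entry is the point here; 
the other plugins in the value are illustrative placeholders, not the list 
from the wiki's nutch-site_v2, so keep whatever your own file already has. 

```xml
<!-- conf/nutch-site.xml (fragment; belongs inside the existing <configuration> element) -->
<property>
  <name>plugin.includes</name>
  <!-- Assumption: only the protocol plugin changes (protocol-http -> protocol-httpclient).
       The remaining entries are placeholders; keep your existing list. -->
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

After changing the property, re-run the fetch and check whether the 
ProtocolNotFound message for https URLs is gone. 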

On the other hand, when we added support for https to protocol-http, did we 
forget to add it to the plugin.xml? 





-----Original message----- 
> From:KRIS MUSSHORN <[email protected]> 
> Sent: Tuesday 6th September 2016 19:29 
> To: [email protected] 
> Subject: indexing metatags with Nutch 1.12 
> 
> https://wiki.apache.org/nutch/IndexMetatags 
> 
> As soon as I switch to nutch-site_v2, Nutch throws protocol-missing errors 
> during the crawl. 
> 
> 2016-09-06 12:23:53,102 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1 
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms) 
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https 
>     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84) 
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257) 
> 
> How can I fix this? 
> 
> Kris 
> 

Attachment: protocol-http-plugin.xml
Description: XML document

Attachment: protocol-httpclient-plugin.xml
Description: XML document
