Markus, I'm not sure how to answer your question. Here are two XML files for your consideration.
Kris

----- Original Message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, September 6, 2016 2:30:39 PM
Subject: RE: indexing metatags with Nutch 1.12

Well, this is certainly not an indexing metatags problem. You need to use protocol-httpclient for https, or configure protocol-http's plugin.xml to support https. That's identical to protocol-httpclient's plugin.xml. On the other hand, when we added support for https to protocol-http, did we forget to add it to the plugin.xml?

-----Original message-----
> From:KRIS MUSSHORN <[email protected]>
> Sent: Tuesday 6th September 2016 19:29
> To: [email protected]
> Subject: indexing metatags with Nutch 1.12
>
> https://wiki.apache.org/nutch/IndexMetatags
>
> Soon as i switch to nutch-site_v2 nutch throws protocol missing errors during
> crawl.
>
> 2016-09-06 12:23:53,102 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
>     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
>
> how can i fix this?
>
> Kris
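
For anyone comparing the two attached files, this is roughly what an https registration in protocol-http's plugin.xml would look like. It is only a sketch written from memory of the Nutch 1.x plugin descriptor layout, not copied from the attachments or from the 1.12 release, so the ids and attribute values are assumptions to check against the real file. The part that matters is the second implementation element, whose protocolName parameter is "https".

```xml
<!-- Sketch only: element ids and names assumed from the usual Nutch 1.x layout. -->
<extension id="org.apache.nutch.protocol.http"
           name="HttpProtocol"
           point="org.apache.nutch.protocol.Protocol">

   <!-- Handles URLs whose scheme is http -->
   <implementation id="org.apache.nutch.protocol.http.Http"
                   class="org.apache.nutch.protocol.http.Http">
      <parameter name="protocolName" value="http"/>
   </implementation>

   <!-- Handles URLs whose scheme is https; if no active plugin declares
        this protocolName, the result would be the ProtocolNotFound
        error shown in the quoted log -->
   <implementation id="org.apache.nutch.protocol.http.Http"
                   class="org.apache.nutch.protocol.http.Http">
      <parameter name="protocolName" value="https"/>
   </implementation>

</extension>
```

If the attached protocol-http plugin.xml already has both entries, the descriptor is probably not the cause and the next thing to compare is which protocol plugin the new nutch-site file actually enables.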
protocol-http-plugin.xml
Description: XML document
protocol-httpclient-plugin.xml
Description: XML document

