Well, this is certainly not a metatags indexing problem. ProtocolNotFound means that none of the enabled protocol plugins handles https URLs. You need to use protocol-httpclient for https, or configure protocol-http's plugin.xml to declare https the same way protocol-httpclient's plugin.xml does.
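For the first option, here is a minimal sketch of the nutch-site.xml override. It assumes that your nutch-site_v2 sets plugin.includes with protocol-http in the list, as the IndexMetatags wiki page suggests. The value below is illustrative only. Keep your own plugin entries and swap protocol-http for protocol-httpclient.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Illustrative value: protocol-httpclient replaces protocol-http so that
       https:// URLs have a protocol plugin. Keep the rest of your existing
       list (parse-metatags, index-metadata, etc.) as it is. -->
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

After changing this, re-run the fetch. The https URLs should then be fetched by protocol-httpclient instead of failing with ProtocolNotFound.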
On the other hand, when we added https support to protocol-http, did we forget to declare it in the plugin.xml?

-----Original message-----
> From: KRIS MUSSHORN <[email protected]>
> Sent: Tuesday 6th September 2016 19:29
> To: [email protected]
> Subject: indexing metatags with Nutch 1.12
>
> https://wiki.apache.org/nutch/IndexMetatags
>
> As soon as I switch to nutch-site_v2, Nutch throws protocol-missing errors during the crawl.
>
> 2016-09-06 12:23:53,102 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
>     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
>
> How can I fix this?
>
> Kris

