Markus,
A nutch-site.xml with the following property still throws the same errors:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Item needed to parse metatags out of HTML.</description>
</property>

Kris

----- Original Message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, September 6, 2016 6:24:00 PM
Subject: RE: indexing metatags with Nutch 1.12

Hm, this is odd. You have protocol-http configured and it should work just like that. Change it to protocol-httpclient to confirm a problem. protocol-httpclient has supported https for much longer than protocol-http. If it works with httpclient, there is some weird problem never noticed before.

M.

-----Original message-----
> From:Kris Musshorn <[email protected]>
> Sent: Tuesday 6th September 2016 23:26
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
>
> Markus,
>
> Here is the nutch-site.xml that was in place when it threw the errors I posted
> previously.
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Tuesday, September 6, 2016 3:02 PM
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
>
> Well, so we did add https to protocol-http's plugin.xml. Does your
> plugin.includes actually contain a protocol-* plugin?
>
>
> -----Original message-----
> > From:KRIS MUSSHORN <[email protected]>
> > Sent: Tuesday 6th September 2016 20:39
> > To: [email protected]
> > Subject: Re: indexing metatags with Nutch 1.12
> >
> > Markus,
> > I'm not sure how to answer your question.
> > Here are 2 XML files for your consideration.
> >
> > Kris
> >
> > -----------
> > From: "Markus Jelsma" <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, September 6, 2016 2:30:39 PM
> > Subject: RE: indexing metatags with Nutch 1.12
> >
> > Well, this is certainly not an indexing metatags problem. You need to use
> > protocol-httpclient for https, or configure protocol-http's plugin.xml to
> > support https the same way protocol-httpclient's plugin.xml does.
> >
> > On the other hand, when we added support for https to protocol-http, did we
> > forget to add it to the plugin.xml?
> >
> >
> > -----Original message-----
> > > From:KRIS MUSSHORN <[email protected]>
> > > Sent: Tuesday 6th September 2016 19:29
> > > To: [email protected]
> > > Subject: indexing metatags with Nutch 1.12
> > >
> > > https://wiki.apache.org/nutch/IndexMetatags
> > >
> > > As soon as I switch to nutch-site_v2, Nutch throws "protocol not found"
> > > errors during the crawl:
> > >
> > > 2016-09-06 12:23:53,102 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> > > 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> > > 2016-09-06 12:23:53,576 INFO fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
> > >     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
> > >     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
> > >
> > > How can I fix this?
> > >
> > > Kris
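For reference, Markus's suggestion in the quoted thread to configure protocol-http's plugin.xml for https would roughly mean registering a second protocol implementation whose protocolName is https. The fragment below is only a sketch: it assumes the layout that Nutch 1.x protocol plugins use (an implementation element carrying a protocolName parameter inside the org.apache.nutch.protocol.Protocol extension), and the ids and class names are taken as assumptions from that layout. Compare it against the plugin.xml actually shipped with your installation, and against protocol-httpclient's plugin.xml, before editing anything.

```xml
<!-- Sketch of the extension section of protocol-http's plugin.xml (assumed Nutch 1.x layout). -->
<!-- As far as I understand, ProtocolFactory matches the URL scheme against the
     protocolName parameter of each registered implementation, so https needs its own entry. -->
<extension id="org.apache.nutch.protocol.http"
           name="HttpProtocol"
           point="org.apache.nutch.protocol.Protocol">

   <!-- Entry for http URLs -->
   <implementation id="org.apache.nutch.protocol.http.Http"
                   class="org.apache.nutch.protocol.http.Http">
      <parameter name="protocolName" value="http"/>
   </implementation>

   <!-- Entry registering the same class for https URLs -->
   <implementation id="org.apache.nutch.protocol.http.Http"
                   class="org.apache.nutch.protocol.http.Http">
      <parameter name="protocolName" value="https"/>
   </implementation>

</extension>
```

Since Markus also says https was already added to protocol-http's plugin.xml, it is worth checking whether an https entry is present before changing the file; if it is, the ProtocolNotFound error more likely means no protocol plugin is being loaded from plugin.includes at all.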

