Hm, this is odd. You have protocol-http configured, so https should work as is. 
Switch to protocol-httpclient to confirm whether there is a problem; 
protocol-httpclient has supported https for much longer than protocol-http. 

If it works with protocol-httpclient, there is some problem in protocol-http 
that we never noticed before.
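
For the test, the change in nutch-site.xml would look roughly like the sketch 
below. Only the protocol-httpclient entry matters here; the other plugin names 
are an assumed, illustrative list, so keep whatever your own plugin.includes 
already contains and swap just the protocol plugin:

```xml
<!-- nutch-site.xml: sketch only. Everything after the first entry is an
     assumed example list; keep your own values for those plugins. -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-http replaced by protocol-httpclient for the https test -->
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

If the ProtocolNotFound error goes away with this value, the problem is on the 
protocol-http side.
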
M.

 
 
-----Original message-----
> From:Kris Musshorn <[email protected]>
> Sent: Tuesday 6th September 2016 23:26
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
> 
> Markus,
> 
> Here is the nutch-site.xml that was in place when it threw the errors I 
> posted previously.
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]] 
> Sent: Tuesday, September 6, 2016 3:02 PM
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
> 
> Well, so we did add https to protocol-http's plugin.xml. Does your 
> plugin.includes actually contain a protocol-* plugin?
> 
> 
>  
>  
> -----Original message-----
> > From:KRIS MUSSHORN <[email protected]>
> > Sent: Tuesday 6th September 2016 20:39
> > To: [email protected]
> > Subject: Re: indexing metatags with Nutch 1.12
> > 
> > Markus, 
> > I'm not sure how to answer your question.
> > Here are two XML files for your consideration.
> > 
> > Kris
> > 
> > ----------- 
> > From: "Markus Jelsma" <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, September 6, 2016 2:30:39 PM
> > Subject: RE: indexing metatags with Nutch 1.12
> > 
> > Well, this is certainly not an indexing metatags problem. You need to use 
> > protocol-httpclient for https, or configure protocol-http's plugin.xml to 
> > support https. The required entry is the same as the one in 
> > protocol-httpclient's plugin.xml.
> > 
> > On the other hand, when we added support for https to protocol-http, did we 
> > forget to add it to the plugin.xml?
> > 
> > 
> > 
> >  
> >  
> > -----Original message-----
> > > From:KRIS MUSSHORN <[email protected]>
> > > Sent: Tuesday 6th September 2016 19:29
> > > To: [email protected]
> > > Subject: indexing metatags with Nutch 1.12
> > > 
> > > https://wiki.apache.org/nutch/IndexMetatags
> > > 
> > > As soon as I switch to nutch-site_v2, Nutch throws protocol-missing 
> > > errors during the crawl.
> > > 
> > > 2016-09-06 12:23:53,102 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> > > 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> > > 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
> > >     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
> > >     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
> > > 
> > > How can I fix this?
> > > 
> > > Kris
> > > 
> > 
> 
