CLASSIFICATION: UNCLASSIFIED

nutch-site.xml with the following property:
<property>
                <name>
                        plugin.includes
                </name>
                <value>
                        
protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)
                </value>
                <description>
                        item needed to parse metatags out of html.
                </description>
</property>

It throws the same errors.
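For comparison, here is a minimal sketch of the same property with the value kept on a single line. `plugin.includes` is a regular expression matched against plugin directory names (for example `protocol-httpclient`). If line breaks or indentation end up inside `<value>`, they may become part of the pattern. That would stop the first alternative (`protocol-httpclient`) and the last (`urlnormalizer-(pass|regex|basic)`) from matching, unless the configuration loader trims the value. Whether it trims is an assumption not verified here. The plugin list itself is copied unchanged from the property above.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Same plugin list as above, kept on one line so that no stray whitespace
       can become part of the regular expression (assumption: values may not be trimmed). -->
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Plugins needed to parse metatags out of HTML.</description>
</property>
```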

Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor – Catapult Technology Inc.      
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
[email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]] 
Sent: Tuesday, September 06, 2016 6:24 PM
To: [email protected]
Subject: [Non-DoD Source] RE: indexing metatags with Nutch 1.12

Hmm, this is odd. You have protocol-http configured, and it should work as is.
Change it to protocol-httpclient to confirm the problem; protocol-httpclient
has supported https for much longer than protocol-http.

If it works with protocol-httpclient, there is a problem we have never noticed before.
M.

 
 
-----Original message-----
> From:Kris Musshorn <[email protected]>
> Sent: Tuesday 6th September 2016 23:26
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
> 
> Markus,
> 
> Here is the nutch-site.xml that was in place when it threw the errors I
> posted previously.
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]] 
> Sent: Tuesday, September 6, 2016 3:02 PM
> To: [email protected]
> Subject: RE: indexing metatags with Nutch 1.12
> 
> Well, it turns out we did add https to protocol-http's plugin.xml. Does your
> plugin.includes actually contain a protocol-* plugin?
> 
> 
>  
>  
> -----Original message-----
> > From:KRIS MUSSHORN <[email protected]>
> > Sent: Tuesday 6th September 2016 20:39
> > To: [email protected]
> > Subject: Re: indexing metatags with Nutch 1.12
> > 
> > Markus, 
> > I'm not sure how to answer your question.
> > Here are two XML files for your consideration.
> > 
> > Kris
> > 
> > ----------- 
> > From: "Markus Jelsma" <[email protected]>
> > To: [email protected]
> > Sent: Tuesday, September 6, 2016 2:30:39 PM
> > Subject: RE: indexing metatags with Nutch 1.12
> > 
> > Well, this is certainly not a metatag-indexing problem. You need to use
> > protocol-httpclient for https, or configure protocol-http's plugin.xml to
> > support https the same way protocol-httpclient's plugin.xml does.
> > 
> > On the other hand, when we added support for https to protocol-http, did we 
> > forget to add it to the plugin.xml?
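Here is a sketch of what declaring https in protocol-http's plugin.xml might look like. It is modeled on the way protocol-httpclient registers its protocols: an extension of the `org.apache.nutch.protocol.Protocol` point whose implementation carries a `protocolName` parameter set to `https`. The extension and implementation ids below are illustrative assumptions. Compare against the plugin.xml shipped with your Nutch release before editing it.

```xml
<!-- Hypothetical addition inside protocol-http's <plugin> element:
     registers the existing Http class a second time, for the https scheme. -->
<extension id="org.apache.nutch.protocol.https"
           name="HttpsProtocol"
           point="org.apache.nutch.protocol.Protocol">
  <implementation id="org.apache.nutch.protocol.http.Https"
                  class="org.apache.nutch.protocol.http.Http">
    <!-- The URL scheme this implementation is registered for
         (assumption, based on how protocol-httpclient's plugin.xml declares it). -->
    <parameter name="protocolName" value="https"/>
  </implementation>
</extension>
```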
> > 
> > 
> > 
> >  
> >  
> > -----Original message-----
> > > From:KRIS MUSSHORN <[email protected]>
> > > Sent: Tuesday 6th September 2016 19:29
> > > To: [email protected]
> > > Subject: indexing metatags with Nutch 1.12
> > > 
> > > https://wiki.apache.org/nutch/IndexMetatags
> > > 
> > > As soon as I switch to nutch-site_v2, Nutch throws protocol-missing errors
> > > during the crawl.
> > > 
> > > 2016-09-06 12:23:53,102 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=442, fetchQueues.getQueueCount=1
> > > 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetching https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf (queue crawl delay=500ms)
> > > 2016-09-06 12:23:53,576 INFO  fetcher.FetcherThread - fetch of https://snip/inside/events/events_summary/documents/Harford_Co_Sheriff_Special_Brief.pdf failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
> > >     at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:84)
> > >     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:257)
> > > 
> > > How can I fix this?
> > > 
> > > Kris
> > > 
> > 
> 

