I think you need a protocol plugin that can handle https.
So, if you have the protocol-https plugin, you need to change this in your
nutch-site.xml:


<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|protocol-https|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>

<description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need to include at least the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
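
One way to sanity-check a plugin.includes value before crawling is to test it
against the plugin directory names. The following is only a rough sketch in
Python, not Nutch code. It assumes that Nutch matches each whole directory name
against the expression, which is how I read the description above, and the
directory listing is made up for illustration.

```python
import re

# The value from the property above, written on one line.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|protocol-http|protocol-https"
    "|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)"
)

# The same value with the alternation form used in the quoted question.
PLUGIN_INCLUDES_ALT = PLUGIN_INCLUDES.replace(
    "protocol-http|protocol-https", "protocol-(http|https)"
)


def included_plugins(pattern, plugin_dirs):
    """Return the directory names that the whole pattern matches.

    Assumption: a plugin is included only if its entire directory name
    matches the expression (full match, not a substring search).
    """
    regex = re.compile(pattern)
    return [name for name in plugin_dirs if regex.fullmatch(name)]


# Hypothetical contents of a plugins/ directory, for illustration only.
plugin_dirs = [
    "nutch-extensionpoints",
    "protocol-http",
    "protocol-https",
    "protocol-ftp",
    "parse-html",
    "parse-pdf",
    "index-basic",
    "query-site",
]

selected = included_plugins(PLUGIN_INCLUDES, plugin_dirs)
selected_alt = included_plugins(PLUGIN_INCLUDES_ALT, plugin_dirs)

print(selected)
print(selected == selected_alt)
```

Under that assumption both spellings select the same plugins, so the
expression in the question is not the problem by itself. If there is no
protocol-https directory under plugins/ at all, no include expression can load
it, which would fit the ProtocolNotFound error quoted below.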

On 3/27/06, Michael Ji <[EMAIL PROTECTED]> wrote:
>
> hi there:
>
> Will the following line in nutch-site.xml let
> Nutch fetch https pages?
>
> "protocol-(http|https)"
>
> I tried that, but it gives me this error message:
>
> "
> failed with:
> org.apache.nutch.protocol.ProtocolNotFound: protocol
> not found for url=https
> "
>
> Any idea how to fix it?
>
> thanks,
>
> Michael



--
www.babatu.com
