I think you need a protocol plugin to handle https, so you need to change this in your nutch-site.xml, if you have the protocol-https plugin:

<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|protocol-https|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>

On 3/27/06, Michael Ji <[EMAIL PROTECTED]> wrote:
>
> hi there:
>
> Does the following lines in nutch-site.xml will let
> nutch to fetch https page down?
>
> "protocol-(http|https)"
>
> I tried that but gives me error message of
>
> "
> failed with:
> org.apache.nutch.protocol.ProtocolNotFound: protocol
> not found for url=https
> "
>
> Any idea how to fix it?
>
> thanks,
>
> Michael

--
www.babatu.com
