Hello,

The log doesn't say much beyond reporting the failure, with no reason given.
You might want to set the log level to TRACE, since the authenticator logs at
that level. You could also check whether there are any server-side messages.
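
As a rough sketch, assuming the stock conf/log4j.properties (log4j 1.x) that
ships with Nutch 1.x, something like the following should raise the level. The
logger names are inferred from the package prefixes in your log output and the
plugin name, so treat them as assumptions and adjust as needed:

```properties
# conf/log4j.properties (sketch, not verified against your setup)
# Nutch's protocol-httpclient plugin, including its authentication handling
log4j.logger.org.apache.nutch.protocol.httpclient=TRACE
# Commons HttpClient 3.x, which produces the AuthChallengeProcessor lines
log4j.logger.org.apache.commons.httpclient=TRACE
# Optional: request/response headers, to follow the NTLM handshake
log4j.logger.httpclient.wire.header=DEBUG
```

With no appender named, these loggers write to whatever the root logger uses,
which is normally logs/hadoop.log.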

Regards,
Markus
 
 
-----Original message-----
> From:Larry.Santello <larry.sante...@uline.com>
> Sent: Thursday 25th April 2019 15:28
> To: user@nutch.apache.org
> Subject: Nutch NTLM to IIS 8.5 - issues!
> 
> All -
> 
> I've tried several 1.x versions of Nutch and a variety of configurations and
> simply can NOT get NTLM authentication working with Nutch. I need help
> desperately!
> 
> Here are the relevant configuration points:
> Note: "user", "password", and "ntdomain" are, of course, fillers for real
> values
> 
> httpclient-auth.xml:
> <credentials username="user" password="password" >
>       <default realm="ntdomain" /> 
> </credentials>
> 
> nutch-site.xml:
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-(http|httpclient)|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   <description> </description>
> </property>
> 
> logged problem (note that, yes, this is from 1.5.1, but 1.15 produces
> similar results):
> 2019-04-25 07:38:47,641 INFO  parse.ParserChecker - fetching:
> http://url.com/crawltest.html
> 2019-04-25 07:38:47,650 INFO  plugin.PluginRepository - Plugins: looking in:
> C:\nutch\apache-nutch-1.5.1\plugins
> 2019-04-25 07:38:47,728 INFO  plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2019-04-25 07:38:47,729 INFO  plugin.PluginRepository - Registered Plugins:
> 2019-04-25 07:38:47,729 INFO  plugin.PluginRepository -       Html Parse
> Plug-in (parse-html)
> 2019-04-25 07:38:47,729 INFO  plugin.PluginRepository -       HTTP Framework
> (lib-http)
> 2019-04-25 07:38:47,729 INFO  plugin.PluginRepository -       Http / Https
> Protocol Plug-in (protocol-httpclient)
> 2019-04-25 07:38:47,729 INFO  plugin.PluginRepository -       Regex URL Filter
> (urlfilter-regex)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       the nutch core
> extension points (nutch-extensionpoints)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Basic Indexing
> Filter (index-basic)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Anchor Indexing
> Filter (index-anchor)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Tika Parser
> Plug-in (parse-tika)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Basic URL
> Normalizer (urlnormalizer-basic)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Regex URL Filter
> Framework (lib-regex-filter)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Regex URL
> Normalizer (urlnormalizer-regex)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       URL Validator
> (urlfilter-validator)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       CyberNeko HTML
> Parser (lib-nekohtml)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Pass-through URL
> Normalizer (urlnormalizer-pass)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       OPIC Scoring
> Plug-in (scoring-opic)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Http Protocol
> Plug-in (protocol-http)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository - Registered
> Extension-Points:
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch Content
> Parser (org.apache.nutch.parse.Parser)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       HTML Parse
> Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch URL
> Normalizer (org.apache.nutch.net.URLNormalizer)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch Segment
> Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 2019-04-25 07:38:47,733 INFO  plugin.PluginRepository -       Nutch Indexing
> Filter (org.apache.nutch.indexer.IndexingFilter)
> 2019-04-25 07:38:47,761 INFO  httpclient.Http - http.proxy.host = null
> 2019-04-25 07:38:47,762 INFO  httpclient.Http - http.proxy.port = 8080
> 2019-04-25 07:38:47,763 INFO  httpclient.Http - http.timeout = 10000
> 2019-04-25 07:38:47,763 INFO  httpclient.Http - http.content.limit = -1
> 2019-04-25 07:38:47,763 INFO  httpclient.Http - http.agent = Ulinenet
> Spider/Nutch-1.5.1
> 2019-04-25 07:38:47,764 INFO  httpclient.Http - http.accept.language =
> en-us,en-gb,en;q=0.7,*;q=0.3
> 2019-04-25 07:38:47,764 INFO  httpclient.Http - http.accept =
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2019-04-25 07:38:47,835 DEBUG auth.AuthChallengeProcessor - Supported
> authentication schemes in the order of preference: [ntlm, digest, basic]
> 2019-04-25 07:38:47,836 INFO  auth.AuthChallengeProcessor - ntlm
> authentication scheme selected
> 2019-04-25 07:38:47,837 DEBUG auth.AuthChallengeProcessor - Using
> authentication scheme: ntlm
> 2019-04-25 07:38:47,837 DEBUG auth.AuthChallengeProcessor - Authorization
> challenge processed
> 2019-04-25 07:38:47,847 DEBUG auth.AuthChallengeProcessor - Using
> authentication scheme: ntlm
> 2019-04-25 07:38:47,847 DEBUG auth.AuthChallengeProcessor - Authorization
> challenge processed
> 2019-04-25 07:38:48,335 DEBUG auth.AuthChallengeProcessor - Using
> authentication scheme: ntlm
> 2019-04-25 07:38:48,336 DEBUG auth.AuthChallengeProcessor - Authorization
> challenge processed
> 2019-04-25 07:38:48,337 INFO  httpclient.HttpMethodDirector - Failure
> authenticating with NTLM <any realm>@url.com:80
> 2019-04-25 07:38:48,507 INFO  crawl.SignatureFactory - Using Signature impl:
> org.apache.nutch.crawl.MD5Signature
> 2019-04-25 07:38:48,509 INFO  parse.ParserChecker - parsing:
> http://url.com/crawltest.html
> 2019-04-25 07:38:48,509 INFO  parse.ParserChecker - contentType:
> application/xhtml+xml
> 2019-04-25 07:38:48,510 INFO  parse.ParserChecker - signature:
> 495abb7f991fb4dd6a056f748908a2d9
> 
> The way I'm testing:
> bin/nutch parsechecker http://url.com/crawltest.html
> 
> Finally, I should note that the following curl command DOES work:
> curl --ntlm --user user:password http://url.com/crawltest.html
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
> 
