Thanks for all of your work on this. I'll be able to commit this patch tonight.
Karl

Sent from my Windows Phone
------------------------------
From: Jan van Haarst
Sent: 7/8/2012 6:40 AM
To: [email protected]
Subject: Re: Crawling behind an ISA proxy (iis 7.5)

Dear All,

We are now able to connect to the IIS proxy. Thanks to the logging
facilities Karl added, we were able to see that this is the fix:

Index: connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java
===================================================================
--- connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (revision 1357379)
+++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (working copy)
@@ -361,7 +361,7 @@
     String emailAddress = params.getParameter(WebcrawlerConfig.PARAMETER_EMAIL);
     if (emailAddress == null)
       throw new ManifoldCFException("Missing email address");
-    userAgent = "ApacheManifoldCFWebCrawler; "+emailAddress+")";
+    userAgent = "Mozilla/5.0 (ApacheManifoldCFWebCrawler; "+emailAddress+")";
     from = emailAddress;
     x = params.getParameter(WebcrawlerConfig.PARAMETER_ROBOTSUSAGE);

Yes, this is weird: a proxy shouldn't fail on User-Agent settings, but
apparently this one does. Even Google apparently uses a User-Agent of
this form:
http://www.useragentstring.com/pages/Googlebot/

Now we 'just' have to get the crawling working, but the main (unique)
hurdle has now been taken!

Karl, a big Thank You for your help, and for the openssl s_client tip
that enabled us to debug this.

Dag,
Jan

On Thu, Jun 28, 2012 at 11:05 PM, Jan van Haarst <[email protected]> wrote:

> On Thu, Jun 28, 2012 at 11:26 AM, Karl Wright <[email protected]> wrote:
>
>> I was wondering if you'd picked up and tried the patch for
>> CONNECTORS-483. This patch adds official proxy support for the Web
>> Connector. Alternatively, you could try to build and run with trunk
>> code.
>>
>> Karl
>>
>
> I'm going the building from trunk way, and all seems to go well up to the
> creation of the zip and tar.gz files.
> Is there anything special to do after running the build process like this?
>
> ant clean clean-core-deps clean-deps && ant make-core-deps make-deps build
> && ant image
>
> Did I miss anything?
> If not, I'll replace the old binary installation with my source-build one,
> and see where it leads me.
>
> --
> Dag,
> Jan
>

--
Dag,
Jan
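
For anyone reading this thread later, the effect of the one-line patch above can be sketched in isolation. This is only an illustration, not the actual WebcrawlerConnector code: the class name UserAgentSketch, the method buildUserAgent, and the example address are made up for the sketch, and a plain IllegalArgumentException stands in for ManifoldCFException so that it compiles without ManifoldCF on the classpath.

```java
// Standalone sketch of the patched User-Agent construction.
// UserAgentSketch and buildUserAgent are hypothetical names, not part of ManifoldCF.
final class UserAgentSketch {
    // Product token taken from the patch in this thread.
    private static final String PRODUCT = "ApacheManifoldCFWebCrawler";

    /** Builds the browser-style User-Agent value produced by the patched line. */
    static String buildUserAgent(String emailAddress) {
        if (emailAddress == null) {
            // The connector throws ManifoldCFException at this point; a JDK
            // exception is used here only to keep the sketch dependency-free.
            throw new IllegalArgumentException("Missing email address");
        }
        // The "Mozilla/5.0 (" prefix opens the parenthesis that the old
        // string closed without ever opening.
        return "Mozilla/5.0 (" + PRODUCT + "; " + emailAddress + ")";
    }

    public static void main(String[] args) {
        // crawler-admin@example.org is a placeholder address for illustration.
        System.out.println("User-Agent: " + buildUserAgent("crawler-admin@example.org"));
    }
}
```

Note that the old value ended in ")" with no matching "(", so the patched string is also the first one with balanced parentheses. Whether the proxy objected to the missing "Mozilla/5.0" prefix or to the unbalanced parenthesis is not something this thread establishes.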
