Dear All,

We are now able to connect through the IIS proxy. Thanks to the logging
facilities Karl added, we were able to see that this is the fix:

Index: connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java
===================================================================
--- connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (revision 1357379)
+++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (working copy)
@@ -361,7 +361,7 @@
       String emailAddress = params.getParameter(WebcrawlerConfig.PARAMETER_EMAIL);
       if (emailAddress == null)
         throw new ManifoldCFException("Missing email address");
-      userAgent = "ApacheManifoldCFWebCrawler; "+emailAddress+")";
+      userAgent = "Mozilla/5.0 (ApacheManifoldCFWebCrawler; "+emailAddress+")";
       from = emailAddress;

       x = params.getParameter(WebcrawlerConfig.PARAMETER_ROBOTSUSAGE);

Yes, this is weird: a proxy shouldn't fail because of the User-Agent value,
but apparently this one does.
Even Googlebot apparently sends a Mozilla/5.0-style User-Agent:
http://www.useragentstring.com/pages/Googlebot/
Now we 'just' have to get the crawling itself working, but the main (and only)
hurdle has now been cleared!
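
For anyone who wants to check the resulting header value outside of
ManifoldCF, here is a minimal standalone sketch of the string the patched
line builds. The class name, the helper method and the example address are
hypothetical and only for illustration; only the string format is taken from
the diff.

```java
// Standalone sketch; UserAgentSketch and buildUserAgent are hypothetical
// names and are not part of the ManifoldCF code base.
class UserAgentSketch {

    // Mirrors the patched line: a "Mozilla/5.0" product token followed by a
    // parenthesised comment holding the crawler name and contact address.
    static String buildUserAgent(String emailAddress) {
        if (emailAddress == null)
            throw new IllegalArgumentException("Missing email address");
        return "Mozilla/5.0 (ApacheManifoldCFWebCrawler; " + emailAddress + ")";
    }

    public static void main(String[] args) {
        // The address below is a placeholder, not a real contact.
        System.out.println(buildUserAgent("crawler-admin@example.org"));
    }
}
```

Note that the new value also has balanced parentheses, which the old one did
not. Whether that or the Mozilla/5.0 prefix is what the proxy cares about, I
can't say for sure.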

Karl, a big thank you for your help, and for the openssl s_client that
enabled us to debug this.

Dag,
Jan

On Thu, Jun 28, 2012 at 11:05 PM, Jan van Haarst <[email protected]> wrote:

> On Thu, Jun 28, 2012 at 11:26 AM, Karl Wright <[email protected]> wrote:
>
>> I was wondering if you'd picked up and tried the patch for
>> CONNECTORS-483.  This patch adds official proxy support for the Web
>> Connector.  Alternatively, you could try to build and run with trunk
>> code.
>>
>> Karl
>>
>
> I'm going the build-from-trunk route, and all seems to go well up to the
> creation of the zip and tar.gz files.
> Is there anything special to do after running the build process like this?
>
> ant clean clean-core-deps clean-deps && ant make-core-deps make-deps build && ant image
>
> Did I miss anything?
> If not, I'll replace the old binary installation with my source-built one,
> and see where it leads me.
>
> --
> Dag,
> Jan
>


