RE: No internet connection in Nutch crawler: Proxy configuration -PAC file

2018-04-23 Thread Yossi Tamari
To add to what Lewis said, PAC files are mostly used by browsers, not so much 
by servers (like Nutch). It is possible your IT department has another proxy 
configuration that you can use in a server.
Keep in mind that a PAC file is just a JavaScript function that translates a 
URL to proxy information, so if the logic is simple and the file is static, it 
may be enough for you to look at the contents of the file, and extract some 
static proxy definition that will work for all URLs.
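
As a rough illustration of that last point, here is a minimal Python sketch, assuming the PAC file is static and returns its proxy strings as double-quoted literals. The PAC contents, the host proxy.example.com, the port 8080, and the helper names are all made up for the example; only http.proxy.host and http.proxy.port are the Nutch property names mentioned in this thread. It lists the strings the PAC function can return and prints the first PROXY entry as property elements of the kind that would normally go into conf/nutch-site.xml.

```python
import re

# Hypothetical PAC file contents; the host name and port are placeholders.
PAC_TEXT = '''
function FindProxyForURL(url, host) {
    if (isPlainHostName(host)) {
        return "DIRECT";
    }
    return "PROXY proxy.example.com:8080; DIRECT";
}
'''


def pac_return_values(pac_text):
    """Collect the distinct double-quoted strings returned in the PAC text.

    This is a plain text scan, not a JavaScript evaluation, so it only helps
    when the PAC logic is simple enough to read by eye anyway.
    """
    found = re.findall(r'return\s+"([^"]*)"', pac_text)
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


def first_proxy(return_value):
    """Return (host, port) of the first PROXY entry, or None if there is none."""
    for entry in return_value.split(";"):
        parts = entry.split()
        if len(parts) == 2 and parts[0].upper() == "PROXY":
            host, _, port = parts[1].rpartition(":")
            if host and port.isdigit():
                return host, int(port)
    return None


def nutch_properties(host, port):
    """Render host and port as Nutch-style <property> elements."""
    template = "<property>\n  <name>{}</name>\n  <value>{}</value>\n</property>"
    return "\n".join([
        template.format("http.proxy.host", host),
        template.format("http.proxy.port", port),
    ])


if __name__ == "__main__":
    for value in pac_return_values(PAC_TEXT):
        proxy = first_proxy(value)
        if proxy:
            print(nutch_properties(*proxy))
```

If the PAC file returns different proxies for different URL patterns, a single static definition like this will only be correct for the URLs that match that branch, so it is worth checking which branch the crawled sites fall into.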

> -----Original Message-----
> From: lewis john mcgibbney 
> Sent: 23 April 2018 18:04
> To: user@nutch.apache.org
> Subject: Re: No internet connection in Nutch crawler: Proxy configuration -PAC
> file
> 
> Hi Patricia,
> I've never used a proxy auto-config (PAC) method for proxying anything before.
> The PAC is defined as "...Proxy auto-configuration (PAC): Specify the URL for
> a PAC file with a JavaScript function that determines the appropriate proxy
> for each URL. This method is more suitable for laptop users who need several
> different proxy configurations, or complex corporate setups with many
> different proxies."
> Right now, the public guidance for using Nutch with a proxy goes as far as the
> following tutorial https://wiki.apache.org/nutch/SetupProxyForNutch
> Right now, Nutch does not support the reading of PAC files... I think you
> would need to add this functionality.
> Lewis
> 
> On Sun, Apr 22, 2018 at 10:31 AM,  wrote:
> 
> >
> > From: Patricia Helmich 
> > To: "user@nutch.apache.org" 
> > Cc:
> > Bcc:
> > Date: Fri, 20 Apr 2018 10:31:42 +
> > Subject: No internet connection in Nutch crawler: Proxy configuration
> > -PAC file
> > Hi,
> >
> > I am using Nutch and it used to work fine. Now, some internet
> > configurations changed and I have to use a proxy. In my browser, I
> > specify the proxy by providing a PAC file to the option "Automatic
> > proxy configuration URL". I was searching for a similar option in
> > Nutch in the conf/nutch-default.xml file. I do find some proxy options
> > (http.proxy.host, http.proxy.port, http.proxy.username,
> > http.proxy.password,
> > http.proxy.realm) but none seems to be the one I am searching for.
> >
> > So, my question is: where can I specify the PAC file in the Nutch
> > configurations for the proxy?
> >
> > Thanks for your help,
> >
> > Patricia
> >
> >
> 
> 
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc



Re: No internet connection in Nutch crawler: Proxy configuration -PAC file

2018-04-23 Thread lewis john mcgibbney
Hi Patricia,
I've never used a proxy auto-config (PAC) method for proxying anything
before. The PAC is defined as "...Proxy auto-configuration (PAC): Specify
the URL for a PAC file with a JavaScript function that determines the
appropriate proxy for each URL. This method is more suitable for laptop
users who need several different proxy configurations, or complex corporate
setups with many different proxies."
Right now, the public guidance for using Nutch with a proxy goes as far as
the following tutorial
https://wiki.apache.org/nutch/SetupProxyForNutch
Right now, Nutch does not support the reading of PAC files... I think you
would need to add this functionality.
Lewis

On Sun, Apr 22, 2018 at 10:31 AM,  wrote:

>
> From: Patricia Helmich 
> To: "user@nutch.apache.org" 
> Cc:
> Bcc:
> Date: Fri, 20 Apr 2018 10:31:42 +
> Subject: No internet connection in Nutch crawler: Proxy configuration -PAC
> file
> Hi,
>
> I am using Nutch and it used to work fine. Now, some internet
> configurations changed and I have to use a proxy. In my browser, I specify
> the proxy by providing a PAC file to the option "Automatic proxy
> configuration URL". I was searching for a similar option in Nutch in the
> conf/nutch-default.xml file. I do find some proxy options (http.proxy.host,
> http.proxy.port, http.proxy.username, http.proxy.password,
> http.proxy.realm) but none seems to be the one I am searching for.
>
> So, my question is: where can I specify the PAC file in the Nutch
> configurations for the proxy?
>
> Thanks for your help,
>
> Patricia
>
>


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc