Hi team,

I want to crawl a particular website, and I am using a paid proxy to access it. How can I configure Nutch so that it crawls through my proxy?
I have around 1,000 proxy IPs, each with its own username and password, so I would also like to know how to configure Nutch to use these proxies in round-robin fashion.

In nutch-default.xml I tried setting these properties:

<property>
  <name>http.proxy.host</name>
  <value>12.34..56.789</value>
  <description>The proxy hostname. If empty, no proxy is used.</description>
</property>
<property>
  <name>http.proxy.port</name>
  <value>1234</value>
  <description>The proxy port.</description>
</property>
<property>
  <name>http.proxy.username</name>
  <value>qwer</value>
  <description>Username for proxy. This will be used by
  'protocol-httpclient', if the proxy server requests basic, digest
  and/or NTLM authentication. To use this, 'protocol-httpclient' must
  be present in the value of 'plugin.includes' property.
  NOTE: For NTLM authentication, do not prefix the username with the
  domain, i.e. 'susam' is correct whereas 'DOMAIN\susam' is incorrect.
  </description>
</property>

However, the Hadoop log shows an SSL exception. Please help me fix this issue.

With regards,
Jyoti Aditya
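As far as I know, Nutch 1.x reads the http.proxy.* properties from its configuration once, so the properties above describe a single proxy and there is no built-in setting for rotating through a list. The sketch below is a standalone Java illustration of round-robin selection only. It assumes the proxies are held in a list of host/port/username/password entries; the class and record names (`RoundRobinProxies`, `ProxyEntry`) and the example hosts are hypothetical, and nothing here is wired into Nutch. Actually rotating proxies inside a crawl would need something like a custom protocol plugin, or separate fetch runs with different http.proxy.* overrides (an assumption, not a documented Nutch feature).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinProxies {

    /** One paid proxy with its credentials (hypothetical record, not a Nutch type). */
    record ProxyEntry(String host, int port, String username, String password) {}

    private final List<ProxyEntry> proxies;
    // Shared counter so several fetcher threads each take the next slot in turn.
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinProxies(List<ProxyEntry> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("need at least one proxy");
        }
        this.proxies = List.copyOf(proxies);
    }

    /** Returns the next proxy, wrapping back to the first after the last one. */
    ProxyEntry next() {
        // floorMod keeps the index non-negative even after the int counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    /**
     * Maps one proxy onto the http.proxy.* property names from nutch-default.xml
     * (http.proxy.password is the companion of http.proxy.username there).
     */
    static Map<String, String> toNutchProperties(ProxyEntry p) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("http.proxy.host", p.host());
        props.put("http.proxy.port", Integer.toString(p.port()));
        props.put("http.proxy.username", p.username());
        props.put("http.proxy.password", p.password());
        return props;
    }

    public static void main(String[] args) {
        // Example hosts and credentials are placeholders.
        RoundRobinProxies rr = new RoundRobinProxies(List.of(
                new ProxyEntry("proxy-a.example", 3128, "userA", "secretA"),
                new ProxyEntry("proxy-b.example", 3128, "userB", "secretB"),
                new ProxyEntry("proxy-c.example", 8080, "userC", "secretC")));
        for (int k = 0; k < 4; k++) {
            System.out.println(rr.next().host());
        }
    }
}
```

Running `main` prints proxy-a.example, proxy-b.example and proxy-c.example, then wraps back to proxy-a.example. An `AtomicInteger` is used instead of a plain `int` so that concurrent callers never receive the same slot from a race on the counter.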

