It seems that Google is blocking the user agent.

I get this reply with lwp-request:

Your client does not have permission to get URL
<code>/search?q=site:se&amp;hl=sv&amp;start=100&amp;sa=N</code> from
this server.  (Client IP address: XX.XX.XX.XX)<br><br>
Please see Google's Terms of Service posted at
http://www.google.com/terms_of_service.html

If you set the user agent properties to those of a client such as Firefox,
Google will serve your request.
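
To illustrate the point about the user agent, here is a minimal Python sketch
that only builds a request carrying a browser-like User-Agent header. The
Firefox string and the helper name build_request are assumptions made for
illustration; nothing is sent over the network, and the robots.txt rule and
Terms of Service mentioned in this thread still apply.

```python
import urllib.request

# Assumed example of a Firefox-style User-Agent string (illustrative only).
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=FIREFOX_UA):
    """Return a Request object whose User-Agent header is set explicitly.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would actually send it; whether
the server then answers is up to the server.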

reinhard schwab wrote:
> http://www.google.se/robots.txt
>
> google disallows it.
>
> User-agent: *
> Allow: /searchhistory/
> Disallow: /search
>
>
> Larsson85 wrote:
>   
>> Why isn't Nutch able to handle links from Google?
>>
>> I tried to start a crawl from the following url
>> http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N
>>
>> And all I get is "no more URLs to fetch"
>>
>> The reason I want to do this is that I had a thought that maybe I
>> could use Google to generate my start list of URLs by injecting pages of
>> search results.
>>
>> Why won't this page be parsed and links extracted so the crawl can start?
>>   
>>     
>
>
>   

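The robots.txt rules quoted above can be checked offline with Python's
standard urllib.robotparser. This is only a sketch: the three rule lines are
copied from the quoted message, while the helper name is_allowed and the agent
name MyCrawler are assumptions made for illustration, and Nutch's own
robots.txt handling may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Rule lines as quoted in this thread from http://www.google.se/robots.txt
ROBOTS_LINES = [
    "User-agent: *",
    "Allow: /searchhistory/",
    "Disallow: /search",
]


def is_allowed(url, agent="MyCrawler"):
    """Return True if the quoted rules permit `agent` to fetch `url`.

    Hypothetical helper; the rules are parsed from the list above,
    so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(agent, url)


# The seed URL from the original question falls under "Disallow: /search".
print(is_allowed("http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N"))
# A path under /searchhistory/ matches the Allow rule instead.
print(is_allowed("http://www.google.se/searchhistory/"))
```

A crawler that honors robots.txt would therefore skip the search URL, which is
consistent with the "no more URLs to fetch" message in the original question.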