http://www.google.se/robots.txt

Google disallows it, and Nutch obeys robots.txt, so the search URL is never fetched. The relevant rules:

User-agent: *
Allow: /searchhistory/
Disallow: /search
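
If you want to check this yourself before injecting a URL, here is a minimal sketch using Python's standard urllib.robotparser with the rules above pasted in as a string. The agent name "MyCrawler" and the helper is_allowed are made up for illustration; Nutch uses its own Java robots.txt handling, which may resolve Allow/Disallow precedence differently in edge cases, so treat this only as a rough check.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, pasted in directly (no network access needed).
ROBOTS_TXT = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, agent="MyCrawler"):
    """Return True if the pasted rules let `agent` fetch `url`.

    "MyCrawler" is a hypothetical agent name; it falls under "User-agent: *".
    """
    return parser.can_fetch(agent, url)


seed = "http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N"
print(seed, "allowed" if is_allowed(seed) else "disallowed")
```

The seed URL's path starts with /search, so it falls under the Disallow line, and a crawler that honours robots.txt will drop it before fetching anything.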


Larsson85 wrote:
> Why isn't Nutch able to handle links from Google?
>
> I tried to start a crawl from the following url
> http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N
>
> And all I get is "no more URLs to fetch"
>
> The reason I want to do this is that I had a thought that maybe I
> could use Google to generate my start list of URLs by injecting pages of
> search results.
>
> Why won't this page be parsed and its links extracted so the crawl can start?
>   
