identify nutch as popular user agent such as firefox.

Larsson85 wrote:
> Any workaround for this? Making nutch identify as something else or something
> similar?
>
>
> reinhard schwab wrote:
>
>> http://www.google.se/robots.txt
>>
>> google disallows it.
>>
>> User-agent: *
>> Allow: /searchhistory/
>> Disallow: /search
>>
>>
>> Larsson85 wrote:
>>
>>> Why isn't nutch able to handle links from google?
>>>
>>> I tried to start a crawl from the following url
>>> http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N
>>>
>>> And all I get is "no more URLs to fetch"
>>>
>>> The reason I want to do this is that I had a thought: maybe I
>>> could use google to generate my start list of urls by injecting pages of
>>> search results.
>>>
>>> Why won't this page be parsed and links extracted so the crawl can start?
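To see why the injected URL is rejected, consistent with reinhard's explanation, here is a minimal sketch that feeds only the three robots.txt rules quoted in the thread (not the full google.se robots.txt) to Python's standard-library `urllib.robotparser`. This is not how Nutch itself evaluates robots rules; it only illustrates standard prefix matching. The helper `is_allowed` is hypothetical. It also shows that a `User-agent: *` group applies to every agent string, so identifying as Firefox does not make `/search` allowed for a crawler that honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Only the rules quoted in the thread; the live google.se/robots.txt is longer.
QUOTED_RULES = [
    "User-agent: *",
    "Allow: /searchhistory/",
    "Disallow: /search",
]

SEED_URL = "http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N"


def is_allowed(url: str, agent: str) -> bool:
    """Hypothetical helper: check `url` against the quoted rules for `agent`."""
    parser = RobotFileParser()
    parser.parse(QUOTED_RULES)  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Nutch", "Mozilla/5.0 Firefox"):
        # The "*" group applies to every agent, so both lines print False.
        print(agent, is_allowed(SEED_URL, agent))
    # "/searchhistory/" matches the Allow rule, so this prints True.
    print("searchhistory", is_allowed("http://www.google.se/searchhistory/", "Nutch"))
```

The stdlib parser applies the first matching rule in file order rather than the longest match, but for these three rules both approaches give the same result: `/search?q=...` is disallowed and `/searchhistory/` is allowed.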
- Re: Why cant I inject a google link to the database? reinhard schwab
- Re: Why cant I inject a google link to the database? reinhard schwab
- Re: Why cant I inject a google link to the datab... reinhard schwab
- Re: Why cant I inject a google link to the database? Larsson85
- Re: Why cant I inject a google link to the datab... Doğacan Güney
- Re: Why cant I inject a google link to the d... Doğacan Güney
- Re: Why cant I inject a google link to the datab... reinhard schwab
- Re: Why cant I inject a google link to the d... Dennis Kubes
- Re: Why cant I inject a google link to t... reinhard schwab
- Re: Why cant I inject a google link to the d... Larsson85
- Re: Why cant I inject a google link to t... Jake Jacobson
- Re: Why cant I inject a google link to t... Brian Ulicny
- Re: Why cant I inject a google link... Andrzej Bialecki
- Re: Why cant I inject a google link... reinhard schwab
