Is there any workaround for this? For example, making Nutch identify itself as something else, or something similar?

reinhard schwab wrote:
>
> http://www.google.se/robots.txt
>
> Google disallows it.
>
> User-agent: *
> Allow: /searchhistory/
> Disallow: /search
>
>
> Larsson85 wrote:
>> Why isn't Nutch able to handle links from Google?
>>
>> I tried to start a crawl from the following URL:
>> http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N
>>
>> And all I get is "no more URLs to fetch".
>>
>> The reason I want to do this is that I had a thought that maybe I
>> could use Google to generate my start list of URLs by injecting pages of
>> search results.
>>
>> Why won't this page be parsed and links extracted so the crawl can start?
>>
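
For reference, the effect of the robots.txt rules quoted above can be checked outside Nutch. The following is only a rough sketch using Python's standard `urllib.robotparser`, not Nutch's own robots handling, which may differ in detail. The helper name `is_allowed` and the agent strings are made up for illustration, and the rules are the three lines quoted in this thread rather than a live copy of the file.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in this thread (not fetched live; the real file may differ).
ROBOTS_TXT = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
"""


def is_allowed(url, agent="Nutch"):
    """Hypothetical helper: True if the quoted rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    seed = "http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N"
    # The seed URL path starts with /search, so the Disallow rule applies.
    print(is_allowed(seed))
    # The rule group is "User-agent: *", so another agent name gets the same answer.
    print(is_allowed(seed, agent="SomethingElse"))
    # The Allow rule still covers /searchhistory/.
    print(is_allowed("http://www.google.se/searchhistory/"))
```

Because the quoted group applies to `User-agent: *`, simply changing the agent name would not change the result under these rules; a robots-compliant crawler skips the seed page, which would be consistent with the "no more URLs to fetch" message.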
