Is there any workaround for this? For example, could Nutch be made to
identify itself as something else, or something similar?
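
For reference, here is a minimal sketch of how the robots.txt rules quoted below apply to my seed URL. It uses Python's standard urllib.robotparser, not Nutch's own robots handling, so the matching details may differ. The helper name is_allowed and the agent name "SomeOtherBot" are made up for illustration. Because the rules sit under "User-agent: *", this parser gives the same answer whatever name the crawler reports.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt excerpt quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
"""

# Seed URL from the original question.
SEED = "http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N"


def is_allowed(url: str, agent: str) -> bool:
    """Return True if the quoted rules permit `agent` to fetch `url`.

    Hypothetical helper: it parses the excerpt above instead of
    downloading the live robots.txt, so it only reflects those three lines.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is an invented agent name, used only for comparison.
    for agent in ("Nutch", "SomeOtherBot"):
        print(agent, is_allowed(SEED, agent))
```

With this parser, the "Disallow: /search" line is what blocks the seed URL, while paths under /searchhistory/ stay allowed.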


reinhard schwab wrote:
> 
> http://www.google.se/robots.txt
> 
> Google disallows it.
> 
> User-agent: *
> Allow: /searchhistory/
> Disallow: /search
> 
> 
> Larsson85 wrote:
>> Why isn't Nutch able to handle links from Google?
>>
>> I tried to start a crawl from the following URL:
>> http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N
>>
>> All I get is "no more URLs to fetch".
>>
>> The reason I want to do this is that I thought I could maybe use Google
>> to generate my start list of URLs by injecting pages of search results.
>>
>> Why won't this page be parsed and its links extracted so the crawl can start?
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Why-cant-I-inject-a-google-link-to-the-database--tp24533162p24533426.html
Sent from the Nutch - User mailing list archive at Nabble.com.
