I have a spider that needs to crawl some data from a movie site, but I found that this data is generated through Ajax after the page loads. My code is:

    require 'nokogiri'
    require 'watir-webdriver'

    browser = Watir::Browser.new
    browser.goto 'http://www.tudou.com/albumplay/2Dk1-JIVpzo/yp927-uKGMs.html?FR=LIAN'
    browser.element(:css => "#digBury .dig_container").wait_until_present
    puts '***************************************'
    puts browser.html
    puts '***************************************'
    doc = Nokogiri::HTML(browser.html)
    content = doc.css(".dig_container .num")
    browser.close

From the output I can get the content I want:

    <div id="digBury" class="dig_wrap">
      <div class="dig_container">
        <a title="喜欢就挖一下吧,登录后双倍威力" class="btn" href="#">
          <i class="iconfont"></i>
          <i class="tip">+1</i>
          <span class="num">332</span>
        </a>
      </div>
    </div>

But I know that on the server I must use headless, so I changed my code to:

    require 'nokogiri'
    require 'watir-webdriver'
    require 'headless'

    headless = Headless.new
    headless.start
    browser = Watir::Browser.new
    browser.goto 'http://www.tudou.com/albumplay/2Dk1-JIVpzo/yp927-uKGMs.html?FR=LIAN'
    browser.element(:css => "#digBury .dig_container").wait_until_present
    puts '***************************************'
    puts browser.html
    puts '***************************************'
    doc = Nokogiri::HTML(browser.html)
    content = doc.css(".dig_container .num")
    browser.close
    headless.destroy

This time I can't get my result. The output is:

    <div id="digBury" class="dig_wrap disabled">
      <a title="挖" class="btn" href="#">
        <i class="iconfont"></i>
        <span class="btn_desc">挖</span>
      </a>
    </div>

The only difference is that I added headless. The effect is that either the Ajax request is not sent or I miss the Ajax response. How can I fix this problem?

--
--
Before posting, please read http://watir.com/support. In short: search before you ask, be nice.

[email protected]
http://groups.google.com/group/watir-general
[email protected]

---
You received this message because you are subscribed to the Google Groups "Watir General" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
