[Rails] hpricot won't scrape! (newb question)

jrgoodner Wed, 01 Apr 2009 01:28:34 -0700

Hey all!  Just to preface, I am fairly new to RoR, and brand new to
using hpricot.


I am using the following code to scrape this xpath:
"/html/body/div/div[5]/div/div[2]/div[2]/div[2]"

from this url:
"http://www.greatnonprofits.org/"

Here is my code to do so (taken from igvita.com's related blogpost):
*************
require 'rubygems'
require 'open-uri'
require 'hpricot'

@url = "http://www.greatnonprofits.org/"
@response = ''

begin
  # open-uri RDoc: http://stdlib.rubyonrails.org/libdoc/open-uri/rdoc/index.html
  open(@url, "User-Agent" => "Ruby/#{RUBY_VERSION}",
    "From" => "[email protected]",
    "Referer" => "http://www.igvita.com/blog/") { |f|
    # Print some response metadata
    puts "Fetched document: #{f.base_uri}"
    puts "\t Content Type: #{f.content_type}\n"
    puts "\t Charset: #{f.charset}\n"
    puts "\t Content-Encoding: #{f.content_encoding}\n"
    puts "\t Last Modified: #{f.last_modified}\n\n"

    # Save the response body
    @response = f.read
  }

  # HPricot RDoc: http://code.whytheluckystiff.net/hpricot/
  doc = Hpricot(@response)

  # Retrieve content
  puts (doc/"/html/body/div/div[5]/div/div[2]/div[2]/div[2]").to_html
rescue Exception => e
  print e, "\n"
end
***************

In my irb terminal, I get this:

***************
irb(main):031:0> load 'greatnonprofitsscraper.rb'
Fetched document: http://www.greatnonprofits.org/
         Content Type: text/html
         Charset: utf-8
         Content-Encoding:
         Last Modified: Tue Mar 31 23:43:52 -0700 2009


=> true
***************

As you can see, nothing is printed for the XPath itself.  Anyone know
why this is happening?  The code works with other URLs/XPaths.  Can
anyone tell me why www.greatnonprofits.org is different?
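
One way to narrow this down might be to test each leading piece of the
XPath in turn and see where it stops matching.  A minimal sketch,
assuming a made-up helper named xpath_prefixes (it is not part of
hpricot) and assuming the result of doc/"..." responds to size like an
Array:

```ruby
# Hypothetical helper (not part of hpricot): returns every leading
# prefix of an absolute XPath, shortest first.
def xpath_prefixes(xpath)
  steps = xpath.split("/").reject { |s| s.empty? }
  (1..steps.length).map { |n| "/" + steps.first(n).join("/") }
end

path = "/html/body/div/div[5]/div/div[2]/div[2]/div[2]"

# Print each prefix so it can be tried on its own.
xpath_prefixes(path).each { |prefix| puts prefix }

# With the parsed document from the script above, something like
#   xpath_prefixes(path).each { |p| puts "#{p}: #{(doc/p).size}" }
# should show the first step that matches nothing (assumption: untested
# against this particular page).
```

If an early prefix already comes back empty, the path copied from the
browser probably does not line up with the raw HTML that open-uri
fetched.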

Thanks a million!  I am quite frustrated, and I appreciate any help!!!

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
-~----------~----~----~----~------~----~------~--~---
