Re: Parsing through downloaded html

Иван Бишевац Wed, 12 Sep 2012 14:54:46 -0700

require 'nokogiri'
require 'spreadsheet'

Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet


# Numbering is zero-based: the first row is 0 and the first column is 0.
row = 0

Dir.chdir("anattempt")
Dir.glob('*.html').each do |document|
  # The block form of File.open closes the file once Nokogiri has parsed it.
  searchablefile = File.open(document) { |f| Nokogiri::HTML(f) }

  # Use at_xpath rather than xpath: at_xpath returns only the first matching
  # element (or nil), while xpath returns a NodeSet of all matching elements.
  var1 = searchablefile.at_xpath("your xpath here..")
  var2 = searchablefile.at_xpath("your xpath here..")

  # The first pass saves data to the first row, in columns A and B.
  # Every next pass moves down one row; the columns stay A and B.
  sheet1[row, 0] = var1.content
  sheet1[row, 1] = var2.content

  # After saving the data, increment the row position by 1.
  row += 1
end

book.write 'htmltoexcel.xls'


I haven't tested this, but if something goes wrong, ask here.
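One likely failure: at_xpath returns nil when the expression matches nothing on a page, and calling .content on nil raises NoMethodError. A hypothetical helper like the one below (cell_text is my own name, not part of Nokogiri or Spreadsheet) falls back to a default instead, so you would write sheet1[row, 0] = cell_text(var1).

```ruby
# Hypothetical helper (not part of Nokogiri): returns the node's text with
# surrounding whitespace stripped, or +default+ when the node is nil,
# e.g. because at_xpath found no match on that page.
def cell_text(node, default = "")
  return default if node.nil?
  node.content.to_s.strip
end

# Quick check with a stand-in object that responds to #content like a node.
FakeNode = Struct.new(:content)
p cell_text(FakeNode.new("  42 \n"))   # => "42"
p cell_text(nil)                       # => ""
p cell_text(nil, "n/a")                # => "n/a"
```

Stripping is optional, but scraped cells often carry stray newlines and indentation that you probably don't want in the spreadsheet.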
Also read http://nokogiri.org/tutorials to learn how to parse XML/HTML
documents; it's a short but useful resource.

-- You received this message because you are subscribed to the Google Groups 
ruby-talk-google group. To post to this group, send email to 
[email protected]. To unsubscribe from this group, send email 
to [email protected]. For more options, visit this 
group at https://groups.google.com/d/forum/ruby-talk-google?hl=en
