Frederick Cheung wrote:
> 
> Does Scraper need to be an activerecord class at all ? you could pass
> to it the class whose table needs to be updated ie
> 
> def do_something(some_klass)
>   some_klass.update_all(...)
> end
> 
> or perhaps you might want to couple things a little more loosely
> 
> def do_something(some_klass)
>   some_klass.handle_scraper_data(...)
> end
> 
> Fred

Hi Fred:

Here's what I managed to do on my own (believe it or not, lol):

My Rake Task:

It basically instantiates the RushingOffense model and calls its scrape method:

desc "Parse Rushing Offenses data from ncaa.org"
task :parse_rushing_offenses => :environment do
  update_rushing = RushingOffense.new
  update_rushing.scrape
end

My Model for Rushing Offense:

In the model I created a "scrape" method that scrapes the data using the
Scraper class. Since this model inherits from ActiveRecord::Base, it
should be able to update the table...

class RushingOffense < ActiveRecord::Base
  def scrape
    offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
                                    'table', 'statstable', '//tr')
    offensive_rushing.scrape_data
    offensive_rushing.clean_celldata
    # Each row is an array of cell values; column 1 is the team name,
    # column 2 the number of games.
    offensive_rushing.rows.each do |row|
      puts "Updating Team Name = #{row[1]}."
      RushingOffense.update_all(:name => row[1], :games => row[2])
    end
  end
end
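Fred's second suggestion (handing the scraped rows to a class method such as `handle_scraper_data`) would keep the Scraper unaware of the database. Below is a minimal plain-Ruby sketch of that shape, not a drop-in replacement: `TeamStore` is a hypothetical in-memory stand-in for an ActiveRecord model, and the row layout (team name in column 1, games in column 2) is assumed from the `scrape` method above.

```ruby
# Hypothetical in-memory stand-in for a model such as RushingOffense.
# It only mimics "keep one record per team name"; it is not ActiveRecord.
class TeamStore
  @records = {}

  class << self
    attr_reader :records

    # Receives the scraped rows and stores one record per team.
    # Assumes row[1] is the team name and row[2] the games played.
    # Returns how many teams are stored afterwards.
    def handle_scraper_data(rows)
      rows.each do |row|
        @records[row[1]] = { :name => row[1], :games => row[2] }
      end
      @records.size
    end
  end
end

# Fred's loosely coupled version: the caller decides which class receives
# the data, so the scraping code never touches the database itself.
def do_something(some_klass, rows)
  some_klass.handle_scraper_data(rows)
end

rows = [["1", "Team A", "12"], ["2", "Team B", "13"]]
do_something(TeamStore, rows)
```

Any class that responds to `handle_scraper_data` can be passed in, which also makes it easy to test the parsing without a database.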

Finally, here is my scraper.rb file:

#== Scraper Version 1.0
#
#*Created By:* _Elricstorm_
#
# _Special thanks to Soledad Penades for the initial parse idea, which I
# worked from to create the Scraper program.
# The original article is located at
# http://www.iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
#
require 'hpricot'
require 'open-uri'

# This class is used to parse and collect data out of an HTML element.
# It is a plain Ruby class; it does not inherit from ActiveRecord::Base.
class Scraper
  attr_accessor :url, :element_type, :clsname, :childsearch, :doc,
                :numrows, :rows

  # Define the URL, the element type and class name we want to parse,
  # and open the URL.
  def initialize(url, element_type, clsname, childsearch)
    @url = url
    @element_type = element_type
    @clsname = clsname
    @childsearch = childsearch
    @doc = Hpricot(open(url))
    @numrows = 0 # set properly by scrape_data
    @rows = []   # filled in by scrape_data
  end

  # Scrape data based on the type of element and its class name, using
  # the child element that contains our data.
  def scrape_data
    @rows = []

    (doc/"#...@element_type}.#{@clsname...@childsearch}").each do |row|
      cells = []
      (row/"td").each do |cell|
        if (cell/"span.s").length > 0
          # Turn "key = value<br />key = value" into an array of one-pair hashes.
          values = (cell/"span.s").inner_html.split('<br />').collect { |str|
            pair = str.strip.split('=').collect { |val| val.strip }
            Hash[pair[0], pair[1]]
          }

          if values.length == 1
            cells << cell.inner_text.strip
          else
            cells << values # an Array has no #strip, so store it as-is
          end
        else
          cells << cell.inner_text.strip
        end
      end
      @rows << cells
    end
    @rows.shift      # Remove the row containing the <th> table header elements.
    @rows.delete([]) # Remove any empty rows in our array of arrays.
    @numrows = @rows.length
  end

  def clean_celldata
    @ro...@numrows-1][0] = 120
  end

  # Print a joined list by row to see our results.
  # (Iterating over @rows directly avoids skipping the last row.)
  def print_values
    puts "Number of rows = #{numrows}."
    @rows.each do |row|
      puts row.join(', ')
    end
  end
end
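The `span.s` branch in `scrape_data` turns cell text such as `attempts = 45<br />yards = 210` into an array of one-pair hashes, and the end of the method drops the header row and any empty rows. The sketch below replays that logic on plain strings so it can be tried without Hpricot or a network connection; the helper names `parse_span_pairs` and `clean_rows` and the sample strings are hypothetical, made up for illustration.

```ruby
# Mirror of the span.s parsing in scrape_data: split on <br />, then turn
# each "key = value" fragment into a one-pair Hash.
def parse_span_pairs(html)
  html.split('<br />').collect do |str|
    pair = str.strip.split('=').collect { |val| val.strip }
    Hash[pair[0], pair[1]]
  end
end

# Mirror of the clean-up at the end of scrape_data: drop the first row
# (the <th> header row) and then every row that ended up with no cells.
def clean_rows(rows)
  rows = rows.dup
  rows.shift
  rows.delete([])
  rows
end

pairs = parse_span_pairs("attempts = 45<br />yards = 210")
rows  = clean_rows([[], ["1", "Team A", "12"], [], ["2", "Team B", "13"]])
```

When only one fragment comes back (for example a plain `"120"`), `scrape_data` ignores the hash and stores the cell's plain text instead.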

--------------------------------

The only problem now is that when I run the rake task, I don't get any
errors, and I see the puts output for each team as it's being updated
(or supposed to be updated). So it's iterating over each row as I
expected.

I only tried to update 2 fields as a test, but no data shows up in the
database.

Any ideas of what I might be doing wrong?
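One thing worth checking: `update_all` only issues an UPDATE against rows that already exist; it never inserts. If the `rushing_offenses` table starts out empty, every call affects zero rows, and if it does contain rows, each call overwrites all of them with the same team. The plain-Ruby sketch below imitates those behaviours with a hypothetical in-memory `FakeTable` (not ActiveRecord) so the difference from a create-or-update approach is visible. In a real model the per-team version would use something like `create` or a `find_or_create_by_name` dynamic finder; treat that as an assumption to check against your Rails version.

```ruby
# Hypothetical in-memory table used only to illustrate the semantics.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Like update_all: change every existing row, insert nothing.
  # Returns the number of rows affected.
  def update_all(attrs)
    @rows.each { |row| row.merge!(attrs) }
    @rows.size
  end

  # Create-or-update keyed on :name, which is what the rake task expects.
  def upsert_by_name(attrs)
    existing = @rows.find { |row| row[:name] == attrs[:name] }
    if existing
      existing.merge!(attrs)
    else
      @rows << attrs.dup
    end
  end
end

scraped = [["1", "Team A", "12"], ["2", "Team B", "13"]]

# Empty table: update_all touches nothing.
empty = FakeTable.new
affected = scraped.map { |r| empty.update_all(:name => r[1], :games => r[2]) }
# affected is [0, 0] and empty.rows is still []

# Seeded table: every call overwrites every row, so the last team wins.
seeded = FakeTable.new
seeded.rows.push({ :name => "old 1" }, { :name => "old 2" })
scraped.each { |r| seeded.update_all(:name => r[1], :games => r[2]) }

# Create-or-update: one row per team.
table = FakeTable.new
scraped.each { |r| table.upsert_by_name(:name => r[1], :games => r[2]) }
```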

This has still been a great day: even though I've seen tons of errors,
I'm learning.

-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
-~----------~----~----~----~------~----~------~--~---
