If crawling the NCAA website is something you want to automate (say, "crawl daily at 12 pm"), you can create a rake task in your Rails application that calls this code and then schedule that rake task to run as a cron job:

    # ncaa_crawler.rake at /lib/tasks
    task :ncaa_crawler => :environment do
      # here goes the code that gets the NCAA data
      # and saves it into the database
    end

The :environment dependency tells Rake to load the Rails application before executing the task, so everything defined in your Rails application, such as your ActiveRecord models, will be available. The code you have shown should be placed into a class in your application and called from this rake task.

-
Maurício Linhares
http://alinhavado.wordpress.com/ (pt-br) | http://codeshooter.wordpress.com/ (en)

On Sat, Jun 6, 2009 at 12:38 AM, J. D. wrote:
>
> Maurício Linhares wrote:
>> The "integer", "string" and "float" methods are just shorthands for
>> the column call using that type.
>>
>> It's also the "new way" (new since Rails 2, not that new now) of
>> writing migrations. And the "timestamps" will create both a created_at
>> and also an updated_at column.
>>
>> -
>> Maurício Linhares
>> http://alinhavado.wordpress.com/ (pt-br) |
>> http://codeshooter.wordpress.com/ (en)
>
> Thanks Mauricio - I actually like it better than having to do things a
> long way. Anything shorter is better, IMO. I appreciate the
> explanation.
>
> I have one more question..
>
> I've created a ruby program that actually parses raw statistics from the
> main NCAA web site, which I want to bring into my own database.
>
> So, using the example above.. here's an example of the parser I created:
>
> #== Scraper Version 1.0
> #
> #*Created By:* _Elricstorm_
> #
> # _Special thanks to Soledad Penades for his initial parse idea which I worked with to create the Scraper program.
> # His article is located at http://www.iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
> #
> require 'hpricot'
> require 'open-uri'
>
> # This class is used to parse and collect data out of an html element
> class Scraper
>   attr_accessor :url, :element_type, :clsname, :childsearch, :doc, :numrows
>
>   # Define what the url is, what element type and class name we want to
>   # parse and open the url.
>   def initialize(url, element_type, clsname, childsearch)
>     @url = url
>     @element_type = element_type
>     @clsname = clsname
>     @childsearch = childsearch
>     @doc = Hpricot(open(url))
>     @numrows = numrows
>   end
>
>   # Scrape data based on the type of element, its class name, and define
>   # the child element that contains our data
>   def scrape_data
>     @rows = []
>
>     (doc/"#{@element_type}.#{@clsname}#{@childsearch}").each do |row|
>       cells = []
>       (row/"td").each do |cell|
>         if (cell/" span.s").length > 0
>           values = (cell/"span.s").inner_html.split('<br />').collect{ |str|
>             pair = str.strip.split('=').collect{|val| val.strip}
>             Hash[pair[0], pair[1]]
>           }
>
>           if(values.length==1)
>             cells << cell.inner_text.strip
>           else
>             cells << values.strip
>           end
>
>         elsif
>           cells << cell.inner_text.strip
>         end
>       end
>       @rows << cells
>     end
>     @rows.shift # Shifting removes the row containing the <th> table header elements.
>     @rows.delete([]) # Remove any empty rows in our array of arrays.
>     @numrows = @rows.length
>   end
>
>   def clean_celldata
>     @rows[@numrows-1][0] = 120
>   end
>
>   # Print a joined list by row to see our results
>   def print_values
>     puts "Number of rows = #{numrows}."
>     for i in 0...@numrows-1
>       puts @rows[i].join(', ')
>     end
>   end
>
>   # This method will be used to further process collected data
>   def process_values
>     File.open("testdata.txt", "w") do |f|
>       for i in 0...@numrows-1
>         f.puts @rows[i].join(', ')
>       end
>     end
>     puts "Processing completed."
>   end
> end
>
> # In our search we are supplying the website url to parse, the type of
> # element (ex: table), the class name of that element
> # and the child element that contains the data we wish to retrieve.
> offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
>                                 'table', 'statstable', '//tr')
> offensive_rushing.scrape_data
> offensive_rushing.clean_celldata
> offensive_rushing.print_values
> offensive_rushing.process_values
>
> -------------
>
> So, the other question I have is how do I tie in the mechanics of a
> regular ruby program into rails? For instance, the ruby program I wrote
> requires hpricot..
>
> I just need a bit of guidance (I catch on fast)..
>
> Someone can run the program I included to see how it outputs..
>
> --
> Posted via http://www.ruby-forum.com/.
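
As a rough illustration of the "put the code into a class and call it from the rake task" suggestion at the top of this message, here is a minimal sketch. Everything named here is hypothetical and not part of the original program: NcaaImporter, MemoryStore, the RushingStat model mentioned in the comments, the :rank and :team column names, and the cron path. It also assumes you expose the scraped rows from Scraper (for example by adding :rows to its attr_accessor list). The store only needs to respond to create(attrs), which an ActiveRecord model class does, so the class can be tried outside Rails first.

```ruby
# Hypothetical sketch: maps scraped rows (arrays of cell strings) onto
# attribute hashes and hands each one to a store that responds to `create`.
class NcaaImporter
  # columns: attribute names in table order (assumed, adjust to the real table)
  # store:   e.g. an ActiveRecord model class such as a hypothetical RushingStat
  def initialize(columns, store)
    @columns = columns
    @store = store
  end

  # Creates one record per row and returns the created records.
  def import(rows)
    rows.map { |cells| @store.create(Hash[@columns.zip(cells)]) }
  end
end

# Stand-in for an ActiveRecord model so the sketch runs without Rails.
class MemoryStore
  attr_reader :records

  def initialize
    @records = []
  end

  def create(attrs)
    @records << attrs
    attrs
  end
end

store = MemoryStore.new
importer = NcaaImporter.new([:rank, :team], store)
importer.import([["1", "Example U"], ["2", "Sample State"]])
puts store.records.length

# Inside lib/tasks/ncaa_crawler.rake the task body could then be reduced to
# something like (names assumed):
#   task :ncaa_crawler => :environment do
#     NcaaImporter.new([:rank, :team], RushingStat).import(scraped_rows)
#   end
#
# And a crontab entry for "daily at 12 pm" might look like (path assumed):
#   0 12 * * * cd /path/to/app && rake ncaa_crawler RAILS_ENV=production
```

Keeping the database write behind a single create call means the scraping and the saving can be tested separately, and the rake task itself stays a one-liner.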

