Hi everyone, I need some further help with a custom rake task I'm building and the logistics of how it should work.
I've created a custom rake task in lib/tasks called scraper.rake, which so far contains only the following:

    desc "This task will parse data from ncaa.org and upload the data to our db"
    task :scraper => :environment do
      # code goes here for scraping
    end

This rake task will parse data from ncaa.org and place it into my DB for further processing. The .rb file I created has the following:

===============================
#== Scraper Version 1.0
#
#*Created By:* _Elricstorm_
#
# _Special thanks to Soledad Penades for his initial parse idea which I worked with to create the Scraper program.
# His article is located at http://www.iterasi.net/openviewer.aspx?sqrlitid=wd5wiad-hkgk93aw8zidbw_
#
require 'hpricot'
require 'open-uri'

# This class is used to parse and collect data out of an html element
class Scraper
  attr_accessor :url, :element_type, :clsname, :childsearch, :doc, :numrows

  # Define what the url is, what element type and class name we want to parse and open the url.
  def initialize(url, element_type, clsname, childsearch)
    @url = url
    @element_type = element_type
    @clsname = clsname
    @childsearch = childsearch
    @doc = Hpricot(open(url))
    @numrows = 0 # set to the real row count by scrape_data
  end

  # Scrape data based on the type of element, its class name, and define the child element that contains our data
  def scrape_data
    @rows = []
    (doc/"#...@element_type}.#{@clsname...@childsearch}").each do |row|
      cells = []
      (row/"td").each do |cell|
        if (cell/" span.s").length > 0
          # Split "key = value<br />key = value" into one-entry hashes
          values = (cell/"span.s").inner_html.split('<br />').collect { |str|
            pair = str.strip.split('=').collect { |val| val.strip }
            Hash[pair[0], pair[1]]
          }
          if values.length == 1
            cells << cell.inner_text.strip
          else
            cells << values # an Array has no #strip, so store the hashes as-is
          end
        else
          cells << cell.inner_text.strip
        end
      end
      @rows << cells
    end
    @rows.shift      # Shifting removes the row containing the <th> table header elements.
    @rows.delete([]) # Remove any empty rows in our array of arrays.
    @numrows = @rows.length
  end

  def clean_celldata
    @ro...@numrows-1][0] = 120
  end

  # Print a joined list by row to see our results
  def print_values
    puts "Number of rows = #{numrows}."
    @rows.each { |row| puts row.join(', ') } # every row, including the last one
  end
end

# In our search we are supplying the website url to parse, the type of element (ex: table), the class name of that element
# and the child element that contains the data we wish to retrieve.
offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org', 'table', 'statstable', '//tr')
offensive_rushing.scrape_data
offensive_rushing.clean_celldata
offensive_rushing.print_values
================================

If you test that out, you will see a printout of 120 rows of data.

What I want to do is use the .rb file I created from my rake task. However, I'm not sure how to incorporate it into Rails. Once I get past this hurdle, it should help with future issues. So, here is my list of questions, in the order I'm curious about them:

1. Where do custom .rb files go inside my Rails project? (For instance, I understand MVC, but a rake task seems, in my mind, to sit outside the project, and I'm not sure how it is supposed to communicate with controllers or pull/associate variables from those areas.)

2. My custom .rb also requires 'hpricot'. Is there anything special I need to do in a .rake file to make sure it loads this gem? And if I deploy to my live site, how do I ensure that hpricot is loaded there too? In other words, what can I rely on?

3. When I run a rake task and need to communicate with my database (for uploading), is there an easy way to do this? Can a rake task use my DB inside my Rails environment? Or are rake tasks completely separate and distinct, and need to be considered out of scope?

4. Can anyone provide a summarized step-by-step (nothing too fancy, or that takes up too much of your time) of how "you" would accomplish this kind of rake task given a similar .rb and .rake file? What general steps would you take? Create a class? (If so, where would you place it?) How would you communicate with the DB within Rails? etc.

I know these are a lot of questions, but even if one or two of them get answered, I'll be happy. Don't feel that you can't reply if you don't have answers to all of them; any answers that touch on them would be greatly appreciated. I am a newbie learning Rails (and many books do not cover these particulars), so I'm relying on others with patience and understanding to enlighten me, so that one day I too can help others who need similar help.

Thanks.

-- 
Posted via http://www.ruby-forum.com/.

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
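The string handling inside `scrape_data` can be tried on its own without Hpricot or a network call. A minimal sketch, assuming a cell whose `inner_html` looks like `key = value<br />key = value`; the helper names `split_span_values` and `drop_header_and_empty_rows` and the sample strings are hypothetical and only mirror the logic in the Scraper class:

```ruby
# Hypothetical helper mirroring the span.s branch of Scraper#scrape_data:
# split the cell HTML on <br />, then split each piece on '=' and build
# a one-entry hash from the stripped key and value.
def split_span_values(inner_html)
  inner_html.split('<br />').collect do |str|
    pair = str.strip.split('=').collect { |val| val.strip }
    Hash[pair[0], pair[1]]
  end
end

# Hypothetical helper mirroring the clean-up at the end of scrape_data:
# drop the first (header) row, then drop any rows that came back empty.
def drop_header_and_empty_rows(rows)
  cleaned = rows.dup
  cleaned.shift
  cleaned.delete([])
  cleaned
end

# Made-up sample input, not real ncaa.org data.
values = split_span_values("G = 13<br />Att = 620")
puts values.length        # two pieces, so scrape_data would keep the hashes
puts values.first["G"]

rows = [["Rank", "Team"], ["1", "Team A"], [], ["2", "Team B"]]
puts drop_header_and_empty_rows(rows).length
```

Because `@numrows` is taken from the cleaned array, the count printed by `print_values` is the number of data rows, with the header and any empty rows already removed.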
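In `task :scraper => :environment`, the `=> :environment` part declares a prerequisite, so Rake runs the `:environment` task before the scraper body. A self-contained sketch of that ordering using only the rake library; the `:environment` task here is a stub standing in for the one Rails defines (in a Rails app it loads the application, which is what makes models and the database connection available), and `FakeScraper` is a hypothetical stand-in for the Scraper class:

```ruby
require 'rake'

# Hypothetical stand-in for Scraper; returns fixed rows instead of
# fetching ncaa.org, so the sketch runs offline.
class FakeScraper
  def scrape_data
    [["1", "Team A"], ["2", "Team B"]]
  end
end

calls = [] # records the order in which the task bodies run

# Stub for the :environment task that Rails provides.
task :environment do
  calls << :environment
end

desc "This task will parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  rows = FakeScraper.new.scrape_data
  calls << [:scraper, rows.length]
end

# Invoking :scraper runs its prerequisite first, then its own body.
Rake::Task[:scraper].invoke
p calls
```

Rake also runs each task at most once per process, so invoking `:scraper` again in the same script does nothing unless the task is re-enabled with `Rake::Task#reenable`.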

