One file with 30000+ rows and another with just over 200. How much memory should I need so that parsing doesn't take forever? (I'm currently using my own computer as the server, and I can see Ruby taking about 1GB in the Task Manager when processing this, and it takes forever.)
The 30000+ row file is about 7MB, which is not that much (I think).

On Friday, October 11, 2013 8:44:22 AM UTC-4:30, Walter Lee Davis wrote:
>
> On Oct 10, 2013, at 4:50 PM, Monserrat Foster wrote:
>
> > A coworker suggested I should use just basic OOP for this: create a class that reads files, and then another to load the files into memory. Could you please point me in the right direction for this (where can I read about it)? I have no idea what he's talking about, as I've never done this before.
>
> How many of these files are you planning to parse at any one time? Do you have the memory on your server to deal with this load? I can see this approach working, but getting slow and process-bound very quickly. Lots of edge cases to deal with when parsing big uploaded files.
>
> Walter
>
> >
> > I'll look up nokogiri and SAX
> >
> > On Thursday, October 10, 2013 4:12:33 PM UTC-4:30, Walter Lee Davis wrote:
> > On Oct 10, 2013, at 4:36 PM, Monserrat Foster wrote:
> >
> > > Hello, I'm developing an app that basically receives an XLSX file of 10MB or less with 30000+ rows or so, and another XLSX file with about 200 rows. I have to read one row of the smaller file, look it up in the larger file, and write data from both files to a new one.
> >
> > Wow. Do you have to do all this in a single request?
> >
> > You may want to look at Nokogiri and its SAX parser. SAX parsers don't care about the size of the document they operate on, because they work one node at a time, and don't load the whole thing into memory at once. There are some limitations on what kind of work a SAX parser can perform, because it isn't able to see the entire document and "know" where it is within the document at any point. But for certain kinds of problems, it can be the only way to go. Sounds like you may need something like this.
> >
> > Walter
> >
> > >
> > > I just did a test reading a few rows from the largest file using Roo (Spreadsheet doesn't support XLSX, and Creek looks good but I can't find a way to read row by row) and it basically made my computer crash. The server crashed, I tried rebooting it and it said it was already started; anyway, it was a disaster.
> > >
> > > So, my question is: is there a gem that works best with large XLSX files, or is there another way to approach this without crashing my computer?
> > >
> > > This is what I had (it's very possible I'm doing it wrong, help is welcome). What I was trying to do here was to process the files and create the new XLS file after both of the XLSX files were uploaded:
> > >
> > > require 'roo'
> > > require 'spreadsheet'
> > > require 'creek'
> > >
> > > class UploadFiles < ActiveRecord::Base
> > >   after_commit :process_files
> > >   attr_accessible :inventory, :material_list
> > >   has_one :inventory
> > >   has_one :material_list
> > >   has_attached_file :inventory,
> > >     :url=>"/:current_user/inventory",
> > >     :path=>":rails_root/tmp/users/uploaded_files/inventory/inventory.:extension"
> > >   has_attached_file :material_list,
> > >     :url=>"/:current_user/material_list",
> > >     :path=>":rails_root/tmp/users/uploaded_files/material_list/material_list.:extension"
> > >   validates_attachment_presence :material_list
> > >   accepts_nested_attributes_for :material_list, :allow_destroy => true
> > >   accepts_nested_attributes_for :inventory, :allow_destroy => true
> > >   validates_attachment_content_type :inventory,
> > >     :content_type => ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
> > >     :message => "Only .XLSX files are accepted as Inventory"
> > >   validates_attachment_content_type :material_list,
> > >     :content_type => ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
> > >     :message => "Only .XLSX files are accepted as Material List"
> > >
> > >   def process_files
> > >     inventory = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/inventory/inventory.xlsx")
> > >     material_list = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/material_list/material_list.xlsx")
> > >     inventory = inventory.sheets[0]
> > >     scl = Spreadsheet::Workbook.new
> > >     sheet1 = scl.create_worksheet
> > >     inventory.rows.each do |row|
> > >       row.inspect
> > >       sheet1.row(1).push(row)
> > >     end
> > >
> > >     sheet1.name = "Site Configuration List"
> > >     scl.write(Rails.root.to_s + "/tmp/users/generated/siteconfigurationlist.xls")
> > >   end
> > > end

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/ba633f69-5527-4dc1-8518-b6104e414e15%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
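
For the lookup step described in the thread (match each row of the ~200-row file against the 30000+ row file and write data from both), here is a minimal sketch in plain Ruby. It is not code from anyone in the thread, and it swaps the order of the loop described in the original post: instead of scanning the large file once per small-file row, it indexes the small file in a Hash and makes a single pass over the large one. The sample data, the method names `build_index` and `each_merged_row`, and the idea that the first cell of each row is the shared key are all assumptions for illustration; the real rows would come from whatever spreadsheet reader is in use.

```ruby
# Hypothetical stand-ins for rows read from the two spreadsheets.
# Each row is an Array of cell values; the first cell is assumed to be the key.
material_rows = [
  ["A-100", 4],
  ["B-200", 1],
  ["Z-999", 2]
]

inventory_rows = [
  ["A-100", "Router", "Warehouse 1"],
  ["B-200", "Switch", "Warehouse 2"],
  ["C-300", "Cable",  "Warehouse 1"]
]

# Index the small file by its key so each lookup is a Hash access
# instead of a scan over every row.
def build_index(small_rows)
  small_rows.each_with_object({}) { |row, index| index[row.first] = row }
end

# Walk the large file once, one row at a time. Only rows whose key appears
# in the index are yielded, with the small file's extra columns appended.
# large_rows can be anything that responds to #each, so a streaming reader
# never has to hold the whole file in memory.
def each_merged_row(large_rows, index)
  return enum_for(:each_merged_row, large_rows, index) unless block_given?

  large_rows.each do |row|
    match = index[row.first]
    next if match.nil?

    yield row + match.drop(1) # drop(1) skips the duplicated key cell
  end
end

index  = build_index(material_rows)
merged = each_merged_row(inventory_rows, index).to_a

merged.each { |row| puts row.join(", ") }
```

With this shape only the small file's index lives in memory, and memory use for the large file depends on the reader rather than on the lookup. Merged rows come out in the large file's order, so if the output has to follow the small file's order they would need to be sorted or collected by key afterwards.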

