Hi, the files are automatically downloaded in .XLSX format. I can't change that, and I can't force the users to convert them just to make my job easier. Thanks for the suggestion.
On Friday, October 11, 2013 11:34:39 AM UTC-4:30, donz wrote:
>
> On 10/11/2013 11:30 AM, Monserrat Foster wrote:
>> One 30000+ row file and another with just over 200. How much memory should I need so this doesn't take forever to parse? I'm currently using my computer as the server, and I can see Ruby taking about 1 GB in the Task Manager while processing this (and it takes forever).
>>
>> The 30000+ row file is about 7 MB, which is not that much (I think).
>>
>> On Friday, October 11, 2013 8:44:22 AM UTC-4:30, Walter Lee Davis wrote:
>>>
>>> On Oct 10, 2013, at 4:50 PM, Monserrat Foster wrote:
>>>
>>>> A coworker suggested I should use just basic OOP for this: create a class that reads the files, and then another to load the files into memory. Could you please point me in the right direction for this (where can I read about it)? I have no idea what he's talking about, as I've never done this before.
>>>
>>> How many of these files are you planning to parse at any one time? Do you have the memory on your server to deal with this load? I can see this approach working, but getting slow and process-bound very quickly. There are lots of edge cases to deal with when parsing big uploaded files.
>>>
>>> Walter
>>>
>>>>
>>>> I'll look up Nokogiri and SAX.
>>>>
>>>> On Thursday, October 10, 2013 4:12:33 PM UTC-4:30, Walter Lee Davis wrote:
>>>>> On Oct 10, 2013, at 4:36 PM, Monserrat Foster wrote:
>>>>>
>>>>>> Hello, I'm developing an app that basically receives an XLSX file of 10 MB or less with 30000+ rows, and another XLSX file with about 200 rows. I have to read one row of the smaller file, look it up in the larger file, and write data from both files to a new one.
>>>>>
>>>>> Wow. Do you have to do all this in a single request?
>>>>>
>>>>> You may want to look at Nokogiri and its SAX parser. SAX parsers don't care about the size of the document they operate on, because they work one node at a time and don't load the whole thing into memory at once. There are some limitations on what kind of work a SAX parser can perform, because it isn't able to see the entire document and "know" where it is within the document at any point. But for certain kinds of problems, it can be the only way to go. Sounds like you may need something like this.
>>>>>
>>>>> Walter
>>>>>
>>>>>>
>>>>>> I just did a test reading a few rows from the largest file using Roo (Spreadsheet doesn't support XLSX, and Creek looks good but I can't find a way to read row by row), and it basically crashed my computer. The server crashed; I tried restarting it and it said it was already started. Anyway, it was a disaster.
>>>>>>
>>>>>> So, my question is: is there a gem that works best with large XLSX files, or is there another way to approach this without crashing my computer?
>>>>>>
>>>>>> This is what I had (it's very possible I'm doing it wrong; help is welcome). What I was trying to do here was to process the files and create the new XLS file after both of the XLSX files were uploaded:
>>>>>>
>>>>>> require 'roo'
>>>>>> require 'spreadsheet'
>>>>>> require 'creek'
>>>>>>
>>>>>> class UploadFiles < ActiveRecord::Base
>>>>>>   after_commit :process_files
>>>>>>   attr_accessible :inventory, :material_list
>>>>>>   has_one :inventory
>>>>>>   has_one :material_list
>>>>>>   has_attached_file :inventory, :url => "/:current_user/inventory",
>>>>>>     :path => ":rails_root/tmp/users/uploaded_files/inventory/inventory.:extension"
>>>>>>   has_attached_file :material_list, :url => "/:current_user/material_list",
>>>>>>     :path => ":rails_root/tmp/users/uploaded_files/material_list/material_list.:extension"
>>>>>>   validates_attachment_presence :material_list
>>>>>>   accepts_nested_attributes_for :material_list, :allow_destroy => true
>>>>>>   accepts_nested_attributes_for :inventory, :allow_destroy => true
>>>>>>   validates_attachment_content_type :inventory, :content_type =>
>>>>>>     ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
>>>>>>     :message => "Only .XLSX files are accepted as Inventory"
>>>>>>   validates_attachment_content_type :material_list, :content_type =>
>>>>>>     ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
>>>>>>     :message => "Only .XLSX files are accepted as Material List"
>>>>>>
>>>>>>   def process_files
>>>>>>     inventory = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/inventory/inventory.xlsx")
>>>>>>     material_list = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/material_list/material_list.xlsx")
>>>>>>     inventory = inventory.sheets[0]
>>>>>>     scl = Spreadsheet::Workbook.new
>>>>>>     sheet1 = scl.create_worksheet
>>>>>>     # Creek yields each row as a Hash of cell reference => value
>>>>>>     # (e.g. {"A1" => "...", "B1" => "..."}); write its values to a new
>>>>>>     # output row instead of pushing every Hash into row 1.
>>>>>>     inventory.rows.each_with_index do |row, index|
>>>>>>       sheet1.row(index).concat(row.values)
>>>>>>     end
>>>>>>
>>>>>>     sheet1.name = "Site Configuration List"
>>>>>>     scl.write(Rails.root.to_s + "/tmp/users/generated/siteconfigurationlist.xls")
>>>>>>   end
>>>>>> end
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/bc470d4d-19c4-4969-8ba7-4ead7a35d40c%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>
> I use a rather indirect route that works fine for me with 15,000 lines and about 26 MB. I export the file from LibreOffice Calc as CSV (comma-separated values). Then, in the Rails controller, I use something like:
>
>     require 'csv'
>
>     class TheControllerController < ApplicationController # ;')
>
>       # other controller code
>
>       def upload
>         data = CSV.parse(params[:entries].tempfile.read) # from Ruby's CSV class
>         data.each do |line|
>           logger.debug "line: #{line.inspect}"
>           # each line is an array of strings containing the columns of one row of the CSV file
>           # I use these data to populate the appropriate DB table / Rails model at this point
>         end
>       end
>
>     end
>
> Make sure that your config/routes.rb points to this:
>
>     match 'the_controller/upload' => 'the_controller#upload', :via => :post
>
> From your client machine's command line:
>
>     curl -F [email protected] localhost:3000/the_controller/upload
>
> Note that 'entries' in the curl command matches 'entries' in params[:entries] in the controller.
>
> If you want to do this from a Rails GUI form, look at http://guides.rubyonrails.org/form_helpers.html#uploading-files
>
> During testing on my 4-core, 8 GB laptop, processing the really big files takes several minutes. When I have the app on Heroku, this causes a timeout, so I break up the CSV file into multiple sections such that each section takes less than 30 seconds to upload. By leaving a little 'slack' in the size, I have this automated so it occurs in the background while I am doing other work.
>
> Hope these suggestions help.
>
> Don Ziesig
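Don's controller reads the whole upload with `tempfile.read` and `CSV.parse`, so the entire file is in memory before the loop starts. A minimal variation, assuming the upload is available as a path on disk (presumably `params[:entries].tempfile.path` in that controller), uses `CSV.foreach` from Ruby's standard library to read one row at a time. The helper name `each_uploaded_row` and the sample data are hypothetical.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: yields each CSV row (an Array of Strings) as it is read,
# instead of reading the whole upload into a String and parsing it in one go.
# Without a block it returns an Enumerator over the same rows.
def each_uploaded_row(path, &block)
  return enum_for(:each_uploaded_row, path) unless block

  CSV.foreach(path, &block)
end

# Usage with a throwaway file standing in for the upload (sample data is made up).
file = Tempfile.new(["entries", ".csv"])
file.write("part,qty\nbolt,7\nnut,12\n")
file.flush

total_qty = 0
each_uploaded_row(file.path) do |part, qty|
  next if part == "part" # skip the header row
  total_qty += qty.to_i
end
file.close! # close and delete the temp file

puts total_qty # => 19
```

The loop body is unchanged in spirit (each row is still an array of strings); only the reading strategy differs.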

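For the original task (read each row of the ~200-row material list and look it up in the 30000+ row inventory), scanning the large file once per small-file row means about 200 × 30000 = 6,000,000 row comparisons. A common alternative, sketched here as an assumption rather than anything proposed in the thread, is a single pass: collect the small file's keys in a Set, stream the large file once while keeping only matching rows in a Hash, then emit combined rows in the small file's order. The helper `join_rows`, its key-column parameters, and the sample rows are hypothetical, since the thread doesn't say which columns link the two files.

```ruby
require "set"

# Hypothetical helper: joins a small list (~200 rows) against a large
# inventory (30000+ rows) with one pass over the large side.
# small_key / large_key are the column indexes that link the two files.
def join_rows(small_rows, large_rows, small_key: 0, large_key: 0)
  wanted = small_rows.map { |row| row[small_key] }.to_set

  # Single pass over the large side; only rows whose key is wanted are kept,
  # so memory grows with the small file. large_rows may be any Enumerable,
  # e.g. the Enumerator returned by CSV.foreach(path) without a block.
  found = {}
  large_rows.each do |row|
    key = row[large_key]
    found[key] ||= row if wanted.include?(key) # first match wins
  end

  # Output follows the small file's order: small row followed by its match.
  # Small-file rows with no match in the large file are dropped here.
  small_rows.filter_map do |row|
    match = found[row[small_key]]
    row + match if match
  end
end

material_list = [["bolt", "10"], ["washer", "5"], ["nut", "3"]]
inventory = [["nut", "bin 4", "120"], ["bolt", "bin 1", "80"], ["screw", "bin 9", "40"]]

result = join_rows(material_list, inventory)
p result
# => [["bolt", "10", "bolt", "bin 1", "80"], ["nut", "3", "nut", "bin 4", "120"]]
```

Dropping unmatched rows (here "washer") is one design choice; appending placeholder columns instead would flag missing inventory in the output file.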

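Don's workaround for the Heroku timeout is to break the CSV into sections that each finish within about 30 seconds. A sketch of just the splitting step follows, assuming every chunk should repeat the header row. `split_csv`, `rows_per_chunk`, and the `chunk_NNN.csv` naming are hypothetical, and the real chunk size has to be measured (Don leaves some slack).

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: splits a CSV file into numbered chunk files holding at
# most rows_per_chunk data rows each, repeating the header row in every chunk.
# Rows are streamed via CSV.foreach, so only one chunk is in memory at a time.
# Returns the Array of chunk file paths.
def split_csv(path, out_dir, rows_per_chunk)
  CSV.foreach(path, headers: true)
     .each_slice(rows_per_chunk)
     .with_index(1)
     .map do |slice, n|
       out_path = File.join(out_dir, format("chunk_%03d.csv", n))
       CSV.open(out_path, "w") do |csv|
         csv << slice.first.headers
         slice.each { |row| csv << row }
       end
       out_path
     end
end

# Usage: five data rows split two at a time gives three chunks.
dir = Dir.mktmpdir
source = File.join(dir, "entries.csv")
File.write(source, "part,qty\na,1\nb,2\nc,3\nd,4\ne,5\n")

chunk_paths = split_csv(source, dir, 2)
puts chunk_paths.size # => 3
```

Each chunk can then be uploaded or processed as its own request or background job.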