Hi, the files are downloaded automatically in .XLSX format. I can't change that,
and I can't force the users to convert them just to make my job easier.
Thanks for the suggestion.

On Friday, October 11, 2013 11:34:39 AM UTC-4:30, donz wrote:
>
>  On 10/11/2013 11:30 AM, Monserrat Foster wrote:
>  
> One 30,000+ row file and another with just over 200. How much memory do I
> need for this not to take forever parsing? (I'm currently using my
> computer as the server, and I can see Ruby taking about 1GB in the Task
> Manager while processing this, and it takes forever.)
>
> The 30,000+ row file is about 7MB, which is not that much (I think).
>
> On Friday, October 11, 2013 8:44:22 AM UTC-4:30, Walter Lee Davis wrote: 
>>
>>
>> On Oct 10, 2013, at 4:50 PM, Monserrat Foster wrote: 
>>
>> > A coworker suggested I should just use basic OOP for this: create a
>> > class that reads the files, and then another to load the files into
>> > memory. Could you please point me in the right direction for this (where
>> > can I read about it)? I have no idea what he's talking about, as I've
>> > never done this before.
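One possible reading of the coworker's suggestion is to give reading and storing separate classes. The sketch below is an illustration under stated assumptions, not the coworker's actual design: the class names `RowReader` and `RowIndex` are hypothetical, and it reads CSV from Ruby's standard library because XLSX would need a gem such as Creek or Roo behind the same `each_row` interface.

```ruby
require "csv"
require "stringio"

# Hypothetical reader class: its only job is to pull rows out of a source
# one at a time; it does not decide what happens to them.
class RowReader
  def initialize(io)
    @io = io
  end

  # Yields each row as an Array of strings; returns an Enumerator when
  # called without a block.
  def each_row
    return enum_for(:each_row) unless block_given?

    CSV.new(@io).each { |row| yield row }
  end
end

# Hypothetical loader class: keeps rows in memory in a Hash keyed by one
# column, so a later lookup is a Hash access rather than a rescan of the file.
class RowIndex
  def initialize(key_column)
    @key_column = key_column
    @rows = {}
  end

  def load(reader)
    reader.each_row { |row| @rows[row[@key_column]] = row }
    self
  end

  def [](key)
    @rows[key]
  end

  def size
    @rows.size
  end
end

# Usage with an in-memory CSV source standing in for an uploaded file:
index = RowIndex.new(0).load(RowReader.new(StringIO.new("A1,widget\nB2,gadget\n")))
index["B2"] # => ["B2", "gadget"]
```

Keeping the two jobs apart means the reader could later be swapped for an XLSX-backed one without touching the lookup code. Loading only the ~200-row file into the index, rather than the 30,000-row one, keeps the in-memory part small.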
>>
>> How many of these files are you planning to parse at any one time? Do you 
>> have the memory on your server to deal with this load? I can see this 
>> approach working, but getting slow and process-bound very quickly. Lots of 
>> edge cases to deal with when parsing big uploaded files. 
>>
>> Walter 
>>
>> > 
>> > I'll look up Nokogiri and SAX.
>> > 
>> > On Thursday, October 10, 2013 4:12:33 PM UTC-4:30, Walter Lee Davis wrote:
>> > On Oct 10, 2013, at 4:36 PM, Monserrat Foster wrote: 
>> > 
>> > > Hello, I'm developing an app that receives an XLSX file of 10MB or less
>> > > with 30,000+ rows, and another XLSX file with about 200 rows. I have to
>> > > read each row of the smaller file, look it up in the larger file, and
>> > > write data from both files to a new one.
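The lookup described above is essentially a join. A common way to do it, sketched here under the assumption that both files can be read as CSV and that column 0 holds the shared key, is to load the small file (about 200 rows) into a Hash and then make one pass over the large file, instead of searching the large file once per small-file row. The method name `merge_rows` and the key-column layout are hypothetical.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: joins rows of a small CSV source against a large one.
# Only the small source is held in memory; the large one is streamed.
# Writes each matched pair as one output row and returns the match count.
def merge_rows(small_io, large_io, out_io, key_col: 0)
  # ~200 rows: cheap to keep in a Hash keyed by the lookup column
  wanted = {}
  CSV.new(small_io).each { |row| wanted[row[key_col]] = row }

  matches = 0
  csv_out = CSV.new(out_io)
  # 30,000+ rows: read one at a time, never all at once
  CSV.new(large_io).each do |row|
    small_row = wanted[row[key_col]]
    next unless small_row

    csv_out << (small_row + row) # data from both files in one output row
    matches += 1
  end
  matches
end

small = StringIO.new("P-1,Pump\nP-3,Valve\n")
large = StringIO.new("P-1,12\nP-2,7\nP-3,0\n")
out = StringIO.new
merge_rows(small, large, out) # => 2
```

Each large-file row costs one Hash lookup, so the work grows with the size of the large file once rather than with the product of both sizes. If a key appears more than once in the small file, the last occurrence wins in this sketch.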
>> > 
>> > Wow. Do you have to do all this in a single request? 
>> > 
>> > You may want to look at Nokogiri and its SAX parser. SAX parsers don't
>> > care about the size of the document they operate on, because they work
>> > one node at a time and don't load the whole thing into memory at once.
>> > There are some limitations on what kind of work a SAX parser can perform,
>> > because it isn't able to see the entire document and "know" where it is
>> > within the document at any point. But for certain kinds of problems, it
>> > can be the only way to go. Sounds like you may need something like this.
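To make the SAX idea concrete: an .xlsx file is a ZIP archive, and each worksheet inside it (for example `xl/worksheets/sheet1.xml`) is XML in which `<row>` elements contain `<c>` cells whose values sit in `<v>` elements. The sketch below uses REXML's `SAX2Parser`, which ships with Ruby, as a stand-in for the Nokogiri SAX parser mentioned above; Nokogiri's `Nokogiri::XML::SAX::Document` offers analogous `start_element`, `characters` and `end_element` callbacks. It assumes the sheet XML has already been extracted from the ZIP (for instance with a gem such as rubyzip), and it does not resolve shared strings: a cell with `t="s"` holds an index into `xl/sharedStrings.xml`, not the text itself. The method name `each_sheet_row` is hypothetical.

```ruby
require "rexml/parsers/sax2parser"

# Hypothetical helper: streams <row> elements out of worksheet XML (a String
# or IO), yielding each row as a Hash of cell reference => raw <v> text,
# e.g. {"A1" => "0", "B1" => "42"}. Only the current row is kept in memory.
def each_sheet_row(xml_source)
  parser = REXML::Parsers::SAX2Parser.new(xml_source)
  row = nil         # cells of the row currently being read
  cell_ref = nil    # "r" attribute of the current <c>, e.g. "B7"
  in_value = false  # true while inside a <v> element

  parser.listen(:start_element) do |_uri, localname, _qname, attrs|
    case localname
    when "row" then row = {}
    when "c"   then cell_ref = attrs["r"]
    when "v"   then in_value = true
    end
  end

  # Text events also fire for whitespace between tags; keep only <v> text
  parser.listen(:characters) do |text|
    row[cell_ref] = (row[cell_ref] || "") + text if in_value
  end

  parser.listen(:end_element) do |_uri, localname, _qname|
    case localname
    when "v"   then in_value = false
    when "row" then yield row
    end
  end

  parser.parse
end

# Usage with a tiny inline worksheet; A1 has t="s", so its "0" is an index
# into sharedStrings.xml, not the cell text.
sheet_xml = <<~XML
  <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <sheetData>
      <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>
      <row r="2"><c r="A2"><v>7</v></c></row>
    </sheetData>
  </worksheet>
XML

rows = []
each_sheet_row(sheet_xml) { |r| rows << r }
```

The parser never builds a tree: it reports "a row started", "here is some text", "a row ended", and the code keeps just enough state (the current row and cell) to turn those events into one Hash per row.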
>> > 
>> > Walter 
>> > 
>> > > 
>> > > I just did a test reading a few rows from the larger file using Roo
>> > > (Spreadsheet doesn't support XLSX, and Creek looks good, but I can't find
>> > > a way to read it row by row), and it basically crashed my computer. The
>> > > server crashed; when I tried to restart it, it said it was already
>> > > started. Anyway, it was a disaster.
>> > > 
>> > > So my question is: is there a gem that works best with large XLSX files,
>> > > or is there another way to approach this without crashing my computer?
>> > > 
>> > > This is what I had (it's very possible I'm doing it wrong; help is
>> > > welcome). What I was trying to do here was to process the files and
>> > > create the new XLS file after both of the XLSX files were uploaded:
>> > > 
>> > > 
>> > > require 'roo'
>> > > require 'spreadsheet'
>> > > require 'creek'
>> > > class UploadFiles < ActiveRecord::Base
>> > >   after_commit :process_files
>> > >   attr_accessible :inventory, :material_list
>> > >   has_one :inventory
>> > >   has_one :material_list
>> > >   has_attached_file :inventory, :url=>"/:current_user/inventory", :path=>":rails_root/tmp/users/uploaded_files/inventory/inventory.:extension"
>> > >   has_attached_file :material_list, :url=>"/:current_user/material_list", :path=>":rails_root/tmp/users/uploaded_files/material_list/material_list.:extension"
>> > >   validates_attachment_presence :material_list
>> > >   accepts_nested_attributes_for :material_list, :allow_destroy => true
>> > >   accepts_nested_attributes_for :inventory, :allow_destroy => true
>> > >   validates_attachment_content_type :inventory, :content_type => ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"], :message => "Only .XLSX files are accepted as Inventory"
>> > >   validates_attachment_content_type :material_list, :content_type => ["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"], :message => "Only .XLSX files are accepted as Material List"
>> > >
>> > >   def process_files
>> > >     inventory = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/inventory/inventory.xlsx")
>> > >     material_list = Creek::Book.new(Rails.root.to_s + "/tmp/users/uploaded_files/material_list/material_list.xlsx")
>> > >     inventory = inventory.sheets[0]
>> > >     scl = Spreadsheet::Workbook.new
>> > >     sheet1 = scl.create_worksheet
>> > >     # Creek yields each row as a Hash of cell reference => value,
>> > >     # e.g. {"A1" => "foo", "B1" => "bar"}; copy its values into the
>> > >     # row with the same (0-based) index in the new sheet
>> > >     inventory.rows.each_with_index do |row, index|
>> > >       sheet1.row(index).concat(row.values)
>> > >     end
>> > >
>> > >     sheet1.name = "Site Configuration List"
>> > >     scl.write(Rails.root.to_s + "/tmp/users/generated/siteconfigurationlist.xls")
>> > >   end
>> > > end
>> > > 
>> > > 
>> > > -- 
>> > > You received this message because you are subscribed to the Google 
>> Groups "Ruby on Rails: Talk" group. 
>> > > To unsubscribe from this group and stop receiving emails from it, 
>> send an email to [email protected]. 
>> > > To post to this group, send email to [email protected]. 
>> > > To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/rubyonrails-talk/bc470d4d-19c4-4969-8ba7-4ead7a35d40c%40googlegroups.com.
>>  
>>
>> > > For more options, visit https://groups.google.com/groups/opt_out. 
>>
>
> I use a rather indirect route that works fine for me with 15,000 lines and
> about 26 MB. I export the file from LibreOffice Calc as CSV (comma-separated
> values). Then, in the Rails controller, I use something like:
>
> require 'csv'
>
> class TheControllerController < ApplicationController # ;')
>
>   # other controller code
>
>   def upload
>     # CSV.parse (from Ruby's standard CSV library) returns an Array of rows
>     data = CSV.parse(params[:entries].tempfile.read)
>     data.each do |line|
>       logger.debug "line: #{line.inspect}"
>       # each line is an Array of strings containing the columns of one
>       # row of the CSV file; I use these data to populate the appropriate
>       # DB table / Rails model at this point
>     end
>   end
>
> end
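`CSV.parse` on `tempfile.read` first loads the whole upload into one String and then builds an Array of every row before the loop starts. A lower-memory variant, sketched below, walks the file row by row with `CSV.foreach`, so only the current row is materialized. The method name `import_rows` is hypothetical; in the controller above, the path would presumably come from `params[:entries].tempfile.path`.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: streams a CSV file from disk, handing each row
# (an Array of strings) to the caller's block. Unlike
# CSV.parse(File.read(path)), it never holds all rows at once.
# Returns the number of rows seen.
def import_rows(path)
  count = 0
  CSV.foreach(path) do |row|
    yield row
    count += 1
  end
  count
end

# Usage with a throwaway file standing in for the uploaded tempfile:
file = Tempfile.new(["entries", ".csv"])
file.write("id,name\n1,pump\n2,valve\n")
file.close

names = []
import_rows(file.path) { |row| names << row[1] }
```

Memory then stays roughly flat as the file grows, although the time taken still grows with the number of rows, so the request-timeout concern below still applies.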
>
> Make sure that your routes.rb points to this:
>
>   match 'the_controller/upload' => 'the_controller#upload'
>
> From your client machine's command line:
>
> curl -F entries=@<your-file.csv> localhost:3000/the_controller/upload
>
> Note that 'entries' in the curl command matches the 'entries' in
> params[:entries] in the controller.
>
> If you want to do this from a Rails GUI form, look at
> http://guides.rubyonrails.org/form_helpers.html#uploading-files
>
> During testing on my 4-core, 8 GB laptop, processing the really big files
> takes several minutes. When the app runs on Heroku, this causes a timeout,
> so I break up the CSV file into multiple sections such that each section
> takes less than 30 seconds to upload. By leaving a little 'slack' in the
> size, I have automated this so it runs in the background while I am doing
> other work.
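The splitting step described above can be sketched as follows. The method name `split_csv`, the default of 1,000 rows per section, the `section_NNN.csv` file names, and repeating the header row in every section are all assumptions for illustration; in practice the section size would be tuned so each upload stays under Heroku's 30-second request timeout with some slack.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: splits a large CSV file into section files holding at
# most rows_per_section data rows each. The header row is repeated at the top
# of every section so each file can be uploaded and imported on its own.
# Returns the section file paths in order.
def split_csv(path, out_dir, rows_per_section: 1000)
  paths = []
  File.open(path) do |io|
    csv = CSV.new(io)
    header = csv.shift # first line: column names
    # each_slice pulls rows_per_section rows at a time from the open file,
    # so only one section is held in memory
    csv.each_slice(rows_per_section).with_index(1) do |slice, n|
      section_path = File.join(out_dir, format("section_%03d.csv", n))
      CSV.open(section_path, "w") do |out|
        out << header
        slice.each { |row| out << row }
      end
      paths << section_path
    end
  end
  paths
end

# Usage: 2,500 data rows split into sections of 1,000 rows.
section_sizes = nil
headers_ok = nil
Dir.mktmpdir do |dir|
  source = File.join(dir, "big.csv")
  CSV.open(source, "w") do |csv|
    csv << %w[id name]
    2500.times { |i| csv << [i, "item#{i}"] }
  end
  paths = split_csv(source, dir)
  section_sizes = paths.map { |p| CSV.read(p).size - 1 } # minus the header
  headers_ok = paths.all? { |p| CSV.read(p).first == %w[id name] }
end
```

Splitting by row count is only a proxy for upload time; a fixed row budget with headroom is the simplest way to get the 'slack' mentioned above.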
>
> Hope these suggestions help.
>
> Don Ziesig

