Please always respond to the list, and avoid top posting.

> -----Original Message-----
> From: Abhishek Pratap [mailto:abhishek....@gmail.com]
> Sent: Thursday, August 30, 2012 5:47 PM
> To: Prasad, Ramit
> Subject: Re: [Tutor] using multiprocessing efficiently to process large
> data file
>
> Hi Ramit,
>
> Thanks for your quick reply. Unfortunately, given the size of the file,
> I can't afford to load it all into memory in one go.
> I could read, let's say, the first 1 million lines, process them in
> parallel, and so on. I am looking for an example that does something
> similar.
>
> -Abhi
The same logic should work; just process your batch after checking its size,
and iterate over the file directly instead of reading it all into memory:

with open(filename, 'r') as f:
    iterdata = iter(f)
    grouped_data = []
    for d in iterdata:
        # make this list 8 elements instead if each record spans 8 lines
        l = [d, next(iterdata, '')]
        grouped_data.append(l)
        if len(grouped_data) > 1000000 // 8:  # one million lines, with 8-line groups
            # process batch here
            grouped_data = []
    if grouped_data:
        # process the final, partial batch here
        pass
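For the multiprocessing part of your question, something along the lines of
the (untested) sketch below might work: read one batch of line-groups at a
time, hand the groups in that batch to a multiprocessing.Pool, collect the
results, and then move on to the next batch. The names process_group(),
grouped_batches(), and 'big_data.txt' are placeholders for illustration, not
anything from your code.

import multiprocessing

def process_group(group):
    # placeholder: replace with the real work for one record (a list of lines)
    return len(group)

def grouped_batches(filename, group_size=8, batch_size=125000):
    # yield batches of line-groups so only one batch sits in memory at a time
    with open(filename, 'r') as f:
        it = iter(f)
        batch = []
        for line in it:
            group = [line] + [next(it, '') for _ in range(group_size - 1)]
            batch.append(group)
            if len(batch) >= batch_size:  # 125,000 groups of 8 lines ~ one million lines
                yield batch
                batch = []
        if batch:
            yield batch

if __name__ == '__main__':
    pool = multiprocessing.Pool()  # one worker process per CPU by default
    for batch in grouped_batches('big_data.txt'):
        # farm the groups in this batch out to the worker processes
        results = pool.map(process_group, batch, chunksize=1000)
        # write out or aggregate results here before reading the next batch
    pool.close()
    pool.join()

Whether this actually speeds things up depends on how expensive the
per-group work is compared to the time spent reading the file and pickling
the data to the workers, so it is worth timing a single batch first.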