Please always respond to the list, and avoid top posting.

> -----Original Message-----
> From: Abhishek Pratap [mailto:abhishek....@gmail.com]
> Sent: Thursday, August 30, 2012 5:47 PM
> To: Prasad, Ramit
> Subject: Re: [Tutor] using multiprocessing efficiently to process large
> data file
>
> Hi Ramit,
>
> Thanks for your quick reply. Unfortunately, given the size of the file,
> I can't afford to load it all into memory in one go.
> I could read, let's say, the first 1 million lines, process them in
> parallel, and so on. I am looking for an example that does something
> similar.
>
> -Abhi
The same logic should work; just process your batch after checking its size,
and iterate over the file directly instead of reading it all into memory:

with open(filename, 'r') as f:
    iterdata = iter(f)
    grouped_data = []
    for d in iterdata:
        # make this list 8 elements instead if each record spans 8 lines
        l = [d, next(iterdata, '')]
        grouped_data.append(l)
        if len(grouped_data) > 1000000 // 8:  # one million lines, with 8-line groups
            # process batch here
            grouped_data = []
    if grouped_data:
        # process the final, partial batch here
        pass
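For the multiprocessing part of your question, something along the lines of
the (untested) sketch below might work: read one batch of line-groups at a
time, hand the groups in that batch to a multiprocessing.Pool, collect the
results, and then move on to the next batch. The names process_group(),
grouped_batches(), and 'big_data.txt' are placeholders for illustration, not
anything from your code.

import multiprocessing

def process_group(group):
    # placeholder: replace with the real work for one record (a list of lines)
    return len(group)

def grouped_batches(filename, group_size=8, batch_size=125000):
    # yield batches of line-groups so only one batch sits in memory at a time
    with open(filename, 'r') as f:
        it = iter(f)
        batch = []
        for line in it:
            group = [line] + [next(it, '') for _ in range(group_size - 1)]
            batch.append(group)
            if len(batch) >= batch_size:  # 125,000 groups of 8 lines ~ one million lines
                yield batch
                batch = []
        if batch:
            yield batch

if __name__ == '__main__':
    pool = multiprocessing.Pool()  # one worker process per CPU by default
    for batch in grouped_batches('big_data.txt'):
        # farm the groups in this batch out to the worker processes
        results = pool.map(process_group, batch, chunksize=1000)
        # write out or aggregate results here before reading the next batch
    pool.close()
    pool.join()

Whether this actually speeds things up depends on how expensive the
per-group work is compared to the time spent reading the file and pickling
the data to the workers, so it is worth timing a single batch first.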