Thanks. I actually commented out all the handling code. The loop ends with
   the re_pat.match and nothing follows it.
   On 07/05/2017 08:31, Cameron Simpson wrote:

     On 04Jul2017 17:01, Mayling ge <maylinge0...@gmail.com> wrote:
      >   My function handles a file line by line. There are multiple
      >   error patterns defined which need to be applied to each line.
      >   I use multiprocessing.Pool to handle the file in blocks.
      >
      >   Memory usage increases to 2G for a 1G file, and stays at 2G
      >   even after the file has been processed. The file is closed at
      >   the end.
      >
      >   If I comment out the call to re_pat.match, memory usage is
      >   normal and stays under 100Mb. [...]
     >
      >   def line_match(lines, errors):
     >       for error in errors:
     >           try:
     >               re_pat = re.compile(error['pattern'])
     >           except Exception:
     >               print_error
     >               continue
     >           for line in lines:
     >               m = re_pat.match(line)
     >               # other code to handle matched object
     [...]
      >   Notes: I omit some code, as I think the significant difference
      >   is with/without re_pat.match(...)
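   [Editorial note: a runnable sketch of the quoted loop. The handling
   code was omitted from the original post, so the body here is
   hypothetical — it merely tallies matches per pattern without keeping
   a reference to the match object or the line:]

```python
import re

def line_match(lines, errors):
    # Hypothetical reconstruction: compile each pattern once, scan the
    # lines, and extract only small results from each match.
    counts = {}
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except re.error as e:
            print("bad pattern %r: %s" % (error['pattern'], e))
            continue
        for line in lines:
            m = re_pat.match(line)
            if m:
                # Record only a small derived value; do not store m or
                # line anywhere, so nothing pins them in memory.
                counts[error['pattern']] = counts.get(error['pattern'], 0) + 1
    return counts
```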

      Hmm. Does the handling code (omitted) keep the line or match
      object in memory?

      If including the "m = re_pat.match(line)" triggers the leak, and
      presuming that the line itself doesn't leak, then I would start
      to suspect that the handling code is not letting go of the match
      object "m" or of the line (which is probably attached to the
      match object "m" to support things like m.group() and so forth).

     So you might need to show us the handling code.
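      [Editorial note: for illustration, not from the thread — a match
      object keeps a reference to its subject string via m.string, so
      handling code that stores matches also pins the matched lines.
      Extracting the needed groups and dropping the match releases the
      line; the pattern and sample line below are made up:]

```python
import re

pat = re.compile(r"ERROR: (\w+)")
line = "ERROR: disk_full on /dev/sda1"

m = pat.match(line)
# The match object holds a reference to the whole subject string, so
# the line stays alive for as long as m does.
assert m.string is line

# Memory-friendly handling: copy out the small piece you need, then
# drop the match so nothing pins the original line.
code = m.group(1)
del m
```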

     Cheers,
     Cameron Simpson <c...@zip.com.au>

-- 
https://mail.python.org/mailman/listinfo/python-list
