Hi,

In the attached script, most of the running time is spent in the following functions (verified by the psyco log):
def match_generator(self, regex):
    """
    Generate the next line of self.input_file that matches regex.
    """
    generator_ = self.line_generator()
    while True:
        self.file_pointer = self.input_file.tell()
        if self.file_pointer != 0:
            self.file_pointer -= 1
        if (self.file_pointer + 2) >= self.last_line_offset:
            break
        line_ = generator_.next()
        print "%.2f%% \r" % (((self.last_line_offset - self.input_file.tell()) /
                              (self.last_line_offset * 1.0)) * 100.0),
        if not line_:
            break
        else:
            match_ = regex.match(line_)
            groups_ = re.findall(regex, line_)
            if match_:
                yield line_.strip("\n"), groups_

def get_matching_records_by_regex_extremes(self, regex_array):
    """
    Function will:
    Find the record matching the first item of regex_array.
    Will save all records until the last item of regex_array.
    Will save the last line.
    Will remember the position of the beginning of the next line
    in self.input_file.
    """
    start_regex = regex_array[0]
    end_regex = regex_array[len(regex_array) - 1]
    all_recs = []
    generator_ = self.match_generator
    try:
        match_start, groups_ = generator_(start_regex).next()
    except StopIteration:
        return(None)
    if match_start != None:
        all_recs.append([match_start, groups_])
        line_ = self.line_generator().next()
        while line_:
            match_ = end_regex.match(line_)
            groups_ = re.findall(end_regex, line_)
            if match_ != None:
                all_recs.append([line_, groups_])
                return(all_recs)
            else:
                all_recs.append([line_, []])
            line_ = self.line_generator().next()

def line_generator(self):
    """
    Generate the next line of self.input_file, and update
    self.file_pointer to the beginning of that line.
    """
    while self.input_file.tell() <= self.last_line_offset:
        self.file_pointer = self.input_file.tell()
        line_ = self.input_file.readline()
        if not line_:
            break
        yield line_.strip("\n")

I was trying to think of optimisations that would cut down the processing time, but got no inspiration. (I need the print "%.2f%% \r" ... line for the user's feedback.) Could you suggest any optimisations?

Thanks,
Ron.

P.S. Examples of processing times:

* 2m42.782s on two files with a combined size of 792544 bytes (no matches found).
* 28m39.497s on two files with a combined size of 4139320 bytes (783 matches found).

These times are quite unacceptable, as a normal input to the program would be ten files with a combined size of ~17MB.
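One direction worth measuring (a sketch, not tested against your data): regex.match() already carries the captured groups, so the extra re.findall() pass over every line can be dropped; the per-line input_file.tell() calls can be replaced by adding up line lengths (assuming the file is opened in binary mode, so len(line) is the byte count); and the progress string can be printed every N lines instead of every line. A possible match_generator along those lines -- note that match_.groups() returns the anchored match's groups, which is not exactly what re.findall() returns, so this assumes the anchored groups are what you actually need:

    def match_generator(self, regex):
        """
        Yield (line, groups) for each line of self.input_file that
        matches regex; track the offset by hand instead of calling
        tell() once per line.
        """
        offset = self.input_file.tell()
        lines_seen = 0
        while offset < self.last_line_offset:
            line_ = self.input_file.readline()
            if not line_:
                break
            self.file_pointer = offset              # start of this line
            offset += len(line_)                    # replaces tell()
            lines_seen += 1
            if lines_seen % 1000 == 0:              # throttled feedback
                print "%.2f%% \r" % (((self.last_line_offset - offset) /
                                      (self.last_line_offset * 1.0)) * 100.0),
            match_ = regex.match(line_)
            if match_:
                # Reuse the match we already have; no re.findall() pass.
                yield line_.rstrip("\n"), list(match_.groups())

Writing the progress string to the terminal once per input line is itself surprisingly costly on large files, so throttling it alone is worth timing before anything else.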
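get_matching_records_by_regex_extremes has the same findall-after-match duplication, and it also constructs a brand-new generator object on every pass through its loop (line_ = self.line_generator().next()). A sketch that reuses one generator and makes a single regex pass per line, under the same anchored-groups assumption (it returns None if end_regex never matches, where the original would let StopIteration escape):

    def get_matching_records_by_regex_extremes(self, regex_array):
        """
        Collect the record matching the first regex of regex_array
        through the record matching the last one, inclusive (sketch).
        """
        start_regex = regex_array[0]
        end_regex = regex_array[-1]
        all_recs = []
        try:
            match_start, groups_ = self.match_generator(start_regex).next()
        except StopIteration:
            return None
        all_recs.append([match_start, groups_])
        for line_ in self.line_generator():         # one generator, reused
            match_ = end_regex.match(line_)
            if match_ is not None:
                all_recs.append([line_, list(match_.groups())])
                return all_recs
            all_recs.append([line_, []])
        return None                                 # end_regex never matched

Since a normal input totals only ~17MB, it may also be worth simply slurping each file with readlines() and iterating over the in-memory list, which removes all the tell()/readline bookkeeping in one stroke.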
Attachment: _failover_multiple_files_client.py