On 11/14/18 3:01 PM, Steven D'Aprano wrote:
> On Tue, Nov 13, 2018 at 11:59:24PM -0500, Avi Gross wrote:
>> I have been thinking about the thread we have had where the job seemed to be
>> to read in a log file and if some string was found, process the line before
>> it and generate some report. Is that generally correct?
>
> If that description is correct, then the solution is trivial: iterate
> over the file, line by line, keeping the previous line:
>
>     previous_line = None
>     for current_line in file:
>         process(current_line, previous_line)
>         previous_line = current_line
>
>
> No need for complex solutions, or memory-hungry solutions that require
> reading the entire file into memory at once (okay for, say, a million
> lines, but not if your logfile is 2GB in size). If you need the line
> number:

Absolutely, let's not go reading everything in bulk. Python has tried
very hard to build elegant iterators all over the place to avoid "doing
the whole thing" when you don't have to, and that has helped heaps with
what years ago used to be an indictment of Python as being "too slow":
not doing work you don't need to do is always a good thing.

The general problem is pretty common, I think, and expands a bit beyond
the trivial case. Log files may have a start and end marker for a case
you have to examine, and the number of lines between those may be fixed
(0, 1, 2, whatever - 0 being the most trivial case) or variable. I think
that's the situation that started this thread way back, and it comes up
a lot. Search Stack{Exchange,Overflow} and you'll find a non-trivial
number of people have asked.

I just now have a different scenario with a similar requirement. I
happen to want to scan a bunch of Python code to locate instances of the
Python idiom for ignoring certain possible/expected error conditions:

    try:
        block of code
    except SomeError:
        pass

to experiment with replacing those with contextlib.suppress and see if
the team of a particular project thinks that makes code more readable:

    from contextlib import suppress
    ...
    with suppress(SomeError):
        block of code

This is pretty similar: I want to identify a multi-line sequence that
starts with "try:", has one or more lines, then ends with, in this case,
a two-line sequence where the first line starts with "except" and is
immediately followed by "pass". To make it more exciting, that sequence
must not then be followed by either "else" or "finally", because if the
try block has either of those clauses, it is not a candidate for using
suppress instead.

Regexes aren't necessarily helpful on multiline patterns, even if you
ignore the jokes about regexes ("now you have two problems").

As common as this is, I suspect there are elegant solutions that go
beyond everyone rolling their own.

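For the try/except/pass case specifically, one alternative to matching
lines is to let Python parse the code itself. What follows is only a
minimal sketch, assuming the standard library ast module is acceptable
for the job. The function name find_suppress_candidates and the SAMPLE
text are made up for illustration, and bare "except:" clauses are
skipped on the assumption that suppress() should be given named
exception classes.

```python
import ast


def find_suppress_candidates(source):
    """Return line numbers of try statements that only swallow exceptions."""
    candidates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        # An else or finally clause rules out contextlib.suppress.
        if node.orelse or node.finalbody:
            continue
        # Every handler must name an exception and contain only "pass".
        if all(
            handler.type is not None
            and len(handler.body) == 1
            and isinstance(handler.body[0], ast.Pass)
            for handler in node.handlers
        ):
            candidates.append(node.lineno)
    return sorted(candidates)


# Hypothetical input: only the first try statement qualifies.
SAMPLE = '''\
try:
    risky()
except KeyError:
    pass

try:
    risky()
except KeyError:
    pass
else:
    followup()

try:
    risky()
except ValueError:
    log()
'''

print(find_suppress_candidates(SAMPLE))
```

This does not help with the general log-file version of the problem,
since it relies on the input being valid Python, but it avoids guessing
at indentation and at comments or strings that happen to contain "try:".
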
I'm thinking that maybe pyparsing has the tools to help with this kind
of problem... I may take a look into that over the next few days since I
just ended up with a personal interest.