Re: [Tutor] Increase performance of the script
Steven D'Aprano wrote:

> [...]
>> In python 2.6 print statement work as print "Solution"
>> however after import collection I have to use print with
>> print("Solution") is this a known issue ?
>
> As Peter says, you must have run
>
>     from __future__ import print_function
>
> to see this behaviour. This has nothing to do with import collection.
> You can debug that for yourself by exiting the interactive interpreter,
> starting it up again, and trying to print before and after importing
> collection.

To be fair to Asad -- I sneaked the __future__ import into my sample code. I did it to be able to write Python 3 code that would still run in his 2.6 interpreter. In hindsight that was not a good idea, as it can confuse someone who has never seen it, and the OP has yet to learn other, more important things.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Increase performance of the script
On Wed, Dec 12, 2018 at 12:52:09AM -0500, Avi Gross wrote:

> Asad,
>
> I wonder if an import from __future__ happened, perhaps in the version of
> collections you used. Later versions of 2.x allow optional use of the 3.x
> style of print.

The effect of __future__ imports, like any other import, is only within the module that actually does the import. Even in the unlikely event that collections did such a future import, it would only affect print within that module, not globally or in the interactive interpreter.

Here's a demo:

    # prfunc_demo.py
    from __future__ import print_function

    try:
        exec("print 123")
    except SyntaxError:
        print("old style print failed, as expected")
        print("as print is now a function")

And importing it into the interactive interpreter shows that the effect of the future import is localised:

    [steve@ando ~]$ python2.6
    Python 2.6.7 (r267:88850, Mar 10 2012, 12:32:58)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    py> import prfunc_demo
    old style print failed, as expected
    as print is now a function
    py> print "But here in the REPL, nothing has changed."
    But here in the REPL, nothing has changed.

-- 
Steve
Re: [Tutor] Increase performance of the script
Asad,

I wonder if an import from __future__ happened, perhaps in the version of collections you used. Later versions of 2.x allow optional use of the 3.x style of print. When you redefine print, the old statement style is hidden or worse.

-----Original Message-----
From: Tutor On Behalf Of Asad
Sent: Tuesday, December 11, 2018 10:38 AM
To: tutor@python.org
Subject: [Tutor] Increase performance of the script

Hi All,

I used your solution , however found a strange issue with deque :

I am using python 2.6.6:

>>> import collections
>>> d = collections.deque('abcdefg')
>>> print 'Deque:', d
  File "<stdin>", line 1
    print 'Deque:', d
                 ^
SyntaxError: invalid syntax
>>> print ('Deque:', d)
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> print d
  File "<stdin>", line 1
    print d
          ^
SyntaxError: invalid syntax
>>> print (d)
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In python 2.6 print statement work as print "Solution"
however after import collection I have to use print with print("Solution")
is this a known issue ?

Please let me know .

Thanks,
Re: [Tutor] Increase performance of the script
On Tue, Dec 11, 2018 at 09:07:58PM +0530, Asad wrote:

> Hi All,
>
> I used your solution , however found a strange issue with deque :

No you haven't. You found a *syntax error*, as the exception says:

> >>> print 'Deque:', d
>   File "<stdin>", line 1
>     print 'Deque:', d
>                  ^
> SyntaxError: invalid syntax

which means the error occurs before the interpreter runs the code. You could replace the above line with any similar line:

    print 'Not a deque', 1.2345

and you will get the same error.

When you are faced with an error in the interactive interpreter, you should try different things to see how they affect the problem. Does the problem go away if you use a float instead of a deque? If you change the string, does the problem go away? If you swap the order, does the problem go away? What if you use a single value instead of two?

This is called "debugging", and as a programmer, you need to learn how to do this.

[...]
> In python 2.6 print statement work as print "Solution"
> however after import collection I have to use print with print("Solution")
> is this a known issue ?

As Peter says, you must have run

    from __future__ import print_function

to see this behaviour. This has nothing to do with import collection. You can debug that for yourself by exiting the interactive interpreter, starting it up again, and trying to print before and after importing collection.

-- 
Steve
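The claim that a syntax error is raised before any code runs can be checked without executing the statement at all. The following is a minimal sketch, assuming a Python 3 interpreter (where print is always a function, which mirrors a 2.6 session after the __future__ import); the helper name `compiles` is made up for illustration and is not part of the thread.

```python
def compiles(source):
    """Return True if `source` parses as valid Python, without executing it."""
    try:
        compile(source, "<test>", "exec")
    except SyntaxError:
        return False
    return True


# Nothing below is executed, so neither d nor collections needs to exist.
candidates = [
    "print 'Deque:', d",            # statement form, as in the failing session
    "print 'Not a deque', 1.2345",  # same shape, no deque involved
    "print('Deque:', d)",           # function form
]
results = {src: compiles(src) for src in candidates}
for src in candidates:
    print("{0!r:35} -> {1}".format(src, results[src]))
```

Because the two statement-form lines are rejected at compile time while the function form is accepted, the deque cannot be the cause.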
[Tutor] Increase performance of the script
Hi All,

I used your solution, however I found a strange issue with deque:

I am using Python 2.6.6:

>>> import collections
>>> d = collections.deque('abcdefg')
>>> print 'Deque:', d
  File "<stdin>", line 1
    print 'Deque:', d
                 ^
SyntaxError: invalid syntax
>>> print ('Deque:', d)
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> print d
  File "<stdin>", line 1
    print d
          ^
SyntaxError: invalid syntax
>>> print (d)
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In Python 2.6 the print statement works as print "Solution", however after import collections I have to use print("Solution"). Is this a known issue?

Please let me know.

Thanks,
Re: [Tutor] Increase performance of the script
On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:

> Hi All ,
>
> I have the following code to search for an error and prin the
> solution .

Please tidy your code before asking for help optimizing it. We're volunteers, not being paid to work on your problem, and your code is too hard to understand.

Some comments:

> f4 = open (r" /A/B/file1.log ", 'r' )
> string2=f4.readlines()

You have a variable "f4". Where are f1, f2 and f3?

You have a variable "string2", which is a lie, because it is not a string, it is a list.

I will be very surprised if the file name you show is correct. It has a leading space, and two trailing spaces.

> for i in range(len(string2)):
>     position=i

Poor style. In Python, you almost never need to write code that iterates over the indexes (this is not Pascal). You don't need the assignment position=i. Better:

    for position, line in enumerate(lines):
        ...

>     lastposition =position+1

Poorly named variable. You call it "last position", but it is actually the NEXT position.

>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):

Unnecessary use of regex, which will be slow. Better:

    if 'Calling rdbms/admin' in line:
        break

>             break
>         elif lastposition==len(string2)-1:
>             break

If you iterate over the lines, you don't need to check for the end of the list yourself.

A better solution is to use the *accumulator* design pattern to collect a block of lines for further analysis:

    # Untested.
    with open(filename, 'r') as f:
        block = []
        inside_block = False
        for line in f:
            line = line.strip()
            if inside_block:
                if line == "End of block":
                    inside_block = False
                    process(block)
                    block = []  # Reset to collect the next block.
                else:
                    block.append(line)
            elif line == "Start of block":
                inside_block = True
        # At the end of the loop, we might have a partial block.
        if block:
            process(block)

Your process() function takes a single argument, the list of lines which makes up the block you care about.

If you need to know the line numbers, it is easy to adapt:

    for line in f:

becomes:

    for linenumber, line in enumerate(f):
        # The next line is not needed in Python 3.
        linenumber += 1  # Adjust to start line numbers at 1 instead of 0

and:

    block.append(line)

becomes

    block.append((linenumber, line))

If you re-write your code using this accumulator pattern, using ordinary substring matching and equality instead of regular expressions whenever possible, I expect you will see greatly improved performance (as well as being much, much easier to understand and maintain).

-- 
Steve
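For readers who want to try the accumulator pattern with line numbers, here is one way it might be put together. This is a sketch under stated assumptions, not the code from the message: it is written as a generator (the name `collect_blocks` is hypothetical) instead of calling process() directly, it keeps the placeholder markers "Start of block" and "End of block", it uses enumerate's start argument to count from 1, and it reads from an in-memory sample rather than a real log file.

```python
import io


def collect_blocks(lines, start="Start of block", end="End of block"):
    """Yield each block as a list of (line_number, text) pairs."""
    block = []
    inside_block = False
    for linenumber, line in enumerate(lines, 1):  # line numbers start at 1
        line = line.strip()
        if inside_block:
            if line == end:
                inside_block = False
                yield block
                block = []  # reset to collect the next block
            else:
                block.append((linenumber, line))
        elif line == start:
            inside_block = True
    # The input may stop in the middle of a block.
    if block:
        yield block


# Made-up sample data standing in for a log file.
sample = io.StringIO(
    u"noise\n"
    u"Start of block\n"
    u"alpha\n"
    u"beta\n"
    u"End of block\n"
    u"more noise\n"
    u"Start of block\n"
    u"gamma\n"
)
blocks = list(collect_blocks(sample))
for block in blocks:
    print(block)
```

Only one block is held in memory at a time, so the same loop works unchanged on an open file object of any size.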
Re: [Tutor] Increase performance of the script
On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:

> Hi All ,
>
> I have the following code to search for an error and prin the
> solution .
>
> /A/B/file1.log size may vary from 5MB -5 GB
[...]
> The problem I am facing in performance issue it takes some minutes to print
> out the solution . Please advice if there can be performance enhancements
> to this script .

How many minutes is "some"? If it takes 2 minutes to analyse a 5GB file, that's not bad performance. If it takes 2 minutes to analyse a 5MB file, that's not so good.

-- 
Steve
Re: [Tutor] Increase performance of the script
Asad wrote:

> Hi All ,
>
> I have the following code to search for an error and prin the
> solution .
>
> /A/B/file1.log size may vary from 5MB -5 GB
>
> f4 = open (r" /A/B/file1.log ", 'r' )
> string2=f4.readlines()

Do not read the complete file into memory. Read one line at a time and keep only those lines around that you may have to look at again.

> for i in range(len(string2)):
>     position=i
>     lastposition =position+1
>     while True:
>          if re.search('Calling rdbms/admin',string2[lastposition]):
>           break
>          elif lastposition==len(string2)-1:
>           break
>          else:
>           lastposition += 1

You are trying to find a group of lines. The way you do it for a file of the structure

    foo
    bar
    baz
    end-of-group-1
    ham
    spam
    end-of-group-2

you find the groups

    foo
    bar
    baz
    end-of-group-1

    bar
    baz
    end-of-group-1

    baz
    end-of-group-1

    ham
    spam
    end-of-group-2

    spam
    end-of-group-2

That looks like a lot of redundancy which you can probably avoid. But wait...

>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break

you throw away almost all the hard work to look for the line containing those four lines? It looks like you only need the "error...13" lines, the three lines that precede it and the last "Calling..." line occurring before the "error...13".

> The problem I am facing in performance issue it takes some minutes to
> print out the solution . Please advice if there can be performance
> enhancements to this script .

If you want to learn the Python way you should try hard to write your scripts without a single

    for i in range(...):
        ...

loop. This style is usually the last resort; it may work for small datasets, but as soon as you have to deal with large files performance dives. Even worse, these loops tend to make your code hard to debug.

Below is a suggestion for an implementation of what your code seems to be doing that only remembers the four recent lines and works with a single loop. If that saves you some time, use that time to clean the scripts you have lying around from occurrences of "for i in range(): ..." ;)

    from __future__ import print_function

    import re
    import sys
    from collections import deque


    def show(prompt, *values):
        print(prompt)
        for value in values:
            # Explicit field index: "{}" is not accepted by Python 2.6.
            print("  {0}".format(value.rstrip("\n")))


    def process(filename):
        tail = deque(maxlen=4)  # the last four lines
        script = None
        with open(filename) as instream:
            for line in instream:
                tail.append(line)
                if "Calling rdbms/admin" in line:
                    script = line
                elif re.search('"error(.)*13?"', line) is not None:
                    show("Reason of error:", tail[-1])
                    show("Script:", script)
                    show("Block of code:", *tail)
                    show(
                        "Solution",
                        "Verify the list of objects belonging to Database"
                    )
                    break


    if __name__ == "__main__":
        filename = sys.argv[1]
        process(filename)
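The single-loop version above relies on how a bounded deque behaves: once it holds maxlen items, every further append discards the oldest one. A small sketch of just that behaviour, using made-up strings in place of log lines:

```python
from collections import deque

tail = deque(maxlen=4)  # holds at most four items
for number in range(1, 8):
    # Once the deque is full, each append drops the oldest item on the left.
    tail.append("line {0}".format(number))

print(list(tail))
print(tail[-1])  # the most recently appended item
```

This is why the memory used for context stays constant no matter how large the file is: at any moment only the four most recent lines are retained.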
Re: [Tutor] Increase performance of the script
On 09/12/2018 10:15, Asad wrote:

> f4 = open (r" /A/B/file1.log ", 'r' )

Are you sure you want that space at the start of the filename?

> string2=f4.readlines()

Here you read the entire file into memory. OK for small files, but if it really can be 5GB that's a lot of memory being used.

> for i in range(len(string2)):

This is usually the wrong thing to do in Python. Aside from the loss of readability, it requires the interpreter to do a lot of indexing operations, which is not the fastest way to access things.

>     position=i
>     lastposition =position+1
>     while True:
>          if re.search('Calling rdbms/admin',string2[lastposition]):

You are using regex to search for a fixed string. It's simpler and faster to use string methods, either

    foo in string

or

    string.find(foo)

>           break
>          elif lastposition==len(string2)-1:
>           break
>          else:
>           lastposition += 1

This means you iterate over the whole file content multiple times. Once for every line in the file. If the file has 1000 lines, that means in the worst case you do these tests close to 1000**2/2 (about 500,000) times! This is probably your biggest performance issue.

>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] )

This use of regex is valid since it's a pattern. But it might be more efficient to join the lines and do a single regex search across line boundaries. But you need to test/time it to see.

But you also do another loop inside the outer loop. You need to look at how/whether you can eliminate all these inner loops and just loop over the file once - ideally without reading the entire thing into memory before you start. Processing it as you read it will be much more efficient. On a previous thread we showed you several ways you could approach that.

>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
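The advice to test and time things can be followed with the standard timeit module. The sketch below compares the three ways of looking for a fixed string mentioned above; the log line is invented for illustration, and the timings it prints will vary by machine and Python version, so treat them as a rough guide only.

```python
import re
import timeit

# Made-up log line; real log contents will differ.
line = "2018-12-09 10:15:00 Calling rdbms/admin/catalog.sql\n"
needle = "Calling rdbms/admin"

# All three tests answer the same yes/no question for a fixed string.
found_with_in = needle in line
found_with_find = line.find(needle) != -1
found_with_regex = re.search(re.escape(needle), line) is not None

# Rough timing of the substring test against the regex search.
t_in = timeit.timeit(lambda: needle in line, number=100000)
t_re = timeit.timeit(lambda: re.search(needle, line), number=100000)
print("in: {0:.4f}s  re.search: {1:.4f}s".format(t_in, t_re))
```

Running the same comparison against a few real lines from the log in question is the only reliable way to see how much the change matters there.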
[Tutor] Increase performance of the script
Hi All,

I have the following code to search for an error and print the solution.

/A/B/file1.log size may vary from 5 MB to 5 GB.

    f4 = open (r" /A/B/file1.log ", 'r' )
    string2=f4.readlines()
    for i in range(len(string2)):
        position=i
        lastposition =position+1
        while True:
             if re.search('Calling rdbms/admin',string2[lastposition]):
              break
             elif lastposition==len(string2)-1:
              break
             else:
              lastposition += 1
        errorcheck=string2[position:lastposition]
        for i in range ( len ( errorcheck ) ):
            if re.search ( r'"error(.)*13?"', errorcheck[i] ):
                print "Reason of error \n", errorcheck[i]
                print "script \n" , string2[position]
                print "block of code \n"
                print errorcheck[i-3]
                print errorcheck[i-2]
                print errorcheck[i-1]
                print errorcheck[i]
                print "Solution :\n"
                print "Verify the list of objects belonging to Database "
                break
        else:
            continue
        break

The problem I am facing is a performance issue: it takes some minutes to print out the solution. Please advise if there can be performance enhancements to this script.

Thanks,