Re: [Tutor] Increase performance of the script

2018-12-12 Thread Peter Otten
Steven D'Aprano wrote:

> [...]
>> In Python 2.6 the print statement works as print "Solution";
>> however, after import collections I have to use
>> print("Solution"). Is this a known issue?
> 
> As Peter says, you must have run
> 
> from __future__ import print_function
> 
> to see this behaviour. This has nothing to do with import collections.
> You can debug that for yourself by exiting the interactive interpreter,
> starting it up again, and trying to print before and after importing
> collections.

To be fair to Asad -- I sneaked the __future__ import into my sample
code. I did it to be able to write Python 3 code that would still run in his 
2.6 interpreter. 

In hindsight that was not a good idea as it can confuse someone who has 
never seen it, and the OP has yet to learn other more important things.

 


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Increase performance of the script

2018-12-12 Thread Steven D'Aprano
On Wed, Dec 12, 2018 at 12:52:09AM -0500, Avi Gross wrote:
> Asad,
> 
> I wonder if an import from __future__ happened, perhaps in the version of
> collections you used. Later versions of 2.x allow optional use of the 3.x
> style of print.


The effect of __future__ imports, like any other import, is only within 
the module that actually does the import. Even in the unlikely event 
that collections did such a future import, it would only affect print
within that module, not globally or in the interactive interpreter.

Here's a demo:

# prfunc_demo.py
from __future__ import print_function

try:
    exec("print 123")
except SyntaxError:
    print("old style print failed, as expected")
    print("as print is now a function")



And importing it into the interactive interpreter shows that the effect 
of the future import is localised:

[steve@ando ~]$ python2.6
Python 2.6.7 (r267:88850, Mar 10 2012, 12:32:58)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> import prfunc_demo
old style print failed, as expected
as print is now a function
py> print "But here in the REPL, nothing has changed."
But here in the REPL, nothing has changed.







-- 
Steve


Re: [Tutor] Increase performance of the script

2018-12-12 Thread Avi Gross
Asad,

I wonder if an import from __future__ happened, perhaps in the version of
collections you used. Later versions of 2.x allow optional use of the 3.x
style of print.

When you redefine print, the old statement style is hidden or worse.




Re: [Tutor] Increase performance of the script

2018-12-11 Thread Steven D'Aprano
On Tue, Dec 11, 2018 at 09:07:58PM +0530, Asad wrote:
> Hi All,
> 
>   I used your solution, but found a strange issue with deque:

No you haven't. You found a *syntax error*, as the exception says:

> >>> print 'Deque:', d
>   File "<stdin>", line 1
>     print 'Deque:', d
>                  ^
> SyntaxError: invalid syntax

which means the error occurs before the interpreter runs the code. You 
could replace the above line with any similar line:

print 'Not a deque', 1.2345

and you will get the same error.
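
One way to see that the failure happens at compile time, before anything
runs, is to hand the source text to the built-in compile(). This is only a
sketch: it assumes Python 3 (where print is always a function, much like 2.6
with the future import active), and is_syntax_error is a made-up helper
name, not something from this thread.

```python
def is_syntax_error(source):
    """Return True if `source` fails to compile (nothing is executed)."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return True
    return False


# The name d does not need to exist: the source is only compiled, never run.
print(is_syntax_error("print 'Deque:', d"))            # True on Python 3
print(is_syntax_error("print 'Not a deque', 1.2345"))  # True on Python 3
print(is_syntax_error("print('Deque:', d)"))           # False
```

Swapping the deque for a float changes nothing, which points at the print
syntax rather than at the object being printed.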

When you are faced with an error in the interactive interpreter, you 
should try different things to see how they affect the problem. Does the
problem go away if you use a float instead of a deque? If you change the 
string, does the problem go away? If you swap the order, does the 
problem go away? What if you use a single value instead of two?

This is called "debugging", and as a programmer, you need to learn how 
to do this.


[...]
> In Python 2.6 the print statement works as print "Solution";
> however, after import collections I have to use print("Solution").
> Is this a known issue?

As Peter says, you must have run 

from __future__ import print_function

to see this behaviour. This has nothing to do with import collections.
You can debug that for yourself by exiting the interactive interpreter,
starting it up again, and trying to print before and after importing
collections.



-- 
Steve


[Tutor] Increase performance of the script

2018-12-11 Thread Asad
Hi All,

  I used your solution, but found a strange issue with deque:

I am using Python 2.6.6:

>>> import collections
>>> d = collections.deque('abcdefg')
>>> print 'Deque:', d
  File "<stdin>", line 1
    print 'Deque:', d
                 ^
SyntaxError: invalid syntax
>>> print ('Deque:', d)
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> print d
  File "<stdin>", line 1
    print d
          ^
SyntaxError: invalid syntax
>>> print (d)
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In Python 2.6 the print statement works as print "Solution";

however, after import collections I have to use print("Solution").
Is this a known issue?

Please let me know.

Thanks,


Re: [Tutor] Increase performance of the script

2018-12-09 Thread Steven D'Aprano
On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> Hi All ,
> 
>   I have the following code to search for an error and print the
> solution.

Please tidy your code before asking for help optimizing it. We're 
volunteers, not being paid to work on your problem, and your code is too 
hard to understand.

Some comments:


> f4 = open (r" /A/B/file1.log  ", 'r' )
> string2=f4.readlines()

You have a variable "f4". Where are f1, f2 and f3?

You have a variable "string2", which is a lie, because it is not a 
string, it is a list.

I will be very surprised if the file name you show is correct. It has a 
leading space, and two trailing spaces.


> for i in range(len(string2)):
>     position=i

Poor style. In Python, you almost never need to write code that iterates 
over the indexes (this is not Pascal). You don't need the assignment 
position=i. Better:

for position, line in enumerate(lines):
    ...


>     lastposition =position+1

Poorly named variable. You call it "last position", but it is actually 
the NEXT position.


>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):

Unnecessary use of regex, which will be slow. Better:

if 'Calling rdbms/admin' in line:
    break
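
For a fixed piece of text with no regex metacharacters, the two tests give
the same answer, so the plain substring test is a safe replacement. A small
sketch to check that on sample data; the helper names and the two log lines
are invented for illustration, only the marker text comes from the script.

```python
import re

MARKER = "Calling rdbms/admin"  # fixed text taken from the original script


def has_marker_regex(line):
    # What the original script does: a regex search for literal text.
    return re.search(MARKER, line) is not None


def has_marker_substring(line):
    # The suggested replacement: a plain substring test.
    return MARKER in line


sample = [
    "Calling rdbms/admin/catalog.sql\n",  # hypothetical log line
    "some unrelated output\n",            # hypothetical log line
]
results = [(has_marker_regex(l), has_marker_substring(l)) for l in sample]
print(results)
```

To compare speeds on real data, the standard timeit module can time both
helpers; no measurements were posted in this thread, so none are claimed here.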


>             break
>         elif lastposition==len(string2)-1:
>             break

If you iterate over the lines, you don't need to check for the end of 
the list yourself.


A better solution is to use the *accumulator* design pattern to collect 
a block of lines for further analysis:

# Untested.
with open(filename, 'r') as f:
    block = []
    inside_block = False
    for line in f:
        line = line.strip()
        if inside_block:
            if line == "End of block":
                inside_block = False
                process(block)
                block = []  # Reset to collect the next block.
            else:
                block.append(line)
        elif line == "Start of block":
            inside_block = True
    # At the end of the loop, we might have a partial block.
    if block:
        process(block)


Your process() function takes a single argument, the list of lines which 
makes up the block you care about.
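
The same accumulator idea can be tried without a real log file. The sketch
below is a variation, not the code above: it yields each block from a
generator instead of calling process(), it assumes Python 3, and iter_blocks
plus the sample text are invented for the demonstration.

```python
import io


def iter_blocks(lines, start="Start of block", end="End of block"):
    """Yield each block as a list of stripped lines between the two markers."""
    block = []
    inside_block = False
    for line in lines:
        line = line.strip()
        if inside_block:
            if line == end:
                inside_block = False
                yield block
                block = []  # Reset to collect the next block.
            else:
                block.append(line)
        elif line == start:
            inside_block = True
    if block:  # A partial block left over at the end of the input.
        yield block


# io.StringIO stands in for an open log file.
log = io.StringIO(
    "noise\n"
    "Start of block\n"
    "first\n"
    "second\n"
    "End of block\n"
    "Start of block\n"
    "unfinished\n"
)
blocks = list(iter_blocks(log))
print(blocks)  # [['first', 'second'], ['unfinished']]
```

Only one block is held in memory at a time, which is the point of the
pattern for a file that may be several gigabytes.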

If you need to know the line numbers, it is easy to adapt:

for line in f:

becomes:

for linenumber, line in enumerate(f):
    # The next line is not needed if you write enumerate(f, 1) instead
    # (the start argument is available from Python 2.6 on).
    linenumber += 1  # Adjust to start line numbers at 1 instead of 0

and:
 
block.append(line)

becomes 

block.append((linenumber, line))
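
As a quick check of the line-number variant, here is a sketch that uses the
optional start argument of enumerate() so that counting begins at 1. It
assumes Python 3 for io.StringIO with plain strings; the sample text is made
up.

```python
import io

log = io.StringIO("alpha\nbeta\ngamma\n")  # stands in for an open file

numbered = []
for linenumber, line in enumerate(log, 1):  # start counting at 1
    numbered.append((linenumber, line.strip()))

print(numbered)  # [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]
```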


If you re-write your code using this accumulator pattern, using ordinary
substring matching and equality instead of regular expressions whenever
possible, I expect you will see greatly improved performance (and code that
is much, much easier to understand and maintain).



-- 
Steve


Re: [Tutor] Increase performance of the script

2018-12-09 Thread Steven D'Aprano
On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> Hi All ,
> 
>   I have the following code to search for an error and print the
> solution.
> 
> The size of /A/B/file1.log may vary from 5 MB to 5 GB.
[...]

> The problem I am facing is a performance issue: it takes some minutes to
> print out the solution. Please advise if there can be performance
> enhancements to this script.

How many minutes is "some"? If it takes 2 minutes to analyse a 5GB file, 
that's not bad performance. If it takes 2 minutes to analyse a 5MB file, 
that's not so good.
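
To put a number on "some", one option is to time the run and relate it to
the file size. The sketch below is only an illustration: time_call is a
made-up helper, it assumes Python 3.3 or later for time.perf_counter(), and
count_lines is a stand-in for the real script.

```python
import time


def time_call(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def count_lines(text):
    # Stand-in workload; replace with the real log analysis.
    return len(text.splitlines())


sample = "some log line\n" * 100000
result, elapsed = time_call(count_lines, sample)
print("%d lines in %.4f seconds" % (result, elapsed))
```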



-- 
Steve


Re: [Tutor] Increase performance of the script

2018-12-09 Thread Peter Otten
Asad wrote:

> Hi All ,
> 
>   I have the following code to search for an error and print the
> solution.
> 
> The size of /A/B/file1.log may vary from 5 MB to 5 GB.
> 
> f4 = open (r" /A/B/file1.log  ", 'r' )
> string2=f4.readlines()

Do not read the complete file into memory. Read one line at a time and keep 
only those lines around that you may have to look at again.

> for i in range(len(string2)):
>     position=i
>     lastposition =position+1
>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):
>             break
>         elif lastposition==len(string2)-1:
>             break
>         else:
>             lastposition += 1

You are trying to find a group of lines. With your approach, for a file of the
structure

foo
bar
baz
end-of-group-1
ham
spam
end-of-group-2

you find the groups

foo
bar
baz
end-of-group-1

bar
baz
end-of-group-1

baz
end-of-group-1

ham
spam
end-of-group-2

spam
end-of-group-2

That looks like a lot of redundancy which you can probably avoid. But 
wait...


>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break

you throw away almost all of that hard work and print only four lines?
It looks like you only need the "error...13" line, the three lines that
precede it, and the last "Calling..." line occurring before the
"error...13".

> The problem I am facing is a performance issue: it takes some minutes to
> print out the solution. Please advise if there can be performance
> enhancements to this script.

If you want to learn the Python way you should try hard to write your 
scripts without a single

for i in range(...):
    ...

loop. This style is usually the last resort. It may work for small datasets,
but as soon as you have to deal with large files, performance dives.
Even worse, these loops tend to make your code hard to debug.

Below is a suggestion for an implementation of what your code seems to be
doing that only remembers the four most recent lines and works with a single
loop. If that saves you some time, use that time to clean the scripts you
have lying around from occurrences of "for i in range(): ..." ;)


from __future__ import print_function

import re
import sys
from collections import deque


def show(prompt, *values):
    print(prompt)
    for value in values:
        # {0} rather than {}: auto-numbered fields need Python 2.7 or later.
        print(" {0}".format(value.rstrip("\n")))


def process(filename):
    tail = deque(maxlen=4)  # the last four lines
    script = None
    with open(filename) as instream:
        for line in instream:
            tail.append(line)
            if "Calling rdbms/admin" in line:
                script = line
            elif re.search('"error(.)*13?"', line) is not None:
                show("Reason of error:", tail[-1])
                # script is still None if no "Calling..." line came first.
                show("Script:", script or "(no script line seen)")
                show("Block of code:", *tail)
                show(
                    "Solution",
                    "Verify the list of objects belonging to Database"
                )
                break


if __name__ == "__main__":
    filename = sys.argv[1]
    process(filename)
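
The bounded deque is what keeps memory use flat however large the file is:
once it holds four items, every append silently drops the oldest one. A
minimal sketch of just that behaviour, with invented sample strings:

```python
from collections import deque

tail = deque(maxlen=4)  # keeps at most the four most recent items
for number in range(1, 7):
    tail.append("line %d" % number)

print(list(tail))  # ['line 3', 'line 4', 'line 5', 'line 6']
print(tail[-1])    # line 6
```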




Re: [Tutor] Increase performance of the script

2018-12-09 Thread Alan Gauld via Tutor
On 09/12/2018 10:15, Asad wrote:

> f4 = open (r" /A/B/file1.log  ", 'r' )

Are you sure you want that space at the start of the filename?


> string2=f4.readlines()

Here you read the entire file into memory. OK for small files
but if it really can be 5GB that's a lot of memory being used.

> for i in range(len(string2)):

This is usually the wrong thing to do in Python. Aside
from the loss of readability it requires the interpreter
to do a lot of indexing operations which is not the
fastest way to access things.

>     position=i
>     lastposition =position+1
>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):

You are using regex to search for a fixed string.
It's simpler and faster to use string operations:
either foo in string or string.find(foo).

>             break
>         elif lastposition==len(string2)-1:
>             break
>         else:
>             lastposition += 1

This means you iterate over the file content multiple times,
once for every line in the file.
If the file has 1000 lines that means you could do these
tests close to 1000*1000/2 (about 500,000) times!

This is probably your biggest performance issue.
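
A rough way to check that estimate is to count the tests instead of running
them. The sketch below assumes the worst case, where the marker line is
never found and every scan runs to the end of the list; both function names
are invented for the illustration.

```python
def count_tests_nested(n_lines):
    """Tests done when every line starts a fresh scan to the end of the list."""
    tests = 0
    for position in range(n_lines):
        lastposition = position + 1
        while lastposition < n_lines:
            tests += 1  # one marker test per line scanned
            lastposition += 1
    return tests


def count_tests_single_pass(n_lines):
    """Tests done when each line is looked at exactly once."""
    return n_lines


print(count_tests_nested(1000))       # 499500
print(count_tests_single_pass(1000))  # 1000
```

The nested count grows with the square of the file length, so a log ten
times longer costs about a hundred times as many tests.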

>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] ):

This use of regex is valid since it's a pattern.
But it might be more efficient to join the lines
and do a single regex search across line boundaries.
But you need to test/time it to see.

But you also do another loop inside the outer loop.
You need to look at how/whether you can eliminate
all these inner loops and just loop over the file
once - ideally without reading the entire thing
into memory before you start.

Processing it as you read it will be much more efficient.
On a previous thread we showed you several ways you
could approach that.

>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break



-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] Increase performance of the script

2018-12-09 Thread Asad
Hi All ,

  I have the following code to search for an error and print the
solution.

The size of /A/B/file1.log may vary from 5 MB to 5 GB.

f4 = open (r" /A/B/file1.log  ", 'r' )
string2=f4.readlines()
for i in range(len(string2)):
    position=i
    lastposition =position+1
    while True:
        if re.search('Calling rdbms/admin',string2[lastposition]):
            break
        elif lastposition==len(string2)-1:
            break
        else:
            lastposition += 1
    errorcheck=string2[position:lastposition]
    for i in range ( len ( errorcheck ) ):
        if re.search ( r'"error(.)*13?"', errorcheck[i] ):
            print "Reason of error \n", errorcheck[i]
            print "script \n" , string2[position]
            print "block of code \n"
            print errorcheck[i-3]
            print errorcheck[i-2]
            print errorcheck[i-1]
            print errorcheck[i]
            print "Solution :\n"
            print "Verify the list of objects belonging to Database "
            break
    else:
        continue
    break

The problem I am facing is a performance issue: it takes some minutes to
print out the solution. Please advise if there can be performance
enhancements to this script.

Thanks,