fynali wrote:
[bonono]
Have you tried the explicit loop variant with psyco?
Sure, I wouldn't mind trying; can you suggest some code snippets along
the lines of what I should try?
[fynali]
Needless to say, I'm utterly new to Python, and my programming
skills and know-how
$ cat cleanup.py
#!/usr/bin/python
postpaid_file = open('/home/oracle/stc/test/PSP333')
outfile = open('/home/oracle/stc/test/PSP-CBR.dat', 'w')
barred = {}
for number in open('/home/oracle/stc/test/CBR333'):
    barred[number] = None # just add it as a key
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python
import psyco
psyco.full()
postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w')
barred = {}
for number in
Sorry, please read that as ~15 secs.
--
http://mail.python.org/mailman/listinfo/python-list
fynali wrote:
Sorry, please read that as ~15 secs.
That is more or less as fast as it gets. set() is faster than dict(),
about 2x on my machine, and I assume a portion of your time goes into
creating the set/dict, as it is a pretty large data set.
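To check the set-versus-dict claim on your own machine, `timeit` is the usual tool. The sketch below is written for modern Python 3 and compares `set()` against the explicit dict loop from cleanup.py; whether that is exactly what bonono measured is an assumption, and the 100,000 synthetic 11-digit numbers are made up to resemble the barred file. No timings are shown because they depend entirely on the machine.

```python
import timeit

# Hypothetical synthetic data: 100,000 newline-terminated 11-digit numbers,
# shaped like the lines read from the CBR barred-number file.
lines = ["%011d\n" % n for n in range(96650000000, 96650100000)]


def build_set():
    # set(file) style: every line, trailing newline included, is an element.
    return set(lines)


def build_dict_loop():
    # The dict-as-set idiom from cleanup.py: each line becomes a key.
    barred = {}
    for number in lines:
        barred[number] = None
    return barred


if __name__ == "__main__":
    for func in (build_set, build_dict_loop):
        seconds = timeit.timeit(func, number=10)
        print("%s: %.3f s for 10 builds" % (func.__name__, seconds))
```

Both functions hold the same keys; only the construction cost differs, which is the part of the run time bonono suspects.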
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python
#import psyco
#psyco.full()
postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w')
barred = {}
for number in
b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
--
$ time ./cleanup_ray.py
real    0m5.451s
user    0m4.496s
sys
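In Raymond's one-liner, `itertools.ifilterfalse(b.__contains__, lines)` yields exactly the lines for which `b.__contains__(line)` returns false, i.e. the lines that are not in the barred set. In Python 3 the function is called `itertools.filterfalse` and the `file()` builtin is gone. This small sketch, with numbers borrowed from the head of PSP320.dat and a made-up barred subset, checks that it agrees with the equivalent generator expression:

```python
from itertools import filterfalse  # Python 3 name of ifilterfalse

# Sample lines standing in for the PSP333 and CBR333 files (subset made up).
postpaid = ["96653696338\n", "96653766996\n", "96654609431\n", "96654722608\n"]
barred = set(["96653766996\n", "96654722608\n"])

# Keep each line for which barred.__contains__(line) returns False.
kept_filterfalse = list(filterfalse(barred.__contains__, postpaid))

# The same filter spelled as a generator expression.
kept_genexp = list(line for line in postpaid if line not in barred)

print(kept_filterfalse == kept_genexp)  # prints True
```

Passing the bound method `b.__contains__` avoids a Python-level lambda call per line, which is presumably part of why the one-liner is fast.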
On 13 Jan 2006 23:17:05 -0800, [EMAIL PROTECTED] wrote:
fynali wrote:
$ cat cleanup_ray.py
#!/usr/bin/python
import itertools
b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
On 01/12/2006-09:04AM, fynali wrote:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
fgrep -x -v -f CBR319.dat PSP320.dat > PSP-CBR.dat
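`fgrep -x -v -f CBR319.dat PSP320.dat` reads each line of CBR319.dat as a fixed string (`-f`), requires it to match a whole line (`-x`), and prints the PSP320.dat lines that match none of them (`-v`). The Python versions in this thread compare raw lines with the trailing newline included, so a final line without a newline would not match its twin. A sketch that strips line endings before comparing, assuming the numbers never carry meaningful trailing whitespace (the function name is hypothetical):

```python
def remove_barred(postpaid_lines, barred_lines):
    """Yield postpaid lines whose number is not barred, roughly like
    fgrep -x -v -f: whole-line, fixed-string matching."""
    # Strip "\n" or "\r\n" so a missing final newline does not matter.
    barred = set(line.rstrip("\r\n") for line in barred_lines)
    for line in postpaid_lines:
        if line.rstrip("\r\n") not in barred:
            yield line  # keep the original line, ending included


# The barred entry deliberately lacks its trailing newline.
postpaid = ["96653696338\n", "96653766996\n", "96654609431\n"]
barred = ["96653766996"]
print(list(remove_barred(postpaid, barred)))
```

Without the `rstrip`, "96653766996" and "96653766996\n" would be different strings and the barred number would slip through.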
The code is down to 5 lines!
#!/usr/bin/python
barred = set(open('/home/sajid/python/wip/CBR319.dat'))
postpaid_file = open('/home/sajid/python/wip/PSP320.dat')
outfile = open('/home/sajid/python/wip/PSP-CBR.dat', 'w')
outfile.writelines(number for number in postpaid_file if number not in barred)
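The five-line script relies on the interpreter to close and flush `outfile` when it exits. A sketch of the same set-plus-generator-expression approach in modern Python 3, using `with` blocks so every file is closed explicitly; the function name `cleanup` and the demo file names are hypothetical, not fynali's:

```python
import os
import tempfile


def cleanup(barred_path, postpaid_path, out_path):
    """Copy postpaid_path to out_path, skipping every line found in barred_path."""
    with open(barred_path) as barred_file:
        barred = set(barred_file)  # whole lines, newline included
    # Both files are closed, and the output flushed, when the block exits.
    with open(postpaid_path) as postpaid_file, open(out_path, "w") as outfile:
        outfile.writelines(number for number in postpaid_file
                           if number not in barred)


if __name__ == "__main__":
    # Hypothetical demo files in a temporary directory, not the real data.
    tmp = tempfile.mkdtemp()
    cbr, psp, out = (os.path.join(tmp, name) for name in ("CBR", "PSP", "OUT"))
    with open(cbr, "w") as f:
        f.write("96653766996\n")
    with open(psp, "w") as f:
        f.write("96653696338\n96653766996\n96654609431\n")
    cleanup(cbr, psp, out)
    with open(out) as f:
        print(f.read(), end="")
```

With the real data this would be called as `cleanup('CBR319.dat', 'PSP320.dat', 'PSP-CBR.dat')`. The postpaid file is still streamed line by line; only the barred numbers are held in memory.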
Fredrik Lundh wrote:
Steve Holden wrote:
looks like premature non-optimization to me...
It might be quicker to establish a dict whose keys are the barred
numbers and use that, rather than a list, to determine whether the input
numbers should make it through.
what do you think
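Steve's suggestion comes down to how `in` works: on a list it compares the candidate against each element in turn, so every test costs time proportional to the list length, while on a dict (or set) it hashes the key and looks it up directly, at roughly constant cost on average. A rough Python 3 sketch with 20,000 made-up numbers; the timings are left unprinted here because they vary by machine:

```python
import timeit

# Hypothetical barred numbers; the real CBR file is far larger.
barred_list = ["%011d\n" % n for n in range(96650000000, 96650020000)]
barred_dict = dict.fromkeys(barred_list)  # keys are the numbers, values None

# Worst case for the list: the number sits at the very end.
probe = barred_list[-1]

if __name__ == "__main__":
    # "in" on a list scans element by element (O(n) per test);
    # "in" on a dict hashes the key and looks it up (O(1) on average).
    list_time = timeit.timeit(lambda: probe in barred_list, number=200)
    dict_time = timeit.timeit(lambda: probe in barred_dict, number=200)
    print("list: %.4f s   dict: %.6f s" % (list_time, dict_time))
```

With millions of PSP lines each tested against a list of hundreds of thousands of barred numbers, the list version does work proportional to the product of the two sizes, which is why switching the container matters far more than micro-tuning the loop.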
On 12 Jan 2006 22:29:22 -0800
Raymond Hettinger [EMAIL PROTECTED] wrote:
AJL wrote:
How fast does this run?
a = set(file('PSP320.dat'))
b = set(file('CBR319.dat'))
file('PSP-CBR.dat', 'w').writelines(a.difference(b))
Turning PSP into a set takes extra time, consumes
$ time fgrep -x -v -f CBR333 PSP333 > PSP-CBR.dat.fgrep
real    0m31.551s
user    0m16.841s
sys     0m0.912s
--
$ time ./cleanup.py
real    0m6.080s
user    0m4.836s
sys     0m0.408s
--
$ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python
$ cat cleanup_ray.py
#!/usr/bin/python
import itertools
b = set(file('/home/sajid/python/wip/stc/2/CBR333'))
file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))
--
$ time ./cleanup_ray.py
--
$ ./cleanup.py
Traceback (most recent call last):
File "./cleanup.py", line 3, in ?
import itertools
ImportError: No module named itertools
--
$ time ./cleanup.py
File "./cleanup.py", line 8
outfile.writelines(number for number in postpaid_file
[bonono]
Have you tried the explicit loop variant with psyco?
Sure, I wouldn't mind trying; can you suggest some code snippets along
the lines of what I should try?
[fynali]
Needless to say, I'm utterly new to Python, and my programming
skills and know-how are rudimentary.
(-:
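One possible shape for the "explicit loop variant" bonono asks about is sketched below: a plain `for` loop with set membership, placed inside a function because psyco is generally reported to compile functions rather than module-level code (treat that as an assumption). psyco only ever existed for Python 2, so the import is optional and the same code runs unaccelerated without it. The function name and its `write` parameter are hypothetical.

```python
try:
    import psyco  # optional Python 2 specializing compiler; absent on Python 3
    psyco.full()
except ImportError:
    psyco = None  # no psyco: the same code simply runs unaccelerated


def filter_numbers(postpaid_lines, barred_lines, write):
    """Explicit-loop filter: call write(line) for each line not in barred_lines."""
    barred = set(barred_lines)
    for number in postpaid_lines:
        if number not in barred:
            write(number)
```

A call would look like `filter_numbers(open(psp_path), open(cbr_path), outfile.write)`, where `psp_path`, `cbr_path` and `outfile` are placeholders for the real files. Timing it with and without psyco installed is the only reliable way to see whether the explicit loop actually beats the generator-expression version.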
fynali wrote:
Is it possible to rewrite Raymond's or Fredrik's suggestions above so
that they still give me the same time savings?
Python 2.2 doesn't have a ready-made set type (new in 2.3), and it doesn't
support generator expressions (the thing that caused the syntax error).
However, using a
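For Python 2.2, which has no set type, no generator expressions and no itertools module (the ImportError above), one workable sketch, not necessarily what Fredrik went on to suggest, is a dictionary used as a set plus explicit loops. Iterating over a file and testing `key in dict` both date from 2.2, so the code should run there, and it also runs unchanged on Python 3. The function name is hypothetical.

```python
def write_unbarred(postpaid_file, barred_file, outfile):
    """Python 2.2-compatible filter: no set type, no generator expressions."""
    # A dict whose keys are the barred numbers stands in for a set.
    barred = {}
    for number in barred_file:
        barred[number] = None
    # Explicit loop instead of outfile.writelines(<generator expression>).
    for number in postpaid_file:
        if number not in barred:
            outfile.write(number)
```

Usage: `write_unbarred(open('PSP320.dat'), open('CBR319.dat'), open('PSP-CBR.dat', 'w'))`. Dict lookups are hash-based like set lookups, so most of the speed-up over a list should survive, even if the dict is somewhat slower to build.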
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320.dat CBR319.dat
== PSP320.dat ==
96653696338
96653766996
96654609431
96654722608
On 12 Jan 2006 09:04:21 -0800, fynali [EMAIL PROTECTED] wrote:
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320.dat CBR319.dat
== PSP320.dat
On 12/01/06, Tim Williams (gmail) [EMAIL PROTECTED] wrote:
On 12 Jan 2006 09:04:21 -0800, fynali
[EMAIL PROTECTED] wrote:
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# head PSP320.dat
fynali wrote:
Objective: to remove the numbers present in the barred list from the
PSP file.
$ ls -lh PSP320.dat CBR319.dat
... 56M Dec 28 19:41 PSP320.dat
... 8.6M Dec 28 19:40 CBR319.dat
$ wc -l PSP320.dat CBR319.dat
4,462,603
[fynali]
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
# print all non-barred mobile phone numbers
barred = set(open('CBR319.dat'))
for num in open('PSP320.dat'):
    if num not in
Fredrik Lundh wrote:
fynali wrote:
Objective: to remove the numbers present in the barred list from the
PSP file.
$ ls -lh PSP320.dat CBR319.dat
... 56M Dec 28 19:41 PSP320.dat
... 8.6M Dec 28 19:40 CBR319.dat
$ wc -l PSP320.dat CBR319.dat
4,462,603
Steve Holden wrote:
looks like premature non-optimization to me...
It might be quicker to establish a dict whose keys are the barred
numbers and use that, rather than a list, to determine whether the input
numbers should make it through.
what do you think
barred =
On 12 Jan 2006 09:04:21 -0800
fynali [EMAIL PROTECTED] wrote:
Hi all,
I have two files:
- PSP320.dat (quite a large list of mobile numbers),
- CBR319.dat (a subset of the above, a list of barred numbers)
...
Objective: to remove the numbers present in barred-list from the
fynali [EMAIL PROTECTED] writes:
Hi all,
I have two files:
Others have pointed out the Python solution - use a set instead of a
list for membership testing. I want to point out a better Unix
solution ('cause I probably wouldn't have written a Python program to
do this):
Objective: to remove
AJL wrote:
How fast does this run?
a = set(file('PSP320.dat'))
b = set(file('CBR319.dat'))
file('PSP-CBR.dat', 'w').writelines(a.difference(b))
Turning PSP into a set takes extra time, consumes unnecessary memory,
eliminates duplicates (possibly a bad thing), and loses the original
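Raymond's objection to `a.difference(b)` is easy to see on a tiny example. In this sketch the numbers come from the head of PSP320.dat, but the duplicate and the ordering are made up: the set difference collapses the duplicate and returns an unordered set, while a streaming filter keeps every postpaid line in its original order.

```python
# Made-up arrangement: one postpaid number appears twice, lines not sorted.
postpaid = ["96654609431\n", "96653696338\n", "96654609431\n", "96653766996\n"]
barred = ["96653766996\n"]

# a.difference(b): duplicates collapse and the result has no defined order.
via_difference = set(postpaid).difference(set(barred))

# Streaming filter: duplicates and the original file order are preserved.
barred_set = set(barred)
via_filter = [line for line in postpaid if line not in barred_set]

print(sorted(via_difference))
print(via_filter)
```

The filter also avoids building a set of the entire 56M PSP file, which is the "unnecessary memory" part of the objection.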