Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote: [bonono] Have you tried the explicit loop variant with psyco? Sure, I wouldn't mind trying; can you suggest some code snippets along the lines of which I should try...? [fynali] Needless to say, I'm utterly new to Python and my programming skills and know-how

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup.py #!/usr/bin/python postpaid_file = open('/home/oracle/stc/test/PSP333') outfile = open('/home/oracle/stc/test/PSP-CBR.dat', 'w') barred = {} for number in open('/home/oracle/stc/test/CBR333'): barred[number] = None # just add it as a key

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python import psyco psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in

Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote: $ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python import psyco psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python import psyco psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
Sorry, pls read that ~15 secs. -- http://mail.python.org/mailman/listinfo/python-list

Re: How to remove subset from a file efficiently?

2006-01-14 Thread bonono
fynali wrote: Sorry, pls read that ~15 secs. That is more or less about it. set() is faster than dict(), about 2x on my machine, and I assume a portion of your time goes into set/dict creation, as it is a pretty large data set.
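
The set-versus-dict claim above is easy to check locally. What follows is a rough Python 3 sketch using timeit; the generated numbers and the sample size are assumptions made for illustration, not the thread's data, and the ratio will differ by machine and interpreter version.

```python
import timeit

# Hypothetical stand-in for the barred-number file: one number per line.
lines = ["%011d\n" % n for n in range(96650000000, 96650200000)]

def build_set():
    # Build the lookup structure in one call.
    return set(lines)

def build_dict():
    # Key-only dict filled in an explicit loop, as in the thread's cleanup.py.
    barred = {}
    for number in lines:
        barred[number] = None
    return barred

set_time = timeit.timeit(build_set, number=5)
dict_time = timeit.timeit(build_dict, number=5)
print("set:  %.3fs" % set_time)
print("dict: %.3fs" % dict_time)
```

Both structures answer membership tests the same way, so any difference measured here is construction cost only.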

Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py #!/usr/bin/python #import psyco #psyco.full() postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333') outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w') barred = {} for number in

Re: How to remove subset from a file efficiently?

2006-01-14 Thread Raymond Hettinger
b = set(file('/home/sajid/python/wip/stc/2/CBR333')) file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333'))) -- $ time ./cleanup_ray.py real 0m5.451s user 0m4.496s sys

Re: How to remove subset from a file efficiently?

2006-01-14 Thread Bengt Richter
On 13 Jan 2006 23:17:05 -0800, [EMAIL PROTECTED] wrote: fynali wrote: $ cat cleanup_ray.py #!/usr/bin/python import itertools b = set(file('/home/sajid/python/wip/stc/2/CBR333'))

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Christopher Weimann
On 01/12/2006-09:04AM, fynali wrote: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) fgrep -x -v -f CBR319.dat PSP320.dat > PSP-CBR.dat

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
The code is down to 5 lines! #!/usr/bin/python barred = set(open('/home/sajid/python/wip/CBR319.dat')) postpaid_file = open('/home/sajid/python/wip/PSP320.dat') outfile = open('/home/sajid/python/wip/PSP-CBR.dat', 'w') outfile.writelines(number for number in

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Steve Holden
Fredrik Lundh wrote: Steve Holden wrote: looks like premature non-optimization to me... It might be quicker to establish a dict whose keys are the barred numbers and use that, rather than a list, to determine whether the input numbers should make it through. what do you think

Re: How to remove subset from a file efficiently?

2006-01-13 Thread AJL
On 12 Jan 2006 22:29:22 -0800 Raymond Hettinger [EMAIL PROTECTED] wrote: AJL wrote: How fast does this run? a = set(file('PSP320.dat')) b = set(file('CBR319.dat')) file('PSP-CBR.dat', 'w').writelines(a.difference(b)) Turning PSP into a set takes extra time, consumes

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ time fgrep -x -v -f CBR333 PSP333 > PSP-CBR.dat.fgrep real 0m31.551s user 0m16.841s sys 0m0.912s -- $ time ./cleanup.py real 0m6.080s user 0m4.836s sys 0m0.408s -- $ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ cat cleanup_ray.py #!/usr/bin/python import itertools b = set(file('/home/sajid/python/wip/stc/2/CBR333')) file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333'))) -- $ time ./cleanup_ray.py
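
The script above is Python 2 (`file()` and `itertools.ifilterfalse`). A possible Python 3 rendering of the same idea is sketched below; the function name `remove_barred` and its three path parameters are hypothetical, introduced only for this example.

```python
import itertools

def remove_barred(barred_path, source_path, out_path):
    """Write every line of source_path that does not appear in barred_path."""
    with open(barred_path) as barred_file:
        # Lines keep their trailing newline, so whole lines are compared.
        barred = set(barred_file)
    with open(source_path) as source, open(out_path, "w") as out:
        # filterfalse yields the lines for which barred.__contains__ is False,
        # streaming the source file instead of loading it into memory.
        out.writelines(itertools.filterfalse(barred.__contains__, source))
```

Only the barred list is held in memory; the larger file is read and written line by line, so order and duplicates in the source are preserved.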

Re: How to remove subset from a file efficiently?

2006-01-13 Thread bonono
fynali wrote: $ cat cleanup_ray.py #!/usr/bin/python import itertools b = set(file('/home/sajid/python/wip/stc/2/CBR333')) file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333'))) -- $ time

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
-- $ ./cleanup.py Traceback (most recent call last): File "./cleanup.py", line 3, in ? import itertools ImportError: No module named itertools -- $ time ./cleanup.py File "./cleanup.py", line 8 outfile.writelines(number for number in postpaid_file

Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
[bonono] Have you tried the explicit loop variant with psyco? Sure, I wouldn't mind trying; can you suggest some code snippets along the lines of which I should try...? [fynali] Needless to say, I'm utterly new to Python and my programming skills and know-how are rudimentary. (-:

Re: How to remove subset from a file efficiently?

2006-01-13 Thread Fredrik Lundh
fynali wrote: Is a rewrite possible of Raymond's or Fredrik's suggestions above which will still give me the time saving made? Python 2.2 doesn't have a readymade set type (new in 2.3), and it doesn't support generator expressions (the thing that caused the syntax error). however, using a
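
The reply above is cut off, so the following is not Fredrik's code. It is a sketch of one way to avoid both a set type and generator expressions, in the spirit of the dict-based cleanup.py posted elsewhere in the thread; the function name `remove_barred_dict` is hypothetical, and the sketch has only been reasoned through for a current Python.

```python
def remove_barred_dict(barred_lines, source_lines):
    """Return the lines of source_lines that are not in barred_lines."""
    # A dict used purely for its keys stands in for a set.
    barred = {}
    for number in barred_lines:
        barred[number] = None
    # An explicit loop stands in for a generator expression.
    kept = []
    for number in source_lines:
        if number not in barred:
            kept.append(number)
    return kept
```

Both arguments can be any iterable of lines, including open file objects, and the result can be handed to `writelines()` on the output file.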

How to remove subset from a file efficiently?

2006-01-12 Thread fynali
Hi all, I have two files: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) # head PSP320.dat CBR319.dat ==> PSP320.dat <== 96653696338 96653766996 96654609431 96654722608

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Tim Williams (gmail)
On 12 Jan 2006 09:04:21 -0800, fynali [EMAIL PROTECTED] wrote: Hi all, I have two files: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) # head PSP320.dat CBR319.dat ==> PSP320.dat

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Tim Williams (gmail)
On 12/01/06, Tim Williams (gmail) [EMAIL PROTECTED] wrote: On 12 Jan 2006 09:04:21 -0800, fynali [EMAIL PROTECTED] wrote: Hi all, I have two files: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) # head PSP320.dat

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Fredrik Lundh
fynali wrote: Objective: to remove the numbers present in barred-list from the PSPfile. $ ls -lh PSP320.dat CBR319.dat ... 56M Dec 28 19:41 PSP320.dat ... 8.6M Dec 28 19:40 CBR319.dat $ wc -l PSP320.dat CBR319.dat 4,462,603

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Raymond Hettinger
[fynali] I have two files: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) # print all non-barred mobile phone numbers barred = set(open('CBR319.dat')) for num in open('PSP320.dat'): if num not in
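
One caveat with comparing raw lines, as the loop above does: each line carries its trailing newline, so a final line that lacks one will not match the same number elsewhere. The sketch below strips line endings before comparing; the function name `filter_numbers` is hypothetical, and whether the thread's files need this is an assumption.

```python
def filter_numbers(barred_lines, source_lines):
    """Yield numbers from source_lines that are absent from barred_lines."""
    # Strip whitespace so "966...\n" and "966..." compare equal.
    barred = {line.strip() for line in barred_lines}
    for line in source_lines:
        number = line.strip()
        # Skip blank lines and anything on the barred list.
        if number and number not in barred:
            yield number
```

Because the newlines are stripped, they have to be added back when writing, for example with `outfile.writelines(n + "\n" for n in filter_numbers(barred_file, source_file))`.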

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Steve Holden
Fredrik Lundh wrote: fynali wrote: Objective: to remove the numbers present in barred-list from the PSPfile. $ ls -lh PSP320.dat CBR319.dat ... 56M Dec 28 19:41 PSP320.dat ... 8.6M Dec 28 19:40 CBR319.dat $ wc -l PSP320.dat CBR319.dat 4,462,603

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Fredrik Lundh
Steve Holden wrote: looks like premature non-optimization to me... It might be quicker to establish a dict whose keys are the barred numbers and use that, rather than a list, to determine whether the input numbers should make it through. what do you think barred =

Re: How to remove subset from a file efficiently?

2006-01-12 Thread AJL
On 12 Jan 2006 09:04:21 -0800 fynali [EMAIL PROTECTED] wrote: Hi all, I have two files: - PSP320.dat (quite a large list of mobile numbers), - CBR319.dat (a subset of the above, a list of barred numbers) ... Objective: to remove the numbers present in barred-list from the

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Mike Meyer
fynali [EMAIL PROTECTED] writes: Hi all, I have two files: Others have pointed out the Python solution - use a set instead of a list for membership testing. I want to point out a better Unix solution ('cause I probably wouldn't have written a Python program to do this): Objective: to remove

Re: How to remove subset from a file efficiently?

2006-01-12 Thread Raymond Hettinger
AJL wrote: How fast does this run? a = set(file('PSP320.dat')) b = set(file('CBR319.dat')) file('PSP-CBR.dat', 'w').writelines(a.difference(b)) Turning PSP into a set takes extra time, consumes unnecessary memory, eliminates duplicates (possibly a bad thing), and loses the original
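
The two effects named above, dropped duplicates and lost ordering, can be seen on a tiny made-up sample (the numbers below are invented for illustration and are not from the thread's files):

```python
# Hypothetical sample data: "966501" appears twice in the large list.
psp = ["966503\n", "966501\n", "966502\n", "966501\n"]
cbr = ["966502\n"]

barred = set(cbr)

# Streaming filter: keeps the original order and the duplicate entry.
streamed = [n for n in psp if n not in barred]

# Set difference: the duplicate collapses and iteration order is arbitrary.
differenced = set(psp).difference(barred)

print(streamed)
print(sorted(differenced))
```

The streamed result has three entries in file order, while the set difference has two, which is why the output of `a.difference(b)` may not match a line-by-line filter of the same input.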