On 12/01/06, Tim Williams (gmail) <[EMAIL PROTECTED]> wrote:


On 12 Jan 2006 09:04:21 -0800, fynali < [EMAIL PROTECTED]> wrote:
Hi all,

I have two files:

  - PSP0000320.dat (quite a large list of mobile numbers),
  - CBR0000319.dat (a subset of the above, a list of barred numbers)

    # head PSP0000320.dat CBR0000319.dat
    ==> PSP0000320.dat <==
    96653696338
    96653766996
    96654609431
    96654722608
    96654738074
    96655697044
    96655824738
    96656190117
    96656256762
    96656263751

    ==> CBR0000319.dat <==
    96651131135
    96651131135
    96651420412
    96651730095
    96652399117
    96652399142
    96652399142
    96652399142
    96652399160
    96652399271

Objective: to remove the numbers present in the barred list from the
PSP file.

    $ ls -lh PSP0000320.dat CBR0000319.dat
    ...  56M Dec 28 19:41 PSP0000320.dat
    ... 8.6M Dec 28 19:40 CBR0000319.dat

    $ wc -l PSP0000320.dat CBR0000319.dat
     4,462,603 PSP0000320.dat
       693,585 CBR0000319.dat

I wrote the following in python to do it:

    #: c01:rmcommon.py
    barredfile = open(r'/home/sjd/python/wip/CBR0000319.dat', 'r')
    postfile = open(r'/home/sjd/python/wip/PSP0000320.dat', 'r')
    outfile = open(r'/home/sjd/python/wip/PSP-CBR.dat', 'w')

    # reading it all in one go, so as to avoid frequent disk accesses
    # (assume machine has plenty memory)
    barredlist = barredfile.readlines()
    postlist = postfile.readlines()

    #
    for number in postlist:
            if number in barredlist:
                    pass
            else:
                    outfile.write(number)

    barredfile.close(); postfile.close(); outfile.close()
    #:~

The above code simply takes too long to complete.  If I were to do a
diff -y PSP0000320.dat CBR0000319.dat, catch the '<' & clean it up with
sed -e 's/\([0-9]*\) *</\1/' > PSP-CBR.dat it takes <4 minutes to
complete.


It should be quicker to do this

   #
   for number in postlist:
           if number not in barredlist:
                   outfile.write(number)


and quicker doing this

   #
   numbers = [number for number in postlist if number not in barredlist]

I forgot to add this one

   for num in (number for number in postlist if number not in barredlist):
           outfile.write(num)
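
One more thought: all of the loops above still search the barred list from
the start for every number, so most of the time probably goes into the
membership test itself. Below is a minimal sketch that swaps in a set for
the lookup instead, assuming the barred numbers fit in memory (as the
original post already assumes) and a Python recent enough to have the
with statement. The function name remove_barred and its parameter names
are made up for illustration, they are not from the original script.

```python
def remove_barred(post_path, barred_path, out_path):
    """Copy lines of post_path that do not appear in barred_path to out_path.

    Returns the number of lines written. All names here are hypothetical.
    """
    # Build a set once: membership tests on a set take roughly constant
    # time on average, whereas a list is scanned element by element.
    with open(barred_path) as barred_file:
        barred = set(line.strip() for line in barred_file)

    kept = 0
    # Stream the large file line by line instead of holding it all in memory.
    with open(post_path) as post_file, open(out_path, "w") as out_file:
        for line in post_file:
            number = line.strip()
            # Skip blank lines and anything found in the barred set.
            if number and number not in barred:
                out_file.write(number + "\n")
                kept += 1
    return kept
```

Calling it with the three paths from the original script should produce
the same PSP-CBR.dat contents as the loops above. Duplicates in the barred
file do no harm, since the set keeps only one copy of each number.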




-- 
http://mail.python.org/mailman/listinfo/python-list
