Hi Alan,

Here is the revised code:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
    addresses.add(line4)
    count = count + 1 
    print addresses
print "There were", count, "lines in the file with From as the first word"

The code produces the following output:

In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['rjl...@iupui.edu'])
set(['rjl...@iupui.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['rjl...@iupui.edu'])
set(['rjl...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['wagne...@iupui.edu'])
set(['wagne...@iupui.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['antra...@caret.cam.ac.uk'])
set(['antra...@caret.cam.ac.uk'])
set(['gopal.ramasammyc...@gmail.com'])
set(['gopal.ramasammyc...@gmail.com'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['r...@media.berkeley.edu'])
set(['r...@media.berkeley.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
There were 54 lines in the file with From as the first word

Question no. 1: Is there a built-in function for set that parses the data for
duplicates?

In [18]: dir (set)
Out[18]:
['__and__',
 '__class__',
 '__cmp__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

Question no. 2: Why is there not a built-in function for append?

Question no. 3: If all else fails, i.e., append and set, is my only option to
slice the data set?

Regards,

Hal

From: Alan Gauld
Sent: Friday, July 31, 2015 2:00 AM
To: Tutor@python.org

On 31/07/15 01:25, ltc.hots...@gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>      if not line.startswith('From'): continue
>      line2 = line.strip()
>      line3 = line2.split()
>      line4 = line3[1]
>      print line4
>      count = count + 1
> print "There were", count, "lines in the file with From as the first word"
>
> Question: How do I remove the duplicates?

OK, you now have the original code working, well done.
To remove the duplicates you need to collect the addresses
rather than printing them. Since you want the addresses
to be unique you can use a set.

You do that by first creating an empty set above
the loop, let's call it addresses:

addresses = set()

Then replace your print statement with the set add()
method:

addresses.add(line4)
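
Put together, those two steps might look like the sketch below. It is
only an illustration: the sample_lines list, the example.com addresses
and the collect_addresses() helper are made up here to stand in for
your file, and the single-argument print() calls are used so that it
runs under both Python 2 and Python 3.

```python
# Hypothetical sample lines standing in for the contents of mbox-short.txt.
sample_lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: test\n",
    "From bob@example.com Sat Jan  5 10:00:00 2008\n",
    "From: bob@example.com\n",
    "From alice@example.com Sat Jan  5 11:00:00 2008\n",
]


def collect_addresses(lines):
    """Return (set of unique addresses, count of lines starting with 'From')."""
    addresses = set()          # created once, above the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.split()
        # Adding a value that is already a member leaves the set unchanged,
        # so duplicates are dropped without any extra checking.
        addresses.add(words[1])
        count = count + 1
    return addresses, count


addresses, count = collect_addresses(sample_lines)
print(len(addresses))   # number of unique addresses
print(count)            # number of lines that started with 'From'
```

If the set were created inside the loop instead, every pass would
start again from an empty set, so it could never hold more than the
one address just added.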

This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
You now print the set. You can do that directly or for
more control over layout you can write another for
loop that prints each address individually.

print addresses

or

for address in addresses:
    print address   # plus any formatting you want

You can also sort the addresses by calling the
sorted() function before printing:

print sorted(addresses)
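
As a small sketch of the sorted() option, again with made-up
addresses: sorted() hands back a new list in order and leaves the set
itself untouched, so you can loop over the result to print one address
per line. The print() form with a single argument is used so that it
behaves the same under Python 2 and Python 3.

```python
# Hypothetical addresses, used only for illustration.
addresses = set(["zed@example.com", "amy@example.com", "bob@example.com"])

# sorted() returns a new list; the set itself is not modified.
ordered = sorted(addresses)

for address in ordered:
    print(address)
```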


HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor