Hi Alan,
Here is the revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
    addresses.add(line4)
    count = count + 1
    print addresses
print "There were", count, "lines in the file with From as the first word"
The code produces the following output:
In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['rjl...@iupui.edu'])
set(['rjl...@iupui.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['rjl...@iupui.edu'])
set(['rjl...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['gsil...@umich.edu'])
set(['gsil...@umich.edu'])
set(['wagne...@iupui.edu'])
set(['wagne...@iupui.edu'])
set(['zq...@umich.edu'])
set(['zq...@umich.edu'])
set(['antra...@caret.cam.ac.uk'])
set(['antra...@caret.cam.ac.uk'])
set(['gopal.ramasammyc...@gmail.com'])
set(['gopal.ramasammyc...@gmail.com'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['david.horw...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['lo...@media.berkeley.edu'])
set(['r...@media.berkeley.edu'])
set(['r...@media.berkeley.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
set(['c...@iupui.edu'])
There were 54 lines in the file with From as the first word
Question no. 1: Is there a built-in function for set that parses the data for
duplicates?
In [18]: dir (set)
Out[18]:
['__and__',
'__class__',
'__cmp__',
'__contains__',
'__delattr__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__iand__',
'__init__',
'__ior__',
'__isub__',
'__iter__',
'__ixor__',
'__le__',
'__len__',
'__lt__',
'__ne__',
'__new__',
'__or__',
'__rand__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__ror__',
'__rsub__',
'__rxor__',
'__setattr__',
'__sizeof__',
'__str__',
'__sub__',
'__subclasshook__',
'__xor__',
'add',
'clear',
'copy',
'difference',
'difference_update',
'discard',
'intersection',
'intersection_update',
'isdisjoint',
'issubset',
'issuperset',
'pop',
'remove',
'symmetric_difference',
'symmetric_difference_update',
'union',
'update']
Question no. 2: Why is there no built-in function for append?
Question no. 3: If all else fails, i.e., append and set, is my only option to
slice the data set?
Regards,
Hal
From: Alan Gauld
Sent: Friday, July 31, 2015 2:00 AM
To: Tutor@python.org
On 31/07/15 01:25, ltc.hots...@gmail.com wrote:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>     if not line.startswith('From'): continue
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[1]
>     print line4
>     count = count + 1
> print "There were", count, "lines in the file with From as the first word"
>
> Question: How do I remove the duplicates?
OK, you now have the original code working, well done.
To remove the duplicates you need to collect the addresses
rather than printing them. Since you want the addresses
to be unique you can use a set.
You do that by first creating an empty set above
the loop, let's call it addresses:
addresses = set()
Then replace your print statement with the set add()
method:
addresses.add(line4)
This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
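Putting those two changes together, here is a minimal sketch of the idea. It is only an illustration: the sample lines and addresses are made up and stand in for your file, and collect_addresses is a hypothetical helper name, not something from your script. It is written so that it should run under Python 2 or Python 3.

```python
def collect_addresses(lines):
    """Return (set of unique addresses, number of lines starting with 'From')."""
    addresses = set()   # created once, ABOVE the loop, so it is never reset
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.strip().split()
        addresses.add(words[1])   # adding a duplicate leaves the set unchanged
        count = count + 1
    return addresses, count

# Hypothetical sample data standing in for the mbox file
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "Subject: hello\n",
    "From: alice@example.com\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
]

addresses, count = collect_addresses(sample)
print(len(addresses))   # number of unique addresses
print(count)            # number of lines that start with 'From'
```

The important point is where the set is created. If addresses = set() sits inside the loop, a new empty set replaces the old one on every pass and it can never hold more than one address.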
You can now print the set. You can do that directly or,
for more control over the layout, you can write another
for loop that prints each address individually.
print addresses
or
for address in addresses:
    print address # plus any formatting you want
You can also sort the addresses by calling the
sorted() function before printing:
print sorted(addresses)
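As a small sketch of how that behaves (the addresses here are made up for illustration): sorted() does not change the set, it returns a new list in sorted order, which you can then loop over. Written to run under Python 2 or Python 3.

```python
# Hypothetical addresses, standing in for the set built in the loop
addresses = set(["zed@example.org", "amy@example.com", "bob@example.com"])

ordered = sorted(addresses)   # a new list; the set itself is left untouched

for address in ordered:
    print(address)
```

A set has no order of its own, so sorting is the simplest way to get the same output order on every run.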
HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor