On 31/07/15 15:39, [email protected] wrote:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
Notice I said you had to create and initialize the set
*above* the loop.
Here you are creating a new set every time round the
loop and throwing away the old one.
    addresses.add(line4)
    count = count + 1
    print addresses
And notice I said to move the print statement
to *after* the loop so as to print the complete set,
not just the current status.
print "There were", count, "lines in the file with From as the first word"
The code produces the following output:
In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['[email protected]'])
set(['[email protected]'])
set(['[email protected]'])
That's correct, because you create a new set each time
and add precisely one element to it before throwing
it away and starting over next time round.
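To see the difference for yourself, here is a small sketch. The three sample lines and their addresses are made up for illustration (they are not from your file), and the names inside and outside are just labels I chose. It is written to run under Python 2 or 3.

```python
# Made-up sample lines standing in for the file contents (hypothetical data).
lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From bob@example.com Sat Jan  5 09:15:00 2008",
    "From alice@example.com Sat Jan  5 09:16:00 2008",
]

# Version 1: the set is created inside the loop,
# so it is thrown away and recreated on every pass.
for line in lines:
    inside = set()
    inside.add(line.split()[1])

# Version 2: the set is created once, above the loop,
# so it keeps accumulating addresses.
outside = set()
for line in lines:
    outside.add(line.split()[1])

print(len(inside))   # 1: only the address from the last line survives
print(len(outside))  # 2: both unique addresses were kept
```

The only change between the two versions is where the set() call sits.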
Question no. 1: is there a built-in function for set that parses the data for
duplicates?
No, because that's what a set does. It is a collection of
unique items. It will not allow duplicates.
Your problem is you create a new set of one item for
every line. So you have multiple sets with the same
data in them.
Question no. 2: Why is there not a built-in function for append?
add() is the equivalent of append for a set.
If you try to add() a value that already exists it
will be ignored.
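A quick sketch of that behaviour, using made-up addresses (hypothetical values, not from your file):

```python
addresses = set()
addresses.add("alice@example.com")
addresses.add("bob@example.com")
addresses.add("alice@example.com")  # already present, so silently ignored

print(len(addresses))  # 2
```

No error is raised for the repeated value; the set simply stays the same.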
Question no. 3: If all else fails, i.e., append & set, is my only option to
slice the data set?
No, there are lots of other options, but none of them are necessary
because a set is a collection of unique values. You just need to
use it properly. Read my instructions again, carefully:
You do that by first creating an empty set above
the loop, let's call it addresses:
addresses = set()
Then replace your print statement with the set add()
method:
addresses.add(line4)
This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
You now print the set.
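Put together, those steps might look something like the sketch below. Treat it as one possible arrangement, not the only one: the function name unique_senders and the sample lines are my own inventions for illustration, and I have wrapped the loop in a function only so it can be tried without the file. It runs under Python 2 or 3.

```python
def unique_senders(lines):
    """Return (addresses, count) for the lines that start with 'From'."""
    addresses = set()   # created once, above the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.split()
        addresses.add(words[1])   # duplicates are ignored by the set
        count = count + 1
    return addresses, count


# Hypothetical sample data standing in for the real file.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "Subject: not a From line",
    "From bob@example.com Sat Jan  5 09:15:00 2008",
    "From alice@example.com Sat Jan  5 09:16:00 2008",
]

addresses, count = unique_senders(sample)

# Print once, after the loop has finished, to see the complete set.
print(sorted(addresses))
print(count)
```

With your real data you would pass the open file instead of the sample list, for example unique_senders(open(fname)), since a file object yields its lines one at a time just like the list does.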
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos