Jonathan Gardner wrote:
On Feb 15, 3:34 pm, galileo228 <[email protected]> wrote:
I'm trying to write python code that will open a textfile and find the
email addresses inside it. I then want the code to take just the
characters to the left of the "@" symbol, and place them in a list.
(So if [email protected] was in the file, 'galileo228' would be
added to the list.)
Any suggestions would be much appreciated!
You may want to use regexes for this. For every match, split on '@'
and take the first bit.
Note that the actual specification for email addresses is far more
than a single regex can handle. However, for almost every single case
out there nowadays, a regex will get what you need.
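A minimal sketch of that suggestion, assuming a deliberately loose pattern. The ADDRESS regex, the local_parts helper, and the sample addresses are all made up for illustration, and the pattern is nowhere near RFC-compliant:

```python
import re

# Hypothetical, deliberately loose pattern: some non-space, non-"@"
# characters, an "@", then a domain containing at least one dot.
ADDRESS = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')

def local_parts(text):
    """Return the part left of the '@' for every address-like match."""
    # findall() returns whole matches here because the pattern has no groups.
    return [found.split('@')[0] for found in ADDRESS.findall(text)]

sample = "Contact alice@example.com or bob.smith@mail.example.org today."
print(local_parts(sample))
```

Splitting after the match keeps the regex simple, at the cost of scanning each match a second time.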
You can even capture the local part directly while the regex is
matching, with no separate split step. As Jonathan mentions, finding
RFC-compliant email addresses can be a hairy/intractable problem.
But you can get a pretty close approximation:
import re
r = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
#                                        ^
# if you want to allow local domains like
#   u...@localhost
# then change the "+" marked with the "^" (the one right
# after the "(?:[-\w]+\.)" group) to a "*" and the "{2,5}"
# to "+" to unlimit the TLD. This will change the outcome
# of the last test "j...@com" to True
for test, expected in (
        ('[email protected]', True),
        ('[email protected]', True),
        ('@example.com', False),
        ('@sub.example.com', False),
        ('@com', False),
        ('j...@com', False),
        ):
    m = r.match(test)
    # XOR is true only when the result disagrees with the expectation
    if bool(m) ^ expected:
        print("Failed: %r should be %s" % (test, expected))

# collect the unique local parts (group 1) from every line of the file
emails = set()
with open('test.txt') as f:
    for line in f:
        for match in r.finditer(line):
            emails.add(match.group(1))
print("All the emails:", ', '.join(emails))
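To see what the "*" / "+" change described in the comments does, here is a small side-by-side sketch. The names strict and relaxed and the three sample addresses are only illustrative; the relaxed pattern is my reading of the change the comment describes:

```python
import re

# Pattern as posted: at least one "label." group, then a 2-5 character TLD.
strict = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
# Assumed relaxed variant: the "label." group may repeat zero times
# and the final label has no length limit.
relaxed = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)*(?:\w+)', re.I)

for address in ('user@example.com', 'user@localhost', '@example.com'):
    print(address, bool(strict.match(address)), bool(relaxed.match(address)))
```

Only the dotless host should differ between the two columns; an address with an empty local part is rejected by both, since group 1 needs at least one character.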
-tkc
--
http://mail.python.org/mailman/listinfo/python-list