[PHP] Strip emails from a document
Hi gang:

I thought I had a function to strip emails from a document, but I can't find it. So, before I start writing a common script, do any of you have a simple script to do this?

Here's an example of the problem:

Before:
Will Alex ale...@cit.msu.edu;Moita Zact za...@cit.msu.edu;Bob Arms ar...@cit.msu.edu;Meia Terms term...@cit.msu.edu;

After:
ale...@cit.msu.edu za...@cit.msu.edu ar...@cit.msu.edu term...@cit.msu.edu

Cheers,

tedd

_
t...@sperling.com http://sperling.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Strip emails from a document
I think you meant extract emails from a document, right?

I'd probably find the `@` and iterate backward and forward until I hit a POSIX punct or space character. But that will probably give some false matches.

It's really hard to find 100% of the emails in arbitrary text, because a valid email address can contain many different types of characters. According to RFC 822, even a space is valid inside a quoted local part. So finding all the valid emails is tough.

In a *trivial situation* an email would be delimited by whitespace. So find the @ first, then scan backward and forward to the first space. You'll get most common emails. A regex pattern like

    [^[:space:]@]+@[^[:space:]]+

would suffice. But keep in mind that it only works on trivial cases, not on special cases. A simple regular expression cannot handle the special cases. Here is a full RFC 822-compliant email-matching regular expression: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

More information can be found at http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address

--
Shiplu.Mokadd.im
ImgSign.com | A dynamic signature machine
Innovation distinguishes between follower and leader
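For illustration, here is a minimal PHP sketch of the whitespace-delimited pattern quoted above. The function name `extract_emails_naive` and the sample addresses are made up for this example; only the pattern itself comes from the message. It is a sketch of the trivial case, not a validator.

```php
<?php
// Hypothetical helper: returns each run of non-space characters that has
// at least one character before an "@" and at least one after it.
// Assumes addresses are separated from surrounding text by whitespace only.
function extract_emails_naive(string $text): array
{
    // PCRE accepts POSIX classes such as [:space:] inside a bracket expression.
    preg_match_all('/[^[:space:]@]+@[^[:space:]]+/', $text, $matches);
    return $matches[0]; // full matches only
}

// Whitespace-delimited input: both addresses come back cleanly.
print_r(extract_emails_naive('Write to alice@example.com or bob@example.org today'));

// Semicolon-delimited input: each match runs on to the next whitespace.
print_r(extract_emails_naive('Ann ann@example.com;Bob bob@example.com;'));
```

On the second input the matches are `ann@example.com;Bob` and `bob@example.com;`, which is the kind of false match to expect when the text is not whitespace-delimited.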
Re: [PHP] Strip emails from a document
If you expect the addresses to always share the same domain, with only the first part differing, you can write a regular expression that matches exactly that. Overall, a regular expression is going to be your best bet, as shiplu suggested.
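A rough PHP sketch of the "same domain, different first part" idea might look like the following. The function name `extract_emails_at_domain`, the set of characters allowed in the first part, and the sample addresses are all assumptions made for this example, not something from the thread.

```php
<?php
// Hypothetical helper: extract addresses at one known domain.
// $domain is assumed to be a plain host name such as "example.edu".
function extract_emails_at_domain(string $text, string $domain): array
{
    // preg_quote() escapes the dots in the domain so they match literally.
    // The negative lookahead rejects longer hosts such as example.edu.au
    // while still allowing a trailing ";" or a sentence-ending ".".
    $pattern = '/\b[a-z0-9._%+\-]+@' . preg_quote($domain, '/')
             . '(?![a-z0-9\-]|\.[a-z0-9])/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches only
}

$text = 'Will Alex alex@example.edu;Moita Zact zact@example.edu;Eve eve@example.com';
print_r(extract_emails_at_domain($text, 'example.edu'));
```

Because the domain is fixed, the pattern does not have to guess where an address ends, so the semicolon separators cause no trouble here.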
Re: [PHP] Strip emails from a document
On Jan 26, 2013, at 12:20 PM, Daniel Brown danbr...@php.net wrote:

> It's imperfect, but will work for the majority of emails:
>
> <?php
> function scrape_emails($input) {
>     // Greedy, case-insensitive match. The ungreedy "U" modifier is left off:
>     // with it the match stops at the first label that fits, so
>     // user@cit.msu.edu would come back as user@cit.msu.
>     preg_match_all('/\b([a-z0-9%._+\-]+@[a-z0-9.\-]+\.[a-z]{2,6})\b/i', $input, $matches);
>     // $matches[0] holds the full matches, $matches[1] the captured addresses.
>     return $matches;
> }
> ?>

It works imperfectly enough for me. :-)

Here's the result: http://www.webbytedd.com/aa/strip-email/index.php

Thanks to all.

Cheers,

tedd

PS: Yes, 'extract' is what I meant, and it is more correct than 'strip' as I said.
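To get from the matches to the space-separated "After" line shown in the first message, something along these lines could work. This is only a sketch: the function name `emails_as_line` and the sample addresses are invented, and the pattern is a greedy variant of the one quoted above (dropping the ungreedy modifier is an assumption on my part, not the original author's code).

```php
<?php
// Hypothetical helper: collect addresses and join them with single spaces.
function emails_as_line(string $input): string
{
    // Greedy, case-insensitive variant of the pattern quoted above.
    preg_match_all('/\b[a-z0-9%._+\-]+@[a-z0-9.\-]+\.[a-z]{2,6}\b/i', $input, $matches);
    // array_unique() drops repeated addresses while keeping first-seen order.
    return implode(' ', array_unique($matches[0]));
}

// Prints: alex@example.edu zact@example.edu
echo emails_as_line('Will Alex alex@example.edu;Moita Zact zact@example.edu;'), PHP_EOL;
```

The `{2,6}` limit on the last label is kept from the quoted pattern; longer top-level domains would need a wider range.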
Re: [PHP] Strip emails from a document
What is your input?
[PHP] Re: Strip emails from a document
On 26 Jan 2013 at 16:24, Tedd Sperling t...@sperling.com wrote:

> I thought I had a function to strip emails from a document, but I can't find it. So, before I start writing a common script, do any of you have a simple script to do this?

I have a function that will take a comma-separated string consisting of emails in these formats:

    Soap, Joe joe.s...@example.com
    Joe Soap joe.s...@example.com
    (Joe Soap) joe.s...@example.com

and turn them into a list where all the above examples are converted to:

    Joe Soap joe.s...@example.com

but it won't handle things like:

    joe@soap@example.com

which I understand is also valid. You are welcome to it if you wish.

--
Cheers -- Tim
Re: [PHP] Strip emails from a document
On Jan 26, 2013, at 12:48 PM, shiplu shiplu@gmail.com wrote:

> What is your input?

Check my first email in this thread.

Cheers,

tedd