I ported some code that does a good job of extracting the interesting parts from an email reply:
http://smalltalkhub.com/#!/~pdebruic/EmailReplyParser

It has examples and can parse raw mails and text-only or multipart emails. It's based on what GitHub uses:
https://github.com/github/email_reply_parser

I see no reason why it couldn't also be adapted for use with an initial email, as well as the replies.


Sven Van Caekenberghe-2 wrote
> With ZnHeaders and ZnMimePart you should get a long way in parsing mail
> boxes. I believe some people have already experimented with this, but I am
> not sure and I forgot.
>
>> On 06 Jul 2015, at 16:11, Dmitri Zagidulin <dmitri@> wrote:
>>
>> I've been doing some mailing list analysis recently (in Ruby), and would
>> be very interested in porting it over to Smalltalk. (I was actually
>> getting really frustrated at the lack of proper debugging setup in Ruby,
>> even though it had some great mail-related libraries). I was looking at
>> thread lengths, numbers of unanswered threads, etc.
>>
>> Alexandre -- I haven't been able to find a good Mail parsing library for
>> Smalltalk (preferably one that reads the Mbox format natively), I'd be
>> curious to know what you end up using.
>>
>> As for the download URL -- the link Marcus gave is, unfortunately, in
>> Piper-mail's own format (a simplified version of mbox, really).
>> To get the actual .mbox file, you'd need to use this link:
>>
>> http://lists.pharo.org/mailman/private/pharo-dev_lists.pharo.org.mbox/pharo-dev_lists.pharo.org.mbox
>>
>> (Note that it requires you to authenticate with your mailing list email
>> and password (that you created when you first signed up for the mailing
>> list)). But once authenticated, you can download it with Zinc (or wget)
>> or whatever, and start processing it.
>>
>> Let us know how it goes!
>>
>>
>> On Mon, Jul 6, 2015 at 8:41 AM, Thierry Goubier <thierry.goubier@> wrote:
>>
>>
>> 2015-07-06 14:29 GMT+02:00 Peter Uhnák <i.uhnak@>:
>> The archives are straight text files, in which the individual messages
>> are separated by a seemingly random number of LFs.
>>
>> Actually they are valid mbox files. (At least my mutt opened it just
>> fine.)
>> The separator is the "From " line, not newlines.
>>
>> From followed by a space. Each message ends with a blank line
>>
>> https://en.wikipedia.org/wiki/Mbox, https://tools.ietf.org/html/rfc4155
>>
>> It seems there are multiple, incompatible mbox formats.
>>
>> Thierry
>>
>>

--
View this message in context: http://forum.world.st/Getting-the-mbox-file-for-this-mailing-list-tp4835958p4836140.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
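
For anyone who wants to see the "From " separator rule from the quoted thread in action, here is a minimal sketch in Python (not Smalltalk, and not taken from EmailReplyParser). It assumes the common mbox variant in which body lines beginning with "From " are escaped as ">From ". As noted above, there are multiple incompatible mbox formats, so this will not handle all of them. The names `split_mbox`, `parse_message` and `SAMPLE` are made up for illustration.

```python
import email

# Two hypothetical messages separated by a varying number of blank lines.
SAMPLE = (
    "From alice@example.org Mon Jul  6 14:29:00 2015\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "\n"
    "\n"
    "From bob@example.org Mon Jul  6 16:11:00 2015\n"
    "From: bob@example.org\n"
    "Subject: second\n"
    "\n"
    ">From here on this is body text.\n"
    "\n"
)


def split_mbox(text):
    """Split mbox text into raw messages at lines starting with 'From '.

    Assumes body lines that begin with 'From ' were escaped as '>From '.
    """
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A new envelope line closes the previous message, if any.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def parse_message(raw):
    """Drop the 'From ' envelope line and parse the rest as an email."""
    _envelope, _, rest = raw.partition("\n")
    return email.message_from_string(rest)


raw_messages = split_mbox(SAMPLE)
subjects = [parse_message(raw)["Subject"] for raw in raw_messages]

if __name__ == "__main__":
    for subject in subjects:
        print(subject)
```

Note that the header line "From: alice@example.org" is not mistaken for a separator, because the separator is "From" followed by a space, not a colon. For real archives, Python's standard `mailbox.mbox` class is probably the safer choice.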
