On Tue, 4 Sep 2018, Tucker Barbour wrote:
I've exported a GMail archive in MBOX format using takeout.google.com. The MBOX archive also includes GChat messages. However, the GChat messages do not include a Date header. Instead, the date sent is included in what appears to be a non-conforming RFC822 header, which the Tika mbox parser does not recognize.

As a user of Tika, were you expecting these to show up as additional emails in the mbox, or something else?

(The underlying library may not give us a choice; I haven't dug in recently enough to remember. But in case it does, user expectations are of interest!)

I'm wondering if anyone has any experience extracting metadata from Gmail exports, specifically gchat messages. Any help or guidance would be appreciated.

Any chance you could share / produce a small mbox file containing a handful of both real emails and these gchat messages, so we can take a look? If you could open a bug in JIRA and attach the small mbox file, that'd be great.
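
In case it helps with producing that sample, here is a rough sketch using Python's standard `mailbox` module. It assumes, going only on the description above, that the gchat entries can be told apart by their missing Date header. The function name `sample_mbox` and the `per_kind` parameter are made up for illustration and are not part of Tika.

```python
import mailbox


def sample_mbox(src_path, dst_path, per_kind=3):
    """Copy up to per_kind messages that have a Date header and up to
    per_kind that lack one (assumed here to be the gchat entries).

    Returns (number_with_date, number_without_date) actually copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if missing, appended to otherwise
    counts = {True: 0, False: 0}
    try:
        for msg in src:
            has_date = msg["Date"] is not None
            if counts[has_date] < per_kind:
                dst.add(msg)
                counts[has_date] += 1
            # Stop early once both kinds have enough samples.
            if all(n >= per_kind for n in counts.values()):
                break
        dst.flush()
    finally:
        dst.close()
        src.close()
    return counts[True], counts[False]
```

Something like `sample_mbox("takeout.mbox", "sample.mbox")` should then give a file small enough to attach, though the contents would still need checking by hand for anything private before it goes on a public tracker.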

Nick
