Hi again,

I have an update regarding my report about Tika not recognizing the date in Outlook .msg files [1]. I ran a different tool, ruby-msg (http://code.google.com/p/ruby-msg/), on the same message as in my earlier email, and ruby-msg did pull out the date [2]. This experiment shows that the date *is* in the .msg file, and that Tika is failing to pick it up.

Can anyone suggest the best way to proceed to improve Tika's handling of dates in Outlook .msg files? I'll be happy to file a bug report, but I'm just not sure whether this is an issue in Tika itself or in one of Tika's dependencies.

Thanks,
Joe

[1] The Tika output, quoting from my last email:

> Author: PA History Mailbox
> Content-Length: 40960
> Content-Type: application/vnd.ms-outlook
> Message-Bcc:
> Message-Cc:
> Message-From: History Mailbox
> Message-Recipient-Address: [email protected]
> Message-To: 'Snip'
> resourceName: RE  Inquiry.msg
> subject: Inquiry
> title: RE: Inquiry

[2] The ruby-msg output -- notice the "Date:" line:

From: "History Mailbox" <[email protected]>
To: "Snip" <[email protected]>
Subject: RE: Inquiry
Date: Fri, 22 Jun 2012 12:11:00 -0000
Message-ID: <[email protected]>
In-Reply-To: <CAJ4nNe1FPo7Q=10dbk8sdzprarzykjv6skv3nyg5l2li13b...@mail.gmail.com>
Priority: 0
Thread-Topic: Inquiry
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_8149ed38.4fec8c61"
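
For anyone who wants to double-check the ruby-msg result in [2], below is a minimal sketch, assuming Python 3 and only its standard library, that confirms the "Date:" header ruby-msg printed is a well-formed RFC 2822 date. It does not touch Tika or the .msg file itself; it only parses the header text quoted above. The names RUBY_MSG_HEADERS and extract_date are made up for this illustration, and the address-bearing headers are left out.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A subset of the headers copied from the ruby-msg output in [2].
# Address-bearing headers are omitted; they are not needed for the date check.
RUBY_MSG_HEADERS = (
    "Subject: RE: Inquiry\n"
    "Date: Fri, 22 Jun 2012 12:11:00 -0000\n"
    "Priority: 0\n"
    "Thread-Topic: Inquiry\n"
    "\n"
)


def extract_date(raw_headers):
    """Return the Date header as a datetime, or None if the header is absent."""
    msg = message_from_string(raw_headers)
    value = msg["Date"]
    return parsedate_to_datetime(value) if value else None


sent = extract_date(RUBY_MSG_HEADERS)
print(sent)
```

If the header parses cleanly, that supports the reading that the date is present in the message and the gap is on the extraction side. Note that a "-0000" offset yields a datetime without timezone information.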
