Hi Nick,

Thanks so much for your reply.

>> While experimenting with some sample .msg files, I
>> noticed that Tika is not returning the date of most messages.
>> For example, Outlook indicates that the following message was sent on
>> "Fri 6/22/2012 8:11 AM", but no date appears in the HTML head or in
>> the early portion of the body of the Tika output [1]. I retrieved
>> this using Tika 1.1 on Windows XP using the following command:
>
> Did you try with --metadata?

I ran Tika with --metadata on the same message I mentioned in my first email, and Tika didn't output the message's date this way either. Here are the results:

Author: PA History Mailbox
Content-Length: 40960
Content-Type: application/vnd.ms-outlook
Message-Bcc:
Message-Cc:
Message-From: History Mailbox
Message-Recipient-Address: [email protected]
Message-To: 'Snip'
resourceName: RE Inquiry.msg
subject: Inquiry
title: RE: Inquiry

> Also, are you sure that the messages contain the dates? Some kinds of
> outlook files don't...

This same message does show a date in Outlook ("Fri 6/22/2012 8:11 AM"). Do you know of some way to tell whether the date that appears in Outlook is actually stored inside the message (versus stored elsewhere in some sort of Outlook database)? (In other mail clients I would think to look at the "mail headers" mode, but I don't recall seeing such a mode in Outlook.) Do you happen to know under what circumstances Outlook would not include a date?

Tika does recognize dates in some of my sample messages, but these are definitely the minority. In fact, Tika retrieved dates for only 3 of 47 messages. (Specifically, those 3 messages have the following fields: date, Creation-Date, and Last-Save-Date.)

Thanks for any suggestions,
Joe
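
P.S. In case it helps anyone reproduce the 3-of-47 count, here is a rough sketch of how the saved --metadata output could be checked for the three date fields. It assumes the output is plain "Name: value" lines like the listing above; the function names are just illustrative, and the sample is an abbreviated copy of that listing.

```python
# Field names taken from the three messages for which Tika did report a date.
DATE_FIELDS = ("date", "Creation-Date", "Last-Save-Date")


def parse_metadata(text):
    """Parse 'Name: value' lines into a dict (assumed --metadata layout)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "RE: Inquiry" survive.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def date_fields_present(text):
    """Return the date-related field names that have a non-empty value."""
    fields = parse_metadata(text)
    return [name for name in DATE_FIELDS if fields.get(name)]


# Abbreviated copy of the metadata listing from this message.
SAMPLE = """\
Author: PA History Mailbox
Content-Length: 40960
Content-Type: application/vnd.ms-outlook
Message-Bcc:
Message-Cc:
Message-From: History Mailbox
Message-To: 'Snip'
resourceName: RE Inquiry.msg
subject: Inquiry
title: RE: Inquiry
"""

print(date_fields_present(SAMPLE))  # prints []
```

Running this over the saved output for each sample file and counting the non-empty results is one way to arrive at the tally; it says nothing about whether the date is actually stored in the .msg file itself.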
