Hi Nick,

Thanks so much for your reply.

>> While experimenting with some sample .msg files, I
>> noticed that Tika is not returning the date for most messages.
>> For example, Outlook indicates that the following message was sent on
>> "Fri 6/22/2012 8:11 AM", but no date appears in the HTML head or in
>> the early portion of the body of the Tika output [1].  I retrieved
>> this using Tika 1.1 on Windows XP using the following command:
>
> Did you try with --metadata?

I ran Tika with --metadata on the same message I mentioned in my first
email, and Tika didn't output the message's date that way either.
Here are the results:

Author: PA History Mailbox
Content-Length: 40960
Content-Type: application/vnd.ms-outlook
Message-Bcc:
Message-Cc:
Message-From: History Mailbox
Message-Recipient-Address: [email protected]
Message-To: 'Snip'
resourceName: RE  Inquiry.msg
subject: Inquiry
title: RE: Inquiry

> Also, are you sure that the messages contain the dates? Some kinds of
> outlook files don't...

This same message does show a date in Outlook ("Fri 6/22/2012 8:11
AM").  Do you know of a way to tell whether the date shown in
Outlook is actually stored inside the message, rather than elsewhere
in some sort of Outlook database?  (In other mail clients I would
look at a "mail headers" view, but I don't recall seeing such a view
in Outlook.)  Do you happen to know under what circumstances Outlook
would not include a date?

Tika does recognize dates in some of my sample messages, but those
are definitely the minority: Tika retrieved dates for only 3 of 47
messages.  (Specifically, those 3 messages have the following fields:
date, Creation-Date, and Last-Save-Date.)

Thanks for any suggestions,
Joe