Dear list,

I have started exploring the possibilities of Tika and have a couple of questions 
that I will send to the list in separate emails, to keep things tidy.

To begin with, I noticed the following behaviour, which might or might not be a 
bug. I asked this question on Stack Overflow 
(https://stackoverflow.com/questions/37226842/tika-metadata-from-email-misses-date),
but perhaps this is a better place.

I have two email test files:

1. A file that was created by using "Save As" in Mac Mail (this creates a 
.txt file)
2. A file that was created by dragging an email from Mac Mail to the 
Desktop (this creates an .eml file)

If I send each file to the server with

curl -T filename http://localhost:9998/detect/stream

I get the response "message/rfc822" for both files.

If I run

curl -T filename http://localhost:9998/meta

I get the metadata, but in case (1) the date is not extracted, 
while in case (2) it is.

I understand, of course, that the .eml file includes the full raw header, while 
the .txt file only includes a very abbreviated header. However, even the 
abbreviated header does include a "Date" field, and so I think Tika should 
extract it. Is this a bug or intentional? In the latter case, is there anything 
I could do to get Tika to extract the date in case (1)?
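
One possible fallback for case (1), independent of Tika, would be to read the 
Date line directly from the file before sending it to the server. Below is a 
minimal Python sketch, assuming the .txt export starts with plain "Name: value" 
header lines followed by a blank line. The function name raw_date_header and 
the sample message are hypothetical, made up for illustration; they are not 
part of Tika and not copied from a real Mac Mail export. The function returns 
the raw string and does not try to interpret the date format.

```python
from email import message_from_string
from typing import Optional


def raw_date_header(text: str) -> Optional[str]:
    """Return the unparsed value of the Date header, or None if there is none."""
    # The parser treats everything up to the first blank line as headers,
    # so a "Date:" line that appears later in the body is ignored.
    msg = message_from_string(text)
    value = msg.get("Date")
    return value.strip() if value is not None else None


# Hypothetical abbreviated header (an assumption, not a real export).
sample = (
    "From: Alice <alice@example.org>\n"
    "Subject: Test\n"
    "Date: 16 May 2016 10:15:00 +0200\n"
    "To: Bob <bob@example.org>\n"
    "\n"
    "Body text.\n"
)

print(raw_date_header(sample))
```

This only recovers the text of the header; whether that text is in a form a 
date parser accepts is a separate question.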

I am running Tika-server 1.14.

Any suggestions much appreciated!
All best,
Philipp

