Hey team

I'm migrating an existing project to Tika 2.0.0
and I'm seeing some strange behavior.

TL;DR: the created date of the document changes depending on the timezone.

Long story:

I have a unit test which extracts content and metadata from an RTF document [1].
When using Tika 1.27, whatever the timezone defined for my JVM, I'm always 
getting the same value for "dcterms:created": "2016-07-07T13:38:00Z".

When running the same test with Tika 2.0.0, the date changes depending on the 
timezone.

For example:

• Asia/Sakhalin gives dcterms:created=2016-07-06T23:38:00Z
• Asia/Colombo gives dcterms:created=2016-07-07T05:08:00Z
• Europe/Stockholm gives dcterms:created=2016-07-07T08:38:00Z

I don't know if it's a bug or expected. Maybe the RTF format does not specify 
the timezone.
I'm surprised that I don't see the same behavior for Office documents actually.
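
To illustrate that guess, here is a minimal sketch using only java.time (no Tika involved). It assumes the document stores a zone-less creation time of 2016-07-07T10:38:00. I got that value by working backwards from the three results above, not by reading it from the file, and the class and method names are made up for the example. Interpreting that local time in each JVM timezone and converting to UTC gives the same three values. It does not explain why Tika 1.27 returns 13:38:00Z.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Tika.
final class ZonelessDateDemo {

    private static final DateTimeFormatter UTC_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Treats a zone-less local date-time as wall-clock time in the given zone,
    // then renders the corresponding instant in UTC.
    static String toUtc(String localDateTime, String zoneId) {
        return LocalDateTime.parse(localDateTime)
                .atZone(ZoneId.of(zoneId))
                .withZoneSameInstant(ZoneOffset.UTC)
                .format(UTC_FORMAT);
    }

    public static void main(String[] args) {
        // Assumption: inferred from the reported outputs, not read from test.rtf.
        String assumedLocal = "2016-07-07T10:38:00";
        String[] zones = {"Asia/Sakhalin", "Asia/Colombo", "Europe/Stockholm"};
        for (String zone : zones) {
            System.out.println(zone + " -> " + toUtc(assumedLocal, zone));
        }
    }
}
```

If that is what happens, the value would be stable only when the parser pins a fixed timezone instead of using the JVM default.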

WDYT?


[1] 
https://github.com/dadoonet/fscrawler/raw/master/test-documents/src/main/resources/documents/test.rtf

David
