I'm having an issue with custom formats in an Excel spreadsheet. I found some fairly old issues that lead me to believe custom formats should be supported, and from my testing they are, for the most part.
https://issues.apache.org/jira/browse/TIKA-103
https://issues.apache.org/jira/browse/TIKA-360
https://issues.apache.org/jira/browse/TIKA-2025

Hopefully my attachment comes through. I've made a simple xlsx with 3 columns: 'formatting', 'expected' and 'actual'. The 'formatting' column is the name of the built-in format applied, or the definition of the custom format. The 'expected' column is a text-formatted version of what I expect, and the 'actual' column is the cell with the formatting applied.

Things seem to work fine for built-in formats. The one exception is that the 14-digit number does not come through Tika in E-notation, but that seems to be due to TIKA-2025, so that's fine. However, my two custom formats that zero-pad don't seem to work at all, while my format that appends an 'a' to a number works fine.

I've pasted the plain text output from the TikaCLI app below. Is there some way to get Tika to respect the zero-pad formats? Or are my expectations wrong somehow?

Sheet1
formatting              expected                      actual
General                 123                           123
General                 12345678901234                12345678901234
General                 12345678901                   12345678901
Short Date              12/18/19                      12/18/19
Long Date               Wednesday, December 18, 2019  Wednesday, December 18, 2019
Percentage              50.00%                        50.00%
Number w Thousands Sep  1,234                         1,234
Accounting              $ (1,234.56)                  $ (1,234.56)
0#                      01                            1
0#############          012345678980123               1.23457E+12
###a                    123a                          123a

Thanks for any advice!
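
P.S. To be concrete about what "zero-pad" means for the 0# row, here is a rough Python sketch of the behavior I expect. It reflects only my reading of Excel's placeholder rules (a '0' always shows a digit, a '#' shows one only if needed). `zero_pad_format` is a made-up helper for illustration, not anything in Tika or POI, and it handles only non-negative integers and patterns made of '0' and '#'.

```python
def zero_pad_format(value: int, pattern: str) -> str:
    """Illustrative only: apply a 0/# integer pattern to a non-negative int."""
    if value < 0 or any(ch not in "0#" for ch in pattern):
        raise ValueError("sketch handles only non-negative ints and 0/# patterns")
    digits = str(value)
    first_zero = pattern.find("0")
    # Assumption: every placeholder from the first '0' rightwards must show
    # a digit, so that span sets the minimum width; extra digits are kept.
    min_width = len(pattern) - first_zero if first_zero != -1 else 0
    return digits.zfill(min_width)


print(zero_pad_format(1, "0#"))    # 01
print(zero_pad_format(123, "0#"))  # 123
```

So for the 0# row I would expect "01" in the output, where Tika currently gives "1".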
format_tests.xlsx
Description: MS-Excel 2007 spreadsheet
