This looks like an area for a new feature in both Tika and POI. I've only looked very briefly into the POI libraries, and I may have missed how to extract text from autoshapes. I'll open an issue in both projects.
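In the meantime, here is a possible workaround that bypasses Tika and POI entirely. An .xlsx file is a ZIP package, and as far as I understand the format, autoshape text is stored as <a:t> runs in the DrawingML parts under xl/drawings/. The sketch below is a minimal one under that assumption, using only the Python standard library. The function name extract_shape_text is my own, and this is not how Tika or POI handle the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace; shape text runs are stored in <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def extract_shape_text(xlsx_path):
    """Return the text runs found in the drawing parts of an .xlsx file.

    xlsx_path may be a file name or a binary file-like object.
    """
    texts = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in sorted(zf.namelist()):
            # Assumption: drawing parts live at xl/drawings/drawingN.xml.
            # The .rels and .vml parts in the same folder are skipped.
            if name.startswith("xl/drawings/") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                for run_text in root.iter("{%s}t" % A_NS):
                    if run_text.text:
                        texts.append(run_text.text)
    return texts
```

Calling extract_shape_text("book.xlsx") (a hypothetical file name) should return a flat list of strings, one per text run. It does not say which sheet or shape each run belongs to, so treat it as a stopgap rather than a replacement for proper parser support.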
-----Original Message-----
From: Hiroshi Tatsumi [mailto:honekich...@comet.ocn.ne.jp]
Sent: Sunday, July 21, 2013 10:16 AM
To: user@tika.apache.org
Subject: How to extract autoshape text in Excel 2007+

Hi,

I am using Tika 1.3 and Solr 4.3.1.
I'd like to extract autoshape text in Excel 2007+(.xlsx), but I can't.

I tried to extract from some MS office files. The results are below.

Success (I can extract autoshape text.)
- Excel 2003(.xls)
- Word 2003(.doc)
- Word 2007+(.docx)

Failed (I cannot extract autoshape text.)
- Excel 2007+(.xlsx)

Is this a bug?
If you know, could you tell me how to extract autoshape text in Excel 2007+?

Thanks,
Hiro.