[ 
https://issues.apache.org/jira/browse/TIKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2590.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.18

> ExcelExtractor: cannot choose listening to the selected records only
> --------------------------------------------------------------------
>
>                 Key: TIKA-2590
>                 URL: https://issues.apache.org/jira/browse/TIKA-2590
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Grigoriy Alekseev
>            Priority: Critical
>             Fix For: 1.18, 2.0.0
>
>
> The listenForAllRecords argument is always reset to 'true', so the 
> 'else' branch is never reached. This may cause incorrect text extraction 
> when records of certain unsupported types (e.g. SharedFormula) are present 
> in a file.
> {code:java}
>         public void processFile(DirectoryNode root, boolean listenForAllRecords)
>                 throws IOException, SAXException, TikaException {
>             // Set up listener and register the records we want to process
>             HSSFRequest hssfRequest = new HSSFRequest();
>             // Bug: this overwrites the caller's argument, so the 'else' branch is dead code
>             listenForAllRecords = true;
>             if (listenForAllRecords) {
>                 hssfRequest.addListenerForAllRecords(formatListener);
>             } else {
>                 hssfRequest.addListener(formatListener, BOFRecord.sid);
>                 hssfRequest.addListener(formatListener, EOFRecord.sid);
>                 hssfRequest.addListener(formatListener, DateWindow1904Record.sid);
>                 hssfRequest.addListener(formatListener, CountryRecord.sid);
>                 hssfRequest.addListener(formatListener, BoundSheetRecord.sid);
>                 hssfRequest.addListener(formatListener, SSTRecord.sid);
>                 hssfRequest.addListener(formatListener, FormulaRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelRecord.sid);
>                 hssfRequest.addListener(formatListener, LabelSSTRecord.sid);
>                 hssfRequest.addListener(formatListener, NumberRecord.sid);
>                 hssfRequest.addListener(formatListener, RKRecord.sid);
>                 hssfRequest.addListener(formatListener, StringRecord.sid);
>                 hssfRequest.addListener(formatListener, HyperlinkRecord.sid);
>                 hssfRequest.addListener(formatListener, TextObjectRecord.sid);
>                 hssfRequest.addListener(formatListener, SeriesTextRecord.sid);
>                 hssfRequest.addListener(formatListener, FormatRecord.sid);
>                 hssfRequest.addListener(formatListener, ExtendedFormatRecord.sid);
>                 hssfRequest.addListener(formatListener, DrawingGroupRecord.sid);
>                 if (extractor.officeParserConfig.getIncludeHeadersAndFooters()) {
>                     hssfRequest.addListener(formatListener, HeaderRecord.sid);
>                     hssfRequest.addListener(formatListener, FooterRecord.sid);
>                 }
>             }
> {code}
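
The effect described in the report can be reproduced without POI. The sketch below is illustrative only: RecordingRequest is a hypothetical stand-in for HSSFRequest that just records what was registered, and honoringSetup shows one possible correction (simply not overwriting the argument). The message does not show the change that actually went into 1.18, so the corrected variant is an assumption, not the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSelectionSketch {

    /** Hypothetical stand-in for POI's HSSFRequest; it only records registrations. */
    static class RecordingRequest {
        final List<String> registrations = new ArrayList<>();

        void addListenerForAllRecords() {
            registrations.add("ALL");
        }

        void addListener(String recordType) {
            registrations.add(recordType);
        }
    }

    /** Mirrors the reported behaviour: the argument is overwritten before it is tested. */
    static RecordingRequest buggySetup(boolean listenForAllRecords) {
        RecordingRequest request = new RecordingRequest();
        listenForAllRecords = true; // the caller's choice is discarded here
        if (listenForAllRecords) {
            request.addListenerForAllRecords();
        } else {
            // Never reached, whatever the caller passed in.
            request.addListener("BOFRecord");
            request.addListener("EOFRecord");
        }
        return request;
    }

    /** One possible correction (an assumption): leave the argument untouched. */
    static RecordingRequest honoringSetup(boolean listenForAllRecords) {
        RecordingRequest request = new RecordingRequest();
        if (listenForAllRecords) {
            request.addListenerForAllRecords();
        } else {
            request.addListener("BOFRecord");
            request.addListener("EOFRecord");
        }
        return request;
    }

    public static void main(String[] args) {
        System.out.println(buggySetup(false).registrations);    // prints [ALL]
        System.out.println(honoringSetup(false).registrations); // prints [BOFRecord, EOFRecord]
    }
}
```

With the overwrite in place, passing false still registers a listener for every record type, which is how unsupported records such as SharedFormula end up reaching the listener.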



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
