This is an automated email from the ASF dual-hosted git repository.

nick pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git.

    from c8b9b44  TIKA-2626
     add aa3134d  Begin implementing the Supplemental and Fallback parsers from https://wiki.apache.org/tika/CompositeParserDiscussion
     add bc8a75e  Sample fallback and supplemental config files based on https://wiki.apache.org/tika/CompositeParserDiscussion
     add 217a9ce  Name sample config files based on issue number
     add 62cf6f6  Add TODOs for code to be shared/copied with other areas
     add 3555745  Ignore vim temp files
     add d5a06ba  Pull out deep Metadata clone to a utils method for re-use
     add 427417c  Prepare to track metadata between parsers
     add c3897db  Fix exception handling
     add d229ab6  Pull common "Real Parser" identification logic out to utils
     add f4a926c  Use utils for recording details of the parser used
     add 97b97b3  Move logic for recording embedded parser failures in the metadata to utils, and use for multiple parsers
     add 9be93c6  TODO updates, enforce allowed policies
     add 82f6f5f  Bring over stream reset logic from ParserDecorator and update comments
     add ee60f5e  Implement some metadata policies for merging values from multiple parsers
     add 348bfb9  More metadata handling between parsers, start on unit testing
     add 819898f  Start on a multiple parser that would try several text encodings, pick the best and use that, to ensure it would be possible
     add 62b02b0  Give parserCompleted the ParseContext, use that to pass around for the pick-best-text case what charsets to try next and what text we got from them
     add 6a39214  Some (currently failing) Supplemental Parser tests
     add d181c3c  Correct Metadata merging by policy, and get (incomplete) unit tests to pass
     add e82f7ce  Further unit tests
     add c989d2f  Optionally use a new Handler for each Parser, if a factory was given
     add 12a98b6  Keep all implemented and unit test
     add f1f3067  Remove un-used reference
     add 50f8591  Fix test references to embedded exception property definition
     add f80fc23  All obvious places that need changing have, alias back in the original name for compatibility
     add 6514a00  Merge commit '682c38d' into multiple-parsers
     add 5bd8b28  ParserUtils methods for handling the reset/re-read of the stream
     add 54477aa  Simplify stream resetting logic by using new ParserUtil methods for it
     add a830e20  List of parsers can just be a Collection, does not require a list
     add 4e7bdaf  Replace the old experimental Fallback ParserDecorator code with a call to the new FallbackParser
     add 6fde374  Mark the ContentHandlerFactory override parse method as still experimental, pending more feedback and a decision on returning a list of created handlers
     add a55a1de  Fix XML to be valid
     add d909b77  Preseve old fallback decorator behaviour with explicit mime types override
     add 4665561  Support loading well known enum params
     add be246f1  Pass the params to the composite parser constructors
     add dda50b6  MultipleParser constructor that accepts Params
     add f412d56  TikaConfig loading of Multiple Parsers with Policy
     add 26391d2  Ignore expected warnings, mark TODOs done
     add 742e60e  Merge branch 'master' of https://github.com/apache/tika into multiple-parsers

No new revisions were added by this update.

Summary of changes:
 .gitignore                                         |   1 +
 .../main/java/org/apache/tika/config/Param.java    |  22 +-
 .../java/org/apache/tika/config/TikaConfig.java    |  28 +-
 .../org/apache/tika/parser/CompositeParser.java    |   7 +-
 .../org/apache/tika/parser/ParserDecorator.java    |  61 +---
 .../apache/tika/parser/RecursiveParserWrapper.java |  34 +-
 .../parser/multiple/AbstractMultipleParser.java    | 357 +++++++++++++++++++++
 .../tika/parser/multiple/FallbackParser.java       |  76 +++++
 .../multiple/PickBestTextEncodingParser.java       | 193 +++++++++++
 .../tika/parser/multiple/SupplementingParser.java  |  86 +++++
 .../java/org/apache/tika/utils/ParserUtils.java    | 142 ++++++++
 .../org/apache/tika/config/TikaConfigTest.java     |  20 +-
 .../tika/parser/multiple/MultipleParserTest.java   | 253 +++++++++++++++
 ...-except.xml => TIKA-1509-multiple-fallback.xml} |  21 +-
 .../config/TIKA-1509-multiple-supplemental.xml     |  46 +++
 .../tika/parser/RecursiveParserWrapperTest.java    |   3 +-
 .../apache/tika/parser/crypto/TSDParserTest.java   |   6 +-
 .../microsoft/POIContainerExtractionTest.java      |   4 +-
 .../ooxml/OOXMLContainerExtractionTest.java        |   4 +-
 19 files changed, 1254 insertions(+), 110 deletions(-)
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/AbstractMultipleParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/FallbackParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/PickBestTextEncodingParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/SupplementingParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/utils/ParserUtils.java
 create mode 100644 tika-core/src/test/java/org/apache/tika/parser/multiple/MultipleParserTest.java
 copy tika-core/src/test/resources/org/apache/tika/config/{TIKA-1445-default-except.xml => TIKA-1509-multiple-fallback.xml} (64%)
 create mode 100644 tika-core/src/test/resources/org/apache/tika/config/TIKA-1509-multiple-supplemental.xml

-- 
To stop receiving notification emails like this one, please contact
n...@apache.org.