This is an automated email from the ASF dual-hosted git repository.

nick pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from c8b9b44  TIKA-2626
     add aa3134d  Begin implementing the Supplemental and Fallback parsers from https://wiki.apache.org/tika/CompositeParserDiscussion
     add bc8a75e  Sample fallback and supplemental config files based on https://wiki.apache.org/tika/CompositeParserDiscussion
     add 217a9ce  Name sample config files based on issue number
     add 62cf6f6  Add TODOs for code to be shared/copied with other areas
     add 3555745  Ignore vim temp files
     add d5a06ba  Pull out deep Metadata clone to a utils method for re-use
     add 427417c  Prepare to track metadata between parsers
     add c3897db  Fix exception handling
     add d229ab6  Pull common "Real Parser" identification logic out to utils
     add f4a926c  Use utils for recording details of the parser used
     add 97b97b3  Move logic for recording embedded parser failures in the metadata to utils, and use for multiple parsers
     add 9be93c6  TODO updates, enforce allowed policies
     add 82f6f5f  Bring over stream reset logic from ParserDecorator and update comments
     add ee60f5e  Implement some metadata policies for merging values from multiple parsers
     add 348bfb9  More metadata handling between parsers, start on unit testing
     add 819898f  Start on a multiple parser that would try several text encodings, pick the best and use that, to ensure it would be possible
     add 62b02b0  Give parserCompleted the ParseContext, use that to pass around for the pick-best-text case what charsets to try next and what text we got from them
     add 6a39214  Some (currently failing) Supplemental Parser tests
     add d181c3c  Correct Metadata merging by policy, and get (incomplete) unit tests to pass
     add e82f7ce  Further unit tests
     add c989d2f  Optionally use a new Handler for each Parser, if a factory was given
     add 12a98b6  Keep all implemented and unit test
     add f1f3067  Remove un-used reference
     add 50f8591  Fix test references to embedded exception property definition
     add f80fc23  All obvious places that need changing have, alias back in the original name for compatibility
     add 6514a00  Merge commit '682c38d' into multiple-parsers
     add 5bd8b28  ParserUtils methods for handling the reset/re-read of the stream
     add 54477aa  Simplify stream resetting logic by using new ParserUtil methods for it
     add a830e20  List of parsers can just be a Collection, does not require a list
     add 4e7bdaf  Replace the old experimental Fallback ParserDecorator code with a call to the new FallbackParser
     add 6fde374  Mark the ContentHandlerFactory override parse method as still experimental, pending more feedback and a decision on returning a list of created handlers
     add a55a1de  Fix XML to be valid
     add d909b77  Preseve old fallback decorator behaviour with explicit mime types override
     add 4665561  Support loading well known enum params
     add be246f1  Pass the params to the composite parser constructors
     add dda50b6  MultipleParser constructor that accepts Params
     add f412d56  TikaConfig loading of Multiple Parsers with Policy
     add 26391d2  Ignore expected warnings, mark TODOs done
     add 742e60e  Merge branch 'master' of https://github.com/apache/tika into multiple-parsers

No new revisions were added by this update.

Summary of changes:
 .gitignore                                         |   1 +
 .../main/java/org/apache/tika/config/Param.java    |  22 +-
 .../java/org/apache/tika/config/TikaConfig.java    |  28 +-
 .../org/apache/tika/parser/CompositeParser.java    |   7 +-
 .../org/apache/tika/parser/ParserDecorator.java    |  61 +---
 .../apache/tika/parser/RecursiveParserWrapper.java |  34 +-
 .../parser/multiple/AbstractMultipleParser.java    | 357 +++++++++++++++++++++
 .../tika/parser/multiple/FallbackParser.java       |  76 +++++
 .../multiple/PickBestTextEncodingParser.java       | 193 +++++++++++
 .../tika/parser/multiple/SupplementingParser.java  |  86 +++++
 .../java/org/apache/tika/utils/ParserUtils.java    | 142 ++++++++
 .../org/apache/tika/config/TikaConfigTest.java     |  20 +-
 .../tika/parser/multiple/MultipleParserTest.java   | 253 +++++++++++++++
 ...-except.xml => TIKA-1509-multiple-fallback.xml} |  21 +-
 .../config/TIKA-1509-multiple-supplemental.xml     |  46 +++
 .../tika/parser/RecursiveParserWrapperTest.java    |   3 +-
 .../apache/tika/parser/crypto/TSDParserTest.java   |   6 +-
 .../microsoft/POIContainerExtractionTest.java      |   4 +-
 .../ooxml/OOXMLContainerExtractionTest.java        |   4 +-
 19 files changed, 1254 insertions(+), 110 deletions(-)
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/AbstractMultipleParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/FallbackParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/PickBestTextEncodingParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/parser/multiple/SupplementingParser.java
 create mode 100644 tika-core/src/main/java/org/apache/tika/utils/ParserUtils.java
 create mode 100644 tika-core/src/test/java/org/apache/tika/parser/multiple/MultipleParserTest.java
 copy tika-core/src/test/resources/org/apache/tika/config/{TIKA-1445-default-except.xml => TIKA-1509-multiple-fallback.xml} (64%)
 create mode 100644 tika-core/src/test/resources/org/apache/tika/config/TIKA-1509-multiple-supplemental.xml

-- 
To stop receiving notification emails like this one, please contact
n...@apache.org.
