Add an auto-detecting Parser implementation
-------------------------------------------
Key: TIKA-67
URL: https://issues.apache.org/jira/browse/TIKA-67
Project: Tika
Issue Type: New Feature
Components: general
Reporter: Jukka Zitting
Assignee: Jukka Zitting
Fix For: 0.1-incubator
We should have an AutoDetectParser class that uses the MIME framework to
automatically detect the type of the document being parsed, and that dispatches
the parsing task to the parser class configured for the detected MIME type.
The class would work like this:
    InputStream stream = ...;
    ContentHandler handler = ...;
    Metadata metadata = new Metadata();
    metadata.set(Metadata.CONTENT_TYPE, ...); // optional content type hint
    metadata.set("filename", ...); // optional file name hint

    AutoDetectParser parser = new AutoDetectParser();
    parser.setConfig(...); // optional TikaConfig configuration
    parser.parse(stream, handler, metadata);
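
To make the detect-then-dispatch idea concrete, here is a minimal standalone
sketch. It does not use Tika: AutoDetectSketch, SimpleParser, register() and
detect() are hypothetical names made up for illustration, and the detection
order (magic bytes first, then the content type hint, then the file name) is
an assumption rather than something this issue specifies. The real class would
presumably delegate detection to the MIME framework and look parsers up
through TikaConfig.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of detect-then-dispatch; none of these names are Tika APIs.
public class AutoDetectSketch {

    // Stand-in for a type-specific parser (hypothetical, far simpler than a SAX-based one).
    interface SimpleParser {
        String parse(InputStream stream) throws IOException;
    }

    // MIME type -> parser configured for that type.
    private final Map<String, SimpleParser> parsers = new HashMap<>();

    public void register(String mimeType, SimpleParser parser) {
        parsers.put(mimeType, parser);
    }

    // Assumed detection order: magic bytes, then the type hint, then the file name.
    static String detect(InputStream stream, String typeHint, String fileName) throws IOException {
        byte[] head = new byte[4];
        stream.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = stream.read(head, n, head.length - n);
            if (r < 0) break; // stream is shorter than the magic prefix
            n += r;
        }
        stream.reset(); // rewind so the selected parser sees the whole document
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        if (prefix.equals("%PDF")) return "application/pdf";
        if (typeHint != null) return typeHint;
        if (fileName != null && fileName.endsWith(".txt")) return "text/plain";
        return "application/octet-stream";
    }

    // Detect the type, then hand the stream to the parser registered for it.
    public String parse(InputStream stream, String typeHint, String fileName) throws IOException {
        // mark/reset is needed for peeking at the header, so wrap streams that lack it.
        InputStream in = stream.markSupported() ? stream : new BufferedInputStream(stream);
        String type = detect(in, typeHint, fileName);
        SimpleParser parser = parsers.get(type);
        if (parser == null) {
            throw new IOException("No parser configured for " + type);
        }
        return parser.parse(in);
    }

    public static void main(String[] args) throws IOException {
        AutoDetectSketch auto = new AutoDetectSketch();
        auto.register("application/pdf", s -> "pdf parser");
        auto.register("text/plain", s -> "text parser");
        InputStream pdf = new ByteArrayInputStream("%PDF-1.4".getBytes(StandardCharsets.ISO_8859_1));
        InputStream txt = new ByteArrayInputStream("hello".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(auto.parse(pdf, null, null));
        System.out.println(auto.parse(txt, null, "notes.txt"));
    }
}
```

The point of the sketch is that the caller never names a concrete parser: the
hints are optional, and the stream is rewound after detection so the chosen
parser reads the document from the start.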