[ https://issues.apache.org/jira/browse/TIKA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712283#action_12712283 ]
Jukka Zitting commented on TIKA-232:
------------------------------------

If you're instantiating the package parsers directly, then you can achieve this simply by overriding the parser that is used for the files inside a package:

    PackageParser parser = ...;
    parser.setParser(new EmptyParser());

You could also use the following hack to do this for a pre-configured composite parser like the AutoDetectParser:

    CompositeParser composite = new AutoDetectParser();
    for (Parser parser : composite.getParsers().values()) {
        if (parser instanceof PackageParser) {
            ((PackageParser) parser).setParser(new EmptyParser());
        }
    }

Perhaps someone has a good idea how to make this easier?

> Scanning of archive files
> -------------------------
>
>                 Key: TIKA-232
>                 URL: https://issues.apache.org/jira/browse/TIKA-232
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.3
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Priority: Minor
>
> If I parse an archive all the files inside the archive will be extracted with
> their text as well. It would be nice to have the choice to extract only the
> list of files (directory) of an archive instead of extracting the whole
> contents. This seemed to be usable only for zip, tar, tar.gz, tar.bz2, .jar.
> Maybe this could be realized by using a different calling or by a run time
> configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
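
For readers who want to try the composite-parser override from the comment without a Tika checkout, here is a minimal, self-contained sketch of the same pattern. Every type in it (Parser, TextParser, EmptyParser, PackageParser) is a simplified stand-in written for illustration, not Tika's real class, and the "name=content;name=content" input format and the listEntriesOnly helper are likewise hypothetical. It only shows the idea: walk the registered parsers and swap the nested parser of each package parser for one that emits nothing, so that only the entry names survive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in types for illustration only; these are NOT Tika's real classes.
interface Parser {
    String parse(String input);
}

// Extracts the content unchanged.
class TextParser implements Parser {
    public String parse(String input) { return input; }
}

// Extracts nothing, mirroring the role EmptyParser plays in the comment.
class EmptyParser implements Parser {
    public String parse(String input) { return ""; }
}

// Emits each entry name, then delegates the entry content to a nested parser.
// Assumed toy input format: "name=content;name=content".
class PackageParser implements Parser {
    private Parser parser = new TextParser();

    public void setParser(Parser parser) { this.parser = parser; }

    public String parse(String input) {
        StringBuilder out = new StringBuilder();
        for (String entry : input.split(";")) {
            String[] parts = entry.split("=", 2);
            out.append(parts[0]).append('\n');
            String text = parts.length > 1 ? parser.parse(parts[1]) : "";
            if (!text.isEmpty()) {
                out.append(text).append('\n');
            }
        }
        return out.toString();
    }
}

class PackageOverrideSketch {
    // Hypothetical helper: replaces the nested parser of every package parser
    // in the map and returns how many parsers were changed.
    static int listEntriesOnly(Map<String, Parser> parsers) {
        int changed = 0;
        for (Parser parser : parsers.values()) {
            if (parser instanceof PackageParser) {
                ((PackageParser) parser).setParser(new EmptyParser());
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Parser> parsers = new LinkedHashMap<>();
        parsers.put("application/zip", new PackageParser());
        parsers.put("text/plain", new TextParser());
        listEntriesOnly(parsers);
        // After the override, only the entry names are emitted.
        System.out.print(parsers.get("application/zip").parse("a.txt=hello;b.txt=world"));
    }
}
```

Wrapping the loop in a helper keeps the instanceof check in one place; whether something similar belongs in Tika itself is the open question the comment raises.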