[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723941#comment-14723941 ]
Tim Allison edited comment on TIKA-1657 at 8/31/15 7:50 PM:
------------------------------------------------------------

I looked at this a bit today. I'm now backing off to putting this only in tika-app, with the "-c" option printing to STDOUT.

In order to maintain round-trip-ability (xml -> TikaConfig -> xml), we'll need to store a few more things, which makes things a bit ugly. We may need to store the original "include" mime-types/parsers as well as the "exclude" mime-types/parsers. I think:

# {{getMimeRegistryResource()}} in TikaConfig (String, trivial)
# {{getExcludedTypes()}} in ParserDecorator (fairly trivial)
# {{getOriginalIncludedTypes()}} in ParserDecorator (trivial, but ugly)
# {{getExcludedParsers()}} in CompositeParser (fairly trivial)
# {{getOriginalIncludedParsers()}} in CompositeParser (trivial, but ugly)

Does this look ok? Any other recommendations? Is there a more elegant way to represent a ParserDecorator in xml?

Plan B: store only the excluded items and assume that they were present in the original "include" list.

There may be more items that arise as I progress on this, of course. I'd like to get this issue out of the way before working on TIKA-1508.

> Allow easier dumping of TikaConfig file from tika-core
> ------------------------------------------------------
>
>                 Key: TIKA-1657
>                 URL: https://issues.apache.org/jira/browse/TIKA-1657
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.11
>
>
> In TIKA-1418, we added an example for how to dump the config file so that
> users could easily modify it. I think we should go further and make this an
> option at the tika-core level with hooks for tika-app and tika-server. I
> propose adding a main() to TikaConfig that will print the xml config file
> that Tika is currently using to stdout.
> I'd like to put this into core so that e.g. Solr's DIH users can get by
> without having to download tika-app separately.
> There's every chance that I've not accounted for issues with dynamic loading
> etc. Also, I'd be ok with only having this available in tika-app and
> tika-server if there are good reasons.
> Feedback?
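
To illustrate the round-trip concern raised in the comment, here is a minimal sketch of why both the original "include" list and the "exclude" list would need to be kept. It does not use any Tika classes: {{ParserEntrySketch}} and {{com.example.FooParser}} are hypothetical stand-ins, and the {{<mime>}} / {{<mime-exclude>}} element names are assumed here for the config fragment. If only the effective types (included minus excluded) were stored, the original xml could not be reproduced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for one decorated parser entry; NOT a Tika class. */
final class ParserEntrySketch {
    final String className;
    final List<String> includedTypes; // original "include" list, kept verbatim
    final List<String> excludedTypes; // original "exclude" list, kept verbatim

    ParserEntrySketch(String className, List<String> includedTypes, List<String> excludedTypes) {
        this.className = className;
        this.includedTypes = new ArrayList<>(includedTypes);
        this.excludedTypes = new ArrayList<>(excludedTypes);
    }

    /** Reads a single <parser> element (assumed element names, see lead-in). */
    static ParserEntrySketch fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new ParserEntrySketch(root.getAttribute("class"),
                    texts(root, "mime"), texts(root, "mime-exclude"));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable config fragment", e);
        }
    }

    /** Collects the text content of every descendant element with the given tag name. */
    private static List<String> texts(Element root, String tag) {
        List<String> out = new ArrayList<>();
        NodeList nodes = root.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    /** What the parser actually handles: included minus excluded. Lossy on its own. */
    List<String> effectiveTypes() {
        List<String> out = new ArrayList<>(includedTypes);
        out.removeAll(excludedTypes);
        return out;
    }

    /** Writes the entry back out; no XML escaping, which is enough for plain mime strings. */
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<parser class=\"").append(className).append("\">");
        for (String t : includedTypes) {
            sb.append("<mime>").append(t).append("</mime>");
        }
        for (String t : excludedTypes) {
            sb.append("<mime-exclude>").append(t).append("</mime-exclude>");
        }
        sb.append("</parser>");
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<parser class=\"com.example.FooParser\">"
                + "<mime>text/plain</mime><mime>text/html</mime>"
                + "<mime-exclude>text/html</mime-exclude></parser>";
        ParserEntrySketch entry = fromXml(xml);
        System.out.println("round trip ok: " + entry.toXml().equals(xml));
        System.out.println("effective types: " + entry.effectiveTypes());
    }
}
```

Under Plan B, the sketch would keep only {{excludedTypes}} and rebuild the "include" list on output by assuming every excluded item had originally been included, which trades exact round-tripping for fewer stored fields.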