[ 
https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723941#comment-14723941
 ] 

Tim Allison edited comment on TIKA-1657 at 8/31/15 7:50 PM:
------------------------------------------------------------

I looked at this a bit today. I'm now backing off to putting this only in 
tika-app, with the "-c" option printing to STDOUT.

In order to maintain round-trip-ability (xml -> TikaConfig -> xml), we'll need 
to store a few more things, which makes things a bit ugly. We may need to 
store the original "include" mime-types/parsers as well as the "exclude" 
mime-types/parsers. I think that means adding:

# {{getMimeRegistryResource()}} in TikaConfig (String, trivial)
# {{getExcludedTypes()}} in ParserDecorator (fairly trivial)
# {{getOriginalIncludedTypes()}} in ParserDecorator (trivial, but ugly)
# {{getExcludedParsers()}} in CompositeParser (fairly trivial)
# {{getOriginalIncludedParsers()}} in CompositeParser (trivial, but ugly)
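
To make the getters above concrete, here is a rough sketch of a decorator that keeps both the original "include" list and the "exclude" list so that both can be written back out. This is not Tika's actual ParserDecorator: the class name DecoratedParserSketch, the use of plain Strings for media types, and the toXml() helper are assumptions made only for illustration, and the <mime>/<mime-exclude> element names are assumed to match the config format.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a decorated parser; NOT Tika's ParserDecorator.
final class DecoratedParserSketch {
    private final String parserClass;
    private final Set<String> originalIncludedTypes;
    private final Set<String> excludedTypes;

    DecoratedParserSketch(String parserClass, Set<String> included, Set<String> excluded) {
        this.parserClass = parserClass;
        // Defensive copies that preserve the order given in the config.
        this.originalIncludedTypes = Collections.unmodifiableSet(new LinkedHashSet<>(included));
        this.excludedTypes = Collections.unmodifiableSet(new LinkedHashSet<>(excluded));
    }

    Set<String> getOriginalIncludedTypes() {
        return originalIncludedTypes;
    }

    Set<String> getExcludedTypes() {
        return excludedTypes;
    }

    // Effective types at parse time: the includes minus the excludes.
    Set<String> getSupportedTypes() {
        Set<String> effective = new LinkedHashSet<>(originalIncludedTypes);
        effective.removeAll(excludedTypes);
        return effective;
    }

    // Writes both lists back out so that xml -> object -> xml keeps them.
    // Element names are assumed; no XML escaping is done in this sketch.
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<parser class=\"").append(parserClass).append("\">\n");
        for (String type : originalIncludedTypes) {
            sb.append("  <mime>").append(type).append("</mime>\n");
        }
        for (String type : excludedTypes) {
            sb.append("  <mime-exclude>").append(type).append("</mime-exclude>\n");
        }
        sb.append("</parser>\n");
        return sb.toString();
    }
}
```

The point of holding both sets is that the effective set alone cannot tell you which types were listed explicitly and which were removed, so the dump would not match the file that was read in.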

Does this look ok?  Any other recommendations?  Is there a more elegant way to 
represent a ParserDecorator in xml?

Plan B: store only the excluded mime-types/parsers and assume that they were 
originally part of the "include" list...
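
As a rough illustration of Plan B, assuming it means rebuilding the "include" list at dump time from what the parser currently supports plus what was excluded (PlanBSketch and reconstructIncluded are hypothetical names, not Tika code):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; not part of Tika.
final class PlanBSketch {
    // Rebuilds an "include" list as (currently supported) + (excluded).
    // Assumes every excluded type was originally in the include list.
    static Set<String> reconstructIncluded(Set<String> supported, Set<String> excluded) {
        Set<String> included = new LinkedHashSet<>(supported);
        included.addAll(excluded);
        return included;
    }
}
```

The trade-off under that assumption: an exclude that never matched anything in the original include list would come back as an include after a round trip.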

There may be more items that arise as I progress on this, of course.

I'd like to get this issue out of the way before working on TIKA-1508.


> Allow easier dumping of TikaConfig file from tika-core
> ------------------------------------------------------
>
>                 Key: TIKA-1657
>                 URL: https://issues.apache.org/jira/browse/TIKA-1657
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.11
>
>
> In TIKA-1418, we added an example for how to dump the config file so that 
> users could easily modify it.  I think we should go further and make this an 
> option at the tika-core level with hooks for tika-app and tika-server.  I 
> propose adding a main() to TikaConfig that will print the xml config file 
> that Tika is currently using to stdout.
> I'd like to put this into core so that e.g. Solr's DIH users can get by 
> without having to download tika-app separately.  
> There's every chance that I've not accounted for issues with dynamic loading 
> etc.  Also, I'd be ok with only having this available in tika-app and 
> tika-server if there are good reasons.
> Feedback?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
