[ 
https://issues.apache.org/jira/browse/TIKA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760113#action_12760113
 ] 

Ken Krugler commented on TIKA-288:
----------------------------------

The issue for me with using put(mimetype, parser) is that I'd need to know 
about all of the MIME types that Tika internally maps to the HtmlParser.

E.g. for your example, I would really need:

    CompositeParser parser = new AutoDetectParser(); 
    Map<String, Parser> parsers = parser.getParsers();
    Parser myParser = new MyCustomHtmlParser();

    // Each MIME type that Tika routes to HtmlParser has to be overridden explicitly
    parsers.put("text/html", myParser);
    parsers.put("application/xhtml+xml", myParser);
    parsers.put("application/x-asp", myParser);

So having a way to replace by class would be safer and easier, e.g. a 
CompositeParser.replace(class, Parser).
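
To make the idea concrete, here is a minimal sketch of what replace-by-class could look like. It is not Tika's API: the Parser interface and the parser classes below are stand-ins so the example compiles on its own, and replaceByClass is a hypothetical helper name. It assumes the MIME-type-to-parser map is mutable, as in the example above.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaceByClassSketch {

    // Stand-in for org.apache.tika.parser.Parser (assumption: only identity matters here).
    interface Parser {}

    // Stand-ins for the built-in parsers and the custom override.
    static class HtmlParser implements Parser {}
    static class TextParser implements Parser {}
    static class MyCustomHtmlParser implements Parser {}

    /**
     * Hypothetical helper: swaps in {@code replacement} for every MIME type
     * whose current parser is an instance of {@code target} (subclasses
     * included), and returns how many entries were changed.
     */
    static int replaceByClass(Map<String, Parser> parsers,
                              Class<? extends Parser> target,
                              Parser replacement) {
        int replaced = 0;
        for (Map.Entry<String, Parser> entry : parsers.entrySet()) {
            if (target.isInstance(entry.getValue())) {
                // setValue is not a structural change, so it is safe while iterating.
                entry.setValue(replacement);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Map<String, Parser> parsers = new HashMap<>();
        Parser html = new HtmlParser();
        parsers.put("text/html", html);
        parsers.put("application/xhtml+xml", html);
        parsers.put("application/x-asp", html);
        parsers.put("text/plain", new TextParser());

        // The caller names the parser class, not the MIME types it is mapped to.
        int count = replaceByClass(parsers, HtmlParser.class, new MyCustomHtmlParser());
        System.out.println("Replaced " + count + " mappings");
    }
}
```

The point of the design is that the caller never has to enumerate MIME types, so the override keeps working if Tika later maps more types to the HtmlParser.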

> Support override parsers in AutoDetectParser
> --------------------------------------------
>
>                 Key: TIKA-288
>                 URL: https://issues.apache.org/jira/browse/TIKA-288
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.4
>            Reporter: Ken Krugler
>            Priority: Minor
>
> In some situations, being able to specify an alternative parser is useful 
> even when the general parser framework/full set of parsers is desired.
> For example, when processing HTML documents the current HtmlParser doesn't 
> pass through all of the tags that a vertical crawler might want.
> I'm proposing an alternative constructor, something like:
> public AutoDetectParser(Map<class, Parser>)
> where class would be the class of the standard Tika parser, and Parser is the 
> override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
