parsing mime-type text/html with parse-tika

alxsss Tue, 31 Mar 2015 15:05:05 -0700

Hello,

I am trying to use the nutch-2.x trunk to parse text/html content with Tika,
but I get the error "parser for text/html not found".

I see that the parse-tika code was changed. These lines

    // get the right parser using the mime type as a clue
    String mimeType = page.getContentType().toString();
    CompositeParser compositeParser = (CompositeParser) tikaConfig.getParser();
    Parser parser = compositeParser.getParsers().get(MediaType.parse(mimeType));

return no parser.

However, if I revert to the older version with

    // get the right parser using the mime type as a clue
    String mimeType = page.getContentType().toString();
    Parser parser = tikaConfig.getParser(mimeType);

it works.
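
One general reason an exact map lookup can come back empty while a resolver-style call succeeds is that the content type string does not match a registered key exactly, for example because it carries a charset parameter. Whether that is what happens here is an assumption, not something confirmed in this message. A minimal sketch in plain Java, where the class, the registry, and the method names are all hypothetical stand-ins rather than Nutch or Tika API:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a parser registry; not Nutch or Tika code.
class ParserLookupSketch {

    // Maps a bare media type to a parser name (a String stands in for a Parser).
    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("text/html", "HtmlParser");
        REGISTRY.put("application/pdf", "PdfParser");
    }

    // Exact-key lookup: returns null unless the string equals a registered key.
    static String lookupExact(String contentType) {
        return REGISTRY.get(contentType);
    }

    // Drops parameters such as "; charset=UTF-8", trims, and lowercases
    // before looking the type up.
    static String lookupNormalized(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return REGISTRY.get(base.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String raw = "text/html; charset=UTF-8";
        System.out.println("exact:      " + lookupExact(raw));
        System.out.println("normalized: " + lookupNormalized(raw));
    }
}
```

Logging the value of mimeType just before the lookup would show whether the string differs from a plain "text/html".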
   
Has anyone tested the new tika with text/html types?
   
Thanks.
   
Alex.
