I also tried passing a StringWriter to the handler, without any luck!

StringWriter writer = new StringWriter();
Metadata metadata = new Metadata();
// The handler should write the extracted body text into the writer
ContentHandler texthandler = new BodyContentHandler(writer);
Parser parser = new AutoDetectParser();
InputStream in = TikaInputStream.get(content.getContent());
parser.parse(in, texthandler, metadata, new ParseContext());
LOG.info("Content: " + writer.toString());
LOG.info("is Empty? " + writer.toString().isEmpty());


Where is the problem?
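
While the Tika result stays empty, one possible workaround is to skip Tika and read the text from the DocumentFragment that Nutch already passes to filter(). The sketch below is not the Tika approach and does not explain the empty output; it only walks a DOM tree with the standard org.w3c.dom API. The class name DomText and the method extract are hypothetical names chosen for illustration, and skipping script and style elements is my own assumption about what counts as page text.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or Tika.
final class DomText {

    /** Returns the text of all text nodes under root, joined by single spaces. */
    static String extract(Node root) {
        StringBuilder sb = new StringBuilder();
        collect(root, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            // Assumption: script and style content is not wanted as page text.
            if ("script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name)) {
                return;
            }
        }
        // Recurse into children (elements, fragments, documents).
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), sb);
        }
    }
}
```

Inside filter(), this would be called as DomText.extract(doc), assuming doc holds the parsed page.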
 

> To: [email protected]
> Subject: Tika with nutch
> Date: Sat, 18 Feb 2012 08:49:43 +0300
> 
> 
> Hi all,
> 
> I'm developing a Nutch plug-in that implements HtmlParseFilter, and I want to 
> use the Tika toolkit to convert the web page to plain text for further 
> processing.
> I know that Tika has been integrated with Nutch since version 1.1, so I didn't 
> download anything and just started coding.
> 
> I found that BodyContentHandler might help, so I used this code:
> 
> //=======
> //import packages:
> 
> import org.apache.tika.sax.BodyContentHandler;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.io.TikaInputStream;
> import java.io.InputStream;
> 
> //=====
> 
> 
> public ParseResult filter(Content content, ParseResult parseResult,
>     HTMLMetaTags metaTags, DocumentFragment doc)
> {
>   Metadata metadata = new Metadata();
>   BodyContentHandler texthandler = new BodyContentHandler();
>   Parser parser = new AutoDetectParser();
>   InputStream in = TikaInputStream.get(content.getContent());
>   parser.parse(in, texthandler, metadata, new ParseContext());
>   LOG.info("Content: " + texthandler.toString());
>   LOG.info("is Empty? " + texthandler.toString().isEmpty());
>   return parseResult;
> }
> 
> Now the content is always empty, and isEmpty() returns true every time!
> 
> I don't know why. I've searched a lot, but resources are scarce, so I'm asking 
> this question here on the mailing list.
> 
> Thanks in advance, I appreciate it :)
> 