Hello Tika Gurus,
I am trying to extract the main text content using the Boilerpipe
interfaces provided in Tika.
When I use this constructor:
ContentHandler handler1 = new BoilerpipeContentHandler(textBuffer);
it works fine.
I believe this constructor uses the DefaultExtractor, but I want to use
the ArticleExtractor instead.
I tried something like this:
ContentHandler handler1 = new BoilerpipeContentHandler(textBuffer);
ContentHandler handler2 = new BoilerpipeContentHandler(handler1, ArticleExtractor.getInstance());
But this gave weird nested <A> errors.
Could you please let me know the right way to invoke the
ArticleExtractor?
Thanks
Shyam