You can use Boilerpipe, which Tika supports through its BoilerpipeContentHandler.
Check out Nutch's TikaParser for an example.
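Here is a minimal sketch of that approach. It assumes Tika 1.x with the tika-parsers module on the classpath, which also pulls in Boilerpipe. The package name below is the Tika 1.x one, so check it against your Tika version. The class and method names (MainTextExtractor, extractMainText) are made up for illustration. HtmlParser's SAX events are fed through BoilerpipeContentHandler, which keeps only the blocks Boilerpipe classifies as content and writes their text to a StringWriter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URL;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler; // Tika 1.x package (assumption: adjust for your version)
import org.apache.tika.parser.html.HtmlParser;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Tika or Nutch.
public class MainTextExtractor {

    /**
     * Parses an HTML stream and returns only the text Boilerpipe classifies
     * as main content. The caller owns (and closes) the stream.
     */
    public static String extractMainText(InputStream html) throws IOException {
        StringWriter out = new StringWriter();
        // BoilerpipeContentHandler buffers the parser's SAX events, runs a
        // Boilerpipe extractor at the end of the document and writes the
        // text of the kept blocks to `out`.
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(out);
        try {
            new HtmlParser().parse(html, handler, new Metadata(), new ParseContext());
        } catch (SAXException | TikaException e) {
            throw new IOException("Could not parse HTML", e);
        }
        return out.toString();
    }

    // Usage: java MainTextExtractor http://example.com/some/page.html
    public static void main(String[] args) throws IOException {
        try (InputStream in = new URL(args[0]).openStream()) {
            System.out.println(extractMainText(in));
        }
    }
}
```

Boilerpipe is heuristic, so results vary from page to page. If the default extraction rules do not suit your pages, BoilerpipeContentHandler also has a constructor that takes a ContentHandler plus a specific BoilerpipeExtractor (for example ArticleExtractor).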

Regards,
Markus

https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java
 
-----Original message-----
> From: Francesco Viscomi <[email protected]>
> Sent: Wednesday 6th September 2017 14:12
> To: [email protected]
> Subject: extract from URL text
> 
> Hi all,
> I'm new to Tika and I'm trying to extract text from a web page, but I only
> want the text inside the body; every other part of the page should be
> stripped off. I've looked at some examples on the internet, but none of the
> ones I've found so far works well, because they do not strip off some tags,
> for example the ones inside the menu. Can someone help me?
> 
> Thanks very much
> -- 
> Ing. Viscomi Francesco 
