You can use Boilerpipe, which is supported by Tika. Check out Nutch's TikaParser as an example.
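As a rough illustration of the idea, here is a minimal sketch that wraps Tika's body handler in a `BoilerpipeContentHandler`, so that navigation menus and similar boilerplate are filtered out before the text is collected. It assumes Tika 1.x with `tika-parsers` on the classpath (in Tika 2.x the handler appears to live in a separate boilerpipe module under a different package). The class name `MainTextExtractor` and the method `extractMainText` are made up for this example and are not taken from Nutch.

```java
import java.io.InputStream;
import java.net.URL;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, for illustration only.
public class MainTextExtractor {

    // Parses an HTML stream and returns the text that Boilerpipe keeps as content.
    public static String extractMainText(InputStream html) throws Exception {
        // Collects plain text from the body; -1 disables the default write limit.
        BodyContentHandler text = new BodyContentHandler(-1);
        // Boilerpipe sits in front of the body handler and drops blocks
        // it classifies as boilerplate (menus, footers and the like).
        BoilerpipeContentHandler handler = new BoilerpipeContentHandler(text);
        new HtmlParser().parse(html, handler, new Metadata(), new ParseContext());
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be the URL of the page to extract.
        try (InputStream in = new URL(args[0]).openStream()) {
            System.out.println(extractMainText(in));
        }
    }
}
```

How much of the menu text gets removed depends on Boilerpipe's heuristics and on the page itself, so it is worth trying this on a few of your own pages. If the default extractor keeps too much, `BoilerpipeContentHandler` also has a constructor that takes a specific Boilerpipe extractor.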
Regards,
Markus

https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java

-----Original message-----
> From: Francesco Viscomi <[email protected]>
> Sent: Wednesday 6th September 2017 14:12
> To: [email protected]
> Subject: extract from URL text
>
> Hi all,
> I'm new to Tika and I'm trying to extract text from a web page, but I
> want only the text inside the body; I want to strip off all other
> content.
> I've been looking at examples on the internet, but none of the ones I
> have found so far is good, because they do not strip off some tags,
> inside the menu for example. Can someone help me?
>
> Thanks very much
> --
> Ing. Viscomi Francesco
