Extracting the structure of an HTML Document

Sznajder ForMailingList Mon, 17 Aug 2015 08:51:40 -0700

Hi

I am a new user of Tika.


I am handling HTML documents, and I have managed to parse them into
a "clean" text string.

However, I would like to get the structure of the documents: what the
different sections are, what the titles of these sections are, etc.

Is there a way to do that with Tika?

Thanks!

Benjamin
