There isn't currently a way to do this in Tika, but it _should_ be possible to add. I think there's been some interest in this over the years, but never enough momentum to actually implement it.
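For what it's worth, once font size and style are available for each piece of text, the heuristic itself could be fairly small. The following is only a rough sketch under stated assumptions: `TextRun`, `classify_runs`, and the thresholds are hypothetical names and values made up for illustration, and Tika does not produce this input today.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical record: one line of extracted text plus its font info."""
    text: str
    font_size: float  # in points
    bold: bool = False


def classify_runs(runs, size_ratio=1.15, max_heading_words=12):
    """Label each run as 'heading' or 'paragraph'.

    Assumption: the font size that covers the most characters is the body
    size. A run counts as a heading if it is short and either noticeably
    larger than the body size or set in bold.
    """
    if not runs:
        return []

    # Weight each font size by character count so body text dominates.
    size_weights = Counter()
    for run in runs:
        size_weights[round(run.font_size, 1)] += len(run.text)
    body_size = size_weights.most_common(1)[0][0]

    labels = []
    for run in runs:
        short = len(run.text.split()) <= max_heading_words
        larger = run.font_size >= body_size * size_ratio
        labels.append("heading" if short and (larger or run.bold) else "paragraph")
    return labels
```

The thresholds (`size_ratio`, `max_heading_words`) are guesses and would need tuning against real documents.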
@Tilman this should be doable, right?

On Tue, Nov 17, 2020 at 12:42 PM Bogdan Kostic <[email protected]> wrote:
> Hello,
>
> I am using tika to extract text out of pdf documents. I want to write a
> heuristic to differentiate between headings and paragraphs. For this, I
> need font style and size of the extracted text. Is there any way to get
> font style and size using tika? I was not able to find an option to extract
> this information.
>
> Thank you in advance!
