Hi,

I'm trying to convert PDF to XML, and I'm using the PDFText2HTML class in tools as inspiration. I noticed that PDFText2HTML extends PDFTextStripper, which in turn extends LegacyPDFStreamEngine. The comment at the top of LegacyPDFStreamEngine says something peculiar:

 * This class exists only so that we don't break the code of users who have their own subclasses
 * of PDFTextStripper. It replaces the good implementation of showGlyph in PDFStreamEngine, with
 * a bad implementation which is backwards compatible.

I looked at the comment on the "good implementation" of showGlyph in PDFStreamEngine, and it says:

 * Called when a glyph is to be processed.This method is intended for overriding in subclasses,
 * the default implementation does nothing.

PDFStreamEngine.showGlyph is supposed to be the "good implementation", yet it does nothing and expects a subclass to override it. How does this make sense?

If LegacyPDFStreamEngine's showGlyph implementation is incorrect, but PDFTextStripper relies on that incorrect behavior, how does PDFTextStripper compensate for it? Does it even matter in PDFTextStripper's case, since that class only extracts text and does not render anything?

Any clarification would be helpful.

Thank you.
-Justinus