Dear PDFBox Dev Team,

Hope this message finds you well.

Just wanted to raise this for your attention. Could you please suggest any solutions to the parsing order issue? Attached are my config file, an example PDF file, and my parsing results.

Thanks so much in advance. Wish you and your team a Merry Christmas and Happy New Year.

Regards,
Luke

On Tue, 17 Dec 2019 at 12:34, Tim Allison <talli...@apache.org> wrote:

> PDFBox Colleagues,
> Any recommendations?
>
> On Mon, Dec 16, 2019 at 7:05 AM Lu Sun <vistax...@gmail.com> wrote:
>
>> Dear Tika Dev Team,
>>
>> Hope this email finds you well.
>>
>> I have been actively using Tika for pdf file reading. One issue I found
>> is the parsing order. As shown in attached image, the parsing order of pdf
>> file is not based on position of texts.
>>
>> As suggested in this github link
>> <https://github.com/chrismattmann/tika-python/issues/266>, I used a
>> customized config file (see attached), hoping to solve the issue. But this
>> has not worked out. If any chance, can you please review this issue, and
>> provide any insights or solutions?
>>
>> Thanks so much in advance.
>>
>> Regards,
>>
>> Luke

[Attachment: tika.config (XML document)]