Hi Tim, unfortunately the image didn't make it to the mailing list. What is the issue here? Is the extracted text not in the right order?
Order of PDF parsing and visual order of text are not related.

BR
Maruan

> PDFBox Colleagues,
> Any recommendations?
>
> On Mon, Dec 16, 2019 at 7:05 AM Lu Sun <vistax...@gmail.com> wrote:
>
> > Dear Tika Dev Team,
> >
> > Hope this email finds you well.
> >
> > I have been actively using Tika for pdf file reading. One issue I found is
> > the parsing order. As shown in attached image, the parsing order of pdf
> > file is not based on position of texts.
> >
> > As suggested in this github link
> > <https://github.com/chrismattmann/tika-python/issues/266>, I used a
> > customized config file (see attached), hoping to solve the issue. But this
> > has not worked out. If any chance, can you please review this issue, and
> > provide any insights or solutions?
> >
> > Thanks so much in advance.
> >
> > Regards,
> >
> > Luke
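
P.S. To illustrate why parsing order and visual order can differ: a PDF content stream may draw text fragments in any sequence, so an extractor that wants reading order has to re-sort them by their coordinates. The sketch below is a toy model of that idea only. The `Fragment` type, the `sort_by_position` helper, the tolerance value and the sample coordinates are all made up for illustration; this is not PDFBox's or Tika's implementation. (PDFBox's `PDFTextStripper` does offer a `setSortByPosition` switch for this purpose, but its internals are more involved than this.)

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    text: str
    x: float  # left edge, in points
    y: float  # baseline, measured from the bottom of the page (PDF convention)


def sort_by_position(fragments, line_tolerance=2.0):
    """Return fragments top to bottom, then left to right (hypothetical helper)."""
    # Larger y is higher on the page, so sort by descending y first.
    by_height = sorted(fragments, key=lambda f: -f.y)
    lines = []
    for frag in by_height:
        # Fragments whose baselines lie within the tolerance share a line.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments left to right.
    return [f for line in lines for f in sorted(line, key=lambda f: f.x)]


# Made-up content-stream order: footer drawn first, then the body, then the title.
stream_order = [
    Fragment("Page 1", 280, 40),
    Fragment("world", 140, 700),
    Fragment("Hello", 100, 700),
    Fragment("Title", 100, 760),
]

parsed = " ".join(f.text for f in stream_order)
visual = " ".join(f.text for f in sort_by_position(stream_order))
print(parsed)
print(visual)
```

A simple sort like this works for single-column pages; multi-column layouts or rotated text would need more than a y-then-x ordering.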