Re: Parsing order issue

Maruan Sahyoun Tue, 17 Dec 2019 04:43:13 -0800

Hi Tim,

Unfortunately, the image didn't make it to the mailing list. What is the issue
here? Is the extracted text not in the right order?


The order in which PDF content is parsed and the visual order of the text on
the page are not related.
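
To illustrate why the two orders can differ, here is a minimal sketch in plain
Java. It does not use PDFBox or Tika; the class and method names
(ReadingOrderSketch, Chunk, join, sortByPosition) are hypothetical and exist
only for this example. A page is modelled as text chunks with coordinates,
stored in an arbitrary order. Reading them in stored order scrambles the text,
while sorting by position restores it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from PDFBox or Tika.
final class ReadingOrderSketch {

    // A piece of text plus the position where it is drawn.
    // Assumption for this sketch: y is measured from the top of the page.
    static final class Chunk {
        final String text;
        final double x;
        final double y;

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Joins the chunk texts in the order of the given list.
    static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }

    // Returns a copy sorted top-to-bottom, then left-to-right.
    static List<Chunk> sortByPosition(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y)
                .thenComparingDouble(c -> c.x));
        return sorted;
    }

    public static void main(String[] args) {
        // The file stores the chunks in an order unrelated to the layout.
        List<Chunk> stored = new ArrayList<>();
        stored.add(new Chunk("world", 60, 10));
        stored.add(new Chunk("line", 60, 30));
        stored.add(new Chunk("Hello", 10, 10));
        stored.add(new Chunk("Second", 10, 30));

        System.out.println(join(stored));                 // world line Hello Second
        System.out.println(join(sortByPosition(stored))); // Hello world Second line
    }
}
```

In PDFBox itself, PDFTextStripper.setSortByPosition(true) requests
position-based sorting, and Tika exposes a similar sortByPosition setting on
PDFParserConfig. Whether that is enough depends on the layout of the document;
multi-column pages, for example, may still come out in an unexpected order.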

BR
Maruan

 
> PDFBox Colleagues,
>   Any recommendations?
> 
> On Mon, Dec 16, 2019 at 7:05 AM Lu Sun <[email protected]> wrote:
> 
> > Dear Tika Dev Team,
> > 
> > 
> > 
> > Hope this email finds you well.
> > 
> > 
> > 
> > I have been actively using Tika to read PDF files. One issue I found is
> > the parsing order. As shown in the attached image, the PDF file is not
> > parsed in the order of the text's position on the page.
> > 
> > 
> > 
> > As suggested in this GitHub link
> > <https://github.com/chrismattmann/tika-python/issues/266>, I used a
> > customized config file (see attached), hoping to solve the issue, but this
> > has not worked. If possible, could you please review this issue and
> > provide any insights or solutions?
> > 
> > 
> > 
> > Thanks so much in advance.
> > 
> > 
> > 
> > Regards,
> > 
> > Luke
> > 
