OK, I think buying Aspose works. I'll go ahead with that. Thank you.

On 2020/08/11 19:23:11, Tilman Hausherr <thaush...@t-online.de> wrote:
> On 11.08.2020 at 10:15, Aravind Swarana wrote:
> > Hi,
> >
> > I tried icecite; it is very buggy, and Apache PDFBox paragraph
> > identification actually works better. Are there any other solutions, or
> > does anyone know how Aspose PDF does it internally?
>
> If Aspose works for you, then you should buy / license it. It's probably
> cheaper than working out your own algorithm.
>
> No, I don't know how Aspose works.
>
> Tilman
>
> > On 2020/08/10 18:32:58, Tilman Hausherr <thaush...@t-online.de> wrote:
> >> Maybe icecite?
> >>
> >> https://github.com/ckorzen/icecite
> >>
> >> Tilman
> >>
> >> On 10.08.2020 at 20:19, Aravind Swarana wrote:
> >>> Hi,
> >>>
> >>> I wanted to extract text as paragraphs using Apache PDFBox. From my
> >>> reading, I have learned that extracting text from a PDF is not that simple.
> >>>
> >>> I have extracted paragraphs from a PDF using the PDFBox API, but the
> >>> results are not that great.
> >>>
> >>> Meanwhile, I have evaluated a paid PDF parsing library called Aspose PDF,
> >>> which extracts paragraphs with very few errors.
> >>>
> >>> I'm trying to implement a similar algorithm for Apache PDFBox. Can you
> >>> suggest any recent research papers or open-source libraries with
> >>> efficient paragraph identification algorithms? I'll need to evaluate
> >>> and implement them.
> >>>
> >>> So far I have found:
> >>> https://github.com/elacin/PDFExtract (I observed some errors while
> >>> evaluating this; it is not as accurate as Aspose)
> >>>
> >>> https://github.com/BMKEG/lapdftext/wiki/System-Overview (not based on
> >>> Apache PDFBox)
> >>>
> >>> I just need some suggestions: are there any other algorithms I can
> >>> look at and implement?
> >>>
> >>> Thanks & regards,
> >>> Aravind Swarna
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org