OK, I think buying Aspose works. I'll go ahead with that. Thank you.

On 2020/08/11 19:23:11, Tilman Hausherr <thaush...@t-online.de> wrote: 
> On 11.08.2020 at 10:15, Aravind Swarana wrote:
> > Hi,
> > I tried icecite; it is very buggy, and Apache PDFBox's paragraph
> > identification works even better. Are there any other solutions, or does
> > anyone know how Aspose PDF does it internally?
> 
> 
> If Aspose works for you, then you should buy / license it. It's probably
> cheaper than working out your own algorithm.
> 
> No, I don't know how Aspose works.
> 
> Tilman
> 
> 
> 
> >
> > On 2020/08/10 18:32:58, Tilman Hausherr <thaush...@t-online.de> wrote:
> >> Maybe icecite?
> >>
> >> https://github.com/ckorzen/icecite
> >>
> >> Tilman
> >>
> >> On 10.08.2020 at 20:19, Aravind Swarana wrote:
> >>> Hi,
> >>>
> >>> I want to extract text as paragraphs using Apache PDFBox. From my reading,
> >>> I understand that extracting text from a PDF is not that simple.
> >>>
> >>> I have extracted paragraphs from a PDF using the PDFBox API, but the
> >>> results are not that great.
> >>>
> >>> Meanwhile, I have evaluated a paid PDF parsing library called Aspose PDF,
> >>> which extracts paragraphs with very few errors.
> >>>
> >>> I'm trying to implement a similar algorithm for Apache PDFBox. Can you
> >>> suggest any recent research papers or open source libraries that have
> >>> efficient paragraph identification algorithms? I'll need to evaluate and
> >>> implement them.
> >>>
> >>> So far I have found:
> >>> https://github.com/elacin/PDFExtract (I observed some errors while
> >>> evaluating this, and it is not as accurate as Aspose)
> >>>
> >>> https://github.com/BMKEG/lapdftext/wiki/System-Overview (not based on
> >>> Apache PDFBox)
> >>>
> >>> I just need some suggestions on whether there are any other algorithms I
> >>> can look at and implement.
> >>>
> >>>
> >>>
> >>>
> >>> Thanks & regards,
> >>> Aravind Swarna
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>
> >>
> 
> 
