Hi,

I have also been looking for some time for a way to interpret the text content of a PDF invoice, but I don't think there is an easy solution for now. Deep learning and neural networks are too complex for quickly categorizing the contents of an invoice. Cloud solutions such as Rossum <https://rossum.ai/> do this quite well, but all data is sent to AWS first, which is quite questionable for business data.
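For invoices that all come from one known layout, plain pattern matching on the extracted text may be enough as a stopgap, and nothing leaves the machine. Below is a minimal sketch in Python, not a general solution. The sample text is abridged from the Tika output quoted below, with the hand-added (1), (2) markers removed. FIELD_LABELS and extract_fields are made-up names for illustration and are not part of Tika or any other library.

```python
import re

# Abridged from the Tika output of the sample invoice, in the order Tika
# returned it; the hand-added (1), (2) markers are removed.
SAMPLE_TEXT = """\
Invoice Number INV-3337
Order Number 12345
Invoice Date January 25, 2016
Due Date January 31, 2016
Total Due $93.50
Sub Total $85.00
Tax $8.50
Total $93.50
"""

# Hypothetical mapping from result keys to the labels printed on this one
# invoice layout. Another layout would need its own mapping.
FIELD_LABELS = {
    "invoice_number": "Invoice Number",
    "order_number": "Order Number",
    "invoice_date": "Invoice Date",
    "due_date": "Due Date",
    "total_due": "Total Due",
    "sub_total": "Sub Total",
    "tax": "Tax",
}


def extract_fields(text):
    """Return a dict of the fields whose label starts a line in `text`."""
    fields = {}
    for key, label in FIELD_LABELS.items():
        # The label must start the line; the rest of the line is the value.
        pattern = rf"^{re.escape(label)}[ \t]+(.+)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            fields[key] = match.group(1).strip()
    return fields


if __name__ == "__main__":
    for key, value in extract_fields(SAMPLE_TEXT).items():
        print(f"{key}: {value}")
```

Because each label is looked up by name, the order in which the text blocks are reported does not matter for these fields. The line items and the address blocks are harder, since they have no such label on the same line.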
===
Ralph

On 11.07.19 19:26, Chris Mattmann wrote:
>
> Tabula PDF is something I have been looking at for this as well as doing
> like Deep Neural Nets…
>
> *From: *Sergey Beryozkin <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Thursday, July 11, 2019 at 10:25 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *[EXTERNAL] How to parse PDF more effectively
>
> Hi
>
> I've used Tika to parse this invoice PDF:
>
> https://slicedinvoices.com/pdf/wordpress-pdf-invoice-plugin-sample.pdf
>
> (AutoDetectParser, ToTextContentHandler), see below what is returned.
>
> The numbers like (1), (2) are added by myself, this is the preferred
> order (approximately).
>
> Is it possible to hint somehow to Tika how to report the content ?
>
> Thanks Sergey
>
> PDF Invoice Example
> Invoice
>
> (5)Payment is due within 30 days from date of invoice. Late payment is
> subject to fees of 5% per month.
>
> Thanks for choosing DEMO - Sliced Invoices | [email protected]
> <mailto:[email protected]>
>
> Page 1/1
>
> (2)From:
>
> DEMO - Sliced Invoices
>
> Suite 5A-1204
>
> 123 Somewhere Street
>
> Your City AZ 12345
>
> [email protected] <mailto:[email protected]>
>
> (1)Invoice Number INV-3337
>
> Order Number 12345
>
> Invoice Date January 25, 2016
>
> Due Date January 31, 2016
>
> Total Due $93.50
>
> (3)To:
>
> Test Business
>
> 123 Somewhere St
>
> Melbourne, VIC 3000
>
> [email protected] <mailto:[email protected]>
>
> (4) Hrs/Qty Service Rate/Price Adjust Sub Total
>
> 1.00
> Web Design
> This is a sample description...
>
> $85.00 0.00% $85.00
>
> Sub Total $85.00
>
> Tax $8.50
>
> Total $93.50
>
> (5) ANZ Bank
>
> ACC # 1234 1234
>
> BSB # 4321 432 Pa
> id
> --
