You might not need to break out the neural nets just yet... try turning on
sortByPosition via PDFParserConfig and/or tika-config.xml.
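
For the config-file route, a minimal sketch of a tika-config.xml might look like the one below. It assumes a Tika version whose PDFParser accepts the `sortByPosition` parameter through `<params>`; the exclude/re-add pattern around DefaultParser is my assumption about how you would want to wire it, so check it against the docs for your version. The programmatic route is to call `setSortByPosition(true)` on a PDFParserConfig and hand that config to the parser through the ParseContext.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep all the default parsers, except the PDF parser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Re-add the PDF parser with position-based sorting turned on -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

Sorting by position tends to help with side-by-side blocks like the ones in this invoice, but it can interleave text in true multi-column layouts, so it is worth checking on a few of your own documents.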

With sortByPosition on, this is what you get:

<title>PDF Invoice Example</title>
</head>
<body><div class="page"><p />
<p>Invoice
</p>
<p>From: Invoice Number INV-3337
</p>
<p>DEMO - Sliced Invoices Order Number 12345
Suite 5A-1204 Invoice Date January 25, 2016
123 Somewhere Street Due Date January 31, 2016
Your City AZ 12345
[email protected] Total Due $93.50
</p>
<p>To:
Test Business
123 Somewhere St
Melbourne, VIC 3000
[email protected]
</p>
<p>Hrs/Qty Service Rate/Price Adjust Sub Total
</p>
<p>1.00 Web DesignThis is a sample description... $85.00 0.00% $85.00
</p>
<p>Pa
idSub Total $85.00
</p>
<p>Tax $8.50
Total $93.50
</p>
<p>ANZ Bank
ACC # 1234 1234
BSB # 4321 432
</p>
<p>Payment is due within 30 days from date of invoice. Late payment is
subject to fees of 5% per month.
Thanks for choosing DEMO - Sliced Invoices | [email protected]
Page 1/1</p>
<p />
<div class="annotation"><a
href="http://slicedinvoices.com/demo">http://slicedinvoices.com/demo</a></div>
<div class="annotation"><a
href="http://slicedinvoices.com/demo">http://slicedinvoices.com/demo</a></div>
<div class="annotation"><a
href="http://slicedinvoices.com/demo">http://slicedinvoices.com/demo</a></div>
<div class="annotation"><a
href="mailto:[email protected]">mailto:[email protected]</a></div>
</div>
</body></html>

On Thu, Jul 11, 2019 at 1:25 PM Sergey Beryozkin <[email protected]> wrote:
>
> Hi
>
> I've used Tika to parse this invoice PDF:
>
> https://slicedinvoices.com/pdf/wordpress-pdf-invoice-plugin-sample.pdf
>
> (AutoDetectParser, ToTextContentHandler), see below what is returned.
> I added the numbers like (1), (2) myself; they mark the preferred order
> (approximately).
>
> Is it possible to somehow hint to Tika how to report the content?
>
> Thanks, Sergey
>
> PDF Invoice Example
> Invoice
>
> (5)Payment is due within 30 days from date of invoice. Late payment is 
> subject to fees of 5% per month.
>
> Thanks for choosing DEMO - Sliced Invoices | [email protected]
>
> Page 1/1
>
> (2)From:
>
> DEMO - Sliced Invoices
>
> Suite 5A-1204
>
> 123 Somewhere Street
>
> Your City AZ 12345
>
> [email protected]
>
> (1)Invoice Number INV-3337
>
> Order Number 12345
>
> Invoice Date January 25, 2016
>
> Due Date January 31, 2016
>
> Total Due $93.50
>
> (3)To:
>
> Test Business
>
> 123 Somewhere St
>
> Melbourne, VIC 3000
>
> [email protected]
>
> (4) Hrs/Qty Service Rate/Price Adjust Sub Total
>
> 1.00
> Web Design
> This is a sample description...
>
> $85.00 0.00% $85.00
>
> Sub Total $85.00
>
> Tax $8.50
>
> Total $93.50
>
> (5) ANZ Bank
>
> ACC # 1234 1234
>
> BSB # 4321 432 Pa
> id
