Re: [go-nuts] Extracting table data out of PDFs

2016-07-11 Thread Sankar P
2016-07-11 15:34 GMT+05:30 Sankar P:
>> I don't know if it does what you want, but have you looked at
>> https://godoc.org/rsc.io/pdf ?
>
> It seems to be unmaintained. I tried loading a complex PDF with plenty
> of tables and it hung indefinitely on the Content() call in the first page.
> I lost intere

Re: [go-nuts] Extracting table data out of PDFs

2016-07-11 Thread Sankar P
Using pdftohtml and then running regexes or a parser on top seems to be the easiest solution as of now. I came across tabula-java, which also seems interesting. Thank you everyone for the recommendations. I've still not got multiple tables in a single page or tables overflowing across pages working cor
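A rough Go sketch of the "pdftohtml plus a parser" idea follows. It assumes the XML was produced with something like `pdftohtml -xml -stdout file.pdf`, and that each `<text>` element carries `top` and `left` attributes, as pdftohtml's XML output does. The type and function names (`textItem`, `rowsFromXML`) are made up for illustration, and grouping cells by an identical `top` value is only a heuristic: it does not address the multiple-tables-per-page or cross-page cases mentioned above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"sort"
	"strings"
)

// textItem mirrors one <text> element of pdftohtml's XML output.
// Only direct character data is kept; text nested in <b> or <i> is dropped.
type textItem struct {
	Top  int    `xml:"top,attr"`
	Left int    `xml:"left,attr"`
	Body string `xml:",chardata"`
}

type page struct {
	Texts []textItem `xml:"text"`
}

type document struct {
	Pages []page `xml:"page"`
}

// rowsFromXML groups text items that share the same top coordinate into
// one row, ordered left to right. Pages are processed in document order.
func rowsFromXML(data []byte) ([][]string, error) {
	var doc document
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("parsing pdftohtml XML: %w", err)
	}
	var rows [][]string
	for _, p := range doc.Pages {
		items := p.Texts
		sort.SliceStable(items, func(i, j int) bool {
			if items[i].Top != items[j].Top {
				return items[i].Top < items[j].Top
			}
			return items[i].Left < items[j].Left
		})
		first := true
		lastTop := 0
		for _, it := range items {
			cell := strings.TrimSpace(it.Body)
			if first || it.Top != lastTop {
				// A new top coordinate (or a new page) starts a new row.
				rows = append(rows, []string{cell})
				first = false
				lastTop = it.Top
				continue
			}
			rows[len(rows)-1] = append(rows[len(rows)-1], cell)
		}
	}
	return rows, nil
}

func main() {
	// Hand-written sample in the shape of pdftohtml -xml output.
	sample := `<pdf2xml><page number="1">
<text top="10" left="100">Age</text>
<text top="10" left="5">Name</text>
<text top="30" left="5">Alice</text>
<text top="30" left="100">30</text>
</page></pdf2xml>`
	rows, err := rowsFromXML([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rows {
		fmt.Println(strings.Join(r, " | "))
	}
}
```

Real PDFs often place cells of one visual row at slightly different `top` values, so a tolerance of a few units would probably be needed in practice.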

Re: [go-nuts] Extracting table data out of PDFs

2016-07-11 Thread Sankar P
> I don't know if it does what you want, but have you looked at
> https://godoc.org/rsc.io/pdf ?
It seems to be unmaintained. I tried loading a complex PDF with plenty of tables and it hung indefinitely on the Content() call in the first page. I lost interest after that. Thanks. -- Sankar P http://ps

Re: [go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Konstantin Khomoutov
On Thu, 30 Jun 2016 11:22:00 -0400 Shawn Milochik wrote:
> I don't know of a Go solution, but if you are on Linux you could try
> pdftotext and parse the text. With the obvious caveat of "it depends
> on how the PDF was encoded."
I'm using this approach in one of my applications. The only probl

Re: [go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Ian Lance Taylor
On Thu, Jun 30, 2016 at 1:35 AM, Sankar wrote:
> Are there any stable/production-quality golang libraries that people are
> aware of which could read and extract tabular data out of PDF documents ?
I don't know if it does what you want, but have you looked at https://godoc.org/rsc.io/pdf ? Ian

Re: [go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Shawn Milochik
I don't know of a Go solution, but if you are on Linux you could try pdftotext and parse the text. With the obvious caveat of "it depends on how the PDF was encoded." Worst-case you may be able to use tesseract OCR to generate text and then do the same thing. https://packages.debian.org/sid/poppl
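For what it's worth, the pdftotext route can be driven from Go roughly like this. The sketch assumes poppler-utils is installed and uses `pdftotext -layout file.pdf -` to write layout-preserved text to stdout. The helper names (`parseLayoutText`, `extractRows`) and the "two or more spaces separate columns" rule are assumptions for illustration, not something proposed in this thread, and how well it works depends on how the PDF was encoded.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
	"strings"
)

// columnGap matches runs of two or more spaces, which layout-preserved
// text usually leaves between table columns. This is only a heuristic.
var columnGap = regexp.MustCompile(` {2,}`)

// parseLayoutText splits layout-preserved text into rows of cells.
// Lines that yield fewer than two cells are treated as non-table text
// and skipped.
func parseLayoutText(text string) [][]string {
	var rows [][]string
	for _, line := range strings.Split(text, "\n") {
		// TrimSpace also removes the form feed pdftotext emits between pages.
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		cells := columnGap.Split(line, -1)
		if len(cells) < 2 {
			continue
		}
		rows = append(rows, cells)
	}
	return rows
}

// extractRows runs pdftotext on pdfPath and parses its stdout.
func extractRows(pdfPath string) ([][]string, error) {
	out, err := exec.Command("pdftotext", "-layout", pdfPath, "-").Output()
	if err != nil {
		return nil, fmt.Errorf("running pdftotext: %w", err)
	}
	return parseLayoutText(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: program <file.pdf>")
		return
	}
	rows, err := extractRows(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, r := range rows {
		fmt.Println(strings.Join(r, " | "))
	}
}
```

Cells that themselves contain double spaces, or columns separated by a single space, will be split incorrectly, so the output needs checking against the source document.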

Re: [go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Sankar P
Yes, I did come across your service when I was searching. I have some PII information and so did not try the service. Having an on-premise solution is encouraging. I will play with it. Thanks.
2016-06-30 14:35 GMT+05:30 Peter Waller:
> Hi Sankar,
>
> It may not be exactly what you're looking for

Re: [go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Peter Waller
Hi Sankar, It may not be exactly what you're looking for but I can't resist the opportunity to plug our product! PDFTables.com has a remote API, you can see an example of how to use it here: https://github.com/pdftables/api/blob/master/go/cmd/pdftables-api/main.go You can get an API key and find

[go-nuts] Extracting table data out of PDFs

2016-06-30 Thread Sankar
Hi, are there any stable/production-quality golang libraries that people are aware of which could read and extract tabular data out of PDF documents? Thanks, Sankar