Re: PDF Scraping
Keeping mouth shut.

On Tue, Jan 16, 2024 at 10:24 AM Richard Kaye wrote:
> As Stephen would say, Bad Ed!
>
> From: ProfoxTech On Behalf Of Ed Leafe
> Sent: Monday, January 15, 2024 8:18 PM
> To: profoxt...@leafe.com
> Subject: Re: PDF Scraping
>
> On Jan 12, 2024, at 21:51, Brian Erickson <mailto:br...@dashley.net> wrote:
> >
> > It is really easy to do with python.
>
> Heh, I think those exact words with most posts on this list! ;-P
>
> -- Ed Leafe
[excessive quoting removed by server]

___
Post Messages to: ProFox@leafe.com
Subscription Maintenance: https://mail.leafe.com/mailman/listinfo/profox
OT-free version of this list: https://mail.leafe.com/mailman/listinfo/profoxtech
Searchable Archive: https://leafe.com/archives
This message: https://leafe.com/archives/byMID/cajidmyj26d7ddfjijo2eavw_1v2d4-n7trfl+vne+yxxezk...@mail.gmail.com
** All postings, unless explicitly stated otherwise, are the opinions of the author, and do not constitute legal or medical advice. This statement is added to the messages for those lawyers who are too stupid to see the obvious.
RE: PDF Scraping
As Stephen would say, Bad Ed!

From: ProfoxTech On Behalf Of Ed Leafe
Sent: Monday, January 15, 2024 8:18 PM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

On Jan 12, 2024, at 21:51, Brian Erickson <mailto:br...@dashley.net> wrote:
>
> It is really easy to do with python.

Heh, I think those exact words with most posts on this list! ;-P

-- Ed Leafe

This message: https://leafe.com/archives/byMID/byapr10mb3398e7da9e53a816a692b7b0d2...@byapr10mb3398.namprd10.prod.outlook.com
Re: PDF Scraping
On Jan 12, 2024, at 21:51, Brian Erickson wrote:
>
> It is really easy to do with python.

Heh, I think those exact words with most posts on this list! ;-P

-- Ed Leafe

This message: https://leafe.com/archives/byMID/9762e150-b547-42ac-871e-ec5240d86...@leafe.com
RE: PDF Scraping
Looks interesting, I will check it out ... thanks Gianni

-----Original Message-----
From: ProfoxTech On Behalf Of Gianni Turri
Sent: Saturday, January 13, 2024 12:07 PM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Another option is the Balabolka Text Extract Utility, I have used it with success in the past.

https://www.cross-plus-a.com/btext.htm

This is the command line version, so you can run it from VFP.

Example usage:

blb2txt -f "My file.pdf" -out "My file.txt"

The program has many options, for example you can process many files at once.

Gianni

On Fri, 12 Jan 2024 12:46:50 +, Chris Davis wrote:

Forgot Ghostscript could do that, thank you Alan ... works a treat

-----Original Message-----
From: ProfoxTech On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.

--
Alan Bourke
alanpbourke (at) fastmail (dot) fm

This message: https://leafe.com/archives/byMID/cwlp123mb58903eca8add3ba554b3df618f...@cwlp123mb5890.gbrp123.prod.outlook.com
Re: PDF Scraping
Another option is the Balabolka Text Extract Utility, I have used it with success in the past.

https://www.cross-plus-a.com/btext.htm

This is the command line version, so you can run it from VFP.

Example usage:

blb2txt -f "My file.pdf" -out "My file.txt"

The program has many options, for example you can process many files at once.

Gianni

On Fri, 12 Jan 2024 12:46:50 +, Chris Davis wrote:

Forgot Ghostscript could do that, thank you Alan ... works a treat

-----Original Message-----
From: ProfoxTech On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.

--
Alan Bourke
alanpbourke (at) fastmail (dot) fm

This message: https://leafe.com/archives/byMID/j4v4qi117amg36259o7st5reqqa1l2i...@4ax.com
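The example usage above can also be driven from a script. What follows is only a minimal sketch in Python: it uses just the `-f` and `-out` options shown in the message, assumes `blb2txt` is on the PATH, and the function names are made up for illustration.

```python
import subprocess


def build_blb2txt_command(pdf_path, txt_path, exe="blb2txt"):
    """Return the blb2txt argument list matching the example usage.

    Only -f and -out are used because those are the only options shown
    in the thread; exe assumes blb2txt can be found on the PATH.
    """
    return [exe, "-f", str(pdf_path), "-out", str(txt_path)]


def extract_with_blb2txt(pdf_path, txt_path, exe="blb2txt"):
    """Run blb2txt; raises CalledProcessError on a non-zero exit code."""
    # Passing a list avoids manual quoting of paths that contain spaces.
    subprocess.run(build_blb2txt_command(pdf_path, txt_path, exe), check=True)
```

Calling `extract_with_blb2txt("My file.pdf", "My file.txt")` would run the same command line as the example, without having to quote the spaces by hand.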
Re: PDF Scraping
It is really easy to do with python.

Sent from my iPhone

> On Jan 12, 2024, at 5:47 AM, Chris Davis wrote:
>
> Forgot Ghostscript could do that, thank you Alan ... works a treat
>
> -----Original Message-----
> From: ProfoxTech On Behalf Of Alan Bourke
> Sent: Friday, January 12, 2024 11:27 AM
> To: profoxt...@leafe.com
> Subject: Re: PDF Scraping
>
> Chris
>
> This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.
>
> --
> Alan Bourke
> alanpbourke (at) fastmail (dot) fm

This message: https://leafe.com/archives/byMID/50299ca1-0998-45e0-a8f5-63eac6638...@dashley.net
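The message does not say which Python library is meant. As one possible illustration only, the sketch below assumes the third-party `pypdf` package (`pip install pypdf`) and its `PdfReader` class; the function names are hypothetical. Note that this reads the PDF's text layer, so scanned, image-only pages would come back empty.

```python
def pages_to_text(pages):
    """Join the text of each page; pages only need an extract_text() method."""
    # Pages without a text layer may yield an empty string (or None).
    return "\n".join((page.extract_text() or "") for page in pages)


def pdf_to_text_pypdf(pdf_path):
    """Extract all text from a PDF with pypdf (an assumed choice of library)."""
    from pypdf import PdfReader  # third-party; imported lazily on first use

    reader = PdfReader(pdf_path)
    return pages_to_text(reader.pages)
```

With pypdf installed, `pdf_to_text_pypdf("My file.pdf")` would return the document text as a single string, one page after another.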
RE: PDF Scraping
Forgot Ghostscript could do that, thank you Alan ... works a treat

-----Original Message-----
From: ProfoxTech On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.

--
Alan Bourke
alanpbourke (at) fastmail (dot) fm

This message: https://leafe.com/archives/byMID/cwlp123mb5890f99b93ac2a87d4f9ae4b8f...@cwlp123mb5890.gbrp123.prod.outlook.com
Re: PDF Scraping
Chris

This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.

--
Alan Bourke
alanpbourke (at) fastmail (dot) fm

This message: https://leafe.com/archives/byMID/c073fe82-ac75-47ad-8a8b-e0e69350a...@app.fastmail.com
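The thread does not show the Ghostscript command that was eventually used. One way to dump the text, given here only as a minimal sketch, is Ghostscript's `txtwrite` output device. The executable name `gswin64c` is an assumption (it is the usual console binary on 64-bit Windows installs; elsewhere it is typically `gs`), and the function and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, txt_path, gs_exe="gswin64c"):
    """Build a Ghostscript command line that writes a PDF's text to a file.

    gs_exe is an assumption: "gswin64c" is the usual 64-bit Windows
    console binary, "gs" the usual name on other systems.
    """
    return [
        gs_exe,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit after the last input file
        "-sDEVICE=txtwrite",  # text extraction output device
        f"-sOutputFile={txt_path}",
        str(pdf_path),
    ]


def pdf_to_text_gs(pdf_path, txt_path, gs_exe="gswin64c"):
    """Run Ghostscript and return the extracted text (needs Ghostscript installed)."""
    subprocess.run(build_gs_command(pdf_path, txt_path, gs_exe), check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

The list returned by `build_gs_command("in.pdf", "out.txt")` is an ordinary command line, so the same arguments can be run from a command prompt or from any host application that can launch an external program.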