Re: PDF Scraping

2024-01-16 Thread Stephen Russell
Keeping mouth shut.



On Tue, Jan 16, 2024 at 10:24 AM Richard Kaye  wrote:

> As Stephen would say, Bad Ed! 
>
> From: ProfoxTech  On Behalf Of Ed Leafe
> Sent: Monday, January 15, 2024 8:18 PM
> To: profoxt...@leafe.com
> Subject: Re: PDF Scraping
>
> On Jan 12, 2024, at 21:51, Brian Erickson <mailto:br...@dashley.net>
> wrote:
> >
> > It is really easy to do with python.
>
> Heh, I think those exact words with most posts on this list! ;-P
>
>
> -- Ed Leafe
>
[excessive quoting removed by server]

___
Post Messages to: ProFox@leafe.com
Subscription Maintenance: https://mail.leafe.com/mailman/listinfo/profox
OT-free version of this list: https://mail.leafe.com/mailman/listinfo/profoxtech
Searchable Archive: https://leafe.com/archives
This message: 
https://leafe.com/archives/byMID/cajidmyj26d7ddfjijo2eavw_1v2d4-n7trfl+vne+yxxezk...@mail.gmail.com
** All postings, unless explicitly stated otherwise, are the opinions of the 
author, and do not constitute legal or medical advice. This statement is added 
to the messages for those lawyers who are too stupid to see the obvious.

RE: PDF Scraping

2024-01-16 Thread Richard Kaye
As Stephen would say, Bad Ed!  

From: ProfoxTech  On Behalf Of Ed Leafe
Sent: Monday, January 15, 2024 8:18 PM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

On Jan 12, 2024, at 21:51, Brian Erickson <mailto:br...@dashley.net> wrote:
> 
> It is really easy to do with python.

Heh, I think those exact words with most posts on this list! ;-P


-- Ed Leafe

This message: 
https://leafe.com/archives/byMID/byapr10mb3398e7da9e53a816a692b7b0d2...@byapr10mb3398.namprd10.prod.outlook.com

Re: PDF Scraping

2024-01-15 Thread Ed Leafe
On Jan 12, 2024, at 21:51, Brian Erickson  wrote:
> 
> It is really easy to do with python.

Heh, I think those exact words with most posts on this list!  ;-P


-- Ed Leafe


This message: 
https://leafe.com/archives/byMID/9762e150-b547-42ac-871e-ec5240d86...@leafe.com


RE: PDF Scraping

2024-01-15 Thread Chris Davis
Looks interesting, I will check it out ... thanks Gianni

-Original Message-
From: ProfoxTech  On Behalf Of Gianni Turri
Sent: Saturday, January 13, 2024 12:07 PM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Another option is the Balabolka Text Extract Utility, I have used it with 
success in the past.

https://www.cross-plus-a.com/btext.htm

This is the command line version, so you can run it from VFP.

Example usage:

blb2txt -f "My file.pdf" -out "My file.txt"

The program has many options, for example you can process many files at once.

Gianni

On Fri, 12 Jan 2024 12:46:50 +, Chris Davis  wrote:

Forgot Ghostscript could do that, thank you Alan ... works a treat

-Original Message-
From: ProfoxTech  On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of 
VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF 
files and dump the text out.

--
  Alan Bourke
  alanpbourke (at) fastmail (dot) fm

[excessive quoting removed by server]

This message: 
https://leafe.com/archives/byMID/cwlp123mb58903eca8add3ba554b3df618f...@cwlp123mb5890.gbrp123.prod.outlook.com


Re: PDF Scraping

2024-01-13 Thread Gianni Turri
Another option is the Balabolka Text Extract Utility, I have used it with 
success in the past.

https://www.cross-plus-a.com/btext.htm

This is the command line version, so you can run it from VFP.

Example usage:

blb2txt -f "My file.pdf" -out "My file.txt"

The program has many options, for example you can process many files at once.
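
For anyone driving the utility from a script instead of VFP, here is a minimal sketch in Python. Only the -f and -out switches come from the example above. The function names are hypothetical, and the loop over a folder is a plain Python stand-in for the utility's own multi-file options, which are not shown here.

```python
import subprocess
from pathlib import Path


def build_blb2txt_command(pdf_path, txt_path, exe="blb2txt"):
    """Return the argument list for one blb2txt run (-f input, -out output)."""
    return [exe, "-f", str(pdf_path), "-out", str(txt_path)]


def convert_folder(folder, exe="blb2txt"):
    """Run blb2txt once per PDF in `folder`, writing a .txt next to each file."""
    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # Passing a list (no shell) keeps file names with spaces intact.
        subprocess.run(build_blb2txt_command(pdf, txt, exe), check=True)
        outputs.append(txt)
    return outputs
```

Calling convert_folder(r"C:\invoices") would assume blb2txt is on the PATH; otherwise pass the full path to the executable as exe.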

Gianni

On Fri, 12 Jan 2024 12:46:50 +, Chris Davis  wrote:

Forgot Ghostscript could do that, thank you Alan ... works a treat

-Original Message-
From: ProfoxTech  On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of 
VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF 
files and dump the text out.

--
  Alan Bourke
  alanpbourke (at) fastmail (dot) fm

[excessive quoting removed by server]

This message: 
https://leafe.com/archives/byMID/j4v4qi117amg36259o7st5reqqa1l2i...@4ax.com


Re: PDF Scraping

2024-01-12 Thread Brian Erickson
It is really easy to do with python. 
Sent from my iPhone
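
The message does not say which library was meant, so the following is only a sketch under that assumption: it uses the third-party pypdf package (pip install pypdf), and the function names are hypothetical. It pulls the text of each page and joins the pages with newlines.

```python
def join_page_texts(page_texts):
    """Join per-page strings with newlines, treating None or empty pages as blank."""
    return "\n".join(text or "" for text in page_texts)


def pdf_text(pdf_path):
    """Return the extracted text of every page in the PDF as one string."""
    # Imported here so the module still loads where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

Scanned PDFs that contain only page images will come back empty with this approach, since there is no text layer to read.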

> On Jan 12, 2024, at 5:47 AM, Chris Davis  wrote:
> 
> Forgot Ghostscript could do that, thank you Alan ... works a treat 
> 
> -Original Message-
> From: ProfoxTech  On Behalf Of Alan Bourke
> Sent: Friday, January 12, 2024 11:27 AM
> To: profoxt...@leafe.com
> Subject: Re: PDF Scraping
> 
> Chris
> 
> This is not easy in general and probably not possible without going outside 
> of VFP. You're probably looking at leveraging Ghostscript somehow to parse the 
> PDF files and dump the text out.
> 
> --
>  Alan Bourke
>  alanpbourke (at) fastmail (dot) fm
> 
[excessive quoting removed by server]

This message: 
https://leafe.com/archives/byMID/50299ca1-0998-45e0-a8f5-63eac6638...@dashley.net

RE: PDF Scraping

2024-01-12 Thread Chris Davis
Forgot Ghostscript could do that, thank you Alan ... works a treat 

-Original Message-
From: ProfoxTech  On Behalf Of Alan Bourke
Sent: Friday, January 12, 2024 11:27 AM
To: profoxt...@leafe.com
Subject: Re: PDF Scraping

Chris

This is not easy in general and probably not possible without going outside of 
VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF 
files and dump the text out.

--
  Alan Bourke
  alanpbourke (at) fastmail (dot) fm


This message: 
https://leafe.com/archives/byMID/cwlp123mb5890f99b93ac2a87d4f9ae4b8f...@cwlp123mb5890.gbrp123.prod.outlook.com

Re: PDF Scraping

2024-01-12 Thread Alan Bourke
Chris

This is not easy in general and probably not possible without going outside of 
VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF 
files and dump the text out.
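
One way this might look in practice, sketched in Python: Ghostscript ships a txtwrite output device that writes a document's text to a file. The thread does not say which options were actually used, so the switches below are an assumption, as are the function names. gswin64c is the usual name of the 64-bit Windows console build; on Linux or macOS the executable is normally gs.

```python
import subprocess


def build_gs_text_command(pdf_path, txt_path, exe="gswin64c"):
    """Return a Ghostscript command that writes the PDF's text to txt_path."""
    return [
        exe,
        "-q",                      # suppress the startup banner
        "-dNOPAUSE",               # do not prompt between pages
        "-dBATCH",                 # exit once the input is processed
        "-sDEVICE=txtwrite",       # text-extraction output device
        f"-sOutputFile={txt_path}",
        str(pdf_path),
    ]


def pdf_to_text(pdf_path, txt_path, exe="gswin64c"):
    """Run Ghostscript, then read back the text file it produced."""
    subprocess.run(build_gs_text_command(pdf_path, txt_path, exe), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

The same argument list can be pasted into a VFP RUN command or a batch file; the Python wrapper only adds error checking and reads the result back.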

-- 
  Alan Bourke
  alanpbourke (at) fastmail (dot) fm

This message: 
https://leafe.com/archives/byMID/c073fe82-ac75-47ad-8a8b-e0e69350a...@app.fastmail.com