RE: How to extract whole text from a PDF file with the PDF widget?

2021-12-13 Thread Ralph DiMola via use-livecode
Sorry I could not get back to you on this until now. Indexes of -1 don't work here.

  put 1 into tHilitedArray["from"]["page"]
  put 1 into tHilitedArray["from"]["index"]
  put 99 into tHilitedArray["to"]["page"]
  put 99 into tHilitedArray["to"]["index"]
  set the hilitedRange of control "PDF1" to tHilitedArray
  put the hilitedRangeText of control "PDF1" into tText

This will work if you don't need to know the page number. If you do, then cycle
through each page (1 to the numberOfPages of control "PDF1").
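
The page-by-page approach could be sketched roughly as follows. It assumes the
hilitedRange array layout used in the snippet above and relies on out-of-range
indexes being clamped, as Monte describes elsewhere in this thread. The handler
name pdfTextByPage and the value 999999 are hypothetical choices for
illustration only.

```livecode
-- Sketch only: returns an array keyed by page number, one entry per page.
-- pWidgetName is the name of a PDF widget, e.g. "PDF1".
function pdfTextByPage pWidgetName
   local tRange, tPageCount, tPage, tPageText
   put the numberOfPages of control pWidgetName into tPageCount
   repeat with tPage = 1 to tPageCount
      -- select from the first character of this page ...
      put tPage into tRange["from"]["page"]
      put 1 into tRange["from"]["index"]
      -- ... to a deliberately large index on the same page
      -- (assumed to be clamped to the page's character count)
      put tPage into tRange["to"]["page"]
      put 999999 into tRange["to"]["index"]
      set the hilitedRange of control pWidgetName to tRange
      put the hilitedRangeText of control pWidgetName into tPageText[tPage]
   end repeat
   return tPageText
end pdfTextByPage
```

Called as `put pdfTextByPage("PDF1") into tPages`, this would leave the text of
page 3 in tPages[3], so the page number stays attached to each chunk of text.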

Ralph DiMola
IT Director
Evergreen Information Services
rdim...@evergreeninfo.net


Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-12 Thread Monte Goulding via use-livecode
Both the page and character index are clamped to the number of pages and 
characters on a page so you could set both to very high numbers. Adding 
character counts to the documentPages property might be useful here too.

Cheers

Monte

Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-12 Thread Paul Dupuis via use-livecode

Thank you Monte,

We've just started to make a map from XPDF APIs to the PDF Widget APIs, 
so I'll make sure that gets done soon and add any missing capabilities 
as requests to the LC Quality Center.


With regard to the hilitedRange and hilitedRangeText properties, can you
just advise on the correct use to get a PDF's text? I.e., can you use a
range of 1 to -1 to get the whole document text, or would that just be
the current page text?


Thanks in advance,


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-12 Thread Monte Goulding via use-livecode
Hi Folks

Currently you can extract text in the widget by setting the hilitedRange and
getting the hilitedRangeText. It wouldn’t be that hard to add extracted text to
the documentPages property. The PDF widget was built to meet the requirements
of a client rather than to match the features of XPDF, so it’s worthwhile for
anyone still using XPDF to take the time to audit their use and see if there
are any extra features required. If so, please create feature requests for them.
While XPDF will continue to function, we intend to stop including it in LiveCode.

Cheers

Monte

Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-11 Thread Paul Dupuis via use-livecode

I suspect it is for backward compatibility.

When I turned over the XPDF external to LiveCode, I asked that they
maintain it for a couple of years. I had expected we'd migrate our apps to
the PDF widget by then, but business factors mean we're only now just
starting a migration.


That's why I jumped in on this thread - we HAVE to have the ability to 
extract text and images from the PDF widget (as you can with the 
External) - to migrate to the Widget.


I suspect many other commercial developers who used the External still
have active code using it that they have not migrated yet; otherwise the
issue of the undocumented (or, even worse, missing) properties of the
widget would most likely have been raised before now.


To migrate, all the commands and functions of the External need to be
mapped to the properties of the Widget. We have probably a couple of
hundred calls to the External in our code, all of which need to be
mapped, updated, and tested, so it is no trivial task.



Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-11 Thread matthias rebbe via use-livecode
Ah, I thought you were referring only to XPDF.
Btw., do you have an idea why both the XPDF external and the PDF widget are
maintained? Wouldn't it make sense to have only one PDF solution included?
Or am I missing something?

Regards,
Matthias


Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-10 Thread Paul Dupuis via use-livecode
Yes, I am familiar with the XPDF external (based on Google's PDFium 
library), having designed it and paid Monte to code it and then turned 
it over to LiveCode.


I was referring to the PDF Widget (also based on Google's PDFium), which 
should have a comparable property for fetching the text of a page. The 
LC dictionary does not list any property for returning the page text, so 
I assume that is a Dictionary/Documentation error and that Monte can 
tell us the correct property of the PDF widget that will return the text 
of a page.



Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-10 Thread matthias rebbe via use-livecode
Paul,

Here on macOS, the dictionary of LC 10 DP1 definitely lists the function
XPDFViewer_Text(viewerName, pageNumber).
Btw., checking this showed me that this function seems to be deprecated and
that the command
 XPDFViewer_Unicode viewerName, pageNumber, variableName
should be used instead.


Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-10 Thread Paul Dupuis via use-livecode
There must be an undocumented property for the text of a page - there 
was a function to return the full text of a page in the External (XPDF) 
and to get the full text of the PDF file, you just stepped through the 
pages (1..N) getting and concatenating the page text.


Monte? LC 10.0.0 Dictionary does not list a property for the page text.


Re: How to extract whole text from a PDF file with the PDF widget?

2021-12-10 Thread matthias rebbe via use-livecode
Hi Torsten,

I think the PDF widget does not support extracting text by code. At least the
documentation does not show any information about this.

You wrote that you have a business license. That would mean that you can use
the Pro features of LiveCode.
There is an external included in the Pro Feature Pack called XPDF. That
external supports extracting text. Have a look at the function XPDFViewer_Text.


Regards,

Matthias

How to extract whole text from a PDF file with the PDF widget?

2021-12-10 Thread Torsten Holmer via use-livecode
Hi,

I have a PDF file with text and pictures, but I just want the text.

I can do it manually with Ctrl-A and Ctrl-C by viewing the file with Preview
on macOS.

I have a business licence and want to use the PDF widget but I cannot find a 
way to do it.

Can someone help me out?

Cheers,
Torsten