RE: How to extract whole text from a PDF file with the PDF widget?
Sorry I could not get back to you on this until now.

Values of -1 don't work here.

    put 1 into tHilitedArray["from"]["page"]
    put 1 into tHilitedArray["from"]["index"]
    put 99 into tHilitedArray["to"]["page"]
    put 99 into tHilitedArray["to"]["index"]
    set the hilitedRange of control "PDF1" to tHilitedArray
    put the hilitedRangeText of control "PDF1" into tText

This will work if you don't need to know the page number. If you do, then cycle through each page (1 to the NumberOfPages of control "PDF1").

Ralph DiMola
IT Director
Evergreen Information Services
rdim...@evergreeninfo.net

-----Original Message-----
From: use-livecode [mailto:use-livecode-boun...@lists.runrev.com] On Behalf Of Paul Dupuis via use-livecode
Sent: Sunday, December 12, 2021 7:18 PM
To: use-livecode@lists.runrev.com
Cc: Paul Dupuis
Subject: Re: How to extract whole text from a PDF file with the PDF widget?

With regard to the hilitedRange and hilitedRangeText properties, can you just advise on the correct use to get a PDF's text? i.e., can you use a range of 1 to -1 to get the whole document text or would that just be the current page text?
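A minimal sketch of the "cycle through each page" suggestion above, for the case where the page number matters. It uses only the properties named in this thread (hilitedRange, hilitedRangeText, NumberOfPages) and relies on the index clamping that Monte describes elsewhere in the thread. The handler name pdfTextByPage, the parameter pWidgetName and the constant kBigIndex are made up for illustration, and the code has not been checked against the widget's documentation.

```livecode
-- Sketch only: returns an array keyed by page number, one entry per page.
-- pdfTextByPage, pWidgetName and kBigIndex are hypothetical names.
function pdfTextByPage pWidgetName
   local tRange, tPage, tTextByPage
   -- Arbitrary large value; assumes the character index is clamped
   -- to the number of characters on the page.
   constant kBigIndex = 999999

   repeat with tPage = 1 to the numberOfPages of control pWidgetName
      -- Select from the first character of this page to its last one
      put tPage into tRange["from"]["page"]
      put 1 into tRange["from"]["index"]
      put tPage into tRange["to"]["page"]
      put kBigIndex into tRange["to"]["index"]
      set the hilitedRange of control pWidgetName to tRange
      put the hilitedRangeText of control pWidgetName into tTextByPage[tPage]
   end repeat

   return tTextByPage
end pdfTextByPage
```

Usage would look like `put pdfTextByPage("PDF1") into tPages`, after which `tPages[3]` holds the text of page 3. Note that this changes the visible selection in the widget as a side effect.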
Re: How to extract whole text from a PDF file with the PDF widget?
Both the page and the character index are clamped (to the number of pages in the document and the number of characters on the page, respectively), so you can set both to very high numbers. Adding character counts to the documentPages property might be useful here too.

Cheers

Monte

> On 13 Dec 2021, at 11:17 am, Paul Dupuis via use-livecode wrote:
>
> With regard to the hilitedRange and hilitedRangeText properties, can you just
> advise on the correct use to get a PDF's text? i.e., can you use a range of 1
> to -1 to get the whole document text or would that just be the current page
> text?
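A small sketch of the clamping approach described above: select from the first character of page 1 to a deliberately oversized page and index, then read the selection back. The handler name pdfWholeText, the parameter pWidgetName and the value 999999 are assumptions chosen for illustration; only hilitedRange and hilitedRangeText come from this thread.

```livecode
-- Sketch only: returns the text of the whole document in one call.
-- pdfWholeText and pWidgetName are hypothetical names.
function pdfWholeText pWidgetName
   local tRange
   put 1 into tRange["from"]["page"]
   put 1 into tRange["from"]["index"]
   -- Oversized on purpose; assumes both values are clamped to the
   -- last page and to the last character on that page.
   put 999999 into tRange["to"]["page"]
   put 999999 into tRange["to"]["index"]
   set the hilitedRange of control pWidgetName to tRange
   return the hilitedRangeText of control pWidgetName
end pdfWholeText
```

For example, `put pdfWholeText("PDF1") into tText`. This does not record where one page ends and the next begins, so it only suits cases where page numbers are not needed.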
Re: How to extract whole text from a PDF file with the PDF widget?
Thank you Monte,

We've just started to map the XPDF APIs to the PDF Widget APIs, so I'll make sure that gets done soon and add any missing capabilities as requests to the LC Quality Center.

With regard to the hilitedRange and hilitedRangeText properties, can you just advise on the correct use to get a PDF's text? i.e., can you use a range of 1 to -1 to get the whole document text, or would that just return the current page text?

Thanks in advance,

On 12/12/2021 6:49 PM, Monte Goulding via use-livecode wrote:
> Currently you can extract text in the widget by setting the hilitedRange and
> getting the hilitedRangeText.

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: How to extract whole text from a PDF file with the PDF widget?
Hi Folks

Currently you can extract text in the widget by setting the hilitedRange and getting the hilitedRangeText. It wouldn’t be that hard to add extracted text to the documentPages property.

The PDF widget was built to meet the requirements for a client rather than to match the features of XPDF, so it’s worthwhile for anyone still using XPDF to take the time to audit their use and see if there are any extra features required. If so, please create feature requests for them. While XPDF will continue to function, we intend to stop including it in LiveCode.

Cheers

Monte

> On 12 Dec 2021, at 12:27 am, Paul Dupuis via use-livecode wrote:
>
> That's why I jumped in on this thread - we HAVE to have the ability to
> extract text and images from the PDF widget (as you can with the External) -
> to migrate to the Widget.
Re: How to extract whole text from a PDF file with the PDF widget?
I suspect it is for backward compatibility.

When I turned over the XPDF external to LiveCode, I asked that they maintain it for a couple of years. I had expected we'd migrate our apps to the PDF widget by then, but business factors mean we're only now just starting a migration.

That's why I jumped in on this thread - we HAVE to have the ability to extract text and images from the PDF widget (as you can with the External) - to migrate to the Widget.

I suspect many other commercial developers who used the External still have active code using it that they have not migrated yet; otherwise the issue of the undocumented (or, even worse, missing) properties of the widget would most likely have been raised before now.

To migrate, all the commands and functions of the External need to be mapped to the properties of the Widget. We probably have a couple hundred calls to the External in our code, all of which need to be mapped, updated, and tested - so no trivial task.

On 12/11/2021 6:50 AM, matthias rebbe via use-livecode wrote:
> Btw. do you have an idea why both, XPDF external and PDF widget, are
> maintained? Wouldn't it make sense to have only one pdf solution included?
Re: How to extract whole text from a PDF file with the PDF widget?
Ah, I thought you were referring only to XPDF.

Btw., do you have any idea why both the XPDF external and the PDF widget are maintained? Wouldn't it make sense to include only one PDF solution? Or am I missing something?

Regards,
Matthias

> On 11.12.2021 at 02:01, Paul Dupuis via use-livecode wrote:
>
> I was referring to the PDF Widget (also based on Google's PDFium), which
> should have a comparable property for fetching the text of a page.
Re: How to extract whole text from a PDF file with the PDF widget?
Yes, I am familiar with the XPDF external (based on Google's PDFium library), having designed it and paid Monte to code it and then turned it over to LiveCode.

I was referring to the PDF Widget (also based on Google's PDFium), which should have a comparable property for fetching the text of a page. The LC dictionary does not list any property for returning the page text, so I assume that is a Dictionary/Documentation error and that Monte can tell us the correct property of the PDF widget that will return the text of a page.

On 12/10/2021 7:05 PM, matthias rebbe via use-livecode wrote:
> here on mac OS the dictionary of LC 10 DP1 definitely lists the function
> XPDFViewer_Text(viewerName, pageNumber).
Re: How to extract whole text from a PDF file with the PDF widget?
Paul,

here on macOS the dictionary of LC 10 DP1 definitely lists the function

    XPDFViewer_Text(viewerName, pageNumber)

Btw., checking this showed me that this function seems to be deprecated, and that the command

    XPDFViewer_Unicode viewerName, pageNumber, variableName

should be used instead.

> On 10.12.2021 at 23:22, Paul Dupuis via use-livecode wrote:
>
> There must be an undocumented property for the text of a page - there was a
> function to return the full text of a page in the External (XPDF) and to get
> the full text of the PDF file, you just stepped through the pages (1..N)
> getting and concatenating the page text.
Re: How to extract whole text from a PDF file with the PDF widget?
There must be an undocumented property for the text of a page - there was a function to return the full text of a page in the External (XPDF), and to get the full text of the PDF file you just stepped through the pages (1..N), getting and concatenating the page text.

Monte? LC 10.0.0 Dictionary does not list a property for the page text.

On 12/10/2021 4:46 PM, Torsten Holmer via use-livecode wrote:
> I have a business licence and want to use the PDF widget but I cannot find a
> way to do it.
Re: How to extract whole text from a PDF file with the PDF widget?
Hi Torsten,

I think the PDF widget does not support extracting text by code. At least the documentation does not show any information about this.

You wrote that you have a business license. That would mean that you can use the Pro features of LiveCode. There is an external included in the Pro Feature Pack called XPDF. That external supports extracting text. Have a look at the function XPDFViewer_Text.

Regards,
Matthias

> On 10.12.2021 at 22:46, Torsten Holmer via use-livecode wrote:
>
> I have a PDF file with text and pictures, but I just want the text.
How to extract whole text from a PDF file with the PDF widget?
Hi,

I have a PDF file with text and pictures, but I just want the text.

I can do it manually with Ctrl-A and Ctrl-C by viewing the file with Preview on macOS.

I have a business licence and want to use the PDF widget but I cannot find a way to do it.

Can someone help me out?

Cheers,
Torsten