Ah! I apologize. In my old code I had been calling processStream on a single 
PDPage, not processPage. Sorry, that was my mix-up.

I think I am good now using the setPage(PDPage) override for what I was looking 
to do.

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
[email protected]

> On Dec 4, 2015, at 3:53 PM, Tilman Hausherr <[email protected]> wrote:
> 
> Am 04.12.2015 um 21:39 schrieb britt fitch:
>> Thanks very much for the quick replies!
>> 
>> I think setting startPage & endPage will make it so you correctly only 
>> extract the pages you want, but on every extraction it will iterate over all 
>> pages first.
>> 
>> For example, if you have a 100 page document and want to extract page 2 & 
>> page 90, you will iterate over all 100 pages and process page 2, then 
>> iterate over all 100 pages and process page 90.
>> 
>> The 1.8 version allowed you to pass a single page to be processed. I’m 
>> curious if that functionality was removed because of an issue or if it was 
>> just a bug.
> 
> Really? I looked at processPage(), and it does use currentPageNo and I don't 
> see a way to set that one from outside.
> 
> On a second look, I think I understand what you mean: processPages() uses a 
> list of pages, so you would set your own list. But this would mean trouble if 
> you had set other variables.
> I assume this was changed in 2.0 as part of the page tree refactoring.
> 
> Btw this looping does indeed look weird, but I doubt you'll lose any time. The 
> text extraction by itself does much more, it needs to loop through every 
> glyph in the page you're extracting.
> 
> Tilman
> 
>> 
>> It looks like I can get around this a bit by overriding startPage(PDPage) 
>> and endPage(PDPage) though.
>> 
>> Thanks again, I really appreciate all your feedback.
>> 
>> Cheers,
>> 
>> Britt
>> 
>> 
>> 
>> Britt Fitch
>> Wired Informatics
>> 265 Franklin St Ste 1702
>> Boston, MA 02110
>> http://wiredinformatics.com
>> [email protected]
>> 
>>> On Dec 4, 2015, at 3:07 PM, Tilman Hausherr <[email protected]> wrote:
>>> 
>>> Am 04.12.2015 um 20:56 schrieb britt fitch:
>>>> Awesome, thanks. That takes care of #1 & 2.
>>>> 
>>>> For #3, is the check on currentPageNo necessary?
>>>> Right now processPage must be called from processPages or nothing happens.
>>>> This has a negative effect for cases like mine where I want to override 
>>>> processTextPosition and handle different pages or even if you only want to 
>>>> extract data from particular pages.
>>> 
>>> You can set the start and endpage through the setters setStartPage() and 
>>> setEndPage(). That's the official way to do it.
>>> 
>>> Tilman
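
To make the behaviour discussed in this thread concrete, here is a minimal
stand-alone sketch in plain Java. It is a toy model, not PDFBox code:
ToyStripper, extractSinglePage and the two counters are hypothetical names
invented for illustration, and pages are modelled as plain strings. It only
assumes what the messages above describe, namely that every page is visited
and that a page is processed only when its number lies between the start
and end page.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRangeSketch {

    /** Toy stand-in for a text stripper. NOT the PDFBox implementation. */
    static class ToyStripper {
        private int startPage = 1;                 // 1-based, inclusive
        private int endPage = Integer.MAX_VALUE;   // 1-based, inclusive
        int pagesVisited = 0;    // how often the loop touched a page
        int pagesProcessed = 0;  // how often a page was actually extracted

        void setStartPage(int startPage) { this.startPage = startPage; }
        void setEndPage(int endPage) { this.endPage = endPage; }

        /** Visits every page, but only extracts those inside the range. */
        String getText(List<String> pages) {
            StringBuilder out = new StringBuilder();
            int currentPageNo = 0;
            for (String page : pages) {
                currentPageNo++;
                pagesVisited++;
                if (currentPageNo >= startPage && currentPageNo <= endPage) {
                    pagesProcessed++;   // the expensive part in a real stripper
                    out.append(page);
                }
            }
            return out.toString();
        }
    }

    /** Extracts one page by narrowing the range to [pageNo, pageNo]. */
    static String extractSinglePage(ToyStripper stripper, List<String> pages, int pageNo) {
        stripper.setStartPage(pageNo);
        stripper.setEndPage(pageNo);
        return stripper.getText(pages);
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            pages.add("text of page " + i + "\n");
        }
        ToyStripper stripper = new ToyStripper();
        System.out.print(extractSinglePage(stripper, pages, 2));
        System.out.print(extractSinglePage(stripper, pages, 90));
        // Two extractions over 100 pages: 200 cheap visits, 2 real extractions.
        System.out.println("visited=" + stripper.pagesVisited
                + " processed=" + stripper.pagesProcessed);
    }
}
```

In this model the 100-page example from the thread costs 200 loop
iterations but only 2 page extractions, which is the point made above: the
skipped iterations are cheap next to the per-glyph work on the pages that
are actually extracted. With the real library, the equivalent range would be
set through the setStartPage() and setEndPage() setters mentioned in the
thread.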
