Re: [poppler] Testing Re: Multicolumn select

Albert Astals Cid Wed, 09 Dec 2009 14:26:16 -0800

On Wednesday 09 December 2009 14:51:59, Baz wrote:
> 2009/12/8 Albert Astals Cid <[email protected]>:
> > What we want is something that makes text extraction/selection better,
> > the definition of better is the problem here :D
> 
> Ok. So it sounds like it would be worth adding tests in, so we can be
> explicit about what we want text extraction to do.
> 
> I could do this in two ways:
> - write a test harness that calls the APIs directly (following the
> example of cairo). This has the advantage that more APIs could be
> tested later, but complicates writing the tests; and in any case most
> other tests will be about rendering, not text extraction. Since this
> would be a unit test, it's also fragile to API changes.
> - extend pdftotext to allow me to specify start and end points for
> text extraction (page,x,y). This would make writing tests easy - just
> simple shell scripts along the lines of the git test suite. This
> feature could be useful to end users too, I guess.
> 
> I like the second plan better, since it supports building ad-hoc tests
> with PDFs attached to bugs. Since we already have -f and -l (and -x,
> -y do something unrelated to the selection), I'm thinking of int args
> -fx, -fy, -lx, -ly, which default to (0,0) and (pageWidth, pageHeight).


Why aren't -x, -y, -W and -H enough? AFAIR they define which area gets extracted.
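
For illustration only, an area-based test could be sketched roughly like
this. It assumes pdftotext is on the PATH; the helper names and the idea of
comparing against an expected string are hypothetical, not something that
exists in poppler today.

```python
import subprocess


def pdftotext_area_cmd(pdf_path, page, x, y, width, height):
    """Build a pdftotext command line for one rectangular area of one page.

    Hypothetical helper: only the pdftotext flags themselves are real.
    """
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),      # first and last page: one page only
        "-x", str(x), "-y", str(y),            # top-left corner of the crop area
        "-W", str(width), "-H", str(height),   # width and height of the crop area
        pdf_path,
        "-",                                   # write the extracted text to stdout
    ]


def extract_area(pdf_path, page, x, y, width, height):
    """Run pdftotext on the given area and return the extracted text."""
    cmd = pdftotext_area_cmd(pdf_path, page, x, y, width, height)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def area_matches(pdf_path, page, area, expected):
    """Return True if the text extracted from `area` equals `expected`.

    `area` is an (x, y, width, height) tuple; surrounding whitespace is
    ignored so the check does not depend on trailing newlines.
    """
    x, y, width, height = area
    return extract_area(pdf_path, page, x, y, width, height).strip() == expected.strip()
```

A test for a PDF attached to a bug would then be a single area_matches()
call per region of interest. Note that this selects a rectangle, which is
not the same thing as a start point and an end point in reading order.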

Albert

> 
> Does this sound useful to you?
> 
> -Baz
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
> 
