2009/12/8 Albert Astals Cid <[email protected]>:
> What we want is something that makes text extraction/selection better, the
> definition of better is the problem here :D

Ok. So it sounds like it would be worth adding tests, so we can be explicit about what we want text extraction to do. I could do this in two ways:

- write a test harness that calls the APIs directly (following the example of cairo). This has the advantage that more APIs could be tested later, but it complicates writing the tests; and in any case most other tests will be about rendering, not text extraction. Since this would be a unit test, it's also fragile to API changes.

- extend pdftotext to allow me to specify start and end points for text extraction (page, x, y). This would make writing tests easy: just simple shell scripts along the lines of the git test suite. This feature could be useful to end users too, I guess.

I like the second plan better, since it supports building ad-hoc tests with PDFs attached to bugs. Since we already have -f and -l (and -x, -y do something unrelated to the selection), I'm thinking of int args -fx, -fy, -lx, -ly, which default to (0,0) and (pageWidth, pageHeight) respectively.

Does this sound useful to you?

-Baz
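
P.S. To make the second plan a bit more concrete, here is a rough sketch of how a test could drive pdftotext. It is only an illustration: -fx, -fy, -lx and -ly are the flags proposed above and do not exist yet, and the helper names (build_pdftotext_args, check_extraction) and the file name are made up for the example. Only -f, -l and "-" (write to stdout) are existing pdftotext options. Leaving out the start or end point is meant to fall back to the proposed defaults.

```python
import subprocess


def build_pdftotext_args(pdf, first_page, last_page, start=None, end=None):
    """Build the argv list for one extraction test.

    -f / -l are existing pdftotext options (first and last page).
    -fx, -fy, -lx, -ly are the PROPOSED selection flags; they are
    hypothetical and not part of any released pdftotext.
    """
    args = ["pdftotext", "-f", str(first_page), "-l", str(last_page)]
    if start is not None:
        # Proposed: selection start point on the first page.
        args += ["-fx", str(start[0]), "-fy", str(start[1])]
    if end is not None:
        # Proposed: selection end point on the last page.
        args += ["-lx", str(end[0]), "-ly", str(end[1])]
    # "-" as the output file makes pdftotext write to stdout.
    args += [pdf, "-"]
    return args


def check_extraction(pdf, expected, **selection):
    """Run pdftotext and compare its output with the expected text.

    Only works once a pdftotext with the proposed flags is on PATH.
    """
    result = subprocess.run(
        build_pdftotext_args(pdf, **selection),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout == expected
```

A test for a PDF attached to a bug would then be a single call such as check_extraction("bug.pdf", "expected text\n", first_page=1, last_page=1, start=(72, 100), end=(300, 120)), where the file name, coordinates and expected text are placeholders.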
