Re: [poppler] [PATCH and RFC] Bugfixes, Improved Forms Support for Unicode

Carlos Garcia Campos Sun, 03 Feb 2008 07:39:03 -0800

On Sat, 02-02-2008 at 21:27 -0800, Michael Vrable wrote:
> The root cause of https://bugs.freedesktop.org/show_bug.cgi?id=12808 is 
> that the code for rendering form fields in poppler didn't properly deal 
> with input strings provided in UTF-16: the string was treated as an 
> 8-bit string, and the byte-order-mark at the front was included in the 
> length calculation.
> 
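To make the length problem concrete, here is a minimal Python sketch (not code from the patch; `form_text_length` is a hypothetical name). It assumes the PDF convention that a text string starting with the bytes FE FF is big-endian UTF-16 and that anything else is one byte per character:

```python
import codecs

def form_text_length(raw: bytes) -> int:
    """Count characters in a PDF text string rather than bytes."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Skip the 2-byte byte-order mark, then decode the rest as UTF-16BE.
        return len(raw[2:].decode("utf-16-be"))
    # No BOM: treat the string as an 8-bit encoding, one byte per character.
    return len(raw)

raw = codecs.BOM_UTF16_BE + "café".encode("utf-16-be")
print(len(raw))               # 10 (what an 8-bit reading would count)
print(form_text_length(raw))  # 4
```
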
> I started off trying to create a simple fix for this problem, but 
> eventually ended up significantly rewriting the code for displaying form 
> fields to fix other problems that I found, eventually working to add 
> near full support for Unicode inputs.
> 
> Since these changes are large, I don't expect this patch to go in right 
> away.  But please, provide feedback.  My work is based on git commit 
> 6f11ef660540.
> 
> There are two patches.  The first, character-encoding-fixes.patch, is a 
> couple of fairly trivial fixes that I came across while working on the 
> larger patch.  It can go in at any time if it looks good.
> 
> The second patch, unicode-forms-support.patch, is the main part of the 
> work and the patch I'd like comments on.  Most new functionality is in 
> the new Annot::layoutText function.  It performs a few steps:
>    - Converts input in PDFDocEncoding or UTF-16 to the font's encoding
>    - Computes the width of the text on the page
>    - Optionally breaks the text at the specified width, for multi-line
>      form fields
> All of this ended up in the same function since finding break-points for 
> lines is easiest to do on the input encoding, where spaces and newlines 
> are easier to recognize than in whatever encoding the font uses, but the 
> width of text is easiest to compute when re-encoding the text string.
> 
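The idea of finding break points on the input text while accumulating width in the same pass can be sketched roughly as follows. This is an illustration in Python, not the actual Annot::layoutText; `layout_line`, `wrap`, and the `char_width` callback (a stand-in for real font metrics) are all hypothetical:

```python
def layout_line(text, char_width, max_width=None):
    """Lay out one line; return (line, width, chars_consumed)."""
    width = 0.0
    last_space = None  # index of the most recent space seen
    for i, ch in enumerate(text):
        if ch == "\n":
            # Hard break: consume the newline but do not draw it.
            return text[:i], width, i + 1
        w = char_width(ch)
        # The i > 0 guard ensures every call consumes at least one character.
        if max_width is not None and width + w > max_width and i > 0:
            if last_space is not None:
                # Break at the last space; the space itself is consumed.
                line = text[:last_space]
                return line, sum(map(char_width, line)), last_space + 1
            return text[:i], width, i  # no space available: break mid-word
        if ch == " ":
            last_space = i
        width += w
    return text, width, len(text)

def wrap(text, char_width, max_width):
    """Call layout_line repeatedly until the whole text is consumed."""
    lines = []
    while text:
        line, _, used = layout_line(text, char_width, max_width)
        lines.append(line)
        text = text[used:]
    return lines

print(wrap("hello world", lambda ch: 1.0, 7))  # ['hello', 'world']
```

Because each call consumes at least one character, the `wrap` loop always terminates, even when a single character is wider than the field.
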
> The main missing element for full Unicode handling is the writing out of 
> text for CID-keyed fonts.  There is currently support for taking 
> Unicode characters as input and finding the appropriate character code 
> in the font to show it.  However, there isn't code for writing out the 
> correct sequence of bytes to show that character (doing so should be 
> trivial for an identity CMap, but isn't added quite yet).
> 
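For the identity CMap case, the byte sequence is simply each character code written as two big-endian bytes. A rough Python sketch, assuming an Identity-H style mapping where the 2-byte code equals the CID (`identity_show_string` is a hypothetical helper, not something in the patch):

```python
def identity_show_string(cids):
    """Build a PDF hex string operand with two big-endian bytes per CID."""
    for cid in cids:
        if not 0 <= cid <= 0xFFFF:
            raise ValueError(f"CID out of 2-byte range: {cid}")
    return "<" + "".join(f"{cid:04X}" for cid in cids) + ">"

print(identity_show_string([0x0041, 0x20AC]))  # <004120AC>
```
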
> Also missing: support for Unicode text outside the BMP, using surrogate 
> pairs.
> 
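Handling text outside the BMP means combining a high surrogate and the low surrogate that follows it into one code point. A minimal Python sketch of that standard UTF-16 rule (the function name is hypothetical, and unpaired surrogates are passed through unchanged as a simplification):

```python
def decode_utf16_units(units):
    """Combine 16-bit UTF-16 code units into Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Surrogate pair: 10 bits from each unit, offset by 0x10000.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out

print([hex(cp) for cp in decode_utf16_units([0x0041, 0xD83D, 0xDE00])])
# ['0x41', '0x1f600']
```
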
> I've done some limited testing with these patches (in evince), and it 
> definitely works better for me than before.  However, I don't currently 
> have PDFs for testing many features, so pointers to any good test forms 
> are appreciated!


Hi Michael, thank you very much for the patches. I have tested them with
several documents and they work pretty well. The only thing that is
still broken is multiline form fields. They were already broken (see
bug http://bugzilla.gnome.org/show_bug.cgi?id=499939), but in a different
way. Now it seems to enter an infinite loop after editing a
multiline form field.

You can use this file to reproduce the problem: 

http://www.okular.org/stuff/forms-scribus.pdf

> Features tested:
>    - Accented characters; typographic characters such as bullets, quotes
>    - Left, center, right alignment of single-line fields
>    - Checkboxes work as before
>    - Single-line comb fields still work
> Not tested:
>    - Multi-line fields (my test form doesn't have them)
>    - Form fields with composite fonts (no test forms; code still needs a
>      tiny bit of work)
> 
> --Michael Vrable
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
-- 
Carlos Garcia Campos
   http://carlosgc.linups.org
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462
