Unfortunately, I think there is nothing we can do. I've done everything I can to maximize compatibility with various PDF rendering engines, but Preview uses particularly terrible text extraction heuristics. To be fair, the root problem is the design and complexity of the PDF specification itself.
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. Visit this group at http://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/262a0e22-eddf-4b10-bd17-7e7f5f17cac9%40googlegroups.com.

