Hi Aleks. - I am still stuck on the tests for pdf_text_filter, but the most important ones are already implemented (not yet committed).
What is the problem with these tests? Can we help in any way? - While checking the After_Soft_Dotted context condition, I found that Unicode Word Boundary Rule #4 is not implemented. It involves grapheme cluster boundary checking, which is not trivial, so I added a new task in flyspray (http://www.gnupdf.org/flyspray/index.php?do=details&task_id=31&project=2). I am working on it now. Ok.
