The 1.0 API change has moved further away from a user-based API toward a functional API, which is a very bad thing to do. That is why there are a lot of complaints about the API now being "broken". From a use-case point of view, the API has suffered a very serious regression.
Your argument about how the API ought to work is well-reasoned, and I don't take issue with it. However, you're wrong to say that there has been a regression in pdfbox. The pdfbox API never promised that processTextPosition() would be called once per word. It sounds like you and others observed empirically, on particular documents, that the callback was called once per word (or once per table cell in someone else's case), and incorrectly inferred that this was guaranteed. In fact, even with older versions of pdfbox there are documents for which it is called with one character at a time. It depends on the software that created the PDF.
In other words, software that expected processTextPosition() to be called once per word was always broken. Pdfbox 1.0 just makes the breakage apparent on a wider range of documents.
You can certainly request an improvement to make it work the way you previously thought it worked. But the correct implementation of that feature would be to calculate the average inter-character spacing, and to infer a word break whenever a gap significantly larger than that average is observed. That's not what pdfbox 0.8 did.
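To make the idea concrete, here is a rough sketch of that heuristic in plain Java. It is not pdfbox code and does not use any pdfbox classes: the class name WordBreakSketch, the method splitWords, and the factor parameter are all made up for illustration, and I'm assuming you have already collected each character's x position and width from the per-character callbacks on a single line of text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of pdfbox: groups the characters of one
// text line into words using the mean inter-character gap.
class WordBreakSketch {

    // text:   the characters of the line, in left-to-right order
    // x:      left edge of each character (assumed input)
    // width:  advance width of each character (assumed input)
    // factor: how much larger than the mean a gap must be to count as
    //         a word break (an assumed tuning knob, e.g. 1.5)
    static List<String> splitWords(String text, double[] x, double[] width, double factor) {
        List<String> words = new ArrayList<>();
        int n = text.length();
        if (n == 0) {
            return words;
        }

        // Gap between the right edge of character i and the left edge of i + 1.
        double[] gap = new double[n - 1];
        double sum = 0.0;
        for (int i = 0; i < n - 1; i++) {
            gap[i] = x[i + 1] - (x[i] + width[i]);
            sum += gap[i];
        }
        double mean = (n > 1) ? sum / (n - 1) : 0.0;

        // Clamp at zero so that mostly negative gaps (tight kerning)
        // do not turn every character into its own word.
        double threshold = factor * Math.max(mean, 0.0);

        StringBuilder current = new StringBuilder();
        current.append(text.charAt(0));
        for (int i = 1; i < n; i++) {
            if (gap[i - 1] > threshold) {
                // Gap is significantly larger than average: start a new word.
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(text.charAt(i));
        }
        words.add(current.toString());
        return words;
    }
}
```

For example, calling splitWords("hiyo", new double[]{0, 5, 13, 18}, new double[]{5, 5, 5, 5}, 1.5) gives gaps of 0, 3 and 0, a mean of 1, and so a single break before the "y". A real implementation would also need to handle line changes, rotated text and font size changes, which this sketch ignores.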
-Aaron

