Thank you for raising this issue. Please re-request a Jira account, and we'll accept it. Sorry about that.

On Wed, Sep 25, 2024 at 11:06 AM Ruairidh Williamson <ruairidh.william...@nextdlp.com> wrote:

> Hello,
>
> We are using tika to extract text from XPS files and have hit an issue
> where whitespace is not emitted where we would expect. See the attached
> example file where opening the file it visually has a large gap between "x"
> and "abcde1234f" but when extracted by tika it calls `characters` with "x"
> and then `characters` on "abcde1234f". We would expect a
> `ignorableWhitespace` in between those calls but we don't get one.
>
> I've taken a look through the XPS source code and think I've identified
> the issue and how to fix it. I would like to submit a pull request on
> github. The contribution requirements say I must have a tika issue open
> first. My request to make an ASF account was denied so if anyone is able to
> create an issue for me I will create my pull request against that.
>
> Any help or feedback would be appreciated.
>
> Kind regards,
> Ruairidh
>
>
> Next DLP, Huckletree West, Mediaworks, 191 Wood Ln, London W12 7FP.
> Company number 13785405.