Thank you for raising this issue. Please re-request a Jira account, and
we'll approve it. Sorry about that.

On Wed, Sep 25, 2024 at 11:06 AM Ruairidh Williamson <
ruairidh.william...@nextdlp.com> wrote:

> Hello,
>
> We are using Tika to extract text from XPS files and have hit an issue
> where whitespace is not emitted where we would expect it. In the attached
> example file, there is a large visual gap between "x" and "abcde1234f",
> but when Tika extracts the text it calls `characters` with "x" and then
> `characters` with "abcde1234f". We would expect an `ignorableWhitespace`
> call between those two calls, but we don't get one.
>
> I've taken a look through the XPS source code and think I've identified
> the issue and how to fix it. I would like to submit a pull request on
> GitHub. The contribution requirements say I must have a Tika issue open
> first. My request for an ASF account was denied, so if anyone is able to
> create an issue for me, I will create my pull request against that.
>
> Any help or feedback would be appreciated.
>
> Kind regards,
> Ruairidh
>
>
> Next DLP, Huckletree West, Mediaworks, 191 Wood Ln, London W12 7FP.
> Company number 13785405.
