Anders,
Thank you for the explanation.
> which could be written in HTML like this:
>
> use <tt>&lt;p&gt;</tt> to mark a paragraph
Ok.
> so the mapping char filter would map it into:
>
> use <tt><p></tt> to mark a paragraph
This is correct when you have the mapping definition:
"&lt;" => "<"
"&gt;" => ">"
: :
But I thought you would not have those entries, only:
"&uuml;" => "ü"
"&auml;" => "ä"
: :
Didn't it solve your problem?
Thank you,
Koji
Anders Melchiorsen wrote:
Koji Sekiguchi <k...@r.email.ne.jp> writes:
Thank you for attaching the patch. Sorry again, I don't have enough
time to investigate the patch and the problem you have, but I'd
recommend that you open a JIRA issue and attach the
patch so that I or someone else can look into it later.
Sorry, learning an issue tracker every time I find a bug in some
project is too much trouble. I wouldn't mind if someone else transfers
my previous mail, though.
And I didn't understand this part of your previous mail:
Adding MappingCharFilterFactory in front of the HTML stripper (so
that the latter will not see the entity) does work as expected.
That is, until I try strings like "use &lt;p&gt; to mark a
paragraph", where the HTML stripper will then remove parts of the
actual text. So this approach will not work.
Entity mapping and tag removal have to happen in one pass to keep
fidelity.
Let's say that we are analyzing a tutorial on writing HTML. It might
contain the text:
use <p> to mark a paragraph
which could be written in HTML like this:
use <tt>&lt;p&gt;</tt> to mark a paragraph
so the mapping char filter would map it into:
use <tt><p></tt> to mark a paragraph
which is already wrong. Next, the HTML stripper would remove the tags:
use to mark a paragraph
and we have now lost a part of the original text.
Cheers,
Anders.
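
For anyone following the thread, the two-pass problem described above can be reproduced with a small simulation. This is only a sketch in Python, not Solr's code: map_chars and strip_html are hypothetical stand-ins for the mapping char filter and the HTML stripper, and the toy stripper is assumed to remove tags and then decode any remaining entities. It compares a mapping that includes the markup entities with one that only has the umlaut entries.

```python
import html
import re

# Hypothetical mapping tables, written like the "source" => "target" lines in the thread.
FULL_MAPPING = {"&lt;": "<", "&gt;": ">", "&uuml;": "ü", "&auml;": "ä"}
UMLAUT_ONLY_MAPPING = {"&uuml;": "ü", "&auml;": "ä"}

# Toy tag pattern; a real HTML stripper is considerably more careful.
TAG = re.compile(r"<[^>]+>")


def map_chars(text, mapping):
    """Toy mapping char filter: replace each source string with its target."""
    for source, target in mapping.items():
        text = text.replace(source, target)
    return text


def strip_html(text):
    """Toy HTML stripper: drop tags, then decode the entities that are left."""
    return html.unescape(TAG.sub("", text))


source = "use <tt>&lt;p&gt;</tt> to mark a paragraph"

# Mapping first with the markup entities included: "&lt;p&gt;" becomes a real
# tag before the stripper runs, so the stripper deletes it with the <tt> tags.
lossy = strip_html(map_chars(source, FULL_MAPPING))

# Mapping first with only the umlaut entries: "&lt;p&gt;" reaches the stripper
# untouched and is decoded after the real tags have been removed.
kept = strip_html(map_chars(source, UMLAUT_ONLY_MAPPING))

print(repr(lossy))  # 'use  to mark a paragraph' (the <p> is gone)
print(repr(kept))   # 'use <p> to mark a paragraph'
```

Whether the second variant is enough depends on why the mapping filter was put in front of the stripper in the first place, which the simulation does not model.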