Yes, you need to parse the entities yourself. I implemented an HTML
entity parser as part of the http://objectledge.org project. You may use
it if it fits your needs. It is in the ledge-components project
module. See http://objectledge.org/modules/ledge-components/index.html
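
If you would rather not pull in a dependency, here is a minimal sketch of decoding numeric character references before handing the text to an analyzer. It is not the ObjectLedge parser; the class name NumericEntityDecoder and its decode method are made up for illustration, and it only handles numeric references (decimal and hexadecimal), not named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or ObjectLedge.
final class NumericEntityDecoder {
    // Matches decimal (&#12501;) and hexadecimal (&#x30D5;) numeric references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    private NumericEntityDecoder() {
    }

    // Returns the text with every valid numeric reference replaced by its character.
    static String decode(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Leave references outside the Unicode range untouched.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling NumericEntityDecoder.decode on the string from the question, &#9679;&#20837;&#31038;, returns the three characters U+25CF, U+5165 and U+793E, which can then be passed to the analyzer as ordinary text.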
Have fun,
--
Damian
-----Original Message-----
From: Damian Gajda [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 12, 2004 10:23 AM
To: Lucene Users List
Subject: Re: indexing numeric entities?
Daan Hoogland wrote:
Hello,
Does anyone index numeric entities for Japanese characters? I
have (non-X)HTML containing those entities and need to index and search
them.
Can the CJKAnalyzer index a string like &#9679;&#20837;&#31038;? It
seems to be
maybe inline?
<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<head>
<title>japan</title>
</head>
<body bgcolor=#FF alink=black>
<p>
&#12501;&#12451;&#12540;&#12523;&#12489;&#12469;&#12540;&#12499;&#12473;&#12456;&#12531;&#12472;&#12491;&#12450;
</p>
</html>
Indexing the above document using the