OK - probably I should have said "A", or "a" :) My point was just that there is not really anything special about "special" characters.

On 11/21/2013 10:50 AM, Jack Krupansky wrote:
"Would you store "a" as "A" ?"

No, not in any case.

-- Jack Krupansky

-----Original Message----- From: Michael Sokolov
Sent: Thursday, November 21, 2013 8:56 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index X™ as &#8482; (HTML decimal entity)

I have to agree w/Walter.  Use unicode as a storage format.  The entity
encodings are for transfer/interchange.  Encode/decode on the way in and
out if you have to.  Would you store "a" as "A" ?  It makes it
impossible to search for, for one thing.  What if someone wants to
search for the TM character?

-Mike
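
The "encode/decode on the way in and out" advice above can be sketched in a few lines. This is only an illustration in Python, not anything from Solr itself: the function names `decode_entities` and `encode_non_ascii` are made up for this example, and it assumes the decoding and encoding happen in the application around Solr.

```python
import html


def decode_entities(text: str) -> str:
    """On the way in: turn character references (&#8482;, &trade;) into Unicode."""
    return html.unescape(text)


def encode_non_ascii(text: str) -> str:
    """On the way out: replace each non-ASCII character with a decimal reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical round trip: keep Unicode in the index, emit entities only when rendering.
stored = decode_entities("X&#8482; - Black")
rendered = encode_non_ascii(stored)
```

With the value kept as Unicode, a query containing the literal TM character can match it; the entity form exists only at the rendering boundary. Note that `encode_non_ascii` leaves `&`, `<` and `>` alone, so real HTML output would still need ordinary escaping.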

On 11/20/13 12:07 PM, Jack Krupansky wrote:
AFAICT, it's not an "extremely bad idea" - using SGML/HTML as a format for storing text to be rendered. If you disagree - try explaining yourself.

But maybe TM should be encoded as "&trade;". Ditto for other named SGML entities.

-- Jack Krupansky

-----Original Message----- From: Walter Underwood
Sent: Wednesday, November 20, 2013 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index X™ as &#8482; (HTML decimal entity)

Again, I'd like to know why this is wanted. It sounds like an X-Y problem. Storing Unicode characters as XML/HTML encoded character references is an extremely bad idea.

wunder

On Nov 20, 2013, at 5:01 AM, "Jack Krupansky" <j...@basetechnology.com> wrote:

Any analysis filtering affects the indexed value only, but the stored value would be unchanged from the original input value. An update processor lets you modify the original input value that will be stored.
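
A toy model of that distinction, in Python rather than Solr code. Everything here is an assumption made for illustration: `index_document`, `update_processor` and `char_filter` are hypothetical names, and real analysis would also tokenize the indexed value, which this sketch skips.

```python
def to_decimal_entities(text: str) -> str:
    """Replace each non-ASCII character with its decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def index_document(raw, update_processor=None, char_filter=None):
    """Model one field passing through the update and analysis stages."""
    # An update processor runs first and rewrites the input value itself.
    value = update_processor(raw) if update_processor else raw
    # The stored value is whatever reaches the field after update processing.
    stored = value
    # Analysis (char filters etc.) only shapes what is indexed for search.
    indexed = char_filter(value) if char_filter else value
    return {"stored": stored, "indexed": indexed}


raw = "X\u2122 - Black"
with_filter = index_document(raw, char_filter=to_decimal_entities)
with_processor = index_document(raw, update_processor=to_decimal_entities)
```

In this model `with_filter["stored"]` still holds the original TM character, while `with_processor["stored"]` holds the entity form, which is the behaviour the original question asked for.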

-- Jack Krupansky

-----Original Message----- From: Uwe Reh
Sent: Wednesday, November 20, 2013 5:43 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index X™ as &#8482; (HTML decimal entity)

What about having a simple charfilter in the analyzer queue for
indexing *and* searching, e.g.
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="™"
replacement="&amp;#8482;" />
or
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-specials.txt" />

Uwe

Am 19.11.2013 23:46, schrieb Developer:
I have data coming in to SOLR as below.

<field name="displayName">X™ - Black</field>

I need to store the HTML Entity (decimal) equivalent value (i.e. &#8482;)
in SOLR rather than storing the original value.

Is there a way to do this?


--
Walter Underwood
wun...@wunderwood.org




