Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory

aerox7 Fri, 20 Mar 2009 01:36:48 -0700

==> where are you seeing it as "SolÃ¨ne" as opposed to the
correct way of solène?

I have "SolÃ¨ne" in my MySQL database, so I don't know whether this is correct
or not. I guess that "SolÃ¨ne" is solène in UTF-8?
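
One way to check that guess: "SolÃ¨ne" is what appears when the UTF-8 bytes of
"Solène" are read as if they were ISO-8859-1. A minimal sketch in Python (used
here only for illustration, it is not part of Solr or DataImportHandler):

```python
# "è" (U+00E8) takes two bytes in UTF-8: 0xC3 0xA8.
original = "Solène"
utf8_bytes = original.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns every byte into its own character:
# 0xC3 becomes "Ã" and 0xA8 becomes "¨".
garbled = utf8_bytes.decode("iso-8859-1")

print(" ".join(f"{b:02x}" for b in utf8_bytes))  # 53 6f 6c c3 a8 6e 65
print(garbled)                                   # SolÃ¨ne
```

If the database shows the same two characters in place of "è", the value was
probably stored through a connection with the wrong charset rather than being
correct UTF-8.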

I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
When I try with solène, everything is OK! But when I try with SolÃ¨ne (which is
what I have in the DB), the analysis converts Ã to A and deletes ¨, so I get SolAne!

I think that ISOLatin1AccentFilterFactory only accepts strings in the
ISO-8859-1 charset.

So is there any way to transform my strings to ISO-8859-1 before the indexing
process? Maybe by creating a transformer in DataImportHandler? (I have never
coded in Java :( )
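
The transformation asked about here can be sketched as follows, assuming the
stored text really is UTF-8 that was decoded once as ISO-8859-1. The function
name `repair_mojibake` is hypothetical, and the sketch is in Python for
illustration only; inside DataImportHandler the same logic would have to be
ported to a custom transformer.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as ISO-8859-1'.

    Returns the input unchanged when it cannot be garbled in that way.
    """
    try:
        # Turn each character back into the single byte it came from,
        # then decode the byte sequence as the UTF-8 it originally was.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside ISO-8859-1, or bytes that are not valid
        # UTF-8, mean the text was not garbled this way: leave it alone.
        return text


print(repair_mojibake("SolÃ¨ne"))  # Solène
print(repair_mojibake("Solène"))   # Solène (already correct, unchanged)
```

Once the text is repaired, the accent filter sees a real "è" and can fold it
to "e". Repairing the rows at the source (or fixing the charset used when
writing to MySQL) is likely a cleaner fix than patching at index time.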

Thank you all.

Koji Sekiguchi-2 wrote:
> 
> aerox7 wrote:
>> Hi,
>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène). I
>> want to transform this to solene, so I use Solr's
>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work!
>>
>> I guess that "SolÃ¨ne" is "solène" in UTF-8? I also set Tomcat to UTF-8, so
>> normally ISOLatin1AccentFilterFactory should replace the accent.
>>
>> Any ideas?
>>
>> I use DataImportHandler.
>>
> 
> If the mapping rule "Ã¨" to "e" always holds in your field, you can try
> using MappingCharFilter instead of ISOLatin1AccentFilter. Add the following
> line to mapping-ISOLatin1Accent.txt:
> 
> "Ã¨" => "e"
> 
> and add the following fieldType:
> 
> <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> 
> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
> 
> Koji
> 
> 
> 
> 
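
The rule quoted above covers only "Ã¨". If other accented letters were garbled
the same way, similar lines could be generated rather than typed by hand. A
rough sketch in Python, assuming every value is UTF-8 misread as ISO-8859-1;
the helper names `strip_accents` and `mapping_rule` are made up for this
example, and the output simply copies the format of the rule in the quoted
message:

```python
import unicodedata


def strip_accents(ch: str) -> str:
    """Return the base letter, e.g. "è" -> "e"."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def mapping_rule(ch: str) -> str:
    """Build one mapping line for an accented character."""
    # The garbled form is the character's UTF-8 bytes read as ISO-8859-1.
    garbled = ch.encode("utf-8").decode("iso-8859-1")
    return f'"{garbled}" => "{strip_accents(ch)}"'


# A small sample only; extend the string with the letters your data contains.
for ch in "éèêëç":
    print(mapping_rule(ch))
```

For "è" this prints the same line as in the quoted reply. Letters whose garbled
form contains invisible or control characters (for example most uppercase
accented letters) are better written with escapes or repaired in the data
itself.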

-- 
View this message in context: 
http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22616220.html
Sent from the Solr - User mailing list archive at Nabble.com.
