Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory

Óscar Marín Miró Fri, 20 Mar 2009 03:15:09 -0700

Hi,

My guess is that *although* your DB is in UTF-8, the database engine sends
you the rows in ISO-Latin1, so before doing *anything* else with the data you
receive, you should transcode it from ISO-Latin1 to UTF-8 and then send that
to Solr. I'm no Java expert, but in Perl (MySQL DB in UTF-8) I have to do
this with every row:


use Encode qw(decode encode);
$row = decode("iso-8859-1", $row);

... and before building the XML to add the document to Solr:

$row=encode("utf8",$row);
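
For anyone not using Perl, here is a rough Python sketch of the same two steps (decode as ISO-8859-1, then encode as UTF-8). It also shows what happens when UTF-8 bytes are read as ISO-8859-1, which is one way a string like the one in this thread could come about. The function names and sample strings are my own assumptions for illustration; nothing here is a Solr or MySQL API.

```python
# Illustrative sketch only: the function names and sample strings are
# assumptions, not taken from the thread or from any Solr/MySQL API.

def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Same two steps as the Perl above: decode ISO-8859-1, then encode UTF-8."""
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")

def misread_utf8_as_latin1(text: str) -> str:
    """Show what UTF-8 bytes look like when they are read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

def undo_misread(text: str) -> str:
    """Reverse the misreading; may raise UnicodeError if the text was not
    garbled this way in the first place."""
    return text.encode("iso-8859-1").decode("utf-8")

if __name__ == "__main__":
    print(latin1_bytes_to_utf8(b"sol\xe8ne"))
    garbled = misread_utf8_as_latin1("solène")
    print(garbled)
    print(undo_misread(garbled))
```

If the rows really do arrive as ISO-Latin1 bytes, only the first function is needed. If the text is already garbled inside the database, something like the last function would have to be applied to the stored values instead, but that is a guess about your data, not something I have checked.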

On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:

>
> I added:
> "Ã¨" => "e" to mapping-ISOLatin1Accent.txt
>
> and added the following fieldType:
>
> <fieldType name="textCharNorm" class="solr.TextField"
> positionIncrementGap="100" >
>  <analyzer>
>    <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>  </analyzer>
> </fieldType>
>
> But I still have the same problem! It only works when I store the ISO string
> in the UTF-8 database (e.g. store solène, not solÃ¨ne) :,(
>
>
>
>
> aerox7 wrote:
> >
> > ==> where are you seeing it as "SolÃ¨ne" as opposed to the
> > correct way of solène?
> >
> > I have "SolÃ¨ne" in my MySQL database! So I don't know whether this is
> > correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?
> >
> > I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
> > When I try it with solène everything is OK, but when I try it with SolÃ¨ne
> > (which is what I have in the DB) the analysis converts Ã to A and deletes ¨,
> > so I get SolAne!
> >
> > I think that ISOLatin1AccentFilterFactory only handles strings in the
> > ISO-8859-1 charset.
> >
> > So is there any way to transform my string to ISO-8859-1 before the indexing
> > process? Maybe by creating a transformer in DataImportHandler? (I've never
> > coded in Java :( )
> >
> > Thank you all.
> >
> >
> > Koji Sekiguchi-2 wrote:
> >>
> >> aerox7 wrote:
> >>> Hi,
> >>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène).
> >>> I want to transform this to solene, so I use Solr's
> >>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work.
> >>>
> >>> I guess that "SolÃ¨ne" is "solène" in UTF-8? I also set Tomcat to UTF-8,
> >>> so normally ISOLatin1AccentFilterFactory should replace the accent.
> >>>
> >>> Any ideas?
> >>>
> >>> I use DataImportHandler.
> >>>
> >>
> >> If a mapping rule "Ã¨" to "e" is always true in your field, you can try
> >> to use MappingCharFilter
> >> instead of ISOLatin1AccentFilter. Add the following line to
> >> mapping-ISOLatin1Accent.txt:
> >>
> >> "Ã¨" => "e"
> >>
> >> and add the following fieldType:
> >>
> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> positionIncrementGap="100" >
> >>   <analyzer>
> >>     <charFilter class="solr.MappingCharFilterFactory"
> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
> >>
> >> Koji
> >>
> >>
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

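On the mapping rule quoted above ("Ã¨" => "e"): here is a small Python sketch of what such a character mapping does to the text before it is tokenized. It is not Solr's implementation, only my understanding of the idea (replace the longest matching rule at each position), and the rule table and function name are made up for the example.

```python
# Illustrative sketch only: RULES and apply_mapping are hypothetical names,
# and this is not how Solr implements MappingCharFilter internally.

RULES = {
    "Ã¨": "e",  # the garbled two-character form seen in the database
    "è": "e",   # the correctly encoded accented letter
}

def apply_mapping(text: str, rules: dict) -> str:
    """Replace every rule key found in text, trying longer keys first."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matches at this position: keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(apply_mapping("SolÃ¨ne", RULES))
    print(apply_mapping("solène", RULES))
```

A rule like this only hides the symptom for that one character pair. Every other accented letter stored in garbled form would need its own rule, which is why I would rather fix the encoding before the data reaches Solr.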

-- 
“I may not believe in myself, but I believe in what I'm doing.”

-- Jimmy Page
