Hi,

My guess is that *although* your DB is in UTF-8, the database engine sends you the rows in ISO-Latin1. So before doing *anything* else with the data you receive, you should transcode it from ISO-Latin1 to UTF-8 and only then send it to Solr. I'm no Java expert, but in Perl (with a MySQL DB in UTF-8) I have to do this with every row:

    $row=decode("iso-8859-1",$row);

... and before building the XML to invoke Solr and add the document:

    $row=encode("utf8",$row);

On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:

>
> I add :
> "è" => "e" to mapping-ISOLatin1Accent.txt
>
> and add the following fieldType:
>
> <fieldType name="textCharNorm" class="solr.TextField"
> positionIncrementGap="100" >
> <analyzer>
> <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
> <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> </analyzer>
> </fieldType>
>
> By still have the same probleme ! it's only work when i store ISO string
> into UTF-8 data base (ex: store solène not solène)............ :,(
>
>
> aerox7 wrote:
> >
> > ==> where are you seeing it as ""Solène" as opposed to the
> > correct way of solène?
> >
> > I have "Solène" in my Mysql DATA BASE ! so i don't know if this is
> > correct or not ? i gess that "Solène" is solène in UTF-8 ?!
> >
> > I'vz tryed analysis in http://localhost:8983/solr/admin/analysis.jsp, so
> > when i try with solène everything is ok ! but when i try with Solène
> > (like what i have in DB) analysis convert à in A delete ¨ so i get SolAne
> > !!!
> >
> > I think that ISOLatin1AccentFilterFactory take only string with Charset
> > ISO-8859-1 .
> >
> > So any solution to transform my string to ISO-8859-1 before indexing
> > process. May be by creating transformer in DataImportHandler ? (Never code
> > in java :( )
> >
> > Thank you all.
> >
> >
> > Koji Sekiguchi-2 wrote:
> >>
> >> aerox7 wrote:
> >>> Hi,
> >>> I have a mysql data base in UTF-8. I have a row with "Solène" (solène). I
> >>> want to transforme this to solene, so i use Solr
> >>> ISOLatin1AccentFilterFactory to perform this task but it dosn't work ?!!
> >>>
> >>> i gess that "Solène" is "solène" in UTF-8 ?! i also set tomcat to utf-8 so
> >>> normaly ISOLatin1AccentFilterFactory have to replace the accent .......
> >>>
> >>> any ideas ?
> >>>
> >>> i use DataImportHandler.
> >>>
> >>
> >> If a mapping rule "è" to "e" is always true in your field, you can try
> >> to use MappingCharFilter
> >> instead of ISOLatin1AccentFilter. Add the following line to
> >> mapping-ISOLatin1Accent.txt:
> >>
> >> "è" => "e"
> >>
> >> and add the following fieldType:
> >>
> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> positionIncrementGap="100" >
> >> <analyzer>
> >> <charFilter class="solr.MappingCharFilterFactory"
> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >> <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >> </analyzer>
> >> </fieldType>
> >>
> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in nightly build.
> >>
> >> Koji
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

--
“I may not believe in myself, but I believe in what I'm doing.”
-- Jimmy Page
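
P.S. For anyone who would rather see the transcoding step from the top of this message outside Perl, here is a minimal Python sketch of the same two moves. It assumes the driver really does hand back ISO-8859-1 bytes, which is only my guess. The function names `latin1_bytes_to_utf8` and `show_mojibake` are made up for illustration; the second one just reproduces how UTF-8 text looks when it gets misread as ISO-8859-1, which seems to match the "à becomes A, ¨ is deleted" result from analysis.jsp.

```python
def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Transcode bytes assumed to be ISO-8859-1 into UTF-8 bytes."""
    text = raw.decode("iso-8859-1")  # bytes -> Unicode text
    return text.encode("utf-8")      # Unicode text -> UTF-8 bytes


def show_mojibake(text: str) -> str:
    """Show how UTF-8 encoded text looks when misread as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    # Hypothetical row as it might arrive from the driver in ISO-8859-1.
    row = "sol\u00e8ne".encode("iso-8859-1")
    print(latin1_bytes_to_utf8(row))
    # The garbled form: the two UTF-8 bytes of "e with grave" shown as two characters.
    print(show_mojibake("Sol\u00e8ne"))
```

If the second print matches what is sitting in the database, the data was probably stored through a connection with the wrong character set, and a char filter in Solr would only be hiding the symptom.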