Ahhhh, got you :) Sorry. Correct, I use a Perl client, but sorry to say, I don't use DataImportHandler. I just query the DB, filter the results, and build the Solr XML 'by hand' in the Perl script :(
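
For a DataImportHandler transformer, the same decode/encode steps quoted below could be written in Java roughly like this. It is only a minimal sketch: the class and method names (EncodingRepair, decodeLatin1, encodeUtf8, repairMojibake) are made up for illustration and are not part of Solr or DataImportHandler, and it assumes the garbling really is UTF-8 bytes that were read as ISO-8859-1 (which is what "Solène" looks like).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Solr or DataImportHandler class.
final class EncodingRepair {

    // Raw ISO-8859-1 bytes from the database -> Java string.
    // Mirrors the Perl step decode("iso-8859-1", $row).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Java string -> UTF-8 bytes, ready to be written into the XML sent to Solr.
    // Mirrors the Perl step encode("utf8", $row).
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Assumption: the text was stored as UTF-8 but read back as ISO-8859-1,
    // so every accented character became two characters. Turning the
    // characters back into their ISO-8859-1 bytes and decoding those bytes
    // as UTF-8 undoes that.
    static String repairMojibake(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Sol\u00C3\u00A8ne" is the garbled form seen in the database,
        // "Sol\u00E8ne" is the intended name with e-grave.
        String fixed = repairMojibake("Sol\u00C3\u00A8ne");
        System.out.println(fixed.equals("Sol\u00E8ne")); // prints true
    }
}
```

Inside a custom transformer you would call something like repairMojibake on each string column before the row goes to Solr. If a value is already correct UTF-8 text, running it through repairMojibake will damage it, so check which case applies to your data first.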

On Fri, Mar 20, 2009 at 1:04 PM, aerox7 <amyne.berr...@me.com> wrote:
>
> Yes! I completely understand the problem. I'm just asking about your
> solution to resolve it.
>
> I guess that you use the Solr Perl client to index your database. In my
> case I use DataImportHandler, so the only solution that I have is to
> create a transformer for DataImportHandler and try to convert my rows
> from Latin-1 to UTF-8 (see
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> ).
>
> So I just want to know if you use DataImportHandler too, with a Perl
> script as a transformer?
>
>
> Óscar Marín Miró wrote:
> >
> > What I mean is that unless "solène" travels to Solr in strict UTF-8,
> > mapping-ISOLatin1Accent won't do anything, and possibly your DB query
> > returns data in ISO-Latin1 (I always have this issue with UTF8-MySQL),
> > so unless you transcode your data from Latin1 to UTF8 before sending
> > it to Solr, mapping-ISOLatin1Accent won't know how to interpret it.
> >
> > Does it make any sense? :P
> >
> > On Fri, Mar 20, 2009 at 11:53 AM, aerox7 <amyne.berr...@me.com> wrote:
> >
> >> I'm using DataImportHandler to send my data to Solr! So you mean it
> >> is possible to apply a transformer in db-config.xml with a Perl
> >> script?
> >>
> >>
> >> Óscar Marín Miró wrote:
> >> >
> >> > Hi,
> >> >
> >> > My guess is that *although* your DB is in UTF-8, the database
> >> > engine sends you the rows in ISO-Latin1, so before doing
> >> > *anything* after receiving the data, you should transcode from
> >> > ISO-Latin1 to UTF-8 and then send that to Solr. I'm no Java
> >> > expert, but in Perl (MySQL DB in UTF-8) I have to do this with
> >> > every row:
> >> >
> >> > $row=decode("iso-8859-1",$row);
> >> >
> >> > ... and before building the XML to invoke Solr and add the
> >> > document:
> >> >
> >> > $row=encode("utf8",$row);
> >> >
> >> > On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:
> >> >
> >> >> I added:
> >> >> "è" => "e" to mapping-ISOLatin1Accent.txt
> >> >>
> >> >> and added the following fieldType:
> >> >>
> >> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> >>            positionIncrementGap="100" >
> >> >>   <analyzer>
> >> >>     <charFilter class="solr.MappingCharFilterFactory"
> >> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >> >>   </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> But I still have the same problem! It only works when I store the
> >> >> ISO string in the UTF-8 database (e.g. store solène, not
> >> >> Solène)............ :,(
> >> >>
> >> >>
> >> >> aerox7 wrote:
> >> >> >
> >> >> > ==> where are you seeing it as "Solène" as opposed to the
> >> >> > correct way of solène?
> >> >> >
> >> >> > I have "Solène" in my MySQL database! So I don't know if this
> >> >> > is correct or not? I guess that "Solène" is solène in UTF-8?!
> >> >> >
> >> >> > I've tried the analysis page at
> >> >> > http://localhost:8983/solr/admin/analysis.jsp. When I try with
> >> >> > solène everything is OK! But when I try with Solène (like what
> >> >> > I have in the DB), the analysis converts Ã to A and deletes ¨,
> >> >> > so I get SolAne!!!
> >> >> >
> >> >> > I think that ISOLatin1AccentFilterFactory only takes strings
> >> >> > in charset ISO-8859-1.
> >> >> >
> >> >> > So is there any solution to transform my string to ISO-8859-1
> >> >> > before the indexing process? Maybe by creating a transformer in
> >> >> > DataImportHandler? (Never coded in Java :( )
> >> >> >
> >> >> > Thank you all.
> >> >> >
> >> >> >
> >> >> > Koji Sekiguchi-2 wrote:
> >> >> >>
> >> >> >> aerox7 wrote:
> >> >> >>> Hi,
> >> >> >>> I have a MySQL database in UTF-8. I have a row with
> >> >> >>> "Solène" (solène). I want to transform this to solene, so I
> >> >> >>> use Solr's ISOLatin1AccentFilterFactory to perform this
> >> >> >>> task, but it doesn't work?!!
> >> >> >>>
> >> >> >>> I guess that "Solène" is "solène" in UTF-8?! I also set
> >> >> >>> Tomcat to UTF-8, so normally ISOLatin1AccentFilterFactory
> >> >> >>> should replace the accent.......
> >> >> >>>
> >> >> >>> Any ideas?
> >> >> >>>
> >> >> >>> I use DataImportHandler.
> >> >> >>>
> >> >> >>
> >> >> >> If a mapping rule "è" to "e" is always true in your field, you
> >> >> >> can try to use MappingCharFilter instead of
> >> >> >> ISOLatin1AccentFilter. Add the following line to
> >> >> >> mapping-ISOLatin1Accent.txt:
> >> >> >>
> >> >> >> "è" => "e"
> >> >> >>
> >> >> >> and add the following fieldType:
> >> >> >>
> >> >> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> >> >>            positionIncrementGap="100" >
> >> >> >>   <analyzer>
> >> >> >>     <charFilter class="solr.MappingCharFilterFactory"
> >> >> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >> >> >>   </analyzer>
> >> >> >> </fieldType>
> >> >> >>
> >> >> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the
> >> >> >> nightly build.
> >> >> >>
> >> >> >> Koji
> >> >> >>
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.

--
“I may not believe in myself, but I believe in what I'm doing.”

-- Jimmy Page