Ahhhh got you :)

Sorry. Correct, I use a Perl client. But sorry to say, I don't use
DataImportHandler. I just run the queries against the DB, filter the results,
and build the Solr XML 'by hand' in the Perl script :(
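
For illustration only, here is a minimal sketch of that flow (decode the Latin-1 bytes coming from the DB, then build the Solr XML by hand and encode it as UTF-8). It is written in Python rather than Perl, where Encode's decode/encode play the same role. The function names `to_text` and `build_add_xml` and the field names `id` and `name` are hypothetical, and it assumes the DB connection really does deliver ISO-8859-1 bytes.

```python
from xml.sax.saxutils import escape


def to_text(raw: bytes, source_encoding: str = "iso-8859-1") -> str:
    """Decode the bytes returned by the DB driver into a Unicode string."""
    return raw.decode(source_encoding)


def build_add_xml(doc: dict) -> bytes:
    """Build a Solr <add> message by hand and return it encoded as UTF-8."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name, {'"': "&quot;"}), escape(value))
        for name, value in doc.items()
    )
    return ("<add><doc>%s</doc></add>" % fields).encode("utf-8")


# "solène" as an ISO-8859-1 connection would deliver it (assumption).
row = b"sol\xe8ne"
xml = build_add_xml({"id": "1", "name": to_text(row)})

# The failure mode from this thread: UTF-8 bytes read as Latin-1.
mojibake = "solène".encode("utf-8").decode("iso-8859-1")  # "solÃ¨ne"
```

The last line reproduces the "Ã¨" seen in the database: the two UTF-8 bytes of "è" are each read as a separate Latin-1 character, which is why an accent filter no longer recognises it.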

On Fri, Mar 20, 2009 at 1:04 PM, aerox7 <amyne.berr...@me.com> wrote:

>
> Yes! I completely understand the problem. I'm just asking about your
> solution to resolve it.
>
> I guess that you use the Solr Perl client to index your database. In my case
> I use DataImportHandler, so the only solution I have is to create a
> transformer for DataImportHandler and try to convert my rows from Latin-1
> to UTF-8. (see
>
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> )
>
> So I just want to know whether you use DataImportHandler too, with a Perl
> script as a transformer?
>
>
> Óscar Marín Miró wrote:
> >
> > What I mean is that unless "solène" travels to Solr in strict UTF-8,
> > mapping-ISOLatin1Accent won't do anything, and possibly your DB query
> > returns data in ISO-Latin1 (I always have this issue with UTF-8 MySQL), so
> > unless you transcode your data from Latin1 to UTF-8 before sending it to
> > Solr, mapping-ISOLatin1Accent won't know how to interpret it.
> >
> > Does it make any sense? :P
> >
> > On Fri, Mar 20, 2009 at 11:53 AM, aerox7 <amyne.berr...@me.com> wrote:
> >
> >>
> >> I'm using DataImportHandler to send my data to Solr! So you mean it is
> >> possible to apply a transformer in db-config.xml with a Perl script?
> >>
> >>
> >> Óscar Marín Miró wrote:
> >> >
> >> > Hi,
> >> >
> >> > My guess is that *although* your DB is in UTF-8, the database engine
> >> > sends you the rows in ISO-Latin1, so before doing *anything* after
> >> > receiving the data, you should transcode from ISO-Latin1 to UTF-8 and
> >> > then send that to Solr. I'm no Java expert, but in Perl (MySQL DB in
> >> > UTF-8) I have to do this with every row:
> >> >
> >> > $row=decode("iso-8859-1",$row);
> >> >
> >> > ... and before building the XML to add a document to Solr:
> >> >
> >> > $row=encode("utf8",$row);
> >> >
> >> > On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:
> >> >
> >> >>
> >> >> I added:
> >> >> "Ã¨" => "e" to mapping-ISOLatin1Accent.txt
> >> >>
> >> >> and added the following fieldType:
> >> >>
> >> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> >>            positionIncrementGap="100">
> >> >>   <analyzer>
> >> >>     <charFilter class="solr.MappingCharFilterFactory"
> >> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >> >>   </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> But I still have the same problem! It only works when I store an ISO
> >> >> string in the UTF-8 database (e.g. storing solène, not SolÃ¨ne)... :,(
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> aerox7 wrote:
> >> >> >
> >> >> > ==> where are you seeing it as "SolÃ¨ne" as opposed to the
> >> >> > correct way of solène?
> >> >> >
> >> >> > I have "SolÃ¨ne" in my MySQL database! So I don't know if this is
> >> >> > correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?!
> >> >> >
> >> >> > I've tried the analysis page at
> >> >> > http://localhost:8983/solr/admin/analysis.jsp. When I try with
> >> >> > solène everything is OK! But when I try with SolÃ¨ne (like what I
> >> >> > have in the DB), the analysis converts Ã to A and deletes ¨, so I
> >> >> > get SolAne!!!
> >> >> >
> >> >> > I think that ISOLatin1AccentFilterFactory only takes strings in the
> >> >> > ISO-8859-1 charset.
> >> >> >
> >> >> > So is there any solution to transform my string to ISO-8859-1 before
> >> >> > the indexing process? Maybe by creating a transformer in
> >> >> > DataImportHandler? (I've never coded in Java :( )
> >> >> >
> >> >> > Thank you all.
> >> >> >
> >> >> >
> >> >> > Koji Sekiguchi-2 wrote:
> >> >> >>
> >> >> >> aerox7 wrote:
> >> >> >>> Hi,
> >> >> >>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne"
> >> >> >>> (solène). I want to transform this to solene, so I use Solr's
> >> >> >>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't
> >> >> >>> work?!!
> >> >> >>>
> >> >> >>> I guess that "SolÃ¨ne" is "solène" in UTF-8?! I also set Tomcat to
> >> >> >>> UTF-8, so normally ISOLatin1AccentFilterFactory should replace the
> >> >> >>> accent...
> >> >> >>>
> >> >> >>> Any ideas?
> >> >> >>>
> >> >> >>> I use DataImportHandler.
> >> >> >>>
> >> >> >>
> >> >> >> If a mapping rule "Ã¨" to "e" is always true in your field, you can
> >> >> >> try to use MappingCharFilter instead of ISOLatin1AccentFilter. Add
> >> >> >> the following line to mapping-ISOLatin1Accent.txt:
> >> >> >>
> >> >> >> "Ã¨" => "e"
> >> >> >>
> >> >> >> and add the following fieldType:
> >> >> >>
> >> >> >> <fieldType name="textCharNorm" class="solr.TextField"
> >> >> >>            positionIncrementGap="100">
> >> >> >>   <analyzer>
> >> >> >>     <charFilter class="solr.MappingCharFilterFactory"
> >> >> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
> >> >> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
> >> >> >>   </analyzer>
> >> >> >> </fieldType>
> >> >> >>
> >> >> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly
> >> >> >> build.
> >> >> >>
> >> >> >> Koji
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >>
> >>
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > “I may not believe in myself, but I believe in what I'm doing.”
> >> >
> >> > -- Jimmy Page
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22618085.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > “I may not believe in myself, but I believe in what I'm doing.”
> >
> > -- Jimmy Page
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22618999.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
“I may not believe in myself, but I believe in what I'm doing.”

-- Jimmy Page
