Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

Gora Mohanty Thu, 06 Sep 2012 19:14:32 -0700

On 7 September 2012 06:24, kiran chitturi <chitturikira...@gmail.com> wrote:
[...]


> When I index a text field that contains both Arabic and English, like this tweet
> “@anaga3an: هو سعد الحريري بيعمل ايه غير تحديد الدوجلاس ويختار الكرافته ؟؟”
> #gcc #ksa #lebanon #syria #kuwait #egypt #سوريا
> with the field type 'text_ar', and then view the same field again in
> Solr, it is shown as below.
> RT @AhmedWagih: Ù„Ùˆ Ù…Ø¹Ù…Ù„Ù†Ø§Ø´ ØØ§Ø¬Ø© Ù�ÙŠ Ø§Ù„Ø²ÙŠØ§Ø¯Ø©
> Ø§Ù„Ø³ÙƒØ§Ù†ÙŠØ© Ù�ÙŠ Ù…ØµØ±ØŒ Ù‡Ù†ØªØÙˆÙ„ Ù„Ø¯ÙˆÙ„Ø© Ù�Ù‚ÙŠØ±Ø©
> ÙƒØ«ÙŠÙ�Ø© Ø§Ù„Ø³ÙƒØ§Ù† Ø²ÙŠ Ø¨Ù†Ø¬Ù„Ø§Ø¯Ø´ #Egypt #EgyEconomy
>
> The two lines above are not the same tweet; I have just placed them here as
> an example. This is the problem I am facing.
>
[...]

The encoding of your input text is being mangled at some point.
Presuming that your original encoding is UTF-8, I would look at
how you are indexing into Solr, and the encoding settings on the
Java container. Solr itself handles UTF-8 perfectly fine, as do
most Java containers if configured properly, so my first suspicion
would be the indexing code.
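
As an illustration of this kind of mangling (a minimal Python sketch, not
taken from your setup): the garbled output looks like UTF-8 bytes that some
layer decoded as Windows-1252. That the wrong charset is cp1252 is an
assumption based on the characters in your sample.

```python
# Illustration only: reproduce the kind of mangling seen in the garbled tweet.
original = "سوريا"  # hashtag text taken from the tweet quoted above

# What should travel between MySQL, DIH and Solr: UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Assumed failure: some layer reads those UTF-8 bytes as Windows-1252.
mangled = utf8_bytes.decode("cp1252")

# Undoing the wrong step recovers the text, provided no byte was dropped
# or replaced along the way.
repaired = mangled.encode("cp1252").decode("utf-8")

print(mangled)
print(repaired == original)
```

If text has already been stored in mangled form, such a round trip only
works while every byte survived; once bytes have been replaced by a
substitute character, the original has to be re-imported.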

As it looks like you are pulling from MySQL using DIH, check that
the database character set is UTF-8, and that the connection uses
UTF-8.
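
For the connection side, MySQL Connector/J accepts the useUnicode and
characterEncoding properties on the JDBC URL. A small Python sketch of
adding them to an existing URL follows; the helper name, host and database
name are hypothetical. On the database side, SHOW VARIABLES LIKE
'character_set%' lists the character sets in effect.

```python
from urllib.parse import parse_qsl, urlencode


def with_utf8_params(jdbc_url: str) -> str:
    """Return jdbc_url with the Connector/J UTF-8 properties set.

    Hypothetical helper for illustration; any existing query
    parameters on the URL are kept.
    """
    base, _, query = jdbc_url.partition("?")
    params = dict(parse_qsl(query, keep_blank_values=True))
    params["useUnicode"] = "true"
    params["characterEncoding"] = "UTF-8"
    return base + "?" + urlencode(params)


# Host and database name below are placeholders.
url = with_utf8_params("jdbc:mysql://localhost/tweets")
print(url)
```

When the resulting URL is placed in the url attribute of the dataSource
element in the DIH data-config.xml, the & has to be written as &amp;
because the file is XML.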

Regards,
Gora
