Google forces a translation to Japanese

2009-09-14 Thread Shachar Shemesh

Hi all,

One of my clients is having a weird problem, and I'm pretty much at my 
wit's end as to what to do about it.


The site is called Tzofit (at tzofit.co.il), and is an index and 
publisher for Zimmers. When you search Google for צימרים ("zimmers") the 
site appears on the second page, and when you search Google for צופית 
("Tzofit") it is the first result. In both cases, you cannot miss it: 
Google displays the site's title and summary in Japanese!


Now here's where it gets really strange. While the main page is 
presented as Japanese, all the deep links are in Hebrew. If you ask to 
see the Google cache, the site appears in Hebrew. If you search for its 
address directly (tzofit.co.il), the site appears with the correct title 
and summary. The only explanation I have is that this is a bug in 
Google's index.


The problem is that even if that is the case, I cannot see what I can do 
about it. I tried asking about it on the Google forums 
(http://www.google.com/support/forum/p/Web+Search/thread?tid=08c423ea40d5c1ab&hl=en), 
but, as expected, got no replies. On the other hand, I did not manage 
to find anything wrong with the actual page.


Translating the Japanese text back to English with Google Translate 
shows that it does translate, but not into coherent sentences. Then 
again, looking at the raw encoding, this does not appear to be Hebrew 
interpreted with the wrong encoding (or am I missing something?)
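One way to test the wrong-encoding theory is to reproduce the misreading directly and also run it in reverse. The sketch below is an illustration, not something from this thread: it assumes the page bytes are windows-1255 (Python codec `cp1255`) and that the plausible misreadings are Shift_JIS, EUC-JP or Windows-932. The helper names are hypothetical.

```python
# Hypothetical helpers for testing the "Hebrew read with the wrong encoding" theory.
# Assumption: the page bytes are windows-1255 (Python codec name "cp1255").
SOURCE = "cp1255"
# Assumption: these are the Japanese encodings a crawler might wrongly guess.
CANDIDATES = ("shift_jis", "euc_jp", "cp932")


def misread_as_japanese(hebrew: str) -> dict:
    """Show what the windows-1255 bytes of `hebrew` look like when decoded
    with each Japanese encoding. Undecodable sequences become U+FFFD."""
    raw = hebrew.encode(SOURCE)
    return {enc: raw.decode(enc, errors="replace") for enc in CANDIDATES}


def japanese_back_to_hebrew(snippet: str) -> dict:
    """Reverse direction: re-encode a Japanese snippet (e.g. the summary shown
    in the search results) with each candidate encoding, then read the bytes
    as windows-1255. Readable Hebrew would mean the snippet is mis-decoded text."""
    result = {}
    for enc in CANDIDATES:
        try:
            raw = snippet.encode(enc)
        except UnicodeEncodeError:
            continue  # the snippet cannot have come from this encoding
        result[enc] = raw.decode(SOURCE, errors="replace")
    return result
```

Pasting the Japanese summary from the search results into `japanese_back_to_hebrew` and getting readable Hebrew for one of the encodings would point to an encoding mix-up; gibberish for all of them supports the conclusion that it is not one.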


If anyone has any clue, it would be much appreciated.

Thanks,
Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Google forces a translation to Japanese

2009-09-14 Thread shimi

 I would try the following:

   - Remove the extra newlines from the beginning of the document. An XML
   document should begin with an XML declaration. Maybe newlines are valid, I
   never checked, but documents usually don't begin that way, so why do it... :)
   - In an HTML document, you define the language inside the html opening
   tag, with lang=he. The meta tag that does this is redundant, and I would
   assume Google prefers the html attribute.
   - The newlines in the file appear to be DOS-style. Maybe try running the
   file through dos2unix.
   - It could be this windows-1255 thing. Maybe try iso-8859-8-i instead,
   or even better, switch to UTF-8 altogether. Everybody loves UTF-8 :)
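The four suggestions above can be applied mechanically. The sketch below is one way to do it in Python; `normalize_page` is a hypothetical helper, and it assumes the page is a single static file stored as windows-1255 with a simple `<html>` start tag. On the first point: under the XML 1.0 rules, if a document has an XML declaration, nothing (not even whitespace) may precede it.

```python
import re


def normalize_page(raw: bytes, source_encoding: str = "cp1255") -> bytes:
    """Apply the four suggestions to a page's raw bytes (hypothetical helper).

    Assumption: the page is Hebrew stored as windows-1255 (Python's "cp1255").
    The regexes are deliberately naive and only cover the simple cases.
    """
    text = raw.decode(source_encoding)

    # 1. Drop leading whitespace/newlines so the document starts with its
    #    XML declaration or doctype.
    text = text.lstrip()

    # 2. Add lang="he" to the first <html> start tag unless it already has a
    #    bare lang attribute (an xml:lang attribute alone does not count).
    m = re.search(r"<html\b[^>]*>", text, flags=re.IGNORECASE)
    if m and not re.search(r"\slang\s*=", m.group(0), flags=re.IGNORECASE):
        tag = m.group(0)
        text = text[:m.start()] + tag[:5] + ' lang="he"' + tag[5:] + text[m.end():]

    # 3. Convert DOS (CRLF) line endings to Unix (LF), as dos2unix would.
    text = text.replace("\r\n", "\n")

    # 4. Declare UTF-8 instead of windows-1255 (meta charset and XML
    #    declaration), then actually re-encode the text as UTF-8.
    text = re.sub(r"(charset\s*=\s*[\"']?)windows-1255", r"\1utf-8",
                  text, flags=re.IGNORECASE)
    text = re.sub(r"(encoding\s*=\s*[\"'])windows-1255", r"\1utf-8",
                  text, flags=re.IGNORECASE)
    return text.encode("utf-8")
```

Doing the charset switch and the re-encoding in the same step matters: changing only the declared charset, without converting the bytes, would itself produce garbled text.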


These are my ideas...

HTH,

-- Shimi


Re: Google forces a translation to Japanese

2009-09-14 Thread Noam Rathaus
Hi Shachar,

A bit far-fetched, but if you control the web server, could you verify
that it does not treat Google's bots specially, either in the requests it
receives from them or in the responses it sends them?
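A quick first check for this kind of special treatment is to fetch the page twice, once as a browser and once claiming to be Googlebot, and compare the bodies. This is a sketch under assumptions: the browser User-Agent is an arbitrary example and the helper names are hypothetical. It only catches User-Agent based cloaking; code that checks the client's IP address or reverse DNS would not be detected, so the server's configuration and scripts still need inspecting.

```python
import urllib.request

# Assumed User-Agent strings: the Googlebot one is the string Google's crawler
# sends; the browser one is an arbitrary example.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.5"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url: str, user_agent: str) -> bytes:
    """Fetch `url` while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def served_differently(url: str) -> bool:
    """Return True if the body differs between a browser User-Agent and the
    Googlebot one. Pages with timestamps or rotating ads will also differ,
    so a True result still needs a manual diff of the two bodies."""
    return fetch(url, BROWSER_UA) != fetch(url, GOOGLEBOT_UA)
```

For example, `served_differently("http://tzofit.co.il/")` returning False would rule out the simplest form of cloaking, but not the IP-based kind.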

Also, I noticed that the web server doesn't specify which language it
responds in; it is worth declaring that via the Content-Language
header.
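What declaring the language looks like on the wire can be shown with a minimal stand-in server from Python's standard library. This is only an illustration, not the site's actual setup: on Apache the header would normally come from configuration (for example mod_mime's `DefaultLanguage` directive, or a `Header set Content-Language` line with mod_headers) rather than code. The page content and handler name are hypothetical.

```python
import http.server

# Hypothetical page: lang="he" on <html> and the Content-Language header
# below declare the same language.
PAGE = '<html lang="he"><head><title>צופית</title></head><body></body></html>'


class HebrewPageHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # State explicitly which charset and which language the response is in.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Language", "he")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, `http.server.HTTPServer(("127.0.0.1", 8000), HebrewPageHandler).serve_forever()` serves the page, and the headers can be inspected with any HTTP client.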



Re: Google forces a translation to Japanese

2009-09-14 Thread Amos Shapira
In addition to the other advice you got, maybe have a sniff around
the Google Webmaster Tools site (http://www.google.com/webmasters/) to
try to find a way through to Google, or to understand more about what
Google thinks of this web site.

--Amos



Re: Google forces a translation to Japanese

2009-09-14 Thread Shachar Shemesh

Yuval Hager wrote:

The Japanese text is not complete nonsense, as you would expect from an 
encoding problem. Could it be that the site was hacked in some way that 
presents Google's bots with different content from what others see? 

  
The client contacted no fewer than three (3) SEO specialists, none of 
whom came up with anything more than "It's malware of some sort." They 
even recommended we hire a scanning service from one of this list's 
participants (which we would have, had Noam answered his messenger; in 
fact, Noam, please have one of your sales people contact me). What not 
one of them managed to do is explain how malware could cause the Google 
cache to show the wrong result for some searches and the correct one for 
others, or how it could make Google show the wrong summary but the 
correct page in the cache.


My personal opinion is that Google had a bug that crossed the index with 
some other site. Admittedly, that theory does not fully fit all the 
available evidence. For instance, if you search Google for the Japanese 
description in quotes, there are zero results (then again, you don't get 
Tzofit's site either, which is also weird).


Like I said, ideas welcome.

Shachar
