Quoting John Kalucki <j...@twitter.com>:
We have no current plans to fix this issue. The problem isn't UTF-8 itself,
but rather non-space-separated languages. Our language-processing experts
described the effort required: it's a pretty large project, and it may be
computationally impractical in the current streaming architecture. There is
a workaround, albeit a generally impractical one:
take the firehose and perform the language parsing on your end.
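As a rough illustration of what "performing the language parsing on your end" could involve, here is a minimal Python sketch. It is not Twitter's method. It assumes each status already arrives as a decoded Unicode string (connecting to the firehose is out of scope), and the helper names `needs_substring_match`, `matches`, and `track`, as well as the list of script-name prefixes, are hypothetical. Keywords in scripts usually written without spaces between words (Han, kana, Thai, Lao, Khmer) fall back to plain substring search because there are no word boundaries to match on. All other keywords use whole-token matching, which is roughly how space-delimited tracking behaves.

```python
import re
import unicodedata
from typing import Iterable, Iterator

# Assumption, not an official list: Unicode character-name prefixes for
# scripts commonly written without spaces between words.
NO_SPACE_NAME_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "HIRAGANA",
    "KATAKANA",
    "THAI",
    "LAO",
    "KHMER",
)


def needs_substring_match(keyword: str) -> bool:
    """Return True if any character of the keyword belongs to a no-space script."""
    return any(
        unicodedata.name(ch, "").startswith(NO_SPACE_NAME_PREFIXES)
        for ch in keyword
    )


def matches(text: str, keyword: str) -> bool:
    """Hypothetical matcher: substring search for no-space scripts, tokens otherwise."""
    if needs_substring_match(keyword):
        # No word boundaries to rely on, so search for the raw character sequence.
        return keyword in text
    # Space-separated scripts: case-insensitive match on whole tokens only.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def track(statuses: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield each status text that matches at least one keyword."""
    kws = list(keywords)
    for text in statuses:
        if any(matches(text, kw) for kw in kws):
            yield text
```

A whole-token pattern would miss 東京 in 今日は東京で雨, because the neighbouring kana count as word characters; the substring fallback finds it. The trade-off is over-matching (a short keyword can hit inside an unrelated longer word), which is why a proper solution needs real word segmentation rather than this heuristic.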
Is there a "canonical list" of non-space-separated languages? I'm just
starting to look into this myself. There's quite a bit of research
available for Chinese, but what are the others? And while we're on the
subject, how about right-to-left languages?
Yes, it's a large project, but CJK and Arabic represent large
*markets* too. I can understand Twitter needing to prioritize
engineering resources, but the marketer in me says such problems could
be solved with the application of money and a Twitter lab somewhere in
East Asia. ;-)
On Mon, Jun 28, 2010 at 1:34 AM, sjoonk <sjo...@gmail.com> wrote:
I know the current Twitter Streaming API does not support UTF-8 keywords in track.
As a CJK engineer, I hope this feature will be implemented soon.
Does anybody know when this feature will be supported in Twitter