[twitter-dev] Streaming API with Chinese/Japanese language track predicates

Toby Phipps Wed, 07 Apr 2010 09:03:16 -0700

Hi,

Has anyone managed to get Japanese or Chinese language track
predicates working with the Streaming API? No matter what I try, I
fail to get any matches using "track" with any Japanese character or
word.


I note from the doc that "Some UTF-8 keywords will not match
correctly- this is a known temporary defect"; however, this sounds more
like an edge case, maybe with certain denormalized Unicode forms.
Does this really extend to pretty much any search in Chinese or
Japanese?
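
To illustrate what I mean by a denormalized-form edge case, here is a
minimal Python sketch. It only shows how composed and decomposed forms
of the same character differ, and that the keywords I tried have no
decomposed form at all. The helper name same_keyword is my own and
hypothetical; this says nothing about how the Streaming API actually
compares keywords.

```python
import unicodedata

# The hiragana "ga" can be stored as one code point (composed, NFC)
# or as "ka" plus a combining voiced sound mark (decomposed, NFD).
composed = "\u304c"          # が
decomposed = "\u304b\u3099"  # か + combining dakuten

def same_keyword(a, b):
    """Hypothetical helper: compare two keywords after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A byte-for-byte comparison treats the two forms as different strings.
raw_equal = composed == decomposed

# The three predicates from this post are unchanged by NFD, so a
# normalization mismatch should not affect them.
predicates = ["日本", "よ", "ツイッター"]
unaffected = [unicodedata.normalize("NFD", p) == p for p in predicates]
```

If normalization were the only problem, I would expect keywords like
the ones above, which are identical in both forms, to match normally.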

Some of the predicates I've tried, all of which result in no statuses
arriving:

日本　("Japan" - shows up as being very frequent via the search API)
よ　(A Japanese form of exclamation - again very popular in tweets)
ツイッター (Japanese for Twitter - literally "tsu-i-tta")

Given the talk about a hash map being used for status matching, I'm
thinking that this could be because no wordbreaking (n-gram/
morphology) is performed against Chinese/Japanese tweets before they
get added to the hash map, and since most words aren't space-delimited
in these languages, if I don't manage to match an entire sentence, I
won't get a hit. However, all these searches work just fine via the
search API (which I understand is still on a different platform).
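
A rough Python sketch of that hypothesis follows. It assumes a matcher
that splits each status on whitespace and looks keywords up in a hash
set, and contrasts it with a character n-gram lookup. All function
names and the sample statuses are made up for illustration; I don't
know how the real matcher is implemented.

```python
def whitespace_tokens(text):
    """Tokens a naive matcher would index: lowercase, split on whitespace."""
    return set(text.lower().split())

def char_ngrams(text, n):
    """Every contiguous run of n characters in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def matches_whitespace(keyword, text):
    # Hit only if the keyword equals a whole whitespace-delimited token.
    return keyword.lower() in whitespace_tokens(text)

def matches_ngram(keyword, text):
    # Hit if the keyword appears anywhere as a character sequence.
    return keyword in char_ngrams(text, len(keyword))

english = "I love Twitter today"
japanese = "今日は日本でツイッターを使っています"  # no spaces between words

english_hit = matches_whitespace("twitter", english)
japanese_hit = matches_whitespace("日本", japanese)
japanese_ngram_hit = matches_ngram("日本", japanese)
```

Under these assumptions the whole Japanese status becomes a single
token, so a keyword only matches if it equals the entire status, while
the n-gram lookup finds it. That would be consistent with what I'm
seeing, though it is only a guess.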

Any ideas?

Thanks,
Toby.
