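Hi,

this problem was already posted to the twitter4j mailing list [1]. I'm not sure whether it is an issue with my code, with twitter4j, or with the API itself. Other users have reported similar problems in the past [2].

First: I run a 100-tweet search (without paging) every 5 minutes, e.g. for 'twitter search'. This gives me a set of tweets A (duplicates excluded, of course). I get roughly 5 new tweets per 5-minute interval, so a page size of 100 should be more than sufficient to catch all of them.

Second: I run a streaming filter request for the same terms, 'twitter search', which gives me a set of tweets B.

The problem: combining A and B (C = A ∪ B) gives me a set C that is more than 10% larger than either A or B. That means neither the search API nor the streaming API delivers a nearly complete set of tweets on its own. For example, after running both for 3 hours I get 254 tweets (A) from search and 257 tweets (B) from streaming, but the combined set C has 337 tweets!

Is this a bug in my code, or could this be an API issue?

BTW: I don't expect 100% completeness; I only want something above 90% :) especially for such relatively infrequent terms, where users can, should, and do notice the gaps.

Regards,
Peter.

[1] http://groups.google.com/group/twitter4j/msg/d959e6257ceb452f
[2] http://groups.google.com/group/twitter-development-talk/browse_thread/thread/71ab5cc666113c9e
    http://blog.tweetsmarter.com/twitter-downtime/twitters-dirty-secret-they-dont-show-you-all-tweets/

--
http://jetwick.com Twitter Search without Noise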
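
For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration, not my actual code: it assumes both sources have already been reduced to sets of tweet IDs, and the class and method names (OverlapCheck, union, coverage, missing) are made up for this example. No twitter4j calls are involved. With the numbers from the 3-hour run (254, 257, 337), the overlap is 254 + 257 - 337 = 174 tweets, so search covers about 75% of C and streaming about 76%.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing two sets of tweet IDs.
class OverlapCheck {

    /** Returns all IDs seen by either source (C = A union B). */
    static Set<Long> union(Set<Long> a, Set<Long> b) {
        Set<Long> c = new HashSet<>(a);
        c.addAll(b);
        return c;
    }

    /** Returns the IDs that are in 'combined' but not in 'source'. */
    static Set<Long> missing(Set<Long> source, Set<Long> combined) {
        Set<Long> m = new HashSet<>(combined);
        m.removeAll(source);
        return m;
    }

    /** Fraction of 'combined' that 'source' captured on its own (0.0 to 1.0). */
    static double coverage(Set<Long> source, Set<Long> combined) {
        if (combined.isEmpty()) {
            return 1.0; // nothing to miss
        }
        Set<Long> hit = new HashSet<>(combined);
        hit.retainAll(source); // count only IDs that are in both sets
        return (double) hit.size() / combined.size();
    }

    public static void main(String[] args) {
        // Toy IDs standing in for real tweet IDs.
        Set<Long> search = new HashSet<>(List.of(1L, 2L, 3L));
        Set<Long> stream = new HashSet<>(List.of(2L, 3L, 4L));

        Set<Long> combined = union(search, stream);
        System.out.println("combined size:    " + combined.size());
        System.out.println("search coverage:  " + coverage(search, combined));
        System.out.println("stream coverage:  " + coverage(stream, combined));
        System.out.println("missed by search: " + missing(search, combined));
        System.out.println("missed by stream: " + missing(stream, combined));
    }
}
```

Logging the IDs returned by missing(...) for each source should make it easier to check by hand whether the missed tweets share something (language, retweets, timing) or are dropped at random.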
First: I'm doing a 100 tweet search (without paging) every 5 minutes e.g. against 'twitter search'. I get a set of tweets A - excluding the duplicates, of course. I get approx 5 new tweets for every 5 minutes, so 100 tweets as pageSize should be perfectly sufficient to get all tweets. Second: When I'm doing a streaming filter request for the same terms 'twitter search' then I'm getting a set of tweets B. The problem is: combining A and B ('C=A v B') gives me a set C where the count of C is more than 10% larger then A or B, which means that neither with search nor streaming API I can catch a nearly complete set of tweets. E.g. doing this for 3 hours I'm getting 254 tweets (A) for the search and 257 tweets (B) for the streaming but the combined set C has 337 tweets! Is this a bug in my code or could this be an API issue? BTW: I don't assume 100% correctness, I only want something above 90% :) especially for such relatively infrequent terms, where users can, should and have noticed it. Regards, Peter. [1] http://groups.google.com/group/twitter4j/msg/d959e6257ceb452f [2] http://groups.google.com/group/twitter-development-talk/browse_thread/thread/71ab5cc666113c9e http://blog.tweetsmarter.com/twitter-downtime/twitters-dirty-secret-they-dont-show-you-all-tweets/ -- http://jetwick.com Twitter Search without Noise -- Twitter developer documentation and resources: http://dev.twitter.com/doc API updates via Twitter: http://twitter.com/twitterapi Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list Change your membership to this group: http://groups.google.com/group/twitter-development-talk