We're not using Cassandra to store tweets just yet. See:
I don't think we've announced our approach for tweet storage as yet.
On Mon, Aug 23, 2010 at 8:03 PM, D. Smith <emai...@sharedlog.com> wrote:
> Another one hits the MySQL brick wall.
> I'm surprised that someone with as much data as you have has managed to
> stay with MySQL for as long as you have.
> It must have been a real pain to constantly fight the losing MySQL
> optimization battle.
> It would be very interesting to know what made you choose Cassandra
> over other NoSQL solutions.
> I hope you will write a nice blog post about why you chose
> Cassandra and which alternatives you considered.
> On Aug 23, 6:45 pm, Matt Harris <thematthar...@twitter.com> wrote:
> > Hey Developers!
> > A while ago we let you know about the new Tweet ID generation service
> > we developed called Snowflake and published the source code so you
> > could get familiar with how it works. Today, we're announcing that at
> > 10am PDT on Tuesday September 21st, 2010 Snowflake will be in use on
> > our production systems and that status IDs will no longer be
> > sequential.
> > Snowflake still uses 64-bit unsigned integers, but instead of being
> > sequential they will be based on time and composed of: a
> > timestamp, a worker number and a sequence number. For the majority of
> > you this change will go unnoticed and your applications will continue
> > to function without the need for any changes. In addition the API is
> > ready for Snowflake and parameters such as max_id and since_id will
> > work as expected. Snowflake does mean Tweet IDs will no longer be
> > useful for data analysis, and things like counting Tweets by
> > subtracting status IDs will not be possible.
> > We listened when you told us about sorting Tweets by ID and knew that
> > we needed to keep the ID roughly sortable. With Snowflake if two
> > Tweets are posted within 1 second of each other they will be within a
> > second of each other in the ID space too. This means although Tweets
> > will no longer be sorted, they will be k-sorted to approximately 1
> > second.
> > The key points:
> > * Status IDs will be unique
> > * Status IDs will continue to increase - Tweets created later in the
> > day will have a higher ID than those created in the morning
> > * Order will be maintained for Tweets allowing you to sort by Status
> > ID. The accuracy of the sort will be to approximately 1 second,
> > meaning Tweets created within a second of each other have no order.
> > * All existing API methods will continue to work the same as before
> > * Previous status IDs will be unchanged
> > * There will be a noticeable jump in the numerical value of status IDs
> > when we change.
> > You can read more about Snowflake on the Twitter Engineering blog:
> > Best
> > Matt Harris
> > Developer Advocate, Twitter http://twitter.com/themattharris
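
For anyone trying to picture the ID layout described in the announcement, here is a rough Python sketch of packing a timestamp, a worker number and a sequence number into one integer. The bit widths (41/10/12) and the names make_id and split_id are assumptions for illustration only; the announcement does not give the exact layout, so treat this as a sketch rather than Twitter's implementation.

```python
# Assumed field widths, chosen for illustration; not stated in the announcement.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12


def make_id(timestamp_ms, worker, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def split_id(status_id):
    """Recover (timestamp_ms, worker, sequence) from a packed ID."""
    sequence = status_id & ((1 << SEQUENCE_BITS) - 1)
    worker = (status_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = status_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence


if __name__ == "__main__":
    earlier = make_id(1000, worker=5, sequence=0)
    later = make_id(1001, worker=1, sequence=0)
    # A later timestamp always yields a larger ID, whatever the worker.
    print(earlier < later)
    # The difference between two IDs is not a count of Tweets in between.
    print(later - earlier)
```

Because the timestamp sits in the high bits, sorting by ID sorts by time first. Within the same timestamp the order is decided by worker and sequence number rather than true posting order, which is one way to see why IDs are only roughly sortable and why subtracting them no longer counts Tweets.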