Re: design that mimics twitter tweet search

2012-03-19 Thread Sasha Dolgy
most excellent ... thanks Chris! On Mon, Mar 19, 2012 at 9:23 AM, Chris Goffinet wrote: > We do not use Cassandra for search. We made modifications to Lucene. > > Here is a blog post on our engineering section that talks about what we > did: > > > http://engineering.twitter.com/2011/04/twitter-se

Re: design that mimics twitter tweet search

2012-03-19 Thread Chris Goffinet
We do not use Cassandra for search. We made modifications to Lucene. Here is a blog post on our engineering section that talks about what we did: http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html On Sun, Mar 18, 2012 at 11:22 PM, Tharindu Mathew wrote: > Sasha, >

Re: design that mimics twitter tweet search

2012-03-18 Thread Tharindu Mathew
Sasha, It depends on the way you implement it, I guess... Maybe Twitter uses Solandra, which is very good at indexing these in different ways but has the power of Cassandra underneath... If you're doing your own impl of indexing, be mindful that you can break the sentence into four words and index or you i

Re: design that mimics twitter tweet search

2012-03-18 Thread Andrey V. Panov
Why do you suppose they did search on Cassandra? On 19 March 2012 00:16, Sasha Dolgy wrote: > yes -- but given i have two keywords, and want to find all tweets that > have "cassandra" and "bestest" ... means, retrieving all columns + values > in each row, iterating through both to see if tweet id's

Re: design that mimics twitter tweet search

2012-03-18 Thread Sasha Dolgy
yes -- but given i have two keywords, and want to find all tweets that have "cassandra" and "bestest" ... means, retrieving all columns + values in each row, iterating through both to see if tweet id's in one, exist in the other and finishing up with a consolidated list of tweet id's that only exis
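
The two-keyword lookup described in this message amounts to intersecting the tweet ids stored under each keyword row. A minimal sketch, assuming a plain Python dict stands in for the column family (row key = keyword, column name = timestamp, column value = tweet id); the function name tweets_with_all and the sample data are hypothetical and not part of any Cassandra client API:

```python
def tweets_with_all(index, keywords):
    """Return the sorted tweet ids that appear under every given keyword.

    `index` is a hypothetical stand-in for a column family:
    {keyword: {timestamp: tweet_id}}.
    """
    # One set of tweet ids per keyword row; a missing row yields an empty set.
    id_sets = [set(index.get(keyword, {}).values()) for keyword in keywords]
    if not id_sets:
        return []
    # Keep only the ids present in every row.
    return sorted(set.intersection(*id_sets))


# Made-up sample data for illustration only.
sample_index = {
    "cassandra": {100: 1, 200: 2},
    "bestest": {100: 1, 300: 3},
}
both = tweets_with_all(sample_index, ["cassandra", "bestest"])
```

This still reads every column of both rows, which is the cost the message is pointing at; the sketch only makes the client-side merge explicit.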

Re: design that mimics twitter tweet search

2012-03-18 Thread Benoit Perroud
The simplest modeling you could have is using the keyword as key, a timestamp/time UUID as column name and the tweetid as value -> cf['keyword']['timestamp'] = tweetid. Then you do a range query to get all tweetids sorted by time (you may want them in reverse order) and you can limit to the number
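
The keyword-as-row-key model in this message can be sketched in memory as follows. This is an illustration under stated assumptions, not Cassandra client code: the KeywordIndex class, its method names, and the whitespace tokenization are all hypothetical, and a sorted Python list stands in for the time-ordered columns of a row.

```python
from bisect import insort
from collections import defaultdict


class KeywordIndex:
    """Hypothetical in-memory stand-in for cf[keyword][timestamp] = tweet_id."""

    def __init__(self):
        # keyword -> list of (timestamp, tweet_id), kept sorted by timestamp
        self._rows = defaultdict(list)

    def add_tweet(self, tweet_id, timestamp, text):
        # Assumption: lowercase and split on whitespace; each distinct word
        # gets one column in its own row.
        for word in set(text.lower().split()):
            insort(self._rows[word], (timestamp, tweet_id))

    def latest(self, keyword, limit=10):
        # Reverse-order "range query": newest columns first, capped at limit.
        columns = self._rows.get(keyword.lower(), [])
        return [tweet_id for _, tweet_id in list(reversed(columns))[:limit]]


# Made-up sample data for illustration only.
index = KeywordIndex()
index.add_tweet(1, 100, "cassandra is the bestest")
index.add_tweet(2, 200, "Cassandra rocks")
index.add_tweet(3, 300, "lucene search")
newest_first = index.latest("cassandra")
```

In a real column family the sort order and the limit would be handled by the store itself; here they are done by hand to show the shape of the query.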

design that mimics twitter tweet search

2012-03-18 Thread Sasha Dolgy
Hi All, With Twitter, when I search for words like: "cassandra is the bestest", 4 tweets will appear, including one I just did. My understanding is that the internals of Twitter work in that each word in a tweet is allocated, irrespective of the presence of a # hash tag, and the tweet id is assigned