AFAIK, atomic increments are not available. There recently has been quite a bit of discussion about them. So, you might search the archives.
Dave Viner On Mon, Jul 26, 2010 at 7:02 PM, Mark <[email protected]> wrote: > On 7/26/10 6:06 PM, Dave Viner wrote: > > I'd love to hear other's opinions here... but here are my 2 cents. > > With Cassandra, you need to think of the queries - which you've pretty > much done. > > For the most popular queries, you could do something like: > > <ColumnFamily Name="QueriesCounted" > ComparesWith="UTF8Type" > /> > And then access it as: > key-space.QueriesCounted['query-foo-bar'] = $count; > > This makes it easy to get the count for any particular query. I'm not > sure the best way to store the "top counts" idea. Perhaps a secondary > process which iterates over all the queries to see which sorts the query > values by count, and then stores them into another ColumnFamily. > > You could use the same idea for the last query (session ids by query) > > <ColumnFamily Name="QueriesRecorded" > ComparesWith="UTF8Type" > ColumnType="super" > CompareSubcolumnsWith="TimeUUIDType" > /> > And then access it as: > key-space. QueriesRecorded['query-foo-bar'][timeuuid] = session-id; > > Actually, if you used that idea (queries-recorded), you could generate > the counts and aggregates from that directly in a hadoop post-processing... > > But perhaps others will have better ideas. If you haven't read > http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model, go read it > now. It won't answer your question directly, but will describe the process > of modeling a blog in cassandra so you can get a sense of the process. > > Dave Viner > > > > > On Mon, Jul 26, 2010 at 4:46 PM, Mark <[email protected]> wrote: > >> We are thinking about using Cassandra to store our search logs. Can >> someone point me in the right direction/lend some guidance on design? I am >> new to Cassandra and I am having trouble wrapping my head around some of >> these new concepts. My brain keeps wanting to go back to a RDBMS design. >> >> We will be storing the user query, # of hits returned and their session >> id. We would like to be able to answer the following questions. >> >> - What is the n most popular queries and their counts within the last x >> (mins/hours/days/etc). Basically the most popular searches within a given >> time range. >> - What is the most popular query within the last x where hits = 0. Same as >> above but with an extra "where" clause >> - For session id x give me all their other queries >> - What are all the session ids that searched for 'foos' >> >> We accomplish the above functionality w/ MySQL using 2 tables. One for the >> raw search log information and the other to keep the aggregate/running >> counts of queries. >> >> Would this sort of ad-hoc querying be better implemented using Hadoop + >> Hive? If so, should I be storing all this information in Cassandra then >> using Hadoop to retrieve it? >> >> Thanks for your suggestions >> > > "Perhaps a secondary process which iterates over all the queries to see > which sorts the query values by count, and then stores them into another > ColumnFamily." > > - I was trying to avoid this. Is there some sort of atomic increment > feature available? I guess I could do the same thing we are currently doing > which is... > > a) store full query details into table A > b) query table B for aggregate count of query 'foo' then store count + 1 >
