I think the search logs will require a lot of storage, which may make the index unreasonably large if they are stored in Solr.
Also, the aggregation results may not really fit the Lucene index structure. :)

kiwi
happy hacking!

On Tue, Jul 27, 2010 at 7:47 AM, Tommy Chheng <tommy.chh...@gmail.com> wrote:

> Alternatively, have you considered storing (or I should say indexing) the
> search logs with Solr?
>
> This lets you text search across your search queries. You can perform time
> range queries with Solr as well.
>
> @tommychheng
> Programmer and UC Irvine Graduate Student
> Find a great grad school based on research interests:
> http://gradschoolnow.com
>
> On 7/26/10 4:43 PM, Mark wrote:
>
>> We are thinking about using Cassandra to store our search logs. Can
>> someone point me in the right direction/lend some guidance on design? I am
>> new to Cassandra and I am having trouble wrapping my head around some of
>> these new concepts. My brain keeps wanting to go back to an RDBMS design.
>>
>> We will be storing the user query, # of hits returned and their session
>> id. We would like to be able to answer the following questions.
>>
>> - What are the n most popular queries and their counts within the last x
>> (mins/hours/days/etc)? Basically the most popular searches within a given
>> time range.
>> - What is the most popular query within the last x where hits = 0? Same as
>> above but with an extra "where" clause.
>> - For session id x, give me all their other queries.
>> - What are all the session ids that searched for 'foos'?
>>
>> We accomplish the above functionality w/ MySQL using 2 tables. One for the
>> raw search log information and the other to keep the aggregate/running
>> counts of queries.
>>
>> Would this sort of ad-hoc querying be better implemented using Hadoop +
>> Hive? If so, should I be storing all this information in Cassandra then
>> using Hadoop to retrieve it?
>>
>> Thanks for your suggestions
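
To make the four questions in Mark's message concrete, here is a rough in-memory sketch in Python. It is not a Cassandra, Solr, or Hive schema; it only illustrates the access patterns a storage design would have to serve. The names SearchEvent, SearchLog, top_queries, queries_for_session and sessions_for_query are made up for illustration, and timestamps are assumed to be plain integer seconds.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    """One search log entry (hypothetical layout)."""
    timestamp: int   # assumed: seconds since the epoch
    session_id: str
    query: str
    hits: int        # number of hits returned


class SearchLog:
    """Raw log plus two lookup indexes, all kept in memory."""

    def __init__(self):
        self.events = []                      # raw log entries
        self.by_session = defaultdict(list)   # session id -> queries issued
        self.by_query = defaultdict(set)      # query -> session ids

    def record(self, event):
        self.events.append(event)
        self.by_session[event.session_id].append(event.query)
        self.by_query[event.query].add(event.session_id)

    def top_queries(self, n, since, until, zero_hits_only=False):
        """n most popular queries with since <= timestamp < until."""
        counts = Counter(
            e.query
            for e in self.events
            if since <= e.timestamp < until
            and (not zero_hits_only or e.hits == 0)
        )
        return counts.most_common(n)

    def queries_for_session(self, session_id):
        """All queries issued by one session, in arrival order."""
        return list(self.by_session.get(session_id, []))

    def sessions_for_query(self, query):
        """All session ids that searched for the given query."""
        return set(self.by_query.get(query, set()))
```

The time-range scan in top_queries is linear over the raw log, which is the part that the second MySQL table of running counts (or any pre-aggregated store) is meant to avoid. The two dictionaries correspond to the per-session and per-query lookups, which are simple key lookups in any of the systems discussed.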