I think the search logs will require a lot of storage, which may make the
index unreasonably large if they are stored in Solr.

Also, the aggregation results may not really fit the Lucene index structure.
:)
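
To make the aggregation point concrete, here is a minimal sketch of the kind
of running counts Mark describes, kept per time bucket so that a "last x
minutes" question becomes a sum over a few buckets. It is plain in-memory
Python, not Cassandra or Solr client code; the class name QueryCounts, the
one-minute bucket width, and the method names are all assumptions made up for
illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def bucket_of(ts):
    """Truncate a timestamp to the start of its one-minute bucket (assumed width)."""
    return ts.replace(second=0, microsecond=0)


class QueryCounts:
    """Hypothetical running counts of queries, one Counter per time bucket."""

    def __init__(self):
        self.all_hits = defaultdict(Counter)   # bucket -> query -> count
        self.zero_hits = defaultdict(Counter)  # same, only searches with 0 hits

    def record(self, ts, query, hits):
        """Count one search in the bucket that contains ts."""
        b = bucket_of(ts)
        self.all_hits[b][query] += 1
        if hits == 0:
            self.zero_hits[b][query] += 1

    def top(self, n, start, end, zero_only=False):
        """Return the n most frequent queries in buckets with start <= bucket < end."""
        source = self.zero_hits if zero_only else self.all_hits
        total = Counter()
        for b, counts in source.items():
            if start <= b < end:
                total.update(counts)
        return total.most_common(n)


if __name__ == "__main__":
    log = QueryCounts()
    now = datetime(2010, 7, 26, 16, 43)
    log.record(now, "foos", 12)
    log.record(now + timedelta(seconds=20), "foos", 0)
    log.record(now + timedelta(minutes=1), "bars", 0)
    print(log.top(5, now, now + timedelta(minutes=5)))
    print(log.top(5, now, now + timedelta(minutes=5), zero_only=True))
```

The trade-off is the usual one: counts are cheap to read, but the bucket width
fixes the finest time range you can ask about, and the raw log still has to
live somewhere else.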

kiwi
happy hacking !



On Tue, Jul 27, 2010 at 7:47 AM, Tommy Chheng <tommy.chh...@gmail.com> wrote:

> Alternatively, have you considered storing (or I should say indexing) the
> search logs with Solr?
>
> This lets you text search across your search queries. You can perform time
> range queries with Solr as well.
>
> @tommychheng
> Programmer and UC Irvine Graduate Student
> Find a great grad school based on research interests:
> http://gradschoolnow.com
>
>
>
> On 7/26/10 4:43 PM, Mark wrote:
>
>> We are thinking about using Cassandra to store our search logs. Can
>> someone point me in the right direction/lend some guidance on design? I am
>> new to Cassandra and I am having trouble wrapping my head around some of
>> these new concepts. My brain keeps wanting to go back to an RDBMS design.
>>
>> We will be storing the user query, # of hits returned and their session
>> id. We would like to be able to answer the following questions.
>>
>> - What are the n most popular queries and their counts within the last x
>> (mins/hours/days/etc.)? Basically the most popular searches within a given
>> time range.
>> - What is the most popular query within the last x where hits = 0. Same as
>> above but with an extra "where" clause
>> - For session id x give me all their other queries
>> - What are all the session ids that searched for 'foos'
>>
>> We accomplish the above functionality w/ MySQL using 2 tables. One for the
>> raw search log information and the other to keep the aggregate/running
>> counts of queries.
>>
>> Would this sort of ad-hoc querying be better implemented using Hadoop +
>> Hive? If so, should I be storing all this information in Cassandra then
>> using Hadoop to retrieve it?
>>
>> Thanks for your suggestions
>>
>>

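The two session questions in Mark's list (all queries for a session id, and
all session ids that searched for a given query) are lookups by key rather
than aggregations. One way to picture them is as two maps written on every
search, one keyed by session and one keyed by query. The sketch below is
in-memory Python only; SessionIndex and its method names are hypothetical and
do not correspond to any Cassandra or Solr API.

```python
from collections import defaultdict


class SessionIndex:
    """Hypothetical pair of lookup maps, both updated on every search."""

    def __init__(self):
        self.queries_by_session = defaultdict(list)  # session id -> queries, in order
        self.sessions_by_query = defaultdict(set)    # query -> session ids

    def record(self, session_id, query):
        """Store one search under both keys."""
        self.queries_by_session[session_id].append(query)
        self.sessions_by_query[query].add(session_id)

    def queries_for(self, session_id):
        """All queries issued by one session, oldest first."""
        return list(self.queries_by_session.get(session_id, []))

    def sessions_for(self, query):
        """All session ids that searched for exactly this query string."""
        return set(self.sessions_by_query.get(query, set()))


if __name__ == "__main__":
    index = SessionIndex()
    index.record("s1", "foos")
    index.record("s1", "bars")
    index.record("s2", "foos")
    print(index.queries_for("s1"))
    print(sorted(index.sessions_for("foos")))
```

Writing each search twice costs extra storage, but each question then reads a
single key instead of scanning the raw log.
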