Hi all,

I was wondering whether anyone has ever applied information retrieval metrics to
real-time big data where the amount of data varies over time.

The main idea would be to test whether you can find relevant information for a
given time frame in two data repositories: a baseline repository and one with
extra content added. The question is how to do this comparison in a fair way:
chances are that the extra content will contain more relevant documents than the
baseline does. So how can you be sure that finding more relevant documents really
reflects the quality of your search system, and not just the size of your data
repository?
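To make the setup a bit more concrete, here is a toy sketch of the kind of
comparison I mean. The document IDs, result lists, and relevance judgments below
are entirely made up; the point is only to show where the size of the repository
creeps into the numbers:

```python
# Toy sketch: comparing retrieval results from two repositories of different size.
# All document IDs, result lists, and relevance judgments are made up.

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

# Baseline repository: judgments and results for one query and time frame.
baseline_relevant = {"d1", "d4", "d7"}
baseline_retrieved = ["d1", "d2", "d4", "d9", "d7"]

# Extended repository: same query, but the extra content adds more relevant documents.
extended_relevant = {"d1", "d4", "d7", "x2", "x5", "x8"}
extended_retrieved = ["x2", "d1", "x5", "d4", "x8"]

k = 5
print("baseline P@5:", precision_at_k(baseline_retrieved, baseline_relevant, k))
print("extended P@5:", precision_at_k(extended_retrieved, extended_relevant, k))
# The extended repository scores higher here, but is that because the search
# system is better, or simply because there are more relevant documents to find?
```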

Regards,
Joachim
