Hi all, I was wondering whether anyone has ever applied information retrieval metrics to real-time big data where the amount of data varies.
The main idea would be to test whether you can find relevant information for a given time frame in two data repositories: a baseline repository and one with extra content. The question is how to do this in a fair way: chances are that the repository with the extra content will contain more relevant documents than the baseline. So how can you be sure that finding more relevant documents really reflects the quality of your search system and not simply the size of your data repository?
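To make the question a bit more concrete, here is a rough toy sketch of the kind of comparison I have in mind. Everything in it is made up for illustration: the term-overlap search stands in for the real retrieval system, the repositories are fabricated, and the size-matched random sample of the larger repository is just one idea for controlling for collection size, which is exactly the part I am unsure about.

```python
import random

def search(repository, query_terms, k=10):
    """Toy retrieval: rank documents by term overlap with the query.
    This just stands in for the real search system under test."""
    scored = sorted(repository.items(),
                    key=lambda item: len(query_terms & item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def size_matched_sample(repository, target_size, seed=0):
    """Randomly sample the larger repository down to the baseline's size,
    so a difference in precision is harder to explain by size alone."""
    rng = random.Random(seed)
    kept = rng.sample(sorted(repository), target_size)
    return {doc_id: repository[doc_id] for doc_id in kept}

# Made-up repositories: the extended one is the baseline plus extra documents.
baseline = {f"b{i}": {"storm", "traffic", f"t{i}"} for i in range(100)}
extended = dict(baseline, **{f"x{i}": {"storm", "flood", f"t{i}"} for i in range(100)})

query = {"storm", "flood"}
relevant = {doc_id for doc_id, terms in extended.items() if "flood" in terms}

for name, repo in [("baseline", baseline),
                   ("extended", extended),
                   ("extended, size-matched", size_matched_sample(extended, len(baseline)))]:
    print(f"{name:25s} P@10 = {precision_at_k(search(repo, query), relevant):.2f}")
```

In this toy setup the extended repository scores higher simply because it holds more relevant documents, which is the confound I would like to separate from genuine differences in search quality.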
Regards,
Joachim