Hi, I'm using SolrJ to run a query, with the goal of obtaining:
(1) the retrieved documents,
(2) the TF of each term in each document,
(3) the IDF of each term in the set of retrieved documents (TF/IDF would be fine too)

...all at interactive speeds, or <10s per query. This is a demo, so if all else fails I can adjust the corpus, but I'd rather, y'know, actually do it.

(1) and (2) are working; I completed the patch posted in the following issue:

https://issues.apache.org/jira/browse/SOLR-949

and am just setting tv=true&tv.tf=true for my query. This way I get the documents and the tf information all in one go.

With (3) I'm running into trouble. I have found 2 ways to do it so far:

Option A: set tv.df=true or tv.tf_idf for my query, and get the idf information along with the documents and tf information. Since each term may appear in multiple documents, this means retrieving idf information for each term about 20 times, and takes over a minute to do.

Option B: After I've gathered the tf information, run through the list of terms used across the set of retrieved documents, and for each term, run a query like:

{!func}idf(text,'the_term')&deftype=func&fl=score&rows=1

...while this retrieves idf information only once for each term, the added latency for doing that many queries piles up to almost two minutes on my current corpus.

Is there anything I didn't think of -- a way to construct a query to get idf information for a set of terms all in one go, outside the bounds of what terms happen to be in a document?

Failing that, does anyone have a sense for how far I'd have to scale down a corpus to approach interactive speeds, if I want this sort of data?

Katie
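
P.S. In case it helps to see Option B concretely, here is a minimal sketch of the client-side part: collecting each distinct term once from the per-document tf maps and building one idf function query per term. It deliberately leaves out the SolrJ calls; the class name, method names, the doc IDs, and the nested-map shape of the tf data are all made up for illustration, and the quote escaping is my assumption about how the function parser reads string literals. Only the field name "text" and the query shape come from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of SolrJ: prepares the per-term idf queries for Option B.
class IdfQuerySketch {

    // Collect every term that occurs in any retrieved document exactly once,
    // in first-seen order. Input shape (docId -> term -> tf) is an assumption.
    static Set<String> distinctTerms(Map<String, Map<String, Integer>> tfByDoc) {
        Set<String> terms = new LinkedHashSet<>();
        for (Map<String, Integer> tf : tfByDoc.values()) {
            terms.addAll(tf.keySet());
        }
        return terms;
    }

    // Build the function query string for a single term.
    // Assumption: backslash-escaping quotes is enough for the function parser.
    static String idfQuery(String field, String term) {
        String escaped = term.replace("\\", "\\\\").replace("'", "\\'");
        return "{!func}idf(" + field + ",'" + escaped + "')";
    }

    // One query string per distinct term; each would be sent with fl=score&rows=1.
    static List<String> idfQueries(String field, Map<String, Map<String, Integer>> tfByDoc) {
        List<String> queries = new ArrayList<>();
        for (String term : distinctTerms(tfByDoc)) {
            queries.add(idfQuery(field, term));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Made-up tf data standing in for the term vector response.
        Map<String, Map<String, Integer>> tfByDoc = new LinkedHashMap<>();
        Map<String, Integer> doc1 = new LinkedHashMap<>();
        doc1.put("solr", 3);
        doc1.put("query", 1);
        Map<String, Integer> doc2 = new LinkedHashMap<>();
        doc2.put("solr", 1);
        doc2.put("index", 2);
        tfByDoc.put("doc1", doc1);
        tfByDoc.put("doc2", doc2);

        for (String q : idfQueries("text", tfByDoc)) {
            System.out.println(q);
        }
    }
}
```

The de-duplication is the cheap part; the cost I'm seeing comes from issuing one HTTP round trip per entry in that list, which is why I'm hoping for a way to ask for all of them in a single request.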