David,

We have a similar query in astrophysics: a user can select an area of the sky... many stars out there...
I am long overdue in creating a Jira issue, but here is another efficient
mechanism for searching a large number of ids:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java

Roman

On 12 Oct 2013 01:57, "David Philip" <davidphilipshe...@gmail.com> wrote:

> Groups are pharmaceutical research experiments. The user is presented with
> a graph view; he can select some region and all the groups in that region
> get included. The user can also modify the groups here, so we didn't
> maintain group information in the same Solr index but externalized it.
>
> I looked at the post filter article. My understanding is that I simply
> have to extend it as you did and include an implementation of
> "isAllowed(acls[doc], groups)". This will filter the documents in the
> collector, and finally this collector will be returned. Am I right?
>
>     @Override
>     public void collect(int doc) throws IOException {
>       if (isAllowed(acls[doc], user, groups)) super.collect(doc);
>     }
>
> Erick, I am interested to know whether I can extend any class that can
> return only the bitset of the documents that match the search query. I
> can then do bitset1.and(bitset2OfGroups) and finally collect only those
> documents to return to the user. How do I try this approach? Any pointers
> for bit sets?
>
> Thanks - David
>
> On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Well, my first question is why 50K groups is necessary, and
> > whether you can simplify that. How a user can manually
> > choose from among that many groups is "interesting". But
> > assuming they're all necessary, I can think of two things.
> >
> > If the user can only select ranges, just put in filter queries
> > using ranges. Or possibly both ranges and individual entries,
> > as fq=group:[1A TO 10000A] OR group:(2B 45C 98Z) etc.
> >
> > You need to be a little careful how you index these so
> > range queries work properly. In the above you'd miss
> > 2A because it sorts lexicographically; you'd need to
> > store the ids in some form that sorts like 0000001A, 0010000A,
> > and so on. You wouldn't need to show that form to the
> > user, just form your fq's in the app to work with
> > that form.
> >
> > If that won't work (you wouldn't want this to get huge), think
> > about a "post filter" that would only operate on documents that
> > had made it through the select, although how to convey which
> > groups the user selected to the post filter is an open
> > question.
> >
> > Best,
> > Erick
> >
> > On Wed, Oct 9, 2013 at 12:23 PM, David Philip
> > <davidphilipshe...@gmail.com> wrote:
> > > Hi All,
> > >
> > > I have an issue in handling filters for one of our requirements and
> > > would like to get suggestions for the best approaches.
> > >
> > > *Use Case:*
> > >
> > > 1. We have a list of groups, and the number of groups can increase up
> > > to 1 million. Currently we have almost 90 thousand groups in the Solr
> > > search system.
> > >
> > > 2. Just before the user hits search, he has the option to select the
> > > groups he wants to retrieve. [The distinct list of these group names
> > > for display is retrieved from another Solr index that has more
> > > information about groups.]
> > >
> > > 3. *User Operation:*
> > > Say the user selected group 1A - group 10000A and searches for
> > > key:cancer.
> > >
> > > The current approach I was thinking of is: get the search results and
> > > filter the query by the list of group ids selected by the user. But my
> > > concern is that when this group list grows to >50k unique ids, it can
> > > cause a lot of delay in getting search results. So I wanted to know
> > > whether there are different filtering approaches that I can try.
> > >
> > > I was thinking of one more approach, suggested by my colleague: do an
> > > intersection.
> > > - Get the group ids selected by the user.
> > > - Get the list of group ids from the search results.
> > > - Perform the intersection of both, and then get the entire result set
> > >   of only those group ids that intersected.
> > >
> > > Is this a better way? Can I use any caching technique in this case?
> > >
> > > - David.
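
For readers following the thread, the bitset intersection David asks about can be illustrated outside Solr with plain java.util.BitSet. This is only a rough sketch under stated assumptions: it is not taken from BitSetQParserPlugin or from Solr itself, and the class name, the method names, and the in-memory groupOfDoc array (a document id to group id lookup) are all hypothetical stand-ins for whatever the real index provides.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: plain java.util.BitSet, not Solr/Lucene internals.
final class BitSetIntersectionSketch {

    // Builds a bitset with one bit per document, set when the document's
    // group is among the groups the user selected. groupOfDoc is a
    // hypothetical doc id -> group id lookup held in memory.
    static BitSet docsInSelectedGroups(String[] groupOfDoc, Set<String> selectedGroups) {
        BitSet allowed = new BitSet(groupOfDoc.length);
        for (int doc = 0; doc < groupOfDoc.length; doc++) {
            if (selectedGroups.contains(groupOfDoc[doc])) {
                allowed.set(doc);
            }
        }
        return allowed;
    }

    // Returns the documents that match the query AND belong to a selected
    // group. Works on a copy so the caller's bitsets are left untouched.
    static BitSet intersect(BitSet queryMatches, BitSet allowed) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        // Five documents, each belonging to one group (made-up data).
        String[] groupOfDoc = {"1A", "2B", "1A", "45C", "7Z"};

        // Pretend the search for key:cancer matched documents 0, 1, 3 and 4.
        BitSet queryMatches = new BitSet(groupOfDoc.length);
        queryMatches.set(0);
        queryMatches.set(1);
        queryMatches.set(3);
        queryMatches.set(4);

        Set<String> selected = new HashSet<>(Arrays.asList("1A", "45C"));
        BitSet hits = intersect(queryMatches, docsInSelectedGroups(groupOfDoc, selected));

        // Iterate over the set bits; prints documents 0 and 3.
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            System.out.println("doc " + doc + " group " + groupOfDoc[doc]);
        }
    }
}
```

The intersection is a single word-wise AND, so its cost grows with the number of documents rather than with the number of selected group ids. The expensive step in this sketch is building the "allowed" bitset, which is the part one would presumably want to cache when the same selection is reused.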