Hi Shawn / Michael,

Thanks for your replies -- you have got my scenario exactly right.
Initially my document contains information about who has access to it, as fields like U1_s:true. If 100 users can access a document, we will have 100 such fields, one per user. So when U1 wants to see all his documents, I query for all documents where U1_s:true.

If user U5 is added to group G1, I have to take all the documents of group G1 and set U5_s:true on each of them, which means re-indexing all the documents in that group. To avoid this, I was trying to keep group information instead of user information in the document, like G1_s:true, G2_s:true. To query a user's documents, I would first get all the groups of user U1, and then query for all documents where G1_s:true OR G2_s:true OR G3_s:true, and so on. This way we don't need to re-index the documents, but each query must OR together all the groups the user belongs to.

For how many ORs can Solr return results in less than one second? Can I pass hundreds of OR conditions in a Solr query, and will that affect performance? Please share your valuable inputs.

On Thu, Mar 16, 2017 at 6:04 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2017 6:02 AM, Ganesh M wrote:
> > We have 1 million documents and would like to query with multiple fq values.
> >
> > We have kept access_control (a multi-valued field) which holds information about which groups each document is accessible to.
> >
> > Now to get the list of all the documents for a user, we would like to pass multiple fq values (one for each group the user belongs to):
> >
> > q:somefiled:value&fq:access_control:g1&fq:access_control:g2&fq:access_control:g3&fq:access_control:g4&fq:access_control:g5...
> >
> > Like this, there could be 100 groups for a user.
>
> The correct syntax is fq=field:value -- what you have there is not going to work.
>
> This might not do what you expect.
> Filter queries are ANDed together -- *every* filter must match, which means that if a document that you want has only one of those values in access_control, or has 98 of them but not all 100, then the query isn't going to match that document. The solution is one filter query that can match ANY of them, which also might run faster. I can't say whether this is a problem for you or not. Your data might be completely correct for matching 100 filters.
>
> Also keep in mind that there is a limit to the size of a URL that you can send into any webserver, including the container that runs Solr. That default limit is 8192 bytes, and includes the "GET " or "POST " at the beginning and the " HTTP/1.1" at the end (note the spaces). The filter query information for 100 of the filters you mentioned is going to be over 2K, which will fit in the default, but if your query has more complexity than you have mentioned here, the total URL might not fit. There's a workaround to this -- use a POST request and put the parameters in the request body.
>
> > If we fire a query with 100 values in the fq, what's the penalty on performance? Can we get the result in less than one second for 1 million documents?
>
> With one million documents, each internal filter query result is 125000 bytes -- the number of documents divided by eight. That's 12.5 megabytes for 100 of them. In addition, every time a filter is run, it must examine every document in the index to create that 125000-byte structure, which means that filters which *aren't* found in the filterCache are relatively slow. If they are found in the cache, they're lightning fast, because the cache will contain the entire 125000-byte bitset.
>
> If you make your filterCache large enough, it's going to consume a LOT of java heap memory, particularly if the index gets bigger.
> The nice thing about the filterCache is that once the cache entries exist, the filters are REALLY fast, and if they're all cached, you would DEFINITELY be able to get results in under one second. I have no idea whether the same would happen when filters aren't cached. It might. Filters that do not exist in the cache will be executed in parallel, so the number of CPUs that you have in the machine, along with the query rate, will have a big impact on the overall performance of a single query with a lot of filters.
>
> Also related to the filterCache, keep in mind that every time a commit is made that opens a new searcher, the filterCache will be autowarmed. If the autowarmCount value for the filterCache is large, that can make commits take a very long time, which will cause problems if commits are happening frequently. On the other hand, a very small autowarmCount can cause slow performance after a commit if you use a lot of filters.
>
> My reply is longer and more dense than I had anticipated. Apologies if it's information overload.
>
> Thanks,
> Shawn
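To make the thread concrete: the single OR'd filter Shawn suggests, plus his 8192-byte request-line limit, can be sketched as below. This is only a sketch of the query construction on the client side -- the access_control field name and group IDs come from the thread, while the core path /solr/mycore/select is a hypothetical example; nothing here talks to an actual Solr server.

```python
from urllib.parse import urlencode

def build_group_filter(groups):
    # One fq per group would AND the filters together (every group must
    # match); OR-ing the groups inside a single fq matches ANY of them.
    return "access_control:({})".format(" OR ".join(groups))

def plan_request(path, q, groups, limit=8192):
    # Pick GET or POST. The container's default 8192-byte limit covers
    # the whole request line, including the "GET " prefix and the
    # " HTTP/1.1" suffix, as Shawn notes.
    query_string = urlencode({"q": q, "fq": build_group_filter(groups)})
    request_line = "GET " + path + "?" + query_string + " HTTP/1.1"
    if len(request_line) <= limit:
        return "GET", query_string
    # Too long for the request line: send the parameters in a POST body.
    return "POST", query_string

# 100 hypothetical groups g1..g100 for one user
groups = ["g{}".format(i) for i in range(1, 101)]
method, qs = plan_request("/solr/mycore/select", "somefield:value", groups)
```

With 100 short group IDs the encoded filter stays well under 2K, so a plain GET fits; the POST branch only matters once the query grows past the container's limit.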
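Shawn's filterCache arithmetic is back-of-the-envelope math that is easy to check: each cached entry is a bitset with one bit per document in the index. A minimal sketch (the real per-entry heap cost depends on Solr's internal bitset implementation, so treat this as a lower bound):

```python
def filter_bitset_bytes(max_doc):
    # One bit per document in the index, so max_doc / 8 bytes per entry.
    return max_doc // 8

def cache_heap_bytes(max_doc, num_entries):
    # Rough heap footprint of a fully populated filterCache.
    return num_entries * filter_bitset_bytes(max_doc)

one_filter = filter_bitset_bytes(1_000_000)   # 125000 bytes per entry
hundred = cache_heap_bytes(1_000_000, 100)    # 12.5 MB for 100 cached filters
```

At 1 million documents this is modest, but the footprint scales linearly with both index size and filterCache size -- 100 million documents with 100 cached filters would already be 1.25 GB of heap.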