Solr Upgrade socketTimeout issue in 8.2

2020-02-18 Thread kshitij tyagi
Hi,

We have upgraded our SolrCloud from version 6.6.0 to 8.2.0.

While indexing, we intermittently observe a socketTimeout exception when
using the Collections API, for example when we try to reload one of the
collections using the CloudSolrClient class.

Is there any known performance degradation in the SolrCloud Collections API?
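
For reference, the reload call is essentially the following (a minimal sketch
against SolrJ 8.x; the ZooKeeper hosts, collection name, and timeout values
are made up for illustration):

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.client.solrj.response.CollectionAdminResponse;

    public class ReloadCollection {
        public static void main(String[] args) throws Exception {
            // Raising the client-side timeouts helps rule out a too-low
            // default while investigating the socketTimeout exception.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                    Optional.empty())
                    .withConnectionTimeout(15000)   // ms
                    .withSocketTimeout(120000)      // ms
                    .build()) {
                CollectionAdminResponse rsp = CollectionAdminRequest
                        .reloadCollection("my_collection")
                        .process(client);
                System.out.println("Reload status: " + rsp.getStatus());
            }
        }
    }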

logs:

[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception

EndOfStreamException: Unable to read additional data from client sessionid
0x2663e756d775747, likely client has closed socket

at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)

at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)

at java.lang.Thread.run(Unknown Source)


logs:


Exception has occured in job switch: Timeout occurred while waiting
response from server at: http://prod-t-8.net:8983/solr


Is anyone facing the same type of issue in SolrCloud? Any suggestions on how to solve it?



Regards,

kshitij


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Jörn Franke
You are too focused on the solution. If you described the business case in
more detail, without including the solution itself, more people could help.

E.g. it is not clear why you have a scoring model and why it would address
the business needs.

> On 18.02.2020, at 01:50, Ashwin Ramesh  wrote:
> 
> Hi,
> 
> We are in the process of applying a scoring model to our search results. In
> particular, we would like to add scores for documents per query and user
> context.
> 
> For example, we want to have a score from 500 to 1 for the top 500
> documents for the query “dog” for users who speak US English.
> 
> We believe it becomes infeasible to store these scores in Solr because we
> want to update the scores regularly, and the number of scores increases
> rapidly with increased user attributes.
> 
> One solution we explored was to store these scores in a secondary data
> store, and use this at Solr query time with a boost function such as:
> 
> `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> mul(termfreq(id,'ID-500'),1)`
> 
> We have over a hundred thousand documents in one Solr collection, and about
> fifty million in another Solr collection. We have some queries for which
> roughly 80% of the results match, although this is an edge case. We wanted
> to know the worst case performance, so we tested with such a query. For
> both of these collections we found a message similar to the following
> in the Solr cloud logs (tested on a laptop):
> 
> Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> 
> We then tried using the following boost, which seemed simpler:
> 
> `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
> 
> We then saw the following in the Solr cloud logs:
> 
> `The request took too long to iterate over terms.`
> 
> All responses above took over 5000 milliseconds to return.
> 
> We are considering Solr’s re-ranker, but I don’t know how we would use this
> without pushing all the query-context-document scores to Solr.
> 
> 
> The alternative solution that we are currently considering involves
> invoking multiple solr queries.
> 
> This means we would make a request to solr to fetch the top N results (id,
> score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar, limit=N.
> 
> Another request would be made using a filter query with a set of doc ids
> that we know are high value for the user’s query. E.g. q=*:*,
> fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> 
> We would then do a reranking phase in our service layer.
> 
> Do you have any suggestions for known patterns of how we can store and
> retrieve scores per user context and query?
> 
> Regards,
> Ash & Spirit.
> 
> -- 
> Empowering the world to design
> Also, we're hiring. Apply here!


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Mikhail Khludnev
Note: the {!terms} query is more efficient for long ID lists. I'd try to
group the IDs by boost, and cache the long ID lists. Something like:
q=filter({!terms f=id}1,3,5)^=100  filter({!terms f=id}2,4,6)^=-1
That lets the heavy terms lists be reused between queries.
Another idea: extract the boost scores into a separate core/index (strictly
single shard in SolrCloud so far) and use {!join score=sum} to bring the
ranks into the main index. That lets you update the smaller core faster,
although it might require some hacks to decouple updates and cache
invalidation.
Also, Solr has in-place updates, which can update a docValues field holding
the boosts; you can then score by that field.
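
Spelled out as a request, the grouped-boost idea might look like the sketch
below (the edismax/bq framing, ZooKeeper host, collection name, IDs, and
boost values are my assumptions for illustration):

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupedBoosts {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181"), Optional.empty()).build()) {
                SolrQuery q = new SolrQuery("dog");
                q.set("defType", "edismax");
                // Each filter(...) clause is cached in the filterCache, so a
                // heavy {!terms} ID list is parsed once and reused between
                // queries; ^= assigns a constant score to the whole clause.
                q.add("bq", "filter({!terms f=id}ID-1,ID-3,ID-5)^=100");
                q.add("bq", "filter({!terms f=id}ID-2,ID-4,ID-6)^=50");
                QueryResponse rsp = client.query("my_collection", q);
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }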

On Tue, Feb 18, 2020 at 9:27 PM Ashwin Ramesh  wrote:

> ping on this :)
>
> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh  wrote:
>
> > [...]

-- 
Sincerely yours
Mikhail Khludnev


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Walter Underwood
I didn’t respond because it seemed like you were stuck on an approach that
would never be efficient in Solr. It requires massive amounts of data applied
to documents in a fine-grained way. Maybe it makes the math easier, but the
data management is impractical. I could not see any way to make that fast.

Here are four alternative approaches.

1. Instead of a 1-500 scale, use a 0/1 scale: either this result is good for
this query term or not. That can be implemented with a single multivalued
field containing the query terms. If the query matches that field, it gets a
boost.

2. How much of that score is really different between query terms? Split out
a common document score that is independent of the query term. Use the boost
parameter with that document quality score and see how close it is to the
ideal ranking.

3. Group your queries and documents into categories. Give each document a
score for each category. That could be boolean (in the category or not) or a
quality score for that category. That can be stored in a dynamic field, so
topic_score_1, topic_score_42, etc. A query for topic 42 fetches the matching
field. We did this for three different sets of topics, each with thousands of
categories. It was a ton of fields, but ran really fast.

You can categorize the query by seeing which documents it matches: check the
category memberships of the first k results and choose the top-scoring
category. This is a kNN (k Nearest Neighbors) classifier. Then take that
category and run a second query using the category scores. (A sketch of this
approach follows the list.)

4. Pre-calculate the top 50 results for each category with the slow algorithm
and use the elevate component to force that ranking for that term.
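
The dynamic-field sketch mentioned under approach 3 might look roughly like
this (the field names, the def() fallback, and the edismax boost parameter
are guesses at one way to wire it up, not necessarily the setup described
above):

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CategoryBoost {
        public static void main(String[] args) throws Exception {
            // Schema side (sketch): one dynamic float field per category, e.g.
            //   <dynamicField name="topic_score_*" type="pfloat"
            //                 indexed="false" stored="false" docValues="true"/>
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181"), Optional.empty()).build()) {
                SolrQuery q = new SolrQuery("dog");
                q.set("defType", "edismax");
                // Suppose the kNN classification step picked category 42:
                // multiply that per-document score into the ranking.
                // def(...) keeps documents without a score neutral (x1).
                q.set("boost", "def(field(topic_score_42),1)");
                QueryResponse rsp = client.query("my_collection", q);
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }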

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 18, 2020, at 9:27 PM, Ashwin Ramesh  wrote:
> 
> ping on this :)
> 
> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh  wrote:
> 
>> [...]



Re: [SUSPICIOUS] Re: Best Practises around relevance tuning per query

2020-02-18 Thread David Hastings
I don’t think anyone is responding because it’s too focused a use case, where
you simply have to figure out an alternative on your own.

> On Feb 19, 2020, at 12:28 AM, Ashwin Ramesh  wrote:
> 
> ping on this :)
> 
>> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh  wrote:
>> 
>> [...]


Re: Best Practises around relevance tuning per query

2020-02-18 Thread Ashwin Ramesh
ping on this :)

On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh  wrote:

> [...]

-- 
Empowering the world to design
Also, we're hiring. Apply here!

Re: Batch updates, optimistic concurrency and conflict errors

2020-02-18 Thread Erick Erickson
I think what you want is to just configure TolerantUpdateProcessorFactory
in solrconfig.xml as part of your update chain and specify the _version_
field as appropriate in the URL you referenced.

You can configure TolerantUpdateProcessorFactory to limit the number
of errors allowed before terminating etc. It also returns an indication of
what failures happened.
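
A minimal sketch of such a chain in solrconfig.xml (the chain name and
maxErrors value are illustrative; you can select the chain per request with
update.chain=tolerant-chain):

    <updateRequestProcessorChain name="tolerant-chain">
      <processor class="solr.TolerantUpdateProcessorFactory">
        <!-- give up only after this many documents have failed -->
        <int name="maxErrors">10</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>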

Best,
Erick



> On Feb 18, 2020, at 8:56 AM, Sachin Divekar  wrote:
> 
> [...]



Batch updates, optimistic concurrency and conflict errors

2020-02-18 Thread Sachin Divekar
Hi,

I am trying to use the *must-exist* and *must-not-exist* semantics of
optimistic concurrency provided by Solr. When doing batch updates, Solr
stops indexing immediately when it encounters a conflict. It does not
process the subsequent records in the input list.

That is one extreme. The other extreme is using
failOnVersionConflicts=false, as described in the documentation at
https://lucene.apache.org/solr/guide/8_4/updating-parts-of-documents.html#optimistic-concurrency
I think it internally uses *TolerantUpdateProcessorFactory*. This silently
ignores and suppresses errors, so the client never knows whether there was
any error during indexing, which is not useful when using optimistic
concurrency.
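
For reference, a sketch of such a batch in SolrJ (ZooKeeper host, collection,
fields, and values are invented; per the documentation, _version_ > 1 must
match the current version exactly, = 1 means the document must exist, and
< 0 means it must not exist):

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class VersionedBatch {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Arrays.asList("zk1:2181"), Optional.empty()).build()) {
                SolrInputDocument create = new SolrInputDocument();
                create.addField("id", "doc1");
                create.addField("title_s", "new doc");
                create.addField("_version_", -1L);   // must NOT already exist

                SolrInputDocument update = new SolrInputDocument();
                update.addField("id", "doc2");
                update.addField("title_s", "updated doc");
                update.addField("_version_", 1L);    // must already exist

                UpdateRequest req = new UpdateRequest();
                req.add(Arrays.asList(create, update));
                // Keeps indexing past conflicts, but the response will not
                // say which documents were skipped -- the problem above.
                req.setParam("failOnVersionConflicts", "false");
                req.process(client, "my_collection");
            }
        }
    }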

I am wondering if there is any way to have batch updates where Solr would
process the entire batch and send the list of errors in the response.

I checked whether such an update processor is available but did not find
one. If it is not possible in Solr out of the box, can it be implemented as
a custom update processor?

Thank you.

--
Sachin


Re: A question about solr filter cache

2020-02-18 Thread Erick Erickson
Again, it depends on the version of Solr, but the metrics endpoint (added in
6.4) has a TON of information. Be prepared to wade through it for half a day
to find the things you need ;). There are something like 150 different
metrics returned…

Frankly I don’t remember if cache RAM usage is one of them, but that’s what 
grep was made for ;)
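
For example, something like this narrows the output to the filterCache
entries (host and core obviously vary; the prefix parameter is a guess at
the most convenient filter, if your version supports it):

    http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.filterCache

Solr 8.3, for instance, reports CACHE.searcher.filterCache.ramBytesUsed
there, as in the stats quoted below.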

Best,
Erick



> On Feb 18, 2020, at 2:53 AM, Hongxu Ma  wrote:
> 
> @Vadim Ivanov
> 
> Thank you!
> 
> From: Vadim Ivanov 
> Sent: Tuesday, February 18, 2020 15:27
> To: solr-user@lucene.apache.org 
> Subject: RE: A question about solr filter cache
> 
> Hi!
> Yes, it may depend on the Solr version.
> Solr 8.3 Admin filterCache page stats look like:
> 
> stats:
> CACHE.searcher.filterCache.cleanupThread:false
> CACHE.searcher.filterCache.cumulative_evictions:0
> CACHE.searcher.filterCache.cumulative_hitratio:0.94
> CACHE.searcher.filterCache.cumulative_hits:198
> CACHE.searcher.filterCache.cumulative_idleEvictions:0
> CACHE.searcher.filterCache.cumulative_inserts:12
> CACHE.searcher.filterCache.cumulative_lookups:210
> CACHE.searcher.filterCache.evictions:0
> CACHE.searcher.filterCache.hitratio:1
> CACHE.searcher.filterCache.hits:84
> CACHE.searcher.filterCache.idleEvictions:0
> CACHE.searcher.filterCache.inserts:0
> CACHE.searcher.filterCache.lookups:84
> CACHE.searcher.filterCache.maxRamMB:-1
> CACHE.searcher.filterCache.ramBytesUsed:70768
> CACHE.searcher.filterCache.size:12
> CACHE.searcher.filterCache.warmupTime:1
> 
>> -Original Message-
>> From: Hongxu Ma [mailto:inte...@outlook.com]
>> Sent: Tuesday, February 18, 2020 5:32 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: A question about solr filter cache
>> 
>> @Erick Erickson and @Mikhail Khludnev
>> 
>> got it, the explanation is very clear.
>> 
>> Thank you for your help.
>> 
>> From: Hongxu Ma 
>> Sent: Tuesday, February 18, 2020 10:22
>> To: Vadim Ivanov ; solr-u...@lucene.apache.org
>> Subject: Re: A question about solr filter cache
>> 
>> Thank you @Vadim Ivanov
>> I know that admin page, but I cannot find the memory usage of the filter
>> cache (it only has "CACHE.searcher.filterCache.size", which I think is the
>> number of used slots in the filterCache)
>> 
>> Here is my output (Solr version 7.3.1):
>> 
>> filterCache
>>   class: org.apache.solr.search.FastLRUCache
>>   description: Concurrent LRU Cache(maxSize=512, initialSize=512,
>>     minSize=460, acceptableSize=486, cleanupThread=false)
>>   stats:
>>     CACHE.searcher.filterCache.cumulative_evictions: 0
>>     CACHE.searcher.filterCache.cumulative_hitratio: 0.5
>>     CACHE.searcher.filterCache.cumulative_hits: 1
>>     CACHE.searcher.filterCache.cumulative_inserts: 1
>>     CACHE.searcher.filterCache.cumulative_lookups: 2
>>     CACHE.searcher.filterCache.evictions: 0
>>     CACHE.searcher.filterCache.hitratio: 0.5
>>     CACHE.searcher.filterCache.hits: 1
>>     CACHE.searcher.filterCache.inserts: 1
>>     CACHE.searcher.filterCache.lookups: 2
>>     CACHE.searcher.filterCache.size: 1
>>     CACHE.searcher.filterCache.warmupTime: 0
>> 
>> 
>> 
>> 
>> From: Vadim Ivanov 
>> Sent: Monday, February 17, 2020 17:51
>> To: solr-user@lucene.apache.org 
>> Subject: RE: A question about solr filter cache
>> 
>> You can easily check the amount of RAM used by a core's filterCache in the
>> Admin UI: choose the core - Plugins/Stats - Cache - filterCache. It shows
>> useful information on configuration, statistics, and current RAM usage by
>> the filter cache, as well as some examples of current filterCaches in RAM.
>> A core with 10 mln docs, for example, uses 1.3 MB of RAM for every
>> filterCache entry.
>> 
>> 
>>> -Original Message-
>>> From: Hongxu Ma [mailto:inte...@outlook.com]
>>> Sent: Monday, February 17, 2020 12:13 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: A question about solr filter cache
>>> 
>>> Hi
>>> I want to know the internals of the Solr filter cache, especially its
>>> memory usage.
>>> 
>>> I googled some pages:
>>> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
>>> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
>>> (Erick Erickson's answer)
>>> 
>>> All of them said its structure is: fq => a bitmap (total doc number bits),
>>> but I think it's not so simple. Reason: given a total doc number of 1
>>> billion, each filter cache entry would use nearly 125 MB (10^9 bits / 8);
>>> that is too big and would make it very easy for Solr to OOM (I have a
>>> 1 billion doc cluster, and it looks like it works well)
>>> 
>>> And I also checked the Solr node, but cannot find the details (I only saw
>>> the DocSet structure being used)
>>> 
>>> So far, I guess:
>>>