Re: minimize disc space requirement.

2019-05-18 Thread Erick Erickson
It Depends (tm). No, limiting the background threads won’t help much. Here’s the issue: At time T, the segments file contains the current “snapshot” of the index, i.e. the names of all the segments that have been committed. At time T+N, another commit happens. Or, consider an optimize which for
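To make the snapshot argument concrete: segments named in the old commit stay on disk until the commit that replaces them has completed and no open searcher still references them, so a merge temporarily needs room for both its inputs and its output. Below is a rough sketch of that arithmetic. The function name, parameters, and sizes are hypothetical illustrations, not part of Solr or Lucene.

```python
def peak_disk_during_merge(segment_sizes, merged_indexes, deleted_fraction=0.0):
    """Estimate peak disk usage (same unit as the inputs) while segments merge.

    segment_sizes    -- sizes of all segments in the current commit (hypothetical)
    merged_indexes   -- positions of the segments taking part in the merge
    deleted_fraction -- assumed share of the merged data that is deleted
                        documents and therefore not copied to the new segment
    """
    current = sum(segment_sizes)
    merging = sum(segment_sizes[i] for i in merged_indexes)
    # The new segment is written while the old ones are still on disk.
    new_segment = merging * (1.0 - deleted_fraction)
    return current + new_segment


if __name__ == "__main__":
    sizes_gb = [10, 20, 30]
    # An optimize down to one segment rewrites every segment.
    print(peak_disk_during_merge(sizes_gb, [0, 1, 2]))
    # With half of the documents deleted, the rewritten copy is smaller.
    print(peak_disk_during_merge(sizes_gb, [0, 1, 2], deleted_fraction=0.5))
```

With no deleted documents, a full rewrite peaks at roughly twice the index size, which is where the "2 times the size of the index" rule of thumb comes from.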

Re: minimize disc space requirement.

2019-05-18 Thread Erick Erickson
Oh, and none of that includes people adding more and more documents to the existing replicas…
> On May 18, 2019, at 10:22 AM, Shawn Heisey wrote:
>
> On 5/18/2019 9:36 AM, tom_s wrote:
>> im aware that the best practice is to have disk space on your solr servers
>> to be 2 times the size of

Re: minimize disc space requirement.

2019-05-18 Thread Shawn Heisey
On 5/18/2019 9:36 AM, tom_s wrote: im aware that the best practice is to have disk space on your solr servers to be 2 times the size of the index. but my goal to minimize this overhead and have my index occupy more than 50% of disk space. in our index documents have TTL, so documents are deleted

minimize disc space requirement.

2019-05-18 Thread tom_s
Hey, I'm aware that the best practice is to have disk space on your Solr servers equal to 2 times the size of the index, but my goal is to minimize this overhead and have my index occupy more than 50% of disk space. In our index, documents have a TTL, so documents are deleted every day and it causes

Re: Distributed IDF in Alias

2019-05-18 Thread Erick Erickson
In a word, “yes”. For time routed aliases, you also have to be aware of the nature of your data. Take the canonical example of news stories, for instance, and let’s assume that every day a new collection is created. Now a hot news story breaks and the news is flooded with the latest story,

Re: Distributed IDF in Alias

2019-05-18 Thread Andrzej Białecki
Yes, the IDFs will be different. You could probably implement a custom component that would take term statistics from the previous collections to pre-populate the stats of the current collection, but this is uncharted territory and there’s a lot that could go wrong. E.g., if there’s a genuine shift in

Re: Distributed IDF in Alias

2019-05-18 Thread SOLR4189
I'm asking because I want to use TRA (Time Routed Aliases). Let's say Solr opens a new collection every month. At the beginning of the month, the new collection will be almost empty. So will the IDF differ between the new collection and the previous month's collection?
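A small sketch of why the numbers diverge, assuming a BM25-style IDF of the form log(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents in the collection and n is the number containing the term. The document counts below are made up for illustration, and the function is a standalone calculation, not a Solr API.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style inverse document frequency for a single term.

    doc_count -- total documents in the collection (N)
    doc_freq  -- documents containing the term (n)
    """
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    # Hypothetical numbers: last month's collection is large, and the new
    # month's collection has only just started receiving documents.
    previous_month = bm25_idf(doc_count=1_000_000, doc_freq=5_000)
    new_month = bm25_idf(doc_count=10, doc_freq=5)
    print(f"previous month idf: {previous_month:.3f}")
    print(f"new month idf:      {new_month:.3f}")
```

Because each collection computes IDF from its own statistics, the same term can be scored very differently in the new collection until it fills up, unless term statistics are shared across the collections in some way.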