Re: Real Time Search and External File Fields

2016-10-10 Thread Mike Lissner
Thanks for the replies. I made the changes so that the external file field is loaded per:

Re: Real Time Search and External File Fields

2016-10-09 Thread Shawn Heisey
On 10/8/2016 1:18 PM, Mike Lissner wrote:
> I want to make sure I understand this properly and document this for
> future people that may find this thread. Here's what I interpret your
> advice to be:
> 0. Slacken my auto soft commit interval to something more like a minute.

Yes, I would do this.
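A minimal sketch of the "more like a minute" setting, assuming the change is made through Solr's Config API rather than by editing solrconfig.xml by hand. The helper name `soft_commit_command` is hypothetical, and the host and collection in the comment are placeholders; the snippet only builds the JSON command and does not contact a server.

```python
import json


def soft_commit_command(max_time_ms: int) -> dict:
    """Build a Config API 'set-property' command for the soft commit interval."""
    if max_time_ms <= 0:
        raise ValueError("max_time_ms must be a positive number of milliseconds")
    return {"set-property": {"updateHandler.autoSoftCommit.maxTime": max_time_ms}}


# One minute, per the advice above (the value is in milliseconds).
command = soft_commit_command(60_000)
print(json.dumps(command))

# The JSON would then be POSTed to the collection's config endpoint, e.g.
# http://localhost:8983/solr/<collection>/config (placeholder host and collection).
```

With a one-minute interval, a new searcher (and therefore an external file reload) happens at most once a minute instead of once a second.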

Re: Real Time Search and External File Fields

2016-10-08 Thread Erick Erickson
I chose 16 as a place to start. You usually reach diminishing returns pretty quickly. I feel it's a mistake to set your autowarm counts to, say, 256 (and I've seen this in the thousands) unless you have some proof that it's useful to bump them higher. But certainly if you set them to 16 and see spikes

Re: Real Time Search and External File Fields

2016-10-08 Thread Walter Underwood
With time-oriented data, you can use an old trick (goes back to Infoseek in 1995). Make a “today” collection that is very fresh. Nightly, migrate new documents to the “not today” collection. The today collection will be small and can be updated quickly. The archive collection will be large and
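The nightly migration step of this trick could be sketched as below. This is only an illustration under stated assumptions: documents are plain dicts, the `added` timestamp field and the function name `split_by_day` are hypothetical, and the actual reindexing into the two collections is left out.

```python
from datetime import date, datetime


def split_by_day(docs, today):
    """Partition docs into (fresh, archive) by the hypothetical 'added' timestamp.

    Docs added on or after `today` stay in the small "today" collection;
    everything older is due to move to the large "not today" collection.
    """
    fresh, archive = [], []
    for doc in docs:
        if doc["added"].date() >= today:
            fresh.append(doc)
        else:
            archive.append(doc)
    return fresh, archive


docs = [
    {"id": "a", "added": datetime(2016, 10, 7, 17, 0)},
    {"id": "b", "added": datetime(2016, 10, 8, 9, 30)},
]
fresh, archive = split_by_day(docs, date(2016, 10, 8))
```

Queries would then need to search both collections, with only the small one being committed frequently.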

Re: Real Time Search and External File Fields

2016-10-08 Thread Mike Lissner
On Fri, Oct 7, 2016 at 8:18 PM Erick Erickson wrote:
> What you haven't mentioned is how often you add new docs. Is it once a
> day? Steadily from 8:00 to 17:00?

Alas, it's a steady trickle during business hours. We're ingesting court documents as they're posted on

Re: Real Time Search and External File Fields

2016-10-08 Thread Mike Lissner
On Sat, Oct 8, 2016 at 8:46 AM Shawn Heisey wrote:
> > Most soft commit
> > documentation talks about setting up soft commits with <maxTime> of about a
> > second.
>
> IMHO any documentation that recommends autoSoftCommit with a maxTime of
> one second is bad documentation, and needs

Re: Real Time Search and External File Fields

2016-10-08 Thread Shawn Heisey
On 10/7/2016 6:19 PM, Mike Lissner wrote:
> Soft commits seem to be exactly the thing for this, but whenever I open a
> new searcher (which soft commits seem to do), the external file is
> reloaded, and all queries are halted until it finishes loading. When I just
> measured, this took about 30

Re: Real Time Search and External File Fields

2016-10-07 Thread Erick Erickson
bq: Most soft commit documentation talks about setting up soft commits with <maxTime> of about a second. I think this is really a consequence of this being included in the example configs for illustrative purposes; personally, I never liked this. There is no one right answer. I've seen soft commit

Real Time Search and External File Fields

2016-10-07 Thread Mike Lissner
I have an index of about 4M documents with an external file field configured to do boosting based on pagerank scores of each document. The pagerank file is about 93MB as of today -- it's pretty big. Each day, I add about 1,000 new documents to the index, and I need them to be available as soon as
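For readers unfamiliar with external file fields, the sketch below shows the general shape of such a file as I understand it: one `key=value` line per document, with a float value, sorted by key. The function name `format_external_file` and the sample ids and scores are hypothetical; the real pagerank file in this thread is not shown in the message.

```python
def format_external_file(scores):
    """Render {doc_id: score} as sorted 'doc_id=score' lines, one per document."""
    lines = [f"{doc_id}={score}" for doc_id, score in sorted(scores.items())]
    return "\n".join(lines) + "\n"


# Hypothetical pagerank scores keyed by document id.
content = format_external_file({"doc2": 0.25, "doc1": 1.5})
print(content, end="")
```

Solr conventionally reads such a file from the index data directory under a name of the form `external_<fieldname>`, and rereads it when a new searcher opens, which is why a 93MB file makes every commit expensive.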