On Sat, Oct 8, 2016 at 8:46 AM Shawn Heisey <apa...@elyograg.org> wrote:

> > Most soft commit documentation talks about setting up soft commits with
> > <maxTime> of about a second.
>
> IMHO any documentation that recommends autoSoftCommit with a maxTime of
> one second is bad documentation, and needs to be fixed.  Where have you
> seen such a recommendation?


You know, I must have made that up, sorry. But the documentation you linked
to (on the Lucid Works blog) and the example file both say 15 seconds for
hard commits, which I think is what got me thinking that soft commits could
be more frequent.

Should soft commits be less frequent than hard commits
(openSearcher=false)? If so, the documentation didn't make that at all clear
to me.
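For reference, both kinds of commit can also be issued by hand against the
update handler. Below is a minimal Python sketch that only builds the request
URLs, assuming a hypothetical core named `mycore` on `localhost:8983`. A hard
commit with `openSearcher=false` persists the index without opening a new
searcher, so nothing new becomes visible; a soft commit opens a new searcher,
which is when the external file gets read again.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for a real installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def hard_commit_url(open_searcher: bool) -> str:
    """Hard commit: persists the index. With openSearcher=false no new
    searcher is opened, so new documents are not yet visible to queries."""
    params = {"commit": "true", "openSearcher": "true" if open_searcher else "false"}
    return SOLR_UPDATE_URL + "?" + urlencode(params)


def soft_commit_url() -> str:
    """Soft commit: opens a new searcher so recent updates become visible."""
    return SOLR_UPDATE_URL + "?" + urlencode({"softCommit": "true"})


if __name__ == "__main__":
    print(hard_commit_url(open_searcher=False))
    print(soft_commit_url())
```

The point of the split is that only the soft commit (or a hard commit with
openSearcher=true) pays the cost of building a new searcher.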


> right now Solr/Lucene has no
> way of knowing that your external file has not changed, so it must read
> the file every time it builds a searcher.


Is it crazy to file a feature request asking that Solr/Lucene record the
file's modification time and reload it only if it has changed? It seems like
an easy win.
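Roughly, the request amounts to something like the following. This is a
hypothetical Python sketch of the idea, not how Solr's ExternalFileField is
actually implemented: it caches the parsed values and re-parses the file only
when its modification time or size changes. The class name
`ExternalValueCache` is made up for illustration.

```python
import os
import tempfile


class ExternalValueCache:
    """Hypothetical sketch: parse a key=value file, and re-parse it only
    when its (modification time, size) stamp has changed since the last load."""

    def __init__(self, path: str):
        self.path = path
        self._stamp = None   # (st_mtime_ns, st_size) of the version last parsed
        self._values = {}    # doc key -> float value
        self.loads = 0       # number of times the file was actually parsed

    def get(self) -> dict:
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:  # file is new or has changed: re-parse it
            self._values = self._parse()
            self._stamp = stamp
            self.loads += 1
        return self._values

    def _parse(self) -> dict:
        values = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Split on the last '=' so a key containing '=' still parses.
                key, _, raw = line.rpartition("=")
                values[key] = float(raw)
        return values


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "values.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1=9.50539603222e-08\n2=3.2\n")
    cache = ExternalValueCache(path)
    cache.get()
    cache.get()         # unchanged file: served from the cache
    print(cache.loads)  # 1
```

One caveat with this design: an mtime/size check can miss a rewrite that
lands within the filesystem's timestamp resolution and keeps the same size.
Hashing the file would close that gap, at the cost of reading it on every
check.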


>  I doubt this feature was
> designed to deal well with an extremely large external file like yours.
>

Perhaps not. It's probably worth mentioning that part of the reason the
file is so large is that pagerank produces very small, high-precision
floats. So a typical line is:

1=9.50539603222e-08

Not something smaller like:

1=3.2

Pagerank also produces a value for every item in the index, which makes
the file long. I suspect anybody with a pagerank-boosted index of moderate
size would have a similarly sized file.
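Some rough arithmetic on the two example lines, as a Python sketch (ASCII
assumed). The 10-million-document count and 8-character keys in the estimate
are hypothetical, and the 3-significant-digit variant is only a comparison
point: whether that much rounding keeps enough precision for ranking is an
untested assumption, since rounding can turn distinct scores into ties.

```python
def line_bytes(key: str, value_text: str) -> int:
    """Size of one 'key=value' line in bytes, counting the trailing newline."""
    return len(f"{key}={value_text}\n".encode("ascii"))


def estimate_file_bytes(num_docs: int, key_len: int, value_len: int) -> int:
    """Approximate file size when every document gets one line:
    key + '=' + value + newline."""
    return num_docs * (key_len + 1 + value_len + 1)


full = line_bytes("1", "9.50539603222e-08")            # 20 bytes
short = line_bytes("1", "3.2")                         # 6 bytes
rounded = line_bytes("1", "%.3g" % 9.50539603222e-08)  # "9.51e-08" -> 11 bytes

# Hypothetical index: 10 million docs, 8-character keys, full-precision values.
estimate = estimate_file_bytes(10_000_000, 8, len("9.50539603222e-08"))

if __name__ == "__main__":
    print(full, short, rounded, estimate)  # 20 6 11 270000000
```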


> If the info changes that infrequently, can you just incorporate it
> directly into the index with a standard field, with the info coming in
> as a part of your normal indexing process?


We've considered that, but whenever you re-run pagerank, it updates EVERY
value. So I guess we could try updating every doc in our index whenever we
run pagerank, but that's a nasty solution.
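If we did go the update-every-document route, the per-run work would look
roughly like this Python sketch: each line of the pagerank output becomes a
Solr atomic update that only `set`s one field, grouped into batches to POST
as JSON to the update handler. The field name `pagerank`, the `id`
uniqueKey, and the batch size are assumptions, and this only works if the
schema meets Solr's atomic-update requirements (for example, the other fields
being stored so documents can be rebuilt). It still rewrites every document
on every run, which is exactly the cost that makes it unattractive.

```python
import json
from typing import Iterable, Iterator, List


def atomic_updates(lines: Iterable[str], field: str = "pagerank") -> Iterator[dict]:
    """Turn 'id=value' lines into atomic-update documents that only 'set'
    one field. The field name is a hypothetical placeholder."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc_id, _, raw = line.rpartition("=")
        yield {"id": doc_id, field: {"set": float(raw)}}


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents so each POST to /update carries at most `size` docs."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = ["1=9.50539603222e-08", "2=3.2", "3=1.0e-05"]
    for body in batches(atomic_updates(sample), size=2):
        print(json.dumps(body))  # one JSON request body per batch
```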


> It seems unlikely that Solr would stop serving queries while setting up
> a new searcher.  The old searcher should continue to serve requests
> until the new searcher is ready.  If this is happening, that definitely
> seems like a bug.
>

I'm positive I've observed this, though you're right that some queries
still come through. Is it possible that queries relying on the field are
blocked while the field is loading? I've observed this in two ways:

1. From the front end, things stalled every time I did a hard commit
(openSearcher=true). I had hard commits coming in every ten minutes via a
cron job, and sure enough, at ten, twenty, thirty... minutes past every
hour, I'd see stalls.

2. Watching the logs, I saw a flood of queries come through after the line:

Loaded external value source external_pagerank

Some queries were coming through before this line, but I think none of
those queries used the external file field (external_pagerank).
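The second observation could be checked a bit more systematically. Below is
a hypothetical Python sketch that takes the log lines around one searcher
open and counts `/select` requests before and after the load message, split
by whether they mention `external_pagerank`. It assumes query lines contain
`path=/select` and that a query using the field names it somewhere in the
logged params. Since a request line is normally written when the request
finishes, a query that waited on the load would show up after the marker.

```python
from typing import Iterable, Tuple

MARKER = "Loaded external value source external_pagerank"


def split_queries(log_lines: Iterable[str],
                  field: str = "external_pagerank") -> Tuple[dict, dict]:
    """Count /select request lines before and after the MARKER line,
    split by whether the line mentions `field`. Log format is assumed."""
    before = {"uses_field": 0, "other": 0}
    after = {"uses_field": 0, "other": 0}
    seen_marker = False
    for line in log_lines:
        if MARKER in line:
            seen_marker = True
            continue
        if "path=/select" not in line:
            continue  # ignore updates, commits, and other non-query lines
        bucket = after if seen_marker else before
        bucket["uses_field" if field in line else "other"] += 1
    return before, after
```

If nearly all field-using queries land in the "after" bucket, that would be
consistent with those queries waiting on the load rather than being served
by the old searcher.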

Mike
