SortableTextField uses docValues in a very specific way, and is not a
general-purpose workaround for enabling docValues on TextFields. Possibly
of interest: https://issues.apache.org/jira/browse/SOLR-8362

That said, docValues are relevant mainly (arguably only) for full-domain,
per-document value access (e.g., for faceting, sorting, functions, export,
...). Enabling docValues on a field against which you're only running
_searches_ is unlikely to help.
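
For illustration (field names here are hypothetical), docValues pay off on
requests like the first one below, which sorts and facets across the whole
domain, but not on the second, which is a plain search:

.../select?q=*:*&facet=true&facet.field=category&sort=price%20asc
.../select?q=ptokens:41654165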

If search latency is the main issue for you now, it would help to share
more detail about the queries you're running (e.g., are you only running
searches? are you also faceting? how are you sorting?). Pasting a literal,
complete search URL (and any configured param defaults, if applicable)
would also be useful. For what it's worth, the example search you provided
earlier, ".../select?q=ptokens: 41654165%20AND% ptokens: 65798761", looks
odd in several respects and may not be interpreted the way you expect;
e.g., a field name should be immediately adjacent to its value, with no
intervening whitespace.
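
Assuming you intended a boolean AND of two term queries against ptokens,
something closer to this (whitespace after the colons removed, and the
spaces around AND encoded as %20) is probably what you were after:

.../select?q=ptokens:41654165%20AND%20ptokens:65798761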

I note that you have a small amount of swap space in use; "small amount"
or not, I would _strongly_ recommend disabling swap entirely (`swapoff
-a`). Disabling swap carries some risk in general, but with an index this
large you should be running with enough memory headroom for the OS page
cache that application memory never gets anywhere near actually _needing_
swap. Also, a shot in the dark: is there any chance you're running this
index on a network filesystem?
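
For reference, a typical sequence on Linux (commands assume util-linux and
root access; adjust for your distro):

swapon --show    # see what swap is currently active
swapoff -a       # disable all swap immediately
# then comment out any swap entries in /etc/fstab so it stays off
# across reboots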

On Thu, Jul 22, 2021 at 11:51 AM Jon Morisi <jon.mor...@hsc.utah.edu> wrote:

> I dug some more into a workaround and found the SortableTextField field
> type:
> https://solr.apache.org/guide/7_4/field-types-included-with-solr.html
>
> My max length is 3945.
>
> Any concerns about changing my solr.TextField type to a SortableTextField
> type in order to enable docValues?
> I would then configure the maxCharsForDocValues to 4096.
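>
> Concretely, I'm imagining something like this (untested sketch; the
> fieldType name is made up, and the analyzer is copied from my existing
> definition below):
>
> <fieldType name="PipeTokenSortable" class="solr.SortableTextField"
>            positionIncrementGap="100" multiValued="false"
>            maxCharsForDocValues="4096">
>   <analyzer>
>     <tokenizer class="solr.SimplePatternSplitTokenizerFactory"
>                pattern="|"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>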
>
> Is this a bad idea, or am I on the right track?
> Is there another way to enable docValues for a pipe delimited string of
> tokens?
>
> -----Original Message-----
> From: Jon Morisi <jon.mor...@hsc.utah.edu>
> Sent: Thursday, July 22, 2021 8:45 AM
> To: users@solr.apache.org
> Subject: RE: Solr nodes crashing
>
> I looked into this (https://solr.apache.org/guide/7_4/docvalues.html),
> and it looks like I can't use docValues because my field type is
> solr.TextField.  Specifically:
>
> <fieldType name="PipeToken" class="solr.TextField"
> positionIncrementGap="100" multiValued="false">
>   <analyzer>
>     <tokenizer class="solr.SimplePatternSplitTokenizerFactory"
> pattern="|"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I'm passing in a string of tokens separated by '|'.
>
> Some (made up) example data would be:
> 41654165|This is a phrase|6579813|phrases are all one
> 41654165|token|65798761|There can be multiple phrases or tokens per doc
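>
> If the analyzer works the way I intend, the first line should come out
> as the lowercased tokens: "41654165", "this is a phrase", "6579813",
> "phrases are all one".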
>
>  Is there a workaround?
>
> My search would look something like:
> .../select?q=ptokens: 41654165%20AND% ptokens: 65798761
>
>
> -----Original Message-----
> From: Mike Drob <md...@mdrob.com>
> Sent: Wednesday, July 21, 2021 12:36 PM
> To: users@solr.apache.org
> Subject: Re: Solr nodes crashing
>
> You may want to look into enabling docValues for your fields in your
> schema, if not already enabled. That often helps with memory usage during
> queries, but requires a reindex of your data.
>
> There are also firstSearcher and newSearcher warming queries you can
> configure in your Solr config; those can warm your caches for you, if
> cold caches are the problem.
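>
> A rough sketch of what those look like inside the <query> section of
> solrconfig.xml (the q values here are placeholders; use queries
> representative of your real traffic):
>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst><str name="q">*:*</str></lst>
>   </arr>
> </listener>
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst><str name="q">*:*</str></lst>
>   </arr>
> </listener>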
>
> Mike
>
> On Wed, Jul 21, 2021 at 11:06 AM Jon Morisi <jon.mor...@hsc.utah.edu>
> wrote:
>
> > Thanks for the help Shawn and Walter.  After increasing the open files
> > setting to 128000 and increasing the JVM-Memory to 16 GB, I was able
> > to load my documents.
> >
> > I now have a collection with 2.3 T rows / ~480 GB running on a 4-node
> > cluster.  I have found that complicated queries (searching for two
> > search terms in a field with "AND", for example) often time out.  If I
> > try multiple times, the query does eventually complete.  I'm assuming
> > this is a caching / warm-up issue.
> >
> > Is there a configuration option I can use to cache the indexes for one
> > of the columns or increase the timeout?  Any other advice to get this
> > performing quicker is appreciated.
> >
> > Thanks again,
> > Jon
> >
> > -----Original Message-----
> > From: Shawn Heisey <apa...@elyograg.org>
> > Sent: Thursday, July 1, 2021 6:48 PM
> > To: users@solr.apache.org
> > Subject: Re: Solr nodes crashing
> >
> > On 7/1/2021 4:23 PM, Jon Morisi wrote:
> > > I've had an indexing job running for 24+ hours.  I'm importing 100m+
> > > documents.  After about 8 hours both of the replica nodes crashed but
> > > the primary nodes have continued to run and index.
> >
> > There's a common misconception.  Java programs, including Solr, almost
> > never crash.
> >
> > If you've started a recent Solr version on a platform other than
> > Windows, then Solr is started with a Java option that runs a script
> > whenever an OutOfMemoryError exception is thrown by the program.  What
> > that script does is simple -- it logs a line to a logfile and then
> > kills Solr with the -9
> > (kill) signal.  Note that there are a number of resource depletion
> > scenarios, other than memory, which can result in an OutOfMemoryError.
> > That's why you were asked about open file and process limits.
> >
> > Most operating systems also have what has been named the "oom killer".
> > When system memory becomes extremely tight, the OS will find programs
> > using a lot of memory and kill one of them.
> >
> > These two things will LOOK like a crash, but they're not really crashes.
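> >
> > One way to tell which of the two happened (log paths are typical
> > defaults and may differ on your install):
> >
> > ls /var/solr/logs/solr_oom_killer-*.log   # present if Solr's OOME
> >                                           # script fired
> > dmesg -T | grep -iE "out of memory|killed process"   # OS oom killer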
> >
> > > JVM-Memory 50.7%
> > > 981.38 MB
> > > 981.38 MB
> > > 497
> >
> > This indicates that your max heap setting for Solr is in the ballpark
> > of 1GB.  This is extremely small, and so you're probably throwing
> > OutOfMemoryError because of heap space.  Which, on a non-Windows
> > system, will basically cause Solr to commit suicide.  It does this
> > because when OOME is thrown, program operation becomes completely
> > unpredictable, and index corruption is a very real possibility.
> >
> > There are precisely two ways to deal with OOME.  One is to increase
> > the size of the resource that is being depleted.  The other is to
> > change the program or the program configuration so that it doesn't
> > require as much of that resource.  Often, especially with Solr, the
> > second option is simply not possible.
> >
> > Most likely you're going to need to increase Solr's heap far beyond
> > 1GB.  There's no way for us to come up with a recommendation for you
> > without asking you a lot of very detailed questions about your setup
> > ... and even with that, it's possible that we would give you an
> > incorrect recommendation.  I'll give you a number, and warn you that
> > it could be wrong, either way too small or way too large.  Try an 8GB
> > heap.  You have lots of memory in this system; 8GB is barely a drop in
> > the bucket.
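> >
> > For the record, one way to set that (the file path assumes a standard
> > service install; otherwise pass -m when starting):
> >
> > # in /etc/default/solr.in.sh
> > SOLR_HEAP="8g"
> >
> > # or when starting manually:
> > bin/solr start -m 8g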
> >
> > Thanks,
> > Shawn
> >
>
