From other mails, it looks like you’re inheriting something you had no input in building. My sympathies ;)
Unless you’ve explicitly changed the heap size by specifying -Xmx and -Xms at startup, you’re operating with 512M of memory, which is far too small for most Solr installations. The -m parameter at startup will modify this. The admin UI will also show you how much memory Solr is running with.

Best,
Erick

> On Mar 26, 2020, at 8:52 AM, matthew sporleder <msporle...@gmail.com> wrote:
>
> That explains the OOMs I've been getting in the initial test cycle.
> I'm working with about 50M (small) documents.
>
> On Thu, Mar 26, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>>
>> The ngramming is a time/space tradeoff. Typically,
>> if you restrict the wildcards to have three or more
>> “real” characters, performance is fine. One real
>> character (i.e. a*) will be your worst case. I’ve
>> seen requiring two characters in the prefix work well
>> too. It Depends (tm).
>>
>> Conceptually what happens here is that Lucene has
>> to enumerate all of the terms that start with the prefix
>> and create a ginormous OR clause. The term
>> enumeration will take longer the more terms there are.
>> Things are more efficient than that, but still...
>>
>> So make sure you’re testing with a real corpus. Having
>> a test index with just a few terms will be misleading.
>>
>> Best,
>> Erick
>>
>>> On Mar 25, 2020, at 9:37 PM, matthew sporleder <msporle...@gmail.com> wrote:
>>>
>>> Okay confirmed-
>>> I am getting a more predictable result set after adding an additional
>>> field:
>>> <fieldType name="string_alpha" class="solr.TextField"
>>>     sortMissingLast="true" omitNorms="true">
>>>   <analyzer>
>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>         pattern="\p{Punct}" replacement=""/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> q=slug:what_is_lo*&fl=slug&rows=1000&wt=csv&sort=slug_alpha%20asc
>>>
>>> So it appears I can skip edge ngram entirely using this method, as
>>> slug:foo* appears to return exactly the same results as fayt:foo, but I
>>> have the cost of the alphaOnly field :)
>>>
>>> I will try to figure out some benchmarks or something to decide how to go.
>>>
>>> Thanks again for the help so far.
>>>
>>>
>>> On Wed, Mar 25, 2020 at 2:39 PM Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>>
>>>> You’re getting the correct sorted order… The underscore character is
>>>> confusing you.
>>>>
>>>> The ASCII code for underscore is 0x5F (95), which sorts after every
>>>> uppercase letter and before every lowercase letter.
>>>>
>>>> See the alphaOnlySort type for a way to remove this, although the output
>>>> there can also be confusing.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>> On Mar 25, 2020, at 1:30 PM, matthew sporleder <msporle...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> What_is_Lov_Holtz_known_for
>>>>> What_is_lova_after_it_harddens
>>>>> What_is_Lova_Moor's_birthday
>>>>> What_is_lovable_in_Spanish
>>>>> What_is_lovage
>>>>> What_is_Lovagny's_population
>>>>> What_is_lovan_for
>>>>> What_is_lovanox
>>>>> What_is_lovarstan_for
>>>>> What_is_Lovasatin
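
For anyone following the sorting part of this thread, here is a rough Python sketch of why the underscore changes the order and what a string_alpha-style sort key does to the slugs quoted above. It is only an illustration, not Solr code. The assumptions are mine: plain code-point comparison stands in for how Solr orders a string field, `str.lower` stands in for LowerCaseFilterFactory, and Python's `string.punctuation` is assumed to cover the same ASCII characters as Java's `\p{Punct}`. The helper name `alpha_key` is made up for the example.

```python
import string

# Slugs copied from the listing quoted earlier in the thread.
slugs = [
    "What_is_Lov_Holtz_known_for",
    "What_is_lova_after_it_harddens",
    "What_is_Lova_Moor's_birthday",
    "What_is_lovable_in_Spanish",
    "What_is_lovage",
    "What_is_Lovagny's_population",
    "What_is_lovan_for",
    "What_is_lovanox",
    "What_is_lovarstan_for",
    "What_is_Lovasatin",
]

# Translation table that deletes ASCII punctuation (assumed to be
# equivalent to pattern="\p{Punct}" replacement="" in the fieldType).
_PUNCT = str.maketrans("", "", string.punctuation)


def alpha_key(value):
    """Hypothetical stand-in for the string_alpha analysis chain:
    keep the whole value as one token, lowercase it, strip punctuation."""
    return value.lower().translate(_PUNCT)


# Underscore sits between the uppercase and lowercase ASCII letters.
print(ord("Z"), ord("_"), ord("a"))

raw_order = sorted(slugs)                    # case-sensitive code-point order
lower_order = sorted(slugs, key=str.lower)   # lowercased, underscore kept
alpha_order = sorted(slugs, key=alpha_key)   # lowercased, punctuation removed

for name, order in (("raw", raw_order),
                    ("lower", lower_order),
                    ("alpha", alpha_order)):
    print(f"--- {name} ---")
    print("\n".join(order))
```

With these ten slugs, the raw sort puts the four capital-L slugs first, the lowercased sort reproduces the listing exactly as quoted, and the alpha key moves What_is_Lov_Holtz_known_for to the end because "lovh" now compares after every "lova" prefix. That last effect is probably what "the output there can also be confusing" refers to.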