Re: Trailing wild card searches very slow in Solr

Erick Erickson Mon, 20 Nov 2017 18:37:15 -0800

At first glance you have a mis-configured setup. The most glaring
issue is that you're trying to search a 150G index in 1G of memory.


bq: String field (not tokenized) is docValues=true, indexed=true and stored=true

OK, this is kind of unusual to query but if the field just contains
single tokens it's probably OK.

bq: Field is almost unique in the index, around 80 million are unique

This is a _lot_ of unique values, but as long as your wildcard
searches don't actually match too many of them (say 1,000 or so) it
should be OK.
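
To make the point about match counts concrete, here is a minimal Python
sketch of how a trailing wildcard conceptually expands against a sorted
list of terms. It is an illustration only, not Lucene's actual
implementation; the function name prefix_matches and the sample values
are made up for the example.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return every term in the sorted list that starts with prefix."""
    # Binary search finds the first term >= prefix; all matching terms
    # sit contiguously from there because the list is sorted.
    lo = bisect_left(sorted_terms, prefix)
    hi = lo
    while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
        hi += 1
    return sorted_terms[lo:hi]


# Hypothetical field values, sorted the way a term dictionary would be.
terms = sorted(["abacus", "abc001", "abc002", "abd", "hello", "help", "zebra"])

matched = prefix_matches(terms, "abc")

# Conceptually, myfield:abc* behaves like an OR over every matched term,
# so the cost grows with the number of terms the prefix expands to.
expanded = " OR ".join(f"myfield:{t}" for t in matched)
print(len(matched), expanded)
```

Finding the start of the range is cheap; the work is in visiting every
term the prefix expands to, which is why a short prefix over 80 million
nearly unique values can be expensive.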

bq: no commits on index

Huh? Then you can't search. I suspect you have autocommit settings in
your solrconfig.xml file?

bq: solr jvm heap 1GB

This is far too small. It's a miracle it works at all.

bq: index size on disk is around 150GB

I pretty much guarantee your heap is undersized for an index that size.

bq: q=myfield:abc* has QTime=17-20secs after filecache on OS is primed

How many terms does abc* match? That's the biggest question in terms
of performance.
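
One way to find out is Solr's TermsComponent, which can list the indexed
terms that start with a given prefix. The sketch below only builds the
request URL. It assumes a /terms handler is available on the core; the
host, port and core name ("mycore") are placeholders, and the helper
name terms_prefix_url is made up for the example.

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url, core, field, prefix, limit=-1):
    """Build a TermsComponent URL listing terms in field that start with prefix."""
    params = {
        "terms.fl": field,       # field whose terms are listed
        "terms.prefix": prefix,  # only terms starting with this prefix
        "terms.limit": limit,    # -1 asks for all matching terms
        "wt": "json",
    }
    return f"{base_url}/{core}/terms?{urlencode(params)}"


# Placeholder host and core name; substitute your own.
url = terms_prefix_url("http://localhost:8983/solr", "mycore", "myfield", "abc")
print(url)
```

The number of terms that comes back is the number of terms abc* expands
to. With terms.limit=-1 the response can be large if the prefix is
short, so it may be safer to start with a modest positive limit.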

But really, I expect even if you created an OR clause with, say, 50
terms in it it would perform poorly. My guess is that you don't have
nearly enough memory for your Solr instance.

You didn't include the results of adding &debug=query; perhaps you
can't due to corporate policy. But you _can_ scrub the
parsedquery_toString bits of the response.
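
As a rough sketch of what that could look like in Python: the first
helper builds a select URL with debug=query, and the second pulls the
parsed-query entries out of a decoded JSON response and masks a
sensitive string. The host, core name, helper names and the sample
response are all placeholders written by hand for illustration, not
real Solr output.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, core, query):
    """Build a select URL that returns query debug info and no documents."""
    params = {"q": query, "rows": 0, "debug": "query", "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


def scrub_parsed_query(response, secret, placeholder="XXX"):
    """Return the parsed-query debug entries with secret masked out."""
    debug = response.get("debug", {})
    return {
        key: str(debug[key]).replace(secret, placeholder)
        for key in ("parsedquery", "parsedquery_toString")
        if key in debug
    }


# Placeholder host and core name; substitute your own.
url = debug_query_url("http://localhost:8983/solr", "mycore", "myfield:abc*")

# Hand-written stand-in for a decoded JSON response, NOT real Solr output.
sample = {
    "debug": {
        "parsedquery": "myfield:abc*",
        "parsedquery_toString": "myfield:abc*",
    }
}
scrubbed = scrub_parsed_query(sample, "abc")
print(url)
print(scrubbed)
```

What matters in the scrubbed output is the shape of the parsed query
(a prefix query versus something else), not the actual term text.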

But really, don't do much until you give your Solr instance enough
memory to work with.

Best,
Erick


On Mon, Nov 20, 2017 at 5:26 PM, Sundeep T <sundeep....@gmail.com> wrote:
> Hi Erick,
>
> Thanks for the reply. Here are more details on our setup -
>
> *Setup/schema details -*
>
> 100 million doc solr core
>
> String field (not tokenized) is docValues=true, indexed=true and stored=true
>
> Field is almost unique in the index, around 80 million are unique
>
> no commits on index
>
> all caches disabled in solrconfig.xml
>
> solr jvm heap 1GB
>
> single solr core in jvm
>
> solr core is not optimized and has about 50 segment files, some up to 5GB
>
> index size on disk is around 150GB
>
> solr v6.5.0
>
>
>
> *Performance -*
>
>
> q=myfield:abc* has QTime=30secs+ first time
>
> q=myfield:abc* has QTime=17-20secs after filecache on OS is primed
>
>
> Thanks
> Sundeep
>
>
> On Mon, Nov 20, 2017 at 12:16 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Well, define "slow". Conceptually a large OR clause is created that
>> contains all the terms that start with the indicated text. (actually a
>> PrefixQuery should be formed).
>>
>> That said, I'd expect hello* to be reasonably fast as not many terms
>> _probably_ start with 'hello'. Not the same at all for, say, h*.
>>
>> You might review: https://wiki.apache.org/solr/UsingMailingLists,
>> you're not really providing much information to go on here.
>>
>> What is the result of adding &debug=query? Particularly it would be
>> useful to see the parsed query.
>>
>> Are all such queries slow? What happens if you submit hel* followed by
>> hello*? The first one will bring the underlying index structures into
>> memory; for all we know this could simply be an autowarming issue.
>>
>> Are you indexing at the same time? Do you have a short autocommit interval?
>>
>> What version of Solr?
>>
>> Details matter.
>> Best,
>> Erick
>>
>> On Mon, Nov 20, 2017 at 11:50 AM, Sundeep T <sundeep....@gmail.com> wrote:
>> > Hi Erick.
>> >
>> > I initially asked this question regarding leading wildcards. This was a
>> > typo, and what I meant was trailing wild card queries were slow. So
>> > queries like text:'hello*' are slow. We were expecting since the string
>> > field is already indexed, the searches should be fast, but that seems
>> > to be not the case.
>> >
>> > Thanks
>> > Sundeep
>> >
>> > On Mon, Nov 20, 2017 at 9:39 AM, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> >> You already asked that question and got several answers, did you not
>> >> see them? If you did see them, what is unclear?
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Nov 20, 2017 at 9:33 AM, Sundeep T <sundeep....@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > We have several indexed string fields which are not tokenized and do
>> >> > not have docValues enabled.
>> >> >
>> >> > When we do trailing wildcard searches on these fields they are running
>> >> > very slow. We were thinking that since these fields are indexed, such
>> >> > queries should be running pretty quickly. We are using Solr 6.6.1.
>> >> > Does anyone have ideas on why these queries are running slow and if
>> >> > there are any ways to speed them up?
>> >> >
>> >> > Thanks
>> >> > Sundeep
>> >>
>>
