Re: Partial Counts in SOLR

Erick Erickson Wed, 19 Mar 2014 06:59:51 -0700

Yes, that'll be slow. Wildcards are, at best, interesting and, at worst,
resource-consumptive, especially when you're also using positional
(proximity) information like this.


Consider looking at the problem sideways. That is, what is your
purpose in searching for, say, "buy*"? Do you want to find buy, buying,
buyers, etc.? Would you get better results if you just stemmed and
omitted the wildcards?
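Here is a toy Python sketch of the difference. It is not Solr code: the term list and the `toy_stem` suffix stripper are hypothetical stand-ins for a real term dictionary and a real stemmer such as Porter. A prefix wildcard like buy* has to be expanded against the sorted term dictionary into every matching term, and each of those terms has its own posting list to read and merge. A stemmed field collapses the variants into one term at index time.

```python
import bisect

# Hypothetical, sorted term dictionary for one field.
VOCAB = sorted([
    "bought", "buy", "buyback", "buyer", "buyers", "buying", "buys",
    "sale", "sell", "seller", "selling", "sold",
])

def expand_prefix(prefix, vocab=VOCAB):
    """Return every term starting with `prefix`, as a wildcard query must."""
    start = bisect.bisect_left(vocab, prefix)
    terms = []
    for term in vocab[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        terms.append(term)
    return terms

def toy_stem(token):
    """Crude, hypothetical suffix stripper used only for illustration."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Each expanded term needs its own posting-list lookup.
print(expand_prefix("buy"))  # ['buy', 'buyback', 'buyer', 'buyers', 'buying', 'buys']
# With stemming, all variants are one indexed term.
print(sorted({toy_stem(t) for t in ["buy", "buying", "buyers", "buys"]}))  # ['buy']
```

The wildcard also pulls in buyback, which a real stemmer would probably keep separate. The expansion costs precision as well as time.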

Do you have a restricted vocabulary that would allow you to define
synonyms for the "important" words and all their variants at index
time and use that?
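Here is a minimal sketch of the index-time idea. It assumes a small hand-built table of variants; the groups below are hypothetical and loosely based on the sample query in this thread. In Solr this is normally configured declaratively with a synonym filter (e.g. `solr.SynonymFilterFactory`) in the field's analyzer. The Python below only models the effect: if the same mapping is applied at index time and at query time, each concept becomes a single term lookup, with no wildcard expansion.

```python
# Hypothetical restricted vocabulary: canonical term -> variants that should match it.
SYNONYM_GROUPS = {
    "buy": ["buy", "buys", "buying", "bought", "purchase", "purchased",
            "purchases", "purchasing"],
    "sell": ["sell", "sells", "selling", "sold", "sale", "sales"],
    "share": ["share", "shares", "stock", "stocks"],
}

# Invert once: variant -> canonical term.
CANONICAL = {v: canon for canon, variants in SYNONYM_GROUPS.items() for v in variants}

def analyze(text):
    """Lowercase, split on whitespace, and map known variants to one canonical token."""
    return [CANONICAL.get(tok, tok) for tok in text.lower().split()]

print(analyze("Director bought shares"))  # ['director', 'buy', 'share']
print(analyze("purchase stock"))          # ['buy', 'share']
```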

Finally, of course, you could shard your index (or add more shards if
you're already sharding) if you really _must_ support these kinds of
queries and can't work around the problem.
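Here is a rough sketch of why sharding helps with date-sorted results. It is plain Python with made-up documents, not Solr's actual distributed-search code. Each shard sorts only its own matches and returns its top N, and a coordinator merges those short, already-sorted lists. The shards run one after another here for simplicity. In a real cluster they work in parallel, and that parallelism is where the speedup comes from.

```python
import heapq
from itertools import islice

def shard_top_n(shard_docs, n):
    """Each shard sorts only its own matches (newest first) and returns at most n."""
    return sorted(shard_docs, key=lambda d: d[0], reverse=True)[:n]

def merged_top_n(shards, n):
    """Coordinator: merge the already-sorted per-shard lists and keep the global top n."""
    per_shard = [shard_top_n(docs, n) for docs in shards]
    merged = heapq.merge(*per_shard, key=lambda d: d[0], reverse=True)
    return list(islice(merged, n))

# Made-up (ISO date, doc id) matches on three hypothetical shards.
shards = [
    [("2014-03-01", "a1"), ("2013-12-24", "a2"), ("2014-01-15", "a3")],
    [("2014-02-10", "b1"), ("2012-07-04", "b2")],
    [("2014-03-15", "c1"), ("2013-06-30", "c2")],
]
print([doc_id for _, doc_id in merged_top_n(shards, 3)])  # ['c1', 'a1', 'b1']
```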

Best,
Erick

On Tue, Mar 18, 2014 at 11:21 PM, Salman Akram
<salman.ak...@northbaysolutions.net> wrote:
> Anyone?
>
>
> On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
>> Below is a sample of the slow queries; this one takes minutes!
>>
>> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
>> purchase* or repurchase*)) w/10 (executive or director)
>>
>> When a filter is used it goes in fq, but what can be done about plain
>> keyword searches?
>>
>>
>> On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson
>> <erickerick...@gmail.com> wrote:
>>
>>> What are your complex queries? You
>>> say that your app will very rarely see the
>>> same query thus you aren't using caches...
>>> But, if you can move some of your
>>> clauses to fq clauses, then the filterCache
>>> might well be used to good effect.
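Here is a minimal sketch of splitting a request into q and fq, built as a plain HTTP query string with Python's standard library. q, fq, sort and rows are standard Solr request parameters. The host, the core name (collection1) and the date and doc_type fields are hypothetical. Each fq is cached separately in the filterCache, so this only pays off for clauses that recur across requests, even when the full keyword query never does.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical host and core; only q, fq, sort and rows are standard Solr params.
BASE_URL = "http://localhost:8983/solr/collection1/select"

def build_select_url(keywords, filters, sort="date desc", rows=20):
    """Keep the free-text part in q and move reusable clauses into separate fq params."""
    params = [("q", keywords)]
    # One fq per reusable clause, so each can be cached independently.
    params += [("fq", f) for f in filters]
    params += [("sort", sort), ("rows", str(rows))]
    return BASE_URL + "?" + urlencode(params)

url = build_select_url(
    "executive director",
    ["date:[NOW/YEAR-1YEAR TO NOW]", "doc_type:filing"],  # hypothetical fields
)
print(url)
# Decode again to show what Solr would receive.
params = parse_qs(urlsplit(url).query)
print(params["fq"])
```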
>>>
>>>
>>>
>>> On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
>>> <salman.ak...@northbaysolutions.net> wrote:
>>> > 1- SOLR 4.6
>>> > 2- We do, but right now I am talking about plain keyword queries just
>>> > sorted by date. Once this is better, we will start looking into caches,
>>> > which we have already changed a little.
>>> > 3- As I said, the contents are not stored in this index. Some other
>>> > metadata fields are, but normal queries are super fast, so I guess even
>>> > if I change that it will make only a minor difference. We have SSDs,
>>> > and quite fast ones too.
>>> > 4- That's something we need to do, but even under low workload those
>>> > queries take a lot of time.
>>> > 5- Every 10 mins, and currently no autowarming, as user queries are
>>> > rarely the same; also, once it's fully warmed those queries are still
>>> > slow.
>>> > 6- Nope.
>>> >
>>> > On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan <solrexp...@gmail.com>
>>> > wrote:
>>> >
>>> >> 1. What is your Solr version? In the 4.x family, proximity searches have
>>> >> been optimized, among other query types.
>>> >> 2. Do you use filter queries? What are your cache utilization ratios?
>>> >> Optimize (i.e. bump up the respective cache sizes) if you have low hit
>>> >> ratios and many evictions.
>>> >> 3. Can you avoid storing some fields and only index them? When a field is
>>> >> stored and retrieved in the results, there are a couple of disk seeks per
>>> >> field, which slows search down. Consider SSD disks.
>>> >> 4. Do you monitor your system in terms of RAM / cache stats / GC? Do you
>>> >> observe stop-the-world (STW) GC pauses?
>>> >> 5. How often do you commit, and do you have autowarming / external
>>> >> warming configured?
>>> >> 6. If you use faceting, consider storing DocValues for facet fields.
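Here is a small sketch of the hit-ratio check in point 2. Solr's admin statistics report per-cache counters such as lookups, hits, hitratio and evictions. The dataclass below just models those numbers, and the thresholds and sample values are hypothetical. A bigger cache only helps when evictions are what drive the misses. If queries rarely repeat, the hit ratio stays low whatever the size.

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    """Counters of the kind Solr reports per cache (lookups, hits, evictions)."""
    name: str
    lookups: int
    hits: int
    evictions: int

    @property
    def hit_ratio(self) -> float:
        # Guard against a cache that has never been consulted.
        return self.hits / self.lookups if self.lookups else 0.0

def needs_bigger_cache(stats, min_ratio=0.5, max_evictions=1000):
    """Hypothetical rule of thumb: low hit ratio together with many evictions."""
    return stats.hit_ratio < min_ratio and stats.evictions > max_evictions

# Made-up sample numbers.
filter_cache = CacheStats("filterCache", lookups=20000, hits=4000, evictions=15000)
print(f"{filter_cache.name}: hit ratio {filter_cache.hit_ratio:.2f}")  # filterCache: hit ratio 0.20
print(needs_bigger_cache(filter_cache))  # True
```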
>>> >>
>>> >> Some Solr wiki docs:
>>> >>
>>> >> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
>>> >>
>>> >> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
>>> >> salman.ak...@northbaysolutions.net> wrote:
>>> >>
>>> >> > Well, some of the searches take minutes.
>>> >> >
>>> >> > Below are some stats about the particular index I am talking about:
>>> >> >
>>> >> > Index size = 400GB (using CommonGrams; without it the index is around
>>> >> > 180GB)
>>> >> > Position file = 280GB
>>> >> > Total docs = 170 million (indexed for searching only; for highlighting,
>>> >> > the contents are stored in another index)
>>> >> > Avg doc size = a few hundred KB
>>> >> > RAM = 384GB (it hosts other indexes too, but the OS cache can still hold
>>> >> > 60-80% of the total index)
>>> >> >
>>> >> > Phrase queries run pretty fast with CG, but complex wildcard and
>>> >> > proximity queries can be really slow. I know using CG will make them
>>> >> > slower, but they just take too long. By default results are sorted by
>>> >> > date, but users have a few other fields they can sort on too.
>>> >> >
>>> >> > I wanted to avoid creating multiple indexes (maybe based on years), but
>>> >> > it seems that's the only feasible way to search on partial data.
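Here is a back-of-the-envelope check of these numbers. It is simple arithmetic and treats the cached share as a plain fraction of the whole index, which is a simplification. At 60% coverage roughly 240GB of the 400GB index is cached. That is less than the 280GB positions file alone, so proximity and wildcard queries that walk positions are likely to hit the disk. At 80% the positions file could fit in principle, although other index files compete for the same cache.

```python
# Figures quoted in the message above; the 60-80% share is the poster's estimate.
INDEX_GB = 400
POSITIONS_GB = 280

def cache_coverage(index_gb, cached_fraction):
    """Return (cached GB, uncached GB) for a given OS page-cache share of the index."""
    cached = index_gb * cached_fraction
    return cached, index_gb - cached

for frac in (0.6, 0.8):
    cached, uncached = cache_coverage(INDEX_GB, frac)
    fits = "fits" if cached >= POSITIONS_GB else "does not fit"
    print(f"{frac:.0%}: {cached:.0f}GB cached, {uncached:.0f}GB not; positions file {fits}")
```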
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan <solrexp...@gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> > > As Hoss pointed out above, different projects have different
>>> >> > > requirements. Some want to sort by ingestion date in reverse, which
>>> >> > > means that organizing posting lists in reverse order with early
>>> >> > > termination is the way to go (there is no such feature in Solr
>>> >> > > directly). Other projects want to collect all docs matching a query
>>> >> > > and then sort by rank, but you cannot guarantee that the most
>>> >> > > recently inserted document is the most relevant in terms of your
>>> >> > > ranking.
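Here is a toy sketch of the early-termination idea, under a loudly stated assumption: document IDs are assigned so that ID order matches reverse ingestion date (ID 0 is the newest). This is hypothetical and is not how Solr/Lucene assigns IDs out of the box. In that order, intersecting two posting lists can stop as soon as a page of results is collected. The price is that the total hit count is no longer known, which is exactly the "partial counts" trade-off.

```python
def newest_matches(postings_a, postings_b, n):
    """Intersect two sorted posting lists whose doc IDs are in reverse-date order
    (hypothetical: ID 0 = newest) and stop as soon as n matches are found."""
    i = j = 0
    hits = []
    while i < len(postings_a) and j < len(postings_b) and len(hits) < n:
        a, b = postings_a[i], postings_b[j]
        if a == b:
            hits.append(a)
            i += 1
            j += 1
        elif a < b:
            i += 1
        else:
            j += 1
    return hits  # the remaining postings are never read

# Made-up posting lists for two terms.
print(newest_matches([1, 3, 4, 7, 9, 12, 15], [3, 4, 5, 9, 12, 20], 2))  # [3, 4]
```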
>>> >> > >
>>> >> > >
>>> >> > > Do your current searches take too long?
>>> >> > >
>>> >> > >
>>> >> > > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
>>> >> > > salman.ak...@northbaysolutions.net> wrote:
>>> >> > >
>>> >> > > > It's a long video and I will definitely go through it, but it seems
>>> >> > > > this is not possible with SOLR as it is?
>>> >> > > >
>>> >> > > > I just thought it would be quite a common issue; I mean, for search
>>> >> > > > engines it's generally more important to show the first page of
>>> >> > > > results than to use timeAllowed, which might not even return a
>>> >> > > > single result.
>>> >> > > >
>>> >> > > > Thanks!
>>> >> > > >
>>> >> > > >
>>> >> > > > --
>>> >> > > > Regards,
>>> >> > > >
>>> >> > > > Salman Akram
>>> >> > > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > --
>>> >> > > Dmitry
>>> >> > > Blog: http://dmitrykan.blogspot.com
>>> >> > > Twitter: http://twitter.com/dmitrykan
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Regards,
>>> >> >
>>> >> > Salman Akram
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Dmitry
>>> >> Blog: http://dmitrykan.blogspot.com
>>> >> Twitter: http://twitter.com/dmitrykan
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Regards,
>>> >
>>> > Salman Akram
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Salman Akram
>>
>>
>
>
> --
> Regards,
>
> Salman Akram