RE: Improving proximity search performance

蒋明原 Sat, 15 Sep 2012 03:27:35 -0700

I have the same problem. Did you find a good solution? I hope you can share
it. Thanks.
On 2012-2-18 at 8:52 AM, "Bryan Loofbourrow" <bloofbour...@knowledgemosaic.com> wrote:


> Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little
> wonder that no one thought the question was interesting, or figured I must
> be using Sneakernet to run my searches.
>
> -- Bryan Loofbourrow
>
>   ------------------------------
>
> *From:* Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> *Sent:* Thursday, February 16, 2012 7:07 PM
> *To:* 'solr-user@lucene.apache.org'
> *Subject:* Improving proximity search performance
>
> Here’s my use case. I expect to set up a Solr index that is approximately
> 1.4GB (this is a real number from the proof-of-concept using the real data,
> which consists of about 10 million documents, many of significant size; the
> proof-of-concept used the FastVectorHighlighter to highlight the body text
> field, which is of course stored, with termVectors, termPositions, and
> termOffsets on).
>
> I no longer have the proof-of-concept Solr core available (our live site
> uses Solr 1.4 and the ordinary Highlighter), so I can’t get an empirical
> answer to this question: Will storing that extra information about the
> location of terms help the performance of proximity searches?
>
> A significant and important subset of my users make extensive use of
> proximity searches. These sophisticated users have found that they are best
> able to locate what they want by searching for THISWORD within 5
> words of THATWORD, or much more sophisticated variants on that theme,
> including plenty of booleans and wildcards. The problem I’m facing is
> performance. Some of these searches, when common words are used, can take
> many minutes, even with the index on an SSD.
>
> The question is how to improve the performance. It occurred to me that
> all of that term vector information, stored for the benefit of the
> FastVectorHighlighter, might significantly aid the performance of these
> searches.
>
> First question: is that already the case? Will storing this extra
> information automatically improve my proximity search performance?
>
> Second question: If not, I’m very willing to dive into the code and come up
> with a patch that would do this. Can someone with knowledge of the
> internals comment on whether this is a plausible strategy for improving
> performance, and, if so, give tips about the outlines of what a successful
> approach to the problem might look like?
>
> Third question: Any tips in general for improving the performance of these
> proximity searches? I have explored the question of whether the customers
> might be weaned off of them, and that does not appear to be an option.
>
> Thanks,
>
> -- Bryan Loofbourrow
>
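
A toy sketch for readers of this thread (Python, not Solr or Lucene code; the tokenizer, data structures, and function names are all hypothetical) contrasting the two structures the question is about. A positional inverted index maps term to document to positions, so a "THISWORD within k words of THATWORD" query only examines documents that contain the rarer term, but the work still grows with how many documents (and positions) the terms occur in, which is one reason common words are slow. A term vector is the per-document reverse (document to term to positions), so answering the same query from term vectors alone means visiting every document. "Within k words" here is assumed to mean |p_a - p_b| <= k in either order; Lucene's sloppy phrase syntax (`"THISWORD THATWORD"~5`) counts slop somewhat differently.

```python
def tokenize(text):
    # Naive whitespace tokenizer (hypothetical); a real analyzer does much more.
    return text.lower().split()


def build_inverted_index(docs):
    """term -> {doc_id -> sorted positions}: a toy positional inverted index."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def build_term_vectors(docs):
    """doc_id -> {term -> sorted positions}: a toy per-document term vector."""
    vectors = {}
    for doc_id, text in docs.items():
        tv = vectors.setdefault(doc_id, {})
        for pos, term in enumerate(tokenize(text)):
            tv.setdefault(term, []).append(pos)
    return vectors


def within_k(pos_a, pos_b, k):
    """True if some a in pos_a and b in pos_b satisfy |a - b| <= k.

    Both lists must be sorted; the two-pointer merge is O(len(pos_a) + len(pos_b)).
    """
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if abs(pos_a[i] - pos_b[j]) <= k:
            return True
        # The smaller position is too far from everything left in the other list.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return False


def near_via_index(index, a, b, k):
    """Return (matching doc ids, documents examined) using the inverted index."""
    postings_a, postings_b = index.get(a, {}), index.get(b, {})
    # Drive the loop from the rarer term and probe the other one.
    if len(postings_a) > len(postings_b):
        postings_a, postings_b = postings_b, postings_a
    hits, examined = [], 0
    for doc_id, pos_a in postings_a.items():
        examined += 1
        pos_b = postings_b.get(doc_id)
        if pos_b is not None and within_k(pos_a, pos_b, k):
            hits.append(doc_id)
    return sorted(hits), examined


def near_via_term_vectors(vectors, a, b, k):
    """Same query from term vectors alone: no term -> doc lookup, so scan every doc."""
    hits, examined = [], 0
    for doc_id, tv in vectors.items():
        examined += 1
        if a in tv and b in tv and within_k(tv[a], tv[b], k):
            hits.append(doc_id)
    return sorted(hits), examined


docs = {
    1: "the cat sat on the mat",
    2: "a cat and a very large friendly dog",
    3: "dog bites cat",
    4: "nothing relevant here at all",
}
print(near_via_index(build_inverted_index(docs), "cat", "dog", 2))       # ([3], 2)
print(near_via_term_vectors(build_term_vectors(docs), "cat", "dog", 2))  # ([3], 4)
```

The second value in each result is the number of documents examined: the inverted-index path touches only the two documents containing "dog", while the term-vector path scans all four.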
