I have the same problem. Did you find a good solution? If so, I hope you can share it. Thanks.

On 2012-2-18 at 8:52 AM, "Bryan Loofbourrow" <bloofbour...@knowledgemosaic.com> wrote:
> Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little
> wonder that no one thought the question was interesting, or figured I must
> be using Sneakernet to run my searches.
>
> -- Bryan Loofbourrow
>
> ------------------------------
>
> *From:* Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> *Sent:* Thursday, February 16, 2012 7:07 PM
> *To:* 'solr-user@lucene.apache.org'
> *Subject:* Improving proximity search performance
>
> Here’s my use case. I expect to set up a Solr index that is approximately
> 1.4GB (this is a real number from the proof-of-concept using the real data,
> which consists of about 10 million documents, many of significant size, and
> making use of the FastVectorHighlighter to do highlighting on the body text
> field, which is of course stored, and with termVectors, termPositions, and
> termOffsets on).
>
> I no longer have the proof-of-concept Solr core available (our live site
> uses Solr 1.4 and the ordinary Highlighter), so I can’t get an empirical
> answer to this question: Will storing that extra information about the
> location of terms help the performance of proximity searches?
>
> A significant and important subset of my users make extensive use of
> proximity searches. These sophisticated users have found that they are best
> able to locate what they want by doing searches about THISWORD within 5
> words of THATWORD, or much more sophisticated variants on that theme,
> including plenty of booleans and wildcards. The problem I’m facing is
> performance. Some of these searches, when common words are used, can take
> many minutes, even with the index on an SSD.
>
> The question is, how to improve the performance. It occurred to me as
> possible that all of that term vector information, stored for the benefit
> of the FastVectorHighlighter, might be a significant aid to the performance
> of these searches.
>
> First question: is that already the case? Will storing this extra
> information automatically improve my proximity search performance?
>
> Second question: If not, I’m very willing to dive into the code and come up
> with a patch that would do this. Can someone with knowledge of the
> internals comment on whether this is a plausible strategy for improving
> performance, and, if so, give tips about the outlines of what a successful
> approach to the problem might look like?
>
> Third question: Any tips in general for improving the performance of these
> proximity searches? I have explored the question of whether the customers
> might be weaned off of them, and that does not appear to be an option.
>
> Thanks,
>
> -- Bryan Loofbourrow
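
For anyone following the thread who has not worked with this query type, here is a toy sketch of what "THISWORD within 5 words of THATWORD" has to check. It is only an illustration in plain Python, not Lucene's or Solr's implementation: the function names, the whitespace tokenizer, and the reading of "within N words" as a position difference of at most N are all assumptions made for the example. It does suggest why common words hurt, since every document containing both terms needs its position lists walked.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} using naive whitespace tokens.

    Hypothetical helper for illustration; a real analyzer does much more.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def within(index, term_a, term_b, max_distance):
    """Return sorted doc ids where the two terms occur at most
    max_distance positions apart (assumed meaning of "within N words")."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        pos_a, pos_b = postings_a[doc_id], postings_b[doc_id]
        i = j = 0
        # Two-pointer walk over the two sorted position lists.
        while i < len(pos_a) and j < len(pos_b):
            if abs(pos_a[i] - pos_b[j]) <= max_distance:
                hits.append(doc_id)
                break
            # Advance whichever pointer is behind to try a closer pair.
            if pos_a[i] < pos_b[j]:
                i += 1
            else:
                j += 1
    return sorted(hits)
```

The cost of each query grows with the number of documents that contain both terms and with the length of their position lists, which is consistent with the slowdown described above for common words.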