Perf. difference when the solr core is 'current' or not 'current'
In Solr's admin statistics page, there is a 'current' flag indicating whether the core's index reader is 'current' or not. According to some discussions on this mailing list a few months back, it shouldn't affect anything. But my observation is completely different. When the current flag was not checked for some of the cores (I have defined 15 cores in total), my median search latency over 48M records was over 190 ms, but when every current flag was checked, the median dropped to only 87 ms. Another observation: restarting the Solr instance does not necessarily make the 'current' flags checked; I have to reload the cores even after starting Solr. Could anybody explain the difference? I am using Datastax Enterprise 3.0.2. Thanks,
-- View this message in context: http://lucene.472066.n3.nabble.com/Perf-difference-when-the-solr-core-is-current-or-not-current-tp4074438.html Sent from the Solr - User mailing list archive at Nabble.com.
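For anyone who wants to check the flag and reload cores without clicking through the admin page, here is a minimal sketch that only builds the request URLs. It assumes the stock Solr 4.x endpoints (the Luke request handler at `/admin/luke` and the CoreAdmin `RELOAD` action); the base URL and core names are placeholders, and Datastax Enterprise may expose these differently.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption); adjust host and port for your install.
SOLR_BASE = "http://localhost:8983/solr"


def luke_index_url(core, base=SOLR_BASE):
    """URL for the Luke handler's index summary of one core.

    On stock Solr 4.x the JSON response is expected to carry the
    'current' flag under the 'index' section (assumption for DSE).
    """
    params = {"show": "index", "numTerms": 0, "wt": "json"}
    return f"{base}/{core}/admin/luke?" + urlencode(params)


def reload_url(core, base=SOLR_BASE):
    """URL for the CoreAdmin RELOAD action on one core."""
    params = {"action": "RELOAD", "core": core}
    return f"{base}/admin/cores?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core names, for illustration only.
    for core in ["core01", "core02"]:
        print(luke_index_url(core))
        print(reload_url(core))
```

Fetching the first URL for each core before and after a reload would show whether the flag actually changes, which makes the latency comparison easier to repeat.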
Re: does solr support query time only stopwords?
Thanks to you all; it seems I have finally figured out a workaround. Yes, I used edismax, but my test query was very simple: it queries only one field and uses only one stopword, so I see no chance it would hit another field (though Datastax might have done something we don't know about). Debug didn't yield useful information either. What I did was keep the StopFilterFactory element in the index analyzer, but without specifying our stopword file, and then reindex all Solr cores. This time I could get stopword frequency info from Luke, while querying stopwords returned 0 matches. My wild guess is that the StopFilterFactory in the index analyzer serves as an overall on switch for the stopwords feature.
Erick Erickson wrote (in reply to jchen2000's message of Sat, Jun 8, 2013 at 6:33 PM): My _guess_ is that you're perhaps using edismax or similar and getting matches from fields you don't expect, on terms that are not stopwords. Try adding debug=query and seeing what the parsed query actually is. And, of course, I have no idea what Datastax is doing. And you have to at least reload the core to pick up the new stopwords. Best, Erick
-- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069464.html
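The two checks discussed in this post (term frequencies from Luke, and the parsed query via debug=query) can be sketched as URL builders. This assumes the stock Solr 4.x Luke parameters `fl` and `numTerms` and the standard `/select` handler; the base URL, core name, and field name are placeholders, not anything confirmed for the Datastax build.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption).
SOLR_BASE = "http://localhost:8983/solr"


def top_terms_url(core, field, num_terms=50, base=SOLR_BASE):
    """Luke request asking for the most frequent terms of one field."""
    params = {"fl": field, "numTerms": num_terms, "wt": "json"}
    return f"{base}/{core}/admin/luke?" + urlencode(params)


def debug_query_url(core, field, term, base=SOLR_BASE):
    """Single-term edismax query with debug=query and no result rows,
    so the response mainly shows how the query was parsed."""
    params = {
        "q": f"{field}:{term}",
        "defType": "edismax",
        "debug": "query",
        "rows": 0,
        "wt": "json",
    }
    return f"{base}/{core}/select?" + urlencode(params)


if __name__ == "__main__":
    # 'mycore' and 'body' are hypothetical names.
    print(top_terms_url("mycore", "body", 20))
    print(debug_query_url("mycore", "body", "the"))
```

If the parsed query for a stopword comes back empty, the query-side filter is being applied; if it still contains the term, the stopword file is probably not being picked up.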
Re: does solr support query time only stopwords?
Nope. I only searched with individual stop words. Very strange to me.
Otis Gospodnetic-5 wrote (in reply to jchen2000's message of Jun 8, 2013, 6:34 PM): Maybe the returned hits match other query terms. Otis -- Solr ElasticSearch Support http://sematext.com/
-- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069143.html
does solr support query time only stopwords?
I want to analyze high-frequency terms using Solr's Luke request handler and keep updating the stopwords file for new queries from time to time. Obviously I have to index all terms, whether or not they belong to the stopwords list. So I configured a stopwords list for the query analyzer but disabled the one for the index analyzer. However, after this change, a query seems to return all records containing stopwords. Does anybody have an idea why this would happen? P.S. I am using Datastax Enterprise 3.0.2, and the Solr version is 4.0.
-- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
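To make the intended behaviour concrete, here is a toy model in plain Python (not Solr code): the index side keeps every token, so term counts still include stopwords, while the query side drops them. The stopword list and documents are made up, and how real Solr handles a query that analyzes to nothing depends on the query parser, so this only illustrates the expectation, not the actual result.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "and"}


def index_analyze(text):
    """Index-time analysis: keep every token so frequencies stay visible."""
    return text.lower().split()


def query_analyze(text, stopwords=STOPWORDS):
    """Query-time analysis: drop stopwords before matching."""
    return [t for t in text.lower().split() if t not in stopwords]


def term_counts(docs):
    """Document frequency per term, similar in spirit to Luke's top terms."""
    counts = Counter()
    for text in docs.values():
        counts.update(set(index_analyze(text)))
    return counts


def search(docs, query):
    """Return ids of docs containing any surviving query term."""
    terms = query_analyze(query)
    if not terms:
        return []  # toy choice: a stopword-only query matches nothing
    return [doc_id for doc_id, text in docs.items()
            if any(t in index_analyze(text) for t in terms)]


if __name__ == "__main__":
    docs = {1: "the quick fox", 2: "history of the index"}
    print(term_counts(docs)["the"])  # stopword is still counted
    print(search(docs, "the"))       # stopword-only query
    print(search(docs, "the fox"))   # stopword is ignored, 'fox' matches
```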
Re: customize solr search/scoring for performance
The following was generated from jvisualvm. It seems the performance cost is largely in scoring. Any ideas or pointers on how to customize that part? http://lucene.472066.n3.nabble.com/file/n4019850/profilingResult.png
-- View this message in context: http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444p4019850.html
Re: customize solr search/scoring for performance
Yes, we only need term overlap information to choose the top candidates (we may incorporate boost factors for different terms later, but that's another story). We are quite new to Solr, so we haven't really profiled the process. Is there any rough guess at the latency that could be expected in such cases? Our throughput is only around 100 qps, so that might not be a significant factor here. Thanks, Jeremy
Otis Gospodnetic-5 wrote: Fuzzy answer: Can you verify that the bottleneck, especially in the slow cases, is indeed scoring? Profiler? Not sure if the coord method in Similarity is still around... Are you saying you need just term overlap for scoring/ordering? 20M small docs and 2 s queries on good hardware sounds suspicious... Do slow queries correspond to GC or something else? Otis -- Performance Monitoring - http://sematext.com/spm
-- View this message in context: http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444p4019675.html
customize solr search/scoring for performance
Hi, we have 20 million short docs (about 60 terms, less than 1 KB in total each) on each box, and we want to rank results based only on how many terms matched. In particular, we are only interested in the top N with the best scores (a small number, say 5). With some help from forum users (thanks to Otis), we chose edismax with mm set appropriately (something like 85% or 80%, as we want reasonable recall). Recall seems good, but performance is way off: latencies vary from 30 ms to 2 s, but we need 200 ~ 300 ms for 99% of searches. Since our search requirement is really straightforward, we don't need tf, idf, positions, etc., nor do we need fancy tokenizers, since our terms are all pre-processed. In addition, we don't need to evaluate scores or sort over a large doc set, as long as we know the top N that have the most terms matched. Any advice on how to customize the process to make it faster? What could the potential perf bottlenecks be (searching the index, scoring, or sorting)? Could this be done with a plugin, or do we need deeper hacking? Some facts: 1) the machines we use are good, so hardware is not a solution; 2) dismax does not seem to work, but edismax does (I thought dismax could have an edge in perf, but I couldn't get it to run).
-- View this message in context: http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444.html
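The ranking being asked for (count matched terms, require a minimum share of the query to match, keep only the top N) can be written down as a small in-memory reference. This is a sketch of the scoring rule only, not of how Solr or Lucene would execute it; the function name, the integer-percent threshold (a rough stand-in for an mm percentage, rounded down), and the sample data are all assumptions for illustration.

```python
import heapq


def top_n_by_overlap(docs, query_terms, n=5, min_match_pct=80):
    """Rank docs by how many distinct query terms they contain.

    docs: mapping of doc id -> iterable of terms.
    min_match_pct: share of query terms that must match, rounded down.
    Returns a list of (overlap, doc_id), best first.
    """
    query = set(query_terms)
    needed = len(query) * min_match_pct // 100
    scored = []
    for doc_id, terms in docs.items():
        overlap = len(query & set(terms))
        if overlap >= needed and overlap > 0:
            scored.append((overlap, doc_id))
    # A heap keeps only the n best candidates instead of sorting everything.
    return heapq.nlargest(n, scored)


if __name__ == "__main__":
    docs = {
        "d1": ["a1", "a2", "b1"],
        "d2": ["a1", "b1", "c1", "d1"],
        "d3": ["z9"],
    }
    query = ["a1", "b1", "c1", "d1", "e1"]
    print(top_n_by_overlap(docs, query, n=2, min_match_pct=40))
```

A reference like this is mostly useful for checking that whatever Solr returns under edismax with mm agrees with the intended ordering on a small sample.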
Re: need help on solr search
I used the mm parameter and it works! Right now I am preparing a perf test. Please share if anybody has a method to optimize dismax queries. Thanks! Jeremy
Otis Gospodnetic-5 wrote: Hi, Have a look at your solrconfig.xml and look for your default operator. Also look at the docs for the mm parameter on the Wiki. Let us know if that does it for you.
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4018397.html
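For reference, a request using mm might be assembled roughly as below. `defType`, `q`, `qf`, `mm`, `rows`, and `fl` are standard Solr request parameters shared by dismax and edismax; the field name `props`, the `id` field, and the 80% value are assumptions for illustration, not settings taken from this thread.

```python
from urllib.parse import urlencode


def mm_query_params(terms, field, mm="80%", rows=5, def_type="edismax"):
    """Request parameters for a minimum-should-match query over one field."""
    return {
        "defType": def_type,
        "q": " ".join(terms),
        "qf": field,
        "mm": mm,             # e.g. "80%": at least 80% of the clauses must match
        "rows": rows,         # only the top few hits are needed
        "fl": "id,score",     # assumes the unique key field is called 'id'
        "wt": "json",
    }


if __name__ == "__main__":
    # 'props' is a hypothetical field name.
    params = mm_query_params(["a1", "a3", "b1"], "props")
    print("/select?" + urlencode(params))
```

Note that the percent sign has to be URL-encoded (`80%25`), which `urlencode` takes care of.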
Re: need help on solr search
It seems a phrase query is close, but not exactly what we need. Here is an example, assuming just one field:
the doc: a1 a2 a3 b1 b2 c1 c2 c3 c4 d1 d2
the query: a1 a2* a3 a4* b1 b2* c2 d1* d2
Both doc and query terms are ordered. We know that an a term should never match b or c terms. Obviously, if we treat all query terms with OR, we could get the job done, but in a much slower way (and the returned list would be too long). So the question is, do we have a way to speed this query up? Or is customization code needed (and how)? Thanks, Jeremy
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017630.html
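Outside of Solr, the comparison described here amounts to a merge-style intersection of two sorted term lists. The sketch below runs it on the example doc and query from this post; it assumes both lists are sorted in plain string order and treats a starred term as simply a term that does not occur in the doc. It illustrates the idea of exploiting the order and is not a Solr or Lucene API.

```python
def ordered_overlap(doc_terms, query_terms):
    """Return the terms present in both sorted lists, in a single pass."""
    matches = []
    i = j = 0
    while i < len(doc_terms) and j < len(query_terms):
        d, q = doc_terms[i], query_terms[j]
        if d == q:
            matches.append(d)
            i += 1
            j += 1
        elif d < q:
            i += 1  # doc term sorts before the query term: advance the doc
        else:
            j += 1  # query term cannot appear later in the doc: skip it
    return matches


if __name__ == "__main__":
    doc = "a1 a2 a3 b1 b2 c1 c2 c3 c4 d1 d2".split()
    query = "a1 a2* a3 a4* b1 b2* c2 d1* d2".split()
    shared = ordered_overlap(doc, query)
    print(shared, f"{len(shared)}/{len(query)} query terms matched")
```

Because each list is walked once, the cost is linear in the combined length, which is the benefit the ordering is supposed to buy over a general OR query.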
Re: need help on solr search
Otis Gospodnetic-5 wrote: You want ordered term matching (like in a phrase), but you cannot use AND because you do not want all query terms to be required. Correct?
That's exactly right! Actually, none of the query terms is required, but we need to base the similarity score on how many terms are matched. In addition, since we have unique prefixes like a, b, c, we guarantee that a1 would never match anything in group b or c, etc.
Otis Gospodnetic-5 wrote: If so, would a1 a2* a3 a4* b1 b2* c2 d1* d2~someBigSlop work?
This does not work, because a2* (meaning any term different from a2, not a wildcard), a4*, etc. do not appear in the doc. A quoted proximity match still seems to require all query terms to appear. Thanks, Jeremy
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017686.html
Re: need help on solr search
Sure. Here are some more details: 1) We have 30M ~ 60M documents per node (right now we have 4 nodes, but that will increase in the future). Documents are relatively small (around 3 KB), but 99% of searches must return within 200 ms, as measured by test drivers sitting right in front of the Solr servers. 2) The throughput requirement right now is about 300 qps. The machines we use are quite powerful, with 16 cores, lots of memory, and SSD drives. We haven't really achieved this throughput, but search latency is more of an issue. 3) One property value may overlap with a value in a different property, but we don't want to match those, so we prefixed terms with the property name. Thanks, Fang
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017341.html
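The "99% of searches within 200 ms" requirement can be checked from test-driver measurements with a simple nearest-rank percentile. This is a minimal sketch; the function names and the sample latencies are invented for illustration and are not measurements from this thread.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling division in integer math avoids floating-point rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def meets_target(latencies_ms, target_ms=200, pct=99):
    """True if the pct-th percentile latency is within the target."""
    return percentile(latencies_ms, pct) <= target_ms


if __name__ == "__main__":
    # Made-up latencies in milliseconds.
    latencies = [30, 45, 60, 85, 87, 120, 150, 190, 210, 2000]
    print(percentile(latencies, 50), percentile(latencies, 99))
    print(meets_target(latencies))
```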
need help on solr search
Hi Solr experts, Our documents, as well as our queries, consist of 10 properties in a particular order. Because of stringent requirements on search latency, we grouped them into only 2 fields with 5 properties each (we may use just 1 field; more than 3 fields seems too slow). Each property value is split into fixed-length terms (like n-grams, hopefully to save search time) and prefixed with the property name. What we want is to find out how similar the query is to the documents by comparing terms. We can't use the default OR operator since it's slow; we want to take advantage of the prefix and the defined order. My questions are: 1) Can we do this simply through Solr configuration, and if so, how? 2) If we need to customize the Solr request handler or anything else, where should we start? Thanks a lot! Jeremy
-- View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191.html
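To make the term layout concrete, here is one possible reading of the pre-processing described above. The chunking rule (non-overlapping fixed-length pieces, with a shorter final piece), the `name_chunk` term format, and the property names are all assumptions; the post does not spell out the exact scheme.

```python
def prefixed_terms(prop, value, size=3):
    """Split a property value into fixed-length chunks, each prefixed with
    the property name so equal values in different properties never match."""
    chunks = [value[i:i + size] for i in range(0, len(value), size)]
    return [f"{prop}_{chunk}" for chunk in chunks]


def field_text(properties, size=3):
    """Join the terms of several properties, keeping the given order.

    properties: list of (name, value) pairs in the agreed property order.
    """
    terms = []
    for name, value in properties:
        terms.extend(prefixed_terms(name, value, size))
    return " ".join(terms)


if __name__ == "__main__":
    # Hypothetical property names and values.
    print(prefixed_terms("city", "seattle"))
    print(field_text([("city", "seattle"), ("zip", "98101")]))
```

Applying the same function to both documents and queries keeps the two sides aligned, which is what the prefix-and-order comparison relies on.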