Perf. difference when the solr core is 'current' or not 'current'

2013-07-01 Thread jchen2000
In Solr's admin statistics page, there is a 'current' flag indicating whether
the core's index reader is 'current' or not. According to some discussions on
this mailing list a few months back, it shouldn't affect anything. But my
observation is completely different. When the current flag was not checked
for some of the cores (I have defined 15 cores in total), my median search
latency over 48M records was over 190 ms, but when every current flag was
checked, the median dropped to only 87 ms.

Another observation is that restarting the Solr instance does not necessarily
make the 'current' flags checked; I have to reload the cores even after
starting Solr.

Could anybody explain the difference? I am using Datastax Enterprise 3.0.2.
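
For reference, reloading every core after a restart could be scripted roughly as below. This is only a sketch: it assumes the stock Solr CoreAdmin endpoint (action=RELOAD on /admin/cores), which Datastax Enterprise may expose differently, and the host and core names are hypothetical. It only builds the URLs; nothing is sent.

```python
from urllib.parse import urlencode


def reload_urls(base_url, core_names):
    """Build one CoreAdmin RELOAD URL per core (no request is made)."""
    urls = []
    for name in core_names:
        params = urlencode({"action": "RELOAD", "core": name})
        urls.append(f"{base_url}/admin/cores?{params}")
    return urls


# Hypothetical host and core names, for illustration only.
urls = reload_urls("http://localhost:8983/solr", ["core0", "core1"])
for url in urls:
    print(url)
```

Each URL could then be fetched with any HTTP client once the instance is up.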

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Perf-difference-when-the-solr-core-is-current-or-not-current-tp4074438.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: does solr support query time only stopwords?

2013-06-10 Thread jchen2000
Thanks to you all. It seems that I have finally figured out a workaround.

Yes, I used edismax, but my test query was very simple: it queries only one
field and uses only one stopword. So I see no chance it would hit another
field (but Datastax might have done something we don't know about). Debug
didn't yield useful information either.

So what I did was keep the StopFilterFactory element for the index analyzer,
but without specifying our stopword file. I reindexed all Solr cores. This
time it seems I can get stopword frequency info from Luke, while
querying stopwords returns 0 matches.

My wild guess is that the StopFilterFactory for the index analyzer serves as
an overall on switch for the stopwords feature.


Erick Erickson wrote
 My _guess_ is that you're perhaps using
 edismax or similar and getting matches from
 fields you don't expect on terms that are
 not stopwords. Try adding debug=query and
 seeing what the parsed query actually is.
 
 And, of course, I have no idea what Datastax is
 doing.
 
 And, you have to at least reload the core
 to pick up the new stopwords.
 
 Best
 Erick
 
 On Sat, Jun 8, 2013 at 6:33 PM, jchen2000 <jchen200@...> wrote:
 I wanted to analyze high frequency terms using Solr's Luke request handler
 and keep updating the stopwords file for new queries from time to time.
 Obviously I have to index all terms whether they belong to stopwords list
 or not.

 So I configured query analyzer stopwords list but disabled index analyzer
 stopwords list, However, it seems like the query would return all records
 containing stopwords after this.

 Anybody has an idea why this would happen?

 ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0








--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069464.html


Re: does solr support query time only stopwords?

2013-06-09 Thread jchen2000
Nope. I only searched with individual stop words. Very strange to me.


Otis Gospodnetic-5 wrote
 Maybe returned hits match other query terms.
 
 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On Jun 8, 2013 6:34 PM, jchen2000 <jchen200@...> wrote:
 
 I wanted to analyze high frequency terms using Solr's Luke request handler
 and keep updating the stopwords file for new queries from time to time.
 Obviously I have to index all terms whether they belong to stopwords list
 or not.

 So I configured query analyzer stopwords list but disabled index analyzer
 stopwords list, However, it seems like the query would return all records
 containing stopwords after this.

 Anybody has an idea why this would happen?

 ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0









--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069143.html


does solr support query time only stopwords?

2013-06-08 Thread jchen2000
I want to analyze high-frequency terms using Solr's Luke request handler
and keep updating the stopwords file for new queries from time to time.
Obviously I have to index all terms, whether they belong to the stopwords
list or not.

So I configured a stopwords list for the query analyzer but disabled the
stopwords list for the index analyzer. However, it seems that queries now
return all records containing stopwords.

Does anybody have an idea why this would happen?

P.S. I am using Datastax Enterprise 3.0.2 and the Solr version is 4.0.
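
To make the intended setup concrete, here is a toy model of query-time-only stopword filtering. It is a sketch of the expected analysis results only, not of Solr's analyzers, and it does not explain the behavior observed under Datastax Enterprise; the analyze function and the stopword list are hypothetical.

```python
def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop any configured stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]


STOPWORDS = frozenset({"the", "of"})  # hypothetical stopword list

# Index time: no stopword list, so every term stays visible to term statistics.
doc_terms = analyze("The history of Solr")

# Query time: stopwords are filtered out before matching.
query_terms = analyze("the", stopwords=STOPWORDS)
mixed_terms = analyze("the solr", stopwords=STOPWORDS)

print(doc_terms, query_terms, mixed_terms)
```

Under this model a query made only of stopwords analyzes to no terms at all; what a real query parser then does with an empty query is a separate question.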



--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html


Re: customize solr search/scoring for performance

2012-11-12 Thread jchen2000
The following was generated with jvisualvm. It seems the perf is largely
related to scoring. Any ideas or pointers on how to customize that part?

http://lucene.472066.n3.nabble.com/file/n4019850/profilingResult.png 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444p4019850.html


Re: customize solr search/scoring for performance

2012-11-11 Thread jchen2000
Yes, we only need term overlap information to choose top candidates (we may
incorporate boost factors for different terms later, but that's another
story).

We are quite new to Solr, so we haven't really profiled the process. Is there
a rough estimate of the latency to expect in such cases? Our
throughput is only around 100 qps, so that might not be a significant factor
here.

Thanks,

Jeremy
  

Otis Gospodnetic-5 wrote
 Fuzzy answer:
 Can you verify the bottleneck, especially in slow cases is indeed scoring?
 Profiler?
 Not sure if coord method in Similarity is still around... are you saying
 you need just term overlap for scoring/ordering?
 20m small docs and 2s queries on good hardware sounds suspicious ... do
 slow queries correspond to GC or something else?
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm





--
View this message in context: 
http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444p4019675.html


customize solr search/scoring for performance

2012-11-09 Thread jchen2000
Hi,

We have 20 million short docs (about 60 terms, less than 1 KB in total
each) on each box, and we want to rank results based only on how many terms
matched. In particular, we are only interested in the top N with the best
scores (say a small number like 5).

With some help from the forum users (thanks to Otis), we chose to use
edismax with mm set appropriately (something like 85% or 80%, as we wanted
reasonable recall). Recall seems good, but performance is way off: response
times vary from 30 ms to 2 s, but we need 200 to 300 ms for 99% of searches.
Since our search requirement is really straightforward, we don't need tf,
idf, positions, etc., nor do we need fancy tokenizers, since our terms are
all pre-processed. In addition, we don't need to evaluate scores or sort
over a large doc set, as long as we know the top N that have the most terms
matched.

Any advice on how to customize the process to make it faster? What could the
potential perf bottlenecks be (searching the index, scoring, or sorting)?
Could this be done with a plugin, or do we need deeper hacking?
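
The ranking requirement above can be sketched outside Solr as a plain overlap count with an mm-style cutoff and a top-N selection. This is a brute-force illustration of the desired semantics, not a Solr plugin; the function names and sample docs are hypothetical, and rounding the percentage down is an assumption about how a positive percentage mm is interpreted.

```python
import heapq


def min_should_match(num_query_terms, mm_percent):
    """Number of terms required for a percentage mm, rounded down (assumed)."""
    return num_query_terms * mm_percent // 100


def top_n_by_overlap(query_terms, docs, mm_percent=80, n=5):
    """Return up to n (overlap, doc_id) pairs, highest overlap first."""
    query = set(query_terms)
    required = min_should_match(len(query), mm_percent)
    scored = []
    for doc_id, terms in docs.items():
        overlap = len(query & set(terms))
        if overlap >= required:  # apply the mm-style cutoff
            scored.append((overlap, doc_id))
    return heapq.nlargest(n, scored)


# Hypothetical documents, for illustration only.
docs = {
    "d1": ["t1", "t2", "t3", "t4", "t5"],
    "d2": ["t1", "t2", "t3", "t4", "x9"],
    "d3": ["t1", "x1", "x2", "x3", "x4"],
}
results = top_n_by_overlap(["t1", "t2", "t3", "t4", "t5"], docs, mm_percent=80, n=2)
print(results)
```

A real implementation would walk posting lists rather than scan every document, but the score is the same: the number of query terms found in the doc.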

Some facts:
1) The machines we use are good, so hardware is not a solution.
2) dismax does not seem to work, but edismax does (I thought dismax could
have an edge in perf, but I couldn't run it).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444.html


Re: need help on solr search

2012-11-05 Thread jchen2000
I used the mm parameter and it works!

I am now preparing a perf test. Please share if anybody has a method to
optimize dismax queries.
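
For anyone trying the same thing, a request with mm set might be assembled as below. This is a sketch under assumptions: defType, q, qf, mm, rows and fl are standard dismax request parameters, but the host, core name, field name and the 80% value are hypothetical. It only builds the URL.

```python
from urllib.parse import urlencode


def dismax_url(base_url, core, query, field, mm="80%", rows=5, def_type="dismax"):
    """Build a select URL with a minimum-should-match setting (no request is made)."""
    params = {
        "defType": def_type,
        "q": query,
        "qf": field,
        "mm": mm,
        "rows": rows,
        "fl": "id,score",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names, for illustration only.
url = dismax_url("http://localhost:8983/solr", "core0", "a1 a3 b1", "terms")
print(url)
```

Note that the percent sign in mm has to be URL-encoded, which urlencode does here.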

Thanks!

Jeremy


Otis Gospodnetic-5 wrote
 Hi,
 
 Have a look at your solrconfig.xml and look for your default operator. Also
 look at the docs for the mm parameter on the Wiki. Let us know if that does
 it for you.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4018397.html


Re: need help on solr search

2012-11-01 Thread jchen2000
It seems like a phrase query is close, but not exactly what we need. Here is
an example assuming just one field:
the doc: a1 a2 a3 b1 b2 c1 c2 c3 c4 d1 d2
the query: a1 a2* a3 a4* b1 b2* c2 d1* d2

Both doc and query terms are ordered. We know that an a term should never
match b or c terms. Obviously, if we combine all query terms with OR, we
can get the job done, but much more slowly (and the returned list
would be too long). So the question is: is there a way to speed this query
up, or is custom code needed (and if so, how)?
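
The example above can be worked through with a merge-style count that uses the shared ordering. This is a sketch of the idea only, not Solr code: it assumes both term lists are sorted in the same lexicographic order with no duplicates, and it treats the starred terms as literal terms that simply do not occur in the doc. The function name is hypothetical.

```python
def ordered_overlap(doc_terms, query_terms):
    """Count common terms of two sorted lists in a single merge-style pass."""
    i = j = matched = 0
    while i < len(doc_terms) and j < len(query_terms):
        if doc_terms[i] == query_terms[j]:
            matched += 1
            i += 1
            j += 1
        elif doc_terms[i] < query_terms[j]:
            i += 1  # doc term sorts earlier; it cannot match later query terms
        else:
            j += 1  # query term sorts earlier; it is absent from the doc
    return matched


doc = "a1 a2 a3 b1 b2 c1 c2 c3 c4 d1 d2".split()
query = "a1 a2* a3 a4* b1 b2* c2 d1* d2".split()
matched = ordered_overlap(doc, query)
print(matched, "of", len(query), "query terms matched")
```

Here a1, a3, b1, c2 and d2 match, so the overlap is 5 of 9, and each list is scanned at most once.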

Thanks,
Jeremy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017630.html


Re: need help on solr search

2012-11-01 Thread jchen2000
Otis Gospodnetic-5 wrote
 You want ordered term matching (like in a phrase), but you cannot use AND
 because you do not want all query terms to be required.  Correct?

That's exactly right! Actually, none of the query terms is required, but we
need to base the similarity score on how many terms are matched. In addition,
since we have unique prefixes like a, b, c, we guarantee that a1 would
never match anything in group b or c, etc.


Otis Gospodnetic-5 wrote
 If so, would "a1 a2* a3 a4* b1 b2* c2 d1* d2"~someBigSlop work?

This does not work, because a2* (meaning just any term different from a2,
not a wildcard), a4*, etc. do not appear in the doc. A quoted proximity match
still seems to require all query terms to appear.

Thanks,
Jeremy





--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017686.html


Re: need help on solr search

2012-10-31 Thread jchen2000
Sure. Here are some more details:
1) We have 30M to 60M documents per node (right now we have 4 nodes,
but that will increase in the future). Documents are relatively small
(around 3K), but 99% of searches must return within 200 ms, and this is
measured by test drivers sitting right in front of the Solr servers.

2) The throughput requirement right now is about 300 qps. The machines we use
are quite powerful, with 16 cores, lots of memory and SSD drives. We
haven't really achieved this throughput yet, but search latency is more of an
issue.

3) A value of one property may overlap with a value of a different property,
but we don't want those to match, so we prefixed terms with the property name.
Thanks,
Fang  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4017341.html


need help on solr search

2012-10-30 Thread jchen2000
Hi Solr experts,

Our documents, as well as our queries, consist of 10 properties in a
particular order. Because of stringent requirements on search latency, we
grouped them into only 2 fields with 5 properties each (we may use just 1
field; more than 3 fields seems too slow). Each property value is split into
fixed-length terms (like n-grams, hopefully to save search time) and prefixed
with the property name. What we want is to find out how similar the query is
to the documents by comparing terms. We can't use the default OR operator
since it's slow; we want to take advantage of the prefix and the defined order.
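
The term layout described above might look like the sketch below. It is only an illustration: it assumes non-overlapping fixed-length chunks (the post says "like n-gram", so overlapping n-grams are equally possible), and the function name, separator, chunk length and property names are hypothetical.

```python
def prefixed_terms(prop, value, length=3):
    """Split a property value into fixed-length chunks, each prefixed with the property name."""
    chunks = [value[i:i + length] for i in range(0, len(value), length)]
    return [f"{prop}_{chunk}" for chunk in chunks]


# Hypothetical property names and values, for illustration only.
name_terms = prefixed_terms("name", "abcdefgh")
city_terms = prefixed_terms("city", "abc")
print(name_terms, city_terms)
```

Because of the prefix, the chunk "abc" from one property never equals the same chunk from another property, which keeps matches within a property.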

My questions are:
1) Can we do this simply through Solr configuration, and if so, how?
2) If we need to customize the Solr request handler or anything else, where
should we start?

Thanks a lot!

Jeremy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191.html