Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-05 Thread SandeepM
> So we see the jagged edge waveform which keeps climbing (GC cycles don't
> completely collect memory over time).  Our test has a short capture from
> real traffic and we are replaying that via solrmeter.

Any idea why the memory climbs over time? The GC should clean up after the
data is shipped back. Could there be a memory leak in Solr?

Appreciate any help.
Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879p4068378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-04 Thread SandeepM
Thanks Eric and Shawn,

Your explanations help me understand where Solr may be spending its time.
It sounds like compression can be a CPU and heap hog (I'll try to confirm
this with heap dumps).

Initially we tried to keep the JVM heap size the same on both Solr 3.5 and
4.2.1, at around 3 GB, which 3.5 handled well even under a 200 QPS load.
Moving to 4.2.1 with the same heap size instantly killed the server.
Doubling the heap to 6 GB did not help either; we saw higher CPU and higher
heap usage.

We later reduced the cache sizes and increased the heap to 8 GB, and we see
an improvement. Over time, however, heap utilization slowly climbs as the
200 QPS test runs, and JConsole sometimes shows the max heap being exceeded.
So we see a jagged (sawtooth) waveform that keeps climbing: GC cycles don't
completely reclaim memory over time. Our test replays a short capture of
real traffic via SolrMeter.
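To tell whether the sawtooth really keeps climbing, rather than eyeballing JConsole, one option is to sample used heap at regular intervals, keep only the troughs (which approximate what survives each collection), and fit a line through them. Below is a minimal Python sketch; the (seconds, used bytes) sample format and the function names are assumptions, and the samples would have to be exported from JConsole or GC logs by some other means. A steadily positive trough slope is consistent with a leak, but also with caches that are still filling up, so it narrows things down rather than proving a leak.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since start, used heap in bytes); assumed format


def gc_troughs(samples: List[Sample]) -> List[Sample]:
    """Return the local minima of the used-heap series.

    In a sawtooth heap graph each local minimum approximates the heap that
    survived a collection (live data plus anything not yet collectible).
    """
    troughs = []
    for prev, cur, nxt in zip(samples, samples[1:], samples[2:]):
        if cur[1] < prev[1] and cur[1] <= nxt[1]:
            troughs.append(cur)
    return troughs


def trough_slope(samples: List[Sample]) -> float:
    """Least-squares slope (bytes per second) through the GC troughs.

    Returns 0.0 if fewer than two troughs are found.
    """
    pts = gc_troughs(samples)
    n = len(pts)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in pts) / n
    mean_h = sum(h for _, h in pts) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den if den else 0.0
```

For example, `trough_slope(samples) * 3600` gives an approximate growth of the post-GC baseline in bytes per hour.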

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879p4068150.html


Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-03 Thread SandeepM
Hi,

We are using the same schema for both Solr 3.5 and Solr 4.2.1 and posting
the same data to both servers, yet the memory requirements during request
handling seem to have gone up sharply in 4.2.1.
- Requests come in at around 200 QPS.
- Documents are very large (lots of multivalued fields with many values),
  but that did not seem to be a problem with 3.5.
Could you help me understand what change in Solr 4.2.1 could account for
this higher memory requirement?

Also, in a separate test with no load, I ran a single query to get the list
of all unique IDs. It completes in 500 ms, but the time it takes to ship the
data back to the client seems very large. Any idea what could be causing
this behavior?
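One way to confirm where the time goes is to compare the QTime reported in the responseHeader with the wall-clock time the client sees. As far as I know, QTime covers Solr's processing of the request but not writing the response (for a large rows/fl that includes loading the stored fields) or the network transfer, so a large gap points at the response size rather than the search. A minimal sketch using only the Python standard library; `time_request` is a hypothetical helper, `http://host/solr/select` is the placeholder host used in this thread, and it assumes wt=json so the header can be parsed.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_url(url: str) -> bytes:
    """Default fetcher: plain HTTP GET, reading the whole body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def time_request(base_url: str, params: Dict[str, str],
                 fetch: Callable[[str], bytes] = fetch_url) -> Tuple[int, float]:
    """Return (QTime reported by Solr in ms, wall-clock time seen by the client in ms)."""
    query = dict(params, wt="json")  # force JSON so responseHeader is easy to parse
    url = base_url + "?" + urllib.parse.urlencode(query)
    start = time.perf_counter()
    body = fetch(url)  # includes response writing, transfer and reading the body
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    qtime = json.loads(body)["responseHeader"]["QTime"]
    return qtime, elapsed_ms
```

For example, `time_request("http://host/solr/select", {"q": "*:*", "fl": "id", "rows": "100000"})` returns `(qtime_ms, wall_ms)`; the rows value is only illustrative. The `fetch` parameter exists so the timing logic can be exercised without a live server.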

Would appreciate any help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879.html


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-24 Thread SandeepM

One of our main concerns is that Solr returns the best match based on what
it thinks is best, using Levenshtein distance to rank the suggestions. Can
we tune this to put more weight on term frequency/hits than on the number
of edits? If so, the corrected suggestions would seem more relevant. It
would be great if we could do this while keeping maxCollations = 1 and
maxCollationTries at some reasonable number, so that QTime does not get out
of control!
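As far as I know, DirectSolrSpellChecker also accepts a comparatorClass setting (score, the default, or freq) that changes how suggestions are ordered, so that is worth checking in the reference guide for 4.2.1 first. To experiment with a weighted blend before touching the config, suggestions can be re-ranked client-side. The sketch below uses a hypothetical scoring scheme (log frequency minus a per-edit penalty), which is not what Solr does internally; edit counts are supplied by the caller, and the frequencies come from the "hapyness" response posted in this thread.

```python
import math
from typing import List, Tuple

Suggestion = Tuple[str, int, int]  # (word, index frequency, edit distance); edits supplied by caller


def blended_score(freq: int, edits: int, edit_weight: float) -> float:
    """Hypothetical blend: reward popularity (log10 of frequency), penalise each edit."""
    return math.log10(freq + 1) - edit_weight * edits


def rerank(suggestions: List[Suggestion], edit_weight: float = 1.0) -> List[Suggestion]:
    """Sort suggestions best-first by blended_score.

    A smaller edit_weight lets frequent words win even with more edits;
    a larger one behaves more like plain edit-distance ranking.
    """
    return sorted(suggestions,
                  key=lambda s: blended_score(s[1], s[2], edit_weight),
                  reverse=True)


# Suggestions for "hapyness" from this thread, with Levenshtein distances added.
HAPYNESS = [("happyness", 175, 1), ("hapiness", 62, 1), ("hayness", 1, 1),
            ("happiness", 7788, 2), ("harkness", 324, 2)]
```

With `edit_weight=1.0` the top suggestion is "happiness"; with `edit_weight=2.0` it is "happyness". That also shows the risk of weighting frequency heavily: the more popular word is not necessarily the one in the title being searched for.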

Any insights into this would be great. Thanks for your help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058655.html


Re: spellcheck: change in behavior and QTime

2013-04-23 Thread SandeepM
I apologize for the length of the previous message.

What I see is spellcheck becoming faster (notice the QTime). I also see an
increase in the number of cache hits when a spellcheck=false query is run
once, followed by the original spellcheck query. It seems like running with
spellcheck=false alters the behavior of spellcheck.

http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell
http://host/solr/select?spellcheck=false&spellcheck.q=cucoo's+nest&df=spell
http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell
--- I see a faster response and an increase in the number of query cache hits.
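The & separators between parameters in URLs like these are easily lost when pasting into the archive. Building the query strings with Python's `urllib.parse.urlencode` sidesteps that and percent-encodes the apostrophe. A minimal sketch, assuming the placeholder host and the parameters from this message; `spellcheck_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def spellcheck_url(base: str, phrase: str, enabled: bool, df: str = "spell") -> str:
    """Build a /select URL for a spellcheck-only request.

    urlencode joins parameters with '&', encodes spaces as '+'
    and the apostrophe as %27.
    """
    params = {
        "spellcheck": "true" if enabled else "false",
        "spellcheck.q": phrase,
        "df": df,
    }
    return base + "?" + urlencode(params)


# The three-step sequence from this message: on, off, on again.
SEQUENCE = [spellcheck_url("http://host/solr/select", "cucoo's nest", flag)
            for flag in (True, False, True)]
```

The first entry is `http://host/solr/select?spellcheck=true&spellcheck.q=cucoo%27s+nest&df=spell`.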

Thanks.
-- Sandeep





--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-change-in-behavior-and-QTime-tp4058014p4058402.html


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-23 Thread SandeepM
James, is there a way to determine how many collations were tried? Is
there a parameter that returns this in the debug information? This would
be very helpful.
Appreciate your help with this.

Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058400.html


spellcheck: change in behavior and QTime

2013-04-22 Thread SandeepM
I am using the same setup (solrconfig.xml and schema.xml) as described in
my prior message:
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tt4057176.html#a4057389
I am using Solr 4.2.1. I just wanted to report something weird that I am
seeing and find out whether anyone else sees this behavior. Since I don't
understand the details of what is happening, I'd like to know why the
behavior changes and whether we can do anything to get the better QTime
up front.

I see a change in behavior when running queries against the server, and
the QTime changes with it.

QUERY:
?spellcheck=true
&spellcheck.q=cucoo's+nest
&df=spell
&fq=  (it's the same every time and, I believe, moot)

Here is what I do:
1.  Run the query.
2.  Run the same query with spellcheck=false.
3.  Run the original query (spellcheck=true) again.

QTime at each of the above stages:
1.  40 ms (over multiple runs with spellcheck=true)
2.  10 ms (spellcheck=false is run just once)
3.  20 ms (after switching back to spellcheck=true and running it multiple
    times)
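The hitratio values in the cache statistics in this message match hits divided by lookups truncated, not rounded, to two decimals (for example 30/35 = 0.857 is shown as 0.85, and 3/8 = 0.375 as 0.37). A small sketch for diffing two snapshots of a cache's counters in the same Was/Now/Delta style, assuming the counters have been copied into dictionaries by hand; the function names and dictionary keys are hypothetical.

```python
from typing import Dict


def hit_ratio(lookups: int, hits: int) -> str:
    """hits/lookups truncated to two decimals, e.g. 30/35 -> '0.85'.

    Integer arithmetic avoids float rounding surprises.
    """
    if lookups == 0:
        return "0.00"
    hundredths = hits * 100 // lookups
    return f"{hundredths // 100}.{hundredths % 100:02d}"


def diff_snapshots(was: Dict[str, int], now: Dict[str, int]) -> Dict[str, object]:
    """Compare two snapshots of cache counters (Was/Now/Delta style)."""
    report: Dict[str, object] = {}
    for key in ("lookups", "hits", "inserts", "evictions", "size"):
        report[key] = {"was": was[key], "now": now[key], "delta": now[key] - was[key]}
    report["hitratio"] = {"was": hit_ratio(was["lookups"], was["hits"]),
                          "now": hit_ratio(now["lookups"], now["hits"])}
    return report
```

Feeding in the queryResultCache numbers from stage 1 reproduces the lookups delta of 2 and the 0.37 to 0.40 hitratio change.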

Cache details at each of the above times:
1.  filterCache
    class:       org.apache.solr.search.FastLRUCache
    version:     1.0
    description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
                 acceptableSize=972, cleanupThread=false, autowarmCount=128,
                 regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
    src:         $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $
    stats:
      lookups:              Was: 30, Now: 35, Delta: 5
      hits:                 Was: 25, Now: 30, Delta: 5
      hitratio:             Was: 0.83, Now: 0.85
      inserts:              5
      evictions:            0
      size:                 5
      warmupTime:           0
      cumulative_lookups:   Was: 30, Now: 35, Delta: 5
      cumulative_hits:      Was: 25, Now: 30, Delta: 5
      cumulative_hitratio:  Was: 0.83, Now: 0.85
      cumulative_inserts:   5
      cumulative_evictions: 0

    queryResultCache
    class:       org.apache.solr.search.FastLRUCache
    version:     1.0
    description: Concurrent LRU Cache(maxSize=40960, initialSize=10240,
                 minSize=36864, acceptableSize=38912, cleanupThread=false,
                 autowarmCount=2560,
                 regenerator=org.apache.solr.search.SolrIndexSearcher$3@520adaf0)
    src:         $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $
    stats:
      lookups:              Was: 8, Now: 10, Delta: 2
      hits:                 Was: 3, Now: 4, Delta: 1
      hitratio:             Was: 0.37, Now: 0.40
      inserts:              Was: 5, Now: 6, Delta: 1
      evictions:            0
      size:                 Was: 6, Now: 7, Delta: 1
      warmupTime:           0
      cumulative_lookups:   Was: 8, Now: 10, Delta: 2
      cumulative_hits:      Was: 3, Now: 4, Delta: 1
      cumulative_hitratio:  Was: 0.37, Now: 0.40
      cumulative_inserts:   Was: 5, Now: 6, Delta: 1
      cumulative_evictions: 0


2.  filterCache
    class:       org.apache.solr.search.FastLRUCache
    version:     1.0
    description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
                 acceptableSize=972, cleanupThread=false, autowarmCount=128,
                 regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
    src:         $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $
    stats:
      lookups:              Was: 35, Now: 40, Delta: 5
      hits:                 Was: 30, Now: 35, Delta: 5
      hitratio:             Was: 0.85, Now: 0.87
      inserts:              5
      evictions:            0
      size:                 5
      warmupTime:           0
      cumulative_lookups:   Was: 35, Now: 40, Delta: 5
      cumulative_hits:      Was: 30, Now: 35, Delta: 5
      cumulative_hitratio:  Was: 0.85, Now: 0.87
      cumulative_inserts:   5
      cumulative_evictions: 0

    queryResultCache
    class:

RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
James, thanks. That was very helpful and helped me understand count and
alternativeTermCount a bit better.

I also have the following case, as pointed out earlier.
My query:

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

Here the intent is to correct "chocolat factry" to "chocolate factory",
which exists in my spell field index. The QTime for the above query is
somewhere between 350 and 400 ms.

I run a similar query with the spellcheck terms replaced by "pursut
hapyness" ("pursuit happyness" actually exists in my spell field) and see a
QTime of 15-17 ms.

Both queries produce correct collations: picking the first suggestion for
each term and applying them as the collation finds what I am looking for.
Yet there is an order-of-magnitude difference in QTime. There is one edit
per term in both cases, i.e. 2 edits per query, and the word lengths are
the same (8 and 6 characters, just in swapped order). I'd like to understand
why there is such a vast difference in QTime. Also, "chocolate factory" and
"pursuit happyness" are both indexed as-is in the spell field.

I would appreciate any help with this since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular. 

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058048.html


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
Chocolat Factry


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">77</int>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="chocolat">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">8</int>
      <int name="origFreq">615</int>
      <arr name="suggestion">
        <lst>
          <str name="word">chocolate</str>
          <int name="freq">6544</int>
        </lst>
      </arr>
    </lst>
    <lst name="factry">
      <int name="numFound">5</int>
      <int name="startOffset">9</int>
      <int name="endOffset">15</int>
      <int name="origFreq">6</int>
      <arr name="suggestion">
        <lst>
          <str name="word">factory</str>
          <int name="freq">23614</int>
        </lst>
        <lst>
          <str name="word">factor</str>
          <int name="freq">5128</int>
        </lst>
        <lst>
          <str name="word">factus</str>
          <int name="freq">290</int>
        </lst>
        <lst>
          <str name="word">factum</str>
          <int name="freq">178</int>
        </lst>
        <lst>
          <str name="word">factae</str>
          <int name="freq">102</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">chocolate factory</str>
      <int name="hits">85</int>
      <lst name="misspellingsAndCorrections">
        <str name="chocolat">chocolate</str>
        <str name="factry">factory</str>
      </lst>
    </lst>
  </lst>
</lst>
</response>




Pursut Hapyness
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">16</int>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pursut">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">pursuit</str>
          <int name="freq">1209</int>
        </lst>
        <lst>
          <str name="word">pursue</str>
          <int name="freq">108</int>
        </lst>
        <lst>
          <str name="word">pursit</str>
          <int name="freq">1</int>
        </lst>
        <lst>
          <str name="word">perdut</str>
          <int name="freq">94</int>
        </lst>
        <lst>
          <str name="word">purdue</str>
          <int name="freq">70</int>
        </lst>
      </arr>
    </lst>
    <lst name="hapyness">
      <int name="numFound">5</int>
      <int name="startOffset">7</int>
      <int name="endOffset">15</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">happyness</str>
          <int name="freq">175</int>
        </lst>
        <lst>
          <str name="word">hapiness</str>
          <int name="freq">62</int>
        </lst>
        <lst>
          <str name="word">hayness</str>
          <int name="freq">1</int>
        </lst>
        <lst>
          <str name="word">happiness</str>
          <int name="freq">7788</int>
        </lst>
        <lst>
          <str name="word">harkness</str>
          <int name="freq">324</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">pursuit happyness</str>
      <int name="hits">10</int>
      <lst name="misspellingsAndCorrections">
        <str name="pursut">pursuit</str>
        <str name="hapyness">happyness</str>
      </lst>
    </lst>
  </lst>
</lst>
</response>
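To compare many runs like these two without reading the XML by eye, the interesting parts (QTime, per-term suggestions, the collation and its hit count) can be pulled out with Python's standard-library ElementTree. A minimal sketch, assuming the Solr 4 XML layout shown in these responses, where correctlySpelled and collation sit inside the suggestions list and each suggestion carries word and freq; `parse_spellcheck` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def parse_spellcheck(xml_text: str) -> Dict[str, object]:
    """Pull QTime, per-term suggestions and the first collation out of a Solr 4 XML response."""
    root = ET.fromstring(xml_text)
    qtime = int(root.find("./lst[@name='responseHeader']/int[@name='QTime']").text)
    suggestions: Dict[str, List[Tuple[str, int]]] = {}
    collation: Optional[Tuple[str, Optional[int]]] = None
    block = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    for child in (block if block is not None else []):
        name = child.get("name")
        if name == "collation":
            if collation is not None:
                continue  # keep only the first collation
            if child.tag == "lst":  # extended form: query text plus hit count
                collation = (child.find("str[@name='collationQuery']").text,
                             int(child.find("int[@name='hits']").text))
            else:  # plain form: <str name="collation">...</str>
                collation = (child.text, None)
        elif child.tag == "lst":  # one entry per misspelled term
            suggestions[name] = [
                (s.find("str[@name='word']").text, int(s.find("int[@name='freq']").text))
                for s in child.findall("arr[@name='suggestion']/lst")
            ]
    return {"qtime": qtime, "suggestions": suggestions, "collation": collation}
```

Elements it does not recognise, such as correctlySpelled, are simply skipped.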

Spellcheck is issued as a separate request; we do not send any q along
with the spellcheck parameters.

Our search query also queries other fields, not just the spell field, and
therefore does not give a good representation of spellcheck QTime. We use
grouping in the search query.
For "chocolate factory", I get a search QTime of 198 ms.
For "pursuit happyness", I get a search QTime of 318 ms.
Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-19 Thread SandeepM
James,
Thanks for the reply. I see your point, and sure enough, reducing
maxCollationTries does reduce the time, although then it may not produce
results. It seems the time is spent on the collation re-runs. Is there any
way to enable caching for collations? The same query takes the same amount
of time every time it is repeated. My query caches are enabled, but I don't
believe they are used for spellchecks.
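A simplified model of the trade-off described here: each candidate collation is verified by running it as a query, and checking stops at the first candidate with hits or when maxCollationTries is used up. The sketch below is not Solr's implementation; it assumes candidates are ordered by the summed rank of their suggestions, and `has_hits` is a hypothetical stand-in for the verification query Solr would run.

```python
import itertools
from typing import Callable, List, Optional, Sequence, Tuple


def find_collation(suggestions: Sequence[Sequence[str]],
                   has_hits: Callable[[str], bool],
                   max_tries: int) -> Tuple[Optional[str], int]:
    """Return (first collation with hits or None, number of queries tried).

    Candidates are every combination of one suggestion per term, ordered by
    the sum of the suggestions' ranks (0 = best), so the top suggestions are
    tried first. Each try stands in for one query Solr would execute.
    """
    ranked: List[Tuple[int, Tuple[int, ...]]] = []
    for idx in itertools.product(*(range(len(s)) for s in suggestions)):
        ranked.append((sum(idx), idx))
    ranked.sort()  # lowest rank sum first; ties broken by index tuple
    tries = 0
    for _, idx in ranked:
        if tries >= max_tries:
            break
        candidate = " ".join(s[i] for s, i in zip(suggestions, idx))
        tries += 1
        if has_hits(candidate):
            return candidate, tries
    return None, tries
```

With the factry suggestions from this thread and a made-up hit set in which only "chocolate factor" matches, `max_tries=1` gives up after one query and returns `(None, 1)`, while `max_tries=10` finds it on the second try. Fewer tries means fewer queries, and therefore less time, but more misses.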
Thanks.
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4057389.html


DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-18 Thread SandeepM
Hi!

I am using SOLR 4.2.1.

My solrconfig.xml contains the following:

<searchComponent name="MySpellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>

  <lst name="spellchecker">
    <str name="name">MySpellchecker</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">id</str>
    <str name="spellcheck.dictionary">MySpellchecker</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">10</str>
    <str name="spellcheck.maxResultsForSuggest">35</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">1</str>
    <str name="spellcheck.collateParam.q.op">AND</str>
  </lst>
  <arr name="last-components">
    <str>MySpellcheck</str>
  </arr>
</requestHandler>

schema.xml with the spell field looks like:

<fieldType name="text_spell" class="solr.TextField"
           positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

<field name="spell" type="text_spell" indexed="true"
       stored="false" multiValued="true"/>

<copyField source="title" dest="spell"/>
<copyField source="artist" dest="spell"/>
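A rough Python approximation of what the text_spell analyzer above produces for a title, which can help when checking which tokens the spellchecker can actually match. Two loud assumptions: the regular expression only approximates StandardTokenizer (which follows the Unicode UAX#29 word-break rules), and `STOPWORDS` is a small hypothetical subset standing in for lang/stopwords_en.txt, not the real file.

```python
import re
from typing import List

# Assumed subset of lang/stopwords_en.txt; check the actual file in your conf dir.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Approximation of StandardTokenizer: runs of word characters, allowing an
# apostrophe between them (so "cuckoo's" stays one token).
TOKEN = re.compile(r"\w+(?:'\w+)*")


def analyze(text: str) -> List[str]:
    """Tokenize, lowercase, then drop stopwords (same order as the filter chain)."""
    tokens = TOKEN.findall(text)
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]
```

Because spell is multivalued and fed by copyField, each title and artist value is analyzed separately in this way before its tokens land in the field.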
 
My query:
http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

Here the intent is to correct "chocolat factry" to "chocolate factory",
which exists in my spell field index. The QTime for the above query is
somewhere between 350 and 400 ms.

I run a similar query with the spellcheck terms replaced by "pursut
hapyness" ("pursuit happyness" actually exists in my spell field) and see a
QTime of 15-17 ms.

Both queries produce correct collations, but there is an order-of-magnitude
difference in QTime. There is one edit per term in both cases, i.e. 2 edits
per query, and the word lengths are the same (8 and 6 characters, just in
swapped order). I'd like to understand why there is such a vast difference
in QTime. I would appreciate any help with this, since I am not sure how I
can get any meaningful performance numbers or attribute the slowness to
anything in particular.
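The "one edit per term" count can be double-checked with a plain Levenshtein distance. This is a textbook dynamic-programming implementation, not Lucene's internal distance (the config uses distanceMeasure=internal, which may score differently), so treat it only as a sanity check; `edits_per_term` is a hypothetical helper that pairs terms by position.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edits_per_term(typed: str, intended: str) -> List[Tuple[str, str, int]]:
    """Pair up whitespace-separated terms by position and report each distance."""
    return [(t, w, levenshtein(t, w))
            for t, w in zip(typed.split(), intended.split())]
```

For "chocolat factry" against "chocolate factory" this reports one edit for each term, and the same holds for "pursut hapyness" and for the "over cuccoo's nst" case below.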

I also see a vast difference in QTime in another case. Replace the search
terms in the above query with "over cuckoo's nest", "over cuccoo's nst",
etc. "over cuckoo's nest" exists in my indexed spell field, so it should be
found almost immediately, yet this query fails to produce any collation and
takes 10 seconds. The second query, "over cuccoo's nst", corrects the phrase
and returns in 24 ms. Something does not seem right here.

I would appreciate help with these.

Thanks in advance.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176.html