I am doing some performance analysis.
There are currently more than 550,000 documents in SOLR, and Tokenizer
(a web crawler, http://www.tokenizer.org) adds about 2,000 new documents
each hour. I was forced to stop the crawler, but even after 20 minutes SOLR
still uses about 60% CPU (two Opteron 252 processors, SLES 10).
I have autocommit set to 1000 and the default merge set to 1000. I haven't
issued "optimize" yet, and I have 738 files in the /solr/data/index folder.
Usually "optimize" does not help (after I reached 400,000 docs).
I want to share some findings... Currently, the database could easily reach
2 million docs (by adding some URLs from the USA), but I am forced to stop
the crawler.
The max number of open files is set to 65000 (SuSE 10 Enterprise Server).
Setting "max open files" to 8192 didn't help. In fact, that number should be
enough, but the OS reports the count with some delay when it is overloaded:
it shows 8000 open files when we have only 3000 (it shows the correct number
after some delay; it is not a truly real-time number).
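To cross-check the numbers above without relying on a delayed OS summary, the files can be counted directly. This is a minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable for the Tomcat process (it normally requires running as the same user or as root). The function names are hypothetical, and the index path and pid in the usage note are placeholders, not values taken from this setup.

```python
import os


def count_open_fds(pid):
    """Count the file descriptors a process currently holds (Linux only).

    Each entry in /proc/<pid>/fd is one open descriptor, so this reflects
    the state at the moment of the call.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def count_index_files(index_dir):
    """Count regular files directly inside the index directory."""
    return sum(1 for entry in os.scandir(index_dir) if entry.is_file())
```

For example, `count_index_files("/solr/data/index")` and `count_open_fds(tomcat_pid)` could be logged once a minute while the crawler runs, where `tomcat_pid` is whatever pid the Tomcat JVM has on the machine.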
Solr runs in Tomcat with 4 GB: -Xms4096M -Xmx4096M
OK. I issued a "commit" (via HTTP XML); it took maybe 10 seconds... I now
have 17 files in the index folder, but SOLR still uses about 66% of the dual
CPU, even though there are no incoming HTTP requests and the robot has
stopped crawling.
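For reference, a commit over HTTP XML amounts to POSTing a `<commit/>` message to the update handler. The following is a minimal sketch, assuming the handler is reachable at a URL such as http://localhost:10080/solr/update (the port is taken from the thread names in the dump; the host and path are assumptions). The function names are hypothetical.

```python
import urllib.request


def build_commit_request(update_url):
    """Build (but do not send) a POST request carrying a <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_commit(update_url, timeout=60):
    """Send the commit and return (HTTP status, response body as text)."""
    request = build_commit_request(update_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

Calling `send_commit("http://localhost:10080/solr/update")` and timing it would give a repeatable measure of how long the commit takes.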
The SOLR index files take 200 MB on disk in total, so I expect that 4 GB for
a dedicated Tomcat is more than enough.
I often see a message like this on the Admin screen:
No deadlock found.
Full Thread Dump:
"http-10080-Processor50" Id=68 in BLOCKED on [EMAIL PROTECTED] total cpu time=72310.0000ms user time=71090.0000ms
    owned by http-10080-Processor9 Id=20
    at org.apache.solr.update.DirectUpdateHandler2.checkCommit(DirectUpdateHandler2.java:566)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:271)
    at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)
No deadlock, yet the thread is BLOCKED.
After a restart I usually have 3-5% CPU usage...
This is SOLR 1.1.
Thanks,
Fuad
P.S.
It is still blocked:
"http-10080-Processor50" Id=68 in BLOCKED on [EMAIL PROTECTED] total cpu time=72310.0000ms user time=71090.0000ms
Why does it show the same numbers after 5 minutes? I pressed F5 in Internet
Explorer, and I don't expect HTTP caching!
cpu time=72310.0000ms user time=71090.0000ms
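One way to tell a cached page from a thread that is simply stuck is to save two dumps a few minutes apart and compare them per thread. The CPU time shown is cumulative for the thread, so a thread that stays BLOCKED the whole time would not be expected to accumulate more of it, and identical numbers do not by themselves prove caching. Below is a minimal sketch that assumes the dump format shown in this message (a quoted thread name, `Id=`, `in STATE`, then `cpu time=...ms user time=...ms` on the same or a following line). The function names are hypothetical.

```python
import re

# Header line of one thread, e.g.: "http-10080-Processor50" Id=68 in BLOCKED on ...
HEADER = re.compile(r'^"(?P<name>[^"]+)"\s+Id=(?P<id>\d+)\s+in\s+(?P<state>[A-Z_]+)')
# Timing fields, which may sit on the header line or on a wrapped line after it.
TIMES = re.compile(r"cpu time=(?P<cpu>[\d.]+)ms\s+user time=(?P<user>[\d.]+)ms")


def parse_threads(dump_text):
    """Return {thread name: {"id", "state", "cpu_ms", "user_ms"}} for a dump."""
    threads = {}
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {
                "id": int(header.group("id")),
                "state": header.group("state"),
                "cpu_ms": None,
                "user_ms": None,
            }
            threads[header.group("name")] = current
        times = TIMES.search(line)
        # Only the first timing line after a header belongs to that thread.
        if times and current is not None and current["cpu_ms"] is None:
            current["cpu_ms"] = float(times.group("cpu"))
            current["user_ms"] = float(times.group("user"))
    return threads


def stalled_threads(before, after):
    """Names of threads BLOCKED in both snapshots with unchanged CPU time."""
    return sorted(
        name
        for name, info in after.items()
        if name in before
        and info["state"] == "BLOCKED"
        and before[name]["state"] == "BLOCKED"
        and info["cpu_ms"] == before[name]["cpu_ms"]
    )
```

If other threads in the two dumps show growing CPU time while this one does not, the page is being refreshed and the thread is genuinely stuck waiting for the lock.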