Re: Loading performance slowdown at ~ 400K documents
Hi Tracy. Can you tell me roughly how much you increased the max heap space to get the improvement, that is, your before and after max heap settings? Many thanks.
Regards,
David

Tracy Flynn wrote:
Thanks for the replies. For a completely different reason, I happened to look at the memory stats for all processes, including the Solr instances, and noticed that the SLOW Solr instance was maxing out with more virtual memory than allocated. After boosting the maximum heap space and restarting, everything started to run 4x-5x faster than before the fix, which is the rate I reasonably expected.
Tracy

On May 9, 2008, at 8:02 AM, Tracy Flynn wrote:
Hi,
I'm starting to see a significant slowdown in loading performance after loading about 400K documents. I go from a load rate of nearly 40 docs/sec to 20-25 docs/sec.
Am I correct in assuming that, during indexing operations, Lucene/Solr tries to hold as much of the index in memory as possible? If so, does the slowdown indicate a need to increase the JVM heap space?
Any ideas/help would be appreciated.
Regards,
Tracy

Details:
Documents are loaded as XML via POST in batches of 1000, with a commit after each batch.
Total current documents: ~450,000
Average document size: 4KB
One indexed text field contains 3KB or so (the body field below, standard type 'text').
Dual Xeon 3 GHz, 4 GB memory
Solr JVM startup options: java -Xms256m -Xmx1000m -jar start.jar

Relevant portion of the schema follows:
<field name="document_id" type="string" indexed="true" stored="true" required="true"/>
<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="languages" type="string" indexed="true" stored="true" required="false"/>
<!-- The value specified for folding_id must be a field of type integer - type sint does not work -->
<field name="folding_id" type="integer" indexed="true" stored="true" required="false" default="0"/>
<field name="document_type" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true" required="false"/>
<field name="body" type="text" indexed="true" stored="true" required="false" compressed="true"/>
<field name="teaser" type="text" indexed="no" stored="true" required="false"/>
<field name="articles_in_category" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="pen_name" type="text" indexed="true" stored="true" required="false"/>
<field name="article_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="article_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="user_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="user_name" type="text" indexed="true" stored="true" required="false"/>
<field name="user_email" type="text" indexed="true" stored="true" required="false"/>
<field name="channel_context" type="sint" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="category_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="category_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="category_title" type="text" indexed="true" stored="true" required="false"/>
<field name="category_keywords" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="category_type" type="text" indexed="true" stored="true" required="false"/>
<field name="channel_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="channel_title" type="text" indexed="true" stored="true" required="false"/>
<field name="helium_rank" type="sint" indexed="false" stored="true" required="false" default="0"/>
<field name="helium_rank_percentile" type="sfloat" indexed="false" stored="true" required="false"/>
<field name="helium_scaled_rank_boost" type="sfloat" indexed="true" stored="true" required="false"/>
<field name="helium_scaled_rank_boost_string" type="string" indexed="true" stored="true" required="false"/>
<!--
<field name="title_popularity" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_recent_popularity" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_views_measure" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_recent_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
-->
<field name="created_date" type="date" indexed="true" stored="true" required="false"/>
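Tracy's fix was a larger maximum heap (-Xmx). As a rough way to see how close a JVM is to that ceiling, the sketch below prints the heap figures a JVM reports about itself through the standard Runtime and MemoryMXBean APIs. It only describes the JVM it runs in; to watch a running Solr instance you would attach a monitoring tool such as jconsole, which reads the same JMX memory data. The class name HeapReport is just for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Prints the heap figures this JVM reports about itself. */
class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Converts a byte count to whole megabytes (rounding down). */
    static long toMegabytes(long bytes) {
        return bytes / MB;
    }

    /** Fraction of the maximum heap that is in use, from 0.0 to 1.0. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();            // upper bound, roughly the -Xmx setting
        long total = rt.totalMemory();        // heap the JVM has currently reserved
        long used = total - rt.freeMemory();  // heap occupied by objects right now
        System.out.printf("max=%dMB total=%dMB used=%dMB (%.0f%% of max)%n",
                toMegabytes(max), toMegabytes(total), toMegabytes(used),
                100 * usedFraction(used, max));

        // The same information through the platform MemoryMXBean (JMX).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("JMX heap: used=%dMB committed=%dMB max=%dMB%n",
                toMegabytes(heap.getUsed()), toMegabytes(heap.getCommitted()),
                toMegabytes(heap.getMax()));
    }
}
```

Running it as java -Xmx1000m HeapReport should report a max close to 1000MB, though often slightly less, since the JVM may exclude part of the heap from maxMemory().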
multi core vs multi app
I am trying to decide whether it is best to work with multiple apps in a Tomcat instance or run a single app with multiple cores. How do these options compare in terms of their impact on RAM requirements? Is anyone using multicore in production who can say whether it is stable enough to use with the replication scripts, etc.? Is there any sort of write-up anywhere on how one could perform a search across multiple cores within the single app? I'm not sure about the efficiency of something like that compared to periodically merging the indexes and searching the merged index instead. It would seem that a search across multiple cores would add slowness, even if it is doable. Any advice appreciated. Many thanks.
Regards,
David
Re: multi core vs multi app
Hi there. Many thanks for your replies. They have helped me determine a direction. It is a great thing to have both options available (and to better understand the pros and cons of each).
Regards,
David

Ryan McKinley wrote:
Otis Gospodnetic wrote:
Quick answers. 2 webapps with one core/index each vs. 1 webapp with 2 cores (but there is also 1 webapp with 2 virtual webapps, one core/index each). If RAM is an issue, I'd think 1 webapp would be slightly gentler on your RAM.

I think we should emphasize the *slightly*. Whatever increase there is from the Solr side is minimal; none of the memory-intensive aspects are shared across cores. The servlet container may add a little more memory for each webapp, though.

Another thing to consider is that in a multi-core setup, if one core freezes the JVM (who knows why), it will freeze everything. In a multi-webapp setup, it would only affect its own webapp.

If you only have a few indexes and the configuration does not change much, you may just want to stick with multiple webapps. Multi-core is good for cases where you want to start/stop/add/modify cores at runtime.

ryan
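Ryan's point about starting, stopping and modifying cores at runtime refers to administrative HTTP requests sent to the multicore admin handler. The sketch below only builds such request URLs for the STATUS, RELOAD and CREATE actions. The base URL and the /admin/cores admin path are assumptions (that is the path used by the Solr 1.3 example solr.xml; trunk builds of this era may declare a different adminPath), so check the value in your own multicore configuration. The class name CoreAdminUrls is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds core admin request URLs; base URL and admin path are assumptions. */
class CoreAdminUrls {
    private final String baseUrl;   // e.g. http://localhost:8983/solr
    private final String adminPath; // whatever adminPath your multicore config declares

    CoreAdminUrls(String baseUrl, String adminPath) {
        this.baseUrl = baseUrl;
        this.adminPath = adminPath;
    }

    /** Lists the cores and their state. */
    String status() {
        return build("STATUS", new LinkedHashMap<String, String>());
    }

    /** Reloads one core, e.g. after editing its schema or solrconfig. */
    String reload(String core) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("core", core);
        return build("RELOAD", params);
    }

    /** Registers a new core from an existing instance directory. */
    String create(String name, String instanceDir) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("name", name);
        params.put("instanceDir", instanceDir);
        return build("CREATE", params);
    }

    private String build(String action, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append(adminPath)
                .append("?action=").append(action);
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append('&').append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        CoreAdminUrls admin = new CoreAdminUrls("http://localhost:8983/solr", "/admin/cores");
        System.out.println(admin.status());
        System.out.println(admin.reload("core0"));
        System.out.println(admin.create("core2", "core2"));
    }
}
```

Keeping the admin path as a constructor argument rather than a constant is deliberate, since it is configuration rather than a fixed part of the API.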
Re: Index availability during merge
Hi Otis. Many thanks for your reply. My inclination is to create a separate slave and merge from it. I was thinking of giving it a different cron cycle (than any other slave) for applying snapshots, so I'd try to determine a reasonable amount of time for a merge before resuming snapinstaller. Can anyone see a problem with this scenario? Many thanks.
Regards,
David

Otis Gospodnetic wrote:
David,
Well, presumably the merging would be done on the master, while the indices on your search slaves would still happily be serving queries. Thus, you really just need to coordinate your index-merging app and the app that sends documents to your Solr master for indexing. Since no new documents will be added and there will be no updates while your merger app is running (and no commit or optimize calls), there will be no new snapshooter calls. Communication between the apps could be as simple as an FS-based file (e.g. /foo/bar/i.am.merging.now-dont.touch.the.index.lock).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Original Message
From: David Pratt [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, February 27, 2008 9:16:08 AM
Subject: Index availability during merge

Hi. Merging indexes requires that the indexes be closed for the operation to occur. I am interested in setting up a cron job to merge indexes that are in use, to generate a fresh consolidated index at specific time intervals. I don't want the smaller indexes to be taken out of service while this occurs. Can someone suggest a strategy that would not result in a loss of availability during merges? Does snapshooter fit into this scenario? Can a safe copy be made while the index is running, etc.? Many thanks.
Regards,
David
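Otis's suggestion of a plain marker file can be sketched as below, assuming the merge job and the document feeder can both see the same filesystem path. The class name MergeLock and the use of the temp directory are hypothetical; the file name is the example from Otis's mail. File.createNewFile is used because it checks for and creates the file in one atomic step, so two merge jobs cannot both believe they created the marker. Its Javadoc does warn against building general file-locking protocols on it, so treat this as the simple advisory flag Otis describes rather than a robust lock.

```java
import java.io.File;
import java.io.IOException;

/** Advisory marker file shared by an index-merging job and a document feeder. */
class MergeLock {
    private final File marker;

    MergeLock(File marker) {
        this.marker = marker;
    }

    /** Merger: returns true if it created the marker, false if it already exists. */
    boolean tryAcquire() {
        try {
            // createNewFile checks for and creates the file as one atomic step
            return marker.createNewFile();
        } catch (IOException e) {
            throw new IllegalStateException("cannot create " + marker, e);
        }
    }

    /** Merger: removes the marker once the merge has finished. */
    void release() {
        if (!marker.delete() && marker.exists()) {
            throw new IllegalStateException("cannot delete " + marker);
        }
    }

    /** Feeder: check before each batch and hold off sending documents while true. */
    boolean mergeInProgress() {
        return marker.exists();
    }

    public static void main(String[] args) {
        File marker = new File(System.getProperty("java.io.tmpdir"),
                "i.am.merging.now-dont.touch.the.index.lock");
        MergeLock lock = new MergeLock(marker);
        if (lock.tryAcquire()) {
            try {
                System.out.println("merging; feeders will see mergeInProgress() == true");
                // ... run the index merge here (not shown) ...
            } finally {
                lock.release();
            }
        } else {
            System.out.println("marker present; another merge is running");
        }
    }
}
```

One design caveat: if the merge job dies before release() runs, the marker stays behind and the feeder keeps holding off, so a real setup would probably want a timestamp check or a manual cleanup step.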
Re: Start of solr 1.3 with patch collapse
Hi kordi. For the benefit of the list, what was the issue and how did you solve it? Many thanks.
Regards,
David

kordi wrote:
I have now solved it myself; sorry for the post.

kordi wrote:
I can't start Solr trunk with the collapse patch. I got the following error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoSuchMethodError: org.apache.lucene.analysis.Token.<init>(IILjava/lang/String;)V
    at org.apache.solr.analysis.SynonymMap.makeTokens(SynonymMap.java:103)
    at org.apache.solr.analysis.SynonymFilterFactory.parseRules(SynonymFilterFactory.java:92)
    at org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:49)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:256)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:84)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:74)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:314)
My start parameters are:
java -Dsolr.solr.home=/opt/solr-tomcat/solr -jar -Xms250M -Xmx250M -verbose:gc bootstrap.jar
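kordi did not say what the fix was, but in general a NoSuchMethodError such as the one on Token.<init> means the calling code was compiled against a version of the class that has that constructor, while the version actually loaded at runtime does not; a mismatched or leftover Lucene jar is a common cause. One way to narrow it down is to ask the JVM which jar the class came from. The sketch below does that; the class name WhereFrom is hypothetical, and it has to run with the same classpath Solr sees (for example the jars in the webapp's WEB-INF/lib), or the locationOf method can be called from code running inside the webapp.

```java
import java.security.CodeSource;

/** Reports which jar or directory a class was actually loaded from. */
class WhereFrom {
    static final String NO_CODE_SOURCE = "(no code source: JDK bootstrap class)";

    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return NO_CODE_SOURCE; // core JDK classes report no code source
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the class named in the NoSuchMethodError, e.g.
        // org.apache.lucene.analysis.Token, using the same classpath Solr uses.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> c = Class.forName(name);
        System.out.println(name + " loaded from " + locationOf(c));
    }
}
```

If the reported jar is not the Lucene version the patched Solr was built against, that mismatch is the likely culprit.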
Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13
Hi Alejandro. Since this was a bit of trouble for you, could you post the steps you used to get it to work (and/or any deviations from the wiki) to summarize this thread? I have been following the thread on the list for some days, and a summary would leave something more useful than "I got it running" for other folks with a similar issue in future. Many thanks.
Regards,
David

Alejandro Valdez wrote:
Thanks a lot, it's running now. It seems that solr.solr.home should not point into the webapps directory; maybe this tip should be included in the installation guide... Thanks again.

On Wed, Feb 20, 2008 at 10:50 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Wed, Feb 20, 2008 at 5:32 PM, Alejandro Valdez [EMAIL PROTECTED] wrote:
Hi, I changed that line to:
set JAVA_OPTS=-Dsolr.home=C:\xampp\tomcat\webapps\solr -Duser.language=en
But it STILL isn't working... I almost gave up :-(
When I try to open http://localhost:8080/solr/admin, I get:
---
HTTP Status 404 - /solr/admin
type Status report
message /solr/admin
description The requested resource (/solr/admin) is not available.
Apache Tomcat/6.0.13
---
Someone should fix the page http://wiki.apache.org/solr/SolrTomcat; it says that -Dsolr.solr.home=... should be used.

solr.solr.home is the correct variable. Try putting the Solr home (the contents of example/solr) outside the webapps directory. Only solr.war should go inside webapps.
You could also try the simple example install from here: http://wiki.apache.org/solr/SolrTomcat
-Yonik
Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13
Hi Alejandro. Your summary is good and it should be of benefit to others. Thank you for taking the time to prepare it.
Regards,
David

Alejandro Valdez wrote:
Hello, yes of course.
I followed the instructions from http://wiki.apache.org/solr/SolrTomcat (see below), but instead of copying the example configuration files into the directory c:\web\solr\ as explained on that page, I copied them into c:\tomcat\webapps\solr and started Tomcat with:
-Dsolr.solr.home=c:\tomcat\webapps\solr
But it didn't work. Apparently the directory used in the solr.solr.home variable MUST NOT point inside Tomcat's webapps directory, or it will be ignored.

The environment I used was:
Windows XP Professional
XAMPP 1.6.4
Tomcat 6.0.13
Sun JDK 5

Updated content of http://wiki.apache.org/solr/SolrTomcat:

Tomcat on Windows
Single Solr app
1) Download and install Tomcat for Windows using the MSI installer. Install it with the tcnative.dll file. Say you installed it in c:\tomcat\
2) Check that Tomcat is installed correctly by going to http://localhost:8080/
3) Change the c:\tomcat\conf\server.xml file to add the URIEncoding Connector element as shown above.
4) Download and unzip the Solr distribution zip file into (say) c:\temp\solrZip\
5) Make a directory called solr where you intend the application server to function, say c:\web\solr\ (Important: it must be outside Tomcat's webapps directory)
6) Copy the contents of the example\solr directory c:\temp\solrZip\example\solr\ to c:\web\solr\
7) Stop the Tomcat service
8) Copy the *solr*.war file from c:\temp\solrZip\dist\ to the Tomcat webapps directory c:\tomcat\webapps\
9) Rename the *solr*.war file to solr.war
10) Use the system tray icon to configure Tomcat to start with the following Java option: -Dsolr.solr.home=c:\web\solr
11) Start the Tomcat service
12) Go to the Solr admin page to verify that the installation is working. It will be at http://localhost:8080/solr/admin

On Thu, Feb 21, 2008 at 4:38 PM, David Pratt [EMAIL PROTECTED] wrote:
Hi Alejandro. Since this was a bit of trouble for you, could you post the steps you used to get it to work (and/or any deviations from the wiki) to summarize this thread? I have been following the thread on the list for some days, and a summary would leave something more useful than "I got it running" for other folks with a similar issue in future. Many thanks.
Regards,
David

Alejandro Valdez wrote:
Thanks a lot, it's running now. It seems that solr.solr.home should not point into the webapps directory; maybe this tip should be included in the installation guide... Thanks again.

On Wed, Feb 20, 2008 at 10:50 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Wed, Feb 20, 2008 at 5:32 PM, Alejandro Valdez [EMAIL PROTECTED] wrote:
Hi, I changed that line to:
set JAVA_OPTS=-Dsolr.home=C:\xampp\tomcat\webapps\solr -Duser.language=en
But it STILL isn't working... I almost gave up :-(
When I try to open http://localhost:8080/solr/admin, I get:
---
HTTP Status 404 - /solr/admin
type Status report
message /solr/admin
description The requested resource (/solr/admin) is not available.
Apache Tomcat/6.0.13
---
Someone should fix the page http://wiki.apache.org/solr/SolrTomcat; it says that -Dsolr.solr.home=... should be used.

solr.solr.home is the correct variable. Try putting the Solr home (the contents of example/solr) outside the webapps directory. Only solr.war should go inside webapps.
You could also try the simple example install from here: http://wiki.apache.org/solr/SolrTomcat
-Yonik
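The two rules that came out of this thread (the property must be solr.solr.home, and the Solr home must not sit inside Tomcat's webapps directory) are easy to check mechanically before restarting Tomcat. The sketch below is a hypothetical helper, SolrHomeCheck, that takes the planned Solr home and the webapps path as arguments; the default webapps path c:\tomcat\webapps is only the example location from the steps. It normalizes paths but does not resolve symbolic links.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Checks a planned solr.solr.home value against the rules from this thread. */
class SolrHomeCheck {
    /** True if child is parent itself or lies somewhere beneath it. */
    static boolean isInside(String child, String parent) {
        Path c = Paths.get(child).toAbsolutePath().normalize();
        Path p = Paths.get(parent).toAbsolutePath().normalize();
        return c.startsWith(p); // compares whole path elements, not string prefixes
    }

    static String check(String solrHome, String webappsDir) {
        if (solrHome == null || solrHome.isEmpty()) {
            return "solr.solr.home is not set (-Dsolr.home is a different property)";
        }
        if (isInside(solrHome, webappsDir)) {
            return solrHome + " is inside " + webappsDir + "; move the Solr home out of webapps";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Usage: java SolrHomeCheck <planned solr.solr.home> <Tomcat webapps dir>
        String solrHome = args.length > 0 ? args[0] : System.getProperty("solr.solr.home");
        String webapps = args.length > 1 ? args[1] : "c:\\tomcat\\webapps";
        System.out.println(check(solrHome, webapps));
    }
}
```

For example, java SolrHomeCheck c:\web\solr c:\tomcat\webapps should print OK, while c:\tomcat\webapps\solr as the first argument should be flagged.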
Re: Building Solr with Maven 2 - Solr-19
Hi Ryan. Thanks for your reply. I don't see anywhere that we have created a war for the example app to use. Jetty will run, but it does not have an app to run.
Regards,
David

Ryan McKinley wrote:
aaah, mvn jetty:run does not work with that pom. You can run the example server from the example directory using:
java -jar start.jar
Check: http://lucene.apache.org/solr/tutorial.html
ryan

David Pratt wrote:
Hey Ryan. You were right about SOLR-303. I checked out trunk from svn, patched the source, ran the pom, manually ran the commons pom, and it all built fine. The only trouble is that I am not providing the right incantation for Maven to start the server. I am sure this will all seem very simple once I have got it working the first time, but I need a hint :-) Many thanks.
Regards,
David

Ryan McKinley wrote:
I just posted the pom.xml I am using with the current trunk code. Give that a go.

src/java/org/apache/solr/handler/component/ShardRequest.java:[60,0] 'class' or 'interface' expected
src/java/src/test/org/apache/solr/TestDistributedSearch.java:[129,0] 'class' or 'interface' expected
src/java/src/java/org/apache/solr/handler/component/ShardDoc.java:[273,0] 'class' or 'interface' expected

This is not in trunk yet; my guess is you are applying a SOLR-303 patch that is not in sync. Make sure 'ant' works before trying to debug Maven (for now).
ryan
Building Solr with Maven 2 - Solr-19
Hi. I am trying to build Solr using the SOLR-19 poms from the issue tracker. I've tried both of the most recent poms, but perhaps my inexperience with Maven is the issue. I have been reading up on Maven but may be missing something.

For the combined pom.xml (Jan 9 2008, Ryan McKinley), I placed it in the source directory and ran:
mvn clean install
The build failed, reporting errors at several line numbers in the following files (I've only cut and pasted the first line for each):
src/java/org/apache/solr/handler/component/ShardRequest.java:[60,0] 'class' or 'interface' expected
src/java/src/test/org/apache/solr/TestDistributedSearch.java:[129,0] 'class' or 'interface' expected
src/java/src/java/org/apache/solr/handler/component/ShardDoc.java:[273,0] 'class' or 'interface' expected
The trace from running mvn -e is below. In any case, this was unsuccessful.

For the pom that builds Solr in separate packages (solr-test-maven.zip, Dec 27 2007, Ryan McKinley), I unzipped the folders and the two poms into the source directory and ran:
mvn clean install
mvn jetty:run
It built successfully (see the result below). The problem was how to run it: mvn jetty:run did not launch the server. Am I missing a step? Any hints would be helpful. Many thanks.
Regards,
David

Build using combined pom.xml:
[INFO]
[INFO] Trace
org.apache.maven.BuildFailureException: Compilation failure
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:560)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:480)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:459)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:282)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
    at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:516)
    at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:114)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539)
    ... 16 more
[INFO]
[INFO] Total time: 23 seconds
[INFO] Finished at: Sat Jan 26 02:00:52 AST 2008
[INFO] Final Memory: 6M/21M
[INFO]

Build using solr-test-maven.zip:
/Users/davidpratt/.m2/repository/org/apache/lucene/solr/solr-server/1.3-SNAPSHOT/solr-server-1.3-SNAPSHOT-tests.jar
[INFO]
[INFO]
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Solr Parent ... SUCCESS [31.160s]
[INFO] Solr Common ... SUCCESS [15.464s]
[INFO] Solr Client ... SUCCESS [7.087s]
[INFO] Solr Core . SUCCESS [9.945s]
[INFO] Solr Server ... SUCCESS [5.514s]
[INFO]
[INFO]
[INFO] BUILD SUCCESSFUL
[INFO]
[INFO] Total time: 1 minute 15 seconds
[INFO] Finished at: Sat Jan 26 10:38:32 AST 2008
[INFO] Final Memory: 12M/25M
[INFO]
Multisearching with Solr
Hi. I am checking out Solr after having some experience with Lucene using PyLucene. I am looking at the potential of Solr to search over a large index divided across multiple servers and collect the results, similar to what ParallelMultiSearcher does in Lucene on its own. From a quick scan of the archives, it appears SOLR-303 may be the answer to this. Can this functionality be incorporated into 1.2 in a sandbox environment? Has anyone written a recipe that would be helpful in getting a sandbox up and running with SOLR-303? It will most likely be a few months before I need to incorporate this type of functionality in production, but I'm hoping to begin experimenting as soon as possible. On that note, is 1.3 expected to be out in a few months? If so, will it include this functionality? Lastly, what sort of load balancing and replication capability is anticipated for multisearching? Many thanks.
Regards,
David
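SOLR-303 is the distributed search work. As it was later released in Solr 1.3, it is driven by a shards request parameter: a query sent to any Solr instance with shards=host:port/path,... is forwarded to each listed shard and the results are merged. The sketch below only builds such a request URL; the hosts, ports and query are placeholders, and since the feature was still an evolving patch at the time of this thread, the parameter details should be checked against the build you actually run. The class name ShardedQuery is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

/**
 * Builds a distributed-search URL using the "shards" parameter.
 * Shard entries are host:port/path, without the http:// prefix.
 */
class ShardedQuery {
    static String build(String coordinatorUrl, List<String> shards, String query) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        StringBuilder joined = new StringBuilder();
        for (String shard : shards) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(shard);
        }
        return coordinatorUrl + "/select?shards=" + encode(joined.toString())
                + "&q=" + encode(query);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Two separate servers; entries such as localhost:8983/solr/core0
        // would address individual cores of one webapp instead.
        List<String> shards = Arrays.asList("localhost:8983/solr", "localhost:7574/solr");
        System.out.println(build("http://localhost:8983/solr", shards, "ipod solr"));
    }
}
```

The coordinator can be any instance, including one of the shards itself; it merges results much as ParallelMultiSearcher does within a single Lucene process, but over HTTP.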
Re: Multisearching with Solr
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the link you provided. Is this message from the solr-user list? Many thanks.
Regards,
David

Erick Erickson wrote:
You can always use the trunk build, but you'll have to check the status of SOLR-303 to be sure it's in the trunk... Here's a thread that discusses this...
http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489
Best,
Erick

On Jan 21, 2008 10:55 AM, David Pratt [EMAIL PROTECTED] wrote:
Hi. I am checking out Solr after having some experience with Lucene using PyLucene. I am looking at the potential of Solr to search over a large index divided across multiple servers and collect the results, similar to what ParallelMultiSearcher does in Lucene on its own. From a quick scan of the archives, it appears SOLR-303 may be the answer to this. Can this functionality be incorporated into 1.2 in a sandbox environment? Has anyone written a recipe that would be helpful in getting a sandbox up and running with SOLR-303? It will most likely be a few months before I need to incorporate this type of functionality in production, but I'm hoping to begin experimenting as soon as possible. On that note, is 1.3 expected to be out in a few months? If so, will it include this functionality? Lastly, what sort of load balancing and replication capability is anticipated for multisearching? Many thanks.
Regards,
David