Re: Loading performance slowdown at ~ 400K documents

2008-05-11 Thread David Pratt
Hi Tracy. Can you tell us how much of an increase in max heap space 
produced the improvement, that is, your before and after max heap 
settings? Many thanks.


Regards,
David

Tracy Flynn wrote:

Thanks for the replies.

For a completely different reason, I happened to look at the memory 
stats for all processes including the SOLR instances. I noticed that the 
SLOW Solr instance was maxing out, using more virtual memory than 
allocated. After boosting the maximum heap space and restarting, 
everything started to run at 4x-5x the speed before the fix - and at the 
rate I reasonably thought it should.


Tracy

On May 9, 2008, at 8:02 AM, Tracy Flynn wrote:


Hi,

I'm starting to see a significant slowdown in loading performance after 
I have loaded about 400K documents.  I go from a load rate of near 40 
docs/sec to 20-25 docs/sec.


Am I correct in assuming that, during indexing operations, Lucene/SOLR 
tries to hold as much of the indexes in memory as possible? If so, 
does the slowdown indicate a need to increase JVM heap space?


Any ideas / help would be appreciated.

Regards,

Tracy




Details

Documents loaded as XML via POST command in batches of 1000, commit 
after each batch
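
A minimal sketch of the batching described above, assuming the documents are available as Python dicts. It only builds the XML <add> messages; the sample data and the helper names (batches, to_add_xml) are hypothetical and not part of the original loader. Each payload would then be POSTed to the Solr update handler and followed by a <commit/> message.

```python
from itertools import islice
from xml.etree import ElementTree as ET


def batches(docs, size=1000):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_add_xml(chunk):
    """Serialize one batch as a Solr <add> update message."""
    add = ET.Element("add")
    for doc in chunk:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample data; real documents would carry the schema's fields.
docs = [{"document_id": f"doc-{i}", "title": f"Title {i}"} for i in range(2500)]
payloads = [to_add_xml(chunk) for chunk in batches(docs)]
print(len(payloads))  # 3 batches: 1000 + 1000 + 500
```

Committing less often than once per batch is a common thing to try when load rates drop, but whether it helps here is not something the thread establishes.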


Total current documents ~ 450,000
Avg document size: 4KB
One indexed text field contains 3KB or so. (body field below - 
standard type 'text')


Dual XEON 3 GHZ 4 GB memory

SOLR JVM Startup options

java -Xms256m -Xmx1000m  -jar start.jar


Relevant portion of the schema follows


  <field name="document_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="language" type="string" indexed="true" stored="true" required="false"/>
  <field name="languages" type="string" indexed="true" stored="true" required="false"/>
  <!-- The value specified for folding_id must be a field of type integer -
       type sint does not work -->
  <field name="folding_id" type="integer" indexed="true" stored="true" required="false" default="0"/>
  <field name="document_type" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true" required="false"/>
  <field name="body" type="text" indexed="true" stored="true" required="false" compressed="true"/>
  <field name="teaser" type="text" indexed="no" stored="true" required="false"/>
  <field name="articles_in_category" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="pen_name" type="text" indexed="true" stored="true" required="false"/>
  <field name="article_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="article_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="user_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="user_name" type="text" indexed="true" stored="true" required="false"/>
  <field name="user_email" type="text" indexed="true" stored="true" required="false"/>
  <field name="channel_context" type="sint" indexed="true" stored="true" required="false" multiValued="true"/>
  <field name="category_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="category_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="category_title" type="text" indexed="true" stored="true" required="false"/>
  <field name="category_keywords" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
  <field name="category_type" type="text" indexed="true" stored="true" required="false"/>
  <field name="channel_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
  <field name="channel_title" type="text" indexed="true" stored="true" required="false"/>
  <field name="helium_rank" type="sint" indexed="false" stored="true" required="false" default="0"/>
  <field name="helium_rank_percentile" type="sfloat" indexed="false" stored="true" required="false"/>
  <field name="helium_scaled_rank_boost" type="sfloat" indexed="true" stored="true" required="false"/>
  <field name="helium_scaled_rank_boost_string" type="string" indexed="true" stored="true" required="false"/>

  <!--
  <field name="title_popularity" type="sint" indexed="true" stored="true" default="0"/>
  <field name="title_recent_popularity" type="sint" indexed="true" stored="true" default="0"/>
  <field name="title_views_measure" type="sint" indexed="true" stored="true" default="0"/>
  <field name="title_recent_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
  <field name="title_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
  -->
  <field name="created_date" type="date" indexed="true" stored="true" required="false" />










multi core vs multi app

2008-03-03 Thread David Pratt
I am trying to decide whether it is best to work with multiple apps in a 
tomcat instance or run a single app with multiple cores. How do these 
options compare in terms of impact on RAM requirements?


Is anyone using multicore in production who could say whether it is 
stable enough to use with replication scripts etc.? Is there any sort of 
write-up anywhere of how one could perform a search across multiple 
cores within the single app? I am not sure about the efficiency of 
something like that, though, compared to merging indexes periodically 
and searching across the merged index instead. It would seem that a 
search across multiple cores would add slowness even if this is doable. 
Any advice appreciated. Many thanks.


Regards,
David


Re: multi core vs multi app

2008-03-03 Thread David Pratt
Hi there. Many thanks for your replies. They have helped me determine a 
direction. It is a great thing to have both options available (and to 
better understand the pros and cons of each).


Regards
David

Ryan McKinley wrote:

Otis Gospodnetic wrote:
Quick answers.  2 webapps one core/index each vs. 1 webapp with 2 
cores (but there is also 1 webapp with 2 virtual webapps, one 
core/index each).  If RAM is an issue, I'd think 1 webapp would be 
slightly gentler on your RAM.




I think we should emphasize the *slightly* -- Whatever increase there is 
from the solr side is minimal -- none of the memory intensive aspects 
are shared across cores.  The servlet container may add a little more 
memory for each webapp though.


Another thing to consider is that in a multi-core setup, if one core 
freezes the JVM (who knows why), it will freeze everything.  In a 
multi-webapp setup, it would only affect its own webapp.


If you only have a few indexes and the configuration does not change 
much, you may just want to stick with multiple webapps.  Multi-core is 
good for cases where you want to start/stop/add/modify cores at runtime.


ryan



Re: Index availability during merge

2008-02-27 Thread David Pratt
Hi Otis. Many thanks for your reply. My inclination is to create a 
separate slave and to merge from it. I was thinking of giving it a 
different cron cycle (than any other slave) for applying snapshots. So 
I'd try to determine a reasonable amount of time for a merge before 
resuming snapinstaller. Can anyone see a problem with this scenario? 
Many thanks.


Regards,
David

Otis Gospodnetic wrote:

David,

Well, presumably the merging would be done on the master, while the indices on 
your search slaves would still happily be serving queries.  Thus, you really 
just need to coordinate your index merging app and the app that sends documents 
to your Solr master for indexing.  Since no new documents will be added and 
there will be no updates while your merger app is running (and no commits and 
optimize calls), there will be no new snapshooter calls.  Communication between 
the apps could be as simple as a file on the filesystem (e.g. 
/foo/bar/i.am.merging.now-dont.touch.the.index.lock).
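
The file-based coordination suggested above could look roughly like the following. This is only a sketch under assumptions: the helper names (index_lock, is_merging) and the temporary lock location are hypothetical, and a real setup would point both apps at an agreed path such as the one in the example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def index_lock(path):
    """Hold an exclusive lock file for the duration of the block.

    O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(path)


def is_merging(path):
    """The indexing app checks this before sending documents."""
    return os.path.exists(path)


# Hypothetical lock location used only for this demonstration.
lock_path = os.path.join(tempfile.mkdtemp(), "i.am.merging.now.lock")

with index_lock(lock_path):
    print(is_merging(lock_path))   # True: the indexer should hold off
print(is_merging(lock_path))       # False: safe to index again
```

The merger app would wrap its merge in index_lock, and the indexing app would poll is_merging before each batch. A crashed merger would leave a stale lock behind, so storing the process id in the file (as done here) gives an operator something to check.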

Otis 


--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----

From: David Pratt [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, February 27, 2008 9:16:08 AM
Subject: Index availability during merge

Hi. Merging indexes requires that the indexes be closed for the 
operation to occur. I am interested in setting up a cron job to merge 
indexes that are in use to generate a fresh consolidated index at 
specific time intervals. I don't want the smaller indexes to be taken 
out of service while this occurs. Can someone suggest a strategy that 
would not result in the loss of availability during merges? Does 
snapshooter fit into this scenario, can a safe copy be made while the 
index is running, etc.? Many thanks.


Regards,
David






Re: Start of solr 1.3 with patch collapse

2008-02-26 Thread David Pratt
Hi kordi. What was the issue and how did you solve it? Posting it would 
benefit the list. Many thanks.


Regards,
David

kordi wrote:

I solved it myself now, sorry for the post.

kordi wrote:

I can't start solr trunk with the collapse patch. I got the following error:

SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoSuchMethodError: org.apache.lucene.analysis.Token.<init>(IILjava/lang/String;)V
    at org.apache.solr.analysis.SynonymMap.makeTokens(SynonymMap.java:103)
    at org.apache.solr.analysis.SynonymFilterFactory.parseRules(SynonymFilterFactory.java:92)
    at org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:49)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:256)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:84)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:74)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:314)
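
The descriptor (IILjava/lang/String;)V in the error above is in the JVM's internal method-descriptor notation. As a reading aid, here is a small sketch of a decoder for that notation; the function names are hypothetical. Decoded, it shows that the JVM could not find a Token constructor taking (int, int, String). NoSuchMethodError often points to a library version on the classpath that differs from the one the code was compiled against, but the thread does not say what the actual cause was here.

```python
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def parse_type(desc, i):
    """Decode one type starting at desc[i]; return (name, next_index)."""
    c = desc[i]
    if c == "[":  # array: decode the element type, then append []
        name, j = parse_type(desc, i + 1)
        return name + "[]", j
    if c == "L":  # object type: Lfully/qualified/Name;
        end = desc.index(";", i)
        return desc[i + 1:end].replace("/", "."), end + 1
    return PRIMITIVES[c], i + 1


def decode_descriptor(desc):
    """Split a method descriptor into (parameter types, return type)."""
    assert desc[0] == "("
    i, params = 1, []
    while desc[i] != ")":
        name, i = parse_type(desc, i)
        params.append(name)
    ret, _ = parse_type(desc, i + 1)
    return params, ret


params, ret = decode_descriptor("(IILjava/lang/String;)V")
print(params, ret)  # ['int', 'int', 'java.lang.String'] void
```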

My startparameters are:

java -Dsolr.solr.home=/opt/solr-tomcat/solr -jar  -Xms250M-Xmx250M 
-verbose:gc bootstrap.jar







Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread David Pratt
Hi Alejandro. Since this was a bit of trouble for you, could you post the 
steps you used to get it to work (and/or any deviation from the wiki) to 
summarize this thread? I have been following the thread on the list for 
some days, and a summary would leave something more useful than "I got 
it running" for other folks with a similar issue in future. Many thanks.


Regards
David

Alejandro Valdez wrote:

Thanks a lot, it's running right now.

It seems that solr.solr.home should not point into the webapps
directory, maybe this tip should be included in the installation
guide...

Thanks again.


On Wed, Feb 20, 2008 at 10:50 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

On Wed, Feb 20, 2008 at 5:32 PM, Alejandro Valdez

[EMAIL PROTECTED] wrote:


Hi, I changed that line to:

 
   set JAVA_OPTS=-Dsolr.home=C:\xampp\tomcat\webapps\solr -Duser.language=en
 
   But It STILL isn't working...I almost give up :-(
 
   When I try to open http://localhost:8080/solr/admin, I get:
 
  ---
   HTTP Status 404 - /solr/admin
   type Status report
   message /solr/admin
   description The requested resource (/solr/admin) is not available.
   Apache Tomcat/6.0.13
   ---
 
 
   Someone should fix the page http://wiki.apache.org/solr/SolrTomcat,
   there says that should be used -Dsolr.solr.home=... :

 solr.solr.home is the correct variable.
 Try putting the solr home (the contents of solr/example) outside the
 webapps directory.  Only solr.war should go inside webapps.

 You could also try the simple example install from here:


http://wiki.apache.org/solr/SolrTomcat

 -Yonik





Re: Solr in Windows XP + JDK 5 + Tomcat 6.0.13

2008-02-21 Thread David Pratt
Hi Alejandro. Your summary is good and it should be of benefit to 
others. Thank you for taking the time to prepare it.


Regards,
David

Alejandro Valdez wrote:

Hello, yes of course.

I followed the instructions from
http://wiki.apache.org/solr/SolrTomcat (see below),
but instead of copying the example configuration files into the directory
c:\web\solr\ as explained on that page, I copied them into
c:\tomcat\webapps\solr and started Tomcat with:
-Dsolr.solr.home=c:\tomcat\webapps\solr

But it didn't work.


Apparently the directory used in solr.solr.home variable MUST NOT
point inside the Tomcat's webapps directory, or it will be ignored.
***

The environment I used was:
Windows XP Professional
XAMPP 1.6.4
Tomcat 6.0.13
Sun JDK 5


Updated content of http://wiki.apache.org/solr/SolrTomcat:

Tomcat on Windows
Single Solr app

1) Download and install Tomcat for Windows using the MSI
installer. Install it with the tcnative.dll file. Say you installed it
in c:\tomcat\
2) Check if Tomcat is installed correctly by going to
http://localhost:8080/
3) Change the c:\tomcat\conf\server.xml file to add the URIEncoding
Connector element as shown above.
4) Download and unzip the Solr distribution zip file into (say) c:\temp\solrZip\
5) Make a directory called solr where you intend the application
server to function, say c:\web\solr\ (Important: It must be outside
the Tomcat's webapps directory)
6) Copy the contents of the example\solr directory
c:\temp\solrZip\example\solr\ to c:\web\solr\
7) Stop the Tomcat service
8) Copy the *solr*.war file from c:\temp\solrZip\dist\ to the Tomcat
webapps directory c:\tomcat\webapps\
9) Rename the *solr*.war file solr.war
10) Use the system tray icon to configure Tomcat to start with the
following Java option: -Dsolr.solr.home=c:\web\solr
11) Start the Tomcat service
12) Go to the solr admin page to verify that the installation is
working. It will be at http://localhost:8080/solr/admin
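
The one constraint that caused trouble in this thread (step 5: the Solr home must be outside Tomcat's webapps directory) can be checked before starting Tomcat. The following is a small sketch with a hypothetical helper name, using the paths from the steps above; it only compares paths and does not touch the filesystem.

```python
from pathlib import PureWindowsPath


def is_inside(child, parent):
    """Return True if `child` is `parent` itself or lies beneath it."""
    try:
        PureWindowsPath(child).relative_to(PureWindowsPath(parent))
        return True
    except ValueError:
        return False


webapps = r"c:\tomcat\webapps"
print(is_inside(r"c:\tomcat\webapps\solr", webapps))  # True: the setup that failed
print(is_inside(r"c:\web\solr", webapps))             # False: the setup that worked
```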







Re: Building Solr with Maven 2 - Solr-19

2008-01-27 Thread David Pratt
Hi Ryan. Thanks for your reply. I don't see anywhere that we have 
created a war for the example app to use. Jetty will run but it does not 
have an app to run.


Regards,
David

Ryan McKinley wrote:

aaah -- mvn jetty:run does not work with that pom

you can run the example server from the example directory using java 
-jar start.jar


check:
http://lucene.apache.org/solr/tutorial.html

ryan


David Pratt wrote:
Hey Ryan. You were right about SOLR-303. I checked out trunk from svn, 
patched source, ran the pom and manually ran the commons pom and it 
all built fine. Only trouble is I am not providing the right 
incantation for maven to start the server. This I am sure will all 
seem very simple once I have got this working at least the first time 
but I need a hint :-) Many thanks.


Regards,
David


Ryan McKinley wrote:
I just posted the pom.xml I am using with the current trunk code.  
Give that a go.



src/java/org/apache/solr/handler/component/ShardRequest.java:[60,0] 
'class' or 'interface' expected
src/java/src/test/org/apache/solr/TestDistributedSearch.java:[129,0] 
'class' or 'interface' expected
src/java/src/java/org/apache/solr/handler/component/ShardDoc.java:[273,0] 
'class' or 'interface' expected


This is not in trunk yet -- my guess is you are applying a SOLR-303 
patch that is not in sync.


make sure 'ant' works before trying to debug maven  (for now)

ryan







Building Solr with Maven 2 - Solr-19

2008-01-26 Thread David Pratt

Hi. I am trying to build solr from the solr-19 poms from the issue tracker.

I've tried both of the most recent poms, but perhaps my inexperience 
with maven is the issue. I have been reading up on maven but 
perhaps I am missing something.


For the combined pom.xml (Jan 9 2008, Ryan McKinley) I placed it in the 
source directory and ran:


mvn clean install

The build failed, reporting errors at several line numbers in the 
following files. I've only cut and pasted the first line for each 
file.


src/java/org/apache/solr/handler/component/ShardRequest.java:[60,0] 
'class' or 'interface' expected
src/java/src/test/org/apache/solr/TestDistributedSearch.java:[129,0] 
'class' or 'interface' expected
src/java/src/java/org/apache/solr/handler/component/ShardDoc.java:[273,0] 
'class' or 'interface' expected


the trace running mvn -e is below. In any case, this was unsuccessful.

For the pom that does the builds with solr in separate packages 
(solr-test-maven.zip, Dec 27 2007, Ryan McKinley) I unzipped the folders 
and the two poms to the source directory and ran


mvn clean install
mvn jetty:run

It built successfully - see result below. The problem was how to run it: 
mvn jetty:run did not launch the server. Am I missing a step? Any hints 
would be helpful.


Many thanks.

Regards,
David




Build using combined pom.xml:

[INFO] Trace
org.apache.maven.BuildFailureException: Compilation failure
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:560)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:480)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:459)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:311)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:278)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:143)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:333)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:126)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:282)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
    at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:516)
    at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:114)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:447)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:539)
    ... 16 more
[INFO] Total time: 23 seconds
[INFO] Finished at: Sat Jan 26 02:00:52 AST 2008
[INFO] Final Memory: 6M/21M


Build using solr-test-maven.zip:

/Users/davidpratt/.m2/repository/org/apache/lucene/solr/solr-server/1.3-SNAPSHOT/solr-server-1.3-SNAPSHOT-tests.jar
[INFO] Reactor Summary:
[INFO] Solr Parent ... SUCCESS [31.160s]
[INFO] Solr Common ... SUCCESS [15.464s]
[INFO] Solr Client ... SUCCESS [7.087s]
[INFO] Solr Core ..... SUCCESS [9.945s]
[INFO] Solr Server ... SUCCESS [5.514s]
[INFO] BUILD SUCCESSFUL
[INFO] Total time: 1 minute 15 seconds
[INFO] Finished at: Sat Jan 26 10:38:32 AST 2008
[INFO] Final Memory: 12M/25M


Multisearching with Solr

2008-01-21 Thread David Pratt
Hi. I am checking out solr after having some experience with lucene 
using pyLucene. I am looking at the potential of solr to search over a 
large index divided over multiple servers to collect results, sort of 
what the parallel multisearcher does in Lucene on its own. From quick 
scan of archives it appears SOLR-303 may be the answer to this. Can this 
functionality be incorporated into 1.2 in a sandbox environment? Has 
anyone written a recipe that would be helpful in getting a sandbox up 
and running with SOLR-303?


It will most likely be a few months before I need to incorporate this 
type of functionality in production, but I am hoping to begin 
experimenting as soon as possible. On that note, is it anticipated that 
1.3 will be out in a few months? If so, will it include this 
functionality? Lastly, what sort of load balancing and replication 
potential is anticipated for the multisearching capability? Many thanks.
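
For readers unfamiliar with the idea, searching an index divided over several servers boils down to querying each partition and merging the ranked results. The following is only a conceptual sketch with hypothetical data and a hypothetical function name; it is not how SOLR-303 or Lucene's multisearcher is implemented.

```python
import heapq
from itertools import islice


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists, each already sorted by descending score."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical (score, doc_id) hits from three index partitions.
shard_a = [(0.92, "a1"), (0.40, "a2")]
shard_b = [(0.88, "b1"), (0.75, "b2"), (0.10, "b3")]
shard_c = [(0.60, "c1")]

top = merge_top_k([shard_a, shard_b, shard_c], k=4)
print(top)
```

One caveat with this kind of merge is that scores computed independently on each partition may not be directly comparable, since term statistics differ per index.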


Regards,
David


Re: Multisearching with Solr

2008-01-21 Thread David Pratt
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the 
link you provided. Is this message from the solr-user list? Many thanks.


Regards,
David

Erick Erickson wrote:

You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...

Here's a thread that discusses this...

http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489

Best
Erick
