Re: How to preserve filelist / commit-points after master restart
If it helps, these are the filelist outputs before and after restarting the master on a sample setup:

Before restarting master:
---
{indexSize=113.82 KB, indexPath=C:\JavaStuff\Solr\replication\solrhome\master\data\index, commits=[{indexVersion=1252480003511,generation=107,filelist=[_35.fdx, _35.frq, _35.tii, _35.fdt, _35.tis, segments_2z, _35.fnm, _35.prx]}], isMaster=true, isSlave=false, indexVersion=1252480003511, generation=107}

http://localhost:8081/master/replication
Poll Interval null
Local Index
Index Version: 1252480003511, Generation: 107
Location: C:\JavaStuff\Solr\replication\solrhome\repeater\data\index
Size: 57.29 KB
Config Files To Replicate: schema.xml,stopwords.txt,synonyms.txt
Trigger Replication On: [optimize, startup]
Times Replicated Since Startup: 25
Previous Replication Done At: Wed May 04 10:51:01 EST 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null

After restarting master:
---
{indexSize=113.82 KB, indexPath=C:\JavaStuff\Solr\replication\solrhome\master\data\index, commits=[], isMaster=true, isSlave=false, indexVersion=1252480003512, generation=108}

Master http://localhost:8081/master/replication
Poll Interval null
Local Index
Index Version: 1252480003511, Generation: 107
Location: C:\JavaStuff\Solr\replication\solrhome\repeater\data\index
Size: 57.29 KB
Config Files To Replicate: schema.xml,stopwords.txt,synonyms.txt
Trigger Replication On: [optimize, startup]
Times Replicated Since Startup: 25
Previous Replication Done At: Wed May 04 10:51:01 EST 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null

I hope someone is able to help. Thanks

On Wed, May 4, 2011 at 3:46 PM, Maduranga Kannangara madura...@gmail.com wrote:

Hi All,

We use Solr 1.4.1 in a single-core setup with a repeater (for QA) and a few slaves (for production). The master indexes many sources and gets the data ready; once all the data is ready for production, an optimize takes place. On the master, replicateAfter is set to optimize (and on the repeater, replicateAfter=commit,startup). We do not want to use replicateAfter=startup,optimize on the master, as that would release bad data: a whole set of sources has to fit together to make a sensible product, so we use replicateAfter=optimize to signal that the data is now okay to move to the next level.

The problem is that when the master is restarted, the filelist command on the ReplicationHandler returns nothing, and replication will not take place until another optimize is done on the master. How can I preserve the optimized state (or filelist, or commit points -- I am not sure which keyword to use) across a master restart so that the slaves can carry on from there? (I saw the mail thread Yonik answered, "Replication filelist command failure on container restart", but I am trying to figure out whether it is possible to persist this file list, or the indexDeletionPolicy, or whatever that state is -- please correct my terminology, and sorry for the layman's language.)

We have too many master indexes set up in this way, so it is not a good idea for us to run optimize on every index or to set replicateAfter=startup on each one, as that would reduce data quality or the possible level of automation. Any solution or workaround for this issue is highly appreciated.

Thanks in advance
Madu
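For anyone reproducing the dumps above, a minimal hedged sketch (plain Java, no extra libraries) that asks the master's ReplicationHandler the same questions: command=indexversion for the currently advertised version, then command=filelist for that version's files. The base URL and the index version value are the ones from the sample setup above; adjust for your own master.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CheckMasterFilelist {

    // Fetch a URL and return the raw response body (the handler replies in XML by default).
    private static String fetch(String url) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        StringBuilder out = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            out.append(line).append('\n');
        }
        in.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Master URL taken from the sample setup above; change to suit your environment.
        String base = "http://localhost:8081/master/replication";

        // 1. Which index version/generation is the master advertising right now?
        System.out.println(fetch(base + "?command=indexversion"));

        // 2. Which files belong to that version? (1252480003511 is the value from the
        //    "before restart" dump above; paste in whatever step 1 reported.)
        System.out.println(fetch(base + "?command=filelist&indexversion=1252480003511"));

        // After the master restart described above, the filelist response comes back empty,
        // which is the symptom being discussed in this thread.
    }
}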
How to preserve filelist / commit-points after master restart
Hi All,

We use Solr 1.4.1 in a single-core setup with a repeater (for QA) and a few slaves (for production). The master indexes many sources and gets the data ready; once all the data is ready for production, an optimize takes place. On the master, replicateAfter is set to optimize (and on the repeater, replicateAfter=commit,startup). We do not want to use replicateAfter=startup,optimize on the master, as that would release bad data: a whole set of sources has to fit together to make a sensible product, so we use replicateAfter=optimize to signal that the data is now okay to move to the next level.

The problem is that when the master is restarted, the filelist command on the ReplicationHandler returns nothing, and replication will not take place until another optimize is done on the master. How can I preserve the optimized state (or filelist, or commit points -- I am not sure which keyword to use) across a master restart so that the slaves can carry on from there? (I saw the mail thread Yonik answered, "Replication filelist command failure on container restart", but I am trying to figure out whether it is possible to persist this file list, or the indexDeletionPolicy, or whatever that state is -- please correct my terminology, and sorry for the layman's language.)

We have too many master indexes set up in this way, so it is not a good idea for us to run optimize on every index or to set replicateAfter=startup on each one, as that would reduce data quality or the possible level of automation. Any solution or workaround for this issue is highly appreciated.

Thanks in advance
Madu
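On the indexDeletionPolicy point raised above: Solr exposes Lucene's commit-point retention in solrconfig.xml through solr.SolrDeletionPolicy, with maxCommitsToKeep and maxOptimizedCommitsToKeep parameters (these names are from the Solr 1.4 example config; the values below are only illustrative). This only controls which commit points stay on disk across restarts; whether the ReplicationHandler advertises one of them again after a restart is governed by replicateAfter, which is exactly the gap described in this thread, so treat the snippet as a sketch rather than a fix.

<!-- solrconfig.xml, inside <mainIndex>: a sketch, not a drop-in fix -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- number of ordinary commit points to keep -->
  <str name="maxCommitsToKeep">1</str>
  <!-- also keep the most recent optimized commit so it survives on disk -->
  <str name="maxOptimizedCommitsToKeep">1</str>
</deletionPolicy>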
RE: Solr Deployment Question
They are two web applications running on a single Tomcat instance.

Thanks
Madu

-----Original Message-----
From: findbestopensource [mailto:findbestopensou...@gmail.com]
Sent: Friday, 14 May 2010 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Deployment Question

Please explain how you have handled two indexes in a single VM. Is it multi-core? To identify memory consumption, you need to calculate the used memory before and after loading the indexes -- basically, calculate the used memory before and after any checkpoint you want to analyse. The difference will give you the actual memory consumption.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 11:14 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

But even when we used a single index, we were running out of memory. What do you mean by "active"? There are no queries on the masters; only one index is being processed/optimized at a time. Also, if I may add to my own question: how can I find the amount of memory an index would use, theoretically? I.e., is there a formula?

Thanks
Madu

-----Original Message-----
From: findbestopensource [mailto:findbestopensou...@gmail.com]
Sent: Friday, 14 May 2010 3:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Deployment Question

You may use one index at a time, but both indexes are active and have loaded all their terms into memory. Memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

Hi,

We use separate JVMs to index and query (client applications query only the slaves, while the master does only indexing). Recently we moved two master indexes into a single JVM. The memory allocations for the two indexes had been 512MB and 1GB. Once we moved both indexes into the single JVM, we thought it would still index within 1GB, since we use only one index at a time. But to our surprise it needed more than that (1.2GB), even though only one index was in use at a time. Can anyone tell me why, or how I can find out why this is?

Solr 1.4
Java 1.6.0_20
We use a VPS for deployment.

Thanks in advance
Madu
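To make the measurement Aditya describes concrete, here is a minimal sketch; loadIndex() is just a placeholder for whatever checkpoint you want to bracket (for example, opening a core or warming a searcher), and the numbers are only approximate because the JVM is free to ignore System.gc().

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapDelta {

    // Placeholder for the work you want to measure (e.g. opening a core/searcher).
    static void loadIndex() {
        // ... open the index / warm the searcher here ...
    }

    // Rough "used heap" reading; System.gc() is best-effort and only reduces noise.
    static long usedHeap(MemoryMXBean mem) {
        System.gc();
        return mem.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long before = usedHeap(mem);
        loadIndex();
        long after = usedHeap(mem);
        System.out.printf("approx. heap used by the checkpoint: %.1f MB%n",
                (after - before) / (1024.0 * 1024.0));
    }
}

This only gives a rough, point-in-time figure; as far as I know there is no simple closed-form formula for an index's memory footprint, since it depends on term counts, norms, caches, sort fields, and so on.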
Solr Deployment Question
Hi,

We use separate JVMs to index and query (client applications query only the slaves, while the master does only indexing). Recently we moved two master indexes into a single JVM. The memory allocations for the two indexes had been 512MB and 1GB. Once we moved both indexes into the single JVM, we thought it would still index within 1GB, since we use only one index at a time. But to our surprise it needed more than that (1.2GB), even though only one index was in use at a time. Can anyone tell me why, or how I can find out why this is?

Solr 1.4
Java 1.6.0_20
We use a VPS for deployment.

Thanks in advance
Madu
RE: Solr Deployment Question
But even we used a single index, we were running out of memory. What do you mean by active? No queries on the masters. Only one index is being processed/optimized. Also, if I may add to my same question, how can I find the amount of memory that an index would use, theoretically? i.e.: Is there a formulae etc? Thanks Madu -Original Message- From: findbestopensource [mailto:findbestopensou...@gmail.com] Sent: Friday, 14 May 2010 3:34 PM To: solr-user@lucene.apache.org Subject: Re: Solr Deployment Question You may use one index at a time, but both indexes are active and loaded all its terms in memory. Memory consumption will be certainly more. Regards Aditya http://www.findbestopensource.com On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote: Hi We use separate JVMs to Index and Query. (Client applications will query only slaves, while master does only indexing) Recently we moved a two master indexes to a single JVM. Our memory allocation was for each index was 512Mb and 1Gb. Once we moved both indexes to a single VM, we thought it would still Index using 1Gb as we use only one index at a time. But for our surprise it needed more than that (1.2Gb) even though only one index was used at a time. Can I know why, or can I know how to find why this is? Solr 1.4 Java 1.6.0_20 We use a VPS for deployment. Thanks in advance Madu
RE: Solr replication 1.3 issue
Yes, that makes sense, Lance. Thanks. For the moment we have split our script so that the master's part (starting rsyncd) runs on the master server itself. Our problem is that we have so many instances running in different environments, and we'd really love to minimize the number of them. :-)

Thanks again.
Madu

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Wednesday, 23 December 2009 9:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr replication 1.3 issue

This is a Unix security question. rsyncd is a system daemon and should be managed in the OS scripts. The rsyncd-* scripts include a security setting for the 'solr' account that limits the account to the Solr data directory (and this code does not support multicore). For security reasons, I personally would not make a sudoer out of a user with an automated remote login; but every site is different.

On Mon, Dec 21, 2009 at 7:54 PM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

Hi All,

We're trying to replicate indexes on Solr 1.3 across environments (Dev, QA, Staging, Prod, etc.). At each stage other than Dev and Prod, an instance acts as both a master and a slave at a given time.

We hit a bottleneck (maybe?) when we try to run rsyncd-start on the master from the slave machine. Commands used:

ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-enable
ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-start -p 18003

On the slave, the following error is displayed:

@RSYNCD: 29
@ERROR: protocol startup error

In the master's logs, the following was found:

2009/12/21 22:46:05 enabled by admin
2009/12/21 22:46:05 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:05 ended (elapsed time: 0 sec)
2009/12/21 22:46:09 started by admin
2009/12/21 22:46:09 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:46:09 [16964] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:46:09 [16964] connect from UNKNOWN (localhost)
2009/12/21 22:46:29 [16964] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
2009/12/21 22:46:29 [16964] rsync error: error in rsync protocol data stream (code 12) at io.c(463) [receiver=2.6.8]
2009/12/21 22:46:44 rsyncd not accepting connections, exiting
2009/12/21 22:46:57 enabled by admin
2009/12/21 22:46:57 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:57 rsyncd already currently enabled
2009/12/21 22:46:57 exited (elapsed time: 0 sec)
2009/12/21 22:47:00 started by admin
2009/12/21 22:47:00 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:47:00 [17115] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:47:00 [17115] connect from UNKNOWN (localhost)
2009/12/21 22:49:18 rsyncd not accepting connections, exiting

Is it not possible to start the rsync daemon on the master from the slave? The user that we use is on the sudoers list as well.

Thanks
Madu

--
Lance Norskog
goks...@gmail.com
Profiling Solr
Hi All,

Recently we noticed that some of our heavily loaded Solr instances are running into memory-leak-like situations: the JVM goes into full GC, and because it cannot release any memory, broken-pipe and socket errors follow. (This happens for us on both Solr 1.3 and 1.4.)

Is there a good tool (preferably open source) that we could use to profile the Tomcat instance Solr is deployed in, to figure out what is happening -- whether it is connection keep-alive, bad queries, a bad schema configuration, etc.? Sorry about the layman's language.

Thanks in advance for all the responses!
Madu
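Not a profiler, but a small sketch like the following (using only the standard java.lang.management beans) can be left running inside any JVM, or adapted into a servlet, to log heap occupancy and GC activity over time; steadily climbing "used" figures with ever-longer GC times usually point to a leak rather than query spikes. The one-minute interval and plain System.out logging are placeholders. For an actual profiler, VisualVM (jvisualvm, bundled with recent Sun JDK 6 releases) or a heap dump taken with jmap and analysed offline are common free options.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapWatcher implements Runnable {

    @Override
    public void run() {
        while (true) {
            // Current heap occupancy, in MB.
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            StringBuilder line = new StringBuilder(
                    String.format("heap used=%dMB committed=%dMB max=%dMB",
                            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20));
            // Cumulative GC counts and times per collector.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                line.append(String.format(" | %s: count=%d time=%dms",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
            }
            System.out.println(line);
            try {
                Thread.sleep(60000L);  // log once a minute
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new HeapWatcher(), "heap-watcher").start();
    }
}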
Solr replication 1.3 issue
Hi All,

We're trying to replicate indexes on Solr 1.3 across environments (Dev, QA, Staging, Prod, etc.). At each stage other than Dev and Prod, an instance acts as both a master and a slave at a given time.

We hit a bottleneck (maybe?) when we try to run rsyncd-start on the master from the slave machine. Commands used:

ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-enable
ssh -o StrictHostKeyChecking=no ad...@192.168.22.1 /solr/SolrHome/bin/rsyncd-start -p 18003

On the slave, the following error is displayed:

@RSYNCD: 29
@ERROR: protocol startup error

In the master's logs, the following was found:

2009/12/21 22:46:05 enabled by admin
2009/12/21 22:46:05 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:05 ended (elapsed time: 0 sec)
2009/12/21 22:46:09 started by admin
2009/12/21 22:46:09 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:46:09 [16964] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:46:09 [16964] connect from UNKNOWN (localhost)
2009/12/21 22:46:29 [16964] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
2009/12/21 22:46:29 [16964] rsync error: error in rsync protocol data stream (code 12) at io.c(463) [receiver=2.6.8]
2009/12/21 22:46:44 rsyncd not accepting connections, exiting
2009/12/21 22:46:57 enabled by admin
2009/12/21 22:46:57 command: /solr/SolrHome/bin/rsyncd-enable
2009/12/21 22:46:57 rsyncd already currently enabled
2009/12/21 22:46:57 exited (elapsed time: 0 sec)
2009/12/21 22:47:00 started by admin
2009/12/21 22:47:00 command: /solr/SolrHome/bin/rsyncd-start -p 18993
2009/12/21 22:47:00 [17115] forward name lookup for devserver002 failed: ai_family not supported
2009/12/21 22:47:00 [17115] connect from UNKNOWN (localhost)
2009/12/21 22:49:18 rsyncd not accepting connections, exiting

Is it not possible to start the rsync daemon on the master from the slave? The user that we use is on the sudoers list as well.

Thanks
Madu
RE: Segment file not found error - after replicating
The permanent solution we found was to add:

1. A flush() before closing the segments.gen file write (in Lucene).
2. Removal of the slave's segments.gen before replication.

Point 1 elaborated: in Lucene 2.4, org.apache.lucene.index.SegmentInfos.finishCommit(Directory dir), the writing of the segments.gen file was changed to:

public final void prepareCommit(Directory dir) throws IOException {
  . . .
  try {
    IndexOutput genOutput = dir.createOutput(IndexFileNames.SEGMENTS_GEN);
    try {
      genOutput.writeInt(FORMAT_LOCKLESS);
      genOutput.writeLong(generation);
      genOutput.writeLong(generation);
    } finally {
      genOutput.flush();   // this is the simple change!
      genOutput.close();
    }
  } catch (Throwable t) {
    // It's OK if we fail to write this file since it's
    // used only as one of the retry fallbacks.
  }
}

I believe, if this makes sense, we should add this simple line to Lucene! :-) That said, the Java replication introduced in Solr 1.4, being an application-level process, may already have solved this issue in another way; we are yet to test that.

Thanks
Madu

-----Original Message-----
From: Maduranga Kannangara
Sent: Monday, 16 November 2009 2:39 PM
To: solr-user@lucene.apache.org
Subject: RE: Segment file not found error - after replicating

Yes, I too believed so. The logic in the method mentioned earlier calculates the generation number both from the segments files available on disk (genA) and from the segments.gen file content (genB). Whichever is larger is the generation used to look up the segments file. When that file is not properly replicated (because it was never written to disk, or was not rsynced) and the generation in segments.gen (genB) is larger than the file-based calculation (genA), we hit the aforesaid issue. (A simplified sketch of this lookup appears after this message.)

Cheers
Madu

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, 16 November 2009 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

That's odd -- that file is normally not used; it's a backup method to figure out the current generation in case it cannot be determined with a directory listing -- it's basically for NFS.

Maduranga Kannangara wrote:

Just found out the root cause:

* The segments.gen file does not get replicated to the slave every time. For some reason this small (20-byte) file lives in memory and does not get written out to the master's hard disk, so obviously it is not transferred to the slaves.

The solution was to shut down the master web app (it must be a clean shutdown, not a kill of Tomcat) and then do the replication. Also, if the timestamp/size is unchanged (the size won't change anyway!), rsync does not seem to copy this file over either, so forcing it in the replication scripts solved the problem.

Thanks Otis and everyone for all your support!
Madu

-----Original Message-----
From: Maduranga Kannangara
Sent: Monday, 16 November 2009 12:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Segment file not found error - after replicating

Yes. We have tried Solr 1.4 and so far it has been a great success. Still, I am investigating why Solr 1.3 gave an issue like this. Currently it seems to me that org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() is not able to figure out the correct segments file name. (Maybe an index replication issue -- leading to a not-fully-replicated index -- but that is hard to believe, as both master and slave have 100% the same data now!) Anyway, I will keep trying until I find something useful, and will let you know.
Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, 11 November 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara mkannang...@infomedia.com.au To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:42:44 PM Subject: RE: Segment file not found error - after replicating Thanks Otis, I did the du -s for all three index directories as you said right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31Mb). BTW, if I copy the segment_x file to what Solr is looking for, and restart the Solr web-app from Tomcat manager, this resolves. But it's just a work around, never good enough for the production deployments. My next plan is to do a remote debug to see what exactly happening in the code. Any other things I should looking at? Any help is really appreciated on this matter
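A simplified, illustrative sketch of the generation lookup described in the quoted messages above -- whichever generation is larger wins: genA derived from the segments_N files visible in a directory listing, or genB read from segments.gen. This is a paraphrase of the behaviour discussed in the thread, not Lucene's actual code; listDirGeneration and readSegmentsGenFile are hypothetical helpers, and the base-36 naming is the same convention that makes segments_2z correspond to generation 107 earlier in this thread.

import java.io.File;

// Illustrative only: why a stale or mismatched segments.gen can point at a missing segments_N file.
public class GenerationChoice {

    // genA: highest generation found by listing segments_N files in the index directory.
    static long listDirGeneration(File indexDir) {
        long max = -1;
        String[] files = indexDir.list();
        if (files != null) {
            for (String name : files) {
                if (name.startsWith("segments_")) {
                    // the generation is encoded in base 36 after the underscore (segments_2z -> 107)
                    max = Math.max(max, Long.parseLong(name.substring("segments_".length()), 36));
                }
            }
        }
        return max;
    }

    // genB: generation recorded inside segments.gen (placeholder -- a real reader would parse the file).
    static long readSegmentsGenFile(File indexDir) {
        return -1;
    }

    public static void main(String[] args) {
        File indexDir = new File(args[0]);
        long genA = listDirGeneration(indexDir);
        long genB = readSegmentsGenFile(indexDir);
        long gen = Math.max(genA, genB);   // the larger of the two wins
        System.out.println("would open: segments_" + Long.toString(gen, 36));
        // If segments.gen made it across but the matching segments_N did not (or vice versa),
        // the chosen name will not exist on disk -- the FileNotFoundException seen in this thread.
    }
}

With the flush fix quoted above, or with the slave's segments.gen removed before replication as Madu suggests, the two sources stop disagreeing.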
RE: Segment file not found error - after replicating
Yes. We have tried Solr 1.4 and so far its been great success. Still I am investigating why Solr 1.3 gave an issue like before. Currently seems to me org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to figure out correct segment file name. (May be index replication issue -- leading to not fully replicated.. but its so hard to believe as both master and slave are having 100% same data now!) Anyway.. will keep on trying till I find something useful.. and will let you know. Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, 11 November 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara mkannang...@infomedia.com.au To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:42:44 PM Subject: RE: Segment file not found error - after replicating Thanks Otis, I did the du -s for all three index directories as you said right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31Mb). BTW, if I copy the segment_x file to what Solr is looking for, and restart the Solr web-app from Tomcat manager, this resolves. But it's just a work around, never good enough for the production deployments. My next plan is to do a remote debug to see what exactly happening in the code. Any other things I should looking at? Any help is really appreciated on this matter. Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating Madu, So are you saying that all slaves have the exact same index, and that index is exactly the same as the one on the master, yet only some of those slaves exhibit this error, while others do not? Mind listing index directories of 1) master 2) slave without errors, 3) slave with errors and doing: du -s /path/to/index/on/master du -s /path/to/index/on/slave/without/errors du -s /path/to/index/on/slave/with/errors Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Mon, November 9, 2009 7:47:04 PM Subject: RE: Segment file not found error - after replicating Thanks Otis! Yes, I checked the index directories and they are 100% same, both timestamp and size wise. Not all the slaves face this issue. I would say roughly 50% has this trouble. Logs do not have any errors too :-( Any other things I should do/look at? Cheers Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 9:26 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It's hard to troubleshoot blindly like this, but have you tried manually comparing the contents of the index dir on the master and on the slave(s)? If they are out of sync, have you tried forcing of replication to see if one of the subsequent replication attempts gets the dirs in sync? 
Do you have more than 1 slave and do they all start having this problem at the same time? Any errors in the logs for any of the scripts involved in replication in 1.3? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Sun, November 8, 2009 10:30:44 PM Subject: Segment file not found error - after replicating Hi guys, We use Solr 1.3 for indexing large amounts of data (50G avg) on Linux environment and use the replication scripts to make replicas those live in load balancing slaves. The issue we face quite often (only in Linux servers) is that they tend to not been able to find the segment file (segment_x etc) after the replicating completed. As this has become quite common, we started hitting a serious issue. Below is a stack trace, if that helps and any help on this matter is greatly appreciated. Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
RE: Segment file not found error - after replicating
Just found out the root cause:

* The segments.gen file does not get replicated to the slave every time. For some reason this small (20-byte) file lives in memory and does not get written out to the master's hard disk, so obviously it is not transferred to the slaves.

The solution was to shut down the master web app (it must be a clean shutdown, not a kill of Tomcat) and then do the replication. Also, if the timestamp/size is unchanged (the size won't change anyway!), rsync does not seem to copy this file over either, so forcing it in the replication scripts solved the problem.

Thanks Otis and everyone for all your support!
Madu

-----Original Message-----
From: Maduranga Kannangara
Sent: Monday, 16 November 2009 12:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Segment file not found error - after replicating

Yes. We have tried Solr 1.4 and so far it has been a great success. Still, I am investigating why Solr 1.3 gave an issue like this. Currently it seems to me that org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() is not able to figure out the correct segments file name. (Maybe an index replication issue -- leading to a not-fully-replicated index -- but that is hard to believe, as both master and slave have 100% the same data now!) Anyway, I will keep trying until I find something useful, and will let you know.

Thanks
Madu

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, 11 November 2009 10:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message -----
From: Maduranga Kannangara mkannang...@infomedia.com.au
To: solr-user@lucene.apache.org
Sent: Tue, November 10, 2009 5:42:44 PM
Subject: RE: Segment file not found error - after replicating

Thanks Otis,

I did the du -s for all three index directories as you said, right after replicating and when I find errors. All three gave me the exact same value. This time I found the error on a rather small index too (31MB). BTW, if I copy the segments_x file to what Solr is looking for and restart the Solr web app from the Tomcat manager, it resolves the problem. But it's just a workaround, never good enough for production deployments. My next plan is to do a remote debug to see what exactly is happening in the code. Anything else I should be looking at? Any help is really appreciated on this matter.

Thanks
Madu

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, 10 November 2009 1:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

Madu,

So are you saying that all slaves have the exact same index, and that index is exactly the same as the one on the master, yet only some of those slaves exhibit this error, while others do not?
Mind listing index directories of 1) master 2) slave without errors, 3) slave with errors and doing: du -s /path/to/index/on/master du -s /path/to/index/on/slave/without/errors du -s /path/to/index/on/slave/with/errors Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Mon, November 9, 2009 7:47:04 PM Subject: RE: Segment file not found error - after replicating Thanks Otis! Yes, I checked the index directories and they are 100% same, both timestamp and size wise. Not all the slaves face this issue. I would say roughly 50% has this trouble. Logs do not have any errors too :-( Any other things I should do/look at? Cheers Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 9:26 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It's hard to troubleshoot blindly like this, but have you tried manually comparing the contents of the index dir on the master and on the slave(s)? If they are out of sync, have you tried forcing of replication to see if one of the subsequent replication attempts gets the dirs in sync? Do you have more than 1 slave and do they all start having this problem at the same time? Any errors in the logs for any of the scripts involved in replication in 1.3? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Sun, November 8
RE: Segment file not found error - after replicating
Yes, I too believed so.. The logic in earlier said method does the gen number calculation using segment files available (genA) and using segment.gen file content (genB). Which ever larger, would be the gen number used to look up for segment file. When the file is not properly replicated (due to that is not being written to hard disk, or rsync ed) and segment gen number in the segment.gen file (genB) is larger than the file based calculation (genA) we hit the pre-said issue. Cheers Madu -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, 16 November 2009 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating Thats odd - that file is normally not used - its a backup method to figure out the current generation in case it cannot be determined with a directory listing - its basically for NFS. Maduranga Kannangara wrote: Just found out the root cause: * The segments.gen file does not get replicated to slave all the time. For some reason, this small (20bytes) file lives in memory and does not get updated to the master's hard disk. Therefore it is not obviously transferred to slaves. Solution was to shut down the master web app (must be a clean shut down!, not kill of Tomcat). Then do the replication. Also, if the timestamp/size (size won't change anyway!) is not changed, Rsync does not seem to copy over this file too. So enforcing in the replication scripts solved the problem. Thanks Otis and everyone for all your support! Madu -Original Message- From: Maduranga Kannangara Sent: Monday, 16 November 2009 12:37 PM To: solr-user@lucene.apache.org Subject: RE: Segment file not found error - after replicating Yes. We have tried Solr 1.4 and so far its been great success. Still I am investigating why Solr 1.3 gave an issue like before. Currently seems to me org.apache.lucene.index.SegmentInfos.FindSegmentFile.run() is not able to figure out correct segment file name. (May be index replication issue -- leading to not fully replicated.. but its so hard to believe as both master and slave are having 100% same data now!) Anyway.. will keep on trying till I find something useful.. and will let you know. Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, 11 November 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara mkannang...@infomedia.com.au To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:42:44 PM Subject: RE: Segment file not found error - after replicating Thanks Otis, I did the du -s for all three index directories as you said right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31Mb). BTW, if I copy the segment_x file to what Solr is looking for, and restart the Solr web-app from Tomcat manager, this resolves. But it's just a work around, never good enough for the production deployments. My next plan is to do a remote debug to see what exactly happening in the code. Any other things I should looking at? Any help is really appreciated on this matter. 
Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating Madu, So are you saying that all slaves have the exact same index, and that index is exactly the same as the one on the master, yet only some of those slaves exhibit this error, while others do not? Mind listing index directories of 1) master 2) slave without errors, 3) slave with errors and doing: du -s /path/to/index/on/master du -s /path/to/index/on/slave/without/errors du -s /path/to/index/on/slave/with/errors Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Mon, November 9, 2009 7:47:04 PM Subject: RE: Segment file not found error - after replicating Thanks Otis! Yes, I checked the index directories and they are 100% same, both timestamp and size wise. Not all the slaves face this issue. I would say roughly 50% has this trouble. Logs do not have any errors too :-( Any other things I should do/look at? Cheers Madu -Original Message
RE: Segment file not found error - after replicating
Thanks Otis, I did the du -s for all three index directories as you said right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31Mb). BTW, if I copy the segment_x file to what Solr is looking for, and restart the Solr web-app from Tomcat manager, this resolves. But it's just a work around, never good enough for the production deployments. My next plan is to do a remote debug to see what exactly happening in the code. Any other things I should looking at? Any help is really appreciated on this matter. Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating Madu, So are you saying that all slaves have the exact same index, and that index is exactly the same as the one on the master, yet only some of those slaves exhibit this error, while others do not? Mind listing index directories of 1) master 2) slave without errors, 3) slave with errors and doing: du -s /path/to/index/on/master du -s /path/to/index/on/slave/without/errors du -s /path/to/index/on/slave/with/errors Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara mkannang...@infomedia.com.au To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Mon, November 9, 2009 7:47:04 PM Subject: RE: Segment file not found error - after replicating Thanks Otis! Yes, I checked the index directories and they are 100% same, both timestamp and size wise. Not all the slaves face this issue. I would say roughly 50% has this trouble. Logs do not have any errors too :-( Any other things I should do/look at? Cheers Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 9:26 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It's hard to troubleshoot blindly like this, but have you tried manually comparing the contents of the index dir on the master and on the slave(s)? If they are out of sync, have you tried forcing of replication to see if one of the subsequent replication attempts gets the dirs in sync? Do you have more than 1 slave and do they all start having this problem at the same time? Any errors in the logs for any of the scripts involved in replication in 1.3? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Sun, November 8, 2009 10:30:44 PM Subject: Segment file not found error - after replicating Hi guys, We use Solr 1.3 for indexing large amounts of data (50G avg) on Linux environment and use the replication scripts to make replicas those live in load balancing slaves. The issue we face quite often (only in Linux servers) is that they tend to not been able to find the segment file (segment_x etc) after the replicating completed. As this has become quite common, we started hitting a serious issue. Below is a stack trace, if that helps and any help on this matter is greatly appreciated. 
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created gap: org.apache.solr.highlight.GapFragmenter Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created regex: org.apache.solr.highlight.RegexFragmenter Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created html: org.apache.solr.highlight.HtmlFormatter Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960) at org.apache.solr.core.SolrCore.(SolrCore.java:470) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119) at org.apache.solr.servlet.SolrDispatchFilter.init
RE: Segment file not found error - after replicating
Thanks Otis! Yes, I checked the index directories and they are 100% the same, both timestamp- and size-wise. Not all the slaves face this issue; I would say roughly 50% have this trouble. The logs do not have any errors either :-( Anything else I should do or look at?

Cheers
Madu

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Tuesday, 10 November 2009 9:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

It's hard to troubleshoot blindly like this, but have you tried manually comparing the contents of the index dir on the master and on the slave(s)? If they are out of sync, have you tried forcing replication to see if one of the subsequent replication attempts gets the dirs in sync? Do you have more than one slave, and do they all start having this problem at the same time? Any errors in the logs for any of the scripts involved in replication in 1.3?

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message -----
From: Maduranga Kannangara mkannang...@infomedia.com.au
To: solr-user@lucene.apache.org
Sent: Sun, November 8, 2009 10:30:44 PM
Subject: Segment file not found error - after replicating

Hi guys, We use Solr 1.3 for indexing large amounts of data (50G avg) on Linux environment and use the replication scripts to make replicas those live in load balancing slaves. The issue we face quite often (only in Linux servers) is that they tend to not been able to find the segment file (segment_x etc) after the replicating completed. As this has become quite common, we started hitting a serious issue. Below is a stack trace, if that helps and any help on this matter is greatly appreciated. Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created gap: org.apache.solr.highlight.GapFragmenter Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created regex: org.apache.solr.highlight.RegexFragmenter Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created html: org.apache.solr.highlight.HtmlFormatter Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start SOLR.
Check solr/home property java.lang.RuntimeException: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960) at org.apache.solr.core.SolrCore.(SolrCore.java:470) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363) at org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099) at org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916) at org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536) at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114) at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233
Segment file not found error - after replicating
Hi guys,

We use Solr 1.3 for indexing large amounts of data (50GB on average) in a Linux environment, and we use the replication scripts to make replicas that live on load-balancing slaves. The issue we face quite often (only on the Linux servers) is that the slaves tend not to be able to find the segments file (segments_x etc.) after replication completes. As this has become quite common, it is now a serious issue for us. Below is a stack trace, in case that helps; any help on this matter is greatly appreciated.

Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created gap: org.apache.solr.highlight.GapFragmenter
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created regex: org.apache.solr.highlight.RegexFragmenter
Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
INFO: created html: org.apache.solr.highlight.HtmlFormatter
Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
        at org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
        at org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916)
        at org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536)
        at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)