Overseer Leader gone
Hi All, I have a cluster whose overseer leader is gone. This is on Solr version 4.10.3. It's completely gone from ZooKeeper, and bouncing any instance does not start a new election process. Has anyone experienced this issue before, and any ideas on how to fix it? Thanks, Rishi.
Re: Multiple index.timestamp directories using up disk space
We use the following merge policy on SSDs, running on physical machines with Linux: mergeFactor 10, ConcurrentMergeScheduler with maxThreadCount 3 and maxMergeCount 15, and 64 (tags lost in the archive). Not sure if it's very aggressive, but it's something we keep to prevent deleted documents taking up too much space in our index. Is there some error message that Solr logs when rename and deletion of the directories fails? If so we could monitor our logs to get a better idea of the root cause. At present we can only react when things go wrong, based on disk space alarms. Thanks, Rishi. -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-index-timestamp-directories-using-up-disk-space-tp4201098p4204145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple index.timestamp directories using up disk space
Hi Shawn, Thanks for clarifying Lucene segment behaviour. We don't trigger optimize externally; could it be an internal Solr optimize? Is there a setting/knob to control when optimize occurs? Thanks for pointing it out, will monitor memory closely. Though I doubt memory is an issue; these are top-tier machines with 144GB RAM supporting 12 x 4GB JVMs, out of which 9 JVMs are running in cloud mode writing to SSD, so there should be enough memory left over for the OS cache. The behaviour we see is multiple huge directories for the same core. Till we figure out what's going on, the only option we are left with is to clean up the entire index to free up disk space and allow a replica to sync from scratch. Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Tue, May 5, 2015 10:55 am Subject: Re: Multiple index.timestamp directories using up disk space On 5/5/2015 7:29 AM, Rishi Easwaran wrote: > Worrying about data loss makes sense. If I understand the way solr behaves, the new directory should only have missing/changed segments. > I guess since our application is extremely write heavy, with lots of inserts and deletes, almost every segment is touched even during a short window, so it appears that for our deployment every segment is copied over when replicas get out of sync. Once a segment is written, it is *NEVER* updated again. This aspect of Lucene indexes makes Solr replication more efficient. The IDs of deleted documents are written to separate files specifically for tracking deletes. Those files are typically quite small compared to the index segments. Any new documents are inserted into new segments. When older segments are merged, the information in all of those segments is copied to a single new segment (minus documents marked as deleted), and then the old segments are erased. Optimizing replaces the entire index, and each replica of the index would be considered different, so an index recovery that happens after optimization might copy the whole thing.
If you are seeing a lot of index recoveries during normal operation, chances are that your Solr servers do not have enough resources, and the resource that has the most impact on performance is memory. The amount of memory required for good Solr performance is higher than most people expect. It's a normal expectation that programs require memory to run, but Solr has an additional memory requirement that often surprises people -- the need for a significant OS disk cache: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
Solr/ Solr Cloud meetup at Aol
Hi All, Aol is hosting a meetup in Dulles, VA. The topic this time is Solr/SolrCloud. http://www.meetup.com/Code-Brew/events/53217/ Thanks, Rishi.
Re: Multiple index.timestamp directories using up disk space
Worrying about data loss makes sense. If I understand the way solr behaves, the new directory should only have missing/changed segments. I guess since our application is extremely write heavy, with lots of inserts and deletes, almost every segment is touched even during a short window, so it appears that for our deployment every segment is copied over when replicas get out of sync. Thanks for clarifying this behaviour of solr cloud so we can put in external steps to resolve this situation when it arises. -Original Message- From: Ramkumar R. Aiyengar To: solr-user Sent: Tue, May 5, 2015 4:52 am Subject: Re: Multiple index.timestamp directories using up disk space Yes, data loss is the concern. If the recovering replica is not able to retrieve the files from the leader, it at least has an older copy. Also, the entire index is not fetched from the leader, only the segments which have changed. The replica initially gets the file list from the leader, checks against what it has, and then downloads the difference -- then moves it to the main index. Note that this process can fail sometimes (say due to I/O errors, or due to a problem with the leader itself), in which case the replica drops all accumulated files from the leader and starts from scratch. If that happens, it needs to look back at its old index again to figure out what it needs to download on the next attempt. Maybe with a fair number of assumptions, which should usually hold good, you could still come up with a mechanism to drop existing files, but those assumptions won't hold in case of serious issues with the cloud, and you could end up losing data. That's worse than using a bit more disk space! On 4 May 2015 11:56, "Rishi Easwaran" wrote: Thanks for the responses Mark and Ramkumar. The question I had was: why does Solr need 2 copies at any given time, leading to 2x disk space usage? This information is not published anywhere, which makes HW estimation almost impossible for large scale deployment.
Even if the copies are temporary, this becomes really expensive, especially when using SSDs in production, when the complex size is over 400TB of indexes, running thousands of solr cloud shards. If a solr follower has decided that it needs to replicate from the leader and capture a full copy snapshot, why can't it delete the old information and replicate from scratch, not requiring more disk space? Is the concern data loss (a case where both leader and follower lose data)? Thanks, Rishi. -Original Message- From: Mark Miller To: solr-user Sent: Tue, Apr 28, 2015 10:52 am Subject: Re: Multiple index.timestamp directories using up disk space If copies of the index are not eventually cleaned up, I'd file a JIRA to address the issue. Those directories should be removed over time. At times there will have to be a couple around at the same time, and others may take a while to clean up. - Mark On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar < andyetitmo...@gmail.com> wrote: > SolrCloud does need up to twice the amount of disk space as your usual > index size during replication. Amongst other things, this ensures you have > a full copy of the index at any point. There's no way around this; I would > suggest you provision the additional disk space needed. > On 20 Apr 2015 23:21, "Rishi Easwaran" wrote: > > > Hi All, > > > > We are seeing this problem with solr 4.6 and solr 4.10.3. > > For some reason, solr cloud tries to recover and creates a new index > > directory (ex: index.20150420181214550), while keeping the older index as > > is. This creates an issue where the disk space fills up and the shard > > never ends up recovering. > > Usually this requires a manual intervention of bouncing the instance and > > wiping the disk clean to allow for a clean recovery. > > > > Any ideas on how to prevent solr from creating multiple copies of the index > > directory? > > > > Thanks, > > Rishi. > > >
Re: Solr Cloud reclaiming disk space from deleted documents
Thanks Shawn.. yeah, regular optimize might be the route we take if this becomes a recurring issue. I remember in our old multicore deployment the CPU used to spike and the core almost became non-responsive. My guess is that with the solr cloud architecture, any slack by the leader while optimizing is picked up by the replicas. I was searching around for the optimize behaviour of solr cloud and could not find much information. Does anyone have experience running optimize for solr cloud in a loaded production env? Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Mon, May 4, 2015 9:11 am Subject: Re: Solr Cloud reclaiming disk space from deleted documents On 5/4/2015 4:55 AM, Rishi Easwaran wrote: > Sadly, with the size of our complex, splitting and adding more HW is not a viable long term solution. > I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively, even before solr cloud gets into this situation. If you are regularly deleting most of your index, or reindexing large parts of it (which effectively does the same thing), then regular optimizes may be required to keep the index size down, although you must remember that you need enough room for the core to grow in order to actually complete the optimize. If the core is 75-90 percent deleted docs, then you will not need 2x the core size to optimize it, because the new index will be much smaller. Currently, SolrCloud will always optimize the entire collection when you ask for an optimize on any core, but it will NOT optimize all the replicas (cores) at the same time. It will go through the cores that make up the collection and optimize each one in sequence. If your index is sharded and replicated enough, hopefully that will make it possible for the optimize to complete even though the amount of disk space available may be low.
We have at least one issue in Jira where users have asked for optimize to honor distrib=false, which would allow the user to be in complete control of all optimizing, but so far that hasn't been implemented. The volunteers that maintain Solr can only accomplish so much in the limited time they have available. Thanks, Shawn
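For anyone wanting to try it, the optimize Shawn describes is just an update message posted to a core's /update handler. A minimal sketch (host and collection name are placeholders):

```xml
<!-- POST to http://localhost:8983/solr/<collection>/update with
     Content-Type: text/xml; host and collection are placeholders.
     In SolrCloud 4.x this optimizes the entire collection, one
     core at a time. -->
<optimize waitSearcher="false"/>
```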
Re: Multiple index.timestamp directories using up disk space
Walter, unless I am missing something here.. I completely get that when a few segments merge, Solr requires 2x the space of those segments to accomplish this. Usually any index has multiple segment files, so this fragmented 2x space consumption is not an issue, even as merged segments grow bigger. But what I am talking about is a copy of the whole index as-is into a new directory. The new directory has no relation to the older index directory or its segments, so I'm not sure what merges are going on across directories/indexes, and why Solr needs the older index. Thanks, Rishi. -Original Message- From: Walter Underwood To: solr-user Sent: Mon, May 4, 2015 9:50 am Subject: Re: Multiple index.timestamp directories using up disk space One segment is in use, being searched. That segment (and others) are merged into a new segment. After the new segment is ready, searches are directed to the new copy and the old copies are deleted. That is how two copies are needed. If you cannot provide 2X the disk space, you will not have a stable Solr installation. You should consider a different search engine. “Optimizing” (forced merges) will not help. It will probably cause failures more often, because it always merges the largest segment. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 4, 2015, at 3:53 AM, Rishi Easwaran wrote: > Thanks for the responses Mark and Ramkumar. > > The question I had was: why does Solr need 2 copies at any given time, leading to 2x disk space usage? > This information is not published anywhere, which makes HW estimation almost impossible for large scale deployment. Even if the copies are temporary, this becomes really expensive, especially when using SSDs in production, when the complex size is over 400TB of indexes, running thousands of solr cloud shards. > > If a solr follower has decided that it needs to replicate from the leader and capture a full copy snapshot,
why can't it delete the old information and replicate from scratch, not requiring more disk space? > Is the concern data loss (a case where both leader and follower lose data)? > > Thanks, > Rishi. > > -Original Message- > From: Mark Miller > To: solr-user > Sent: Tue, Apr 28, 2015 10:52 am > Subject: Re: Multiple index.timestamp directories using up disk space > > If copies of the index are not eventually cleaned up, I'd file a JIRA to address the issue. Those directories should be removed over time. At times there will have to be a couple around at the same time and others may take a while to clean up. > > - Mark > > On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar <andyetitmo...@gmail.com> wrote: >> SolrCloud does need up to twice the amount of disk space as your usual index size during replication. Amongst other things, this ensures you have a full copy of the index at any point. There's no way around this; I would suggest you provision the additional disk space needed. >> On 20 Apr 2015 23:21, "Rishi Easwaran" wrote: >>> Hi All, >>> We are seeing this problem with solr 4.6 and solr 4.10.3. >>> For some reason, solr cloud tries to recover and creates a new index directory (ex: index.20150420181214550), while keeping the older index as is. This creates an issue where the disk space fills up and the shard never ends up recovering. >>> Usually this requires a manual intervention of bouncing the instance and wiping the disk clean to allow for a clean recovery. >>> Any ideas on how to prevent solr from creating multiple copies of the index directory? >>> Thanks, >>> Rishi.
Re: Multiple index.timestamp directories using up disk space
Thanks for the responses Mark and Ramkumar. The question I had was: why does Solr need 2 copies at any given time, leading to 2x disk space usage? This information is not published anywhere, which makes HW estimation almost impossible for large scale deployment. Even if the copies are temporary, this becomes really expensive, especially when using SSDs in production, when the complex size is over 400TB of indexes, running thousands of solr cloud shards. If a solr follower has decided that it needs to replicate from the leader and capture a full copy snapshot, why can't it delete the old information and replicate from scratch, not requiring more disk space? Is the concern data loss (a case where both leader and follower lose data)? Thanks, Rishi. -Original Message- From: Mark Miller To: solr-user Sent: Tue, Apr 28, 2015 10:52 am Subject: Re: Multiple index.timestamp directories using up disk space If copies of the index are not eventually cleaned up, I'd file a JIRA to address the issue. Those directories should be removed over time. At times there will have to be a couple around at the same time, and others may take a while to clean up. - Mark On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar < andyetitmo...@gmail.com> wrote: > SolrCloud does need up to twice the amount of disk space as your usual > index size during replication. Amongst other things, this ensures you have > a full copy of the index at any point. There's no way around this; I would > suggest you provision the additional disk space needed. > On 20 Apr 2015 23:21, "Rishi Easwaran" wrote: > > > Hi All, > > > > We are seeing this problem with solr 4.6 and solr 4.10.3. > > For some reason, solr cloud tries to recover and creates a new index > > directory (ex: index.20150420181214550), while keeping the older index as > > is. This creates an issue where the disk space fills up and the shard > > never ends up recovering. > > Usually this requires a manual intervention of bouncing the instance and > > wiping the disk clean to allow for a clean recovery. > > > > Any ideas on how to prevent solr from creating multiple copies of the index > > directory? > > > > Thanks, > > Rishi. > > >
Re: Solr Cloud reclaiming disk space from deleted documents
Sadly, with the size of our complex, splitting and adding more HW is not a viable long term solution. I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively, even before solr cloud gets into this situation. Thanks, Rishi. -Original Message- From: Gili Nachum To: solr-user Sent: Mon, Apr 27, 2015 4:18 pm Subject: Re: Solr Cloud reclaiming disk space from deleted documents To prevent it from reoccurring, you could monitor index size and, once above a certain size threshold, add another machine and split the shard between the existing and new machine. On Apr 20, 2015 9:10 PM, "Rishi Easwaran" wrote: > So is there anything that can be done from a tuning perspective to recover a shard that is 75%-90% full, other than getting rid of the index and rebuilding the data? > Also, to prevent this issue from re-occurring, looks like we need to make our system aggressive with segment merges, using a lower merge factor. > > Thanks, > Rishi. > > -Original Message- > From: Shawn Heisey > To: solr-user > Sent: Mon, Apr 20, 2015 11:25 am > Subject: Re: Solr Cloud reclaiming disk space from deleted documents > > On 4/20/2015 8:44 AM, Rishi Easwaran wrote: > > Yeah I noticed that. Looks like optimize won't work since on some disks we are already pretty full. > > Any thoughts on increasing/decreasing the mergeFactor (10) or ConcurrentMergeScheduler threads to make solr do merges faster? > > You don't have to do an optimize to need 2x disk space. Even normal merging, if it happens just right, can require the same disk space as a full optimize. Normal Solr operation requires that you have enough space for your index to reach at least double size on occasion. > > Higher merge factors are better for indexing speed, because merging happens less frequently. Lower merge factors are better for query speed, at least after the merging finishes, because merging happens more frequently and there are fewer total segments at any given moment. > > During a merge, there is so much I/O that query speed is often negatively affected. > > Thanks, > Shawn
Multiple index.timestamp directories using up disk space
Hi All, We are seeing this problem with solr 4.6 and solr 4.10.3. For some reason, solr cloud tries to recover and creates a new index directory (ex: index.20150420181214550) while keeping the older index as is. This creates an issue where the disk space fills up and the shard never ends up recovering. Usually this requires a manual intervention of bouncing the instance and wiping the disk clean to allow for a clean recovery. Any ideas on how to prevent solr from creating multiple copies of the index directory? Thanks, Rishi.
Re: Solr Cloud reclaiming disk space from deleted documents
So is there anything that can be done from a tuning perspective to recover a shard that is 75%-90% full, other than getting rid of the index and rebuilding the data? Also, to prevent this issue from re-occurring, looks like we need to make our system aggressive with segment merges, using a lower merge factor. Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Mon, Apr 20, 2015 11:25 am Subject: Re: Solr Cloud reclaiming disk space from deleted documents On 4/20/2015 8:44 AM, Rishi Easwaran wrote: > Yeah I noticed that. Looks like optimize won't work since on some disks we are already pretty full. > Any thoughts on increasing/decreasing the mergeFactor (10) or ConcurrentMergeScheduler threads to make solr do merges faster? You don't have to do an optimize to need 2x disk space. Even normal merging, if it happens just right, can require the same disk space as a full optimize. Normal Solr operation requires that you have enough space for your index to reach at least double size on occasion. Higher merge factors are better for indexing speed, because merging happens less frequently. Lower merge factors are better for query speed, at least after the merging finishes, because merging happens more frequently and there are fewer total segments at any given moment. During a merge, there is so much I/O that query speed is often negatively affected. Thanks, Shawn
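One merge-policy knob relevant to "becoming aggressive with merges", not mentioned in the thread itself, so treat it as a suggestion to verify: TieredMergePolicy's reclaimDeletesWeight, which biases merge selection toward segments carrying many deleted docs. A hedged solrconfig.xml sketch:

```xml
<!-- indexConfig fragment; the 4.0 value is illustrative
     (the Lucene default in the 4.x era was 2.0). Higher values
     reclaim delete space sooner at the cost of extra merge I/O. -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <double name="reclaimDeletesWeight">4.0</double>
</mergePolicy>
```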
Re: Solr Cloud reclaiming disk space from deleted documents
Yeah I noticed that. Looks like optimize won't work since on some disks we are already pretty full. Any thoughts on increasing/decreasing the mergeFactor (10) or ConcurrentMergeScheduler threads to make solr do merges faster? -Original Message- From: Gili Nachum To: solr-user Sent: Sun, Apr 19, 2015 12:34 pm Subject: Re: Solr Cloud reclaiming disk space from deleted documents I assume you don't have much free space available on your disk. Notice that during optimization (merge into a single segment) your shard replica space usage may peak to 2x-3x of its normal size until optimization completes. Is it a problem? Not if optimization occurs over shards serially and your index is broken into many small shards. On Apr 18, 2015 1:54 AM, "Rishi Easwaran" wrote: > Thanks Shawn for the quick reply. > Our indexes are running on SSD, so 3 should be ok. > Any recommendation on bumping it up? > > I guess we will have to run optimize for the entire solr cloud and see if we can reclaim space. > > Thanks, > Rishi. > > -Original Message- > From: Shawn Heisey > To: solr-user > Sent: Fri, Apr 17, 2015 6:22 pm > Subject: Re: Solr Cloud reclaiming disk space from deleted documents > > On 4/17/2015 2:15 PM, Rishi Easwaran wrote: > > Running into an issue and wanted to see if anyone had some suggestions. > > We are seeing this with both solr 4.6 and 4.10.3 code. > > We are running an extremely update heavy application, with millions of writes and deletes happening to our indexes constantly. An issue we are seeing is that solr cloud is not reclaiming the disk space that can be used for new inserts by cleaning up deletes. > > > > We used to run optimize periodically with our old multicore setup, not sure if that works for solr cloud. > > > > Num Docs: 28762340 > > Max Doc: 48079586 > > Deleted Docs: 19317246 > > > > Version 1429299216227 > > Gen 16525463 > > Size 109.92 GB > > > > In our solrconfig.xml we use the following configs (tags lost in the archive): false 1000 2147483647 1 10, mergePolicy class="org.apache.lucene.index.TieredMergePolicy", mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" with maxThreadCount 3 and maxMergeCount 15, and 64. > > This part of my response won't help the issue you wrote about, but it can affect performance, so I'm going to mention it. If your indexes are stored on regular spinning disks, reduce mergeScheduler/maxThreadCount to 1. If they are stored on SSD, then a value of 3 is OK. Spinning disks cannot do seeks (read/write head moves) fast enough to handle multiple merging threads properly. All the seek activity required will really slow down merging, which is a very bad thing when your indexing load is high. SSD disks do not have to seek, so multiple threads are OK there. > > An optimize is the only way to reclaim all of the disk space held by deleted documents. Over time, as segments are merged automatically, deleted doc space will be automatically recovered, but it won't be perfect, especially as segments are merged multiple times into very large segments. > > If you send an optimize command to a core/collection in SolrCloud, the entire collection will be optimized ... the cloud will do one shard replica (core) at a time until the entire collection has been optimized. There is no way (currently) to ask it to only optimize a single core, or to do multiple cores simultaneously, even if they are on different servers. > > Thanks, > Shawn
Re: Solr Cloud reclaiming disk space from deleted documents
Thanks Shawn for the quick reply. Our indexes are running on SSD, so 3 should be ok. Any recommendation on bumping it up? I guess we will have to run optimize for the entire solr cloud and see if we can reclaim space. Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Apr 17, 2015 6:22 pm Subject: Re: Solr Cloud reclaiming disk space from deleted documents On 4/17/2015 2:15 PM, Rishi Easwaran wrote: > Running into an issue and wanted to see if anyone had some suggestions. > We are seeing this with both solr 4.6 and 4.10.3 code. > We are running an extremely update heavy application, with millions of writes and deletes happening to our indexes constantly. An issue we are seeing is that solr cloud is not reclaiming the disk space that can be used for new inserts by cleaning up deletes. > > We used to run optimize periodically with our old multicore setup, not sure if that works for solr cloud. > > Num Docs: 28762340 > Max Doc: 48079586 > Deleted Docs: 19317246 > > Version 1429299216227 > Gen 16525463 > Size 109.92 GB > > In our solrconfig.xml we use the following configs (tags lost in the archive): false 1000 2147483647 1 10, TieredMergePolicy, ConcurrentMergeScheduler with maxThreadCount 3 and maxMergeCount 15, and 64. This part of my response won't help the issue you wrote about, but it can affect performance, so I'm going to mention it. If your indexes are stored on regular spinning disks, reduce mergeScheduler/maxThreadCount to 1. If they are stored on SSD, then a value of 3 is OK. Spinning disks cannot do seeks (read/write head moves) fast enough to handle multiple merging threads properly. All the seek activity required will really slow down merging, which is a very bad thing when your indexing load is high. SSD disks do not have to seek, so multiple threads are OK there. An optimize is the only way to reclaim all of the disk space held by deleted documents.
Over time, as segments are merged automatically, deleted doc space will be automatically recovered, but it won't be perfect, especially as segments are merged multiple times into very large segments. If you send an optimize command to a core/collection in SolrCloud, the entire collection will be optimized ... the cloud will do one shard replica (core) at a time until the entire collection has been optimized. There is no way (currently) to ask it to only optimize a single core, or to do multiple cores simultaneously, even if they are on different servers. Thanks, Shawn
Solr Cloud reclaiming disk space from deleted documents
Hi All, Running into an issue and wanted to see if anyone had some suggestions. We are seeing this with both solr 4.6 and 4.10.3 code. We are running an extremely update heavy application, with millions of writes and deletes happening to our indexes constantly. An issue we are seeing is that solr cloud is not reclaiming the disk space that can be used for new inserts by cleaning up deletes. We used to run optimize periodically with our old multicore setup, not sure if that works for solr cloud. Num Docs: 28762340 Max Doc: 48079586 Deleted Docs: 19317246 Version 1429299216227 Gen 16525463 Size 109.92 GB In our solrconfig.xml we use the following configs (tags lost in the archive): false 1000 2147483647 1 10 3 15 64. Any suggestions on which tunables to adjust (mergeFactor, mergeScheduler thread counts, etc.) would be great. Thanks, Rishi.
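The solrconfig.xml snippet above lost its XML tags in the archive. The leading values (false, 1000, 2147483647, 1) can't be mapped back to tags reliably, but cross-checking against the copies quoted elsewhere in the thread, the merge-related portion was presumably along these lines (the ramBufferSizeMB mapping for the 64 is a guess):

```xml
<indexConfig>
  <mergeFactor>10</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">3</int>
    <int name="maxMergeCount">15</int>
  </mergeScheduler>
  <!-- The stray "64" is assumed to be the RAM buffer size. -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>
```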
Re: Basic Multilingual search capability
Hi Tom, Thanks for your inputs. I was planning to use a stopword filter, but will definitely make sure they are unique and do not step over each other. I think for our system even going with a length of 50-75 should be fine; will definitely up that number after doing some analysis on our input. Just one clarification: when you say ICUFilterFactory, am I correct in thinking it's ICUFoldingFilterFactory? Thanks, Rishi. -Original Message- From: Tom Burton-West To: solr-user Sent: Wed, Feb 25, 2015 4:33 pm Subject: Re: Basic Multilingual search capability Hi Rishi, As others have indicated, multilingual search is very difficult to do well. At HathiTrust we've been using the ICUTokenizer and ICUFilterFactory to deal with having materials in 400 languages. We also added the CJKBigramFilter to get better precision on CJK queries. We don't use stop words because stop words in one language are content words in another. For example, "die" in German is a stopword, but it is a content word in English. Putting multiple languages in one index can affect word frequency statistics, which makes relevance ranking less accurate. So for example, for the English query "Die Hard" the word "die" would get a low idf score because it occurs so frequently in German. We realize that our approach does not produce the best results, but given the 400 languages and limited resources, we do our best to make search "not suck" for non-English languages. When we have the resources, we are thinking about doing special processing for a small fraction of the top 20 languages. We plan to select those languages that most need special processing and are relatively easy to disambiguate from other languages. If you plan on identifying languages (rather than scripts), you should be aware that most language detection libraries don't work well on short texts such as queries. If you know that you have scripts for which you have content in only one language, you can use script detection instead of language detection.
If you have German, a filter length of 25 might be too low (because of compounding). You might want to analyze a sample of your German text to find a good length. Tom http://www.hathitrust.org/blogs/Large-scale-Search On Wed, Feb 25, 2015 at 10:31 AM, Rishi Easwaran wrote: > Hi Alex, > > Thanks for the suggestions. These steps will definitely help out with our use case. > Thanks for the idea about the lengthFilter to protect our system. > > Thanks, > Rishi. > > -Original Message- > From: Alexandre Rafalovitch > To: solr-user > Sent: Tue, Feb 24, 2015 8:50 am > Subject: Re: Basic Multilingual search capability > > Given the limited needs, I would probably do something like this: > > 1) Put a language identifier in the UpdateRequestProcessor chain during indexing and route out at least known problematic languages, such as Chinese, Japanese, and Arabic, into individual fields. > 2) Put everything else together into one field with ICUTokenizer, maybe also ICUFoldingFilter. > 3) At the very end of that joint filter chain, stick in a LengthFilter with some high number, e.g. 25 characters max. This will ensure that super-long words from non-space languages and edge conditions do not break the rest of your system. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ > > On 23 February 2015 at 23:14, Walter Underwood wrote: >> I understand relevancy, stemming etc. becomes extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: When the document contains hello or здравствуйте, the analyzer creates tokens and provides exact match search results.
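A schema.xml sketch of the setup Tom describes (ICUTokenizer, ICUFoldingFilter, CJKBigramFilter, no stopwords); the field type name is made up:

```xml
<fieldType name="text_multi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Script-aware Unicode tokenization across many languages -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Unicode case folding and diacritic removal -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Bigram CJK tokens for better precision on CJK queries -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```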
Re: Basic Multilingual search capability
Hi Alex, Thanks for the suggestions. These steps will definitely help out with our use case. Thanks for the idea about the lengthFilter to protect our system. Thanks, Rishi. -Original Message- From: Alexandre Rafalovitch To: solr-user Sent: Tue, Feb 24, 2015 8:50 am Subject: Re: Basic Multilingual search capability Given the limited needs, I would probably do something like this: 1) Put a language identifier in the UpdateRequestProcessor chain during indexing and route out at least known problematic languages, such as Chinese, Japanese, Arabic into individual fields 2) Put everything else together into one field with ICUTokenizer, maybe also ICUFoldingFilter 3) At the very end of that joint filter, stick in LengthFilter with some high number, e.g. 25 characters max. This will ensure that super-long words from non-space languages and edge conditions do not break the rest of your system. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 23 February 2015 at 23:14, Walter Underwood wrote: >> I understand relevancy, stemming etc becomes extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: When the document contains hello or здравствуйте, the analyzer creates tokens and provides exact match search results.
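A rough sketch of Alex's three steps; the langid parameters, field names, and the 25-character cap are illustrative, not taken from any actual config:

```xml
<!-- solrconfig.xml: step 1, detect language at index time and route
     known problematic languages into per-language fields -->
<updateRequestProcessorChain name="langid">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">body</str>
    <str name="langid.langField">language</str>
    <str name="langid.whitelist">zh,ja,ar</str>
    <bool name="langid.map">true</bool>
    <str name="langid.fallback">general</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml: steps 2 and 3, one catch-all field type with a
     length guard at the end of the chain -->
<fieldType name="text_general_multi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="25"/>
  </analyzer>
</fieldType>
```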
Re: Basic Multilingual search capability
Hi Trey, Thanks for the detailed response and the link to the talk, it was very informative. Yes, looking at the current system requirements, ICUTokenizer might be the best bet for our use case. The MultiTextField mentioned in the SOLR-6492 JIRA has some cool features, and I'm definitely looking forward to trying it out once it's integrated into main. Thanks, Rishi. -Original Message- From: Trey Grainger To: solr-user Sent: Tue, Feb 24, 2015 1:40 am Subject: Re: Basic Multilingual search capability Hi Rishi, I don't generally recommend a language-insensitive approach except for really simple multilingual use cases (for most of the reasons Walter mentioned), but the ICUTokenizer is probably the best bet you're going to have if you really want to go that route and only need exact-match on the tokens that are parsed. It won't work that well for all languages (CJK languages, for example), but it will work fine for many. It is also possible to handle multi-lingual content in a more intelligent (i.e. per-language configuration) way in your search index, of course. There are three primary strategies (i.e. ways that actually work in the real world) to do this: 1) create a separate field for each language and search across all of them at query time 2) create a separate core per language-combination and search across all of them at query time 3) invoke multiple language-specific analyzers within a single field's analyzer and index/query using one or more of those language's analyzers for each document/query. These are listed in ascending order of complexity, and each can be valid based upon your use case. For at least the first and third cases, you can use index-time language detection to map to the appropriate fields/analyzers if you are otherwise unaware of the languages of the content from your application layer.
The third option requires custom code (included in the large Multilingual Search chapter of Solr in Action <http://solrinaction.com> and soon to be contributed back to Solr via SOLR-6492 <https://issues.apache.org/jira/browse/SOLR-6492>), but it enables you to index an arbitrarily large number of languages into the same field if needed, while preserving language-specific analysis for each language. I presented in detail on the above strategies at Lucene/Solr Revolution last November, so you may consider checking out the presentation and/or slides to assess if one of these strategies will work for your use case: http://www.treygrainger.com/posts/presentations/semantic-multilingual-strategies-in-lucenesolr/ For the record, I'd highly recommend going with the first strategy (a separate field per language) if you can, as it is certainly the simplest of the approaches (albeit the one that scales the least well after you add more than a few languages to your queries). If you want to stay simple and stick with the ICUTokenizer then it will work to a point, but some of the problems Walter mentioned may eventually bite you if you are supporting certain groups of languages. All the best, Trey Grainger Co-author, Solr in Action Director of Engineering, Search & Recommendations @ CareerBuilder On Mon, Feb 23, 2015 at 11:14 PM, Walter Underwood wrote: > It isn’t just complicated, it can be impossible. > > Do you have content in Chinese or Japanese? Those languages (and some > others) do not separate words with spaces. You cannot even do word search > without a language-specific, dictionary-based parser. > > German is space separated, except many noun compounds are not > space-separated. > > Do you have Finnish content? Entire prepositional phrases turn into word > endings. > > Do you have Arabic content? That is even harder. > > If all your content is in space-separated languages that are not heavily > inflected, you can kind of do OK with a language-insensitive approach.
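Trey's first strategy (a separate field per language) can be sketched like this; the field names are illustrative, and per-language types such as text_en ship in the 4.x example schema (text_de and text_ja here are assumptions modeled on that convention):

```xml
<!-- One indexed field per detected language (names illustrative) -->
<field name="content_en" type="text_en" indexed="true" stored="true"/>
<field name="content_de" type="text_de" indexed="true" stored="true"/>
<field name="content_ja" type="text_ja" indexed="true" stored="true"/>
```

At query time the fields are searched together, e.g. with the edismax parser: defType=edismax&qf=content_en content_de content_ja. Index-time language detection (an URP such as LangDetectLanguageIdentifierUpdateProcessorFactory) routes each document's text into the matching field.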
But > it hits the wall pretty fast. > > One thing that does work pretty well is trademarked names (LaserJet, Coke, > etc). Those are spelled the same in all languages and usually not inflected. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > On Feb 23, 2015, at 8:00 PM, Rishi Easwaran > wrote: > > > Hi Alex, > > > > There is no specific language list. > > For example: the documents that needs to be indexed are emails or any > messages for a global customer base. The messages back and forth could be > in any language or mix of languages. > > > > I understand relevancy, stemming etc becomes extremely complicated with > multilingual support, but our first goal is to be able to tokenize and > provide basic search capability for any language. Ex: When the document > contains hello or здравствуйте, the analyzer creates tokens and provides > exact match search results. > > > > No
Re: Basic Multilingual search capability
Hi Wunder, Yes we do expect incoming documents to contain Chinese/Japanese/Arabic languages. From what you have mentioned, it looks like we need to auto detect the incoming content language and tokenize/filter after that. But I thought the ICU tokenizer had capability to do that (https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-ICUTokenizer) "This tokenizer processes multilingual text and tokenizes it appropriately based on its script attribute." or am I missing something? Thanks, Rishi. -Original Message- From: Walter Underwood To: solr-user Sent: Mon, Feb 23, 2015 11:17 pm Subject: Re: Basic Multilingual search capability It isn’t just complicated, it can be impossible. Do you have content in Chinese or Japanese? Those languages (and some others) do not separate words with spaces. You cannot even do word search without a language-specific, dictionary-based parser. German is space separated, except many noun compounds are not space-separated. Do you have Finnish content? Entire prepositional phrases turn into word endings. Do you have Arabic content? That is even harder. If all your content is in space-separated languages that are not heavily inflected, you can kind of do OK with a language-insensitive approach. But it hits the wall pretty fast. One thing that does work pretty well is trademarked names (LaserJet, Coke, etc). Those are spelled the same in all languages and usually not inflected. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb 23, 2015, at 8:00 PM, Rishi Easwaran wrote: > Hi Alex, > > There is no specific language list. > For example: the documents that needs to be indexed are emails or any > messages for a global customer base. The messages back and forth could be in any language or mix of languages. 
> > I understand relevancy, stemming etc becomes extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: When the document contains hello or здравствуйте, the analyzer creates tokens and provides exact match search results. > > Now it would be great if it had capability to tokenize email addresses (ex:he...@aol.com- i think standardTokenizer already does this), filenames (здравствуйте.pdf), but maybe we can use filters to accomplish that. > > Thanks, > Rishi. > > -Original Message- > From: Alexandre Rafalovitch > To: solr-user > Sent: Mon, Feb 23, 2015 5:49 pm > Subject: Re: Basic Multilingual search capability > > > Which languages are you expecting to deal with? Multilingual support > is a complex issue. Even if you think you don't need much, it is > usually a lot more complex than expected, especially around relevancy. > > Regards, > Alex. > > Sign up for my Solr resources newsletter at http://www.solr-start.com/ > > > On 23 February 2015 at 16:19, Rishi Easwaran wrote: >> Hi All, >> >> For our use case we don't really need to do a lot of manipulation of >> incoming > text during index time. At most removal of common stop words, tokenize > emails/ > filenames etc if possible. We get text documents from our end users, which > can > be in any language (sometimes combination) and we cannot determine the language > of the incoming text. Language detection at index time is not necessary. >> >> Which analyzer is recommended to achive basic multilingual search capability > for a use case like this. >> I have read a bunch of posts about using a combination standardtokenizer or > ICUtokenizer, lowercasefilter and reverwildcardfilter factory, but looking > for > ideas, suggestions, best practices. 
>> >> http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236 >> http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923 >> https://issues.apache.org/jira/browse/SOLR-6492 >> >> >> Thanks, >> Rishi. >> > >
Re: Basic Multilingual search capability
Hi Alex, There is no specific language list. For example: the documents that need to be indexed are emails or any messages for a global customer base. The messages back and forth could be in any language or mix of languages. I understand relevancy, stemming etc. becomes extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: When the document contains hello or здравствуйте, the analyzer creates tokens and provides exact match search results. Now it would be great if it had the capability to tokenize email addresses (ex: he...@aol.com - I think StandardTokenizer already does this) and filenames (здравствуйте.pdf), but maybe we can use filters to accomplish that. Thanks, Rishi. -Original Message- From: Alexandre Rafalovitch To: solr-user Sent: Mon, Feb 23, 2015 5:49 pm Subject: Re: Basic Multilingual search capability Which languages are you expecting to deal with? Multilingual support is a complex issue. Even if you think you don't need much, it is usually a lot more complex than expected, especially around relevancy. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 February 2015 at 16:19, Rishi Easwaran wrote: > Hi All, > > For our use case we don't really need to do a lot of manipulation of incoming text during index time. At most removal of common stop words, tokenize emails/ filenames etc if possible. We get text documents from our end users, which can be in any language (sometimes combination) and we cannot determine the language of the incoming text. Language detection at index time is not necessary. > > Which analyzer is recommended to achieve basic multilingual search capability for a use case like this. > I have read a bunch of posts about using a combination of StandardTokenizer or ICUTokenizer, LowerCaseFilter and ReversedWildcardFilterFactory, but looking for ideas, suggestions, best practices.
> > http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236 > http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923 > https://issues.apache.org/jira/browse/SOLR-6492 > > > Thanks, > Rishi. >
Basic Multilingual search capability
Hi All, For our use case we don't really need to do a lot of manipulation of incoming text during index time. At most removal of common stop words, and tokenizing emails/filenames etc. if possible. We get text documents from our end users, which can be in any language (sometimes a combination), and we cannot determine the language of the incoming text. Language detection at index time is not necessary. Which analyzer is recommended to achieve basic multilingual search capability for a use case like this? I have read a bunch of posts about using a combination of StandardTokenizer or ICUTokenizer, LowerCaseFilter and ReversedWildcardFilterFactory, but am looking for ideas, suggestions, best practices. http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923 https://issues.apache.org/jira/browse/SOLR-6492 Thanks, Rishi.
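For the email-address and filename tokenization mentioned above, Solr also ships solr.UAX29URLEmailTokenizerFactory, which keeps URLs and e-mail addresses as single tokens. A sketch (the fieldType name is illustrative):

```xml
<!-- Sketch only: fieldType name is illustrative -->
<fieldType name="text_email" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits URLs and e-mail addresses as single tokens instead of splitting them -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Filenames like здравствуйте.pdf are not special-cased by this tokenizer, so they would still rely on the general tokenization and filters discussed in the thread.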
Re: Strange search behaviour when upgrading to 4.10.3
Thanks Shawn. I just ran the analysis between 4.6 and 4.10; the only difference between the outputs seems to be that a positionLength value is set in 4.10. Does that mean anything?

Version 4.10 (SF):
text     raw_bytes               start  end  positionLength  type   position
message  [6d 65 73 73 61 67 65]  0      7    1               ALNUM  1

Version 4.6 (SF):
text     raw_bytes               type   start  end  position
message  [6d 65 73 73 61 67 65]  ALNUM  0      7    1

Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Feb 20, 2015 6:51 pm Subject: Re: Strange search behaviour when upgrading to 4.10.3 On 2/20/2015 4:24 PM, Rishi Easwaran wrote: > Also, the tokenizer we use is very similar to the following. > ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java > ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex > > > From the looks of it the text is being indexed as a single token and not broken across whitespace. I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me. I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the "verbose" flag and type the actual field contents into the "index" side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time. Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer. Thanks, Shawn
Re: Strange search behaviour when upgrading to 4.10.3
Hi Shawn, Also, the tokenizer we use is very similar to the following. ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex From the looks of it the text is being indexed as a single token and not broken across whitespace. Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Feb 20, 2015 11:52 am Subject: Re: Strange search behaviour when upgrading to 4.10.3 On 2/20/2015 9:37 AM, Rishi Easwaran wrote: > We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search 4.10.3 search results are not being returned, actually looks like only the first word in a sentence is getting indexed. > Ex: inserting "This is a test message" only returns results when searching > for content:this*. searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads to me to believe there is something wrong with behaviour of our analyzer and tokenizers > > > > > > > > > Looking at the release notes from solr and lucene > http://lucene.apache.org/solr/4_10_1/changes/Changes.html > http://lucene.apache.org/core/4_10_1/changes/Changes.html > Nothing really sticks out, atleast to me. Any help to get it working with 4.10 would be great. The links you provided lead to zero-byte files when I try them, so I could not look deeper. Have you recompiled your custom analysis components against the newer versions of the Solr/Lucene libraries? Anytime you're dealing with custom components, you cannot assume that a component compiled to work with one version of Solr will work with another version. The internal API does change, and there is less emphasis on avoiding API breaks in minor Solr releases than there is with Lucene, because the vast majority of Solr users are not writing their own code that uses the Solr API. 
Recompiling against the newer libraries may cause compiler errors that reveal places in your code that require changes. Thanks, Shawn
Re: Strange search behaviour when upgrading to 4.10.3
Yes, the analyzers and tokenizers were recompiled with the new version of Solr/Lucene, and there were some compile errors; most of them were related to using BytesRefBuilder, which I fixed. Can you try these links? ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Feb 20, 2015 11:52 am Subject: Re: Strange search behaviour when upgrading to 4.10.3 On 2/20/2015 9:37 AM, Rishi Easwaran wrote: > We are trying to upgrade from Solr 4.6 to 4.10.3. When testing search 4.10.3 search results are not being returned, actually looks like only the first word in a sentence is getting indexed. > Ex: inserting "This is a test message" only returns results when searching > for content:this*. searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads me to believe there is something wrong with behaviour of our analyzer and tokenizers > > > > > > > > > Looking at the release notes from solr and lucene > http://lucene.apache.org/solr/4_10_1/changes/Changes.html > http://lucene.apache.org/core/4_10_1/changes/Changes.html > Nothing really sticks out, at least to me. Any help to get it working with 4.10 would be great. The links you provided lead to zero-byte files when I try them, so I could not look deeper. Have you recompiled your custom analysis components against the newer versions of the Solr/Lucene libraries? Anytime you're dealing with custom components, you cannot assume that a component compiled to work with one version of Solr will work with another version. The internal API does change, and there is less emphasis on avoiding API breaks in minor Solr releases than there is with Lucene, because the vast majority of Solr users are not writing their own code that uses the Solr API.
Recompiling against the newer libraries may cause compiler errors that reveal places in your code that require changes. Thanks, Shawn
Strange search behaviour when upgrading to 4.10.3
Hi, We are trying to upgrade from Solr 4.6 to 4.10.3. When testing, 4.10.3 search results are not being returned; actually it looks like only the first word in a sentence is getting indexed. Ex: inserting "This is a test message" only returns results when searching for content:this*. Searching for content:test* or content:message* does not work with 4.10. Only searching for content:*message* works. This leads me to believe there is something wrong with the behaviour of our analyzer and tokenizers. A little bit of background: we have had our own analyzer and tokenizer since pre-Solr 1.4, and it has been regularly updated. The analyzer works with Solr 4.6; we have it running in production (I also tested that search works with Solr 4.9.1). It is very similar to the tokenizers and analyzers located here: ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/ZimbraAnalyzer.java ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalAnalyzer.java ftp://193.87.16.77/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/ but with modifications to work with the latest Solr/Lucene code, e.g. overriding createComponents. The schema of the field being analyzed is as follows. Looking at the release notes from Solr and Lucene http://lucene.apache.org/solr/4_10_1/changes/Changes.html http://lucene.apache.org/core/4_10_1/changes/Changes.html nothing really sticks out, at least to me. Any help to get it working with 4.10 would be great. Thanks, Rishi.
SOLR Talk at AOL Dulles Campus.
All, There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and share it with your colleagues and friends. www.meetup.com/Code-Brew/events/192361672/ There will be free food and beer served at this event :) Thanks, Rishi.
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
The SSD is separated into logical volumes; each instance gets 100 GB of SSD disk space to write its index. If I add them all up it's ~45GB in 1TB of SSD disk space. Not sure I get "You should not be running more than one instance of Solr per machine. One instance of Solr can run multiple indexes." Yeah, I know that; we have been running 6-8 instances of Solr using the multicore ability since ~2008, supporting millions of small indexes. Now we are looking at Solr Cloud with large indexes to see if we can leverage some of its benefits. As many folks have experienced, the JVM with its stop-the-world pauses cannot GC using CMS within acceptable limits on very large heaps. To utilize the H/W to its full potential, multiple instances on a single host is pretty common practice for us. -Original Message- From: Shawn Heisey To: solr-user Sent: Sun, Mar 30, 2014 5:51 pm Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2 On 3/30/2014 2:59 PM, Rishi Easwaran wrote: > RAM shouldn't be a problem. > I have a box with 144GB RAM, running 12 instances with 4GB Java heap each. > There are 9 instances writing to 1TB of SSD disk space. > Other 3 are writing to SATA drives, and have autosoftcommit disabled. This brought up more questions than it answered. I was assuming that you only had a total of 4GB of index data, but after reading this, I think my assumption may be incorrect. If you add up all the Solr index data on the SSD, how much disk space does it take? You should not be running more than one instance of Solr per machine. One instance of Solr can run multiple indexes. Running more than one results in quite a lot of overhead, and it seems unlikely that you would need to dedicate 48GB of total RAM to the Java heap. Thanks, Shawn
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
RAM shouldn't be a problem. I have a box with 144GB RAM, running 12 instances with 4GB Java heap each. There are 9 instances writing to 1TB of SSD disk space. The other 3 are writing to SATA drives, and have autosoftcommit disabled. -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Mar 28, 2014 8:35 pm Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2 On 3/28/2014 4:07 PM, Rishi Easwaran wrote: > > Shawn, > > I changed the autoSoftCommit value to 15000 (15 sec). > My index size is pretty small ~4GB and its running on a SSD drive with ~100 > GB space on it. > Now I see the warn message every 15 seconds. > > The caches I think are minimal > > > > initialSize="512" autowarmCount="0"/> > initialSize="512" autowarmCount="0"/> > > 200 > > I think still something is going on. I mean 15s on SSD drives is a long time to handle a 4GB index. How much RAM do you have and what size is your max java heap? https://wiki.apache.org/solr/SolrPerformanceProblems#RAM Thanks, Shawn
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
Shawn, I changed the autoSoftCommit value to 15000 (15 sec). My index size is pretty small, ~4GB, and it's running on an SSD drive with ~100 GB space on it. Now I see the warn message every 15 seconds. The caches I think are minimal 200 I still think something is going on. I mean, 15s on SSD drives is a long time to handle a 4GB index. Thanks, Rishi. -Original Message- From: Shawn Heisey To: solr-user Sent: Fri, Mar 28, 2014 3:28 pm Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2 On 3/28/2014 1:03 PM, Rishi Easwaran wrote: > I thought auto soft commit was for NRT search (shouldn't it be optimized for search performance), if i have to wait 10 mins how is it NRT? or am I missing something? You are correct, but once a second is REALLY often. If the rest of your config is not set up properly, that's far too frequent. With commits happening once a second, they must complete in less than a second, and that can be difficult to achieve. A typical extreme NRT config requires small (or disabled) Solr caches, no cache autowarming, and enough free RAM (not allocated to programs) to cache all of the index data on the server. If the index is very big, it may not be possible to get the commit time below one second, so you may need to go with something like 10 to 60 seconds. Thanks, Shawn
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
Hi Dmitry, I thought auto soft commit was for NRT search (shouldn't it be optimized for search performance)? If I have to wait 10 mins, how is it NRT? Or am I missing something? Thanks, Rishi. -Original Message- From: Dmitry Kan To: solr-user Sent: Fri, Mar 28, 2014 1:02 pm Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2 Hi Rishi, Do you really need a soft commit every second? Can you make it 10 mins, for example? What is happening (conditional on checking your logs) is that several commits (looks like 2 in your case) are arriving in quick succession. The system then starts warming up the searchers, one per commit. This is a waste of resources, because only one searcher will be used in the end, so one of them is warming in vain. Just rethink your commit strategy with regards to the update frequency and warming-up time to avoid issues with this in the future. Dmitry On Thu, Mar 27, 2014 at 11:16 PM, Rishi Easwaran wrote: > All, > > I am running SOLR Cloud 4.6, everything looks ok, except for this warn > message constantly in the logs.
> > > 2014-03-27 17:09:03,982 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > 2014-03-27 17:09:05,517 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > 2014-03-27 17:09:06,774 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > 2014-03-27 17:09:08,085 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > 2014-03-27 17:09:09,114 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > 2014-03-27 17:09:10,238 WARN [commitScheduler-15-thread-1] [] SolrCore - > [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > > Searched around a bit, looks like my solrconfig.xml is configured fine and > verified there are no explicit commits sent by our clients. > > My solrconfig.xml > > 1 > 6 > false > > > > 1000 > > > > Any idea why its warning every second, the only config that has 1 second > is softcommit. > > Thanks, > Rishi. > > -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
All, I am running SOLR Cloud 4.6; everything looks OK, except for this warn message constantly in the logs.

2014-03-27 17:09:03,982 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:05,517 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:06,774 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:08,085 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:09,114 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
2014-03-27 17:09:10,238 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

I searched around a bit; my solrconfig.xml looks configured fine, and I verified there are no explicit commits sent by our clients. My solrconfig.xml 1 6 false 1000 Any idea why it's warning every second? The only config that has 1 second is the soft commit. Thanks, Rishi.
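The solrconfig.xml fragment quoted above lost its XML markup in the archive; the relevant settings normally have the following shape (values here are illustrative, not necessarily the poster's exact config):

```xml
<!-- Sketch only: values are illustrative -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>600000</maxTime>          <!-- hard commit every 10 min -->
    <openSearcher>false</openSearcher> <!-- durability only; no new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>            <!-- soft commit every 1 s: each one opens a searcher -->
  </autoSoftCommit>
</updateHandler>

<!-- elsewhere in solrconfig.xml: caps concurrent warming searchers -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```

With a 1-second soft commit, any searcher that takes longer than a second to warm overlaps the next one, which is exactly what the "Overlapping onDeckSearchers=2" warning reports.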
Re: Solr Cloud Hangs consistently .
Update!! Got SOLR Cloud working; I was able to do 90k document inserts with replicationFactor=2 with my jmeter script, where previously it was getting stuck at 3k inserts or less. After some investigation, I figured out that ulimits for my process were not being set properly; OS defaults were kicking in, which are very small for a server app. One of our install scripts had changed. I had to raise the ulimits (-n, -u, -v) and for now no other issues are seen. -Original Message- From: Rishi Easwaran To: solr-user Sent: Tue, Jun 18, 2013 10:40 am Subject: Re: Solr Cloud Hangs consistently . Mark, All I am doing are inserts; afaik search-side deadlocks should not be an issue. I am using Jmeter, the standard test driver we use for most of our benchmarks and stats collection. My jmeter.jmx file: http://apaste.info/79IS , maybe I overlooked something. Is there a benchmark script that the Solr community uses (preferably with jmeter)? We are write heavy, so at the moment focusing on inserts only. Thanks, Rishi. -Original Message- From: Yago Riveiro To: solr-user Sent: Mon, Jun 17, 2013 6:19 pm Subject: Re: Solr Cloud Hangs consistently . I do all the indexing through an HTTP POST; with replicationFactor=1 no problem, if it is higher, deadlock problems can appear. A stack trace like this http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862 is what I get -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote: > If it actually happens with replicationFactor=1, it doesn't likely have anything to do with the update handler issue I'm referring to. In some cases like these, people have better luck with Jetty than Tomcat - we test it much more. For instance, it's setup to help avoid search side distributed deadlocks. > > In any case, there is something special about it - I do and have seen a lot > of heavy indexing to SolrCloud by me and others without running into this.
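The ulimit fix described above is commonly made persistent in /etc/security/limits.conf; a sketch assuming Solr runs as a user named solr (user name and values are illustrative):

```
# /etc/security/limits.conf -- illustrative values for a "solr" user
# -n: max open file descriptors
solr  soft  nofile  65535
solr  hard  nofile  65535
# -u: max user processes/threads
solr  soft  nproc   65535
solr  hard  nproc   65535
# -v: virtual memory (address space); often left unlimited for the JVM
solr  soft  as      unlimited
solr  hard  as      unlimited
```

Low OS defaults for these limits show up exactly as described in the thread: the JVM stalls on file descriptors or thread creation under load without logging an obvious error.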
Both with replicationFacotor=1 and greater. So there is something specific in how the load is being done or what features/methods are being used that likely causes it or makes it easier to cause. > > But again, the issue I know about involves threads that are not even created in the replicationFactor = 1 case, so that could be a first report afaik. > > - Mark > > On Jun 17, 2013, at 5:52 PM, Rishi Easwaran mailto:rishi.easwa...@aol.com)> wrote: > > > Update!! > > > > This happens with replicationFactor=1 > > Just for kicks I created a collection with a 24 shards, replicationfactor=1 cluster on my exisiting benchmark env. > > Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu most metrics looks fine. > > Only indication seems to be netstat showing incoming request not being read in. > > > > Yago, > > > > I saw your previous post > > (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631) > > Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets fixed, but no luck. > > Looks like this is a dominant and easily reproducible issue on SOLR cloud. > > > > > > Thanks, > > > > Rishi. > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > From: Yago Riveiro mailto:yago.rive...@gmail.com)> > > To: solr-user > (mailto:solr-user@lucene.apache.org)> > > Sent: Mon, Jun 17, 2013 5:15 pm > > Subject: Re: Solr Cloud Hangs consistently . > > > > > > I can confirm that the deadlock happen with only 2 replicas by shard. I > > need > > shutdown one node that host a replica of the shard to recover the > > indexation > > capability. > > > > -- > > Yago Riveiro > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > > > > On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote: > > > > > > > > > > > Hi All, > > > > > > I am trying to benchmark SOLR Cloud and it consistently hangs. > > > Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck. 
> > > > > > A little bit about my set up. > > > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host > > > > > > > is configured to have 8 SOLR cloud nodes running at 4GB each. > > > JVM configs: http://apaste.info/57Ai > > > > > > My cluster has 12 shards with replication factor 2- > > > http://apaste.info/09sA > > > > > > I originally started with SOLR 4.2., tomcat
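The ulimit fix described in the "Update!!" entry above can be sketched roughly as follows. This is a hedged illustration: the specific limit values are assumptions for demonstration, not the ones from the original deployment.

```shell
# Show the limits the Solr JVM will inherit (run in the same shell/init
# script that launches the servlet container). Low OS defaults -- often
# 1024 open files -- can starve a busy SolrCloud node.
ulimit -n 2>/dev/null || true   # max open file descriptors
ulimit -u 2>/dev/null || true   # max user processes/threads

# Raise them in the start script before exec'ing the JVM.
# Example values only; tune for your own hardware and load.
ulimit -n 65536 2>/dev/null || true
ulimit -u 32768 2>/dev/null || true
ulimit -v unlimited 2>/dev/null || true

OPEN_FILES_LIMIT="$(ulimit -n)"
echo "open files limit now: $OPEN_FILES_LIMIT"
```

Because ulimit changes apply per process tree, they must be set in the script that actually launches the JVM, which is why a changed install script silently reverted them to OS defaults.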
Re: SOLR Cloud - Disable Transaction Logs
Erick, We at AOL Mail have been using SOLR for quite a while; our system is pretty write heavy and disk I/O is one of our bottlenecks. At present we use regular SOLR in the lotsOfCores configuration and I am in the process of benchmarking SOLR cloud for our use case. I don't have concrete data that tLogs are placing a lot of load on the system, but for a large scale system like ours even minimal load gets magnified. From the Cloud design, for a properly set up cluster, you usually have replicas at different availability zones. Probability of losing more than 1 availability zone at any given time should be pretty low. Why have tLogs if all replicas get the request on an update anyway? In theory 1 replica must be able to commit eventually. NRT is an optional feature and probably not tied to Cloud, correct? Thanks, Rishi. -Original Message- From: Erick Erickson To: solr-user Sent: Tue, Jun 18, 2013 4:07 pm Subject: Re: SOLR Cloud - Disable Transaction Logs bq: the replica can take over and maintain a durable state of my index This is not true. On an update, all the nodes in a slice have already written the data to the tlog, not just the leader. So if a leader goes down, the replicas have enough local info to ensure that data is not lost. Without tlogs this would not be true since documents are not durably saved until a hard commit. tlogs save data between hard commits. As Yonik explained to me once, "soft commits are about visibility, hard commits are about durability" and tlogs fill up the gap between hard commits. So to reinforce Shalin's comment yes, you can disable tlogs if 1> you don't want any of SolrCloud's HA/DR capabilities 2> NRT is unimportant IOW if you're using 4.x just like you would 3.x in terms of replication, HA/DR, etc. This is perfectly reasonable, but don't get hung up on disabling tlogs. And you haven't told us _why_ you want to do this. 
They don't consume much memory or disk space unless you have configured your hard commits (with openSearcher true or false) to be quite long. Do you have any proof at all that the tlogs are placing enough load on the system to go down this road? Best Erick On Tue, Jun 18, 2013 at 10:49 AM, Rishi Easwaran wrote: > SolrJ already has access to zookeeper cluster state. Network I/O bottleneck can be avoided by parallel requests. > You are only as slow as your slowest responding server, which could be your single leader with the current set up. > > Wouldn't this lessen the burden of the leader, as he does not have to > maintain transaction logs or distribute to replicas? > > > > > > > > -Original Message- > From: Shalin Shekhar Mangar > To: solr-user > Sent: Tue, Jun 18, 2013 2:05 am > Subject: Re: SOLR Cloud - Disable Transaction Logs > > > Yes, but at what cost? You are thinking of replacing disk IO with even > slower network IO. The transaction log is an append-only log -- it is > pretty cheap, especially so if you compare it with the indexing process. > Plus your write request/sec will drop a lot once you start doing > synchronous replication. > > > On Tue, Jun 18, 2013 at 2:18 AM, Rishi Easwaran wrote: > >> Shalin, >> >> Just some thoughts. >> >> Near Real time replication- don't we use SolrCmdDistributor, which sends >> requests immediately to replicas with a clonedRequest; as an option can't >> we achieve something similar from CloudSolrServer in SolrJ instead of the >> leader doing it? As long as 2 nodes receive writes and acknowledge, >> durability should be high. >> Peer-Sync and Recovery - Can we achieve that by merging indexes from the leader >> as needed, instead of replaying the transaction logs? >> >> Rishi. 
>> >> >> >> >> >> >> >> -Original Message- >> From: Shalin Shekhar Mangar >> To: solr-user >> Sent: Mon, Jun 17, 2013 3:43 pm >> Subject: Re: SOLR Cloud - Disable Transaction Logs >> >> >> It is also necessary for near real-time replication, peer sync and >> recovery. >> >> >> On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran > >wrote: >> >> > Hi, >> > >> > Is there a way to disable transaction logs in SOLR cloud. As far as I can >> > tell no. >> > Just curious why do we need transaction logs, seems like an I/O intensive >> > operation. >> > As long as I have replicatonFactor >1, if a node (leader) goes down, the >> > replica can take over and maintain a durable state of my index. >> > >> > I understand from the previous discussions, that it was intended for >> > update durability and realtime get. >> > But, unless I am missing something an ability to disable it in SOLR cloud >> > if not needed would be good. >> > >> > Thanks, >> > >> > Rishi. >> > >> > >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > >
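To make Erick's point about tlog size concrete: the tlog is rolled over at each hard commit, so a short autoCommit interval with openSearcher=false bounds tlog growth without affecting search visibility. A solrconfig.xml sketch along these lines (the 15-second interval is an illustrative value, not a recommendation from this thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit frequently so each tlog stays small; openSearcher=false
       keeps this purely about durability, not visibility. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With a long (or absent) autoCommit, every update since the last hard commit accumulates in a single tlog, which is the "quite long" scenario Erick warns about.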
Re: SOLR Cloud - Disable Transaction Logs
SolrJ already has access to zookeeper cluster state. Network I/O bottleneck can be avoided by parallel requests. You are only as slow as your slowest responding server, which could be your single leader with the current set up. Wouldn't this lessen the burden of the leader, as he does not have to maintain transaction logs or distribute to replicas? -Original Message- From: Shalin Shekhar Mangar To: solr-user Sent: Tue, Jun 18, 2013 2:05 am Subject: Re: SOLR Cloud - Disable Transaction Logs Yes, but at what cost? You are thinking of replacing disk IO with even slower network IO. The transaction log is an append-only log -- it is pretty cheap, especially so if you compare it with the indexing process. Plus your write request/sec will drop a lot once you start doing synchronous replication. On Tue, Jun 18, 2013 at 2:18 AM, Rishi Easwaran wrote: > Shalin, > > Just some thoughts. > > Near Real time replication- don't we use SolrCmdDistributor, which sends > requests immediately to replicas with a clonedRequest; as an option can't > we achieve something similar from CloudSolrServer in SolrJ instead of the > leader doing it? As long as 2 nodes receive writes and acknowledge, > durability should be high. > Peer-Sync and Recovery - Can we achieve that by merging indexes from the leader > as needed, instead of replaying the transaction logs? > > Rishi. > > > > > > > > -Original Message- > From: Shalin Shekhar Mangar > To: solr-user > Sent: Mon, Jun 17, 2013 3:43 pm > Subject: Re: SOLR Cloud - Disable Transaction Logs > > > It is also necessary for near real-time replication, peer sync and > recovery. > > > On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran >wrote: > > > Hi, > > > > Is there a way to disable transaction logs in SOLR cloud. As far as I can > > tell no. > > Just curious why do we need transaction logs, seems like an I/O intensive 
> > As long as I have replicatonFactor >1, if a node (leader) goes down, the > > replica can take over and maintain a durable state of my index. > > > > I understand from the previous discussions, that it was intended for > > update durability and realtime get. > > But, unless I am missing something an ability to disable it in SOLR cloud > > if not needed would be good. > > > > Thanks, > > > > Rishi. > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > -- Regards, Shalin Shekhar Mangar.
Re: Solr Cloud Hangs consistently .
Mark, All I am doing are inserts, afaik search side deadlocks should not be an issue. I am using Jmeter, the standard test driver we use for most of our benchmarks and stats collection. My jmeter.jmx file- http://apaste.info/79IS , maybe I overlooked something. Is there a benchmark script that the solr community uses (preferably with jmeter)? We are write heavy, so at the moment focusing on inserts only. Thanks, Rishi. -Original Message- From: Yago Riveiro To: solr-user Sent: Mon, Jun 17, 2013 6:19 pm Subject: Re: Solr Cloud Hangs consistently . I do all the indexing through a HTTP POST; with replicationFactor=1 no problem, but if it is higher, deadlock problems can appear. A stack trace like this http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862 is what I get -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote: > If it actually happens with replicationFactor=1, it doesn't likely have anything to do with the update handler issue I'm referring to. In some cases like these, people have better luck with Jetty than Tomcat - we test it much more. For instance, it's set up to help avoid search side distributed deadlocks. > > In any case, there is something special about it - I do and have seen a lot > of heavy indexing to SolrCloud by me and others without running into this. Both with replicationFactor=1 and greater. So there is something specific in how the load is being done or what features/methods are being used that likely causes it or makes it easier to cause. > > But again, the issue I know about involves threads that are not even created in the replicationFactor = 1 case, so that could be a first report afaik. > > - Mark > > On Jun 17, 2013, at 5:52 PM, Rishi Easwaran mailto:rishi.easwa...@aol.com)> wrote: > > > Update!! 
> > > > This happens with replicationFactor=1 > > Just for kicks I created a collection with a 24 shards, replicationfactor=1 cluster on my exisiting benchmark env. > > Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu most metrics looks fine. > > Only indication seems to be netstat showing incoming request not being read in. > > > > Yago, > > > > I saw your previous post > > (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631) > > Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets fixed, but no luck. > > Looks like this is a dominant and easily reproducible issue on SOLR cloud. > > > > > > Thanks, > > > > Rishi. > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > From: Yago Riveiro mailto:yago.rive...@gmail.com)> > > To: solr-user > (mailto:solr-user@lucene.apache.org)> > > Sent: Mon, Jun 17, 2013 5:15 pm > > Subject: Re: Solr Cloud Hangs consistently . > > > > > > I can confirm that the deadlock happen with only 2 replicas by shard. I > > need > > shutdown one node that host a replica of the shard to recover the > > indexation > > capability. > > > > -- > > Yago Riveiro > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > > > > On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote: > > > > > > > > > > > Hi All, > > > > > > I am trying to benchmark SOLR Cloud and it consistently hangs. > > > Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck. > > > > > > A little bit about my set up. > > > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host > > > > > > > is configured to have 8 SOLR cloud nodes running at 4GB each. 
> > > JVM configs: http://apaste.info/57Ai > > > > > > My cluster has 12 shards with replication factor 2- > > > http://apaste.info/09sA > > > > > > I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already > > running this configuration in production in Non-Cloud form. > > > It got stuck repeatedly. > > > > > > I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 > > and tomcat7. > > > It still shows same behaviour and hangs through the test. > > > > > > My test schema and config. > > > Schema.xml - http://apaste.info/imah > > > SolrConfig.xml - http://apaste.info/ku4F > > > > > > The test is pretty simple. its a jmeter test with upda
Re: Solr Cloud Hangs consistently .
Update!! This happens with replicationFactor=1 Just for kicks I created a collection with 24 shards, replicationfactor=1 cluster on my existing benchmark env. Same behaviour, SOLR cloud just hangs. Nothing in the logs; top/heap/cpu and most metrics look fine. Only indication seems to be netstat showing incoming requests not being read in. Yago, I saw your previous post (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631) Following it, last week, I upgraded to SOLR 4.3, to see if the issue gets fixed, but no luck. Looks like this is a dominant and easily reproducible issue on SOLR cloud. Thanks, Rishi. -Original Message- From: Yago Riveiro To: solr-user Sent: Mon, Jun 17, 2013 5:15 pm Subject: Re: Solr Cloud Hangs consistently . I can confirm that the deadlock happens with only 2 replicas by shard. I need to shutdown one node that hosts a replica of the shard to recover the indexation capability. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote: > > > Hi All, > > I am trying to benchmark SOLR Cloud and it consistently hangs. > Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck. > > A little bit about my set up. > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each. > JVM configs: http://apaste.info/57Ai > > My cluster has 12 shards with replication factor 2- http://apaste.info/09sA > > I originally started with SOLR 4.2., tomcat 5 and jdk 6, as we are already running this configuration in production in Non-Cloud form. > It got stuck repeatedly. > > I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 and tomcat7. > It still shows the same behaviour and hangs through the test. > > My test schema and config. > Schema.xml - http://apaste.info/imah > SolrConfig.xml - http://apaste.info/ku4F > > The test is pretty simple. 
its a jmeter test with update command via SOAP rpc (round robin request across every node), adding in 5 fields from a csv file - id, guid, subject, body, compositeID (guid!id). > number of jmeter threads = 150. loop count = 20, num of messages to add/per guid = 3; total 150*3*20 = 9000 documents. > > When cloud gets stuck, i don't get anything in the logs, but when i run netstat i see the following. > Sample netstat on a stuck run. http://apaste.info/hr0O > hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts. > > At the moment my benchmarking efforts are at a stand still. > > Any help from the community would be great, I got some heap dumps and stack dumps, but haven't found a smoking gun yet. > If I can provide anything else to diagnose this issue. just let me know. > > Thanks, > > Rishi.
Spread the word - Opening at AOL Mail Team in Dulles VA
Hi All, With the economy the way it is and many folks still looking, figured this is as good a place as any to publish this. Just today, we got an opening for a mid-senior level Software Engineer on our team. Experience with SOLR is a big+. Feel free to have a look at this position. http://www.linkedin.com/jobs?viewJob=&jobId=6073910 If interested, send your current resume to rishi.easwa...@aol.com. I will take it to my Director. This position is in Dulles, VA. Thanks, Rishi.
Re: SOLR Cloud - Disable Transaction Logs
Shalin, Just some thoughts. Near Real time replication- don't we use SolrCmdDistributor, which sends requests immediately to replicas with a clonedRequest; as an option can't we achieve something similar from CloudSolrServer in SolrJ instead of the leader doing it? As long as 2 nodes receive writes and acknowledge, durability should be high. Peer-Sync and Recovery - Can we achieve that by merging indexes from the leader as needed, instead of replaying the transaction logs? Rishi. -Original Message- From: Shalin Shekhar Mangar To: solr-user Sent: Mon, Jun 17, 2013 3:43 pm Subject: Re: SOLR Cloud - Disable Transaction Logs It is also necessary for near real-time replication, peer sync and recovery. On Tue, Jun 18, 2013 at 1:04 AM, Rishi Easwaran wrote: > Hi, > > Is there a way to disable transaction logs in SOLR cloud. As far as I can > tell no. > Just curious why do we need transaction logs, seems like an I/O intensive > operation. > As long as I have replicationFactor >1, if a node (leader) goes down, the > replica can take over and maintain a durable state of my index. > > I understand from the previous discussions, that it was intended for > update durability and realtime get. > But, unless I am missing something an ability to disable it in SOLR cloud > if not needed would be good. > > Thanks, > > Rishi. > > -- Regards, Shalin Shekhar Mangar.
SOLR Cloud - Disable Transaction Logs
Hi, Is there a way to disable transaction logs in SOLR cloud? As far as I can tell, no. Just curious why we need transaction logs; it seems like an I/O intensive operation. As long as I have replicationFactor >1, if a node (leader) goes down, the replica can take over and maintain a durable state of my index. I understand from the previous discussions that it was intended for update durability and realtime get. But, unless I am missing something, an ability to disable it in SOLR cloud if not needed would be good. Thanks, Rishi.
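For anyone hunting for the knob itself: in Solr 4.x the transaction log only exists if an updateLog element is configured in solrconfig.xml, so commenting it out is the way to disable tlogs. A sketch, keeping in mind the durability and recovery caveats raised elsewhere in this thread:

```xml
<!-- solrconfig.xml: the tlog is enabled by this element. Commenting it out
     disables transaction logs -- but in SolrCloud that also breaks peer-sync,
     recovery, realtime get, and the durability guarantees between hard
     commits discussed in this thread. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```

The replies in this thread amount to: the element is optional for standalone Solr used 3.x-style, but SolrCloud's HA/DR machinery assumes it is present.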
Re: Solr Cloud Hangs consistently .
FYI, you can ignore the http4ClientExpiryService thread in the stack dump. It's a dummy executor service I created to test out something unrelated to this issue. -Original Message- From: Rishi Easwaran To: solr-user Sent: Mon, Jun 17, 2013 2:54 pm Subject: Re: Solr Cloud Hangs consistently . Mark, I got a few stack dumps of the instance that was stuck, ssdtest-d03:8011 http://apaste.info/cofK http://apaste.info/sv4M http://apaste.info/cxUf I can get dumps of others if needed. Thanks, Rishi. -Original Message- From: Mark Miller To: solr-user Sent: Mon, Jun 17, 2013 1:57 pm Subject: Re: Solr Cloud Hangs consistently . Could you give a simple stack trace dump as well? It's likely the distributed update deadlock that has been reported a few times now - I think usually with a replication factor greater than 2, but I can't be sure. The deadlock involves sending docs concurrently to replicas and I wouldn't have expected it to be so easily hit with only 2 replicas per shard. I should be able to tell from a stack trace though. If it is that, it's on my short list to investigate (been there a long time now though - but I still hope to look at it soon). - Mark On Jun 17, 2013, at 1:44 PM, Rishi Easwaran wrote: > > > Hi All, > > I am trying to benchmark SOLR Cloud and it consistently hangs. > Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck. > > A little bit about my set up. > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each. > JVM configs: http://apaste.info/57Ai > > My cluster has 12 shards with replication factor 2- http://apaste.info/09sA > > I originally started with SOLR 4.2., tomcat 5 and jdk 6, as we are already running this configuration in production in Non-Cloud form. > It got stuck repeatedly. > > I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 and tomcat7. > It still shows the same behaviour and hangs through the test. 
> > My test schema and config. > Schema.xml - http://apaste.info/imah > SolrConfig.xml - http://apaste.info/ku4F > > The test is pretty simple. its a jmeter test with update command via SOAP rpc (round robin request across every node), adding in 5 fields from a csv file - id, guid, subject, body, compositeID (guid!id). > number of jmeter threads = 150. loop count = 20, num of messages to add/per guid = 3; total 150*3*20 = 9000 documents. > > When cloud gets stuck, i don't get anything in the logs, but when i run netstat i see the following. > Sample netstat on a stuck run. http://apaste.info/hr0O > hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts. > > > At the moment my benchmarking efforts are at a stand still. > > Any help from the community would be great, I got some heap dumps and stack dumps, but haven't found a smoking gun yet. > If I can provide anything else to diagnose this issue. just let me know. > > Thanks, > > Rishi. > > > > > > > >
Re: Solr Cloud Hangs consistently .
Mark, I got a few stack dumps of the instance that was stuck ssdtest-d03:8011 http://apaste.info/cofK http://apaste.info/sv4M http://apaste.info/cxUf I can get dumps of others if needed. Thanks, Rishi. -Original Message- From: Mark Miller To: solr-user Sent: Mon, Jun 17, 2013 1:57 pm Subject: Re: Solr Cloud Hangs consistently . Could you give a simple stack trace dump as well? It's likely the distributed update deadlock that has been reported a few times now - I think usually with a replication factor greater than 2, but I can't be sure. The deadlock involves sending docs concurrently to replicas and I wouldn't have expected it to be so easily hit with only 2 replicas per shard. I should be able to tell from a stack trace though. If it is that, it's on my short list to investigate (been there a long time now though - but I still hope to look at it soon). - Mark On Jun 17, 2013, at 1:44 PM, Rishi Easwaran wrote: > > > Hi All, > > I am trying to benchmark SOLR Cloud and it consistently hangs. > Nothing in the logs, no stack trace, no errors, no warnings, just seems stuck. > > A little bit about my set up. > I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each. > JVM configs: http://apaste.info/57Ai > > My cluster has 12 shards with replication factor 2- http://apaste.info/09sA > > I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already running this configuration in production in Non-Cloud form. > It got stuck repeatedly. > > I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 and tomcat7. > It still shows same behaviour and hangs through the test. > > My test schema and config. > Schema.xml - http://apaste.info/imah > SolrConfig.xml - http://apaste.info/ku4F > > The test is pretty simple. 
its a jmeter test with update command via SOAP rpc (round robin request across every node), adding in 5 fields from a csv file - id, guid, subject, body, compositeID (guid!id). > number of jmeter threads = 150. loop count = 20, num of messages to add/per guid = 3; total 150*3*20 = 9000 documents. > > When cloud gets stuck, i don't get anything in the logs, but when i run netstat i see the following. > Sample netstat on a stuck run. http://apaste.info/hr0O > hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts. > > > At the moment my benchmarking efforts are at a stand still. > > Any help from the community would be great, I got some heap dumps and stack dumps, but haven't found a smoking gun yet. > If I can provide anything else to diagnose this issue. just let me know. > > Thanks, > > Rishi. > > > > > > > >
Solr Cloud Hangs consistently .
Hi All, I am trying to benchmark SOLR Cloud and it consistently hangs. Nothing in the logs, no stack trace, no errors, no warnings; it just seems stuck. A little bit about my set up. I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each. JVM configs: http://apaste.info/57Ai My cluster has 12 shards with replication factor 2- http://apaste.info/09sA I originally started with SOLR 4.2., tomcat 5 and jdk 6, as we are already running this configuration in production in Non-Cloud form. It got stuck repeatedly. I decided to upgrade to the latest and greatest of everything, SOLR 4.3, JDK7 and tomcat7. It still shows the same behaviour and hangs through the test. My test schema and config. Schema.xml - http://apaste.info/imah SolrConfig.xml - http://apaste.info/ku4F The test is pretty simple. It's a jmeter test with an update command via SOAP rpc (round robin requests across every node), adding in 5 fields from a csv file - id, guid, subject, body, compositeID (guid!id). Number of jmeter threads = 150, loop count = 20, num of messages to add/per guid = 3; total 150*3*20 = 9000 documents. When cloud gets stuck, I don't get anything in the logs, but when I run netstat I see the following. Sample netstat on a stuck run. http://apaste.info/hr0O hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts. At the moment my benchmarking efforts are at a standstill. Any help from the community would be great. I got some heap dumps and stack dumps, but haven't found a smoking gun yet. If I can provide anything else to diagnose this issue, just let me know. Thanks, Rishi.
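The netstat symptom described above (incoming requests not being read in) shows up as a non-zero Recv-Q on the Solr port. A rough shell check for it; the two-line inline sample below is fabricated to stand in for real `netstat -tan` output, since the original paste is not reproduced here:

```shell
# Count established connections whose Recv-Q (field 2 of `netstat -tan`
# output) is non-zero, i.e. bytes the JVM has accepted but never read.
# In practice, pipe real `netstat -tan` output in; the sample is made up.
sample='tcp 129302 0 ssd-d01:8011 hycl-d20:45102 ESTABLISHED
tcp 0 0 ssd-d01:8011 hycl-d20:45103 ESTABLISHED'

backlog=$(printf '%s\n' "$sample" | awk '$2 > 0 {count++} END {print count+0}')
echo "$backlog connection(s) with unread request bytes"
```

A growing count of such connections while the logs stay silent is consistent with the distributed-update deadlock discussed later in this thread: request threads are blocked, so sockets fill up unread.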
Re: shardkey
From my understanding: in SOLR cloud the CompositeIdDocRouter uses HashbasedDocRouter. The compositeId router is the default if your numShards>1 on collection creation. The compositeId router generates a hash using the uniqueKey defined in your schema.xml to route your documents to a dedicated shard. You can use select?q=xyz&shard.keys=uniquekey to focus your search to hit only the shard that has your shard.key. Thanks, Rishi. -Original Message- From: Joshi, Shital To: 'solr-user@lucene.apache.org' Sent: Wed, Jun 12, 2013 10:01 am Subject: shardkey Hi, We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple questions on shard key. 1. Looking at the admin GUI, how do I know which field is being used for the shard key? 2. What is the default shard key used? 3. How do I override the default shard key? Thanks.
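A small sketch of what the compositeId convention looks like in practice (all values here are made-up examples; host and collection names are assumptions):

```shell
# With the compositeId router, the part of the uniqueKey before '!' is
# hashed to choose the shard, so all docs sharing a prefix co-locate.
USER_ID="user123"
DOC_ID="doc456"
COMPOSITE_ID="${USER_ID}!${DOC_ID}"

# Query-side: restrict the search to the shard(s) owning that key prefix
# via the Solr 4.x shard.keys parameter (server URL is hypothetical).
SELECT_URL="http://localhost:8983/solr/collection1/select?q=xyz&shard.keys=${USER_ID}!"

echo "$COMPOSITE_ID"
echo "$SELECT_URL"
```

Indexing documents whose ids follow the `prefix!suffix` shape and querying with the matching `shard.keys=prefix!` value is what lets a search hit only one shard instead of fanning out to all of them.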
Re: Solr Composite Unique key from existing fields in schema
Thanks Jack, That fixed it and guarantees the order. As far as I can tell SOLR cloud 4.2.1 needs a uniquekey defined in its schema, or I get an exception. SolrCore Initialization Failures * testCloud2_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField. Now that I have an autogenerated composite-id, it has to become a part of my schema as uniquekey for SOLR cloud to work. compositeId Is there a way to avoid compositeId field being defined in my schema.xml, would like to avoid the overhead of storing this field in my index. Thanks, Rishi. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 4:33 pm Subject: Re: Solr Composite Unique key from existing fields in schema The TL;DR response: Try this: userid_s id docid_s id id -- That will assure that the userid gets processed before the docid. I'll have to review the contract for CloneFieldUpdateProcessorFactory to see what is or ain't guaranteed when there are multiple input fields - whether this is a bug or a feature or simply undefined. -- Jack Krupansky -Original Message----- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 3:54 PM To: solr-user@lucene.apache.org Subject: Re: Solr Composite Unique key from existing fields in schema I thought the same, but that doesn't seem to be the case. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 3:32 pm Subject: Re: Solr Composite Unique key from existing fields in schema The order in the ID should be purely dependent on the order of the field names in the processor configuration: docid_s userid_s -- Jack Krupansky -Original Message----- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 2:54 PM To: solr-user@lucene.apache.org Subject: Re: Solr Composite Unique key from existing fields in schema Jack, No sure if this is the correct behaviour. 
I set up updateRequestorPorcess chain as mentioned below, but looks like the compositeId that is generated is based on input order. For example: If my input comes in as 1 12345 I get the following compositeId1-12345. If I reverse the input 12345 1 I get the following compositeId 12345-1 . In this case the compositeId is not unique and I am getting duplicates. Thanks, Rishi. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 12:07 pm Subject: Re: Solr Composite Unique key from existing fields in schema You can do this by combining the builtin update processors. Add this to your solrconfig: docid_s userid_s id id -- Add documents such as: curl "http://localhost:8983/solr/update?commit=true&update.chain=composite-id"; \ -H 'Content-type:application/json' -d ' [{"title": "Hello World", "docid_s": "doc-1", "userid_s": "user-1", "comments_ss": ["Easy", "Fast"]}]' And get results like: "title":["Hello World"], "docid_s":"doc-1", "userid_s":"user-1", "comments_ss":["Easy", "Fast"], "id":"doc-1--user-1", Add as many fields in whatever order you want using "source" in the clone update processor, and pick your composite key field name as well. And set the delimiter string as well in the concat update processor. I managed to reverse the field order from what you requested (userid, docid). I used the standard Solr example schema, so I used dynamic fields for the two ids, but use your own field names. -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 11:12 AM To: solr-user@lucene.apache.org Subject: Solr Composite Unique key from existing fields in schema Hi All, Historically we have used a single field in our schema as a uniqueKey. docid Wanted to change this to a composite key something like userid-docid. I know I can auto generate compositekey at document insert time, using custom code to generate a new field, but wanted to know if there was an inbuilt SOLR mechanism of doing this. 
That would prevent us from creating and storing an extra field. Thanks, Rishi.
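The XML stripped from Jack's TL;DR reply above was presumably a chain of two CloneField processors (one per source field, to pin userid before docid) followed by a Concat processor. A reconstruction under that assumption; field names and the "--" delimiter follow his example output ("doc-1--user-1"), but this is a sketch, not the verbatim original:

```xml
<updateRequestProcessorChain name="composite-id">
  <!-- Clone userid first, then docid, into the composite "id" field;
       separate processors fix the concatenation order. -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">userid_s</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">docid_s</str>
    <str name="dest">id</str>
  </processor>
  <!-- Join the cloned values into a single delimited key. -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">--</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Using one CloneField processor per source, rather than a single processor with two source fields, is what guarantees the order of values in the multivalued "id" field before Concat joins them, which was the fix for the input-order problem reported earlier in the thread.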
Re: Solr Composite Unique key from existing fields in schema
I thought the same, but that doesn't seem to be the case. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 3:32 pm Subject: Re: Solr Composite Unique key from existing fields in schema The order in the ID should be purely dependent on the order of the field names in the processor configuration: docid_s userid_s -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 2:54 PM To: solr-user@lucene.apache.org Subject: Re: Solr Composite Unique key from existing fields in schema Jack, No sure if this is the correct behaviour. I set up updateRequestorPorcess chain as mentioned below, but looks like the compositeId that is generated is based on input order. For example: If my input comes in as 1 12345 I get the following compositeId1-12345. If I reverse the input 12345 1 I get the following compositeId 12345-1 . In this case the compositeId is not unique and I am getting duplicates. Thanks, Rishi. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 12:07 pm Subject: Re: Solr Composite Unique key from existing fields in schema You can do this by combining the builtin update processors. Add this to your solrconfig: docid_s userid_s id id -- Add documents such as: curl "http://localhost:8983/solr/update?commit=true&update.chain=composite-id"; \ -H 'Content-type:application/json' -d ' [{"title": "Hello World", "docid_s": "doc-1", "userid_s": "user-1", "comments_ss": ["Easy", "Fast"]}]' And get results like: "title":["Hello World"], "docid_s":"doc-1", "userid_s":"user-1", "comments_ss":["Easy", "Fast"], "id":"doc-1--user-1", Add as many fields in whatever order you want using "source" in the clone update processor, and pick your composite key field name as well. And set the delimiter string as well in the concat update processor. I managed to reverse the field order from what you requested (userid, docid). 
I used the standard Solr example schema, so I used dynamic fields for the two ids, but use your own field names. -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 11:12 AM To: solr-user@lucene.apache.org Subject: Solr Composite Unique key from existing fields in schema Hi All, Historically we have used a single field in our schema as a uniqueKey. docid Wanted to change this to a composite key something like userid-docid. I know I can auto generate compositekey at document insert time, using custom code to generate a new field, but wanted to know if there was an inbuilt SOLR mechanism of doing this. That would prevent us from creating and storing an extra field. Thanks, Rishi.
Re: Solr Composite Unique key from existing fields in schema
Jack, Not sure if this is the correct behaviour. I set up the updateRequestProcessor chain as mentioned below, but it looks like the compositeId that is generated is based on input order. For example: if my input comes in as 1 12345 I get the compositeId 1-12345. If I reverse the input to 12345 1 I get the compositeId 12345-1. In this case the compositeId is not unique and I am getting duplicates. Thanks, Rishi. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 12:07 pm Subject: Re: Solr Composite Unique key from existing fields in schema You can do this by combining the built-in update processors. Add this to your solrconfig: docid_s userid_s id id -- Add documents such as: curl "http://localhost:8983/solr/update?commit=true&update.chain=composite-id" \ -H 'Content-type:application/json' -d ' [{"title": "Hello World", "docid_s": "doc-1", "userid_s": "user-1", "comments_ss": ["Easy", "Fast"]}]' And get results like: "title":["Hello World"], "docid_s":"doc-1", "userid_s":"user-1", "comments_ss":["Easy", "Fast"], "id":"doc-1--user-1", Add as many fields in whatever order you want using "source" in the clone update processor, and pick your composite key field name as well. And set the delimiter string as well in the concat update processor. I managed to reverse the field order from what you requested (userid, docid). I used the standard Solr example schema, so I used dynamic fields for the two ids, but use your own field names. -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 11:12 AM To: solr-user@lucene.apache.org Subject: Solr Composite Unique key from existing fields in schema Hi All, Historically we have used a single field in our schema as a uniqueKey (docid). We wanted to change this to a composite key, something like userid-docid.
I know I can auto-generate a composite key at document insert time, using custom code to generate a new field, but wanted to know if there was a built-in SOLR mechanism for doing this. That would save us from creating and storing an extra field. Thanks, Rishi.
Re: Solr Composite Unique key from existing fields in schema
Thanks Jack, looks like that will do the trick for me. I will try it out. -Original Message- From: Jack Krupansky To: solr-user Sent: Tue, May 28, 2013 12:07 pm Subject: Re: Solr Composite Unique key from existing fields in schema You can do this by combining the built-in update processors. Add this to your solrconfig: docid_s userid_s id id -- Add documents such as: curl "http://localhost:8983/solr/update?commit=true&update.chain=composite-id" \ -H 'Content-type:application/json' -d ' [{"title": "Hello World", "docid_s": "doc-1", "userid_s": "user-1", "comments_ss": ["Easy", "Fast"]}]' And get results like: "title":["Hello World"], "docid_s":"doc-1", "userid_s":"user-1", "comments_ss":["Easy", "Fast"], "id":"doc-1--user-1", Add as many fields in whatever order you want using "source" in the clone update processor, and pick your composite key field name as well. And set the delimiter string as well in the concat update processor. I managed to reverse the field order from what you requested (userid, docid). I used the standard Solr example schema, so I used dynamic fields for the two ids, but use your own field names. -- Jack Krupansky -Original Message- From: Rishi Easwaran Sent: Tuesday, May 28, 2013 11:12 AM To: solr-user@lucene.apache.org Subject: Solr Composite Unique key from existing fields in schema Hi All, Historically we have used a single field in our schema as a uniqueKey (docid). We wanted to change this to a composite key, something like userid-docid. I know I can auto-generate a composite key at document insert time, using custom code to generate a new field, but wanted to know if there was a built-in SOLR mechanism for doing this. That would save us from creating and storing an extra field. Thanks, Rishi.
Solr Composite Unique key from existing fields in schema
Hi All, Historically we have used a single field in our schema as a uniqueKey (docid). We wanted to change this to a composite key, something like userid-docid. I know I can auto-generate a composite key at document insert time, using custom code to generate a new field, but wanted to know if there was a built-in SOLR mechanism for doing this. That would save us from creating and storing an extra field. Thanks, Rishi.
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
No, we just upgraded to 4.2.1. With the size of our complex and the effort required to apply our patches and roll out, we don't upgrade that often. -Original Message- From: Noureddine Bouhlel To: solr-user Sent: Mon, May 20, 2013 3:36 pm Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. Hi Rishi, Have you done any tests with Solr 4.3? Regards, Cordialement, BOUHLEL Noureddine On 17 May 2013 21:29, Rishi Easwaran wrote: > > Hi All, > > It's Friday 3:00pm, warm & sunny outside, and it was a good week. Figured > I'd share some good news. > I work for the AOL mail team and we use SOLR for our mail search backend. > We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR > community. > We deal with millions of indexes and billions of requests a day across our > complex. > We finished the full rollout of SOLR 4.2.1 into our production last week. > > Some key highlights: > - ~75% reduction in search response times > - ~50% reduction in SOLR disk busy time, which in turn helped with a ~90% > reduction in errors > - Garbage collection total stop time reduced by over 50%, moving application > throughput into the 99.8% - 99.9% range > - ~15% reduction in CPU usage > > We did not tune our application moving from 3.5 to 4.2.1, nor did we update Java. > For the most part it was a binary upgrade, with patches for our special > use case. > > Now going forward we are looking at prototyping SOLR Cloud for our search > system, upgrading Java and Tomcat, and tuning our application further. Lots of fun > stuff :) > > Have a great weekend everyone. > Thanks, > > Rishi.
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
We use commodity hardware that we procured over the years as our complex grew, running on JDK 6 with Tomcat 5 (planning to upgrade to JDK 7 and Tomcat 7 soon). We run them with about a 4GB heap, using the CMS GC. -Original Message- From: adityab To: solr-user Sent: Sat, May 18, 2013 10:37 am Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. These numbers are really great. Would you mind sharing your h/w configuration and JVM params? Thanks
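The setup described (about a 4GB heap with the CMS collector on JDK 6 under Tomcat) would translate into JVM options along these lines. This is a sketch with assumed flag choices, not the poster's actual settings; the occupancy-fraction tuning in particular is a common pairing with CMS, not something stated in the thread:

```shell
# Hypothetical JAVA_OPTS for a ~4GB-heap Solr JVM using the CMS collector
# on JDK 6; all flags below existed on that JDK, but the exact values are
# assumptions for illustration.
JAVA_OPTS="-Xms4g -Xmx4g \
 -XX:+UseConcMarkSweepGC \
 -XX:+UseParNewGC \
 -XX:CMSInitiatingOccupancyFraction=75 \
 -XX:+UseCMSInitiatingOccupancyOnly"
export JAVA_OPTS
echo "$JAVA_OPTS"
```

For Tomcat this would typically go in setenv.sh (or the init script) so catalina.sh picks it up at startup.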
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Sure Shalin, hopefully soon. -Original Message- From: Shalin Shekhar Mangar To: solr-user Sent: Sat, May 18, 2013 11:35 pm Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. Awesome news Rishi! Looking forward to your SolrCloud updates. On Sat, May 18, 2013 at 12:59 AM, Rishi Easwaran wrote: > > Hi All, > > It's Friday 3:00pm, warm & sunny outside, and it was a good week. Figured > I'd share some good news. > I work for the AOL mail team and we use SOLR for our mail search backend. > We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR > community. > We deal with millions of indexes and billions of requests a day across our > complex. > We finished the full rollout of SOLR 4.2.1 into our production last week. > > Some key highlights: > - ~75% reduction in search response times > - ~50% reduction in SOLR disk busy time, which in turn helped with a ~90% > reduction in errors > - Garbage collection total stop time reduced by over 50%, moving application > throughput into the 99.8% - 99.9% range > - ~15% reduction in CPU usage > > We did not tune our application moving from 3.5 to 4.2.1, nor did we update Java. > For the most part it was a binary upgrade, with patches for our special > use case. > > Now going forward we are looking at prototyping SOLR Cloud for our search > system, upgrading Java and Tomcat, and tuning our application further. Lots of fun > stuff :) > > Have a great weekend everyone. > Thanks, > > Rishi. -- Regards, Shalin Shekhar Mangar.
Upgrading from SOLR 3.5 to 4.2.1 Results.
Hi All, It's Friday 3:00pm, warm & sunny outside, and it was a good week. Figured I'd share some good news. I work for the AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished the full rollout of SOLR 4.2.1 into our production last week. Some key highlights:
- ~75% reduction in search response times
- ~50% reduction in SOLR disk busy time, which in turn helped with a ~90% reduction in errors
- Garbage collection total stop time reduced by over 50%, moving application throughput into the 99.8% - 99.9% range
- ~15% reduction in CPU usage
We did not tune our application moving from 3.5 to 4.2.1, nor did we update Java. For the most part it was a binary upgrade, with patches for our special use case. Now going forward we are looking at prototyping SOLR Cloud for our search system, upgrading Java and Tomcat, and tuning our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi.
Re: SOLR Cloud Collection Management question.
Hi Anshum, What if you have more nodes than shards x replicationFactor? In the example below, I originally created the collection to use 6 shards x 2 replicationFactor = 12 nodes total. Now I added 6 more nodes, 18 nodes total. I just want to add 1 extra replica per shard. How will it get evenly distributed, and what are the determining criteria? Thanks, Rishi. -Original Message- From: Anshum Gupta To: solr-user Sent: Tue, May 14, 2013 9:42 pm Subject: Re: SOLR Cloud Collection Management question. Hi Rishi, If you have your cluster up and running, just add the nodes and they will get evenly assigned to the shards. As of now, the replication factor is not persisted. On Wed, May 15, 2013 at 1:07 AM, Rishi Easwaran wrote: OK, looks like I have to go to every node, add a replica individually, create the cores and add them to the collection. ex: http://newNode1:port/solr/admin/cores?action=CREATE&name=testCloud1_shard1_replica3&collection=testCloud1&shard=shard1&collection.configName=myconf http://newNode2:port/solr/admin/cores?action=CREATE&name=testCloud1_shard2_replica3&collection=testCloud1&shard=shard2&collection.configName=myconf Is there an easier way to do this? Any ideas? Thanks, Rishi. -Original Message- From: Rishi Easwaran To: solr-user Sent: Tue, May 14, 2013 2:58 pm Subject: SOLR Cloud Collection Management question. Hi, I am beginning to work on a SOLR cloud implementation. I created a collection using the collections API: http://myhost:port/solr/admin/collections?action=CREATE&name=testCloud1&numShards=6&replicationFactor=2&collection.configName=myconf&maxShardsPerNode=1 My cluster now has 6 shards and 2 replicas (1 leader & 1 replica) for each shard. Now I want to add extra replicas to each shard in my cluster without changing the replicationFactor used to create the collection. Any ideas on how to go about doing that? Thanks, Rishi. -- Anshum Gupta http://www.anshumgupta.net
Re: SOLR Cloud Collection Management question.
OK, looks like I have to go to every node, add a replica individually, create the cores and add them to the collection. ex: http://newNode1:port/solr/admin/cores?action=CREATE&name=testCloud1_shard1_replica3&collection=testCloud1&shard=shard1&collection.configName=myconf http://newNode2:port/solr/admin/cores?action=CREATE&name=testCloud1_shard2_replica3&collection=testCloud1&shard=shard2&collection.configName=myconf Is there an easier way to do this? Any ideas? Thanks, Rishi. -Original Message- From: Rishi Easwaran To: solr-user Sent: Tue, May 14, 2013 2:58 pm Subject: SOLR Cloud Collection Management question. Hi, I am beginning to work on a SOLR cloud implementation. I created a collection using the collections API: http://myhost:port/solr/admin/collections?action=CREATE&name=testCloud1&numShards=6&replicationFactor=2&collection.configName=myconf&maxShardsPerNode=1 My cluster now has 6 shards and 2 replicas (1 leader & 1 replica) for each shard. Now I want to add extra replicas to each shard in my cluster without changing the replicationFactor used to create the collection. Any ideas on how to go about doing that? Thanks, Rishi.
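Scripted, the per-node CoreAdmin CREATE calls described above come down to one URL per (new node, shard) pair. A sketch that only prints the URLs (swap the echo for a curl call to actually execute them); the node names and port are placeholders, not real hosts:

```shell
# Build one CoreAdmin CREATE URL per (new node, shard) pair to add a
# third replica to each of the six shards. Node names are hypothetical.
COLLECTION=testCloud1
CONFIG=myconf
NEW_NODES="newNode1:8983 newNode2:8983 newNode3:8983 newNode4:8983 newNode5:8983 newNode6:8983"
i=1
for node in $NEW_NODES; do
  shard="shard$i"
  url="http://$node/solr/admin/cores?action=CREATE"
  url="$url&name=${COLLECTION}_${shard}_replica3"
  url="$url&collection=$COLLECTION&shard=$shard"
  url="$url&collection.configName=$CONFIG"
  echo "$url"            # or: curl -s "$url"
  i=$((i + 1))
done
```

Note that in Solr 4.2.1 the Collections API has no dedicated action for this, which is why the thread falls back to CoreAdmin; each CREATE registers the new core as an additional replica of its shard in ZooKeeper.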
SOLR Cloud Collection Management question.
Hi, I am beginning to work on a SOLR cloud implementation. I created a collection using the collections API: http://myhost:port/solr/admin/collections?action=CREATE&name=testCloud1&numShards=6&replicationFactor=2&collection.configName=myconf&maxShardsPerNode=1 My cluster now has 6 shards and 2 replicas (1 leader & 1 replica) for each shard. Now I want to add extra replicas to each shard in my cluster without changing the replicationFactor used to create the collection. Any ideas on how to go about doing that? Thanks, Rishi.