Re: Solr facet search improvements
It would probably be better to do entity extraction and normalization of job titles as a front-end process before ingesting the data into Solr, but you could also do it as a custom or script update processor. The latter can easily be coded in JavaScript to run within Solr.

Your first step in any case will be to define the specific rules you wish to use for both normalization of job titles and the actual matching. Yes, you can do that in Solr, but you have to do it; Solr will not do it magically for you.

Also, post some specific query examples that completely cover the range of queries you need to be able to handle.

-- Jack Krupansky

On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush thakkar.aay...@gmail.com wrote: I have around 1 million job titles which are indexed on Solr and am looking to improve the faceted search results on job title matches. For example: a job search for *Research Scientist Computer Architecture* is made, and the facet field title, which is tokenized in Solr, gives the following results: 1. Senior Data Scientist 2. PARALLEL COMPUTING SOFTWARE ENGINEER 3. Engineer/Scientist 4 4. Data Scientist 5. Engineer/Scientist 6. Senior Research Scientist 7. Research Scientist-Wireless Networks 8. Research Scientist-Andriod Development 9. Quantum Computing Theorist Job 10. Data Sceintist Smart Analytics I want to be able to improve / optimize the job titles and be able to make exclusions and some normalizations. Is this possible with Solr? What is the best way to have more granular control over the faceted search results? For example *Engineer/Scientist 4* is not useful and too specific, and titles like *Quantum Computing Theorist* would ideally also be excluded -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html Sent from the Solr - User mailing list archive at Nabble.com.
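For reference, a minimal solrconfig.xml sketch of the script-update-processor approach mentioned above; the chain name, script file name, and field handling are illustrative only, and the actual normalization rules would live in the referenced JavaScript file:

<updateRequestProcessorChain name="normalize-titles">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">normalize-title.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain is selected per request with update.chain=normalize-titles (or made the default on the update handler), and normalize-title.js would implement a processAdd function that rewrites or copies the title field according to whatever rules you define.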
Re: Running multiple full-import commands via curl in a script
Literally, a queue can be done by submitting as-is (async) and polling the command status. However, given https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200 you can try adding synchronous=true... that should hold the request until the import is completed.

The other question is how to run requests in parallel, which is explicitly prevented by https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173

The only workaround I can suggest is to duplicate the DIH definitions in the Solr config:

<requestHandler name="/dataimport" class="solr.DataImportHandler"> ... </requestHandler>
<requestHandler name="/dataimport2" class="solr.DataImportHandler"> ... </requestHandler>
<requestHandler name="/dataimport3" class="solr.DataImportHandler"> ... </requestHandler>
...

Then each of those handlers should be able to process its own request in parallel. Nasty stuff.. have a good hack.

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts carl.roberts.zap...@gmail.com wrote:

Hi, I am attempting to run all these curl commands from a script so that I can put them in a crontab job, however, it seems that only the first one executes and the other ones return with an error (below):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2014"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2015"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"

error: *A command is still running...*

Question: Is there a way to queue the other requests in Solr so that they run as soon as the previous one is done? If not, how would you recommend I do this?

Many thanks in advance, Joe

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
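If duplicating handlers is not wanted, another possibility is to keep the imports sequential from the shell by polling the DIH status before firing the next one. A rough sketch; the grep on the word busy assumes the default JSON status output, so adjust it to whatever your handler actually returns:

#!/bin/sh
# Fire each entity import, then wait until the handler reports it is no longer busy.
for year in 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015; do
  curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-$year"
  # Poll the status command until the running import finishes.
  while curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=status&wt=json" | grep -q '"busy"'; do
    sleep 5
  done
done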
Re: Reindex data without creating new index.
On 1/27/2015 11:54 PM, SolrUser1543 wrote: I want to reindex my data in order to change the value of some field according to the value of another (both fields already exist). For this purpose I run the clue utility in order to get a list of IDs. Then I created an update processor, which can set the value of field A according to the value of field B. I added a new request handler, like the classic update handler, but with a new update chain containing the new update processor. I want to run an HTTP POST request for each ID, to the new handler, with the item id only. This will trigger my update processor, which will get the existing doc from the index and do the logic. So in this way I can do some enrichment, without a full data import and without creating a new index. What do you think about it? Could it cause performance degradation? Can Solr handle it, or will it rebalance the index? Does Solr have some built-in feature which can do this?

This is likely possible, with some caveats. You'll need to write all the code yourself, extending the UpdateRequestProcessorFactory and UpdateRequestProcessor classes. This will be similar to the atomic update feature, so you'll likely need to find that source code and model yours on its operation. It will have the same requirements -- all fields must be 'stored=true' except those which are copyField destinations, which must be 'stored=false'. With atomic updates, this requirement is not *enforced*, but it must be met, or there will be data loss.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations

What do you mean by rebalance the index? This could mean almost anything, but most of the meanings I can come up with would not apply to this situation at all.

The effect on Solr for each document you process will be the sum of: a query for that document, a tiny bit for the update processor itself, followed by a reindex of that document.

Thanks, Shawn
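To make the shape of that code concrete, here is a minimal sketch of such a processor. The class name, field names, and mapping logic are placeholders, and registration in solrconfig.xml is omitted:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DeriveFieldUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // Derive fieldA from fieldB on every incoming document.
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object b = doc.getFieldValue("fieldB");
        if (b != null) {
          doc.setField("fieldA", "derived-from-" + b); // replace with the real mapping logic
        }
        // Pass the (possibly modified) document down the chain.
        super.processAdd(cmd);
      }
    };
  }
}

The factory is then wired into an updateRequestProcessorChain in solrconfig.xml ahead of RunUpdateProcessorFactory, so the derived value is in place before the document is indexed.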
Re: Solr facet search improvements
On 1/28/2015 3:56 AM, thakkar.aayush wrote: I have around 1 million job titles which are indexed on Solr and am looking to improve the faceted search results on job title matches. For example: a job search for *Research Scientist Computer Architecture* is made, and the facet field title, which is tokenized in Solr, gives the following results: 1. Senior Data Scientist 2. PARALLEL COMPUTING SOFTWARE ENGINEER 3. Engineer/Scientist 4 4. Data Scientist 5. Engineer/Scientist 6. Senior Research Scientist 7. Research Scientist-Wireless Networks 8. Research Scientist-Andriod Development 9. Quantum Computing Theorist Job 10. Data Sceintist Smart Analytics I want to be able to improve / optimize the job titles and be able to make exclusions and some normalizations. Is this possible with Solr? What is the best way to have more granular control over the faceted search results? For example *Engineer/Scientist 4* is not useful and too specific, and titles like *Quantum Computing Theorist* would ideally also be excluded

Normally, if the field is tokenized, you will not get the original values in the facet. You will get values like "senior" instead of "Senior Data Scientist".

If DocValues are enabled on the field, then you may well indeed get the original values. I've never tried facets on a tokenized field with DocValues, but everything I understand about the feature says it would result in the original (not tokenized) values.

If you want different values in the facets, then you'll need to change those values before they get indexed in Solr. That can be done with custom UpdateProcessor code embedded in the update chain, or you can simply do the changes in your program that indexes the data in Solr.

Thanks, Shawn
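A common way to get whole-title facet values while keeping the tokenized field for search is to facet on an untokenized copy of the title. A sketch, with field and type names chosen here only for illustration:

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_facet" type="string" indexed="true" stored="false" docValues="true"/>
<copyField source="title" dest="title_facet"/>

Faceting would then use facet.field=title_facet, and any normalization or exclusion of titles would still have to happen before indexing, as described above.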
What is the best way to update an index?
Hi, What is the best way to update an index with new data or records? Via this command:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-2002"

or this command:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&synchronous=true&entity=cve-2002"

Thanks, Joe
Re: Running multiple full-import commands via curl in a script
Thanks Mikhail - synchronous=true works like a charm...:)

On 1/28/15, 5:16 AM, Mikhail Khludnev wrote:

Literally, a queue can be done by submitting as-is (async) and polling the command status. However, given https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200 you can try adding synchronous=true... that should hold the request until the import is completed.

The other question is how to run requests in parallel, which is explicitly prevented by https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173

The only workaround I can suggest is to duplicate the DIH definitions in the Solr config:

<requestHandler name="/dataimport" class="solr.DataImportHandler"> ... </requestHandler>
<requestHandler name="/dataimport2" class="solr.DataImportHandler"> ... </requestHandler>
<requestHandler name="/dataimport3" class="solr.DataImportHandler"> ... </requestHandler>
...

Then each of those handlers should be able to process its own request in parallel. Nasty stuff.. have a good hack.

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts carl.roberts.zap...@gmail.com wrote:

Hi, I am attempting to run all these curl commands from a script so that I can put them in a crontab job, however, it seems that only the first one executes and the other ones return with an error (below):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2014"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2015"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"

error: *A command is still running...*

Question: Is there a way to queue the other requests in Solr so that they run as soon as the previous one is done? If not, how would you recommend I do this?

Many thanks in advance, Joe
Re: Morphology of synonyms
On 1/28/2015 5:11 AM, Reinforcer wrote: Is Solr capable of using morphology for synonyms? For example. Request: inanely. Indexed text in Solr: Searching keywords without morphology is fatuously. inane and fatuous are synonyms. So, inanely --(morphology)--> inane --(synonyms)--> fatuous --(morphology)--> fatuously. Is this possible (double morphology)?

Synonyms are handled via exact match. The feature you are describing is called stemming or lemmatization.

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

It is possible to combine stemming and synonyms in the same analysis chain, but you must figure out what the root word is to put into your synonym list. It may not be what you expect. For example, the English stemmer will probably change "achieve" to "achiev" ... which sounds wrong, until you remember that stemming must be applied both at index and query time, and the user will never see that form of the word.

Synonyms are usually only applied at either index or query time. Which one to choose depends on your requirements, but I believe it is typically on the query side.

The analysis tab in the admin UI is invaluable for seeing the results of changes in the analysis chain.

Thanks, Shawn
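As a sketch of what such a chain can look like (the type name and file names are illustrative, and whether the synonym filter sits before or after the stemmer decides whether synonyms.txt should list surface forms or stemmed roots):

<fieldType name="text_syn_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

With the synonym filter placed before the query-time stemmer, as here, synonyms.txt can contain ordinary surface forms such as inane,fatuous and both sides get stemmed afterwards.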
Re: Reading data from another solr core
Hi, I usually use the SolrEntityProcessor for moving/transforming data between cores; it's a piece of cake! Regards.

On Wed, Jan 28, 2015 at 8:13 AM, solrk koushikga...@gmail.com wrote:

Hi Guys, I have multiple cores set up in my Solr server. I would like to read/import data from one core (source) into another core (target) and index it. Is there an easy way in Solr to do so? I was thinking of using SolrEntityProcessor for this purpose.. any other suggestions are appreciated..

http://blog.trifork.com/2011/11/08/importing-data-from-another-solr/

For example:

<dataconfig>
  <document>
    <entity name="user" pk="id" url="" processor="XPathEntityProcessor">
      <field column="id" xpath="/user/id" />
      <entity name="sep" processor="SolrEntityProcessor" query="*:*" url="http://127.0.0.1:8081/solr/core2">
      </entity>
    </entity>
  </document>
</dataconfig>

Please suggest if there is a better solution, or should I write a new processor which reads the index of another core?

-- View this message in context: http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466.html Sent from the Solr - User mailing list archive at Nabble.com.
Morphology of synonyms
Hi, Is Solr capable of using morphology for synonyms? For example. Request: inanely. Indexed text in Solr: Searching keywords without morphology is fatuously. inane and fatuous are synonyms. So, inanely --(morphology)--> inane --(synonyms)--> fatuous --(morphology)--> fatuously. Is this possible (double morphology)? Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Morphology-of-synonims-tp4182517.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr facet search improvements
I have around 1 million job titles which are indexed on Solr and am looking to improve the faceted search results on job title matches. For example: a job search for *Research Scientist Computer Architecture* is made, and the facet field title, which is tokenized in Solr, gives the following results:

1. Senior Data Scientist
2. PARALLEL COMPUTING SOFTWARE ENGINEER
3. Engineer/Scientist 4
4. Data Scientist
5. Engineer/Scientist
6. Senior Research Scientist
7. Research Scientist-Wireless Networks
8. Research Scientist-Andriod Development
9. Quantum Computing Theorist Job
10. Data Sceintist Smart Analytics

I want to be able to improve / optimize the job titles and be able to make exclusions and some normalizations. Is this possible with Solr? What is the best way to have more granular control over the faceted search results? For example *Engineer/Scientist 4* is not useful and too specific, and titles like *Quantum Computing Theorist* would ideally also be excluded.

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html Sent from the Solr - User mailing list archive at Nabble.com.
CoreContainer#createAndLoad, existing cores not loaded
My problem: I create cores dynamically using container#create( CoreDescriptor ) and then add documents to those cores. So far so good. When I restart my app I do container = CoreContainer#createAndLoad(...), but when I then call container.getAllCoreNames() an empty list is returned. Which cores should be loaded by the container when I call CoreContainer#createAndLoad(...)? Where does the container look up the existing cores?
Re: extract and add fields on the fly
"Create the SID from the existing doc" implies that a document already exists that you wish to add fields to. However, if the document is a binary, are you suggesting: 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add additional fields to SID, commit? I know I'm possibly wandering into schemaless territory here as well.

On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: extract and add fields on the fly
Sorry, I may have misunderstood: Are you talking about adding additional fields at indexing time? (Here I would add the fields first *then* send to solr.) Are you talking about updating a field within an existing document in a solr index? (In that case I would direct you here [1].) Am I still misunderstanding?

[1] https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

On Wed, Jan 28, 2015 at 12:30 PM, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add addtinal fields to SID commit I know I'm possibly wandering into the schemaless teritory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: extract and add fields on the fly
Second thoughts SID is purely i/p as its name suggests :) I think a better approach would be 1) curl to upload/extract passing docID 2) curl to update additional fields for that docID On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add addtinal fields to SID commit I know I'm possibly wandering into the schemaless teritory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: extract and add fields on the fly
I'm looking to 1) upload a binary document using curl 2) add some additional facets Specifically my question is can this be achieved in 1 curl operation or does it need 2? On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote: Second thoughts SID is purely i/p as its name suggests :) I think a better approach would be 1) curl to upload/extract passing docID 2) curl to update additional fields for that docID On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add addtinal fields to SID commit I know I'm possibly wandering into the schemaless teritory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: extract and add fields on the fly
I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: replicas goes in recovery mode right after update
Hi Shawn, Thank you so much for the assistance. Building is not a problem. Back in the day I worked with linking, compiling and building C and C++ software, so Java is a piece of cake.

We have built the new war from the source version 4.10.3 and our preliminary tests have shown that our issue (replicas in recovery on high load) *is resolved*. We will continue to do more testing and confirm.

Please note that the *patch is BUGGY*. It removed the break statement within the while loop, because of which, whenever we sent a list of docs the add would hang (API CloudSolrServer.add), although it would work if we sent one doc at a time. It took a while to figure out why that was happening. Once we put the break statement back it worked like a charm.

Furthermore, the patch has solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java which should be solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java

Finally, checking if(!offer) is sufficient rather than using if(offer == false).

Last but not least, having a configurable queue size and timeouts (managed via solrconfig) would be quite helpful.

Thank you once again for your help.

Vijay

On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey apa...@elyograg.org wrote: On 1/27/2015 2:52 PM, Vijay Sekhri wrote: Hi Shawn, Here is some update. We found the main issue. We have configured our cluster to run under jetty and when we tried full indexing, we did not see the original Invalid Chunk error. However the replicas still went into recovery. All this time we had been trying to look into the replica logs to diagnose the issue. The problem seems to be on the leader side. When we looked into the leader logs, we found the following on all the leaders: 3439873 [qtp1314570047-92] WARN org.apache.solr.update.processor.DistributedUpdateProcessor – Error sending update *java.lang.IllegalStateException: Queue full* snip There is a similar bug reported around this https://issues.apache.org/jira/browse/SOLR-5850 and it seems to be in OPEN status. Is there a way we can configure the queue size and increase it? Or is there a version of solr that has this issue resolved already? Can you suggest where we go from here to resolve this? We can repatch the war file if that is what you would recommend. In the end our initial speculation about solr being unable to handle so many updates is correct. We do not see this issue when the update load is less.

Are you in a position where you can try the patch attached to SOLR-5850? You would need to get the source code for the version you're on (or perhaps a newer 4.x version), patch it, and build Solr yourself. If you have no experience building java packages from source, this might prove to be difficult.

Thanks, Shawn

-- * Vijay Sekhri *
IndexFormatTooNewException
Hi, We upgraded our cluster to Solr 4.10.0 for a couple of days and then reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0. Do you know why?

* solr-spec 4.10.0
* solr-impl 4.10.0 1620776
* lucene-spec 4.10.0
* lucene-impl 4.10.0 1620776

We recently added new shards to our cluster and the dashboard shows the correct Solr version (4.8.0) for these new shards. We copied the index from one of the old shards (where it is showing 4.10.0 on the dashboard) to this new shard and we see this error upon start up. How do we get rid of this error?

Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))): 3 (needs to be between 0 and 2)
        at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)
Re: replica never takes leader role
Yes, after 45 seconds a replica should take over as leader. The logs of the replica that should be taking over will likely explain why this is not happening.

- Mar

On Wed Jan 28 2015 at 2:52:32 PM Joshi, Shital shital.jo...@gs.com wrote:

When the leader reaches 99% physical memory on the box and starts swapping (stops replicating), we forcefully bring down the leader (first kill -15 and then kill -9 if kill -15 doesn't work). This is when we are looking for the replica to assume the leader's role, and it never happens. The Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and test.

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:45000}">

As per the definition of zkClientTimeout: after the leader is brought down and it doesn't talk to zookeeper for 45 seconds, shouldn't ZK promote the replica to leader? I am not sure how increasing the zk timeout will help.

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, January 28, 2015 11:42 AM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role

This is not the desired behavior at all. I know there have been improvements in this area since 4.8, but can't seem to locate the JIRAs. I'm curious _why_ the nodes are going down though, is it happening at random or are you taking it down? One problem has been that the Zookeeper timeout used to default to 15 seconds, and occasionally a node would be unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping the ZK timeout has helped some people avoid this... FWIW, Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote: We're using Solr 4.8.0 -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, January 27, 2015 7:47 PM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role What version of Solr? This is an ongoing area of improvements and several are very recent. Try searching the JIRA for Solr for details. Best, Erick On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote: Hello, We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three zookeeper instances. We have noticed that when a leader node goes down the replica never takes over as a leader, cloud becomes unusable and we have to bounce entire cloud for replica to assume leader role. Is this default behavior? How can we change this? Thanks.
RE: IndexFormatTooNewException
Thank you for replying. We added a new shard to the same cluster where some shards are showing Solr version 4.10.0, and this new shard is showing Solr version 4.8.0. All shards source the Solr software from the same location and use the same start-up script. I am surprised that the older shards are still running Solr 4.10.0. How do we do a real downgrade of the index to 4.8? Do you mean replay all the data?

-Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, January 28, 2015 4:10 PM To: solr-user@lucene.apache.org Subject: Re: IndexFormatTooNewException

: We upgraded our cluster to Solr 4.10.0 for couple days and again
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0.
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the details of what steps you took to try and switch back make a huge difference in understanding why you are still running 4.10.0 even though you don't want to.

: We recently added new shards to our cluster and dashboard shows correct
: Solr version (4.8.0) for these new shards. We copied index from one of
: old shards (where it is showing 4.10.0 on dashboard) to this new shard
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are asking Solr/Lucene to open an index that it can tell was created by a newer version of the software and it is incapable of doing so. You either need to upgrade all of the nodes to 4.10, or you need to scrap this index, do a *real* downgrade to 4.8, and then rebuild your index (or restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))): 3 (needs to be between 0 and 2)
: at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)

-Hoss http://www.lucidworks.com/
Re: IndexFormatTooNewException
: We upgraded our cluster to Solr 4.10.0 for couple days and again
: reverted back to 4.8.0. However the dashboard still shows Solr 4.10.0.
: Do you know why?

because you didn't fully revert - you are still running Solr 4.10.0 - the details of what steps you took to try and switch back make a huge difference in understanding why you are still running 4.10.0 even though you don't want to.

: We recently added new shards to our cluster and dashboard shows correct
: Solr version (4.8.0) for these new shards. We copied index from one of
: old shards (where it is showing 4.10.0 on dashboard) to this new shard
: and we see this error upon start up. How do we get rid of this error?

IndexFormatTooNewException means exactly what it sounds like -- you are asking Solr/Lucene to open an index that it can tell was created by a newer version of the software and it is incapable of doing so. You either need to upgrade all of the nodes to 4.10, or you need to scrap this index, do a *real* downgrade to 4.8, and then rebuild your index (or restore a backup index from before you attempted to upgrade).

: Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: BufferedChecksumIndexInput(MMapIndexInput(path=/local/data/solr13/index.20140919180209018/segments_1tzz))): 3 (needs to be between 0 and 2)
: at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:156)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:335)
: at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:416)
: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:864)
: at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:710)
: at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:412)
: at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:749)

-Hoss http://www.lucidworks.com/
Re: Reindex data without creating new index.
By rebalancing I mean that such a big amount of updates will create a situation which will require running an optimize of the index, because each document will be added again in place of the original one. But according to what you say, it should not be a problem, am I correct?

-- View this message in context: http://lucene.472066.n3.nabble.com/Reindex-data-without-creating-new-index-tp4182464p4182726.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrcloud open new searcher not happening in slave for deletebyID
On 1/27/2015 5:50 PM, vsriram30 wrote: I am using Solrcloud 4.6.1. In that if I use CloudSolrServer to add a record to solr, then I see the following commit update command in both master and in slave node:

One of the first things to find out is whether it's still a problem in the latest version of Solr, which is currently 4.10.3. Solr 4.6.1 is a year old, and there have been seven new versions released since then. Solr, especially SolrCloud, changes at a VERY rapid pace ... in each version, many bugs are fixed, and each x.y.0 version adds new features/functionality. I'm not in a position to set up a minimal SolrCloud testbed to try this out, or I would try it myself.

2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

I am also setting the updateRequest.setCommitWithin(5000); Here as noticed, the openSearcher=true and hence after 5 seconds, I am able to see the record in the index in both slave and in master. Now if I trigger another UpdateRequest with only deleteById set and no add documents to Solr, with the same commit within time, then in the master log I see,

2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

and in the slave log I see,

2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Here as noticed, the master is having openSearcher=true and slave is having openSearcher=false. This causes inconsistency in the results as master shows that the record is deleted and slave still has the record. After digging through the code a bit, I think this is probably happening in CommitTracker where the openSearcher might be false while creating the CommitUpdateCommand. Can you advise if there is any ticket created to address this issue or can I create one? Also is there any workaround for this till the bug is fixed than to set commit within duration in server to a lower value?

It does sound like a bug. Some possible workarounds, no idea how effective they will be:

*) Try deleteByQuery to see whether it is affected the same way.
*) Use autoSoftCommit in solrconfig.xml instead of commitWithin on the update request.

I do see a report of an identical problem on this mailing list, two days after 4.0-ALPHA was announced, which was the first public release that included SolrCloud. Both of the following URLs open the same message:

http://osdir.com/ml/solr-user.lucene.apache.org/2012-07/msg00214.html
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3ccal3vrcdsiqyajuy6eqvpak0ftg-oy7n5g7cql4x4_8sz5jm...@mail.gmail.com%3E

I did not find an existing issue in Jira for this problem, so if the same problem exists in 4.10.3, filing one sounds like a good idea.

Thanks, Shawn
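If the commitWithin path turns out to be the culprit, the autoSoftCommit workaround mentioned above is just a solrconfig.xml setting. A sketch, where the 5000 ms value simply mirrors the commitWithin interval used in the report and both numbers are only examples:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>

A soft commit triggered this way opens a new searcher on each node that applied the update, so the delete should become visible on both leader and replica without relying on commitWithin.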
Re: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory
Ok.. I got the solution. Changed the value of maxQueryFrequency from 0.01 (1%) to 0.9 (90%). It is working. Thanks a lot.

On Tue, Jan 27, 2015 at 8:55 PM, Dyer, James james.d...@ingramcontent.com wrote:

Can you give a little more information as to how you have the spellchecker configured in solrconfig.xml? Also, it would help if you showed a query and the spell check response and then explained what you wanted it to return vs what it actually returned. My guess is that the stop words you mention exist in your spelling index and you're not using the alternativeTermCount parameter, which tells it to suggest for terms that exist in the index. I take it also you're using shingles to get word-break suggestions? You might have better luck with this using WordBreakSolrSpellChecker instead of shingles.

James Dyer Ingram Content Group

-Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, January 27, 2015 5:06 AM To: solr-user@lucene.apache.org Subject: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory

Hi, I am getting the suggestion of both correct words and misspelled words, but not getting stop word suggestions. Why? Even though I am not using solr.StopFilterFactory.

Schema.xml:

<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
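For anyone hitting the same thing, maxQueryFrequency is a spellchecker option in solrconfig.xml. A sketch of where it sits; the spellchecker name is a placeholder, the field comes from the schema above, and 0.9 is the value that worked here:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <float name="maxQueryFrequency">0.9</float>
  </lst>
</searchComponent>

With DirectSolrSpellChecker, maxQueryFrequency controls how frequent a query term may be in the index before it is treated as correctly spelled and skipped, which is why raising it changed which terms received suggestions.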
RE: replica never takes leader role
We're using Solr 4.8.0 -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, January 27, 2015 7:47 PM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role What version of Solr? This is an ongoing area of improvements and several are very recent. Try searching the JIRA for Solr for details. Best, Erick On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote: Hello, We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three zookeeper instances. We have noticed that when a leader node goes down the replica never takes over as a leader, cloud becomes unusable and we have to bounce entire cloud for replica to assume leader role. Is this default behavior? How can we change this? Thanks.
RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
I tried increasing my alternativeTermCount to 5 and enabled extended results. I also added an fq filter parameter to clarify what I mean:

*Querying for go pro is good:*

{
  "responseHeader": {
    "status": 0, "QTime": 2,
    "params": { "q": "go pro", "indent": "true", "fq": "marchio:\"GO PRO\"", "rows": "1", "wt": "json", "spellcheck.extendedResults": "true", "_": "1422485581792" }
  },
  "response": { "numFound": 27, "start": 0, "docs": [
    { "codice_produttore_s": "DK00150020", "codice_s": "5.BAT.27407", "id": 27407, "marchio": "GO PRO", "barcode_interno_s": "185323000958", "prezzo_acquisto_d": 16.12, "data_aggiornamento_dt": "2012-06-21T00:00:00Z", "descrizione": "BATTERIA GO PRO HERO ", "prezzo_vendita_d": 39.9, "categoria": "Batterie", "_version_": 1491583424191791000 }
  ] },
  "spellcheck": { "suggestions": [
    "go pro", { "numFound": 1, "startOffset": 0, "endOffset": 6, "origFreq": 433, "suggestion": [ { "word": "gopro", "freq": 2 } ] },
    "correctlySpelled", false,
    "collation", [ "collationQuery", "gopro", "hits", 3, "misspellingsAndCorrections", [ "go pro", "gopro" ] ]
  ] }
}

While querying for gopro is not:

{
  "responseHeader": {
    "status": 0, "QTime": 6,
    "params": { "q": "gopro", "indent": "true", "fq": "marchio:\"GO PRO\"", "rows": "1", "wt": "json", "spellcheck.extendedResults": "true", "_": "1422485629480" }
  },
  "response": { "numFound": 3, "start": 0, "docs": [
    { "codice_produttore_s": "DK0030010", "codice_s": "5.VID.39163", "id": 38814, "marchio": "GO PRO", "barcode_interno_s": "818279012477", "prezzo_acquisto_d": 150.84, "data_aggiornamento_dt": "2014-12-24T00:00:00Z", "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM", "prezzo_vendita_d": 219, "categoria": "Fotografia", "_version_": 1491583425479442400 }
  ] },
  "spellcheck": { "suggestions": [
    "gopro", { "numFound": 1, "startOffset": 0, "endOffset": 5, "origFreq": 2, "suggestion": [ { "word": "giro", "freq": 6 } ] },
    "correctlySpelled", false
  ] }
}

---

I'd like "go pro" as a suggestion for "gopro" too.

-- View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html Sent from the Solr - User mailing list archive at Nabble.com.
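For context, splitting a term such as gopro into go pro is controlled by the WordBreakSolrSpellChecker registration itself. A typical registration looks roughly like this; the spellchecker name and field are placeholders and are not taken from the configuration above:

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">descrizione</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">2</int>
</lst>

breakWords=true is what allows a single query term like gopro to be broken into go pro, provided both parts exist in the spellcheck field.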
Re: IndexFormatTooNewException
On 1/28/2015 2:51 PM, Joshi, Shital wrote: Thank you for replying. We added new shard to same cluster where some shards are showing Solr version 4.10.0 and this new shard is showing Solr version 4.8.0. All shards source Solr software from same location and use same start up script. I am surprised how older shards are still running Solr 4.10.0. How we do real downgrade index to 4.8? You mean replay all data? It is often not enough to simply replace the solr war. You may also need to wipe out the extracted war before restarting, or jars from the previous version may still exist and some of them might be loaded instead of the new version. If you're using the jetty included in the example, the war is in the webapps directory and the extracted files are under solr-webapp. If you're using another container, then I have no idea where the war gets extracted. If any index segments were written by the 4.10 version, they will not be readable after downgrading to the 4.8 version. Wiping out the index and rebuilding it from scratch is usually the only way to fix that situation. Thanks, Shawn
Re: Solrcloud open new searcher not happening in slave for deletebyID
Thanks Shawn. Not sure whether I will be able to test it out with 4.10.3. I will try the workarounds and update. Thanks, V.Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Solrcloud-open-new-searcher-not-happening-in-slave-for-deletebyID-tp4182439p4182757.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading data from another solr core
Thank you Alvaro Cabrerizo! I am going to give a shot. -- View this message in context: http://lucene.472066.n3.nabble.com/Reading-data-from-another-solr-core-tp4182466p4182758.html Sent from the Solr - User mailing list archive at Nabble.com.
Define Id when using db dih
Hi, I am using the data import handler to import data from an Oracle DB. The problem is that the table I am importing from has no single column defined as a key. How should I define the key in the data config file? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Define-Id-when-using-db-dih-tp4182797.html Sent from the Solr - User mailing list archive at Nabble.com.
AW: CoreContainer#createAndLoad, existing cores not loaded
Thx Shawn. I am running latest-greatest Solr (4.10.3). Solr home is e.g. /opt/webs/siteX/WebContent/WEB-INF/solr and the core(s) reside in /opt/webs/siteX/WebContent/WEB-INF/solr/cores

Should these be found by core discovery? If not, how can I configure coreRootDirectory in solr.xml to be the cores folder below solrHome?

<str name="coreRootDirectory">${coreRootDirectory:solrHome/cores}</str>

Note: the solr.xml is to be used for any of the 150 sites we host. Therefore I'd like it to be generic - solrHome/cores

-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, January 28, 2015 17:08 To: solr-user@lucene.apache.org Subject: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote: My problem: I create cores dynamically using container#create( CoreDescriptor ) and then add documents to the very core(s). So far so good. When I restart my app I do container = CoreContainer#createAndLoad(...) but when I then call container.getAllCoreNames() an empty list is returned. What cores should be loaded by the container if I call CoreContainer#createAndLoad(...) ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml, in the <cores> section of that config file. There is a new format for solr.xml that is supported in version 4.4 and later and will be mandatory in 5.0. If that format is present, then Solr will use core discovery -- starting from either the solr home or a defined coreRootDirectory, solr will search for core.properties files and treat each directory where one is found as a core's instanceDir. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond Thanks, Shawn
Re: PostingsFormat block size
Hi, Thanks for your input. I do not do updates to existing docs, so that is not relevant in my case, and I have just skipped that test case :-) I have not been able to measure any significant changes to distributed searches or to direct searches for an id. Did I miss something with your comment "Here it is"?

Best regards Trym

On 27-01-2015 17:22, Mikhail Khludnev wrote: Hm.. It's not blocks which I'm familiar with. Regarding the performance impact from bigger ID blocks: it applies if you have <uniqueKey>ID</uniqueKey> and send updates for existing docs. And IDs are also used for some of the distributed search stages, I suppose. Here it is.

On Tue, Jan 27, 2015 at 4:33 PM, Trym Møller t...@sigmat.dk wrote: Hi Thanks for your clarifying questions. In the constructor of the Lucene41PostingsFormat class the minimum and maximum block size is provided. These sizes are used when creating the BlockTreeTermsWriter (responsible for writing the .tim and .tip files of the Lucene index). It is the block sizes of the BlockTreeTermsWriter I refer to. I'm not quite sure I understand your second question - sorry. I can tell that I have not tried whether the PulsingPostingsFormat is of any help in regards to lowering the Solr JVM memory usage, but I can see the same BlockTreeTermsWriter with its block sizes is used by the PulsingPostingsFormat. Should I expect something else from the PulsingPostingsFormat in regards to memory usage or in regards to searching (if I have changed the block sizes of the BlockTreeTermsWriter)? Best regards Trym

On 27-01-2015 14:00, Mikhail Khludnev wrote: Hello Trym, Can you clarify, which blockSize do you mean? And the second q, just to avoid unnecessary explanation, do you know what's Pulsing?

On Tue, Jan 27, 2015 at 2:28 PM, Trym Møller t...@sigmat.dk wrote: Hi I have successfully created a really cool Lucene41x8PostingsFormat class (a copy of the Lucene41PostingsFormat class modified to use 8 times the default block size) and registered the format as required. In schema.xml I have created a string field type with this postingsFormat, and lastly I'm using this field type for my id field. This all works great, and as a consequence the .tip files of the Lucene index (segments) are considerably smaller, and the same goes for the Solr JVM memory usage (which was the end goal). Now I need to find the consequences (besides the disk and memory usage) of this change to the id field. I would expect that id searches are slower. But when will Solr/Lucene do id searches? I have myself no user scenarios where my documents are searched by the id value. Thanks for any comments. Best regards Trym
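For readers following along, wiring a custom postings format to the id field is done in the schema and requires the schema-aware codec factory in solrconfig.xml. A sketch, assuming the custom format was registered under the SPI name Lucene41x8 (the type and field names here are only examples):

<!-- solrconfig.xml: let the schema choose per-field postings formats -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: a string type bound to the custom format, used for the id field -->
<fieldType name="string_x8" class="solr.StrField" postingsFormat="Lucene41x8"/>
<field name="id" type="string_x8" indexed="true" stored="true" required="true"/>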
AW: CoreContainer#createAndLoad, existing cores not loaded
BTW: None of my core folders contains a core.properties file ... ? Could it be due to the fact that I am (so far) running only EmbeddedSolrServer, hence no real Solr server?

-Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Thursday, January 29, 2015 08:08 To: solr-user@lucene.apache.org Subject: AW: CoreContainer#createAndLoad, existing cores not loaded

Thx Shawn. I am running latest-greatest Solr (4.10.3). Solr home is e.g. /opt/webs/siteX/WebContent/WEB-INF/solr and the core(s) reside in /opt/webs/siteX/WebContent/WEB-INF/solr/cores

Should these be found by core discovery? If not, how can I configure coreRootDirectory in solr.xml to be the cores folder below solrHome?

<str name="coreRootDirectory">${coreRootDirectory:solrHome/cores}</str>

Note: the solr.xml is to be used for any of the 150 sites we host. Therefore I'd like it to be generic - solrHome/cores

-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, January 28, 2015 17:08 To: solr-user@lucene.apache.org Subject: Re: CoreContainer#createAndLoad, existing cores not loaded

On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote: My problem: I create cores dynamically using container#create( CoreDescriptor ) and then add documents to the very core(s). So far so good. When I restart my app I do container = CoreContainer#createAndLoad(...) but when I then call container.getAllCoreNames() an empty list is returned. What cores should be loaded by the container if I call CoreContainer#createAndLoad(...) ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml, in the <cores> section of that config file. There is a new format for solr.xml that is supported in version 4.4 and later and will be mandatory in 5.0. If that format is present, then Solr will use core discovery -- starting from either the solr home or a defined coreRootDirectory, solr will search for core.properties files and treat each directory where one is found as a core's instanceDir. http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond Thanks, Shawn
Re: replica never takes leader role
This is not the desired behavior at all. I know there have been improvements in this area since 4.8, but can't seem to locate the JIRAs. I'm curious _why_ the nodes are going down though, is it happening at random or are you taking it down? One problem has been that the Zookeeper timeout used to default to 15 seconds, and occasionally a node would be unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping the ZK timeout has helped some people avoid this... FWIW, Erick On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote: We're using Solr 4.8.0 -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, January 27, 2015 7:47 PM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role What version of Solr? This is an ongoing area of improvements and several are very recent. Try searching the JIRA for Solr for details. Best, Erick On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote: Hello, We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three zookeeper instances. We have noticed that when a leader node goes down the replica never takes over as a leader, cloud becomes unusable and we have to bounce entire cloud for replica to assume leader role. Is this default behavior? How can we change this? Thanks.
Re: CoreContainer#createAndLoad, existing cores not loaded
On 1/28/2015 8:52 AM, Clemens Wyss DEV wrote: My problem: I create cores dynamically using container#create( CoreDescriptor ) and then add documents to the very core(s). So far so good. When I restart my app I do container = CoreContainer#createAndLoad(...) but when I then call container.getAllCoreNames() an empty list is returned. What cores should be loaded by the container if I call CoreContainer#createAndLoad(...) ? Where does the container lookup the existing cores?

If the solr.xml is the old format, then cores are defined in solr.xml, in the <cores> section of that config file.

There is a new format for solr.xml that is supported in version 4.4 and later and will be mandatory in 5.0. If that format is present, then Solr will use core discovery -- starting from either the solr home or a defined coreRootDirectory, solr will search for core.properties files and treat each directory where one is found as a core's instanceDir.

http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks, Shawn
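To make the discovery part concrete, a minimal sketch of the new-style layout (directory and core names are only examples): the solr home contains a solr.xml in the new format, and each core directory under the configured root contains a core.properties file that names the core.

solr.xml:
<solr>
  <str name="coreRootDirectory">${coreRootDirectory:cores}</str>
</solr>

cores/site1/core.properties:
name=site1

On startup, including CoreContainer#createAndLoad with such a solr.xml, the container walks coreRootDirectory and every directory holding a core.properties becomes a loaded core; without those properties files there is nothing for discovery to find.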
Re: extract and add fields on the fly
The use case is to use curl to upload/extract/index a document while passing in additional facets not present in the document, e.g. literal.source=old system. In this way some fields come from the uploaded extracted content and some fields are specified in the curl URL. Hope that's clearer?

Regards Mark

On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com wrote: Sounds like 'literal.X' syntax from https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Can you explain your use case as different from what's already documented? May be easier to understand. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote: I'm looking to 1) upload a binary document using curl 2) add some additional facets Specifically my question is can this be achieved in 1 curl operation or does it need 2? On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote: Second thoughts SID is purely i/p as its name suggests :) I think a better approach would be 1) curl to upload/extract passing docID 2) curl to update additional fields for that docID On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add addtinal fields to SID commit I know I'm possibly wandering into the schemaless teritory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
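A rough sketch of what that single request can look like against the ExtractingRequestHandler; the core path, file name, and the source field are illustrative, and the field must exist in the schema or match a dynamic field:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.source=old+system&commit=true" \
  -F "myfile=@report.pdf"

Every literal.<field> parameter is added to the extracted document as a regular field, so the upload and the extra facets go in one curl call.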
Re: extract and add fields on the fly
That approach works, although as suspected the schema has to recognise the additional facet (stuff in this case):

{"responseHeader":{"status":400,"QTime":1},"error":{"msg":"ERROR: [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field 'stuff'","code":400}}

..getting closer..

On 28 January 2015 at 18:03, Mark javam...@gmail.com wrote: Use case is use curl to upload/extract/index document passing in additional facets not present in the document e.g. literal.source=old system In this way some fields come from the uploaded extracted content and some fields as specified in the curl URL Hope that's clearer? Regards Mark On 28 January 2015 at 17:54, Alexandre Rafalovitch arafa...@gmail.com wrote: Sounds like 'literal.X' syntax from https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Can you explain your use case as different from what's already documented? May be easier to understand. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote: I'm looking to 1) upload a binary document using curl 2) add some additional facets Specifically my question is can this be achieved in 1 curl operation or does it need 2? On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote: Second thoughts SID is purely i/p as its name suggests :) I think a better approach would be 1) curl to upload/extract passing docID 2) curl to update additional fields for that docID On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add addtinal fields to SID commit I know I'm possibly wandering into the schemaless teritory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? sort of: 1) index this document 2) by the way here are some important facets whilst your at it Regards Mark
Re: extract and add fields on the fly
Well, the schema does need to know what type your field is. If you can't add it to the schema, use dynamicFields with prefixes/suffixes, or dynamic schema (less recommended). Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 28 January 2015 at 13:32, Mark javam...@gmail.com wrote: That approach works although as suspected the schma has to recognise the additinal facet (stuff in this case): responseHeader:{status:400,QTime:1},error:{msg:ERROR: [doc=6252671B765A1748992DF1A6403BDF81A4A15E00] unknown field 'stuff',code:400}} ..getting closer..
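As a rough illustration of the dynamicField route Alex mentions (a sketch only; the suffix, field type and field name below are assumptions, not something configured in this thread), schema.xml could carry a catch-all string pattern:

<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

The upload request would then pass literal.stuff_s=... instead of literal.stuff=..., so the value lands in a field the schema already knows how to type and the "unknown field" error goes away.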
Re: extract and add fields on the fly
Sounds like 'literal.X' syntax from https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Can you explain your use case as different from what's already documented? May be easier to understand. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 28 January 2015 at 12:45, Mark javam...@gmail.com wrote: I'm looking to 1) upload a binary document using curl 2) add some additional facets Specifically my question is: can this be achieved in 1 curl operation or does it need 2? On 28 January 2015 at 17:43, Mark javam...@gmail.com wrote: Second thoughts: SID is purely input, as its name suggests :) I think a better approach would be 1) curl to upload/extract passing docID 2) curl to update additional fields for that docID On 28 January 2015 at 17:30, Mark javam...@gmail.com wrote: Create the SID from the existing doc implies that a document already exists that you wish to add fields to. However if the document is a binary, are you suggesting 1) curl to upload/extract passing docID 2) obtain a SID based off docID 3) add additional fields to SID, commit I know I'm possibly wandering into schemaless territory here as well On 28 January 2015 at 17:11, Andrew Pawloski apawlo...@gmail.com wrote: I would switch the order of those. Add the new fields and *then* index to solr. We do something similar when we create SolrInputDocuments that are pushed to solr. Create the SID from the existing doc, add any additional fields, then add to solr. On Wed, Jan 28, 2015 at 11:56 AM, Mark javam...@gmail.com wrote: Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? Sort of: 1) index this document 2) by the way, here are some important facets whilst you're at it Regards Mark
RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
Try using something larger than 2 for alternativeTermCount. 5 is probably ok here. If that doesn't work, then post the exact query you are using and the full extended spellcheck results. James Dyer Ingram Content Group -Original Message- From: fabio.bozzo [mailto:f.bo...@3-w.it] Sent: Tuesday, January 27, 2015 3:59 PM To: solr-user@lucene.apache.org Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker I have this in my solrconfig:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">100</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Although my spellchecker does work, suggesting for misspelled terms, it doesn't work for the example above: I mean terms which are both valid (gopro=100 docs; go pro=150 other docs). I want to suggest gopro for the go pro search term and vice-versa, even if they're both perfectly valid terms in the index. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html Sent from the Solr - User mailing list archive at Nabble.com.
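A sketch of the change being suggested, applied to the defaults shown above (the value 5 is the suggestion from the reply; turning on extendedResults is only needed to produce the diagnostic output asked for, and both are adjustments to try rather than definitive settings):

    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.extendedResults">true</str>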
Re: [MASSMAIL]Re: Contextual sponsored results with Solr
We are trying to avoid firing 2 queries per request. I've started to play with a PostFilter to see how it goes; perhaps something along the lines of the ReRank query parser (ReRankQParserPlugin) could be used to avoid issuing two queries and instead rerank the results? - Original Message - From: Ahmet Arslan iori...@yahoo.com.INVALID To: solr-user@lucene.apache.org Sent: Tuesday, January 27, 2015 11:06:29 PM Subject: [MASSMAIL]Re: Contextual sponsored results with Solr Hi Jorge, We have done a similar thing with N=3. We issue two separate queries/requests and display the 'special N' above the results. We excluded the 'special N' with a -id:(1 2 3 ... N) type query, all done on the client side. Ahmet On Tuesday, January 27, 2015 8:28 PM, Jorge Luis Betancourt González jlbetanco...@uci.cu wrote: Hi all, Recently I got an interesting use case that I'm not sure how to implement. The idea is that the client wants a fixed number of documents, let's call it N, to appear at the top of the results. Let me explain a little: we're working with web documents, so the idea is to promote the documents that match the user's query from a given domain (wikipedia, for example) to the top of the list. So if I apply a boost using the boost parameter: http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia I get *all* the documents from the desired host at the top, but there is no way of limiting the number of documents from the host that are boosted to the top of the result list (which could lead to several pages of content from the same host, which is not desired; the idea is to only show N). I was thinking of something like field collapsing/grouping, but only for the documents that match my $type1query parameter (host:wikipedia); I don't see any way of doing grouping/collapsing on only one group while leaving the other results untouched. I also thought about using 2 groups with group.query=host:wikipedia and group.query=-host:wikipedia, but in this case there is no way of controlling how many documents each independent group will have. In this particular case QueryElevationComponent isn't helping, because I don't want to map all the possible queries; I just want to put some of the results from a certain host at the top of the list, without boosting all the documents from the same host. Any thoughts or recommendations on this? Thank you, Regards, --- XII Anniversary of the founding of the University of Informatics Sciences. 12 years of history alongside Fidel. December 12, 2014.
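For what it's worth, a sketch of what a single re-ranked request could look like on Solr 4.9+ (the field name and the reRankDocs/reRankWeight values are assumptions; note that re-ranking boosts matching documents within the top results rather than strictly pinning exactly N of them, so it only approximates the original requirement):

http://localhost:8983/solr/select?q=search+terms&rq={!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=5}&rqq=host:wikipedia

Here the main query runs once and the top 100 hits are re-scored with extra weight for the wikipedia host, all in one request instead of two.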
Re: PostingsHighlighter highlighted snippet size (fragsize)
It seems that a solution has been found. PostingsHighlighter uses Java's SENTENCE BreakIterator by default, so it breaks the snippets into fragments per sentence. In my text_en analysis chain, though, I was using a filter that lowercases input, and this seems to interfere with the logic of the SENTENCE BreakIterator. Removing the filter did the trick. Apart from that, there is a new issue now. I'm trying to search on one field and highlight another, and this does not seem to work even if I use the exact same analyzers for both fields. I get the correct results in the highlighting section, but there is no highlight. Digging deeper, I've found inside PostingsHighlighter.highlightFieldsAsObjects() (line 393 in version 4.10.3) that the fields actually highlighted appear to be the intersection of the set of fields used in the search query and the set of fields to be highlighted (defined by the hl.fl param). So, unless I use the field to be highlighted in the search query, I get no highlight. -- View this message in context: http://lucene.472066.n3.nabble.com/PostingsHighlighter-highlighted-snippet-size-fragsize-tp4180634p4182596.html Sent from the Solr - User mailing list archive at Nabble.com.
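A sketch of the workaround implied by that observation (field names are illustrative, and adding the extra clause changes which documents can match, so it is a trade-off rather than a clean fix): include the field you want highlighted in the query itself, for example

http://localhost:8983/solr/collection1/select?q=content_en:(snippet text) OR content_display:(snippet text)&hl=true&hl.fl=content_display

With content_display present in the query, it falls inside the intersection described above and snippets are returned for it.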
extract and add fields on the fly
Is it possible to use curl to upload a document (for extract indexing) and specify some fields on the fly? Sort of: 1) index this document 2) by the way, here are some important facets whilst you're at it Regards Mark
Re: How to implement Auto complete, suggestion client side
Hi, Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me. Regards Olivier 2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com: You've got a lot of options depending on what you want. But since you seem to just want _an_ example, you can use mine from http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search box there). You can see the source for the test screen (using Spring Boot and Spring Data Solr as a middle layer) and Select2 for the UI at: https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer. The Solr definition is at: https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf Other implementation pieces are in that (and another) public repository as well, but it's all in Java. You'll probably want to do something similar in PHP. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 26 January 2015 at 17:11, Olivier Austina olivier.aust...@gmail.com wrote: Hi All, I would say I am new to web technology. I would like to implement autocomplete/suggestion in the search box as the user types (like Google, for example). I am using Solr as the database. Basically I am familiar with Solr and I can formulate suggestion queries, but I don't know how to implement suggestions in the user interface. Which technologies do I need? The website is in PHP. Any suggestions, examples, or basic tutorials are welcome. Thank you. Regards Olivier
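For completeness, the UI side usually just fires an HTTP request per keystroke (via AJAX from the PHP page) against a Solr suggester or terms handler and renders the returned JSON. A sketch of such a request, assuming a /suggest handler and a dictionary named mySuggester have been configured (the handler name, dictionary and field behind it are assumptions, not something set up in this thread):

curl "http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=res&wt=json"

The JavaScript on the page (or a small PHP proxy) takes the suggestions from the response and feeds them to whatever autocomplete widget the site uses (Select2, jQuery UI, etc.).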
Re: replicas goes in recovery mode right after update
Vijay: Thanks for reporting this back! Could I ask you to post a new patch with your correction? Please use the same patch name (SOLR-5850.patch), and include a note about what you found (I've already added a comment). Thanks! Erick On Wed, Jan 28, 2015 at 9:18 AM, Vijay Sekhri sekhrivi...@gmail.com wrote: Hi Shawn, Thank you so much for the assistance. Building is not a problem. Back in the day I worked on linking, compiling and building C and C++ software; Java is a piece of cake. We have built the new war from the source of version 4.10.3 and our preliminary tests have shown that our issue (replicas going into recovery on high load) *is resolved*. We will continue to do more testing and confirm. Please note that the *patch is BUGGY*. It removed the break statement within the while loop, because of which, whenever we sent a list of docs it would hang (API CloudSolrServer.add), but it would work if we sent one doc at a time. It took a while to figure out why that was happening. Once we put the break statement back it worked like a charm. Furthermore, the patch has solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java which should be solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java Finally, checking if(!offer) is sufficient, rather than using if(offer == false). Last but not least, having a configurable queue size and timeouts (managed via solrconfig) would be quite helpful. Thank you once again for your help. Vijay On Tue, Jan 27, 2015 at 6:20 PM, Shawn Heisey apa...@elyograg.org wrote: On 1/27/2015 2:52 PM, Vijay Sekhri wrote: Hi Shawn, Here is some update. We found the main issue. We have configured our cluster to run under jetty, and when we tried full indexing we did not see the original Invalid Chunk error. However the replicas still went into recovery. All this time we had been trying to look into the replicas' logs to diagnose the issue. The problem seems to be on the leader side. When we looked into the leader logs, we found the following on all the leaders: 3439873 [qtp1314570047-92] WARN org.apache.solr.update.processor.DistributedUpdateProcessor – Error sending update *java.lang.IllegalStateException: Queue full* snip There is a similar bug reported around this https://issues.apache.org/jira/browse/SOLR-5850 and it seems to be in OPEN status. Is there a way we can configure the queue size and increase it? Or is there a version of solr that has this issue resolved already? Can you suggest where we go from here to resolve this? We can repatch the war file if that is what you would recommend. In the end, our initial speculation that solr is unable to handle so many updates was correct. We do not see this issue when the update load is lower. Are you in a position where you can try the patch attached to SOLR-5850? You would need to get the source code for the version you're on (or perhaps a newer 4.x version), patch it, and build Solr yourself. If you have no experience building java packages from source, this might prove to be difficult. Thanks, Shawn -- * Vijay Sekhri *
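For readers following along, a rough, hypothetical sketch of the pattern Vijay describes. This is NOT the actual SOLR-5850 patch or SolrJ code; class, variable and value choices below are invented purely for illustration. The two points are that the drain loop needs its break so a batched add can finish, and that a bounded offer with a timeout avoids blocking forever when the queue is full.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only; not the real ConcurrentUpdateSolrServer implementation.
public class BoundedUpdateQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        int queueSize = 100;  // ideally configurable via solrconfig, as suggested in the thread
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(queueSize);

        // Producer side: bounded offer with a timeout instead of blocking indefinitely.
        boolean offered = queue.offer("doc-1", 250, TimeUnit.MILLISECONDS);
        if (!offered) {  // clearer than writing 'if (offered == false)'
            System.err.println("queue full, backing off");
        }

        // Consumer side: the loop must keep its break so a drained batch exits cleanly.
        while (true) {
            String next = queue.poll();
            if (next == null) {
                break;  // removing this break is what made batched adds hang
            }
            System.out.println("sending " + next);  // stand-in for forwarding the update
        }
    }
}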
Re: extract and add fields on the fly
Thanks Alexandre, I figured it out with this example, https://wiki.apache.org/solr/ExtractingRequestHandler whereby you can add additional fields at upload/extract time: curl "http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3&literal.blah_s=Bah" -F tutorial=@help.pdf and therefore I learned that you can't update a field that isn't in the original, which is what I was trying to do before. Regards Mark On 28 January 2015 at 18:38, Alexandre Rafalovitch arafa...@gmail.com wrote: Well, the schema does need to know what type your field is. If you can't add it to the schema, use dynamicFields with prefixes/suffixes or dynamic schema (less recommended). Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/
Issue on server restarts with Solr 4.6.0 Cloud
Using Solr 4.6.0 on linux with Java 6 (Oracle JRockit 1.6.0_75 R28.3.2-14-160877-1.6.0_75). We are seeing these issues when doing a restart on a Solr cloud configuration. After restarting each server in sequence none of them will come up. The servers start up after a long time but the cloud status shows the Solr as being down.

java.nio.channels.ClosedChannelException
  at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:87)
  at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:603)
  at org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:778)
  at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
  at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:71)
  at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
  at org.apache.solr.update.TransactionLog$FSReverseReader.init(TransactionLog.java:696)
  at org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:575)
  at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:942)
  at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:885)
  at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1043)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:280)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)

SnapPull failed :org.apache.lucene.store.AlreadyClosedException: Already closed
  at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:340)
  at org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:811)
  at org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:564)
  at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:506)
  at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
  at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:433)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)

Error while trying to recover. core=[REDACTED]:org.apache.solr.common.SolrException: No registered leader was found, collection:[REDACTED] slice:shard1
  at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
  at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)
RE: replica never takes leader role
When the leader reaches 99% physical memory on the box and starts swapping (stops replicating), we forcefully bring down the leader (first kill -15 and then kill -9 if kill -15 doesn't work). This is when we are looking to the replica to assume the leader's role, and it never happens. The Zookeeper timeout is 45 seconds. We can increase it up to 2 minutes and test.

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:45000}">

As per the definition of zkClientTimeout, after the leader is brought down and it doesn't talk to zookeeper for 45 seconds, shouldn't ZK promote the replica to leader? I am not sure how increasing the zk timeout will help. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, January 28, 2015 11:42 AM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role This is not the desired behavior at all. I know there have been improvements in this area since 4.8, but I can't seem to locate the JIRAs. I'm curious _why_ the nodes are going down though: is it happening at random or are you taking them down? One problem has been that the Zookeeper timeout used to default to 15 seconds, and occasionally a node would be unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping the ZK timeout has helped some people avoid this... FWIW, Erick On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote: We're using Solr 4.8.0 -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, January 27, 2015 7:47 PM To: solr-user@lucene.apache.org Subject: Re: replica never takes leader role What version of Solr? This is an ongoing area of improvement and several fixes are very recent. Try searching the Solr JIRA for details. Best, Erick On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote: Hello, We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three zookeeper instances. We have noticed that when a leader node goes down the replica never takes over as leader, the cloud becomes unusable, and we have to bounce the entire cloud for the replica to assume the leader role. Is this the default behavior? How can we change this? Thanks.