Where to set shards.tolerant to true?
Hey guys, I am trying to implement distributed search with a master-slave setup. Search requests go to the slave servers, and I am planning to put a load balancer in front of them. I have defined a custom search handler whose shards parameter lists the host addresses of the slaves.

I believe that if more than one slave server is provided in the shards parameter, the search is not fault tolerant by default. In that context I came across the shards.tolerant=true parameter, but I am not sure where it can be defined. https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance

Can this be set up with a Solr master-slave architecture? Could someone please tell me if it is possible to set this at the Solr server level? Thanks Sangeetha
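[Editor's note] Since shards.tolerant is an ordinary request parameter, one place to set it once for every query is in the defaults of the search handler that already carries the shards list. A sketch for solrconfig.xml; the handler name and slave addresses below are placeholders, not taken from the original post:

```xml
<requestHandler name="/distribsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- placeholder slave addresses; use your real slave hosts/cores -->
    <str name="shards">slave1:8983/solr/core1,slave2:8983/solr/core1</str>
    <!-- return partial results instead of failing when one shard is down -->
    <str name="shards.tolerant">true</str>
  </lst>
</requestHandler>
```

It can also be passed per request (&shards.tolerant=true) if only some queries should tolerate shard failures.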
Commit after every document - alternate approach
Hi All, I am trying to understand how commits should be issued to Solr while indexing documents. Around 200K to 300K documents per hour, averaging 10 KB each, will be coming into Solr. Java code fetches each document from MQ and streams it to Solr.

The problem is that the client code issues a hard commit after each document sent to Solr, and waits for the response to get assurance that the document was indexed successfully. Only when it gets an OK status from Solr is the document cleared from the MQ. As far as I understand, a commit after each document is an expensive operation, but we need to make sure that every document put into MQ gets indexed in Solr.

Is there any other way of getting this done? Please let me know. If we do batch indexing, is there any chance we can identify whether some documents were missed from indexing? Thanks Sangeetha
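[Editor's note] The usual pattern here is to stop committing per document: let Solr's autoCommit handle durability and batch the adds client-side, acknowledging MQ messages only after a batch is accepted. A minimal sketch of the batching side in plain Java; the class and method names (IndexBatcher, flush) are made up for illustration, and the real flush body would call SolrJ's client.add(batch) before acknowledging the queue:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulate documents and send them in batches
// instead of committing per document. MQ messages for a batch are only
// acknowledged after the batch is accepted, so nothing is silently lost.
public class IndexBatcher {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushes = 0;

    public IndexBatcher(int batchSize) { this.batchSize = batchSize; }

    public void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (pending.isEmpty()) return;
        // Real code: solrClient.add(batchOfSolrInputDocuments);
        // on success, acknowledge the corresponding MQ messages here;
        // on failure, leave them on the queue for redelivery.
        pending.clear();
        flushes++;
    }

    public int getFlushes() { return flushes; }

    public static void main(String[] args) {
        IndexBatcher b = new IndexBatcher(100);
        for (int i = 0; i < 250; i++) b.add("doc-" + i);
        b.flush(); // send the final partial batch of 50
        System.out.println(b.getFlushes()); // 100 + 100 + 50 = 3 batches
    }
}
```

With this shape, a failed batch shows up as a rejected add (the documents stay on the queue), which answers the "how do we know something was missed" concern without a per-document hard commit.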
SnapPuller Exception in Slave server
Hi All, I am using Solr 4.5.1 with a master-slave architecture. I am seeing the exception below in the slave server:

SnapPuller Master at: not available. Index fetch failed. Exception: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at:

Replication itself does not appear to fail because of this exception, but I suspect it slows down the replication process. Could someone please let me know why this occurs and what needs to be checked? Thanks Sangeetha
Warning : Import command failed . another import is running
Hi All, I have enabled automatic indexing via the Data Import option. I see the warning below on a daily basis, though not at a regular time. Could someone please tell me why I am seeing it?

Aug 3, 2015 1:01:34 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running
Aug 3, 2015 1:03:34 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running
Aug 3, 2015 1:25:36 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running

Is there a possibility that files get missed from indexing because of this? Please let me know. Thanks Sangeetha
RE: SOLR Exception with SOLR Cloud 5.1 setup on Linux
Thanks Yonik and Shawn. The 80000000-7fffffff change worked. Thanks Sangeetha

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 29 July 2015 18:39
To: solr-user@lucene.apache.org
Subject: Re: SOLR Exception with SOLR Cloud 5.1 setup on Linux

On 7/28/2015 5:10 PM, Yonik Seeley wrote:
> On Tue, Jul 28, 2015 at 6:54 PM, Shawn Heisey apa...@elyograg.org wrote:
>> To get out of the hole you're in now, either build a new collection with the actual shard count that you want so it's correctly set up, or edit the clusterstate in zookeeper to change the hash range.
> Actually, if you want a range that covers the entire 32 bit hash space, it would be 80000000-7fffffff (hex representations of signed integers).

Good to know. Thanks. I was somewhat confused by something I saw in my own clusterstate on a three-shard collection, where the start value appeared to be larger than the end value in one of the shards; this note makes that understandable. I find it irritating and confusing, but now it makes sense.

Shawn
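[Editor's note] For anyone puzzled by where the 80000000 and 7fffffff endpoints come from: Solr's compositeId router hashes documents into the signed 32-bit integer space, and a shard range is printed as the unsigned-hex form of its signed endpoints, so the full space runs from Integer.MIN_VALUE to Integer.MAX_VALUE. A standalone check in plain Java (no Solr dependency):

```java
public class HashRangeCheck {
    public static void main(String[] args) {
        // The full 32-bit hash space is Integer.MIN_VALUE..Integer.MAX_VALUE;
        // toHexString prints the two's-complement (unsigned hex) form,
        // which is exactly how Solr renders shard ranges in clusterstate.
        String start = Integer.toHexString(Integer.MIN_VALUE);
        String end = Integer.toHexString(Integer.MAX_VALUE);
        System.out.println(start + "-" + end);
    }
}
```

This also explains Shawn's observation: a shard whose range starts in the negative half (e.g. anything from 80000000 upward) prints a "start" that looks numerically larger than its "end".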
How to upgrade SOLR from 4.0 to 5.1 on Windows
Hi, I would like to know how to upgrade Solr from version 4.0 to 5.1. I did find the wiki links below, which describe what has changed:

https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr
https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5

Apart from these, is there any link or document with a step-by-step procedure for the installation part of the upgrade? Please let me know. Thanks Sangeetha
Exceptions in Tomcat - Master Server
Hi All, We have a Solr 4.1 master and replica setup running as a Tomcat service. Lately I am seeing the two exceptions below in the master server logs.

*Socket Connection

org.apache.solr.handler.ReplicationHandler$DirectoryFileStream write
WARNING: Exception while writing response for params: file=_aq5o_Lucene41_0.pos&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=493833
ClientAbortException: java.net.SocketException: Connection reset by peer: socket write error
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:388)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:371)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:413)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:401)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
    at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
    at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:84)
    at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1142)
    at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:995)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:684)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

*File Not Found Exception

org.apache.solr.handler.ReplicationHandler$DirectoryFileStream write
WARNING: Exception while writing response for params: file=_aq5s_3.del&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=493833
java.io.FileNotFoundException: F:\Solr-Index\Index_New\Master\index\_aq5s_3.del (The system cannot find the file specified)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
    at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.

I don't have much expertise on Tomcat exceptions. Could someone tell me why these errors occurred and how they can be rectified? Thanks Sangeetha
SOLR Exception with SOLR Cloud 5.1 setup on Linux
Hi, I have set up a SOLR Cloud comprising 2 Solr instances, with ZooKeeper on a separate instance. I have created one shard on one of the Solr nodes, and the other Solr node acts as a replica for that shard. I am able to post documents through the UI, but while trying to connect from the Java layer I am getting the error below. From Java, using the CloudSolrClient class, I am passing the ZooKeeper host, which is 10.111.65.152 on port 2181. The collection name is umbcollection. I am not sure what is wrong here. Could someone help me find the root cause?

org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.111.65.150:8080/solr/umbcollection: No active slice servicing hash code 103646ce in DocCollection(umbcollection)={
  shards:{shard1:{
    range:80000000-ffffffff,
    state:active,
    replicas:{
      core_node1:{state:active, core:umb, node_name:10.111.65.150:8080_solr, base_url:http://10.111.65.150:8080/solr, leader:true},
      core_node2:{state:active, core:shard1-replica-1, node_name:10.111.65.151:8080_solr, base_url:http://10.111.65.151:8080/solr}}}},
  maxShardsPerNode:1, router:{name:compositeId}, replicationFactor:1, autoAddReplicas:false, autoCreated:true}

Thanks Sangeetha
Re: SOLR Exception with SOLR Cloud 5.1 setup on Linux
Yes, I did create two shards and two replicas and later dropped the other one. The version is 5.1. Can you please tell me how this can be fixed? Thanks Sangeetha

Sent from mobile

On Jul 28, 2015 8:46 PM, Shawn Heisey apa...@elyograg.org wrote:
On 7/28/2015 8:22 AM, sangeetha.subraman...@gtnexus.com wrote:
> org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.111.65.150:8080/solr/umbcollection: No active slice servicing hash code 103646ce in DocCollection(umbcollection)={ shards:{shard1:{ range:80000000-ffffffff,

That JSON structure looks like a complete collection clusterstate, which means that you have only one shard, but it is configured to cover only half of the range of hash values. You have nothing covering 0 through 7fffffff. That is consistent with the error message. There should be another shard covering the other half of the range.

It seems highly unlikely that you could have ended up with this clusterstate unless you have been manually changing your collection with the Collections API after creating it, or maybe doing manual tweaks to the config in ZooKeeper. Has anything like that happened? What is your Solr version?

Thanks, Shawn
What is the best practice to back up and delete a core in a SOLR Master-Slave architecture
Hi, I am a newbie to SOLR. I have set up a master-slave configuration with Solr 4.0. I am trying to identify the best way to back up an old core and then delete it so as to free up disk space. I did find information on how to unload a core and delete its indexes:

Unload: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
Delete indexes: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true

What is the best approach to remove the old core?

* Approach 1: Unload the core on both the master and slave servers AND delete the index only from the master server (retaining the indexes on the slave as a backup). If I retain the indexes on the slave, is there a way to bring them back to the master later?
* Approach 2: Unload and delete the indexes from both master and slave. Before deleting, take a backup of the old core's data dir from the file system. I am not sure if this is even possible.

Is there any better way of doing this? Please let me know. Thanks Sangeetha
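[Editor's note] One option worth knowing before unloading anything: the master's ReplicationHandler can snapshot the index itself. A backup can be triggered on demand with http://master:8983/solr/core0/replication?command=backup, or configured to happen automatically in solrconfig.xml. A sketch (the handler is the standard /replication handler; the event choices are examples):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- take an index snapshot automatically after every optimize -->
    <str name="backupAfter">optimize</str>
    <!-- prune old snapshots, keeping only the most recent two -->
    <str name="maxNumberOfBackups">2</str>
  </lst>
</requestHandler>
```

A snapshot taken this way lands in a snapshot.* directory under the data dir, which can be archived off the box before the core is unloaded and deleted; restoring is then a matter of copying the snapshot contents back into a fresh core's index directory.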
RE: What is the best way of Indexing different formats of documents?
Hi Swaraj, Thanks for the answers. From my understanding we can index:

* Using DIH from the DB
* Using DIH from the filesystem - this is where I am concentrating.
  * For this we can use SolrJ with Tika (Solr Cell) from the Java layer to extract the content and send the data through the REST API to the Solr server,
  * or we can use ExtractingRequestHandler to do the job.

I just want to index only certain documents, and there will not be any updates to an indexed document. In our existing system we already have DIH implemented, indexing documents from SQL Server (as you said, based on last index time); in that case the metadata is available in the database. But if we are streaming via URL, we would need to append the metadata too - correct me if I am wrong. And how does the indexing happen here: based on last index time or something else? Also, for ExtractingRequestHandler, when you say "manual operation", what is it you are talking about? Can you please clarify? Thanks Sangeetha

-----Original Message-----
From: Swaraj Kumar [mailto:swaraj2...@gmail.com]
Sent: 07 April 2015 18:02
To: solr-user@lucene.apache.org
Subject: Re: What is the best way of Indexing different formats of documents?

You can always choose either DIH or /update/extract to index docs in Solr. There are multiple benefits of DIH, which I am listing below:

1. Clean and update using a single command.
2. DIH also optimizes indexing using optimize=true.
3. You can do a delta-import based on last index time, whereas with /update/extract you need a manual operation for delta imports.
4. You can use multiple entity processors and transformers with DIH, which is very useful for indexing exactly the data you want.
5. The rows query parameter limits the number of records.
Regards, Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com wrote:
Hi, I am a newbie to SOLR and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The input files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index documents by streaming directly from the application? If so, what is the disadvantage of using it (against DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help. Thanks Sangeetha
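[Editor's note] On the "delta-import based on last index time" point discussed above: DIH records the finish time of each run in dataimport.properties and exposes it as ${dataimporter.last_index_time}, which a deltaQuery can compare against. A sketch of a db-data-config.xml entity; the table and column names (documents, last_modified) are made up for illustration:

```xml
<entity name="doc" pk="id"
        query="SELECT id, title, body FROM documents"
        deltaQuery="SELECT id FROM documents
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title, body FROM documents
                          WHERE id = '${dataimporter.delta.id}'"/>
```

deltaQuery finds the IDs changed since the last run, and deltaImportQuery re-fetches the full row for each of those IDs. This is the piece that /update/extract has no equivalent for, which is why delta handling there becomes a manual operation in your own client code.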
What is the best way of Indexing different formats of documents?
Hi, I am a newbie to SOLR and basically from a database background. We have a requirement to index files of different formats (X12, EDIFACT, CSV, XML). The input files can be of any format, and we need to do a content-based search on them. From the web I understand we can use the Tika processor to extract the content and store it in Solr. What I want to know is: is there any better approach for indexing files in Solr? Can we index documents by streaming directly from the application? If so, what is the disadvantage of using it (against DIH, which fetches from the database)? Could someone share some insight on this? Is there any web link I can refer to for some ideas on it? Please do help. Thanks Sangeetha
Help needed in Indexing and Search on xml content
Hi Team, I am a newbie to SOLR. I have search fields stored in an XML file, which is in turn stored in MSSQL. I want to index the content of that XML file in Solr, and we need to provide search based on the fields present in the XML file. The reason we store the input details as an XML file is that users can add their own custom input fields with values; storing these custom fields as columns in MSSQL does not seem like an optimal solution. So we thought of putting them in an XML file and storing that file in the RDBMS. But I am not sure how we can index the content of the file to make search better. I believe this can be done with ExtractingRequestHandler. Could someone help me with how to implement this, or direct me to some pages that could help? Thanks Sangeetha
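[Editor's note] ExtractingRequestHandler (Solr Cell) does fit this use case: it runs Tika over a posted file and maps the extracted body into a searchable field. A minimal sketch of the handler registration in solrconfig.xml; the target field names (content, ignored_) are placeholders:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body into the searchable "content" field -->
    <str name="fmap.content">content</str>
    <!-- prefix unknown Tika metadata fields so they can be ignored -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

Note that for a well-structured XML payload like this, an alternative worth considering is parsing the XML in the client (or via DIH's XPathEntityProcessor) and posting explicit fields, since Tika's generic extraction flattens the XML to plain text and loses the per-field structure the custom fields are meant to provide.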