Where to set shards.tolerant to true?

2016-04-22 Thread sangeetha.subraman...@gtnexus.com
Hey guys,

I am trying to implement distributed search with a master/slave setup. Search 
requests go to the slave servers, and I am planning to put a load balancer in front 
of the slave servers. Here is the custom search handler which is defined:


 
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">*:*</str>
    <str name="shards">host address of the slaves</str>
  </lst>
</requestHandler>

I believe that if more than one slave server is listed in the shards parameter, 
the search will not be fault tolerant on its own. While looking into that I came 
across the shards.tolerant=true parameter, but I am not sure where it should be defined:
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
Can we use this with a Solr master/slave architecture? Could someone please tell me 
whether it is possible to set this up at the Solr server level?
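For reference, shards.tolerant is a plain request parameter, so it can either be sent 
with each request or placed in the <lst name="defaults"> section of the handler above. 
A minimal sketch of the per-request form, assuming a SolrJ 5.x-style HttpSolrClient and 
hypothetical load balancer / slave host names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TolerantShardsQuery {
    public static void main(String[] args) throws Exception {
        // "loadbalancer", "slave1" and "slave2" are hypothetical host names
        HttpSolrClient solr = new HttpSolrClient("http://loadbalancer:8080/solr/core1");
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "slave1:8080/solr/core1,slave2:8080/solr/core1");
        q.set("shards.tolerant", "true");  // return partial results instead of failing if one shard is down
        QueryResponse rsp = solr.query(q);
        System.out.println("docs found: " + rsp.getResults().getNumFound());
        solr.close();
    }
}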

Thanks
Sangeetha


Commit after every document - alternate approach

2016-03-02 Thread sangeetha.subraman...@gtnexus.com
Hi All,

I am trying to understand how commits should be issued to SOLR while indexing 
documents. Around 200K to 300K documents per hour, with an average size of 10 KB 
each, will be coming into SOLR. Java code fetches each document from MQ and streams 
it to SOLR. The problem is that the client code issues a hard commit for every 
document it sends to SOLR for indexing, and it waits for the response from SOLR to 
be sure the document was indexed successfully. Only if it gets an OK status from 
SOLR is the document cleared out of MQ.

As far as I understand, doing a commit after each document is an expensive 
operation, but we need to make sure that every document put into MQ gets indexed in 
SOLR. Is there any other way of getting this done? Please let me know.
If we do batch indexing, is there any chance we can identify whether some documents 
were missed from indexing?
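A minimal sketch of the batching idea, assuming a SolrJ 5.x-style HttpSolrClient and 
hypothetical host/field names: the MQ messages are only acknowledged after Solr returns 
status 0 for the batch they belong to, and the expensive hard commit happens once per 
run (or is left to autoCommit/autoSoftCommit in solrconfig.xml) instead of once per document.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    private static final int BATCH_SIZE = 1000; // tune to document size and latency needs

    // Sends one batch in a single round trip and reports whether Solr accepted it,
    // so the caller can acknowledge (or requeue) the corresponding MQ messages.
    static boolean indexBatch(HttpSolrClient solr, List<SolrInputDocument> batch) throws Exception {
        if (batch.isEmpty()) return true;
        UpdateResponse rsp = solr.add(batch);   // no commit here
        return rsp.getStatus() == 0;
    }

    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://solrmaster:8080/solr/core1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        // In the real client this loop would be fed from MQ; here we fabricate documents.
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);   // hypothetical uniqueKey field
            batch.add(doc);
            if (batch.size() == BATCH_SIZE) {
                if (!indexBatch(solr, batch)) {
                    // batch rejected: leave these messages on MQ / requeue them instead of dropping them
                }
                batch.clear();
            }
        }
        indexBatch(solr, batch);   // flush the tail
        solr.commit();             // one hard commit per run, or rely on autoCommit/autoSoftCommit
        solr.close();
    }
}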

Thanks
Sangeetha


SnapPuller Exception in Slave server

2015-11-07 Thread sangeetha.subraman...@gtnexus.com
Hi All,

I am using Solr 4.5.1 with a master/slave architecture. I am seeing the exception 
below in the slave server:

SnapPuller

Master at:  not available. Index fetch failed. Exception: 
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at:


Replication does not appear to be failing because of this exception, but I guess it 
slows down the replication process.
Could someone please let me know why this occurs and what needs to be checked?

Thanks
Sangeetha


Warning : Import command failed . another import is running

2015-08-04 Thread sangeetha.subraman...@gtnexus.com
Hi All,

I have enabled automatic indexing via the Data Import Handler. I see the warning 
below on a daily basis, though not at any specific time. Could someone please tell 
me why I am seeing it?

Aug 3, 2015 1:01:34 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running
Aug 3, 2015 1:03:34 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running
Aug 3, 2015 1:25:36 PM org.apache.solr.handler.dataimport.DataImporter runCmd
WARNING: Import command failed . another import is running


Is there a possibility that files are missed from indexing because of this?
Please let me know.
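The warning itself only means that a new import command arrived while a previous one 
was still running; DIH runs a single import per handler at a time and rejects the 
overlapping command rather than queueing it. A minimal sketch, assuming a SolrJ 
5.x-style HttpSolrClient and a hypothetical core name, that checks the handler's 
status before triggering the next import so the commands do not collide:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DataImportTrigger {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://solrmaster:8080/solr/core1");

        // Ask the handler what it is doing before issuing the next command.
        SolrQuery status = new SolrQuery();
        status.setRequestHandler("/dataimport");
        status.set("command", "status");
        QueryResponse rsp = solr.query(status);

        if ("idle".equals(rsp.getResponse().get("status"))) {
            SolrQuery delta = new SolrQuery();
            delta.setRequestHandler("/dataimport");
            delta.set("command", "delta-import");
            solr.query(delta);   // start the import only when nothing else is running
        } else {
            System.out.println("Previous import still running; skipping this cycle.");
        }
        solr.close();
    }
}

Whether a rejected run can leave files unindexed depends on how the next successful 
delta-import picks up changes (e.g. last_index_time), so that part is worth verifying 
for your setup.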

Thanks
Sangeetha


RE: SOLR Exception with SOLR Cloud 5.1 setup on Linux

2015-07-31 Thread sangeetha.subraman...@gtnexus.com
Thanks Yonik and Shawn. The 80000000-7fffffff change worked.

Thanks
Sangeetha

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 29 July 2015 18:39
To: solr-user@lucene.apache.org
Subject: Re: SOLR Exception with SOLR Cloud 5.1 setup on Linux

On 7/28/2015 5:10 PM, Yonik Seeley wrote:
 On Tue, Jul 28, 2015 at 6:54 PM, Shawn Heisey apa...@elyograg.org wrote:
 To get out of the hole you're in now, either build a new collection 
 with the actual shard count that you want so it's correctly set up, 
 or edit the clusterstate in zookeeper to change the hash range 
 (change 80000000 to 0)
 
 Actually, if you want a range that covers the entire 32 bit hash 
 space, it would be 80000000-7fffffff (hex representations of signed 
 integers).

Good to know.  Thanks.  I was somewhat confused by something I saw in my own 
clusterstate on a three-shard collection, where the start value appeared to be 
larger than the end value in one of the shards; this note makes that 
understandable.  I find it irritating and confusing, but now it makes sense.

Shawn



How to upgrade SOLR from 4.0 to 5.1 on Windows

2015-07-31 Thread sangeetha.subraman...@gtnexus.com
Hi,

I would like to know how to upgrade SOLR from version 4 to 5.1. I did find the 
wiki links below, which describe the changes:
https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr
https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5

Apart from these, is there any link or document with a step-by-step procedure for 
the installation part of the upgrade? Please let me know.

Thanks
Sangeetha



Exceptions in Tomcat - Master Server

2015-07-31 Thread sangeetha.subraman...@gtnexus.com
Hi All,

We have a SOLR 4.1 master and replica setup running as a Tomcat service. Lately I am 
seeing the two exceptions below in the master server logs.


*Socket Connection
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream write
WARNING: Exception while writing response for params: 
file=_aq5o_Lucene41_0.pos&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=493833
ClientAbortException:  java.net.SocketException: Connection reset by peer: 
socket write error
   at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:388)
   at 
org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:371)
   at 
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:413)
   at 
org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:401)
   at 
org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91)
   at 
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
   at 
org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:84)
   at 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1142)
   at 
org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:995)
   at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:684)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)


*File Not Found Exception
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream write
WARNING: Exception while writing response for params: 
file=_aq5s_3.del&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=493833
java.io.FileNotFoundException: F:\Solr-Index\Index_New\Master\index\_aq5s_3.del 
(The system cannot find the file specified)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(Unknown Source)
   at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
   at 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.


I don't have much expertise with Tomcat exceptions. Could someone tell me why these 
errors have occurred and how they can be rectified?

Thanks
Sangeetha



SOLR Exception with SOLR Cloud 5.1 setup on Linux

2015-07-28 Thread sangeetha.subraman...@gtnexus.com
Hi,

I have set up SOLR Cloud with two Solr instances and ZooKeeper on a separate 
instance. I have created one shard on one of the Solr nodes, and the other Solr node 
acts as a replica for that shard.
I am able to post documents through the UI.

But while trying to connect from the Java layer I am getting the error below. From 
Java, using the CloudSolrClient class, I am passing the ZooKeeper host, which is 
10.111.65.152 on port 2181.

The collection name is umbcollection. I am not sure what is wrong here. Could 
someone help me find the root cause?
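For reference, a minimal sketch of the connection path described above, assuming 
SolrJ 5.1 and a hypothetical document id; nothing here is specific to the failure 
other than where it tends to surface:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class UmbCollectionSmokeTest {
    public static void main(String[] args) throws Exception {
        // zkHost and collection name as described above
        CloudSolrClient client = new CloudSolrClient("10.111.65.152:2181");
        client.setDefaultCollection("umbcollection");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "smoke-test-1");   // hypothetical uniqueKey value
        client.add(doc);    // the compositeId router hashes the id; if the hash falls in the
                            // half of the range no slice covers, this is where the error appears
        client.commit();
        client.close();
    }
}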



org.apache.solr.client.solrj.SolrServerException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://10.111.65.150:8080/solr/umbcollection: No active slice 
servicing hash code 103646ce in DocCollection(umbcollection)={

  "shards":{"shard1":{
      "range":"80000000-ffffffff",
      "state":"active",
      "replicas":{
        "core_node1":{
          "state":"active",
          "core":"umb",
          "node_name":"10.111.65.150:8080_solr",
          "base_url":"http://10.111.65.150:8080/solr",
          "leader":"true"},
        "core_node2":{
          "state":"active",
          "core":"shard1-replica-1",
          "node_name":"10.111.65.151:8080_solr",
          "base_url":"http://10.111.65.151:8080/solr"}}}},
  "maxShardsPerNode":"1",
  "router":{"name":"compositeId"},
  "replicationFactor":"1",
  "autoAddReplicas":"false",
  "autoCreated":"true"}


Thanks
Sangeetha



Re: SOLR Exception with SOLR Cloud 5.1 setup on Linux

2015-07-28 Thread sangeetha.subraman...@gtnexus.com
Yes, I did create two shards and two replicas and later dropped one of them. The 
version is 5.1. Can you please tell me how this can be fixed?

Thanks
Sangeetha

Sent from mobile

On Jul 28, 2015 8:46 PM, Shawn Heisey apa...@elyograg.org wrote:
On 7/28/2015 8:22 AM, sangeetha.subraman...@gtnexus.com wrote:
 org.apache.solr.client.solrj.SolrServerException: 
 org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
 from server at http://10.111.65.150:8080/solr/umbcollection: No active slice 
 servicing hash code 103646ce in DocCollection(umbcollection)={

  "shards":{"shard1":{
    "range":"80000000-ffffffff",

That JSON structure looks like it is a complete collection clusterstate.  Which 
means that you only have one shard, but it is configured to only cover half of the 
range of hash values.  You have nothing covering 0 through 7fffffff.  That is 
consistent with the error message.  There should be another shard which would cover 
the other half of the range.
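For illustration (a hypothetical excerpt, not taken from your actual clusterstate), a 
collection created with numShards=2 normally splits the full hash space between the 
two shards, something like:

  "shard1":{"range":"80000000-ffffffff", ...}
  "shard2":{"range":"0-7fffffff", ...}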

It seems highly unlikely that you could have ended up with this clusterstate unless 
you have been manually changing your collection with the collections API after 
creating it, or maybe doing manual tweaks to the config in zookeeper.  Has anything 
like that happened?

What is your Solr version?

Thanks,
Shawn



What is the best practice to Backup and delete a core from SOLR Master-Slave architecture

2015-05-06 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to SOLR. I have set up a master/slave configuration with SOLR 4.0. I 
am trying to identify the best way to back up an old core and then delete it, so as 
to free up disk space.

I did get the information on how to unload a core and delete the indexes from 
the core.

Unloading - http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0
Delete Indexes - 
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core0&deleteIndex=true

What is the best approach to remove the old core ?


*   Approach 1

o   Unload the core on both the master and slave servers AND delete the index only 
from the master server (retaining the indexes on the slave server as a backup). If I 
retain the indexes on the slave server, is there a way to bring them back to the 
master server at a later point?

*   Approach 2

o   Unload and delete the indexes from both the master and slave servers. Before 
deleting, take a backup of the old core's data directory from the file system (a 
backup command example is sketched after this list). I am not sure if this is even 
possible?
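For Approach 2, the replication handler can also take a snapshot of the index before 
the core is unloaded, which avoids copying the data directory by hand. A minimal 
example, assuming the replication handler is enabled on the core and using the same 
hypothetical host and core name as above:

http://localhost:8983/solr/core0/replication?command=backup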

Is there any better way of doing this? Please let me know.

Thanks
Sangeetha


RE: What is the best way of Indexing different formats of documents?

2015-04-08 Thread sangeetha.subraman...@gtnexus.com
Hi Swaraj,



Thanks for the answers.

From my understanding we can index:

·   Using DIH from the database

·   Using DIH from the filesystem - this is what I am concentrating on.

o   For this we can use SolrJ with Tika (Solr Cell) from the Java layer to extract 
the content and send the data through the REST API to the Solr server

o   Or we can use the ExtractingRequestHandler to do the job.



I just want to index certain documents, and there will not be any updates happening 
on the indexed documents.



In our existing system we already have DIH implemented, which indexes documents from 
SQL Server (as you said, based on last index time). In that case the metadata is 
available in the database.



But if we are streaming via URL, we would need to append the metadata too; correct 
me if I am wrong. And how does indexing happen here, based on last index time or 
something else? Also, for the ExtractingRequestHandler, when you say manual 
operation, what is it you are talking about? Can you please clarify?
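On the metadata question, a minimal sketch of streaming a file to /update/extract 
from SolrJ, assuming hypothetical file, core, and field names: metadata that is not 
inside the file itself is attached as literal.* parameters on the request.

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractWithMetadata {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://solrserver:8080/solr/core1");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("invoice-123.xml"), "application/xml");  // the streamed file
        req.setParam("literal.id", "invoice-123");        // metadata the client appends itself
        req.setParam("literal.doc_type", "edifact");      // hypothetical metadata field
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(req);
        solr.close();
    }
}

Delta behaviour (last index time) is something the client has to track itself with 
this approach, since /update/extract simply indexes whatever it is sent.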



Thanks

Sangeetha



-Original Message-
From: Swaraj Kumar [mailto:swaraj2...@gmail.com]
Sent: 07 April 2015 18:02
To: solr-user@lucene.apache.org
Subject: Re: What is the best way of Indexing different formats of documents?



You can always choose either DIH or /update/extract to index docs in solr.

Now there are multiple benefits of DIH which I am listing below :-



1. Clean and update using a single command.

2. DIH can also optimize the index using optimize=true.

3. You can do delta-imports based on last index time, whereas with /update/extract 
you need a manual operation for delta imports.

4. You can use multiple entity processors and transformers with DIH, which is very 
useful for indexing exactly the data you want.

5. The rows query parameter limits the number of records.



Regards,





Swaraj Kumar

Senior Software Engineer I

MakeMyTrip.com

Mob No- 9811774497



On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com wrote:



 Hi,



 I am a newbie to SOLR and basically from database background. We have

 a requirement of indexing files of different formats (x12,edifact, csv,xml).

 The files which are inputted can be of any format and we need to do a

 content based search on it.



 From the web I understand we can use TIKA processor to extract the

 content and store it in SOLR. What I want to know is, is there any

 better approach for indexing files in SOLR ? Can we index the document

 through streaming directly from the Application ? If so what is the

 disadvantage of using it (against DIH which fetches from the

 database)? Could someone share me some insight on this ? ls there any

 web links which I can refer to get some idea on it ? Please do help.



 Thanks

 Sangeetha






What is the best way of Indexing different formats of documents?

2015-04-07 Thread sangeetha.subraman...@gtnexus.com
Hi,

I am a newbie to SOLR and basically from a database background. We have a 
requirement to index files of different formats (x12, edifact, csv, xml).
The input files can be of any format, and we need to do a content-based search on them.

From the web I understand we can use the Tika processor to extract the content and 
store it in SOLR. What I want to know is: is there any better approach for indexing 
files in SOLR? Can we index documents by streaming directly from the application? If 
so, what is the disadvantage of using it (compared to DIH, which fetches from the 
database)? Could someone share some insight on this? Are there any web links I can 
refer to for some ideas? Please do help.

Thanks
Sangeetha



Help needed in Indexing and Search on xml content

2014-09-25 Thread sangeetha.subraman...@gtnexus.com
Hi Team,

I am a newbie to SOLR. I have search fields stored in an XML file, which is stored 
in MSSQL. I want to index the content of that XML file in SOLR. We need to provide 
search based on the fields present in the XML file.

The reason we are storing the input details as an XML file is that users will be 
able to add their own custom input fields with values. Storing these custom fields 
as columns in MSSQL does not seem to be an optimal solution, so we thought of 
putting them in an XML file and storing that file in the RDBMS.
But I am not sure how we can index the content of the file to make search better. I 
believe this can be done with the ExtractingRequestHandler.

Could someone help me with how to implement this, or direct me to some pages which 
could be of help?

Thanks
Sangeetha