Re: Limit the documents for each shard in SolrCloud

2015-05-08 Thread Jilani Shaik
Hi,

Actually, we are facing a lot of issues with Solr shards in our environment.
It is fully loaded with around 150 million documents, each of which has
around 50+ stored fields with multiple values. We also have a lot of custom
components in this environment that use FieldCache and various other Solr
features.

The main issue we are facing is shards going down frequently in SolrCloud.

I have noted what you mentioned in your reply, and I have also seen various
other replies on memory issues. I will try to debug further and post here if
I find any issues in that process.

Thanks,
Jilani

On Thu, May 7, 2015 at 10:17 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Jilani, you did say "My team needs that option if at all possible"; my
 first response would be "why?". Why do they want to limit the number of
 documents per shard? What's the rationale/use case behind that
 requirement? Once we understand that, we can explain why it's a bad idea.
 :)

 I suspect I'm reiterating Jack's comments, but why are you sharding in the
 first place? 8 shards split across 4 machines, so 2 shards per machine.
 But you have 2 replicas of each shard, so you have 16 Solr cores, and hence
 4 Solr cores per machine?  Since you need an instance of all 8 shards to be
 up in order to service requests, you can get away with everything on 2
 machines, but you still have 8 Solr cores to manage in order to have a
 fully functioning system.  What's the benefit of sharding in this
 scenario?  Sharding adds complexity, so you normally only add sharding if
 your search times are too slow without it.

 You need to work out how much disk space the whole 20m docs is going to
 take (maybe index 1m or 5m docs and extrapolate if they are all equivalent
 in size), then split it across 4 machines.  But as Erick points out you
 need to allow for merges to occur, so whatever the space of the static
 data set, you need to allow for double that from time to time if background
 merges are happening.
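The sizing rule above (index a sample, extrapolate, split across machines, then allow double for merges) can be written as a small calculation. This is a minimal sketch assuming documents of roughly equal size and an even spread over the machines; the 1m-doc / 10 GB sample measurement is purely hypothetical.

```python
def estimate_disk_per_machine(sample_docs: int, sample_index_bytes: int,
                              total_docs: int, machines: int,
                              merge_headroom: float = 2.0) -> float:
    """Bytes each machine should have free, under the stated assumptions.

    merge_headroom=2.0 reflects the advice to allow for double the static
    index size while background merges are running.
    """
    bytes_per_doc = sample_index_bytes / sample_docs
    total_index_bytes = bytes_per_doc * total_docs
    return total_index_bytes / machines * merge_headroom


GB = 1024 ** 3
# Hypothetical measurement: indexing 1m sample docs produced a 10 GB index.
per_machine = estimate_disk_per_machine(1_000_000, 10 * GB, 20_000_000, 4)
print(f"{per_machine / GB:.0f} GB per machine")
```

With those assumed numbers, 20m docs come to about 200 GB in total, 50 GB per machine, and 100 GB per machine once merge headroom is included.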


 On 7 May 2015 at 16:05, Jack Krupansky jack.krupan...@gmail.com wrote:

  A leader is also a replica - SolrCloud is not a master/slave architecture.
  Any replica can be elected to be the leader, but that is only temporary and
  can change over time.
 
  You can place multiple shards on a single node, but was that really your
  intention?
 
  Generally, number of nodes equals number of shards times the replication
  factor. But then divided by shards per node if you do place more than one
  shard per node.
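Jack's rule of thumb can be expressed as a one-line formula. This sketch rounds up when the cores do not divide evenly across nodes, which is an assumption added here; "cores per node" stands for the shard replicas hosted on each node.

```python
def nodes_needed(num_shards: int, replication_factor: int,
                 cores_per_node: int = 1) -> int:
    """Nodes = shards x replication factor / shard replicas per node (rounded up)."""
    total_cores = num_shards * replication_factor
    # Ceiling division: a partially filled node still counts as a node.
    return -(-total_cores // cores_per_node)


# The original question: 4 shards, leader + replica each, 2 cores per node.
print(nodes_needed(4, 2, 2))
# Daniel's reading: 8 shards, 2 replicas each, 4 cores per machine.
print(nodes_needed(8, 2, 4))
```

Both readings of the setup come out at 4 nodes, which is why the two interpretations in the thread describe the same hardware.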
 
  -- Jack Krupansky
 



Re: Limit the documents for each shard in SolrCloud

2015-05-07 Thread Jilani Shaik
Hi Daniel,

Thanks for the detailed explanation.

My understanding is similar to yours: we should not limit the number of
documents that a shard can index. The distribution normally depends on the
shard routing provided by Solr, and I am not expecting any change to the
document routing process.

My team needs that option if at all possible. Before saying that limiting
the documents per shard is not possible on the Solr side, I just want to get
confirmation or some details on this, so I posted the question here.

You mentioned that Solr indexes what you give it "as long as it has
sufficient space to do that".
 - How does Solr know or estimate whether it has sufficient space to index,
on a particular shard or on the entire cloud?

My understanding, in conclusion:
We will not be able to limit the documents per shard in SolrCloud, as Solr
will accept all documents as long as it has space to index them.

Please suggest.

Thanks,
Jilani

On Thu, May 7, 2015 at 12:41 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Not sure I understand your problem.  If you have 20m documents, and 8
 shards, then each shard is (broadly speaking) only going to have 2.5m docs,
 so I don't follow the 5m limit. That is with the default
 routing/hashing; obviously you can write your own hash algorithm or you can
 shard at your application level.
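To illustrate why the default routing already spreads documents roughly evenly, here is a minimal sketch of hash-range routing: hash the document id to 32 bits and split the hash space into equal ranges, one per shard. Solr's compositeId router uses MurmurHash3; MD5 is used here only as a stand-in hash, so the shard numbers will not match a real cluster.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard by hash range (MD5 stands in for MurmurHash3)."""
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:4], "big")
    # Split the 32-bit hash space into num_shards equal, contiguous ranges.
    range_size = 2 ** 32 // num_shards
    return min(h // range_size, num_shards - 1)


# 80,000 hypothetical ids land close to 10,000 per shard.
counts = Counter(shard_for(f"doc-{i}") for i in range(80_000))
print(sorted(counts.items()))
```

Because the shard is a pure function of the id, the same document always routes to the same shard; a "full" shard cannot hand a document to another shard without breaking later updates and lookups for that id, which is the reproducibility problem described in the next paragraph.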

 In terms of limiting documents in a shard, I'm not sure what purpose that
 would serve.  If for argument's sake you only had 2 shards, and a limit of
 5m docs per shard, what happens when you hit that limit?  If you have
 indexed 10m docs, and now you try to index one more, what would you expect
 to happen? Would the system just reject any documents? Should it try to
 route to shard 1, see that it is full, and then fail over to shard 2 instead
 (that's not going to work, as sharding needs to be reproducible and the
 document was intended for shard 1)?

 Solr's basic premise would be to index what you gave it, as long as it has
 sufficient space to do that.  If you want to limit your index to 20m docs,
 that is probably better done at the application layer (but I still don't
 really see why you would want to do that).
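If a cap really is required, an application-layer guard along the lines Daniel suggests might look like the sketch below. The class and method names are hypothetical; the starting count would in practice come from something like a `q=*:*&rows=0` query (reading numFound), but here it is just a number passed in.

```python
class DocumentBudget:
    """Hypothetical client-side guard that stops sending new docs at a fixed total."""

    def __init__(self, limit: int, current_count: int):
        self.limit = limit
        self.count = current_count  # e.g. numFound from a q=*:*&rows=0 query

    def admit(self, batch: list) -> list:
        """Return the part of `batch` that still fits under the limit."""
        room = max(self.limit - self.count, 0)
        accepted = batch[:room]
        self.count += len(accepted)
        return accepted


budget = DocumentBudget(limit=20_000_000, current_count=19_999_998)
print(budget.admit(["doc-a", "doc-b", "doc-c"]))  # only the first two fit
```

This simple counter assumes every document is new; re-indexing an existing id overwrites rather than adds in Solr, so a real guard would need to account for updates.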




Limit the documents for each shard in SolrCloud

2015-05-06 Thread Jilani Shaik
Hi,

Is it possible to restrict the number of documents per shard in SolrCloud?

Let's say we have a SolrCloud cluster with 4 nodes, and on each node we have
one leader and one replica. Likewise, in total we have 8 shards, including
replicas. Now I need to index my documents in such a way that each shard
will have only 5 million documents. The total number of documents in
SolrCloud should be 20 million.


Thanks,
Jilani


Solr Cloud

2015-05-04 Thread Jilani Shaik
Hi All,

Are there any monitoring tools for Apache SolrCloud, similar to Apache
Ambari, which is used for Hadoop clusters?


Basically, I am looking for a tool like Apache Ambari, which gives us
various metrics as graphs and charts, along with in-depth details for each
node in a Hadoop cluster.


Thanks,
Jilani


Re: Solr Cloud

2015-05-04 Thread Jilani Shaik
Thanks, Shawn. Your reply gave me pointers to open-source options, and I am
really interested in an open-source solution. I have basic knowledge of
Ganglia and Nagios. I have looked at Sematext, and our company already uses
New Relic in this space. But I am interested in an open-source, one-stop
tool similar to Ambari/Cloudera Manager, and I would even be interested in
contributing to one as a developer. Is anyone working on a monitoring tool
like this for Apache Solr?

Thanks,
Jilani

On Mon, May 4, 2015 at 7:08 PM, Shawn Heisey apa...@elyograg.org wrote:

 The most comprehensive and capable Solr monitoring available that I know
 of is a service provided by Sematext.

 http://sematext.com/

 If you want something cheaper, you'll have to build it yourself with
 free tools.  Some of the metrics available from Sematext can be
 duplicated by Xymon or Nagios, others can be duplicated by JavaMelody or
 another monitoring tool made specifically for Java programs.  I have
 duplicated some of that information with tools that I wrote myself, like
 this status servlet:

 https://www.dropbox.com/s/gh6e47mu8sp7zkt/status-page-solr.png?dl=0

 Nothing that I have built comes close to what Sematext provides, but if
 you want history from their SPM product on your servers that goes back
 more than half an hour, you will pay for it.   Their prices are actually
 fairly reasonable for everything you get.

 Thanks,
 Shawn
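As one small building block for the do-it-yourself route Shawn describes, the sketch below scans a cluster-state document for replicas that are not active or whose node is not live, which is the kind of check that would flag shards going down. It assumes the JSON layout returned by the Collections API CLUSTERSTATUS action (`cluster` → `collections` → `shards` → `replicas`, plus `live_nodes`); verify that layout against your Solr version. Fetching the JSON over HTTP is left out so the logic stays self-contained, and the sample data is hypothetical.

```python
def unhealthy_replicas(cluster_status: dict) -> list:
    """Return (collection, shard, replica, reason) for every replica needing attention."""
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in cluster["collections"].items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                if replica.get("node_name") not in live:
                    problems.append((coll_name, shard_name, replica_name, "node not live"))
                elif replica.get("state") != "active":
                    problems.append((coll_name, shard_name, replica_name, replica.get("state")))
    return problems


# Hypothetical cluster state: one healthy replica, one recovering, one on a dead node.
SAMPLE = {
    "cluster": {
        "live_nodes": ["host1:8983_solr"],
        "collections": {
            "collectionName": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"node_name": "host1:8983_solr", "state": "active"},
                        "core_node2": {"node_name": "host1:8983_solr", "state": "recovering"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"node_name": "host2:8983_solr", "state": "active"},
                    }},
                },
            },
        },
    },
}

print(unhealthy_replicas(SAMPLE))
```

A cron job or Nagios-style check could run this periodically and alert whenever the returned list is non-empty.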




suggest.Suggester - Loading stored lookup data failed

2015-05-02 Thread Jilani Shaik
Hi,

When my Solr core is loading, I get the error below. Even though it is only
a WARN, I want to fix it. Please let me know how. It reports a missing file;
is there a sample file for this? I could not find one even in the Apache
Solr SVN.

2015-05-01 11:33:52,475 WARN suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /solr/Applications/shards/shard1/data/solr/cores/syslog/data/autocomplete/tst.dat (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:117)
	at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636)
	at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:849)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:583)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:264)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:256)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)


Please suggest what I can do to remove this warning from my logs.


Thanks,
Jilani


mlt handler not giving response in Solr Cloud

2014-11-18 Thread Jilani Shaik
Hi,

When I execute an MLT handler query on a shard, it returns results only if
the document exists on that shard.

In the scenario below, I have SolrCloud shards on localhost on ports 8181
and 8191, across which the documents are distributed. I get results only
when the query hits the same shard that holds the MLT query document.


No result:
http://localhost:8181/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

Gives results:
http://localhost:8191/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

*So distributed search does not seem to work for the MLT handler (my
assumption; please correct me).*

I even tried the following:

http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&shards.qt=/mlt&shards=localhost:8181/solr/,localhost:8191/solr/

http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&shards.qt=/mlt&shards=localhost:8181/solr/collectionName/,localhost:8191/solr/collectionName/

I also tried the select handler with mlt=true, which does not work either:

http://localhost:8181/solr/collectionName/select?mlt=true&q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&distrib=true&mlt.fl=ti_w
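When testing requests like these by hand, it is easy for parameter separators to get lost (as happened in the URLs above when they were copied). A minimal sketch that builds the MLT request with Python's standard `urlencode`, so every parameter is joined with `&` and values such as the comma-separated shard list are percent-encoded. It only produces a well-formed request; it does not by itself change how the MLT handler behaves in a distributed setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8181/solr/collectionName"


def solr_url(handler: str, params: dict) -> str:
    """Build a request URL; urlencode joins params with '&' and escapes values."""
    return f"{BASE}/{handler}?{urlencode(params)}"


mlt_url = solr_url("mlt", {
    "q": "id:medl_24806189",
    "fq": "segment:medl",
    "fl": "id,owui_p",
    "rows": 100,
    "shards.qt": "/mlt",
    "shards": "localhost:8181/solr/collectionName,localhost:8191/solr/collectionName",
})

# Round-trip check: each parameter decodes back as its own key.
decoded = parse_qs(urlsplit(mlt_url).query)
print(mlt_url)
print(decoded["shards.qt"], decoded["fl"])
```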


MLT configuration from solrconfig.xml:

<!-- MoreLikeThis request handler -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">ti_w</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">2</str>
    <str name="mlt.boost">true</str>
    <str name="shards">localhost:8181/solr/collectionName,localhost:8191/solr/collectionName</str>
    <str name="shards.qt">/mlt</str>
    <str name="mlt">true</str>
    <str name="echoParams">all</str>
  </lst>
</requestHandler>



Please let me know what is missing here to get results in SolrCloud.

Thanks,
Jilani



Re: mlt handler not giving response in Solr Cloud

2014-11-18 Thread Jilani Shaik
Please help me with this issue. Please suggest what is missing to get a
response from multiple Solr shards in the cloud.



Getting huge difference in QTime for terms.lower and terms.prefix

2014-04-10 Thread Jilani Shaik
Hi,

When I query the terms component with terms.prefix, the QTime is 100
milliseconds, whereas the same query with terms.lower gives a QTime of 500
milliseconds. I am using SolrCloud.

In both cases I set terms.limit=60 and terms.sort=index.

Query 1 params:
terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key
QTime: 100 milliseconds


Query 2 params:
terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key
QTime: 500 milliseconds


Both queries return the same terms, but the QTime is different.


Please let me know why the QTime differs between the two approaches.


Thanks,
Jilani
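Why the two queries return identical terms can be shown on an in-memory sorted term list: terms.prefix=b returns terms starting with "b", while terms.lower=b returns terms from "b" onward (inclusive by default), and with terms.sort=index and terms.limit=60 the two coincide whenever at least 60 terms start with "b". This is only an illustration of the semantics, not Lucene's terms-enumeration code, and it does not explain the QTime gap.

```python
from bisect import bisect_left


def terms_prefix(sorted_terms: list, prefix: str, limit: int) -> list:
    """Terms starting with `prefix`, in index order, up to `limit`."""
    out = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if len(out) == limit or not term.startswith(prefix):
            break
        out.append(term)
    return out


def terms_lower(sorted_terms: list, lower: str, limit: int) -> list:
    """Terms >= `lower` (inclusive), in index order, up to `limit`."""
    start = bisect_left(sorted_terms, lower)
    return sorted_terms[start:start + limit]


TERMS = sorted(["apple", "banana", "berry", "cherry"])
print(terms_prefix(TERMS, "b", 2), terms_lower(TERMS, "b", 2))    # same result
print(terms_prefix(TERMS, "b", 60), terms_lower(TERMS, "b", 60))  # lower runs past "b"
```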


Re: Getting huge difference in QTime for terms.lower and terms.prefix

2014-04-10 Thread Jilani Shaik
Please suggest what the reason for this could be.

Thanks,




Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Will it work for multi-valued fields? I get an error saying that the
FieldCache does not work for multi-valued fields, and most of the data in
our index is in multi-valued fields.

Thanks,
Jilani




On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 If you just need counts, maybe you can make use of
 http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

 Ahmet







Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Hi,

Please provide some more pointers so that I can go ahead and address this.

Thanks,
Jilani






Filter in terms component

2014-03-19 Thread Jilani Shaik
Hi,

I have a huge index and am using Solr. I need the terms component with
filtering by a field. Please let me know whether there is any way to get
this.

Please provide some pointers, even if it means developing this myself on
top of Lucene.

Please suggest.

Thanks,
Jilani


Re: Filter in terms component

2014-03-19 Thread Jilani Shaik
Hi Ahmet,

I have looked at the facet component, but our application has 300+ million
docs, so it is very time-consuming, and it also uses caches. So I looked at
the terms component, where Solr reads the index for the field's terms. Is
there any approach to get the terms using a filter, so that I can restrict
the counts to the terms of some documents?

Basically, we have sets of documents, and we want to show term counts based
on a filter by set name, instead of reading the entire index.

Please let me know if you need any more details to give further pointers.

Thanks,
Jilani
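The difference being asked for can be shown with plain in-memory data: the terms component reports, for each term, how many documents in the whole index contain it, whereas the requirement here is the same count restricted to one document set (which is what faceting combined with an fq computes). A minimal sketch; the `set_name` and `tags` fields and the sample documents are hypothetical.

```python
from collections import Counter


def term_counts(docs, field, doc_filter=None) -> Counter:
    """Per-term document counts for a multi-valued field, optionally filtered."""
    counts = Counter()
    for doc in docs:
        if doc_filter is not None and not doc_filter(doc):
            continue
        # set(): count each term at most once per document, like a doc frequency.
        counts.update(set(doc.get(field, [])))
    return counts


DOCS = [
    {"id": "1", "set_name": "setA", "tags": ["blue", "red"]},
    {"id": "2", "set_name": "setA", "tags": ["blue"]},
    {"id": "3", "set_name": "setB", "tags": ["blue", "green", "green"]},
]

whole_index = term_counts(DOCS, "tags")  # what an index-wide term list reports
only_set_a = term_counts(DOCS, "tags", lambda d: d["set_name"] == "setA")
print(whole_index, only_set_a)
```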


On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Jilani,

 What features of the terms component are you after? If it is just
 terms.prefix, it could be simulated with the facet component using the
 facet.prefix parameter. The faceting component respects filter queries.
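Ahmet's suggestion translates into an ordinary faceting request in which the fq restricts the document set and facet.prefix plays the role of terms.prefix. A minimal sketch of the parameters; `field_Name` comes from the thread, while the `set_name:mySet` filter is a hypothetical placeholder. The parameter names (`facet.field`, `facet.prefix`, `facet.limit`, `facet.sort`, `facet.mincount`) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def facet_prefix_params(field: str, prefix: str, filter_query: str,
                        limit: int = 60) -> dict:
    """Facet request that mimics terms.prefix but honours a filter query."""
    return {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed, not documents
        "fq": filter_query,      # restrict counting to one document set
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
        "facet.sort": "index",   # lexical order, like terms.sort=index
        "facet.mincount": 1,     # skip terms with no matches in the set
        "wt": "json",
    }


query_string = urlencode(facet_prefix_params("field_Name", "b", "set_name:mySet"))
print(query_string)
```

Whether this is fast enough on a 300M-document index depends on the faceting method and caches, which is the concern raised earlier in the thread.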






Get the query result from one collection and send it to another collection for merging the result sets

2013-06-26 Thread Jilani Shaik
Hi,

We will have two categories of data: one collection holds the primary data
(for example, products), and the other collection (which could be spread
across shards) holds the transaction data (for example, product sales
data).



We have a search scenario where we need to show the products along with the
number of sales for each product. For this, we need to do a facet-based
search on the second collection, and the result then has to be shown
together with the primary data.


Is there any way to handle this kind of scenario? Please suggest any other
approaches to get the desired result.


Thank you,
Jilani
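One application-side approach, sketched here as an assumption rather than a Solr feature: run the product query first, then facet the sales collection on a product-id field (with an fq limited to the returned ids), and merge the two results in the client. The sketch assumes Solr's default flat JSON facet output for one field, `[term1, count1, term2, count2, ...]`; the `id` and `sales_count` names and the sample data are hypothetical.

```python
def merge_sales_counts(products: list, facet_field_list: list) -> list:
    """Attach a sales count from flat facet output to each product document."""
    # Pair up alternating [term, count, term, count, ...] entries.
    sales = dict(zip(facet_field_list[0::2], facet_field_list[1::2]))
    return [{**p, "sales_count": sales.get(p["id"], 0)} for p in products]


products = [{"id": "p1", "name": "A"}, {"id": "p2", "name": "B"},
            {"id": "p3", "name": "C"}]
facet_output = ["p2", 7, "p1", 3]  # p3 had no sales, so it is absent
print(merge_sales_counts(products, facet_output))
```

Products keep the order of the primary query, and any product missing from the facet output gets a count of 0.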