Re: Limit the documents for each shard in solr cloud
Hi,

Actually we are facing a lot of issues with Solr shards in our environment. It is fully loaded with around 150 million documents, where each document has around 50+ stored fields, many of them multi-valued. We also have a lot of custom components in this environment which use FieldCache and various other Solr features. The main issue we are facing is shards going down frequently in SolrCloud. As you mentioned in this reply, and as I have also observed in various other replies on memory issues, I will try to debug further and post here if I find anything in that process.

Thanks,
Jilani

On Thu, May 7, 2015 at 10:17 PM, Daniel Collins danwcoll...@gmail.com wrote:

Jilani, you did say "My team needs that option if at all possible", so my first response would be "why?". Why do they want to limit the number of documents per shard; what's the rationale/use case behind that requirement? Once we understand that, we can explain why it's a bad idea. :)

I suspect I'm re-iterating Jack's comments, but why are you sharding in the first place? 8 shards split across 4 machines gives 2 shards per machine. But you have 2 replicas of each shard, so you have 16 Solr cores, and hence 4 Solr cores per machine. Since you need an instance of all 8 shards to be up in order to service requests, you could get away with everything on 2 machines, but you would still have 8 Solr cores to manage in order to have a fully functioning system. What's the benefit of sharding in this scenario? Sharding adds complexity, so you normally only add it if your search times are too slow without it.

You need to work out how much disk space the whole 20m docs will take (maybe index 1m or 5m docs and extrapolate, if they are all roughly equivalent in size), then split that across 4 machines. But as Erick points out, you need to allow for merges to occur, so whatever the size of the static data set, you need to allow for double that from time to time while background merges are happening.
On 7 May 2015 at 16:05, Jack Krupansky jack.krupan...@gmail.com wrote:

A leader is also a replica - SolrCloud is not a master/slave architecture. Any replica can be elected leader, but that is only temporary and can change over time. You can place multiple shards on a single node, but was that really your intention? Generally, the number of nodes equals the number of shards times the replication factor, divided by the number of shards per node if you do place more than one shard per node.

-- Jack Krupansky

On Thu, May 7, 2015 at 1:29 AM, Jilani Shaik jilani24...@gmail.com wrote:

Hi,

Is it possible to restrict the number of documents per shard in SolrCloud? Let's say we have a SolrCloud with 4 nodes, and on each node we have one leader and one replica; likewise, in total we have 8 shards including replicas. Now I need to index my documents in such a way that each shard will have only 5 million documents, and the total in the SolrCloud should be 20 million documents.

Thanks,
Jilani
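Daniel's arithmetic above (8 shards x 2 replicas = 16 cores, and "index a sample, extrapolate, then double for merges") can be sketched as a quick back-of-the-envelope helper. The 3 GiB sample figure below is purely hypothetical:

```python
def cluster_layout(num_shards, replication_factor, num_machines):
    """Total Solr cores, and cores per machine, for a given layout."""
    total_cores = num_shards * replication_factor
    return total_cores, total_cores // num_machines

def disk_estimate(sample_docs, sample_bytes, total_docs):
    """Extrapolate total index size from a sample, doubled to leave
    headroom for background segment merges (as Erick/Daniel advise)."""
    per_doc = sample_bytes / sample_docs
    return per_doc * total_docs * 2

cores, per_machine = cluster_layout(8, 2, 4)
print(cores, per_machine)  # 16 4

# e.g. if a 1m-doc sample indexes to 3 GiB, 20m docs need roughly
# 120 GiB once merge headroom is included
print(disk_estimate(1_000_000, 3 * 2**30, 20_000_000) / 2**30)
```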
Re: Limit the documents for each shard in solr cloud
Hi Daniel,

Thanks for the detailed explanation. My understanding is similar to yours: we should not impose a limit on the number of documents a shard can index. That usually depends on the shard routing provided by Solr, and I am not expecting any change to the document routing process. My team wants that option if at all possible, so before saying it is not possible at the Solr end to limit the documents per shard, I just wanted to get confirmation or some details on this. That is why I posted the question here.

You mentioned "as long as it has sufficient space to do that" - how does Solr know or estimate whether it has sufficient space to index, on a particular shard or across the entire cloud?

Conclusion of my understanding: we will not be able to limit the documents per shard in SolrCloud, as Solr will accept documents as long as there is space to index them. Please confirm.

Thanks,
Jilani

On Thu, May 7, 2015 at 12:41 PM, Daniel Collins danwcoll...@gmail.com wrote:

Not sure I understand your problem. If you have 20m documents and 8 shards, then each shard is (broadly speaking) only going to have 2.5m docs, so I don't follow the 5m limit. That is with the default routing/hashing; obviously you can write your own hash algorithm, or shard at your application level.

In terms of limiting documents in a shard, I'm not sure what purpose that would serve. If for argument's sake you only had 2 shards and a limit of 5m docs per shard, what happens when you hit that limit? If you have indexed 10m docs and now try to index one more, what would you expect to happen? Would the system just reject new documents? Should it try to route to shard 1, see that it is full, and then fail over to shard 2 instead? (That's not going to work, as sharding needs to be reproducible and the document was intended for shard 1.) Solr's basic premise is to index what you give it, as long as it has sufficient space to do that.
If you want to limit your index to 20m docs, that is probably better done at the application layer (but I still don't really see why you would want to do that).

On 7 May 2015 at 06:29, Jilani Shaik jilani24...@gmail.com wrote:

Hi,

Is it possible to restrict the number of documents per shard in SolrCloud? Let's say we have a SolrCloud with 4 nodes, and on each node we have one leader and one replica; likewise, in total we have 8 shards including replicas. Now I need to index my documents in such a way that each shard will have only 5 million documents, and the total in the SolrCloud should be 20 million documents.

Thanks,
Jilani
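Daniel's point that sharding must be reproducible can be shown with a toy router. Solr's default compositeId router actually hashes the document id with MurmurHash3 onto a hash ring; the md5-mod-N sketch below is only an illustration of why routing has to be deterministic:

```python
import hashlib

def route(doc_id: str, num_shards: int) -> int:
    """Toy deterministic router: map a document id to a shard index.
    (Stand-in for Solr's compositeId router, which uses MurmurHash3.)"""
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:4], "big")
    return h % num_shards

# The same id always maps to the same shard. That is why a "full" shard
# cannot simply hand new documents to a neighbour: later lookups and
# updates by id would re-hash to the original shard and miss the document.
print(route("medl_24806189", 8) == route("medl_24806189", 8))  # True
```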
Limit the documents for each shard in solr cloud
Hi,

Is it possible to restrict the number of documents per shard in SolrCloud? Let's say we have a SolrCloud with 4 nodes, and on each node we have one leader and one replica; likewise, in total we have 8 shards including replicas. Now I need to index my documents in such a way that each shard will have only 5 million documents, and the total in the SolrCloud should be 20 million documents.

Thanks,
Jilani
Solr Cloud
Hi All,

Do we have any monitoring tools for Apache SolrCloud, similar to Apache Ambari, which is used for Hadoop clusters? Basically I am looking for a tool like Ambari that gives us various metrics as graphs and charts, along with deep details for each node in the cluster.

Thanks,
Jilani
Re: Solr Cloud
Thanks Shawn,

That gave me pointers to open-source options, and I am really interested in an open-source solution. I have basic knowledge of Ganglia and Nagios. I have gone through Sematext, and our company is already using New Relic in this space, but I am interested in an open-source one-stop tool similar to Ambari or Cloudera Manager. I would even be interested in contributing to one as a developer - is anyone working on such a monitoring tool for Apache Solr?

Thanks,
Jilani

On Mon, May 4, 2015 at 7:08 PM, Shawn Heisey apa...@elyograg.org wrote:

On 5/4/2015 6:16 AM, Jilani Shaik wrote: [...]

The most comprehensive and capable Solr monitoring available that I know of is a service provided by Sematext: http://sematext.com/

If you want something cheaper, you'll have to build it yourself with free tools. Some of the metrics available from Sematext can be duplicated by Xymon or Nagios; others can be duplicated by JavaMelody or another monitoring tool made specifically for Java programs. I have duplicated some of that information with tools that I wrote myself, like this status servlet: https://www.dropbox.com/s/gh6e47mu8sp7zkt/status-page-solr.png?dl=0

Nothing that I have built comes close to what Sematext provides, but if you want history from their SPM product on your servers that goes back more than half an hour, you will pay for it. Their prices are actually fairly reasonable for everything you get.

Thanks,
Shawn
suggest.Suggester - Loading stored lookup data failed
Hi,

When my Solr core is loading I get the error below. Even though it is only a WARN, I want to fix it. It says a file is missing - do we have any sample file for this? I did not find one even in the Apache Solr SVN.

2015-05-01 11:33:52,475 WARN suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /solr/Applications/shards/shard1/data/solr/cores/syslog/data/autocomplete/tst.dat (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:117)
    at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:849)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:583)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:264)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:256)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Please suggest what I should do to remove this warning from my logs.

Thanks,
Jilani
mlt handler not giving response in Solr Cloud
Hi,

When I execute an mlt handler query on a shard, I get results only if the document exists on the shard the query hits. In the scenario below I have cloud shards on localhost on ports 8181 and 8191, with documents distributed between them.

No result:
http://localhost:8181/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

Gives a result:
http://localhost:8191/solr/collectionName/mlt?q=id:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100

So distributed search is not working for the mlt handler (my assumption, please correct).

I also tried the following:
http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&shards.qt=/mlt&shards=localhost:8181/solr/,localhost:8191/solr/
http://localhost:8181/solr/collectionName/mlt?q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&shards.qt=/mlt&shards=localhost:8181/solr/collectionName/,localhost:8191/solr/collectionName/

I even tried the select handler with mlt=true, which also does not work:
http://localhost:8181/solr/collectionName/select?mlt=true&q=owui_p:medl_24806189&fq=segment:medl&fl=id,owui_p&rows=100&distrib=true&mlt.fl=ti_w

MLT configuration from solrconfig.xml:

<!-- MoreLikeThis request handler -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">ti_w</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">2</str>
    <str name="mlt.boost">true</str>
    <str name="shards">localhost:8181/solr/collectionName,localhost:8191/solr/collectionName</str>
    <str name="shards.qt">/mlt</str>
    <str name="mlt">true</str>
    <str name="echoParams">all</str>
  </lst>
</requestHandler>

Please let me know what is missing here to get results in SolrCloud.

Thanks,
Jilani
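Query strings like these are easy to mangle when built by hand; assembling them with a URL library keeps the '&' separators and escaping correct. A small sketch, reusing the host, collection, and field names from the message above:

```python
from urllib.parse import urlencode

def mlt_url(base: str, collection: str, **params) -> str:
    """Assemble a MoreLikeThis handler URL with properly escaped parameters."""
    return f"{base}/solr/{collection}/mlt?{urlencode(params)}"

url = mlt_url(
    "http://localhost:8181", "collectionName",
    q="id:medl_24806189", fq="segment:medl", fl="id,owui_p", rows=100,
)
print(url)
```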
Re: mlt handler not giving response in Solr Cloud
Please help me with this issue. Please provide suggestions on what is missing to get a response from multiple Solr shards in the cloud.

On Tue, Nov 18, 2014 at 1:40 PM, Jilani Shaik jilani24...@gmail.com wrote: [...]

Thanks,
Jilani
Getting huge difference in QTime for terms.lower and terms.prefix
Hi,

When I query the terms component with terms.prefix, the QTime is 100 milliseconds, whereas the same query with terms.lower gives a QTime of 500 milliseconds. I am using SolrCloud, and in both cases terms.limit is 60 and terms.sort=index.

Query 1 params: terms.fl=field_Name&terms.limit=60&terms.prefix=b&wt=json&terms.sort=index&shard.keys=shard_key
QTime: 100 milliseconds

Query 2 params: terms.fl=field_Name&terms.limit=60&terms.lower=b&wt=json&terms.sort=index&shard.keys=shard_key
QTime: 500 milliseconds

The response gives the same terms for both queries, but the QTime is different. Please let me know why the QTime differs between the two approaches.

Thanks,
Jilani
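A plausible reason for the gap, shown as a toy illustration rather than Solr's actual code: terms.prefix can stop enumerating the term dictionary as soon as terms stop matching the prefix, while terms.lower only fixes the starting point and keeps walking until the limit is hit, which can touch far more terms:

```python
import bisect

terms = sorted(["apple", "banana", "band", "bank", "bar", "cat", "dog"])

def by_prefix(terms, prefix, limit):
    """Enumerate terms starting with `prefix`; stops once terms no longer match."""
    i = bisect.bisect_left(terms, prefix)
    out = []
    while i < len(terms) and len(out) < limit and terms[i].startswith(prefix):
        out.append(terms[i])
        i += 1
    return out

def by_lower(terms, lower, limit):
    """Enumerate terms >= `lower`; only the limit stops the walk."""
    i = bisect.bisect_left(terms, lower)
    return terms[i:i + limit]

print(by_prefix(terms, "ba", 60))  # ['banana', 'band', 'bank', 'bar']
print(by_lower(terms, "b", 60))    # everything from 'banana' to the end
```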
Re: Getting huge difference in QTime for terms.lower and terms.prefix
Please provide suggestions on what could be the reason for this.

Thanks,

On Thu, Apr 10, 2014 at 2:54 PM, Jilani Shaik jilani24...@gmail.com wrote: [...]
Re: Filter in terms component
Will it work for multi-valued fields? It gives an error saying that FieldCache will not work for multi-valued fields, and most of the data in the index is in multi-valued fields.

Thanks,
Jilani

On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

If you just need counts, maybe you can make use of http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

Ahmet

On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik jilani24...@gmail.com wrote: [...]
Re: Filter in terms component
Hi,

Please provide some more pointers on how to go about addressing this.

Thanks,
Jilani

On Thu, Mar 20, 2014 at 8:50 PM, Jilani Shaik jilani24...@gmail.com wrote: [...]
Filter in terms component
Hi,

I have a huge index and am using Solr. I need the terms component with a filter by a field. Please let me know if there is any way to get this, and please provide some pointers, even for developing it myself by going through Lucene.

Thanks,
Jilani
Re: Filter in terms component
Hi Ahmet,

I have gone through the facet component. Our application has 300+ million docs, faceting is very time-consuming with that component, and it also uses cache. So I looked at the terms component, where Solr reads the index for field terms. Is there any approach by which I can get the terms using a filter, so that I can restrict which documents' terms are included in the counts? Basically we have a set of documents, and we want to show the term counts based on those filters with a set name, instead of reading the entire index. Please let me know if you need any details in order to provide more pointers.

Thanks,
Jilani

On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Jilani,

What features of the terms component are you after? If it is just terms.prefix, it can be simulated with the facet component using the facet.prefix parameter; the faceting component respects filter queries.

On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik jilani24...@gmail.com wrote: [...]
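Ahmet's distinction can be sketched with toy data: facet-style counting respects a filter query, while terms-component-style counting reads the raw term statistics for the whole index. These are hypothetical helper functions, not Solr APIs:

```python
from collections import Counter

docs = [
    {"id": 1, "set": "A", "tags": ["x", "y"]},
    {"id": 2, "set": "A", "tags": ["x"]},
    {"id": 3, "set": "B", "tags": ["x", "z"]},
]

def terms_counts(docs, field):
    """Terms-component style: counts over the whole index, no filtering."""
    return Counter(t for d in docs for t in d[field])

def facet_counts(docs, field, fq):
    """Facet-component style: counts restricted to docs matching the filter."""
    return Counter(t for d in docs if fq(d) for t in d[field])

print(terms_counts(docs, "tags"))                             # x:3, y:1, z:1
print(facet_counts(docs, "tags", lambda d: d["set"] == "A"))  # x:2, y:1
```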
Get the query result from one collection and send it to the other collection for merging the result sets
Hi,

We have two categories of data: one collection holds the primary data (for example, products) and the other collection (which could be spread across shards) holds the transaction data (for example, product sales data). We have a search scenario where we need to show the products along with the number of sales for each product. For this we need to do a facet-based search on the second collection, and that result then has to be shown together with the primary data. Is there any way to handle this kind of scenario? Please suggest any other approaches to get the desired result.

Thank you,
Jilani
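One common pattern, absent a cross-collection join, is to do the merge in the application: query the products collection, run a facet on the product id field against the sales collection, and stitch the two responses together. A sketch with plain dicts standing in for the two Solr responses (all names hypothetical):

```python
def merge_results(products, sales_facets):
    """Attach per-product sales counts (from a facet on the sales
    collection) to the product documents from the primary collection."""
    return [{**p, "sales": sales_facets.get(p["id"], 0)} for p in products]

# Stand-ins for the two Solr responses:
products = [{"id": "p1", "name": "widget"}, {"id": "p2", "name": "gadget"}]
sales_facets = {"p1": 42}  # e.g. facet.field=product_id counts from the sales collection

merged = merge_results(products, sales_facets)
print(merged)  # widget carries 42 sales, gadget defaults to 0
```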