Re: Solr Cloud and Multiple Indexes
Just to give you some context: I am collecting data from different sources (such as articles, videos, etc.), and I will also be enriching the data, for example with entity extraction. In my previous experiment with Solr, I dumped all the article and video metadata into a single index (distributed across multiple shards), and that made every query very slow. So for entity extraction, I created another index on the same shards and pushed the entities there. This made querying entities very quick, as there was very little data in that index (even though it resided on the same machine as the main index).

Based on that quick experiment, I am wondering whether I need a different approach for my data. For example, instead of relying only on Solr Cloud to distribute my data across shards, why not create a separate index for each type of data I have, such as articles and videos, and then perform some sort of distributed search over them? Would that be better in some sense, such as performance?

On Fri, Nov 6, 2015 at 7:26 AM, Modassar Ather wrote:
> Which version of Solr are you using?

Currently, I am using Solr 5.3. BTW, I could not find the segment info link. Is it under Core Admin?

Regards,
Salman
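The per-type layout asked about here (one collection for articles, one for videos, and a distributed search over them) can be sketched as a single SolrCloud request that names several collections in the `collection` parameter. This is a minimal sketch, assuming the collections have compatible schemas; the base URL `http://localhost:8983/solr` and the collection names `articles` and `videos` are hypothetical placeholders, and the helper only builds the URL rather than sending it.

```python
from urllib.parse import urlencode


def multi_collection_select_url(base_url, collections, query, rows=10):
    """Build a /select URL that searches several SolrCloud collections at once.

    Hypothetical helper: the request is addressed to the first collection,
    and the comma-separated `collection` parameter asks SolrCloud to fan the
    query out to the shards of every listed collection and merge the results.
    """
    params = {
        "q": query,
        "collection": ",".join(collections),
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collections[0]}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and collection names; adjust to the real cluster.
    print(multi_collection_select_url(
        "http://localhost:8983/solr", ["articles", "videos"], "title:solr"))
```

One design caveat: by default each shard scores documents with its own term statistics, so scores merged from collections with very different contents may not be directly comparable.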
Re: Solr Cloud and Multiple Indexes
As per my understanding, if the data being indexed is completely different and does not fit into the same schema, it can be segregated into separate indexes. But if it fits into the same schema, it is better to keep it in the same index, and if the index size grows, switch to SolrCloud, as it has lots of benefits. Ours is a 12-shard cluster with around 100 GB of index on each shard, and simple query responses are very fast.

On Sun, Nov 8, 2015 at 3:07 PM, Salman Ansari wrote:
> Currently, I am using Solr 5.3. BTW, I could not find the segment info link. Is it under Core Admin?

Select your core on the dashboard. The last link is the segment info link.

Regards,
Modassar
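Keeping different content types in one collection when they fit the same schema, as suggested in the reply above, usually means tagging each document with its type and narrowing searches with a filter query. A minimal sketch, assuming a hypothetical `type` field with values such as `article`, `video` or `entity` (the field name is an illustration, not something from this thread):

```python
from urllib.parse import urlencode


def typed_select_params(query, doc_type=None):
    """Query string for one shared collection that holds every content type.

    Assumes a hypothetical `type` field on every document. The restriction
    goes into `fq` rather than `q`: filter queries do not affect scoring, and
    Solr can cache their results separately from the main query.
    """
    params = [("q", query), ("wt", "json")]
    if doc_type is not None:
        params.append(("fq", f"type:{doc_type}"))
    return urlencode(params)


if __name__ == "__main__":
    print(typed_select_params("title:solr", "video"))
```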
Solr Cloud and Multiple Indexes
Hi,

I am using Solr Cloud, and I have created a single index that hosts around 70M documents distributed across 2 shards (each holding 35M documents) with 2 replicas. The queries are very slow to run, so I was thinking of splitting the index into multiple indexes and then using distributed search. Can anyone point me to some sources (articles) that discuss this in Solr Cloud?

I would appreciate your feedback on this.

Regards,
Salman
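For reference, the layout described here (one collection, 2 shards, 2 replicas per shard) corresponds to a Collections API CREATE call along these lines. This sketch only builds the URL; the host, the collection name `content` and the config set name `content_conf` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Collections API CREATE URL for a SolrCloud collection.

    With the default router, documents are hashed across `num_shards`
    shards and every shard gets `replication_factor` copies; a query goes
    to one copy of each shard and the partial results are merged.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # 70M documents over 2 shards leaves roughly 35M documents per shard.
    print(create_collection_url("http://localhost:8983/solr", "content", 2, 2, "content_conf"))
```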
Re: Solr Cloud and Multiple Indexes
What is your index size? How much memory is used? What type of queries are slow?
Are there GC pauses, as they can be a cause of slowness?
Are document updates/additions happening in parallel?

On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari wrote:
> The queries are very slow to run, so I was thinking of splitting the index into multiple indexes and then using distributed search. Can anyone point me to some sources (articles) that discuss this in Solr Cloud?

This is what you are already doing. Did you mean that you want to add more shards?

Regards,
Modassar
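When looking at which queries are slow, one way to separate time spent inside Solr from everything else is the `QTime` value (in milliseconds) that Solr puts in the response header. A minimal sketch that reads it from a JSON response; the sample response is made up for illustration and is not a measurement from this thread.

```python
import json


def qtime_ms(response_text):
    """Return the server-side query time from a Solr JSON response.

    QTime is measured inside Solr and excludes writing the response and the
    network round trip, so a large gap between QTime and the latency the
    client observes points outside query execution.
    """
    return int(json.loads(response_text)["responseHeader"]["QTime"])


if __name__ == "__main__":
    # Illustrative response body, not real output from this cluster.
    sample = '{"responseHeader": {"status": 0, "QTime": 12}, "response": {"numFound": 0, "start": 0, "docs": []}}'
    print(qtime_ms(sample))
```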
Re: Solr Cloud and Multiple Indexes
Here is the current info.

On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather wrote:
> How much memory is used?

Physical memory consumption: 5.48 GB out of 14 GB.
Swap space consumption: 5.83 GB out of 15.94 GB.
JVM-Memory consumption: 1.58 GB out of 3.83 GB.

> What is your index size?

I have around 70M documents distributed across 2 shards (so each shard has 35M documents).

> What type of queries are slow?

I am running normal queries (queries on a field); no faceting or highlighting is requested. Currently, I am facing delays of 2-3 seconds, but previously I had delays of around 28 seconds.

> Are there GC pauses, as they can be a cause of slowness?

I doubt this, as the slowness was happening for a long period of time.

> Are document updates/additions happening in parallel?

No, I have stopped adding/updating documents and am only running queries.

> This is what you are already doing. Did you mean that you want to add more shards?

No, what I meant is that I read that there used to be a way to split a large index into multiple indexes and then run a distributed search over them, as in this article: https://wiki.apache.org/solr/DistributedSearch. I was looking for how this is handled in Solr Cloud.

Regards,
Salman
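The older approach referred to here, described on the DistributedSearch wiki page, has the client list every shard explicitly in a `shards` parameter, whereas in SolrCloud the cluster state kept in ZooKeeper tells Solr which shards make up a collection, so a plain query is already distributed. A minimal sketch contrasting the two query strings; the host names and core paths are hypothetical.

```python
from urllib.parse import urlencode


def legacy_distributed_params(query, shard_urls):
    """Pre-SolrCloud style: the client lists every shard explicitly.

    Each entry is host:port/path with no scheme, as on the DistributedSearch
    wiki page; the receiving Solr fans the query out and merges the results.
    """
    return urlencode({"q": query, "shards": ",".join(shard_urls)})


def solrcloud_params(query):
    """SolrCloud style: no shard list, since the cluster state already says
    which shards and replicas belong to the collection."""
    return urlencode({"q": query})


if __name__ == "__main__":
    print(legacy_distributed_params("ipod", ["host1:8983/solr/core1", "host2:8983/solr/core2"]))
    print(solrcloud_params("ipod"))
```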
Re: Solr Cloud and Multiple Indexes
Thanks for your response. I have already gone through those documents. My point was: if I am using Solr Cloud, is adding shards the only way to distribute my indexes, and I don't have to do anything manually (because all the distributed search is handled by Solr Cloud)?

On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather wrote:
> What Xms and Xmx are you allocating to Solr, and how much of it does Solr use at most?

Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB.

> How many segments are there in the index? The more segments, the slower the search.

How do I check how many segments there are in the index?

> Is this after you moved to SolrCloud?

I have been using SolrCloud from the beginning.

Regards,
Salman
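In SolrCloud, adding shards to an existing collection that uses the default hash-based routing is, as far as I know, done by splitting an existing shard with the Collections API SPLITSHARD action. This sketch only builds the request URL; the host and the collection name `content` are hypothetical, and `shard1` follows SolrCloud's default shard naming.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Collections API SPLITSHARD URL.

    The parent shard's hash range is divided between two new sub-shards
    (e.g. shard1_0 and shard1_1); once the split finishes, the parent is
    marked inactive and stops serving queries.
    """
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(split_shard_url("http://localhost:8983/solr", "content", "shard1"))
```

The new sub-shards start out on the same node as the parent, so spreading the load over more machines still means placing replicas of them on other nodes (for example with the ADDREPLICA action) afterwards.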
Re: Solr Cloud and Multiple Indexes
SolrCloud makes distributed search easier. You can find details about it at the following link:
https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works

You can also refer to the following link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

By the size of your index I meant the index size on disk, not just the total number of documents.
How many segments are there in the index? The more segments, the slower the search.
What Xms and Xmx are you allocating to Solr, and how much of it does Solr use at most?

On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari wrote:
> I doubt this, as the slowness was happening for a long period of time.

I mentioned this point because I have seen GC pauses of 30 seconds and more in some complex queries.

> Currently, I am facing delays of 2-3 seconds, but previously I had delays of around 28 seconds.

Is this after you moved to SolrCloud?

Regards,
Modassar
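Besides the admin UI, the per-core segment count raised here can be read programmatically. A minimal sketch, assuming the JSON comes from the Luke request handler (`/solr/<core>/admin/luke?numTerms=0&wt=json`) and that its `index` section carries a `segmentCount` entry; check the field name against the response your version actually returns. Each shard replica is its own core, so the count is per core. The sample response is illustrative only.

```python
import json


def segment_count_from_luke(response_text):
    """Extract the segment count from a Luke handler JSON response.

    Assumption: the response has an "index" section with "segmentCount".
    Run it once per core, since every shard replica keeps its own segments.
    """
    index_info = json.loads(response_text)["index"]
    return int(index_info["segmentCount"])


if __name__ == "__main__":
    # Illustrative response body, not real output from this cluster.
    sample = '{"responseHeader": {"status": 0, "QTime": 3}, "index": {"numDocs": 100, "maxDoc": 120, "deletedDocs": 20, "segmentCount": 7}}'
    print(segment_count_from_luke(sample))
```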
Re: Solr Cloud and Multiple Indexes
On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari wrote:
> Thanks for your response. I have already gone through those documents. My point was: if I am using Solr Cloud, is adding shards the only way to distribute my indexes, and I don't have to do anything manually (because all the distributed search is handled by Solr Cloud)?

Yes, as per my knowledge.

> How do I check how many segments there are in the index?

You can look into the index folder manually. Which version of Solr are you using? I don't remember exactly which version introduced it, but in the latest release and in Solr 5.2.1 there is a "Segments info" link where you can see the number of segments.

Regards,
Modassar
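Looking into the index folder manually, as suggested here, can be scripted. A minimal sketch, assuming the Lucene 4.0+ file layout used by Solr 5.x, where every segment has exactly one `.si` segment-info file (for example `_0.si`) stored outside any compound `.cfs` file; the directory to scan would be a core's `data/index` folder. Files from merged-away segments can linger until Lucene deletes them, so treat the result as an approximation of what the "Segments info" page reports.

```python
from pathlib import Path


def count_segments_in_listing(file_names):
    """Count segments in a list of index file names (one .si file per segment)."""
    return sum(1 for name in file_names if Path(name).suffix == ".si")


def count_segments(index_dir):
    """Count segments by listing a core's index directory on disk."""
    return count_segments_in_listing(p.name for p in Path(index_dir).iterdir())


if __name__ == "__main__":
    # Example listing; with a real core, call count_segments("<core>/data/index").
    print(count_segments_in_listing(
        ["segments_5", "_0.si", "_0.cfs", "_0.cfe", "_1.si", "_1.cfs", "write.lock"]))
```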