Re: Data indexing is going too slow on single shard Why?
Okay. Thanks, Shawn.

On Thu, Mar 26, 2015 at 12:25 PM, Shawn Heisey apa...@elyograg.org wrote:

> On 3/26/2015 12:03 AM, Nitin Solanki wrote:
>> Great, thanks Shawn. As you said: **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferable, and with that much memory searching will be fast and never slow down. Is that right?
>
> Obviously I cannot guarantee it, but I think it's extremely likely that with that much memory, performance will be very good.
>
> One other possibility, which is discussed on that wiki page I linked, is that your Java heap is being almost exhausted and large amounts of time are spent in garbage collection. If you increase the heap from 4GB to 5GB and see performance get better, then that would be confirmed. There would be less memory available for caching, but constant garbage collection would be a much greater problem than the disk cache being too small.
>
> Thanks,
> Shawn
Re: Data indexing is going too slow on single shard Why?
Great, thanks Shawn.

As you said: **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferable, and with that much memory searching will be fast and never slow down. Is that right?

On Wed, Mar 25, 2015 at 9:50 PM, Shawn Heisey apa...@elyograg.org wrote:

> On 3/25/2015 8:42 AM, Nitin Solanki wrote:
>> Server configuration:
>> 8 CPUs.
>> 32 GB RAM
>> O.S. - Linux
>
> [snip]
>
>> are running. Java heap set to 4096 MB in Solr. While indexing,
>
> [snip]
>
>> *Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
>> Data Size:
>> 102G solr/node1/solr/wikingram_shard1_replica2
>> 102G solr/node2/solr/wikingram_shard1_replica1
>
> If both of those are on the same machine, I'm guessing that you're running two Solr instances on that machine, so there's 8GB of RAM used for Java. That means you have about 24 GB of RAM left for caching ... and 200GB of index data to cache.
>
> 24GB is not enough to cache 200GB of index. If there is only one Solr instance (leaving 28GB for caching) with 102GB of data on the machine, it still might not be enough. See that SolrPerformanceProblems wiki page I linked in my earlier email.
>
> For 102GB of data per server, I recommend at least 64GB of total RAM, preferably 128GB.
>
> For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB.
>
> Thanks,
> Shawn
Re: Data indexing is going too slow on single shard Why?
On 3/26/2015 12:03 AM, Nitin Solanki wrote:
> Great, thanks Shawn. As you said: **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferable, and with that much memory searching will be fast and never slow down. Is that right?

Obviously I cannot guarantee it, but I think it's extremely likely that with that much memory, performance will be very good.

One other possibility, which is discussed on that wiki page I linked, is that your Java heap is being almost exhausted and large amounts of time are spent in garbage collection. If you increase the heap from 4GB to 5GB and see performance get better, then that would be confirmed. There would be less memory available for caching, but constant garbage collection would be a much greater problem than the disk cache being too small.

Thanks,
Shawn
Re: Data indexing is going too slow on single shard Why?
On 3/25/2015 5:03 AM, Nitin Solanki wrote:
> Please can anyone assist me? I am indexing on a single shard and it is taking too much time to index data. I am indexing around 49GB of data on a single shard. What's wrong? Why is Solr taking so much time to index data? Earlier I was indexing the same data on 8 shards, and that was fast compared to a single shard. Why so? Any help please.

There's practically no information to go on here, so about all I can offer is general information in return:

http://wiki.apache.org/solr/SolrPerformanceProblems

I looked over the previous messages that you have sent the list, and I can find very little of the required information about your index. I see a lot of questions from you, but they did not include the kind of details needed here:

How much total RAM is in each Solr server? Are there any other programs on the server with significant RAM requirements? An example of such a program would be a database server. On each server, how much memory is dedicated to the Java heap(s) for Solr?

I gather from other questions that you are running SolrCloud, can you confirm? On a per-server basis, how much disk space do all the index replicas take? How many documents are on each server? Note that for disk space and number of documents, I am asking you to count every replica, not take the total in the collection and divide it by the number of servers.

How are you doing your indexing? For this question, I am asking what program or Solr API is actually sending the data to Solr. Possible answers include the dataimport handler, a SolrJ program, one of the other Solr APIs such as a PHP client, and hand-crafted URLs with an HTTP client.

Thanks,
Shawn
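The per-server disk-space question above (count every replica, do not divide the collection total by the number of servers) can be answered with a short script. This is only a sketch: the function names are made up for illustration, and the replica paths in the usage block are copied from later in the thread and assumed to be relative to the current directory.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def per_server_index_gb(replica_dirs):
    """Total index size on one server in GiB, counting every replica directory."""
    return sum(dir_size_bytes(d) for d in replica_dirs) / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical locations: adjust to wherever the replicas live on the server.
    replicas = [
        "solr/node1/solr/wikingram_shard1_replica2",
        "solr/node2/solr/wikingram_shard1_replica1",
    ]
    print(f"index data on this server: {per_server_index_gb(replicas):.1f} GiB")
```

Directories that do not exist contribute zero, so a wrong path shows up as an unexpectedly small total rather than an error.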
Re: Data indexing is going too slow on single shard Why?
Hello,

*Updating my question again.*

Please can anyone assist me? I am indexing on a single shard and it is taking too much time to index data. I am indexing around 49GB of data on a single shard. What's wrong? Why is Solr taking so much time to index data? Earlier I was indexing the same data on 8 shards, and that was fast compared to a single shard. Why so? Any help please.

*HardCommit - 15 sec*
*SoftCommit - 10 min.*

ii) Searching for a query/term is also taking too much time. Any help on this also.

On Wed, Mar 25, 2015 at 4:33 PM, Nitin Solanki nitinml...@gmail.com wrote:

> Hello,
>
> Please can anyone assist me? I am indexing on a single shard and it is taking too much time to index data. I am indexing around 49GB of data on a single shard. What's wrong? Why is Solr taking so much time to index data? Earlier I was indexing the same data on 8 shards, and that was fast compared to a single shard. Why so? Any help please.
>
> *HardCommit - 15 sec*
> *SoftCommit - 10 min.*
>
> Best,
> Nitin
Re: Data indexing is going too slow on single shard Why?
Hi Shawn,

Sorry for all the things.

Server configuration:
8 CPUs.
32 GB RAM
O.S. - Linux

*Earlier*, I was using 8 shards without replicas (the default is 1) using SOLR CLOUD. Only Solr is running on the server. No other applications are running. Java heap set to 4096 MB in Solr. While indexing, Solr (sometimes) eats up the whole RAM. I don't know how much RAM each Solr server takes. Each server was taking around 50 GB of (indexed) data. Actually, I have deleted the previous Solr setup, so I have no idea how many documents were on each shard, and I also don't know the total number of documents.

*Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
Data Size:
102G solr/node1/solr/wikingram_shard1_replica2
102G solr/node2/solr/wikingram_shard1_replica1

I am running a Python script to index data using the Solr REST API, committing 2 documents each time.

If I missed anything related to Solr, please let me know.

Thanks Shawn. Waiting for your reply.

On Wed, Mar 25, 2015 at 7:33 PM, Shawn Heisey apa...@elyograg.org wrote:

> On 3/25/2015 5:03 AM, Nitin Solanki wrote:
>> Please can anyone assist me? I am indexing on a single shard and it is taking too much time to index data. I am indexing around 49GB of data on a single shard. What's wrong? Why is Solr taking so much time to index data? Earlier I was indexing the same data on 8 shards, and that was fast compared to a single shard. Why so? Any help please.
>
> There's practically no information to go on here, so about all I can offer is general information in return:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> I looked over the previous messages that you have sent the list, and I can find very little of the required information about your index. I see a lot of questions from you, but they did not include the kind of details needed here:
>
> How much total RAM is in each Solr server? Are there any other programs on the server with significant RAM requirements? An example of such a program would be a database server. On each server, how much memory is dedicated to the Java heap(s) for Solr?
>
> I gather from other questions that you are running SolrCloud, can you confirm? On a per-server basis, how much disk space do all the index replicas take? How many documents are on each server? Note that for disk space and number of documents, I am asking you to count every replica, not take the total in the collection and divide it by the number of servers.
>
> How are you doing your indexing? For this question, I am asking what program or Solr API is actually sending the data to Solr. Possible answers include the dataimport handler, a SolrJ program, one of the other Solr APIs such as a PHP client, and hand-crafted URLs with an HTTP client.
>
> Thanks,
> Shawn
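For the Python-script indexing described in this message, one thing that may be worth trying is sending larger batches and leaving commits to the configured hard and soft commit intervals instead of committing every 2 documents. Nobody in the thread confirms that this is the cause of the slowness, so treat the following as a sketch under assumptions: the base URL, the collection name `wikingram` (guessed from the replica directory names), the batch size of 1000 and both function names are illustrative only. It builds the HTTP requests for Solr's JSON `/update` handler without sending them.

```python
import json
import urllib.request


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_update_request(base_url, collection, batch):
    """Build (but do not send) a POST of a JSON array of docs to /update.

    No commit parameter is added; commits are left to the server-side
    hard/soft commit settings.
    """
    url = f"{base_url}/solr/{collection}/update"
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical documents and endpoint, for illustration only.
    docs = ({"id": str(i), "text": f"ngram {i}"} for i in range(2500))
    for batch in batches(docs, 1000):
        req = build_update_request("http://localhost:8983", "wikingram", batch)
        print(req.get_method(), req.full_url, len(batch), "docs")
        # urllib.request.urlopen(req)  # uncomment to actually send
```

Timing a run with a batch size of 2 against a run with a larger batch would show whether per-request overhead is a meaningful part of the indexing time.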
Re: Data indexing is going too slow on single shard Why?
On 3/25/2015 8:42 AM, Nitin Solanki wrote:
> Server configuration:
> 8 CPUs.
> 32 GB RAM
> O.S. - Linux

[snip]

> are running. Java heap set to 4096 MB in Solr. While indexing,

[snip]

> *Currently*, I have 1 shard with 2 replicas using SOLR CLOUD.
> Data Size:
> 102G solr/node1/solr/wikingram_shard1_replica2
> 102G solr/node2/solr/wikingram_shard1_replica1

If both of those are on the same machine, I'm guessing that you're running two Solr instances on that machine, so there's 8GB of RAM used for Java. That means you have about 24 GB of RAM left for caching ... and 200GB of index data to cache.

24GB is not enough to cache 200GB of index. If there is only one Solr instance (leaving 28GB for caching) with 102GB of data on the machine, it still might not be enough. See that SolrPerformanceProblems wiki page I linked in my earlier email.

For 102GB of data per server, I recommend at least 64GB of total RAM, preferably 128GB.

For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB.

Thanks,
Shawn
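The arithmetic in this reply can be written out as a small calculation. It is a rough sketch: the function name is invented for illustration, and it assumes the only significant memory consumers are the Solr heaps, ignoring the operating system and JVM overhead beyond the heap.

```python
def cache_headroom(total_ram_gb, heap_gb_per_instance, solr_instances, index_gb):
    """Return (GB of RAM left for the OS disk cache, fraction of the index that fits)."""
    heap_total_gb = heap_gb_per_instance * solr_instances
    cache_gb = total_ram_gb - heap_total_gb
    return cache_gb, cache_gb / index_gb


if __name__ == "__main__":
    # Two Solr instances with 4 GB heap each on a 32 GB machine, 2 x 102 GB of index.
    cache_gb, fraction = cache_headroom(32, 4, 2, 204)
    print(f"two instances: {cache_gb} GB cache, {fraction:.0%} of the index fits")

    # One Solr instance with 102 GB of index on the same machine.
    cache_gb, fraction = cache_headroom(32, 4, 1, 102)
    print(f"one instance:  {cache_gb} GB cache, {fraction:.0%} of the index fits")
```

With the numbers from the thread this gives 24 GB of cache for 204 GB of index (about 12%) in the two-instance case and 28 GB for 102 GB (about 27%) in the single-instance case, which is why neither is expected to be enough.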
Data indexing is going too slow on single shard Why?
Hello,

Please can anyone assist me? I am indexing on a single shard and it is taking too much time to index data. I am indexing around 49GB of data on a single shard. What's wrong? Why is Solr taking so much time to index data? Earlier I was indexing the same data on 8 shards, and that was fast compared to a single shard. Why so? Any help please.

*HardCommit - 15 sec*
*SoftCommit - 10 min.*

Best,
Nitin