Re: Data indexing is going too slow on single shard Why?

2015-03-27 Thread Nitin Solanki
Okay. Thanks, Shawn.





Re: Data indexing is going too slow on single shard Why?

2015-03-26 Thread Nitin Solanki
Great, thanks Shawn.
As you said: **For 204GB of data per server, I recommend at least 128GB
of total RAM, preferably 256GB**. So if I have 204GB of data on a single
server/shard, I should prefer 256GB of RAM, so that searching will be fast
and never slow down. Is that right?





Re: Data indexing is going too slow on single shard Why?

2015-03-26 Thread Shawn Heisey
On 3/26/2015 12:03 AM, Nitin Solanki wrote:
 Great thanks Shawn...
 As you said -  **For 204GB of data per server, I recommend at least 128GB
 of total RAM,
 preferably 256GB**. Therefore, if I have 204GB of data on single
 server/shard then I prefer is 256GB by which searching will be fast and
 never slow down. Is it?

Obviously I cannot guarantee it, but I think it's extremely likely that
with that much memory, performance will be very good.

One other possibility, which is discussed on that wiki page I linked, is
that your java heap is being almost exhausted and large amounts of time
are spent in garbage collection.  If you increase the heap from 4GB to
5GB and see performance get better, then that would be confirmed.  There
would be less memory available for caching, but constant garbage
collection would be a much greater problem than the disk cache being too
small.

Thanks,
Shawn
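
One way to check the garbage collection theory before changing the heap is to compare the JVM's cumulative GC time with its uptime. The sketch below is only an illustration: it assumes the output of `jstat -gcutil -t <pid>` (a tool that ships with the JDK) has been captured as text, and the sample numbers are invented for the example, not measurements from this thread.

```python
def gc_time_fraction(jstat_output):
    """Fraction of JVM uptime spent in GC, from `jstat -gcutil -t <pid>` text.

    Columns are looked up by header name, so the exact set of columns
    (which differs between Java versions) does not matter.
    """
    rows = [line.split() for line in jstat_output.strip().splitlines() if line.strip()]
    header, last = rows[0], rows[-1]
    sample = dict(zip(header, last))
    # Timestamp: seconds since JVM start. GCT: total GC time in seconds.
    return float(sample["GCT"]) / float(sample["Timestamp"])


# Hypothetical sample, made up for illustration only.
SAMPLE = """
Timestamp  S0    S1     E      O      M      CCS    YGC   YGCT     FGC  FGCT      GCT
3600.0     0.00  97.02  70.31  98.50  95.17  90.14  5230  410.512  310  1029.488  1440.000
"""

print(f"{gc_time_fraction(SAMPLE):.0%} of uptime spent in garbage collection")
```

A large fraction together with an old generation (the `O` column) that stays close to 100% would be consistent with a nearly exhausted heap; a small fraction would point back at the disk cache.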



Re: Data indexing is going too slow on single shard Why?

2015-03-25 Thread Shawn Heisey
On 3/25/2015 5:03 AM, Nitin Solanki wrote:
 Please can anyone assist me? I am indexing on single shard it
 is taking too much of time to index data. And I am indexing around 49GB of
 data on single shard. What's wrong? Why solr is taking too much time to
 index data?
 Earlier I was indexing same data on 8 shards. That time, it was fast as
 compared to single shard. Why so? Any help please..

There's practically no information to go on here, so about all I can
offer is general information in return:

http://wiki.apache.org/solr/SolrPerformanceProblems

I looked over the previous messages that you have sent the list, and I
can find very little of the required information about your index.  I
see a lot of questions from you, but they did not include the kind of
details needed here:

How much total RAM is in each Solr server?  Are there any other programs
on the server with significant RAM requirements?  An example of such a
program would be a database server.  On each server, how much memory is
dedicated to the java heap(s) for Solr?  I gather from other questions
that you are running SolrCloud, can you confirm?

On a per-server basis, how much disk space do all the index replicas
take?  How many documents are on each server?  Note that for disk space
and number of documents, I am asking you to count every replica, not
take the total in the collection and divide it by the number of servers.

How are you doing your indexing?  For this question, I am asking what
program or Solr API is actually sending the data to Solr.  Possible
answers include the dataimport handler, a SolrJ program, one of the
other Solr APIs such as a PHP client, and hand-crafted URLs with an HTTP
client.

Thanks,
Shawn



Re: Data indexing is going too slow on single shard Why?

2015-03-25 Thread Nitin Solanki
Hello,
*Updating my question again.*
Can anyone please assist me? I am indexing on a single shard and it is
taking too much time to index the data. I am indexing around 49GB of
data on that single shard. What's wrong? Why is Solr taking so much time
to index the data?
Earlier I was indexing the same data on 8 shards, and it was fast
compared to the single shard. Why is that? Any help, please.


*HardCommit - 15 sec*
*SoftCommit - 10 min.*

ii) Searching for a query/term is also taking too much time. Any help
with this as well?






Re: Data indexing is going too slow on single shard Why?

2015-03-25 Thread Nitin Solanki
Hi Shawn,
  Sorry for all the trouble.

Server configuration:
8 CPUs.
32 GB RAM
O.S. - Linux
*Earlier*, I was using 8 shards without replicas (default is 1) using
SolrCloud. Only Solr is running on the server; no other applications
are running.  Java heap set to 4096 MB in Solr.  While indexing,
Solr (sometimes) eats up the whole RAM. I don't know how each Solr server
uses RAM. Each server was holding around 50 GB of indexed data. I have
since deleted the previous Solr setup, so I don't know how many documents
were on each shard, and I don't know the total number of documents either.

*Currently*, I have 1 shard with 2 replicas using SolrCloud.
Data Size:
102G    solr/node1/solr/wikingram_shard1_replica2
102G    solr/node2/solr/wikingram_shard1_replica1

I am running a Python script to index data using the Solr REST API,
committing 2 documents each time.
If I have missed anything related to Solr, please let me know.
Thanks, Shawn. Waiting for your reply.
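
For reference, a script of the kind described above might look roughly like the following. This is a minimal sketch, not the actual script from this thread: the base URL, the collection name `wikingram` (guessed from the core names), the field names, and the batch size are all assumptions. It groups documents into batches and builds one JSON update request per batch without sending it, and it passes no explicit commit parameter, leaving commits to the hard/soft commit settings configured on the server.

```python
import json
from itertools import islice
from urllib.request import Request


def batches(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(base_url, collection, docs):
    """Build (but do not send) a JSON update request carrying one batch."""
    url = f"{base_url}/{collection}/update"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical documents; real field names depend on the schema.
docs = ({"id": str(i), "ngram": f"term {i}"} for i in range(2500))

requests = [
    build_update_request("http://localhost:8983/solr", "wikingram", batch)
    for batch in batches(docs, 1000)
]
print(len(requests), "update requests built")
```

Each request could then be sent with `urllib.request.urlopen(req)`. Nothing in this thread establishes whether batch size or commit frequency is the actual bottleneck, so this only illustrates the mechanics.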








Re: Data indexing is going too slow on single shard Why?

2015-03-25 Thread Shawn Heisey
On 3/25/2015 8:42 AM, Nitin Solanki wrote:
 Server configuration:
 8 CPUs.
 32 GB RAM
 O.S. - Linux

snip

 are running.  Java heap set to 4096 MB in Solr.  While indexing,

snip

 *Currently*, I have 1 shard  with 2 replicas using SOLR CLOUD.
 Data Size:
 102G    solr/node1/solr/wikingram_shard1_replica2
 102G    solr/node2/solr/wikingram_shard1_replica1

If both of those are on the same machine, I'm guessing that you're
running two Solr instances on that machine, so there's 8GB of RAM used
for Java.  That means you have about 24 GB of RAM left for caching ...
and 200GB of index data to cache.

24GB is not enough to cache 200GB of index.  If there is only one Solr
instance (leaving 28GB for caching) with 102GB of data on the machine,
it still might not be enough.  See that SolrPerformanceProblems wiki
page I linked in my earlier email.

For 102GB of data per server, I recommend at least 64GB of total RAM,
preferably 128GB.

For 204GB of data per server, I recommend at least 128GB of total RAM,
preferably 256GB.

Thanks,
Shawn
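
The arithmetic behind those numbers can be sketched as follows. This is a rough illustration only: the function name is made up, and it ignores memory used by the operating system itself and JVM overhead beyond the heap, which is why the figures above are given as approximate.

```python
def cache_headroom(total_ram_gb, heap_gb_per_instance, instances, index_gb):
    """Return (GB left for the OS disk cache, fraction of the index that fits)."""
    cache_gb = total_ram_gb - heap_gb_per_instance * instances
    return cache_gb, cache_gb / index_gb


# Two Solr instances with 4GB heaps on the 32GB machine, 2 x 102GB of index.
cache_gb, fraction = cache_headroom(32, 4, 2, 204)
print(f"{cache_gb} GB for caching, about {fraction:.0%} of the index")

# One Solr instance with a 4GB heap and 102GB of index on the same machine.
cache_gb, fraction = cache_headroom(32, 4, 1, 102)
print(f"{cache_gb} GB for caching, about {fraction:.0%} of the index")
```

In both cases only a small part of the index can be held in the disk cache, which is the shortfall the RAM recommendations above are meant to close.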



Data indexing is going too slow on single shard Why?

2015-03-25 Thread Nitin Solanki
Hello,
Can anyone please assist me? I am indexing on a single shard and it is
taking too much time to index the data. I am indexing around 49GB of
data on that single shard. What's wrong? Why is Solr taking so much time
to index the data?
Earlier I was indexing the same data on 8 shards, and it was fast
compared to the single shard. Why is that? Any help, please.


*HardCommit - 15 sec*
*SoftCommit - 10 min.*



Best,
Nitin