Re: Concurrent Indexing and Searching in Solr.

2015-08-11 Thread Nitin Solanki
Hi Erick,
 Thanks a lot for your help. I will go through MongoDB.

On Mon, Aug 10, 2015 at 9:14 PM Erick Erickson erickerick...@gmail.com
wrote:

 bq:  I changed
 <maxWarmingSearchers>2</maxWarmingSearchers>
 to <maxWarmingSearchers>100</maxWarmingSearchers>. And apply simultaneous
 searching using 100 workers.

 Do not do this. This has nothing to do with the number of searcher
 threads. And with
 your update rate, especially if you continue to insist on adding
 commit=true to every
 update request, this will explode your memory requirements. To no good
 purpose
 whatsoever.

 bq: But MongoDB can handle concurrent searching and indexing faster.

 Because MongoDB is optimized for different kinds of operations. Solr
 is a ranking,
 free-text search engine. It's an apples-and-oranges comparison. If MongoDB
 meets your search needs, you should use it.

 Best,
 Erick

 On Sun, Aug 9, 2015 at 11:04 PM, Nitin Solanki nitinml...@gmail.com
 wrote:
  Hi,
   I used solr 5.2.1 version. It is fast, I think. But again, I am
 stuck
  on concurrent searching and threading. I changed
  <maxWarmingSearchers>2</maxWarmingSearchers>
  to <maxWarmingSearchers>100</maxWarmingSearchers>, and applied simultaneous
  searching using 100 workers. It works fast, but not up to the mark.
 
  It reduces search time from 1.5 to 0.5 seconds. But if I run only a single
  worker, the search time is 0.03 seconds; that is very fast, but not
  achievable with 100 workers running simultaneously.
 
  As Shawn said - Making 100 concurrent indexing requests at the same time
  as 100
  concurrent queries will overwhelm *any* single Solr server. I got your
  point.
 
  But MongoDB can handle concurrent searching and indexing faster. Then why
  not solr? Sorry for this..
 
 
 
  On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey apa...@elyograg.org
 wrote:
 
  On 8/7/2015 1:15 PM, Nitin Solanki wrote:
   I wrote a python script for indexing and using
   urllib and urllib2 for indexing data via http..
 
  There are a number of Solr python clients.  Using a client makes your
  code much easier to write and understand.
 
  https://wiki.apache.org/solr/SolPython
 
  I have no experience with any of these clients, but I can say that the
  one encountered most often when Python developers come into the #solr
  IRC channel is pysolr.  Our wiki page says the last update for pysolr
  happened in December of 2013, but I can see that the last version on
  their web page is dated 2015-05-26.
 
  Making 100 concurrent indexing requests at the same time as 100
  concurrent queries will overwhelm *any* single Solr server.  In a
  previous message you said that you have 4 CPU cores.  The load you're
  trying to put on Solr will require at *LEAST* 200 threads.  It may be
  more than that.  Any single system is going to have trouble with that.
  A system with 4 cores will be *very* overloaded.
 
  Thanks,
  Shawn
 
 



Re: Concurrent Indexing and Searching in Solr.

2015-08-10 Thread Erick Erickson
bq:  I changed
<maxWarmingSearchers>2</maxWarmingSearchers>
to <maxWarmingSearchers>100</maxWarmingSearchers>. And apply simultaneous
searching using 100 workers.

Do not do this. This has nothing to do with the number of searcher
threads. And with
your update rate, especially if you continue to insist on adding
commit=true to every
update request, this will explode your memory requirements. To no good purpose
whatsoever.

bq: But MongoDB can handle concurrent searching and indexing faster.

Because MongoDB is optimized for different kinds of operations. Solr
is a ranking,
free-text search engine. It's an apples-and-oranges comparison. If MongoDB
meets your search needs, you should use it.

Best,
Erick

On Sun, Aug 9, 2015 at 11:04 PM, Nitin Solanki nitinml...@gmail.com wrote:
 Hi,
  I used solr 5.2.1 version. It is fast, I think. But again, I am stuck
 on concurrent searching and threading. I changed
 <maxWarmingSearchers>2</maxWarmingSearchers>
 to <maxWarmingSearchers>100</maxWarmingSearchers>, and applied simultaneous
 searching using 100 workers. It works fast, but not up to the mark.

 It reduces search time from 1.5 to 0.5 seconds. But if I run only a single
 worker, the search time is 0.03 seconds; that is very fast, but not
 achievable with 100 workers running simultaneously.

 As Shawn said - Making 100 concurrent indexing requests at the same time
 as 100
 concurrent queries will overwhelm *any* single Solr server. I got your
 point.

 But MongoDB can handle concurrent searching and indexing faster. Then why
 not solr? Sorry for this..



 On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 1:15 PM, Nitin Solanki wrote:
  I wrote a python script for indexing and using
  urllib and urllib2 for indexing data via http..

 There are a number of Solr python clients.  Using a client makes your
 code much easier to write and understand.

 https://wiki.apache.org/solr/SolPython

 I have no experience with any of these clients, but I can say that the
 one encountered most often when Python developers come into the #solr
 IRC channel is pysolr.  Our wiki page says the last update for pysolr
 happened in December of 2013, but I can see that the last version on
 their web page is dated 2015-05-26.

 Making 100 concurrent indexing requests at the same time as 100
 concurrent queries will overwhelm *any* single Solr server.  In a
 previous message you said that you have 4 CPU cores.  The load you're
 trying to put on Solr will require at *LEAST* 200 threads.  It may be
 more than that.  Any single system is going to have trouble with that.
 A system with 4 cores will be *very* overloaded.

 Thanks,
 Shawn




Re: Concurrent Indexing and Searching in Solr.

2015-08-10 Thread Nitin Solanki
Hi,
 I am using Solr 5.2.1. It is fast, I think. But again, I am stuck
on concurrent searching and threading. I changed
<maxWarmingSearchers>2</maxWarmingSearchers>
to <maxWarmingSearchers>100</maxWarmingSearchers>, and applied simultaneous
searching using 100 workers. It works fast, but not up to the mark.

It reduces search time from 1.5 to 0.5 seconds. But if I run only a single
worker, the search time is 0.03 seconds; that is very fast, but not
achievable with 100 workers running simultaneously.

As Shawn said - Making 100 concurrent indexing requests at the same time
as 100
concurrent queries will overwhelm *any* single Solr server. I got your
point.

But MongoDB can handle concurrent searching and indexing faster. Then why
not Solr? Sorry for asking.



On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey apa...@elyograg.org wrote:

 On 8/7/2015 1:15 PM, Nitin Solanki wrote:
  I wrote a python script for indexing and using
  urllib and urllib2 for indexing data via http..

 There are a number of Solr python clients.  Using a client makes your
 code much easier to write and understand.

 https://wiki.apache.org/solr/SolPython

 I have no experience with any of these clients, but I can say that the
 one encountered most often when Python developers come into the #solr
 IRC channel is pysolr.  Our wiki page says the last update for pysolr
 happened in December of 2013, but I can see that the last version on
 their web page is dated 2015-05-26.

 Making 100 concurrent indexing requests at the same time as 100
 concurrent queries will overwhelm *any* single Solr server.  In a
 previous message you said that you have 4 CPU cores.  The load you're
 trying to put on Solr will require at *LEAST* 200 threads.  It may be
 more than that.  Any single system is going to have trouble with that.
 A system with 4 cores will be *very* overloaded.

 Thanks,
 Shawn




Re: Concurrent Indexing and Searching in Solr.

2015-08-09 Thread Shawn Heisey
On 8/7/2015 1:15 PM, Nitin Solanki wrote:
 I wrote a python script for indexing and using
 urllib and urllib2 for indexing data via http..

There are a number of Solr python clients.  Using a client makes your
code much easier to write and understand.

https://wiki.apache.org/solr/SolPython

I have no experience with any of these clients, but I can say that the
one encountered most often when Python developers come into the #solr
IRC channel is pysolr.  Our wiki page says the last update for pysolr
happened in December of 2013, but I can see that the last version on
their web page is dated 2015-05-26.
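Whether through pysolr or the raw urllib approach already in use, the mechanical changes suggested in this thread are batching documents into one request and dropping commit=true from the URL. A minimal standard-library sketch; the core name, field name, and URL are taken from the thread and the helper itself is illustrative:

```python
import json
import urllib.request

def build_update_request(base_url, docs):
    """Build one JSON update request for a whole batch of docs.

    Note: no commit=true on the URL -- per the advice in this thread,
    commits are left to Solr's autoCommit/autoSoftCommit settings.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update/json",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# One batched request instead of one request per document.
docs = [{"id": str(i), "field_name": "value %d" % i} for i in range(3)]
req = build_update_request("http://localhost:8983/solr/test_commit_fast", docs)
# urllib.request.urlopen(req) would actually send it (needs a running Solr).
```

A dedicated client such as pysolr wraps this same HTTP traffic behind a simpler API, which is why Shawn recommends it.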

Making 100 concurrent indexing requests at the same time as 100
concurrent queries will overwhelm *any* single Solr server.  In a
previous message you said that you have 4 CPU cores.  The load you're
trying to put on Solr will require at *LEAST* 200 threads.  It may be
more than that.  Any single system is going to have trouble with that. 
A system with 4 cores will be *very* overloaded.

Thanks,
Shawn



Re: Concurrent Indexing and Searching in Solr.

2015-08-08 Thread Nitin Solanki
Thanks, Erick, for your suggestion. I will remove commit=true, use Solr
5.2, and then get back to you for further help. Thanks.

On Sat, Aug 8, 2015 at 4:07 AM Erick Erickson erickerick...@gmail.com
wrote:

 bq: So, How much minimum concurrent threads should I run?

 I really can't answer that in the abstract, you'll simply have to
 test.

 I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine
 that
 moving from Python to post.jar isn't all that useful.

 But before you do anything, see what really happens when you remove the
 commit=true. That's likely way more important than the rest.

 Best,
 Erick

 On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com
 wrote:
  Hi Erick,
  posting files to Solr via curl =
  Rather than posting files via curl. Which is better SolrJ or post.jar...
 I
  don't use both things. I wrote a python script for indexing and using
  urllib and urllib2 for indexing data via http.. I don't have any  option
 to
  use SolrJ Right now. How can I do same thing via post.jar in python? Any
  help Please.
 
  indexing with 100 threads is going to eat up a lot of CPU cycles
  = So, How much minimum concurrent threads should I run? And I also need
  concurrent searching. So, How much?
 
  And Thanks for solr 5.2, I will go through that. Thanking for reply.
 Please
  help me..
 
  On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com
  wrote:
 
  bq: How much limitations does Solr has related to indexing and searching
  simultaneously? It means that how many simultaneously calls, I made for
  searching and indexing once?
 
  None a-priori. It all depends on the hardware you're throwing at it.
  Obviously
  indexing with 100 threads is going to eat up a lot of CPU cycles that
  can't then
  be devoted to satisfying queries. You need to strike a balance. Do
  seriously
  consider using some other method than posting files to Solr via curl
  or the like,
  that's rarely a robust solution for production.
 
  As for adding the commit=true, this shouldn't be affecting the index
 size,
  I
  suspect you were misled by something else happening.
 
  Really, remove it or you'll beat up your system hugely. As for the soft
  commit
  interval, that's totally irrelevant when you're committing every
  document. But do
  lengthen it as much as you can. Most of the time when people say real
  time,
  it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to
 check
  what the _real_ requirement is, it's often not what's stated.
 
  bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
  indexing and searching data.
 
  Did you read the link I provided? With replicas, 5.2 will index almost
  twice as
  fast. That means (roughly) half the work on the followers is being done,
  freeing up cycles for performing queries.
 
  Best,
  Erick
 
 
  On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com
  wrote:
   Hi Erick,
 You said that soft commit should be more than 3000 ms.
   Actually, I need Real time searching and that's why I need soft commit
  fast.
  
   commit=true = I made commit=true because , It reduces by indexed data
  size
   from 1.5GB to 500MB on* each shard*. When I did commit=false then, my
   indexed data size was 1.5GB. After changing it to commit=true, then
 size
   reduced to 500MB only. I am not getting how is it?
  
   I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
   indexing and searching data.
  
   How much limitations does Solr has related to indexing and searching
   simultaneously? It means that how many simultaneously calls, I made
 for
   searching and indexing once?
  
  
   On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson 
 erickerick...@gmail.com
   wrote:
  
   Your soft commit time of 3 seconds is quite aggressive,
   I'd lengthen it to as long as possible.
  
   Ugh, looked at your query more closely. Adding commit=true to every
  update
   request is horrible performance wise. Let your autocommit process
   handle the commits is the first thing I'd do. Second, I'd try going
 to
   SolrJ
   and batching up documents (I usually start with 1,000) or using the
   post.jar
   tool rather than sending them via a raw URL.
  
   I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
   version of Solr?
   There was a 2x speedup in Solr 5.2, see:
  
 
 http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
  
   One symptom was that the followers were doing way more work than
 the
   leader
   (BTW, using master/slave when talking SolrCloud is a bit
 confusing...)
   which will
   affect query response rates.
  
   Basically, if query response is paramount, you really need to
 throttle
   your indexing,
   there's just a whole lot of work going on here..
  
   Best,
   Erick
  
   On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote:
How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot.

Re: Concurrent Indexing and Searching in Solr.

2015-08-08 Thread Upayavira
If you are using Python, then you can use urllib2, or requests which
is reportedly better, or better still something like pysolr, which makes
life simpler.

Here's a Pull Request that makes pysolr Zookeeper aware, which'll help
if you are using SolrCloud. I hope one day they will merge it:

https://github.com/toastdriven/pysolr/pull/138

Upayavira

On Fri, Aug 7, 2015, at 11:37 PM, Erick Erickson wrote:
 bq: So, How much minimum concurrent threads should I run?
 
 I really can't answer that in the abstract, you'll simply have to
 test.
 
 I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine
 that
 moving from Python to post.jar isn't all that useful.
 
 But before you do anything, see what really happens when you remove the
 commit=true. That's likely way more important than the rest.
 
 Best,
 Erick
 
 On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com
 wrote:
  Hi Erick,
  posting files to Solr via curl =
  Rather than posting files via curl. Which is better SolrJ or post.jar... I
  don't use both things. I wrote a python script for indexing and using
  urllib and urllib2 for indexing data via http.. I don't have any  option to
  use SolrJ Right now. How can I do same thing via post.jar in python? Any
  help Please.
 
  indexing with 100 threads is going to eat up a lot of CPU cycles
  = So, How much minimum concurrent threads should I run? And I also need
  concurrent searching. So, How much?
 
  And Thanks for solr 5.2, I will go through that. Thanking for reply. Please
  help me..
 
  On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com
  wrote:
 
  bq: How much limitations does Solr has related to indexing and searching
  simultaneously? It means that how many simultaneously calls, I made for
  searching and indexing once?
 
  None a-priori. It all depends on the hardware you're throwing at it.
  Obviously
  indexing with 100 threads is going to eat up a lot of CPU cycles that
  can't then
  be devoted to satisfying queries. You need to strike a balance. Do
  seriously
  consider using some other method than posting files to Solr via curl
  or the like,
  that's rarely a robust solution for production.
 
  As for adding the commit=true, this shouldn't be affecting the index size,
  I
  suspect you were misled by something else happening.
 
  Really, remove it or you'll beat up your system hugely. As for the soft
  commit
  interval, that's totally irrelevant when you're committing every
  document. But do
  lengthen it as much as you can. Most of the time when people say real
  time,
  it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
  what the _real_ requirement is, it's often not what's stated.
 
  bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
  indexing and searching data.
 
  Did you read the link I provided? With replicas, 5.2 will index almost
  twice as
  fast. That means (roughly) half the work on the followers is being done,
  freeing up cycles for performing queries.
 
  Best,
  Erick
 
 
  On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com
  wrote:
   Hi Erick,
 You said that soft commit should be more than 3000 ms.
   Actually, I need Real time searching and that's why I need soft commit
  fast.
  
   commit=true = I made commit=true because , It reduces by indexed data
  size
   from 1.5GB to 500MB on* each shard*. When I did commit=false then, my
   indexed data size was 1.5GB. After changing it to commit=true, then size
   reduced to 500MB only. I am not getting how is it?
  
   I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
   indexing and searching data.
  
   How much limitations does Solr has related to indexing and searching
   simultaneously? It means that how many simultaneously calls, I made for
   searching and indexing once?
  
  
   On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com
   wrote:
  
   Your soft commit time of 3 seconds is quite aggressive,
   I'd lengthen it to as long as possible.
  
   Ugh, looked at your query more closely. Adding commit=true to every
  update
   request is horrible performance wise. Let your autocommit process
   handle the commits is the first thing I'd do. Second, I'd try going to
   SolrJ
   and batching up documents (I usually start with 1,000) or using the
   post.jar
   tool rather than sending them via a raw URL.
  
   I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
   version of Solr?
   There was a 2x speedup in Solr 5.2, see:
  
  http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
  
   One symptom was that the followers were doing way more work than the
   leader
   (BTW, using master/slave when talking SolrCloud is a bit confusing...)
   which will
   affect query response rates.
  
   Basically, if query response is paramount, you really need to throttle
   your indexing,
   there's just a whole lot of work going on here.

Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hello Everyone,
  I have indexed 16 million documents in Solr
Cloud, with 4 nodes and 8 shards, each with a single replica.
I am trying to run concurrent indexing and searching on those indexed
documents: 100 concurrent indexing calls along with 100
concurrent searching calls.
It *degrades both searching and indexing* performance.

Configuration :

  "commitWithin":{"softCommit":true},
  "autoCommit":{
    "maxDocs":-1,
    "maxTime":6,
    "openSearcher":false},
  "autoSoftCommit":{
    "maxDocs":-1,
    "maxTime":3000}},

  "indexConfig":{
    "maxBufferedDocs":-1,
    "maxMergeDocs":-1,
    "maxIndexingThreads":8,
    "mergeFactor":-1,
    "ramBufferSizeMB":100.0,
    "writeLockTimeout":-1,
    "lockType":"native"}}}

AND  <maxWarmingSearchers>2</maxWarmingSearchers>
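The JSON above is Solr's status view of settings that live in solrconfig.xml. A hypothetical fragment of the equivalent config is sketched below; the 60000 ms autoCommit maxTime is an assumed typical value (the archived output shows only "6", which looks truncated), and maxWarmingSearchers normally sits inside the <query> section:

```xml
<!-- Illustrative solrconfig.xml fragment, not the poster's actual file. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>        <!-- assumed value; archive shows "6" -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>3000</maxTime>         <!-- the aggressive 3 s soft commit -->
  </autoSoftCommit>
</updateHandler>

<query>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>
```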

I don't know how master and slave work. Normally, I created 8
shards and indexed documents using:




http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
'Content-type:application/json' -d '[ JSON_Document ]'

And searching using:

http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

Please help me make searching and indexing fast concurrently.
Thanks.
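A side note on the select URL: a raw query string with spaces or a colon should be URL-encoded before sending. A minimal sketch with the standard library; the URL and field name are taken from the thread:

```python
from urllib.parse import urlencode

# Build a properly encoded select URL; the colon in field_name:search_string
# would otherwise be sent unescaped.
base = "http://localhost:8983/solr/test_commit_fast/select"
params = {"q": "field_name:search_string", "wt": "json"}
url = base + "?" + urlencode(params)
```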


Regards,
Nitin


Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
bq: So, How much minimum concurrent threads should I run?

I really can't answer that in the abstract, you'll simply have to
test.

I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine that
moving from Python to post.jar isn't all that useful.

But before you do anything, see what really happens when you remove the
commit=true. That's likely way more important than the rest.

Best,
Erick

On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com wrote:
 Hi Erick,
 posting files to Solr via curl =
 Rather than posting files via curl. Which is better SolrJ or post.jar... I
 don't use both things. I wrote a python script for indexing and using
 urllib and urllib2 for indexing data via http.. I don't have any  option to
 use SolrJ Right now. How can I do same thing via post.jar in python? Any
 help Please.

 indexing with 100 threads is going to eat up a lot of CPU cycles
 = So, How much minimum concurrent threads should I run? And I also need
 concurrent searching. So, How much?

 And Thanks for solr 5.2, I will go through that. Thanking for reply. Please
 help me..

 On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com
 wrote:

 bq: How much limitations does Solr has related to indexing and searching
 simultaneously? It means that how many simultaneously calls, I made for
 searching and indexing once?

 None a-priori. It all depends on the hardware you're throwing at it.
 Obviously
 indexing with 100 threads is going to eat up a lot of CPU cycles that
 can't then
 be devoted to satisfying queries. You need to strike a balance. Do
 seriously
 consider using some other method than posting files to Solr via curl
 or the like,
 that's rarely a robust solution for production.

 As for adding the commit=true, this shouldn't be affecting the index size,
 I
 suspect you were misled by something else happening.

 Really, remove it or you'll beat up your system hugely. As for the soft
 commit
 interval, that's totally irrelevant when you're committing every
 document. But do
 lengthen it as much as you can. Most of the time when people say real
 time,
 it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
 what the _real_ requirement is, it's often not what's stated.

 bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
 indexing and searching data.

 Did you read the link I provided? With replicas, 5.2 will index almost
 twice as
 fast. That means (roughly) half the work on the followers is being done,
 freeing up cycles for performing queries.

 Best,
 Erick


 On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com
 wrote:
  Hi Erick,
You said that soft commit should be more than 3000 ms.
  Actually, I need Real time searching and that's why I need soft commit
 fast.
 
  commit=true = I made commit=true because , It reduces by indexed data
 size
  from 1.5GB to 500MB on* each shard*. When I did commit=false then, my
  indexed data size was 1.5GB. After changing it to commit=true, then size
  reduced to 500MB only. I am not getting how is it?
 
  I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
  indexing and searching data.
 
  How much limitations does Solr has related to indexing and searching
  simultaneously? It means that how many simultaneously calls, I made for
  searching and indexing once?
 
 
  On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com
  wrote:
 
  Your soft commit time of 3 seconds is quite aggressive,
  I'd lengthen it to as long as possible.
 
  Ugh, looked at your query more closely. Adding commit=true to every
 update
  request is horrible performance wise. Let your autocommit process
  handle the commits is the first thing I'd do. Second, I'd try going to
  SolrJ
  and batching up documents (I usually start with 1,000) or using the
  post.jar
  tool rather than sending them via a raw URL.
 
  I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
  version of Solr?
  There was a 2x speedup in Solr 5.2, see:
 
 http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
 
  One symptom was that the followers were doing way more work than the
  leader
  (BTW, using master/slave when talking SolrCloud is a bit confusing...)
  which will
  affect query response rates.
 
  Basically, if query response is paramount, you really need to throttle
  your indexing,
  there's just a whole lot of work going on here..
 
  Best,
  Erick
 
  On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote:
   How many CPUs do you have? 100 concurrent indexing calls seems like
   rather a lot. You're gonna end up doing a lot of context switching,
   hence degraded performance. Dunno what others would say, but I'd aim
 for
   approx one indexing thread per CPU.
  
   Upayavira
  
   On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
   Hello Everyone,
  I have indexed 16 million documents in Solr Cloud.

Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
Your soft commit time of 3 seconds is quite aggressive,
I'd lengthen it to as long as possible.

Ugh, looked at your query more closely. Adding commit=true to every update
request is horrible performance-wise. Letting your autocommit process
handle the commits is the first thing I'd do. Second, I'd try going to SolrJ
and batching up documents (I usually start with 1,000) or using the post.jar
tool rather than sending them via a raw URL.
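The batching suggestion above (starting around 1,000 documents per update) can be sketched as a small helper; the batch size and document shape are illustrative:

```python
import json

def batches(docs, size=1000):
    """Yield successive batches of `size` documents from a list."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

# 2,500 docs become 3 update payloads instead of 2,500 single-doc requests.
docs = [{"id": str(i)} for i in range(2500)]
payloads = [json.dumps(batch) for batch in batches(docs)]
```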

I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
version of Solr?
There was a 2x speedup in Solr 5.2, see:
http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

One symptom was that the followers were doing way more work than the leader
(BTW, using master/slave when talking SolrCloud is a bit confusing...)
which will
affect query response rates.

Basically, if query response is paramount, you really need to throttle
your indexing,
there's just a whole lot of work going on here..

Best,
Erick

On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote:
 How many CPUs do you have? 100 concurrent indexing calls seems like
 rather a lot. You're gonna end up doing a lot of context switching,
 hence degraded performance. Dunno what others would say, but I'd aim for
 approx one indexing thread per CPU.

 Upayavira
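Upayavira's one-indexing-thread-per-CPU suggestion can be enforced with a bounded thread pool; `index_batch` below is a hypothetical stand-in for whatever actually posts a batch to Solr:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def index_batch(batch):
    # Stand-in for the real HTTP call to Solr's update handler.
    return len(batch)

work = [[{"id": str(i)} for i in range(100)] for _ in range(8)]

# Cap workers at the CPU count instead of spawning 100 threads.
workers = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(index_batch, work))
```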

 On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
 Hello Everyone,
   I have indexed 16 million documents in Solr
 Cloud. Created 4 nodes and 8 shards with single replica.
 I am trying to make concurrent indexing and searching on those indexed
 documents. Trying to make 100 concurrent indexing calls along with 100
 concurrent searching calls.
 It *degrades searching and indexing* performance both.

 Configuration :

   "commitWithin":{"softCommit":true},
   "autoCommit":{
     "maxDocs":-1,
     "maxTime":6,
     "openSearcher":false},
   "autoSoftCommit":{
     "maxDocs":-1,
     "maxTime":3000}},

   "indexConfig":{
     "maxBufferedDocs":-1,
     "maxMergeDocs":-1,
     "maxIndexingThreads":8,
     "mergeFactor":-1,
     "ramBufferSizeMB":100.0,
     "writeLockTimeout":-1,
     "lockType":"native"}}}

 AND  <maxWarmingSearchers>2</maxWarmingSearchers>

 I don't know how master and slave work. Normally, I created 8
 shards and indexed documents using:




 http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
 'Content-type:application/json' -d '[ JSON_Document ]'

 And searching using:

 http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

 Please help me make searching and indexing fast concurrently.
 Thanks.


 Regards,
 Nitin


Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi Erick,
posting files to Solr via curl =
Rather than posting files via curl, which is better: SolrJ or post.jar? I
don't use either. I wrote a Python script for indexing, using
urllib and urllib2 to index data via HTTP. I don't have the option to
use SolrJ right now. How can I do the same thing via post.jar from Python? Any
help, please.

indexing with 100 threads is going to eat up a lot of CPU cycles
= So, what is the minimum number of concurrent threads I should run? And I
also need concurrent searching, so how many?

And thanks for the Solr 5.2 tip, I will go through that. Thanks for the reply.
Please help me.

On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com
wrote:

 bq: How much limitations does Solr has related to indexing and searching
 simultaneously? It means that how many simultaneously calls, I made for
 searching and indexing once?

 None a-priori. It all depends on the hardware you're throwing at it.
 Obviously
 indexing with 100 threads is going to eat up a lot of CPU cycles that
 can't then
 be devoted to satisfying queries. You need to strike a balance. Do
 seriously
 consider using some other method than posting files to Solr via curl
 or the like,
 that's rarely a robust solution for production.

 As for adding the commit=true, this shouldn't be affecting the index size,
 I
 suspect you were misled by something else happening.

 Really, remove it or you'll beat up your system hugely. As for the soft
 commit
 interval, that's totally irrelevant when you're committing every
 document. But do
 lengthen it as much as you can. Most of the time when people say real
 time,
 it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
 what the _real_ requirement is, it's often not what's stated.

 bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
 indexing and searching data.

 Did you read the link I provided? With replicas, 5.2 will index almost
 twice as
 fast. That means (roughly) half the work on the followers is being done,
 freeing up cycles for performing queries.

 Best,
 Erick



Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi Erick,
  You said that soft commit should be more than 3000 ms.
Actually, I need real-time searching and that's why I need soft commit fast.

commit=true = I made commit=true because it reduces my indexed data size
from 1.5GB to 500MB on *each shard*. When I used commit=false, my
indexed data size was 1.5GB. After changing it to commit=true, the size
reduced to 500MB only. I don't understand how that happens.

I am using Solr 5.0. Is 5.0 almost the same as 5.2 regarding
indexing and searching data?

What limitations does Solr have on simultaneous indexing and
searching? That is, how many simultaneous calls can I make for
searching and indexing at once?


On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com
wrote:

 Your soft commit time of 3 seconds is quite aggressive,
 I'd lengthen it to as long as possible.

 Ugh, looked at your query more closely. Adding commit=true to every update
 request is horrible performance-wise. Letting your autocommit process
 handle the commits is the first thing I'd do. Second, I'd try going to
 SolrJ
 and batching up documents (I usually start with 1,000) or using the
 post.jar
 tool rather than sending them via a raw URL.

 I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
 version of Solr?
 There was a 2x speedup in Solr 5.2, see:
 http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

 One symptom was that the followers were doing way more work than the
 leader
 (BTW, using master/slave when talking SolrCloud is a bit confusing...)
 which will
 affect query response rates.

 Basically, if query response is paramount, you really need to throttle
 your indexing,
 there's just a whole lot of work going on here.

 Best,
 Erick
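Erick's batching advice can be sketched in Python 3 (a minimal sketch, not his actual code: the collection name test_commit_fast comes from this thread, the batch size of 1,000 is his suggested starting point, and index_batch is a hypothetical helper — note there is no commit=true on the URL):

```python
import json
import urllib.request

def batches(docs, size=1000):
    """Split docs into chunks so each update request carries one batch."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_batch(batch, url="http://localhost:8983/solr/test_commit_fast/update/json"):
    # One HTTP request per batch; commits are left to autoCommit/autoSoftCommit.
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

docs = [{"id": str(i), "field_name": "value %d" % i} for i in range(2500)]
sizes = [len(b) for b in batches(docs)]
print(sizes)  # [1000, 1000, 500]
```

Sending 2,500 documents this way costs 3 HTTP requests instead of 2,500, and none of them forces a commit.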




Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Nitin Solanki
Hi, Upayavira

RAM = 28GB
CPU = 4 processors.




Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Erick Erickson
bq: How much limitations does Solr has related to indexing and searching
simultaneously? It means that how many simultaneously calls, I made for
searching and indexing once?

None a priori. It all depends on the hardware you're throwing at it. Obviously
indexing with 100 threads is going to eat up a lot of CPU cycles that can't then
be devoted to satisfying queries. You need to strike a balance. Do seriously
consider using some other method than posting files to Solr via curl
or the like,
that's rarely a robust solution for production.

As for adding the commit=true, this shouldn't be affecting the index size; I
suspect you were misled by something else happening.

Really, remove it or you'll beat up your system hugely. As for the soft commit
interval, that's totally irrelevant when you're committing every
document. But do
lengthen it as much as you can. Most of the time when people say real time,
it turns out that 10 seconds is OK. Or 60 seconds is OK.  You have to check
what the _real_ requirement is, it's often not what's stated.
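Erick's point about stating the real freshness requirement can be sketched with Solr's commitWithin update parameter (a hypothetical sketch; the 10-second value and collection name are illustrative):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/test_commit_fast/update/json"

# Instead of commit=true on every request, ask Solr to make the documents
# visible within 10 seconds and let it coalesce the commit work.
url = base + "?" + urlencode({"commitWithin": 10000})
print(url)
```

Every update request sent to this URL then shares commit work within the 10-second window, rather than each one forcing its own commit.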

bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
indexing and searching data.

Did you read the link I provided? With replicas, 5.2 will index almost twice as
fast. That means (roughly) half the work on the followers is being done,
freeing up cycles for performing queries.

Best,
Erick


On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote:
 Hi Erick,
   You said that soft commit should be more than 3000 ms.
 Actually, I need real-time searching and that's why I need soft commit fast.

 commit=true = I made commit=true because it reduces my indexed data size
 from 1.5GB to 500MB on *each shard*. When I used commit=false, my
 indexed data size was 1.5GB. After changing it to commit=true, the size
 reduced to 500MB only. I don't understand how that happens.

 I am using Solr 5.0. Is 5.0 almost the same as 5.2 regarding
 indexing and searching data?

 What limitations does Solr have on simultaneous indexing and
 searching? That is, how many simultaneous calls can I make for
 searching and indexing at once?


 

Re: Concurrent Indexing and Searching in Solr.

2015-08-07 Thread Upayavira
How many CPUs do you have? 100 concurrent indexing calls seems like
rather a lot. You're gonna end up doing a lot of context switching,
hence degraded performance. Dunno what others would say, but I'd aim for
approx one indexing thread per CPU.

Upayavira
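Upayavira's rule of thumb (roughly one indexing thread per CPU) can be sketched with a pool sized from os.cpu_count(); index_call here is a hypothetical stand-in for one real indexing request:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def index_call(batch_id):
    # Placeholder for one indexing request; real code would POST a batch to Solr.
    return batch_id * 2

# Cap concurrency at roughly the CPU count instead of 100 concurrent calls,
# avoiding the context-switching overhead described above.
workers = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(index_call, range(10)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The pool queues the 100 logical calls and only runs as many at once as there are CPUs to serve them.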

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
 Hello Everyone,
   I have indexed 16 million documents in Solr
 Cloud. Created 4 nodes and 8 shards with single replica.
 I am trying to make concurrent indexing and searching on those indexed
 documents. Trying to make 100 concurrent indexing calls along with 100
 concurrent searching calls.
 It *degrades searching and indexing* performance both.
 
 Configuration :
 
   "commitWithin":{"softCommit":true},
   "autoCommit":{
     "maxDocs":-1,
     "maxTime":6,
     "openSearcher":false},
   "autoSoftCommit":{
     "maxDocs":-1,
     "maxTime":3000}},
 
   "indexConfig":{
     "maxBufferedDocs":-1,
     "maxMergeDocs":-1,
     "maxIndexingThreads":8,
     "mergeFactor":-1,
     "ramBufferSizeMB":100.0,
     "writeLockTimeout":-1,
     "lockType":"native"}}}
 
 AND  <maxWarmingSearchers>2</maxWarmingSearchers>
 
 I don't know how master and slave work. Normally, I created 8
 shards and indexed documents using:
 
 
 
 
 http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
 'Content-type:application/json' -d '[ JSON_Document ]'
 
 And searching using:
 http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string
 
 Please help me with this, to make searching and indexing fast concurrently.
 Thanks.
 
 
 Regards,
 Nitin