Re: Concurrent Indexing and Searching in Solr.
Hi Erick, Thanks a lot for your help. I will go through MongoDB.

On Mon, Aug 10, 2015 at 9:14 PM, Erick Erickson erickerick...@gmail.com wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
bq: I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers>. And apply simultaneous searching using 100 workers.

Do not do this. This has nothing to do with the number of searcher threads. And with your update rate, especially if you continue to insist on adding commit=true to every update request, this will explode your memory requirements. To no good purpose whatsoever.

bq: But MongoDB can handle concurrent searching and indexing faster.

Because MongoDB is optimized for different kinds of operations. Solr is a ranking, free-text search engine. It's an apples-and-oranges comparison. If MongoDB meets your search needs, you should use it.

Best, Erick

On Sun, Aug 9, 2015 at 11:04 PM, Nitin Solanki nitinml...@gmail.com wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
Hi, I am using Solr 5.2.1 now. It is fast, I think. But again, I am stuck on concurrent searching and threading. I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers> and applied simultaneous searching using 100 workers. It works faster but not up to the mark: it reduces search time from 1.5 to 0.5 seconds. But if I run only a single worker then search time is 0.03 seconds, which is very fast, but that is not possible with 100 workers running simultaneously. As Shawn said - Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. I got your point. But MongoDB can handle concurrent searching and indexing faster. Then why not Solr? Sorry for this..

On Mon, Aug 10, 2015 at 2:39 AM, Shawn Heisey apa...@elyograg.org wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
On 8/7/2015 1:15 PM, Nitin Solanki wrote: I wrote a python script for indexing and using urllib and urllib2 for indexing data via http..

There are a number of Solr python clients. Using a client makes your code much easier to write and understand. https://wiki.apache.org/solr/SolPython

I have no experience with any of these clients, but I can say that the one encountered most often when Python developers come into the #solr IRC channel is pysolr. Our wiki page says the last update for pysolr happened in December of 2013, but I can see that the last version on their web page is dated 2015-05-26.

Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. In a previous message you said that you have 4 CPU cores. The load you're trying to put on Solr will require at *LEAST* 200 threads. It may be more than that. Any single system is going to have trouble with that. A system with 4 cores will be *very* overloaded.

Thanks, Shawn
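As a sketch of the batching idea behind Shawn's client suggestion, the per-document urllib calls can be collapsed into one JSON update request per batch using only the standard library (Python 3's urllib.request standing in for urllib2 here; the core name is taken from the URLs in this thread, and the 10-second commitWithin is an illustrative value, not a recommendation):

```python
import json
import urllib.request

# Core name taken from the update URLs earlier in this thread.
SOLR_UPDATE_URL = "http://localhost:8983/solr/test_commit_fast/update/json"

def build_update_request(docs, commit_within_ms=10000):
    """Build one batched JSON update request for many documents,
    letting commitWithin (rather than commit=true on every call)
    control when the documents become visible."""
    url = "%s?commitWithin=%d" % (SOLR_UPDATE_URL, commit_within_ms)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})

# Sending is then a single call per batch:
# urllib.request.urlopen(build_update_request(batch))
```

The request is built but not sent here, so the sketch works without a running Solr; a real client such as pysolr wraps the same pattern.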
Re: Concurrent Indexing and Searching in Solr.
Thanks Erick for your suggestion. I will remove commit=true, use Solr 5.2, and then get back to you again for further help. Thanks.

On Sat, Aug 8, 2015 at 4:07 AM, Erick Erickson erickerick...@gmail.com wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
If you are using Python, then you can use urllib2, or requests which is reportedly better, or better still something like pysolr, which makes life simpler. Here's a Pull Request that makes pysolr Zookeeper aware, which'll help if you are using SolrCloud. I hope one day they will merge it: https://github.com/toastdriven/pysolr/pull/138

Upayavira

On Fri, Aug 7, 2015, at 11:37 PM, Erick Erickson wrote: [quoted thread trimmed]
Concurrent Indexing and Searching in Solr.
Hello Everyone, I have indexed 16 million documents in Solr Cloud: 4 nodes and 8 shards with a single replica each. I am trying to run concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades searching and indexing* performance both.

Configuration:
  commitWithin: { softCommit: true },
  autoCommit: { maxDocs: -1, maxTime: 6, openSearcher: false },
  autoSoftCommit: { maxDocs: -1, maxTime: 3000 },
  indexConfig: { maxBufferedDocs: -1, maxMergeDocs: -1, maxIndexingThreads: 8, mergeFactor: -1, ramBufferSizeMB: 100.0, writeLockTimeout: -1, lockType: native }
AND <maxWarmingSearchers>2</maxWarmingSearchers>

I don't know how master and slave work. Normally, I created 8 shards and indexed documents using:
  http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'
And searching using:
  http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

Please, any help on it, to make searching and indexing fast concurrently. Thanks. Regards, Nitin
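One small detail of the select URL above: a query written literally with spaces around the colon will not reach Solr as intended, so it needs URL encoding. A standard-library sketch (the field and search-string names are just the placeholders from the message above):

```python
import urllib.parse

# Core name taken from the select URL in the message above.
SOLR_SELECT_URL = "http://localhost:8983/solr/test_commit_fast/select"

def build_search_url(field, value):
    """Encode q=field:value properly; urlencode escapes the colon and
    any spaces inside the search string."""
    params = urllib.parse.urlencode({"q": "%s:%s" % (field, value)})
    return "%s?%s" % (SOLR_SELECT_URL, params)
```

The resulting URL can then be fetched with urllib, requests, or any HTTP client.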
Re: Concurrent Indexing and Searching in Solr.
bq: So, How much minimum concurrent threads should I run?

I really can't answer that in the abstract, you'll simply have to test. I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine that moving from Python to post.jar isn't all that useful. But before you do anything, see what really happens when you remove the commit=true. That's likely way more important than the rest.

Best, Erick

On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
Your soft commit time of 3 seconds is quite aggressive, I'd lengthen it to as long as possible. Ugh, looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do.

Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000) or using the post.jar tool rather than sending them via a raw URL. I agree with Upayavira, 100 concurrent threads is a _lot_.

Also, what version of Solr? There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking SolrCloud is a bit confusing...) which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing, there's just a whole lot of work going on here..

Best, Erick

On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote: [quoted thread trimmed]
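Erick's "batching up documents (I usually start with 1,000)" translates, in the Python the original poster is using, to something like this sketch:

```python
def batched(docs, size=1000):
    """Yield successive batches so each update request carries around
    1,000 documents instead of one document per HTTP call."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Each batch would then be sent as a single JSON update request,
# with no commit=true parameter attached.
```

The last batch may be smaller than the batch size; that is fine, since each batch is an independent update request.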
Re: Concurrent Indexing and Searching in Solr.
Hi Erick,

posting files to Solr via curl = Rather than posting files via curl, which is better: SolrJ or post.jar? I don't use either. I wrote a python script for indexing, using urllib and urllib2 to index data via http. I don't have any option to use SolrJ right now. How can I do the same thing via post.jar in python? Any help please.

indexing with 100 threads is going to eat up a lot of CPU cycles = So, how many concurrent threads should I run at minimum? And I also need concurrent searching. So, how many? And thanks for the Solr 5.2 pointer, I will go through that. Thanks for the reply. Please help me..

On Fri, Aug 7, 2015 at 11:51 PM, Erick Erickson erickerick...@gmail.com wrote:

bq: How much limitations does Solr has related to indexing and searching simultaneously? It means that how many simultaneously calls, I made for searching and indexing once?

None a-priori. It all depends on the hardware you're throwing at it. Obviously indexing with 100 threads is going to eat up a lot of CPU cycles that can't then be devoted to satisfying queries. You need to strike a balance. Do seriously consider using some other method than posting files to Solr via curl or the like, that's rarely a robust solution for production.

As for adding the commit=true, this shouldn't be affecting the index size, I suspect you were misled by something else happening. Really, remove it or you'll beat up your system hugely. As for the soft commit interval, that's totally irrelevant when you're committing every document. But do lengthen it as much as you can. Most of the time when people say real time, it turns out that 10 seconds is OK. Or 60 seconds is OK. You have to check what the _real_ requirement is, it's often not what's stated.

bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data.

Did you read the link I provided? With replicas, 5.2 will index almost twice as fast. That means (roughly) half the work on the followers is being done, freeing up cycles for performing queries.

Best, Erick

On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote: [quoted thread trimmed]
Re: Concurrent Indexing and Searching in Solr.
Hi Erick, You said that soft commit should be more than 3000 ms. Actually, I need Real time searching and that's why I need soft commit fast. commit=true = I made commit=true because , It reduces by indexed data size from 1.5GB to 500MB on* each shard*. When I did commit=false then, my indexed data size was 1.5GB. After changing it to commit=true, then size reduced to 500MB only. I am not getting how is it? I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding indexing and searching data. How much limitations does Solr has related to indexing and searching simultaneously? It means that how many simultaneously calls, I made for searching and indexing once? On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote: Your soft commit time of 3 seconds is quite aggressive, I'd lengthen it to as long as possible. Ugh, looked at your query more closely. Adding commit=true to every update request is horrible performance wise. Let your autocommit process handle the commits is the first thing I'd do. Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000) or using the post.jar tool rather than sending them via a raw URL. I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what version of Solr? There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking SolrCloud is a bit confusing...) which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing, there's just a whole lot of work going on here.. Best, Erick On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote: How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. 
Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:

Hello everyone, I have indexed 16 million documents in SolrCloud: 4 nodes and 8 shards with a single replica. I am trying to run concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance.

Configuration:
commitWithin: {softCommit: true}
autoCommit: {maxDocs: -1, maxTime: 6, openSearcher: false}
autoSoftCommit: {maxDocs: -1, maxTime: 3000}
indexConfig: {maxBufferedDocs: -1, maxMergeDocs: -1, maxIndexingThreads: 8, mergeFactor: -1, ramBufferSizeMB: 100.0, writeLockTimeout: -1, lockType: native}
and <maxWarmingSearchers>2</maxWarmingSearchers>

I don't know how master and slave work. Normally, I created 8 shards and indexed documents using:
http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'
and searched using:
http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

Please help me make searching and indexing fast concurrently. Thanks. Regards, Nitin
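[Editor's note] Erick's advice above (drop commit=true, let autocommit handle commits, and batch documents in groups of about 1,000) can be sketched in Python using only the standard library. The collection URL is the one from the thread; the commitWithin value of 10 seconds and the helper names are illustrative assumptions, not settings anyone in the thread specified.

```python
import json
import urllib.request

# Collection URL taken from the thread's example; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/test_commit_fast/update/json"

def batches(docs, size=1000):
    """Split documents into batches (Erick suggests starting around 1,000)."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_batch(batch, commit_within_ms=10000):
    """POST one batch. Instead of commit=true on every request, ask Solr to
    commit within N ms (10 s here is an assumed value, not from the thread)."""
    req = urllib.request.Request(
        "%s?commitWithin=%d" % (SOLR_UPDATE_URL, commit_within_ms),
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # requires a running Solr instance
```

Batching this way means far fewer HTTP round trips and no forced commit (and searcher reopen) per request, which is the core of Erick's complaint about commit=true.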
Re: Concurrent Indexing and Searching in Solr.
Hi Upayavira, RAM = 28GB, CPU = 4 processors.

On Fri, Aug 7, 2015 at 8:53 PM Upayavira u...@odoko.co.uk wrote:

How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:

Hello everyone, I have indexed 16 million documents in SolrCloud: 4 nodes and 8 shards with a single replica. I am trying to run concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance.

Configuration:
commitWithin: {softCommit: true}
autoCommit: {maxDocs: -1, maxTime: 6, openSearcher: false}
autoSoftCommit: {maxDocs: -1, maxTime: 3000}
indexConfig: {maxBufferedDocs: -1, maxMergeDocs: -1, maxIndexingThreads: 8, mergeFactor: -1, ramBufferSizeMB: 100.0, writeLockTimeout: -1, lockType: native}
and <maxWarmingSearchers>2</maxWarmingSearchers>

I don't know how master and slave work. Normally, I created 8 shards and indexed documents using:
http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'
and searched using:
http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

Please help me make searching and indexing fast concurrently. Thanks. Regards, Nitin
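[Editor's note] Upayavira's rule of thumb (roughly one indexing thread per CPU) can be made concrete: with the 4 CPUs Nitin reports, 100 requested workers would be capped at 4. This is a sketch of the sizing logic only; the helper name is hypothetical, not a Solr setting.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def pick_worker_count(requested, cpu_count=None):
    """Cap indexing workers at roughly one per CPU (Upayavira's rule of thumb)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(requested, cpus))

# On the 4-CPU box from the thread, 100 requested workers become 4.
workers = pick_worker_count(100, cpu_count=4)
executor = ThreadPoolExecutor(max_workers=workers)  # submit index batches here
```

Fewer workers than CPUs means less context switching, which is exactly the degradation Upayavira is warning about.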
Re: Concurrent Indexing and Searching in Solr.
bq: What limits does Solr have on simultaneous indexing and searching? That is, how many simultaneous calls can I make for searching and indexing at once?

None a priori. It all depends on the hardware you're throwing at it. Obviously, indexing with 100 threads is going to eat up a lot of CPU cycles that can't then be devoted to satisfying queries. You need to strike a balance. Do seriously consider using some method other than posting files to Solr via curl or the like; that's rarely a robust solution for production.

As for adding commit=true: this shouldn't affect the index size. I suspect you were misled by something else happening. Really, remove it or you'll beat up your system hugely.

As for the soft commit interval, that's totally irrelevant when you're committing every document. But do lengthen it as much as you can. Most of the time when people say real time, it turns out that 10 seconds is OK. Or 60 seconds is OK. You have to check what the _real_ requirement is; it's often not what's stated.

bq: I am using Solr 5.0. Is 5.0 similar to 5.2 regarding indexing and searching data?

Did you read the link I provided? With replicas, 5.2 will index almost twice as fast. That means (roughly) half the work is being done on the followers, freeing up cycles for performing queries.

Best, Erick

On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi Erick, You said that the soft commit interval should be more than 3000 ms. Actually, I need real-time searching, and that's why I need a fast soft commit. commit=true: I set commit=true because it reduces my indexed data size from 1.5GB to 500MB on *each shard*. When I used commit=false, my indexed data size was 1.5GB; after changing it to commit=true, the size dropped to 500MB. I don't understand how that happens. I am using Solr 5.0. Is 5.0 similar to 5.2 regarding indexing and searching data? What limits does Solr have on simultaneous indexing and searching? That is, how many simultaneous calls can I make for searching and indexing at once?

On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote:

Your soft commit time of 3 seconds is quite aggressive; I'd lengthen it to as long as possible. Ugh, I looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do. Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000), or using the post.jar tool, rather than sending them via a raw URL. I agree with Upayavira: 100 concurrent threads is a _lot_. Also, what version of Solr? There was a 2x speedup in Solr 5.2, see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, using master/slave when talking about SolrCloud is a bit confusing...), which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing; there's just a whole lot of work going on here.

Best, Erick

On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote:

How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:

Hello everyone, I have indexed 16 million documents in SolrCloud: 4 nodes and 8 shards with a single replica. I am trying to run concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance.

Configuration:
commitWithin: {softCommit: true}
autoCommit: {maxDocs: -1, maxTime: 6, openSearcher: false}
autoSoftCommit: {maxDocs: -1, maxTime: 3000}
indexConfig: {maxBufferedDocs: -1, maxMergeDocs: -1, maxIndexingThreads: 8, mergeFactor: -1, ramBufferSizeMB: 100.0, writeLockTimeout: -1, lockType: native}
and <maxWarmingSearchers>2</maxWarmingSearchers>

I don't know how master and slave work. Normally, I created 8 shards and indexed documents using:
http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'
and searched using:
http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string
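[Editor's note] One small hazard in the raw select URL quoted above: a query like q=field_name: search_string contains a colon and a space that must be URL-encoded before being sent over HTTP. A minimal sketch, where field_name and search_string are just the placeholders from the thread and wt=json is added for illustration:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/test_commit_fast/select"
params = {"q": "field_name:search_string", "wt": "json"}
url = "%s?%s" % (base, urlencode(params))
# The colon is encoded as %3A, so the request parses cleanly on the Solr side.
```

Using a dict plus urlencode (or one of the Python Solr clients Shawn mentions) avoids hand-building query strings entirely.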
Re: Concurrent Indexing and Searching in Solr.
How many CPUs do you have? 100 concurrent indexing calls seems like rather a lot. You're gonna end up doing a lot of context switching, hence degraded performance. Dunno what others would say, but I'd aim for approx one indexing thread per CPU. Upayavira

On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:

Hello everyone, I have indexed 16 million documents in SolrCloud: 4 nodes and 8 shards with a single replica. I am trying to run concurrent indexing and searching on those indexed documents: 100 concurrent indexing calls along with 100 concurrent searching calls. It *degrades both searching and indexing* performance.

Configuration:
commitWithin: {softCommit: true}
autoCommit: {maxDocs: -1, maxTime: 6, openSearcher: false}
autoSoftCommit: {maxDocs: -1, maxTime: 3000}
indexConfig: {maxBufferedDocs: -1, maxMergeDocs: -1, maxIndexingThreads: 8, mergeFactor: -1, ramBufferSizeMB: 100.0, writeLockTimeout: -1, lockType: native}
and <maxWarmingSearchers>2</maxWarmingSearchers>

I don't know how master and slave work. Normally, I created 8 shards and indexed documents using:
http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H 'Content-type:application/json' -d '[ JSON_Document ]'
and searched using:
http://localhost:8983/solr/test_commit_fast/select?q=field_name:search_string

Please help me make searching and indexing fast concurrently. Thanks. Regards, Nitin