Lucene vs Solr Indexing on Sample data
Hello everyone,
I posted a question on stackoverflow.com after running a few POCs, but no one has been able to answer it so far. I hope the question is understandable. Question link: http://stackoverflow.com/questions/30823314/lucene-vs-solr-indexning-speed-for-sampe-data
My hardware is an HP laptop with a single Intel Core i3 processor (4 CPUs according to dxdiag), 8 GB RAM, and a TB HDD.
Could anyone help me with the indexing speed of Solr (way slower) versus Lucene (way faster)? I am trying to build a module for real-time indexing and querying, and the traffic is high. The POC with Lucene handles the high indexing traffic; the POC with Solr does not.
Please let me know whether there is a problem with Solr or whether I am doing something wrong.
Thanks,
Argho
Re: Lucene vs Solr Indexing on Sample data
Actually, I can see a problem in your question: Lucene and Solr are not competing technologies. Solr is a search server that uses the Lucene library internally and offers easy-to-use configuration and a REST API. Lucene is a library that implements many search algorithms and features.
You can think of Solr as a best-practice search server built on Lucene. Out of the box it gives you a usable search server with many easy-to-use features (take a look at the official site to get an idea). Lucene, on the other hand, is a library, so you can use it to develop your own search server or search application.
More than performance, you should consider whether you want to rewrite a lot of already implemented search features or reuse the ones developed by Lucene gurus. Of course, it also depends on the features you really need for your application.
Cheers
2015-06-15 13:16 GMT+01:00 Argho Chatterjee joy.chatterjee.crazyc...@gmail.com:
> Hello Everyone, I had posted a question on stackoverflow.com after performing a few POCs [...]
--
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti
Re: Lucene vs Solr Indexing on Sample data
Basically, I expect you're falling afoul of a very common misunderstanding: it's not that Solr is slower, it's that the client isn't feeding Solr as fast as it should.
If you profile your Solr server, my suspicion is that you're not driving it very hard. You'll probably see 4 spikes in CPU activity, each followed by it doing nothing at all. The spikes are when you actually send the doc list to Solr. Your client is creating a 250K-document packet, _then_ transmitting it to Solr, waiting for the response, then creating another packet. While the client is creating a packet, Solr is doing nothing at all, just waiting.
You'll get better performance by using ConcurrentUpdateSolrClient and much smaller packets (say 1,000). Give it, say, 10 threads and a queue length of 10 or so. You'll have to experiment for sure.
Now, all that said, since Solr is wrapping Lucene there's some additional overhead, because Solr has to parse out the doc and pass it on to Lucene, etc., so you'll inevitably see some degradation. It shouldn't be as extreme as you're seeing, though, so I'm pretty sure you'll find your client isn't written to get the best performance out of Solr.
In future, please don't link questions to another forum. It makes it less likely that people will actually respond.
Best,
Erick
On Mon, Jun 15, 2015 at 6:52 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:
> Actually I can see a problem in your question… Lucene and Solr are not competitor technologies. [...]
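The feeding pattern Erick describes (small packets, several sender threads, a short bounded queue) can be sketched in a language-neutral way. The following is only an illustration in Python, not SolrJ code: ConcurrentUpdateSolrClient is a Java class and is not called here, and `send_batch`, `batches`, and `index_concurrently` are hypothetical names introduced for this sketch. The defaults (1,000 documents per packet, 10 threads, queue length 10) are the starting points suggested in the message and would need tuning.

```python
import queue
import threading


def batches(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_concurrently(docs, send_batch, batch_size=1000, threads=10, queue_len=10):
    """Build small batches while worker threads send earlier ones.

    `send_batch` is a hypothetical callable that posts one batch to the
    search server; it stands in for whatever client call you actually use.
    """
    # Bounded queue: the producer blocks when the senders fall behind,
    # so memory use stays limited.
    pending = queue.Queue(maxsize=queue_len)
    errors = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more work for this thread
                return
            try:
                send_batch(batch)
            except Exception as exc:  # remember the failure, keep draining
                errors.append(exc)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for batch in batches(docs, batch_size):
        pending.put(batch)
    for _ in workers:
        pending.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    if errors:
        raise errors[0]
```

The point of the design is that the server always has a packet in flight while the client is building the next one, instead of idling while a single 250K-document packet is assembled.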
Re: Lucene vs Solr design decision
Great answer, Robert.
On Fri, Mar 9, 2012 at 12:06 PM, Robert Stewart bstewart...@gmail.com wrote:
> Split up index into say 100 cores, and then route each search to a specific core by some mod operator on the user id [...]
--
Bill Bell
billnb...@gmail.com
Lucene vs Solr design decision
Hi everybody,
Let's say we have a system with billions of small documents (2-3 fields on average). Each document belongs to JUST ONE user, and searches are user-specific: when we search for something, we only look at that user's documents. We also need newly added documents to be visible as soon as they are added to the index.
I think we have two options:
1. Use Lucene directly and create a separate index for each user.
2. Use Solr and store all of the users' data together in one HUGE index.
The benefit of using Lucene is that each commit() will take less time than it would with Solr.
Is there a suggested solution for cases like this?
Thanks
--
Alireza Salimi
Java EE Developer
Re: Lucene vs Solr design decision
Solr has cores which are independent search indexes. You could create a separate core per user. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene vs Solr design decision
Sorry, I didn't mention that the number of users can be in the millions, which would mean millions of cores! So I'm not sure it's a good idea.
On Fri, Mar 9, 2012 at 1:35 PM, Lan dung@gmail.com wrote:
> Solr has cores which are independent search indexes. You could create a separate core per user.
--
Alireza Salimi
Java EE Developer
Re: Lucene vs Solr design decision
Solr itself imposes no limit on the number of cores; the limit comes from your hardware, inodes, and how many files you can keep open. I think even if you went the Lucene route you would run into the same hardware limits.
--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene vs Solr design decision
Millions of cores will not work...
...yet.
-glen
On Fri, Mar 9, 2012 at 1:46 PM, Lan dung@gmail.com wrote:
> Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you could keep open. [...]
--
- http://zzzoot.blogspot.com/ -
Re: Lucene vs Solr design decision
Probably. Besides that, how could I use the features that SolrCloud provides (i.e. high availability and distribution)?
The other solution would be to use SolrCloud, keep all of the users' data in a single collection, and use NRT (near-real-time) search. On the other hand, the update frequency on that big collection would be high. Do you think that makes sense?
On Fri, Mar 9, 2012 at 2:02 PM, Glen Newton glen.new...@gmail.com wrote:
> millions of cores will not work... ...yet.
--
Alireza Salimi
Java EE Developer
Re: Lucene vs Solr design decision
Split up the index into, say, 100 cores, and then route each search to a specific core by some mod operator on the user id:
    core_number = userid % num_cores
    core_name = "core" + core_number
That way each index core is relatively small (maybe 100 million docs or less).
On Mar 9, 2012, at 2:02 PM, Glen Newton wrote:
> millions of cores will not work... ...yet.
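The mod-based routing in this message can be written out as a small helper. This is a minimal sketch, assuming integer user ids and cores named with a common prefix followed by a number; the function name `core_name_for` and the `prefix` parameter are made up for illustration and are not part of Solr.

```python
def core_name_for(user_id, num_cores=100, prefix="core"):
    """Return the name of the core that holds this user's documents.

    Assumes `user_id` is an integer; `prefix` and the naming scheme are
    illustrative assumptions, not a Solr convention.
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    core_number = user_id % num_cores
    return f"{prefix}{core_number}"


# Example: user 12345 with 100 cores is routed to "core45".
print(core_name_for(12345))
```

The same function has to be used at index time and at query time so that a user's documents and searches always land on the same core. Note that changing `num_cores` later changes the mapping for most users, so their documents would have to be moved or reindexed.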
Re: Lucene vs Solr design decision
This solution makes sense, but I still don't know whether I can use SolrCloud with this configuration.
On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart bstewart...@gmail.com wrote:
> Split up index into say 100 cores, and then route each search to a specific core by some mod operator on the user id [...]
--
Alireza Salimi
Java EE Developer
Re: Lucene vs Solr design decision
On the other hand, I'm aware that if I go with the Lucene approach, failover is something I will have to handle myself, which is a nightmare!
On Fri, Mar 9, 2012 at 2:13 PM, Alireza Salimi alireza.sal...@gmail.com wrote:
> This solution makes sense, but I still don't know if I can use solrCloud with this configuration or not.
--
Alireza Salimi
Java EE Developer
Re: Lucene vs Solr
Is that right?
On Tue, Oct 19, 2010 at 11:08 PM, findbestopensource findbestopensou...@gmail.com wrote:
> Hello all,
> I have posted an article, Lucene vs Solr: http://www.findbestopensource.com/article-detail/lucene-vs-solr
> Please feel free to add your comments.
> Regards
> Aditya
> www.findbestopensource.com