lucene vs Solr Indexing on Sample data

2015-06-15 Thread Argho Chatterjee
Hello Everyone,

I posted a question on stackoverflow.com after running a few POCs.

My hardware is a single laptop with an Intel Core i3 processor (4 CPUs
according to dxdiag) and 8 GB of RAM.

My Question Link :
http://stackoverflow.com/questions/30823314/lucene-vs-solr-indexning-speed-for-sampe-data

No one has been able to solve it so far.
I hope the question I posted is understandable.

Could anyone help me with the indexing speed of Solr (much slower) vs
Lucene (much faster)?

I am trying to build a module for real-time indexing and querying under
high traffic. The POC passed with Lucene, which handled the high indexing
traffic, but Solr was not able to do so.

Again, my machine spec:
HP, Intel Core i3, 8 GB RAM, TB HDD.

Please let me know whether there is a problem with Solr or whether I am
doing something wrong.

Thanks
Argho


Re: lucene vs Solr Indexing on Sample data

2015-06-15 Thread Alessandro Benedetti
Actually, I can see a problem in your question.
Lucene and Solr are not competing technologies.
Solr is a search server that uses the Lucene library internally and offers
easy-to-use configuration and a REST API.
Lucene is a library that implements tons of search algorithms and features.
You can see Solr as a best-practice server implementation built on Lucene.
Out of the box, it offers a usable search server with tons of easy-to-use
features (take a look at the official site to get an idea).

On the other hand, Lucene is a library, so you can use it to develop your
own search server or search application.
More than performance, you should really decide whether you want to rewrite
a lot of already implemented search features or reuse the ones developed by
Lucene gurus.

Furthermore, of course, it depends on the features you really need for your
application.

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: lucene vs Solr Indexing on Sample data

2015-06-15 Thread Erick Erickson
Basically, I expect you're falling afoul of a very common misunderstanding:
it's not that Solr is slower, it's that the client isn't feeding Solr as
fast as it should.

If you profile your Solr server, my suspicion is that you're not driving it
very hard. You'll probably see 4 spikes in CPU activity, followed by it
doing nothing at all. The spikes are when you actually send the doclist to
Solr.

Your client is creating a 250K document packet, _then_ transmitting it to
Solr, waiting for the response, then creating another packet. While the
client is creating a packet, Solr is doing nothing at all, just waiting.

You'll get better performance by using ConcurrentUpdateSolrClient and much
smaller packets (say 1,000 documents). Give it, say, 10 threads and a queue
length of 10 or so. You'll have to experiment for sure.
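
The idea of small packets fed by several threads through a bounded queue
can be sketched as follows. This is not SolrJ code, only an illustration in
Python of the pattern described above. The names `index_concurrently` and
`send_batch` are hypothetical; `send_batch` stands in for whatever actually
transmits a batch to Solr, and the defaults (1,000 documents, 10 threads,
queue length 10) are just the starting points suggested above.

```python
import queue
import threading


def index_concurrently(docs, send_batch, batch_size=1000,
                       num_threads=10, queue_size=10):
    """Build small batches while worker threads send earlier ones.

    `send_batch` is a hypothetical callable that transmits one list of
    documents to the server; it is an assumption of this sketch.
    """
    pending = queue.Queue(maxsize=queue_size)  # bounded: limits memory use
    errors = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more work for this thread
                return
            try:
                send_batch(batch)
            except Exception as exc:  # remember the failure, keep draining
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()

    # The producer keeps creating batches while the workers are sending,
    # so the server is not left idle between packets.
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            pending.put(batch)  # blocks when the queue is full
            batch = []
    if batch:
        pending.put(batch)

    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    if errors:
        raise errors[0]


if __name__ == "__main__":
    sizes = []
    index_concurrently(({"id": i} for i in range(2500)),
                       lambda b: sizes.append(len(b)))
    print(sum(sizes), "documents sent in", len(sizes), "batches")
```

The bounded queue is the design choice that matters here: the producer
blocks when the senders fall behind, so memory stays flat, and the senders
always have a batch ready as long as the producer keeps up.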

Now, all that said, Solr wraps Lucene, so there's some additional overhead:
Solr has to parse out the doc and pass it on to Lucene, etc., so you'll
inevitably see some degradation. It shouldn't be as extreme as you're
seeing, though, so I'm pretty sure you'll find your client isn't written to
get the best performance out of Solr.

In future, please don't link questions to another forum. It makes it less
likely that people will actually respond.

Best,
Erick


Re: Lucene vs Solr design decision

2012-03-10 Thread William Bell
Great answer Robert.

On Fri, Mar 9, 2012 at 12:06 PM, Robert Stewart bstewart...@gmail.com wrote:
 Split the index into, say, 100 cores, and then route each search to a
 specific core by applying a mod operator to the user id:

 core_number = userid % num_cores

 core_name = "core" + str(core_number)

 That way each index core is relatively small (maybe 100 million docs or less).


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Hi everybody,

Let's say we have a system with billions of small documents (2-3 fields on
average). Each document belongs to JUST ONE user, and searches are
user-specific, meaning that when we search for something, we only look at
that user's documents.

On the other hand, we need to see newly added documents as soon as they are
added to the indexes.

Now I think we have two solutions:
1. Use Lucene directly and create a separate index file for each user
2. Use Solr and store all of the users' data together in one HUGE index
file

The benefit of using Lucene is that each commit() will take less time
compared to the Solr case.

Is there any suggested solution for cases like this?

Thanks

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has cores which are independent search indexes. You could create a
separate core per user. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Sorry, I didn't mention that the number of users can be in the millions!
That means millions of cores, so I'm not sure it's a good idea.

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has no limitation on the number of cores. It's limited by your hardware,
inodes and how many files you can keep open.

I think even if you went the Lucene route you would run into the same
hardware limits.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813511.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lucene vs Solr design decision

2012-03-09 Thread Glen Newton
millions of cores will not work...
...yet.

-glen

-- 
-
http://zzzoot.blogspot.com/
-


Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Probably. Besides that, how could I use the features that SolrCloud
provides (i.e. high availability and distribution)?

The other solution would be to use SolrCloud, keep all of the users'
information in a single collection, and use NRT. On the other hand,
the frequency of updates on that big collection will be high.

Do you think it makes sense?
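
With a single shared collection, user-specific searches are typically
restricted with a Solr filter query (the `fq` parameter), which narrows the
result set without affecting scoring. A minimal sketch in Python of building
such a request URL follows. The field name `user_id`, the collection name,
and the helper `user_search_url` are assumptions made for illustration and
do not come from this thread.

```python
from urllib.parse import urlencode


def user_search_url(base_url, collection, user_id, query):
    """Build a select URL that only matches one user's documents.

    Assumes every document carries a `user_id` field (hypothetical name)
    and that `user_id` is an integer, so no query escaping is needed.
    """
    params = [
        ("q", query),                 # what the user is searching for
        ("fq", f"user_id:{user_id}"),  # restrict to this user's documents
        ("wt", "json"),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(user_search_url("http://localhost:8983/solr", "users",
                          42, "title:lucene"))
```

If user ids were strings rather than integers, the value placed in `fq`
would need to be escaped or quoted before being sent.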

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Robert Stewart
Split the index into, say, 100 cores, and then route each search to a
specific core by applying a mod operator to the user id:

core_number = userid % num_cores

core_name = "core" + str(core_number)

That way each index core is relatively small (maybe 100 million docs or less).
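
The routing rule above can be sketched in Python as follows. The function
name `core_for_user` and the CRC32 fallback for non-numeric user ids are
assumptions added for illustration; the thread itself only specifies the
modulo on a numeric user id.

```python
import zlib


def core_for_user(user_id, num_cores=100, prefix="core"):
    """Map a user id to a core name with userid % num_cores."""
    if isinstance(user_id, int):
        key = user_id
    else:
        # Assumption: non-numeric ids are hashed with CRC32, which is
        # stable across processes (unlike Python's built-in hash()).
        key = zlib.crc32(str(user_id).encode("utf-8"))
    return f"{prefix}{key % num_cores}"


if __name__ == "__main__":
    for uid in (7, 12345, "alice"):
        print(uid, "->", core_for_user(uid))
```

The same function has to be used for both indexing and searching so that a
user's documents and queries land on the same core. Changing `num_cores`
later remaps most users, so the core count is best fixed up front.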


Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
This solution makes sense, but I still don't know whether I can use
SolrCloud with this configuration.

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
On the other hand, I'm aware that if I go with the Lucene approach,
failover is something I will have to support manually, which is a
nightmare!

-- 
Alireza Salimi
Java EE Developer


Re: Lucene vs Solr

2010-10-20 Thread Pradeep Singh
Is that right?

On Tue, Oct 19, 2010 at 11:08 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Hello all,

 I have posted an article Lucene vs Solr
 http://www.findbestopensource.com/article-detail/lucene-vs-solr

 Please feel free to add your comments.

 Regards
 Aditya
 www.findbestopensource.com