Re: Solr architecture

2016-02-12 Thread Mark Robinson
Thanks all for your suggestions! Rgds, Mark. On Thu, Feb 11, 2016 at 9:45 AM, Upayavira wrote: > Your biggest issue here is likely to be http connections. Making an HTTP > connection to Solr is way more expensive than the task of adding a single > document to the index. If you are expecting to a

Re: Solr architecture

2016-02-11 Thread Upayavira
Your biggest issue here is likely to be HTTP connections. Making an HTTP connection to Solr is way more expensive than the task of adding a single document to the index. If you are expecting to add 24 billion docs per day, I'd suggest somehow merging those documents into batches before sending
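A minimal sketch of the batching idea, assuming documents arrive as an iterable stream: it groups them into fixed-size batches and counts how many update requests a day's volume would need. The batch size of 1,000 is an illustrative assumption, and the actual send step (an HTTP or SolrJ client call) is deliberately left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

SECONDS_PER_DAY = 86_400


def batches(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` documents from an incoming stream."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def requests_per_day(docs_per_day: int, batch_size: int) -> int:
    """Update requests needed when each request carries `batch_size` docs."""
    return -(-docs_per_day // batch_size)  # integer ceiling division


if __name__ == "__main__":
    docs_per_day = 24_000_000_000
    print(f"average rate: {docs_per_day / SECONDS_PER_DAY:,.0f} docs/s")
    for size in (1, 1_000):
        print(f"batch size {size}: {requests_per_day(docs_per_day, size):,} requests/day")
```

At 24 billion docs per day the average is roughly 277,778 docs/s; batches of 1,000 cut the request count from 24 billion to 24 million per day, before even considering peaks.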

Re: Solr architecture

2016-02-11 Thread Emir Arnautovic
Hi Mark, Nothing comes for free :) With one doc per action, you will have to handle a large number of docs. There is a hard limit on the number of docs per shard: roughly 2 billion (the maximum value of a signed int), so sharding is mandatory. You will most likely need more than one collection. Depending on
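Lucene addresses documents with Java ints, so one index (one shard) holds at most about 2^31 documents; Lucene's own cap sits slightly below that. A minimal sketch that uses 2^31 - 1 as the upper bound and computes the minimum shard count. The 24-billion figure comes from this thread; the 200M-per-shard target is a hypothetical, more practical setting.

```python
# Upper bound for documents in one Lucene index (shard). Lucene's real cap
# is slightly lower than this, so treat it as a ceiling, never a target.
MAX_DOCS_PER_SHARD = 2**31 - 1


def min_shards(total_docs: int, docs_per_shard: int = MAX_DOCS_PER_SHARD) -> int:
    """Smallest shard count that keeps every shard at or below docs_per_shard."""
    if docs_per_shard < 1:
        raise ValueError("docs_per_shard must be positive")
    return -(-total_docs // docs_per_shard)  # integer ceiling division


if __name__ == "__main__":
    one_day = 24_000_000_000
    print(min_shards(one_day))               # packed to the hard limit
    print(min_shards(one_day, 200_000_000))  # hypothetical 200M docs per shard
```

One day at 24 billion docs already needs at least 12 shards at the hard limit (120 at 200M per shard), and each additional day of retention multiplies that.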

Re: Solr architecture

2016-02-10 Thread Mark Robinson
Thanks everyone for your suggestions. Based on them, I am planning to have one doc per event, with a common sessionId. In this case, would indexing each doc as it arrives be okay? Or do we still need to batch docs before indexing them in Solr? Also with 4M sessions a day with about 6000 docs (events

Re: Solr architecture

2016-02-10 Thread Mark Robinson
Thanks everyone for your suggestions. Based on them, I am planning to have a doc per event. On Wed, Feb 10, 2016 at 3:38 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Mark, > Appending session actions just to be able to return more than one session > without retrieving large numb

Re: Solr architecture

2016-02-10 Thread Emir Arnautovic
Hi Mark, Appending session actions just to be able to return more than one session without retrieving a large number of results is not a good tradeoff. Like Upayavira suggested, you should consider storing one action per doc and aggregating at read time, or pushing to Solr once the session ends and aggregat
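One way to aggregate at read time is Solr's result grouping (`group`, `group.field`, `group.limit`, `group.sort`; with grouping on, `rows` counts groups). A sketch that only builds the query string, plus a client-side fallback that folds flat event docs into sessions. The field names `sessionId` and `timestamp` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlencode


def grouped_session_query(q: str, sessions: int, events_per_session: int) -> str:
    """Query string asking Solr to return matching events grouped by session."""
    params = {
        "q": q,
        "group": "true",
        "group.field": "sessionId",         # hypothetical session ID field
        "group.limit": events_per_session,  # events returned inside each group
        "group.sort": "timestamp asc",      # hypothetical event time field
        "rows": sessions,                   # with grouping, rows = number of groups
        "wt": "json",
    }
    return urlencode(params)


def aggregate_by_session(events: List[dict]) -> Dict[str, List[dict]]:
    """Client-side alternative: collect flat event docs into per-session lists."""
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        sessions[event["sessionId"]].append(event)
    return dict(sessions)
```

Grouping keeps one doc per event on the write side, which suits a write-heavy load, and pays the aggregation cost only for the sessions a query actually touches.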

Re: Solr architecture

2016-02-09 Thread Mark Robinson
Thanks for your replies and suggestions! Why do I store all events related to a session under one doc? Each session can have about 500 total entries (events) corresponding to it. So when I try to retrieve a session's info, it can come back with around 500 records. If it is this compounded one doc per sessi

Re: Solr architecture

2016-02-09 Thread Daniel Collins
So as I understand your use case, it's effectively logging actions within a user session. Why do you have to do the update in NRT? Why not just log all the user session events (with some unique key, and ensuring the session ID is in the document somewhere), then when you want to do the query, you j
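A minimal sketch of the "log every event" approach: each event becomes its own document with a unique key and the session ID stored alongside, and a session is fetched with a filter query. The field names and the `sessionId-sequence` key scheme are hypothetical.

```python
from typing import Any, Dict


def event_doc(session_id: str, seq: int, action: str, timestamp: str) -> Dict[str, Any]:
    """One document per event; `id` is unique, `sessionId` ties events together."""
    return {
        "id": f"{session_id}-{seq}",  # hypothetical key scheme: session ID + sequence
        "sessionId": session_id,
        "action": action,
        "timestamp": timestamp,
    }


def session_filter(session_id: str) -> Dict[str, str]:
    """Query parameters that fetch every event of one session, oldest first."""
    # Escape backslashes and quotes so the value is safe inside a quoted term.
    escaped = session_id.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": "*:*", "fq": f'sessionId:"{escaped}"', "sort": "timestamp asc"}
```

Since events are only appended, they can be buffered and sent in bulk on a schedule instead of being updated in NRT.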

Re: Solr architecture

2016-02-09 Thread Upayavira
Bear in mind that Lucene is optimised for high read, lower write. That is, it puts in a lot of effort at write time to make reading efficient. It sounds like you are going to be doing far more writing than reading, and I wonder whether you are necessarily choosing the right tool for the job. Ho

Re: Solr architecture

2016-02-09 Thread Mark Robinson
Hi, Thanks for all your suggestions. I took some time to gather more accurate details. Here is what I have gathered: My data being indexed is something like this. I am basically capturing all data related to a user session. Inside a session I have categorized my actions like actionA, a

Re: Solr architecture

2016-02-08 Thread Jack Krupansky
Oops... at 100 qps for a single node you would need 120 nodes to get to 12K qps and 800 nodes to get 80K qps, but that is just an extremely rough ballpark estimate, not some precise and firm number. And that's if all the queries can be evenly distributed throughout the cluster and don't require fan
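Jack's node counts come from a simple ceiling division, assuming the 100 qps per node he uses as a rough figure and perfectly even load with no fan-out. A sketch of that arithmetic:

```python
def nodes_needed(target_qps: int, qps_per_node: int) -> int:
    """Ceiling of target / per-node throughput; assumes even load, no fan-out."""
    if qps_per_node < 1:
        raise ValueError("qps_per_node must be positive")
    return -(-target_qps // qps_per_node)  # integer ceiling division


if __name__ == "__main__":
    for label, qps in (("off-peak", 12_000), ("peak", 80_000)):
        print(label, nodes_needed(qps, 100))
```

Any query that must fan out across shards consumes capacity on several nodes at once, so real counts would be higher than this lower bound.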

Re: Solr architecture

2016-02-08 Thread Jack Krupansky
So is there any aging or TTL (in database terminology) of older docs? And do all of your queries need to query all of the older documents all of the time or is there a clear hierarchy of querying for aged documents, like past 24-hours vs. past week vs. past year vs. older than a year? Sure, you ca
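A minimal sketch of one way to tier queries by age, assuming one collection per day with a hypothetical `events_YYYYMMDD` naming scheme: a "past 24 hours" query touches one collection, a "past week" query seven. SolrCloud accepts a comma-separated list of collections in the `collection` request parameter, and aging out old data becomes dropping whole collections.

```python
from datetime import date, timedelta
from typing import List


def daily_collections(end: date, days: int, prefix: str = "events_") -> List[str]:
    """Names of the daily collections covering `days` days ending on `end` (inclusive)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = end - timedelta(days=days - 1)
    return [f"{prefix}{(start + timedelta(days=i)):%Y%m%d}" for i in range(days)]


if __name__ == "__main__":
    # Value for the `collection` parameter of a "past week" query.
    print(",".join(daily_collections(date(2016, 2, 8), 7)))
```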

Re: Solr architecture

2016-02-08 Thread Erick Erickson
Short form: You really have to prototype. Here's the long form: https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ I've seen between 20M and 200M docs fit on a single piece of hardware, so you'll absolutely have to shard. And the other th
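Erick's observed range of 20M to 200M docs per machine brackets the hardware count. A sketch applying it to the 2 billion docs per day from the original question; the result is only a starting range for prototyping.

```python
from typing import Tuple


def machine_range(total_docs: int, low: int = 20_000_000,
                  high: int = 200_000_000) -> Tuple[int, int]:
    """(fewest, most) machines if each one holds between `low` and `high` docs."""
    fewest = -(-total_docs // high)  # every machine holds the maximum
    most = -(-total_docs // low)     # every machine holds the minimum
    return fewest, most


if __name__ == "__main__":
    print(machine_range(2_000_000_000))  # one day of data
```

A tenfold spread (10 to 100 machines for a single day) is the reason prototyping is the only way to narrow it.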

Re: Solr architecture

2016-02-08 Thread Emir Arnautovic
Hi Mark, Can you give us a bit more detail: size of docs, query types, are docs grouped somehow, are they time sensitive, will they be updated or is the index rebuilt every time, etc. Thanks, Emir On 08.02.2016 16:56, Mark Robinson wrote: Hi, We have a requirement where we would need to index around 2 B

Re: Solr architecture

2016-02-08 Thread Susheel Kumar
Also, whether you expect to index the 2 billion docs in NRT or offline (during off hours etc.). For more accurate sizing you may also want to index, say, 10 million documents, which would give you an idea of your index size, and then extrapolate from that to come up with memory r
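A minimal sketch of the extrapolation step, assuming index size grows roughly linearly with document count (real indexes may not scale perfectly linearly). The 5 GiB sample size is a hypothetical measurement, not a figure from the thread.

```python
GIB = 1024 ** 3


def extrapolate_index_bytes(sample_bytes: int, sample_docs: int, target_docs: int) -> int:
    """Linear projection of index size from a sample index of `sample_docs` docs."""
    if sample_docs < 1:
        raise ValueError("sample_docs must be positive")
    return sample_bytes * target_docs // sample_docs


if __name__ == "__main__":
    measured = 5 * GIB  # hypothetical on-disk size of a 10M-doc test index
    projected = extrapolate_index_bytes(measured, 10_000_000, 2_000_000_000)
    print(f"{projected / GIB:,.0f} GiB")
```

Under that assumption, 2 billion docs would take 200 times the sample's disk size; memory needs still have to be measured under a realistic query load.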

Solr architecture

2016-02-08 Thread Mark Robinson
Hi, We have a requirement where we would need to index around 2 billion docs a day. The queries against this indexed data set can be around 80K queries per second during peak time and around 12K queries per second during non-peak hours. Can Solr handle such huge volumes? If so, assuming we ha
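A quick sketch that turns the stated daily volume into an average indexing rate, assuming the docs were spread evenly across the day. Real traffic is bursty, so the peak rate will be higher.

```python
SECONDS_PER_DAY = 86_400


def average_rate(events_per_day: int) -> float:
    """Average events per second if the daily volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"indexing: {average_rate(2_000_000_000):,.0f} docs/s on average")
    print(f"peak vs off-peak queries: {80_000 / 12_000:.1f}x")
```

That works out to roughly 23,000 docs/s sustained, arriving while the cluster also serves 12K to 80K queries per second.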

SOLR architecture recommendation

2011-09-27 Thread Robert Stewart
I need some recommendations for a new SOLR project. We currently have a large (200M docs) production system using Lucene.Net and what I would call our own .NET implementation of SOLR (built early on when SOLR was less mature and did not run as well on Windows). Our current architecture works

Re: Solr architecture diagram

2011-04-10 Thread Lance Norskog
Very cool! "The Life Cycle of the IndexSearcher" would also be a great diagram. The whole dance that happens during a commit is hard to explain. Also, it would help show why garbage collection can act up around commits. Lance On Sun, Apr 10, 2011 at 2:05 AM, Jan Høydahl wrote: >> Looks really go

Re: Solr architecture diagram

2011-04-10 Thread Jan Høydahl
> Looks really good, but two bits that i think might confuse people are > the implications that a "Query Parser" then invokes a series of search > components; and that "analysis" (and the pieces of an analyzer chain) > are what do lookups in the underlying lucene index. > > the first might just

Re: Solr architecture diagram

2011-04-07 Thread Chris Hostetter
: of the components as well as the flow of data and queries. The result is : a conceptual architecture diagram, clearly showing how Solr relates to : the app-server, how cores relate to a Solr instance, how documents enter : through an UpdateRequestHandler, through an UpdateChain and Analysis a

Re: Solr architecture diagram

2011-04-07 Thread David MARTIN
Hi, Thank you for this contribution. Such a diagram could be useful in the official documentation. David On Thu, Apr 7, 2011 at 12:15 PM, Jeffrey Chang wrote: > This is awesome; thank you! > > On Thu, Apr 7, 2011 at 6:09 PM, Jan Høydahl wrote: > > > Hi, > > > > Glad you liked it. You'd like t

Re: Solr architecture diagram

2011-04-07 Thread Jeffrey Chang
This is awesome; thank you! On Thu, Apr 7, 2011 at 6:09 PM, Jan Høydahl wrote: > Hi, > > Glad you liked it. You'd like to see the inner architecture of SolrJ modelled as > well, would you? Perhaps that should be a separate diagram. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.comin

Re: Solr architecture diagram

2011-04-07 Thread Jan Høydahl
Hi, Glad you liked it. You'd like to see the inner architecture of SolrJ modelled as well, would you? Perhaps that should be a separate diagram. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 6. apr. 2011, at 12.06, Stevo Slavić wrote: > Nice, thank you! > > Wish there wa

Re: Solr architecture diagram

2011-04-06 Thread Stevo Slavić
Nice, thank you! I wish there were something similar to this one, or an extension of it, depicting where SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in. Regards, Stevo. On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl wrote: > Hi, > > At Cominvent we've often had the need to visualize the internal ar

Solr architecture diagram

2011-04-06 Thread Jan Høydahl
Hi, At Cominvent we've often had the need to visualize the internal architecture of Apache Solr in order to explain both the relationships of the components as well as the flow of data and queries. The result is a conceptual architecture diagram, clearly showing how Solr relates to the app-serv

Re: Solr Architecture discussion

2010-06-14 Thread Chris Hostetter
: B- A backup of the current index would be created : C- Re-Indexing will happen on Master-core2 : D- When Indexing is done, we'll trigger a swap between Master-core1 and : core2 ... : But how can B,C, and D. I'll do it manually. Wait! I'm not sure my boss will : pay for that. : 1/Can I
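Steps B and D of this plan map onto two HTTP calls that a scheduler could issue instead of doing them by hand: the ReplicationHandler's `command=backup` and the CoreAdmin `action=SWAP` (which exchanges the names of two cores). A sketch that only builds the URLs; the host, port and core names are hypothetical, and it assumes the ReplicationHandler is registered at `/replication`.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host and port


def backup_url(core: str) -> str:
    """ReplicationHandler backup command for one core (step B)."""
    return f"{SOLR_BASE}/{core}/replication?" + urlencode({"command": "backup"})


def swap_url(live_core: str, rebuilt_core: str) -> str:
    """CoreAdmin SWAP: the cores exchange names, so queries hit the rebuilt index (step D)."""
    return f"{SOLR_BASE}/admin/cores?" + urlencode(
        {"action": "SWAP", "core": live_core, "other": rebuilt_core}
    )


if __name__ == "__main__":
    print(backup_url("core1"))
    # ...reindex into core2 (step C), then:
    print(swap_url("core1", "core2"))
```

Issuing them is one `urllib.request.urlopen` call each; whether the swap is needed at all is questioned in the 2010-05-26 reply on this thread.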

Re: Solr Architecture discussion

2010-06-01 Thread rabahb
Does that sound good to you? Or is there a better and more elegant way to do the trick when indexing and replication need to run at a high pace? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860942.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Architecture discussion

2010-06-01 Thread rabahb
optimization only when the replication activity is not so crucial, in order to avoid degrading search performance. Thank you very much. That helps a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860767.html Sent from the Solr

Re: Solr Architecture discussion

2010-05-26 Thread Chris Hostetter
: 4- trigger swap between core 1 and core2 : 5- At this point Slave index has been renewed ... we can revert back to the : previous index if there was any issues with the new one. these steps are largely unnecessary -- within a single SolrCore Solr already keeps track of the "current" searcher

Re: Solr Architecture discussion

2010-05-19 Thread rabahb
Do you have any insights that could help me and other people that might be interested in that discussion? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p828658.html Sent from the Solr - User mailing list archive at Nabble.com.

Solr Architecture discussion

2010-05-18 Thread rabahb
for sharing. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p825708.html Sent from the Solr - User mailing list archive at Nabble.com.

Proposed Solr architecture - does this make sense?

2008-07-01 Thread Todd Breiholz
Hi all. New to Solr/Lucene. Our current search is done with Verity and we are looking to move towards open-source products. Our first application would have fewer than 500,000 documents indexed at the outset. Additions/updates to the index would occur at 2,000-3,000 per minute. We are currently upd