Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
Edwin, The heap was not being used much; only 1GB of heap was being used out of 8GB, so I do have room to allocate more to the heap. I was reading in some Solr performance blogs that it is better not to use a large heap size, and instead to leave a lot of memory to the operating system

Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
I am using version 6.3 of Solr.

Re: unable to get more throughput with more threads

2017-03-23 Thread Erick Erickson
I'd check my I/O. Since you're firing the same query, I expect that you aren't I/O bound at all, since, as you say, the docs should already be in memory. This assumes that your document cache size is > 0. You can check this. Go to the admin UI, select one of your cores (not collection) and go to
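As a sketch of the check Erick describes, the same cache statistics can also be pulled over HTTP instead of through the admin UI. The host, port, and core name below are illustrative for a default local install:

```
# Query the cache statistics (including documentCache) for one core.
# A non-zero size and a hit ratio near 1.0 confirm the repeated query
# is being served from memory rather than disk.
curl "http://localhost:8983/solr/techproducts/admin/mbeans?cat=CACHE&stats=true&wt=json"
```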

Re: unable to get more throughput with more threads

2017-03-23 Thread Aman Deep Singh
You can play with the merge factor in the index config. If there are no frequent updates, then make it 2; it will give you high throughput and less latency.
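For reference, a minimal solrconfig.xml sketch of the setting Aman suggests. In Solr 6.x the old mergeFactor is expressed through the TieredMergePolicy knobs; the values here are illustrative, not a tested recommendation:

```xml
<!-- solrconfig.xml: favor fewer, larger segments for a read-heavy,
     rarely-updated index (fewer segments -> faster searches, costlier merges) -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">2</int>
    <int name="segmentsPerTier">2</int>
  </mergePolicyFactory>
</indexConfig>
```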

Re: unable to get more throughput with more threads

2017-03-23 Thread Zheng Lin Edwin Yeo
I also did find that beyond 10 threads for an 8GB heap size, there isn't much improvement in performance. But you can increase your heap size a little if your system allows it. By the way, which Solr version are you using? Regards, Edwin

solr lost connection to zookeeper

2017-03-23 Thread Xie, Sean
When Solr loses the connection to ZooKeeper, is there any way to have Solr reconnect to it after ZooKeeper is back online? Or must Solr be restarted to re-initiate the connection? Thanks, Sean

Re: to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Derek Poh
Erick, Generally the products have a contracted date, but they could be extended and also expire prematurely. We will need additional processing to cater for these scenarios and update the 'expiry date' fields accordingly. Will go through the documentation again and see if it can fit our use

Re: unable to get more throughput with more threads

2017-03-23 Thread Matt Magnusson
Out of curiosity, what is your index size? I'm trying to do something similar with maximizing output. I'm currently looking at streaming expressions, for which I'm seeing some interesting results. I'm also finding that the direct mass-query route seems to hit a wall for performance. I'm also finding

Re: to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Derek Poh
Hi Emir, Thank you for pointing out that deleted docs will still exist in the index till it is optimized, and that they will skew statistics. We do sort by score. These new collections are part of a new business initiative and we do not know yet what their size will be like. Will go ponder on your inputs. Thank

unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
Hi, I am new to the Solr search engine and I am trying to get some performance numbers to find the maximum throughput of a Solr cluster of a given size. I am currently doing only query load testing, in which I randomly fire a bunch of queries at the Solr cluster to generate the query load.

Re: Newbie in Solr

2017-03-23 Thread Alexandre Rafalovitch
Glad to hear you liked my site. You can find the truly minimal (non-production) example at https://github.com/arafalov/simplest-solr-config . It is not that scary. If you are looking at the database import, you may also want to review my work in progress on simplifying DIH DB example at:

Newbie in Solr

2017-03-23 Thread Ercan Karadeniz
Hi All, I'm a newbie in Solr. I have the task of replacing the built-in search functionality of an online shop system (xtcmodified commerce, a German online shop system => https://www.modified-shop.org/) with Solr. modified eCommerce Shopsoftware - kostenloser OpenSource

[ANNOUNCE] Apache Gora 0.7 Release

2017-03-23 Thread lewis john mcgibbney
Hi Folks, The Apache Gora team are pleased to announce the immediate availability of Apache Gora 0.7. The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores and RDBMSs, and

Re: Architecture suggestions

2017-03-23 Thread David Hastings
Yeah, coming up with a "perfect" machine for your use is completely trial and error. For me personally, I found that one machine with 24 cores and 148GB RAM handles one Solr instance with 4 cores: 16M records sitting at 400GB, 53M records sitting at 160GB, and 108M records sitting at

Re: Architecture suggestions

2017-03-23 Thread Erick Erickson
I've seen single nodes handle 10M docs using 64G of heap (using Zing). I've seen 300M in 12G of memory. There's absolutely no way to tell. See: https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ for a methodology to answer the question with

Re: to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Erick Erickson
Have you considered using TTL (Time To Live)? You have to know at index time when the doc will expire. If you do, Solr will delete the doc for you when its life is over. See: https://lucidworks.com/2014/05/07/document-expiration/ Also the Ref guide:
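The Lucidworks post describes DocExpirationUpdateProcessorFactory; a minimal solrconfig.xml sketch is below. The field names and the deletion period are illustrative, not prescribed:

```xml
<!-- solrconfig.xml: delete documents automatically once their TTL passes -->
<updateRequestProcessorChain default="true">
  <processor class="solr.DocExpirationUpdateProcessorFactory">
    <!-- how often the background deletion trigger runs (here: daily) -->
    <int name="autoDeletePeriodSeconds">86400</int>
    <!-- per-document TTL supplied at index time, e.g. ttl_s = +30DAYS -->
    <str name="ttlFieldName">ttl_s</str>
    <!-- computed absolute expiration timestamp stored on each doc -->
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```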

Re: Regex Phrases

2017-03-23 Thread Mark Johnson
So I managed to get the tokenizing to work with both PatternTokenizerFactory and WordDelimiterFilterFactory (used in combination with WhitespaceTokenizerFactory). For PT I used a regex that matches the various permutations of the phrases, and for WDF/WT I used protected words with every
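A schema sketch of the PatternTokenizerFactory route Mark mentions; the field type name and the regex are placeholders, not his actual pattern:

```xml
<!-- managed-schema: emit each full regex match as one token (group="0")
     so multi-part phrases survive as single terms -->
<fieldType name="text_phrase_regex" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[A-Za-z]+(?:-[A-Za-z0-9]+)*" group="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```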

Re: to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Emir Arnautovic
Hi Derek, There are pros and cons for both approaches: 1. If you are doing full reindexing, the PRO is that you have a clean index all the time, and if something goes wrong you simply don't switch the alias to the updated index, so your users will not notice issues. The CON is that you are doing full

Re: Architecture suggestions

2017-03-23 Thread Emir Arnautovic
Hi Vrindavda, It is hard to tell anything without testing and without details on what/how is indexed, how it is going to be queried, and what the latency/throughput requirements are. 25M or 12.5M documents per shard might be too much if you have strict latency requirements, but testing is the only way

Re: Regex Phrases

2017-03-23 Thread Joel Bernstein
You can also check out https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer . Joel Bernstein http://joelsolr.blogspot.com/

Architecture suggestions

2017-03-23 Thread vrindavda
Hello, My production index is expected to contain 50 million documents, with around 1 million added every year. Should I go for 64GB RAM (4 shards / 4 replicas) or 128GB RAM (2 shards / 2 replicas)? Please correct me if the above assumptions are wrong. What parameters should I consider?

to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Derek Poh
Hi, I have collections of products and I am indexing 3-4 times daily. Every day there are products that expire, and I need to remove them from these collections daily. I can think of 2 ways to do this. 1. Using a collection alias to switch between a main and a temp collection - clear and index
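Both options can be sketched with standard Solr HTTP calls against a running cluster; the collection names, alias name, and the expiry field below are illustrative:

```
# Option 1: repoint the 'products' alias at the freshly rebuilt collection.
# Searches against 'products' switch atomically; the old collection can
# then be cleared and reused for the next rebuild.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20170324"

# Option 2: remove expired documents in place with a delete-by-query.
curl "http://localhost:8983/solr/products/update?commit=true" \
     -H "Content-Type: text/xml" \
     --data-binary "<delete><query>expiry_date:[* TO NOW]</query></delete>"
```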

Re: block join - search together at parent and childern

2017-03-23 Thread Jan Nekuda
Hi Mikhail, thank you very much - it's exactly what I need. When I first tried it I had a problem with spaces and it seemed that it didn't work, but now it works great. Thanks and have a nice day, Jan