Re: How to do a Data sharding for data in a database table

2015-07-02 Thread wwang525
Hi, I worked with other search solutions before, and cache management is important in boosting performance. Apart from the cache generated due to user's requests, loading the search index into memory is the very initial step after the index is built. This is to ensure search results to be

Re: How to do a Data sharding for data in a database table

2015-07-02 Thread Erick Erickson
bq: Does Solr automatically loads search index into memory after the index is built? No. That's what the autowarm counts on your queryResultCache and filterCache are intended to facilitate. Also after every commit, a newSearcher event is fired and any warmup queries you have configured in the

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi, I am currently investigating the queries with a much smaller index size (1M) to see the effect of grouping and faceting on the performance degradation. This will allow me to do a lot of tests in a short period of time. However, it looks like the query is executed much faster the second time. This is tested

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
I'd set filterCache and queryResultCache to zero (size and autowarm count). Leave documentCache alone IMO, as it's used to store documents on disk as they pass through various query components and doesn't autowarm anyway. I'd think taking it out would skew your results because of multiple

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All, I did many tests with very consistent test results. Each query was executed after re-indexing, and only one request was sent to query the index. I disabled filterCache and queryResultCache for this test based on Erick's recommendation. The test document was posted to this email list

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread Erick Erickson
bq: The index size is only 1 M records. A 10 times of the record size ( 10M) will likely bring the total response time to 1 second This is an extrapolation you simply cannot make. Plus you cannot really tell anything from just a few queries about system performance. In fact you must disregard
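For anyone following along, here is a minimal Python sketch of the point above about not judging from a handful of queries: collect many response times and look at the distribution rather than a single number. The function name `summarize_latencies` and the sample timings are hypothetical and are not taken from the thread; the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics


def summarize_latencies(latencies_ms):
    """Summarize a list of query response times given in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile: the smallest value with at least
    # 95% of the measurements at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical timings; one slow outlier dominates the tail.
    print(summarize_latencies([10, 12, 11, 80, 13]))
```

With only a few samples the tail values are dominated by whichever query happened to be slow, which is one reason a small test run says little about behavior under load.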

Re: How to do a Data sharding for data in a database table

2015-06-27 Thread Erick Erickson
Hmmm, indeed it does. Never mind ;) I guess the thing I'd be looking at is garbage collection, here's a very good writeup: http://lucidworks.com/blog/garbage-collection-bootcamp-1-0/ Kind of a shot in the dark, but it's possible. Good luck! Erick On Thu, Jun 25, 2015 at 3:26 PM, Wenbin Wang

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
bq: Try not to store fields as much as possible. Why? Storing fields certainly adds lots of size to the _disk_ files, but has much less effect on memory requirements than one might think. The *.fdt and *.fdx files in your index are used for the stored data, and they're only read for the top N

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread wwang525
schema.xml http://lucene.472066.n3.nabble.com/file/n4213864/schema.xml solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4213864/solrconfig.xml

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Erick, The configuration is largely the default one, and I have not made much change. I am also quite new to Solr although I have a lot of experience in other search products. The whole list of fields needs to be retrieved, so I do not have much of a choice. The total size of the index files

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
You're missing the point. One of the things that can really affect response time is too-frequent commits. The fact that the commit configurations have been commented out indicates that the commits are happening either manually (curl, HTTP request or the like) _or_ you have, say, a SolrJ client that

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Shawn Heisey
On 6/25/2015 10:27 AM, Wenbin Wang wrote: To clarify the work: We are very early in the investigative phase, and the indexing is NOT done continuously. I indexed the data once through Admin UI, and tested the query. If I need to index again, I can use curl or through the Admin UI. The Solr

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
To clarify the work: We are very early in the investigative phase, and the indexing is NOT done continuously. I indexed the data once through Admin UI, and tested the query. If I need to index again, I can use curl or through the Admin UI. The Solr 4.7 seems to have a default setting of

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Guys, I have no problem changing it to 2. However, we are talking about two different applications. The Solr 4.7 has two applications: example and example-DIH. The application example-DIH is the one I started with since it works with database. The example-DIH has the default setting to 4.

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread William Bell
1GB is too small to start. Try starting the same on both: -Xms8196m -Xmx8196m We use 12GB for these on a similar sized index and it works well. Send schema.xml and solrconfig.xml. Try not to store fields as much as possible. On Wed, Jun 24, 2015 at 8:08 AM, wwang525 wwang...@gmail.com wrote:

Re: How to do a Data sharding for data in a database table

2015-06-24 Thread wwang525
Hi All, I built the Solr index with 14 M records. I have 20 G RAM in my local machine, and the Solr instance was started with -Xms1024m -Xmx8196m The following query: http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5 OR 6 OR 7 OR
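As a hedged illustration of how a select request like this one can be assembled so that each `q` and `fq` parameter is properly separated and URL-encoded, here is a small Python sketch using only the standard library. The helper name `build_select_url` is hypothetical. Only the two filters that appear complete in the message are used; the Duration filter is cut off in the archive, so it is left out rather than guessed.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters):
    """Build a Solr select URL with one fq parameter per filter."""
    params = [("q", query)]
    # Repeating fq lets Solr treat each filter separately
    # instead of as one combined clause.
    params.extend(("fq", f) for f in filters)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_select_url(
        "http://localhost:8983/solr/db-mssql/select",
        "*:*",
        ["GatewayCode:(YYZ)", "DestCode:(CUN)"],
    )
    print(url)
```

Building the parameter list in code also makes it easy to add or drop one filter at a time when comparing response times.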

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Carlos Maroto
As stated previously, using Field Collapsing (group parameters) tends to significantly slow down queries. In my experience, search response gets even worse when: - Requesting facets, which more often than not I do in my query formulation - Asking for the facet counts to be on the groups via the

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Hi Wenbin, To me, your instance appears well provisioned. Likewise, your analysis of test vs. production performance makes a lot of sense. Perhaps your time would be well spent tuning the query performance for your app before resorting to sharding? To that end, what do you see when you

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Grouping does tend to be expensive. Our regular queries typically return in 10-15ms while the grouping queries take 60-80ms in a test environment ( 1M docs). This is ok for us, since we wrote our app to take the grouping queries out of the critical path (async query in parallel with two

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
I have enough RAM (30G) and Hard disk (1000G). It is not I/O bound or computer disk bound. In addition, the Solr was started with maximal 4G for JVM, and index size is 2G. In a typical test, I made sure enough free RAM of 10G was available. I have not tuned any parameter in the configuration, it

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
First and most obvious thing to try: bq: the Solr was started with maximal 4G for JVM, and index size is 2G Bump your JVM to 8G, perhaps 12G. The size of the index on disk is very loosely coupled to JVM requirements. It's quite possible that you're spending all your time in GC cycles. Consider

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Wenbin Wang
As of now, the index size is 6.5 M records, and the performance is good enough. I will re-build the index for all the records (14 M) and test it again with debug turned on. Thanks On Fri, Jun 19, 2015 at 12:10 PM, Erick Erickson erickerick...@gmail.com wrote: First and most obvious thing to

RE: How to do a Data sharding for data in a database table

2015-06-19 Thread Reitzel, Charles
Also, since you are tuning for relative times, you can tune on the smaller index. Surely, you will want to test at scale. But tuning query, analyzer or schema options is usually easier to do on a smaller index. If you get a 3x improvement at small scale, it may only be 2.5x at full scale.

Re: How to do a Data sharding for data in a database table

2015-06-19 Thread Erick Erickson
Do be aware that turning on debug=query adds a load. I've seen the debug component take 90% of the query time. (to be fair it usually takes a much smaller percentage). But you'll see a section at the end of the response if you set debug=all with the time each component took so you'll have a sense
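A rough Python sketch of reading that timing section, assuming the response was requested as JSON and already parsed into a dict. It assumes the usual layout in which `debug.timing.process` holds one entry per search component, each with a `time` value in milliseconds. The function name `component_times` and the sample numbers are hypothetical, not measurements from this thread.

```python
def component_times(response):
    """Return (component, milliseconds) pairs for the process phase,
    slowest first, from a parsed Solr response that includes debug
    timing information."""
    process = response["debug"]["timing"]["process"]
    times = [
        (name, entry["time"])
        for name, entry in process.items()
        # Skip the phase total, which is a plain number rather than a dict.
        if isinstance(entry, dict) and "time" in entry
    ]
    return sorted(times, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical response fragment for illustration only.
    sample = {
        "debug": {
            "timing": {
                "time": 20.0,
                "prepare": {"time": 1.0, "query": {"time": 1.0}},
                "process": {
                    "time": 19.0,
                    "query": {"time": 4.0},
                    "facet": {"time": 12.0},
                    "debug": {"time": 3.0},
                },
            }
        }
    }
    for name, ms in component_times(sample):
        print(name, ms)
```

Listing the debug component alongside the others makes it visible how much of the measured time is overhead from debugging itself.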

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Erick Erickson
You've repeated your original statement. Shawn's observation is that 10M docs is a very small corpus by Solr standards. You either have very demanding document/search combinations or you have a poorly tuned Solr installation. On reasonable hardware I expect 25-50M documents to have sub-second

How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
Hi, We probably would like to shard the data since the response time for demanding queries at 10M records is getting 1 second in a single request scenario. I have not done any data sharding before. What are some recommended ways to do data sharding? For example, maybe by a criterion with a list

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
The query without load is still under 1 second. But under load, response time can be much longer due to queued-up queries. We would like to shard the data to something like 6 M / shard, which will still give an under 1 second response time under load. What are some best practices to shard the

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Jack Krupansky
10M doesn't sound too demanding. How complex are your queries? How complex is your data - like number of fields and size, like very large documents? Are you sure you have enough RAM to fully cache your index? Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If