Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-25 Thread wwang525
Hi Erick, Up to now, all the tests were based on randomly generated requests. In reality, many requests will get executed more than twice since this is to support the advertising project. On the other hand, new queries could be generated daily. So some of the filter queries will be used

Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi All, I am working on improving the performance of queries that run against 15 M records, and all the queries have a list of about 6 filter queries with grouping and faceting requirements. So far, I found that the cache setting in solrconfig.xml is helpful after the Solr server is warmed
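For readers unfamiliar with the setting named in the subject line: in Solr it is written as the local parameter `{!cache=false}` at the start of a filter query. The following is a minimal sketch of how such a request URL might be assembled. The host, the core name `mycore`, and the helper `build_select_url` are assumptions for illustration; the field names are borrowed from other messages in this listing.

```python
from urllib.parse import urlencode

def build_select_url(base_url, filters, cache=True):
    """Build a Solr /select URL with one fq per (field, expression) pair.

    When cache is False, each fq is prefixed with the {!cache=false}
    local parameter, which asks Solr not to store that filter in the
    filterCache.
    """
    prefix = "" if cache else "{!cache=false}"
    params = [("q", "*:*")]
    for field, expr in filters:
        params.append(("fq", f"{prefix}{field}:({expr})"))
    # urlencode percent-encodes braces, colons and parentheses for us.
    return f"{base_url}/select?{urlencode(params)}"

url = build_select_url(
    "http://localhost:8983/solr/mycore",  # hypothetical host and core name
    [("GatewayCode", "YYZ"), ("DestCode", "CUN")],
    cache=False,
)
print(url)
```

Calling the helper with `cache=True` yields the same filters without the prefix, which makes it easy to compare response times for the two variants in a load test.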

Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi Erick, The earlier test was done through individual requests. However, my load test is even better. (1) load test (3 requests per second per core) immediately after restarting Solr: average response time: 122 ms (2) load test (5 requests per second per core) immediately after restarting Solr:

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Erick, All my queries are based on fq (filter query). I have to send the randomly generated queries to warm up the low-level Lucene cache. I took the more tedious route of warming up the low-level cache without using the three Solr caches, which I turned off (sizes set to zero). Then, I

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, Thank you very much for pointing out the potential design issue. The queries will be determined through a configuration by business users. There will be a limited number of queries every day, and they will get executed by customers repeatedly. However, business users will change the

Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira, I happened to compose an individual fq for each field, such as: fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...) It is nice to know that I am not creating unnecessary cache entries since the above method results in minimal cardinality as you pointed out. Thank
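The cardinality point can be illustrated with a small count. This is a sketch under assumed values, not measured data: the gateway, destination, and duration values below are hypothetical, and each distinct fq string is treated as one filterCache entry.

```python
from itertools import product

# Hypothetical value lists for three of the filtered fields.
gateways = ["YYZ", "YUL"]
dests = ["CUN", "VRA", "PUJ"]
durations = ["7", "14"]

separate = set()  # one fq per field: each distinct clause is one entry
combined = set()  # one fq for everything: each combination is one entry

for g, d, n in product(gateways, dests, durations):
    separate.update(
        {f"GatewayCode:({g})", f"DestCode:({d})", f"Duration:({n})"}
    )
    combined.add(f"GatewayCode:({g}) AND DestCode:({d}) AND Duration:({n})")

print(len(separate), len(combined))  # prints "7 12"
```

With one fq per field the number of distinct entries grows with the sum of the value counts (2 + 3 + 2), while a single combined fq grows with their product (2 x 3 x 2), so the per-field form is reused far more often across requests.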

Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick, Two facets are probably demanding: departure_date has 365 distinct values and hotel_code can have 800 distinct values. The docValues setting definitely helped me a lot even when all the queries had the above two facets. I will test a list of queries with or without the two facets

Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick, I just tested 10 different queries with or without the faceting search on the two properties: departure_date and hotel_code. Under a cold-cache scenario, they have pretty much the same response time, and the faceting took much less time than the query. Under cold cache scenario,

Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi All, I am working on a search service based on Solr (v5.1.0). The data size is 15 M records. The size of the index files is 860 MB. The test was performed on a local machine that has 8 cores, 32 GB of memory, and a 3.4 GHz CPU (Intel Core i7-3770). I found out that setting docValues=true for

Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi Upayavira, I edited the definition of tint to have precisionStep="0" for DateDep (i.e., departure date). This field is used in filter queries and also in faceted search. The following are the definitions: <fieldType name="tint" class="solr.TrieIntField" precisionStep="0"

Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi Upayavira, A bit more explanation on DateDep. This value in the database is expressed as a varchar(8), and has the format 20150803. I mapped it to a SortableIntField before, and it worked with the filter query and faceted search. After I changed it to be TrieIntField, tried re-indexing

Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi All, It looks like a numeric field cannot be used for faceting if docValues=true. The following issue seems to describe this scenario: https://issues.apache.org/jira/browse/SOLR-7495 Unexpected docvalues type NUMERIC when grouping by a int facet

TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi All, I was trying to switch the type definition for some fields from SortableIntField to TrieIntField so that I might be able to boost the performance of the queries that use grouping, sorting, and faceting. After I switched one field used for grouping, I got the following error:

Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Alex, I waited until the indexing process finished successfully. I also set a default value for these fields, and I can see from a simple query that the data was fine. The error happened after I executed a faceted query. Thanks

Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Upayavira, I have physically cleaned up the files under the index directory, and re-indexing did not fix the problem. The following is an example of the field definition: <field name="DateDep" type="tint" indexed="true" stored="true" docValues="true" default="0" required="true"/> and the following is the

Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Upayavira, My queries have all the features: search, sorting, grouping, faceting. As I was working on the project, I noticed the response time of the query got longer and longer as I added these features. I was reading the solr-ref-guide-4.7, and the following is from page 66. I thought convert

Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-13 Thread wwang525
Hi Erick, That status request shows whether the Solr instance is busy or idle. I think this is a doable option to check whether the indexing process has completed (idle) or not (busy). Now, I have some concerns about the solution of not using the default polling mechanism from the slave instance to the master

Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-13 Thread wwang525
Hi Erick, I think this is a good solution. It is going to work, although I have not yet implemented it with the HTTP API, which I was able to find at https://wiki.apache.org/solr/SolrReplication. On my local machine, a total of 800 MB of index files were downloaded within a minute to another folder. However,
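As a rough sketch of what driving replication over the HTTP API could look like, the snippet below only builds the request URLs for the replication handler commands `disablepoll`, `fetchindex`, and `details`. The host `slave-host`, the core name `mycore`, and the helper `replication_url` are assumptions for illustration, and the script sends nothing over the network.

```python
from urllib.parse import urlencode

def replication_url(core_url, command, **extra):
    """Build a URL for the Solr replication handler of one core."""
    params = {"command": command, "wt": "json", **extra}
    return f"{core_url}/replication?{urlencode(params)}"

slave = "http://slave-host:8983/solr/mycore"  # hypothetical host and core

# Stop the slave from polling the master on its own schedule.
print(replication_url(slave, "disablepoll"))
# Ask the slave to pull the index once the master has finished indexing.
print(replication_url(slave, "fetchindex"))
# Inspect replication state, e.g. to see whether a fetch is still running.
print(replication_url(slave, "details"))
```

A scheduler that runs after each re-index could issue the `fetchindex` request and then poll `details`, which avoids both manual replication and exposing a half-built index through timed polling.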

Planning Solr migration to production: clean and autoSoftCommit

2015-07-10 Thread wwang525
Hi, The following questions are about the basic configuration options in production. We will have three machines: one indexing instance (master) and two Solr instances (on different machines) for searching. This way, we will always have two Solr instances dedicated to executing search

Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-10 Thread wwang525
Hi Erick, It is Solr 4.7. For the time being, we are considering the old-style master/slave configuration. The re-indexing is going to happen every 4 hours or even every 2 hours, so it is not rare. Manually managing replication is not an option. Is there any other easy-to-manage option?

How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi All, I did a load test with a total of 800 requests (at 40 concurrent requests per second) executed against a Solr index with 14 M records. Performance was good (< 1 second), especially a short while into the test. BTW, the second round of the load test was even better. The local

Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi, The real production requests will not be randomly generated, and a lot of requests will be repeated. I think the performance will be better due to the repeated requests. In addition, I am sure the configuration will need to be adjusted once the application is in production. For the time

Re: How to do a Data sharding for data in a database table

2015-07-02 Thread wwang525
Hi, I worked with other search solutions before, and cache management is important in boosting performance. Apart from the cache generated due to user's requests, loading the search index into memory is the very initial step after the index is built. This is to ensure search results to be

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi, I am currently investigating the queries with a much smaller index (1M) to see how grouping and faceting contribute to the performance degradation. This will allow me to do a lot of tests in a short period of time. However, it looks like the query is executed much faster the second time. This is tested

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc

Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All, I did many tests with very consistent test results. Each query was executed after re-indexing, and only one request was sent to query the index. I disabled filterCache and queryResultCache for this test based on Erick's recommendation. The test document was posted to this email list

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread wwang525
schema.xml http://lucene.472066.n3.nabble.com/file/n4213864/schema.xml solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4213864/solrconfig.xml

Re: How to do a Data sharding for data in a database table

2015-06-24 Thread wwang525
Hi All, I built the Solr index with 14 M records. I have 20 GB of RAM on my local machine, and the Solr instance was started with -Xms1024m -Xmx8196m. The following query: http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5 OR 6 OR 7 OR

How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
Hi, We would probably like to shard the data since the response time for demanding queries at 10M records is approaching 1 second in a single-request scenario. I have not done any data sharding before. What are some recommended ways to do data sharding? For example, maybe by a criterion with a list

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
The query without load is still under 1 second. But under load, response time can be much longer due to queued-up queries. We would like to shard the data to something like 6 M / shard, which will still give a sub-1-second response time under load. What are some best practices to shard the
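The shard-size arithmetic behind a target like 6 M per shard can be sketched as follows. The 14 M total is taken from the later messages in this listing; the helpers `shard_count` and `shard_for`, and the choice of a CRC32 hash on a document id for routing, are assumptions for illustration rather than anything proposed in the thread.

```python
import math
import zlib

def shard_count(total_docs, target_per_shard):
    """Smallest number of shards that keeps each at or under the target."""
    return math.ceil(total_docs / target_per_shard)

def shard_for(doc_id, num_shards):
    """Route a document id to a shard index.

    crc32 is stable across runs, unlike Python's built-in hash() for str,
    so the same id always lands on the same shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

n = shard_count(14_000_000, 6_000_000)
print(n)  # prints 3
print(shard_for("example-doc-1", n))  # a shard index in range(n)
```

Hash routing spreads documents evenly but means every query fans out to all shards; splitting by a business criterion instead lets some queries hit a single shard at the cost of uneven shard sizes.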