Hi Erick,
Up to now, all the tests were based on randomly generated requests.
In reality, many requests will get executed more than twice since this is to
support the advertising project. On the other hand, new queries could be
generated daily. So some of the filter queries will be used
Hi All,
I am working on improving the performance of queries that run against 15 M
records. Every query has a list of about 6 filter queries, plus grouping and
faceting requirements.
So far, I found that the cache setting in solrconfig.xml is helpful after
the Solr server is warmed
Hi Erick,
The earlier test was done through individual requests. However, my load test
is even better.
(1) load test (3 requests per second per core) immediately after restarting
Solr: average response time: 122 ms
(2) load test (5 requests per second per core) immediately after restarting
Solr:
Hi Erick,
All my queries are based on fq (filter query). I have to send randomly
generated queries to warm up the low-level Lucene cache.
I took the more tedious route of warming up the low-level cache without the
three Solr caches, which I turned off by setting their values to zero. Then,
I
Hi Upayavira,
Thank you very much for pointing out the potential design issue.
The queries will be determined through a configuration by business users.
There will be a limited number of queries every day, and they will be executed
by customers repeatedly. However, business users will change the
Hi Upayavira,
I happened to compose an individual fq for each field, such as:
fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...)
It is nice to know that I am not creating unnecessary cache entries, since
the above method results in minimal cardinality, as you pointed out.
Thank
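
A minimal sketch of the one-fq-per-field approach described above, in Python. The helper name build_select_query and the example values are hypothetical; only the field names come from the message. It only builds the query string and does not contact Solr.

```python
from urllib.parse import urlencode, parse_qsl

def build_select_query(filters, q="*:*"):
    """Return a URL-encoded query string with one fq parameter per field."""
    params = [("q", q)]
    for field, values in filters.items():
        # One fq per field, so each filter is a separate, reusable clause.
        clause = " OR ".join(str(v) for v in values)
        params.append(("fq", f"{field}:({clause})"))
    return urlencode(params)

# Hypothetical example values for illustration only.
example_filters = {
    "Gatewaycode": ["YYZ"],
    "DestCode": ["CUN"],
    "Duration": [5, 6, 7],
}
query_string = build_select_query(example_filters)
print(query_string)
```

Keeping each field in its own fq means a filter that repeats across requests has the same text every time, whatever the other filters are.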
Hi Erick,
Two facets are probably demanding:
departure_date has 365 distinct values and hotel_code can have 800 distinct
values.
The docValues setting definitely helped me a lot even when all the queries
had the above two facets. I will test a list of queries with or without the
two facets
Hi Erick,
I just tested 10 different queries with and without faceting on
the two properties departure_date and hotel_code. In the cold-cache
scenario, they have pretty much the same response time, and the faceting
took much less time than the query time. Under cold cache scenario,
Hi All,
I am working on a search service based on Solr (v5.1.0). The data size is 15
M records. The size of the index files is 860 MB. The test was performed on a
local machine that has 8 cores, 32 GB of memory, and a 3.4 GHz CPU (Intel
Core i7-3770).
I found out that setting docValues=true for
Hi Upayavira,
I edited the definition of tint to have precisionStep=0 for DateDep
(i.e., departure date). This field is used in filter queries and also in
faceted search.
The definitions are as follows:
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0"
Hi Upayavira,
A bit more explanation on DateDep.
This value is stored in the database as a varchar(8) in the format
20150803. I previously mapped it to a SortableIntField, and it worked with
filter queries and faceted search.
After I changed it to TrieIntField, tried re-indexing
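
As an illustration of the varchar(8)-to-integer mapping mentioned above, here is a small Python sketch. The function name date_dep_to_int is hypothetical, and the validation step is an assumption added for the example, not something described in the message.

```python
from datetime import datetime

def date_dep_to_int(raw):
    """Convert a yyyyMMdd string such as '20150803' to an int, validating it first."""
    text = raw.strip()
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits, got {raw!r}")
    # strptime raises ValueError for impossible dates such as 20150230.
    datetime.strptime(text, "%Y%m%d")
    return int(text)

print(date_dep_to_int("20150803"))
```

Because the format is year, month, day, the integers sort in the same order as the dates they represent.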
Hi All,
It looks like a numeric field cannot be used for faceting if
docValues=true.
The following issue seems to describe this scenario:
https://issues.apache.org/jira/browse/SOLR-7495
Unexpected docvalues type NUMERIC when grouping by a int facet
Hi All,
I was trying to switch the type definition for some fields from
SortableIntField to TrieIntField so that I may be able to boost the
performance of the queries that use grouping, sorting, and faceting.
After I switched one field used for grouping, I got the following error:
Hi Alex,
I waited until the indexing process finished successfully.
I also set a default value for these fields, and I can see from a simple query
that the data was fine. The error happened after I executed a faceted query.
Thanks
Hi Upayavira,
I have physically cleaned up the files under the index directory, and
re-indexing did not fix the problem.
The following is an example of the field definition:
<field name="DateDep" type="tint" indexed="true" stored="true"
docValues="true" default="0" required="true"/>
and the following is the
Hi Upayavira,
My queries have all the features: search, sorting, grouping, faceting. As I
was working on the project, I noticed the response time of the query got
longer and longer as I added these features.
I was reading the solr-ref-guide-4.7, and the following is from page 66. I
thought covert
Hi Erick,
That status request shows whether the Solr instance is busy or idle. I think
this is a workable way to check whether the indexing process has completed
(idle) or not (busy).
Now, I have some concerns about the solution of not using the default polling
mechanism from the slave instance to the master
Hi Erick,
I think this is a good solution. It is going to work, although I have not
yet implemented it with the HTTP API, which I was able to find at
https://wiki.apache.org/solr/SolrReplication.
On my local machine, a total of 800 MB of index files were downloaded
within a minute to another folder. However,
Hi,
The following questions are about the basic configuration options in
production.
We will have three machines: one indexing instance (master) and two Solr
instances (on different machines) for searching. This way, we will
always have two Solr instances dedicated to executing search
Hi Erick,
It is Solr 4.7. For the time being, we are considering the old style
master/slave configuration.
The re-indexing is going to run every 4 hours, or even every 2 hours, so
it is not rare. Manually managing replication is not an option. Is there any
other easy-to-manage option?
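
One possible way to script replication instead of managing it by hand is to call the replication handler's HTTP commands (such as fetchindex, disablepoll, and details) from the indexing job. The Python sketch below only builds the URLs. The host names are hypothetical, the core name db-mssql is borrowed from a query elsewhere in this thread, and whether this suits the deployment is an assumption.

```python
from urllib.parse import urlencode

def replication_url(core_url, command, **params):
    """Build a replication-handler URL for one core, e.g. command=fetchindex."""
    query = urlencode({"command": command, **params})
    return f"{core_url.rstrip('/')}/replication?{query}"

# Hypothetical slave hosts; replace with the real search instances.
SLAVES = [
    "http://search1:8983/solr/db-mssql",
    "http://search2:8983/solr/db-mssql",
]

# After re-indexing finishes on the master, each slave could be asked to pull the index.
fetch_urls = [replication_url(slave, "fetchindex") for slave in SLAVES]
for url in fetch_urls:
    print(url)
```

A scheduled job could issue these requests after each re-indexing run, so nothing has to be triggered manually.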
Hi All,
I did a load test with a total of 800 requests (at 40 concurrent requests
per second) to be executed against Solr index with 14 M records. Performance
was good ( 1 second) especially after a short period of time of the test.
BTW, the second round of load test was even better.
The local
Hi,
The real production requests will not be randomly generated, and a lot of
requests will be repeated. I think the performance will be better due to the
repeated requests. In addition, I am sure the configuration will need to be
adjusted once the application is in production.
For the time
Hi,
I worked with other search solutions before, and cache management is
important in boosting performance. Apart from the cache generated due to
user's requests, loading the search index into memory is the very initial
step after the index is built. This is to ensure search results to be
Hi,
I am currently investigating the queries with a much smaller index (1M)
to see how grouping and faceting contribute to the performance degradation.
This will allow me to do a lot of tests in a short period of time.
However, it looks like the query is executed much faster the second time.
This is tested
Test_results_round_2.doc
http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215016.html
Sent from the Solr - User mailing list archive
Hi All,
I did many tests with very consistent test results. Each query was executed
after re-indexing, and only one request was sent to query the index. I
disabled filterCache and queryResultCache for this test based on Erick's
recommendation.
The test document was posted to this email list
schema.xml http://lucene.472066.n3.nabble.com/file/n4213864/schema.xml
solrconfig.xml
http://lucene.472066.n3.nabble.com/file/n4213864/solrconfig.xml
Hi All,
I built the Solr index with 14 M records.
I have 20 GB of RAM on my local machine, and the Solr instance was started
with -Xms1024m -Xmx8196m.
The following query:
http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5
OR 6 OR 7 OR
Hi,
We would probably like to shard the data, since the response time for
demanding queries at 10M records is getting 1 second in a single request
scenario.
I have not done any data sharding before. What are some recommended ways to
do data sharding? For example, maybe by a criterion with a list
The query without load is still under 1 second. But under load, response time
can be much longer due to queued-up queries.
We would like to shard the data to something like 6 M / shard, which will
still give an under-1-second response time under load.
What are some best practices to shard the
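
To make the sizing above concrete, here is a rough Python sketch of the arithmetic and of a simple hash-based assignment of records to shards. The function names are hypothetical, and the CRC32 routing is a generic illustration, not Solr's built-in routing.

```python
import math
import zlib

def shard_count(total_docs, max_docs_per_shard):
    """Smallest number of shards that keeps every shard at or below the cap."""
    return math.ceil(total_docs / max_docs_per_shard)

def shard_for(doc_id, num_shards):
    """Map a record id to a shard index using a stable hash."""
    # crc32 gives the same value on every run, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards

num_shards = shard_count(14_000_000, 6_000_000)
print(num_shards)
print(shard_for("record-12345", num_shards))
```

With 14 M records and a cap of 6 M per shard, this gives 3 shards of roughly 4.7 M records each if ids are spread evenly.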
30 matches